
Running Head: CRITIQUE OF A FIFTH GRADE SCIENCE TEST

Critique of a Fifth Grade Science Test for

Virginia Standard of Learning 5.4

Lesley R. Hunley

The College of William & Mary

Critique of a Fifth Grade Science Test for Virginia Standard of Learning 5.4

The Test

For purposes of this assignment, I obtained a test that was created by a veteran teacher who is a colleague of mine at a local elementary school and was the cooperating teacher for my student teaching experience. The test's author earned a Bachelor's degree in Elementary Education and a Master's degree in Reading, Language, and Literacy. Her experience in the field of education includes working as a special education teaching assistant, two years teaching second grade, and ten years teaching fifth grade. In addition to teaching, she serves as grade level chairperson and as a member of the school leadership team, and she is the gifted mentor for the school. As my former cooperating teacher, she is very supportive of my graduate studies and, at my request, willingly provided me with a test to use for this assignment. We chose a unit test for science, a subject that she is not teaching this school year but taught for many years previously (personal communication, February 9, 2009).

The test used for this assignment was written for fifth grade science. This test was designed to measure mastery of Virginia Science Standard of Learning [SOL] 5.4, which deals with topics related to matter. It is a summative assessment that was administered at the end of a unit lasting approximately four weeks. There was no corresponding pre-assessment for the unit. In addition to determining the effectiveness of the teaching of this unit and the degree of student learning, the teacher used this assessment to determine which students needed remediation on the topic of matter prior to the division-wide benchmark tests and the SOL test (personal communication, February 9, 2009). In terms of consequential validity, a negative unintended consequence for a student who did poorly on this unit test was losing opportunities for enrichment during the intervention and enrichment block of the day because he or she received remediation during that time. Over time, this could affect a student's level of motivation if he or she continued to do poorly on unit assessments (Gronlund, 2009).

The test is three typewritten pages in length. It is written in 14-point Arial font. The overall appearance is neat and organized. There is adequate space between each question and its answer choices and between test items. There is also adequate space between each item number and the question, and between the letter of each answer choice and the choice itself. Directions are also in 14-point Arial font, but are bolded for emphasis. There are a total of 21 items, 19 of which the student must respond to. There are 18 select-response (multiple-choice) items and 3 supply-response (restricted response essay) items. Stimulus materials included on the test are a diagram of an atom with arrows pointing to specific parts and illustrations of the activity of molecules in a solid, liquid, and gas.

Intended Learning Outcomes

The intended learning outcomes of this test are for students to demonstrate mastery of Virginia Science SOL 5.4. When I asked the teacher for her objectives for this unit or for her table of specifications for the unit test, she told me that her objectives came directly from the division pacing guide, which came directly from the Virginia SOLs, and that she had never created a table of specifications for an assessment (personal communication, February 9, 2009). I used the Virginia SOL Curriculum Framework for fifth grade science SOL 5.4 to determine the specific content and behaviors that were taught during this unit and were assessed on the test. Science standard 5.4 states that:

The student will investigate and understand that matter is anything that has mass, takes up space, and occurs as a solid, liquid, or gas. Key concepts include a) atoms, elements, molecules, and compounds; b) mixtures, including solutions; and c) the effect of heat on the states of matter. (Virginia Department of Education [VDOE], 2003, p. 11)

Specific behaviors mentioned in the essential knowledge, skills, and processes section of the curriculum framework are for students to construct, interpret, design, determine, compare, and contrast. Many of these behaviors are higher cognitive level skills on Bloom's taxonomy. Additionally, some foundational background knowledge is mentioned in the overview section of the curriculum framework, including definitions of key terminology such as atom, molecule, element, compound, mixture, and solution (VDOE, 2003). When I pointed out to the test's author the high cognitive level of many of the behaviors in the intended learning outcomes, she still believed that her test was written in such a way that it would be able to accurately measure mastery of these skills (personal communication, February 9, 2009).

The intended learning outcomes that this test was designed to measure came directly from the division pacing guide and Virginia SOL Curriculum Framework, according to the teacher who authored the test. There were no additional objectives written by the teacher (personal communication, February 9, 2009). Therefore, the intended learning outcomes for the 5.4 matter test are directly aligned with the curriculum at both the division and state level. Although the teacher did not consider alignment with national standards when determining her objectives, the intended learning outcomes do align with National Science Education Content Standard B, Physical Science, for grades five through eight. This standard includes properties and changes of properties in matter (Center for Science, Mathematics, and Engineering Education, 1996). The alignment between the teacher's intended learning outcomes and the local, state, and national standards appears to give the test construct validity. However, I have detailed below some concerns I have with the content validity of the test that call into question the true alignment of the teacher's intended learning outcomes and the level of construct validity that the test actually possesses. Critical thinking, investigating, and evaluating, which are all key elements in the construct of science, are not included on this test, which makes the level of construct validity lower than it initially appeared (Gronlund, 2009).

Content Validity

During creation of a table of specifications for this test [Table 1], I began to think that the level of content validity for the test would be low. By comparing the test items to the intended learning outcomes, I could tell that the test was not measuring what it was designed to measure in all cases. As mentioned previously, and demonstrated in Table 1, all of the intended learning outcomes fall at higher cognitive levels of thinking, including synthesis and analysis. However, there is only one instance during test-taking in which the student may have to apply higher level thinking skills: the essay question. There are three supply-response test items (restricted response essay) that assess student learning at the analysis level, and the student is only required to answer one of those three items on the test. The supply-response questions ask students to compare and contrast, but they may choose whether they compare and contrast mixtures and solutions, elements and compounds, or atoms and molecules. Additionally, although I have identified these items to be at the analysis level in Table 1, they could quite possibly be recall if the students had been given this information previously and are familiar with the similarities and differences of the items they are asked to compare and contrast.

There are three multiple-choice items written at the comprehension level that are appropriate for the behaviors expected for the first content objective on Table 1 (models of atoms, elements, molecules, and compounds). However, that content does not call for knowledge-level behavior, yet two multiple-choice items for it are written at the knowledge level. In fact, the remaining 15 multiple-choice items, those two included, all fall at the knowledge level. Some knowledge of the content is required by the intended learning outcomes, especially for foundational background and terminology, but the majority of this unit test should not be simple recall. For two content objectives on Table 1, the behaviors expected are at the analysis level and they are only assessed at the knowledge level. That will also be the case for the two content objectives that the student does not select for his or her compare and contrast essay response. All of the required content is assessed on this test, but not at the appropriate cognitive level.

In terms of emphasis, each intended learning outcome is weighted similarly. There is a range of three to five questions related to each of the six content objectives I identified on Table 1. All questions are directly related to a specific intended learning outcome, with the exception of number one on the test, which is a question about foundational background knowledge of matter related to all intended learning outcomes. Overall, there is a representative sample of the content that was included on Table 1 (Gronlund, 2009).

Other strategies were used to assess student learning during this unit. The teacher explained that she conducted formative assessments as she observed students constructing models of molecules out of marshmallows and toothpicks, which corresponds with the first content objective on Table 1. However, this activity did not achieve the higher level cognitive behavior (interpret) that the intended learning outcome targets, which was also not measured on the test. Similarly, students observed a demonstration of the effects of heat on matter, which corresponds with the second content objective on Table 1, but did not design an experiment (synthesis level) as the standard calls for (personal communication, February 9, 2009).

In summary, my initial impression was correct and the content validity of this test is low. There is a representative sample of content covered on this unit test, but not at the appropriate cognitive levels. Although the items are aligned with the intended learning outcomes, because they are at a lower cognitive level, they are not measuring what the test was intended to measure. Although the teacher made efforts to extend learning beyond knowledge and assess the higher cognitive levels during formative assessment, this was not ultimately successful.

Reliability

The unit test on matter has some issues related to reliability. I do not feel that the total number of items is adequate. Although all of the content is measured, there are only two instances in which Table 1 indicates that there are at least three items for an intended learning outcome at an appropriate cognitive level. In some instances, as discussed when examining content validity, there are no items or only one item measuring a content objective at an appropriate cognitive level. This is cause for concern because if there is a problem with an item, the reliability of the overall test is compromised (Gronlund, 2009). One major issue I identified is the use of choice on the supply-response section. Three supply-response items (restricted response essay questions) are included on the test, but the student only chooses one to respond to. That will result in the content of two intended learning outcomes not being assessed at the appropriate cognitive level and thus no reliable information will be available about the students’ mastery of that content.
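To illustrate why so few items per objective is a concern, the Spearman-Brown prophecy formula offers a standard estimate of how reliability changes as a test is lengthened. I include it here only as an illustration under general assumptions, since neither the test nor its author provides any reliability data:

\[ r_{k} = \frac{k \, r_{1}}{1 + (k - 1)\, r_{1}} \]

where \( r_{1} \) is the reliability of the original set of items and \( k \) is the factor by which the number of parallel items is increased. For example, if the items measuring one objective had an estimated reliability of .50, doubling them (k = 2) would be expected to raise that value to 2(.50) / (1 + .50), or approximately .67, which is one reason that adding items at the appropriate cognitive level would strengthen the test.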

The first section of the test is select-response, multiple-choice. All multiple-choice items are grouped together. Directions are to, "Circle the letter of the correct answer." This offers no explanation of what the test will be measuring. Generally, I would choose the wording "best answer" instead of "correct answer," but I know fifth grade students have taken multiple-choice tests before and will be able to follow the directions as written. The use of multiple-choice items is appropriate for this test. On the whole, they are constructed well and use clear and concise language. The reading level appears developmentally appropriate for most fifth grade students. The distractors are plausible and fit well with the question stems. Many use parallel construction, such as number four on the test. The test is also free from grammatical errors. There are no clues in the questions that give away the correct answers. For example, in numbers two, three, five, six, and seven the article "a" or "an" is written as part of the answer choice instead of as a clue in the question stem. However, to make the answer choices more clear, I would have used "a(n)" and included it as part of the question. Correct terminal punctuation is also used at the end of each answer choice. One multiple-choice question that I do see as having a reliability issue is number 18, in which the correct answer is much longer than the distractors (Gronlund, 2009).

Questions 9 through 14 introduce stimulus material to the student. Between items 8 and 9, there is another set of directions for items 9, 10, and 11, stating, "For questions 9-11 use the picture to the right." The picture is a diagram of an atom, which is not labeled. There are four arrows pointing to various parts of the atom, but only three of them are labeled with a letter, A through C. This could be confusing to the students, as it was to me when I first examined the test. There is no additional space between items 11 and 12. I think the three items that use the diagram of the atom and have their own directions should have been visually set apart from the rest. Items 12 through 14 each have a diagram for the students to examine, but no additional directions are offered. I think it would have been helpful to clarify that students are expected to use the diagram to answer each question. It would also be helpful to have items 12 through 14 on one page so the students could compare them when answering the questions without having to flip back and forth between pages. Finally, the diagram for item 13 did not show up clearly during copying and the bottom portion is very faint. Following question 14, the remaining items are multiple-choice with no stimulus material. Perhaps it would have been better to separate the two sections of the test that use stimulus material instead of putting them together in the middle of the multiple-choice section (Gronlund, 2009).

The supply-response (restricted response essay) section of the test has the most severe reliability problems. As mentioned previously, the option of choice is given and compromises the test's reliability (Gronlund, 2009). There are three bulleted prompts, and the directions above them state for the student to, "Choose one of the following to answer in a short paragraph. Make sure you tell how they are both similar and different (compare and contrast). Circle the one you choose." The directions and prompts are at the bottom of the third and final page of the test. There is no direction as to where the student should respond to his or her chosen question. If the teacher intends for the student to use the back of the test paper, it should be stated. Students, even in upper elementary grades, also need lines to write on and would have difficulty writing on a blank sheet of white paper. If the teacher instead intends for students to answer on a separate sheet of notebook paper, that should be stated as well. Students would be confused, and some may try to squeeze their answers into the small space below the prompts and then feel that their response is limited. The directions do not give any suggestions as to how many similarities and differences the student should state. They also do not give any guidelines as to how much time should be spent responding to the question or what the total length of the response should be. I think a better way to assess this content would be having students complete Venn diagrams, included on the test, with a predetermined number of similarities and differences. Additionally, the essay portion counts for 10 total points, but there is no explanation as to how the points are awarded. This would affect inter-rater and intra-rater reliability as well as cause confusion for a student who is unsure of the expectations (Gronlund, 2009).
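If a scoring guide were added, a simple way to check inter-rater reliability would be to have two raters score the same set of essays and compute their percent agreement. This is a standard index offered only as a hedged illustration; the test itself provides no scoring procedure:

\[ \text{percent agreement} = \frac{\text{number of essays given the same score by both raters}}{\text{total number of essays scored}} \times 100 \]

For instance, if two raters assigned identical scores to 18 of 24 essays, agreement would be 18/24, or 75 percent, and a clear point-by-point rubric would be expected to raise that figure.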

Overall, the test's reliability suffers as a result of the content validity issues (too few questions measuring the content of the intended learning outcomes at the appropriate cognitive level). Many of the multiple-choice items are free from major reliability issues, but those which incorporate stimulus material should be revised to improve the reliability of the test. The supply-response portion of the test has a low degree of reliability because of the formatting, ambiguity, and use of choice. There may also be inter-rater and intra-rater reliability issues because of the lack of objectivity in the scoring of the responses. By adding items that assess the content at a higher cognitive level and making some simple changes, the reliability of this test could be easily improved.