APPENDIX G: Methods for ICE Evaluation Assessment

ICE users create multidimensional, quantifiable descriptions of reading and language arts instruction that include instructional topics, grouping patterns, materials, student engagement, content emphasis, and overall instructional quality. The instrument has four preset dimensions, each with several subcategories, for classifying reading instruction. The four dimensions in the primary-grade version are the following (a schematic sketch appears after the list):

  • Main instructional category (Dimension A): Topics identify the broad domain of instructional content.
  • Instructional subcategory (Dimension B): Subcategories classify the specific activities that occur, allowing for a more detailed description of content. The subcategories are broad enough that activities with common objectives, but known by several different names, can be readily classified.
  • Grouping (Dimension C): Instructional activities are categorized into one of five grouping patterns: whole class, small groups, pairs, independent, and individual.
  • Materials (Dimension D): Coding options for materials used during instruction include multiple types of text (e.g., basal readers, pattern books) and a wide range of ancillary materials.
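As a reading aid, these four dimensions can be represented as a small data structure. The sketch below (in Python) is illustrative only: the class and field names are assumptions, and the string fields stand in for the instrument's much longer numeric category lists, which are not reproduced here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Grouping(Enum):
    """Dimension C: the five grouping patterns named in the list above."""
    WHOLE_CLASS = 1
    SMALL_GROUPS = 2
    PAIRS = 3
    INDEPENDENT = 4
    INDIVIDUAL = 5


@dataclass
class DimensionCodes:
    """The four preset ICE dimensions for one instructional activity.

    The string fields stand in for the instrument's fuller category lists
    (Dimension A topics, Dimension B subcategories, Dimension D materials),
    which are far more detailed than this sketch suggests.
    """
    topic: str                    # Dimension A: main instructional category
    subcategory: str              # Dimension B: instructional subcategory
    grouping: Grouping            # Dimension C: grouping pattern
    materials: list[str] = field(default_factory=list)  # Dimension D
```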

ICE was developed and refined over the course of two separate evaluations of policies related to a statewide reading initiative. The first program evaluated funded reading interventions for struggling readers in the upper elementary and middle school grades. The second program, evaluated in the succeeding year, funded primary-grade interventions for students at risk for reading difficulties and professional development on beginning reading instruction. The ICE version used in the first evaluation addressed the instructional content of upper-elementary and middle school reading and language arts instruction. While the original structure was maintained, the instrument’s categories had to be adapted for the second evaluation project to capture the nuances of primary-grade reading instruction. For example, Dimension A categories from the original design included reading and language arts topics not typically seen in the primary grades (e.g., writing involving library or other research resources) and excluded important instructional topics associated with beginning reading (e.g., phonological awareness) (see Figure 1). Revisions to the original instrument included adding categories for critical components of beginning reading and extending the instrument’s scope to incorporate a student engagement scale and a set of scales for rating instructional quality.

Both the original version and the primary-grade version contain an item for rating content emphasis on an incremental scale ranging from 1 (incidental: occupying less than X% of observed instructional time) to 5 (maximum emphasis: occupying X% of observed instructional time). Whereas the first version captured instructional content almost exclusively, the revised version incorporates more process variables. For example, the primary-grade version of ICE contains a Likert scale for rating student engagement (1 = low engagement, 2 = medium engagement, 3 = high engagement) and seven indicators for rating overall instructional quality. Quality indicators are rated on a scale from 1 (unacceptable) to 4 (outstanding) and include: (a) classroom management, (b) classroom environment, (c) instructional balance, (d) level of instructional scaffolding, (e) level of student self-regulation, (f) academic expectations, and (g) teaching in context.
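For reference, the rating scales just described can be summarized in the same schematic style. The constant names below are hypothetical; the anchors and the seven quality indicators follow the text, and the unspecified X% emphasis thresholds are intentionally omitted.

```python
# Scale anchors as described in the text; names are assumptions.
CONTENT_EMPHASIS_ANCHORS = {1: "incidental", 5: "maximum emphasis"}  # rated 1-5
STUDENT_ENGAGEMENT_SCALE = {1: "low", 2: "medium", 3: "high"}        # rated 1-3

QUALITY_INDICATORS = (
    "classroom management",
    "classroom environment",
    "instructional balance",
    "level of instructional scaffolding",
    "level of student self-regulation",
    "academic expectations",
    "teaching in context",
)
QUALITY_SCALE_ANCHORS = {1: "unacceptable", 4: "outstanding"}        # rated 1-4
```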

Figure 1. Dimension A: Main instructional category
Upper-elementary/middle school version      Primary-grade version
Word level                                  Alphabetics
Comprehension skills                        Fluency
Thinking/metacognition                      Reading (e.g., practice reading aloud)
Writing                                     Comprehension skills
Assessment                                  Writing/language arts

When coding instruction using ICE, observers assign each instructional activity (defined as a distinct or unique activity where the content, grouping, and materials are coordinated around a certain topic or curricular domain) a numeric description documenting what is being taught (Dimension A: Main instructional category and Dimension B: Instructional subcategory), how it is being taught (Dimension C: Grouping), what is being used (Dimension D: Materials), how long the instruction lasts (Content Emphasis), and how well students attend to what is being presented (Student Engagement).
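To make the coding concrete, a single activity might be recorded along the following lines. The pairing of values and the numeric ratings are invented for illustration and do not come from real observation data; the topic and subcategory labels are taken from Figure 1 and from the indicator example discussed later in this appendix.

```python
# Illustrative coded record for one instructional activity. The labels come
# from this appendix; the field names and ratings are assumptions.
coded_activity = {
    "dimension_a_topic": "Comprehension skills",
    "dimension_b_subcategory": "Predicting/previewing/prior knowledge",
    "dimension_c_grouping": "small groups",
    "dimension_d_materials": ["basal reader"],
    "content_emphasis": 3,      # 1-5 scale (how heavily the topic is emphasized)
    "student_engagement": 2,    # 1-3 scale (how well students attend)
}
```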

The multidimensional, taxonomic design of the instrument was derived from the Reform Up Close study of instructional content in high school mathematics and science classes (Porter, 1994; Smithson & Porter, 1994). Previous studies that used content-area standards as the basis for categorizing instructional content (Porter, 1994; Saxe, Gearhart, & Seltzer, 1999) served as a model for creating the instrument’s taxonomy.

To be valid, the instrument had to provide coding options that accurately portrayed what was occurring in the classroom. Content validity was established through a thorough literature review and consultation with experts in the field. Instructional categories and subcategories in ICE were culled from a variety of sources documenting instructional practices found in beginning reading instruction, upper elementary and middle school reading and language arts classes, and reading intervention programs (Bond & Dykstra, 1997; Graves & Dykstra, 1997; Morrow, Tracey, Woo, & Pressley, 1999; Pressley et al., 1998; Pressley, Rankin, & Yokoi, 1996; Pressley, Wharton-McDonald, Mistretta-Hampston, & Echevarria, 1998; Pressley, Wharton-McDonald, Mistretta-Hampston, & Yokoi, 1997; Searfoss, 1997; Stahl, 1992; Wharton-McDonald, Pressley, & Mistretta-Hampston, 1998). Instrument developers also reviewed national and state reading standards (CIERA, 1998; National Center, 1999; Texas Education Agency, 2000) and research on best practice in literacy instruction (Educational Research Service, 1999; Gambrell & Mazzoni, 1999; Juel, 1994; National Reading Panel, 2000; Osborn & Lehr, 1998; Snow, Burns, & Griffin, 1998). Quality indicators were culled from the work reviewed for the content categories and from research on effective teaching (Brophy, 1979; Porter & Brophy, 1988).

Decisions regarding the scope and specificity of ICE’s main dimensions, and of the subcategories within the dimensions, were informed by the literature and constrained by practicality. The initial list of instructional topics was reduced to include only those topics that appeared in more than one major source. Given the nearly endless number of possible reading activities, the level of coding detail could easily have become too fine-grained and narrow to be usable. Instrument refinement followed a pilot test in primary-grade classrooms. A panel of educators, university professors, and researchers also reviewed the instrument’s format and categories and provided valuable feedback on both versions. Input from these authorities in the field established that the instrument measured what it purported to measure: the instructional content of reading and language arts instruction.

While the included categories for coding instruction are certainly not exhaustive, they are representative of the most common types of reading activities implemented in elementary and middle school classrooms. Including common instructional activities, rather than only the most effective ones, protected against the possibility of observing instruction that defied classification. At the same time, because ICE includes categories of instruction that research has shown to be effective, one can argue that students who have the opportunity to learn the content represented on the instrument are likely to develop the skills needed for success in reading. This premise makes ICE a theoretically valid measure of instructional quality.

To increase reliability, indicators for each subcategory were developed. Drawn from descriptions of reading activities found in the best practice literature, the indicators define the parameters of the subcategories by providing specific examples of instruction associated with a given domain. For example, the corresponding indicators for the subcategory “Predicting/previewing/prior knowledge” are (a) students preview the material before reading, (b) students predict outcomes based on prior knowledge, and (c) students participate in activities designed to measure their level of knowledge before reading.

Using ICE requires some familiarity with reading instruction. Observers, for example, must be able to differentiate between phonological awareness instruction and word study activities. While training observers on the ICE system takes a good deal of time and effort, once trained, raters are highly reliable. For the two evaluations during which ICE was developed, an inter-rater reliability rate of 91% was achieved.
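The report does not specify how the 91% figure was computed. A common approach for instruments of this kind is exact-match percent agreement between two trained observers coding the same activities; the function below sketches that calculation and should be read as one plausible formulation, not as the evaluators' actual procedure.

```python
def percent_agreement(rater_a: list[str], rater_b: list[str]) -> float:
    """Exact-match percent agreement between two observers' codes for the
    same sequence of instructional activities."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must code the same activities.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Example: two observers agree on 10 of 11 codes, roughly 91% agreement.
print(round(percent_agreement(["A1"] * 11, ["A1"] * 10 + ["B2"]), 1))  # 90.9
```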

Short Version

A total of 103 observations were conducted in 16 kindergarten and 18 first-grade classrooms. Reading instruction was observed to document instructional content. The mean observation length was 48 minutes. To the extent possible, the same team members observed the same teachers over the course of the evaluation. Observers used the Instructional Content Emphasis (ICE) instrument (see Appendix B), which was designed to record the type and frequency of reading and language arts instruction in primary-grade classrooms (Edmonds & Briggs, 2000). The instrument’s multidimensional design is modeled after the instrument used in the Reform Up Close study (Smithson & Porter, 1994). ICE categorized each instructional event, defined as a distinct or unique activity in which the content, grouping, and/or materials were coordinated around a certain curriculum domain. An example of an instructional event would be a teacher asking students to clap out the syllables of words presented on flashcards. Training on the instrument was conducted before site visits to ensure inter-rater reliability; an inter-rater reliability of .90 was achieved.

In analyzing the observation data, we calculated the total amount of instructional time observed and the proportion of time allocated to specific instructional domains. For each grade level, specific instructional areas were collapsed into broad domains to allow for meaningful comparison to the topics covered in the TRA. For example, instruction coded as oral language development, book and print concepts, and phonological awareness was combined into the broader “pre-reading skills” topic. The broad domains reported include pre-reading skills, phonics, writing/publishing, reading text, text read aloud, vocabulary, comprehension monitoring, writing mechanics, and fluency. Analysis was conducted in this manner to determine the extent to which teachers’ reading instruction incorporated the components of effective reading instruction advocated by the state and presented in the TRA.
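As a sketch of this analysis step, the function below collapses coded activities into broad domains and computes each domain's share of observed instructional time. The mapping and the field names (`area`, `minutes`) are assumptions; only the "pre-reading skills" grouping is spelled out in the text, so the report's full mapping is not reproduced here.

```python
from collections import defaultdict

# Hypothetical mapping from specific coded areas to broad domains.
BROAD_DOMAIN = {
    "oral language development": "pre-reading skills",
    "book and print concepts": "pre-reading skills",
    "phonological awareness": "pre-reading skills",
}


def domain_proportions(activities: list[dict]) -> dict[str, float]:
    """Collapse coded activities into broad domains and return the share of
    total observed instructional time allocated to each domain."""
    minutes_by_domain: dict[str, float] = defaultdict(float)
    for activity in activities:
        domain = BROAD_DOMAIN.get(activity["area"], activity["area"])
        minutes_by_domain[domain] += activity["minutes"]
    total = sum(minutes_by_domain.values())
    return {domain: minutes / total for domain, minutes in minutes_by_domain.items()}


# Example: 20 minutes of phonological awareness and 30 minutes of phonics
# yield 40% pre-reading skills and 60% phonics.
example = [
    {"area": "phonological awareness", "minutes": 20},
    {"area": "phonics", "minutes": 30},
]
print(domain_proportions(example))
```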
