
Running Head: ALIGNMENT OF ALTERNATE ASSESSMENTS

An Analysis of Three States’ Alignment Between Language Arts and Mathematics Standards and Alternate Assessments

Claudia Flowers, Diane Browder, and Lynn Ahlgrim-Delzell

The University of North Carolina at Charlotte

Date Resubmitted: October 28, 2004

Correspondence concerning this article should be addressed to Dr. Claudia Flowers, Department of Educational Administration, Research, and Technology, University of North Carolina at Charlotte, 9201 University City Blvd, Charlotte, NC 28223. Contact information: 704-687-4545 (W), 704-687-3493 (Fax),

Support for this research was funded in part by Grant No. H324C010040 of the U.S. Department of Education, Office of Special Education Programs, awarded to the University of North Carolina at Charlotte. The opinions expressed do not necessarily reflect the position or policy of the Department of Education, and no official endorsement should be inferred.

Abstract

Recent guidelines for No Child Left Behind (NCLB) permit the use of alternate achievement standards for counting up to 1% of students with significant cognitive disabilities as proficient in adequate yearly progress (AYP) calculations. This provision increases the importance of assessing the technical quality of alternate assessments and their alignment to academic content standards. This study used a quantitative method of alignment to estimate the match between three states’ alternate assessments and their state academic content standards. The results indicate that alternate assessments are aligned to academic content standards, but that they capture only a narrow range and depth of those standards. Recommendations for using alignment procedures to address critical elements in Title I as amended by the NCLB Act are discussed.

An Analysis of Three States’ Alignment Between Language Arts and Mathematics Standards and Alternate Assessments

In the past, students with moderate or severe disabilities were often exempted from the large-scale assessments that were a key component of school reform. Since the passage of the 1997 amendments of the Individuals with Disabilities Education Act (IDEA), inclusion of all students with disabilities in accountability systems has been mandatory.

IDEA 1997 also required that all students have access to the general curriculum. At the same time, the Title I Guidance on Standards, Assessments, and Accountability (1997), which supplements the Elementary and Secondary Education Act (as amended by the Improving America’s Schools Act of 1994, P.L. 103-382), emphasized that assessments for students with disabilities, referred to as alternate assessments, were related to the same standards used for all students: “It is important that standards for students with disabilities be included in these assessments because they are expected to meet the same standards as other students” (II. Assessments, Questions #42, p. 10 of 16, as cited in Thompson, Quenemoen, Thurlow, & Ysseldyke, 2001, p. 21). More recently, No Child Left Behind (NCLB, 2001) legislation created reporting requirements in math, language arts, and science for all students, including those with significant disabilities. Alternate assessments based on alternate achievement standards must be aligned with a state’s academic content standards, promote access to the general curriculum, and reflect professional judgment of the highest achievement standards possible (200.1(d)) (U.S. Department of Education, 2003).

Methods for determining the degree of alignment between assessments and states’ content standards have become a priority since the passage of the No Child Left Behind Act of 2001. Many methods of alignment are available, ranging from low to high complexity. An example of a low-complexity method is asking content experts to examine assessment items and match them to content standards using a Likert scale. Moderate-complexity alignment methods not only examine the alignment of each item to a standard, but also examine the item along another dimension, such as level of cognitive demand. More complex methods examine alignment plus many other criteria, such as breadth of knowledge, balance of representation, and congruence between the emphasis in a standard and the number of items used on the assessment. Bhola, Impara, and Buckendahl (2003) provide a detailed review of these methods.

Previous research on the alignment of alternate assessments to academic content standards used low-complexity methods to assess the degree of alignment (Browder, Flowers, et al., 2004; Browder, Spooner, Ahlgrim-Delzell, et al., 2003). In Browder, Flowers, et al., 31 states’ performance indicators (i.e., the most detailed statements of states’ expectations for students with severe cognitive disabilities) were aligned to national reading and math content standards. The findings indicated that some states’ expectations had a strong degree of alignment to reading and math, some had weak alignment, and others had a combination of both weak and strong degrees of alignment. In another study, states’ performance indicators were classified as functional, academic, social, or early childhood (Browder, Spooner, Ahlgrim-Delzell, et al., 2003). The findings suggested that states with exemplary alternate assessments had significantly more academic content reflected in their performance indicators.

These earlier alignment studies had two limitations. First, the research focused on the alignment between states’ alternate assessment performance indicators and national academic standards, not between alternate assessment items and state academic content standards. Second, the experts were instructed to make a holistic judgment of alignment rather than evaluating specific numeric criteria. Given the importance of improving the technical quality of alternate assessments for their inclusion in school and district AYP calculations under NCLB, research is needed that provides a more comprehensive view of the alignment between states’ alternate assessments and state academic content standards.

The purpose of this study was to apply and evaluate a high-complexity alignment method to measure the degree of alignment of alternate assessments to states’ academic content standards. Three states that used performance-based and portfolio assessment formats were selected to participate in this study. The results of this study inform educators of how well alternate assessments capture language arts and mathematics constructs.

Method

An approach recommended by Webb (1997) was used to examine the alignment of alternate assessments to state general education expectations. The procedure combines qualitative expert judgments and quantified coding for evaluating the alignment of standards and assessments. The product of the analysis is a set of statistics that describes the degree of intersection, or alignment, between the content embedded in state content standards and the content in state assessments. The procedure was used to examine alternate assessments from three states in two academic areas.

Selection of State Alternate Assessment Systems

The first step in this research was to identify three states that had exemplary alternate assessments with a clear focus on academics. Exemplary alternate assessments were selected to ensure that reviewers could successfully align at least some of the general education standards to the assessments, and to provide practitioners with a reference for evaluating the degree of alignment of other alternate assessments. To identify these states, four researchers in the area of alternate assessment were sent a questionnaire that asked them to (a) list five states with the best alternate assessments and (b) list five states with the strongest link between the alternate assessment and the general education curriculum. Three states were consistently identified across the researchers. A state department of education representative was contacted for each state, and all three agreed to participate in this study. The representatives agreed to select one representative alternate assessment for both language arts and mathematics.

Description of the Three States’ Alternate Assessments

In State A, a performance-based assessment developed by the state was used. The reading and writing performance assessments included attending to a story and answering questions about the content, gaining information from a variety of other sources, and creating a written product with a specific purpose. The math performance assessment included a demonstration of number sense and computational skills through use of physical models, application of simple calculation strategies to basic addition problems, recognizing and creating patterns using simple geometric shapes, and demonstration of an understanding of data collection, data display and estimation. While the tasks were the same for all students, the presentation of the tasks and response styles of the students were individualized. In this study, 4th grade language arts and 5th grade mathematics assessments were studied.

State B’s alternate assessment was a portfolio of evidence, collected over the course of an academic year, of an individual student’s performance and achievement on the state’s established Academic Expectations. Each portfolio included five entries from the six possible content areas (e.g., math, language arts, and science), depending upon the grade level of the student. Other components to be submitted in the portfolio included letters from teachers and parents, the student’s daily schedule, and the student’s mode of communication. Teachers selected the skills to include in the student’s alternate assessment based on the student’s IEP objectives and goals, and then aligned the skills to the state’s Academic Expectations using the Program of Studies, which specifies grade-level performance expectations. In this study, 4th grade language arts and mathematics assessments were studied.

State C's alternate assessment was a portfolio collection of evidence, gathered over the course of an academic year, of an individual student's performance and achievement on the learning standards outlined in the state's Curriculum Frameworks. Teachers selected the learning standards to address in an individual's portfolio. At least three distinct work products were required to document one or more learning standards in each strand. These work products could include work samples, instructional data, videos, audiotapes, or photographs. Secondary evidence, such as self-reflections and letters of support from others, could be used to supplement primary evidence. In this study, 7th grade language arts and 6th grade mathematics assessments were studied.

Alignment Procedure

The focus of the alignment procedure was on academic content only and not on the other dimensions (e.g., generalization, self-determination) that are assessed by some of the alternate assessments. Our intent was to examine the degree to which alternate assessments access the general curriculum; therefore, each state’s general curriculum standards were used as the expectation for students with severe disabilities.

Because the nomenclature used to describe standards and assessments differs across states, we used common language to describe the levels of specificity within the standards. The following levels, from the most general statement to the most detailed description of the standards, were used in this study: (a) subject area (e.g., Mathematics); (b) content standards (e.g., students develop number sense and use numbers and number relationships in problem-solving situations and communicate the reasoning used in solving these problems); (c) objectives (e.g., using numbers to count, to measure, to label, and to indicate location); and (d) performance indicators (e.g., describe numbers by their characteristics, such as even, odd, prime, or square). In this study, we used the term assessment item to represent the performance response, which could be a behavioral event or a student work sample.

Five steps were used in the alignment procedure. The first step was to identify the criteria. The content focus of the three states’ alternate assessments was examined using four criteria recommended by Webb (1997): (a) categorical concurrence, (b) range-of-knowledge correspondence, (c) balance of representation, and (d) depth-of-knowledge consistency.

Categorical concurrence is the consistency of categories for content in the standards and assessments. In this study, we use the term hit to indicate a content standard that has been aligned to an assessment item. While categorical concurrence is the most obvious criterion, additional criteria are needed to determine whether the academic construct is being fully assessed. For example, all the assessment items could be aligned to only a few of the many academic content standards. Examining the range of standards an assessment covers and the balance of assessment items across the standards provides critical information about how well the assessment captures the standards.

The range-of-knowledge correspondence criterion examines the alignment of assessment items to the multiple objectives within the content standards. The range-of-knowledge numeric value is the percentage of content standards with at least 50% of their objectives having one or more hits.
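As a hypothetical illustration of this computation (the numbers here are invented and are not drawn from the participating states), if a subject area contained four content standards and the hits covered at least half of the objectives in three of them, the range-of-knowledge value would be 3/4, or 75%.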

The balance of representation criterion is used to indicate the extent to which items are evenly distributed across content standards. The formula used to compute the balance of representation index is

1 - \left( \sum_{k=1}^{O} \left| \frac{1}{O} - \frac{I_k}{H} \right| \right) / 2,

where O is the total number of objectives hit (i.e., objectives to which at least one item has been judged to be aligned) for the standard, I_k is the number of items hit corresponding to objective k, and H is the total number of items hit for the content standard. The balance index can range from 0 (indicating unbalanced representation) to 1.0 (indicating balanced representation), with values between .6 and .7 considered a weak but acceptable balance and values of .7 or greater considered acceptable.
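To make the computation concrete, the following short sketch (written in Python, with invented hit counts that are not from the study data) computes the balance index defined above for a single content standard.

# Minimal sketch of the balance of representation index described above.
# items_per_objective holds, for each objective of one content standard that
# received at least one hit, the number of assessment items hit for it.
def balance_of_representation(items_per_objective):
    O = len(items_per_objective)   # number of objectives hit for the standard
    H = sum(items_per_objective)   # total number of items hit for the standard
    return 1 - sum(abs(1 / O - i_k / H) for i_k in items_per_objective) / 2

# Hypothetical example: three objectives hit, with 4, 1, and 1 items each.
print(balance_of_representation([4, 1, 1]))  # about .67, a weak but acceptable balance
print(balance_of_representation([2, 2, 2]))  # 1.0, a perfectly even distribution

As the two examples suggest, the index approaches 1.0 when the hit items are spread evenly across the objectives of a standard and falls toward 0 as the hits concentrate on a few objectives.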

The depth-of-knowledge (DOK) consistency criterion examines the consistency between the cognitive demands of the standards and the cognitive demands of the alternate assessments. Completely aligned standards and assessments require an assessment system designed to measure, in some way, the full range of cognitive complexity within each specified content standard. An acceptable level for DOK, usually rated using a 4-point scale (see Tables 1 and 2), is directly related to what is considered passing work on the assessment scale for that standard. For a more detailed description of the alignment criteria, see Webb (1999).

In the next step, each participating state’s standards and assessments were collected. To ensure that the research team accurately interpreted the documents, interviews were conducted with the state department of education representative responsible for alternate assessment in each state to clarify that state’s alternate assessment. The outcomes of these interviews resulted in minor rewording of some of the states’ standards for clarity.

In step three, the research team developed a description of the DOK levels for both language arts and mathematics and two coding matrices for collecting reviewers’ data, one describing the state standards and the other listing the assessment items. Descriptions of the four levels of DOK for language arts and mathematics are reported in Tables 1 and 2. The first coding matrix listed the state’s subject area, content standards, and content objectives. Reviewers used this coding matrix to evaluate DOK for each state’s general education content objectives. The state standards coding matrix served two purposes: first, the coding of DOK for each objective provided evidence of the DOK expectations for each state, and second, it familiarized the reviewers with each state’s standards. The second coding matrix listed each item/activity on the assessment with columns for reviewers to code (a) DOK, (b) the primary content standard hit, and (c) the secondary content standard hit. A hit was defined as a reviewer identifying a link between an assessment item and a standard.

In step four, reviewers were recruited and trained to align the assessments to the standards. To provide diverse perspectives on the alternate assessments, reviewers from general and special education backgrounds were asked to evaluate the alignment of the state standards and alternate assessments. Three different groups served as reviewers: (a) six content experts with experience in test development and standards writing (three in language arts and three in mathematics), (b) three state directors of alternate assessment, and (c) four researchers, including two national experts in severe disabilities.

All reviewers attended a one-day training session. Because some of the reviewers were not familiar with alternate assessments, the first part of the training involved describing the three states’ alternate assessments. Copies of the alternate assessments were available for each reviewer to examine. Next, the reviewers read the definitions of the depth-of-knowledge levels (see Tables 1 and 2) and were asked to think about typical behaviors for the grade level. Language arts and math experts were present to answer specific questions and provide detailed examples for reviewers. Then the reviewers were provided five performance indicators from a state standard and, as a group, discussed the DOK for each performance indicator. The ratings were discussed and consensus was reached before continuing to the next performance indicator. Next, the reviewers independently coded DOK for a sample of five alternate assessment items. Reviewers then compared ratings and discussed agreements and disagreements. Any disagreements were discussed until consensus was reached, with the facilitation of the general education content expert in the respective subject area. It should be noted that exact agreement among reviewers was not necessary because DOK results are averaged across reviewers.

After practicing coding the DOK of the assessment items, reviewers were instructed to link 10 assessment items to the state standards. Five of the items were linked as a group, with discussion led by an expert in language arts or mathematics, and five items were rated independently. Reviewers’ links were discussed and consensus was reached before concluding the activity.