
Running Head: Consequences of Alternate Assessments

The Intended and Unintended Consequences of the Use of Alternate Assessments

Claudia Flowers, Lynn Ahlgrim-Delzell, and Diane Browder

University of North Carolina at Charlotte

Meagan Karvonen

Western Carolina University

Abstract

The purpose of this study was to construct a measurement model to examine potential consequences of the use of alternate assessments (AAs). A 28-item survey was designed to measure five intended consequences of the use of AAs (access to the general curriculum, improved instruction, increased student expectations, increased educational resources and training, and support from administrators/principals) and one unintended consequence (increased workload and stress). A national sample of 708 special education administrators, who were members of the Council of Administrators of Special Education, completed the web-based survey. An exploratory factor analysis extracted six factors that closely paralleled the intended and unintended consequences. A confirmatory factor analysis supported the six-factor solution.

The Intended and Unintended Consequences of the Use of Alternate Assessments

The 1997 amendments to the Individuals with Disabilities Education Act (IDEA) required states to provide access to the general curriculum and alternate assessments for students with disabilities unable to participate in statewide assessments. It was thought that the inclusion of students with disabilities in high-stakes testing programs would result in positive consequences for students with disabilities (Ysseldyke, Dennison, & Nelson, 2004). Hypothesized and empirically supported benefits included greater access to the general curriculum, improved instructional methods, additional resources, and higher expectations for student learning (Browder, Fallin, Davis, & Karvonen, 2003; Browder, Spooner, Algozzine, Ahlgrim-Delzell, Flowers, & Karvonen, 2003; Quenemoen, Lehr, Thurlow, & Massanari, 2001; Thurlow & Johnson, 2000; Wehmeyer, Lattin, & Agran, 2001). While the intent of IDEA 1997 was to benefit students with disabilities, some aspects of testing can have adverse, unintended educational consequences (Messick, 1992).

The Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) states, “It is the responsibility of those who mandate the use of tests to monitor their impact and to identify and minimize potential negative consequences. Consequences resulting from the uses of the test, both intended and unintended, should also be examined by the test user” (p. 145). Gathering evidence of the intended effects is the first, necessary step (Lane, Parke, & Stone, 1998). Accountability systems are intended to impact the following: (a) the implemented curriculum; (b) instructional content and strategies; (c) the content and format of classroom assessments; (d) student, teacher, and administrator motivation and effort; (e) the learning experienced by all students; (f) the nature of professional development support; (g) teacher participation in the administration, development, and scoring of assessments; (h) student, teacher, administrator, and public awareness and beliefs about the assessment, the criteria for judging performance, and the use of the assessment results; and (i) the use and nature of test preparation materials (see Cizek, 2001; Frederiksen & Collins, 1989; Koretz, Barron, Mitchell, & Stecher, 1996; Koretz, Stecher, Klein, & McCaffrey, 1994; Linn, 1993; Linn, Baker, & Dunbar, 2001; Messick, 1992). The unintended consequences of the use of assessment results in accountability systems may include: (a) the narrowing of curriculum and instruction to focus only on the specific learning outcomes assessed, ignoring the broader construct those outcomes reflect (e.g., Chabran, 2003; Cizek, 2001; Corbett & Wilson, 1991; Mehrens, 1998; Smith & Rottenberg, 1991); (b) inappropriate or unfair uses of test scores, such as questionable practices in the reassignment of principals and teachers; (c) undesirable effects on teacher morale (e.g., Corbett & Wilson, 1991; Lattimore, 2001; Smith & Rottenberg, 1991); and (d) diminished student self-esteem and increased stress levels (Smith & Rottenberg, 1991).

Given the increased emphasis on including students with significant disabilities in school accountability systems, it is important to understand the impact this requirement is having on students, teachers, school administrators, and educational programs. The reaction of participants to the assessment ultimately affects interpretation of the assessment scores (Messick, 1989). A few studies have asked professionals to identify the impact of alternate assessments. Kleinert and his colleagues (Kleinert, Kearns, & Kennedy, 1997; Kleinert, Kennedy, & Kearns, 1999) found that teachers reported benefits for students who had participated in Kentucky’s accountability system, including increased student choice making and increased use of student schedules. The teachers also reported that the alternate assessment portfolios used in the accountability system were challenging, both in documenting evidence and in the amount of time required. In a survey across several states, Flowers, Ahlgrim-Delzell, Browder, and Spooner (2005) also found that special education teachers reported both benefits and challenges of implementing alternate assessments. The teachers believed that inclusion in a state accountability system raised expectations for students with significant disabilities, but also thought the alternate assessment competed with teaching time and meeting individual student needs and increased the paperwork burden. In a survey of teachers in Massachusetts, Zatta (2003, as cited in Zatta & Pullin, 2004) found that teachers’ willingness to conduct, and understanding of, the alternate assessment and scoring process; commitment from school leadership; and the availability of resources such as consultants, time, and adequate materials and equipment affected how teachers administered alternate assessments. Positive benefits cited in that study included an increased focus on the general curriculum and the use of professional development activities to assist teachers in aligning instruction with general curriculum frameworks.

Teachers report that the commitment of school leadership and the use of alternate assessment in school accountability scores influence how they perceive alternate assessment. Kearns, Kleinert, Clayton, Burdge, and Williams (1998) credited much of students’ successful performance on the Kentucky alternate assessment to school principals who create a climate of educational inclusion. Given the limited information currently available about school administrators’ views on, and influence over, alternate assessment, there is a need to examine, from special education administrators’ perspectives, the impact alternate assessment is having on the educational programs of students with significant disabilities. Because local and state administrators view alternate assessment from a more systemic vantage point outside the classroom, a study of their perceptions deepens understanding of its perceived consequences. In this study, a survey of potential intended and unintended consequences of the use of alternate assessments was administered to special education administrators, and exploratory and confirmatory factor analyses were conducted to examine the underlying constructs of the consequences of the use of alternate assessment.

Method

Participants

Participants were recruited from a 2003 list of 5,031 members of the Council of Administrators of Special Education (CASE), a national organization of special education administrators. Because the survey focused on state, district, and school-level special education administrators, 323 members who identified themselves as holding other positions, such as university professors or auxiliary personnel (e.g., speech and language pathologists, occupational therapists), were deleted from the database. Another 142 mailings were returned due to problems with the address, leaving a total of 4,566 CASE members who were asked to complete the survey.

Instrumentation

The survey items were developed based on previous alternate assessment research (Browder et al., 2003; Flowers et al., 2005; Jones, Jones, & Hargrove, 2003; Kleinert et al., 1999). Six domains guided item development: (a) access to the general curriculum [Access]; (b) improved instruction [Instruct]; (c) increased student expectations [Expect]; (d) increased educational resources and training [Resources]; (e) support from principal [Principal]; and (f) increased workload and stress [Workload]. A single item concerning individualized education programs (IEPs) and communication with parents was also included in the survey. During development, the instrument was examined by five administrators at the 13th Annual CASE Conference in Pittsburgh, PA, in November 2002. The survey was then presented to a Project Advisory Committee of 30 stakeholders (i.e., special education teachers, advocates, individuals with disabilities, university professors, and school- and state-level special education administrators) in January 2003. During these pilot administrations, feedback was obtained on survey length, applicability of the scales, clarity of questions, and suggestions for additional items. Finally, the instrument was pilot tested with 50 school-level administrators in North Carolina in February 2003, and the responses were analyzed (Wakeman, Ahlgrim-Delzell, & Browder, 2004). Minor changes to survey items were made to create the final instrument.

The first section of the survey included 28 items related to the domains of interest (items are included in Appendix A). Items were rated on a four-point rating scale of strongly agree (1), agree (2), disagree (3), and strongly disagree (4). The second section contained demographic questions designed to describe relevant characteristics of the respondents.

Procedures

A total of 4,566 potential respondents were mailed a postcard announcing the internet-based survey and its URL. Administrators could request a paper copy of the survey and a self-addressed stamped return envelope, but no one made such a request. Once connected to the web-based survey, respondents were introduced to the survey with instructions and informed consent information. As an incentive to complete the survey, each respondent could enter a prize drawing for one $100 gift certificate to Amazon.com. A reminder to complete the survey was sent to all potential respondents one month after the initial mailing.

Descriptive statistics were used to describe demographic characteristics of respondents. An exploratory factor analysis (EFA) and a confirmatory factor analysis (CFA) were used to examine the underlying structure of the survey items.

Results

Description of Participants

A total of 708 administrators from 49 states and the District of Columbia responded to the survey. Twenty-two percent (n = 155) of the respondents were school-level administrators, 65% (n = 459) were district level administrators, 4% (n = 30) were state-level administrators, 8% (n = 56) did not answer this question, and 1% (n = 8) were university professors.

The school-level administrators represented schools of various sizes, with a mean of 676 students (Mdn = 360), including an average of 141 students with disabilities (Mdn = 112). Most of the school-level respondents were Coordinators/Supervisors/Specialists (37%), Administrators/Directors (23%), or Principals or Assistant Principals (18%). Each school submitted an average of 15 alternate assessments during the 2002-2003 academic year (Mdn = 7). Forty-one percent were employed at the elementary school level, 18% at the middle school level, and 33% at the high school level (8% did not answer this question). Forty-seven percent reported that their school had a designated disability subgroup for No Child Left Behind (NCLB) reporting. They conducted an average of 9.5 hours of training related to alternate assessments in the past year (Mdn = 5 hours, range 0 to 200 hours).

District-level administrators represented systems of various sizes with a mean of 12,225 students (Mdn = 4,200) and 1,871 students with disabilities (Mdn = 600). Most of the district-level respondents were Administrators/Directors (62%) or Coordinators/Supervisors/Specialists (27%). Each district submitted an average of 82 alternate assessments during the 2002-2003 school year (Mdn = 12). They conducted an average of 12.5 hours of training on alternate assessments in the past year (Mdn = 6 hours, range 0 to 250 hours).

Most of the state-level respondents were Administrators/Directors (40%) or Coordinators/Supervisors/Specialists (33%). They conducted an average of 14 hours of training on alternate assessments in the past year (Mdn = 8 hours, range 0 to 750 hours).

Consequences of Use of Alternate Assessments

The means, standard deviations, rank order, and domain for the 28 survey items are reported in Table 1. The respondents were randomly divided into two samples, one used in the EFA (n = 352) and the other used in the CFA (n = 356). Results of the EFA were used to fix and free paths (lambdas) between the items and the factors in the CFA.
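
All analyses in this study were conducted with SPSS and LISREL; the sketch below simply illustrates the random split in Python (a hypothetical illustration, with pandas assumed and the file name and seed invented for the example):

import pandas as pd

# Hypothetical data layout: one row per respondent, one column per item
responses = pd.read_csv("aa_survey_items.csv")  # assumed file name

# Randomly divide the 708 respondents into an EFA half (n = 352)
# and a CFA half (n = 356)
efa_sample = responses.sample(n=352, random_state=2003)  # arbitrary seed
cfa_sample = responses.drop(efa_sample.index)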

Exploratory Factor Analysis

Principal factors extraction with both orthogonal (varimax) and nonorthogonal (direct oblimin) rotations was performed using SPSS on the 28 items for the first sample of respondents. Both rotation methods yielded similar results, so only the varimax solution is reported. Principal components extraction was used prior to principal factors extraction to estimate the number of factors. The assumptions were evaluated using SPSS. There were no univariate or multivariate outliers; however, there was evidence that univariate and multivariate normality were violated, with the direction of skewness changing for different items. Because skewness attenuates the correlations among the affected items, the EFA results may be somewhat weakened.
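
For readers who prefer an open-source route, the same sequence (principal components to estimate the number of factors, then principal factors extraction with varimax rotation) can be sketched in Python with the third-party factor_analyzer package. This is an assumed substitute for the SPSS routines, not the software used in the study:

import numpy as np
from factor_analyzer import FactorAnalyzer

# Step 1: eigenvalues of the correlation matrix, used to estimate the
# number of factors via the eigenvalues-greater-than-1.0 rule
fa = FactorAnalyzer(rotation=None)
fa.fit(efa_sample)
eigenvalues, _ = fa.get_eigenvalues()
n_factors = int((eigenvalues > 1.0).sum())  # six in this study

# Step 2: principal factors extraction with varimax rotation
fa = FactorAnalyzer(n_factors=n_factors, rotation="varimax", method="principal")
fa.fit(efa_sample)

# Suppress loadings under .40, as in Table 2
loadings = np.where(np.abs(fa.loadings_) >= 0.40, fa.loadings_, np.nan)
communalities = fa.get_communalities()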

Six factors were extracted with eigenvalues greater than 1.0. Communality values, loadings of items on factors, and the percent of variance accounted for by the six factors are reported in Table 2. Items are ordered and grouped by size of loading. Loadings under .40 are not reported in Table 2. All the items loaded on at least one factor except item 24 (central office staff assist teachers with AA). Four items (items 5, 11, 15, and 25) loaded on two factors; these items were interpreted as associated with the factor on which they loaded most highly. The six factors extracted were labeled (a) Instruction, (b) Principal, (c) Training, (d) Workload, (e) Access, and (f) Support. Two items (items 11 and 14) loaded on factors to which they were not theoretically related and were excluded from further analyses. The remaining items closely paralleled the factors used to design the survey.

Confirmatory Factor Analysis

A confirmatory factor analysis, based on the results of the EFA, was performed with LISREL. A six-factor model was hypothesized, with the items identified in the EFA as indicators of the factors. The assumptions were evaluated using SPSS and PRELIS. The dataset contained 356 respondents. As with the respondents in the EFA, there were no univariate or multivariate outliers, but there was evidence that univariate and multivariate normality were violated. The model was estimated with maximum likelihood estimation. A Spearman rank correlation matrix was the data source for the CFA.
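
For illustration, a comparable six-factor CFA can be specified outside LISREL with the Python semopy package. The factor names follow the EFA; the item names below (x1, x2, ...) are placeholders rather than the study's actual item-to-factor assignments, which are given in Table 2:

import semopy

# Hypothetical measurement model in lavaan-style syntax: each factor is
# indicated by the items the EFA assigned to it (placeholder item names)
model_desc = """
Instruction =~ x1 + x2 + x3
Principal   =~ x4 + x5 + x6
Training    =~ x7 + x8
Workload    =~ x9 + x10
Access      =~ x11 + x12
Support     =~ x13 + x14
"""

model = semopy.Model(model_desc)
model.fit(cfa_sample)              # maximum likelihood estimation by default
print(semopy.calc_stats(model))    # chi-square, df, RMSEA, CFI, and more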

There was marginal support for the hypothesized model developed from the EFA, χ²(df = 237, N = 356) = 401.06, p < .000001, root mean square error of approximation (RMSEA) = .05, comparative fit index (CFI) = .93. The χ² suggested that the hypothesized model did not fit the observed data, but the RMSEA and CFI suggested an adequate fit. Post hoc model modifications were performed in an attempt to develop a better fitting model. On the basis of the Lagrange multiplier test and theoretical relevance, error variances between the observable variables were allowed to correlate. Item 15 was excluded from the follow-up analysis due to a large error variance (.9952). In addition, the path between item 18 (principals support teachers completing AA) and the Support factor was freed. The final best fitting model supported the hypothesized model, χ²(df = 233, N = 356) = 320.56, p = .00012, RMSEA = .037, CFI = .96. The coefficients (i.e., estimated lambdas) and the variability accounted for (R²) in the final model are reported in Table 3. All coefficients were statistically significant.
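
The apparent disagreement between the significant χ² and the acceptable RMSEA is expected in a sample of this size: χ² grows with N, so even trivial misfit becomes statistically significant, whereas RMSEA expresses misfit per degree of freedom adjusted for sample size. Under one common formulation (given here as background, not necessarily the exact computation LISREL performs),

\[ \mathrm{RMSEA} = \sqrt{\frac{\max(\chi^{2} - df,\ 0)}{df\,(N - 1)}} \]

so a χ² only moderately above its degrees of freedom yields a small RMSEA even when the exact-fit test rejects.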

The correlation coefficients between the six factors are reported in Table 4. The workload factor was inversely related to the other five factors, with correlation coefficients ranging from -.47 (Instruction) to -.22 (Training). All other correlation coefficients were positive, ranging from .82 (between Instruction and Access) to .12 (between Access and Support).

Summary

Regardless of the accuracy and consistency of identifying proficient students and well-performing schools, results of large-scale assessments are meaningless if the results are not used to improve the work of schools (Marion & Gong, 2003). Understanding the consequences of alternate assessments will help inform decision makers of changes that need to be made in the educational system to positively impact student learning and instruction. Obtaining empirical evidence of these consequences is an essential part of good educational practice.

In this study, a survey was designed based on six factors hypothesized as intended and unintended consequences of the use of alternate assessments. The results of the EFA and CFA partially supported the hypothesized six-factor model. The final model suggested five factors of intended consequences: (a) instruction, (b) administrator/principal support, (c) professional training, (d) access to the general curriculum, and (e) support and resources. There was one unintended consequence factor, workload demands and stress level.

The focus of this study was to examine the measurement model of the consequences of AA use. Because elements of educational systems influence each other, some of the factors in this study may be affecting other factors. This possibility implies that structural models should be examined to fully understand the consequences of AA use. One example of a structural model to examine is the effect of professional training and resources on instruction and access. Another model to test is the mediating effect of workload demands on the relationship between professional training and resources and instruction. Reporting a single factor in isolation from other factors could mislead decision makers about the consequences of the use of AAs.

Limitations

There are several limitations to this study. First, there may be many other consequences associated with the use of AAs; this study considered only the intended and unintended consequences found in the literature. Including AA results in accountability systems could have many unintended consequences that are not yet fully understood at this early stage of assessing students with significant disabilities in academic areas. Second, a survey was used to assess the perceptions of special education administrators, and how accurately these perceptions reflect the actual consequences of AA use is not known. However, because these administrators are in positions to influence the design, administration, and use of alternate assessments, their perceptions should not be discounted.

The intended and unintended consequences of the use of AA results in accountability systems overlap considerably with the consequences found for assessments in general education. Improved teacher instruction and student learning and increased educational support and training are all desirable outcomes of accountability systems. Teachers’ and administrators’ increased workload and stress associated with assessment use in accountability systems is a common unintended consequence. A unique intended consequence of the use of AAs is increased access to the general curriculum for students with the most significant cognitive disabilities. Providing access to the curriculum, which requires inclusive assessment systems, is an important step toward ensuring that every student benefits from standards-based reform efforts.