Representative Samples and
PARCC to MCAS Concordance Studies
This report describes the methods and outcomes for a) selecting representative samples of test-takers for MCAS and PARCC in 2015, and b) identifying estimated MCAS results for PARCC test-takers.
February 2016
Massachusetts Department of Elementary and Secondary Education
75 Pleasant Street, Malden, MA 02148-4906
Phone: 781-338-3000    TTY: N.E.T. Relay 800-439-2370


This document was prepared by the
Massachusetts Department of Elementary and Secondary Education
Mitchell D. Chester, Ed.D.
Commissioner
The Massachusetts Department of Elementary and Secondary Education, an affirmative action employer,
is committed to ensuring that all of its programs and facilities are accessible to all members of the public.
We do not discriminate on the basis of age, color, disability, national origin, race, religion, sex, gender identity,
or sexual orientation.
Inquiries regarding the Department’s compliance with Title IX and other civil rights laws may be directed to the
Human Resources Director, 75 Pleasant St., Malden, MA 02148-4906. Phone: 781-338-6105.
© 2016 Massachusetts Department of Elementary and Secondary Education
Permission is hereby granted to copy any or all parts of this document for non-commercial educational purposes.
Please credit the “Massachusetts Department of Elementary and Secondary Education.”
This document was printed on recycled paper.


Table of Contents

Introduction

Background and Purpose

Part 1: Selecting Representative Samples

The Need for Representative Samples

Method to Identify Representative Samples

Results from the Representative Sample Study

Part 2: Concordance Tables and Guidance for Use of Data

Concordance Tables Comparing MCAS to PARCC Results

Introduction

Methods for Generating

Composite Performance Index (CPI) Results for PARCC Schools and Districts

Guidance for Using Representative Samples and Concordance Tables

Concordance Tables

Conducting Analyses at the State Level with Representative Samples

Conducting Analyses that are Not State-Level

References

Appendix A: Proof-of-Concept Study

Counts

Balance

Replication of 2013–14 Psychometric Results

Replication of 2013–14 Student Growth Percentiles

Replication of 2013–14 Accountability Results

Summary of Results from the Proof-of-Concept Study

Appendix B: Method Used to Select Representative Samples

Appendix C: Logistic Regression Variables and Results

Introduction

During the 2014–15 school year, school districts in Massachusetts were offered a choice regarding their grades 3–8 summative testing programs: whether to participate in MCAS or PARCC. In order to generate stable trends for the 2014–15 school year, the State embarked on two analytical studies. The first addressed non-equivalence in MCAS and PARCC samples of test-takers through the selection of representative samples from each group. The second estimated MCAS scores for PARCC test-takers to generate Composite Performance Index values (CPIs, which are measures of proficiency for schools and districts).

Although each test was taken by roughly half of the grades 3–8 examinees, demographic differences between the two groups of examinees remained. If left unaddressed, these demographic differences would distort state trends and other analyses. To reduce unintended differences between the two groups of examinees, the Department, with assistance from national testing experts (members of the MCAS Technical Assistance Committee), developed a method to select representative samples from the total samples of examinees taking MCAS and PARCC in 2015. This first analysis produced representative samples of examinees taking MCAS and PARCC that were significantly more similar to each other than the total samples were.

The second analysis used the representative samples produced in the first analysis to match MCAS scores for examinees, by grade and subject/test, to PARCC scores, using an equipercentile linking approach (which links scores across the distributions of the two tests). The resulting data were used to generate CPIs for students, schools, and districts.
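
The equipercentile approach can be illustrated with a short, hypothetical sketch. The Python code below is not the Department's implementation; it simply assumes two arrays of scores drawn from the representative samples for a single grade and subject, and maps each PARCC score to the MCAS score occupying the same percentile rank.

import numpy as np

def equipercentile_link(parcc_scores, mcas_scores):
    """Map each unique PARCC score to the MCAS score at the same percentile rank.
    Illustrative only: assumes 1-D arrays of scores for one grade and subject."""
    parcc_scores = np.asarray(parcc_scores, dtype=float)
    mcas_scores = np.asarray(mcas_scores, dtype=float)
    concordance = {}
    for score in np.unique(parcc_scores):
        # Percentile rank of this PARCC score (midpoint convention for tied scores).
        below = np.mean(parcc_scores < score)
        at = np.mean(parcc_scores == score)
        pct_rank = below + at / 2.0
        # MCAS score sitting at the same percentile rank of the MCAS distribution.
        concordance[float(score)] = float(np.quantile(mcas_scores, pct_rank))
    return concordance

# Hypothetical usage with simulated (not real) score data:
rng = np.random.default_rng(0)
parcc_sim = rng.normal(740, 25, size=5000).round()
mcas_sim = rng.normal(240, 15, size=5000).round()
table = equipercentile_link(parcc_sim, mcas_sim)

In practice, a linking of this kind would be carried out separately by grade and subject/test, and the resulting values would be smoothed and rounded to reportable MCAS scores; the sketch omits those steps.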

This report details the methods used to identify representative samples for MCAS and PARCC test-takers and the methods used to estimate MCAS scores for PARCC examinees, and presents outcomes from both analyses to show how well each study worked. Guidance for using the representative samples is also provided.

Background and Purpose

Massachusetts has administered the Massachusetts Comprehensive Assessment System (MCAS) tests in English language arts and mathematics every year since 1998. In 2010 the state joined the PARCC consortium to develop new tests aimed at measuring college and career readiness. In 2013–14 Massachusetts participated in PARCC field testing, and in 2014–15 it continued its trial of the PARCC test for a second year while also continuing to administer MCAS.

For the spring 2015 test administration, Massachusetts public school districts serving grades 3 to 8 were offered the option to administer either the MCAS or PARCC tests in English language arts and mathematics.[1] Because districts were not assigned randomly to PARCC or MCAS, the students who took MCAS were likely to differ systematically (e.g., in achievement or demographic characteristics) from those who took PARCC. When samples differ systematically, it is difficult to tell whether changes in state-level student achievement from one year to the next reflect actual changes in performance, differences in the samples, or both; simply combining results from the two assessments would not produce an accurate picture of statewide performance.

To address this issue, the State developed a methodology to identify samples of 2015 MCAS and PARCC test-takers that were representative of all students in the state. These students’ performance would be used to determine how MCAS and PARCC results compared and could be linked. The purposes for doing so were

  • to report state-level results for 2015, including results from both MCAS and PARCC test-takers;
  • to maintain trends for MCAS results relative to prior years;
  • to calculate student growth percentiles (SGPs) for MCAS and PARCC test-takers; and
  • to calculate accountability levels for all districts and schools; PARCC accountability levels are calculated using concordance tables that identify associated MCAS score estimates for a range of PARCC scores.

Part 1 of this report explains in further detail the need for representative samples, and describes the methodology the Department used to select them. Part 2 of the report explains the process for generating the concordance tables linking PARCC results to MCAS, and provides guidance about how to interpret and use assessment data from the 2015 school year.

Part 1: Selecting Representative Samples

The Need for Representative Samples

As expected, the students taking MCAS and PARCC were not equivalent; they differed in prior performance and on student demographic variables. In terms of numbers, although about 50% of districts participated in each test,[2] the number of PARCC test-takers was slightly higher. Table 1 compares the numbers of districts and grades 3–8 students that participated in PARCC and MCAS. The full list of district choices for the 2015 assessments is available on the State PARCC website, in the Excel file (“list by district”).

Table 1: District Assessment Choices for 2015

Assessment Choices for Spring 2015*
# of Districts / % of Districts / # of Students / % of Students
MCAS / 230 / 55% / 197,480 / 47%
PARCC / 192 / 45% / 225,572 / 53%
Total / 422 / 100% / 423,052 / 100%
*District counts do not include the three largest districts or any single-school district. Schools in the three largest districts (Boston, Springfield, and Worcester) were assigned either MCAS or PARCC. In single-school districts, 188 districts administered MCAS and 6 administered PARCC.

MCAS and PARCC 2015 test-takers scored similarly on MCAS in 2014, as shown in Table 2. In both English language arts and mathematics, the percentages scoring at each proficiency level are similar across the assessments, with the 2015 MCAS test-takers performing slightly higher at the Advanced level.

Table 2: 2014 MCAS Results for 2015 MCAS and PARCC Test-Takers

Group Achievement Levels and SGP Differences, Grades 3–8
2014 MCAS Results: All 2015 Test-Takers (MCAS & PARCC) / 2014 MCAS Results: 2015 MCAS-Takers / 2014 MCAS Results: 2015 PARCC-Takers
ELA Achievement Level: Advanced / 14.4% / 15.1% / 13.8%
ELA Achievement Level: Proficient / 52.6% / 52.8% / 52.4%
ELA Achievement Level: Needs Improvement / 25.2% / 24.3% / 26.0%
ELA Achievement Level: Warning / 7.8% / 7.9% / 7.8%
ELA Student Growth Percentile / 50.1 / 50.2 / 49.9
Total Number ELA / 410,811 / 187,465 / 223,346
Math Achievement Level: Advanced / 24.7% / 25.5% / 24.1%
Math Achievement Level: Proficient / 33.3% / 33.5% / 33.3%
Math Achievement Level: Needs Improvement / 27.5% / 26.7% / 28.1%
Math Achievement Level: Warning / 14.5% / 14.3% / 14.6%
Math Student Growth Percentile / 50.2 / 50.6 / 49.8
Total Number Math / 412,005 / 187,704 / 224,301

Table 3 compares MCAS and PARCC test-takers by demographic characteristics. The demographic differences between the two groups are somewhat larger than the achievement differences, driven in part by the decision of some large school districts to administer PARCC. Overall, students with higher needs are more heavily represented in the PARCC sample.

Table 3: 2014 Demographics for 2015 MCAS and PARCC Test-Takers

Group Demographic Differences, Across Grades
2014 Overall Population / 2015 MCAS-takers / 2015 PARCC-takers
Ever ELL / 14.5% / 15.1% / 17.0%
High Needs* / 49.1% / 47.5% / 54.0%
Free/Reduced Lunch** / 39.1% / 36.7% / 44.8%
Race: AA/Black / 8.3% / 5.1% / 11.0%
Race: Asian / 6.2% / 7.5% / 6.0%
Race: Hispanic / 16.4% / 17.3% / 18.8%
Race: White / 66.0% / 67.2% / 60.7%
Race: More than One / 3.0% / 2.9% / 3.1%
Race: Other / 0.3% / 0.3% / 0.4%
Race: AA/Black or Hispanic (combined) / 24.6% / 22.5% / 29.8%
No Special Needs Services / 81.8% / 81.1% / 81.3%
Minimal Hours Special Needs Services / 2.7% / 2.7% / 2.7%
Low Hours Special Needs Services / 3.6% / 3.4% / 3.5%
Moderate Hours Special Needs Services / 9.6% / 9.3% / 9.2%
High Hours Special Needs Services / 2.2% / 3.4% / 3.3%
Total N / 442,982 / 202,938 / 240,044
*High Needs Students belong to at least one of these groups: current/former English Language Learner (ELL), low income, student with disabilities.
**2014 values; imputed for students with missing values.

Although the demographic differences between MCAS and PARCC test-takers are not great, they are large enough to call into question whether the two groups can fairly be compared without making an adjustment for selection bias.

Method to Identify Representative Samples

The process used to identify representative samples involved matching each of the 2015 testing populations (MCAS test-takers and PARCC test-takers) to the characteristics of the overall 2014 MCAS population using student-level data. (The Department chose 2014 as the target population because 2014 was the last year for which the state had statewide results on a single assessment: MCAS.) By removing from each 2015 sample those test-takers who were most dissimilar to the 2014 test-takers, the Department was able to create two 2015 samples that are well-matched to the 2014 student population. By definition, the two 2015 samples are also roughly equivalent. This matching process is represented visually in the logic model in Figure 1.

Figure 1: Logic Model for the Sample-Matching Study

The methodology for selecting representative samples is a variation of propensity score matching, a statistical technique commonly used to estimate the impact of a treatment when participants are not randomly assigned to it (Angrist & Pischke, 2009; Austin, 2011; Murnane & Willett, 2011; Rosenbaum, 2010). The situation here is not precisely analogous, as the self-selection into MCAS or PARCC is determined by districts, not by student characteristics. But the principle applies nonetheless: we can identify a representative sample of students who are similar to one another in all measurable ways except for whether they took MCAS or PARCC. We can then use these representative groups to estimate statewide findings.

The propensity score matching conducted in this analysis used prior MCAS results and student demographic variables to match test-takers in each sample (MCAS and PARCC) in the current year to the population of test-takers in the prior year. (It should be noted that prior MCAS results were emphasized in the analysis, resulting in better balance on prior achievement than on demographic variables, although it will be shown that the method improved balance on both sets of variables.) The method worked by removing test-takers who were most unlike the prior year's population of test-takers, creating two representative samples composed of test-takers more like the prior year's population of students.
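
As a rough illustration only (not the Department's code), the Python sketch below shows one way such a selection could be carried out: a logistic regression distinguishes the prior-year population from a current-year group of test-takers, and the current-year examinees least similar to that population are removed. The DataFrames and column names (e.g., prior_mcas_ela, high_needs) are hypothetical.

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def select_representative_sample(current_df, target_df, covariates, drop_frac=0.25):
    """Trim a current-year sample so it better resembles a target population.
    Sketch only; inputs are student-level DataFrames sharing numeric covariate columns."""
    # Stack the two groups and flag membership in the target (prior-year) population.
    stacked = pd.concat(
        [current_df[covariates].assign(target=0),
         target_df[covariates].assign(target=1)],
        ignore_index=True)
    model = LogisticRegression(max_iter=1000)
    model.fit(stacked[covariates], stacked["target"])
    # Estimated probability that each current-year examinee resembles the target population.
    p_target = model.predict_proba(current_df[covariates])[:, 1]
    # Drop the least-similar fraction of examinees; keep the rest.
    cutoff = np.quantile(p_target, drop_frac)
    return current_df.loc[p_target >= cutoff]

# Hypothetical usage, assuming DataFrames mcas_2015, parcc_2015, and population_2014 exist:
# covars = ["prior_mcas_ela", "prior_mcas_math", "ell", "high_needs", "swd"]
# mcas_rep = select_representative_sample(mcas_2015, population_2014, covars)
# parcc_rep = select_representative_sample(parcc_2015, population_2014, covars)

The drop fraction of roughly one quarter in the sketch is chosen only to mirror the retention rates shown later in Table 4; the actual procedure, detailed in Appendix B, emphasized prior achievement over demographic variables.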

Results using this methodology were evaluated in a “proof-of-concept study” that applied the method to draw representative samples in 2014 that were equivalent to the population of examinees in 2013. If the method worked well, we would expect analyses conducted on the 2014 representative samples to reproduce the results actually reported in 2014, which they did. The four critical checks and their results were as follows:

1) The prior achievement and key demographic variables looked similar across the samples and were similar to the prior year's data (2013).

2) The MCAS cut scores (i.e., the raw scores that correspond with the MCAS achievement levels of “220, Needs Improvement,” “240, Proficient,” and “260, Advanced”) were replicated for the representative sample of examinees assigned to MCAS in 2014.[3]

3) The student growth percentiles (SGPs) had a uniform (flat) distribution with a median at or near 50.[4] The majority of SGPs generated using the representative samples were the same as or very close to the actual SGPs.

4) School- and district-level accountability results were nearly equivalent to what was reported in 2014 for both samples.

The proof-of-concept study provided evidence that the methodology worked well. Consequently, the State should be able to use the representative samples as the data source for psychometric and analytical work and still obtain the same results as it would have if it had used the full sample. A full presentation of the evidence from the proof-of-concept study is presented in Appendix A.
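
As an illustration of the first and third checks listed above, a balance diagnostic of the following kind could be used. The Python sketch below is hypothetical (not the Department's code) and assumes student-level DataFrames with made-up column names.

import numpy as np
import pandas as pd

def standardized_mean_differences(sample_df, population_df, covariates):
    """Check 1 (balance): standardized mean difference for each covariate.
    Values near zero (conventionally below about 0.1 in absolute value) indicate good balance."""
    rows = []
    for col in covariates:
        s = sample_df[col].astype(float)
        p = population_df[col].astype(float)
        pooled_sd = np.sqrt((s.var(ddof=1) + p.var(ddof=1)) / 2.0)
        smd = (s.mean() - p.mean()) / pooled_sd if pooled_sd > 0 else 0.0
        rows.append({"covariate": col, "smd": smd})
    return pd.DataFrame(rows)

def sgp_median_check(sgp_values):
    """Check 3 (growth): the median student growth percentile should be at or near 50."""
    return float(np.median(np.asarray(sgp_values, dtype=float)))

# Hypothetical usage, assuming mcas_rep and population_2013 DataFrames exist:
# print(standardized_mean_differences(mcas_rep, population_2013,
#                                     ["ell", "high_needs", "prior_mcas_math"]))
# print(sgp_median_check(mcas_rep["sgp_math"]))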

The proof-of-concept study also allowed the State to establish the methodology for selecting the samples prior to the generation of 2015 assessment data, to avoid any concern that the State might select a sampling strategy that would advantage students who took one or the other assessment.

The same analysis used in the proof-of-concept study, with slight methodological refinements, was then conducted for 2015 to select representative samples of MCAS and PARCC test-takers from the 2015 administration, measuring their representativeness against the characteristics of the statewide population in 2014. Further details on the matching methodology are provided in Appendix B.

Results from the Representative Sample Study

The number of overall test-takers and the number of students selected for each representative sample are shown in Table 4.

Table 4: PARCC and MCAS Samples for 2015

PARCC and MCAS Samples, 2015
MCAS / PARCC
Total MCAS / MCAS Rep. Sample / MCAS % Removed / Total PARCC / PARCC Rep. Sample / PARCC % Removed
Grade 3 / 33,251 / 25,086 / 25% / 39,534 / 29,704 / 25%
Grade 4 / 33,205 / 25,324 / 24% / 39,114 / 30,026 / 23%
Grade 5 / 33,962 / 26,058 / 23% / 39,828 / 30,416 / 24%
Grade 6 / 33,978 / 25,357 / 25% / 40,284 / 30,198 / 25%
Grade 7 / 33,579 / 26,154 / 22% / 40,327 / 30,624 / 24%
Grade 8 / 34,963 / 26,252 / 25% / 40,957 / 31,209 / 24%
Total / 202,938 / 154,231 / 24% / 240,044 / 182,177 / 24%

Approximately 75 percent of test-takers were retained in each representative sample. Retaining a large N was important to minimize error, particularly for downstream calculations such as student growth percentiles, which require a large amount of student data to be estimated accurately.

Looking first at how well the representative samples are matched to the population in 2014 and to each other, Tables 5 and 6 demonstrate that the MCAS and PARCC samples are well-matched to the state on students’ prior performance and demographic characteristics. As shown in Table 5, the MCAS sample is nearly identical on prior performance to MCAS test-takers as a whole, but the PARCC representative sample selects disproportionately from higher-performing PARCC test-takers to make the sample more similar to the state.

Table 5: Comparison of Achievement Outcomes for MCAS and PARCC Test-Takers,

by Grade and Sample, to 2014 MCAS Population

Comparison of Achievement Outcomes for 2015 Test-Takers, by Grade and Sample
to 2014 Population
2014 MCAS Population Average / All 2015 MCAS Test-Takers / All 2015 PARCC Test-Takers / 2015 Rep. Sample MCAS / 2015 Rep. Sample PARCC
Gr. 3 * / 53% / 54% / 51% / 53% / 53%
Gr. 4–8** / 50% / 51% / 49% / 51% / 51%
Average Ach. Gr. 4–8*** / 58% / 58% / 55% / 56% / 56%
*2014 Achievement Outcome Grade 3: Estimated percent scoring Proficient+ on MCAS ELA & Math, by school and demographic group
**2014 Achievement Outcome Grade 4–8: Percent scoring Proficient+ on MCAS ELA & Math
***Average percent of examinees scoring Proficient+ on 2014 MCAS ELA and Math, separately

As shown in Table 6, the MCAS and PARCC representative samples are fairly equivalent across most demographic comparisons. The largest differences are in the Black/African American and High Needs categories, again likely stemming from the choice of some large school districts to administer PARCC. The representative samples do balance this difference somewhat, but the PARCC representative sample still has slightly higher percentages of test-takers in these categories (along with fewer White students) than the 2014 population and the 2015 MCAS representative sample. In addition, the PARCC sample has slightly more examinees who were English language learners or who received free or reduced-price lunch in 2014.

Table 6: Comparison of Demographics for 2015 MCAS and PARCC Test-Takers to

2014 Population of Examinees

Comparison of 2015 Demographics to 2014 Examinee Population
Demographic / 2014 Population / All 2015 MCAS-Takers / All 2015 PARCC-Takers / 2015 Rep. Sample MCAS / 2015 Rep. Sample PARCC
Ever ELL / 14.7% / 15.1% / 17.0% / 14.2% / 16.7%
High Needs* / 47.2% / 47.5% / 54.0% / 46.0% / 47.9%
Free Lunch (2014, imp.)** / 38.0% / 36.7% / 44.8% / 35.6% / 39.7%
Race: Black/African American / 8.5% / 5.1% / 11.0% / 5.7% / 10.8%
Race: Asian / 5.8% / 7.0% / 5.9% / 7.0% / 6.4%
Race: Hispanic / 15.3% / 17.3% / 18.8% / 16.0% / 16.0%
Race: White / 67.7% / 67.2% / 60.7% / 67.9% / 63.6%
Race: Other / 0.3% / 0.3% / 0.4% / 0.3% / 0.4%
Special Education / 16.9% / 17.7% / 17.6% / 17.2% / 15.8%
*Students in the High Needs category belong to any of these groups: special education, low-income, and ELL or ever-ELL students
**Free lunch values were estimated for students with missing values

Student growth percentiles (SGPs) generated for 2015 MCAS and PARCC (provided in Table 7) show a median at or near 50 in all grades for the representative samples, while there is a greater departure from 50 for examinees not included in the representative samples. Across all test-takers in the state, SGPs hover at or near a median of 50, as expected.