Government Performance and Results Act (GPRA) Indicators for the MSP Program, Performance Period 2007
Analytic and Technical Support for Mathematics and Science Partnerships
Contract # ED-04-CO-0015
Task Order # 0012
Draft
June 16, 2009
Prepared for
Patricia O’Connell Johnson
Miriam Lund
Jimmy Yun
Michelle Meier
U.S. Department of Education
OESE/Mathematics and Science Partnerships
400 Maryland Ave, SW
Washington, DC 20208
Prepared by
Abt Associates Inc.
Government Performance and Results Act (GPRA) Indicators for the MSP Program, Performance Period 2007
The U.S. Department of Education’s Mathematics and Science Partnerships (MSP) Program creates partnerships between high-need school districts and mathematics, science, and/or engineering departments at institutions of higher education. These partnerships provide intensive, content-rich professional development to teachers with the goal of improving student achievement in mathematics and/or science. The program requires projects to evaluate the impact of participation in MSP professional development on gains in teacher content knowledge and on student achievement.
Under the Government Performance and Results Act (GPRA), all federal agencies are required to develop indicators in order to report to the U.S. Congress on federal program impacts and outcomes. For the MSP Program, the following indicators have been developed:
· Teacher Knowledge
- The percentage of MSP teachers who significantly increase their content knowledge as reflected in project-level pre- and post-assessments.
· Student Achievement
- The percentage of students in classrooms of MSP teachers who score at the basic level or above in state assessments of mathematics or science.
- The percentage of students in classrooms of MSP teachers who score at the proficient level or above in state assessments of mathematics or science.
· Evaluation Design
- The percentage of MSP projects that report using an experimental or quasi-experimental design for their evaluations.
- The percentage of MSP projects using an experimental or quasi-experimental design for their evaluations whose evaluations are conducted successfully and yield scientifically valid results.
· Timeliness
- The percentage of SEAs that submit complete and accurate data on MSP performance measures in a timely manner.
Data for the MSP Program for Performance Period 2007 (PP07)[1] on each of these GPRA indicators are presented in the sections below. The data were drawn from the annual performance reports of 575 MSP projects,[2] serving a total of 59,969 teachers.
Teacher Knowledge
1) The percentage of MSP teachers who significantly increase their content knowledge, as reflected in project-level pre- and post-assessments.
As part of their evaluations, MSP projects are required to administer pre- and post-assessments of teachers’ content knowledge in mathematics and/or science during the years in which teachers receive intensive professional development. Projects reported the number of MSP teachers who significantly increased their content knowledge in mathematics and/or science topics on these pre- and post-assessments.[3] Exhibit 1 presents data for those teachers who were assessed for gains in content knowledge. Among the teachers assessed, approximately two-thirds (68 percent) showed significant gains in mathematics content knowledge and nearly three-quarters (73 percent) showed significant gains in science content knowledge.
Exhibit 1. Percent of Teachers with Significant Gains in Content Knowledge, Among Teachers with Pre-Post Content Assessments
Content area / Total number of teachers served / Number of teachers with content assessments / Percent of assessed teachers with significant gains
Mathematics / 34,567 / 11,696 / 68%
Science / 26,552 / 11,546 / 73%
Note: Individual teachers who received professional development in both mathematics and science may be double counted.
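Footnote 3 describes the decision rule used to judge whether teachers’ gains were significant: a paired-samples t-test when 30 or more teachers participated, a Wilcoxon signed-ranks test otherwise, both at the 0.15 level. The sketch below is purely illustrative and is not the projects’ actual analysis code; the function name, the use of Python with scipy, and the example scores are assumptions made for illustration.

```python
# Illustrative sketch only: applying the decision rule described in footnote 3
# to one project's paired pre/post teacher scores. Not the projects' actual code.
from scipy import stats

def gain_is_significant(pre_scores, post_scores, alpha=0.15):
    """Test whether teachers' content-knowledge gains are significant.

    Per footnote 3: a paired-samples t-test when 30 or more teachers
    participated, a Wilcoxon signed-ranks test otherwise, at the 0.15 level.
    """
    if len(pre_scores) != len(post_scores):
        raise ValueError("pre- and post-test scores must be paired by teacher")

    if len(pre_scores) >= 30:
        _, p_value = stats.ttest_rel(post_scores, pre_scores)   # paired t-test
    else:
        _, p_value = stats.wilcoxon(post_scores, pre_scores)    # signed-ranks test
    return p_value < alpha

# Hypothetical example: five teachers' pre- and post-test scores.
pre = [12, 15, 14, 10, 13]
post = [16, 17, 15, 15, 19]
print(gain_is_significant(pre, post))  # True under these illustrative scores
```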
Student Achievement
2) The percentage of students in classrooms of MSP teachers who score at the basic level or above in state assessments of mathematics or science.
3) The percentage of students in classrooms of MSP teachers who score at the proficient level or above in state assessments of mathematics or science.
Projects also reported the number of students served, the number of students assessed, and the number of students scoring at the basic level or above and at the proficient level or above on state assessments in mathematics and science. As shown in Exhibit 2, nearly 1.3 million students were taught by teachers who received professional development in mathematics, and over 800,000 students were taught by teachers who received professional development in science. State assessment data were reported for over 600,000 students (48 percent) in mathematics and for approximately 250,000 students (30 percent) in science. Approximately half of the students with assessment data scored at the basic level or above in mathematics (52 percent) or in science (50 percent).[4] Slightly fewer students with assessment data scored at the proficient level or above in mathematics (45 percent) or in science (49 percent). These numbers were aggregated across all grade levels and all schools with teachers participating in an MSP project.
Exhibit 2. Percent of Students Scoring at Basic or Above, Among Students Taught by MSP Teachers and Assessed in Each Content Area
Content area / Total number of students taught by MSP teachers / Number of students with assessment data / Percent of assessed students at basic level or above / Percent of assessed students at proficient level or above
Mathematics / 1,284,911 / 610,868 / 52% / 45%
Science / 844,749 / 253,216 / 50% / 49%
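As a quick cross-check, the percentages of assessed students cited in the text can be reproduced from the counts in Exhibit 2. The fragment below is a hypothetical illustration of that arithmetic, not part of the MSP reporting system.

```python
# Reproducing the assessed-student percentages cited in the text from Exhibit 2.
students_taught = {"mathematics": 1_284_911, "science": 844_749}
students_assessed = {"mathematics": 610_868, "science": 253_216}

for subject, taught in students_taught.items():
    share = students_assessed[subject] / taught
    print(f"{subject}: {share:.0%} of students taught by MSP teachers had assessment data")
# mathematics: 48% ...; science: 30% ...
```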
Evaluation Design
4) The percentage of MSP projects that report using an experimental or quasi-experimental design for their evaluations.
Exhibit 3 presents the percentages of MSP projects using various types of evaluation designs in PP07. Two percent of projects reported that they were implementing an experimental design, which is the most rigorous research design for testing the impact of an intervention, wherein schools, teachers, or students are randomly assigned to treatment or control groups.
Forty percent of projects used a quasi-experimental (comparison group) design to compare outcomes for participating teachers and/or their students with outcomes for non-participating teachers and/or students. Nearly one-fourth of projects (24 percent) used a matched comparison group design, which attempts to support causal conclusions by demonstrating equivalence between groups at baseline or by adjusting for any initial differences between groups. The remaining 16 percent of projects that used a comparison group design reported using a non-matched comparison group.
The remaining projects reported using one-group designs, with no comparison group, or an “other” design type. One-fifth (20 percent) of projects reported using pre-tests and post-tests to assess the gains of the teachers served by MSP. Fourteen percent of projects reported using mixed quantitative and qualitative methods, and 11 percent of projects reported using qualitative methods only.
Exhibit 3. Percent of Projects Using Various Types of Evaluation Designs
Evaluation design categories / Percent of projects
Experimental / 2%
Matched comparison group / 25%
Non-matched comparison group / 17%
One-group pre-post / 20%
Mixed methods / 14%
Qualitative / 11%
Other / 11%
5) The percentage of MSP projects using an experimental or quasi-experimental design for their evaluations whose evaluations are conducted successfully and yield scientifically valid results.
As part of its program, every MSP project is required to design and implement an evaluation and accountability plan that allows for a rigorous assessment of its effectiveness. The requirement for a rigorous evaluation of MSP projects is specified in the program’s enabling legislation in the No Child Left Behind Act. To ensure that projects provide high-quality information on program outcomes, a rubric was developed as part of the Data Quality Initiative (DQI) through the Institute of Education Sciences (IES) at the U.S. Department of Education. The six criteria that make up the rubric, shown below, specify the conditions that projects using experimental or comparison group designs must meet for their evaluations to be deemed successful and to yield scientifically valid results.[5]
1. Baseline equivalence of groups—there were no significant pre-intervention differences between treatment and comparison group participants on variables related to key outcomes, or the groups had similar background characteristics.
2. Sample size—sample size was adequate based on a power analysis or on meeting predetermined thresholds based on assumptions about the number of students, teachers, or schools needed to have adequate power.
3. Quality of the measurement instruments—the study used existing data collection instruments that had already been deemed valid and reliable to measure key outcomes; data collection instruments developed specifically for the study were sufficiently pre-tested with subjects who were comparable to the study sample; or data collection instruments used selected items from a validated and reliable instrument or instruments if the resulting instrument included at least 10 items and at least 70 percent of the items were from the validated and reliable instrument(s).
4. Quality of the data collection methods—the methods, procedures, and timeframes used to collect the key outcome data from treatment and comparison groups were the same.
5. Data reduction rates (i.e., attrition rates, response rates)—(1) the study measured the key outcome variable(s) in the post-tests for at least 70 percent of the original study sample (treatment and comparison groups combined), or there is evidence that the high rates of data reduction were unrelated to the intervention; AND (2) the proportion of the original study sample that was retained in follow-up data collection activities (e.g., post-intervention surveys) and/or for whom post-intervention data were provided (e.g., test scores) was similar for the treatment and comparison groups (i.e., less than or equal to a 15-percent difference), or the proportions retained differed between groups and sufficient steps were taken to address this differential attrition in the statistical analysis. (An illustrative check of these numeric thresholds follows this list.)
6. Relevant statistics reported—the final report includes treatment and comparison group post-test means and tests of statistical significance for key outcomes; or provides sufficient information for calculation of statistical significance (e.g., mean, sample size, standard deviation/standard error).
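As a purely illustrative aid, the sketch below encodes the two numeric thresholds in criterion 5: at least 70 percent of the original sample retained overall, and a between-group gap in retention interpreted here, as an assumption, as no more than 15 percentage points. The function name and example figures are hypothetical, and the rubric’s alternative routes to meeting the criterion (evidence that attrition was unrelated to the intervention, or statistical adjustment for differential attrition) require reviewer judgment that this check does not capture.

```python
# Hypothetical check of the two numeric thresholds in rubric criterion 5.
# It does not capture the rubric's judgment-based alternatives (attrition shown
# to be unrelated to the intervention, or statistical adjustment for it).

def meets_data_reduction_thresholds(treat_original, treat_retained,
                                    comp_original, comp_retained):
    overall_retention = (treat_retained + comp_retained) / (treat_original + comp_original)
    treat_rate = treat_retained / treat_original
    comp_rate = comp_retained / comp_original
    differential = abs(treat_rate - comp_rate)
    return overall_retention >= 0.70 and differential <= 0.15

# Hypothetical example: 200 treatment and 180 comparison teachers at baseline,
# with post-test data provided for 160 and 150 of them, respectively.
print(meets_data_reduction_thresholds(200, 160, 180, 150))
# True: about 82 percent retained overall, with roughly a 3-point gap between groups.
```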
The evaluations of all final year MSP projects that reported using an experimental or comparison group design were reviewed against the rubric to determine the number of projects that conducted successful evaluations yielding scientifically valid results. Most projects implemented multiple evaluations, assessing effects on various outcomes, including teacher content knowledge, teacher practices, and student achievement. Each evaluation for which data were provided for both the treatment and comparison groups was reviewed separately.
One hundred eighty-three projects submitted a final year report in PP07. Of these, 37 completed an experimental or comparison group design and submitted data for both groups. Four of these projects implemented at least one evaluation that met all design criteria specified in the rubric. One project successfully evaluated the effects of its MSP intervention on student achievement using an experimental design. The remaining three projects successfully implemented comparison group designs examining the effects of their interventions on teacher content knowledge, classroom practices, and/or student achievement. (See Exhibit 4.)
Exhibit 4. Number of Final Year Projects with At Least One Evaluation That Was Conducted Successfully and Yielded Scientifically Valid Results
Categorization of evaluation type / Number of projects with a comparison group design / Number of projects with an experimental design
Conducted an experimental or comparison group design / 36 / 1
Met all design criteria specified in the rubric / 3 / 1
Timeliness
6) The percentage of SEAs that submit complete and accurate data on MSP performance measures in a timely manner.
Submission guidelines for Annual Performance Reports (APRs) were developed as the basis for the timeliness calculation, and APRs for PP07 were accepted until March 17, 2009. MSP State Coordinators were responsible for ensuring that all projects within their state submitted complete and accurate data by that date. Projects that informed the Department of Education that they would not receive teacher and/or student data in time, or that had other extenuating circumstances, were granted an extension on the due date of their reports. Only two states (Kansas and New Hampshire) had projects with APRs that were still outstanding as of March 17, 2009. Thus, 96 percent of states were considered to have submitted complete and accurate data on MSP performance measures in a timely manner.
[1] Performance Period 2007 (PP07) refers to the period between October 1, 2007 and September 30, 2008. PP07 projects are those for which the majority of the months of activity described in the Annual Performance Report took place between October 1, 2007 and September 30, 2008.
[2] These analyses include all annual performance reports submitted by February 28, 2009. These were primarily PP07 reports, but may also have included some PP06 reports for which teacher and/or student data were not available in time to be submitted during the previous year.
[3] Statistical significance of teachers’ gain scores (computed as post-test score minus pre-test score) is calculated at the 0.15 level, using a paired-samples t-test when there are 30 or more participating teachers or a Wilcoxon signed-ranks test when there are fewer than 30 participating teachers.
[4] There was variation in the way projects reported on the number of students who scored at the basic level or above on statewide assessments. Some projects reported the number of students who scored at the basic level only, while others reported the number of students who scored at the basic level or higher. Thus, the figures in Exhibit 2 represent a lower bound on the actual proportion of students who scored at the basic level or above.
[5] Since the rubric was only developed and approved after the PP07 projects developed their evaluation designs, we would not expect a large number of projects to meet the criteria at this point.