Program Evaluation
A Methodological Primer for Program Administrators
By
George R. Reinhart, Ph.D.
National Research Council
The National Academies
So, you have been tasked to conduct an evaluation of your program and you don’t want the person to whom you report to tell you to go to hell. Well, I’m here to give you some pointers that will tame the task and enable you to conduct an evaluation that is doable with limited resources.
There are many approaches to evaluation and each of these approaches requires different levels of resources. I prepared a handout that you can read at your leisure that describes some of them. The Department of Education has prepared a document “Identifying and Implementing Educational Practices Supported by Rigorous Evidence: A User Friendly Guide” that describes additional approaches. This can be found on the web at
Because the User Friendly Guide recommends the use of control groups, randomized designs, and a clinical-trial approach, some programs may find it difficult to implement the practices described in the document. Quasi-experimentation with comparison groups is an approach that seeks to overcome the barriers imposed by randomization and can yield results comparable to those obtained from randomized trials.
In addition, clinical trials are focused on determining whether an intervention produces the desired results, not on assisting the program in achieving its goals and objectives. While clinical trials may be essential for determining the efficacy of an intervention, the clinical-trial approach may not be effective for determining the efficacy of the whole program.
Consequently, I advocate an approach based on utilization-focused evaluation, developed by Michael Patton. This approach views the evaluator as a member of the program team who works interactively with staff to ensure that milestones, outcomes, and goals are reached.
Let’s take a look at one program as an example – Upward Bound. Upward Bound provides fundamental support to participants in their preparation for college entrance. It serves low-income high school students and those who are the first in their family to seek postsecondary education. The goal of Upward Bound is to increase the rates at which participants enroll in and graduate from institutions of postsecondary education. Upward Bound projects provide instruction in math, laboratory science, composition, literature, and foreign languages in addition to mentoring, tutoring, and facilitating the transition to college.
The question of interest to the program and to the evaluator is “How well is the program accomplishing its goal of moving the targeted high school students into college?” They are not hypothesizing whether the services offered through Upward Bound enhance entrance into postsecondary education. The question for the program is one of dose and response: is there a relationship between the amount and/or intensity of services provided and the number or percentage of students entering and graduating from postsecondary institutions?
The Upward Bound Program mandates a detailed record structure from all grant recipients. This record structure contains 78 fields for each program participant. The data in the record cover the background characteristics, services offered, short-term outcomes, and long-term outcomes for all participants. These data can be used as the basis for an outcome evaluation that examines the dose-response relationship between services offered through the program and the extent to which the program is meeting its goals. Rather than the identification, collection, and analysis of new data, the program can conduct an evaluation using extant data routinely collected as part of its normal operation. Consequently, the requirement for new resources is minimized.
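As a sketch of how such extant records might be put to work, the snippet below reads a participant file and keeps only the fields needed for a given analysis. The field names here are illustrative stand-ins I have invented for the example, not the actual 78-field Upward Bound record structure.

```python
import csv

# Hypothetical field names -- the real record structure defines 78 fields;
# these are illustrative stand-ins for an analysis of tutoring and GPA.
FIELDS_OF_INTEREST = [
    "grade_level_at_entry",
    "hs_gpa_at_entry",
    "tutoring",
    "hs_gpa_at_end",
    "postsecondary_enrollment_status",
]

def load_participants(path):
    """Read extant participant records and keep only the fields
    needed for a particular outcome analysis."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        return [
            {field: row.get(field, "") for field in FIELDS_OF_INTEREST}
            for row in reader
        ]
```

Because the records already exist, the only new work is deciding which fields feed which analysis.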
A logic model provides a picture of what the program hopes to achieve and a chain of connections showing how the program intends to work. The logic model includes: resources or inputs, program activities, program outputs, short- and long-term outcomes, and the program’s impact. The Kellogg Foundation has published a good description of logic models and their use, which can be found at
The components of the model are linked by “what-if” relationships, such as what happens to a short-term outcome if the number and/or intensity of activities are increased. The flow chart below shows the elements of the logic model.
Resources/Inputs → Activities → Outputs → Short-Term Outcomes → Long-Term Outcomes → Impact
Resources include the human, financial, organizational, and community resources available. Inputs may also include factors that may affect resource allocation or outcomes. Activities are what the program does with its resources. Outputs are the direct products of the program and may include the types, levels, and targets of services delivered. Outcomes are specific changes in participants’ behavior, knowledge, skills, and level of functioning. Impact refers to changes in communities, organizations, or systems that follow from outcomes – in some cases, the program may not be able to assess its impact.
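To make the chain of components concrete, the minimal sketch below represents a logic model as a data structure and answers the basic “what-if” question of which components sit downstream of a given one. The entries are illustrative examples of my own, not official Upward Bound categories.

```python
# A logic model as a simple data structure. Component entries are
# illustrative, not drawn from any official record structure.
logic_model = {
    "resources_inputs": ["staff", "grant funds", "partner schools"],
    "activities": ["math tutorials", "SAT preparation", "campus visits"],
    "outputs": ["tutoring hours delivered", "students served"],
    "short_term_outcomes": ["GPA gains", "SAT/ACT scores"],
    "long_term_outcomes": ["postsecondary enrollment", "degree completion"],
    "impact": ["community college-going rate"],
}

# The "what-if" chain: each component feeds the next one in order.
CHAIN = ["resources_inputs", "activities", "outputs",
         "short_term_outcomes", "long_term_outcomes", "impact"]

def downstream_of(component):
    """Components that could change if `component` changes."""
    i = CHAIN.index(component)
    return CHAIN[i + 1:]
```

Asking `downstream_of("activities")`, for instance, lists every component a change in activities could ripple into, which is exactly the reasoning a what-if question performs informally.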
I classified the data elements required for the Upward Bound record structure into the five types of core elements used in a logic model. Please note that those of you with more experience with Upward Bound may wish to change this classification. This breakdown is shown in the table below.
Data Collection Elements
Upward Bound and Upward Bound Math-Science Projects
Inputs
Number of Students Entering Program
Number of Faculty Participating
Number of Schools
Social Security Number
Gender
Race/Ethnicity
Age
Eligibility
UB Initiative
Targeting
Participant Status
Participation Level
Academic Need
Grade Level at Entry
Date of Entry
High School GPA at entry
High School Cumulative GPA at entry
High School Cumulative GPA at beginning of reporting period
Activities
Mathematics Instruction/Tutorials
Mathematics Instruction Summer Program
Science Instruction/Tutorials
Science Instruction Summer Program
Foreign Language Instruction/Tutorials
Foreign Language Instruction Summer Program
English Instruction/Tutorials
English Instruction Summer Program
Reading Instruction/Tutorials
Computer Instruction/Tutorials
Tutoring
Supplemental Instruction
College Entrance Examination Preparation
Peer Counseling/Mentoring
Professional Mentoring
Study Skills
Cultural Activities
Career Awareness
Campus Visitation
Assistance with College Admissions
Family Activities
Target School Advocacy
Work Study Position
Employment
Math-Science Activities
Outputs
Projected Date of Re-Entry
Date of Last Entry
Reason for Dropout
Grade Level at end
High School GPA at end
High School Cumulative GPA at end
Number of HS Credits Earned
Short-Term Outcomes
College Entrance Exam
Type of Standardized Tests Used
PSAT Test Score
SAT verbal/math scores
ACT Scores
First Postsecondary Enrollment Date
Student Financial Aid Awarded
Postsecondary Enrollment Status
Long-Term Outcomes
Number of Post Secondary Education Credits Earned
Number of Non-Credit Hours Earned
College Grade Level at Program End
Postsecondary Academic Standing
Degree/Certificate Completion
Post Secondary Grading Term
Now, there are way too many data elements in the table to use simultaneously. However, it is possible to identify a number of smaller models. Let’s say you want to examine the effect of tutoring on changes in GPA, SAT scores, and the decision to enter college and its consequences. For this model the variables of interest might include:
Inputs – Number of Students, Number of Faculty, Number of Tutors, Gender, Age, Grade, and GPA at Entry
Activities – Math, Science, English, Reading, and Computer Tutorials
Outputs – Reason for Dropout, Grade Level, GPA at end of Program
Short-Term Outcomes – SAT Scores, ACT Scores, College Entrance Exam
Long-Term Outcomes – Postsecondary Entry Date, Degree/Certification Completion, College Grade Level
You can measure changes in GPA, SAT scores, and the postsecondary entry decision as a function of the aggregate level of tutoring, controlling for gender, age, grade, and GPA at entry. You would be looking for 1) increases in GPA over time to be positively correlated with the aggregate level of tutoring, 2) high SAT and ACT scores to be associated with increases in GPA, and 3) entry into a postsecondary institution to be associated with high SAT and ACT scores. All relationships should be examined while controlling for gender, age, and grade, e.g., are the correlations the same for both genders?
There are many possible interrelationships among variables that could be investigated. It is the role of the evaluator and the program staff to identify the outcomes that are most important to them – the outcomes that the staff believes define success. More importantly, these relationships should be assessed early and often. If evaluation of data at the beginning of the program does not demonstrate the hoped-for relationships, then the staff has time to modify the program to ensure that goals are met. It is important to identify successes and failures early in the program, when there is time to amplify positive aspects and correct negative aspects of the program’s design and implementation.
The model presented above does not meet the “gold standard” of the clinical trial. Nevertheless, it provides the program staff with important information for assessing the program’s effectiveness. Thorough evaluations – evaluations that are targeted to each of the program’s outcomes and goals – can provide a complete roadmap of the program’s processes and outcomes. Demonstrating the effectiveness of the roadmap provides program administrators with strong and convincing evidence for continuation of existing funding and justification for new funding.