Information for Accountability: Impact Evaluation of Teacher Training and Early Grade Reading Assessment in Liberia
Concept Note
INTRODUCTION
Fourteen years of civil war in Liberia created a number of challenges in the education sector, including a lack of textbooks and learning materials, an estimated 62% of teachers with no formal training, and basic reading and math competency significantly lower than that of students elsewhere in the region. To address these problems, the MOE, in collaboration with the World Bank and USAID, has developed EGRA PLUS. EGRA PLUS refers to the administration of the student reading assessment, teacher training, and school report cards. As such, EGRA PLUS seeks to diagnose student reading ability, increase teacher capacity to address reading deficiencies through the provision of specialized teacher training, and provide information to parents and communities about students’ performance for monitoring purposes.
Although there is an increasing focus on providing information to parents, schools, and communities on the status of student outcomes, the evidence on the impact of these policy interventions on student reading and long-term education outcomes is mixed (Banerjee et al. 2008, Kremer and Gugerty 2006, Pandey et al. 2008).[1] Therefore, a rigorous impact evaluation was designed in response to the desire of stakeholders in Liberia to identify the effectiveness of alternative programmatic approaches to improving student learning. This concept note outlines the objectives, timeframe, and budget designed by the Government of Liberia for the impact evaluation of student assessment report cards and teacher training as instruments for ensuring information for accountability and addressing students’ reading deficiencies.
CONTEXT AND RELEVANCE
Without the ability to monitor learning results, communities and schools are unable to assess the efficacy of education interventions or guide their programs to yield the highest returns (World Bank/IEG 2007). Generating and disseminating this information (about inputs, outputs, or outcomes such as test scores) holds substantial promise as a relatively low-cost way of intervening in an education system (see Hoxby 2002, cited in Deon Filmer, 2007). Increasingly, education interventions have focused on the idea that communities are able to monitor and address deficiencies in children’s education if armed with the proper information.[2] This idea underlies many of the new education interventions: dissemination of school report cards, devolution of decision making to the local level, and school grants, to name a few. The development of simple tools that school and ministry staff can use to diagnose critical problems and monitor student progress could fill an important gap in the cycle of evaluation, intervention, and monitoring of school improvement. More importantly, assessments measuring results at the critical initial period of the schooling cycle can be particularly useful. Yet most national and international assessments are paper-and-pencil tests administered to students in grades four and above (that is, they assume students can read and write). In contrast, EGRA is designed to orally assess the foundation skills for literacy acquisition in grades 1 through 3, including pre-reading skills such as listening comprehension. Test components are derived from recommendations made by an international panel of experts and often (not always) include timed, 1-minute assessments of letter naming, nonsense and familiar words, and paragraph reading. Additional (untimed) segments can include comprehension, relationship to print, dictation, and student and teacher context questionnaires.
Thus, the MoE approach has two main objectives in implementing EGRA PLUS.
- To ensure information for accountability by assessing students’ reading ability and providing the results to the relevant stakeholders (i.e., parents, school directors, and education officers) for corrective action.
- To improve pedagogy in reading instruction through the introduction of a structured approach for the early grades.
Links to Other Activities
Liberia Education Recovery Program
The MoE has laid out its vision of primary education in its Education Recovery Program (LPERP). Because a recognized gap in the education system is the lack of knowledge of student ability, and more specifically of the reading skills of Liberian children, EGRA PLUS is a priority area under the LPERP.
APEIE
The Africa Program for Education Impact Evaluation (APEIE) is currently working with 12 countries to build in-country capacity to develop and implement rigorous evaluations of policy interventions to improve schooling and learning outcomes. As part of the larger community of practice established under APEIE, Liberia will benefit from cross-country learning and cross-country harmonization of instruments and methods that will allow benchmarking and comparisons across participating countries. The comparability across countries will stem from a common approach for assessment development; however, timing and successful acquisition of the foundations of reading will differ by language (and the degree to which these languages vary in their orthographic complexity). Despite the challenge of comparing results across countries and languages, basic information regarding students’ reading ability at various grades in different countries will provide important policy information for curriculum development and teacher training.
Primary Research Questions and Outcome Indicators
The main research questions of the impact evaluation concern the average impact of EGRA and teacher training on parental involvement, teacher performance, and learning outcomes.
- Do reports that provide information on student reading ability lead to corrective action by parents and schools?
- Does teacher training improve teachers’ ability to address student deficiencies in literacy that have been identified through classroom assessments?
- Does the combination of EGRA and teacher training improve student achievement in later years?
The main outcome indicators for this intervention are the following:
Teacher Performance
- absenteeism
- teaching methodology for providing literacy (classroom observations)
Learning Outcomes
- reading fluency
- reading comprehension
- general achievement tests
Parental involvement
- knowledge of student performance
- visits to school
- reading at home
- books at home
METHODOLOGY
Evaluation Design/Identification Strategy
This study will employ a randomized controlled trial.
Timeline
- Baseline assessment: October 2008, Complete
- Teacher training: October 2008, Complete
- Post-treatment assessment: June 2009
- Final post-treatment assessment: June 2010
- Standardized achievement test in 2011
Evaluation Design
Intervention / Control: EGRA w/o information dissemination / Treatment (1): EGRA w/ information dissemination / Treatment (2): EGRA w/ information dissemination + teacher training
Number of schools / 60 / 60 / 60
Number of students in 2nd grade per school / 10 / 10 / 10
Number of students in 3rd grade per school / 10 / 10 / 10
The assessments and training will be conducted by a private institute.
Control: EGRA without information dissemination
Students in the control schools will take the EGRA, but the results will not be provided to the students or the school.
Treatment (1): EGRA with information dissemination
Students in the treatment (1) schools will take the EGRA, and assessment findings will be disseminated to parents, school administrators and community groups in a student report card.
Treatment (2): EGRA with Information Dissemination + Teacher Training
Students in treatment (2) schools will take the EGRA, and assessment findings will be disseminated to parents, school administrators, and community groups in a student report card. In addition, teachers of students taking the EGRA will be trained in specific techniques for teaching literacy. Teacher training will consist of a week-long initial face-to-face session (provided at the beginning of each academic year: October 2008 and September 2009). Three days will be spent familiarizing teachers with EGRA teaching instructions and having them practice among themselves. The remaining two days will be spent practicing the newly acquired techniques with students.
Selection of schools
There are four types of schools in Liberia as per the EMIS database: public, self-help/community, religious/mission, and private. On the advice of the MOE, we used an expanded definition of “public” that includes self-help/community schools. It was agreed previously that, in order to make this a proper experiment, allocation of schools into the three groups would be randomized. It was also agreed that, to make the schools representative of all Liberian children (because the unit of interest, ultimately, is the child), selection would be random but proportional to school population (enrollment).
In order to make the intervention cost-effective, and to make its implementation reminiscent of what a scaled-up process would look like, the project team proceeded to select groups of schools that were similar in nature to the natural intervention or supervision area of district officers. Thus, schools were selected in clusters. Schools will be visited and assisted in clusters of four. This is a good compromise between the need to work efficiently and the need for representativeness, and will minimize problems with “design effect”. The project could have worked in clusters of one or two schools, but this would have raised the cost astronomically, and would not simulate what happens in reality, since officers work with groups of schools—that is the nature of supervision. On the other hand, the project could just do two, three, or four clusters of 30, 20, or 15 schools, but this would mean that the first-stage selection (of two or three clusters) could not possibly be representative of the country. A wise compromise is 15 clusters of four schools, with random selection at both stages.
Selection of districts
First, 15 districts were selected with probability proportional to public school population. The selection tool is an Excel spreadsheet containing data on schools by region, county, and district. A simple sampling program was written in Excel: the =rand() function is used to draw the random sample, and the draws are weighted so that the sample is proportional to population.
The project team has simulated the selection over many hundreds of repeated samples. The resulting correlation between each district’s actual share of the population and the proportion of the time that district shows up in a sample is 0.99 (converging to 1 in the limit). In repeated samples, the proportion of times in which the largest three districts show up is 11%, if they are allowed to sometimes show up more than once in a sample, which is exactly proportional to the population.
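The district draw and the repeated-sample check described above can be illustrated with a minimal Python sketch. It is not the team's Excel program: the enrollment figures, the district names, and the function names (`pps_sample`, `selection_frequencies`, `pearson`) are all hypothetical, and the sketch assumes sampling with replacement, so a large district can appear more than once in a sample.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Hypothetical public-school enrollment by district (illustration only, not EMIS data).
ENROLLMENT = {f"District {i:02d}": 1000 + 500 * i for i in range(1, 31)}


def pps_sample(enrollment, n_draws, rng):
    """Draw n_draws districts with replacement, proportional to enrollment."""
    names = list(enrollment)
    cumulative = list(accumulate(enrollment[name] for name in names))
    total = cumulative[-1]
    sample = []
    for _ in range(n_draws):
        # rng.random() plays the role of Excel's =rand(): uniform on [0, 1).
        u = rng.random() * total
        # Pick the first district whose cumulative enrollment exceeds u;
        # the min() guards against floating-point rounding at the top end.
        index = min(bisect_right(cumulative, u), len(names) - 1)
        sample.append(names[index])
    return sample


def selection_frequencies(enrollment, n_draws, n_samples, rng):
    """Share of all draws going to each district over repeated samples."""
    counts = dict.fromkeys(enrollment, 0)
    for _ in range(n_samples):
        for name in pps_sample(enrollment, n_draws, rng):
            counts[name] += 1
    total_draws = n_draws * n_samples
    return {name: count / total_draws for name, count in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(2008)  # fixed seed so the draw can be reproduced
one_sample = pps_sample(ENROLLMENT, 15, rng)
frequencies = selection_frequencies(ENROLLMENT, 15, 500, rng)
total_enrollment = sum(ENROLLMENT.values())
shares = [ENROLLMENT[name] / total_enrollment for name in ENROLLMENT]
print(one_sample)
print(pearson(shares, [frequencies[name] for name in ENROLLMENT]))
```

The correlation printed at the end is the same kind of check reported above: as the number of repeated samples grows, each district's selection frequency should approach its share of enrollment.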
Selection of schools within clusters
Once the districts were selected, clusters of four schools each were selected. The EMIS database has data on the X-Y geographical coordinates of the schools. The procedure used is as follows.
For the selected districts, the sampling team created a distance matrix of all schools i to all other schools j, simply calculating the length of the hypotenuse between (x(i), y(i)) and (x(j), y(j)). This is not perfect, as it does not take infrastructure into consideration, but it is a good first approximation, and it can be corrected with real information about closeness. There may be a need to substitute out some schools anyway, because of poorly entered X-Y coordinates or for other reasons, such as an unfordable river between schools that appear close to each other in terms of X-Y coordinates. Staff then selected one school at random in the district, which can be considered the centroid of that district’s cluster.
There remained two options. The first was to find the three schools closest to the centroid. At first glance, this appears to create a bias towards higher-population-density areas, but it does not: if the selection of the centroid schools is truly random, some of them will be in low-density areas, and the nearest schools will actually be quite far away, precisely because they are in low-density areas. This option also has the advantage of minimizing the cost of intervention and of more closely mimicking the way an actual supervisor would work, by going from school to school, taking the closest ones in sequence. In that sense, it has all the “realism” and representativeness one needs.
However, the option does suffer from one problem, which is some design effect. Clustering on the closest schools, after picking one at random, does narrow the range of schools one is dealing with, to some degree. A clustering process, relative to a pure random sample, will restrict the range of observation somewhat, because schools within clusters will tend to differ from each other less than schools selected totally at random. In an ideal world, clusters would all be “mini populations,” and clustering would then be extremely efficient. But we know that in the real world, clustering censors the observed total variance relative to the real variance, because units within a cluster will tend to be similar to each other. Thus, this is somewhat of a disadvantage. But because EGRA Plus: Liberia is a very labor-intensive intervention, it is important to economize. And because it is an intervention that one hopes is replicable, it is important to work in a way that mimics the “real” work that would be done in a project that is taken to large scale. This is why we have clusters of four schools to begin with. If, having clustered at the district level, one picks a single school in the cluster and then finds the closest three schools, one is then also restricting the range of observation. The other extreme is to select the four schools completely at random within the cluster. But some of the districts in Liberia are big, so this creates an artificial cluster, unlike anything that anyone in real life would work with. The average district has 90 schools, which is far more than any one agent could truly help. In real life, any improvement process most likely would require a span smaller than 90 schools.
The cost saving involved in clustering within districts by picking a school at random, and then the three closest, seems worth the possible sacrifice in variability between schools. Again, this will not create an urban bias, or a bias towards areas with higher population density. It just makes the sampling a little less efficient in a statistical sense, but a great deal more efficient in a cost and substantive sense.
In conclusion, taking the two steps described above makes the sampling proportional to population and groups the schools into reasonably natural clusters that are more or less similar to the administrative or jurisdictional units that would occur in reality, while remaining random at each step.
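The second step (straight-line distances, a randomly chosen centroid school, then the three nearest schools) can be sketched in Python as follows. This is an illustration only: the school names and coordinates are hypothetical, the function names are invented for the example, and the sketch ignores the manual substitutions for bad coordinates or impassable terrain mentioned above.

```python
import math
import random

# Hypothetical X-Y coordinates for the schools of one selected district.
SCHOOLS = {
    "A": (0.0, 0.0),
    "B": (3.0, 4.0),
    "C": (1.0, 1.0),
    "D": (2.0, 0.5),
    "E": (10.0, 10.0),
    "F": (11.0, 9.0),
}


def distance_matrix(coords):
    """Straight-line (hypotenuse) distance from every school i to every school j."""
    return {
        a: {b: math.hypot(xa - xb, ya - yb) for b, (xb, yb) in coords.items()}
        for a, (xa, ya) in coords.items()
    }


def pick_cluster(coords, rng, size=4):
    """Pick one school at random as the centroid, then add its nearest neighbours."""
    dist = distance_matrix(coords)
    centroid = rng.choice(sorted(coords))
    others = sorted(
        (school for school in coords if school != centroid),
        key=lambda school: dist[centroid][school],
    )
    return [centroid] + others[: size - 1]


rng = random.Random(2008)  # fixed seed so the draw can be reproduced
print(pick_cluster(SCHOOLS, rng))
```

Because the centroid is drawn uniformly at random from all schools in the district, a centroid in a sparsely populated area simply yields a geographically wider cluster, which is the point made above about the absence of a density bias.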
Power calculations
The minimum effect size the experiment would like to detect is 0.3 to 0.5 standard deviations. The average intergrade difference, identified through similar tests in other countries, is approximately 10 words per minute. The current Grade 2 average is 18 words per minute and the Grade 3 average is 28. We hope, at the very least, to raise students’ reading fluency by one grade level. The standard deviation is about 20 to 30 words per minute. In the EGRA studies done so far, the intraclass correlation (ICC) tends to be around 0.32. We took this into account in determining the number of schools and the number of students per school.
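The arithmetic behind these figures can be checked with a short Python sketch. It uses the standard design-effect formula, 1 + (m - 1) x ICC, and a textbook minimum-detectable-effect formula for comparing two arms of a cluster-randomized trial. Several inputs are assumptions not stated in this note: a two-sided 5% significance level, 80% power, the school treated as the only level of clustering, and no covariate adjustment. The function names are illustrative.

```python
import math
from statistics import NormalDist


def design_effect(students_per_school, icc):
    """Variance inflation from sampling students within schools."""
    return 1 + (students_per_school - 1) * icc


def minimum_detectable_effect(schools_per_arm, students_per_school, icc,
                              alpha=0.05, power=0.80):
    """MDES in standard-deviation units for a two-arm comparison.

    alpha and power are assumed values; the concept note does not state them.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    students_per_arm = schools_per_arm * students_per_school
    deff = design_effect(students_per_school, icc)
    return multiplier * math.sqrt(2 * deff / students_per_arm)


# Target effect: a one-grade gain of 10 words per minute, in SD units.
for sd in (20, 30):
    print(f"SD = {sd}: target effect = {10 / sd:.2f} SD")

# Design from the table: 60 schools per arm, 10 students per grade per school.
print(minimum_detectable_effect(60, 20, 0.32))  # both grades pooled
print(minimum_detectable_effect(60, 10, 0.32))  # a single grade
```

Under these assumptions, a 10 words-per-minute gain corresponds to 0.33 to 0.5 standard deviations, and the pooled-grade design gives a minimum detectable effect of roughly 0.3 standard deviations, which is in line with the range stated above. Different choices of significance level, power, or clustering structure would change the result.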
Description of instruments
To develop the achievement tests, the team reviewed existing assessment instruments, including the Dynamic Indicators of Basic Early Literacy Skills and the Illinois Snapshot of Early Literacy (DIBELS/ISEL), the Peabody Picture Vocabulary Test, and instruments applied in Spain, Peru, Kenya, Mongolia, and India. Based on this review and other expert consultations, a complete Early Grade Reading Assessment was developed for application in English. The resulting instrument contains eight tasks, or subtests. The Liberian assessment tool was based on two foundations: (1) a well-vetted default instrument that has received input from leading international reading experts at various workshops convened by USAID, the World Bank, and RTI; and (2) input from Liberian experts at a workshop carried out in June 2008 (funded by the World Bank). The Liberian EGRA assessment, in the end, has components on orientation to print, phonological awareness, letter-naming fluency, familiar-word fluency, unfamiliar-word fluency, fluency in reading connected text, comprehension based on read text, and listening comprehension. The internal consistency and reliability of this tool were checked using various statistical procedures, and the reliability was found to be good, certainly in the range of other similar assessments used in both developed and developing countries. For example, the alpha coefficient of reliability is above 0.8, which is a good benchmark (0.7 being considered an absolute minimum).
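For readers unfamiliar with the alpha coefficient, the sketch below shows how Cronbach's alpha is commonly computed from a students-by-subtests score table. The scores are invented for illustration and are not EGRA data, and the function name is hypothetical. The note does not describe the team's exact procedure, so this is only the textbook formula, alpha = k/(k - 1) x (1 - sum of subtest variances / variance of total score).

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows, one row of k subtest scores per student."""
    k = len(scores[0])
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented scores for six students on three subtests (illustration only).
example_scores = [
    [12, 10, 3],
    [25, 20, 5],
    [8, 9, 2],
    [30, 27, 6],
    [18, 15, 4],
    [22, 21, 4],
]
print(cronbach_alpha(example_scores))
```

In practice, subtests measured on very different scales are often standardized before alpha is computed; the sketch skips that step for brevity.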
The following three instruments will be administered for the baseline assessment: a student instrument, a teacher instrument, and a principal instrument. The same instruments (i.e., the same format) will be used in the midterm assessment in June 2009. New questions asking respondents about the newly introduced techniques will be included in both the principal and teacher instruments. In the student instrument, the changes will be confined to the content of specific tasks: words will be reshuffled and new ones introduced, letters will be reshuffled, new passages will be written, and so on. In doing so, the project team will take great care to ensure that the instrument used in the midterm assessment is of equal difficulty to the one used in the baseline assessment, so that data collected on the two occasions can be compared as accurately as possible.