Design and Implementation of the Impact Evaluation of ISKUR Training Programs
June 15, 2011
This note describes the design and implementation of the impact evaluation of ISKUR’s vocational training programs. The study is a joint effort by ISKUR and the World Bank aimed at identifying ways to improve vocational training programs in the context of the rapid expansion of these programs in Turkey. In particular, the study is designed to answer the following questions: (1) What is the average impact of ISKUR training on labor market outcomes (as measured by the likelihood and quality of employment)? (2) Which trainees benefit the most from training (in terms of gender, age, level of education and skills, work experience, etc.)? (3) What are the mechanisms through which training affects labor market outcomes (e.g. improved skills, reduced search costs)? (4) What provider characteristics make training most effective?
The impact evaluation study started in the spring of 2010 and is expected to be completed in the spring of 2012. The first phase has been completed, including (i) the selection of provinces (23), training courses (130) and participants (5,700) for the evaluation; and (ii) the collection of baseline data from about 5,300 evaluation participants before the start of the training courses. Baseline data were collected between September 2010 and January 2011, as the training courses were rolled out. Data on the providers of the training courses selected for the evaluation are being collected in June 2011. The final evaluation of ISKUR vocational training programs will be based mainly on the follow-up survey of evaluation participants in early 2012.
The evaluation relies on an experimental approach that exploits the excess demand for ISKUR’s vocational training programs. Training providers interview all eligible and interested candidates. Training slots are then randomly allocated among the best candidates, who are thereby assigned either to the treatment group (those receiving training) or to the control group (those not receiving training). Provided the randomization is successful (i.e. treatment and control groups are statistically equivalent at the outset), the evaluation strategy is simply to compare the labor market outcomes of the treatment and control groups.
The note describes in detail the experimental approach used and the most important steps in the implementation of the impact evaluation. Among other things, it discusses the implementation of the experimental design, the data collection activities and the main changes made to ISKUR’s monitoring and information system. The ultimate goal of this note is to provide hands-on guidance on how to conduct impact evaluations of similar programs in Turkey and elsewhere, particularly in middle-income countries (MICs): this is the first rigorous impact evaluation of a large-scale, publicly provided training program in a MIC, and many MICs face a similar excess demand for vocational training. This note accompanies another note, based on the baseline survey, that analyzes the profile, job-search behavior and expectations of ISKUR vocational trainees.
The rest of this note is organized as follows. Section 2 describes the evaluation design. Section 3 describes the data collection needs, the main counterparts involved and the different steps in the evaluation; in particular, it describes the sampling and the course selection. Section 4 presents a detailed description of the sample, including its geographical coverage, the power calculations, the regional targets for participants and the set of vocational courses. It also discusses when and how the interviews take place. Section 5 discusses how the treatment and control groups were selected, and Section 6 discusses some of the most important changes in ISKUR’s Monitoring and Information System.
1. Evaluation Methodology: Identifying the Treatment and Control Groups
The traditional assessment of the effectiveness of ISKUR trainings, based on whether trainees are placed into jobs, is likely to yield biased estimates. Most ISKUR provincial offices still use the placement rate among trainees as a proxy for the effectiveness of the trainings provided. However, even this measure is often difficult to compute because of the lack of data: ISKUR’s Management Information System (MIS) does not keep up-to-date information on all trainees, nor does it collect information on labor market outcomes after the trainings. Furthermore, to isolate the effects of the trainings from overall labor market conditions, a control group is needed to establish a counterfactual (i.e. what would have happened in the absence of training). This control group must be constructed carefully, because ISKUR trainees are likely not identical to the average unemployed person in Turkey. They meet very specific skills and experience requirements that are usually set ex ante by the training providers, and all trainees must go through an interview with the training providers and local ISKUR staff before being selected for training.[1]
This evaluation study is based on a sample of the best eligible applicants to ISKUR trainings. The sampling frame covers a wide set of provinces where there is large excess demand for the trainings. In particular, the sampling frame includes all registered unemployed individuals who are eligible to take up training (according to ISKUR’s basic criteria, such as being registered with ISKUR) and who are interested in taking it up, i.e. who have applied for it. The training providers then interview this group of eligible and interested individuals and select the best candidates.
The evaluation study is based on an experimental design, comparing the labor market outcomes of treatment and control individuals before and after the trainings. Participation in the trainings is randomly awarded among the sample of the best applicants. The effects of the trainings on labor market outcomes are quantified by comparing the labor market performance of the two groups, treatment and control, before and after the trainings take place. Together, the treatment and control groups make up the evaluation sample.[2] Random assignment to treatment and control groups ensures that the two groups are statistically equivalent in expectation, both in socio-economic background and in motivation. It is also the fairest way to allocate a small number of slots among a larger list of applicants. To ensure that the effects of the trainings can be identified separately for youth and for women, the randomization is stratified by these two characteristics.
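A minimal sketch of stratified random assignment is given below. Beyond what the note states, it assumes that randomization is carried out separately within each course, that roughly half of each course-by-stratum cell goes to treatment (matching the 50/50 split reported for the evaluation sample), and that “youth” means age 24 or under; the `Candidate` record, the age cutoff and the seed are all hypothetical.

```python
import random
from collections import defaultdict
from dataclasses import dataclass

YOUTH_MAX_AGE = 24  # hypothetical cutoff: the note does not define "youth"


@dataclass(frozen=True)
class Candidate:
    """One of the best applicants to a course (hypothetical record)."""
    candidate_id: str
    course_id: str
    female: bool
    age: int


def stratum(c: Candidate) -> tuple:
    # Randomize within course x gender x youth cells (course level is assumed).
    return (c.course_id, c.female, c.age <= YOUTH_MAX_AGE)


def assign(candidates, seed=2010):
    """Return {candidate_id: "treatment" | "control"}, about half per cell."""
    rng = random.Random(seed)  # fixed seed makes the draw reproducible and auditable
    cells = defaultdict(list)
    for c in candidates:
        cells[stratum(c)].append(c)
    result = {}
    for key in sorted(cells):  # fixed cell order, so the seed fully determines the draw
        members = sorted(cells[key], key=lambda c: c.candidate_id)
        rng.shuffle(members)
        n_treat = len(members) // 2
        if len(members) % 2 and rng.random() < 0.5:
            n_treat += 1  # break odd-sized cells at random
        for i, c in enumerate(members):
            result[c.candidate_id] = "treatment" if i < n_treat else "control"
    return result
```

Stratifying before shuffling guarantees that women and young applicants are split almost evenly between the two groups in every course, rather than only on average, which is what allows separate estimates for these subgroups.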
2. Main “Steps” and Counterparts in the Evaluation
The evaluation involves close collaboration among the evaluation team, the World Bank Ankara office, ISKUR central and provincial offices, the survey firm and, to a smaller extent, some municipalities. The field coordinator and researchers are part of the evaluation team. The baseline and follow-up surveys are implemented by an external agency (a private firm), which is responsible for hiring and training enumerators. The agency is given a set of guidelines, and the field coordinator closely monitors data quality. The success of the evaluation depends on strong coordination across all these counterparts.
Figure 1 summarizes the main steps in the implementation of the evaluation. These include: the design of the sample and the selection of provinces and courses; the evaluation pilot and field coordination; the main pre-data-collection activities; the individual randomization, the baseline survey and the delivery of the training programs; and, finally, the data analysis and the planning of follow-up activities. The figure also shows that ISKUR’s monitoring and information system is continuously improved throughout the implementation process, and it lists selected activities within each step.
Figure 1: Process of the Evaluation
Note: Some activities (e.g. maintaining the treatment and control groups) span the entire duration of the evaluation, and some are conducted simultaneously (e.g. preparing the baseline survey and improving ISKUR's IT system).
Box 1: A Workshop on Impact Evaluation: Building capacity and support and working together on the design of the impact evaluation
3. The Sample: Provinces, Courses and Power Calculations
3.1. The Provinces
The evaluation sample covers 23 provinces, which together are representative of labor market conditions across Turkey. Figure 2 shows the final set of 23 provinces included in the sample. Prior to the start of the study, ISKUR identified the provinces with at least two oversubscribed courses during 2009/2010.[3] A course was defined as oversubscribed if it had at least twice as many applicants as training slots. The sample of provinces was stratified by unemployment rate: 20 provinces were randomly selected, 10 with “high” unemployment rates and 10 with “low” rates.[4] The probability of selection was proportional to each province’s share of individuals trained in 2009. Three additional provinces (Antalya, Gaziantep and Diyarbakir) were proposed by ISKUR because of their importance in representing the variety of labor market conditions across Turkey.
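The province-selection rule can be sketched as follows. The oversubscription threshold (at least twice as many applicants as slots), the two-course eligibility rule, the 10/10 split by unemployment and selection proportional to 2009 trainees come from the text; the median cutoff between “high” and “low” unemployment and the sequential weighted draw without replacement are assumptions, since the note does not specify the exact procedure. All record types and function names are hypothetical, and the three provinces added by ISKUR are outside this draw.

```python
import random
from dataclasses import dataclass


@dataclass
class Course:
    applicants: int
    slots: int


@dataclass
class Province:
    name: str
    unemployment_rate: float
    trained_2009: int
    courses: list


def is_oversubscribed(course: Course) -> bool:
    # At least twice as many applicants as training slots.
    return course.applicants >= 2 * course.slots


def is_eligible(p: Province) -> bool:
    # At least two oversubscribed courses in 2009/2010.
    return sum(is_oversubscribed(c) for c in p.courses) >= 2


def pps_without_replacement(provinces, k, rng):
    """Draw k provinces one at a time, each draw weighted by 2009 trainees."""
    pool = list(provinces)
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = [p.trained_2009 for p in pool]
        pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(pick))  # remove so a province is drawn at most once
    return chosen


def select_provinces(provinces, per_stratum=10, seed=1):
    rng = random.Random(seed)
    eligible = [p for p in provinces if is_eligible(p)]
    rates = sorted(p.unemployment_rate for p in eligible)
    cutoff = rates[len(rates) // 2]  # assumed: "high" vs "low" split at the median
    high = [p for p in eligible if p.unemployment_rate >= cutoff]
    low = [p for p in eligible if p.unemployment_rate < cutoff]
    return (pps_without_replacement(high, per_stratum, rng),
            pps_without_replacement(low, per_stratum, rng))
```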
Figure 2: Provinces in the Evaluation Sample
3.2. Power Calculations and Regional Targets for the Participants
The power calculations determined the sample size needed to detect a difference in job placement rates between the treatment and control groups. The outcome indicator used in the power calculations was the employment rate. The calculations also took into account past attrition from the trainings (i.e. from the treatment), based on an analysis of previous drop-out rates from training courses. The assumed attrition rates also allowed for possible non-response to the face-to-face survey.[5] The sample size estimates assumed imperfect take-up of the training programs, a power of 90% and an “effect size” of 6%-7%.[6]
However, the information available in the MIS may not support realistic assumptions about the share of applicants who drop out of the trainings. The regional ISKUR offices and course providers use the information in the MIS to reach out to the registered unemployed and invite them to attend the training programs. When a candidate declines to participate or drops out, individuals on a waitlist are invited to take the place.[7] The MIS records are usually updated from paper records only after these slots have been filled with waitlisted candidates. Therefore, the “real” share of drop-outs cannot easily be retrieved from ISKUR records.
Indeed, the pilot study showed that the drop-out rates initially assumed for the sample were heavily underestimated. The Ankara pilot and the analysis of retrospective data led to more realistic assumptions for (1) the share of trainees dropping out of treatment, which determines the take-up rate in the treatment group, and (2) the share of control individuals getting into the training programs (the take-up rate in the control group). These more realistic assumptions about program take-up, and about contamination of the control group until the end of the evaluation period, led to revisions in the sample size. Table 1 reports the optimal sample size under different assumptions for the effect size, the take-up rates in the treatment and control groups, and the base job placement rate.
Table 1: Optimal Sample Size[8]
Effect size / Treatment take-up / Control take-up / Base job placement rate / Optimal sample size
6% / 85% / 15% / 20% / 5,390
6% / 85% / 20% / 20% / 6,268
7% / 80% / 15% / 20% / 4,645
7% / 80% / 20% / 20% / 5,468
7% / 85% / 15% / 30% / 4,973
7% / 80% / 15% / 30% / 5,745
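For illustration, a textbook approximation for comparing two proportions under imperfect compliance is sketched below. It assumes a two-sided 5% significance level (not stated in the note), treats the effect size as a percentage-point increase in the employment rate of those actually trained, dilutes it by the take-up gap between treatment and control, and inflates the result for 20% attrition. Because the note does not report the exact formula behind Table 1, this sketch gives sample sizes of the same order but does not reproduce Table 1 exactly; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def total_sample_size(effect, takeup_t, takeup_c, base_rate,
                      power=0.90, alpha=0.05, attrition=0.20):
    """Total sample (both arms, 50/50 split) needed to detect the training effect.

    effect     -- assumed effect of training on the employment rate of those trained
    takeup_t   -- share of the treatment group that actually takes the training
    takeup_c   -- share of the control group that gets trained anyway (contamination)
    base_rate  -- employment (job placement) rate without training
    """
    itt = effect * (takeup_t - takeup_c)  # effect diluted by imperfect take-up
    p0, p1 = base_rate, base_rate + itt
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_per_arm = z ** 2 * (p0 * (1 - p0) + p1 * (1 - p1)) / itt ** 2
    return ceil(2 * n_per_arm / (1 - attrition))  # inflate for survey attrition


if __name__ == "__main__":
    # Scenarios from Table 1: effect, treatment take-up, control take-up, base rate.
    for row in [(0.06, 0.85, 0.15, 0.20), (0.06, 0.85, 0.20, 0.20),
                (0.07, 0.80, 0.15, 0.20), (0.07, 0.80, 0.20, 0.20),
                (0.07, 0.85, 0.15, 0.30), (0.07, 0.80, 0.15, 0.30)]:
        print(row, total_sample_size(*row))
```

The sketch shows the same qualitative pattern as Table 1: higher control take-up (more contamination) shrinks the detectable intention-to-treat difference and so raises the required sample, while a larger effect size lowers it.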
Under reasonable assumptions, namely an 80%-85% take-up in the treatment group, a 15%-20% take-up in the control group, sample attrition of 20% and a base job placement rate of 20%-30%, the optimal sample size is approximately 5,700 individuals, with 50% assigned to the treatment group and 50% to the control group. The geographical composition of the sample replicates the importance of each province in the national provision of training: the number of evaluation participants in each province depends on that province’s share of vocational trainees in 2009. A minimum of 100 participants per province was set to benefit from economies of scale in implementation and data collection; because of this minimum, the provincial targets sum to 5,925, slightly above 5,700. Table 2 reports the target number of participants in each province together with its number of trainees in 2009, and Figure 3 reports the number of individuals actually interviewed in the baseline survey.
Table 2: Sample Size by Province
No / Province / Number of trainees in 2009 / Target number of evaluation participants
1 / ANKARA / 3,344 / 308
2 / ANTALYA / 1,771 / 163
3 / BAYBURT / 249 / 100
4 / DENIZLI / 1,420 / 131
5 / DIYARBAKIR / 450 / 100
6 / DUZCE / 702 / 100
7 / ELAZIG / 1,204 / 111
8 / ERZURUM / 836 / 100
9 / ESKISEHIR / 2,450 / 226
10 / GAZIANTEP / 751 / 100
11 / HATAY / 2,889 / 266
12 / ISPARTA / 1,152 / 106
13 / ISTANBUL / 19,208 / 1,771
14 / IZMIR / 1,971 / 182
15 / KAYSERI / 3,067 / 283
16 / KIRIKKALE / 1,329 / 123
17 / KOCAELI / 7,519 / 693
18 / MANISA / 3,547 / 327
19 / MUS / 1,938 / 179
20 / SAKARYA / 1,208 / 111
21 / TEKIRDAG / 1,709 / 158
22 / TRABZON / 1,628 / 150
23 / USAK / 1,484 / 137
TOTAL / 61,826 / 5,925
Source: Author’s calculations.
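The targets in Table 2 appear to follow a simple rule: allocate 5,700 participants in proportion to each province’s 2009 trainees, round to the nearest integer, and raise any province below 100 to the minimum of 100. The sketch below applies that rule to the Table 2 figures and reproduces every provincial target and the total of 5,925. The rule is inferred from the table rather than stated verbatim in the note, and the function name is hypothetical.

```python
# Trainees in 2009 by province, as reported in Table 2.
TRAINEES_2009 = {
    "ANKARA": 3344, "ANTALYA": 1771, "BAYBURT": 249, "DENIZLI": 1420,
    "DIYARBAKIR": 450, "DUZCE": 702, "ELAZIG": 1204, "ERZURUM": 836,
    "ESKISEHIR": 2450, "GAZIANTEP": 751, "HATAY": 2889, "ISPARTA": 1152,
    "ISTANBUL": 19208, "IZMIR": 1971, "KAYSERI": 3067, "KIRIKKALE": 1329,
    "KOCAELI": 7519, "MANISA": 3547, "MUS": 1938, "SAKARYA": 1208,
    "TEKIRDAG": 1709, "TRABZON": 1628, "USAK": 1484,
}


def allocate_targets(trainees, total=5700, minimum=100):
    """Proportional allocation with a per-province floor (rule inferred from Table 2)."""
    grand_total = sum(trainees.values())
    return {
        province: max(minimum, round(total * n / grand_total))
        for province, n in trainees.items()
    }


if __name__ == "__main__":
    targets = allocate_targets(TRAINEES_2009)
    for province, target in targets.items():
        print(f"{province:<12}{target:>6}")
    print(f"{'TOTAL':<12}{sum(targets.values()):>6}")  # 5925
```

Five provinces (Bayburt, Diyarbakir, Duzce, Erzurum and Gaziantep) are lifted to the floor of 100, which is why the total exceeds the 5,700 implied by the power calculations.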
Figure 3: Provincial Breakdown of the Evaluation Sample at Baseline
3.3. The Selection of the Vocational Courses
The evaluation team worked closely with the regional ISKUR offices to determine the final set of courses to be included in the evaluation. The sample was finalized by the ISKUR provincial directorates in August 2010. The evaluation sample includes 130 courses. Course capacities range from 12 to 100 trainees, although most courses in the sample have a capacity of 20 to 25 trainees. Several criteria guided the final selection of courses:
- Type of vocation: Although ISKUR provides a wide set of training programs, including general vocational training programs, job-guaranteed internships and entrepreneurship courses, the evaluation focuses only on general vocational training programs.[9]
- Oversubscription: The evaluation focuses only on training courses that are likely to be oversubscribed. Oversubscription is what makes random selection into treatment and control possible: without more qualified applicants than slots, there would be no one left to form a control group. Therefore, only the most popular and most in-demand courses are selected.
- Course capacity: The evaluation prioritizes larger courses to simplify the implementation of the randomization and the data collection activities.
- Course providers: The evaluation prioritizes diversity in training providers for the same vocation. This will enable an analysis of training effectiveness across providers of different quality offering the same course. A combination of public and private providers is also explicitly targeted.[10]
- Timing of the courses: The evaluation sample covers courses that start between October and December 2010. Preference was given to groups of courses starting at approximately the same time and ending before the end of March 2011.[11] Clustering the start dates generated some savings in the baseline survey, which is conducted face-to-face. The end-date requirement facilitates the collection of follow-up data at approximately the same time after the trainings have been completed.
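The screening criteria above can be expressed as a simple filter. The sketch below is illustrative only: the `CandidateCourse` record and its fields are hypothetical, oversubscription is proxied by expected applicants of at least twice the capacity (the definition used for province selection), and provider diversity, which requires judgment across courses, is not modeled.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class CandidateCourse:
    """Hypothetical record for a course proposed by a provincial office."""
    course_id: str
    program_type: str        # e.g. "general vocational", "internship", "entrepreneurship"
    expected_applicants: int
    capacity: int
    start: date
    end: date


def meets_criteria(c: CandidateCourse) -> bool:
    return (
        c.program_type == "general vocational"                   # type of vocation
        and c.expected_applicants >= 2 * c.capacity             # likely oversubscribed
        and date(2010, 10, 1) <= c.start <= date(2010, 12, 31)  # starts Oct-Dec 2010
        and c.end <= date(2011, 3, 31)                           # ends by end of March 2011
    )


def shortlist(courses):
    """Eligible courses, largest first (larger courses simplify randomization)."""
    return sorted((c for c in courses if meets_criteria(c)),
                  key=lambda c: c.capacity, reverse=True)
```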
The evaluation team worked closely with all provincial offices to plan and advertise the trainings well in advance. In particular, the ISKUR general directorate sent all participating provincial offices an official letter explaining the selection criteria, and the importance of oversubscription, in detail. Provincial offices were also encouraged and supported to plan and announce courses as soon as possible after they were selected. Moreover, because of the risk of insufficient oversubscription, the provincial offices were asked to select a set of reserve courses. Each province provided at least one reserve course, with interviews scheduled later than those for the evaluation courses.[12] Some provinces anticipated more difficulty than others in achieving oversubscription; for example, Istanbul, Kayseri, Denizli and Gaziantep anticipated more oversubscription problems than other provinces. ISKUR headquarters periodically contacted the heads of provincial offices and supported them in advertising the courses more widely to reduce the risk of undersubscription.
The final set of courses included in the evaluation is quite similar to the most popular vocations in previous years. Figure 4 shows the final composition of trainings in the sample. The ISKUR provincial offices were asked to submit a set of proposed courses during the summer of 2009. After the initial proposals were received from all 23 provinces, the provincial course capacities were compared with the provincial targets (reported in Table 2 above), and provinces that did not meet their initial target numbers and/or did not propose any reserve courses were contacted. Tables A3, A4 and A5 in the annex show the trainings with the largest numbers of trainees in 2006, 2007 and 2008, respectively, and Table A6 lists the trainings with the largest numbers of trainees between January and November 2009.
Figure 4: The Set of Vocations included in the Evaluation Sample
3.4. Scheduling the Interviews
The interviews for the trainings were scheduled so that a member of the evaluation team could be physically present to monitor the quality of the data. Provinces were grouped according to their geographical proximity, and the interviews for each group were set on nearby dates. Within each group, the interviews were scheduled as close together as possible, although in practice this was difficult to achieve. The team tried to attend at least the first interviews in each group to identify and solve the main problems early on. All courses were scheduled with a minimum of 12 days between the interview and the start of the course. Although a longer interval is better for data collection, it may increase attrition from the treatment: for example, candidates who do not hear the result for a long time may become discouraged, apply to another course or lose motivation. As a result, the optimal interval between the interview date and the announcement of the trainee list is 10 to 15 days. The trainee lists must be announced 2 weekdays before the starting date of the course. Special attention was given to scheduling interviews so as to promote oversubscription of the courses. Below we highlight some important steps taken by the team regarding:
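The timing rules above can be checked mechanically for each course. The sketch below assumes that “2 weekdays before” means two Monday-to-Friday days before the start date, ignoring public holidays; the function names are hypothetical.

```python
from datetime import date, timedelta


def weekdays_before(day: date, n: int) -> date:
    """Date that lies n weekdays (Mon-Fri) before `day`; holidays are ignored."""
    d = day
    while n > 0:
        d -= timedelta(days=1)
        if d.weekday() < 5:  # Monday=0 ... Friday=4
            n -= 1
    return d


def check_schedule(interview: date, course_start: date) -> dict:
    announcement = weekdays_before(course_start, 2)  # trainee list 2 weekdays before start
    return {
        "announcement": announcement,
        "min_gap_ok": (course_start - interview).days >= 12,
        "announcement_gap_ok": 10 <= (announcement - interview).days <= 15,
    }


if __name__ == "__main__":
    # Interview on Friday 15 Oct 2010 for a course starting Monday 1 Nov 2010:
    # the list is announced Thursday 28 Oct, 13 days after the interview.
    print(check_schedule(date(2010, 10, 15), date(2010, 11, 1)))
```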