When Can Experimental Evidence Mislead?

A Re-Assessment of Canada’s Self Sufficiency Project*

Chris Riddell

ILR School

Cornell University

and

W. Craig Riddell

Vancouver School of Economics

University of British Columbia

September 2015

Preliminary and Incomplete

* We thank John Abowd, David Card, David Green, Josh Gottlieb, Hilary Hoynes, Jesse Rothstein and Chris Walters for helpful comments. We also benefitted from comments of seminar participants at Berkeley, Santa Barbara, UBC, Brandeis, Minnesota, IZA/OECD/World Bank Social Safety Nets workshop in Paris, CLRSN workshop in Toronto, the IZA/IFAU Labour Market Policy Evaluation workshop in Uppsala, and the J-PAL Field Experiments and Social Policy conference in Paris.

  1. Introduction

The increased emphasis on obtaining credible evidence has resulted in much greater use of randomized experiments in economics. Random assignment ensures that the treatment and control groups are statistically indistinguishable at the baseline. Thus the behavior of the control group provides an unbiased estimate of the counterfactual behavior of the treatment group and any difference in outcomes between treatments and controls can be attributed to the causal effects of the intervention. A further advantage is that experimental impact estimates are simple – often differences in mean outcomes between treatments and controls – and easily understood by experts and non-experts alike.

However, social experiments have limitations, some of which may affect the internal validity of the experimental evidence (Heckman and Smith, 1995; Heckman, LaLonde and Smith, 1999). Non-random attrition can result in treatment and control groups that differ, even though the two groups had very similar characteristics at the baseline. Those assigned to the control group may obtain services similar to those provided to the treatment group, giving rise to “substitution bias” that may lead to under-estimating the impact of the intervention (Heckman, Hohmann, Khoo and Smith, 2002).

In this paper we illustrate another potential problem with social experiments – sometimes referred to as ‘contamination’ – that influences the interpretation of experimental results. Randomization ensures that the treatment and control groups are statistically equivalent at the baseline. However, once treatment begins the characteristics of the experimental groups will generally diverge. For example, in a randomized drug trial the health of the treatment group will improve relative to that of the control group if the treatment is effective. Subsequent events such as changes to the economic or policy environment may exert different impacts on the two groups. In such circumstances the behavior of the control group may no longer provide an appropriate counterfactual for the altered treatment group. Different experimental estimates could have been obtained if the social, economic or policy environment had evolved differently – raising questions about the external validity of the experiment. Because of the credibility associated with random assignment, there is a risk that experimental evidence may be interpreted too broadly or literally, rather than viewed as conditional on the evolution of events during and after the experiment.

This paper argues that the conclusions that have been reached on the basis of a well-known welfare-to-work experiment – the Self-Sufficiency Project – need to be re-assessed because of policy changes that took place during the SSP demonstration. The SSP was carried out in the 1990s in two Canadian provinces – British Columbia and New Brunswick. The experimental sample consisted of single parents who were long-term welfare recipients. The treatment was a generous but time-limited earnings supplement provided to treatment group members who left welfare and took up full-time work. One objective of the SSP was to rigorously test whether “making work pay” would lead to significant reductions in welfare use and increases in labor force participation among this population. A second objective was to investigate whether temporary financial incentives can lead to lasting reductions in welfare use. The results of the SSP Demonstration were striking. The financial incentive resulted in large impacts during the supplement period. However, treatment-control differences in employment and welfare receipt gradually faded and not long after the supplement period ended there were no significant treatment-control differences in employment and welfare receipt. The absence of lasting impacts of a generous but temporary earnings supplement reduced enthusiasm for this approach to welfare reform.

However, important developments took place in both provinces – events that call for a re-assessment of the SSP experimental findings. During the SSP Demonstration, BC introduced a major ‘work first’ welfare reform that made continuing receipt more difficult and created financial incentives to work. In NB another welfare-to-work program – New Brunswick Works – operated at the same time as SSP, and members of the SSP experimental sample were eligible for and did participate in this alternative program. Participation in NB Works implied leaving welfare, and was more common among the SSP control group than among the treatment group. We show that the developments in both provinces raise questions about the interpretation of the experimental estimates. For BC we estimate the impacts that the SSP treatment would have had in a stable policy environment. In the case of NB we adjust the experimental estimates for participation in NB Works. In both provinces this re-assessment leads to significant changes in the lessons previously reached on the basis of the SSP demonstration.

A key feature of the SSP was that intake and initial random assignment were staggered over time. The BC welfare reforms therefore affected intake cohorts at different stages of their SSP treatment. We use this variation to estimate the impacts of the policy changes and to simulate what would have occurred in the absence of the reforms. Similarly, SSP intake cohorts in NB were affected in different ways by the NB Works program, and we use this variation to estimate the impact of the financial incentive in the absence of NB Works.

The paper makes three contributions. First, we show that time-limited earnings supplements that make work pay can have lasting effects on welfare use and labor force participation, a result of interest for welfare policy and more generally for earnings supplements for low income workers. Second, we provide a reminder that experimental findings need to be interpreted with care. Events that occur during or after the experiment may affect the validity of the experimental estimates. Finally, for the design of social experiments, our study illustrates the value of staggering entry over time. Doing so may allow identification of treatment effects in the event of unanticipated changes in the economic or policy environment during or after the experiment.

The paper is organized as follows. The next section provides examples in which the validity of a randomized trial is threatened by events that take place after treatment has begun. Section three describes the SSP experiment and its key findings. Section four summarizes the BC welfare reforms enacted during the SSP, and estimates the impacts that the SSP earnings supplement would have had in the absence of these reforms. The fifth section deals with the NB Works program and its effects on the experimental estimates. The final section concludes.

  2. Contamination in Social Experiments

Consider a randomized drug trial to test a medication intended to lower blood pressure. If the drug is effective, once treatment is underway the control group will have a larger fraction of members with elevated blood pressure and the treatment group will contain a greater proportion of individuals with normal levels. Now suppose nutritional scientists discover a new ‘superfood’ that reduces blood pressure, and is particularly effective in lowering blood pressure from the ‘high’ range to the ‘normal’ range. Both experimental groups have access to this food. Even if both groups increase consumption of this superfood to the same extent, the impact on blood pressure will be greater for the control group than the treatment group. In most circumstances the behavior of the control group will no longer provide an appropriate counterfactual for the impact of the medication being studied. The experimental estimates reflect a combination of two treatments: the blood pressure medication and the discovery of the superfood and its properties. The experimental groups were randomly assigned for the blood pressure medication, but are not ‘as good as randomly assigned’ for the second treatment. The experimental estimates are unlikely to provide an unbiased estimate of the impact of the blood pressure medication alone.

As a second example, consider a randomized trial of a chemotherapy treatment carried out in a hospital or other institutional setting. We assume that the chemotherapy is effective, so in the absence of contamination the survival rate of the treatment group would exceed that of the controls. However, during the chemotherapy treatment an outbreak of C. difficile occurs in the institution. An unfortunate side effect of chemotherapy is a compromised immune system; as a consequence more treatment group members die from C. difficile than do control group members. Because of this contamination the experimental estimates will not yield an unbiased estimate of the impacts on survival of the chemotherapy treatment.

Figure 1 illustrates the potential for contamination in the context of welfare-to-work initiatives evaluated with an experimental design. The factors that influence the probability of leaving welfare are combined into an index referred to as “job readiness.” Those with higher values of job readiness are more likely to exit welfare. At the baseline t0, when random assignment takes place, all members of the experimental sample are receiving welfare. Half of the experimental sample is randomly assigned to the treatment group, which receives an intervention that provides an incentive to leave welfare and enter the workforce. The remainder is randomly assigned to the control group. Treatment status is independent of the observed and unobserved characteristics of both groups at time t0. Thus the distributions of job readiness in the treatment and control groups are identical (see the top panel in Figure 1).

For the purposes of exposition we assume that the treatment is effective, so that the welfare exit rate of the treatment group exceeds that of the control group in the post-baseline period. Subsequently, at time t* > t0, the existing welfare policy is changed in a way that encourages recipients to exit welfare and enter the workforce. This change can be thought of as a second treatment, the first being the incentive offered to the treatment group and the second being the new policy that applies to members of both the treatment and control groups who remain on welfare at time t* (and to recipients not in the experimental sample). However, while the initial incentive treatment was independent of the characteristics of the treatment and control groups, the second “treatment” (the policy change) is not independent of the characteristics of the two groups. In particular, at time t*, a smaller proportion of the treatment group than of the control group remains on welfare, and those in the treatment group who still receive welfare have a lower average propensity to exit. This is illustrated in the bottom panel in Figure 1, where the area to the left of the vertical line indicates the fraction of each group that remains on welfare at t*. At time t* the control group has more job-ready welfare recipients than does the treatment group. Thus the policy change introduced at t* is likely to have a larger impact on the exit rate of the control group than on that of the treatment group. In these circumstances the behavior of the control group after t* may no longer provide an appropriate counterfactual.[1]
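The mechanism can be illustrated with a minimal numerical sketch. It assumes a simple threshold model that is not taken from the paper: job readiness is spread evenly over 100 individuals in each group, and a recipient leaves welfare once readiness reaches a cutoff. The cutoff values (80 for controls, 60 under the incentive, 50 under the reform) and all function names are hypothetical, chosen only for illustration.

```python
# Stylized threshold model of contamination; all numbers are hypothetical.
READINESS = list(range(100))  # identical job-readiness distribution in both groups at t0

CONTROL_CUTOFF = 80    # controls leave welfare once readiness reaches this value
INCENTIVE_CUTOFF = 60  # the incentive lowers the cutoff for the treatment group
REFORM_CUTOFF = 50     # the reform at t* imposes this cutoff on everyone still on welfare


def percent_on_welfare(cutoff):
    """Percentage of a group whose readiness lies below the exit cutoff."""
    return 100 * sum(r < cutoff for r in READINESS) // len(READINESS)


def welfare_rates(reform):
    """Return (control, treatment) welfare receipt in percent, with or without the reform."""
    control_cutoff = min(CONTROL_CUTOFF, REFORM_CUTOFF) if reform else CONTROL_CUTOFF
    treatment_cutoff = min(INCENTIVE_CUTOFF, REFORM_CUTOFF) if reform else INCENTIVE_CUTOFF
    return percent_on_welfare(control_cutoff), percent_on_welfare(treatment_cutoff)


control_stable, treatment_stable = welfare_rates(reform=False)
control_reform, treatment_reform = welfare_rates(reform=True)

impact_stable = control_stable - treatment_stable      # effect of the incentive alone
impact_observed = control_reform - treatment_reform    # experimental contrast after t*
reform_effect_control = control_stable - control_reform
reform_effect_treatment = treatment_stable - treatment_reform

print(impact_stable, impact_observed, reform_effect_control, reform_effect_treatment)
```

In this stylized case the incentive alone lowers welfare receipt by 20 percentage points, yet the observed treatment-control difference after the reform is zero, because the reform moves 30 points of the control group but only 10 points of the treatment group off welfare.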

In the next two sections we present evidence that strongly suggests that this type of contamination occurred in the SSP demonstration.

  3. Welfare Reform and the Self-Sufficiency Project

A frequent criticism of welfare programs is that they provide little incentive for recipients to seek employment. Under many such programs, recipients who enter the workforce are required to forgo benefit payments by the amount of their labor market earnings – implying that earnings are taxed at a rate of 100%. The implicit tax rate may even exceed 100% if, for example, those leaving welfare are no longer eligible for medical benefits or subsidized housing.
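The arithmetic behind the implicit tax rate can be sketched as follows. The function name and the dollar amounts are hypothetical and serve only to illustrate the calculation; they are not drawn from any actual welfare schedule.

```python
def implicit_tax_rate(earnings, benefit_reduction, lost_in_kind=0.0):
    """Share of labor market earnings offset by lost cash and in-kind benefits."""
    if earnings <= 0:
        raise ValueError("earnings must be positive")
    return (benefit_reduction + lost_in_kind) / earnings


# Hypothetical case: benefits fall dollar for dollar with 800 of monthly earnings.
dollar_for_dollar = implicit_tax_rate(earnings=800, benefit_reduction=800)

# Same case, but leaving welfare also costs in-kind benefits valued at 200 (assumed).
with_lost_benefits = implicit_tax_rate(earnings=800, benefit_reduction=800, lost_in_kind=200)

print(dollar_for_dollar, with_lost_benefits)
```

A dollar-for-dollar benefit reduction gives a rate of 1.0 (100%), and the assumed loss of in-kind benefits raises it to 1.25 (125%).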

Several reforms have been proposed to deal with this incentive problem. One strategy is to raise the market wage of recipients through training and employment programs, thus making work more attractive relative to welfare. Another approach improves work incentives by reducing the implicit tax rate on market earnings. Examples of this approach include the negative income tax, earnings disregards, and income supplementation policies such as the Working Income Tax Benefit in Canada and the Earned Income Tax Credit in the U.S.[2] A third strategy attempts to alter the preferences of recipients, either by raising the stigma associated with welfare receipt or enhancing the perceived value of work.

Some policies combine elements of two or more of these approaches. An interesting example is a temporary earnings supplement for welfare recipients who enter the workforce. During the period the supplement is in place, this policy has the work incentive features of many income supplementation schemes. Labor market earnings are implicitly taxed at a rate less than 100% and program participants receive income (market earnings plus the supplement) that exceeds welfare benefits. By encouraging recipients to leave welfare and enter the workforce for at least the period of the supplement, former welfare recipients may gain work experience and enhance their skills, thus raising their market earnings. The experience of working for an extended period of time may also alter individuals' preferences between welfare and work. As a result of enhanced earnings capacity and/or altered preferences toward work, a temporary financial incentive may have lasting effects on welfare receipt and labor force participation.

During the 1990s the Government of Canada funded an innovative demonstration project, the SSP, designed to provide evidence on the effects of a financial incentive on long-term welfare recipients.[3] The SSP demonstration was carried out in British Columbia and New Brunswick, and focused on single parents with dependent children who had been on income assistance (IA)[4] for at least 12 of the previous 13 months.[5] Among those who agreed to participate, one-half were randomly assigned to the treatment or program group that was eligible for the earnings supplement; the rest were assigned to the control group. Random assignment took place between February 1992 and November 1995. Those in the treatment group were offered a financial incentive to leave welfare and take up full-time employment.[6] The financial incentive was generous, approximately doubling income from work for the typical participant and providing total income substantially higher than welfare benefits.

The SSP demonstration incorporated two important time limits. Members of the program group were given up to 12 months following random assignment to obtain full-time employment. Once they had qualified, participants could continue to receive the supplement for three years providing they maintained full-time employment. Those in the control group could remain on welfare or enter the workforce. Card and Hyslop (2005) show that the two SSP time limits generated an “establishment” incentive to find a full-time job and exit welfare within 12 months after random assignment, and an “entitlement” incentive to choose work over welfare once eligibility was established.

A key objective of the SSP Demonstration was to determine whether financial incentives lead to reductions in welfare use among long-term IA recipients, and whether the magnitudes of program impacts on IA use and employment are sufficient to support this approach to welfare reform. Another key objective was to test whether a temporary financial incentive could have lasting effects on welfare receipt and work activity. The potential receipt of a substantial earnings supplement for up to three years was intended to provide such a test.

The experimental findings are summarized in the SSP Final Report (Michalopoulos et al., 2002). Because of staggered entry, experimental impacts are typically reported in “SSP time” or time since baseline; we show the behavior of income assistance rates on this basis in the Appendix. More than one-third of the treatment group obtained full-time employment and qualified for the earnings supplement. During the eligibility period, the treatment group experienced substantial gains in employment and earnings and reduced welfare use relative to the control group. The largest impacts were observed during the first 12-15 months following random assignment. After this time the differences in outcomes between the treatment and control groups gradually narrowed. By the end of the three-year period of supplement eligibility, treatment-control differences in employment, earnings and welfare receipt were small. In particular, by the 54-month point – by which time the three-year supplement period had ended for all eligible participants – there was no difference in full-time employment rates, part-time employment rates and average earnings between the two experimental groups (Michalopoulos et al., 2002, chapter 3). Similarly, treatment-control differences in income assistance receipt had faded to zero by month 69 (Card and Hyslop, 2005).