Reading: Wooldridge, Chapter 6.3

Stat 521: Notes 10.

I. Difference-in-difference design

A data structure that is useful for a variety of purposes, including policy analysis, is pooled cross sections over time. The idea is that during each year a new random sample is taken from the relevant population.

Natural experiments involving pooled cross sections over time can be used for policy analysis.

In the difference-in-difference design, we have two time periods, say year 1 and year 2. There are also two groups, which we will call a control group and a treatment group. In year 1, neither group receives the treatment. In year 2, only the treatment group receives the treatment. In a natural experiment, people (or firms, cities, and so on) find themselves in the treatment group essentially by accident.

Examples:

1. David Card (1990). The Mariel boatlift caused an increase in the low-educated labor supply in Miami.

Card compares labor market outcomes in Miami before and after the boatlift with labor market outcomes in comparable cities before and after the boatlift.

Groups: Individuals in Miami vs. individuals in other cities.

Treatment: Inflow of labor.

2. Eissa and Liebman (1996).

Eissa and Liebman evaluate a tax reform. Only some groups are affected by the tax change. They compare changes in outcomes for the groups affected by the reform with changes in outcomes for groups not affected by it.

Groups: individuals affected by the reform (defined partly by spousal income) versus individuals not affected.

Treatment: change in tax rate

3. Jin and Leslie (2001).

Jin and Leslie evaluate the effect of an information disclosure law on restaurant profitability and sales: restaurants in LA County must post hygiene score cards in their windows.

Groups: restaurants in LA County versus restaurants in neighbouring counties.

Treatment: information disclosure requirement

4. Meyer, Viscusi and Durbin (1995).

Meyer, Viscusi and Durbin (MVD) evaluate the effect of an increase in disability payments. The increase was effected through a change in the maximum disability payment. The change applies only to workers with high earnings (who would otherwise hit the maximum), not to low earners.

Groups: high and low earners

Treatment: change in disability payment

To formalize the discussion, call A the control group and B the treatment group; the dummy variable dB is 1 for those in the treatment group and 0 otherwise. Letting d2 denote a dummy variable for the second (post-policy change) time period, the simplest equation for analyzing the impact of the policy change is:

$$y = \beta_0 + \delta_0\, d2 + \beta_1\, dB + \delta_1\, d2 \cdot dB + u \qquad (1.1)$$

The period dummy d2 captures aggregate factors that affect y over time in the same way for both groups. The presence of dB by itself captures possible differences between the treatment and control groups before the policy change occurs. The coefficient of interest, $\delta_1$, multiplies the interaction $d2 \cdot dB$ (which is simply a dummy variable equal to unity for those observations in the treatment group in the second year).

The OLS estimator $\hat{\delta}_1$ has a very interesting interpretation. Let $\bar{y}_{A,1}$ denote the sample average of y for the control group in the first period and let $\bar{y}_{A,2}$ denote the sample average for the control group in the second period. Let $\bar{y}_{B,1}$ and $\bar{y}_{B,2}$ denote the corresponding sample averages for the treatment group. Then $\hat{\delta}_1$ can be expressed as

$$\hat{\delta}_1 = (\bar{y}_{B,2} - \bar{y}_{B,1}) - (\bar{y}_{A,2} - \bar{y}_{A,1}).$$

This estimator has been labeled the difference-in-differences (DID) estimator.

To see how effective $\hat{\delta}_1$ is for estimating policy effects, we can compare it with some alternative estimators. One possibility is to ignore the control group completely and use the change in the mean over time for the treatment group, $\bar{y}_{B,2} - \bar{y}_{B,1}$, to measure the policy effect. The problem with this estimator is that the mean response can change over time for reasons unrelated to the policy change. Another possibility is to ignore the first time period and compute the difference in means for the treatment and control groups in the second time period, $\bar{y}_{B,2} - \bar{y}_{A,2}$. The problem with this pure cross-section approach is that there might be systematic, unmeasured differences in the treatment and control groups that have nothing to do with the treatment; attributing the difference in averages to a particular policy might be misleading.
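The comparison of estimators can be checked numerically. The following is a minimal sketch on simulated data, not the MVD sample: the sample size, the seed, and the "true" coefficient values are all assumptions chosen for illustration. It fits the regression with a period dummy, a group dummy, and their interaction, and then recomputes the interaction coefficient from the four cell means, alongside the two alternative estimators discussed above.

```r
# Simulated pooled cross sections (hypothetical data, for illustration only)
set.seed(521)
n  <- 4000
d2 <- rbinom(n, 1, 0.5)   # 1 = second (post-policy) period
dB <- rbinom(n, 1, 0.5)   # 1 = treatment group B, 0 = control group A
# Assumed true values: intercept 1, period effect 0.3, group effect 0.5,
# policy effect (interaction) 0.2
y   <- 1 + 0.3 * d2 + 0.5 * dB + 0.2 * d2 * dB + rnorm(n)
sim <- data.frame(y, d2, dB)

# OLS estimate of the coefficient on the interaction d2*dB
fit        <- lm(y ~ d2 + dB + d2:dB, data = sim)
delta1_ols <- unname(coef(fit)["d2:dB"])

# Four cell means: rows index the group (0 = A, 1 = B),
# columns index the period (0 = period 1, 1 = period 2)
ybar <- with(sim, tapply(y, list(group = dB, period = d2), mean))

# Difference-in-differences of the cell means
delta1_did <- (ybar["1", "1"] - ybar["1", "0"]) -
              (ybar["0", "1"] - ybar["0", "0"])

# Alternative estimators
before_after  <- ybar["1", "1"] - ybar["1", "0"]  # ignores the control group
cross_section <- ybar["1", "1"] - ybar["0", "1"]  # ignores the first period

all.equal(delta1_ols, delta1_did)
c(did = delta1_did, before_after = before_after, cross_section = cross_section)
```

Under these assumed values the before-after estimator also absorbs the period effect and the cross-section estimator also absorbs the group effect, so both should land well away from the assumed policy effect of 0.2, while the DID estimate should be close to it.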

By comparing the time changes in the means for the treatment and control groups, both group-specific and time-specific effects are allowed for. The main threat to the difference-in-difference design producing an unbiased estimate of the policy’s effect is an interaction between treatment-group membership and the second time period that is due to something other than the policy. For example, changes in other state laws or macroeconomic conditions are not likely to always influence all groups in the same way. A recession may have a disproportionate effect on one income group compared to another, or in one state compared to another. A situation favorable to the difference-in-difference design is one in which the control group is very similar to the treatment group, so that such interactions are less likely.
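To make this threat concrete, here is a small simulation sketch. Everything in it is hypothetical: the policy effect (0.2) and a second-period shock that hits only the treatment group for reasons unrelated to the policy (-0.4) are assumed values. Because the shock enters through the same group-by-period interaction as the policy, the DID estimate recovers their sum rather than the policy effect alone.

```r
# Hypothetical illustration of a group-specific shock in period 2
set.seed(10)
n  <- 20000
d2 <- rbinom(n, 1, 0.5)   # 1 = second period
dB <- rbinom(n, 1, 0.5)   # 1 = treatment group

policy_effect <- 0.2      # assumed true effect of the policy
other_shock   <- -0.4     # assumed shock to group B in period 2, unrelated to the policy

# The shock and the policy both load on the d2*dB interaction
y <- 1 + 0.3 * d2 + 0.5 * dB + (policy_effect + other_shock) * d2 * dB + rnorm(n)

fit <- lm(y ~ d2 * dB)
did <- unname(coef(fit)["d2:dB"])

# The estimate targets policy_effect + other_shock, not policy_effect
c(did_estimate = did, policy_effect = policy_effect,
  policy_plus_shock = policy_effect + other_shock)
```

Nothing in the data distinguishes the two components, which is why the choice of a control group that would plausibly have experienced the same period-2 conditions matters so much.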

In most applications, additional covariates appear in (1.1); for example, characteristics of unemployed people or housing characteristics. These account for the possibility that the random samples within a group have systematically different characteristics in the two time periods. The OLS estimator of the interaction coefficient $\delta_1$ no longer has the simple difference-in-differences representation, but its interpretation is essentially unchanged.
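One way to write the extended equation is sketched below. The notation is an assumption introduced here for illustration: $x$ is a row vector of covariates, $\gamma$ is the corresponding coefficient vector, and the covariates are assumed to enter linearly.

```latex
y = \beta_0 + \delta_0\, d2 + \beta_1\, dB + \delta_1\, d2 \cdot dB + x\gamma + u
```

The coefficient on the interaction $d2 \cdot dB$ is still the parameter of interest; the covariates only adjust for differences in sample composition across the four group-by-period cells.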

Example: Meyer, Viscusi and Durbin (1995) (hereafter MVD) study the length of time (in weeks) that an injured worker receives workers’ compensation. On July 15, 1980, Kentucky raised the cap on weekly earnings that were covered by workers’ compensation. An increase in the cap has no effect on the benefit for low-income workers, but it makes it less costly for a high-income worker to stay on workers’ compensation. Therefore, the control group is low-income workers and the treatment group is high-income workers; high-income workers are defined as those for whom the pre-policy cap on benefits is binding. Using random samples both before and after the policy change, MVD are able to test whether more generous workers’ compensation causes people to stay out of work longer (everything else fixed). The data are in injury.csv. MVD start with a difference-in-difference analysis using log(duration) as the outcome variable.

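# Read the data; attach() makes the columns visible to lm() by name.
mvddata=read.table("injury.csv",sep=",",header=TRUE);

attach(mvddata);

# afchnge = after-change dummy (d2), highearn = high-earner dummy (dB), afhigh = afchnge*highearn; ky==1 selects Kentucky.
reg1=lm(ldurat~afchnge+highearn+afhigh,subset=(ky==1));summary(reg1)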

Call:
lm(formula = ldurat ~ afchnge + highearn + afhigh, subset = (ky ==
    1))

Residuals:
      Min        1Q    Median        3Q       Max
-2.966647 -0.887205  0.004200  0.812637  4.078391

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.125615   0.030737  36.621  < 2e-16 ***
afchnge     0.007657   0.044717   0.171  0.86404
highearn    0.256479   0.047446   5.406 6.72e-08 ***
afhigh      0.190601   0.068509   2.782  0.00542 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.269 on 5622 degrees of freedom
Multiple R-squared:  0.02066,   Adjusted R-squared:  0.02014
F-statistic: 39.54 on 3 and 5622 DF,  p-value: < 2.2e-16

Therefore, the DID estimate (the coefficient on afhigh) is $\hat{\delta}_1 = 0.191$ (t = 2.78), which implies that the average duration on workers’ compensation increased by an estimated 19 percent, approximately, due to the higher earnings cap. The coefficient on afchnge is small and statistically insignificant: as is expected, the increase in the earnings cap has no effect on duration for low-earnings workers. The coefficient on highearn shows that, even in the absence of any change in the earnings cap, high earners spent much more time on workers’ compensation, an estimated 25.6 percent more. This is not ideal for the difference-in-difference design since it means that the treatment and control groups are rather different in the absence of treatment.
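Because the outcome is log(duration), reading a coefficient directly as a percentage is an approximation that works best for small coefficients. The sketch below converts the two reported estimates into exact percentage changes using 100*(exp(b) - 1). The coefficient values are copied from the regression output above; the helper name pct_exact is introduced here for illustration only.

```r
# Coefficient estimates copied from the summary(reg1) output
b_afhigh   <- 0.190601
b_highearn <- 0.256479

# Exact percentage change in y implied by a coefficient b when the
# outcome is log(y) and the regressor is a dummy variable
pct_exact <- function(b) 100 * (exp(b) - 1)

pct_afhigh   <- pct_exact(b_afhigh)
pct_highearn <- pct_exact(b_highearn)

# Approximate (100*b) versus exact percentage changes
rbind(approx = 100 * c(afhigh = b_afhigh, highearn = b_highearn),
      exact  = c(afhigh = pct_afhigh, highearn = pct_highearn))
```

The exact figures come out at roughly 21 percent for afhigh and roughly 29 percent for highearn, somewhat larger than the approximate 19 and 25.6 percent, with the gap widening as the coefficient grows.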

Analysis II: add male, married, and full sets of industry and injury-type dummy variables as covariates.

# Add covariates

reg2=lm(ldurat~afchnge+highearn+afhigh+male+married+as.factor(indust)+as.factor(injtype),subset=(ky==1));

summary(reg2)

Call:
lm(formula = ldurat ~ afchnge + highearn + afhigh + male + married +
    as.factor(indust) + as.factor(injtype), subset = (ky == 1))

Residuals:
     Min       1Q   Median       3Q      Max
-3.34355 -0.85412  0.09887  0.78560  4.43720

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)          0.57135    0.10266   5.565 2.74e-08 ***
afchnge              0.01063    0.04492   0.237 0.812974
highearn             0.17576    0.05175   3.397 0.000687 ***
afhigh               0.23088    0.06952   3.321 0.000904 ***
male                -0.09794    0.04455  -2.198 0.027959 *
married              0.12210    0.03912   3.121 0.001812 **
as.factor(indust)2   0.27087    0.05867   4.617 3.98e-06 ***
as.factor(indust)3   0.16067    0.04090   3.928 8.67e-05 ***
as.factor(injtype)2  0.78381    0.15617   5.019 5.36e-07 ***
as.factor(injtype)3  0.33536    0.09234   3.632 0.000284 ***
as.factor(injtype)4  0.64035    0.10087   6.348 2.36e-10 ***
as.factor(injtype)5  0.50530    0.09281   5.445 5.42e-08 ***
as.factor(injtype)6  0.39361    0.09356   4.207 2.63e-05 ***
as.factor(injtype)7  0.78661    0.20703   3.800 0.000147 ***
as.factor(injtype)8  0.51390    0.12928   3.975 7.13e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.251 on 5334 degrees of freedom
  (277 observations deleted due to missingness)
Multiple R-squared:  0.0412,    Adjusted R-squared:  0.03868
F-statistic: 16.37 on 14 and 5334 DF,  p-value: < 2.2e-16

The results are similar to those from the analysis that did not adjust for covariates. Duration on workers’ compensation is estimated to increase by about 23.1 percent due to the higher earnings cap.
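One rough way to judge "similar" is to compare approximate 95 percent confidence intervals for the afhigh coefficient from the two regressions. The sketch below uses the estimates and standard errors reported in the two outputs and a normal approximation, which is an assumption made here (reasonable given the large residual degrees of freedom); the labels no_covariates and with_covariates are introduced for illustration.

```r
# afhigh estimates and standard errors copied from the two summary() outputs
est <- c(no_covariates = 0.190601, with_covariates = 0.23088)
se  <- c(no_covariates = 0.068509, with_covariates = 0.06952)

# Normal-approximation 95 percent confidence intervals
z  <- qnorm(0.975)
ci <- cbind(lower = est - z * se, upper = est + z * se)
round(ci, 3)

# Does each point estimate fall inside the other regression's interval?
inside_other <- c(
  est["no_covariates"]   > ci["with_covariates", "lower"] &
    est["no_covariates"]   < ci["with_covariates", "upper"],
  est["with_covariates"] > ci["no_covariates", "lower"] &
    est["with_covariates"] < ci["no_covariates", "upper"]
)
inside_other
```

Both intervals exclude zero and each point estimate lies inside the other interval, which is consistent with the statement that adding covariates changes the estimate only modestly. This is an informal comparison, not a formal test of equality across specifications.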