DeMaris Online Supplement 12
Supplemental Materials
Combating Unmeasured Confounding in Cross-Sectional Studies: Evaluating Instrumental-Variable and Heckman Selection Models
by A. DeMaris, 2014, Psychological Methods
http://dx.doi.org/10.1037/a0037416.supp
Part 1: SAS Code and Simulation Calculations
SAS code for running IVR and HSM is shown below. For Stata users, ivreg and treatreg were the procedures for running IVR and HSM, respectively, in earlier versions of the package. In Stata 13, the corresponding commands are ivregress and etregress.
SAS Code for Running the IVR Test of Endogeneity
In what follows, y is the substantive outcome of interest, c is the treatment indicator, z is the instrumental variable (or vector of variables) for c, and x1 and x2 are other continuous regressors. Capitalized words are SAS keywords that must be typed as shown; lowercase words are user-supplied SAS variable names.
PROC REG; MODEL c = x1 x2 z;
OUTPUT OUT = test R = error; /* save the first-stage residuals as "error" */
RUN;
PROC REG DATA = test; MODEL y = x1 x2 c error;
RUN; QUIT;
The test for endogeneity is the t test for the coefficient of the variable “error” in this last regression.
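For readers without SAS, the two regressions behind this test can be sketched in plain Python. This is only an illustration under stated assumptions: the data are simulated from the C* and y models given later in this supplement, c is assumed to equal 1 when C* exceeds a cutoff placed at its mean of .5, and the seed, sample size, and helper name ols are arbitrary choices. The sketch reports the coefficient on error; the formal test would also need its standard error for the t statistic.

```python
import random

def ols(rows, y):
    """Return least-squares coefficients by solving (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented normal-equations matrix [X'X | X'y].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        for row in range(k):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [v - f * p for v, p in zip(m[row], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]

rng = random.Random(2014)  # arbitrary seed, for reproducibility only
n = 2000                   # assumed sample size
data = []
for _ in range(n):
    x1, x2, z, a, e, u = (rng.gauss(0, 1) for _ in range(6))
    c_star = 0.5 + 1.7 * x1 - 2.3 * x2 + 1.75 * z + 0.75 * a + e
    c = 1.0 if c_star > 0.5 else 0.0  # assumption: cutoff at the mean of C*
    y = -2 + x1 + 2 * x2 + 1.25 * c + 1.5 * a + u
    data.append((x1, x2, z, c, y))

# First regression: c on x1, x2, z; keep the residuals as "error".
first_X = [[1.0, d[0], d[1], d[2]] for d in data]
c_vals = [d[3] for d in data]
b1 = ols(first_X, c_vals)
error = [c - sum(b * x for b, x in zip(b1, row))
         for row, c in zip(first_X, c_vals)]

# Second regression: y on x1, x2, c, and the first-stage residual.
second_X = [[1.0, d[0], d[1], d[3], err] for d, err in zip(data, error)]
b2 = ols(second_X, [d[4] for d in data])
gamma = b2[4]  # coefficient on "error"; the t test targets this value
print("coefficient on error:", round(gamma, 3))
```

Because the unmeasured variable a enters both equations with a positive coefficient, the coefficient on error should come out positive in this setup.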
SAS Code for Running IVR
PROC SYSLIN FIRST 2SLS; /* FIRST also prints the first-stage regression */
ENDOGENOUS c;
INSTRUMENTS x1 x2 z;
equatn: MODEL y = x1 x2 c;
RUN;
SAS Code for Running HSM
PROC QLIM;
MODEL c = x1 x2 z / DISCRETE;
MODEL y = x1 x2 c;
RUN;
Calculations for Treatment Skew, r, and R² for Simulation Model
Treatment skew with normal errors. For the simulation condition with an effective instrument (or unique regressor) and unmeasured heterogeneity present, the model for C* is .5 + 1.7x1 – 2.3x2 + 1.75z + .75a + e, where x1, x2, z, a, and e are standard normal random variables that are all independent of each other. Because a linear combination of independent normal variables is itself normal, C* is normally distributed with mean = .5, variance = 1.7² + 2.3² + 1.75² + .75² + 1 = 12.805, and standard deviation = 3.578. For q to be the 85th percentile of this distribution, it must be that (q – .5)/3.578 = 1.036, where 1.036 is the 85th percentile of the standard normal distribution. This implies that q = 4.2068. Other cutoffs for C* when it is normally distributed are similarly computed.
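The cutoff arithmetic above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative. Because it uses the unrounded percentile and standard deviation, its result differs from 4.2068 in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Coefficients on the independent standard normal terms x1, x2, z, a, e.
coefs = [1.7, -2.3, 1.75, 0.75, 1.0]
mean_c_star = 0.5

var_c_star = sum(b * b for b in coefs)  # 12.805
sd_c_star = sqrt(var_c_star)            # about 3.578

# 85th percentile of the standard normal, then rescale to C*.
z85 = NormalDist().inv_cdf(0.85)        # about 1.036
q85 = mean_c_star + z85 * sd_c_star

print(round(var_c_star, 3), round(sd_c_star, 3), round(q85, 3))
```

Other cutoffs follow by changing the probability passed to inv_cdf.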
Calculation of r. The simulation model for y when errors were normally distributed (and a treatment effect is present) was -2 + x1 + 2x2 + 1.25c + 1.5a + u. We note also that Cov(a,e) = Cov(a,u) = Cov(e,u) = 0. Let w1 = .75a + e and w2 = 1.5a + u. The error correlation for estimation models is therefore:
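Written out from the definitions above, the calculation runs as follows. This is a reconstruction from the stated coefficients, taking a, e, and u to be mutually uncorrelated with unit variance, as in the normal-error condition.

```latex
\[
\operatorname{Cov}(w_1, w_2) = (.75)(1.5)\operatorname{Var}(a) = 1.125, \qquad
\operatorname{Var}(w_1) = .75^2 + 1 = 1.5625, \qquad
\operatorname{Var}(w_2) = 1.5^2 + 1 = 3.25
\]
\[
r = \frac{\operatorname{Cov}(w_1, w_2)}{\sqrt{\operatorname{Var}(w_1)\operatorname{Var}(w_2)}}
  = \frac{1.125}{\sqrt{(1.5625)(3.25)}} \approx .499
\]
```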
When e and u were exponentially distributed with variances of 4, the covariance of w1 and w2 is unchanged. But the variances of w1 and w2 are .75² + 4 = 4.5625 and 1.5² + 4 = 6.25, respectively. This means that the error correlation under nonnormality was:
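Using the covariance of 1.125 implied by the coefficients on a (with Var(a) = 1) and the two variances just given, the calculation can be reconstructed as:

```latex
\[
r = \frac{\operatorname{Cov}(w_1, w_2)}{\sqrt{\operatorname{Var}(w_1)\operatorname{Var}(w_2)}}
  = \frac{1.125}{\sqrt{(4.5625)(6.25)}} \approx .211
\]
```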
R² for simulation models. The estimation models for C* and y when, say, an effective instrument (or unique regressor) and unmeasured heterogeneity are present are C* = .5 + 1.7x1 – 2.3x2 + 1.75z + .75a + e and y = –2 + x1 + 2x2 + 1.25c + 1.5a + u. As calculated above, the variance of C* is 12.805. Of this, var(.5 + 1.7x1 – 2.3x2 + 1.75z) = 11.2425, or 88%, is due to the regression on the measured explanatory variables. Similarly, the variance of y is var(–2 + x1 + 2x2 + 1.25c + 1.5a + u) = 9.8125, of which var(–2 + x1 + 2x2 + 1.25c) = 6.5625, or 67%, is due to the regression on the measured explanatory variables.
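The two proportions can be reproduced with the short Python sketch below. It follows the arithmetic exactly as stated, which means each term, including 1.25c, is taken to contribute its squared coefficient to the variance (an assumption implied by the totals 9.8125 and 6.5625). The variable names are illustrative.

```python
# C* equation: measured terms are x1, x2, z; unmeasured terms are a, e.
c_measured = 1.7**2 + 2.3**2 + 1.75**2   # 11.2425
c_total = c_measured + 0.75**2 + 1.0     # 12.805
r2_c_star = c_measured / c_total

# y equation: measured terms are x1, x2, c; unmeasured terms are a, u.
# Assumption: the 1.25c term contributes 1.25**2, as the stated total implies.
y_measured = 1.0**2 + 2.0**2 + 1.25**2   # 6.5625
y_total = y_measured + 1.5**2 + 1.0      # 9.8125
r2_y = y_measured / y_total

print(round(r2_c_star, 2), round(r2_y, 2))
```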
Part 2: MSE and Bias Figures
Note: Figures S1–S3 present MSE values for OLS, IVR, and HSM estimators of the treatment effect for sample sizes of 50, 250, and 2,000 when the treatment effect is absent. Figures S4–S9 present information pertinent to bias of the estimators. In particular, the figures show the means of OLS, IVR, and HSM treatment-effect estimates when the treatment effect is either present or absent. The criterion values of 1.25 (treatment effect present) and 0 (treatment effect absent) are marked on the graphs with horizontal lines. Bias in each case is the discrepancy between the mean of the estimator and the criterion value.
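As a small illustration of the two summary quantities plotted in these figures, the Python sketch below computes bias and MSE from a set of replicate estimates. The function name and the four estimates are hypothetical and are not taken from the simulation. Bias follows the definition in the note above, and MSE is computed in the usual way, as the mean squared deviation of the estimates from the criterion value.

```python
def bias_and_mse(estimates, criterion):
    """Return (bias, MSE) of replicate estimates relative to a criterion value."""
    n = len(estimates)
    mean_estimate = sum(estimates) / n
    bias = mean_estimate - criterion  # mean of the estimator minus the criterion
    mse = sum((b - criterion) ** 2 for b in estimates) / n
    return bias, mse

# Hypothetical estimates from four replications, treatment effect present.
estimates = [1.0, 1.5, 1.3, 1.2]
bias, mse = bias_and_mse(estimates, criterion=1.25)
print(round(bias, 4), round(mse, 4))
```

In this made-up example the estimates average to the criterion value, so the bias is essentially zero while the MSE is still positive, which is why the figures report the two quantities separately.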
Figure S1. Simulation results for N = 50 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent, MSE = mean square error, OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.
Figure S2. Simulation results for N = 250 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent, MSE = mean square error, OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.
Figure S3. Simulation results for N = 2000 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent, MSE = mean square error, OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.
Figure S4. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 50 with treatment effect present. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent, MSE = mean square error, OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.
Figure S5. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 50 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent, MSE = mean square error, OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.
Figure S6. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 250 with treatment effect present. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent, MSE = mean square error, OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.
Figure S7. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 250 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent, MSE = mean square error, OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.
Figure S8. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 2000 with treatment effect present. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent, MSE = mean square error, OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.
Figure S9. Mean values for OLS, IVR, and HSM estimators of the treatment effect for N = 2000 with treatment effect absent. Sym T = symmetric treatment condition, Asym T = asymmetric treatment condition, Norm E = normal errors, Exp E = exponential errors, Cnfd P = confound present, Cnfd A = confound absent, Inst P = instrument present, Inst A = instrument absent, MSE = mean square error, OLS = ordinary least squares; IVR = instrumental-variable regression; HSM = Heckman selection model.