Responses to Concerns About Causal Inference For

Thomas Dietz, Kenneth A. Frank, Cameron Whitley, Jennifer Kelly, Rachel Kelly. Political Influences on Greenhouse Gas Emissions from U.S. States. Proceedings of the National Academy of Sciences. June 15, 2015.

Some reporters raised concerns about the causal inferences we made from Table 1.

Here are some responses:

By analogy: do you trust your car to accelerate?

We can make the case for causal inference by the following analogy.

You are driving and come to an intersection with a traffic light where you want to make a left turn. You wait for traffic to clear, step on the accelerator, and then go through. Could you have proved that the car would react and accelerate uniformly quickly when you stepped on the accelerator? Lots of things could have happened: it could have backfired (I had an old car that did this to me), the accelerator pedal may have stuck, you could have run out of gas. And yet the car’s response is consistent enough to warrant your inference that you will get through the intersection – the evidence, while not proof, was actionable. And how many times would it have to fail for you to change your mind and never rely on the accelerator again?

Just so, we believe our evidence is strong enough to be actionable, to be the basis of the inference that environmentalism reduces CO2 emissions: 44% of the estimate would have to be due to bias to invalidate the inference. We might be wrong, but there would have to be a lot of misfires to reduce the evidence below our threshold for inference.

Cross-sectional analysis.

The first basis of our inference was a state-by-state comparison at 1990. This seems to be the one to which your reader is responding below. We acknowledge that there could be an alternative explanation, such as having fossil fuel production in the state. But we note that to invalidate our inference, the alternative explanation would have to account for 44% of the estimated effect of environmentalism on CO2 emissions, and that is net of the other factors we controlled for (population, employment and gross state product). The strongest alternative explanation we found was % women in the legislature, which accounted for about a 15% change in our estimated effect. So the fossil fuel explanation would have to be about 2.5 times stronger than the strongest alternative explanation we could find and measure to invalidate our inference. We can therefore acknowledge that some of our estimate may be due to alternative factors, and still make the inference that environmentalism reduces CO2 emissions.

Spreadsheet for calculating indices [KonFound-it!]
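As a rough illustration of what a statement like "44% of the estimate would have to be due to bias" means, the sketch below uses one common formulation of the percent bias needed to invalidate an inference: one minus the ratio of the significance threshold (critical t value times standard error) to the estimate. We are assuming here that the spreadsheet works along these lines. The function name and the input numbers are hypothetical, chosen only so that the result comes out at 44%, and are not the values from Table 1.

```python
def percent_bias_to_invalidate(estimate, std_error, t_critical=1.96):
    """Share of an estimate that would have to be due to bias for the
    estimate to fall to the threshold for statistical inference.

    Assumed formulation: 1 - (t_critical * std_error) / |estimate|.
    """
    threshold = t_critical * std_error
    if abs(estimate) <= threshold:
        raise ValueError("estimate does not exceed the threshold for inference")
    return 1.0 - threshold / abs(estimate)


# Hypothetical inputs, NOT taken from the paper: a coefficient of -0.50
# with a standard error of 0.14 and a critical value of 2.0.
share = percent_bias_to_invalidate(estimate=-0.50, std_error=0.14, t_critical=2.0)
print(f"{share:.0%}")  # prints 44%
```

Read this way, the larger the estimate is relative to its threshold, the more of it would have to be explained away before the inference no longer holds.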

The more sophisticated techniques your reader refers to might include propensity score matching or instrumental variables. Regarding propensity score matching (in which you first model assignment to a treatment, such as environmentalism, and then account for that assignment in a model): it all boils down to what you controlled for (Heckman, 2005; Morgan & Harding, 2006, page 40; Rosenbaum, 2002, page 297; Shadish et al., 2002, page 164). Regarding instrumental variables (in which you first model assignment to treatment and then use the predicted value of assignment in a model, so that the prediction serves as an alternative measure of the predictor): they only work well when there is a strong instrument; otherwise the estimates can be severely inconsistent and the standard errors large (see Wooldridge, 2002, page 102). We could not identify any such instrument in our data.
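The point about weak instruments can be illustrated with a small simulation. This is only a sketch on made-up data, not an analysis from the paper: the data-generating model, the sample sizes, the first-stage strengths (1.0 for "strong", 0.05 for "weak") and all function names are assumptions chosen for illustration. It compares how widely the simple instrumental variables estimate, cov(z, y) / cov(z, x), scatters across repeated samples.

```python
import random
import statistics


def iv_estimate(z, x, y):
    """Simple instrumental variables estimate: cov(z, y) / cov(z, x)."""
    mz, mx, my = statistics.fmean(z), statistics.fmean(x), statistics.fmean(y)
    cov_zy = sum((a - mz) * (b - my) for a, b in zip(z, y))
    cov_zx = sum((a - mz) * (b - mx) for a, b in zip(z, x))
    return cov_zy / cov_zx


def simulate(first_stage, n=300, reps=200, true_effect=1.0, seed=0):
    """Draw repeated samples and return the IV estimate from each one."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
        u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved confounder
        # The predictor depends on the instrument only as strongly as first_stage.
        x = [first_stage * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
        y = [true_effect * xi + ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
        estimates.append(iv_estimate(z, x, y))
    return estimates


def interquartile_range(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


spread_strong = interquartile_range(simulate(first_stage=1.0))
spread_weak = interquartile_range(simulate(first_stage=0.05))
print(f"spread with strong instrument: {spread_strong:.2f}")
print(f"spread with weak instrument:   {spread_weak:.2f}")
```

In this setup the estimates from the weak instrument scatter far more widely around the true effect than those from the strong one, which is why an instrumental variables analysis without a strong instrument would add little.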

Moreover, there is very recent work in education and the social sciences that compares results of randomized experiments with observational (correlational) studies by randomly assigning subjects to one or the other type of study and then comparing results. They conclude that choice of covariates is critical (propensity scores work about the same as regression based techniques that we used, but instrumental variables perform very poorly). See references below.

Longitudinal analysis.

The literature I have cited below all points to the fact that covariates are important, and that the best covariates are measures of the outcome taken at previous points in time, which allow you to model change in an outcome over time. True, our analysis of the effect of environmentalism at 1990 is cross-sectional, but the causal inference in the paper is also based on a longitudinal analysis in which we model change over time in CO2 emissions for each state. This is the level 1 analysis in Table 1. In this analysis environmentalism again has a strong effect, reducing the change in CO2 output over time.

Now the alternative explanation of fossil fuel production is more difficult to make. In this context we would accept that having fossil fuel plants may quite likely create increases in CO2 over time, but that a strong environmentalism orientation reduces this tendency, or any other tendency the state has in its trajectory that was initiated in 1990.

Moreover, this inference is even more robust to alternative explanations than the cross-sectional analysis. 60% of the estimate would have to be due to bias to invalidate the inference of an effect of environmentalism on change in CO2 over time (we did not report this sensitivity analysis in the paper, choosing the more conservative but still persuasive sensitivity analysis from the cross-sectional data in 1990, which we thought would also be more intuitive). This 60% is also healthy enough to cover some of the concerns about the measurement of our variables.

Spreadsheet for calculating indices [KonFound-it!]

So the upshot is that we acknowledge there could be alternative explanations to our findings, but those explanations would have to be extremely powerful, far more so than anything we could find in our data, to invalidate our inference.

On the importance of covariates

The quote below is from one of the most preeminent social scientists working on causal inference, Thomas Cook:

Results are similar across the samples of studies reviewed with their wide range of non-experimental designs and topic areas. Covariate choice counts most, unreliability next most, and the mode of data analysis hardly matters at all. Unreliability has larger effects the more important a covariate is for bias reduction, but even so the very best covariates measured with a reliability of only .60 still do better than substantively poor covariates that are measured perfectly. Why regression methods do as well as propensity score methods used in several different ways is a mystery still because, in theory, propensity scores would seem to have a distinct advantage in many practical applications, especially those where functional forms are in doubt.

Abstract from: Cook, T. D., P. Steiner, and S. Pohl. 2010. Assessing how bias reduction is influenced by covariate choice, unreliability and data analytic mode: An analysis of different kinds of within-study comparisons in different substantive domains. Multivariate Behavioral Research 44(6): 828-47.

Other supporting evidence.

Berk R (2005) Randomized experiments as the bronze standard. J Exp Criminol 1:417–433

Berk R, Barnes G, Ahlman L, Kurtz E (2010) When second best is good enough: a comparison between a true experiment and a regression discontinuity quasi-experiment. J Exp Criminol 6:191–208. "the results from the two approaches are effectively identical" page 191.

Pohl, S., Steiner, P. M., Eisermann, J., Soellner, R., & Cook, T. D. (2009). Unbiased causal inference from an observational study: Results of a within-study comparison. Educational Evaluation and Policy Analysis, 31(4), 463-479.

Concato, J., Shah, N., & Horwitz, R. I. (2000). Randomized, controlled trials, observational studies, and the hierarchy of research designs. New England Journal of Medicine, 342(25), 1887-1889.

Cook, T. D., Shadish, W. R., & Wong, V. C. (2008). Three conditions under which experiments and observational studies produce comparable causal estimates: New findings from within-study comparisons. Journal of Policy Analysis and Management, 27(4), 724–750.

Shadish, W. R., Clark, M. H., & Steiner, P. M. (2008). Can nonrandomized experiments yield accurate answers? A randomized experiment comparing random to nonrandom assignment. Journal of the American Statistical Association, 103(484), 1334-1344.

Steiner, Peter M., Thomas D. Cook & William R. Shadish (in press). On the importance of reliable covariate measurement in selection bias adjustments using propensity scores. Journal of Educational and Behavioral Statistics.

Steiner, Peter M., Thomas D. Cook, William R. Shadish & M. H. Clark (2010). The importance of covariate selection in controlling for selection bias in observational studies. Psychological Methods, 15(3), 250-267. (More than just pretest.)

Kane, T., & Staiger, D. (2008). Estimating Teacher Impacts on Student Achievement: An Experimental Evaluation. NBER working paper 14607.

Bifulco, Robert. "Can Nonexperimental Estimates Replicate Estimates Based on Random Assignment in Evaluations of School Choice? A Within‐Study Comparison." Journal of Policy Analysis and Management 31, no. 3 (2012): 729-751. (Reports bias reduced by 64% to 96% with a pre-test.)

Other references:

Heckman, J. (2005). The Scientific Model of Causality. Sociological Methodology, 35, 1-99.

Morgan, S. L. & Harding, D. J. (2006). Matching estimators of causal effects: Prospects and pitfalls in theory and practice. Sociological Methods and Research, 35, 3-60.

Rosenbaum, P. (2002). Observational Studies. New York: Springer.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Boston, MA: Houghton Mifflin.

Wooldridge, Jeffrey M. (2002). Econometric Analysis of Cross Section and Panel Data. Cambridge, MA: MIT Press.