A Matched Pairs Analysis of State Growth Differences*
Brian Goff
Alex Lebedinsky
Stephen Lile
Department of Economics/Ford College of Business
Western Kentucky University
Bowling Green, KY 42101
Contact:
*The authors thank Bob Tollison, Dennis Wilson, and participants on the panel at the Academy of Economics and Finance Meetings for useful comments and suggestions.
Abstract
The American states have provided a rich laboratory in which to examine influences on economic growth, including convergence, physical capital, human capital, and a variety of policy variables. Existing studies have typically used broad cross-sections of all states or particular regional sub-samples. Pairwise matching has been used in many observational studies to better control for omitted variables. We estimate a growth model for U.S. states for 1997-2005 before and after applying different pairwise matching techniques. Our results indicate that a sample based on pairwise matching substantially improves the overall explanatory capability of the model and provides much more support for particular hypotheses, such as convergence and the growth-enhancing effects of lower individual income-tax rates.
I. Introduction
The American states have been and continue to be a useful laboratory for testing influences on economic growth, including convergence, physical capital, human capital, and a variety of policy variables. Routinely, all 50 states or regional subsets of them are grouped together and regression analysis is performed to estimate parameters of the models. Implicitly, this practice generates coefficients by comparing states with widely divergent populations such as California and Wyoming, widely divergent incomes such as Connecticut and Mississippi, widely divergent historical and cultural backgrounds such as Pennsylvania and Utah, and with other extreme differences. Although these regressions take into account a variety of "control" variables, they omit or use proxy measures of important, long-run differences between the states. Regional sub-samples or regional dummy variables are used as a means to "soak up" some of the variability due to these omitted factors.
In this paper, we offer an alternative methodology to estimate state growth models by using matched pairs based on common geographical characteristics. The intuition behind matching is straightforward. Matching of birth twins, for example, can eliminate or reduce genetic differences as a source of variation so that other factors can be isolated and estimated with greater accuracy. Although matching can be used, and may be most frequently considered, as a pre-treatment experimental design method, it has also been developed as a post-treatment method.[1] Although the method is not a panacea to cure all observational study problems, under certain conditions, the post-treatment matching imitates the results of randomizing treatments among observations.
Statistical investigators in a variety of disciplines -- including economics, finance, statistics, psychology, and biomedicine -- have made use of post-treatment matched pairs to improve estimates. In economics, the methods have been used where the observational units have ranged from individual employees or borrowers to financial institutions, and the subjects studied have ranged from bank problems, to risk, wage rates, and firm size.[2] Although the theoretical links have not been worked out, creating sub-samples of data based on common socio-demographic characteristics is a closely related method that has been shown to improve estimates. For example, Barro and Sala-i-Martin (1991) show that while the Solow growth model's prediction of convergence in growth rates does not hold for a large sample of countries, it does hold for a reduced sample containing only OECD countries. The practice of reducing samples based on Chow tests rejecting coefficient equality is closely related to ex post matching methods because the objective, as in pairwise matching, is to better align the sample to the underlying process generating outcomes.
In spite of the growing use of matching methods in statistical studies in economics, their use at a state level of analysis has largely been limited to time series studies of a particular pair of states or small groupings of states.[3] The U.S. states provide an attractive basis for matching. State-based, pairwise matches compare observational units generated, at least in part, by similar underlying processes, doing so on the front end of the estimation process using similar characteristics of the data rather than on the back end of the estimation process (as with a Chow-type sample partitioning method). State matching pairs states that share important similarities and, therefore, can implicitly eliminate differences that are difficult to measure. In particular, we pair states by location, and these pairs also capture historical, political, climatic, topographic, and transportation similarities. Many states share similar geographic and historical backgrounds and traits. Kentucky and Tennessee, or Arizona and New Mexico, are examples of states that are near "twins" when viewed from a long-run historical, political, and geographic perspective. Kentucky and Tennessee are not only contiguous, and therefore share similar continental locations, but also share a 350-mile border, have land areas whose ratio is about 95 percent, and are both landlocked. In terms of historical similarities, they entered the Union within four years of each other (Kentucky in 1792 and Tennessee in 1796). Arizona and New Mexico, as another example, entered the Union one month apart, have land areas whose ratio is about 94 percent, are landlocked, share similar topography, and share a 300-plus mile border.
In the next section, we discuss our specific matching methodology and the model of economic growth that we employ. Section III provides estimates of the growth model, comparing results using a typical cross-section of all 50 states for 1997-2005 with pairwise matched samples. Section IV offers concluding remarks.
II. State Matching Theory and Methods
Matching has been used most frequently to compare means (or another simple distributional parameter) where the matching is intended to take into account all (or as many as possible) of the influences on the dependent variable other than the variable of primary interest. In these studies, a simple t-test of mean differences is sufficient to estimate the effect of that (binary) variable of interest. To a lesser extent, matching has also been employed in conjunction with regression modeling as a method to control for some or most of the non-treatment differences.[4] The regression model extends this effort to control for other effects that the matching may have missed. It is this matching-regression combination that we employ. Our purpose is to use matching to eliminate many of the differences in natural endowments, as well as long-run historical and cultural differences, between states, and then estimate a regression for a basic empirical growth model.
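The simple mean-difference test described above can be sketched as follows. This is a minimal illustration only: the function name paired_t_statistic and the outcome values are hypothetical and are not drawn from any study cited here.

```python
import math
import statistics


def paired_t_statistic(treated, control):
    """Return (mean difference, t statistic, degrees of freedom) for matched pairs."""
    if len(treated) != len(control):
        raise ValueError("each treated unit needs exactly one matched control")
    # Differencing within pairs removes whatever the two members share.
    diffs = [t - c for t, c in zip(treated, control)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff, mean_diff / std_error, n - 1


# Hypothetical outcomes for five matched pairs (illustrative numbers only).
treated = [2.0, 3.0, 5.0, 4.0, 6.0]
control = [1.0, 1.0, 4.0, 2.0, 4.0]

mean_diff, t_stat, dof = paired_t_statistic(treated, control)
print(f"mean difference = {mean_diff:.2f}, t = {t_stat:.2f}, df = {dof}")
```

The t statistic would then be compared with the critical value of a t distribution with the reported degrees of freedom.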
The potential advantages of matching can be illustrated by starting with a cross-sectional equation for growth ($Y_i$) of state $i$ that is a function of an observed variable $X_i$ and an omitted variable $Z_i$:
(1) $Y_i = \beta_0 + \beta_1 X_i + \beta_2 Z_i + u_i$.
When this equation is estimated using only data on $Y_i$ and $X_i$, the expected value of $b_1$, the OLS estimator of $\beta_1$, is
(2) $E[b_1] = \beta_1 + \beta_2 \, \mathrm{Cov}(X_i, Z_i) / \mathrm{Var}(X_i)$.
The second term on the right-hand side is the bias that results from omitting $Z_i$: whenever $X_i$ and $Z_i$ are correlated, $b_1$ will be biased. With pairwise matching between states $i$ and $j$, the underlying model changes to
(3) $Y_i - Y_j = \beta_1 (X_i - X_j) + \beta_2 (Z_i - Z_j) + (u_i - u_j)$.
When $(Z_i - Z_j)$ is omitted, the expected value of $b_1$ is
(4) $E[b_1] = \beta_1 + \beta_2 \, \mathrm{Cov}(X_i - X_j, Z_i - Z_j) / \mathrm{Var}(X_i - X_j)$.
The benefit of matching states with similar unobserved characteristics is quickly apparent. If these unobserved characteristics are equal for states $i$ and $j$, then
$Z_i - Z_j = 0$, $\mathrm{Cov}(X_i - X_j, Z_i - Z_j) = 0$, and $E[b_1] = \beta_1$. Of course, if the matching does not equate $Z_i$ and $Z_j$, then bias will be present. However, the bias in the matched sample will be smaller in absolute value than the bias in the unmatched cross-section as long as
(5) $|\mathrm{Cov}(X_i - X_j, Z_i - Z_j) / \mathrm{Var}(X_i - X_j)| < |\mathrm{Cov}(X_i, Z_i) / \mathrm{Var}(X_i)|$.
Simply put, if the omitted growth-equation variables are less strongly related to the included variables in the matched-pair sample of states, then matching will produce lower bias.
By comparing cross-section and matched-pairs estimates, we will be able to see whether cross-section estimates are affected by omitted variable bias: if $Z_i$ is not correlated with $X_i$, both methods should produce similar estimates of $\beta_1$. On the other hand, a large difference between the estimates obtained using these two methods would indicate that the cross-section estimates are biased.
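The comparison between equations (2) and (4) can be illustrated with a small simulation. The sketch below is not the estimation procedure used in this paper. It assumes hypothetical parameter values (slope 1.0 on X, slope 2.0 on Z, a loading of 0.8 of X on Z) and the extreme case in which Z is identical for the two members of every pair. The function names ols_slope and simulate are illustrative.

```python
import random


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n_pairs=5000, beta1=1.0, beta2=2.0, seed=0):
    """Return (cross-section slope, matched-pair slope), both with Z omitted."""
    rng = random.Random(seed)
    x_all, y_all, dx, dy = [], [], [], []
    for _ in range(n_pairs):
        z = rng.gauss(0.0, 1.0)  # assumption: Z is identical within a pair
        members = []
        for _ in range(2):
            x = 0.8 * z + rng.gauss(0.0, 1.0)  # X is correlated with Z
            y = 0.5 + beta1 * x + beta2 * z + rng.gauss(0.0, 1.0)
            members.append((x, y))
            x_all.append(x)
            y_all.append(y)
        (xi, yi), (xj, yj) = members
        dx.append(xi - xj)  # within-pair differences, as in equation (3)
        dy.append(yi - yj)
    return ols_slope(x_all, y_all), ols_slope(dx, dy)


cross_section, matched = simulate()
print(f"cross-section estimate of beta1: {cross_section:.3f}")
print(f"matched-pair estimate of beta1:  {matched:.3f}")
```

Under these assumed values, equation (2) implies a population value of 1 + 2(0.8)/1.64, or roughly 1.98, for the cross-section estimator, while the within-pair differences remove Z entirely, so the matched-pair estimator is centered on the true value of 1.0.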
Two common matching methods exist. The first method matches directly on covariate values. The second method matches on equivalence or similarity of the likelihood of treatment conditional on the covariate values. This method, known as "propensity scoring," has the advantage of using information across a wide variety of covariates while condensing this multi-dimensional information into a scalar on which to base matches. Comparisons of matching based on covariate values with propensity matching have not yielded results indicating that one method dominates the other in terms of reducing bias of estimated parameters in all situations.[5]
Most important for our purposes, the statistical and econometric literature on propensity matching has concentrated primarily on pairing observations based on a binary, treatment/no-treatment variable. The method has been extended to multi-valued discrete treatment variables, but not to continuous variables. Our interest does not lie in matching based on whether a single, binary policy variable has been adopted. Instead, we seek to estimate a model of state economic growth that takes into account continuous as well as binary policy variables, along with non-policy growth-model variables.
Our method is similar to covariate matching. Instead of using a large set of covariates on which to match, we rely heavily on location as a variable that condenses many dimensions of a state into a single observation. In particular, location tends to incorporate information about date of statehood, migration history, ethnic and religious backgrounds, climate, topography, neighboring states, and so on. Our matching approach looks for states that share a common geographic relationship rather than matching on current values of a set of variables or the current likelihood of adopting a particular policy. We compute different matched-pair samples for the 48 contiguous states based on the following criteria:
Criterion 1: All contiguous pairs;
Criterion 2: All contiguous pairs within the interquartile range for the ratio of land area
OR
all contiguous pairs within the interquartile range for the ratio of population.
Table 1 displays the pairwise matched samples based on the matching procedures discussed above. The sample includes the 48 contiguous states with 102 unique paired observations. The most frequently matched state is Missouri, with eight matches, followed by Kentucky and Tennessee, with seven each. Five states have six matches. All states, with the exception of Maine, have at least two matches. Limiting the sample to the interquartile ranges for the land area and population ratios reduces the number of pairs to 51 and 52, respectively. The land area-restricted sample includes 37 different states, while the population-restricted sample includes 41 states. The set of pairs that fall within both the land area and population interquartile ranges contains 28 pairs.
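The construction of the matched samples can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure behind Table 1: the five "states" A through E, their adjacency list, and their land areas are hypothetical. The ratio is taken to be the smaller value divided by the larger, and the quartiles come from the default method of Python's statistics.quantiles. The paper does not specify either of these choices.

```python
import statistics

# Hypothetical adjacency list and land areas (not actual state data).
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
land_area = {"A": 100.0, "B": 90.0, "C": 40.0, "D": 75.0, "E": 10.0}


def contiguous_pairs(adjacency):
    """Criterion 1: every unique pair of bordering states, counted once."""
    return sorted(
        {tuple(sorted((state, nbr))) for state, nbrs in adjacency.items() for nbr in nbrs}
    )


def within_interquartile_range(pairs, size):
    """Criterion 2: keep pairs whose size ratio lies between Q1 and Q3."""
    # Assumption: the ratio is smaller / larger, so it always lies in (0, 1].
    ratios = {p: min(size[p[0]], size[p[1]]) / max(size[p[0]], size[p[1]]) for p in pairs}
    q1, _, q3 = statistics.quantiles(list(ratios.values()), n=4)
    return [p for p in pairs if q1 <= ratios[p] <= q3]


all_pairs = contiguous_pairs(adjacency)
restricted = within_interquartile_range(all_pairs, land_area)
print(f"{len(all_pairs)} contiguous pairs, {len(restricted)} within the interquartile range")
```

The same filter could be applied to a population dictionary to build the population-restricted sample, and the pairs common to both restricted lists would form the sample that satisfies both restrictions.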
III. State Growth Model and Regression Estimates
Growth theory looks to capital, both human and physical, as the primary source of economic growth. While measures of tangible capital exist, capital in a broader sense may include a variety of natural geographical features, such as climate, ports, or arable land. The measurement of human capital typically requires the use of some kind of proxy variable, such as educational attainment levels of the population. Standard theoretical growth models, such as the Solow Model, predict convergence of growth rates. Given the comparatively free movement of labor and capital across state boundaries, one would expect greater convergence among U.S. states than among nations.[6] However, the existing evidence on convergence of state growth rates is mixed. Convergence has been incomplete, and Bauer, Schweitzer, and Shane (2006) provide evidence that it has stalled since the mid-1970s.[7]
Public policies that alter incentives to utilize labor and capital are potential influences on growth. High taxes, unless associated with improved infrastructure that makes for higher productivity, can be expected to discourage work and possibly saving and investment. Laws governing the employer-employee relationship, such as policies that encourage collective bargaining, are likely to increase wages and, unless associated with higher labor productivity, can be expected to increase production costs and decrease expected profit and business investment.
Empirical studies of the effects of taxes have produced mixed results. An early study (Helms, 1985) found that higher taxes reduce growth when used to finance transfer payments but do not reduce growth when used to finance infrastructure. Another study from this time period (Genetski and Ludlow, 1982), using 1970-1977 data, found that states that decreased their tax burdens relative to the national average tended to experience above-average growth. States that increased their tax burden, relative to the national average, tended to suffer below-average economic growth. In a time series study of a single matched pair (New Hampshire and Vermont), Campbell (1994) observed greater economic vitality in New Hampshire and attributed the higher growth rate to lower taxes. Drawing from analogies to portfolio theory, Crain (2003) finds that tax rates are much more important determinants of state growth than convergence. In contrast, Bauer, Schweitzer, and Shane (2006) included a measure of “knowledge stock” and found that tax burden is statistically insignificant.
Our base regression model estimates the percent change in per capita Gross State Product (GSP) across the 48 contiguous states as a function of a set of standard growth-model variables, including the initial level of income, physical capital, human capital, and state public policy variables. The model is
(6) $\text{DGSPPC}_i = a_0 + a_1 \, \text{GSPPC}_{i0} + a_2 \, \text{Physical PC}_i + a_3 \, \text{Coastline Per Sq. Mi.}_i + a_4 \, \text{PCT College}_i + a_k \, \text{Policy Variable}_{ik} + u_i$;
where
$\text{DGSPPC}_i$ = percent change in GSP per capita from 1997 to 2005 for state $i$;
$\text{GSPPC}_{i0}$ = level of GSP per capita in 1997 for state $i$;
$\text{Physical PC}_i$ = tangible physical capital per capita for state $i$;[8]
$\text{Coastline Per Sq. Mi.}_i$ = miles of coastline per square mile of land area for state $i$;