Applied Econometrics - Panel Data 2
Jennifer Smith - University of Warwick

4. Proposed solutions

(a) Introduce exogenous variables

If exogenous variables are added (to a first-order autoregressive process), the bias in the OLS estimator is reduced in magnitude but remains positive. The coefficients on the exogenous variables are biased towards zero. (The direction of bias for a pth order AR process is difficult to identify a priori.)

The LSDV estimator remains biased if exogenous variables are added to (2), for small T.

(b) Instrumental variable methods

(i) Anderson-Hsiao

A standard technique for dealing with variables that are correlated with the error term is to instrument them.

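Taking first differences eliminates the individual effects $\mu_i$, which were the source of the bias in the OLS estimator. This gives:

$$y_{it} - y_{i,t-1} = \delta\,(y_{i,t-1} - y_{i,t-2}) + (\nu_{it} - \nu_{i,t-1}) \qquad (15)$$

Now we need to instrument $(y_{i,t-1} - y_{i,t-2})$, which is still clearly correlated with the error $(\nu_{it} - \nu_{i,t-1})$. The second lag of the level, $y_{i,t-2}$, and the first difference of this second lag, $(y_{i,t-2} - y_{i,t-3})$, are possible instruments, since they are both correlated with $(y_{i,t-1} - y_{i,t-2})$ but are uncorrelated with $(\nu_{it} - \nu_{i,t-1})$, as long as the $\nu_{it}$ themselves are not serially correlated.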
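The correlation claims can be checked term by term. The sketch below assumes the AR(1) model is written $y_{it} = \delta y_{i,t-1} + \mu_i + \nu_{it}$, with $\nu_{it}$ serially uncorrelated, of variance $\sigma_\nu^2$, and uncorrelated with $\mu_i$ and with earlier values of y (the symbols are an assumed notation):

```latex
\begin{aligned}
E\big[(y_{i,t-1} - y_{i,t-2})(\nu_{it} - \nu_{i,t-1})\big]
  &= E[y_{i,t-1}\nu_{it}] - E[y_{i,t-1}\nu_{i,t-1}]
     - E[y_{i,t-2}\nu_{it}] + E[y_{i,t-2}\nu_{i,t-1}] \\
  &= 0 - \sigma_\nu^2 - 0 + 0 = -\sigma_\nu^2 \neq 0, \\
E\big[y_{i,t-2}(\nu_{it} - \nu_{i,t-1})\big] &= 0 - 0 = 0 .
\end{aligned}
```

The only non-zero term arises because $y_{i,t-1}$ contains $\nu_{i,t-1}$. The level $y_{i,t-2}$ is dated before both errors in the difference, which is what makes it a valid instrument.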

Both the resulting instrumental variables estimators (known as Anderson-Hsiao):

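$$\hat\delta_{IV}^{(d)} = \frac{\sum_i \sum_t (y_{i,t-2} - y_{i,t-3})(y_{it} - y_{i,t-1})}{\sum_i \sum_t (y_{i,t-2} - y_{i,t-3})(y_{i,t-1} - y_{i,t-2})} \qquad (16)$$

and

$$\hat\delta_{IV}^{(l)} = \frac{\sum_i \sum_t y_{i,t-2}\,(y_{it} - y_{i,t-1})}{\sum_i \sum_t y_{i,t-2}\,(y_{i,t-1} - y_{i,t-2})} \qquad (17)$$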

are consistent when $N \to \infty$ or $T \to \infty$ or both.

  • Instrumenting with the second lag of the level, (17), has the advantage over instrumenting with the second lagged difference, (16), that only two time periods are required, rather than at least three.
  • When $T \ge 3$, the choice of instrument can be based on correlations between $(y_{i,t-1} - y_{i,t-2})$ and each of $y_{i,t-2}$ and $(y_{i,t-2} - y_{i,t-3})$.
  • It has been found that the estimator resulting from instrumenting using differences has a singularity point and very large variances over a significant range of parameter values. Instrumenting using levels does not lead to the singularity problem, and results in much smaller variances, and so is preferable.
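A minimal sketch of the two Anderson-Hsiao estimators for the AR(1) model without regressors may make the mechanics concrete. The function names and the noise-free toy panel are hypothetical, introduced only for illustration; with no idiosyncratic error the differenced relationship holds exactly, so both instruments should recover the true coefficient.

```python
def anderson_hsiao(panel, instrument="level"):
    """Anderson-Hsiao IV estimate of delta in y_t = delta*y_{t-1} + mu + nu_t.

    panel: list of per-individual series [y_1, ..., y_T].
    instrument: "level" uses y_{t-2}; "difference" uses y_{t-2} - y_{t-3}.
    """
    num = 0.0
    den = 0.0
    # 0-based positions: the level instrument needs two earlier observations,
    # the differenced instrument needs three.
    start = 2 if instrument == "level" else 3
    for y in panel:
        for t in range(start, len(y)):
            dy = y[t] - y[t - 1]            # differenced dependent variable
            dy_lag = y[t - 1] - y[t - 2]    # endogenous differenced lag
            if instrument == "level":
                z = y[t - 2]
            else:
                z = y[t - 2] - y[t - 3]
            num += z * dy
            den += z * dy_lag
    return num / den


def simulate_noise_free(delta, mu, y_first, periods):
    """Hypothetical helper: y_t = delta*y_{t-1} + mu, with no error term."""
    y = [y_first]
    for _ in range(periods - 1):
        y.append(delta * y[-1] + mu)
    return y


panel = [simulate_noise_free(0.5, mu, y_first, 6)
         for mu, y_first in [(1.0, 0.0), (2.0, 5.0), (-1.0, 3.0)]]
print(anderson_hsiao(panel, "level"), anderson_hsiao(panel, "difference"))
```

With real data the two estimates will differ, and the variance comparison in the last bullet above is what guides the choice between them.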

(ii) Arellano-Bond

The Anderson-Hsiao instrumental variables estimator may be consistent, but it is not efficient because it does not take into account all the available moment restrictions. (Moment restrictions are restrictions on the covariances between regressors and the error term. Regressors may be orthogonal to the error term, in which case we are justified in imposing, or using, orthogonality restrictions that the covariance between regressor and error is zero.)

Arellano and Bond (1991) argue that a more efficient estimator results from the use of additional instruments whose validity is based on orthogonality between lagged values of the dependent variable $y_{it}$ and the errors $\nu_{it}$. The Arellano-Bond estimator is now widely used in short dynamic panels, not least because they wrote a Gauss-based regression package, DPD, which gives the standard OLS, fixed effects (Within or differences), random effects estimators, plus their own. (See below for a schematic discussion of the estimators available in DPD.)

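Take (15) above, the first-differenced simple AR(1) model with no regressors. At t=3, the first period we observe the relationship in (15),

$$y_{i3} - y_{i2} = \delta\,(y_{i2} - y_{i1}) + (\nu_{i3} - \nu_{i2}) \qquad (18)$$

[NB 3 is t, 2 is t-1, 1 is t-2]

$y_{i1}$ is a valid instrument for $(y_{i2} - y_{i1})$, since these are highly correlated, and $y_{i1}$ is not correlated with $(\nu_{i3} - \nu_{i2})$ unless the $\nu_{it}$ are serially correlated. At t=4,

$$y_{i4} - y_{i3} = \delta\,(y_{i3} - y_{i2}) + (\nu_{i4} - \nu_{i3}) \qquad (19)$$

Here $y_{i1}$ and $y_{i2}$ are both valid instruments: neither is correlated with $(\nu_{i4} - \nu_{i3})$ unless the $\nu_{it}$ are serially correlated. Proceeding in this manner, we can see that at T, the valid instrument set is ($y_{i1}$, $y_{i2}$, ..., $y_{i,T-2}$).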

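The matrix of instruments is $W = [W_1', \ldots, W_N']'$, where

$$W_i = \begin{bmatrix} [y_{i1}] & & & 0 \\ & [y_{i1}, y_{i2}] & & \\ & & \ddots & \\ 0 & & & [y_{i1}, \ldots, y_{i,T-2}] \end{bmatrix}$$

[NB The top row of this matrix refers to t=3, and the last to t=T: it is block diagonal with T-2 rows, one per differenced equation, and (T-2)(T-1)/2 columns, one per instrument.]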
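The block-diagonal layout is easy to get wrong by one period, so a small sketch of how one individual's instrument matrix could be assembled may help. The function name is hypothetical and the series values are arbitrary illustrative numbers.

```python
def instrument_matrix(y):
    """Block-diagonal instrument matrix for one individual.

    y: levels [y_1, ..., y_T]. Row r (0-based) belongs to the differenced
    equation at period t = r + 3 and holds y_1, ..., y_{t-2} in its own block.
    """
    T = len(y)
    n_cols = (T - 2) * (T - 1) // 2
    rows = []
    offset = 0
    for t in range(3, T + 1):
        block = y[: t - 2]                       # y_1, ..., y_{t-2}
        row = [0.0] * n_cols
        row[offset: offset + len(block)] = block
        rows.append(row)
        offset += len(block)
    return rows


W_i = instrument_matrix([10.0, 11.0, 12.0, 13.0, 14.0])   # T = 5
for row in W_i:
    print(row)
```

For T = 5 this gives three rows (t = 3, 4, 5) and six columns, with each row's instruments occupying columns that no other row uses.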

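The moment conditions are given by

$$E[y_{i,t-s}(\nu_{it} - \nu_{i,t-1})] = 0, \quad s = 2, \ldots, t-1; \quad t = 3, \ldots, T \qquad (20)$$

or, in vector form,

$$E(W_i'\,\Delta\nu_i) = 0 \qquad (21)$$

where

$$\Delta\nu_i = (\nu_{i3} - \nu_{i2}, \ldots, \nu_{iT} - \nu_{i,T-1})' \qquad (22)$$

There are m=(T-2)(T-1)/2 linear moment restrictions for $T \ge 3$.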

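Premultiplying (15) (here written in vector form) by $W'$ gives

$$W'\Delta y = W'(\Delta y_{-1})\,\delta + W'\Delta\nu \qquad (23)$$

Performing generalised least squares (GLS) on (23) gives the Arellano-Bond (1991) preliminary one-step consistent estimator:

$$\hat\delta_1 = \big[(\Delta y_{-1})'W\,(W'(I_N \otimes G)W)^{-1}\,W'(\Delta y_{-1})\big]^{-1}\big[(\Delta y_{-1})'W\,(W'(I_N \otimes G)W)^{-1}\,W'(\Delta y)\big] \qquad (24)$$

where $E(\Delta\nu_i\,\Delta\nu_i') = \sigma_\nu^2 G$ and G is a (T-2) square matrix with twos in the main diagonal, minus ones in the first subdiagonals and zeros otherwise:

$$G = \begin{bmatrix} 2 & -1 & 0 & \cdots & 0 \\ -1 & 2 & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & -1 & 2 & -1 \\ 0 & \cdots & 0 & -1 & 2 \end{bmatrix}$$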
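The pattern of twos and minus ones in G is simply the covariance structure that first-differencing imposes on serially uncorrelated errors. As a sketch (function names hypothetical), if D is the matrix that turns the level errors for periods 2 to T into first differences, then the variance of the differenced errors is proportional to D D', and D D' reproduces G:

```python
def difference_operator(T):
    """(T-2) x (T-1) matrix mapping (nu_2, ..., nu_T) to (nu_3 - nu_2, ..., nu_T - nu_{T-1})."""
    rows = []
    for r in range(T - 2):
        row = [0] * (T - 1)
        row[r] = -1
        row[r + 1] = 1
        rows.append(row)
    return rows


def times_transpose(D):
    """Return the product D D'."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in D] for row_i in D]


G = times_transpose(difference_operator(5))   # T = 5 gives a 3 x 3 matrix
for row in G:
    print(row)
```

Adjacent differences share one error term with opposite signs, which is where the minus ones come from; differences two or more periods apart share none, hence the zeros.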

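Arellano and Bond also put forward a consistent 2-step generalised method of moments (GMM) estimator:

$$\hat\delta_2 = \big[(\Delta y_{-1})'W\,\hat V_N^{-1}\,W'(\Delta y_{-1})\big]^{-1}\big[(\Delta y_{-1})'W\,\hat V_N^{-1}\,W'(\Delta y)\big] \qquad (25)$$

where $V_N = \sum_{i=1}^N W_i'(\Delta\nu_i)(\Delta\nu_i)'W_i$, and in practice the differenced residuals from the preliminary one-step consistent estimator $\hat\delta_1$ are used in place of $\Delta\nu_i$.

  • Why should we use $\hat\delta_2$ rather than $\hat\delta_1$? Because $\hat\delta_2$ does not rely on knowledge about the distribution of the components of the error, $\mu_i$ and $\nu_{it}$; $\hat\delta_1$ and $\hat\delta_2$ are asymptotically equivalent if the $\nu_{it}$ are IID(0,$\sigma_\nu^2$). (Nor does $\hat\delta_2$ require knowledge about initial conditions, $y_{i0}$.)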
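The one-step and two-step formulas can be sketched in a few dozen lines for the AR(1) model without regressors. This is an illustrative standard-library implementation under stated assumptions, not the DPD code: it assumes a balanced panel, all function names are hypothetical, and the simulated data (true coefficient 0.5, 200 individuals, T = 5) are made up for the demonstration. Because the coefficient is a scalar, each estimator reduces to a ratio of two quadratic forms in a = W'Δy_{-1} and b = W'Δy.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def w_rows(y):
    """Instrument rows: the row for period t = 3..T holds y_1..y_{t-2} in its own block."""
    T = len(y)
    m = (T - 2) * (T - 1) // 2
    rows, off = [], 0
    for t in range(3, T + 1):
        row = [0.0] * m
        row[off:off + t - 2] = y[:t - 2]
        rows.append(row)
        off += t - 2
    return rows


def wt_times(W, v):
    """Return the vector W_i' v."""
    return [sum(W[r][c] * v[r] for r in range(len(v))) for c in range(len(W[0]))]


def gmm(a, b, weight):
    """(a' weight^-1 b) / (a' weight^-1 a) for a symmetric weight matrix."""
    z = solve(weight, a)
    return sum(p * q for p, q in zip(z, b)) / sum(p * q for p, q in zip(z, a))


def pieces(panel):
    """Per-individual (W_i, dy_i, dy_lag_i) and the sums a = W'dy_lag, b = W'dy."""
    data = []
    for y in panel:
        dy = [y[t] - y[t - 1] for t in range(2, len(y))]
        dy_lag = [y[t - 1] - y[t - 2] for t in range(2, len(y))]
        data.append((w_rows(y), dy, dy_lag))
    a = [sum(col) for col in zip(*[wt_times(W, dl) for W, _, dl in data])]
    b = [sum(col) for col in zip(*[wt_times(W, d) for W, d, _ in data])]
    return data, a, b


def one_step(panel):
    data, a, b = pieces(panel)
    m = len(a)
    A = [[0.0] * m for _ in range(m)]            # sum over i of W_i' G W_i
    for W, _, _ in data:
        for r in range(len(W)):
            for s in range(len(W)):
                g = 2.0 if r == s else -1.0 if abs(r - s) == 1 else 0.0
                if g:
                    for c1 in range(m):
                        for c2 in range(m):
                            A[c1][c2] += W[r][c1] * g * W[s][c2]
    return gmm(a, b, A)


def two_step(panel):
    d1 = one_step(panel)
    data, a, b = pieces(panel)
    m = len(a)
    V = [[0.0] * m for _ in range(m)]            # sum over i of W_i' e_i e_i' W_i
    for W, dy, dy_lag in data:
        g = wt_times(W, [u - d1 * v for u, v in zip(dy, dy_lag)])  # one-step residuals
        for c1 in range(m):
            for c2 in range(m):
                V[c1][c2] += g[c1] * g[c2]
    return gmm(a, b, V)


random.seed(0)
panel = []
for _ in range(200):
    mu = random.gauss(0, 1)
    y = [mu + random.gauss(0, 1)]
    for _ in range(4):
        y.append(0.5 * y[-1] + mu + random.gauss(0, 1))
    panel.append(y)
d1, d2 = one_step(panel), two_step(panel)
print(d1, d2)
```

The only difference between the two steps is the weighting matrix: the first uses the known structure G, the second uses the outer products of the one-step residuals, which is why it needs no distributional assumption.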

What happens if we have additional independent explanatory variables in our equation?

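Specifically, assume there are K additional independent explanatory variables, so we revert to equation (1):

$$y_{it} = \delta y_{i,t-1} + x_{it}'\beta + u_{it} \qquad (1)$$

where $u_{it} = \mu_i + \nu_{it}$ and $\beta$ and $x_{it}$ are $K \times 1$.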

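The two-step estimators for $\delta$ and $\beta$ are given by:

$$\begin{pmatrix} \hat\delta \\ \hat\beta \end{pmatrix} = \big([\Delta y_{-1}, \Delta X]'W\,\hat V_N^{-1}\,W'[\Delta y_{-1}, \Delta X]\big)^{-1}\big([\Delta y_{-1}, \Delta X]'W\,\hat V_N^{-1}\,W'\Delta y\big) \qquad (26)$$

where $\Delta X$ is the $N(T-2) \times K$ matrix of observations on $\Delta x_{it}$. The one-step estimator is obtained if $\hat V_N$ is replaced by $W'(I_N \otimes G)W$ (cf. (25) and (24) above).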

The instrument matrix W can be expanded to take advantage of the additional independent explanatory variables. The instrument matrix that is optimal (i.e. efficient) differs according to whether the additional explanatory variables are correlated with the fixed effects or not, and whether they are predetermined or strictly exogenous.

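If the $x_{it}$ are all correlated with the fixed effects $\mu_i$:

1. If the $x_{it}$ are predetermined, then future values of these regressors are correlated with the current error, i.e. $E(x_{it}\nu_{is}) \neq 0$ for $s < t$ and 0 otherwise. Then we can use the $x_{it}$ as instruments only up to the date of the earliest error in the differenced error term. At time s, only $x_{i1}', \ldots, x_{i,s-1}'$ are valid instruments in the first-differenced version of (1) (only up to $x_{i,s-1}$ because the differenced error includes $\nu_{i,s-1}$). Then the optimal instrument matrix will be:

$$W_i = \operatorname{diag}\big([y_{i1}, x_{i1}', x_{i2}'],\; [y_{i1}, y_{i2}, x_{i1}', x_{i2}', x_{i3}'],\; \ldots,\; [y_{i1}, \ldots, y_{i,T-2}, x_{i1}', \ldots, x_{i,T-1}']\big)$$

2. If the $x_{it}$ are strictly exogenous, i.e. $E(x_{it}\nu_{is}) = 0$ for all t,s, then all the $x_{it}$ are valid instruments, so all of $x_{i1}', \ldots, x_{iT}'$ will appear in all elements of the leading diagonal of the optimal instrument matrix:

$$W_i = \operatorname{diag}\big([y_{i1}, x_{i1}', \ldots, x_{iT}'],\; [y_{i1}, y_{i2}, x_{i1}', \ldots, x_{iT}'],\; \ldots,\; [y_{i1}, \ldots, y_{i,T-2}, x_{i1}', \ldots, x_{iT}']\big)$$

3. The optimal instrument matrix when $x_{it}$ includes both predetermined and strictly exogenous variables combines the two cases: each diagonal block contains all periods of the strictly exogenous variables, but only periods up to s-1 of the predetermined ones.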
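The difference between cases 1 and 2 is just how far forward the regressor may be used in each block. The following sketch lists the instruments for the differenced equation at each period; the function name and the string labels are hypothetical, and a single regressor x is assumed for simplicity.

```python
def instruments_for_period(s, T, x_status):
    """Labels of valid instruments for the first-differenced equation at period s (3 <= s <= T).

    x_status: "predetermined" or "strictly_exogenous" (one regressor x assumed).
    """
    y_part = [f"y_{t}" for t in range(1, s - 1)]             # y_1, ..., y_{s-2}
    last_x = s - 1 if x_status == "predetermined" else T     # x_1..x_{s-1} or x_1..x_T
    x_part = [f"x_{t}" for t in range(1, last_x + 1)]
    return y_part + x_part


T = 5
for s in range(3, T + 1):
    print(s, instruments_for_period(s, T, "predetermined"))
    print(s, instruments_for_period(s, T, "strictly_exogenous"))
```

A mixed case would call the same logic separately for each regressor and concatenate the results within each block.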

If at least some of the $x_{it}$ are not correlated with the fixed effects $\mu_i$:

We can exploit that lack of correlation by estimating levels as well as differenced versions of the equations, and using extra restrictions in the levels equations. Specifically, let a subset $x_{1it}$ of $x_{it}$ be uncorrelated with $\mu_i$.

4. If $x_{1it}$ are predetermined, observations on $x_{1it}$ up to and including t=s are valid instruments for the levels equation at t=s. So at t=2 - the first levels equation we can estimate - we can use $x_{1i1}$ and $x_{1i2}$ as additional instruments for the levels equation, and for t=3,...,T we can use $x_{1it}$ as an additional instrument. All other restrictions that could be placed on the levels equations have effectively already been imposed through the restrictions placed on the equations in differences (i.e. the additional restrictions are redundant). For example, at t=3, the only additional instrument we can get from the $x_{1it}$ is $x_{1i3}$, since $x_{1i1}$ and $x_{1i2}$ are already used in the difference equation for t=3. This means that there are T extra restrictions in the levels equations:

$E(u_{i2}x_{1i1}) = 0$ and $E(u_{it}x_{1it}) = 0$, t=2,...,T.

In estimation, the levels equations from t=2 to t=T are stacked under the equations in differences (which themselves run from t=3 to t=T). The optimal instrument matrix can then be written:

$$W_i^+ = \begin{bmatrix} W_i & 0 \\ 0 & \operatorname{diag}\big([x_{1i1}', x_{1i2}'],\, x_{1i3}',\, \ldots,\, x_{1iT}'\big) \end{bmatrix}$$

5. If $x_{1it}$ are strictly exogenous, observations on $x_{1it}$ for all t become valid instruments in the levels equations. But we still only have T extra restrictions:

$E\big(T^{-1}\sum_{s=1}^{T} u_{is}x_{1it}\big) = 0$, t=1,...,T,

given those already exploited for the equations in first differences. This implies that the 2-step estimator would combine the (T-1) first difference equations and the average level equation.

Tests for the validity of the GMM estimator

The GMM estimator is consistent if there is no second-order serial correlation in the error term of the first-differenced equation: it requires $E(\Delta\nu_{it}\,\Delta\nu_{i,t-2}) = 0$. A test for the validity of the instruments (and the moment restrictions) is a test of second-order serial correlation in these residuals. The test statistic is

$$m_2 = \frac{\Delta\hat\nu_{-2}'\,\Delta\hat\nu_*}{\hat v^{1/2}} \sim N(0,1) \qquad (27)$$

under the null $E(\Delta\nu_{it}\,\Delta\nu_{i,t-2}) = 0$, where $\Delta\hat\nu_{-2}$ is the vector of differenced residuals lagged twice and $\Delta\hat\nu_*$ is the vector of differenced residuals trimmed to match it. m2 is only defined for $T \ge 5$, since it involves differenced residuals two periods apart. The term $\hat v$ has a completely hideous formula that need not concern us but is given in Arellano and Bond equation (9), p.284. (The m2 test might not reject if the residuals in levels follow a random walk, as well as if the errors in levels are not serially correlated of order one. To exclude the former (when OLS as well as GMM would be consistent), you could, for example, check that the first-differenced residuals have first-order serial correlation.)

The most common test of the instruments is Sargan’s (1958) test of over-identifying restrictions (reference: Sargan, J. D. (1958), “The estimation of economic relationships using instrumental variables”, Econometrica, vol.26, 393-415).

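$$s = \Delta\hat\nu'W\Big[\sum_{i=1}^{N} W_i'(\Delta\hat\nu_i)(\Delta\hat\nu_i)'W_i\Big]^{-1}W'\Delta\hat\nu \sim \chi^2_{p-K-1} \qquad (28)$$

where p is the number of columns in the instrument matrix, and $\Delta\hat\nu$ are the residuals from the 2-step estimation of (26). When T=4, for example, the Sargan statistic tests two linear combinations of the three moment restrictions available:

$E[(\nu_{i3} - \nu_{i2})y_{i1}] = 0$, $E[(\nu_{i4} - \nu_{i3})y_{i1}] = 0$, $E[(\nu_{i4} - \nu_{i3})y_{i2}] = 0$.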

In this case, the Sargan test is available when the m2 test is not (there are no differenced residuals 2 periods apart, as involved in the m2 test).
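The count of over-identifying restrictions in the T=4 example can be reproduced with a short calculation. This sketch assumes the instrument matrix contains only lagged levels of the dependent variable, so that it has (T-2)(T-1)/2 columns; the function name is hypothetical.

```python
def sargan_degrees_of_freedom(T, K=0):
    """p - K - 1, where p = (T-2)(T-1)/2 instrument columns (lagged levels of y only)."""
    p = (T - 2) * (T - 1) // 2
    return p - K - 1


for T in (3, 4, 5):
    print(T, sargan_degrees_of_freedom(T))
```

With T=3 the model is exactly identified, so there is nothing for the Sargan statistic to test; T=4 is the shortest panel for which the test is available.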

A related possibility is the Sargan difference test, given by:

$$ds = s - s_I \sim \chi^2_{p - p_I} \qquad (29)$$

if the errors in levels are not serially correlated. $s_I$ is defined as the Sargan test above, except that only the instruments that remain valid when the errors in levels are MA(1) are included in the W matrix (which then has $p_I$ columns). This test clearly requires the econometrician to have a good idea about which regressors are strictly exogenous: it is these which will be used as instruments in the regression that forms the basis for the $s_I$ test. $s_I$ will not reject when errors in levels are MA(1) as well as when they are MA(0), but ds will reject when errors in levels are serially correlated, i.e. are MA(1). Since s should already show if serial correlation in levels is present, the ds test can be regarded as a back-up.

A final possibility is a Hausman test.

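$$h = (\hat\theta_I - \hat\theta)'\big[\widehat{\operatorname{avar}}(\hat\theta_I) - \widehat{\operatorname{avar}}(\hat\theta)\big]^{-}(\hat\theta_I - \hat\theta) \sim \chi^2_r \qquad (30)$$

where $\hat\theta = (\hat\delta, \hat\beta')'$ is the 2-step estimate using the full instrument set, $\hat\theta_I$ is the 2-step estimate using only the restricted instrument set, and r=rank $\operatorname{avar}(\hat\theta_I - \hat\theta)$; if all variables except the lagged dependent variable are strictly exogenous, then r=1. [$^{-}$ indicates a generalised inverse.] Like the previous test, the h test also requires a clear idea about which columns of the X matrix are strictly exogenous.
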
Application: Employment equations for UK companies, 1979-84 (Arellano and Bond, 1991)

Arellano and Bond (1991) (AB) apply various procedures to estimate an employment equation for an unbalanced panel of 140 UK companies for which they have at least 7 continuous observations (on employment, real product wage, capital stock) between 1976 and 1984. (They have 7 observations on 103 companies, 8 on 23 and 9 on 14. Three observations per company are lost through taking first differences and including lags, so estimation is over 1979-84. 611 observations are used in estimation.)
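The sample sizes quoted above can be checked with simple arithmetic. The variable names below are illustrative; the figures are those given in the text.

```python
# observations per company -> number of companies with that many observations
companies_by_length = {7: 103, 8: 23, 9: 14}
lost_per_company = 3   # observations lost to first-differencing and lags

n_companies = sum(companies_by_length.values())
n_used = sum((length - lost_per_company) * count
             for length, count in companies_by_length.items())
print(n_companies, n_used)   # prints: 140 611
```

Each company therefore contributes 4, 5 or 6 usable observations, which is why the panel remains unbalanced after differencing.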

Their basic equation is:

$$n_{it} = \alpha_1 n_{i,t-1} + \alpha_2 n_{i,t-2} + \beta'(L)x_{it} + \lambda_t + \mu_i + \nu_{it} \qquad (31)$$

where $n_{it}$ is the natural log of domestic employment in company i at the end of year t (which is the accounting year, so varies across companies) and $\beta(L)$ is a vector of polynomials in the lag operator. They include time-specific effects $\lambda_t$, where t is the calendar year in which the accounting year ends.

$x_{it}$ include the log of the real product wage (pay bill per employee divided by industry price, adjusted by average weekly hours worked in manufacturing industries), the log of an inflation-adjusted estimate of the company’s capital stock (Gross fixed assets), and the log of industry output (value added), each company having been classified into one of 9 sub-sectors of manufacturing according to their main product by sales.

The equation can be motivated along the lines of Layard and Nickell’s work. With zero adjustment costs, a price-setting firm facing a constant elasticity demand curve would choose to set employment according to a log-linear labour demand equation of the form:

$$n^*_{it} = \gamma_0 + \gamma_1 w_{it} + \gamma_2 k_{it} + \gamma_3 \sigma^e_{it} \qquad (32)$$

where $w_{it}$ is the log of the real product wage, $k_{it}$ is the log of gross capital, and $\sigma^e_{it}$ is a measure of the expected demand for the firm’s product relative to potential output (industry output captures industry demand shocks in the estimated equation (31), and time dummies capture aggregate demand shocks). If it is costly to change employment, actual employment will deviate from $n^*_{it}$ in the short run, suggesting a lag structure as in (31).

AB’s Table 4 shows GMM estimates of equation (31) (based on first differences) [we will ignore column (d) since it is not relevant for our purposes]. Table 5 shows other estimates (the two Anderson-Hsiao estimators, OLS and Within-groups). Employment adjustment does appear to take 2 years, employment responds negatively to current wage rises, and positively to (industry) output shocks, and is higher, the higher is the capital stock. Column (c) is AB’s preferred specification, and suggests a long-run wage elasticity of -0.24 (but s.e.=0.28), and a long-run elasticity w.r.t. capital of 0.7 (s.e.=0.14). Employment appears affected by changes in industry output (coefficients of 0.890 on current and -0.875 on lagged output), which accords with the Layard-Nickell interpretation that employment responds to movements in demand relative to potential output.
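The long-run elasticities come from setting every lag equal to its steady-state value in (31). As a sketch, write $\alpha_1$ and $\alpha_2$ for the coefficients on the two lags of employment, and $\beta_0$ and $\beta_1$ for the coefficients on the current and lagged value of one regressor (the last two symbols, and the label $ys_t$ for log industry output, are introduced here for illustration only). The second expression assumes the two industry-output figures are the coefficients on current and lagged output, with the lagged one entering negatively:

```latex
\text{long-run elasticity} = \frac{\beta_0 + \beta_1}{1 - \alpha_1 - \alpha_2},
\qquad
0.890\,ys_{t} - 0.875\,ys_{t-1} = 0.875\,\Delta ys_{t} + 0.015\,ys_{t}.
```

Under that reading almost all of the output effect loads on the change in output, and the level effect (0.015) is close to zero, which is the sense in which employment responds to changes in industry output.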

Columns (a1) and (a2) instrument the lags of the dependent variable with the efficient levels of employment as in the $W_i$ matrix above. Despite assuming the other regressors are exogenous, AB don’t exploit any additional restrictions (which would be as in 2. above if the regressors were correlated with the individual effect, and as in 5. if they were not). (a1) shows 1-step estimates and (a2) shows 2-step estimates. Increased efficiency resulting from the 2nd step might be shown in the roughly 30%-lower (asymptotic) standard errors, but AB also refer to simulation results in which they found 2-step standard errors to be biased downwards (in finite samples) by around 20% (see AB p.285). Column (b) omits insignificant dynamics from the 2-step model, with little change in the long-run properties.

Column (c) allows for the fact that the real wage and capital stock may be endogenous. In principle, these would each be instrumented with own (t-2)-and-earlier lags. In practice, only (t-2) and (t-3) lags are used as instruments due to computational complexity and relatively small sample size, but additional instruments are used (lags of company sales and inventories).

The instrument tests discussed above are reported under the coefficient estimates. None of the m2, the Sargan s, and the difference-Sargan ds tests reject the null of serially uncorrelated errors in the levels equations for the 2-step GMM estimator. The s and ds tests reject for the 1-step estimator, but AB’s simulation suggested these tests reject too often in the presence of heteroskedasticity. The Hausman test rejects for both 1- and 2-step, but again, in simulations, AB found this test over-rejects.

AB hypothesise that the rejections reflect the fact that some of the regressors that have been assumed exogenous are in fact endogenous. When these variables - wages and capital - are instrumented (column (c)), none of the tests reject the necessary null of no serial correlation in the levels disturbances.

Column (e) of Table 5 reports the Anderson-Hsiao estimator with the differenced lagged dependent variable instrumented with the own differenced third lag. The number of observations falls as one further observation per individual is lost (estimation is over 1980-84). Column (f) reports the other Anderson-Hsiao estimator, using the third lag of the level as the instrument. In both cases, the estimates are poorly determined: there appears to be a large gain in efficiency through using the additional instruments in the AB GMM procedure.

Column (g) reports OLS estimates (these are over 1978-84 as one observation is gained). The lagged dependent variable coefficient is biased upward, as we would expect in the presence of firm-specific effects (which we have of course been assuming).

Column (h) reports Within-groups estimates (again, a year is gained). Surprisingly, the coefficient on the lagged dependent variable is greater than that using the GMM estimators (we would expect it to be biased downwards in the presence of fixed effects). AB point out that the endogeneity of some regressors could cloud the comparison between Within-groups and GMM.

Using DPD

The above relates directly to the estimators available in DPD. The package gives the options:

State form of model - type

0 for levels
1 for first differences
2 for orthogonal deviations
3 for combined first differences and levels
4 for combined first differences and average level
5 for combined orthogonal deviations and levels
6 for combined orthogonal deviations and average level
7 for within groups
8 for error components generalised least squares

Model 0 - levels

is OLS. Use this if you have a static model and don’t want to allow intercepts to vary across individuals.

Model 1 - first differences

is OLS on first differences, i.e. the fixed effects model transformed using first differencing to remove the fixed effects. Use this if you have a static model in which you think intercepts vary across individuals and are non-random. Arguments for fixed rather than random include: my regressors are correlated with the individual effects; I am content to make inferences conditional on the set of individuals in my dataset (e.g. I can talk happily about what matters for OECD countries, and I don’t want to make inferences about all countries in the world); everyone else uses the fixed effects model too.

Model 7 - within groups

is OLS on data demeaned by the Within transformation, i.e. the fixed effects model transformed by subtracting time-means to eliminate the fixed effects. Use this if you have a static model in which you think intercepts vary across individuals and are non-random. Rationales for using this model are as for Model 1, plus additional efficiency since you don’t lose a time period through differencing.

Model 2 - orthogonal deviations

is OLS on data demeaned by the orthogonal deviations transformation, i.e. the fixed effects model transformed by subtracting forward time-means to eliminate the fixed effects (see Note below for more detail on orthogonal deviations). Use this if you have a static model in which you think intercepts vary across individuals and are non-random. Rationales for using this model are as for Model 7. In addition, it has computational advantages: it reduces the size of the computational problem of calculating the instrumental variables estimators. [Arellano and Bover (1995), “Another look at the instrumental variables estimation of error-component models”, Journal of Econometrics, vol.68, 29-51, has more detail and motivation.]
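As a sketch of what the orthogonal deviations transformation does to one individual's series, the function below subtracts from each observation the mean of all later observations and rescales so that serially uncorrelated, equal-variance errors keep the same variance. The function name is hypothetical, and the scaling used, the square root of (T-t)/(T-t+1), is the one from Arellano and Bover (1995) as I understand it.

```python
import math


def orthogonal_deviations(x):
    """Forward orthogonal deviations of one series x_1..x_T (returns T-1 values)."""
    T = len(x)
    out = []
    for t in range(T - 1):
        remaining = T - 1 - t                        # number of future observations
        future_mean = sum(x[t + 1:]) / remaining
        scale = math.sqrt(remaining / (remaining + 1))
        out.append(scale * (x[t] - future_mean))
    return out


series = [1.0, 2.0, 4.0]
shifted = [v + 5.0 for v in series]                  # same series plus a fixed effect of 5
print(orthogonal_deviations(series))
print(orthogonal_deviations(shifted))
```

Adding a constant to the series leaves the output unchanged, which is how the fixed effect is eliminated. The last observation is lost, as with first differencing, but the sum of squares of the transformed series equals the Within sum of squares.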