
EC322 APPLIED ECONOMETRICS

Panel Data 2

1. DYNAMIC PANEL DATA MODELS

Plan

1. Introduction

2. Bias when intercepts are heterogeneous

3. Application: The demand for natural gas (Balestra and Nerlove (1966))

4. Proposed solutions

5. Applications: The demand for cigarettes (Baltagi and Levin (1992) and Baltagi (1995)); Employment equations for UK companies (Arellano and Bond (1991))

6. Bias when slope parameters are heterogeneous, in the cases of stationarity and nonstationarity

7. Applications: Real wage determination in OECD countries (Robertson and Symons (1992)); Labour demand across UK industries (Pesaran and Smith (1995)); A multi-country analysis of growth and convergence (Lee, Pesaran and Smith (1996))

References

Baltagi (1995), ch.8 “Dynamic panel data models”.

Hsiao (1986), ch.4 “Dynamic models with variable intercepts”.

1. Introduction

Dynamic models are important: many economic relationships are dynamic in nature and should be modelled as such. The time dimension of panel data enables us to capture the dynamics of adjustment.

A dynamic model is characterised by the presence of a lagged dependent variable among the regressors:

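$$y_{it} = \gamma\, y_{i,t-1} + x_{it}'\beta + u_{it}, \qquad i = 1,\dots,N;\; t = 1,\dots,T \qquad (1)$$

where $\gamma$ is a scalar, and $x_{it}$ and $\beta$ are each $K \times 1$.

Assume that the $u_{it}$ follow a one-way error component model:

$$u_{it} = \mu_i + \nu_{it}$$

where $\mu_i \sim \mathrm{IID}(0, \sigma_\mu^2)$ and $\nu_{it} \sim \mathrm{IID}(0, \sigma_\nu^2)$, independent of each other and among themselves.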

2. Bias

(a) Bias in the OLS estimator

The OLS estimator is unbiased and consistent when all explanatory variables are exogenous and are uncorrelated with the individual specific effects. (However, because the OLS estimator ignores the error-component structure of the model, it is not efficient.) But things are different when the model includes a lagged dependent variable.

Rewrite (1):

$$y_{it} = \gamma\, y_{i,t-1} + \alpha_i + u_{it} \qquad (2)$$

For simplicity, the regressors have been omitted. The $\alpha_i$ are individual effects: there is no need here to specify whether they are fixed or random. As before, $\gamma$ is a scalar. Add the condition that

$$|\gamma| < 1$$

so that $y_{it}$ is (weakly) stationary (we will investigate the consequences of relaxing this condition later on). Assume $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$ over i and t.

We can show that the OLS estimator will be seriously biased due to correlation of the lagged dependent variable with the individual specific effects. Since $y_{it}$ is a function of $\alpha_i$ (or $\mu_i$ in (1)), $y_{i,t-1}$ is also a function of $\alpha_i$. Therefore $y_{i,t-1}$, a right-hand regressor, is correlated with the error term. This renders the OLS estimator biased and inconsistent, even if the $u_{it}$ (or $\nu_{it}$ in (1)) are not serially correlated.

This holds whether the individual effects are considered fixed or random. (Note in passing that, with fixed effects, we would be unlikely to use OLS without also including individual intercepts, in which case (b) below applies. With random effects, however, we might attempt OLS, since the OLS estimator is unbiased in the static case.)

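The OLS estimator of $\gamma$ is:

$$\hat\gamma_{OLS} = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} y_{it}\, y_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=1}^{T} y_{i,t-1}^2} = \gamma + \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} (\alpha_i + u_{it})\, y_{i,t-1}}{\sum_{i=1}^{N}\sum_{t=1}^{T} y_{i,t-1}^2} \qquad (3)$$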

The probability limit of the second term of (3) gives the asymptotic bias of the OLS estimator.

  • The OLS estimator exists if the denominator of the second term of (3) is nonzero.
  • The OLS estimator is consistent if the numerator of the second term of (3) converges to zero.

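Regarding the numerator of the second term of (3), it can be shown that as N tends to infinity,

$$\operatorname*{plim}_{N\to\infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (\alpha_i + u_{it})\, y_{i,t-1} = \frac{1}{T}\,\frac{1-\gamma^{T}}{1-\gamma}\,\operatorname{Cov}(y_{i0}, \alpha_i) + \frac{1}{T}\,\frac{\sigma_\alpha^2}{(1-\gamma)^2}\left[(T-1) - T\gamma + \gamma^{T}\right] \qquad (4)$$

And for the denominator of the second term of (3),

$$\operatorname*{plim}_{N\to\infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} y_{i,t-1}^2 = \frac{1-\gamma^{2T}}{T(1-\gamma^2)}\,\operatorname*{plim}_{N\to\infty}\frac{\sum_{i} y_{i0}^2}{N} + \frac{\sigma_\alpha^2}{(1-\gamma)^2}\,\frac{1}{T}\left[T - 2\,\frac{1-\gamma^{T}}{1-\gamma} + \frac{1-\gamma^{2T}}{1-\gamma^2}\right] + \frac{2}{T(1-\gamma)}\left[\frac{1-\gamma^{T}}{1-\gamma} - \frac{1-\gamma^{2T}}{1-\gamma^2}\right]\operatorname{Cov}(y_{i0}, \alpha_i) + \frac{\sigma_u^2}{T(1-\gamma^2)^2}\left[(T-1) - T\gamma^2 + \gamma^{2T}\right] \qquad (5)$$

We need to make some assumption about the initial values $y_{i0}$. It is normally assumed that these are arbitrary constants, in which case $\operatorname{Cov}(y_{i0}, \alpha_i) = 0$, or that they are generated by the same process that generates the other $y_{it}$, in which case $\operatorname{Cov}(y_{i0}, \alpha_i)$ is positive. As long as the $y_{i0}$ are bounded, so that $\operatorname{plim} \sum_i y_{i0}^2 / N$ is finite:

  • The bias in the OLS estimator is positive: the OLS estimator overstates the true autocorrelation coefficient when N or T or both tend to infinity.
  • The bias is larger, the larger is the variance of the individual effects, $\sigma_\alpha^2$.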
  • Monte Carlo studies have shown that the above two (asymptotic) results also hold in finite samples.
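The positive OLS bias can be illustrated with a small simulation. The sketch below is an illustration under stated assumptions, not a replication of any published Monte Carlo study: it assumes normally distributed individual effects and disturbances, initial values drawn from the same process after a burn-in, and a no-regressor model with zero-mean effects. The function names `simulate_panel` and `pooled_ols` are hypothetical.

```python
import random

def simulate_panel(n, t, gamma, sigma_alpha, sigma_u, burn_in=30, seed=0):
    """Simulate y_it = gamma*y_{i,t-1} + alpha_i + u_it for n units.

    Returns a list of n series, each holding y_i0, ..., y_iT (t + 1 values).
    Assumption: alpha_i and u_it are normal; y_i0 comes from the same process.
    """
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, sigma_alpha)
        y = alpha / (1.0 - gamma)              # start at the unit's long-run mean
        for _ in range(burn_in):               # discard start-up values
            y = gamma * y + alpha + rng.gauss(0.0, sigma_u)
        series = [y]                           # this is y_i0
        for _ in range(t):
            y = gamma * y + alpha + rng.gauss(0.0, sigma_u)
            series.append(y)
        panel.append(series)
    return panel

def pooled_ols(panel):
    """Pooled OLS of y_it on y_{i,t-1}, ignoring the individual effects."""
    num = den = 0.0
    for series in panel:
        for lag, cur in zip(series[:-1], series[1:]):
            num += lag * cur
            den += lag * lag
    return num / den

if __name__ == "__main__":
    true_gamma = 0.5
    for sigma_alpha in (0.5, 2.0):
        panel = simulate_panel(3000, 5, true_gamma, sigma_alpha, 1.0, seed=1)
        print(sigma_alpha, round(pooled_ols(panel), 3))
```

Under these assumptions both estimates should lie above the true value of 0.5, and the estimate with the larger effect variance should lie further above it.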

(b) Bias in the fixed effects model

The bias and inconsistency of the OLS estimator stems from correlation of the lagged dependent variable with the individual specific effects. It might therefore be thought that the Within transformation,

$$y_{it} - \bar y_i, \qquad \bar y_i = \frac{1}{T}\sum_{t=1}^{T} y_{it},$$

which wipes out the individual effects, would eliminate the bias. But it does not solve the problem.

Rewrite (1):

$$y_{it} = \gamma\, y_{i,t-1} + \alpha_i + u_{it} \qquad (6)$$

Again, regressors have been omitted. As before, $\gamma$ is a scalar and $|\gamma| < 1$. Other assumptions: the $\alpha_i$ are fixed, and $u_{it} \sim \mathrm{IID}(0, \sigma_u^2)$. (We can either let $\alpha_i^* = \mu + \alpha_i$, where $\mu$ is the non-varying intercept in the regression, or impose the restriction $\sum_i \alpha_i = 0$. Otherwise there is perfect multicollinearity between the set of individual dummies and the non-varying intercept $\mu$, and we can only estimate $(\mu + \alpha_i)$, not each component separately.)

Let $\bar y_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$, $\bar y_{i,-1} = \frac{1}{T}\sum_{t=1}^{T} y_{i,t-1}$ (Hsiao, p.73), i.e. $\frac{1}{T}\sum_{t=0}^{T-1} y_{it}$ (Nickell (1981), p.1419) (Baltagi, p.126, uses $\sum_{t=2}^{T} y_{i,t-1}/(T-1)$, which is the approximation we would have to use in practice), and $\bar u_i = \frac{1}{T}\sum_{t=1}^{T} u_{it}$.

We can show that the LSDV estimator will be biased for small (‘fixed’) T. The bias in the fixed effects estimator is caused by having to eliminate the unknown individual effects (constants) from each observation. This creates a correlation of order (1/T) (a bias of O(1/T)) between the explanatory variables in the (Within-) transformed (i.e. demeaned) model and the residuals. $(y_{i,t-1} - \bar y_{i,-1})$ will be correlated with $(u_{it} - \bar u_i)$ even if the $u_{it}$ are not serially correlated. This is because $y_{i,t-1}$ is correlated with $\bar u_i$ by construction: $\bar u_i$ is an average containing $u_{i,t-1}$, which is obviously correlated with $y_{i,t-1}$. These correlations do not go to zero as N tends to infinity.

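The LSDV estimators for $\alpha_i$ and $\gamma$ are:

$$\hat\alpha_i = \bar y_i - \hat\gamma_{LSDV}\, \bar y_{i,-1}, \qquad i = 1,\dots,N \qquad (7)$$

$$\hat\gamma_{LSDV} = \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{it} - \bar y_i)(y_{i,t-1} - \bar y_{i,-1})}{\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{i,t-1} - \bar y_{i,-1})^2} = \gamma + \frac{\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{i,t-1} - \bar y_{i,-1})(u_{it} - \bar u_i)/NT}{\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{i,t-1} - \bar y_{i,-1})^2/NT} \qquad (8)$$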

(See Nickell (1981), “Biases in dynamic models with fixed effects”, Econometrica, 49 (6), 1417-26, for an explicit derivation.)

The probability limit of the second term of (8) gives the asymptotic bias of the LSDV estimator of the autocorrelation coefficient.

  • The LSDV estimators exist if the denominator of the second term of (8) is nonzero.
  • The LSDV estimators are consistent if the numerator of the second term of (8) converges to zero.

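Regarding the numerator of the second term of (8), it can be shown that as N tends to infinity,

$$\operatorname*{plim}_{N\to\infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{i,t-1} - \bar y_{i,-1})(u_{it} - \bar u_i) = -\frac{\sigma_u^2}{T^2}\,\frac{(T-1) - T\gamma + \gamma^{T}}{(1-\gamma)^2} \qquad (9)$$

And for the denominator of the second term of (8),

$$\operatorname*{plim}_{N\to\infty} \frac{1}{NT}\sum_{i=1}^{N}\sum_{t=1}^{T} (y_{i,t-1} - \bar y_{i,-1})^2 = \frac{\sigma_u^2}{1-\gamma^2}\left\{1 - \frac{1}{T} - \frac{2\gamma}{(1-\gamma)^2}\,\frac{(T-1) - T\gamma + \gamma^{T}}{T^2}\right\} \qquad (10)$$

  • If T tends to infinity, (9) tends to zero and (10) tends to a nonzero constant $\sigma_u^2/(1-\gamma^2)$. So in this case, the LSDV estimator is consistent.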
  • But if T is fixed, then no matter how large N is, (9) is a nonzero constant, and the LSDV estimators are inconsistent.
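A short simulation can make the fixed-T result concrete. The sketch below is an illustration under assumptions that are not in the notes: normal disturbances and effects, and initial values taken from the same process after a burn-in. The function names `simulate_panel` and `within_estimator` are hypothetical.

```python
import random

def simulate_panel(n, t, gamma, sigma_alpha, sigma_u, burn_in=30, seed=0):
    """Simulate y_it = gamma*y_{i,t-1} + alpha_i + u_it; each series is y_i0..y_iT."""
    rng = random.Random(seed)
    panel = []
    for _ in range(n):
        alpha = rng.gauss(0.0, sigma_alpha)
        y = alpha / (1.0 - gamma)              # start at the unit's long-run mean
        for _ in range(burn_in):               # discard start-up values
            y = gamma * y + alpha + rng.gauss(0.0, sigma_u)
        series = [y]
        for _ in range(t):
            y = gamma * y + alpha + rng.gauss(0.0, sigma_u)
            series.append(y)
        panel.append(series)
    return panel

def within_estimator(panel):
    """LSDV (Within) estimate of gamma: regress demeaned y_it on demeaned y_{i,t-1}."""
    num = den = 0.0
    for series in panel:
        lags, curs = series[:-1], series[1:]
        lag_mean = sum(lags) / len(lags)
        cur_mean = sum(curs) / len(curs)
        for lag, cur in zip(lags, curs):
            num += (lag - lag_mean) * (cur - cur_mean)
            den += (lag - lag_mean) ** 2
    return num / den

if __name__ == "__main__":
    true_gamma = 0.5
    for t, n in ((3, 4000), (30, 2000)):
        panel = simulate_panel(n, t, true_gamma, 1.0, 1.0, seed=2)
        print("T =", t, "N =", n, "estimate:", round(within_estimator(panel), 3))
```

With these settings the T=3 estimate should fall far below the true value of 0.5 despite the large N, while the T=30 estimate should be much closer to it.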

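For fixed T, the asymptotic bias in the LSDV estimator of $\gamma$ is:

$$\operatorname*{plim}_{N\to\infty} (\hat\gamma_{LSDV} - \gamma) = -\frac{1+\gamma}{T-1}\left[1 - \frac{1}{T}\,\frac{1-\gamma^{T}}{1-\gamma}\right]\left\{1 - \frac{2\gamma}{(1-\gamma)(T-1)}\left[1 - \frac{1-\gamma^{T}}{T(1-\gamma)}\right]\right\}^{-1} \qquad (11)$$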

(See Nickell, p.1422, for explicit derivation.)

  • For small T and $\gamma > 0$, the bias is always negative.
  • The bias is larger (in absolute value), the larger is $\gamma$.
  • The bias does not tend to zero as $\gamma$ tends to zero.
  • The bias is larger (in absolute value), the smaller is T.
  • When T is large, the right-hand side variables become asymptotically uncorrelated; the bias tends to zero as T tends to infinity.

When T=2, the asymptotic bias is equal to $-(1+\gamma)/2$.

When T=3, the asymptotic bias is equal to $-(2+\gamma)(1+\gamma)/[2(3+\gamma)]$.

When T is reasonably large, the asymptotic bias is approximately equal to $-(1+\gamma)/(T-1)$.
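The fixed-T bias expression can be evaluated directly. The sketch below codes the expression as given in Nickell (1981), written in terms of $\gamma$, together with the large-T approximation $-(1+\gamma)/(T-1)$. The function names are hypothetical, and the formula assumes $|\gamma| < 1$ and $T \ge 2$.

```python
def lsdv_bias(t, gamma):
    """Asymptotic (N -> infinity) bias of the LSDV estimate of gamma for fixed T."""
    h = 1.0 - (1.0 - gamma ** t) / (t * (1.0 - gamma))
    numerator = -(1.0 + gamma) / (t - 1) * h
    denominator = 1.0 - 2.0 * gamma / ((1.0 - gamma) * (t - 1)) * h
    return numerator / denominator

def lsdv_bias_large_t(t, gamma):
    """Large-T approximation to the same bias."""
    return -(1.0 + gamma) / (t - 1)

if __name__ == "__main__":
    for t in (2, 3, 10, 30):
        row = [round(lsdv_bias(t, g), 3) for g in (0.1, 0.5, 0.9)]
        print("T =", t, row)
```

For T=2 the function reduces to $-(1+\gamma)/2$, and for T=10 with $\gamma=0.5$ it gives about -0.162, against -0.167 from the large-T approximation.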

=0.1 / =0.5 / =0.9
T=2 / -0.550 / -0.750 / -0.950
T=3 / -0.373 / -0.576 / -0.706
T=10 / -0.111 / -0.162 / -0.243
T=30 / -0.037 / -0.052 / -0.077

Table 1: Magnitude of asymptotic bias (N) in LSDV estimator of coefficient on lagged dependent variable

(based on (10))[1]

The magnitude of the bias has also been studied using Monte Carlo techniques, which confirm these analytical calculations in most cases (see e.g. Nickell pp.1424-25 for more detail).

(c) Bias in the random effects model

The problem with GLS estimation of the random effects model is similar to that of LSDV estimation of the fixed effects model. In order to apply GLS, we undertake quasi-demeaning. The resulting quasi-demeaned lagged dependent variable, $(y_{i,t-1} - \theta\, \bar y_{i,-1})$, where $\theta$ is the quasi-demeaning weight, will be correlated with the quasi-demeaned residuals $(u_{it} - \theta\, \bar u_i)$, and the GLS estimator will be biased and inconsistent.

(d) Comparison with Hurwicz bias

A standard regression, where there is just one time series (i.e. just one value of i), in any dynamic model (e.g. with a first-order autoregressive process and a constant term) on a short time series, yields estimates that are more severely biased than those considered in the LSDV case (b) above. The approximation to this standard Hurwicz bias to O(1/T) is given by

$$E\left(\frac{A}{B}\right) \approx \frac{E(A)}{E(B)} - \frac{\operatorname{Cov}(A, B)}{[E(B)]^2} + \frac{E(A)\operatorname{Var}(B)}{[E(B)]^3} \qquad (12)$$

where, for general i,

$$A_i = \sum_{t=1}^{T} (y_{i,t-1} - \bar y_{i,-1})(u_{it} - \bar u_i)$$

and

$$B_i = \sum_{t=1}^{T} (y_{i,t-1} - \bar y_{i,-1})^2.$$

To O(1/T) this standard Hurwicz bias can be approximated by $-(1+3\gamma)/T$, which is larger than that considered above (for example, at T=10 and $\gamma$=0.5 it is -0.25, against -0.162 in Table 1). The basic reason why the bias is smaller in the panel case is that for a panel we compute the bias as N → ∞, and we are thus considering E(A)/E(B), where expectations are taken across all i (so we are considering $E(A_i)/E(B_i)$). The second and third terms additionally appearing in the standard case (12) make the standard bias bigger. Why, then, should we worry so much about biases in panel data models? Because the typical panel is very much smaller in the T dimension than the typical time series.
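The comparison can be made numerically. The sketch below assumes that the single-series approximation is the familiar $-(1+3\gamma)/T$ for a first-order autoregression with a constant, and compares it with the large-T panel approximation $-(1+\gamma)/(T-1)$. Both formulas and both function names are assumptions made for illustration.

```python
def single_series_bias(t, gamma):
    """Assumed O(1/T) approximation to the bias from one time series."""
    return -(1.0 + 3.0 * gamma) / t

def panel_bias_large_t(t, gamma):
    """Assumed large-T approximation to the LSDV bias as N tends to infinity."""
    return -(1.0 + gamma) / (t - 1)

if __name__ == "__main__":
    for t in (10, 30):
        for gamma in (0.5, 0.9):
            print("T =", t, "gamma =", gamma,
                  "single series:", round(single_series_bias(t, gamma), 3),
                  "panel:", round(panel_bias_large_t(t, gamma), 3))
```

For the values shown, the single-series approximation is the larger in magnitude. For small $\gamma$ combined with very short T the two approximations can change order, so the comparison is best read as applying to moderate T.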

(e) Bias when N is also small

For small N, the second and third terms in (12) become important; e.g. at N=5, they account for 22% of the total bias (Beggs and Nerlove (1988), “Biases in dynamic models with fixed effects”, Economics Letters, 26, 29-31). But for N that are quite small in a panel context, e.g. N=25, they do not materially alter the estimates.

3. Application: The demand for natural gas (Balestra and Nerlove, 1966)

Balestra and Nerlove (1966) model residential and commercial demand for natural gas in 36 US states during 1957-62. Their model was:

$$G_{it} = \beta_0 + \beta_1 P_{it} + \beta_2 \Delta N_{it} + \beta_3 N_{i,t-1} + \beta_4 \Delta I_{it} + \beta_5 I_{i,t-1} + \beta_6 G_{i,t-1} + v_{it} \qquad (13)$$

Let the new demand for gas be G*. New demand for gas includes demand due to net increases in the stock of gas appliances plus demand due to replacement of gas appliances (assume the depreciation rate for gas appliances is r). The demand for gas at time t, $G_{it}$, is related to new demand for gas, G*, as follows:

$$G_{it} = G^*_{it} + (1-r)\, G_{i,t-1} \qquad (14)$$

New demand for gas G* was hypothesised to be determined by the price of gas, P, and total new demand for all types of fuel. Total new demand for fuel was assumed to be related to total fuel consumption by a relation similar to (14), and total fuel consumption was itself assumed to be determined by population N and per-capita income I. Assuming linear relations gives (13), where $\beta_6 = 1 - r$. The stock of gas appliances was relatively new, which suggested that r would be positive but small, so $\beta_6$ would be below, but not too much lower than, unity. The model was estimated using OLS, LSDV and GLS. Table 2 gives the resulting estimates.

Coefficient (regressor) / OLS / LSDV / GLS
$\beta_0$ (constant) / -3.650 (3.316) / --- / -4.091 (11.544)
$\beta_1$ ($P_{it}$) / -0.0451 (0.0270) / -0.2026 (0.0532) / -0.0879 (0.0468)
$\beta_2$ ($\Delta N_{it}$) / 0.0174 (0.0093) / -0.0135 (0.0215) / -0.00122 (0.0190)
$\beta_3$ ($N_{i,t-1}$) / 0.00111 (0.00041) / 0.0327 (0.0046) / 0.00360 (0.00129)
$\beta_4$ ($\Delta I_{it}$) / 0.0183 (0.0080) / 0.0131 (0.0084) / 0.0170 (0.0080)
$\beta_5$ ($I_{i,t-1}$) / 0.00326 (0.00197) / 0.0044 (0.0101) / 0.00354 (0.00622)
$\beta_6$ ($G_{i,t-1}$) / 1.010 (0.014) / 0.6799 (0.0633) / 0.9546 (0.0372)

Table 2: Estimates of Balestra and Nerlove (1966) demand for gas model, pooled sample, 1957-62

(standard errors given in parentheses after coefficient estimates)

  • The OLS coefficient on the lagged dependent variable is implausibly large (1.01), implying a negative depreciation rate. This is in accordance with the expected positive bias in OLS estimators in the presence of a lagged dependent variable and state-specific effects.
  • The addition of 36 state-specific dummy variables resulted in a much-reduced coefficient on the lagged dependent variable. The estimate 0.6799 implies a depreciation rate of over 30%, which now seems implausibly large. (Recall, we expect the bias from the LSDV estimator to be negative.)
  • Two-step GLS, assuming the initial observations are fixed, gives a coefficient on the lagged dependent variable of 0.9546, implying a depreciation rate of about 4.5%.
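The depreciation rates quoted above follow from the relation between the coefficient on lagged gas demand and r, taken here to be coefficient = 1 - r as the discussion implies. A minimal sketch of the arithmetic, using the three estimates from Table 2 (the function name is hypothetical):

```python
def implied_depreciation(ldv_coefficient):
    """Depreciation rate implied by the lagged-dependent-variable coefficient (1 - r)."""
    return 1.0 - ldv_coefficient

# Coefficients on lagged gas demand, taken from Table 2.
ldv_estimates = {"OLS": 1.010, "LSDV": 0.6799, "GLS": 0.9546}

if __name__ == "__main__":
    for name, coefficient in ldv_estimates.items():
        print(f"{name}: implied r = {implied_depreciation(coefficient):.4f}")
```

This gives a rate of about -1% for OLS, about 32% for LSDV and about 4.5% for GLS, matching the ordering expected from a positive OLS bias and a negative LSDV bias.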

[1] Nickell (1981) calculates the bias for T=10 and $\gamma$=0.5 as -0.167, based on his equation (17). Note that (11) corresponds to his (18).