HETEROSCEDASTICITY

Regression of lnsalary on years of experience for professors (use DATA 3-11)

Original Equation (with Uncorrected HSK)

Dependent Variable: LNSALARY
Sample: 1 222
Included observations: 222
Variable              Coefficient   Std. Error   t-Statistic    Prob.
C                        3.809365     0.041338     92.15104    0.0000
YEARS                    0.043853     0.004829     9.081645    0.0000
YEARS2                  -0.000627     0.000121    -5.190657    0.0000

R-squared                0.536179   Mean dependent var        4.325410
Adjusted R-squared       0.531943   S.D. dependent var        0.302511
S.E. of regression       0.206962   Akaike info criterion    -0.299140
Sum squared resid        9.380504   Schwarz criterion        -0.253158
Log likelihood           36.20452   F-statistic               126.5823
Durbin-Watson stat       1.434005   Prob(F-statistic)         0.000000


To carry out White's test: after estimating the equation above, type imtest, white

To carry out a Breusch-Pagan test, type hettest

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity

. hettest

Ho: Constant variance

Variables: fitted values of lnsalary

chi2(1) = 16.20

Prob > chi2 = 0.0001

White Heteroskedasticity Test:

. imtest, white

White's test for Ho: homoskedasticity

against Ha: unrestricted heteroskedasticity

chi2(4) = 20.00

Prob > chi2 = 0.0005

An important command to remember is rvfplot. Used after the regression (reg) command, it generates a residuals-versus-fitted-values plot and provides a visual check for the existence of heteroscedasticity.

Interpretation of the formal tests: The p-values of both tests (White and B-P above) are less than alpha (of, say, 5%), so we reject the null of constant variance and conclude that there is a substantial amount of HSK in the model. Alternatively, we can use the direct approach and test for HSK by using three different specifications for the error variance:
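The residual-based variables used in the three auxiliary regressions below can be generated right after the original regression. A sketch in Stata (variable names chosen to match the output that follows):

```stata
* after estimating the original model
reg lnsalary years yearssq
predict uhat, residuals        // OLS residuals
gen uhatsq = uhat^2            // for the Breusch-Pagan specification
gen absuhat = abs(uhat)        // for the Glejser specification
gen lnusq = log(uhatsq)        // for the Harvey-Godfrey specification
```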

Breusch-Pagan: Auxiliary Regression (a)

reg uhatsq years yearssq

      Source |       SS       df       MS              Number of obs =     222
-------------+------------------------------           F(  2,   219) =    8.84
       Model |  .080454658     2  .040227329           Prob > F      =  0.0002
    Residual |  .996377813   219   .00454967           R-squared     =  0.0747
-------------+------------------------------           Adj R-squared =  0.0663
       Total |  1.07683247   221  .004872545           Root MSE      =  .06745

------------------------------------------------------------------------------
      uhatsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       years |   .0060837   .0015737     3.87   0.000     .0029821    .0091853
     yearssq |  -.0001288   .0000394    -3.27   0.001    -.0002065   -.0000512
       _cons |   -.011086   .0134726    -0.82   0.411    -.0376386    .0154665
------------------------------------------------------------------------------

Glejser: Auxiliary Regression (b)

reg absuhat years yearssq

      Source |       SS       df       MS              Number of obs =     222
-------------+------------------------------           F(  2,   219) =   16.40
       Model |  .476518221     2   .23825911           Prob > F      =  0.0000
    Residual |  3.18104824   219  .014525334           R-squared     =  0.1303
-------------+------------------------------           Adj R-squared =  0.1223
       Total |  3.65756646   221  .016550074           Root MSE      =  .12052

------------------------------------------------------------------------------
     absuhat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       years |   .0143923   .0028119     5.12   0.000     .0088503    .0199342
     yearssq |  -.0002975   .0000704    -4.23   0.000    -.0004362   -.0001588
       _cons |   .0312025   .0240727     1.30   0.196    -.0162413    .0786462
------------------------------------------------------------------------------

Harvey-Godfrey: Auxiliary Regression (c)

. reg lnusq years yearssq

      Source |       SS       df       MS              Number of obs =     222
-------------+------------------------------           F(  2,   219) =   20.71
       Model |  133.956216     2  66.9781081           Prob > F      =  0.0000
    Residual |  708.415287   219    3.234773           R-squared     =  0.1590
-------------+------------------------------           Adj R-squared =  0.1513
       Total |  842.371503   221  3.81163576           Root MSE      =  1.7985

------------------------------------------------------------------------------
       lnusq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       years |    .235562   .0419628     5.61   0.000     .1528595    .3182645
     yearssq |  -.0047765   .0010503    -4.55   0.000    -.0068465   -.0027065
       _cons |  -6.562664   .3592388   -18.27   0.000    -7.270672   -5.854656
------------------------------------------------------------------------------

If we compute N·R²(aux) — the sample size times the R-squared of the auxiliary regression — for all three of these models and compare each statistic with the appropriate chi-square critical value, we reject homoscedasticity in favor of HSK (confirming the p-value approach of the two formal tests previously reported).
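The N·R² statistic can be computed directly from Stata's stored results after any of the auxiliary regressions. A sketch for the Breusch-Pagan auxiliary regression (2 degrees of freedom, one per slope in the auxiliary regression):

```stata
* after generating uhatsq from the residuals of the original model
reg uhatsq years yearssq
scalar NR2 = e(N)*e(r2)          // N times aux R-squared, ~ chi2(2) under Ho
display "N*R2 = " NR2 ",  p-value = " chi2tail(2, NR2)
```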

Since each chi-square statistic has a p-value less than 10%, we reject the null hypothesis of homoscedasticity in favor of heteroscedasticity. Hence, HSK is present in the model.

Correcting for Heteroscedasticity

The easiest way is to use the command reg depvar indepvars, robust. This yields HSK-robust standard errors, which is especially useful when the structure of the HSK is unknown. This is the White heteroskedasticity-consistent standard errors and covariance method.

Below is the result of such a correction using the robust option.

. reg lnsalary years yearssq, robust

Regression with robust standard errors                 Number of obs =     222
                                                       F(  2,   219) =  216.87
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5362
                                                       Root MSE      =  .20696

------------------------------------------------------------------------------
             |               Robust
    lnsalary |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       years |   .0438528   .0043609    10.06   0.000     .0352582    .0524475
     yearssq |  -.0006273   .0001179    -5.32   0.000    -.0008597     -.000395
       _cons |   3.809365    .026119   145.85   0.000     3.757889    3.860842
------------------------------------------------------------------------------

If the structure of the HSK is known, then, following a regression of uhatsq on the known factors causing HSK, an appropriate transformation of the variables is called for (try this approach in your HW#3 for data 8-3 and compare with the robust option).

Estimation by FGLS under HSK disturbances

a)  Breusch-Pagan Specification

1.  Regress the original model, calculate uhat and uhatsq, then regress uhatsq on known factors causing HSK. This is the auxiliary regression.

2.  Then type: “predict yhat” to get the estimated (fitted) values of uhatsq.

3.  The weight (w) used in FGLS estimation is the inverse of the square root of yhat (the fitted uhatsq): w = 1/sqrt(yhat).

Problem: There is no guarantee that yhat will be positive, in which case the square root does not exist. If this situation arises, treat negative values as positive (by taking the absolute value) and then take the square root.

4.  Multiply each variable with the weights (w), including the constant.

5.  Regress: reg Yw w X1w X2w, noconstant (suppressing the constant term), where Yw is the product of the dependent variable and the weight w (the inverse of the square root of the fitted uhatsq).
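Putting the five steps together for the Breusch-Pagan specification, a sketch in Stata (the transformed variable names are illustrative):

```stata
reg lnsalary years yearssq
predict uhat, residuals
gen uhatsq = uhat^2
reg uhatsq years yearssq           // auxiliary regression
predict yhat, xb                   // fitted uhatsq
gen w = 1/sqrt(abs(yhat))          // abs() guards against negative fitted values
gen lnsalaryw = lnsalary*w         // transformed dependent variable
gen yearsw = years*w
gen yearssqw = yearssq*w
reg lnsalaryw w yearsw yearssqw, noconstant   // w plays the role of the constant
```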

b) Glejser Specification

1.  Regress the original model, calculate uhat and absuhat (by gen absuhat=abs(uhat)), then regress absuhat on known factors causing HSK. This is the auxiliary regression.

2.  Then type: “predict yhat2” to get the estimated (fitted) values of absuhat.

3.  The weight (w) used in FGLS estimation is the inverse of yhat2 (the fitted absuhat): w = 1/yhat2.

Problem: There is no guarantee that yhat2 will be positive. If this situation arises, treat negative values as positive (by taking the absolute value).

4.  Multiply each variable with the weights (w), including the constant.

5.  Regress: reg Yw w X1w X2w, noconstant (suppressing the constant term), where Yw is the product of the dependent variable and the weight w (the inverse of the fitted absuhat).
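The analogous command sequence for the Glejser specification, as a sketch (illustrative variable names):

```stata
reg lnsalary years yearssq
predict uhat, residuals
gen absuhat = abs(uhat)
reg absuhat years yearssq          // auxiliary regression
predict yhat2, xb                  // fitted absuhat
gen w = 1/abs(yhat2)               // abs() guards against negative fitted values
gen lnsalaryw = lnsalary*w
gen yearsw = years*w
gen yearssqw = yearssq*w
reg lnsalaryw w yearsw yearssqw, noconstant
```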

c)  Harvey-Godfrey Specification

1.  Regress the original model, calculate uhat and lnusq (by gen lnusq=log(uhat^2); note the log is taken of the squared residual, since log(uhat) is undefined for negative residuals), then regress lnusq on known factors causing HSK. This is the auxiliary regression.

2.  Then type: “predict yhat3” to get the estimated (fitted) values of lnusq.

3.  The weight (w) used in FGLS estimation is the inverse of the square root of the antilog of yhat3 (the fitted lnusq): w = 1/sqrt(exp(yhat3)). Taking the antilog means to “exponentiate.”

There is no problem of negative values because exponentiation generates only positive values.

4.  Multiply each variable with the weights (w), including the constant.

5.  Regress: reg Yw w X1w X2w, noconstant (suppressing the constant term), where Yw is the product of the dependent variable and the weight w (the inverse of the square root of the exponentiated fitted lnusq).
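The corresponding sketch for the Harvey-Godfrey specification (illustrative variable names; here exponentiation guarantees a positive variance estimate, so no absolute value is needed):

```stata
reg lnsalary years yearssq
predict uhat, residuals
gen lnusq = log(uhat^2)
reg lnusq years yearssq            // auxiliary regression
predict yhat3, xb                  // fitted lnusq
gen w = 1/sqrt(exp(yhat3))         // exp() is always positive
gen lnsalaryw = lnsalary*w
gen yearsw = years*w
gen yearssqw = yearssq*w
reg lnsalaryw w yearsw yearssqw, noconstant
```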

All these methods are alternative ways of correcting for HSK.