HETEROSCEDASTICITY
Regression of lnsalary on years of experience for professors (use DATA 3-11)
Original Equation, with Uncorrected HSK
Dependent Variable: LNSALARY
Sample: 1 222
Included observations: 222
Variable      Coefficient    Std. Error    t-Statistic    Prob.
C             3.809365       0.041338      92.15104       0.0000
YEARS         0.043853       0.004829      9.081645       0.0000
YEARS2        -0.000627      0.000121      -5.190657      0.0000

R-squared             0.536179    Mean dependent var       4.325410
Adjusted R-squared    0.531943    S.D. dependent var       0.302511
S.E. of regression    0.206962    Akaike info criterion    -0.299140
Sum squared resid     9.380504    Schwarz criterion        -0.253158
Log likelihood        36.20452    F-statistic              126.5823
Durbin-Watson stat    1.434005    Prob(F-statistic)        0.000000
To carry out White's test, type imtest, white after estimating the equation above.
To carry out the Breusch-Pagan test, type hettest.
. hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of lnsalary
chi2(1) = 16.20
Prob > chi2 = 0.0001
White Heteroskedasticity Test:
. imtest, white
White's test for Ho: homoskedasticity
against Ha: unrestricted heteroskedasticity
chi2(4) = 20.00
Prob > chi2 = 0.0005
An important command to remember is rvfplot. Used after the regress (reg) command, it plots the residuals against the fitted values and provides a visual check for heteroscedasticity.
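The same residuals-versus-fitted check can be sketched outside Stata; the following Python snippet uses hypothetical data and a simplified one-regressor model (all names here are illustrative):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Hypothetical data whose error variance grows with x.
rng = np.random.default_rng(1)
x = rng.uniform(1, 40, 222)
y = 3.8 + 0.04 * x + rng.normal(0, 0.02 * x)

# OLS fit by least squares, then plot residuals against fitted values.
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

plt.scatter(fitted, resid, s=8)
plt.axhline(0, linewidth=1)
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.savefig("rvfplot.png")
```

A fan (widening) pattern in this plot is the visual signature of heteroscedasticity.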
Interpretation of the formal tests: The p-values of both tests (White and B-P above) are less than alpha (say, 5%), so we conclude that there is substantial HSK in the model. Alternatively, we can take the direct approach and test for HSK using three different specifications for the error variance:
Breusch-Pagan: Auxiliary Regression (a)
reg uhatsq years yearssq
Source SS df MS Number of obs = 222
F( 2, 219) = 8.84
Model .080454658 2 .040227329 Prob > F = 0.0002
Residual .996377813 219 .00454967 R-squared = 0.0747
Adj R-squared = 0.0663
Total 1.07683247 221 .004872545 Root MSE = .06745
uhatsq Coef. Std. Err. t P>|t| [95% Conf. Interval]
years .0060837 .0015737 3.87 0.000 .0029821 .0091853
yearssq -.0001288 .0000394 -3.27 0.001 -.0002065 -.0000512
_cons -.011086 .0134726 -0.82 0.411 -.0376386 .0154665
Glejser: Auxiliary Regression (b)
reg absuhat years yearssq
Source SS df MS Number of obs = 222
F( 2, 219) = 16.40
Model .476518221 2 .23825911 Prob > F = 0.0000
Residual 3.18104824 219 .014525334 R-squared = 0.1303
Adj R-squared = 0.1223
Total 3.65756646 221 .016550074 Root MSE = .12052
absuhat Coef. Std. Err. t P>|t| [95% Conf. Interval]
years .0143923 .0028119 5.12 0.000 .0088503 .0199342
yearssq -.0002975 .0000704 -4.23 0.000 -.0004362 -.0001588
_cons .0312025 .0240727 1.30 0.196 -.0162413 .0786462
Harvey-Godfrey: Auxiliary Regression (c)
. reg lnusq years yearssq
Source SS df MS Number of obs = 222
F( 2, 219) = 20.71
Model 133.956216 2 66.9781081 Prob > F = 0.0000
Residual 708.415287 219 3.234773 R-squared = 0.1590
Adj R-squared = 0.1513
Total 842.371503 221 3.81163576 Root MSE = 1.7985
lnusq Coef. Std. Err. t P>|t| [95% Conf. Interval]
years .235562 .0419628 5.61 0.000 .1528595 .3182645
yearssq -.0047765 .0010503 -4.55 0.000 -.0068465 -.0027065
_cons -6.562664 .3592388 -18.27 0.000 -7.270672 -5.854656
If we compute N·R² (aux) for each of these three models, we will see that, at the appropriate critical value, we reject homoscedasticity in favor of HSK (confirming the p-value approach of the two formal tests reported previously). In each case the chi-square statistic has a p-value of less than 10%, so we reject the null hypothesis of homoscedasticity in favor of heteroscedasticity. Hence, HSK is present in the model.
Correcting for Heteroscedasticity
The easiest way is to use the command reg depvar indepvars, robust. This yields HSK-corrected (robust) standard errors and is especially useful when the structure of the HSK is unknown. This is the White heteroskedasticity-consistent standard errors and covariance method.
Below is the result of such a correction using the robust option.
. reg lnsalary years yearssq, robust
Regression with robust standard errors Number of obs = 222
F( 2, 219) = 216.87
Prob > F = 0.0000
R-squared = 0.5362
Root MSE = .20696
                         Robust
lnsalary       Coef.   Std. Err.       t   P>|t|     [95% Conf. Interval]
years       .0438528    .0043609   10.06   0.000      .0352582   .0524475
yearssq    -.0006273    .0001179   -5.32   0.000     -.0008597   -.000395
_cons       3.809365     .026119  145.85   0.000      3.757889   3.860842
If the structure is known, then, following a regression of uhatsq on the known factors causing HSK, an appropriate transformation of the variables is called for (try this approach in HW#3 for data 8-3 and compare with the robust option).
Estimation by FGLS under HSK disturbances
a) Breusch-Pagan Specification
1. Regress the original model, calculate uhat and uhatsq, then regress uhatsq on known factors causing HSK. This is the auxiliary regression.
2. Then type predict yhat to get the estimated (fitted) values of uhatsq.
3. The weight (w) used in FGLS estimation is the inverse of the square root of yhat (the fitted uhatsq).
Problem: There is no guarantee that yhat will be positive, in which case we cannot take the square root. If this situation arises, treat negative values as positive (take the absolute value) before taking the square root.
4. Multiply each variable, including the constant, by the weight (w).
5. Regress: reg Yw w X1w X2w, noconstant (suppressing the constant term), where Yw is the product of the dependent variable and the inverse of the square root of yhat (the fitted uhatsq).
b) Glejser Specification
1. Regress the original model, calculate uhat and absuhat (by gen absuhat=abs(uhat)), then regress absuhat on the known factors causing HSK. This is the auxiliary regression.
2. Then type predict yhat2 to get the estimated (fitted) values of absuhat.
3. The weight (w) used in FGLS estimation is the inverse of yhat2 (the fitted absuhat); no square root is taken here.
Problem: There is no guarantee that yhat2 will be positive. If negative values arise, treat them as positive (take the absolute value).
4. Multiply each variable, including the constant, by the weight (w).
5. Regress: reg Yw w X1w X2w, noconstant (suppressing the constant term), where Yw is the product of the dependent variable and the inverse of yhat2 (the fitted absuhat).
c) Harvey-Godfrey Specification
1. Regress the original model, calculate uhat, uhatsq, and lnusq (by gen lnusq=log(uhatsq)), then regress lnusq on the known factors causing HSK. This is the auxiliary regression.
2. Then type predict yhat3 to get the estimated (fitted) values of lnusq.
3. The weight (w) used in FGLS estimation is the inverse of the square root of the antilog of yhat3 (the fitted lnusq). Taking the antilog means to "exponentiate."
There is no problem of negative values here because exponentiation generates only positive values.
4. Multiply each variable, including the constant, by the weight (w).
5. Regress: reg Yw w X1w X2w, noconstant (suppressing the constant term), where Yw is the product of the dependent variable and the inverse of the square root of the antilog of yhat3 (the fitted lnusq).
All these methods are alternative ways of correcting for HSK.