Agricultural and Applied Economics 637
Applied Econometrics II
Assignment 1
Review of GLS: Heteroscedasticity and Autocorrelation
In this assignment, you are asked to develop relatively simple models that account for error terms in a linear regression model that are either heteroscedastic or autocorrelated. In answering these questions you are expected to develop the necessary GAUSS code for estimation and hypothesis testing. Please hand in all code developed for this assignment that is used in parameter estimation or hypothesis testing, along with the resulting output files.
1.(30 pts) Assume it is 2006 and you are a health economist working for the U.S. Department of Health and Human Services. As such, you have been asked to investigate the relationship between coronary heart disease and lifestyle choices of the U.S. population. To that end, you have collected time series data for the entire U.S. population encompassing the 1947-2005 period. This data is contained in the GAUSS data set, CORONARY.zip. The following table contains an overview of the dataset variables to be used in your analysis.
Description of a Subset of Variables in the CORONARY Dataset
Variable / Description / Units
CHD / Rate of death per 100,000 of population due to coronary heart disease / #
CIGS_LB / Per capita consumption of tobacco by persons 18 yrs. of age or older (approx. 339 cigs./lb) / Lbs.
REDMEAT / Per capita consumption of beef, veal, pork, lamb, and mutton / Lbs.
PLTFSH / Per capita consumption of poultry and fish / Lbs.
BEER / Per capita consumption of beer by persons 18 years of age or older / Gallons
OTHALC / Sum of per capita distilled spirits and wine consumption by persons 18 years of age or older / Gallons
a.(5 pts) What are your hypotheses concerning the influences of the consumption of the above foods and alcoholic beverages on the incidence of CHD in the U.S. population?
b.(10 pts) To evaluate these hypotheses you decide to estimate the following model:
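$$\ln(\text{CHD}_t) = \alpha + \beta_1 \ln(\text{CIGS\_LB}_t) + \beta_2 \ln(\text{REDMEAT}_t) + \beta_3 \ln(\text{PLTFSH}_t) + \beta_4 \ln(\text{BEER}_t) + \beta_5 \ln(\text{OTHALC}_t) + \beta_6 \text{TREND}_t + \varepsilon_t \qquad [1.1]$$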
where TREND is a trend variable equal to 1 for 1947, 2 for 1948, etc., εt ~ (0, σ²), and t = 1947, …, 2005. The TREND variable is included to account for technological advances in the ability to reduce the death rate due to CHD, ceteris paribus (Hint: You may want to use the GAUSS seqa command to create the TREND variable).
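As a minimal sketch of the hint, seqa(start, increment, count) returns an additive sequence as a column vector. The names nobs and trend are illustrative only; the count of 59 follows from the 1947-2005 sample described above.

```gauss
/* 1947 through 2005 inclusive: 2005 - 1947 + 1 = 59 annual observations */
nobs  = 59;
trend = seqa(1, 1, nobs);     /* 59 x 1 column vector: 1, 2, ..., 59 */
```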
Using the above GAUSS dataset, estimate the unknown parameters of the regression model represented by [1.1] using a modified version of the GAUSS CRM procedure we reviewed last semester, which is available in the Assignments section of the class website. That is, in addition to the typical regression summary statistics (coefficient estimates, coefficient standard errors, equation F-statistic, etc.), modify your code so that it automatically calculates and displays (i) the Durbin-Watson statistic, (ii) the ρ value assuming AR(1), (iii) the asymptotic standard normal test statistic for testing for AR(1) using your estimate of ρ, and (iv) the Lagrange Multiplier (LM) test statistic for the AR(1) process.[1]
b.(10 pts) What is the DW statistic for this model? Does this statistic indicate the presence of an AR(1) error structure? Using an asymptotic test, what does the estimated ρ value indicate in terms of the presence of autocorrelation? What does the LM test statistic indicate with respect to the presence of autocorrelation?
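The four diagnostics listed above might be computed from the CRM residuals along the following lines. This is only a sketch under stated assumptions: the procedure name ar1_diag is hypothetical, x is assumed to already contain a column of ones, ρ is estimated by regressing the residual on its own lag, and the LM statistic is written in the (T-1)·R² auxiliary-regression form, which may differ from the version in LM_auto.ppt.

```gauss
/* e : T x 1 vector of CRM residuals; x : T x K regressor matrix (with ones) */
proc (4) = ar1_diag(e, x);
    local t, e0, e1, dw, rho, z, w, u, r2, lm;
    t   = rows(e);
    e0  = trimr(e, 1, 0);                 /* e(t),   t = 2,...,T */
    e1  = trimr(e, 0, 1);                 /* e(t-1), t = 2,...,T */
    dw  = sumc((e0 - e1)^2) / sumc(e^2);  /* Durbin-Watson statistic */
    rho = sumc(e0 .* e1) / sumc(e1^2);    /* assumed estimator of rho */
    z   = sqrt(t) * rho;                  /* var(rho_hat) = 1/T under H0: rho = 0 */
    /* assumed LM form: regress e(t) on x(t) and e(t-1), then use (T-1)*R^2 */
    w   = trimr(x, 1, 0) ~ e1;
    u   = e0 - w * (invpd(w'w) * (w'e0));
    r2  = 1 - sumc(u^2) / sumc((e0 - meanc(e0))^2);
    lm  = (t - 1) * r2;                   /* compare with chi-square(1) */
    retp(dw, rho, z, lm);
endp;
```

A call such as { dw, rho, z, lm } = ar1_diag(e, x); returns the four values; 2*cdfnc(abs(z)) and cdfchic(lm, 1) would then give two-sided normal and chi-square p-values.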
c.(5 pts) Assuming there is autocorrelation, modify the above code used for estimation to present not only the traditional CRM parameter estimates and associated biased CRM parameter standard errors but also the CRM inefficient but unbiased parameter standard errors. Are there any major differences in standard error estimates?
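For reference, the general covariance formula in footnote [2] takes the following form under AR(1). The innovation u_t and its variance σ_u² are notation introduced here for illustration and do not appear elsewhere in the handout.

```latex
\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t, \qquad |\rho| < 1, \qquad
\sigma^2 = \operatorname{Var}(\varepsilon_t) = \frac{\sigma_u^2}{1-\rho^2}

\operatorname{Cov}(\varepsilon_t, \varepsilon_s) = \sigma^2 \rho^{|t-s|}
\quad\Longrightarrow\quad
\Psi = \bigl[\rho^{|t-s|}\bigr]_{t,s=1,\dots,T}

\Sigma_\beta = \sigma^2 (X'X)^{-1} X' \Psi X (X'X)^{-1}
```

In practice ρ and σ² are replaced by their estimates when this matrix is computed.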
d.(5 pts) Using the CRM procedure developed in (c), undertake hypothesis tests of the role of cigarette use, food consumption patterns, and the consumption of alcoholic beverages on the rate of CHD (individually) using the unbiased standard error estimates. Do your results make sense?
e.(20 pts) I would like you to extend the CRM procedure you developed above to undertake a non-iterative, two-step Feasible Generalized Least Squares (FGLS) estimation assuming that AR(1) does exist. The file ar_1_general_algorithm.pdf contains, in words, a general algorithm for undertaking an FGLS estimation of the AR(1) model. The following diagram parallels this description:
The GAUSS AR(1) procedure you develop should enable you to estimate an AR(1) model with an invoking command that looks something like the following:
{bgls,covbgls, d_w_stat, rho_hat}=auto_ar1(rhsvar,depend);
The procedure auto_ar1 is a procedure that you define that takes two arguments: the matrix of explanatory variables, rhsvar (which may or may not include a vector of ones, depending on how you define your procedure), and the vector containing your dependent variable, depend. There are four returns from this procedure: the vector of estimated coefficients, bgls; the GLS parameter covariance matrix, covbgls; an estimate of the DW statistic calculated from the CRM residuals, d_w_stat; and an estimate of ρ, rho_hat. Make sure that your AR procedure reports the CRM results (which include both the traditional but incorrect CRM parameter covariance matrix and the modified but inefficient CRM covariance matrix under AR(1)), the results of your AR(1) test, and the resulting GLS estimates, standard errors, t-values, etc.[2] [NOTE: As with the development of your CRM procedure, you should design your procedure for use with any data set. The matrices rhsvar and depend are defined by you in the GAUSS code before you call the procedure, and they can be named anything you want. For example, in another application you may call your AR(1) procedure via the following:
{ar1_b,ar1_covb, d_w_est, rho_hat}=auto_ar1((age~income~kids),foodexp);
That is, your procedure knows that the 1st argument to the auto_ar1 procedure call is the matrix of exogenous variables and the 2nd argument is the endogenous variable whose variance we are attempting to explain.]
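One possible form of the second (GLS) step inside such a procedure is sketched below. It assumes a Prais-Winsten transformation that retains the first observation, which may differ from the algorithm in ar_1_general_algorithm.pdf; the helper name pw_fgls is hypothetical, and x is assumed to include the column of ones so that the constant is transformed along with the other regressors.

```gauss
/* x : T x K regressors (with ones); y : T x 1; rho : first-step estimate */
proc (2) = pw_fgls(x, y, rho);
    local ys, xs, xsxsi, b, e, s2;
    /* first observation scaled by sqrt(1 - rho^2); the rest quasi-differenced */
    ys = (sqrt(1 - rho^2) * y[1])    | (trimr(y, 1, 0) - rho * trimr(y, 0, 1));
    xs = (sqrt(1 - rho^2) * x[1, .]) | (trimr(x, 1, 0) - rho * trimr(x, 0, 1));
    xsxsi = invpd(xs'xs);
    b  = xsxsi * (xs'ys);                 /* FGLS coefficient vector */
    e  = ys - xs * b;                     /* residuals of transformed model */
    s2 = (e'e) / (rows(y) - cols(x));     /* estimated innovation variance */
    retp(b, s2 * xsxsi);                  /* coefficients and covariance */
endp;
```

A full auto_ar1 would call a step like this after computing rho_hat from the CRM residuals and would add the reporting described above.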
I would like you to apply your AR(1) procedure to the coronary dataset. Given the above, obtain feasible generalized least squares [AR(1)] estimates of the parameters of the relationship represented by [1.1]. Report your GLS regression results and compare these results to those obtained under the CRM. Calculate and display the squared correlation between the predicted and actual values of CHD (not its logarithm) under both the CRM and the GLS-based model. Which one does better?
f.(5 pts) Compare your hypothesis test results obtained with respect to the role of cigarette use, food consumption patterns, and the consumption of alcoholic beverages on the rate of CHD (individually) under the CRM (with the correct but inefficient coefficient covariance matrix) and the FGLS results. Are there any differences?
g.(5 pts) Undertake a joint hypothesis test that the amount of total meat consumed (red meat together with poultry and fish, i.e., REDMEAT and PLTFSH) has no impact on CHD using the FGLS results. What are the results of your joint test? Has alcohol consumption impacted the rate of CHD?
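A joint test of this kind can be written as a Wald test of J linear restrictions Rβ = q. The sketch below is one way to do so; the procedure name wald_test and the restriction matrix are illustrative, and the column positions assume the coefficients are ordered as the constant followed by β1 through β6.

```gauss
/* b : K x 1 estimates; covb : K x K covariance; r : J x K; q : J x 1 */
proc (2) = wald_test(b, covb, r, q);
    local dif, w;
    dif = r * b - q;
    w   = dif' * invpd(r * covb * r') * dif;   /* asymptotically chi-square(J) */
    retp(w, cdfchic(w, rows(r)));              /* statistic and p-value */
endp;

/* Example: H0: beta2 = beta3 = 0 (REDMEAT and PLTFSH) in model [1.1] */
r = (0~0~1~0~0~0~0) | (0~0~0~1~0~0~0);
q = zeros(2, 1);
{ w, pval } = wald_test(bgls, covbgls, r, q);
```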
2. As you have learned, there are a variety of ways to control for the effect of heteroscedasticity in the estimation of a linear regression model. A common approach is referred to as the multiplicative heteroscedasticity specification. Under this approach we have $y_t = X_t\beta + \varepsilon_t$, where $E(\varepsilon_t^2) = \sigma_t^2 = \exp(Z_t\alpha)$, $Z_t$ is a (1 x S) vector containing the t-th observation on S nonstochastic explanatory variables, and $\alpha$ is an (S x 1) parameter vector. The following provides a method for obtaining FGLS estimates of the unknown parameters using the above structure:
(i) Use the CRM to obtain a consistent estimate of the error term. That is, $\beta_s = (X'X)^{-1}X'y$ continues to be a consistent estimator of $\beta$ even with multiplicative heteroscedasticity. Thus $e_s = y - X\beta_s$ is a consistent estimate of the true, unknown error vector.
(ii) → $\ln(\sigma_t^2) = Z_t\alpha$, given $\sigma_t^2 = \exp(Z_t\alpha)$
(iii) → $\ln(e_{st}^2) + \ln(\sigma_t^2) = \ln(e_{st}^2) + Z_t\alpha$
(iv) → $\ln(e_{st}^2) = Z_t\alpha + \nu_t$, where $\nu_t \equiv \ln(e_{st}^2) - \ln(\sigma_t^2)$
(That is, one can treat (iv) as a traditional linear regression model.)
(v) → $\alpha_s = (Z'Z)^{-1}Z'\ln(e_s^2)$, where $\ln(e_s^2)$ denotes the vector with elements $\ln(e_{st}^2)$
It can be shown that in (v) $\alpha_{s0}$, the intercept element of $\alpha_s$, is an inconsistent estimator of the intercept term, with an inconsistency of -1.2704. A consistent estimator of the intercept can therefore be obtained by calculating $\alpha_{s0} + 1.2704$. Also, the matrix $4.9348(Z'Z)^{-1}$ can be used to approximate the (S x S) covariance matrix of $\alpha_s$. Testing this model as an alternative to one with homoscedastic errors is equivalent to testing the null hypothesis $H_0: \alpha^* = 0$ against the alternative $H_1: \alpha^* \neq 0$, where $\alpha^*$ is an ((S-1) x 1) parameter vector that excludes the intercept term. Let D be the matrix $(Z'Z)^{-1}$ with its first row and column deleted. This implies that $\alpha^*_s \sim N(\alpha^*, 4.9348D)$, so we can test the null hypothesis $H_0: \alpha^* = 0$ via the following statistic ($\Gamma$), which has a $\chi^2$ distribution with (S-1) df: $\Gamma = \alpha_s^{*\prime} D^{-1} \alpha^*_s / 4.9348$. Given the above, we have the following flowchart of how one can estimate the parameters of a model that is characterized by multiplicative heteroscedasticity using a two-step FGLS approach:
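Steps (i) through (v), the intercept correction, and the test of homoscedasticity can be sketched in GAUSS as follows. The procedure name mult_het is hypothetical, the first column of z is assumed to be a vector of ones, and the test statistic is simply the quadratic form implied by the N(α*, 4.9348D) approximation stated above.

```gauss
/* x : T x K regressors; y : T x 1; z : T x S variance regressors (ones first) */
proc (3) = mult_het(x, y, z);
    local bs, es, zzi, a, d, astar, gam;
    bs    = invpd(x'x) * (x'y);               /* (i)  CRM estimates */
    es    = y - x * bs;                       /*      CRM residuals */
    zzi   = invpd(z'z);
    a     = zzi * (z'ln(es^2));               /* (v)  regress ln(e^2) on Z */
    a[1]  = a[1] + 1.2704;                    /* consistent intercept */
    d     = trimr(trimr(zzi, 1, 0)', 1, 0);   /* drop first row and column */
    astar = trimr(a, 1, 0);                   /* slopes only */
    gam   = (astar' * invpd(d) * astar) / 4.9348;
    retp(a, gam, cdfchic(gam, rows(a) - 1));  /* alpha, statistic, p-value */
endp;
```

The approximate standard errors of the elements of a are then sqrt(4.9348 * diag(invpd(z'z))).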
Let's revisit the airline cost data provided by Greene, Greene_Airline. If you remember, this data set consists of cost information for 6 airline firms over the 1970-1984 period, which implies there are a total of 90 observations in the data set. The variables of this data set are given below:
Variable / Variable Name
Total Cost ($1,000) / Tot_Cost
Year (1 – 15) / Year
Output (An Index of Revenue Passenger Miles) / Output
Load Factor (Avg. Capacity Utilization) / Load_Fac
Fuel Price / Fuel_pr
Airline ID (1 – 6) / Airline
(a)(10 pts) With this data as a base, use your CRM GAUSS code to estimate the following model:
$$\ln(\text{Tot\_Cost}) = \beta_0 + \beta_1 \ln(\text{Output}) + \beta_2 \ln(\text{Fuel\_pr}) + \beta_3 \text{Year} + \sum_{j=2}^{6} \gamma_j D_j$$
where Dj is a dummy variable =1 for the jth airline, 0 otherwise. Report your standard statistical results. Using Excel or whatever software you feel comfortable with, plot the vector of errors obtained from the above model against the Output variable. Also plot these residuals against the Load Factor variable. Do you see any relationship that would indicate that you have heteroscedastic errors?
(b)(5 pts) Modify your CRM procedure to calculate White’s heteroscedasticity-consistent estimate of the parameter standard errors, which we reviewed in Lab #3 last semester. How do these standard errors compare with the traditional CRM standard errors?
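As a reminder of the calculation, the basic form of White's estimator is (X′X)⁻¹X′diag(e²)X(X′X)⁻¹. The sketch below uses that form without any degrees-of-freedom adjustment, which may differ from the version covered in Lab #3; the procedure name white_cov is hypothetical.

```gauss
/* x : T x K regressors; e : T x 1 CRM residuals */
proc (1) = white_cov(x, e);
    local xxi, xe;
    xxi = invpd(x'x);
    xe  = x .* e;                   /* each row of X scaled by its residual */
    retp(xxi * (xe'xe) * xxi);      /* xe'xe equals X'diag(e^2)X */
endp;
```

The corresponding standard errors are sqrt(diag(white_cov(x, e))).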
(c)(20 pts) Assume that if you have heteroscedastic errors, you will account for this heteroscedasticity via the multiplicative heteroscedasticity specification. Also, let’s assume that the error variance is impacted by the level of output of each airline (OUTPUT) and its load factor (LOAD_FAC). Using the above flowchart as a template, modify the CRM procedure used to answer (b) to undertake a two-step estimation procedure that calculates consistent estimates of the parameters of the above multiplicative heteroscedasticity model and the parameter standard errors associated with the error variance component of the model. Is there statistical evidence that we have multiplicative heteroscedasticity? (Hint: Use the above χ2 test).
(d) (5 pts) Given the above error structure, modify the above CRM code further to correctly calculate the CRM standard errors when we have multiplicative heteroscedasticity [as noted above, the CRM parameter covariance matrix is not $\sigma^2(X'X)^{-1}$]. How do these standard errors compare with the traditional CRM formulation and with White’s heteroscedasticity-consistent standard errors?
(e) (10 pts) Based on the information in (c), estimate the Feasible GLS regression parameter estimates and associated standard errors given the multiplicative error structure. How do the t-statistics obtained under the FGLS model compare with the values correctly calculated under the CRM and with White’s standard errors?
(Hint: By the time you answer (e) you should have a GAUSS program that can be used to estimate the parameters of the multiplicative heteroscedastic model for any size regression model, calculate a number of regression statistics, and undertake a variety of hypothesis tests.)
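For parts (d) and (e), once a consistent estimate a of α is available, the corrected CRM covariance matrix and the FGLS estimates can both be built from the fitted variances exp(Zα). The following is a sketch only; the procedure name het_fgls and its argument order are hypothetical.

```gauss
/* x : T x K; y : T x 1; z : T x S; a : S x 1 consistent estimate of alpha */
proc (3) = het_fgls(x, y, z, a);
    local s2, xxi, covols, w, xs, ys, xsxsi, b;
    s2     = exp(z * a);                    /* fitted variances, T x 1 */
    xxi    = invpd(x'x);
    covols = xxi * (x' * (x .* s2)) * xxi;  /* (X'X)^-1 X'Phi X (X'X)^-1 */
    w      = sqrt(s2);                      /* fitted standard deviations */
    xs     = x ./ w;                        /* weight each observation */
    ys     = y ./ w;
    xsxsi  = invpd(xs'xs);                  /* (X'Phi^-1 X)^-1 */
    b      = xsxsi * (xs'ys);               /* FGLS coefficient vector */
    retp(covols, b, xsxsi);
endp;
```

The t-statistics for each set of estimates follow by dividing the coefficients by the square roots of the corresponding diagonal elements.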
[1] Under suitable conditions, it can be shown that the estimator $\hat{\rho}$ will be approximately normally distributed with mean $\rho$ and variance $(1-\rho^2)/T$. If the null hypothesis that $\rho = 0$ is true, the variance becomes $1/T$. You can therefore use a Z-statistic to test the above hypothesis. With respect to the Lagrangian multiplier statistic, refer to the file LM_auto.ppt for more information.
[2] In general, when we have error terms that are autocorrelated or heteroscedastic, the correct formula to use to estimate the CRM parameter covariance matrix is not $\sigma^2(X'X)^{-1}$ but rather the following: $\Sigma_\beta = (X'X)^{-1}X'\Phi X(X'X)^{-1} = \sigma^2(X'X)^{-1}X'\Psi X(X'X)^{-1}$