ECO420, Fall 2013, Second homework assignment.
Prof. Bill Even
The assignment is due via email submission by 5 p.m. on Wednesday, 10/30 (20-point penalty per day, or any part thereof, late). Insert all your answers in this Word document, leaving the original questions in place. Be sure to provide both the relevant Stata code and results for each answer.
1. (50 points) A data set named Mroz.dta is located in g:\eco\evenwe\eco420. The variables in the data set are described in the file Mroz_descr.txt.
a. Estimate a log(wage) equation as a function of the person’s age, years of education, experience, and experience squared using OLS. Re-estimate the log(wage) equation with the same controls, but allow education to be endogenous by using IVREG2 (depending on which version of Stata you are using, you may have to download IVREG2: go to help, search net resources, search for IVREG2, and then install it). Use mother’s education and father’s education as instruments for the person’s own education. Use outreg2 to put the OLS and IV estimates in a single table. Clearly identify the columns.
b. Based on how the coefficients change from the OLS to the IVREG2 results, what can you conclude about the nature of the endogeneity problem? Explain.
c. IVREG2 automatically generates a Cragg-Donald Wald F-statistic of “weak identification”. Explain how this test statistic is generated. Given the results for this example, what conclusion can be drawn from the resulting test statistic for this particular empirical problem? Explain.
d. IVREG2 automatically generates a Sargan statistic. Read about the Sargan statistic in the IVREG2 help under the section titled “testing overidentifying restrictions” and in chapter 15 of Wooldridge. State precisely what hypothesis the Sargan statistic is testing, explain how the test statistic is calculated, and briefly describe what the results imply for this empirical problem.
e. Use the OLS and IVREG2 estimates to calculate a Hausman test of the hypothesis that education is exogenous. (See the help for hausman in Stata.) Interpret the results.
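For Question 1(a), the mechanics of installing the user-written commands and building a two-column table can be sketched as follows. This is only a sketch: the variable names (lwage, educ, exper, expersq, motheduc, fatheduc) follow the usual Wooldridge MROZ naming and are an assumption, so check Mroz_descr.txt for the names actually used in the course file. The output file name q1_table is hypothetical.

```stata
* Install the user-written commands once, if they are not already available
ssc install ivreg2
ssc install outreg2

use "g:\eco\evenwe\eco420\Mroz.dta", clear

* OLS column (variable names are assumed; see Mroz_descr.txt)
regress lwage age educ exper expersq
outreg2 using q1_table, replace ctitle(OLS) word

* IV column: educ is treated as endogenous and instrumented
* with mother's and father's education
ivreg2 lwage age exper expersq (educ = motheduc fatheduc)
outreg2 using q1_table, append ctitle(IV) word
```

The ctitle() option labels each column, which is one way to meet the requirement that the columns be clearly identified.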
2. (50 points) For this problem, you will use panel data extracted from the Panel Study of Income Dynamics (PSID) between 1980 and 1997. The data set is g:\eco\evenwe\eco420\psid.dta. It contains the following variables:
earnings = annual earnings
white, black, othrace = dummies indicating whether the person's race is white, black, or some other race.
educ = years of education (-1 implies missing value; you may delete these observations)
age = age in years
married = dummy indicating whether a person is married
female = dummy indicating whether a person is female
year = year of observation
id = id number that identifies each person
It is important to note that people could be in the panel for as many as 18 years and as few as 1 year. This is referred to as an "unbalanced" panel since the number of years of observations is not the same for all individuals.
For this problem, you will use the xtreg procedure in Stata. xtreg allows you to estimate random effects and fixed effects models (among others). Before proceeding with xtreg, you must declare the variables that identify the time period and the group. That is, if the model is written as
y_it = x_it*b + v_i + u_it
the t-subscript is identified by the year variable, and the i-subscript is identified by the id variable. In this case, you tell Stata which variables index i and t by executing
xtset id year
See the help for xtreg to find out how to estimate a fixed or random effects model.
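As a rough illustration of the xtreg syntax, the sketch below uses a deliberately reduced specification (only age and married as regressors), not the full model requested in the questions that follow. The fe and re options select the fixed effects and random effects estimators.

```stata
use "g:\eco\evenwe\eco420\psid.dta", clear

* Drop observations with missing education (coded as -1)
drop if educ == -1

* Declare the panel structure: id indexes i, year indexes t
xtset id year

* Fixed effects (within) estimator, reduced illustrative specification
xtreg earnings age married, fe

* Random effects estimator, same reduced specification
xtreg earnings age married, re
```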
a. Estimate earnings as a function of education, race, age, marital status, gender and the year of the observation using:[1]
i. OLS
ii. random effects (RE)
iii. fixed effects (FE)
Output the results to a single table using the outreg2 command and clearly identify the 3 specifications.
b. Why are some variables automatically dropped in the FE model? Provide an econometric justification for this.
c. Re-estimate the FE model by creating “deviations from individual specific means”.[2] Recall that this model should not include an intercept (see the noconstant option of regress). Demonstrate that you get the same slope coefficients on the variables.
d. Compare the RE and the FE coefficient estimate for the married variable. Given the implied bias in the RE estimate, what does this tell you about the unobservables of married people?
e. From the FE model, generate predictions of the fixed effects (u in Stata; see the predict options for xtreg). Compute the correlation between the married variable and the fixed effects. Does this confirm what you observed in part d? Why or why not?
f. Compare the standard errors of the RE and FE coefficient estimates. How do they differ? Why should you expect this?
g. Test whether the assumptions necessary for the random effects model are appropriate (see the help for hausman). Explain the difference in the assumptions of the RE and FE models. If the RE assumptions are inappropriate, why is the FE model preferred? If the RE assumptions are appropriate, why would the RE model be preferred over the FE model? (You can find more info on the Hausman test in xtreg at where one of the examples provided shows how to test for appropriateness of random effects assumption.)
[1] To control for year, create a dummy variable for each year. A shortcut for this is:
tabulate year, gen(ydum)
This will create dummies for each year, labeled ydum1 through ydum18. To include dummies 2-18 in the regression, you can refer to them as ydum2-ydum18 rather than type out all 17 names.
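A minimal sketch of the variable-range shorthand, assuming psid.dta is already loaded and again using a reduced specification for illustration only:

```stata
* Create one dummy per year: ydum1 (first year) through ydum18 (last year)
tabulate year, gen(ydum)

* Include dummies 2-18; ydum1 is left out as the base year
regress earnings age ydum2-ydum18
```

The range ydum2-ydum18 works because tabulate creates the dummies consecutively in the data set. In Stata 11 or later, factor-variable notation (i.year) should give equivalent year controls without creating the dummies by hand.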
[2] You can create a variable containing individual-specific means for age as follows:
* creates individual-specific means
bysort id: egen agemn = mean(age)
Now each observation will have a new variable called “agemn” that contains the individual specific mean of age across their years in the sample.
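The deviation step that follows from these means can be sketched as below for a single regressor. The names earnmn, agedev, and earndev are hypothetical, and the one-regressor model is only an illustration of the technique, not the full specification.

```stata
* Individual-specific means (skip the first line if agemn already exists)
bysort id: egen agemn = mean(age)
bysort id: egen earnmn = mean(earnings)

* Deviations from individual-specific means
generate agedev = age - agemn
generate earndev = earnings - earnmn

* Regression on the deviations, without an intercept
regress earndev agedev, noconstant
```

For the slopes to match the FE results, the means should be computed over the same observations that enter the regression, so drop any observations you exclude (for example, educ equal to -1) before creating the means.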