Economics 515 Assignment #1 Professor Thornton
Econometrics 100 Points Winter 2018
This homework is due Wednesday, February 7 and covers the material on the classical linear regression model, hypothesis testing, the general linear regression model, and the analysis of experimental data. When reporting results, you may cut and paste results from Stata’s results window. To copy from Stata’s results window, highlight the results, right-click, and then click Copy Text. When you paste the results into your Word document, use the 9-point Courier New font for tables, but use an 11-point or 12-point font for your written answers. When answering each question, copy the question onto your answer sheet and then provide the answer. Each question is worth 5 points.
QUESTION #1
You have been hired as a consultant by the U.S. Department of Energy to analyze factors that affect the price of gasoline. The factors in which it is most interested are the excise tax on gasoline and consumer income. An excise tax is a tax that is collected directly from the producer. Both the federal government and each of the 50 state governments impose excise taxes on gasoline. The federal excise tax is the same for all states, but may change over time. The state excise tax differs across states and may also change over time. The Energy Department wants to know if each of these factors has an effect on the price of gasoline, the direction of the effects, the size of the effects, and which one has the biggest effect.
Data
The data for your study are contained in the data set gas6908. This file has annual data on 48 states for the period 1969 to 2008, for a total of 1,920 observations. The variables are: id is the state id number; state is the state name; year is the year; pgas is the nominal gas price in cents per gallon; gastax is the total gas tax in cents per gallon; incap is nominal income per capita in thousands of dollars; poil is the nominal price of oil in dollars per barrel; driverscap is the number of licensed drivers per capita; urban is the percent of the population that lives in a metropolitan area; railpop is the percent of the population living in metro areas with rail transit; rdmiles is the number of miles of roads; demgov takes a value of 1 if a state has a Democratic governor and 0 otherwise.
Statistical Model
You begin by specifying the following regression model.
(1) $\text{lpgas}_t = \beta_1 + \beta_2\,\text{lgastax}_t + \beta_3\,\text{lincap}_t + \mu_t$
where lpgas_t is the natural logarithm of the price of gasoline, lgastax_t is the natural logarithm of the gasoline tax, and lincap_t is the natural logarithm of income. You assume that the error term is normally distributed with mean zero and constant variance, that the errors are uncorrelated with one another, and that the error term is uncorrelated with the explanatory variables. As a result, your statistical model is a classical linear regression model.
You begin by estimating this model for the most recent year in your data file, 2008. To do this, add the qualifier if year==2008 to the end of the regress command. Do not separate by a comma.
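The if qualifier described above can be sketched as follows. This is only an illustration under stated assumptions: it assumes the file gas6908.dta sits in the current working directory and that the log variables are not already in the file, so the names lpgas, lgastax, and lincap are created here to match model (1).

```stata
* Sketch only: assumes gas6908.dta is in the current working directory
* and that the log variables have not already been created.
use gas6908, clear

* Natural logs of price, tax, and income (names chosen to match model (1))
generate lpgas   = ln(pgas)
generate lgastax = ln(gastax)
generate lincap  = ln(incap)

* Model (1) for 2008 only: the if qualifier follows the variable list
* directly, with no comma in between
regress lpgas lgastax lincap if year==2008
```

Note that ln() returns a missing value for zero or negative arguments, so any such observations would be dropped from the regression automatically.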
1. Estimate the model using the OLS estimator and the data for year 2008. Report your results. Interpret the estimates of β2 and β3.
2. Choose a measure of precision and assess the precision of the estimates of β2 and β3. Are your estimates relatively precise or imprecise? Explain.
3. Choose a measure of the strength of evidence that the gasoline tax and income each has an effect on gasoline price. Interpret this measure. Is the evidence that each variable has an effect relatively strong or weak? Explain.
4. Test the null hypothesis that the gasoline tax and income have no joint effect on gasoline price. Interpret the result.
5. Do you believe your estimates of β2 and β3 are good estimates or poor estimates of the effect of the gasoline tax and income on gasoline price? Carefully explain.
Next, you decide to re-estimate model (1) using all 1,920 observations for the 48 states for the period 1969 to 2008.
6. Estimate the model using the OLS estimator and the data for the full sample of 1,920 observations. Report your results. Interpret the estimates of β2 and β3. Are the estimates of β2 and β3 for the full sample similar to or noticeably different from those you reported in part 1 for the year 2008 sample? Explain.
7. Are the estimates of β2 and β3 for the full sample more precise or less precise than those for the year 2008 sample? What do you think is the main reason why the precision of the estimates is different for the two samples? Explain.
8. Once again, assess the strength of evidence that the gasoline tax and income each has an effect on gasoline price. Is there relatively strong evidence, moderate evidence, weak evidence, or little or no evidence that each variable has an effect?
9. Do a White test for heteroscedasticity. Interpret the result. Re-estimate the model with White robust standard errors. Report the results.
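One possible way to carry out the full-sample steps in parts 6 through 9 is sketched below. It assumes the log variables lpgas, lgastax, and lincap have already been created in memory; those names are illustrative and follow model (1).

```stata
* Sketch only: assumes lpgas, lgastax, and lincap already exist in memory.

* Model (1) on the full sample of 1,920 observations
regress lpgas lgastax lincap

* White's general test for heteroscedasticity; must follow regress
estat imtest, white

* Same coefficient estimates, now with heteroscedasticity-robust
* (White) standard errors
regress lpgas lgastax lincap, vce(robust)
```

The robust option changes only the standard errors and the statistics built from them; the coefficient estimates are the same as in the first regress command.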
You are concerned that your estimates of β2 and β3 may be biased because of omitted confounding variables. You decide to estimate the following model.
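(2) $\text{lpgas}_{it} = \beta_1 + \beta_2\,\text{lgastax}_{it} + \beta_3\,\text{lincap}_{it} + \beta_4\,\text{lpoil}_t + \beta_5\,\text{ldriverscap}_{it} + \beta_6\,\text{urban}_{it} + \beta_7\,\text{demgov}_{it} + \mu_{it}$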
The subscript it indicates the observed value of a variable for the ith state in the tth year. A variable with only a t subscript varies over time but not across states.
The variables lpoil (natural log of the price of oil), ldriverscap (natural log of drivers per capita), urban (percent of the population that lives in a metropolitan area, not in log form), and demgov are included as potential confounding variables.
10. What is a confounding variable? What requirements must a variable satisfy to be a confounding variable? Make an argument that the number of drivers per capita is a confounding variable for the gasoline tax.
11. Estimate model (2) using the OLS estimator and White robust standard errors. Report the results. Interpret the estimates of the coefficients of lpoil, ldriverscap, urban, and demgov. Do they provide evidence that these variables are confounding variables? Yes or no. Explain.
12. Compare the estimates of β2 and β3 for the full sample from model (1), which excludes control variables, to those from model (2), which includes control variables. Are the estimates similar or noticeably different? If noticeably different, explain what you believe resulted in the different estimates. Does a comparison of the estimates of β2 and β3 for the two models provide additional evidence that lpoil, ldriverscap, urban, and demgov are confounding variables? Yes or no. Explain.
You are trying to decide whether to include two additional potential confounding variables in the model: lrdmiles (natural log of number of miles of roads) and railpop (percent of the population that lives in a metropolitan area with rail transit, not in log form). You decide to perform two tests to help you make your decision.
13. Do a Lagrange multiplier test and test the null hypothesis that railpop and lrdmiles should not be included in model (2) against the alternative hypothesis that at least one should be included in model (2). Interpret the result.
14. Include the variables railpop and lrdmiles in model (2). Do an F-test and test the null hypothesis that railpop and lrdmiles should be excluded from the model against the alternative hypothesis that at least one should be included. Interpret the result.
15. Which test is more appropriate, the Lagrange multiplier test or the F-test? Carefully explain.
16. Choose what you believe is your best model. Use the results from this model to address the questions of interest of the Department of Energy.
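The two tests in parts 13 and 14 could be set up along the following lines. This is a sketch, not a prescribed procedure: it uses the common n times R-squared form of the Lagrange multiplier statistic from an auxiliary regression, it assumes the log variables of model (2) already exist in memory, and the names uhat and lm are hypothetical.

```stata
* Sketch only: assumes lpgas, lgastax, lincap, lpoil, and ldriverscap
* already exist; uhat and lm are illustrative names.
generate lrdmiles = ln(rdmiles)

* LM test, step 1: estimate restricted model (2) and save the residuals
regress lpgas lgastax lincap lpoil ldriverscap urban demgov
predict uhat, residuals

* LM test, step 2: regress the residuals on all regressors,
* including the two candidate variables
regress uhat lgastax lincap lpoil ldriverscap urban demgov railpop lrdmiles

* LM test, step 3: LM = n * R-squared, compared with a chi-squared
* distribution with 2 degrees of freedom (two restrictions)
scalar lm = e(N)*e(r2)
display "LM = " lm "   p-value = " chi2tail(2, lm)

* F-test: estimate the unrestricted model, then test the joint restriction
regress lpgas lgastax lincap lpoil ldriverscap urban demgov railpop lrdmiles
test railpop lrdmiles
```

Both regressions in the LM test should use the same observations; if railpop or lrdmiles has missing values, the restricted model would need to be limited to the same sample. Adding vce(robust) to the unrestricted regression before the test command would give a heteroscedasticity-robust version of the joint test.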
QUESTION #2
You have received a large grant from the state of Texas to analyze whether class size has a causal effect on the academic performance of middle school children (grades 7 and 8). You have received parental consent to randomly assign 5,786 children to two groups: small class size and large class size. Students are assigned to these two groups at the beginning of 7th grade. At the end of 8th grade, these students are given a standardized test to measure their reading and math aptitude. The maximum possible score on this test is 1,300 (650 on reading and 650 on math) and the minimum possible score is 600 (300 on reading and 300 on math).
Data
The data for your study are contained in the data set classsize. The variables are: score is the score on the standardized test; small takes a value of 1 if the student is assigned to a small class and 0 if the student is assigned to a large class; boy takes a value of 1 if the student is a boy and 0 if the student is a girl; rural takes a value of 1 if the school is located in a rural area and 0 if the school is located in an urban area; texp is teacher years of experience.
Statistical Model
To analyze the data, you begin by specifying the following single-treatment regression model.
(1) $\text{score}_t = \beta_0 + \beta_1\,\text{small}_t + \mu_t$
1. Estimate this model using the OLS estimator. Report the results. Interpret the estimates of β0 and β1. Test the hypothesis that β1 = 0. What conclusion can you draw from this test?
Next, you decide to include an interaction variable between small and boy, and the variables rural and texp in model (1).
(2) $\text{score}_t = \beta_0 + \beta_1\,\text{small}_t + \beta_2\,\text{smboy}_t + \beta_3\,\text{rural}_t + \beta_4\,\text{texp}_t + \mu_t$
where smboy is the interaction variable.
2. Why did you include the variable smboy in the model? Why did you include the variables rural and texp in the model? Explain.
3. Create the new variable smboy, which is the product of small and boy. Estimate model (2) using the OLS estimator. Report the results. Interpret the estimates of β0, β1, and β2. Test the hypothesis: β1 = 0 and β2 = 0. What conclusion can you draw from this test? Test the hypothesis: β2 = 0. What conclusion can you draw from this test?
4. Use model (2) to address the following. Does class size have a causal effect on academic performance of middle school children? If so, what is the direction of the effect? What is the size of the effect? Is the effect the same on all middle school children? If there is an effect, do you think it is relatively big or small? Justify your answer.
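The mechanics of part 3 can be sketched as below, assuming the classsize data are already loaded in memory with the variable names listed in the Data section.

```stata
* Sketch only: assumes the classsize data are already in memory.

* Interaction variable: equals 1 for boys in small classes, 0 otherwise
generate smboy = small*boy

* Model (2)
regress score small smboy rural texp

* Joint hypothesis: the coefficients on small and smboy are both zero
test small smboy

* Single hypothesis: the coefficient on smboy is zero
test smboy
```

The test command reports an F statistic and p-value based on the most recent regression, so it must be run directly after the regress command it refers to.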