Instructions for Regression Assignment
This assignment is due by 11:59 p.m. on Tuesday, 11/23/10.
Email me an Excel file that includes your answers to all parts of the assignment. The file name should be your last name and first initial (for example, Mark Johnson’s file would be called johnsonm.xls).
Check your file for viruses! I will mark down your paper by two letter grades if you send me an infected file!
1. Download the file regressionF10.txt from the website. Put this data into an Excel file. Make sure all the data ends up in the correct columns. Name this worksheet “data.”
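One way to confirm that every value landed in the correct column is to check that each data row has exactly as many fields as the header row. The Python sketch below does that. It assumes regressionF10.txt is tab-delimited with a header row of variable names; that format is a guess, so adjust the delimiter if the file differs. The function name load_columns is hypothetical and is not part of the assignment.

```python
import csv
import io

def load_columns(text, delimiter="\t"):
    """Parse delimited text with a header row into {column_name: [values]}.

    Raises ValueError if any data row has a different number of fields than
    the header, which usually means values have shifted into the wrong column.
    (Assumes a tab-delimited file; this is not confirmed by the assignment.)
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # skip blank lines
    header, data = rows[0], rows[1:]
    for line_no, row in enumerate(data, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
    # Transpose rows into named columns, like the columns of the Excel sheet.
    return {name: [row[i] for row in data] for i, name in enumerate(header)}
```

Values stay as text here; Excel performs the equivalent conversion to numbers when the data are pasted or imported.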
Then, in the “data” worksheet, create a dummy variable based on the variable “MAJOR.” The dummy should equal 1 for any student in a natural science major (physics, biology, or chemistry) and 0 otherwise. Call this variable “SCIENCE.”
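The dummy is a simple membership test. In Excel, a formula such as =IF(OR(D2="physics",D2="biology",D2="chemistry"),1,0) copied down the column would do it, assuming MAJOR sits in column D (a hypothetical placement) and the majors are spelled this way in the data file (also an assumption). The same rule as a short Python sketch:

```python
# Majors that count as natural science for the SCIENCE dummy.
# The exact spellings used in regressionF10.txt are an assumption.
NATURAL_SCIENCES = {"physics", "biology", "chemistry"}

def science_dummy(major):
    """Return 1 for a natural science major, 0 otherwise."""
    # Normalize case and surrounding spaces so "Biology " still matches.
    return 1 if major.strip().lower() in NATURAL_SCIENCES else 0

majors = ["Biology", "economics", "chemistry", "physics", "history"]
SCIENCE = [science_dummy(m) for m in majors]
print(SCIENCE)  # [1, 0, 1, 1, 0]
```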
2. Perform a multiple regression using the following model:
$$\text{GPA} = \alpha + \beta_1\,\text{SAT} + \beta_2\,\text{STUDY} + \beta_3\,\text{WORK} + \beta_4\,\text{SCIENCE} + \varepsilon$$
Name the worksheet with the regression output “regression 1.” Expand the columns as needed to make the results look nice.
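Excel's regression tool (for example, the Analysis ToolPak's Regression option) estimates α and β1 through β4 by ordinary least squares. To cross-check the coefficient column of the output, the minimal Python sketch below solves the least-squares normal equations, (X′X)b = X′y, directly. It reproduces only the point estimates, not the standard errors or p-values. The function name ols is hypothetical.

```python
def ols(X, y):
    """Estimate [alpha, beta1, ..., betak] by ordinary least squares.

    X is a list of rows, one per student, each holding the k explanatory
    variables (here SAT, STUDY, WORK, SCIENCE); an intercept column is added.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination.
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    p = len(rows[0])
    # Build X'X (p x p) and X'y (length p).
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    # Augmented matrix [X'X | X'y].
    a = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        # Partial pivoting: bring up the row with the largest entry in this column.
        pivot = max(range(col, p), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity?)")
        a[col], a[pivot] = a[pivot], a[col]
        # Scale the pivot row to 1, then eliminate this column from other rows.
        piv = a[col][col]
        a[col] = [v / piv for v in a[col]]
        for i in range(p):
            if i != col:
                factor = a[i][col]
                a[i] = [vi - factor * vc for vi, vc in zip(a[i], a[col])]
    # The last column now holds the solution b = [alpha, beta1, ..., betak].
    return [a[i][p] for i in range(p)]
```

If SCIENCE were accidentally coded as 1 for everyone, it would duplicate the intercept column and the sketch would raise the singular-matrix error. Excel has no way to estimate that model either.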
3. In a text box inserted in the “regression 1” worksheet, interpret the regression results. Specifically, you should discuss (a) the meaning of each slope coefficient, including an explanation of the effect of the variable on GPA and whether the coefficient has the sign you would expect, (b) the statistical significance of each coefficient, and (c) the explanatory power of the regression as a whole, using both R-squared and the F-statistic. (Be sure to put your explanations of slope coefficients in terms of the original units of measure, as given in the original data file.)
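R-squared is the share of the variation in GPA that the explanatory variables explain. The F-statistic tests whether all of the slope coefficients are jointly zero. Both come from the same sums of squares that appear in Excel's ANOVA table. The sketch below, in which fit_stats is a hypothetical name, shows the formulas. It also computes adjusted R-squared, which Excel reports alongside R-squared and which penalizes extra explanatory variables.

```python
def fit_stats(y, y_hat, k):
    """Return (R-squared, adjusted R-squared, F-statistic) for a regression
    with k explanatory variables (not counting the intercept)."""
    n = len(y)
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    r2 = 1 - sse / sst
    # Adjusted R-squared rescales by degrees of freedom: (n - 1) vs (n - k - 1).
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    # F = (explained variation per slope) / (unexplained variation per residual df).
    f = (r2 / k) / ((1 - r2) / (n - k - 1))
    return r2, adj_r2, f
```

For the model in step 2, k = 4. Compare the F-statistic with Excel's "Significance F" (its p-value) rather than judging its size alone.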
4. In your regression output, look at the residual plot for STUDY (be sure to expand the graph so you can see it clearly). Do you notice a pattern? In another text box inserted in the “regression 1” worksheet, describe the pattern (if any) and what it implies about the effectiveness of studying. Use economic terminology to describe this phenomenon.
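A residual plot can be hard to read when many points overlap. A numeric companion is to average the residuals at each value of STUDY. If the linear model captures the effect of STUDY, those averages should sit near zero with no systematic shape. A run of negative, then positive, then negative averages (an arch) means the straight line misses some curvature. A minimal sketch, with a hypothetical function name:

```python
from collections import defaultdict

def mean_residual_by_level(study, residuals):
    """Average the regression residuals at each distinct value of STUDY,
    returned in increasing order of STUDY."""
    groups = defaultdict(list)
    for s, e in zip(study, residuals):
        groups[s].append(e)
    return {s: sum(v) / len(v) for s, v in sorted(groups.items())}
```

Excel's residual output lists one residual per observation, so pairing that column with STUDY gives the inputs.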
5. Create a new variable that is equal to the hours of study squared. Call it STUDYSQ. Perform a second regression, using the same explanatory variables as the first regression and this additional variable. Expand the columns to make the results look nice. Name the worksheet with the new regression output “regression 2.”
(a) In a text box inserted into this worksheet, interpret the results of this regression (as in question 3 above, parts a, b, and c). However, don’t interpret STUDY and STUDYSQ yet.
(b) In a second text box in this worksheet, explain the meaning of STUDY and STUDYSQ. What effect will studying one more hour per week have for a student who currently studies only 1 hour per week? What effect will it have for a student who already studies 6 hours per week?
(c) In a third text box in this worksheet, compare your results from this regression to the previous one. Does the second regression do a better job of explaining differences in GPAs?
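With STUDYSQ added, call its coefficient β5 (a label chosen here, not given in the assignment). Holding SAT, WORK, and SCIENCE fixed, moving from S to S + 1 hours of study changes predicted GPA by β2[(S + 1) − S] + β5[(S + 1)² − S²] = β2 + β5(2S + 1). The effect of one more hour therefore depends on how much the student already studies. The sketch below evaluates that expression. The coefficient values in the usage lines are made-up placeholders, not estimates from the data.

```python
def extra_hour_effect(b_study, b_studysq, hours):
    """Change in predicted GPA from studying one more hour per week.

    With GPA = ... + b_study*STUDY + b_studysq*STUDY**2 and all other
    variables held fixed, going from `hours` to `hours + 1` changes the
    prediction by b_study + b_studysq * (2*hours + 1).
    """
    return b_study + b_studysq * (2 * hours + 1)

# Placeholder coefficients for illustration only; use your own Excel estimates.
b2, b5 = 0.20, -0.01
print(extra_hour_effect(b2, b5, 1))  # going from 1 to 2 hours per week
print(extra_hour_effect(b2, b5, 6))  # going from 6 to 7 hours per week
```

With these placeholder values, the first print shows about 0.17 and the second about 0.07. The calculus version, β2 + 2β5·S, gives a close approximation for small changes.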
6. Think carefully about the possible determinants of a student’s GPA. What other variables do you think might be relevant? How would you go about including them in the regression? (You do not need to perform another regression. Just explain in words the approach you might use.) Create a new worksheet called “analysis,” and put your answer to this question in a text box in that worksheet.
7. Can you think of any other problems or difficulties with the approach we’ve used? Put your answer in a second text box in the “analysis” worksheet. (Do not repeat your answer to #6, and do not expect full credit for pointing out just one potential problem or difficulty. Refer to the “Regression Difficulties” lecture for further guidance.)