STAT 360 – Regression Analysis – Assignment 5 (70 pts.)
Fall 2017
1 – Used Cars   Datafiles: Used-cars.JMP and Used-cars.txt
These data come from a study of the asking price for a sample of used cars. These are fairly old data so the selling prices are very low by today’s standards. Your goal is to develop a multiple regression model for the Asking Price ($) using several potential predictors.
The variables in these data are:
- Asking.Price ($)
- this predictor fills in the
- – number of available options the car has.
- – number of miles on the car in 1,000’s (e.g. 115 = 115,000 mi.)
- – price of the car when new ($)
- – remaining loan value
- – average retail price for the car when new
- Make – car manufacturer (Acura, Audi, …, Volvo, VW)
- – specific model (e.g. ToyotaCorolla, VWGolf, etc.)
- MakeModel – combined make/model info (e.g. ToyotaCorolla, VWGolf)
a) Find the correlation matrix and comment on the use of the correlation as a measure of linear association between the response and the individual predictors. (3 pts.)
b) Examine a scatterplot matrix of Asking.Price and the numeric predictors. Comment on anything interesting you find by examining this plot. You should address the following: (8 pts. – 2 pts. each)
- marginal/univariate distributions of the response and the predictors
- relationships between the response and each predictor
- relationships between the predictors
- any unusual cases
c) Explain why using both and together in a multiple regression model makes no sense. (2 pts.)
d) Examine the partial correlations between the response and each of the predictors. Which predictor has the strongest adjusted relationship with the response? Which has the weakest? (2 pts.)
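Note: as a reminder of what a partial correlation measures, here is a minimal Python sketch for the simplest case, where the relationship between two variables is adjusted for a single third variable. In this assignment each partial correlation adjusts for all of the remaining predictors, which your software handles for you. The function names and the data values below are made up for illustration and are not taken from Used-cars.txt.

```python
from math import sqrt

def pearson(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def partial_corr(y, x, z):
    """Correlation between y and x after adjusting both for one variable z."""
    r_yx, r_yz, r_xz = pearson(y, x), pearson(y, z), pearson(x, z)
    return (r_yx - r_yz * r_xz) / sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))

# Made-up illustrative values (NOT from the assignment data)
price = [5.0, 4.2, 6.1, 3.0, 7.5, 2.8]        # asking price, $1,000s
miles = [60, 85, 40, 120, 30, 130]            # mileage, 1,000s of miles
new_price = [12.0, 11.0, 13.0, 10.0, 15.0, 10.5]  # price when new, $1,000s

r_marginal = pearson(price, miles)                 # unadjusted correlation
r_partial = partial_corr(price, miles, new_price)  # adjusted for new_price
print(round(r_marginal, 3), round(r_partial, 3))
```

Comparing the unadjusted and adjusted values shows how much of an apparent relationship is shared with the adjusting variable.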
e) Write out the full mean function for the model using all of the numeric predictors in their original scale as the terms in the model, i.e. . (3 pts.)
f) Fit the full model from part (e), give the R-square and interpret. (2 pts.)
g) Comment on the adequacy of this model from an assumption standpoint by examining plots of the residuals, a normal quantile plot of the residuals, and an NCV plot. (5 pts.)
h) Are there any unusual cases that stand out in your residual plots? If so, what makes and models do they represent? Note: I have set MakeModel as a label. (3 pts.)
i) Which predictors/terms in the model are statistically significant at the level? (2 pts.)
j) Conduct a Big F-test for removing the insignificant nonconstant terms (i.e. non-intercept terms) from the full model. Completely specify the NH and AH for this test, conduct the test, and state your conclusion. Carefully show how you calculated the F-statistic, citing all quantities needed to find it. (8 pts.)
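Note: the arithmetic behind the Big F-test can be sketched as follows, assuming the usual form of the statistic that compares the residual sum of squares (RSS) and residual degrees of freedom of the NH (reduced) model with those of the AH (full) model. The function name is hypothetical, and the numbers in the example call are placeholders rather than values from the used cars data.

```python
def big_f_statistic(rss_nh, df_nh, rss_ah, df_ah):
    """F = [(RSS_NH - RSS_AH) / (df_NH - df_AH)] / (RSS_AH / df_AH).

    rss_nh, df_nh: residual SS and residual df of the reduced (NH) model
    rss_ah, df_ah: residual SS and residual df of the full (AH) model
    """
    if df_nh <= df_ah:
        raise ValueError("the NH model must have more residual df than the AH model")
    numerator = (rss_nh - rss_ah) / (df_nh - df_ah)  # extra RSS per term dropped
    denominator = rss_ah / df_ah                     # error mean square of the full model
    return numerator / denominator

# Placeholder values only: three terms dropped from a model with 20 residual df
f_stat = big_f_statistic(rss_nh=120.0, df_nh=23, rss_ah=100.0, df_ah=20)
print(round(f_stat, 4))
```

The statistic is referred to an F distribution with (df_NH - df_AH) numerator and df_AH denominator degrees of freedom; the p-value itself is best taken from your software or an F table.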
k) Perform Backward Elimination using Stepwise regression with P-value Threshold for the Stopping Rule. How does the model chosen compare to the NH model from part (j)? Include the output from the Stepwise Fit. (3 pts.)
l) Interpret each of the coefficients in the final model using proper units and a suitable increment (i.e. a 1-unit increase might be too small to consider for some of the terms in your model). Also find and discuss a 95% CI for each. (6 pts.)
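Note: rescaling a coefficient to a more meaningful increment only requires multiplying the estimate and both CI endpoints by that increment. A minimal sketch is given below; the function name and all numeric inputs are hypothetical, and the t critical value must be looked up for your model's residual degrees of freedom.

```python
def scaled_effect(estimate, std_error, t_crit, increment):
    """Return (effect, lower, upper) for an `increment`-unit change in a predictor.

    estimate, std_error: coefficient estimate and its standard error
    t_crit: t critical value for the desired confidence level (supplied by the user)
    increment: size of the change in the predictor, in the predictor's own units
    """
    effect = increment * estimate
    half_width = abs(increment) * t_crit * std_error
    return effect, effect - half_width, effect + half_width

# Hypothetical inputs: since mileage is recorded in 1,000s of miles,
# increment=10 corresponds to an additional 10,000 miles.
effect, lower, upper = scaled_effect(estimate=-0.05, std_error=0.01,
                                     t_crit=2.0, increment=10)
print(effect, lower, upper)
```

The interpretation then reads as the estimated change in the mean response for the chosen increment, holding the other terms in the model fixed.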
m) How does the for the reduced model compare to the full model? (1 pt.)
n) Which variable has the strongest adjusted relationship with the response? Use the AVPs to answer this question and include them below. (3 pts.)
o) Use your model to estimate the asking price for a car that was $10,000 when new, has 25,000 miles on it, and has an average retail price of $11,000. (2 pts.)
p) Construct a bar graph for the Make of the car. Explain why using Make as a factor variable in our model is probably a bad idea. Hint: How many dummy variable terms would be added to the model if we used Make in our model? (3 pts.)
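Note: the hint rests on the fact that a factor with k levels contributes k - 1 dummy variable terms under the usual reference-group coding. A small sketch of the count is shown below; the function name is hypothetical and the list of makes is a made-up sample, not the actual Make column.

```python
from collections import Counter

def dummy_term_count(levels):
    """Return (number of dummy terms, cases per level) for a factor variable.

    With reference-group coding, k distinct levels give k - 1 dummy terms.
    """
    counts = Counter(levels)
    return max(len(counts) - 1, 0), counts

# Made-up sample of makes for illustration only
sample_makes = ["Acura", "Audi", "Volvo", "VW", "Audi", "VW", "VW"]
n_dummies, cases_per_make = dummy_term_count(sample_makes)
print(n_dummies)               # dummy terms that would be added to the model
print(dict(cases_per_make))    # how many cars support each level
```

Looking at the cases per level alongside the number of dummy terms shows how thinly the data may be spread across makes.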
2 - Car Purchase Price   Datafiles: Car Purchase.JMP and Car Purchase.txt
These data come from a study of the price individuals pay for a car purchase. You will use multiple regression to develop a model for the mean price paid as a function of demographic information on the person making the purchase.
The variables in the data file are:
a) Write out the full model using all available predictors. (4 pts.)
Note: you may want to reorder the levels of the dichotomous nominal variables so the reference group is acceptable to you.
b) Use Backward Elimination to simplify your model using P-value Threshold for the Stopping Rule. Fit the reduced model chosen and interpret each of the estimated coefficients along with the CI’s. (6 pts.)
c) Examine residual plots and a normal quantile plot of the residuals, and comment on the adequacy of your model. (4 pts.)
d) Interpret each of the parameter estimates in the final model using proper units and increments. (4 pts.)