Worksheet: Model Building for LPGA Golf Performance and Prize Winnings

Response: prize money per round (prz)
Predictors: average drive distance (drive), fairway percentage (fairway), greens-in-regulation percentage (greens), average putts per round (putts), sand shots per round (sandshots), sand-save percentage (sandsave)

Part 1: Fit the model with Prize as the dependent variable (D.V.) and the full set of predictors, and check normality of errors

Fit the full model.

a) What proportion of the variation in prize money is “explained” by the model?

b) What is the F-statistic for testing whether prize money is associated with any of the predictors?

c) Based on the t-tests for the individual regression coefficients, which predictors seem to be most important?
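The worksheet does not name any software. As a minimal sketch, assuming Python with NumPy and SciPy, the quantities asked for in a) to c) all come from one least-squares fit: R² = 1 - SSE/SST, the overall F-statistic F = [(SST - SSE)/p] / [SSE/(n - p - 1)] for H0: all slopes are zero, and a t-statistic for each coefficient. The helper `ols_summary` is hypothetical, and the data are synthetic stand-ins for the six predictors, not the LPGA data.

```python
import numpy as np
from scipy import stats


def ols_summary(X, y):
    """Hypothetical helper: least-squares fit of y on X (plus intercept) with R^2, F and t-tests."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])           # design matrix with intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)   # least-squares coefficients
    fitted = Xd @ beta
    resid = y - fitted
    sse = resid @ resid                             # error sum of squares
    sst = np.sum((y - y.mean()) ** 2)               # total sum of squares
    df_err = n - p - 1
    mse = sse / df_err
    r2 = 1 - sse / sst                              # proportion of variation "explained"
    f_stat = ((sst - sse) / p) / mse                # H0: all slope coefficients are zero
    f_p = stats.f.sf(f_stat, p, df_err)
    se = np.sqrt(np.diag(mse * np.linalg.inv(Xd.T @ Xd)))  # standard errors of the coefficients
    t = beta / se                                   # H0: this coefficient is zero
    t_p = 2 * stats.t.sf(np.abs(t), df_err)
    return {"beta": beta, "r2": r2, "F": f_stat, "F_p": f_p,
            "t": t, "t_p": t_p, "fitted": fitted, "resid": resid}


# Synthetic stand-in for the six predictors (NOT the LPGA data)
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 6))
y = 2.0 + 1.5 * X[:, 2] - 1.0 * X[:, 3] + rng.normal(scale=0.5, size=n)

res = ollect = None
res = ols_summary(X, y)
print("R^2:", res["r2"], "F:", res["F"], "p:", res["F_p"])
print("t-statistics (intercept first):", res["t"])
```

With the real data, `X` would hold the six predictor columns and `y` prize money per round; the predictors with the largest |t| (smallest p-values) are the candidates for "most important".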

d) Based on the residual versus predicted plot and the normal probability plot, do the normal distribution and constant variance assumptions seem reasonable?

e) Give the P-value of the Shapiro-Wilk test for normality of the errors.
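The plots in d) are judged by eye, but the numbers behind them and the Shapiro-Wilk test in e) can be computed from the residuals. A sketch, assuming SciPy's `stats.shapiro` and `stats.probplot`, on synthetic data: `fitted` against `resid` gives the residual-versus-predicted plot, and `probplot` returns the normal-probability-plot coordinates plus the correlation `r` of the fitted line. A right-skewed (log-normal) sample is included only for contrast.

```python
import numpy as np
from scipy import stats

# Synthetic data (NOT the LPGA data): normal errors
rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 3))
y = 1.0 + X @ np.array([0.5, -0.3, 0.8]) + rng.normal(scale=1.0, size=n)

Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
fitted = Xd @ beta          # x-axis of the residual-versus-predicted plot
resid = y - fitted          # y-axis of the residual-versus-predicted plot

# Shapiro-Wilk test of H0: the errors are normally distributed
w_stat, w_p = stats.shapiro(resid)

# Normal probability plot coordinates; r near 1 means the points lie close to a line
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(resid, dist="norm")

# For contrast: a strongly right-skewed (log-normal) sample
skewed_p = stats.shapiro(np.exp(rng.normal(size=100)))[1]

print("Shapiro-Wilk W:", w_stat, "p:", w_p, "probplot r:", r)
print("Shapiro-Wilk p for skewed sample:", skewed_p)
```

A small Shapiro-Wilk p-value is evidence against normal errors; the skewed sample shows what that looks like.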

f) Use the Box-Cox transformation to find a power transformation of Y that may make the errors approximately normally distributed (the focal value is just outside the 95% confidence interval).
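A sketch of the Box-Cox profile likelihood for a linear model, assuming a strictly positive response. For each λ the response is transformed to (y^λ - 1)/λ (log y when λ = 0, the log transformation used in Part 2), the model is refit, and the log-likelihood -(n/2) log(SSE(λ)/n) + (λ - 1) Σ log yᵢ is recorded. The approximate 95% confidence interval keeps every λ whose log-likelihood is within χ²(1, 0.95)/2 ≈ 1.92 of the maximum, which is the likelihood-ratio interval R's `MASS::boxcox` plots. The helper `boxcox_profile` is hypothetical and the data are synthetic.

```python
import numpy as np
from scipy import stats


def boxcox_profile(X, y, lambdas):
    """Hypothetical helper: profile log-likelihood of the Box-Cox lambda for a linear model (y > 0)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    log_y = np.log(y)
    ll = []
    for lam in lambdas:
        # Box-Cox transform of the response; lambda = 0 is the log
        z = log_y if abs(lam) < 1e-8 else (y ** lam - 1) / lam
        beta, *_ = np.linalg.lstsq(Xd, z, rcond=None)
        sse = np.sum((z - Xd @ beta) ** 2)
        # Normal log-likelihood maximized over beta and sigma, plus the Jacobian term
        ll.append(-0.5 * n * np.log(sse / n) + (lam - 1) * log_y.sum())
    return np.array(ll)


# Synthetic data (NOT the LPGA data) with multiplicative, log-normal errors
rng = np.random.default_rng(2)
n = 150
X = rng.normal(size=(n, 2))
y = np.exp(0.5 + X @ np.array([0.6, -0.4]) + rng.normal(scale=0.3, size=n))

lambdas = np.linspace(-1, 1, 201)
ll = boxcox_profile(X, y, lambdas)
lam_hat = lambdas[np.argmax(ll)]
cutoff = ll.max() - 0.5 * stats.chi2.ppf(0.95, df=1)   # likelihood-ratio cutoff for 95% CI
ci = lambdas[ll >= cutoff]
print("lambda-hat:", lam_hat, "approx. 95% CI:", (ci.min(), ci.max()))
```

If a convenient value such as λ = 0 falls just outside the interval, it may still be chosen for interpretability, which is a judgment call rather than something the likelihood decides.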

Part 2: Fit models with ln(Prize) as the D.V. and use automated methods for model selection

a) What model (set of predictors) is selected based on AIC by backward elimination?

b) What model (set of predictors) is selected based on AIC by forward selection?

c) What model (set of predictors) is selected based on AIC by stepwise regression?
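Backward elimination, forward selection, and stepwise regression can be sketched as one greedy AIC search that differs only in which moves are allowed (drop, add, or both). This sketch assumes AIC = n log(SSE/n) + 2k, with k the number of coefficients including the intercept, which is the form R's `step()` uses for linear models via `extractAIC`. The helpers `aic_ols` and `step_aic` are hypothetical. Forward and stepwise searches here start from the intercept-only model and backward elimination from the full model; other software may start stepwise elsewhere. The data are synthetic; with the real data, `y` would be ln(Prize).

```python
import numpy as np


def aic_ols(X, y, cols):
    """AIC in the form n*log(SSE/n) + 2k, k = number of coefficients including the intercept."""
    n = len(y)
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    sse = np.sum((y - Xd @ beta) ** 2)
    return n * np.log(sse / n) + 2 * Xd.shape[1]


def step_aic(X, y, start, direction="both"):
    """Hypothetical helper: greedy search that makes the single add/drop lowering AIC most."""
    current = set(start)
    all_cols = set(range(X.shape[1]))
    best = aic_ols(X, y, sorted(current))
    while True:
        candidates = []
        if direction in ("backward", "both"):
            candidates += [current - {j} for j in current]            # try dropping each predictor
        if direction in ("forward", "both"):
            candidates += [current | {j} for j in all_cols - current]  # try adding each one
        if not candidates:
            return sorted(current), best
        aic, cand = min(((aic_ols(X, y, sorted(c)), c) for c in candidates),
                        key=lambda s: s[0])
        if aic >= best:                                                # no move improves AIC: stop
            return sorted(current), best
        current, best = cand, aic


names = ["drive", "fairway", "greens", "putts", "sandshots", "sandsave"]

# Synthetic stand-in (NOT the LPGA data)
rng = np.random.default_rng(3)
n = 120
X = rng.normal(size=(n, 6))
y = 1.0 + 0.3 * X[:, 0] + 0.8 * X[:, 2] - 0.6 * X[:, 3] + rng.normal(scale=0.5, size=n)

back, _ = step_aic(X, y, start=range(6), direction="backward")
fwd, _ = step_aic(X, y, start=[], direction="forward")
both, _ = step_aic(X, y, start=[], direction="both")
for label, chosen in [("backward", back), ("forward", fwd), ("stepwise", both)]:
    print(label, [names[j] for j in chosen])
```

The three searches often agree, but they need not, because each takes a different path through the candidate models.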

d) What proportion of the variation in ln(prize money) is “explained” by the model chosen by stepwise regression?

e) What is the F-statistic for testing whether ln(prize money) is associated with any of the predictors in the stepwise model?

f) Based on the t-tests for the individual regression coefficients, which predictors seem to be most important?

g) Based on the residual versus predicted plot and the normal probability plot, do the normal distribution and constant variance assumptions seem reasonable?

h) Give the P-value of the Shapiro-Wilk test for normality of the errors.

i) Give the P-value of the Breusch-Pagan test for constant error variance.
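A minimal sketch of the Breusch-Pagan test in Koenker's studentized form: regress the squared residuals on the predictors and compare n·R² from that auxiliary regression with a χ² distribution on p degrees of freedom (p = number of predictors). This is the default form of `bptest` in R's lmtest package; `breusch_pagan` and `ols_resid` here are hypothetical helpers, and the synthetic data include one response whose error spread grows with a predictor.

```python
import numpy as np
from scipy import stats


def breusch_pagan(X, resid):
    """Hypothetical helper: Koenker's studentized Breusch-Pagan statistic n*R^2 and its p-value."""
    n, p = X.shape
    e2 = resid ** 2
    Xd = np.column_stack([np.ones(n), X])
    gamma, *_ = np.linalg.lstsq(Xd, e2, rcond=None)      # auxiliary regression of e^2 on X
    r2 = 1 - np.sum((e2 - Xd @ gamma) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    lm = n * r2
    return lm, stats.chi2.sf(lm, df=p)                   # chi-square with p degrees of freedom


def ols_resid(X, y):
    """Residuals from a least-squares fit of y on X plus an intercept."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return y - Xd @ beta


# Synthetic data (NOT the LPGA data)
rng = np.random.default_rng(4)
n = 500
X = rng.uniform(1, 5, size=(n, 2))
mean = 1.0 + X @ np.array([0.5, -0.2])
y_const = mean + rng.normal(scale=1.0, size=n)           # constant error variance
y_het = mean + rng.normal(size=n) * X[:, 0]              # error SD grows with the first predictor

lm_c, p_const = breusch_pagan(X, ols_resid(X, y_const))
lm_h, p_het = breusch_pagan(X, ols_resid(X, y_het))
print("constant variance: p =", p_const, "| increasing variance: p =", p_het)
```

A small p-value is evidence that the error variance changes with the predictors.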

Part 3: Use K-fold cross-validation to compare out-of-sample error for four models

a) Model 1: drive, fairway, greens, putts, sandshots, sandsave; MSE1 = ????

b) Model 2: fairway, greens, putts, sandshots, sandsave; MSE2 = ????

c) Model 3: drive, greens, putts, sandshots, sandsave; MSE3 = ????

d) Model 4: greens, putts, sandshots, sandsave; MSE4 = ????
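A sketch of K-fold cross-validation for the four candidate models, assuming K = 10 and ordinary least squares; `kfold_mse` is a hypothetical helper. Each observation is predicted by a model fit without its fold, and the pooled mean of the squared prediction errors estimates out-of-sample MSE. The same shuffled folds are reused for every model, so differences in MSE are not caused by different splits. The data are synthetic, so the printed values are not answers to a) to d); with the real data, `y` would be whichever response the comparison uses.

```python
import numpy as np


def kfold_mse(X, y, cols, k=10, seed=0):
    """Hypothetical helper: K-fold CV estimate of out-of-sample MSE for OLS on columns `cols`."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)   # same seed -> same folds for every model
    sq_err = np.empty(n)
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        Xtr = np.column_stack([np.ones(len(train))] + [X[train, j] for j in cols])
        Xte = np.column_stack([np.ones(len(test))] + [X[test, j] for j in cols])
        beta, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)  # fit without the held-out fold
        sq_err[test] = (y[test] - Xte @ beta) ** 2              # predict the held-out fold
    return sq_err.mean()                                        # pooled over all observations


names = ["drive", "fairway", "greens", "putts", "sandshots", "sandsave"]
models = {
    "Model 1": ["drive", "fairway", "greens", "putts", "sandshots", "sandsave"],
    "Model 2": ["fairway", "greens", "putts", "sandshots", "sandsave"],
    "Model 3": ["drive", "greens", "putts", "sandshots", "sandsave"],
    "Model 4": ["greens", "putts", "sandshots", "sandsave"],
}

# Synthetic stand-in (NOT the LPGA data)
rng = np.random.default_rng(5)
n = 150
X = rng.normal(size=(n, 6))
y = 1.0 + 0.5 * X[:, 0] + 0.9 * X[:, 2] - 0.7 * X[:, 3] + rng.normal(scale=0.5, size=n)

mse = {m: kfold_mse(X, y, [names.index(v) for v in cols]) for m, cols in models.items()}
for m, value in mse.items():
    print(m, "MSE =", round(value, 4))
```

The model with the smallest cross-validated MSE has the best estimated out-of-sample performance; different fold assignments (seeds) can change small differences between models.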