Homework 3. Multiple linear regression modeling.

1  Variable selection [65]

1.1  Use the Fit Model platform to regress biomass (Y-variable or response) on all continuous predictor variables (X-variables or predictors). [10]

1.2  Obtain the VIFs and standardized regression coefficients for the model. Indicate whether there is a severe collinearity problem and which variables are involved. [20]
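
For anyone who wants to cross-check the JMP output, the sketch below shows one way the two quantities can be computed. It assumes NumPy is available; the function names `vif` and `standardized_coefficients` are hypothetical, and the data are synthetic, not the homework data set. A VIF above about 10 is a common rule of thumb for severe collinearity, not a strict cutoff.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n rows, k predictors)."""
    X = np.asarray(X, dtype=float)
    # The VIFs are the diagonal of the inverse of the predictor correlation matrix.
    R = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(R))

def standardized_coefficients(X, y):
    """Slopes from regressing z-scored y on z-scored predictors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # No intercept column is needed because every variable is centered.
    beta, *_ = np.linalg.lstsq(Xz, yz, rcond=None)
    return beta

# Synthetic example (assumption): x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.05, size=50)
x3 = rng.normal(size=50)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + x3 + rng.normal(size=50)

vifs = vif(X)
betas = standardized_coefficients(X, y)
```

In this made-up example the first two predictors should show very large VIFs because they carry almost the same information, while the third should stay near 1.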

1.3  Given that these data are observational and we want to understand the effects of soil characteristics on biomass, use the Fit Model platform to develop models using the Cp and AIC criteria.

1.3.1  Select Stepwise Personality and Run. In the new window, select the red triangle and choose All possible regressions. A table of results for all models appears inside the window. Using Ctrl-click or right-click, add the Cp to that table. Then use Ctrl-click or right-click again and choose Make into Data Table. Save your new table. Add a formula to the table to calculate the AIC. Note that “RMSE” is the square root of the MSE for each model. [20]
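
The arithmetic behind the table formula can be sketched as follows. This assumes the textbook definitions AIC = n ln(SSE/n) + 2p and Cp = SSE/MSE_full - (n - 2p), where p counts the intercept plus the predictors in the model; if the course notes define AIC differently, use that definition instead. The function names and the numbers in the example are hypothetical.

```python
import math

def sse_from_rmse(rmse, n, p):
    """Recover SSE from RMSE: MSE = SSE / (n - p), so SSE = RMSE^2 * (n - p)."""
    return rmse ** 2 * (n - p)

def aic(rmse, n, p):
    """AIC = n * ln(SSE / n) + 2p (assumed textbook form)."""
    return n * math.log(sse_from_rmse(rmse, n, p) / n) + 2 * p

def mallows_cp(rmse, n, p, mse_full):
    """Cp = SSE_p / MSE_full - (n - 2p), with MSE_full from the full model."""
    return sse_from_rmse(rmse, n, p) / mse_full - (n - 2 * p)

# Hypothetical values, not taken from the homework data:
# n observations, p parameters (intercept included), and the model's RMSE.
example_aic = aic(rmse=1.0, n=10, p=2)
example_cp = mallows_cp(rmse=2.0, n=45, p=15, mse_full=4.0)
```

In JMP the same calculation would be entered as a column formula that uses the RMSE column, the number of terms in each model, and the sample size.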

1.3.2  For each number of variables included in the model, plot the best two Cp values and the best two AIC values against the number of variables plus 1. On the graphs, circle the best subset of X variables based on each criterion and write down which variables are in each selected model. You will circle one model based on the Cp and another one (or the same one) based on the AIC. [15]
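
Picking the two best models at each model size can be sketched as below. It assumes that "best" means the smallest value of the criterion, which fits AIC; for Cp it is also usual to check how close the value is to the number of parameters. The row layout, the key names, and the numbers are hypothetical.

```python
from collections import defaultdict

def best_two_per_size(rows, criterion):
    """Return the two rows with the smallest criterion value for each model size."""
    by_size = defaultdict(list)
    for row in rows:
        by_size[row["n_vars"]].append(row)
    best = {}
    for size, group in sorted(by_size.items()):
        best[size] = sorted(group, key=lambda r: r[criterion])[:2]
    return best

# Hypothetical rows standing in for the saved all-models table.
rows = [
    {"n_vars": 1, "model": "A", "aic": 12.0},
    {"n_vars": 1, "model": "B", "aic": 10.5},
    {"n_vars": 1, "model": "C", "aic": 11.0},
    {"n_vars": 2, "model": "A,B", "aic": 9.0},
    {"n_vars": 2, "model": "B,C", "aic": 8.2},
]

best = best_two_per_size(rows, "aic")
# Points to plot: x is the number of variables plus 1, y is the criterion value.
points = [(size + 1, r["aic"], r["model"])
          for size, group in best.items() for r in group]
```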

2  Variance of Yhat and of Beta hat [35]

2.1  Run a new model with only the X variables selected on the basis of AIC. [5]

2.2  Using the custom test, estimate biomass when all soil variables are at their average values, both in the 14-variable model (step 1.1) and in the reduced model from step 2.1. [15]
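
A custom test of this kind estimates a linear combination L'b of the coefficients, with standard error sqrt(MSE * L'(X'X)^-1 L). The sketch below works through that calculation with NumPy on synthetic data, with L set to 1 for the intercept and to the predictor means for the slopes. The function name `estimate_at_means` and the data are assumptions for illustration only.

```python
import numpy as np

def estimate_at_means(X, y):
    """Estimate the mean response with every predictor at its sample mean."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    D = np.column_stack([np.ones(n), X])        # design matrix with intercept
    b, *_ = np.linalg.lstsq(D, y, rcond=None)   # least-squares coefficients
    resid = y - D @ b
    mse = (resid @ resid) / (n - k - 1)         # error mean square
    L = np.concatenate(([1.0], X.mean(axis=0))) # custom-test coefficients
    estimate = L @ b
    std_error = np.sqrt(mse * (L @ np.linalg.solve(D.T @ D, L)))
    return estimate, std_error

# Synthetic data (assumption): four predictors, only the first two matter.
rng = np.random.default_rng(1)
X_full = rng.normal(size=(40, 4))
y = 5.0 + 2.0 * X_full[:, 0] - X_full[:, 1] + rng.normal(size=40)

est_full, se_full = estimate_at_means(X_full, y)
est_reduced, se_reduced = estimate_at_means(X_full[:, :2], y)
```

Running the function on the full and the reduced predictor sets mirrors the two custom tests requested in this step.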

2.3  Compare the standard errors of the partial regression coefficients between the full and reduced models and indicate the main reason for the difference. [8]

2.4  Compare the standard errors of the custom tests between the full and reduced models. Indicate whether there is a difference and why. [7]