Multiple Linear Regression

Note: The SPSS procedure is shown here; SAS code is available on the class webpage.

Example: We wish to predict exam performance in a college statistics course. There are 6 variables:

  • Mathtest: score on a math aptitude test taken senior year of high school
  • Engtest: score on an English aptitude test taken senior year of high school
  • Eng_gpa: high school GPA in English courses
  • Math_gpa: high school GPA in math courses
  • Othr_gpa: high school GPA in courses other than math and English
  • Statexam: average % correct on exams in a college statistics course

Analyze > Regression > Linear; choose the dependent and independent variables and set the model selection method to “Stepwise”.

Stepwise Procedure Results (the entry and removal criteria can be changed under “Options”):

Variables Entered/Removed (a)
Model / Variables Entered / Variables Removed / Method
1 / Math aptitude test score / . / Stepwise (Criteria: Probability-of-F-to-enter <= .120, Probability-of-F-to-remove >= .150).
2 / English aptitude test score / . / Stepwise (Criteria: Probability-of-F-to-enter <= .120, Probability-of-F-to-remove >= .150).
a. Dependent Variable: Average percentage correct on statistics exams

Model Summary
Model / R / R Square / Adjusted R Square / Std. Error of the Estimate
1 / .484 (a) / .234 / .227 / 17.401
2 / .505 (b) / .255 / .240 / 17.251
a. Predictors: (Constant), Math aptitude test score
b. Predictors: (Constant), Math aptitude test score, English aptitude test score

Coefficients (a)
Model / Term / Unstandardized B / Std. Error / Standardized Beta / t / Sig.
1 / (Constant) / 3.061 / 10.557 / . / .290 / .772
1 / Math aptitude test score / .124 / .023 / .484 / 5.479 / .000
2 / (Constant) / -14.088 / 14.750 / . / -.955 / .342
2 / Math aptitude test score / .119 / .023 / .467 / 5.286 / .000
2 / English aptitude test score / .040 / .024 / .146 / 1.650 / .102
a. Dependent Variable: Average percentage correct on statistics exams

Fitted Regression Equation and interpretation:

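Model 1: predicted Statexam = 3.061 + 0.124 × Mathtest, where Statexam is the average % correct on the statistics exams and Mathtest is the math aptitude test score. Each additional point on the math aptitude test is associated with an increase of 0.124 percentage points in the predicted exam average. The model explains 23.4% of the variation in the exam average (R Square = .234).

Model 2: predicted Statexam = -14.088 + 0.119 × Mathtest + 0.040 × Engtest, where Engtest is the English aptitude test score. Holding the English score fixed, each additional math point is associated with an increase of 0.119 percentage points; holding the math score fixed, each additional English point is associated with an increase of 0.040 percentage points. The model explains 25.5% of the variation (R Square = .255).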
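
To make the fitted equations concrete, here is a minimal Python sketch that plugs the unstandardized coefficients (B) from the Coefficients table into each model. The function names and the example student's scores (500 on each test) are made up for illustration and do not come from the data set.

```python
# Unstandardized coefficients (B) copied from the SPSS Coefficients table.
MODEL_1 = {"const": 3.061, "math": 0.124}
MODEL_2 = {"const": -14.088, "math": 0.119, "eng": 0.040}


def predict_model_1(math_score):
    """Predicted average % correct on statistics exams, Model 1."""
    return MODEL_1["const"] + MODEL_1["math"] * math_score


def predict_model_2(math_score, eng_score):
    """Predicted average % correct on statistics exams, Model 2."""
    return (MODEL_2["const"]
            + MODEL_2["math"] * math_score
            + MODEL_2["eng"] * eng_score)


# Hypothetical student: these scores are an assumption, NOT observed data.
math_score, eng_score = 500, 500

pred_1 = predict_model_1(math_score)
pred_2 = predict_model_2(math_score, eng_score)

print(f"Model 1 prediction: {pred_1:.3f}")  # 3.061 + 0.124*500 = 65.061
print(f"Model 2 prediction: {pred_2:.3f}")  # -14.088 + 59.5 + 20 = 65.412
```

These are point predictions only; they say nothing about the uncertainty around the predicted value.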

Check Assumptions for Model 1

Plots are produced under “plots”:

You can use *ZRESID instead of *SDRESID as Y. The residuals and predicted values can be saved by clicking the “Save” button.

  • The Normal Q-Q plot (the left plot) shows a nearly straight line, so the normality assumption for the residuals is reasonable. To confirm, conduct a normality test on the residuals. The test results (not shown) all support normality: every p-value is above .05.
  • The equal-variance assumption for the residuals in the MLR model is assessed here with the residual plot (standardized residuals vs. predicted values); the procedure shown does not report a formal test for it. The residual plot shows no extreme outliers (standardized residuals beyond +/-3) and no pattern (just a scattered cloud). Apart from one mild outlier (beyond +/-2), nothing stands out, so the equal-variance assumption is reasonable.
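
The outlier screening described above can be sketched in a few lines of Python. This is only an illustration of the +/-2 and +/-3 cutoffs, under several assumptions: the data are made up (a straight line plus a small alternating error, with one planted outlier), the model has a single predictor, and residuals are standardized by dividing by the residual standard error, which is in the spirit of *ZRESID rather than a reproduction of the SPSS output.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def standardized_residuals(x, y):
    """Residuals divided by the residual standard error sqrt(SSE / (n - 2))."""
    intercept, slope = fit_simple_ols(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in residuals)
    s = math.sqrt(sse / (len(x) - 2))
    return [r / s for r in residuals]


# Made-up data (an assumption for illustration, not the class data set).
x = list(range(20))
y = [2 + 3 * xi + (1 if xi % 2 == 0 else -1) for xi in x]
y[10] += 15  # planted outlier at index 10

z = standardized_residuals(x, y)

# Same cutoffs as in the text: beyond +/-2 is mild, beyond +/-3 is extreme.
mild = [i for i, zi in enumerate(z) if 2 < abs(zi) <= 3]
extreme = [i for i, zi in enumerate(z) if abs(zi) > 3]

print("mild outliers (2 < |z| <= 3):", mild)
print("extreme outliers (|z| > 3):", extreme)
```

Plotting `z` against the fitted values would give the residual plot itself; the lists above only flag the points that cross the cutoffs.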

More on statistical inference:

  1. You can get a confidence interval for the mean of Y and/or a prediction interval for a future observation of Y by:

Analyze > Regression > Linear, choose the dependent and independent variables > Save, then check “Mean” or “Individual” under Prediction Intervals

  2. Click “Collinearity diagnostics” under “Statistics”. A VIF greater than 10 indicates a serious multicollinearity problem that needs to be investigated.
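
For intuition about the VIF cutoff, the sketch below computes the VIF for a model with exactly two predictors, where VIF = 1 / (1 - r²) and r is the correlation between the two predictors. The data values and function names are made up for illustration. With more than two predictors, r² is replaced by the R Square from regressing each predictor on all the others, which this sketch does not cover.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Made-up predictor values (an assumption, not the class data set).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]

r = pearson_r(x1, x2)           # 0.8 for these values
vif = vif_two_predictors(x1, x2)  # 1 / (1 - 0.64), about 2.78

print(f"r = {r:.2f}, VIF = {vif:.2f}")
print("serious multicollinearity" if vif > 10 else "VIF below the cutoff of 10")
```

Even a correlation of 0.8 between the two predictors gives a VIF under 3; the VIF only passes 10 once r² exceeds 0.9.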