252solnJ2 11/26/07
J. MULTIPLE REGRESSION
1. Two explanatory variables
a. Model
b. Solution.
2. Interpretation
Text 14.1, 14.3, 14.4 [14.1, 14.3, 14.4] (14.1, 14.3, 14.4) Minitab output for 14.4 will be available on the website; you must be able to answer the problem from it.
3. Standard errors
J1, Text Problems 14.9, 14.14, 14.23, 14.26 [14.13, 14.16, 14.20, 14.23] (14.17, 14.19, 14.24, 14.27)
4. Stepwise regression
Problem J2 (J1), Text exercises 14.32, 14.34 [14.28, 14.29] (14.32, 14.33)
(Computer Problem – instructions to be given)
This document includes solutions to text problems 14.28 and 14.29 and Problem J2. Again, there are extra problems included. They are well worth looking at!
______
Stepwise Regression Problems.
Exercise 14.32 [14.28 in 9th] (14.32 in 8th edition): Assume the following ANOVA summary, where there are 2 independent variables, the regression sum of squares for X1 alone is 20, and the regression sum of squares for X2 alone is 15. a) Is there a significant relationship between Y and each of the independent variables at the 5% significance level? b) Compute the coefficients of partial determination r²Y1.2 and r²Y2.1.
SOURCE DF SS MS F p
Regression 2 30
Error 10 120
Total 12 150
Solution: Complete the ANOVA
Analysis of Variance
SOURCE DF SS MS F p
Regression 2 30 30/2=15
Error 10 120 120/10=12
Total 12 150
SOURCE DF SSR
X1 1 20
X2 1 15
14.28 (a) For X1: SSR(X1|X2) = SSR(X1 and X2) - SSR(X2) = 30 - 15 = 15. This is the additional explanatory power from adding X1 after X2. We are itemizing the regression sum of squares.
The ANOVA would read
SOURCE DF SS MS F p
X2 1 15
X1 1 15 15 1.25
Error 10 120 120/10=12
Total 12 150
This is compared to F.05(1,10) = 4.96. Since 1.25 < 4.96, do not reject H0. There is not sufficient evidence that the variable X1 contributes to a model already containing X2.
For X2: SSR(X2|X1) = SSR(X1 and X2) - SSR(X1) = 30 - 20 = 10. This is the additional explanatory power from adding X2 after X1.
The ANOVA would read
SOURCE DF SS MS F p
X1 1 20
X2 1 10 10 0.833
Error 10 120 120/10=12
Total 12 150
This is compared to F.05(1,10) = 4.96. Since 0.833 < 4.96, do not reject H0. There is not sufficient evidence that the variable X2 contributes to a model already containing X1.
Neither independent variable X1 nor X2 makes a significant contribution to the model in the presence of the other variable. Also, the overall regression equation involving both independent variables is not significant: F = MSR/MSE = 15/12 = 1.25.
This is compared to F.05(2,10) = 4.10; since 1.25 < 4.10, the overall regression is not significant either.
Neither variable should be included in the model and other variables should be investigated.
(b) r²Y1.2 = SSR(X1|X2)/(SST - SSR(X1 and X2) + SSR(X1|X2)) = 15/(150 - 30 + 15) = 15/135 = 0.1111. The denominator is what is unexplained after adding X2 only. Holding constant the effect of variable X2, 11.11% of the variation in Y can be explained by the variation in variable X1.
r²Y2.1 = SSR(X2|X1)/(SST - SSR(X1 and X2) + SSR(X2|X1)) = 10/(150 - 30 + 10) = 10/130 = 0.0769. For this one the denominator is what is unexplained after adding X1 only. Holding constant the effect of variable X1, 7.69% of the variation in Y can be explained by the variation in variable X2.
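If you want to verify this arithmetic by machine, here is a minimal Python sketch (assuming scipy is available for the F table; the inputs are only the sums of squares given in the problem, and the variable names are illustrative, not from the text).

from scipy.stats import f

# Sums of squares given in Exercise 14.32
sst, ssr_both, sse, df_err = 150.0, 30.0, 120.0, 10
ssr_x1, ssr_x2 = 20.0, 15.0              # regression SS for each variable alone
mse = sse / df_err                       # 120/10 = 12

# Additional explanatory power of each variable after the other
ssr_x1_after_x2 = ssr_both - ssr_x2      # 30 - 15 = 15
ssr_x2_after_x1 = ssr_both - ssr_x1      # 30 - 20 = 10

# Partial F tests against F.05(1,10) = 4.96
print(ssr_x1_after_x2 / mse, ssr_x2_after_x1 / mse, f.ppf(0.95, 1, df_err))

# Coefficients of partial determination
print(ssr_x1_after_x2 / (sst - ssr_both + ssr_x1_after_x2))   # 15/135 = 0.1111
print(ssr_x2_after_x1 / (sst - ssr_both + ssr_x2_after_x1))   # 10/130 = 0.0769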
Exercise 14.34 [14.29 in 9th] (14.33 in 8th edition): Recall that in the Warecost problem (14.4 in the 10th edition) the Minitab output read
Analysis of Variance
Source DF SS MS F P
Regression 2 3368.1 1684.0 74.13 0.000
Residual Error 21 477.0 22.7
Total 23 3845.1
Source DF Seq SS
(Note that these two Seq SS add to the Regression SS in the ANOVA.)
Sales 1 2726.8
Orders 1 641.3
Or we could run the independent variables in opposite sequence.
Source DF SS MS F P
Regression 2 3368.1 1684.0 74.13 0.000
Residual Error 21 477.0 22.7
Total 23 3845.1
Source DF Seq SS
Orders 1 3246.1
Sales 1 122.0
a) Determine whether the independent variables make a significant contribution to the regression model, and say what the most appropriate model is. b) Compute r²Y1.2 and r²Y2.1.
Solution:
(a) For X1 (Sales): SSR(X1|X2) = 3368.1 - 3246.1 = 122.0, so F = SSR(X1|X2)/MSE = 122.0/22.7 = 5.37. Compare this with F.05(1,21) = 4.32. Since 5.37 > 4.32, reject H0. There is evidence that the variable X1 contributes to a model already containing X2.
For X2 (Orders): SSR(X2|X1) = 3368.1 - 2726.8 = 641.3, so F = 641.3/22.7 = 28.25. Compare this with F.05(1,21) = 4.32. Since 28.25 > 4.32, reject H0. There is evidence that the variable X2 contributes to a model already containing X1.
Since each independent variable X1 and X2 makes a significant contribution to the model in the presence of the other variable, both variables should be included in the model.
(b) r²Y1.2 = SSR(X1|X2)/(SST - SSR(X1 and X2) + SSR(X1|X2)) = 122.0/(3845.1 - 3368.1 + 122.0) = 122.0/599.0 = 0.2037. Holding constant the effect of the number of orders, 20.37% of the variation in Y can be explained by the variation in sales.
r²Y2.1 = SSR(X2|X1)/(SST - SSR(X1 and X2) + SSR(X2|X1)) = 641.3/(3845.1 - 3368.1 + 641.3) = 641.3/1118.3 = 0.5735. Holding constant the effect of sales, 57.35% of the variation in Y can be explained by the variation in the number of orders.
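The same sketch can be rerun with the Warecost sums of squares (again assuming scipy; every number below comes from the Minitab runs above).

from scipy.stats import f

sst, ssr_both, mse, df_err = 3845.1, 3368.1, 22.7, 21
ssr_sales_after_orders = 3368.1 - 3246.1     # 122.0
ssr_orders_after_sales = 3368.1 - 2726.8     # 641.3

# Partial F tests against F.05(1,21) = 4.32
print(ssr_sales_after_orders / mse, ssr_orders_after_sales / mse, f.ppf(0.95, 1, df_err))

# Coefficients of partial determination
print(ssr_sales_after_orders / (sst - ssr_both + ssr_sales_after_orders))  # 0.2037
print(ssr_orders_after_sales / (sst - ssr_both + ssr_orders_after_sales))  # 0.5735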
More of Old text exercise 11.5:
The Minitab printout read
The regression equation is
y = 1.10 + 1.64 x - 0.160 x*x
Predictor Coef Stdev t-ratio p
Constant 1.09524 0.09135 11.99 0.000
x 1.63571 0.07131 22.94 0.000
x*x -0.15952 0.01142 -13.97 0.000
s = 0.1047 R-sq = 99.7% R-sq(adj) = 99.6%
Analysis of Variance
SOURCE DF SS MS F p
Regression 2 15.0305 7.5152 686.17 0.000
Error 4 0.0438 0.0110
Total 6 15.0743
SOURCE DF SEQ SS
x 1 12.8929
x*x 1 2.1376
Two sections remain unexplained. First, R-squared adjusted.
R² adjusted for degrees of freedom is R-sq(adj) = 1 - (1 - R²)(n - 1)/(n - k - 1) or, equivalently, [R²(n - 1) - k]/(n - k - 1), where k is the number of independent variables and n is the number of observations. It is intended to compensate for the fact that increasing the number of independent variables always raises R². In this version of the regression, we have n = 7 observations and k = 2 independent variables, so R² = 15.0305/15.0743 = .9971 and R-sq(adj) = 1 - (1 - .9971)(6/4) = .9956, the 99.6% on the printout. If this does not go up as you add new independent variables, you can be rather sure that the new variables accomplish nothing.
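A two-line check of that computation in plain Python (a sketch; the numbers are taken from the printout above):

sst, ssr, n, k = 15.0743, 15.0305, 7, 2
r2 = ssr / sst                                 # 0.9971, the 99.7% on the printout
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # 0.9956, the 99.6% on the printout
print(r2, r2_adj)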
Second, sequential sums of squares.
The two values given, 12.8929 and 2.1376, represent an itemization of the regression sum of squares, 15.0305. This means that we could split up the ANOVA to read
SOURCE DF SS MS F p
x 1 12.8929 12.8929 1172.08
x*x 1 2.1376 2.1376 194.32
Error 4 0.0438 0.0110
Total 6 15.0743
If we compare these Fs to F.05(1,4) = 7.71, for example, we will see that both Fs are highly significant, indicating that x explained Y well, but that adding x*x definitely improved the explanation. Note that the coefficient of x*x has a t-ratio of -13.97. If this is squared, it gives 195.16, which is, except for rounding error, the F of 194.32, so the sequential F test is essentially the same as a t-test on the last independent variable added.
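The same split can be checked in Python (a sketch, assuming scipy; sums of squares copied from the printout):

from scipy.stats import f

mse, df_err = 0.0110, 4
print(12.8929 / mse, 2.1376 / mse)   # 1172.08 and 194.33, the Fs in the table
print(f.ppf(0.95, 1, df_err))        # F.05(1,4) = 7.71
print((-13.97) ** 2)                 # 195.16, the x*x F except for rounding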
Problem J1: n = 80, k = 3, R2 = .95
n = 80, k = 4, R2 = .99
Use an F test to show if the second regression is an improvement.
Solution: There are two ways to do this.
a) Fake an ANOVA. Call the first result R₁² = .95 and the second R₂² = .99. Remember that R² = SSR/SST, so that if we set SST = 100 and R² = .95, then SSR = 95. For the two regressions we get
Source  SS  DF          Source  SS  DF
3 Xs    95   3    and   4 Xs    99   4
Error    5  76          Error    1  75
Total  100  79          Total  100  79
If we combine these and get new values of F by dividing the MS values by 0.013333, our new error MS, we get
Source     SS  DF  MS        F
3 Xs       95   3  31.67     2375.24
1 more X    4   1   4         300.00
Error       1  75   0.013333
Total     100  79
The second F test gives us our answer. Since F = 300.00 is far above the table value F.05(1,75) ≈ 3.97, we reject the hypothesis that the 4th x does not contribute to the explanation of Y.
b) If we add k₂ - k₁ independent variables to a regression that already has k₁ of them, so that we end with k₂ independent variables, use the formula
F = [(R₂² - R₁²)/(k₂ - k₁)] / [(1 - R₂²)/(n - k₂ - 1)].
Here n = 80, k₁ = 3, k₂ = 4, R₁² = .95 and R₂² = .99, so
F = [(.99 - .95)/1] / [(1 - .99)/75] = .04/.000133 = 300.
The test gives the same result as in a).
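The formula in b) is easy to program; here is a sketch in Python (scipy assumed for the table value):

from scipy.stats import f

n, k1, k2 = 80, 3, 4
r2_1, r2_2 = 0.95, 0.99
F = ((r2_2 - r2_1) / (k2 - k1)) / ((1 - r2_2) / (n - k2 - 1))
print(F)                                  # 300.0
print(f.ppf(0.95, k2 - k1, n - k2 - 1))   # F.05(1,75) = 3.97; reject H0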
Exercise 11.87 in James T. McClave, P. George Benson and Terry Sincich, Statistics for Business and Economics, 8th ed., Prentice Hall, 2001, last year's text:
a) Minitab was used to fit the complete model, with four independent variables, and the reduced model, which drops two of them. The ANOVAs follow.
Complete model:
Analysis of Variance
SOURCE DF SS MS F p
Regression 4 831.09 207.77 20.41 0.002
Error 15 152.66 10.18
Total 19 983.75
Reduced model:
Analysis of Variance
SOURCE DF SS MS F p
Regression 2 823.31 411.66 43.61 0.000
Error 17 160.44 9.44
Total 19 983.75
b) The Minitab printout shows that in the complete model the error sum of squares is 152.66 and in the reduced model it is 160.44. These represent the unexplained part of each model. The amount of reduction of the unexplained part was thus only 7.78 out of 160.44.
c) We have 5 parameters in the complete model and 3 in the reduced model.
d) We can investigate the null hypothesis H0: β3 = β4 = 0 (the two dropped variables contribute nothing) against the alternative that at least one of these betas is nonzero.
e) We can do this using an ANOVA or using the formula in Problem J1. Note that between the two regressions, the regression sum of squares rose from 823.31 to 831.09, an increase of 7.78.
If we combine these two ANOVA tables and get new values of F by dividing the MS values by the new MSE we get
Source      SS      DF  MS        F
2 Xs        823.31   2  411.66    40.449
2 more Xs     7.78   2    3.89     0.3822
Error       152.66  15   10.17733
Total       983.75  19
We cannot reject the null hypothesis because our computed F = 0.38 is less than the table value F.05(2,15) = 3.68.
Alternatively, if we add k₂ - k₁ independent variables so that we end with k₂ independent variables, use the formula
F = [(R₂² - R₁²)/(k₂ - k₁)] / [(1 - R₂²)/(n - k₂ - 1)].
Here n = 20, k₁ = 2, k₂ = 4, R₁² = 823.31/983.75 = .8369 and R₂² = 831.09/983.75 = .8448, so
F = [(.8448 - .8369)/2] / [(1 - .8448)/15] = .00395/.01035 = 0.38, the same result as in the ANOVA.
f) By comparing 0.38 with other values of F(2,15) on the F table (for example F.10(2,15) = 2.70), you should be able to figure out that the p-value is above 10%.
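The whole nested-model F test can be done in a few lines of Python (a sketch, assuming scipy, which also gives the exact p-value):

from scipy.stats import f

sse_reduced, sse_complete = 160.44, 152.66
df_dropped, df_err = 2, 15     # two betas dropped; complete-model error df
F = ((sse_reduced - sse_complete) / df_dropped) / (sse_complete / df_err)
print(F)                                 # 0.38
print(f.ppf(0.95, df_dropped, df_err))   # 3.68; cannot reject H0
print(f.sf(F, df_dropped, df_err))       # p-value about 0.69, well above 10%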