252solnJ1 11/9/07

J. MULTIPLE REGRESSION

1. Two explanatory variables

a. Model

b. Solution.

2. Interpretation

Text 14.1, 14.3, 14.4 [14.1, 14.3, 14.4] (14.1, 14.3, 14.4) Minitab output for 14.4 will be available on the website; you must be able to answer the problem from it.

3. Standard errors

J1, Text Problems 14.9, 14.14, 14.23, 14.26 [14.13, 14.16, 14.20, 14.23] (14.17, 14.19, 14.24, 14.27)

4. Stepwise regression

Problem J2 (J1), Text exercises 14.32, 14.34 [14.28, 14.29] (14.32, 14.33)

(Computer Problem – instructions to be given)

This document includes solutions to text problems 14.1 through 14.26 and Problem J1. Note that there are many extra problems included here. These are to give you extra practice in sometimes difficult computations. You probably need it.

______

Multiple Regression Problems – Interpretation

Exercise 14.1: Assume that the regression equation is Yhat = 10 + 5X1 + 3X2 and R² = 0.60. Explain the meaning of the slopes (5 and 3), the intercept (10) and R².

Solution: Answers below are from the Instructor’s Solution Manual.

(a) Holding constant the effect of X2, for each additional unit of X1 the response variable Y is expected to increase on average by 5 units. Holding constant the effect of X1, for each additional unit of X2 the response variable Y is expected to increase on average by 3 units.

(b) The Y-intercept 10 estimates the expected value of Y if X1 and X2 are both 0.

(c) 60% of the variation in Y can be explained or accounted for by the variation in X1 and the variation in X2.
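The slope interpretation in (a) can be seen numerically; a minimal Python sketch using the equation with intercept 10 and slopes 5 and 3:

```python
# Quick numeric check of the slope interpretation in Exercise 14.1.
# The fitted equation has intercept 10 and slopes 5 (X1) and 3 (X2).

def y_hat(x1, x2):
    """Predicted Y for the fitted plane Yhat = 10 + 5*X1 + 3*X2."""
    return 10 + 5 * x1 + 3 * x2

# Holding X2 fixed, a one-unit increase in X1 changes the prediction by 5.
print(y_hat(4, 7) - y_hat(3, 7))   # difference due to X1 alone
# Holding X1 fixed, a one-unit increase in X2 changes the prediction by 3.
print(y_hat(3, 8) - y_hat(3, 7))   # difference due to X2 alone
```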

Exercise 14.3: Minitab output (faked) follows. We are trying to predict durability of a shoe as measured by Ltimp as a function of a measure of shock-absorbing capacity (Foreimp) and a measurement of change in impact properties over time (Midsole). State the equation and interpret the slopes.

Regression Analysis
The regression equation is
Ltimp = - 0.0269 + 0.791 Foreimp + 0.605 Midsole
Predictor Coef Stdev t-ratio p
Constant -0.02686 0.06985 -0.39 0.7034
Foreimp 0.79116 0.06295 12.57 0.0000

Midsole 0.60484 0.07174 8.43 0.0000

s = 0.2540 R-sq = 94.2% R-sq(adj) = 93.2%

Analysis of Variance

SOURCE DF SS MS F p

Regression 2 12.61020 6.30510 97.69 0.000

Error 12 0.77453 0.06554

Total 14 13.38473


Answers below are (edited) from the Instructor’s Solution Manual.

(a) Yhat = -0.02686 + 0.79116 Foreimp + 0.60484 Midsole. The printout reads Ltimp = - 0.0269 + 0.791 Foreimp + 0.605 Midsole.

(b) For a given measurement of the change in impact properties over time, each increase of one unit in forefoot impact absorbing capability is expected to result in an average increase in the long-term ability to absorb shock by 0.79116 units. For a given forefoot impact absorbing capability, each increase of one unit in measurement of the change in impact properties over time is expected to result in the average increase in the long-term ability to absorb shock by 0.60484 units.

(c) R² = SSR/SST = 12.61020/13.38473 = 0.9421. So, 94.21% of the variation in the long-term ability to absorb shock can be explained by variation in forefoot absorbing capability and variation in midsole impact.

(d) The formula in the outline for R²(adj) (R² adjusted for degrees of freedom) is R²(adj) = 1 - ((n-1)/(n-k-1))(1 - R²), where k is the number of independent variables. So

R²(adj) = 1 - ((15-1)/(15-2-1))(1 - 0.9421) = 1 - (14/12)(0.0579) = 0.9325. The text uses the same formula in slightly different notation.
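Both computations can be checked from the ANOVA table (SSR = 12.61020, SST = 13.38473, n = 15, k = 2); a minimal Python sketch:

```python
# Verify R-sq and adjusted R-sq for Exercise 14.3 from the ANOVA table.
ssr, sst = 12.61020, 13.38473   # regression and total sums of squares
n, k = 15, 2                    # observations and independent variables

r_sq = ssr / sst
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

print(round(r_sq, 4))      # matches the printed R-sq = 94.2%
print(round(adj_r_sq, 4))  # matches the printed R-sq(adj) = 93.2%
```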

Exercise 14.4: The data set given follows. The problem statement is on the next page.

MTB > Retrieve "C:\Berenson\Data_Files-9th\Minitab\Warecost.MTW".

Retrieving worksheet from file: C:\Berenson\Data_Files-9th\Minitab\Warecost.MTW

# Worksheet was saved on Mon Apr 27 1998

Results for: Warecost.MTW

MTB > Print c1-c3

Data Display Original Data

Row DistCost Sales Orders

1 52.95 386 4015

2 71.66 446 3806

3 85.58 512 5309

4 63.69 401 4262

5 72.81 457 4296

6 68.44 458 4097

7 52.46 301 3213

8 70.77 484 4809

9 82.03 517 5237

10 74.39 503 4732

11 70.84 535 4413

12 54.08 353 2921

13 62.98 372 3977

14 72.30 328 4428

15 58.99 408 3964

16 79.38 491 4582

17 94.44 527 5582

18 59.74 444 3450

19 90.50 623 5079

20 93.24 596 5735

21 69.33 463 4269

22 53.71 389 3708

23 89.18 547 5387

24 66.80 415 4161


We are trying to predict warehouse costs in $thousands (DistCost) as a function of sales in $thousands (Sales) and the number of orders received (Orders). From the output the text asks for a) the regression equation, b) the meaning of the slopes and c) the meaning, or rather the lack thereof, of the intercept. It also asks for rough confidence and prediction intervals.

The Minitab regression results follow. Regression was done using the pull-down menu. The ‘constant’ subcommand is automatic and provides a constant term in the regression. Response was ‘Distcost’ in c1, Predictors were ‘sales’ and ‘orders’ in c2 and c3. The VIF option was taken.

According to the Minitab ‘help’ output, “The variance inflation factor is a test for collinearity. The variance inflation factor (VIF) is used to detect whether one predictor has a strong linear association with the remaining predictors (the presence of multicollinearity among the predictors). VIF measures how much the variance of an estimated regression coefficient increases if your predictors are correlated (multicollinear).

VIF = 1 indicates no relation; VIF > 1, otherwise. The largest VIF among all predictors is often used as an indicator of severe multicollinearity. Montgomery and Peck suggest that when VIF is greater than 5-10, then the regression coefficients are poorly estimated. You should consider the options to break up the multicollinearity: collecting additional data, deleting predictors, using different predictors, or an alternative to least square regression. (© All Rights Reserved. 2000 Minitab, Inc.).”
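With only two predictors the VIF reduces to VIF = 1/(1 - r²), where r is the correlation between the two predictors, which is why Sales and Orders in the printout below both show the same VIF (2.8). A minimal sketch of that formula, with made-up illustrative columns x1 and x2:

```python
# VIF in the two-predictor case: VIF = 1 / (1 - r^2), where r is the
# correlation between the two predictors, so both predictors share one VIF.
def vif_two_predictors(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r_sq = sxy * sxy / (sxx * syy)
    return 1 / (1 - r_sq)

# Hypothetical, strongly correlated predictors for illustration only:
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.0, 9.7, 12.3]
print(vif_two_predictors(x1, x2))  # well above the 5-10 trouble threshold
```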

The Brief 3 option in Results provides the most complete output possible, including the effect on the regression sum of squares of adding the independent variables in sequence, the predicted value of the dependent variable ('Fit') and the standard error used for the confidence interval ('SE Fit').

In order to do prediction and confidence intervals for Sales = 400 and Orders = 4500, these values were placed in c4 and c5, and these columns were mentioned under Options as 'prediction intervals for new observations'; 'fits', 'confidence intervals' and 'prediction intervals' were checked. The two lines below were generated by the command because a storage option was also checked. The intervals requested in c, d and e appear at the end of the printout.

MTB > Name c6 = 'PFIT1' c7 = 'CLIM1' c8 = 'CLIM2' c9 = 'PLIM1' &

CONT> c10 = 'PLIM2'

MTB > Regress c1 2 c2 c3;

SUBC> Constant;

SUBC> VIF;

SUBC> PFits 'PFIT1';

SUBC> CLimits 'CLIM1'-'CLIM2';

SUBC> PLimits 'PLIM1'-'PLIM2';

SUBC> Brief 3.

Regression Analysis: DistCost versus Sales, Orders (Minitab regression output)

The regression equation is

DistCost = - 2.73 + 0.0471 Sales + 0.0119 Orders

Predictor Coef SE Coef T P VIF

Constant -2.728 6.158 -0.44 0.662

Sales 0.04711 0.02033 2.32 0.031 2.8

Orders 0.011947 0.002249 5.31 0.000 2.8

S = 4.766 R-Sq = 87.6% R-Sq(adj) = 86.4%

Analysis of Variance

Source DF SS MS F P

Regression 2 3368.1 1684.0 74.13 0.000

Residual Error 21 477.0 22.7

Total 23 3845.1

Source DF Seq SS Note that these two add to the Regression SS in the ANOVA.

Sales 1 2726.8

Orders 1 641.3

Obs Sales DistCost Fit SE Fit Residual St Resid

1 386 52.950 63.425 1.332 -10.475 -2.29R

2 446 71.660 63.755 1.511 7.905 1.75

3 512 85.580 84.820 1.656 0.760 0.17

4 401 63.690 67.082 1.332 -3.392 -0.74

5 457 72.810 70.127 0.999 2.683 0.58

6 458 68.440 67.796 1.193 0.644 0.14

7 301 52.460 49.839 2.134 2.621 0.62

8 484 70.770 77.528 1.139 -6.758 -1.46

9 517 82.030 84.196 1.525 -2.166 -0.48

10 503 74.390 77.503 1.126 -3.113 -0.67

11 535 70.840 75.199 1.838 -4.359 -0.99

12 353 54.080 48.800 2.277 5.280 1.26

13 372 62.980 62.311 1.483 0.669 0.15

14 328 72.300 65.626 2.847 6.674 1.75

15 408 58.990 63.852 1.152 -4.862 -1.05

16 491 79.380 75.145 1.069 4.235 0.91

17 527 94.440 88.789 2.004 5.651 1.31

18 444 59.740 59.407 2.155 0.333 0.08

19 623 90.500 87.302 2.535 3.198 0.79

20 596 93.240 93.867 2.097 -0.627 -0.15

21 463 69.330 70.087 1.049 -0.757 -0.16

22 389 53.710 59.898 1.349 -6.188 -1.35

23 547 89.180 87.401 1.657 1.779 0.40

24 415 66.800 66.535 1.107 0.265 0.06

R denotes an observation with a large standardized residual

Predicted Values for New Observations

New Obs Fit SE Fit 95.0% CI 95.0% PI

1 69.878 1.663 ( 66.420, 73.337) ( 59.381, 80.376)

Values of Predictors for New Observations

New Obs Sales Orders

1 400 4500

Answers below are (edited) from the Instructor’s Solution Manual.

(a) DistCost = -2.728 + 0.04711 Sales + 0.011947 Orders

(b) For a given number of orders, each increase of $1000 in sales is expected to result in an estimated average increase in distribution cost of $47.11. For a given amount of sales, each increase of one order is expected to result in an estimated average increase in distribution cost of $11.95.

(c) The intercept b0 has no practical interpretation here, because it would be the estimated average distribution cost when there are no sales and no orders.

(d) Yhat = -2.728 + 0.04711(400) + 0.011947(4500) = 69.878, or $69,878

According to the outline, crude intervals can be given as an approximate confidence interval of Yhat ± 2(SE Fit) and an approximate prediction interval of Yhat ± 2s, where SE Fit = 1.663 and s = 4.766 on the printout. But we have checked the options for intervals and gotten the intervals below.
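Using the printout values Yhat = 69.878, SE Fit = 1.663 and s = 4.766, the crude intervals take only a couple of lines and can be compared with Minitab's exact ones:

```python
# Approximate intervals from the outline versus Minitab's exact ones.
fit = 69.878      # predicted DistCost for Sales = 400, Orders = 4500
se_fit = 1.663    # 'SE Fit' from the printout
s = 4.766         # standard error of estimate

ci = (fit - 2 * se_fit, fit + 2 * se_fit)   # crude confidence interval
pi = (fit - 2 * s, fit + 2 * s)             # crude prediction interval

print(ci)  # near (66.552, 73.204); Minitab's exact CI is (66.420, 73.337)
print(pi)  # near (60.346, 79.410); Minitab's exact PI is (59.381, 80.376)
```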

(e) 66.420 to 73.337 (the 95% confidence interval for the mean)

(f) 59.381 to 80.376 (the 95% prediction interval for a single observation)

(g) R² = SSR/SST = 3368.1/3845.1 = 0.8759. So, 87.59% of the variation in distribution cost can be explained by variation in sales and variation in number of orders.


(h) R²(adj) = 1 - ((24-1)/(24-2-1))(1 - 0.8759) = 1 - (23/21)(0.1241) = 0.8641, or 86.41%
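As a cross-check on the printout, the regression can be re-fit from the data listed earlier with numpy's least-squares routine (a sketch, assuming numpy is available; it should reproduce the printed coefficients and the fit at Sales = 400, Orders = 4500):

```python
import numpy as np

# Re-fit Exercise 14.4 by least squares and compare with the Minitab output.
dist_cost = np.array([52.95, 71.66, 85.58, 63.69, 72.81, 68.44, 52.46, 70.77,
                      82.03, 74.39, 70.84, 54.08, 62.98, 72.30, 58.99, 79.38,
                      94.44, 59.74, 90.50, 93.24, 69.33, 53.71, 89.18, 66.80])
sales = np.array([386, 446, 512, 401, 457, 458, 301, 484, 517, 503, 535, 353,
                  372, 328, 408, 491, 527, 444, 623, 596, 463, 389, 547, 415])
orders = np.array([4015, 3806, 5309, 4262, 4296, 4097, 3213, 4809, 5237, 4732,
                   4413, 2921, 3977, 4428, 3964, 4582, 5582, 3450, 5079, 5735,
                   4269, 3708, 5387, 4161])

# Design matrix with a constant column, then solve for b0, b1, b2.
X = np.column_stack([np.ones_like(sales, dtype=float), sales, orders])
b, *_ = np.linalg.lstsq(X, dist_cost, rcond=None)
print(b)                      # approximately [-2.728, 0.04711, 0.011947]
print(b @ [1.0, 400, 4500])   # approximately 69.878, as in the printout
```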

Multiple Regression Problems – Significance

Exercise 11.3 in James T. McClave, P. George Benson and Terry Sincich, Statistics for Business and Economics, 8th ed., Prentice Hall, 2001 (last year's text) is given as an introductory example: We wished to estimate the coefficients β0, β1 and β2 of the presumably 'true' regression Y = β0 + β1X1 + β2X2 + ε. In this case we can write our results, as far as they are available, as the fitted equation Yhat = b0 + b1X1 + b2X2. The numbers in parentheses under the equation are the standard deviations s(b1) and s(b2). According to the outline, to test H0: βi = 0 use t = bi/s(bi). First find the degrees of freedom, df = n - k - 1 = 26, and since α = .05, use t(.025) = 2.056. Make a diagram with an almost normal curve with 'reject' zones above 2.056 and below -2.056.

a) So, if we wish to test H0: β1 = 0, use t = b1/s(b1). Since this is not in the 'reject' zone, do not reject the null hypothesis and say that β1 is not significant.

b) If we wish to test H0: β2 = 0, use t = b2/s(b2). Since this is in the 'reject' zone, reject the null hypothesis and say that β2 is significant.

c) Note that the size of a coefficient is only important relative to its standard deviation.
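The same mechanics apply to any printout: each t-ratio is just the coefficient divided by its standard error, to be compared against the tabled t value for n - k - 1 degrees of freedom. Using the coefficient table from Exercise 14.4 above as an illustration:

```python
# t-ratio = coefficient / standard error; compare |t| with the tabled
# critical value for n - k - 1 degrees of freedom to judge significance.
coefficients = {           # (coef, se) pairs from the Exercise 14.4 printout
    "Constant": (-2.728, 6.158),
    "Sales": (0.04711, 0.02033),
    "Orders": (0.011947, 0.002249),
}
for name, (coef, se) in coefficients.items():
    t = coef / se
    print(name, round(t, 2))   # -0.44, 2.32 and 5.31, matching the printout
```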

New problem J1 (old text Exercise 11.5): Use the following data.

Row Y X1 X2

1 1.0 0 0
2 2.7 1 1
3 3.8 2 4
4 4.5 3 9

5 5.0 4 16

6 5.3 5 25

7 5.2 6 36

a) Do the regression of Y against X1 and X2. Compute b) R² and c) R²(adj), d) do the ANOVA (not anywhere near as much fun as the Hokey-Pokey) and, e) following the formulas in the outline, try to find approximate confidence and prediction intervals for a new observation. f) You may also run this on Minitab by putting the data in c1, c2 and c3 and using the command Regress c1 2 c2 c3


Solution: Note that X2 = X1², so that you are actually doing a nonlinear (quadratic) regression. Capital letters are used instead of small letters throughout the following.

a) Row Y X1 X2 X1² X2² Y² X1Y X2Y X1X2

1 1.0 0 0 0 0 1.00 0.0 0.0 0
2 2.7 1 1 1 1 7.29 2.7 2.7 1

3 3.8 2 4 4 16 14.44 7.6 15.2 8

4 4.5 3 9 9 81 20.25 13.5 40.5 27

5 5.0 4 16 16 256 25.00 20.0 80.0 64

6 5.3 5 25 25 625 28.09 26.5 132.5 125

7 5.2 6 36 36 1296 27.04 31.2 187.2 216

Sum 27.5 21 91 91 2275 123.11 101.5 458.1 441

To repeat, ΣY = 27.5, ΣX1 = 21, ΣX2 = 91, ΣX1² = 91, ΣX2² = 2275, ΣY² = 123.11, ΣX1Y = 101.5, ΣX2Y = 458.1, ΣX1X2 = 441 and n = 7.

Means: Ybar = ΣY/n = 27.5/7 = 3.9286, X1bar = ΣX1/n = 21/7 = 3 and X2bar = ΣX2/n = 91/7 = 13.

Spare Parts:

SSx1 = ΣX1² - n X1bar² = 91 - 7(3)² = 28
SSx2 = ΣX2² - n X2bar² = 2275 - 7(13)² = 1092
Sx1x2 = ΣX1X2 - n X1bar X2bar = 441 - 7(3)(13) = 168
Sx1y = ΣX1Y - n X1bar Ybar = 101.5 - 7(3)(3.9286) = 19.0
Sx2y = ΣX2Y - n X2bar Ybar = 458.1 - 7(13)(3.9286) = 100.6
SSy = ΣY² - n Ybar² = 123.11 - 7(3.9286)² = 15.0743

Note that k = 2. (k is the number of independent variables.) SST = SSy = 15.0743 is used later.
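A few lines of Python can verify the sums and 'spare parts' (a sketch; names mirror the hand computation):

```python
# Verify the column sums and the deviation 'spare parts' for Problem J1.
y  = [1.0, 2.7, 3.8, 4.5, 5.0, 5.3, 5.2]
x1 = [0, 1, 2, 3, 4, 5, 6]
x2 = [v * v for v in x1]          # X2 is X1 squared
n  = len(y)

ybar, x1bar, x2bar = sum(y) / n, sum(x1) / n, sum(x2) / n
ssx1  = sum(v * v for v in x1) - n * x1bar ** 2                  # 28
ssx2  = sum(v * v for v in x2) - n * x2bar ** 2                  # 1092
sx1x2 = sum(a * b for a, b in zip(x1, x2)) - n * x1bar * x2bar   # 168
sx1y  = sum(a * b for a, b in zip(x1, y)) - n * x1bar * ybar     # 19.0
sx2y  = sum(a * b for a, b in zip(x2, y)) - n * x2bar * ybar     # 100.6
ssy   = sum(v * v for v in y) - n * ybar ** 2                    # 15.0743

print(ssx1, ssx2, sx1x2, sx1y, sx2y, round(ssy, 4))
```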

The Normal Equations: in the spare-parts notation these are

b0 = Ybar - b1 X1bar - b2 X2bar. (Eqn. 1)

Sx1y = b1 SSx1 + b2 Sx1x2 (Eqn. 2)

Sx2y = b1 Sx1x2 + b2 SSx2 (Eqn. 3)

If we fill in the above spare parts, we get:

b0 = 3.9286 - 3 b1 - 13 b2 (Eqn. 1)

19 = 28 b1 + 168 b2 (Eqn. 2)

100.6 = 168 b1 + 1092 b2 (Eqn. 3)

We solve equations 2 and 3 alone, by multiplying one of them so that the coefficients of b1 or b2 are of equal value. We then add or subtract the two equations to eliminate one of the variables. We have a choice at this point. Note that 1092 divided by 168 is 6.5, so we could multiply equation 2 by 6.5 to get

123.5 = 182 b1 + 1092 b2. If we subtract one of these equations from the other, we will get an equation in b1 alone, which we could solve for b1. The alternative is to note that 168 divided by 28 is 6, so that we could multiply equation 2 by 6 to get 114 = 168 b1 + 1008 b2.


This is the pair that I chose, if only because 6 looked easier to use than 6.5. If we subtract equation 3 from equation 2, we get 114 - 100.6 = (168 - 168) b1 + (1008 - 1092) b2, or 13.4 = -84 b2. But if 13.4 = -84 b2, then b2 = -13.4/84 = -0.1595.

Now, solve either Equation 2 or 3 for b1. If we pick Equation 2 in its original form, we can write it as 28 b1 = 19 - 168 b2. If we substitute our value of b2 = -0.1595, it becomes 28 b1 = 19 - 168(-0.1595) = 19 + 26.8 = 45.8, so that b1 = 45.8/28 = 1.6357.
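The elimination above can be checked by solving the same two normal equations with numpy (a sketch, assuming numpy is available):

```python
import numpy as np

# Solve the two deviation-form normal equations for Problem J1:
#   28*b1 + 168*b2 = 19        (Eqn. 2)
#   168*b1 + 1092*b2 = 100.6   (Eqn. 3)
A = np.array([[28.0, 168.0], [168.0, 1092.0]])
rhs = np.array([19.0, 100.6])
b1, b2 = np.linalg.solve(A, rhs)

# Eqn. 1 then gives the intercept from the means.
b0 = 27.5 / 7 - b1 * 3 - b2 * 13

print(round(b1, 4), round(b2, 4), round(b0, 4))  # 1.6357 -0.1595 1.0952
```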