252y0541 5/7/05

Introduction

What is a significant difference or a statistical test? Many of you seem to have no idea what a statistical test is. We have been doing them every day. For example, look at question 5b) in Part II.

b. Of the campaigns that took 0–2 months, 7 were ineffective. Of the campaigns that took more than two months, 8 were ineffective. Is the fraction that were ineffective in the first category below the fraction in the second category? (5)

Any answer to this question should show that you are aware of the warning in the beginning of Part II.

Show your work! State $H_0$ and $H_1$ where applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing appropriate statistical tests – that is, explain your hypotheses and what values from what table were used to test them.

In other words, if all you do here is compute some proportions and tell me that they are different, you are wasting both our time. A statistical test is required. If you don’t know what I mean by a (significant) difference between two fractions or proportions, either 1) quit right now or 2) review the course material, including the answer to 5b), until you do.
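To make "a statistical test is required" concrete, here is a minimal Python sketch of the usual left-tailed test of $H_0: p_1 \ge p_2$ against $H_1: p_1 < p_2$ using a pooled proportion. The question as quoted above does not give the number of campaigns in each category, so the sample sizes 20 and 25 below are hypothetical placeholders; only the counts 7 and 8 come from 5b).

```python
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and left-tail p-value for H0: p1 >= p2 vs H1: p1 < p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)                      # pooled proportion under H0
    se = (p_bar * (1 - p_bar) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / se
    p_value = NormalDist().cdf(z)                      # P(Z <= z) for a left-tailed test
    return z, p_value


# Counts 7 and 8 are from question 5b); n1 = 20 and n2 = 25 are HYPOTHETICAL.
z, p_value = two_proportion_z(7, 20, 8, 25)
z_crit = -NormalDist().inv_cdf(0.95)                   # about -1.645 at the 5% level
reject = z < z_crit
```

With these made-up sample sizes the sample proportions are .35 and .32, so the first fraction is not even below the second and $H_0$ is not rejected; the point is that the conclusion comes from comparing z with a table value, not from eyeballing two proportions.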

I haven’t looked at this exam since it was given and I may have missed some corrections.

ECO252 QBA2

Final EXAM

May 4, 2005

Name and Class hour:____KEY______

I. (25+ points) Do all the following. Note that answers without reasons receive no credit. Most answers require a statistical test, that is, stating or implying a hypothesis and showing why it is true or false by citing a table value or a p-value. If you haven’t done it lately, take a fast look at ECO 252 - Things That You Should Never Do on a Statistics Exam (or Anywhere Else).

The next 12 pages contain computer output. This comes from a data set on the text CD-ROM called Auto2002. There are 121 observations. The dependent variable is MPG (miles per gallon). The columns in the data set are:

Name: The make and model

SUV: ‘Yes’ if it’s an SUV, ‘No’ if not.

Drive Type: All wheel, front wheel, rear wheel or four wheel.

Horsepower: An independent variable

Fuel Type: Premium or regular

MPG: The dependent variable

Length: In inches – an independent variable

Width: In inches – an independent variable

Weight: In pounds – an independent variable

Cargo Volume: Cubic feet – an independent variable

Turning Circle: Feet – an independent variable.

I added the following:

SUV_D: A dummy variable based on ‘SUV’, 1 for an SUV, otherwise zero.

Fuel_D: A dummy variable based on ‘Fuel Type’, 1 for Premium fuel, otherwise zero.

SUVwt: An interaction variable, the product of ‘SUV_D’ and ‘Weight’.

SUVtc: An interaction variable, the product of ‘SUV_D’ and ‘Turning Circle’.

HPsq: Horsepower squared.

AWD_D: A dummy variable based on ‘Drive Type’, 1 for all wheel drive, otherwise zero.

FWD_D: A dummy variable based on ‘Drive Type’, 1 for front wheel drive, otherwise zero.

RWD_D: A dummy variable based on ‘Drive Type’, 1 for rear wheel drive, otherwise zero.

SUV_L: An interaction variable, the product of ‘SUV_D’ and ‘Length’.

Questions are included with the regressions and thus cannot be in order of difficulty. It’s probably a good idea to look over the questions and explanations before you do anything.

————— 4/28/2005 6:18:32 PM ————————————————————

Welcome to Minitab, press F1 for help.

Results for: 252x0504-4.MTW

MTB > Stepwise 'MPG' 'Horsepower' 'Length' 'Width' 'Weight' 'Cargo Volume' &

CONT> 'Turning Circle' 'SUV_D' 'Fuel_D' 'SUVwt' 'HPsq' 'AWD_D' &

CONT> 'FWD_D' 'RWD_D' 'SUV_L';

SUBC> AEnter 0.15;

SUBC> ARemove 0.15;

SUBC> Best 0;

SUBC> Constant.

Because I had relatively little idea of what to do, I ran a stepwise regression. You probably have not seen one of these before, but they are relatively easy to read. Note that it dropped 2 observations with missing values, so the results will not be quite the same as those I got later.

The first numbered column represents the single independent variable that seems to have the most explanatory effect on MPG. The equation reads MPG = 38.31 - 0.00491 Weight. The fact that Weight entered first with a negative coefficient should surprise no one. At the bottom appear S, R-Sq, R-Sq(adj) and the Mallows C-p statistic mentioned in your text. The value of the t-ratio and its p-value appear below each coefficient.
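A stepwise procedure adds the candidate with the smallest p-value if that p-value is below alpha-to-enter, and drops an included variable whose p-value has risen above alpha-to-remove. The Python sketch below illustrates that idea with numpy and scipy on hypothetical synthetic data; it is a simplified version for illustration, not Minitab's exact algorithm, and the function names are made up.

```python
import numpy as np
from scipy import stats


def coef_p_values(X, y):
    """OLS of y on a constant plus the columns of X; two-sided t-test p-values, constant first."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    df = n - A.shape[1]
    mse = resid @ resid / df
    se = np.sqrt(mse * np.diag(np.linalg.inv(A.T @ A)))
    return 2 * stats.t.sf(np.abs(coef / se), df)


def stepwise(X, y, alpha_enter=0.15, alpha_remove=0.15, max_steps=50):
    """Return the selected column indices in the order they entered."""
    selected = []
    for _ in range(max_steps):
        # Removal check: drop the least significant included variable if p > alpha_remove.
        if selected:
            p = coef_p_values(X[:, selected], y)[1:]
            worst = int(np.argmax(p))
            if p[worst] > alpha_remove:
                selected.pop(worst)
                continue
        # Entry check: add the candidate with the smallest p-value below alpha_enter.
        best_j, best_p = None, alpha_enter
        for j in range(X.shape[1]):
            if j in selected:
                continue
            p_new = coef_p_values(X[:, selected + [j]], y)[-1]
            if p_new < best_p:
                best_j, best_p = j, p_new
        if best_j is None:
            break  # nothing to remove and nothing to enter: stop
        selected.append(best_j)
    return selected


# Hypothetical synthetic data (not Auto2002): y depends only on columns 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(size=200)
order = stepwise(X, y)
```

As in the printout, the strongest predictor enters at step 1 and each later step adds one more; with alpha-to-enter at 0.15, an irrelevant column can occasionally sneak in too.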

Stepwise Regression: MPG versus Horsepower, Length, ...

Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15

Response is MPG on 14 predictors, with N = 119

N(cases with missing observations) = 2 N(all cases) = 121

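| Step | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Constant | 38.31 | 36.75 | 41.59 | 50.06 | 50.15 | 59.00 |
| Weight | -0.00491 | -0.00436 | -0.00578 | -0.00495 | -0.00424 | -0.00339 |
| T-Value | -15.34 | -11.87 | -12.82 | -9.31 | -6.74 | -5.61 |
| P-Value | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| SUV_D | | -1.72 | -33.71 | -35.29 | -35.12 | -18.68 |
| T-Value | | -2.84 | -4.99 | -5.36 | -5.40 | -2.71 |
| P-Value | | 0.005 | 0.000 | 0.000 | 0.000 | 0.008 |
| SUV_L | | | 0.180 | 0.185 | 0.182 | 0.088 |
| T-Value | | | 4.75 | 5.04 | 5.01 | 2.26 |
| P-Value | | | 0.000 | 0.000 | 0.000 | 0.026 |
| Turning Circle | | | | -0.285 | -0.292 | -0.255 |
| T-Value | | | | -2.79 | -2.90 | -2.75 |
| P-Value | | | | 0.006 | 0.004 | 0.007 |
| Horsepower | | | | | -0.0124 | -0.1619 |
| T-Value | | | | | -2.01 | -5.04 |
| P-Value | | | | | 0.046 | 0.000 |
| HPsq | | | | | | 0.00040 |
| T-Value | | | | | | 4.73 |
| P-Value | | | | | | 0.000 |
| S | 2.50 | 2.43 | 2.23 | 2.17 | 2.14 | 1.96 |
| R-Sq | 66.78 | 68.94 | 74.04 | 75.70 | 76.55 | 80.45 |
| R-Sq(adj) | 66.50 | 68.40 | 73.36 | 74.85 | 75.51 | 79.41 |
| Mallows C-p | 71.5 | 61.4 | 34.8 | 27.4 | 24.7 | 4.8 |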

More? (Yes, No, Subcommand, or Help)

SUBC> y

I’m greedy, so while I was surprised that Minitab had found six explanatory (independent) variables that actually seemed to affect miles per gallon, I wanted more. For the first time ever (for me), Minitab found another variable.
Step 7

Constant 58.50

Weight -0.00342

T-Value -5.74

P-Value 0.000

SUV_D -19.0

T-Value -2.79

P-Value 0.006

SUV_L 0.090

T-Value 2.36

P-Value 0.020

Turning Circle -0.210

T-Value -2.24

P-Value 0.027

Horsepower -0.175

T-Value -5.43

P-Value 0.000

HPsq 0.00042

T-Value 5.03

P-Value 0.000

Fuel_D 0.92

T-Value 2.11

P-Value 0.037

S 1.93

R-Sq 81.21

R-Sq(adj) 80.02

Mallows C-p 2.5

More? (Yes, No, Subcommand, or Help)

SUBC> y

No variables entered or removed

More? (Yes, No, Subcommand, or Help)

SUBC> n

Because I was worried about collinearity, I had the computer produce a table of correlations between all the independent variables. The table is triangular since the correlation between, say, Length and Horsepower is the same as the correlation between Horsepower and Length. So, for example, the correlation between Horsepower and Length is .648, and the p-value of 0.000 below it tests the null hypothesis that the population correlation is zero, which is clearly rejected here. The explanation of predicted R² that appears below the correlation table was a new one on me, but it could help you in comparing the regressions.
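The p-values in the correlation table come from a t test of $H_0: \rho = 0$, with $t = r\sqrt{n-2}/\sqrt{1-r^2}$ and $n-2$ degrees of freedom. A short Python sketch, assuming all 121 cases are present for the pairs used (the printout does not say how missing values were handled):

```python
import math
from scipy import stats


def corr_t_test(r, n):
    """t statistic and two-sided p-value for H0: rho = 0, with n - 2 degrees of freedom."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    p = 2 * stats.t.sf(abs(t), df)
    return t, p


# Assumes n = 121 for both pairs.
t_hp_len, p_hp_len = corr_t_test(0.648, 121)   # Horsepower vs Length
t_suv_hp, p_suv_hp = corr_t_test(0.160, 121)   # Horsepower vs SUV_D
```

For Horsepower and SUV_D ($r = .160$) this gives a p-value of about .08, in line with the 0.080 printed, so that correlation is not significant at the 5% level even though it is not zero.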

MTB > Correlation 'Horsepower' 'Length' 'Width' 'Weight' 'Cargo Volume' &

CONT> 'Turning Circle' 'SUV_D' 'Fuel_D' 'SUVwt' 'SUVtc' 'HPsq' 'AWD_D' &

CONT> 'FWD_D' 'RWD_D' 'SUV_L'.

Correlations: Horsepower, Length, Width, Weight, Cargo Volume, ...

Horsepower Length Width Weight

Length 0.648

0.000

Width 0.660 0.825

0.000 0.000

Weight 0.673 0.634 0.780

0.000 0.000 0.000

Cargo Volume 0.296 0.395 0.546 0.716

0.001 0.000 0.000 0.000

Turning Circ 0.497 0.750 0.658 0.650

0.000 0.000 0.000 0.000

SUV_D 0.160 -0.102 0.180 0.535

0.080 0.265 0.049 0.000

Fuel_D 0.321 -0.013 -0.042 0.057

0.000 0.886 0.645 0.540

SUVwt 0.182 -0.077 0.206 0.562

0.045 0.403 0.023 0.000

SUVtc 0.185 -0.062 0.211 0.577

0.042 0.502 0.020 0.000

HPsq 0.989 0.632 0.645 0.668

0.000 0.000 0.000 0.000

AWD_D 0.059 -0.118 -0.037 0.065

0.523 0.199 0.691 0.483

FWD_D -0.370 -0.001 -0.163 -0.453

0.000 0.994 0.076 0.000

RWD_D 0.334 0.070 0.151 0.351

0.000 0.445 0.101 0.000

SUV_L 0.197 -0.053 0.219 0.582

0.030 0.564 0.016 0.000

Cargo Volume Turning Circ SUV_D Fuel_D

Turning Circ 0.486

0.000

SUV_D 0.459 0.139

0.000 0.127

Fuel_D -0.245 -0.069 -0.147

0.007 0.456 0.110

SUVwt 0.473 0.161 0.999 -0.141

0.000 0.078 0.000 0.125

SUVtc 0.484 0.196 0.996 -0.142

0.000 0.031 0.000 0.121

HPsq 0.289 0.480 0.173 0.296

0.001 0.000 0.058 0.001

AWD_D 0.021 -0.068 0.185 0.218

0.823 0.461 0.043 0.017

FWD_D -0.165 -0.027 -0.517 -0.280

0.071 0.771 0.000 0.002

RWD_D 0.108 0.015 0.364 0.098

0.239 0.874 0.000 0.288

SUV_L 0.487 0.181 0.996 -0.145

0.000 0.047 0.000 0.114

SUVwt SUVtc HPsq AWD_D

SUVtc 0.998

0.000

HPsq 0.198 0.200

0.030 0.028

AWD_D 0.184 0.174 0.040

0.044 0.057 0.667

FWD_D -0.522 -0.526 -0.369 -0.366

0.000 0.000 0.000 0.000

RWD_D 0.367 0.374 0.347 -0.137

0.000 0.000 0.000 0.135

SUV_L 0.999 0.998 0.215 0.176

0.000 0.000 0.018 0.054

FWD_D RWD_D

RWD_D -0.810

0.000

SUV_L -0.529 0.381

0.000 0.000

Cell Contents: Pearson correlation

P-Value

PRESS: Assesses your model's predictive ability. In general, the smaller the prediction sum of squares (PRESS) value, the better the model's predictive ability. PRESS is used to calculate the predicted R². PRESS, similar to the error sum of squares (SSE), is the sum of squares of the prediction errors. PRESS differs from SSE in that each fitted value $\hat{y}_{(i)}$ for PRESS is obtained by deleting the ith observation from the data set, estimating the regression equation from the remaining n - 1 observations, and then using the fitted regression function to obtain the predicted value for the ith observation.

Predicted R²: Similar to R². Predicted R² indicates how well the model predicts responses for new observations, whereas R² indicates how well the model fits your data. Predicted R² can prevent overfitting the model and is more useful than adjusted R² for comparing models because it is calculated with observations not included in model calculation. Predicted R² is between 0 and 1 and is calculated from the PRESS statistic. Larger values of predicted R² suggest models of greater predictive ability.
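PRESS does not actually require n separate regressions: for least squares, the deleted residual equals $e_i/(1-h_{ii})$, where $h_{ii}$ is the leverage of observation i. The Python sketch below computes PRESS both ways on hypothetical data, then computes predicted R² as $1 - \text{PRESS}/SST$ from the Regression 1 figures that appear later (PRESS = 752.906, Total SS = 2627.29).

```python
import numpy as np


def press_hat(X, y):
    """PRESS from leverages: sum of (e_i / (1 - h_ii))^2, with no refitting."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    H = A @ np.linalg.inv(A.T @ A) @ A.T   # hat matrix
    e = y - H @ y                          # ordinary residuals
    h = np.diag(H)                         # leverages h_ii
    return float(np.sum((e / (1 - h)) ** 2))


def press_loo(X, y):
    """PRESS by brute force: refit without observation i, then predict it."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        total += float((y[i] - A[i] @ coef) ** 2)
    return total


def predicted_r2(press, sst):
    return 1 - press / sst


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 2))                       # hypothetical data
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=40)
p_hat, p_loo = press_hat(X, y), press_loo(X, y)
r2_pred_reg1 = predicted_r2(752.906, 2627.29)      # Regression 1: R-Sq(pred) = 71.34%
```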

So now it’s time to get serious. My first regression was based on what I had learned from the stepwise regression. The only one of the variables that I left out from the stepwise regression was Fuel_D.

  1. Look at the results of Regression 1. But don’t forget what has gone before.
  2. What does the Analysis of variance show us? Why? (1)
  3. What suggests that the relation of MPG to one of the variables is nonlinear? (1)
  4. What does the equation suggest is the difference between the effect of an extra inch of length on an SUV and on a non-SUV? (1)
  5. Why did I leave out Fuel_D? (2)
  6. Which coefficients are not significant? Why? (2)
  7. What do the values of the VIFs tell us? (2)
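For question 7, recall that $VIF_j = 1/(1-R_j^2)$, where $R_j^2$ comes from regressing the jth independent variable on all the others. Below is a Python sketch with numpy on hypothetical data, plus the inversion $R_j^2 = 1 - 1/VIF_j$: the VIF of 282.1 printed for SUV_D in Regression 1 corresponds to $R_j^2 \approx .996$, in line with the .996 correlation between SUV_D and SUV_L in the correlation table.

```python
import numpy as np


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j of X on the other columns plus a constant."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1 - resid @ resid / np.sum((target - target.mean()) ** 2)
        out[j] = 1 / (1 - r2)
    return out


# Hypothetical data: x2 is nearly a copy of x1, x3 is unrelated.
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + 0.1 * rng.normal(size=100)
x3 = rng.normal(size=100)
v = vif(np.column_stack([x1, x2, x3]))

# Going the other way: the R_j^2 implied by a printed VIF.
r2_suv_d = 1 - 1 / 282.1   # SUV_D in Regression 1
```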

MTB > Regress 'MPG' 6 'Weight' 'SUV_D' 'SUV_L' 'Turning Circle' &

CONT> 'Horsepower' 'HPsq';

SUBC> Constant;

SUBC> Brief 2.

MTB > Regress 'MPG' 6 'Weight' 'SUV_D' 'SUV_L' 'Turning Circle' &

CONT> 'Horsepower' 'HPsq';

SUBC> GNormalplot;

SUBC> NoDGraphs;

SUBC> RType 1;

SUBC> Constant;

SUBC> VIF;

SUBC> Press;

SUBC> Brief 2.

Regression Analysis: MPG versus Weight, SUV_D, ... (Regression 1)

The regression equation is

MPG = 63.1 - 0.00303 Weight - 14.8 SUV_D + 0.0653 SUV_L - 0.264 Turning Circle

- 0.213 Horsepower + 0.000522 HPsq

Predictor Coef SE Coef T P VIF

Constant 63.105 3.978 15.86 0.000

Weight -0.0030345 0.0006859 -4.42 0.000 5.6

SUV_D -14.812 7.957 -1.86 0.065 282.1

SUV_L 0.06527 0.04478 1.46 0.148 307.9

Turning Circle -0.2639 0.1050 -2.51 0.013 2.0

Horsepower -0.21251 0.03575 -5.94 0.000 63.5

HPsq 0.00052249 0.00009459 5.52 0.000 61.3

S = 2.27485 R-Sq = 77.5% R-Sq(adj) = 76.4%

PRESS = 752.906 R-Sq(pred) = 71.34%

Analysis of Variance

Source DF SS MS F P

Regression 6 2037.34 339.56 65.62 0.000

Residual Error 114 589.95 5.17

Total 120 2627.29

Source DF Seq SS

Weight 1 1605.19

SUV_D 1 47.29

SUV_L 1 132.83

Turning Circle 1 52.31

Horsepower 1 41.83

HPsq 1 157.89

Unusual Observations

Obs Weight MPG Fit SE Fit Residual St Resid

16 5590 13.000 15.361 1.137 -2.361 -1.20 X

34 7270 10.000 6.856 1.461 3.144 1.80 X

40 5590 13.000 15.361 1.137 -2.361 -1.20 X

62 4065 19.000 14.633 0.654 4.367 2.00R

108 2150 38.000 30.489 0.632 7.511 3.44R

111 2750 41.000 33.473 1.133 7.527 3.82RX

114 2935 41.000 29.806 0.777 11.194 5.24R

115 2940 24.000 29.791 0.778 -5.791 -2.71R

R denotes an observation with a large standardized residual.

X denotes an observation whose X value gives it large influence.

  1. Look at the results of Regression 2. But don’t forget what has gone before.
  2. What variable did I drop? Why? (2)
  3. Are there any coefficients that have a sign that you would not expect? Why? (1)
  4. A Chevrolet Suburban is an SUV with rear wheel drive and 285 horsepower, that takes Regular fuel, has a length of 219 inches, a width of 79 inches, a weight of 5590 pounds, a cargo volume of 77.0 cubic feet and a turning circle of 46 feet (!!! Maybe it was inches?). What miles per gallon does the equation predict? What would it be if the vehicle were not classified as an SUV? (3)
  5. Why do I like this regression better than the previous one? (2) [17]
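The prediction in question 4 is arithmetic with the coefficient table of Regression 2 below. A Python sketch using the unrounded coefficients from that table (the function name is illustrative). Only Weight, SUV_D, Turning Circle and Horsepower (with HPsq = Horsepower²) enter this equation, so the Suburban's length, width, drive type, fuel and cargo volume are not used.

```python
# Coefficients copied from the Regression 2 coefficient table.
COEF = {
    "Constant": 63.137,
    "Weight": -0.0025020,
    "SUV_D": -3.2492,
    "Turning Circle": -0.2501,
    "Horsepower": -0.23928,
    "HPsq": 0.00059313,
}


def predict_mpg_reg2(weight, suv, turning_circle, horsepower):
    """Predicted MPG from Regression 2; HPsq is computed as horsepower squared."""
    return (COEF["Constant"]
            + COEF["Weight"] * weight
            + COEF["SUV_D"] * suv
            + COEF["Turning Circle"] * turning_circle
            + COEF["Horsepower"] * horsepower
            + COEF["HPsq"] * horsepower ** 2)


suburban = predict_mpg_reg2(5590, 1, 46, 285)   # classified as an SUV
as_car = predict_mpg_reg2(5590, 0, 46, 285)     # same vehicle, SUV_D = 0
```

This gives roughly 14.4 MPG as an SUV; setting SUV_D to 0 raises the prediction by exactly the 3.2492 magnitude of the SUV_D coefficient.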

MTB > Regress 'MPG' 5 'Weight' 'SUV_D' 'Turning Circle' 'Horsepower' &

CONT> 'HPsq';

SUBC> GNormalplot;

SUBC> NoDGraphs;

SUBC> RType 1;

SUBC> Constant;

SUBC> VIF;

SUBC> Press;

SUBC> Brief 2.

Regression Analysis: MPG versus Weight, SUV_D, ... (Regression 2)

The regression equation is

MPG = 63.1 - 0.00250 Weight - 3.25 SUV_D - 0.250 Turning Circle

- 0.239 Horsepower + 0.000593 HPsq

Predictor Coef SE Coef T P VIF

Constant 63.137 3.998 15.79 0.000

Weight -0.0025020 0.0005834 -4.29 0.000 4.0

SUV_D -3.2492 0.6272 -5.18 0.000 1.7

Turning Circle -0.2501 0.1051 -2.38 0.019 1.9

Horsepower -0.23928 0.03082 -7.76 0.000 46.7

HPsq 0.00059313 0.00008163 7.27 0.000 45.2

S = 2.28595 R-Sq = 77.1% R-Sq(adj) = 76.1%

PRESS = 744.047 R-Sq(pred) = 71.68%

Analysis of Variance

Source DF SS MS F P

Regression 5 2026.35 405.27 77.56 0.000

Residual Error 115 600.94 5.23

Total 120 2627.29

Source DF Seq SS

Weight 1 1605.19

SUV_D 1 47.29

Turning Circle 1 46.32

Horsepower 1 51.65

HPsq 1 275.90

Unusual Observations

Obs Weight MPG Fit SE Fit Residual St Resid

16 5590 13.000 14.381 0.921 -1.381 -0.66 X

34 7270 10.000 5.945 1.328 4.055 2.18RX

40 5590 13.000 14.381 0.921 -1.381 -0.66 X

108 2150 38.000 30.081 0.570 7.919 3.58R

111 2750 41.000 33.910 1.098 7.090 3.54RX

114 2935 41.000 30.060 0.761 10.940 5.08R

115 2940 24.000 30.047 0.762 -6.047 -2.81R

R denotes an observation with a large standardized residual.

X denotes an observation whose X value gives it large influence.

Because I wanted to look at the effect of the three drive variables on MPG, I ran another stepwise regression. The first part of this is identical to the last stepwise regression, but after step 6, I forced out SUV_L and forced in AWD_D, FWD_D and RWD_D. Because I had to make the regressions comparable, I threw out one observation with an anomalous drive variable and redid my two regressions as Regressions 3 and 4. I then added in all the drive variables as a package in Regression 5.

MTB > Stepwise 'MPG' 'Horsepower' 'Length' 'Width' 'Weight' 'Cargo Volume' &

CONT> 'Turning Circle' 'SUV_D' 'Fuel_D' 'SUVwt' 'HPsq' 'AWD_D' &

CONT> 'FWD_D' 'RWD_D' 'SUV_L';

SUBC> AEnter 0.15;

SUBC> ARemove 0.15;

SUBC> Best 0;

SUBC> Constant.

Stepwise Regression: MPG versus Horsepower, Length, ...

Alpha-to-Enter: 0.15 Alpha-to-Remove: 0.15

Response is MPG on 14 predictors, with N = 119

N(cases with missing observations) = 2 N(all cases) = 121

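| Step | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Constant | 38.31 | 36.75 | 41.59 | 50.06 | 50.15 | 59.00 |
| Weight | -0.00491 | -0.00436 | -0.00578 | -0.00495 | -0.00424 | -0.00339 |
| T-Value | -15.34 | -11.87 | -12.82 | -9.31 | -6.74 | -5.61 |
| P-Value | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| SUV_D | | -1.72 | -33.71 | -35.29 | -35.12 | -18.68 |
| T-Value | | -2.84 | -4.99 | -5.36 | -5.40 | -2.71 |
| P-Value | | 0.005 | 0.000 | 0.000 | 0.000 | 0.008 |
| SUV_L | | | 0.180 | 0.185 | 0.182 | 0.088 |
| T-Value | | | 4.75 | 5.04 | 5.01 | 2.26 |
| P-Value | | | 0.000 | 0.000 | 0.000 | 0.026 |
| Turning Circle | | | | -0.285 | -0.292 | -0.255 |
| T-Value | | | | -2.79 | -2.90 | -2.75 |
| P-Value | | | | 0.006 | 0.004 | 0.007 |
| Horsepower | | | | | -0.0124 | -0.1619 |
| T-Value | | | | | -2.01 | -5.04 |
| P-Value | | | | | 0.046 | 0.000 |
| HPsq | | | | | | 0.00040 |
| T-Value | | | | | | 4.73 |
| P-Value | | | | | | 0.000 |
| S | 2.50 | 2.43 | 2.23 | 2.17 | 2.14 | 1.96 |
| R-Sq | 66.78 | 68.94 | 74.04 | 75.70 | 76.55 | 80.45 |
| R-Sq(adj) | 66.50 | 68.40 | 73.36 | 74.85 | 75.51 | 79.41 |
| Mallows C-p | 71.5 | 61.4 | 34.8 | 27.4 | 24.7 | 4.8 |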

More? (Yes, No, Subcommand, or Help)

SUBC> remove 'SUV_L'.

| Step | 7 | 8 | 9 |
|---|---|---|---|
| Constant | 59.15 | 59.00 | 58.50 |
| Weight | -0.00267 | -0.00339 | -0.00342 |
| T-Value | -5.10 | -5.61 | -5.74 |
| P-Value | 0.000 | 0.000 | 0.000 |
| SUV_D | -3.13 | -18.68 | -18.95 |
| T-Value | -5.51 | -2.71 | -2.79 |
| P-Value | 0.000 | 0.008 | 0.006 |
| SUV_L | | 0.088 | 0.090 |
| T-Value | | 2.26 | 2.36 |
| P-Value | | 0.026 | 0.020 |
| Turning Circle | -0.236 | -0.255 | -0.210 |
| T-Value | -2.51 | -2.75 | -2.24 |
| P-Value | 0.013 | 0.007 | 0.027 |
| Horsepower | -0.199 | -0.162 | -0.175 |
| T-Value | -7.09 | -5.04 | -5.43 |
| P-Value | 0.000 | 0.000 | 0.000 |
| HPsq | 0.00050 | 0.00040 | 0.00042 |
| T-Value | 6.75 | 4.73 | 5.03 |
| P-Value | 0.000 | 0.000 | 0.000 |
| Fuel_D | | | 0.92 |
| T-Value | | | 2.11 |
| P-Value | | | 0.037 |
| S | 2.00 | 1.96 | 1.93 |
| R-Sq | 79.56 | 80.45 | 81.21 |
| R-Sq(adj) | 78.66 | 79.41 | 80.02 |
| Mallows C-p | 7.8 | 4.8 | 2.5 |

More? (Yes, No, Subcommand, or Help)

SUBC> enter 'AWD_D' 'FWD_D' 'RWD_D'.

Step 10 11 12 13

Constant 60.14 59.11 58.50 58.50

Weight -0.00355 -0.00346 -0.00344 -0.00342

T-Value -5.75 -5.72 -5.72 -5.74

P-Value 0.000 0.000 0.000 0.000

SUV_D -19.5 -19.1 -18.8 -19.0

T-Value -2.82 -2.77 -2.74 -2.79

P-Value 0.006 0.007 0.007 0.006

SUV_L 0.092 0.090 0.089 0.090

T-Value 2.37 2.32 2.30 2.36

P-Value 0.020 0.022 0.023 0.020

Turning Circle -0.207 -0.205 -0.202 -0.210

T-Value -2.10 -2.09 -2.07 -2.24

P-Value 0.038 0.039 0.041 0.027

Horsepower -0.175 -0.177 -0.176 -0.175

T-Value -5.33 -5.42 -5.41 -5.43

P-Value 0.000 0.000 0.000 0.000

HPsq 0.00042 0.00043 0.00042 0.00042

T-Value 4.98 5.04 5.02 5.03

P-Value 0.000 0.000 0.000 0.000

Fuel_D 0.73 0.80 0.87 0.92

T-Value 1.49 1.66 1.92 2.11

P-Value 0.139 0.099 0.057 0.037

AWD_D -1.1

T-Value -0.76

P-Value 0.451

FWD_D -1.36 -0.51 -0.17

T-Value -0.98 -0.62 -0.32

P-Value 0.331 0.535 0.752

RWD_D -1.23 -0.42

T-Value -0.93 -0.55

P-Value 0.353 0.586

S 1.95 1.95 1.94 1.93

R-Sq 81.37 81.27 81.22 81.21

R-Sq(adj) 79.65 79.73 79.86 80.02

Mallows C-p 7.6 6.1 4.4 2.5

More? (Yes, No, Subcommand, or Help)

SUBC> no

Results for: 252x0504-41.MTW

MTB > WSave "C:\Documents and Settings\rbove\My Documents\Minitab\252x0504-41.MTW";

SUBC> Replace.

Saving file as: 'C:\Documents and Settings\rbove\My Documents\Minitab\252x0504-41.MTW'

MTB > erase c21

MTB > Regress 'MPG' 6 'Weight' 'SUV_D' 'SUV_L' 'Turning Circle' &

CONT> 'Horsepower' 'HPsq' ;

SUBC> GNormalplot;

SUBC> NoDGraphs;

SUBC> RType 1;

SUBC> Constant;

SUBC> VIF;

SUBC> Press;

SUBC> Brief 2.

Regression Analysis: MPG versus Weight, SUV_D, ... (Regression 3)

The regression equation is

MPG = 64.4 - 0.00284 Weight - 15.8 SUV_D + 0.0694 SUV_L - 0.305 Turning Circle

- 0.214 Horsepower + 0.000524 HPsq

Predictor Coef SE Coef T P VIF

Constant 64.364 3.973 16.20 0.000

Weight -0.0028431 0.0006832 -4.16 0.000 5.7

SUV_D -15.843 7.867 -2.01 0.046 276.4

SUV_L 0.06943 0.04423 1.57 0.119 301.7

Turning Circle -0.3045 0.1055 -2.89 0.005 2.0

Horsepower -0.21444 0.03528 -6.08 0.000 63.1

HPsq 0.00052386 0.00009332 5.61 0.000 61.0

S = 2.24427 R-Sq = 78.3% R-Sq(adj) = 77.2%

PRESS = 725.963 R-Sq(pred) = 72.34%

Analysis of Variance

Source DF SS MS F P

Regression 6 2055.21 342.54 68.01 0.000

Residual Error 113 569.15 5.04

Total 119 2624.37

Source DF Seq SS

Weight 1 1602.61

SUV_D 1 49.58

SUV_L 1 135.39

Turning Circle 1 61.04

Horsepower 1 47.88

HPsq 1 158.71

Unusual Observations

Obs Weight MPG Fit SE Fit Residual St Resid

16 5590 13.000 15.259 1.123 -2.259 -1.16 X

34 7270 10.000 6.907 1.442 3.093 1.80 X

36 2715 24.000 28.432 0.493 -4.432 -2.02R

40 5590 13.000 15.259 1.123 -2.259 -1.16 X

107 2150 38.000 30.543 0.624 7.457 3.46R

110 2750 41.000 33.747 1.126 7.253 3.74RX

113 2935 41.000 30.000 0.772 11.000 5.22R

114 2940 24.000 29.985 0.774 -5.985 -2.84R

R denotes an observation with a large standardized residual.

X denotes an observation whose X value gives it large influence.

MTB > Regress 'MPG' 5 'Weight' 'SUV_D' 'Turning Circle' 'Horsepower' &

CONT> 'HPsq' ;

SUBC> GNormalplot;

SUBC> NoDGraphs;

SUBC> RType 1;

SUBC> Constant;

SUBC> VIF;

SUBC> Press;

SUBC> Brief 2.

Regression Analysis: MPG versus Weight, SUV_D, ... (Regression 4)

The regression equation is

MPG = 64.4 - 0.00228 Weight - 3.53 SUV_D - 0.288 Turning Circle

- 0.243 Horsepower + 0.000599 HPsq

Predictor Coef SE Coef T P VIF

Constant 64.352 3.999 16.09 0.000

Weight -0.0022848 0.0005871 -3.89 0.000 4.2

SUV_D -3.5330 0.6366 -5.55 0.000 1.8

Turning Circle -0.2884 0.1057 -2.73 0.007 2.0

Horsepower -0.24278 0.03051 -7.96 0.000 46.6

HPsq 0.00059879 0.00008071 7.42 0.000 45.0

S = 2.25865 R-Sq = 77.8% R-Sq(adj) = 76.9%

PRESS = 720.507 R-Sq(pred) = 72.55%

Analysis of Variance

Source DF SS MS F P

Regression 5 2042.80 408.56 80.09 0.000

Residual Error 114 581.57 5.10

Total 119 2624.37

Source DF Seq SS

Weight 1 1602.61

SUV_D 1 49.58

Turning Circle 1 52.45

Horsepower 1 57.33

HPsq 1 280.82

Unusual Observations

Obs Weight MPG Fit SE Fit Residual St Resid

16 5590 13.000 14.223 0.914 -1.223 -0.59 X

34 7270 10.000 5.938 1.312 4.062 2.21RX

40 5590 13.000 14.223 0.914 -1.223 -0.59 X

107 2150 38.000 30.108 0.563 7.892 3.61R

110 2750 41.000 34.201 1.095 6.799 3.44RX

113 2935 41.000 30.262 0.759 10.738 5.05R

114 2940 24.000 30.251 0.760 -6.251 -2.94R

R denotes an observation with a large standardized residual.

X denotes an observation whose X value gives it large influence.

  1. Look at the results of Regression 5 and Regression 4. But don’t forget what has gone before.
  2. Do an F test to see if Regression 5 is better than Regression 4. If you can, include the results from my forcing variables in the last stepwise regression. (6)
  3. Should I have included another dummy variable to represent 4-wheel drive? Why? (2)
  4. Are there any coefficients in Regression 5 that have a sign that you would not expect? Why? (1)
  5. A Chevrolet Suburban is an SUV with rear wheel drive and 285 horsepower, that takes Regular fuel, has a length of 219 inches, a width of 79 inches, a weight of 5590 pounds, a cargo volume of 77.0 cubic feet and a turning circle of 46 feet (!!! Maybe it was inches?). How do the predictions for MPG in Equations 2 and 4 differ in percentage terms? (3) [29]
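For question 2, the F test compares the error sums of squares of the two models: $F = \frac{(SSE_R - SSE_F)/q}{SSE_F/(n-k-1)}$, with $q = 3$ added dummies. A Python sketch with scipy, using the Residual Error lines of Regression 4 above and Regression 5 below. The numerator sum of squares, 12.54, also matches the sequential SS of the three dummies in Regression 5 (2.00 + 3.36 + 7.17 = 12.53) up to rounding.

```python
from scipy import stats

sse_r, df_r = 581.57, 114   # Regression 4: no drive dummies
sse_f, df_f = 569.03, 111   # Regression 5: AWD_D, FWD_D, RWD_D added
q = df_r - df_f             # number of added dummies

F = ((sse_r - sse_f) / q) / (sse_f / df_f)
F_crit = stats.f.ppf(0.95, q, df_f)   # 5% critical value of F(3, 111)
p_value = stats.f.sf(F, q, df_f)
```

F comes out near 0.82, well below the 5% critical value of roughly 2.7, so the three drive dummies as a package do not significantly improve the regression, which agrees with the stepwise run removing all three.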


MTB > Regress 'MPG' 8 'Weight' 'SUV_D' 'Turning Circle' 'Horsepower' &

CONT> 'HPsq' 'AWD_D' 'FWD_D' 'RWD_D';

SUBC> GNormalplot;

SUBC> NoDGraphs;

SUBC> RType 1;

SUBC> Constant;

SUBC> VIF;

SUBC> Press;

SUBC> Brief 2.

Regression Analysis: MPG versus Weight, SUV_D, ... (Regression 5)

The regression equation is

MPG = 66.4 - 0.00248 Weight - 3.83 SUV_D - 0.254 Turning Circle

- 0.251 Horsepower + 0.000618 HPsq - 1.21 AWD_D - 2.10 FWD_D - 1.70 RWD_D

Predictor Coef SE Coef T P VIF

Constant 66.435 4.400 15.10 0.000

Weight -0.0024795 0.0006077 -4.08 0.000 4.4

SUV_D -3.8302 0.6814 -5.62 0.000 2.0

Turning Circle -0.2541 0.1116 -2.28 0.025 2.2

Horsepower -0.25082 0.03122 -8.03 0.000 48.6

HPsq 0.00061833 0.00008244 7.50 0.000 46.7

AWD_D -1.213 1.620 -0.75 0.455 3.4

FWD_D -2.103 1.490 -1.41 0.161 11.2

RWD_D -1.697 1.434 -1.18 0.239 8.6

S = 2.26416 R-Sq = 78.3% R-Sq(adj) = 76.8%

PRESS = 727.840 R-Sq(pred) = 72.27%

Analysis of Variance

Source DF SS MS F P

Regression 8 2055.33 256.92 50.12 0.000

Residual Error 111 569.03 5.13

Total 119 2624.37

Source DF Seq SS

Weight 1 1602.61

SUV_D 1 49.58

Turning Circle 1 52.45

Horsepower 1 57.33

HPsq 1 280.82

AWD_D 1 2.00

FWD_D 1 3.36

RWD_D 1 7.17

Unusual Observations

Obs Weight MPG Fit SE Fit Residual St Resid

34 7270 10.000 5.609 1.377 4.391 2.44RX

57 4735 14.000 13.622 1.447 0.378 0.22 X

72 4720 15.000 15.901 1.374 -0.901 -0.50 X

107 2150 38.000 30.231 0.574 7.769 3.55R

109 5435 14.000 13.477 1.338 0.523 0.29 X

110 2750 41.000 34.346 1.106 6.654 3.37RX

113 2935 41.000 30.341 0.765 10.659 5.00R

114 2940 24.000 30.329 0.766 -6.329 -2.97R

R denotes an observation with a large standardized residual.

X denotes an observation whose X value gives it large influence.
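Question 3 above, about a dummy for 4-wheel drive, is the dummy variable trap. With a constant in the model, a fourth drive dummy makes the columns of X linearly dependent, because the four dummies always add up to the constant column, so least squares has no unique solution. With three dummies, 4-wheel drive is the base category that the AWD_D, FWD_D and RWD_D coefficients are measured against. The Python sketch below shows the rank deficiency with numpy on a few hypothetical vehicles.

```python
import numpy as np

# Hypothetical drive types for eight vehicles (not the Auto2002 data).
drive = np.array(["AWD", "FWD", "RWD", "4WD", "FWD", "RWD", "4WD", "AWD"])
const = np.ones(len(drive))
dummy = {d: (drive == d).astype(float) for d in ("AWD", "FWD", "RWD", "4WD")}

# Constant plus three dummies: 4WD is the omitted (base) category.
X_three = np.column_stack([const, dummy["AWD"], dummy["FWD"], dummy["RWD"]])
# Constant plus all four dummies: the dummies sum to the constant column.
X_four = np.column_stack([X_three, dummy["4WD"]])

rank_three = np.linalg.matrix_rank(X_three)   # equals the number of columns (4)
rank_four = np.linalg.matrix_rank(X_four)     # still 4, one less than the 5 columns
```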

II. Do at least 4 of the following 6 Problems (at least 15 each) (or do sections adding to at least 60 points; anything extra you do helps, and grades wrap around). Show your work! State $H_0$ and $H_1$ where applicable. Use a significance level of 5% unless noted otherwise. Do not answer questions without citing appropriate statistical tests – that is, explain your hypotheses and what values from what table were used to test them. Clearly label what section of each problem you are doing! The entire test has 175 points, but 100 is considered a perfect score.

Exhibit 1. A tear-off copy of this exhibit appears at the end of the exam.

An entrepreneur believes that her business is growing steadily and wants to compute a trend line for her output $Y$ against time $T$. She also decides to repeat the regression after adding $TSQ = T^2$ as a second independent variable. Her data and results follow. The $t$ statistics have been relabeled ‘t-ratio’ to prevent confusion with $T$.


Row Y T TSQ

1 53.43 1 1

2 59.09 2 4

3 59.58 3 9

4 64.75 4 16

5 68.65 5 25

6 65.53 6 36

7 68.44 7 49

8 70.93 8 64

9 72.85 9 81

10 73.60 10 100

11 72.93 11 121

12 75.14 12 144

13 73.88 13 169

14 76.55 14 196

15 79.05 15 225
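The Exhibit 1 regressions can be reproduced from the data above with ordinary least squares, and the Durbin-Watson statistic quoted in part e) is $DW = \sum_{t=2}^{n}(e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. A minimal numpy sketch (the helper names are illustrative):

```python
import numpy as np

# Output Y for T = 1, ..., 15, copied from Exhibit 1.
y = np.array([53.43, 59.09, 59.58, 64.75, 68.65, 65.53, 68.44, 70.93,
              72.85, 73.60, 72.93, 75.14, 73.88, 76.55, 79.05])
t = np.arange(1, 16, dtype=float)


def ols(A, y):
    """Least-squares coefficients and residuals."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef, y - A @ coef


def durbin_watson(e):
    """Sum of squared successive residual differences over the sum of squared residuals."""
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


ones = np.ones_like(t)
b_lin, e_lin = ols(np.column_stack([ones, t]), y)             # Y on T
b_quad, e_quad = ols(np.column_stack([ones, t, t ** 2]), y)   # Y on T and TSQ
dw_lin, dw_quad = durbin_watson(e_lin), durbin_watson(e_quad)
```

On these data the sketch should reproduce the printed coefficients and give DW values close to the 1.07 and 1.94 quoted in e).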

Regression Analysis: Y versus T

The regression equation is

Y = 56.7 + 1.54 T

Predictor Coef SE Coef t-ratio P

Constant 56.659 1.283 44.15 0.000

T 1.5377 0.1411 10.89 0.000

S = 2.36169 R-Sq = 90.1% R-Sq(adj) = 89.4%

Regression Analysis: Y versus T, TSQ

The regression equation is

Y = 52.4 + 3.04 T - 0.0939 TSQ

Predictor Coef SE Coef t-ratio P

Constant 52.401 1.545 33.91 0.000

T 3.0405 0.4444 6.84 0.000

TSQ -0.09392 0.02701 -3.48 0.005

S = 1.73483 R-Sq = 95.1%


If you need them, her means and spare parts are below.


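$\bar{Y} = 68.9600$

$\bar{T} = 8.00$

$\overline{TSQ} = 82.6667$

$\sum T^2 - n\bar{T}^2 = 280$

$\sum TSQ^2 - n\overline{TSQ}^2 = 75805.3$

$\sum Y^2 - n\bar{Y}^2 = 734.556$

$\sum TY - n\bar{T}\bar{Y} = 430.550$

$\sum (TSQ)Y - n\overline{TSQ}\,\bar{Y} = 6501.33$

$\sum T(TSQ) - n\bar{T}\,\overline{TSQ} = 4480.00$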
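The spare parts are enough to get both regressions without the raw data. Assuming the nine values listed are, in order, $\bar{Y}$, $\bar{T}$, $\overline{TSQ}$, $\sum T^2 - n\bar{T}^2$, $\sum TSQ^2 - n\overline{TSQ}^2$, $\sum Y^2 - n\bar{Y}^2$, $\sum TY - n\bar{T}\bar{Y}$, $\sum (TSQ)Y - n\overline{TSQ}\bar{Y}$ and $\sum T(TSQ) - n\bar{T}\,\overline{TSQ}$ (an interpretation that reproduces the printed coefficients), the simple and two-variable normal equations can be solved as in this Python sketch:

```python
import numpy as np

y_bar, t_bar, tsq_bar = 68.96, 8.0, 82.6667
S_tt, S_qq, S_yy = 280.0, 75805.3, 734.556      # corrected sums of squares
S_ty, S_qy, S_tq = 430.550, 6501.33, 4480.00    # corrected cross products

# Simple regression of Y on T.
b1 = S_ty / S_tt
b0 = y_bar - b1 * t_bar
r2_lin = b1 * S_ty / S_yy

# Y on T and TSQ: solve the normal equations in deviation form.
M = np.array([[S_tt, S_tq], [S_tq, S_qq]])
c1, c2 = np.linalg.solve(M, np.array([S_ty, S_qy]))
c0 = y_bar - c1 * t_bar - c2 * tsq_bar
r2_quad = (c1 * S_ty + c2 * S_qy) / S_yy
```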


252x0541 4/22/05

1. Do the following using Exhibit 1.

a) Explain what numbers in the printout were used to compute the t-ratio 6.84, what table value you would compare it with to do a 2-sided 1% significance test and whether and why the coefficient is significant. (3)

b) The entrepreneur looked at the residual analysis of the first regression and decided that she needs a time squared term. What is she likely to have seen to cause her to make that decision? (1)

c) Use the values of $R^2$ to do an F test to see if the addition of the $TSQ$ term makes a significant improvement in the regression. (4)

d) Get (compute) $R^2$ adjusted for degrees of freedom ($\bar{R}^2$) for the second regression and explain what it seems to show. (2)

e) In the first regression, the Durbin-Watson statistic was 1.07 and for the second it was 1.94. What do these numbers indicate? (Do a significance test.) (5)

f) For the second regression, make a prediction of the output in the 16th year and use the suggestion in the outline to make it into a prediction interval. Why would a confidence interval be inappropriate here? (3)

g) (Extra Credit) Find the partial correlation,. (2) [15]
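Parts a), c), d) and the point forecast in f) reduce to a few formulas, including $\bar{R}^2 = 1 - (1 - R^2)\frac{n-1}{n-k-1}$. A Python sketch with scipy, using the rounded R-Sq values from the printouts. With one added term the F statistic should equal the square of the TSQ t-ratio; $3.48^2 = 12.11$ differs from the F computed here only because R-Sq is rounded. The prediction-interval step in f) is left to the outline's suggestion.

```python
from scipy import stats

n, k = 15, 2                          # observations; predictors in the larger model
r2_reduced, r2_full = 0.901, 0.951    # R-Sq from the two printouts
df_err = n - k - 1                    # 12 error degrees of freedom

# c) F test for the one added term (TSQ)
F = ((r2_full - r2_reduced) / 1) / ((1 - r2_full) / df_err)
F_crit = stats.f.ppf(0.95, 1, df_err)

# d) R-squared adjusted for degrees of freedom
r2_adj = 1 - (1 - r2_full) * (n - 1) / df_err

# a) t-ratio for T and the two-sided 1% critical value
t_ratio = 3.0405 / 0.4444
t_crit = stats.t.ppf(0.995, df_err)

# f) point forecast for T = 16 from the second regression
y16 = 52.401 + 3.0405 * 16 - 0.09392 * 16 ** 2
```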

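a) $b_1 = 3.0405$ and $s_{b_1} = 0.4444$. To test $H_0: \beta_1 = 0$ against $H_1: \beta_1 \ne 0$, use $t = \frac{b_1 - 0}{s_{b_1}}$. So $t = \frac{3.0405}{0.4444} = 6.84$. Compare this with $t_{.005}^{12} = 3.055$, since there are $n - k - 1 = 15 - 2 - 1 = 12$ degrees of freedom. Since the computed t-ratio is larger than the table value, reject $H_0$ and conclude that the coefficient is significant.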