Solutions to Homework Assignment 5 (STA 4234, Fall 2013)
7.2. Minitab output
Regression Analysis: y versus x, x2
The regression equation is
y = 1.63 - 1.23 x + 1.49 x2
Predictor Coef SE Coef T P VIF
Constant 1.63300 0.00420 389.18 0.000
x -1.23218 0.00701 -175.78 0.000 19.906
x2 1.49455 0.00248 601.64 0.000 19.906
S = 0.00356753 R-Sq = 100.0% R-Sq(adj) = 100.0%
PRESS = 0.000220200 R-Sq(pred) = 100.00%
Analysis of Variance
Source DF SS MS F P
Regression 2 47.310 23.655 1858613.46 0.000
Residual Error 7 0.000 0.000
Total 9 47.310
Source DF Seq SS
x 1 42.703
x2 1 4.607
- The regression equation is y = 1.63 - 1.23 x + 1.49 x2.
- F=1858613.46 with p=0.000, which indicates that the regression is significant.
- For the quadratic term, t=601.64 with p=0.000, so x2 is significant, which justifies including the quadratic term in this model.
- Since the model is quadratic, extrapolating beyond the range of the data is potentially hazardous.
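As a quick arithmetic check (a sketch, not part of the Minitab run): the residual mean square prints as 0.000 only because of rounding, but it equals S^2, so the overall F statistic can be recovered from the summary line alone.

```python
# Recompute the overall F for problem 7.2 from the printed summary.
# MS_Res is shown as 0.000 due to rounding, but MS_Res = S^2,
# so F = MS_Reg / S^2.
S = 0.00356753      # residual standard error from the output
MS_Reg = 23.655     # regression mean square from the ANOVA table

F = MS_Reg / S**2
print(F)            # close to the reported F = 1858613.46
```

The small discrepancy from the reported 1858613.46 comes only from rounding of S and MS_Reg in the printout.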
7.4.
Regression Analysis: y versus x, x^2
The regression equation is
y = - 4.5 + 1.38 x + 1.47 x^2
Predictor Coef SE Coef T P VIF
Constant -4.46 14.63 -0.30 0.768
x 1.384 5.497 0.25 0.807 201.170
x^2 1.4670 0.4936 2.97 0.016 201.170
S = 1.65731 R-Sq = 99.6% R-Sq(adj) = 99.5%
PRESS = 35.3788 R-Sq(pred) = 99.39%
Analysis of Variance
Source DF SS MS F P
Regression 2 5740.6 2870.3 1044.99 0.000
Residual Error 9 24.7 2.7
Lack of Fit 5 24.3 4.9 48.70 0.001
Pure Error 4 0.4 0.1
Total 11 5765.3
5 rows with no replicates
Source DF Seq SS
x 1 5716.3
x^2 1 24.3
- The regression equation is y = - 4.5 + 1.38 x + 1.47 x^2.
- F=1044.99 with p=0.000, which indicates that the regression is significant.
- The lack-of-fit test gives F=48.70 with p=0.001, which indicates significant lack of fit, so even the second-order model may not be fully adequate (note, however, that the test rests on only 4 pure-error degrees of freedom).
- t=2.97 with p=0.016 is significant, so the quadratic term cannot be deleted from this model.
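The lack-of-fit statistic can be reproduced from the ANOVA partition above: SS_Res splits into a lack-of-fit piece and a pure-error piece, and F = MS_LOF / MS_PE. A sketch using the rounded sums of squares from the table:

```python
# Lack-of-fit test for problem 7.4, from the ANOVA partition:
# F = (SS_LOF / df_LOF) / (SS_PE / df_PE)
SS_LOF, df_LOF = 24.3, 5   # lack-of-fit sum of squares and df
SS_PE,  df_PE  = 0.4, 4    # pure-error sum of squares and df

F_LOF = (SS_LOF / df_LOF) / (SS_PE / df_PE)
print(F_LOF)   # ~48.6; Minitab reports 48.70 using unrounded sums of squares
```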
8.6.
Regression Analysis: y versus x7, x8, x51, X52
The regression equation is
y = 19.4 - 0.007 x7 - 0.00634 x8 + 0.46 x51 + 2.33 X52
Predictor Coef SE Coef T P VIF
Constant 19.353 9.525 2.03 0.054
x7 -0.0068 0.1188 -0.06 0.955 2.028
x8 -0.006337 0.001719 -3.69 0.001 1.953
x51 0.461 2.466 0.19 0.854 7.755
X52 2.333 2.484 0.94 0.357 7.907
S = 2.33706 R-Sq = 61.6% R-Sq(adj) = 54.9%
PRESS = * R-Sq(pred) = *%
Analysis of Variance
Source DF SS MS F P
Regression 4 201.342 50.335 9.22 0.000
Residual Error 23 125.622 5.462
Total 27 326.964
Source DF Seq SS
x7 1 97.238
x8 1 81.828
x51 1 17.457
X52 1 4.819
Unusual Observations
Obs x7 y Fit SE Fit Residual St Resid
4 61.4 13.000 7.335 0.805 5.665 2.58R
25 54.9 6.000 6.000 2.337 -0.000 * X
We introduce two indicator variables based on the value of x5: x51 = 1 if x5 is negative and x51 = 0 otherwise; x52 = 1 if x5 is positive and x52 = 0 otherwise. This yields the regression equation
y = 19.4 - 0.007 x7 - 0.00634 x8 + 0.46 x51 + 2.33 x52.
The effect of turnover is assessed with the extra-sum-of-squares (partial) F test: F = [(17.457 + 4.819)/2]/5.462 = 2.04, which is not significant (well below F(0.05; 2, 23) ≈ 3.42), so turnover does not significantly affect the response.
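The partial F computation can be sketched directly from the sequential sums of squares (all values taken from the Minitab tables above):

```python
# Partial (extra-sum-of-squares) F test for problem 8.6: do the two
# turnover indicators (x51, x52) add anything once x7 and x8 are in
# the model?
SeqSS_x51, SeqSS_x52 = 17.457, 4.819   # sequential SS for the indicators
MS_Res, r = 5.462, 2                   # residual mean square; 2 extra terms

F_partial = (SeqSS_x51 + SeqSS_x52) / r / MS_Res
print(round(F_partial, 2))             # 2.04, matching the hand calculation
```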
8.16. Minitab output for the original data set without transformation
Regression Analysis: INHIBIT versus UVB, x2
The regression equation is
INHIBIT = - 15.7 + 923 UVB + 21.2 x2
Predictor Coef SE Coef T P
Constant -15.734 7.322 -2.15 0.050
UVB 923.2 218.8 4.22 0.001
x2 21.159 5.906 3.58 0.003
S = 10.0878 R-Sq = 59.1% R-Sq(adj) = 53.2%
Analysis of Variance
Source DF SS MS F P
Regression 2 2056.3 1028.1 10.10 0.002
Residual Error 14 1424.7 101.8
Total 16 3481.0
Source DF Seq SS
UVB 1 750.0
x2 1 1306.3
We introduce an indicator variable x2 for SURFACE: x2 = 1 if SURFACE is deep, and x2 = 0 if SURFACE is surface. We run the initial regression and residual analysis and find that the model violates the constant-variance assumption. A Box-Cox analysis of the response INHIBIT indicates a square-root transformation, giving the new response variable y* = sqrt(INHIBIT). We then re-run the regression analysis on the transformed data.
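A sketch of the transformation itself (the raw INHIBIT values here are assumed by squaring a few y* entries from the observation table that follows; they are not read from the original data):

```python
import math

# Box-Cox with lambda = 1/2 is simply a square-root transform of the
# response: y* = sqrt(INHIBIT).  For example, INHIBIT values 6, 7, 9
# map to the y* values 2.449, 2.646, 3.000 seen in the fitted table.
inhibit = [6, 7, 9]                      # assumed raw responses (= y*^2)
y_star = [math.sqrt(v) for v in inhibit]
print([round(v, 3) for v in y_star])     # [2.449, 2.646, 3.0]
```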
The regression equation is
y* = - 0.264 + 121 UVB + 2.25 x2
Predictor Coef SE Coef T P
Constant -0.2640 0.8600 -0.31 0.763
UVB 121.10 25.70 4.71 0.000
x2 2.2520 0.6936 3.25 0.006
S = 1.18479 R-Sq = 62.1% R-Sq(adj) = 56.6%
Analysis of Variance
Source DF SS MS F P
Regression 2 32.151 16.076 11.45 0.001
Residual Error 14 19.652 1.404
Total 16 51.804
Source DF Seq SS
UVB 1 17.355
x2 1 14.796
Obs UVB y* Fit SE Fit Residual St Resid
1 0.0000 0.000 1.988 0.519 -1.988 -1.87
2 0.0000 1.000 1.988 0.519 -0.988 -0.93
3 0.0100 2.449 3.199 0.389 -0.750 -0.67
4 0.0100 2.646 0.947 0.654 1.699 1.72
5 0.0200 2.646 2.158 0.499 0.488 0.45
6 0.0300 2.646 3.369 0.449 -0.723 -0.66
7 0.0400 3.000 4.580 0.536 -1.580 -1.49
8 0.0100 3.082 3.199 0.389 -0.117 -0.10
9 0.0000 3.162 1.988 0.519 1.174 1.10
10 0.0300 3.317 3.369 0.449 -0.052 -0.05
11 0.0300 3.536 3.369 0.449 0.167 0.15
12 0.0100 3.742 3.199 0.389 0.543 0.48
13 0.0300 4.472 5.621 0.556 -1.149 -1.10
14 0.0400 4.583 4.580 0.536 0.003 0.00
15 0.0200 5.000 4.410 0.405 0.590 0.53
16 0.0300 6.245 5.621 0.556 0.624 0.60
17 0.0300 7.681 5.621 0.556 2.060 1.97
HI1 COOK1
0.192204 0.276435
0.192204 0.068278
0.107527 0.018008
0.305108 0.432968
0.177419 0.014812
0.143817 0.024369
0.204301 0.191280
0.107527 0.000437
0.192204 0.096446
0.143817 0.000128
0.143817 0.001292
0.107527 0.009440
0.220430 0.113675
0.204301 0.000001
0.116935 0.012396
0.220430 0.033541
0.220430 0.365566
The regression equation based on transformed data is
y* = - 0.264 + 121 UVB + 2.25 x2
F=11.45 with p=0.001 indicates that the regression is significant; R-Sq = 62.1% and R-Sq(adj) = 56.6%. Based on the cutoff value 2p/n = 2(3)/17 = 0.3529 for h_ii and the cutoff value 1 for Cook's D_i, we find that there are no influential observations. The residual plots confirm that the model assumptions are no longer violated.
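The influence screening can be sketched as follows (hii_max and cook_max are the largest HI1 and COOK1 values printed above):

```python
# Influence screening for problem 8.16: compare each leverage h_ii to
# the rule-of-thumb cutoff 2p/n, and each Cook's D_i to 1.
p, n = 3, 17                              # parameters, observations
hii_cut = 2 * p / n
print(round(hii_cut, 4))                  # 0.3529

hii_max, cook_max = 0.305108, 0.432968    # largest HI1 and COOK1 values
print(hii_max < hii_cut and cook_max < 1) # True: no influential points
```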
9.6.
Correlations: x2, x7, x8
x2 x7 x8
x2 1
x7 -0.197 1
x8 -0.051 -0.685 1
The correlation matrix shows that x7 and x8 are moderately correlated (r = -0.685), i.e., moderately near-linearly dependent.
Regression Analysis: y versus x2, x7, x8
The regression equation is
y = - 1.81 + 0.00360 x2 + 0.194 x7 - 0.00482 x8
Predictor Coef SE Coef T P VIF
Constant -1.808 7.901 -0.23 0.821
x2 0.0035981 0.0006950 5.18 0.000 1.116
x7 0.19396 0.08823 2.20 0.038 2.097
x8 -0.004815 0.001277 -3.77 0.001 2.021
S = 1.70624 R-Sq = 78.6% R-Sq(adj) = 76.0%
PRESS = 87.4612 R-Sq(pred) = 73.25%
Analysis of Variance
Source DF SS MS F P
Regression 3 257.094 85.698 29.44 0.000
Residual Error 24 69.870 2.911
Total 27 326.964
Source DF Seq SS
x2 1 76.193
x7 1 139.501
x8 1 41.400
Eigenvalues
1.70072
1.02701
0.27227
The VIF values are VIF2 = 1.116, VIF7 = 2.097, and VIF8 = 2.021, all well below 10. The condition number of X'X is 1.70072/0.27227 = 6.246, which is very small (much less than 100). So there is no evidence of multicollinearity.
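A sketch of the condition-number computation from the printed eigenvalues:

```python
# Multicollinearity check for problem 9.6: the condition number is
# the ratio of the largest to the smallest eigenvalue of X'X (in
# correlation form).  Values below 100 indicate no serious problem.
eigenvalues = [1.70072, 1.02701, 0.27227]  # from the Minitab output

kappa = max(eigenvalues) / min(eigenvalues)
print(round(kappa, 3))   # 6.246, far below the 100 rule of thumb
```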
9.22.
Regression Analysis: Sy versus Sx1, Sx2, ...
The regression equation is
Sy = 0.0003 - 1.40 Sx1 - 0.493 Sx2 + 1.53 SX3 + 0.066 Sx4 + 0.493 Sx5
+ 0.054 Sx6 - 0.338 Sx7 + 0.606 Sx8 - 0.360 Sx9 - 0.768 Sx10 + 0.046 Sx11
Predictor Coef SE Coef T P VIF
Constant 0.00027 0.09396 0.00 0.998
Sx1 -1.401 1.044 -1.34 0.196 119.488
Sx2 -0.4925 0.6252 -0.79 0.441 42.801
SX3 1.526 1.168 1.31 0.208 149.234
Sx4 0.0661 0.1372 0.48 0.636 2.060
Sx5 0.4931 0.2657 1.86 0.080 7.729
Sx6 0.0543 0.2206 0.25 0.808 5.325
Sx7 -0.3379 0.3278 -1.03 0.316 11.761
Sx8 0.6065 0.4370 1.39 0.182 20.918
Sx9 -0.3605 0.2930 -1.23 0.234 9.397
Sx10 -0.7677 0.8849 -0.87 0.397 85.744
Sx11 0.0458 0.2168 0.21 0.835 5.145
S = 0.514614 R-Sq = 83.5% R-Sq(adj) = 73.5%
PRESS = 19.8280 R-Sq(pred) = 31.57%
Analysis of Variance
Source DF SS MS F P
Regression 11 24.2084 2.2008 8.31 0.000
Residual Error 18 4.7669 0.2648
Total 29 28.9753
Source DF Seq SS
Sx1 1 22.0410
Sx2 1 0.1423
SX3 1 0.0941
Sx4 1 0.3966
Sx5 1 0.0569
Sx6 1 0.2394
Sx7 1 0.1760
Sx8 1 0.0042
Sx9 1 0.8431
Sx10 1 0.2031
Sx11 1 0.0118
Principal-component regression analysis
Principal Component Analysis: Sx1, Sx2, SX3, Sx4, Sx5, Sx6, Sx7, Sx8, Sx9, Sx10
Eigenanalysis of the Correlation Matrix
Eigenvalue 7.7013 1.4036 0.7738 0.5770 0.2115 0.1418 0.0953 0.0501
Proportion 0.700 0.128 0.070 0.052 0.019 0.013 0.009 0.005
Cumulative 0.700 0.828 0.898 0.951 0.970 0.983 0.991 0.996
Eigenvalue 0.0334 0.0083 0.0038
Proportion 0.003 0.001 0.000
Cumulative 0.999 1.000 1.000
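A quick check of the Proportion row (a sketch using the printed eigenvalues): for a correlation matrix of 11 standardized predictors the eigenvalues sum to 11, and each proportion is its eigenvalue divided by that total.

```python
# Eigenanalysis check for problem 9.22: the 11 eigenvalues of the
# correlation matrix sum to 11, and each 'Proportion' entry is
# lambda_j / 11.
eigenvalues = [7.7013, 1.4036, 0.7738, 0.5770, 0.2115, 0.1418,
               0.0953, 0.0501, 0.0334, 0.0083, 0.0038]

total = sum(eigenvalues)
print(round(total, 2))                   # 11.0
print(round(eigenvalues[0] / total, 3))  # 0.7, matching PC1's share
```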
Variable PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
Sx1 0.353 -0.114 0.036 -0.006 0.028 -0.090 0.272 0.262
Sx2 0.330 -0.261 0.077 -0.195 -0.143 -0.239 0.348 -0.053
SX3 0.351 -0.140 0.042 -0.004 -0.085 -0.185 0.354 0.066
Sx4 -0.161 -0.553 0.118 0.786 0.097 0.092 0.092 0.061
Sx5 -0.266 -0.347 -0.433 -0.352 0.517 0.073 0.065 0.438
Sx6 0.205 -0.548 0.417 -0.381 -0.008 0.382 -0.379 -0.166
Sx7 -0.304 -0.352 -0.221 -0.134 -0.050 -0.577 -0.019 -0.557
Sx8 0.323 -0.078 -0.371 0.180 -0.201 -0.206 -0.674 0.156
Sx9 0.303 0.007 -0.547 0.095 0.106 0.519 0.195 -0.525
Sx10 0.345 -0.100 -0.268 0.041 -0.029 -0.141 -0.061 0.205
Sx11 0.312 0.182 0.242 0.119 0.800 -0.276 -0.165 -0.222
Variable PC9 PC10 PC11
Sx1 -0.499 0.284 0.617
Sx2 0.650 -0.296 0.259
SX3 -0.026 0.476 -0.676
Sx4 0.065 -0.052 0.010
Sx5 0.140 0.087 -0.044
Sx6 -0.132 0.004 -0.062
Sx7 -0.254 0.057 0.051
Sx8 0.250 0.293 0.101
Sx9 0.013 0.054 0.055
Sx10 -0.394 -0.709 -0.271
Sx11 0.062 -0.018 -0.010
Regression Analysis: Sy versus Z1, Z2, Z3, Z4, Z5, Z6, Z7
The regression equation is
Sy = - 0.0003 - 0.313 Z1 - 0.0078 Z2 + 0.056 Z3 + 0.051 Z4 + 0.077 Z5 + 0.004 Z6
- 0.428 Z7
Predictor Coef SE Coef T P VIF
Constant -0.00027 0.09885 -0.00 0.998
Z1 -0.31292 0.03623 -8.64 0.000 1.000
Z2 -0.00783 0.08486 -0.09 0.927 1.000
Z3 0.0557 0.1143 0.49 0.631 1.000
Z4 0.0507 0.1323 0.38 0.705 1.000
Z5 0.0774 0.2186 0.35 0.727 1.000
Z6 0.0036 0.2670 0.01 0.989 1.000
Z7 -0.4276 0.3257 -1.31 0.203 1.000
S = 0.541401 R-Sq = 77.7% R-Sq(adj) = 70.7%
PRESS = 11.9984 R-Sq(pred) = 58.59%
Analysis of Variance
Source DF SS MS F P
Regression 7 22.5268 3.2181 10.98 0.000
Residual Error 22 6.4485 0.2931
Total 29 28.9753
Source DF Seq SS
Z1 1 21.8697
Z2 1 0.0025
Z3 1 0.0696
Z4 1 0.0430
Z5 1 0.0367
Z6 1 0.0001
Z7 1 0.5052
Obs Z1 Sy Fit SE Fit Residual St Resid
1 1.77 -0.1818 -0.3413 0.2899 0.1595 0.35
2 1.71 -0.4848 -0.3960 0.2173 -0.0889 -0.18
3 -0.18 -0.0064 0.1665 0.2364 -0.1728 -0.35
4 1.19 -0.2855 -0.4883 0.1892 0.2029 0.40
5 -1.30 0.0048 0.2563 0.4053 -0.2516 -0.70
6 2.25 -1.4099 -1.0727 0.4109 -0.3372 -0.96
7 -0.63 0.3317 0.3733 0.2742 -0.0416 -0.09
8 -0.66 0.2281 0.3436 0.2597 -0.1155 -0.24
9 -4.54 2.3381 1.3671 0.2856 0.9710 2.11R
10 -5.25 1.6523 1.6789 0.3228 -0.0266 -0.06
11 1.26 -0.5646 -0.2561 0.2692 -0.3085 -0.66
12 -4.62 2.6252 1.4714 0.2214 1.1538 2.34R
13 -2.73 0.2329 0.6333 0.2807 -0.4004 -0.86
14 -0.29 -0.0542 -0.2773 0.4156 0.2231 0.64
15 -3.26 0.0415 0.9341 0.2501 -0.8927 -1.86
16 0.79 -0.3573 -0.1990 0.1894 -0.1583 -0.31
17 3.86 -0.9011 -1.2518 0.2497 0.3506 0.73
18 3.98 -0.8214 -1.2022 0.2565 0.3808 0.80
19 1.17 -0.3573 -0.1738 0.2430 -0.1835 -0.38
20 0.81 -0.5789 -0.2620 0.2429 -0.3169 -0.65
21 -0.62 0.5582 0.3703 0.2740 0.1880 0.40
22 2.16 0.2281 -0.8043 0.2639 1.0323 2.18R
23 -5.43 1.8915 1.7926 0.3338 0.0989 0.23
24 4.27 -1.0797 -1.4739 0.2874 0.3942 0.86
25 -4.19 0.6156 1.2524 0.3362 -0.6367 -1.50
26 1.46 -0.0494 -0.3233 0.2416 0.2739 0.57
27 1.77 -0.9793 -0.5549 0.2522 -0.4244 -0.89
28 1.84 -1.0797 -0.5672 0.2604 -0.5125 -1.08
29 2.38 -1.0000 -0.7448 0.2119 -0.2552 -0.51
30 1.03 -0.5646 -0.2588 0.2371 -0.3057 -0.63
R denotes an observation with a large standardized residual.
- The residual sum of squares for least squares is SS_Res = 4.7669, and the residual sum of squares for principal-component regression is SS_Res = 6.4485, an increase of (6.4485 - 4.7669)/4.7669 ≈ 0.353, or about 35.3%.
- The coefficient vector is reduced from 11 terms to 7 terms.
- This part is omitted, since an ordinary ridge regression model cannot be fit in Minitab.
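The least-squares vs. principal-components trade-off in the first bullet can be sketched numerically from the two residual sums of squares reported above:

```python
# Trade-off for problem 9.22: dropping 4 principal components
# inflates the residual sum of squares relative to the full
# 11-term least-squares fit.
SS_res_ols = 4.7669   # 11-term least-squares model
SS_res_pcr = 6.4485   # 7-component principal-component model

increase = (SS_res_pcr - SS_res_ols) / SS_res_ols
print(round(100 * increase, 1))   # 35.3 (% increase in SS_Res)
```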