Solutions to Homework Assignment 5 (STA 4234, Fall 2013)
7.2. Minitab output
Regression Analysis: y versus x, x2
The regression equation is
y = 1.63 - 1.23 x + 1.49 x2
Predictor Coef SE Coef T P VIF
Constant 1.63300 0.00420 389.18 0.000
x -1.23218 0.00701 -175.78 0.000 19.906
x2 1.49455 0.00248 601.64 0.000 19.906
S = 0.00356753 R-Sq = 100.0% R-Sq(adj) = 100.0%
PRESS = 0.000220200 R-Sq(pred) = 100.00%
Analysis of Variance
Source DF SS MS F P
Regression 2 47.310 23.655 1858613.46 0.000
Residual Error 7 0.000 0.000
Total 9 47.310
Source DF Seq SS
x 1 42.703
x2 1 4.607
- The regression equation is y = 1.63 - 1.23 x + 1.49 x2.
- F=1858613.46 with p=0.000, which indicates that the regression is significant.
- For the quadratic term, t=601.64 with p=0.000, so x2 is significant, which justifies including the quadratic term in this model.
- Since the model is quadratic, extrapolating beyond the range of the data is potentially hazardous.
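As a quick arithmetic check (a sketch, not part of the Minitab run): the residual mean square prints as 0.000 only because of rounding, but it equals S^2, so the overall F statistic can be recovered from the summary line alone.

```python
# Recompute the overall F for problem 7.2 from the printed summary.
# MS_Res is shown as 0.000 due to rounding, but MS_Res = S^2,
# so F = MS_Reg / S^2.
S = 0.00356753      # residual standard error from the output
MS_Reg = 23.655     # regression mean square from the ANOVA table

F = MS_Reg / S**2
print(F)            # close to the reported F = 1858613.46
```

The small discrepancy from the reported 1858613.46 comes only from rounding of S and MS_Reg in the printout.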
7.4.
Regression Analysis: y versus x, x^2
The regression equation is
y = - 4.5 + 1.38 x + 1.47 x^2
Predictor Coef SE Coef T P VIF
Constant -4.46 14.63 -0.30 0.768
x 1.384 5.497 0.25 0.807 201.170
x^2 1.4670 0.4936 2.97 0.016 201.170
S = 1.65731 R-Sq = 99.6% R-Sq(adj) = 99.5%
PRESS = 35.3788 R-Sq(pred) = 99.39%
Analysis of Variance
Source DF SS MS F P
Regression 2 5740.6 2870.3 1044.99 0.000
Residual Error 9 24.7 2.7
Lack of Fit 5 24.3 4.9 48.70 0.001
Pure Error 4 0.4 0.1
Total 11 5765.3
5 rows with no replicates
Source DF Seq SS
x 1 5716.3
x^2 1 24.3
- The regression equation is y = - 4.5 + 1.38 x + 1.47 x^2.
- F=1044.99 with p=0.000, which indicates that the regression is significant.
- The lack-of-fit test gives F=48.70 with p=0.001, which indicates significant lack of fit, so even the second-order model may not be fully adequate (note, however, that the test rests on only 4 pure-error degrees of freedom).
- t=2.97 with p=0.016 is significant, so the quadratic term cannot be deleted from this model.
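The lack-of-fit statistic can be reproduced from the ANOVA partition above: SS_Res splits into a lack-of-fit piece and a pure-error piece, and F = MS_LOF / MS_PE. A sketch using the rounded sums of squares from the table:

```python
# Lack-of-fit test for problem 7.4, from the ANOVA partition:
# F = (SS_LOF / df_LOF) / (SS_PE / df_PE)
SS_LOF, df_LOF = 24.3, 5   # lack-of-fit sum of squares and df
SS_PE,  df_PE  = 0.4, 4    # pure-error sum of squares and df

F_LOF = (SS_LOF / df_LOF) / (SS_PE / df_PE)
print(F_LOF)   # ~48.6; Minitab reports 48.70 using unrounded sums of squares
```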
8.6.
Regression Analysis: y versus x7, x8, x51, X52
The regression equation is
y = 19.4 - 0.007 x7 - 0.00634 x8 + 0.46 x51 + 2.33 X52
Predictor Coef SE Coef T P VIF
Constant 19.353 9.525 2.03 0.054
x7 -0.0068 0.1188 -0.06 0.955 2.028
x8 -0.006337 0.001719 -3.69 0.001 1.953
x51 0.461 2.466 0.19 0.854 7.755
X52 2.333 2.484 0.94 0.357 7.907
S = 2.33706 R-Sq = 61.6% R-Sq(adj) = 54.9%
PRESS = * R-Sq(pred) = *%
Analysis of Variance
Source DF SS MS F P
Regression 4 201.342 50.335 9.22 0.000
Residual Error 23 125.622 5.462
Total 27 326.964
Source DF Seq SS
x7 1 97.238
x8 1 81.828
x51 1 17.457
X52 1 4.819
Unusual Observations
Obs x7 y Fit SE Fit Residual St Resid
4 61.4 13.000 7.335 0.805 5.665 2.58R
25 54.9 6.000 6.000 2.337 -0.000 * X
We introduce two indicator variables based on the value of x5: x51 = 1 if x5 is negative and x51 = 0 otherwise; x52 = 1 if x5 is positive and x52 = 0 otherwise. This yields the regression equation
y = 19.4 - 0.007 x7 - 0.00634 x8 + 0.46 x51 + 2.33 x52.
The effect of turnover is assessed with the extra-sum-of-squares (partial) F test: F = [(17.457 + 4.819)/2]/5.462 = 2.04, which is not significant (well below F(0.05; 2, 23) ≈ 3.42), so turnover does not significantly affect the response.
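The partial F computation can be sketched directly from the sequential sums of squares (all values taken from the Minitab tables above):

```python
# Partial (extra-sum-of-squares) F test for problem 8.6: do the two
# turnover indicators (x51, x52) add anything once x7 and x8 are in
# the model?
SeqSS_x51, SeqSS_x52 = 17.457, 4.819   # sequential SS for the indicators
MS_Res, r = 5.462, 2                   # residual mean square; 2 extra terms

F_partial = (SeqSS_x51 + SeqSS_x52) / r / MS_Res
print(round(F_partial, 2))             # 2.04, matching the hand calculation
```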
8.16. Minitab output for the original data set without transformation
Regression Analysis: INHIBIT versus UVB, x2
The regression equation is
INHIBIT = - 15.7 + 923 UVB + 21.2 x2
Predictor Coef SE Coef T P
Constant -15.734 7.322 -2.15 0.050
UVB 923.2 218.8 4.22 0.001
x2 21.159 5.906 3.58 0.003
S = 10.0878 R-Sq = 59.1% R-Sq(adj) = 53.2%
Analysis of Variance
Source DF SS MS F P
Regression 2 2056.3 1028.1 10.10 0.002
Residual Error 14 1424.7 101.8
Total 16 3481.0
Source DF Seq SS
UVB 1 750.0
x2 1 1306.3
We introduce an indicator variable x2 for SURFACE: x2 = 1 if SURFACE is deep, and x2 = 0 if SURFACE is surface. We run the initial regression and residual analysis and find that the model violates the constant-variance assumption. A Box-Cox analysis of the response INHIBIT indicates a square-root transformation, giving the new response variable y* = sqrt(INHIBIT). We then re-run the regression analysis on the transformed data.
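A sketch of the transformation itself (the raw INHIBIT values here are assumed by squaring a few y* entries from the observation table that follows; they are not read from the original data):

```python
import math

# Box-Cox with lambda = 1/2 is simply a square-root transform of the
# response: y* = sqrt(INHIBIT).  For example, INHIBIT values 6, 7, 9
# map to the y* values 2.449, 2.646, 3.000 seen in the fitted table.
inhibit = [6, 7, 9]                      # assumed raw responses (= y*^2)
y_star = [math.sqrt(v) for v in inhibit]
print([round(v, 3) for v in y_star])     # [2.449, 2.646, 3.0]
```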
The regression equation is
y* = - 0.264 + 121 UVB + 2.25 x2
Predictor Coef SE Coef T P
Constant -0.2640 0.8600 -0.31 0.763
UVB 121.10 25.70 4.71 0.000
x2 2.2520 0.6936 3.25 0.006
S = 1.18479 R-Sq = 62.1% R-Sq(adj) = 56.6%
Analysis of Variance
Source DF SS MS F P
Regression 2 32.151 16.076 11.45 0.001
Residual Error 14 19.652 1.404
Total 16 51.804
Source DF Seq SS
UVB 1 17.355
x2 1 14.796
Obs UVB y* Fit SE Fit Residual St Resid
1 0.0000 0.000 1.988 0.519 -1.988 -1.87
2 0.0000 1.000 1.988 0.519 -0.988 -0.93
3 0.0100 2.449 3.199 0.389 -0.750 -0.67
4 0.0100 2.646 0.947 0.654 1.699 1.72
5 0.0200 2.646 2.158 0.499 0.488 0.45
6 0.0300 2.646 3.369 0.449 -0.723 -0.66
7 0.0400 3.000 4.580 0.536 -1.580 -1.49
8 0.0100 3.082 3.199 0.389 -0.117 -0.10
9 0.0000 3.162 1.988 0.519 1.174 1.10
10 0.0300 3.317 3.369 0.449 -0.052 -0.05
11 0.0300 3.536 3.369 0.449 0.167 0.15
12 0.0100 3.742 3.199 0.389 0.543 0.48
13 0.0300 4.472 5.621 0.556 -1.149 -1.10
14 0.0400 4.583 4.580 0.536 0.003 0.00
15 0.0200 5.000 4.410 0.405 0.590 0.53
16 0.0300 6.245 5.621 0.556 0.624 0.60
17 0.0300 7.681 5.621 0.556 2.060 1.97
HI1 COOK1
0.192204 0.276435
0.192204 0.068278
0.107527 0.018008
0.305108 0.432968
0.177419 0.014812
0.143817 0.024369
0.204301 0.191280
0.107527 0.000437
0.192204 0.096446
0.143817 0.000128
0.143817 0.001292
0.107527 0.009440
0.220430 0.113675
0.204301 0.000001
0.116935 0.012396
0.220430 0.033541
0.220430 0.365566
The regression equation based on transformed data is
y* = - 0.264 + 121 UVB + 2.25 x2
F=11.45 with p=0.001 indicates that the regression is significant; R-Sq = 62.1% and R-Sq(adj) = 56.6%. Based on the cutoff value 2p/n = 2(3)/17 = 0.3529 for h_ii and the cutoff value 1 for Cook's D_i, we find that there are no influential observations. The residual plots confirm that the model assumptions are no longer violated.
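The influence screening can be sketched as follows (hii_max and cook_max are the largest HI1 and COOK1 values printed above):

```python
# Influence screening for problem 8.16: compare each leverage h_ii to
# the rule-of-thumb cutoff 2p/n, and each Cook's D_i to 1.
p, n = 3, 17                              # parameters, observations
hii_cut = 2 * p / n
print(round(hii_cut, 4))                  # 0.3529

hii_max, cook_max = 0.305108, 0.432968    # largest HI1 and COOK1 values
print(hii_max < hii_cut and cook_max < 1) # True: no influential points
```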
9.6.
Correlations: x2, x7, x8
x2 x7 x8
x2 1
x7 -0.197 1
x8 -0.051 -0.685 1
The correlation matrix shows that x7 and x8 are moderately correlated (r = -0.685), i.e., moderately near-linearly dependent.
Regression Analysis: y versus x2, x7, x8
The regression equation is
y = - 1.81 + 0.00360 x2 + 0.194 x7 - 0.00482 x8
Predictor Coef SE Coef T P VIF
Constant -1.808 7.901 -0.23 0.821
x2 0.0035981 0.0006950 5.18 0.000 1.116
x7 0.19396 0.08823 2.20 0.038 2.097
x8 -0.004815 0.001277 -3.77 0.001 2.021
S = 1.70624 R-Sq = 78.6% R-Sq(adj) = 76.0%
PRESS = 87.4612 R-Sq(pred) = 73.25%
Analysis of Variance
Source DF SS MS F P
Regression 3 257.094 85.698 29.44 0.000
Residual Error 24 69.870 2.911
Total 27 326.964
Source DF Seq SS
x2 1 76.193
x7 1 139.501
x8 1 41.400
Eigenvalues
1.70072
1.02701
0.27227
The VIF values are VIF2 = 1.116, VIF7 = 2.097, and VIF8 = 2.021, all well below 10. The condition number of X'X is 1.70072/0.27227 = 6.246, which is very small (much less than 100). So there is no evidence of multicollinearity.
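A sketch of the condition-number computation from the printed eigenvalues:

```python
# Multicollinearity check for problem 9.6: the condition number is
# the ratio of the largest to the smallest eigenvalue of X'X (in
# correlation form).  Values below 100 indicate no serious problem.
eigenvalues = [1.70072, 1.02701, 0.27227]  # from the Minitab output

kappa = max(eigenvalues) / min(eigenvalues)
print(round(kappa, 3))   # 6.246, far below the 100 rule of thumb
```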
9.22.
Regression Analysis: Sy versus Sx1, Sx2, ...
The regression equation is
Sy = 0.0003 - 1.40 Sx1 - 0.493 Sx2 + 1.53 SX3 + 0.066 Sx4 + 0.493 Sx5
+ 0.054 Sx6 - 0.338 Sx7 + 0.606 Sx8 - 0.360 Sx9 - 0.768 Sx10 + 0.046 Sx11
Predictor Coef SE Coef T P VIF
Constant 0.00027 0.09396 0.00 0.998
Sx1 -1.401 1.044 -1.34 0.196 119.488
Sx2 -0.4925 0.6252 -0.79 0.441 42.801
SX3 1.526 1.168 1.31 0.208 149.234
Sx4 0.0661 0.1372 0.48 0.636 2.060
Sx5 0.4931 0.2657 1.86 0.080 7.729
Sx6 0.0543 0.2206 0.25 0.808 5.325
Sx7 -0.3379 0.3278 -1.03 0.316 11.761
Sx8 0.6065 0.4370 1.39 0.182 20.918
Sx9 -0.3605 0.2930 -1.23 0.234 9.397
Sx10 -0.7677 0.8849 -0.87 0.397 85.744
Sx11 0.0458 0.2168 0.21 0.835 5.145
S = 0.514614 R-Sq = 83.5% R-Sq(adj) = 73.5%
PRESS = 19.8280 R-Sq(pred) = 31.57%
Analysis of Variance
Source DF SS MS F P
Regression 11 24.2084 2.2008 8.31 0.000
Residual Error 18 4.7669 0.2648
Total 29 28.9753
Source DF Seq SS
Sx1 1 22.0410
Sx2 1 0.1423
SX3 1 0.0941
Sx4 1 0.3966
Sx5 1 0.0569
Sx6 1 0.2394
Sx7 1 0.1760
Sx8 1 0.0042
Sx9 1 0.8431
Sx10 1 0.2031
Sx11 1 0.0118
Principal-component regression analysis
Principal Component Analysis: Sx1, Sx2, SX3, Sx4, Sx5, Sx6, Sx7, Sx8, Sx9, Sx10
Eigenanalysis of the Correlation Matrix
Eigenvalue 7.7013 1.4036 0.7738 0.5770 0.2115 0.1418 0.0953 0.0501
Proportion 0.700 0.128 0.070 0.052 0.019 0.013 0.009 0.005
Cumulative 0.700 0.828 0.898 0.951 0.970 0.983 0.991 0.996
Eigenvalue 0.0334 0.0083 0.0038
Proportion 0.003 0.001 0.000
Cumulative 0.999 1.000 1.000
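A quick check of the Proportion row (a sketch using the printed eigenvalues): for a correlation matrix of 11 standardized predictors the eigenvalues sum to 11, and each proportion is its eigenvalue divided by that total.

```python
# Eigenanalysis check for problem 9.22: the 11 eigenvalues of the
# correlation matrix sum to 11, and each 'Proportion' entry is
# lambda_j / 11.
eigenvalues = [7.7013, 1.4036, 0.7738, 0.5770, 0.2115, 0.1418,
               0.0953, 0.0501, 0.0334, 0.0083, 0.0038]

total = sum(eigenvalues)
print(round(total, 2))                   # 11.0
print(round(eigenvalues[0] / total, 3))  # 0.7, matching PC1's share
```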
Variable PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
Sx1 0.353 -0.114 0.036 -0.006 0.028 -0.090 0.272 0.262
Sx2 0.330 -0.261 0.077 -0.195 -0.143 -0.239 0.348 -0.053
SX3 0.351 -0.140 0.042 -0.004 -0.085 -0.185 0.354 0.066
Sx4 -0.161 -0.553 0.118 0.786 0.097 0.092 0.092 0.061
Sx5 -0.266 -0.347 -0.433 -0.352 0.517 0.073 0.065 0.438
Sx6 0.205 -0.548 0.417 -0.381 -0.008 0.382 -0.379 -0.166
Sx7 -0.304 -0.352 -0.221 -0.134 -0.050 -0.577 -0.019 -0.557
Sx8 0.323 -0.078 -0.371 0.180 -0.201 -0.206 -0.674 0.156
Sx9 0.303 0.007 -0.547 0.095 0.106 0.519 0.195 -0.525
Sx10 0.345 -0.100 -0.268 0.041 -0.029 -0.141 -0.061 0.205
Sx11 0.312 0.182 0.242 0.119 0.800 -0.276 -0.165 -0.222
Variable PC9 PC10 PC11
Sx1 -0.499 0.284 0.617
Sx2 0.650 -0.296 0.259
SX3 -0.026 0.476 -0.676
Sx4 0.065 -0.052 0.010
Sx5 0.140 0.087 -0.044
Sx6 -0.132 0.004 -0.062
Sx7 -0.254 0.057 0.051
Sx8 0.250 0.293 0.101
Sx9 0.013 0.054 0.055
Sx10 -0.394 -0.709 -0.271
Sx11 0.062 -0.018 -0.010
Regression Analysis: Sy versus Z1, Z2, Z3, Z4, Z5, Z6, Z7
The regression equation is
Sy = - 0.0003 - 0.313 Z1 - 0.0078 Z2 + 0.056 Z3 + 0.051 Z4 + 0.077 Z5 + 0.004 Z6
- 0.428 Z7
Predictor Coef SE Coef T P VIF
Constant -0.00027 0.09885 -0.00 0.998
Z1 -0.31292 0.03623 -8.64 0.000 1.000
Z2 -0.00783 0.08486 -0.09 0.927 1.000
Z3 0.0557 0.1143 0.49 0.631 1.000
Z4 0.0507 0.1323 0.38 0.705 1.000
Z5 0.0774 0.2186 0.35 0.727 1.000
Z6 0.0036 0.2670 0.01 0.989 1.000
Z7 -0.4276 0.3257 -1.31 0.203 1.000
S = 0.541401 R-Sq = 77.7% R-Sq(adj) = 70.7%
PRESS = 11.9984 R-Sq(pred) = 58.59%
Analysis of Variance
Source DF SS MS F P
Regression 7 22.5268 3.2181 10.98 0.000
Residual Error 22 6.4485 0.2931
Total 29 28.9753
Source DF Seq SS
Z1 1 21.8697
Z2 1 0.0025
Z3 1 0.0696
Z4 1 0.0430
Z5 1 0.0367
Z6 1 0.0001
Z7 1 0.5052
Obs Z1 Sy Fit SE Fit Residual St Resid
1 1.77 -0.1818 -0.3413 0.2899 0.1595 0.35
2 1.71 -0.4848 -0.3960 0.2173 -0.0889 -0.18
3 -0.18 -0.0064 0.1665 0.2364 -0.1728 -0.35
4 1.19 -0.2855 -0.4883 0.1892 0.2029 0.40
5 -1.30 0.0048 0.2563 0.4053 -0.2516 -0.70
6 2.25 -1.4099 -1.0727 0.4109 -0.3372 -0.96
7 -0.63 0.3317 0.3733 0.2742 -0.0416 -0.09
8 -0.66 0.2281 0.3436 0.2597 -0.1155 -0.24
9 -4.54 2.3381 1.3671 0.2856 0.9710 2.11R
10 -5.25 1.6523 1.6789 0.3228 -0.0266 -0.06
11 1.26 -0.5646 -0.2561 0.2692 -0.3085 -0.66
12 -4.62 2.6252 1.4714 0.2214 1.1538 2.34R
13 -2.73 0.2329 0.6333 0.2807 -0.4004 -0.86
14 -0.29 -0.0542 -0.2773 0.4156 0.2231 0.64
15 -3.26 0.0415 0.9341 0.2501 -0.8927 -1.86
16 0.79 -0.3573 -0.1990 0.1894 -0.1583 -0.31
17 3.86 -0.9011 -1.2518 0.2497 0.3506 0.73
18 3.98 -0.8214 -1.2022 0.2565 0.3808 0.80
19 1.17 -0.3573 -0.1738 0.2430 -0.1835 -0.38
20 0.81 -0.5789 -0.2620 0.2429 -0.3169 -0.65
21 -0.62 0.5582 0.3703 0.2740 0.1880 0.40
22 2.16 0.2281 -0.8043 0.2639 1.0323 2.18R
23 -5.43 1.8915 1.7926 0.3338 0.0989 0.23
24 4.27 -1.0797 -1.4739 0.2874 0.3942 0.86
25 -4.19 0.6156 1.2524 0.3362 -0.6367 -1.50
26 1.46 -0.0494 -0.3233 0.2416 0.2739 0.57
27 1.77 -0.9793 -0.5549 0.2522 -0.4244 -0.89
28 1.84 -1.0797 -0.5672 0.2604 -0.5125 -1.08
29 2.38 -1.0000 -0.7448 0.2119 -0.2552 -0.51
30 1.03 -0.5646 -0.2588 0.2371 -0.3057 -0.63
R denotes an observation with a large standardized residual.
- The residual sum of squares for least squares is SS_Res = 4.7669, and the residual sum of squares for principal-component regression is SS_Res = 6.4485, an increase of (6.4485 - 4.7669)/4.7669 ≈ 0.353, or about 35.3%.
- The coefficient vector is reduced from 11 terms to 7 terms.
- This part is omitted, since an ordinary ridge regression model cannot be fit in Minitab.
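The least-squares vs. principal-components trade-off in the first bullet can be sketched numerically from the two residual sums of squares reported above:

```python
# Trade-off for problem 9.22: dropping 4 principal components
# inflates the residual sum of squares relative to the full
# 11-term least-squares fit.
SS_res_ols = 4.7669   # 11-term least-squares model
SS_res_pcr = 6.4485   # 7-component principal-component model

increase = (SS_res_pcr - SS_res_ols) / SS_res_ols
print(round(100 * increase, 1))   # 35.3 (% increase in SS_Res)
```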