SOLUTIONS ASSIGNMENT 4
Q2(a)
Stem-and-leaf for X1:
N = 24 Median = 5.4
Quartiles = 4.35, 6.35
Decimal point is at the colon
3 : 1579
4 : 025589
5 : 135689
6 : 02568
7 : 02
8 : 0
Stem-and-leaf for X2:
N = 24 Median = 25
Quartiles = 16.5, 33.5
Decimal point is 1 place to the right of the colon
0 : 579
1 : 1358
2 : 0133557
3 : 01334559
4 : 07
Stem-and-leaf plot for X3:
N = 24 Median = 6
Quartiles = 5, 7
Decimal point is at the colon
3 : 5
4 : 0349
5 : 000589
6 : 001447
7 : 00456
8 : 03
All three plots look roughly symmetric, with no obvious departure from normality and no obvious outliers.
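The printed summaries can be checked by decoding each stem-and-leaf display back into data values. The Python sketch below does this and recomputes the median and quartiles. It assumes the quartiles shown are Tukey's hinges (medians of the lower and upper halves of the ordered data), which is the rule that reproduces the printed values here. The helper names are illustrative, not from any package.

```python
def stem_leaf_values(rows, shift=0):
    """Decode a stem-and-leaf display into sorted data values.
    rows:  list of (stem, leaves) pairs, e.g. (4, "025589").
    shift: places the decimal point sits to the right of the colon
           (0 = "at the colon", 1 = "1 place to the right of the colon")."""
    values = []
    for stem, leaves in rows:
        for leaf in leaves:
            # read as stem.leaf, then move the decimal point `shift` places right
            values.append((stem * 10 + int(leaf)) * 10 ** shift / 10)
    return sorted(values)


def median(sorted_vals):
    n = len(sorted_vals)
    mid = n // 2
    if n % 2:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def hinges(sorted_vals):
    """Tukey's hinges: medians of the lower and upper halves
    (the middle value goes into both halves when n is odd)."""
    n = len(sorted_vals)
    lower = sorted_vals[: (n + 1) // 2]
    upper = sorted_vals[n // 2:]
    return median(lower), median(upper)


x1 = stem_leaf_values([(3, "1579"), (4, "025589"), (5, "135689"),
                       (6, "02568"), (7, "02"), (8, "0")])
x2 = stem_leaf_values([(0, "579"), (1, "1358"), (2, "0133557"),
                       (3, "01334559"), (4, "07")], shift=1)
x3 = stem_leaf_values([(3, "5"), (4, "0349"), (5, "000589"),
                       (6, "001447"), (7, "00456"), (8, "03")])

for name, xs in [("X1", x1), ("X2", x2), ("X3", x3)]:
    print(name, "N =", len(xs), "median =", median(xs), "quartiles =", hinges(xs))
```

Decoding the displays this way also confirms that each variable has N = 24 observations.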
Q2 (b)
Correlation matrix of the explanatory variables:
X1 X2 X3
X1 1.0000000 0.4669511 0.3227612
X2 0.4669511 1.0000000 0.2537530
X3 0.3227612 0.2537530 1.0000000
-Mild evidence of collinearity between X1 and X2, as shown by the slight linear pattern in their plot and by r = 0.467 in the correlation matrix.
-X1 and X2 each show an obvious linear relationship with Y; the relationship is weaker for X3.
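A further numerical check on collinearity, not part of the original solution, is the variance inflation factor (VIF): for standardized predictors, VIF_j is the j-th diagonal element of the inverse of the correlation matrix. The sketch below inverts the matrix printed above using plain Python; the helper name `invert` is illustrative. Common rules of thumb treat VIFs above roughly 5 or 10 as a concern, and these come out between about 1.1 and 1.4, consistent with only mild collinearity.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (fine for small dense matrices)."""
    n = len(matrix)
    # augment [A | I]
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Correlation matrix of X1, X2, X3 as printed in Q2 (b)
R = [[1.0000000, 0.4669511, 0.3227612],
     [0.4669511, 1.0000000, 0.2537530],
     [0.3227612, 0.2537530, 1.0000000]]

R_inv = invert(R)
vif = [R_inv[j][j] for j in range(3)]
for name, v in zip(["X1", "X2", "X3"], vif):
    print(f"{name}: VIF = {v:.3f}")
```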
Q2 (c)
*** Linear Model ***
Call: lm(formula = Y ~ X1 + X2 + X3, data = MathSalaries, na.action = na.exclude)
Residuals:
Min 1Q Median 3Q Max
-3.246 -0.9593 0.03772 1.199 3.309
Coefficients:
Value Std. Error t value Pr(>|t|)
(Intercept) 17.8469 2.0019 8.9151 0.0000
X1 1.1031 0.3296 3.3471 0.0032
X2 0.3215 0.0371 8.6643 0.0000
X3 1.2889 0.2985 4.3184 0.0003
Residual standard error: 1.753 on 20 degrees of freedom
Multiple R-Squared: 0.9109
F-statistic: 68.12 on 3 and 20 degrees of freedom, the p-value is 1.124e-010
The fitted linear regression model is:
Y = 17.847 + 1.103 * X1 + 0.322 * X2 + 1.289 * X3
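The summary above is internally consistent, which can be checked by hand: each t value is Value divided by Std. Error, and with p = 3 predictors and n - p - 1 = 20 residual degrees of freedom the overall F statistic satisfies F = (R^2/p) / ((1 - R^2)/(n - p - 1)). A minimal Python sketch of both checks follows; small discrepancies come from rounding in the printed output.

```python
# Coefficients from the S-PLUS summary: name -> (Value, Std. Error, printed t value)
coefs = {
    "(Intercept)": (17.8469, 2.0019, 8.9151),
    "X1": (1.1031, 0.3296, 3.3471),
    "X2": (0.3215, 0.0371, 8.6643),
    "X3": (1.2889, 0.2985, 4.3184),
}

# t value = estimate / standard error
t_values = {name: value / se for name, (value, se, _) in coefs.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.3f} (printed {coefs[name][2]})")


def f_from_r2(r2, p, df_resid):
    """Overall regression F statistic from R^2, number of predictors p, residual df."""
    return (r2 / p) / ((1 - r2) / df_resid)


F = f_from_r2(0.9109, p=3, df_resid=20)
print(f"F = {F:.2f} on 3 and 20 df (printed 68.12)")
```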
Q2 (d)
Residuals:
[,1]
1 0.73589638
2 1.92686291
3 -0.09841255
4 3.30885958
5 -0.71424514
6 1.24977985
7 -2.11985473
8 1.98449816
9 -0.25009046
10 1.30089606
11 0.90628607
12 -3.23820780
13 -0.56287728
14 -1.19305333
15 -1.31156330
16 -0.75176635
17 0.17384822
18 0.54778307
19 -3.24628644
20 1.18273678
21 -0.8813598
22 -1.4479381
23 0.8148471
24 1.6833611
The residuals look symmetric and show no obvious departure from normality and no outliers.
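The residual listing can also be tied back to the model summary in (c). Because the model includes an intercept, the residuals should sum to about zero; their minimum and maximum should match the Min and Max in the residual summary; and the residual standard error is sqrt(SSE / df) with df = 24 - 4 = 20. A short Python sketch:

```python
# Residuals from Q2 (d), in observation order
residuals = [
    0.73589638, 1.92686291, -0.09841255, 3.30885958, -0.71424514,
    1.24977985, -2.11985473, 1.98449816, -0.25009046, 1.30089606,
    0.90628607, -3.23820780, -0.56287728, -1.19305333, -1.31156330,
    -0.75176635, 0.17384822, 0.54778307, -3.24628644, 1.18273678,
    -0.8813598, -1.4479381, 0.8148471, 1.6833611,
]

n = len(residuals)
df_resid = n - 4                      # 4 coefficients: intercept, X1, X2, X3
sse = sum(r * r for r in residuals)   # residual sum of squares
rse = (sse / df_resid) ** 0.5         # residual standard error

print("sum of residuals:", round(sum(residuals), 6))
print("min, max:", min(residuals), max(residuals))
print("residual standard error:", round(rse, 3), "on", df_resid, "df")
```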
Q2 (e)
The residual plots against each X-variable show no cone (funnel) shapes, suggesting that the constant-variance assumption is reasonable. They also show no obvious curvature.
The same holds for the plot of the residuals against Y-hat.
Q2 (f)
Fitted model is:
Y = 1.2039*X1 - 0.0232*X2 - 0.0952*X3 + 0.004*X1X2 - 0.0204*X1X3 + 0.0505*X2X3
Again, there is no obvious pattern in the residuals. However, had we plotted Y-hat from the original model against these interaction terms, we might have seen some indication of whether they should be included in the model.
Q3(a)
Fitted model is:
Y = 168.6 + 2.0344*X
QQ plot:
The points deviate from a straight line, which suggests that the error term is not normally distributed.
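A normal QQ plot pairs the ordered residuals with standard normal quantiles; if the errors are normal, the points fall close to a straight line. The sketch below builds the QQ points and summarizes their straightness by the correlation between the two coordinates. The plotting positions (i - 0.5)/n are an assumption (S-PLUS and R may use a slightly different offset), and both samples are hypothetical, since the Q3 data are not reproduced here.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pair each ordered observation with a standard normal quantile,
    using plotting positions (i - 0.5) / n."""
    ordered = sorted(sample)
    n = len(ordered)
    z = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(z, ordered))


def qq_correlation(sample):
    """Pearson correlation of the QQ points; values near 1 mean the
    points lie close to a straight line."""
    pts = normal_qq_points(sample)
    xs, ys = zip(*pts)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical samples for illustration only (not the assignment data)
symmetric = [NormalDist().inv_cdf((i - 0.5) / 10) for i in range(1, 11)]
skewed = [1, 1, 1, 2, 2, 3, 5, 8, 13, 30]
print("symmetric:", round(qq_correlation(symmetric), 4))
print("skewed:   ", round(qq_correlation(skewed), 4))
```

A strongly right-skewed sample bends away from the line in the upper tail, which pulls the correlation well below 1.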
Q3 (b)
i)
Fitted model is:
Y = 158.2875 + 2.8547 * X - 0.0146 * X^2
p-values: 0.0014 (X), 0.2610 (X^2)
Since the p-value for the quadratic term in the ANOVA table is 0.26, much greater than 0.05, we do not reject the null hypothesis that the coefficient of the quadratic term is 0. Thus there is no evidence that the quadratic term is needed.
ii)
When X = 30, the predicted Y is 230.7445 (se = 1.25).
The 95% confidence interval is (228.0414, 233.4476).
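The numbers in (ii) follow from the fitted equation in (i) and the usual interval fit ± t × se. In the sketch below the t multiplier is an assumption: 2.160 is the 0.975 quantile of the t distribution with 13 degrees of freedom, which is what the printed interval implies ((233.4476 - 230.7445) / 1.25 ≈ 2.16); the residual degrees of freedom are not stated in the source. Evaluating the equation with the rounded coefficients gives a value slightly different from 230.7445, because the X^2 coefficient is printed to only four decimals.

```python
# Coefficients of the quadratic fit as printed in (i) (rounded)
b0, b1, b2 = 158.2875, 2.8547, -0.0146


def predict(x):
    """Fitted mean of Y at x under the quadratic model."""
    return b0 + b1 * x + b2 * x ** 2


print("prediction at X = 30 from rounded coefficients:", round(predict(30), 4))


def confidence_interval(fit, se, t_mult):
    """fit +/- t * se for the mean response."""
    return fit - t_mult * se, fit + t_mult * se


# ASSUMPTION: 13 residual df, so t_{0.975, 13} = 2.160 (inferred, not given)
lower, upper = confidence_interval(230.7445, 1.25, 2.160)
print("95% CI:", round(lower, 4), round(upper, 4))
```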
iii)
The residual plot for the quadratic model looks more random (no visible pattern) than that of the simple linear model, suggesting that the model assumptions are reasonably met.
Q4 (a)
The three best subset models according to the Cp criterion are:
Model1: log10(stay) = 0.61043 + 0.00388*Age + 0.00117*xrayrat + 0.00029261*census (Cp = 3.8112)
Model2: log10(stay) ~ Age + xrayrat + census + nurses (Cp = 3.8638)
Model3: log10(stay) ~ Age + xrayrat + beds + census (Cp = 4.2696)
The bias for Model1 is 0.942709.
Model2:
Coefficients:
Value Std. Error t value Pr(>|t|)
(Intercept) 0.6398 0.0904 7.0756 0.0000
age 0.0034 0.0016 2.0996 0.0406
xrayrat 0.0012 0.0004 2.8226 0.0067
census 0.0004 0.0001 4.0856 0.0002
nurses -0.0002 0.0001 -1.4110 0.1642
Residual standard error: 0.05475 on 52 degrees of freedom
Multiple R-Squared: 0.5369
Bias=1.872797
Model3:
Coefficients:
Value Std. Error t value Pr(>|t|)
(Intercept) 0.6348 0.1212 5.2373 0.0000
age 0.0037 0.0020 1.8122 0.0758
xrayrat 0.0014 0.0004 3.4043 0.0013
beds -0.0005 0.0002 -2.0792 0.0426
census 0.0008 0.0003 2.5271 0.0146
Residual standard error: 0.06303 on 51 degrees of freedom
Multiple R-Squared: 0.3486
Bias=0.5066972
The third model (Model3) has the smallest bias.
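For reference, Mallows' Cp for a subset model with p parameters (intercept included) is Cp = SSE_p / MSE_full - (n - 2p), and a common rule of thumb is that a model with little bias has Cp close to p. The output above gives neither SSE_p nor the full-model MSE, so the sketch below only checks the formula on made-up numbers and compares the reported Cp values with p (4 for Model1, 5 for Model2 and Model3). It does not reproduce the "Bias" figures above, whose computation is not shown.

```python
def mallows_cp(sse_p, mse_full, n, p):
    """Mallows' Cp for a subset model with p parameters (intercept included)."""
    return sse_p / mse_full - (n - 2 * p)


# Sanity check with made-up numbers: if SSE_p = (n - p) * sigma^2 exactly, Cp = p
check = mallows_cp(sse_p=(50 - 4) * 2.0, mse_full=2.0, n=50, p=4)
print("Cp for an exactly unbiased model with p = 4:", check)

# Reported Cp values from Q4 (a), with p counting the intercept
models = {"Model1": (3.8112, 4), "Model2": (3.8638, 5), "Model3": (4.2696, 5)}
for name, (cp, p) in models.items():
    print(f"{name}: Cp = {cp}, p = {p}, Cp - p = {cp - p:+.4f}")
```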