SOLUTIONS ASSIGNMENT 4

Q2 (a)

Stem-and-leaf for X1:

N = 24 Median = 5.4

Quartiles = 4.35, 6.35

Decimal point is at the colon

3 : 1579
4 : 025589
5 : 135689
6 : 02568
7 : 02
8 : 0

Stem-and-leaf for X2:

N = 24 Median = 25

Quartiles = 16.5, 33.5

Decimal point is 1 place to the right of the colon

0 : 579
1 : 1358
2 : 0133557
3 : 01334559
4 : 07

Stem-and-leaf plot for X3:

N = 24 Median = 6

Quartiles = 5, 7

Decimal point is at the colon

3 : 5
4 : 0349
5 : 000589
6 : 001447
7 : 00456
8 : 03

All three plots look roughly symmetric, with no obvious departure from normality and no obvious outliers.
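
The medians and quartiles reported with each display can be reproduced from the displays themselves. The Python sketch below decodes each stem-and-leaf display and takes the quartiles as the medians of the lower and upper halves of the sorted data. That quartile rule is an assumption (it happens to reproduce the reported values), and the helper names decode and summary are illustrative only.

```python
from statistics import median

def decode(rows, leaf_unit):
    """Turn {stem: "leaves"} rows into a sorted list of data values.

    leaf_unit is the value of one leaf digit: 0.1 when the decimal point
    is at the colon, 1 when it is one place to the right of the colon.
    """
    values = []
    for stem, leaves in rows.items():
        for leaf in leaves:
            values.append(round((stem * 10 + int(leaf)) * leaf_unit, 10))
    return sorted(values)

def summary(values):
    """Return (lower quartile, median, upper quartile) using half-medians."""
    half = len(values) // 2
    lower, upper = values[:half], values[-half:]
    return (
        round(median(lower), 10),
        round(median(values), 10),
        round(median(upper), 10),
    )

x1 = decode({3: "1579", 4: "025589", 5: "135689", 6: "02568", 7: "02", 8: "0"}, 0.1)
x2 = decode({0: "579", 1: "1358", 2: "0133557", 3: "01334559", 4: "07"}, 1)
x3 = decode({3: "5", 4: "0349", 5: "000589", 6: "001447", 7: "00456", 8: "03"}, 0.1)

for name, data in (("X1", x1), ("X2", x2), ("X3", x3)):
    print(name, "N =", len(data), "(Q1, median, Q3) =", summary(data))
```

Other quartile conventions (for example, interpolated quantiles) can give slightly different values on the same data.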

Q2 (b)

Correlation matrix of the explanatory variables:

          X1        X2        X3
X1 1.0000000 0.4669511 0.3227612
X2 0.4669511 1.0000000 0.2537530
X3 0.3227612 0.2537530 1.0000000

-Mild evidence of collinearity between X1 and X2, as shown by the slight linear pattern in their scatterplot and by r = 0.467 in the correlation matrix.

-X1 and X2 each show an obvious linear relationship with Y; X3 less so.
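
One way to put a number on "mild" is the variance inflation factor (VIF), which equals the corresponding diagonal entry of the inverse of the predictors' correlation matrix. The Python sketch below is a supplementary check, not part of the original solution. It inverts the 3x3 correlation matrix by cofactors; the function names det3 and vifs are illustrative.

```python
# Correlation matrix of X1, X2, X3 as reported in the solution.
R = [
    [1.0000000, 0.4669511, 0.3227612],
    [0.4669511, 1.0000000, 0.2537530],
    [0.3227612, 0.2537530, 1.0000000],
]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along row 0."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

def vifs(m):
    """VIF_j = j-th diagonal entry of the inverse correlation matrix."""
    d = det3(m)
    out = []
    for j in range(3):
        a, b = [k for k in range(3) if k != j]
        # Cofactor of the diagonal entry (j, j); its sign is always positive.
        minor = m[a][a] * m[b][b] - m[a][b] * m[b][a]
        out.append(minor / d)
    return out

for name, v in zip(("X1", "X2", "X3"), vifs(R)):
    print(f"VIF({name}) = {v:.3f}")
```

All three values come out below 1.5, which is consistent with only mild collinearity among the explanatory variables.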

Q2 (c)

*** Linear Model ***

Call: lm(formula = Y ~ X1 + X2 + X3, data = MathSalaries, na.action = na.exclude)

Residuals:

    Min      1Q  Median    3Q   Max
 -3.246 -0.9593 0.03772 1.199 3.309

Coefficients:

              Value Std. Error t value Pr(>|t|)
(Intercept) 17.8469     2.0019  8.9151   0.0000
X1           1.1031     0.3296  3.3471   0.0032
X2           0.3215     0.0371  8.6643   0.0000
X3           1.2889     0.2985  4.3184   0.0003

Residual standard error: 1.753 on 20 degrees of freedom

Multiple R-Squared: 0.9109

F-statistic: 68.12 on 3 and 20 degrees of freedom, the p-value is 1.124e-010

The fitted linear regression model is:

Y = 17.847 + 1.103 * X1 + 0.322 * X2 + 1.289 * X3
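
The pieces of the output are tied together by two standard identities: each t value is the estimate divided by its standard error, and the overall F statistic can be written in terms of R-squared. The Python sketch below recomputes both from the printed numbers and evaluates the fitted equation at one point. Small discrepancies are expected because the printed inputs are rounded. The evaluation point (the sample medians from part (a)) is chosen purely for illustration, and the function name predict is hypothetical.

```python
# Values copied from the regression output: (estimate, std. error, reported t).
coefs = {
    "(Intercept)": (17.8469, 2.0019, 8.9151),
    "X1": (1.1031, 0.3296, 3.3471),
    "X2": (0.3215, 0.0371, 8.6643),
    "X3": (1.2889, 0.2985, 4.3184),
}
r_squared = 0.9109
n = 24  # sample size, from the stem-and-leaf displays
k = 3   # number of explanatory variables

# Each t value is the estimate divided by its standard error.
t_values = {name: value / se for name, (value, se, _) in coefs.items()}

# Overall F statistic expressed through R-squared, on k and n - k - 1 df.
f_stat = (r_squared / k) / ((1 - r_squared) / (n - k - 1))

def predict(x1, x2, x3):
    """Fitted value using the rounded coefficients of the fitted model."""
    return 17.847 + 1.103 * x1 + 0.322 * x2 + 1.289 * x3

for name, t in t_values.items():
    print(f"{name:12s} t = {t:.3f} (reported {coefs[name][2]})")
print(f"F = {f_stat:.2f} on {k} and {n - k - 1} df (reported 68.12)")
print(f"Fitted Y at the sample medians (5.4, 25, 6): {predict(5.4, 25, 6):.3f}")
```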

Q2 (d)

Residuals:

[,1]

1 0.73589638

2 1.92686291

3 -0.09841255

4 3.30885958

5 -0.71424514

6 1.24977985

7 -2.11985473

8 1.98449816

9 -0.25009046

10 1.30089606

11 0.90628607

12 -3.23820780

13 -0.56287728

14 -1.19305333

15 -1.31156330

16 -0.75176635

17 0.17384822

18 0.54778307

19 -3.24628644

20 1.18273678

21 -0.8813598

22 -1.4479381

23 0.8148471

24 1.6833611

Residuals look symmetric and show no obvious sign of a lack of normality or any outliers.
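
The listed residuals can be cross-checked against the regression output in part (c). For a least-squares fit with an intercept the residuals sum to zero, and the residual standard error is the square root of SSE / (n - p), where p is the number of fitted coefficients. The Python sketch below performs both checks and recomputes the minimum, median and maximum from the residual summary. It is a consistency check on the printed numbers only.

```python
from math import sqrt
from statistics import median

# Residuals for observations 1 to 24, copied from the listing.
residuals = [
    0.73589638, 1.92686291, -0.09841255, 3.30885958, -0.71424514, 1.24977985,
    -2.11985473, 1.98449816, -0.25009046, 1.30089606, 0.90628607, -3.23820780,
    -0.56287728, -1.19305333, -1.31156330, -0.75176635, 0.17384822, 0.54778307,
    -3.24628644, 1.18273678, -0.8813598, -1.4479381, 0.8148471, 1.6833611,
]

n = len(residuals)
p = 4  # fitted coefficients: intercept plus three slopes

total = sum(residuals)                    # should be (numerically) zero
sse = sum(e * e for e in residuals)       # residual sum of squares
rse = sqrt(sse / (n - p))                 # residual standard error on n - p df

print(f"n = {n}, sum of residuals = {total:.6f}")
print(f"residual standard error = {rse:.3f} on {n - p} df")
print(f"min = {min(residuals):.3f}, median = {median(residuals):.5f}, "
      f"max = {max(residuals):.3f}")
```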

Q2 (e)

The residual plots against each X variable show no cone (funnel) shape, suggesting that the constant-variance assumption is reasonable. They also show no obvious curvature.

The same holds for the plot of the residuals against Y-hat.

Q2 (f)

Fitted model is:

Y ~ 1.2039*X1 -0.0232*X2 -0.0952*X3+0.004*X1X2 -0.0204*X1X3 0.0505*X2X3

Again, there is no obvious pattern in the residuals. However, had we plotted Y-hat from the original model against these interaction terms, we might have seen some indication of whether they should have been included in the model.

Q3 (a)

Fitted model is:

Y = 168.6 + 2.0344*X

QQ plot:

The QQ plot suggests that the error term is not normally distributed, since the points deviate from a straight line.

Q3 (b)

i)

Fitted model is:

Y = 158.2875 + 2.8547 * X - 0.0146 * X^2

p-values: 0.0014 for the X term, 0.2610 for the X^2 term

Since the p-value for the quadratic term in the ANOVA table is 0.26, well above 0.05, we do not reject the null hypothesis that the coefficient of the quadratic term is 0. There is therefore no evidence that the quadratic term is needed in the model.

ii)

When X = 30, the predicted Y is 230.7445, with se = 1.25.

The 95% CI is (228.0414, 233.4476).
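
The interval has the usual form estimate +/- (critical value) x se, so it should be centred on the prediction, and its half-width divided by the se gives the critical value that was used. The Python sketch below checks this using only the printed numbers. The implied multiplier is approximate because the se is rounded, and plugging X = 30 into the rounded coefficients does not reproduce the prediction exactly for the same reason. The function name fitted is illustrative.

```python
def fitted(x):
    """Quadratic fit from part (i), using the rounded printed coefficients."""
    return 158.2875 + 2.8547 * x - 0.0146 * x ** 2

pred, se = 230.7445, 1.25           # reported prediction and standard error
lower, upper = 228.0414, 233.4476   # reported 95% confidence limits

midpoint = (lower + upper) / 2      # should equal the prediction
half_width = (upper - lower) / 2
multiplier = half_width / se        # implied critical value (se is rounded)

print(f"fitted(30) from rounded coefficients = {fitted(30):.4f}")
print(f"midpoint of CI = {midpoint:.4f} (reported prediction {pred})")
print(f"half-width = {half_width:.4f}, implied multiplier = {multiplier:.3f}")
```

The implied multiplier is a little above 2, as expected for a t critical value at the 95% level with a modest number of residual degrees of freedom.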

iii)

This residual plot looks more random (no pattern) than the residual plot of the simple model, suggesting the assumptions are met.

Q4 (a)

The three best subset models according to the Cp criterion are:

Model1: log10(stay) ~ 0.61043 + 0.00388*Age + 0.00117*xrayrat + 0.00029261*census (Cp = 3.8112)

Model2: log10(stay) ~ Age + xrayrat + census + nurses (Cp = 3.8638)

Model3: log10(stay) ~ Age + xrayrat + beds + census (Cp = 4.2696)

The bias for model1 is 0.942709

Model2:

Coefficients:

             Value Std. Error t value Pr(>|t|)
(Intercept) 0.6398     0.0904  7.0756   0.0000
age         0.0034     0.0016  2.0996   0.0406
xrayrat     0.0012     0.0004  2.8226   0.0067
census      0.0004     0.0001  4.0856   0.0002
nurses     -0.0002     0.0001 -1.4110   0.1642

Residual standard error: 0.05475 on 52 degrees of freedom

Multiple R-Squared: 0.5369

Bias=1.872797

Model3:

Coefficients:

             Value Std. Error t value Pr(>|t|)
(Intercept) 0.6348     0.1212  5.2373   0.0000
age         0.0037     0.0020  1.8122   0.0758
xrayrat     0.0014     0.0004  3.4043   0.0013
beds       -0.0005     0.0002 -2.0792   0.0426
census      0.0008     0.0003  2.5271   0.0146

Residual standard error: 0.06303 on 51 degrees of freedom

Multiple R-Squared: 0.3486

Bias=0.5066972

Model3 has the smallest bias (0.5067, compared with 0.9427 for Model1 and 1.8728 for Model2).