STAT 2607
Assignment #4
DUE: Mon. March 23 in class (Sec. A)
     Tues. March 24 in class (Sec. B)
Approx. 100 marks
1. Problem 1 from Assignment 3, continued
The president of a small chain of corner stores would like to develop a first-order linear model to estimate total annual sales based on the number of sales agents (X1) and the advertising budget in $1000s (X2). A small sample of size 5 gave the following results.
N.B. Remember that in Assignment 3 you concluded that annual sales were linearly related to at least one of the explanatory variables.
a) Calculate the correlation coefficient between the explanatory variables X1 and X2.
b) Based on your correlation coefficient in (a), would interpreting b1 as the estimated change in the average value of Y for a unit increase in X1 make sense? Why or why not?
c) What feature of the value for b2 might have led you to suspect that X1 and X2 were highly correlated?
2. A regression model is to be developed to predict how well students will do in their first year at Carleton. The following variables were proposed as possibly being related to first-year GPA (Y).
X1 = age
X2 = high school average mark
X3 = gender
X4 = average no. of hours/week spent in Roosters
X5 = average no. of hours/week spent in class
X6 = average no. of hours/week spent on assignments
X7 = average no. of hours/week spent on studying
X8 = average no. of hours/week spent in paid employment
X9 = no. of courses
X10 = lives at home or not
a) Which explanatory variables are dummy variables?
b) Which ones might be subject to measurement error?
c) Which variables might be correlated with each other?
d) Think of at least 2 other variables that might (or should) be considered.
3. A study was conducted to examine the relationship between university salary Y, the faculty member's number of years of experience X1, and the faculty member's gender X2.
For the model where
a) Find the separate equations relating E(Y|X) to X1 for males and for females.
b) Set up the null and alternative hypotheses for testing whether the lines for males and females are parallel.
4. An experiment was conducted to examine the corrosion resistance of 4 different brands of outdoor lacquers for brass lamps. Each of the 4 lacquers was applied to its own independent random sample of brass lamps, and all the lamps were put outside. The number of weeks until the first sign of corrosion was recorded. The results are shown below.
Lacquer   Weeks until first sign of corrosion   Totals
A         43   44   45   41   35   46           254
B         28   29   31   24   25   27           164
C         39   36   41   43   29   35           232
D         25   36   28   27   31   34           181
a) Set up the ANOVA table and test whether there are any differences between the average times to corrosion among the 4 different lacquers. Use α = .05.
b) If appropriate, use the Tukey multiple comparison test to identify which lacquers differ in average corrosion time. Use α = .05. Make a line summary of your results.
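For anyone who wants to check the hand calculations for question 4 in SAS, a minimal sketch follows. The dataset name lacquer, the variable names brand and weeks, and the choice of PROC GLM are all assumptions made for illustration; the assignment does not specify a procedure. The observations are typed exactly as listed in the table. Note that the six values listed for lacquer C add up to 223, not the 232 shown under Totals, so output from this sketch may differ from hand calculations based on the printed total.

```sas
/* Sketch only: dataset and variable names are illustrative assumptions */
DATA lacquer;
  INPUT brand $ weeks @@;   /* trailing @@ reads several observations per data line */
  DATALINES;
A 43 A 44 A 45 A 41 A 35 A 46
B 28 B 29 B 31 B 24 B 25 B 27
C 39 C 36 C 41 C 43 C 29 C 35
D 25 D 36 D 28 D 27 D 31 D 34
;
RUN;

PROC GLM DATA=lacquer;
  CLASS brand;
  MODEL weeks = brand;                    /* one-way ANOVA table and overall F test */
  MEANS brand / TUKEY ALPHA=0.05 LINES;   /* Tukey comparisons shown as a line summary */
RUN;
QUIT;
```

The LINES option lists the brand means in descending order and joins means that are not significantly different, which corresponds to the line summary requested in part (b).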
5. A hospital administrator wished to study the relation between patient satisfaction (Y) and patient age (X1, in years), severity of illness (X2, an index), and anxiety level (X3, an index). Larger values of Y, X2 and X3 are associated with more satisfaction, increased severity of illness, and more anxiety, respectively. The administrator randomly selected 23 patients.
The data are in /CourseWare/Stat2607/patients.dat
Columns 1, 2, 3, 4 contain the variables Y, X1, X2, X3 respectively.
Remember the form of your INFILE statement is:
INFILE '/CourseWare/Stat2607/patients.dat';
Remember also to put the following 4 statements at the top of your program:
DM OUTPUT 'CLEAR';
OPTIONS PAGESIZE=40;
OPTIONS LINESIZE=80;
FOOTNOTE 'yourname, studno';
a) Write a SAS program to fit a first-order linear model in the 3 explanatory variables (see SAS Manual Sec. 13.6 & Ex. 13.6.1) and print out:
- the correlation matrix giving the pairwise correlation coefficients between all 4 variables (use the CORR option at the end of the PROC REG statement), i.e.
PROC REG LINEPRINTER CORR;
- the variance inflation factors (use the VIF option at the end of the MODEL statement; see p. 122, Ex. 13.7.1)
- the X'X, X'Y, and (X'X)^-1 matrices (use / xpx i at the end of the MODEL statement; see p. 119, Sec. 13.6 & Ex. 13.6.1)
N.B. You do NOT need the CLM option for this question.
- residual plots of the residuals ei vs the predicted values, and of the ei vs each of the explanatory variables. Note that you can do all this in the single statement plot r.*(p. x1 x2 x3); substituting whatever names you gave your explanatory variables. (See Sec. 13.6 & Sec. 13.4.)
- a histogram of the residuals (save them with the OUTPUT statement in PROC REG, then use PROC CHART as in Assignment 2).
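The pieces listed in part (a) could be assembled roughly as follows. This is a sketch under stated assumptions, not a model answer: the dataset names patients and resids and the residual variable resid are hypothetical, the variable names satisfied, age, illness and anxiety are borrowed from the output supplied for question 6, and list input is assumed to suit the layout of patients.dat (with the missing Y of observation 24 coded as a period).

```sas
DM OUTPUT 'CLEAR';
OPTIONS PAGESIZE=40;
OPTIONS LINESIZE=80;
FOOTNOTE 'yourname, studno';

/* Dataset name is an assumption; columns 1-4 hold Y, X1, X2, X3 */
DATA patients;
  INFILE '/CourseWare/Stat2607/patients.dat';
  INPUT satisfied age illness anxiety;
RUN;

PROC REG DATA=patients LINEPRINTER CORR;            /* CORR: correlation matrix */
  MODEL satisfied = age illness anxiety / VIF XPX I; /* VIFs, X'X and X'Y, inverse */
  PLOT R.*(P. age illness anxiety);                  /* residuals vs predicted and each X */
  OUTPUT OUT=resids R=resid;                         /* save residuals for the histogram */
RUN;
QUIT;

PROC CHART DATA=resids;
  VBAR resid;                                        /* histogram of the residuals */
RUN;
```

VBAR is used here because the histograms in the supplied output are vertical bar charts; follow whatever form was used in Assignment 2 if it differs.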
b) Write down the estimated regression function for this 3-variable model.
c) Based on the residual plot of the ei vs the predicted values and the histogram of the ei, do there seem to be any really serious assumption violations? Explain. Do the residual plots of the ei vs the explanatory variables indicate that any of these variables might need to be modified to correct an extremely obvious assumption violation? Explain.
N.B. For the rest of the question, assume that your residual plots and the histogram of the residuals did not show violations that are serious enough to invalidate estimation and inference.
d) For testing whether there is a linear relationship between Y and the explanatory variables at α = .01: write down the calculated value of the F-statistic along with its p-value. Based on the p-value, would you reject H0? Why or why not?
e) If appropriate, test whether X3 contributes significantly to the model after X1 and X2 have been included. Use α = .01.
f) Would you reject ? How about ? A formal test is not necessary. Just give the calculated value of the test statistic and the reason for your answer in terms of its p-value.
g) Use the information from the output of the xpx option to find the fitted equation for the SLR of Y on X3 (anxiety).
h) Use the output from the i option to show that = 0.821 as given in the parameter estimates table.
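Parts (g) and (h) rest on two standard least-squares identities, restated here as a reminder and not as the course's required derivation. The symbols n and c_{kk} are notation introduced for this note. Treating the quantity in (h) as the standard error of a coefficient is an assumption, because the symbol itself is missing from the question text.

```latex
% (g) SLR of Y on X_3, using the sums that appear in the X'X and X'Y output
b_1 = \frac{\sum X_3 Y - \dfrac{(\sum X_3)(\sum Y)}{n}}
           {\sum X_3^2 - \dfrac{(\sum X_3)^2}{n}},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}_3

% (h) Estimated standard error of b_k, where c_{kk} is the k-th diagonal
%     element of (X'X)^{-1} and MSE is the error mean square of the full model
s\{b_k\} = \sqrt{MSE \cdot c_{kk}}
```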
6. Use your output and results from question 5 and the output provided on the following pages to help answer the questions below.
a) Compare the estimated regression coefficients of X1 and X3 for the model containing only X1 and X3 with those found using the full model. What is the factor by which they have changed? Now compare the estimated regression coefficients of X1 and X2 for the model containing only X1 and X2 with those found using the full model. What is the factor by which they have changed?
b) Find SSR(X3 | X1, X2) and compare it with SSR(X3).
c) List all the indications of multicollinearity you can find between the 3 explanatory variables.
d) Would you conclude that, in the model containing all 3 X's, multicollinearity was a problem for interpretation of the regression coefficients?
e) Do X2 and X3 jointly make a significant contribution to the prediction of patient satisfaction in a model that includes X1? Use α = .05.
f) Which model would you choose? Why?
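Parts (b) and (e) both rely on extra sums of squares. As a hedged reminder, the relations below use the error sums of squares printed in the output that follows (MODEL2 for X1 and X2, MODEL3 for X1 alone); SSE(X1, X2, X3) is not supplied here and has to come from your own question 5 output. The notation F* for the test statistic is an assumption of this note.

```latex
% (b) Extra sum of squares for X_3 given X_1 and X_2
SSR(X_3 \mid X_1, X_2) = SSE(X_1, X_2) - SSE(X_1, X_2, X_3)
                       = 2063.99790 - SSE(X_1, X_2, X_3)

% (e) Partial F test of H_0: \beta_2 = \beta_3 = 0 in the full model
F^* = \frac{\left[\, SSE(X_1) - SSE(X_1, X_2, X_3) \,\right] / 2}
           {SSE(X_1, X_2, X_3) / 19},
\qquad SSE(X_1) = 2466.78154

% Error df of the full model: 23 - 4 = 19; compare F^* with F(0.95;\ 2,\ 19)
```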
The SAS System
The REG Procedure
Model: MODEL1
Dependent Variable: satisfied

Number of Observations Read                    24
Number of Observations Used                    23
Number of Observations with Missing Values      1

                        Analysis of Variance
                               Sum of           Mean
Source             DF         Squares         Square    F Value    Pr > F
Model               2      4063.98230     2031.99115      19.53    <.0001
Error              20      2081.23509      104.06175
Corrected Total    22      6145.21739

Root MSE          10.20107    R-Square    0.6613
Dependent Mean    61.34783    Adj R-Sq    0.6275
Coeff Var         16.62824

                        Parameter Estimates
                    Parameter      Standard
Variable     DF      Estimate         Error    t Value    Pr > |t|
Intercept     1     147.07512      16.73345       8.79      <.0001
age           1      -1.24336       0.29612      -4.20      0.0004
anxiety       1     -15.89064       8.25560      -1.92      0.0686

[Plot, MODEL1: residuals (vertical axis, -20 to 20) vs. Predicted Value of satisfied (PRED; horizontal axis, 35 to 85).]
[Histogram of the residuals: Frequency vs. Residual (midpoints -16, -8, 0, 8, 16). Five bars with frequencies 7, 6, 4, 4 and 2; their order along the axis is not recoverable from this copy.]

The REG Procedure
Model: MODEL2
Dependent Variable: satisfied

Number of Observations Read                    24
Number of Observations Used                    23
Number of Observations with Missing Values      1

                        Analysis of Variance
                               Sum of           Mean
Source             DF         Squares         Square    F Value    Pr > F
Model               2      4081.21949     2040.60975      19.77    <.0001
Error              20      2063.99790      103.19989
Corrected Total    22      6145.21739

Root MSE          10.15873    R-Square    0.6641
Dependent Mean    61.34783    Adj R-Sq    0.6305
Coeff Var         16.55924

                        Parameter Estimates
                    Parameter      Standard
Variable     DF      Estimate         Error    t Value    Pr > |t|
Intercept     1     166.59133      24.90844       6.69      <.0001
age           1      -1.26046       0.28919      -4.36      0.0003
illness       1      -1.08932       0.55139      -1.98      0.0622

[Plot, MODEL2: residuals (vertical axis, -20 to 20) vs. Predicted Value of satisfied (PRED; horizontal axis, 30 to 85).]
[Histogram of the residuals: Frequency vs. Residual (midpoints -16, -8, 0, 8, 16). Five bars with frequencies 8, 5, 4, 3 and 3; their order along the axis is not recoverable from this copy.]

The REG Procedure
Model: MODEL3
Dependent Variable: satisfied

Number of Observations Read                    24
Number of Observations Used                    23
Number of Observations with Missing Values      1

                        Analysis of Variance
                               Sum of           Mean
Source             DF         Squares         Square    F Value    Pr > F
Model               1      3678.43585     3678.43585      31.31    <.0001
Error              21      2466.78154      117.46579
Corrected Total    22      6145.21739

Root MSE          10.83816    R-Square    0.5986
Dependent Mean    61.34783    Adj R-Sq    0.5795
Coeff Var         17.66674

                        Parameter Estimates
                    Parameter      Standard
Variable     DF      Estimate         Error    t Value    Pr > |t|
Intercept     1     121.83182      11.04221      11.03      <.0001
age           1      -1.52704       0.27288      -5.60      <.0001

[Plot, MODEL3: residuals (vertical axis, -20 to 20) vs. Predicted Value of satisfied (PRED; horizontal axis, 35 to 80).]
[Histogram of the residuals: Frequency vs. Residual (midpoints -16, -8, 0, 8, 16). Five bars with frequencies 8, 7, 4, 3 and 1; their order along the axis is not recoverable from this copy.]

7. (Question 6 continued) Using SAS, fit your chosen model again, this time using the CLM and CLI options at the end of the MODEL statement (see Sec. 13.5, p. 120) to have confidence intervals printed for the mean values and the individual values of patient satisfaction.
For your chosen model:
a) Find a 95% C.I. estimate for β1.
b) Estimate with 95% confidence the average patient satisfaction for 35-year-old patients with a severity of illness level of 60 and an anxiety index of 2.0.
c) Predict with 95% confidence the patient satisfaction of Ms. Brown, if she is 48 years old and has X2 = 60 and X3 = 2.7.
N.B. Observation 24, which has a missing value for Y, is the one with X1 = 48, X2 = 60, X3 = 2.7.
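One way to obtain the intervals for question 7 is sketched below. It fits the full three-variable model purely for illustration (substitute your own chosen model), takes its variable names from the supplied output, and introduces the hypothetical datasets patients, newpt and both. The extra row has a missing Y, so it is left out of the fit but still receives a predicted value and intervals, which covers the combination in part (b); observation 24 in the data file plays the same role for part (c). CLB is an additional MODEL option, not mentioned in the assignment, that prints confidence limits for the parameter estimates.

```sas
/* Dataset names are assumptions; variable names follow the supplied output */
DATA patients;
  INFILE '/CourseWare/Stat2607/patients.dat';
  INPUT satisfied age illness anxiety;
RUN;

/* Hypothetical extra row for part (b): Y is missing, so it is not used in fitting */
DATA newpt;
  satisfied = .;
  age = 35;
  illness = 60;
  anxiety = 2.0;
RUN;

DATA both;
  SET patients newpt;   /* original 24 rows followed by the extra row */
RUN;

PROC REG DATA=both;
  /* CLB: limits for coefficients; CLM: mean response; CLI: individual prediction */
  MODEL satisfied = age illness anxiety / CLB CLM CLI;   /* 95% limits by default */
RUN;
QUIT;
```

In the printed table, read the CLM limits on the extra row for part (b) and the CLI limits on observation 24 for part (c).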