1. Body fat is an important health indicator in humans, but is difficult to measure. One direct measure is to immerse the person entirely in water, and measure the amount of water displaced. It would be preferable to accurately predict body fat from measurements taken more readily in the doctor's office.

In a 1974 study, body fat was determined for 252 volunteers using the immersion method. The following measurements were taken:

DENSITY   Body density
FAT       Body fat determined from underwater weighing
AGE       Age (years)
WEIGHT    Weight (lbs)
HEIGHT    Height (inches)
NECK      Neck circumference (cm)
CHEST     Chest circumference (cm)
ABDOMEN   Abdomen circumference (cm)
HIP       Hip circumference (cm)
THIGH     Thigh circumference (cm)
KNEE      Knee circumference (cm)
ANKLE     Ankle circumference (cm)
BICEPS    Biceps (extended) circumference (cm)
FOREARM   Forearm circumference (cm)
WRIST     Wrist circumference (cm)

DENSITY  FAT  AGE  WEIGHT  HEIGHT  NECK  CHEST  ABDOMEN  HIP  THIGH  KNEE  ANKLE  BICEPS  FOREARM  WRIST
1.0549   19.2 26   181.00  69.75   36.4  105.1  90.7     100.3 58.4   38.3  22.9   31.9    27.8     17.7

The data are on the class web page under “body composition”. Most of the SAS code required is also there – but it does need to be edited, so be careful.

We are going to focus on 4 predictors of FAT, which are all easily measured: WEIGHT, ABDOMEN, THIGH and WRIST.

You can also copy some of the SAS code directly out of this homework file.

To cut down on the amount of computation for this exercise, we are going to produce only a few of the plots you would normally use.

a. Plot FAT versus WEIGHT, including a LOESS smooth. Note any interesting features of this plot including curvature, outliers, or potential high leverage points. (Circle these points on your plot.)

Answer:

The relationship between FAT and WEIGHT is curvilinear. There is also a potential outlier and a potential high-leverage point (the person with extremely high weight).

Fit the linear regression of FAT on WEIGHT, ABDOMEN, THIGH and WRIST.

b. What is the fitted regression equation?

Answer:

The SAS output is:

Parameter Estimates

                Parameter   Standard
Variable   DF    Estimate      Error   t Value  Pr > |t|    Type I SS
Intercept   1   -35.11661    8.41381     -4.17    <.0001        92422
weight      1    -0.14355    0.03096     -4.64    <.0001   6593.01614
abdomen     1     0.97856    0.05607     17.45    <.0001   6042.72844
thigh       1     0.15849    0.10921      1.45    0.1480     81.35583
wrist       1    -1.09896    0.44668     -2.46    0.0146    116.29775

The fitted regression equation is:

Predicted FAT = -35.11661 - 0.14355*WEIGHT + 0.97856*ABDOMEN + 0.15849*THIGH - 1.09896*WRIST

c. What is the predicted value of FAT for a person who weighs 162 pounds and has an abdomen circumference of 85 cm, a thigh circumference of 50 cm, and a wrist circumference of 18 cm?

Answer:

The SAS output is:

Output Statistics

       Dependent  Predicted  Std Error
Obs     Variable      Value  Mean Predict     95% CL Mean
253            .    12.9482        0.8060   11.3608   14.5356

The predicted value is 12.9482.
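As a hand check of part c, the short Python sketch below (an illustration only; the assignment itself uses SAS) plugs the measurements into the fitted equation from part b. Because the coefficients are copied with only five decimals, the result should agree with the SAS value 12.9482 to about three decimal places, not exactly.

```python
# Coefficients copied (rounded) from the SAS "Parameter Estimates" table in part b.
COEF = {
    "intercept": -35.11661,
    "weight": -0.14355,
    "abdomen": 0.97856,
    "thigh": 0.15849,
    "wrist": -1.09896,
}


def predict_fat(weight, abdomen, thigh, wrist):
    """Evaluate the fitted regression equation for one person."""
    return (COEF["intercept"]
            + COEF["weight"] * weight
            + COEF["abdomen"] * abdomen
            + COEF["thigh"] * thigh
            + COEF["wrist"] * wrist)


yhat = predict_fat(weight=162, abdomen=85, thigh=50, wrist=18)
print(round(yhat, 4))  # 12.9491, vs. 12.9482 from SAS (which uses unrounded coefficients)
```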

d. What is an appropriate interval predictor for the FAT of a person with the measurements in part c? (Is this a confidence interval or a prediction interval?)

Answer:

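A prediction interval is appropriate, because we are predicting FAT for one new person rather than estimating the mean FAT of all people with these measurements. The 95% prediction interval is (4.1701, 21.7262).

The SAS output is:

       Dependent  Predicted  Std Error
Obs     Variable      Value  Mean Predict     95% CL Mean       95% CL Predict
253            .    12.9482        0.8060   11.3608   14.5356    4.1701   21.7262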
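The interval in part d can be reproduced from the SAS output: for a single new observation the standard error of prediction is sqrt(MSE + (Std Error Mean Predict)²). The Python sketch below is illustrative only; the value 1.9696 for t(0.975; 247) is an approximation we supply (SAS does not print it), so the limits match SAS only up to rounding.

```python
import math

# Values copied from the SAS output for observation 253 and the ANOVA table.
Y_HAT = 12.9482   # predicted FAT
SE_FIT = 0.8060   # "Std Error Mean Predict"
MSE = 19.21292    # error mean square, 247 df
T_975 = 1.9696    # assumed: approximate t(0.975; 247), not printed by SAS

# A new observation varies around its mean with variance MSE, so the
# prediction standard error adds MSE to the variance of the fitted mean.
se_pred = math.sqrt(MSE + SE_FIT ** 2)
lower = Y_HAT - T_975 * se_pred
upper = Y_HAT + T_975 * se_pred
print(f"95% prediction interval: ({lower:.4f}, {upper:.4f})")
```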

e. Compute a 95% interval for the mean FAT of people with the measurements in part c.

Answer:

The SAS output is:

Output Statistics

       Dependent  Predicted  Std Error
Obs     Variable      Value  Mean Predict     95% CL Mean
253            .    12.9482        0.8060   11.3608   14.5356

The 95% interval for the mean FAT is (11.3608, 14.5356).
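For the mean response only the standard error of the fitted value enters, so this interval is narrower than the prediction interval in part d. A minimal Python check, again assuming t(0.975; 247) ≈ 1.9696 (our approximation, not SAS output):

```python
# Values copied from the SAS output for observation 253.
Y_HAT = 12.9482   # predicted FAT
SE_FIT = 0.8060   # "Std Error Mean Predict"
T_975 = 1.9696    # assumed: approximate t(0.975; 247)

# The mean response has no extra MSE term, so only SE_FIT is used.
half_width = T_975 * SE_FIT
ci = (Y_HAT - half_width, Y_HAT + half_width)
print(f"95% CI for the mean: ({ci[0]:.4f}, {ci[1]:.4f})")
```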

f. Is the prediction in part c an extrapolation of the regression beyond the range of the data? Justify your answer.

Answer:

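No, the prediction in part c is not an extrapolation beyond the range of the data. The leverage of the new point can be recovered from the SAS output as h_new = (Std Error Mean Predict)² / MSE = 0.8060² / 19.21292 ≈ 0.034. This is below the rule-of-thumb high-leverage cutoff 2p/n = 2(5)/252 ≈ 0.040 and well within the range of leverages in the data (more than fifteen observations exceed 0.04; see part i), so the new point lies within the bulk of the observed predictor values.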
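The leverage argument can be checked numerically. This Python sketch uses the identity Var(Ŷ_new) = MSE × h_new to back out h_new from the SAS output and compares it with the 2p/n rule of thumb; using 2p/n as the yardstick is our assumption, and comparing h_new with the largest leverage in the data is an equally common check.

```python
SE_FIT = 0.8060    # "Std Error Mean Predict" for the new point (obs 253)
MSE = 19.21292     # error mean square from the ANOVA table
p, n = 5, 252      # 4 slopes + intercept, 252 observations

# Var(Y_hat_new) = MSE * h_new, so the leverage can be backed out directly.
h_new = SE_FIT ** 2 / MSE
cutoff = 2 * p / n   # rule-of-thumb threshold for high leverage (assumed yardstick)
print(f"h_new = {h_new:.4f}, 2p/n = {cutoff:.4f}")
print("inside the bulk of the data" if h_new < cutoff else "check for hidden extrapolation")
```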

Save the studentized residuals and predicted values.

Plot the studentized residuals versus WEIGHT using PROC LOESS and GPLOT to include the loess fit. (There is no point in plotting inside PROC REG if you want to add the LOESS fit.)

g. What are the important features of the residual plot? Circle any points of special interest.

Answer:

The residual plot of the studentized residuals versus WEIGHT is:

The LOESS line shows some curvature, but the curvature is driven by the outlier. Noting the curvature is an acceptable answer, but the better answer is that the residuals look fine except for one point that has both high leverage and a large studentized residual.

h. Assess the normality of the residuals using the normal probability plot of residuals. (This can be done as part of PROC REG.) Hand in this plot.

Answer:

The normal probability plot is:

i. Plot leverage and Cook's distance against observation number and assess whether there are any data points which unduly affect the analysis.

Answer:

The leverage plot is:

The plot of Cook's distance against observation number is:

On the leverage plot, the reference line is at 2p/n = 2(5)/252 ≈ 0.04; more than fifteen observations lie above it and so have high leverage. Nevertheless, none of them has a Cook's distance larger than 1. Based on these two plots, no data point unduly affects the analysis. However, one point is much more influential than the others: the person with extremely high weight.
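Cook's distance combines the two quantities discussed in parts g and i. With an internally studentized residual r_i (what SAS's STUDENT keyword gives) and leverage h_ii, D_i = (r_i² / p) × h_ii / (1 - h_ii). The Python sketch below evaluates this formula and the 2p/n leverage cutoff; the residual and leverage values passed in are hypothetical, not taken from the body-fat data.

```python
def cooks_distance(student_resid, leverage, p):
    """Cook's D from an internally studentized residual r_i and leverage h_ii:
    D_i = (r_i**2 / p) * h_ii / (1 - h_ii)."""
    return (student_resid ** 2 / p) * leverage / (1 - leverage)


p, n = 5, 252                      # parameters (incl. intercept), observations
print(round(2 * p / n, 4))         # leverage cutoff 2p/n -> 0.0397

# Hypothetical points, not from the body-fat data:
print(round(cooks_distance(2.5, 0.20, p), 4))  # large residual, moderate leverage -> 0.3125
print(round(cooks_distance(3.0, 0.50, p), 4))  # large residual AND high leverage -> 1.8
```

Only a point that is extreme in both respects pushes D_i past the D > 1 rule of thumb used above.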

j. Compute the following Sums of Squares:

Answer:

The SAS output is:

Analysis of Variance

                           Sum of          Mean
Source            DF      Squares        Square    F Value    Pr > F
Model              4        12833    3208.34954     166.99    <.0001
Error            247   4745.59168      19.21292
Corrected Total  251        17579

Root MSE            4.38325    R-Square    0.7300
Dependent Mean     19.15079    Adj R-Sq    0.7257
Coeff Var          22.88811

Parameter Estimates

                Parameter   Standard
Variable   DF    Estimate      Error   t Value  Pr > |t|    Type I SS
Intercept   1   -35.11661    8.41381     -4.17    <.0001        92422
weight      1    -0.14355    0.03096     -4.64    <.0001   6593.01614
abdomen     1     0.97856    0.05607     17.45    <.0001   6042.72844
thigh       1     0.15849    0.10921      1.45    0.1480     81.35583
wrist       1    -1.09896    0.44668     -2.46    0.0146    116.29775

Parameter Estimates

                                               Squared       Squared
                            Standardized  Semi-partial       Partial
Variable   DF   Type II SS      Estimate   Corr Type I   Corr Type I
Intercept   1    334.68264             0             .             .
weight      1    413.07494      -0.50413       0.37505       0.37505
abdomen     1   5851.79119       1.26086       0.34375       0.55004
thigh       1     40.46255       0.09943       0.00463       0.01646
wrist       1    116.29775      -0.12260       0.00662       0.02392

SSE(full) = 4745.59168

SSR(WEIGHT) = Type I SS(WEIGHT) = 6593.01614

SSR(WEIGHT ABDOMEN) = SSR(WEIGHT) + SSR(ABDOMEN | WEIGHT) = 6593.01614 + 6042.72844 = 12635.74458

SSR(WEIGHT | ABDOMEN THIGH WRIST) = Type II SS(WEIGHT) = 413.07494

SSR(THIGH | WEIGHT ABDOMEN) = Type I SS(THIGH) = 81.35583
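The identities above follow a simple bookkeeping rule: Type I (sequential) sums of squares add up in the order the variables appear in the MODEL statement, while Type II SS give each variable's extra sum of squares given all the others. The Python sketch below only reproduces that arithmetic from the printed values; the dictionary and variable names are ours, for illustration.

```python
# Type I (sequential) and Type II (partial) SS copied from the SAS output;
# the Type I values depend on the order WEIGHT, ABDOMEN, THIGH, WRIST.
TYPE1 = {"weight": 6593.01614, "abdomen": 6042.72844,
         "thigh": 81.35583, "wrist": 116.29775}
TYPE2 = {"weight": 413.07494, "abdomen": 5851.79119,
         "thigh": 40.46255, "wrist": 116.29775}
SSE_FULL = 4745.59168

ssr_weight = TYPE1["weight"]                                # SSR(WEIGHT)
ssr_weight_abdomen = TYPE1["weight"] + TYPE1["abdomen"]     # SSR(WEIGHT ABDOMEN)
ssr_weight_given_rest = TYPE2["weight"]                     # SSR(WEIGHT | ABDOMEN THIGH WRIST)
ssr_thigh_given_wa = TYPE1["thigh"]                         # SSR(THIGH | WEIGHT ABDOMEN)
ssr_full = sum(TYPE1.values())                              # model SS (printed as 12833)

print(f"{ssr_weight_abdomen:.5f}")          # 12635.74458
print(f"{ssr_full + SSE_FULL:.2f}")         # about 17579 = Corrected Total
```

For the last variable in the model (WRIST) the Type I and Type II SS coincide, since both measure its contribution given all the other predictors.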

k. Compute the additional percentage of variance explained by THIGH and WRIST when WEIGHT and ABDOMEN are already in the model.

Answer:

SSR(THIGH WRIST | WEIGHT ABDOMEN)

= SSR(WEIGHT ABDOMEN THIGH WRIST) - SSR(WEIGHT ABDOMEN)

= (6593.01614 + 6042.72844 + 81.35583 + 116.29775) - 12635.74458

= 197.65358

Thus, the additional percentage of variance explained by THIGH and WRIST is SSR(THIGH WRIST | WEIGHT ABDOMEN) / SSTO = 197.65358 / 17579 = 0.0112, i.e., 1.12%.
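Equivalently, the extra sum of squares is the sum of the last two Type I SS, and dividing by SSTO gives the increase in R². It also matches the sum of the squared semi-partial (Type I) correlations for THIGH and WRIST, 0.00463 + 0.00662 ≈ 0.0112. A short Python check of the arithmetic (names are ours):

```python
SSR_THIGH_GIVEN_WA = 81.35583    # Type I SS for THIGH
SSR_WRIST_GIVEN_WAT = 116.29775  # Type I SS for WRIST
SSTO = 17579                     # Corrected Total SS (as printed)

extra = SSR_THIGH_GIVEN_WA + SSR_WRIST_GIVEN_WAT   # SSR(THIGH WRIST | WEIGHT ABDOMEN)
print(f"{extra:.5f}")                     # 197.65358
print(f"{100 * extra / SSTO:.2f}%")       # 1.12%
```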

l. What is the total R² of this regression?

Answer:

The total R² of this regression is 0.7300 (R-Square in the SAS output above).
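R² and adjusted R² follow directly from the ANOVA table. In this Python sketch, SSR is taken as the sum of the Type I SS (SAS prints it rounded as 12833), and p = 5 counts the intercept:

```python
SSR = 12833.39816   # model SS: sum of the Type I SS (printed rounded as 12833)
SSE = 4745.59168
SSTO = 17579        # Corrected Total SS, as printed
n, p = 252, 5       # observations, parameters including the intercept

r2 = SSR / SSTO
adj_r2 = 1 - (SSE / (n - p)) / (SSTO / (n - 1))
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")   # 0.7300 and 0.7257
```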

m. Do a simultaneous test of whether all the regression slopes are zero. Be explicit about the null and alternative hypotheses.

Answer:

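H0: β1 = β2 = β3 = β4 = 0, where β1, ..., β4 are the slopes of WEIGHT, ABDOMEN, THIGH and WRIST

Ha: not all βk (k = 1, ..., 4) are equal to 0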

The SAS output is:

Analysis of Variance

                           Sum of          Mean
Source            DF      Squares        Square    F Value    Pr > F
Model              4        12833    3208.34954     166.99    <.0001
Error            247   4745.59168      19.21292
Corrected Total  251        17579

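The test statistic is F = MSR/MSE = 3208.34954/19.21292 = 166.99 on 4 and 247 degrees of freedom, and the p-value is less than 0.0001. We reject the null hypothesis and conclude that not all βk are equal to 0.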
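The F statistic is the ratio of the model and error mean squares. This Python sketch checks only the arithmetic; it does not compute the p-value, which needs the F(4, 247) distribution (for example from a statistics library).

```python
SSR = 12833.39816   # model SS (sum of the Type I SS)
SSE = 4745.59168    # error SS
df_model, df_error = 4, 247

msr = SSR / df_model   # 3208.34954
mse = SSE / df_error   # 19.21292
F = msr / mse
print(f"F = {F:.2f} on ({df_model}, {df_error}) df")   # F = 166.99 on (4, 247) df
```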

n. Compute a 95% confidence interval for the regression coefficient of THIGH when WEIGHT, ABDOMEN and WRIST are in the model.

Answer:

The SAS output is:

Parameter Estimates

Variable   DF     95% Confidence Limits
Intercept   1    -51.68858    -18.54465
weight      1     -0.20453     -0.08258
abdomen     1      0.86812      1.08899
thigh       1     -0.05662      0.37360
wrist       1     -1.97874     -0.21918

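The interval is (-0.05662, 0.37360). Because it contains 0, THIGH does not contribute significantly at the 5% level once WEIGHT, ABDOMEN and WRIST are in the model, consistent with its p-value of 0.1480.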

Alternatively, this can be computed from the parameter estimates and their standard errors.

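b3 ± t(0.975; 247) se(b3) = 0.15849 ± 1.9696 × 0.10921 = 0.15849 ± 0.21510 = (-0.05661, 0.37359), which matches the SAS limits up to rounding. (Approximating t(0.975; 247) by 1.96 gives nearly the same interval.)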

SAS code:

options ps=60 ls=80;
data body;
infile "h:\bodycomp.txt";
input density fat age weight height neck chest abdomen hip thigh knee ankle biceps forearm wrist;
run;
symbol1 color=blue value=square;
symbol2 color=black value=dot;
axis1 label=(angle=90); *Vertical axis referenced by VAXIS=AXIS1 below;
proc loess data=body; *Plot Fat vs. Weight with Loess Curve;
model fat=weight;
ods output OutputStatistics=Results1;
run;
proc gplot data=Results1;
plot depvar*weight pred*weight="o"/vaxis=axis1 hm=3 vm=3 overlay;
run;
proc reg data=body; *Fit the Regression Model;
model fat=weight abdomen thigh wrist/all CLM CLI influence SS1 SS2;
plot STUDENT.*NPP.;
plot H.*OBS. COOKD.*OBS.;
*Save studentized residuals (R) and predicted values (P), replacing data set BODY;
output out=body STUDENT=R P=P;
run;
data newx; *Insert a New Observation;
input weight abdomen thigh wrist;
cards;
162 85 50 18
;
run;
data both;
set body newx;
run;
proc print data=both;
run;
proc reg data=both; *Obs 253 has FAT missing, so it gets CLM and CLI limits without affecting the fit;
model fat=weight abdomen thigh wrist/CLM CLI;
run;
proc loess data=body; *Loess Regression of Studentized Residuals on Weight;
model R=weight/smooth=0.75;
ods output OutputStatistics=Results1;
run;
proc gplot data=Results1;
plot depvar*weight pred*weight="o"/vaxis=axis1 hm=3 vm=3 overlay;
run;
