ST 514 – Exam 1

Name______9/26/07

You must show your work to receive any credit for non-multiple choice quantitative problems. Giving assistance to, or receiving assistance from, other students is not permitted. Good luck!

1) The average wind speed in Honolulu, Hawaii is 18.2 km/h. Assume that the wind speed is normally distributed with a standard deviation of 5.6 km/h.

a) What is the probability that the wind speed on any one reading will exceed 20.2 km/h?

P(y > 20.2) = P(z > [20.2 - 18.2]/5.6) = P(z > 0.3571) = 0.5 - P(0 < z < 0.3571) = 0.5 - 0.14 = 0.36.
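The table lookup above can be checked numerically. This is a minimal sketch using Python's standard library; the variable names are illustrative only, and the exact value differs slightly from the answer because the table value 0.14 is rounded.

```python
from statistics import NormalDist

# Wind speed model from the problem: mean 18.2 km/h, sd 5.6 km/h.
wind = NormalDist(mu=18.2, sigma=5.6)

z = (20.2 - 18.2) / 5.6          # standardized cutoff
p_single = 1 - wind.cdf(20.2)    # P(y > 20.2)

print(f"z = {z:.4f}, P(y > 20.2) = {p_single:.4f}")
```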

b) Now say that instead of a single measurement, we take the average of 365 measurements. Is the probability that this average will exceed 20.2 km/h greater than or less than your answer to part (a)? Explain.

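The probability is less than 0.36. Assuming independent readings, the mean of 365 measurements has standard deviation 5.6/sqrt(365) ≈ 0.29 km/h, far smaller than the 5.6 km/h of a single reading. The average therefore stays much closer to the true mean of 18.2 km/h and is far less likely to exceed 20.2 km/h.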
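To see how much smaller the probability becomes, the sketch below compares the two cases. It assumes the 365 readings are independent, which the problem does not state explicitly; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 18.2, 5.6, 365

# Standard error of the mean of n readings (independence is an assumption).
se = sigma / sqrt(n)

p_single = 1 - NormalDist(mu, sigma).cdf(20.2)  # one reading
p_mean = 1 - NormalDist(mu, se).cdf(20.2)       # average of n readings

print(f"standard error = {se:.3f} km/h")
print(f"P(single reading > 20.2) = {p_single:.4f}")
print(f"P(mean of {n} readings > 20.2) = {p_mean:.2e}")
```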

2) The number of car accidents on a particular stretch of highway seems to be related to the number of vehicles that travel over it and the speed at which they are traveling. A city alderman has decided to ask the county sheriff to provide him with data covering the last few years, with the intention of examining these data using multiple regression to determine whether the number of vehicles and their speed influence the number of accidents.

a) What is (are) the independent variable(s)? Number of vehicles and speed.

b) What is (are) the dependent variable(s)? Number of accidents.

c) Is this an experiment or an observational study? Why?

Observational. We do not control the number of vehicles or the speed of traffic; we only passively observe them.

3) Circle the correct response for the multiple choice questions below.

(a) The intercept in Plot 1 is: a) -1.00   b) 0.00   c) 1.00

(b) The slope in Plot 1 is: a) -0.50   b) 0.00   c) 0.50

(c) The correlation between x and y in Plot 1 is: a) -0.50   b) 0.00   c) 0.50

(d) R-Squared in Plot 1 is: a) -0.25   b) 0.00   c) 0.25

(e) The intercept in Plot 2 is: a) -1.00   b) 0.00   c) 1.00

(f) The slope in Plot 2 is: a) -0.50   b) 0.00   c) 0.50

(g) The correlation between x and y in Plot 2 is: a) -0.50   b) 0.00   c) 0.50

(h) R-Squared in Plot 2 is: a) -0.25   b) 0.00   c) 0.25

4) Drinking moderate amounts of wine may reduce the risk of coronary artery disease in some individuals. One possible reason for this is that red wine contains polyphenols, and polyphenols help serum cholesterol profiles. In an experiment involving 9 men, the subjects drank half a bottle of red wine each day for two weeks. Levels of polyphenols in blood samples were measured at the beginning and end of the experiment. The sample mean and standard deviation of percent change in polyphenols are 5.5 and 2.51, respectively.

a) Calculate a 95% confidence interval for the mean percent change in polyphenols if all men drank this amount of red wine.

The t-cutoff with 8 degrees of freedom is 2.306. The interval is xbar ± t*s/sqrt(n) = 5.5 ± 2.306*2.51/3 = 5.5 ± 1.93, i.e., (3.57, 7.43).
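The arithmetic for this interval can be reproduced as follows. This is a small sketch that takes the t-cutoff from the table, as the answer does, rather than computing it; the variable names are illustrative.

```python
from math import sqrt

n, xbar, s = 9, 5.5, 2.51
t_cutoff = 2.306  # table value for 8 degrees of freedom, as used in the answer

half_width = t_cutoff * s / sqrt(n)
lower, upper = xbar - half_width, xbar + half_width

print(f"half-width = {half_width:.2f}")
print(f"95% CI = ({lower:.2f}, {upper:.2f})")
```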

b) Explain what is meant by a 95% confidence interval.

If we repeated this experiment many times and computed a 95% confidence interval each time, about 95% of those intervals would cover the population mean.
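The repeated-sampling interpretation can be illustrated with a small simulation. The population mean and standard deviation below are hypothetical values chosen only for illustration, and the data are assumed to be normal; only the sample size and t-cutoff come from the problem.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(514)  # arbitrary seed so the run is repeatable

# Hypothetical population, chosen only for illustration.
true_mean, true_sd = 5.0, 2.5
n, t_cutoff = 9, 2.306  # t table value for 8 degrees of freedom
reps = 10_000

covered = 0
for _ in range(reps):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    half_width = t_cutoff * stdev(sample) / sqrt(n)
    # The interval covers the true mean when the sample mean is within half_width of it.
    if abs(mean(sample) - true_mean) <= half_width:
        covered += 1

coverage = covered / reps
print(f"fraction of intervals covering the true mean: {coverage:.3f}")
```

The printed fraction should land close to 0.95, though it will vary slightly with the seed and the number of repetitions.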

5) Consider the data set with four observations:

X / -2 / -1 / 0 / 1
Y / 0 / 6 / 2 / 4

a) Assuming the regression line has intercept 2 and slope 1, compute the predicted value and error for each observation.

X / -2 / -1 / 0 / 1
Y / 0 / 6 / 2 / 4
Predicted Value / 2-2*1=0 / 2-1*1=1 / 2+0*1=2 / 2+1*1=3
Error (residual) / 0-0=0 / 6-1=5 / 2-2=0 / 4-3=1

b) Assuming the regression line has intercept 4 and slope 1, compute the predicted value and error for each observation.

X / -2 / -1 / 0 / 1
Y / 0 / 6 / 2 / 4
Predicted Value / 4-2*1=2 / 4-1*1=3 / 4+0*1=4 / 4+1*1=5
Error (residual) / -2 / 3 / -2 / -1

c) Using the least squares criterion, which line provides a better fit to the data? Why?

SSE for line a) is 0*0+5*5+0*0+1*1=26. SSE for line b) is 2*2+3*3+2*2+1*1=18. Since line b) has a smaller SSE it provides a better fit to the data.
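The two sums of squared errors can be verified with a few lines of code. This is a sketch only; the helper name `sse` is illustrative.

```python
x = [-2, -1, 0, 1]
y = [0, 6, 2, 4]

def sse(intercept, slope):
    """Sum of squared errors for the line y = intercept + slope * x."""
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return sum(r * r for r in residuals)

sse_a = sse(2, 1)  # line from part (a)
sse_b = sse(4, 1)  # line from part (b)
print(sse_a, sse_b)  # prints: 26 18
```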

6) Assuming A = and B = , explain why each matrix below is or is not well-defined and compute the matrices when possible.

a) A+B is undefined because A and B have different dimensions.

b) BA =

7) (4pt) Below is SAS output and a scatterplot of the DVD data we analyzed in class. The data set contains data for 70 movies. The independent variable is box office sales and the dependent variable is DVD sales. The dashed lines are the confidence bounds for the mean of y given x.

Variable DF Parameter Estimate Standard Error t Value Pr > |t|

Intercept 1 2.35229 0.34233 6.87 <.0001

Box 1 0.09965 0.00914 10.91 <.0001

a) What is the estimated intercept? Interpret this value in the context of the problem.

The estimated intercept is 2.35229. This is the estimated average DVD sales of a movie with no box office sales.

b) What is the estimated slope? Interpret this value in the context of the problem.

The estimated slope is 0.09965. This is the increase in expected DVD sales for a $1M increase in box office sales.

c) The 95% confidence interval for the mean DVD sales of a movie with Box = 48 is (6.5, 7.8). What is the width of the 95% prediction interval for a particular movie with Box = 48?

Greater than 1.3. Intervals for particular values are always wider than intervals for the mean value.
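The 1.3 in this answer is the width of the given interval for the mean. The sketch below computes that width and, as a sanity check, the fitted mean at Box = 48 from the SAS estimates. The function name `fitted_mean` is illustrative, and the prediction interval itself is not computed because the output shown does not include the residual variance.

```python
intercept, slope = 2.35229, 0.09965  # estimates from the SAS output

def fitted_mean(box):
    """Estimated mean DVD sales for a given box office value."""
    return intercept + slope * box

ci_lower, ci_upper = 6.5, 7.8   # interval for the mean at Box = 48, from part (c)
ci_width = ci_upper - ci_lower

print(f"fitted mean at Box = 48: {fitted_mean(48):.2f}")
print(f"width of interval for the mean: {ci_width:.1f}")
```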

d) What is the width of the 95% confidence interval for the mean DVD sales of a movie with Box = 200?

Greater than 1.3. Intervals get wider as you move away from the center of the x’s.

8) (4pt) On January 28, 1986 the space shuttle Challenger exploded. Seven astronauts died because two rubber O-rings leaked during takeoff. These rings had lost their resiliency because of the low temperature at the time of the flight. The air temperature was about 0° Celsius, and the temperature of the O-rings about 6 degrees below that. A scatterplot below suggests that there is a link between O-ring damage and ambient temperature during 13 previous launches. (Data taken from

We analyze these data using simple linear regression. The SAS output is:

Variable DF Parameter Estimate Standard Error t Value Pr > |t|

Intercept 1 9.95977 2.52583 3.94 0.0007

Temperature 1 -0.41453 0.12079 -3.43 0.0025

a) Give a 95% confidence interval for the expected change in damage due to a one-degree increase in temperature.

The t-cutoff with 11 degrees of freedom is 2.201. So a 95% CI for the expected change in damage due to a one-degree increase in temperature is -0.41453 ± 2.201*0.12079 = -0.415 ± 0.266, or about (-0.68, -0.15).
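The interval for the slope can be reproduced from the SAS estimates. This sketch uses the table cutoff quoted in the answer and keeps the negative sign of the estimate, since the question asks about a one-degree increase; the variable names are illustrative.

```python
slope, se_slope = -0.41453, 0.12079  # Temperature row of the SAS output
t_cutoff = 2.201                     # table value for 11 degrees of freedom, as in the answer

margin = t_cutoff * se_slope
lower, upper = slope - margin, slope + margin

print(f"margin = {margin:.3f}")
print(f"95% CI for the slope: ({lower:.2f}, {upper:.2f})")
```

Both endpoints are negative, so the interval excludes zero.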

b) Conduct a hypothesis test (with α=0.05) to show that there is a statistically significant association between temperature and damage. State the hypotheses and summarize your findings in the context of the problem.

The hypotheses are

Ho: temp and damage are uncorrelated (i.e., slope = 0)

Ha: temp and damage are correlated (i.e., slope not equal 0)

The p-value is 0.0025 < 0.05, so we reject the null hypothesis and conclude that there is a statistically significant association between damage and temperature. The negative slope estimate indicates that damage tends to decrease as temperature increases.
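The test statistic in the SAS output is the slope estimate divided by its standard error. The sketch below recomputes it and applies the decision rule; the p-value is taken from the output as reported rather than recomputed, and the variable names are illustrative.

```python
slope, se_slope = -0.41453, 0.12079  # Temperature row of the SAS output
alpha, p_value = 0.05, 0.0025        # p-value as reported by SAS

t_stat = slope / se_slope            # tests H0: slope = 0
reject_null = p_value < alpha

print(f"t = {t_stat:.2f}, reject H0: {reject_null}")
```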

c) Now ignore the data and assume that there is truly no correlation between damage and temperature. What is the probability of incorrectly rejecting the hypothesis that damage and temperature are uncorrelated using the test in b)? What type of error (I or II) is this?

Rejecting the null when it is true is a Type I error. The probability of a Type I error is always α. In this case we used α=0.05.