Stat 511 Fall 2005

Midterm 1

Oct. 17, 2005

The following rules apply.

1. You may use 3 sheets of paper with any information you need (double-sided, any font).

2. You may use a calculator.

3. You may not collaborate or copy.

4. You may not use outside resources, such as the internet. In addition, you may not store notes or formulas on your calculator.

5. Failure to comply with item 3 could lead to reduction in your grade, or disciplinary action.

I have read the rules above and agree to comply with them.

Signature ________________________________________________

Name (printed) ___________________________________________


1. In a study of air quality in a large urban center, measurements were taken daily in July 2004, which was a “typical” year for this city. The following variables were measured:

Ozone: ozone at ground level (parts per billion)

Solar.R: solar radiation measured at ground level (langs)

WindSpeed: maximum wind speed (miles per hour)

a) For each Normal Probability plot below, sketch a histogram. Put the approximate minimum, median and maximum values of the variables on the x-axes of your histograms.

[Figure panels: Normal probability plots for Ozone and Solar.R]


WindSpeed was right-skewed, with a long tail to the right, so the investigator decided to take logarithms. However, to avoid problems on days with zero wind speed, the transformation actually used was:

LogWind=log(WindSpeed+1)

The basic statistical measures for LogWind are printed below.

Basic Statistical Measures (N = 31)

Location                 Variability

Mean      2.251472       Std Deviation          0.30800
Median    2.261763       Variance               0.09487
Mode      2.128232       Range                  1.13708
                         Interquartile Range    0.40968

Quantile         Estimate

100% Max         2.76632
99%              2.76632
95%              2.76632
90%              2.72785
75% Q3           2.47654
50% Median       2.26176
25% Q1           2.06686
10%              1.80829
5%               1.72277
1%               1.62924
0% Min           1.62924

b) About what percentage of days in July would you estimate to have maximum wind speed greater than 10 miles per hour? A histogram of the data is shown below. You may assume that the data are i.i.d. Normal.
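One way to carry out the calculation in part b is to move the 10 mph threshold onto the LogWind scale and standardize it. The sketch below assumes that `log` is the natural logarithm (exp(2.25) - 1 is about 8.5 mph, which is plausible, whereas base 10 would imply wind speeds in the hundreds of mph) and treats the sample mean and standard deviation as if they were the population values.

```python
import math
from statistics import NormalDist

# Summary statistics for LogWind = log(WindSpeed + 1), from the output above.
mean_logwind = 2.251472
sd_logwind = 0.30800

# Assumption: natural log. WindSpeed > 10  <=>  LogWind > log(10 + 1).
threshold = math.log(10 + 1)

# Standardize and take the upper-tail Normal probability.
z = (threshold - mean_logwind) / sd_logwind
pct_above = 100 * (1 - NormalDist().cdf(z))
print(f"z = {z:.3f}, estimated percentage of days above 10 mph = {pct_above:.1f}%")
```

Because the transformation is monotone, the event "WindSpeed > 10" is exactly the event "LogWind > log 11", so no back-transformation of the probability is needed.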


c) The data were taken daily for the 31 days of July 2004. How does this violate the assumptions made in part b?


2. The square root of the stem volume of a certain plant appears to be approximately Normally distributed. Suppose you have a simple random sample of 6 plants with:

Sample mean of the square root of stem volume: 37.75

Sample standard deviation of the square root of stem volume: 3.71

The units of volume are cubic centimeters (cm³).

a) Test whether the population mean stem volume is 1600 cm³.

H0: HA:

Formula for test statistic:

t*=

degrees of freedom:

approximate p-value:

Conclusion, testing at the α = 0.05 level:
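A minimal sketch of the one-sample t test, under the assumption that the test is carried out on the square-root scale (where the Normality assumption holds), so the hypothesized value is sqrt(1600) = 40. Strictly, this tests the mean of the square root of volume, which matches a mean volume of 1600 cm³ only approximately. To stay within the standard library, the t distribution's CDF is computed here by Simpson's rule rather than by a statistics package; the helper names `t_pdf` and `t_cdf` are illustrative.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """CDF of Student's t: 0.5 plus the area from 0 to |x| (Simpson's rule, even steps)."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

n, ybar, s = 6, 37.75, 3.71
mu0 = math.sqrt(1600)          # hypothesized mean on the square-root scale (40)
se = s / math.sqrt(n)          # standard error of the sample mean
t_star = (ybar - mu0) / se
df = n - 1
p_two_sided = 2 * (1 - t_cdf(abs(t_star), df))
print(f"t* = {t_star:.3f}, df = {df}, two-sided p = {p_two_sided:.3f}")
```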


b) Compute a 95% confidence interval for the mean stem volume in cubic centimeters (cm³).
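A sketch of one common approach: build the t interval on the square-root scale and square the endpoints. The critical value t(0.025, 5) = 2.5706 is taken from a t table. This is a hedged approximation: the squared interval covers the square of the mean of sqrt(volume), which is below the mean volume itself (the mean volume equals that square plus the variance of sqrt(volume)), and it is closer to an interval for the median volume when sqrt(volume) is symmetric.

```python
import math

n, ybar, s = 6, 37.75, 3.71
t_crit = 2.5706                      # t_{0.025, 5} from a t table

half_width = t_crit * s / math.sqrt(n)
lo_sqrt, hi_sqrt = ybar - half_width, ybar + half_width

# Squaring is increasing for positive values, so endpoints map directly.
lo_vol, hi_vol = lo_sqrt ** 2, hi_sqrt ** 2
print(f"sqrt scale: ({lo_sqrt:.2f}, {hi_sqrt:.2f}); "
      f"volume scale: ({lo_vol:.0f}, {hi_vol:.0f}) cm^3")
```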


3. A car manufacturer was accused of providing an insufficient braking mechanism in one of its models. To test this, 50 cars were selected at random. Each car was assigned a target speed (in miles per hour), and its stopping distance (in feet) was recorded.

a) The ANOVA table for the regression of distance (Y) on speed (X) is below. Fill in the 7 blanks.

Analysis of Variance

                      Sum of        Mean
Source      DF       Squares      Square     F Value

Model      ____      _______       21185      ______
Error      ____      _______      236.53
Total      ____      _______
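The blanks follow from the definitions Mean Square = Sum of Squares / DF and F = MS(Model) / MS(Error). A minimal sketch, assuming simple linear regression with one predictor (speed) and an intercept, so the Model has 1 DF and the Error has n - 2:

```python
n = 50
ms_model, ms_error = 21185.0, 236.53   # mean squares given in the table

df_model = 1          # one predictor (speed)
df_error = n - 2      # n minus the two estimated coefficients
df_total = n - 1

ss_model = ms_model * df_model
ss_error = ms_error * df_error
ss_total = ss_model + ss_error
f_value = ms_model / ms_error

print(df_model, df_error, df_total)
print(round(ss_model, 2), round(ss_error, 2), round(ss_total, 2), round(f_value, 2))
```

As a cross-check, (n - 1) times the sample variance of distance given in the summary statistics, 49 × 664.06082 ≈ 32539, agrees with this total up to rounding in the printed mean squares.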

b) What is the R² for this model?
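R² is the fraction of the total sum of squares explained by the model, R² = SS(Model) / SS(Total). In this sketch the total is taken as (n - 1) times the variance of distance from the summary statistics; using SS(Model) + 48 × 236.53 from the ANOVA table gives the same value to three decimals.

```python
ss_model = 21185.0
ss_total = 49 * 664.06082    # (n - 1) * sample variance of distance
r_squared = ss_model / ss_total
print(f"R^2 = {r_squared:.3f}")
```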


Below are some summary statistics for speed and distance.

N = 50

Basic Statistical Measures for Speed

Location                 Variability

Mean      15.40000       Std Deviation          5.28764
Median    15.00000       Variance              27.95918
Mode      20.00000       Range                 21.00000
                         Interquartile Range    7.00000

Max       25.00000       Min                    4.00000

Basic Statistical Measures for Distance

Location                 Variability

Mean      42.98000       Std Deviation         25.76938
Median    36.00000       Variance             664.06082
Mode      26.00000       Range                118.00000
                         Interquartile Range   30.00000

Max      120.00000       Min                    2.00000

Sxy = 5387.40

c) Fill in the 6 blanks in the table of parameters.

                 Parameter     Standard
Variable          Estimate        Error    t Value    p-value

Intercept         _______      _______      ____      _______
speed             3.93241      0.41551      ____      _______

(t values and p-values are for testing H0: beta = 0.)
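A sketch of how the parameter table can be rebuilt from the summary statistics, using the least-squares formulas b1 = Sxy / Sxx, b0 = ybar - b1·xbar, SE(b1) = sqrt(MSE / Sxx) and SE(b0) = sqrt(MSE (1/n + xbar² / Sxx)), with Sxx = (n - 1) × Var(speed). The two-sided p-values use a stdlib-only Simpson's-rule integration of the t density (the helper `two_sided_p` is illustrative, not a library function), and very small p-values are reported as "<0.0001".

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=4000):
    """2 * P(T > |t|), with P(0 < T < |t|) from Simpson's rule (even steps)."""
    a = abs(t_stat)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return max(0.0, 2 * (0.5 - central))   # clamp tiny negative rounding error

n, mse = 50, 236.53
xbar, ybar = 15.4, 42.98
sxx = (n - 1) * 27.95918      # (n - 1) * variance of speed
sxy = 5387.40

b1 = sxy / sxx
b0 = ybar - b1 * xbar
se_b1 = math.sqrt(mse / sxx)
se_b0 = math.sqrt(mse * (1 / n + xbar ** 2 / sxx))
df = n - 2

for name, est, se in [("Intercept", b0, se_b0), ("speed", b1, se_b1)]:
    t_val = est / se
    p = two_sided_p(t_val, df)
    p_text = "<0.0001" if p < 1e-4 else f"{p:.4f}"
    print(f"{name:10s} {est:9.4f} {se:8.4f} {t_val:7.2f} {p_text}")
```

The computed slope and its standard error reproduce the printed 3.93241 and 0.41551, which is a useful check that Sxx was formed correctly.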


Below are the residual plots for the regression of Distance on Speed:

[Figure panels: residuals versus speed; studentized residuals versus predicted values]

d) What is the difference between the residual and the studentized residual?
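For reference, the usual (internally) studentized residual divides each raw residual by an estimate of its own standard deviation, which shrinks for points with high leverage. In simple linear regression the leverage has the closed form below. Some software instead reports externally studentized (deleted) residuals, which replace s with the estimate computed after omitting case i.

```latex
e_i = y_i - \hat{y}_i, \qquad
h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}, \qquad
r_i = \frac{e_i}{s\sqrt{1 - h_{ii}}}, \qquad s = \sqrt{\mathrm{MSE}}
```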

e) Give 2 regression assumptions that can be checked from the plots above and state whether you think they are satisfied for these data, citing evidence from the plots.

Regression Assumption 1:

Regression Assumption 2:


e) What other regression assumption needs to be checked before doing statistical inference and how is this done?

Regression assumption:

Checking method:

f) Assume that the regression assumptions are satisfied. Compute a 95% confidence interval for the mean change in stopping distance per mile per hour of speed.
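The slope is the mean change in stopping distance per mph, so the interval is b1 ± t(0.025, n - 2) × SE(b1). A minimal sketch using the estimate and standard error from the parameter table, with t(0.025, 48) ≈ 2.0106 taken from a t table:

```python
b1, se_b1 = 3.93241, 0.41551   # from the parameter table
t_crit = 2.0106                # t_{0.025, 48} from a t table

lower = b1 - t_crit * se_b1
upper = b1 + t_crit * se_b1
print(f"95% CI for slope: ({lower:.3f}, {upper:.3f}) feet per mph")
```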


g) The manufacturer is supposed to ensure that 95% of the cars can stop within 30 feet when they are travelling at 20 miles per hour. Should a confidence interval or a prediction interval be used to assess this? Briefly justify your answer.
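The requirement concerns individual cars rather than the mean stopping distance, which is the setting for a prediction interval (strictly, "95% of cars" is a tolerance-interval statement, and a one-sided upper bound would match "within 30 feet" more closely). A sketch of the two-sided 95% prediction interval at 20 mph, assuming the intercept is recomputed from the summary statistics as ybar - b1·xbar and using t(0.025, 48) ≈ 2.0106:

```python
import math

n, mse = 50, 236.53
xbar, ybar = 15.4, 42.98
sxx = 49 * 27.95918            # (n - 1) * variance of speed
b1 = 3.93241                   # slope from the parameter table
b0 = ybar - b1 * xbar          # intercept from the summary statistics
x0 = 20.0                      # speed of interest (mph)
t_crit = 2.0106                # t_{0.025, 48}

y_hat = b0 + b1 * x0
# The leading 1 adds the variance of a single new observation to the
# variance of the estimated mean at x0.
se_pred = math.sqrt(mse * (1 + 1 / n + (x0 - xbar) ** 2 / sxx))
lower, upper = y_hat - t_crit * se_pred, y_hat + t_crit * se_pred
print(f"predicted distance {y_hat:.1f} ft, 95% PI ({lower:.1f}, {upper:.1f}) ft")
```

Dropping the leading 1 inside the square root would give the standard error for a confidence interval on the mean distance at 20 mph, which is much narrower.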

h) An engineer working with the manufacturer draws the histogram of the stopping distances shown below. He says that inference based on the normal distribution cannot be used in this problem because distance is not normally distributed. Is he correct? Explain your answer briefly.


[Figure panels: leverage versus speed; Cook’s distance versus speed]

i) There are 50 data values, but there are only 19 points on the leverage plot. Why is this?

j) Are there any high leverage points in the data? Justify your answer briefly.
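In simple linear regression the leverage depends only on the speed, so cars tested at the same speed share a leverage value, and the largest values occur at the extreme speeds. A sketch that evaluates the minimum and maximum speeds against the rule of thumb 2p/n with p = 2 coefficients; the cutoff is an assumption, since 3p/n (0.12 here) is also commonly used and would lead to a different call.

```python
n, p = 50, 2
xbar, sxx = 15.4, 49 * 27.95918   # mean speed and (n - 1) * variance of speed
x_min, x_max = 4.0, 25.0          # minimum and maximum speeds from the summary table

def leverage(x):
    """Leverage h_ii in simple linear regression; it depends only on x."""
    return 1 / n + (x - xbar) ** 2 / sxx

cutoff = 2 * p / n                # rule-of-thumb threshold, 0.08 here
for x in (x_min, x_max):
    print(f"speed {x:4.1f}: leverage {leverage(x):.4f} (2p/n = {cutoff:.2f})")
```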

k) Are there any highly influential points in the data? Justify your answer briefly.
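Cook’s distance combines the size of a residual with the leverage of the point, so a case is influential only if it is both unusual in y and has some leverage. One standard form, with r_i the internally studentized residual, h_ii the leverage and p = 2 coefficients, is shown below; common rules of thumb compare D_i with 1, or more cautiously with 4/n (0.08 here).

```latex
D_i = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}}, \qquad p = 2
```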
