Stat 511 Fall 2005
Midterm 1
Oct. 17, 2005
The following rules apply.
1. You may use 3 sheets of paper for any information you need - double-sided, any font.
2. You may use a calculator.
3. You may not collaborate or copy.
4. You may not use outside resources, such as the internet. You also may not store notes or formulas on your calculator.
5. Failure to comply with item 3 could lead to reduction in your grade, or disciplinary action.
I have read the rules above and agree to comply with them.
Signature ________________________________________________
Name (printed) ___________________________________________
1. In a study of air quality in a large urban center, measurements were taken daily in July, 2004, which was a “typical” year for this city. The following variables were measured:
Ozone      ozone at ground level (parts per billion)
Solar.R    solar radiation measured at ground level (langleys)
WindSpeed  maximum wind speed (miles per hour)
a) For each Normal Probability plot below, sketch a histogram. Put the approximate minimum, median and maximum values of the variables on the x-axes of your histograms.
[Figures: Normal probability plots for Ozone and Solar.R]
WindSpeed was skewed, with a long tail to the right, so the investigator decided to take logarithms. However, to avoid problems on days with zero wind speed, the transformation actually used was:
LogWind=log(WindSpeed+1)
The basic statistical measures for LogWind are printed below.
Basic Statistical Measures (N = 31)

      Location                     Variability
Mean      2.251472     Std Deviation          0.30800
Median    2.261763     Variance               0.09487
Mode      2.128232     Range                  1.13708
                       Interquartile Range    0.40968

Quantile       Estimate
100% Max       2.76632
99%            2.76632
95%            2.76632
90%            2.72785
75% Q3         2.47654
50% Median     2.26176
25% Q1         2.06686
10%            1.80829
5%             1.72277
1%             1.62924
0% Min         1.62924
b) About what percentage of days in July would you estimate to have maximum wind speed greater than 10 miles per hour? A histogram of the data is below. You may assume that the LogWind values are i.i.d. Normal.
c) The data were taken daily for the 31 days of July 2004. How does this violate the assumptions made in part b?
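The calculation in part b can be sketched numerically as follows. This is a minimal sketch, assuming that log denotes the natural logarithm and that the Normal assumption is applied to LogWind with the printed mean and standard deviation; the helper name normal_cdf is illustrative only.

```python
import math

# Summary statistics for LogWind = log(WindSpeed + 1), taken from the printout.
mean_logwind = 2.251472
sd_logwind = 0.30800


def normal_cdf(z: float) -> float:
    """Standard Normal CDF computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# WindSpeed > 10 is the same event as LogWind > log(10 + 1).
# Assumption: "log" is the natural logarithm.
threshold = math.log(10 + 1)
z = (threshold - mean_logwind) / sd_logwind
prob_exceed = 1.0 - normal_cdf(z)

print(f"z = {z:.3f}, estimated P(WindSpeed > 10) = {prob_exceed:.3f}")
```

Under these assumptions the estimate comes out a little under one third of the days.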
2. The square root of the stem volume of a certain plant appears to be approximately Normally distributed. Suppose you have a simple random sample of 6 plants with
Sample mean of square root of stem volume, 37.75.
Sample standard deviation of square root of stem volume, 3.71.
The units of volume are cm^3.
a) Test whether the population mean stem volume is 1600 cm^3.
H0: HA:
Formula for test statistic:
t*=
degrees of freedom:
approximate p-value:
Conclusion when testing at the α = 0.05 level:
b) Compute a 95% confidence interval for the mean stem volume in cm^3.
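One way to organize the arithmetic for this problem is sketched below. It assumes the hypothesis is translated to the square-root scale (so 1600 cm^3 becomes 40), that the test is two-sided, and that the critical value 2.571 is read from a t table with 5 degrees of freedom. Squaring the interval endpoints is a back-transformation; strictly, it describes the square of the mean of square-root volume rather than the mean volume itself.

```python
import math

# Sample summaries on the square-root scale, from the problem statement.
n = 6
xbar = 37.75
s = 3.71

# Assumption: the null value 1600 cm^3 is moved to the square-root scale.
mu0 = math.sqrt(1600.0)

# Assumption: two-sided critical value t(0.975, df = 5) from a t table.
t_crit = 2.571

df = n - 1
se = s / math.sqrt(n)            # standard error of the sample mean
t_stat = (xbar - mu0) / se       # one-sample t statistic
reject = abs(t_stat) > t_crit    # decision at the 0.05 level

# 95% interval on the square-root scale, then squared back to cm^3.
lower, upper = xbar - t_crit * se, xbar + t_crit * se
ci_volume = (lower ** 2, upper ** 2)

print(f"t* = {t_stat:.3f} on {df} df, reject H0: {reject}")
print(f"95% CI (sqrt scale): ({lower:.2f}, {upper:.2f})")
print(f"95% CI (back-transformed, cm^3): ({ci_volume[0]:.0f}, {ci_volume[1]:.0f})")
```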
3. A car manufacturer was accused of providing an insufficient braking mechanism in one of its models. To test this, 50 cars were selected at random, each car was assigned a target speed (in miles per hour), and the stopping distance of the car (in feet) was recorded.
a) The ANOVA table for the regression of distance (Y) on speed (X) is below. Fill in the 7 blanks.
Analysis of Variance

                      Sum of        Mean
Source      DF       Squares      Square     F Value
Model      ____      _______       21185      ______
Error      ____      _______      236.53
Total      ____      _______
b) What is the R^2 for this model?
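The relationships among the ANOVA entries can be sketched as below. This assumes a simple linear regression with one predictor and n = 50 observations, and uses only the two mean squares printed in the table; the variable names are illustrative.

```python
# Known quantities from the problem statement and the ANOVA table.
n = 50
ms_model = 21185.0
ms_error = 236.53

# Degrees of freedom for simple linear regression (one slope parameter).
df_model = 1
df_total = n - 1
df_error = df_total - df_model

# Sums of squares are mean squares multiplied by their degrees of freedom.
ss_model = ms_model * df_model
ss_error = ms_error * df_error
ss_total = ss_model + ss_error

# F statistic and coefficient of determination.
f_value = ms_model / ms_error
r_squared = ss_model / ss_total

print(f"Model: df={df_model}, SS={ss_model:.2f}, F={f_value:.2f}")
print(f"Error: df={df_error}, SS={ss_error:.2f}")
print(f"Total: df={df_total}, SS={ss_total:.2f}")
print(f"R^2 = {r_squared:.4f}")
```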
Below are some summary statistics for speed and distance.
N = 50

Basic Statistical Measures for Speed

      Location                     Variability
Mean      15.40000     Std Deviation           5.28764
Median    15.00000     Variance               27.95918
Mode      20.00000     Range                  21.00000
                       Interquartile Range     7.00000
Max       25.00000     Min                     4.00000

Basic Statistical Measures for Distance

      Location                     Variability
Mean      42.98000     Std Deviation          25.76938
Median    36.00000     Variance              664.06082
Mode      26.00000     Range                 118.00000
                       Interquartile Range    30.00000
Max      120.00000     Min                     2.00000
Sxy = 5387.40
c) Fill in the 6 blanks in the table of parameters.

             Parameter     Standard                  p value
Variable     Estimate      Error        t Value      (beta=0)
Intercept    _______       _______      ____         _______
speed        3.93241       0.41551      ____         _______
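The estimates, standard errors, and t values in the parameter table can be reconstructed from the summary statistics, as in the sketch below. It assumes that Sxy is the corrected sum of cross-products, that Sxx is (n - 1) times the printed variance of speed, and that the error mean square 236.53 from the ANOVA table is used as the variance estimate. The p values need the t distribution with 48 degrees of freedom (from a table or a statistics library) and are not computed here.

```python
import math

# Summary statistics from the printouts.
n = 50
xbar, ybar = 15.4, 42.98
var_x = 27.95918
sxy = 5387.40        # assumption: corrected sum of cross-products
mse = 236.53         # error mean square from the ANOVA table

sxx = (n - 1) * var_x                    # corrected sum of squares for speed

# Least-squares estimates.
slope = sxy / sxx
intercept = ybar - slope * xbar

# Standard errors of the estimates.
se_slope = math.sqrt(mse / sxx)
se_intercept = math.sqrt(mse * (1.0 / n + xbar ** 2 / sxx))

# t statistics for testing that each parameter equals zero.
t_slope = slope / se_slope
t_intercept = intercept / se_intercept

print(f"Intercept: {intercept:.4f}  SE {se_intercept:.4f}  t {t_intercept:.2f}")
print(f"speed:     {slope:.5f}  SE {se_slope:.5f}  t {t_slope:.2f}")
```

The slope and its standard error computed this way agree with the printed values 3.93241 and 0.41551, which is a useful check on the assumed meaning of Sxy.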
Below are the residual plots for the regression of Distance on Speed:
[Figures: plot of residuals versus speed; plot of studentized residuals versus predicted values]
d) What is the difference between the residual and the studentized residual?
e) Give 2 regression assumptions that can be checked from the plots above and state whether you think they are satisfied for these data, citing evidence from the plots.
Regression Assumption 1:
Regression Assumption 2:
e) What other regression assumption needs to be checked before doing statistical inference and how is this done?
Regression assumption:
Checking method:
f) Assume that the regression assumptions are satisfied. Compute a 95% confidence interval for the mean change in stopping distance per mile per hour of speed.
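The interval asked for in part f can be sketched as follows. It uses the slope estimate and standard error printed in the parameter table; the critical value 2.0106 for the t distribution with 48 degrees of freedom is an assumption taken from a t table.

```python
# Slope estimate and standard error from the parameter table.
slope = 3.93241
se_slope = 0.41551

# Error degrees of freedom for simple linear regression with n = 50.
df_error = 50 - 2

# Assumption: t(0.975, df = 48) read from a t table.
t_crit = 2.0106

margin = t_crit * se_slope
ci_slope = (slope - margin, slope + margin)

print(f"95% CI for the slope on {df_error} df: "
      f"({ci_slope[0]:.3f}, {ci_slope[1]:.3f}) feet per mph")
```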
g) The manufacturer is supposed to ensure that 95% of the cars can stop within 30 feet when they are travelling at 20 miles per hour. Should a confidence interval or a prediction interval be used to assess this? Briefly justify your answer.
h) An engineer working with the manufacturer draws the histogram below of the stopping distances. He says that inference based on the normal distribution cannot be used in this problem because distance is not normally distributed. Is he correct? Explain your answer briefly.
[Figures: leverage versus speed; Cook's distance versus speed]
i) There are 50 data values, but there are only 19 points on the leverage plot. Why is this?
j) Are there any high leverage points in the data? Justify your answer briefly.
k) Are there any highly influential points in the data? Justify your answer briefly.
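In simple linear regression the leverage of an observation depends only on its x value, which can be illustrated with the summary statistics for speed. The sketch below assumes Sxx is (n - 1) times the printed variance of speed; the cutoff 2p/n (with p = 2 parameters) is one common rule of thumb, not the only one, and the function name leverage is illustrative.

```python
# Summary statistics for speed from the printout.
n = 50
xbar = 15.4
sxx = (n - 1) * 27.95918    # corrected sum of squares for speed


def leverage(x: float) -> float:
    """Leverage of an observation at speed x in simple linear regression."""
    return 1.0 / n + (x - xbar) ** 2 / sxx


# Leverage is a function of speed alone, so cars sharing a speed share a leverage.
h_at_min = leverage(4.0)     # smallest observed speed
h_at_max = leverage(25.0)    # largest observed speed
h_at_mean = leverage(xbar)   # smallest possible leverage, 1/n

# Assumption: rule-of-thumb cutoff 2p/n with p = 2 regression parameters.
cutoff = 2 * 2 / n

print(f"h(4) = {h_at_min:.4f}, h(25) = {h_at_max:.4f}, h(mean) = {h_at_mean:.4f}")
print(f"rule-of-thumb cutoff 2p/n = {cutoff:.2f}")
```

Influence, as measured by Cook's distance, also depends on the residuals, so it cannot be assessed from the speed values alone.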