COR1-GB.1305.03
FINAL EXAM

This is the question sheet. There are 10 questions, each worth 10 points. Please write all answers in the answer book, and justify your answers. Good Luck!

In questions 1-6, we consider a data set on 14 brands of U.S. domestic cigarettes in the 1990s. The data are provided in Table 1.

The response variable (y) is carbon monoxide content, denoted by CO (mg/cigarette).

The three explanatory variables are Nicotine content (mg/cigarette), Tar content (mg/cigarette), and Weight (grams/cigarette).

Figures 1-3 provide scatterplots of CO versus Nicotine, Tar and Weight, respectively.

Figure 4 provides a scatterplot of Nicotine versus Tar.

Table 1

Brand name / CO / Nicotine / Tar / Weight
Alpine / 13.6 / 0.86 / 14.1 / 0.9853
Benson&Hedges / 16.6 / 1.06 / 16.0 / 1.0938
BullDurham / 23.5 / 2.03 / 29.8 / 1.1650
CamelLights / 10.2 / 0.67 / 8.0 / 0.9280
Carlton / 5.4 / 0.40 / 4.1 / 0.9462
Chesterfield / 15.0 / 1.04 / 15.0 / 0.8885
GoldenLights / 9.0 / 0.76 / 8.8 / 1.0267
Kent / 12.3 / 0.95 / 12.4 / 0.9225
Kool / 16.3 / 1.12 / 16.6 / 0.9372
L&M / 15.4 / 1.02 / 14.9 / 0.8858
LarkLights / 13.0 / 1.01 / 13.7 / 0.9643
Marlboro / 14.4 / 0.90 / 15.1 / 0.9316
Merit / 10.0 / 0.57 / 7.8 / 0.9705
MultiFilter / 10.2 / 0.78 / 11.4 / 1.1240

1) Consider a simple regression of CO on Nicotine. The corresponding Minitab output is given below.

Regression Analysis: CO versus Nicotine

The regression equation is

CO = 2.95 + 10.9 Nicotine

Predictor    Coef     SE Coef   T      P
Constant     2.948    1.132     2.60   0.023
Nicotine     10.905   1.124     9.71   0.000

S = 1.51727   R-Sq = 88.7%   R-Sq(adj) = 87.8%

Analysis of Variance

Source           DF   SS       MS       F       P
Regression       1    216.88   216.88   94.21   0.000
Residual Error   12   27.63    2.30
Total            13   244.51

A)  Is there evidence at the 1% level of significance that the true intercept is different from 0? Justify your answer. (3 points).

B)  Is there evidence at the 5% level of significance that the true coefficient of Nicotine is different from 10? (5 points).

C)  Which brand corresponds to the outlier in the upper right-hand corner of Figure 1? (1 point).

D)  Based on the Minitab output, and keeping in mind that the p-values reported by Minitab are rounded to three digits after the decimal point, what can you say about the actual (not rounded) p-value for testing the null hypothesis that the true coefficient of Nicotine is zero versus the alternative hypothesis that it is positive? (1 point).
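For anyone checking the Minitab output in question 1 against Table 1, the following is a minimal sketch that recomputes the printed coefficients, standard errors, S and R-Sq from the closed-form least squares formulas. The function name `simple_ols` and the dictionary keys are illustrative choices, not part of the exam or of Minitab; the data values are copied from Table 1.

```python
import math

# Nicotine and CO values copied from Table 1, in the order the brands are listed.
nicotine = [0.86, 1.06, 2.03, 0.67, 0.40, 1.04, 0.76,
            0.95, 1.12, 1.02, 1.01, 0.90, 0.57, 0.78]
co = [13.6, 16.6, 23.5, 10.2, 5.4, 15.0, 9.0,
      12.3, 16.3, 15.4, 13.0, 14.4, 10.0, 10.2]


def simple_ols(x, y):
    """Least squares fit of y on x (hypothetical helper, one predictor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    sse = syy - slope * sxy            # residual sum of squares
    s = math.sqrt(sse / (n - 2))       # Minitab's "S"
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": s * math.sqrt(1 / n + x_bar ** 2 / sxx),
        "se_slope": s / math.sqrt(sxx),
        "s": s,
        "r_sq": 1 - sse / syy,
        "ss_total": syy,
    }


fit = simple_ols(nicotine, co)
for name, value in fit.items():
    print(f"{name:>12}: {value:.4f}")
```

The sketch only reproduces quantities that already appear in the output; the hypothesis tests asked for in parts A, B and D are left to the reader.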

2) Consider a multiple regression of CO on Nicotine and Tar. The corresponding Minitab output is given below.

Regression Analysis: CO versus Nicotine, Tar

The regression equation is

CO = 4.22 - 3.79 Nicotine + 0.936 Tar

Predictor    Coef      SE Coef   T       P
Constant     4.2225    0.8462    4.99    0.000
Nicotine     -3.787    3.939     -0.96   0.357
Tar          0.9358    0.2460    3.80    0.003

S = 1.04147   R-Sq = 95.1%   R-Sq(adj) = 94.2%

Analysis of Variance

Source           DF   SS       MS       F        P
Regression       2    232.58   116.29   107.21   0.000
Residual Error   11   11.93    1.08
Total            13   244.51

A)  Why do you think that the p-value provided by Minitab for the coefficient of Nicotine is so much larger than it was in the simple regression of CO on Nicotine alone? (3 points).

B)  Based on the Minitab output for this regression, is there strong statistical evidence that the true coefficient of Nicotine in this model is negative? (3 points).

C)  Do the R² values from the two regressions considered so far indicate that the model with Nicotine and Tar is better than the model with Nicotine alone? (4 points).

3) Do you see any contradiction between the very different values for the coefficient of Nicotine in the two models considered so far? Is one value more trustworthy than the other? Explain. (10 points).

4) Consider the multiple regression of CO on Nicotine, Tar and Weight. Here is the corresponding Minitab output.

Regression Analysis: CO versus Nicotine, Tar, Weight

The regression equation is

CO = 9.68 - 2.92 Nicotine + 0.923 Tar - 6.21 Weight

Predictor    Coef      SE Coef   T       P
Constant     9.684     3.077     3.15    0.010
Nicotine     -2.916    3.605     -0.81   0.437
Tar          0.9227    0.2233    4.13    0.002
Weight       -6.207    3.386     -1.83   0.097

S = 0.944995   R-Sq = 96.3%   R-Sq(adj) = 95.3%

Analysis of Variance

Source           DF   SS        MS       F       P
Regression       3    235.579   78.526   87.93   0.000
Residual Error   10   8.930     0.893
Total            13   244.509

A) Predict the CO for a brand with Nicotine content of 1 mg/cigarette, Tar content of 10 mg/cigarette and Weight of 1 gram/cigarette. (1 point).

B)  Interpret the estimated coefficient of Tar in the model. (3 points).

C)  Show how the F statistic reported by Minitab can be obtained from other numbers in the Minitab output. (1 point).

D)  Give a practical interpretation of the results of the F-test for this data set. (5 points).

5) For the multiple regression of CO on Nicotine, Tar and Weight:

A) Construct a 95% confidence interval for the true coefficient of Weight. (5 points).

B) State any assumptions needed for the interval in A) to be valid. (5 points).

6) The Minitab 1-sample T output for CO is:

One-Sample T: CO

Variable   N    Mean    StDev   SE Mean   95% CI
CO         14   13.21   4.34    1.16      (10.70, 15.71)

A)  Based on this output, construct a 99% confidence interval for the mean. (2 points).

B)  Interpret the confidence interval, in terms of CO. (2 points).

C)  What is the probability that your 99% confidence interval contains the mean? (2 points).

D)  State any assumptions required for the validity of this confidence interval and discuss whether the assumptions seem to hold based on evidence from the data set. (4 points).
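As a check on the One-Sample T output in question 6, here is a small sketch that recomputes N, Mean, StDev and SE Mean from the CO column of Table 1 using only the Python standard library. The variable names are illustrative; the confidence interval itself is not recomputed here, since that requires a t critical value from a table.

```python
import math
import statistics

# CO values copied from Table 1, in the order the brands are listed.
co = [13.6, 16.6, 23.5, 10.2, 5.4, 15.0, 9.0,
      12.3, 16.3, 15.4, 13.0, 14.4, 10.0, 10.2]

n = len(co)
mean = statistics.mean(co)
stdev = statistics.stdev(co)       # sample standard deviation (divides by n - 1)
se_mean = stdev / math.sqrt(n)     # standard error of the mean

print(f"N = {n}, Mean = {mean:.2f}, StDev = {stdev:.2f}, SE Mean = {se_mean:.2f}")
```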

7) In the normal linear regression model with n=100, if the probability that the sum of the errors exceeds 10 is .025, then what is the probability that at least one of the errors will be less than −1? (10 points).

8) Suppose we have a data set generated by the multiple linear regression model with k explanatory variables. If the y values are all doubled, holding all the x variables fixed, it can be shown that the Least Squares estimates of the regression parameters will all double. What will happen to the value of R²? Justify your answer. (10 points).
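The fact quoted in question 8, that doubling every y value doubles every Least Squares estimate, can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it uses the Table 1 data with k = 2 (CO on Nicotine and Tar) as a stand-in data set, and the helper `ols_two_predictors` is a hypothetical name that solves the two-predictor normal equations directly. It deliberately does not compute R², which is what the question asks about.

```python
# Values copied from Table 1, in the order the brands are listed.
nicotine = [0.86, 1.06, 2.03, 0.67, 0.40, 1.04, 0.76,
            0.95, 1.12, 1.02, 1.01, 0.90, 0.57, 0.78]
tar = [14.1, 16.0, 29.8, 8.0, 4.1, 15.0, 8.8,
       12.4, 16.6, 14.9, 13.7, 15.1, 7.8, 11.4]
co = [13.6, 16.6, 23.5, 10.2, 5.4, 15.0, 9.0,
      12.3, 16.3, 15.4, 13.0, 14.4, 10.0, 10.2]


def ols_two_predictors(x1, x2, y):
    """Least squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2.

    Hypothetical helper: solves the centered normal equations by
    Cramer's rule, so it only handles exactly two predictors.
    """
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))

    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


original = ols_two_predictors(nicotine, tar, co)
doubled = ols_two_predictors(nicotine, tar, [2 * c for c in co])

for name, b, b2x in zip(("Constant", "Nicotine", "Tar"), original, doubled):
    print(f"{name:>8}: original {b:.4f}, y doubled {b2x:.4f}")
```

The original fit should agree with the coefficients printed in the Minitab output for question 2, and each estimate from the doubled response should be twice its original value.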

9) Given a random sample of size 10 from a normal population, suppose that the 95% confidence interval for the mean is (−.1453, .2511). Based on this information, calculate the t-statistic for testing the null hypothesis (10 points).

10) Suppose you collect 50 independent random samples of size 10 from a normal population with mean (Note that there are 50 samples here, not just one. Each of these samples consists of 10 observations.) For each sample, you perform a t-test of the null hypothesis , versus the two-tailed alternative hypothesis If Y is the total number of samples (of the 50) for which the p-value is less than .05, then what is the standard deviation of Y? (10 points).