SOLUTIONS TO FINAL EXAM
VERSION 2
1)
A) The plot seems rather noisy, although as Food quality increases it seems that the expected value of Price increases also. If we compare the data points to the fitted line, there seems to be a slightly parabolic shape. The very low-quality restaurants are somewhat overpriced, and so are the very high-quality restaurants, in comparison with the fitted model.
B) The slope coefficient is 2.1670, which is positive as would be expected. The p-value is less than .0005/2=.00025, so the relationship between food quality and price is highly statistically significant. The $R^2$ is 28.7%, indicating a moderate linear relationship, but a somewhat noisy one.
C) The p-value for the intercept is .161 (two-tailed), which is not less than .05, so there is not strong evidence to suggest that the true intercept is different from zero.
2)
A) Yes, because the $R^2$ is now 77.6% for the model with Décor and Service, which is clearly much larger than what we had for the simple linear regression model on Food. Note that here the coefficients for Décor and Service are both highly statistically significant. (But so was the coefficient for Food in the simple regression.)
B) No, because for a given value of Service and Décor, the Price is random. Nothing can be guaranteed in the face of randomness. But since the p-value for the coefficient of Service is less than .0005/2=.00025 (right-tailed), there is strong evidence that the expected value of Price increases as Service increases with Décor held fixed.
3)
A) The appropriate p-value here is the right-tailed p-value for the coefficient of Food. (It is right-tailed since we are looking for a positive relationship.) Surprisingly, the coefficient is negative, and so the right-tailed p-value is 1-.518/2=.741, where .518 is the two-tailed p-value. Thus, there is definitely not a strong and statistically significant positive relationship here. This may seem to contradict the findings from Problem 1, but remember that the value and meaning of a given regression coefficient depend on what other variables are in the model.
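The conversion from a two-tailed to a right-tailed p-value used in 3A can be sketched as follows. This is only an illustration; the function name `right_tailed_p` and its arguments are hypothetical and not part of the exam output.

```python
def right_tailed_p(two_tailed_p: float, coefficient_sign: int) -> float:
    """Convert a two-tailed p-value into a right-tailed one.

    If the estimate is positive, the right tail holds half of the
    two-tailed p-value. If it is negative, the right tail holds
    everything except that half.
    """
    half = two_tailed_p / 2
    return half if coefficient_sign > 0 else 1 - half


# Two-tailed p-value .518 with a negative coefficient, as in 3A.
p = right_tailed_p(0.518, coefficient_sign=-1)
print(round(p, 3))  # 0.741
```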
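B) The sample size is large here, so we can use the standard normal critical value $z_{\alpha/2}$ in constructing the confidence interval. The interval is the estimate plus or minus $z_{\alpha/2}$ times its standard error.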
C) First, compute the fitted value $\hat{y}$ from the regression equation. The residual for Elaine’s is the actual Price minus this fitted value.
4)
A) The predicted Price decreases by 10.7 cents for every one point increase in Food quality, holding Décor and Service fixed. This may seem like nonsense, but note that the coefficient is not statistically significant, so there is no strong reason to believe that the true underlying coefficient of Food is negative. Also, it may be that Décor and Service manage to contain all the relevant information for pricing the meal, and once these variables are in the model, Food quality is superfluous. Presumably, Décor and Service correlate highly with Food quality, and their presence seems to render Food quality an irrelevant variable.
B) Compare the $R^2$ for the model with Food, Décor and Service with the $R^2$ for the model with just Décor and Service. The $R^2$ did go up by a tiny amount when Food was included, but we know that $R^2$ can never decrease when a new variable is introduced, even if the variable is completely irrelevant. So the increase does not imply that Food is an important variable in the larger model.
C) In terms of the model selection criterion it's a close call, but the model we prefer is the one with the smallest value of the criterion, which in this case is the simpler model with just Décor and Service. This is consistent with everything else we have seen above, so the model that seems to best describe Price is the one with Décor and Service (but not Food quality) as predictors.
5)
The plot looks fairly straight in the middle, but viewed as a whole it is slightly S-shaped. The deviations from the line occur at the edges, and suggest that the data has somewhat longer tails than a normal distribution. The largest values of the residuals are larger than would be expected for the corresponding percentile of a normal distribution. Also, the smallest values of the residuals are smaller than would be expected for the corresponding percentile of a normal distribution. (The normal distribution used here is the one that has the same mean and standard deviation as the data). So overall, the residuals seem to have longer tails than a normal distribution. The p-value is .028, so we can reject the null hypothesis that the residuals are normal at level .05 but not at level .01. Overall, then, we have moderate (but not overwhelming) evidence that the residuals (and hence the underlying errors) are not normal.
6) The statement is false. An example to show this is the variable Food in the regression data set from problems 1) to 5). The coefficient of Food was significant in a simple regression, but not when all of the other variables were also included. Answer is B.
7) Define X = Demand for a given customer. Then X is a discrete random variable taking values 0, 1, 2 with probabilities .75, .2 and .05, respectively. We have $E[X] = 0(.75) + 1(.2) + 2(.05) = .3$ and $E[X^2] = 0(.75) + 1(.2) + 4(.05) = .4$. Thus, $\mathrm{Var}(X) = .4 - (.3)^2 = .31$. The total demand is $\sum_{i=1}^{400} X_i$. We want Prob{Total Demand ≥ 100} = Prob{$\bar{X}$ ≥ 100/400}, where $\bar{X}$ is the sample average demand per customer. By the Central Limit Theorem, $\bar{X}$ is approximately normally distributed with mean μ = .3 and standard deviation $\sqrt{.31/400} = .0278$. Thus, Prob{$\bar{X}$ ≥ .25} ≈ Prob{Std Normal ≥ (.25-.3)/.0278} = Prob{Std Normal ≥ -1.80} = .5+.4641 = .9641. Answer is D.
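The normal approximation in Problem 7 can be checked numerically with a short sketch like the one below. It assumes 400 customers (implied by the ratio 100/400) and, like the solution, uses no continuity correction. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Demand distribution for a single customer, as given in Problem 7.
values = [0, 1, 2]
probs = [0.75, 0.20, 0.05]

n = 400          # number of customers (assumed from the ratio 100/400)
threshold = 100  # total demand of interest

mean = sum(v * p for v, p in zip(values, probs))              # E[X]
second_moment = sum(v ** 2 * p for v, p in zip(values, probs))  # E[X^2]
variance = second_moment - mean ** 2                          # Var(X)

se = sqrt(variance / n)            # standard deviation of the sample mean
z = (threshold / n - mean) / se    # standardized cutoff
prob = 1 - NormalDist().cdf(z)     # Prob{sample mean >= threshold / n}

print(f"mean={mean:.2f}, variance={variance:.2f}, se={se:.4f}")
print(f"z={z:.2f}, probability={prob:.3f}")
```

The result comes out near .964. It can differ from the table-based .9641 in the fourth decimal place because the solution rounds z to -1.80 before using the normal table.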
8) We have $R^2 = r^2 = .36$, so 36% of the variability in Y is explained by X. Answer is C.
9) Since the p-value is .01, the t-statistic must be exactly at the rejection point for the left-tailed test, so we have $t = -t_{\alpha}$ with α=.01 and df=4, so t = -3.747. Answer is C.
10) The rejection region is |t| > $t_{\alpha/2}$ (df=9), that is, |t| > 3.250. So we have $t = 3.250 = (2-1)/(s/\sqrt{10})$ and thus $s = \sqrt{10}/3.250 = .973$. Answer is C.
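The arithmetic in Problem 10 amounts to solving the t-statistic for s at the edge of the rejection region. The sketch below assumes n = 10 (since df = 9) and takes the critical value 3.250 directly from the t table instead of computing it.

```python
from math import sqrt

t_crit = 3.250   # t-table critical value for df = 9, quoted in the solution
n = 10           # sample size, assumed from df = n - 1 = 9
diff = 2 - 1     # numerator of the t-statistic, as written in the solution

# Solve t_crit = diff / (s / sqrt(n)) for s.
s = diff * sqrt(n) / t_crit

print(round(s, 3))  # 0.973
```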
11) First note that the probability that the p-value is less than .01 is the same as the probability that we can reject the null hypothesis at level .01. When $H_A$ is true this probability becomes greater than .01, as we have demonstrated in our discussion of hypothesis testing. The greater the true parameter value is compared to its null-hypothesis value, the higher the probability that we will reject the null hypothesis. Answer is B.
12) Let p be the probability of Heads. The test statistic compares the sample proportion $\hat{p}$ with $p_0$, the value of p under the null hypothesis: $z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}$, which here gives z = 1.34. The p-value is Prob{Std Normal > 1.34} = .5-.4099 = .0901, which is between .05 and .1. Thus, we can reject the null hypothesis at level .1 but not at level .05. Answer is A.
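The right-tailed p-value and the resulting decisions in Problem 12 can be reproduced with a few lines. The sketch starts from the test statistic 1.34 quoted in the solution, not from the raw coin-toss data.

```python
from statistics import NormalDist

z = 1.34  # test statistic quoted in the solution

# Right-tailed p-value: area under the standard normal curve above z.
p_value = 1 - NormalDist().cdf(z)
print(f"p-value = {p_value:.4f}")

# Reject the null hypothesis at a given level only if the p-value is below it.
for alpha in (0.10, 0.05):
    decision = "reject" if p_value < alpha else "do not reject"
    print(f"level {alpha:.2f}: {decision}")
```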
13) The p-value is Prob{|Std Normal| > 2.1} = 2 Prob{Std Normal > 2.1} = 2(.5-.4821) = .0358. Answer is C.
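As a quick check of the two-tailed calculation in Problem 13, the sketch below doubles the upper-tail area beyond 2.1. It uses the exact normal CDF, so the result (about .0357) can differ from the table-based .0358 in the last digit.

```python
from statistics import NormalDist

z = 2.1  # observed test statistic from the solution

# Two-tailed p-value: by symmetry, twice the area above |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"two-tailed p-value = {p_value:.4f}")
```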
14) The CI has the form estimate ± $t_{\alpha/2}$ × (standard error). We have df=10, α=.05, $t_{\alpha/2}$=2.228. Answer is B.
15) We need the number of arrangements of 6 things (the six digits 1,2,3,4,5,6) taken 6 at a time. The number of ways of doing this is 6!=(6)(5)(4)(3)(2)(1)=720. Answer is D.
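The count in Problem 15 can be confirmed both by the factorial formula and by brute-force enumeration. This is just an illustrative check, and the variable names are arbitrary.

```python
from itertools import permutations
from math import factorial

digits = "123456"

# Formula: number of arrangements of 6 distinct items taken 6 at a time.
by_formula = factorial(len(digits))

# Brute force: enumerate every ordering and count them.
by_enumeration = sum(1 for _ in permutations(digits))

print(by_formula, by_enumeration)  # 720 720
```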