Statistics 103: Answers to Practice Problems for Final Exam
1. False. The mean is roughly halfway between the median and the 75th percentile, so about 37.5 percent of people have taxes greater than the mean.
2. 1000
3. True. The mean should increase when the zeros are removed.
4. False.
5. Around 16%. Any value between 12 and 20 gets credit. Use the box plot.
6. married, divorced, single.
7. False
8. False
9. Single. The median is close to zero, as evidenced by the lack of a median line.
10. True. The sample size for divorced people is substantially smaller, leading to increased SE.
11. Yes. The sample sizes are large in each group, and there are no very serious outliers. Hence, the Central Limit Theorem should kick in.
12. Since the degrees of freedom equal 400, we can use 1.97 as the approximate multiplier from the t-table. The 95% confidence interval is (difference in sample averages) ± 1.97 × SE, which simplifies to (5.77, 476.43).
13. It appears that the average property tax for male household heads in the population is higher than the average property tax for female household heads in the population. We are 95% confident that the amount of the difference is between $5.77 and $476.43. If we wanted to narrow this range, we’d need to collect more data.
14. Check all of the following that are true:
_X_ If we took another random sample of 500, then another, then another, and so on, we’d expect 95% of the formed confidence intervals to contain the population difference in average property taxes.
15. Test the null hypothesis that there is no difference in average property taxes between male and female household heads. State your null and alternative hypotheses, the test statistic, the p-value, and your conclusions. Consider a p-value near 0.05 to be small.
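Let $\mu_M$ be the population average property tax for men.
Let $\mu_W$ be the population average property tax for women.
The null hypothesis is: Ho: $\mu_M = \mu_W$.
The alternative hypothesis is Ha: $\mu_M \neq \mu_W$.
The value of the test statistic equals: $t = (\bar{x}_M - \bar{x}_W) / SE(\bar{x}_M - \bar{x}_W)$.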
Since the degrees of freedom equal 400, the p-value associated with this test statistic is approximately 0.045. Hence, there is a 4.5% chance of seeing a difference in the sample averages at least this large when in fact the two population averages are equal. This is a fairly small chance, so we reject the null hypothesis. There does appear to be evidence that the population average property taxes for men and women differ.
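As a rough numerical check of problems 12 and 15, the interval (5.77, 476.43) and the multiplier 1.97 are enough to back out the difference in sample averages, its SE, and the test statistic. The sketch below does this; it uses the normal curve in place of the t curve with 400 degrees of freedom (an assumption that shifts the p-value only slightly), and all variable names are illustrative.

```python
from statistics import NormalDist

lower, upper = 5.77, 476.43   # 95% confidence interval from problem 12
multiplier = 1.97             # t-table multiplier for about 400 df

diff = (lower + upper) / 2                # difference in sample averages
se = (upper - lower) / (2 * multiplier)   # SE of that difference
t_stat = diff / se                        # test statistic when Ho says the difference is 0

# Two-sided p-value; ASSUMPTION: normal curve used as a stand-in for t with 400 df
p_value = 2 * (1 - NormalDist().cdf(t_stat))

print(round(diff, 2), round(se, 2), round(t_stat, 2), round(p_value, 3))
```

The p-value from this sketch lands close to the 0.045 quoted above; the small gap comes from using the normal curve rather than the t-table.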
16. Check all of the following that are true.
_X_ It may be the case that the results are due to chance, and our conclusion from the hypothesis test is wrong.
_X_ The chance of getting a value of the test statistic as or more extreme than what was observed, assuming the null hypothesis is true, equals the p-value.
17. 0
18. _X_ The slope of the line would be positive.
19.
i) Let $p_P$ be the population percentage of people who get colds under placebo.
Let $p_C$ be the population percentage of people who get colds under Vitamin C.
The null hypothesis is: Ho: $p_P = p_C$.
The alternative hypothesis is Ha: $p_P \neq p_C$.
The value of the test statistic equals: $z = (\hat{p}_P - \hat{p}_C) / SE(\hat{p}_P - \hat{p}_C)$.
The p-value associated with this test statistic equals 0.014. Hence, there is a 1.4% chance of seeing a difference in the sample percentages at least this large when in fact the two population percentages are equal. This is a fairly small chance, so we reject the null hypothesis. There does appear to be evidence that the population incidence rates of colds when taking Vitamin C or placebo differ.
ii) You should not grant the request. The placebo ensures that any effects due to the way the drug is administered are equally present in the Vitamin C and control groups. For example, people may feel better because they are taking a pill, regardless of whether it is Vitamin C or not. A control group that does not take a pill would not have this effect.
iii) The SE would be approximately,
iv) Because the treatments were assigned randomly to the skiers, the background characteristics in the two groups should be similar. Hence, valid causal conclusions can be drawn for these people: Vitamin C appears to work for skiers. However, I am reluctant to generalize these conclusions to other populations because skiers may react differently than the general public.
20.
i) True.
ii) False. Larger sample size means smaller SE, which means narrower CI.
iii) False. A nonrandom pattern is a violation. We hope to find a random pattern.
iv) False. With a sample size of 4, it is hard to reject the null hypothesis in favor of the alternative. The SE is too large. Hence, we cannot conclude much at all from this study.
v) False. There was no control group over the same time period, so that we have no way to tell if it is the program or something else that caused their scores to increase.
vi) False. Pick the exam with the smaller SD so that scores will be closer to the average of 75.
vii) False. Management and sex are not independent, as can be seen in the conditional probabilities of being in management.
21.
i) True. The expected value for the sample percentage is .10, and the standard error for the sample percentage is .03. Translating to numbers out of 100, we have 10 give or take 3.
ii) False. Since the SE = .03, there is a 68% chance it will be between 7% and 13%, or 7 and 13.
iii) False. The population has 10% minority members. There is no “give or take.”
iv) False. There is roughly a 2.5% chance.
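The figures in parts i) and ii) can be reproduced from the SE formula for a sample percentage. This is a minimal sketch; the sample size of 100 is an assumption read off the phrase "numbers out of 100", and the variable names are illustrative.

```python
from math import sqrt

p = 0.10   # population percentage of minority members
n = 100    # ASSUMPTION: sample size implied by "numbers out of 100"

se_pct = sqrt(p * (1 - p) / n)   # SE of the sample percentage
expected_count = n * p           # expected number in the sample
se_count = n * se_pct            # SE of that number

print(round(se_pct, 2), round(expected_count), round(se_count))
# prints: 0.03 10 3
```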
22. i) The colleague is correct because E(C+I+G+X) = E(C)+E(I)+E(G)+E(X), whether or not the components are independent.
(ii) The colleague is not correct because he or she did not add in the covariance terms when computing the variance. We need to add 2Cov(C,I)+2Cov(C,G)+2Cov(C,X)+2Cov(I,G)+2Cov(I,X)+2Cov(G,X).
23. i) Let A be the event that you get an A.
Let S be the event that you study hard.
We want Pr(A|S).
We know that Pr(S|A) = .75, and that Pr(S| not A) = .20. Also, we have that Pr(A) = .40.
Hence, we can find that Pr(A|S) = Pr(A and S)/Pr(S)
= Pr(S|A)Pr(A) / Pr(S)
= (.75)(.40) / [(.75)(.40) + (.20)(.60)]
= .3/.42
= .714.
ii) We want Pr(A | not S).
Pr(A| not S) = Pr(A and not S) / Pr(not S)
= Pr(not S|A) Pr(A) / ( 1- Pr(S))
= (1-.75)(.40) / (1 – .42)
= .172.
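The two Bayes' rule calculations in problem 23 can be checked with a few lines of arithmetic. The sketch below uses only the three probabilities given above; the variable names are illustrative.

```python
p_a = 0.40              # Pr(A)
p_s_given_a = 0.75      # Pr(S | A)
p_s_given_not_a = 0.20  # Pr(S | not A)

# Law of total probability: Pr(S)
p_s = p_s_given_a * p_a + p_s_given_not_a * (1 - p_a)

# Bayes' rule for each conditional probability
p_a_given_s = p_s_given_a * p_a / p_s
p_a_given_not_s = (1 - p_s_given_a) * p_a / (1 - p_s)

print(round(p_s, 2), round(p_a_given_s, 3), round(p_a_given_not_s, 3))
# prints: 0.42 0.714 0.172
```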
24. i) Pr(Y<2) =
(ii) Pr(1.5<Y<2.5) =
(iii) Pr(Y<2.5 | Y>2) =
(iv) .
25. i)
ii)
iii)
iv)
So, SD(Y) = .7.
v)
vi)
26. Let X be the random variable for your earnings. The sample space of X is found by determining all possible outcomes (in terms of dollars) of the three rolls. Note that each roll is independent, and you have a 1/3 chance of winning on any roll. Below, W means win and L means lose.
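Rolls   x     Pr(X=x)
WWW     30    (1/3)(1/3)(1/3) = 1/27
WWL     14    (1/3)(1/3)(2/3) = 2/27
WLW     16    (1/3)(2/3)(1/3) = 2/27
LWW     18    (2/3)(1/3)(1/3) = 2/27
WLL     0     (1/3)(2/3)(2/3) = 4/27
LWL     2     (2/3)(1/3)(2/3) = 4/27
LLW     4     (2/3)(2/3)(1/3) = 4/27
LLL     -12   (2/3)(2/3)(2/3) = 8/27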
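The probabilities for this table follow from independence and the 1/3 chance of winning each roll. The sketch below enumerates the eight sequences with exact fractions, takes the dollar amounts from the table as given, and also computes the expected earnings implied by the table. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

p_win = Fraction(1, 3)  # chance of winning on any single roll

# Earnings in dollars for each win/lose sequence, copied from the table
earnings = {
    "WWW": 30, "WWL": 14, "WLW": 16, "LWW": 18,
    "WLL": 0, "LWL": 2, "LLW": 4, "LLL": -12,
}

# Rolls are independent, so multiply the per-roll probabilities
prob = {}
for rolls in product("WL", repeat=3):
    p = Fraction(1)
    for r in rolls:
        p *= p_win if r == "W" else 1 - p_win
    prob["".join(rolls)] = p

expected = sum(earnings[k] * prob[k] for k in earnings)

for k in earnings:
    print(k, earnings[k], prob[k])
print("total probability:", sum(prob.values()))
print("expected earnings:", expected)
```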
(ii)
27. Last problem
(i) False. The data are not 0-1 data, so the standard error formula for percentages does not apply. The data actually are continuous, even though the values lie between zero and one. People tip at different rates. (This is like the problem on the practice midterm with the alumni donation percentages.)
(ii) False. While the calculation of the z-statistics and area under the normal curve are correct, we cannot use the central limit theorem for a sample percentage when there are only 5 observations. Recall that we need np>10 and n(1-p)>10, both of which fail when n=5 and p=.30.
The way to solve this problem is to use the fact that
Pr(less than 40% blue) = Pr(0 blue or 1 blue) = Pr(0 blue) + Pr(1 blue).
Pr(0 blue) = (.7)(.7)(.7)(.7)(.7).
Pr(1 blue) = 5 (.7)(.7)(.7)(.7)(.3). The extra 5 comes from the five ways to get one blue in five M&Ms.
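The binomial calculation in part (ii) can be carried through numerically. This is a small sketch using the 30% chance of blue and the five M&Ms from the problem; the helper name `binom_pmf` is illustrative.

```python
from math import comb

n, p_blue = 5, 0.30

def binom_pmf(k, n, p):
    """Chance of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Fewer than 40% blue out of 5 means 0 blue or 1 blue
p_less_than_40 = binom_pmf(0, n, p_blue) + binom_pmf(1, n, p_blue)
print(round(p_less_than_40, 4))
# prints: 0.5282
```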
28. Invest your money with me. Yeah, I’ll take care of it.
a)
b)
c)
d)
Since Corr(X,Y) = Cov(X,Y) / (SD(X) SD(Y)), we have Cov(X,Y) = .45(.77)(.10)=.0348.
Plugging in, we get Var(.75Y+.25X) = .056
So, the SD = .237
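The variance of the combination .75Y + .25X can be sketched as below. The inputs are assumptions: Var(X) = 0.6 (so SD(X) is about .77) and Var(Y) = 0.01 (so SD(Y) = .10) are inferred from the rounded figures in part d) and are not stated in this answer key. Because the SD of X is not rounded here, the covariance may differ from .0348 in the last digit.

```python
from math import sqrt

# ASSUMPTION: inputs inferred from the rounded figures in part d)
corr = 0.45
sd_x = sqrt(0.6)   # about .77, i.e. assumed Var(X) = 0.6
sd_y = 0.10        # assumed Var(Y) = 0.01

cov_xy = corr * sd_x * sd_y   # Cov(X,Y) = Corr(X,Y) * SD(X) * SD(Y)

# Var(aY + bX) = a^2 Var(Y) + b^2 Var(X) + 2ab Cov(X,Y)
a, b = 0.75, 0.25
var_combo = a**2 * sd_y**2 + b**2 * sd_x**2 + 2 * a * b * cov_xy
sd_combo = sqrt(var_combo)

print(round(var_combo, 3), round(sd_combo, 3))
```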
e) Pr(X>0) = 0.5, based on integrating f(x).
Pr(at least one month positive) = 1 – Pr(no months positive) = 1 - .5^12.
f) There’s a 50% chance that any month is positive. We want the chance of getting at least 70/120 = 58.33% of the 120 months with positive return. Since 120 is a large sample size, we can use the central limit theorem to figure out this chance.
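Parts e) and f) can be evaluated numerically as follows. This is a sketch under the same assumptions the solution makes: months are independent and each is positive with chance 0.5. The normal approximation here uses no continuity correction, which is a simplification.

```python
from math import sqrt
from statistics import NormalDist

p = 0.5  # chance any one month is positive, from part e)

# Part e): at least one positive month out of 12
p_at_least_one = 1 - (1 - p) ** 12

# Part f): at least 70 positive months out of 120, by the central limit theorem
n = 120
se = sqrt(p * (1 - p) / n)   # SE of the sample percentage
z = (70 / n - p) / se        # standardized value of 58.33%
p_at_least_70 = 1 - NormalDist().cdf(z)

print(round(p_at_least_one, 4), round(z, 2), round(p_at_least_70, 3))
```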
29. Weather predictions
a) Pr(Sun | Pred says Sun) = Pr(Sun, Pred says Sun) / Pr(Pred says Sun)
= (.80)(.40) / ((.80)(.40) + (.10)(.30) + (.33)(.30)) = .71
b) Pr(Cloudy or Sunny | Pred says rain) = 1 – Pr(Rain | Pred says rain)
Pr(Rain | Pred says rain) = Pr(Rain, Pred says rain) / Pr(Pred says rain)
= (.5)(.3) / ((.5)(.3) + (.33)(.3) + (.05)(.4)) = .557.
So, Pr(cloudy or sunny | Pred says rain) = 1 - .557 = .443.
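Both parts of problem 29 apply Bayes' rule with the same prior chances of sun, clouds, and rain. The sketch below reads the inputs off the arithmetic above. Which of .10 and .33 belongs to cloudy versus rainy days in part a) is an assumption, but it does not affect the result because both have prior .30. The function name `posterior` is illustrative.

```python
# Prior chances of each type of weather, read off the arithmetic above
prior = {"sun": 0.40, "cloudy": 0.30, "rain": 0.30}

# Pr(prediction | actual weather); ASSUMPTION: cloudy/rain labels in the
# first dictionary are a guess, since both carry the same prior of .30
pred_sun_given = {"sun": 0.80, "cloudy": 0.33, "rain": 0.10}
pred_rain_given = {"sun": 0.05, "cloudy": 0.33, "rain": 0.50}

def posterior(weather, likelihood):
    """Bayes' rule: Pr(weather | prediction)."""
    total = sum(likelihood[w] * prior[w] for w in prior)  # Pr(prediction)
    return likelihood[weather] * prior[weather] / total

p_sun = posterior("sun", pred_sun_given)
p_not_rain = 1 - posterior("rain", pred_rain_given)

print(round(p_sun, 2), round(p_not_rain, 2))
# prints: 0.71 0.44
```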
30.
The key step is that in E(XE(Y)), we can pull out E(Y) because it is a constant, thus leaving E(Y)E(X).
31. .
To prove that , we have that
A similar proof is used for .
32.
.
b)
To make Var(V)<Var(W), we solve
for the group 2 sample size to obtain 83.333. So, we need at least 84 sampled individuals from group 2 to ensure that V is more efficient than W.
33. Drinking and Driving
i) The problem states that the measurement of breathalyzer in percentages for someone with blood alcohol level .095 follows a normal curve with mean .095 and standard deviation .004.
We want the probability of getting more than .10 as a measurement. Let's standardize .10 by subtracting the mean and dividing by the standard deviation:
z = (.10-.095)/.004 = 1.25
We want the area under the normal curve to the right of 1.25, which is .1056.
ii) The measurement of breathalyzer in percentages for someone with blood alcohol level .15 follows a normal curve with mean .15 and standard deviation of .004.
We want the chance that the measurement for this person is less than .10. Standardizing, we get
z = (.10 - .15) / .004 = -12.5.
The chance of observing a z-value less than (to the left of) -12.5 is very, very small (and not even on the table).
iii) For people with true levels of .10, the chance that any one individual will be booked is .50 (the median value for these people is .10). Hence, the probability that any one individual will not be booked is also 0.50.
Now, we want Pr(at least one individual is booked). Since the only possible outcomes are "at least one person is booked" and "no one is booked", it is true that: Pr(at least one is booked) + Pr(no one is booked) = 1. So, Pr(at least one is booked) = 1 - Pr(no one is booked). Now, Pr(no one is booked) = Pr(first person not booked, and second person not booked, and... etc..., and eighth person not booked, and ninth person not booked). Since the test is done separately on each person, we can assume that readings are independent.
Hence, Pr(first person not booked, and second person not booked, and..., etc..., and ninth person not booked) = Pr(first person not booked) * Pr(second person not booked) * ... * Pr(ninth person not booked)
= 0.5 * 0.5 * ... * 0.5 (nine of these 0.5s) = 0.5^9.
Finally, we have Pr(at least one person booked) = 1 - 0.5^9.
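The three breathalyzer calculations can be checked without a normal table. The sketch below uses Python's standard-library normal distribution in place of the table lookup; variable names are illustrative.

```python
from statistics import NormalDist

limit, sd = 0.10, 0.004

# i) true level .095: chance the reading exceeds .10
p_over = 1 - NormalDist(mu=0.095, sigma=sd).cdf(limit)

# ii) true level .15: chance the reading falls below .10
p_under = NormalDist(mu=0.15, sigma=sd).cdf(limit)

# iii) nine independent people, each booked with chance .5
p_at_least_one = 1 - 0.5 ** 9

print(round(p_over, 4), p_under, round(p_at_least_one, 4))
```

The value for part ii) is far too small to appear on a printed table, which matches the remark above.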
34. Sex of children
i) Because the babies' sexes are independent, Pr ( M, F, M, F, M, F) = Pr(M) * Pr(F) * Pr(M) * Pr(F) * Pr(M) * Pr(F) = .4986 * .5014 * .4986 * .5014 * .4986 * .5014
ii) Because the babies' sexes are independent, Pr ( F, F, F, F, F, F) = Pr(F) * Pr(F) * Pr(F) * Pr(F) * Pr(F) * Pr(F) = .4986 * .4986 * .4986 * .4986 * .4986 * .4986
iii) Because the babies' sexes are independent, Pr ( F | M, M, M ) = Pr (F) = .5014
35. To find the MLE, we first need the joint distribution of the data.
The likelihood function equals this probability, conceived as a function of the unknown . We then find the value of that maximizes the likelihood function. This is done by setting the derivative of the function to equal zero, which results in:
After cancellations, we get that is the maximum likelihood estimate.
36. a) I would use the chi-squared analysis (analysis ii). These data are counts, not continuous data. The t-test assumes that the data are continuous. Plus, if you think about it, the sample average frequency for the digits has to equal 50, so that the t-test is completely meaningless.
b) Because the p-value is so large, there is a good chance of seeing these results when the null hypothesis is true. Therefore, we cannot reject the null hypothesis. The data are consistent with the hypothesis that each digit has a 1 in 10 chance of being selected.
c) We just need to make one frequency really small and the other really large. For example, we could make the frequency for eight to be 1, and the frequency for nine to be 99. This would mean that eight happens much less than 10% of the time, and nine happens much more than 10% of the time, thereby rejecting the null hypothesis. Note that the two frequencies should add to 100 since we are not changing the frequencies for other digits, and the original sum of the frequencies was 100.
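To see how strongly the change in part c) moves the chi-squared statistic, here is a sketch with hypothetical counts. The frequencies for digits 0 through 7 are set to 50 each purely for illustration, since this answer key does not list the original frequencies; only the values 1 and 99 for eight and nine come from the text.

```python
# HYPOTHETICAL counts: digits 0-7 are set to 50 each for illustration only;
# the original frequencies are not listed in this answer key.
observed = [50] * 8 + [1, 99]   # digits 0..9, with eight = 1 and nine = 99

n = sum(observed)
expected = n / 10               # each digit has a 1 in 10 chance under the null
chi_sq = sum((o - expected) ** 2 / expected for o in observed)

critical_5pct = 16.92           # chi-squared table: 9 degrees of freedom, 5% level
print(round(chi_sq, 2), chi_sq > critical_5pct)
```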
37. a) You want to make a table so that the percentage of males among those who like red equals the percentage of males among those who like blue, and also equals the percentage of males among those who like green. An example is shown here.
Favorite color   Male   Female
Red              10     20
Blue             10     20
Green            20     40
Total            40     80
b) You want to make a table so that the percentage of males among those who like red is very different from the percentage of males among those who like green or among those who like blue. An example is shown here.
Favorite color   Male   Female
Red              1      78
Blue             1      1
Green            38     1
Total            40     80
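One way to compare the two example tables is to compute the share of males within each favorite color. The sketch below does this for both tables; the function name `male_share_by_color` is illustrative.

```python
def male_share_by_color(table):
    """Fraction of males among the people who picked each color."""
    return {color: m / (m + f) for color, (m, f) in table.items()}

# (male count, female count) for each color, copied from the two tables
table_a = {"Red": (10, 20), "Blue": (10, 20), "Green": (20, 40)}
table_b = {"Red": (1, 78), "Blue": (1, 1), "Green": (38, 1)}

shares_a = male_share_by_color(table_a)
shares_b = male_share_by_color(table_b)

for name, shares in (("a", shares_a), ("b", shares_b)):
    print(name, {color: round(s, 3) for color, s in shares.items()})
```

In table a) every color has the same share of males, while in table b) the shares range from near 0 to near 1.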