Practice Midterm Stat 112 D. Small

(For first midterm, scheduled October 9th, 3:00-4:20 p.m.)

Instructions: Closed book. Calculators and one (two-sided) page of notes allowed. Write answers on the test pages along with your work. Use the back of the test or extra pages as necessary. If a question says to explain your answer, you will get no credit without some explanation. When performing hypothesis tests, include a statement of the null and alternative hypotheses. Time: 80 minutes. No questions will be entertained during the exam.

I. For each of the following multiple-choice questions, write the letter of the best answer below the question. Do not give any explanation.

1. In formulating hypotheses for a statistical test of significance, the null hypothesis is often

(A) A statement of “no effect” or “no difference.”

(B) The probability of observing the data you actually obtained

(D) 0.05

2. The P-value of a test of a null hypothesis is

(A) The probability, assuming the null hypothesis is true, that the test statistic will take a value at least as large as that actually observed

(B) The probability, assuming that the null hypothesis is false, that the test statistic will take a value at least as extreme as that actually observed

(D) The probability that the null hypothesis is false

3. I collect a random sample of size n from a population and from the data collected compute a 95% confidence interval for the mean of the population. Which of the following would produce a new confidence interval with larger width (larger margin of error) based on these same data?

(A) Use a larger confidence level

(B) Use a smaller confidence level

(D) Nothing can guarantee absolutely that you will get a larger interval. One can only say the chance of obtaining a larger interval is 0.05.

4. A certain population follows a normal distribution with mean μ and standard deviation σ = 22.1. You collect data on four members of the population and test the hypotheses

vs.

You obtain a p-value of 0.052. Which of the following is true?

(A) At the 5% significance level, you have proved that is true.

(B) You have failed to obtain any evidence for .

(D) This should be viewed as a pilot study, and the data suggests that further investigation of the hypotheses will not be fruitful.

5. A researcher wishes to determine if students are able to complete a certain pencil and paper maze more quickly while listening to classical music. Suppose the time (in seconds) needed for high school students to complete the maze while listening to classical music follows a normal distribution with mean and standard deviation . Suppose also that in the general population of all high school students, the time needed to complete the maze follows a normal distribution with mean 40 and standard deviation . The researcher, therefore, decides to test the hypotheses

vs. .

To do so, the researcher has 10,000 high school students complete the maze with classical music playing. The mean time for these students is seconds and the p-value is less than 0.0001. It is appropriate to conclude which of the following?

(A) The researcher has proved that for high school students, listening to classical music substantially improves the time it takes to complete the maze.

(B) The researcher has strong evidence that for high school students, listening to classical music substantially improves the time it takes to complete the maze.

(C) The researcher has moderate evidence that for high school students, listening to classical music substantially improves the time it takes to complete the maze.

(D) None of the above.

II. Answer the following questions based on the setting described below. A nutrition researcher received two cages of experimental rats from an experimental animal supply company. All 12 rats in the first cage were given a high protein diet. All 7 rats in the second cage were given a low protein diet. The weight gains were compared. The JMP output is shown below.

Oneway Analysis of Weight gain (grams) By Diet

Means and Std Deviations

Level   Number   Mean      Std Dev   Std Err Mean   Lower 95%   Upper 95%
High    12       51.5000   12.5951   3.6359         43.497      59.503
Low     7        42.5714   12.4480   4.7049         31.059      54.084

t-Test

            Difference   t-Test   DF   Prob > |t|
Estimate    8.929        1.497    17   0.1528
Std Error   5.966
Lower 95%   -3.658
Upper 95%   21.515
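
The Difference, Std Error, t-Test, and DF entries in the t-Test block can be related to the Means and Std Deviations table by hand. The following Python sketch is an illustration only and is not part of the JMP output. It assumes the pooled (equal-variance) two-sample form, which is consistent with DF = 17 = 12 + 7 - 2; the function name pooled_t is hypothetical.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled (equal-variance) two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    # Standard error of the difference between the two sample means.
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = mean1 - mean2
    return diff, se, diff / se, df

# Summary statistics copied from the table: High (n=12) and Low (n=7).
diff, se, t_stat, df = pooled_t(12, 51.5000, 12.5951, 7, 42.5714, 12.4480)
print(f"difference={diff:.3f}  std error={se:.3f}  t={t_stat:.3f}  df={df}")
```

Under that assumption, the printed values should agree with the t-Test block up to rounding.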

1. What is the p-value for the two-sided test of the null hypothesis that there is no treatment effect versus the alternative that there is some treatment effect (positive or negative)?

2. What is the p-value for the one-sided test of the null hypothesis that there is no treatment effect versus the alternative that the high-protein diet leads to larger weight gains?

3. Answer the following true or false questions – no explanation necessary.

(a) The study was a randomized experiment.

(b) There is overwhelming evidence that the difference in weight gains in the two treatment groups is greater than can be explained by chance.

(d) In the high protein group, the sample median is less than 45.

(e) At least 75% of the low protein rats had weight gains greater than 30g.

III. Short answer questions. Provide brief explanations, no more than a few sentences.

1. A researcher performed a comparative experiment on laboratory rats. Rats were assigned to group 1 haphazardly by pulling them out of the cage without thinking about which one to select. Should others question the claim that this was as good as a randomized experiment? Explain.

2. More people get colds during cold weather than during warm weather. Does that prove that cold temperatures cause people to get colds? Explain.

3. A number of volunteers were randomly assigned to one of two groups, one of which received daily doses of vitamin C and one of which received daily placebos (without any active ingredient). It was found that the rate of colds was lower in the vitamin C group than in the placebo group. It became evident, however, that many of the subjects in the vitamin C group correctly guessed that they were receiving vitamin C rather than placebos, because of the taste. Can it still be said that the difference in treatments caused the difference in cold rates? Explain.

4. In order to halve the margin of error of a 95% confidence interval for the mean of a population with a normal distribution, by what factor should the sample size of a simple random sample be increased?

5. Consider blood pressure levels for populations of young women using birth control pills and young women not using birth control pills. A comparison of these two populations through an observational study might be consistent with the theory that the pill elevates blood pressure levels. What tool is appropriate for addressing whether there is a difference between these two populations? What tool is appropriate for addressing the likely size of the difference?

6. What is wrong with the hypothesis that is 0?

7. Will an outlier from a contaminating population be more consequential in small samples or large samples?

8. The following data are metabolic expenditures for 8 patients admitted to a hospital for reasons other than trauma and 7 patients admitted for multiple fractures (trauma).

Metabolic Expenditures (kcal/kg/day)

Nontrauma patients: 20.1, 22.9, 18.8, 20.9, 20.9, 22.7, 21.4, 20.0

Trauma patients: 38.5, 25.8, 22.0, 23.0, 37.6, 30.0, 24.5

(a) Is the difference in averages resistant?

(b) Replacing each value with its rank, from lowest to highest in the combined sample gives

Metabolic Expenditure ranks

Nontrauma patients: 3, 9, 1, 4.5, 4.5, 8, 6, 2

Trauma patients: 15, 12, 7, 10, 14, 13, 11

Compare the average of the ranks for the trauma group minus the average of the ranks for the nontrauma group. Is this statistic resistant?
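
The ranks listed in part (b) can be reproduced from the raw data. The Python sketch below is illustrative only; the helper name midranks is hypothetical, and it assumes the usual convention that tied values (here the two 20.9 readings) share the average of the positions they occupy.

```python
def midranks(values):
    """Rank values from lowest to highest; tied values share the average position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

nontrauma = [20.1, 22.9, 18.8, 20.9, 20.9, 22.7, 21.4, 20.0]
trauma = [38.5, 25.8, 22.0, 23.0, 37.6, 30.0, 24.5]

# Rank within the combined sample, then split back into the two groups.
ranks = midranks(nontrauma + trauma)
nontrauma_ranks = ranks[:len(nontrauma)]
trauma_ranks = ranks[len(nontrauma):]
print(nontrauma_ranks)
print(trauma_ranks)
```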

9. A sociologist identified 15 days on which there was a newspaper article about a suicide in New York City and 15 other days on which there was no article about suicide. For each of these days she determined, from public health records, the number of suicides in New York City in the following week. She wished to see whether the mean number of suicides is greater in weeks following a newspaper article about suicide than in weeks that don’t follow a publicized suicide. Is the data structure paired or two independent samples? Can a statistical statement of causation be made?

IV. An observational study to contrast cholesterol levels in rural and urban Guatemalans came up with the data shown below. The samples are not random samples. Assess the difference in cholesterol distributions. Write a summary statistical report concerning whether there is a difference in the cholesterol distributions and by how much they differ. Include in your report a statement about the validity of the statistical tools you use and what inferences are possible.

Oneway Analysis of CHOLESTEROL By GROUP

t-Test

            Difference   t-Test   DF   Prob > |t|
Estimate    -59.867      -8.078   92   <.0001
Std Error   7.411
Lower 95%   -74.585
Upper 95%   -45.148

Assuming equal variances

Means and Std Deviations

Level   Number   Mean      Std Dev   Std Err Mean   Lower 95%   Upper 95%
RURAL   49       157.000   31.7562   4.5366         147.88      166.12
URBAN   45       216.867   39.9201   5.9509         204.87      228.86
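
When reading output like this, it can help to see how the columns relate to one another. The Python sketch below is illustrative only and is not part of the JMP output. It assumes that Std Err Mean is the group standard deviation divided by the square root of the group size, and that the interval for the difference has the form estimate ± multiplier × standard error; all variable names are hypothetical.

```python
import math

# Summary statistics copied from the Means and Std Deviations table: (n, mean, sd).
groups = {
    "RURAL": (49, 157.000, 31.7562),
    "URBAN": (45, 216.867, 39.9201),
}

# Std Err Mean is the group standard deviation divided by sqrt(n).
std_err_mean = {name: sd / math.sqrt(n) for name, (n, _mean, sd) in groups.items()}

# Values copied from the t-Test block for the difference (RURAL minus URBAN).
estimate, se_diff, lower, upper = -59.867, 7.411, -74.585, -45.148

# The interval is estimate +/- multiplier * se_diff, so the multiplier
# can be backed out from the half-width of the interval.
multiplier = (upper - lower) / (2 * se_diff)

for name, se in std_err_mean.items():
    print(f"{name}: Std Err Mean = {se:.4f}")
print(f"implied multiplier = {multiplier:.3f}")
```

The implied multiplier should come out close to 2, as expected for a 95% interval with 92 degrees of freedom.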