Part I.

True/False questions. Read each statement and circle “true” if it is correct and “false” if incorrect (2 points each).

1. A t-distribution with 1000 degrees of freedom is wider than a normal distribution.

True    False

2. Other factors equal, a confidence interval will be wider if the standard deviation is known than if it is estimated from a sample.

True    False

3. The regression model is appropriate when the means of y conditional on x fall in roughly a straight line.

True    False

4. When we know s instead of σ, we should use the t-distribution in place of the normal distribution to conduct hypothesis tests for a mean.

True    False

5. Holding other factors constant, a prediction interval calculated for an individual prediction from a regression equation will usually be wider than a confidence interval for a mean response predicted from the same regression equation.

True    False

Part II

Multiple choice: For the problems below, circle the answer. There is only one correct answer to each question.

6. A radio talk show host with a large audience is interested in the proportion p of adults in his listening area who think the drinking age should be lowered to eighteen. To find this out, he poses the following question to his listeners: "Do you think that the drinking age should be reduced to eighteen in light of the fact that eighteen-year-olds are eligible for military service?" He asks listeners to phone in and vote "yes" if they agree the drinking age should be lowered and "no" if not. Of the 100 people who phoned in, 70 answered "yes." Which of the following assumptions for inference about a proportion using a confidence interval are violated?

A) The data are an SRS from the population of interest.

B) The population is not highly skewed.

C) n is so large that both the count of successes np and the count of failures n(1 – p) are ten or more.

D) A and B

E) A and C

F) There appear to be no violations.

7. In forming a hypothesis for a statistical test of significance, the null hypothesis is often

A) The statement of “no effect” or “no difference”

B) The probability of observing the data you actually obtained.

C) The statement that the data are all 0.

D) .05

E) A and B

F) None of the above

8. I draw an SRS of size 15 from a population that has a normal distribution with mean μ and standard deviation σ. The one-sample t statistic has how many degrees of freedom?

A) 15

B) 14

C)

D) We cannot determine the degrees of freedom without knowing the value of s.

E) We do not need to calculate degrees of freedom here.

9. An SRS of 100 postal employees found that the average time these employees had worked for the postal service was 7 years with standard deviation 2 years. Suppose we are not sure if the population distribution is normal. Which of the following situations would violate the conditions for using t procedures?

A) A histogram of the data shows mild skewness

B) A stemplot of the data has a small outlier

C) The sample standard deviation is large

D) A and B

E) B and C

F) A and C

G) A, B, and C

H) None of the above.

10. Which of the following is an example of a matched pairs design?

A) A teacher compares the pre-test and post-test scores of students.

B) A teacher compares the scores of students using a computer based method of instruction with the scores of other students using a traditional method of instruction.

C) A teacher compares the scores of students in her class on a standardized test with the national average score.

D) A teacher calculates the average of scores of students on a pair of tests and wishes to see if this average is larger than 80%.

E) A and B

F) B and C

G) C and D

H) A and D

I) None of the above

11. Eighty rats whose mothers were exposed to high levels of tobacco smoke during pregnancy were put through a simple maze. The maze required the rats to make a choice between going left or going right at the outset. Sixty of the rats went right when running the maze for the first time. Assume that the eighty rats can be considered an SRS from the population of all rats born to mothers exposed to high levels of tobacco smoke during pregnancy. (Note that this assumption may or may not be reasonable, but researchers often assume lab rats are representative of such larger populations because lab rats are often bred to have very uniform characteristics.) The standard error for the proportion p of those who went right the first time when running the maze is

A) 0.0023. B) 0.0484. C) 0.0548. D) 0.0559.
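To check your work after attempting question 11, the standard error of a sample proportion, sqrt(p̂(1 – p̂)/n), can be sketched in Python. The function name below is an illustrative choice, not something defined in the exam; the counts are the ones stated in the question.

```python
import math


def proportion_standard_error(successes: int, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Question 11: 60 of the 80 rats went right on their first run.
se_right = proportion_standard_error(60, 80)
print(round(se_right, 4))
```

Compare the printed value with the four answer choices.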

12. The weights of three adult males are (in pounds) 160, 215, and 195. The standard error of the mean of these three weights is

A) 190.00. B) 27.84. C) 22.73. D) 16.07.
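For question 12, a short self-check sketch: the standard error of the mean is the sample standard deviation (computed with an n – 1 denominator) divided by the square root of n. The variable names are illustrative.

```python
import math
import statistics

# Weights from question 12, in pounds.
weights = [160, 215, 195]

sample_mean = statistics.mean(weights)
sample_sd = statistics.stdev(weights)  # sample SD, n - 1 in the denominator
standard_error = sample_sd / math.sqrt(len(weights))

print(sample_mean, round(sample_sd, 2), round(standard_error, 2))
```

Note that the sample mean and the sample standard deviation both appear among the answer choices as distractors; only the last printed value is the standard error.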

Use the following text to answer questions 13 to 15: In 1989 two researchers surveyed a group of ninety-four third and fourth grade children asking them to rate their level of fearfulness about a variety of situations. Two years later, the children again completed the same survey. The researchers computed the mean fear rating for each child in both years and were interested in the relation between these ratings. They then assumed that the true regression line was

1991 Mean Rating = α + β(1989 Mean Rating)

Suppose we wish to predict the 1991 mean fear rating for all children who had a 1989 mean fear rating of 1.5. We use statistical software to do the prediction and obtain the following output.

1989 Mean Fear Rating    Fit       StDev Fit    95% C.I.          95% P.I.

1.5                      1.4748    0.0284       (1.418, 1.531)    (1.000, 1.950)

13. The explanatory variable in this study is

A) 1989 mean fear ratings.

B) 1991 mean fear ratings.

C) the difference in the mean fear ratings for the two years.

D) the particular group of ninety-four children used in the study.

14. A 95% confidence interval for the 1991 mean fear rating for all children who had a 1989 mean fear rating of 1.5 is

A) 1.418 to 1.531. B) 1.000 to 1.950. C) 1.4748 ± 0.0284. D) 1.5 ± 0.0284.

15. Suppose we wish to predict the 1991 mean fear rating for a child who had a 1989 mean fear rating of 1.5. A 95% interval for this prediction is

A) 1.418 to 1.531. B) 1.000 to 1.950. C) 1.4748 ± 0.0284. D) 1.5 ± 0.0284.

Part III: Numerical and short answer questions. Please answer the questions below. If you hope to get partial credit, please show your work.

1. For each case below, find the critical value (t* or z*) you would use for the significance test or confidence interval (your answer should be a number). State whether you would use z or t, and give the degrees of freedom if appropriate.

A. A 99% confidence interval for the two-sample difference in mean earned income between men and women, based on a sample of 30 women and 20 men, with means and standard deviations calculated from the sample. (3 points)

B. A 98% confidence interval for the difference between two proportions, with n = 50 for both samples. (3 points)

C. A two-tailed significance test for a regression slope, based on a regression with n = 20, at α = .04. (3 points)

2. The 1958 Detroit Area study investigated the influence of religion on everyday life. The sample was a simple random sample of the population of Detroit. One question asked if the right to free speech includes the right to make speeches in favor of communism. Of the 267 white Protestants, 104 said “yes”; of the 230 white Catholics, 75 said “yes”.

A. What proportion of white Protestants agreed that free speech includes the right to make speeches in favor of communism? What proportion of white Catholics agreed that free speech includes the right to make speeches in favor of communism? (2 points)

B. A historian claims that a minority of white Protestants in Detroit in 1958 believed that free speech included the right to make speeches in favor of communism.

1. What are the hypotheses to analyze this claim? (2 points)

2. What is the test statistic corresponding to this claim? (3 points)

3. What conclusion does this lead you to about the historian’s claim at α = .01? (2 points)

C. Give a 95% confidence interval for the difference between the proportion of white Protestants who agreed that communist speeches are protected and the proportion of Catholics who hold this opinion. (5 points)
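For checking your answers to question 2 after working it by hand, here is a minimal sketch using only the Python standard library. It rests on two assumptions that are not stated in the exam: that "a minority" is read as the one-sided alternative p < 0.5 against the null p = 0.5, and that the large-sample (unpooled) standard error is the intended method for the interval in part C. The variable names are illustrative.

```python
import math
from statistics import NormalDist

# Counts from question 2.
yes_protestant, n_protestant = 104, 267
yes_catholic, n_catholic = 75, 230

# Part A: sample proportions.
p_protestant = yes_protestant / n_protestant
p_catholic = yes_catholic / n_catholic

# Part B (assumed reading): H0: p = 0.5 against Ha: p < 0.5.
# The standard error uses the null value p0, not the sample proportion.
p0 = 0.5
z_stat = (p_protestant - p0) / math.sqrt(p0 * (1 - p0) / n_protestant)
p_value = NormalDist().cdf(z_stat)  # lower-tail area under the standard normal

# Part C: large-sample 95% interval for p_protestant - p_catholic.
z_star = NormalDist().inv_cdf(0.975)
se_diff = math.sqrt(
    p_protestant * (1 - p_protestant) / n_protestant
    + p_catholic * (1 - p_catholic) / n_catholic
)
diff = p_protestant - p_catholic
ci_low = diff - z_star * se_diff
ci_high = diff + z_star * se_diff

print(f"proportions: {p_protestant:.4f}, {p_catholic:.4f}")
print(f"z = {z_stat:.3f}, one-sided p-value = {p_value:.5f}")
print(f"95% CI for the difference: ({ci_low:.4f}, {ci_high:.4f})")
```

If your course uses a table value such as 1.96 for z*, the interval endpoints will agree with the sketch to about three decimal places.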

3. We observe a sample mean of 80 with a sample standard deviation of 10 from a sample of size 25. We want to test the hypotheses:

H0: μ = 64

HA: μ > 64

A. What is the standard error of the sample mean? (3 points)

B. What is the test statistic for the test of the null hypothesis? Give the degrees of freedom if appropriate (5 points)

C. What would the critical value be for the test at α = .01? (3 points)

4. In each of the following problems, explain whether or not we can safely rely on the t-distribution or normal distribution of the test statistic to compute the confidence interval or hypothesis test, based on the “rules” we learned in class.

A. We draw a simple random sample of 25 men and 25 women, and find out the income earned by each subject. Income is very highly skewed in the population (among both men and women). Can we use a t-test to examine whether mean income is equal for men and women? Why or why not? (3 points)

B. We draw a simple random sample of 500 cab drivers from the population of all cab drivers in the United States. Of that sample, 450 have clean cabs and 50 have dirty cabs. Can we use the normal distribution to test the hypothesis that more than 80% of all cab drivers have clean cabs? Why or why not? (3 points)

C. A researcher runs a regression including 200 cases. His diagnostics suggest that the residuals follow a highly skewed distribution.

(1) Is it appropriate to use the regression to generate predictive intervals? Why or why not? (3 points)

(2) Is it appropriate to use the results to do a hypothesis test for a regression slope? Why or why not? (3 points)

5. Below is the output for a t-test comparing the number of hours worked by married men and single men. The last line of the usual output has been deleted.

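Two-sample t test                      Single: Number of obs = 200
                                      Married: Number of obs = 553

------------------------------------------------------------------------------
Variable |      Mean    Std. Err.        t     P>|t|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
  single |     40.21    .9878557   40.7043    0.0000    38.26199    42.15801
 married |  42.53888    .6147335   69.1989    0.0000    41.33138    43.74638
---------+--------------------------------------------------------------------
    diff | -2.328879     1.16351   -2.0016    0.0461   -4.616939   -.0408188
------------------------------------------------------------------------------

Ho: mean(single) - mean(married) = diff = 0

  Ha: diff < 0          Ha: diff ~= 0          Ha: diff > 0
  t = -2.0016           t = -2.0016            t = -2.0016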

A. What null hypothesis is being tested above? (you don’t need to write down the alternative) (2 points)

B. Based on the above output, what is a 95% confidence interval for the mean of hours worked by married men? (2 points)

C. Give a range for the p-value with the alternative hypothesis shown in the output as “Ha: diff < 0”. (3 points)

6. Below is a plot of residuals versus x from a regression of doctors per 1000 persons (response) on illiteracy (explanatory).

A. Based on visual inspection of the above graph, how many of the data observations have residuals that are larger than 2? (2 points)

B. Based on the residual graph, do you think the author can use the regression model unaltered and trust the significance tests? If something is wrong, what is it? (7 points)

7. A student runs a regression of the number of hours of TV watched per day (response) on the years of education (explanatory) for a simple random sample of respondents from the General Social Survey. She obtains the following results:

Sample size n = 1465

Intercept a: 5.433871

Slope b: -.1895677

Standard Error of b: .0189853

A. Can we conclude at α = .01 that there is a significant relationship between the number of hours of TV watched and education? (2 points)

B. What is a 90% confidence interval for the slope coefficient? (3 points)

C. Based on the model, what do we predict will be the average number of hours of TV watched by all persons with 16 years of education? (2 points)

D. Based on the model, what do we predict will be the number of hours of TV watched by a single person with 12 years of education? (2 points)

E. Which of the two predictions (part C or part D) will be more accurate? (2 points)
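The arithmetic behind parts A through D of question 7 can be sketched as follows, for checking your hand calculations. One assumption is made loudly here: because n – 2 = 1463 degrees of freedom is very large, the sketch uses standard normal critical values in place of t*, which differ only in the third decimal place or so. The helper name predicted_tv_hours is hypothetical.

```python
from statistics import NormalDist

# Regression results given in question 7.
n = 1465
intercept_a = 5.433871
slope_b = -0.1895677
se_b = 0.0189853

# Part A: test statistic for H0: slope = 0, with n - 2 degrees of freedom.
t_stat = slope_b / se_b
df = n - 2

# Assumption: with df this large, z* is used as a stand-in for t*.
critical_01 = NormalDist().inv_cdf(1 - 0.01 / 2)  # two-sided, alpha = .01
significant_at_01 = abs(t_stat) > critical_01

# Part B: 90% confidence interval for the slope.
critical_90 = NormalDist().inv_cdf(0.95)
ci_low = slope_b - critical_90 * se_b
ci_high = slope_b + critical_90 * se_b


def predicted_tv_hours(years_of_education: float) -> float:
    """Point prediction from the fitted line a + b * x."""
    return intercept_a + slope_b * years_of_education


# Parts C and D: the point prediction is computed the same way for a mean
# response and for an individual; only the interval widths differ.
print(f"t = {t_stat:.3f} on {df} df, significant at .01: {significant_at_01}")
print(f"90% CI for slope: ({ci_low:.4f}, {ci_high:.4f})")
print(f"x = 16: {predicted_tv_hours(16):.3f} hours")
print(f"x = 12: {predicted_tv_hours(12):.3f} hours")
```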

8. Test-preparation courses like the Princeton Review have long claimed that they can improve students' scores on college entrance exams like the SAT. Six students record their SAT scores before and after taking the Princeton Review course. Assume that scores on SAT tests are normally distributed (both before and after taking a test-preparation class). Their scores are:

Student    Before    After

1          400       550

2          450       610

3          520       490

4          540       540

5          480       460

6          550       620

On the basis of these 6 students, can we conclude that scores of students who take the course are significantly higher than before they took the course? What is the p-value for analyzing this claim based on the data, and what conclusion do you draw from that p-value? (7 points)
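Because each student is measured twice, question 8 calls for a matched pairs analysis on the after-minus-before differences. The sketch below computes the paired t statistic with the standard library only. It assumes the score table reads, in order, as before scores 400, 450, 520, 540, 480, 550 and after scores 550, 610, 490, 540, 460, 620. It stops at the test statistic and degrees of freedom, so the p-value range still has to be read from a t table.

```python
import math
import statistics

# Scores for the six students (assumed reading of the table in question 8).
before = [400, 450, 520, 540, 480, 550]
after = [550, 610, 490, 540, 460, 620]

# Matched pairs: analyze the per-student differences, not the two columns.
differences = [a - b for a, b in zip(after, before)]

mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample SD, n - 1 in the denominator
se_diff = sd_diff / math.sqrt(len(differences))

# One-sample t statistic for H0: mean difference = 0, Ha: mean difference > 0.
t_stat = mean_diff / se_diff
df = len(differences) - 1

print(f"differences: {differences}")
print(f"mean = {mean_diff}, SD = {sd_diff:.2f}, SE = {se_diff:.2f}")
print(f"t = {t_stat:.3f} on {df} df")
```

Compare t with the upper-tail critical values in the row for the printed degrees of freedom to bracket the one-sided p-value.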

End of Practice Exam.

