Materials Covered: Chapter 7

STT 315, Summer 2006

Lecture 6

Suggested Exercises: 7.1-7.5, 7.11, 7.16, 7.21, 7.23, 7.39, 7.59, 7.60.

1. Hypothesis testing

Confidence intervals are used for estimation. Hypothesis testing is used for making decisions. We'll learn about hypothesis tests via an example related to ESP (Extrasensory Perception).

2. Basic concepts (Section 7-2)

Example 7.1 (Rhine's ESP experiments): In the 1930s, J.B. Rhine and others conducted experiments to test whether a person had ESP.

A. The procedure

--- A deck of cards with 5 designs (square, circle, star, plus sign, wavy lines) was used.

--- The cards were shuffled thoroughly by the experimenter.

--- A card was turned over each minute, and the subject had to write down the design without seeing the card.

B. The hypotheses, informally

We are to decide between the two competing claims:

--- The subject does not have ESP

--- The subject has ESP

We want to rewrite these in the context of a probability model. Let p stand for the probability that the subject correctly identifies a card.

If the subject does not have ESP, we'd expect p = 0.2.

If the subject does have ESP, we'd expect p > 0.2.

C. The hypotheses, formally

We are to decide between p = 0.2 and p > 0.2 based on the data.

We'll call the claim p = 0.2 the null hypothesis, denoted H0.

We'll call the claim p > 0.2 the alternative hypothesis, denoted H1.

More concisely, the hypotheses are

H0: p = 0.2 vs H1: p > 0.2

In this problem we are not interested in cases where p < 0.2, since this wouldn't provide evidence for ESP. But it is sometimes convenient to include these cases in the null hypothesis, so that we write H0: $p \le 0.2$.

The way we'll test hypotheses, only the parameter value in H0 that is closest to the alternative hypothesis H1 influences the test, so we'll get the same results whichever way we state the null hypothesis. So we'll write the null hypothesis in the form that makes most sense in the context of the problem.

Note that the burden is on the experimenter to disprove H0. We'll always try to set up hypotheses this way, where we'll stick with H0 unless there's strong evidence against it.

It is useful to think of these in legal terms:

H0: the defendant is innocent vs. H1: the defendant is guilty

In our legal system, a defendant is presumed innocent until proven guilty.

D. Possible errors

In making a decision, we risk 2 possible errors.

Deciding against H0 when it is in fact true is called a “Type I error”.

In the legal analogy, a Type I error means convicting an innocent person.

Deciding to stick with H0 when H1 is in fact true is called a “Type II error”.

In the legal analogy, a Type II error means acquitting a guilty person.

E. The test statistic

We need to use the data to make a decision. As usual, let $\hat{p}$ stand for the sample proportion of correct answers. Intuitively, if $\hat{p}$ is significantly greater than 0.2, we'll decide in favor of H1.

Define

\[ Z = \frac{\hat{p} - 0.2}{\sqrt{0.2 (1 - 0.2) / n}}, \]

where n is the number of cards. Saying that $\hat{p}$ is significantly greater than 0.2 is the same as saying that Z is significantly greater than 0. (As long as we define “significantly” properly in both cases.) It's more convenient to work with Z, because we know that if H0 is true, then Z (approximately, for large n) has a standard normal distribution. We call Z the test statistic.

F. The p-value

Important note: Don't confuse the p-value with the parameter p.

How do we decide whether Z is significantly larger than 0?

We'll assume H0 is true, and see how likely it is that we'd get a value of Z as large as or larger than the one we got in the experiment. The answer is the p-value.

G. Computing the p-value

A specific experiment of the type described was performed in 1938.

A large number of students were used as subjects. There were a total of 60000 cards used. The subjects got 12489 of the 60000 correct, which is a proportion of 0.20815 correct. The observed proportion $\hat{p}$ = 0.20815 is bigger than 0.2, but is it large enough to choose H1? To answer this, we compute the observed value of the Z statistic:

\[ z = \frac{0.20815 - 0.2}{\sqrt{0.2 (1 - 0.2) / 60000}} \approx 4.99 \]

The p-value is the probability that a standard normal random variable is greater than 4.99, which is a very small number (about 0.0000003 from the z table).
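The arithmetic in this step can be checked numerically. The following is a minimal sketch, assuming the test statistic is (observed proportion − 0.2) divided by the square root of 0.2(1 − 0.2)/n; the helper name `upper_tail` is introduced here for illustration and is not part of the notes.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n = 60000          # total number of cards
correct = 12489    # number of correct identifications
p0 = 0.2           # value of p under H0

p_hat = correct / n                                  # observed sample proportion
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)      # observed test statistic
p_value = upper_tail(z)                              # right-sided p-value

print(f"p_hat = {p_hat:.5f}, z = {z:.2f}, p-value = {p_value:.1e}")
```

Using `math.erfc` rather than 1 minus a CDF keeps the tiny tail probability accurate, which matters when z is this far from 0.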

H. Drawing conclusions

Such a small p-value (0.0000003) provides strong evidence against H0, so we would probably choose the alternative hypothesis H1, that p > 0.2. Based on the data, I would be comfortable concluding that the true probability of correctly identifying the card is greater than 0.2.

Typically we decide on a cutoff value, denoted by $\alpha$, before the data are collected. If the calculated p-value is less than $\alpha$, we reject H0. If the calculated p-value is not less than $\alpha$, we don't reject H0. Smaller cutoffs $\alpha$ provide more protection against Type I errors, but less protection against Type II errors.

The rejection region consists of those values of the test statistic that will lead to the rejection of the null hypothesis. The size of the rejection region, called the level of significance (exactly $\alpha$), determines how small the p-value should be before we reject the null hypothesis.

Policy: If a level of significance $\alpha$ is specified, reject H0 if p-value < $\alpha$.
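For a right-sided test, the rejection region and the p-value policy describe the same decision. The sketch below illustrates this under an assumed level of significance of 0.05 (a hypothetical choice, not one made in the notes); the function names are also illustrative.

```python
from statistics import NormalDist

Z = NormalDist()      # standard normal: mean 0, standard deviation 1
alpha = 0.05          # hypothetical level of significance

# Boundary of the right-sided rejection region: the area to its right is alpha.
z_crit = Z.inv_cdf(1 - alpha)

def reject_by_region(z0):
    """Reject H0 when the observed statistic falls in the rejection region."""
    return z0 > z_crit

def reject_by_p_value(z0):
    """Reject H0 when the right-sided p-value is less than alpha."""
    return (1 - Z.cdf(z0)) < alpha

for z0 in (0.5, 1.6, 1.7, 4.99):
    print(z0, reject_by_region(z0), reject_by_p_value(z0))
```

Both functions give the same answer for every value tried, because the p-value drops below the cutoff exactly when the statistic passes the critical value.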

Basic steps in testing hypotheses about proportions:

1) State the hypotheses.

2) Decide an appropriate cutoff value $\alpha$, if desired.

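3) Collect data and compute the test statistic

\[ z_0 = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}} \]

Here $\hat{p}$ is the sample proportion and $p_0$ is the value of p specified by the null hypothesis.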

4) Calculate the p-value.

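The way we compute the p-value depends on H1. Below, $z_0$ is the observed value of the test statistic and Z is a standard normal random variable.

(i). For H1: $p > p_0$ (right-sided test), the p-value is the area to the right of $z_0$ under a standard normal density, i.e. p-value = $P(Z > z_0)$.

(ii). For H1: $p < p_0$ (left-sided test), the p-value is the area to the left of $z_0$ under a standard normal density, i.e. p-value = $P(Z < z_0)$.

(iii). For H1: $p \ne p_0$ (two-sided test), the p-value is twice the area to the right of $z_0$ under a standard normal density if $z_0$ is positive, and twice the area to the left of $z_0$ if $z_0$ is negative: p-value = $2 P(Z > z_0)$ if $z_0 > 0$, and p-value = $2 P(Z < z_0)$ if $z_0 < 0$. Equivalently, p-value = $2 P(Z > |z_0|)$.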

5) Interpret the results.
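The five steps can be collected into one small routine. This is only a sketch under the large-sample normal approximation used in these notes; the function name `proportion_z_test` and the labels "greater", "less" and "two-sided" are illustrative choices, not notation from the course.

```python
import math
from statistics import NormalDist

def proportion_z_test(successes, n, p0, alternative):
    """Return (z0, p_value) for a large-sample test of H0: p = p0.

    alternative is "greater" (H1: p > p0), "less" (H1: p < p0),
    or "two-sided" (H1: p != p0).
    """
    p_hat = successes / n
    z0 = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    Z = NormalDist()
    if alternative == "greater":
        p_value = 1 - Z.cdf(z0)              # area to the right of z0
    elif alternative == "less":
        p_value = Z.cdf(z0)                  # area to the left of z0
    elif alternative == "two-sided":
        p_value = 2 * (1 - Z.cdf(abs(z0)))   # twice the area beyond |z0|
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z0, p_value

# Usage with the ESP data from Example 7.1 (right-sided test).
z0, p_value = proportion_z_test(12489, 60000, 0.2, "greater")
print(f"z0 = {z0:.2f}, p-value = {p_value:.1e}")
```

Step 5, interpreting the result, is left to the reader: compare the returned p-value with the chosen cutoff, if one was specified.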

Example 7.2 In 1999, 17% of high school students smoked frequently (20 or more days a month).

An education campaign aimed at reducing teen smoking was instituted. To determine whether it was effective, a new study interviewed 500 high-school students. Of these, 80 smoked frequently. Note that $\hat{p}$ = 80/500 = 0.16, so the sample proportion is less than 0.17. Our job is to decide whether this is attributable to a real drop in smoking or can be attributed to the fact that we've only looked at a sample.

Hypotheses: H0: $p \ge 0.17$ vs. H1: $p < 0.17$

Here p stands for the proportion of frequent smokers among all high-school students. In words, the null hypothesis specifies that the campaign was not effective, and the alternative specifies that it was. We'll use $\alpha$ = 0.10 as our cutoff (level of significance).
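The remaining steps for this example can be carried out as follows. This is a sketch of one way to finish the calculation, assuming a left-sided test with the boundary value 0.17 used in the test statistic; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 500           # students interviewed
smokers = 80      # frequent smokers in the sample
p0 = 0.17         # boundary value of p under H0
alpha = 0.10      # level of significance chosen in the example

p_hat = smokers / n
z0 = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Left-sided test: the p-value is the area to the left of z0.
p_value = NormalDist().cdf(z0)
reject = p_value < alpha

print(f"p_hat = {p_hat:.2f}, z0 = {z0:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

The p-value is well above 0.10, so at this cutoff the data do not give enough evidence to reject H0; a drop from 0.17 to 0.16 in a sample of 500 is consistent with sampling variation alone.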

Example 7.3 A mathematician (John Kerrich) tossed a coin 10000 times to determine whether it was fair! Hypotheses: H0: $p = 0.5$ vs. H1: $p \ne 0.5$. Here p stands for the true probability of HEADS for the coin. In words, the null hypothesis specifies that the coin is fair, while the alternative hypothesis specifies that the coin is not fair. We'll use a cutoff of $\alpha$ = 0.05. What’s your decision?
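The notes do not record how many heads were observed, so a decision cannot be reached from the text alone. The sketch below assumes the count of 5067 heads that is commonly quoted for Kerrich's experiment; treat that number as an assumption and replace it if the course materials give a different figure.

```python
import math
from statistics import NormalDist

n = 10000        # tosses, from the example
heads = 5067     # ASSUMPTION: commonly quoted count, not stated in these notes
p0 = 0.5         # value of p under H0 (fair coin)
alpha = 0.05     # cutoff chosen in the example

p_hat = heads / n
z0 = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Two-sided test: twice the area beyond |z0| under the standard normal density.
p_value = 2 * (1 - NormalDist().cdf(abs(z0)))
reject = p_value < alpha

print(f"z0 = {z0:.2f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

Under that assumed count the p-value exceeds 0.05, so H0 would not be rejected: the data would be consistent with a fair coin.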

3. Testing hypotheses about means

Testing hypotheses about means is similar to testing hypotheses about proportions.

1) State the hypotheses (in terms of $\mu$)

2) Decide an appropriate cutoff value $\alpha$, if desired

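3) Collect data and compute the test statistic

A. If the population standard deviation $\sigma$ is known:

\[ z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \]

B. If the population standard deviation is unknown (use the sample standard deviation $s$ in its place):

\[ t_0 = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \]

Here $\bar{x}$ is the sample mean and $\mu_0$ is the value of $\mu$ specified by the null hypothesis.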

4) Calculate the p-value.

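A. If the population standard deviation is known, use the z table. Here $z_0$ is the observed value of the z statistic.

------Left-tailed test: H1: $\mu < \mu_0$, then p-value = $P(Z < z_0)$

------Right-tailed test: H1: $\mu > \mu_0$, then p-value = $P(Z > z_0)$

------Two-tailed test: H1: $\mu \ne \mu_0$, then p-value = $2 P(Z > |z_0|)$

B. If the population standard deviation is unknown, use the t table with $n - 1$ degrees of freedom. Here $t_0$ is the observed value of the t statistic.

------Left-tailed test: H1: $\mu < \mu_0$, then p-value = $P(T < t_0)$

------Right-tailed test: H1: $\mu > \mu_0$, then p-value = $P(T > t_0)$

------Two-tailed test: H1: $\mu \ne \mu_0$, then p-value = $2 P(T > |t_0|)$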

5) Interpret the results.

(If a level of significance $\alpha$ is specified, reject H0 if p-value < $\alpha$.)
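For case A (population standard deviation known), the steps can be sketched as a short routine. The function name `mean_z_test`, the labels for the alternative, and the numbers in the usage line are all hypothetical and chosen only for illustration.

```python
import math
from statistics import NormalDist

def mean_z_test(xbar, mu0, sigma, n, alternative):
    """Return (z0, p_value) for a test about a mean with known sigma.

    alternative is "less" (left-tailed), "greater" (right-tailed),
    or "two-sided" (two-tailed).
    """
    z0 = (xbar - mu0) / (sigma / math.sqrt(n))
    Z = NormalDist()
    if alternative == "less":
        p_value = Z.cdf(z0)
    elif alternative == "greater":
        p_value = 1 - Z.cdf(z0)
    elif alternative == "two-sided":
        p_value = 2 * (1 - Z.cdf(abs(z0)))
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return z0, p_value

# Hypothetical data, not from the notes: sample mean 101 from n = 25,
# known sigma = 5, testing H0: mu = 100 against H1: mu > 100.
z0, p_value = mean_z_test(xbar=101, mu0=100, sigma=5, n=25, alternative="greater")
print(f"z0 = {z0:.2f}, p-value = {p_value:.4f}")
```

Case B has the same structure, with the sample standard deviation in place of sigma and tail areas taken from a t distribution instead of the standard normal.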

Example 7.4 Question: How accurate are radon detectors?

Study: Twelve radon detectors were placed in a chamber which exposed them to 105 picocuries/liter of radon. The mean of the 12 readings was 101.13 and the standard deviation of the 12 readings was 9.40. We want to test H0: $\mu = 105$ versus H1: $\mu \ne 105$. Choose $\alpha$ = 0.10.
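Since the population standard deviation is unknown and the sample is small, a t statistic with 11 degrees of freedom applies. The sketch below replaces the t table lookup with a numerical integration of the t density (Simpson's rule); that substitution and the helper names are choices made here, not part of the notes, and the readings are assumed to come from an approximately normal population.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the area on [0, t] by Simpson's rule."""
    h = t / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

n, xbar, s = 12, 101.13, 9.40    # sample size, mean and standard deviation
mu0, alpha = 105, 0.10           # null value and level of significance

t0 = (xbar - mu0) / (s / math.sqrt(n))
p_value = 2 * t_upper_tail(abs(t0), n - 1)    # two-tailed test
reject = p_value < alpha

print(f"t0 = {t0:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```

The p-value comes out larger than 0.10, so at this cutoff the readings do not provide enough evidence that the detectors are inaccurate on average.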

Example 7.5 To justify raising its rates, an insurance company claims that the mean medical expense for all middle-class families is at least $700 per year. A survey of 100 randomly selected middle-class families found that the mean medical expense for the year was $670 and the standard deviation was $140. Assuming that the tails of the distribution of medical expenses are not unusually long, is there any evidence that the insurance company is misinformed?
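One way to set this up is to read the company's claim as H0: mean at least 700 and to take H1: mean less than 700, a left-tailed test. That reading is an interpretation, and the example does not specify a cutoff. Because n = 100 is large, the sketch below approximates the t distribution with 99 degrees of freedom by the standard normal, which is a simplification.

```python
import math
from statistics import NormalDist

n = 100        # families surveyed
xbar = 670     # sample mean expense, in dollars
s = 140        # sample standard deviation, in dollars
mu0 = 700      # boundary value of the mean under H0

# Test statistic with the sample standard deviation in place of sigma.
z0 = (xbar - mu0) / (s / math.sqrt(n))

# Left-tailed test; normal approximation to the t distribution with n - 1 df.
p_value = NormalDist().cdf(z0)

print(f"z0 = {z0:.3f}, approximate p-value = {p_value:.4f}")
```

The approximate p-value is below 0.05, so at that commonly used cutoff there would be evidence against the company's claim; the exact t-based p-value is slightly larger but leads to the same reading.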