Statistics 512 Notes 6

Hypothesis Testing Continued

Quick Review on Hypothesis Testing

Goal: Decide between two hypotheses about a parameter of interest

$H_0: \theta \in \omega_0$ versus $H_1: \theta \in \omega_1$,

where $\omega_0 \cap \omega_1 = \emptyset$.

Null vs. Alternative Hypothesis: The alternative hypothesis is the hypothesis we are trying to see if there is strong evidence for. The null hypothesis is the default hypothesis that we will retain unless there is strong evidence for the alternative hypothesis.

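Test statistic and critical region: A test is defined by a test statistic and a critical region. The critical region is the set of values of the test statistic for which we reject the null hypothesis.

Errors in hypothesis testing: Type I and Type II errors. A Type I error is rejecting the null hypothesis when it is true; a Type II error is failing to reject the null hypothesis when the alternative is true.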

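Size of test, power of test: Power function of test = $\gamma(\theta) = P_\theta(\text{reject } H_0)$ =

Probability of rejecting null hypothesis when true parameter is $\theta$.

Size of test = $\max_{\theta \in \omega_0} \gamma(\theta)$, where $\omega_0$ is the set of parameter values allowed by the null hypothesis.

Power at an alternative $\theta_1 \in \omega_1$ = $\gamma(\theta_1)$, where $\omega_1$ is the set of parameter values allowed by the alternative hypothesis.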
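As a small illustration of these definitions, the sketch below evaluates the power function of a binomial test numerically. The setup is hypothetical and not taken from these notes: n = 10 trials, null hypothesis p = 0.5, critical region {Y >= 8}, and the alternative p = 0.8.

```python
from math import comb

def binom_tail(n: int, p: float, k: int) -> float:
    """P(Y >= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k, n + 1))

def power(p: float, n: int = 10, k: int = 8) -> float:
    """Power function: probability of rejecting H0 when the true parameter is p.

    n = 10 and k = 8 are hypothetical values chosen for illustration.
    """
    return binom_tail(n, p, k)

# The null hypothesis is the single point p = 0.5, so the size is power(0.5).
size = power(0.5)
power_at_alt = power(0.8)
print(f"size = {size:.4f}, power at p = 0.8: {power_at_alt:.4f}")
```

With a simple null hypothesis the maximum over the null set is just the power function evaluated at the null value; with a composite null one would maximize over all null values.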

Neyman-Pearson paradigm: Choose size of test to be reasonably small to protect against Type I error, typically 0.05 or 0.01. Among tests which have prescribed size, choose the most powerful test.

P-value: For a test statistic $W$, consider a family of critical regions $\{C_\alpha\}$, each with a different size $\alpha$. For the observed value $w_{obs}$ of the test statistic from the sample, consider the subset of critical regions for which we would reject the null hypothesis, $\{C_\alpha : w_{obs} \in C_\alpha\}$. The p-value is the minimum size of the tests in this subset,

p-value = $\min \{\alpha : w_{obs} \in C_\alpha\}$.

The p-value is a measure of how much evidence there is against the null hypothesis; it is the smallest significance level at which we would still reject the null hypothesis.

Consider the family of critical regions $C_i = \{Y \ge i\}$ for the motivating example, where $Y$ is the number of correct identifications. Since the graphologist made 6 correct identifications, we reject the null hypothesis for critical regions $C_i$ with $i \le 6$. The size of $C_i$ decreases as $i$ increases, so the minimum size among these critical regions is for i=6 and equals 0.377. The p-value is thus 0.377.
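The p-value of 0.377 can be checked numerically. The sketch below assumes that the motivating example has 10 independent trials with success probability 0.5 under the null hypothesis; that assumption is not stated in this section, but it is consistent with the quoted value. It computes the size of each critical region {Y >= i} and takes the smallest size among the regions that contain the observed count.

```python
from math import comb

n, y_obs = 10, 6  # assumed: 10 trials, 6 correct identifications

# Size of the critical region {Y >= i} when Y ~ Binomial(n, 0.5).
sizes = {i: sum(comb(n, y) for y in range(i, n + 1)) / 2**n for i in range(n + 1)}

# Critical regions that contain the observed value, i.e. tests that reject.
rejecting = [i for i in sizes if y_obs >= i]

# The p-value is the smallest size among the rejecting tests.
p_value = min(sizes[i] for i in rejecting)
print(f"p-value = {p_value:.3f}")
```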

Scale of evidence

p-value / evidence
< 0.01 / very strong evidence against the null hypothesis
0.01-0.05 / strong evidence against the null hypothesis
0.05-0.10 / weak evidence against the null hypothesis
> 0.10 / little or no evidence against the null hypothesis

Large sample binomial hypothesis tests:

For large samples, we can use the Central Limit Theorem to construct a test with size approaching a prescribed value as the sample becomes large.

The Sports Illustrated Jinx:

Many athletes believe that there is a Sports Illustrated jinx: appearing on the cover of Sports Illustrated tends to lead to a subsequent decline in performance. Gluckson and Leone (1984) put the Sports Illustrated jinx to the test. Let $p$ denote the probability that the performance level of a cover subject declines. If, in normal circumstances, performance is as likely to decline as not, then the hypotheses that Gluckson and Leone set out to test can be written

$H_0: p = 0.5$ (SI cover has no effect)

$H_1: p > 0.5$ (SI jinx exists)

Included in the study were some 271 subjects appearing on SI covers during the years 1954 through 1983. Let $Y$ denote the number of subjects whose performance subsequently declined. We use $Y$ as our test statistic. We would like to do a test of size approximately 0.05. Consider critical regions of the form $\{Y \ge y^*\}$. To choose y* so that the test has size 0.05, we need to solve:

$P(Y \ge y^* \mid p = 0.5) = \sum_{y = y^*}^{271} \binom{271}{y} (0.5)^{271} = 0.05$.

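As written, solving this equation would be difficult. The task can be greatly simplified by using the Central Limit Theorem, which says that

$\frac{Y - np}{\sqrt{np(1-p)}}$ is approximately $N(0,1)$ for large $n$

for a binomial random variable $Y$. Thus,

$P(Y \ge y^* \mid p = 0.5) \approx P\left(Z \ge \frac{y^* - 271(0.5)}{\sqrt{271(0.5)(0.5)}}\right)$

for a standard normal random variable $Z$. Since $P(Z \ge 1.645) = 0.05$, it follows that

$\frac{y^* - 135.5}{\sqrt{67.75}} = 1.645$.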

Specifically, y*=149.

The observed number of declines was found to be 114. Since 114<149, we do not reject the null hypothesis. There is no strong evidence of a Sports Illustrated jinx.
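The critical value can be reproduced with the normal approximation. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p0, alpha = 271, 0.5, 0.05

mean = n * p0                         # 135.5
sd = sqrt(n * p0 * (1 - p0))          # sqrt(67.75), about 8.23
z = NormalDist().inv_cdf(1 - alpha)   # 0.95 quantile of N(0, 1), about 1.645

# Solve (y* - mean) / sd = z for y*.
y_star = mean + z * sd

y_obs = 114
reject = y_obs >= y_star
print(f"y* = {y_star:.2f}, reject H0: {reject}")
```

The computed cutoff is slightly above 149, which the notes round to y* = 149; the observed count of 114 is far below it either way.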

p-value: Consider the test with critical region $C_i = \{Y \ge i\}$.

The approximate size of the test is $P\left(Z \ge \frac{i - 135.5}{\sqrt{67.75}}\right)$. We have $Y = 114$. Thus, we reject for all tests with critical region $C_i$ with $i \le 114$. The minimum size among these tests for which we reject is for $i = 114$, with size $P(Z \ge -2.61) = 0.995$. This is the p-value: p-value = 0.995. There is no evidence against the null hypothesis, and hence no evidence of a Sports Illustrated jinx.
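The p-value can be checked both with the normal approximation used above and with the exact binomial tail. The exact calculation is an addition for comparison and is not part of the notes' method.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p0, y_obs = 271, 0.5, 114

# Normal approximation: P(Z >= (y_obs - n*p0) / sqrt(n*p0*(1 - p0))).
z_obs = (y_obs - n * p0) / sqrt(n * p0 * (1 - p0))
p_normal = 1 - NormalDist().cdf(z_obs)

# Exact tail probability P(Y >= y_obs) for Y ~ Binomial(271, 0.5).
p_exact = sum(comb(n, y) for y in range(y_obs, n + 1)) / 2**n

print(f"z = {z_obs:.2f}, normal p-value = {p_normal:.4f}, exact p-value = {p_exact:.4f}")
```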

Choosing the sample size

In the Neyman-Pearson paradigm, we choose the size of the test to be small to protect against Type I errors; typically we set the size to be 0.05. For a fixed sample size, this constrains the power of the test. To achieve both a small size and a high power, we can choose the sample size.

Example: Suppose we want to test vs. from an iid Bernoulli sample. Suppose we want the size to be 0.05 and the power to be 0.8 for the alternative . How large a sample size do we need if we use the large sample binomial test?

Let Y be the number of successes. Using the large sample binomial test, the test statistic is

For large n, W has approximately a standard normal distribution when . Thus, a test of size 0.05 has critical region . The power of this test when is

We have where is the standard normal CDF. Thus, we want to choose the sample size so that

The smallest sample size that achieves this is found by setting and solving for resulting in . Thus, the smallest sample size needed is 17.
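The same kind of calculation can be sketched in code. The values below (null value 0.5, alternative 0.7) are hypothetical and are not the values of the example above, so the answer is not expected to be 17. The sketch assumes a one-sided large sample binomial test that rejects for large values of the standardized count, and it approximates the power with the normal distribution.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def approx_power(n: int, p0: float, p1: float, alpha: float = 0.05) -> float:
    """Approximate power at p1 of the one-sided test that rejects when
    W = (Y - n*p0) / sqrt(n*p0*(1 - p0)) >= z_alpha."""
    z_alpha = Z.inv_cdf(1 - alpha)
    # Under p1, Y is approximately N(n*p1, n*p1*(1 - p1)); standardize the
    # rejection event Y >= n*p0 + z_alpha*sqrt(n*p0*(1 - p0)) accordingly.
    cutoff = (z_alpha * sqrt(p0 * (1 - p0)) - sqrt(n) * (p1 - p0)) / sqrt(p1 * (1 - p1))
    return 1 - Z.cdf(cutoff)

def smallest_n(p0: float, p1: float, alpha: float = 0.05,
               target: float = 0.8, n_max: int = 100_000) -> int:
    """Smallest n whose approximate power at p1 reaches the target."""
    for n in range(1, n_max + 1):
        if approx_power(n, p0, p1, alpha) >= target:
            return n
    raise ValueError("target power not reached within n_max")

n_needed = smallest_n(p0=0.5, p1=0.7)  # hypothetical values
print(n_needed, round(approx_power(n_needed, 0.5, 0.7), 3))
```

A linear search is used because the approximate power increases with n when the alternative exceeds the null value, so the first n that reaches the target is the smallest.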

Testing a normal mean

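Suppose $X_1, \ldots, X_n$ are iid $N(\mu, \sigma^2)$ with the variance $\sigma^2$ known. We want to test

$H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$.

Consider the test statistic $W = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ and critical region $\{W \ge c\}$. What do we need to choose $c$ to be so that the size of the test is 0.05?

$P_{\mu_0}(W \ge c) = P(Z \ge c) = 0.05$,

where Z is a standard normal random variable.

Thus, we want to choose c to be the 0.95 quantile of the standard normal distribution, which equals 1.645.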
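A minimal sketch of this one-sided test for a normal mean with known variance follows. The sample, the null value 10.0, and the standard deviation 1.0 are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

def one_sided_z_test(x, mu0, sigma, alpha=0.05):
    """Return (W, c, reject) for the test that rejects when W >= c."""
    n = len(x)
    w = (mean(x) - mu0) / (sigma / sqrt(n))   # standardized sample mean
    c = NormalDist().inv_cdf(1 - alpha)       # 0.95 quantile when alpha = 0.05
    return w, c, w >= c

# Hypothetical sample with known sigma = 1.0, tested against mu0 = 10.0.
sample = [10.9, 11.4, 9.8, 12.1, 10.6, 11.7, 10.2, 11.3]
w, c, reject = one_sided_z_test(sample, mu0=10.0, sigma=1.0)
print(f"W = {w:.3f}, c = {c:.3f}, reject H0: {reject}")
```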

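Suppose we wanted to test $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$. The size of the test with test statistic $W = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ and critical region $\{W \ge c\}$ is $\max_{\mu \le \mu_0} P_\mu(W \ge c)$. We have

$P_\mu(W \ge c) = P_\mu\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \ge c + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right) = 1 - \Phi\left(c + \frac{\mu_0 - \mu}{\sigma/\sqrt{n}}\right)$.

Because $P_\mu(W \ge c)$ is an increasing function of $\mu$, the size of the test is

$P_{\mu_0}(W \ge c)$. Thus a test of size 0.05 for testing $H_0: \mu \le \mu_0$ vs. $H_1: \mu > \mu_0$ is the same as the test of size 0.05 for testing $H_0: \mu = \mu_0$ vs. $H_1: \mu > \mu_0$: the critical region is $\{W \ge c\}$ where $c = 1.645$.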
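The monotonicity argument can be illustrated numerically. The sketch below evaluates the rejection probability of the test with cutoff 1.645 at several means at or below the null boundary; the null boundary 10.0, standard deviation 1.0, and sample size 8 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def rejection_prob(mu, mu0, sigma, n, c=1.645):
    """P_mu(W >= c) = 1 - Phi(c + (mu0 - mu) / (sigma / sqrt(n)))."""
    return 1 - NormalDist().cdf(c + (mu0 - mu) / (sigma / sqrt(n)))

mu0, sigma, n = 10.0, 1.0, 8  # hypothetical values
mus = [mu0 - 1.0, mu0 - 0.5, mu0 - 0.1, mu0]
probs = [rejection_prob(mu, mu0, sigma, n) for mu in mus]

for mu, prob in zip(mus, probs):
    print(f"mu = {mu:5.2f}: P(reject) = {prob:.5f}")
```

The rejection probability grows as the mean approaches the boundary, so the maximum over the composite null is attained at the boundary itself.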

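Two-sided tests: Suppose we want to test $H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$. Using the test statistic $W = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ still seems reasonable, but now it makes sense to reject for both very large and very small values of $W$. We can use a critical region of the form $\{|W| \ge c\}$. A test of size 0.05 has critical region $\{|W| \ge 1.96\}$ because $P_{\mu_0}(|W| \ge 1.96) = P(|Z| \ge 1.96) = 0.05$.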
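The two-sided cutoff can be checked with the standard library: splitting the size 0.05 equally between the two tails gives the 0.975 quantile of the standard normal distribution.

```python
from statistics import NormalDist

Z = NormalDist()
alpha = 0.05

# Put alpha / 2 in each tail, so the cutoff is the 1 - alpha/2 quantile.
c = Z.inv_cdf(1 - alpha / 2)

# Size of the test: P(Z <= -c) + P(Z >= c).
size = Z.cdf(-c) + (1 - Z.cdf(c))
print(f"c = {c:.3f}, size = {size:.3f}")
```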

Duality between tests and confidence intervals

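Suppose we want to test $H_0: \mu = \mu_0$ vs. $H_1: \mu \ne \mu_0$ and use the rejection region $\left\{\frac{|\bar{X} - \mu_0|}{\sigma/\sqrt{n}} \ge 1.96\right\}$. Then, the set of $\mu_0$ for which $H_0$ is not rejected is

$\left\{\mu_0 : \frac{|\bar{X} - \mu_0|}{\sigma/\sqrt{n}} < 1.96\right\} = \left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$,

which is the 95% confidence interval for $\mu$ that we have used.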
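The duality can be illustrated by inverting the two-sided test over a grid of candidate null values. This is a sketch with hypothetical numbers (sample mean 11.0, standard deviation 1.0, sample size 8): a candidate value lies in the 95% confidence interval exactly when the test of that value does not reject.

```python
from math import sqrt

def two_sided_rejects(xbar, mu0, sigma, n, c=1.96):
    """Reject H0: mu = mu0 when |xbar - mu0| / (sigma / sqrt(n)) >= c."""
    return abs(xbar - mu0) / (sigma / sqrt(n)) >= c

def conf_int(xbar, sigma, n, c=1.96):
    """95% confidence interval xbar +/- c * sigma / sqrt(n)."""
    half = c * sigma / sqrt(n)
    return xbar - half, xbar + half

xbar, sigma, n = 11.0, 1.0, 8  # hypothetical values
lo, hi = conf_int(xbar, sigma, n)

# Candidate null values on a grid from 9.00 to 13.00.
grid = [9.0 + 0.01 * k for k in range(401)]
agree = all((lo < m < hi) == (not two_sided_rejects(xbar, m, sigma, n)) for m in grid)
print(f"CI = ({lo:.3f}, {hi:.3f}), test and interval agree on grid: {agree}")
```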

In general, there is a duality between tests and confidence intervals.

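Suppose we have a family of tests of size $\alpha$ of $H_0: \theta = \theta_0$ vs. $H_1: \theta \ne \theta_0$ for each $\theta_0$. Then

$C(X) = \{\theta_0 : H_0: \theta = \theta_0 \text{ is not rejected}\}$

is a $100(1-\alpha)\%$ confidence interval for $\theta$.

Proof: Let $\theta$ be the true parameter value. Then $\theta \in C(X)$ exactly when the size $\alpha$ test of the null hypothesis that the parameter equals $\theta$ does not reject, so $P_\theta(\theta \in C(X)) = 1 - P_\theta(\text{that test rejects}) = 1 - \alpha$.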

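Conversely, suppose we have a $100(1-\alpha)\%$ confidence interval $C(X)$ for $\theta$. Then a test of size $\alpha$ of $H_0: \theta = \theta_0$ vs. $H_1: \theta \ne \theta_0$ is to reject the null hypothesis if and only if $\theta_0$ does not belong to the confidence interval.

Proof: $P_{\theta_0}(\text{reject } H_0) = P_{\theta_0}(\theta_0 \notin C(X)) = 1 - P_{\theta_0}(\theta_0 \in C(X)) = 1 - (1 - \alpha) = \alpha$.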