Chapter 8: Optimal Tests of Hypotheses

  1. Most Powerful Tests

For testing H0: θ = θ0 versus H1: θ = θ1 (*note: H1 is also commonly denoted as Ha)

Let the random sample be denoted by X = (X1, …, Xn).

Suppose the rejection region is R such that: (*note: the rejection region can be defined in terms of the sample points directly instead of the test statistic values)

We reject H0 if X ∈ R.

We fail to reject H0 if X ∉ R.

Definition: The size or significance level of the test is defined as: α = P(reject H0 | H0 is true) = P(X ∈ R | θ = θ0).

Definition: The power of a test is given by P(reject H0 | H1 is true) = P(X ∈ R | θ = θ1) = 1 − β, where β is the probability of a Type II error.

Definition: A most powerful (MP) test is the test with the best rejection region: in testing a simple null hypothesis versus a simple alternative hypothesis, it has the largest possible power among all tests of size α.

Theorem (Neyman-Pearson Lemma). In testing a simple null hypothesis versus a simple alternative hypothesis, the likelihood ratio test, which rejects H0 when λ(x) = L(θ0)/L(θ1) ≤ c, with c chosen so that the test has size α, is a most powerful test.
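As a small numeric illustration (not from the original notes; the values n = 10, θ0 = 0, θ1 = 1, σ = 1, and α = 0.05 are assumptions for the example), the sketch below computes the most powerful cutoff and its power for a simple-vs-simple test on a normal mean:

```python
# Sketch: MP test of H0: theta = 0 vs H1: theta = 1 for X1,...,Xn iid N(theta, 1).
# The Neyman-Pearson MP test of size alpha rejects for xbar > c.
from scipy.stats import norm
import numpy as np

n, alpha = 10, 0.05
theta0, theta1, sigma = 0.0, 1.0, 1.0

# Under H0, xbar ~ N(theta0, sigma^2/n); choose c so that P(xbar > c | H0) = alpha.
c = theta0 + norm.ppf(1 - alpha) * sigma / np.sqrt(n)

# Power = P(xbar > c | H1), computed under the alternative.
power = 1 - norm.cdf((c - theta1) / (sigma / np.sqrt(n)))
print(f"cutoff c = {c:.4f}, power = {power:.4f}")  # c ~ 0.5201, power ~ 0.9354
```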

Egon Sharpe Pearson (August 11, 1895 – June 12, 1980); Karl Pearson (March 27, 1857 – April 27, 1936); Maria Pearson (née Sharpe); Sigrid Letitia Sharpe Pearson.

In 1901, with Weldon and Galton, Karl Pearson founded the journal Biometrika whose object was the development of statistical theory. He edited this journal until his death.

In 1911, Karl Pearson founded the world's first university statistics department at University College London.

His only son, Egon Pearson, became an eminent statistician himself, establishing the Neyman-Pearson lemma. He succeeded his father as head of the Applied Statistics Department at University College.

Jerzy Neyman (April 16, 1894 – August 5, 1981) is best known for the Neyman-Pearson Lemma. He also developed the concept of the confidence interval (1937) and contributed significantly to sampling theory. He published many books dealing with experiments and statistics, and devised the way in which the FDA tests medicines today.

Jerzy Neyman was also the founder of the Department of Statistics at the University of California, Berkeley, in 1955.

  2. Uniformly Most Powerful Tests

Definition: A uniformly most powerful (UMP) test in testing a simple null hypothesis versus a composite alternative hypothesis, or in testing a composite null hypothesis versus a composite alternative, is a size-α test that has the largest possible power for every simple alternative hypothesis θ ∈ Θ1, among all such tests of size α.

Example of a uniformly most powerful test:

Let X1, …, Xn be iid N(μ, σ²) with σ² known, and suppose we want to test H0: μ = μ0 versus H1: μ > μ0. For each μ1 > μ0, the most powerful size-α test of H0: μ = μ0 versus H1: μ = μ1 rejects the null hypothesis for X̄ > μ0 + zα σ/√n. Since this same test function is most powerful for each μ1 > μ0, this test function is UMP.

But suppose we consider the alternative hypothesis H1: μ ≠ μ0. Then there is no UMP test. The most powerful test for each μ1, where μ1 > μ0, rejects the null hypothesis for X̄ > μ0 + zα σ/√n, but the most powerful test for each μ1, where μ1 < μ0, rejects for X̄ < μ0 − zα σ/√n. Note that the test that rejects the null hypothesis for X̄ > μ0 + zα σ/√n cannot be most powerful for an alternative μ1 < μ0 by part (c) (necessity) of the Neyman-Pearson Lemma, since it is not a likelihood ratio test for μ1 < μ0.

It is rare for UMP tests to exist. However, for one-sided alternatives, they exist in some problems.

Note: The one-sided tests that we derived in the normal population model (for μ with σ² known, for μ with σ² unknown, and for σ² with μ unknown) are all uniformly most powerful (strictly speaking, when a nuisance parameter is present, as in the latter two cases, they are UMP within the class of unbiased tests). On the other hand, none of the two-sided tests are uniformly most powerful.

A condition under which UMP tests exist is when the family of distributions being considered possesses a property called monotone likelihood ratio.

Definition: Let f(x; θ) be the joint pdf of the sample X = (X1, …, Xn). Then f(x; θ) is said to have a monotone likelihood ratio (MLR) in the statistic T = T(X) if, for any choice of parameter values θ1 < θ2, the likelihood ratio f(x; θ2)/f(x; θ1) depends on the values of the data X only through the value of the statistic T(x) and, in addition, this ratio is a monotone function of T(x).

Theorem (Karlin-Rubin). Suppose that f(x; θ) has an increasing monotone likelihood ratio in the statistic T = T(X). Let c and α be chosen so that P(T > c | θ = θ0) = α. Then

R = {x: T(x) > c}

is the rejection region (*also called 'critical region') for a UMP test for the one-sided tests of H0: θ ≤ θ0 versus H1: θ > θ0.

Theorem. For a regular exponential family with only one unknown parameter θ, and a population p.d.f.:

f(x; θ) = h(x) c(θ) exp{w(θ) t(x)}

If w(θ) is an increasing function of θ, then we have a monotone likelihood ratio in the statistic T = Σ t(Xi). {*Recall T = Σ t(Xi) is also complete sufficient for θ.}

Example of a family with monotone likelihood ratio:

For X1, …, Xn iid Exponential(θ) (parameterized by the mean θ),

f(x; θ) = θ^(−n) exp{−Σ xi/θ}

For θ2 > θ1, the likelihood ratio f(x; θ2)/f(x; θ1) = (θ1/θ2)^n exp{(1/θ1 − 1/θ2) Σ xi} is an increasing function of Σ xi, so the family has monotone likelihood ratio in T = Σ Xi.
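A quick numeric check of this claim (a sketch; the parameter values θ1 = 1, θ2 = 2 and the data are made up): the ratio below depends on the data only through Σ xi, and increases with Σ xi.

```python
# Sketch: likelihood ratio for iid Exponential data (mean parameterization, assumed).
import numpy as np

def lik(x, theta):
    """Joint pdf of iid Exponential(mean theta) data."""
    x = np.asarray(x)
    return np.prod(np.exp(-x / theta) / theta)

theta1, theta2 = 1.0, 2.0  # theta2 > theta1
# First two datasets share the same sum (6) -> same ratio; third has a larger sum (9).
for x in ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]):
    print(sum(x), lik(x, theta2) / lik(x, theta1))
```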

[Photo: Karl Pearson (left) and Egon Pearson (right).]

[Photo: Karl Pearson (left) and R. A. Fisher (right).]

One-sided LRT for one population mean, and the UMP test, Revisited

  1. Let X1, …, Xn be iid N(μ, σ²), where σ² is known. Derive the likelihood ratio test for the hypotheses H0: μ = μ0 versus H1: μ > μ0, and show whether it is a UMP test or not.

Solution:

Note: Now that we are not just dealing with the two-sided hypothesis, it is important to know that the more general definition of Θ0 is the set of all unknown parameter values under H0, while Θ is the set of all unknown parameter values under the union of H0 and H1.

Therefore we have:

λ(x) = sup{L(μ): μ ∈ Θ0} / sup{L(μ): μ ∈ Θ}, where Θ0 = {μ0} and Θ = {μ: μ ≥ μ0}.

Since there is no free parameter in Θ0, the numerator is L(μ0). The restricted MLE over Θ is μ̂ = max(X̄, μ0), so λ(x) = 1 if X̄ ≤ μ0, and λ(x) = exp{−n(X̄ − μ0)²/(2σ²)} if X̄ > μ0.

Reject H0 in favor of H1 if λ(x) ≤ c, which for c < 1 is equivalent to Z = (X̄ − μ0)/(σ/√n) ≥ zα at the significance level α.

Furthermore, it is easy to show that for each μ1 > μ0, the likelihood ratio test of H0: μ = μ0 versus H1: μ = μ1 rejects the null hypothesis at the significance level α for Z = (X̄ − μ0)/(σ/√n) ≥ zα.

By the Neyman-Pearson Lemma, this likelihood ratio test is also the most powerful test for each such simple alternative.

Now, since for each μ1 > μ0 the most powerful size-α test of H0: μ = μ0 versus H1: μ = μ1 rejects the null hypothesis for Z ≥ zα, and this rejection region does not depend on μ1 -- it is the same rejection region, hence the same test, for each μ1 > μ0 -- this test is most powerful for each μ1 > μ0 and is therefore UMP for H1: μ > μ0, by the definition of the UMP test.
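The following is a minimal sketch of this UMP one-sided z-test in Python; the function name and the simulated data are illustrative assumptions, not part of the notes.

```python
# Sketch: UMP size-alpha test of H0: mu = mu0 vs H1: mu > mu0, sigma known.
from scipy.stats import norm
import numpy as np

def one_sided_z_test(x, mu0, sigma, alpha=0.05):
    """Return (z statistic, critical value z_alpha, reject H0?)."""
    n = len(x)
    z = (np.mean(x) - mu0) / (sigma / np.sqrt(n))
    z_alpha = norm.ppf(1 - alpha)
    return z, z_alpha, z >= z_alpha   # reject H0 iff z >= z_alpha

rng = np.random.default_rng(0)
x = rng.normal(loc=0.8, scale=1.0, size=25)  # simulated data, true mu = 0.8
print(one_sided_z_test(x, mu0=0.0, sigma=1.0))
```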

  2. Let X1, …, Xn be iid N(μ, σ²), where σ² is known.

(1) Find the LRT for H0: μ ≤ μ0 versus H1: μ > μ0.

(2) Show the test in (1) is a UMP test.

Solution:


(1) The likelihood ratio is the following:

λ(x) = sup{L(μ): μ ≤ μ0} / sup{L(μ): −∞ < μ < ∞}

sup{L(μ): μ ≤ μ0} = L(μ̂0), where μ̂0 = min(X̄, μ0), is the maximum likelihood under Θ0 = {μ: μ ≤ μ0}.

sup{L(μ): −∞ < μ < ∞} = L(X̄) is the maximum likelihood under Θ.

The ratio of the maximized likelihoods is

λ(x) = 1 if X̄ ≤ μ0, and λ(x) = exp{−n(X̄ − μ0)²/(2σ²)} if X̄ > μ0.

By the LRT, we reject the null hypothesis if λ(x) ≤ c. Generally, c is chosen to be less than 1. The rejection region is

R = {x: X̄ ≥ μ0 + zα σ/√n}, i.e., Z = (X̄ − μ0)/(σ/√n) ≥ zα.

(2) We can use the Karlin-Rubin theorem to show that the LRT is UMP.

First, we need to show the size of the LRT:

size = sup{P(X̄ ≥ μ0 + zα σ/√n | μ): μ ≤ μ0}

In which,

P(X̄ ≥ μ0 + zα σ/√n | μ) = P(Z ≥ zα + (μ0 − μ)√n/σ),

where Z follows the standard normal distribution. This rejection probability is an increasing function of μ, so the supremum over {μ ≤ μ0} is attained at μ = μ0.

So the above LRT is a size-α test, where size = P(Z ≥ zα | μ = μ0) = α.

Next, we prove the family has monotone likelihood ratio (MLR) in X̄. It is straightforward to show that X̄ is a sufficient statistic for μ because we have an exponential family. For any pair μ2 > μ1, the ratio of the likelihoods is:

L(μ2)/L(μ1) = exp{n(μ2 − μ1)X̄/σ² + n(μ1² − μ2²)/(2σ²)}

It is increasing in X̄. So the family has monotone likelihood ratio (MLR) in X̄.

Recall the rejection region is in terms of X̄ also:

R = {x: X̄ ≥ μ0 + zα σ/√n}

By the Karlin-Rubin theorem, we know that the LRT is also a UMP test.
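A quick numeric check of the size argument (a sketch; the values μ0 = 0, σ = 1, n = 25, α = 0.05 are assumptions): the rejection probability increases in μ, so its supremum over the null is attained at μ = μ0 and equals α.

```python
# Sketch: P_mu(reject) for several mu <= mu0; it increases toward alpha at mu0.
from scipy.stats import norm
import numpy as np

mu0, sigma, n, alpha = 0.0, 1.0, 25, 0.05
cutoff = mu0 + norm.ppf(1 - alpha) * sigma / np.sqrt(n)

for mu in (-0.5, -0.2, -0.1, 0.0):  # values inside the null, mu <= mu0
    reject_prob = 1 - norm.cdf((cutoff - mu) / (sigma / np.sqrt(n)))
    print(f"mu = {mu:5.2f}  P(reject) = {reject_prob:.4f}")
```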

Two-sided test for one population mean -- NO UMP test, Revisited

In the following example there is no UMP test:

Example (Nonexistence of UMP test).

Let X1,…,Xn be iid N(μ, σ²), σ² known.

Consider testing H0: μ = μ0 versus H1: μ ≠ μ0. For a specified α, a level α test is any test that satisfies

P(reject H0 | μ = μ0) ≤ α.

Test 1: Consider μ1 < μ0. Rejecting H0 if X̄ < μ0 − σ zα/√n has the highest possible power at μ1.

Test 2: Consider μ2 > μ0. Rejecting H0 if X̄ > μ0 + σ zα/√n has the highest possible power at μ2.

Now it can be shown that β2(μ2) > β1(μ2), where βi denotes the power function of Test i. Thus Test 1 is not a UMP level α test because Test 2 has a higher power than Test 1 at μ2.
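The sketch below (with assumed values μ0 = 0, σ = 1, n = 10, α = 0.05, μ2 = 1) computes both power functions at μ2 and shows β2(μ2) is far larger, so Test 1 cannot be UMP; by symmetry, Test 2 fails at alternatives below μ0.

```python
# Sketch: power of the two one-sided tests at an alternative mu2 > mu0.
from scipy.stats import norm
import numpy as np

mu0, sigma, n, alpha = 0.0, 1.0, 10, 0.05
z_a = norm.ppf(1 - alpha)

def beta1(mu):  # power of Test 1: reject if xbar < mu0 - sigma*z_a/sqrt(n)
    return norm.cdf(-z_a + (mu0 - mu) * np.sqrt(n) / sigma)

def beta2(mu):  # power of Test 2: reject if xbar > mu0 + sigma*z_a/sqrt(n)
    return norm.cdf(-z_a + (mu - mu0) * np.sqrt(n) / sigma)

mu2 = 1.0
print(beta1(mu2), beta2(mu2))  # beta2(mu2) >> beta1(mu2): neither test dominates
```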

Review: Inference on two population means, when both populations are normal and we have two independent samples. Furthermore, the population variances are unknown but equal (σ1² = σ2² = σ²): the pooled-variance t-test.

Data: X1, …, Xn1 iid N(μ1, σ²) and Y1, …, Yn2 iid N(μ2, σ²), with the two samples independent.

Goal: Compare μ1 and μ2.

Pivotal Quantity Approach:

1) Point estimator: X̄ − Ȳ

2) Pivotal quantity:

Z = [(X̄ − Ȳ) − (μ1 − μ2)] / [σ √(1/n1 + 1/n2)] ~ N(0, 1) is not the PQ, since we don't know σ.

(n1 − 1)S1²/σ² ~ χ²(n1 − 1), (n2 − 1)S2²/σ² ~ χ²(n2 − 1), and they are independent (S1² and S2² are independent because these two samples are independent of each other).

Definition: when W1 ~ χ²(k1) and W2 ~ χ²(k2) are independent, then W1 + W2 ~ χ²(k1 + k2).

Definition (t-distribution): if Z ~ N(0, 1), W ~ χ²(k), and Z & W are independent, then T = Z/√(W/k) ~ t(k).

Therefore,

T = [(X̄ − Ȳ) − (μ1 − μ2)] / [Sp √(1/n1 + 1/n2)] ~ t(n1 + n2 − 2),

where Sp² = [(n1 − 1)S1² + (n2 − 1)S2²]/(n1 + n2 − 2) is the pooled variance.

This is the PQ for the inference on the parameter of interest, μ1 − μ2.

3) Confidence Interval for μ1 − μ2:

(X̄ − Ȳ) ± tα/2,n1+n2−2 · Sp √(1/n1 + 1/n2)

This is the C.I. for μ1 − μ2.

4) Test: H0: μ1 − μ2 = δ0

Test statistic: T0 = [(X̄ − Ȳ) − δ0] / [Sp √(1/n1 + 1/n2)] ~ t(n1 + n2 − 2) under H0.

a) Ha: μ1 − μ2 > δ0 (The most common situation is δ0 = 0)

At the significance level α, we reject H0 in favor of Ha iff T0 ≥ tα,n1+n2−2.

If the observed value t0 ≥ tα,n1+n2−2, reject H0.

b) Ha: μ1 − μ2 < δ0

At significance level α, reject H0 in favor of Ha iff T0 ≤ −tα,n1+n2−2.

If the observed value t0 ≤ −tα,n1+n2−2, reject H0.

c) Ha: μ1 − μ2 ≠ δ0

At α = 0.05, reject H0 in favor of Ha iff |T0| ≥ t0.025,n1+n2−2.

If the observed value |t0| ≥ t0.025,n1+n2−2, reject H0.

Data Analysis Example. An experiment was conducted to compare the mean number of tapeworms in the stomachs of sheep that had been treated for worms against the mean number in those that were untreated. A sample of 14 worm-infected lambs was randomly divided into two groups: seven were injected with the drug and the rest were left untreated. After a 6-month period, the lambs were slaughtered and the following worm counts were recorded:

Drug-treated sheep: 18  43  28  50  16  32  13
Untreated sheep:    40  54  26  63  21  37  39

Assume that both populations are normal and that the population variances are equal; test at α = 0.05 whether the treatment is effective or not.

SOLUTION: Inference on two population means. Two small and independent samples. Both populations are normal, and the population variances are unknown but equal (σ1² = σ2² = σ²).

We perform the pooled-variance t-test with hypotheses H0: μ1 = μ2 versus Ha: μ1 < μ2, where population 1 is the drug-treated sheep and population 2 is the untreated sheep. From the data, X̄ = 28.57, Ȳ = 40, Sp² = 206.98, and t0 = (28.57 − 40)/[14.39 √(1/7 + 1/7)] = −1.49.

Since t0 = −1.49 is greater than −t0.05,12 = −1.782, we cannot reject H0. We have insufficient evidence to reject the hypothesis that there is no difference in the mean number of worms in treated and untreated lambs.
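For reference, this analysis can be reproduced with scipy's pooled-variance two-sample t-test (a sketch; the alternative= keyword requires scipy >= 1.6):

```python
# Sketch: pooled-variance t-test on the sheep data above.
from scipy import stats

treated   = [18, 43, 28, 50, 16, 32, 13]
untreated = [40, 54, 26, 63, 21, 37, 39]

# equal_var=True gives the pooled-variance t-test;
# alternative='less' tests Ha: mean(treated) < mean(untreated).
t0, pval = stats.ttest_ind(treated, untreated, equal_var=True, alternative='less')
print(f"t0 = {t0:.3f}, p-value = {pval:.3f}")  # t0 ~ -1.486, p ~ 0.08 > 0.05
```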

LRT Derivation of the Pooled Variance T-Test, 2-sided

Given that we have two independent random samples from two normal populations with equal but unknown variances, we now derive the likelihood ratio test for: H0: μ1 = μ2 versus Ha: μ1 ≠ μ2.

Let θ = (μ1, μ2, σ²); then

L(θ) = (2πσ²)^(−(n1+n2)/2) exp{−[Σ(xi − μ1)² + Σ(yj − μ2)²]/(2σ²)}

Θ0 = {(μ1, μ2, σ²): μ1 = μ2 = μ, σ² > 0}, and there are two free parameters.

sup{L(θ): θ ∈ Θ0}: since it contains two parameters, we take the partial derivatives with respect to μ and σ² respectively and set the partial derivatives equal to 0. Solving them, we have:

μ̂ = (n1 X̄ + n2 Ȳ)/(n1 + n2), σ̂0² = [Σ(xi − μ̂)² + Σ(yj − μ̂)²]/(n1 + n2)

Θ = {(μ1, μ2, σ²): σ² > 0}, and there are three free parameters.

We take the partial derivatives with respect to μ1, μ2 and σ² respectively and set them all equal to 0. The solutions are:

μ̂1 = X̄, μ̂2 = Ȳ, σ̂² = [Σ(xi − X̄)² + Σ(yj − Ȳ)²]/(n1 + n2)

Next we take the likelihood ratio. After some simplifications, we have:

λ = (σ̂²/σ̂0²)^((n1+n2)/2) = [1 + t0²/(n1 + n2 − 2)]^(−(n1+n2)/2),

where t0 = (X̄ − Ȳ)/[Sp √(1/n1 + 1/n2)] is the test statistic in the pooled variance t-test.

Therefore, λ ≤ c is equivalent to |t0| ≥ c*, where c* = [(n1 + n2 − 2)(c^(−2/(n1+n2)) − 1)]^(1/2). Thus at the significance level α, we reject the null hypothesis in favor of the alternative when |t0| ≥ tα/2,n1+n2−2, i.e., c* = tα/2,n1+n2−2.

LARGE SAMPLE DISTRIBUTION OF THE LR TEST

In general, for the LR test, we know that when the sample size goes to infinity, we have

−2 log λ(X) → χ²(k) in distribution,

where k is the difference in the number of free parameters under the alternative (the full parameter space Θ) and under the null hypothesis, respectively.

Note: In the pooled-variance t-test derivation we have on the previous page:

k = 3 − 2 = 1
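A small simulation sketch (assumed setup: two equal normal samples under H0) checking this χ²(1) limit for the pooled-variance LRT, using the identity −2 log λ = (n1 + n2) log(1 + t0²/(n1 + n2 − 2)) derived above:

```python
# Sketch: compare the simulated 95th percentile of -2 log lambda under H0
# with the chi-square(1) quantile (about 3.841).
import numpy as np
from scipy.stats import chi2, ttest_ind

rng = np.random.default_rng(1)
n1 = n2 = 50
vals = []
for _ in range(5000):
    x = rng.normal(0.0, 1.0, n1)   # H0 true: equal means, equal variances
    y = rng.normal(0.0, 1.0, n2)
    t0 = ttest_ind(x, y, equal_var=True).statistic
    vals.append((n1 + n2) * np.log(1 + t0**2 / (n1 + n2 - 2)))

print(np.percentile(vals, 95), chi2.ppf(0.95, df=1))
```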

Theorem (Asymptotic distribution of the LRT – simple H0). For testing H0: θ = θ0 versus H1: θ ≠ θ0, suppose X1, …, Xn are iid f(x|θ), θ̂ is the MLE of θ, and f(x|θ) satisfies the regularity conditions. Then under H0, as n → ∞,

−2 log λ(X) → χ²1 in distribution,

where χ²1 is a χ² random variable with 1 degree of freedom.

Proof: Expand log L(θ|x) = l(θ|x) in a Taylor series around θ̂:

l(θ|x) ≈ l(θ̂|x) + l′(θ̂|x)(θ − θ̂) + l″(θ̂|x)(θ − θ̂)²/2 =

l(θ̂|x) + l″(θ̂|x)(θ − θ̂)²/2, since l′(θ̂|x) = 0. Now we have

−2 log λ(X) = 2 l(θ̂|x) − 2 l(θ0|x) ≈

2 l(θ̂|x) − 2 l(θ̂|x) − l″(θ̂|x)(θ0 − θ̂)² = −l″(θ̂|x)(θ0 − θ̂)² = [−l″(θ̂|x)/(n I(θ0))] × [n I(θ0)(θ̂ − θ0)²] = An × Bn

where An → 1 in probability and √(n I(θ0)) (θ̂ − θ0) → n(0,1) in distribution, so Bn → χ²1.

Example (Poisson LRT)

X1,…,Xn iid from Poisson(λ).

Hypotheses: H0: λ = λ0 versus H1: λ ≠ λ0. We have

−2 log λ(x) = −2 log [e^(−nλ0) λ0^(Σxi) / (e^(−nλ̂) λ̂^(Σxi))] = 2n[(λ0 − λ̂) − λ̂ log(λ0/λ̂)]

where λ̂ = Σxi/n is the MLE of λ.

We reject H0 at level α if −2 log λ(x) > χ²1,α.
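A minimal sketch of this Poisson LRT in Python; the counts and λ0 = 3 are hypothetical values chosen for illustration:

```python
# Sketch: Poisson LRT of H0: lambda = lam0 vs H1: lambda != lam0.
import numpy as np
from scipy.stats import chi2

def poisson_lrt(x, lam0, alpha=0.05):
    lam_hat = np.mean(x)  # MLE of lambda
    # -2 log lambda = 2n[(lam0 - lam_hat) - lam_hat * log(lam0/lam_hat)]
    stat = 2 * len(x) * ((lam0 - lam_hat) - lam_hat * np.log(lam0 / lam_hat))
    crit = chi2.ppf(1 - alpha, df=1)  # chi-square(1) cutoff
    return stat, crit, stat > crit

x = [3, 5, 4, 6, 2, 4, 5, 7, 3, 4]  # hypothetical counts
print(poisson_lrt(x, lam0=3.0))
```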

Theorem. X1, …, Xn are iid f(x|θ). The distribution of

−2 log λ(X) converges to a chi-squared distribution as the sample size n → ∞. The degrees of freedom of the limiting distribution is the difference between the number of free parameters specified by θ ∊ Θ (the full model) and the number of free parameters specified by θ ∊ Θ0.

H0 is rejected if and only if −2 log λ(X) ≥ χ²ν,α, where ν is the degrees of freedom specified in the above Theorem.

Example (Multinomial LRT)

Let θ = (p1, p2, p3, p4, p5), where p1 + p2 + p3 + p4 + p5 = 1.

Let X be a multinomial variable with P(X = j) = pj, j=1,…,5.

Given n independent observations on X, the likelihood function is

L(θ|x) = p1^y1 p2^y2 p3^y3 p4^y4 p5^y5,

where yj = the number of observations equal to j.

H0: p1 = p2 = p3 and p4 = p5 versus H1: H0 is not true.

How many free parameters under H1? 4

How many free parameters under H0? 1

Under H1 we obtain the MLE of pj: p̂j = yj/n.

Under H0 we find: p̂1 = p̂2 = p̂3 = (y1 + y2 + y3)/(3n) and p̂4 = p̂5 = (y4 + y5)/(2n). Substituting into the likelihood ratio we obtain

−2 log λ(x) = 2 Σj yj log(yj/mj), where mj = n·p̂j is the expected count of category j under H0.

We reject H0 if

−2 log λ(x) ≥ χ²3,α (here ν = 4 − 1 = 3).
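A sketch of this multinomial LRT with hypothetical counts y1, …, y5 (assumed data; the formula requires all yj > 0):

```python
# Sketch: LRT of H0: p1 = p2 = p3 and p4 = p5 for multinomial counts.
import numpy as np
from scipy.stats import chi2

y = np.array([18, 22, 26, 14, 20])  # observed counts, n = 100
n = y.sum()

# MLEs under H0: first three cells share one probability, last two share another.
p0 = np.empty(5)
p0[:3] = y[:3].sum() / (3 * n)
p0[3:] = y[3:].sum() / (2 * n)
m = n * p0                           # expected counts under H0

stat = 2 * np.sum(y * np.log(y / m))  # -2 log lambda
crit = chi2.ppf(0.95, df=4 - 1)       # df = 4 - 1 = 3
print(stat, crit, stat >= crit)
```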

We thank colleagues who posted their lecture notes on the internet.
