10.13: One- and Two-Sample Tests Concerning Variances

In Section 9.12, we examined how to construct a C.I. for $\sigma^2$. Now, we are going to formalize the discussion of making decisions about $\sigma^2$ using hypothesis tests.

Section 9.13, which was not supposed to be covered, discusses how to find C.I.s for $\sigma_1^2/\sigma_2^2$ (the ratio of two variances). Section 10.13 also discusses $\sigma_1^2/\sigma_2^2$ in relation to hypothesis testing. Since the C.I. part was not discussed in class, we will not discuss the hypothesis testing part here.

Example: Quality Control and Hand Grenades (hand_grenades.xls from Chapter 9)

A particular kind of hand grenade has an average explosion time of 5 seconds after the pin is pulled. The manufacturer claims that the standard deviation is 0.75 seconds. To test this claim, a random sample of 10 grenades is taken. Each grenade’s pin is pulled and the number of seconds until an explosion is recorded.

Grenade / Explosion Time
1 / 4.7
2 / 5.1
3 / 3.0
4 / 5.2
5 / 5.3
6 / 4.8
7 / 5.0
8 / 4.9
9 / 4.8
10 / 5.1

To formally test the manufacturer’s claim, we can perform a hypothesis test! Let’s use $\alpha = 0.10$.

1) Ho: $\sigma^2 = 0.75^2 = 0.5625$

Ha: $\sigma^2 \ne 0.5625$

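2) The 90% C.I. for $\sigma^2$ is:

$$\frac{(n-1)s^2}{\chi^2_{0.05,\,9}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{0.95,\,9}} \;\Rightarrow\; \frac{9(0.4321)}{16.9190} < \sigma^2 < \frac{9(0.4321)}{3.3251} \;\Rightarrow\; 0.2299 < \sigma^2 < 1.1696$$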

3) Since 0.5625 is within the interval, do not reject Ho.

4) There is not sufficient evidence to prove the manufacturer’s claim about the explosion time standard deviation is incorrect.
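As a rough check of the C.I. method above, here is a minimal Python sketch that computes the sample variance and the 90% interval from the ten explosion times. Python is not part of the course files (the notes use Excel), the variable names are illustrative assumptions, and the two chi-square critical values are simply copied from the test statistic example in these notes rather than computed.

```python
from statistics import variance

# Explosion times (seconds) for the 10 sampled grenades
times = [4.7, 5.1, 3.0, 5.2, 5.3, 4.8, 5.0, 4.9, 4.8, 5.1]

n = len(times)
s2 = variance(times)              # sample variance (divides by n - 1)

# Chi-square critical values with n - 1 = 9 degrees of freedom.
# Assumption: copied from the notes, not computed here.
chi2_lower = 3.3251               # area 0.95 to the right
chi2_upper = 16.9190              # area 0.05 to the right

# 90% C.I. for the population variance
ci_low = (n - 1) * s2 / chi2_upper
ci_high = (n - 1) * s2 / chi2_lower

sigma0_sq = 0.75 ** 2             # hypothesized variance, 0.5625
reject = not (ci_low < sigma0_sq < ci_high)

print(f"s^2 = {s2:.4f}")
print(f"90% C.I. for sigma^2: ({ci_low:.4f}, {ci_high:.4f})")
print("Reject Ho" if reject else "Do not reject Ho")
```

Because 0.5625 falls inside the computed interval, the sketch reaches the same "do not reject Ho" decision as step 3.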

Of course, you could also do the hypothesis test using the test statistic and p-value methods.

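Test statistic: Suppose our null hypothesis is:

Ho: $\sigma^2 = \sigma_0^2$ or Ho: $\sigma^2 \le \sigma_0^2$ or Ho: $\sigma^2 \ge \sigma_0^2$,

where $\sigma_0^2$ is the hypothesized value of $\sigma^2$. From Chapter 9, we saw that

$$P\left(\chi^2_{1-\alpha/2,\,n-1} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\alpha/2,\,n-1}\right) = 1-\alpha$$

provided the random sample came from a population with a normal PDF. Here $\chi^2_{a,\,n-1}$ denotes the value with area $a$ to its right under a chi-square PDF with $n-1$ degrees of freedom. Thus, the test statistic to be used here replaces the random variables with their observed values for what is in the middle of the inequality above:

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$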

Question: What are the critical values?

P-value: Suppose $X^2$ is a chi-square random variable with $n-1$ degrees of freedom. Then the p-value is

$$2\min\left[P(X^2 > \chi^2),\; P(X^2 < \chi^2)\right]$$

for a two-tail test, where $\chi^2$ is the observed test statistic. The p-value is found differently here because, remember, it is calculated as the probability of being at least as “extreme” as what was observed, relative to what is expected under Ho. Since the chi-square PDF is skewed (non-symmetric), we first need to see which tail of the PDF the test statistic is closest to, and then double that tail’s probability.

Example: Quality Control and Hand Grenades (hand_grenades.xls)

Hypothesis test using the test statistic method:

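1) Ho: $\sigma^2 = 0.5625$

Ha: $\sigma^2 \ne 0.5625$

2) $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{9(0.43211)}{0.5625} = 6.9138$

3) $\chi^2_{0.95,\,9} = 3.3251$ and $\chi^2_{0.05,\,9} = 16.9190$

4) Since 3.3251 < 6.9138 < 16.9190, do not reject Ho.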

5) There is not sufficient evidence to prove the manufacturer’s claim about the explosion time standard deviation is incorrect.
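The test statistic method for this example can be sketched in Python as follows. The function name `two_tail_variance_test` is a hypothetical helper made up for illustration, and the critical values are passed in as given in these notes rather than computed.

```python
from statistics import variance

def two_tail_variance_test(data, sigma0_sq, lower_cv, upper_cv):
    """Return (test statistic, reject?) for Ho: sigma^2 = sigma0_sq.

    lower_cv and upper_cv are chi-square critical values with
    len(data) - 1 degrees of freedom, supplied by the caller.
    """
    df = len(data) - 1
    chi2_stat = df * variance(data) / sigma0_sq   # (n - 1) s^2 / sigma0^2
    reject = chi2_stat < lower_cv or chi2_stat > upper_cv
    return chi2_stat, reject

# Explosion times (seconds) for the 10 sampled grenades
times = [4.7, 5.1, 3.0, 5.2, 5.3, 4.8, 5.0, 4.9, 4.8, 5.1]

# Critical values for alpha = 0.10 and 9 degrees of freedom (from the notes)
chi2_stat, reject = two_tail_variance_test(times, 0.5625, 3.3251, 16.9190)

print(f"Test statistic: {chi2_stat:.4f}")
print("Reject Ho" if reject else "Do not reject Ho")
```

The printed statistic should agree with the 6.9138 used in the comparison against the two critical values.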

Hypothesis test using the p-value method:

1) Ho: $\sigma^2 = 0.5625$
Ha: $\sigma^2 \ne 0.5625$

2) $2\min[P(X^2 > 6.9138),\; P(X^2 < 6.9138)] = 2\min[0.6461,\; 0.3539] = 0.7078$.

3) $\alpha = 0.10$

4) Since 0.7078 > 0.10, do not reject Ho.

5) There is not sufficient evidence to prove the manufacturer’s claim about the explosion time standard deviation is incorrect.
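For readers without Excel, the two-tail p-value can be approximated with only the Python standard library. The sketch below is one possible approach, not the method used in hand_grenades.xls: the helper `chi2_cdf` is a hypothetical function that evaluates the chi-square CDF through the usual power series for the regularized lower incomplete gamma function.

```python
import math
from statistics import variance

def chi2_cdf(x, df):
    """Approximate P(X^2 <= x) for a chi-square r.v. with df degrees of freedom.

    Assumption: uses the power series for the regularized lower
    incomplete gamma function, which is adequate for moderate x.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:       # stop once terms are negligible
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

# Explosion times (seconds) for the 10 sampled grenades
times = [4.7, 5.1, 3.0, 5.2, 5.3, 4.8, 5.0, 4.9, 4.8, 5.1]
sigma0_sq = 0.5625
alpha = 0.10

df = len(times) - 1
chi2_stat = df * variance(times) / sigma0_sq

lower_tail = chi2_cdf(chi2_stat, df)          # P(X^2 < test statistic)
upper_tail = 1.0 - lower_tail                 # P(X^2 > test statistic)
p_value = 2 * min(lower_tail, upper_tail)     # double the smaller tail

print(f"P(X^2 < stat) = {lower_tail:.4f}, P(X^2 > stat) = {upper_tail:.4f}")
print(f"p-value = {p_value:.4f}")
print("Reject Ho" if p_value < alpha else "Do not reject Ho")
```

The lower tail is the smaller of the two probabilities here, so it is the one that gets doubled.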

Below are Excel calculations from hand_grenades.xls:

Suppose the company’s claim was changed to this:

The manufacturer claims that the standard deviation is less than 0.75 seconds.

This may be more realistic. What would really concern a person using the grenade is if the standard deviation was actually larger than the claimed 0.75. Why do you think this would be true?

What would change in the hypothesis test?

Hypothesis test using the test statistic method:

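1) Ho: $\sigma^2 \le 0.5625$

Ha: $\sigma^2 > 0.5625$

2) $\chi^2 = 6.9138$ (the same test statistic as before)

3) $\chi^2_{0.10,\,9} = 14.6837$

4) Since 6.9138 < 14.6837, do not reject Ho.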

5) There is not sufficient evidence to prove the manufacturer’s claim about the explosion time standard deviation (i.e., 0.75) is incorrect.
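Only the rejection rule changes for the one-tail version. A short Python sketch, with the single right-side critical value copied from these notes and the variable names chosen only for illustration:

```python
from statistics import variance

# Explosion times (seconds) for the 10 sampled grenades
times = [4.7, 5.1, 3.0, 5.2, 5.3, 4.8, 5.0, 4.9, 4.8, 5.1]

sigma0_sq = 0.75 ** 2     # hypothesized variance, 0.5625
upper_cv = 14.6837        # right-side critical value for alpha = 0.10, 9 df (from the notes)

df = len(times) - 1
chi2_stat = df * variance(times) / sigma0_sq

# Ha: sigma^2 > 0.5625, so only large values of the statistic count against Ho
reject = chi2_stat > upper_cv

print(f"Test statistic: {chi2_stat:.4f}, critical value: {upper_cv}")
print("Reject Ho" if reject else "Do not reject Ho")
```

All of $\alpha$ sits in the right tail, which is why the critical value (14.6837) is smaller than the upper critical value of the two-tail test (16.9190).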

10.14: Goodness-of-Fit Test

In a “bi”nomial experiment, there are two possible outcomes (success or failure) for each trial where p is the probability of a success. Sections 9.10 and 10.11 focused on estimation and hypothesis testing for p.

In a “Multi”nomial experiment, there are k possible outcomes for each trial of an experiment. To help explain this type of an experiment, an example will be used.

Example: M&M’s

Source:

Suppose you want to determine the color percentages for Plain M&M’s. Take a random sample of Plain M&M’s and record the color for each candy drawn.

Characteristics of a multinomial experiment:

1) The experiment has n identical trials

Sample n=22 M&M candies

2) k possible outcomes for each trial.

There are k=6 colors (brown, yellow, red, blue, orange, and green) possible for each trial (sample one candy).

3) Probabilities of the k outcomes are denoted by p1, p2, …, pk where p1+p2+…+pk=1

The probabilities for selecting a particular color in a trial are denoted by: pBrown, pYellow, pRed, pBlue, pOrange, and pGreen and pBrown + pYellow + pRed + pBlue + pOrange + pGreen = 1.

M&M’s makes the following claims for their Plain M&M’s colors:

Color / Percent
Brown / 13%
Yellow / 14%
Red / 13%
Blue / 24%
Orange / 20%
Green / 16%

These color percents (probabilities) have changed since I originally started using this example in classes! Also, the other types of M&M’s have other color probabilities.

For Plain M&M’s, a hypothesis test of the form:

Ho: pBrown = 0.13, pYellow = 0.14, pRed = 0.13, pBlue = 0.24, pOrange = 0.20, and pGreen = 0.16

Ha: At least two p’s differ from their specified values in Ho

can be conducted to investigate M&M’s claimed color probabilities.

Example: Worker injuries

Does the propensity for worker injuries depend on the length of time that a worker has been on the job? An analysis of 714 worker injuries by one manufacturer gave the results shown in the table for the distribution of injuries over eight 1-hour time periods per shift.

Hours on shift / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8
# of accidents / 93 / 71 / 79 / 72 / 98 / 89 / 102 / 110

If the length of time did not matter, then p1=1/8=0.125, p2=1/8, …, p8=1/8. The form of the hypothesis test is:

Ho: p1=1/8, p2=1/8, …, p8=1/8

Ha: At least two p’s differ from their specified values in Ho

These hypothesis testing situations are examples where confidence intervals cannot be used to perform a hypothesis test (what would you construct the confidence interval for?). However, the test statistic and p-value methods can both be used.

Test statistic method:

1) Ho: p1=__, p2=__, …, pk=__

Ha: At least two p’s differ from their specified values in Ho

2) Calculate the test statistic

$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i}$$

where

k = # of categories

oi = observed frequency for category i and

ei = expected frequency for category i = npi

n = sample size

Note: oi – ei measures how far the observed frequency is from what is expected if Ho is true.

3) State the critical value

This will always be a one-tail test with a “right-side” critical value of $\chi^2_{\alpha,\,k-1}$.

4) Decide whether or not to reject Ho

5) State a conclusion in terms of the problem

Reject Ho: There is sufficient evidence to show that at least two p’s are not what is specified under Ho.

Don’t reject Ho: There is not sufficient evidence against the specified p’s in Ho.

where “p’s” mean to insert what p is in terms of the problem.
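To make step 2 concrete, here is a small Python sketch that applies it to the worker injuries data from earlier in this section. The function name `gof_statistic` is a hypothetical helper. No decision is made at the end because these notes do not state $\alpha$ for that example.

```python
def gof_statistic(observed, probs):
    """Return (test statistic, expected frequencies) for a goodness-of-fit test.

    observed: observed frequencies o_i for the k categories
    probs:    hypothesized probabilities p_i under Ho (should sum to 1)
    """
    n = sum(observed)                              # sample size
    expected = [n * p for p in probs]              # e_i = n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected

# Number of accidents in each of the eight 1-hour periods of a shift
accidents = [93, 71, 79, 72, 98, 89, 102, 110]
probs = [1 / 8] * 8                                # Ho: p1 = ... = p8 = 1/8

stat, expected = gof_statistic(accidents, probs)
df = len(accidents) - 1                            # k - 1 degrees of freedom

print(f"n = {sum(accidents)}, each e_i = {expected[0]}")
print(f"Test statistic = {stat:.4f} with {df} degrees of freedom")
```

Each squared difference is divided by its own expected frequency, so a category that is far from what is expected relative to its size contributes the most to the statistic.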

p-value method:

1) State Ho and Ha

2) Calculate the p-value as $P(X^2 > \chi^2)$, where $X^2$ is a chi-square random variable with $k-1$ degrees of freedom

3) State $\alpha$

4) Decide whether or not to reject Ho

5) State a conclusion in terms of the problem

Notes:

This hypothesis test is commonly called a goodness-of-fit test since the test determines how “good” the hypothesized values of p1, p2, …, pk “fit” the situation of interest.
The random variable version of the test statistic is $X^2 = \sum_{i=1}^{k} (O_i - e_i)^2 / e_i$. Its PDF can be approximated well by a chi-square PDF with $\nu = k-1$ degrees of freedom as long as there are no categories with a small expected number. Often, a rule of thumb of $e_i \ge 5$ is used to decide that the chi-square PDF is appropriate.

Example: M&M’s (M&Ms_test.xls)

Can M&M’s claimed color probabilities for their Plain M&M’s candies be disproved? Conduct a hypothesis test to determine this using $\alpha = 0.05$.

Assume one bag of Plain M&M’s is a random sample and the $e_i \ge 5$ rule of thumb is not needed (these are not good assumptions, but they allow the experiment to be carried out for this example).
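To see how far this sample is from satisfying the rule of thumb, the expected frequencies can be checked with a few lines of Python. This is only an illustrative sketch; the dictionary and variable names are assumptions.

```python
import math

# Claimed color probabilities for Plain M&M's (from the table above)
probs = {"Brown": 0.13, "Yellow": 0.14, "Red": 0.13,
         "Blue": 0.24, "Orange": 0.20, "Green": 0.16}
n = 22                                             # candies in the sample

expected = {color: n * p for color, p in probs.items()}

# Categories whose expected frequency falls below the rule-of-thumb value of 5
too_small = [color for color, e in expected.items() if e < 5]

# Smallest sample size for which every expected frequency is at least 5
min_n = math.ceil(5 / min(probs.values()))

for color, e in expected.items():
    print(f"{color:<7} e_i = {e:.2f}")
print("Below 5:", ", ".join(too_small))
print("Smallest n with every e_i >= 5:", min_n)
```

Only Blue has an expected frequency of at least 5 with n=22, so the chi-square approximation should be treated with caution for this sample.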

Table of observed and expected frequencies for the sample of size n=22:

Color / pi / ei / oi
Brown / 13% / 22×0.13=2.86 / 8
Yellow / 14% / 3.08 / 7
Red / 13% / 2.86 / 2
Blue / 24% / 5.28 / 1
Orange / 20% / 4.4 / 3
Green / 16% / 3.52 / 1

1) Ho: pBrown = 0.13, pYellow = 0.14, pRed = 0.13,
pBlue = 0.24, pOrange = 0.20, and pGreen = 0.16

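Ha: At least two p’s differ from their specified values in Ho

2) $\chi^2 = \sum_{i=1}^{6} \frac{(o_i - e_i)^2}{e_i} = \frac{(8-2.86)^2}{2.86} + \cdots + \frac{(1-3.52)^2}{3.52} = 20.20$

3) $\chi^2_{\alpha,\,k-1} = \chi^2_{0.05,\,5} = 11.07$

4) Since 20.20 > 11.07, reject Ho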

5) There is sufficient evidence against M&M’s color probabilities in Ho.

Note: p-value = $P(X^2 > \chi^2) = P(X^2 > 20.20) = 0.0011$.
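The whole M&M’s test can also be reproduced outside of Excel. The Python sketch below is one possible way to do it with the standard library only. The helper `chi2_cdf` is a hypothetical function based on the power series for the regularized lower incomplete gamma function, and the critical value 11.07 is taken from step 3 rather than computed.

```python
import math

def chi2_cdf(x, df):
    """Approximate P(X^2 <= x) for a chi-square r.v. with df degrees of freedom.

    Assumption: series for the regularized lower incomplete gamma function.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= z / (a + n)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))

# Order: Brown, Yellow, Red, Blue, Orange, Green
probs = [0.13, 0.14, 0.13, 0.24, 0.20, 0.16]     # hypothesized p_i
observed = [8, 7, 2, 1, 3, 1]                    # o_i from the bag
alpha = 0.05
critical_value = 11.07                           # from the notes (5 df)

n = sum(observed)
expected = [n * p for p in probs]                # e_i = n * p_i
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

df = len(observed) - 1
p_value = 1.0 - chi2_cdf(chi2_stat, df)          # right-tail probability

print(f"Test statistic = {chi2_stat:.2f}")
print(f"p-value = {p_value:.4f}")
print("Reject Ho" if chi2_stat > critical_value else "Do not reject Ho")
```

The critical value comparison and the p-value comparison lead to the same decision, as they always should for the same $\alpha$.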

Below are the Excel calculations from M&Ms_test.xls:

Notes:

Changes to the above file will need to be made for hypothesis tests that include more or fewer categories.
Excel does not have an analysis tool to perform the goodness-of-fit test. However, there is an Excel function that may work: the CHITEST(oi range, ei range) function. According to the degrees of freedom formulas given in the Excel help, though, it should not work for this problem! The function is actually set up for Section 10.15’s hypothesis test.
Here’s a review of what it means for $\chi^2$ to have a chi-square distribution and its relation to the hypothesis test:

Assume Ho is true. Take a sample from the population and calculate $\chi^2$. Take another sample (same size) from the population and calculate $\chi^2$ again. Repeat this process an infinite number of times. Then the shape of the histogram of all these $\chi^2$ values would look like a chi-square PDF.

This happens as long as the assumptions behind the hypothesis test are satisfied.

Since we know that these $\chi^2$ values have this property, we know what range the $\chi^2$ values should fall within if Ho is really true. This range is the “don’t reject Ho” region of the $\chi^2$ distribution. Outside of the “don’t reject Ho” region is the region in which a very small proportion of the $\chi^2$ values would fall. This unlikely region is called the “reject Ho” region of a $\chi^2$ distribution. The boundary between the regions is the critical value.

Only one $\chi^2$ value is calculated in practice (the test statistic), but we still know what regions of the chi-square PDF are likely to contain this one $\chi^2$ value IF Ho IS TRUE. If the one $\chi^2$ value is outside this range (in the reject Ho region), then it’s not very likely that Ho is really true.
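The repeated-sampling idea in this review can be imitated with a small simulation. The Python sketch below draws many samples with Ho true, calculates the test statistic for each one, and reports the share that lands in the reject Ho region. The sample size n=200, the number of repetitions, the seed, and the function names are all hypothetical choices made for illustration. The sample size is larger than the 22 candies in the example so that every expected frequency is well above 5.

```python
import random
from collections import Counter

def gof_statistic(observed, expected):
    """Sum of (o_i - e_i)^2 / e_i over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def simulate_rejection_rate(probs, n, critical_value, reps, seed=0):
    """Share of simulated test statistics above the critical value when Ho is true."""
    rng = random.Random(seed)                    # fixed seed for repeatability
    categories = range(len(probs))
    expected = [n * p for p in probs]
    rejections = 0
    for _ in range(reps):
        # One sample of size n drawn with the Ho probabilities
        draws = rng.choices(categories, weights=probs, k=n)
        counts = Counter(draws)
        observed = [counts[c] for c in categories]
        if gof_statistic(observed, expected) > critical_value:
            rejections += 1
    return rejections / reps

# Claimed Plain M&M's color probabilities (Ho) and the 5 df critical value for alpha = 0.05
probs = [0.13, 0.14, 0.13, 0.24, 0.20, 0.16]
rate = simulate_rejection_rate(probs, n=200, critical_value=11.07, reps=2000)

print(f"Share of simulated statistics above 11.07: {rate:.3f}")
```

If the chi-square approximation is reasonable, the printed share should be close to $\alpha = 0.05$, which is the “very small” proportion described in the review.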

© 2005 Christopher R. Bilder