252solnE2 10/25/05

E. CHI-SQUARED AND RELATED TESTS.

1. Tests of Homogeneity and Independence

Text 12.18, 12.19 - 21, 12.26 [12.23*, 12.24, 12.27] (12.22, 12.27) E1, E2, E3

2. Tests of Goodness of Fit

Text 12.51, 12.54 [12.49*, 12.52*. Both on CD12_5], E4, E5, E6

a. Uniform Distribution

b. Poisson Distribution

c. Normal Distribution

3. Kolmogorov-Smirnov Test

E7, E8, E9, E10, E11

a. Kolmogorov-Smirnov One-Sample Test

b. Lilliefors Test.

This document includes Exercises 12.49, 12.27 and Problems E4-E6.

------

Problems Involving Tests of Goodness of Fit.

Exercise 12.51 [12.49 in 9th] (On CD not in 8th edition): The manager of a computer network has the following data on service interruptions per day over the last 500 days. Does it follow a Poisson distribution?

Solution: Our first step is to find the mean of the distribution.

x / O / xO
0 / 160 / 0
1 / 175 / 175
2 / 86 / 172
3 / 41 / 123
4 / 18 / 72
5 / 12 / 60
6 / 8 / 48
Sum / 500 / 650

This gives us $\bar{x} = \frac{\sum xO}{\sum O} = \frac{650}{500} = 1.3$.

We would then have to use a Poisson table with a mean of 1.3.

x / p / E = np / O / O²/E
0 / .272532 / 136.27 / 160 / 187.862
1 / .354291 / 177.15 / 175 / 172.876
2 / .230289 / 115.14 / 86 / 64.235
3 / .099792 / 49.90 / 41 / 33.687
4 / .032432 / 16.22 / 18 / 19.975
5 / .008432 / 4.22 / 12 / 34.123
6+ / .002230 / 1.11 / 8 / 57.658
Sum / 1.0000 / 500.01 / 500 / 570.414
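As a sketch of the two steps above (not part of the original solution), the Python below uses only the standard library to compute the grouped mean $\sum xO / \sum O$ and the Poisson cell probabilities, treating the last cell as "6 or more" so that the probabilities add to 1. The helper names `grouped_mean`, `poisson_pmf` and `poisson_cells` are illustrative.

```python
import math

# Exercise 12.51: service interruptions per day -> number of days observed
counts = {0: 160, 1: 175, 2: 86, 3: 41, 4: 18, 5: 12, 6: 8}


def grouped_mean(freq):
    """Mean of a frequency table: sum(x * O) / sum(O)."""
    return sum(x * o for x, o in freq.items()) / sum(freq.values())


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def poisson_cells(lam, top):
    """Probabilities for 0, 1, ..., top - 1 plus an upper-tail cell 'top or more'."""
    probs = [poisson_pmf(x, lam) for x in range(top)]
    probs.append(1.0 - sum(probs))  # tail cell makes the column sum to 1
    return probs


n = sum(counts.values())
mean = grouped_mean(counts)          # 650 / 500
probs = poisson_cells(mean, top=6)   # cells 0, 1, ..., 5 and 6+
expected = [n * p for p in probs]    # E = np

for x, (p, e) in enumerate(zip(probs, expected)):
    label = f"{x}+" if x == len(probs) - 1 else str(x)
    print(f"{label:>3} {p:.6f} {e:7.2f}")
```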


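There are seven cells, but we have only $DF = 7 - 1 - 1 = 5$ degrees of freedom, since we lost one degree of freedom because we estimated the mean from the data. Our computed chi-square is $\chi^2 = \sum \frac{(O-E)^2}{E} = \sum \frac{O^2}{E} - n = 570.414 - 500 = 70.414$. Since this is above the table chi-square, we reject the null hypothesis. However, the small values of $E$ in the last two rows make me suspicious, and I will compute $\frac{(O-E)^2}{E}$ for those two rows: $\frac{(12 - 4.22)^2}{4.22} = 14.34$ and $\frac{(8 - 1.11)^2}{1.11} = 42.77$.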

These account for more than all the difference between the table value and the computed value, so let’s merge the last two rows and try again.
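The shortcut $\sum O^2/E - n$ equals $\sum (O-E)^2/E$ only when the expected counts add up to $n$, since expanding the square leaves $-2\sum O + \sum E$. The minimal Python check below (standard library only, illustrative helper names) uses the rounded values from the first table; because that $E$ column sums to 500.01 rather than 500, the two forms differ by 0.01 here. It also prints the contributions of the last two cells.

```python
# Exercise 12.51, first table: rounded expected and observed counts
E = [136.27, 177.15, 115.14, 49.90, 16.22, 4.22, 1.11]
O = [160, 175, 86, 41, 18, 12, 8]


def chi_square_direct(obs, exp):
    """Pearson statistic: sum of (O - E)^2 / E over the cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


def chi_square_shortcut(obs, exp):
    """Shortcut used in these notes: sum(O^2 / E) - n, exact when sum(E) == n."""
    return sum(o * o / e for o, e in zip(obs, exp)) - sum(obs)


shortcut = chi_square_shortcut(O, E)
direct = chi_square_direct(O, E)
contributions = [(o - e) ** 2 / e for o, e in zip(O, E)]

print(f"shortcut = {shortcut:.3f}, direct = {direct:.3f}")
print("last two cells:", [round(c, 2) for c in contributions[-2:]])
```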

x / p / E = np / O / O²/E
0 / .272532 / 136.27 / 160 / 187.862
1 / .354291 / 177.15 / 175 / 172.876
2 / .230289 / 115.14 / 86 / 64.235
3 / .099792 / 49.90 / 41 / 33.687
4 / .032432 / 16.22 / 18 / 19.975
5+ / .010662 / 5.33 / 20 / 75.047
Sum / 1.0000 / 500.01 / 500 / 553.680

We now have six cells, so $DF = 6 - 1 - 1 = 4$. Since our computed chi-square of 553.680 – 500 = 53.680 is above the table chi-square, we reject the null hypothesis.

Exercise 12.54 [12.52 in 9th] (On CD not in 8th edition): A random sample of 500 car batteries revealed the following distribution of battery life in years. If $\bar{x} = 2.8$ and $s = 0.97$, does it follow a Normal distribution?

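Solution: To get the probabilities, subtract the mean $\bar{x} = 2.8$ from each upper class limit $x$ and divide by $s = 0.97$ to get $z = \frac{x - \bar{x}}{s}$. Then get $F(z) = P(Z \le z)$ from the Normal table; each $p$ is the difference between successive values of $F(z)$, and $E = np$.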

Life in Years / x / z / F(z) / p / E = np / O / O²/E
Under 1 / 1 / -1.86 / 0.0314 / 0.0314 / 15.70 / 12 / 9.171
1 – 2 / 2 / -0.82 / 0.2061 / 0.1747 / 87.35 / 94 / 101.156
2 – 3 / 3 / 0.21 / 0.5832 / 0.3771 / 188.55 / 170 / 153.275
3 – 4 / 4 / 1.24 / 0.8925 / 0.3093 / 154.65 / 188 / 228.542
4 – 5 / 5 / 2.27 / 0.9884 / 0.0959 / 47.95 / 28 / 16.350
Over 5 / / / 1.0000 / 0.0116 / 5.80 / 8 / 11.034
Sum / / / / / 500.00 / 500 / 519.528

There are six cells, and we have only $DF = 6 - 1 - 2 = 3$ degrees of freedom, since we lost two degrees of freedom because we estimated the mean and variance from the data. Since our computed chi-square of 519.528 – 500 = 19.528 is above the table value, we reject the null hypothesis.
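A Python sketch of the battery calculation, assuming Python 3.8+ for `statistics.NormalDist`. It rounds each z to two decimals to mimic reading a printed Normal table, takes differences of successive F(z) to get p, and compares the statistic with the 3-df table value at an assumed $\alpha = .05$ (7.8147); the exercise as reproduced here does not state $\alpha$.

```python
from statistics import NormalDist

# Exercise 12.54: battery-life classes (years); the last class is "over 5"
upper_limits = [1, 2, 3, 4, 5]
observed = [12, 94, 170, 188, 28, 8]
xbar, s = 2.8, 0.97
n = sum(observed)

std_normal = NormalDist()  # mean 0, standard deviation 1

# z = (x - xbar) / s, rounded to two decimals as when reading a printed table
z = [round((x - xbar) / s, 2) for x in upper_limits]
F = [std_normal.cdf(v) for v in z] + [1.0]           # F(z) = P(Z <= z); "over 5" ends at 1
p = [hi - lo for lo, hi in zip([0.0] + F[:-1], F)]   # differences of successive F(z)
E = [n * pi for pi in p]

chi2 = sum(o * o / e for o, e in zip(observed, E)) - n
df = len(observed) - 1 - 2     # mean and standard deviation both estimated
CRIT = 7.8147                  # chi-squared table value for 3 df, assuming alpha = .05

print("z:", z)
print("E:", [round(e, 2) for e in E])
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {chi2 > CRIT}")
```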


Problem E4: Check to see if the following 1000 tax payments come from the distribution N(25000, 10000), that is, a Normal distribution with mean 25000 and standard deviation 10000.
Amount in thousands / Number
Below 10 / 40
10-15 / 60
15-20 / 170
20-25 / 140
25-30 / 180
30-35 / 210
35-40 / 80
Above 40 / 120

Solution:

Amount in Thousands / x / z / F(z) / p / E = np
Below 10 / 10 / -1.5 / 0.0668 / 0.0668 / 66.8
10-15 / 15 / -1.0 / 0.1587 / 0.0919 / 91.9
15-20 / 20 / -0.5 / 0.3085 / 0.1498 / 149.8
20-25 / 25 / 0.0 / 0.5000 / 0.1915 / 191.5
25-30 / 30 / 0.5 / 0.6915 / 0.1915 / 191.5
30-35 / 35 / 1.0 / 0.8413 / 0.1498 / 149.8
35-40 / 40 / 1.5 / 0.9332 / 0.0919 / 91.9
40 up / / / 1.0000 / 0.0668 / 66.8

To find $p$, write the top of each interval in the $x$ column. Then find $z = \frac{x - \mu}{\sigma} = \frac{x - 25}{10}$ (in thousands).

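$F(z) = P(Z \le z)$ is the cumulative distribution; for example, .1587 is $F(-1.0) = P(Z \le -1.0)$. The $F(z)$ column can be found by looking up the value for $z$ in the normal table and either adding the table value to .5 (if $z$ is positive) or subtracting the table value from .5 (if $z$ is negative). Each $p$ is then the difference between successive values of $F(z)$, and $E = np = 1000p$.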
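To illustrate the ".5 plus or minus the table value" rule, the sketch below assumes a Normal table that lists the area between 0 and z, and uses `statistics.NormalDist` (Python 3.8+) to stand in for that printed table. `half_table` and `F_from_table` are illustrative names.

```python
from statistics import NormalDist


def half_table(z):
    """Area under the standard normal curve between 0 and |z| (what a 0-to-z table lists)."""
    return NormalDist().cdf(abs(z)) - 0.5


def F_from_table(z):
    """Cumulative F(z) = P(Z <= z): add the table value to .5 if z >= 0, else subtract it."""
    area = half_table(z)
    return 0.5 + area if z >= 0 else 0.5 - area


mu, sigma = 25, 10   # N(25000, 10000) written in thousands
for x in [10, 15, 20, 25, 30, 35, 40]:
    z = (x - mu) / sigma
    print(f"x = {x:2d}  z = {z:4.1f}  F(z) = {F_from_table(z):.4f}")
```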

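Since there are eight cells and no parameters were estimated from the data, $DF = 8 - 1 = 7$. From the chi-squared table, $\chi^2_{.05}(7) = 14.0671$ (taking $\alpha = .05$), but the $\chi^2 = \sum \frac{(E-O)^2}{E}$ we compute is 107.1918, so we reject $H_0$. Note that this problem could be done using a Kolmogorov-Smirnov test.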

Amount in Thousands / E / O / E - O / (E - O)² / (E - O)²/E
Below 10 / 66.8 / 40 / 26.8 / 718.24 / 10.7521
10-15 / 91.9 / 60 / 31.9 / 1017.61 / 11.0730
15-20 / 149.8 / 170 / -20.2 / 408.04 / 2.7239
20-25 / 191.5 / 140 / 51.5 / 2652.25 / 13.8499
25-30 / 191.5 / 180 / 11.5 / 132.25 / 0.6906
30-35 / 149.8 / 210 / -60.2 / 3624.04 / 24.1925
35-40 / 91.9 / 80 / 11.9 / 141.61 / 1.5409
40 up / 66.8 / 120 / -53.2 / 2830.24 / 42.3689
Sum / 1000.0 / 1000 / 0.0 / / 107.1918
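A standard-library Python sketch of E4 (Python 3.8+ for `statistics.NormalDist`). It uses exact Normal probabilities rather than four-decimal table values, so its statistic comes out slightly below the hand value of 107.1918. The 14.0671 critical value assumes $\alpha = .05$. As a hedged illustration of the Kolmogorov-Smirnov remark, it also computes the largest gap between observed and expected cumulative proportions at the class tops; on grouped data this is only an approximation, and 1.36/√n is the usual large-sample 5% critical value.

```python
from statistics import NormalDist

# Problem E4: tops of the classes (thousands) and observed counts; last class is "40 up"
upper = [10, 15, 20, 25, 30, 35, 40]
observed = [40, 60, 170, 140, 180, 210, 80, 120]
dist = NormalDist(mu=25, sigma=10)        # H0: N(25000, 10000), written in thousands
n = sum(observed)

F = [dist.cdf(x) for x in upper] + [1.0]               # cumulative probability at each class top
p = [hi - lo for lo, hi in zip([0.0] + F[:-1], F)]     # class probabilities
E = [n * pi for pi in p]

chi2 = sum((e - o) ** 2 / e for o, e in zip(observed, E))
df = len(observed) - 1                    # mean and sd are given, so nothing is estimated
CRIT_05_DF7 = 14.0671                     # chi-squared table value, assuming alpha = .05
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {chi2 > CRIT_05_DF7}")

# Grouped-data Kolmogorov-Smirnov statistic: largest gap between cumulative proportions
cum_obs, running = [], 0
for o in observed:
    running += o
    cum_obs.append(running / n)
D = max(abs(fo - fe) for fo, fe in zip(cum_obs, F))
KS_CRIT = 1.36 / n ** 0.5                 # large-sample 5% critical value
print(f"D = {D:.4f}, approximate 5% critical value = {KS_CRIT:.4f}")
```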


Problem E5: See if Frunzi earthquakes fit a Poisson distribution with parameter of 1.

Earthquakes Per Day / Number of Days Observed
0 / 25
1 / 17
2 / 5
3 / 2
4 or more / 1

Solution: $H_0$: Poisson(1), with $n = 50$. The probabilities $p$ come from the Poisson table with a mean of 1, and $E = np$.

x / p / E = np / O
0 / .3679 / 18.395 / 25
1 / .3679 / 18.395 / 17
2 / .1839 / 9.195 / 5
3 / .0613 / 3.065 / 2
4+ / .0190 / 0.950 / 1
Sum / 1.0000 / 50.000 / 50

Due to the small size of the E's in the last two cells, we must merge the last three cells (merging only the last two would still leave an E below 5) to get the table below.

x / E / O / O²/E
0 / 18.395 / 25 / 33.977
1 / 18.395 / 17 / 15.711
2+ / 13.210 / 8 / 4.845
Sum / 50.000 / 50 / 54.533

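$\chi^2 = \sum \frac{O^2}{E} - n = 54.533 - 50 = 4.533$. There are only three cells now, so $DF = 3 - 1 = 2$. From the chi-squared table, $\chi^2_{.05}(2) = 5.9915$ (taking $\alpha = .05$). Since 4.533 is below this, we do not reject $H_0$. Note that this problem could be done using a Kolmogorov-Smirnov test.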

Note: What if our $H_0$ was simply that the data follow a Poisson distribution, with no mean specified?

Our first step would be to find a mean for the distribution.

x / O / xO
0 / 25 / 0
1 / 17 / 17
2 / 5 / 10
3 / 2 / 6
4+ / 1 / 4
Sum / 50 / 37

This would give us $\bar{x} = \frac{\sum xO}{\sum O} = \frac{37}{50} = 0.74$.

We would then have to use a Poisson table with a mean of 0.7, unless a computer was available to generate a Poisson table with a mean of 0.74. In any case, we would have lost a degree of freedom from estimating the mean. For example, using Minitab to generate the table for a mean of 0.74, we would get:

x / p / E = np
0 / .4771 / 23.86
1 / .3531 / 17.66
2 / .1305 / 6.53
3 / .0322 / 1.61
4 / .0060 / 0.30
5 / .0009 / 0.05
6 / .0001 / 0.01

But once again we would have to merge the cells with small E's, so that the last line would be 2+ with an E of about 8.5. Once again we would have only three cells, but, because of our estimation of the mean, $DF = 3 - 1 - 1 = 1$.


Problem E6: Check to see if the earthquakes in Frunzi fit a Poisson distribution.

Earthquakes Per Day / Number of Days Observed
0 / 40
1 / 45
2 / 7
3 / 4
4 / 2
5 / 0
6 / 1
7 / 1

Solution:


The table of $x$, $O$ and $xO$ below shows our calculation of the mean for the sample, as was done in the previous problem.

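When we find that the mean is $\bar{x} = \frac{\sum xO}{\sum O} = \frac{92}{100} = 0.92$, we use the Poisson table with a mean of 0.9, but find values of E that are too small. To get an E of at least 5, we must merge the last five rows (3 through 7) into a single 3+ cell.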

x / O / xO / p / E = np
0 / 40 / 0 / .4066 / 40.66
1 / 45 / 45 / .3659 / 36.59
2 / 7 / 14 / .1647 / 16.47
3 / 4 / 12 / .0494 / 4.94
4 / 2 / 8 / .0111 / 1.11
5 / 0 / 0 / .0020 / 0.20
6 / 1 / 6 / .0003 / 0.03
7 / 1 / 7 / .0000 / 0.00
Sum / 100 / 92 / / 100.00

x / O / E / O²/E
0 / 40 / 40.66 / 39.351
1 / 45 / 36.59 / 55.343
2 / 7 / 16.47 / 2.975
3+ / 8 / 6.28 / 10.191
Sum / 100 / 100.00 / 107.860

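$\chi^2 = \sum \frac{O^2}{E} - n = 107.860 - 100 = 7.860$. There are only four cells now, and we lost one degree of freedom by estimating the mean, so $DF = 4 - 1 - 1 = 2$. From the chi-squared table, $\chi^2_{.05}(2) = 5.9915$ (taking $\alpha = .05$). Since 7.860 is above this, we reject $H_0$.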


Exercise 17.9 from McClave et al. (Not assigned, but left in because it is a good example of a test for uniformity): The 1997 Equifax/Harris Consumer Privacy Survey asked 328 Internet users to indicate the level of their agreement with the statement "The government needs to be able to scan Internet messages and user communications to prevent fraud and other crimes." Of the respondents, 59 agreed strongly, 108 agreed somewhat, 82 disagreed somewhat and 79 disagreed strongly.

a) Specify the null and alternative hypotheses you would use to determine if the opinions of Internet users are evenly divided among the four categories.

b) Test the hypothesis in a) using

c) In the context of this exercise what are Type I and Type II errors?

Solution: a) $H_0: p_1 = p_2 = p_3 = p_4 = .25$ (or $H_0$: uniform), where $p_i$ is the probability of being in category i. $H_1$: not all of the $p_i$ equal .25.

b) $n = 328$. Since there are 4 categories, each $p_i$ must be $\frac{1}{4} = .25$ under $H_0$, so we divide 328 into 4 equal parts, and $E = np = 328(.25) = 82$ for every category.

Row / O / E / O²/E
1 / 59 / 82 / 42.4512
2 / 108 / 82 / 142.2439
3 / 82 / 82 / 82.0000
4 / 79 / 82 / 76.1098
Total / 328 / 328 / 342.8049

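$\chi^2 = \sum \frac{O^2}{E} - n = 342.8049 - 328 = 14.8049$, with $DF = 4 - 1 = 3$. From the chi-squared table, $\chi^2_{.05}(3) = 7.8147$ (and even $\chi^2_{.01}(3) = 11.3449$). Since the computed $\chi^2$ is above the table $\chi^2$, reject the null hypothesis.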
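A short Python sketch of the uniformity test: under $H_0$ every category gets $E = n/k$, and the 3-df table values 7.8147 ($\alpha = .05$) and 11.3449 ($\alpha = .01$) are hard-coded because the Python standard library has no chi-squared quantile function.

```python
# McClave Exercise 17.9: agree strongly, agree somewhat, disagree somewhat, disagree strongly
observed = [59, 108, 82, 79]
k = len(observed)
n = sum(observed)                         # 328 respondents
expected = [n / k] * k                    # H0: p_i = 1/4, so E = 82 in every category

chi2 = sum(o * o / e for o, e in zip(observed, expected)) - n
df = k - 1                                # no parameters estimated
CRIT_05 = 7.8147                          # chi-squared table value for 3 df at alpha = .05
CRIT_01 = 11.3449                         # chi-squared table value for 3 df at alpha = .01

print(f"E = {expected[0]:.0f}, chi-square = {chi2:.4f}, df = {df}")
print("reject at .05:", chi2 > CRIT_05, " reject at .01:", chi2 > CRIT_01)
```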

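c) Type I error: wrongly rejecting uniformity, that is, concluding that opinions are not evenly divided among the four categories when in fact they are.

Type II error: wrongly failing to reject uniformity, that is, failing to detect that opinions are not evenly divided when in fact they are not.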
