252meanx1 10/6/05

D. COMPARISON OF TWO SAMPLES

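Examples for comparison of means.

$H_0: \mu_1 = \mu_2$ against $H_1: \mu_1 \ne \mu_2$

or more generally

$H_0: D = D_0$ against $H_1: D \ne D_0$,

where $D = \mu_1 - \mu_2$, estimated by $d = \bar{x}_1 - \bar{x}_2$.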

General formulas: Degrees of freedom

a. Confidence Interval: $D = d \pm t_{\alpha/2}\, s_d$.

b. Test Ratio: $t = \frac{d - D_0}{s_d}$.

c. Critical Value: $d_{cv} = D_0 \pm t_{\alpha/2}\, s_d$.

The difference between the cases comes down to the choice of $t$ and the formula for $s_d$. Let us now consider the first four cases.
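The three methods always lead to the same decision, which a short sketch can make concrete. It assumes the usual two-sided forms (interval $d \pm t\, s_d$, ratio $(d - D_0)/s_d$, critical values $D_0 \pm t\, s_d$); the function names and the numbers in the usage lines are hypothetical and are not taken from the examples in these notes.

```python
def three_methods(d, D0, s_d, crit):
    """Return (confidence interval, test ratio, critical values) for a two-sided test.

    d    : observed difference between sample means
    D0   : hypothesized difference under the null hypothesis
    s_d  : standard error of d
    crit : table value of t (or z) for alpha/2, supplied by the caller
    """
    ci = (d - crit * s_d, d + crit * s_d)      # a. interval centered at d
    ratio = (d - D0) / s_d                     # b. standardized difference
    cv = (D0 - crit * s_d, D0 + crit * s_d)    # c. 'accept' region centered at D0
    return ci, ratio, cv


def decisions(d, D0, s_d, crit):
    """Return three booleans: does each method reject the null hypothesis?"""
    ci, ratio, cv = three_methods(d, D0, s_d, crit)
    reject_by_ci = not (ci[0] <= D0 <= ci[1])   # D0 outside the interval
    reject_by_ratio = abs(ratio) > crit         # ratio outside +/- crit
    reject_by_cv = not (cv[0] <= d <= cv[1])    # d outside the 'accept' region
    return reject_by_ci, reject_by_ratio, reject_by_cv


# Hypothetical inputs, chosen only to show that the methods agree.
print(decisions(3.0, 0.0, 1.0, 1.96))   # all three reject
print(decisions(1.0, 0.0, 1.0, 1.96))   # none reject
```

The critical value is passed in rather than computed so that the same sketch works for both $z$ and $t$ without needing a statistics library.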

1. Two Means, Two Independent Samples, Large Samples.

If the total number of degrees of freedom is large (or the two samples come from normally distributed populations with known variances $\sigma_1^2$ and $\sigma_2^2$), then replace $t$ with $z$ and use $s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$.

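First Example: We wish to test the earnings of retail clerks in New York and Philadelphia for equality.

$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$ or $H_0: D = 0$, $H_1: D \ne 0$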

Data:

Since both sample sizes are well over 100, we are justified in using a large sample method.

Solutions: Use $z_{\alpha/2} = z_{.025} = 1.960$ in place of $t_{\alpha/2}$.

a. Confidence Interval: $D = d \pm z_{\alpha/2}\, s_d = -30 \pm 4.32$. Make a diagram showing a Normal curve centered at -30 and a confidence interval bounded by -34.32 and -25.68. Since $D_0 = 0$ is not between them, reject $H_0$.

b. Test Ratio: $z = \frac{d - D_0}{s_d} = -13.59$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by $-z_{.025} = -1.960$ and $z_{.025} = 1.960$. Since -13.59 is not between them, reject $H_0$. Since this is actually a value of $z$, a p-value would be easy to find.

c. Critical Value: $d_{cv} = D_0 \pm z_{\alpha/2}\, s_d = 0 \pm 4.32$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by -4.32 and 4.32. Since $d = -30$ is not between them, reject $H_0$.
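The remark that a p-value would be easy for a $z$ ratio can be illustrated as follows. This is only a sketch: it takes the reported ratio of -13.59 as given and assumes a 5% two-sided test, which is an assumption consistent with the reported bounds but not stated in the surviving text. The complementary error function is used because the ordinary Normal CDF underflows to zero this far out in the tail.

```python
from math import erfc, sqrt
from statistics import NormalDist

z = -13.59                      # test ratio reported in part b
alpha = 0.05                    # assumed significance level

# Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard Normal Z.
p_two_sided = erfc(abs(z) / sqrt(2))

# Two-sided critical value of z for the assumed alpha (about 1.96).
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

print(f"critical z = {z_crit:.3f}")
print(f"p-value    = {p_two_sided:.3e}")
print("reject H0" if p_two_sided < alpha else "do not reject H0")
```

The p-value is so small that the decision would be the same at any conventional significance level.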

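Second Example: (Whitmore, Netter, Wasserman) We wish to learn if battery type B has a longer service life (in months) than battery type A ($\mu_2 > \mu_1$, with type A as population 1). Note that this statement becomes an alternate hypothesis because it does not contain an equality.

$H_0: \mu_1 \ge \mu_2$, $H_1: \mu_1 < \mu_2$ or $H_0: \mu_1 - \mu_2 \ge 0$, $H_1: \mu_1 - \mu_2 < 0$ or $H_0: D \ge 0$, $H_1: D < 0$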

Data:

Since both sample sizes are well over 100, we are justified in using a large sample method.

Solutions: Use $z$ in place of $t$. This is a 1-sided test.

a. Confidence Interval: Given $\alpha = .05$, use $z_{.05} = 1.645$. Make a diagram showing a Normal curve centered at -2.9 and a confidence interval, $D \le d + z_{\alpha}\, s_d = -2.9 + 1.253 = -1.647$, represented by shading the area below -1.647. The null hypothesis is represented by shading the area above zero. Since $D_0 = 0$ is not in the confidence interval, reject $H_0$.

b. Test Ratio: $z = \frac{d - D_0}{s_d} = -3.808$. Make a diagram showing a Normal curve centered at zero and an 'accept' region above $-z_{.05} = -1.645$. Since -3.808 is below -1.645, reject $H_0$.

c. Critical Value: Given $H_1: D < 0$, we want a critical value below zero. Use $d_{cv} = D_0 - z_{\alpha}\, s_d = 0 - 1.253 = -1.253$. Make a diagram showing a Normal curve centered at zero and a 'reject' region below -1.253. Since $d = -2.9$ is below -1.253, reject $H_0$.
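The one-sided arithmetic can be checked with a short sketch. Because the data block did not survive, the standard error is backed out from the reported difference (-2.9) and test ratio (-3.808); that back-calculation and the 5% level are assumptions made for illustration only.

```python
from statistics import NormalDist

alpha = 0.05            # assumed level for the one-sided test
d = -2.9                # reported difference between sample means
z = -3.808              # reported test ratio
D0 = 0.0                # hypothesized difference

s_d = (d - D0) / z      # standard error implied by d and z (an assumption)

z_crit = NormalDist().inv_cdf(alpha)     # lower-tail critical z, about -1.645
p_value = NormalDist().cdf(z)            # one-sided p-value, P(Z <= z)

d_cv = D0 + z_crit * s_d                 # critical value for d, below zero
upper_bound = d - z_crit * s_d           # one-sided interval: D <= upper_bound

reject = z < z_crit
print(f"s_d = {s_d:.4f}, critical d = {d_cv:.3f}, D <= {upper_bound:.3f}")
print(f"p-value = {p_value:.6f}, reject H0: {reject}")
```

All three views agree: the ratio is below the critical $z$, the difference is below the critical value for $d$, and zero lies above the upper bound of the interval.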

2. Two Means, Two Independent Samples, Populations Normally Distributed, Population Variances Assumed Equal.

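$DF = n_1 + n_2 - 2$ and $s_d = \hat{s}_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $\hat{s}_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$.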
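A minimal sketch of the pooled calculation follows, assuming the standard pooled-variance formulas (degrees of freedom $n_1 + n_2 - 2$, pooled variance as the degrees-of-freedom-weighted average of the two sample variances). The function name and the sample standard deviations in the usage lines are hypothetical; they are not the razor blade data.

```python
from math import sqrt


def pooled_standard_error(n1, s1, n2, s2):
    """Return (degrees of freedom, pooled variance, standard error of d).

    n1, n2 : sample sizes
    s1, s2 : sample standard deviations
    """
    df = n1 + n2 - 2
    # Weighted average of the sample variances, weights = degrees of freedom.
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    s_d = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
    return df, sp2, s_d


# Hypothetical inputs: two samples of ten with standard deviations 4 and 6.
df, sp2, s_d = pooled_standard_error(10, 4.0, 10, 6.0)
print(df, sp2, round(s_d, 4))
```

With equal sample sizes the pooled variance is simply the average of the two sample variances, which is why the result here is 26.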

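Example: (Whitmore, Netter, Wasserman) Each of two groups of ten men is assigned a type of razor blade, and the men are asked how many shaves they got from a package. We wish to find out if there is a significant difference in the durability of the two blades. Type A will be sample 1 ($x_1$); type B will be sample 2 ($x_2$).

$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$ or $H_0: D = 0$, $H_1: D \ne 0$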

Data:

Since $n_1 = 10$ and $n_2 = 10$ are well below 100, we need a small sample method. Because of the similarity of the two samples we assume that $\sigma_1^2 = \sigma_2^2$. $DF = n_1 + n_2 - 2 = 18$. Because of our assumption about variances, we use a pooled variance, $\hat{s}_p^2$.

Solutions: Use $t^{(18)}_{.005} = 2.878$.

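a. Confidence Interval: $D = d \pm t_{\alpha/2}\, s_d = -28.7 \pm 22.96$. Make a diagram showing a Normal curve centered at -28.7 and a confidence interval bounded by -51.66 and -5.74. Since $D_0 = 0$ is not between them, reject $H_0$.

b. Test Ratio: $t = \frac{d - D_0}{s_d} = -3.596$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by $-t^{(18)}_{.005} = -2.878$ and $t^{(18)}_{.005} = 2.878$. Since -3.596 is not between them, reject $H_0$. If we want a p-value, we need the probability in both tails. Since 3.596 lies between $t^{(18)}_{.005} = 2.878$ and $t^{(18)}_{.001} = 3.610$, the one-tailed probability lies between .001 and .005, and we double it to get $.002 < \text{p-value} < .01$.

c. Critical Value: $d_{cv} = D_0 \pm t_{\alpha/2}\, s_d = 0 \pm 22.96$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by -22.96 and 22.96. Since $d = -28.7$ is not between them, reject $H_0$.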

3. Two Means, Two Independent Samples, Populations Normally Distributed, Population Variances not Assumed Equal.

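This time the degrees of freedom for $t$ must be calculated by the Satterthwaite approximation. The formula is $DF = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \frac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$, but the formula for the standard deviation is the same as in method 1, $s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$.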
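The Satterthwaite calculation is easy to get wrong by hand, so a small sketch may help. It assumes the standard form of the approximation; the function name and the sample variances in the usage lines are hypothetical and are not the word processor data.

```python
from math import floor, sqrt


def satterthwaite(n1, s1_sq, n2, s2_sq):
    """Return (standard error of d, approximate degrees of freedom).

    n1, n2       : sample sizes
    s1_sq, s2_sq : sample variances
    """
    a = s1_sq / n1                  # variance of the first sample mean
    b = s2_sq / n2                  # variance of the second sample mean
    s_d = sqrt(a + b)               # same standard error as the large-sample method
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return s_d, df


# Hypothetical inputs: two samples of 16 with nearly equal variances.
s_d, df = satterthwaite(16, 4.0, 16, 4.6)
print(round(s_d, 4), round(df, 3), floor(df))   # floor() rounds down conservatively

# With exactly equal variances and sample sizes the result is n1 + n2 - 2.
_, df_equal = satterthwaite(16, 4.0, 16, 4.0)
print(round(df_equal, 6))
```

Rounding down with `floor` is the conservative choice, since fewer degrees of freedom give a slightly larger table value of $t$.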

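Example: We wish to use a 2-sided 95% confidence interval to test for a significant difference between the time it takes an employee to type a page on word processor A and on word processor B. Sixteen pages are typed on each processor.

$H_0: \mu_1 = \mu_2$, $H_1: \mu_1 \ne \mu_2$ or $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 \ne 0$ or $H_0: D = 0$, $H_1: D \ne 0$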

Data:

Since $n_1 = 16$ and $n_2 = 16$ are well below 100, we would be on very shaky ground if we used a large sample method. Even with a small sample method we probably need an assumption of Normality. If we do not want to assume $\sigma_1^2 = \sigma_2^2$, we need the Satterthwaite method. To find the standard error of $d$ and the number of degrees of freedom, do the following calculations:

, , so ,

I take the conservative approach of rounding the computed degrees of freedom down to 29. Notice how little difference there is between this and $n_1 + n_2 - 2 = 30$. This is because of the near-equality of the sample variances. The resulting standard error $s_d$ is almost the same as we would have gotten if we had assumed that $\sigma_1^2 = \sigma_2^2$, again because of the near-equality of the sample variances.

Solutions: Use $t^{(29)}_{.025} = 2.045$ for a 2-sided confidence interval. $D = d \pm t_{\alpha/2}\, s_d = 1.10 \pm 1.47$. Make a diagram showing a Normal curve centered at 1.10 and a confidence interval bounded by -0.37 and 2.57. Since $D_0 = 0$ is between them, do not reject $H_0$.

A more complete version of this problem appears in Problem D3. Look at document 252meanx3 for a computer example of this method.

4. Two Means, Paired Samples (If samples are small, populations should be normally distributed).

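If $n$ is the number of pairs of data and $s_d$ is the standard deviation of the $n$ differences $d = x_1 - x_2$, then $DF = n - 1$ and $s_{\bar{d}} = \frac{s_d}{\sqrt{n}}$. In this case $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}}$, etc.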

Example: We have been told that income in a region has risen by $6.00/week over the last year. We interviewed 100 families last year and found an average weekly income of $200. We reinterview the same families and find out their present incomes so that we can compute how much they have risen. We find that the new average income is $204. From the data we compute a standard deviation of the income change of $6.00. We wish to test to see if the $6 rise is believable.

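$H_0: \mu_1 - \mu_2 = -6$, $H_1: \mu_1 - \mu_2 \ne -6$ or $H_0: \mu_2 = \mu_1 + 6$, $H_1: \mu_2 \ne \mu_1 + 6$ or $H_0: D = -6$, $H_1: D \ne -6$, where $\mu_1$ is last year's mean income and $\mu_2$ is this year's.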

Data:

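$\bar{d} = 200 - 204 = -4$, $s_d = 6.00$, $D_0 = -6$, $n = 100$. Though we may have 200 pieces of data, we have 100 pairs, and the actual numbers we use are the 100 differences in income. $s_{\bar{d}} = \frac{6.00}{\sqrt{100}} = 0.60$, $DF = n - 1 = 99$, and we ought to use $t^{(99)}$.

Solutions: Use $t^{(99)}_{.025} = 1.984$.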

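a. Confidence Interval: $D = \bar{d} \pm t_{\alpha/2}\, s_{\bar{d}} = -4 \pm 1.19$. Make a diagram showing a Normal curve centered at -4 and a confidence interval bounded by -5.19 and -2.81. Since $D_0 = -6$ is not between them, reject $H_0$.

b. Test Ratio: $t = \frac{\bar{d} - D_0}{s_{\bar{d}}} = \frac{-4 - (-6)}{0.60} = 3.333$. Make a diagram showing a Normal curve centered at zero and an 'accept' region bounded by $-t^{(99)}_{.025} = -1.984$ and $t^{(99)}_{.025} = 1.984$. Since 3.333 is not between them, reject $H_0$. If we want a p-value, we need the probability in both tails. Since 3.333 lies above $t^{(99)}_{.001} \approx 3.175$, the one-tailed probability is below .001, and we double the implied p-value to get p-value < .002.

c. Critical Value: $\bar{d}_{cv} = D_0 \pm t_{\alpha/2}\, s_{\bar{d}} = -6 \pm 1.19$. Make a diagram showing a Normal curve centered at -6 and an 'accept' region bounded by -7.19 and -4.81. Since $\bar{d} = -4$ is not between them, reject $H_0$.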

Note: We might have been better off in this problem defining $D$ as $\mu_2 - \mu_1$. Then our hypotheses would read $H_0: D = 6$ and $H_1: D \ne 6$. We would say $\bar{d} = 4$ and our critical value, for example, would be $6 \pm 1.19$. The conclusion would not change.
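The paired-sample arithmetic can be reproduced directly from the figures given in the example statement (100 pairs, mean incomes of $200 and $204, a standard deviation of $6.00 for the change, and a claimed rise of $6). The only value not taken from the text is the table value 1.984 for $t$ with 99 degrees of freedom at $\alpha/2 = .025$, which is supplied here as an assumption.

```python
from math import sqrt

n = 100                     # number of pairs
d_bar = 200.0 - 204.0       # mean difference, last year minus this year
s_d = 6.0                   # standard deviation of the differences
D0 = -6.0                   # hypothesized mean difference
t_crit = 1.984              # assumed table value, 99 degrees of freedom

se = s_d / sqrt(n)          # standard error of the mean difference
margin = t_crit * se

ci = (d_bar - margin, d_bar + margin)     # a. confidence interval
ratio = (d_bar - D0) / se                 # b. test ratio
cv = (D0 - margin, D0 + margin)           # c. critical values for d_bar

print(f"standard error  = {se:.2f}")
print(f"interval        = ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"test ratio      = {ratio:.3f}")
print(f"critical values = ({cv[0]:.2f}, {cv[1]:.2f})")
```

Each method rejects the null hypothesis: -6 is outside the interval, the ratio exceeds 1.984, and -4 is outside the critical values.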

© 2002 Roger Even Bove