Testing two non-normal population means

(The Matched Pairs Experiment)

For two dependent (matched pairs) samples drawn from non-normal populations we present the Wilcoxon signed rank sum test. The test works on the differences between the matched observations.

The statistic T is the sum of the ranks of the absolute differences, taken over either the positive differences (denoted T+) or the negative differences (denoted T-).

H0: The two populations have the same mean

H1: The two populations do not have the same mean, or,

One of the means is greater than the other.

For small samples, where n ≤ 30, we use Wilcoxon tables. For larger samples we can use a Z test, as will be presented later.

The small sample case (n ≤ 30):

The critical value of T can be determined from Wilcoxon tables that provide the values TL and TU for which P(T ≤ TL) = P(T ≥ TU) = α (the only values of α considered are .025 and .05). An excerpt of these tables is provided here. The complete table can be found at the bottom of this note.

Part (a): α = .025 one tail, α = .05 two tail. Part (b): α = .05 one tail, α = .10 two tail.

n / TL (a) / TU (a) / TL (b) / TU (b)
6 / 1 / 20 / 2 / 19
7 / 2 / 26 / 4 / 24
... / ... / ... / ... / ...
30 / 137 / 328 / 152 / 313

Part (a) is used for a one tail test with α = .025 or a two tail test with α = .05. Similarly, Part (b) is used for a one tail test with α = .05 or a two tail test with α = .10. For example, for a left hand tail test at the 5% significance level, if the sample size is n = 7 for both samples (don’t forget that the sample sizes are equal by definition in the matched pairs experiment), the null hypothesis is rejected when T ≤ 4.
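The lookup just described can be sketched in a few lines of Python. This is only an illustration: the dictionaries hold just the rows shown in the excerpt, and the names PART_A, PART_B and reject_h0 are introduced here and are not part of the note.

```python
# Rows of the excerpt only: n -> (TL, TU). Other sample sizes are
# deliberately absent; the complete table would be needed for them.
PART_A = {6: (1, 20), 7: (2, 26), 30: (137, 328)}   # alpha = .025 one tail / .05 two tail
PART_B = {6: (2, 19), 7: (4, 24), 30: (152, 313)}   # alpha = .05 one tail / .10 two tail


def reject_h0(t, n, part, tail):
    """Return True when T falls in the rejection region.

    tail is 'left' (reject if T <= TL), 'right' (reject if T >= TU)
    or 'two' (reject if either condition holds).
    """
    t_lower, t_upper = part[n]
    if tail == "left":
        return t <= t_lower
    if tail == "right":
        return t >= t_upper
    if tail == "two":
        return t <= t_lower or t >= t_upper
    raise ValueError("tail must be 'left', 'right' or 'two'")


# Left hand tail test at the 5% level with n = 7: reject when T <= 4.
print(reject_h0(4, 7, PART_B, "left"))   # True
print(reject_h0(5, 7, PART_B, "left"))   # False
```

The caller chooses the part of the table that matches the significance level and the number of tails, exactly as in the paragraph above.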

The large samples case (n > 30):

For large samples we can build a statistic T that is approximately normally distributed, in a fashion similar to the one used for the parametric test. The statistic is the sum of the signed ranks of the differences of the paired observations. If the two populations tested have the same location, the sums of ranks for the positive differences and for the negative differences should be about the same (with opposite signs), and therefore the sum of all the signed ranks should be close to zero. Consequently, the null hypothesis assigns the value zero to the expected value of the signed sum of ranks (E(T) = 0). The alternative hypothesis, as usual, reflects the particular application (E(T) > 0, E(T) < 0, or E(T) ≠ 0). The standard deviation is calculated by $\sigma(T) = \sqrt{n(n+1)(2n+1)/6}$, where n is the number of non-zero differences, and the test statistic Z becomes $Z = (T - E(T)) / \sigma(T)$.

Under the null hypothesis the Z statistic is $Z = T / \sigma(T)$.
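The expected value and standard deviation of T can be motivated with a short derivation. This is a sketch that assumes no ties and no zero differences; the symbol $S_i$ (the sign attached to rank $i$) is introduced here for illustration and does not appear elsewhere in the note.

```latex
% Under H0 each rank i = 1, ..., n carries a + or - sign with equal
% probability, independently of the other ranks.
T = \sum_{i=1}^{n} S_i \, i, \qquad
P(S_i = +1) = P(S_i = -1) = \tfrac{1}{2}

% Each sign has mean 0 and variance 1, so
E(T) = \sum_{i=1}^{n} i \, E(S_i) = 0, \qquad
\sigma^2(T) = \sum_{i=1}^{n} i^2 \, \mathrm{Var}(S_i)
            = \sum_{i=1}^{n} i^2
            = \frac{n(n+1)(2n+1)}{6}

% Hence
\sigma(T) = \sqrt{\frac{n(n+1)(2n+1)}{6}}, \qquad
Z = \frac{T - E(T)}{\sigma(T)} = \frac{T}{\sigma(T)}
```

For instance, with n = 40 the variance is 40 · 41 · 81 / 6 = 22140, so the standard deviation is about 148.795.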

Example 4

A tasting panel of 15 people is asked to rate two new kinds of tea on a scale ranging from 0 to 100; to help the panel some of the rating scores were given the following meaning:

0 – awful;

25 – I would try to finish it only to be polite;

50 – I would drink it but not buy it;

75 – It’s about as good as any tea I know;

100 – Superb; I would drink nothing else;

Test whether there are significant differences between the rating distributions for the two types of tea. Use α = .05.

Solution

Since the data are ratings on a scale, they are not normally distributed. Each person provided two scores (to overcome natural rating variability among different individuals), so we have a matched pairs experiment. Since the samples are smaller than 30 observations, we turn to the Wilcoxon signed rank sum test with the table of critical values.

We need to test the following hypotheses:

H0: The two distributions have the same location

H1: The two distributions have different locations

The ratings these people provided are:

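Person / 1 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10 / 11 / 12 / 13 / 14 / 15
Tea S / 85 / 40 / 75 / 81 / 42 / 50 / 60 / 15 / 65 / 40 / 60 / 40 / 65 / 75 / 80
Tea J / 65 / 50 / 43 / 65 / 20 / 65 / 35 / 38 / 60 / 47 / 60 / 43 / 53 / 61 / 63
Differ. / +20 / -10 / +32 / +16 / +22 / -15 / +25 / -23 / +5 / -7 / 0 / -3 / +12 / +14 / +17
Rank / +10 / -4 / +14 / +8 / +11 / -7 / +13 / -12 / +2 / -3 / X / -1 / +5 / +6 / +9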

Comment: Since the zero difference of pair 11 is discarded, n = 14 (not 15)!

Adding all the positive and negative signed ranks we have:
T+ = 10 + 14 + 8 + … + 9 = 78; T– = –4 – 7 – 12 – 3 – 1 = -27

Let us select T = T+ = 78. For α = .05 and n = 14 the two-tail critical values are TL = 21 and TU = 84 (see the attached table below). Since T falls between these two values, there is no significant difference between the locations of the two populations at the 5% significance level; thus people show preference toward neither tea.

Comment: we could use T = |T-| = 27 with the same conclusion.
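The arithmetic of Example 4 can be checked with a short Python sketch. It uses the ratings from the table and the critical values quoted in the solution; the variable names are illustrative, and the ranking shortcut is only valid because no two absolute differences are equal in this data set.

```python
tea_s = [85, 40, 75, 81, 42, 50, 60, 15, 65, 40, 60, 40, 65, 75, 80]
tea_j = [65, 50, 43, 65, 20, 65, 35, 38, 60, 47, 60, 43, 53, 61, 63]

# Differences "Tea S minus Tea J"; the zero difference (person 11) is discarded.
diffs = [s - j for s, j in zip(tea_s, tea_j) if s != j]
n = len(diffs)

# Rank the absolute differences from smallest (rank 1) to largest (rank n).
abs_sorted = sorted(abs(d) for d in diffs)
assert len(set(abs_sorted)) == n  # guard: this sketch does not handle ties
rank = {value: position + 1 for position, value in enumerate(abs_sorted)}

t_plus = sum(rank[abs(d)] for d in diffs if d > 0)
t_minus = sum(rank[abs(d)] for d in diffs if d < 0)   # reported as a positive sum

# Two-tail critical values for n = 14 and alpha = .05, as quoted in the solution.
t_lower, t_upper = 21, 84
reject = t_plus <= t_lower or t_plus >= t_upper

print(n, t_plus, t_minus, reject)   # 14 78 27 False
```

Note that T+ and |T-| add up to n(n+1)/2 = 105, which is a quick way to catch ranking mistakes.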

Example 5

Let us repeat Example 3, changing the manner in which the samples were taken. Last year 14 women were asked to report the number of hours per week they were busy performing housework. The same women were asked the same question this year.

(a) Can we conclude at the 5% significance level that women as a group are doing less housework this year? See data and solution below.

(b) Repeat the question at the 1% significance level, if 40 women answered the same question last year and this year. See the data in the Wilcoxon- the “Matched Pairs” spreadsheet.

Solution

(a) Let us assume we checked for normality and found that the differences are not normally distributed. We turn to the Wilcoxon signed rank sum test because the pairs of observations are matched: each woman responded twice (last year and this year).

The following is a summary of the data and of the Wilcoxon procedure (see the explanations below):

A / B / C / D / E / F / G
This Year / Last year / Diff / |Diff| / Ranks / Rank |Diff(+)| / Rank |Diff(-)|
0 / 0 / 0 / 0 / / /
4 / 6 / -2 / 2 / 8 / / 8
4 / 7 / -3 / 3 / 10.5 / / 10.5
7 / 8 / -1 / 1 / 3.5 / / 3.5
0 / 1 / -1 / 1 / 3.5 / / 3.5
9 / 7 / 2 / 2 / 8 / 8 /
10 / 11 / -1 / 1 / 3.5 / / 3.5
8 / 11 / -3 / 3 / 10.5 / / 10.5
0 / 1 / -1 / 1 / 3.5 / / 3.5
8 / 10 / -2 / 2 / 8 / / 8
3 / 4 / -1 / 1 / 3.5 / / 3.5
7 / 7 / 0 / 0 / / /
2 / 2 / 0 / 0 / / /
8 / 7 / 1 / 1 / 3.5 / 3.5 /

Explanations:

Columns A and B show the raw data of hours spent performing housework (two records for each woman).

Column C is the difference between the observations per pair.

Column D is the absolute value of the differences.

In column E the absolute values of the differences are ranked (zeros are ignored and ties are broken by assigning the average rank, as before). To clarify, there are 6 pairs whose difference is 1 in absolute value. Their ranks run from 1 through 6, therefore their common average rank is (1+6)/2 = 3.5. The next three values, of 2, should receive the ranks 7, 8, 9, which average to 8; similarly, the two values of 3 receive the average rank of 10.5 [(10+11)/2 = 10.5].

In columns F and G the ranks of the positive and negative differences are separated from one another.

Now the sum for each of columns F and G can be calculated. Only one of these sums is needed for the test procedure. Let us take the sum of the positive differences as our statistic.

T+ = 8 + 3.5 = 11.5

The Hypotheses:

H0: The two populations are the same

H1: The population of last year is located to the right of the population of this year

The rejection region: Since the differences are calculated as “Time this year” minus “Time last year”, for H1 to be favored the absolute sum T- needs to exceed the sum T+. Thus we need to show that T+ is sufficiently small (see H1). The rejection region is therefore T ≤ TL, where T = T+.

From the Wilcoxon table for n = 11 (the sample of 14 observations minus the 3 differences equal to zero) and α = .05, we find that TL = 14.

Conclusion: since 11.5 < 14, there is sufficient evidence at 5% significance level to reject the null hypothesis in favor of the alternative hypothesis, and infer that women on the average (thus, as a group) are doing less housework this year than last year.
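Part (a) can be reproduced with the following Python sketch, which drops zero differences and assigns average ranks to ties as explained for column E. The function name signed_rank_sums is introduced here for illustration, and TL = 14 is simply the table value quoted in the solution.

```python
this_year = [0, 4, 4, 7, 0, 9, 10, 8, 0, 8, 3, 7, 2, 8]
last_year = [0, 6, 7, 8, 1, 7, 11, 11, 1, 10, 4, 7, 2, 7]


def signed_rank_sums(x, y):
    """Return (n, T+, |T-|) for the differences x - y.

    Zero differences are dropped; tied absolute differences share
    the average of the ranks they would occupy.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    abs_sorted = sorted(abs(d) for d in diffs)
    avg_rank = {}
    for value in set(abs_sorted):
        first = abs_sorted.index(value) + 1           # lowest rank in the tie group
        last = first + abs_sorted.count(value) - 1    # highest rank in the tie group
        avg_rank[value] = (first + last) / 2
    t_plus = sum(avg_rank[abs(d)] for d in diffs if d > 0)
    t_minus = sum(avg_rank[abs(d)] for d in diffs if d < 0)
    return len(diffs), t_plus, t_minus


n, t_plus, t_minus = signed_rank_sums(this_year, last_year)

t_lower = 14  # table value for n = 11, one tail, alpha = .05 (quoted in the text)
print(n, t_plus, t_minus, t_plus <= t_lower)   # 11 11.5 54.5 True
```

The last value printed is the decision of the left hand tail test: True means the null hypothesis is rejected.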

(b) Repeating the same question, this time with samples of size 40, we can use the standard normal approximation. From the data in the file Wilcoxon- the “Matched Pairs” spreadsheet we have:

T= / -423
E(T)= / 0
σ(T)= / 148.79516
Z= / -2.842834

Since T is the sum of the signed ranks of the differences “Time this year” minus “Time last year”, we want to show that this sum is negative (the negative differences outweigh the positive differences). The rejection region is therefore Z < -Z.01, that is, Z < -2.33.

Conclusion: Since -2.84283 < -2.33, the null hypothesis is rejected at the 1% significance level. Thus, there is sufficient evidence to infer that women as a group perform less housework this year than last year.
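The figures in part (b) can be verified from the reported value T = -423 alone. The sketch below assumes that all 40 differences are non-zero (so n = 40), which is consistent with the standard deviation reported from the spreadsheet; the raw data are not reproduced here.

```python
import math
from statistics import NormalDist

n = 40        # assumed number of non-zero differences
t = -423      # sum of signed ranks reported from the spreadsheet

# Standard deviation of T under the null hypothesis; E(T) = 0.
sigma_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 6)
z = (t - 0) / sigma_t

# Left hand tail critical value at the 1% level, about -2.33.
z_crit = -NormalDist().inv_cdf(0.99)

print(round(sigma_t, 3), round(z, 3), z < z_crit)   # 148.795 -2.843 True
```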

Wilcoxon Signed Rank Sum Test Table