Nonparametric Statistics or Distribution-free Statistics
Previously, we often assumed that our samples were drawn from normally distributed populations. This chapter introduces techniques that do not require this assumption. These methods are called Distribution-free Statistics or Nonparametric Statistics. When the normal assumption holds, nonparametric tests are less efficient than our traditional parametric methods: for the same sample size, they are somewhat less likely to detect a real difference. However, when the normal assumption is not valid, the nonparametric methods are more appropriate.
In this section, we consider four nonparametric tests: (1) the Wilcoxon Rank Sum Test or Mann-Whitney U Test, (2) the Wilcoxon Signed Rank Test, (3) the Kruskal-Wallis test and (4) the one sample test of runs.
Wilcoxon Rank Sum Test or Mann-Whitney U Test:
This technique tests whether the medians of 2 populations are the same, when the 2 samples are independent of each other. This test is comparable to the parametric t-test on the difference between 2 means that we considered previously.
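Technique: Merge the two samples and rank the combined observations from smallest to largest. Tied observations receive the average of the ranks they would otherwise occupy. Find the sums of the ranks, R1 and R2, for each of the two samples. Let T1 be the rank sum of the smaller sample (either sample if the sizes are equal). Let n1 be the size of that sample, n2 the size of the other, and n = n1 + n2. The mean and standard deviation of T1 are

$$\mu_{T_1} = \frac{n_1 (n + 1)}{2}, \qquad \sigma_{T_1} = \sqrt{\frac{n_1 n_2 (n + 1)}{12}}.$$

If the two sample sizes are each greater than 10, then T1 is approximately normal and we can standardize to get

$$Z = \frac{T_1 - \mu_{T_1}}{\sigma_{T_1}}.$$

If Z is statistically different from zero, we conclude that the medians are not the same.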
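The large-sample rank sum procedure can be sketched in Python as a minimal illustration. It assumes plain lists of numbers and uses only the standard library, with `statistics.NormalDist` supplying the two-sided p-value. The names `midranks` and `wilcoxon_rank_sum` are hypothetical helpers, not a library API. The standard deviation uses the formula without any adjustment for ties.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks, giving tied values the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the block of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_rank_sum(sample1, sample2):
    """Large-sample Wilcoxon rank sum test: returns (T1, mean, sd, Z, two-sided p)."""
    # T1 is based on the smaller sample (either one if the sizes are equal).
    if len(sample2) < len(sample1):
        sample1, sample2 = sample2, sample1
    n1, n2 = len(sample1), len(sample2)
    n = n1 + n2
    ranks = midranks(list(sample1) + list(sample2))  # rank the merged observations
    t1 = sum(ranks[:n1])  # rank sum of the smaller sample
    mean = n1 * (n + 1) / 2
    sd = sqrt(n1 * n2 * (n + 1) / 12)  # no tie adjustment
    z = (t1 - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return t1, mean, sd, z, p
```

Take one sample containing 1 to 11 and the other containing 12 to 22. Every value in the first sample ranks below every value in the second, so T1 = 66 against a mean of 126.5. This gives Z of about -3.97.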
Wilcoxon Signed Rank Test:
This technique tests whether the medians of 2 populations are the same, when the 2 samples are not independent of each other. This test is comparable to the parametric matched-pairs test that we considered previously.
Technique: For each pair of observations, calculate the difference between the two sample values and drop any zero differences. Rank the absolute values of the n remaining non-zero differences, giving tied absolute values the average of their ranks. Then sum the ranks of the positive differences and the ranks of the negative differences separately. Let W be the sum of the ranks of the positive differences. The mean and standard deviation of W are

$$\mu_W = \frac{n (n + 1)}{4}, \qquad \sigma_W = \sqrt{\frac{n (n + 1)(2n + 1)}{24}}.$$

If the number of non-zero differences n is at least 20, then W is approximately normal and we can standardize to get

$$Z = \frac{W - \mu_W}{\sigma_W}.$$

If Z is statistically different from zero, we conclude that the medians are not the same.
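Here is a minimal Python sketch of the large-sample signed rank procedure, assuming two equal-length lists of paired values. The names `midranks` and `wilcoxon_signed_rank` are hypothetical helpers, not a library API. As in the formulas, the standard deviation is not adjusted for tied absolute differences.

```python
from math import sqrt
from statistics import NormalDist


def midranks(values):
    """Return 1-based ranks, giving tied values the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Large-sample Wilcoxon signed rank test: returns (n, W, mean, sd, Z, two-sided p)."""
    # Differences for each pair, with zero differences dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    ranks = midranks([abs(d) for d in diffs])  # rank the absolute differences
    # W is the sum of the ranks that belong to positive differences.
    w = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return n, w, mean, sd, z, p
```

The function returns n alongside the statistic so the caller can check that there are at least 20 non-zero differences before relying on the normal approximation.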
Kruskal-Wallis test:
This technique tests the null hypothesis that several populations have the same median. It is the nonparametric equivalent of the one-factor ANOVA. After all n observations are ranked together, the test statistic is

$$K = \frac{12}{n (n + 1)} \sum_{j=1}^{c} \frac{R_j^2}{n_j} - 3 (n + 1)$$

where nj is the number of observations in the jth sample, n is the total number of observations, Rj is the sum of the ranks for the jth sample, and c is the number of sample groups. If each nj is at least 5 and the null hypothesis is true, then K approximately follows a χ² distribution with c - 1 degrees of freedom. Suppose we test at level α and K exceeds the upper-tail critical value χ²_α, so that K falls in the upper α-tail. Then we conclude that the medians are not the same.
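A minimal Python sketch of the K statistic follows, assuming each sample is a list of numbers. The names `midranks` and `kruskal_wallis_k` are hypothetical helpers. The standard library has no χ² distribution, so the sketch returns K together with its degrees of freedom. You then compare K with a table value of χ²_α.

```python
def midranks(values):
    """Return 1-based ranks, giving tied values the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_k(*groups):
    """Return the Kruskal-Wallis statistic K and its degrees of freedom c - 1."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = midranks(pooled)  # rank all n observations together
    total = 0.0
    start = 0
    for g in groups:
        nj = len(g)
        rj = sum(ranks[start:start + nj])  # rank sum R_j of the jth sample
        total += rj ** 2 / nj
        start += nj
    k = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    return k, len(groups) - 1
```

For example, the non-overlapping groups 1 to 5, 6 to 10, and 11 to 15 have rank sums 15, 40, and 65. They give K = 12.5 (up to floating-point rounding) with 2 degrees of freedom, which exceeds the 5% critical value of 5.991.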
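In the case of ties, where tied observations receive the average of their ranks, a corrected statistic Kc should be computed:

$$K_c = \frac{K}{1 - \dfrac{\sum_j \left(t_j^3 - t_j\right)}{n^3 - n}}$$

where tj is the number of tied observations in the jth group of tied values, counted over the combined sample.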
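The tie correction can be sketched in Python, assuming you already have K and the pooled list of all n observations. The names `tie_correction_factor` and `corrected_k` are hypothetical. `collections.Counter` finds the groups of tied values, and values are treated as tied only when they are exactly equal.

```python
from collections import Counter


def tie_correction_factor(pooled):
    """Return 1 - sum(t^3 - t) / (n^3 - n) over the groups of tied values in pooled."""
    n = len(pooled)
    # Each count t > 1 is the size of one group of tied observations.
    tie_sum = sum(t ** 3 - t for t in Counter(pooled).values() if t > 1)
    return 1 - tie_sum / (n ** 3 - n)


def corrected_k(k, pooled):
    """Divide K by the tie-correction factor to obtain Kc."""
    return k / tie_correction_factor(pooled)
```

The factor is never greater than 1, so Kc is at least as large as K. With no ties, the factor equals 1 and Kc = K.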
One sample test of runs:
This technique tests for randomness of order of occurrence.
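A run is a maximal sequence of identical occurrences, preceded and followed by a different occurrence (or by the start or end of the sequence). Count the number of runs r, and let n1 and n2 be the numbers of occurrences of each of the two kinds. If the order is random, the mean of r is

$$\mu_r = \frac{2 n_1 n_2}{n_1 + n_2} + 1$$

and the standard deviation of r is

$$\sigma_r = \sqrt{\frac{2 n_1 n_2 \left(2 n_1 n_2 - n_1 - n_2\right)}{\left(n_1 + n_2\right)^2 \left(n_1 + n_2 - 1\right)}}.$$

If either n1 or n2 is greater than 20, then r is approximately normally distributed and we can standardize to get

$$Z = \frac{r - \mu_r}{\sigma_r}.$$

If Z is statistically different from zero, we conclude that the order of occurrence is not random.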
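Here is a minimal Python sketch of the large-sample runs test. It assumes the sequence is a string or list containing exactly two kinds of symbol. The name `runs_test` is a hypothetical helper, and `itertools.groupby` counts the runs by grouping consecutive identical items.

```python
from itertools import groupby
from math import sqrt
from statistics import NormalDist


def runs_test(sequence):
    """Large-sample one-sample runs test: returns (r, mean, sd, Z, two-sided p)."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("the runs test needs exactly two kinds of occurrence")
    n1 = sequence.count(symbols[0])  # occurrences of the first kind
    n2 = len(sequence) - n1          # occurrences of the second kind
    n = n1 + n2
    r = sum(1 for _ in groupby(sequence))  # each group of identical neighbours is one run
    mean = 2 * n1 * n2 / n + 1
    sd = sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / (n ** 2 * (n - 1)))
    z = (r - mean) / sd
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return r, mean, sd, z, p
```

A perfectly alternating sequence such as "HT" repeated 21 times has r = 42 runs against a mean of 22. This gives a large positive Z, which signals too many runs for a random order. Twenty-one A's followed by twenty-one B's gives only 2 runs and a large negative Z.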