Statistical Computing and Simulation

Spring 2005

Assignment 5, Due May 25/2005

  1. Use simulation to construct critical values of the Mann-Whitney-Wilcoxon test in the case that both n1 and n2 are at most 10, where n1 and n2 are the numbers of observations in the two populations. (Note: The number of replications shall be at least 10,000.)

 Here we assume that the observations are random samples drawn from {1, 2, ..., n1+n2} without replacement. Based on 10,000 simulation runs each, the following table shows the critical values of min(T1, T2) for n1, n2 = 2, ..., 10:

n1 \ n2 / 2 / 3 / 4 / 5 / 6 / 7 / 8 / 9 / 10
2 / 3 / 3 / 3 / 3 / 3 / 3 / 4 / 4 / 4
3 / 6 / 6 / 6 / 7 / 8 / 8 / 9 / 9 / 10
4 / 10 / 10 / 11 / 12 / 13 / 14 / 14(15) / 16(15) / 16
5 / 15 / 16 / 17 / 18 / 19 / 21 / 22 / 23 / 24
6 / 21 / 23 / 24 / 25 / 27 / 28 / 30 / 31(32) / 33
7 / 28 / 30 / 32 / 34 / 35 / 37 / 39 / 41 / 43
8 / 37 / 39 / 41 / 43 / 45 / 47 / 49(5) / 52 / 54
9 / 46 / 48 / 51(50) / 53 / 56 / 58 / 61 / 63 / 66
10 / 56 / 58(59) / 61 / 64 / 67 / 70 / 73 / 76 / 79

Note: This table is not perfectly symmetric due to simulation error. The Mann-Whitney-Wilcoxon test computes the sum of the ranks of the first sample, say T1, and lets T2 = n1(n1+n2+1) - T1. Then take the minimum of T1 and T2, and reject if this test statistic is too small. (Check “Statistics for Business and Economics” by Anderson, Sweeney, and Williams, 8th edition, 2002.) The following is my program code:

# m & n --> Sample size, k --> Simulation runs, p --> pth percentile

mann.whitney=function(m,n,k,p) {

temp=NULL

for (i in 1:k) {

x=runif(m)

y=runif(n)

z=c(x,y)

w=rank(z)

T1=sum(w[c(1:length(x))])

T2=m*(m+n+1)-T1

temp=c(temp,min(T1,T2))

}

temp=sort(temp)

return(temp[floor(k*p)])

}
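
For example, the critical value for n1 = n2 = 5 can be obtained as follows (the exact percentile used to build the table above is not stated, so p = 0.05 here is only an assumption):

mann.whitney(5, 5, 10000, 0.05)   # p = 0.05 is an assumed percentile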

  2. Write a small program to perform the “Permutation test” and test your result on Good’s variance test, based on the 10 observations introduced in class.

 There are several ways of finding all possible combinations, and the following is one of them:

tt = NULL
for (i in 1:10000) {                 # draw many random 4-subsets of {1,...,8}
  a = sample(1:8, 4, replace = FALSE)
  tt = cbind(tt, a)
}
tt = apply(tt, 2, sort)              # sort each column so identical subsets match
t1 = tt
a = dim(t1)[2]
for (i in 1:70) {                    # keep one copy of each of the 70 subsets
  a = dim(t1)[2]
  b = t1[, i]
  tb = (t1[, -c(1:i)] == b)^1        # compare column i with all later columns
  tc = apply(tb, 2, sum)             # tc = 4 means an exact duplicate
  td = c((i+1):a) * (tc - 3)         # column indices of duplicates (positive entries)
  td = td[td > 0]
  t1 = t1[, -c(td)]                  # drop the duplicated columns
}
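
The 70 distinct combinations can also be generated directly with R’s combn function, which avoids the sampling-and-deduplication step above; a minimal sketch (it only enumerates the index sets, not the test itself):

subsets = combn(8, 4)      # choose(8,4) = 70 columns, one per combination
dim(subsets)               # 4 x 70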

As expected, 4 out of the 70 possible combinations have a difference no smaller than 6.5, which is equivalent to a p-value of 4/70 ≈ 0.0571. (Of course, you can also use four nested loops to enumerate all possible combinations, but I do not include that code here and you should do it yourselves.)

  3. Similar to what Efron did in the Law school data example, run the bootstrap simulation for 10,000 and 20,000 replications and compare the difference between the sample and the population. (Note: If you have time, you could also compare the results when the number of replications is small, say 50, with when it is large, say 10,000, and see if the bootstrap variance converges.)

 Instead of using the original sample observations, we draw a random sample of 15 pairs of observations. The following graphs show scatter plots of the population and the sample, and the sample appears to resemble the population. (The correlation coefficients between LSAT and GPA are 0.7609 and 0.7614 for the population and the sample, respectively.)

I first check whether the bootstrap standard error of the sample correlation converges as the number of bootstrap replications B increases.

B / 25 / 50 / 100 / 200 / 400 / 800 / 1600 / 3200
s.e. / .0703 / .0908 / .0867 / .0909 / .0884 / .0927 / .0954 / .0936

It seems that the bootstrap standard error converges to a value close to 0.09. The following histograms also confirm this result.
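
The computation behind this table is a straightforward pair bootstrap; a minimal sketch is given below, assuming the 15 sampled pairs are stored in vectors lsat and gpa (hypothetical names):

boot.se = function(lsat, gpa, B) {
  n = length(lsat)
  r = numeric(B)
  for (b in 1:B) {
    idx = sample(1:n, n, replace = TRUE)   # resample the pairs with replacement
    r[b] = cor(lsat[idx], gpa[idx])        # correlation of the bootstrap sample
  }
  sd(r)                                    # bootstrap standard error
}
# boot.se(lsat, gpa, 3200)   # should settle near 0.09, as in the table above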

  4. If d is the minimum distance between any pair of n uniformly distributed points in the unit square, then n(n-1)d² is approximately exponentially distributed (with mean 2/π), provided that n is sufficiently large. Use R or S-Plus to check this result: First, write a function to produce n points uniformly distributed on the unit square. Then, write a function to calculate the smallest distance between any pair of the n points. Change the value of n, perform the simulation, and comment on what you find. (You could also consider a goodness-of-fit test, such as “ks.gof”, to confirm what you find.)

 Based on 10,000 simulation runs, the case n = 20 is tested, and the result of ks.gof cannot reject that n(n-1)d² follows an exponential distribution, with p-value = 0.696. Note that the mean of n(n-1)d² from the simulation is 0.6227, very close to the theoretical mean 2/π ≈ 0.6366. The following is the program code:

tt = NULL
for (j in 1:10000) {
  x = runif(20)                       # n = 20 points uniform in the unit square
  y = runif(20)
  temp = NULL
  for (i in 1:(length(x)-1)) {        # distances from point i to all later points
    temp = c(temp, sqrt((x[i]-x[-c(1:i)])^2 + (y[i]-y[-c(1:i)])^2))
  }
  z = min(temp)                       # minimum pairwise distance d
  tt = c(tt, z)
}
tt = tt*tt*20*19                      # n(n-1)d^2 with n = 20
ks.gof(tt, dist = "exponential")      # ks.gof is the S-Plus goodness-of-fit test
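
Since ks.gof is an S-Plus function, a comparable check in R can be made with ks.test, here against the exponential distribution with rate π/2 implied by the theoretical mean 2/π (a sketch under that assumption):

ks.test(tt, "pexp", rate = pi/2)      # rate pi/2 corresponds to mean 2/pi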

  5. To compare teaching methods, twenty schoolchildren were divided into two groups: ten taught by conventional methods and ten taught by an entirely new approach. The following are the test results:

Conventional / 65 / 79 / 90 / 75 / 61 / 85 / 98 / 80 / 97 / 75
New / 90 / 98 / 73 / 79 / 84 / 81 / 98 / 90 / 83 / 88

Are the two teaching methods equivalent in result? You need to use a permutation test, the bootstrap, and a parametric test, and then compare their results.

I. Two-sample test:

 The observed difference between the two groups is –59. The t-test gives a p-value of 0.2212. The (two-sample) bootstrap standard error estimate of the group difference is approximately 13.96; the corresponding bootstrap test statistic (10,000 replications) indicates a significant difference. For the permutation test, since there are 184,756 possible combinations (too many!), we use the idea of a “Monte Carlo p-value” to check for a significant difference, as sketched below. I found 1,101 out of 10,000 runs with a difference smaller than the observed –59, i.e., a Monte Carlo p-value of about 0.11, which is not significant.
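
A minimal sketch of this Monte Carlo permutation test (conv and new hold the two score vectors from the table above; the variable names are mine):

conv = c(65, 79, 90, 75, 61, 85, 98, 80, 97, 75)
new  = c(90, 98, 73, 79, 84, 81, 98, 90, 83, 88)
obs  = sum(conv) - sum(new)            # observed difference: -59
scores = c(conv, new)
cnt = 0
for (i in 1:10000) {
  idx = sample(1:20, 10)               # randomly relabel ten scores as "conventional"
  d = sum(scores[idx]) - sum(scores[-idx])
  if (d <= obs) cnt = cnt + 1          # count permuted differences as small as -59
}
cnt/10000                              # Monte Carlo p-value (about 0.11 in the text)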

II. Paired test:

 The paired t-test gives a p-value of 0.2371. The bootstrap standard error estimate of the group difference is approximately 44.23 (10,000 replications), so the corresponding test statistic shows no significant difference either. The permutation test can be treated as multiplying +1 or –1 on the observed paired differences. Among the 1024 possible sign combinations, only 122 give a difference no larger than the observed value –59, i.e., p-value = 122/1024 ≈ 0.119. The following is the program code for listing all combinations, using an idea similar to constructing a design matrix in experimental design:

ind = NULL
for (i in 1:10) {                    # build a 10 x 1024 matrix whose columns are
  temp = rep(c(rep(1, 2^(10-i)), rep(-1, 2^(10-i))), 2^(i-1))
  ind = rbind(ind, temp)             # all 2^10 combinations of +1/-1 signs
}
x = c(-25, -19, 17, -4, -23, 4, 0, -10, 14, -13)   # paired differences (sum = -59)
y = x %*% ind                        # signed sum of differences for every combination
sum(y <= sum(x))                     # number of combinations no larger than -59
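
A minimal sketch of the paired bootstrap described above: resample the ten paired differences with replacement and take the standard deviation of the bootstrap sums (this reconstructs the computation behind the 44.23 reported earlier, under the assumption that the sum of differences is the statistic being bootstrapped):

x = c(-25, -19, 17, -4, -23, 4, 0, -10, 14, -13)
bs = numeric(10000)
for (i in 1:10000) {
  bs[i] = sum(sample(x, 10, replace = TRUE))   # bootstrap sum of paired differences
}
sd(bs)                                         # should be close to 44.23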