# Trtm=C(Rep( C ,4),Rep( P1 ,6),Rep( P2 ,5),Rep( P3 ,6))

STT 430/530 R#7 Fall, 2007

#let's illustrate the Kruskal-Wallis test

#both by the R function kruskal.test & the permuation test.

#we'll illustrate it on the data in Example 3.2.2

#on page 87. . .

#call the factor trtm

**trtm=c(rep("c",4),rep("p1",6),rep("p2",5),rep("p3",6))**

trtm

#now put the log of the bacteria counts into a vector

lobact=scan()

#type in the data as we've done before. . .

lobact

#now apply the Kruskal-Wallis test via the R command

#and note the agreement with the textbook values . . .

**kruskal.test(lobact~as.factor(trtm))**

#don't forget to use the as.factor function so the

#formula will work properly. . .

#now let's do a permutation test based on the KW statistic

#first learn about a function in R called tapply

#use help.start() to check out the help file on this

#important function. Basically, tapply applies a function

#(the last argument) to a vector (the first argument)

#across the values of a categorical vector (the 2nd arg)

#e.g., the line below computes the mean of the values of

#of **lobact by the values of trtm (it's like BY trtm** in SAS)

#and the line following that computes the mean of the ranks

#of lobact across the treatment groups. . .

**tapply(lobact,trtm,mean)**

**tapply(rank(lobact),trtm,mean)**

#now compute the KW statistic "by hand" to get us ready

#for doing the permutation test based on the KW . . .

#we need to program the parts of the KW formula on p.86

**paren=(tapply(rank(lobact),trtm,mean)-11)^2**

#this gives the value inside the parentheses. . .

#and the following multiplies each component by the number

#of sample values in each level of the factor

**term=tapply(trtm,trtm,length)*paren**

#now sum these up and multiply by the coefficient in front

**sum(term)*(12/(length(trtm)*(length(trtm)+1)))**

#now let's work on doing a permutation test based on this

#statistic - follow the pattern we've done before.

#fill a vector with zeros

**zz=numeric(5000)**

#our observed value of the kw statistic is below . . .

**kwobs= sum(term)*(12/(length(trtm)*(length(trtm)+1)))**

#and the values in size are our sample sizes. . .

**size=tapply(trtm,trtm,length)**

#now let's do the permutations. . .

**for (i in 1:5000) {**

**shufrank=rank(sample(lobact))**

**paren=(tapply(shufrank,trtm,mean)-11)^2**

**zz[i]=sum(size*paren)*(12/(length(trtm)*(length(trtm)+1)))**

}