Open and Save in R

Paired and Nested tests. Mission 10 November

Learning goals

· Be able to handle and understand pseudoreplication in nested data.

· Be able to make paired analyses.

Data collection

· Use the fictive oak twig data from me.

· Use the paired birch data from Södertörn.

Mission

· Analyse the fictive oak twig data with a nested anova.

· Choose your favourite illustration of a paired data.

· Analyse the paired birch data both with a paired t-test and an anova.

Hand in

· A Mini Report of the nested oak data.

· A Mini Report of the paired birch data.

Mail it to me, , and to your feed-back pal.

Feedback pal

· To get your feedback pal, specify your name and paste the code.

you<-"Didrik"

stud<-c("Niklas","Peter","JeanCharles","Nizar","Shayer","Wang", "Tahir","Karin","Esther","Joana","Yao","Rushana","Li","Nok", "Yong","Malin")

pal<-c(stud,stud)[which(stud==you)+8]

par(mar=rep(0,4))

plot(1:2,c(1,1),font=5,pch=169,col=c("red","pink"),cex=20,xlab="",ylab="", xlim=c(0,3),axes=F)

text(1:2,c(1,1),c(you,pal),font=2)

Nested anova

Download and attach the fictive oak twig data. The hypothesis is that oaks that grow in open habitats will have broader crowns, because the twigs on their lateral branches grow faster. I measured 10 twigs each on 10 open habitat oaks and on 10 forest dwelling oaks. However in two oaks a few twigs were galled and removed from the data set.

First make a graph

A good idea is to first make an ordinary stripchart on the raw data values. In this way you can identify outliers and other strange things.

stripchart(yourcode)

If you like you can colour code the different trees. As there are twenty trees you can use col=rainbow(20)[tree]. Or check ?palette for more options. Or just use your own colour string c("blue","green","black","gold", and so on)[tree]. BUT this doesn't work with stripchart, only with plot, so you have to write something like this. GAAAH!

plot(y~jitter(as.numeric(x),1),xlim=c(0.5,2.5),pch=19,col=rainbow(20)[tree])

You could add confidence intervals, but they will be incorrect because of the pseudoreplication. NEVER do this in your thesis or something. But here you might want to compare the incorrect pseudoreplicated confidence interval with the correct ones.

You can also try the incorrect pseudoreplication inflated anova test.

summary(aov(y ~ x))

Recap for yourself WHY this is incorrect!

Calculate means

One way to get rid of the pseudoreplication is to calculate means (per tree in this case). For this the function tapply is very handy.

tapply(y,tree,mean)

Or if you want to save the values in a box:

treemeansY<-tapply(y,tree,mean)

But you also need a new x variable. Otherwise you have 20 y values and 200 x values. Check yourself if you don't believe me.

length(treemeansY)

length(x)

To make the short x variable:

newshortX<-factor(tapply(as.numeric(x),tree,mean),labels=levels(x))

Now you can make a new stripchart with treemeansY ~ newshortX or whatever you called them.

stripchart(yourcode)

In this case it would be appropriate to add means and confidence intervals.

I'm now copy-pasting everything in courier font. This time confidence code!

And go ahead and test it with an anova:

summary(aov(treemeansY ~ newshortX))

Make a nested anova directly

Just go with the anova, but now on your raw data.

summary(aov(y ~ x + Error(tree)))

Paired t-test

Download and attach the Paired birch data set. This data set is from students at Södertörns University College. They have checked two twig lengths per birch, one with a catkin and one without. So the twigs are paired.

In this case the trees were just given numbers, so you want R to consider these number as a factor! This is very common to forget!!

tree<-as.factor(tree)

Make a paired graph

This is a pedagogical challenge.

1. Do an ordinary anova type graph (stripchart). Check how in the pimp your graph manual. Do not include confidence intervals in such a graph. They will not be correct as it is a paired design.

Does there seem to be a difference? Is there a cost of catkins?

But this graph does not take the pairs into account.

2. Let's try another variant. pairing is the categorical variable in which the x and y is paired. In your case it will be the trees.

stripchart(y~x,vertical=T,pch=19)
arrows(rep(1,length(x)/2), y[x=="A"][order(pairing[x=="A"])], rep(2,length(x)/2), y[x=="B"][order(pairing[x=="B"])],code=0)

The lines represent the pairs. Are the lines all having the same direction? Or at least most of them? Then the effect should be significant.

3. Finally, extract the differences of each pair and make a graph (a stripchart) on only the differences. It is of course necessary that the pairs are in the same order in both vectors. To achieve this do like this. pairing is the categorical variable that in which the x and y is paired. In your case it will be the trees.

diffr<-y[x=="A"][order(pairing[x=="A"])]- y[x=="B"][order(pairing[x=="B"])]

Then do the stripchart:

stripchart(diffr,method="jitter",jitter=.1,vertical=T,pch=19,xlab="Paired differences")

Add mean and confidence interval.

medel<-mean(diffr)

a<-1

n<-length(diffr)

s<-sd(diffr)

segments(1:a-.05,medel, 1:a+.05,medel, col="red",lwd=8, lend=2)

konf<-c(qt(0.975,(n-1))*(s/sqrt(n)))

arrows(1:a,medel+konf,1:a,medel-konf,angle=90,code=3,length=.1, col="red",lwd=2)

And maybe add a line at zero to see if the differences differ from zero:

abline(h=0,lty=3)

Which graph do you prefer? Do you believe in your graphs? Is there a cost of catkins?

Check the model assumptions

For this you just check if the differences are normally distributed.

hist(diffr)

In this case you only have one group (the differences) so the variation can not vary between groups!

If you want to remove a potential outlier, you must remove also its paired value. This is probably easiest to do in excel! If you absolutely want to do it in R, this is a suggestion:

To get what pair (tree) should be omitted

pairing[which(y>40)]

[1] 8

To remove that pair from all three variables (!= means "is not"):

y<-y[pairing!=8]

x<-x[pairing!=8]

pairing<-pairing[pairing!=8]

Aaaargh!

This code is…

Supergreen!

Gimme Excel!

Make a paired t-tests

To do a t-test you must have the two groups in different vectors (boxes). For a paired t-test it is also necessary that the pairs are in the same order in both vectors. To achieve this do like this:

groupA <- y[x=="A"][order(pairing[x=="A"])]

groupB <- y[x=="B"][order(pairing[x=="B"])]
t.test(groupA,groupB,paired=T)

The test is in fact testing in a very similar way as shown in the third graph variant. That is, does the mean of the differences differ from zero.

Make a "paired" anova of the birch twig data

The paired t-test tests whether the mean of the differences differ from zero. The anova pathway for testing the same hypothesis is like this: Test the ratio between the variation caused by the explanatory variable and the variation caused by the randomly sampled trees. That is we do not test over the residual variation within trees, but over the variation among trees.

mod.paov <- aov(y~x + Error(pairing))

summary(mod.paov)

Check that you got the same result as with the paired-test.

A nice bonus of using a "paired" anova instead of a paired-test is that you can measure many birch twigs from the same tree and analyse the data directly. However, the aov() function with Error-specification is said to give correct results only in balanced designs, that is the same number of catkin and vegetative twigs, and the same number of twigs per tree.