Didrik Vanhoenacker, 2009
Mission 8.Multivariate statistics. (5 November)
Learning goals
- Be able to make an ordination plot and evaluate it.
- Get insights into how multivariate statistics work.
Data collection
- Analyse either music taste or movie taste.
Mission
- Make a neat ordination plot.
- Find out what factors are influencing taste.
- Find your soul mate and ask him / her for a lunch date or a concert.
Hand in
- One Mini-report on either music taste orfood taste.
Mail it to me, , and to your feed-back pal.
Feedback pal
Pick the one with the most similar taste of food or music. But perhaps one that used the other data set...
Multivariate statistics – many response variables
The Data sets
You need two separate data sets.
One with the response variables. Like this:
Metallica / Beyonce / U2 / Pear_Jam / JustinSofia / 7 / 6 / 3 / 4 / 7
Anja / 6 / 3 / 4 / 7 / 6
Pelle / 2 / 6 / 3 / 4 / 1
M1 / 0 / 1 / 6 / 3 / 4
M2 / 6 / 3 / 4 / 7 / 6
K1 / 7 / 6 / 6 / 3 / 4
K2 / 0 / 6 / 3 / 4 / 1
And one with the explanatory variables:
Sex / Age / Nat_Soc_Non / City / Swede?Sofia / K / 25 / N / N
Anja / K / 24 / N / N
Pelle / M / 25 / N / N
M1 / M / 19 / S / Y
M2 / M / 56 / N / Y
K1 / K / 32 / S / Y
K2 / K / 27 / S / Y
To plot and analyse the data sets you need to installand load the package vegan (VEGetation ANalyses). It is installed in the computer lab.
library(vegan)
Import and attach both data sets. In this case it's ok to attach two data sets as no variable names are the same. Change the name to your file names of course! But taste and xvar is a good idea to use (or you have to change them everywhere!).
Windows:
library(xlsReadWrite)
taste<-read.xls(file.choose())
xvar<- read.xls(file.choose())
attach(taste)
attach(xvar)
Check a 3D cloud
If you try this at home you have to install Rcmdr (R commander) first. Load Rcmdr.
library(Rcmdr)
When the menu driven Rcmdr interface pops up, take it down, but don't close it. You won't use the menus. Then you choose three foods or artists and use them instead of this example's 50 cent & Co.
scatter3d(FiftyCent,Madonna,Kent,surface=F,point.col="grey",axis.scales=F, xlab="FiftyCent",ylab="Madonna",zlab="Kent")
Make the graph window larger (drag the bottom right corner). Use the mouse to move the graph into different angles.
What does this mean? What are you doing right now?
Make a multivariate graph (ordinationgraph, biplot)
To understand what the graphs mean, check the lecture powerpoint. Here we use the method PCA (principal component analysis). Somewhat confusing the R code for both pca and rda is rda. Asimilar method is CA (correspondence analysis).
[ Under the hood: PCA uses the "actual” distancesin themulti dimensionalcloud. CA usesthe difference between theobserved value and the value you would expect if all people had the same taste (as in a chi2-test). ]
plot(rda(taste,scale=F))
Here we usescale=False, because we have used the same scale for all artists and foods.Usually you have to standardize (scale) the variables to make them more comparable.
As a default the artists/foods are used for scaling the graph. This is good if you want to see which artist or foods that have the same fans. If you rather want to see which persons are having the same taste, it is better to scale on persons.
plot(rda(taste,scale=F),scaling=1)
Default is also to show all samples/plots/sites in the graph. This often makes the graph crowded (and porridgy / muddled). I recommend inspecting artist/foods and persons separately.
Compare artists/foods like this:
plot(rda(taste,scale=F), scaling=2, disp="species")
and persons like this:
plot(rda(taste,scale=F), scaling=1, disp="sites")
The reason for displaying species or sites is that this is a vegetation package… Persons corresponds to sites (plant sites) with different species (which corresponds to the artists/movies).
This might be porridgy anyway, especially if you want to include artist/foods and persons in the same graph. A pimpy way is to first just plot the data points and then use the mouse to click those points you want to have their name displayed.
pca.plot.1<-plot(rda(taste,scale=F), scaling=2, type="p")
This makes a graph with only points and save the graph data in a box.
identify(pca.plot.1, "species", labels=names(taste),cex=0.8)
Now you can click on the points you want to name. Right-click and choose stop when you are done. Well, then the same for persons:
identify(pca.plot.1, "sites", labels=row.names(taste),cex=0.8)
You can also pimp your graph in Inkscape (not in lab) or Powerpoint (Good luck)!
To see how much of the variation that each PC axis explains you can make a screeplot:
screeplot(rda(taste,scaling=F))
If most of the variation in cloud is explained by the first two PC axes (the two first bars in the screeplot) it means that the idea of PCA works. If all PC axes explain about the same amount of variation, you haven't gained anything; you could have used the original variables instead.
With explanatory variables
You can also add explanatory variables in the graph. First we test which are related to the first two PC axes (which are the ones in the graph).
xvar.test<- envfit(rda(taste,scale=F), xvar, permu = 5000)
xvar.test
Now you can add them to the graph. Continuous variables are added as arrows, categorical variables as points. If you don't want to add insignificant crap variables, you can specify a critical value.
plot(rda(taste,scale=F))
plot(xvar.test, p.max=0.1)
If things end up outside the graph area, you can add the xlim and ylim manually. For example:
plot(rda(taste,scale=F),scaling=2,disp="species",xlim=c(-3,3),ylim=c(-3,3))
plot(xvar.test, p.max=0.1)
So, if you'd like to make a graph with only the class mate's name displayed (but given all persons' taste) maybe it will be something like this. Function select picks the rows that you specify (e.g the 14 first, if that is us).
plot(rda(taste,scale=F), scaling=1, disp="sites", type="n",xlim=c(-2,2), ylim=c(-2,2))
plot(xvar.test, p.max=0.1)
text(rda(taste,scale=F),scaling=1,disp="sites",select=1:14)
Then add the artist/foods?
text(rda(taste,scale=F),scaling=1,disp="species",cex=0.8)
Or if you just want to add a few artist/movies. Moron code for moron R (necessity of first row).
nameplot<-plot(rda(taste,scale=F),scaling=1,disp="species")
plot(rda(taste,scale=F), scaling=1, disp="sites", type="n",xlim=c(2,2),ylim=c(-2,2))
plot(xvar.test, p.max=0.1)
text(rda(taste,scale=F),scaling=1,disp="sites",select=1:11)
points(rda(taste,scale=F),scaling=1,disp="species",type="p")
identify(nameplot, "species", labels=names(taste),cex=0.8)
Right-click to stop.
1