Discriminant Analysis / Classification Analysis
> attach(winer)
> names(winer)
[1] "Cultivar" "alcohol" "malic" "ash" "ashalka" "Mg" "phenols" "flavanoids" "nonflav" "proantho" "colorint" "hue" "odratio" "proline"
> wine.hetero <- discrim(Cultivar ~ alcohol + malic + ash + ashalka + Mg + phenols + flavanoids +
+ nonflav + proantho + colorint + hue + odratio + proline, data = winer,
+ family = Classical("hetero"))
To estimate the misclassification rate of this model we use essentially the same cross-validation algorithm used for both classification trees and neural networks in classification problems: the model is repeatedly fit to a random 75% of the observations and then used to classify the remaining 25%. For some reason you cannot extract the formula from a call to the discrim function the way you would for a classification tree, so you must type it out in full. Other than that, almost everything should seem quite familiar.
> mc <- rep(0,25)
> for (i in 1:25) {
+ sam <- sample(1:178,floor(.75*178),replace=F)
+ tempnet <- discrim(Cultivar ~ alcohol + malic + ash + ashalka + Mg + phenols + flavanoids + nonflav + proantho + colorint + hue + odratio + proline,data=winer[sam,],family=Classical("hetero"))
+ pred <- predict(tempnet,newdata=winer[-sam,])
+ mistab <- table(Cultivar[-sam],pred$groups)
+ mc[i]<-(length(pred$groups) - sum(diag(mistab)))/length(pred$groups)
+ }
> mean(mc) # classification error rate from CV
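The arithmetic behind the loop above can be written out as a short sketch. The symbols n_train, n_test, and m_kk (the diagonal entries of mistab, i.e. the correctly classified counts for each cultivar) are introduced here for illustration and do not appear in the S-Plus code.

```latex
\[
n_{\text{train}} = \lfloor 0.75 \times 178 \rfloor = \lfloor 133.5 \rfloor = 133,
\qquad
n_{\text{test}} = 178 - 133 = 45
\]
\[
\mathrm{mc}_i = \frac{n_{\text{test}} - \sum_{k} m_{kk}}{n_{\text{test}}},
\qquad
\widehat{\text{error}}_{\text{CV}} = \frac{1}{25} \sum_{i=1}^{25} \mathrm{mc}_i
\]
```

Each of the 25 random splits therefore classifies 45 held-out wines, and mean(mc) averages the 25 resulting error rates.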
We now consider some of the other models for discrimination based upon more restrictive assumptions about the group variance-covariance matrices.
> wine.equal <- discrim(formula = Cultivar ~ alcohol + malic + ash + ashalka + Mg +
+ phenols + flavanoids + nonflav + proantho + colorint + hue +
+ odratio + proline, data = winer, family = Classical("equal"))
We can perform a likelihood ratio test to compare the two models using the anova command in S-Plus. The results of this test (likelihood ratio statistic 477.96, p-value reported as 0) suggest we would reject the equal correlation model in favor of the more general heteroscedastic model.
> anova(wine.equal,wine.hetero)
Group Variable: Cultivar
            Cov.Structure      Df   AIC   BIC     Loglik   Test     Lik.Ratio  P.value
wine.equal  equal correlation  169  2106  2305.7  -715.00
wine.hetero heteroscedastic    325  2252  2636.1  -476.02  1 vs. 2  477.96     0
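The likelihood ratio statistic in the table can be reproduced by hand from the two log-likelihoods. The sketch below assumes the usual large-sample chi-square reference distribution, with degrees of freedom equal to the difference in the Df column; the output itself does not state which reference distribution anova uses. The symbol \ell is introduced here for the log-likelihood.

```latex
\[
\mathrm{LR} = 2\left(\ell_{\text{hetero}} - \ell_{\text{equal}}\right)
            = 2\left(-476.02 - (-715.00)\right)
            = 2(238.98) = 477.96
\]
\[
\mathrm{df} = 325 - 169 = 156,
\qquad
\text{p-value} = P\!\left(\chi^2_{156} \ge 477.96\right) \approx 0
\]
```

A statistic of 477.96 lies far in the upper tail of a chi-square distribution with 156 degrees of freedom, which is consistent with the reported p-value of 0.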
Linear Discriminant Analysis with Emphasis on Interpretation
> library(multivariate)
> names(winer)
[1] "Cultivar" "alcohol" "malic" "ash" "ashalka" "Mg" "phenols" "flavanoids" "nonflav" "proantho" "colorint" "hue" "odratio" "proline"
> x <- as.matrix(winer[,2:14]) # Form a matrix containing the predictors
> xs <- scale(x) # Scale the matrix so the variables are on an equal footing
> Discrim(xs,Cultivar)
The resulting output from the Discrim function is shown below.