Discriminant Analysis / Classification Analysis

> attach(winer)

> names(winer)

"Cultivar" "alcohol" "malic" "ash" "ashalka" "Mg" "phenols" "flavanoids" "nonflav" "proantho" "colorint" "hue" "odratio" "proline"

> wine.hetero <- discrim(Cultivar ~ alcohol + malic + ash + ashalka + Mg + phenols + flavanoids +
+ nonflav + proantho + colorint + hue + odratio + proline, data = winer,
+ family = Classical("hetero"))

The code below is essentially the same cross-validation algorithm we used for classification trees and neural networks. On each of 25 repetitions it fits the model to a random 75% of the 178 wines, classifies the remaining 25%, and records the misclassification rate. For some strange reason you cannot extract the formula from a call to the discrim function the way you would for a classification tree, so you must type it out. Other than that, almost everything should seem quite familiar.

> mc <- rep(0,25)   # holds the 25 estimated misclassification rates
> for (i in 1:25) {
+ sam <- sample(1:178,floor(.75*178),replace=F)   # random 75% training sample
+ tempnet <- discrim(Cultivar ~ alcohol + malic + ash + ashalka + Mg + phenols + flavanoids + nonflav + proantho + colorint + hue + odratio + proline,data=winer[sam,],family=Classical("hetero"))
+ pred <- predict(tempnet,newdata=winer[-sam,])   # classify the held-out 25%
+ mistab <- table(Cultivar[-sam],pred$groups)   # actual vs. predicted (Cultivar is visible via attach(winer))
+ mc[i]<-(length(pred$groups) - sum(diag(mistab)))/length(pred$groups)   # proportion off the diagonal
+ }

> mean(mc) # classification error rate from CV
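Base R has no equivalent of the S-Plus discrim() function, so readers working in R can sketch the same repeated 75/25 hold-out loop by hand. This is a sketch under stated assumptions, not a reproduction of S-Plus's code: it uses R's built-in iris data as a stand-in for winer, and the helpers qda_fit() and qda_predict() are hypothetical, hand-written quadratic discriminant rules (a separate mean vector and covariance matrix for each group, scored by Gaussian log density plus log prior) that we assume behave like the heteroscedastic model.

```r
# Hypothetical helper: per-group mean vector, covariance matrix and prior proportion
qda_fit <- function(X, y) {
  y <- droplevels(as.factor(y))
  lapply(split(as.data.frame(X), y), function(Xk) {
    Xk <- as.matrix(Xk)
    list(mean = colMeans(Xk), cov = cov(Xk), prior = nrow(Xk) / nrow(X))
  })
}

# Hypothetical helper: assign each row to the group with the largest
# log prior - 0.5 * log|Sigma_k| - 0.5 * squared Mahalanobis distance
qda_predict <- function(fit, X) {
  X <- as.matrix(X)
  scores <- sapply(fit, function(g) {
    log(g$prior) - 0.5 * log(det(g$cov)) -
      0.5 * mahalanobis(X, g$mean, g$cov)
  })
  factor(colnames(scores)[max.col(scores, ties.method = "first")],
         levels = names(fit))
}

# Repeated 75/25 hold-out, mirroring the S-Plus loop (iris stands in for winer)
set.seed(1)
n <- nrow(iris)
mc <- rep(0, 25)
for (i in 1:25) {
  sam <- sample(1:n, floor(0.75 * n), replace = FALSE)   # training rows
  fit <- qda_fit(iris[sam, 1:4], iris$Species[sam])
  pred <- qda_predict(fit, iris[-sam, 1:4])              # classify held-out rows
  mistab <- table(iris$Species[-sam], pred)              # actual vs. predicted
  mc[i] <- (length(pred) - sum(diag(mistab))) / length(pred)
}
mean(mc)   # estimated misclassification rate
```

Because each hold-out set is drawn at random, mean(mc) changes a little from run to run; set.seed() only makes one particular run reproducible.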

We now consider some of the other models for discrimination based upon more restrictive assumptions about the group variance-covariance matrices.

> wine.equal <- discrim(formula = Cultivar ~ alcohol + malic + ash + ashalka + Mg +
+ phenols + flavanoids + nonflav + proantho + colorint + hue +
+ odratio + proline, data = winer, family = Classical("equal"))
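These models differ only in what they assume about each cultivar's variance-covariance matrix. One informal way to see what is being constrained is to look at each group's own standard deviations and correlation matrix. The base R sketch below does this on the iris data as a stand-in for winer. It assumes, as the name suggests, that the equal correlation structure lets the group variances differ while requiring a common correlation matrix, so large values in max_diff would count against it; the names group_cor, group_sd, grp_pairs and max_diff are illustrative only. It is a descriptive check, not a substitute for a formal test.

```r
# Group-specific structure: one SD vector and one correlation matrix per group
X <- iris[, 1:4]
g <- iris$Species
group_cor <- lapply(split(X, g), cor)                           # correlation matrix per group
group_sd  <- sapply(split(X, g), function(Xk) apply(Xk, 2, sd)) # variables x groups

# Largest absolute difference between any two groups' correlation matrices
grp_pairs <- combn(names(group_cor), 2)
max_diff <- apply(grp_pairs, 2, function(p)
  max(abs(group_cor[[p[1]]] - group_cor[[p[2]]])))
names(max_diff) <- apply(grp_pairs, 2, paste, collapse = " vs ")

round(group_sd, 3)
round(max_diff, 3)
```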

We can compare the two models with a likelihood ratio test using the anova function in S-Plus. The reported p-value is 0 to the printed precision, so we would reject the equal correlation model in favor of the more general heteroscedastic model.

> anova(wine.equal,wine.hetero)

Group Variable: Cultivar

                Cov.Structure  Df  AIC    BIC  Loglik    Test Lik.Ratio P.value
 wine.equal equal correlation 169 2106 2305.7 -715.00
wine.hetero   heteroscedastic 325 2252 2636.1 -476.02 1 vs. 2    477.96       0
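The Lik.Ratio column can be reproduced by hand. Assuming anova() uses the usual likelihood ratio statistic, twice the difference in log-likelihoods referred to a chi-square distribution whose degrees of freedom equal the difference in Df, the following R sketch recovers the printed 477.96 and shows why the p-value rounds to 0.

```r
# Values copied from the anova() output above
loglik_equal  <- -715.00   # wine.equal (equal correlation)
loglik_hetero <- -476.02   # wine.hetero (heteroscedastic)
df_equal  <- 169
df_hetero <- 325

lr_stat <- 2 * (loglik_hetero - loglik_equal)   # 477.96
lr_df   <- df_hetero - df_equal                 # 156
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)  # upper-tail probability
c(statistic = lr_stat, df = lr_df, p.value = p_value)
```

A statistic of about 478 on 156 degrees of freedom lies far in the upper tail of the chi-square distribution, which is why the p-value is indistinguishable from 0.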

Linear Discriminant Analysis with Emphasis on Interpretation

> library(multivariate)

> names(winer)

[1] "Cultivar" "alcohol" "malic" "ash" "ashalka" "Mg" "phenols" "flavanoids" "nonflav" "proantho" "colorint" "hue" "odratio" "proline"

> x <- as.matrix(winer[,2:14]) # Form a matrix containing the predictors

> xs <- scale(x) # Scale the matrix so the variables are on equal footing
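To make "equal footing" concrete: scale() centers each column at its mean and divides it by its standard deviation. The short base R sketch below checks this on the iris measurements, used here as a stand-in for the 13 wine predictors, by comparing scale() with a manual calculation built from sweep().

```r
x  <- as.matrix(iris[, 1:4])   # stand-in for as.matrix(winer[, 2:14])
xs <- scale(x)

# Manual standardization: subtract column means, then divide by column SDs
manual <- sweep(sweep(x, 2, colMeans(x)), 2, apply(x, 2, sd), "/")

round(colMeans(xs), 10)   # all (numerically) zero
apply(xs, 2, sd)          # all one
all.equal(as.vector(xs), as.vector(manual))
```

Because every standardized predictor has unit variance, the sizes of discriminant coefficients computed from xs are more reasonably compared across variables, which is the point of scaling before an analysis aimed at interpretation.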

> Discrim(xs,Cultivar)

The resulting output from the Discrim function is shown below.