6. Discriminant Analysis

(i)  Two populations:

1. Separation:

Suppose we have two populations. Let $x_{11}, x_{12}, \ldots, x_{1n_1}$ be the $n_1$ observations from population 1 and let $x_{21}, x_{22}, \ldots, x_{2n_2}$ be the $n_2$ observations from population 2. Note that $x_{1i}$, $x_{2i}$ are $p \times 1$ vectors. Fisher’s discriminant method projects these $p \times 1$ vectors to real values via a linear function $y = a^t x$, where $a$ is some $p \times 1$ vector, and tries to separate the two populations as much as possible.

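Fisher’s discriminant method is as follows:
Find the vector $\hat{a}$ maximizing the separation function $S(a)$,
$$S(a) = \frac{|\bar{y}_1 - \bar{y}_2|}{s_y},$$
where $y_{1i} = a^t x_{1i}$, $y_{2i} = a^t x_{2i}$, $\bar{y}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} y_{1i}$, $\bar{y}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} y_{2i}$, and
$$s_y^2 = \frac{\sum_{i=1}^{n_1}(y_{1i}-\bar{y}_1)^2 + \sum_{i=1}^{n_2}(y_{2i}-\bar{y}_2)^2}{n_1+n_2-2}.$$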

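Intuition of Fisher’s discriminant method:

[Figure: the observations from the two populations are projected onto the real line R, and the two projected groups are separated as far as possible by finding $\hat{a}$.]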

Intuitively, $S(a)$ measures the difference between the transformed means, $|\bar{y}_1 - \bar{y}_2|$, relative to the sample standard deviation $s_y$. If the transformed observations $y_{11}, \ldots, y_{1n_1}$ and $y_{21}, \ldots, y_{2n_2}$ are completely separated, $S(a)$ should be large even when the random variation of the transformed data, reflected by $s_y$, is also considered.

Important result:

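The vector $\hat{a}$ maximizing the separation $S(a)$ is of the form

$$\hat{a} = S_{pooled}^{-1}(\bar{x}_1 - \bar{x}_2)$$ (up to a nonzero constant multiple), where

$$S_{pooled} = \frac{(n_1-1)S_1 + (n_2-1)S_2}{n_1+n_2-2},$$

$$S_j = \frac{1}{n_j-1}\sum_{i=1}^{n_j}(x_{ji}-\bar{x}_j)(x_{ji}-\bar{x}_j)^t, \quad j = 1, 2,$$

and where

$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i} \quad \text{and} \quad \bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}.$$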
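As a minimal numerical sketch of this result, the Python snippet below computes the pooled covariance matrix $S_{pooled} = [(n_1-1)S_1 + (n_2-1)S_2]/(n_1+n_2-2)$ and the direction $S_{pooled}^{-1}(\bar{x}_1 - \bar{x}_2)$ with NumPy. The function name `fisher_direction` and the two small data arrays are made up for illustration and are not part of these notes.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Return (a_hat, S_pooled) for two samples whose rows are observations."""
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    n1, n2 = X1.shape[0], X2.shape[0]
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    # np.cov with rowvar=False treats columns as variables and divides by n - 1.
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    S_pooled = ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    # Solve S_pooled a = xbar1 - xbar2 rather than forming the inverse explicitly.
    a_hat = np.linalg.solve(S_pooled, xbar1 - xbar2)
    return a_hat, S_pooled

# Hypothetical data: four observations per population, p = 2.
X1 = np.array([[4.0, 2.0], [5.0, 3.0], [6.0, 2.5], [5.5, 3.5]])
X2 = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 0.5], [2.5, 2.0]])
a_hat, S_pooled = fisher_direction(X1, X2)
print(a_hat)
```

Solving the linear system instead of inverting $S_{pooled}$ is a common numerical choice; it assumes $S_{pooled}$ is nonsingular.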

Justification:

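$$\bar{y}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} y_{1i} = \frac{1}{n_1}\sum_{i=1}^{n_1} a^t x_{1i} = a^t \bar{x}_1.$$

Similarly, $\bar{y}_2 = a^t \bar{x}_2$.

Also,

$$\sum_{i=1}^{n_1}(y_{1i}-\bar{y}_1)^2 = \sum_{i=1}^{n_1}(a^t x_{1i} - a^t\bar{x}_1)^2 = a^t\left[\sum_{i=1}^{n_1}(x_{1i}-\bar{x}_1)(x_{1i}-\bar{x}_1)^t\right]a = (n_1-1)\,a^t S_1 a.$$

Similarly, $\sum_{i=1}^{n_2}(y_{2i}-\bar{y}_2)^2 = (n_2-1)\,a^t S_2 a$.

Thus,

$$s_y^2 = \frac{(n_1-1)\,a^t S_1 a + (n_2-1)\,a^t S_2 a}{n_1+n_2-2} = a^t S_{pooled}\, a.$$

Thus,

$$S(a) = \frac{|a^t(\bar{x}_1-\bar{x}_2)|}{\sqrt{a^t S_{pooled}\, a}}.$$

Since maximizing $S(a)$ is equivalent to maximizing $S^2(a)$, $\hat{a}$ can be found by solving the equation based on the first derivative of $S^2(a)$,

$$\frac{\partial S^2(a)}{\partial a} = \frac{2\,[a^t(\bar{x}_1-\bar{x}_2)]\,(a^t S_{pooled}\, a)\,(\bar{x}_1-\bar{x}_2) - 2\,[a^t(\bar{x}_1-\bar{x}_2)]^2\, S_{pooled}\, a}{(a^t S_{pooled}\, a)^2} = 0.$$

Further simplification gives

$$S_{pooled}\,\hat{a} = \frac{\hat{a}^t S_{pooled}\,\hat{a}}{\hat{a}^t(\bar{x}_1-\bar{x}_2)}\,(\bar{x}_1-\bar{x}_2).$$

Multiplying both sides by the inverse of the matrix $S_{pooled}$ gives

$$\hat{a} = \frac{\hat{a}^t S_{pooled}\,\hat{a}}{\hat{a}^t(\bar{x}_1-\bar{x}_2)}\,S_{pooled}^{-1}(\bar{x}_1-\bar{x}_2).$$

Since $\frac{\hat{a}^t S_{pooled}\,\hat{a}}{\hat{a}^t(\bar{x}_1-\bar{x}_2)}$ is a real number,

$$\hat{a} = c\,S_{pooled}^{-1}(\bar{x}_1-\bar{x}_2),$$

where c is some constant.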
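The justification can be checked numerically. The sketch below, on simulated data (the sample sizes, means, and seed are arbitrary assumptions), evaluates the separation $|\bar{y}_1-\bar{y}_2|/s_y$ directly from the projected data and compares the direction $S_{pooled}^{-1}(\bar{x}_1-\bar{x}_2)$ with 1000 random directions. The helper name `separation` is hypothetical.

```python
import numpy as np

def separation(a, X1, X2):
    """Separation |ybar1 - ybar2| / s_y of the projected data y = a^t x."""
    y1, y2 = X1 @ a, X2 @ a
    n1, n2 = len(y1), len(y2)
    ss = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    s_y = np.sqrt(ss / (n1 + n2 - 2))
    return abs(y1.mean() - y2.mean()) / s_y

# Simulated data (assumption: normal samples with different means, p = 3).
rng = np.random.default_rng(0)
X1 = rng.normal(loc=[2.0, 1.0, 0.0], size=(30, 3))
X2 = rng.normal(loc=[0.0, 0.0, 0.0], size=(25, 3))
n1, n2 = len(X1), len(X2)

S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False)
            + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
d = X1.mean(axis=0) - X2.mean(axis=0)
a_hat = np.linalg.solve(S_pooled, d)

best = separation(a_hat, X1, X2)
# Separation achieved by 1000 random directions, for comparison.
others = np.array([separation(a, X1, X2) for a in rng.normal(size=(1000, 3))])

print(best, others.max())
```

In this sketch no random direction exceeds the separation at $\hat{a}$, rescaling $\hat{a}$ leaves the separation unchanged, and the maximum equals $\sqrt{(\bar{x}_1-\bar{x}_2)^t S_{pooled}^{-1}(\bar{x}_1-\bar{x}_2)}$.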

2. Classification:

Suppose we have an observation $x_0$. Then, based on the discriminant function $\hat{a}^t x$ we obtain, we can allocate this observation to one of the two classes.

Important result:

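Allocate $x_0$ to population 1 if

$$\hat{y}_0 = \hat{a}^t x_0 = (\bar{x}_1-\bar{x}_2)^t S_{pooled}^{-1}\, x_0 \;\ge\; \hat{m} = \frac{1}{2}(\bar{x}_1-\bar{x}_2)^t S_{pooled}^{-1}(\bar{x}_1+\bar{x}_2) = \frac{1}{2}(\bar{y}_1+\bar{y}_2).$$

Otherwise, if

$\hat{y}_0 < \hat{m}$, then allocate $x_0$ to population 2.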
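A minimal sketch of this allocation rule in Python follows. It assumes the direction $\hat{a} = S_{pooled}^{-1}(\bar{x}_1-\bar{x}_2)$ (constant $c = 1$) and the midpoint $\hat{m} = \frac{1}{2}\hat{a}^t(\bar{x}_1+\bar{x}_2)$; the function name `fisher_classify` and the data are hypothetical.

```python
import numpy as np

def fisher_classify(x0, X1, X2):
    """Return 1 or 2: the population to which x0 is allocated."""
    n1, n2 = len(X1), len(X2)
    xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False)
                + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    a_hat = np.linalg.solve(S_pooled, xbar1 - xbar2)
    m_hat = 0.5 * a_hat @ (xbar1 + xbar2)   # midpoint of the projected means
    y0 = a_hat @ np.asarray(x0, dtype=float)
    return 1 if y0 >= m_hat else 2

# Hypothetical training data, p = 2.
X1 = np.array([[4.0, 2.0], [5.0, 3.0], [6.0, 2.5], [5.5, 3.5]])
X2 = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 0.5], [2.5, 2.0]])

label_at_mean1 = fisher_classify(X1.mean(axis=0), X1, X2)
label_at_mean2 = fisher_classify(X2.mean(axis=0), X1, X2)
print(label_at_mean1, label_at_mean2)
```

Each sample mean is allocated to its own population, because $\hat{a}^t\bar{x}_1 - \hat{m} = \frac{1}{2}(\bar{x}_1-\bar{x}_2)^t S_{pooled}^{-1}(\bar{x}_1-\bar{x}_2) > 0$ whenever $S_{pooled}$ is positive definite and the means differ.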

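Intuition of this result:

[Figure: the real line R with the transformed mean $\bar{y}_2$ (population 2) on the left, the transformed mean $\bar{y}_1$ (population 1) on the right, and the midpoint $\hat{m}$ between them.]

If $\hat{y}_0$ is on the right-hand side of $\hat{m}$ (closer to $\bar{y}_1$), then allocate $x_0$ to population 1, and vice versa.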

Note: significant separation does not necessarily imply good classification. On the other hand, if the separation is not significant, the search for a useful classification rule will probably be fruitless.

(ii) Several populations (more than two populations):

1.  Separation:

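Suppose there are k populations,

$x_{11}, x_{12}, \ldots, x_{1n_1}$: population 1

$x_{21}, x_{22}, \ldots, x_{2n_2}$: population 2

$\vdots$

$x_{k1}, x_{k2}, \ldots, x_{kn_k}$: population k,

where each $x_{ji}$ is a $p \times 1$ vector.

Let $\bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ji}$ be the sample mean for population j, $j = 1, \ldots, k$, and $\bar{x} = \frac{1}{k}\sum_{j=1}^{k}\bar{x}_j$.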

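The sample between-groups matrix $B$ is

$$B = \sum_{j=1}^{k}(\bar{x}_j-\bar{x})(\bar{x}_j-\bar{x})^t.$$

Thus,

$$a^t B a = \sum_{j=1}^{k}(a^t\bar{x}_j - a^t\bar{x})^2 = \sum_{j=1}^{k}(\bar{y}_j-\bar{y})^2,$$

where $\bar{y}_j = a^t\bar{x}_j$ is the mean of the transformed data for the j’th population ($\bar{y}_1 = a^t\bar{x}_1$, for example) and $\bar{y} = a^t\bar{x}$.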

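The sample within-groups matrix W is

$$W = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(x_{ji}-\bar{x}_j)(x_{ji}-\bar{x}_j)^t = \sum_{j=1}^{k}(n_j-1)S_j,$$

where $S_j$ is the sample covariance matrix of population j.

Thus,

$$a^t W a = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(a^t x_{ji} - a^t\bar{x}_j)^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ji}-\bar{y}_j)^2.$$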

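Note:

$S_{pooled} = \frac{W}{n_1+n_2+\cdots+n_k-k}$: the pooled estimate of the covariance matrix based on $x_{ji}$.

$\frac{a^t W a}{n_1+n_2+\cdots+n_k-k} = a^t S_{pooled}\, a$: the pooled estimate of the variance based on $y_{ji} = a^t x_{ji}$.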
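The between-groups and within-groups matrices can be sketched in a few lines of NumPy. This sketch assumes the unweighted definition $B = \sum_j(\bar{x}_j-\bar{x})(\bar{x}_j-\bar{x})^t$ with $\bar{x}$ the average of the group means, and $W = \sum_j\sum_i(x_{ji}-\bar{x}_j)(x_{ji}-\bar{x}_j)^t$; some texts weight each term of $B$ by $n_j$ instead. The function name `between_within` and the simulated groups are assumptions made for illustration.

```python
import numpy as np

def between_within(groups):
    """Return (B, W, means) for a list of (n_j x p) arrays, one per population."""
    means = np.array([g.mean(axis=0) for g in groups])      # k x p group means
    grand = means.mean(axis=0)                               # average of group means
    dev = means - grand
    B = dev.T @ dev                                          # sum of outer products
    W = sum((g - m).T @ (g - m) for g, m in zip(groups, means))
    return B, W, means

# Simulated data: k = 3 populations in p = 2 dimensions (arbitrary choices).
rng = np.random.default_rng(1)
groups = [rng.normal(loc=mu, size=(n, 2))
          for mu, n in [([0.0, 0.0], 12), ([3.0, 1.0], 15), ([1.0, 4.0], 10)]]
B, W, means = between_within(groups)

n, k = sum(len(g) for g in groups), len(groups)
S_pooled = W / (n - k)                                       # pooled covariance estimate

# Check the quadratic-form identities for one arbitrary direction a.
a = np.array([1.0, -2.0])
ys = [g @ a for g in groups]
ybar = np.array([y.mean() for y in ys])
between_ss = ((ybar - ybar.mean()) ** 2).sum()
within_ss = sum(((y - y.mean()) ** 2).sum() for y in ys)
print(a @ B @ a, between_ss, a @ W @ a, within_ss)
```

The two printed pairs agree, which is the content of the identities for $a^t B a$ and $a^t W a$ under the assumed definitions.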

We now introduce Fisher’s linear discriminant method for several populations.

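Fisher’s discriminant method for several populations is as follows:
Find the vector $\hat{a}_1$ maximizing the separation function
$$S(a) = \frac{a^t B a}{a^t W a} = \frac{\sum_{j=1}^{k}(\bar{y}_j-\bar{y})^2}{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ji}-\bar{y}_j)^2},$$
subject to $\hat{a}_1^t S_{pooled}\,\hat{a}_1 = 1$. The linear combination $\hat{a}_1^t x$ is called the sample first discriminant.
Find the vector $\hat{a}_2$ maximizing the separation function $S(a)$ subject to
$\hat{a}_2^t S_{pooled}\,\hat{a}_2 = 1$ and $\hat{a}_2^t S_{pooled}\,\hat{a}_1 = 0$.

$\vdots$

Find the vector $\hat{a}_s$ maximizing the separation function $S(a)$ subject to
$\hat{a}_s^t S_{pooled}\,\hat{a}_s = 1$ and $\hat{a}_s^t S_{pooled}\,\hat{a}_l = 0$, $l < s$.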

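Note: $S_{pooled}$ is the estimate of the common population covariance matrix $\Sigma$, and

$\hat{a}_s^t S_{pooled}\,\hat{a}_l$ is the estimate of $\mathrm{Cov}(a_s^t x, a_l^t x) = a_s^t \Sigma\, a_l$.

The condition $\hat{a}_s^t S_{pooled}\,\hat{a}_l = 0$ is similar to the zero-covariance condition given in the principal component analysis.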

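Intuitively, $S(a)$ measures the difference among the transformed means, reflected by $\sum_{j=1}^{k}(\bar{y}_j-\bar{y})^2$, relative to the random variation of the transformed data, reflected by $\sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ji}-\bar{y}_j)^2$. If the transformed observations $y_{11}, \ldots, y_{1n_1}$; $y_{21}, \ldots, y_{2n_2}$; $\ldots$; $y_{k1}, \ldots, y_{kn_k}$ are well separated, $S(a)$ should be large even when the random variation of the transformed data is taken into account.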

Important result:

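Let $\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_s$ be the orthonormal eigenvectors of $S_{pooled}^{-1/2} B\, S_{pooled}^{-1/2}$ corresponding to the eigenvalues $\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_s > 0$. Then, $\hat{a}_j = S_{pooled}^{-1/2}\hat{e}_j$, $j = 1, \ldots, s$, where $s \le \min(k-1, p)$ is the number of nonzero eigenvalues.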
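One way to compute the sample discriminants numerically is through a symmetric eigen-decomposition. The sketch below assumes the formulation $\hat{a}_j = S_{pooled}^{-1/2}\hat{e}_j$, where $\hat{e}_j$ are orthonormal eigenvectors of $S_{pooled}^{-1/2} B\, S_{pooled}^{-1/2}$, with the unweighted $B$ and $S_{pooled} = W/(n-k)$. The function name `fisher_discriminants` and the simulated data are hypothetical.

```python
import numpy as np

def fisher_discriminants(groups):
    """Return (lam, A, B, S_pooled); the columns of A are the discriminant vectors."""
    k, p = len(groups), groups[0].shape[1]
    n = sum(len(g) for g in groups)
    means = np.array([g.mean(axis=0) for g in groups])
    dev = means - means.mean(axis=0)
    B = dev.T @ dev
    W = sum((g - m).T @ (g - m) for g, m in zip(groups, means))
    S_pooled = W / (n - k)

    # Symmetric inverse square root of S_pooled via its spectral decomposition.
    vals, vecs = np.linalg.eigh(S_pooled)
    S_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T

    # eigh returns eigenvalues in ascending order; reorder to descending.
    lam, E = np.linalg.eigh(S_inv_half @ B @ S_inv_half)
    order = np.argsort(lam)[::-1]
    s = min(k - 1, p)                       # at most this many nonzero eigenvalues
    lam, E = lam[order][:s], E[:, order][:, :s]
    A = S_inv_half @ E
    return lam, A, B, S_pooled

# Simulated data: k = 3 populations in p = 4 dimensions (arbitrary choices).
rng = np.random.default_rng(2)
groups = [rng.normal(loc=mu, size=(20, 4))
          for mu in ([0.0, 0.0, 0.0, 0.0], [3.0, 1.0, 0.0, 0.0], [0.0, 2.0, 2.0, 0.0])]
lam, A, B, S_pooled = fisher_discriminants(groups)
print(lam)
print(A.T @ S_pooled @ A)   # approximately the 2 x 2 identity
```

The second printed matrix illustrates the constraints $\hat{a}_j^t S_{pooled}\,\hat{a}_j = 1$ and $\hat{a}_j^t S_{pooled}\,\hat{a}_l = 0$ for $j \ne l$.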

2.  Classification:

Fisher’s classification method for several populations is as follows:
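For an observation $x_0$, Fisher’s classification procedure based on the first $r \le s$ sample discriminants is to allocate $x_0$ to the population l if
$$\sum_{j=1}^{r}(\hat{y}_j-\bar{y}_{lj})^2 = \sum_{j=1}^{r}[\hat{a}_j^t(x_0-\bar{x}_l)]^2 \le \sum_{j=1}^{r}[\hat{a}_j^t(x_0-\bar{x}_i)]^2 \quad \text{for all } i \ne l,$$
where $\hat{y}_j = \hat{a}_j^t x_0$ and $\bar{y}_{ij} = \hat{a}_j^t\bar{x}_i$.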

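Intuition of Fisher’s method:

[Figure: the transformed means of population 1, population 2, ..., population k and the transformed observation $x_0$ on the real line R.]

$\sum_{j=1}^{r}[\hat{a}_j^t(x_0-\bar{x}_1)]^2$: the “total” squared distance between the transformed $x_0$ ($\hat{a}_j^t x_0$) and the transformed mean of population 1 ($\hat{a}_j^t\bar{x}_1$).

$\sum_{j=1}^{r}[\hat{a}_j^t(x_0-\bar{x}_2)]^2$: the “total” squared distance between the transformed $x_0$ ($\hat{a}_j^t x_0$) and the transformed mean of population 2 ($\hat{a}_j^t\bar{x}_2$).

$\vdots$

$\sum_{j=1}^{r}[\hat{a}_j^t(x_0-\bar{x}_k)]^2$: the “total” squared distance between the transformed $x_0$ ($\hat{a}_j^t x_0$) and the transformed mean of population k ($\hat{a}_j^t\bar{x}_k$).

The inequality $\sum_{j=1}^{r}[\hat{a}_j^t(x_0-\bar{x}_l)]^2 \le \sum_{j=1}^{r}[\hat{a}_j^t(x_0-\bar{x}_i)]^2$ for all $i \ne l$ implies that the total distance between the transformed $x_0$ and the transformed mean of population l is smaller than the distance between the transformed $x_0$ and the transformed mean of any other population. In some sense, $x_0$ is “closer” to population l than to the other populations. Therefore, $x_0$ is allocated to population l.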
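The allocation rule can be sketched as a nearest-transformed-mean search. In the snippet below, the matrix `A` holds the discriminant vectors $\hat{a}_1, \ldots, \hat{a}_r$ as columns and is assumed to have been computed beforehand; here it is simply set to the identity, which is a made-up choice for illustration, as are the group means, the observation `x0`, and the function name `allocate`.

```python
import numpy as np

def allocate(x0, A, means, r=None):
    """Allocate x0 using the first r discriminants (columns of A).

    Returns the 1-based population label and the squared distances
    sum_j [a_j^t (x0 - xbar_i)]^2 for every population i.
    """
    A = np.asarray(A, dtype=float)
    r = A.shape[1] if r is None else r
    # Row i holds the r discriminant scores of (xbar_i - x0).
    scores = (np.asarray(means, dtype=float) - np.asarray(x0, dtype=float)) @ A[:, :r]
    dist2 = (scores ** 2).sum(axis=1)
    return int(np.argmin(dist2)) + 1, dist2

# Hypothetical inputs: three group means in p = 2 dimensions.
means = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
A = np.eye(2)                 # stand-in for previously computed discriminant vectors
x0 = np.array([3.5, 0.5])

label, dist2 = allocate(x0, A, means)
print(label, dist2)           # population 2; distances 12.5, 0.5, 24.5
```

Passing `r=1` uses only the first discriminant; with these made-up inputs the observation is still allocated to population 2.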
