TPR3411 Pattern Recognition

Tutorial 6

Part 1 - Theory

1. What is the difference between parametric and non-parametric approaches in pattern recognition?

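The parametric approach assumes that the form of the underlying density is known (for example, Gaussian) and estimates only its parameters from the training data. The common parametric forms are uni-modal, whereas practical problems usually involve multi-modal densities. The non-parametric approach, on the other hand, can be used with arbitrary distributions and makes no assumption about the form of the underlying densities.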

2. Briefly explain Parzen windows in classification.

The Parzen-window method is a technique for non-parametric density estimation that can also be used for classification. Using a given kernel function, it approximates the distribution of the training set as a linear combination of kernels centered on the observed points. For classification, the density of each class is estimated in this way and the new sample is assigned to the class with the largest estimated posterior.
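The idea can be sketched in a few lines of MATLAB. This is only an illustration under stated assumptions: the one-dimensional samples in classA and classB, the window width h, and the Gaussian kernel are all made-up choices, not part of the tutorial data.

```matlab
% Parzen-window classification in 1-D with a Gaussian kernel.
% All values below are hypothetical and chosen only for illustration.
classA = [1.0 1.5 2.0];          % hypothetical training samples, class A
classB = [4.0 4.5 5.0 5.5];      % hypothetical training samples, class B
h = 0.5;                         % hypothetical window width

% Gaussian kernel with unit variance
kernel = @(u) exp(-0.5 * u.^2) / sqrt(2 * pi);

% Density estimate: p(x) = (1/(n*h)) * sum_i K((x - s_i)/h)
density = @(x, s) mean(kernel((x - s) / h)) / h;

x = 2.4;                         % new sample to classify
n = numel(classA) + numel(classB);

% Class score = estimated class-conditional density times class prior
scoreA = density(x, classA) * numel(classA) / n;
scoreB = density(x, classB) * numel(classB) / n;

if scoreA > scoreB
    label = 'A';
else
    label = 'B';
end
fprintf('scoreA = %.4f, scoreB = %.4f, label = %s\n', scoreA, scoreB, label);
```

A smaller h gives a spikier estimate that follows individual samples, while a larger h gives a smoother one, so in practice the window width has to be tuned.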

3. What is the goal of the k-nearest neighbor method in classification?

The goal of the k-nearest neighbor method is to classify a new sample by a voting scheme: the sample is assigned the label that is most frequently represented among its k nearest training samples.

4. Suppose you are given the task of classifying whether a given cell sample is malignant or benign. You have the following set of training samples for this task.

Feature 1 / Feature 2 / Classification
2 / 4 / Malignant
3 / 8 / Benign
5 / 9 / Benign
3 / 7 / Malignant
7 / 10 / Benign
5 / 4 / Malignant
6 / 8 / Benign

Given a new sample, x = (4, 7), use the k-nearest neighbor method with k = 3 to classify this sample.

Feature 1 / Feature 2 / Squared Euclidean distance / Rank (nearest first) / Included in 3-NN? / Category of NN
2 / 4 / 13 / 6 / No / -
3 / 8 / 2 / 2 / Yes / Benign
5 / 9 / 5 / 3 / Yes / Benign
3 / 7 / 1 / 1 / Yes / Malignant
7 / 10 / 18 / 7 / No / -
5 / 4 / 10 / 5 / No / -
6 / 8 / 5 / 4 / No / -

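Two of the three nearest neighbors are Benign and one is Malignant, so we classify the new sample as Benign. The samples (5, 9) and (6, 8) tie for third place, but both are Benign, so the result does not depend on how the tie is broken.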
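The distance and rank columns of the table can be checked with a short MATLAB sketch. The variable names here are chosen for illustration, and the new sample is taken to be (4, 7), the value used in Part 2.

```matlab
% Training samples (Feature 1, Feature 2), in the order of the table above
X = [2 4; 3 8; 5 9; 3 7; 7 10; 5 4; 6 8];
x = [4 7];                       % new sample

% Squared Euclidean distance from x to every training sample
diffs = X - repmat(x, size(X, 1), 1);
d2 = sum(diffs.^2, 2);

% Sort ascending; idx lists the table rows from nearest to farthest.
% sort keeps the original order of equal elements, so row 3 comes
% before row 7 in the tie at squared distance 5.
[sorted_d2, idx] = sort(d2);
nearest3 = idx(1:3);

% Show the three nearest samples together with their squared distances
disp([X(nearest3, :) sorted_d2(1:3)]);
```

Squared distances are enough here because taking the square root does not change the ordering of the neighbors.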

Part 2 - Practical

Objective: You are going to convert the malignant/benign classification problem above into a MATLAB program.

1. Create matrices for the two classes:

Class1 = [2 4; 3 7; 5 4]

Class2 = [3 8; 5 9; 7 10; 6 8]

2. Plot the data in each class. Your graph should look as follows.

x1_1 = Class1(:,1);

x1_2 = Class1(:,2);

plot(x1_1,x1_2,'*','Color','blue')

hold on

x2_1 = Class2(:,1);

x2_2 = Class2(:,2);

plot(x2_1,x2_2,'o','Color','red')

3. Create a variable for the new sample, x = (4, 7). Plot the new sample in the graph you created just now. Your graph should look as follows:

x = [4 7]

plot(x(1),x(2),'x','Color','black','MarkerSize',15)

4. Call the k-NN function with k = 3. What is the returned result?

kNN(Class1,Class2,x,3)
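The body of the kNN function is not shown in this tutorial. The following is a minimal sketch of what such a function might compute, written as a plain script. The label encoding (1 for Class1, 2 for Class2) and the use of mode for the vote are assumptions made for illustration, and the format of the real function's return value may differ.

```matlab
Class1 = [2 4; 3 7; 5 4];            % Malignant samples
Class2 = [3 8; 5 9; 7 10; 6 8];      % Benign samples
x = [4 7];                           % new sample
k = 3;

% Stack both classes and keep a label for every row
% (assumed encoding: 1 = Class1, 2 = Class2)
data   = [Class1; Class2];
labels = [ones(size(Class1, 1), 1); 2 * ones(size(Class2, 1), 1)];

% Squared Euclidean distance from x to every training sample
diffs = data - repmat(x, size(data, 1), 1);
d2 = sum(diffs.^2, 2);

% Take the labels of the k nearest samples and vote
[~, idx] = sort(d2);
votes = labels(idx(1:k));
predicted = mode(votes);             % on a tied vote, mode returns the smaller label

fprintf('Predicted class: %d\n', predicted);
```

With an odd k and two classes the vote itself cannot tie, which is one reason odd values of k are convenient for two-class problems.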

Note: How to choose k?

· “Rule of thumb”: choose k ≈ √n, where n is the number of samples.

· For efficiency, choose k = 1
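To see how the choice of k affects the outcome on the seven training samples above, the vote can be repeated for several values of k. This is an illustrative sketch only: the candidate values in ks and the label encoding (1 = Malignant, 2 = Benign) are assumptions, not part of the tutorial.

```matlab
data   = [2 4; 3 7; 5 4; 3 8; 5 9; 7 10; 6 8];
labels = [1; 1; 1; 2; 2; 2; 2];      % assumed encoding: 1 = Malignant, 2 = Benign
x = [4 7];                           % new sample

% Rank the training samples once by squared Euclidean distance
diffs = data - repmat(x, size(data, 1), 1);
[~, idx] = sort(sum(diffs.^2, 2));

ks = [1 3 5 7];                      % hypothetical candidate values of k
results = zeros(size(ks));
for i = 1:numel(ks)
    % Majority label among the ks(i) nearest samples
    results(i) = mode(labels(idx(1:ks(i))));
end

disp([ks; results]);                 % first row: k, second row: predicted label
```

On this data, k = 1 follows the single closest sample, which is Malignant, while the larger values of k all vote Benign. This shows how a very small k makes the decision depend on one training point.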