Support Vector Machines in Face Detection Systems

Mihaela ROMANCA

Group 245

Abstract:

Automatic facial feature localization has been a long-standing challenge in computer vision. This can be explained by the large variation a face in an image can exhibit due to factors such as position, facial expression, pose, illumination, and background clutter. While the problem of face detection was first studied using geometry-based measurements, learning-based algorithms have become increasingly popular in the image processing domain. SVM approaches to facial feature detection typically involve extracting features from images and learning the SVM parameters. Different algorithms have been proposed in the existing literature, and in this paper three approaches are presented: the first is based on a multiclass SVM algorithm that analyzes the image using multiple two-class SVMs. The second addresses an edge case of face detection in which the face is not entirely visible. The last describes a system for detecting the face using the SVM algorithm combined with another popular technique in image processing: PCA.

1 Introduction

Face recognition technology can be used in a wide range of applications such as identity authentication, access control, and surveillance and security. Interest and research activity in face recognition have increased significantly over the past few years, and SVM is a widely used technique for approaching the face detection problem.

However, building a face detection system has been a challenge irrespective of the method used, due to the various changes a face image can undergo. Variation occurs not only across different face identities but also when the same face changes position or when the lighting conditions in the analyzed images are not constant. Because of all this, face detection systems to date have had to compromise between two main features: performance and robustness.

For face detection two issues are central. The first is what features to use to represent a face: a face image is subject to changes in viewpoint, illumination, and expression, and an effective representation should be able to deal with such changes. The second is how to classify a new face image using the chosen representation. In geometric feature-based methods, facial features such as eyes, nose, mouth, and chin are detected, and properties of and relations between these features, such as areas, distances, and angles, are used as the descriptors of faces. Although economical, efficient in achieving data reduction, and insensitive to variations in illumination and viewpoint, these methods are highly dependent on the extraction and measurement of the detected features, so the systems need to be continuously recalibrated for new faces or for varying distances between the subject and the camera.

In contrast, template matching and neural methods generally operate directly on an image-based representation of faces: the pixel intensity array. Because the detection and measurement of geometric facial features are not required, these methods are more practical and easier to implement than geometric feature-based methods.

2 General Information

The main idea of the SVM algorithm [2] is that, given a set of points each belonging to one of two classes, an optimal way is needed to separate the two classes by a hyperplane, as seen in the figure below. This is done by:

  • maximizing the distance (from closest points) of either class to the separating hyperplane
  • minimizing the risk of misclassifying the training samples and the unseen test samples.

Optimal Separating Hyperplane
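For concreteness, the following is a minimal sketch of training such a maximum-margin classifier, assuming Python with the scikit-learn library and synthetic toy data (not part of the reviewed papers):

    import numpy as np
    from sklearn.svm import SVC

    # Synthetic, linearly separable toy data: two 2-D point clouds.
    rng = np.random.default_rng(0)
    class_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))
    class_neg = rng.normal(loc=[-2.0, -2.0], scale=0.5, size=(20, 2))
    X = np.vstack([class_pos, class_neg])
    y = np.array([+1] * 20 + [-1] * 20)

    # Linear SVM: maximizes the margin between the two classes.
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(X, y)

    w = clf.coef_[0]         # normal vector of the separating hyperplane
    b = clf.intercept_[0]    # offset; the hyperplane is w . x + b = 0
    print("w =", w, "b =", b)
    print("margin width =", 2.0 / np.linalg.norm(w))
    print("support vectors:", clf.support_vectors_)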

Depending on the way the given points can be separated into the two classes, SVMs can be:

  • Linear SVM
  • Non-Linear SVM

Linearly and non-linearly separable data
Linear SVMs

Let $S = \{x_1, \dots, x_N\}$ be a set of points with $x_i \in \mathbb{R}^d$. Each point $x_i$ belongs to either of two classes, with label $y_i \in \{-1, +1\}$. The set $S$ is linearly separable if there exist $w \in \mathbb{R}^d$ and $b \in \mathbb{R}$ such that

$$y_i (w \cdot x_i + b) \ge 1, \qquad i = 1, \dots, N.$$

The pair $(w, b)$ defines the hyperplane of equation $w \cdot x + b = 0$, named the separating hyperplane.

The signed distance $d_i$ of a point $x_i$ to the separating hyperplane $(w, b)$ is given by:

$$d_i = \frac{w \cdot x_i + b}{\|w\|}.$$

Combining the two relations above, it follows that:

$$y_i d_i \ge \frac{1}{\|w\|},$$

therefore $1/\|w\|$ is a lower bound on the distance between the points of $S$ and the separating hyperplane $(w, b)$.

Given a linearly separable set $S$, the optimal separating hyperplane is the separating hyperplane for which the distance to the closest (either positive or negative) points in $S$ is maximum; it therefore maximizes $1/\|w\|$.

Optimal separating hyperplane
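Since maximizing $1/\|w\|$ is equivalent to minimizing $\|w\|^2$, the optimal separating hyperplane can be stated as the solution of a quadratic program (the usual formulation from the SVM literature, e.g. [2]):

$$\min_{w,\, b} \ \frac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \dots, N.$$

The training points for which the constraint holds with equality lie on the margin; these are the support vectors, and they alone determine the optimal hyperplane.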
Non-Linear SVMs

The only way the data points appear in the dual form of the training problem is in the form of dot products $x_i \cdot x_j$. Even if the points are not linearly separable in the given space, it is very likely that a linear separator can be constructed in a higher-dimensional space.

So the solution is to map the data points from the input space $\mathbb{R}^d$ into some space $\mathbb{R}^n$ of higher dimension ($n > d$) using a function $\Phi : \mathbb{R}^d \to \mathbb{R}^n$. Then the training algorithm will depend only on dot products of the form $\Phi(x_i) \cdot \Phi(x_j)$.

Constructing (via $\Phi$) a separating hyperplane with maximum margin in the higher-dimensional space yields a nonlinear decision boundary in the input space.

Because the dot product $\Phi(x_i) \cdot \Phi(x_j)$ in the higher-dimensional space is computationally expensive, kernel functions are used. A kernel function $K$ such that

$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$$

is used in the training algorithm.

All the previous derivations in the linear SVM model remain valid when the dot product is replaced by the kernel function, since a linear separation is still performed, only in a different space.

The classes of kernel functions used in SVMs are:

  • Polynomial: $K(x, y) = (x \cdot y + 1)^p$
  • RBF (Radial Basis Function): $K(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$
  • Sigmoid: $K(x, y) = \tanh(\kappa \, x \cdot y - \delta)$

The kernel functions require calculations in $\mathbb{R}^d$ only, not in the higher-dimensional space $\mathbb{R}^n$, therefore they are not difficult to compute. It remains to determine which kernel function can be associated with a given mapping $\Phi$ into the redescription space.
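As a quick numerical check (a sketch assuming Python with NumPy, not from the reviewed papers), the degree-2 polynomial kernel on $\mathbb{R}^2$ can be verified against its explicit feature map $\Phi(x) = (1, \sqrt{2}x_1, \sqrt{2}x_2, x_1^2, \sqrt{2}x_1x_2, x_2^2)$, the standard textbook expansion:

    import numpy as np

    def phi(x):
        # Explicit degree-2 polynomial feature map for x in R^2,
        # chosen so that phi(x) . phi(y) == (x . y + 1)^2.
        x1, x2 = x
        return np.array([1.0,
                         np.sqrt(2) * x1, np.sqrt(2) * x2,
                         x1**2, np.sqrt(2) * x1 * x2, x2**2])

    def poly_kernel(x, y, p=2):
        # Same quantity, computed directly in the input space R^2.
        return (np.dot(x, y) + 1.0) ** p

    x = np.array([1.0, 2.0])
    y = np.array([3.0, -1.0])
    print(np.dot(phi(x), phi(y)))  # 4.0
    print(poly_kernel(x, y))       # 4.0 -- identical, without ever forming R^6

The kernel therefore computes the dot product in the six-dimensional redescription space while only ever touching the two input coordinates.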

Decision surface by a polynomial classifier

However, in practice one proceeds the other way around: kernel functions that are known to correspond to the dot product in some space (which acts as the redescription space and is never made explicit) are tried out. The user therefore operates by trial and error. An advantage is that the only design choice needed when training an SVM is the kernel function K.

Decision surface by an RBF classifier
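The following sketch (again scikit-learn on synthetic data, not taken from the reviewed papers) illustrates the point: an RBF-kernel SVM separates data that no hyperplane in the input space can separate:

    import numpy as np
    from sklearn.svm import SVC

    # XOR-like data: not linearly separable in R^2.
    rng = np.random.default_rng(1)
    X = rng.uniform(-1.0, 1.0, size=(200, 2))
    y = np.sign(X[:, 0] * X[:, 1])   # label given by the quadrant

    linear = SVC(kernel="linear").fit(X, y)
    rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y)

    # The linear SVM stays near chance level; the RBF kernel, i.e. a
    # linear separation in the implicit feature space, fits the boundary.
    print("linear accuracy:", linear.score(X, y))
    print("rbf accuracy:   ", rbf.score(X, y))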

3 SVM in Face Detection

SVM with a binary tree classification strategy

The first analyzed approach proposes a face recognition system using linear support vector machines with a binary tree classification strategy. The results of this technique are very good, as the authors conclude: “The experimental results show that the SVMs are a better learning algorithm than the nearest center approach for face recognition.” [1]

The General Information section above describes the basic theory of SVMs for two-class classification. A multi-class pattern recognition system can be obtained by combining two-class SVMs, and there are usually two schemes for this purpose. One is the one-against-all strategy, which classifies between each class and all the remaining ones; the other is the one-against-one strategy, which classifies between each pair of classes. While the former often leads to ambiguous classification, the latter was used for the presented face recognition system.

A bottom-up binary tree for classification is constructed as follows: supposing there are eight classes in the data set, the decision tree is shown in the figure below, where the numbers 1-8 encode the classes. By comparing each pair, one class number is chosen as the “winner” of the current two classes. The selected classes (from the lowest level of the binary tree) move up to the next level for another round of tests. Finally, a unique class appears at the top of the tree.

The bottom-up binary tree used for classification

Denote the number of classes as $c$. The SVMs learn all $c(c-1)/2$ pairwise discrimination functions in the training stage, and carry out $c-1$ comparisons under the fixed binary tree structure in the test stage. If $c$ is not a power of 2, we can decompose it as $c = 2^{n_1} + 2^{n_2} + \dots + 2^{n_s}$ with $n_1 \ge n_2 \ge \dots \ge n_s \ge 0$, because any natural number (even or odd) can be decomposed into a finite sum of powers of 2. If $c$ is odd, $n_s = 0$, and if $c$ is even, $n_s \ge 1$. It can be noticed that the decomposition is not unique, but the number of comparisons in the test stage is always $c - 1$.
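A sketch of the test-stage tournament (hypothetical Python; the helper pairwise_predict(a, b, x), standing in for the trained two-class SVM between classes a and b, is an assumed interface):

    def tournament_classify(classes, x, pairwise_predict):
        # Bottom-up binary-tree classification: each comparison
        # eliminates exactly one class, so c candidate classes
        # always need c - 1 comparisons in total.
        candidates = list(classes)
        while len(candidates) > 1:
            next_round = []
            for i in range(0, len(candidates) - 1, 2):
                # pairwise_predict returns the winning label of the
                # two-class SVM trained on this pair, applied to x.
                next_round.append(pairwise_predict(candidates[i],
                                                   candidates[i + 1], x))
            if len(candidates) % 2 == 1:
                next_round.append(candidates[-1])  # odd class gets a bye
            candidates = next_round
        return candidates[0]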

Face Detection and Recognition with Occlusions

The next approach analyzes an edge case, in which the face is not entirely visible in the analyzed image. The goal is therefore to derive a criterion for SVM that can be employed in the three cases defined in the figure below: not occluded, mixed, and occluded. The classical SVM criteria cannot be applied in any of the three cases, because SVM assumes all the features are visible. So a new algorithm is implemented, named by the authors Partial Support Vector Machines (PSVM) to distinguish it from the standard criteria used in SVM. [3]

Occlusion cases taken into account

The goal of PSVM is similar to that of the standard SVM: to look for a hyperplane that separates the samples of any two classes as well as possible. In contrast to traditional SVM, in PSVM the separating hyperplane is also constrained by the incomplete data. In the proposed PSVM, the set of all possible values for the missing entries of an incomplete training sample is treated as an affine subspace of the feature space, and a criterion is designed which minimizes the probability of overlap between this affine subspace and the separating hyperplane. To model this, the angle between the affine subspace and the hyperplane is incorporated into the formulation. The resulting objective function is shown to have a globally optimal solution under mild conditions, which require that the convex region defined by the derived criterion is close to the origin. Experimental results demonstrate that the proposed PSVM approach provides superior classification performance compared to approaches defined in the literature.

PCA and SVM for Face Detection

In this paper [5], a novel method is proposed for eliminating most of the non-face areas in gray images, so that the detection time is shortened and the detection accuracy is improved. Face areas have different pixel characteristics than most non-face areas: analysis shows that face and non-face areas have different histogram distributions. The histogram of a face area has a Gaussian-like distribution, while a non-face area's histogram has an irregular distribution. According to this histogram distribution feature, the potential face areas can be chosen.

If the histogram is concentrated in a small range, its mean value is high, and if the histogram is spread over a wide range, its mean value is small. The histogram of a face image has a Gaussian-like distribution, so its mean value is intermediate. Based on a number of tests, a fixed range for the histogram mean value of potential face areas is chosen. If the mean value of a sample area falls in that range, the sample area is selected as a potential face area; otherwise, it is filtered out as a non-face area.
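One plausible reading of this pre-filtering criterion, as a rough sketch (hypothetical Python/NumPy; the bounds LOW and HIGH are assumed placeholders, since the paper fixes the range empirically):

    import numpy as np

    LOW, HIGH = 50.0, 200.0   # assumed placeholder bounds; [5] sets them by testing

    def is_face_candidate(window):
        # Histogram of a grayscale window (pixel values 0..255).
        hist, _ = np.histogram(window, bins=256, range=(0, 256))
        occupied = hist[hist > 0]      # bins the window actually uses
        mean_value = occupied.mean()   # high if the histogram is concentrated
        return LOW <= mean_value <= HIGH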

Furthermore, for face detection, an algorithm combining PCA and SVM is used; it is a coarse-to-fine process for detecting the face region.

The processing consists of three steps:

  • Step 1: potential face areas are selected using the histogram distribution feature. Face and non-face areas have different histogram distributions: the histogram of a face area has a Gaussian-like distribution, while a non-face area's histogram has an irregular distribution.
  • Step 2: PCA is used to decrease the dimension of the face feature space. At this step, 1000 sample images of size 19×19 are used for training.
  • Step 3: an SVM is used as the classifier to verify face candidates. It is trained on face and non-face samples represented by their PCA projections (a sketch of steps 2 and 3 follows this list).
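A minimal sketch of steps 2 and 3 (scikit-learn, not the authors' implementation; the number of principal components and the synthetic training data are assumptions):

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC

    # Stand-ins for the paper's training set: 1000 flattened 19x19
    # windows with labels +1 (face) / -1 (non-face).
    rng = np.random.default_rng(2)
    X_train = rng.random((1000, 19 * 19))
    y_train = rng.choice([-1, 1], size=1000)

    # PCA compresses each window to a short feature vector, then the
    # SVM verifies the candidate: the coarse-to-fine verification step.
    detector = make_pipeline(PCA(n_components=30), SVC(kernel="rbf"))
    detector.fit(X_train, y_train)

    window = rng.random((1, 19 * 19))  # one candidate window
    print("face" if detector.predict(window)[0] == 1 else "non-face")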

4 Conclusions

Although SVM has the disadvantage that the type of kernel must be chosen, and it can be difficult to justify quantitatively the choice of a specific kernel for a specific application, its advantages have made it a preferred method in face detection applications. Real-time face detection applications based on SVM have been built, in which the image is pre-processed to emphasize the features and no special illumination or headgear is needed. According to SVM's structural risk minimization criterion, the errors between sample data and model data are minimized and, simultaneously, the upper bound on the model's prediction error is decreased. Simulation results for this method show higher processing speed, better correct recognition rate, and improved generalization.

5 References

[1] Guodong Guo, Stan Z. Li, Kapluk Chan, “Face Recognition by Support Vector Machines”, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798.

[2] Ming-Hsuan Yang, Antoine Cornuejols, “Introduction to Support Vector Machines”.

[3] Hongjun Jia, Aleix M. Martinez, “Support Vector Machines in Face Recognition with Occlusions”, Department of Electrical and Computer Engineering, The Ohio State University, Columbus, OH 43210, USA.

[5] Jing Zhang, Xue-dong Zhang, Seok-wun Ha, “A Novel Approach Using PCA and SVM for Face Detection”, Fourth International Conference on Natural Computation, IEEE, 2008.