FACIAL IMAGE RECOGNITION

Shubhangi D. Sapkal*, Madhuri S. Joshi*, Y.V.Joshi**

*Comp. Sci. & Engg. Dept., Govt. Engg. College, Aurangabad; **Elect. & Telecom Dept., SGGS College of Engg., Nanded

1. INTRODUCTION:

Biometrics refers to the automatic identification of a person based on his/her physiological or behavioral characteristics. This method of identification is preferred over traditional methods involving passwords and PIN numbers for various reasons:

  1. The person to be identified is required to be physically present at the point of identification.
  2. Identification based on biometric techniques obviates the need to remember a password or carry a token.

With the increased use of computers, it is necessary to restrict access to sensitive/personal data. By replacing PINs, biometric techniques can potentially prevent unauthorized access to or fraudulent use of ATMs, cellular phones, smart cards, desktop PCs, workstations, and computer networks. PINs and passwords may be forgotten, and token-based methods of identification such as passports and driver's licenses may be forged, stolen or lost. Various types of biometric systems are in use, such as speech recognition, face recognition, fingerprint matching, iris and retinal scanning, and hand geometry.

The requirement for reliable personal identification in computerized access control has resulted in increased interest in biometrics. The biometrics under investigation include fingerprints, speech, signature dynamics and face recognition. Sales of identity verification products exceed 100 million. Face recognition has the benefit of being a passive, non-intrusive way of verifying personal identity. The techniques used in the best face recognition systems may depend on the application. There are at least two broad categories of face recognition systems:

1. The goal is to find a person within a large database of faces (e.g. in a police database). These systems typically return a list of the most likely people in the database. Often only one image is available per person. It is usually not necessary for recognition to be done in real-time.

2. The goal is to identify particular people in real-time (e.g. in a security monitoring system, location tracking system, etc.), or to allow access to a group of people and deny access to all others (e.g. access to a building, computer, etc.). Multiple images per person are often available for training and real-time recognition is required.

We are concerned with the second case. Invariance to rotation and scaling is considered here.

Object recognition is an important aspect of visual perception. We often recognize an object visually on the basis of its characteristic shape. One approach to object recognition is based on decomposing objects into their constituent parts, which are derived from the object's variant and invariant characteristics. In statistical pattern recognition, objects are described quantitatively by elementary numerical descriptors, i.e., features.

2. The proposed approach:

The proposed approach is organized into the following three steps:

  1. Pre-process the input image.
  2. Label the facial features.
  3. Identify using a neural network.

2.1 Pre-processing approach:

Any object recognition technique must include some form of pre-processing that compensates for variations in blur, lighting, scaling and rotation [1]. The problem of face recognition in real-world environments involves locating the face, normalizing the located faces, scaling, and matching against the faces in the database. With the implementation of various normalization stages, the face recognition system can perform recognition on images where the faces are subject to different blur, lighting, scaling and rotation [3].

2.2 Blur normalization:

Face images captured by a digital camera may be blurred by camera motion. To deblur an image, a blind deconvolution algorithm is used. The algorithm restores both the image and the point spread function (PSF); an initial PSF corresponding to linear motion across 10 pixels at an angle of 20 degrees is taken. The initial guess at the size of the PSF is more important to the ultimate success of the restoration than the values within it.
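
As a minimal sketch of this step (using non-blind Richardson-Lucy deconvolution from scikit-image rather than the blind algorithm described above, and assuming the 10-pixel, 20-degree motion PSF as given), the restoration might look like this:

    import numpy as np
    from scipy.ndimage import rotate
    from scipy.signal import convolve2d
    from skimage import data, restoration

    def motion_psf(length=10, angle=20.0):
        """Point spread function for linear motion: a horizontal line of
        `length` pixels rotated by `angle` degrees, normalized to sum to 1."""
        psf = np.zeros((length, length))
        psf[length // 2, :] = 1.0
        psf = rotate(psf, angle, reshape=False, order=1)
        return psf / psf.sum()

    # Simulate motion blur on a test image, then restore it.
    image = data.camera() / 255.0
    psf = motion_psf(10, 20.0)
    blurred = convolve2d(image, psf, mode="same", boundary="symm")
    restored = restoration.richardson_lucy(blurred, psf)

A blind variant (as in MATLAB-style deconvblind) would additionally refine the PSF itself during the iterations, starting from this initial guess.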

2.3 Light normalization:

In real-world environments faces are often subject to varying lighting conditions. If these variations are not compensated for by the face recognition system, significant performance drops occur. Lighting variations can be broadly classified into two categories: global intensity changes, and localized gradients caused by shadows, directional lighting and specular reflections. Global lighting variation is handled using histogram equalization, a simple and effective way of modifying image contrast. Histogram equalization removes global lighting variations from face images, and the lighting-normalized images improve the recognition efficiency of the system.
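
A minimal numpy implementation of histogram equalization for an 8-bit grayscale face image (equivalent in effect to library routines such as OpenCV's equalizeHist, and assuming the image is not constant) is sketched below:

    import numpy as np

    def equalize_histogram(img):
        """Histogram-equalize an 8-bit grayscale image so its cumulative
        gray-level distribution becomes approximately uniform."""
        hist = np.bincount(img.ravel(), minlength=256)
        cdf = hist.cumsum()
        cdf_min = cdf[cdf > 0][0]               # first occupied gray level
        # Build a lookup table mapping old gray levels to equalized ones.
        lut = np.round((cdf - cdf_min) * 255.0 / (cdf[-1] - cdf_min))
        return lut.astype(np.uint8)[img]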

2.4 Scaling normalization:

There are a number of techniques one might use to enlarge or reduce an image, which generally trade off speed against the degree to which they reduce visual artifacts. The simplest method of enlarging an image by a factor of 2, say, is to replicate each pixel 4 times. A standardized scale is essential for face recognition: the face in the input image must match the size of the faces stored in the database, so all faces need to be normalized to a standard size that conforms to the database. The standard approach, bilinear interpolation, is applied. Bilinear interpolation linearly interpolates along each row of the image and then uses that result in a linear interpolation down each column. With this method, each estimated pixel in the output image is a weighted combination of its four nearest neighbors in the input image. Bilinear interpolation yields well scale-normalized faces [8]. In the scaling normalization module all face images are normalized to a standard size, which is necessary for the correct operation of the recognition module.
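
The following numpy sketch makes the description concrete: each output pixel is computed as a weighted combination of its four nearest neighbors in the input image (library calls such as cv2.resize with bilinear interpolation do the same thing):

    import numpy as np

    def bilinear_resize(img, out_h, out_w):
        """Resize a 2-D grayscale image with bilinear interpolation."""
        in_h, in_w = img.shape
        # Coordinates of each output pixel in input space.
        ys = np.linspace(0, in_h - 1, out_h)
        xs = np.linspace(0, in_w - 1, out_w)
        y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
        x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
        wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
        img = img.astype(float)
        # Interpolate along rows first, then down the columns.
        top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
        bot = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
        return top * (1 - wy) + bot * wy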

2.5 Rotation normalization:

Let us assume that we have some means of accurately estimating the location of facial key points such as the eyes, nose or mouth. If this were possible, the locations of the facial key points would provide enough information to adjust for in-plane rotation. The task is to design and implement eye detection based on Sobel edge detection and the Hough transform for rotation normalization.

To use the Hough transform for detecting eyes in a face image, the original image must first be processed with a gradient operator. In short, the Sobel operators highlight the edges in an input image. The Sobel operator is by far the most extensively used gradient operator and its derivatives are widely recognized. Compared to a Laplacian edge operator, the Sobel operator has distinct advantages. It is less sensitive to isolated high-intensity point variations, since local averaging over sets of three pixels tends to reduce them. Secondly, it gives an estimate of edge direction as well as edge magnitude at a point, which is more informative and of considerable use in later processing. The end result should be a binary image that contains highlighted edges around the shape we wish to detect.
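
A sketch of this pipeline using OpenCV is given below; the Sobel, Hough and rotation steps follow the text, while all thresholds and radius limits are hypothetical values that would need tuning for a real face database:

    import cv2
    import numpy as np

    def rotation_normalize(gray):
        """Rotate a grayscale face so the line joining the detected eyes
        is horizontal. Thresholds and radius limits are hypothetical."""
        # Sobel gradient magnitude highlights the edges, as described above.
        gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
        gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
        mag = cv2.convertScaleAbs(np.hypot(gx, gy))
        # Hough transform for circles: candidate pupils are small circles.
        circles = cv2.HoughCircles(mag, cv2.HOUGH_GRADIENT, dp=1, minDist=30,
                                   param1=100, param2=20,
                                   minRadius=3, maxRadius=15)
        if circles is None or circles.shape[1] < 2:
            return gray                    # eyes not found; leave image alone
        (x1, y1, _), (x2, y2, _) = circles[0][:2]   # two strongest circles
        angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))
        h, w = gray.shape
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle, 1.0)
        return cv2.warpAffine(gray, M, (w, h))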

Fig. 2.1 Hough transform for circles

Fig. 2.2 (a) Angle difference; (b) Rotation-normalized image

3. Facial feature extraction [6]:

We use deformable templates, proposed by Yuille et al., to extract the contours of the eyes and mouth. A deformable template consists of a parameterized template of simple geometric primitives and an energy function. The energy function is defined according to a priori knowledge about the expected shape of the features and is used to guide the contour deformation process [10]. Some researchers have previously applied deformable templates to extract eye and mouth boundaries; building on that work, we use parabolas and circles as the geometric primitives for the eyes and mouth and further develop the energy function. The deformable templates are shown in Figure 3.1. We also use an active contour (snake) to extract the face and eyebrow contours. A snake is an energy-minimizing spline guided by internal and external constraint forces that pull it toward specific features. With the help of these two models, we extract fine eye, mouth, eyebrow and face contours at a reasonable computational cost.
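
For the snake part, a minimal sketch using scikit-image's active_contour is shown below; the initial circle, the test image and the energy weights (taken from typical scikit-image examples) are placeholders, not values from this work:

    import numpy as np
    from skimage import data, filters
    from skimage.segmentation import active_contour

    # Placeholder initialization: a circle assumed to surround the face
    # region (center and radius are illustrative).
    s = np.linspace(0, 2 * np.pi, 200)
    init = np.column_stack([220 + 100 * np.sin(s),   # rows
                            250 + 100 * np.cos(s)])  # cols

    img = data.camera() / 255.0
    smoothed = filters.gaussian(img, sigma=3)  # smoothing helps convergence
    # alpha/beta weight the internal (tension/rigidity) energy; gamma is
    # the time step of the iteration.
    snake = active_contour(smoothed, init, alpha=0.015, beta=10, gamma=0.001)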

3.1 Algorithm:

During the feature extraction process, we introduce a coarse-to-fine scheme to reduce the search time, using anatomical prior knowledge of the human face as a heuristic. The rough positions of the eyes are located first: edges are detected and the horizontal projection of the edge map is calculated. According to the anatomical face model, the first peak of the projection should correspond to the eyebrows or hairline (we assume no serious background influence); this gives the coarse vertical position of the eyes. When applying the eye deformable template, we can then limit the search scope according to this coarse position, improving efficiency. Furthermore, because the pupils are the easiest features to find, we attempt to locate them first to reduce the search time further. We move the pupil template around the neighborhood, adjust the deformable parameters, and calculate the energy function; when its minimum is reached, the pupil is located. The boundary template then receives additional constraints from the position of the pupil. This scheme is clearly more efficient than moving the entire eye template over a larger scope. After we obtain the eye positions, the search scope for the mouth template is reduced according to the anatomical model of the face [10]. When we use snakes to extract the eyebrow and face contours, we also first extract a rough contour for each facial feature and then use the energy function to guide the convergence of the spline. Some results are shown in Fig. 3.2.
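
A sketch of the coarse localization step follows; the choice of edge detector and the peak threshold are assumptions, while the projection logic matches the text:

    import cv2
    import numpy as np

    def coarse_eye_row(gray):
        """Horizontal projection of the edge map: the first strong peak
        should correspond to the eyebrows/hairline (anatomical model,
        assuming a plain background); the eye search is limited near it.
        The edge detector and the 0.5 peak threshold are assumptions."""
        edges = cv2.Canny(gray, 50, 150)
        projection = edges.sum(axis=1).astype(float)  # one value per row
        peaks = np.flatnonzero(projection > 0.5 * projection.max())
        return int(peaks[0]) if peaks.size else None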


Figure 3.1 Deformable templates

Figure 3.2 After using deformable templates and extracting features

4. Face recognition by separate feature analysis [4]:

4.1 System model:

Features can be extracted from the image as described above (i.e. eyes, mouth, nose, eyebrows and face boundary). These features are used to train five neural networks per person, one for each feature. The outputs are then used as input to a sixth neural network, which we call the global network.

The feature analysis approach can be used in applications such as security, telecommunications, or enhanced human-machine interfaces. Consider the problem of access control to high-security restricted areas. Each authorized person holds a magnetic card to identify himself/herself to the security system; the system then takes a picture of the person's face to check his/her identity against the card. Systems of this kind are usually able to identify a person within a group of known individuals, but identification capability falls considerably as the size of the group increases. We therefore use a separate system for each person: each system determines whether the input face belongs to the person associated with it. To add a new individual, a new system is simply assigned to him/her.

This approach consists of extracting features as shown in Fig. 3.2. The system considers five different features (eyes, nose, mouth, eyebrows and face boundary). The shots of one and the same feature are then used to train a neural network, referred to as a specific network. Each specific network must be capable of authenticating the person on the basis of the feature in question, so there are five NNs per person, one for each feature. Their results are then used as input to a sixth NN, called the global network, which gives the final response on the authenticity of the person.

Because the image is segmented by facial features, the networks can extract the maximum amount of information from the input. As each network sees only a small fraction of the full image, it has to analyze even the tiniest detail. This is one objective of the feature extraction. Another objective is to isolate distortions caused by particular items: glasses worn by an individual who did not wear them during training are likely to confuse the eye-specific NN, but they do not affect the NNs concerned with the other features.

Having five NNs, each making its decision separately, widens the margin of error allowed for any one of them, since an error by one network can be corrected by the others; we cannot expect the results of all the nets to match each other perfectly, and the global network gives the final response on the basis of the specific networks' outputs. The features analyzed include the eyes, nose and mouth because these are the most significant for identifying a person and, being located in the center of the face, are very likely to be visible regardless of which way the person is looking.
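
A structural sketch of this model in PyTorch is shown below. The layer sizes and activation functions are illustrative only; what the sketch fixes is the topology described above: five specific networks, one per feature, whose last hidden activations feed a sixth, global network (as Section 4.3 describes):

    import torch
    import torch.nn as nn

    class SpecificNet(nn.Module):
        """One feature-specific network: authenticates a person from a
        single feature patch and exposes its last hidden layer."""
        def __init__(self, in_dim, hidden=32):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(in_dim, hidden), nn.Sigmoid())
            self.head = nn.Linear(hidden, 1)

        def forward(self, x):
            h = self.body(x)                        # last hidden activations
            return torch.sigmoid(self.head(h)), h

    class GlobalNet(nn.Module):
        """Fuses the hidden activations of the five specific networks."""
        def __init__(self, hidden=32, n_feats=5):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_feats * hidden, 16), nn.Sigmoid(), nn.Linear(16, 1))

        def forward(self, hidden_states):
            return torch.sigmoid(self.net(torch.cat(hidden_states, dim=1)))

    feats = ["eyes", "nose", "mouth", "eyebrows", "boundary"]
    person_model = {f: SpecificNet(in_dim=64 * 40) for f in feats}
    global_net = GlobalNet()

    # Forward pass on dummy feature patches (batch of 1).
    patches = {f: torch.rand(1, 64 * 40) for f in feats}
    outs = [person_model[f](patches[f]) for f in feats]
    verdict = global_net([h for (_, h) in outs])   # final authenticity score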


Fig 4.1 System model

4.2 Architecture model:

We use a particular type of architecture based on a cell connection model for the specific networks in the system. As shown in Fig. 4.2, each neuron receives input from only some of the neurons in the preceding layer: the neurons of each layer are arranged in rectangular cells, and a neuron in the next layer receives inputs only from the neurons that belong to its cell.


Fig. 4.2 Cell connection architecture model

This connection model aims for a high number of hidden neurons, as these neurons enable abstract characteristics to be extracted. Since each neuron studies a different part of the input, the extracted characteristics cannot coincide, which reduces redundancy, as sketched below.
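
A minimal numpy sketch of one cell connection layer follows. Unlike a convolution, each output neuron here has its own weight block, so different neurons cannot extract identical characteristics; the tanh activation and the unshared-weights reading of the figure are assumptions:

    import numpy as np

    def cell_layer_forward(x, weights, cell, stride):
        """Forward pass of one cell connection layer (a sketch).
        x:       2-D activation map of the previous layer
        weights: array of shape (out_h, out_w, ch, cw), one weight block
                 per output neuron, covering only its rectangular cell
        cell:    (ch, cw) cell size; stride: step between adjacent cells."""
        ch, cw = cell
        out_h, out_w = weights.shape[:2]
        out = np.empty((out_h, out_w))
        for i in range(out_h):
            for j in range(out_w):
                patch = x[i * stride:i * stride + ch,
                          j * stride:j * stride + cw]
                out[i, j] = np.tanh(np.sum(patch * weights[i, j]))
        return out

    # Example: a 64x40 input with 4x4 cells at stride 2 gives a 31x19 layer.
    rng = np.random.default_rng(0)
    x = rng.random((64, 40))
    w = rng.standard_normal((31, 19, 4, 4)) * 0.1
    y = cell_layer_forward(x, w, cell=(4, 4), stride=2)   # shape (31, 19)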

Each feature is used to train a specific network. Another important aspect is the correlation between the specific NN outputs: their errors must be minimally correlated if the errors of one network are to be offset by the others.

Compare the cell connection architecture with conventional total connection architectures. Consider two conventional multilayer perceptrons with the structures 64*40-12-1 (architecture A1, assuming an image size of 64*40) and 64*40-16-1 (architecture A2), where each number or pair of numbers gives the dimensions of a network layer, and find the number of weights in each network.

On the other hand, the first cell connection architecture (architecture B1) has three middle layers, with the neurons arranged as 64*40-31*19-14*8-6*3-1 (where each pair of numbers gives the dimensions of a network layer). Find the number of weights between each pair of layers.

The second cell connection network (architecture B2) has two hidden layers, with the neurons arranged as 64*40-15*9-4*2-1. Summing the number of weights between each pair of neuron layers shows that the total is reduced by a factor of between 3 and 4 compared with the total connection architectures, while the number of middle neurons increases considerably. A worked count is given below.
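
The weight counts can be worked out as follows. For the fully connected networks A1 and A2 the counts are exact; for B1 and B2 the cell sizes and strides are assumptions inferred from the layer dimensions (for example, 64*40 -> 31*19 is consistent with 4*4 cells at stride 2), so those totals are illustrative:

    def fc_weights(sizes):
        """Total weights of a fully connected MLP (biases ignored)."""
        return sum(a * b for a, b in zip(sizes, sizes[1:]))

    a1 = fc_weights([64 * 40, 12, 1])   # architecture A1: 30,732 weights
    a2 = fc_weights([64 * 40, 16, 1])   # architecture A2: 40,976 weights

    def cell_weights(layer_dims, cells):
        """Weights of the cell-connected layers: each neuron of layer i+1
        has its own (unshared) weight block over one cell of layer i."""
        return sum(w * h * cw * ch
                   for (w, h), (cw, ch) in zip(layer_dims[1:], cells))

    # Assumed cells for B1: 4x4, 5x5, 4x4; the last layer is fully connected.
    b1 = cell_weights([(64, 40), (31, 19), (14, 8), (6, 3)],
                      [(4, 4), (5, 5), (4, 4)]) + 6 * 3    # 12,530 weights
    # Assumed cells for B2: 8x8, 6x6; the last layer is fully connected.
    b2 = cell_weights([(64, 40), (15, 9), (4, 2)],
                      [(8, 8), (6, 6)]) + 4 * 2            # 8,936 weights

    print(a1, a2, b1, b2)   # 30732 40976 12530 8936; e.g. a2/b1 is approx. 3.3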

4.3 Global network:

The results of the specific networks must be channeled into a global network to obtain the final response. A total connection network with one hidden layer can be used for this purpose. The network is then trained using the output patterns of the last hidden layers of the specific networks. The aim is to use abstract data that supply the global network with enough information to reach its decision; these patterns differ depending on whether they come from the correct person or from someone else. Some neurons produce almost the same output for all images, so their weights appear to be wasted. If neurons are wasted even in cell connection networks with almost 100% learning, most of the weights must be superfluous in total connection networks, which have three or four times as many weights.

The images used in the learning sets for the specific networks should not be reused for the global network's learning set. The objective is to train a global network capable of correcting isolated errors; if the same images were reused, the specific networks' responses would be almost error-free and the global network would have nothing to correct. The global network also learns very quickly: after the first epoch, learning performance may exceed 90% and test performance may exceed 85%.

Conclusion:
Facial image recognition admits multiple approaches. Recognition by separate feature analysis gives better results provided the neural network is properly designed. To increase the speed of identification, a faster network needs to be deployed without affecting accuracy. Work is in progress on developing a network for better and faster facial image recognition. Biometrics is thus set to govern the security aspect in the near future.

References:

  1. M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis, and Machine Vision.
  2. D. Luo, Pattern Recognition and Image Processing.
  3. B.S. Mahanand, C.N. Ravi Kumar, "Towards enhancing the efficiency of facial recognition systems".
  4. J. Bautista, J. Castellanos, L.F. Mingo, A. Rodriguez-Paton, "Image recognition by integration of separate feature analysis".
  5. R. Srikantaswamy, R.D. Sudhaker Samuel, "A color segmentation approach to the implementation of a pose invariant face recognition system".
  6. M.A. Sid-Ahmed, Image Processing.