
Image Classification Based on the Histogram Intersection Kernel

XI Hanbin, CHANG Tiantian*

School of Science, Xi’an University of Posts and Telecommunications, Xi’an, China


Received **** 2014

Abstract

A Histogram Intersection Kernel Support Vector Machine (SVM) is used for the image classification problem. Specifically, each image is first split into blocks, and each block is represented by Scale Invariant Feature Transform (SIFT) descriptors; secondly, the k-means clustering method is applied to separate the SIFT descriptors into groups, each group representing a visual keyword; thirdly, the SIFT descriptors assigned to each keyword are counted in every image, and a histogram is constructed for each image; finally, the Histogram Intersection Kernel is built from these histograms. In our experimental study, we use the Corel-low images to test our method. Compared with the RBF kernel SVM, the Histogram Intersection kernel SVM performs better.


Keywords

classification; Bag of Words; Support Vector Machine; kernel function; visual keywords

1. Introduction

With the rapidly increasing number of digital images that have no tags, labeling them by hand is no longer feasible, so how to classify these images accurately and rapidly by computer has become an important current research topic. Thorpe and colleagues found that humans can discriminate animals or vehicles from a complex background very quickly [1-2]. Similarly, we need methods that teach the computer to identify different images quickly, accurately and automatically.

In many classification tasks, the input data is represented as a set of features [3]. Li and colleagues take Bag of Words (BOW) as the feature that represents images [2]. The support vector machine (SVM), a kernel-based learning method, is widely used for classification tasks, e.g., natural language processing [4], information retrieval [5-6] and data mining [3,7]. However, different kernels produce different results.

In this paper, we take the Corel-low images as experimental data and use the BOW model together with an SVM whose kernel is the histogram intersection kernel to do image classification.

2. Processes of Method

Our method, which is used to classify images, is based on the BOW model and SVM. Figure 1 shows the processes of the method. Firstly, the original images are segmented by a regular grid into blocks of size B×B. Then, Scale Invariant Feature Transform (SIFT) descriptors are extracted from those blocks. To obtain the codebook of the BOW model, we use the k-means algorithm to cluster blocks that are visually similar. After that, we get k cluster centers, each of which is regarded as a visual keyword making up the codebook, and we represent the images in terms of the words of the codebook. We can then form the histogram of each image, expressed by the frequencies of all the keywords. Finally, we take the histograms of the images as training data and use an SVM with a predefined kernel function to do the classification.

Figure 1. Processes of the method.


2.1. Preprocessing of Original Images

We use the Corel image collections and segment the original images by a regular grid into blocks of size B×B. Figure 2 shows how the original images are processed.

Figure 2. Preprocessing of original images.
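As a concrete illustration of this step, the following is a minimal sketch in Python/NumPy; the function name and the demo image are our own, not the paper's code. It cuts a grayscale image into non-overlapping B×B blocks on a regular grid:

```python
import numpy as np

def segment_into_blocks(image, B=16):
    """Cut an image into non-overlapping B x B blocks on a regular grid.

    Border rows/columns that do not fill a whole block are discarded.
    """
    h, w = image.shape[:2]
    return [image[top:top + B, left:left + B]
            for top in range(0, h - B + 1, B)
            for left in range(0, w - B + 1, B)]

# A 128 x 192 image yields (128 // 16) * (192 // 16) = 96 blocks.
demo = np.zeros((128, 192), dtype=np.uint8)
print(len(segment_into_blocks(demo)))  # -> 96
```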

2.2. Extracting Features

The Scale Invariant Feature Transform (SIFT) was first proposed by David G. Lowe in 1999 and improved by him in 2004. The SIFT descriptor is an image local feature description operator that is based on scale space and is invariant to image rotation and even affine transformation [8]. The core issue of object recognition is matching a target to images of it taken at different times, under different lighting, and in different poses. Because of the state of the target and the impact of the environment, images of the same type of object differ from each other greatly in different scenes, yet people can identify the same object by its local common characteristics. A so-called local feature descriptor is used to portray these local common characteristics of the image. The ideal local feature should be invariant to translation, scaling and rotation, and robust to lighting changes and to affine and projective distortions. The SIFT descriptor is such an ideal local feature.

After preprocessing the original images, we extract a SIFT descriptor from every block. Figure 3 shows the extraction of the SIFT descriptors; the green marks on the image are the SIFT descriptors.

Figure 3. Extracting SIFT descriptors.
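The paper does not name its SIFT implementation, so the sketch below uses OpenCV's SIFT as a stand-in, placing one keypoint at the centre of each 16×16 grid block; the file name example.jpg is hypothetical:

```python
import cv2  # OpenCV; SIFT_create is available in OpenCV >= 4.4
import numpy as np

def dense_sift(gray, B=16):
    """One 128-D SIFT descriptor per B x B block, keypoint at block centre."""
    sift = cv2.SIFT_create()
    h, w = gray.shape
    keypoints = [cv2.KeyPoint(x + B / 2.0, y + B / 2.0, float(B))
                 for y in range(0, h - B + 1, B)
                 for x in range(0, w - B + 1, B)]
    keypoints, descriptors = sift.compute(gray, keypoints)
    return descriptors  # shape: (number_of_blocks, 128)

# "example.jpg" is a hypothetical file name, not from the paper.
gray = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)
if gray is not None:
    print(dense_sift(gray).shape)
```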

2.3. Bag-of-Words Model

The Bag-of-Words (BOW) model was first used in natural language processing to express a document (or sentence) as a histogram of word frequencies, ignoring word order [9]. Li and colleagues used it to teach the computer to classify images [2].

Figure 4 shows the processes of BOW. After SIFT descriptor extraction, we have the local features of every image. Next, a codebook needs to be formed. We fix a codebook size k and cluster the descriptors into k classes using the k-means algorithm. The k cluster centers, regarded as visual keywords, make up the codebook. To represent the images, every descriptor of each image is replaced by the visual keyword nearest to it in Euclidean distance. All the images are now represented by visual keywords, but the representation is not finished yet. Following that, a histogram of the frequencies of the visual keywords representing an image is formed to express the image. In this way each image is expressed as a histogram $H_i$ $(i = 1, \dots, N)$, where $N$ is the number of all the images.

Figure 4. Processes of BOW.
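A minimal sketch of the codebook construction and histogram encoding follows, using scikit-learn's KMeans; the synthetic descriptor arrays and the reduced codebook size k = 100 are illustrative stand-ins (the paper uses k = 1000 and 3000):

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-ins for real SIFT data: 5000 pooled descriptors, 10 images.
rng = np.random.default_rng(0)
all_descriptors = rng.random((5000, 128))
image_descriptors = [rng.random((96, 128)) for _ in range(10)]

k = 100  # demo size; the paper uses 1000 (Group One) and 3000 (Group Two)
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)

def bow_histogram(descriptors, kmeans, k):
    """Replace each descriptor by its nearest keyword, then count frequencies."""
    words = kmeans.predict(descriptors)       # nearest centre in Euclidean distance
    hist = np.bincount(words, minlength=k).astype(float)
    return hist / hist.sum()                  # keyword frequency histogram

histograms = np.array([bow_histogram(d, kmeans, k) for d in image_descriptors])
print(histograms.shape)  # -> (10, 100): one k-bin histogram per image
```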

2.4. Support Vector Machines

Now comes the major part of image classification. To make the computer identify images, we must first provide it with a classifier.

Support Vector Machines (SVM) were proposed by Vapnik and colleagues [10-11]. As a machine learning method, SVM is popular for classification, regression, and other learning tasks [12]. In plain words, an SVM is a two-category model whose basic form is a linear classifier that maximizes the margin in the feature space. Figure 5 gives a schematic diagram of it. Its learning strategy is to find a hyperplane that maximizes the margin, which can eventually be transformed into solving a convex quadratic programming problem.

Figure 5. Schematic diagram of SVM.

However, not all problems are linearly separable in practical applications of SVM. The solution for such problems is to choose a kernel function, which maps the data into a higher-dimensional space in which a problem that is not linearly separable in the original space becomes separable. To avoid complicated calculation in the higher dimension, the kernel function evaluates the inner product directly on the data in the original dimension. For feature vectors $x$ and $z$, a kernel function is defined as $K(x, z) = \langle \phi(x), \phi(z) \rangle$, where $\phi$ is a mapping from the input space to the feature space [13].

Typical kernel functions are:

  • Linear kernel: $K(x, z) = x^T z$
  • Polynomial kernel: $K(x, z) = (\gamma x^T z + r)^d$, $\gamma > 0$
  • Radial Basis Function (RBF) kernel: $K(x, z) = \exp(-\gamma \|x - z\|^2)$, $\gamma > 0$

Here $\gamma$, $r$ and $d$ are kernel parameters.
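These three kernels transcribe directly into NumPy, for instance as below; the parameter names follow LIBSVM's convention and the helper functions are our own sketch:

```python
import numpy as np

def linear_kernel(x, z):
    return float(np.dot(x, z))

def polynomial_kernel(x, z, gamma=1.0, r=0.0, d=3):
    return float((gamma * np.dot(x, z) + r) ** d)

def rbf_kernel(x, z, gamma=0.5):
    return float(np.exp(-gamma * np.sum((x - z) ** 2)))
```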

Assume that $H_a$ and $H_b$ are k-bin histograms; the histogram intersection kernel is:

$$K_{HI}(H_a, H_b) = \sum_{i=1}^{k} \min\left(H_a(i), H_b(i)\right) \qquad (1)$$

This function is a positive definite kernel, so it can be used to quantify the similarity of two images. It also satisfies Mercer's theorem.
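Equation (1) is equally direct to implement. The sketch below computes the kernel for a pair of histograms and also the full Gram matrix that a precomputed-kernel SVM solver expects; the helper names are ours:

```python
import numpy as np

def hist_intersection(ha, hb):
    """Equation (1): sum over bins of min(Ha(i), Hb(i))."""
    return float(np.minimum(ha, hb).sum())

def hist_intersection_gram(A, B):
    """Gram matrix K[i, j] = K_HI(A[i], B[j]) for rows of A (m x k) and B (n x k)."""
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

A = np.random.rand(4, 1000)  # four 1000-bin histograms (Group One codebook size)
print(hist_intersection_gram(A, A).shape)  # -> (4, 4), symmetric
```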

In image classification, we are given a labeled training set of the form $\{(x_i, y_i)\}_{i=1}^{n}$, where the label $y_i \in \{-1, +1\}$ is the class that $x_i$ belongs to. Maximizing the margin now amounts to solving the (dual) optimization problem:

$$\max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \; 0 \le \alpha_i \le C \qquad (2)$$

and obtain the optimal solution $\alpha^* = (\alpha_1^*, \dots, \alpha_n^*)^T$.

By relating the original problem of the SVM to its dual problem, we can obtain the coefficient vector of the hyperplane of the original problem:

$$w^* = \sum_{i=1}^{n} \alpha_i^* y_i \phi(x_i) \qquad (3)$$

Then we get the decision function:

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i^* y_i K(x_i, x) + b^* \right) \qquad (4)$$

where $b^*$ is a bias value.
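Putting the pieces together, here is a hedged sketch of training and applying the classifier of Equations (2)-(4) with a precomputed histogram intersection kernel. The paper used the LIBSVM toolbox under Matlab; we substitute scikit-learn's SVC (which wraps LIBSVM), and the random data is purely illustrative:

```python
import numpy as np
from sklearn.svm import SVC

def hist_intersection_gram(A, B):
    # K[i, j] = sum over bins of min(A[i], B[j]), as in Equation (1)
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

rng = np.random.default_rng(0)
X_train = rng.random((60, 1000))      # 60 synthetic training histograms
y_train = rng.integers(0, 2, 60)      # binary labels
X_test = rng.random((20, 1000))

K_train = hist_intersection_gram(X_train, X_train)  # (60, 60)
K_test = hist_intersection_gram(X_test, X_train)    # (20, 60): test vs. train

clf = SVC(kernel="precomputed", C=2.0)  # C = 2 was the paper's best for this kernel
clf.fit(K_train, y_train)               # solves the dual problem (2) internally
print(clf.predict(K_test))              # applies the decision function (4)
```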

3. Experiment and Analysis

We use a Windows 8.1 system with an Intel Core i7 at 2.3 GHz and 6 GB of memory; the development environment is Matlab 2012a. We take the Corel-low images as experimental data; the collection contains 59 classes and each class has 100 pictures. We design two groups of experiments. In the first group, we take 12 classes from the whole image data and, for each class, randomly take 70 pictures for training and the rest for testing; we call this Group One. In the second group, we use all 59 classes and, for each class, take 80 pictures for training and the rest for testing; we call this Group Two.

We set the regular grid to 16×16 and segment the original images into blocks, then extract a SIFT descriptor from every block. The size of the codebook influences the classification accuracy, but at the same time a larger codebook costs more in computation, so we choose the codebook size by seeking a tradeoff between classification accuracy and computational cost: 1000 for Group One and 3000 for Group Two. We use the LIBSVM toolbox to do the classification [12]. Since the RBF kernel is known to be a good kernel for SVM [14], we take the RBF kernel function as the comparison to the histogram intersection kernel function. The RBF kernel function has one kernel parameter $\gamma$, and the C-Support Vector Machine (C-SVM) has a penalty parameter C, the cost of the C-SVM. These parameters influence the classification accuracy too. Since finding the best values for these parameters is not our major objective, we obtain them by a grid search.
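For reference, a grid search of this kind might be sketched as follows with scikit-learn's GridSearchCV; the parameter grids and the synthetic data are our own assumptions, and the example shows why the histogram intersection kernel is cheaper to tune: it needs only C, while RBF needs both C and gamma:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def hist_intersection_gram(A, B):
    return np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)

rng = np.random.default_rng(1)
X = rng.random((100, 1000))   # synthetic training histograms
y = rng.integers(0, 4, 100)   # four hypothetical classes

# Histogram intersection kernel: only the penalty C has to be tuned (1-D grid).
K = hist_intersection_gram(X, X)
hi_search = GridSearchCV(SVC(kernel="precomputed"),
                         {"C": [2.0 ** p for p in range(-2, 11)]}, cv=5)
hi_search.fit(K, y)

# RBF kernel: a 2-D grid over both C and gamma, hence the extra tuning cost.
rbf_search = GridSearchCV(SVC(kernel="rbf"),
                          {"C": [2.0 ** p for p in range(-2, 11)],
                           "gamma": [2.0 ** p for p in range(-9, 4)]}, cv=5)
rbf_search.fit(X, y)
print(hi_search.best_params_, rbf_search.best_params_)
```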

Table 1 shows the size of the codebook, the best SVM parameters and the average classification accuracy of the two kernels for Group One. The average classification accuracy of the RBF kernel is about 5.8% lower than that of the Histogram Intersection kernel.

Table 1. The parameters and average classification accuracy of Group One.

                                        RBF Kernel    Histogram Intersection Kernel
Size of Codebook                        1000          1000
Best γ                                  2.8284        -
Best C                                  64            2
Average Classification Accuracy (%)     72.9167       78.7500

Figure 7 shows the confusion matrices of these two kernels for Group One. From the confusion matrices, we can see that, except for the classes Flower and Africa, the accuracy of the Histogram Intersection kernel is higher than that of the RBF kernel.

Figure 7. Confusion matrix of Group One.

Table 2 shows the size of the codebook, the best SVM parameters and the average classification accuracy of the two kernels for Group Two. The average classification accuracy of the RBF kernel is about 0.8% lower than that of the Histogram Intersection kernel. Although the accuracies of the two kernels do not differ much, the RBF kernel costs more in computation and needs a lot of time to find the best parameters because it requires two parameters.

Table 2. The parameters and average classification accuracy of Group Two.

                                        RBF Kernel    Histogram Intersection Kernel
Size of Codebook                        3000          3000
Best γ                                  0             -
Best C                                  1024          8
Average Classification Accuracy (%)     46.7232       47.5141

Figure 8 shows the confusion matrices of these two kernels for Group Two. In 37 of the 59 classes, the prediction accuracy of the Histogram Intersection kernel is higher than or equal to that of the RBF kernel.

Figure 8. Confusion matrix of Group Two.

Comparing the histogram intersection kernel with the RBF kernel, we conclude that for image classification the histogram intersection kernel achieves higher accuracy and requires fewer computational resources.

4. Conclusions

Image classification is currently an important research field, and the SVM and the BOW model have been used for it. Different kernel functions result in different classification accuracies, so finding a kernel with high classification accuracy is necessary. The histogram intersection kernel achieves greater classification accuracy and needs fewer computational resources than the RBF kernel, and our simulation experiments confirm this. Therefore, using the histogram intersection kernel for the SVM is a good method for image classification and works well.

Acknowledgements

This work was jointly supported by the Natural Science Fund Project of the Education Department of Shaanxi Provincial Government (Grant No. 14JK1658), the Natural Science Foundation of China (Grant No. 61302050), and the College Students Innovation and Entrepreneurship Training Program of Xi'an University of Posts and Telecommunications.

References

[1] Thorpe, S., Fize, D. and Marlot, C. (1996) Speed of Processing in the Human Visual System. Nature, 381, 520-522.

[2] Fei-Fei, L. and Perona, P. (2005) A Bayesian Hierarchical Model for Learning Natural Scene Categories. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2, 524-531.

[3] Yoshikawa, Y., Iwata, T. and Sawada, H. (2014) Latent Support Measure Machines for Bag-of-Words Data Classification. Advances in Neural Information Processing Systems 27 (NIPS 2014).

[4] Kudo, T. and Matsumoto, Y. (2001) Chunking with Support Vector Machines. Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, 1-8.

[5] Zhang, D. and Lee, W.S. (2003) Question Classification Using Support Vector Machines. Proceedings of SIGIR 2003, 26-32.

[6] Yang, C.H., Lin, K.H.Y. and Chen, H.H. (2007) Emotion Classification Using Web Blog Corpora. IEEE/WIC/ACM International Conference on Web Intelligence, 275-278.

[7] Kolari, P., Finin, T. and Joshi, A. (2006) SVMs for the Blogosphere: Blog Identification and Splog Detection. AAAI Spring Symposium on Computational Approaches to Analyzing Weblogs, University of Maryland, Baltimore County.

[8] Lowe, D.G. (2004) Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60, 91-110.

[9] Jiu, M., Wolf, C., Garcia, C. and Baskurt, A. (2012) Supervised Learning and Codebook Optimization for Bag-of-Words Models. Cognitive Computation, 4, 409-419.

[10] Cortes, C. and Vapnik, V. (1995) Support-Vector Networks. Machine Learning, 20, 273-297.

[11] Vapnik, V. (2000) The Nature of Statistical Learning Theory. 2nd Edition, Springer-Verlag, New York.

[12] Chang, C.C. and Lin, C.J. (2011) LIBSVM: A Library for Support Vector Machines. ACM Transactions on Intelligent Systems and Technology, 2, 27.

[13] Howley, T. and Madden, M.G. (2005) The Genetic Kernel Support Vector Machine: Description and Evaluation. Artificial Intelligence Review, 24, 379-395.

[14] Keerthi, S.S. and Lin, C.J. (2003) Asymptotic Behaviors of Support Vector Machines with Gaussian Kernel. Neural Computation, 15, 1667-1689.