Content-Based Retrieval Using Invariant Features, Self-Organizing Maps, Concepts and Fuzzy Interval Numbers.
C. Skourlas N. Vassilas
Department of Informatics, Technological Educational Institute of Athens
{cskourlas, nvas }@teiath.gr
Abstract
In the complex Cross Language Document Retrieval applications dealt in this paper, the goal is to assist the document matching stage when the documents contain images. Hence, we calculate the similarity between a submitted bilingual query and each document in the collection or retrieve images that are similar to a query image, based on features extracted from images.
Fuzzy Interval Numbers (FINs) have been employed in various real-world application including numeric and non-numeric data. In this paper, the use of FINs classifier is proposed to handle problems of Cross Language Information Retrieval and documents’ classification. The FIN representation of documents is based on the use of the collection term frequency as the term identifier. Such a representation of documents seems to be suitable for Cross Language Information Retrieval without dictionary.
In the recent years, Content-Based Image Retrieval (CBIR) evolved to an important research domain within the context of multimodal information retrieval. To assure improved image retrieval performance, even when the images are at different scales and orientations or corrupted by noise, we propose a set of global and local invariant features. Color quantization of the images using self-organizing maps are also used to lead to memory savings and improve, in some cases, image retrieval accuracy.
Keywords: Content Based Image Retrieval, Concept Based Retrieval, Cross Language Information Retrieval, Multimodal Information Retrieval, Self-Organizing Maps, Fuzzy Interval Numbers
1. Introduction
In the complex Information Retrieval applications, such as cross language document retrieval, the documents are typically represented using the vector space model. A set of N keywords (index terms) is used to represent each document as an N-dimensional vector with each element representing either the appearance of the corresponding keyword in the document or its relative frequency of occurrence. To correctly translate a keyword from one language to another, besides the use of a dictionary and a thesaurus (for the synonyms), word sense disambiguation techniques that assess the appropriate contextual meaning of the word have to be employed [Alevisos et al., 2007]. The similarity between a submitted query and each document in the collection is usually an inner product based vector matching operation. However, due to a poor selection of keywords, keyword sense ambiguities, inaccurate lexicons or incomplete thesauri, the retrieval accuracy is often rather low.
In the image databases, the difficulty to annotate images so that to allow later retrieval that is acceptable by the subjective perception of the various future users makes text-based image retrieval an inappropriate approach [Rui et al., 1999]. It is, therefore, necessary to develop tools for retrieving information based on image content and also improve the cross language text retrieval without dictionaries and thesauri.
In the recent years, Content-Based Image Retrieval (CBIR) evolved to an important research domain within the context of multimodal information retrieval [Faloutsos, 1995] and a number of CBIR systems and tools have already been developed [Rui], [Veltkamp, 2002], [Eakins, 2000]. CBIR could then be used to assist the document matching stage in complex multimodal information retrieval applications when the documents contain images. In the case that the documents also contain images, redesigning the similarity measures to include the contribution from CBIR systems could lead to significant improvements in document retrieval accuracy.
Fuzzy Interval Numbers (FINs) and lattice algorithms have been employed in various real-world application including numeric and non-numeric data [Kaburlasos, 2006]. A FIN may be interpreted as a conventional Fuzzy Set; additional interpretations for a FIN are possible including a statistical interpretation. Use of FINs classifiers is proposed [Alevisos, 2007] to handle problems of Cross Language Information Retrieval. The FIN representation of documents is based on the use of the collection term frequency as the term identifier and seems to be suitable for Cross Language Information Retrieval without dictionary.
Section 2 of this paper illustrates the proposed experimental methodology using a Euro coin database. The overall structure of the coin retrieval system is presented for the original coin collection as well as for the coin collection after vector quantization with Kohonen’s self-organizing maps. In section 3 the features extracted for content-based coin indexing are presented and the experimental results for three test datasets are discussed. In Section 4 a brief introduction to definition and construction of FINs is given. The FINs similarity is defined and applied to the similarity of queries and documents. The evaluation of FIN – techniques is given in section 5. Finally, conclusions and future work are discussed in the last section.
2. The Content Based Image Retrieval
The experimental methodology consists of two distinct phases: a) the design phase, whereby we create the database and decide upon its indexing scheme, and b) the retrieval phase, in which, following the presentation of query-images, similar images are retrieved from the database.
2.1 Database Design
In the first phase, a coin database is created using the following stages:
- Specification of a digital coin image collection
- Preprocessing of each image to detect the position of the coin
- Feature extraction from coin pixels
- Feature-based indexing of the database
At the first stage, we downloaded coin images from the Internet and compiled a collection of 115 Euro coin faces [Vassilas, 2006]. Among these images, several coin designs, not yet in circulation, were included. All images were in the RGB color space and had a size of approximately 240x240 pixels. Next, each image is preprocessed in order to detect the position of the coin in the image and isolate it from the background. To this end, each image is first converted to gray-scale, then the edges are found using the Sobel operators and finally the Hough transform [Ballard, 1982] is applied on the black-and-white image of edges in order to detect the outer circle of the coin. The circular disk mask, thus created, is then used to isolate the coin from the background (see Fig. 1).
Fig. 1. Preprocessing steps: a) original color image, b) Sobel edge detection on gray-scale image, c) mask from Hough transform, and d) isolated coin.
Once the coin images are free from background interference, features are extracted from those pixels that belong exclusively to the coin area. The extracted feature vector from each image is then stored in order to form the index to the database.
2.2 Database Retrieval
Retrieval from the coin database is performed by:
- Query-image presentation to the database system
- Preprocessing to detect the position of the coin in the query-image
- Feature extraction and query encoding with its feature vector
- Specification of similarity measure
- Computation of query features similarity with those of the database index
- Retrieval of the most similar database coins
The query coin image presented to the system undergoes the same preprocessing, as that for the database coins, to isolate the coin from the background and to extract its corresponding feature vector. The similarity measures employed in this work between two feature vectors are the L1 and L2 distances defined by
Lp = ( | fi( I ) – fi( J ) |p )1/p for p = 1, 2
where f() represents the feature vector of an image and I, J are the database and query images respectively. The database coins are ranked according to their similarities to the query image and the N most similar coins are retrieved from the database and shown to the user.
2.3 Quantization of Coin Collection
Kohonen’s self-organizing maps was the vector quantization method used in this work. The coin collection was quantized in two ways. First, Kohonen’s algorithm was applied to each coin image separately resulting to indexed images with distinct colormaps. Second, the same algorithm was applied to the whole coin collection simulataneously, resulting to indexed images with a common colormap. Fig. 2 shows some original coins (upper row) along with their indexed representations (lower row) for the second quantization case. Also shown to the right of this figure is the coins’ common colormap.
Figure 2.Original (upper row) and quantized (bottom row) coins using a common colormap (right).
The feature extraction and indexing phases for the database coins are the same as for the previously described methodology. During retrieval, the queries are also quantized with Kohonen’s algorithm resulting in an indexed representation with their own associated colormap.
3. Feature Extraction and experimental results
Three different kinds of features were used, namely, color, shape and wavelet features. Color is known to be invariant to scale and orientation but is sensitive to illumination changes. Four moments (mean, standard deviation, skewness and kyrtosis) from each of the hue, saturation and value components of the HSV color space were extracted, for a total of twelve color features.
The shape features were extracted from the gradient image which was obtained by first transforming the color image to grayscale and then using the Sobel operators. To achieve some invariance to illumination conditions, the [min, max] range of the gradient image was normalized to [0, 1]. Assuming that scale and orientation changes preserve most of the edges, the normalized polar historgram, i.e. the probability distribution of edge orientations (at 90o with respect to the edge gradients), not only should it be, adequately, scale and rotation invariant but also can give an estimate of the rotation angle. The latter results from the normalized polar histogram circular correlation between the query image and each one of the database coins. Fig. 3 shows the normalized polar histograms of a database image and a query that is of a different size and orientation with respect to the original image.
Figure 3. A database (left) and a query (right) coin along with their polar histograms.
Actually, the procedure followed uses the L1 or L2 distance measures instead of the typical inner product correlation in order to determine the angle of rotation and the corresponding polar similarity score. In addition, in order to increase the robustness of the system, we extracted both, edge magnitudes and orientations from five equal-area concentric circular rings. The shape features extracted were: a) the mean, standard deviation, skewness and kyrtosis of the ring edge magnitudes, for a total of 5x4 = 20 features, and b) the normalized polar histograms of each ring for 5 degrees angular bins (36 features/ring), for a total of 180 polar features.
Finally, for comparison purposes, we also extracted features from a three level wavelet analysis with the ‘db3’ mother wavelet of the Daubechies family. In particular, we extracted the mean and standard deviation of the wavelet coefficients for each detail image of the wavelet decomposition as well as for the level-3 approximation image, for a total of 10x2 = 20 wavelet features.
3.1 Discussion of the experiments
The experiments on coin retrieval have been performed using three sets of queries. The first data set (DS1) contained queries at a lower resolution (approximately of 130x130 pixels) in order to test the effect of scale on the retrieval accuracy. The queries of the second data set (DS2) were at a different scale (70% the size of the original images), had a random rotation in the [-180o, 180o] range and were corrupted with additive zero mean gaussian noise with 0.1 standard deviation. Finally, the queries of the third data set (DS3) were at a random scale (between 50% and 100% of the original scale), at random orientations (in the [-180o, 180o] range) and with a random change of color in order to test scale, rotation and illumination invariances. Tables 1 and 2 show the retrieval accuracy based on color and edge (ring) magnitude features respectively, (for the means, standard deviations or their combination) for the three data sets while Tables 3 and 4 show the retrieval accuracy for the edge (ring) orientations and wavelet features respectively. Since the L1 distance measure gave slightly better results than L2 in most of the experiments, all results shown in the tables assume the L1 measure.
4. Definition and construction of FINs
FIN could be regarded as an abstract “mathematical object” and can have various interpretations and uses.
Given a vector (a “population”) x = [x1,x2,,xN] of term frequencies (“measurements”) that are real numbers sorted in ascending order. The dimension of a vector x is denoted by dim(x) e.g. dim([2,-1]) = 2, dim([-3,4,0,-1,7]) = 5. The median(x) of the vector x = [x1,x2,,xN] is defined to be a number such that half of the N numbers x1,x2,,xN are smaller than median(x) and the other half are larger than median(x); for instance, median([x1,x2,x3]) = x2, with x1 < x2 < x3, whereas median([x1,x2,x3,x4]) = (x2 + x3)/2, with x1 < x2 < x3 < x4. A FIN can be computed for the vector x (population) by applying the following CALFIN algorithm (see Kaburlasos, 2006].
Algorithm CALFIN // Calculate FIN
Let x be a vector of term frequencies (real numbers).
Sort vector x incrementally.
Initially vector pts is empty.
function calfin(x) {
while (dim(x) 1)
medi:= median(x)
insert medi in vector pts
x_left:= elements in vector x less-than number median(x)
x_right:= elements in vector x larger-than number median(x)
calfin(x_left)
calfin(x_right)
endwhile
} //function calfin(x)
Sort vector pts incrementally.
Store in vector val, dim(pts)/2 numbers from 0 up to 1 in steps of 2/dim(pts) followed by another dim(pts)/2 numbers from 1 down to 0 in steps of 2/dim(pts).
Eventually Algorithm CALFIN computes two vectors, i.e. pts and val, where vector val includes the degrees of (fuzzy) membership of the corresponding real numbers in vector pts.
In the figure 4 two queries or documents (vectors) are illustrated (plotted) as two Fuzzy Interval Numbers. Each value on the term axis represents a term (stem). Any “cut” at a given height h(0,1] defines two generalized intervals, denoted by [a’, c’]h, [b’, d’] h. In our case the generalized intervals are positive and intersecting. As we can see below, a bell-shaped mass function is used for the calculations of the distance between two FINs (or the similarity between two queries or documents which are represented by their FINs).
Figure 4. Two queries / documents (vectors) illustrated as two Fuzzy Interval Numbers. Each value on the term axis represents a term (stem). A bell-shaped mass function is used for the calculation of the distance (similarity) between the documents.
4.1 Fuzzy Interval Numbers and FINs similarity
The concept of the Generalized Intervals was used for introducing a metric into the lattice of the Fuzzy Interval Numbers (FINs) [Kaburlasos 2006]. The interpretation of a generalized interval depends on an application; for instance if a feature (a term) is present in a document it could be indicated by a positive generalized interval [Skourlas 2007]. The area “under” a generalized interval is a real number which could be calculated. A metric distance and an inclusion measure function in the set (lattice) of the generalized intervals Mh can be defined [Kaburlasos, 2006].
4.2 Calculation of the distance between documents
The FIN distance is used instead of the similarity measure between documents: the smaller the distance the more similar the documents. In the case of documents and for the distance calculations a bell-shaped mass function could be selected:
Figure 4 illustrates how we calculate the distance of two FINs F1 and F2 (representing two documents) using the mass function. The points a, b, c, d are used to define the distance, at height h, dh(F1(h),F2(h)) = dh([a,b]h,[c,d]h); dh(F1(h),F2(h)) equals the sum of areas of the shaded regions.
The distance of the two FINs at the height h is given by the sum of areas of the shaded regions and the distance between the two FINs is calculated using the definite integral of the distance at height h from h=0 to 1:
Distance =
5 Examples of successful document retrieval and classification
From our previous experimentation [Skourlas, 2007] we knew that if the length of the queries and the documents is greater then the results of the method are better.
Sample 1: Coin Images’ annotations
Unfortunately, the available annotations for the Euro coin images are rather short e.g.
“2-cent denomination: Depicting a Corvette, i.e. a type of ship used during the Greek War of Independence (1821)”. «Kέρμα των 2 λεπτών: Κορβέτα, τύπος πλοίου που χρησιμοποιήθηκε στον Εθνικό Απελευθερωτικό Αγώνα (1821-1827)».
Therefore, it was necessary to expand such annotations using relevant texts in order to have successful experimentation. For example, within the framework of the Olympic Games Coin Program (Athens 2004), the Greek Mint was striking series of commemorative coins. In the case of representation of Goddess Athena on such coins we added texts as the following:
“Goddess Athena
Athena was Zeus' favourite daughter. She was beautiful and brave, yet very wise as well. And it was absolutely natural that she was highly esteemed by the Greeks for it was in her figure that the most important ideals of the ancient Greek spirit, prudence and bravery, were personified. She was the goddess of wisdom, protector of cities who ensured the peaceful life of citizens. At the same time she was the goddess of war and power, the one who would greatly contribute to the final victory in the battlefield. It is no surprise therefore that she is often represented triumphant (Nikephoros), with a small figure of Nike (Victory) in hand.
Athena is always depicted in full armament - helmet, spear and shield. Her symbols are the owl and the olive. The myth of Athena's birth is quite impressive …..”.
Sample 2: The MED collection.
The MED test set is composed of a collection of 1033 documents; a set of 30 queries, and a list of all the collection documents (“qrels”) that are relevant to each of the queries e.g.“qrels14”={23,24,25,26,28,29,454,455,456,457,459,461,463,466,467,468}
We selected all the long queries (e.g. 14, 17, 20, 25, …). For example, query 14 is described by the following terms:
“renal amyloidosis as a complication of tuberculosis and the effects of steroids on this condition. only the terms kidney diseases and nephrotic syndrome were selected by the requester. prednisone and prednisolone are the only steroids of interest”.