CHAPTER – 1

INTRODUCTION

Age estimation is the determination of a person's age based on biometric features. Determining the age of a person from a digital photograph is an intriguing problem, since it requires an understanding of the human aging process. People cannot freely control aging variation, and collecting sufficient data for age estimation is extremely laborious. Estimating the exact age is difficult even for human beings. Therefore, most researchers working on age estimation try to report results in terms of age ranges. The age ranges examined so far are still considered wide; in some cases they exceed 10 years, and in other cases they reach 15 or 20 years.

One of the main difficulties in reducing the size of the age ranges is how correct and comprehensive the features extracted from the face are. Some researchers use 20, 22, 35, or 68 features, and the accuracy of the results varies with the extracted features and the approach used for age estimation. Several open databases are available for testing age estimation systems, such as FG-NET [5][6][7] and MORPH [8]. These datasets contain photos of people together with their ages, which usually range from 1 to 70 years.

Many attempts at age and sex estimation have been made, and most of them report results for wide age ranges or classify ages into categories such as child, young, youth, and old. Finding an appropriate approach that yields more specific age ranges is still a challenging problem, so we focus our research on more specific age ranges. This is also a difficult task for humans, who often misestimate other people's ages. To achieve our research goal, we have to find a good database for training and testing our proposed approach, and we have to construct a proper ANN to model the problem. Developing this kind of system may serve many security purposes and may also help disabled people (for example, deaf and mute people). Because our main goal is age estimation and not face recognition, we consider only frontal images in which the face is free from glasses or a beard.

As mentioned before, most researchers categorize ages into four classes: childhood, young, youth, and old. This classification is more or less the one we use in our main classification stage. In our system we go further by splitting each main category (each class) into two classes in what we call the secondary classification stage. The classes in the secondary stage are not partitioned equally; instead, the age partitions are based on changes in the facial features. For example, the facial features of people between 13 and 25 are very close, and the changes between 36 and 45 are almost negligible. In some cases we keep the range somewhat wide, because it is often very difficult to place the age in a narrower range.
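As a minimal sketch, the two-stage labelling can be expressed as a lookup from age to a main class and a secondary sub-range. The cut-off values below are illustrative placeholders, not the exact partitions used in this work.

```python
MAIN_CLASSES = ["child", "young", "youth", "old"]

# Hypothetical secondary ranges: each main class is split into two
# sub-ranges of unequal width, following changes in facial features.
SECONDARY_RANGES = {
    "child": [(0, 7), (8, 12)],
    "young": [(13, 25), (26, 35)],
    "youth": [(36, 45), (46, 55)],
    "old":   [(56, 65), (66, 120)],
}

def classify_age(age):
    """Return the main class and the secondary sub-range for a given age."""
    for main_class, sub_ranges in SECONDARY_RANGES.items():
        for low, high in sub_ranges:
            if low <= age <= high:
                return main_class, (low, high)
    raise ValueError("age %d outside supported range" % age)

print(classify_age(17))   # ('young', (13, 25))
```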

In the present scenario, images play a vital role in every aspect of business, including business images, satellite images, medical images, and so on. If analyzed, these data can reveal useful information to human users. Unfortunately, there are difficulties in gathering such data in the right way, and when the data are incomplete the gathered information cannot be processed further to reach any conclusion. At the same time, image retrieval is a fast-growing and challenging research area for both still and moving images.

Color-based image retrieval aims at searching image databases for specific images that are similar to a given query image. It also focuses on developing new techniques that support effective searching and browsing of large digital image libraries based on automatically derived image features. It is a rapidly expanding research area situated at the intersection of databases, information retrieval, and computer vision. This method focuses on image 'features' to enable querying, and such features have been the recent focus of studies on image databases.

These features can further be classified into low-level and high-level features. Users can query by example images based on features such as texture, color, shape, and region. The target image is then retrieved from the image repository by similarity comparison. Meanwhile, another important phase today focuses on clustering techniques.
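Before turning to clustering, the following is a minimal sketch of a low-level colour feature and of the similarity comparison just described. It assumes the image is already loaded as an H x W x 3 NumPy array (for example via PIL); the function names are illustrative only.

```python
import numpy as np

def color_histogram(image, bins_per_channel=8):
    """Compute a normalised RGB colour histogram as a low-level feature vector."""
    hist, _ = np.histogramdd(
        image.reshape(-1, 3),
        bins=(bins_per_channel,) * 3,
        range=((0, 256),) * 3,
    )
    hist = hist.flatten()
    return hist / hist.sum()          # normalise so histograms are comparable

def similarity(query_hist, target_hist):
    """Histogram intersection: 1.0 means identical colour distributions."""
    return float(np.minimum(query_hist, target_hist).sum())
```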

Clustering algorithms can offer superior organization of multidimensional data for effective retrieval, and they allow a nearest-neighbor search to be performed efficiently. Hence, images are rapidly gaining attention among researchers in the fields of data mining, information retrieval, and multimedia databases. Spatial databases are one of the concepts that play a major role in multimedia systems. Approaches that can extract semantically meaningful information from image data are increasingly in demand.

Images are generated at an increasing rate by sources such as military reconnaissance flights; defense and civilian satellites; fingerprinting devices and criminal investigation; scientific and biomedical imaging; geographic and weather information systems; stock photo databases for electronic publishing and news agencies; fabric and fashion design; art galleries and museum management; architectural and engineering design; and WWW search engines. Most existing image management systems rely on verbal descriptions to enable mining: the system is based on a keyword description of the image content, created by a user at input time, together with a pointer to the image data, and image mining is then performed with standard mining techniques. However, verbal descriptions are almost always inadequate, error prone, and time consuming. A more efficient approach is obtained when the user supplies an example image as input to the mining process. Automatic matching is then required for efficient age and gender recognition.

The basic idea is to extract characteristic features similarly to object recognition schemes. After matching, images are ordered with respect to the query image according to their similarity measures and displayed for viewing. In this work we present a framework for studying the influence of the distance function on color-based mining for identifying age. This framework assesses a system's quality from the users' viewpoint; it provides a basic set of attributes to characterize the ultimate utility of such systems. We then analyze examples of mining by color and present some conclusions.

Images present special characteristics due to the richness of the data an image can carry. Effective evaluation of content-based image results requires that the user's point of view be reflected in the performance parameters. Comparing different mining-by-similarity systems is particularly challenging owing to the great variety of methods implemented to represent likeness and the dependence of the results on the image set used. Another obstacle is the lack of parameters for comparing experimental performance. In this project we implement an evaluation framework for comparing the influence of the distance function on image mining by color, as well as a way to mine an image from its name. Experiments with color similarity mining, based on quantization of the color space and measures of likeness between a sample and the resulting images, have been carried out. Important aspects of this type of mining are also described.
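As a hedged illustration of how the distance function can be swapped while the quantized colour histograms stay fixed, the sketch below ranks a database by a chosen metric. It reuses the illustrative `color_histogram` helper sketched earlier; the metric names are assumptions, not the framework's actual terminology.

```python
import numpy as np

# Candidate distance functions whose influence on colour-based mining can be
# compared; smaller values mean more similar images (histogram intersection
# is turned into a distance as 1 - intersection).
DISTANCES = {
    "L1":           lambda a, b: float(np.abs(a - b).sum()),
    "L2":           lambda a, b: float(np.sqrt(((a - b) ** 2).sum())),
    "intersection": lambda a, b: 1.0 - float(np.minimum(a, b).sum()),
}

def rank_by_color(query_hist, db_hists, metric="L1"):
    """Rank database histograms by distance to the query histogram."""
    dist = DISTANCES[metric]
    scores = [(i, dist(query_hist, h)) for i, h in enumerate(db_hists)]
    return sorted(scores, key=lambda item: item[1])
```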

Images are among the most important binary multimedia data available in a system, and they are scattered across various drives. Document data, by contrast, are indexed by the Windows indexing service, so when a search command is issued the documents are retrieved quickly. A content-based image retrieval system, an important alternative to document searching, searches for a given image in a set of images. This search is a one-to-one comparison and is therefore time consuming. A clustering and indexing algorithm groups similar images together, so that a single match can retrieve the entire set of similar images rather than requiring every image to be searched independently.

We propose a methodology based on neural networks to estimate human age using face features. Because estimating the exact age is difficult, we developed our system to estimate the age within certain ranges. In the first stage, the age is classified into four categories that describe how old the person is: child, young, youth, and old. In the second stage, we classify each age category into two more specific ranges. What distinguishes our research is that most previous work does not consider this fine-grained tuning of age ranges. Our proposed approach has been developed, trained, and tested using the EasyNN tool.
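The original system was built with the EasyNN tool, which is a graphical package; purely for illustration, the sketch below expresses the same two-stage idea with scikit-learn's MLPClassifier. It assumes `X_train` (a NumPy array of pre-extracted face-feature vectors), `y_main_train`, and `y_sub_train` (label arrays) are already prepared; none of these names come from the original work.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Stage 1: one network maps face features to a main class
# (child / young / youth / old).
main_clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000)
main_clf.fit(X_train, y_main_train)

# Stage 2: one network per main class refines the result into one of
# two narrower age ranges (the secondary classification stage).
sub_clfs = {}
for label in ("child", "young", "youth", "old"):
    mask = (y_main_train == label)
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000)
    clf.fit(X_train[mask], y_sub_train[mask])
    sub_clfs[label] = clf

def estimate_age_range(features):
    """Return (main class, secondary age range) for one feature vector."""
    main = main_clf.predict([features])[0]
    return main, sub_clfs[main].predict([features])[0]
```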

1.1 COMPARISON OF IMAGE MINING WITH OTHER TECHNIQUES

Image mining normally deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images, building on low-level computer vision and image processing techniques. That is, the focus of image mining is the extraction of patterns from a large collection of images, whereas the focus of computer vision and image processing techniques is understanding or extracting specific features from a single image.

Figure 1: Image Mining Process

Figure 1 shows the image mining process. The images from an image database are first preprocessed to improve their quality. These images then undergo various transformations and feature extraction steps to generate the important features. With the generated features, mining is carried out using data mining techniques to discover significant patterns. The resulting patterns are evaluated and interpreted to obtain the final knowledge, which can be applied in various applications [1].
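A skeleton of this pipeline is sketched below; every stage is a placeholder to be filled with concrete techniques, and all function names are illustrative rather than part of any specific library or of the system described here.

```python
def preprocess(image):
    """Improve image quality, e.g. denoising or histogram equalisation."""
    ...

def extract_features(image):
    """Transformations and feature extraction (colour, texture, shape, ...)."""
    ...

def mine(feature_vectors):
    """Apply a data mining technique (clustering, classification, rules)."""
    ...

def evaluate(patterns):
    """Evaluate and interpret the discovered patterns into final knowledge."""
    ...

def image_mining(image_database):
    features = [extract_features(preprocess(img)) for img in image_database]
    patterns = mine(features)
    return evaluate(patterns)
```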

The field of image retrieval has been an active research area for several decades and has received more and more attention in recent years as a result of the dramatic and rapid increase in the volume of digital images. The development of the Internet has not only caused an explosive growth in the volume of digital images but has also given people more ways to obtain those images.

There were initially two approaches to content-based image retrieval. The first, proposed by database researchers, is based on attribute representation: image contents are modeled as a set of attributes that are extracted manually and maintained within the framework of conventional database management systems, and queries are specified using these attributes. This obviously involves a high level of image abstraction. The second approach, presented by image interpretation researchers, depends on an integrated feature-extraction and object-recognition subsystem to overcome the limitations of attribute-based retrieval. This subsystem automates the feature-extraction and object-recognition tasks that occur when an image is inserted into the database. These automated approaches to object recognition are computationally expensive, difficult, and tend to be domain specific. There are two major categories of features: basic features, which are concerned with extracting the boundaries of the image, and logical features, which define the image at various levels of detail. Regardless of which approach is used, retrieval in content-based image retrieval is performed by color, texture, sketch, shape, volume, spatial constraints, browsing, objective attributes, subjective attributes, motion, text, and domain concepts [2].

Content-based image retrieval has become a prominent research topic in recent years. Research interest in this field has escalated because of the proliferation of video and image data in digital form. The goal of image retrieval is to search through a database to find images that are perceptually similar to a query image. An ideal image retrieval engine would completely comprehend a given image, i.e., identify the various objects present in the image and their properties. Given the state of the art in the image analysis community, such an ideal retrieval system is far from being a reality. Moreover, retrieval based on human annotation is of no avail because of the size of video and image databases and the varying interpretations that different humans can attach to an image.

In a practical scenario, such as the Internet, the number of images can be of the order of millions and is ever growing. Even if the time required to compare two images is very short, the cumulative time needed to compare the query image with all the database images is rather long, probably longer than an average user is willing to wait. We solve this problem by grouping or clustering the images according to their similarity beforehand, so that at query time it is not necessary to perform an exhaustive comparison with all the images in the database. The clustering is performed on visual features extracted automatically from the images. Performance evaluation has been a challenging issue in the field of content-based retrieval, primarily because of the difficulty of defining quantitative measures of retrieval quality. The precision and recall measures have frequently been used to evaluate the performance of retrieval algorithms. In this work we introduce a quantitative method to evaluate the retrieval accuracy of clustering algorithms. Our goal is not to subjectively evaluate the quality of retrieval, but to quantitatively compare the performance of retrieval with and without clustering.
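The sketch below illustrates, under stated assumptions, the idea of pre-clustering the database so that a query is compared only against the most promising cluster, together with the precision and recall measures mentioned above. It assumes `db_features` is a NumPy array holding one feature vector per image (for example, the colour histograms sketched earlier); the helper names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster the feature vectors of all database images ahead of time.
kmeans = KMeans(n_clusters=20, n_init=10).fit(db_features)

def retrieve(query_feature, top_k=10):
    """Search only inside the cluster closest to the query."""
    cluster = kmeans.predict(query_feature[None, :])[0]
    candidates = np.where(kmeans.labels_ == cluster)[0]
    dists = np.linalg.norm(db_features[candidates] - query_feature, axis=1)
    return candidates[np.argsort(dists)[:top_k]]

def precision_recall(retrieved_ids, relevant_ids):
    """Standard precision/recall for one query."""
    hits = len(set(retrieved_ids) & set(relevant_ids))
    precision = hits / len(retrieved_ids) if len(retrieved_ids) else 0.0
    recall = hits / len(relevant_ids) if len(relevant_ids) else 0.0
    return precision, recall
```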

Content-based image retrieval is the retrieval of images based on visual features such as color and texture. It was developed because, in many large image databases, traditional methods of image indexing have proven insufficient, laborious, and extremely time consuming. These old methods of image indexing, which range from storing an image in the database and associating it with a keyword or number to associating it with a categorized description, have become obsolete. This is not the case in CBIR: each image stored in the database has its features extracted and compared to the features of the query image. It involves two steps (a combined sketch follows them):

Feature Extraction: The first step in the process is extracting image features to a distinguishable extent.

Matching: The second step involves matching these features to yield a result that is visually similar.
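Tying the two steps together, the minimal sketch below extracts features for the query and for every stored image, then matches by ranking distances. The `color_histogram` and `DISTANCES` helpers are the illustrative ones sketched earlier in this chapter, not part of any particular CBIR system.

```python
def cbir_query(query_image, database_images, metric="L1", top_k=5):
    query_feat = color_histogram(query_image)              # step 1: extraction
    db_feats = [color_histogram(img) for img in database_images]
    dist = DISTANCES[metric]
    ranked = sorted(range(len(db_feats)),                   # step 2: matching
                    key=lambda i: dist(query_feat, db_feats[i]))
    return ranked[:top_k]
```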

Advances in image acquisition and storage technology have led to tremendous growth in large and detailed image databases. These images, if analyzed, can reveal useful information to human users. Image mining deals with the extraction of implicit knowledge, image data relationships, or other patterns not explicitly stored in the images. Image mining is more than just an extension of data mining to the image domain: it is an interdisciplinary endeavor that draws upon expertise in computer vision, image processing, image retrieval, data mining, machine learning, databases, and artificial intelligence. Despite the development of many applications and algorithms in the individual research fields cited above, research in image mining is still in its infancy. In this work we examine the research issues in image mining, current developments in image mining, particularly image mining frameworks, and state-of-the-art techniques and systems.

The importance of an effective technique for searching and retrieving images from such huge collections cannot be overemphasized. One approach to indexing and retrieving image data is to use manual text annotations. The annotations can then be used to search images indirectly. But there are several problems with this approach. First, it is very difficult to describe the contents of an image using only a few keywords. Second, the manual annotation process is very subjective, ambiguous, and incomplete.

Those problems have created great demand for automatic and effective content-based image retrieval (CBIR) systems. Most CBIR systems use low-level image features such as color, texture, shape, and edges for image indexing and retrieval, because low-level features can be computed automatically. CBIR has emerged over the last several years as a powerful tool for efficiently retrieving images visually similar to a query image. The main idea is to represent each image as a feature vector and to measure the similarity between images by the distance between their corresponding feature vectors under some metric. Finding the right features to represent images, as well as a similarity metric that groups visually similar images together, are important steps in the construction of any CBIR system.

We study the efficiency of different clustering approaches for selecting a set of exemplar images to present in the context of a semantic concept. We evaluate these approaches on different images and compare them based on their clustering. Affinity Propagation is effective in selecting exemplars that match the top search images, but at a high computational cost.
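As a hedged sketch of exemplar selection with Affinity Propagation, the fragment below uses scikit-learn's AffinityPropagation for illustration; it assumes `db_features` holds one feature vector per image, and the parameter values are arbitrary.

```python
from sklearn.cluster import AffinityPropagation

# Affinity Propagation selects exemplar images directly from the data,
# but its pairwise-similarity computation is costly on large collections.
ap = AffinityPropagation(damping=0.9, random_state=0).fit(db_features)
exemplar_ids = ap.cluster_centers_indices_   # indices of the exemplar images
print("%d exemplar images selected out of %d" % (len(exemplar_ids), len(db_features)))
```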