Topic-Oriented Exploratory Search Based on an Indexing Network

Abstract

An exploratory search may be driven by a user’s curiosity or by a desire for specific information. When users investigate unfamiliar fields, they may want to learn more about a particular subject area to increase their knowledge rather than to solve a specific problem. This work proposes a topic-oriented exploratory search method that provides browsing guidance to users. It allows them to discover new associations and knowledge, and helps them find information that interests them. An exploratory search must be judged by its ability to discover new knowledge, which the commonly used metrics fail to capture. This paper therefore defines a new set of criteria, comprising clarity, relevance, novelty, and diversity, to analyze the effectiveness of an exploratory search. Experiments compare results from the proposed method with Google’s “search related to ...” feature. The results show that the proposed method is better suited to learning new associations and discovering new knowledge that is highly likely to be relevant to a query. This work concludes that the proposed method is more suitable than Google for exploratory search.

Existing System

Existing exploratory search approaches, such as topic exploration and IQE, analyze the information in user logs and in the webpages returned by keyword matching. They focus on refining user requirements by showing several facets that help users narrow their requirements and find the desired information. They cannot satisfy needs such as finding topics related to the topic “sports.” Moreover, methods based on user logs cannot handle a “cold start,” the situation in which users enter a field that few people have searched before, so there is insufficient information to support search expansion. Finally, because constructing a proper, standard ontology is difficult, methods that require one may not perform well.

Proposed System

This work proposes a method to build a semantic association graph based on hyperlinks on the Internet. It gives a method to suggest several candidate topics based on search keywords; by interacting with users, the system then determines their browsing topics. It designs an interactive search expansion mode that expands searches in two directions to provide good browsing guidance. Unlike common search expansions, which aim to find accurate webpages for a query, the proposed expansion helps users find information that interests them by giving them closely related but disjoint keywords, and so better supports exploratory search. It defines new metrics for evaluating whether a search expansion is helpful for exploratory search. Finally, it reports several experiments comparing the proposed method with Google’s search expansion and a co-occurrence frequency method. The experimental results validate the advantage of the proposed method in performing high-quality exploratory search.

Implementation

Module Description

The modules are:

1. Exploratory Search Module

2. Expansion Keywords Module

3. Semantic Association Module

4. Query Topic Module

1. Exploratory Search Module

Exploratory search has become a new frontier in the search field. It supports search behaviors beyond simple lookup by considering contextual factors, users’ behavior, and semantic associations among information sources. It is commonly used in scientific discovery, learning, and decision-making contexts. Exploratory search offers an intelligent approach to searching that differs from traditional search engines such as Google. Topic exploration is one example of exploratory search. It demands novel systems capable of constructing effective entry points for quickly grasping the essence of a topic and the possible directions for exploring it, a need that has traditionally been served by vertical search engines.

2. Expansion Keywords Module

When expansion keywords and topics are both relevant to the browsing topic and novel, they may bring new knowledge to users and partly satisfy innate human curiosity. Such an expansion can be considered successful. It is therefore essential to define a new set of criteria for analyzing the effectiveness of exploratory search. A basic requirement is that expansion keywords must be meaningful with respect to a given search keyword.

Expansion keywords should be relevant to a query and should also show new information to users. Novelty is defined as a metric that measures the ability to explore new knowledge. Novelty is the quality of being different, new, and unusual; here it refers to how different the expansion keywords are from the search keywords. This work proposes, for the first time, a way to quantify a method’s ability to produce novel results.

3. Semantic Association Module

This module generates semantic association structures based on hyperlinks on the Internet. Webpages on the Internet are linked by hyperlinks, and a hyperlink between two webpages usually implies some real-life association. For example, there are hyperlinks between cooking webpages and ingredient sale webpages, from which one can identify an association between cooking and ingredient sale. We aim to capture this kind of association and present a semantic association graph of topics, thereby helping users with topic-oriented exploratory search.
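As a minimal sketch of this idea, and not the authors' implementation, page-level hyperlinks can be aggregated into topic-level association counts. The function name `build_topic_graph`, the `page_topic` mapping (which would in practice come from a webpage classifier), and the toy URLs are all assumptions made for illustration.

```python
from collections import Counter


def build_topic_graph(hyperlinks, page_topic):
    """Count hyperlinks that connect pages belonging to different topics.

    hyperlinks: iterable of (source_url, target_url) pairs.
    page_topic: dict mapping a URL to its topic label (hypothetical input;
                assumed to be produced by some webpage classifier).
    Returns a Counter keyed by (source_topic, target_topic).
    """
    edge_counts = Counter()
    for src, dst in hyperlinks:
        src_topic = page_topic.get(src)
        dst_topic = page_topic.get(dst)
        # Skip pages with no known topic and links that stay within one topic.
        if src_topic is None or dst_topic is None or src_topic == dst_topic:
            continue
        edge_counts[(src_topic, dst_topic)] += 1
    return edge_counts


# Toy data, made up for illustration only.
page_topic = {
    "recipes.example/pasta": "cooking",
    "recipes.example/curry": "cooking",
    "shop.example/tomatoes": "ingredient sale",
    "shop.example/spices": "ingredient sale",
}
hyperlinks = [
    ("recipes.example/pasta", "shop.example/tomatoes"),
    ("recipes.example/curry", "shop.example/spices"),
    ("recipes.example/pasta", "recipes.example/curry"),  # same topic, ignored
]
topic_graph = build_topic_graph(hyperlinks, page_topic)
```

Here the two cross-topic links yield a single weighted edge from "cooking" to "ingredient sale"; the weight gives a rough indication of how strongly the two topics are associated.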

4. Query Topic Module

Information on the Internet is redundant and unordered, and many similar webpages exist. Via webpage classification we can gather them together to benefit information analysis. Hyperlinks are created by the developers or administrators of websites and webpages. They can thus be subjective and arbitrary, and some hyperlinks are equivocal. In other words, hyperlink noise exists on the Internet. This paper uses classification and statistics to lessen its influence. Setting a proper threshold can help minimize its unfavorable impact on establishing associations among topics.
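The thresholding step could look roughly like the sketch below. The source does not say which statistic is thresholded, so using the raw number of hyperlinks between two topics is an assumption, as are the function name `filter_associations`, the parameter `min_links`, and the sample counts.

```python
def filter_associations(edge_counts, min_links=3):
    """Keep only topic pairs supported by at least `min_links` hyperlinks.

    edge_counts: dict mapping (topic_a, topic_b) to a hyperlink count.
    min_links:   hypothetical threshold; a suitable value would have to be
                 tuned on real data.
    """
    return {pair: n for pair, n in edge_counts.items() if n >= min_links}


# Made-up counts: the single cooking -> car rental link is treated as noise.
edge_counts = {
    ("cooking", "ingredient sale"): 12,
    ("cooking", "car rental"): 1,
    ("sports", "sportswear"): 7,
}
kept = filter_associations(edge_counts, min_links=3)
```

A higher threshold removes more noisy associations but may also drop genuine associations between rarely linked topics, so the value is a trade-off.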

System Requirements

H/W System Configuration

Processor - Pentium III

Speed - 1.1 GHz

RAM - 256 MB (min)

Hard Disk - 20 GB

Keyboard - Standard Windows Keyboard

Mouse - Two or Three Button Mouse

Monitor - SVGA

S/W System Configuration

Operating System : Windows 95/98/2000/XP

Application Server : Tomcat 5.0/6.x

Front End : HTML, Java, JSP

Scripts : JavaScript

Server-Side Script : JavaServer Pages

Database Connectivity : MySQL

Architecture Diagram

Algorithm

K-Means Algorithm

k-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

The problem is computationally difficult (NP-hard); however, there are efficient heuristic algorithms that are commonly employed and converge quickly to a local optimum. These are usually similar to the expectation-maximization algorithm for mixtures of Gaussian distributions via an iterative refinement approach employed by both algorithms. Additionally, they both use cluster centers to model the data; however, k-means clustering tends to find clusters of comparable spatial extent, while the expectation-maximization mechanism allows clusters to have different shapes.
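The iterative refinement heuristic can be sketched as follows. This is a minimal illustration of the standard assignment and update steps (Lloyd's algorithm), not the project's code; the function name `kmeans`, the choice to pass the starting centers explicitly, and the sample points are assumptions.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Run Lloyd's iteration from the given starting centers.

    points:  list of equal-length numeric tuples.
    centers: list of k starting centers (same dimension as the points).
    Returns (centers, labels), where labels[i] is the cluster of points[i].
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(x) / len(members) for x in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(old)  # leave an empty cluster's center as is
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, labels


# Two well-separated groups of made-up 2-D points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, labels = kmeans(points, centers=[(0, 0), (10, 10)])
```

Because the procedure only reaches a local optimum, the result depends on the starting centers; in practice it is common to run it several times from different starting points and keep the best partition.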

The algorithm has a loose relationship to the k-nearest neighbor classifier, a popular machine learning technique for classification that is often confused with k-means because of the k in the name. One can apply the 1-nearest neighbor classifier on the cluster centers obtained by k-means to classify new data into the existing clusters. This is known as the nearest centroid classifier or Rocchio algorithm.
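A short sketch of nearest centroid classification follows, assuming the cluster centers have already been computed by k-means. The function name `nearest_centroid` and the centroid values are illustrative assumptions.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point`.

    This is 1-nearest-neighbor classification applied to the cluster centers.
    """
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


# Hypothetical centers, e.g. the output of a previous k-means run.
centroids = [(0.0, 0.5), (10.0, 10.5)]
label = nearest_centroid((9.0, 9.0), centroids)
```

New data is assigned to an existing cluster without re-running the clustering, which is cheap because only k distances are computed per point.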

Conclusion

This paper presents a topic-oriented exploratory search based on a semantic association graph. Although some exploratory search studies exist, their main focus is on the implementation modes, evaluation methods, and result analysis of matching searches. They aim to refine user requirements and to show several facets of the set of search results to help users find the desired information. When users want to quickly get to know an unfamiliar field, the help these methods offer is limited. In contrast, the proposed method uses semantic associations among topics to help users find information that interests them. We focus on recommending keywords and topics that are closely related to a query and contain new knowledge, rather than on finding the desired webpages for a query. Experimental results show that the proposed method gives users more closely related but disjoint keywords. It is suitable for learning new associations and discovering new knowledge related to a query, and hence is more suitable for exploratory search. This work develops a semantic association graph based on hyperlinks on the Internet. The graph can show users the Internet environment of a browsing topic, which contains the topics related to it and the associations among them. A semantic association graph may also be established in other ways. For example, it can be built from user logs, in which case we can recommend keywords whose related websites are often visited together.

Future Enhancement

We have several interesting research directions. First, topics in this paper are designated based on Google’s artificial directory. By analyzing the experimental results, we find that the granularity of topics is coarse. To provide users with more helpful information, we are considering what a suitable topic granularity should be. We may set a topic’s granularity by means of its context and the content of its webpages. Consequently, the set of topics is no longer fixed in advance, and unsupervised topic discovery algorithms need to be used. Second, other methods to generate semantic association graphs should be explored, and more advanced methods may be adopted to extract expansion information from a semantic association graph. Furthermore, the “topic-sensitive webpage links” concept seems a good way to distinguish types of associations among topics, which requires further work. Finally, we should validate whether exploratory search can improve the user search experience.