Info-miner

User Manual

Project 2

Info-miner – A Web Search engine

CS 6362 - Software Architecture

Dr. Lawrence Chung

Athrey Joshi

Divya ChanneGowda

Tarun Belagodu

TABLE OF CONTENTS

1.INTRODUCTION

2.SYSTEM INSTALLATION

2.1Info-miner Installation

2.2Application Set Up

3.USER INTERACTION

TABLE OF FIGURES

Figure 1 Index page

Figure 2 Add Kwic Page

Figure 3 Add Success Page

Figure 4 Add Kwic error page

Figure 5 Search page

Figure 6 Search result page

Figure 7 Search page with Prev Next

Figure 8 Invalid input SearchString error message

Figure 9 Cleanup Success page

1.INTRODUCTION

Info-miner – A growing web search engine

The objective of the project is to architect and implement a web search engine, Info-miner, using the KWIC system developed in the first phase of the project.

This document describes the initial setup and software configuration required to run Info-miner. It also describes the typical interactions between the user and the system, i.e., the steps the user follows when using the system.

2.SYSTEM INSTALLATION

This section describes how to install the Info-miner system and the initial application setup required to run it.

2.1Info-miner Installation

  • The Application home page can be accessed at
  • Download the Application zip folder.
  • The extracted directory structure is as follows

SCRIPTS
    create.sql

Application Root
    setEnv.bat      // for setting the environment variables like CLASSPATH, PATH, etc.
    run.bat         // for deploying the web components and the Enterprise JavaBean components
    kwicEar.ear
    *.html
    *.jsp
    WEB-INF
        classes
            edu
                utd
                    kwic
                        Business components // EJB

2.2Application Set Up

  • Run setEnv.bat to set up the environment variables

>setEnv.bat

  • Start the Cloudscape server

>cloudscape -start

  • Run the create.sql script to create the application database ‘CloudscapeDB’

>cloudscape -isql < create.sql

  • Start the J2EE server

>j2ee -verbose

  • Run the deploy script run.bat to deploy the application

>run.bat

  • Once deployment is complete, run the application by entering the following URL in the browser

3.USER INTERACTION

  • Enter the application URL

in the web-browser.

The left side of the index page shows the links to add KWIC indices, search strings, and clean up outdated URLs. The right side shows the search page, with a search box where the user enters the search string, as shown in the following screen.

Figure 1 Index page

  • Click the Add into KWICRepository link, which shows the input boxes for the URL and the Descriptor, as shown in Figure 2.

The URL entered by the user must follow the syntax below.

URL::=’ | ‘com’ | ‘org’ | ‘net’]

The Descriptor must have the following format.

Descriptor ::= Identifier{“ “Identifier}*

Identifier ::= {letter|digit}+

letter ::= [‘a’ | ‘b’ | … | ‘y’ | ‘z’ | ‘A’ | ‘B’ | … | ‘Y’ | ‘Z’]

digit ::= [‘1’ | ‘2’ | … | ‘9’ | ‘0’]
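The Descriptor grammar above maps naturally onto a regular expression. The following is a minimal sketch of such a check, not the validation code used by Info-miner; the class name DescriptorSyntax and the method isValid are hypothetical, and the sketch assumes that exactly one space separates identifiers, as the grammar is written.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the names here are illustrative and are not taken from Info-miner.
final class DescriptorSyntax {
    // Descriptor ::= Identifier {" " Identifier}*   Identifier ::= {letter|digit}+
    private static final Pattern DESCRIPTOR =
            Pattern.compile("[A-Za-z0-9]+( [A-Za-z0-9]+)*");

    private DescriptorSyntax() {}

    /** Returns true when the whole input matches the Descriptor grammar. */
    static boolean isValid(String descriptor) {
        return descriptor != null && DESCRIPTOR.matcher(descriptor).matches();
    }
}
```

With this sketch, DescriptorSyntax.isValid("KWIC index system 2") is accepted, while an empty string, a string with two consecutive spaces, or a string containing punctuation is rejected.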

Example:

Valid URL and Descriptor

URL:

Descriptor: Google is one of the world’s popular search engines

Figure 2 Add Kwic Page

  • After the user enters a URL and a descriptor that follow the syntax and clicks the ADD button, the following page is shown.

Figure 3 Add Success Page

  • If the user enters an invalid URL or descriptor, the following error page is shown.

Figure 4 Add Kwic error page

  • When the user clicks the Search button, the following page is displayed. It has an input box for entering the search string.

Figure 5 Search page

The search string follows the syntax below.

If the search string has a single word

SearchString ::= identifier {“ “identifier}*

identifier ::= {letter|digit}+

letter ::= [‘a’ | ‘b’ | … | ‘y’ | ‘z’ | ‘A’ | ‘B’ | … | ‘Y’ | ‘Z’]

digit ::= [‘1’ | ‘2’ | … | ‘9’ | ‘0’]

else if the search string has multiple words

LOGICAL ‘AND’ : *

LOGICAL ‘OR’ : +

LOGICAL ‘NOT’ : -

NOT must be followed by an identifier, and the identifier prefixed with ‘-’ must be placed within a pair of parentheses.

For example: (-identifier)

Operator ::= (*|+)

notOperator ::= -

SearchString should be a fully parenthesized sentence.

Example of a valid SearchString:

((‘Inter’+Intra)(-pim))

This search string returns all the URLs whose descriptor contains either Inter or Intra and does not contain pim.
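To illustrate how a fully parenthesized search string can be evaluated against one descriptor, here is a small recursive-descent sketch. It is not the Info-miner implementation. It rests on several assumptions: the class SearchExpression and its methods are hypothetical; every pair of sub-expressions is joined by an explicit * or +, so the example would be written ((Inter+Intra)*(-pim)); the search string contains no spaces or quote characters; and a word matches only when it appears as a whole, case-sensitive word in the descriptor.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical evaluator for the assumed grammar:
//   expr ::= identifier | "(" "-" identifier ")" | "(" expr ("*" | "+") expr ")"
final class SearchExpression {
    private final String text;
    private final Set<String> words;
    private int pos;

    private SearchExpression(String text, Set<String> words) {
        this.text = text;
        this.words = words;
    }

    /** Returns true when the descriptor satisfies the search string. */
    static boolean matches(String searchString, String descriptor) {
        Set<String> words = new HashSet<>(Arrays.asList(descriptor.split(" ")));
        SearchExpression parser = new SearchExpression(searchString, words);
        boolean result = parser.expr();
        if (parser.pos != parser.text.length()) {
            throw parser.error("end of input");
        }
        return result;
    }

    private boolean expr() {
        if (peek() != '(') {
            return words.contains(identifier());
        }
        expect('(');
        boolean result;
        if (peek() == '-') {                 // NOT: (-identifier)
            expect('-');
            result = !words.contains(identifier());
        } else {                             // AND / OR: (expr op expr)
            boolean left = expr();
            char op = peek();
            if (op != '*' && op != '+') {
                throw error("operator * or +");
            }
            pos++;
            boolean right = expr();          // always parsed, so input is fully consumed
            result = (op == '*') ? (left && right) : (left || right);
        }
        expect(')');
        return result;
    }

    private String identifier() {
        int start = pos;
        while (pos < text.length() && isLetterOrDigit(text.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw error("identifier");
        }
        return text.substring(start, pos);
    }

    private static boolean isLetterOrDigit(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw error("'" + c + "'");
        }
        pos++;
    }

    private IllegalArgumentException error(String expected) {
        return new IllegalArgumentException("Expected " + expected + " at position " + pos);
    }
}
```

Under these assumptions, SearchExpression.matches("((Inter+Intra)*(-pim))", "Intra office mail") is true, while the same search string against the descriptor "Inter pim tools" is false. A malformed search string raises an IllegalArgumentException, which corresponds to the error page case.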

The user enters a valid search string and the number of results to show per page, and then clicks the Search button.

The search results do not include noise words.

Defined list of noise words: a, the, of, for, an, is, and, to, at, on, by, but. These noise words are case-insensitive.
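The case-insensitive noise word rule can be sketched as follows. This is one possible reading of the rule (dropping noise words from a space-separated descriptor), not the Info-miner code; the class NoiseWords and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: the word list comes from the manual, the names do not.
final class NoiseWords {
    private static final Set<String> NOISE = new HashSet<>(Arrays.asList(
            "a", "the", "of", "for", "an", "is", "and", "to", "at", "on", "by", "but"));

    private NoiseWords() {}

    /** Case-insensitive membership test against the defined noise word list. */
    static boolean isNoise(String word) {
        return NOISE.contains(word.toLowerCase(Locale.ROOT));
    }

    /** Returns the words of a space-separated descriptor with noise words removed. */
    static List<String> withoutNoise(String descriptor) {
        List<String> kept = new ArrayList<>();
        for (String word : descriptor.split(" ")) {
            if (!word.isEmpty() && !isNoise(word)) {
                kept.add(word);
            }
        }
        return kept;
    }
}
```

For example, NoiseWords.withoutNoise("The art OF the search") keeps only the words art and search.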

The result page, which lists each URL (shown as a hyperlink) and the URL descriptor matching the search string, is shown below.

Figure 6 Search result page

If the number of result records exceeds the number of results per page entered by the user, the result page shows a Next radio button.

  • Selecting the Next radio button and clicking the submit button displays the next set of results, together with both Next and Previous radio buttons and the submit button.

Figure 7 Search page with Prev Next

Selecting the Previous or Next radio button and clicking the submit button shows the previous or next result page.
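The Previous/Next behaviour described above amounts to slicing the result list by page. A minimal sketch follows, assuming a zero-based page index; the class ResultPager and its methods are hypothetical and are not part of Info-miner.

```java
import java.util.List;

// Hypothetical paging helper: pageIndex is zero-based, pageSize is the
// number of results per page entered by the user.
final class ResultPager {
    private ResultPager() {}

    /** Returns the results belonging to the requested page (possibly empty). */
    static <T> List<T> page(List<T> results, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        int from = (int) Math.min((long) pageIndex * pageSize, results.size());
        int to = (int) Math.min((long) from + pageSize, results.size());
        return results.subList(from, to);
    }

    /** True when more results exist after this page, i.e. Next should be offered. */
    static boolean hasNext(int totalResults, int pageIndex, int pageSize) {
        return (long) (pageIndex + 1) * pageSize < totalResults;
    }

    /** True when an earlier page exists, i.e. Previous should be offered. */
    static boolean hasPrevious(int pageIndex) {
        return pageIndex > 0;
    }
}
```

With five results and two results per page, the first page offers only Next, the second page offers both Previous and Next, and the third page holds the last result and offers only Previous.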

  • If a valid search string is not entered, an error message page is displayed, as shown below.

Figure 8 Invalid input SearchString error message

  • When the user clicks the CleanUp link, old records (one or more days old) are deleted, and a page showing the number of records deleted is displayed.

Figure 9 Cleanup Success page
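The cleanup rule (delete records that are one or more days old and report how many were deleted) can be sketched in memory as shown here. This is only an illustration of the rule: Info-miner keeps its records in the Cloudscape database, and the class RepositoryCleanup, the map of URL to insertion time, and the use of java.time are all assumptions made for this sketch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Iterator;
import java.util.Map;

// Hypothetical in-memory stand-in for the repository cleanup.
final class RepositoryCleanup {
    // Records this old or older are treated as outdated.
    static final Duration MAX_AGE = Duration.ofDays(1);

    private RepositoryCleanup() {}

    /** Removes entries added at least MAX_AGE before 'now' and returns the number removed. */
    static int removeOutdated(Map<String, Instant> addedAtByUrl, Instant now) {
        Instant cutoff = now.minus(MAX_AGE);
        int removed = 0;
        Iterator<Map.Entry<String, Instant>> it = addedAtByUrl.entrySet().iterator();
        while (it.hasNext()) {
            Instant addedAt = it.next().getValue();
            if (!addedAt.isAfter(cutoff)) {   // added at or before the cutoff
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

The returned count corresponds to the number of records deleted that the cleanup success page reports.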