Info-miner
User Manual
Project 2
Info-miner – A Web Search engine
CS 6362 - Software Architecture
Dr. Lawrence Chung
Athrey Joshi
Divya ChanneGowda
Tarun Belagodu
TABLE OF CONTENTS
1.INTRODUCTION
2.SYSTEM INSTALLATION
2.1Info-miner Installation
2.2Application Setup
3.USER INTERACTION
TABLE OF FIGURES
Figure 1 Index page
Figure 2 Add Kwic Page
Figure 3 Add Success Page
Figure 4 Add Kwic error page
Figure 5 Search page
Figure 6 Search result page
Figure 7 Search page with Prev Next
Figure 8 Invalid search error message
Figure 9 Cleanup Success page
1.INTRODUCTION
Info-miner– A growing web search engine
The objective of the project is to architect and implement a web search engine, Info-miner, using the KWIC system developed in the first phase of the project.
This document describes the initial set up and software configuration required for the Info-miner to run. The document also describes the typical interactions between the user and the system, i.e., the steps followed by the user in using the system.
2.SYSTEM INSTALLATION
This section describes how to install the Info-miner system and perform the initial application setup required to run it.
2.1Info-miner Installation
- The Application home page can be accessed at
- Download the Application zip folder.
- The extracted directory structure is as follows:
SCRIPTS
    create.sql
Application Root
    setEnv.bat    // sets the environment variables such as CLASSPATH, PATH, etc.
    run.bat       // deploys the web components and the Enterprise JavaBean components
    kwicEar.ear
    *.html
    *.jsp
    WEB-INF
        classes
            edu
                utd
                    kwic
                        Business components // EJB
2.2Application Setup
- Run setEnv.bat to set up the environment variables
>setEnv.bat
- Start the Cloudscape server
>cloudscape -start
- Run create.sql to create the application database ‘CloudscapeDB’
>cloudscape -isql < create.sql
- Start the J2EE server
>j2ee -verbose
- Run the deploy script run.bat to deploy the application
>run.bat
- Once deployment is complete, run the application by entering the following URL in the browser
3.USER INTERACTION
- Enter the application URL in the web browser.
The left side of the index page shows the links for adding KWIC indices, searching strings, and cleaning up outdated URLs. The right side shows the search page, with a search box where the user enters the search string, as shown in the following screen.
Figure 1 Index page
- Click the Add into KWICRepository link, which shows the input boxes for the URL and Descriptor, as shown in Figure 2.
The URL entered by the user must follow this syntax.
URL::=’ | ‘com’ | ‘org’ | ‘net’]
The Descriptor has to be of the following format.
Descriptor ::= Identifier{“ “Identifier}*
Identifier ::= {letter|digit}+
letter ::= [‘a’ | ‘b’ | … | ‘y’ | ‘z’ | ‘A’ | ‘B’ | … | ‘Y’ | ‘Z’]
digit ::= [‘1’ | ‘2’ | … | ‘9’ | ‘0’]
Example:
Valid URL and Descriptor
URL:
Descriptor:Google is one of the world’s popular search engines
Figure 2 Add Kwic Page
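The Descriptor syntax above can be checked with a regular expression. The following is a minimal sketch under a strict reading of that grammar, not the validation code shipped with Info-miner; the class name DescriptorValidator and the method isValid are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the Info-miner distribution.
final class DescriptorValidator {
    // Identifier ::= {letter|digit}+
    // Descriptor ::= Identifier {" " Identifier}*
    private static final Pattern DESCRIPTOR =
            Pattern.compile("[A-Za-z0-9]+( [A-Za-z0-9]+)*");

    private DescriptorValidator() {}

    /** Returns true if the whole string matches the Descriptor grammar. */
    static boolean isValid(String descriptor) {
        return descriptor != null && DESCRIPTOR.matcher(descriptor).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("Google search engine")); // true
        System.out.println(isValid("Google  search"));        // false: two consecutive spaces
        System.out.println(isValid("world's engines"));       // false: apostrophe is not a letter or digit
    }
}
```

Under this strict reading, any punctuation (including an apostrophe) is rejected; the deployed application may be more lenient.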
- After the user enters the URL and descriptor according to the syntax and clicks the ADD button, the following page is shown.
Figure 3 Add Success Page
- If the user enters an invalid URL or descriptor, the following error page is shown.
Figure 4 Add Kwic error page
- When the user clicks the Search button, the following page is displayed, with an input box for entering the search string.
Figure 5 Search page
The Search String follows the below syntax.
If the search string has a single word
SearchString ::= identifier {“ “identifier}*
identifier ::= {letter|digit}+
letter ::= [‘a’ | ‘b’ | … | ‘y’ | ‘z’ | ‘A’ | ‘B’ | … | ‘Y’ | ‘Z’]
digit ::= [‘1’ | ‘2’ | … | ‘9’ | ‘0’]
Otherwise, if the search string has multiple words, they are combined using the following operators:
LOGICAL ‘AND’ : *
LOGICAL ‘OR’ : +
LOGICAL ‘NOT’ : -
NOT must be followed by an identifier, and the identifier prefixed with ‘-’ must be enclosed in a pair of parentheses.
For example: (-identifier)
Operator ::= (*|+)
notOperator ::= -
The SearchString must be fully parenthesized.
Example of a valid SearchString:
((Inter+Intra)*(-pim))
This search string returns all the URLs whose descriptor contains either Inter or Intra, and does not contain pim.
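The operator rules above can be illustrated with a small recursive-descent evaluator. This is a sketch under stated assumptions, not Info-miner's own parser: it assumes the grammar Expr ::= identifier | '(' '-' identifier ')' | '(' Expr Operator Expr ')', assumes the example is written with an explicit AND as ((Inter+Intra)*(-pim)), and assumes whole-word, case-insensitive matching against the descriptor. The class name SearchStringEvaluator is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical evaluator; the grammar and matching rules are assumptions.
final class SearchStringEvaluator {
    private final String input;
    private final Set<String> words;
    private int pos = 0;

    private SearchStringEvaluator(String input, Set<String> words) {
        this.input = input;
        this.words = words;
    }

    /** Returns true if the descriptor satisfies the search string. */
    static boolean matches(String searchString, String descriptor) {
        Set<String> words = new HashSet<>(
                Arrays.asList(descriptor.toLowerCase(Locale.ROOT).split(" ")));
        SearchStringEvaluator e = new SearchStringEvaluator(searchString, words);
        boolean result = e.expr();
        if (e.pos != e.input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + e.pos);
        }
        return result;
    }

    // Expr ::= identifier | '(' '-' identifier ')' | '(' Expr ('*'|'+') Expr ')'
    private boolean expr() {
        if (peek() != '(') {
            return words.contains(identifier());
        }
        expect('(');
        boolean value;
        if (peek() == '-') {
            expect('-');                              // LOGICAL NOT
            value = !words.contains(identifier());
        } else {
            boolean left = expr();
            char op = peek();
            if (op != '*' && op != '+') {
                throw new IllegalArgumentException("Expected * or + at position " + pos);
            }
            pos++;
            boolean right = expr();
            value = (op == '*') ? (left && right)     // LOGICAL AND
                                : (left || right);    // LOGICAL OR
        }
        expect(')');
        return value;
    }

    // identifier ::= {letter|digit}+ (ASCII only, as in the grammar)
    private String identifier() {
        int start = pos;
        while (pos < input.length() && isIdChar(input.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected identifier at position " + pos);
        }
        return input.substring(start, pos).toLowerCase(Locale.ROOT);
    }

    private static boolean isIdChar(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at position " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        String query = "((Inter+Intra)*(-pim))";
        System.out.println(matches(query, "Inter process communication")); // true
        System.out.println(matches(query, "Intra pim notes"));             // false
    }
}
```

A search string that is not fully parenthesized, or that omits the operator between two groups, raises IllegalArgumentException in this sketch, which corresponds to the invalid-input case in the application.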
After entering a valid Search String and the number of results to show per page, the user clicks the Search button.
The search results do not include noise words.
Defined list of noise words: a, the, of, for, an, is, and, to, at, on, by, but. These noise words are matched case-insensitively.
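The case-insensitive noise-word rule can be sketched as a simple word filter. How the application applies the list internally is not described here, so this is only an illustration under the assumption of a word-level filter; the class name NoiseWordFilter and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not taken from the Info-miner source.
final class NoiseWordFilter {
    // The noise words listed in this manual, stored in lower case.
    private static final Set<String> NOISE_WORDS = new HashSet<>(Arrays.asList(
            "a", "the", "of", "for", "an", "is", "and", "to", "at", "on", "by", "but"));

    private NoiseWordFilter() {}

    /** Case-insensitive membership test against the noise-word list. */
    static boolean isNoiseWord(String word) {
        return NOISE_WORDS.contains(word.toLowerCase(Locale.ROOT));
    }

    /** Returns the words of the text, in order, with noise words removed. */
    static List<String> removeNoiseWords(String text) {
        List<String> kept = new ArrayList<>();
        for (String word : text.split(" ")) {
            if (!word.isEmpty() && !isNoiseWord(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Prints [History, Internet]
        System.out.println(removeNoiseWords("The History of THE Internet"));
    }
}
```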
The Result page, showing each URL (rendered as a hyperlink) and the URL descriptor matching the Search String, is shown below.
Figure 6 Search result page
If the number of result records is greater than the ‘no of results in a page to be shown’ entered by the user, the result page shows a Next radio button.
- Selecting the Next radio button and clicking the submit button displays the next set of results, with both Next and Previous radio buttons and a submit button.
Figure 7 Search page with Prev Next
Selecting the Previous or Next radio button and clicking the submit button shows the previous or next result page.
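The Previous/Next behaviour described above amounts to slicing the result list by page. The following sketch shows one way to compute a page and decide which buttons to show; it assumes zero-based page numbers and an in-memory result list, and the class name ResultPager is hypothetical rather than taken from the application.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; page indexes are zero-based by assumption.
final class ResultPager {
    private ResultPager() {}

    /** Returns the results for the given page, or an empty list past the end. */
    static <T> List<T> page(List<T> results, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long start = (long) pageIndex * pageSize;
        int from = (int) Math.min(start, results.size());
        int to = (int) Math.min(start + pageSize, results.size());
        return results.subList(from, to);
    }

    /** Next is offered only when more results exist after this page. */
    static boolean hasNext(int totalResults, int pageIndex, int pageSize) {
        return (long) (pageIndex + 1) * pageSize < totalResults;
    }

    /** Previous is offered on every page except the first. */
    static boolean hasPrevious(int pageIndex) {
        return pageIndex > 0;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("u1", "u2", "u3", "u4", "u5");
        System.out.println(page(urls, 1, 2));            // [u3, u4]
        System.out.println(hasNext(urls.size(), 1, 2));  // true
        System.out.println(hasPrevious(1));              // true
    }
}
```

With five results and two results per page, the first page offers only Next, the second offers both, and the third offers only Previous.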
- If a valid search string is not entered, an error message page is displayed, as shown below.
Figure 8 Invalid input SearchString error message
- Clicking the CleanUp link deletes old records (one or more days old) and displays a page with the number of records deleted.
Figure 9 Cleanup Success page
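The cleanup rule (delete records one or more days old and report how many were deleted) can be sketched as follows. The real application stores its records in the Cloudscape database, so this in-memory version is only an illustration; the KwicRecord type, its fields, and the class name RecordCleanup are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for the database cleanup; names are assumptions.
final class RecordCleanup {
    static final class KwicRecord {
        final String url;
        final Instant addedAt;

        KwicRecord(String url, Instant addedAt) {
            this.url = url;
            this.addedAt = addedAt;
        }
    }

    private RecordCleanup() {}

    /** Removes records added one day or more before 'now'; returns the number removed. */
    static int removeOldRecords(List<KwicRecord> records, Instant now) {
        Instant cutoff = now.minus(Duration.ofDays(1));
        int before = records.size();
        // A record is old when its timestamp is at or before the cutoff.
        records.removeIf(r -> !r.addedAt.isAfter(cutoff));
        return before - records.size();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-10T00:00:00Z");
        List<KwicRecord> records = new ArrayList<>(Arrays.asList(
                new KwicRecord("old", Instant.parse("2024-01-08T00:00:00Z")),
                new KwicRecord("recent", Instant.parse("2024-01-09T12:00:00Z"))));
        System.out.println(removeOldRecords(records, now)); // 1
    }
}
```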