Name: Anthony Ku Ong [ 991040119]

TA: Diego Macrini

CSC228H – File Structures and Data Management - 2002

Assignment Two - Martian Explorer (Part A)

Summary of Classes and Data Structures:

For this assignment, I plan to have several classes. First there needs to be classes, PrimaryIndex , and SecondaryIndex which will build the Indexes. Also I plan to have a BinarySearchTree class to sort the secondary indexes, and a LinkedList class to create the inverted lists. I also will need a TestDriver Class which will consist of a main method to accept command line arguments and create the Primary and Secondary structures to execute the program.

Design Decisions

Primary Index

The structure of the primary index will compose of two things: It will have a (a)primary key and a (b)reference field giving the address of the byte of the corresponding data record.

(a)The Primary Key to be used:

Since the primary index uniquely identifies one record from another, thus a good primary index would be the two characters for the location combined with the compound. Denote this key as the Location Compound. The canonical form of the Location Compound will consist of the form of the location field followed immediately by the terrain type at this location. Ie. f7crevasse

(b) The Reference Field will be a pointer to the appropriate record in the input file. I will be using a Relative Record Number (RRN) which will be a field represented by a number as an index to the location of the record on file.

The class PrimaryIndex contains two fields, the Primary Key as mentioned above and a RRN pointer. The Key must be a character string, and it must also contain the Relative Record Number field as a pointer.

class PrimaryIndex

{

public:

char Key [26];

RRN ptr;

int ReadRRN (&Record, int RRN);

int numRecords; //Number of records in primary index

};

To Create a PrimaryIndex

void ReadbyRRN (RecordTo Find, int RRN){

Seek current RRN index * 26 characters from current position

}

void Build()

LOOP: Read the input file until end of file{

Read the Input and call ReadbyRRN() for next Record to write.

Associate the Relative Record Number pointer with this index

Write current record information to primary_index.dat

}

Searching for the a given key in the Primary index with the BinarySearch() function.

int BinarySearch(InputFile, RecordToFind, PrimaryKey){

int low =0;

int high = InputFile.NumRecs( ) – 1;

LOOP: (low <= high){

int guess = (high + low)/2;

file.ReadbyRRN(RecordToFind, guess);

if (RecordToFind.key () == PrimaryKey) return 1;

if (RecordToFind.key() > PrimaryKey ) high = guess – 1;

if (RecordToFind.key() < Primarykey) low = guess +1;

}

return 0; //Didn’t find key.

}

Secondary Indexes

We want to be able to create multiple indices. We have both a substance secondary index, and a location secondary index. Both secondary indexes will store and sort information using the structure of a Binary Search Tree.

1. Substance Index – has a secondary key to the substance name and a pointer to a Linked List of primary keys.

2. Soil Type/Location Index. – has a secondary key to the soil type, and a pointer to a Linked List of primary keys

Each secondary index will have a binary search tree structure associated with it. Each secondary key (substance name & soil type) will be the Nodes of the BST. And each Node will be the head of a linked list which will be our Inverted List structure which will hold pointers to the Primary Keys.

class BinarySearchTree { BSTNode parent, BSTNode right, BSTNode left };

Functions: The BinarySearchTree should contain a function insertKey() and BuildBST() which will build BST structure by inserting the next secondary key where all the left children is smaller than the parent, and the right children is greater than the parent.

class LinkedList { head, link, primaryKeyData ptr };

Functions: The LinkedList structure should contain a constructor to create the inverted list and a function append() which will append primary key pointers to this linked list structure to follow back to the primary index.

Secondary Index Class for both Indexes.

class SecondaryIndex

{

public:

char SecondaryKey [ ];

BinarySearchTree to sort the secondary keys

LinkedList Structure for Each Node in the BST

};

Creation of Substance Index/ Location Index

Seek to the 3rd position to get the first compound substance index.

(or seek to the 13th position to get the first location index)

LOOP: Read the input file until end of file{

Sort the current substance field(or location field) by placing the record in a sorted BST.

If there’s a duplicate entry, the BST Node which is the head

of an Inverted List will create a new link and store its corresponding information regarding its primary key record.

Write the record to substance_index.dat (or location_index.dat)

Seek index to next 26 characters from current position to get the next record

}

Creation of Primary Index

Index class.

Since there are three indexes, a primary index, and two secondary indexes, we need to create an Index class to create these indexes.

To create a primary index, we will do this:

// Open the index file.

// Parameter: i is the index file name.

Constructor:

Index(char *indexFileName) {

index.open(indexFileName, ios::in |ios::out);

}

Searching for compound X

(1)  Goes to the secondary index - compound

(2)  Finds compound X in the secondary index which is stored in a BST

(3)  in that node where compound X is stored, it will contain a linked list of the primary keys

associated with it, and each primary key will contain the byte offset so we know where it is on file.

Classes to be Used

Data Structures to be Used:

Linked List, Binary Search Tree.

Pseudo-code description of algorithms.

Algorithms

for creating primary and secondary key files

for searching by key word

for compound searching

//Insert and write index to file

if

For each of the index files, the records will be organized like a binary search tree.

The functions in the class are used to search for, maintain and update information in the index file.

The constructor opens a given index file for both reading and writing.

The function searchWord() searches for a given word in the index using binary search.

Each node will have two pointers: a left and right pointer to potential children.

The function insert() inserts an index record to the index file.

QAgain this will make a good primar key as it should provide a unique key for each entry in the file.

the primary index should store the type of terrian since for every locationis, there is only one type of terrain.

Seconary Index

1. Compound Index – has a secondary key to the substance name and a pointer to a list of pirmary keys.

2.Soil Type index. – has a secondary key to the soil type, and a pointer to a list of primary keys

Queries

// To instantiate this index the user codes:

// Index<ExampleIndex> Example;

// This creates Example as an Index that supports Index Records of the

// ExampleIndex type.

// Note that we are using a Singly Linked List to support our Index in memory