CSCI 321 Assign 01: Hashing

Please save your project as YourLastNameAssign01(filling in YourLastName for your actual last name). Hand in a zipped folder of the entire Eclipse projectto Blackboard by the due date.No late assignments will be accepted.Any online resources used extensively should be cited in your code!

You have been instructed to implement a program to store data via a custom hashing solution. Your solution must use bucket hashing and use two different hash functions. When an item is to be placed into the hashtable, you must calculate the hashcode for each of the two algorithms. The new item should be inserted into the bucket that has the fewest entries. If they have the same amount of items, use hash function 1. If both buckets are full, you will use the overflow bucket.

The hash functions you will use are very simple and not meant to be reflective of a realistic, secure hash function. This is just meant to be an exercise in understanding the process of hashing  The descriptions for each hash function are as follows:

Hash Function 1: Take the length of the key (the word to be defined), multiply it by 17 and mod it by the number of buckets.
Hash Function 2: Take the Unicode value of the first character in the word and mod it by the number of buckets.

For your implementation, the key-value pair you are storing in the hash table is a word with its definition.Sample input files are given on Blackboard. Each input file follows the format of word semicolon space definition. You must read from the input file, and load each of the key-value pairs into the hash table. Note: the semicolon and space should not be stored in the hashtable.

Required Project Organization

  • HashEntry class – stores the key/value pair.
  • Make sure to include a toString method that will print out the key/value pair in the format:
  • key => value
  • Hashtable class – implements the hashing process as outlined above
  • should have a constructor which takes in two parameters: the size of the table and the number of items in each bucket.
  • additional attributes include: an array of HashEntry items (your hashtable), and an array (could use arraylist here) of HashEntry items to store the overflow items.
  • a put method which takes two Strings (the word & the definition) to insert an item into the table. Your put method will use that info to create a HashEntry object to store in the table.
  • a get methodwhich takes the word as a parameter and returns the appropriate definition.
  • A method called bucketContains which takes in a key (the word) and a bucket number, and returns whether that word exists in that bucket.Use a parameter of -1 to specify overflow bucket.
  • A method called getBucketSize which takes in the bucket number and returns how many items are in this bucket.Use a parameter of -1 to specify overflow bucket.
  • Note: you can also make use of this in your put method.
  • A toString method that prints out the bucket number and each of the items in the bucket (see formatting output for example of this)
  • HashDriver
  • Given to you on Blackboard. You may add lines to test your program accordingly, but when handed in, you should include the original driver class.Please note, your classes must work with the original driver.

Examples on how your program should work and how to test your program are given on the following pages.

Driver File and Command Line Arguments

I have included a Driver class on Blackboard you must use. Notice it makes use of command-line arguments for the filename, the table size, and the size of the buckets. To use command-line argument in Eclipse:

Go to RunRun Configurations

Choose the Arguments Tab and Enter the arguments:

Sample Walkthrough for in1.txt

in1.txt is the smallest of the input files and will be the quickest and easiest way to start testing your project. It contains only 20 words and definitions.Testing this input file using a tablesize of 20 and bucket size of 5 (therefore having four buckets), would proceed as follows:

adonis

hash1: (length * 17) % NO_OF_BUCKETS = (6*17) % 4 = 2

hash2: Unicode value of “a” % NO_OF_BUCKETS = 97 % 4 = 1

Nothing in either bucket, use hash function 1, place in the first index of bucket 2 (index 10 in the array)

aesculapius

hash1: (11 * 17) % 4 = 3

hash2: 97 % 4 = 1

Nothing in either bucket, use hash function 1, place in the first index of bucket 3 (index 15 in the array)

alumnae

hash1: (7 * 17) % 4 = 3

hash2: 97 % 4 = 1

Bucket 3 has 1 item, Bucket 1 has none. Use hash2 and place in first index of bucket 1 (index 5 in the array).

flattered
hash1: (9 * 17) % 4 = 1

hash2: 102 % 4 = 2

bucket 1 and bucket2 are tied, use hash1 and place in 2nd index of bucket 1 (index 6 in the array)

Continue processing as outlined above, and your hashtable will end up like:

Sample Output and Minimum Tests

Your program must meet the minimum tests outlined below. If your program fails the minimum tests specified, you will receive a best grade of 50%.

Test 1:

After reading the records from “in1.txt” and placing all entries into yourhashtable, the following code:

System.out.println(t); //t is the name of your hashtable

should provide the following output:

------Bucket 0:------

haitian => See Haytian.

haziness => The quality or state of being hazy.

lacrytory => Alt. of Lacrymose

libra => A southern constellation between Virgo and Scorpio.

verdured => Covered with verdure.

------Bucket 1:------

alumnae => of Alumna

flattered => of Flatter

galliform => Like the Gallinae (or Galliformes) in structure.

leptynite => See Granulite.

rufol => A phenol derivative of anthracene obtained as a white crystalline substance, which on oxidation produces a red dyestuff related to anthraquinone.

------Bucket 2:------

adonis => A youth beloved by Venus for his beauty. He was killed in the chase by a wild boar.

gallicized => of Gallicize

heptateuch => The first seven books of the Testament.

latian => Belonging, or relating, to Latium, a country of ancient Italy. See Latin.

------Bucket 3:------

aesculapius => The god of medicine. Hence, a physician.

frounceless => Without frounces.

hypognatous => Having the maxilla, or lower jaw, longer than the upper, as in the skimmer.

kalendarial => See Calendarial.

------Overflow ------

udder => One of the breasts of a woman.

upbraided => of Upbraid

Test 2:

The following method calls should produce the results given:
t.getBucketSize(0) = 5
t.getBucketSize(3) = 4
t.getBucketSize(-1) = 2
t.bucketContains("lacrytory", 0) = true
t.bucketContains("latian", 3) = false
t.bucketContains("udder", -1) = true

Test Case 3:

Using in2.txt, a table size of 500, and bucket size of 10, the following methods should return the results given below.

t.getBucketSize(22) = 4

t.getBucketSize(38) = 8

t.getBucketSize(-1) = 274

t.get("acquirable") should return "Capable of being acquired."

t.bucketContains("halved", 27) = false

t.bucketContains("latian", 3) = false

t.bucketContains("aluminated", 47) = true;