Assignment 5: CMPT 101/104, Simon Fraser University, School of Computing Science
Due by April 5, 2004
Please refer to the assignment web page for details of assignment preparation.
You are to develop a system which counts the number of words and the number of sentences in a file containing text. Your system should also count the number of occurrences in the text of each of the following words:
- The
- hope
- years
- was
- in
For each of the above words your system should build a Vector of objects, one object for each occurrence of the word in the text. These Vectors will be called indices. Each of the objects to be placed in one of these Vectors (IndexInformation objects) should contain a line number (the line number in the file of text) and a word number (the word number of the word in the line). The line number and word number counters should start at 1 for the first line in your file and for the first word in each line respectively.
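To illustrate what one index holds, the following minimal sketch scans a small in-memory text and records the position of every occurrence of one word. The names `IndexSketch` and `buildIndex`, the whitespace tokenizing, and the exact (case-sensitive) comparison are assumptions made for this example only. The nested `IndexInformation` is a cut-down stand-in for the class required in the deliverables.

```java
import java.util.StringTokenizer;
import java.util.Vector;

public class IndexSketch {
    // Cut-down stand-in for IndexInformation: one (line, word) position.
    public static class IndexInformation {
        private final int lineNumber;
        private final int wordNumber;

        public IndexInformation(int lineNumber, int wordNumber) {
            this.lineNumber = lineNumber;
            this.wordNumber = wordNumber;
        }

        @Override
        public String toString() {
            return "line # " + lineNumber + " wordnumber " + wordNumber;
        }
    }

    // Builds the index (a Vector of positions) for one word.
    // Assumption: tokens are separated by whitespace and compared exactly.
    public static Vector<IndexInformation> buildIndex(String text, String word) {
        Vector<IndexInformation> index = new Vector<IndexInformation>();
        String[] lines = text.split("\n");
        for (int i = 0; i < lines.length; i++) {
            StringTokenizer tokens = new StringTokenizer(lines[i]);
            int wordNumber = 0;
            while (tokens.hasMoreTokens()) {
                wordNumber++; // the first word on a line is word 1
                if (tokens.nextToken().equals(word)) {
                    // i + 1 because the first line of the text is line 1
                    index.add(new IndexInformation(i + 1, wordNumber));
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        for (IndexInformation info : buildIndex("The hope was\nin The years", "The")) {
            System.out.println(info);
        }
    }
}
```

With the two-line sample text in `main`, the index for "The" holds two entries: line 1, word 1 and line 2, word 2.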
Your system should
- be able to process multiple files of text without rerunning your Java application
- use a dialog box to ask the user the name of the file containing the text to be analyzed.
- process the file one token at a time, producing a count of the number of words and the number of sentences.
- build, while processing the file, one index for each word in the above list. Remember that an index is a Vector of IndexInformation objects which record the line number and word number of each occurrence of the word.
- print the resulting counts and indices to the standard output (the console screen) using the toString methods in the developed classes.
- request the next file to be analyzed once the text in the file has been analyzed and the results of the analysis printed.
- terminate the program when the user enters nothing in the text box requesting the file name and presses OK.
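The token-at-a-time processing described above can be sketched as follows. This is only an outline under stated assumptions: the class name `CountSketch` and the method `count` are hypothetical, tokens are taken to be whitespace-separated, and plain counters stand in for the watcher classes the assignment actually requires. The sentence rule (a token ending in a period, question mark, or exclamation point) is the one given for SentenceCounter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

public class CountSketch {
    // Returns {wordCount, sentenceCount} for the text supplied by the reader.
    public static int[] count(Reader source) {
        int words = 0;
        int sentences = 0;
        try {
            BufferedReader in = new BufferedReader(source);
            String line;
            while ((line = in.readLine()) != null) {
                // Assumption: a token is any run of non-whitespace characters.
                StringTokenizer tokens = new StringTokenizer(line);
                while (tokens.hasMoreTokens()) {
                    String token = tokens.nextToken();
                    words++;
                    if (token.endsWith(".") || token.endsWith("?") || token.endsWith("!")) {
                        sentences++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] { words, sentences };
    }

    public static void main(String[] args) {
        // A StringReader keeps the sketch self-contained; a FileReader would be used for a real file.
        int[] result = count(new StringReader("I hope so. Was it in 2004?\nYes!"));
        System.out.println("All words " + result[0]);
        System.out.println("Number of sentences " + result[1]);
    }
}
```

In the full program this loop would sit inside an outer loop that asks for a file name and stops when the name is empty. One possible way to ask is `javax.swing.JOptionPane.showInputDialog`, but the assignment does not name a dialog class, so that choice is an assumption.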
Your system will be built of the following classes,
- Interface class TokenWatcher
- Classes TokenCounter, SentenceCounter and TokenIndex inherited from TokenWatcher
- Class TokenCounterNoCase inherited from TokenCounter
- Class IndexInformation records line number and word number.
- Class TokenWatcherUser, which contains the main method that uses the other classes
Please note that the code for each of these classes should be short (not exceeding 50-80 lines before comments, and in some cases much shorter). The TokenWatcherUser class may be slightly longer. If your code is significantly longer you are probably missing something and might benefit from talking to a TA or the instructor. The details of the function of each of these classes are given below in the section on deliverables.
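As a rough picture of how the counting classes might fit together, here is a minimal sketch of the TokenWatcher interface with TokenCounter and TokenCounterNoCase. It is one possible reading, not a reference solution. The wrapper class `WatcherSketch`, the field names, and the use of `null` to mean "count every token" are assumptions. The types are nested only to keep the example in one file. Tokens are compared as-is, so a token with attached punctuation such as "The." would not match "The".

```java
public class WatcherSketch {
    public interface TokenWatcher {
        boolean isSame(String token);
        void reset();
        String toString();
    }

    public static class TokenCounter implements TokenWatcher {
        protected int count = 0;
        protected final String word; // assumption: null means "count all tokens"

        public TokenCounter() { this(null); }
        public TokenCounter(String word) { this.word = word; }

        public boolean isSame(String token) {
            boolean same = (word == null) || word.equals(token);
            if (same) {
                count++;
            }
            return same;
        }

        public void reset() { count = 0; }

        @Override
        public String toString() {
            return (word == null) ? "All words " + count
                                  : "sensitive to case " + word + " " + count;
        }
    }

    public static class TokenCounterNoCase extends TokenCounter {
        public TokenCounterNoCase() { super(); }
        public TokenCounterNoCase(String word) { super(word); }

        @Override
        public boolean isSame(String token) {
            boolean same = (word == null) || word.equalsIgnoreCase(token);
            if (same) {
                count++;
            }
            return same;
        }

        @Override
        public String toString() {
            return (word == null) ? "All words " + count
                                  : "insensitive to case " + word + " " + count;
        }
    }

    public static void main(String[] args) {
        TokenWatcher[] watchers = {
            new TokenCounter(), new TokenCounter("The"), new TokenCounterNoCase("The")
        };
        for (String token : "The hope of the years".split(" ")) {
            for (TokenWatcher watcher : watchers) {
                watcher.isSame(token); // every watcher sees every token
            }
        }
        for (TokenWatcher watcher : watchers) {
            System.out.println(watcher);
        }
    }
}
```

Because every class is used through the TokenWatcher interface, the driver can keep all of its watchers in one array and pass each token to all of them in the same way.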
Your deliverables must include the following
- Interface class TokenWatcher, with methods boolean isSame(String token), void reset(), and String toString()
- public Class TokenCounter inherited from TokenWatcher. Class TokenCounter implements each of the methods in TokenWatcher. TokenCounter must include the following:
- Two variable elements, the number of tokens and the String being counted
- A default constructor used to count all tokens
- A constructor which takes a String containing a particular word to count.
- An isSame method that should compare each token from the input to the String and count the occurrences of the String.
- A toString method that constructs an output string containing
- the label “All words “ followed by the number of words in the file if all words are being counted.
- The label “sensitive to case “ followed by the String being counted and the number of times that String occurred in the file
- A reset() method that should reset the TokenCounter to begin counting on a new file
- public Class TokenCounterNoCase inherited from TokenCounter. Class TokenCounterNoCase must provide the following variables, methods and constructors to the external user of the class
- A default constructor used to count all tokens
- A constructor which takes a String containing a particular word to count.
- An isSame method to compare tokens to the String and count
- the specific word ignoring the case of all letters in the String and the token from the text
- all words in the file
- A toString method that constructs an output string containing
- the label “All words “ followed by the number of words in the file (if all words are being counted).
- The label “insensitive to case “ followed by the token being counted and the number of times that token occurred in the file
- A reset() method to reset the TokenCounterNoCase to begin counting on a new file
- public Class SentenceCounter inherited from TokenCounter. Class SentenceCounter assumes that a sentence ends with a period, a question mark, or an exclamation point. SentenceCounter must include the following:
- A default constructor used to count all sentences
- An isSame method that should check the end of each token looking for the period, question mark or exclamation point that indicates the end of the sentence.
- A toString method that constructs an output string containing the label “Number of sentences “ followed by the number of sentences in the file.
- public Class IndexInformation. IndexInformation must include the following:
- Two variable elements, the line number where a token occurred and the word number in that line (the line number and word number should start at 1 for line 1 and word 1 respectively).
- A default constructor used to construct an IndexInformation object with line number and word number both equal to 0.
- A constructor used to construct an IndexInformation object with a given line number and word number
- A copy constructor which constructs an IndexInformation object with variable elements set to the same values as an existing IndexInformation object.
- nextLine() and nextWord() methods, which update the line number and word number so that they describe the token presently being processed
- A toString method that should print
- the label “line # “ followed by the line number of the occurrence of the token and continuing on the same line
- The label “wordnumber “ followed by the value of the wordnumber.
- IndexInformation objects should print one per line.
- A reset() method that resets the IndexInformation object for use on a new file
- public Class TokenIndex inherited from TokenWatcher. Class TokenIndex implements each of the methods in TokenWatcher. TokenIndex must include the following:
- Three variable elements, a String containing the token being indexed, an IndexInformation object, and a Vector of IndexInformation objects.
- A constructor which takes a String containing a particular word to index and an IndexInformation object. The constructor should build a Vector of IndexInformation objects (not an array).
- An isSame method that should compare each token from the input to the TokenIndex’s String. When a match is found, a copy of the TokenIndex’s IndexInformation object should be placed in the Vector.
- A toString method that should construct an output string from the Vector of IndexInformation objects. When this string is printed, the IndexInformation objects should print one per line.
- A reset() method that resets the TokenIndex to begin indexing a new file
- public Class TokenWatcherUser will contain a main method. This main method will use the above classes to generate the output requested in the statement of the problem.
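The requirement that TokenIndex stores a copy of its IndexInformation object suggests one plausible design: the driver advances a single shared IndexInformation as it reads, and each TokenIndex snapshots it on a match. The sketch below follows that reading. The wrapper class `TokenIndexSketch`, the field names, the choice that nextLine() restarts the word counter, and the choice that the caller resets the shared object between files are all assumptions rather than requirements.

```java
import java.util.StringTokenizer;
import java.util.Vector;

public class TokenIndexSketch {
    public interface TokenWatcher {
        boolean isSame(String token);
        void reset();
        String toString();
    }

    public static class IndexInformation {
        private int lineNumber;
        private int wordNumber;

        public IndexInformation() { this(0, 0); }
        public IndexInformation(int lineNumber, int wordNumber) {
            this.lineNumber = lineNumber;
            this.wordNumber = wordNumber;
        }
        public IndexInformation(IndexInformation other) { // copy constructor
            this(other.lineNumber, other.wordNumber);
        }
        // Assumption: moving to a new line restarts the word counter.
        public void nextLine() { lineNumber++; wordNumber = 0; }
        public void nextWord() { wordNumber++; }
        public void reset() { lineNumber = 0; wordNumber = 0; }
        @Override
        public String toString() {
            return "line # " + lineNumber + " wordnumber " + wordNumber;
        }
    }

    public static class TokenIndex implements TokenWatcher {
        private final String word;
        private final IndexInformation current; // shared position, advanced by the caller
        private final Vector<IndexInformation> index = new Vector<IndexInformation>();

        public TokenIndex(String word, IndexInformation current) {
            this.word = word;
            this.current = current;
        }
        public boolean isSame(String token) {
            boolean same = word.equals(token);
            if (same) {
                // Store a snapshot; storing 'current' itself would change as reading continues.
                index.add(new IndexInformation(current));
            }
            return same;
        }
        // Assumption: the caller resets the shared IndexInformation separately.
        public void reset() { index.clear(); }
        @Override
        public String toString() {
            StringBuilder out = new StringBuilder();
            for (IndexInformation info : index) {
                out.append(info).append("\n"); // one entry per line
            }
            return out.toString();
        }
    }

    public static void main(String[] args) {
        IndexInformation position = new IndexInformation();
        TokenIndex hope = new TokenIndex("hope", position);
        for (String line : new String[] { "I hope", "we all hope" }) {
            position.nextLine();
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                position.nextWord();
                hope.isSame(tokens.nextToken());
            }
        }
        System.out.print(hope);
    }
}
```

Without the copy, every entry in the Vector would refer to the same object and would all show the position of the last token read, which is presumably why a copy constructor is among the deliverables.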