CSCI325: TreeMaps, Tree Sets

Problem description

Many compilers used to offer the services of a cross-referencing program to aid in debugging. Such cross-referencers will list in alphabetical order all of the identifiers that appear in a program and the various lines of the program that reference them. Write a cross-referencer for Java programs.

Input

Input may be any Java program (a file of type text, but with a 'java' file extension). Allow the user to input the file name

Processing

Read the Java program, picking off strings of characters that begin with a letter (identifiers or reserved words). For the purposes of this assignment, an identifier will be defined as any sequence of characters that begins with a letter of the alphabet, and is followed by any combination of letters or digits. All such strings will be either reserved words, pre-declared identifiers, or programmer-declared identifiers. Keep a table of reserved words (see below). A set may be appropriate here. Every time a word (they are usually called "tokens") is encountered, check this table to see if the word is in it. If so, do nothing. If it is NOT in the table, it must be an identifier. All identifiers will be placed into a TreeMap. A TreeMap is a pair of objects: a key and a value. The key in this case should be the identifier (a String), and the value should be a TreeSet of Integer (TreeSet<Integer>). If a word is not in the table set of reserved words, check to see if it is in the TreeMap. If it is not in the TreeMap, make a new entry in the tree for it and create a new TreeSet of Integers that has a single integer in it: the line number of the current line. If the word is in the TreeMap, get the TreeSet of Integers that is associated with it and add the current line number to the TreeSet.

Data Structures

Storing the reserved words: TreeSet of String

Create a TreeSet of Strings (TreeSet<String>) that holds the reserved words. Read each word from the data file, and add it to the TreeSet.

Picking off words: Regular expressions

Use regular expressions to pick off the words in the file. If you use the pattern "[a-zA-Z][a-zA-Z0-9]*", it will pick off all sequences of characters that begin with a letter and are followed by any number of letters and/or digits. This makes parsing the input file very easy.Note that there is one problem with this method: it also picks off words that are in comments and words that are part of a string literal.

There are probably multiple ways to pick off the tokens. One way involves the use of the Pattern and Matcher classes.The following lines show how to create a Pattern and a Matcher.

Pattern pattern = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

Matcher matcher = pattern.matcher(input.nextLine());

You will also want to use the following methods of the Matcher class:

  1. matcher.find(): returns true if it finds another token in the current line. When it returns false, it is time to read another line from the input file.
  2. matcher.group(): returns the next token in the current line.
Storing the other words: TreeMap of <String, TreeSet<Integer>

Create a TreeMap where the key in each pair is the word, and the value is a TreeSet of Integers where each integer represents a line number that the word occurred on.

Output

Print a list of all of the identifiers in alphabetical order, followed by a list of line numbers on which that identifier was referenced by the Java program. Print one identifier and its line numbers per line.

Assumptions

If an identifier appears on the same line more than once (e.g. x = x + 1), do NOT print the line number more than once. You may assume that all comments have been stripped from the program.

Hints

  • Your data structures are TreeMaps and TreeSets.
  • To find out if a key is in a TreeMap, use the containsKey method.
  • To retrieve a value from a TreeMap, use the get method.
  • To add a new element to a TreeMap, use the add method.
  • To add a new element to a TreeSet, use the add method.
  • To find out if an element is in a TreeSet, use the contains method.

Java reserved words

abstract, assert, boolean, break, byte, case, catch, char, class, const, continue, default, do, double, else, enum, extends, final, finally, float, for, goto, if, implements, import, instanceof, int, interface, long, native, new, package, private, protected, public, return, short, static, strictfp, super, switch, synchronized, this, throw, throws, transient, try, void, volatile, while

1Prog21--CH22--TreeMap Program Cross Referencer.docx11/17/2018