Lesson: CountWords

CountWords1 Program:

Opening files for reading text

try {

BufferedReader in = new BufferedReader(new FileReader(fileName));

: :

}

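} catch (Exception e) { System.err.println("Could not read " + fileName + ": " + e); } // report the failure instead of silently ignoring it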
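The fragment above never closes the reader. A minimal sketch of one way to handle that, assuming Java 7 or later, is a try-with-resources block. The class name OpenFileSketch, the method countLines and the -1 error value are hypothetical and not part of the CountWords1 program.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper class, for illustration only.
final class OpenFileSketch {

    // Returns the number of lines in the file, or -1 if it could not be read.
    static int countLines(String fileName) {
        // try-with-resources closes the reader automatically, even on error
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            int lines = 0;
            while (in.readLine() != null) {
                lines++;
            }
            return lines;
        } catch (IOException e) {
            // FileNotFoundException is a subclass of IOException
            System.err.println("Could not read " + fileName + ": " + e.getMessage());
            return -1;
        }
    }

    public static void main(String[] args) {
        // args[0] is assumed to be the name of a text file
        if (args.length > 0) {
            System.out.println(countLines(args[0]));
        }
    }
}
```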

Splitting the line into words and counting total number of words

String line;

while ((line = in.readLine()) != null) {

String[] items = line.split("[^a-zA-Z]"); // regular expression: split at every character that is not a letter

count += items.length; // the size of the array is taken as the number of words (adjacent separators also yield empty strings)

}
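One thing to watch with this approach: split("[^a-zA-Z]") returns an empty string wherever two separators are adjacent (for example a comma followed by a space), so items.length can be larger than the real number of words. The sketch below shows one possible way to avoid that; the class SplitSketch and the method countWords are hypothetical names, not part of the lesson's program.

```java
// Hypothetical helper class, for illustration only.
final class SplitSketch {

    // Counts the words in one line, ignoring empty strings produced by split.
    static int countWords(String line) {
        int count = 0;
        // the trailing + treats a run of non-letters as a single separator
        for (String item : line.split("[^a-zA-Z]+")) {
            // a leading separator (or an empty line) still yields one empty string
            if (!item.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "Hello, world!";
        System.out.println(line.split("[^a-zA-Z]").length); // prints 3
        System.out.println(countWords(line));               // prints 2
    }
}
```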

Common operations inside a regular expression

[ … ] encloses a group of characters

[A-Z] any upper case character

[^A-Z] anything that is not an upper case character

[a-z&&[^aeiou]] any lower case letter that is not a vowel (&& is the intersection of two character classes)

X? X once or not at all (a greedy quantifier, like X* and X+ below)

X* X zero or more times

X+ X one or more times

XY X followed by Y

X|Y either X or Y

(X) a capturing group that can be referenced later, e.g. "I(.*)you" captures all characters between the words I and you; the captured text is returned by matcher.group(1)
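To illustrate the capturing group and the character-class intersection listed above, here is a small sketch using java.util.regex. The class RegexSketch and its method names are hypothetical and chosen only for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
final class RegexSketch {

    // Returns the text captured between "I" and the last "you", or null if there is no match.
    static String between(String text) {
        Matcher matcher = Pattern.compile("I(.*)you").matcher(text);
        // group(1) is the first parenthesised group; .* is greedy
        return matcher.find() ? matcher.group(1) : null;
    }

    // Removes every lower case consonant, leaving vowels and all other characters.
    static String dropConsonants(String text) {
        return text.replaceAll("[a-z&&[^aeiou]]", "");
    }

    public static void main(String[] args) {
        System.out.println("[" + between("I love you") + "]"); // prints [ love ]
        System.out.println(dropConsonants("hello"));           // prints eo
    }
}
```

Because .* is greedy, a text containing "you" twice is captured up to the last occurrence.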

Declaring and using a HashSet

HashSet<String> words = new HashSet<String>(); // no duplicates allowed

try {

: :

String[] items = line.split("[^a-zA-Z]");

count += items.length;

for(int index = 0; index < items.length; index++) {

words.add(items[index].toLowerCase().trim());

}

Useful String methods:

trim() – removes whitespace on either end of the string

toLowerCase() – makes all alphabetic characters lower case

Counting the number of unique words

System.out.println("Number of different words: " + words.size());
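The pieces above can be put together as in the following sketch. It takes the lines as an array rather than reading a file, and it skips empty strings so that they are not counted as a word; both choices are assumptions made for this example. The names UniqueWordsSketch and uniqueWords are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class UniqueWordsSketch {

    // Collects the distinct words of all lines, ignoring case.
    static Set<String> uniqueWords(String[] lines) {
        Set<String> words = new HashSet<>(); // no duplicates allowed
        for (String line : lines) {
            for (String item : line.split("[^a-zA-Z]")) {
                String word = item.toLowerCase().trim();
                // split can return empty strings; they are not words
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String[] lines = { "The cat", "the dog, THE end" };
        // the distinct words are: the, cat, dog, end
        System.out.println("Number of different words: " + uniqueWords(lines).size());
    }
}
```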

Commonly used methods for a HashSet

add(o) adds the item o to the set if it is not already present

clear() removes all elements from the set

contains(o) returns true if the set contains the item o

isEmpty() returns true if the set contains no elements

remove(o) removes o from the collection if it was present

size() returns the number of items in the set
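A short sketch exercising these HashSet methods follows. The class HashSetMethodsSketch and the method demo are hypothetical names; the set methods themselves are the standard java.util.HashSet ones.

```java
import java.util.HashSet;

// Hypothetical demo class, for illustration only.
final class HashSetMethodsSketch {

    // Runs each method once and returns the results separated by spaces.
    static String demo() {
        HashSet<String> set = new HashSet<>();
        StringBuilder out = new StringBuilder();
        out.append(set.isEmpty()).append(' ');          // a new set has no elements
        out.append(set.add("apple")).append(' ');       // true: the item was added
        out.append(set.add("apple")).append(' ');       // false: duplicates are not stored
        out.append(set.contains("apple")).append(' ');  // membership test
        out.append(set.size()).append(' ');             // one element so far
        out.append(set.remove("apple")).append(' ');    // true: the item was present
        set.add("pear");
        set.clear();                                    // removes all elements
        out.append(set.size());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true true false true 1 true 0
    }
}
```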