Lesson: CountWords
CountWords1 Program:
Opening files for reading text
try {
BufferedReader in = new BufferedReader(new FileReader(fileName));
: :
}
} catch (Exception e) {}
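A minimal sketch of the same opening pattern, assuming Java 7 or later so that try-with-resources can close the reader automatically. The class name OpenFileSketch, the method countLines, and the default file name input.txt are hypothetical and only serve the illustration. Unlike the empty catch block above, this version reports the failure instead of hiding it.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class OpenFileSketch {
    // Counts the lines in a text file; the reader is closed automatically
    // when the try block ends, even if readLine() throws.
    static int countLines(String fileName) throws IOException {
        int lines = 0;
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            while (in.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // "input.txt" is a hypothetical default used only for this sketch
        String fileName = args.length > 0 ? args[0] : "input.txt";
        try {
            System.out.println("Lines: " + countLines(fileName));
        } catch (IOException e) {
            // Report the problem rather than swallowing it silently
            System.err.println("Could not read " + fileName + ": " + e.getMessage());
        }
    }
}
```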
Splitting the line into words and counting total number of words
String line;
while ((line = in.readLine()) != null) {
String[] items = line.split("[^a-zA-Z]"); // regular expression: split at every non-letter character
count += items.length; // array length approximates the word count; adjacent separators such as ", " add empty strings
}
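The split above can return empty strings, for example between a comma and the space that follows it, so the array length may be larger than the true word count. The sketch below shows one possible way to tighten the count: it splits on runs of non-letters and skips empty items. The class name SplitCountSketch and the method countWords are hypothetical.

```java
class SplitCountSketch {
    // Counts the words in one line, where a word is a run of letters a-z or A-Z.
    static int countWords(String line) {
        int n = 0;
        // The + treats a run of separators (", " or "  ") as a single split point
        for (String item : line.split("[^a-zA-Z]+")) {
            // A line that starts with a separator still yields one leading empty item
            if (!item.isEmpty()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String line = "Hello, world!";
        System.out.println(line.split("[^a-zA-Z]").length); // counts the empty item too
        System.out.println(countWords(line));               // counts only real words
    }
}
```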
Common operations inside a regular expression
[ … ] encloses a group of characters
[A-Z] any upper case character
[^A-Z] anything that is not an upper case character
[a-z&&[^aeiou]] any lower case letter that is not a vowel (&& is the intersection of two character classes)
X? a Greedy quantifier, X once or not at all
X* X zero or more times
X+ X one or more times
XY X followed by Y
X|Y either X or Y
(X) a capturing group that can be referenced later, e.g. "I(.*)you" captures all characters between
the words I and you; the captured text can be retrieved with matcher.group(1)
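A few of the operations listed above can be tried out with java.util.regex. This is a small sketch rather than part of the CountWords program; the class name RegexSketch and the method between are hypothetical. Note that Java writes character-class intersection with a double ampersand (&&).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSketch {
    // Returns the text captured between "I" and "you", or null if there is no match.
    static String between(String text) {
        Matcher matcher = Pattern.compile("I(.*)you").matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Character class with intersection: lower case letters that are not vowels
        System.out.println("b".matches("[a-z&&[^aeiou]]"));
        System.out.println("e".matches("[a-z&&[^aeiou]]"));
        // X? : the u may appear once or not at all
        System.out.println("color".matches("colou?r"));
        // X|Y : either alternative matches
        System.out.println("cat".matches("cat|dog"));
        // (X) : capturing group read back with group(1)
        System.out.println(between("I really like you"));
    }
}
```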
Declaring and using a HashSet
HashSet<String> words = new HashSet<String>(); // no duplicates allowed
try {
: :
String[] items = line.split("[^a-zA-Z]");
count += items.length;
for(int index = 0; index < items.length; index++) {
words.add(items[index].toLowerCase().trim());
}
Useful String methods:
trim() – removes whitespace on either end of the string
toLowerCase() – makes all alphabetic characters lower case
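The two String methods matter because a HashSet compares strings exactly: "Word", " word " and "WORD" count as three different entries unless they are normalized first. The sketch below illustrates this; the class name NormalizeSketch and the method normalize are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class NormalizeSketch {
    // Strips surrounding whitespace and lower-cases the letters.
    static String normalize(String raw) {
        return raw.trim().toLowerCase();
    }

    public static void main(String[] args) {
        String[] samples = { "Word", " word ", "WORD" };

        Set<String> rawWords = new HashSet<>();
        Set<String> cleanWords = new HashSet<>();
        for (String sample : samples) {
            rawWords.add(sample);               // stored exactly as written
            cleanWords.add(normalize(sample));  // stored in normalized form
        }
        System.out.println("Without normalizing: " + rawWords.size());
        System.out.println("With normalizing: " + cleanWords.size());
    }
}
```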
Counting the number of unique words
System.out.println("Number of different words: " + words.size());
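Putting the pieces together, the following is a hedged, self-contained sketch of the whole count. It is not the lesson's CountWords1 program itself: it reads from a StringReader instead of a file so that it runs without any input file, it splits on runs of non-letters, and it skips empty items. The class name CountWordsSketch and the method count are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashSet;
import java.util.Set;

class CountWordsSketch {
    // Returns a two-element array: { total words, different words }.
    static int[] count(Reader source) throws IOException {
        int count = 0;
        Set<String> words = new HashSet<>(); // no duplicates allowed
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                for (String item : line.split("[^a-zA-Z]+")) {
                    if (item.isEmpty()) {
                        continue; // skip the empty item a leading separator produces
                    }
                    count++;
                    words.add(item.toLowerCase());
                }
            }
        }
        return new int[] { count, words.size() };
    }

    public static void main(String[] args) throws IOException {
        // Sample text made up for this sketch
        int[] result = count(new StringReader("The cat sat.\nThe dog sat, too."));
        System.out.println("Number of words: " + result[0]);
        System.out.println("Number of different words: " + result[1]);
    }
}
```

To count a real file, pass new FileReader(fileName) to count in place of the StringReader.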
Commonly used methods for a HashSet
add(o) adds the item o to the set if it is not already present
clear() removes all elements from the set
contains(o) returns true if the set contains the item o
isEmpty() returns true if the set contains no elements
remove(o) removes o from the collection if it was present
size() returns the number of items in the set
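The methods above can be exercised in a short walk-through. This is only an illustrative sketch; the class name HashSetMethodsSketch and the method walkThrough are hypothetical. Note that java.util.HashSet uses add(o) and contains(o); the key/value forms belong to HashMap.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class HashSetMethodsSketch {
    // Calls each method in turn and records what it returned.
    static List<Object> walkThrough() {
        List<Object> results = new ArrayList<>();
        Set<String> words = new HashSet<>();

        results.add(words.isEmpty());          // a new set has no elements
        results.add(words.add("apple"));       // first add changes the set
        results.add(words.add("apple"));       // duplicate add is ignored
        results.add(words.size());             // still only one element
        results.add(words.contains("apple"));  // membership test
        results.add(words.remove("apple"));    // remove reports whether o was present

        words.add("pear");
        words.clear();                         // removes everything
        results.add(words.size());
        return results;
    }

    public static void main(String[] args) {
        for (Object result : walkThrough()) {
            System.out.println(result);
        }
    }
}
```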