Chapter 5 Rule E2-2 and the Horspool Algorithm

In Chapter 4, we introduced Rule E2, which is a substring matching rule: given a substring U in the window W of the text string T, we try to find whether there exists a substring identical to U lying to the left of U in the pattern string P. In Chapter 4, we also introduced a variant of Rule E2, namely Rule E2-1, in which the substring U is the longest suffix of W which is equal to a prefix of P. In this chapter, we shall introduce another variant of Rule E2.

Section 5.1 Rule E2-2: The 1-Suffix Rule

Consider Fig. 5.1-1. Note that the last character of W is x. If we have to move the pattern P, we must find the rightmost x in P, excluding the last position of P, if such an x exists, and align it with the x in W, as shown in Fig. 5.1-1(b). If no such x exists in P, we move P entirely past the x, as shown in Fig. 5.1-1(c).

Fig. 5.1-1 The Basic Idea of Rule E2-2

The following is a formal statement of Rule E2-2.

Rule E2-2: We are given a text string T, a pattern string P of length m, and a window W of T which is aligned with P. Assume that the last character of W is x and we have to shift P. If x exists in P_1 P_2 ... P_{m-1}, let j be the location of the rightmost x in P_1 P_2 ... P_{m-1}; shift P to such an extent that P_j is aligned with the x in W. If no x exists in P_1 P_2 ... P_{m-1}, shift P to such an extent that P_1 is aligned with the character of T immediately to the right of the x in W.
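A direct way to read Rule E2-2 is as a function from the head character x to a shift distance. The following Python sketch is a minimal illustration of this reading; the function name rule_e2_2_shift and the 0-based indexing are our own choices, not part of the rule's statement.

```python
def rule_e2_2_shift(pattern, x):
    """Number of steps Rule E2-2 shifts P when the last character of the
    window is x: the distance from the rightmost x in P_1 ... P_{m-1} to
    the end of P, or m if x does not occur there."""
    m = len(pattern)
    for j in range(m - 2, -1, -1):        # scan P_1 ... P_{m-1} from the right
        if pattern[j] == x:
            return (m - 1) - j            # align that x with the x in the window
    return m                              # x absent: move P past the x entirely

print(rule_e2_2_shift("aggttgaat", "t"))  # prints 4
print(rule_e2_2_shift("aggttgaat", "c"))  # prints 9
```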

Section 5.2 The Horspool Algorithm

The Horspool Algorithm scans the window from the right, as shown in Fig. 5.2-1. We compare the characters of the text and the pattern, pair by pair, until we find a mismatch. After we find a mismatch, we know that we should shift the pattern to the right. The shifting is based upon Rule E2-2.

Fig. 5.2-1 The right to left scanning in the Horspool Algorithm

To implement Rule E2-2, we must have a mechanism which finds, for a given character x, the location of the rightmost x in P_1 P_2 ... P_{m-1}. This is not done at run time. Instead, we do a pre-processing.

Definition 5.2-1 The Location Table for the Horspool Algorithm

Given an alphabet set A with σ characters and a pattern P with length m, we create a table, denoted as the location table of P, containing σ entries, one for each character x of A. The entry of x stores the location of the rightmost x in P_1 P_2 ... P_{m-1}, counted from location m; that is, it stores m minus the position of the rightmost x in P_1 P_2 ... P_{m-1}, if such an x exists. If x does not exist in P_1 P_2 ... P_{m-1}, we store m in the entry.

Example 5.2-1

Let P = aggttgaat. The location table of P is displayed as follows:

Table 5.2-1 The location table of P = aggttgaat

Character: a / c / g / t
Entry: 1 / 9 / 3 / 4
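To make the pre-processing concrete, the following Python sketch builds the location table as a dictionary keyed by character; the function name build_location_table and the dictionary representation are illustrative choices rather than part of Definition 5.2-1.

```python
def build_location_table(pattern, alphabet):
    """Location table of Definition 5.2-1: for each character x, store m minus
    the position of the rightmost x in P_1 ... P_{m-1}; if x does not occur
    there, store m (the length of the pattern)."""
    m = len(pattern)
    table = {x: m for x in alphabet}      # default: x absent, entry is m
    for j in range(m - 1):                # positions 1 .. m-1 (0-based: 0 .. m-2)
        table[pattern[j]] = (m - 1) - j   # later occurrences overwrite earlier ones
    return table

# Reproduces Table 5.2-1 for P = aggttgaat:
print(build_location_table("aggttgaat", "acgt"))   # {'a': 1, 'c': 9, 'g': 3, 't': 4}
```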

Each time we have to move the pattern P, we consult the location table of P.

For instance, consider the case shown in Fig. 5.2-2.

T = a c c g a g g t t g a a t t g c
P = a g g t t g a a t

Fig. 5.2-2 An example for the Horspool Algorithm

As can be seen, we have to move P. The last character of the window is t. There are two t's in P_1 P_2 ... P_8; both of them can be seen in Fig. 5.2-2. We consult Table 5.2-1. The entry of t is 4. We therefore move P 4 steps to the right, as shown in Fig. 5.2-3. A match is now found.

T = a c c g a g g t t g a a t t g c
P =         a g g t t g a a t

Fig. 5.2-3 The moving of P in the Horspool Algorithm

The Horspool Algorithm is very similar to the Reverse Factor Algorithm. It is now given as Algorithm 5.1 below:

Algorithm 5.1 The Horspool Algorithm Based upon Rule E2-2

Input: A text string T and a pattern string P, with lengths n and m respectively.

Output: All occurrences of P in T.

Construct the location table of P.

Set i = 1.

Step 1: Let W = T_i T_{i+1} ... T_{i+m-1} be a window.

Align P_1 with T_i.

Set j = m and k = i + m - 1.

If i + m - 1 > n, exit.

While j ≥ 1 and T_k = P_j
    Set j = j - 1 and k = k - 1.
End of While

If j = 0, report that T_i T_{i+1} ... T_{i+m-1} is an exact match of P at location i.

Find the entry of T_{i+m-1} in the location table of P. Let it be denoted as d.

Set i = i + d.

Go to Step 1.
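The following Python sketch is one way to implement Algorithm 5.1; the function name horspool_search, the 0-based indexing, and the dictionary used for the location table are implementation choices, not part of the pseudocode above.

```python
def horspool_search(text, pattern, alphabet):
    """Report all starting locations (1-based) at which pattern occurs in text,
    shifting the window according to Rule E2-2."""
    n, m = len(text), len(pattern)
    # Pre-processing: the location table of Definition 5.2-1.
    table = {x: m for x in alphabet}
    for j in range(m - 1):
        table[pattern[j]] = (m - 1) - j
    occurrences = []
    i = 0                                             # window is text[i .. i+m-1]
    while i + m <= n:                                 # Step 1: window still inside text
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:   # right-to-left scan of the window
            j -= 1
        if j < 0:                                     # all m characters matched
            occurrences.append(i + 1)                 # report the 1-based location
        i += table[text[i + m - 1]]                   # shift by the entry of the head
    return occurrences

# The example of Fig. 5.2-2 and Fig. 5.2-3:
print(horspool_search("accgaggttgaattgc", "aggttgaat", "acgt"))   # prints [5]
```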

Example 5.2-2

Let T = aagttattgac and P = att. The location table of P is as shown in Table 5.2-2.

Table 5.2-2 The location table of P = att

Character: a / c / g / t
Entry: 2 / 3 / 3 / 1

The Horspool Algorithm is initialized as shown in Fig. 5.2-4.

T = a a g t t a t t g a c
P = a t t

Fig. 5.2-4 The initial alignment for Example 5.2-2

The last character of the window is g, which is not found in P. We move P 3 steps, as shown in Fig. 5.2-5.

T = a a g t t a t t g a c
P =       a t t

Fig. 5.2-5 The first movement of P in Example 5.2-2

The last character of the window is a, which is found in P. From Table 5.2-2, we know that we should move P 2 steps, as shown in Fig. 5.2-6.

T = a a g t t a t t g a c
P =           a t t

Fig. 5.2-6 The second movement of P in Example 5.2-2

A match is found. Since the last character of the window is t and its entry in Table 5.2-2 is 1, P is moved 1 step, as shown in Fig. 5.2-7.

T = a a g t t a t t g a c
P =             a t t

Fig. 5.2-7 The third movement of P in Example 5.2-2

The last character of the window, namely g, does not exist in P. P is moved 3 steps, as shown in Fig. 5.2-8. The window now extends beyond the end of T, so the algorithm terminates.

T = a a g t t a t t g a c
P =                   a t t

Fig. 5.2-8 The fourth movement of P in Example 5.2-2
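Under the assumption that the sketch of Algorithm 5.1 given earlier is modified to print its intermediate state, the movements of Figs. 5.2-4 to 5.2-8 can be traced as follows; the function name trace_horspool is ours.

```python
def trace_horspool(text, pattern, alphabet):
    """Print, for every window, its starting location, its head character,
    whether it is an exact match, and the shift taken."""
    n, m = len(text), len(pattern)
    table = {x: m for x in alphabet}
    for j in range(m - 1):
        table[pattern[j]] = (m - 1) - j
    i = 0
    while i + m <= n:
        head = text[i + m - 1]
        matched = text[i:i + m] == pattern
        print(f"window at {i + 1}: head '{head}', "
              f"{'match' if matched else 'no match'}, shift {table[head]}")
        i += table[head]

trace_horspool("aagttattgac", "att", "acgt")
# window at 1: head 'g', no match, shift 3
# window at 4: head 'a', no match, shift 2
# window at 6: head 't', match, shift 1
# window at 7: head 'g', no match, shift 3
```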

Section 5.3 The Time-Complexity Analysis of the Horspool Algorithm

The worst case time-complexity of the Horspool Algorithm is easy to obtain. Let σ denote the number of characters in the alphabet. Then

Preprocessing phase: O(m + σ) time and O(σ) space complexity.

Searching phase: O(mn) time complexity in the worst case.

Before we analyze the average number of comparisons of this algorithm, we must state our probabilistic assumptions. We assume that the distribution of the characters occurring in T or in P is uniform. That is, the random variable X of the characters, ranging over the σ-character alphabet A, satisfies P(X = x) = 1/σ for any x in A.

We shall assume that the given pattern string P is fixed and that the text string T is random. We first define a term called a "head".

Definition 5.3-1 The last character of a window is called a head.

To obtain an average case analysis of the Horspool Algorithm, we must know the probability that a location of the text is a head, denoted as p_h. It is intuitive and correct that p_h is the same for all locations because, so far as being a head is concerned, there is no reason for one location to be different from another. To find p_h, we denote the average number of steps of a shift by E(s). With this term, we may easily have the following equation:

p_h = 1 / E(s).    (5.3-1)

Let us imagine that E(s) = 1. Then obviously every location will be a head. Suppose that E(s) = 2. It is expected that half of the locations in T will be heads. If the average number of steps of a shift is large, then only a small fraction of the locations in T will be heads.

Let d(x) denote the value stored in the entry of character x in the location table. Then we have

E(s) = Σ_{x in A} P(X = x) d(x) = (1/σ) Σ_{x in A} d(x).    (5.3-2)

For example, for the Location Table shown in Table 5.2-2,

E(s) = (1/4)(2 + 3 + 3 + 1) = 2.25.
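The value of E(s) can be computed directly from the location table; the helper name average_shift in the snippet below is illustrative.

```python
def average_shift(table):
    """E(s) of Equation (5.3-2): the mean of the location-table entries, assuming
    every character of the alphabet is equally likely to be the head."""
    return sum(table.values()) / len(table)

print(average_shift({'a': 2, 'c': 3, 'g': 3, 't': 1}))   # 2.25, as for Table 5.2-2
```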

To obtain the average case time-complexity of the Horspool Algorithm, we must have the average number of character comparisons for a window. Let E(c) denote the average number of character comparisons for a window of size m. Then we can reason as follows:

(1) The first character comparison is a mismatch. In this case there is exactly 1 comparison, and this case contributes 1 · (σ-1)/σ to the expected number of character comparisons, as (σ-1)/σ is the probability that a comparison yields a mismatch.

(2) The first comparison is a match and the second comparison is a mismatch. In this case there are exactly 2 comparisons, and this case contributes 2 · (1/σ) · (σ-1)/σ to the expected number of character comparisons. Note that 1/σ is the probability that a comparison yields a match.

In general, for k < m, the first k-1 comparisons all yield matches and the k-th comparison is a mismatch; this case occurs with probability (1/σ)^{k-1} (σ-1)/σ and involves k comparisons. Finally, the first m-1 comparisons may all yield matches; then there are m comparisons in total, and this happens with probability (1/σ)^{m-1}.

Based upon the above reasoning, we have:

E(c) = Σ_{k=1}^{m-1} k ((σ-1)/σ) (1/σ)^{k-1} + m (1/σ)^{m-1} = (1 - (1/σ)^m) / (1 - 1/σ) ≈ σ/(σ-1)    (5.3-3)

when m is reasonably large.
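The sketch below evaluates the exact sum of Equation (5.3-3) and compares it with the approximation σ/(σ-1); the function name expected_comparisons is ours.

```python
def expected_comparisons(m, sigma):
    """Exact value of Equation (5.3-3): k comparisons occur with probability
    (1/sigma)^(k-1) * (sigma-1)/sigma for k < m, and m comparisons occur with
    probability (1/sigma)^(m-1)."""
    p = 1.0 / sigma
    return sum(k * (1 - p) * p ** (k - 1) for k in range(1, m)) + m * p ** (m - 1)

print(expected_comparisons(3, 4))   # 1.3125
print(expected_comparisons(9, 4))   # 1.3333..., already close to sigma/(sigma-1) = 4/3
```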

Let us denote the expected number of character comparisons for a text string with length n and a pattern string with length m by E(C). Then,

E(C) = n p_h E(c) = n E(c) / E(s).    (5.3-4)

The expected number of character comparisons per character of the text is therefore:

E(C)/n = E(c)/E(s) ≈ σ / ((σ-1) E(s)).    (5.3-5)

For the case of the location table shown in Table 5.2-2, we have:

E(C)/n ≈ (4/3) / (9/4) = 16/27 ≈ 0.5925.    (5.3-6)

The above result is obtained under the assumption that the pattern string is given and fixed, as we stated at the very beginning. We must understand that this is not a very good average case analysis because it fails to give an analysis based upon the assumption that the pattern is random. In the following, we show some experiments measuring the average number of character comparisons per character of the text. For each of the following three pattern strings, we randomly generated 500 text strings of length 1000. The average number of character comparisons per character is shown below. It shows that the theoretical result is quite close to the experimental results.

P / Theoretical result / Experimental result
att / 0.5925 / 0.6031
cgtac / 0.5333 / 0.5592
aggttgaat / 0.3137 / 0.3302
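The experiment described above can be reproduced approximately by the following simulation sketch. The number of texts (500) and the text length (1000) follow the description in the text; the random seed is left unset, the alphabet is assumed to be {a, c, g, t}, and the function names are our own, so individual runs will differ slightly from the table.

```python
import random

def comparisons_per_character(text, pattern, alphabet):
    """Character comparisons made by the Horspool scan of text, divided by len(text)."""
    n, m = len(text), len(pattern)
    table = {x: m for x in alphabet}
    for j in range(m - 1):
        table[pattern[j]] = (m - 1) - j
    comparisons, i = 0, 0
    while i + m <= n:
        j = m - 1
        while j >= 0:
            comparisons += 1
            if text[i + j] != pattern[j]:
                break
            j -= 1
        i += table[text[i + m - 1]]
    return comparisons / n

def estimate(pattern, alphabet="acgt", trials=500, length=1000):
    """Average the per-character comparison count over random texts."""
    total = 0.0
    for _ in range(trials):
        text = "".join(random.choice(alphabet) for _ in range(length))
        total += comparisons_per_character(text, pattern, alphabet)
    return total / trials

for p in ("att", "cgtac", "aggttgaat"):
    print(p, round(estimate(p), 4))
```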

The above discussion is what we shall call the first approximation of the average case analysis of the Horspool Algorithm. In this discussion, we ignored one fact: there may be another head to the left of the head of the window. Consider Fig. 5.3-1. The case shown in Fig. 5.3-1 is a special one in which the distance between the two heads is equal to 3: Head 1 is the last character of the window and Head 2 is the third character of the window counted from the right. Note that at Head 2 there is an exact match between the corresponding characters of T and P, because the shift which produced the present window aligned the character of T at Head 2 with an identical character of P. Consequently, the number of comparisons cannot be exactly 3, because as soon as the comparison at Head 2 is done it is guaranteed to succeed, and the fourth comparison will automatically be performed. We may of course ask: under what condition will the characters corresponding to Head 2 be compared? They will be compared if the first two comparisons both yield exact matches. In other words, as soon as the first two comparisons yield exact matches, we will have at least 4 comparisons.

Fig. 5.3-1 The case where there are two heads in the window.

The expected number of character comparisons for such a window, which we denote by E'(c), is:

E'(c) = 1 · ((σ-1)/σ) + 2 · (1/σ) ((σ-1)/σ) + Σ_{k=4}^{m-1} k ((σ-1)/σ) (1/σ)^{k-2} + m (1/σ)^{m-2}.    (5.3-7)

We may now ask another question: what is the expected number of character comparisons if there is no Head 2? It is equal to

E(c) = Σ_{k=1}^{m-1} k ((σ-1)/σ) (1/σ)^{k-1} + m (1/σ)^{m-1},    (5.3-8)

which is the same as Equation (5.3-3).

We may now rewrite Equation (5.3-7) as follows:

E'(c) = E(c) + (1/σ)^2 - (1/σ)^{m-1}.    (5.3-9)

Since (1/σ)^2 - (1/σ)^{m-1} > 0 for m > 3, we may mathematically conclude that the expected number of character comparisons is increased if there is more than one head in the window.
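A small numerical check of Equation (5.3-9) is given below. It models the right-to-left scan directly, treating the comparison at Head 2 (the third comparison from the right, as discussed above) as guaranteed to match; the function name expected_scan_cost and the free_position parameter are our own devices.

```python
def expected_scan_cost(m, sigma, free_position=None):
    """Expected number of right-to-left comparisons in a window of size m, where each
    comparison matches with probability 1/sigma, except that the comparison at
    free_position (counted from the right, 1-based) always matches."""
    p = 1.0 / sigma
    expected, prob_reach = 0.0, 1.0       # prob_reach = probability this comparison happens
    for k in range(1, m + 1):
        if k == m:                        # the scan never makes more than m comparisons
            expected += k * prob_reach
        elif k == free_position:          # guaranteed match: the scan cannot stop here
            pass
        else:
            expected += k * prob_reach * (1 - p)   # stop here on a mismatch
            prob_reach *= p
    return expected

m, sigma = 9, 4
left = expected_scan_cost(m, sigma, free_position=3)                   # E'(c)
right = expected_scan_cost(m, sigma) + (1/sigma)**2 - (1/sigma)**(m-1)
print(left, right)   # both values are 1.39581298828125, agreeing with (5.3-9)
```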

In the general case, where Head 2 is at distance j from the right end of the window, it can be derived in the same way that the expected number of character comparisons is

E'(c) ≈ σ/(σ-1) + (1/σ)^{j-1}    (5.3-10)

if m is reasonably large.

The above discussion only gives the reader some feeling for how to handle the case where there are two heads in a window. The above discussion is simple enough to understand, but it would be quite complicated mathematically if we wanted to treat the general case. Since the experimental results show that the first-approximation theoretical result is close enough, the general case, in which several heads may exist in a window, will not be discussed in this book.

Section 5.4 Some Variations of the Horspool Algorithm

In this section, we shall introduce four algorithms which are variations of the Horspool Algorithm. They are easy to understand and we shall only give a brief sketch of them.

1. The Raita Algorithm.

The Raita Algorithm is different from the Horspool Algorithm in only one aspect: the order in which characters are compared. In the Horspool Algorithm, the comparison starts from the right end of the window. The Raita Algorithm instead uses a specified order of character comparison. For instance, it may first compare P_m with the last character of W and then P_1 with the first character of W, before comparing the remaining characters.
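A minimal sketch of the Raita idea is given below. We assume, for illustration only, the order: last character, then first character, then the remaining characters from right to left; the shift rule stays Rule E2-2, and only the window test changes.

```python
def raita_style_match(window, pattern):
    """Return True if window equals pattern, comparing the last character first,
    then the first character, then the remaining characters from right to left."""
    m = len(pattern)
    if window[m - 1] != pattern[m - 1]:   # last character first
        return False
    if window[0] != pattern[0]:           # then the first character
        return False
    for j in range(m - 2, 0, -1):         # then the rest, from right to left
        if window[j] != pattern[j]:
            return False
    return True

print(raita_style_match("aggttgaat", "aggttgaat"))   # True
print(raita_style_match("aggttgaac", "aggttgaat"))   # False after one comparison
```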

2. The Nebel Algorithm

The Nebel Algorithm is also different from the Horspool Algorithm in only one aspect: the order in which characters are compared. In the Horspool Algorithm, the comparison starts from the right end of W. The Nebel Algorithm also has a specified order of character comparison. Let the alphabet set of P be {x_1, x_2, ..., x_k}. Without loss of generality, we may assume that the number of occurrences of x_i in P is the i-th smallest. First, it compares the positions of x_1 in P with the corresponding positions of W, and then the positions of x_2 with the corresponding positions of W. Finally, it compares the positions of x_k with the corresponding positions of W.
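The following sketch illustrates one reading of the Nebel comparison order: the positions of the pattern are examined in increasing order of how often their character occurs in P, so that the positions of the rarest character x_1 are compared first. The function names and the tie-breaking by position are our own assumptions.

```python
from collections import Counter

def nebel_comparison_order(pattern):
    """Pattern positions (0-based) sorted so that positions holding the rarest
    characters of the pattern are compared first."""
    counts = Counter(pattern)
    return sorted(range(len(pattern)), key=lambda j: (counts[pattern[j]], j))

def nebel_style_match(window, pattern, order):
    """Return True if window equals pattern, comparing positions in the given order."""
    return all(window[j] == pattern[j] for j in order)

order = nebel_comparison_order("cgtac")
print(order)                                        # [1, 2, 3, 0, 4]: g, t, a before the two c's
print(nebel_style_match("cgtac", "cgtac", order))   # True
```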