We Step Through a Full Parse of the Sentence Kim Adores Snow in Oslo, Using Both a Basic

Section, October 17, 2003

We step through a full parse of the sentence Kim adores snow in Oslo, using both a basic top down parser (with certain filters), and using the Earley algorithm.

We’ll use this grammar:

S  NP VP / NP  Kim
VP  V NP / NP  snow
VP  V / NP  Oslo
VP  VP PP / V  adores
PP  P NP / V  snores
NP  NP PP / P  in

Here is the trace using the basic top down algorithm. The first column is a unique number for each state. An asterisk (*) next to the number indicates that although, according to the algorithm, the state could be expanded, I have not expanded it. Each asterisk is explained below. The second column is the state from which that state was derived. The third column is the tree being considered in that state. I’ve put the leftmost unexpanded node of the tree in bold. The fourth column is the next word to consider in the sentence.

Note that the algorithm as stated on page 366 doesn’t like the above grammar. In the grammar, NP a phrasal node, i.e. it can be expanded as NP PP. It is also a preterminal, i.e. it can be expanded as Kim, snow, or Oslo. The algorithm expects each category to be either phrasal or preterminal, and expands the node differently depending on which type it is. The trace below assumes a slightly different algorithm where we simply expand an NP both ways. For instance, state 2 is expanded as both state 3 and state 4.

1 / S / Kim
2 / 1 / [S (NP VP)] / Kim
*3 / 2 / [S ([NP (NP VP)] VP)] / Kim
4 / 2 / [S ([NP (Kim)] VP)] / adores
5 / 4 / [S ([NP (Kim)] [VP (V NP)])] / adores
6 / 4 / [S ([NP (Kim)] [VP (V)])] / adores
7 / 4 / [S ([NP (Kim)] [VP (VP PP)])] / adores
8 / 5 / [S ([NP (Kim)] [VP ([V (adores)] NP)])] / snow
9 / 6 / [S ([NP (Kim)] [VP ([V (adores)])])] / snow
10 / 7 / [S ([NP (Kim)] [VP ([VP (V NP)] PP)])] / adores
11 / 7 / [S ([NP (Kim)] [VP ([VP (V)] PP)])] / adores
*12 / 7 / [S ([NP (Kim)] [VP ([VP (VP PP)] PP)])] / adores
13 / 8 / [S ([NP (Kim)] [VP ([V (adores)] [NP (NP PP)])])] / snow
14 / 8 / [S ([NP (Kim)] [VP ([V (adores)] [NP (snow)])])] / in
15 / 10 / [S ([NP (Kim)] [VP ([VP ([V (adores)] NP)] PP)])] / snow
*16 / 11 / [S ([NP (Kim)] [VP ([VP ([V (adores)])] PP)])] / snow
17 / 13 / [S ([NP (Kim)] [VP ([V (adores)] [NP ([NP (snow)] PP)])])] / in
18 / 15 / [S ([NP (Kim)] [VP ([VP ([V (adores)] [NP (snow)])] PP)])] / in
19 / 17 / [S ([NP (Kim)] [VP ([V (adores)] [NP ([NP (snow)] [PP (P NP)])])])] / in
20 / 18 / [S ([NP (Kim)] [VP ([VP ([V (adores)] [NP (snow)])] [PP (P NP)])])] / in
21 / 19 / [S ([NP (Kim)] [VP ([V (adores)] [NP ([NP (snow)] [PP ([P (in)] NP)])])])] / Oslo
22 / 20 / [S ([NP (Kim)] [VP ([VP ([V (adores)] [NP (snow)])] [PP ([P (in)] NP)])])] / Oslo
23 / 21 / [S ([NP (Kim)] [VP ([V (adores)] [NP ([NP (snow)] [PP ([P (in)] [NP (Oslo)])])])])] / ε
24 / 22 / [S ([NP (Kim)] [VP ([VP ([V (adores)] [NP (snow)])] [PP ([P (in)] [NP (Oslo)])])])] / ε

State 3 is the first step toward left recursion. That is, it’s the result of expanding NP  NP PP. I did not expand state 3 by applying this rule again, because following that route would have led us to an infinite number of states. Note that there is no principled reason why I stopped there. I just waved my magic wand and made it stop.

Similarly, state 12 could have been expanded by another application of VP  VP PP, the same rule which brought state 7 to state 12. I nipped that in the bud, again, for no good reason other than I already knew that it wouldn’t lead to a successful parse, but would lead eventually to an infinite number of states.

I had a good reason, on the other hand, for not expanding state 16. The node to expand in state 16 is a PP, and state 16 is looking at the word snow in the input. Since snow is only an NP, and it can be shown that no expansion of a PP can begin with NP, we can be sure that expanding state 16 will never lead to a successful parse.

In addition to the problem of left recursion, this trace illustrates the inefficient parsing of subtrees. Note that “in Oslo” gets independently analyzed as a PP twice: once in the sequence of states {17, 19, 21, 23} and once in the sequence of states {18, 20, 22, 24}.

Next comes a trace of processing the same sentence, with the same grammar, using the Earley algorithm. I’ll repeat the sentence here, including the numbers of the positions in the string:

0Kim1adores2snow3in4Oslo5

As the algorithm requires, the chart is divided into 6 sections, one for each position in the string. The first column is a number identifying the state. The number reflects the order in which the state is created. Note that state although state 13, in the fourth section, is created before states 14, 15 and 16, which are in the third section. Again, an asterisk indicates a state that could, in principle, have been expanded, but which was not.

The second column indicates the state which was expanded to get to this state. If the state is the result of a “completer” rule, the second column will also contain a parenthesized number indicating which rule was advanced. For instance, state 5 is the result of using state 4 (a complete NP) to advance state 2.

The third column indicates the state itself: which rule is being considered, how much that rule has accounted for, and how much it has left to go.

The fourth column indicates the start position and end position of the substring that the state accounts for.

The fifth column is used for states resulting from a “completer” rule. It contains the numbers of the states that represent the constituents of the state. For instance, state 14 is a completed S  NP VP state. Its fifth column contains the states for its completed NP and completed VP constituents. It is information in this column that is used to reconstruct completed parses from the chart that this algorithm returns.

As with the basic top down algorithm, the algorithm on page 381 has been modified slightly to accommodate our grammar. Rather than expanding NP only as a phrasal category or only as a preterminal, we expand it as either one.

1 / start / γ  • S / [0, 0]
2 / 1 / S  • NP VP / [0, 0]
3 / 2 / NP  • NP PP / [0, 0]
4 / 2 / NP  Kim • / [0, 1]
5 / 4 (2) / S  NP • VP / [0, 1] / [4]
6 / 5 / VP  • V NP / [1, 1]
7 / 5 / VP  • V / [1, 1]
8 / 5 / VP  • VP PP / [1, 1]
9 / 6 / V  adores • / [1, 2]
10 / 9 (6) / VP  V • NP / [1, 2] / [9]
11 / 9 (7) / VP  V • / [1, 2] / [9]
12 / 10 / NP  • NP PP / [2, 2]
14 / 11 (5) / S  NP VP • / [0, 2] / [4, 11]
*15 / 11 (8) / VP  VP • PP / [1, 2] / [11]
16 / 14 (1) / γ  S • / [0, 2] / [14]
13 / 10 / NP  snow • / [2, 3]
17 / 13 (10) / VP  V NP • / [1, 3] / [9, 13]
18 / 13 (12) / NP  NP • PP / [2, 3] / [13]
19 / 17 (5) / S  NP VP • / [0, 3] / [4, 17]
20 / 17 (8) / VP  VP • PP / [1, 3] / [17]
21 / 18 / PP  • P NP / [3, 3]
22 / 19 (1) / γ  S • / [0, 3] / [19]
23 / 21 / P  in • / [3, 4]
24 / 23 (21) / PP  P • NP / [3, 4] / [23]
25 / 24 / NP  • NP PP / [4, 4]
26 / 24 / NP  Oslo • / [4, 5]
27 / 26 (24) / PP  P NP • / [3, 5] / [23, 26]
28 / 26 (25) / NP  NP • PP / [4, 5] / [26]
29 / 27 (18) / NP  NP PP • / [2, 5] / [13, 27]
30 / 27 (20) / VP  VP PP • / [1, 5] / [17, 27]
31 / 28 / PP  • P NP / [5, 5]
32 / 29 (10) / VP  V NP • / [1, 5] / [9, 29]
33 / 29 (12) / NP  NP • PP / [2, 5] / [29]
34 / {30 (5), 32(5)} / S  NP VP • / [0, 5] / {[4, 30], [4, 32]}
35 / 30 (8) / VP  VP • PP / [1, 5] / [30]
36 / 34 (1) / γ  S • / [0, 5] / [34]

Note that although state 3 (NP  • NP PP) is expanded to state 4 (NP  Kim •), it is not expanded into a state such as, say, (NP  Oslo •). This is because state 2 is situated at position 0 of the string which is looking at the word Kim. We can only apply the “scanner” rule if the current input matches the category that is being expanded.

Note also that the “predictor” rule does not expand state 3 into another state (NP•NPPP). This is because that state already exists in the chart (as state 3 itself). It is in this way that the Earley algorithm avoids the problem of left-recursion that we had in state 3 of the basic top down trace.

Similarly, the node to be expanded in state 8 is a VP, but no new states were created by expanding that state. That’s because all the resulting VP rule states already exist in the chart. They are states 6, 7 and 8.

We didn’t expand state 15 here for the same reason we didn’t expand state 16 in the basic algorithm. The next word in the input associated with that state is snow, which can never be the beginning of a PP. Since the next node to be expanded in state 15 is a PP, it can never lead to a successful parse.

Finally, note state 34. This state was created originally by using the completed VP in state 30 to advance the S rule in state 5 to (S  NP VP •). As such, it had the constituents [4, 30] in its fifth column. But state 32 is also a completed VP, and can be used to advance the S rule in state 5 to state (S  NP VP •). But since this state already exists as state 34, no new state was created. However, when state 32 is used to complete state 5, we get a different constituent structure from the one we get when state 30 is used. So even though we don’t add a new state to the chart, we do add this alternative constituent structure to column five of state 34.

Now that we have a completed chart, we can recover the actual parses by inspecting the constituents in column 5 of the completed states. The right side of the rule in the state tells us what the daughters of the local subtree are, and the fifth column tells us how to expand those daughters.

So we start with state 36, the state where the γ rule is completed, and spans the entire input. The right side of this rule is an S. So we start out with this tree:

Node 34 says that the S is expanded as NP VP. The fifth column says that the NP is node 4, and that the VP can be either node 30 or node 32. So we get a tree something like this:

The NP in state 4 is expanded as Kim. The VP in state 30 is expanded as the VP in state 17 followed by the PP in state 27. The VP in state 32 is expanded as the V in state 9 and the NP in state 29. All told, the reconstructed tree looks like this: