PART 6. SEMANTIC PARSING OF THE VERB CLUSTER IN DUTCH AND GERMAN

Bach et al. (1986) performed a cross-linguistic experiment.

The idea of the experiment was the following: Dutch and German are similar enough that one can compare the speed of processing of the Dutch verb cluster by native speakers of Dutch with that of the German verb cluster by native speakers of German.

What Bach et al. measured for Dutch and for German was the following.

Let's use the numbers 0,1,2,3,... for 0 embeddings, 1 embedding, 2 embeddings,... as indicated below:

German:

0 Jan weiß daß wir --- das Haus haben wollen malen ---

1 Hans lassen

2 Hans Peter helfen lassen

3 Hans Peter Marie sehen helfen lassen

NP1 NP2 NP3 V3 V2 V1

Center-embedded dependencies.

Dutch:

0 Jan weet dat we --- het huis hebben willen --- schilderen

1 Hans laten

2 Hans Peter laten helpen

3 Hans Peter Marie laten helpen zien

NP1 NP2 NP3 V1 V2 V3

Cross-serial dependencies.

Bach et al. measured, in terms of processing time, in each language, how much longer type 1 sentences take to process than type 0 sentences, how much longer type 2 than type 1, etc. They then compared the figures they got for Dutch and for German.

They found that type 0 sentences in Dutch and in German (and in English) indeed take about the same time to process (which forms the basis for comparison). Interestingly enough, they found the following:

Systematically, Dutch speakers process n Dutch embeddings FASTER than German speakers process n German embeddings.

I want to suggest a possible explanation of this result in terms of semantic parsing.

(For an alternative explanation, see Joshi 1989.)

Let's first explain the idea of semantic parsing.

We have an input string which is read symbol by symbol from left to right:

dat Jan Marie kust [that Jan kisses Marie]

In semantic parsing the task is to come up with a semantic interpretation of the sentence, and we use the types of the input expressions and the type assignment of the grammar to do that online. Basically, the idea is to find values for the variables introduced in the parsing process, or equivalently, to eliminate the variables. The parse is done when all variables are eliminated.

We mark with asterisks where we are in the parse:

Step 1: *dat* Jan Marie kust

Semantics: φ [φ ∈ VAR_t, a sentential variable]

Task: find the value of φ.

Step 2: dat *Jan* Marie kust

Semantics: φ = P1(j) [P1 ∈ VAR_<e,t>, a one-place predicate variable]

Task: find the value of P1.

Step 3: dat Jan *Marie* kust

Semantics: φ = P1(j)

P1 = P2(m) [P2 ∈ VAR_<e,<e,t>>, a two-place predicate variable]

This means that we can eliminate P1:

Semantics: φ = P2(j,m)

Task: find the value of P2.

Step 4: dat Jan Marie *kust*

Semantics: φ = P2(j,m)

P2 = KISS

We eliminate P2:

Semantics: φ = KISS(j,m)

We eliminate φ:

Semantics: KISS(j,m)

Done.
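The bookkeeping in this parse can be made concrete in a few lines of Haskell. This is a minimal sketch, under my own encoding (the type Prop and the names state1 through state3 are not from the text): each parse state is a function from the value still being looked for to the final interpretation, and each word read eliminates one variable.

    -- Entities as strings; a toy proposition language with one constructor.
    type E = String
    data Prop = KISS E E deriving Show          -- KISS x y ~ KISS(x,y)

    -- Step 1, after 'dat': looking for phi, of type t.
    state1 :: Prop -> Prop
    state1 phi = phi

    -- Step 2, after 'Jan': phi = P1(j); looking for P1, of type <e,t>.
    state2 :: (E -> Prop) -> Prop
    state2 p1 = state1 (p1 "j")

    -- Step 3, after 'Marie': P1 = P2(m); looking for P2, of type <e,<e,t>>.
    -- (Curried p2 y x corresponds to the relational notation P2(x,y).)
    state3 :: (E -> E -> Prop) -> Prop
    state3 p2 = state2 (p2 "m")

    -- Step 4, after 'kust': P2 = KISS; all variables are eliminated.
    result :: Prop
    result = state3 (\y x -> KISS x y)

Evaluating result yields KISS "j" "m", i.e. KISS(j,m), exactly the output of Step 4.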

Applying the very same strategy to the verb cluster, we get for Dutch (and similarly for German) the following partial parse (LF stands for 'look for the value of'):

dat Fred Susan Dafna haar pap zal helpen laten eten.

after dat:       SEM: φ             LF: φ
after Fred:      SEM: P1(f)         LF: P1
after Susan:     SEM: P2(f,s)       LF: P2
after Dafna:     SEM: P3(f,s,d)     LF: P3
after haar pap:  SEM: P4(f,s,d,p)   LF: P4

So at the point where we reach the verb cluster, the parser is looking for a four-place relation.

Now we need to make some assumptions about the semantics of the verbs involved in the verb cluster. The basis of the semantics is the following.

Even though I have assumed that syntactically a verb like laten takes a subject Susan and a small clause [SC Dafna [VP haar pap eten]], this doesn't mean that semantically LATEN is a two-place relation between the interpretation of the subject and the interpretation of the small clause (a proposition). Semantically, LATEN is a three-place relation between the interpretation of the subject, the interpretation of the small clause subject, and the interpretation of the small clause predicate.

The latter is the standard assumption in the semantic literature, and it is semantically well motivated.

For some semanticists, this means that we should give a different syntax as well.

Some syntacticians would object to the semantics because they have syntactic reasons to assume the small clause analysis.

I have argued in Landman 2002, 2004 against the Perfect Match between syntax and semantics (for a completely unrelated phenomenon). I assume that this is another case where the syntax and the semantics are mismatched.

Based on these considerations, I give the following semantic interpretations for the verbs involved:

zullen: a function from one-place properties to one-place properties:

λP1λx.WILL(P1(x))

dansen: a one-place property:

DANCE

Hence: zullen dansen: λP1λx.WILL(P1(x)) (DANCE) =

λx.WILL(DANCE(x))

Fred zal dansen: λx.WILL(DANCE(x)) (f) =

WILL(DANCE(f))

laten: a function from one-place properties to two-place relations:

λP1λyλx.LET(x,P1(y))

Hence: laten dansen: λP1λyλx.LET(x,P1(y)) (DANCE) =

λyλx.LET(x,DANCE(y))

Dafna laten dansen: λyλx.LET(x,DANCE(y)) (d) =

λx.LET(x,DANCE(d))

Fred laat Dafna dansen: λx.LET(x,DANCE(d)) (f) =

LET(f,DANCE(d))

So we have the following lexical specifications:

dansen: DANCE = λx.DANCE(x)

eten: EAT = λyλx.EAT(x,y)

zullen: WILL = λP1λx.WILL(P1(x))

helpen: HELP = λP1λyλx.HELP(x,P1(y))

laten: LET = λP1λyλx.LET(x,P1(y))

zien: SEE = λP1λyλx.SEE(x,P1(y))
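These lexical specifications translate directly into typed functions. Here is a Haskell sketch, assuming a toy term language Prop for propositions (the constructors below are my own encoding, chosen to mirror the constants WILL, HELP, LET, SEE, DANCE, EAT):

    type E = String
    data Prop = WILL Prop | HELP E Prop | LET E Prop | SEE E Prop
              | DANCE E  | EAT E E
      deriving (Eq, Show)

    dansen :: E -> Prop                       -- λx.DANCE(x)
    dansen x = DANCE x

    eten :: E -> E -> Prop                    -- λyλx.EAT(x,y)
    eten y x = EAT x y

    zullen :: (E -> Prop) -> E -> Prop        -- λP1λx.WILL(P1(x))
    zullen p x = WILL (p x)

    helpen :: (E -> Prop) -> E -> E -> Prop   -- λP1λyλx.HELP(x,P1(y))
    helpen p y x = HELP x (p y)

    laten :: (E -> Prop) -> E -> E -> Prop    -- λP1λyλx.LET(x,P1(y))
    laten p y x = LET x (p y)

    zien :: (E -> Prop) -> E -> E -> Prop     -- λP1λyλx.SEE(x,P1(y))
    zien p y x = SEE x (p y)

For instance, laten dansen "d" "f" evaluates to LET "f" (DANCE "d"), mirroring the derivation of Fred laat Dafna dansen above.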

Now we come to the basic semantic assumptions about the grammar, which are, again, widely assumed principles:

- Inside SYNTACTIC domains (X', XP) the basic semantic composition mode is:

FUNCTIONAL APPLICATION: f(α)

Find a function and an argument and apply the function to the argument.

- Inside LEXICAL domains (X) the basic semantic composition mode is:

FUNCTION COMPOSITION: f o g

Find two functions and compose them.

The way our parser has worked so far reflects this: up to now we have been parsing in syntactic domains, and have used functional application to partially resolve the semantic parse.

But the parse has now reached the verb cluster, which is a lexical category; hence the parser will switch to function composition to resolve the remainder.

The operation the grammar uses is an operation of generalized function composition, and it works as follows:

(Clause (1) is standard function composition, clause (2) shows how it generalizes, and clause (3) is the general clause.)

Generalized Function Composition:

1. If f is a function of type <a,b> (from a-entities into b-entities)
and g is a function of type <c,a>,
then f o g is a function of type <c,b>:

f o g = λx.f(g(x))

2. If f is a function of type <a,b>
and g is a function of type <c2,<c1,a>> (from pairs of c1- and c2-entities into a-entities),
then f o g is a function of type <c2,<c1,b>>:

f o g = λx2λx1.f(g(x1,x2))

3. If f is a function of type <a,b>
and g is a function of type <cn,...<c1,a>...>,
then f o g is a function of type <cn,...<c1,b>...>:

f o g = λxn...λx1.f(g(x1,...,xn))
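In typed form, each clause of this scheme is a separate combinator (in Haskell, clause (1) is the ordinary composition operator (.)). A sketch, with the curried convention that relational g(x1,x2) is g x2 x1, as in the lexical entries above:

    comp1 :: (a -> b) -> (c -> a) -> c -> b
    comp1 f g = \x -> f (g x)                 -- f o g = λx.f(g(x))

    comp2 :: (a -> b) -> (c2 -> c1 -> a) -> c2 -> c1 -> b
    comp2 f g = \x2 x1 -> f (g x2 x1)         -- f o g = λx2λx1.f(g(x1,x2))

    comp3 :: (a -> b) -> (c3 -> c2 -> c1 -> a) -> c3 -> c2 -> c1 -> b
    comp3 f g = \x3 x2 x1 -> f (g x3 x2 x1)

The grammar has one polymorphic scheme; in the sketch we simply pick the combinator that matches the arity of g.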

We are at the following stage in Dutch:

dat Fred Susan Dafna *haar pap* zal helpen laten eten.

SEMANTICS: P4(f,s,d,p)

LOOK FOR: P4

In German, similarly, we are at the following stage:

daß Fred Susan Dafna *ihr Brei* essen lassen helfen wird.

SEMANTICS: P4(f,s,d,p)

LOOK FOR: P4

In both cases, we are looking for a four-place relation P4 and we rely on function composition to find it.

Let's work through the German case first.

We are looking for a four-place relation. But essen is a two-place relation.

So we are stuck.

At this point we start a store, in which we build up a four-place relation:

daß Fred Susan Dafna ihr Brei *essen* lassen helfen wird.

SEMANTICS: P4(f,s,d,p)

LF: P4

STORE: EAT (a two-place relation)

We continue, and at the next step we apply function composition in the store:

daß Fred Susan Dafna ihr Brei essen *lassen* helfen wird.

SEMANTICS: P4(f,s,d,p)

LF: P4

STORE: LET o EAT (a three-place relation)

LET o EAT = λzλyλx.LET(x,EAT(y,z))

We continue to helfen and again do function composition on the store:

daß Fred Susan Dafna ihr Brei essen lassen *helfen* wird.

SEMANTICS: P4(f,s,d,p)

LF: P4

STORE: HELP o (LET o EAT) (a four-place relation)

HELP o (LET o EAT) = λuλzλyλx.HELP(x,LET(y,EAT(z,u)))

We now have a four-place relation in store, but the parse continues inside V, so we continue with function composition:

daß Fred Susan Dafna ihr Brei essen lassen helfen *wird*.

SEMANTICS: P4(f,s,d,p)

LF: P4

STORE: WILL o (HELP o (LET o EAT)) (a four-place relation)

WILL o (HELP o (LET o EAT)) = λuλzλyλx.WILL(HELP(x,LET(y,EAT(z,u))))

This completes the parse in the V domain; we match the store and P4:

daß Fred Susan Dafna ihr Brei essen lassen helfen wird.

SEMANTICS: P4(f,s,d,p)

LF: P4 = WILL o (HELP o (LET o EAT))

We eliminate variables and get the correct parse:

daß Fred Susan Dafna ihr Brei essen lassen helfen wird.

SEMANTICS: WILL(HELP(f,LET(s,EAT(d,p))))

DONE
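Using the lexical entries and composition combinators from the sketches above, the German store computation can be replayed mechanically (store1 through store4 are my names for the successive store contents):

    store1 :: E -> E -> Prop            -- EAT, a two-place relation
    store1 = eten

    store2 :: E -> E -> E -> Prop       -- LET o EAT = λzλyλx.LET(x,EAT(y,z))
    store2 = comp1 laten eten

    store3 :: E -> E -> E -> E -> Prop  -- HELP o (LET o EAT)
    store3 = comp2 helpen store2

    store4 :: E -> E -> E -> E -> Prop  -- WILL o (HELP o (LET o EAT))
    store4 = comp3 zullen store3

    -- Matching P4 with the store and applying to (f,s,d,p); relationally,
    -- P4(f,s,d,p) is curried application to p, d, s, f, in that order.
    german :: Prop
    german = store4 "p" "d" "s" "f"  -- WILL (HELP "f" (LET "s" (EAT "d" "p")))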

In Dutch we have exactly the same option as in German: we can create a store:

dat Fred Susan Dafna haar pap *zal* helpen laten eten.

SEMANTICS: P4(f,s,d,p)

LOOK FOR: P4

STORE: WILL

We continue in the store with function composition:

dat Fred Susan Dafna haar pap zal *helpen* laten eten.

SEMANTICS: P4(f,s,d,p)

LOOK FOR: P4

STORE: WILL o HELP

WILL o HELP =

λP1λyλx.WILL(HELP(x,P1(y)))

We continue with laten:

dat Fred Susan Dafna haar pap zal helpen *laten* eten.

SEMANTICS: P4(f,s,d,p)

LOOK FOR: P4

STORE: (WILL o HELP) o LET

(WILL o HELP) o LET =

λP1λzλyλx.WILL(HELP(x,LET(y,P1(z))))

We compose with eten:

dat Fred Susan Dafna haar pap zal helpen laten *eten*.

SEMANTICS: P4(f,s,d,p)

LOOK FOR: P4

STORE: ((WILL o HELP) o LET) o EAT

((WILL o HELP) o LET) o EAT =

λuλzλyλx.WILL(HELP(x,LET(y,EAT(z,u))))

We are done in the V-domain; we have the same relation in store as in German, so we match, eliminate variables, and are done:

dat Fred Susan Dafna haar pap zal helpen laten eten.

SEMANTICS: WILL(HELP(f,LET(s,EAT(d,p))))

DONE
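Continuing the sketch, the Dutch store builds the same four-place relation, but with left bracketing, ((WILL o HELP) o LET) o EAT; since composition is associative here, the result is the same term:

    willHelp :: (E -> Prop) -> E -> E -> Prop
    willHelp = comp2 zullen helpen       -- λP1λyλx.WILL(HELP(x,P1(y)))

    willHelpLet :: (E -> Prop) -> E -> E -> E -> Prop
    willHelpLet = comp2 willHelp laten   -- λP1λzλyλx.WILL(HELP(x,LET(y,P1(z))))

    dutchStore :: E -> E -> E -> E -> Prop
    dutchStore = comp1 willHelpLet eten  -- λuλzλyλx.WILL(HELP(x,LET(y,EAT(z,u))))

    dutch :: Prop
    dutch = dutchStore "p" "d" "s" "f"   -- dutch == german evaluates to True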

Thus far, there is no difference between the Dutch and the German case. The difference comes in with the following observation:

Dutch allows a straightforward alternative parsing strategy that does not involve a store at all.

We go back to the point where we switched from functional application to function composition:

dat Fred Susan Dafna *haar pap* zal helpen laten eten.

SEMANTICS: P4(f,s,d,p)

LOOK FOR: P4

We continue the parse by introducing a search variable Q4 and composing as follows:

dat Fred Susan Dafna haar pap *zal* helpen laten eten.

SEMANTICS: P4(f,s,d,p)

P4 = WILL o Q4

LOOK FOR: Q4

So we get:

dat Fred Susan Dafna haar pap *zal* helpen laten eten.

SEMANTICS: [WILL o Q4] (f,s,d,p)

LOOK FOR: Q4

And we continue by introducing a search variable P3 (resolving Q4 = HELP o P3):

dat Fred Susan Dafna haar pap zal *helpen* laten eten.

SEMANTICS: [WILL o (HELP o P3)] (f,s,d,p)

LOOK FOR: P3

We continue by introducing a search variable P2 (resolving P3 = LET o P2):

dat Fred Susan Dafna haar pap zal helpen *laten* eten.

SEMANTICS: [WILL o (HELP o (LET o P2))] (f,s,d,p)

LOOK FOR: P2

At this point we reach the end of the V, and we resolve P2 = EAT:

dat Fred Susan Dafna haar pap zal helpen laten *eten*.

SEMANTICS: [WILL o (HELP o (LET o EAT))] (f,s,d,p)

The result is the same:

dat Fred Susan Dafna haar pap zal helpen laten eten.

SEMANTICS: WILL(HELP(f,LET(s,EAT(d,p))))

DONE

Thus, on this strategy, we just keep composing.

In this parse we do not use a store, and all in all we only need to introduce six search variables: φ, P1, P2, P3, P4, Q4, of five different types.
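The store-free strategy has an equally direct encoding, continuing the sketch: the parse state is a function from the relation still being looked for to the final proposition, and each verb read refines that relation by composition (lookP4 etc. are my names):

    lookP4 :: (E -> E -> E -> E -> Prop) -> Prop  -- P4(f,s,d,p); look for P4
    lookP4 p4 = p4 "p" "d" "s" "f"

    lookQ4 :: (E -> E -> E -> E -> Prop) -> Prop  -- zal: P4 = WILL o Q4
    lookQ4 q4 = lookP4 (comp3 zullen q4)

    lookP3 :: (E -> E -> E -> Prop) -> Prop       -- helpen: Q4 = HELP o P3
    lookP3 p3 = lookQ4 (comp2 helpen p3)

    lookP2 :: (E -> E -> Prop) -> Prop            -- laten: P3 = LET o P2
    lookP2 p2 = lookP3 (comp1 laten p2)

    dutchDirect :: Prop                           -- eten: P2 = EAT; done
    dutchDirect = lookP2 eten  -- WILL (HELP "f" (LET "s" (EAT "d" "p")))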

Now, composition is a powerful mechanism, so it shouldn't come as a surprise that for German too we can find a direct parse that doesn't rely on the store.

The parse in German can continue directly as follows:

P4(f,s,d,p)

LF: P4

(R3 o EAT)(f,s,d,p)   where R3 is a variable of type <<e,t>,<e,<e,<e,t>>>>

LF: R3

(R2 o (LET o EAT))(f,s,d,p)   where R2 is a variable of type <<e,t>,<e,<e,t>>>

LF: R2

(R1 o (HELP o (LET o EAT)))(f,s,d,p)   where R1 is a variable of type <<e,t>,<e,t>>

LF: R1

(WILL o (HELP o (LET o EAT)))(f,s,d,p)

which is:

WILL(HELP(f,LET(s,EAT(d,p))))

This parse takes as many steps as the Dutch parse, and doesn't use a store either.

It differs from the Dutch parse, though, in that it introduces more search variables: eight all in all, φ, P1, P2, P3, P4, R3, R2, R1, of eight different types.

If we make the plausible hypotheses that using a store is costly, and that using more search variables of more different types is costly, it follows that both parsing strategies potentially available in German are more costly than the fastest strategy available in Dutch. And this is what Bach et al. (1986) found.

PART 7. TREE ADJOINING GRAMMARS

I have argued that Dutch cross-serial dependencies are outside the scope of context free grammars. But how far outside are they? A very useful framework for comparing grammar formalisms for the analysis of natural language that go beyond context free grammars, and for studying such questions, has been developed by Aravind Joshi. It is based on the idea of taking the I-sets and A-sets that we developed in the course of Parikh's theorem as grammars in their own right. Such grammars are called tree adjoining grammars.

Before, the notions of I-trees and A-trees were defined as parse trees in a given context free grammar. Since we don't have such a grammar now, we need to define these notions independently.

A vocabulary V = VN ∪ VT is the union of a finite set VN of non-terminal symbols, containing S, and a finite set VT of terminal symbols, where VN and VT are disjoint.

An I-tree in V is any tree with topnode labeled S, yield in VT*, and non-terminals from VN on the non-leaves.

An A-tree in V is any tree with topnode some non-terminal X in VN, yield of the form αXβ, with α,β in VT*, and non-terminals from VN on the non-leaves.

A tree adjoining grammar, TAG, is a set of elementary trees EV = IV ∪ AV, where IV, the set of initial trees, is a finite set of I-trees in V, and AV, the set of auxiliary trees, is a finite set of A-trees in V.

Tree adjunction:

Let T be an I-tree or an A-tree in V, let TX be an A-tree in V with toplabel X, and let n be a node in T with label X:

ADJ[T,TX,n] is the result of adjoining TX in T at node n, where adjunction is as we have defined before.

Note that this definition is a bit wider than what we have defined before: it allows adjunction not only in I-trees, but also in A-trees.

FACT: If T is an I-tree, then ADJ[T,TX,n] is an I-tree, if defined.

If T is an A-tree, then ADJ[T,TX,n] is an A-tree, if defined.
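To make the operation concrete, here is a minimal Haskell sketch of trees and adjunction, under my own encoding (for simplicity it adjoins at the first node carrying the right label, rather than taking the node n as a parameter):

    -- Labels are strings; leaves carry terminals, or the non-terminal foot
    -- of an A-tree. VN and VT are kept apart by convention only.
    data Tree = Leaf String | Node String [Tree] deriving (Eq, Show)

    -- ADJ[t,aux,n], with n fixed as the first inner node of t whose label
    -- matches the toplabel of aux (aux is assumed to be a Node).
    adjoin :: Tree -> Tree -> Maybe Tree
    adjoin t aux = go t
      where
        Node x _ = aux
        go (Leaf _) = Nothing
        go (Node y ts)
          | y == x    = Just (plug (Node y ts) aux)  -- excise and wrap
          | otherwise = Node y <$> goList ts
        goList []     = Nothing
        goList (u:us) = case go u of
                          Just u' -> Just (u' : us)
                          Nothing -> (u :) <$> goList us
        -- Reattach the excised subtree at the foot node of aux (for a
        -- well-formed A-tree, the unique leaf labeled x).
        plug sub (Leaf a)    = if a == x then sub else Leaf a
        plug sub (Node y us) = Node y (map (plug sub) us)

    -- Example: adjoining [S a S b] into [S e] yields [S a [S e] b];
    -- iterating derives trees with yield a^n e b^n.
    example :: Maybe Tree
    example = adjoin (Node "S" [Leaf "e"])
                     (Node "S" [Leaf "a", Leaf "S", Leaf "b"])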

We will define two notions of derivation for tree adjoining grammars. The first is the notion we have used before; the second is new. The notions will be equivalent for tree adjoining grammars, but not for the later extension of the theory to multiple adjunction.

I. DER: ADJOIN AUXILIARY TREES IN DERIVED I-TREES

DERE[IV], the set of derived I-trees in EV, is the smallest set of I-trees in V such that:

1. IV ⊆ DERE[IV]

2. If T ∈ DERE[IV] and TX ∈ AV, and n is a node in T,

then ADJ[T,TX,n] ∈ DERE[IV], if defined.

On this notion, you always adjoin into an initial or derived I-tree, but what you adjoin with is not an arbitrary A-tree: it is always an elementary auxiliary tree from AV.

II. DER*: ADJOIN DERIVED A-TREES IN ELEMENTARY TREES

DERE*[AV], the set of derived A-trees in EV, is the smallest set of A-trees in V such that:

1. AV ⊆ DERE*[AV]

2. If TX ∈ AV and TY ∈ DERE*[AV], and n is a node in TX,

then ADJ[TX,TY,n] ∈ DERE*[AV], if defined.

DERE*[IV], the set of derived I-trees in EV, is the smallest set of I-trees in V such that:

1. IV ⊆ DERE*[IV]

2. If I ∈ IV and TX ∈ DERE*[AV], and n is a node in I,

then ADJ[I,TX,n] ∈ DERE*[IV], if defined.

On this notion, you build up a derived A-tree from an elementary auxiliary tree by adjoining a derived A-tree into it. All the tree building takes place in the A-tree.

Finally, you get an I-tree by adjoining the complex A-tree into an initial tree.

So on this notion of derivation, what you adjoin can be a derived A-tree, but you never adjoin into derived trees, only into elementary trees.

FACT: For any tree adjoining grammar EV = IV ∪ AV:

DERE[IV] = DERE*[IV]

Thus, for TAGs the two notions of derivation are indeed equivalent.

It can be proved that TAGs, as defined here, are a bit richer than context free grammars, but not a lot, and in fact not enough to be interesting from a grammatical point of view. That is, while TAGs generate all context free languages and some non-context free languages, none of the non-context free languages that we have discussed (like a^n b^m c^n d^m) are generated. The reason is that you cannot control where you can adjoin and where you cannot. On the first derivation notion, if T contains label X, it is not just that you can adjoin an auxiliary tree with toplabel X in T: you can adjoin that auxiliary tree at any node in T where X occurs. This means typically that you cannot prevent unwanted adjunctions.

TREE ADJOINING GRAMMARS WITH CONSTRAINTS

For this reason, tree adjoining grammars have been extended with constraints.

Let EV be a tree adjoining grammar as above.

A tree with constraints in AV is an I-tree or an A-tree in V with, on each non-terminal node n, two constraint labels Cn and On, where:

Cn ⊆ AV and On ⊆ AV.

Cn is called the constraint set.

On is called the obligatory adjunction set.

Thus, the constraint labels that occur at a node are (finite) sets of auxiliary trees from AV.

Cn is going to mean:

Only trees derived from auxiliary trees in Cn can be adjoined here.

If Cn = AV there are no constraints on adjunction at node n. In this case I will write * instead of Cn.

If Cn = Ø then nothing can be adjoined at node n. In this case I will leave out Cn.

On is going to mean:

Some tree derived from one of the trees in On must be adjoined at this node.

If On = Ø, there is no obligatory adjunction at this node. In this case I will leave out On.

We incorporate these notions as constraints on adjunction:

I. WITH DER.

Let T be an I-tree in V with constraints in AV, n a node with label X and labels Cn, On, and TX an auxiliary tree with constraints in AV with toplabel X. Then:

ADJ[T,TX,n] is only defined if TX ∈ Cn and, if On ≠ Ø, TX ∈ On.

II. WITH DER*.

Let Z ∈ AV.

DERE*[Z], the set of A-trees derived from Z, is the smallest set of A-trees such that:

1. Z ∈ DERE*[Z]

2. If TX ∈ DERE*[AV] and n is a node in Z,

then ADJ[Z,TX,n] ∈ DERE*[Z], if defined.

Let E be an elementary tree with constraints in AV, n a node with label X and labels Cn, On, and TX an A-tree with constraints in AV with toplabel X. Then:

ADJ[E,TX,n] is only defined if

1. for some Z ∈ Cn: TX ∈ DERE*[Z]

2. if On ≠ Ø, for some Z ∈ On: TX ∈ DERE*[Z]

So, labels Cn and On on node n provide input constraints on the adjunction at that node. In ADJ[E,TX,n], the labels Cn and On on node n have disappeared and are replaced by the constraint labels of the topnode and bottom leaf node of TX.

With obligatory constraints, we need to define the set of generated trees separately.

T(EV), the set of I-trees generated by EV, is:

T(EV) = {T ∈ DERE[IV]: for every node n in T, On = Ø}

L(EV) = {α: for some T ∈ T(EV), α = yield(T)}

T*(EV), the set of I-trees *-generated by EV, is:

T*(EV) = {T ∈ DERE*[IV]: for every node n in T, On = Ø}
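As a concrete illustration, here is a small Haskell sketch of the DER version of these constraint labels and the generation condition, under my own encoding: constraint labels are lists of names of elementary auxiliary trees, and the licensing condition mirrors the definition above.

    -- Inner nodes carry a label, a constraint set Cn, an obligatory
    -- adjunction set On, and children.
    data CTree = CLeaf String
               | CNode String [String] [String] [CTree]
      deriving (Eq, Show)

    -- Adjunction of the elementary auxiliary tree named z at a node with
    -- labels cn, on is licensed iff z is in cn and, if on is non-empty,
    -- z is in on.
    licensed :: String -> [String] -> [String] -> Bool
    licensed z cn on = z `elem` cn && (null on || z `elem` on)

    -- A derived I-tree counts as generated only if no obligatory
    -- adjunction is left anywhere (On = Ø at every node).
    noPendingObligations :: CTree -> Bool
    noPendingObligations (CLeaf _)         = True
    noPendingObligations (CNode _ _ on ts) =
      null on && all noPendingObligations ts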