for Anna Kibort (ed.) Syntactic Government and Subcategorisation

Dependency grammar

1. A brief history

The terms government and subcategorization make an interesting pair in the history of syntactic theory, because one is ancient while the other is a twentieth-century invention with much the same meaning – a clear case of reinventing the wheel.

By the twelfth century, grammarians were already using the Latin verb regere, ‘to rule’, to describe the way in which a preposition or verb dictated the case of its complement (Robins 1967:83), and (according to the Oxford English Dictionary) the verb govern is used in the same sense by the early seventeenth century. Moreover, the intellectual and metaphorical foundations for these terms go even further back in time; so in second-century Alexandria, Apollonius discussed the ways in which different verbs and prepositions selected different cases in dependent nouns, and even fathered the term ‘transitive’ (Robins 1967:37). These selection relations received considerable attention from the Arabic grammarians of the eighth century onwards, who described a word as ‘governing’ (Arabic ‘a:mil) another word whose case it selected, and even went so far as to notice that in Arabic (a head-initial language) the governor generally precedes the governed (Owens 1988:53). In short, the terms govern and government are at least five hundred years old, and the underlying idea of an asymmetrical relation in which one word controls another is almost two thousand years old. (Similar ideas in Panini may well push the history even further back to the fifth century BC; Robins 1967:145.)

In contrast, the term subcategorization dates back only to 1965, when Chomsky (1965) introduced it as a solution to a problem that arose from the phrase-structure theory he had espoused. The basis for phrase structure is the assumption that only one kind of relation can be represented: that between a whole and its parts. Thus the phrase cows eat grass could be related to its parts (cows and eat grass), and eat grass to its parts (eat and grass), but, crucially, these two parts could not be related directly to one another. This curiously restrictive assumption meant that there was no way to show that the presence of grass had anything to do with the properties of eat as a transitive verb. Chomsky’s solution was to add ‘features’ to eat which showed how it could combine with other elements: ‘selection features’ for semantic restrictions and ‘subcategorization features’ for syntactic restrictions.
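To make the mechanism concrete, here is a simplified sketch of what such features might look like, in roughly the notation of Chomsky (1965); the exact feature inventory and layout vary across presentations, so this is an illustration rather than a faithful quotation:

  eat: [+V, + — NP]          (subcategorization: occurs before an NP object)
  eat: [+ [+Animate] — ]     (selection: requires an animate subject)

The first feature licenses eat grass; the second rules out semantically deviant combinations such as *sincerity eats grass.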

The term subcategorization is odd, because it would normally mean nothing but ‘subclassification’, although its intended meaning is much more specific: subclassification according to the syntactic properties of accompanying elements. However, there are deeper objections to this ‘solution’, because it is part of a theory which starts by claiming that part-part relations are not permitted in syntactic structure. Subcategorization features, like selection features, undermine this principle by allowing part-part relations in the guise of a classification of the head word, but without acknowledging that each such feature implies a direct relation between the parts. Fortunately, at least the terminology in Chomskyan linguistics has reverted to the more traditional govern and government, a direct relation between governor and governed, though this relation is still treated as parasitic on the whole-part relations of phrase structure. Unfortunately, both the terminology and the ideas of the earlier theory persist in models such as Head-driven Phrase Structure Grammar (Pollard and Sag 1994).

It is interesting to trace the recent history of these ideas, going back to the first attempts to produce formal underpinnings for syntactic analysis – systems of analysis which were sufficiently explicit and clear to be represented diagrammatically. This history is revealing not only because it concerns some of the most elementary assumptions of modern syntactic theory (such as government), but also because it shows the varying influence of the various ‘stakeholders’ in syntax – descriptive linguistics, logic and education. For each theory considered below, the immediate question is how that theory accommodated government relations.

We start with the ‘sentence diagramming’ which was invented in the United States in the early nineteenth century and reached maturity in 1877 in the work of Reed and Kellogg (Gleason 1965:73). Their diagramming system allowed structures like the one in Figure 1 for the sentence Cows eat fresh grass, with horizontal lines for government relations and diagonals for what we would now call adjuncts. The vertical lines distinguish subjects from objects, with the verb in the centre of the diagram as the heart of the sentence. Each relation is shown as a single line, so government relations are represented directly.

Figure 1: A sentence diagram

This diagramming system was intended for use in schools, and was so successful that it is still taught today in many American (and other) schools – indeed some readers of this chapter may have learned it as children. It even has a twenty-first-century face in a website that generates ‘Reed and Kellogg’ sentence diagrams to order, and an informative page on Wikipedia. However, so far as I know it was never used in descriptive linguistics, so it remained a product of, and for, the school classroom, without any theoretical or research-based underpinnings. On the other hand, it may well have been part of the school education of academic linguists, so it is hard to rule out the possibility that it at least suggested the idea of using diagrams to display sentence structure.

One feature of Reed and Kellogg diagrams is that (in modern parlance) they show dependency relations (government and adjunction) but not precedence (word order); for instance, Figure 1 shows that fresh is an adjunct of grass, but does not show which word follows which. Another important feature is that they do not recognise phrases as such, although phrases are implicit in the dependency lines. These features were given a more thorough theoretical foundation during the 1920s and 1930s by at least two European linguists, both of whom wanted to improve the teaching of grammar in schools. On the one hand, Otto Jespersen recognised the hierarchical ordering of words in phrases such as a furiously barking dog, but (confusingly) concluded that the word classes concerned could be arranged in three ‘ranks’ so that ‘tertiary’ words such as furiously consistently attach to ‘secondary’ words like barking, which in turn attach to ‘primary’ words such as dog (Jespersen 1924, Pierce 2006). And on the other hand, Lucien Tesnière not only wrote a major theoretical discussion of dependency relations (published posthumously as Tesnière 1959), but also produced a simple diagramming system. He does not seem to have known about the Reed and Kellogg system, but he may have been influenced by the German grammarians who developed the idea of dependency, as well as the name, in the early nineteenth century (Forsgren 2006). His notation showed dependency relations more consistently and iconically than Reed and Kellogg’s, with dependents consistently written lower than the words on which they depend in a tree-diagram called a ‘stemma’, such as the one in Figure 2. Notice that the stemma has the same features as the Reed and Kellogg diagrams: it shows dependency but not precedence, and leaves phrases implicit in the word-word dependencies.

Figure 2: A stemma

Another European attempt to formalise the notion of government led to Categorial Grammar, but this time the development was driven by logic (Morrill 2006). A verb such as eat is incomplete in itself, and needs to combine with a following noun to produce a phrase such as eat grass; but this too is incomplete until it combines with a preceding noun to produce a phrase such as cows eat grass, which is complete. These notions of one word ‘needing’ another accurately reflect the old tradition of government, although they are extended to include subjects; but they are also extended in another direction to include adjuncts such as fresh, which is said to need a noun in order to combine with it and produce another noun – a case of the dependency being sanctioned by the dependent itself rather than by the head.

Categorial Grammar is sensitive to word order, but like Jespersen’s theory it bases the classification of words directly on their combinatorial needs rather than (as in traditional grammar) on a bundle of morphological, syntactic and semantic criteria. The ‘categories’ of Categorial Grammar replace the traditional word classes such as ‘noun’ and ‘verb’, so (at least in early versions of the theory) there is a category for intransitive verbs (N\S) and another for transitive verbs ((N\S)/N), but none for ‘verb’. On the other hand, the basis in logic allows a very simple translation from a syntactic structure to a logical semantic structure. Given the orientation to logic rather than pedagogy, it is unsurprising that there is no standard diagrammatic representation for syntactic structure in Categorial Grammar, comparable with Reed and Kellogg sentence diagrams or stemmas. Figure 3 uses an ad hoc notation which at least reflects the spirit of Categorial Grammar.

Figure 3: A categorial grammar analysis
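The mechanics behind Figure 3 can be shown in a schematic derivation of the same sentence – a minimal sketch, ignoring the many refinements of later versions of the theory. Here N\S is a category that needs an N on its left to yield S, and (N\S)/N needs an N on its right to yield N\S:

  cows: N      eat: (N\S)/N      grass: N
  eat grass:        (N\S)/N + N  =>  N\S
  cows eat grass:   N + N\S      =>  S

Since S is the category of a complete sentence, the derivation confirms that cows eat grass is complete, while eat grass (category N\S) still ‘needs’ a subject noun.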

Meanwhile, in the USA the main demand for syntactic theory came from descriptive linguists working on the local Native American languages, for whom the tradition developed for highly inflected, case-based languages such as Latin, Greek, Hebrew and Arabic proved hard to apply. Bloomfield’s reaction to the problem was to start from scratch, making the minimum of basic assumptions about how sentences were structured (Bloomfield 1933). The result was immediate-constituent analysis, in which the only relation needed, or allowed, in syntax is the whole-part relation between a phrase and its parts. When diagrams started to be used in works such as Nida’s analysis of English (Nida 1960), they were the tree diagrams which later became familiar through Chomsky’s work, as in Figure 4.

Figure 4: A phrase-structure tree

It is true that some of these early systems, including Nida’s, acknowledged the importance of the traditional grammatical relations such as ‘subject’ and ‘object’ by recognising them as sub-divisions of the whole-part relations; but these are still not true part-part relations like traditional government, and phrase structure, Chomsky’s purification of immediate-constituent analysis, left no room even for these concessions to traditional analyses.

When Chomsky was developing his ideas about syntactic structure, he was aware of Categorial Grammar but not, apparently, of other theories which treated government relations as basic (personal communication). The main models for his work were, paradoxically, the post-Bloomfieldian theories of Zellig Harris (Harris 1951), which he later attacked so vehemently, and the branch of mathematics called ‘formal language theory’ (and in particular the theory of recursive functions – Smith 1999:56). Government relations played little or no part in these models, so they were entirely absent from Chomsky’s earliest work, and it was only in 1965 that he recognised them through the introduction of the ‘subcategorization’ features discussed earlier. Since then, the part-part relations of government have increased in importance, but the framework of whole-part relations remains basic. It is unfortunate that it was possible to argue in the 1960s that dependency grammars were equivalent to phrase-structure grammars (Gaifman 1965, Robinson 1970), because this allowed syntacticians to conclude that dependency grammars could safely be ignored. In fact, the arguments only showed that one very limited version of dependency grammar was equivalent to an equally limited version of phrase-structure grammar; and in any case, the equivalence was only weak – in terms of the strings of symbols that could be generated – rather than the much more important strong equivalence of structural analyses, where dependency structures are clearly not equivalent to phrase structures.

During the decades since Chomsky’s generative grammar rose to prominence in syntactic theory, other approaches have also been growing and developing. Categorial Grammar has turned into the most popular option for logically oriented linguists, and has been combined with phrase structure in Head-driven Phrase Structure Grammar (Pollard and Sag 1994). A number of approaches have combined phrase structure with a traditional functional analysis (in terms of subjects, objects and the like), but interpreted as whole-part relations rather than as government relations between a head and its dependents:

  • Systemic Functional Grammar (Halliday 1985)
  • Relational Grammar (Blake 1990)
  • Functional Grammar (Dik 1991)
  • Lexical Functional Grammar (Bresnan 2001)
  • Role-and-Reference Grammar (Van Valin 1993)
  • Construction Grammar (Croft 2007, Fillmore and others 1988, Goldberg 1995, Tomasello 1998)

What these approaches share is the basic Bloomfieldian assumption that the foundation for syntactic structure is the whole-part relation between a phrase and its parts, rather than the part-part relation between a word and its dependents.

During the same period, however, the dependency-based tradition also developed a range of theories that is almost as diverse as the tradition based on phrase structure. Many of these theories are inspired by the formal developments of generative grammar and computational linguistics (Kruijff 2006), including at least the following (where some theories are given ad hoc names):

  • Abhängigkeitsgrammatik (for computing) (Kunze 1975)
  • Generative Dependency Grammar (Vater 1975, Diaconescu 2002)
  • Case Grammar (Anderson 1977)
  • Functional Generative Description (Sgall and others 1986)
  • Lexicase (Starosta 1988)
  • Abhängigkeitsgrammatik (for schools) (Heringer 1993)
  • Meaning Text Theory (Mel'cuk 2004)
  • Tree-Adjoining Grammar (Joshi and Rambow 2003)
  • Link Grammar (Sleator and Temperley 1993)
  • Dependency Parsing (Kübler and others 2009)
  • Catena Theory (Osborne and others 2012)

There is also a very productive research tradition called Valency Theory which focuses specifically on government relations (Allerton 2006, Herbst and others 2004) and whose key term, valency, was introduced into syntax by Tesnière.

The enormous diversity within the dependency tradition undermines most simple generalisations about ‘dependency grammar’ in relation to government. Instead of trying to survey the diversity, I shall present the version of dependency grammar that I have been developing since about 1980 under the name ‘Word Grammar’ (Gisborne 2010, Gisborne 2011, Duran-Eppler 2011, Sugayama 2003, Sugayama and Hudson 2006, Hudson 1984, Hudson 1990, Hudson 2007a, Hudson 2010). First, however, I start with a survey in section 2 of the kinds of information that could, and I believe should, be covered by the term government. Section 3 then introduces the relevant general characteristics of Word Grammar (including, of course, its use of word-word dependencies), and shows how these characteristics allow Word Grammar to accommodate government in all the diversity surveyed in section 2.

2. The scope of government and valency

Traditionally, government applied either to a word’s complement (e.g. a preposition governs its complement) or to that complement’s case (e.g. the Latin preposition de, ‘about’, governs the ablative case). However, the Arabic grammarians extended the same notion to the inflectional categories of dependent verbs, and it is easy to argue for a much wider application of the term to cover any word-complement pair. For instance, in the ‘DP’ analysis, a boy consists of a determiner followed by its complement, so a counts as the governor of boy. This extension is fully justified by the similarities across word-classes in the way they restrict their complements. And, of course, if we insist on fidelity to the tradition, we are left with a terminological gap: what do we call the relation between a word and its complement when the word is not a verb or preposition? Extending the term govern fills this gap perfectly, and removes the otherwise arbitrary historical restriction to verbs, prepositions and case.

Some dependency grammarians have extended the term in another direction, so that governor is simply the converse of dependent (e.g. Tarvainen 1987:76). This regrettable extension misses the main point of the idea (and terminology), which is that the governor controls both the presence and other properties of the dependent; in this sense, Cows eat grass constantly shows the verb eat governing grass but not constantly. If govern is extended to cover constantly as well as grass, then we need another term to distinguish them, so we may as well stick with govern for the relation between eat and grass, with constantly as its adjunct (or Tesnière’s circonstant). On the other hand, we do need a converse of the term dependent. One obvious possibility is head (the term I once used), but this term is used in phrase structure to relate a word to its containing phrase, rather than to its individual dependents; for example, in the phrase green grass, the word grass is head of green grass, and not of its dependent green. To avoid this confusion I now prefer the term parent, so in the pair green grass, grass is the parent of green, which is its dependent; and in Cows eat grass constantly, the verb eat is the parent of all the other words.

If the governor of an element is the word that controls its presence and properties, then two further extensions are needed to the traditional scope of government. On the one hand, it must include subjects as well as complements; after all, the finite verb eats demands a subject even more strongly than it demands a complement. With this extension, the scope of government includes all ‘arguments’ or (following Tesnière) ‘actants’. Whatever mechanism is responsible for providing a word with its correct complements can also be extended to deal with subjects.

The other extension is to parents (in the sense I introduced above, where a word’s parent is the word on which it depends). This extension is already established in all but name in Categorial Grammar, where (as explained earlier) adjuncts are elements that take their parent as argument; so constantly might be classified as a form that takes a verb as its argument to produce another (larger) verb. It is also justified by the facts, because dependents can ‘govern’ their parents in much the same way as vice versa. The reason why government was traditionally applied only to complements was that the classical languages Latin and Greek are both ‘dependent-marking’, marking most dependencies by case inflections on the dependent. But not all languages are like this, and ‘head-marking’ languages locate the marker of a dependency on the parent (Nichols 1986, Nichols 2006). In such a language, the dependent controls the inflectional form of the parent. Moreover, it is easy to find examples even in more familiar dependent-marking languages where a dependent selects its parent. For example, in many European languages a past participle selects between ‘have’ and ‘be’ as its auxiliary verb, but we all agree that the participle depends on the auxiliary. Similarly, many nouns select their preferred preposition – e.g. in school but at university – and once again, it is the governor in this relation that depends syntactically on the governed. And of course, even adjuncts are fussy about the words they can depend on, so very can depend on an adjective or an adverb, but not on a verb (very interesting, but not *very interest). And even more generally, the one thing we know about the syntax of almost every word except finite verbs is that it needs a parent – again a clear case of government.