Bubble Networks: A Context-Sensitive
Lexicon Representation

Hugo Liu

MIT Media Laboratory

20 Ames St., E15-320D

Cambridge, MA 02139, USA

Abstract. The paradigm of packing semantics into words has long been the dogma of lexical semantics. As computational semantics is maturing as a field, we are beginning to see the failure of this paradigm. Pustejovsky’s Generative Lexicon is only able to have generative powers if the semantics packed within a word can anticipate possible combinations with other words. However, even an idealized Generative Lexicon will have to grapple with the effects of implicit, non-lexical context (e.g. topics, commonsense knowledge) on meaning determination. In this paper, we propose an intrinsic, connectionist network representation of a lexicon called a bubble network. In a bubble network, meaning is a result of graph traversal, from some word-concept node toward a context (e.g. “wedding” in the context of “ritual”), or toward another lexical item (e.g. “fast car”). Possible meanings are disambiguated using a modified spreading activation function which incorporates ideas of structural message-passing and active contexts. An encapsulation mechanism allows larger lexical expressions and assertional knowledge to be incorporated into the network, and along with the notion of utility-based learning of weights, helps to give a more natural account of lexical evolution (acquisition, deletion, generalization, individuation). A preliminary implementation and evaluation of bubble networks show that lexical expressions can be interpreted with remarkable context-sensitivity, but also point to some pragmatic problems in lexicon building using a bubble network.

1 Introduction

Semantic frames have been a dominant knowledge representation in artificial intelligence because they can encapsulate the entirety of an idea, whether through exhaustive specification or a priori definition. In extremely rich domains such as commonsense, they have been used to encapsulate objects, events, and functions (as in Lenat's Cyc project (1995)). However, when reasoning with frame encapsulations in rich environments, it can be difficult to decide how to identify and apply the relevant aspects of a frame to any given situation. This is closely related to the classic AI frame problem, first posed by McCarthy and Hayes (1969) in the context of the situation calculus formalism.

Likewise in computational semantics, within the non-statistical camp, there is an increasing trend toward representing word meanings with richly formulated frames, the idea being that a richer lexicon solves more problems. Arguably, the flagship semantic theory of this view is Pustejovsky's Generative Lexicon Theory (GLT) (1991). In GLT, many different semantic structures are packed into a lexical item. These linguistic representations include logical argument structure, event structure, qualia structure, and lexical inheritance structure. By packing this practical hodgepodge of semantics into each lexical item, GLT postulates that lexical items, when combined, will interact with and coerce one another with enough power to generate larger lexical expressions with highly nuanced meanings. We would certainly agree that such a lexicon, being richer, would exhibit more “generative” behavior than most lexicons that have been developed in the past, such as Fellbaum et al.'s WordNet project (1998). Nonetheless, we see several problems with the GLT approach that prevent it from reflecting how human minds handle context and semantic interpretation in a more flexible way.

1.  Packing semantics into lexical items is a difficult proposition, because it is hard to know when to stop. In WordNet, formal definitions and examples are packed into words, organized into senses. In GLT, the qualia structure of a word tries to account for its constitution (what it is made of), its formal definition, its telic role (its function), and its agentive role (how it came into being). But how, for example, is one to decide which uses of an object should be included in the telic role, and which uses should be left out?

2.  Not all relevant semantics is naturally lexical in nature. A lot of knowledge arises out of larger contexts, such as world semantics, or commonsense. For example, the proposition “cheap apartments are rare” can only be considered true under a set of default assumptions, and if true, has a set of entailments (its interpretive context) which must also hold, such as “they are rare because cheap apartments are in demand” and “these rare cheap apartments are in low supply.” Although it is non-obvious how such non-lexical entailments bear on the semantic interpretation of the proposition, we will show later that these contextual factors heavily nuance the interpretation of, inter alia, the word “rare.” One inference we might draw from the design of GLT is that the context of a semantic description must arise solely from the combination and intersection of the explicit words used to form the lexical expression. The implication for GLT, then, is that not all practically relevant meaning can be generated from lexical combination alone.

3.  In the GLT account, the lexicon is rather static, which we know not to be true of human minds. Namely, GLT does not give a compelling account of the lexicon’s own evolution. How can new concepts arise in the lexicon (lexical acquisition)? How can existing concepts be revised? How can concepts be combined (generalization), or split apart (individuation)? How can unnamed concepts, kinds, or immediate or transitory concepts interact with the lexicon (conceptual unification)?

4.  The lexicon should reflect an underlying intrinsic (within-system semantics) conceptual system. However, whereas the conceptual system of human minds has connectionist properties (conceptual recall times reflect mental distances between concepts; semantic priming (Becker et al., 1997)), GLT does not exhibit any such properties. Indeed, connectionist properties are arguably the key to success in forming contexts. Contexts can be formed quite dynamically by spreading activation over a conceptual web.

The intention of scrutinizing GLT in a critical way is not to discount the practical engineering value of building elaborate lexicons, nor to single out GLT among lexicon-building efforts; in fact, much of this criticism applies to all traditional lexicons and frame systems. In this paper, we examine the weaknesses of frames and traditional lexicons, especially on the points of context and flexibility. We then use this critical evaluation to motivate a proposal for a novel knowledge representation called a bubble network.

A bubble network is a connectionist network (also called a semantic network or conceptual web in the literature), specially purposed to represent a lexicon. With a small set of novel properties and processes, we describe how a bubble network lexicon representation handles some classically challenging semantic tasks with ease. We show how bubble networks can be thought of as subsuming the representational power of frames, first-order predicate logic, inferential reasoning (deductive, inductive, and abductive), and defeasible (default) reasoning. The network also exhibits some nice generative properties, subsuming the generative capabilities of GLT. However, the most desirable property of the bubble network is its seamless handling of context, in all its different forms. Lexical polysemy (nuances in word meaning) is handled naturally (compare this to the awkwardness of WordNet senses), as are the tasks of lexical evolution (acquisition, generalization, individuation), conceptual analogy, and defining the interpretive context of a lexical expression. The power of a bubble network derives from its duality as a connectionist system with intrinsic semantics and a symbolic reasoning system with some (albeit primitive) inherent notion of syntax.
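As a first, informal illustration only (the actual representation is developed in the sections that follow), the Python sketch below shows one way such a lexicon could be organized as a weighted conceptual graph, with word-concept nodes, context nodes, and adjustable link weights. All class names, fields, and weights are hypothetical and serve merely to fix intuitions.

    from dataclasses import dataclass, field

    # Illustrative only: word-concept nodes and context nodes in a weighted
    # conceptual graph, with adjustable link weights. All names and numbers
    # here are hypothetical; the paper's actual bubble network representation
    # is developed later.

    @dataclass
    class Node:
        label: str                                  # e.g. "wedding", "ritual"
        kind: str = "word"                          # "word", "context", or "unnamed"
        edges: dict = field(default_factory=dict)   # neighbor label -> weight

    @dataclass
    class BubbleLexicon:
        nodes: dict = field(default_factory=dict)

        def add_node(self, label, kind="word"):
            self.nodes.setdefault(label, Node(label, kind))

        def link(self, a, b, weight):
            # Symmetric, adjustable weights; changing them over time is what
            # supports utility-based learning.
            self.add_node(a)
            self.add_node(b)
            self.nodes[a].edges[b] = weight
            self.nodes[b].edges[a] = weight

    net = BubbleLexicon()
    net.add_node("ritual", kind="context")
    net.link("wedding", "ritual", 0.8)   # "wedding" interpreted toward the context "ritual"
    net.link("fast", "car", 0.6)         # or toward another lexical item, as in "fast car"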

Many hybrid connectionist/symbolic systems have been proposed in the cognitive science and artificial intelligence literatures to model cognitive representations. Such systems have tackled, inter alia, belief encoding (Wendelken & Shastri, 2002), concept encoding (Schank, 1972), and knowledge representation unification (Gallant, 1993). Our attempt is motivated by a desire to design a lexicon that is flexible and naturally apt with context. Our goal is not to solve the connectionist/symbolic hybridization problem; it is to demonstrate a new perspective on the relationship between lexicon and context: how an augmented connectionist network representation of a “lexicon” can naturally address the context problem for lexical expressions. We hope that our presentation will motivate further re-evaluation of the establishment dogma of lexicons as semantically packed words, and inspire more work on reinventing the lexicon to be more pliable to contextual phenomena.

Before we present the bubble network, we first revisit connectionist networks and identify those properties which inspired our representation.

2 Connectionist Networks

The term connectionist network is used broadly to refer to a number of more specific knowledge representations, such as neural networks, the multilayer perceptron (Rosenblatt, 1958; Minsky & Papert, 1969), spreading activation networks (Collins & Loftus, 1975), semantic networks, and conceptual webs (Quine & Ullian, 1970). The property common to these systems is that a network consists of nodes connected to one another through links representing correlation. For the purpose of situating our bubble network in the literature, we characterize and discuss three desirable properties of connectionist networks that are useful to our venture: learning, representing dependency relations, and simulation.

2.1 Learning

Adaptive weights. The tradition of learning networks most prominently features neural networks, which arguably best exemplify the principles of connectionism. In neural networks, typically a multilayer perceptron (MLP), nodes (or perceptrons) are de-emphasized in favor of the weighted edges connecting them, which give the network a notion of memory and intrinsic (within-system) semantics. Such a network is fed a set of features at its input layer of nodes, passes activation through any number of hidden layers, and outputs a set of concepts. Various methods of backpropagation adjust the weights along the edges in order to produce more accurate outputs. Because connecting edges carry a trainable numerical weight, neural nets are very apt for learning.
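To make the adaptive-weight idea concrete, the sketch below implements a minimal two-layer perceptron in Python with NumPy. It is purely illustrative: the layer sizes, learning rate, and toy data are our own assumptions, not anything prescribed by the bubble network; the point is simply that the system's memory resides in the trainable edge weights rather than in the nodes themselves.

    import numpy as np

    # Hypothetical sizes and data, for illustration only.
    rng = np.random.default_rng(0)
    n_features, n_hidden, n_concepts = 4, 8, 3
    W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))   # input -> hidden weights
    W2 = rng.normal(scale=0.1, size=(n_hidden, n_concepts))   # hidden -> output weights

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x):
        h = sigmoid(x @ W1)   # hidden-layer activations
        y = sigmoid(h @ W2)   # output "concept" activations
        return h, y

    def train_step(x, target, lr=0.5):
        # One backpropagation step: the adjustable edge weights, not the
        # nodes, carry the network's memory.
        global W1, W2
        h, y = forward(x)
        err_out = (y - target) * y * (1.0 - y)       # output-layer error signal
        err_hid = (err_out @ W2.T) * h * (1.0 - h)   # error propagated back to hidden layer
        W2 -= lr * np.outer(h, err_out)
        W1 -= lr * np.outer(x, err_hid)

    x = np.array([1.0, 0.0, 1.0, 0.0])   # toy feature vector
    t = np.array([1.0, 0.0, 0.0])        # toy target "concept" pattern
    for _ in range(500):
        train_step(x, t)
    print(forward(x)[1])                  # output activations approach the target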

The idea of learning via adapting weights is also found in belief networks (Pearl, 1988), where weights represent confidence values; causal networks (Rieger, 1976), where weights represent the strength of implications; Bayesian networks, whose weights are conditional probabilities; and spreading activation networks, where activation energies decay as they are passed from node to node along weighted edges.

We are interested in applying the idea of changeable weights to lexical semantics because such weights provide an intrinsic, learnable basis for defining semantic context by spreading activation; they also support lexical evolution tasks such as acquisition, generalization, and individuation.
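The following sketch illustrates plain spreading activation over a toy weighted lexical graph. The graph, decay factor, and threshold are invented for illustration and this is not the modified activation function proposed later in the paper; activation spread from a word-concept node leaves a set of energized neighbors that approximates that node's active context.

    from collections import defaultdict

    # A toy weighted lexical graph; the words, weights, decay factor, and
    # threshold are invented for illustration.
    graph = {
        "wedding": [("ritual", 0.8), ("bride", 0.9), ("cake", 0.5)],
        "ritual":  [("ceremony", 0.9), ("tradition", 0.7)],
        "bride":   [("dress", 0.6), ("groom", 0.9)],
        "cake":    [("dessert", 0.8), ("candle", 0.3)],
    }

    def spread(source, initial_energy=1.0, decay=0.6, threshold=0.1):
        # Propagate decaying activation outward from a word-concept node; the
        # nodes left above threshold approximate the node's active context.
        activation = defaultdict(float)
        frontier = [(source, initial_energy)]
        while frontier:
            node, energy = frontier.pop()
            activation[node] += energy
            for neighbor, weight in graph.get(node, []):
                passed = energy * weight * decay   # energy decays at each hop
                if passed > threshold:
                    frontier.append((neighbor, passed))
        return {n: round(e, 3) for n, e in activation.items() if e >= threshold}

    print(spread("wedding"))   # "ritual", "bride", "ceremony", ... receive activation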

De-emphasizing nodes. As in the MLP representation, the de-emphasis of nodes themselves in favor of their connections to other nodes also supports lexical evolution, because nodes can be indexical features corresponding to named lexical items, to kinds, to unnamed features, or to contexts (to be explicated later). In fact, the purpose of a node may simply be to provide a useful point of reference and to bind together a set of descriptive features which describe the referent (see Jackendoff (2002) for a more refined discussion of indexical features and reference). One potentially important role played by nodes is to provide points for conceptual alignment across individuals. Laakso and Cottrell (2000) and Goldstone and Rogosky (2002) have demonstrated that it is possible to find conceptual correspondences across two individuals using inter-conceptual similarities between individualized conceptual web representations, without the need for total conceptual identity. This suggests that viewing a lexical item as a node that is neither precisely and formally defined nor extrinsically defined (even though two individuals may call it by the same “word”) still supports the faculties of language as a means of communication. The lexicon as a conceptual network distinct to each individual does not take away from language's communicative efficacy. However, this is not to say that formal lexical definition plays no role. There is an acknowledged external component to lexical meaning, whether coming from social contexts (Wittgenstein, 1968), the community (Putnam, 1975), or formalized as being free from personal associations (Frege, 1892). Such external meaning can be learned by and incorporated into an individual's conceptual network through conceptual alignment and reinforcement.

Despite the proposition that each individual maintains his or her own conceptual network that is not identical across individuals, it is still possible, and indeed a worthwhile venture, to create a common conceptual network lexicon that can be assumed to be very similar to the conceptual network possessed by any reasonable person within some culture and population (since meaning converges in language and culture). Similarly, the conceptual networks within some culture and population can possess non-lexical knowledge that constitutes “common sense.” The intent of our paper is to demonstrate how a common lexicon can be represented using a conceptual network, and also how non-lexical commonsense knowledge integrates tightly with lexical knowledge.

Generalization. Having discussed only adaptive weights as a learning method for networks, we should add that two other methods important to our venture are rote memory (knowledge is accumulated by directly adding it to the network) and restructuring (generalization). Case-based reasoning (Kolodner, 1993; Schank & Riesbeck, 1994) exemplifies rote memory learning. Cases are first added to the knowledge base, and later, generalizations are produced based on a notion of case similarity. In the case of our conceptual network, we will introduce graph traversal as a notion of similarity. One issue of relevance not well covered in the literature is negative learning over graphs, or learning how to inhibit concepts and prune contexts. The most relevant work is Winston's (1975) “near-miss” arch learning. His program learned to generalize a relational graph given positive, negative, and near-miss examples. In our bubble network, conceptual generalization is a consequence of a tendency toward activation economy, the minimization of the activation energies needed in graph traversals.
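To suggest what graph traversal as a notion of similarity might look like, the sketch below scores two concepts by the most activation-economical path between them over a toy directed graph. The graph, per-hop decay, and max-product search are illustrative assumptions, not the paper's actual formulation.

    import heapq

    # A toy directed graph of weighted conceptual links; all values are
    # invented for illustration.
    graph = {
        "wedding": [("ritual", 0.8), ("bride", 0.9)],
        "ritual":  [("ceremony", 0.9), ("tradition", 0.7)],
        "bride":   [("groom", 0.9)],
        "funeral": [("ritual", 0.7), ("mourner", 0.9)],
    }

    def similarity(a, b, decay=0.9):
        # Max-product variant of Dijkstra's algorithm: the similarity of a and
        # b is the activation surviving along the most economical path.
        best = {a: 1.0}
        heap = [(-1.0, a)]   # negate activations so heapq behaves as a max-heap
        while heap:
            neg_act, node = heapq.heappop(heap)
            act = -neg_act
            if node == b:
                return act
            if act < best.get(node, 0.0):
                continue     # stale queue entry
            for neighbor, weight in graph.get(node, []):
                new_act = act * weight * decay
                if new_act > best.get(neighbor, 0.0):
                    best[neighbor] = new_act
                    heapq.heappush(heap, (-new_act, neighbor))
        return 0.0

    print(similarity("wedding", "ceremony"))   # ~0.58: an economical two-hop path
    print(similarity("funeral", "ceremony"))   # ~0.51: a slightly costlier path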

2.2 Representing dependency relations

Conceptual Dependency. Connectionist networks have also been used extensively as a platform for symbolic reasoning. Frege's (1879) tree notation for first-order predicate logic (FOPL) and Peirce's (1885) existential graphs were early attempts to represent dependency relations between concepts, but were more graphical drawings than machine-readable representations. In existential graphs, arbitrary predicates, which would otherwise be concept nodes, were instead used as relations labeling edges. We do need to label the edges between nodes with dependency relations for reasoning. The key is to decide on a set of edge labels that is neither too small nor too large, and is most appropriate to the task. Arbitrary or object-specific predicate labels might be appropriate for closed domains, as in the case of Brachman's (1979) description logic system, Knowledge Language One (KL-ONE), but if we are to represent something as rich and diverse as a lexicon, arbitrary dependency labels make generalized reasoning very difficult. Some have tried to enumerate a set of conceptually prime relations, including Silvio Ceccato's (1961) correlational nets with 56 different relations and Margaret Masterman's (1961) semantic network based on 100 primitive concept types, with some success. While such creations are no doubt richly expressive, there is little agreement on these primitive types, and motivating a lexicon with an a priori and elaborate set of primitives lends only inflexibility to the system. On the opposite end of the spectrum, overly impoverished relation types, such as the nymic subtyping relations of WordNet, severely curtail the expressive power of the network.