A Computational Model of Human Affective Memory and Its Application to Mindreading

Hugo Liu

MIT Media Laboratory
20 Ames Street #320D
Cambridge, MA 02139, USA
+1 (617) 253-5334

ABSTRACT

The cognitive science and artificial intelligence communities are both interested in the problem of how humans infer the mental states of others, known as mindreading. Whereas cognitive science seeks a deeper understanding of how humans mindread, artificial intelligence is interested in imparting mindreading capabilities to social computers. Current AI approaches to mindreading are weak, however: techniques such as user profiling and collaborative filtering try to predict user preferences and actions, but only shallowly. In this paper, we propose a deeper model of a person in terms of their system of attitudes, and implement it in a system called PERSONA. Grounded in the episodic and reflexive memories of a person, PERSONA uses saliency-mediated associative learning to automatically acquire a human affective memory model from a corpus of personal text, such as a weblog. Applying this model, PERSONA performs affective mindreading to predict a person’s likely affective response to a new situation or event. In addition to memory-based prediction, the system analyzes the attitudes of a person’s Minskian imprimers and performs conceptual analogy to make predictions more robust. An evaluation of PERSONA indicates that it is a promising approach, comfortably outperforming baselines; however, because affective communication is fairly fail-hard, more refinement would be needed before this system can be applied to socializing a computer.

1. WHY IS MINDREADING INTERESTING?

Recently there has been much ado in the cognitive science community about the human faculty for Theory of Mind (ToM), otherwise known as mindreading. And no, it does not refer to psychic powers as one might guess. ToM and mindreading refer to an animal’s capability for reflecting on its own mental states – attitudes, beliefs, and desires – and modeling the mental states of others. It is believed that humans evolved specialized mindreading abilities absent in other primates (Povinelli and Preuss, 1995), and that the human mindreading faculty makes human social learning uniquely powerful – inter alia, the rapid learning of words (Bloom, 2002), and the learning of goals and values (Minsky, forthcoming). Cognitive scientists have gone about the study of mindreading in many ways, including: by evolutionary comparison, e.g. (Call and Tomasello, 1996); by examining linked phenomena like imitation (Meltzoff and Gopnik, 1993); by studying deficits of ToM in autistic children; by speculating on potential neural substrates for ToM such as mirror neurons (Gallese and Goldman, 1998); and by debating how it works, i.e. Simulation Theory of ToM versus Theory Theory of ToM.

Across the divide, artificial intelligence researchers are also thinking about mindreading. Being on the whole more pragmatic, however, this community is more interested in imparting mindreading capabilities to computers and robots to create more sociable human-computer interaction (Nass et al., 1994). While some results from the cognitive science literature are of interest to the AI community, such as the recent discovery of special action-recognition neurons called mirror neurons in macaque monkeys (Gallese et al., 1996), we think it is fair to say that behavioral and bottom-up approaches to mindreading are still far from producing a compelling, predictive cognitive model that could empower social computers.

Despite lacking a complete cognitive model of mindreading, the AI community has been working on weaker forms of mindreading for many years. User modeling, for example, attempts to model a human user’s preferences and mental context in hopes of creating more natural and personal interactions between human and computer. One common approach in user modeling is user profiling, whereby users are modeled by their demographic information, usually obtained via explicit questionnaires. Applying a small set of rules, these user demographics can be mapped into predicted user preferences. Another common approach in user modeling is collaborative filtering, in which patterns of user actions are modeled against those of a whole user community. While these forms of user modeling have enjoyed some success, particularly in product recommendation (Resnick and Varian, 1997), these approaches are too weak to be useful for socializing computers. User profiling oversimplifies people as obeying demographic lines, while collaborative filtering is a purely statistical approach offering little insight into a user’s beliefs or preferences; thus, user profiling and collaborative filtering are weak mindreaders.

In this work, we explore how mindreading can be deepened in a novel way: by considering knowledge of a person’s life experiences over a long period of time, and applying this knowledge to predict how that person might respond in new situations.

In order to create a more complete and more intimate model of a person, we need a corpus of knowledge about that person’s beliefs, desires, goals, and experiences. While it may be possible to acquire this directly through interactions with the user, building a sufficiently rich model of a person might require cumbersome interactions; thus the approach taken by this work is to infer such a model automatically from personal texts such as a journal, or a transcript of a person’s beliefs and ideas as might be manifested in an interview. Through our initial experience, we realized that specific beliefs and goals would be too hard to infer accurately from unconstrained natural language text. Rather than sacrifice breadth of knowledge for specificity, we decided to infer only the emotions, attitudes, and dispositions associated with these beliefs and goals. Using this body of knowledge, we construct a mechanism to predict the affective context of a person in reaction to a topic, situation, or event. We dub this task affective mindreading, with affect referring to emotions, dispositions, and attitudes. If successful, we believe that this type of mechanism could have significant implications for sociable computers.

Our approach can be summarized as follows. From text, we wish to infer a person’s emotions, attitudes, and dispositions toward particular people, topics, events, and situations, at different times in their lives, and to record these into a model of a person’s affective memory. By interpolating and extrapolating from this affective memory, a computer can perform affective mindreading – that is to say, given a new topic, event, or situation, the system will try to predict a person’s affective response. To implement this approach, we built PERSONA, a system that creates a model of a person’s affective memory from personal texts, and exploits this model for affective mindreading.

The rest of this paper is organized as follows. First, we present a computational model of human affective memory, a model of saliency-mediated associative learning from personal texts, and discuss the implementation of the PERSONA model learner. Second, we explore how the affective memory model is used in conjunction with conceptual analogy to perform affective mindreading. Third, we describe an experiment to evaluate PERSONA in an affective mindreading task. Fourth, we reconnect with the literature and address how affective mindreading aids social learning tasks in humans and computers.

2. A MODEL OF AFFECTIVE MEMORY

In the previous section, we motivated the development of a computational model of human affective memory by suggesting that such a model would allow for more advanced mindreading by computers than can be achieved through typical knowledge-impoverished user modeling techniques such as profiling. We begin this section with the caveat that the computational model described here is not claimed or intended to be cognitively motivated. We attempt to model human affective memory only insofar as it is feasible to infer from personal texts, and only insofar as it is useful to the task of affective mindreading – predicting a person’s attitudes and dispositions toward a particular subject. In this section, we first propose the two-part episode-reflex model of human affective memory and connect it to the literature. Second, we introduce saliency-mediated associative learning as a strategy for automatic model acquisition from personal texts. Third, we discuss how such a model has been implemented in PERSONA.

2.1 The Episode-Reflex Model of Human Affective Memory

Of the different types of human memory that have been studied, two are of great interest to us as tools for modeling affective memory: long-term episodic memory and reflexive memory. In PERSONA, we combine the strengths of these two memories to form the episode-reflex model.

2.1.1 Affective long-term episodic memory

Long-term episodic memory (LTEM) is a relatively stable memory based on experiences and events in context. An episode can be thought of as a coherent packet of events with a time-sequence. Episodes are generally content-addressable, meaning that they can be retrieved through a variety of cues based on the sensory, affective, or semantic content of the episode, such as a sight, sound, emotion, or location. LTEM can be very powerful because even events which happen only once can become salient memories and serve to recurrently influence a person’s future thinking. If we hope to accurately predict a person’s affective response to a future situation, we must account for the influence of these one-time salient episodes. Even though our aim is to model only the affective aspect of human memory, we cannot, in the case of LTEM, completely disregard the non-affective aspects because they may serve as cues for retrieval. Consequently, our affective LTEM model represents episodes with some semantic structure and several types of context. In PERSONA, an affective LTEM episode has the following components:

A sequentially ordered collection of the subevents of an episode that are salient to the evocation of the episode’s overall affect
The perceived root cause of the affective response in that episode, if it can be extracted
Possibly salient contexts: the date, the location, the topic
An affect valence score associated with the episode
A salience score for the episode, measuring the perceived importance of the memory
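The episode components listed above can be pictured as a simple record type. The sketch below is a minimal illustration only, assuming Python; the field names, the assumed valence scale of [-1, 1], and the overlap-counting `recall` helper are hypothetical stand-ins for content-addressable retrieval, not PERSONA’s actual implementation.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Episode:
    """One affective LTEM episode (hypothetical field names)."""
    subevents: List[str]               # salient subevents, in temporal order
    root_cause: Optional[str] = None   # perceived root cause, if it could be extracted
    when: Optional[date] = None        # salient context: date
    location: Optional[str] = None     # salient context: location
    topic: Optional[str] = None        # salient context: topic
    valence: float = 0.0               # affect valence of the episode (assumed in [-1, 1])
    salience: float = 0.0              # perceived importance of the memory

    def matches(self, cues: set) -> int:
        """Count retrieval cues that overlap the episode's contents and contexts.

        A crude stand-in for content-addressable retrieval: the more the current
        situation shares with the encoding context, the stronger the match.
        """
        contents = set(self.subevents)
        contents.update(c for c in (self.root_cause, self.location, self.topic) if c)
        if self.when is not None:
            contents.add(self.when.isoformat())
        return len(contents & cues)


def recall(memory: List[Episode], cues: set) -> Optional[Episode]:
    """Return the best-matching episode, breaking ties by salience; None if nothing matches."""
    scored = [(ep.matches(cues), ep.salience, i) for i, ep in enumerate(memory)]
    best = max(scored, default=None)
    if best is None or best[0] == 0:
        return None
    return memory[best[2]]
```

Storing the contexts alongside the subevents is what lets a cue such as a location alone retrieve an episode, in the spirit of the encoding specificity hypothesis discussed next.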

Extracting only the salient subevents and the perceived root cause of the episode makes learning more precise; the motivation for this is discussed further in Section 2.2. In addition to describing the thematic structure of the episode, we also encode several other types of contextual cues with it. As suggested by Tulving’s encoding specificity hypothesis (1983), retrieval of an episode is more likely when current conditions match the encoding conditions; thus it is important to remember the salient contexts surrounding an episode as completely as possible. Finally, because our main focus is on recalling the attitudes experienced during an episode, we associate an affect valence score with each episode; this score is described in a later subsection.

2.1.2 Affective reflexive memory

While long-term episodic memory deals in salient, one-time events and must generally be consciously recalled, reflexive memory is full of automatic, instant, almost instinctive associations. Whereas LTEM is content-addressable and requires pattern-matching the current situation with that of the episode, reflexive memory is like a simple hash-table that directly associates a cue with a reaction, thereby abstracting away the content. Tulving equates LTEM with “remembering” and reflexive memory with “knowing” (Tulving, 1983).

In humans, reflexive memories are generally formed through repeated exposures rather than one-time events, though subsequent exposures may simply be recalls of a particularly strong primary exposure (Locke, 1689). Besides the frequency of exposures, the strength of each experience also matters. Complementing the event-specific affective LTEM with an event-independent affective reflexive memory makes sense because there may not always be a distinct episode that shapes our appraisal of a situation; often we react reflexively, our present attitudes deriving from an amalgamation of past experiences now collapsed into something instinctive.

Because humans undergo forgetting, belief revision, and theory change, update policies for human reflexive memory may actually be quite complex. In PERSONA, we adopt a simpler representation and update policy that is not cognitively motivated but instead exploits the ability of a computer system to compute an affect valence at runtime. Each entry in the memory is structured as follows:

The key to the entry is one of two types:
1) A simple conceptual cue whose semantic type belongs to the following ontology: a person, an action, an object, an activity, or a named event; or
2) A simple conceptual cue Bayesian conditioned on the presence of a discourse topic.
The value of the entry is a list of exposures.
An exposure X is the following triple:
date of exposure;
affect valence score of exposure, V;
saliency of exposure, S.
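Since the text likens reflexive memory to a hash table, the entry format above maps naturally onto a dictionary. The sketch below assumes Python; the `ReflexiveMemory` class, the `ONTOLOGY` set spelling, and the choice to store a topic-conditioned exposure under both the plain key and the `(concept, topic)` key are hypothetical design choices for illustration, not a description of PERSONA’s code.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, NamedTuple, Optional, Tuple

# Semantic types allowed to key an entry (names are illustrative).
ONTOLOGY = {"person", "action", "object", "activity", "named_event"}


class Exposure(NamedTuple):
    when: date       # date of exposure
    valence: float   # affect valence score of exposure, V
    salience: float  # saliency of exposure, S


# Key type 1: (concept, None). Key type 2: (concept, topic), a topic-conditioned cue.
Key = Tuple[str, Optional[str]]


class ReflexiveMemory:
    def __init__(self) -> None:
        # A hash table from conceptual cue to its list of exposures.
        self.entries: Dict[Key, List[Exposure]] = defaultdict(list)

    def record(self, concept: str, sem_type: str, exposure: Exposure,
               topic: Optional[str] = None) -> None:
        if sem_type not in ONTOLOGY:
            return  # only ontology-typed concepts may form affective associations
        self.entries[(concept, None)].append(exposure)
        if topic is not None:
            # Assumed policy: also file the exposure under the topic-conditioned key.
            self.entries[(concept, topic)].append(exposure)

    def exposures(self, concept: str, topic: Optional[str] = None) -> List[Exposure]:
        # .get() avoids creating empty entries in the defaultdict on lookup.
        return list(self.entries.get((concept, topic), []))
```

Keying on `(concept, topic)` lets the same concept carry different valences under different discourse topics, while the unconditioned key still accumulates every exposure.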

To read off the current valence associated with a conceptual cue, the formula given in Eq. (1) is applied.

(1)

where n = the number of exposures of the concept

This formula returns the valence of a conceptual cue averaged over a particular time period. One term of the formula rewards the frequency of exposures, while another rewards the saliency of each exposure. In this simple model of affective reflexive memory, we do not consider phenomena such as belief revision, reflexes conditioned over contexts, or forgetting.
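To illustrate the two properties described above, here is a hypothetical readout function. It is an assumed stand-in, NOT a reproduction of Eq. (1): it weights each valence V by its saliency S, averages over a date window, and scales the result by n / (n + 1) so that more exposures yield a stronger valence. The function name, the window arguments, and the n / (n + 1) frequency factor are all illustrative assumptions.

```python
from datetime import date
from typing import Iterable, NamedTuple, Optional


class Exposure(NamedTuple):
    when: date
    valence: float   # V, assumed in [-1, 1]
    salience: float  # S, assumed in [0, 1]


def read_valence(exposures: Iterable[Exposure], start: date, end: date) -> Optional[float]:
    """Hypothetical stand-in for Eq. (1), not the paper's formula.

    Within the window [start, end]:
      * each V is weighted by its saliency S (rewards salient exposures);
      * the weighted mean is scaled by n / (n + 1), which grows toward 1
        as the number of exposures n grows (rewards frequent exposures).
    Returns None when no exposure with positive saliency falls in the window.
    """
    window = [x for x in exposures if start <= x.when <= end]
    n = len(window)
    total_s = sum(x.salience for x in window)
    if n == 0 or total_s == 0:
        return None
    weighted_mean = sum(x.valence * x.salience for x in window) / total_s
    return weighted_mean * n / (n + 1)
```

Under this assumed scaling, a single exposure with V = 0.6 and S = 1.0 reads as about 0.3, while two identical exposures read as about 0.4, so repetition strengthens the reflex without ever exceeding the underlying valence.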

In summary, we have motivated and characterized two components to our computational model of human affective memory: an episodic component emphasizing the affect of one-time salient memories, and a reflexive component, emphasizing instinctive reactions to conceptual cues that are conditioned over time. In the following subsection, we propose how this two-part model of human affective memory can be acquired from personal texts via saliency-mediated associative learning.

2.2 Saliency-Mediated Associative Learning

With origins in Aristotle, classical associative learning was popularized as an explanation of many brain processes, beginning in the 17th century, by several British philosophers, including John Locke and James Mill. However, after the rise and fall of Pavlovian classical conditioning, many in the cognitive science community now dismiss associative learning as inadequate. In the study of word learning in children, Paul Bloom reported that, contrary to Locke’s assertion that repetition is necessary to associate words with sights and sounds, children actually learn word meanings error-free and without repetition, in a process dubbed fast mapping (Bloom, 2002). While it may seem that associative learning is being debunked as a plausible theory of cognitive learning, we suggest that associative learning can in many cases be salvaged, provided that it is appropriately structured. In Bloom’s research on word learning in children, for example, error-free fast mapping is possible because the child uses the teacher’s mental and intentional context to disambiguate reference; once reference is disambiguated with sufficient confidence, the association between word and meaning can be made with greater confidence. Sharing this skepticism of weak associationism, Marvin Minsky argues that simply remembering everything is not equivalent to learning: the defining criterion for learning is knowing precisely what is learned (Minsky, forthcoming). Formulated another way, learning involves credit assignment (Sutton, 1984).

The lesson to be learned from this (pun intended) is that associative learning is not useful unless it is precise. In other words, our learning mechanism should not rely solely on semantically weak statistical methods, but should instead incorporate some external knowledge and heuristics to gain additional precision. In particular, we see the identification of saliency and salient events as a mechanism to focus associations. We dub this saliency-mediated associative learning (SMAL). SMAL is similar to credit assignment, except that salience is a heuristically generated score rather than an assertion, which makes it amenable to statistical learning methods.
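To make the contrast between weak associationism and SMAL concrete, the sketch below compares two toy learners, assuming Python. The observation format, the default salience threshold of 0.5, and salience-proportional weighting are illustrative assumptions, not PERSONA’s documented procedure; the salience scores themselves are taken as given, standing in for the heuristics the text mentions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One observation: the concepts mentioned in a passage, each with a heuristic
# salience score, plus the affect valence of the passage as a whole.
Observation = Tuple[Dict[str, float], float]


def naive_associate(observations: List[Observation]) -> Dict[str, float]:
    """Unweighted associationism: every co-occurring concept gets the full valence."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for concepts, valence in observations:
        for concept in concepts:
            totals[concept] += valence
            counts[concept] += 1
    return {c: totals[c] / counts[c] for c in totals}


def smal_associate(observations: List[Observation], threshold: float = 0.5) -> Dict[str, float]:
    """Salience-mediated variant: only concepts at or above `threshold` learn,
    and each contributes in proportion to its salience."""
    totals: Dict[str, float] = defaultdict(float)
    weights: Dict[str, float] = defaultdict(float)
    for concepts, valence in observations:
        for concept, salience in concepts.items():
            if salience < threshold or salience <= 0:
                continue  # low-salience mentions receive no credit
            totals[concept] += salience * valence
            weights[concept] += salience
    return {c: totals[c] / weights[c] for c in totals}
```

For example, if a sad passage about a funeral incidentally mentions a sandwich, the naive learner gives "sandwich" a share of the sadness, whereas the salience-mediated learner leaves it unassociated and credits only the salient concept.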

The learning mechanism of each of the two parts of the proposed affective memory model incorporates saliency to focus learning.

In the affective long-term episodic memory model, affect is associated with particularly salient subevents rather than with the episode as a whole. In addition, the perceived root cause of the affective response in the episode is extracted or inferred when possible. Finally, a saliency score is given to the whole episode to rate its importance and impact to the person being modeled. Together these three features focus the associative learning mechanism and help to answer the question, “what should be learned?” Of course, identifying saliency, being a flavor of the credit assignment problem, is not an easy task, especially over domain-unconstrained texts. In the next subsection, we explain the role that a large common sense knowledge base plays in this important subtask.

In the affective reflexive memory model, associations are not made at the word level, which would tend to conflate the affect of many different senses of a word into the same entry; rather, conceptual cues are first-order or second-order phrases that follow the ontology: a person, an action, an object, an activity, or a named event. The choice of ontology reflects the types of salient concepts that we believe people typically form stable attitudes about. In addition, to embrace the possibility that concepts may have different affect valences under different contexts, an entry in the affective reflexive model may be keyed on a concept Bayesian conditioned on a particular discourse context. The difficulty in identifying the contexts that dictate a conceptual cue’s interpretation is discussed further in the evaluation of PERSONA. Finally, each exposure is associated with a saliency score, and conceptual cues with more exposures are assigned more salient valence scores. By constraining the types of concepts that can learn affective associations, by considering contexts in learning affect associations, and by weighting exposures by their strength and frequency, the reflexive memory model seeks to incorporate as much precision as possible in its associative learning.

Having proposed the episode-reflex model of human affective memory and saliency-mediated associative learning as a mechanism for model acquisition, we next discuss how the model and learning mechanism were implemented in PERSONA.

2.3 Model Implementation in PERSONA

In proposing the model and learning mechanism, several subtasks were implied but not addressed explicitly: 1) a source of personal texts meeting certain suitability criteria, 2) a model for measuring affect valence, 3) a mechanism for judging the affect of episodes and of text in general, and 4) methods for determining saliency. These implementation issues are discussed in the ensuing subsections, followed by a start-to-finish architectural walkthrough of PERSONA’s model learner.