Computing point-of-view

By Hugo Liu

Thesis Proposal for the degree of Doctor of Philosophy

at the Massachusetts Institute of Technology

January 2006

Professor Pattie Maes

Associate Professor of Media Arts and Sciences

Massachusetts Institute of Technology

Professor William J. Mitchell

Head, Program in Media Arts and Sciences

Alexander W. Dreyfoos, Jr. (1954) Professor

Professor of Architecture and Media Arts and Sciences

Massachusetts Institute of Technology

Professor Warren Sack

Assistant Professor of Film & Digital Media

University of California, Santa Cruz

Computing point-of-view: modeling and simulating judgments of taste

Hugo Liu

Media Arts and Sciences, MIT

December 2005

Abstract

Point-of-view affords individuals the ability to judge and react broadly to people, things, and everyday happenstance; yet it seems ineffable and quite slippery to articulate through words. Drawing from semiotic theories of taste and communication, this proposal presents a computational theory for representing, acquiring, and tinkering with point-of-view. I define viewpoint as an individual’s psychological locations within latent semantic “spaces” that represent the realms of taste, aesthetics, and opinions. The topologies of these spaces are acquired through computational ethnography of online cultural corpora, and an individual's locations within these spaces are automatically inferred through psychoanalytic readings of egocentric texts. Once acquired, viewpoint models are brought to life through viewpoint artifacts, which allow the exploration of someone else’s perspective through interactivity and play. The proposal will illustrate the theory by discussing interactive viewpoint artifacts built for five viewpoint realms: cultural taste, aesthetics, opinions, tastebuds, and sense-of-humor. I describe core enabling technologies such as culture mining, common sense reasoning, and textual affect sensing, and propose a framework to evaluate the accuracy of inferred viewpoint models and the affordances of viewpoint artifacts for recommendation, self-reflection, and constructionist learning.

1 Introduction

Our capacity for aesthetics and emotional reaction is one of the most celebrated bastions of humanity. Underlying our explicit knowledge and rationality is a faculty for judgment: the capacity to prefer, to view the world through our individual lenses of taste. An interesting intellectual question is: can a computer model a person’s taste, aesthetics, and opinions richly enough to predict their judgment? The proposed thesis explores this question in depth.

The user modeling literature has been exploring predictive models of persons for over two decades. Two main approaches have emerged: stereotype-based models and behavior-based models. Stereotype-based models, such as Elaine Rich’s (1979) book recommender system, represent persons by demographic categories and acquire each user’s profile by asking a set of questions. Behavior-based approaches perform statistical inference over a history of a user’s actions to predict future actions; key examples include social information filtering recommenders (Shardanand & Maes 1995), Bayesian goal inference (Horvitz et al. 1998), and community-driven distributed agent modeling (Orwant 1995). While Rich’s stereotypes are potentially profound models, they must be handcrafted through exercises of superb intuition, and even then, the coarseness of stereotypes tends to under-fit the individuality of people. Application-specific behavioral models can be acquired automatically, but the models themselves tend to form more haphazard impressions of people as they behave under overly specialized contexts, so there is a danger of over-fitting. Orwant’s (1995) Doppelganger user modeling shell demonstrates some desirable reconciliation of stereotypes with behavioral modeling. User model robustness is supported by fallbacks to overlapping memberships within various ‘communities’, which can be interpreted as dynamic stereotypes. Still, a drawback of behavior-based approaches is that they typically model individuals within the context of using applications, like tutoring systems and shopping websites[1]. As such, these approaches model ‘users’, but do not model the ‘persons’ who underlie user-hood. While user modelers routinely capture users’ ratings of items within application contexts, a general model and simulation of a person’s tastes, aesthetics, and opinions that cuts across application domains has not yet been achieved.

Scope. The scope of the proposed thesis, then, will be to approach the computational modeling of a person’s taste, aesthetics, and opinions in richer, more sophisticated ways, modeling persons as rich wholes, not just as application users. The under-specificity of stereotype-based models and the over-specificity of behavior-based models will together be addressed in the proposed thesis by modeling persons by their possession of various rich, affect-laden systems of aesthetic dispositions, organized under the unification of viewpoint. I pose things like taste, aesthetics, and opinions as points-of-view in order to emphasize this crucial metaphor: we are always judging the world through the optics of some viewpoint, and our viewpoint can be seen as a location within the greater cultural space of possible viewpoints. I also differentiate the treatment of point-of-view in this thesis from previous computations of point-of-view by Warren Sack (1994; 2001). Whereas Sack’s robotic readers mine ideological ‘spin’ structures from news stories, this thesis examines psychological point-of-view. Ideological point-of-view is a set of politicized and institutional conventions, what Lakoff (Lakoff & Johnson 1980) calls metaphorical framings, e.g. the Islamic martyrs versus the Islamic terrorists. Psychological point-of-view, by contrast, is concerned with modeling the interior experience of one individual: how a person sees the world idiosyncratically by possessing various unconscious, culturally-conditioned lenses that give emotional tint to judgments and reactions.

Claims. The thesis will present a series of experimental systems that have been built to capture and simulate psychological viewpoint under five realms—cultural taste, aesthetics, opinions, tastebuds, and sense-of-humor. These experiments will support the thesis’ three main claims, listed below.

• Viewpoint can be modeled and simulated as an individual’s psychological locations within latent semantic spaces that represent cultural taste, aesthetics, and opinions. This claim will be developed as a computational theory of point-of-view, building closely on existing semiotic/cultural theories of viewpoint, aesthetics, culture, and taste (Jung 1921; Barthes 1964; Bourdieu 1984; Latour 2005).

• The topology of viewpoint spaces can be acquired by semantic mining of large-scale cultural corpora, while an individual’s location can be inferred by psychoanalytic reading of his/her egocentric texts. Viewpoint spaces are acquired by a technique called culture mining, which combines natural language processing and machine learning to discover the emergent semantics of cultural corpora like social network profiles and weblog communities. In Semiotics, psychoanalytic reading means reading text deeply in order to model the author (Silverman 1983). I extend existing computational models of reading (Zwaan & Radvansky 1998; Moorman & Ram 1994) to acquire the author’s viewpoint location. Common sense reasoning and textual affect sensing are two technologies critical to computing psychoanalytic reading.

• Interactive viewpoint artifacts that simulate a person’s taste judgments can provide deeper user models for recommendation, and can support constructivist learning of other people’s perspectives. Interactive artifacts have been implemented and evaluated for each of the five experimental systems. The artifacts are interface agents that perform just-in-time information retrieval (Rhodes & Maes 2000); in other words, they observe the user’s context of browsing and writing, and constantly offer their reactions to the user.

A process model (Figure 1) contextualizes the three claims within what is to be achieved by this sort of modeling. For the past four years, I have been implementing and evaluating these five experimental systems and the core natural language and common sense components. More recently, I have reflected on the interconnections between these various systems and I now propose to organize and unify their discussion under the banner of point-of-view models. Deep person modeling fits well under the banner of the Ambient Intelligence Group, where our motto is insight, inspiration, and interpersonal communication. Other than completing several further evaluations, I see my main research task as synthesizing a coherent theoretical framework that will make contributions to the User Modeling and Artificial Intelligence literatures. The problems of viewpoint, taste, and aesthetics are so complex that explaining them necessarily requires crossing many literatures. Thankfully, Semiotics is a virtual literature whose mandate is the linguistic modeling of self, culture systems, and aesthetics. Semiotics is a stew of other disciplines such as literary theory, psychology, psychoanalysis, sociology, and cognitive science. Thus, the major aspiration for this thesis is to import many important deep Semiotic theories of viewpoint, taste, and aesthetics into the computational literature, and to substantiate, through extensive evaluation, the efficacy of these theories in accomplishing sophisticated modeling of persons.

2 Proposed Research

Section 2.1 presents a computational framework for point-of-view. Section 2.2 discusses three core technologies necessary for viewpoint computation: culture mining, common sense reasoning, and textual affect sensing. Section 2.3 gives an overview of the five experimental systems and their viewpoint artifacts, all of which are already implemented. Section 2.4 outlines an evaluation strategy for this thesis.

2.1 Computational Framework

Groundings. I compute viewpoint as an individual’s psychological location within latent semantic spaces such as cultural taste, aesthetics, opinions, tastebuds, and humor[2] (Figure 2a). This framework is grounded in the Semiotics and Cultural Criticism literatures’ tradition of psychological situationalism (Hume 1748) and social constructionism (Lacan 1957; Bourdieu 1984; Latour 2005), that is, the notions that individuals are constructed by their environment, and that subjectivity is the product of socialization. Pierre Bourdieu’s Distinction: A Social Critique of the Judgment of Taste (1984) is a seminal work in Cultural Criticism which needs to be mentioned upfront, for it comes very near to being a direct theoretical basis for the computational framework presented in this thesis. In that work, Bourdieu surveyed 1200 French persons in the 1960s, computed statistical correlations, and found a relationship between taste and class structure in French society. He theorized an individual’s judgment faculty as being structured by a set of personal dispositions called a habitus, which is constituted from a cultural field of socio-economic conditions. The intersection of the personal habitus and cultural field is called doxa; doxa, then, is the site of the individual’s cultural identity. Habitus, field, and doxa, I suggest, are almost a parallel vocabulary for viewpoint, space, and location, respectively. Space/field defines the limits of what is possible. Location/doxa defines where an individual’s psychology fits into the culture. Viewpoint/habitus is an individual’s system of dispositions (e.g. system of opinions, system of aesthetics, system of taste); this is the psychological structure that can be used directly to predict the individual’s future judgments and reactions.

Building on the success of Bourdieu’s theory, the computational framework presented here considers more than just the space of cultural taste (Taste Fabric / InterestMap); it extends the space/location/viewpoint metaphor to experiment with modeling persons under other realms such as perceptual aesthetics (Aesthetiscope), opinions (What Would They Think?), tastebuds (Synesthetic Recipes), and humor (Buffolo). The topology of these spaces can often be acquired through computational ethnography of online cultural corpora, that is, the invocation of latent semantic mining to reveal the emergent correlations and network structure of a cultural space. For example, the Taste Fabric is a densely connected network of cultural taste, mined from automated analysis of the texts of 100,000 social network profiles.
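To make the idea of mining correlations from profiles concrete, the following is a minimal sketch and not the implementation behind the Taste Fabric. It assumes each profile has already been reduced to a list of normalized item names, and it uses pointwise mutual information (PMI) over profile co-occurrence as the correlation measure; the function name `pmi_fabric` and the toy profiles are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_fabric(profiles):
    """Return {(a, b): PMI} for every item pair that co-occurs in some profile.

    Assumption: a profile is a list of already-normalized item names.
    Pair keys are stored in sorted order, so (a, b) always has a < b.
    """
    n = len(profiles)
    item_counts = Counter()
    pair_counts = Counter()
    for profile in profiles:
        items = sorted(set(profile))          # ignore repeats within a profile
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    fabric = {}
    for (a, b), joint in pair_counts.items():
        p_ab = joint / n
        p_a = item_counts[a] / n
        p_b = item_counts[b] / n
        # PMI > 0: the pair co-occurs more often than chance would predict.
        fabric[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return fabric

# Hypothetical toy corpus of four profiles.
profiles = [
    ["yoga", "hiking", "meditation"],
    ["yoga", "hiking"],
    ["heavy metal", "horror films"],
    ["heavy metal", "horror films", "hiking"],
]
fabric = pmi_fabric(profiles)
for pair, score in sorted(fabric.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Pairs that never co-occur are simply absent from the result, which is one reason a real fabric mined from a large corpus would need smoothing or a minimum-count threshold before its weights could be trusted.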

An individual's locations within these spaces can often be inferred through psychoanalytic readings of egocentric (self-revealing, self-describing) texts, for example, a diary, a research paper, or a social network profile. Psychoanalytic reading means reading not for the message, but for the subjectivity of the message sender. The technique of psychoanalytic reading is anchored in Semiotics: Roman Jakobson’s theory of communicative function (1960), J. L. Austin’s speech acts theory (1962), and Kaja Silverman’s suture technique for psychoanalyzing narratives (1983). The common ground of these theories is that they all pose emotional attitude as the unifying force of subjectivity. In speech acts, underlying each utterance is the illocutionary force, which is the author’s emotional posture, such as aggression, agreeableness, or sadness. Similarly, Jakobson suggests that the goal of emotive communication is to paint a portrait of the author, which is why the present research prefers emotionally expressive egocentric texts as a way to ensure that the subject can be modeled. Heeding these theories, the proposed thesis computes psychoanalytic readings by reading for the unconscious emotional undertone of topics discussed in egocentric text. Natural language understanding, common sense reasoning, and textual affect sensing are core technologies which achieve psychoanalytic reading.
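As a rough illustration of reading for the emotional undertone of topics, the sketch below averages the valence of each sentence that mentions a topic. It is a deliberately crude stand-in: the systems described here rely on natural language understanding, common sense reasoning, and textual affect sensing, whereas this example assumes a tiny hand-made valence lexicon (`VALENCE`) and exact keyword matching, both of which are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical toy valence lexicon; a real affect sensor would be far richer.
VALENCE = {
    "love": 1.0, "wonderful": 0.8, "enjoy": 0.6,
    "boring": -0.6, "awful": -0.8, "hate": -1.0,
}

def topic_attitudes(text, topics):
    """Map each mentioned topic to the mean valence of the sentences around it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sentence in re.split(r"[.!?]+", text.lower()):
        words = re.findall(r"[a-z']+", sentence)
        scores = [VALENCE[w] for w in words if w in VALENCE]
        if not scores:
            continue                      # sentence carries no known affect
        sentence_valence = sum(scores) / len(scores)
        for topic in topics:
            if topic in words:            # assumption: topics are single words
                sums[topic] += sentence_valence
                counts[topic] += 1
    # Topics never seen alongside affect words are left out of the result.
    return {topic: sums[topic] / counts[topic] for topic in counts}

diary = ("I love hiking in the rain. Homework is boring and awful. "
         "Hiking with friends is wonderful!")
attitudes = topic_attitudes(diary, ["hiking", "homework", "opera"])
print(attitudes)
```

The resulting topic-to-attitude mapping is the kind of raw material from which a location in an opinion space could be estimated; how that estimate is actually made is the subject of the systems discussed later.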

Knowledge representation for viewpoint spaces. Figs. 2b-2d illustrate three varieties of knowledge representation used in this thesis research to model latent semantic spaces. But why different representations of viewpoint and not one? Because sometimes the space has straightforward dimensionality (Fig. 2b) while other times a space can appear quite disorganized (Figs. 2c-d). The choice of representation is ultimately an engineering consideration, but I believe that the three representations developed through this thesis are principal.

There is a pecking order. Dimensional spaces are most preferred, as meaning is most organized, and Cartesian distance is easily measured. The viewpoint space for perceptual aesthetics (Fig. 2b) developed in this thesis is dimensional; its axes are based on Carl Jung’s theory of fundamental psychological functions (1921). Next best are semantic fabrics, which are n by n correlation matrices with topological features like cliques and stars. Semantic fabrics are fully connected representations, but with only patchwork consistency; distance is non-Cartesian here but can be measured simply by spreading activation (Collins & Loftus 1975). The mining of the latent space of cultural tastes from social network profiles (Liu & Maes 2005a; Liu, Maes & Davenport 2006) leverages semantic fabrics because, while the mutual information between cultural products (e.g. books, music, films) can be calculated, the dimensionality of this space is believed to be too complex for its principal dimensions to be named. Still, the space enjoys partial organization such as cliques of highly correlated products, and star structures around “identity hubs” (e.g. products like ‘yoga’ and ‘hiking’ can be organized around the hub of ‘new agers’). In the poorest case, neither dimensions nor connectedness are known, as is the situation for this thesis’s modeling of a person’s system of opinions. The space of all possible opinions (opinion = an attitude about a topic) is consistent around a few ideological centers like politics and academia, but there is no obvious global consistency. My What Would They Think? system (Liu & Maes 2004) develops a semantic sheet representation (Fig. 2d) to make the best of this situation.
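One way to read "distance measured by spreading activation" is sketched below, under stated assumptions rather than as the method used in the cited systems. The fabric is assumed to be a weighted adjacency map with weights in [0, 1]; activation starts at seed nodes and is attenuated by edge weight and a `decay` factor at each hop, and a node keeps the strongest activation that reaches it. The function name, parameters, and toy fabric are hypothetical.

```python
def spread_activation(fabric, seeds, decay=0.5, steps=2):
    """Propagate activation outward from seed nodes for a fixed number of hops.

    fabric: {node: {neighbor: weight in [0, 1]}}  (assumed representation)
    seeds:  {node: initial activation}
    Returns {node: strongest activation received}; higher means "closer".
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbor, weight in fabric.get(node, {}).items():
                passed = energy * weight * decay
                # Keep only the strongest path into each node.
                if passed > activation.get(neighbor, 0.0):
                    activation[neighbor] = passed
                    next_frontier[neighbor] = passed
        frontier = next_frontier
    return activation

# Hypothetical fragment of a taste fabric with an identity hub.
fabric = {
    "new agers": {"yoga": 0.9, "hiking": 0.8},
    "yoga": {"new agers": 0.9, "meditation": 0.7},
    "hiking": {"new agers": 0.8},
    "meditation": {"yoga": 0.7},
}
activation = spread_activation(fabric, {"yoga": 1.0})
for node, level in sorted(activation.items(), key=lambda kv: -kv[1]):
    print(node, round(level, 4))
```

In this toy run, 'hiking' is reached only through the 'new agers' hub, so it ends up less activated than the hub itself or the directly linked 'meditation'. In a dimensional space the same question would be answered by an ordinary distance computation between coordinate vectors.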

Inspired by Marvin Minsky’s “causal diversity matrix” (Minsky 1992), Figure 3 summarizes these representational tradeoffs. Note that a third dimension could also be named: semioticity. We could distinguish “dimensional spaces” as being either a semiotic/structuralist space like Jung’s modes of perception, or as being a data-emergent “quality space” (Gärdenfors & Holmqvist 1994).