Computing point-of-view

By Hugo Liu

Thesis Proposal for the degree of Doctor of Philosophy

at the Massachusetts Institute of Technology

November 2005

Professor Pattie Maes

Associate Professor of Media Arts and Sciences

Massachusetts Institute of Technology

Professor William J. Mitchell

Head, Program in Media Arts and Sciences

Alexander W. Dreyfoos, Jr. (1954) Professor

Professor of Architecture and Media Arts and Sciences

Massachusetts Institute of Technology

Professor Larifari Aufhebung

King of Candy Land

Computing point-of-view

Hugo Liu

Media Arts and Sciences, MIT

November 2005

Abstract

A point-of-view affords individuals the ability to judge and react broadly to people, things, and everyday happenstance. Your same sense-of-beauty is versatile enough to judge almost anything you put before it, be it a painting, a sunset, or a novel's ending. Yet point-of-view is ineffable and quite slippery to articulate formally through words; just as light has no rest mass, perhaps it could be said that viewpoint cannot be measured in stasis. Drawing from semiotic and epistemological theories, this proposal narrates a computational theory for representing, acquiring, and tinkering with point-of-view. I define viewpoint as a self's collected situations within latent semantic spaces such as culture, taste, identity, and aesthetics. The topology of these spaces is acquired through linguistic ethnography of online cultural corpora, and an individual's locations within these spaces are inferred through psychoanalytic machine readings of egocentric texts. Once acquired, viewpoints can gain embodiment as viewpoint artifacts, which allow the exploration of someone else through interactivity and play. The proposal will illustrate the theory by discussing interactive viewpoint artifacts built for five viewpoint realms: aesthetics, attitudes, cultural taste, taste-for-food, and humor. I describe core enabling technologies such as common sense reasoning and textual affect sensing, and propose a framework to evaluate the judiciousness of point-of-view representations and the value of viewpoint artifacts in affording people new ways of organizing, shaping, and searching human narrative content.

Introduction

Since the late 1950s, every few years, some researcher in Artificial Intelligence has exclaimed eureka, claiming to have almost engineered a human intelligence, or some basal capability of a person. But in 2005, four years after the computer HAL should have played tricks with man in space, Artificial Intelligence still feels the same distance from this ever-present mirage of human-level intelligence.

It seems that several bad paradigms stalled progress on representing and computing people. First, overly grandiose claims were made about formal logic and purely symbolic representation, nicknamed Good Old-Fashioned AI by its detractors. Logic, with its immaculate and universal calculus, treats minds like Rube Goldberg machines, and idealizes thought process the way that Descartes did. Logic failed because thought is far too flexible, rich, and opportunistic to be contained by a mathematically rigid, symbolically sparse, and non-opportunistic representation like first-order predicate calculus. Second, much ado was made about purely connectionist representations like artificial neural networks. The idea was that a properly wired ‘baby machine’ could be deployed in the world and re-derive human mental capability applying only first principles. Like logic, this touched another extreme of the representational spectrum; namely, it was representationally agnostic. The approach has yet to demonstrate compelling emergent intelligence. Around (??), Marvin Minsky wrote a nice piece reporting the stalemate; he suggested that the common error was the belief that ‘neat’ representations could capture ‘scruffy’ and diverse human intelligence. He proposed that the intelligence modeling enterprise should instead be focused on combining ‘multiple representations’ (SOM). In advocating the overthrow of Cartesian hegemony, Minsky may have unknowingly inspired Gilles Deleuze and Felix Guattari’s defining work of our time, “A Thousand Plateaus: Capitalism and Schizophrenia” (), which also advocates the overthrow of Modernism’s immaculate linear account of life and thought.

While some illusions have been overcome, Artificial Intelligence needed during the boom of expert systems, and needs again now during the boom of knowledge-based approaches, to sort out the importance of microscopic knowledge, given as expert rules or “facts about the world,” whatever that may mean. The shadow of Descartes haunts ‘facts’ as much as logic, for even if facts are received cum grano salis and their truth conditions are hedged, they still purport to be evoked by people engaged in thinking. As a matter of reflexivity, much of our Open Mind Common Sense work at this lab is as vulnerable as Cyc ( ) to the stamp-collecting syndrome. Cyc’s 3 million assertions and Open Mind’s 800,000 sentence-based “facts” do not further them in a ‘horse-race’ toward human-level knowledge. So long as representation is purely symbolic, as facts are, abilities granted to children, like dexterously manipulating a ball ( ), or granted to adults, like skill with people, might take billions of sentences, if not more, to describe judiciously. The warning to heed is that human intelligence is not about possessing rote knowledge. Having knowledge around does not ensure that it can be applied judiciously and opportunistically to form coherent thoughts and reactions.

Motivated by a search for coherent yet flexible representation and emulation of human intelligence, we identify point-of-view as a crucial metaphor for conceptualizing human intelligence. Consider a layperson’s dissection of the “point-of-view” concept: two participants in an argument are debating the merits of an artwork and find that they disagree; one says to the other, “but from my point-of-view, I see things differently.” Here point-of-view evokes an image of the two debaters standing at opposite ends of an opinion-space. In the middle is a large blob representing the true meaning of the artwork. The claim “from my point-of-view, I see things differently” reifies as one debater reporting that he can see a different side of the true meaning of the artwork than the other debater can, while allowing that he himself cannot grasp the whole meaning. So, having point-of-view relieves the anxiety of having true thoughts; instead, it privileges coherency and integrity over truth itself, for standing at the same vantage point, a debater will tend to report all sightings of meaning blobs with the same idiosyncratic tendencies, always seeing a certain side to things.

Having a point-of-view is easy. Every person is always operating under one or more points-of-view, whether or not they are reflexively aware of it, because cognitive economy dictates that our knowledge and memories are always consolidated and systematized, with at least patchwork consistency. In Metaphors We Live By, George Lakoff and Mark Johnson [] report that language itself is organized and unified by culturally-specific metaphorical frameworks, which then shape the thoughts of cultural participants in the way that Lacan [] and Whorf [] had presaged. For example, the metaphor that time is money underlies a complaint like “I spent my day on you, I can’t believe I invested so much time in you, and you weren’t worth it.”

The grandeur of point-of-view’s economy is easily demonstrated. Look at this artwork, do you find it beautiful? Read this book ending, is it beautiful? Is this sunset beautiful? Or this government? Most likely, your sense-of-beauty viewpoint prepared you to judge all of these things, or at least attempt judgment. Point-of-view affords the immediacy of judgment over any person, thing, idea, or situation placed within its realm. There is no need to move, to be agile, for judgment often happens like the natural reflex of a knee popping when struck with a mallet. Whereas a facts-oriented view of thought requires conceptual knowledge, every person has abundant judgmental knowledge by virtue of possessing points-of-view like sense-of-beauty, sense-of-humor, sense-of-cultural-identity, a palate for food, and a personality. It is not necessary to store each judgment as a fact, for point-of-view’s lucidity readily produces judgments as it reacts to whatever fodder is put before it.

Economical, flexible, and broad in applicability, point-of-view is a powerful framework and mover for human judgmental thought, arguably exceeding conceptual and logical thought in breadth and utility. If point-of-view could be successfully modeled, acquired, and animated computationally for a few important human realms such as aesthetics, identity, and opinions, in toto, the computational system would be emulating a significant basal capability of human thinking.

To be clear, a computational model of an individual’s point-of-view would constitute a stereotype of that person that is not as agile, and that might make the same judgment if asked ten times in a row. But I will argue in this proposal that this would still be an extremely useful stereotype. What if every person could access a computational stereotype representing 80% of their mentor’s judgmental capability—to bounce random things off their ‘virtual mentors’ without resource bounds? There would be real consequences for education if students could ‘tinker’ a la constructionist learning ( ) with the stereotyped opinions and perspectives of mentors, computationally producing ‘just-in-time’ and ‘just-in-context’ reactions to the student’s actions.

The goal of the research proposed here is to design, build, and validate systems for 1) modeling an individual’s point-of-view within various realms, such as aesthetics, attitudes, and identity; 2) automatically acquiring an individual’s point-of-view model through machine readings of egocentric (self-expressive, self-describing) texts; 3) organizing the model into coherency; and 4) animating point-of-view placed inside interactive artifacts such as virtual mentors, by causing the artifact to judge and react to a very broad range of things placed before it, ‘just-in-time’ and ‘just-in-context’.

I plan to address these four steps as follows. 1) To develop representations of viewpoints across the realms of concern, I will draw heavily from well-established semiotic and epistemological theories of said realms from the psychology and literary theory literatures. For example, Carl Jung’s Modes of Perception (Think, Intuit, Sense, and Feel) [] form the dimensions of my proposed aesthetic viewpoint space, as I pose aesthetics as the perceptual manner and priority with which an individual approaches some topic: a realist sees a sunset, but a romantic might prefer to feel the sunset. The realist is thus located at the position 100% Sense, 20% Think, 20% Intuit, 20% Feel, for example. 2) To automatically acquire an individual’s point-of-view, I propose to apply natural language processing tools such as my widely used MontyLingua package ( ), in conjunction with my common sense reasoning package ConceptNet ( ), and my textual affect sensing system known as Emotus Ponens ( ). In particular, I anticipate that reading emotion out of text will be vital to modeling viewpoint because human judgment often reifies in narratives through emotional appraisal or mannerisms around a topic’s discussion. 3) To make point-of-view models somewhat coherent, I will apply analogy-based reasoning ( ). For example, knowing that a person loves trees, one might infer by analogical extension that they also love rocks (note that this differs from the layperson’s sense of the word ‘analogy’); however, pitfalls must be avoided. For example, a dog lover may hate cats, even though dogs and cats are both pets. 4) Finally, to animate point-of-view, I will follow Brad Rhodes’ methodology of Just-in-Time Information Retrieval (JITIR) [], which prescribes that interface agents (in my case, a virtual mentor that uses its viewpoint to react to things you are writing or doing) continuously mine present user context and utterances, searching for opportunities to retrieve and present relevant information (in my case, a viewpoint-produced judgment about whatever the user is doing) on the chance that it can lend insight, inspire, or teach the user.
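To make the representation in step 1 concrete, the following is a minimal sketch in Python of a viewpoint as a location in the four-dimensional aesthetic space, using the realist position given above. Only the four axis names and the realist's coordinates come from the text. The class and function names, the romantic's coordinates, the hand-written item profiles, and the use of cosine similarity as a stand-in for judgment are all hypothetical choices made for illustration, not the method proposed in this thesis.

```python
from dataclasses import dataclass
from math import sqrt

# Axis names follow the four modes named in the text. Everything else here
# (class names, the cosine scoring rule, the item profiles) is hypothetical.
MODES = ("sense", "think", "intuit", "feel")


@dataclass(frozen=True)
class Viewpoint:
    """A location in the four-dimensional aesthetic space, each axis in [0, 1]."""
    sense: float
    think: float
    intuit: float
    feel: float

    def as_vector(self):
        """Return the coordinates in the fixed axis order given by MODES."""
        return [getattr(self, mode) for mode in MODES]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def affinity(viewpoint, item_profile):
    """Score how closely an item's perceptual profile matches a viewpoint."""
    return cosine(viewpoint.as_vector(), item_profile.as_vector())


# The realist from the text: 100% Sense, 20% Think, 20% Intuit, 20% Feel.
realist = Viewpoint(sense=1.0, think=0.2, intuit=0.2, feel=0.2)
# A made-up romantic who leads with Feel (coordinates are an assumption).
romantic = Viewpoint(sense=0.2, think=0.2, intuit=0.4, feel=1.0)

# Hand-written profiles of two descriptions of the same sunset (illustrative only).
literal_description = Viewpoint(sense=0.9, think=0.3, intuit=0.1, feel=0.1)
lyrical_description = Viewpoint(sense=0.2, think=0.1, intuit=0.3, feel=0.9)

for name, person in (("realist", realist), ("romantic", romantic)):
    print(name,
          round(affinity(person, literal_description), 2),
          round(affinity(person, lyrical_description), 2))
```

Under these assumed profiles, the realist scores the literal description higher and the romantic scores the lyrical one higher, which is the behavior the sunset example describes. In a fuller system the item profiles would presumably be inferred from text rather than written by hand.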

While the acquired models will not be absolutely complete or always correspond to true viewpoint, and while none of the produced reactions will be as spontaneous or as flexible as those of the actual person, I believe that even a first-order approximation of model acquisition and animation can produce incisive models of individual perspective, which upon animation will afford novel and effective ways to search, gain insight into, be inspired by, and connect with someone else and their collected narrative content. I have italicized three words in the previous sentence because these words constitute the tripartite agenda of our Ambient Intelligence Group. I believe that our group has the most to gain from such a thesis, as the methodological conclusions of this research would directly inform much of the impact we seek for our technologies to have on people.

Finally, this thesis is as diverse and as simple as I believe Media Laboratory research should be: diverse in the methods and theories it draws from, but simple in that it is attacking a basic problem of relevance to people, so basic that its goal could be explained to anyone on the street. This thesis draws from Sociology, Literary Theory and Psychology for its computational framing of point-of-view, from Computational Linguistics and Artificial Intelligence for reasoning about text, and from Interaction Design for designing point-of-view artifacts. I have developed some implementations for this thesis, though I have not yet assembled or integrated them, and already the work forms the basis for an AAAI workshop on computational aesthetics which I will co-chair upon the proposed completion of this thesis. To do justice to an idea as complex and with as long a history as ‘point-of-view,’ it will be important to clothe the thesis in all of the relevant literatures and to spend as much time on a computational theory of point-of-view as on technical details of implementation. Otherwise, this work would lose a golden opportunity to be absorbed by an AI community that is interested in how machines can appraise beauty and emotion, and by a humanities and cultural studies community that would be very interested in the computation of its long-standing theories of identity and aesthetics, long thought incomputable. The rest of this proposal will reflect my emphasis on the importance of grounding this thesis in the literatures, and on the importance of distilling reusable methodology and a robust theoretical framework. I will, of course, motivate all theory with many implemented demonstrations and task-based evaluations.
