Deliverables Report

IST-2001-33310 VICTEC

January 2005

FearNot!: Final demonstrator of VICTEC


AUTHORS: João Dias, Marco Vala, Steve Grant, Ruth Aylett, Ana Paiva, Rui Figueiredo, Daniel Sobral, Mafalda Fernandes, Mick Lockwood, Sandy Louchart.

STATUS: Discussion

CHECKERS:

PROJECT MANAGER

Name: Ruth Aylett

Address: CVE, Business House, University of Salford, University Road, Salford, M5 4WT

Phone Number: +44 161 295 2922

Fax Number: +44 161 295 2925

E-mail:

TABLE OF CONTENTS

1 EXECUTIVE OVERVIEW
2 INTRODUCTION
3 FearNot!
4 Emergent FearNot!: General description
5 The Language System

1 EXECUTIVE OVERVIEW

This deliverable presents the final prototype of the FearNot! demonstrator. First, it gives an overview of the challenges and requirements posed by the development of FearNot!. It then describes the first (scripted) version of the system and draws some conclusions from its evaluation (reported in D7.2.1). Next, it describes the runtime system and the emergent narrative version, covering the agents, the view system and how the two were integrated. Finally, it presents some of the results obtained in a small-scale evaluation of the emergent version.

2 INTRODUCTION

Intelligent Virtual Environments (IVEs) bring new challenges to the way we use technology in educational contexts, promoting and creating new learning experiences in which experimentation and presence are explored. One of the big advantages of IVEs is that they offer a safe place where learners can explore and understand through experimentation, without the dangers or problems of the real situations. Moreover, when IVEs are augmented with contextual information, questions and activities, they can engage learners in entertaining and motivating experiences with material that might otherwise be considered boring and uninteresting. Like computer games, IVEs may allow learners to become immersed and interact in synthetic worlds using a set of interaction facilities, such as movement and language, as well as specific actions with other characters. These IVEs can be inhabited by agents or intelligent characters, which are responsible for events that happen in the environment and make it neither entirely predictable nor completely controlled. Characters can be given the roles of teachers, helpers, companions, elements of the simulated world, or even friends. They become part of the environment, giving life to the interaction with the learners.

However, when using IVEs for social learning, as in the case of bullying (and in FearNot!), these characters play a fundamental role, and their believability is perhaps one of the main goals to attain. A believable character has been defined as a character that gives the illusion of life and allows the user's suspension of disbelief [Bates94]. This quest for believability has indeed been the Holy Grail of the area of synthetic characters for years. However, given the nature of the concept, several aspects are at stake. One of them is the character's appearance. Are more realistic characters more believable? And what about cartoon-like characters? A second factor that leads to believability is the character's autonomy. Again, some results suggest that more autonomous characters may seem more believable: consider, for example, the case of the Tamagotchi. However, autonomy is difficult to achieve in synthetic characters, as there are tremendous technological problems, such as believable and expressive speech generation. Often, completely scripted characters may lead to more realistic and believable situations. Finally, one other important aspect to consider is the narrative structure of the story behind the characters. A good and believable situation may lead the user to find the character itself believable.

Animators and film makers have long been producing situations and characters that are believable, with the power to make the viewer feel emotional reactions. However, doing this in real time, with autonomous characters, is still a difficult research challenge. It requires competences from research on agents, believable characters, empathy, computer graphics, education and cinematography.

This deliverable presents the final version of FearNot!, which combines research in all of the above topics.

This deliverable is organised as follows: first, we describe the scripted version of FearNot!. Then we briefly overview the emergent version of FearNot!, describing all its components: the language system, the agents' minds and the view system. Finally, we describe a small-scale initial evaluation performed with the emergent version of FearNot!, which shows the relationships children established with the autonomous characters and makes some comparisons with the scripted version.

3 FearNot!

The overall pragmatic objective of the development of FearNot! was to build an anti-bullying demonstrator in which children aged 8-12 experience a virtual scenario where they can witness bullying situations from a third-person perspective.

To avoid group pressure and enable individualized interaction, the experience is designed for a single user. The child acts as an invisible friend to a victimized character, discussing the problems that arise and proposing coping strategies. Note that in bullying situations there are clearly identifiable roles: the bully, the victim, the bully-victim (a child that is sometimes the victim and sometimes the bully) and the bystander.

The scenario begins by introducing the child to the school environment and the characters, providing a starting context (see Figure 3.1). This initial presentation provides the background needed about the characters in the story (a description of who is the bully, who is the victim, and so on). Then the episodes start: the whole session develops as one episode after another.

Figure 3.1: Introduction to FearNot!: setting up the scene

Within an episode, the child is mostly a spectator of the unfolding events (the narrative emerges from the actions of the participant characters). After each episode, however, the victim will seek refuge in a resource room (identified as a library) where a personalized conversation with the user can occur.

Figure 3.2: The FearNot! cycle

Then the child takes the role of a friend of the victim, advising her on what to do. A short dialogue takes place between the two (see Figure 3.3), in which the victim raises the main events that occurred in the previous episode and asks for the child's (the learner's) opinion and suggestions for future behaviour. The dialogue between the child user and the victim character is carried out through a set of menus holding standard responses to bullying situations, while still allowing the children to express the reasons and expectations behind the advice given to the victim character. Nevertheless, note that the victim is clearly recognized as a believable self, with its own personality and behaviour, and may thus decide to reject the child's suggestions (see Figure 3.3).

Figure 3.3: The interaction window

Each dialogue finishes with a decision that influences the character's behaviour in subsequent episodes. In the initial version we developed three pre-scripted episodes, and the advice of the child simply led to a choice in the type of ending achieved. For example, if the child's advice was to tell the parents or to seek help from a friend, the final outcome would be positive; otherwise the ending would be somewhat negative.

4 Emergent FearNot!: General description

5 The Language System

The language system allows the agents to communicate among themselves; it also interprets the utterances typed by the children and generates all the utterances of the agents, so that the child can understand the story. Some definitions are needed first:

Language Engine – Each agent, including the agent that represents the user, has its own instance of the language engine class compiled into its code. The language engine is the interface for converting speech acts into utterances and user input into speech acts. Each agent has its own copy so that the conversational context can be agent-specific.

Speech Act (SACT) – This is the XML information understood by the agent mind. Agents pass incomplete speech acts to their language engine, describing what they want to say (e.g. insult Eric). The language engine then creates an appropriate utterance, adds it to the SACT and returns it to the agent. User input is also passed to the user agent’s language engine within an incomplete speech act (this time as an utterance), and the language engine completes the SACT by adding semantic information, such as the class of speech act represented by the input text and anything else the agent wishes to know.
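
As an illustrative sketch (the element, attribute and character names here are hypothetical, not the project's actual schema), an agent wanting to insult Eric might pass in:

  <speechact sender="Luke" receiver="Eric" type="insult"/>

and get back the completed act:

  <speechact sender="Luke" receiver="Eric" type="insult">
    <utterance>Get lost, you little moron!</utterance>
  </speechact>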

Language Act (LACT) – This is XML information understood by the language engine. Each agent can have its own database of LACTs, although agents with similar needs can share a single database. The LACT database is used to convert SACTs into meaningful utterances, and user input into meaningful SACTs. Each language act in the database contains the information needed to choose and utter (or identify) an appropriate phrase for one given speech act type.
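
A LACT database entry for that speech act type might then look something like this (again a hypothetical sketch; each <phrase> is one candidate realisation of the same speech act type, and which one is actually uttered depends on the scoring described later, under Creating an utterance, in more detail):

  <languageact name="insult">
    <phrase>Get lost, you little [INSULT]!</phrase>
    <phrase>Nobody likes you, [YOU].</phrase>
    <phrase>Go away, [YOU], you [INSULT].</phrase>
  </languageact>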

Conversational Context – The language engine has the job of maintaining semantic coherence, freeing the agents from having to know anything at all about natural language. It does this in several ways, using context variables. Each context variable is a name/value pair, such as topic=football, swearword=moron, or you=Eric. The flow of natural language is used to maintain these context variables without involving the agents at all. However, agents may alter or determine the current conversational context by sending or receiving context variables in a SACT. Context variables can be either local (each agent keeps its own copy) or global (all agents share a copy). They also come in two flavours: lexical and semantic. Lexical variables track the precise words being used, while semantic variables track more abstract information about the flow of the conversation.
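
For instance, after a bully character says "You're such a moron, Eric", the conversational context might contain entries along these lines (values purely illustrative):

  swearword = moron     (lexical, global: the exact insult used)
  topic     = teasing   (semantic, global: what the conversation is about)
  you       = Eric      (semantic, local: who this speaker is addressing)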

Synonym lists – The XML LACT database also contains a synonym dictionary, consisting of lists of words with similar meaning. These have many uses within the language system. Synonym lists have a root word (e.g. "idiot") and a list of synonyms (moron, prat, jerk, etc.), which can include common misspellings. They also have attributes that provide semantic information.
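
A synonym list might therefore be stored along these lines (tag and attribute names are illustrative only):

  <synonym root="idiot" class="insult" severity="mild">
    <word>moron</word>
    <word>prat</word>
    <word>jerk</word>
    <word>ideot</word>  <!-- a common misspelling -->
  </synonym>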

Canonical form – An important part of the processes of extracting context and identifying user input is the conversion of natural language into a simplified and generalised form. We call this its 'canonical' form, and the process is called standardising the text. The canonical form of a sentence has had all of its words converted (where possible) into their root form, using synonym lists. It is also entirely lower case, has no punctuation, and each word is separated by a single space. For example, the user input "Yeah - but yore such a MORON!" might be standardised to "yes but you're such a idiot". The process of standardisation identifies any changes of context (and hence is used even for agent utterances, although the resulting canonical string is then discarded), and it puts the phrase into a form that is easy to match against <phrase> templates when identifying user input.
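
The standardising step itself is straightforward string processing. A minimal sketch in Java (illustrative only; the class and method names are invented, and the real engine also extracts context changes as it substitutes roots):

  import java.util.Map;

  public class Canonicaliser {
      // Maps each known word (including misspellings) to its root form,
      // built from the synonym lists in the LACT database.
      private final Map<String, String> rootOf;

      public Canonicaliser(Map<String, String> rootOf) {
          this.rootOf = rootOf;
      }

      // Lower case, strip punctuation, collapse whitespace, and replace
      // every word (where possible) by its synonym-list root.
      public String standardise(String text) {
          String cleaned = text.toLowerCase().replaceAll("[^a-z' ]", " ");
          StringBuilder out = new StringBuilder();
          for (String word : cleaned.trim().split("\\s+")) {
              if (out.length() > 0) out.append(' ');
              out.append(rootOf.getOrDefault(word, word));
          }
          return out.toString();
      }
  }

With rootOf mapping yeah to yes, yore to you're and moron to idiot, the input from the example above standardises to "yes but you're such a idiot".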

Tokens – Phrases consist of explicit text intermingled with various tokens, delimited by square brackets, e.g. "Leave him alone, [YOU]". These tokens are expanded when generating an utterance; with the context variable you=Eric, for instance, this phrase would be uttered as "Leave him alone, Eric". Some tokens are also used to help with identifying user input.

Program flow

· When an agent wishes to say something to another agent or the user, it first passes an incomplete speech act to the Say() method of the language engine. Say() uses the conversational context and the language act database to create an utterance, which is added to the SACT and returned. The agent then passes this to the code that displays a speech bubble (or otherwise renders the utterance), as well as to the agent that needs to respond to the speech act.

· The recipient of a speech act (including its utterance) also passes it through its language engine via the Hear() method. This allows the language engine to extract personalised context information (for example the name of the person speaking to it). The Hear() method returns the SACT unaltered to the agent.

· The user is also represented by an agent, which has its own language engine. Natural language input from the user is presented to the engine via its Input() method, as a SACT containing (at a minimum) an utterance. The language engine uses the context and a (probably specialised) LACT database to find the language act that best represents the user’s intentions, the name of which is added to the SACT (along with context information such as who is speaking) and returned to the agent.

No other interaction between the agents and their language engines should be needed. It’s SACT-in, SACT-out in every case, with the language engine completing the missing elements of the SACT as appropriate.
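
In code, the three entry points might be declared along these lines (a Java sketch; the actual class names and signatures in the engine may differ):

  // Minimal stand-in for the XML speech act described above.
  class SpeechAct { String sender, receiver, type, utterance, lactName; }

  interface LanguageEngine {
      // Completes an outgoing speech act: chooses a phrase from the LACT
      // database, expands its tokens and fills in the utterance.
      SpeechAct say(SpeechAct incomplete);

      // Lets the recipient's engine extract context from a received act
      // (e.g. who is speaking); the SACT is returned unaltered.
      SpeechAct hear(SpeechAct received);

      // Classifies raw user input: finds the best-matching language act
      // and adds its name (plus context information) to the SACT.
      SpeechAct input(SpeechAct rawUserInput);
  }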

Creating an utterance, in more detail

Information is extracted from the SACT, such as the names of the speaker and recipient (but also any other semantic information the agent wishes to provide), and is used to set various context variables.

The specified SACT class name is then looked up in the speaking agent’s LACT database. The resultant LACT entry consists of several phrases, which are scored for appropriateness, based on the current conversational context, some random noise, and how recently they’ve been uttered before.

The highest scoring phrase is chosen, and its tokens are expanded to produce an utterance. The utterance is added to the SACT and returned to the agent.

Finally, the utterance is temporarily converted into its canonical form, in order to extract useful contextual information from it using the synonym dictionary. This allows the speaking agent, the receiving agent and later speakers to know what exact words have been used in key parts of the speech act (e.g. which particular insult was hurled), and what semantic context changes there have been (a change of topic or mood, say).
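
Phrase selection might therefore be sketched as follows (Java; the scoring weights and type names are invented for illustration):

  import java.util.List;
  import java.util.Random;

  public class PhraseChooser {
      // A candidate phrase: its template text, how well it fits the
      // current conversational context, and how long since it was used.
      record Phrase(String template, double contextFit, int turnsSinceUsed) {}

      // Scores each candidate on context fit, a recency penalty and a
      // little random noise, and returns the highest-scoring template
      // (its tokens would then be expanded as described above).
      static String choose(List<Phrase> candidates, Random rng) {
          Phrase best = null;
          double bestScore = Double.NEGATIVE_INFINITY;
          for (Phrase p : candidates) {
              double score = p.contextFit()
                           - 1.0 / (1.0 + p.turnsSinceUsed()) // recently used -> larger penalty
                           + 0.1 * rng.nextDouble();          // random noise for variety
              if (best == null || score > bestScore) {
                  best = p;
                  bestScore = score;
              }
          }
          return best == null ? "" : best.template();
      }
  }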

Identifying user input, in more detail

A SACT containing an utterance (the raw user input), plus a small amount of other information, is sent to the language engine. The engine converts the user input into its canonical form. This has the effect of:

· Correcting common spelling mistakes

· Converting a wide variety of words into a smaller variety of root types for easier identification

· Extracting context information from the user input

The canonical user input is then compared against canonical versions of all the phrases stored in the user agent’s LACT database. Some phrases will be matched explicitly while others contain wildcard tokens such as [BEGINS], which allows sub-phrases and key words to be matched. The first phrase that matches the input defines the LACT name most applicable to the user’s intentions. This is then added to the speech act and returned to the agent.

By converting both the input and the LACT phrases to canonical form, it should be easy to find matches for even quite arbitrary user input (so we need not constrain the user into answering explicit questions as much as was feared). For example, if the user typed "I reckon yer shood have told the teecher", this might be rendered canonically as "i think you should have tell a teacher" and matched successfully against the phrase "[ENDS] tell a teacher".
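
Matching canonical input against canonical phrase templates can then be very simple. A Java sketch (the [BEGINS]/[ENDS] tokens are taken from the text above; interpreting them as prefix/suffix wildcards is our assumption, and everything else is illustrative):

  public class InputMatcher {
      // Returns true when the canonical user input matches a canonical
      // phrase template. Two wildcard tokens are sketched here:
      //   "[ENDS] x y z"   -- the input must end with "x y z"
      //   "[BEGINS] x y z" -- the input must begin with "x y z"
      // Templates without tokens must match the whole input exactly.
      public static boolean matches(String template, String input) {
          if (template.startsWith("[ENDS] ")) {
              return input.endsWith(template.substring("[ENDS] ".length()));
          }
          if (template.startsWith("[BEGINS] ")) {
              return input.startsWith(template.substring("[BEGINS] ".length()));
          }
          return input.equals(template);
      }

      public static void main(String[] args) {
          // The example from the text, after canonicalisation.
          String input = "i think you should have tell a teacher";
          System.out.println(matches("[ENDS] tell a teacher", input)); // prints true
      }
  }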