Deliverables Report
IST-2001-33310 VICTEC
August 2003
First Prototype of Empathic Synthetic Characters
AUTHORS: Raquel César, Daniel Sobral, Ana Paiva, Rui Ferreira, Rui Prada, Ruth Aylett
STATUS: Final
CHECKERS: Sandy Louchart
PROJECT MANAGER
Name: Ruth Aylett
Address: CVE, Business House, University of Salford, University Road, Salford, M5 4WT
Phone Number: +44 161 295 2912 Fax Number: +44 161 295 2925
E-mail:
TABLE OF CONTENTS
1 Purpose of Document
2 Executive Overview
3 Introduction
3.1 The Planning and Coping Module
4 Neural Planning
4.1 Overview
4.2 Basic Mechanisms
4.2.1 Schemas
4.2.2 Bistables
4.3 Neural Network Definition
4.3.1 Nodes
4.3.2 Links
4.4 Neural Network Functioning Principles
4.4.1 Detection of goals
4.4.2 Context Learning
4.4.3 Base nodes functioning
4.4.4 Specialized nodes functioning
4.4.4.1 Detection of learning situations
4.5 Context usage
4.5.1 Context definition
4.5.2 Context role
4.5.3 Learning
4.5.4 Off transitions
4.6 Splitting Mechanism
4.6.1 Creation of goals
4.6.2 Creation of Subgoals
4.7 Contention Scheduling
4.7.1 Reward and drive computations
4.7.2 Contextual component of the drive
4.7.3 Reward anticipation
4.7.4 Incompatibilities between commands
4.8 Motivational System
5 Integration in the Agent’s Framework and First Results
5.1 Overview of the Framework
5.2 The agents in the Framework
5.2.1 Sensors
5.2.1.1 Add World Content Sensor
5.2.1.2 Effector Execution Sensor
5.2.1.3 Message Sensor
5.2.1.4 Property Sensor
5.2.1.5 Look At Sensor
5.2.2 Effectors
5.2.2.1 Add World Content Effector
5.2.2.2 Change Property Effector
5.2.2.3 Look At Effector
5.2.2.4 Say & Tell Effectors
5.3 Use of the Agent’s First Prototype
5.3.1 New Sensors Defined
5.3.2 New Effectors Defined
5.3.3 WildTangent View Agent
5.3.3.1 Execution
5.4 Integration with the Wild Tangent environment
6 Conclusions
7 REFERENCES
Appendix A - Spreaders and Graphical Formalism
Appendix B - Goal Learning Rule
Appendix C - Context Learning Rules
Appendix D - Neural Planning Implementation
Base Nodes
Bistable activity
Specialized Nodes
Competition mechanism
Bistable activity
Detection of learning situations
Context Learning
Adaptation of the signal split_bascule
Off transitions
Contention Scheduling
Reward and drive computations
Contextual component of the drive
Reward anticipation
Incompatibilities between commands
Appendix E - Integration with the WT environment
Resources Management
Execution
Communication
Execution
View Actions
Abstract Action class
Body Definition class
Animation playing
Collision checking
Character moving
View Actions Management
Execution
Path Planning
Heuristics
Search state
Successor states generation
Further path planning algorithm improvements
1 Purpose of Document
This document follows D5.1.1, which described the requirements for the creation of Empathic Characters. The main result of that deliverable was an agent architecture enabling the development of Empathic Synthetic Characters, together with a generic Agent Framework within which such an architecture can be located.
This document presents the first prototype of the empathic characters, building on the requirements assessed and the agent architecture devised in the previous deliverable. It aims at a first detailed design and implementation of the inner mechanisms driving the motivated behaviour of the characters.
The integration of the agent architecture developed with the runtime system and the modifications needed to allow for such integration are also discussed.
2 Executive Overview
This deliverable describes a first prototype of the empathic characters architecture devised in deliverable D5.1.1. Due to its central importance and complexity, this document focuses entirely on the planning and coping module, which has the main responsibility of defining the characters’ deliberative behaviour.
Other aspects of the characters, such as the design process of their external look and feel, have already been discussed in the previous document and are not repeated here. Nevertheless, all the issues raised in D5.1.1 are consistently taken into account in the development of the characters and will be revisited in the deliverable that describes the final prototype.
After a brief introduction in section 3, section 4 presents the Planning and Coping module of the agent’s architecture along with the inner details of its functioning, leaving some of the mathematical formalisms to the appendices. First, a motivation is given to justify the approach taken in the development of that module. Then, we give a detailed description of the structure and mechanisms that characterise our approach. Along with the description, we argue for the flexibility and robustness of the learning process and the capacity to model coping responses to situations within any given domain, particularly the one relevant to our project.
This document should be read in conjunction with deliverable 5.1.1, as it details only one of the topics referred to in the previous deliverable.
3 Introduction
The development of empathic synthetic characters is a difficult and complex multidimensional task. As mentioned in previous deliverables (see D3.2.1, D5.1.1, and D6.1.1), we devised a Generic Framework (GF) that allows us to manage each of those dimensions in a relatively independent way. In particular, the GF allows the parallel development of an Agent Framework (AF), with which it interacts as depicted in Figure 3.1.
Figure 3.1. The Relationship between the Generic and Agent Frameworks.
The main purpose of this document is to present a first prototype implementation of the character agent architecture devised in deliverable D5.1.1, which is depicted in Figure 3.2.
The AF enables us to develop Agents transparently to the GF. A Body Interface deals with all the specificities of the GF, enabling the adoption of many different behavioural approaches in the agent Mind component, according to the type of agent and its complexity. Communication between the two modules is performed through a specific internal API. Simple agents can be implemented without minds, while complex agents, like the ones realising the characters in VICTEC, rely on complex architectures that depend on learning strategies.
This document will focus on the planning and coping functionality in the architecture of Figure 3.2: this produces the deliberative behaviour of the characters, according to their properties and the context in which they are situated. This is seen as the most complex component of the characters: it contains the learning processes, explained in the next chapter, and all the behaviour control that characterises the coping mechanisms of the characters in social situations.
Figure 3.2. General abstract architecture for an emotional and social agent.
3.1 The Planning and Coping Module
The Planning and Coping module incorporates the action-selection mechanism of the character, which allows it to choose actions appropriate to its inner state and outer context. Action-selection is known to be a hard research problem in embodied agents, and a conventional approach requires the programmer to anticipate every possible context and state and tune the mechanism to produce the right action. An alternative to this almost-impossible task is to use a learning approach to overcome the complexity of making the connections between state, context and selected action.
The power of this approach can be seen by considering the problem facing a new-born human child. The child is faced with the daunting task of creating its reality. It must build the knowledge and goal structures that will dictate how it interacts with the world around it, and why. It needs to learn things like how its muscles work, how to communicate with its parents and others, and how to conform to the norms of its society. Although some of this learning is just a matter of noting the common sensory results of actions performed in a given situation (simple muscle control, for example), most learning is more complex than this. The child must judge not only what an action does, but also whether that result is a desirable one or not.
Luckily for humanity, evolution has provided us with a set of innate sensory inputs that are pre-wired in our brains to give us pleasure, pain, happiness, sadness or any number of other feelings in response to the external context and our own actions. Even so, the child must learn more than which actions directly result in the activation of one or more of the innate feelings. The child must also learn to recognise environmental states that do not trigger an immediate innate response, but represent an increased likelihood of encountering a state that does trigger an innate positive or negative response at some point in the future. This requires the ability to decompose goals into lower-level sequences and to plan actions to be executed in the future, rather than only considering the currently selected action.
Typical human cognitive tasks like speech understanding, image interpretation or strategy planning have been modelled for a long time. Such studies are often inspired by data on human cognition, at different levels of description. At a low level, models of cerebral functioning focus on the elementary units of the cortex, neurons, and are applied to relatively elementary tasks like perceptual discrimination. Such tasks are also typically addressed by other numerical approaches like classical connectionism or statistics. On the other hand, more integrated tasks can be approached with a functional and logical view of cognition, as in symbolic artificial intelligence. Unfortunately, symbolic approaches work poorly at the perceptual level and are generally limited to higher cognitive levels, like reasoning. This poses a potential problem since VICTEC characters have to function at both the lower perceptual level and the higher cognitive ones, especially given they use language to interact.
Recently, a new class of models has appeared, neuro-symbolic integration, whose goal is to combine the advantages of numerical and symbolic techniques; such models can therefore be applied to more complete cognitive problems, including perceptual, learning and reasoning aspects. This was the approach selected for the work reported here.
The behaviour of VICTEC characters requires perceptual and motor skills, since both actions and perceptions must be realised in the graphical virtual world in real time. However, characters also need to be able to sequence their actions to reach goals that lead to the satisfaction of internally-represented needs. The first point, related to servo-control of actions, is closer to the problems that are usually addressed by current neural techniques, because it often requires learning a functional mapping. This function takes the current perceptual state, the “desired" perceptual state, and the current actuator state, and returns the motor command that allows the “desired" perceptual state to be reached.
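As an illustration of the shape of such a mapping, the sketch below stands in for the learned function with a simple proportional controller; the name motor_command, its signature and the gain value are hypothetical, not part of the prototype.

```python
# Minimal sketch of the servo-control mapping described above.
# All names and values here are illustrative, not the VICTEC prototype.

def motor_command(current_percept, desired_percept, actuator_state, gain=0.5):
    """Return a motor command moving the perceptual state towards the
    desired one: a proportional controller standing in for the learned
    functional mapping."""
    # Error between desired and current perceptual state, per dimension.
    error = [d - c for d, c in zip(desired_percept, current_percept)]
    # Correct the current actuator state proportionally to the error.
    return [a + gain * e for a, e in zip(actuator_state, error)]

# Usage: steer an actuator towards a target percept.
cmd = motor_command(current_percept=[0.0, 1.0],
                    desired_percept=[1.0, 1.0],
                    actuator_state=[0.2, 0.0])
print(cmd)  # [0.7, 0.0]
```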
Action sequencing has been successfully addressed by symbolic Artificial Intelligence techniques, allowing problems to be solved by the application of predicates and inference rules. These techniques often use stacks to decompose goals into sub-goals, and algorithmic methods to find a way from available facts to desired ones, exploring the huge graph of predicates that can be derived through inference rules. These techniques can endow an artificial system with powerful reasoning skills, but their complexity is exponential in the size of the domain addressed. Attempts have been made to build hybrid neural and symbolic systems, but algorithms providing robust reasoning remain to be found.
For this prototype, we have developed a new autonomous agent control structure that makes use of both neural network and symbolic constructs to learn sensory-motor correlations and abstract concepts through its own experience. It is an architecture for robust action selection that learns not only how to achieve primitive drives, but also appropriate sub-goals satisfying these drives. It does this in a way that is cognitively plausible and also provides clear benefits to the performance of the system.
The purpose of our model is to present one way to deal with action sequencing, viewed as a type of motor reasoning, in a fully neural architecture. This can be thought of as initial steps in addressing planning within a connectionist paradigm. Biological inspiration has led us to propose a connectionist model in which the basic unit represents an elementary perceptual, motor, or associative function. This offers a representational power intermediate between the low-level neuron and the symbolic knowledge-based system. We explain in the next chapter how this basic unit manages different kinds of signals and, accordingly, different kinds of learning in an explicit and symbolic way.
4 Neural Planning
4.1 Overview
Figure 4.1 shows the simplified agent architecture used for the first prototype of the empathic agents. This first architecture is just a sketch of what we think the final agent architecture will be. Our emphasis for this first prototype was on the components directly involved in agent behaviour management.
Figure 4.1. Simplified agent architecture.
The agent lives in an environment that it can sense and act upon. When the conditions are right, the execution of an action can satisfy one of the agent’s internal needs. For example, when the bully agent feels “angry", it tries to trigger the “beat" action schema. As beating only succeeds when the victim is close, the agent has to stack this action schema and invoke a “reaching" action schema until it detects that the beat schema has become feasible. This detection reactivates the previously stacked schema; beat is then performed and satisfies the need. This small story illustrates the approach. The architecture has to be able to call the appropriate schema, according to the currently-felt need. It also has to be able to stack infeasible schemas and invoke sub-schemas to make the stacked ones feasible; sub-schemas may in turn be stacked if not directly feasible, recursively. Feasibility has to be detected from the perceptual state. In the following discussion, the perceptual state supporting the conclusion that an action schema is feasible will be called the context of this schema. Context use is a crucial point in the model. During sub-schema invocation, the stacked schema must stay in some kind of “ready-to-apply" state, waiting for its context to be right. This is the stacked state mentioned below; it has been modelled in a biological framework as a bistable activity (see later definition).
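The following sketch illustrates the stacking behaviour in the bully example. The ActionSchema class and its fields are hypothetical names, not the prototype’s API, and the real mechanism is neural rather than an explicit stack (see Appendix D).

```python
# Sketch of schema stacking: an infeasible schema is stacked and its
# sub-schema invoked until the stacked schema's context is detected.
# Names (ActionSchema, is_feasible, ...) are illustrative only.

class ActionSchema:
    def __init__(self, name, is_feasible, execute, subschema=None):
        self.name = name
        self.is_feasible = is_feasible   # context detector: world -> bool
        self.execute = execute           # action applied to the world
        self.subschema = subschema       # schema that can make this one feasible

def run(schema, world):
    stack = [schema]
    while stack:
        top = stack[-1]
        if top.is_feasible(world):
            top.execute(world)           # context is right: perform and pop
            stack.pop()
        else:
            stack.append(top.subschema)  # stack it, invoke the sub-schema

# Example: "beat" is only feasible when the victim is close; the
# "reach" sub-schema reduces the distance until "beat" can fire.
world = {"distance": 3, "beaten": False}
reach = ActionSchema("reach",
                     is_feasible=lambda w: True,
                     execute=lambda w: w.update(distance=w["distance"] - 1))
beat = ActionSchema("beat",
                    is_feasible=lambda w: w["distance"] == 0,
                    execute=lambda w: w.update(beaten=True),
                    subschema=reach)
run(beat, world)
print(world)  # {'distance': 0, 'beaten': True}
```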
The planning and coping module addresses behavioural learning and planning with a fully neural approach. It aims at modelling the ability to schedule elementary action schemas to reach behavioural goals. For this, it uses robust context detection.
The neural architecture consists essentially of a set of nodes. Each node can be in one of three states: inactive, called and excited. Excitation activity in a node means that some specific perceptual event is occurring, while call activity triggers actions that should eventually make this perceptual event occur. The main role of a node is to call the corresponding action when the call is likely to succeed in producing the perception.
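A minimal sketch of a node and its three states follows; the Node class and its attribute names are illustrative assumptions, not the prototype’s implementation (which is described in Appendix D).

```python
# Sketch of a node's three states and its two kinds of activity.
from enum import Enum

class State(Enum):
    INACTIVE = 0   # nothing happening
    CALLED = 1     # trying to make the perceptual event occur
    EXCITED = 2    # the perceptual event is currently occurring

class Node:
    def __init__(self, event_name):
        self.event = event_name
        self.state = State.INACTIVE

    def on_perception(self, occurring):
        # Excitation simply reflects the world state.
        self.state = State.EXCITED if occurring else State.INACTIVE

    def call(self):
        # A call triggers the actions that should make the event occur.
        self.state = State.CALLED
```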
The network has to schedule all the calls from the nodes in the right order, so that current needs get satisfied. This requires several mechanisms, briefly described below. These mechanisms are all running at the same time, without any supervision of their execution.
The first mechanism is the random spontaneous call of a node. It allows random actions to be performed when nothing is known about the world. Spontaneous activity means that the probability of a node having a call activity is non-null. In the model, that probability is proportional to current need intensity.
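As a minimal sketch of this mechanism (the constant BASE_RATE and the linear form are illustrative assumptions, not values from the model):

```python
import random

# Sketch of spontaneous calls: the probability of a node calling is
# non-null and grows with the current need intensity.
BASE_RATE = 0.05  # illustrative assumption

def spontaneous_call(need_intensity):
    """Return True if the node should spontaneously call this cycle."""
    return random.random() < BASE_RATE * need_intensity
```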
The second mechanism is the learning of rewarded actions. A reward signal is sent to all nodes when a need is satisfied. This reward signal only tells the system that some need has just been satisfied, without telling it which specific one, as suggested by Taylor [1]. When a reward occurs, and only in that case, a node increases its sensitivity to the need if it has called its action successfully, i.e., if the action performed by the agent has produced the perceptual event the node is tuned to. We use temporal competitive learning to select such nodes, considering that reward occurrence can be predicted by their successful calls. Then, when any need occurs, the nodes that have been found to be significantly able to provide some reward increase their probability of being called.
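The following sketch gives the flavour of this update; the learning rate, the NodeState fields and the success flag are illustrative assumptions, and the actual goal learning rule is given in Appendix B.

```python
# Sketch of the reward-learning step: the reward signal is broadcast to
# all nodes without saying which need was satisfied; only nodes whose
# call has just succeeded raise their sensitivity to the need.
from dataclasses import dataclass

LEARNING_RATE = 0.1  # illustrative assumption

@dataclass
class NodeState:
    recently_succeeded: bool = False  # call produced its perception
    need_sensitivity: float = 0.0

def on_reward(nodes):
    for node in nodes:
        if node.recently_succeeded:
            # Move the node's need sensitivity towards its maximum.
            node.need_sensitivity += LEARNING_RATE * (1.0 - node.need_sensitivity)
```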
At the beginning, the calls from such nodes may not always succeed, because the requirements for the call to be rewarded are not fulfilled in the world. Nevertheless, this allows the node to learn these requirements, i.e., the configuration of excitations (currently detected perceptual events) that actually predicts rewarded satisfaction of the call. This is what is called the context of the node. Context matching is then used to increase the probability of triggering a call, so that the best-matching rewarding nodes have higher chances of getting their action executed. To benefit from this, it is necessary to have need detectors, so that the current need can favour the context of the units related to its satisfaction. This allows the units to be indirectly need-dependent, whereas learning and rewards deal with non-specific need and reward signals. The nodes that are excited as a consequence of one specific need (drive nodes) are the somatic markers [5] of the present model.
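A minimal sketch of context matching follows, assuming a context is stored as weights over other nodes’ excitations; the weight values and the matching formula are illustrative, and the actual context learning rules are given in Appendix C.

```python
# Sketch of context matching: the match value scales a node's
# probability of being called.

def context_match(context_weights, excitations):
    """Return a value in [0, 1]: how well the current excitations fit
    the learned context (weighted average of relevant excitations)."""
    total = sum(context_weights.values())
    if total == 0:
        return 0.0
    return sum(w * excitations.get(n, 0.0)
               for n, w in context_weights.items()) / total

# Example: a "beat" node whose context is dominated by "victim_close".
match = context_match({"victim_close": 0.9, "angry": 0.1},
                      {"victim_close": 1.0, "angry": 0.0})
print(round(match, 2))  # 0.9
```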
Third, as each node knows from its context which perceptions are required to ensure the successful execution of its corresponding action, it can ask the nodes corresponding to those perceptions to trigger the appropriate actions. In that case, the node has a null probability of calling its event, but it contributes to raising the calling probability of the units contributing to its context, until each of them succeeds in getting its perception. This non-calling state of a node, which sustains the probability of call in its context nodes, is the actual stacking effect of the model. While a node is stacked, the favoured nodes can recursively stack to obtain their own contexts. The stacking state may stop spontaneously, the node giving up, but any improvement of the node’s context tends to reduce this effect, so that successful searches are persevered with. This can be achieved by non-selective context detection, which provides a context matching value sensitive to minor improvements.
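The stacking effect can be sketched as follows; the data structures and the BOOST factor are illustrative assumptions, not the prototype’s implementation.

```python
# Sketch of stacking: a stacked node has a null probability of calling
# its own event, but raises the call probability of the nodes in its
# context until each obtains its perception.
BOOST = 0.2  # illustrative assumption

def stack_step(stacked_name, contexts, call_prob):
    call_prob[stacked_name] = 0.0                  # the stacked node never calls
    for ctx_name, weight in contexts[stacked_name].items():
        # Sustain the favoured (context) nodes' probability of calling.
        call_prob[ctx_name] = min(1.0, call_prob.get(ctx_name, 0.0) + BOOST * weight)

# Example: while "beat" is stacked, it keeps favouring "victim_close".
contexts = {"beat": {"victim_close": 0.9}}
call_prob = {"beat": 0.3}
stack_step("beat", contexts, call_prob)
print(call_prob)  # {'beat': 0.0, 'victim_close': 0.18}
```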
Fourth, the stacking of a node has to stop when its context is detected, so that the call activity, which is now expected to succeed, is triggered. This context detection has to be much more selective, reliably ensuring the success of the call.
Last, since each node can learn which excitations are needed to ensure the success of its call, it can learn, in the same way, a second context that detects which calls in other nodes are responsible for the failure of its own call. This allows the node to detect that some calls, corresponding to certain elementary actions, are incompatible with its own action. It endows the node with the ability to inhibit the spontaneous activity of disturbing nodes by decreasing their probability of calling. In our model, this mechanism implements the contention scheduling function.
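As a sketch only, with an illustrative INHIBITION factor and data layout (the actual computations appear in Appendix D):

```python
# Sketch of contention scheduling: a node that has learned which other
# calls cause its own call to fail inhibits those nodes by decreasing
# their call probability.
INHIBITION = 0.5  # illustrative assumption

def contention_step(active_name, exclusions, call_prob):
    """exclusions maps a node to the nodes whose calls disturb it."""
    for rival in exclusions.get(active_name, ()):
        # Suppress the spontaneous activity of incompatible nodes.
        call_prob[rival] = call_prob.get(rival, 0.0) * (1.0 - INHIBITION)

# Example: while "beat" is active, "wander" is inhibited.
call_prob = {"wander": 0.4}
contention_step("beat", {"beat": ["wander"]}, call_prob)
print(call_prob)  # {'wander': 0.2}
```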