Grounding in computer-supported collaborative problem solving

Report of project #11-40711.94

December 1996

Grounding in computer-supported collaborative problem solving.

Pierre Dillenbourg & David Traum

Abstract. We study how two collaborators build a shared solution to a problem, using a computer-mediated communication system. This system includes a text-based virtual reality (a MOO) and a shared whiteboard. The subjects communicate using MOO dialogue commands, but also across different modalities (an utterance acknowledging or being acknowledged by an action in the virtual space or by an action on the whiteboard). Our analyses show the relations between the mechanisms for building shared understanding and engaging in the problem solving process. When the rate of acknowledgment regarding task management utterances is low, the pair shows a higher long-term cross-redundancy rate in data acquisition actions. The communication mode (MOO dialogue, MOO action, whiteboard) varies according to the content of interactions (e.g., facts, inferences, management). Moreover, the choice of a particular mode for a particular content varies according to the problem solving strategy. While we initially expected that whiteboard drawings would be used to disambiguate MOO utterances, it is often the opposite which occurs: the central space is the whiteboard, probably because its content is more persistent and more structured than that of the MOO. The whiteboard maintains a shared context for the subjects with respect to the task (e.g., what has been done, what remains to be done), but not the linguistic context of MOO dialogues. Interwoven dialogue turns reveal that subjects are able, with a semi-persistent medium such as the MOO, to maintain parallel conversational contexts, e.g. one in MOO dialogue and one on the whiteboard, or even two contexts in MOO dialogue. The same communicative function was sometimes performed through one tool by one pair and through another tool by another pair, or even by the same pair at another time.
However, we can generalize our observations across pairs, and beyond the particular system we used, if we consider the pair plus the computer tools as a single distributed system which can be configured in many ways.

This project was funded by the Swiss National Science Foundation (grant #11-40711.94). We would like to thank all the subjects in Geneva and elsewhere who participated in the experiments. Special thanks to Daniel Schneider, who anticipated several years ago the potential of MOOs for education and for research and created a MOO environment at TECFA, and to Richard Godard, who carries out the technical maintenance of TecfaMOO. Our gratitude also goes to Patrick Mendelsohn for multiple discussions, to Tom Wherle and Kayla Block for technical assistance and to Philippe Lemay, who computed the complexity index. We also want to acknowledge Jeanne Gaffie, Cyril Roiron, Stephanie Ohayon, Philippe Oehler, Pierre-Nicolas Meier, Lydia Montandon, Patrick Jermann and Beatrice Ligorio, who conducted related research projects. Thanks to the colleagues who allowed us to test or use their groupware systems: Andrew Wilson (tkMOO), Daniel Suthers (Belvedere) and Randy Smith (Kansas). Many thanks to M. Fedel, who lent Pierre the chalet in which this report was written, and to Patrick Jermann for giving Pierre's lectures during that time.

1. Research objectives

Our long-term goal is to improve the quality of educational software. This project builds on our previous work on learning environments in which the human learner collaborated with an artificial agent (a rule-based system) (Dillenbourg & Self, 1992; Dillenbourg et al., 1994). In these systems, the quality of interaction with the machine was often not satisfactory for the user. We hypothesized that knowledge-based techniques are not appropriate for designing collaborative agents (Dillenbourg, to appear). This is not surprising, since artificial intelligence techniques grew out of cognitive science, where the dominant view was that cognition is an individual process, occurring inside the individual's head. We hence aim to upgrade knowledge-based technologies in a way which accounts for the distributed nature of cognition. This project originally included two phases: (1) observing grounding in computer-mediated collaboration and (2) implementing grounding in artificial agents. Only the first phase has been funded so far. It aims to study how two human agents build a shared understanding of the problem they have to solve jointly.

The elaboration of common ground between two speakers has been mainly studied in linguistics, namely in pragmatics, both as a condition for dialogue and as a result achieved through dialogue. The challenge we face here is to relate the description of interactions to the problem solving process conducted by the pair. While the former is often analytic, focusing on dialogue episodes of a few turns, the latter implies a more synthetic view of the problem solving process.

In our experiments, we control the communication bandwidth between the agents to avoid the non-verbal cues (facial expressions, gazes, gestures, body language, ...) which are difficult to analyze for a psychologist, difficult to model in computational terms and difficult to transpose into a human-computer interface. We therefore use a standard computer-mediated communication tool, the MOO. The MOO is a text-based virtual reality in which several users can move, act and communicate.

When we jointly solve problems, verbal interactions are often enriched by the possibility of drawing a schema. Hence, the MOO was enriched by a whiteboard on which the two users can draw. Our initial hypothesis was that the drawings on the whiteboard would contribute to common ground by disambiguating MOO utterances. This project has been named 'Bootnap', an English variation of 'bout de nappe', i.e. the piece of napkin on which one draws a schema when discussing a problem in a restaurant.

The choice of a standard Internet tool is relevant nowadays. The fascinating growth of Internet applications in our society generates all kinds of extreme attitudes. We encounter both optimistic discourses ("Internet will generate fundamental innovation in education") and technophobic discourses ("Internet will deprave our teenagers"). Before discussing the effects of using Internet software, we believe that research must first describe with precision how people use Internet tools for different tasks. There exists, for instance, very little experimental research on how people use the MOO, besides the work of Cherny (1995) and Tennison and Churchill (1996). This project is also a contribution to the understanding of problem solving processes in virtual spaces.

2. Theoretical framework

The distributed cognition theories (hereafter 'DC' theories) offer an interesting theoretical framework to study collaborative problem solving. The common point of these theories is to consider that cognition is not bound to the processes which occur in our brain, but extends to the social and physical environment in which one acts and reasons.

Like Salomon (1993), we deliberately use the plural for distributed cognition theories. The broad range of theories can be classified with respect to their main source of influence. Some contributions, such as Hutchins (1995), rely heavily on concepts borrowed from cognitive science (information flow, memories, buffers, ...), while other contributions, such as Lave (1991), continue the tradition of socio-cultural theories. The empirical studies conducted on each side differ in scale: while the former analyze in detail the interaction in a small group solving a task during a short period of time, the latter study the culture of larger groups doing a variety of tasks over a long period of time. While the former explore the inter-psychological plane, the latter address the social plane[1]. This study belongs to the first approach: we look at rather short periods of time (2 hours) between two people who do not know each other very well and have a clearly defined task to do. We not only feel more comfortable with its conceptual framework, but also prefer its 'constructive' flavor: "The question is not how individuals become members in a larger cognitive community as they do in apprenticeship studies. Rather the question is how a cognitive community could emerge in the first place" (Schwartz, 1995, p. 350). We adopt a functional rather than a socio-historical view of culture, i.e. we aim to understand cultural tools as a group's adaptation to its environment.[2]

The notion of distributed cognitive system covers different group sizes. It can be a single agent plus a tool. Pea (1993) reported for instance the case of a forest ranger who had to measure the diameter of a tree, i.e. to measure the circumference of the tree and divide it by π. Since it is non-trivial to divide mentally by 3.14, she took a tape and put a mark every π units in such a way that, when she put the tape around the tree, she could directly read the diameter. From then on she no longer performed the computation in her head: the tool was doing the computation for her. A distributed system can include two agents, or two agents using an artifact (Hutchins, 1995); it can be a small group, a 'community of practice' (Lave, 1991), ... and wider and wider distributed systems up to the whole society. The term 'system' is actually vague enough to apply more or less to anything. Even an individual can be viewed as a distributed cognitive system, as in Minsky's society of mind metaphor (1987). What does our understanding of group processes gain from considering a group as a single cognitive system? This question has been addressed by Salomon (1993), Perkins (1993) and Nickerson (1993). We will provide our personal answer in the final discussion of this research (section 7).
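The ranger's tape can be illustrated with a minimal sketch. It assumes centimetres as the unit and uses hypothetical function names; it only shows how the division by π moves from the reader into the construction of the tool, so that reading the diameter becomes a lookup rather than a calculation.

```python
import bisect
import math


def make_diameter_tape(max_diameter_cm: int) -> list[float]:
    """Build the tape once: mark k sits k * pi cm from the start of the tape."""
    return [k * math.pi for k in range(max_diameter_cm + 1)]


def read_diameter(tape: list[float], circumference_cm: float) -> int:
    """Read the last mark reached when the tape is wrapped around the trunk.

    No division happens here: the number printed on the mark is the diameter
    (rounded down to a whole centimetre).
    """
    return bisect.bisect_right(tape, circumference_cm) - 1


tape = make_diameter_tape(200)
diameter_cm = read_diameter(tape, 94.3)  # a trunk measuring 94.3 cm around
```

The computation is performed once, when the marks are placed; every later measurement reuses it.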

The term 'distributed' roughly indicates that different functions are performed by different components of a cognitive system, i.e. by different agents or tools. Other researchers (Resnick, 1991) prefer the term 'shared' to indicate that the different components of the system share some understanding of the task. These two terms refer to antagonistic forces; we would rather say 'shared despite distributed' (Dillenbourg, 1996). The distribution of functions has its advantages (reduced cognitive load, variety of viewpoints, ...) but it also increases the group's heterogeneity: if different agents have different skills, different knowledge and different preferences, the group may hardly function as a group. If, despite this heterogeneity, the agents interact well enough, they may come to build a shared understanding of the task and to function really as a single cognitive system. In other words, 'distributed' refers to the conditions of collaboration while 'shared' describes an achievement.

Like the concept of 'system', the concept of 'tool' is central to the DC theories, but it is quite vague: it includes physical tools, such as the tape in the ranger example, and conceptual tools, for instance domain-specific taxonomies used by professionals. DC theories pay special attention to language, as it conveys the conceptual tools elaborated by a community to adapt to its environment. This broad understanding of a tool, from a hammer to our culture, enables us to bypass the distinction between the physical and the social environment of the agent. This study is concerned with specific tools: a computer's input/output devices and several software components. Once again, we can consider larger and larger distributed systems including computerized tools:

  • The user and the software can be viewed as a single cognitive system (Woods & Roth, 1998; Dillenbourg, 1995). Research in human-computer interaction aims to find the optimal distribution of subtasks over partners, according to their respective cognitive skills (Dalal & Kapser, 1994).
  • In computer-supported collaborative learning, the software plays a role, positive or negative, in the collaborative process. Roschelle & Behrend (1995) observed that learners use the computer's graphical representation to test their mutual understanding under increasingly tighter constraints. Conversely, when courseware provides immediate feedback, it may prevent pairs from arguing about the quality of their answers, hence missing opportunities to justify or explain them. The shared workspace used in this research can be viewed as a shared working memory for the whole cognitive system (the pair + the tools).
  • Software can also be viewed as a tool which mediates the culture of a community of practice, or at least the way this culture is reified into a concrete artifact by the development team. For instance, in other development projects, we had explicit requests to design training software which not only covers the specific training objectives, but also conveys the culture of the enterprise.
  • Computer networks create specific communities, such as Internet newsgroups. These communities have specific features such as a high geographical dispersion, a semi-anonymous participation, ... Their culture reflects these features as well as the specificity of the medium (e.g. e-mail groups use 'smilies', while MOO groups use EMOTE verbs).

In this study, we will often be reminded that the artifact we provide is not only a conceptual tool. It is also a physical tool, and the physical energy (or time) necessary to manipulate different components of the interface influences the way the cognitive system allocates different cognitive functions to different software components.

3. Grounding

There exist many definitions of collaboration and, in particular, different understandings of how collaboration differs from cooperation. The definition by Roschelle and Teasley (1995) has become widely accepted: “Collaboration is a coordinated, synchronous activity that is the result of a continued attempt to construct and maintain a shared conception of a problem”. The process by which two participants progressively build and maintain a shared conception has been studied in pragmatics under the label 'social grounding'. Grounding is the process of augmenting and maintaining this common ground. It implies communication, diagnosis (to monitor the state of the other collaborator) and feedback (acknowledgment, repair, ...). There have been several proposals for modelling mutuality of knowledge. When common ground concerns simple beliefs, authors stress the importance of iterated belief (A believes X and A believes B believes X and A believes B believes A believes X, ...), or access to a shared situation, formulated by [Lewis69] as:

Let us say that it is common knowledge in a population P that X if and only if some state of affairs A holds such that:

  • Everyone in P has reason to believe that A holds.
  • A indicates to everyone in P that everyone in P has reason to believe that A holds.
  • A indicates to everyone in P that X.
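The iterated-belief formulation mentioned before this definition can be made concrete with a small sketch. The function name and the two-agent default are assumptions made for illustration only. It enumerates A's side of the belief chain to a chosen depth, which makes visible why the chain never terminates and why a finite, shared-situation formulation is attractive.

```python
def iterated_beliefs(proposition: str, depth: int,
                     agents: tuple[str, str] = ("A", "B")) -> list[str]:
    """List A's nested beliefs about `proposition`, one per nesting level.

    Level 1 is 'A believes X', level 2 is 'A believes B believes X', and so on,
    alternating between the two agents.
    """
    chain = []
    for level in range(1, depth + 1):
        prefix = " ".join(f"{agents[i % 2]} believes" for i in range(level))
        chain.append(f"{prefix} {proposition}")
    return chain


for belief in iterated_beliefs("X", 3):
    print(belief)
```

Any finite depth leaves a further level unstated, whereas the three conditions of the definition refer to a single state of affairs A that both parties can access.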

Clark and Marshall (1981) pointed out that using such a schema requires a number of assumptions in addition to the mere accessibility or presentation of information. Clark and Schaefer (1989) went beyond this, claiming that feedback of some sort was needed to actually ground material in conversation, and that this grounding process was collaborative, requiring effort by both partners to achieve common ground. They point out that it is not necessary to fully ground every aspect of the interaction, merely that the partners reach the grounding criterion: “The contributor and the partners mutually believe that the partners have understood what the contributor meant to a criterion sufficient for the current purpose.” What this criterion may be, of course, depends on the reasons for needing this information in common ground, and can vary with the type of information and the collaborators' local and overall goals. They also point out that the conversants have different ways of providing evidence of understanding, which vary in strength. These include display of what has been understood, acknowledgments, and continuing with the next expected step, as well as continued attention.
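The idea of a purpose-dependent grounding criterion can be sketched as follows. This is an illustration under stated assumptions, not a model proposed by the authors cited above: the ranking of evidence types from weakest to strongest and the names `Evidence` and `is_grounded` are hypothetical, since the text only states that the forms of evidence vary in strength.

```python
from enum import IntEnum


class Evidence(IntEnum):
    """Kinds of evidence of understanding named in the text.

    The numeric ranking (weakest = 1) is an assumption made for this sketch.
    """
    CONTINUED_ATTENTION = 1
    NEXT_EXPECTED_STEP = 2
    ACKNOWLEDGMENT = 3
    DISPLAY = 4


def is_grounded(observed: list[Evidence], criterion: Evidence) -> bool:
    """Return True when the strongest evidence observed meets the criterion.

    `criterion` stands for the level judged sufficient for the current purpose;
    a different goal or kind of information would call for a different level.
    """
    return bool(observed) and max(observed) >= criterion


# A routine remark may only need the partner to carry on with the task,
# while a key inference may need an explicit display of what was understood.
routine_ok = is_grounded([Evidence.ACKNOWLEDGMENT], Evidence.NEXT_EXPECTED_STEP)
key_fact_ok = is_grounded([Evidence.ACKNOWLEDGMENT], Evidence.DISPLAY)
```

With the same evidence, the routine remark counts as grounded and the key fact does not, which is the sense in which the criterion depends on the current purpose.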

This study addresses grounding when two subjects (a) solve a problem together and (b) communicate via groupware. Grounding in collaborative problem solving is probably more tightly constrained than in simple conversation. The specific features of the task which affect the grounding process are presented in section 5.1. We focus here on how the use of groupware may affect grounding mechanisms. The term 'groupware' refers to a large variety of synchronous and asynchronous tools for communication and action, including written communication (electronic mail, news groups, bulletin boards, MOOs, ...), oral communication (audio link, voice messages, ...), visual communication (video link, video messages) and shared workspaces (shared editors, whiteboards, task-specific shared interfaces, ...). These tools are generally not used alone but organized into different configurations to support decision processes in groups (McLeod, 1992), collaborative design (Fischer et al., 1992), meetings (Shrage, 1990), and so forth. This study is concerned with virtual collaborative environments (VCEs), a category of groupware aiming to empower collaborative work. There exists a large variety of VCEs. We do not claim that the grounding mechanisms observed in one VCE will be identical in another VCE. The VCE system we have chosen is a MOO environment plus a whiteboard. These tools are described in section 5.

3.1 Grounding in a MOO

MOOs [Curtis93] are virtual environments on the network where people can meet and collaborate on various projects. Technically speaking, a MOO is a network-accessible, multi-user, programmable, interactive system. Users connect to a MOO as a character, with the help of a specialized telnet-based client program. The client's primary task is to relay input and output between the user and the server. The MOO server runs on one machine on the network, while the client is typically run by the users on their own machines. Having connected to a character, participants then give on-line commands that are parsed and interpreted by the MOO server as appropriate. Commands cause changes in the virtual reality, such as the location of the user or of objects. In the MOO architecture, everything is represented by objects. Each person, each room, each thing is considered as an object that can be looked at, examined and manipulated. The MOO keeps a database of objects in memory, which means that, once created, objects remain available in every later session. A MOO world can be extended both by "building" and by programming. "Building" means creating new objects or customizing prototypical objects. The MOO programming language is quite powerful and has been used to create a large set of objects for professional and academic use.
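The architecture described above (an object database, characters located in rooms, and typed commands parsed by the server) can be sketched in a few lines. This is a toy illustration in Python, not the MOO programming language or the actual server; the class names, the two verbs and the reply strings are all hypothetical.

```python
class MooObject:
    """Every person, room or thing is an object with a location and contents."""

    def __init__(self, name: str, description: str = ""):
        self.name = name
        self.description = description
        self.location = None   # the object that contains this one, if any
        self.contents = []     # the objects located inside this one

    def move_to(self, destination: "MooObject") -> None:
        if self.location is not None:
            self.location.contents.remove(self)
        self.location = destination
        destination.contents.append(self)


class ToyServer:
    """Keeps the object database and interprets one command line at a time."""

    def __init__(self):
        self.db = {}  # object name -> MooObject

    def create(self, name, description="", location=None):
        """'Building': add a new object to the database."""
        obj = MooObject(name, description)
        self.db[name] = obj
        if location is not None:
            obj.move_to(self.db[location])
        return obj

    def execute(self, character_name: str, line: str) -> str:
        """Parse '<verb> [argument]' and apply it to the virtual world."""
        character = self.db[character_name]
        verb, _, argument = line.strip().partition(" ")
        if verb == "look":
            target = self.db.get(argument) if argument else character.location
            return target.description if target else "I see no such thing here."
        if verb == "go":
            if argument in self.db:
                character.move_to(self.db[argument])
                return f"You are now in {argument}."
            return "You can't go that way."
        return "I don't understand that."


server = ToyServer()
server.create("hall", "A wide hall.")
server.create("kitchen", "A small kitchen.")
server.create("visitor", "A curious visitor.", location="hall")
reply = server.execute("visitor", "go kitchen")
```

A real MOO server also persists its database between sessions and lets users attach programmed verbs to objects; both are omitted here to keep the sketch short.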