Artificial Agents Play the "Mad Mex Trust Game":

A Computational Approach

D.J. Wua, Steven O. Kimbroughb, and Fang Zhonga

aLebow College of Business, Drexel University, Philadelphia, PA 19104, USA

bThe Wharton School, University of Pennsylvania, Philadelphia, PA 19104, USA

{wudj, fz23}@drexel.edu


Abstract

We investigate the "Mad Mex Trust Game," a game of trust that cannot easily be represented in strategic form. In outline, the game is as follows. N players of various types are free to negotiate with each other. The players and their types are identified and known to the other players. Players of a given type produce a particular commodity at a certain rate. The well-being of the individual players depends upon having a mixture of commodities; hence the players have an incentive to negotiate trades with players of other types. After arriving at an agreement, there is a fulfillment stage. Players are free to renege on their agreements, and players are able to remember who has reneged and who hasn't. We investigate such questions as: Will cooperative behavior emerge, and under what conditions? What are some efficient and effective mechanisms for trust building in electronic markets? How will these mechanisms affect the emergence of trust and cooperative behavior? What are the key ingredients in building distributed trust, and what destroys trust? This game constitutes a more realistic model of negotiation support systems in electronic markets, particularly on the Internet.

1. Introduction

An important aspect of electronic commerce is that trust is often absent (Tan and Thoen 2001), since it is often difficult for a user to figure out whom to trust in online communities (Dasgupta 1988; Schillo and Funk 1999). Recently, much interest has developed among researchers in how to build trust in electronic markets operating in environments such as the Internet. The literature has approached the study of trust with various "trust games."

Basically, there are two versions of the trust game: the classical trust game in economics, also known as the investment game (Lahno 1995; Berg et al. 1994; Erev and Roth 1998), and the "Electronic Commerce Trust Game" or "Mad Mex Trust Game" [1]. The former is well studied in the economics literature and is regarded as revealing and of fundamental importance in social interaction and knowledge management; some see it as being as fundamental as the prisoner's dilemma game (Hardin 1982; Lahno 1995). The latter is due to Kimbrough and Tan. In it, the players exchange goods such as red sauce or green sauce rather than money, make deals, and may renege on those deals. In this paper, we focus on the Mad Mex Trust Game played by artificial agents; we leave the classical trust game to a subsequent paper, where we plan to use the agent-based approach as well. Will trust and cooperative behavior emerge? If so, under what conditions, and when and how? Put another way, what are the conditions that promote trust or distrust? How do we explain and understand the behavior of the agents (what they are doing, and why they are doing it)? The ultimate goal is to study the effects of markets, characteristics of markets, and market mechanisms associated with systems of artificial agents.

This paper contributes an integration of several strands of the research literature: the trust literature (we focus here on the computational approach to social trust) and the electronic communities and electronic markets literature (we focus here on what kinds of market mechanisms facilitate trust and cooperation, and what kinds disrupt them).

The rest of the paper is organized as follows. Section 2 provides a brief literature review. Section 3 outlines our key research methodologies and implementation details. Section 4 reports our experimental findings for two agents. Section 5 reports further experiments in which an additional player, a third agent, is introduced. Section 6 summarizes and discusses future research.

2. Literature Review

There are roughly two major streams in the trust literature. One stream is interested in developing trust technology (for example, security technology such as passwords or digital watermarking). Representative work can be found in the recent special section of CACM (December 2000). The second stream focuses on social trust (Shapiro 1987) and social capital (e.g., Uslaner 2000). Our interest is in the latter: for example, an online trader may well have access to the trading system but will not cooperate with other online traders out of self-interest. In particular, we are interested in trust based on cooperation (Güth et al. 1997), i.e., in social trust viewed as cooperative behavior.

Methodologically, there are several approaches to the study of trust, illustrating a broad interest from several disciplines in the social sciences. These include the behavioral approaches (e.g., Das and Teng 1998; Mayer, Davis and Schoorman 1995); the philosophical and logical approaches (e.g., Tan and Thoen 2001); the computer science approaches (e.g., Holland and Lockett 1998; Zacharia, Moukas and Maes 1999, ZMM for short); the sociology approaches (e.g., Shapiro 1987); the psychology approaches (e.g., Güth et al. 1997); the classical economics approaches (e.g., Ellison 1993); and the experimental economics approaches (e.g., Engle-Warnick 2000; Erev and Roth 1998; Sundali, Israeli and Janicki 2000). In this paper, we use an interdisciplinary approach that integrates the economics and computer science approaches; call it the computational economics approach. Details of our methodology and framework are provided in Section 3. We now define our specific trust game.

As mentioned earlier, there are several versions of the trust game. The following is our description of the standard game, known in the literature as the investment game. There are two players, the principal and the agent. The principal has some amount of money to invest, say x, so he hires an agent (she) to do this for him. The agent, in turn, gives the money to an investor or broker, who invests the money in the market and truthfully reports to the agent the return on the investment, say 3x. The agent then decides how to split the profit with the principal. The game is played repeatedly, i.e., in each round the principal chooses whether or not to hire the agent. Under some regularity conditions, it has been shown in the literature that trust can be built if the game is played repeatedly (Lahno 1995).
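The structure of one round of the investment game can be sketched as follows. The numbers here (investment x = 10, return multiplier 3, a 50/50 split) are purely illustrative assumptions, not parameters from the literature:

```python
# Hypothetical illustration of one round of the investment game.
def investment_round(x, multiplier, agent_share):
    """The principal entrusts x; the investment returns multiplier * x;
    the agent keeps agent_share of the proceeds and returns the rest."""
    proceeds = multiplier * x
    agent_payoff = agent_share * proceeds
    principal_payoff = proceeds - agent_payoff
    return principal_payoff, agent_payoff

p, a = investment_round(x=10, multiplier=3, agent_share=0.5)
# An even split leaves both players with 15 -- better for the principal
# than not hiring the agent at all, which is why trust can pay.
```

The repeated-game aspect enters because the principal can condition the next round's hiring decision on how the agent split the proceeds in earlier rounds.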

In the Mad Mex Trust Game, the money is replaced with goods. This game cannot easily be represented in strategic form. In outline, the game is as follows. N players of various types are free to negotiate with each other. The players and their types are identified and known to the other players. Players of a given type produce a particular commodity at a certain rate. The well-being of the individual players depends upon having a mixture of commodities; hence the players have an incentive to negotiate trades with players of other types. After arriving at an agreement, there is a fulfillment stage. Players are free to renege on their agreements, and players are able to remember who has reneged and who hasn't.
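A minimal sketch of the negotiate-then-fulfill structure is given below. The class and function names, and the set-based renege memory, are our own illustrative assumptions rather than the authors' implementation:

```python
# Hypothetical sketch of the fulfillment stage of the Mad Mex Trust Game.
class Player:
    def __init__(self, name, commodity, endowment):
        self.name = name
        self.commodity = commodity   # the commodity type this player produces
        self.endowment = endowment
        self.reneged_on_me = set()   # memory of counterparties who reneged

def fulfill(player_a, promised_a, delivered_a,
            player_b, promised_b, delivered_b):
    """Fulfillment stage: each player may deliver less than promised,
    and the counterparty remembers any renege."""
    if delivered_a < promised_a:
        player_b.reneged_on_me.add(player_a.name)
    if delivered_b < promised_b:
        player_a.reneged_on_me.add(player_b.name)
    return delivered_a, delivered_b
```

The renege memory is what makes trust meaningful here: a player can condition future negotiations on its record of who has kept agreements.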

We now describe our research framework and methodology in more detail.

3. Methodology and Implementations

In our framework, artificial agents are modeled as finite automata (Hopcroft and Ullman 1979). This framework has been adopted by a number of previous investigations. Among them, Rubinstein (1986), Sandholm and Crites (1995), Miller (1996), and many others used it to study the iterated prisoner's dilemma (IPD). Kimbrough, Wu and Zhong (2002a) used it to study the MIT "Beer Game," where genetic-learning artificial agents played the game and managed a linear supply chain. Wu and Sun (2002a, b) investigated the off-equilibrium behavior of artificial agents in an electronic-market price and capacity bidding game using genetic algorithms (Holland 1992). Arthur et al. (1996) modeled a realistic stock marketplace composed of various genetic-learning agents. Zhong, Kimbrough and Wu (2002) studied the ultimatum game using reinforcement-learning agents. These are merely examples illustrating the acceptance of this framework in the literature. The reader is referred to Kimbrough and Wu (2001) for a survey.

In this study, we depart from previous research by integrating several standard approaches. First, we study a different game, namely the Mad Mex Trust Game, rather than games such as the IPD, Beer, Bidding, or Ultimatum games. Second, in studying this game, we use a computational/evolutionary approach with artificial agents, in contrast to classical or behavioral game-theoretic approaches. Third, our agents use a reinforcement-learning regime (Sutton and Barto 1998), Q-learning, as the learning mechanism in game playing; previous studies of the trust game are not computational (with the exception of Zacharia et al., who employed a reputation rating mechanism). Finally, our agents are identity-centric rather than strategy-centric as in previous studies (e.g., Kimbrough, Wu and Zhong 2002a; Wu and Sun 2002a, b). That is, our agents may meaningfully be said to have individual identities and behavior; they are not just naked strategies that play and succeed or fail. Individuals, rather than populations as a whole, learn and adapt over time and with experience. Fielding these kinds of agents, we believe, is needed for e-commerce applications.

We now describe our model and prototype implementation in more detail in the framework of Q-learning. This includes how the rules, or state-action pairs, are embedded in our artificial agents; how the rewards are set up; what the long-term goal of the game (the return) is; and, finally, the specific Q-learning algorithm we designed.

Rules (State-Action Pairs): The Q-learning algorithm estimates the values of state-action pairs Q(s, a). At each decision point, the state of an agent is determined by the information in its memory of the history, e.g., its own and its opponent's last trade volumes. The possible actions an agent can take at this decision point are the quantities between zero and its endowment. In this sense, the agent's strategy is a mapping from its memory of the last iteration to its current action. To balance exploration and exploitation, we use the ε-greedy method: with a small probability ε the agent chooses an action at random. The value of ε starts at 0.3 and then decreases to 0.01 in steps of 0.000001.
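The ε-greedy rule with this decay schedule can be sketched as follows; the function names and the dict representation of a Q-table row are our own illustrative choices:

```python
import random

# Sketch of epsilon-greedy action selection over trade volumes 0..endowment.
def epsilon_greedy(q_row, endowment, eps):
    """q_row maps each action 0..endowment to its current Q estimate.
    With probability eps, explore by picking a random trade volume;
    otherwise exploit the action with the highest Q estimate."""
    if random.random() < eps:
        return random.randint(0, endowment)
    return max(range(endowment + 1), key=lambda a: q_row[a])

def decay(eps, step=0.000001, floor=0.01):
    """Decrease epsilon by one step, never going below the floor of 0.01."""
    return max(floor, eps - step)
```

With ε starting at 0.3 and a step of 0.000001, the agent explores heavily early on and becomes almost purely greedy after a few hundred thousand decisions.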

Rewards: The instant reward an agent i receives is determined by a modified Cobb-Douglas function of the amounts of the different types of sauces the agent possesses after each episode,

Ui = ∏j aij^(1/n)

where n is the number of types of commodities in the market and aij is the amount of commodity j held by agent i. The use of the Cobb-Douglas utility function is standard in the economics literature.
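As we read it, this reward is the geometric mean of the agent's commodity holdings. A direct transcription (function name ours):

```python
# Cobb-Douglas reward with equal exponents 1/n: the geometric mean
# of the agent's holdings of the n commodity types.
def cobb_douglas(holdings):
    n = len(holdings)
    u = 1.0
    for amount in holdings:
        u *= amount ** (1.0 / n)
    return u
```

Note that a zero holding of any commodity drives the utility to zero, which is precisely what gives each agent an incentive to trade for commodities it does not produce.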

Returns: The long-run return is simply the total utility agent i has obtained over all episodes played so far:

Ri = ∑ Ui

The goal of an agent at any iteration is to select actions that will maximize its discounted long-term return following certain policies. The use of Q-learning ensures that the artificial agents are non-myopic.

Q-learning: The learning algorithm used by the artificial agents is one-step Q-learning, described as follows:

Initialize Q(s, a) to zero

Repeat

From the current state s, select an action a using the ε-greedy method

Take action a; observe an immediate reward r and the next state s′

Q(s, a) ← Q(s, a) + α [r + γ max_a′ Q(s′, a′) − Q(s, a)]

s ← s′

Until the end of a trial

The experiment runs with learning rate α = 0.05 and discount factor γ = 0.95. The values of α and γ are chosen to promote the learning of cooperation.
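The update step above can be transcribed directly; the dict-of-dicts representation of the Q-table is our own assumption about the implementation:

```python
# One-step Q-learning update with the parameters stated above.
ALPHA, GAMMA = 0.05, 0.95  # learning rate and discount factor

def q_update(Q, s, a, r, s_next, actions):
    """Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)].
    Q is a dict of dicts mapping state -> action -> value, initialized to zero."""
    best_next = max(Q[s_next][ap] for ap in actions)
    Q[s][a] += ALPHA * (r + GAMMA * best_next - Q[s][a])
    return Q[s][a]
```

Because γ is close to 1, future rewards weigh almost as much as immediate ones, which is what makes the agents non-myopic.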

4. Two-Agent Experiment Design and Results

We compare the following trust mechanisms: (1) moving average of the past five observations; (2) exponential smoothing; (3) the ZMM reputation rating; (4) tit-for-tat; and (5) most recent move (or last move). We are interested in seeing whether, under each of these five mechanisms, agents will trust each other and, if so, whether cooperative behavior will emerge. By comparing the five mechanisms, we are also interested in which does better in terms of building trust and promoting social welfare.
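The first two mechanisms can be sketched as below. The exact form of the observations (here assumed to be a history of fulfillment ratios) and the smoothing constant are our own illustrative assumptions:

```python
# Two of the five trust indices, sketched over an assumed history of
# per-round fulfillment observations (e.g., delivered / promised).
def moving_average(history, window=5):
    """Trust index (1): mean of the partner's last five observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def exponential_smoothing(history, alpha=0.3):
    """Trust index (2): simple exponential smoothing, weighting recent
    observations more heavily (alpha chosen for illustration)."""
    s = history[0]
    for x in history[1:]:
        s = alpha * x + (1 - alpha) * s
    return s
```

Tit-for-tat and last-move are degenerate cases that look only at the most recent observation, while the ZMM rating maintains a longer-lived reputation score.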

4.1 Two Q-Learners without a Reputation Index

Two learning agents play the iterated trust game. In each iteration, both agents start with the same endowment. Player A first offers its commodity to player B; upon receiving the commodity, player B decides how much of his own commodity to trade in return. Since there is no third-party agent handling the transaction, the exact trade volume is visible to both parties, and thus there is no asymmetric information. The whole experiment is one long trial, i.e., the two artificial agents play the game an indefinite number of times.

To test the validity of the experiment and to make the results easy to analyze, the endowment is set to a rather small value, 3, so that each agent has 4 × 4 × 4 = 64 state-action pairs (four possible values for each of the two remembered trade volumes, times four possible actions). Based on the utility function, over long iterations one agent learns to switch its trade volume between 3 and 0, while the other correspondingly switches between 0 and 2. This mutual exploitation behavior can be illustrated as:

                  ti   ti+1  ti+2  ti+3  ti+4  ti+5  …
A  Trade Volume    3    0     3     0     3     0    …
   Utility         0    6     0     6     0     6
B  Trade Volume    0    2     0     2     0     2
   Utility         9    0     9     0     9     0

Agent A (trade volume): 3 0 3 0 3 0 …