GO: Review of Work That Has Been Done in This Area

Go is an ancient two-player board game that originated in China between 2,500 and 4,000 years ago (Bouzy, 2000). Attempts to develop Go-playing programs, motivated not only by research but also by money prizes, in particular the World Ing Cup, which offers $1.6 million to the winner, have increased significantly over time (Burmeister, 2001).

Albert Zobrist wrote the first Go program in 1970 and, although basic, its pattern-recognition method was influential in the release, in 1971, of Jon Ryder's program. Ryder's program applied a forward-pruning method to search the game tree, and its tactical analysis compared favourably with the powers of analysis displayed by human beginners. The program, however, still contained weaknesses: it failed to make any distinction between the stages of the game, and as a result it made numerous unsuitable tactical moves. A year later Walter Reitman and Bruce Wilcox produced INTERIM.2; they developed the theory that "perception, knowledge and coordination" are crucial in building a Go program (Burmeister, 2001).

In 1990, David Fotland released The Many Faces of Go (MFG) as a commercial Go program. It was the evolution and combination of his two earlier programs (G2 and Cosmos) that led to the use of a quiescence search rather than modifications to the evaluation function. A quiescence search is an additional search, starting at the leaf nodes of the main search, that tries to resolve tactical sequences and predict their outcomes (Lauriere, 1990). Johnson (1997) argues that by using this type of search, MFG can improve performance substantially, especially when it is integrated with the program's embedded knowledge, as it serves to reduce the branching factor.
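To make the technique concrete, here is a minimal sketch of a generic quiescence search in Python. The helpers evaluate, tactical_moves and apply_move are hypothetical stand-ins for an engine's own routines; this illustrates the general idea rather than Fotland's actual implementation.

    def quiescence(position, alpha, beta, evaluate, tactical_moves, apply_move):
        """Extend the search at a leaf node until the position is 'quiet',
        i.e. no forcing tactical moves remain to distort the evaluation."""
        stand_pat = evaluate(position)       # static score if we stop here
        if stand_pat >= beta:
            return beta                      # fail-hard beta cut-off
        alpha = max(alpha, stand_pat)
        for move in tactical_moves(position):    # forcing moves only, so the
            child = apply_move(position, move)   # branching factor stays small
            score = -quiescence(child, -beta, -alpha,
                                evaluate, tactical_moves, apply_move)
            if score >= beta:
                return beta
            alpha = max(alpha, score)
        return alpha

Because only forcing moves are expanded, the extra search stays narrow, which is how it reduces the effective branching factor.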

GO: Current State of the Art

Chen Zhixing, born in 1931, was a professor of chemistry at Guangzhou University, China, and retired in 1991. He was responsible for Handtalk, the winner of the Ing Cup in 1997, and has since reworked its pattern management and expanded its volume of knowledge, renaming the program Gomate – the strongest Go program in the world to date.

Gomate has been ranked between 4 and 13 kyu, still only a beginner's grade in human terms. In 1998 Handtalk lost to one of the world's outstanding human players, Janice Kim, after it failed to notice the dual nature of an attack and left a group of stones in danger undefended. Mechner (1998) believes intuition or experience could have prevented this from happening.

Results from Lichtenstein and Sipser's work (1980) suggest that approximate algorithms, which deliver near-optimal solutions, are required to solve some aspects of Go, while brute-force search methods quickly run into difficulties (Burmeister, 2001). Gomate appears to have followed this idea: it evaluates very few possible moves, using a static evaluation function to judge the desirability of expanding a node (Russell & Norvig, 1995). As such there is very little evidence of a global search strategy, which makes sense given that the branching factor (the average number of moves possible in a given position) is extremely large. Coupled with this, Gomate incorporates an influence function whose strength halves with each node expansion. In effect, the program can combine a joint probability distribution (a belief network) with actions and utilities simultaneously during the search, increasing its efficiency and effectiveness. Additionally, pattern matching against a knowledge database, similar to Ryder's program (1971), is used to add a more intuitive perspective to the program, giving rise to elevated levels of competitiveness.
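The halving behaviour can be illustrated with a small Python sketch that computes an influence map. The text above describes influence halving with each expansion; the sketch interprets this spatially, halving per step of Manhattan distance on a 9x9 board. Both the interpretation and the board representation are assumptions made purely for illustration; this is not Gomate's actual function.

    def influence_map(stones, size=9):
        """Each stone radiates influence that halves with each step of
        Manhattan distance; contributions from all stones are summed."""
        board = [[0.0] * size for _ in range(size)]
        for (x, y, colour) in stones:            # colour is +1 (black) or -1 (white)
            for i in range(size):
                for j in range(size):
                    d = abs(x - i) + abs(y - j)  # distance from the stone
                    board[i][j] += colour * (0.5 ** d)
        return board

    # Example: one black stone at (4, 4) and one white stone at (2, 2).
    imap = influence_map([(4, 4, +1), (2, 2, -1)])
    print(round(imap[4][4], 3))  # 0.938: the point is dominated by black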

I believe that in the near future programs similar to Gomate will evolve, with growing emphasis on machine learning approaches, in order to establish a more intuitive style of play. Another development that may well come to fruition is a program that calculates search trees while the opponent is thinking, making deeper search possible. Clearly, if processing power continues to increase, it may be as little as 40 years before a computer beats a human expert, although more conservative views are in abundance. Dr Piet Hut, an astrophysicist at the Institute for Advanced Study in Princeton, NJ, predicts "it may be a hundred years before a computer beats humans at Go – maybe even longer."

BACKGAMMON: Review of Work That Has Been Done in This Area

Backgammon is governed by strategy but also involves chance. Because of the dice, the branching factor is approximately 400, far larger than in most other games. This creates enormous problems if a deep search is to be carried out, and thus sophisticated Artificial Intelligence techniques are needed for a successful program. There has been much research into Backgammon-playing programs (Russell & Norvig, 1995).

The first program to make a serious impact was BKG in 1977, which used an extremely complicated evaluation function (a method of returning a measure of the desirability of a position). It defeated the world champion in 1980 but is now rated only as a strong amateur (Hoffman, 1998). Temporal difference learning uses the difference in estimated utility between successive states to update the evaluation after every move. Brian Sheppard attempted to combine temporal difference learning with a neural network, but discovered that the network would not always improve as expected. Further research found that by combining several learning techniques, directed mutations could be produced; that is, mutations that effectively improved the program's strength (Sheppard, 2000).
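The core of the temporal difference update can be written in a few lines. The sketch below assumes a simple table of state values; Sheppard's system used a neural network, so this shows only the underlying rule, not his implementation.

    def td_update(values, state, next_state, reward, alpha=0.1, gamma=1.0):
        """Move the value of `state` towards reward plus the (discounted)
        value of its successor, i.e. by the temporal difference."""
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += alpha * td_error
        return td_error

    # Example: two abstract states, both initially valued at 0.
    values = {"s0": 0.0, "s1": 0.0}
    td_update(values, "s0", "s1", reward=1.0)
    print(values["s0"])  # 0.1 after a single update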

The most commercially successful program, Jellyfish, written by Dahl in 1994, was inspired by earlier work on neural networks, but tends to estimate the probabilities of outcomes rather than calculating exact values. This led to problems with the program underestimating the risk of losing heavily – being 'gammoned'.

BACKGAMMON: Current State of the Art

TD-Gammon Version 3.0, written by Gerald Tesauro of IBM, replaced his earlier program Neurogammon. It plays at a far higher level than Neurogammon and has reached such a standard that it is consistently ranked among the top three players in the world.

Tesauro (1995) argues that programs must rely upon heuristic evaluation functions that imitate the judgement and positional knowledge human experts possess. TD-Gammon manages this by means of reinforcement learning and a neural network. Reinforcement learning occurs when a feedback signal indicates how successful an output has been; by receiving large numbers of these signals throughout the search tree, a system can learn to generate near-optimal actions and thus increase its playing strength (Turian, 1997). The network is arranged in a multi-layer perceptron (MLP) architecture, making it possible to represent any continuous function of the inputs, i.e. the reinforcement learning signals. Together with reinforcement learning, search algorithms provide a means to reduce the number of possible moves considered. Although implementing a search is expensive because of the large branching factor (due to the different combinations of dice rolls), Version 3.0 includes a feature that conducts a 3-ply search using the General-Search algorithm.
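The shape of a shallow dice-roll search can be sketched as follows. The helpers value, legal_moves and play are hypothetical, and TD-Gammon's real 3-ply search is considerably more sophisticated; this only shows how the 21 distinct dice rolls inflate the branching factor at each chance node.

    from itertools import product

    # The 21 distinct rolls: doubles occur with probability 1/36,
    # mixed rolls with probability 2/36.
    ROLLS = [((a, b), 1 / 36 if a == b else 2 / 36)
             for a, b in product(range(1, 7), repeat=2) if a <= b]

    def best_move(position, roll, legal_moves, play, value):
        """Score each of our moves by the expectation, over the opponent's
        possible rolls, of their best reply (worst for us, hence min)."""
        def after_opponent(pos):
            return sum(p * min(value(play(pos, m)) for m in legal_moves(pos, r))
                       for r, p in ROLLS)
        return max(legal_moves(position, roll),
                   key=lambda m: after_opponent(play(position, m)))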

In the future it may be possible to give the neural network a set of features that allows it to learn a smooth value function, which would make the machine learning more accurate. Introducing new algorithms that create features by themselves, such as RGO and IA, may complement the idea of a smooth value function and raise efficiency to such an extent that the program becomes an undisputed champion (Turian, 1997).

References

  • Bouzy, B. (2000) “Computer Go: An Artificial Intelligence oriented survey”. Artificial Intelligence, Vol. 132, Issue 1 (October), p. 39.
  • Burmeister, J. (2001) “An Introduction to the Computer Go Field and Associated Internet Resources”. University of Queensland, Australia: Department of Computer Science.
  • Hoffman, A. (1998) “Paradigms of Artificial Intelligence”. Singapore: Springer-Verlag Singapore Pte. Ltd.
  • Johnson, G. (1997) “To Test a Powerful Computer, Play An Ancient Game”. New York CyberTimes (July).
  • Lauriere, J-L. (1990) “Problem Solving and Artificial Intelligence”. London, UK: Prentice-Hall.
  • Mechner, D. (1998) “All Systems Go”. The Sciences, Vol. 38, Issue 1.
  • Russell, S. & Norvig, P. (1995) “Artificial Intelligence – A Modern Approach”. New Jersey, USA: Prentice-Hall.
  • Sheppard, B. (2000) “TD learning as a mutation”. Communications of the ACM, Vol. 38, No. 5.
  • Tesauro, G. (1995) “Temporal Difference Learning and TD-Gammon”. Communications of the ACM, Vol. 38, No. 3.
  • Turian, J. (1997) “Automated Feature Selection to Maximise Learning in Artificial Intelligence”. Thirteenth International Conference on Machine Learning (ICML 1996).

2.1) Show the logical expressions required to model this phenomenon

1) If a cold stimulus is applied for two time steps and then removed, heat should be perceived

Y1(t) = [X1(t-1)] OR [X2(t-4) AND X2(t-3) ANDNOT X2(t-2)]

This truth table shows that it gives us the required result:

(Here AND = X2(t-4) AND X2(t-3), ANDNOT = AND ANDNOT X2(t-2), and OR = X1(t-1) OR ANDNOT = Y1(t).)

X2(t-4)  X2(t-3)  X2(t-2)  X1(t-1)  AND  ANDNOT  OR
   1        1        1        1      1     0      1
   0        1        1        1      0     0      1
   1        0        1        1      0     0      1
   0        0        1        1      0     0      1
   1        1        0        1      1     1      1
   0        1        0        1      0     0      1
   1        0        0        1      0     0      1
   0        0        0        1      0     0      1
   1        1        1        0      1     0      0
   0        1        1        0      0     0      0
   1        0        1        0      0     0      0
   0        0        1        0      0     0      0
   1        1        0        0      1     1      1   <- cold applied for two steps, then removed
   0        1        0        0      0     0      0
   1        0        0        0      0     0      0
   0        0        0        0      0     0      0

We may see that whenever a hot stimulus is applied (i.e. X1(t-1) = 1), heat is always perceived. But if a cold stimulus is applied for two time steps and then removed, heat is still perceived even without a hot stimulus (the marked row).
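The table can also be verified mechanically. The short Python sketch below (an aid to checking, not part of the network) enumerates all sixteen input combinations:

    from itertools import product

    def y1(x2_t4, x2_t3, x2_t2, x1_t1):
        """Y1(t) = [X1(t-1)] OR [X2(t-4) AND X2(t-3) ANDNOT X2(t-2)]."""
        return x1_t1 or (x2_t4 and x2_t3 and not x2_t2)

    for bits in product([0, 1], repeat=4):
        print(bits, int(y1(*bits)))
    # In particular y1(1, 1, 0, 0) == 1: cold for two steps, then
    # removed, is perceived as heat even with no heat stimulus.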

2) If a hot stimulus is applied, then heat should be perceived.

Y1(t) = [X1(t-1)]

If a hot stimulus is applied at time t-1 (i.e. X1(t-1) = 1), then heat is always perceived one time step later.

3) If a cold stimulus is applied for three time steps then cold should be perceived.

Y2(t) = X2(t-3) AND X2(t-2) AND X2(t-1)

This truth table, for Formula 1 = [X2(t-3) AND X2(t-2) AND X2(t-1)], shows that it gives us the required result:

X2(t-3)  X2(t-2)  X2(t-1)  Y2(t)
   1        1        1       1   <- cold applied for three time steps
   0        1        1       0
   1        0        1       0
   0        0        1       0
   1        1        0       0
   0        1        0       0
   1        0        0       0
   0        0        0       0

We may see (the marked row) that cold is perceived only if a cold stimulus is applied for all three time steps.
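The corresponding check for Y2, again purely as a verification aid:

    from itertools import product

    def y2(x2_t3, x2_t2, x2_t1):
        """Y2(t) = X2(t-3) AND X2(t-2) AND X2(t-1)."""
        return x2_t3 and x2_t2 and x2_t1

    for bits in product([0, 1], repeat=3):
        print(bits, int(y2(*bits)))
    # Only y2(1, 1, 1) == 1: cold must persist for all three steps.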

2.2) Show the steps involved in converting these expressions into the network

To convince myself that the logical statements can be realised as a network, I use algebra to deduce proofs. Firstly, we must remember that time is discrete, so a stimulus applied at X1/X2 takes time to propagate to Y1/Y2 through intermediate neurons. These intermediate neurons are called hidden neurons and are denoted Z1, Z2, etc.

Using the formula derived in the first section:

1) Y1(t) = [X1(t-1)] OR [X2(t-4) AND X2(t-3) ANDNOT X2(t-2)]

We can encapsulate the second clause of the logic in a hidden neuron, Z1(t-1).

2) Y1(t) = [X1(t-1)] OR [Z1(t-1)]

This means one may perceive heat if a stimulus is applied at X1 one time step earlier, OR if a neuron fires from Z1, again one time step away from Y1 (shown in diagrammatic form in Section 2.3).

Also, we know that:

3) Z1(t-1) = [X2(t-4) AND X2(t-3) ANDNOT X2(t-2)]

This and the remaining derived formulas can all be realised as simpler functions in the network, using the logic rules (AND, ANDNOT, OR). We may now separate further and introduce another hidden neuron, Z2, as in the example above:

4) Z1(t-1) = Z2(t-2) ANDNOT X2(t-2)

From this we may see that Z2 at the second time step is:

5) Z2(t-2) = Z3(t-3) AND X2(t-3)

This simply leaves rule 6, which may then be incorporated into the diagram:

6) Z3(t-3) = X2(t-4).

To prove that the Y1 section of the network works, we take formula 2 and substitute in rules 4-6, confirming that the result matches the original formula.

(Substitute 6 into 5): Z2(t-2) = X2(t-4) AND X2(t-3);

(Now substitute this into 4): Z1(t-1) = X2(t-4) AND X2(t-3) ANDNOT X2(t-2);

(Finally put this into 2): Y1(t) = [X1(t-1)] OR [X2(t-4) AND X2(t-3) ANDNOT X2(t-2)]
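This substitution argument can also be double-checked in code. The sketch below steps through the hidden neurons Z3, Z2 and Z1 exactly as rules 2 and 4-6 dictate, and confirms that the result agrees with the direct formula on all sixteen input combinations:

    from itertools import product

    def y1_network(x2_t4, x2_t3, x2_t2, x1_t1):
        z3 = x2_t4               # rule 6: Z3(t-3) = X2(t-4)
        z2 = z3 and x2_t3        # rule 5: Z2(t-2) = Z3(t-3) AND X2(t-3)
        z1 = z2 and not x2_t2    # rule 4: Z1(t-1) = Z2(t-2) ANDNOT X2(t-2)
        return x1_t1 or z1       # rule 2: Y1(t) = X1(t-1) OR Z1(t-1)

    def y1_direct(x2_t4, x2_t3, x2_t2, x1_t1):
        return x1_t1 or (x2_t4 and x2_t3 and not x2_t2)

    assert all(y1_network(*b) == y1_direct(*b)
               for b in product([0, 1], repeat=4))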

Hence, the Y1 section of the network operates correctly. I now do the same for the Y2 section to ensure the network works as a whole.

Using our original formula for Y2:

1) Y2(t) = X2(t-3) AND X2(t-2) AND X2(t-1)

We can break the logic into sections and reuse two of the hidden Z neurons (Z2 and Z3, previously discussed) to complete the network. Again, logic diagrams were drawn to support this.

Consider:

2) Y2(t) = Z2(t-1) AND X2(t-1)

We know that:

3) Z2(t-1) = X2(t-3) AND X2(t-2)

So, we break it down further:

4) Z2(t-1) = Z3(t-2) AND X2(t-2)

Again producing simply:

5) Z3(t-2) = X2(t-3)

To convince myself that the network functions as it should, I use the same substitution method to carry out the analysis (much like the study of Y1).

(Substitute 5 into 4): Z2(t-1) = X2(t-3) AND X2(t-2);

(Now substitute this into 2): Y2(t) = X2(t-3) AND X2(t-2) AND X2(t-1).

Hence, the Y2 section of the network also functions correctly, so the network as a whole operates as required.
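As a final check, the whole network can be simulated over discrete time, assuming that every neuron fires from the previous step's values and that all neurons start at 0. This also previews the stage-by-stage outputs discussed in Section 2.4.

    def simulate(x1_seq, x2_seq):
        """Feed stimulus sequences into the network; return (Y1, Y2) per step."""
        z1 = z2 = z3 = 0          # hidden neurons start quiet
        x1p = x2p = 0             # previous-step stimulus values
        results = []
        for x1, x2 in zip(x1_seq, x2_seq):
            y1 = x1p or z1        # Y1(t) = X1(t-1) OR Z1(t-1)
            y2 = z2 and x2p       # Y2(t) = Z2(t-1) AND X2(t-1)
            z1, z2, z3 = (z2 and not x2p), (z3 and x2p), x2p
            x1p, x2p = x1, x2
            results.append((int(y1), int(y2)))
        return results

    # Cold for two steps then removed: heat (Y1) fires at t = 4.
    print(simulate([0] * 6, [1, 1, 0, 0, 0, 0]))
    # Cold sustained throughout: cold (Y2) fires from t = 3 onwards.
    print(simulate([0] * 6, [1, 1, 1, 1, 1, 1]))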

2.3) Network in Diagrammatical Form

2.4) Values of Neuron Outputs at Each Stage

