Developments in computer Baduk
Łukasz Lew
Department of Computer Science,
Warsaw University, Poland
e-mail:
INTRODUCTION
In most board games humans are dominated by computers. Despite a huge amount of work Baduk still remains the last popular board game where computer play is still weak.
The usual Artificial Intelligence approach for game playing consists of a global search algorithm and a procedure evaluating the goodness of a position. This does not work in Baduk. A global search is impossible because the rules of Baduk do not put a lot of constraints on the players thereby allowing an astronomical number of possible combinations. But even when search is possible, for instance on a 9x9 board, a good way of evaluating the position is not known. It is so difficult because Baduk positions have a very rich and diverse structure. Situations appearing on the board can have a lot of subtlety, and may be in a state of fragile equilibrium, especially in high level games.
This diversity, complication and deepness are features that distinguish Baduk from other board games. For the same reasons Baduk is considered the greatest challenge for Artificial Intelligence among all the games.
This article consists of two parts. First part shortly describes the history of computer Baduk and the construction of current state-of-the-art Baduk programs. I place special emphasis on a recently popular Monte-Carlo technique, which is amazing combination of simplicity and good performance (for computer Baduk standards).
The second part presents developments on particular sub-problems of Baduk, such as pattern systems, Life and Death, finding territory, etc. I place special emphasis on the pattern systems, because they are probably the most interesting and useful for a Baduk player.
PART I
I.1 Short history
The first computer Baduk program was created around 1970 by Zobrist [Zor71]. It was mainly based on computation of an influence radiated by the stones. Thanks to that it was able to beat a human (an absolute beginner) for the first time. At that time all the programs were based on different variations of the influence function.
The next important step was taken thanks to the analysis of human perception of Baduk [Rei75a, Rei75b]. Programs of that time started to use an abstract representation of a Baduk board and were able to reason about strength of groups. But probably the most significant breakthrough happened around 1990 [Boo90] when use of the pattern was widely adopted to recognize typical situations and to suggest moves.
Since then probably all top programs combine those three techniques, but little is known about the details of algorithms used, because most of the best Baduk software is commercial. Those programs were usually developed by a single programmer being simultaneously a Baduk player during a period of 5-15 years [Fot92, Fot93]. A notable exception is Gnu Go [Bum05] - a program from the Free Software Foundation written by a community of programmers.
From what is known most of State-of-the-art (non Monte Carlo) programs have the same basic framework: The backbone is a pattern-based move generator. Also, a lot of local tactical search determining a life status of strings, connection algorithms joining strings into groups are performed. Sometimes an influence function is calculated determining definite and potential territory. Sometimes shallow global search is run, but it is not obligatory. All the gathered pieces of information are used to choose a good move,
It is important to note that all the phases are filled with hand-coded knowledge in the form of heuristics and specialized algorithms. Usually, even the patterns, the most important component, are created manually. Such an approach is limited by the amount and the correctness of the knowledge introduced by the programmer.
I.2 Monte-Carlo Baduk
Recently a new technique, called Monte-Carlo Baduk attracted the attention of many researchers, because of its simplicity and very good results.
In general, Monte-Carlo algorithms are characterized by the fact that randomness is a substantial ingredient of their calculations. In the field of games it usually means that the algorithm plays a large number of simulations to explore characteristics of a large search space. This technique usually is used in imperfect information such as bridge [Gin99], poker [Bil02] or scrabble [She02]. In those games the randomness is used to model unavailable pieces of information. In stochastic games like backgammon [Tes97] Monte-Carlo is even more natural approach.
In deterministic, complete information games like Baduk this approach seems to be less natural. It is unclear what elements of a game should be random and why such an algorithm can obtain good results.
Historically, the first research on application of Monte-Carlo technique to 9x9 Baduk appeared in [Bru93]. Bruegmann's approach was remarkably simple. To choose a move, his program run several hundreds of playouts - completely random games to the bitter end. At the same time, for each legal move, the program maintained a statistic - an average of the results of those playouts, where this move occurred. Those averages were used as the estimates of moves' values. Eventually a move with the best value was played.
Bruegmann approach was followed by ten years later by Bruno Bouzy. Bruno made at least three contributions described in the following sections. In the fourth section I also describe a different approach to Monte-Carlo by Tristan Cazenave.
I.2.1 Combining Monte-Carlo program with a classical approach
Bouzy [Bou03a] set up Indigo2002 (a "classical" Baduk program he developed over a few years) as the move generator and the Monte-Carlo module as the move evaluator. This way, the Monte-Carlo module evaluated less than 10 moves instead of about 300 on a 19x19 board. Obviously moves considered tactically weak or bad for some reason by the Indigo2002 program were eliminated from evaluation.
This technique is named preprocessing. And its advantage of is improved speed allowing playing games on the 19x19 board in a reasonable time and reduction of a number of tactical blunders made by pure Monte-Carlo.
I.2.2 Monte-Carlo and patterns
The second contribution [Bou03a, Bou05b] is the use of a pseudo-random, biased playout, where probability of playing certain moves depends on their current (mid-playout) surroundings. To implement it, Bouzy used a database of 3x3 patterns manually built earlier for his program Indigo2002. Second bias for mid-playout move selection was detection of dansu which triggered captures with a probability proportional to the size of captured chain.
I.2.3 Monte-Carlo and tree search
The third contribution [Bou04] is the most influential one, as it is used in all good Monte-Carlo programs today. It is a combination of a global selective tree search with Monte-Carlo simulations.
Bouzy's original search algorithm - Progressive Pruning - is an ordinary global game tree searcher with following extensions. The depth of the tree is first set to one. When enough children are pruned, the algorithm extends the tree to the depth 2 by attaching to all not pruned leaves new children. Simultaneously, the algorithm performs random playouts starting from current tree leaves (not expanded nodes of the tree) and calculated average playouts' results. This average is used to value the leaves and to prune them, when some of them were found to be statistically inferior to some other leaf in the same branch. When the program is just about to choose a final move, then it uses a mini-max principle to propagate node values from leaves to the root and chooses the final move according to those values.
The combination of game tree search and Monte-Carlo was recently considerably improved [Cou06], and the strongest 9x9 Baduk programs of the day consist almost of this approach alone.
I.2.4 Monte-Carlo for sub-goal evaluation
Recently Tristan Cazenave proposed[Caz05] a way of integrating Monte-Carlo with specialized, search algorithms [Caz00, Caz02] used for evaluation of connection / disconnection, life / death, capture / escape or eye making / destroying. For each interesting goal, such as connection of two strings, he maintained a statistic - average result of those random Monte-Carlo playouts in which the particular goal was achieved. This way he was able to find the most valuable goals and generate the move, which achieved them. The strength of play of such algorithm was a much greater than any of its parts alone.
I.2.5 Conclusion
The last example is a completely different approach than all the previously described, because this time Monte-Carlo was not used to directly find the best move, but as an auxiliary procedure. It may show to the reader that Monte-Carlo, despite its simplicity is very flexible and still there is a lot that can be done.
PART II
Due to the difficulty and diversity of the task of automating Baduk play, many researchers concentrated on sub-problems of Baduk. This section will browse through the most important, in the author's opinion, results. It shows to the reader the diversity and broadness of sub-problems found in the Baduk and the great variety of existing approaches.
II.1 Pattern systems
Baduk is generally assumed to be a game where patterns play a major role. Full-board and especially corner openings are an obvious example, but good shape moves and many tactical moves are also typical pattern moves.
The purpose of a pattern system is to generate good moves for a given Baduk position. For a Baduk player this is probably the most interesting sub-problem, because such systems are very useful for assisting in game-analysis. Also pattern systems are usually the most important components of good Baduk programs.
II.1.1 Pattern system in Gnu Go
A very good example is an open source program - Gnu Go. Its pattern system is described in detail in [Urv02] and in the Gnu Go manual [Bum05]. The pattern matching algorithm is based on Deterministic Finite state Automata (DFA) and therefore it is very fast. The transition graph is created automatically from a database of manually edited patterns.
The patterns used in Gnu Go are very expressive and can provide detailed information about fuseki moves, possibilities of: attacking/defending and connecting/cutting a group, and endgame moves. The patterns also provide information about a group's eye-space and the way influence should be propagated through the board. The biggest bottleneck of the Gnu Go pattern system is that the patterns must be entered manually.
II.1.2 K-nearest-neighbors
An interesting approach is Bouzy's K-nearest neighbor algorithm [Bou05a]. He uses patterns only for suggesting the next move, but such a limitation makes it possible to harvest and value them automatically. The resulting system appears to be very flexible. The highlight of the paper is presented on diagram. The moves played in the opening of the game, which the system played against itself, have a very high quality and look like the moves played by a human.
II.1.3 Neural Network pattern system
A different approach was taken by Erik van der Werf [Wer02]. He proposes a neural network as a tool to recognize patterns and evaluate moves. He feeds the network with features of the position such as: location of nearby stones, proximity to the edge of the board, liberties before and after the evaluated move, possible captures, etc. The network is trained with a large number of high quality games. The results are very good - the system is able to predict 25% of the moves in previously unseen professional games.
II.1.4 Explicit pattern systems
In the previous system the knowledge is encoded in the weights of a neural network. In explicit pattern systems, the patterns are fixed in size and stored explicitly together with their ranking. Currently pattern systems with explicit patterns present the best performance in the field. There are two important pattern systems:
MoyoGo [Gro06b] - where the system is integrated into a tool supporting Baduk player in analyzing games
Bayesian Pattern Ranking [Ste05] - developed at Microsoft in cooperation with CambridgeUniversity, available on Microsoft online Baduk server.
Pattern system used in MoyoGo was inspired by the research done by David Stoutamire [Sto91]. In turn, MoyoGo performance inspired David Stern, Ralf Herbrich and Thore Greapel
from Microsoft.
The main difference between those two systems is the algorithm used for creation of the database and the scale. Microsoft used a sophisticated Bayesian rating algorithm, while MoyoGo uses a very simple and robust algorithm. The scale difference is that Bayesian system includes over 12 millions patterns harvested and rated on 181,000 high quality games while MoyoGo includes over 16 millions patterns harvested and rated on over 500'000 high quality games.
Microsoft's pattern system is able to predict 34% of moves played in a professional game. This is quite an achievement compared to 25% of neural network described earlier. MoyoGo's performance is even better - it predicts 40% of moves. The following part describes MoyoGo pattern system.
There are 12 pattern sizes, from very small to the entire board. Each pattern includes:
- The exact stone configuration inside the physical pattern perimeter, with an empty point in the middle and an indication which player should play there.
- The exact distance to the edges of the board.
- The Pae-status of the game.
- The number of stones in all connected strings of stones adjacent to the patterns' center point.
- The number of liberties of all connected strings of stones adjacent to the patterns' center point.
- The urgency value (rating).
The pattern system was constructed automatically by processing a huge number of high quality game records. To decide which patterns should be included, the criteria are:
- How urgent (statistically likely) the move is, e.g. the sooner a move is played on a patterns center point, the higher the likelihood that the pattern will be included in the database.
- How often a play on the pattern (on its center point) occurs. For quality reasons, exotic patterns are discarded.
Author of MoyoGo writes:
Earlier experiments with Bayesian learning have proved computationally intensive and too fuzzy (we require exact statistical data for each pattern, therefore the algorithm for computing the statistical move likelihood is:
How often the pattern occurred in all games
------
How many turns the average player waited to play there
Given a Baduk position system finds all the patterns that match on some place of the board, i.e. all criteria: stones location, pae status, edge distance, etc. are matched. Then several patterns with highest urgency are selected and given to the user as plausible moves. Because of the relatively large pattern database, and due to the fact that all relevant smaller patterns are included, every Baduk position always contains several to many recognized patterns.
Strong points of explicit pattern system:
- The system matches and classifies patterns near-instantaneously. On a modern PC, a Baduk position is scanned for matching patterns in less than a millisecond.
- The system achieves an extraordinary high pro-prediction rate (40% average).
- The system is a whole-board context aware and often plays like a pro in positions without much tactical complexity.
Weak points:
- The system is purely static, it cannot read ahead so the more complex and hot a tactical situation is, the worse the system performs.
- Pattern shapes do not adapt to the Baduk position, they are fixed.
- The system does not take ladders into account.
Author of MoyoGo also describes future work on MoyoGo pattern system:
The system is in constant development, future work includes:
- The inclusion of n-th order liberties in the patterns. Order 2 liberties are liberties of liberties, and are a measure for a string of stones to escape enclosure.
- Using a larger pro games database.
- Using the rank of the player as a heuristic for the reliability of the statistical move likelihood of the pattern-move.
- Strong tactical module.
II.2 Finding territory and potential territory
Probably the most well known algorithm for estimating territory and influence is Bouzy's mathematical morphology [Bou03]. It relies on two operations borrowed from the image processing domain: erosion and dilation. The algorithm is very simple yet it is able to quite precisely capture the human notion of territory. The four diagrams above shows state of the algorithm after each of four dilations, while the diagram on the side shows the state after additional thirteen erosions – the approximation of territory. Of course it is purely ``graphical'' and to produce good estimates it needs information about group strength.
Another algorithm for finding the potential territory is presented in [Ste04]. A board is treated as a Markov finite field where the propagation of influence occurs. The main difference from the Bouzy's approach is that the parameters used in the propagation are determined automatically basing on a large number of game records.
The same problem was approached in [Wer04] with the use of neural networks. The resulting algorithm was compared to many methods including the one created by Bouzy. In all the approaches the results were very similar.