Extending the Application of Partial Order Bounding to Game Tree Search

UNIVERSITY OF MINNESOTA

This is to certify that I have examined this copy of a master’s thesis by

Kristy Sue VanHornweder

and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the final examining committee have been made.

Timothy R. Colburn

______

Name of Faculty Advisor

______

Signature of Faculty Advisor

______

Date

Applying the Partial Order Bounding Method of Game Tree Searchto Connect 4

A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA

Kristy Sue VanHornweder

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

Department of Computer Science

University of Minnesota Duluth

June 2002

 Kristy Sue VanHornweder 2002

Abstract

Game playing is of interest to artificial intelligence researchers because of the complexity involved in its exponential search. Standard exhaustive search algorithms available for graphs can be used for only the most trivial of games. The methods for dealing with this complexity are the use of heuristic evaluation functions and game tree pruning. Game tree pruning starts with the fundamental twoperson gameplaying algorithm called minimax and then applies pruning methods, such as AlphaBeta pruning, to the resulting game tree.

Researchers have devised new methods to enhance the minimax search algorithm. Two current approaches have emerged called Partial Order Bounding (POB), based on the theory of partially ordered sets and null window search, and game decomposition. POB compares partially ordered values in the leaves of a game tree, and backs up boolean values through the tree to the root. The advantage of POB compared to traditional weighted sum evaluation methods is that it avoids the problem of dealing with incomparable evaluations.

So far, POB has only been applied, with some success, to the game of Go. In order to determine the general applicability of POB, this thesis tests methods for applying POB to the game Connect 4. The issues involved in partial order evaluation and partial order bounding in Connect 4 are discussed, and the performance of a Connect 4 gameplaying program using POB is compared to one using traditional methods. Initial ideas on how to apply game decomposition to Connect 4 are also presented.

1. Motivation for Game Playing Research………………………………..1

2. The Role of Evaluation Functions in Game Playing…………………..2

2.1. Limitations of Weighted Sum Evaluation……………………….2

2.2. Motivation for This Research……………………………………4

3. The Method of Partial Order Bounding……………………………….5

3.1. Basic Definition of Partial Order Bounding……………………..5

3.2. Partial Order Evaluation: Vector Dominance……………………8

3.3. Partial Order Bounding Algorithm………………………………8

3.4. Efficiency of Partial Order Bounding: Null Window Search…..10

4. The Application of Partial Order Bounding to the Game Connect 4.12

4.1. Weighted Linear Sum Evaluation for Connect 4……………….12

4.2. Static Evaluation in Connect 4………………………………….14

4.3. Partial Order Evaluation for Connect 4………………………...16

4.4. Move Ordering and Pruning in Connect 4……………………...21

4.5. Initial Experiments for Connect 4………………………………27

4.6. Searching for Bounds…………………………………………...29

4.7. Searching to a More Shallow Depth…………………………... 41

4.8. Threshold Pruning…………………………………………….. 42

4.9. Summary of Results…………………………………………… 45

4.10. Observations and Conclusions……………………………….. 47

4.11. Future Work………………………………………………….. 48

5. The Method of Decomposition Search and its Application to the

Game Connect 4………………………………………………………..50

5.1. Partitioning a Game…………………………………………….50

5.2. Local Search……………………………………………………51

5.3. Decomposition Search Algorithm……………………………...51

5.4. The Application of Decomposition Search to Connect 4………51

6. Related Work…………………………………………………………..54

Appendix: Rules of Connect 4…………………………………………...55

References………………………………………………………………....57

ii.

List of Figures

Figure 1. Example of a partial order lattice………………………………….6

Figure 2. Example of an antichain…………………………………………..6

Figure 3. Example of success and failure set for the antichain {E,G}…...….7

Figure 4. Partial order lattice for the game of Go…………………………...8

Figure 5. Example game tree illustrating the POB search algorithm………..9

Figure 6. Example of the efficiency of null window search………………11

Figure 7. Connect 4 position illustrating block of four containing B1…….13

Figure 8. Connect 4 position illustrating the concept of gravity…………...13

Figure 9. Connect 4 position illustrating the concept of gravity…………...14

Figure 10. Connect 4 position illustrating static evaluation………………..15

Figure 11. Connect 4 states illustrating the partial order evaluation……….19

Figure 12. Partial order lattice for Connect 4………………………………20

Figure 13. Connect 4 position illustrating 3 different types of moves……..22

Figure 14. Move ordering for Connect 4…………………………………...24

Figure 15. Move ordering and pruning for Connect 4 position

in Figure 13…………………………………………………….25

Figure 16. Process of relaxing bounds……………………………………..33

Figure 17. Process of tightening bounds…………………………………...34

Figure 18. Effectiveness of tightening bounds……………………………..34

Figure 19. Further enhancements for pruning……………………………...37

Figure 20. Further enhancements for non-pruning…………………………38

Figure 21. Further enhancements for non-pruning…………………………38

Figure 22. Different choice of moves in Stage 1 and Stage 2

for Threshold Pruning…………..……………………………...44

Figure 23. The partition of a 3 pile game of NIM………………………….50

Figure 24. Partitioning a Connect 4 position……………………………….52

Figure 25. The game Connect 4……………………………………………55

iii.

List of Tables

Table 1. Results of initial tests for Connect 4……………………………...28

Table 2. Results of naïve approach for pruning……………………………31

Table 3. Results of naïve approach for non-pruning……………………… 32

Table 4. Results of reusing bounds (Stage 1) for pruning………………….35

Table 5. Results of reusing bounds (Stage 1) for non-pruning…………….36

Table 6. Results for further enhancements for pruning (Stage 2)………….39

Table 7. Results for further enhancements for non-pruning (Stage 2)……..40

Table 8. Node counts for ideal case for non-pruning………………………41

Table 9. Results of searching to depth 7 for non-pruning………………….42

Table 10. Results of Threshold Pruning method…………………..……….44

Table 11. Summary of total and average node counts for pruning………...45

Table 12. Summary of total and average node counts for non-pruning…... 45

Table 13. Summary of total and average node counts for

Threshold Pruning……………………………………………….46

iv.

Acknowledgements

There are some acknowledgements I would like to make of people who have contributed to the success of this thesis.

I would like to thank Dr. Tim Colburn, my thesis advisor, for giving me guidance throughout this project and for offering me encouragement and reassurance during the difficult times.

I would like to thank Dr. Ted Pedersen and Dr. Robert McFarland for taking the time to carefully read a preliminary version of this thesis and to offer their suggestions for improvement.

I am grateful to the UMD Department of Computer Science for giving me the opportunity to perform graduate research under them.

Lastly, I would like to thank Dr. Martin Mueller, whose work I have extended, for providing a reference to his research papers so that I would have a foundation for my own research.

1 Motivation for Game Playing Research

For decades, gameplaying researchers have attempted to devise game searching methods that are both intelligent and efficient. In general, there is a tradeoff between intelligence of play and efficiency. Traditional methods such as minimax with AlphaBeta Pruning, described in textbooks such as [8], have been able to improve efficiency by pruning unimportant portions of the game search tree, thus allowing the search to be taken deeper into the tree for the same cost. In theory, deeper search should increase the intelligence of play, as the search is able to examine game positions that are closer to the end of the game.

2 The Role of Evaluation Functions in Game Playing

The major concern for building an intelligent gameplaying program is the evaluation of a game position. A player will determine "how good" a game

position is by applying an evaluation function to that position. In standard AlphaBeta search, a common evaluation function used is a weighted linear sum, w1*f1 + w2*f2 + ... + wn*fn, where f1, f2, ..., fn are game features and w1, w2, ..., wn are weights associated with the features. The features are characteristics of a game, for example, in chess, the number of rooks on the board, the number of knights, etc. The features should be chosen so they represent the critical characteristics of a game, the ones that have a strong influence on a game's outcome. The weights chosen should be appropriate for the features. Features that are considered more important tend to have

higher weights, whereas less significant features will in general have lower weights.

2.1 Limitations of Weighted Sum Evaluation

The traditional method of AlphaBeta search and the typical application of the weighted sum evaluation have proven to be successful in many cases. However, the weighted sum evaluation has several shortcomings. The problems with evaluation functions discussed here are loss of information,

unstable positions, long-term vs. short-term plans, and near-terminal states.

2.1.1 Loss of Information

In a standard weighted sum evaluation, there are several features that are added together and compared to one another. However, it does not always make sense to compare and add certain features, for example, when the relative importance of features changes during the course of the game, such as the case of an unstable position, discussed below. This combination approach results in a loss of information. Because of this, the estimate of the value of a game position may no longer be accurate.

2.1.2 Unstable Positions

Another issue of evaluation is that of unstable positions. An unstable position is one that is likely to dramatically change very quickly. Such a position may appear favorable at one point, but only a move or two later, it

could prove to be disastrous for a player. A position such as this and a more stable one that does not change so drastically could end up with the same evaluation.

2.1.3 Short-Term Plans vs. Long-Term Plans

Another important consideration of evaluation is the need to maintain a balance between short-term plans and long-term plans. Sometimes a move may look good immediately, that is, there will be a considerable change in the outlook of a position if this type of move is chosen. In other cases, one

move may have a lower evaluation value than another, but it may prove to

be a successful choice in the long run. A player will need to keep a balance between the status of the game immediately and the status of the game further into the future, i.e. tactics vs. strategy.

2.1.4 Near-Terminal Positions

One last concern of evaluation is the case of near-terminal positions. In many such cases, a position can be evaluated statically rather than having to perform a search to determine a move. If a static analysis of a game position

finds that a certain move will result in a sure win, the player certainly does not need to perform a search.

2.2 Motivation for this Research

In an attempt to avoid these problems with weighted sum evaluation functions, a game search method called Partial Order Bounding (POB) has

recently been developed by Martin Mueller[4]. This method has proven to be successful in applications of Semeai Races in the game of Go. To determine the general applicability of POB, this thesis develops methods for applying POB to the game Connect 4. To determine the success of the application of POB to Connect 4, the performance of a Connect 4 playing program using POB is compared with a Connect 4 playing program using a traditional Alpha-Beta search.

3 The Method of Partial Order Bounding

Partial Order Bounding avoids the collapsing of several features into one number by only comparing features and game positions when it makes sense for them to be compared. The main goal of POB is to define an inequality between a fixed bound and a game position's evaluation.

3.1 Basic Definition of Partial Order Bounding

Definition A partially ordered set (poset) P is a set with a binary relation , with the three properties reflexive, anti-symmetric, and transitive. Two elements, x and y, of P, are considered comparable if the binary relation between them holds: that is, x  y or y  x. Otherwise, x and y are

considered incomparable. A binary relation having the property of anti-

symmetric means the following: if x  y and y  x, then x=y. The definitions

of the properties reflexive and transitive are straight-forward and are given

in textbooks such as [7].

A poset is represented by a lattice diagram, as in Figure 1. To say that two

elements are comparable means that there is a strictly downward leading path from one element to the other. In Figure 1, the elements I and D are

comparable since there is a strictly downward leading path from I to D. The elements G and F, however, are incomparable since it is not possible to get from one to the other without heading both upward and downward.

Figure 1. Example of a partial order lattice (Taken from [4])

Definition An antichain B is a subset of P in which every element is incomparable to every other element. A subset containing only one element is an antichain.

A bound is an antichain of a poset that will represent the minimal acceptance level for a player. The player will want evaluations that are at least as good as some element in the bound.

Figure 2. Example of an antichain {E,G} (Taken from [4])

Definition The success set of the antichain B, S(B), is the set of all elements in P which are  to some element in B. All other elements in P are in the failure set of B.

Figure 3. Example of success set and failure set for the antichain {E,G} (Taken from [4])

The success set will represent all evaluations that are acceptable with respect to the bound. These are all of the evaluations that are considered "good

enough" for the player.

As an example of a poset in the context of gameplaying, consider Figure 4.

This figure contains the lattice used by Mueller[4] for the game of Go. Each element in the poset is a possible evaluation for a game position. Each evaluation is a vector containing two Gospecific game features. The elements (9,0) and (2,5) represent the bound chosen for the Semeai problems. The shaded region consists of all of the elements that are in the success set of the chosen bound. All other elements are in the failure set of the bound.

Figure 4. Partial order lattice for the game of Go (Taken from [4])

3.2 Partial Order Evaluation: Vector Dominance

There are several different types of partial order evaluations. The type that is applied to Connect 4 in this thesis is called vector dominance. The

definition is as follows: Given two mdimensional vectors x = {x1, …, xm}

and y = {y1, …, ym}, x  y iff xi >= yi, for i = 1, …, m. In words, this states that if the relation holds between each corresponding element of both vectors, then the relation holds between the vectors themselves, and vice versa.

3.3 Partial Order Bounding Algorithm

An attractive aspect of Partial Order Bounding is that it can be built on top

of an ordinary game search method, such as AlphaBeta. The POB algorithm is as follows:

Given a poset P, a bound B, and a minimax search method M:

The leaf nodes of the game tree are evaluated using the partial

order evaluation, and the result is an element in P.

2. The evaluation is considered success (1) if it is in S(B), otherwise

it is considered failure (0).

3. The boolean values of the leaf nodes are backed up the search tree

using minimax method M.

As an example, consider the game tree in Figure 5. The levels that contain

Figure 5. Example game tree illustrating the POB search algorithm (Taken from [4])

(Note that a “+” is used for success and a “-“ is used for failure)

square nodes denote that it is MAX's turn, and the levels of the tree that

contain circular nodes denote that it is MIN's turn. The evaluations at the

leaf nodes were obtained using an unspecified partial order evaluation. Consider the bounds B1={(5,7), (10,3)} and B2={(5,8), (6,4)}. Using vector dominance, the boolean success values are obtained for each leaf node for each bound. These success values are backed up the tree in minimax fashion. Thus the boolean value at node Q is the minimum of the boolean values at the two leaf nodes below Q. The boolean value at P is the maximum of the boolean values at nodes Q and R. There is a total ordering of success > failure. Based on this evaluation, the player is able to achieve B1 but fails to achieve B2 at node P.

3.4 Efficiency of Partial Order Bounding: Null Window Search

The method of Partial Order Bounding is based on the notion of null window

search[6]. In standard AlphaBeta, the window of possible state evaluations is [, ]. However, in POB, the window is [0,1] since the only possible evaluations for a game position are success (1) and failure (0). The window is as small as possible since no value can be inserted in between the upper and lower limits of the window; thus POB uses a null window. In theory, this explains why POB should be more efficient than standard AlphaBeta search.

As an illustration of POB's efficiency, consider Figure 6. Suppose at the top node it is MAX's turn. The nine nodes branching from the top are the

successors of the top node. The first four have been evaluated as failure,

while the fifth successor evaluates as a success. Because of the null win-dow, the highest possible evaluation in a game tree using POB is 1.

Figure 6. Example of the efficiency of null window search

Thus, it is not possible to find a higher value using POB as it could be using AlphaBeta. Therefore, the rest of the successor branches can be pruned away. This results in many more branch cutoffs, and many fewer node expansions, making POB more efficient.

4 The Application of Partial Order Bounding to the Game

Connect 4

Before discussing the application of partial order evaluation to Connect 4, general evaluation methods for Connect 4 will be presented. Readers not familiar with the game Connect 4 are referred to the Appendix for a description of the game.

4.1 Weighted Linear Sum Evaluation for Connect 4

The evaluation function applied to Connect 4 for AlphaBeta search is a standard weighted sum. The function is defined as w1*B3 + w2*B2 + w3*B1 (w1*R3 + w2*R2 + w3*R1), where B3, B2, B1, R3, R2, and R1 are features with B denoting black and R denoting red. The feature Bn is the number of blocks of four consecutive board squares that have n black pieces and no red pieces. The feature Rn is defined in a similar way. Every possible consecutive four squares on the board is considered for calculating these feature values. For example, consider the highlighted region in Figure 7. This area denotes one block of four, and it is considered a B1 since there is

Extending the Application of Partial Order Bounding to Game Tree Search

Contents

List of Figures

Figure 1. Example of a partial order lattice………………………………….6

Figure 18. Effectiveness of tightening bounds……………………………..34

Figure 19. Further enhancements for pruning……………………………...37

Figure 22. Different choice of moves in Stage 1 and Stage 2

Figure 23. The partition of a 3 pile game of NIM………………………….50

Table 3. Results of naïve approach for non-pruning……………………… 32

Table 7. Results for further enhancements for non-pruning (Stage 2)……..40

Table 8. Node counts for ideal case for non-pruning………………………41

Table 12. Summary of total and average node counts for non-pruning…... 45

An attractive aspect of Partial Order Bounding is that it can be built on top