Dustin Shutes

ECE539

Prof Hu

GENETIC PROGRAMMING FOR CHECKERS PLAY

Abstract:

This project uses genetic programming to develop a program that plays the game of checkers. Unlike games such as tic-tac-toe, checkers is an unsolved game, meaning that no strategy is yet known that always wins. Most programs designed to play games against human opponents are built on expert knowledge, using databases of human play to determine what the appropriate move should be. Instead, this program uses machine learning, through genetic programming, to decide its moves.

Checkers, or draughts, is a game played on a standard 8x8 chess board with pieces occupying every other square. Each side starts with 12 checkers of opposite colors, and the first side to have no more pieces or no more legal moves loses. For a regular piece, a legal move takes the checker forward one row and one column to the left or right, into an unoccupied square. If such a square is occupied by an opponent's piece and the next square along the diagonal is unoccupied, a 'jump' is performed: the opponent's piece is removed and the jumping checker lands on the unoccupied square. If another legal jump is available from that square it is performed in the same manner, so jumps can be chained together as long as legal jumps remain. A checker in a position to make a legal jump must take it. If a checker reaches its opponent's back row it becomes a 'king'; kings are special pieces that are allowed to move both forward and backward.

Move Selection:

In order to select its next move the program first needs to determine all the legal moves available to it. To do this, the program must first have a way to represent the board and the locations of the pieces on it. The board is represented as a one-dimensional array of 32 integers, one for each square that can be occupied by a piece. A square's value is 1 if it is occupied by one of the computer's checkers, 0 if it is unoccupied, and -1 if it holds an opponent's checker. Kings are given a variable value that is adjusted as part of the evolution of a strategy. The program also uses a data structure that holds, for each square, all the legal moves and jumps for checkers and kings of both colors. The program then loops through all the squares of the board; whenever it finds one of its own pieces it checks whether any legal moves are available for that piece.
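A minimal sketch of this representation is shown below. The names and the table layout are my own, and the demo in main() is illustrative; the actual program precomputes the full move table for all 32 squares at start-up.

#include <stdio.h>

#define SQUARES 32

/* board[i]: 1 = computer's checker, 0 = empty, -1 = opponent's checker;
   kings can be stored as the evolved king value with the same sign */
double board[SQUARES];

/* For each square, precomputed destinations: moves[d] is the square
   reached by a simple move in diagonal direction d, jumps[d] the
   landing square of a jump over jumped[d]. -1 marks an off-board slot.
   A plain checker has two forward directions; kings would use four. */
typedef struct {
    int moves[2];
    int jumps[2];
    int jumped[2];
} MoveTable;

MoveTable table[SQUARES];

/* Scan the board for the computer's pieces and print their legal
   moves; jumps are listed first since jumps are forced in checkers. */
void list_moves(void) {
    for (int s = 0; s < SQUARES; s++) {
        if (board[s] <= 0) continue;               /* not our piece */
        for (int d = 0; d < 2; d++) {
            int j = table[s].jumps[d];
            if (j >= 0 && board[j] == 0 && board[table[s].jumped[d]] < 0)
                printf("jump %d -> %d\n", s, j);
            int m = table[s].moves[d];
            if (m >= 0 && board[m] == 0)
                printf("move %d -> %d\n", s, m);
        }
    }
}

int main(void) {
    board[9] = 1;                                  /* one of our checkers */
    table[9].moves[0] = 13; table[9].moves[1] = -1;
    table[9].jumps[0] = -1; table[9].jumps[1] = -1;
    list_moves();                                  /* prints: move 9 -> 13 */
    return 0;
}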

Now that the program has a way to make legal moves, it needs a way to decide what the 'best' move to make is. To do this the program first generates a 'game tree', where each node of the tree represents a possible future board configuration and the root of the tree is the current board configuration. The children of the root are the board configurations after the computer has made each of its legal moves. For each of these new board configurations all the legal countermoves are performed and stored as children of their respective nodes. This process can be repeated to an arbitrary tree depth.
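The sketch below shows one way such a tree could be built, continuing the board representation above. Node, expand_moves, and build_tree are my own names; expand_moves is left as a stub standing in for the legal-move generation just described.

#include <stdlib.h>

#define SQUARES 32
#define MAX_MOVES 48     /* loose upper bound on moves from one position */

typedef struct Node {
    double board[SQUARES];          /* the position this node represents */
    struct Node *child[MAX_MOVES];  /* positions after each legal move */
    int nchildren;
} Node;

/* Stub: the real routine would write every legal successor position
   for the given side into out[] and return how many there are. */
int expand_moves(const double *board, int side, double out[][SQUARES]) {
    (void)board; (void)side; (void)out;
    return 0;
}

/* Expand the tree to the requested depth, alternating sides. */
void build_tree(Node *n, int side, int depth) {
    if (depth == 0) { n->nchildren = 0; return; }
    double succ[MAX_MOVES][SQUARES];
    n->nchildren = expand_moves(n->board, side, succ);
    for (int i = 0; i < n->nchildren; i++) {
        Node *c = calloc(1, sizeof *c);
        for (int s = 0; s < SQUARES; s++)
            c->board[s] = succ[i][s];
        n->child[i] = c;
        build_tree(c, -side, depth - 1);  /* the opponent moves next */
    }
}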

To determine the best move the program uses a recursive minimax search algorithm with alpha-beta pruning. The algorithm works by first assigning each leaf node a value based on how desirable that future board configuration is to the player. This value is produced by feeding the board through a multi-layer perceptron style network, which will be discussed later. The algorithm acts under the assumption that the player making a move will always want to maximize the board's value, so it propagates the maximum value of a node's children upward. The opponent, in theory, will choose the move that minimizes the board value, so layers of the tree representing the opponent's moves propagate the minimum value of their children instead. In checkers each node (if not a forced jump) can have upwards of 8 child nodes, so evaluating every node quickly becomes very computationally intensive at any meaningful depth. However, because of the recursive nature of the search, it can be seen ahead of time that some leaves will never be propagated upward, because they are already of lower value (for max layers) or higher value (for min layers) than boards at nodes that have already been visited. Alpha-beta pruning takes advantage of this by keeping track of the 'best' maximum and minimum values already seen in the tree; if it is impossible for a node to receive a value that beats one already selected, the minimax algorithm skips that subtree. Adding alpha-beta pruning to the minimax algorithm allows up to double the number of boards to be evaluated in a given time period. The computer then chooses the move represented by the root's child with the highest value.
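A sketch of the search itself, continuing the fragment above (it assumes the Node type and build_tree from the previous sketch); evaluate() is a stub standing in for the network described under Network Design.

#include <float.h>

/* Stub standing in for the neural-network board evaluation. */
double evaluate(const double *board) {
    (void)board;
    return 0.0;
}

/* Recursive minimax with alpha-beta pruning over the game tree.
   Maximizing layers are the computer's moves, minimizing layers
   are the opponent's replies. */
double alphabeta(Node *n, int depth, double alpha, double beta, int maximizing) {
    if (depth == 0 || n->nchildren == 0)
        return evaluate(n->board);          /* leaf: ask the network */
    if (maximizing) {
        double best = -DBL_MAX;
        for (int i = 0; i < n->nchildren; i++) {
            double v = alphabeta(n->child[i], depth - 1, alpha, beta, 0);
            if (v > best) best = v;
            if (best > alpha) alpha = best;
            if (alpha >= beta) break;       /* prune: the opponent would
                                               never allow this line */
        }
        return best;
    } else {
        double best = DBL_MAX;
        for (int i = 0; i < n->nchildren; i++) {
            double v = alphabeta(n->child[i], depth - 1, alpha, beta, 1);
            if (v < best) best = v;
            if (best < beta) beta = best;
            if (alpha >= beta) break;       /* prune */
        }
        return best;
    }
}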

During training the program plays two different strategies against each other until a winner is determined. Although it is legal in checkers to declare a draw, this program will neither accept nor request draws, and always plays a game out until a winner is decided. This proved trickier than it might first appear, because a player can still have pieces on the board yet have no legal moves; this situation must be checked for or the game will not terminate properly. In the situation where every legal move results in the player's last piece being jumped, the program simply picks a move and is jumped, ending the game.
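The end-of-game test this paragraph describes might look like the following sketch; side_has_move() is a stub for a walk over the move table shown earlier.

/* A side loses when it has no pieces left or no legal moves. */
static int side_has_move(const double *board, int side) {
    (void)board; (void)side;
    return 1;   /* stub: the real test walks the move table */
}

int game_over(const double *board, int side) {
    int pieces = 0;
    for (int s = 0; s < 32; s++)
        if ((side > 0 && board[s] > 0) || (side < 0 && board[s] < 0))
            pieces++;
    return pieces == 0 || !side_has_move(board, side);
}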

Network Design:

As mentioned earlier, the value of a leaf node is determined by feeding the board into a multi-layer perceptron style network. The network chosen for this program has a 362 neuron input layer, a 62 neuron first hidden layer which is fully connected to a second hidden layer of 62 neurons, and then one output neuron. The value of this output neuron is taken as the value of the board. Because the position of the pieces on the board is a highly important part of determining a board's value to a player, the input layer of the network is fed with all 4x4 and 3x3 sections of the board. Each of these groupings is fed into a neuron, as sketched below, so that the network can learn, in theory, the spatial layout of the board.
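A small, self-contained illustration of this input scheme (the function names are my own): enumerating all 3x3 and 4x4 sections of the 8x8 board and counting the playable squares they cover gives 61 sections and 362 input connections, the same 362 that appears in the first weight loop of the Adaptation section.

#include <stdio.h>

/* Map (row, col) to an index in the 32-entry board array, or return
   -1 for the light squares that checkers never uses. */
int playable(int row, int col) {
    if ((row + col) % 2 == 0) return -1;
    return row * 4 + col / 2;
}

int main(void) {
    int sections = 0, connections = 0;
    for (int size = 3; size <= 4; size++)          /* 3x3 then 4x4 */
        for (int r = 0; r + size <= 8; r++)
            for (int c = 0; c + size <= 8; c++) {
                sections++;                        /* one neuron per section */
                for (int i = r; i < r + size; i++)
                    for (int j = c; j < c + size; j++)
                        if (playable(i, j) >= 0)
                            connections++;         /* one weight per square */
            }
    printf("sections: %d, connections: %d\n", sections, connections);
    return 0;                                      /* prints 61 and 362 */
}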

Because the pieces on the board are represented as opposite-signed numbers, the network can also use piece differential as a factor in deciding the value of a board. The network uses a hyperbolic tangent activation function in its hidden layers and a sigmoidal activation function on its output neuron.
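In C, the two activations are simply:

#include <math.h>

double hidden_act(double x) { return tanh(x); }                /* hidden layers */
double output_act(double x) { return 1.0 / (1.0 + exp(-x)); }  /* output neuron */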

Adaptation:

The program was started with 32 different, randomly generated, network weight configurations. In every generation 128 games are played, with the two competitors in each game randomly selected from the field of 32. Those strategies with winning records become parents for the next generation. Strategies reproduce through mutation with the following code.

for (i = 1; i <= 362; i++) {
    N = (double)rand() / RAND_MAX;                             // random number from 0 to 1
    w.sigma[i] = w.sigma[i] * exp(.15694977 * N);              // update adaptation rate
    w.L2[i] = w.L2[i] + w.sigma[i] * (N - 0.5);                // update weight
}

for (i = 1; i <= 61; i++) {
    N = (double)rand() / RAND_MAX;                             // random number from 0 to 1
    w.sigma[i + 362] = w.sigma[i + 362] * exp(.15694977 * N);  // update adaptation rate
    w.L3[i] = w.L3[i] + w.sigma[i + 362] * (N - 0.5);          // update weight
}

for (i = 1; i <= 61; i++) {
    N = (double)rand() / RAND_MAX;                             // random number from 0 to 1
    w.sigma[i + 423] = w.sigma[i + 423] * exp(.15694977 * N);  // update adaptation rate
    w.L4[i] = w.L4[i] + w.sigma[i + 423] * (N - 0.5);          // update weight
}

Each strategy also has its king value updated in the following fashion.

N = (double)rand() / RAND_MAX / 2 - 0.25;  // random number from -0.25 to 0.25
w.king = w.king + N;                       // update king value
if (w.king > 4) w.king = 4;                // cap the king's worth at 4
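For context, the outer loop tying selection and mutation together could look roughly like the sketch below. The report does not spell out the exact replacement scheme, so the pairing of losers with winning parents here is illustrative; play_game() and reproduce() are stubs for the routines described in this report.

#include <stdlib.h>

#define POP   32
#define GAMES 128

typedef struct {
    int wins, losses;
    /* plus the weight structure w mutated above */
} Strategy;

static int play_game(Strategy *a, Strategy *b) {   /* stub: 1 if a wins */
    (void)a; (void)b;
    return rand() % 2;
}

static void reproduce(Strategy *child, const Strategy *parent) {
    (void)parent;                       /* stub: copy the parent's weights, */
    child->wins = child->losses = 0;    /* then apply the mutation above    */
}

void run_generation(Strategy pop[POP]) {
    for (int g = 0; g < GAMES; g++) {
        int i = rand() % POP, j = rand() % POP;
        if (i == j) { g--; continue; }              /* need two distinct players */
        if (play_game(&pop[i], &pop[j])) { pop[i].wins++; pop[j].losses++; }
        else                             { pop[j].wins++; pop[i].losses++; }
    }
    int winners[POP], nw = 0;
    for (int i = 0; i < POP; i++)
        if (pop[i].wins > pop[i].losses) winners[nw++] = i;
    if (nw == 0) return;                            /* degenerate: keep everyone */
    for (int i = 0; i < POP; i++)
        if (pop[i].wins <= pop[i].losses)           /* losers are replaced by */
            reproduce(&pop[i], &pop[winners[rand() % nw]]); /* mutated winners */
}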

Results:

Given that the calculation time for an average move with a look-ahead depth of 9 (my original plan) is about 30 seconds, and a game usually consists of some 50-75 moves, a single game takes about half an hour. With a new generation created every 128 games, that works out to roughly 2.5 days per generation! This learning rate was obviously too slow for the timeline of this project. However, because what the network learns is how to assign value to a board, the ideal weights should be the same regardless of the look-ahead depth. I therefore reduced the look-ahead during the training stages to a depth of 3, which brings the average move down to around 5 seconds. This means games of about 5 minutes and a new generation every 10 hours, which is still a very large amount of time. I set up the program so that it saves its state after every game and resumes where it left off, so that it does not need a dedicated machine to run on, and managed to complete 11 generations. This was not enough to produce an effective strategy, and the computer will still lose to all but the least competent players.

Discussion:

Given the nature of games like checkers, where it is difficult to receive immediate feedback on the value of a particular move, it is very difficult to achieve learning using any paradigm other than an evolutionary one. Unfortunately this method is very time consuming, and the training is slow: weeks of around-the-clock running would be required to achieve meaningful results. I would also have liked to experiment with the configuration of the evaluation network from generation to generation. However, with the input layer not being fully connected to the first hidden layer, this would require a dynamic network propagation algorithm, a computational complexity there was no time for.

References:

J. Maynard Smith, Evolution and the Theory of Games. Cambridge, U.K.: Cambridge University Press, 1982.

K. Chellapilla and D. B. Fogel, "Evolution, Neural Networks, Games, and Intelligence."

W. Banzhaf, P. Nordin, R. E. Keller, and F. D. Francone, Genetic Programming: An Introduction. San Francisco, CA: Morgan Kaufmann Publishers, Inc.

S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed. Upper Saddle River, NJ: Prentice Hall, 1999.

S. Vajda, An Introduction to Linear Programming and the Theory of Games. London, U.K.
