Neural Networking in simple games

ECE 539

Theodore Humpal

Introduction

As a note to the reader, this paper is based on [1], which gives an excellent talk on this subject.

The object of this paper is to create a neural network able to play simple games; in this case, tic tac toe was picked. The idea behind this is that there is large consumer demand for fun, interactive gaming, particularly computer gaming. I would predict this market will only grow as time goes on: the more children grow up in this environment, the more will stay involved as they grow older. In the industry, creating adaptable computerized opponents has yet to be implemented well. Many games claim to have a variety of AI features, but these features are often quite weak. For example, companies have specifically advertised the ability of computerized opponents to out-flank the player – an essentially simple idea. Turn-based games, where the computer's blazing, inhuman speed can't compensate for its lack of a brain, tend to have the weakest AI.

Because of the increase in networking and computing power, it would be quite possible to set up many neural networks within a game (most probably coupled with AI), have people interact with the game online, collect information from the matches, and update the neural networks through patches and updates. In this way, the game gains a large amount of value for its purchaser. To make games more challenging, most game makers simply stack the odds against the player; by comparison, the AI is often surprisingly weak. There are practical reasons for this, but the idea of cutting AI development and replacing it with an Internet-based, worldwide neural training network is a powerful one. Though this project has only a very basic relation to that idea, it is a starting point. Can we create a neural network that, with little to no AI features, adapts to human play?

This project approaches this in two phases. First, a basic decision-making entity is needed to set against human players. Then, training data must be extracted from it and used to create an MLP network. The first phase is done through reinforcement learning and is written in C++. The second phase is done through MLP back propagation and is written in Matlab.

Part I

Initially, a lookup table is created. Given a tic tac toe board state, the lookup table provides the desirability of that board state. Therefore, when a move needs to be made, the computerized opponent looks at the current board state, looks up the desirability of all possible moves, and picks the most desirable one. Every entry in the lookup table is initialized to .5. As the lookup table plays a duplicate of itself, matches are won and lost. On a match result, the desirability of each state is updated. On a win or a tie, the last board state is assigned a 1. On a loss, the last board state is assigned a 0. The board states that led up to this conclusion are then updated as follows:

D_new(n-1) = D_old(n-1) + .5 * ( D(n) – D_old(n-1) )

Where:

  • D_new(n-1) – the new lookup table desirability of the n-1 board state.
  • D_old(n-1) – the old lookup table desirability of the n-1 board state.
  • D(n) – the desirability of the state one play closer to the end.

Therefore, the states are updated from the final play, backward to the first play. The 'strength' of the update weakens as the state moves away from the last play.

The computer opponent is set up against a duplicate opponent. The two players use their own lookup tables to make moves. Random moves are also injected occasionally so that all possibilities are explored. Following this process, a master tic tac toe player can be constructed. Because tic tac toe is a solved game, this isn't that exciting a conclusion; however, for complex games which aren't solved, this approach has great potential. For example, [3] explains a related method for building an expert-level (but not master-level) checkers player – a game that was only solved through many years of research.
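The move-selection policy just described – usually greedy, occasionally random – can be sketched like this. The names are illustrative, not from ttt.cpp, and a production version would use `<random>` rather than `rand()`:

```cpp
#include <cstdlib>
#include <vector>

// Pick the move whose resulting state has the highest desirability,
// but with probability `epsilon` pick a random move instead, so that
// every line of play is eventually explored during self-play.
int pickMove(const std::vector<double>& moveDesirability, double epsilon) {
    int n = static_cast<int>(moveDesirability.size());
    if (static_cast<double>(std::rand()) / RAND_MAX < epsilon)
        return std::rand() % n;          // exploratory random move
    int best = 0;                        // otherwise: greedy choice
    for (int i = 1; i < n; ++i)
        if (moveDesirability[i] > moveDesirability[best]) best = i;
    return best;
}
```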

In the simulations performed, one million matches were found to produce a competitive opponent, and ten million matches produced an opponent that couldn't be beaten. After 100 million games, and a human inability to beat the computerized opponent, the lookup table was printed out in an MLP training-data format. In particular, the input is the 9 board spaces (0 for empty, 1 for X, and -1 for O) plus the player's identity (1 for X, -1 for O). The output layer has 9 outputs (one for each board space) and produces a 1 for the most desirable move. In this way, the MLP is constructed as solving a classification problem.
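The encoding just described can be made concrete with a small sketch. The struct and function names are illustrative, not taken from the actual export code:

```cpp
#include <array>

// One MLP training pair in the format described above: the input vector is
// the 9 board cells (0 empty, 1 for X, -1 for O) plus the player identity
// (1 if the network plays X, -1 if O); the target vector marks the most
// desirable move with a 1 and every other cell with 0.
struct TrainingSample {
    std::array<int, 10> input;   // 9 cells + player identity
    std::array<int, 9> target;   // one-hot: 1 at the best move
};

TrainingSample encodeSample(const std::array<int, 9>& board,
                            int player, int bestMove) {
    TrainingSample s{};                           // zero-initialize both vectors
    for (int i = 0; i < 9; ++i) s.input[i] = board[i];
    s.input[9] = player;
    s.target[bestMove] = 1;                       // classification-style one-hot
    return s;
}
```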

Overall, this part of the project was quite solid. The program's ability to adapt not only to another computer opponent but also to a human opponent worked excellently. The resulting lookup table could also not be beaten, which shows success.

Part II

The first step is to insert the MLP into the tic tac toe program, as the decision maker. The next step is to use the training data from part I to train the MLP network through back propagation. The final step is to run the MLP network to see its performance.

That critical middle step was the most difficult part of this project and, for the most part, where this project failed. An 'acceptable' MLP was not found. Training the MLP was problematic from the start. The back propagation method was largely based on the given bp.m. However, there were several constraints on the MLP parameters; for example, the alpha value needed to be strictly limited, as too large a value broke the back propagation method (the weights all went to 0) while too small a value made almost no progress. Despite playing with the parameters, the number of mispredictions in the confusion matrix was always quite high. Even a remotely feasible MLP network was not found.
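To make the role of alpha concrete, here is a minimal sketch of the gradient step at the heart of back propagation – not the given bp.m, just a single sigmoid unit trained with the delta rule. Each weight moves against the error gradient by alpha times the gradient: too large an alpha overshoots and can drive the weights to degenerate values, while too small an alpha barely moves them.

```cpp
#include <cmath>
#include <utility>
#include <vector>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// One epoch of training a single sigmoid unit y = sigmoid(w * x) on
// (input, target) pairs with squared error; returns the updated weight.
double trainStep(double w, double alpha,
                 const std::vector<std::pair<double, double>>& data) {
    for (const auto& [x, t] : data) {
        double y = sigmoid(w * x);
        double grad = (y - t) * y * (1.0 - y) * x;  // dE/dw for squared error
        w -= alpha * grad;                           // alpha scales the step
    }
    return w;
}
```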

The input layer took in the board state and whether the player was X's or O's. The output layer gave the next move. If the move given by the output layer didn't make sense, a random move was selected instead.
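The fallback rule can be sketched as follows. The names are illustrative, not taken from the Matlab ttt.m, and as above `rand()` stands in for a proper generator:

```cpp
#include <array>
#include <cstdlib>

// Take the output unit with the largest activation; if that square is
// already occupied (the network's suggestion "doesn't make sense"),
// fall back to a random legal move.
int chooseMove(const std::array<double, 9>& outputs,
               const std::array<int, 9>& board) {   // 0 marks an empty cell
    int best = 0;
    for (int i = 1; i < 9; ++i)
        if (outputs[i] > outputs[best]) best = i;
    if (board[best] == 0) return best;              // suggestion is legal
    int move;                                       // otherwise: random empty cell
    do { move = std::rand() % 9; } while (board[move] != 0);
    return move;
}
```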

I think the solution with the largest potential to solve this MLP training problem is one that reduces the lookup table substantially. First of all, the tic tac toe program was never given a sense of symmetry. Adding symmetry reduces the lookup table by a large percentage, which shrinks the amount of training data the MLP must fit and should let it make fewer mistakes. I specifically didn't do this, as I thought it would lean more toward AI-ish material. However, I think this exemplifies the need for these neural networks to be coupled with AI! Further, cutting the lookup table in half by having one MLP for when the computer is X's and a separate MLP for when it is O's would also help significantly. Another possibility is emphasizing the 'important' moves, as some moves are more important than others. This would be a critical step if this approach were applied in a real-life setting, and defining these 'important' moves could be done without too much complication.
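The symmetry reduction suggested above could be sketched like this. Tic tac toe has 8 symmetries (4 rotations, each optionally mirrored), so up to 8 distinct boards share one strategic position; mapping each board to the lexicographically smallest of its 8 images collapses the lookup table (and hence the MLP's training set) by nearly a factor of 8. The helper names are illustrative:

```cpp
#include <algorithm>
#include <array>

using Board = std::array<int, 9>;  // row-major 3x3; 0 empty, 1 X, -1 O

Board rotate90(const Board& b) {   // rotate the grid clockwise
    return {b[6], b[3], b[0], b[7], b[4], b[1], b[8], b[5], b[2]};
}

Board mirror(const Board& b) {     // flip left-right
    return {b[2], b[1], b[0], b[5], b[4], b[3], b[8], b[7], b[6]};
}

// Canonical representative: the lexicographically smallest of the
// 8 symmetric images. All boards in one orbit share one table entry.
Board canonical(Board b) {
    Board best = b;
    for (int r = 0; r < 4; ++r) {
        b = rotate90(b);
        best = std::min(best, b);
        best = std::min(best, mirror(b));
    }
    return best;
}
```

With this in place, the lookup table (and the exported training data) would be keyed on `canonical(board)` instead of the raw board.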

Usage

I would encourage the reader to run the following commands under Linux to play against both parts of this project:

g++ ttt2.cpp

./a.out

#This will run an interactive 'competent' tic tac toe player (who trains for 1 million games).

#This opponent will be beatable (unless you play against him for too long!).

g++ ttt.cpp

./a.out

#This will create the training data for the matlab program (will take ~ten minutes to run).

matlab

#In Matlab, run bp3.m

#Then, also in Matlab, run the ttt.m file. In this, you'll be playing against the trained MLP

#opponent. Who, unfortunately, is not very good.

References

[1] Siegel, Sebastian.

[2]

[3]