Subject: I Have Not Forgotten You

Tic-Tac-Toe PRANO

(Pattern Recognition Attempt Number One)

The idea is simple: model a computer program to learn the same way a person would. The human mind is incredibly complex and has the ability to learn and understand many different things. To try to get a grasp on one small aspect of how the human brain operates, let’s just keep things simple. Let’s play a game.

Why Tic-Tac-Toe?

There are many programs that play tic-tac-toe. Some are written so well that they can not be beaten. Why bother to have a computer play such a simple game?

The goal with this project is to demonstrate learning. The program begins knowing that winning is good, losing is bad and a tie game isn’t as good as winning. That’s it. Using that knowledge, the program must learn from experience what moves and strategies work and which ones don’t.

Objectives

A common strategy to writing a learning tic-tac-toe program is to record every game possible. A program may need to play hundreds, if not thousands of games before it can use this information to plan useful strategies. Once these programs record every possible game state, it can become perfect. It will always pick the best strategy and it will never lose.

The PRANO team didn’t feel that this was a good model for the human mind. How many people have played EVERY possible tic-tac-toe game? How many people take 2,000 games before they learn how to block? With this in mind, the PRANO team decided to have a few objectives for this new program.

1. Learn Fast

This program should demonstrate that it has learned something within a short time. A person should be able to sit and play a few games and see that the program has changed how it plays.

2. Begin Dumb

To demonstrate total learning, the program should know nothing at the beginning. It doesn’t know how to block. It doesn’t know how to win. The first move should be a total random guess. Everything should be learned from experience.

3. Limit Memory

A person would not remember every game possibility for tic-tac-toe. A person learns concepts. The program should do the same. Also, if the program learns improperly, it should be able to "forget" what it has learned and try something different.

4. Learn Blocking

One of the most fundamental strategies of tic-tac-toe is to block your opponent from making a winning move. The program will not be told how to do this. It will have to learn it on its own.

5. Learn Combinations

To win or tie a game of tic-tac-toe with an experienced person, the program must be able to recognize combination or "trapping" moves. A trap is where there is more than one move available to win the game. The program should be able to recognize, block and use these combinations.

With these objectives in mind, we set out to make a program that could learn.

How does it work?

There are 8 winning sections in a game of tic-tac-toe. There are 3 rows, 3 columns and 2 diagonals.

Each area consists of a set of 3 squares. The reason the center square is such a good position is because it is included in 4 of the 8 winning areas. These winning areas are the basis for the logic of this program.

While the number of combinations of rows and columns in a game of tic-tac-toe is large, the number of combinations of X’s, O’s and blank spaces in a single row is quite small. In fact, there are only 27 possible combinations in a single row. They are as follows:

/ O / XO / OOX
X / O / X O / XXO
X / O / O X / XOX
X / O O / XO / OXX
X X / OO / OX / XXX
XX / OO / XOO / OOO
XX / OX / OXO

Each blank space in these combinations has a certain value to it. The blank in an XX situation would have high value because it would lead to a win. The blank in an OO situation would also have a high value because it would block the opponent. The blank in XO would have a low value because it does not contribute towards a winning move.

Now, for each blank square in the game, look at the pattern of 3 squares in each of the winning areas. Determine the value of the blank for that pattern. Adding all of the patterns gives a total weight for that blank square. After all of the available spaces have been calculated, select the highest (best) blank space.

Here is an example:

Consider the highlighted square.

The top row has a pattern of O.

Let’s say the weight for the first blank in O is 2.

The first column has a pattern of OX.

The value of the blank in OX is –3.

Finally, the pattern of the diagonal is XX.

The value of the blank in XX is 50.

The total value for the square is 2 + (-3) + 50 = 49. This is a high score and should be chosen as the next move.

After the game is over, the program analyzes the game and adjusts the weights for each pattern. The program thinks a loss could have been prevented if it had placed its own piece in the place of the winning move. Since the program doesn’t know what the winning move exactly was, it increases the value of the blank for all of the patterns involved with that position. In order to prevent that situation from happening again, the program decreases the values for all of the moves it made leading up to that situation.

The Effects

Building the program this way produces some interesting results and side effects.

· Let’s assume that each blank in each of the patterns has a value of 1. On a blank board, the center square would have the highest value because 4 winning areas cross it. The next highest values would be the corners with a weight of 3. These positions are common to better tic-tac-toe strategies.

· All of the blank spaces in all of the patterns start with a value of 0. After the first game, the program adjusts the values of the blanks. The program can show that it has learned something in the second game it plays. The program will have learned enough to tie most people within 10 games!

· Since the program is based entirely on the patterns of 3, it can be scaled to play a 3-D version of tic-tac-toe. It can play a few games of regular tic-tac-toe and apply what it has learned to a 3-D board and vice versa.

Some Philosophical Questions

A complete analysis of how this program actually performs was beyond the capabilities of the programmers. If the programmers don’t fully understand how the program works, could this be considered true learning?

The game of tic-tac-toe is simple. Most people understand the winning strategies of the game. Checkers, however, has much more strategy involved. Most people don’t understand all of the strategies involved in checkers. If a programmer creates a learning program that learns strategies the programmer is not familiar with, has the program truly learned? If the program learns to play consistently better than its programmer, has the program learned?

How is this similar to the way people learn? How is it different? What evidence supports your ideas?

PRANO Team

Neil Ryan
John Potts
Tic-Tac-Toe game interface written by Matt Carlson