### A Heuristic for Ultimate Tic Tac Toe

If you haven't heard of ultimate tic-tac-toe, you have been missing out on all the rage in the tic-tac-toe world. Ok, maybe there isn't that much of a tic-tac-toe world... but this variant of the game which everyone has played and mastered as a child, it adds additional levels of complexity, bringing back the game to the world of adults. One of my coworkers has been working on a website for the game, still in beta, found here... And in playing it a few times, I decided to try and write a bot that could beat me at the game.

The rules for ultimate tic-tac-toe are similar to the original game, but with an extra dimension. As you can see, each square of the large tic-tac-toe board has another tic-tac-toe squarer within it, and to take a square in the larger board, you must win the game within that square. The real wrinkle comes next...

Any move made in a little board dictates which little board is to be played next. In this example, the red X is in the top right corner of its own little board, which means the next move mus be played in the top right corner of the big board.

Anyone who has done turn-based complete information game programming knows that heuristic search using some form of alpha beta pruning is the way to go, such as this example for regular tic-tac-toe. For simple games like that, all you need is a move generator for each position, and some heuristic score which can be used to compare positions at the end of each search to tell you who is winning. In game trees that you can search the full depth, you can just return something like 1 for X winning, -1 for O winning, and 0 for a cats game. But there is a maximum depth of 81 with a branching factor of 9 in ultimate tic tac toe, so the heuristic becomes the interesting part.

We can find some interesting observations looking at one mini board in isolation. Where in regular tic-tac-toe, there is always a back and forth of who plays next... but here, one player could be sent to the same board many more times than their opponent. In fact, it is possible that only one player ever makes moves on a given board. Also, since the macro view of the game dominates the micro view, we find that even when a board could clearly be won on the next move, that doesn't imply that the next move should be to take the win. Good moves almost appear to be random from the prospective of the mini-board. We can take this to the extreme, and say that from any starting position on a mini board, we can estimate the probability of a winner for that board by simulating out thousands of random continuations of pieces from either side to empty squares. Since each miniboard has nine squares, and each square has either an X, an O, or nothing... there are 3^9 possible positions to consider, which works out to just 19683 positions. I wrote a program to pre-populate this table mapping any possible starting position for a little board to a probability that this board would be won by either X or O, given the assumptions of randomness from above.

The rules for ultimate tic-tac-toe are similar to the original game, but with an extra dimension. As you can see, each square of the large tic-tac-toe board has another tic-tac-toe squarer within it, and to take a square in the larger board, you must win the game within that square. The real wrinkle comes next...

Any move made in a little board dictates which little board is to be played next. In this example, the red X is in the top right corner of its own little board, which means the next move mus be played in the top right corner of the big board.

Anyone who has done turn-based complete information game programming knows that heuristic search using some form of alpha beta pruning is the way to go, such as this example for regular tic-tac-toe. For simple games like that, all you need is a move generator for each position, and some heuristic score which can be used to compare positions at the end of each search to tell you who is winning. In game trees that you can search the full depth, you can just return something like 1 for X winning, -1 for O winning, and 0 for a cats game. But there is a maximum depth of 81 with a branching factor of 9 in ultimate tic tac toe, so the heuristic becomes the interesting part.

We can find some interesting observations looking at one mini board in isolation. Where in regular tic-tac-toe, there is always a back and forth of who plays next... but here, one player could be sent to the same board many more times than their opponent. In fact, it is possible that only one player ever makes moves on a given board. Also, since the macro view of the game dominates the micro view, we find that even when a board could clearly be won on the next move, that doesn't imply that the next move should be to take the win. Good moves almost appear to be random from the prospective of the mini-board. We can take this to the extreme, and say that from any starting position on a mini board, we can estimate the probability of a winner for that board by simulating out thousands of random continuations of pieces from either side to empty squares. Since each miniboard has nine squares, and each square has either an X, an O, or nothing... there are 3^9 possible positions to consider, which works out to just 19683 positions. I wrote a program to pre-populate this table mapping any possible starting position for a little board to a probability that this board would be won by either X or O, given the assumptions of randomness from above.

52.3% | 50.9% | 49.3% |

45.2% | 62.7% | 59.9% |

11.1% | 90.1% | 50.3% |

Applied at game time, we can reduce our tic-tac-toe board to a 9x9 board of probabilities that any one square is going to be won by a given player. You could sort of use this to speculate on what your chances are for winning a given row or column by combining the probabilities... for instance, my chances of winning the vertical center row are .509 * .627 * .901, or approximately a 28% chance. However, the game is a race, and while I may have better chances of winning the center column, it is balanced with the chances that my opponent wins the left column. To adjust for this, I decided to take a probability board like the one above, and again randomly simulate thousands of times from such positions the process of choosing squares using the probability to decide who would win that square, and repeating until the game had ended. This would give me a pseudo-probability of who would win the game in such a position. Of course, the size of such a search field is too large to cache, so I decided to use these probabilities to train a classifier for linear regression.

So for those of you who don't remember, linear regression involves collecting sample inputs to sample outputs in the form A*X + B = Y... where Y is the variable our model will try to produce. In our case, I want to take some combination of the probability board as our X, and the simulated probability of winning the game described above as our Y. Then we can solve the system of equations for some set of scalars A that can be used to combine an arbitrary set of inputs to produce a probability of winning. Rather than make X simply the probabilities from the board above, I decided to collect the combined probabilities for winning any given row, column, or diagonal for each player, and using those instead. So our A*X + B = Y starts to look like...

So for those of you who don't remember, linear regression involves collecting sample inputs to sample outputs in the form A*X + B = Y... where Y is the variable our model will try to produce. In our case, I want to take some combination of the probability board as our X, and the simulated probability of winning the game described above as our Y. Then we can solve the system of equations for some set of scalars A that can be used to combine an arbitrary set of inputs to produce a probability of winning. Rather than make X simply the probabilities from the board above, I decided to collect the combined probabilities for winning any given row, column, or diagonal for each player, and using those instead. So our A*X + B = Y starts to look like...

One of the reasons I like using linear classifiers, is because there are certain techniques that make training them extremely easy. In this case, using the pseudo-inverse allows us to solve this equation for A very easily, even given a large set of sample points that don't form a square matrix. The result is a set of weighs that can be multiplied by the input values, and summed to a predicted score for the game.

There was a bit of code to this work, I posted on github here. Feel free to use these ideas generically to create a heuristic score without having to understand the tactics and strategies of whichever game you are creating.

## Comments

## Post a Comment