Coding an Unbeatable Tic Tac Toe AI

NOTE: The following post discusses an intermediate-to-advanced programming concept. It is recommended that you be fluent in at least one programming language before reading on. A working knowledge of game theory and graph-search algorithms also helps. This post is an extension of my previous post about a two-player game of tic tac toe in Scala (link below).

Understanding Tic Tac Toe From a Game Theory Point of View

Game theory refers to the study of interaction between two or more rationally thinking individuals. Tic Tac Toe is a zero-sum game: one player’s gain translates directly into the other player’s loss. The player that goes first (let’s assume X) is called the maximising player, whose goal is to get the highest possible reward. Player O is the minimising player, whose goal is to minimise the reward that player X gets. If two rational individuals playing tic tac toe both play optimally move after move, the game is guaranteed to end in a draw, where neither individual gains or loses, giving rise to a “sum” of zero. In this situation, both players get a reward of 0. If player X wins, the rewards for players X and O respectively are +1 and -1, giving a sum of 0. If player O wins, the rewards for players X and O are -1 and +1, which again gives a sum of 0.

| Terminal state | X reward | O reward | Sum          |
|----------------|----------|----------|--------------|
| X wins         | 1        | -1       | 1 + (-1) = 0 |
| O wins         | -1       | 1        | (-1) + 1 = 0 |
| Draw           | 0        | 0        | 0 + 0 = 0    |

The above table depicts the reward per player for each terminal state.
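To make the reward scheme concrete, here is a minimal Scala sketch of it. The function name, its Option[String] parameter and the whole signature are my own illustrative assumptions, not part of the final program:

// Minimal sketch: terminal outcome -> (X's reward, O's reward).
// The name `rewards` and the Option[String] winner are illustrative assumptions.
def rewards(winner: Option[String]): (Int, Int) = winner match {
  case Some("X") => (1, -1)  // X wins
  case Some("O") => (-1, 1)  // O wins
  case None      => (0, 0)   // draw
}

In every case the two rewards sum to zero, which is exactly what makes the game zero-sum.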

Introduction to the Minimax Decision Rule

Minimax is a commonly used decision rule in artificial intelligence and game theory concerning zero-sum games. The idea is to minimise the possible loss in a worst-case (maximum-loss) scenario. A player using this rule secures the best reward it can guarantee against an opponent who is using the same decision process to keep that reward as low as possible.
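In miniature, with made-up numbers: suppose X has two candidate moves, A and B, and O’s possible replies lead to the rewards listed below. O pushes each move’s value down to its minimum, and X then picks the move whose minimum is largest:

// Illustrative values only: X's rewards after each of O's possible replies.
val moveA = List(0, 1).min         //  0: the worst O can force if X plays A
val moveB = List(-1, 1).min        // -1: the worst O can force if X plays B
val best  = List(moveA, moveB).max //  0: X plays A, maximising the minimum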

Still confusing? Let’s walk through a full game. For the moment, assume that you are player O and player X starts. Your goal is to make sure that player X does not get the reward of 1, the highest possible reward. Ideally, you want X to end with a reward of -1, or failing that, 0. Both of you are using the minimax decision process: X to maximise its own reward, and you to minimise X’s reward. Let’s say X places its mark in the top-left corner.

Since you are the defensive player, you want to leave X with as few win scenarios as possible. So, you take the centre.

X, now trying to open up as many win scenarios as possible, plays, for instance, the bottom-right corner.

Now it’s back to you. If you take either of the two remaining corners, X will definitely win on its next move. So, you play one of the edges, say, the top-centre.

X definitely does not want you to win, so it blocks you at the bottom-centre.

Again, you do not want X to win, so you block it at the bottom-left corner.

X, not wanting you to win, blocks you at the top-right corner.

You block X at the centre-right edge.

And X has only one place left to play, the centre-left edge, bringing the game to its terminal state.
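Reconstructed from the moves above, the final board is:

 X | O | X
---+---+---
 X | O | O
---+---+---
 O | X | X

No row, column or diagonal is held by a single player, so the game is a draw.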

This sample gameplay shows that when two perfectly rational decision makers play optimally, the game can only end in a draw.

The goal of a tic-tac-toe-playing artificial intelligence algorithm is to mimic the decision processes of players X and O above. So how does it do that?

The Minimax Algorithm

The minimax algorithm is a recursive algorithm that implements the minimax decision rule: given a state, it returns the best possible move the system can make. It has two main helper functions, maxval() and minval(), which return respectively the maximum and the minimum value obtainable from a state once X or O takes an action.

The minimax algorithm works by checking the current state of the board and then recursively mapping out a tree: each node represents one of the remaining possible moves it could make, then all the moves its opponent could make in reply, then all of its own possible responses to those, and so on. This continues until a terminal grid/board state is reached, from which no further moves can be made. Note that the tree is just a metaphorical object we use to visualise the algorithm’s decision process; the algorithm need not literally build a game tree, it only has to behave like a game-tree search.
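To make the branching concrete: at any node, the children are exactly the empty cells of the board. The sketch below mirrors the possibleactions() helper that the full program calls later; treat it as an illustrative stand-in rather than the repository’s exact code:

val EMPTY = "" //assumed empty-cell marker; the full program defines its own
// The children of a node in the game tree: one branch per empty cell.
def possibleactions(state: Array[Array[String]]): Seq[(Int, Int)] =
  for {
    r <- 0 until 3
    c <- 0 until 3
    if state(r)(c) == EMPTY
  } yield (r, c)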

The maxval() function checks the current state of the board and, if it is a terminal state, returns the value of that state; the same goes for minval(). Where they differ is in the non-terminal case: they call each other to find, respectively, the maximum or minimum possible value obtainable from a board. This is because each function assumes that the other player will always make the optimal move.

The minimax() function calls minval() when used by the maximising player, and maxval() when used by the minimising player: after you move, it is the opponent’s turn, and the algorithm assumes the opponent plays optimally throughout the game.

Below is the pseudocode for these functions:

//pseudocode for minval()
define function minval(state: 3x3 2-dimensional array) returns Int {
   if (state) is a terminal state then
      return value of terminal state //-1, 0 or 1
   else
      val = 100 //stands in for +infinity; every real value is at most 1, so it is always beaten
      for every possible action do
         val = minimum of (val) and (maxval(state obtained after action))
   return val
}

//pseudocode for maxval()
define function maxval(state: 3x3 2-dimensional array) returns Int {
   if (state) is a terminal state then
      return value of terminal state //-1, 0 or 1
   else
      val = -100 //stands in for -infinity; every real value is at least -1, so it is always beaten
      for every possible action do
         val = maximum of (val) and (minval(state obtained after action))
   return val
}

//pseudocode for minimax()
define function minimax(state: 3x3 2-dimensional array) returns tuple(int, int) {
   if (state) is a terminal state then
      raise a "game over" exception
   else
      declare variable optimalaction as a tuple(int, int)
      if used by maximising player then
         bestval = -100
         for every possible (action) do
            value = minval(state obtained after action) //evaluate each successor once
            if bestval < value then
               bestval = value
               optimalaction = (action)
         return optimalaction
      else
         bestval = 100
         for every possible (action) do
            value = maxval(state obtained after action)
            if bestval > value then
               bestval = value
               optimalaction = (action)
         return optimalaction
}

Understanding the Algorithm

Before reading on, try to figure out on your own the algorithm’s thought process.

Let’s try to understand the algorithm with an example. For the sake of argument, say the AI is X, the maximising player. It calls the minimax function, feeding it the current state it finds itself in. Since it is used by the maximising player, minimax() calls minval() on the state resulting from each action the AI could take. minval() (now stepping into the opponent’s shoes) in turn calls maxval() on each state the opponent could produce from there, maxval() calls minval() again, and so on. Looking at minval() and maxval(), we can see that they stop calling each other and return a value only when the state is a terminal state, that is, when no possible actions remain.
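Here is a small concrete trace, assuming the full program’s helpers (EMPTY included) are in scope:

// X to move, three cells free. O threatens the middle column at (2,1);
// X can win the right column at (2,2).
val state = Array(
  Array("X", "O", "X"),
  Array("O", "O", "X"),
  Array(EMPTY, EMPTY, EMPTY)
)
// minimax evaluates each empty cell with minval():
//   (2,0): O replies at (2,1) and wins the middle column     -> -1
//   (2,1): blocks O; best play from there ends in a draw     ->  0
//   (2,2): completes X's right column, an immediate terminal -> +1
println(minimax(state)) // prints (2,2)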

So, what kind of search algorithm does this resemble? Look at the facts: the algorithm does not stop searching a path until it reaches a dead end, a terminal state. When it does reach a dead end, it goes back up a level of the metaphorical game tree until it finds an action it has not tried yet (backtracking), and then exhausts the path it creates from there until the next dead end. This continues until no possible action is left untried at any level of the game tree. Clearly, this algorithm, which exhausts a single path before backtracking and moving on to the next, is a modified depth-first search (if depth-first searches are new to you, visit Ansh’s post on them: https://thecodestories.com/2020/03/27/depth-first-traversal/).
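For comparison, the bare skeleton of a depth-first fold over a tree has the same shape as the minval()/maxval() recursion above. This is a generic illustration, not code from the program:

// Generic illustration: a depth-first fold over a tree.
// minval()/maxval() have exactly this shape, with min/max as the combiner.
case class Node(value: Int, children: List[Node])

def dfsMax(node: Node): Int =
  if (node.children.isEmpty) node.value // dead end: return and backtrack
  else node.children.map(dfsMax).max    // exhaust each subtree in turn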

Therefore, we can view the minimax algorithm as a modified depth-first search that explores every path and then picks the move whose worst-case outcome is best for the AI.

Putting it all Together in Scala

For the code, we can simply copy a few of the functions from my previous Scala code for the two-player game: the ones that check whether the game is over, check for a winner, print the instructions, and print the grid. The rest we need to code from scratch, as they handle not only the user’s but also the AI’s actions. The code below contains the main minimax algorithm, its helpers, and the main() and result() functions; the complete code can be accessed via the link given below. Notice how, in the result() function (which returns the new board after an action is made on the current board), we do not assign the newstate variable to the current state/board itself, but to a copy or clone of it. This is because we must not mutate the original board: minimax explores many hypothetical moves from the same state, and they must not contaminate each other.

//main function 
  def main(args: Array[String]): Unit = {
    print_instructions()
    val user = scala.io.StdIn.readLine("Would you like to be X or O? (X starts): ")
    var game_over = false
    var board = initial_state()
    var ai_turn = false //defaults to the user moving first, i.e. any unrecognised input is treated as X
    if (user == X) {
      ai_turn = false
    }
    else if (user == O) {
      ai_turn = true
    }
    println("GAME BEGINS!")
    println()
    print_board(board)
    while (!game_over) {
      if (ai_turn) {
        println()
        println("AI is thinking...")
        println()
        val ai_action = minimax(board)
        board = result(board, ai_action)
        print_board(board)
        ai_turn = false
        game_over = isTerminalState(board)
      }
      else {
        println()
        val user_input = scala.io.StdIn.readLine(s"Where would you like to place your $user? ")
        println()
        val user_action_arr = user_input.split(",").map(_.toInt) //expects "row,col", e.g. "0,2"; no validation here
        val user_action = (user_action_arr(0), user_action_arr(1))
        board = result(board, user_action)
        print_board(board)
        ai_turn = true
        game_over = isTerminalState(board)
      }
    }
    if (winner(board) != EMPTY) {
      println(s"${winner(board)} is the winner!")
    }
    else {
      println("Well played, it is a draw!")
    }
  }

//result function, returns the new board after a move is made
  def result(state: Array[Array[String]], action: (Int, Int)): Array[Array[String]] = {
    val newstate = state.map(_.clone()) //deep-copy each row so the original board is never mutated
    if (newstate(action._1)(action._2) == EMPTY) {
      newstate(action._1)(action._2) = whoseTurn(state)
    }
    return newstate //if the cell is already occupied, the board is returned unchanged
  }

//the actual minimax algorithm
  def minimax(state: Array[Array[String]]): (Int, Int) = {
    if (isTerminalState(state)) {
      throw new Exception("game over")
    }
    var optimalaction = (0, 0)
    if (whoseTurn(state) == X) {
      var bestval = -100
      for (act <- possibleactions(state)) {
        val value = minval(result(state, act)) //evaluate each successor once instead of twice
        if (bestval < value) {
          bestval = value
          optimalaction = act
        }
      }
      return optimalaction
    }
    else {
      var bestval = 100
      for (act <- possibleactions(state)) {
        val value = maxval(result(state, act))
        if (bestval > value) {
          bestval = value
          optimalaction = act
        }
      }
      return optimalaction
    }
  }

//maxval
  def maxval(state: Array[Array[String]]): Int = {
    if (isTerminalState(state)) {
      return getreward(state)
    }
    var v = -100 //stands in for -infinity; any real value (at least -1) beats it
    for (action <- possibleactions(state)) {
      v = v.max(minval(result(state, action)))
    }
    return v
  }

//minval
  def minval(state: Array[Array[String]]): Int = {
    if (isTerminalState(state)) {
      return getreward(state)
    }
    var v = 100 //stands in for +infinity; any real value (at most 1) beats it
    for (action <- possibleactions(state)) {
      v = v.min(maxval(result(state, action)))
    }
    return v
  }

The GitHub link for the above program is:

https://github.com/adityapentyala/Scala/blob/master/tictactoeAI.scala

I learnt about the minimax algorithm from this brilliant, easy-to-understand Harvard CS50 lecture:

https://www.youtube.com/watch?v=D5aJNFWsWew&list=PLhQjrBD2T382Nz7z1AEXmioc27axa19Kv&index=2

If you have the time or have not fully understood my explanation, do watch the video. The lecturer gives in-depth explanations of various search problems and solutions, minimax among them. Even if you have understood my explanation, I would recommend watching it all the same. It’s simply brilliant :).