Yeees, I did it
That was quite a challenge!!! After fixing all the bugs I could find, I reached 7th in the gold league. Then I decided to play with the exploration parameter; sqrt(2.0) - 1.0 looked like it worked best. Submit -> 1st in gold league. For a couple of days I could not think of anything else, then I noticed that almost none of my battles ended in a draw. I updated my algorithm with a 0.5 reward for a draw (0 for a loss, 1 for a win). Submit -> again 1st in gold..., but with fewer losses and more draws. After playing for several hours with the exploration parameter, a value of 0.5 really balanced the search. Submit -> 80th in Legend!!! And I felt really, really happy.
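For readers who haven't seen it spelled out: the exploration parameter and the draw reward both live in the UCT selection formula. Here is a minimal sketch of that formula with the 0/0.5/1 reward scheme described above (names are illustrative, not the poster's actual code):

```cpp
#include <cmath>

// Per-child statistics: wins accumulates 1.0 per win and 0.5 per draw,
// so the mean reward stays in [0, 1].
struct Stats {
    double wins = 0.0;
    int    visits = 0;
};

// UCT value of a child; C is the exploration parameter being tuned
// in the post above (e.g. sqrt(2.0) - 1.0, or 0.5).
double uct(const Stats& child, int parentVisits, double C) {
    if (child.visits == 0) return 1e9;           // always try unvisited children first
    double exploit = child.wins / child.visits;  // mean reward
    double explore = C * std::sqrt(std::log((double)parentVisits) / child.visits);
    return exploit + explore;
}
```

A larger C makes the search spread out over more moves; a smaller C makes it commit harder to the current best, which is why the value interacts so strongly with the reward scale.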
So in the end it turned out, as @darkhorse64 said, that the exploration parameter must be adjusted properly; the pruning proposed by @accorp also guided the search in the right direction.
From this point on, I think the main improvement for my strategy would be to optimize the C++ code (I have an idea of how to make it about 2 times faster, but maybe in another parallel universe) or to improve the selection phase (I read a ton of literature on the subject).
Thanks to everyone for the suggestions. I wanted to learn MCTS with this game and I think I did.
Great game, great platform, great community and it feels great to see the result of the efforts made!
Apart from optimising your code, there are a couple of simple and generic (I mean valid for any MCTS) things that you can do to improve your bot:
Smarter rollouts: if you have a winning move, play it. If you can prevent your opponent from playing a winning move, do that.
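The rollout policy above can be sketched for plain tic-tac-toe bitboards (9 bits per side; all names here are illustrative, not anyone's actual bot): take an immediate win if one exists, otherwise block the opponent's immediate win, otherwise play randomly.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Does bitboard b contain a completed line?
bool wins(uint32_t b) {
    static const uint32_t lines[8] = {
        0b000000111u, 0b000111000u, 0b111000000u,   // rows
        0b001001001u, 0b010010010u, 0b100100100u,   // columns
        0b100010001u, 0b001010100u                  // diagonals
    };
    for (uint32_t l : lines) if ((b & l) == l) return true;
    return false;
}

// Rollout move picker: win > block > random. Returns -1 if the board is full.
int pickRolloutMove(uint32_t me, uint32_t opp, std::mt19937& rng) {
    std::vector<int> moves;
    for (int sq = 0; sq < 9; ++sq)
        if (!((me | opp) & (1u << sq))) moves.push_back(sq);
    if (moves.empty()) return -1;
    for (int sq : moves)                      // 1. take a win if available
        if (wins(me | (1u << sq))) return sq;
    for (int sq : moves)                      // 2. otherwise block the opponent's win
        if (wins(opp | (1u << sq))) return sq;
    return moves[rng() % moves.size()];       // 3. otherwise play randomly
}
```

Even these two checks make rollout results much more informative than pure random play, at a modest cost per move.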
Solver: when a game nears the end, you reach terminal nodes while building the tree. If you have a win, it means the previous move was losing. If all possible moves lead to a loss, it means the previous move was winning. This way, you can back-propagate proven results and avoid re-exploring solved nodes. This is quite effective: usually you can predict the result of a game 10 or more moves before the end. There are two useful papers to read: "Monte-Carlo Tree Search Solver" and "Score Bounded Monte-Carlo Tree Search".
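The back-propagation rule for proven results can be sketched like this (a rough illustration of the idea in "Monte-Carlo Tree Search Solver", with made-up names, not a full MCTS). Proof values are from the point of view of the player to move at each node:

```cpp
#include <algorithm>
#include <vector>

enum Proof { UNPROVEN, PROVEN_WIN, PROVEN_LOSS };

struct SNode {
    Proof proof = UNPROVEN;
    std::vector<SNode*> children;
};

// Called after a child of `node` has (possibly) become proven.
void backpropProof(SNode* node) {
    // If any child is a proven loss for the player to move there,
    // the player at `node` can move into it and win.
    for (SNode* c : node->children)
        if (c->proof == PROVEN_LOSS) { node->proof = PROVEN_WIN; return; }
    // If every child is a proven win for the opponent, `node` is lost.
    bool allWin = !node->children.empty() &&
        std::all_of(node->children.begin(), node->children.end(),
                    [](SNode* c) { return c->proof == PROVEN_WIN; });
    if (allWin) node->proof = PROVEN_LOSS;
}
```

During selection, a proven node is never expanded again; its exact value is simply returned, which is what makes endgames so cheap.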
I got into the gold league with simple heuristics.
No simulations, no complex algorithms - just a scoring function for each valid action.
With defs and classes, the code is under 100 lines.
Magus, do you mind sharing a bit what your Node looks like?
I tried to create the same number of nodes, first in Java and then in C++, but I could not get past ~500k nodes on the first turn.
I think there are two major issues with your code:
Your node includes the game state. This takes way too much space. Your node should only contain the played move. Every time you enter a node, you should instead update a Game variable with the node's move. Replaying a game will not be less efficient than copying a game state.
I guess you are initializing your nodes at startup. Don't do that. Allocate your data and initialize a node only once you need it.
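Both suggestions can be illustrated together (a hedged sketch with invented field names, not anyone's actual bot): a compact node that stores only the move plus tree bookkeeping, taken from a preallocated pool where initialization happens on first use, not at startup.

```cpp
#include <cstdint>

// Compact MCTS node: no game state inside. The state is rebuilt by
// replaying moves from the root down to the node being visited.
struct Node {
    float   score;        // accumulated reward
    int32_t visits;
    int32_t firstChild;   // index into the pool, -1 = not yet expanded
    int16_t numChildren;
    int8_t  move;         // the move that led here
};

constexpr int POOL_SIZE = 5'000'000;
static Node pool[POOL_SIZE];  // memory reserved up front; zero-filled by the
                              // loader, so nothing is constructed at startup
static int  poolTop = 0;

// Cheap lazy initialization: bump a counter and fill a few fields.
int allocNode(int move) {
    int idx = poolTop++;
    pool[idx] = Node{0.0f, 0, -1, 0, (int8_t)move};
    return idx;
}
```

Keeping the node around 16 bytes means millions of them fit comfortably in memory, and the bump-allocator makes node creation essentially free.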
Ok, so the idea would be to recompute each Game state from the current state when visiting a node, instead of storing computed Game states for reuse. I'll try that, thank you.
For the second point, I confirm that's what I'm doing. In Java, if I'm not wrong,
Node[] nodes = new Node[5_000_000];
will only allocate memory for 5M references, which should be fairly cheap.
What I thought was expensive was creating the actual node with its constructor:
Node() {
//set default values
}
That's why I tried to create the nodes on the first turn.
And from what Magus said, he seems to do that as well?
(Btw, I'm familiar with Java but a total beginner in C++)
If I remember correctly, you get the opponent's last move as an input. If the last move on the first turn is -1, it means you are the first player; otherwise you are the second.
Sometimes this is a programmer's first experience with using Bitboards and MCTS. To this end I created a small example of how to implement Bitboards for plain TicTacToe. Instead of MCTS, this example has just Monte Carlo (also called MC, flat MC, or 1-ply MC): the original, and a version updated with feedback from darkhorse64.
The example shows how to use bitboards effectively to get 500,000 simulations within 99 milliseconds, and how something as simple as plain Monte Carlo can achieve respectable results. I know of 3 CGers (LazyMammal, BugKiller_328, and KP56) who used bitboards and plain MC to reach Gold, Gold, and Silver respectively.
Plain MC is very straightforward to implement. You effectively just play random games from each starting move and track which one resulted in the most wins. It can also be largely reused for MCTS when you are ready.
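The flat MC loop described above fits in a few dozen lines. Here is a minimal, illustrative sketch for plain tic-tac-toe bitboards (this is not the linked snippet, just the same idea: for each legal move, play random games to the end and keep the move with the best average score, where win = 1, draw = 0.5, loss = 0):

```cpp
#include <cstdint>
#include <random>

// Does bitboard b contain a completed line?
bool lineDone(uint32_t b) {
    static const uint32_t L[8] = {0b000000111u, 0b000111000u, 0b111000000u,
                                  0b001001001u, 0b010010010u, 0b100100100u,
                                  0b100010001u, 0b001010100u};
    for (uint32_t l : L) if ((b & l) == l) return true;
    return false;
}

// Play uniformly random moves to the end; score from `me`'s perspective.
double playout(uint32_t me, uint32_t opp, bool meToMove, std::mt19937& rng) {
    while (true) {
        if (lineDone(me))  return 1.0;
        if (lineDone(opp)) return 0.0;
        uint32_t empty = ~(me | opp) & 0x1FFu;
        if (!empty) return 0.5;                      // board full: draw
        int k = rng() % __builtin_popcount(empty);
        while (k--) empty &= empty - 1;              // drop k lowest set bits
        uint32_t mv = empty & (0u - empty);          // k-th empty square
        if (meToMove) me |= mv; else opp |= mv;
        meToMove = !meToMove;
    }
}

// Flat MC: try every legal move, run `games` playouts each, keep the best.
int bestMoveFlatMC(uint32_t me, uint32_t opp, int games, std::mt19937& rng) {
    int best = -1; double bestScore = -1.0;
    for (int sq = 0; sq < 9; ++sq) {
        if ((me | opp) & (1u << sq)) continue;
        double total = 0.0;
        for (int g = 0; g < games; ++g)
            total += playout(me | (1u << sq), opp, false, rng);
        if (total > bestScore) { bestScore = total; best = sq; }
    }
    return best;
}
```

The per-move statistics (move, total score, playout count) are exactly the child statistics an MCTS root keeps, which is why this code carries over almost unchanged once you add a tree.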
@MSmits and many other CG regulars also helped and shared code for how to calculate all 8 win conditions in 3 operations using C++ intrinsics. I've added the code to a tech.io snippet that you can play with here. This is only slightly faster than having the 8 masks in an array and iterating over them, but every nanosecond counts in UTTT!
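The exact 3-operation intrinsic version lives in the linked snippet; as a point of comparison, here is a portable, branch-light check that folds all 8 lines with shifts instead of iterating over an array of masks (an illustrative sketch, assuming the usual 9-bit layout with bit 0 = top-left, row-major):

```cpp
#include <cstdint>

// Win check via shift folding: bit i of (b & b>>1 & b>>2) is set exactly
// when squares i, i+1, i+2 are all occupied, and similarly for the other
// strides; the masks keep only the bits where a real line starts.
bool isWon(uint32_t b) {
    uint32_t r = b & (b >> 1) & (b >> 2) & 0b001001001u;   // rows (stride 1)
    uint32_t c = b & (b >> 3) & (b >> 6) & 0b000000111u;   // columns (stride 3)
    uint32_t d = (b & (b >> 4) & (b >> 8) & 0b001u)        // diagonal 0-4-8
               | (b & (b >> 2) & (b >> 4) & 0b100u);       // diagonal 2-4-6
    return (r | c | d) != 0;
}
```

Whether this beats the mask loop depends on the compiler and how well it unrolls; as noted above, the differences are small, but they add up over millions of playouts.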
I've tried that, but I've noticed the processing is far slower during the opponent's turn. I guess the CodinGame simulator limits CPU resources during that time. In the end, this method gives a speedup of around 20% instead of the 100% I expected.
edit: I used just one thread to keep the CPU busy while waiting for Console.ReadLine().
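For anyone curious what "pondering" during the opponent's turn looks like structurally, here is a hedged C++ sketch (names invented; doOneIteration() stands in for a real MCTS step): a background thread keeps iterating until the main thread, which is blocked on reading input, receives the opponent's move and raises a flag. Whether this pays off depends on how the platform throttles CPU between turns, as noted above.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> stopPondering{false};
std::atomic<long> iterations{0};

// Background worker: spin MCTS iterations until told to stop.
void ponder() {
    while (!stopPondering.load(std::memory_order_relaxed)) {
        // doOneIteration();   // a real bot would grow its tree here
        iterations.fetch_add(1, std::memory_order_relaxed);
    }
}

// Demo of the lifecycle without stdin: in a real bot you would start the
// thread before blocking on input and stop it right after input arrives.
long ponderForMs(int ms) {
    stopPondering = false;
    iterations = 0;
    std::thread t(ponder);
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    stopPondering = true;   // in a real bot: set after the read returns
    t.join();
    return iterations.load();
}
```

The tree built while pondering is reused on your own turn by descending into the child matching the opponent's actual move.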