Ultimate Tic Tac Toe [Puzzle discussion]

Not that I know of

Hi, Iā€™ll really appreciate if someone could give me an advise, because Iā€™m completely out of ideas.
Iā€™m using MCTS, currently Iā€™m in the top 50 in Gold league, according to the forum Iā€™ve done whatā€™s necessary to beat the MightyCarlo boss and some how I donā€™t.

Iā€™ve managed to perform at least 30k MCTS iterations on the second turn.
At each iteration:

  1. I select a leave node of the tree, recursively, beginning at the current root:
    • At each depth the child node with the best UCT is selected.
    • For each node the UCT is calculated:
      (this->winScore / this->visits) + (sqrtf(2.f) * sqrtf(logf(parentVisits) / this->visits))
    • If there are not visited children, a random one from them is selected instead
  2. If the selected node holds a board, which is in progress the node is expanded(children nodes with the available moves are created) and the selected node becomes one(chosen at random) of the newly created children.
  3. The board of the selected node is simulated until an end of the game is reached. If there are equivalent mini boards won by each player DRAW is assumed
  4. Depending on, which player is to move on the selected nodeā€™s board the result is back propagated (until the root is reached):
    • If the ā€œowner playerā€ of the node is the victorious player from the simulation: node->winSocre += 1.f
    • If the ā€œowner playerā€ of the node is the defeated player from the simulation: node->winSocre += -1.f
    • In case of a DRAW node->winSocre += 0.5f
    • Increment the visits count of the node: ++node->visits

At the end of the search I choose to play the move of the child node(of the current root) with the best winScore

If Iā€™m first to move I perform the MCTS for the given time, but always play on (4,4). The tree is preserved between the turns only the root is updated based on the opponent move.

With this logic Iā€™ve managed to beat MightyCarlo around 70% of the games if Iā€™m first, but if Iā€™m second only around 10% of the games. Does this sound normal, I thought in this game the first and the second players have almost equal chances of winning!?

Does my MCTS implementation sounds logical to you?
Iā€™ve read some papers on the subject and still cannot find my mistake, maybe it could be outside of the MCTSā€¦

Hi, I am definitely not an authoritive expert on MCTS, so hopefully you get more helpful advice from others. I donā€™t even have 30k rollouts and I am 200 places behind you in Gold.

But I implemented the very same vanilla MCTS (in PHP, just to feel the speed-induced adrenaline).
What I notice as a difference from your description, that in point 4 I propagated back only my wins ( +=1.f) and not the draws and losses. But I donā€™t know anymore what would be the correct version, maybe it was my misunderstanding of the algorithm. It was almost a year agoā€¦ I remember I used this article for description of MCTS (it is for simple tic-tac-toe and example code snippets are in Java but that does not really matter.)

3 Likes

Thanks, I tried your way for the back propagation, unfortunately there isnā€™t an improvement.

I used the same article for my base code :slight_smile: After that I found this paper: https://www.researchgate.net/publication/235985858_A_Survey_of_Monte_Carlo_Tree_Search_Methods
which explains the theory with many details.

1 Like

I have been in the same situation for some time, stuck somewhere in the top 100 Gold with ~25k rollouts.
Did you handle all draws correctly? Do not forget to count the small boards even if no player has wonā€¦

Is the strength of the gold Boss static or dynamic? If the latter, I would assume with this puzzle beeing around for some time, that it just grew stronger because more and better players are ranked. maybe now you need 50k rollouts? Or at least some more static evaluation rules? Maybe an opening book for the first few rounds?

I am not sure, currently trying to port my bot from Go to Rust, just for the sake of learning another language and maybe better performance because Rust has no garbage collector and a better compilerā€¦

1 Like

You should backpropagate 0 as draw score. Either 0,0.5,1 or -1,0,1 scoring are OK but you should modify accordingly your exploration parameter.
P1 has a 60% winrate for UTTT so chances are not equal.
Your P1 winrate looks OK so I guess your MCTS is working but your winrate as P2 is very low. I suspect you may have a bug when backpropagating as P2 or when computing the result of your rollout. Not switching properly players is quite a common mistake.

1 Like

How exactly to count the non-won boards, isnā€™t the draw resolved by counting just the won boards.
I assume a draw when the won boards are equal for each player.

@darkhorse64 I just tried to reward a DRAW with 0 (do not change the winScore for the nodes) and play with various values for the exploration parameter, but I donā€™t see an improvement.
I think you are right, I definitely have a bug in the back propagation step, looking closer I see that I get very very low values for the winScore for the children nodes of the second player (no matter if Iā€™m first or second), which seems not logical at all:


The first 4 turns, I go first.
Showing the rootā€™s children stats before and after updating the root for the turn.
Each child is labelled [nodeIdx (move coordinates)]

I think this explains why I loose so often if Iā€™m second, the algorithm converges with wrong winScore for the second player, now I have to find why this happens.

Yeah, sorry , bad wording by me. That ist what i meant: You cannot assume a Draw automatically on no-moves-left but no ein on the big Boardā€¦

1 Like

I found this thread on SO:

There they are suggesting that in the selection step when using the UCT function for selecting opponent moves the term this->winScore / this->visits should be multiplied by -1.f, but I havenā€™t seen this strategy anywhere else.

Isnā€™t the idea that the opponent also must chooses the best moves for him!?

I didnā€™t see exactly where that value gets multiplied, but I didnā€™t read all that closely. Multiplying values by -1.f sounds a lot like the ā€œnegamaxā€ formulation of tree search, where both sides try to maximize their value but the evaluated move values flip sign at every level.

So if youā€™re player two, and Iā€™m asking you ā€œwhat are your values for the following board states?ā€ You might say ā€œ[4,5,1,-6,2]ā€, and Iā€™ll negate those, and take the maximal move from the values ā€œ[-4, -5, -1, 6, 2]ā€ and report the value for this position as 6.

Personally, I find that confusing and prefer a minimax formulation that always evaluates a board as ā€œhow good is this position for player 1?ā€

1 Like

I was struggling the same way as you for a long time and given up at some point (someone pushed me at the endā€¦). Although, it really sound like a P2 issue in your case.
When I create child nodes, I also execute the command for the current player and eval. If there is a winning move I just ā€œtrimā€ all other children and leave just the one and no rollout is performed. This greatly reduces CPU performance, of course, but ensures that if there is a winning move, it shall be played.

2 Likes

Thanks, Iā€™ve implemented your idea and Iā€™ve started to win some more games when Iā€™m second versus the Boss, again itā€™s not enough to beat him and move to Legend, but at least Iā€™ve gained some confidence that my algorithm is not so bad and moved up around 20 places in the Gold league.

At some point I was also thinking that the winning move should be taken in account with greater weight, but never thought to drop the other children.

Droping children is always the best, Di!

Jokes aside, this patch brings a lot more determinism to the general algo, would you agree? I now start to wonder if it is the best for the whole playthrough or just the end game. Youā€™ve probably noticed that algo favors quads that are not played before, because it has freedome, and suddenly all hell breaks loose. What I mean is that it might be bringing some unnecessary calculations in the beginning.

Please, do tell if you play around some more, but also what kind of a iteration drop this produced. Iā€™m really lazy and annoyed (everyone saying you just need 25k to go legendā€¦ I do <10k, so thatā€™s just total K-rap) to do so any moreā€¦ :blush:

1 Like

For my part, I drop children VERY aggressively in my Rust solution, but some combination of slowness and bugs is keeping me fairly low in Gold. Iā€™m hoping to put some good work in on this in the next few days though :slight_smile:

1 Like

The main thing I observe is that the games against the boss are definitely longer than before, which I suppose, tells that the boss finds the victory with little more difficulty.

Iā€™ve tried to gather some stats, but each game is different even if I launch the game in the same conditions. Somehow my iterations amount is not very consistent, I always give 98ms per turn (after the first) for the MCTS, but for the second turn I get between 29K and 41K iterations (which is another question!?):

Edit: wrong header

Iā€™ve tried to implement the check for winning move in the playouts, but the performance(and the efficiency) dropped above 5 times.

Is there a data formatting problem here? Iā€™m not clear on why mcts iterations/tree node counts are appearing in both of these super-columns, and also the sub-columns in the fourth header row.

Are there really 4 games here, and could you collect more, and possibly including iteration/node count for all of them?

Playing a large number of games between the 2 copies could be interesting. If the first two columns are separate games with MCTS iterations in both, it looks like the win check is slightly faster if anything until turn idx 15, which is very close to end of game for game II but much further for Game I. I find in my AI that iteration count rises dramatically in the final turns, so this variation in speed isnā€™t conclusive to me that the win check makes your code faster, but thatā€™s my initial take-away from the provided data that I might be misinterpreting.

1 Like

I just took a look at an example game for my bot (370th in gold league, so not great!), and on turn 2 Iā€™m apparently only getting 1358 rollouts (I thought this was MUCH higher!), on my last turn I get 175k rollouts, and 5 turns before the end I get 5192 rollouts.

1 Like

There are 3 types of CPU with different frequencies available for running the code, so you randomly get 2.2GHz, 2.4GHz or 3.0GHz CPU. I guess you could append this info (separated with space) to the output to make online testing more consistent.

1 Like

Yeap, Iā€™m sorry, the 4th row of the header is redundant. (the image is updated)

Yes, I ran 4 separate games to collect this data, I tried to choose similar ones but still they quite differ.

You got that right, the check enhances the game after some turns, at the beginning it seems that not child with winning move is reached.

Iā€™ve found a bug in my code. I tried to be clever and compute the UCT in my back propagation step, not in the selection step, this way the performance rises to the top, but it isnā€™t quite right, because the childrenā€™s uct wonā€™t be updated with the parentā€™s visits and a child could be ignored in the selection. So Iā€™ve played quite often with ignoring a root child.

Fixing that bug dropped some iterations but placed me 7th in the Gold league (almost there, man I dreamed about Xs and Os last night)

That explains a lot, thanks, I didnā€™t know that. Iā€™ll try to collect more consistent statistics.