Ultimate Tic Tac Toe [Puzzle discussion]

darkhorse64 · January 12, 2020, 1:22pm

Not that I know of

Di_Masta · August 24, 2020, 9:24pm

Hi, I’ll really appreciate if someone could give me an advise, because I’m completely out of ideas.
I’m using MCTS, currently I’m in the top 50 in Gold league, according to the forum I’ve done what’s necessary to beat the MightyCarlo boss and some how I don’t.

I’ve managed to perform at least 30k MCTS iterations on the second turn.
At each iteration:

I select a leave node of the tree, recursively, beginning at the current root:
- At each depth the child node with the best UCT is selected.
- For each node the UCT is calculated:
  (this->winScore / this->visits) + (sqrtf(2.f) * sqrtf(logf(parentVisits) / this->visits))
- If there are not visited children, a random one from them is selected instead
If the selected node holds a board, which is in progress the node is expanded(children nodes with the available moves are created) and the selected node becomes one(chosen at random) of the newly created children.
The board of the selected node is simulated until an end of the game is reached. If there are equivalent mini boards won by each player DRAW is assumed
Depending on, which player is to move on the selected node’s board the result is back propagated (until the root is reached):
- If the “owner player” of the node is the victorious player from the simulation: node->winSocre += 1.f
- If the “owner player” of the node is the defeated player from the simulation: node->winSocre += -1.f
- In case of a DRAW node->winSocre += 0.5f
- Increment the visits count of the node: ++node->visits

At the end of the search I choose to play the move of the child node(of the current root) with the best winScore

If I’m first to move I perform the MCTS for the given time, but always play on (4,4). The tree is preserved between the turns only the root is updated based on the opponent move.

With this logic I’ve managed to beat MightyCarlo around 70% of the games if I’m first, but if I’m second only around 10% of the games. Does this sound normal, I thought in this game the first and the second players have almost equal chances of winning!?

Does my MCTS implementation sounds logical to you?
I’ve read some papers on the subject and still cannot find my mistake, maybe it could be outside of the MCTS…

TBali · August 24, 2020, 9:53pm

Hi, I am definitely not an authoritive expert on MCTS, so hopefully you get more helpful advice from others. I don’t even have 30k rollouts and I am 200 places behind you in Gold.

But I implemented the very same vanilla MCTS (in PHP, just to feel the speed-induced adrenaline).
What I notice as a difference from your description, that in point 4 I propagated back only my wins ( +=1.f) and not the draws and losses. But I don’t know anymore what would be the correct version, maybe it was my misunderstanding of the algorithm. It was almost a year ago… I remember I used this article for description of MCTS (it is for simple tic-tac-toe and example code snippets are in Java but that does not really matter.)

Di_Masta · August 25, 2020, 6:40am

Thanks, I tried your way for the back propagation, unfortunately there isn’t an improvement.

I used the same article for my base code After that I found this paper: https://www.researchgate.net/publication/235985858_A_Survey_of_Monte_Carlo_Tree_Search_Methods
which explains the theory with many details.

Nazdhun · August 25, 2020, 2:40pm

I have been in the same situation for some time, stuck somewhere in the top 100 Gold with ~25k rollouts.
Did you handle all draws correctly? Do not forget to count the small boards even if no player has won…

Is the strength of the gold Boss static or dynamic? If the latter, I would assume with this puzzle beeing around for some time, that it just grew stronger because more and better players are ranked. maybe now you need 50k rollouts? Or at least some more static evaluation rules? Maybe an opening book for the first few rounds?

I am not sure, currently trying to port my bot from Go to Rust, just for the sake of learning another language and maybe better performance because Rust has no garbage collector and a better compiler…

darkhorse64 · August 25, 2020, 2:43pm

You should backpropagate 0 as draw score. Either 0,0.5,1 or -1,0,1 scoring are OK but you should modify accordingly your exploration parameter.
P1 has a 60% winrate for UTTT so chances are not equal.
Your P1 winrate looks OK so I guess your MCTS is working but your winrate as P2 is very low. I suspect you may have a bug when backpropagating as P2 or when computing the result of your rollout. Not switching properly players is quite a common mistake.

Di_Masta · August 25, 2020, 8:09pm

How exactly to count the non-won boards, isn’t the draw resolved by counting just the won boards.
I assume a draw when the won boards are equal for each player.

@darkhorse64 I just tried to reward a DRAW with 0 (do not change the winScore for the nodes) and play with various values for the exploration parameter, but I don’t see an improvement.
I think you are right, I definitely have a bug in the back propagation step, looking closer I see that I get very very low values for the winScore for the children nodes of the second player (no matter if I’m first or second), which seems not logical at all:

The first 4 turns, I go first.
Showing the root’s children stats before and after updating the root for the turn.
Each child is labelled [nodeIdx (move coordinates)]

I think this explains why I loose so often if I’m second, the algorithm converges with wrong winScore for the second player, now I have to find why this happens.

Nazdhun · August 25, 2020, 8:30pm

Yeah, sorry , bad wording by me. That ist what i meant: You cannot assume a Draw automatically on no-moves-left but no ein on the big Board…

Di_Masta · August 28, 2020, 2:16pm

I found this thread on SO:

There they are suggesting that in the selection step when using the UCT function for selecting opponent moves the term this->winScore / this->visits should be multiplied by -1.f, but I haven’t seen this strategy anywhere else.

Isn’t the idea that the opponent also must chooses the best moves for him!?

Bobbadillio · August 28, 2020, 4:04pm

I didn’t see exactly where that value gets multiplied, but I didn’t read all that closely. Multiplying values by -1.f sounds a lot like the “negamax” formulation of tree search, where both sides try to maximize their value but the evaluated move values flip sign at every level.

So if you’re player two, and I’m asking you “what are your values for the following board states?” You might say “[4,5,1,-6,2]”, and I’ll negate those, and take the maximal move from the values “[-4, -5, -1, 6, 2]” and report the value for this position as 6.

Personally, I find that confusing and prefer a minimax formulation that always evaluates a board as “how good is this position for player 1?”

accorp · August 30, 2020, 2:31am

I was struggling the same way as you for a long time and given up at some point (someone pushed me at the end…). Although, it really sound like a P2 issue in your case.
When I create child nodes, I also execute the command for the current player and eval. If there is a winning move I just “trim” all other children and leave just the one and no rollout is performed. This greatly reduces CPU performance, of course, but ensures that if there is a winning move, it shall be played.

Di_Masta · August 31, 2020, 6:33am

Thanks, I’ve implemented your idea and I’ve started to win some more games when I’m second versus the Boss, again it’s not enough to beat him and move to Legend, but at least I’ve gained some confidence that my algorithm is not so bad and moved up around 20 places in the Gold league.

At some point I was also thinking that the winning move should be taken in account with greater weight, but never thought to drop the other children.

accorp · August 31, 2020, 8:36pm

Droping children is always the best, Di!

Jokes aside, this patch brings a lot more determinism to the general algo, would you agree? I now start to wonder if it is the best for the whole playthrough or just the end game. You’ve probably noticed that algo favors quads that are not played before, because it has freedome, and suddenly all hell breaks loose. What I mean is that it might be bringing some unnecessary calculations in the beginning.

Please, do tell if you play around some more, but also what kind of a iteration drop this produced. I’m really lazy and annoyed (everyone saying you just need 25k to go legend… I do <10k, so that’s just total K-rap) to do so any more…

Bobbadillio · September 1, 2020, 5:04pm

For my part, I drop children VERY aggressively in my Rust solution, but some combination of slowness and bugs is keeping me fairly low in Gold. I’m hoping to put some good work in on this in the next few days though

Di_Masta · September 1, 2020, 9:20pm

The main thing I observe is that the games against the boss are definitely longer than before, which I suppose, tells that the boss finds the victory with little more difficulty.

I’ve tried to gather some stats, but each game is different even if I launch the game in the same conditions. Somehow my iterations amount is not very consistent, I always give 98ms per turn (after the first) for the MCTS, but for the second turn I get between 29K and 41K iterations (which is another question!?):

Edit: wrong header

I’ve tried to implement the check for winning move in the playouts, but the performance(and the efficiency) dropped above 5 times.

Bobbadillio · September 1, 2020, 10:11pm

Is there a data formatting problem here? I’m not clear on why mcts iterations/tree node counts are appearing in both of these super-columns, and also the sub-columns in the fourth header row.

Are there really 4 games here, and could you collect more, and possibly including iteration/node count for all of them?

Playing a large number of games between the 2 copies could be interesting. If the first two columns are separate games with MCTS iterations in both, it looks like the win check is slightly faster if anything until turn idx 15, which is very close to end of game for game II but much further for Game I. I find in my AI that iteration count rises dramatically in the final turns, so this variation in speed isn’t conclusive to me that the win check makes your code faster, but that’s my initial take-away from the provided data that I might be misinterpreting.

Bobbadillio · September 1, 2020, 10:15pm

I just took a look at an example game for my bot (370th in gold league, so not great!), and on turn 2 I’m apparently only getting 1358 rollouts (I thought this was MUCH higher!), on my last turn I get 175k rollouts, and 5 turns before the end I get 5192 rollouts.

Uljahn · September 2, 2020, 6:40am

There are 3 types of CPU with different frequencies available for running the code, so you randomly get 2.2GHz, 2.4GHz or 3.0GHz CPU. I guess you could append this info (separated with space) to the output to make online testing more consistent.

Di_Masta · September 2, 2020, 7:01am

Yeap, I’m sorry, the 4th row of the header is redundant. (the image is updated)

Yes, I ran 4 separate games to collect this data, I tried to choose similar ones but still they quite differ.

You got that right, the check enhances the game after some turns, at the beginning it seems that not child with winning move is reached.

I’ve found a bug in my code. I tried to be clever and compute the UCT in my back propagation step, not in the selection step, this way the performance rises to the top, but it isn’t quite right, because the children’s uct won’t be updated with the parent’s visits and a child could be ignored in the selection. So I’ve played quite often with ignoring a root child.

Fixing that bug dropped some iterations but placed me 7th in the Gold league (almost there, man I dreamed about Xs and Os last night)

Di_Masta · September 2, 2020, 7:06am

That explains a lot, thanks, I didn’t know that. I’ll try to collect more consistent statistics.