Not that I know of
Hi, Iāll really appreciate if someone could give me an advise, because Iām completely out of ideas.
Iām using MCTS, currently Iām in the top 50 in Gold league, according to the forum Iāve done whatās necessary to beat the MightyCarlo boss and some how I donāt.
Iāve managed to perform at least 30k MCTS iterations on the second turn.
At each iteration:
- I select a leave node of the tree, recursively, beginning at the current root:
- At each depth the child node with the best UCT is selected.
- For each node the UCT is calculated:
(this->winScore / this->visits) + (sqrtf(2.f) * sqrtf(logf(parentVisits) / this->visits)) - If there are not visited children, a random one from them is selected instead
- If the selected node holds a board, which is in progress the node is expanded(children nodes with the available moves are created) and the selected node becomes one(chosen at random) of the newly created children.
- The board of the selected node is simulated until an end of the game is reached. If there are equivalent mini boards won by each player DRAW is assumed
- Depending on, which player is to move on the selected nodeās board the result is back propagated (until the root is reached):
- If the āowner playerā of the node is the victorious player from the simulation: node->winSocre += 1.f
- If the āowner playerā of the node is the defeated player from the simulation: node->winSocre += -1.f
- In case of a DRAW node->winSocre += 0.5f
- Increment the visits count of the node: ++node->visits
At the end of the search I choose to play the move of the child node(of the current root) with the best winScore
If Iām first to move I perform the MCTS for the given time, but always play on (4,4). The tree is preserved between the turns only the root is updated based on the opponent move.
With this logic Iāve managed to beat MightyCarlo around 70% of the games if Iām first, but if Iām second only around 10% of the games. Does this sound normal, I thought in this game the first and the second players have almost equal chances of winning!?
Does my MCTS implementation sounds logical to you?
Iāve read some papers on the subject and still cannot find my mistake, maybe it could be outside of the MCTSā¦
Hi, I am definitely not an authoritive expert on MCTS, so hopefully you get more helpful advice from others. I donāt even have 30k rollouts and I am 200 places behind you in Gold.
But I implemented the very same vanilla MCTS (in PHP, just to feel the speed-induced adrenaline).
What I notice as a difference from your description, that in point 4 I propagated back only my wins ( +=1.f) and not the draws and losses. But I donāt know anymore what would be the correct version, maybe it was my misunderstanding of the algorithm. It was almost a year agoā¦ I remember I used this article for description of MCTS (it is for simple tic-tac-toe and example code snippets are in Java but that does not really matter.)
Thanks, I tried your way for the back propagation, unfortunately there isnāt an improvement.
I used the same article for my base code After that I found this paper: https://www.researchgate.net/publication/235985858_A_Survey_of_Monte_Carlo_Tree_Search_Methods
which explains the theory with many details.
I have been in the same situation for some time, stuck somewhere in the top 100 Gold with ~25k rollouts.
Did you handle all draws correctly? Do not forget to count the small boards even if no player has wonā¦
Is the strength of the gold Boss static or dynamic? If the latter, I would assume with this puzzle beeing around for some time, that it just grew stronger because more and better players are ranked. maybe now you need 50k rollouts? Or at least some more static evaluation rules? Maybe an opening book for the first few rounds?
I am not sure, currently trying to port my bot from Go to Rust, just for the sake of learning another language and maybe better performance because Rust has no garbage collector and a better compilerā¦
You should backpropagate 0 as draw score. Either 0,0.5,1 or -1,0,1 scoring are OK but you should modify accordingly your exploration parameter.
P1 has a 60% winrate for UTTT so chances are not equal.
Your P1 winrate looks OK so I guess your MCTS is working but your winrate as P2 is very low. I suspect you may have a bug when backpropagating as P2 or when computing the result of your rollout. Not switching properly players is quite a common mistake.
How exactly to count the non-won boards, isnāt the draw resolved by counting just the won boards.
I assume a draw when the won boards are equal for each player.
@darkhorse64 I just tried to reward a DRAW with 0 (do not change the winScore for the nodes) and play with various values for the exploration parameter, but I donāt see an improvement.
I think you are right, I definitely have a bug in the back propagation step, looking closer I see that I get very very low values for the winScore for the children nodes of the second player (no matter if Iām first or second), which seems not logical at all:
The first 4 turns, I go first.
Showing the rootās children stats before and after updating the root for the turn.
Each child is labelled [nodeIdx (move coordinates)]
I think this explains why I loose so often if Iām second, the algorithm converges with wrong winScore for the second player, now I have to find why this happens.
Yeah, sorry , bad wording by me. That ist what i meant: You cannot assume a Draw automatically on no-moves-left but no ein on the big Boardā¦
I found this thread on SO:
There they are suggesting that in the selection step when using the UCT function for selecting opponent moves the term this->winScore / this->visits should be multiplied by -1.f, but I havenāt seen this strategy anywhere else.
Isnāt the idea that the opponent also must chooses the best moves for him!?
I didnāt see exactly where that value gets multiplied, but I didnāt read all that closely. Multiplying values by -1.f sounds a lot like the ānegamaxā formulation of tree search, where both sides try to maximize their value but the evaluated move values flip sign at every level.
So if youāre player two, and Iām asking you āwhat are your values for the following board states?ā You might say ā[4,5,1,-6,2]ā, and Iāll negate those, and take the maximal move from the values ā[-4, -5, -1, 6, 2]ā and report the value for this position as 6.
Personally, I find that confusing and prefer a minimax formulation that always evaluates a board as āhow good is this position for player 1?ā
I was struggling the same way as you for a long time and given up at some point (someone pushed me at the endā¦). Although, it really sound like a P2 issue in your case.
When I create child nodes, I also execute the command for the current player and eval. If there is a winning move I just ātrimā all other children and leave just the one and no rollout is performed. This greatly reduces CPU performance, of course, but ensures that if there is a winning move, it shall be played.
Thanks, Iāve implemented your idea and Iāve started to win some more games when Iām second versus the Boss, again itās not enough to beat him and move to Legend, but at least Iāve gained some confidence that my algorithm is not so bad and moved up around 20 places in the Gold league.
At some point I was also thinking that the winning move should be taken in account with greater weight, but never thought to drop the other children.
Droping children is always the best, Di!
Jokes aside, this patch brings a lot more determinism to the general algo, would you agree? I now start to wonder if it is the best for the whole playthrough or just the end game. Youāve probably noticed that algo favors quads that are not played before, because it has freedome, and suddenly all hell breaks loose. What I mean is that it might be bringing some unnecessary calculations in the beginning.
Please, do tell if you play around some more, but also what kind of a iteration drop this produced. Iām really lazy and annoyed (everyone saying you just need 25k to go legendā¦ I do <10k, so thatās just total K-rap) to do so any moreā¦
For my part, I drop children VERY aggressively in my Rust solution, but some combination of slowness and bugs is keeping me fairly low in Gold. Iām hoping to put some good work in on this in the next few days though
The main thing I observe is that the games against the boss are definitely longer than before, which I suppose, tells that the boss finds the victory with little more difficulty.
Iāve tried to gather some stats, but each game is different even if I launch the game in the same conditions. Somehow my iterations amount is not very consistent, I always give 98ms per turn (after the first) for the MCTS, but for the second turn I get between 29K and 41K iterations (which is another question!?):
Edit: wrong header
Iāve tried to implement the check for winning move in the playouts, but the performance(and the efficiency) dropped above 5 times.
Is there a data formatting problem here? Iām not clear on why mcts iterations/tree node counts are appearing in both of these super-columns, and also the sub-columns in the fourth header row.
Are there really 4 games here, and could you collect more, and possibly including iteration/node count for all of them?
Playing a large number of games between the 2 copies could be interesting. If the first two columns are separate games with MCTS iterations in both, it looks like the win check is slightly faster if anything until turn idx 15, which is very close to end of game for game II but much further for Game I. I find in my AI that iteration count rises dramatically in the final turns, so this variation in speed isnāt conclusive to me that the win check makes your code faster, but thatās my initial take-away from the provided data that I might be misinterpreting.
I just took a look at an example game for my bot (370th in gold league, so not great!), and on turn 2 Iām apparently only getting 1358 rollouts (I thought this was MUCH higher!), on my last turn I get 175k rollouts, and 5 turns before the end I get 5192 rollouts.
There are 3 types of CPU with different frequencies available for running the code, so you randomly get 2.2GHz, 2.4GHz or 3.0GHz CPU. I guess you could append this info (separated with space) to the output to make online testing more consistent.
Yeap, Iām sorry, the 4th row of the header is redundant. (the image is updated)
Yes, I ran 4 separate games to collect this data, I tried to choose similar ones but still they quite differ.
You got that right, the check enhances the game after some turns, at the beginning it seems that not child with winning move is reached.
Iāve found a bug in my code. I tried to be clever and compute the UCT in my back propagation step, not in the selection step, this way the performance rises to the top, but it isnāt quite right, because the childrenās uct wonāt be updated with the parentās visits and a child could be ignored in the selection. So Iāve played quite often with ignoring a root child.
Fixing that bug dropped some iterations but placed me 7th in the Gold league (almost there, man I dreamed about Xs and Os last night)
That explains a lot, thanks, I didnāt know that. Iāll try to collect more consistent statistics.