Yeees, I did it
That was quite a challenge!!! After fixing all the bugs I could find, I reached 7th in the gold league. Then I decided to play with the exploration parameter; sqrt(2.0) - 1.0 looked like it worked best. Submit -> 1st in gold league. For a couple of days I could not think of anything else, then I noticed that almost none of my battles ended in a draw. I updated my algorithm with a 0.5 reward for a draw (0 for a loss, 1 for a win). Submit -> again 1st in gold..., but with fewer losses and more draws. After playing for several hours with the exploration parameter, a value of 0.5 really balanced the search. Submit -> 80th in Legend!!! And I felt really, really happy.
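For readers who haven't seen it spelled out: the exploration parameter and the draw reward both live in the UCT selection formula. Here is a minimal sketch of that formula with the 0/0.5/1 reward scheme described above (names are illustrative, not the poster's actual code):

```cpp
#include <cmath>

// Per-child statistics: wins accumulates 1.0 per win and 0.5 per draw,
// so the mean reward stays in [0, 1].
struct Stats {
    double wins = 0.0;
    int    visits = 0;
};

// UCT value of a child; C is the exploration parameter being tuned
// in the post above (e.g. sqrt(2.0) - 1.0, or 0.5).
double uct(const Stats& child, int parentVisits, double C) {
    if (child.visits == 0) return 1e9;           // always try unvisited children first
    double exploit = child.wins / child.visits;  // mean reward
    double explore = C * std::sqrt(std::log((double)parentVisits) / child.visits);
    return exploit + explore;
}
```

A larger C makes the search spread out over more moves; a smaller C makes it commit harder to the current best, which is why the value interacts so strongly with the reward scale.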
So in the end it turned out, as @darkhorse64 said, that the exploration parameter must be adjusted properly; the pruning proposed by @accorp also guided the search in the right direction.
From this point on, I think the main improvement for my strategy would be to optimize the C++ code (I have an idea of how to make it about 2 times faster, but maybe in another parallel universe) or to improve the selection phase (I read a ton of literature on the subject).
Thanks to everyone for the suggestions. I wanted to learn MCTS with this game and I think I did.
Great game, great platform, great community and it feels great to see the result of the efforts made!
Apart from optimising your code, there are a couple of simple and generic (I mean valid for any MCTS) things that you can do to improve your bot:
Smarter rollouts: if you have a winning move, play it. If you can prevent your opponent from playing a winning move, do that.
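The rollout policy above can be sketched for plain tic-tac-toe bitboards (9 bits per side; all names here are illustrative, not anyone's actual bot): take an immediate win if one exists, otherwise block the opponent's immediate win, otherwise play randomly.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Does bitboard b contain a completed line?
bool wins(uint32_t b) {
    static const uint32_t lines[8] = {
        0b000000111u, 0b000111000u, 0b111000000u,   // rows
        0b001001001u, 0b010010010u, 0b100100100u,   // columns
        0b100010001u, 0b001010100u                  // diagonals
    };
    for (uint32_t l : lines) if ((b & l) == l) return true;
    return false;
}

// Rollout move picker: win > block > random. Returns -1 if the board is full.
int pickRolloutMove(uint32_t me, uint32_t opp, std::mt19937& rng) {
    std::vector<int> moves;
    for (int sq = 0; sq < 9; ++sq)
        if (!((me | opp) & (1u << sq))) moves.push_back(sq);
    if (moves.empty()) return -1;
    for (int sq : moves)                      // 1. take a win if available
        if (wins(me | (1u << sq))) return sq;
    for (int sq : moves)                      // 2. otherwise block the opponent's win
        if (wins(opp | (1u << sq))) return sq;
    return moves[rng() % moves.size()];       // 3. otherwise play randomly
}
```

Even these two checks make rollout results much more informative than pure random play, at a modest cost per move.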
Solver: when a game nears the end, you reach terminal nodes while building the tree. If you have a win, it means the previous move was losing. If all possible moves lead to a loss, it means the previous move was winning. This way, you can back-propagate proven results and avoid re-exploring solved nodes. This is quite effective: usually you can predict the result of a game 10 or more moves before the end. There are two useful papers to read: "Monte-Carlo Tree Search Solver" and "Score Bounded Monte-Carlo Tree Search".
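The back-propagation rule for proven results can be sketched like this (a rough illustration of the idea in "Monte-Carlo Tree Search Solver", with made-up names, not a full MCTS). Proof values are from the point of view of the player to move at each node:

```cpp
#include <algorithm>
#include <vector>

enum Proof { UNPROVEN, PROVEN_WIN, PROVEN_LOSS };

struct SNode {
    Proof proof = UNPROVEN;
    std::vector<SNode*> children;
};

// Called after a child of `node` has (possibly) become proven.
void backpropProof(SNode* node) {
    // If any child is a proven loss for the player to move there,
    // the player at `node` can move into it and win.
    for (SNode* c : node->children)
        if (c->proof == PROVEN_LOSS) { node->proof = PROVEN_WIN; return; }
    // If every child is a proven win for the opponent, `node` is lost.
    bool allWin = !node->children.empty() &&
        std::all_of(node->children.begin(), node->children.end(),
                    [](SNode* c) { return c->proof == PROVEN_WIN; });
    if (allWin) node->proof = PROVEN_LOSS;
}
```

During selection, a proven node is never expanded again; its exact value is simply returned, which is what makes endgames so cheap.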
I got into the gold league with simple heuristics.
No simulations, no complex algorithms - just a scoring function for each valid action.
With defs and classes, the code is under 100 lines.
Magus, do you mind sharing a bit what your Node looks like?
I tried to create the same number of nodes, first in Java and then in C++, but I could not get past ~500k nodes on the first turn.
I think there are two major issues with your code:
Your node includes the game state. This takes way too much space. Your node should only contain the played move. Every time you enter a node, you should instead update a Game variable with the node's move. Replaying a game will not be less efficient than copying a game state.
I guess you are initializing your nodes at startup. Don't do that. Allocate your data and initialize a node only once you need it.
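Both suggestions can be illustrated together (a hedged sketch with invented field names, not anyone's actual bot): a compact node that stores only the move plus tree bookkeeping, taken from a preallocated pool where initialization happens on first use, not at startup.

```cpp
#include <cstdint>

// Compact MCTS node: no game state inside. The state is rebuilt by
// replaying moves from the root down to the node being visited.
struct Node {
    float   score;        // accumulated reward
    int32_t visits;
    int32_t firstChild;   // index into the pool, -1 = not yet expanded
    int16_t numChildren;
    int8_t  move;         // the move that led here
};

constexpr int POOL_SIZE = 5'000'000;
static Node pool[POOL_SIZE];  // memory reserved up front; zero-filled by the
                              // loader, so nothing is constructed at startup
static int  poolTop = 0;

// Cheap lazy initialization: bump a counter and fill a few fields.
int allocNode(int move) {
    int idx = poolTop++;
    pool[idx] = Node{0.0f, 0, -1, 0, (int8_t)move};
    return idx;
}
```

Keeping the node around 16 bytes means millions of them fit comfortably in memory, and the bump-allocator makes node creation essentially free.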
Ok, so the idea would be to recompute each Game state from the current state when visiting a node, instead of storing computed Game states for reuse. I'll try that, thank you.
For the second point, I confirm that's what I'm doing. In Java, if I'm not wrong,
Node[] nodes = new Node[5_000_000];
will only allocate memory for 5M references, which should be fairly cheap.
What I thought was expensive was creating the actual node with its constructor:
Node() {
//set default values
}
That's why I tried to create the nodes on the first turn.
And from what Magus said, he seems to do that as well?
(Btw, I'm familiar with Java but a total beginner in C++)
If I remember correctly, you get the opponent's last move as an input. If the last move on the first turn is -1, it means you are the first player; otherwise you are the second.
Sometimes this is a programmer's first experience with using Bitboards and MCTS. To this end I created a small example of how to implement Bitboards for plain TicTacToe. Instead of MCTS, this example has just Monte Carlo (also called MC, flat MC, or 1-ply MC): the original, and a version updated with feedback from darkhorse64.
The example shows how to use bitboards effectively to get 500,000 simulations within 99 milliseconds, and how something as simple as plain Monte Carlo can achieve respectable results. I know of 3 CGers (LazyMammal, BugKiller_328, and KP56) who used bitboards and plain MC to reach Gold, Gold, and Silver respectively.
Plain MC is very straightforward to implement. You effectively just play random games from each starting move and track which one resulted in the most wins. It can also be largely reused for MCTS when you are ready.
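The flat MC loop described above fits in a few dozen lines. Here is a minimal, illustrative sketch for plain tic-tac-toe bitboards (this is not the linked snippet, just the same idea: for each legal move, play random games to the end and keep the move with the best average score, where win = 1, draw = 0.5, loss = 0):

```cpp
#include <cstdint>
#include <random>

// Does bitboard b contain a completed line?
bool lineDone(uint32_t b) {
    static const uint32_t L[8] = {0b000000111u, 0b000111000u, 0b111000000u,
                                  0b001001001u, 0b010010010u, 0b100100100u,
                                  0b100010001u, 0b001010100u};
    for (uint32_t l : L) if ((b & l) == l) return true;
    return false;
}

// Play uniformly random moves to the end; score from `me`'s perspective.
double playout(uint32_t me, uint32_t opp, bool meToMove, std::mt19937& rng) {
    while (true) {
        if (lineDone(me))  return 1.0;
        if (lineDone(opp)) return 0.0;
        uint32_t empty = ~(me | opp) & 0x1FFu;
        if (!empty) return 0.5;                      // board full: draw
        int k = rng() % __builtin_popcount(empty);
        while (k--) empty &= empty - 1;              // drop k lowest set bits
        uint32_t mv = empty & (0u - empty);          // k-th empty square
        if (meToMove) me |= mv; else opp |= mv;
        meToMove = !meToMove;
    }
}

// Flat MC: try every legal move, run `games` playouts each, keep the best.
int bestMoveFlatMC(uint32_t me, uint32_t opp, int games, std::mt19937& rng) {
    int best = -1; double bestScore = -1.0;
    for (int sq = 0; sq < 9; ++sq) {
        if ((me | opp) & (1u << sq)) continue;
        double total = 0.0;
        for (int g = 0; g < games; ++g)
            total += playout(me | (1u << sq), opp, false, rng);
        if (total > bestScore) { bestScore = total; best = sq; }
    }
    return best;
}
```

The per-move statistics (move, total score, playout count) are exactly the child statistics an MCTS root keeps, which is why this code carries over almost unchanged once you add a tree.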
@MSmits and many other CG regulars also helped and shared code for how to calculate all 8 win conditions in 3 operations using C++ intrinsics. I've added the code to a tech.io snippet that you can play with here. This is only slightly faster than having the 8 masks in an array and iterating over them, but every nanosecond counts in UTTT!
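The exact 3-operation intrinsic version lives in the linked snippet; as a point of comparison, here is a portable, branch-light check that folds all 8 lines with shifts instead of iterating over an array of masks (an illustrative sketch, assuming the usual 9-bit layout with bit 0 = top-left, row-major):

```cpp
#include <cstdint>

// Win check via shift folding: bit i of (b & b>>1 & b>>2) is set exactly
// when squares i, i+1, i+2 are all occupied, and similarly for the other
// strides; the masks keep only the bits where a real line starts.
bool isWon(uint32_t b) {
    uint32_t r = b & (b >> 1) & (b >> 2) & 0b001001001u;   // rows (stride 1)
    uint32_t c = b & (b >> 3) & (b >> 6) & 0b000000111u;   // columns (stride 3)
    uint32_t d = (b & (b >> 4) & (b >> 8) & 0b001u)        // diagonal 0-4-8
               | (b & (b >> 2) & (b >> 4) & 0b100u);       // diagonal 2-4-6
    return (r | c | d) != 0;
}
```

Whether this beats the mask loop depends on the compiler and how well it unrolls; as noted above, the differences are small, but they add up over millions of playouts.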
I've tried that, but I've noticed the processing is far slower during the opponent's turn. I guess the CodinGame simulator limits CPU resources during that time. In the end, this method gives a speedup of around 20% instead of the 100% I expected.
edit: I used just one thread to keep the CPU busy while waiting for Console.ReadLine().
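For anyone curious what "pondering" during the opponent's turn looks like structurally, here is a hedged C++ sketch (names invented; doOneIteration() stands in for a real MCTS step): a background thread keeps iterating until the main thread, which is blocked on reading input, receives the opponent's move and raises a flag. Whether this pays off depends on how the platform throttles CPU between turns, as noted above.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> stopPondering{false};
std::atomic<long> iterations{0};

// Background worker: spin MCTS iterations until told to stop.
void ponder() {
    while (!stopPondering.load(std::memory_order_relaxed)) {
        // doOneIteration();   // a real bot would grow its tree here
        iterations.fetch_add(1, std::memory_order_relaxed);
    }
}

// Demo of the lifecycle without stdin: in a real bot you would start the
// thread before blocking on input and stop it right after input arrives.
long ponderForMs(int ms) {
    stopPondering = false;
    iterations = 0;
    std::thread t(ponder);
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));
    stopPondering = true;   // in a real bot: set after the read returns
    t.join();
    return iterations.load();
}
```

The tree built while pondering is reused on your own turn by descending into the child matching the opponent's actual move.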