I found it very interesting that many different players ended up with similar heuristic values (me too!),
like scoring:
- 100 for an item
- 1 for each accessible tile
- 5 for an item on the player's tile
- or tie-breaking on the 5th turn
I used lower values for scoring:
- 1 for obtaining a quest item
- 0.01 for each accessible neighboring tile
- 0.1 for holding a quest item
- 0.1 bonus for being pushed into a quest item
- avoid deadlock after the 5th repeated PUSH
I started out with a bonus for holding a quest item as well, but bizarrely my ranking went even higher when I removed the rule… Getting these scoring functions right is quite hard.
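To make the weighting concrete, a scoring function along these lines might look like the sketch below; the PlayerState fields are hypothetical, only the weights come from the list above.

```cpp
// Hedged sketch of the weighted scoring above; PlayerState and its fields are
// hypothetical, only the weights come from the post.
struct PlayerState {
    int  questItemsObtained;    // quest items already validated
    int  accessibleTiles;       // tiles reachable from the player's position
    bool holdingQuestItem;      // currently holding / standing on a wanted item
    bool pushedIntoQuestItem;   // this turn's push moved us onto a quest item
};

double score(const PlayerState& s) {
    double v = 0.0;
    v += 1.0  * s.questItemsObtained;     // 1 for obtaining a quest item
    v += 0.01 * s.accessibleTiles;        // 0.01 for each accessible neighboring tile
    if (s.holdingQuestItem)    v += 0.1;  // 0.1 for holding a quest item
    if (s.pushedIntoQuestItem) v += 0.1;  // 0.1 bonus for being pushed into a quest item
    return v;
}
```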
My feedback.
A really good contest (one of the best overall).
Maybe something to handle draws would be nice, like the ko rule in the game of Go.
The other drawback was that very few developers were competing.
I reached another top 100 (72nd). I'm quite happy; I don't have much time for contests, so I can't compete for the top 20.
I went with NegaMax (evaluation = myEval - oppEval) very quickly. Depth 1 only, as I always have performance issues because I use a lot of LINQ statements in my C# code.
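For readers unfamiliar with it, a generic depth-limited NegaMax with the evaluation = myEval - oppEval convention looks roughly like the sketch below (GameState, Move and their methods are placeholders, not the poster's actual C# code; Xmas Rush's simultaneous moves are ignored, as a depth-1 search effectively does anyway).

```cpp
#include <limits>
#include <vector>

// Sketch only: GameState, Move and their methods are placeholders.
struct Move {};

struct GameState {
    std::vector<Move> legalMoves() const;   // legal moves for the side to play (assumed)
    GameState apply(const Move& m) const;   // state after playing m (assumed)
    double evaluate() const;                // myEval - oppEval, from the side to play (assumed)
};

// Classic NegaMax: the value of a position is the negation of the best value
// the opponent can reach one ply deeper. With depth = 1 this reduces to
// picking the move with the best static evaluation.
double negamax(const GameState& s, int depth) {
    if (depth == 0) return s.evaluate();
    double best = -std::numeric_limits<double>::infinity();
    for (const Move& m : s.legalMoves()) {
        double v = -negamax(s.apply(m), depth - 1);
        if (v > best) best = v;
    }
    return best;
}
```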
I hope this contest will inspire other people who want to create one; there are many cool abstract games that could be implemented as contests.
I ranked 3rd, which is the best ranking I’ve ever had in a CodinGame challenge!
To be honest, this was one of my favourite contests (and I don't say that only because of my very good ranking). The rules were simple and easy to understand, and the game was interesting! I think there were only two minor issues: maybe the game was too difficult to enter (even if it was definitely not a problem for me), and a lot of games would end up with a tie.
At the beginning of the contest, I first decided to use a negamax with a simple eval, because I thought it could work pretty well. I managed to go up to mid-gold with this negamax.
After this, I saw that MSmits was ranked 1st, and remembered reading his article about the Smitsimax, so I decided to implement this algorithm. I just had to copy-paste the implementation I use for Coders Strike Back and tweak it to make it compile. I was really surprised by the results I got, because it was better than my old AI on the first try. After a few bug fixes and optimisations, I was finally able to enter the Legend League. Then I finished coding the last features, fixed the last bugs, optimized the code, and ta-da! I ranked 3rd.
My evaluation function was really simple: I only took into account the number of quests each player had completed.
My exploration depth was 4 (4 actions per player).
I should have started coding earlier, because at the end of the contest I still had a lot of things to do: I didn't have time to check the correctness of my code, nor to tune the exploration coefficient, which I left at sqrt(2). What's more, my code wasn't really fast: I was only able to call the exploration function about 10k times per turn.
Here is the evolution of my ranking:
Merry Christmas everyone!
Legend 1st, C++.
Above all, a big thank you to Chimney42, Iuliia, rnkey and Tzupy: that was a great challenge!
The main originality of the game was undoubtedly the importance of simultaneous actions.
Thus my priority was to find an algorithm that could meet this constraint. Decoupled MCTS seemed ideal.
As a bonus, it also makes it possible to take into account the random part of the game (the appearance of new quests).
I love MCTS
The Monte Carlo Tree Search (MCTS) algorithm consists of 4 steps (selection, expansion, simulation, backpropagation) as shown in this figure.
In its basic form, the simulation step evaluates the value of a position by playing random moves. It must be admitted that in this form the MCTS has not been very successful.
But in recent years a very active community has been proposing variants that exploit the specific features of the problems encountered.
The most famous example is the DeepMind algorithm for Go, where an MCTS uses 2 neural networks in the simulation phase (one to evaluate the position and the other to generate "better" random moves) [ref].
MCTS is no longer rigid; it has become a very flexible framework that leaves a lot of room for creativity.
It also adapts very naturally to many situations (simultaneous move games, multi-player games, puzzles…).
That's why I use MCTS more and more frequently on CodinGame: Ultimate Tic Tac Toe (2 players), Tron (4 players), Mars Lander (optimization)…
Note: I do not distinguish between MCTS and what could be called UCT (UCB applied to Trees).
Decoupled MCTS
Decoupled MCTS is the version for simultaneous move games [ref]. This variation modifies the structure of a node as in the following figure:
In the sequential form (left) each move leads to its corresponding child node (L, U, R for Player 1 and l, u, r for Player 2). In the decoupled form (right) each joint move leads to a child node built from a combination of moves for Player 1 and Player 2 (only 7 of the 9 joint moves are shown).
In this variation, the selection of a child node is done through the simultaneous selection of an action for Player 1 and an action for Player 2. The term "decoupled" reflects the fact that (in a given node) the choice of Player 2 does not depend on the choice of Player 1.
Let n1 and n2 be the numbers of actions of Player 1 and Player 2. There are then n1 * n2 child nodes but only n1 + n2 evaluations to store. Consequently, during backpropagation, we mainly have 2 arrays to update for each player (sum_scores[NB_ACTIONS_MAX] and nb_visits[NB_ACTIONS_MAX]) and a common array of pointers (childs[NB_ACTIONS_MAX][NB_ACTIONS_MAX]).
Remark: decisions are not completely decoupled (unlike the case where a tree is created for each player). The choices are decoupled at a given depth, but at the next depth the choice of an action is influenced by the opponent's previous choice, reflecting the reality of the game.
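To make the memory layout concrete, here is a minimal sketch of such a decoupled node in C++, reusing the array names from the paragraph above. The [2] player indexing and NB_ACTIONS_MAX = 28 (matching the cap of 28 actions used in the implementation below) are one possible layout, not necessarily jolindien's exact code.

```cpp
// One possible layout for a decoupled node (sketch). Index 0 = me, index 1 = opponent.
constexpr int NB_ACTIONS_MAX = 28;

struct Node {
    // Per-player statistics: only n1 + n2 evaluations are stored.
    double sum_scores[2][NB_ACTIONS_MAX] = {};   // accumulated scores per action
    int    nb_visits[2][NB_ACTIONS_MAX]  = {};   // visit counts per action
    int    nb_visits_total               = 0;    // total visits of this node

    int    nb_actions[2] = {0, 0};               // n1 and n2, the legal action counts

    // n1 * n2 children, one per joint move, allocated lazily.
    Node*  childs[NB_ACTIONS_MAX][NB_ACTIONS_MAX] = {};
};
```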
My implementation on Xmas Rush
- Selection:
The selection method must handle the exploration/exploitation dilemma, i.e. the balance between the width and the depth of the tree.
I used the (classic) UCB1 formula. It contains an essential parameter to tune (it is even the only parameter in the basic MCTS), generally noted C. 0.3 is often a good first choice for evaluations between -1 and 1. I chose C = 0.05 to favour deep search (see the sketch after this list).
- Expansion:
I used the decoupled variant, described above. Legal actions considered:
  - for a PUSH turn: the 28 legal pushes
  - for a MOVE turn: the accessible cells (limited to 28 at most, to keep the same node tables)
- Simulation:
According to the last action played during the selection-expansion:
  - for a PUSH, evaluate the current position (no additional action)
  - for a MOVE, play one additional push for each player according to the following heuristic: try 5 pushes at random, keep the most promising.
Evaluation function for a player: 0.1 * the number of quests retrieved + 0.001 * the number of accessible cells -> a score between 0 and 1 most of the time.
Global evaluation: evaluation for me - evaluation for the opponent + a "deadlock" bonus/malus (if the game is locked AND my first push caused the lock -> bonus if the situation is favourable to me, malus if unfavourable) -> a score between -1 and 1 most of the time.
Remark: after a push, my simulation validates all accessible quests regardless of distances (an approximation to limit the simulation complexity).
This scoring function easily singles out a promising push. That is not true for moves, which is why a push is systematically played after a move.
This scoring function also produces large deviations when some quests are accessible, but very small ones otherwise. That's why the C value needs to be so small.
- Backpropagation:
I used the decoupled variant, described above (a combined sketch of the selection and backpropagation follows).
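Putting the pieces together, here is a hedged sketch of how the decoupled selection (UCB1 with C = 0.05) and backpropagation could operate on the node structure sketched earlier. Move generation, the Xmas Rush engine and the rollout push are not shown, and the sign convention for the opponent is one possible zero-sum choice, not necessarily the original code.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Decoupled UCB1 selection for ONE player inside a node: each player picks
// independently, which is exactly what "decoupled" means at a given depth.
int select_action(const Node& node, int player, double C /* 0.05 here */) {
    int best = 0;
    double best_ucb = -1e18;
    for (int a = 0; a < node.nb_actions[player]; ++a) {
        if (node.nb_visits[player][a] == 0) return a;   // visit untried actions first
        double mean = node.sum_scores[player][a] / node.nb_visits[player][a];
        double ucb  = mean + C * std::sqrt(std::log((double)node.nb_visits_total)
                                           / node.nb_visits[player][a]);
        if (ucb > best_ucb) { best_ucb = ucb; best = a; }
    }
    return best;
}

// Evaluation described in the PM: 0.1 * quests retrieved + 0.001 * accessible cells.
// The global score is evaluate_player(me) - evaluate_player(opponent), plus the
// deadlock bonus/malus (omitted here).
double evaluate_player(int quests_retrieved, int accessible_cells) {
    return 0.1 * quests_retrieved + 0.001 * accessible_cells;
}

// Decoupled backpropagation along the visited path. The same playout value
// (from my point of view) updates my chosen action with +value and the
// opponent's with -value (one possible zero-sum convention).
void backpropagate(const std::vector<Node*>& path,
                   const std::vector<std::pair<int, int>>& joint_moves,
                   double value) {
    for (int i = 0; i < (int)path.size(); ++i) {
        Node* n   = path[i];
        int my_a  = joint_moves[i].first;
        int opp_a = joint_moves[i].second;
        n->nb_visits_total++;
        n->nb_visits[0][my_a]++;    n->sum_scores[0][my_a]  += value;
        n->nb_visits[1][opp_a]++;   n->sum_scores[1][opp_a] -= value;
    }
}
```

The crucial point is that both players' statistics are updated from the same playout value, so at a given depth neither selection ever sees the other's current choice.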
Why does it work?
Let's take a simple Xmas Rush situation to analyze the behavior of the algorithm: assume we have 2 "killer" actions (2 different pushes that allow us to grab the final quest).
Suppose that one of them is selected during the Selection phase: if the opponent’s selection is inappropriate it leads to a good evaluation.
Then, at the next iteration, the corresponding node will be selected again (because its score is now higher).
At the same time, the score of the opponent's action is decreased. After a few iterations, an action that annihilates our killer push is found.
- In a sequential algorithm that would mean "the opponent can annihilate our action, so it's not a good move".
- But here, as the score is reduced, our selection will choose another action, and at some point it will choose the other killer move.
The same process will take place: sometimes the opponent will choose the response to the first killer push, sometimes the response to the second one (note that he cannot choose effectively, because he is not aware of our selection).
- As a result, the probability for these killer pushes to win in the simulation phase can be as high as one half, so their evaluation will remain high. In the long run, the probability for each one to be played will be 1/2 (just like a human choice in the same situation).
Bonus (new quests):
My simulation engine reproduces the random mechanism of new quests.
Consequently, the same node can be visited many times but the quests will not necessarily have been the same.
This way, known quests have a better chance of being met than the others, and thus contribute more to the score.
So there is no need to adapt the evaluation function to handle this: I love MCTS
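As a hedged sketch of what "reproducing the random mechanism of new quests" might look like inside a simulation engine (the hidden_items deck representation is an assumption; only the idea of drawing an unseen quest at random comes from the post):

```cpp
#include <random>
#include <vector>

// When a quest is completed during a simulation, draw its replacement at random
// among the items not yet revealed. hidden_items is a hypothetical representation.
int draw_new_quest(std::vector<int>& hidden_items, std::mt19937& rng) {
    if (hidden_items.empty()) return -1;                  // nothing left to reveal
    std::uniform_int_distribution<int> pick(0, (int)hidden_items.size() - 1);
    int i = pick(rng);
    int item = hidden_items[i];
    hidden_items[i] = hidden_items.back();                // O(1) unordered removal
    hidden_items.pop_back();
    return item;
}
```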
Thanks a lot for your explanations! Just 2 quick questions:
- Do you copy the state in the node after the actions have been applied, or do you replay them during the selection phase?
- Can you give us a rough estimate of how many times you visited the root node?
I don't copy the state in the node. Maybe it could improve performance
(I didn't try it in Xmas Rush). But the feature about new quests would no longer work.
I visited the root node about 10k times. This could be significantly improved.
Many thanks for the PM, Jolindien. Another question if that’s ok: If I understand correctly, the MCTS gives you a matrix of all possible move combinations, but that’s not an action. So how does one decide from this matrix which action to take?
the matrix is only there to remember which child node corresponds to each joint move (that's why I talk about pointers -> child_node)
for example, say you have 2 actions and the opponent has 2 actions:
you need two arrays to store the scores:
e.g. my_sum_scores = [17.58, 12.2] and opp_sum_scores = [7.1, -3.57]
and two arrays for the numbers of visits, my_nb_visits and opp_nb_visits
then you choose your action based on my_sum_scores and my_nb_visits, the opponent's action likewise, etc.
if you choose the first action for you and the second for the opponent, the child pointer will be at childs[0][1]
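To make the last step concrete, here is a small sketch of turning those per-action statistics into the move actually played at the root. Picking the most-visited action is a common MCTS convention, not necessarily jolindien's exact rule.

```cpp
// Turn the root statistics into the move actually played.
// The most-visited action is a common, robust choice; the best average score
// (sum_scores / nb_visits) is another option.
int choose_root_action(const int my_nb_visits[], int nb_actions) {
    int best = 0;
    for (int a = 1; a < nb_actions; ++a)
        if (my_nb_visits[a] > my_nb_visits[best]) best = a;
    return best;
}
```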
This was the third contest for me. Well, the second I actually had some time for… or actually: I realized during this contest that 10 days just doesn't work for me.
So I started with Legend of Code and Magic, which was 4 weeks. Because of work and a toddler at home, I could only start LoCM after a week. Then I needed 1 week to understand the game, 1 week to develop a bot and 1 week to improve my bot. 4 weeks of work gave me 30th place in Legend.
So I started Xmas Rush on the first day, but that was just enough to read the rules and watch some games being played. Then I could not work on the bot for 3-4 days. When I started again, I wanted to write a Monte Carlo depth-limited search tree with simultaneous moves (like the Smitsimax). I spent 5 days trying to write it. I had difficulty finding a proper data structure. I worked on this for the whole Friday and a large part of the Saturday. My wife even went off to a friend's with our kid to give me the space and time.
After all this, 40 hours before the deadline, I finally understood the game and realized my code would not work… I had to largely rewrite it in less than two days.
That's just too short. I already felt so much stress. I gave up… I ended up never submitting my 1000 lines of (broken) bot code and finished 1212th of 1229: 17th from the bottom.
PLEASE make competitions 4 weeks again. 10 days is just much too short for me!!!
Congrats on the 1st place, jolindien. One question: during the simulation phase, did you always simulate until the end of the game (I'd be surprised) or did you set a depth limit?
Thanks Csipcsirip
I didn’t simulate until the end of the game.
During the simulation phase (rollout) I just play one more move, or none at all:
<<According to the last action played during the selection-expansion:
-for a PUSH, evaluate the current position (no additional action)
-for a MOVE, play one additional push for each player according to the following heuristics: try 5 pushes at random, keep the most promising >>
(promising = a good score with the Evaluation function for a player)
For many people, what I did was not an MCTS.
It is a UCT (Upper Confidence applied to Trees).
For me, there is enough randomness (during expansion, for the additional push after a move, for the new quests) to talk about MCTS (Monte Carlo Tree Search).
An MCTS using UCB is called UCT (if I remember correctly).
But since you are using randomness to predict the future (the next quests), it is called ISMCTS. This is what I tried (unsuccessfully) in LOCAM. You just run an MCTS algorithm and randomize what you don't know.
Anyway, thank you for the PM.
… and one week to rule them all!!!
I would suggest Halite III
or RAIC
http://russianaicup.ru/
Both of them will last longer than 4 weeks from now. CG traditionally has 10-day contests, which is what the community is most used to. Personally I prefer the 10-day variant, although a slightly different format might suit even this community better, who knows?
Thank you for the suggestions. I know them and am already playing Halite. But these are just too long!
So for me this CG competition is like running the 500m race. Halite is more of a 42km marathon.
I would just like to run the 2km race…
Play multis, code at your own pace. There are many fun ones.
And try to ask in chat; picking the wrong strategy / data structure ends in frustration and a lot of useless code. I'm always open to discussing ideas in chat, during challenges or not.
Is there a specific achievement for @RockyMullet and myself for being above beber after a ~30 min game?
More seriously, great contest once again! Congratulations to the team who developed it and to the top players of course, for making the competition really worth watching and for the PMs.
I really struggled to simulate the push efficiently because of copy.deepcopy in Python, which is very slow, and therefore I hardly reached 28 simulations per turn (1 push, 3 BFS). I was advised in the chat to try to clone the objects instead of deepcopying them, which I will try next time, hopefully with more success.
Why was copy.deepcopy necessary for the simulation?
You can do a full simulation without it. Just use a few for loops. Actually, I only needed one for each quest-list iteration.
And then you can implement your own lightweight version which only copies the bits you need for the specific task, instead of the entire list of objects. Why did some folks use it and then complain it's slow? I simply fail to see the benefits.
So, ignoring my horrible evaluation that clearly didn't work, I could iterate through all push possibilities for myself and the opponent each turn.
On the move turn I could do 3 BFS for a total of 3 quest items, plus another BFS for guessing the 4th item or moving towards the edge!
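As an aside, finding or counting the accessible cells (used in several of the evaluations above) is a small BFS / flood fill. The sketch below assumes a 7x7 board where each tile stores four open-direction flags, which is one possible representation, not necessarily the poster's.

```cpp
#include <cstring>
#include <queue>
#include <utility>

// Count the cells reachable from (sr, sc) on the 7x7 board with a BFS / flood fill.
// open[r][c][d] is true if the tile at (r, c) is open in direction d
// (0 = up, 1 = right, 2 = down, 3 = left): a hypothetical representation.
int countAccessible(const bool open[7][7][4], int sr, int sc) {
    static const int dr[4] = {-1, 0, 1, 0};
    static const int dc[4] = { 0, 1, 0, -1};
    bool seen[7][7];
    std::memset(seen, 0, sizeof(seen));
    std::queue<std::pair<int, int>> q;
    q.push({sr, sc});
    seen[sr][sc] = true;
    int count = 0;
    while (!q.empty()) {
        int r = q.front().first, c = q.front().second;
        q.pop();
        ++count;
        for (int d = 0; d < 4; ++d) {
            int nr = r + dr[d], nc = c + dc[d];
            if (nr < 0 || nr >= 7 || nc < 0 || nc >= 7 || seen[nr][nc]) continue;
            // Two neighbouring tiles connect only if both sides of the shared edge are open.
            if (open[r][c][d] && open[nr][nc][(d + 2) % 4]) {
                seen[nr][nc] = true;
                q.push({nr, nc});
            }
        }
    }
    return count;
}
```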
I used it to save my game state first, and then come back to it after simulating the push and the BFS. Yes, implementing a reverse push or using bits would have been faster… but I don't complain that deepcopy is slow; if it did not exist, my final rank would be even worse. I blame myself for not finding a better data structure and for failing to implement some simple simulations efficiently, as the game was perfect for this!