Summer Challenge 2024 - Feedback and Strategies

3rd place overall

I’d like to thank CodinGame and Fiverr for organizing and sponsoring this exciting competition. I liked it a lot!

Solution

My bot was based on Decoupled-UCT. Its tree search and node selection algorithms were borrowed from AlphaZero, but its value/policy evaluation functions were hand-crafted. I chose this design hoping for a future transition to RL-based NNs (a real AlphaZero), but that remained future work until the end of the competition.

Details:

  • At the beginning of a turn, I constructed a tree containing only a root node. In each tree search, I traversed the tree to a leaf node. Then, I evaluated it, expanded it by one level, and back-propagated the evaluation. On node expansions, I simulated each mini-game and started a new game if necessary, except for skating. For skating, I simulated only the first actions, and its state was frozen for the following expansions.
  • Node evaluations were performed by computing the final outcome of the entire game (i.e. the rating exchange on the leaderboard). I’ll write the details at the bottom of this post, but the final outcome was computed based on the current scores, the numbers of remaining games, and a guessed medal distribution for each ongoing mini-game (e.g. player 2 gets a gold/silver/bronze medal with the probability of 50/30/20%).
  • Medal distributions were evaluated differently for each mini-game, but they shared similar themes. For example, in archery, I computed the distribution of positions (i.e. player 1 is at (1, 3), (7, 3), (4, 6), (4, 0) with probabilities of 50/10/10/30% in 2 turns). Position distributions at turn N were computed from the position distributions at turn N - 1 (a small sketch appears after this list). Then, I computed the medal distribution by comparing the position distributions at the final turn.
  • The position distribution was computed independently per player. I expected each player to be in one of the random/reasonable/serious modes. A player in the random mode moves uniformly at random (25/25/25/25%) until the end of the ongoing mini-game. A player in the serious mode plays the ongoing mini-game perfectly and consistently. After computing position distributions for all the modes, I took a weighted sum of the distributions at the final turn (not at each turn).
  • At first, I didn’t use the “mode-throughout-the-entire-game” concept and took a weighted sum at each turn, but it resulted in my bot stopping a diving combo earlier than needed because it didn’t expect other players could achieve 10+ combos. So, I introduced the concept to capture the high-level strategy of players.
  • My policy function was a rather simple heuristic based on statistics. I think the quality wasn’t so good.
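
To illustrate the position-distribution idea, here is a small Python sketch of one propagation step under the "random mode" (each direction with probability 25%). The wind values, clamping bounds, and the scoring by squared distance are assumptions for illustration, not the exact referee rules.

def propagate(dist, wind, bound=20):
    """dist maps (x, y) -> probability; returns the distribution one turn later
    under uniformly random moves (25% per direction)."""
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    nxt = {}
    for (x, y), p in dist.items():
        for dx, dy in moves:
            nx = max(-bound, min(bound, x + dx * wind))
            ny = max(-bound, min(bound, y + dy * wind))
            nxt[(nx, ny)] = nxt.get((nx, ny), 0.0) + 0.25 * p
    return nxt

# Example: roll a known position forward over hypothetical remaining wind
# values, then summarize by the expected squared distance to the center.
dist = {(4, 3): 1.0}
for wind in [3, 1, 2]:
    dist = propagate(dist, wind)
expected_d2 = sum(p * (x * x + y * y) for (x, y), p in dist.items())
print(expected_d2)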

Minor details about node evaluations:

  • I computed the final outcome of the entire game. As described above, I first guessed the medal distributions of the ongoing mini-games. Then, I computed the final score distribution for each mini-game using the current score, the medal distribution, and the number of remaining games (e.g. if a player can get a gold/silver/bronze medal at 50/30/20% in the ongoing game while its current score is 8 and there is 1 game remaining, it can end up with 8/9/10/11/12/14 points with probabilities like 8/18/8/23/…/17%). Finally, I could compute the final outcome by multiplying the per-mini-game final score distributions and comparing them among players.
  • We get the final total score distribution by multiplying the scores of the 4 mini-games. However, it’s prohibitively expensive to calculate all the combinations naively. So, I took the logarithm of the scores, then scaled and rounded them to integers. This converts the multiplications into additions (i.e. log(X * Y) = log(X) + log(Y)). Then, I could perform polynomial multiplications to get the final distribution (e.g. with a binary logarithm, a score distribution of {32: 40%, 8: 20%, 4: 10%, 1: 30%} becomes 0.4 * y^5 + 0.2 * y^3 + 0.1 * y^2 + 0.3 * y^0); see the sketch after this list.
  • Polynomial multiplication can be performed efficiently with FFT, Karatsuba, and other algorithms. I tried a few approaches, but it turned out to be most efficient to use a naive algorithm implemented with SIMD. That's probably because I limited the size of the polynomials and the score distributions tend to be sparse (i.e. many 0s).
  • My bot could perform 3000+ tree searches in the slowest turn (or 1000+ on the slower CG CPUs).
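
As a rough illustration of the log-score trick, here is a hedged Python sketch: per-mini-game score distributions are turned into "polynomials" indexed by a scaled, rounded logarithm of the score, so combining mini-games becomes a polynomial multiplication (a convolution) instead of enumerating every score combination. The SCALE constant and the clamping of zero scores are my illustrative choices, not the author's.

import math
from collections import defaultdict

SCALE = 4   # resolution of the rounded logarithm (illustrative)

def to_log_poly(score_dist):
    """{score: probability} -> {rounded SCALE*log2(score): probability}.
    Zero scores would need special handling; they are clamped to 1 here."""
    poly = defaultdict(float)
    for score, p in score_dist.items():
        poly[round(SCALE * math.log2(max(score, 1)))] += p
    return dict(poly)

def convolve(a, b):
    """'Multiply' two polynomials: exponents add, coefficients multiply."""
    out = defaultdict(float)
    for ea, pa in a.items():
        for eb, pb in b.items():
            out[ea + eb] += pa * pb
    return dict(out)

# Example: combine two per-mini-game score distributions.
g1 = to_log_poly({32: 0.4, 8: 0.2, 4: 0.1, 1: 0.3})
g2 = to_log_poly({9: 0.5, 3: 0.5})
total = convolve(g1, g2)   # distribution of the scaled log of the total score
print(total)
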
25 Likes

#1

For such short contests, the most important thing is fighting against time and constantly improving the bot. I had a hard time keeping up the tempo. I must say that scheduling it around the end of the (pre)school year is unfortunate, though not as bad as a contest spanning Christmas, which is one of the worst options.

Thanks to the organizers, great idea for a game.
Congrats to @reCurse and @Nanaeda for the excellent competition!
Thanks to @aCat, @gaha, @surgutti, and @Zylo for our discussions and sharing ideas.

Game

The game is interesting yet difficult to play, and the rules in the referee were a piece of cake (compared to the nightmare from Fall Challenge 2023). It was possible to get the proper engine in just a few hours, and there was space for its further nontrivial optimization.

Overview

The algorithm is a variation of DUCT, mostly fully decoupled, which is called Smitsimax here. In the basic part, there are three independent trees, one per player, each with branching 4, but this is extended around the root. The crucial parts are a somewhat bizarre management of decisions in iterations based on the number of samples, and a lot of optimization.

The fully decoupled DUCT is fragile and exploitable, but I supposed that the budget was too small for a fancier algorithm. Indeed, the number of iterations strongly affects playing strength, and the bot would still improve with more iterations.

Furthermore, the behavior varies depending on the time budget and requires tuning for a specific limit; this mostly concerns things related to exploration. The fact that there were two different processors complicated it further, so I ended up running experiments with the two respective budgets to ensure that things worked well with both. The next step could have been providing two separate tunings, but there was no time for that.

Efficiency matters a lot. The following are the results of the final bot against its two previous versions (99% CI). The final bot was given a doubled, normal, and halved time budget, respectively.

MSz[time 200%] vs MSz(v-1) vs MSz(v-2):   57.91%  ±1.0  : 48.27%  ±1.0  : 43.82%  ±1.1
MSz[time 100%] vs MSz(v-1) vs MSz(v-2):   54.93%  ±1.0  : 49.55%  ±1.0  : 45.53%  ±1.1
MSz[time  50%] vs MSz(v-1) vs MSz(v-2):   49.94%  ±1.0  : 52.12%  ±1.0  : 47.95%  ±1.1

Algorithm

Iteration

The strategy for simulations (rollouts) is to use a policy, which is a chosen minigame that the player wants to maximize. The policy is chosen by weighting minigames depending on the current and predicted scores. It is kept for the following turns until that minigame ends or becomes fully determined, meaning the outcome for that player will not change anymore regardless of the actions. Choosing an action means taking a random one from the best actions for that minigame. The best actions are hardcoded by a set of very simple rules. Stunning (in hurdle and skating) does not disable the policy, but the player then temporarily chooses another minigame and makes a best move for it. This simulation part dominates efficiency; the key is to keep it simple and fast. More sophisticated methods sometimes worked better, but not in the timed setting.
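
A toy Python sketch of this kind of rollout policy (the weights and the best-action table below are made up for illustration; the real bot uses its own hand-written rules):

import random

# Hypothetical "best actions" per (minigame, situation); in the real bot
# these come from a handful of simple hardcoded rules.
BEST_ACTIONS = {
    ("hurdle", "clear"): ["RIGHT"],
    ("hurdle", "hurdle_ahead"): ["UP"],
    ("archery", "left_of_center"): ["RIGHT", "UP"],
    ("diving", "goal_is_up"): ["UP"],
}

def choose_policy(expected_gain):
    """expected_gain: {minigame: weight}; higher weight -> more likely focus."""
    games = list(expected_gain)
    weights = [expected_gain[g] for g in games]
    return random.choices(games, weights=weights, k=1)[0]

def rollout_action(policy, situation):
    """One simulated move: a random action among the best ones for the focus."""
    return random.choice(BEST_ACTIONS[(policy, situation)])

# Example rollout turn: weights would come from current and predicted scores.
policy = choose_policy({"hurdle": 3.0, "archery": 1.0, "diving": 2.0})
situation = {"hurdle": "clear", "archery": "left_of_center",
             "diving": "goal_is_up"}[policy]
print(policy, rollout_action(policy, situation))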

When a minigame ends, the next edition is generated, with the same distribution as in the referee. However, each minigame is generated at most once per simulation, except for skating, which is not generated at all. When the generated minigames end, the iteration finishes. The scores are completed with predicted scores from future minigames, depending on their expected number of remaining editions.
These numbers can be calculated exactly for archery, skating, and diving, whereas for hurdle they were obtained from massive experiments with already good agents.

Playing with trees

For most nodes, the trees are fully decoupled, which means there are only 4 children, one per player action. However, we can also branch on the opponents' actions, which gives 16 possibilities, and on the skating order, which gives 24. I did not want to consider branching on the possible generated minigames.
The branching on opponents' actions is applied in the root, so the branching there is 64. For the skating order, a separate tree of small depth is maintained; that tree is used for decisions only after gathering a certain number of samples.

The important issue is to take only quality actions in simulations. First, an action is decided by UCT only if the number of samples reaches a certain threshold; otherwise, the policy described above is used. This gives a similar effect to mixing UCT with heuristic values but avoids that expensive formula. Then, there is a surprising mechanism: in contrast with the usual UCT, deciding by the tree starts with a very small exploration, thus relying mostly on the choices previously made with the policy. I think this helps by keeping the statistics of the node at relatively good quality. The exploration reaches its normal formula only after a certain threshold of samples, which happens only for a small fraction of the nodes at the top.
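
A hedged sketch of such a selection rule in Python: below one visit threshold the node simply plays the policy action; above it, UCT is used with an exploration constant that ramps up to its normal value only after a second, larger threshold. All thresholds and constants here are illustrative, not the tuned values.

import math

POLICY_THRESHOLD = 32     # below this, skip UCT entirely (illustrative)
EXPLORE_THRESHOLD = 512   # above this, full exploration (illustrative)
C_FULL = 1.0

def select_action(node_visits, child_visits, child_values, policy_action):
    """child_values[a] is the sum of backpropagated scores for action a."""
    if node_visits < POLICY_THRESHOLD:
        return policy_action                                  # cheap policy decision
    c = C_FULL * min(1.0, node_visits / EXPLORE_THRESHOLD)    # ramped exploration
    best, best_score = None, -float("inf")
    for a in range(4):                                        # 4 actions per player
        if child_visits[a] == 0:
            return a
        mean = child_values[a] / child_visits[a]
        ucb = mean + c * math.sqrt(math.log(node_visits) / child_visits[a])
        if ucb > best_score:
            best, best_score = a, ucb
    return best

# Example call with made-up statistics.
print(select_action(100, [30, 25, 25, 20], [18.0, 12.0, 13.0, 9.0], 2))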

The final bot performs better when the available budget is smaller, which is mostly the result of the described threshold mechanisms employed in the latest versions. The following are the results with altered budgets but fair – the same for all agents.

[  200% time] MSz vs MSz(v-1) vs MSz(v-2):   54.27%  ±1.0  : 50.21%  ±1.0  : 45.52%  ±1.1
[  100% time] MSz vs MSz(v-1) vs MSz(v-2):   54.93%  ±1.0  : 49.55%  ±1.0  : 45.53%  ±1.1
[   50% time] MSz vs MSz(v-1) vs MSz(v-2):   56.13%  ±1.0  : 49.61%  ±1.0  : 44.27%  ±1.1
[   25% time] MSz vs MSz(v-1) vs MSz(v-2):   59.52%  ±1.0  : 49.28%  ±1.0  : 41.21%  ±1.1
[   10% time] MSz vs MSz(v-1) vs MSz(v-2):   65.74%  ±1.0  : 45.78%  ±1.0  : 38.48%  ±1.0
[:-) 1% time] MSz vs MSz(v-1) vs MSz(v-2):   74.45%  ±0.7  : 72.17%  ±0.7  :  3.38%  ±0.4

The branching and thresholds are specific to the time limit. For example, increasing branching beyond the root seems to be better under a larger time limit, but worse otherwise.

[200% time] MSz-IncBranching vs MSz(v-1) vs MSz(v-2):   55.04%  ±1.0  : 50.93%  ±1.1  : 44.03%  ±1.1
[100% time] MSz-IncBranching vs MSz(v-1) vs MSz(v-2):   53.32%  ±1.0  : 51.36%  ±1.1  : 45.33%  ±1.1

Optimization

As the number of iterations is crucial, optimization is essential. There are many tricks, each measured precisely to confirm a speed benefit.

The funniest part is precomputing all possible hurdle maps. Generating a hurdle minigame then boils down to picking a random id among 1280 possibilities (some are repeated just to ensure the proper distribution). Furthermore, each map is already solved, hence getting the best actions and applying them is essentially one lookup plus a stun check.
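
A hedged Python sketch of the idea. The movement values (LEFT 1, DOWN 2, UP 2 while jumping the hurdle directly ahead, RIGHT 3) and the greedy farthest-safe-advance rule are my reading of the rules, and the example maps below are hand-made rather than the 1280 referee-generated ones.

import random

TRACK = 30

def solve_map(hurdles):
    """hurdles: set of cell indices (never adjacent in the real referee).
    Returns the greedy best action from every position: advance as far as
    possible without landing on or running through a hurdle."""
    best = []
    for pos in range(TRACK):
        if pos + 1 in hurdles:
            best.append("UP")        # jump the hurdle directly ahead
        elif pos + 2 in hurdles:
            best.append("LEFT")      # only a 1-step move stays safe
        elif pos + 3 in hurdles:
            best.append("DOWN")      # advance 2, stop just before it
        else:
            best.append("RIGHT")     # advance 3
    return best

# Precompute once for all maps (here only two hand-made examples); generating
# a game is then one random id, and the solver is a single table lookup.
ALL_MAPS = [frozenset({7, 12, 18, 25}), frozenset({5, 11, 17, 23, 29})]
SOLVED = [solve_map(m) for m in ALL_MAPS]

map_id = random.randrange(len(ALL_MAPS))
print(SOLVED[map_id][3])             # best move from position 3 on that map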

The average numbers in the second turn over 10 runs on the CG Haswell CPU:

Iterations:         5 264.6
Game states:      176 527.5
Tree nodes:        15 989.8 + 1 041.3
UCT calculations:  64 241.7

What did not work

Probabilities: an estimation for an unfinished minigame assuming random moves. Such probabilities are not useless, but they are too costly compared to greedy decisions, and maybe not even realistic estimations, since the agents do not play randomly. Probabilities require more complex operations and/or big arrays, which are slow to access.

Weighted moves: the benefit of using a policy lies not only in imitating realistic play but also in efficiency; in most turns, we do not even have to look at the possible actions for the other minigames. Weighting moves, or weighting minigames in place of random greedy decisions, was too costly.

Restricting actions: sometimes we know that certain actions are always worse than others, for example, when there is only one active undetermined minigame. We can forbid choosing these actions in UCT, which normally tries them too. This helps, yet there are two obstacles: it is again too costly, and it perhaps disturbs the statistics of a node, because the same node can be used for different game states with different excluded actions.

27 Likes

Here is my postmortem: reCurse's Codingame Summer Challenge 2024 Postmortem · GitHub

Thanks CG for the contest and the competitors for being great again!

29 Likes

#4, C++

Thanks to the entire UWr Team: @gaha, @surgutti, @Zylo, @DomiKo, and especially, of course, @MSz.

Game

I am positively surprised by the game idea. The level of originality was high (which is not easy given how many different games are already on CG). And having one action used in four different games seems hilarious.

On the other hand, the visualization was the worst ever (and I’m pretty sure I had similar thoughts a few contests ago, so CG is “improving” in that regard).

Totally useless:

  • player actions are not shown in a proper way (you need to look at diving - if it is not restarting).

  • no player messages.

  • it doesn’t show track bars and player positions in hurdle, nor any info about the remaining stun. (CG used the ‘on hover’ feature in in-game visualizations years ago; did they forget??)

  • no position/distance of player pointers in archery.

  • no traveled distance & stun in skating.

  • not even an option for a debug mode to pretend there may be some useful data there. No - it’s useless.

Algorithm

My approach and the techniques I used are overall pretty simple:

  • standard DUCT (one independent tree per player)

  • simulations one-generated-game-deep (except skating, which I do not generate at all; it is just computationally costly chaos, and removing it led to an improvement)

  • solver for known games, which allows optimal moves for them

  • simple greedy policies for generated games. I tried multiple fancy options for skating (taking into account stunned opponents, etc.), but nothing was significantly better than a simple 3-rule policy. For archery, it is just a 1-turn greedy to get closest to the center.

  • policy moves return sets (from which I choose randomly), which works better than assuming there is only one best move, especially for hurdle/archery.

  • long-term policies per player (i.e. choosing a game to try to optimize), which can fall back to short-term policies if we are stunned. Games are chosen uniformly at random; I wasted a lot of time trying smarter choices here, but they didn’t work better.

  • use policy move selection instead of UCT for nodes with a small number of visits.

  • the evaluation function is based on medal points (with some basic assumptions on future games) and differences in scores between me and opponents.

OK, maybe it’s not THAT simple, but honestly, no extremely fancy things inside.

Optimizations

  • the engine doesn’t simulate games that are useless. This includes games which will not finish before turn 100, and games which would have to be generated a second time while others are still going during the sim.

  • dynamic optimal move for hurdle (also for generated ones) with at most three array lookups. I didn’t pre-generate all possible scenarios.

  • general rule: keep frequent parts as light as possible, precompute everything you can (but note the access cost to big structures).

Closing thoughts

Thanks to CG for organizing the challenge, and to the entire community for having a good time together; see you at the next one.

(BTW: CG, could you PLEASE make announcements earlier? More than a few people really do plan their schedules around your contests, so getting the info less than two weeks in advance is a sad joke.)

21 Likes

I like the concept, but I struggled a bit, I think. I need to practice more.

Hi! Can I ask what other optimization contests exist on other platforms? :slight_smile:

Hi sprkrd!

Here is an almost complete list of the most common online coding competitions:

HackerRank

Topcoder

CodeChef

Google Code Jam

LeetCode

International Collegiate Programming Contest

Google Hash Code

HackerEarth

ICFP Programming Contest

Microsoft Imagine Cup

Codewars

Project Euler

Advent of Code

Company contests

Codeforces

Coderbyte

GeeksforGeeks

Battlecode

Halite

Kaggle

I have two or three others in mind.
I guess someone will complete and adjust this list (it will perhaps be me).

Keep practicing and come back here for each spring, summer, fall, and winter challenge.

CodeVanguard :cloud_with_snow::bat:

Code Jam and Hash Code are discontinued.
Also, OP is referring to optim contests, not general CP contests

Ended up #58 Legend.

Better late than never…
I’m really proud I could go so far despite my lack of experience. I’d love for this PM to inspire ppl with better coding skills to implement this approach and show me how far it can go =) Pls get in touch if you do.

Approach

I used a GA. My first implementation computed the fitness of an individual by blending together performance in the 3 minigames (I never considered skating). I realized quite quickly that this approach would hit a hard limit. The game isn’t about playing all minigames decently at the same time, but about choosing which fights to take and which to concede.

Enter NSGA-II. The main idea is to evolve our populations without any assumptions about which game to prioritize or how hard. Given two solutions that outperform each other on different games, we cannot say which is better; they are simply not comparable, and neither of them “dominates” the other. However, there are plenty of cases where a solution outperforms another in every game; that solution dominates the other one. We can then sort solutions into “Pareto fronts”. Front 0 contains solutions that are dominated by none, front 1 those that are only dominated by front 0, front 2 those only dominated by fronts 0 or 1, etc.
If you want to delve deeper into it, I cannot recommend enough the last videos of this amazing playlist: https://youtube.com/playlist?list=PLL61h44ln0J0Pbs2EPR71wn-8wvwxHI9z&si=D4pT3GBfnpDXQtiu
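
For reference, here is a plain Python sketch of the fast non-dominated sort used in NSGA-II (a textbook O(n²·m) version, not my implementation; objectives are "higher is better" here):

def dominates(a, b):
    """a dominates b if it is >= everywhere and strictly > somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_sort(objectives):
    """objectives: list of tuples, one per individual. Returns the fronts."""
    n = len(objectives)
    dominated_by = [[] for _ in range(n)]   # indices dominated by i
    count = [0] * n                          # how many individuals dominate i
    for i in range(n):
        for j in range(n):
            if i != j and dominates(objectives[i], objectives[j]):
                dominated_by[i].append(j)
                count[j] += 1
    fronts = [[i for i in range(n) if count[i] == 0]]
    while fronts[-1]:
        nxt = []
        for i in fronts[-1]:
            for j in dominated_by[i]:
                count[j] -= 1
                if count[j] == 0:
                    nxt.append(j)
        fronts.append(nxt)
    return fronts[:-1]

# Example: three minigame scores per individual.
print(non_dominated_sort([(4, 2, 7), (3, 5, 6), (5, 5, 6), (1, 1, 1)]))
# -> [[0, 2], [1], [3]]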

I simulated each player equally, so as to obtain (hopefully) every locally optimal solution, by which I mean the best-scoring solution for a specific trade-off between games. I then matched my solutions against all possible pairs of enemy locally optimal solutions, and chose the one giving me the best average “score”. For the first half of the game I add the medal points together, so my algo doesn’t think 3 silvers on the first turn are better than 2 golds and 1 bronze, and after a while I plug in the actual multiplicative formula.
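
A minimal Python sketch of that selection step, with a hypothetical score_fn standing in for the medal-sum / multiplicative formula mentioned above:

from itertools import product

def pick_solution(my_front, opp1_front, opp2_front, score_fn):
    """Return my front-0 candidate with the best average score against every
    pair of opponent front-0 candidates."""
    pairs = list(product(opp1_front, opp2_front))
    return max(my_front, key=lambda sol: sum(score_fn(sol, a, b)
                                             for a, b in pairs) / len(pairs))

# Toy example: candidates are just numbers and the score is a made-up margin.
score_fn = lambda s, a, b: s - 0.5 * (a + b)
print(pick_solution([3, 5, 4], [2, 6], [1, 4], score_fn))   # -> 5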

What worked

  • Eugenics! It is trivial to find optimal diving strategies. It is also relatively easy to generate perfect runners en masse. Indeed, there are often plenty of different perfect runners (there are 15 ways to cover 7 empty squares in 3 turns! See the small sketch after this list). Kickstarting my evolution process with several of these gave amazing improvements.

  • Much bigger population size than expected. I assumed 100 would be a good number and ended up with 682 (683 would have made my population occupy a larger power-of-two number of bytes).

  • Using real time instead of CPU time greatly diminished the number of random timeouts I experienced.

  • Going all-in on not playing for skating gave much better results than one could expect. Sure, all my losses could be attributed to poor skating performance. But getting better at the 3 other games meant my enemies would either get starved of medals in the other games or have to deprioritize skating and play the game on my own terms.
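
The 15-perfect-runners remark can be checked with a tiny Python sketch, assuming hurdle moves advance 1 (LEFT), 2 (DOWN or UP), and 3 (RIGHT) squares when nothing is hit:

from itertools import product

ADVANCE = {"LEFT": 1, "DOWN": 2, "UP": 2, "RIGHT": 3}

# All 3-action sequences covering exactly 7 clear squares.
perfect = [seq for seq in product(ADVANCE, repeat=3)
           if sum(ADVANCE[a] for a in seq) == 7]
print(len(perfect))   # -> 15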

Possible Improvements

  • My non-domination sort is quite crude. I was unable to understand the O(n log n) approach explained in a paper, and I was concerned that the O(n²) approach that precomputes every domination relation would be too slow. I ended up with something that is O(n²) in the worst case but often better, and that may be slightly off past front 0. There might be a faster implementation out there, a more exact one, or even both.

  • I did not implement the crowding-distance computation. That may hurt in two ways: diminishing the genetic diversity during the evolution (even though we are guaranteed some baseline level of diversity due to sorting by non-domination), and, most importantly, if an opponent has multiple similar solutions on front 0, I will overfit against these because of how I pick which solution of my front 0 to play.

  • The way I pick which of my front-0 solutions to play is very flawed… I suppose it worked well in Gold because people would pick some locally optimal solution. But assuming we did get all locally optimal solutions (most of the time I had < 5, peaking at ~20 on a few turns per game), this is a Nash equilibrium problem. Iterated elimination of strictly dominated strategies would be a clear improvement… then pick an “anytime” Nash equilibrium search algorithm.

  • My algo doesn’t even know about the existence of skating. I guess you could consider it a fourth dimension, even though it might just add too much noise within the limited computation time. I tried, without success, to bias my final choice depending on skating, but only with mediocre heuristics. I’m pretty sure my algo would get better scores if I told it to drop everything on the last round to guarantee a gold medal in skating x)

  • I couldn’t set up a local testing environment, so there might be a lot of performance left on the table in tuning the parameters.

  • When a game ends, my population diversity crashes; there must be a smarter way to deal with it, e.g. making the algo understand that once a game has ended, we need to keep as many options open as possible in the following turns to accommodate the next edition.

A big thank you to the organizers for this thrilling competition!

6 Likes

Hello !

leojean890, 210/5237 (global), 15/1944 (python), 148/980 (gold)

Congratz to CG for having this interesting contest idea and organizing it, and congratz to all players !

My overall approach is a bruteforce (depth 5) applied each turn for each player.

First, I apply it to both opponents’ moves. I score each reached state at depth 5.

Eval: I consider that each opponent will play:

  • a move combination favoring several mini-games at once
  • mini-games with the least medals (3*g+s)
  • mini-games where their opponents have the least medals (to deny them)
  • mini-games whose outcome is NOT yet certain (if the gap between 2 players is huge in a mini-game, we may sometimes consider its outcome fully or almost determined) => I apply a weighting to this criterion too
  • a lower coefficient for skating and normalized coefficients for the other mini-games
  • I ignore mini-games that won’t end before turn 100, and I don’t restart games finished during my depth-5 sim with random parameters.

I store/return each opponent’s state maximizing this eval at depth 5 to use it in the second part.

Second, I apply this bruteforce on my own moves. I score each reached state at depth 5.

I use almost the same eval as described above, but mostly considering whether I’m beating the opponents’ “best reached state at depth 5”, and by how much, in each mini-game.

There is another criterion in the eval of my moves, with a bigger coefficient: I add virtual medals (who is currently winning each mini-game, or has already won it?)
and I calculate the real score (the multiplication) of each player considering the “new” amount of medals per mini-game per player.
This criterion is strongly weighted in the eval of my moves.

I play the move maximizing this global eval.
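
A small Python sketch of this virtual-medal criterion, assuming the game's multiplicative scoring (total score = product over mini-games of 3 × gold + silver, as I recall); the data layout is illustrative:

def total_score(medals):
    """medals: per mini-game (gold, silver) counts for one player."""
    score = 1
    for gold, silver in medals:
        score *= 3 * gold + silver
    return score

# Example: the same player with and without a virtual gold for the mini-game
# they are currently winning (here, mini-game 2).
me = [(2, 1), (1, 0), (0, 2), (1, 1)]
me_with_virtual = [(2, 1), (2, 0), (0, 2), (1, 1)]
print(total_score(me), total_score(me_with_virtual))   # 168 336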

4 Likes

Hi,

Sorry for the late reply. It was mostly the ICPC Challenge sponsored by Huawei (on Codeforces). I did that 3 times. But I also did AtCoder once or twice and some Topcoder.

There were also the Russian AI Cups, very interesting bot programming contests (harder than CG and lasting for 1 month; a nice local visualizer/tester was available). But I’m not sure what will happen with them in the future :slight_smile:

Finished 349/5237, around top 300 in the Gold League.

This is my best result in a contest so far, and it marks my 3rd contest finishing in the Gold League. I know it’s hardly impressive, but it is for me; I see progress, indeed very small and rather slow, but I feel improvement in myself.

I always have a very hard time managing my time during contests, and I usually end up spending most of it on a proper simulation of the game state; I always complain that games are very hard to simulate because we have to take into account physics, vectors, randomness and so on. But when I saw this task, I instantly realized that it would be simulation-friendly, and with the low branching factor, I said to myself that this was my time to shine :slight_smile: So here is what I did:

I was very busy during the contest: we had releases at work, I had to teach several lectures for a university course, 2 family vacations, … But I somehow managed to code for the contest at least 3 hours a day, mostly during the night, unfortunately.

Wood 2 league
I output the action, which gives the farthest position on the hurdle track, without a stun for my player.

Wood 1 league
Things started to get tricky. I tried several “if/else” strategies; nothing worked. In the end, I evaluated each move by summing up the positions of my players and choosing the move with the best sum.

Bronze/Silver leagues
Quickly realized that “if/else” strategies wouldn’t work. Decided to write a simulation of the game state right away; as expected, this time a decent simulation was done in a couple of hours, referencing the official referee.

Afterwards I tried to evaluate the moves directly, but I didn’t like the approach. For the first time I decided to implement a search algorithm early (before the Silver league was opened), and I was pretty sure that a Genetic Algorithm would do very well. I considered the task an optimization one (try to improve my score as much as possible), which is mainly why I chose a GA. I managed to implement it relatively quickly using almost the same approach and code that I described in my CG blog post.

The most important thing in a Genetic Algorithm is the evaluation function. My idea was to simulate the mini-games until the first Game Over and evaluate each separately:
HURDLE: the closer to the finish, the better
ARCHERY: the closer to the center, the better
SKATING: the more distance traveled, the better
DIVING: the bigger the score, the better
For each mini-game evaluation I also include bonuses for gold and silver medals. If I do not have medals in a mini-game, I divide its evaluation by 2.
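
A hedged Python sketch of such an evaluation, with illustrative weights rather than my actual values:

def eval_minigame(progress, golds, silvers, gold_bonus=50, silver_bonus=20):
    """progress: hurdle position / minus distance to center / travelled / dive score."""
    value = progress + gold_bonus * golds + silver_bonus * silvers
    if golds == 0 and silvers == 0:
        value /= 2                     # no medals yet in this mini-game
    return value

def eval_state(per_game):
    """per_game: list of (progress, golds, silvers) for the 4 mini-games."""
    return sum(eval_minigame(p, g, s) for p, g, s in per_game)

# Example: hurdle position 14, archery as negative distance, etc.
print(eval_state([(14, 1, 0), (-6, 0, 2), (23, 0, 0), (9, 2, 1)]))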

And this actually worked pretty well: when the Silver and Gold leagues were opened, I was promoted into them automatically. I couldn’t believe the results :slight_smile:

Gold league
Since my GA had played well so far, I thought it was the right choice and that I could reach the Legend league using it. I tried improving the simulation, the evaluation function, the initial population (starting with the perfect moves for all mini-games), tweaking the GA parameters… but unfortunately nothing worked, and for days I didn’t improve my rank.

I consulted the Discord server and saw that there were other players using a GA with better ranks than me. But the main topic was that the majority of players were using MCTS. I had implemented MCTS on CG before, but only for sequential games; in one simultaneous game I had managed to use MCTS. I tried to port it to this game, but only a couple of days were left and I couldn’t.

Wins
I’m very happy that I’ve managed to implement a search algorithm early in the contest.
I’ve tried to improve my approach based on my notes from the previous contest.

Mistakes
Considering the task as an optimization one was wrong. I didn’t include the opponent scores in the final evaluation of the state.
I ignored MCTS until the end of the contest.

Overall I liked the contest; I even got a little obsessed with it, and afterwards I had a very big desire to reach the Legend league. I’m happy to report that I managed :slight_smile:

Reaching top 100 in a bot contest is still on the list of my dreams :slight_smile:

1 Like

Can I ask you a question? Sorry to bother you.
I want to know: how do you choose the game to play when it’s not random?

I don’t know what you mean. You’re going to have to be more specific, referring to my post mortem.

You said :smiley:

I truncated my UCT forest after 5 moves. That meant I didn’t expand the tree beyond that and instead finished randomly. Finishing it completely is worse because your bot will think it can do a perfect diving game for example, because the other games don’t exist (in reality they do)

How do you backpropagate after 5 moves? On which nodes?

On the nodes of the UCT forest. You need to look into how that works before you can understand the post mortem. I have an article about it on my profile (a playground called Smitsimax)

Hi, can I ask you a question?
You said:

“mcts style” rollouts: I do fully random rollouts until all minigames are ended. So the depth can vary depending on the state of the turn (if all minigames needs to be finished, the tree will be deeper). For skating, I randomize the gpu at each search iteration (that feature was a noticeable boost)

What is a rollout? How do you select a node? Do you choose your node by UCT, then do a full random rollout, and backpropagate?
Thank you !!! :smiley:

In a plain old MCTS, you do your selection with UCT, and when you reach a non-expanded node, you expand it, pick a random child, and from there you do a simulation (which is also called a “rollout”) by playing randomly until the end of the game (more or less).

The answers to your questions depend on the algorithm; I’ll try to explain for Smitsimax, as it is the one I last submitted.

With Smitsimax you keep one MCTS tree per player, and play with all of them simultaneously. So basically:

  1. For each player, do a selection in its own tree (i.e. pick the action with UCT). If in a tree you reach a leaf node, expand it and pick an action randomly (hence the “rollout”). This gives an array of actions (one per player). Play it in order to move on to the next game state.
  2. Repeat step 1 until you reach the end of the game (in this case, I stopped when all minigames were finished, as I said).
  3. Once done, score the game state for each player and backpropagate it in that player’s own tree.

With Smitsimax, a node in a tree does not relate to a specific game state, but rather to an aggregation of what can happen if a player follows this line of play (the values that get backpropagated correspond to the many situations this line of play ended up in, while the other players did what they thought was best for them… which can differ across iterations, as “what is best” evolves independently for each player).
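
For completeness, here is a hedged, minimal Smitsimax skeleton in Python following the three steps above. The "game" is a deliberately silly stub so the code runs end to end; a real bot plugs in the Olymbits simulation and its own scoring, and all constants here are illustrative.

import math, random

C = 1.2
N_ACTIONS = 4

class Node:
    def __init__(self):
        self.children = None     # created on expansion
        self.visits = 0
        self.value = 0.0

def pick_child(node):
    """UCT selection; falls back to a random action for fresh nodes."""
    if node.visits == 0:
        return random.randrange(N_ACTIONS)              # rollout-style choice
    best, best_ucb = 0, -float("inf")
    for a, child in enumerate(node.children):
        if child.visits == 0:
            return a
        ucb = (child.value / child.visits
               + C * math.sqrt(math.log(node.visits) / child.visits))
        if ucb > best_ucb:
            best, best_ucb = a, ucb
    return best

def smitsimax_iteration(roots, step_fn, score_fn, state):
    """One descent through all trees at once, then per-player backpropagation."""
    nodes = list(roots)
    paths = [[r] for r in roots]
    while not state["done"]:
        actions = []
        for p, node in enumerate(nodes):
            if node.children is None:                   # expand leaf nodes lazily
                node.children = [Node() for _ in range(N_ACTIONS)]
            a = pick_child(node)
            actions.append(a)
            nodes[p] = node.children[a]
            paths[p].append(nodes[p])
        step_fn(state, actions)                         # advance the whole game
    for p, path in enumerate(paths):                    # each player's own score
        s = score_fn(state, p)
        for node in path:
            node.visits += 1
            node.value += s

# Stub game: 3 players, 5 turns, your score is how often you played action 2.
def step_fn(state, actions):
    state["turn"] += 1
    for p, a in enumerate(actions):
        state["scores"][p] += (a == 2)
    state["done"] = state["turn"] >= 5

roots = [Node() for _ in range(3)]
for _ in range(1000):
    state = {"turn": 0, "scores": [0, 0, 0], "done": False}
    smitsimax_iteration(roots, step_fn, lambda st, p: st["scores"][p], state)
print(max(range(N_ACTIONS), key=lambda a: roots[0].children[a].visits))  # typically 2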

As MSmits said, you have to understand the algorithm (and, from my point of view, the best way to understand it is to implement it :slight_smile: )

Hi, thanks for answering!!!
Could you look at my implementation on GitHub? I want to know if it’s a good Smitsimax implementation. The Smitsimax part begins at line 745:
This is the link : Olymbits/olymbits_codingame.py at main · regismeyssonnier/Olymbits · GitHub