Sofia Labs Coding Challenge - Feedback & Strategies

This is the topic where you can tell how you liked the challenge and how you solved it.



I used a UCT forest for both the 1v1 and 1v1v1 part of the game. On CodinGame, I’m aware UCT forests mostly get referred to by the nickname of @MSmits’s implementation of a UCT forest, which is a good read if you are unfamiliar with them. This approach requires a discretization of the action space, which I did by dividing the action space into 12 uniformly spread directions, while using a thrust value of 100 each time.

Boost detection

A key part of my bot, especially for the 1v1, is the detection of boosts used by the opponent. It is not given in the input whether the opponent used boost in the last turn or how many boosts the opponent has left and since this information is essential to be able to properly simulate boosts, it is required to estimate whether the opponent used boost in the last turn based on the game state that is given. A first idea that I had was to see if the opponent’s velocity vector had a large displacement w.r.t. the previous turn. This worked very well if the opponent’s pod was not involved in a collision between turns, in which case it did not work well at all. Thus, something better was required.

After tinkering about a more clever way to detect boosts, I realized that in some cases it is extremely difficult to differentiate between a boost move and a non-boost move. Therefore, I opted to do a somewhat exhaustive check of various possible points (e.g. 1000) that the opponent could have boosted towards. If I see that the opponent’s pod is within some epsilon of the position that I received in the input, then I assume that a boost was used in the previous turn.

While this approach worked much better than my previous approach, it was unfortunately still suffering from incorrect detections, both false negatives and false positives, even after experimenting with the amount of directions and the epsilon. At this point, I’m still not sure why this is the case and am wondering if it could be due to a mistake in my simulation.


Some people may have observed that my bot plays the 1v1 part of the game differently from how most bots play. The peculiarity of it has even brought up the notion of a ‘cheese’ strategy. However, the playstyle of trying to prevent the opponent from reaching the next checkpoint is my best attempt at approximating optimal play and does not seem to provide a (trivial) counter, which by definition excludes it from being a ‘cheese’ strategy. To the best of my knowledge, the 1v1 part of the game boils down to ‘block or get blocked’.

An issue with this part of the game is that the set of maps that is used in the game tends to lead to highly unfair starting positions. While the pods start symmetrically w.r.t. the first checkpoint, they are not symmetric w.r.t. the second checkpoint. This results in one of the pods being on the inside of the turn towards the second checkpoint and the other pod being on the outside of the turn. The pod on the inside has a tremendous advantage. It may even be the case that the pod on the inside is always in a theoretically won position, but due to the simultaneous move aspect of the game I do not feel confident enough to conjecture this. Nonetheless, I was unable to find a counterexample and thus I would like to challenge others to find one.

The easiest way to resolve this would be to make the maps fully symmetric, i.e. to make all the checkpoints collinear. With this setup, I imagine a strong bot will be able to punish small tactical inaccuracies of a weaker bot.


In the 1v1v1, it was my observation that two roughly distinct strategies emerged on the leaderboard. The most common strategy, and the one which I also used, is to use the provided boosts in an aggressive manner at the start of the game with the aim to obtain an advantageous position and maintain it throughout the rest of the game. The other strategy, which to my knowledge only @Neumann implemented successfully, was to start the game at a slower pace, often ending up in third position, and later on use the saved boosts to surpass the others to ideally get the first place or at least secure the second place.

I do not think either strategy is inherently superior, although the slow strategy was better suited to play against the bots that are present in the leaderboard. If the game were to run for a prolonged time, then I suspect that some kind of strategic equilibrium would be reached. This is because a slow bot has an advantage when playing against two fast bots (common case), but does not have an advantage when playing against two other slow bots or against a fast bot and a slow bot. An analogous statement can be made for fast bots, thus showing that neither has an inherent advantage.

I tried out both strategies, but was unable to match @Neumann playing the slow strategy. Therefore it was my decision to sacrifice some of my overall win rate by playing the fast strategy, so I would at least have a good win rate against the other top three participants, since @Neumann’s pod most likely won’t be able to surpass my pod if I manage to win the early battle against the other player.

I’m curious how @Neumann managed to become so proficient at the slow strategy, especially the jump from Saturday to Sunday was incredible.

Lastly, I want to thank Sofia Labs, CodinGame and all of the participants for their efforts. I think it’s great that we have the opportunity to play these kinds of games with such a high level competition on a convenient platform.



Congratulations to the top 3 who were really above the rest of the leaderboard. Many thanks to those who organize this contest and to all participants. It was a pleasure to chat with you.

My early bot was a variant of the well known -3*v CSB heuristic. Given that the turn rate is not limited, there is much less drifting and the coefficient has to be different. After various tests, I found out that values between 2.0 and 2.5 were much better. I also performed a boost right after reaching a checkpoint. Despite its simplicity, it was a quite strong bot which I believe would have ranked around 40 in the final leaderboard.

My initial goal was to design a CSB-like GA bot: train my opponents against heuristic bots and train my bot against the trained opponents. Unfortunately, I could not find an evaI for a believable bot behaviour.
I then focused on improving my trajectory ignoring completely possible collisions with my opponents. Lessons learned from Search Race were useful. I kept also the idea of boosting early after each checkpoint.

This “fast” strategy proved quite efficient even sometimes against top 3 (I won one of my two games against Daporan !).


4 th
I used this challenge to try Simultaneous GA, some setup as described by pb4 to me during our work on Bit Runner 2048, but with tweaks I figured by reading the book ‘Algorithms to Live By’ during my easter holiday.


  • Create 1 population for each player of 20 genomes each, randomize them.
  • While there is time: Run 20 x NumPlayers fights where you have a sorted list of genomes (sorted by score)

This code creates fights and ensures each genome is used at least once:

                for (var i = 0; i < Amount; i++)
                    DoFight(p1.States[i], p2.Contestant, p3.Contestant, eval);
                    DoFight(p1.Contestant, p2.States[i], p3.Contestant, eval);
                    if(Game.NumPlayers == 3)
                    DoFight(p1.Contestant, p2.Contestant, p3.States[i], eval);

The Contestant value is found by using this code to generate a normalized picking order of the genomes, as they were sorted by score:

I guess there is a better way, but it would create an array with 1000 items of the values 0 - 20, favoring the better items more. Might be better to normalized based on the difference in score, but that’s for next time. Then simply pick a random item and use that as the index to pick genomes:
public static int GetTarget() => Chance[rnd.Next(0, 1000)];
  • A genome contains the details about Usages, TotalScore and AvgScore.
  • After all the fights on 1 round: (20 x NumPlayers)
    • Find the new AvgScore of each Genome
    • Multiply TotalScore and Usages with a discount factor of 0.999
    • Sort each population, keep the 10 best genomes and generate 10 more with mutations, randomization or crossovers. I also add some tiny score for the amount of usages.
  • When time is up, pick the genome in the final population which is used the most, not the highest score as that often is a genome with 1 usage being lucky…
    • Why pick the one with the most usages now? Because this is the one that survived the most times vs the current best population of enemy tactics.

I have found some flaws to this approach:

  • Hard to detect if a population is filled with totally equal genomes
  • Counters of strategies might get lost in a previous population as strategies fluctuate back and forth
  • Many games are run with the exact same player vs enemy

Thanks to CG for providing a free platform to have this amounts of fun. GG
And well played to the top 3 :slight_smile: (and the rest)


Nobody else? :slight_smile:
For the record, I adapted my CSB GA, found bugs, it was very average and I finished in an inconsequential position.
It was, however, very much fun and I wish I’d joined the contest earlier in the week.
Very demanding leaderboard, some great bots in there.
Well done to @Daporan and the rest of the top - that was some close competition!
That’s the closest you’ll ever get to a PM from me :stuck_out_tongue:
Also: @Illedan interesting approach!


Congratulations @Daporan and thanks for the ping. I haven’t followed this contest closely, but it seemed like a fun one. UCT forest seems like a very apt name for the algo.

Hope to see you a little more on CG. Maybe on the next contest?


If I google “UCT forest” I get some image of people on yoga mats, but if I search “smitsimax” I get the nerdy stuff I was looking for.

I rather prefer very specific names over common names (Breakthrough game…), so I’ll keep calling it Smitsimax no matter what.

I don’t think calling it smitsimax helps. But doubt UCT forest is the correct / complete name since google would return the right results if it were.

CadiaPlayer seems to be the most accurate name for it:

See search results for reference:

EDIT: nope, nevermind, that’s not the right name for it either although it’s linked in the smitsimax article

Yeah i skimmed through that cadiaplayer article and I wasn’t sure. Maybe that’s not what it is. Haven’t seen any other stuff that is similar, but I haven’t looked that hard.

EDIT: by the way @Marchete, I get the yoga mat thing too! Maybe the great google brain thinks we like the same stuff.

14th :cry:
I tried a sort of concurrent Particle Swarm Optimization, with a rather huge swarm (~ 300 particles), and a depth of 5 turns.

Particles were initialized uniformly in position (each 360°/N degrees), and the search space was limited for thrust to 100 and boost (boost were systematically tried on first two depths at initialization).

After testing some configurations, I ended with a strong “personal best” search factor vs a weak “global best” one. The idea behind this being that in practice the variables of the search space are correlated (changing the orientation of the first depth deeply impacts the following moves), so it did not really make sense to try to make deeper moves converge if they do not share less deep moves.

To counter that, what I did that really worked was to start optimization with particles uniformly distributed for ~ 50 iterations, and then each 50 iterations, the swarm was reset at the global best solution, with uniformly distributed random speeds. So during the first cycle of optimization, the overall draft of the moves was found, and then fine tuned by next cycles of iterations.

I ran in fact one swarm optimization per pod, which gave satisfying results.

Overall, the optimization part was not the hard one for me. I somehow got lost with my cost functions, adding more and more tuning variables to it, and ended up unable to find a cost function able to compete with the bests.

That’s something I’ve learned during this contest, and will try to better do next time: keep the cost function simple, and don’t loose control over it (when I had 8 different variables weights to tune, it became impossible…).


@jfaixo Do you find much improvement using PSO over a (rolling horizon) GA?

I did not try to solve it with a GA, so I cannot compare. It is rather a personal preference for this kind of problem, because I can “easily” visualize what is happening.

But in practice the two approaches are very close : each particle was composed of Depth pairs of (rotation, thrust) (similar to what you call rolling horizon I think), and each particle evolve by being influenced by its own velocity vector, and personal best and global best solutions. Depending on how you manage your GA mutations, it can be very close.
In fact there are some PSO variants that introduces mutation operators…

For me the main advantage was that it is very quick and straightforward to code a PSO (it’s less than 200 lines all included), without having to think about how to implement mutations, and also pretty easy to tune (there are only 3 hyperparameters involved).