Summer Challenge 2025 - Post Mortem

Hello, I finished #26 in Legend.

First of all, thanks to CG for continuing to host these events; they are always nice moments, and I’m particularly glad to have been able to properly participate in this one, as I didn’t have enough time for some of the previous ones (kids…).

Approach

My approach is very similar to others already described above: a UCT forest (aka smitsimax).

Some details

  • as the game allows it, I decoupled moves and actions, using two depths of nodes per turn (depth 1 is move nodes, depth 2 action nodes, depth 3 move nodes, etc.). This drastically reduces the number of nodes in the trees. For each turn, I do the following (see the sketch after this list):
    • Pick a move node per agent out of the 5 possible
    • Do the move part of the simulation with the moves picked by all agents
    • Pick an action node per agent from the pool of valid actions => so the action is picked knowing all the other agents’ future positions. I don’t think that’s an issue, as it gives the most plausible/interesting valid actions the biggest visit count (for example, a highly interesting action that can almost never actually be played will have a low visit count)
    • End the simulation of the turn & score
  • I allow almost all actions (only some of the closer bomb targets are not generated) for the first 2 turns; starting at turn 3, bomb actions are not generated, to avoid having too many nodes
  • during action selection, I filter out some undesirable actions: self-damaging bombs, and bombs that have a chance of not hitting
  • I implemented the sim with TDD, with unit tests based directly on game inputs, which allowed easy refactoring without breaking my test suite. Bitboarding is used only for the scoring part
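
For illustration, here is a minimal, self-contained sketch of one iteration of that per-turn loop (all names and numbers are hypothetical, and the real move/action simulation is reduced to placeholder comments); it only shows the structure: every agent picks a move node, moves are resolved together, every agent then picks an action node knowing the new positions, and the result is backpropagated in each agent’s own tree.

    // Minimal sketch (hypothetical names, not the actual bot) of one iteration of the
    // decoupled smitsimax loop: every agent first commits to a move node, then, once all
    // moves are known, to an action node, and the result is backpropagated per tree.
    #include <cmath>
    #include <cstdlib>
    #include <vector>

    struct Node {
        double total = 0.0;
        int visits = 0;
        std::vector<Node> children;

        // Standard UCB1 selection among the children; returns the index of the chosen child.
        int selectUCB() const {
            int best = 0;
            double bestValue = -1e18;
            for (size_t i = 0; i < children.size(); ++i) {
                const Node& c = children[i];
                double value = (c.visits == 0)
                    ? 1e18
                    : c.total / c.visits + 1.41 * std::sqrt(std::log(visits + 1.0) / c.visits);
                if (value > bestValue) { bestValue = value; best = (int)i; }
            }
            return best;
        }
    };

    int main() {
        const int agents = 10, iterations = 4000;
        // One tree per agent: a root with 5 move children, each move child with (say) 6 action children.
        std::vector<Node> roots(agents);
        for (Node& r : roots) {
            r.children.resize(5);
            for (Node& m : r.children) m.children.resize(6);
        }

        for (int it = 0; it < iterations; ++it) {
            std::vector<int> moveIdx(agents), actionIdx(agents);
            // Depth 1: every agent picks a move node.
            for (int a = 0; a < agents; ++a) moveIdx[a] = roots[a].selectUCB();
            // (real bot: simulate all the picked moves here, so positions are up to date)
            // Depth 2: every agent picks an action node, knowing everyone's new position.
            for (int a = 0; a < agents; ++a)
                actionIdx[a] = roots[a].children[moveIdx[a]].selectUCB();
            // (real bot: simulate the actions, finish the turn, then evaluate)
            for (int a = 0; a < agents; ++a) {
                double score = (double)std::rand() / RAND_MAX;   // placeholder evaluation
                Node& move = roots[a].children[moveIdx[a]];
                Node& action = move.children[actionIdx[a]];
                roots[a].visits++; roots[a].total += score;
                move.visits++;     move.total += score;
                action.visits++;   action.total += score;
            }
        }
    }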

Things that didn’t go well, or that I tried and that spectacularly failed

  • To compensate for the fact that move collisions can cause a whole backpropagation to be done in the wrong tree branch, I tried to “fix” the tree by detecting collisions after the move simulation and changing the selected node to the “no move” node, to better reflect what is happening. It made my agents almost completely static… I still don’t truly understand why this failed so hard (well, I can see that it’s because it makes the “no move” node over-selected, but still…)
  • My search depth was too high (depth 5, when games often ended in ~25 turns…), but I never took the time to fix it properly, preferring instead to try to improve my evaluation function. That was a bad call, as my number of full smitsimax iterations was a bit low (~4k when all 5v5 agents are alive)
4 Likes

How did you get the NN evaluation into CG?

There aren’t any relevant packages available for that, are there? So I figure one would need to implement the required layers manually. NumPy likely helps a lot to keep the code size low, but when using e.g. C++ it sounds infeasible to include a whole (optimized) matrix multiplication library, although a basic one is easily possible.

So I’m wondering if there is a better way, or if that’s just how it is.

I used Python, so yeah, I relied on NumPy for handling tensors, matrix multiplication, etc.
I don’t know C++ well enough to be sure, but I think a naive loop would still be fast enough.
Also, I actually didn’t care much about speed since I only needed to evaluate the NN once per turn. But if the NN were used inside a search algorithm, then you would care more about optimization.
I think reCurse implemented their previous NN bots in C++, and with search.
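
For what it’s worth, the “naive loop” in C++ really is only a few lines per layer. Here is a hedged sketch of a dense layer with ReLU, assuming the weights and biases have been exported from the trained model and embedded in the source:

    // Minimal sketch of hand-rolled NN inference: one dense layer followed by ReLU.
    // Weights and biases are assumed to be exported from the trained model and embedded in the code.
    #include <vector>

    std::vector<float> denseRelu(const std::vector<float>& input,
                                 const std::vector<std::vector<float>>& weights,  // [out][in]
                                 const std::vector<float>& bias,
                                 bool relu = true) {
        std::vector<float> out(bias);                      // start from the bias vector
        for (size_t o = 0; o < out.size(); ++o) {
            for (size_t i = 0; i < input.size(); ++i)
                out[o] += weights[o][i] * input[i];        // naive matrix-vector product
            if (relu && out[o] < 0.0f) out[o] = 0.0f;
        }
        return out;
    }

    // Usage: chain the layers by hand, e.g.
    //   auto hidden = denseRelu(features, w1, b1);
    //   auto value  = denseRelu(hidden, w2, b2, /*relu=*/false);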

Top 10 using reinforcement learning; here is my post-mortem:

12 Likes

I use separate trees for movement and combat.

The movement tree selects actions without considering the combat tree.
Similarly, the combat tree selects actions without considering the movement tree.
(However, I did restrict obviously meaningless actions, such as attacking agents that will be out of range due to movement.)
Backpropagation of rewards is also handled independently for each tree.
So effectively, there are twice as many trees as there are agents.

I haven’t measured precisely, but each combat tree likely has around 5 to 10 children:

  • Do nothing (1)
  • Attack an enemy agent within range (0 to ~5)
  • Throw a bomb (0 to ~5)
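
To make that concrete, here is a rough, self-contained sketch of how such a combat-child list could be enumerated (the types, field names and the bomb reach constant are all hypothetical, not the actual bot):

    // Hypothetical sketch of enumerating the combat children described above:
    // do nothing, shoot each enemy in range, throw a bomb at each enemy in reach.
    #include <cstdlib>
    #include <vector>

    struct Agent { int x, y, shootRange, bombs; };

    enum class Kind { Hunker, Shoot, Throw };
    struct CombatChild { Kind kind; int targetIndex; };     // targetIndex = -1 for Hunker

    std::vector<CombatChild> combatChildren(const Agent& self, const std::vector<Agent>& enemies) {
        const int BOMB_REACH = 5;                           // placeholder value, not the real rule
        std::vector<CombatChild> children;
        children.push_back({Kind::Hunker, -1});             // "do nothing" (1)
        for (int i = 0; i < (int)enemies.size(); ++i) {
            int dist = std::abs(enemies[i].x - self.x) + std::abs(enemies[i].y - self.y);
            if (dist <= self.shootRange)                    // "attack an enemy within range" (0 to ~5)
                children.push_back({Kind::Shoot, i});
            if (self.bombs > 0 && dist <= BOMB_REACH)       // "throw a bomb" (0 to ~5)
                children.push_back({Kind::Throw, i});
        }
        return children;                                    // typically ~5 to 10 entries
    }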
2 Likes

Hey everyone!
I reached 225th (65th in gold) but I still want to share my experience!

Humble beginnings

As I often do with contests, I started using JavaScript so I could quickly prototype something and get a feel for the rules.
This worked rather well, and I simply implemented the most naive approach: if a bomb touches at least 2 enemies, throw it (without considering my agents or enemy movements). If not, try to shoot, aiming for the highest value of dmg / (100 - wetness), which prefers targets that are closer to death. Finally, my movement would simply move each agent closer to the nearest enemy.
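
As a small illustration (in C++ to match the other snippets in this thread, and with made-up field names), that shooting rule is essentially:

    // Minimal sketch of the "prefer targets closer to death" rule: pick the enemy
    // maximizing damage / (100 - wetness). Names and fields are illustrative only.
    #include <vector>

    struct Enemy { int id; double damageICanDeal; double wetness; };   // wetness in [0, 100)

    int pickShootTarget(const std::vector<Enemy>& enemies) {
        int best = -1;
        double bestScore = -1.0;
        for (const Enemy& e : enemies) {
            double score = e.damageICanDeal / (100.0 - e.wetness);     // higher wetness => higher score
            if (score > bestScore) { bestScore = score; best = e.id; }
        }
        return best;   // -1 if there is nobody to shoot
    }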

Quite the simplistic logic but it was enough to get me started and understand the rules.
Then I decided it was time to switch from JS to Rust.

Fighting the urge to search

Having ported my JS algorithm to Rust, I was obviously getting similar results.
I hesitated a lot about the approach: refine the heuristics or move to a search algorithm like MCTS.
I used MCTS exactly once in the past, for Ultimate Tic-Tac-Toe.
I didn’t feel very confident about applying it to this contest because of the simultaneity of the game. So I decided to instead focus on smarter heuristics.
I started with the obvious one: consider moving BEFORE throwing/shooting. My heuristic was so simple I could see that my opponents would almost always shoot/throw one turn before my agents.
This wasn’t difficult in practice: find every cell I can move to and then use my throw/shoot heuristics to compute a score. Move to the most promising cell and perform the most promising action.
This resulted in a big performance improvement, as one would expect.
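
In code, the idea boils down to a small loop over the reachable cells that reuses the existing action scoring. A rough sketch with hypothetical types, where the throw/shoot heuristic is passed in as a callable:

    // Sketch of "consider moving BEFORE throwing/shooting": score the best action from every
    // reachable cell and commit to the most promising (cell, action) pair. The action scoring
    // is passed in as a callable, since it is the same throw/shoot heuristic as before.
    #include <functional>
    #include <vector>

    struct Cell { int x, y; };
    struct Plan { Cell moveTo; int actionScore; };

    Plan bestMoveThenAct(const std::vector<Cell>& reachableCells,                    // includes staying put
                         const std::function<int(const Cell&)>& bestActionScoreFrom) {
        Plan best{reachableCells.front(), -1};
        for (const Cell& c : reachableCells) {
            int score = bestActionScoreFrom(c);             // reuse the existing heuristic
            if (score > best.actionScore) best = {c, score};
        }
        return best;
    }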

I started working on a full simulation of the referee, just in case I decided to perform a search in the end.
It took more time than I’d hoped but was interesting to do. (Thanks a lot to Codingame for providing the referee’s code, that’s very useful for simulations and self-play)

Despite my decision not to use a search, I spent about 4 hours every evening trying to make a crude version work, before going to sleep thinking “That’s stupid, I should focus on heuristics, my search doesn’t yield any results whatsoever”.

Another heuristic improvement I made was the handling of the sniper agent. I decided (through several hardcoded if statements) that the sniper would play like a sniper and find a nicely positioned cover to hide behind until there were fewer than 3 agents alive on my side.
This had a very positive impact on my scoring. The sniper would deal damage while being shielded from it and a little out of reach for bombs. And having an agent stay at low wetness greatly helped with scoring as well.

Using a search anyway

After reaching the gold league, I noticed I had some nice arrays containing potential actions from my heuristics. Moving East and throwing a bomb at A had the same score as moving North and throwing a bomb at B.
Up until then, I had arbitrarily selected the first one in the list, but I saw the opportunity for a smaller-scale search.
After wondering about MCTS one last time, I decided it was time to try a beam search for the first time in my life.
The search would be very limited. My previous attempts naively selected all moves and all possible actions which obviously resulted in a gigantic search space I had no hopes of traversing. This time, though, I would drastically reduce scope:

  • If an agent can throw a bomb, its only possible actions are the throw/move combinations that result in the max bomb score.
  • If an agent can shoot, its only possible actions are the shot/move combinations that result in the max shot score.
  • If an agent can do neither of those, its actions are the 5 possible movements.

Moreover, the enemies would not use a search but instead my previous heuristics. This is certainly not ideal since I would effectively be fighting a weaker version of myself but at least I had an opponent to simulate.

My beam search was ready. I arbitrarily chose a width of 5 with a depth of 5 to get started. Timeout.
Haha my bad, I overestimated the numbers. Width of 5 with depth of 2. Timeout.
Okay now it’s getting ridiculous. Width of 3 with depth of 2. Hey it works! I beat the previous version in 2% of the self matches.

Not surprising, a search is awful if you can’t consider enough situations. After investigating a little more, I noticed that 91% of the execution time was spent in the A* pathfinding. I had asked Claude AI to convert the official pathfinding to Rust. It worked fine when I didn’t care about performance but it was awful when playing a lot of games.
I quickly threw it out and replaced it with a naive BFS that performed no heap allocations at all. It didn’t even show up in the profiler anymore.
At this point, I noticed I was able to use a beam width of 20 with a depth of 25. I also made it smarter so that it stops when a time deadline is reached. (So in practice I don’t reach the depth of 25 very often, but it goes as deep as it can in the allotted time.)
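
For reference, an allocation-free grid BFS like the one described is just fixed-size arrays and an index-based queue. A sketch of the idea (in C++ to match the other snippets in this thread, with a made-up grid size):

    // Sketch of an allocation-free grid BFS: fixed-size arrays and an index-based queue,
    // no std::queue / std::vector in the hot path. The grid size is a made-up constant.
    #include <cstdint>
    #include <cstring>

    constexpr int W = 20, H = 10, CELLS = W * H;

    // Fills dist[] with BFS distances from (sx, sy); blocked[] marks impassable tiles.
    void bfs(int sx, int sy, const bool blocked[CELLS], std::uint8_t dist[CELLS]) {
        static int queue[CELLS];                 // reused across calls, never allocated
        std::memset(dist, 0xFF, CELLS);          // 0xFF = unreachable
        int head = 0, tail = 0;
        int start = sy * W + sx;
        dist[start] = 0;
        queue[tail++] = start;
        const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
        while (head < tail) {
            int cur = queue[head++];
            int cx = cur % W, cy = cur / W;
            for (int d = 0; d < 4; ++d) {
                int nx = cx + dx[d], ny = cy + dy[d];
                if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
                int next = ny * W + nx;
                if (blocked[next] || dist[next] != 0xFF) continue;
                dist[next] = dist[cur] + 1;
                queue[tail++] = next;
            }
        }
    }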

And that’s where I stopped! I’m rather proud since it’s my first time reaching the gold league during a contest. I hoped to reach Legend, but next time I’ll try to work on the contest during its whole duration instead of only the last weekend…

A word about LLMs

These past few years, I’ve often used LLMs as interactive rubber ducks.
I would give them a prompt like “I’m working on X. I’m gonna dump my brain and you will ask me questions to challenge my approach”. It would then ask me a lot of questions about the topic which would force me to think deeper about it.
It’s an extremely useful approach; I recommend you try it sometime.

I’ve also tried a few times to have the LLM write code for me.
My general conclusion is that they are way too stupid to write proper complex/deep code.
However, they can be extremely useful for one-shot, self-contained code.
Concrete examples: a Python script that would perform some specific operations on AWS. A set of unit tests for a function you give it. Or even a small benchmark framework to measure the performance of a function.

Basically, the LLM should not be considered a brain but a pair of hands. The human should perform the thinking but the LLM can execute the plan. We simply need to be very precise and tell it exactly what we want AND how we want it.

Though I’ll admit I’m a little bit salty that Whatever managed to make a bot that reached Legend using Gemini. (Discord message)

The wood leagues

I have moved my feedback about the Wood leagues to a dedicated forum topic.

Thank you

Alright, that’s it for my small post-mortem / feedback. I’m sorry I couldn’t write more, I’m a lazy guy.

Thanks a lot to the Codingame team for these contests and for the platform in general. It’s pure joy for someone like me.
Thanks a lot to the community for the chat/help on Discord and for the post-mortems I really enjoy reading.

See you all in the next one!
Telokis

4 Likes

Thank you for the post-mortem, it is really inspiring. If I understand correctly, every beam iteration runs the GA? How many iterations does the GA use to find a good combination of actions?

12th legend (the same bot is now 8th in multi)

Depth-2 smitsimax. Eval based on current points, agents’ HP, and distance to the opponent.

No optimizations, nothing smart. The same bot with depth 1 was in the top 1-3 during Silver and Gold.

3 Likes

Hey everyone!
Finished at #49

In short: My AI is a heuristic-driven Min-Max algorithm with depth 1. At a micro level, I generate about 55 possible commands per agent (move + action), then at a macro level I combine these into up to 512 command sets per player. I simulate all 512 × 512 = 262,144 duels with a depth 1 lookahead, evaluate outcomes, and pick the command set that maximizes my minimum guaranteed score.

More in detail:

  • For each of my agents, I generate about 55 possible commands. Each command combines a move with an action (shooting, bombing, or hunkering). These commands are chosen based on smart heuristics — such as terrain control gain, proximity of enemies or allies, and shooting feasibility from that position.
  • Then, I combine commands from all my agents into sets, but to keep things manageable, I limit myself to 512 command combinations total for my team. I assume the enemy also has up to 512 combos.
  • That means I simulate up to 512 × 512 = 262,144 possible duels, with a depth 1 lookahead. I don’t do deeper searches — just a simulation of what happens if I pick a certain command set and the enemy responds.
  • I use a classic Min-Max approach with depth 1: for each command set I consider, I simulate all enemy responses and keep the worst-case outcome for me. Then I pick the command set that yields the best worst-case result, basically maximizing my minimum guaranteed score (see the sketch after this list).
  • For evaluation, I consider kills, semi-kills (50% wetness), and wetness inflicted. For terrain scoring, I use only precomputed scores per agent move because computing it dynamically for each simulation would be too heavy. Finally, I use a BFS precomputation to estimate battlefield proximity.
  • I heavily optimized performance using SIMD instructions and preallocated memory so I can run all these simulations quickly in real time.
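
Once all duels have been simulated into a score matrix, the max-min step itself is a very small piece of code. An illustrative sketch (not the actual bot):

    // Sketch of the depth-1 max-min step: given a matrix of all myCombo x enemyCombo duels
    // (already simulated and evaluated), pick the command set with the best worst case.
    #include <algorithm>
    #include <limits>
    #include <vector>

    int pickCommandSet(const std::vector<std::vector<double>>& score) {   // score[mine][enemy]
        int bestSet = 0;
        double bestWorst = -std::numeric_limits<double>::infinity();
        for (size_t mine = 0; mine < score.size(); ++mine) {
            double worst = std::numeric_limits<double>::infinity();
            for (double s : score[mine]) worst = std::min(worst, s);      // the enemy picks my worst case
            if (worst > bestWorst) { bestWorst = worst; bestSet = (int)mine; }
        }
        return bestSet;
    }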
6 Likes

Legend 55th

I had a great blast playing this event.

I gave up in the wood leagues multiple times because of how strict the goals were and how unrealistic the scenarios were compared to real player games, but luckily I came back and, as is tradition, spent lots of late nights debugging my own silly mistakes.

For each agent (allies and enemies) I set up a beam search (with a reduced width for enemies).

I generate the combat actions separately from the move actions and combine them if the combat action is possible after having done the move.

For each of the other beam searches I grab the best state and use its actions to generate the new game state to evaluate.

This allowed for great teamwork but caused problems with throwing bombs (so I miss a lot of them completely).

The heuristic value of the game state is determined like this:

const Heuristic h = players[playerId].calculateHeuristic() - players[!playerId].calculateHeuristic();

Where the heuristic takes into account controlZones, the number of alive agents, the number of bombs left, and (negatively) the total wetness of the player’s agents.
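
A rough shape of that per-player heuristic (the weights below are placeholders, not the actual values):

    // Hypothetical shape of calculateHeuristic(): control zones, alive agents and remaining
    // bombs count positively, total wetness counts negatively. The weights are placeholders.
    struct Player {
        int controlZones = 0;
        int aliveAgents = 0;
        int bombsLeft = 0;
        int totalWetness = 0;

        double calculateHeuristic() const {
            return 10.0 * controlZones
                 + 50.0 * aliveAgents
                 +  5.0 * bombsLeft
                 -  1.0 * totalWetness;
        }
    };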

Still many issues in my code that maybe I’ll fix once my sleep pattern has recovered.

3 Likes

#41 in Legend

My bot uses a UCT forest with one tree per agent, similar to what others did. I’m branching only on the move actions, but I keep score and visit counts for both types of actions at each node. When an agent picks a combat action during simulation, it’s only aware of its own move history. Depth limit of 2, but I remove it when my team has 2 agents or fewer.
When reaching leaf nodes, I do map_height number of rollouts before using an evaluation function. Move actions in rollouts are random, but for combat actions, I pick a single one according to the priority you’d expect: throw if I have a good target, shoot if possible, otherwise hunker_down.

My algorithm is not particularly good, but I still managed to reach Legend for the first time—most likely due to how my workflow improved over time. If you’re like me and you don’t write these kinds of algorithms for a living, there’s virtually no chance you’ll do it without bugs in one go, so the most impactful thing you can do is to make your implement-test-debug cycle as short and painless as possible.

In order of importance:

  1. Have a local setup where you can compile and test your code.

  2. Have a way to reproduce any game state from any replay locally (see the sketch after this list). I have a GameInput structure that I serialize into a space-separated string of values every turn to stderr, which I can parse back locally into a GameInput object.

    • If I use random, I use a fast implementation that’s deterministic across platforms (e.g. XorShift) and seed it with the turn number.
    • If my algorithm is timed, I also serialize the number of iterations made.
    • Don’t use fast-math (-ffast-math), to avoid floating-point non-determinism (C++).
  3. Use sane development practices starting with Bronze league, e.g.:

    • Bounds-checked arrays in debug builds instead of raw arrays
    • const-declare your magic numbers
    • Use a for loop even if your array has very few elements, instead of:
 eval[0] = health[0] + bombs[0] + points[0];
 eval[1] = health[0] + bombs[1] + points[1];

do:

         for (int player = 0; player < NUM_PLAYERS; ++player) {
             score[player] = health[player] + bombs[player] + points[player];
         }

(The version without the loop has a bug that’s pretty hard to spot in a larger codebase.)

  4. (Optional) Use a local arena, e.g., psyleague, cg-brutaltester.
  5. (Optional) Implement your own custom UI game state visualizer, useful if you want to make sure your search does what you’d expect. I didn’t do one for this contest.
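
Here is the sketch referenced in point 2: a deterministic XorShift PRNG seeded with the turn number, and a GameInput that round-trips through a space-separated string dumped to stderr. Field names are hypothetical; only the structure matters:

    // Sketch of the workflow from point 2: a deterministic XorShift PRNG seeded with the turn
    // number, and a GameInput that round-trips through a space-separated string on stderr.
    #include <cstdint>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>

    struct XorShift32 {
        std::uint32_t state;
        explicit XorShift32(std::uint32_t seed) : state(seed ? seed : 1u) {}
        std::uint32_t next() {                   // same sequence on CG and locally
            state ^= state << 13;
            state ^= state >> 17;
            state ^= state << 5;
            return state;
        }
    };

    struct GameInput {
        int turn = 0;
        std::vector<int> values;                 // whatever the real inputs are, flattened

        std::string serialize() const {
            std::ostringstream out;
            out << turn << ' ' << values.size();
            for (int v : values) out << ' ' << v;
            return out.str();
        }
        static GameInput parse(const std::string& line) {
            std::istringstream in(line);
            GameInput g;
            std::size_t n;
            in >> g.turn >> n;
            g.values.resize(n);
            for (std::size_t i = 0; i < n; ++i) in >> g.values[i];
            return g;
        }
    };

    // On CG:      std::cerr << input.serialize() << std::endl;
    // Locally:    GameInput input = GameInput::parse(lineCopiedFromTheReplay);
    // Randomness: XorShift32 rng(input.turn);   // deterministic, reproducible per turn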

I liked the game and enjoyed the contest very much.
Many thanks to the CG staff for organizing it, to the other participants for creating a very competitive environment, and congrats to @bowwowforeach for winning it.

2 Likes

I still managed to reach Legend for the first time

Congrats, I dream of doing that someday too :slight_smile:

     for (int player = 0; player < NUM_PLAYERS; ++player) {
         score[player] = health[player] + bombs[player] + points[player];
     }

Is this indeed your evaluation, or did you use something fancier?

It’s mostly that: it returns 0 for a loss, 3 for a win, or, if the game is still ongoing, a weighted sum of the health, bombs, and current points difference. This value is in the range (0, 3).
I weight the gunner’s health at 2x and bombs are worth 20 health.
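
Reading that back as code, the shape might be something like this (the “gunner at 2x” and “bomb = 20 health” weights are from the post above; the exact normalization into (0, 3) is a guess):

    // Hedged sketch of the evaluation described above: 0 for a loss, 3 for a win, otherwise a
    // weighted health/bombs/points comparison squashed into (0, 3). Only "gunner counted twice"
    // and "a bomb is worth 20 health" come from the post; the normalization below is a guess.
    double evaluate(bool iLost, bool iWon,
                    double myWeightedHealth, double oppWeightedHealth,   // gunners counted at 2x
                    int myBombs, int oppBombs,
                    int myPoints, int oppPoints) {
        if (iLost) return 0.0;
        if (iWon)  return 3.0;
        double mine   = myWeightedHealth  + 20.0 * myBombs  + myPoints;
        double theirs = oppWeightedHealth + 20.0 * oppBombs + oppPoints;
        double ratio = mine / (mine + theirs + 1e-9);        // in (0, 1), 0.5 = balanced
        return 3.0 * ratio;                                  // in (0, 3)
    }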

1 Like

A huge thank you to the entire CodinGame team for organizing the Summer Challenge 2025! I had an incredible amount of fun tackling this complex game and pushing my bot to its limits to achieve a final rank of 79th in the Legend league. For anyone interested in the strategy and code behind my bot, I’ve written a detailed post-mortem here: https://github.com/yassine-nid/Summer-Challenge-2025-Post-Mortem

3 Likes

Thanks for this PM, this is very interesting. There’s something I fail to understand, though. After your adversarial GA step, you get a pool of pure strategies. The additional CFR+ step then computes an optimal mixed strategy, which is a mixture of the pure strategies obtained in the previous step. So when selecting the component with the highest probability, you eventually end up selecting one of the pure strategies in the final GA population, or am I missing something? I don’t really see how the last CFR+ step can improve on selecting the fittest individual in the final GA population (especially when fitness is based on performance against the whole adversary population)…

Hello, yes, your description is correct, one of the pure strategies is selected at the end. In theory the best unexploitable approach is to play a mixed strategy, but I found that did a bit worse in practice for some reason.

The CFR+ step basically helps filter the GA pool, because not all of the opponent’s strategies are necessarily good, and some of them might be dominated by others, so an opponent should never play them. It will often pick just 3-4 strategies out of the larger pool to mix between, for both players.

If we’re picking one pure strategy based on performance vs the pool, it’s not clear how to weigh the opponent’s different strategies. If we just do equal weight, it’s possible the pool has lots of slightly inferior strategies and we don’t want to be optimizing for performance against them. CFR+ basically figures out the “weights” for us.

I did play around with a lot of different pool evaluation schemes (for the earlier GA steps), because it’s also not clear if we want to be maximizing the average score, the median score, the min score, or some combination. I found that maximizing the “average of the bottom 20%” of pool scores did the best in testing, which is pretty funny. Adding the Nash equilibrium step at the very end did give a really big boost to performance in my local testing.
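
To illustrate how the final step computes those weights: for a one-shot zero-sum matrix game (rows = my GA pool, columns = the opponent’s pool, entries = my score), plain regret matching over the payoff matrix already converges to such a mixture. A minimal sketch (illustrative only; CFR+ as used here additionally clips regrets at zero and weights the averaging, but the idea is the same):

    // Minimal regret-matching sketch for a zero-sum matrix game: rows are my pool of pure
    // strategies, columns the opponent's, payoff[r][c] is my score. The averaged row strategy
    // approximates the equilibrium mixture, i.e. the "weights" over the pool mentioned above.
    #include <algorithm>
    #include <vector>

    // Current strategy: play each action proportionally to its positive cumulative regret.
    std::vector<double> normalizePositive(const std::vector<double>& regret) {
        std::vector<double> strat(regret.size(), 0.0);
        double sum = 0.0;
        for (double r : regret) sum += std::max(r, 0.0);
        for (std::size_t i = 0; i < regret.size(); ++i)
            strat[i] = sum > 0 ? std::max(regret[i], 0.0) / sum : 1.0 / regret.size();
        return strat;
    }

    std::vector<double> solveMyMixture(const std::vector<std::vector<double>>& payoff, int iters) {
        std::size_t R = payoff.size(), C = payoff[0].size();
        std::vector<double> regretRow(R, 0.0), regretCol(C, 0.0), averageRow(R, 0.0);
        for (int it = 0; it < iters; ++it) {
            std::vector<double> sr = normalizePositive(regretRow);
            std::vector<double> sc = normalizePositive(regretCol);
            // Expected payoff of each pure strategy against the other player's current mixture.
            std::vector<double> uRow(R, 0.0), uCol(C, 0.0);
            for (std::size_t r = 0; r < R; ++r)
                for (std::size_t c = 0; c < C; ++c) {
                    uRow[r] += sc[c] * payoff[r][c];
                    uCol[c] -= sr[r] * payoff[r][c];          // the opponent gets the negation
                }
            double value = 0.0;
            for (std::size_t r = 0; r < R; ++r) value += sr[r] * uRow[r];
            for (std::size_t r = 0; r < R; ++r) regretRow[r] += uRow[r] - value;
            for (std::size_t c = 0; c < C; ++c) regretCol[c] += uCol[c] + value;
            for (std::size_t r = 0; r < R; ++r) averageRow[r] += sr[r];
        }
        for (double& p : averageRow) p /= iters;              // average strategy ~ the mixture
        return averageRow;
    }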

1 Like

Thanks, it is very clear now!

Sorry for the late reply.

Yes, I am running the GA in every iteration of the BS. You can think of the GA as part of the “get_legal_moves” function in the BS pseudo-code. The number of iterations is tuned (manually) depending on the depth, but for my actions it is somewhere between 20-400 iterations, and for the opponent it was between 5-40.