Thanks! Well, I work with neural networks daily (I’m a PhD student in that field), which of course helped me a lot. I also had some experience in reinforcement learning, although that is further from my comfort zone.
Still, I am sure anybody can do it; it will simply take more time since you have to learn things along the way. This is something you could start right now, so that you are more prepared for the next challenge ^^
If I had to give some advice for starting from scratch, it would be:
- use Python, at least for the training part.
I think reCurse uses C++ (edit: actually they use Python, with some C++ to speed up things like the game simulation), but Python is much more friendly and has many libraries that will do most of the work for you: PyTorch (or TensorFlow) for the neural network part, and Gym + Stable-Baselines3 for the RL algorithms. This is what I used for this project. The only issue with Python is simulating the environment, which is quite slow.
- learn about basic neural networks: how they work, how they learn. There are plenty of introductions online, like this one.
- learn about reinforcement learning: understand at least the main concepts (agent, reward, observation, action, environment). Once again maybe try out some tutorials like here and here.
- test a past codingame challenge. I think Olymbits (Summer 2024) is a good start, since it is very easy to cast as an RL problem. And you have reCurse’s PM to help out!
- to get an RL bot into a codingame challenge, I think the roadmap looks something like this:
- reproduce the game as an RL environment (see the first sketch after this list)
- use PPO (for instance) to train an NN on this environment (second sketch below). Ideally we want self-play (the NN plays against itself), but you may start with a fixed opponent, which is simpler to plug into the framework; unfortunately self-play is not well supported by the libraries, so we have to be a bit hacky.
- export the trained NN into a codingame bot (third sketch below). This step involves compressing the weights of the NN into a (very long) string, to make it fit under the 100k character limit of codingame. Then in the final bot you decompress these weights and use the NN to process the game state and output the action to play.
- improve. There are maaaaany things to try for potential improvement: better hyperparameters, better observations, reward shaping, the neural network architecture...
- reCurse also used an MCTS to add a search on top of the NN. I think this is not too important; first focus on getting a strong NN!
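To make the roadmap a bit more concrete, here are three rough sketches, one per step. They are only illustrations under made-up assumptions: the class name MyGameEnv, the file name my_bot, and the 16-float observation / 4 actions are placeholders to replace with the actual game. First, the environment, written against the Gymnasium API (the Gym fork that recent Stable-Baselines3 versions expect; older versions use the original gym package):

```python
import numpy as np
import gymnasium as gym  # older Stable-Baselines3 versions use the original "gym" package
from gymnasium import spaces


class MyGameEnv(gym.Env):
    """Toy stand-in for the real game: a 16-float observation, 4 possible actions."""

    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(16,), dtype=np.float32)
        self.action_space = spaces.Discrete(4)
        self.turn = 0

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.turn = 0
        obs = self.np_random.uniform(-1, 1, 16).astype(np.float32)
        return obs, {}

    def step(self, action):
        # Replace this with one turn of your reimplementation of the game engine,
        # and compute the reward (e.g. +1 for a win, -1 for a loss at game end).
        self.turn += 1
        obs = self.np_random.uniform(-1, 1, 16).astype(np.float32)
        reward = 0.0
        terminated = self.turn >= 200  # game over
        truncated = False
        return obs, reward, terminated, truncated, {}
```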
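Training (step 2) can then be as short as this, with PPO from Stable-Baselines3 and default hyperparameters. For self-play, the usual hack is to keep a frozen copy of an earlier policy inside the environment and let it pick the opponent's moves in step():

```python
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

# MyGameEnv is the class from the previous sketch. Running several copies in
# parallel helps a lot, since the Python simulation is slow.
vec_env = make_vec_env(MyGameEnv, n_envs=8)

model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=1_000_000)
model.save("my_bot")
```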
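Finally, a rough sketch of the export (step 3), assuming the SB3/PyTorch model saved above: flatten the weights, cast them to float16, compress and encode them into one big string that you paste into the bot's source. In the submitted bot you reverse the process and reimplement the forward pass yourself (for example with plain numpy). This version dumps the whole policy (value network included) and uses simple base64; to fit under the limit you can strip the parts the bot does not need and use a denser encoding, but the idea stays the same:

```python
import base64
import zlib

import numpy as np
from stable_baselines3 import PPO

model = PPO.load("my_bot")  # the model saved in the previous sketch

# Flatten every parameter tensor, halve the size with float16, compress, encode.
params = [p.detach().cpu().numpy().astype(np.float16)
          for p in model.policy.parameters()]
flat = np.concatenate([p.ravel() for p in params])
weights_str = base64.b64encode(zlib.compress(flat.tobytes(), 9)).decode("ascii")
print(len(weights_str))  # must end up under codingame's 100k character limit

# In the final bot: decode back to a flat float16 array, then slice it into the
# individual weight matrices (whose shapes you know) to run the forward pass.
raw = np.frombuffer(zlib.decompress(base64.b64decode(weights_str)), dtype=np.float16)
```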
This message is getting longer than planned, but hopefully it will motivate some of you to try it out!
Edit: @reCurse you’re the professional here, feel free to correct me if I missed something ^^’