Winter Challenge 2024 - Feedback and Strategies

#16 Legend

I described my approach here: CG's Winter Challenge 2024 - Neumann PM · GitHub

Congratz to the winners and thanks CG <3

10 Likes

#14 Legend

After skipping some number of contests in almost 4 years, I finally got simultaneously free time and game type that I adore: cells, simple rules, no physics. And the game delivered, I enjoyed it very much.

In short, my bot generates positions one move ahead, estimates them with an estimating function, and picks the best, several times per turn.

In long:

  1. Generate all possible non-pruned single-organism actions for both players. Some of the pruned actions: a tentacle into a wall, a tentacle towards the parent if some other non-wall neighboring cell exists, a harvester at nothing if a neighboring protein cell exists, … The total number of actions can exceed 800 in heavy situations.

  2. Estimate all generated actions against the waiting opponent, sort by estimate.

  3. Pick my top N actions and the enemy’s top N actions, then generate and estimate all NxN positions where both actions were taken.

  • Do this not row by row, but in a specific order, so that if we hit a timeout we are still left with a smaller n x n or (n+1) x n set of payoffs of the top actions.
  4. Treat the evaluations as payoffs of an NxN normal-form game (which it actually is) and solve it with some algorithm for normal-form games (I just copy-pasted my implementation of the CFR+ algorithm from Xmasrush 6 years ago; a regret-matching sketch follows this list). As my estimating function is not perfect, each player picks the action with the highest probability instead of sampling according to the probabilities, which would be done in the theoretically perfect case.

  5. Fix 1 picked action for me and 1 picked action for the enemy. Now try to find the 2nd action for me (if I have a 2nd root). Exclude from the list of non-pruned actions those belonging to the organism already used, and run simulations and estimations as in stage 1. Add the best action to my move.

  6. We now have 2 picked actions for me and 1 action for the enemy. If I have a 3rd root, find the best 3rd action similarly to stage 5.

  7. Output the up to 3 found actions, plus WAITs for my other roots (some of the found actions can be WAITs too).
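For illustration, here is a minimal sketch that solves such an NxN payoff matrix with plain regret matching (a simpler cousin of the CFR+ the author actually used); the payoff array, iteration count and names are assumptions, and the highest-probability action is picked at the end as described:

```python
import numpy as np

def solve_normal_form(payoff, iterations=1000):
    """Approximate equilibrium of a zero-sum NxM game via regret matching.

    payoff[i][j] = estimated value for player 1 when P1 plays action i
    and P2 plays action j. Returns the average strategy of each player.
    """
    n, m = payoff.shape
    regret1, regret2 = np.zeros(n), np.zeros(m)
    strat_sum1, strat_sum2 = np.zeros(n), np.zeros(m)

    for _ in range(iterations):
        # Current strategies proportional to positive regrets (uniform if none).
        s1 = np.maximum(regret1, 0.0)
        s1 = s1 / s1.sum() if s1.sum() > 0 else np.full(n, 1.0 / n)
        s2 = np.maximum(regret2, 0.0)
        s2 = s2 / s2.sum() if s2.sum() > 0 else np.full(m, 1.0 / m)

        # Expected payoff of each pure action against the opponent's strategy.
        u1 = payoff @ s2            # player 1 maximizes the payoff
        u2 = -(s1 @ payoff)         # player 2 maximizes the negated payoff
        regret1 += u1 - s1 @ u1
        regret2 += u2 - s2 @ u2
        strat_sum1 += s1
        strat_sum2 += s2

    return strat_sum1 / strat_sum1.sum(), strat_sum2 / strat_sum2.sum()

# As described above, each player then takes the single action with the
# highest probability instead of sampling:
# p_me, p_opp = solve_normal_form(np.array(estimates))
# my_action = int(np.argmax(p_me))
```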

Also I have an endgame detection mechanism (when the field is partitioned into 2 parts or the players are stuck in a tentacle loop) which bypasses the aforementioned stages and picks the top non-waiting actions if the number of turns left <= the number of cells left to fill. On the last day I saw an even more advanced endgame strategy from top bots: sometimes a killable opponent cell should not be killed immediately but only on the last turn, so the opponent has no time to refill the emptied space; I didn’t bother to implement it myself.

A few hacks with estimating function:

  • I call a cell defenselessly killable by the opponent if some adjacent cell is not attacked by the player but is attacked by the opponent (and the opponent has the resources to build a tentacle). All descendants of such cells are defenselessly killable too. During my BFS such cells do not emit waves, so their territorial influence is not counted.

  • During the BFS all sporers fire into all possible cells on the first wave (if the player has enough resources, plus some extra so as not to be left with 0 proteins of some type). This allowed my bot to build meaningful sporers without ever simulating 2 moves of the real game state, but it often started just building many sporers to keep the potential and never shooting. That forced me to add a bonus for 2+ roots to the estimating function, which partially solved the problem.

  • To prevent dying on a map with scarce resources I devised the following. I call a player sustainable if he harvests the A protein or any 2 of the B, C, D proteins. When estimating a non-sustainable state I roughly calculate “moves to sustain” - the number of moves needed to get into a sustainable state - and subtract it with some weight from the estimate (a simplified sketch follows this list). With distances to the closest proteins (you also need the 2nd closest for C and D) already computed by the BFS, “moves to sustain” can be calculated with a long sequence of ifs in O(1) time (ignoring that an actual grow action may decrease more than 1 closest distance). And if “moves to sustain” > the number of organs that can be built from the mined proteins, the estimate is heavily punished.
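As a rough illustration of the “moves to sustain” idea (a simplified sketch, not the author's O(1) chain of ifs; it ignores the 2nd-closest handling for C and D, and the distance inputs are assumed to come from the BFS described above):

```python
def moves_to_sustain(harvesting, dist):
    """Rough number of grow moves needed to reach a sustainable state.

    harvesting: set of protein types already harvested, e.g. {"C"}.
    dist: protein type -> BFS distance from my organs to the closest
          unharvested protein of that type (assumed precomputed).
    Sustainable = harvesting A, or harvesting any 2 of B, C, D.
    """
    pair = harvesting & {"B", "C", "D"}
    if "A" in harvesting or len(pair) >= 2:
        return 0

    # Option 1: reach and harvest the nearest A protein.
    options = [dist["A"]]

    # Option 2: complete a pair among B, C, D.
    missing = sorted(dist[p] for p in "BCD" if p not in harvesting)
    if len(pair) == 1:
        options.append(missing[0])               # one more type is enough
    else:
        options.append(missing[0] + missing[1])  # need two new types

    return min(options)

# In the estimating function (the weight is a placeholder):
# estimate -= 3.0 * moves_to_sustain(my_harvested_types, my_closest_protein_dist)
```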

My bot exceeded my expectations; I thought it would end up at the bottom of Legend, since I simulate only at depth 1 and can’t see optimal multi-move structures which could be found with something like beam search. I was really surprised by the low number of simulations + estimations per turn my bot could manage, around 1000-3000. I expected something closer to 100k, since the playing field is only around 200 cells. Probably I should have stuck to flood fill instead of BFS, or dived into bit magic to store each horizontal line in the bits of an int32.

Codingame team, thanks again for a great contest!

11 Likes

#24 Legend

Postmortem:
Cellularena PM - thebitspud (Rank 24, Legend)

gg to everyone who participated!

11 Likes

#1 Legend

You may find my postmortem here: reCurse's Codingame Winter Challenge 2024 Postmortem · GitHub

Thank you!

30 Likes

Thanks for the fascinating postmortem! A few questions:

  1. Does the policy directly output by PPO resemble anything like a Nash equilibrium, so you could just move according to that? Or does search make it significantly stronger?

  2. In the Search section, when you say “generate only a few likely candidate moves for each player, which I ended up fixing at 4”, are these complete actions (sets of moves)? (If so, how did you generate different move sets in the multi root case?)

1 Like

Legend #13

My solution uses a greedy search with a heuristic evaluation function. I perform a breadth-first search to find paths consisting of sequences of actions, and run the evaluation function on the board state resulting from each path. I compute the value of each path as the increase in heuristic score divided by the path length – in other words, the score gain per turn. I choose the highest-value path to keep, then search again starting only from organisms that have not yet moved, taking into consideration grid cells and proteins already used by previously chosen paths. The process repeats until no more positive-value paths are found. I execute the first action from each chosen path.
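A rough sketch of that outer greedy loop might look like the following; the helper names (find_paths, evaluate, path fields) are illustrative, not the actual code:

```python
def choose_turn_actions(state, organisms, evaluate, find_paths):
    """Greedy outer loop: keep the best path by score-gain-per-turn,
    then re-search with the remaining organisms.

    evaluate(state) -> heuristic score; find_paths(state, org, reserved)
    -> candidate action sequences. All names here are illustrative.
    """
    base_score = evaluate(state)
    reserved = set()              # cells/proteins claimed by chosen paths
    remaining = list(organisms)
    chosen = []

    while remaining:
        best = None
        for org in remaining:
            for path in find_paths(state, org, reserved):
                gain = evaluate(path.resulting_state) - base_score
                value = gain / len(path.actions)      # score gain per turn
                if value > 0 and (best is None or value > best[0]):
                    best = (value, org, path)
        if best is None:
            break                 # no positive-value path left
        _, org, path = best
        chosen.append(path)
        reserved |= path.used_cells
        remaining.remove(org)

    # Only the first action of each chosen path is actually played this turn.
    return [p.actions[0] for p in chosen]
```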

When my search reaches a new cell, I consider any plausible organ (e.g. harvester pointing at a protein, tentacle not pointing at a wall) as the last action in the path, but only one default organ for continuing the search to longer paths. The default organ is typically the one whose protein cost I can most afford (pointing in a good direction if possible), but is sometimes overridden to opportunistically harvest or attack. Sporers are also allowed to continue the search via sporing.

In a typical BFS, once a cell has been reached for the first time, it is marked visited and is ignored when the search reaches it again later. But in this problem, reaching a cell via growth is different from reaching it via sporing, because only growth can create useful organs like harvesters and tentacles. So I consider those two cases to be distinct states that can be reached separately. Also, paths of equal length reaching the same cell are not necessarily equally good, due to things like absorbing or harvesting different proteins along the way. To expand the number of paths considered, I search twice: once preferring horizontal moves and once preferring vertical moves. Thus, each cell can be reached in four ways total (grow vs. spore, and via horizontal- vs. vertical-preferring paths).
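A minimal sketch of that state-keyed BFS, assuming cells are (x, y) tuples and a neighbors function that reports whether a cell is entered by growing or sporing; everything here is illustrative rather than the actual implementation:

```python
from collections import deque

def search_paths(start, neighbors, prefer_horizontal, max_len=7):
    """BFS over (cell, how-reached) states instead of plain cells.

    neighbors(cell) yields (next_cell, via) with via in {"grow", "spore"},
    so growing into a cell and sporing into it are tracked separately.
    """
    visited = {(start, "grow")}
    queue = deque([(start, [])])
    paths = []

    while queue:
        cell, path = queue.popleft()
        if path:
            paths.append(path)
        if len(path) >= max_len:
            continue

        def order_key(entry):
            (nx, ny), _ = entry
            dx, dy = abs(nx - cell[0]), abs(ny - cell[1])
            return dy if prefer_horizontal else dx   # preferred axis first

        for nxt, via in sorted(neighbors(cell), key=order_key):
            if (nxt, via) not in visited:
                visited.add((nxt, via))
                queue.append((nxt, path + [(nxt, via)]))
    return paths

# Run twice and merge, so each cell can be reached in up to four ways:
# all_paths = (search_paths(root, neighbors, prefer_horizontal=True)
#              + search_paths(root, neighbors, prefer_horizontal=False))
```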

Late in the contest, I restricted the path length to 7 actions based on local testing.

My evaluation function includes a number of hand-tuned pieces:

  • the number of proteins of each type harvested per turn
  • the current number of proteins of each type
  • a notion of territory, including:
    • the portion of the grid closer to me than to my opponent
    • the portion of the grid within a distance of 2 from my organs (reduced to 1, then 0, towards the end of the game to encourage filling the board)
  • the number of roots

Many of these conditions use an exponentially decreasing function, so for example the first harvester is worth twice as much as the second, etc. The conditions are evaluated separately for both players and subtracted.
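As a small illustration, an exponentially decreasing feature value and the symmetric subtraction could look like the following (the decay factor of 0.5 and the feature names are assumptions, not the actual tuned weights):

```python
def decayed_total(count, decay=0.5):
    """Total value of `count` items when each extra item is worth `decay`
    times the previous one: 1 + 0.5 + 0.25 + ... (geometric series)."""
    return (1.0 - decay ** count) / (1.0 - decay)

def evaluate(my_features, opp_features, weights):
    """Symmetric evaluation: score both sides the same way and subtract."""
    def side(features):
        return sum(w * decayed_total(features[name])
                   for name, w in weights.items())
    return side(my_features) - side(opp_features)

# e.g. weights = {"harvesters": 10.0, "roots": 5.0}; the first harvester
# then counts twice as much as the second, and so on.
```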

While the evaluation function is symmetric, the search is totally asymmetric because it assumes the opponent makes no moves. I tried running the search from the opponent’s perspective and using the first turn’s worth of opponent actions when performing the search from my perspective, but it made things worse. Eventually I found one way to incorporate opponent actions: in the evaluation function, I determine which of my organs the opponent can kill in one move and which proteins the opponent can absorb in one move, and treat them as though they don’t exist. I only do that when computing my half of the heuristic score.

7 Likes

Does the policy directly output by PPO resemble anything like a Nash equilibrium, so you could just move according to that? Or does search make it significantly stronger?

By Nash equilibrium, do you mean directly using the policy as a mixed strategy? In some ways I guess you could in theory. I noticed the policy distribution tends to get very sharp for critical moves, and less so when the network is uncertain (multiple moves could be good or bad, or a dilemma is encountered) or the outcome is considered decided anyway, so moves don’t matter much.

However, I have not really encountered a situation or a game where sampling the policy gave better performance. That is probably because a few more things must be considered. First, the policy is entropy-regularized (as per PPO) during training, which means we force the distribution to stay more open so it keeps exploring different moves; otherwise training can get stale and make the policy collapse. So the alternative moves may not be any good in most situations, but may still be worth exploring in some. Second, neural networks are lossy, noisy and imperfect. They have blind spots and they have errors, so search helps ground the network in reality and smooths out the noise. In my local psyleague, adding search gives around +4 points vs. just using max-policy.

In the Search section, when you say “generate only a few likely candidate moves for each player, which I ended up fixing at 4”, are these complete actions (sets of moves)? (If so, how did you generate different move sets in the multi root case?)

They are complete actions built with the iterative policy method I outlined. I just used a very simple variation on the first move: the most likely combination is generated first, then the process starts over by masking the first move of the previous set and building another action, until it reaches the limit. It’s most likely not the best way to go about it, but it’s simple and worked well enough in practice.

2 Likes

Yes, it’s true: I ended up in 22nd place with just a 0.08-point gap behind @wala.
What did I have? A complex heuristic system that prioritized selecting the best global action in a conditional order, with a very strong emphasis on attack and defense actions.
How did I manage Sporers? I calculated the shortest time to reach important fields on the map (important in terms of map topology and resources that neither I nor my opponent had claimed yet). I then selected the most common initial action leading to those fields.
The first version of my “most common first action” strategy was written in about 2 hours on the first day of the contest, and my program looked like this (a sketch follows the list):

  • Check if I can attack the enemy with a distance of 2 or 3 from me (1 or 2 empty fields in between).
  • Check if I can play a first action leading me to the closest enemy root.
  • Check if I can harvest resources with a distance of 2 (1 empty cell).
  • Check if I can move to the nearest resource.
    … and that was enough to get into the Gold league! :wink:
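For illustration, such an ordered cascade can be expressed as a list of (condition, action) rules tried in priority order; the rule names in the comment are hypothetical and just mirror the checks above:

```python
def first_action(state, rules):
    """Evaluate ordered (condition, action) rules; the first rule whose
    condition finds a target decides the move. Rule functions are illustrative."""
    for condition, action in rules:
        target = condition(state)
        if target is not None:
            return action(state, target)
    return "WAIT"

# Hypothetical rules in the priority order listed above:
# rules = [
#     (enemy_within_2_or_3,        grow_tentacle_towards),   # attack first
#     (closest_enemy_root,         grow_towards),
#     (resource_at_distance_2,     place_harvester_towards),
#     (nearest_resource,           grow_towards),
# ]
```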
4 Likes

Well done!

Inspired by your previous feats, I also went for an RL approach with a CNN, but I was short on time so I only submitted one model on the last day and ended up 309th. This is still not so bad considering the amount of time I lost on many secondary issues (like figuring out how to set up the self-play env, or the NN compression into 100k chars).

I must say, I faced the same issues as you did and solved them similarly (although not as cleanly). I also had very poorly optimized training code, and probably suboptimal hyperparams. And of course I didn’t have the search. You being able to get all of this working so fast amazes me. :slight_smile:

One thing that I think may help your bot, if you did not think of it or try it yet: using a U-Net. For me this was much better than a ResNet (and much faster!). It allows the model to have a maximal receptive field, which is probably better than the 17x17 of your ResNet. But I only tried ResNets in the early stages of the project, so maybe the ResNet is actually better. I also quickly tried a vision transformer, but I’m afraid it would be quite slow on CPU.

I also got faster convergence by pretraining the model on a simple prediction task (predicting what the state will be in 20 steps, with my bronze bot playing against itself).

8 Likes

That’s a good idea, it does have potential; I might try it at some point later out of curiosity. I went with a ResNet mainly because I had almost all of the related code optimized and ready from my other CG bots. It’s no secret, I get things running fast largely because of all the time I have spent doing it outside of challenges. :slight_smile: Thanks for the suggestion and your input, much appreciated.

2 Likes

Hi! Congrats on getting it working! 309th is not bad. Did you have experience building NNs? I always get inspired by reCurse PMs to try myself but I get discouraged by having very little knowledge to begin with.

Thanks! Well I work with neural networks daily - I’m a PhD student in that field - which of course helped me a lot. I also had some experience in reinforcement learning, although this is further from my comfort zone.
Still, I am sure anybody can do it; it will simply take more time since you have to learn things along the way. This is something you could start right now, so that you will be more prepared for the next challenge ^^

If I have some advice for starting from scratch, that would be:

  • use Python, at least for the training part. I think reCurse uses C++ (edit: actually they use Python, with some C++ for accelerating things like the game simulation), but Python is much friendlier and has many libraries that will do most of the work for you: PyTorch (or TensorFlow) for the neural network part, and Gym + Stable-Baselines3 for the RL algorithms. This is what I used for this project. The only issue with Python is simulating the environment, which is quite slow.
  • learn about basic neural networks: how they work, how they learn. There are plenty of introductions online, like this one.
  • learn about reinforcement learning: understand at least the main concepts (agent, reward, observation, action, environment). Once again maybe try out some tutorials like here and here.
  • try a past CodinGame challenge. I think Olymbits (Summer 2024) is a good start, since it casts very easily to RL. And you have reCurse’s PM to help out!
  • for having a RL bot in a codingame challenge, I think the roadmap looks something like this:
  1. reproduce the game as a RL environment
  2. use PPO (for instance) to train an NN on this environment. Ideally we want to use self-play (the NN plays against itself), but you may start by having another opponent (simpler to introduce in the framework; unfortunately self-play is not well supported by the libraries, so we have to be hacky).
  3. export the trained NN into a CodinGame bot. This step involves compressing the weights of the NN into a (very long) string, to make it fit under the 100k character limit of CodinGame (a compression sketch follows this list). Then in the final bot you decompress these weights and use the NN to process the game state and output the action it wants to take.
  4. improve. There are maaaaany things to try for potential improvement. Better hyperparameters, better observations, reward shaping, neural network architecture.
  5. reCurse also used an MCTS to add a search on top of the NN. I think this is not too important; first focus on getting a strong NN!
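As one possible illustration of step 3, the weights can be quantized to float16, compressed with zlib and base64-encoded; this is a sketch under those assumptions, not the exact scheme used by anyone in this thread:

```python
import base64, zlib
import numpy as np
import torch

def export_weights(model, path="weights.txt"):
    """Pack a trained PyTorch model into one printable string:
    float16 quantization -> zlib -> base64."""
    flat = np.concatenate([p.detach().cpu().numpy().ravel()
                           for p in model.parameters()]).astype(np.float16)
    blob = base64.b64encode(zlib.compress(flat.tobytes(), 9)).decode()
    print(len(blob), "characters")        # must stay under CG's ~100k limit
    with open(path, "w") as f:
        f.write(blob)

def import_weights(model, blob):
    """In the submitted bot: decode the string and refill the parameters."""
    flat = np.frombuffer(zlib.decompress(base64.b64decode(blob)),
                         dtype=np.float16).astype(np.float32)
    offset = 0
    with torch.no_grad():
        for p in model.parameters():
            n = p.numel()
            chunk = torch.from_numpy(flat[offset:offset + n].copy())
            p.copy_(chunk.view(p.shape))
            offset += n
```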

This message is getting longer than planned, but hopefully it may motivate some to try it out! :smile:

Edit: @reCurse you’re the professional here, feel free to correct me if I missed something ^^’

18 Likes

Thanks for the detailed response! I’ll try to start with something simple and build up from there. Cheers!

Great post. Only a small correction.

I use Python, PyTorch, etc. for training as well. It’s only the game simulation and specific logic that’s in C++ and leveraged from Python.

4 Likes

I’d be interested in the interface you use between your C++ game simulation and Python. Do you use batches for better performance? Any specific libraries/frameworks you use?

My approach is to have objects as small as possible, avoiding dynamic allocations as much as possible. So my object creation is quite fast. I guess I should benchmark both approaches…

Yes, the interactions happen over 512 games at once, as I mentioned in the PM. I use pybind11 to expose the C++ as a Python module.

1 Like

You said you use one beam search per organism; are they independent? In what order do you process each root?

I sort them by proximity to rival organs to prioritise attack/defense. Mainly attack really, since I don’t simulate the rival…

1 Like

#3 Legend, C++

Congratulations and Acknowledgments

Congratulations to @reCurse for the win, @MSz for the runner-up spot, and @marwar22 for holding first place for a long time.

Separate congrats for top-performing representatives of our school: @MSz @marwar22 @gaha @Zylo

Contest timeline

The timeline of this contest is simply cruel. Too long, and during the holidays.

Humans (and cats) really deserve some rest, not having their only regeneration time during the semester ruined by a coding marathon.

Each year, I wish I had the will to boycott the event, and each time I fail to do so.

Game

Excellent game design.

In terms of AI, it was, of course, awful: multi-action turns with a variable number of actions, simultaneous moves, a huge branching factor, and a need for long-term planning. But yeah, it was challenging to deal with.

Approach

Search

A combination of beam search and greedy choice, depending on the depth.

Greedy evaluates all generated actions, chooses the best, applies it, and goes back to generation, until there is no improvement or a given action depth is reached.

There are separate beam searches for each turn depth, operating at the action-depth level.

Beam search uses global and local beams; I remembered the idea mentioned in our Hypersonic paper with DomiKo et al.

My max reachable depth is set to 7 - if I allow my bot to search deeper, the results are worse. :person_shrugging:

Opponent prediction

  • Beam search for depth 1

  • Greedy for larger depths

Evaluation

Classical hand-crafted evaluation, linear combination of features, symmetric (zero-sum).

Main features

  • organism size, bonus for the player with a larger organism

  • closer Voronoi cells, bonus for the player with a larger area

  • resources and income (values for the first, <= n, and more). There is no decisive answer as to which resource was the most important; the weights that worked for me seemed all over the place.

  • threats: penalties when the opponent’s organism is near and my cell is not guarded by a tentacle, and a large penalty when the spot is already guarded by an opponent tentacle

  • endgame fill: bonus Voronoi-based score after some predefined turn number

  • plus some other minor weirdo things

Scoring

The score at each depth was s = previous_score * γ + current_depth_score (γ < 1), with an additional depth-dependent weight for each level.

The score for the final choice is s / number_of_visits + α * initialState_score.

So my scoring formula is rather overcomplicated, but I couldn’t find anything simpler without a loss in performance (a small sketch of the scoring follows).
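Written out as code, the scoring above might look like this (the γ and α values are placeholders, not the actual tuned constants):

```python
GAMMA = 0.9    # depth discount (γ < 1); the actual value is not given
ALPHA = 0.5    # weight of the initial-state score; placeholder value

def accumulate(previous_score, current_depth_score):
    """Score carried along a search branch: s = previous * γ + current."""
    return previous_score * GAMMA + current_depth_score

def final_choice_score(s, number_of_visits, initial_state_score):
    """Score used for the final move choice at the root."""
    return s / number_of_visits + ALPHA * initial_state_score
```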

Move generation

  • sporer only if a predefined number of cells ahead are empty; the opposite cell has to be a wall/opponent

  • roots not adjacent to my cells, limit to 7 roots

  • harvesters only when targeting unharvested resource

  • tentacle only if a cell is close to the opponent

  • plus other rules if I cannot afford the basic ones

Visualization

Python script producing images based on the agent’s output.

10 Likes