My beam search has a depth of only 3, so I simulate only 3 turns ahead. My evaluation function is costly because it runs a BFS, so I only perform between 500 and 1000 node evaluations per turn. I probably should have followed blasterpoard's approach, with a simpler evaluation function, more depth/width, and a different algorithm for fights, but I used the same algorithm for everything.
Thanks everyone for this contest!
Legend 39th here with JS, and here is the PM of my brute-force heuristic AI: Winter Challenge 2024 - Cellularena - Apofils PostMortem - Legend 39h - JavaScript · GitHub.
I hope it can help Gold players to have new ideas or to improve their bot to get to Legend!
GL HF!
Thanks for the PM! Quick question: in your experience, is the object pool much faster than creating objects on the stack (assuming the objects are of reasonable size, of course)?
Yes, object creation is time consuming, so having a pool improves performance. I never benchmarked by how much, but it's significant. You can use the 1s first turn to create every object you will need, and then reuse them.
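For anyone unfamiliar with the pattern, here is a minimal object-pool sketch in Python (the `Node` class and pool size are made up for illustration): everything is allocated once during the long first turn and then reused on later turns instead of being re-created.

```python
# Minimal object-pool sketch. The Node class is a hypothetical placeholder
# for whatever search/state objects the bot allocates each turn.
class Node:
    def __init__(self):
        self.x = 0
        self.y = 0
        self.score = 0.0

    def reset(self):
        self.x = 0
        self.y = 0
        self.score = 0.0

class NodePool:
    def __init__(self, capacity):
        # Pre-allocate every node up front (e.g. during the 1s first turn).
        self._nodes = [Node() for _ in range(capacity)]
        self._next = 0

    def acquire(self):
        node = self._nodes[self._next]
        self._next += 1
        node.reset()
        return node

    def release_all(self):
        # Called at the start of each turn: every node becomes reusable again.
        self._next = 0

pool = NodePool(50_000)          # first turn: allocate once
# later turns: pool.release_all(); node = pool.acquire(); ...
```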
Hello everyone!
I finished as Legend #48 (in JavaScript)! I joined the competition for fun, with no particular expectations, so I'm thrilled to have made it this far. Competing against my brother was a big motivator, pushing me to keep improving right up until the final days ^^
Big thanks to Codingame for organizing this challenge, and congrats to the winners!
I wasn't keen on using genetic algorithms or Monte Carlo approaches because of the huge number of parameters, interactions and moves to evaluate each turn. Instead, I focused on basic one-turn strategies, which I refined progressively until the final day.
In the first few days, my bot was overly aggressive and quickly ran out of resources, especially on maps with limited proteins. So I improved its protein-harvesting strategies and added defensive actions, like placing tentacles at bottlenecks on the map.
Taking actions
One of my biggest challenges was figuring out how to prioritize actions across multiple organisms while ensuring distant organisms didn't take resources needed by those engaged in combat.
I ended up breaking the whole strategy into small parts and optimizing them separately. Each turn, I find candidate organisms for each type of action, then move on to the next action type with the remaining available organisms, and so on (a sketch follows the list below).
- Close Defense
- Two-Cell Defense
- Identify Bottlenecks: Secure progression using tentacles.
- Spawn New Root
- Harvest Proteins
- Create New Sporer
- Find Missing Proteins: Pathfind for necessary proteins.
- Cherry-Pick Proteins
- Expand: Maximize cells reached first.
- Consolidate: Expand to free cells if no other organisms have valid moves.
- Finish: Expand to all free cells, even if proteins are harvested.
- Wait
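As a rough illustration of that pipeline, here is a sketch in Python (the strategy objects and helper names are hypothetical, not the actual bot code): each strategy, in priority order, claims the organisms and cells it needs, and later strategies only see what remains.

```python
# Rough sketch of the priority pipeline described above (all names hypothetical).
def plan_turn(strategies, organisms, board):
    actions = []
    available = set(organisms)       # organisms that have not acted yet
    reserved_cells = set()           # cells already targeted this turn
    for strategy in strategies:      # ordered: Close Defense, ..., Wait
        for organism in list(available):
            action = strategy.find_action(organism, board, reserved_cells)
            if action is not None:
                actions.append(action)
                available.discard(organism)
                reserved_cells.update(action.cells_used)   # hypothetical attribute
    return actions
```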
Final Days
On the last day, I squashed some obvious bugs, ensured no two organisms targeted the same cell, and experimented with different action orders. For example: should I prioritize cherry-picking proteins or expanding first?
Introducing Cherry-Pick and Identify Bottlenecks boosted my bot into the top of the Gold League. I made it to Legend on the final night of the challenge!
@Apofils, thanks for your detailed PM! I'll definitely give the visualizer and Psyleague a try for the next challenge.
Bye
#5 Legend
- Find the best d-depth plan. The list of possible moves at each step is pruned to the most promising ones. Assume just one move per turn and that the opponent does nothing. d==4 on the first turn, then 1<=d<=3 later (mostly d==1).
- With the plan(s) from the first step locked in place, keep repeating the first step, greedily adding moves/plans (a rough sketch follows this list).
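A loose sketch of that two-step loop, under the stated assumptions (one move per turn, opponent does nothing); the helpers `apply_moves`, `gen_moves`, `prune` and `eval_board` are hypothetical placeholders, not the author's code.

```python
# Greedy plan stacking: find the best pruned-beam plan, lock it in, repeat.
def choose_moves(state, max_depth, beam_width):
    locked = []                                   # plans found so far, kept in place
    while True:
        base = apply_moves(state, locked)
        base_score = eval_board(base)
        best_plan, best_score = None, base_score
        beam = [([], base_score)]
        for _ in range(max_depth):
            candidates = []
            for plan, _ in beam:
                s = apply_moves(base, plan)
                for move in prune(gen_moves(s), beam_width):   # keep promising moves only
                    s2 = apply_moves(s, [move])
                    candidates.append((plan + [move], eval_board(s2)))
            candidates.sort(key=lambda c: c[1], reverse=True)
            beam = candidates[:beam_width]
            if beam and beam[0][1] > best_score:
                best_plan, best_score = beam[0]
        if best_plan is None:                     # no plan improves the evaluation
            return locked
        locked = locked + best_plan               # lock it in and greedily look for more
```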
Most of the logic was in a complex board evaluation function that involved ~20 different factors. Some of the ones I found most important:
a) Number of squares controlled (positive factor): each square is proportionally owned according to its distance to each player (~1/distance^3); see the sketch after this list. The overall weight of this factor is then adjusted depending on the map and income/resources: more closed-off maps mean a higher weight, and more income/resources mean a higher weight.
b) Vulnerabilities left behind (negative factor). I had 4 different types of vulnerabilities. For the first 3, eval is done after the moves are processed:
- d1-defendable (a square the opponent could use to attack next turn, but it's "blockable")
- d1-undefendable (unblockable version)
- d2 (one square away from the above)
- d1-defendable-now (this uses the start-of-turn board position: squares that both players could build on/attack from right now)
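Here is an illustrative version of the square-ownership factor from a); the distance handling and clamping are guesses, not the author's exact formula. Summing this over all squares, with a map- and income-dependent weight, gives the controlled-squares factor.

```python
# Soft ownership of a single square; my_dist / opp_dist are BFS distances
# from each player's organs to the square.
def ownership(my_dist, opp_dist):
    mine = 1.0 / max(my_dist, 1) ** 3     # influence decays roughly as 1/distance^3
    theirs = 1.0 / max(opp_dist, 1) ** 3
    return mine / (mine + theirs)         # my share of this square, in [0, 1]
```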
There was lots of fine-tuning, where the relative value of everything is highly dependent on other variables. From parameter sweeps, my program found the C resource to be by far the most important one, which is interesting.
I found the game to have a lot of strategic depth with a very high skill ceiling. Reminds me of go, with the local and global characteristics and large branching factor. Congrats to reCurse and the rest of the top 3 for creating such impressive entries and showing what high level play can look like in this game.
#16 Legend
I described my approach here: CG's Winter Challenge 2024 - Neumann PM · GitHub
Congratz to the winners and thanks CG <3
#14 Legend
After skipping a number of contests over almost 4 years, I finally had both free time and a game type that I adore: cells, simple rules, no physics. And the game delivered; I enjoyed it very much.
In short, my bot generates positions one move ahead, estimates them with an estimating function, and picks the best, several times per turn.
In long:
1. Generate all possible non-pruned single-organism actions for both players. Examples of pruned actions: a tentacle facing a wall, a tentacle facing the parent organ when some other non-wall neighboring cell exists, a harvester pointing at nothing when a neighboring protein cell exists, … The total number of actions can exceed 800 in heavy situations.
2. Estimate all generated actions against a waiting opponent and sort them by estimate.
3. Pick my top N actions and the enemy's top N actions, then generate and estimate all NxN positions where both actions are taken.
   - Do this not row by row but in a specific order, so that if a timeout is reached we are still left with a smaller n x n or (n+1) x n set of payoffs for the top actions.
4. Treat the evaluations as the payoffs of an NxN normal-form game (which it actually is) and solve it with some algorithm for normal-form games (I just copy-pasted my CFR+ implementation from Xmasrush 6 years ago); a regret-matching sketch follows this list. Since my estimating function is not perfect, each player picks the action with the highest probability instead of sampling according to the probabilities, which is what the theoretically perfect case would call for.
5. Fix one picked action for me and one picked action for the enemy. Now try to find a 2nd action for me (if I have a 2nd root): exclude from the list of non-pruned actions those belonging to the already-used organism, then run the simulations and estimations as in stage 1. Add the best action to my move.
6. Now there are 2 picked actions for me and 1 for the enemy. If I have a 3rd root, find the best 3rd action similarly to stage 5.
7. Output the up to 3 found actions and WAITs for my other roots (some of the found actions could be waits too).
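For readers unfamiliar with solving a small normal-form game, here is a self-contained regret-matching sketch (a simpler relative of the CFR+ the post mentions, not the author's code). It takes the NxN payoff matrix from stage 3 and returns the highest-probability action index for each player.

```python
import numpy as np

# payoff[i][j] = my evaluation when I play action i and the opponent plays j
# (zero-sum from my point of view). Regret matching on the average strategies
# converges to an equilibrium in two-player zero-sum games.
def solve_normal_form(payoff, iterations=1000):
    payoff = np.asarray(payoff, dtype=float)
    n, m = payoff.shape
    regret_me, regret_opp = np.zeros(n), np.zeros(m)
    strat_sum_me, strat_sum_opp = np.zeros(n), np.zeros(m)
    for _ in range(iterations):
        # Current strategies from positive regrets (uniform if all <= 0).
        s_me = np.maximum(regret_me, 0)
        s_me = s_me / s_me.sum() if s_me.sum() > 0 else np.full(n, 1 / n)
        s_opp = np.maximum(regret_opp, 0)
        s_opp = s_opp / s_opp.sum() if s_opp.sum() > 0 else np.full(m, 1 / m)
        strat_sum_me += s_me
        strat_sum_opp += s_opp
        # Expected payoff of each pure action against the other's strategy.
        u_me = payoff @ s_opp            # I maximize payoff
        u_opp = -(s_me @ payoff)         # the opponent maximizes -payoff
        regret_me += u_me - s_me @ u_me
        regret_opp += u_opp - s_opp @ u_opp
    avg_me = strat_sum_me / iterations
    avg_opp = strat_sum_opp / iterations
    # Pick the highest-probability action rather than sampling (as in the post).
    return int(avg_me.argmax()), int(avg_opp.argmax())
```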
Also, I have an endgame detection mechanism (for when the field is partitioned into 2 parts or the players are stuck in a tentacle loop) which bypasses the aforementioned stages and simply picks the top non-waiting actions if the number of turns left is <= the number of cells left to fill. On the last day I saw an even more advanced endgame strategy from the top bots: sometimes a killable opponent cell should not be killed immediately but only on the last turn, so that the opponent has no time to refill the emptied space; I didn't bother to implement that myself.
A few hacks in the estimating function:
- I call a cell defenselessly killable by the opponent if some adjacent cell is not attacked by the cell's owner but is attacked by the opponent (and the opponent has the resources to build a tentacle). All descendants of such cells are defenselessly killable too. During my BFS such cells do not emit waves, so their territorial influence is not counted.
- During the BFS, all sporers fire into all possible cells on the first wave (if the player has enough resources, plus some margin so as not to be left with 0 proteins of some type). This allowed my bot to build meaningful sporers without ever simulating 2 moves of the real game state, but it often just built many sporers to keep the potential and never shot. That forced me to add a bonus for having 2+ roots to the estimating function, which partially solved the problem.
- To avoid dying on maps with scarce resources I devised the following. I call a player sustainable if he harvests an A protein or any 2 of the B, C, D proteins. When estimating a non-sustainable state I roughly calculate "moves to sustain" (the number of moves needed to reach a sustainable state) and subtract it, with some weight, from the estimate. With the distances to the closest proteins already computed by the BFS (you also need the 2nd closest for C and D), "moves to sustain" can be computed with a long sequence of ifs in O(1) time (ignoring the fact that an actual grow action may decrease more than one closest distance). And if "moves to sustain" exceeds the number of organs that can be built from the mined proteins, the estimate is heavily punished (a sketch follows this list).
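A simplified sketch of that estimate, with illustrative names and ignoring the second-closest distances for C and D mentioned above: `dist[t]` is the BFS distance from my organs to the nearest protein of type `t`, and reaching a cell next to it and placing a harvester is taken to cost roughly that many moves.

```python
def moves_to_sustain(harvesting, dist):
    # harvesting: dict type -> already harvesting that type?
    # dist: dict type -> BFS distance to the nearest protein of that type
    if harvesting['A'] or sum(harvesting[t] for t in 'BCD') >= 2:
        return 0                                        # already sustainable
    cost_a = dist['A']                                  # option 1: harvest an A source
    missing = 2 - sum(harvesting[t] for t in 'BCD')     # option 2: harvest 2 of B/C/D
    dists = sorted(dist[t] for t in 'BCD' if not harvesting[t])
    cost_bcd = sum(dists[:missing]) if len(dists) >= missing else float('inf')
    return min(cost_a, cost_bcd)
```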
My bot exceeded my expectations; I thought it would end up at the bottom of Legend, since I only simulate at depth 1 and can't see the optimal multi-move structures that something like a beam search would find. I was really surprised by the low number of simulations + estimations per turn my bot could manage: around 1000-3000. I expected something closer to 100k, since the playing field only has around 200 cells. I probably should have stuck to flood fill instead of BFS, or dived into bit magic to store each horizontal line in the bits of an int32.
Codingame team, thanks again for a great contest!
#24 Legend
Postmortem:
Cellularena PM - thebitspud (Rank 24, Legend)
gg to everyone who participated!
#1 Legend
You may find my postmortem here: reCurse's Codingame Winter Challenge 2024 Postmortem · GitHub
Thank you!
Thanks for the fascinating postmortem! A few questions:
- Does the policy directly output by PPO resemble anything like a Nash equilibrium, so you could just move according to that? Or does search make it significantly stronger?
- In the Search section, when you say "generate only a few likely candidate moves for each player, which I ended up fixing at 4", are these complete actions (sets of moves)? (If so, how did you generate different move sets in the multi-root case?)
Legend #13
My solution uses a greedy search with a heuristic evaluation function. I perform a breadth-first search to find paths consisting of sequences of actions, and run the evaluation function on the board state resulting from each path. I compute the value of each path as the increase in heuristic score divided by the path length, in other words the score gain per turn. I choose the highest-value path to keep, then search again starting only from organisms that have not yet moved, taking into account grid cells and proteins already used by previously chosen paths. The process repeats until no more positive-value paths are found. I then execute the first action of each chosen path.
When my search reaches a new cell, I consider any plausible organ (e.g. harvester pointing at a protein, tentacle not pointing at a wall) as the last action in the path, but only one default organ for continuing the search to longer paths. The default organ is typically the one whose protein cost I can most afford (pointing in a good direction if possible), but is sometimes overridden to opportunistically harvest or attack. Sporers are also allowed to continue the search via sporing.
In a typical BFS, once a cell has been reached for the first time, it is marked visited and is ignored when the search reaches it again later. But in this problem, reaching a cell via growth is different from reaching it via sporing, because only growth can create useful organs like harvesters and tentacles. So I consider those two cases to be distinct states that can be reached separately. Also, paths of equal length reaching the same cell are not necessarily equally good, due to things like absorbing or harvesting different proteins along the way. To expand the number of paths considered, I search twice: once preferring horizontal moves and once preferring vertical moves. Thus, each cell can be reached in four ways total (grow vs. spore, and via horizontal- vs. vertical-preferring paths).
Late in the contest, I restricted the path length to 7 actions based on local testing.
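Putting the pieces above together, here is a condensed sketch of the greedy path-selection loop; helper names such as `bfs_paths` and `apply_path` are placeholders (paths are assumed to expose `.organism` and `.actions`), not the actual bot code.

```python
def choose_actions(board, organisms, eval_fn, max_len=7):
    chosen = []
    unmoved = set(organisms)
    base_score = eval_fn(board)
    while unmoved:
        best, best_value = None, 0.0
        for org in unmoved:
            # bfs_paths yields candidate action sequences of length <= max_len,
            # skipping cells and proteins already reserved by chosen paths.
            for path in bfs_paths(board, org, chosen, max_len):
                gain = eval_fn(apply_path(board, path)) - base_score
                value = gain / len(path.actions)        # score gain per turn
                if value > best_value:
                    best, best_value = path, value
        if best is None:                                # no positive-value path left
            break
        chosen.append(best)
        unmoved.discard(best.organism)
    return [p.actions[0] for p in chosen]               # execute the first action of each path
```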
My evaluation function includes a number of hand-tuned pieces:
- the number of proteins of each type harvested per turn
- the current number of proteins of each type
- a notion of territory, including:
- the portion of the grid closer to me than to my opponent
- the portion of the grid within a distance of 2 from my organs (reduced to 1, then 0, towards the end of the game to encourage filling the board)
- the number of roots
Many of these conditions use an exponentially decreasing function, so for example the first harvester is worth twice as much as the second, etc. The conditions are evaluated separately for both players and subtracted.
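The diminishing-returns weighting can be written as a tiny helper; with a ratio of 0.5 the first item counts twice as much as the second, as described above (the exact ratio here is illustrative, not the author's tuned value).

```python
# Total value of `count` identical items under exponentially decreasing weights:
# 1 + ratio + ratio^2 + ... = (1 - ratio**count) / (1 - ratio) for ratio != 1.
def diminishing_value(count, ratio=0.5):
    return sum(ratio ** i for i in range(count))
```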
While the evaluation function is symmetric, the search is totally asymmetric because it assumes the opponent makes no moves. I tried running the search from the opponent's perspective and using the first turn's worth of opponent actions when performing the search from my perspective, but it made things worse. Eventually I found one way to incorporate opponent actions: in the evaluation function, I determine which of my organs the opponent can kill in one move and which proteins the opponent can absorb in one move, and treat them as though they don't exist. I only do that when computing my half of the heuristic score.
Does the policy directly output by PPO resemble anything like a Nash equilibrium, so you could just move according to that? Or does search make it significantly stronger?
By Nash equilibrium, do you mean directly using the policy as a mixed strategy? In some ways I guess you could, in theory. I noticed the policy distribution tends to get very sharp for critical moves, and less so when the network is uncertain (multiple moves could be good or bad, or a dilemma is encountered) or the outcome is considered decided anyway, so moves don't matter much.
However, I have not really encountered a situation or a game where sampling the policy gave better performance, probably because a few more things must be considered. First, the policy is entropy-regularized (as per PPO) during training, which means we force the distribution to stay more open so it keeps exploring different moves; otherwise training can get stale and the policy can collapse. So the alternative moves may not be any good in most situations, but may still be worth exploring in some. Second, neural networks are lossy, noisy and imperfect. They have blind spots and they make errors, so search helps ground the network in reality and smooths out the noise. In my local psyleague, adding search gives around +4 points vs just using max-policy.
In the Search section, when you say "generate only a few likely candidate moves for each player, which I ended up fixing at 4", are these complete actions (sets of moves)? (If so, how did you generate different move sets in the multi-root case?)
They are complete actions built with the iterative policy method I outlined. I just used a very simple variation on the first move: the most likely combination is generated first, then it starts over by masking the first move it made from the previous set and builds another action, until it reaches the limit. It's most likely not the best way to go about it, but it's simple and worked well enough in practice.
Yes, it's true: I ended up in 22nd place with just a 0.08-point gap behind @wala.
What did I have? A complex heuristic system that prioritized selecting the best global action in a conditional order, with a very strong emphasis on attack and defense actions.
How did I manage Sporers? I calculated the shortest time to reach important fields on the map (important in terms of map topology and resources that neither I nor my opponent had planted yet). I then selected the most common initial action leading to those fields.
The first version of my "most common first action" strategy was written in about 2 hours on the first day of the contest, and my program looked like this:
- Check if I can attack the enemy at a distance of 2 or 3 from me (1 or 2 empty fields in between).
- Check if I can play a first action leading me to the closest enemy root.
- Check if I can harvest resources at a distance of 2 (1 empty field in between).
- Check if I can move to the nearest resource.
… and that was enough to get into the Gold league!
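That priority chain is essentially a first-match rule list; here is a tiny sketch with hypothetical rule functions (not the author's code), each returning an action or None.

```python
def first_action(state):
    rules = (attack_at_range_2_or_3,      # tentacle toward a close enemy
             step_toward_enemy_root,      # advance on the nearest enemy root
             harvest_at_range_2,          # place a harvester one cell away
             step_toward_nearest_resource)
    for rule in rules:
        action = rule(state)
        if action is not None:
            return action
    return "WAIT"
```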
Well done!
Inspired by your previous feats, I also went for an RL approach with a CNN, but I was short on time, so I only submitted one model on the last day and ended up 309th. This is still not so bad considering the amount of time I lost on many secondary issues (like figuring out how to set up the self-play env, or compressing the NN into 100k chars).
I must say, I faced the same issues as you did and solved them similarly (although not as cleanly). I also had very poorly optimized training code, and probably suboptimal hyperparams. And of course I didn't have the search. That you were able to get all of this working so fast amazes me.
One thing that I think may help your bot if you did not think of it / try it yet: using a U-net. For me this was much better than a ResNet (and much faster!). It allows the model to have a maximal receptive field, which is probably better than the 17x17 of your ResNet. But I only tried ResNets in the early stages of the project, so maybe they are actually better. I also quickly tried a vision transformer, but I'm afraid it would be quite slow on CPU.
I also got faster convergence by pretraining the model on a simple prediction task (predicting what the state will be in 20 steps, with my Bronze bot playing against itself).
That's a good idea; it does have potential, and I might try it at some point out of curiosity. I went with a ResNet mainly because I had almost all of the related code optimized and ready from my other CG bots. It's no secret: I get things running fast largely because of all the time I have spent doing this outside of challenges. Thanks for the suggestion and your input, much appreciated.
Hi! Congrats on getting it working! 309th is not bad. Did you have experience building NNs? I always get inspired by reCurse's PMs to try it myself, but I get discouraged by having very little knowledge to begin with.
Thanks! Well, I work with neural networks daily (I'm a PhD student in that field), which of course helped me a lot. I also had some experience in reinforcement learning, although that is further from my comfort zone.
Still, I am sure anybody can do it; it will simply take more time since you have to learn things along the way. This is something you could start right now, so that you will be more prepared for the next challenge ^^
If I have some advice for starting from scratch, that would be:
- use Python, at least for the training part. I think reCurse uses C++ (edit: actually they use Python, with some C++ to accelerate things like the game simulation), but Python is much more friendly and has many libraries that will do most of the work for you: PyTorch (or TensorFlow) for the neural network part, and Gym + Stable-Baselines3 for the RL algorithms. This is what I used for this project. The only issue with Python is simulating the environment, which is quite slow.
- learn about basic neural networks: how they work, how they learn. There are plenty of introductions online, like this one.
- learn about reinforcement learning: understand at least the main concepts (agent, reward, observation, action, environment). Once again maybe try out some tutorials like here and here.
- test a past codingame challenge. I think the Olymbits (Summer 2024) is a good start, since it is very easily castable to RL. And you have reCurse's PM to help out!
- for having an RL bot in a codingame challenge, I think the roadmap looks something like this:
- reproduce the game as an RL environment
- use PPO (for instance) to train an NN on this environment. Ideally we want to use self-play (the NN plays against itself), but you may start with another opponent (simpler to introduce in the framework; unfortunately self-play is not well supported by the libraries, so we have to be hacky).
- export the trained NN into a codingame bot. This step involves compressing the weights of the NN into a (very long) string to make it fit under the 100k character limit of codingame; a small compression sketch follows this list. Then, in the final bot, you decompress these weights and use the NN to process the game state and output the action to take.
- improve. There are maaaaany things to try for potential improvement. Better hyperparameters, better observations, reward shaping, neural network architecture.
- reCurse also used an MCTS to add a search on top of the NN. I think this is not too important; first focus on getting a strong NN!
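On the weight-compression step mentioned above, here is one common way to do it (quantize to 8 bits, then base64-encode): this is only an illustration of the idea, not necessarily what anyone in this thread did. Roughly, 100k base64 characters hold about 75k one-byte weights.

```python
import base64
import numpy as np

# Quantize float32 weights to uint8 over their min/max range and encode as text.
def encode_weights(weights: np.ndarray) -> str:
    lo, hi = float(weights.min()), float(weights.max())
    scale = max(hi - lo, 1e-8)                        # avoid division by zero
    q = np.round((weights - lo) / scale * 255).astype(np.uint8)
    return f"{lo}:{hi}:" + base64.b64encode(q.tobytes()).decode("ascii")

# In the submitted bot: decode the string back into an array of weights.
def decode_weights(blob: str, shape) -> np.ndarray:
    lo, hi, data = blob.split(":", 2)
    q = np.frombuffer(base64.b64decode(data), dtype=np.uint8).astype(np.float32)
    return (q / 255.0 * (float(hi) - float(lo)) + float(lo)).reshape(shape)
```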
This message is getting longer than planned, but hopefully it may motivate some to try it out!
Edit: @reCurse, you're the professional here, feel free to correct me if I missed something ^^
Thanks for the detailed response! Iāll try to start with something simple and build up from there. Cheers!
Great post. Only a small correction.
I use Python, PyTorch, etc. for training as well. It's only the game simulation and specific logic that's in C++ and leveraged from Python.