Spring Challenge 2021 - Feedbacks & strategies

The topic where you can tell how you liked the challenge, & how you solved it.

As the challenge game will be released soon as a classic game on the platform, please don’t share your code publicly.

Global leaderboard: https://www.codingame.com/contests/spring-challenge-2021/leaderboard/global

Next event: Fall Challenge 2021

Thanks a lot for participating!


Finished 12th


I’m using beam search. It’s a bit off-standard because the game logic works in days. So I do a beam search within the current day and make sure all states end with a WAIT action, pruning states after each step. Then I prune the final states again and start the next day. Some simplified code:

List<Board> currentBoards = new List<Board> { this };
for (int day = Day; day < 24 && timeRemaining; day++)
{
    HashSet<Board> endOfDay = new HashSet<Board>();
    for (int depth = 1; ; depth++)
    {
        HashSet<Board> next = new HashSet<Board>();
        foreach (Board c in currentBoards)
            foreach (Board n in c.Expand())
                if (n.Day > day) endOfDay.Add(n);
                else next.Add(n);
        if (next.Count == 0) break;
        currentBoards = FilterBoards(next); // only keep those boards with the highest score
    }
    currentBoards = FilterBoards(endOfDay);
}
Board best = currentBoards.First();

My beam width is between 150 and 200. That is, I take the best 150 child states first, but I also try to keep at least 50 states that had a GROW action on the current day, so I retain some states even with a worse score.

Move generation

To cut down the number of possible states, it can help to not generate all of them.
Regarding suns, it’s always cheapest to COMPLETE first, then GROW starting from the largest sizes, and SEED at the end. So I only generate actions in this order (if a SEED is free, I can later print it first to claim the cell for myself). Furthermore, I disallow seeding from size-1 trees and allow at most 1 seed on the map.


Evaluation

That was the hardest part for me. It’s hard to analyze the replays as a human and see what’s working, so I ended up testing against my own bot with brutaltester.

I ended up with:

  • Points + (0.33 + 0.01 * remainingDays) * Suns, taking mine and subtracting the opponent’s score
  • Simulating the whole game to the end with both players waiting, to count the suns at the end of the game. I take factor * (mySunsPerDay - opponentSunsPerDay), with the factor starting at 0.7 and being multiplied by 0.98 to decrease further over time.
  • -1 for each pair of my own trees that can possibly give shadow to each other when reaching size 3
  • A bonus for having trees of a certain size. Size 1 is worth 1/11, size 2 4/11 and size 3 is 11/11. These are equal to the costs of growing a tree from size 0 to size 3 without any other trees. I multiply these by (Nutrient - 0.5) * Min(1, remainingDays * 0.09) as the motivation to keep a tree will drop over time. And a multiplier on top, that penalizes having too many trees of the same size.
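The first eval term above can be sketched in Python as follows (a minimal illustration with my own function and argument names, not the author's actual C# code):

```python
def eval_suns_and_points(my_points, my_suns, opp_points, opp_suns, remaining_days):
    """Points + (0.33 + 0.01 * remainingDays) * Suns, mine minus the opponent's."""
    sun_weight = 0.33 + 0.01 * remaining_days  # suns matter more early in the game
    mine = my_points + sun_weight * my_suns
    theirs = opp_points + sun_weight * opp_suns
    return mine - theirs
```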

I made my simulation slightly incorrect: instead of decreasing the nutrients for each cut tree, I decrease them by 1 on each WAIT action.

Opponent prediction

I consider two opponents. One that is always waiting and another one that grows whatever is theoretically possible (even if the combination of all the grow actions exceeds the available suns).
To predict my suns for the next turn, I then take 2/3 * the suns for a waiting opponent + 1/3 * the suns for a growing opponent.
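This 2/3–1/3 blend is just a weighted average; as a tiny sketch (hypothetical helper name):

```python
def predict_my_suns(suns_vs_waiting, suns_vs_growing):
    """Blend the sun income against a passive opponent (weight 2/3)
    with the income against a maximally growing one (weight 1/3)."""
    return (2 * suns_vs_waiting + suns_vs_growing) / 3
```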

I tried to use my own search to predict the opponent but the results were on par at best.


144th in Gold (Python)
40 line code (+starter)

I ignored simulations, shadows, future states and board scoring, and took a business/economy-like approach.

Each action is rated on this basis: score = income - cost.
The currency is sun points. Rate of exchange (RoE): 1 game point == 3 sun points.

Computing cost is easy and straightforward:

  • WAIT: 0,
  • SEED: amount of seeds on the board,
  • COMPLETE: 4,
  • GROW: {0: 1, 1: 3, 2: 7}.get(tree.size) + amount of 3-trees
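Following the bullets above, the cost side could be sketched like this in Python (the argument names are illustrative, not the author's code):

```python
GROW_BASE = {0: 1, 1: 3, 2: 7}  # base cost to grow a tree of the given size

def action_cost(action_type, tree_size, my_seeds, my_size3_trees):
    """Sun-point cost of an action, per the bullet list above."""
    if action_type == "WAIT":
        return 0
    if action_type == "SEED":
        return my_seeds              # one sun per seed already on the board
    if action_type == "COMPLETE":
        return 4
    if action_type == "GROW":
        return GROW_BASE[tree_size] + my_size3_trees
    raise ValueError(action_type)
```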

Computing income is harder:

  • WAIT: 0 (this sets the baseline, since WAIT’s score is income - cost = 0),
  • SEED, COMPLETE, GROW: functions

The whole idea is to find efficient functions. The factors to be included: my-trees-in-line, amount-of-my-trees, size-of-a-tree, nutrients, day (time).
There are two more interesting factors to be included:

  • FV (future value) of all above,
  • lost opportunity cost.

I know that you guys like to write a lot of code (as you are paid for each line :wink:). I stayed with the Pareto principle and put in 20% of the possible work to get 80% of the results.

(I should have at least checked the next round’s “spooky”.)


I think I will finish around the 10th legend rank after the final rerun.

My code is a sequential MCTS with the following features:

Reuse the tree

Between two turns, I try to keep the tree from the previous turn. This only works when my opponent made only one action. If I used WAIT and my opponent made two or more actions, I just start with a new empty tree.

Save and load states

I tried many things to get the best performance. In this game, the best way I found is to save part of the game state (tree sizes, players’ suns, players’ scores, …) and, when needed, a load function restores this specific part of the game. I have no bitboard and I never copy my entire Game object.

MCTS exploration

My MCTS is sequential, so in my simulations my opponent knows my move. The exploration part of my MCTS is classic: I use the UCB formula with an exploration constant of 1.0.
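The UCB selection rule with C = 1.0 might look like this (a standard UCB1 sketch, not the author's actual code):

```python
import math

def ucb(child_value_sum, child_visits, parent_visits, c=1.0):
    """Classic UCB1: average value plus an exploration bonus that shrinks
    as the child gets visited. c is the exploration constant."""
    if child_visits == 0:
        return float("inf")  # always try unvisited children first
    return (child_value_sum / child_visits
            + c * math.sqrt(math.log(parent_visits) / child_visits))
```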

MCTS expand (actions computation)

When I compute the possible actions for a player, I prune a lot of them:

  • Never try a COMPLETE before the 11th day.
  • Never try a SEED on a cell next to one of my trees.
  • Only SEED when the cost is 0.
  • Never WAIT if a SEED is possible (given the previous conditions).
  • All the conditions on SEED are disabled after the 21st day to let the MCTS try to win a “tree draw war”.
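These pruning rules could be expressed as a single predicate, roughly like this (an illustrative sketch with my own parameter names, including the day-numbering assumption that "before the 11th day" means day < 11):

```python
def action_allowed(action, day, seed_cost, next_to_my_tree, seed_possible):
    """True if the action survives the pruning rules listed above."""
    if action == "COMPLETE":
        return day >= 11                    # no early completes
    if action == "SEED":
        if day > 21:                        # late game: any seed, to win tree-count ties
            return True
        return seed_cost == 0 and not next_to_my_tree
    if action == "WAIT":
        return not seed_possible            # never wait when a (pruned-legal) seed exists
    return True                             # GROW is not pruned here
```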

MCTS rollout

When I expand a node, I rollout to the end of the game. The rollout is pseudo-random. When I compute the possible actions for a player during a rollout, I proceed as follows:

  • Never try a COMPLETE before the 11th day.
  • COMPLETE is tried only when the player has at least 4 trees of size 3.
  • From the 22nd day onward, we can COMPLETE any tree we want.
  • Only try GROW when there is no possible COMPLETE (given the previous conditions).
  • Never try GROW past the 19th day.
  • If there is no possible COMPLETE and no possible GROW, try a SEED with the same conditions as in the possible-actions computation (not next to one of my trees, and only when the seed cost is 0).

MCTS evaluation

I evaluate the game as follows:

float myScore = me.score + me.sun / 3;
float hisScore = him.score + him.sun / 3;

if (myScore > hisScore) {
    float diff = myScore - hisScore;

    if (diff > 5) {
        return 1.0 + (diff - 5) * 0.001;
    } else {
        return 0.5 + 0.5 * diff / 5;
    }
} else if (myScore < hisScore) {
    float diff = hisScore - myScore;

    if (diff > 5) {
        return -1.0 - (diff - 5) * 0.001;
    } else {
        return -0.5 - 0.5 * diff / 5;
    }
} else {
    if (me.numberOfTrees > him.numberOfTrees) {
        return 0.25 + myScore * 0.001;
    } else if (me.numberOfTrees < him.numberOfTrees) {
        return -0.25 + myScore * 0.001;
    } else {
        return myScore * 0.001;
    }
}
What didn’t work

I tried an MCTS DUCT with simultaneous moves. The performance was horrible, and the loss of speed was too high and not compensated by the “intelligence gain”.

A smarter rollout: every time I tried to add more features to the rollout selection, the bot got better locally (tested with brutaltester) but worse in the arena. An overfitting nightmare.


The contest was nice. I’m happy to be so high with a MCTS. I hope I’m not the only MCTS in the legend top 10.


BlitzProg - 490th

Quite possibly the most try-hard I’ve been in any contest I’ve ever participated in. We’re talking about 50-60 hours of the most inefficient and frustrating coding I’ve ever packed into a week. My immediate feeling is of being completely and utterly deflated… which is pretty sad, because the overall concept was interesting and the game assets were really cute.

Wood 2 to Bottom Bronze: Complete my trees, easy enough!

Bottom Bronze to Gold: some tested heuristics with a lot of if/else, and a lot of programming errors resulting in my AI performing much better than it should. In a blink, I was in the top 50. The above took some hours of coding and was completed Sunday night.

Gold to higher up in Gold: I made a very late fix (about 30 minutes before the end of the contest), removing obviously bad commands, such as trying to grow a complete tree at the end of the game, plus a less obvious attempt at economy flexing: waiting before completing trees when the opponent can’t score much anymore and I have the better economy (for extra sun points) (example: https://www.codingame.com/replay/557478930 )

  • The current PHP script I use takes the set of commands and attributes a score to each. Namely, it favors seed placement on bonus tiles but applies a large penalty if the cell is near a tree of mine, and an even larger one if that tree is fully grown. This also helps decide which tree should be grown or cut first.

  • Complete, Grow and Seed are always used in this order of priority; the heuristic is there to decide which instance of the same command should be used.

  • Grow commands are favored more strongly the more expensive the tree is to grow. (I’m trying to describe what one of my glitches actually does here… it was supposed to favor cheap GROW commands, but I realized much later in the contest that this is what it does instead.)

  • Complete commands are issued whenever I exceed a preset cap of trees for a given day; a heuristic over shadow and richness figures out which tree is best to cut first.

Things I tried:

Improved heuristics, bug corrections and fixes - several versions of my PHP code were made, with things such as move ordering and avoiding useless grow/complete near the end of a game, but they always ended up much worse than the original.

Monte Carlo. I prepared a full single-player version of the game for simulation in C++. When the MC part came in, I couldn’t find any decent heuristic that would make my AI play better than a simple random bot.

Genetic factor search with brutaltester - Magus wrote a nice Java app that lets you test offline, so I took my original AI and spent a day coding a genetic approach to tweak my factors over several hours of testing. I eventually gave up because, once again, it couldn’t find anything that scored better than my original AI.

Monte Carlo Tree Search - this is what gave me the most hope at some point, sometimes scoring in the high 90s with some awkward combination of moves. After pruning and improving the move search, I eventually realized my sim would never be fast enough to stand a chance in the Gold league. After tweaking without much idea of what I was doing, I realized this was probably not going to be of any help.

Opening book - looking at the best players, which moves were most often performed at the start of the game? I tried using some opening books for my heuristic AI, only for it to be squashed by my Sunday PHP script that didn’t have any.

My Monte Carlo Tree Search being slow is mostly due to my limited experience building optimized AI, even though I put some thought into the simulation.

For example, my state was a bitboard made of 5 Uint64 and 8 Uint8, conveniently small for fast copies, but I kept a lookup table on the side to find cell neighbors because it didn’t come to my mind that the hexagonal field could be processed with bit-shifting. Clearly, after reading a few post-contest strategies, I didn’t think enough.

In the end, I scrapped everything and just resubmitted my original AI, hoping to reclaim the position it held before I “fixed” its flaws. Hence my very late push.

Perhaps this is a sign I am meant to stay in the gold league and have reached my limit, and cannot perform any better?

Thanks for the contest and congratulations to everyone, some AIs are really outstanding! Looking forward to reading the post-mortems of other players!


Magus, how many rollouts can you do at the start of the game?
Didn’t you hardcode the first turns, as everyone did, to wait and grow the two trees first, or is your MCTS strong enough to do that by itself?

~#18 here, and I have the same approach as Magus. The score used in my MCTS is the difference between the two players’ scores at the end of the rollout, divided by 10 (to keep the coefficients from getting too big).
For the heuristics, it is almost the same. I do not have those sequential rules (no WAIT if a SEED is possible). For the rollouts, as waiting is often bad, I tweaked the randomness by making the WAIT action half as likely to be picked as the other actions, and on day 23 I try all the possible actions before WAIT.
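The WAIT down-weighting described above could be sketched with Python's `random.choices` (a hypothetical helper, not the author's code):

```python
import random

def pick_rollout_action(actions, rng=random):
    """Pick a rollout action where WAIT gets half the weight of any other action."""
    weights = [0.5 if a == "WAIT" else 1.0 for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]
```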
I think I would have performed better with more rollouts - I am doing around 1700 full games on the second turn - but this is probably not the limiting part. Improved heuristics could maybe perform better.
To conclude, I would say that having good rollouts is more important than having a lot of them. This is what some people call ‘dummies’: the better your dummy, the better your evaluation, and the better your MCTS.
I really want to thank the French chat, which helped a lot with many implementation questions I had, and for all the fun during this week!


My AI outputs the number of expands/rollouts (the first number) each turn.

I didn’t hardcode the first turns because I reuse the MCTS tree on the next turn. So I let the MCTS compute what it must compute.


Finished at 300 Gold.

Bronze to silver: sort all moves by tree index and print the first move, plus self-shadow avoidance.
Silver to gold: minimax at depth 1. I assume the opponent always grows the tree that best counters whatever move I make. I tried switching to a genetic algorithm for better performance and a deeper search, but the results were worse than minimax, so I switched back.

My final eval looks like this:

eval(playerId) {
	if (day == 23) {
		solution.score = 10 * (players[playerId].score + players[playerId].sun / 3);
	}
	if (number of trees of size i exceeds a hardcoded cap) {
		score -= capPenalty * (numTrees[i] - cap[day][i]);
	}
	if (day < 22 && I'm cutting the last level-3 tree) {
		solution.score -= cuttingLastTreePenalty;
	} else if (playerId == 1 || (day > 10
		&& (day + tomorrow's recovery + current sun points >= 35))) {
		solution.score += 3 * players[playerId].score + players[playerId].sun;

		for (int d = 1; (d <= days_depth) && (d + day <= 22); d++) {
			solution.score += recoveryMultiplier[d] * recovery on dth day from today;
		}
	} else {
		for (int d = 1; d <= days_depth; d++) {
			solution.score += recoveryMultiplier[d] * recovery on dth day from today;
		}
		solution.score += for each tree: pow(priority[pos.y][pos.x], 3) * tree size * dayMultiplier[d]; // halved if the enemy shadows this position
	}
}

int cuttingLastTreePenalty = 10000000;
int recoveryMultiplier[6] = { 0, 10000, 5000, 3000, 2000, 1000 };
int dayMultiplier[6] = { 0, 5, 4, 3, 2, 1 };


I spent most of the contest working on neural network code. I thought a neural network was feasible and would therefore dominate the competition. However, I was unable to get the network to converge well.
The game state could be represented by only ~45 numbers, or 341 if you use 8 planes for the two players and the 4 possible tree sizes. The game had too many possible actions (37x37 SEEDs + 37 COMPLETEs + 37 GROWs + WAIT), which made a straight-up AlphaZero implementation difficult because AZ learns a policy over the actions. A simplifying idea would be to learn only a value function.
I will mention two of the main approaches I tried:

  • Pseudo-AZ: have an AI using the NN eval play itself and learn an ever-improving value function V from the (state, game_outcome) database. For example, I had a DUCT using the NN’s eval play against itself.
  • Bellman equation: inspired by Q-learning, have an epsilon-greedy AI play itself and create a database of states, then learn via temporal-difference learning based on the Bellman equation. For every state you can calculate what its value should be as a function of the estimated values of the following states. For a state S, the players have N0 and N1 actions. For all N0*N1 pairs of actions you simulate the following state S’ and fill a payoff matrix using the NN’s eval of the S’ states. The payoff matrix can be solved with fictitious play to produce the optimal policies for both players. The value of the state is then the sum over i, j of policy0[i] * policy1[j] * payoff_matrix[i][j]. The learning signal for the NN is the difference between the estimate V(S) and the estimate calculated from the following states S’.

Once I had a neural network, I submitted it to CG wrapped in a DUCT search, with only the network coefficients compressed in base65536 unicode inside clear-text C++ code, so as not to violate the obfuscation rules. Coding this took a bit of time as well.

For a neural network bot, my ranking is very low, and I attribute this to a poor evaluation function. I found bugs, but I suspect I still have more, or simply an unsuitable RL algorithm for learning a good V function.


Around ~100th,
just an iterative, fail-soft alpha-beta with selective moves for the first 16 rounds, and an evaluation that estimates the final score (i.e., the number of suns all trees will see, trying to compute the order of the forthcoming COMPLETEs).

Selectivity as others did: trying to remove as many SEEDs as possible… and in the end, only generating free SEEDs if no GROWs are generated. (That pushed me from gold to legend…)

I also used a 300,000-entry transposition table.


I’m probably going to finish top 30 with a neural-network-based bot. As with my other NN bots, it’s heavily AlphaZero-based, meaning the NN gives both an evaluation of the current position (value) and a policy value for each possible move (policy) to help the MCTS pick good moves to explore.

534 inputs (528 binary 1/0 and 6 floats):

  • Binary inputs:

    • 37 for blocked cells (richness 0 / trees)
    • 2x (One set for each player, current player first):
      • 37 x Dormant
      • 37 x Tree size 0
      • 37 x Tree size 1
      • 37 x Tree size 2
      • 37 x Tree size 3
      • 37 x Sun Count (1 in the position for current sun count - if more than 37 sun then just set to 37)
    • 24 x day
    • 21 x nutrients
    • 1 x Current Player Sleeping (thinking about this after I’m not sure why this is here as I would never evaluate a state where this was true?)
    • 1 x Opponent sleeping
  • 6 floating point inputs (3 per player):

    • Score / 100
    • Sun / 100
    • Total Sun Gathered / 100


  • 1 value (tanh) for win chance
  • 38 policy outputs (softmax). 1 for each hex and 1 for wait. I tried having more policy outputs to differentiate between complete/grow/seed but found the simpler version worked better. Seeding source was chosen deterministically based on the trees available.

Two hidden dense layers of 96 nodes with relu activation.
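Given the description above (534 inputs, two 96-node ReLU layers, a tanh value head and a 38-way softmax policy head), a forward pass might look like this NumPy sketch - the parameter layout is my own assumption, not the author's format:

```python
import numpy as np

def forward(x, params):
    """Value + policy forward pass for the architecture described above."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])   # dense 534 -> 96, ReLU
    h = np.maximum(0.0, h @ params["W2"] + params["b2"])   # dense 96 -> 96, ReLU
    value = np.tanh(h @ params["Wv"] + params["bv"])       # scalar win-chance estimate
    logits = h @ params["Wp"] + params["bp"]
    policy = np.exp(logits - logits.max())                 # numerically stable softmax
    policy /= policy.sum()                                 # priors over 37 hexes + WAIT
    return value, policy
```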


Sequential MCTS without rollouts; instead the NN is evaluated and its value backpropagated. The exploration constant is based on the policy rather than a fixed global value (see the AlphaZero papers for more details).

The game state was only updated after both players had chosen a move, but I gave my own moves priority in the submitted bot (e.g. if both players tried to seed, I’d simulate mine succeeding and the other bot failing, or give myself the full nutrients and the enemy nutrients - 1), because allowing the enemy to always choose the exact counter to my move was causing problems in my search.

I did some pruning to cut down the number of moves for my bot:

  • No seeds on days 19-22.
  • At most 2 seeds.
  • On day 23, only one seed target per turn (it doesn’t matter where; it’s just to increase the tree count).
  • No growing on days 20+ unless the tree can reach size 3 (e.g. on day 21 only allow GROW on sizes 1/2).

Training was done via self-play; my final submission was based on 700,000 games after I restarted on Sunday. Games were run in batches of 2,000, with the last 3 million states (one state = the data from a single turn) uniformly sampled for training in between batches. This meant the network was trained on games from roughly the last 15 versions of itself (so it doesn’t overfit to new behavior).
The policy targets are the visit counts from the root MCTS node and the value target is the eventual result of the game (-1 for loss, 0 for draw, 1 for win).

I was pretty convinced I needed to do something better about simultaneous moves so spent Wednesday onwards trying to get a DUCT based version working. It was getting close on Sunday (could get into top 100) but never got to the same strength as my sequential version so I gave up and went back to trying to tweak my older version.


I made an MC bot.
During I/O I remapped the cell indexes onto a 64-bit integer like this:

25 24 23 22 ** ** ** **
26 11 10  9 21 ** ** **
27 12  3  2  8 20 ** **
28 13  4  0  1  7 19 **
** 29 14  5  6 18 36 **
** ** 30 15 16 17 35 **
** ** ** 31 32 33 34 **
** ** ** ** ** ** ** **

This allows all trees to be processed at the same time by doing binary shifts of
- bitBoard << 1,
- bitBoard >> 1,
- bitBoard << 8,
- bitBoard >> 8,
- bitBoard << 9,
- bitBoard >> 9.

This is useful both for computing the list of possible actions and for computing shadows and sun points.
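In Python (where ints are unbounded, so an explicit 64-bit mask stands in for a Uint64), the six neighbor shifts might look like this sketch; the unused `**` border cells act as guards, and a final AND with the set of valid cells (not shown) would discard off-board bits:

```python
MASK64 = (1 << 64) - 1  # keep results within 64 bits, like a Uint64

def neighbor_sets(bitboard):
    """The six hex-neighbor sets of every set bit at once, via the shifts above."""
    return [
        (bitboard << 1) & MASK64, bitboard >> 1,
        (bitboard << 8) & MASK64, bitboard >> 8,
        (bitboard << 9) & MASK64, bitboard >> 9,
    ]
```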


Place: 25th in legend.

My bot is built around dynamic programming. If the days of the opponent’s tree cuts are fixed, the shadow mechanic is simplified to energy gain = e - e / C (where e = t[1] + 2 * t[2] + 3 * t[3] is the maximal energy gain, t[] are the numbers of trees of each size, and C is a constant), and all richness is treated as medium richness, then the game becomes a solo optimization game, which can be solved entirely with dynamic programming.

The naive way would be to construct a multidimensional array day x nutrient x t[0] x t[1] x t[2] x t[3] x energy and fill it with score values, with day < 24, nutrient <= 20, t[3] <= 5 and so on… but this array contains millions of elements and can’t be filled within a 100 ms turn. Instead I have a multidimensional array day x nutrient x t[0] x t[1] x t[2] x t[3] (actually day x nutrient x code, applying move generation and other things directly to the “code” in a cached fashion) filled with lists of (energy, score) pairs. These lists are constantly filtered of dominated pairs. After heavy optimization and a few tricks, I was able to solve the solo optimization game in around 30 ms or less from any position or set of starting positions (the time barely differs between 1 starting position and 1000).
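The dominated-pair filtering could be sketched like this (my own illustration of the idea, not the author's code): a pair is dropped if another pair has at least as much energy and at least as much score.

```python
def filter_dominated(pairs):
    """Keep only non-dominated (energy, score) pairs."""
    # Sort by energy descending (score descending breaks ties), then keep a pair
    # only if its score strictly beats everything with more (or equal) energy.
    pairs = sorted(pairs, key=lambda p: (-p[0], -p[1]))
    kept, best_score = [], float("-inf")
    for energy, score in pairs:
        if score > best_score:
            kept.append((energy, score))
            best_score = score
    return kept
```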

Of course this is the heart of my algorithm, not the entire algorithm. From almost any game position I simulate the current player 2 days ahead with the opponent making no moves, then apply dynamic programming to all depth-2 children at once. Globally, I apply all of the above first to play as the opponent against my dummy cut timings, then as myself against the opponent’s already-calculated cut timings. Some heuristics are also added as modifications to the children’s score values.

This contest was fun, good job everyone! :slight_smile:


Place ~10-15 in legend


I use recursive principal variation search with iterative deepening, a hash table with Zobrist hashing, and a principal variation table to check potentially good moves first. The principal variation table is just a table where I store the last best move, indexed by recursion depth. The idea is that the best move doesn’t change much with the position, but I’m not sure how well it works here. At least it doesn’t make things worse, and it’s very easy to implement.
Minimax doesn’t work well if the recursive descent reaches different days, because it will try to force the day to end early so it can act on the next day. So I consider a fixed number of moves per day (2 or 3 moves per side works equally well) and then go to the next day. This basically means I switch days at a fixed depth in the recursive descent.

Estimation function

The position is evaluated by a function that finishes the game without player moves. If the initial day when we started minimax is 12 or higher, it harvests all possible trees on the last day, sorted by richness, whenever it’s profitable. Last-day harvesting is simultaneous for both players. The final result is calculated as score * 3 + sun.
I tried different ideas to improve the estimation function, like considering trees of size less than 3 at the end, or growing all possible trees each day with a greedy strategy, but nothing improved the play.

Moves choosing

In the minimax function I consider only GROW and COMPLETE moves. COMPLETE moves are considered only if the start day is 12 or higher. By start day I mean the day when we start the minimax search, not the day reached in the recursive descent; otherwise it would give different results for different calculation depths.
Moves are sorted by next-day shadows: for COMPLETE moves, shadowed trees are considered first; for GROW moves, trees that will outgrow the shadow.

Seed moves

The effect of seed moves is deferred for a few days, so minimax doesn’t handle them well (or I just failed to write a good estimation function for them). So I use a simple strategy: plant a new seed if it’s free. The exact cell is chosen via a shadow map: how many shadows will fall on the cell until the game ends, and how many of our/the enemy’s trees would be shadowed by the seed if it grew each day.


I will probably finish somewhere between 100th and 150th in legend. I went with a pretty standard MCTS, because I didn’t think I could manage a decent eval. However, I had to cut some corners to get decent enough performance for it to work. Here are the specifics:

Selection: very standard for an MCTS, with an exploration constant roughly tuned to 0.95.

Expansion: I allowed way more moves than others did; that’s definitely something I could have improved to get more performance, because reading other PMs, I realize I had branches in my tree that just don’t make sense.
My only rules were: don’t SEED if you’ve already got two, and never SEED next to your own trees.

Rollouts: here I really cut corners. First of all, no SEEDs in rollouts, and rollouts limited to the next 10 days.
Basically, this tells me whether I would win the game if it were to more or less stop right now.
Obviously this is less than ideal and probably explains why I didn’t rank higher.

Evaluation: very standard. -1, 0 or 1 depending on whether the rollout ends in a loss, draw or win.

Tree reuse:
I reuse my tree as long as I can find the move the opponent played. As long as the opponent didn’t play more than one move, or one move plus a wait, I’ll reuse my tree; otherwise I start fresh. I thought about exploring deeper to find the right node when the opponent played multiple moves while I was asleep, but it didn’t look like it would bring much.

Other than that I only had the usual precalcs for shadows and seed ranges.

I’m very happy I managed to get to legend with an MCTS; it wasn’t obvious given the adaptations needed for simultaneous play. In retrospect, I think some more pruning of the possible actions and more precise rollouts would have gone a long way.

Great contest again, see you next time!


Hello, here is my postmortem (finished around #30)

The Beam Search

I implemented a beam search exactly as described by eulerscheZahl, with a similar method for filtering moves. My eval consists of:

  • my suns minus my opponent’s suns,
  • a weighted sum of the sun-income difference between my bot and the opponent over the following days,
  • the score earned,
  • a malus for trees under the shadow of my own trees,
  • a malus if the income is under a certain threshold, fixed for each day.

The opponent is simulated in a very basic way:

  • every opponent tree grows every day (except on the first simulated day if it is dormant or his sun is too low),
  • the opponent gets a “free” COMPLETE every day starting from day 12, reducing the amount of nutrients.

Late in gold, I started to shape the overall strategy of my bot by modifying the income threshold in the eval and the estimated nutrients for each simulated day.
I thought it would be possible to adapt this strategy depending on what my opponent plays:

my opponent completed a lot of trees early:
-> nutrients will go down more slowly since his sun income is lower, so a long-term strategy built on high sun income might be more viable;

my opponent has an excessive number of size-3 trees and did not complete many:
-> nutrients will drop abruptly near the end, so completing fast might be a good idea.

I’m actually not sure if this is a good idea, as I did not experiment with this much.
I went another way:

The DUCT experiment

As legend opened, I tried to move progressively to a Decoupled UCT. The top 3 in the previous challenge used it with great success, so why not?

But I slightly underestimated the work required to make it functional. Mixing different algorithms might be strong, but it is time-consuming.

After two days of coding, my DUCT version (activated from day 15) performed only as well as my beam search on the leaderboard, which was slightly frustrating.
So in the end, I reverted to my beam search and fine-tuned a couple of magic numbers.


Pros: I learned something new
Cons: I won’t have a new T-shirt

Congrats to the winners, thanks to CG, it was fun !


Question for NN bots:

Have you considered using the current game state plus one possible move at a time, and scoring that?
You are given the moves as input each time the state changes, so you could score each move individually.

About my Bot:

Chokudai’s version of beam search with random heuristics: this is what got me the best results.

My eval function wasn’t suitable for most of the contest, and I didn’t focus much on it either; I regret not starting with a heuristic bot instead. Each time I got promoted it was actually due to some bug in my eval, so I figured I’d just switch to some random heuristics.

Early Sunday I was at around 1000th in gold; the new bot went up to 300th. I haven’t bothered checking since, but it proves a point.

It wasn’t my favorite board game and I didn’t find it very intuitive or fun, but I thought I’d try some new things out, which is what kept me in the contest. CG still offers one of the best platforms for contests, if not the best.

I switched a few times between standard beam search and Chokudai’s version, and I think Chokudai’s version is great because it’s a far better mix of exploration and exploitation.
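For reference, a minimal sketch of Chokudai search (one priority queue per depth; names and details are my own, not the author's): each pass pops the single best state at every depth and pushes its children one level deeper, so the search keeps widening the most promising lines instead of committing to a fixed beam width.

```python
import heapq
import itertools

def chokudai_search(root, expand, max_depth, iterations):
    """Chokudai search sketch. `expand(state)` yields (score, child) pairs,
    higher score = better. Returns the best state found at max_depth."""
    tie = itertools.count()  # tie-breaker so the heap never compares states directly
    queues = [[] for _ in range(max_depth + 1)]
    heapq.heappush(queues[0], (0.0, next(tie), root))
    for _ in range(iterations):  # in a real bot: loop until the time budget runs out
        for depth in range(max_depth):
            if not queues[depth]:
                continue
            _, _, state = heapq.heappop(queues[depth])  # best state at this depth
            for score, child in expand(state):
                # scores are negated because heapq is a min-heap
                heapq.heappush(queues[depth + 1], (-score, next(tie), child))
    if not queues[max_depth]:
        return None
    return min(queues[max_depth])[2]  # smallest negated score = best final state
```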

Evaluating the opponent didn’t do much for me, so at most I evaluated WAIT. COMPLETEs were useful to evaluate, but I didn’t include that in my final bot. It’s just strange how much better things went when certain parts were left out entirely.


Thanks for the write up, super interesting approach!
Just a few questions:

  • On what architecture did you do the training? Even at only 5 seconds per game, 700,000 games is a cumulative ~40 days of game play, so you’d need a lot of cores to get it done quickly.
  • What structure do you use to encode your neural network’s weights? 534 inputs + 2 dense layers of 96 nodes + 38 outputs is a lot of weights (probably over 60,000, so it needs good compression to cram into the 100 KB CG limit!).
  • Why encode the day, nutrients, and sun count in binary rather than as floats?

I’ll finish 10th-20th in legend with a “staged” beam search continuing the actions of the current day plus waits, more or less the same as eulerscheZahl.
I have the “regular” action pruning others described, nothing fancy.

I tried to help my evaluation with some arbitrary day limits (12 & 15) where I drop the expected-suns bonus; it seems this suited the meta.
I struggled a lot with the coefficients; I’ll work on using (existing) tools to find the best parameters next time…

For the endgame (day > 20), I simulate a greedy opponent that completes as soon as possible; that helped me a lot in the end.

Performance-wise, I get around 40k actions in 50 ms (that’s not a lot, but I’m using Java, so…).