Spring Challenge 2021 - Feedbacks & strategies

rafaelSorel1 · May 17, 2021, 7:19pm

Thanks for the feedback though, what was the search depth per day and the overall average of processed nodes for the 6 days ?

chabes · May 17, 2021, 7:33pm

Finished 11th, always disappointing position.
But ok, I was hoping to be within top 100

Search

I used beam search while maintaining all nodes on the same day even if it meant waiting both player in some situation. This was to ensure that all evaluations are comparable.
The beam width was readjusted after each turn trying to achieve a simulation depth of 25 moves (including the wait), and with a minimum width of 75. This way, at the end of the game, I could enlarge the width up to 5000+ while always reaching day 24.

For each node, I first evaluate one turn ahead the possibles moves of the opponent (with myself waiting) and choose the best move for him. Then I search my different moves considering the opponent always play that move.

While searching, within a day, I always play move in a certain order : COMPLETE > GROW > SEED, GROW 1 > GROW 2, etc.
As described by orthers, I should have sorted GROW by tree size (reverse) and not target cell.

I rebuild the full search at each turn.

Moves pruning
I only allow one seed one the board except on day 23 (some chance to win in case of identical score).
I disregard COMPLETE before day 7
I disregard SEED if the target is right next to one of my trees, unless the map is already very full.

Scoring

score points
sun points * (0.23 --> 0.33 while days progress)
for each tree, a probability to cut it before end of game (depending on tree size and current day) * (-1.33 + nutrients*0.5 + bonusRichness()). The -1.33 is to take into account the cost in sun points of a complete.
a simulation of the sun points to collect until the end of the game while waiting, * 0.33

adding all that for myself and substracting for the opponent

struct · May 17, 2021, 7:35pm

The search depth per day is based on the number of actions I can do on that turn, on middle game the days are longer.
I calculated around 1.5mill states in 90ms.

Fluxor · May 17, 2021, 7:49pm

Hi all, I ended up #33 in LEGEND with beam search.

I think this is my first PM, but since reading other PM is the most enjoyable (and instructive) part of the contest, I feel compelled to write my own.

Algo
My origin plan was to start with MC to investigate game simulation easilier, then to integrate into MCTS, and then maybe go for D-UCT depending on the MCTS result.
But the huge branching factor (and the lack of time with 2 children) quickly made me change my mind, so I went for what is now called “staged” beam search with 3 queues (current, next intra-day step, and next day).

Sorting and pruning happens only in the next day queue. The intermediate steps are fully explored (exploration is equivalent to BFS). I could afford this because I had efficient game representation and simulation code (since I originally planned to do billions of rollout in my stillborn MCTS).

Move selection
I really do not cut much when picking next move:

I can seed anytime from any tree, but with the condition that the 3-sized shadow from the seed does not cover any of my existing trees. This is very defensive and probably loose attacking opportunities, but at the same time it gives me full flexibility in later grow orders.
I do not try to order anything at all, I just allow all valid actions in any order at any time…

At least I discard sequences of moves that result in the exact same [trees, dormant, score, sun] state, though it is a pity, as different shadow order could lead to different evaluations. So I think that generating moves in proper order (complete-> grow 3, 2, 1 -> seed) like most did is better…

Evaluation
My evaluation uses the waiting and growing opponents as described by @eulersheZahl
It is based on the following big tuple:

3 * score + sun
potential shadow over growing opponent
richness (weighted by tree size)
number of trees (weighted by tree size)
sun left
potential shadow over waiting opponent
guaranteed production against growing opponent

These are distinct integers values in a tuple.
I compare these one by one, looking at the next only in case of equality.

I did not try to mix them with magic constants because I hate spending hours searching for “right” magic values (yes, I love spending hours searching for the right order instead…)

A little anecdote
After my MC attempt, I forgot to remove a “if both player are waiting, switch to next day” from my applyMove method.
Because of this, I was switching days twice after each of my WAIT in my simulation… but only when my opponent was waiting and I was prompted to play alone!
It took me 2 days to find out why all my heuristic improvements made things worse: each time I was getting a sun advantage, forcing my opponent to wait, I started to play non-sense (but non-sense which is close to impossible to detect for mere humans…)

Conclusion
I miss the t-shirt again, and Amadeus looses by 4 points to Thales (congrats to them)… but I am still happy with the result. I really enjoyed the game, which is much more complex that it looks.

Congrats to @reCurse for the lesson… and to @Saelyos for winning the non-NN parallel contest.
And thanks to the CG team for a great contest again!

Illedan · May 17, 2021, 8:12pm

Lower part of legend as my bot never managed to climb back up to < 100, even with the same code. Guess it was too random.

I did create a GA within the first 24 hours and only played around the Eval for the remainder of the contest, as this contest screamed Beam the moment I looked at it, which felt too close to the previous contests and didn’t spark a joy. I hope the next one features interactions

Also learned this contest that the restrictions on IDE games were changed.
Which blocked my IDE playing on 2 days ( even the last night ), breaking my main source of fun with param tweaking as I’m used too Kinda sad this wasn’t published more broad before the contest, as it changes how people play. It is fine when you know it, but not when you expect the same setups as previously…

Anyway, most of the time was used with: https://twitter.com/erik_kvanli/status/1388841498955948032

GG to the winners and thanks to CG for providing the best platform!

Saelyos · May 17, 2021, 8:25pm

It’s not specific to DUCT, It’s something I implemented for this game (even if I’m not an expert of MCTS/UTC so I have know idea if it is something classical or not). The idea is that in this game the winner is determined by score difference and not by specific conditions (such as in UTTT for example). So in order to make moves less risky, it makes sens to try to reach the maximum score difference, otherwise the opponent could up come with a solution that I either pruned or with a specific sequence of actions that the MCTS don’t anticipate (which did append against Beam Search opponents who find an optimal way to optimize their score in late game), and if the MCTS was only targeting +1 score, it will be enough to lose.

For mid-game evals, it also makes sens to give an evaluation between 0 and 1 because you don’t know the winner at this point. The exploration parameter c of UCT needs to be adjusted appropriately.

Not completly I also had sometimes 20 possible seeds (but it was unusual). But an important part of the pruning is also the fact that I only allow seeding if there are no seeds, so even if I might have 400 children at some point if both players have no seed in play, it will only be at a specific depth and not for all nodes. Also remember that with DUCT you don’t have to explore those 400 children : once each player has tried his 20 actions, it will redo one of his actions based on UCT formula, exploring often some of the 400 children while other will never be visited.

reCurse · May 17, 2021, 8:39pm

History

Where to begin and how to explain? Maybe with some history. Because in my case, this is not the result of mere 10 days of competition, but rather years in the making. It all started in 2019 when @pb4 and @Agade broke all CSB records by submitting, to my knowledge, the very first overwhelmingly dominant bots based on neural networks (NN) and reinforcement learning (RL) at CG. I remember being completely floored by the achievement, and I am so thankful for all the help and pointers @pb4 generously provided to get me started on my own journey of machine learning (ML). For the next two years I have read dozens of papers on the subject, failed hundreds of attempts and made a few things work along the way (Coders Strike Back, Bit Runner 2048), as well as applying ML for a project at work.

Fast forward to around last Christmas, when I started building a new RL pipeline inspired from Deepmind’s AlphaGo/Zero. I like naming things but had no inspiration until this contest, so I am now calling it “Totozero”. It began, as usual, with having way too much curiosity and free time on my hands, a general desire to learn more ML, as well as accumulated years of frustration / hatred at being unable to gain a significant edge with my bot at Ultimate-Tic-Tac-Toe. At first I only wanted to finally see what near-perfect play was like, even if it could not run on CG. But one thing led to another, and after months of development and optimization, finally got something to work and briefly took the #1 spot.

It is a lot of hard work, but the payoff is so high when you get something to finally learn on its own, I could not stop there. Soon after the pipeline became more flexible and able to easily plug other games into it, such as Breakthrough, Othello and Checkers. Results were generally very good and achieved in a short time, though not as dominant as initially hoped. However the motivation faded soon after the first results, as I was much more interested in the process than the games themselves. (BUT if I get to chess one day…)

Contest Development

Given this context, when the contest started, the game rules created the perfect storm: board game, 1v1, perfect information, state with a simple representation, large state space, complex evaluation, 100ms per turn… I could not have asked for a better fit. Having all the ingredients and tools ready for RL, all that was needed is: code the simulation, plug it in the pipeline, figure out the right way to do training, pick the right neural architecture, let it run and do more tweaks along the way. Oh, and make it fit under contest constraints, which are more strict than the multiplayer games (100K source code, no “obfuscation”).

It still took a few days before finally starting to get results. First tests in IDE were disappointing, maybe around top 100 at the time? For some reasons, I initially thought having a “single-player” Monte Carlo Tree Search (MCTS) where the next state is obtained by simply predicting the opponent’s next move with my own chosen move would be enough. Maybe because I assumed opponent interaction to be minimal and therefore mostly ignorable by focusing on maximizing my own score. It turned out to be an awful direction. Self-play training hit difficult problems where it would often collapse into both sides WAITing most of the rounds. One side thought it was winning for sure so ended WAITing to accelerate the end of the game, which caused the move prediction to disproportionately output WAITs, which then made the other bot think it was winning as well because it assumed the opponent would WAIT and did minimal adjustments, so ended up mostly WAITing as well, which turned into a feedback loop with a side of self-fulfilling prophecy and matches averaging 30 turns. Even after adding hacks left and right, the results were still unconvincing.

I was afraid of biting the bullet of exponential action space by considering the opponent’s moves, but it had to be done. Enter Decoupled UCT (DUCT) into the mix, and at the end of the first training it… worked so unreasonably well.

No seriously, it was really hard to find losses against the top bots at the time. I have participated in competitions for years and it just never happens, it felt a bit surreal. To the point where nothing afterwards felt nearly as groundbreaking. Sure, a lot more time was still spent on dozens of fine-tuning here and there, while letting my GPU run hot like it’s in a cryptomining farm, and it did end up gaining around +600 self-play Elo rating throughout the process, after it was already hardly losing. This was measured by keeping a local arena with dozens of checkpoints playing against each other, acting as a sort of weak validation the training is not stagnating or regressing. I used Ordo to compute ratings, which is a great little tool for that. Sure, the arena does not confirm the bots are not stuck in a pointless little meta of their own, but that’s the gamble of self-play. The RL setting seems to help counteract a lot of the homogeneity problems that occur in local arenas like that, where the performance is often meaningless because of local exploitation / overfitting.

Silver League

My confidence was high when finally submitting my first version from the Bronze league on Saturday night, before it got hit by multiple losses in the Silver league.

Wait what?

After a bit of panic, the cause turned out to be something hard to believe. It was losing because the opponent was playing… too badly?!

By the midgame, the bot was getting such impossibly big leads, never seen during training (and not as visible during my testing in IDE against legend bots), that it caused the neural network to become complacent and incredibly unhelpful. It saw every single move, even after going through multiple plies, as winning with over 99.9% certainty. But convergence to a good move cannot be achieved without some contrast in the scoring signal to tell what’s good and what’s bad. Nope, “there is literally no way to lose from this position”. Except it lost by more or less playing random moves until the situation got sometimes too dire to recover from.

There comes a point in every programmer’s career where the need to “ship it” takes over everything else, and thus a disgusting hack was born. If the NN returns a too high win (or loss) percentage, the bot takes the state, computes the final score delta between self and opponent as if the game ended right there, and… multiplies that by a win percentage, adding it directly to the result. Yes, that means nonsense like 110% or even 200% winrate could be backpropagated. At least it’s a signal that converges into “winning more” and finally convert the game. If it works, is simple, gets the job done and you need to ship now… Well, ship it!

It turns out this was also sometimes helpful even in Legend league. It took a bit of tuning to get right, but it was definitely beneficial. I did sense a bit of irony plugging in magic hardcoded constants, old fashioned style, into something meant to not be needing it in the first place, but hey that’s life. Looking back now, I have several ideas that could more properly fix this issue, but that will be something to explore with much less time pressure.

The Devil in the Details

For good or (probably, mostly) bad reasons, I am not comfortable publically exposing too many details, so I apologize for that. Still, hopefully this will satiate most of your curiosity.

Bitboards were extensively used to simulate the game. Mainly motivated for training performance, not bot performance.
I only used minimal pruning, prefering to give more freedom for the AI to learn on its own. Only one source of seeding was allowed per cell, and it favors the biggest tree to do it. No completes on last turn if it lowers the final score. No grows on last turn. No seeding at all on last round since I did not think it should matter, something I was very wrong about.
As a result of minimal pruning, I have no idea if my bot violates common knowledge like “do not seed next to your tree” on purpose or by mistake. To be continued.
Only a few 1000s of states are evaluated on average each turn. This is often the source of bad plays in endgames where more bruteforce approaches would be strictly better.
The neural network is made of “many” hex-convolutional layers stacked like pancakes.
Its input is the hex grid stored in two dimensions with “many” channels per cell.
Its outputs are comprised of one value [-1,1] for win/lose confidence and two policies, one per player, to accomodate for asymmetry and simultaneous play.
Inference was done through extensive use of C++ templates, AVX2 intrinsics, and verification through disassembly.
Evaluations are cached in a hashtable and reused throughout the game.
One of the most common misconceptions on CG is that NNs are very hard or impossible to use because of the size limits. However only 71% of the maximum allowed source size was utilized to compile my bot. “100K ought to be enough for anybody!” (see encoding techniques already described in this thread, though personally I prefer bit streams :P)
As a sidenote, IMO the real constraints on CG has always been the low runtime on CPU before the code size. But I have often compared Codingame as being the demoscene of AI. I find having to make the best out of very limited (even if maybe arbitrary) constraints to be a beauty in itself.
The final model played approximately 1.3 million games in self-play for over 50 hours of training, using an Intel i7-8700k, a GeForce RTX 3080 and 64GB of well-utilized RAM. I lost track of the failed runs, and did not include the best model I ended up not using for safety reasons (only +45 elo), but the main one is where the vast majority of training went.
The local arena contained 85 checkpoints and played 189k matches in total.
For those concerned about the asymmetry of “AI hiding”, believe me or not I have run zero batches during this competition. Only maybe a couple of hundreds of plays in the IDE as random validation that things did not go wrong. I am however, regardless of that, still extremely opposed to CG crippling batches permanently as of this contest.
A note to CG for future contests, please make the mapgen code in a way that is easy to code in other languages than Java. Had to hardcode the result of that line, was luckily constant regardless of the seed.
Speaking of map generation, my NN is very unhappy with some of the maps coming out of the generator, because of the sun asymmetry. Sometimes it even estimated its chance of winning at 20% before the game even started…
Probably forgot to mention something.

Thanks to Codingame and the community for another great competition! I sincerely hope this will remain one of the main focus of this website for many years to come.

pardouin · May 17, 2021, 8:59pm

We were looking for a cool name for this contest but couldn’t find anything really catchy.

But now that’s it : TotoZero. Hands down.

chabes · May 17, 2021, 9:09pm

I have read everything. But I didn’t understand everything. Well done anyway, and thanks for sharing

GaborZeller · May 17, 2021, 10:00pm

Rank: 251
Language: TypeScript (transpiled to JS)

It was a great competition, many thanks to the admins and everyone involved in organising it.

General Strategy

Unlike most people who were using trees, neural networks and such I have used a hierarchical utility based approach.

Managing Game State

There is a very good article from redblobgames about hexagonal grids that helped me to understand how to work with an axial coordinate system so I ended up using matrixes, coordinates, vectors and dictionaries that mapped between cell indexes and coordinates for faster performance.

Utility functions

There are lot of great pieces about the topic, but the gist of it is that you have something called a utility function and after you pass into it a set of variables (in my case the entire game state) you get back a normalized value between 0 and 1 that is a representation of how valuable/desireable the action is.

So the order for a utility function is:

Clone the current state
Apply the action (e.g. SEED 7) to the cloned state
Calculate a normalized utility between 0 and 1 of the modified state for each utility function
Average the results

Example for grow:

const clonedGameState = cloneGameState(oldGameState);
const newGameState = applyActionToGameState(clonedGameState, playerAction);
const utility = average([
	calculateRelativeSunProducedForHalfCycleUtility(newGameState),
	calculateRelativeSunProducedForFullCycleUtility(newGameState),
	calculateStopGrowingTreesAtTheEndUtility(newGameState),
]);

Example for seed:

const clonedGameState = cloneGameState(oldGameState);
const newGameState = applyActionToGameState(clonedGameState, playerAction);
const utility = average([
	caluclatePreventSeedingTooEarlyUtility(newGameState),
	caluclatePreventSeedingAtTheEndOfGameUtility(newGameState),
	calculateMapCellsControlledUtility(newGameState),
	calculateAvoidCramnessUtility(newGameState),
	calculateAvoidSpammingSeedsUtility(newGameState),
	calculateAvoidCastingShadowsOnOwnTreesUtility(newGameState),
	calculateSeedAreasWithRichSoilUtility(newGameState)
]);

Hierarchical utility

As you can see the above examples are action specific, so in order to make this work I grouped together the actions to action-sets at each tick.

[WAIT, COMPLETE X, COMPLETE Y...]
[WAIT, GROW X, GROW Y...]
[WAIT, SEED X, SEED Y...]
[WAIT]

Check if completing a tree or waiting is better
If completing a tree is better than waiting do the action (exit)
If not move on to the next action set
Check if growing a tree or waiting is better
If growing a tree is better than waiting do the action (exit)
If not move on to the next action set
Check if seeding or waiting is better
If seeding is better than waiting do the action (exit)
If not then wait (exit)

Accounting for opponent actions

I did account for the locations and shadows being cast by the opponent’s trees, but did not account for the potential moves the opponent could make in a certain game state. I simply applied my action and calculated the utility.

What I liked / did not like about the approach

Quick prototyping and relatively good results quickly
Enabling/disabling small utility functions made it easy to discard functions that weren’t affecting the game much (or at all)
Sometimes it was difficult to wrap my head around the fact that a utility function was evaluating the state without the knowledge of which action changed the state (e.g. a utility function for grow didn’t know which tree was grown). Maybe shouldn’t have done it this way, but I wanted to stick to my choice

Thanks again, looking forward to the fall challenge.

R4N4R4M4 · May 17, 2021, 10:34pm

Hi everybody,

Thanks for this very nice challenge. I finished #207 Legend with my last Gold Code I couldn’t improve after passing the Boss. Nearly all the contest is streamed here https://youtu.be/qb_xfMaoqdg.

Early in Silver, I noticed 4 phases in the game in my heuristics :

Expansion : when you have to take cells but don’t do size 3 trees
Hatching : to have a maximum of size 3 trees without achieve them
Supremacy : to make game points achieving trees while maintaining sun points income
Conclusion : to make the most points possible

Day 0 to 4 : Play the best evaluated move in a list with : no complete, no seed from size 1 tree, grow big size first, only free seed with best cell pre-selected on 6 directions shadows (size 3) and richness
Day 5 to 11 : Play the best evaluated move in a list with same rules than in the first phase, but allowing growing size 3 trees before other actions
Day 12 to 18 : Play the best evaluated move in a list with same
rules than before, but allowing complete before other actions
Day 19 to 23 : For each move in a list, play till the end of game counting 1 if the game is won and -1 if the game is lost, 0 if draw. Choose the move which leads to the bigger number of wins. The list of moves is the same than before, except no grow are allowed on a tree that cannot be complete on the last turn, and there is no more limit to plant seeds. Player 1 always plays wait.

In Legend, I tried to replace move evaluation by depth 1 or depth 2 evaluation in each of the three first phases, but I didn’t find an evaluation good enough to do better. Maybe I’try again when multi is opened, or maybe other algos according to all feedbacks given here.

This was a great challenge, Thanks to Codingame and all the game makers, it is very nice and fun.

AntiSquid · May 17, 2021, 10:34pm

What exactly is the name of the ML algo you used? I see you mentioned alphazero, if it’s derived from that, what are the differences?

Please mention more details regarding the overall NN structure if you don’t mind.

Not sure what this means.

reCurse · May 17, 2021, 11:31pm

The ML algo is mostly coming from Alpha Go/Zero (I took ideas from both). The most important differences here are using hex convolutions and two policies as output. Of course there are dozens of minor deviations lying mostly in all the implementation details involved, which do matter, but that would be enough subject matter for an entire series of articles, e.g. you can take a look here for an example of how deep the rabbit hole can get.

You can look at this cheat sheet for a friendly overview of the type of NN architecture involved.

A hex-convolutional layer means it uses 6 neighbors for convolution instead of 8 like in a regular grid, in order to better mimic the spatial representation of the game. That way it correctly involves information about cells that have a distance of exactly 1.

Convolutions typically work with cartesian dimensions, and don’t directly work with hex grids. So I worked around it by using a conventional 2d grid as input for storing the hex grid, as pictured in the link I provided.

EDIT: Sure, I guess I could exhaustively write all the details of my pipeline, but I would view that as a form of retirement from competitive coding, if I end up giving away all my tricks and secrets. This is not a decision I am ready to make yet.

Antonin_L · May 18, 2021, 12:08am

Awesome, wasn’t even aware that you could use these kind of method in the context of this platform / game where you can’t import any kind of external file ? Being a data scientist myself with no pure computer science training I always use libraries (compatible with python 3) and store models in objects like joblib dumps or pickle, I hardly see where did you store the information of your trained network ? Anyway great theoretical and practical knowledge, well played !

reCurse · May 18, 2021, 12:13am

You can take any binary data and transform it into a string using the base64 algorithm, which you paste in your source code. Then you decode that string into a binary array at the beginning of your program.

However, Codingame has the particularity of counting the number of characters instead of the number of bytes as the source code limitation. So you can do a lot better than the 6 bits per character base64 gives you by using the admissible extent of a UTF-8 character representation, which brings you up to easily 15 bits per character.

Antonin_L · May 18, 2021, 12:16am

Impressive

AntiSquid · May 18, 2021, 2:10am

Not expecting you to share everything. I’ll accept all the details I can get though. Thanks.

jacek1 · May 18, 2021, 6:38am

It’s good to understand how NN (is it possible?) works, but also the training pipeline. My only known NN breakthrough bot was very leading until reCurse came and took over with 100% winrate against my bot. This motivated me to improve my training framework and given the same NN architecture it now wins against my old bot over 90% and is almost as good as reCurse’s one. And this training pipeline improved my other board games bots.

putibuzu · May 18, 2021, 8:54am

Impressive and inspiring stuff! Thank you for linking all the external resources and explaining so much

Di_Masta · May 18, 2021, 9:50am

Hi, I finished around 1200th. Hardly a good or “achievement/T-Shirt” rank, but I want to tell yo a few words.

Personal record:
For first time I’ve reached the Gold League in a contest (I’ve participated in around 7 contests before that) and I’m proud of myself

Lessons learned:
As always I’ve learned several things throughout the contest, which I’ll definitely try to follow the next time:

Implementing a heuristic(simple if else logic) bot, based on the current game state, without looking in the future is very very important. Until now I wasn’t a big fan of heuristic evaluations, pruning moves and learning/analyzing the game myself, I just wanted to implement a general bot, given the state to output a move.
But this time I’ve decided to start with a heuristic bot and it was worth it. It was fun, just chilling morning/afternoon programming, no fancy algorithms, no complex tree/graph data structures, no messing probabilities a dozen times, … only observing and trying new things. This really got me into the game and I was more and more eager to move forward.
For first time I’ve watched a streamer and saw him analyzing a game of the top players very deeply (something which I don’t do), which helped me coming up with new ideas myself. Also chatting with outer players was nice.
This helped me made a bot which SEEDs only when the seed cost is 0(on the best soil), COMPLETEs lately in the game and GROWs the trees on the best soil. And that’s it, this several lines of code moved me to SILVER league. In my opinion a heuristic bot could get you in GOLD, so dare to try
After that I was ready to make a simulation of the game with given commands sequence and look into future.
The simulation part is very very important for any kind of AI approach, so implementing fast and correct simulation is a must.
I was wondering how to test my simulation and came up with the idea to try classic Monte Carlo simulation approach, get the possible moves for the current turn, play as much random games from each one until the end as the time allows, choose the turn with most wins.
After fixing simulation bugs for 2 days I saw that my MC couldn’t reach a lot of simulations, but after incorporating the ideas from my heuristic bot the simulations increased and I was able to reach GOOOLD
Then I wanted to implement MCTS and almost got it, but unfortunately couldn’t polish it for submit, which could got me in LEGEND, I’ll definitely try it in the classic game later.
Time management is also very important, for first time I spent at least 3 hours per day for each day of the contest, which definitely paid off.

TODO:

I want to optimize my simulation and available commands gathering, maybe by using bitboards
Implement MCTS with random playouts
Implement MCTS with heuristic bot playouts
Implement MCTS with evaluation “playouts”
Implement MCTS with solver

The challenge was very fun