Spring Challenge 2021 - Feedbacks & strategies

I’m probably going to finish top 30 with a neural network based bot. As with the other NN bots I’ve done, it’s heavily AlphaZero based, which means that for each evaluated position the NN gives an evaluation of the current position (value) and a score for each possible move (policy) to help the MCTS pick good moves to explore.

Network
534 inputs (528 binary 1/0 and 6 floats):

  • Binary inputs:

    • 37 for blocked cells (richness 0 / trees)
    • 2x (One set for each player, current player first):
      • 37 x Dormant
      • 37 x Tree size 0
      • 37 x Tree size 1
      • 37 x Tree size 2
      • 37 x Tree size 3
      • 37 x Sun count (a 1 at the position matching the current sun count - if more than 37 sun, it’s just capped at 37)
    • 24 x day
    • 21 x nutrients
    • 1 x Current player sleeping (thinking about this afterwards, I’m not sure why this is here, as I would never evaluate a state where this was true?)
    • 1 x Opponent sleeping
  • 6 floating point inputs (3 per player):

    • Score / 100
    • Sun / 100
    • Total Sun Gathered / 100

Outputs:

  • 1 value (tanh) for win chance
  • 38 policy outputs (softmax). 1 for each hex and 1 for wait. I tried having more policy outputs to differentiate between complete/grow/seed but found the simpler version worked better. Seeding source was chosen deterministically based on the trees available.

Two hidden dense layers of 96 nodes with relu activation.
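
For reference, a minimal C++ sketch of what a forward pass through this kind of architecture could look like (my guess at the structure based on the description above, not the author’s actual code; the weights would be filled from the decoded blob):

    #include <algorithm>
    #include <array>
    #include <cmath>
    #include <utility>
    #include <vector>

    // Sizes taken from the description: 534 inputs, two hidden layers of 96,
    // 1 value output (tanh) and 38 policy outputs (softmax).
    constexpr int IN = 534, HID = 96, POL = 38;

    struct Dense {
        int in, out;
        std::vector<float> w, b;   // filled from the decoded weight blob
        Dense(int i, int o) : in(i), out(o), w(i * o, 0.0f), b(o, 0.0f) {}
        std::vector<float> forward(const std::vector<float>& x, bool relu) const {
            std::vector<float> y(out);
            for (int o = 0; o < out; ++o) {
                float s = b[o];
                for (int i = 0; i < in; ++i) s += w[o * in + i] * x[i];
                y[o] = relu ? std::max(0.0f, s) : s;
            }
            return y;
        }
    };

    struct Net {
        Dense l1{IN, HID}, l2{HID, HID}, value{HID, 1}, policy{HID, POL};

        // Returns the win-chance value and the 38 policy probabilities.
        std::pair<float, std::array<float, POL>> eval(const std::vector<float>& input) const {
            std::vector<float> h = l2.forward(l1.forward(input, true), true);
            float v = std::tanh(value.forward(h, false)[0]);
            std::vector<float> logits = policy.forward(h, false);
            std::array<float, POL> p{};
            float maxL = *std::max_element(logits.begin(), logits.end());
            float sum = 0.0f;
            for (int i = 0; i < POL; ++i) { p[i] = std::exp(logits[i] - maxL); sum += p[i]; }
            for (float& x : p) x /= sum;                 // softmax over the 38 outputs
            return {v, p};
        }
    };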

Search:

Sequential MCTS without rollouts; instead, the NN is evaluated at the leaf and its value is backpropagated. The exploration term is based on the policy rather than a fixed global constant (see the AlphaZero papers for more details).
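
For readers unfamiliar with policy-weighted exploration, here is a rough C++ sketch of the AlphaZero-style PUCT selection rule (a generic illustration of the technique, not necessarily this bot’s exact formula):

    #include <cmath>
    #include <vector>

    // One child edge of an MCTS node.
    struct Child {
        float prior    = 0.0f;   // policy output P(s, a) from the network
        int   visits   = 0;      // N(s, a)
        float valueSum = 0.0f;   // accumulated backed-up values W(s, a)
    };

    // AlphaZero-style PUCT: the exploration bonus is scaled by the network
    // prior instead of a single global constant. cPuct is a tunable constant.
    int selectChild(const std::vector<Child>& children, int parentVisits, float cPuct) {
        int best = 0;
        float bestScore = -1e9f;
        for (int i = 0; i < (int)children.size(); ++i) {
            const Child& c = children[i];
            float q = c.visits ? c.valueSum / c.visits : 0.0f;   // mean value so far
            float u = cPuct * c.prior * std::sqrt((float)parentVisits) / (1 + c.visits);
            if (q + u > bestScore) { bestScore = q + u; best = i; }
        }
        return best;
    }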

The gamestate was only updated after both players had chosen a move, but I gave my player’s moves priority in the submitted bot (e.g. if both players tried to seed I’d simulate it as mine seeding and the other bot failing, or give me the full nutrients and the enemy nutrients - 1), as allowing the enemy to always choose to exactly counter my move was causing problems in my search.

I did some pruning to cut down the number of moves for my bot (a rough sketch of this filtering follows the list):

  • No seeds on day 19->22
  • At most 2 seeds
  • On day 23, only have one seed target per turn (doesn’t matter where, just to increase tree count).
  • No growing on days 20+ unless it can get to a size 3 (eg on day 21 we only allow grow on size 1/2).
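
A hypothetical sketch of how those pruning rules could be expressed (the rules and day numbers come from the list above; the Move fields and the 23-day horizon are my assumptions for illustration):

    // Hypothetical move filter for the pruning rules above (not the author's code).
    struct Move {
        enum Type { WAIT, SEED, GROW, COMPLETE } type;
        int cell = -1;
        int currentSize = 0;    // size of the tree being grown
    };

    bool keepMove(const Move& m, int day, int mySeedCount) {
        if (m.type == Move::SEED) {
            if (day >= 19 && day <= 22) return false;   // no seeds on days 19-22
            if (mySeedCount >= 2) return false;         // at most 2 seeds at a time
        }
        if (m.type == Move::GROW && day >= 20) {
            // only grow if the tree can still reach size 3 by the last day (23):
            // e.g. on day 21 this keeps grows on size 1/2 trees only
            if (m.currentSize + (23 - day) < 3) return false;
        }
        return true;
    }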

Training:
Training was done via self play - my final submission was based on 700,000 games after I restarted on Sunday. Games were run in batches of 2000, with the last 3 million states (a state being the data from a single turn) uniformly sampled for training in between batches. This meant the network was trained on games from about the last 15 versions of itself (so it doesn’t overfit to new stuff).
The policy targets are the visit counts from the root MCTS node and the value target is the eventual result of the game (-1 for loss, 0 for draw, 1 for win).

Failures:
I was pretty convinced I needed to do something better about simultaneous moves, so from Wednesday onwards I spent my time trying to get a DUCT-based version working. It was getting close on Sunday (it could get into the top 100) but never reached the same strength as my sequential version, so I gave up and went back to tweaking my older version.

43 Likes

Hello,
I made a MC.
I remapped the cell indexes to a 64-bit integer like this during I/O:

25 24 23 22 ** ** ** **
26 11 10  9 21 ** ** **
27 12  3  2  8 20 ** **
28 13  4  0  1  7 19 **
** 29 14  5  6 18 36 **
** ** 30 15 16 17 35 **
** ** ** 31 32 33 34 **
** ** ** ** ** ** ** **

This allows all trees to be processed at the same time by doing binary shifts of
- bitBoard << 1,
- bitBoard >> 1,
- bitBoard << 8,
- bitBoard >> 8,
- bitBoard << 9,
- bitBoard >> 9.

This is useful both for computing the list of possible actions and for computing shadows and sun points.
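
For example, computing every cell adjacent to a set of trees in one pass might look like this (a minimal sketch; the validity mask to drop the ** cells is my addition, not described in the post):

    #include <cstdint>

    // Bit set for every real hex in the 8x8 remapping above, 0 for the '**'
    // cells; computed once from the layout at startup (hypothetical constant).
    uint64_t VALID = 0;

    // All cells adjacent to at least one tree in `trees`, in one pass.
    // Shifting can spill onto '**' cells, so the result is masked back
    // onto the valid board.
    uint64_t neighbours(uint64_t trees) {
        uint64_t n = (trees << 1) | (trees >> 1)
                   | (trees << 8) | (trees >> 8)
                   | (trees << 9) | (trees >> 9);
        return n & VALID;
    }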

27 Likes

Place: 25th in legend.

My bot is constructed around dynamic programming. If the days of the opponent’s tree cuts are fixed, the shadow mechanic is simplified to an energy gain of e - e / C (where e = t[1] + 2 * t[2] + 3 * t[3] is the maximal energy gain, t[] are the numbers of trees of each size, and C is a constant), and all richness is equal to medium richness, then the game becomes a solo optimization game, which can be solved entirely with dynamic programming.

The naive way would be to construct a multidimensional array day x nutrient x t[0] x t[1] x t[2] x t[3] x energy and fill it with score values for day < 24, nutrient <= 20, t[3] <= 5 and so on… but this array contains millions of elements and can’t be filled in a 100 ms turn. Instead I have a multidimensional array day x nutrient x t[0] x t[1] x t[2] x t[3] (actually day x nutrient x code, and I apply move generation and other things directly to “code” in a cached fashion) filled with lists of (energy, score) pairs. These pairs are constantly filtered to remove dominated ones. After heavy optimizations and a few tricks I was able to solve the solo optimization game in around 30 ms or less from any position or set of starting positions (the time doesn’t really differ between 1 starting position and 1000 starting positions).
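
To illustrate the dominance filtering of those (energy, score) lists, a small generic sketch (my own illustration, not the author’s code):

    #include <algorithm>
    #include <climits>
    #include <vector>

    struct ES { int energy, score; };   // one (energy, score) pair in a DP cell

    // Keep only non-dominated pairs: a pair is dominated if some other pair
    // has at least as much energy AND at least as much score.
    std::vector<ES> pareto(std::vector<ES> v) {
        std::sort(v.begin(), v.end(), [](const ES& a, const ES& b) {
            return a.energy != b.energy ? a.energy > b.energy : a.score > b.score;
        });
        std::vector<ES> front;
        int bestScore = INT_MIN;
        for (const ES& p : v) {
            if (p.score > bestScore) {   // strictly better score than any pair with >= energy
                front.push_back(p);
                bestScore = p.score;
            }
        }
        return front;
    }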

Of course this is the heart of my algorithm, but not the entire algorithm. In almost any game position I simulate the current player 2 days ahead with the opponent making no moves, and then apply dynamic programming to all the depth-2 children at once. Globally, I apply all of the aforementioned to play as the opponent against my dummy cut timings, and then as myself against the already calculated opponent cut timings. Also, some heuristics are added as modifications to the children’s score values.

This contest was fun, good job everyone! :slight_smile:

27 Likes

Place ~10-15 in legend

Algorithm

I use recursive principal variation search with iterative deepening, a hash table with Zobrist hashing, and a principal variation table to check potentially good moves first. The principal variation table is just a table where I store the last best move, indexed by recursion depth. The idea behind it is that the best move doesn’t change much based on the position, but I’m not sure how well it works here. At least it doesn’t make things worse, and it’s very easy to implement.
Minimax doesn’t work well if the recursive descent reaches different days, because it will try to force the day to end early so it can make actions on the next day. So I consider a fixed number of moves per day (2 or 3 moves for each side work equally well) and then go to the next day. This basically means that I switch days at a fixed depth in the recursive descent.
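
As a concrete illustration of a principal variation table indexed by recursion depth, something roughly like this (a hypothetical sketch, not the poster’s code):

    #include <algorithm>
    #include <array>
    #include <vector>

    struct Move { int cell = -1; int type = 0; };   // placeholder move representation

    constexpr int MAX_DEPTH = 32;
    std::array<Move, MAX_DEPTH> pvTable;            // last best move found at each depth

    // Try the remembered PV move first by moving it to the front of the list.
    void orderMoves(std::vector<Move>& moves, int depth) {
        const Move& pv = pvTable[depth];
        auto it = std::find_if(moves.begin(), moves.end(), [&](const Move& m) {
            return m.cell == pv.cell && m.type == pv.type;
        });
        if (it != moves.end()) std::iter_swap(moves.begin(), it);
    }

    // Called after the search at `depth` has determined its best move.
    void storePv(int depth, const Move& best) { pvTable[depth] = best; }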

Estimation function

The position is evaluated by a function that finishes the game without player moves. If the initial day when we started minimax is 12 or higher, it will harvest all possible trees on the last day, sorted by richness, when it’s profitable. Last-day harvesting is simultaneous for both players. The final result is calculated as score * 3 + sun.
I tried different ideas to improve the estimation function, like considering trees with size less than 3 at the end, or growing all possible trees each day with a greedy strategy, but nothing improved the play.

Moves choosing

In the minimax function I consider only GROW and COMPLETE moves. COMPLETE moves are considered only if the start day is 12 or higher. By start day I mean the day when we start the minimax search, not the day we reach in the recursive descent; otherwise it would give different results for different calculation depths.
Moves are sorted by the shadows of the next day: for COMPLETE moves, shadowed trees are considered first; for GROW moves, trees that will outgrow the shadow.

Seed moves

The effect of seed moves is deferred for a few days, so minimax doesn’t work well on them (or I just failed to write a good estimation function). So I use a simple strategy: throw a new seed if it’s free. The exact place is chosen using a shadow map - how many shadows will fall on a cell until the game ends, and how many of our/the enemy’s trees would be shadowed by the seed if it grows each day.

16 Likes

I will probably finish somewhere between 100-150 legend. I ran with a pretty standard MCTS, because I didn’t think I would manage a decent eval. However, I had to cut some corners to get decent enough performance for it to work. Here are the specific things I did:

Select:
Very standard for an MCTS, with an exploration constant roughly tuned to 0.95.
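
(For reference, a generic sketch of UCB1 selection with such a constant; the names are placeholders, not this bot’s code.)

    #include <cmath>

    // Generic UCB1: mean result plus an exploration bonus weighted by C (~0.95 here).
    double ucb1(double valueSum, int childVisits, int parentVisits, double C = 0.95) {
        if (childVisits == 0) return 1e9;   // unvisited children are tried first
        double exploitation = valueSum / childVisits;   // mean rollout result in [-1, 1]
        double exploration  = C * std::sqrt(std::log((double)parentVisits) / childVisits);
        return exploitation + exploration;
    }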

Expand:
I allowed way more things than others; that’s definitely something I could have improved to get more performance, because reading other PMs I realize I had branches in my tree that just don’t make sense.
My only rules were: don’t SEED if you’ve already got two, and never SEED next to your own trees.

Rollout:
Here I really cut corners. First of all, no SEEDs in rollouts, and rollouts limited to the next 10 days.
Basically what this does is tell me if I would win the game if it were to kinda stop right now.
Obviously this is less than ideal and probably explains why I didn’t rank higher.

Backpropagation:
Very standard: -1, 0 or 1 depending on whether the game is a loss, draw or win according to the rollout.

Tree reuse:
I reuse my tree as long as I can find the move the opponent played. As long as the opponent didn’t play more than one move or one move plus a wait, I’ll reuse my tree. Otherwise I will start fresh. I thought of exploring more to find the right node when the opponent played multiple moves while I sleep, but it didn’t look like it would bring much.

Other than that I only had the usual precalcs for shadows and seed ranges.

I’m very happy I managed to get to legend with a MCTS, it wasn’t obvious given the necessary adaptations for simultaneous play. I think some more pruning in the possible actions and some more precise rollouts would have gone a long way retrospectively.

Great contest again, see you next time!

17 Likes

Hello, here is my postmortem (finished around #30)

The Beam Search

I implemented a beam search exactly as described by eulerscheZahl, with a similar method for filtering moves. My eval consists of (a rough sketch follows the list):

  • my sun - my opponent sun
  • a weighted sum of the sun income difference between my bot and the opponent for the following days,
  • the score earned,
  • a malus for trees under the shadow of my own trees,
  • a malus if the income is under a certain threshold, fixed for each day
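
A hypothetical sketch of the shape such an eval could take (every weight, field and helper value below is a placeholder; only the structure follows the list above):

    #include <array>

    // Hypothetical, precomputed view of a beam-search state.
    struct EvalInput {
        int day;
        double mySun, oppSun;
        double scoreEarned;                      // score gained along this branch
        int selfShadowedTrees;                   // my trees shadowed by my own trees
        std::array<double, 24> myIncome;         // projected sun income per day
        std::array<double, 24> oppIncome;
        std::array<double, 24> incomeThreshold;  // per-day minimum income target
    };

    double evaluate(const EvalInput& s) {
        double eval = s.mySun - s.oppSun;                       // sun difference
        for (int d = s.day + 1; d < 24; ++d) {
            double w = 1.0 / (d - s.day);                       // arbitrary decaying weight
            eval += w * (s.myIncome[d] - s.oppIncome[d]);       // weighted income difference
        }
        eval += 3.0 * s.scoreEarned;                            // score earned (placeholder weight)
        eval -= 0.5 * s.selfShadowedTrees;                      // malus for self-shadowing
        if (s.myIncome[s.day] < s.incomeThreshold[s.day])       // malus under the day's threshold
            eval -= 2.0;
        return eval;
    }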

The opponent is simulated in a very basic way:

  • every opponent tree grows every day (except the first simulated day if it is dormant or his sun is too low),
  • the opponent has a “free” complete everyday starting from day 12, reducing the amount of nutrients.

Late in gold I started to manipulate the overall strategy of my bot by modifying the income threshold in the eval and the estimated nutrients for each simulated day.
I thought it would be possible to adapt this strategy depending on what my opponent plays:

my opponent completed a lot of trees early :
-> nutrients will go down more slowly as his sun income is lower, so a long term strategy involving a high sun income might be more viable

my opponent has an excessive amount of size 3 trees and did not complete much
-> nutrients will go down abruptly near the end so completing fast might be a good idea,

I’m actually not sure if this is a good idea, as I did not experiment with this much.
I went another way:

The DUCT experiment

As legend opened, I tried to progressively move to a Decoupled UCT. The top 3 of the previous challenge did it with great success, so why not?

But I slightly underestimated the work required to make it functional. Mixing different algorithms might be strong, but it is time consuming.

After two days of coding, my DUCT version (when activated from day 15) was performing as well as my beam search in the leaderboard, which was slightly frustrating.
So in the end, I reverted back to my beam search and fine-tuned a couple of magic numbers.

Conclusion

Pros: I learned something new
Cons: I won’t have a new T-shirt

Congrats to the winners, thanks to CG, it was fun !

21 Likes

Question for NN bots:

Have you considered using the current game state + every possible move, one at a time, to score it?
You are given the moves as input each time the state changes, so you could score each move individually.

About my Bot:

Chokudai’s version of beam search with random heuristics. This is what got me the best results.

My eval function wasn’t suitable for most of the contest; I didn’t focus much on it either, and I regret not starting with a heuristic bot instead. Each time I promoted it was actually due to some bugs in my eval, so I thought I’d just switch to some random heuristics.

Early Sunday I was at around 1000 in gold; the new bot went up to 300. I didn’t bother checking since then, but it proves a point.

It wasn’t my favorite board game and I didn’t find it very intuitive or fun, but I thought I’d try some new things out, which is what kept me in the contest. CG still offers one of the best platforms for contests, if not the best.

I switched a few times between standard beam search and Chokudai’s version, and I think Chokudai’s version is great because it’s a far better mix of exploration and exploitation.
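
For readers who haven’t seen it, here is a generic sketch of Chokudai search (my own illustration, not the poster’s code): one priority queue per depth; each sweep pops the single best state at every depth and pushes its children one level deeper, and the sweeps repeat until time runs out, which gradually widens the beam and mixes exploration with exploitation.

    #include <queue>
    #include <vector>

    struct Node {
        double eval = 0;
        // ... game state / move history would live here
        bool operator<(const Node& o) const { return eval < o.eval; }   // max-heap on eval
    };

    std::vector<Node> expand(const Node& n);   // hypothetical: children of a state
    bool timeLeft();                           // hypothetical: time budget check

    Node chokudaiSearch(const Node& root, int maxDepth) {
        std::vector<std::priority_queue<Node>> beams(maxDepth + 1);
        beams[0].push(root);
        while (timeLeft()) {
            for (int d = 0; d < maxDepth; ++d) {
                if (beams[d].empty()) continue;
                Node best = beams[d].top();
                beams[d].pop();
                for (const Node& child : expand(best)) beams[d + 1].push(child);
            }
        }
        return beams[maxDepth].empty() ? root : beams[maxDepth].top();
    }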

Evaluating the opponent didn’t do much for me, at least. So at most I evaluated WAIT. COMPLETEs were useful to evaluate, but I didn’t include this in my final bot. It’s just strange how much better things went when certain parts were left out entirely.

7 Likes

Thanks for the write up, super interesting approach!
Just a few questions:

  • On what architecture did you do the training? Even at only 5 seconds per game, 700,000 games is a cumulative ~40 days of game play, so you’d need a lot of cores to get it done quickly.
  • What structure do you use to encode your neural network’s weights? 534 inputs + 2 dense layers with 96 nodes + 38 outputs is a lot of weights to encode (probably over 60,000, so it needs good compression to cram into the 100k-character CG limit!)
  • Why encode the day, nutrients, and sun count in binary rather than float?
2 Likes

I’ll finish 10-20 in legend with a “staged” beam search over the continuing actions of the current day plus waits, more or less the same as eulerscheZahl.
I have “regular” action pruning as others described, nothing fancy.

I tried to help my evaluation with some arbitrary day limits (12 & 15) where I drop the expected sun bonus; it seems like that suited the meta.
I struggled a lot with the coefficients; I’ll work on using (existing) tools to find the best parameters next time…

For the endgame (day > 20), I simulate a greedy opponent that completes as soon as possible; that helped me a lot in the end.

Performance-wise, I get around 40k actions in 50 ms (that’s not a lot, but I’m using Java, so…).

12 Likes

Training was done on my PC (i7-8700K) using 12 parallel threads of games. The self-play data generation is CPU-based, with the network training done using tensorflow-gpu in between self-play batches. The self-play games use a fixed number of MCTS iterations rather than time limits (set to 1500 iterations per turn) so turns don’t take as long. When running on CG there would be a lot more iterations per turn (around 20-30k). My self-play history size was chosen to not cause me to run out of RAM (by repeatedly lowering it every time I ran out of RAM - I really need to change this away from Python).

They’re all 14-bit floats using 1 Unicode character per weight (using base65536, but the way CodinGame measures characters means that most 15/16-bit values would count as two characters, so I use 14-bit floats). The total number of weights is 64552, leaving just under 36k characters of code space (my final bot is 98002 characters).

Something jacek showed me on oware is that when what you want from a value isn’t linear, it often trains much better (or at least requires fewer weights / less time) if you split the value into different binary inputs instead of just passing the float value (in oware I went from 14 floats to 434 binary inputs and trained a much stronger bot in 1/10th the number of games - I was also able to significantly reduce the number and size of hidden layers). It’s easy to optimise the first layer of the NN to skip any 0s, so more binary inputs is mostly a code size issue. Originally everything was binary, but the number of inputs was causing code space issues, so I changed points / sun to be floats. Total sun was added later on, after analysing lots of games and it seeming to be a good indicator of win chance. I left the sun binary inputs there as the information seemed like it would probably be useful in both formats (more sun = better, but specific sun values allow certain moves).
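
For example, encoding a count as a capped one-hot block instead of a single float could look roughly like this (a sketch of the general idea, not the actual input layout):

    #include <algorithm>
    #include <vector>

    // Write a capped one-hot block into the input vector: a single 1.0 at the
    // position of the (capped) value, 0.0 elsewhere. The slot counts would
    // match the input layout (e.g. 37 for sun, 24 for day, 21 for nutrients).
    void oneHot(std::vector<float>& inputs, int offset, int slots, int value) {
        value = std::clamp(value, 0, slots - 1);   // e.g. a sun count above the range is capped
        std::fill(inputs.begin() + offset, inputs.begin() + offset + slots, 0.0f);
        inputs[offset + value] = 1.0f;
    }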

13 Likes

Hi everyone.
Finished 4th before the final rerun.
My strategy is based on … surprise … MCTS.

My version makes use of two trees, one for each player.
The main difficulty with this is that legal moves depend on the other player’s actions.
I handled that issue in the following way:

  • when computing the children of a node, give it all “potential” actions and not only actions that are legal for the current state. By “potential” I mean that the action would be legal if the opponent’s actions were different. For example, allow a GROW even if the price is too high, or allow a SEED even if the cell is occupied.
  • when making the node selection (i.e. when choosing a legal move), choose between the children that are really legal for the current state.

The goal was to reduce the branching factor.
To reduce it further I also strongly pruned the SEED actions.

Another important choice (but I am not sure it was a good idea) was to impose an order on actions during a day: first COMPLETE, then GROW for trees of size 2, size 1, size 0, and finally SEED.
Actions are usually cheaper in this order (the GROW price depends on the number of trees of the higher size).
For the sake of simplicity I also limited the number of actions during a day to only one per category (COMPLETE, GROW1, GROW2, etc.).

I noticed that several players have used neural networks with success. I can’t wait to read their PMs.

30 Likes

So how do you go about generating potential actions? I am guessing you need to prune a lot more. So how do you know what to keep?

I am a bit confused about how you’d store this in the search tree format. So, for example, for a player you generate a tree with all possible moves while the opponent is on WAIT the entire time?

1 Like

Nice!

I use bit masks to represent the state, too, and the __builtin_ctzll function to iterate through the trees when needed, although that was more of a convenience for my heuristic, since I could compute the shadows in constant time with the help of a few look up tables (I could’ve implemented my heuristic also using the bitmasks directly, but the potential speed gain was small in comparison to the effort).

Other stuff that required tree counting I did with __builtin_popcountll.
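
For anyone unfamiliar with these builtins, iterating over the trees stored in a bitboard looks roughly like this (generic sketch):

    #include <cstdint>

    // Visit every tree in a bitboard: __builtin_ctzll gives the index of the
    // lowest set bit, which is then cleared; __builtin_popcountll counts trees.
    template <typename F>
    void forEachTree(uint64_t trees, F&& visit) {
        while (trees) {
            int cell = __builtin_ctzll(trees);   // index of the lowest set bit
            visit(cell);
            trees &= trees - 1;                  // clear that bit
        }
    }

    int treeCount(uint64_t trees) { return __builtin_popcountll(trees); }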

Anyway, my ultra-fast sim didn’t get me further than ~1000th in gold. I tried MCTS at first, but made the mistake of implementing the tree as a hash table instead of an actual tree: it’s easier to handle transpositions and I don’t have to think hard about reusing the tree (just plug in the new fresh state and call it a day). However, hash tables are mighty slow compared to an actual pointer-based tree, so I could only do ~300 or so simulations in the early to mid-game, which is ridiculous. If I’d made MCTS use an actual tree, maybe I’d be alongside @Magus :slight_smile:

4 Likes

I finished 100-120 in legend using Java.
This was the most difficult contest I have ever participated in…
As a human, I didn’t understand what was going on on the board, or why I won or lost. Which move was bad or good…
I started the contest like every contest, with a simple Monte Carlo strategy.
Then I tried beam search, using every play, and it was not very good, so I tried to synchronize the days, and it improved a little… But my score function was very bad, and the improvements were based more on the heuristics than on the search.
Then I thought I could try MCTS for the end of the game, but the garbage collector was causing a lot of timeouts… So I forgot about it and tried my last approach:

Genetic Algorithm

I create two populations, one with my moves and one with the opponent’s moves, in an array[day][move], with a max of 5 moves per day.
Then I play all of my population against the opponent population and evolve both, based on the final board score reached.
My game simulation was not very fast, but fast enough to find good individuals…
I added a lot of play restrictions to the genes, like almost everyone…
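
A rough, generic sketch of this kind of coevolution loop (every name, size and operator below is a placeholder, not the poster’s code):

    #include <algorithm>
    #include <vector>

    // Hypothetical genome: a move id per [day][slot], up to 5 moves per day.
    struct Genome {
        int moves[24][5] = {};
        double fitness = 0;
    };

    double simulate(const Genome& mine, const Genome& opp);   // hypothetical: final board score diff
    Genome  offspring(const std::vector<Genome>& pop);        // hypothetical mutation/crossover

    void coevolve(std::vector<Genome>& myPop, std::vector<Genome>& oppPop, int generations) {
        for (int g = 0; g < generations; ++g) {
            // score every individual of each population against the whole other population
            for (Genome& me : myPop) {
                me.fitness = 0;
                for (const Genome& op : oppPop) me.fitness += simulate(me, op);
            }
            for (Genome& op : oppPop) {
                op.fitness = 0;
                for (const Genome& me : myPop) op.fitness -= simulate(me, op);
            }
            auto byFitness = [](const Genome& a, const Genome& b) { return a.fitness > b.fitness; };
            std::sort(myPop.begin(), myPop.end(), byFitness);
            std::sort(oppPop.begin(), oppPop.end(), byFitness);
            // replace the weakest individual of each side (selection details omitted)
            myPop.back()  = offspring(myPop);
            oppPop.back() = offspring(oppPop);
        }
    }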

15 Likes

How did you evaluate the genes to know what to keep? Just the endgame state? Or did you sim up to a certain depth and then check the sun score for the rest of the game?


3 Likes

Day < 16: board score at the last calculated day; day >= 16: endgame score… The board score is a lot of crap things, but the most important is “Sun Collected” - “Sun Wasted” (wasted meaning, e.g., growing a tree from size 1 to 2 with one size-2 tree already on the board wastes 1 sun point), plus some tree values… I did not fit these constants with local play, only by feeling…

2 Likes

I went with simultaneous MCTS with UCB1. The only restriction was don’t seed if you already have a seed, and the evaluation function was just win/lose/tie. It showed promise, but the problem was performance: not enough sims per turn. I was in the process of switching to a bitboard but ran out of time.

Thanks for sharing and congrats on your work!

Do you have a trick to quickly convert the state into NN inputs?

Hi, rank 3306 here!
Some amazingly inspiring ideas already posted above, can’t wait to try them out. What could I possibly add?
Well, I’ve translated my code into a few languages on my quest to get that sweet Legendary achievement. And I think it’s a great exercise in many ways. But, for me specifically, it helped unearth how the readability of my code remains questionable, even to this day. Can recommend for sure.
I don’t know many platforms, but doing this on CodinGame surely was enjoyable. See you guys this fall in Gold, hell, maybe even in Legendary :>

Update: looking through some post-mortems and seeing that I was moving in the right direction, even though I did not implement MCTS/beam search or anything like it, now that is truly inspiring :100:

10 Likes

Hello,
Rank 172 in Master with Java here. This is my second competition, and I think I used a beam search approach, if I understand it correctly.
I did not use the back end code.
Some methods I used were:
Copying a game:
Here I had a big mistake where two-dimensional arrays were not copied correctly, which mattered a lot and was hard to find. Thank god for unit tests.

Applying a move to a game, Calculating all possible moves for a game,
Calculating all plausible moves:
I would trim the move list by having varying max tree numbers (mostly [1, 2, 4, 6]) and by not completing while my opponent had no size-3 tree, which helped a lot when my opponent made a sell-off in the midgame in the hopes of winning. (Also selling when this happens would maybe have been better.)

This was my beam, kind of. I simulated a whole day of my own turns, until I reached a WAIT, to find the maximum value of a game. My opponent did nothing meanwhile.

Then I had shadowCalculation for all 6 directions and a potential shadow calculation where I simulated my opponent using all grow actions that were possible at that point.

Then there is my game-evaluating algorithm… Oh my… I think it had like 20 variable parameters, because I didn’t have much time and had no game-breaking fast ideas left on Sunday night to work with, so I made minor changes to close the gap from 60 gold to legend.
First the basic stuff: everything was transformed to points, so score was always 1 and sun was 1/3. I also had sun production for the next round at 1/6 first, then 1/3, and 0 on the last turn.
Lastly, each tree had a complex score depending on size ([1.1, 2.2, 4.4, 9.6]) until late game. I added decreasing scores for the next turns where it was not in a shadow, depending on size, and a decrease if it is in a potential shadow. Seeds also had decreasing penalties depending on neighboring trees of the same player. This made my last push possible, because I had totally thought that the penalty for shadows was enough, which was a big mistake. Lastly, some bonus score for the richness value of the cell. All of this was multiplied by a factor which decreased as turns went by, to make completes more useful.
For the endgame I had slightly different values where the base value of a tree was much lower, since trees are kind of worthless at the end if you did not complete them.

I also started completing on day 11 at the earliest.

In total it was a fun challenge, and I was happy to reach Master for the first time even without being able to code for three days. Next time I would like to have fewer fixed parameters and rather a better algorithm. I would also write even more unit tests, since I had like 3-4 pretty game-breaking, kind of invisible bugs. It was a little bit annoying to manage the parameters at the end. It got late and I got tired at 2 a.m. watching my last commit barely making it after an hour.

9 Likes