Spring Challenge 2021 - Feedbacks & strategies

My AI outputs the number of expands/rollouts (the first number) at each turn.

I didn’t hardcode the first turns because I reuse the MCTS tree at the next turn. So I let the MCTS compute what it must compute.

6 Likes

Finished at 300 Gold.

Bronze to Silver: sort all moves by tree index and print the first move, with self-shadow avoidance.
Silver to Gold: minimax at depth 1. I assume the opponent always grows his best tree to counter whatever move I make. I tried switching to a Genetic Algorithm for better performance and a deeper search, but the results were worse than minimax, so I switched back to minimax.

My final eval looks like this:

eval(playerId) {
	if (day == 23) {
		solution.score = 10 * (players[playerId].score + players[playerId].sun / 3);
		return;
	}

	for each tree size i {
		// penalize exceeding a hardcoded per-day cap on trees of that size
		if (numTrees[i] > cap[day][i])
			solution.score -= capPenalty * (numTrees[i] - cap[day][i]);
	}
	if (day < 22 && I'm cutting the last level 3 tree) {
		solution.score -= cuttingLastTreePenalty;
	}
	else if (playerId == 1 || (day > 10
		&& (day + tomorrow's recovery + current sun points >= 35))) {
		solution.score += 3 * players[playerId].score + players[playerId].sun;

		for (int d = 1; (d <= days_depth) && (d + day <= 22); d++) {
			solution.score += recoveryMultiplier[d] * recovery on dth day from today;
		}
	}
	else {
		for (int d = 1; d <= days_depth; d++) {
			solution.score += recoveryMultiplier[d] * recovery on dth day from today;
		}
		for each tree {
			// halved if the enemy shadows this position
			solution.score += pow(priority[pos.y][pos.x], 3) * tree size * dayMultiplier[d];
		}
	}
}
int cuttingLastTreePenalty = 10000000;
int recoveryMultiplier[6] = { 0, 10000, 5000, 3000, 2000, 1000 };
int dayMultiplier[6] = { 0, 5, 4, 3, 2, 1 };
5 Likes

~600th

I spent most of the contest working on Neural Network code. I thought a Neural Network was possible and would therefore dominate the competition. However, I was unable to get good convergence of the network.
The game had a state which could be represented by only ~45 numbers, or 341 if you used 8 planes for the two players and the 4 possible tree sizes. The game had too many possible actions (37x37 SEEDs+37 COMPLETE+37 GROW+WAIT) which made a straight-up AlphaZero implementation difficult because AZ learns a policy over the actions. A simplifying idea would be to learn only a value function.
I will mention two of the main approaches I tried:

  • Pseudo-AZ: Have an AI using the NN eval play itself and learn an ever-improving value function V from the (state, game_outcome) database. For example, I had a DUCT using the NN’s eval play itself.
  • Bellman equation: Inspired by Q-learning, have an epsilon-greedy AI play itself and create a database of states. Then apply the Bellman equation to learn via temporal difference learning. For every state you can calculate what the value of that state should be as a function of the estimates of the values of the following states. For a state S, the players have N0 and N1 actions. For all N0*N1 pairs of actions you simulate the following state S’ and fill up a payoff matrix using the NN’s eval of the S’ states. The payoff matrix can be solved with fictitious play to produce the optimal policies for both players. Then the value of the state can be calculated as the sum of policy0[i]*policy1[j]*payoff_matrix[i][j] (a sketch of this matrix-game step is shown below). The learning signal for the NN is then the difference between the estimate V(S) and the estimate calculated from all the following states S’.
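
For illustration, here is a minimal sketch of that last step (not the author's code; all names are mine): solving the payoff matrix with fictitious play, then computing the state value as the expected payoff under the resulting mixed policies.

#include <vector>

// payoff[i][j] = value of joint action (i, j) from player 0's point of view (zero-sum).
double solveMatrixGame(const std::vector<std::vector<double>>& payoff,
                       std::vector<double>& policy0, std::vector<double>& policy1,
                       int iterations = 1000) {
    int n0 = payoff.size(), n1 = payoff[0].size();
    std::vector<double> count0(n0, 0.0), count1(n1, 0.0);
    for (int it = 0; it < iterations; it++) {
        // Player 0 best-responds to player 1's empirical mixture so far.
        int best0 = 0; double bestVal0 = -1e18;
        for (int i = 0; i < n0; i++) {
            double v = 0;
            for (int j = 0; j < n1; j++) v += (count1[j] + 1e-9) * payoff[i][j];
            if (v > bestVal0) { bestVal0 = v; best0 = i; }
        }
        // Player 1 best-responds by minimizing player 0's payoff.
        int best1 = 0; double bestVal1 = 1e18;
        for (int j = 0; j < n1; j++) {
            double v = 0;
            for (int i = 0; i < n0; i++) v += (count0[i] + 1e-9) * payoff[i][j];
            if (v < bestVal1) { bestVal1 = v; best1 = j; }
        }
        count0[best0] += 1.0; count1[best1] += 1.0;
    }
    policy0.assign(n0, 0.0); policy1.assign(n1, 0.0);
    for (int i = 0; i < n0; i++) policy0[i] = count0[i] / iterations;
    for (int j = 0; j < n1; j++) policy1[j] = count1[j] / iterations;
    // Value of the state = sum of policy0[i] * policy1[j] * payoff[i][j].
    double value = 0;
    for (int i = 0; i < n0; i++)
        for (int j = 0; j < n1; j++)
            value += policy0[i] * policy1[j] * payoff[i][j];
    return value;
}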

Once I had a neural network, I submitted it to CG wrapped in a DUCT search, with only the network coefficients compressed in base65536 Unicode inside clear-text C++ code, so as not to violate the obfuscation rules. Coding this took a bit of time as well.

For a neural network bot, my ranking is very low, and I attribute this to a poor evaluation function. I found bugs, but I suspect I still have more, or simply an unsuitable RL algorithm to properly learn a good V function.

26 Likes

Around ~100th,
just an iterative fail-soft alpha-beta with selective moves for the first 16 rounds, and an evaluation estimating the final score (i.e. the number of suns that all trees will see, trying to work out the order of the forthcoming COMPLETEs).

Selectivity like others: just trying to remove as many SEEDs as possible… And in the end, only generating free SEEDs if no GROWs are generated. (Pushed me from gold to legend…)

Also used a 300,000-entry transposition table.
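
The post doesn't describe the table layout, but a typical fixed-size, depth-preferred transposition table for such a search might look like this sketch (field names are mine):

#include <array>
#include <cstdint>

struct TTEntry {
    uint64_t hash = 0;    // full Zobrist key, to detect index collisions
    int16_t  depth = -1;  // search depth the entry was computed at
    int16_t  score = 0;   // fail-soft alpha-beta score
    int16_t  bestMove = 0;
    uint8_t  flag = 0;    // 0 = exact, 1 = lower bound, 2 = upper bound
};

constexpr uint64_t TT_SIZE = 300000;
std::array<TTEntry, TT_SIZE> tt;

TTEntry* probe(uint64_t hash) {
    TTEntry& e = tt[hash % TT_SIZE];
    return e.hash == hash ? &e : nullptr;
}

void store(uint64_t hash, int depth, int score, int flag, int bestMove) {
    TTEntry& e = tt[hash % TT_SIZE];
    if (depth >= e.depth) {           // simple depth-preferred replacement scheme
        e.hash = hash;
        e.depth = (int16_t)depth;
        e.score = (int16_t)score;
        e.bestMove = (int16_t)bestMove;
        e.flag = (uint8_t)flag;
    }
}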

8 Likes

I’m probably going to finish top 30 with a neural network based bot. As with the other NN bots I’ve done, it’s heavily AlphaZero-based, which means that on each call the NN gives an evaluation of the current position (value) and a score for each possible move (policy) to help the MCTS pick good moves to explore.

Network
534 inputs (528 binary 1/0 and 6 floats):

  • Binary inputs:

    • 37 for blocked cells (richness 0 / trees)
    • 2x (One set for each player, current player first):
      • 37 x Dormant
      • 37 x Tree size 0
      • 37 x Tree size 1
      • 37 x Tree size 2
      • 37 x Tree size 3
      • 37 x Sun Count (1 in the position for current sun count - if more than 37 sun then just set to 37)
    • 24 x day
    • 21 x nutrients
    • 1 x Current Player Sleeping (thinking about it afterwards, I’m not sure why this is here, as I would never evaluate a state where this was true?)
    • 1 x Opponent sleeping
  • 6 floating point inputs (3 per player):

    • Score / 100
    • Sun / 100
    • Total Sun Gathered / 100

Outputs:

  • 1 value (tanh) for win chance
  • 38 policy outputs (softmax). 1 for each hex and 1 for wait. I tried having more policy outputs to differentiate between complete/grow/seed but found the simpler version worked better. Seeding source was chosen deterministically based on the trees available.

Two hidden dense layers of 96 nodes with relu activation.

Search:

Sequential MCTS without rollouts; instead the NN is evaluated at the leaf and its value is backpropagated. The exploration term is based on the policy rather than a fixed global value (see the AlphaZero papers for more details).
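
For readers unfamiliar with it, the AlphaZero-style selection rule (PUCT) being referred to looks roughly like this, where the prior from the policy head scales the exploration term instead of a single global constant (sketch with my own names; the cPuct value here is arbitrary):

#include <cmath>
#include <vector>

struct Child {
    double prior = 0.0;    // P(s,a) from the policy head
    double valueSum = 0.0; // sum of backpropagated NN values
    int    visits = 0;
};

int selectChild(const std::vector<Child>& children, double cPuct = 1.5) {
    int parentVisits = 0;
    for (const Child& c : children) parentVisits += c.visits;
    double sqrtParent = std::sqrt((double)parentVisits + 1.0);

    int best = 0; double bestScore = -1e18;
    for (int i = 0; i < (int)children.size(); i++) {
        const Child& c = children[i];
        double q = c.visits ? c.valueSum / c.visits : 0.0;            // exploitation term
        double u = cPuct * c.prior * sqrtParent / (1.0 + c.visits);   // policy-scaled exploration
        if (q + u > bestScore) { bestScore = q + u; best = i; }
    }
    return best;
}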

Gamestate was only updated after both players had chosen a move, but I gave my own moves priority in the submitted bot (e.g. if both players tried to seed, I’d simulate it as mine seeding and the other bot failing, or give me the full nutrients and the enemy nutrients - 1), as allowing the enemy to always choose to exactly counter my move was causing problems in my search.

I did some pruning to cut down the number of moves for my bot (a rough sketch of this filter follows the list):

  • No seeds on day 19->22
  • At most 2 seeds
  • On day 23 only have one seed target per turn (it doesn’t matter where, just to increase the tree count).
  • No growing on days 20+ unless it can get to a size 3 (eg on day 21 we only allow grow on size 1/2).
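
A rough reconstruction of those rules as a filter (my own code, not the author's; 0-based days and sizes as in the game, mySeeds = number of my size-0 trees):

#include <string>

bool allowMove(const std::string& type, int day, int treeSize,
               int mySeeds, bool seededThisTurn) {
    if (type == "SEED") {
        if (day >= 19 && day <= 22) return false;          // no seeds on days 19-22
        if (mySeeds >= 2) return false;                    // at most 2 seeds at once
        if (day == 23 && seededThisTurn) return false;     // one seed target on day 23
    }
    if (type == "GROW" && day >= 20) {
        // Only grow if the tree can still reach size 3 in time to be completed.
        if (treeSize < day - 20) return false;
    }
    return true;
}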

Training:
Training was done via self-play - my final submission was based on 700,000 games after I restarted on Sunday. Games were run in batches of 2000, with the last 3 million states (a state being the data from a single turn) uniformly sampled for training in between batches. This meant the network was trained on games from about the last 15 versions of itself (so it doesn’t overfit to new stuff).
The policy targets are the visit counts from the root MCTS node and the value target is the eventual result of the game (-1 for loss, 0 for draw, 1 for win).

Failures:
I was pretty convinced I needed to do something better about simultaneous moves so spent Wednesday onwards trying to get a DUCT based version working. It was getting close on Sunday (could get into top 100) but never got to the same strength as my sequential version so I gave up and went back to trying to tweak my older version.

43 Likes

Hello,
I made a MC.
I remapped the cell indexes to a 64-bit integer like this during I/O:

25 24 23 22 ** ** ** **
26 11 10  9 21 ** ** **
27 12  3  2  8 20 ** **
28 13  4  0  1  7 19 **
** 29 14  5  6 18 36 **
** ** 30 15 16 17 35 **
** ** ** 31 32 33 34 **
** ** ** ** ** ** ** **

This allows all trees to be processed at the same time by doing binary shifts of
- bitBoard << 1,
- bitBoard >> 1,
- bitBoard << 8,
- bitBoard >> 8,
- bitBoard << 9,
- bitBoard >> 9.

This is useful both for computing the list of possible actions and for computing shadows and sun points.
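
As a sketch of the idea (assuming bit i = row * 8 + col in the 8x8 picture above, with the top-left cell as bit 0 - my convention, not necessarily the author's), the six shifts plus a mask of the 37 valid cells give all neighbours of all trees at once:

#include <cstdint>

constexpr uint64_t rowMask(int row, int firstCol, int lastCol) {
    uint64_t m = 0;
    for (int c = firstCol; c <= lastCol; c++) m |= 1ULL << (row * 8 + c);
    return m;
}

// One bit per real board cell, matching the remapped layout above.
constexpr uint64_t VALID_CELLS =
    rowMask(0, 0, 3) | rowMask(1, 0, 4) | rowMask(2, 0, 5) | rowMask(3, 0, 6) |
    rowMask(4, 1, 6) | rowMask(5, 2, 6) | rowMask(6, 3, 6);

uint64_t neighborsOfAll(uint64_t trees) {
    // Each shift moves every tree one step in one hex direction at once;
    // masking removes bits that fall outside the board.
    return ((trees << 1) | (trees >> 1) |
            (trees << 8) | (trees >> 8) |
            (trees << 9) | (trees >> 9)) & VALID_CELLS;
}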

27 Likes

Place: 25th in legend.

My bot is constructed around dynamic programming. If the days of the opponent’s tree cuts are fixed, the shadow mechanic is simplified to energy gain = e - e / C (where e = t[1] + 2 * t[2] + 3 * t[3] is the maximal energy gain, t[] are the numbers of trees of each size and C is a constant), and all richness is treated as medium richness, then the game becomes a solo optimization game, which can be solved entirely with dynamic programming. The naive way would be to construct a multidimensional array day x nutrient x t[0] x t[1] x t[2] x t[3] x energy and fill it with score values, with day < 24, nutrient <= 20, t[3] <= 5 and so on… but this array contains millions of elements and can’t be filled in a 100 ms turn. Instead I have a multidimensional array day x nutrient x t[0] x t[1] x t[2] x t[3] (actually day x nutrient x code, with move generation and other things applied directly to the “code” in a cached fashion) filled with lists of (energy, score) pairs. Dominated pairs are constantly filtered out. After heavy optimizations and a few tricks I was able to solve the solo optimization game in around 30 ms or less from any position or set of starting positions (the time barely differs between 1 starting position and 1000 starting positions).
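
As an illustration of the "filter dominated pairs" step (my own sketch, not the author's code): a pair is dropped whenever another pair in the same cell has at least as much energy and at least as much score.

#include <algorithm>
#include <climits>
#include <vector>

struct ES { int energy; int score; };

void filterDominated(std::vector<ES>& pairs) {
    // Sort by energy descending, breaking ties by score descending.
    std::sort(pairs.begin(), pairs.end(), [](const ES& a, const ES& b) {
        return a.energy != b.energy ? a.energy > b.energy : a.score > b.score;
    });
    std::vector<ES> kept;
    int bestScore = INT_MIN;
    for (const ES& p : pairs) {
        // Everything already kept has at least as much energy as p, so p
        // survives only if it offers strictly more score than all of them.
        if (p.score > bestScore) {
            kept.push_back(p);
            bestScore = p.score;
        }
    }
    pairs.swap(kept);
}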

Of course this is the heart of my algorithm, but not the entire algorithm. In almost any game position I simulate the current player 2 days ahead with the opponent making no moves, and then apply dynamic programming to all depth-2 children at once. Globally, I apply all of the above first as the opponent playing against my dummy cut timings, and then as myself playing against the opponent’s already-calculated cut timings. Also some heuristics are added as modifications to the children’s score values.

This contest was fun, good job everyone! :slight_smile:

27 Likes

Place ~10-15 in legend

Algorithm

I use recursive principal variation search with iterative deepening, a hash table with Zobrist hashing, and a principal variation table to check potentially good moves first. The principal variation table is just a table where I store the last best move, indexed by recursion depth. The idea is that the best move doesn’t change much between similar positions, but I’m not sure how well it works here. At least it doesn’t make things worse, and it is very easy to implement.
Minimax doesn’t work well if the recursive descent reaches different days at the same depth, because it will try to force the day to end early in order to make actions on the next day. So I consider a fixed number of moves per day (2 or 3 moves for each side work equally well) and then go to the next day. This basically means that I switch days at a fixed depth in the recursive descent.
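
A minimal sketch of such a principal variation table (placeholder types, my own code): store the best move found at each recursion depth and try it first the next time that depth is visited.

#include <algorithm>
#include <vector>

using Move = int;                 // placeholder move encoding
constexpr int MAX_DEPTH = 64;
Move pvTable[MAX_DEPTH] = {};     // last best move found at each recursion depth

void orderMoves(std::vector<Move>& moves, int depth) {
    // Try the previously best move at this depth first, if it is still available.
    auto it = std::find(moves.begin(), moves.end(), pvTable[depth]);
    if (it != moves.end()) std::iter_swap(moves.begin(), it);
}

void rememberBestMove(Move m, int depth) { pvTable[depth] = m; }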

Estimation function

A position is evaluated by a function that finishes the game without player moves. If the initial day when we started minimax is 12 or higher, it will harvest all possible trees on the last day, sorted by richness, whenever it’s profitable. Last-day harvesting is simultaneous for both players. The final result is calculated as score * 3 + sun.
I tried different ideas to improve the estimation function, like considering trees with size less than 3 at the end or growing all possible trees each day based on a greedy strategy, but nothing improved the play.

Moves choosing

In the minimax function I consider only GROW and COMPLETE moves. COMPLETE moves are considered only if the start day is 12 or higher. By start day I mean the day when we start the minimax search, not the day we reach in the recursive descent; otherwise it would give different results for different calculation depths.
Moves are sorted by next-day shadows: for COMPLETE moves, shadowed trees are considered first; for GROW moves, trees that will outgrow the shadow.

Seed moves

The effect of seed moves is deferred for a few days, so minimax doesn’t work well on them (or I just failed to write a good estimation function). So I use a simple strategy: throw a new seed if it’s free. The exact place is chosen by a shadow map: how many shadows will fall on a cell until the end of the game, and how many of our/enemy trees would be shadowed by the seed if it grows each day.

16 Likes

I will probably finish somewhere between 100-150 legend. I ran with a pretty standard MCTS, because I didn’t think I would manage a decent eval. However, I had to cut some corners to get decent enough performance for it to work. Here are the specific things I did:

Select:
Very standard for an MCTS, with an exploration constant roughly tuned to 0.95.
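
In other words, plain UCB1 with c ≈ 0.95 (sketch with my own node structure):

#include <cmath>
#include <vector>

struct Node {
    double wins = 0.0;            // sum of backpropagated results (-1 / 0 / 1)
    int    visits = 0;
    std::vector<Node*> children;
};

Node* selectChild(Node* parent, double c = 0.95) {
    Node* best = nullptr;
    double bestScore = -1e18;
    for (Node* child : parent->children) {
        if (child->visits == 0) return child;   // always try unvisited children first
        double exploit = child->wins / child->visits;
        double explore = c * std::sqrt(std::log((double)parent->visits) / child->visits);
        if (exploit + explore > bestScore) { bestScore = exploit + explore; best = child; }
    }
    return best;
}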

Expand:
I allowed way more moves than others; that’s definitely something I could have improved to get more performance because, reading other PMs, I realize I had branches in my tree that just don’t make sense.
My only rules were: don’t SEED if you’ve already got two, and never SEED next to your own trees.

Rollout:
Here I really cut corners. First of all, no SEEDs in rollouts, and rollouts limited to the next 10 days.
Basically what this does is tell me if I would win the game if it were to kinda stop right now.
Obviously this is less than ideal and probably explains why I didn’t rank higher.

Backpropagation:
Very standard: -1, 0 or 1 depending on if the game is a lose, draw or win according to the rollout.

Tree reuse:
I reuse my tree as long as I can find the move the opponent played. As long as the opponent didn’t play more than one move or one move plus a wait, I’ll reuse my tree. Otherwise I will start fresh. I thought of exploring more to find the right node when the opponent played multiple moves while I sleep, but it didn’t look like it would bring much.

Other than that I only had the usual precalcs for shadows and seed ranges.

I’m very happy I managed to get to legend with a MCTS, it wasn’t obvious given the necessary adaptations for simultaneous play. I think some more pruning in the possible actions and some more precise rollouts would have gone a long way retrospectively.

Great contest again, see you next time!

17 Likes

Hello, here is my postmortem (finished around #30)

The Beam Search

I implemented a beam search exactly as described by eulerscheZahl and with a similar method for filtering moves. My eval consists of:

  • my sun - my opponent’s sun,
  • a weighted sum of the sun income difference between my bot and the opponent over the following days,
  • the score earned,
  • a penalty for trees under the shadow of my own trees,
  • a penalty if the income is under a certain threshold, fixed for each day.

The opponent is simulated in a very basic way:

  • every opponent tree grows every day (except on the first simulated day if it is dormant or his sun is too low),
  • the opponent gets a “free” COMPLETE every day starting from day 12, reducing the amount of nutrients.

Late in gold, I started to manipulate the overall strategy of my bot by modifying the income threshold in the eval and the estimated nutrients for each simulated day.
I thought it would be possible to adapt this strategy depending on what my opponent plays:

If my opponent completed a lot of trees early:
-> nutrients will go down more slowly as his sun income is lower, so a long-term strategy involving a high sun income might be more viable.

If my opponent has an excessive number of size-3 trees and did not complete much:
-> nutrients will go down abruptly near the end, so completing fast might be a good idea.

I’m actually not sure if this is a good idea, as I did not experiment with this much.
I went another way:

The DUCT experiment

As legend opened, I progressively tried to switch to a Decoupled UCT (DUCT). The top 3 in the previous challenge did it with great success, so why not?

But I slightly underestimated the work required to make it functional. Mixing different algorithms might be strong, but it is time-consuming.

After two days of coding, my DUCT version (when activated from day 15) was performing as well as my beam search in the leaderboard, which was slightly frustrating.
So in the end, I reverted back to my beam search and fine-tuned a couple of magic numbers.

Conclusion

Pros: I learned something new
Cons: I won’t have a new T-shirt

Congrats to the winners, thanks to CG, it was fun !

21 Likes

Question for NN bots:

Have you considered using the current game state plus each possible move, one at a time, to score it?
You are given the moves as input each time the state changes, so you can score each move individually.

About my Bot:

Chokudai’s version of beam search with random heuristics. This is what got me the best results.
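
For readers who haven't seen it: Chokudai search keeps one priority queue per depth and repeatedly sweeps over the depths, popping the best state from each and pushing its children one level deeper, until time runs out. A rough sketch (my own code; State, expand and timeLeft are placeholders):

#include <functional>
#include <queue>
#include <vector>

struct State {
    double score = 0.0;
    bool operator<(const State& o) const { return score < o.score; }
};

State chokudaiSearch(const State& root, int maxDepth,
                     const std::function<std::vector<State>(const State&)>& expand,
                     const std::function<bool()>& timeLeft,
                     int width = 1) {
    std::vector<std::priority_queue<State>> beams(maxDepth + 1);
    beams[0].push(root);
    while (timeLeft()) {
        for (int d = 0; d < maxDepth; d++) {
            // Pop the best state(s) at this depth and expand them one level deeper.
            for (int w = 0; w < width && !beams[d].empty(); w++) {
                State s = beams[d].top(); beams[d].pop();
                for (const State& child : expand(s)) beams[d + 1].push(child);
            }
        }
    }
    // Best state found at the deepest level (fall back to the root if none reached it).
    return beams[maxDepth].empty() ? root : beams[maxDepth].top();
}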

My eval function wasn’t suitable for most of the contest; I didn’t focus much on it either, and regret not starting with a heuristic bot instead. Each time I got promoted, it was actually due to some bugs in my eval, so I thought I’d just switch to some random heuristics.

Early Sunday I was at around 1000 in gold; the new bot went up to 300. I didn’t bother checking since then, but it proves a point.

It wasn’t my favorite board game and I didn’t find it very intuitive or fun, but I thought I’d try some new things out, which is what kept me in the contest. CG still offers one of the best platforms for contests, if not the best.

I switched a few times between standard beam search and Chokudai’s version, and I think Chokudai’s version is great because it’s a far better mix of exploration and exploitation.

Evaluating the opponent didn’t do much for me, at least. So at most I evaluated WAIT. COMPLETEs were useful to evaluate, but I didn’t include them in my final bot. It’s just strange how much better things went when certain parts were left out entirely.

7 Likes

Thanks for the write up, super interesting approach!
Just a few questions:

  • On what architecture did you do the training? Even at only 5 seconds per game, 700,000 games is a cumulative ~40 days of game play, so you’d need a lot of cores to get it done quickly.
  • What structure do you use to encode your neural network’s weights? 534 inputs + 2 dense layers with 96 nodes + 38 outputs is a lot of weights to encode (probably over 60,000, so it needs good compression to cram into the 100k-character CG limit!)
  • Why encode the day, nutrients, and sun count in binary rather than float?
2 Likes

I’ll finish 10-20 in legend with a “staged” BeamSearch over the continuing actions of the current day plus waits, more or less the same as eulerscheZahl.
I have “regular” action pruning as others described, nothing fancy.

I tried to help my evaluation with some arbitrary day limits (12 & 15) where I drop the expected sun bonus; it seems like that fit the meta well.
I struggled a lot with the coefficients; I’ll work on using (existing) tools to find the best parameters next time…

For the endgame (day > 20), I simulate a greedy opponent that completes as soon as possible; that helped me a lot in the end.

Performance-wise, I get around 40k actions in 50 ms (that’s not a lot, but I’m using Java, so…)

12 Likes

Training was done on my PC (i7-8700K) using 12 parallel threads of games, CPU-based for generating self-play data, with the network training done using tensorflow-gpu in between self-play batches. The self-play games use a fixed number of MCTS iterations rather than time limits (set to 1500 iterations per turn) so turns don’t take as long; when running on CG there would be a lot more iterations per turn (around 20-30k). My self-play history size was chosen so as not to run out of RAM (by repeatedly lowering it every time I ran out of RAM - I really need to move this away from Python).

They’re all 14-bit floats using 1 Unicode character per weight (using base65536, but the way CodinGame measures characters means that most 15/16-bit values would count as two characters, so I use 14-bit floats). The total number of weights is 64552, leaving just under 36k characters of code space (my final bot is 98002 characters).
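
The exact 14-bit float format isn't given, so purely as an illustration of the principle (one code point per weight), here is a simple signed fixed-point variant - not the author's scheme:

#include <cstdint>

// Quantize a weight to 14 bits (range roughly [-8.192, 8.191], step 0.001),
// so each weight fits in a single Unicode code point below 2^14.
uint16_t encodeWeight(float w) {
    int code = (int)(w * 1000.0f) + (1 << 13);   // shift into [0, 2^14)
    if (code < 0) code = 0;
    if (code >= (1 << 14)) code = (1 << 14) - 1;
    return (uint16_t)code;
}

float decodeWeight(uint16_t code) {
    return ((int)code - (1 << 13)) / 1000.0f;
}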

Something jacek showed me on Oware is that when the effect of a value isn’t linear, it often trains much better (or at least requires fewer weights / less time) if you split the value into separate binary inputs instead of just passing the float value (in Oware I went from 14 floats to 434 binary inputs and trained a much stronger bot in 1/10th the number of games - I was also able to reduce the number and size of the hidden layers significantly). It’s easy to optimise the first layer of the NN to skip any 0’s, so more binary inputs is mostly a code-size issue. Originally everything was binary, but the number of inputs was causing code-space issues, so I changed points / sun to floats. Total sun was added later on after analysing lots of games and it seeming to be a good indicator of win chance. I left the binary sun inputs there as it seemed like they would probably be useful in both formats (more sun = better, but specific sun values allow certain moves).
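
As an example of what "splitting a value into binary inputs" means here, this is roughly how the 37 sun-count inputs from the list above would be filled (sketch, my own code):

#include <vector>

void encodeSunCount(int sun, std::vector<float>& inputs) {
    const int slots = 37;
    std::vector<float> oneHot(slots, 0.0f);
    int idx = sun;
    if (idx < 0) idx = 0;
    if (idx >= slots) idx = slots - 1;   // "if more than 37 sun then just set to 37"
    oneHot[idx] = 1.0f;
    inputs.insert(inputs.end(), oneHot.begin(), oneHot.end());
}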

13 Likes

Hi everyone.
Finished 4th before the final rerun.
My strategy is based on … surprise … MCTS.

My version makes use of two trees, one for each player.
The main difficulty with this is that legal moves depend on the other player’s actions.
I handled that issue in the following way:

  • when computing the children of a node, give it all “potential” actions and not only the actions that are legal in the current state. By “potential” I mean that the action would be legal if the opponent’s actions were different. For example, allow a GROW even if the price is too high, or allow a SEED even if the cell is occupied.
  • when doing node selection (i.e. when choosing a legal move), choose among the children that are really legal in the current state.

The goal was to reduce the branching factor.
To reduce it further I also strongly pruned the SEED actions.

Another important choice (but I am not sure it was a good idea) was to impose an order on actions during a day: first COMPLETE, then GROW for trees of size 2, size 1, size 0, and finally SEED.
Actions are usually cheaper in this order (the GROW price depends on how many trees of the target size you already own).
For the sake of simplicity I also limited the number of actions during a day to only one per category (COMPLETE, GROW1, GROW2, etc.).

I noticed that several players have used Neural Networks with success. I can’t wait to read their PMs.

30 Likes

So how do you go about generating potential actions? I am guessing you need to prune a lot more. So how do you know what to keep?

I am a bit confused about how you’d store this in the search tree format. For example, for a player do you generate a tree with all possible moves while the opponent is on WAIT the entire time?

1 Like

Nice!

I use bit masks to represent the state, too, and the __builtin_ctzll function to iterate through the trees when needed, although that was more of a convenience for my heuristic, since I could compute the shadows in constant time with the help of a few look up tables (I could’ve implemented my heuristic also using the bitmasks directly, but the potential speed gain was small in comparison to the effort).

Other stuff that required tree counting I did with __builtin_popcountll.
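
The usual pattern for these two builtins, for reference:

#include <cstdint>

void forEachTree(uint64_t trees) {
    int treeCount = __builtin_popcountll(trees);   // number of set bits = number of trees
    (void)treeCount;
    while (trees) {
        int cell = __builtin_ctzll(trees);         // index of the lowest set bit
        // ... process the tree on cell `cell` ...
        trees &= trees - 1;                        // clear that bit and continue
    }
}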

Anyway, my ultra-fast sim didn’t get me further than ~1000th in gold. I tried MCTS at first, but made the mistake of implementing the tree as a hash table instead of an actual tree: it’s easier to handle transpositions and I don’t have to think hard about reusing the tree (just plug in the new fresh state and call it a day). However, hash tables are mighty slow compared to an actual pointer-based tree, so I could only do ~300 or so simulations in the early to mid-game, which is ridiculous. If I’d made MCTS use an actual tree, maybe I’d be alongside @Magus :slight_smile:

4 Likes

I’ll finish 100-120 in legend using Java.
This was the most difficult contest I have ever participated in…
As a human, I didn’t understand what was going on on the board, why I won or lost, or which moves were bad or good…
I started the contest like every contest, with a simple Monte Carlo strategy.
Then I tried beam search, using every play, and it was not very good, so I tried to synchronize the days, and it improved a little… But my score function was very bad, and the improvements came more from the heuristics than from the search.
Then I thought I could try MCTS for the end of the game, but the garbage collector was timing out a lot… So I forgot about it and tried my last approach:

Genetic Algorithm

I create two populations, one with my moves and one with the opponent’s moves, as an array[day][move] with a max of 5 moves per day.
Then I play all of my population against the opponent population and evolve both, based on the final board score reached (a rough sketch of this loop is shown below).
My game simulation was not very fast, but fast enough to find good individuals…
I added a lot of restrictions on the moves the genes can play, like almost everyone…
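
A very rough sketch of such a co-evolution step (all names are mine, and simulate(), returning the final board score difference, is a placeholder, not the author's code):

#include <functional>
#include <vector>

constexpr int DAYS = 24, MOVES_PER_DAY = 5;

struct Individual {
    int moves[DAYS][MOVES_PER_DAY] = {};   // encoded moves per day, e.g. 0 = WAIT
    double fitness = 0.0;
};

void coevolutionStep(std::vector<Individual>& myPop, std::vector<Individual>& oppPop,
                     const std::function<double(const Individual&, const Individual&)>& simulate) {
    for (Individual& ind : myPop) ind.fitness = 0.0;
    for (Individual& ind : oppPop) ind.fitness = 0.0;
    for (Individual& me : myPop) {
        for (Individual& opp : oppPop) {
            double diff = simulate(me, opp);   // my final board score minus the opponent's
            me.fitness += diff;                // I want the difference to be high...
            opp.fitness -= diff;               // ...the opponent wants it to be low.
        }
    }
    // Selection, crossover and mutation of both populations would follow here.
}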

15 Likes

How did you evaluate the genes to know what to keep? Just the endgame state? Or was it a sim up to a certain depth and then checking the sun score for the rest of the game?


3 Likes

For day < 16, I score the board at the last day calculated; for day >= 16, I use the endgame score… The board score is a lot of crappy things, but the most important one is “Sun Collected” - “Sun Wasted” (wasted means, for example, growing a tree from size 1 to 2 with one size-2 tree already on the board, which wastes 1 sun point), plus some tree values… I did not fit these constants with local play, only by feeling…

2 Likes