I’m probably going to finish top 30 with a neural-network-based bot. As with the other NN bots I’ve done, it’s heavily AlphaZero based, which means that each time it is queried the NN gives an evaluation of the current position (value) and a prior for each possible move (policy) to help the MCTS pick good moves to explore.
Network:
534 inputs (528 binary 1/0 and 6 floats; an encoding sketch follows the list):
Binary inputs:
- 37 for blocked cells (richness 0 / trees)
- 2x (One set for each player, current player first):
- 37 x Dormant
- 37 x Tree size 0
- 37 x Tree size 1
- 37 x Tree size 2
- 37 x Tree size 3
- 37 x Sun count (a 1 in the position matching the current sun count; if more than 37 sun, just clamp to 37)
- 24 x day
- 21 x nutrients
- 1 x Current player sleeping (thinking about it afterwards, I’m not sure why this is here, as I would never evaluate a state where this was true)
- 1 x Opponent sleeping
6 floating point inputs (3 per player):
- Score / 100
- Sun / 100
- Total Sun Gathered / 100
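For reference, here’s a minimal sketch of how that 534-element vector could be assembled. The state/player accessors (`cells`, `dormant`, `tree_size`, `sun`, `total_sun`, etc.) are hypothetical names for illustration, not from the actual bot:

```python
import numpy as np

NUM_CELLS = 37

def encode_state(state, me, opp):
    """Build the 534-element input vector. The field names are illustrative only."""
    parts = []
    # 37 blocked cells (richness 0 or occupied by a tree)
    parts.append([1.0 if c.blocked else 0.0 for c in state.cells])
    for p in (me, opp):  # one set per player, current player first
        parts.append([1.0 if c.owner == p and c.dormant else 0.0 for c in state.cells])
        for size in range(4):  # tree sizes 0-3
            parts.append([1.0 if c.owner == p and c.tree_size == size else 0.0
                          for c in state.cells])
        sun = [0.0] * NUM_CELLS  # one-hot sun count, clamped at 37
        if p.sun > 0:
            sun[min(p.sun, NUM_CELLS) - 1] = 1.0
        parts.append(sun)
    day = [0.0] * 24
    day[state.day] = 1.0
    nutrients = [0.0] * 21
    nutrients[state.nutrients] = 1.0
    parts.append(day)
    parts.append(nutrients)
    parts.append([1.0 if me.sleeping else 0.0, 1.0 if opp.sleeping else 0.0])
    # 6 floats, 3 per player
    for p in (me, opp):
        parts.append([p.score / 100.0, p.sun / 100.0, p.total_sun / 100.0])
    return np.concatenate(parts)  # 528 binary + 6 floats = 534
```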
Outputs:
- 1 value (tanh) for win chance
- 38 policy outputs (softmax): 1 for each hex and 1 for WAIT. I tried having more policy outputs to differentiate between complete/grow/seed but found the simpler version worked better. Seeding source was chosen deterministically based on the trees available.
Two hidden dense layers of 96 nodes with relu activation.
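As a concrete illustration, the architecture described above might look like this in PyTorch (just a guess at a framework; the post doesn’t mention one):

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, n_inputs=534, hidden=96, n_policy=38):
        super().__init__()
        # Two hidden dense layers of 96 nodes with relu activation
        self.body = nn.Sequential(
            nn.Linear(n_inputs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value_head = nn.Linear(hidden, 1)           # -> tanh, win chance
        self.policy_head = nn.Linear(hidden, n_policy)   # -> softmax, 37 hexes + WAIT

    def forward(self, x):
        h = self.body(x)
        value = torch.tanh(self.value_head(h))
        policy = torch.softmax(self.policy_head(h), dim=-1)
        return value, policy
```

In practice you would probably also mask out illegal moves before the softmax; that detail isn’t covered here.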
Search:
Sequential MCTS without rollouts; instead the NN is evaluated at the leaf and its value backpropagated. The exploration term is based on the policy rather than a fixed global value (see the AlphaZero papers for more details).
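Concretely, the AlphaZero-style selection rule scores each move as Q + c_puct * P * sqrt(N_parent) / (1 + N_child), so moves with a high prior get explored more. A rough sketch with hypothetical node fields:

```python
import math

def select_child(node, c_puct=1.5):
    """Pick the child maximising Q + U, where U is scaled by the NN prior."""
    sqrt_parent = math.sqrt(max(1, node.visits))
    best_move, best_score = None, -float("inf")
    for move, child in node.children.items():
        q = child.value_sum / child.visits if child.visits else 0.0
        u = c_puct * child.prior * sqrt_parent / (1 + child.visits)
        if q + u > best_score:
            best_move, best_score = move, q + u
    return best_move
```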
The gamestate was only updated after both players had chosen a move, but on the submitted bot I gave my own moves priority (e.g. if both players tried to seed the same cell I’d simulate it as my seed succeeding and the other bot’s failing, or if both completed I’d get the full nutrients and the enemy nutrients - 1), as allowing the enemy to always choose to exactly counter my move was causing problems in my search.
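For concreteness, the collision handling might look something like this (hypothetical state/move API; it just illustrates the “my move wins ties” bias described above):

```python
def resolve_turn(state, my_move, opp_move):
    """Apply both players' chosen moves, giving the searching player priority
    on collisions."""
    if my_move.is_seed and opp_move.is_seed and my_move.target == opp_move.target:
        state.apply(my_move)       # my seed lands
        state.fail_seed(opp_move)  # the opponent's seed simply fails
    else:
        # Applying my move first means that if both are COMPLETEs I score with
        # the full nutrient value and the opponent with nutrients - 1
        # (assuming apply() decrements nutrients after each COMPLETE).
        state.apply(my_move)
        state.apply(opp_move)
```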
I did some pruning to cut down the number of moves my bot considered (a sketch follows the list):
- No seeds on days 19-22
- At most 2 seeds
- On day 23, only one seed target per turn (it doesn’t matter where; it’s just to increase tree count).
- No growing on days 20+ unless the tree can still reach size 3 (e.g. on day 21 we only allow grows on size 1/2 trees).
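Something like the following captures those rules; the state and move fields are made-up names, and the day arithmetic is my reading of the “can still reach size 3” condition (reach size 3 in time to be completed on day 23):

```python
def prune_moves(state, moves, my_seed_count):
    """Filter the legal move list with the hand-written rules above."""
    pruned = []
    kept_day23_seed = False
    for m in moves:
        if m.is_seed:
            if 19 <= state.day <= 22:
                continue              # no seeds on days 19-22
            if my_seed_count >= 2:
                continue              # never hold more than 2 seeds
            if state.day == 23:
                if kept_day23_seed:
                    continue          # one seed target is enough on the last day
                kept_day23_seed = True
        elif m.is_grow and state.day >= 20:
            # Only grow if the tree can still reach size 3 in time to be
            # completed, e.g. on day 21 only size 1/2 trees may grow.
            grows_still_needed = 3 - (m.tree_size + 1)
            if state.day + grows_still_needed > 22:
                continue
        pruned.append(m)
    return pruned
```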
Training:
Training was done via self-play; my final submission was based on 700,000 games after I restarted on Sunday. Games were run in batches of 2,000, with the last 3 million states (a state being the data from a single turn) uniformly sampled for training in between batches. This meant the network was trained on games from roughly the last 15 versions of itself (so it doesn’t overfit to the newest data).
The policy targets are the visit counts from the root MCTS node and the value target is the eventual result of the game (-1 for loss, 0 for draw, 1 for win).
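Put together, that data pipeline amounts to something like the sketch below. The buffer size and the -1/0/1 result encoding come from the description above; the names, the batch size and everything else are illustrative:

```python
import random
from collections import deque

replay_buffer = deque(maxlen=3_000_000)  # roughly the last ~15 network versions

def record_game(turn_records, result_for_p0):
    """Store one finished self-play game.

    `turn_records` is a list of (input_vector, root_visit_counts, player_index)
    tuples collected during the game; `result_for_p0` is 1/0/-1 from player 0's
    point of view."""
    for inputs, visit_counts, player in turn_records:
        total = sum(visit_counts)
        policy_target = [v / total for v in visit_counts]  # normalised root visits
        value_target = result_for_p0 if player == 0 else -result_for_p0
        replay_buffer.append((inputs, policy_target, value_target))

def sample_training_batch(batch_size=512):
    """Uniformly sample positions from the buffer for one training step."""
    return random.sample(replay_buffer, batch_size)
```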
Failures:
I was pretty convinced I needed to do something better about simultaneous moves, so I spent Wednesday onwards trying to get a DUCT-based version working. It was getting close on Sunday (it could get into the top 100) but never reached the same strength as my sequential version, so I gave up and went back to tweaking my older version.