30th in Legend! Viva la neuroevolution!
As some may already have suspected or known, I used a NN for the contest. It was trained using evolution strategies. It was coevolution: in each generation, all the genomes fought against each other and the winners were favored for reproduction. Most if not all of the logic and strategy (recycler builds, spawns and movements) was handled by the NN. I think I got lazy; nowadays when I see a game, I immediately think about how I can NN it so I don’t have to think about anything. Though for this game I did try to understand some of the gameplay logic myself.
Wood → Bronze
Nothing interesting, just a crappy bot with random spawns that wandered randomly, preferring neutral and enemy cells.
Bronze and later
I’ve been playing with neuroevolution for quite some time, that is, randomly perturbing neural network weights in the hope of finding ones better suited for the task. I had some modest success with it in other games, like a sub-11k time in Search Race on my alt account, or being 3rd in Clash of Bots (at the time of writing). The problem with this particular game was that it is multi-agent and there are many possible action sequences, not to mention the variable map size. So for starters I broke the problem into parts:
- ask NN for each cell (my own cells, in random order) what to do (build, spawn, movement)
- the inputs were the 5x5 square of vision around the cell, plus my money and the opponent’s money
- I had 3 networks: one with a single output for building a recycler if it outputs > 0; a second with a single output for spawning if it outputs > 0; and a third with five outputs for movement (wait, N, S, E, W), highest score wins
- for now, the above networks only made a decision to spawn 1 robot or move 1 robot, not multiple
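A minimal sketch of that per-cell decision loop (the function names, the per-cell feature encoding and the build-before-spawn precedence are my illustration, not the actual bot):

```python
import random

MOVE_DIRS = ["WAIT", "N", "S", "E", "W"]

def cell_features(grid, x, y, my_money, opp_money):
    """Flatten the 5x5 neighbourhood around (x, y), then append the two
    global money counts. `grid` maps (x, y) -> per-cell feature list;
    the exact per-cell encoding here is illustrative."""
    feats = []
    for dy in range(-2, 3):
        for dx in range(-2, 3):
            feats.extend(grid.get((x + dx, y + dy), [0.0] * 4))
    feats.extend([my_money, opp_money])
    return feats

def decide(cells, grid, my_money, opp_money, build_net, spawn_net, move_net):
    """Ask the three nets, cell by cell in random order, for one action.
    Each net is a callable taking the feature list and returning outputs."""
    actions = []
    order = list(cells)
    random.shuffle(order)
    for (x, y) in order:
        obs = cell_features(grid, x, y, my_money, opp_money)
        if build_net(obs)[0] > 0:        # 1 output: build recycler if > 0
            actions.append(("BUILD", x, y))
        elif spawn_net(obs)[0] > 0:      # 1 output: spawn if > 0
            actions.append(("SPAWN", 1, x, y))
        else:                            # 5 outputs: highest score wins
            out = move_net(obs)
            d = MOVE_DIRS[max(range(5), key=lambda i: out[i])]
            if d != "WAIT":
                actions.append(("MOVE", 1, x, y, d))
    return actions
```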
I used modified referee code (i.e. I removed the parsing and applied the actions directly) to have an accurate environment. I used pgpelib for the evolution strategies. My popsize was 200, and each genome fought 2 games against each other genome, so around 40k games per generation. Quite big; unfortunately, ES is kinda slow in this regard, as in, it is not sample efficient. Nevertheless, after just a handful of generations, the winrate against my crappy wood-to-bronze bot spiked. Then when I uploaded the somewhat evolved bot, its rank jumped from ~800th/1500 to ~200th/1500, so it wasn’t a delusion: the ES really worked. I let it run for a few thousand generations and uploaded. Lo and behold, it was 20th; the rank was jumping between 15th and 60th, but the ES proved that a NN is feasible in this game. It had no global picture of what was going on, since each cell only saw its 5x5 neighbourhood, yet it was beating at least semi-serious bots at the time. I think this early attempt was already enough to get into the gold league.
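A sketch of the coevolution fitness evaluation, assuming a hypothetical `play_game(a, b)` wrapper around the referee that returns the winner’s index (0 or 1). With popsize 200 the round robin indeed comes out to about 40k games: 2 * C(200, 2) = 39 800.

```python
from itertools import combinations

def coevolution_fitness(genomes, play_game, games_per_pair=2):
    """Round-robin coevolution: every genome plays every other one
    `games_per_pair` times; fitness = total wins. Sides are alternated
    so neither genome always plays first."""
    wins = [0] * len(genomes)
    for i, j in combinations(range(len(genomes)), 2):
        for g in range(games_per_pair):
            a, b = (i, j) if g % 2 == 0 else (j, i)
            winner = play_game(genomes[a], genomes[b])  # 0 or 1
            wins[(a, b)[winner]] += 1
    return wins
```

With pgpelib this slots into the usual ask/tell loop: `pgpe.tell(coevolution_fitness(pgpe.ask(), play_game))`.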
In the meantime, people were DMing me for advice and hints on recycler placement and spreading. I wish I knew how to play this game. All the decisions were learned. Only later did I add more game-specific logic to, for example, restrict obviously stupid moves.
In the meantime, I was playing around with the right inputs, architecture, etc. From Python code I was calling the Java referee, but this was somewhat slow, so I rewrote the game in C++. The biggest problem was the A* pathfinding implemented in the referee; I wanted the environment to be exactly the same as the referee’s. It turned out the culprit was the PriorityQueue: the Java one handled ties differently than the C++ one, so I had to implement, at least partly, a priority queue that behaves the same as Java’s. But it worked and the speedup was considerable.
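I reproduced Java’s heap behaviour to stay bit-exact with the referee. A more common way out of such mismatches, shown here purely for illustration (not what I did), is to make the tie order explicit so pop order no longer depends on any runtime’s heap internals:

```python
import heapq
from itertools import count

class StablePQ:
    """Priority queue with deterministic FIFO tie-breaking. Both
    java.util.PriorityQueue and std::priority_queue leave the order of
    equal-priority elements unspecified; an insertion counter in the
    sort key pins it down identically in every language."""
    def __init__(self):
        self._heap = []
        self._counter = count()  # insertion order breaks ties

    def push(self, priority, item):
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def pop(self):
        priority, _, item = heapq.heappop(self._heap)
        return item

    def __len__(self):
        return len(self._heap)
```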
There are many ways to tackle this game from a NN perspective. I tested many minor and major ideas, some quickly, some thoroughly: different inputs, different architectures (more hidden units, different outputs, etc.). For example, I tested 7x7 vision, but it was much slower to see any progress. Maybe it would have hit a higher ceiling, but I didn’t have time for that. I was considering putting the global map into the inputs, but that was out of the question, at least for now. I was talking with a fellow CodinGamer about ideas (he also used ES); I used some of his ideas and he used some of mine.
The summary of the final net:
- recycler and spawn networks merged into one with 2 outputs
- the spawn output acted more like a ‘regression’: I used its magnitude to decide how many units to spawn (bigger output, more spawns)
- I asked the movement net up to 4 times per cell with units, each time with inputs updated to reflect the moves already made, so my bot can move up to 4 robots at a time from 1 cell
- the 5x5 inputs contained more features, e.g. each cell’s nearest BFS distance to an enemy non-recycler cell, and its euclidean distance to the center, scaled accordingly
- added global inputs like map width or earlyFinishCounter
- instead of processing cells in random order, I sort them by the max of the recycler/spawn net’s outputs
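The last two points (repeated movement queries and output-based ordering) might look roughly like this; all names are hypothetical and the way the pending moves are fed back into the inputs is heavily simplified:

```python
MOVE_DIRS = ["WAIT", "N", "S", "E", "W"]

def order_cells(cells, grid, build_spawn_net, features):
    """Process the most 'decisive' cells first: sort by the larger of the
    merged net's two outputs (recycler head, spawn head)."""
    return sorted(cells,
                  key=lambda c: max(build_spawn_net(features(grid, c))),
                  reverse=True)

def move_units(cell, units, grid, move_net, features, max_queries=4):
    """Query the movement net up to 4 times for one cell, re-encoding the
    inputs after each committed move so later queries see earlier ones."""
    moves = []
    for _ in range(min(units, max_queries)):
        out = move_net(features(grid, cell))  # features reflect moves so far
        d = MOVE_DIRS[max(range(5), key=lambda i: out[i])]
        if d == "WAIT":
            break
        moves.append(d)
        grid["pending", cell] = moves         # illustrative feedback channel
    return moves
```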
The bot finally had at least a partial idea of what was going on globally. Also, I implemented BFS to find and fill my own islands. Not only did it speed up the learning generation-wise, but also time-wise, since there were fewer NN inferences: I didn’t ask the NN what to do for cells in my own islands. The evolution kept progressing and that was reflected in the leaderboard. The bot was in the top 30 for most of gold, dropping to the top 60 when more serious bots appeared.
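The island detection is a plain BFS flood fill; a sketch, where `walkable` and `mine` are hypothetical sets of coordinates:

```python
from collections import deque

def my_islands(walkable, mine, width, height):
    """Return the connected components (4-neighbourhood) of walkable cells
    that contain only my cells. Regions I fully own need no NN queries
    and can simply be filled."""
    seen = set()
    islands = []
    for start in mine:
        if start in seen:
            continue
        comp, frontier, only_mine = [], deque([start]), True
        seen.add(start)
        while frontier:
            x, y = frontier.popleft()
            comp.append((x, y))
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if (0 <= nx < width and 0 <= ny < height
                        and (nx, ny) in walkable and (nx, ny) not in seen):
                    seen.add((nx, ny))
                    frontier.append((nx, ny))
                    if (nx, ny) not in mine:
                        only_mine = False
        if only_mine:
            islands.append(comp)
    return islands
```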
Gold → Legend
I kept my last experiment running for a few days and nights straight. After legend opened, the bot was at the top of gold, but that was not enough to get to legend. I added some if-elses; for example, I restricted spawning and moves onto soon-to-be-grass cells, because despite the training the NN would occasionally do that. It significantly improved the bot and I finally got into legend. Seeing how the evolution was still progressing, I let it run until the end of the contest. Patience is a virtue: I had stopped the earlier experiments before reaching ~10k generations, and maybe they could have been better had I let them run longer. The last run was about 25k generations, took around 5 days and nights, and was still improving when I ended the experiment.
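The soon-to-be-grass restriction amounts to a mask over the NN’s proposed actions. A sketch under assumptions: the field names mirror the referee’s inputs, and a cell in recycler range is assumed to lose 1 scrap per turn, so whether `scrap_amount <= 1` is exactly the right threshold depends on how many recyclers are harvesting it.

```python
def soon_grass(cell):
    """A harvested cell with 1 scrap left becomes grass next turn;
    spawning on or moving onto it throws the unit away."""
    return cell["in_range_of_recycler"] and cell["scrap_amount"] <= 1

def filter_actions(actions, grid):
    """Veto NN-proposed SPAWNs and MOVEs whose target cell is about to die."""
    return [a for a in actions
            if a["type"] not in ("SPAWN", "MOVE")
            or not soon_grass(grid[a["target"]])]
```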
As for the game, it seemed simple enough. I was surprised there was no fog of war; not that I miss it, but I expected a lot of RPS and ties among the top bots. Except for alternative moves when someone places a recycler, the statement was clear. There was also plenty of time for experimentation. If I were in a more competitive mood, I would have come up with more game logic myself instead of relying only on the NN. Restricting obviously stupid moves yielded much better results; if I had used the Hungarian algorithm etc. for many other edge cases, I’m sure the bot would be much stronger. A NN is good at approximating stuff, but not at exact outcomes. I’m hyped to try this approach in other multi-agent games.