#73 in Legend, #2 in C, which is nice.
Big thank you to aCat, radekmie and the CG team for this awesome contest that got me back to CG.
Feedback
I really loved this game, a couple points that stood out for me were:
- The one-month format. I wouldn't have participated in a 1-week contest.
- It was the first contest I participated in where actions are not played simultaneously for both players, which was very refreshing.
- The rules were extremely simple to implement.
- Language performance didn't matter much.
Most of it has already been said, but here is what I would like to see in the MP game:
- Provide the list of cards that have been played by the opponent.
- Some form of bonus for player 2.
- A more visible indication in the UI of which cards can/cannot attack & which player's turn it is.
- More cards!
Intro
I’ve never played with Neural Networks before this contest, but it’s something I’ve wanted to do for a long time. This contest seemed like a good opportunity since the number of inputs and outputs is relatively small, so I went for it.
I used the Keras module to train my models locally, and Q-learning to define the target outputs to fit my models on.
I reached Legend with a Python 2 bot, and switched to C during the last week for better performance (which ended up not changing my ranking at all, despite going from 5 to 5000 simulations per turn…).
Draft phase
I used a first NN to evaluate the fitness of a card in a given deck of cards, so I would get three fitness values every turn and pick the card that maximizes my fitness.
Here is the network topology:
“Picked stats” are the average stats of the cards I have picked previously, and “Seen stats” are the average stats of the cards I have been presented during the draft phase so far.
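For readers who prefer code, here is a minimal Keras sketch of a draft network with those three groups of inputs. The layer sizes and the number of card features are my own assumptions, not the exact topology above:

```python
# A sketch only: layer sizes, activations and CARD_FEATURES are assumptions.
from keras.models import Model
from keras.layers import Input, Dense, Concatenate

CARD_FEATURES = 12  # cost, attack, defense, abilities, ... (hypothetical count)

card_stats   = Input(shape=(CARD_FEATURES,), name="candidate_card")
picked_stats = Input(shape=(CARD_FEATURES,), name="picked_stats")  # averages of cards already picked
seen_stats   = Input(shape=(CARD_FEATURES,), name="seen_stats")    # averages of cards shown so far

x = Concatenate()([card_stats, picked_stats, seen_stats])
x = Dense(32, activation="relu")(x)
x = Dense(16, activation="relu")(x)
fitness = Dense(1, name="fitness")(x)

draft_model = Model(inputs=[card_stats, picked_stats, seen_stats], outputs=fitness)
draft_model.compile(optimizer="adam", loss="mse")
```

Each draft turn, the three candidate cards are scored with this model and the one with the highest predicted fitness is picked.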
To train the model, I ran games locally, and gave a reward of:
- +10 for cards picked by the winning player
- -10 for cards picked by the losing player
- -10 for cards not picked by the winning player
- +10 for cards not picked by the losing player
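As a rough illustration, here is how that reward scheme could be turned into training examples for the draft net from a batch of finished self-play games. The structure of the game records and the field names are hypothetical:

```python
import numpy as np

def draft_training_data(games):
    """Build (inputs, targets) for the draft net from a batch of finished games.
    The structure of `games` (winner, draft_turns, shown_cards, picked_index,
    picked_stats, seen_stats) is hypothetical."""
    X_card, X_picked, X_seen, y = [], [], [], []
    for game in games:
        for player in (0, 1):
            won = (game.winner == player)
            for turn in game.draft_turns[player]:
                for i, card_feats in enumerate(turn.shown_cards):
                    picked = (i == turn.picked_index)
                    # +10 for cards picked by the winner or skipped by the loser,
                    # -10 for cards skipped by the winner or picked by the loser.
                    reward = 10.0 if picked == won else -10.0
                    X_card.append(card_feats)
                    X_picked.append(turn.picked_stats)
                    X_seen.append(turn.seen_stats)
                    y.append(reward)
    return [np.array(X_card), np.array(X_picked), np.array(X_seen)], np.array(y)

# inputs, targets = draft_training_data(batch_of_500_games)
# draft_model.fit(inputs, targets, epochs=5, verbose=0)
```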
After ~100 batches of 500 games, I converged to a draft fairly similar to that of the other Legend players.
Duel phase
I used a second NN as my eval function, and like most here I ran a depth 2 minimax with randomly generated actions for both players.
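A sketch of that kind of search looks roughly like this (not my exact code; `random_turn` is a hypothetical helper that plays out one player's whole turn with random legal actions and returns the actions plus the resulting state):

```python
def choose_turn(state, eval_fn, my_candidates=50, opp_candidates=10):
    """Depth-2 search over randomly generated turns (illustrative sketch)."""
    best_actions, best_score = None, float("-inf")
    for _ in range(my_candidates):
        my_actions, after_me = random_turn(state, player=0)
        # Let the opponent answer with random turns too, and assume the worst reply.
        worst_reply = min(eval_fn(random_turn(after_me, player=1)[1])
                          for _ in range(opp_candidates))
        if worst_reply > best_score:
            best_score, best_actions = worst_reply, my_actions
    return best_actions
```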
Here is the second network (I hope you have good eyes):
I reduced the number of weights by cutting the state of the game into a large number of small inputs, and using the same layer weights for all inputs of the same type.
So for instance, for creatures on my board, I average the outputs of the same fully connected layer applied to each of the 6 slots of my board (with zeros for empty board slots).
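In Keras terms, that weight sharing can be sketched by instantiating one layer and applying it to every slot, then averaging the outputs. The feature count and layer size below are assumptions:

```python
from keras.layers import Input, Dense, Average

CREATURE_FEATURES = 9  # attack, defense, keywords, ... (hypothetical count)

# A single Dense instance reused for every slot, so all 6 slots share the same weights.
slot_layer = Dense(8, activation="relu")

my_slots = [Input(shape=(CREATURE_FEATURES,), name="my_slot_%d" % i) for i in range(6)]
my_board = Average()([slot_layer(slot) for slot in my_slots])  # empty slots are fed zeros
```

The same trick is repeated for the other groups of inputs (opponent's board, hand, …), and the resulting summaries are concatenated before the final layers that output a single fitness value.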
To train the model, I used the following formulas:
- Reward = +10 if the player has just won, -10 if they have just lost, else 0
- target_fitness = reward + gamma*next_turn_fitness + (1-gamma)*current_fitness
- gamma = 0.7
- probability of making a random action (epsilon) = 100% during the first batch, slowly decreasing towards 5%
- learning rate: high at the beginning, slowly decreasing
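Putting those formulas together, the target computation for a batch of recorded turns could look roughly like this (the exact input format is hypothetical; for a multi-input model, `states` would be a list of arrays):

```python
import numpy as np

GAMMA = 0.7

def fitness_targets(states, next_states, rewards, model):
    """Targets for one batch of recorded turns, per the formula above.
    `states` / `next_states` are the network inputs for a turn and the
    following turn; `rewards` holds +10 / -10 / 0 for each turn."""
    current_fitness = model.predict(states).flatten()
    next_fitness    = model.predict(next_states).flatten()
    return rewards + GAMMA * next_fitness + (1 - GAMMA) * current_fitness

# model.fit(states, fitness_targets(states, next_states, rewards, model))
```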
Here is a breakdown of the learning process:
During the first batch, predicted fitnesses are random, and the model only learns what a winning and losing turn look like.
After 10 batches, the bot is still unable to figure out what a good state is at the beginning of the game, but the reward is slowly back-propagating.
After 100 batches, the bots start to have a good idea of how good their situation is at any point of the game.
Overall the predicted fitness is not that good, given my final ranking. I had a really hard time figuring out all the meta-parameters in Keras (and I don't think I've got them right yet), I suffered from a poor understanding of NNs and Q-learning during the first few weeks of the contest, and I don't think it's ideal to have a bot only play against itself.
However, it was a lot of fun, and I’m glad I’ve learned new things!
Oh, and if you’re wondering, my bot weighs 97.1 kB, not even close to the 100 kB limit.