Sorry for the late feedback; I literally got off my plane just a few hours ago after a two-week vacation outside the country. I am taking this opportunity to finally do a write-up, so please excuse the following incoherent, jet-lagged rambling.
I finished in 2nd place, which I’m rather satisfied with considering I wasn’t even intending to participate seriously at first. But I ended up doing it anyway for just under a couple of weeks, missing out on the last 3 days due to my vacation. Such is the danger of starting something even if just for streaming. Kudos to @RobotAI and @_Royale for a tough run at the end.
I want to say thanks to @aCat, @radekmie and the CG staff for what must have been a lot of work putting this contest together. And thanks to everyone who watched my streams, it’s been an experience way above my expectations on all fronts (almost 1.5k views on Youtube so far, that’s insane!).
For the record, my final bot has very little code left from the stream, maybe just some of the simulation part survived.
Performance and self-play
Shortly after I started playing “for real”, it quickly became obvious to me that there was very little point spending time dissecting the games, other than fixing a few blatant bugs here and there. The sheer number of random factors involved, differing in each game, as well as the very limited options available as a player (more on that later), meant that finding a viable strategy would require a lot of “bruteforcing” and tweaking through trial and error on win rate statistics. And for the same reason, a lot of games (and I do mean a LOT) would need to be played out to appropriately sample the game space and see if any improvement happened.
Because of these reasons, batching games against live opponents seemed like a ridiculous waste of time (3 games per minute on that state space?!). Therefore I hoped local self-play would be enough to robustly test improvements, and sporadically validate against real opponents once in a while. It worked out really well in practice, which is surprising because overfitting against yourself is almost always an issue that is quite difficult to overcome, but not in this game apparently. I am assuming the randomness of the game helps a lot fighting against it, maybe for the same reason TD-Gammon worked so well while not having the same success in other non-random games. But I digress.
All this to say I spent some time optimizing the hell out of my engine and my bot to get as much performance out of my high-end desktop CPU, and therefore test a LOT of tweaks. I am talking about self-playing at a rate of around 1000 games per second here.
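To put rough numbers on why so many games are needed, here is a minimal sketch (for illustration only, not the actual tooling used) of a normal-approximation confidence interval on a win rate. The function names and the 95% level (`z = 1.96`) are assumptions made for the example.

```python
import math
from typing import Tuple


def win_rate_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for an observed win rate."""
    p = wins / games
    half_width = z * math.sqrt(p * (1.0 - p) / games)
    return p - half_width, p + half_width


def games_needed(margin: float, z: float = 1.96) -> int:
    """Games needed for a half-width of at most `margin` (worst case p = 0.5)."""
    return math.ceil((z / (2.0 * margin)) ** 2)


if __name__ == "__main__":
    low, high = win_rate_interval(wins=5200, games=10000)
    print(f"52% over 10000 games: [{low:.3f}, {high:.3f}]")
    print("games for a +/-1% margin:", games_needed(0.01))
```

Under these assumptions, pinning a win rate down to about one percentage point takes on the order of ten thousand games. That is roughly ten seconds of self-play at 1000 games per second, but more than two days at 3 games per minute.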
As I mentioned during my stream, it was easy to upgrade my bot to also simulate the opponent, which is precisely what I did at first. So the Monte Carlo search was going to “depth 2” to simulate the opponent moves, restricted to attacking with his creatures on board. In order to do that, each best move found on my side was then tested against 200 random moves of the opponent for scoring in a mini-max fashion.
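The depth 2 scoring described above could look something like the following sketch. It is one interpretation of “mini-max fashion” (keep the worst evaluation seen over the sampled replies), not the actual bot code. The callbacks `apply_move`, `random_reply` and `evaluate`, and the toy game used to exercise them, are hypothetical placeholders.

```python
import random
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")  # game state (hypothetical type)
M = TypeVar("M")  # move (hypothetical type)


def worst_case_score(
    state: S,
    my_move: M,
    apply_move: Callable[[S, M], S],
    random_reply: Callable[[S, random.Random], M],
    evaluate: Callable[[S], float],
    rng: random.Random,
    samples: int = 200,
) -> float:
    """Score my_move by the worst evaluation seen over sampled opponent replies."""
    after_mine = apply_move(state, my_move)
    worst = float("inf")
    for _ in range(samples):
        reply = random_reply(after_mine, rng)
        worst = min(worst, evaluate(apply_move(after_mine, reply)))
    return worst


def pick_move(
    state: S,
    candidates: Sequence[M],
    apply_move: Callable[[S, M], S],
    random_reply: Callable[[S, random.Random], M],
    evaluate: Callable[[S], float],
    rng: random.Random,
    samples: int = 200,
) -> M:
    """Return the candidate whose sampled worst case is the best."""
    return max(
        candidates,
        key=lambda m: worst_case_score(
            state, m, apply_move, random_reply, evaluate, rng, samples
        ),
    )


# Toy usage: the state is a single number, higher is better for me.
def toy_apply(state: int, move: int) -> int:
    return state + move


def toy_reply(state: int, rng: random.Random) -> int:
    # In this toy, a bigger lead exposes me to a bigger counterattack.
    return -rng.randint(0, 10 if state > 5 else 1)


if __name__ == "__main__":
    best = pick_move(0, [3, 6], toy_apply, toy_reply, float, random.Random(0))
    print("chosen move:", best)
```

The cost is one simulation per sample per candidate move, which is the kind of overhead that becomes painful when the same bot also has to play thousands of local games per second.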
However, because performance was such an issue for local play (as explained above), I then cut out this search very early on. I replaced it with some sort of depth 2 exhaustive graph search through efficient pruning that has already been explained very well in previous posts, so I will leave it at that. My arena bot still officially used Monte Carlo until near the end, as exhaustive search was, after all, barely better, just much faster. I also voluntarily slowed down my bot to hide that fact. This contest had so little room to distinguish yourself that giving away as little information as possible was mandatory. But more on that later.
What I used all this self-play for (spoiler: drafting)
Being able to churn out all those games made it easy to find the most balanced weights for my evaluation, but since the game has so little room for playing strategy (more on that later), I quickly hit a plateau just tweaking those evaluation weights (player hp, all card attributes, runes, cards drawn, etc.). And as was first quickly discovered by @RobotAI, almost all of the game actually revolves around just having a good draft.
So you can see where this is going. To figure out the value of each card, I whipped out some sort of what I will presumptuously call “reinforcement learning”, even though it’s all hand-made and with little academic background. Long story short, my bot was playing the same game against itself twice, except one side (the “player”) would be forced to have one card of its draft randomly chosen, while the “opponent” would always draft to what it currently believed to be optimal.
So let’s say the “random” turn of the draft offered cards 1 2 3: in the first game the “player” would be forced to choose 1, and in the second game it would be forced to choose 2. If the game result for the “player” is better in one of these games, that means the card forced in that game was probably better than the other. So if the first game is a win and the second one a loss, card 1 wins over card 2 and you adjust the card values accordingly. Given enough games, the card values eventually end up pretty close to what they should be, and you draft accordingly.
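A minimal sketch of that paired-game update, under stated assumptions: `play_game(forced_card, seed)` is a hypothetical callback that replays the same seeded game with a given card forced into the “player” draft and returns 1.0 for a win and 0.0 for a loss. The fixed `step` size, the toy game and its hidden card strengths are all invented for illustration; the actual update rule used was not described in more detail.

```python
import random
from typing import Callable, Dict, Sequence


def paired_update(
    values: Dict[int, float],
    card_a: int,
    card_b: int,
    play_game: Callable[[int, int], float],
    seed: int,
    step: float = 0.01,
) -> None:
    """Play the same seeded game twice with a different forced card, then nudge values."""
    result_a = play_game(card_a, seed)
    result_b = play_game(card_b, seed)
    if result_a == result_b:
        return  # same outcome either way: no evidence for either card
    winner, loser = (card_a, card_b) if result_a > result_b else (card_b, card_a)
    values[winner] += step
    values[loser] -= step


def draft_pick(values: Dict[int, float], offered: Sequence[int]) -> int:
    """Draft the offered card with the highest learned value."""
    return max(offered, key=lambda card: values[card])


def train(
    values: Dict[int, float],
    card_ids: Sequence[int],
    play_game: Callable[[int, int], float],
    rounds: int,
    rng: random.Random,
) -> None:
    """Run many paired games, each comparing two randomly chosen cards."""
    for seed in range(rounds):
        card_a, card_b = rng.sample(list(card_ids), 2)
        paired_update(values, card_a, card_b, play_game, seed)


# Toy stand-in for a real game: each card has a hidden win probability.
HIDDEN_STRENGTH = {1: 0.3, 2: 0.5, 3: 0.7}


def toy_game(forced_card: int, seed: int) -> float:
    # Same seed means same "luck", so only the forced card differs.
    return 1.0 if random.Random(seed).random() < HIDDEN_STRENGTH[forced_card] else 0.0


if __name__ == "__main__":
    values = {card: 0.0 for card in HIDDEN_STRENGTH}
    train(values, list(HIDDEN_STRENGTH), toy_game, rounds=2000, rng=random.Random(42))
    print(values, "->", draft_pick(values, [1, 2, 3]))
```

Reusing the seed for both games is what makes the comparison meaningful: with identical randomness, the forced card is the only thing that changes between the two runs.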
I spent almost all of my time iterating and iterating further on this design. Despite all the focus on performance, progress was quite slow, but at least my computer was doing most of the work, not me. A good training session would require several hours; I estimated around 20-25 million games were needed to get a stable draft. I then ended up adding stuff like player-specific card values, having the “opponent” sometimes pick from a selection of the best “drafts” encountered to fight overfitting, etc. And of course, there were a lot of “feedback loops”, where a better draft meant finding a better evaluation was possible, and a better evaluation made for finding a better draft, etc. I guess you could call that some hill climbing meta-algorithm. It’s very possible that by doing so I hit a local maximum and got stuck in a suboptimal strategy (e.g. I heard some call my bot very aggressive), but since I don’t think there was much room for variety in this game to begin with, I think it didn’t matter that much. I mostly needed those last missing few days to polish things more after I found a last minute bug in the training algorithm…
Oh and as a side note, I actually failed to get any improvements with a dynamic draft, but was pleased when others thought I did. I actually just randomly picked among my best static drafts as a form of copy protection. Having all your hard work taken so easily (and enthusiastically, to my great disappointment) by the competition really sucks. On that note I am glad @RobotAI ended up winning and not someone who took his draft. Though to be fair, the core of the problem lies probably more with the game itself.
That’s about all I can think of interesting to say about my bot for now… The rest is just boring details I think.
I mean no offense or disrespect to the hard work put forward by the authors, but I don’t think the game was good. It has a lot of complexity but very little depth for either human or AI players. It is very obvious when you play Hearthstone or MtG or other such games, that the complexity of playing well, and the depth and variety of their possible strategies, requires a lot more mechanics to be able to interact together.
Here it was basically green creature deck vs green creature deck every game, you focus on summons and trading, and hope the RNG works in your favor. Most (if not all?) possibilities of making interesting decisions were missing from the game, including very basic stuff like mulligans (or card redraw), player balance (e.g. mana token) or spells, which would have helped a lot making this game more competitive. Most of the difference was instead in trying to draft a bit better, and not as much according to a certain logic or synergy, but as to what statistically turns out to be better regardless of the context.
Also, those who said balance wasn’t important because plays are symmetrical are missing the point: it is actually a very big problem. If your bot needs to be twice as good as the opponent in order to win the series reliably, it introduces a very high amount of noise in the rankings, which, coupled with low game counts especially for this type of game, makes most of the ranking very random.
And such has been the problem of many of the community contests so far. When you take inspiration from existing successful game genres, they work because they had a lot of time exploring and balancing their game, introducing just the right amount of mechanics to give a good array of meaningful choices to the players. It is simply not the case here. What few mechanics there were ended up either useless (drain) or overpowered (lethal/ward), and always straightforward. In fact, playing out a game was almost deterministic in the decision making, which means depth was close to nonexistent. It really was all about figuring out the flaws of the cards in an out-of-game fashion, which made for a rather boring game in practice. I don’t expect any human players would play it for long.
It is simply not possible to balance out so much complexity with the necessary depth in the context of a CG game, at least not without extensive prior play. I have repeated this multiple times in the past so I will spare the details. I would have actually 100% preferred a carbon copy of TES:Legends or Hearthstone instead, as the necessary tools to handle the imbalances and certain situations would have been at least available. And I suspect many design decisions were taken to cut mechanics to make it seemingly more accessible, but in practice this was misguided, just killing a lot of the necessary depth and tools for experts that beginners could have simply ignored with no harm.
I am sorry if this is harsh criticism, but I hope my intent to be constructive comes across as such. I am still hoping for either a return to more classical CG games, or fully embracing the depth and complexity of a game genre, but not the in-betweens I’ve seen so far.
And on a final note, I think the whole draft copying issue was disappointing and anti-competitive, but nonetheless is the most perfect example I could hope for to further underline my point about the necessity of hiding in order to be competitive, in spite of displeasing some notions of ‘fairness’.
I have written way too much already and am getting sleepy, but feel free to poke me for further discussion.