Legends of Code & Magic (CC05) - Feedback & Strategies

@Agade, @pb4, thanks for correcting me, my formula is indeed a bit messy.

I misnamed the hyperparameters in my bot at the beginning, and I didn’t fix it for the PM, my bad:

  • What I called “gamma” in my PM is actually the “alpha” of pb4’s formula
  • As pb4 mentioned, “alpha” is the learning rate, which, in the case of a neural network, is a hyperparameter of the optimizer used

So yes, I could have changed the alpha=0.7 in my formula to alpha=1.0 and let Keras do the rest, in which case my formula becomes:

  • target_fitness = reward + next_turn_fitness

Which is Agade’s formula with gamma=1, which is fine since the game is finite.

Thanks pb4 for clarifying things in the chat :slight_smile:

Thanks Closet. So 1 is best and 160 is worst, right? For instance, Decimate is ranked 138th?

Wow! I was thinking that my algo to identify the order pattern was not very good, because “65” and “7” are the same card, and “15” and “17” too, so they should have the same place. My algo was giving the same result as yours… Why is 80 between 65 and 7, @RobotAI?

1 Like

Many people say they “stole” ClosetAI’s draft, and your method is a rigorous but quite painful way to do it.
Was there another way?

1 Like

Some people did it by hand. I find my method way less painful than theirs :smiley:
The only downside is that CG no longer keeps more than 500 replays per AI, so you need to wait a few hours and re-download replays in several batches.

2 Likes

At the beginning of the contest, ClosetAI’s error output was the card score during the draft, so it was pretty easy to steal. There’s a way to force the referee to use specific cards during the draft, so you only need 3 games to learn all the scores.

At the end of the contest there was no output anymore. But I think you could also force many games in the IDE with a forced draft phase. Neumann’s method works too.

3 Likes

No, it says the cards are sorted from best to worst, so Decimate (151) is the 3rd best

@BrunoFelthes That should be just randomness; the winrates are probably very close

2 Likes

Wait, what?! You can see other people’s error output?

1 Like

No, but you can see the messages that appear as text next to the player.

1 Like

Oops. Sorry, my mistake. I was talking about the text message in the standard output.

1 Like

I ended in Gold, #285 globally.

I originally thought I would take the time to improve my first sim in a contest, because I liked the game, but I found it too cumbersome to do. The new Hearthstone expansion and their new puzzles section (really nice, I recommend it) didn’t help. And summer, obviously. So this is my first-week code without items, pretty close to what I had coded for tests.

During tests, I felt Ward and Lethal were the best abilities despite aCat telling me it was Drain (sorry bro :wink: ). I feel sorry for the team, aCat and radekmie, because they wanted to make a hardcore game and we (at CodinGame) really pushed them not to, sometimes even imposing our requests. I’m glad it worked out well in the end.

Serious kudos to @_CG_Jack and @Eury for the improvements to the assets, done in their free time (they’re not supposed to work on contests here, only Julien and I do). They basically colorized the board and the assets, one by one.

About the game, I think we could have:

  • removed Breakthrough
  • made Drain apply every time (not just attack)
  • merged red and green items
  • made blue items clearer/different
  • balanced some more cards
  • added board clears

but everything already took a lot of time (tests, statements, debates over the original balance of cards…), so it was difficult.

6 Likes

LoCaM 2 now!

2 Likes

Face is the place! :grin:

1 Like

LoCaM - Naxxramas :o

2 Likes

#7 in legend in the marathon, but I could have ended up anywhere between #4 and ~#20. In the sprint, I couldn’t get CG to show me even a single replay after the first 30 or so minutes, which made it impossible to get a good rank.

Evaluation

I didn’t evaluate the board, I just evaluated the cards individually (see Search). Creatures were evaluated by the following formula:

// hasW/hasL/hasD/hasG = Ward/Lethal/Drain/Guard, _turn = current turn number
double EHP = (hp+(hasW?min(4.0,_turn*0.8):0));                                        // effective HP: Ward adds up to 4 extra HP, scaling with the turn
double EDMG = ((hasL && dmg>0)?max(min(4.5, 0.5+0.8*_turn), dmg):dmg) * (hasD?1.1:1); // effective damage: Lethal raises damage to at least a turn-scaled value (capped at 4.5), Drain gives a small bonus
eval = 0.1*EHP + 0.4*EDMG + 0.5*sqrt(EHP*EDMG);
if (hasG) eval *= 1.1;                                                                 // Guard bonus
if (eval > 4) eval = pow(eval, 0.8) + 1;                                               // dampen very large evals

Green items were evaluated as the difference between the evaluation of a 3/3 creature and a 3/3 that was buffed by the item. Red/blue items had yet another formula. Cards in my hand had their evaluation multiplied by 0.3 (the hand needs to be evaluated as well, to prevent wasting items or strong creatures when something less valuable does the job equally well).
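
As an illustration, a minimal sketch of the green-item idea, assuming the creature formula above is wrapped in a helper (all struct and function names here are made up for the example, not my actual code):

#include <algorithm>
#include <cmath>

struct Creature { double hp, dmg; bool hasW, hasL, hasD, hasG; };

// Same formula as above, wrapped in a function.
double creatureEval(const Creature& c, int turn) {
    double EHP  = c.hp + (c.hasW ? std::min(4.0, turn * 0.8) : 0.0);
    double EDMG = ((c.hasL && c.dmg > 0)
                   ? std::max(std::min(4.5, 0.5 + 0.8 * turn), c.dmg)
                   : c.dmg) * (c.hasD ? 1.1 : 1.0);
    double eval = 0.1 * EHP + 0.4 * EDMG + 0.5 * std::sqrt(EHP * EDMG);
    if (c.hasG) eval *= 1.1;
    if (eval > 4) eval = std::pow(eval, 0.8) + 1;
    return eval;
}

struct GreenItem { double hpBuff, dmgBuff; bool givesW, givesL, givesD, givesG; };

// Green item value = eval(buffed 3/3) - eval(plain 3/3).
double greenItemEval(const GreenItem& it, int turn) {
    Creature base{3, 3, false, false, false, false};
    Creature buffed{3 + it.hpBuff, 3 + it.dmgBuff,
                    it.givesW, it.givesL, it.givesD, it.givesG};
    return creatureEval(buffed, turn) - creatureEval(base, turn);
}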

Search

My search algorithm consisted of 3 phases:

First, for every combination of my cards that can be played in one turn (both in the hand and on the table), find out which creatures they kill, and remember this as a “threat”. Other threats that I considered were summoning a creature, removing G/L/W, and hitting face. For each threat, I calculated how much damage it does to the face, whether the guards need to be removed before it’s legal, its mana cost, and the change of evaluation of every participating card - for example, if my creature with eval=4 dies during the trade, its change of evaluation is -4. Then, I do the same with the combinations of my opponent’s cards, assuming that his hand contains some red items that he really doesn’t want to use (their eval is high). If there are too many threats, I keep only the most threatening ones.

Next, I construct two graphs (one for me, one for my opponent) whose vertices are these threats, and there is an edge between two of them if they can be executed in the same turn (they don’t kill the same creature and they don’t use any of the cards twice). I find all (or at least most of) the cliques in these graphs such that their total mana cost doesn’t exceed my mana. Cliques in my graph represent my possible turns; cliques in the opponent’s graph are his responses. Problems with guards are solved by sorting the threats by whether guards need to be removed before their execution, and then by the sum of evaluation changes. Each clique also remembers how many guards it removes from each side, which creatures die and which items are used, the evaluation changes of all the threats it’s composed of, which unsummoned creatures it contains, and its total face damage.
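
In code, the threat-and-edge part looks roughly like this (a minimal sketch; the struct layout and names are illustrative, not my actual implementation):

#include <cstdint>
#include <vector>

struct Threat {
    uint32_t usedCards;     // bitmask of my cards (hand + board) this threat uses
    uint32_t killedEnemies; // bitmask of enemy creatures it removes
    int manaCost;
    int faceDamage;
    double evalChange;      // sum of evaluation changes of the participating cards
    bool needsGuardsGone;   // only legal once the enemy guards are removed
};

// Edge of the graph: two threats can be executed in the same turn.
bool compatible(const Threat& a, const Threat& b) {
    return (a.usedCards & b.usedCards) == 0 &&
           (a.killedEnemies & b.killedEnemies) == 0;
}

// A clique is a set of pairwise-compatible threats whose total mana fits the turn.
bool fitsMana(const std::vector<Threat>& clique, int mana) {
    int total = 0;
    for (const Threat& t : clique) total += t.manaCost;
    return total <= mana;
}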

The last part of the search algorithm is simply minimax using these cliques - using the data that I remember about them, it is easy to find which combinations of cliques represent legal turns and which do not. I normally search to depth 3 (my turn, opponent’s turn, and my turn again), because I want to see which of the creatures that I can summon threaten my opponent the most. My worst possible hp at depth 2 and the opponent’s hp at depths 1 and 3 also enter the evaluation of my possible turns. The function that I’m trying to minimax is roughly: 1*(sum of eval changes in my first clique) - 0.95*(sum of eval changes in the opponent's clique) + 0.7*(sum of eval changes in my second clique) (it gets a bit more complicated when one of the creatures is in more than 1 clique, but for simplicity, I’ll leave it at that).

Why it doesn’t work

I got quite a bit of speed and pruning from my search algorithm, but the best possible turn might consist of hitting something but not killing it, or buffing a creature so that it doesn’t get killed next turn… my algo doesn’t take the limit of 6 creatures on the table into consideration, it doesn’t use green items to buff creatures that will hit face, and it also has a lot of bugs because of how complicated it is. Making a local arena (brutaltester is way too slow for my laptop) would have helped as well, but for some reason I relied on intuition and the very noisy info that I got from submits.

Draft

I started with ClosetAI’s silver/gold draft order to be able to have true mirror matches (the same draft) with the rest of the bots at the top. My final bot is a bit different - each of the cards has a number associated with it, which tells the bot how much it should want to draft it (and if you sort the cards by these numbers, you do not get Closet’s draft - but it’s still close to it, because the cards are not balanced), but there are also bonuses/maluses for having too many/few early/late creatures, removals, card draw or green items.
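
A minimal sketch of what such a pick could look like (the numbers, thresholds and names here are made up for illustration, not my actual values):

#include <vector>

struct DraftCard { int cost; bool isCreature, isRemoval; double staticScore; };

// Static per-card score plus bonuses/maluses based on the deck drafted so far.
double draftScore(const DraftCard& c, int cheapCreaturesInDeck, int removalsInDeck) {
    double s = c.staticScore;
    if (c.isCreature && c.cost <= 3 && cheapCreaturesInDeck < 8) s += 0.5; // fill the early curve
    if (c.isRemoval && removalsInDeck < 2)                       s += 0.3; // keep some removal
    return s;
}

// Pick the highest-scoring of the 3 offered cards.
int pick(const std::vector<DraftCard>& offered, int cheap, int removals) {
    int best = 0;
    for (int i = 1; i < (int)offered.size(); ++i)
        if (draftScore(offered[i], cheap, removals) > draftScore(offered[best], cheap, removals))
            best = i;
    return best;
}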

Feedback

I think I enjoyed both of these new formats less than the standard 10 days. However, the game itself was good, and despite the huge amounts of luck involved, at least the top 3 were clear and deserved.

9 Likes

we know :imp:

I mean a lot of people, me included, were evaluating your draft phase

edit: not proud of it, but I just couldn’t find a strategy of my own that was better… If you can’t beat them…

Thank you for this game. I ended #1049, but with many rewards.

First, I received the ‘#1 in a language’ badge after the rush.

Then, as I was waiting for a train during the rush and on holiday during the marathon, I had no choice but to code everything on a 5-inch smartphone screen. Believe it or not, I went to Bronze this way. I fixed two bugs from a PC near the end of the contest, but everything else was coded on the phone. A HUGE kiss to the CodinGame team, who developed the online IDE!

Coding without copy-paste, without unit testing, and with a ‘keyboard’ that barely handles 2 strokes per second is very unnatural. I’ve learnt A LOT: trying to give short yet unique and explicit names, ordering functions so that I wouldn’t have to move them (i.e., copy the function character by character, enclose it with braces, double-tap on one of them, then tap backspace…)

A final word on my sim. Most of the time, I could run a full brute-force search. I computed combos of cards, and then I searched for all combinations of combos-to-target.
All combos are limited to my mana.
A combo starts with:

  • A creature from my board
  • A creature from my hand, if it has Charge
  • A creature from my hand, without Charge, but combined with a green card giving Charge

There are actually few such combinations.

Then I mix each combo seed with 0 or more green cards, red cards, or blue cards with damage.
Each combo has some characteristics that can be combined and that do NOT depend on the order the cards are played: damage, added/removed abilities, effects on players, mana cost, use of extra spaces on the board… They can be computed before trying to combine combos.
One of these characteristics is a bitmask telling which cards are in the combo. There are 6+6+8 cards in game, so 20 bits are enough. It is then easy to check whether two combos are compatible, and easy to compute the effects of combining two combos.
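
A minimal sketch of that bitmask idea, in C++ for illustration (the field names are made up; my actual code was written in a functional language):

#include <cstdint>

struct Combo {
    uint32_t cards;      // bit i set <=> card i is part of this combo (20 bits are enough)
    int manaCost;
    int damage;
    int boardSlotsUsed;  // extra creatures this combo would put on the board
};

// Two combos are compatible if they don't use any card twice.
bool compatible(const Combo& a, const Combo& b) {
    return (a.cards & b.cards) == 0;
}

// Effects are order-independent, so merging two combos is just summing their features.
Combo merge(const Combo& a, const Combo& b) {
    return { a.cards | b.cards,
             a.manaCost + b.manaCost,
             a.damage + b.damage,
             a.boardSlotsUsed + b.boardSlotsUsed };
}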

I also generate “bonus combos” for all possible combinations of blue cards, creature cards from hand and green cards.

Then, from a situation, I start by picking one possible target (including the opponent’s hero) and applying 0 or more creature-based combos. This results in a number of new situations. Using the bitmask (and mana cost and extra-space cost), I delete from these situations all combos that can no longer be used. Then I do the same with the next target.
When there are no more targets, I start combining the bonus combos with the situations.
I just have to evaluate the final situations and keep the best one.

Coding that with a functional language, using only immutable objects, was a real pleasure.

I had no time to tune the evaluation, nor the drafting phase (I used a mana curve), so my final rank is not good, but I had a lot of fun. Please, do that again!

5 Likes

44th Legend

Feedbacks

Really appreciated the game, it was fun to play and fun to watch. Congrats to the game creators!
I’m a big fan of the new format for several reasons:

  1. The sprint is an ideal way to enter the game and to decide if you want to continue or not on the marathon.

  2. As a regular Hub organizer, the sprint is certainly the right occasion to organize one and gather some people together for the entire event. I’d love to try it at the next contest

  3. Four weeks fits better with the available time I have. I was able to develop the game engine, implement most of what I had in mind, and even write some tooling.

I need a little refactoring to publish it, but for example, I have a small local leaderboard that should now be compatible with the official referees in Java, and that uses TrueSkill the same way we are ranked. Interesting when you want to test several versions of your bot.

There were certainly too many drawn games on the leaderboard. On my local leaderboard I tried to change that and counted it as a small victory if, in case of a draw, I had more life than my opponent. Not sure it was ideal, but it made the leaderboard converge much faster.

Draft

I started with a simple scoring of the cards based on their characteristics. I quickly realized that would not be the best, and saw an opportunity (as other players did, it seems) to download the replays of top players and identify what was working or not.
I ended up distinguishing between playing first and second, with each card rated as the mean proportion of times it was picked by reCurse and ClosetAI over their last 500 battles at the moment I ran the tool, only considering the games they won.
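
A minimal sketch of that counting step, assuming the replays have already been parsed into draft turns (the data layout and names are illustrative):

#include <array>
#include <vector>

struct DraftTurn { std::array<int, 3> offered; int picked; bool wonGame; };

// Rate each card id (1..160) by how often it was picked when offered, in won games only.
std::array<double, 161> pickRates(const std::vector<DraftTurn>& turns) {
    std::array<int, 161> offeredCount{}, pickedCount{};
    for (const DraftTurn& t : turns) {
        if (!t.wonGame) continue;
        for (int id : t.offered) ++offeredCount[id];
        ++pickedCount[t.picked];
    }
    std::array<double, 161> rate{};
    for (int id = 1; id <= 160; ++id)
        if (offeredCount[id] > 0)
            rate[id] = double(pickedCount[id]) / offeredCount[id];
    return rate;
}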

Battle

I split the battle phase in two: summons and item use on one hand, attacks on the other. For the summons and item use, I simply tried all the possible combinations. Then I used a simple heuristic for myself and the opponent in order to evaluate the board, and retained only this state as a starting point for the attacks.

The heuristic was symmetric for each player, trying two behaviors and retaining the best one for that player (like a 2-ply minimax on a binary tree).
The first behavior tried to remove the guard cards, if any, using the most interesting card (the lowest-rated card that kills it, or simply the lowest-rated card if none can), and then only strike the opponent player.
The second behavior tried to make the best trades, killing the most valued card with the least valued one.
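
A minimal sketch of the second behavior, the greedy trading (names and the tie-breaking are illustrative, not my actual code; Lethal and Ward are ignored here for brevity):

#include <utility>
#include <vector>

struct Creature { int attack, defense; double value; };

// Kill the most valued enemy creature using the least valued of my creatures able to do it.
// Returns (attacker index, target index), or (-1, -1) if no kill is possible.
std::pair<int, int> bestTrade(const std::vector<Creature>& mine,
                              const std::vector<Creature>& enemy) {
    int bestA = -1, bestT = -1;
    for (int t = 0; t < (int)enemy.size(); ++t) {
        int cheapest = -1;
        for (int a = 0; a < (int)mine.size(); ++a)
            if (mine[a].attack >= enemy[t].defense &&
                (cheapest == -1 || mine[a].value < mine[cheapest].value))
                cheapest = a;
        if (cheapest != -1 && (bestT == -1 || enemy[t].value > enemy[bestT].value)) {
            bestA = cheapest;
            bestT = t;
        }
    }
    return {bestA, bestT};
}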

At the beginning I had in mind to use a genetic algorithm, with the different actions of a single turn as the genes. In practice I only implemented a random generator, without mutation or crossover. As I was doing around 6k simulations per turn, it was enough to cover most of the possible combinations for the attacks. I tried to include the summons and item uses as well, but my number of simulations certainly wasn’t enough and it performed poorly.

Evaluation function

A board is always evaluated after the opponent has played, using the heuristics described above.

I tried to translate what the abilities of the cards mean in terms of life/attack/defense to score each card:

  • A card with Lethal has its attack counted as the max defense on the opponent’s board
  • Guard is translated as the card’s defense being additional life for the player
  • Drain is also translated as additional life for the player
  • Ward is translated as additional defense equal to the minimum attack on the opponent’s board

Even though I had to use floor values for the opponent’s min attack and max defense, this allowed the bot to rate a card dynamically depending on the situation: a Lethal has more value when there are high defenses on the opponent’s board, and a Ward has a high value when there are only high attacks.

In addition to those per-card scores, I add the life of the player and the number of cards on the board.
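
Put together, a minimal sketch of this evaluation (weights, names and details are illustrative, not my actual values):

#include <algorithm>
#include <vector>

struct Card { double attack, defense; bool lethal, guard, drain, ward; };

double evaluateSide(const std::vector<Card>& board, double playerLife,
                    double maxEnemyDefense, double minEnemyAttack,
                    double wAtk, double wDef, double wLife, double wCount) {
    double score = 0, extraLife = 0;
    for (const Card& c : board) {
        double atk = c.lethal ? std::max(c.attack, maxEnemyDefense) : c.attack; // Lethal kills anything
        double def = c.defense + (c.ward ? minEnemyAttack : 0);                 // Ward soaks one hit
        if (c.guard) extraLife += c.defense;   // Guard defense counts as extra player life
        if (c.drain) extraLife += atk;         // Drain gives life back when attacking
        score += wAtk * atk + wDef * def;
    }
    return score + wLife * (playerLife + extraLife) + wCount * board.size();
}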

I did not integrate the runes and card-draw mechanics, which, reading the other players’ post-mortems, was certainly a mistake.

To balance attack/defense/life, I ran several versions of my bot locally and, like other players, found out that attack is more important than defense, which is more important than life.

At the beginning of each turn, I fine-tune the parameters depending on the situation (I’m leading, I might be killed in 2 turns, etc.), in particular to change the weight given to life. I’m not sure it gave that big an advantage in the end.

Performance

I used a single instance of the game for each turn, doing super-fast modifications to the state when applying an action, and being able to revert them just as fast. It took me time in particular to implement the reverse function of each action, but I think it was worth it because it made exploring the game tree almost free.
This is the part best covered by unit tests: in particular, I extracted several situations and tested that I got the same results as the official game engine, and that I was able to revert back to the initial state.
I also added checks in my bot verifying that, after all the applies and reverts generated by the simulations, the state was exactly the same as before them.
Fun fact: when I was writing those tests, I got a crash (I threw an exception) because my opponent was using too much mana with the actions I had translated into code from one game turn.
I thought I had found a bug and asked on the CodinGame channel why, in this particular turn, my opponent was able to attack… I had not read the rules carefully, so I simply hadn’t noticed that a creature already on the board doesn’t need mana to attack :smile: It explained why I was at the bottom of the Bronze league, and it certainly made the people present in the chat laugh!
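
A minimal sketch of the apply/revert pattern (the state and action here are deliberately tiny and illustrative, not my actual code):

#include <cassert>

struct State { int myHealth, enemyHealth, myMana; /* boards, hands, ... */ };

struct Undo { int prevEnemyHealth; };      // remember only what the action changed

// Attacking the opponent's face with a creature already on the board costs no mana.
Undo applyAttackFace(State& s, int damage) {
    Undo u{ s.enemyHealth };
    s.enemyHealth -= damage;
    return u;
}

void revert(State& s, const Undo& u) { s.enemyHealth = u.prevEnemyHealth; }

// The kind of check used in tests: an apply/revert pair must leave the state untouched.
void checkRoundTrip(State s) {
    State before = s;
    Undo u = applyAttackFace(s, 3);
    revert(s, u);
    assert(s.myHealth == before.myHealth &&
           s.enemyHealth == before.enemyHealth &&
           s.myMana == before.myMana);
}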

7 Likes

You were actually not the only one to understand it that way. The rules were maybe not so clear.

To me it was useless up to the Gold league (I didn’t reach Legend). Among all the games I triggered manually, I only saw one finish with empty decks…

Sorry for my late feedback, I literally got off my plane just a few hours ago from a 2-week vacation outside the country. I am taking this opportunity to finally do a write-up, so please excuse the following incoherent rambling and blame it on the intense jet lag. :smiley:

Finished 2nd place, which I’m rather satisfied with considering I wasn’t even intending to participate seriously at first. :grin: But I ended up doing it anyway for just under a couple of weeks, missing out on the last 3 days due to vacations. Such is the danger of starting something even if just for streaming. Kudos to @RobotAI and @_Royale for a tough run at the end.

I want to say thanks to @aCat, @radekmie and the CG staff for what must have been a lot of work putting this contest together. And thanks to everyone who watched my streams, it’s been an experience way above my expectations on all fronts (almost 1.5k views on Youtube so far, that’s insane!).

For the record, my final bot has very little code left from the stream, maybe just some of the simulation part survived.

Performance and self-play

Shortly after I started playing “for real”, it became quickly obvious to me there was very little point spending time dissecting the games, other than fixing a few blatant bugs here and there. The sheer amount of random factors involved and differing in each game, as well as the very limited options available as a player (more on that later), meant that finding a viable strategy would require a lot of “bruteforcing” and tweaking through trial and error on win rate statistics. And for the same reason, a lot of games (and I do mean a LOT) would need to be played out to appropriately sample the game space and see if any improvement happened.

Because of these reasons, batching games against live opponents seemed like a ridiculous waste of time (3 games per minute on that state space?!). Therefore I hoped local self-play would be enough to robustly test improvements, and sporadically validate against real opponents once in a while. It worked out really well in practice, which is surprising because overfitting against yourself is almost always an issue that is quite difficult to overcome, but not in this game apparently. I am assuming the randomness of the game helps a lot fighting against it, maybe for the same reason TD-Gammon worked so well while not having the same success in other non-random games. But I digress.

All this to say I spent some time optimizing the hell out of my engine and my bot to get as much performance out of my high-end desktop CPU, and therefore test a LOT of tweaks. I am talking about self-playing at a rate of around 1000 games per second here.

Search

As I mentioned during my stream, it was easy to upgrade my bot to also simulate the opponent, which is precisely what I did at first. So the Monte Carlo search was going to “depth 2” to simulate the opponent moves, restricted to attacking with his creatures on board. In order to do that, each best move found on my side was then tested against 200 random moves of the opponent for scoring in a mini-max fashion.

However, because performance was such an issue for local play (as explained above), I then cut out this search very early on. I replaced it with some sort of depth 2 exhaustive graph search through efficient pruning that has already been explained very well in previous posts, so I will leave it at that. My arena bot still officially used Monte Carlo until near the end, as exhaustive search was, after all, barely better, just much faster. I also voluntarily slowed down my bot to hide that fact. This contest had so little room to distinguish yourself that giving away as little information as possible was mandatory. But more on that later.

What I used all this self-play for (spoiler: drafting)

Being able to churn out all those games made it easy to find the most balanced weights for my evaluation, but since the game has so little room for playing strategy (more on that later), I quickly hit a plateau just tweaking those evaluation weights (player hp, all card attributes, runes, cards drawn, etc.). And as was first quickly discovered by @RobotAI, almost all of the game actually revolves around just having a good draft.

So you can see where this is going. To figure out the value of each card, I whipped out some sort of what I will presumptuously call “reinforcement learning”, even though it’s all hand-made and with little academic background. Long story short, my bot was playing the same game against itself twice, except one side (the “player”) would be forced to have one card of its draft randomly chosen, while the “opponent” would always draft according to what it currently believes to be optimal.

So let’s say the “random” turn of the draft had cards 1 2 3: in the first game the “player” would be forced to choose 1, in the second game it would be forced to choose 2. If the game result for the “player” is better in one of these games, that means the random card was probably better than the other. So if the first game is a win and the second one a loss, card 1 wins over card 2 and you adjust the card values accordingly. Given enough games, eventually the card values will be pretty close to what they should be, and the bot drafts accordingly.
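
A minimal sketch of that update step (the update rule and names here are illustrative; my actual scheme had more moving parts):

#include <array>

// After playing the same game twice with two different forced picks in the same
// draft slot, nudge the value of the card that led to the better result up and
// the other one down.
void updateCardValues(std::array<double, 161>& value,
                      int cardA, int cardB,
                      int resultWithA, int resultWithB,   // e.g. +1 win, -1 loss
                      double lr = 0.01) {
    if (resultWithA == resultWithB) return;               // no information from this pair
    double sign = (resultWithA > resultWithB) ? 1.0 : -1.0;
    value[cardA] += sign * lr;
    value[cardB] -= sign * lr;
}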

I spent almost all of my time iterating and iterating further on this design. Despite all the focus on performance, progress was quite slow, but at least my computer was doing most of the work, not me. :slight_smile: A good training session would require several hours; I estimated around 20-25 million games were needed to get a stable draft. I then ended up adding stuff like player-specific card values, having the “opponent” sometimes pick from a selection of the best “drafts” encountered to fight overfitting, etc. And of course, there were a lot of “feedback loops”, where a better draft meant finding a better evaluation was possible, and a better evaluation made for finding a better draft, etc. I guess you could call that some hill-climbing meta-algorithm. It’s very possible that by doing so I hit a local maximum by being stuck in a suboptimal strategy (e.g. I heard some call my bot very aggressive), but since I don’t think there was much room for variety in this game to begin with, I think it didn’t matter that much. I mostly needed those missing last few days to polish things more, after I found a last-minute bug in the training algorithm… :frowning:

Oh and as a side note, I actually failed to get any improvements with a dynamic draft, but was pleased when others thought I did. I actually just randomly picked among my best static drafts as a form of copy protection. Having all your hard work taken so easily (and enthusiastically, to my great disappointment) by the competition really sucks. On that note I am glad @RobotAI ended up winning and not someone who took his draft. Though to be fair, the core of the problem lies probably more with the game itself.

That’s about all I can think of interesting to say about my bot for now… The rest is just boring details I think.

Game feedback

I mean no offense or disrespect to the hard work put forward by the authors, but I don’t think the game was good. It has a lot of complexity but very little depth for either human or AI players. It is very obvious when you play Hearthstone or MtG or other such games, that the complexity of playing well, and the depth and variety of their possible strategies, requires a lot more mechanics to be able to interact together.

Here it was basically green creature deck vs green creature deck every game, you focus on summons and trading, and hope the RNG works in your favor. Most (if not all?) possibilities of making interesting decisions were missing from the game, including very basic stuff like mulligans (or card redraw), player balance (e.g. mana token) or spells, which would have helped a lot making this game more competitive. Most of the difference was instead in trying to draft a bit better, and not as much according to a certain logic or synergy, but as to what statistically turns out to be better regardless of the context.

Also, those who said balance wasn’t important because plays are symmetrical are missing the point, it is actually a very big problem. If your bot needs to be twice as good as the opponent in order to win the series reliably, it introduces a very high amount of noise in the rankings, which, coupled with low game counts especially for this type of game, makes most of the ranking very random.

And such has been the problem of many of the community contests so far. When you take inspiration from existing successful game genres, they work because they had a lot of time exploring and balancing their game, introducing just the right amount of mechanics to give a good array of meaningful choices to the players. It is simply not the case here. What few mechanics there were ended up either useless (drain) or overpowered (lethal/ward), and always straightforward. In fact, playing out a game was almost deterministic in the decision making, which means depth was close to nonexistent. It really was all about figuring out the flaws of the cards in an out-of-game fashion, which made for a rather boring game in practice. I don’t expect any human players would play it for long.

It is simply not possible to balance out so much complexity with the necessary depth in the context of a CG game, at least not without extensive prior play. I have repeated this multiple times in the past so I will spare the details. I would have actually 100% preferred a carbon copy of TES:Legends or Hearthstone instead, as the necessary tools to handle the imbalances and certain situations would have at least been available. And I suspect many design decisions were taken to cut mechanics to make the game seemingly more accessible, but in practice it was misguided, instead just killing a lot of the necessary depth and tools for experts that beginners could simply have ignored with no harm.

I am sorry if this is harsh criticism, but I hope my intent to be constructive comes across as such. I am still hoping for either a return to more classical CG games, or fully embracing the depth and complexity of a game genre, but not the in-betweens I’ve seen so far.

And on a final note, I think the whole draft copying issue was disappointing and anti-competitive, but nonetheless is the most perfect example I could hope for to further underline my point about the necessity of hiding in order to be competitive, in spite of displeasing some notions of ‘fairness’.

I have written way too much already and am getting sleepy, but feel free to poke me for further discussion. :slight_smile:

25 Likes