A Code of Ice & Fire - Feedbacks and Strategies

k4ng0u · May 28, 2019, 9:50pm

k4ng0u - 25th in Legend (Python 3).

This was my second contest, the first one being CalM. I really enjoyed the fact that the rules were simple and thanks to this it was easy to understand the different strategies and their flaws.

The starter kit was really (too?) well done as it split the phases of a turn (read inputs and update game state,train, move,output result) and had basic implementations for each phase. Even in my final code I more or less kept the same phases.

To get started I made my units move to opponent HQ instead of the map center and this instantly sent me to wood 3.

A flood fill algorithm to train and move units pushed me to Bronze by Friday evening. At this point I went on working on my bot even though I think that a good floodfill algorithm might reach silver league without towers/level 2-3 units.

My next “improvement” was to build level2-3 units to destroy ennemy units whenever they are close to my territory. This didn’t work too well as it depleted my gold and reduced my income so drastically that I wasn’t able to do anything as long as the costly units were standing. This made me realize the (over)power of towers (cheap and no upkeep). So I decided to put towers on all cells with a ditance of 2 from my ennemy border. With this, my bot was a solid top 50 and reached Silver league.

After that my bot slowly dropped in the ranking. This is when I noticed that I was losing a few games I was totally dominating in terms of territory. This was due to what I call the “finisher” move (and the fact that my bot didn’t know how to handle area protected by turrets). Some bots were saving gold to spend it all in one turn and finish the game without caring about upkeeps. While stalling the game isn’t always good, being able to know how much gold is needed to win in one turn is quite interesting. I used a djisktra from my hq taking into account the cost to access each cell: 0 for my cells, 10 for an empty cell, 20 for a cell with an ennemy unit lv1, 30 for a cell with an ennemy unit lv2/3 or protected by a tower. I had a protection dict to indicate for each cell how many towers were protecting it. During the dijsktra, each time a cell was visited, if it was a tower, I updated the protection dict associated to the path so that the cost of the next moves takes the tower destruction into account. The path computation takes an underwhelming 0.8ms. I wonder if my algorithm is particularly bad of if it’s just Python being very slow. Anyway thanks to the finisher move, I was now winning most of the games where stalling was involved.

On gold league opening, my game phases were as follows and it never changed afterwards:
_Move units (floodfill, try to avoid blocking other units)
_Play Finisher (if we can finish the game, finish it)
_Check if game can be finished next turn (based on opp hq distance to mine, my gold, and my income). If it can, skip the next phases (save money!)
_Play cuts (my initial basic implementation was just to train lvl3 units on top of opponent lvl3 units)
_Build towers
_Train units
_Build mines (build mines closest to my hq with the gold left if my income is less than the opponent income)

From Wednsday to Friday my ranking dropped between 70 and 110. I had 3 points that were really not going well:
1)cuts => many opponents were cutting my territory but I didn’t cut theirs.
2)floodfill => my oppponents were consistently spreading further than me.
3)tower building => Building towers on cells at distance 2 from ennemy border was wasting way too much money.

I wasn’t much motivated because I had no clue on how to improve point 2 and 3, and for point 1 given the computation time my diksjtra took, I wasn’t sure I could get enough depth to have meaningful cuts.

I tried to to study the top player games. But it was nearly impossible to understand why they did a specific action (the moves computed by simulations are not always deterministic and even if they are it’s not always easy to understand that a move is good because it will be interesting 3 turns later). This is when the legend league opened and the Broken Boss saved me. His moves were quite easy to predict (or at least it gave me good information on what type of gameplay I should use):
_move/train: always go to the tile that is nearest to opponent hq, in case of multiple tiles at equal distance, alternate priority to right and down (or left and up when we are blue). This would guarantee a fast expansion to the center and spread your territory at the same time.
_tower building: only build tower to protect cells that are in danger on the border. A cell is in danger if it’s not protected and if it’s empty or if the adjacent opponent unit can freely move onto your cell. If opponent gold+income>=30 all cells on the border are in danger.

With this I thought that maybe if I manage to cut plus strenghten my weaknesses with the boss strategy I could climb up a few ranks.
I started with the move/train expansion improvement that was relatively easy to implement. It was already 20-30 ranks gained.
Then I moved on the cut which gave me a hard time. I don’t know any algorithm to efficiently cut graphs. So what I did is to bruteforce all possible paths starting from ennemy cells that are adjacent to my territory with a depth of 4 (This was pruning a lot of interesting moves starting from neutral cells but it is all I could afford without timing out). Then I would compute the ennemy score and my score (based on constants, number of tiles owned, number of units, income), how much the cut cost me and the upkeep needed for the chain. With a mix of all this I tried to determine if it was worth cutting. The advantage of computing a score based on territory of both the opponent and me is that the cut algorithm also naturally handles reconnection to big blocks of inactive cells. I tried to adjust the constants a bit. But due to the weird mixing of information (income, gold, nb of units, unit levels), my cuts are still sometimes over aggressive or very defensive (particularly afraid of using lvl3 or cut an opponent lvl3 since it’s a 20gold income shift). But it still made me raise into the top 5 of the gold league.

My last change was turret management. So far I was filling the map with towers once I reached ennemy territory. I changed it into the Broken Boss strategy. It helped me a lot during the expansion phase and was a game changer. I was now only building necessary towers to protect the front line and using the rest of the money to expand my territory in early game. That and a few bug fixes made me pass to legend!

Many thanks to Azkellas and Tehelka, it was really an entertaining contest! I found it particularly pleasant that even without using heavy computations we could make a relevant bot. I would try to learn C++ for the next contest though as it’s a little bit frustrating to restrain from some possibilities because the language is too slow (and I’d like to check if my code is slow because of the language or the one who uses it :p). It would be quite interesting to know how the other Python programmers handled the cuts and path computation/scoring. For me it was definitely the bottleneck of my computation time. It would be nice also to have a story of the game conception, how it evolved and how Azkellas and Tehelka came out with such a strong Broken Boss!

PED · May 28, 2019, 11:10pm

Finished 27th, first in C, after luckily being pushed to legend by some kind stranger the night before the end of the contest.

So tl;dr : I built a system to gather replays from the CodinGame API, and trained a NN to replicate the actions of the top players of the leaderboard. It was a very fun experience.

If you want some (undocumented) Python sources to gather replays, you can find some scripts here.

General statistics about the contest

Since I have replays locally available, I thought I could share some stats about the game.

I Gathered 71k replays from players ranked 1 to 200 after the contest ended. On this player base, the winrate for the red player is 47.97%. The average number of turns per game is 17.5, with a standard deviation of 8.5 turns.

“Metagame” as a function of players rank

These chart have the rank of players as the x axis, meaning that on the left are the “strongest” bots, and on the right the “weakest”.

Less lvl1 units and more lvl3 units in the top. Which can be linked to the increasing number of towers in the top of the leaderboard I suppose.

In average 1 mine built every 5 games.

Lots of bugs

Top 1-25 vs Top 26-200 : Turn by turn evolution of a game

Apologies if these are a bit hard to read. They compare the evolution of several metrics turn by turn.

There doesn’t seem to be a huge difference in average number of active tiles per player before the 7th turn. The top bots are probably better at choosing in which direction to expand however.

Less units and commands/turn for high ranking bots. Probably a side effect of using more towers and piling up gold for a lethal.

If you’re interested in other statistics about the game don’t hesitate to ask.

The Bot

So I wanted to try to make a bot that replicates other bots for quite some time, and since the contest looked adapted for convolutional NNs I thought I’d give it a shot (even though in the end I didn’t have time for a CNN, and simply used a few dense layers).

General idea

I wanted to keep the logic of the bot as simple as possible to focus on the learning side. In pseudo-code, here is what it does:

while number_of_avalable_actions > 0:
    a = find_action_with_best_fitess()
    apply_action(a)

So no cutting, no chaining actions, no simulation of the opponent turn, etc. All is done through an eval function, which evaluates the fitness of a (state, action) pair.

Gathering replays

I used a python script (available here) to scroll over CodinGames API, load replays of the top5 of the current leaderboard, keep the important part of the frames and store it locally as jsons. A cronjob called this script every hour.

Building a dataset

It’s easiest to train the model in Python, but ideally you want a fast language for your bot once the model is trained, so I implemented a referee in C, and used bindings with ctypes to feed the replays to the referee and gather “state vectors” which act as inputs of the dataset.

Output

(state, action) pairs that were seen in winning bots were attributed a label of +1.0

By choosing random action amongst the actions available, I could get (state, action) pairs winning bots chose not to do. Those were attributed a label of -1.0

Data from the loosing bots were discarded.

Input

Because of the 100k characters limit, I had to keep the number of inputs of my model relatively low (I couldn’t fit a lot more than 300 neurons in the input layer).

I focused on the state near the cell in which the potential action was being made, with a lot of inputs for the cells at a distance 1 from the center, a bit less for cells at a distance 2, and the bare minimum for cells at a distance 3. (using manhattan distance)

In the end, here is what the input feature vector looked like:

99 inputs not impacted by the action
- position of walls
- ownership of tiles around the action
- …
82 inputs impacted by the action before applying the action
- gold of each player
- active tiles around the action
- cost to reach tile
- presence of units
- …
82 inputs impacted by the action after applying the action
- gold of each player
- active tiles around the action
- cost to reach tile
- presence of units
- …

Having the state before and after the action is what allows the NN to determine whether this is a good action or not.

Topology

I didn’t try fancy stuff. If you have suggestion I’m love to hear them. Here is the model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 263)               0         
_________________________________________________________________
dense (Dense)                (None, 30)                7920      
_________________________________________________________________
dense_1 (Dense)              (None, 20)                620       
_________________________________________________________________
dense_2 (Dense)              (None, 20)                420       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 21        
=================================================================
Total params: 8,981
Trainable params: 8,981
Non-trainable params: 0
_________________________________________________________________

Activations are relu, except the last one linear. Loss is the MSE. Optimizer is default Adam from keras.

Thoughts about the contest

I thought it was great. Easy to play, hard to master, simple rules & mechanics but a huge depth.

As several people mentioned, it’s probably best to not change the rules for the multiplayer.

However, if rules are changed, I’d like to see a metagame more oriented towards efficient trading rather than piling up gold & towers to get a lethal first.

This would probably mean adding an upkeep to towers, reducing severely the cost of mines, and maybe making lvl3 units more expensive to train but cheaper to maintain.

Anyway, big thanks to @Azkellas & @Tehelka!

I fought, I lost, now I rest

pb4 · May 29, 2019, 5:58am

Could this have been a mistake?

You were taking replays from the top 5 only: you can assume that even though the player lost, he played well in those defensive situation
Winning bots will likely less often have been put in defensive situations. If you discard data from losing bots, you discard valuable data to learn how to handle those defensive situations.

Congrats on making a NN sort of work here

PED · May 29, 2019, 9:08am

I’m not sure, I didn’t try adding the loosing side’s data.

Intuitively, I think if you train on winner data only, you can end up with a trained model that is better than all bots you used to trained it (because you kept the good parts and disregarded the less good ones).

Alternatively, if you train and winner + loser, the best you can hope is an average of the bots used to train your model.

However, I agree, you get a bias when disregarding the loser data. You’re likely to end up, for instance with an over aggressive bot.

Choice of the input features can mitigate this bias: when I tried to add the coordinates of the action as a parameter of the NN, it improved the RMSE during training, but the bot had a tendency to rush for the opponent which lowered its winrate.

NinjaDoggy · May 29, 2019, 9:06pm

First time doing a PM so I’m going to copy MSmits’ format here

I finished 3rd in the contest. Code of Ice and Fire is the first contest I spent the whole week on! Usually I get to bronze in the first weekend, take take around 5 days to coming up with an algorithm and implementing it in the last 2 days(or give up on the contest if I couldn’t think of one), but this contest had me addicted from the start!

Opening:

I considered the Game to be in the “opening” phase if the closest pair of unit between me and my enemy were at least 3 tiles apart.

I ran a beam search that first moved my units current, then trained as much as possible, and then simulated movement for 3 more turns(on existing unit and the newly trained ones). My evaluation function consisted of a voronoi to determine the territory difference, then number of tiles I owned, and tie broke my unit distance to the enemy HQ(precomputed using a BFS)

The depth of the beam search was dependent on the number of units in play, so I set an arbitrary max depth of 200

Chain Finish

Chaining was by far the best part about my bot. I believe I was the first one to implement the chain train to the enemy HQ on the first day of the contest.

To calculate the minimum cost to get to any tile, I set up a directed graph I could run a Dijkstra’s on. The weight of edges between two tiles A -> B was calculated in the following manner:

check if B is an inactive tile. Weight = 0
check if B has an enemy tower. Weight = 30
check if B is a Neutral/Inactive Enemy Tile. Weight = 10
(now the tile MUST an active enemy tile)
check if B is protected by an enemy tower.

if B is protected by exactly 1 enemy tower and A is an enemy tower tile calculate cost in 5
otherwise Weight = 30

check the enemy unit level on this tile(0 if empty or mine) and add 1 to it(capped at level 3). Weight = level * 10

This calculation was done between all adjacent tiles to set up the weighted graph.
To retrace the path, I stored the minimum cost to get to a tile as well as which tile it came from.

Cutting

To compute cuts, I took the minimum cost path to each enemy tile, simulated placing units on the path, and running a flood fill from the enemy HQ to calculate what the enemy losses from the cut. Along a cut I might also reactivate some tiles and this was factored into the evaluation. I evaluated each possible cut based on the following formula:

value = (enemyTilesLost+tilesGained) * A + (enemyTowersDeactivated) * B + (enemyUnitsLost) * C - (goldCostOfPath) * D - (incomeCostOfPath) * E

Finally, I took the “cut” with the largest value repeatedly until the largest value was < 0.

This algorithm can miss cuts if there are multiple minimum cost paths or if a more expensive path is actually better, but from my experience, this cutting algorithm rarely(almost never) missed a worthy cut.

Evaluation Function

Nothing in my search factors in enemy response so I put it in the evaluation. Whenever I evaluated, I simulated ending my turning, running the cut algorithm from the opponent side, and then greedily moving every enemy unit before ending the turn again(back to my turn) and then evaluating based on the following criteria:

difference in gold
difference in active tiles
difference in equity: units + towers + rubble harvesters(oops I meant mines)
difference in unit quality: determined by its adjacent cells and if it was sitting on a tile protected by a tower or not

Moving units and placing towers on the border

I ended up going with beam search again, except evaluating differently compared to my early game expansion. Something I did that was quite clever(IMO) was allowing placement of a tower on a tile currently occupied by one of my units, and forcing the unit to move away during the beam search if it was possible, or to take away the state from the beam search if it was not. This allowed my bot to plan ahead and synergize building a tower and moving units away from the border which lead to the desirable behavior of having my border tiles only defended by towers while my units are free to expand and claim more tiles.

Flow of a turn

Take all cuts evaluated to be positive
Beam search unit movement and tower placement on the border
Try all possible tower placements(Except for tiles already adjacent to another tower)
Evaluate
Rerun except subtracting 15 gold at the start, and adding 15 gold before placing a tower(This was to prevent my bot taking too many cuts and exposing it to a big cut from the enemy)

Unfinished business
You might think a section is missing… and well, that’s because it is.
After the early game phase, my bot stops training units unless it’s for cutting(Obviously not optimal).
The strongest part of my bot was saving up gold and going for devastating cuts on the enemy while at the same time, not exposing itself to any cuts. Every time I tried to add in training units it would make my bot perform a lot worse and in the end I didn’t have time to implement it.

Bobbadillio · May 30, 2019, 12:17am

Hi folks! I’m a first-time competitor on CodinGame, and ended at ~500 on the leaderboard, so I’ll understand if you just skip this write-up and look for input from people who solved more interesting problems On the other hand, I also learned about an important consideration for timing your bot, so maybe some other newbies will find this useful. Search ahead for ‘ms’ to find that piece!

A big part of my motivation for joining CodinGame has been to learn Rust better, and this has been a great exercise for implementing pathfinding logic in Rust!

I developed my bot in a bit of a backwards way, that I’ll describe now!

Define data structures:

a 2d point with row/col
a Unit with owner, id, level, and location
a Player with gold, income, an optional HQ location, and a flag for me/not me
a map cell with a location, cell type (enum over impassable, neutral, mine(in/active), opponent(in/active), and a flag to show if another unit is moving to the cell
a map containing a vector of vectors of MapCells

Parse all the inputs into those data structures (buildings were ignored, except to fill in the optional HQ location)
Find shortest paths to neutral territory, to opponent’s territory, and to my territory.
Make a move for each of my units. Making a move meant moving towards the closest neutral square that isn’t impassable or claimed, with ties broken by distance to the nearest opponent owned square, then claiming and activating the cell.
Figure out how to spawn units (this probably should’ve been earlier!). Initially, this just spawned units at squares prioritized by distance to neutral territory, with squares closest to an opponent square being preferred in ties. It did this without accounting for neutral zones claimable due to movement or spawning though, which wasn’t ideal.
add another shortest path search after unit moves.
modify the moving logic to re-run distance evaluation after unit moves
modify the spawning logic to add a spawned cell’s neighbors

Steps 1-4 did nothing of course, but it meant once I implemented basic spawning, I think got to run straight out of wood league. After that, I think 6-8 were all required to get out of bronze league and into silver.

I learned a good pointer that time starts running after you read a line, which resolved some confusion I had. It looked like my turns were taking 100+ms, but I wasn’t getting DQ’d, and after I learned that tip about when the clock started I found that despite running search 6 times, my turns still topped out at ~8 ms. I had some pretty big weaknesses at this point of course, but decided to focus on the codebase weaknesses rather than the AI weaknesses I’m pretty new to Rust, and fell prey to excessive cloning, which is well described by someone else here: https://thenewwazoo.github.io/clone.html

IloIlo · May 30, 2019, 11:32am

Hi @reCurse, congrats on another great performance on a contest and thanks for your post, really good read as usual.

I’m trying to understand your opponent handling. From what I could grasp, you run your plan search without considering opponent movement, then perform the chain search (or full plan search) for the opponent on the resulting state, and then rerun your plan search but now simulating the opponent chain or plan when reaching the same state.

I don’t know if I uderstand this end condition. Does it mean that if you determine the best end state is a state where you already have a stored opponent response, then you stop the search ? That would make sense, but then how would you have more than one response for the opponent on one state as mentioned here ?

Thanks !

reCurse · May 30, 2019, 12:37pm

No, opponent responses are stored regardless of the state they were found with. Those responses are then executed on every state being evaluated during a new search iteration. This can therefore result in invalid actions, which will truncate the response execution.

Imagine a big opponent chain is found and in the next iteration you move a unit in the way. You block it, so the chain is interrupted because of invalid action and the resulting score is much higher.

BenjaminUrquhart · July 23, 2019, 2:36am

I’m a bit late to the party, but hear me out.

This was my first contest. I landed smack in the middle of Silver, which put me in 541st place. Not the best score, but it was much better than my prediction of landing top in bronze.

I would like to thank @MadKnight (this one) for encouraging me to take the steps necessary to get out of bronze. I ended up breaking my bot at the last minute with lowered my ranking by a few hundred places, but I still did fairly well.

My bot was (I’m rewriting it) based on a few simple ideas:

Building a tower next to HQ dramatically increases my chances of surviving.
Chain kills are top priority

Troop movement is pseudo-random. Troops are moved on spaces with the following priorities:

Next to enemy HQ
Can capture/attack enemy unit/tile
Tile is unclaimed
Tile is closer to enemy HQ

My territory claiming solely consisted of the “tile is unclaimed” case. It worked fairly well, although it wasn’t great.

Because of the tower next to HQ and poor attacking decision making, my bot usually ended up “turtling” and winning the game due to the infamous lethal chain. I also never ironed out cutting, leading it to make incredibly stupid cuts (up against a wall for example).

This contest was a great learning experience. I learned a lot from it, and I’m applying my knowledge to other multiplayer games.