# A Code of Ice & Fire - Feedbacks and Strategies

I finished number 2 in A Code of Ice and Fire (which puts me 2 for 2 at being 2). Thanks to @Azkellas and @Tehelka for a great game, and to everyone at CG for hosting such great contests. Congratulations to @reCurse, the worthy winner.

# Game thoughts

The game was interesting. Two aspects combined to make searching a nightmare:

• The action space is huge; you have many many options for moves, spawns and towers.
• Units can’t be planned independently. The other game I’ve played with a many-unit army to control is Halite 3; but there, you could mostly plan many turns of one unit, ignoring the rest of your army. Here, that’s a non-starter; the whole moveset is very interdependent.

That meant that a pile of heuristics seemed likely to be the way to go.

I had some concerns with the game at various points. There was a very strong early tower meta, which I worried made higher level units pretty pointless. At various times, the game seemed rather prone to large swings, but high level games mostly got over that. The one remaining problem is mines, which simply seem too weak - but overall, the game is pretty well balanced and has a lot of interesting aspects.

# Algorithm

Here’s a sketch of the interesting bits of my algorithm.

## Cut detection

We detect cuts for the two players in the same way; I’ll assume we are looking for our cuts against the enemy when describing the algorithm. The algorithm finds a single, best cut.

1. Use a (slightly bad) variant of Dijkstra’s algorithm to calculate the minimum path from our territory to every square in the enemy’s territory, involving an initial move and then some spawns. For each square, record the parent in the minimum path, the spawn cost of the chain to get there, the total upkeep of the chain, and the level of the unit which moved at the start of it (if any).
2. For every chain we find in 1, work out how much enemy territory it cuts. The score for cutting off a piece of territory is x_1 * income + x_2 * killed_unit_cost + x_3 * destroyed_towers + infinity_if_HQ_killed
3. Score every chain as cut_score - build_cost - x_4 * chain_upkeep - x_5 * moving_unit_level
4. Pick the best chain.

This has the advantage of finding almost all good cuts, while considering relatively few cuts, which is useful because we do this many times each turn. However, it misses a fair few good ones - when there’s a good cut to a square, but also a cheaper path to it that is a bad cut.
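A minimal sketch of such a Dijkstra pass, treating the chain as spawns only (the initial unit move and unit levels from the description are omitted for brevity; `cheapest_chains`, `spawn_cost` and `neighbors` are hypothetical names, not the author's code):

```python
import heapq

def cheapest_chains(our_border, enemy_cells, spawn_cost, neighbors):
    """Dijkstra from our border into enemy territory: for every reachable
    enemy square, the cheapest spawn chain and its parent in that chain."""
    best = {}
    heap = [(0, src, None) for src in our_border]  # (cost, cell, parent)
    heapq.heapify(heap)
    while heap:
        cost, cell, parent = heapq.heappop(heap)
        if cell in best:
            continue  # already settled with a cheaper chain
        best[cell] = (cost, parent)
        for nxt in neighbors(cell):
            if nxt in enemy_cells and nxt not in best:
                heapq.heappush(heap, (cost + spawn_cost(nxt), nxt, cell))
    return best
```

Following the recorded parents from any enemy square back to the border recovers one candidate chain, which can then be scored as in steps 2 and 3.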

## Defending cuts

To defend an enemy cut, we do the first of these that works:

• Move a unit to a square we don’t own that blocks or reconnects it.
• Spawn a tower which blocks it. Prefer tower sites that block many possible cuts.
• Spawn a unit in a square we don’t own that blocks or reconnects it.
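The cascade above can be sketched as a priority fallback (every callable and name here is a hypothetical stand-in for the author's actual routines):

```python
def defend_cut(cut, try_move_block, blocking_tower_sites, build_tower, try_spawn_block):
    """Apply the first defense that works: move a blocker, else build the
    tower blocking the most possible cuts, else spawn a blocker.

    blocking_tower_sites: {site: number of possible cuts that site blocks},
    assumed already filtered to sites that block this particular cut.
    """
    action = try_move_block(cut)
    if action is not None:
        return action
    if blocking_tower_sites:
        site = max(blocking_tower_sites, key=blocking_tower_sites.get)
        return build_tower(site)
    return try_spawn_block(cut)  # may return None: the cut is undefendable
```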

## Flow of a turn

1. Cuts. We play out lines where
i) We make no cuts, and then defend their highest-value cuts until we can’t defend one (or they run out).
ii) We make one cut, and then defend their highest-value cuts until we can’t defend one (or they run out).
iii) We make cuts until we can’t find another, and then defend their highest-value cuts until we can’t defend one (or they run out).
Each of these lines of play is scored as VALUE_OF_OUR_CUTS - x*VALUE_OF_THEIR_UNDEFENDED_CUT. We play the line which scores best. This process allows us to find cuts which reduce their income or territory enough to prevent their terrifying cut.
2. Move our units. We process units in order of how close they are to the enemy HQ, to minimise conflicts (as units usually move that way). In order of priority, we prefer to move to squares that are the enemy’s, then not ours, then not blocking this unit or an ally from empty squares, then close to the enemy HQ, then not adjacent to a friend, then close to the HQ diagonal. The exception is level 3 units, who only really care about eating towers and moving towards the enemy HQ. I did say this was a pile of heuristics.
i) After moving all our units, look for nasty enemy cuts. If any go through squares we’ve vacated, and we don’t have the gold to cover them, mark those moves as illegal and replan all moves.
3. Do another pass of enemy chains, looking for any new ones from our movement and defending them.
4. Build towers, anywhere that covers a friendly square adjacent to an enemy one which is not next to another tower.
5. Spawn anywhere empty with no nearby friends, or a nearby enemy square, using roughly the same prioritisation as we do for moving.
i) After spawning all our units, look for nasty enemy cuts. If any cut off new units, mark those spawns as illegal and replan all spawns.
6. Build mines, maybe? I’m not sure I ever actually build mines, but there’s some code here that could do that.

If I had more time, I would have:

• Detected cuts better. As noted above, I miss a fair few, and I think there are simple measures that would detect the lion’s share of these (for a start, detecting paths of equal cost to the minimum path to a square which link to different squares in our territory).
• Sped things up. I’m using too many Java generics, which are a bit faster for me to write but slow. I could rewrite a lot to use arrays, which would be significantly faster. That would allow me to…
• Written a better cut search. Mine is currently very focused; at each stage it considers two options - make our best cut, or stop cutting and defend the opponent’s best cut. It could be both wider (consider >1 cut at a move) and deeper; I don’t model our responses to opponent cuts at all, making them hard to value. This would also involve possibly saving up for future turns (which I already do slightly for kill chains).
• Done the opening better; my units move greedily, but filling territory near-optimally shouldn’t be too hard.
• Played with some magic numbers. I never changed the values of these.

Of course, we hope to open the game for multiplayer around next week, after we fix some “details”.


Good!! Super happy to see that.

Loved the contest. My best rank was #2 in Gold

I used simulated annealing at depth = 1 (just a dummy enemy). My dummy never builds or trains; it only moves.

I had two different scoring functions (1 for enemy and 1 for me). Both are very similar except enemy doesn’t have a score for towers and mines.

My Eval:

1. If win return INF, if lost return -INF
2. If my income becomes negative, return -INF
3. Better reachable cells (actual squared distance (not BFS length) closer to opponent)
4. Better unit positioning (BFS length from opponent HQ)
5. Better income
6. Troops alive (100*troop level)
7. Towers alive
8. Better mine locations if mines exists (squared distance closer to my HQ)
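The features above might combine into a weighted score along these lines (the weights and field names are invented placeholders, not the author's constants, except the stated 100 per troop level):

```python
INF = float("inf")

def evaluate(state, weights):
    """Weighted sum over the listed evaluation features (sketch)."""
    if state["won"]:
        return INF                                          # 1. win
    if state["lost"] or state["income"] < 0:
        return -INF                                         # 1./2. loss or negative income
    score = 0.0
    score += weights["reach"] * state["reachable_cells"]    # 3. reachable cells
    score -= weights["pos"] * state["units_bfs_to_opp_hq"]  # 4. positioning (smaller is better)
    score += weights["income"] * state["income"]            # 5. income
    score += 100 * sum(state["troop_levels"])               # 6. troops alive
    score += weights["tower"] * state["towers"]             # 7. towers alive
    score -= weights["mine"] * state["mine_dist_to_hq"]     # 8. mines close to own HQ
    return score
```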

I simulated around 1000-3000 options during every turn to decide the best combination of moves.

The first 6 of those simulations find a combination of moves followed by the best training chain (these simulations never build towers). The moves are also picked in a careful order with a global priority: say, in the first simulation all the units try to kill opponent troops, then capture opponent cells, then capture neutral cells, and if none of those is possible, move to my own cells. In the other 5 simulations, I choose a different order of movements.

The rest of the simulations are random with proper pruning.

1. All my unit movements are completely random, followed by either building towers or training. The probability of building towers before training is 1/2, and after training is done it is 1/2. The probability of training T1, T2 and T3 troops next is 8/10, 1/5 and 1/10 (for 100 iterations, if I have gold).

2. Train T1 Troops only on neutral positions or enemy captured positions.
After the first 10 turns, the location chosen must be the closest to enemy HQ. (same as gold boss).

3. Train T2 Troops only on top of T1 troops and if I initially had >= 30 gold.

4. Train T3 Troops only on towers or tower-affected areas or T2 troops, if I initially had >= 40 gold.

5. Build towers with a minimum gap of 2 with each other. Initially place towers defensively (2 distance from opponents). Later based on who is leading the game, choose to build towers at 1 distance from enemy (if leading) or 2 distance from enemy (if losing).

6. There’s a maximum threshold for each of my unit types, which also depends on the number of neutral cells on the first turn of the game. If all the cells are neutral then my threshold numbers are 21 T1 troops, 3 T2 troops, 6 T3 troops and 7 towers.

Overall the contest was very addictive. I did not sleep in the last 48 hours trying to get to Legend. Nevertheless, this is one of the best rankings I have had in a community contest, so I’m happy. Thank you, contest creators.


My first contest in CG, with a decent 190th in Gold, submitting code in the last hour.
It was also my first coding competition, and it really inspired productivity: I wrote over 1500 lines of JavaScript and tested the code countless times in the game IDE.

After getting into Silver on the 2nd-3rd day, I worked on rewriting the game simulation environment with proper objects and methods, and wrote a bunch of feature extraction until the 8th day. At that point I started writing game logic based on heuristics, observations and other comments on the forum and chat,
mainly ending up with a common solution:

• Quickly expand base in initial phase towards enemy HQ
• Find ideal cuts by a feature map based on how far the connected tiles are from enemy HQ, and how many tiles it connects to
• Retake tiles if enemy cuts into my region
• Finish off the opponent by a chain train cut when possible.

I could not finish tweaking the heuristic game logic, as there is a set of ideal actions that can be described, and the part of my code that builds towers did not work yet. Sometimes the code randomly broke, and on repeating the game the malfunction didn’t appear; I assume it was a JavaScript timeout.

My game logic was based on the feature maps like:

• Map regions
• Distances from borders with different perspectives
• Number of owned tiles connected to HQ through owned tile
• Cost to conquer a tile with chain train on the cheapest path
• Distance from walls
• Distance from HQ

Ideal aim for heuristic logic was:

• Decide game phase by a set of metrics (exploration, first touch, …)
• Control units movements based on their level and game phase
(lvl1 is for exploration as the main source of income/map control and for holding out vs other lvl1 units, for example blocking the takeover of a tile and increasing opponent training costs. Lvl2 and lvl3 roles are mostly aggressive)
• Most of the winning or losing is decided on placement of new units, and it gives most of the edge in controlling the map.
• Strategic tower placement, to increase costs for taking over regions, and main source of defense, when the fight starts heating up
• Mines seem worth building only in a game of attrition, as summoning a lvl1 outperforms building a mine in terms of utility and income while there are many tiles to conquer.

My takeaway from this contest is that I spent too much of the allotted time thinking about and working out different feature detections, which in the end I hardly used at all.

All in all I have enjoyed the contest. Thanks


# Introduction

Rank 1! It’s been two years (already?) since I last won an official contest here, so I am obviously really happy. It was a tough competition, kudos to the other challengers for a fierce battle, you did great!

# Game design

I have been on the fence on whether I like the design or not for most of the competition. The foundations are solid and gave me a good vibe at the start, but the rise and complete domination of the chaining mechanic seemed to overshadow the entire game. In the end though, I think I expected a very different game to develop and was disappointed when it turned out to be something else. After playing more with it, some more subtleties started to emerge and made me appreciate its rich depth.

Nonetheless, I still have a slight dislike of how the game state is too brittle and volatile, making it prone to large swings of momentum that, executed properly, are very hard to recover from. Figuring out a move is almost like walking on eggshells, navigating a minefield of shallow traps. The meta heavily favors playing as defensively and greedily as possible to finish on one single game-ending move. Not the most exciting kind of game to play or watch, but still.

Yet it is also a coding competition after all, so game design is not everything. And on the coding part, it hit very high marks. I am pleased to see the variety of ways the problem was tackled, namely on dealing with chains and choosing moves, which gave way to a lot of creativity there, if less so in actual game strategy. The branching factor was huge, which allowed both search and search-less bots to thrive, and optimization was not the absolute end-all be-all. It is also impressive to see 3 of the top 4 bots were coded in garbage collected languages!

So overall, this contest was really well done. I could comment more on design stuff I wish the game did differently, but it’s pointless. I would even recommend against making any changes when the multi comes out, as the past has shown the level of activity drops too sharply post contest. Those changes would hurt more than help, as untouched contest bots, which still form the majority of the multi leaderboard, would be less viable out of the box.

# Simulation

The statement screamed bitmaps (or bitfields) at me on the first read. Small grid, lots of floodfill, neighborhood and surface checks… So the first thing I coded is a Bitmap class to handle all those operations, and my game state therefore contains one bitmap for each feature: walkable, mineable, active, owner (per player), units (per level), towers and mines. Combining them gave all the information needed for very fast simulation of game actions, which I did next.

The rest was pretty much run-of-the-mill implementation, and trying to minimize computing the activation map which dominated the rest.
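One way to realize such a Bitmap class is to pack the grid into a single arbitrary-precision integer, so that neighborhood and floodfill checks become shift-and-mask operations. This is a sketch under assumed grid conventions, not the author's implementation:

```python
class Bitmap:
    """One bit per cell of a w*h grid, packed into a single Python int."""
    def __init__(self, w, h, bits=0):
        self.w, self.h, self.bits = w, h, bits
        self.full = (1 << (w * h)) - 1
        self.west = sum(1 << (y * w) for y in range(h))  # x == 0 column
        self.east = self.west << (w - 1)                 # x == w-1 column

    def set(self, x, y):
        self.bits |= 1 << (y * self.w + x)

    def get(self, x, y):
        return (self.bits >> (y * self.w + x)) & 1

    def count(self):
        return bin(self.bits).count("1")

    def dilated(self):
        """Bits of all set cells plus their 4-neighbors; the column masks
        stop horizontal shifts from wrapping across rows."""
        b = self.bits
        b |= (self.bits & ~self.east) << 1      # east neighbors
        b |= (self.bits & ~self.west) >> 1      # west neighbors
        b |= (self.bits << self.w) & self.full  # south neighbors
        b |= self.bits >> self.w                # north neighbors
        return b

def floodfill(seed, walkable):
    """Grow seed inside walkable to a fixpoint, e.g. an activation area."""
    region = Bitmap(seed.w, seed.h, seed.bits & walkable.bits)
    while True:
        grown = region.dilated() & walkable.bits
        if grown == region.bits:
            return region
        region.bits = grown
```

Combining such per-feature bitmaps (owner, units, towers…) with AND/OR gives most of the state queries the simulation needs in a handful of integer operations.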

# Plan search

From now on, what I will refer to as a plan contains all the actions (move, train, etc.) consecutively taken during a single turn.

To start, actions must be generated from the current state. This list is comprised of:

• The best X chains found (more on this later)
• All valid unit moves
• Training level 1 units on all valid cells
• Building a tower on all valid cells
• Same as above, but on cells currently occupied by units, creating a number of possible move chains
• Building a mine on… just kidding. Mines are useless.

I purposefully exclude training level 2 and 3 units that are not part of a chain. I believe there is no situation in which it is a good action, because the aggressive upkeep costs cripple the most precious game resource: gold income. A decisive chain, however, does not have this issue, as the game should be largely won by that point.

A simple hill climbing algorithm then picks the action from the list resulting in the best evaluation score. A new state is computed as a result of that action, new actions are generated from it and the search continues, until no action further improving the score can be found.

# Chain search

I tried different approaches with various degrees of pruning techniques and results (e.g. depth-first search, breadth-first search…). Ultimately I settled on a compromise in the form of a best-first Dijkstra search performed on every valid starting position, using gold spent as the cost function. Those were either units that have not moved yet, or cells on the front line that can be directly trained on. It would then search nearby cells that were not already owned, until it ran out of money or space to explore.

It is important to note that the actions making up the chain were not simulated, for performance reasons. And for the same reason, a different and faster evaluation function was used on every step to keep the best chain found. It was primarily based on lethality, the number of new active cells resulting from the chain, its gold cost, and an additional penalty for using level 3 units.

A further improvement that came later on was to keep the X best chains found instead of just one, to make up for the weaknesses of the light evaluation function. The real best one can then be figured out with the normal, heavier evaluation function, along with the other actions evaluated in the root search.

# Opponent response

It is just as important, perhaps even more so, to anticipate the opponent moves following yours, especially if your move would otherwise be a massive blunder. I originally tried to run a chain search for the opponent and execute it in the state given to the evaluation function, but it proved to be too expensive due to the massive number of calls, causing blind spots in crucial moments due to aborted searches.

What helped a lot more was to decouple computing the opponent response from the plan search, composition and evaluation step described above. So at first, a whole plan is formulated as if the opponent would not do anything, and scored as such.

Then, once a plan is completed, a chain search for the opponent is performed on the resulting state. The resulting opponent plan is added to a memory of opponent responses, and the entire plan search algorithm restarts from the beginning, with a key difference. On every evaluation, it will execute all of the stored opponent responses on the state being evaluated, score each of the resulting states and keep the lowest one, behaving similarly to a minimax search.

This process repeats until it runs out of time, or until the expected opponent plan already exists in the memory. In practice, this rarely exceeds 5 or 10 different responses, which keeps it fast enough. The best plan according to opponent responses is then sent to the game.

The result was a massive improvement in strength. It also allowed me later on to just run the exact same, full plan search for the opponent instead of a single chain search, once the planning got good enough.
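The outer loop over the response memory could be sketched as follows (`make_plan` and `best_response` are hypothetical stand-ins for the plan search and the opponent chain search described above):

```python
def plan_with_responses(state, make_plan, best_response, max_iters=10):
    """Refine our plan against a growing memory of opponent replies.

    make_plan(state, responses): plan search that scores each candidate by
    its worst outcome over all stored responses (minimax-like).
    best_response(plan): the opponent's best reply to a finished plan.
    Stops once the opponent's reply was already anticipated.
    """
    responses = []
    plan = make_plan(state, responses)
    for _ in range(max_iters):
        reply = best_response(plan)
        if reply in responses:
            return plan
        responses.append(reply)
        plan = make_plan(state, responses)
    return plan
```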

# Evaluation

This remained the weakest part of my bot for a long while, as I spent most efforts on the search described above, along with optimizing and lots of bug fixing. I was convinced I could just iron out this part later, as it was more important to have those other solid foundations first. I think it paid off tremendously, as a good evaluation on a broken search is so much worse than the other way around. It is also a lot easier to tweak towards the end of the competition, as compared to doing major algorithm work.

The main features of my evaluation function:

• Winning condition (duh)
• Number of active cells
• Gold and income
• Front line computation, defined by cells that have unowned neighbors. Mostly used to favor early aggression.
• Distance to front line, per unit.
• Minimum distance to front line, globally.
• Number of front line cells defended by a tower.
• Number of capturable cells, per unit. Mostly used to favor early spread and territory grab.

I also eventually doubled the weight on gold and income past an arbitrary turn number. This resulted in very valuable behavior changes. It stopped overspending and overdefending in the mid game, instead building up income and gold reserves to prepare attacking. This also favored very aggressive chains that would cripple the opponent income.

# Miscellaneous

Having a heavily optimized simulation and bot also paid off tremendously in the form of local testing. I was able to self-play a little under 1000 games per minute, which was a great way of aggressively pruning bad changes and spotting potentially good ones.

The viewer was a bit lackluster in usability this time around. The frames were sometimes confusing in the execution and change of player turn, and matching it up with the error output was even more of a pain than usual. The lack of a debug view also doesn’t help readability, especially since level 1 and level 2 units were hard to distinguish IMO. I ended up coding my own viewer to help with those problems (e.g. big circle with 1, 2, 3 inside).

# Conclusion

Thanks to @Azkellas, @Tehelka and Codingame for the amazing hard work put in making this contest a reality. Thanks to everyone for participating and making this a fun community event!


Hi everybody,

I just discovered Codingame a month ago. I was really amazed by the platform and its community. This was my first contest; it was very interesting and I learned a lot in the process.

I managed to implement several functions, calling them in this order:

1. search for a win, looking for a path from opponent HQ that I could afford.
2. search for a cut. Only one movement, didn’t get in time to use path search here.
3. build towers
4. move units
5. train new units

I applied variations of these functions depending on three stages of the game, each with its own goal:

a. conquer more territory
b. build a defense with towers
c. explore the rest of the map

I wonder if, in addition to path searching, some graph algorithms could be used to find the best spots to cut off the enemy.

Thanks a lot to @Azkellas, @Tehelka and other collaborators of this game. It was a great work!
Also thank to all the players that shared their insights and strategies.


Thanks to @Azkellas, @Tehelka, it is a great contest and I really like it.

Since I had some other stuff to work on, my goal, set before the contest started, was simply to get to Gold.
With the limited time I had, I decided that I wouldn’t write a simulator.

So I wrote a flood fill without taking action order into consideration.
And that flood-fill algorithm alone teleported me from Wood 3 to Bronze…

Adding logic to detect when to use level 2 and level 3 units got me to Silver.

While I was in Silver, I figured it was time to implement the cutting logic.
So I spent some days doing other stuff while thinking about how to implement that without a simulator.
I ended up with a 24-cut search: 12 rows and 12 columns.
I search whether I can occupy a row or a column, and calculate the enemy territory to the left and to the right (or up and down).

This worked dramatically and got me to Gold.
Rank 254 overall.
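The row/column scan described above could look roughly like this (a toy sketch; `occupy_cost` and `cut_value` are hypothetical callbacks, not the author's code):

```python
def best_line_cut(w, h, occupy_cost, cut_value, gold):
    """Scan every full row and column; if we can afford to occupy the
    whole line, score the enemy territory it would cut off, keep the best.

    occupy_cost(cells): gold to take every cell in the line, or None if
    the line cannot be occupied.
    cut_value(cells): enemy territory disconnected by holding the line.
    """
    lines = [[(x, y) for x in range(w)] for y in range(h)]   # rows
    lines += [[(x, y) for y in range(h)] for x in range(w)]  # columns
    best_value, best_line = 0, None
    for line in lines:
        cost = occupy_cost(line)
        if cost is not None and cost <= gold:
            value = cut_value(line)
            if value > best_value:
                best_value, best_line = value, line
    return best_value, best_line
```

On a 12x12 map this checks exactly the 24 candidate lines, which keeps it cheap even without a simulator.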

Very satisfied with the result, I didn’t touch the code again.


10th final, best result so far

I really liked this contest. Chains were interesting from the beginning, and even though tower defense seemed too strong (no upkeep cost, and towers survive cuts), the game still had enough other details/concepts to keep us busy (performance optimization was not crucial).
It was also easy to improve incrementally (or fix bugs): without the need for heavy background calculations or multi-turn planning, inefficiencies can be easily spotted, and without randomness they can be reproduced in the IDE/locally.

Implementing a proper set of early heuristics kept me in the top 3-10 the first week, but as usual, those ran out of steam. At least this time I was able to implement some more sophisticated versions, and after fixing all the bugs in those I got back to the top 10.
The competition was strong (e.g. top Halite players participated); congratulations to the winners and the early top: Euler and Siman.

early heuristics (order of appearance)
greedy move/train lvl1
killing blow
heuristic tower defense
train lvl2 + 2-step cut to kill opponent lvl2
hardcoded first turn movements

order of actions
-killchain
-moves
-opponent killchain
-cuts
-simple tower defense
-train
-mine (single one)

first turns moves
-still greedy
-focus on going toward opponent base
-avoid situations where a unit cannot take an empty cell
-tried different greedy methods for maps with huge holes, without success

killchain
-DFS from opponent base
-post adjustment to compensate tower kill

moves/train
-greedy with score calculation
-avoid stronger opponent units
-check cut value with DFS from opponent base
-no need to stay on tower defended cells
-avoid being easily cut (leaving an undefended cell next to an opponent unit)
-avoid many “backfiller” units
-prefer move toward opponent base/unit

cuts
-BFS from edge points
-only through not-actively-owned cells
-check cut value with DFS from opponent base
-only check cut value if it really cuts (check on nearby cells)
-steplimit to avoid timeout (9-7 somewhat dynamic)

cuts order
-run a few rounds for self and for opponent
-with various gain rate, and minimum gain parameters


Legend 6th.

Really good contest. I like this game, and I also really like the visual design of this contest; it is so cool.

Simulation

I used simulated annealing. I simulated only my own moves, with no opponent move simulation.

I divided my turn into two phases: MOVEs, then TRAINs and BUILDs. First, I simulate the MOVEs in 10ms, then the TRAINs and BUILDs in the remaining time. Both phases use the same evaluation function.

In the MOVE simulation, I annealed the movement order and moving directions of the units.

In the TRAIN and BUILD simulation, I annealed unit training positions and levels, and tower construction positions. I used no mines.

I did not optimize the annealing method much; I spent a lot of time improving the evaluation function.

Evaluation

At the beginning of the game, the evaluation treats securing a lot of area as the most important task, and breaks ties by smaller distance to the opponent HQ. In the later half, saving gold is more important than securing area, because I wanted to make a powerful attack that reaches the opponent HQ.

Also, I calculated the maximum range in which the opponent can train units. We can calculate this with a Dijkstra search from the opponent HQ (in the Dijkstra search, cells in the opponent’s area have a moving cost of zero; otherwise, the cost is the minimum cost of training a unit on that cell). My own cells in this range are considered to be occupied by the opponent on the next turn, so I decrease the evaluation value for them. As a result, my bot builds towers so that the range the opponent can attack stays small.
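That train-range Dijkstra might be sketched like this (hypothetical names and a generic grid interface, not the author's code):

```python
import heapq

def opponent_train_range(hq, opp_cells, train_cost, neighbors, gold):
    """Dijkstra from the opponent HQ: moving through opponent territory is
    free; entering any other cell costs the cheapest unit trainable there.
    Cells whose total cost is within the opponent's gold are threatened."""
    dist = {hq: 0}
    heap = [(0, hq)]
    while heap:
        d, cell = heapq.heappop(heap)
        if d > dist.get(cell, float("inf")):
            continue  # stale heap entry
        for nxt in neighbors(cell):
            nd = d + (0 if nxt in opp_cells else train_cost(nxt))
            if nd <= gold and nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return set(dist)
```

Penalizing own cells inside this set is what pushes the evaluation toward towers that shrink the opponent's reachable range.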


Thank you for the explanation euler, very clear!

@reCurse - thanks for the write-up. Your chain search, and particularly how you incorporate opponent responses, is very clever. One question - how did you build defense into it? It seems expensive to put all possible tower locations into a “plan”; do you do something about using locations which block your opponent’s responses?

Very interesting contest! Thanks @Azkellas and @Tehelka for this game.

I finished 55th (11th Gold) and I’m waiting for the multi version to improve my bot. Some technical aspects of my code need to be rewritten:

• My BFS is approximate, with one pass to calculate initial paths to the enemy HQ, and a second pass to minimize the cost from neighbouring cells and propagate this minimum to child cells.
• My game states are bugged. When I play moves on a copied state, the original state can be modified, which sometimes crashes. So I settled for playing on one unique state.

Here, I used heuristics. Moves, trains and builds are all evaluated with magic numbers given to each possible situation for a cell and the 8 surrounding cells (sometimes for their neighbours too), and with functions of the distances from my HQ and the opponent HQ. I didn’t find a good balance for my criteria (missing a NN trained with the top 10 players), and my expansion looks like a growing circle.

Cuts and lethal are calculated with my BFS from the target cell to all my active cells, or potentially active cells if one of my unit can move on it.

With BFS and heuristics my bot will fill the actions queue in this order :

1. Find a lethal move
2. Find a cut: captured cells / cost >= 2.0
3. Move all units who can go to neutral or opponent cells
4. Move all units who can only go to cells of mine. I do several passes to avoid blocking my units each other.
5. Build towers (maximum 2 towers)
6. Train level 3 unit on opponent level 3 units near my zone
7. Train units
8. Build mines

Then play all actions.

This contest was very fun and I think I will spend a lot of time on it when the multi will be online.

Thanks to CG and all the participants.


Defense is an implicit consequence of scoring well on the evaluation function. So if there is a better score by placing a tower, the result will be to “play defensively”. But it could also score even better by attacking instead (or doing both). So those concepts are an emergent result of scoring, and not explicitly implemented.

Thanks to the decoupling, it is relatively inexpensive to test a lot of actions, such as all possible tower locations, because I am using cached opponent responses during evaluation instead of recomputing them on the fly. Their response is also truncated at the first invalid action, so blocking those responses is rewarded by minimizing the score loss. But it could be by building a tower, moving a unit, attacking, etc…


Ah, I see. Do you plan one move at a time (where “build a chain” is a move), and then evaluate? It seems like you’d get to maybe 100 possible actions, which is fine, but an infeasible number of possible plans.

Yes, planning is done one move at a time, and the move yielding the best score is chosen. Because of greedy hill climbing, the numbers of actions to evaluate are added rather than multiplied during plan composition, so the number of possible plans is actually fairly low.


k4ng0u - 25th in Legend (Python 3).

This was my second contest, the first one being CalM. I really enjoyed the fact that the rules were simple and thanks to this it was easy to understand the different strategies and their flaws.

The starter kit was really (too?) well done, as it split the phases of a turn (read inputs and update game state, train, move, output result) and had basic implementations for each phase. Even in my final code I more or less kept the same phases.

To get started I made my units move to opponent HQ instead of the map center and this instantly sent me to wood 3.

A flood fill algorithm to train and move units pushed me to Bronze by Friday evening. At this point I went on working on my bot even though I think that a good floodfill algorithm might reach silver league without towers/level 2-3 units.

My next “improvement” was to build level 2-3 units to destroy enemy units whenever they came close to my territory. This didn’t work too well, as it depleted my gold and reduced my income so drastically that I wasn’t able to do anything as long as the costly units were standing. This made me realize the (over)power of towers (cheap and no upkeep). So I decided to put towers on all cells at a distance of 2 from my enemy border. With this, my bot was a solid top 50 and reached Silver league.

After that my bot slowly dropped in the ranking. This is when I noticed that I was losing a few games I was totally dominating in terms of territory. This was due to what I call the “finisher” move (and the fact that my bot didn’t know how to handle areas protected by towers). Some bots were saving gold to spend it all in one turn and finish the game without caring about upkeep. While stalling the game isn’t always good, being able to know how much gold is needed to win in one turn is quite interesting.

I used a Dijkstra from my HQ taking into account the cost to access each cell: 0 for my cells, 10 for an empty cell, 20 for a cell with an enemy lvl1 unit, 30 for a cell with an enemy lvl2/3 unit or protected by a tower. I had a protection dict indicating, for each cell, how many towers were protecting it. During the Dijkstra, each time a visited cell was a tower, I updated the protection dict associated with the path so that the cost of the next moves takes the tower destruction into account. The path computation takes an underwhelming 0.8ms; I wonder if my algorithm is particularly bad or if it’s just Python being very slow. Anyway, thanks to the finisher move, I was now winning most of the games where stalling was involved.
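A simplified sketch of such a finisher check, using the cell costs above (the tower-destruction bookkeeping via the protection dict is omitted, and all names are hypothetical):

```python
import heapq

def finisher_cost(my_hq, opp_hq, cell_cost, neighbors):
    """Dijkstra from my HQ to the opponent HQ, summing per-cell costs:
    0 for my own cells, 10 empty, 20 enemy lvl1, 30 enemy lvl2/3 or
    tower-protected. Returns total gold needed, or None if unreachable."""
    dist = {my_hq: cell_cost(my_hq)}
    heap = [(dist[my_hq], my_hq)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == opp_hq:
            return d  # cheapest one-turn kill found
        if d > dist.get(cell, float("inf")):
            continue  # stale heap entry
        for nxt in neighbors(cell):
            nd = d + cell_cost(nxt)
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None
```

Comparing the returned cost against current gold (plus income, for a next-turn check) tells the bot whether to fire the finisher or keep saving.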

When the Gold league opened, my game phases were as follows, and they never changed afterwards:
_Move units (floodfill, try to avoid blocking other units)
_Play Finisher (if we can finish the game, finish it)
_Check if game can be finished next turn (based on opp hq distance to mine, my gold, and my income). If it can, skip the next phases (save money!)
_Play cuts (my initial basic implementation was just to train lvl3 units on top of opponent lvl3 units)
_Build towers
_Train units
_Build mines (build mines closest to my hq with the gold left if my income is less than the opponent income)

From Wednesday to Friday my ranking dropped to between 70 and 110. Three points were really not going well:

1) Cuts — many opponents were cutting my territory but I wasn’t cutting theirs.
2) Flood fill — my opponents were consistently spreading further than me.
3) Tower building — building towers on all cells within distance 2 of the enemy border wasted far too much money.

I wasn’t very motivated, because I had no clue how to improve points 2 and 3, and for point 1, given how long my Dijkstra took, I wasn’t sure I could reach enough depth to find meaningful cuts.

I tried to study the top players’ games, but it was nearly impossible to understand why they made a specific action (the moves computed by simulations are not always deterministic, and even when they are, it’s not easy to see that a move is good because it will pay off three turns later). This is when the Legend league opened and the Broken Boss saved me. Its moves were quite easy to predict (or at least they gave me good information on what type of gameplay I should use):
• Move/train: always go to the tile nearest the opponent HQ; when several tiles are at equal distance, alternate priority between right and down (or left and up when playing blue). This guarantees a fast expansion to the center while spreading your territory at the same time.
• Tower building: only build towers to protect border cells that are in danger. A cell is in danger if it’s unprotected and either empty or reachable by a free move of an adjacent opponent unit. If the opponent’s gold + income >= 30, all cells on the border are in danger.

With this I thought that if I managed to cut, and strengthened my weaknesses with the boss strategy, I could climb a few ranks.
I started with the move/train expansion improvement, which was relatively easy to implement; that alone gained me 20-30 ranks.
Then I moved on to the cuts, which gave me a hard time. I don’t know any algorithm to efficiently cut graphs, so I brute-forced all possible paths of depth 4 starting from enemy cells adjacent to my territory (this pruned a lot of interesting moves starting from neutral cells, but it was all I could afford without timing out). For each chain I computed the enemy’s score and mine (based on constants: number of tiles owned, number of units, income), how much the cut cost me, and the upkeep needed for the chain. From a mix of all this I tried to decide whether the cut was worth it. The advantage of scoring the territory of both players is that the cut algorithm also naturally handles reconnecting big blocks of inactive cells. I tried to tune the constants a bit, but due to the weird mixing of information (income, gold, number of units, unit levels), my cuts are still sometimes over-aggressive or very defensive (particularly afraid of using a level 3, or cutting an opponent level 3, since it’s a 20-gold income shift). Still, it got me into the top 5 of the Gold league.
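The path enumeration itself is a plain depth-limited DFS. A minimal sketch, where `neighbors` and `is_cuttable` are hypothetical callbacks standing in for the real board logic:

```python
def enumerate_chains(border_cells, neighbors, is_cuttable, depth=4):
    """Enumerate every simple path of length <= depth starting from enemy
    cells adjacent to my territory (brute-force cut candidates)."""
    chains = []
    def dfs(path):
        chains.append(tuple(path))
        if len(path) == depth:
            return
        for n in neighbors(path[-1]):
            if n not in path and is_cuttable(n):
                dfs(path + [n])
    for cell in border_cells:
        dfs([cell])
    return chains
```

Each resulting chain would then be scored with the territory/upkeep mix described above.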

My last change was tower management. So far I had been filling the map with towers once I reached enemy territory; I switched to the Broken Boss strategy. It helped me a lot during the expansion phase and was a game changer: I now built only the towers necessary to protect the front line and used the rest of the money to expand my territory in the early game. That, plus a few bug fixes, got me to Legend!

Many thanks to Azkellas and Tehelka, it was a really entertaining contest! I found it particularly pleasant that even without heavy computation we could make a relevant bot. I would like to learn C++ for the next contest, though, as it’s a little frustrating to hold back from some possibilities because the language is too slow (and I’d like to check whether my code is slow because of the language or because of the one who uses it :p). It would be interesting to know how the other Python programmers handled the cuts and path computation/scoring; for me that was definitely the bottleneck of my computation time. It would also be nice to have the story of the game’s conception, how it evolved, and how Azkellas and Tehelka came up with such a strong Broken Boss!


Finished 27th, first in C, after luckily being pushed to legend by some kind stranger the night before the end of the contest.

So, tl;dr: I built a system to gather replays from the CodinGame API and trained a NN to replicate the actions of the top players on the leaderboard. It was a very fun experience.

If you want some (undocumented) Python sources to gather replays, you can find some scripts here.

## General statistics about the contest

Since I have replays locally available, I thought I could share some stats about the game.

I gathered 71k replays from players ranked 1 to 200 after the contest ended. On this player base, the winrate for the red player is 47.97%. The average number of turns per game is 17.5, with a standard deviation of 8.5 turns.

#### “Metagame” as a function of players rank

These charts have the rank of players as the x axis, meaning the “strongest” bots are on the left and the “weakest” on the right.

Fewer level-1 units and more level-3 units at the top, which I suppose is linked to the increasing number of towers at the top of the leaderboard.

On average, 1 mine built every 5 games.

Lots of bugs

#### Top 1-25 vs Top 26-200 : Turn by turn evolution of a game

Apologies if these are a bit hard to read. They compare the evolution of several metrics turn by turn.

There doesn’t seem to be a huge difference in average number of active tiles per player before the 7th turn. The top bots are probably better at choosing in which direction to expand however.

Fewer units and commands per turn for high-ranking bots. Probably a side effect of using more towers and piling up gold for a lethal.

If you’re interested in other statistics about the game don’t hesitate to ask.

## The Bot

I had wanted to try making a bot that replicates other bots for quite some time, and since the contest looked well suited to convolutional NNs I thought I’d give it a shot (though in the end I didn’t have time for a CNN and simply used a few dense layers).

#### General idea

I wanted to keep the logic of the bot as simple as possible to focus on the learning side. In pseudo-code, here is what it does:

``````while number_of_available_actions > 0:
    a = find_action_with_best_fitness()
    apply_action(a)
``````

So no cutting, no chaining actions, no simulation of the opponent turn, etc. All is done through an eval function, which evaluates the fitness of a (state, action) pair.
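A runnable version of that loop might look like the following, where `score`, `legal_actions`, and `apply_action` are stand-ins for the real eval function and game logic:

```python
def play_turn(state, score, legal_actions, apply_action):
    """Greedily apply the best-scoring action until none remain."""
    commands = []
    actions = legal_actions(state)
    while actions:
        best = max(actions, key=lambda a: score(state, a))
        state = apply_action(state, best)
        commands.append(best)
        actions = legal_actions(state)
    return commands
```

Because each applied action changes the state, the remaining actions are re-scored on every iteration.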

#### Gathering replays

I used a Python script (available here) to scroll over CodinGame’s API, load replays of the top 5 of the current leaderboard, keep the important parts of the frames, and store them locally as JSON. A cron job called this script every hour.

#### Building a dataset

It’s easiest to train the model in Python, but ideally you want a fast language for your bot once the model is trained, so I implemented a referee in C, and used bindings with ctypes to feed the replays to the referee and gather “state vectors” which act as inputs of the dataset.

##### Output

(state, action) pairs seen in winning bots were given a label of +1.0.

By choosing a random action among the available ones, I could get (state, action) pairs the winning bots chose not to take. Those were given a label of -1.0.

Data from the losing bots was discarded.
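The labelling scheme above can be sketched as follows; `label_pairs` and `legal_actions` are hypothetical names for illustration:

```python
import random

def label_pairs(winner_moves, legal_actions):
    """Winners' chosen actions get +1.0; one random non-chosen legal action
    gets -1.0. Loser data is simply not fed in."""
    dataset = []
    for state, chosen in winner_moves:
        dataset.append((state, chosen, +1.0))
        rejected = [a for a in legal_actions(state) if a != chosen]
        if rejected:
            dataset.append((state, random.choice(rejected), -1.0))
    return dataset
```

This gives the regression target a contrast between actions the strong bots took and actions they passed over in the same state.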

##### Input

Because of the 100k characters limit, I had to keep the number of inputs of my model relatively low (I couldn’t fit a lot more than 300 neurons in the input layer).

I focused on the state near the cell where the potential action takes place, with many inputs for cells at distance 1 from the center, fewer for cells at distance 2, and the bare minimum for cells at distance 3 (using Manhattan distance).

In the end, here is what the input feature vector looked like:

• 99 inputs not impacted by the action
  • position of walls
  • ownership of tiles around the action
• 82 inputs impacted by the action, before applying the action
  • gold of each player
  • active tiles around the action
  • cost to reach the tile
  • presence of units
• 82 inputs impacted by the action, after applying the action
  • gold of each player
  • active tiles around the action
  • cost to reach the tile
  • presence of units

Having the state before and after the action is what allows the NN to determine whether this is a good action or not.

##### Topology

I didn’t try anything fancy. If you have suggestions, I’d love to hear them. Here is the model summary:

``````_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 263)               0
_________________________________________________________________
dense (Dense)                (None, 30)                7920
_________________________________________________________________
dense_1 (Dense)              (None, 20)                620
_________________________________________________________________
dense_2 (Dense)              (None, 20)                420
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 21
=================================================================
Total params: 8,981
Trainable params: 8,981
Non-trainable params: 0
_________________________________________________________________
``````

Activations are ReLU, except the last one, which is linear. The loss is MSE; the optimizer is Keras’s default Adam.
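As a sanity check on the summary above, the parameter counts can be reproduced by hand: a dense layer with n inputs and m units has n*m weights plus m biases.

```python
# (inputs, units) per dense layer, matching the model summary
layers = [(263, 30), (30, 20), (20, 20), (20, 1)]
params = [n * m + m for n, m in layers]
total = sum(params)
print(params, total)  # [7920, 620, 420, 21] 8981
```

The 263 inputs are the 99 + 82 + 82 features described earlier.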

I thought the game was great: easy to play, hard to master, simple rules and mechanics but huge depth.

As several people mentioned, it’s probably best to not change the rules for the multiplayer.

However, if rules are changed, I’d like to see a metagame more oriented towards efficient trading rather than piling up gold & towers to get a lethal first.

This would probably mean adding an upkeep to towers, severely reducing the cost of mines, and maybe making level-3 units more expensive to train but cheaper to maintain.

Anyway, big thanks to @Azkellas & @Tehelka!

I fought, I lost, now I rest


Could this have been a mistake?

• You were taking replays from the top 5 only, so you can assume that even when a player lost, they played well in those defensive situations.
• Winning bots will have been put in defensive situations less often. By discarding data from losing bots, you discard valuable data on how to handle those situations.

Congrats on making a NN sort of work here


I’m not sure; I didn’t try adding the losing side’s data.

Intuitively, I think that if you train on winner data only, you can end up with a model that is better than all the bots used to train it (because you kept the good parts and disregarded the less good ones).

Alternatively, if you train on winner + loser data, the best you can hope for is an average of the bots used to train your model.

However, I agree that you get a bias when disregarding the loser data. You’re likely to end up, for instance, with an over-aggressive bot.

The choice of input features can mitigate this bias: when I tried adding the coordinates of the action as an input to the NN, it improved the RMSE during training, but the bot tended to rush the opponent, which lowered its winrate.
