A Code of Ice & Fire - Feedbacks and Strategies

My first contest on CG, finishing a decent 190th in Gold after submitting code in the last hour.
As my first coding competition, it really inspired productivity: I wrote over 1500 lines of Javascript and tested the code countless times in the game IDE.

After getting into Silver on the 2nd-3rd day, I worked on rewriting the game simulation environment with proper objects and methods, and wrote a bunch of feature extraction code until the 8th day. At that point I started writing game logic based on heuristics, my own observations and comments from the forum and chat,
mainly ending up with a common solution:

  • Quickly expand the base towards the enemy HQ in the initial phase
  • Find ideal cuts via a feature map based on how far the connected tiles are from the enemy HQ and how many tiles they connect to
  • Retake tiles if enemy cuts into my region
  • Finish off the opponent by a chain train cut when possible.

I could not finish tweaking the heuristic game logic, as there is a whole set of ideal actions that can be described, and the tower-building part of my code did not work yet. Sometimes the code broke at random, and on replaying the game the malfunction didn't reappear; I assume it was a Javascript timeout.

My game logic was based on feature maps like the following (a sketch of one such map comes after the list):

  • Map regions
  • Distances from borders with different perspectives
  • Number of owned tiles connected to the HQ through owned tiles
  • Cost to conquer a tile with chain train on the cheapest path
  • Distance from walls
  • Distance from HQ
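
For illustration, a distance map of this kind boils down to a BFS over the grid. A minimal sketch in Python (my bot was in Javascript; the grid encoding and names here are made up):

from collections import deque

def distance_map(width, height, start, walls):
    # BFS from `start`; `walls` is a set of impassable (x, y) cells.
    # Unreachable cells keep the value None.
    dist = [[None] * width for _ in range(height)]
    dist[start[1]][start[0]] = 0
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < width and 0 <= ny < height
                    and (nx, ny) not in walls and dist[ny][nx] is None):
                dist[ny][nx] = dist[y][x] + 1
                queue.append((nx, ny))
    return dist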

The ideal aim for the heuristic logic was:

  • Decide the game phase by a set of metrics (exploration, first touch, …)
  • Control unit movements based on their level and the game phase
    (lvl1 is for exploration as the main source of income/map control, and for holding out vs other lvl1s, for example blocking the takeover of a tile and increasing opponent training costs. Lvl2 and lvl3 roles are mostly aggressive.)
  • Most of the winning or losing is decided by the placement of new units, which gives most of the edge in controlling the map.
  • Strategic tower placement to increase the cost of taking over regions, and as the main source of defense when the fight starts heating up
  • Mines seem worth building only in a game of attrition, as summoning a lvl1 outperforms building a mine in terms of utility and income while there are many tiles left to conquer.

My takeaway from this contest is that I spent too much of the allotted time thinking about and working out different feature detections, which in the end I barely used at all.

All in all I have enjoyed the contest. Thanks

6 Likes

Introduction

Rank 1! It’s been two years (already?) since I last won an official contest here, so I am obviously really happy. It was a tough competition, kudos to the other challengers for a fierce battle, you did great!

Game design

I have been on the fence about whether I liked the design or not for most of the competition. The foundations are solid and gave me a good vibe at the start, but the rise and complete domination of the chaining mechanic seemed to overshadow the entire game. In the end though, I think I expected a very different game to develop and was disappointed when it turned out to be something else. After playing more with it, some more subtleties started to emerge and made me appreciate its rich depth.

Nonetheless, I still slightly dislike how brittle and volatile the game state is, making it prone to large swings of momentum that, when executed properly, are very hard to recover from. Figuring out a move is almost like walking on eggshells, navigating a minefield of shallow traps. The meta heavily favors playing as defensively and greedily as possible to finish with one single game-ending move. Not the most exciting kind of game to play or watch, but still.

Yet it is also a coding competition after all, so game design is not everything. And on the coding part, it hit very high marks. I am pleased to see the variety of ways the problem was tackled, namely on dealing with chains and choosing moves, which gave way to a lot of creativity there, if less so in actual game strategy. The branching factor was huge, which allowed both search and search-less bots to thrive, and optimization was not the absolute end-all be-all. It is also impressive to see that 3 of the top 4 bots were coded in garbage-collected languages!

So overall, this contest was really well done. I could comment more on design aspects I wish the game handled differently, but it's pointless. I would even recommend against making any changes when the multi comes out, as the past has shown the level of activity drops too sharply post-contest. Those changes would hurt more than help, since untouched contest bots, which still form the majority of a multi leaderboard, would be less viable out of the box.

Simulation

The statement screamed bitmaps (or bitfields) at me on the first read. Small grid, lots of floodfill, neighborhood and surface checks… So the first thing I coded was a Bitmap class to handle all those operations, and my game state therefore contains one bitmap for each feature: walkable, mineable, active, owner (per player), units (per level), towers and mines. Combining them gave all the information needed for very fast simulation of game actions, which I did next.

The rest was pretty much a run-of-the-mill implementation, plus trying to minimize recomputation of the activation map, which dominated the runtime.
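
For illustration (not the actual implementation), the small grid means the whole map fits in a single integer used as a bitset, which turns neighborhood checks and flood fills into a handful of shifts and masks; a Python sketch assuming a 12x12 grid:

W, H = 12, 12
FULL = (1 << (W * H)) - 1
# Masks preventing horizontal shifts from wrapping across row boundaries.
NOT_LEFT  = ~sum(1 << (y * W) for y in range(H)) & FULL
NOT_RIGHT = ~sum(1 << (y * W + W - 1) for y in range(H)) & FULL

def neighbors(bits):
    # One-cell dilation in the four cardinal directions.
    return (((bits & NOT_RIGHT) << 1) | ((bits & NOT_LEFT) >> 1)
            | (bits << W) | (bits >> W)) & FULL

def flood_fill(seed, open_cells):
    # Grow `seed` through `open_cells` until a fixed point is reached.
    region = seed & open_cells
    while True:
        grown = (region | neighbors(region)) & open_cells
        if grown == region:
            return region
        region = grown

The activation map then reduces to one flood fill seeded at the HQ bit and constrained to the owner bitmap.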

Plan search

From now on, what I will refer to as a plan contains all the actions (move, train, etc.) consecutively taken during a single turn.

To start, actions must be generated from the current state. This list comprises:

  • The best X chains found (more on this later)
  • All valid unit moves
  • Training level 1 units on all valid cells
  • Building a tower on all valid cells
  • Same as above, but on cells currently occupied by units, creating a number of possible move chains
  • Building a mine on… just kidding. Mines are useless.

I purposefully exclude training level 2 and 3 units that are not part of a chain. I believe there is no situation in which it is a good action, because their aggressive upkeep costs cripple the most precious game resource: gold income. A decisive chain, however, does not have this issue, as the game should be largely won by that point.

A simple hill climbing algorithm then picks the action from the list resulting in the best evaluation score. A new state is computed as a result of that action, new actions are generated from it and the search continues, until no action further improving the score can be found.
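
In pseudo-Python, the loop might look like this, with `generate_actions`, `apply` and `evaluate` as stand-ins for the real functions:

def plan_turn(state):
    # Greedy hill climbing: repeatedly take the action that most improves
    # the evaluation; stop when no action improves it anymore.
    plan = []
    best = evaluate(state)
    while True:
        scored = [(evaluate(apply(state, a)), i, a)
                  for i, a in enumerate(generate_actions(state))]
        if not scored:
            break
        score, _, action = max(scored)
        if score <= best:
            break
        state, best = apply(state, action), score
        plan.append(action)
    return plan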

Chain search

I tried different approaches with various degrees of pruning techniques and results (e.g. depth-first search, breadth-first search…). Ultimately I settled on a compromise in the form of a best-first Dijkstra search performed from every valid starting position, using gold spent as the cost function. Those positions were either units that had not moved yet, or cells on the front line that can be directly trained on. It would then search nearby cells that were not already owned, until it ran out of money or space to explore.

It is important to note the actions making the chain were not simulated for performance reasons. And for the same reason, a different and faster evaluation function was used on every step to keep the best chain found. It was primarily based on lethality, the number of new active cells resulting from the chain, its gold cost, and an additional penalty for using level 3 units.

A further improvement that came later on was to keep the X best chains found instead of just one, to make up for the weaknesses of the light evaluation function. The real best one can then be figured out with the normal, heavier evaluation function, along with the other actions evaluated in the root search.
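
A rough sketch of one such search from a single starting position, with hypothetical helpers (`light_eval`, `neighbors_of`, `train_cost`, `owned`) and none of the real pruning:

import heapq
from itertools import count

def chain_search(state, start, budget, keep=5):
    # Best-first expansion ordered by gold spent; chains are ranked with a
    # light evaluation and the `keep` best are returned for re-scoring.
    tie = count()                        # tiebreaker for the heap
    best = []                            # (light_score, chain) pairs
    frontier = [(0, next(tie), (start,))]
    while frontier:
        cost, _, chain = heapq.heappop(frontier)
        best.append((light_eval(state, chain, cost), chain))
        best = sorted(best, key=lambda b: b[0], reverse=True)[:keep]
        for cell in neighbors_of(chain[-1]):
            step = train_cost(state, cell)      # 10/20/30 by required level
            if (cell not in chain and not owned(state, cell)
                    and cost + step <= budget):
                heapq.heappush(frontier, (cost + step, next(tie), chain + (cell,)))
    return [chain for _, chain in best]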

Opponent response

It is just as important, perhaps even more so, to anticipate the opponent moves following yours, especially if yours would be a massive blunder. I originally tried to run a chain search for the opponent and execute it in the state given to the evaluation function, but it proved too expensive due to the massive number of calls, causing blind spots at crucial moments due to aborted searches.

What helped a lot more was to decouple computing the opponent response from the plan search, composition and evaluation step described above. So at first, a whole plan is formulated as if the opponent would not do anything, and scored as such.

Then, once a plan is completed, a chain search for the opponent is performed on the resulting state. The resulting opponent plan is added to a memory of opponent responses, and the entire plan search algorithm restarts from the beginning, with a key difference. On every evaluation, it will execute all of the stored opponent responses on the state being evaluated, score each of the resulting states and keep the lowest one, behaving similarly to a minimax search.

This process repeats until it runs out of time, or until the expected opponent plan already exists in the memory. In practice, this rarely exceeds 5 or 10 different responses, which keeps it fast enough. The best plan according to the opponent responses is then sent to the game.
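
Put together, the outer loop can be sketched like this, where `plan_search` is the hill-climbing planner from above extended to take the response memory (names are illustrative, not the actual implementation):

def search_with_responses(state, deadline):
    # Replan against a growing memory of opponent responses; the planner's
    # evaluation plays every stored response on each evaluated state and
    # keeps the worst resulting score (minimax-like).
    responses, plan = [], None
    while time_left(deadline):
        plan = plan_search(state, responses)
        reply = opponent_chain_search(apply_plan(state, plan))
        if reply in responses:
            break            # the expected reply is already accounted for
        responses.append(reply)
    return plan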

The result was a massive improvement in strength. It also allowed me later on to just run the exact same, full plan search for the opponent instead of a single chain search, once the planning got good enough.

Evaluation

This remained the weakest part of my bot for a long while, as I spent most of my efforts on the search described above, along with optimizing and lots of bug fixing. I was convinced I could just iron out this part later, as it was more important to have those other solid foundations first. I think it paid off tremendously, as a good evaluation on a broken search is so much worse than the other way around. It is also a lot easier to tweak towards the end of the competition, compared to doing major algorithm work.

The main features of my evaluation function:

  • Winning condition (duh)
  • Number of active cells
  • Gold and income
  • Front line computation, defined by cells that have unowned neighbors. Mostly used to favor early aggression.
    • Distance to front line, per unit.
    • Minimum distance to front line, globally.
    • Number of front line cells defended by a tower.
  • Number of capturable cells, per unit. Mostly used to favor early spread and territory grab.

I also eventually doubled the weight on gold and income past an arbitrary turn number. This resulted in very valuable behavior changes. It stopped overspending and overdefending in the mid game, instead building up income and gold reserves to prepare attacking. This also favored very aggressive chains that would cripple the opponent income.
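
In spirit, the function looks something like this (illustrative weights and helper names only, not the real numbers):

def evaluate(state, turn):
    # Sketch of the feature weighting; the actual features and weights differ.
    if game_won(state):
        return float("inf")
    gold_weight = 2.0 if turn > LATE_TURN else 1.0   # doubled past a threshold
    score  = 1.0 * active_cells(state)
    score += gold_weight * (state.gold + 4 * state.income)
    score -= 0.1 * sum(dist_to_front(state, u) for u in state.my_units)
    score -= 0.5 * min_dist_to_front(state)
    score += 0.5 * defended_front_cells(state)
    score += 0.2 * sum(capturable_cells(state, u) for u in state.my_units)
    return score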

Miscellaneous

Having a heavily optimized simulation and bot also paid off tremendously in the form of local testing. I was able to self-play a little under 1000 games per minute, which was a great way of aggressively pruning bad changes and spotting potentially good ones.

The viewer was a bit lackluster in usability this time around. The frames were sometimes confusing in the execution and change of player turn, and matching it up with the error output was even more of a pain than usual. The lack of a debug view also doesn’t help readability, especially since level 1 and level 2 units were hard to distinguish IMO. I ended up coding my own viewer to help with those problems (e.g. big circle with 1, 2, 3 inside).

Conclusion

Thanks to @Azkellas, @Tehelka and Codingame for the amazing hard work put in making this contest a reality. Thanks to everyone for participating and making this a fun community event!

35 Likes

Hi everybody,

I just discovered Codingame a month ago. I was really amazed by the platform and its community. This was my first contest; it was very interesting and I learned a lot in the process.

I managed to implement several functions, calling them in this order:

  1. search for a win, looking for a path from the opponent HQ that I could afford.
  2. search for a cut. Only one move deep; I didn't have time to use path search here.
  3. build towers
  4. move units
  5. train new units

I applied variations of these functions depending on three stages of the game, each with its own goal:

a. conquer more territory
b. build a defense with towers
c. explore the rest of the map

I wonder if, in addition to path searching, some graph algorithms could be used to find the best spots to cut off the enemy.

Thanks a lot to @Azkellas, @Tehelka and the other collaborators on this game. It was great work!
Also thanks to all the players who shared their insights and strategies.

5 Likes

Thanks to @Azkellas and @Tehelka; it was a great contest and I really liked it.

Since I had some other stuff to work on, my goal going into the contest was simply to reach Gold.
With the limited time I had, I decided not to write a simulator at all.

So I wrote a flood fill without taking action order into consideration,
and that flood-fill algorithm just teleported me from Wood 3 to Bronze…

Adding logic to detect when to use level 2 and level 3 units got me to Silver.

While I was in Silver, I figured it was time to implement the cutting logic,
so I spent some days doing other stuff while thinking about how to implement it without a simulator.
I ended up with a 24-cut search: 12 rows and 12 columns.
I check whether I can occupy an entire row or column, then calculate the enemy territory to the left and to the right (or above and below).
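
In sketch form (helper names are made up):

def best_line_cut(state):
    # Try all 12 rows and 12 columns as candidate cut lines (24 in total).
    # `can_occupy_line` returns the gold cost, or None if unaffordable;
    # `enemy_area_beyond` counts enemy tiles disconnected by the line.
    best = None
    for axis in ("row", "col"):
        for index in range(12):
            cost = can_occupy_line(state, axis, index)
            if cost is None:
                continue
            gain = enemy_area_beyond(state, axis, index)
            if best is None or gain - cost > best[0]:
                best = (gain - cost, axis, index)
    return best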

This worked dramatically and got me to Gold.
Rank 254 overall.

Very satisfied with the result and didn’t touch the code again.

4 Likes

10th in the final, my best result so far

i really liked this contest. chains were interesting from the beginning, and even though tower defense seemed too strong (no upkeep cost, survives cuts), the game still had enough other details/concepts to keep us busy (performance optimization was not crucial).
also, it was easy to improve incrementally (and fix bugs): without the need for heavy background calculations or multi-turn planning ahead, inefficiencies can be spotted easily, and without randomness they can be reproduced in the IDE/locally.

implementing a proper set of early heuristics kept me in the top 3-10 for the first week. but as usual, those ran out of steam. at least this time i was able to implement some more sophisticated versions, and after fixing all the bugs in those i got back to the top 10.
competition was strong (e.g. top halite players participating). congratulations to the winners + early top: Euler and Siman.

early heuristics (order of appearance)
greedy move/train lvl1
killing blow
heuristic tower defense
train lvl2 + 2-step cut to kill opponent lvl2
hardcoded first turn movements

order of actions
-killchain
-moves
-opponent killchain
-cuts
-simple tower defense
-train
-mine (single one)

first-turn moves
-still greedy
-focus on going toward opponent base
-avoid situations where a unit cannot take an empty cell
-tried different greedy methods for maps with huge holes, without success
-should have added random/simulate-ahead method
killchain
-DFS from opponent base
-post-adjustment to compensate for tower kills

moves/train
-greedy with score calculation
-avoid stronger opponent units
-check cut value with DFS from opponent base
-no need to stay on tower defended cells
-avoid being easily cut (don't leave an undefended cell next to an opponent unit)
-avoid many “backfiller” units
-prefer move toward opponent base/unit

cuts
-BFS from edge points
-only through not-actively-owned cells
-check cut value with DFS from opponent base
-only check cut value if it really cuts (check on nearby cells)
-step limit to avoid timeouts (9 down to 7, somewhat dynamic)

cuts order
-run a few rounds for myself and for the opponent
-with various gain-rate and minimum-gain parameters

7 Likes

Legend 6th.

Really good contest. I like this game, and I also really liked the visual design of this contest; it is so cool.

Simulation

I used simulated annealing. I simulated only my own moves; there was no simulation of opponent moves.

I divided my turn into two phases: MOVEs, then TRAINs and BUILDs. First, I simulate the MOVEs in 10ms, then the TRAINs and BUILDs in the remaining time. Both phases use the same evaluation function.

In the MOVE simulation, I annealed the movement order and the move directions of the units.

In the TRAIN and BUILD simulation, I annealed the unit training positions and levels, and the tower construction positions. I did not use mines.

I did not optimize the annealing method much. I spent a lot of time improving the evaluation function.

Evaluation

In the beginning of the game, the evaluation treats securing a lot of area as the most important task, with distance to the opponent HQ as a tiebreaker. In the later half, saving gold becomes more important than securing area, because I wanted to build up to one powerful attack that reaches the opponent HQ.

Also, I calculated the maximum range in which the opponent can train units. This can be computed with a Dijkstra search from the opponent HQ (the moving cost is zero inside the opponent's area; otherwise it is the minimum cost of training a unit on that cell). My own cells within this range are considered to be occupied by the opponent on the next turn, so they decrease the evaluation value. As a result, my bot builds towers in a way that keeps the range the opponent can attack small.
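
A sketch of that search in Python (helper names are hypothetical):

import heapq

def opponent_reach(state):
    # Dijkstra from the opponent HQ: moving inside the opponent's area is
    # free; entering any other cell costs the cheapest unit able to take it.
    dist = {state.opp_hq: 0}
    frontier = [(0, state.opp_hq)]
    while frontier:
        cost, cell = heapq.heappop(frontier)
        if cost > dist.get(cell, float("inf")):
            continue
        for nxt in neighbors_of(cell):
            step = 0 if is_opponent(state, nxt) else min_train_cost(state, nxt)
            if cost + step < dist.get(nxt, float("inf")):
                dist[nxt] = cost + step
                heapq.heappush(frontier, (cost + step, nxt))
    # Cells reachable within the opponent's gold are treated as lost next turn.
    return {c for c, d in dist.items() if d <= state.opp_gold}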

9 Likes

Thank you for the explanation euler, very clear!

@reCurse - thanks for the write-up. Your chain search, and particularly how you incorporate opponent responses, is very clever. One question - how did you build defense into it? It seems expensive to put all possible tower locations into a “plan”; do you do something about using locations which block your opponent’s responses?

Very interesting contest! Thanks @Azkellas and @Tehelka for this game.

I finished 55th (11th Gold) and I'm waiting for the multi version to improve my bot. Some technical aspects of my code need to be rewritten:

  • My BFS is approximate, with one pass to compute initial paths to the enemy HQ, and a second pass to minimize the cost from neighbouring cells and propagate this minimum to child cells.
  • My game states are buggy: when I play moves on a copied state, the original state can get modified and sometimes crashes, so I made do with playing on one single state.

Here, I used heuristics. Moves, trains and builds are all evaluated with magic numbers assigned to each possible situation for a cell and its 8 surrounding cells (sometimes for the neighbours of those cells too), plus functions of the distances from my HQ and the opponent HQ. I didn't find a good balance for my criteria (missing a NN trained on the top 10 players :slight_smile: ), and my expansion looks like a growing circle.

Cuts and lethal moves are calculated with my BFS from the target cell to all my active cells, or to potentially active cells if one of my units can move onto them.

With BFS and heuristics, my bot fills the action queue in this order:

  1. Find a lethal move
  2. Find a cut: captured cells / cost >= 2.0
  3. Move all units who can go to neutral or opponent cells
  4. Move all units that can only go to my own cells. I do several passes to avoid my units blocking each other.
  5. Build towers (maximum 2 towers)
  6. Train level 3 units on top of opponent level 3 units near my zone
  7. Train units
  8. Build mines

Then play all actions.

This contest was very fun and I think I will spend a lot of time on it when the multi is online.

Thanks to CG and all the participants.

2 Likes

Defense is an implicit consequence of scoring well on the evaluation function. So if there is a better score by placing a tower, the result will be to “play defensively”. But it could also score even better by attacking instead (or doing both). So those concepts are an emergent result of scoring, and not explicitly implemented.

Thanks to the decoupling, it is relatively inexpensive to test a lot of actions, such as all possible tower locations, because I am using cached opponent responses during evaluation instead of recomputing them on the fly. Their response is also truncated on the first invalid action, so blocking those responses is rewarded by minimizing the score loss. That block could come from building a tower, moving a unit, attacking, etc…

1 Like

Ah, I see. Do you plan one move at a time (where “build a chain” is a move), and then evaluate? It seems like you’d get to maybe 100 possible actions, which is fine, but an infeasible number of possible plans.

Yes, planning is done one move at a time, and the move yielding the best score is chosen. Because of greedy hill climbing, the numbers of actions to evaluate add up instead of multiplying during plan composition, so the number of possible plans is actually fairly low.

1 Like

k4ng0u - 25th in Legend (Python 3).

This was my second contest, the first one being CalM. I really enjoyed the fact that the rules were simple and thanks to this it was easy to understand the different strategies and their flaws.

The starter kit was really (too?) well done, as it split the phases of a turn (read inputs and update game state, train, move, output result) and had basic implementations for each phase. Even in my final code I more or less kept the same phases.

To get started, I made my units move towards the opponent HQ instead of the map center, and this instantly sent me to Wood 3.

A flood fill algorithm to train and move units pushed me to Bronze by Friday evening. At this point I went on working on my bot even though I think that a good floodfill algorithm might reach silver league without towers/level 2-3 units.

My next “improvement” was to build level 2-3 units to destroy enemy units whenever they came close to my territory. This didn't work too well, as it depleted my gold and reduced my income so drastically that I wasn't able to do anything as long as the costly units were standing. This made me realize the (over)power of towers (cheap and no upkeep). So I decided to put towers on all cells at a distance of 2 from my enemy border. With this, my bot was a solid top 50 and reached the Silver league.

After that my bot slowly dropped in the ranking. This is when I noticed that I was losing a few games I was totally dominating in terms of territory. This was due to what I call the “finisher” move (and the fact that my bot didn't know how to handle areas protected by towers). Some bots were saving gold to spend it all in one turn and finish the game without caring about upkeep. While stalling the game isn't always good, being able to know how much gold is needed to win in one turn is quite interesting.

I used a Dijkstra from my HQ taking into account the cost to access each cell: 0 for my cells, 10 for an empty cell, 20 for a cell with an enemy level 1 unit, 30 for a cell with an enemy level 2/3 unit or protected by a tower. I had a protection dict indicating, for each cell, how many towers were protecting it. During the Dijkstra, each time a cell was visited, if it was a tower, I updated the protection dict associated with the path so that the cost of the next moves takes the tower destruction into account. The path computation takes an underwhelming 0.8ms; I wonder if my algorithm is particularly bad or if it's just Python being very slow. Anyway, thanks to the finisher move, I was now winning most of the games where stalling was involved.
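
A much simplified sketch of that finisher search, where each path carries its own copy of the protection counts (helper names are made up; this is not my actual code):

import heapq
from itertools import count

def finisher_cost(state):
    # Dijkstra from my HQ to the opponent HQ. Costs: 0 for my cells, 10 for
    # empty, 20 for an enemy lvl1, 30 for lvl2/3 or tower-protected cells.
    # When the path crosses a tower, the protection counts are updated so
    # later steps no longer pay for that tower.
    tie = count()
    frontier = [(0, next(tie), state.my_hq, dict(state.protection))]
    seen = {}
    while frontier:
        cost, _, cell, prot = heapq.heappop(frontier)
        if cell == state.opp_hq:
            return cost
        if seen.get(cell, float("inf")) <= cost:
            continue
        seen[cell] = cost
        if has_enemy_tower(state, cell):
            prot = protection_without(prot, cell)  # tower dies on this path
        for nxt in neighbors_of(cell):
            step = cell_cost(state, nxt, prot)
            heapq.heappush(frontier, (cost + step, next(tie), nxt, prot))
    return None                                    # opponent HQ unreachable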

On gold league opening, my game phases were as follows and it never changed afterwards:
_Move units (floodfill, try to avoid blocking other units)
_Play Finisher (if we can finish the game, finish it)
_Check if the game can be finished next turn (based on the opp HQ distance to mine, my gold, and my income). If it can, skip the next phases (save money!)
_Play cuts (my initial basic implementation was just to train lvl3 units on top of opponent lvl3 units)
_Build towers
_Train units
_Build mines (build mines closest to my hq with the gold left if my income is less than the opponent income)

From Wednesday to Friday my ranking dropped to between 70 and 110. I had 3 points that were really not going well:
1) cuts => many opponents were cutting my territory but I didn't cut theirs.
2) floodfill => my opponents were consistently spreading further than me.
3) tower building => building towers on cells at distance 2 from the enemy border was wasting way too much money.

I wasn't very motivated because I had no clue how to improve points 2 and 3, and for point 1, given the computation time my Dijkstra took, I wasn't sure I could get enough depth to find meaningful cuts.

I tried to study the top players' games, but it was nearly impossible to understand why they did a specific action (the moves computed by simulations are not always deterministic, and even if they are, it's not always easy to see that a move is good because it will pay off 3 turns later). This is when the Legend league opened and the Broken Boss saved me. His moves were quite easy to predict (or at least they gave me good information on what type of gameplay I should use):
_move/train: always go to the tile that is nearest to the opponent HQ; in case of multiple tiles at equal distance, alternate priority between right and down (or left and up when playing blue). This guarantees a fast expansion to the center while spreading your territory at the same time.
_tower building: only build towers to protect cells that are in danger on the border. A cell is in danger if it's not protected and it's empty, or if an adjacent opponent unit can freely move onto it. If opponent gold + income >= 30, all cells on the border are in danger.

With this I thought that maybe, if I managed to cut while strengthening my weaknesses with the boss strategy, I could climb up a few ranks.
I started with the move/train expansion improvement that was relatively easy to implement. It was already 20-30 ranks gained.
Then I moved on to the cuts, which gave me a hard time. I don't know any algorithm to efficiently cut graphs. So what I did was brute-force all possible paths starting from enemy cells adjacent to my territory, with a depth of 4 (this pruned a lot of interesting moves starting from neutral cells, but it was all I could afford without timing out). Then I would compute the enemy score and my score (based on constants, number of tiles owned, number of units, income), how much the cut cost me, and the upkeep needed for the chain. With a mix of all this I tried to determine whether a cut was worth it. The advantage of computing a score based on the territory of both the opponent and me is that the cut algorithm also naturally handles reconnection to big blocks of inactive cells. I tried to adjust the constants a bit, but due to the weird mixing of information (income, gold, number of units, unit levels), my cuts are still sometimes overly aggressive or very defensive (particularly afraid of using a lvl3 or cutting an opponent lvl3, since it's a 20-gold income shift). But it still made me rise into the top 5 of the Gold league.

My last change was tower management. So far I had been filling the map with towers once I reached enemy territory. I changed this to the Broken Boss strategy, which helped me a lot during the expansion phase and was a game changer: I was now only building the towers necessary to protect the front line and using the rest of the money to expand my territory in the early game. That and a few bug fixes got me to Legend!

Many thanks to Azkellas and Tehelka, it was really an entertaining contest! I found it particularly pleasant that even without heavy computations we could make a relevant bot. I will try to learn C++ for the next contest though, as it's a little bit frustrating to hold back from some possibilities because the language is too slow (and I'd like to check whether my code is slow because of the language or because of the one using it :p). It would be quite interesting to know how the other Python programmers handled the cuts and the path computation/scoring; for me it was definitely the bottleneck of my computation time. It would also be nice to have the story of the game's conception, how it evolved, and how Azkellas and Tehelka came up with such a strong Broken Boss!

8 Likes

Finished 27th, first in C, after luckily being pushed to legend by some kind stranger the night before the end of the contest.

So, tl;dr: I built a system to gather replays from the CodinGame API, and trained a NN to replicate the actions of the top players on the leaderboard. It was a very fun experience.

If you want some (undocumented) Python sources to gather replays, you can find some scripts here.

General statistics about the contest

Since I have replays locally available, I thought I could share some stats about the game.

I gathered 71k replays from players ranked 1 to 200 after the contest ended. On this player base, the winrate for the red player is 47.97%. The average number of turns per game is 17.5, with a standard deviation of 8.5 turns.

“Metagame” as a function of players rank

These charts have player rank on the x axis, meaning that on the left are the “strongest” bots and on the right the “weakest”.




Fewer lvl1 units and more lvl3 units at the top, which can be linked to the increasing number of towers at the top of the leaderboard, I suppose.

On average, 1 mine is built every 5 games.

Lots of bugs :stuck_out_tongue:

Top 1-25 vs Top 26-200 : Turn by turn evolution of a game

Apologies if these are a bit hard to read. They compare the evolution of several metrics turn by turn.


There doesn't seem to be a huge difference in the average number of active tiles per player before the 7th turn. The top bots are probably better at choosing in which direction to expand, however.

Fewer units and commands per turn for high-ranking bots. Probably a side effect of using more towers and piling up gold for a lethal.

If you’re interested in other statistics about the game don’t hesitate to ask.

The Bot

I had wanted to make a bot that replicates other bots for quite some time, and since the contest looked well suited to convolutional NNs I thought I'd give it a shot (even though in the end I didn't have time for a CNN and simply used a few dense layers).

General idea

I wanted to keep the logic of the bot as simple as possible to focus on the learning side. In pseudo-code, here is what it does:

while number_of_available_actions > 0:
    a = find_action_with_best_fitness()
    apply_action(a)

So no cutting, no chaining of actions, no simulation of the opponent turn, etc. Everything is done through an eval function, which evaluates the fitness of a (state, action) pair.

Gathering replays

I used a Python script (available here) to scroll over CodinGame's API, load replays of the top 5 of the current leaderboard, keep the important parts of the frames and store them locally as JSON. A cron job called this script every hour.

Building a dataset

It's easiest to train the model in Python, but ideally you want a fast language for your bot once the model is trained, so I implemented a referee in C and used ctypes bindings to feed the replays to the referee and gather “state vectors”, which act as the inputs of the dataset.

Output

(state, action) pairs that were seen in winning bots were attributed a label of +1.0

By choosing random actions amongst those available, I could get (state, action) pairs the winning bots chose not to take. Those were attributed a label of -1.0

Data from the losing bots was discarded.
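
Sketched out, the labeling looks like this (`winner_turns` and `features` are hypothetical helpers over the replay data):

import random

def build_samples(replay):
    # One positive sample per action the winner actually took, one negative
    # sample drawn at random from the actions the winner chose not to take.
    samples = []
    for state, taken, legal in winner_turns(replay):
        samples.append((features(state, taken), +1.0))
        rejected = [a for a in legal if a != taken]
        if rejected:
            samples.append((features(state, random.choice(rejected)), -1.0))
    return samples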

Input

Because of the 100k character limit, I had to keep the number of inputs of my model relatively low (I couldn't fit much more than 300 neurons in the input layer).

I focused on the state near the cell in which the potential action was being made, with a lot of inputs for the cells at distance 1 from the center, a bit fewer for cells at distance 2, and the bare minimum for cells at distance 3 (using Manhattan distance).

In the end, here is what the input feature vector looked like:

  • 99 inputs not impacted by the action
    • position of walls
    • ownership of tiles around the action
  • 82 inputs impacted by the action before applying the action
    • gold of each player
    • active tiles around the action
    • cost to reach tile
    • presence of units
  • 82 inputs impacted by the action after applying the action
    • gold of each player
    • active tiles around the action
    • cost to reach tile
    • presence of units

Having the state before and after the action is what allows the NN to determine whether this is a good action or not.

Topology

I didn't try fancy stuff. If you have suggestions, I'd love to hear them. Here is the model summary:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 263)               0         
_________________________________________________________________
dense (Dense)                (None, 30)                7920      
_________________________________________________________________
dense_1 (Dense)              (None, 20)                620       
_________________________________________________________________
dense_2 (Dense)              (None, 20)                420       
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 21        
=================================================================
Total params: 8,981
Trainable params: 8,981
Non-trainable params: 0
_________________________________________________________________

Activations are ReLU, except the last one, which is linear. The loss is MSE. The optimizer is the default Adam from Keras.
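
For reference, a Keras definition consistent with that summary would look like this (a reconstruction, not necessarily the original code):

from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(263,)),                     # 99 + 82 + 82 features
    keras.layers.Dense(30, activation="relu"),
    keras.layers.Dense(20, activation="relu"),
    keras.layers.Dense(20, activation="relu"),
    keras.layers.Dense(1, activation="linear"),    # fitness of (state, action)
])
model.compile(optimizer="adam", loss="mse")        # default Adam, MSE loss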

Thoughts about the contest

I thought it was great. Easy to play, hard to master; simple rules & mechanics but huge depth.

As several people mentioned, it’s probably best to not change the rules for the multiplayer.

However, if rules are changed, I’d like to see a metagame more oriented towards efficient trading rather than piling up gold & towers to get a lethal first.

This would probably mean adding an upkeep cost to towers, severely reducing the cost of mines, and maybe making lvl3 units more expensive to train but cheaper to maintain.

Anyway, big thanks to @Azkellas & @Tehelka!

I fought, I lost, now I rest :sleeping:

20 Likes

Could this have been a mistake?

  • You were taking replays from the top 5 only: you can assume that even though the player lost, they played well in those defensive situations.
  • Winning bots will likely have been put in defensive situations less often. If you discard data from losing bots, you discard valuable data on how to handle those defensive situations.

Congrats on making a NN sort of work here :slight_smile:

1 Like

I'm not sure, I didn't try adding the losing side's data.

Intuitively, I think if you train on winner data only, you can end up with a trained model that is better than all the bots you used to train it (because you kept the good parts and disregarded the less good ones).

Alternatively, if you train on winner + loser data, the best you can hope for is an average of the bots used to train your model.

However, I agree, you get a bias when disregarding the loser data. You're likely to end up, for instance, with an overly aggressive bot.

The choice of input features can mitigate this bias: when I tried to add the coordinates of the action as a parameter of the NN, it improved the RMSE during training, but the bot had a tendency to rush towards the opponent, which lowered its winrate.

1 Like

First time doing a PM :open_mouth: so I’m going to copy MSmits’ format here :slight_smile:

I finished 3rd in the contest. Code of Ice and Fire is the first contest I spent the whole week on! Usually I get to Bronze in the first weekend, take around 5 days coming up with an algorithm, and implement it in the last 2 days (or give up on the contest if I couldn't think of one), but this contest had me addicted from the start!

Opening:

I considered the game to be in the “opening” phase if the closest pair of units between me and my enemy was at least 3 tiles apart.

I ran a beam search that first moved my current units, then trained as much as possible, and then simulated movement for 3 more turns (on existing units and the newly trained ones). My evaluation function consisted of a Voronoi partition to determine the territory difference, then the number of tiles I owned, with ties broken by my units' distance to the enemy HQ (precomputed using a BFS).

The depth of the beam search depended on the number of units in play, so I set an arbitrary max depth of 200.

Chain Finish

Chaining was by far the best part of my bot. I believe I was the first one to implement the chain train to the enemy HQ, on the first day of the contest.

To calculate the minimum cost to get to any tile, I set up a directed graph I could run Dijkstra's on. The weight of the edge between two tiles A -> B was calculated in the following manner:

  1. check if B is an inactive tile. Weight = 0
  2. check if B has an enemy tower. Weight = 30
  3. check if B is a neutral/inactive enemy tile. Weight = 10
    (now the tile MUST be an active enemy tile)
  4. check if B is protected by an enemy tower.
  • if B is protected by exactly 1 enemy tower and A is an enemy tower tile, calculate the cost as in rule 5
  • otherwise Weight = 30
  5. check the enemy unit level on this tile (0 if empty or a mine) and add 1 to it (capped at level 3). Weight = level * 10

This calculation was done between all adjacent tiles to set up the weighted graph.
To retrace the path, I stored the minimum cost to get to a tile as well as which tile it came from.
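
Expressed as code, the weight function reads roughly like this (helper names are made up; rule 1 is read as one of my own inactive tiles, since rule 3 already covers inactive enemy tiles):

def edge_weight(state, a, b):
    # Directed edge a -> b, mirroring the five rules above.
    if is_my_inactive(state, b):                    # rule 1
        return 0
    if has_enemy_tower(state, b):                   # rule 2
        return 30
    if is_neutral(state, b) or is_inactive_enemy(state, b):   # rule 3
        return 10
    # B is now an active enemy tile.
    if is_tower_protected(state, b) and not (       # rule 4
            protecting_towers(state, b) == 1 and has_enemy_tower(state, a)):
        return 30
    level = enemy_unit_level(state, b)              # rule 5: 0 if empty/mine
    return min(level + 1, 3) * 10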

Cutting

To compute cuts, I took the minimum cost path to each enemy tile, simulated placing units on the path, and ran a flood fill from the enemy HQ to calculate what the enemy loses from the cut. Along a cut I might also reactivate some tiles, and this was factored into the evaluation. I evaluated each possible cut based on the following formula:

value = (enemyTilesLost + tilesGained) * A + (enemyTowersDeactivated) * B + (enemyUnitsLost) * C - (goldCostOfPath) * D - (incomeCostOfPath) * E

Finally, I took the “cut” with the largest value repeatedly until the largest value was < 0.

This algorithm can miss cuts if there are multiple minimum cost paths, or if a more expensive path is actually better, but from my experience this cutting algorithm rarely (almost never) missed a worthy cut.

Evaluation Function

Nothing in my search factors in the enemy response, so I put it in the evaluation. Whenever I evaluated, I simulated ending my turn, ran the cut algorithm from the opponent's side, then greedily moved every enemy unit before ending the turn again (back to my turn), and then evaluated based on the following criteria:

  • difference in gold
  • difference in active tiles
  • difference in equity: units + towers + rubble harvesters(oops I meant mines)
  • difference in unit quality: determined by its adjacent cells and whether it was sitting on a tile protected by a tower

Moving units and placing towers on the border

I ended up going with beam search again, except evaluating differently compared to my early game expansion. Something I did that was quite clever (IMO) was allowing placement of a tower on a tile currently occupied by one of my units, forcing the unit to move away during the beam search if possible, or removing the state from the beam search if not. This allowed my bot to plan ahead and synergize building a tower with moving units away from the border, which led to the desirable behavior of having my border tiles defended only by towers while my units are free to expand and claim more tiles.

Flow of a turn

  1. Take all cuts evaluated to be positive
  2. Beam search unit movement and tower placement on the border
  3. Try all possible tower placements (except for tiles already adjacent to another tower)
  4. Evaluate
  5. Rerun, except subtracting 15 gold at the start and adding 15 gold before placing a tower (this was to prevent my bot from taking too many cuts and exposing itself to a big cut from the enemy)

Unfinished business
You might think a section is missing… and well, that’s because it is.
After the early game phase, my bot stops training units unless it's for cutting (obviously not optimal).
The strongest part of my bot was saving up gold and going for devastating cuts on the enemy while, at the same time, not exposing itself to any cuts. Every time I tried to add unit training, it made my bot perform a lot worse, and in the end I didn't have time to implement it.

12 Likes

Hi folks! I’m a first-time competitor on CodinGame, and ended at ~500 on the leaderboard, so I’ll understand if you just skip this write-up and look for input from people who solved more interesting problems :slight_smile: On the other hand, I also learned about an important consideration for timing your bot, so maybe some other newbies will find this useful. Search ahead for ‘ms’ to find that piece!

A big part of my motivation for joining CodinGame has been to learn Rust better, and this has been a great exercise for implementing pathfinding logic in Rust!

I developed my bot in a bit of a backwards way, which I'll describe now!

  1. Define data structures:
  • a 2D point with row/col
  • a Unit with owner, id, level, and location
  • a Player with gold, income, an optional HQ location, and a flag for me/not me
  • a map cell with a location, a cell type (an enum over impassable, neutral, mine (in/active), opponent (in/active)), and a flag to show if another unit is moving to the cell
  • a map containing a vector of vectors of MapCells
  2. Parse all the inputs into those data structures (buildings were ignored, except to fill in the optional HQ location).
  3. Find shortest paths to neutral territory, to the opponent's territory, and to my territory.
  4. Make a move for each of my units. Making a move meant moving towards the closest neutral square that isn't impassable or claimed, with ties broken by distance to the nearest opponent-owned square, then claiming and activating the cell.
  5. Figure out how to spawn units (this probably should've been earlier!). Initially, this just spawned units at squares prioritized by distance to neutral territory, with squares closest to an opponent square preferred in ties. It did this without accounting for neutral zones claimable through movement or spawning though, which wasn't ideal.
  6. Add another shortest path search after unit moves.
  7. Modify the moving logic to re-run the distance evaluation after unit moves.
  8. Modify the spawning logic to add a spawned cell's neighbors.
Steps 1-4 did nothing of course, but it meant that once I implemented basic spawning (step 5), my bot got to run straight out of Wood league. After that, I think steps 6-8 were all required to get out of Bronze league and into Silver.

I learned a good pointer: the turn timer starts running only after you read the first input line, which resolved some confusion I had. It looked like my turns were taking 100+ ms, yet I wasn't getting DQ'd; after I learned that tip about when the clock starts, I found that despite running the search 6 times, my turns still topped out at ~8 ms. I had some pretty big weaknesses at this point of course, but decided to focus on the codebase weaknesses rather than the AI weaknesses :wink: I'm pretty new to Rust and fell prey to excessive cloning, which is well described by someone else here: https://thenewwazoo.github.io/clone.html
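
In other words, anchor your timer right after the first read instead of at the top of the loop (illustrated in Python for brevity; the same idea applies in Rust):

import sys, time

first_line = sys.stdin.readline()      # the turn clock starts around this read
turn_start = time.perf_counter()       # so anchor your timer here
# ... read the remaining input lines and compute the turn ...
elapsed_ms = (time.perf_counter() - turn_start) * 1000.0
print(f"turn took {elapsed_ms:.1f} ms", file=sys.stderr)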

5 Likes

Hi @reCurse, congrats on another great performance on a contest and thanks for your post, really good read as usual.

I’m trying to understand your opponent handling. From what I could grasp, you run your plan search without considering opponent movement, then perform the chain search (or full plan search) for the opponent on the resulting state, and then rerun your plan search but now simulating the opponent chain or plan when reaching the same state.

I don't know if I understand this end condition. Does it mean that if you determine the best end state is a state where you already have a stored opponent response, then you stop the search? That would make sense, but then how would you have more than one response for the opponent on one state, as mentioned here?

Thanks !

2 Likes

No, opponent responses are stored regardless of the state they were found with. Those responses are then executed on every state being evaluated during a new search iteration. This can therefore result in invalid actions, which will truncate the response execution.

Imagine a big opponent chain is found, and in the next iteration you move a unit in its way. You block it, so the chain is interrupted by the invalid action, and the resulting score is much higher.

2 Likes