Ocean of Code - Feedback & Strategies

Thanks for the great game, it had all elements to be enjoyable even for 4 weeks!

I started a few days late, but quickly caught up with effective tracking + self-tracking combined with heavy use of silence. From Silver I was mostly top 3 by implementing many ideas first/early. I’m still a bit disappointed with the final run, where I slowly dropped from #1 to #4. But it is fair: in many games my win rates were not good enough vs. the actual top 3. Gratz to them!

Most of the key ideas were already covered in detail multiple times (tracking, floodfill, finisher, damage-probability, mine-danger-map), so I just list them with a couple of less important but interesting heuristics:

  • eval: my turn only, with the most reasonable action combos/orders (prefer silence at the end) + additional heuristics for specific actions
  • scoring: accessible good cells, enemy damage threat, detectability
  • init: biggest lake, 5x5 open area, near center, spiral movement, quick mining
  • mines: least overlap (weakened after surfaces), next to suspected enemy
  • triggering: conservative (except when mines overlap), possibly waiting for the enemy to step over it
  • enemy tracking probability adjustments: following island shapes, silence statistics, mine avoidance, not attacking!, getting stuck with few cells
  • sonar: charge last, use when it gives the most info (cut states/probability roughly in half; see the sketch after this list)
  • player detection: by behaviour, for key top players, enable/disable some tactics vs. them
  • finisher + check enemy finisher and avoid those positions
  • last resort with no moves (or the very last turn as 2nd player): use any action, trigger all mines, even “fire in the dark” or a low-chance finisher combo
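
A minimal sketch of that “split the probability roughly in half” sonar choice (illustrative only, not the exact code; candidates are assumed to be (x, y, probability) tuples):

```python
def best_sonar_sector(candidates):
    """Pick the sector whose yes/no answer splits the candidate probability
    mass closest to 50/50, i.e. gives the most information.

    candidates: iterable of (x, y, probability) for possible enemy cells.
    Sectors are the usual 5x5 blocks, numbered 1..9 row by row.
    """
    def sector(x, y):
        return (y // 5) * 3 + (x // 5) + 1

    total = sum(p for _, _, p in candidates)
    best_sector, best_gap = None, float("inf")
    for s in range(1, 10):
        mass = sum(p for x, y, p in candidates if sector(x, y) == s)
        gap = abs(mass - total / 2)  # distance from a perfect 50/50 split
        if gap < best_gap:
            best_sector, best_gap = s, gap
    return best_sector
```
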
17 Likes

Thanks for the contest,
It was my big return to CodinGame. I also encouraged my daughter to join the contest!

I finished in Silver League, but I still have a lot of ideas to improve my bot :slight_smile:

Thanks again

4 Likes

Can you say a bit more about this ?

observable behaviour: initial position/movement, use/timing of abilities, etc.
reaction: re-weight some probabilities/coeffs (e.g. expect balanced silence length use, strong mine avoidance)

5 Likes

How accurate do you think your prediction was ? I really like your idea, jolindien must have been easy to identify seeing how he started on the sides.

How much do you evaluate you could gain against a specific player with the right special tactics activated ?

I didn’t spend too much time on this, but after watching my terrible loss rate vs. Jolindien for a week, and since generic techniques were not helping overall, I decided to give it a go.
But it won’t work all the time; I would say it made me lose about 5% less (I didn’t benchmark, and seeing the final run, I’m not sure about anything). It may also have false positives on other players, which I accepted.

2 Likes

Great contest, great time!
Unfortunately I could spend much less time on it than on any of the previous 3 shorter contests (maybe 2-3 net days in total), so my bottom Gold (~20% of the field) result does not really justify sharing any strategies. (My opponent- and self-tracking was quite OK, but my heuristic action selection not so much.)
Instead, an interesting observation: my last submit was 8 days before the close date. Despite this, my position fluctuated +/-25 places during the week (280-330). I was about to congratulate myself, but in the last 24 hours I lost 120 places… Which reminds me of https://youtu.be/qdmTw5L1YpY :slight_smile:

2 Likes

That is really nice. I also liked your probability adjustment; I tried to come up with something like that, but not very successfully.

Thanks codingame and the creators @eulerscheZahl, @Illedan and @G-Rom for a great contest ! I really liked it despite having had very little time to play. I set myself the goal of reaching gold and finally did it after struggling with the silver boss.

When will the multi be available?

Looking forward to the next challenge ! (maybe calling it “spring” is not very globally friendly, it’s autumn here in the south :slight_smile:)

2 Likes

Already available :slightly_smiling_face:

Here is a link to my post-mortem for the Ocean of Code contest.

27 Likes

#22
This is my second contest on CG (The first one was Code Royale back in 2018)
I’m very happy with the result considering I started on April 11, but a bit disappointed not to have won the t-shirt :stuck_out_tongue:

I first started with PHP as it is my main language but switched to C++ because I didn’t want to spend time optimizing. This allowed me to try some nice new features of modern C++ (auto, closures, “foreach”, …) and it was very pleasant.

My bot was quite simple with 2 main components:

Tracking
Everybody in Legend had good tracking; mine worked almost the same as what @Saelyos explained, except it didn’t have weights.
I use the same tracking on myself.

At the beginning of each turn I build a bitboard of mines - 1 if the probability of a mine is greater than a certain threshold (high at the beginning, lower at the end), else 0 - to help my pathfinding (more on that later).
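
A minimal sketch of such a threshold bitboard (illustrative only; the threshold schedule and the mine_probability map are assumptions, not the real values):

```python
def build_mine_bitboard(mine_probability, turn, width=15, height=15):
    """Set bit (y * width + x) for every cell whose estimated mine probability
    exceeds a turn-dependent threshold (high early, lower late)."""
    threshold = max(0.2, 0.8 - 0.02 * turn)  # illustrative schedule, not the tuned values
    board = 0
    for y in range(height):
        for x in range(width):
            if mine_probability.get((x, y), 0.0) > threshold:
                board |= 1 << (y * width + x)
    return board
```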

This part took me roughly half of the time to get bug free.

Decision
Power: if (turn <=16): MINE*2+SILENCE+SONAR else: TORPEDO > SILENCE > MINE > SONAR
Start: Center of largest square area

FATALITY > SURFACE > TORPEDO > TRIGGER > MOVE > TORPEDO > TRIGGER > SILENCE > TRIGGER > MINE > SONAR

FATALITY: check every combination of SURFACE / MOVE / SILENCE / TORPEDO / TRIGGER
SURFACE: if there are no more reachable cells considering the mine bitboard
TORPEDO: if at least 1 damage can be dealt and it is safe for me
TRIGGER: if the probability to hit the opponent is greater than a certain threshold and it is safe for me
MOVE: if in danger (considering the mine bitboard), find the shortest path to the largest safe area; else consider in this order: safeness (mine bitboard), number of reachable cells (mine bitboard), avoiding cut vertices, number of possible positions, cell with the most edges (see the sketch after this list)
SILENCE: if my possible positions are below a certain threshold; always distance 0
MINE: if at least 4 more cells are covered
SONAR: if possible sectors >= 2 and the opponent used SILENCE, try my sector, or sector 5, or random
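
As an illustration of that multi-criteria MOVE choice, a minimal sketch (not the actual bot code; every helper is an assumed placeholder, and Python tuples give the lexicographic order):

```python
def choose_move(candidate_dirs, state):
    """Rank candidate directions lexicographically by the criteria listed above.
    Each helper is an assumed placeholder; higher values are better."""
    def key(d):
        return (
            is_safe(d, state.mine_bitboard),           # 1. safeness w.r.t. the mine bitboard
            reachable_cells(d, state.mine_bitboard),   # 2. number of cells still reachable
            not is_cut_vertex(d, state),               # 3. avoid cut vertices
            my_possible_positions_after(d, state),     # 4. stay hard to track
            free_neighbours(d, state),                 # 5. prefer the cell with the most edges
        )
    return max(candidate_dirs, key=key)
```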

My move heuristic is probably what allowed me to reach Legend despite its relative simplicity and only depth 1.
I tried some heuristics for silence (always 1, or up to 4 if the move heuristic picks the same direction 4 times) but it was worse than silence 0, so I reverted this change for the final submit.

Congratulations to @pb4 for the win. I loved watching his replays and seeing how he swam so easily through minefields :star_struck:

And last but not least: Thanks so much to the authors of this great contest @Illedan, @eulerscheZahl and @G-Rom I had a great time doing it :slight_smile:

5 Likes

#72

I implemented tracking by propagating constraints based on opponent orders: moves, silences (duplication of paths, impossibility to come back to already visited spots or to go out of the map), torpedo and trigger locations (BFS distances), and coherence between theoretical and actual remaining opponent life (for TRIGGER and TORPEDO actions) for each possible position.

I kept possible enemy positions and paths, and the associated mine positions, turn after turn, and reset the paths to “new paths containing only each possible position” when the global number of paths grew too high because the opponent used too many silences.

My move actions were chosen using a weighted sum of the following (sketched after the list):

  • myNumberOfPossibleSpots (to be untrackable)
  • numberOfFutureAccessibleSpots (to avoid useless SURFACE), calculated with BFS algorithm.
  • opponent mine probability (only if he is almost detected and if I am too, otherwise he wouldn’t trigger).
  • bonus if my torpedo cooldown is lower than his and I’m close to the opponent
  • malus if opponent_torpedo_cooldown < my_torpedo_cooldown and I’m close to him
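
A minimal sketch of that weighted sum (illustrative only; the coefficients and feature helpers are assumptions, not the tuned values):

```python
def move_score(move, state):
    """Weighted sum of the criteria listed above (illustrative coefficients)."""
    score = 0.0
    score += 1.0 * my_number_of_possible_spots(move, state)        # stay untrackable
    score += 1.0 * number_of_future_accessible_spots(move, state)  # BFS, avoid useless SURFACE
    if opponent_almost_detected(state) and i_am_almost_detected(state):
        score -= 3.0 * opponent_mine_probability(move, state)      # only then would he trigger
    if state.my_torpedo_cd < state.opp_torpedo_cd and close_to_opponent(move, state):
        score += 2.0                                                # bonus: I can shoot first
    if state.opp_torpedo_cd < state.my_torpedo_cd and close_to_opponent(move, state):
        score -= 2.0                                                # malus: he can shoot first
    return score
```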

I used silence only when the enemy is close (minDist short), when I’m almost detected, for killerMoves, or to escape the opponent when opponent_torpedo_cooldown < my_torpedo_cooldown.
Its length depends on the needs (distance from the opponent).

I use surface only for killerMoves when I need to go through my own current path.

I use torpedoes only if the enemy is almost detected, on the most central position among several accessible positions.
Idem for triggers, but with a less restrictive threshold (it may be used a little like a sonar).

I used sonar only if it is useful for shooting him a short time later and if there is more than 1 possible sector for the enemy (and if he may be close).

I put mines not too close to each other, nor to the map borders. I use a decreasing factor to place more mines at the beginning than at the end, because the opponent has more time to come near them if I place them early.

My priority is to load TORPEDO and then MINE or SILENCE depending on the turn number and when everything else is loaded, SONAR.

I tried to do killerMoves when I could, and “move, then shoot and escape with silence” actions when I could as well.

I assumed that the enemy would load torpedo and then silence whenever he could, in order to calculate his minimum cooldowns.

Congrats to @pb4, @jolindien, @valgrowth

Thanks to @Illedan, @eulerscheZahl and @G-ROM for this amazing contest !

5 Likes

Legend 3rd.

Thank you for the great contest, I had a lot of fun for a month :slight_smile:

【Algorithm】
I used a slightly modified MCTS.
In the expansion phase of general MCTS, one child node is randomly expanded and then evaluated, while in my method all child nodes are evaluated and only the one with the highest evaluation value is expanded (see the sketch below).
In the kind of game where an evaluation function cannot be designed at all, there is no choice but to expand child nodes randomly like general MCTS. But in this game it is not difficult to design an evaluation function, so greedily selecting and expanding a child node gives a more efficient simulation than doing it randomly.
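
A minimal sketch of that greedy expansion step (a reconstruction for illustration; the Node structure, evaluate and apply_action are assumptions, not the actual code):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    state: object
    parent: "Node" = None
    action: object = None
    untried_actions: list = field(default_factory=list)
    children: list = field(default_factory=list)

def expand_greedily(node, evaluate, apply_action):
    """Instead of expanding one random untried action (plain MCTS),
    evaluate every candidate child state and expand only the best one."""
    best = None
    for action in node.untried_actions:
        child_state = apply_action(node.state, action)
        score = evaluate(child_state)
        if best is None or score > best[0]:
            best = (score, action, child_state)
    _, best_action, best_state = best
    child = Node(state=best_state, parent=node, action=best_action)
    node.untried_actions.remove(best_action)
    node.children.append(child)
    return child
```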

【Action Selection】
Permutations of MOVE, TORPEDO and SILENCE.
Then TRIGGER, MINE, SONAR.

  • MOVE
    All directions.
  • TORPEDO
    Fire at only one target, the one with the highest expected damage. Do not fire if the expected damage is below a threshold (=0.6); see the expected-damage sketch after this list.
  • SILENCE
    All directions and distances.
  • MINE
    Place it so that its damage range does not overlap with other mines; if that is impossible anywhere on the map, place it anywhere.
  • TRIGGER
    Like torpedo target selection, trigger only the one mine with the highest expected damage. Do not trigger if the expected damage is below a threshold (=0.9).
  • SONAR
    Use immediately if it works.
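
For reference, a minimal sketch of the expected-damage computation over the tracked enemy distribution (a reconstruction for illustration; the 0.6 threshold is the one above, the rest is assumed):

```python
def expected_torpedo_damage(target, enemy_cells):
    """enemy_cells maps (x, y) -> probability of the enemy being there.
    A torpedo deals 2 damage on the target cell and 1 on the 8 surrounding cells."""
    tx, ty = target
    expected = 0.0
    for (x, y), p in enemy_cells.items():
        if (x, y) == (tx, ty):
            expected += 2 * p
        elif abs(x - tx) <= 1 and abs(y - ty) <= 1:
            expected += 1 * p
    return expected

def best_torpedo_target(candidate_targets, enemy_cells, threshold=0.6):
    """Fire at the single best target only, and only above the threshold."""
    best = max(candidate_targets, key=lambda t: expected_torpedo_damage(t, enemy_cells))
    return best if expected_torpedo_damage(best, enemy_cells) >= threshold else None
```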

The priority of the skills to be charged is TORPEDO > SILENCE > MINE > SONAR.

【Evaluation function】
Use MyLife - OpponentLife as the basic evaluation value, and consider the following:

  1. Damage likely to be taken by an opponent TORPEDO next turn.
  2. Damage likely to be taken from opponent mines next turn (only if my position has been narrowed down to 15 coordinates or fewer).
  3. Damage from my MINEs that the opponent cannot avoid.

When simulating my own turn, additionally consider the following.

  1. My TORPEDO, SILENCE cooldown
  2. My number of cells that can be moved
  3. The number of cells I may be in from opponent view

【Predict Opponent Position】

After strictly narrowing down the opponent coordinates, I heuristically weight them.
I do a flood fill to count the number of movable cells, and the higher that number, the higher the evaluation.
At the beginning of the game this prediction doesn’t hit very well, so I only used it after the opponent’s position was narrowed down to some extent.
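
A minimal sketch of that weighting (illustrative only; the grid helpers and map size are assumptions):

```python
from collections import deque

def movable_cells(start, is_water, width=15, height=15):
    """Flood fill from a candidate cell, counting the reachable water cells."""
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and is_water(nx, ny) and (nx, ny) not in seen:
                seen.add((nx, ny))
                queue.append((nx, ny))
    return len(seen)

def weight_candidates(candidates, is_water):
    """Weight each possible opponent position by its freedom of movement."""
    weights = {c: movable_cells(c, is_water) for c in candidates}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}
```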

16 Likes

I know what Jolindien will say about your use of MCTS :innocent:

I am actually very curious, and I’m not sure I fully understand what you did.

How deep would your MCTS go on average ? How deep would it go in the “principal variation” ?

How is the uncertainty on the opponent’s position and path handled ? I mean, if he might be in several positions, you can’t apply the same action to all those positions ?

3 Likes

I had a lot of trouble dealing with uncertainty, but I reasoned as follows.
When the referee advances the game turn, the positions of both players must be unique.
On the other hand, when a player evaluates and selects his actions, the opponent’s position should be uncertain.

More details are as follows.

In order to run simulations, it is necessary to uniquely determine the opponent’s position and path.
I do that when selecting a child node from the root node: I randomly pick one position and path.
Each iteration selects randomly again.
The root node has at most as many child nodes as (number of actions) × (number of opponent positions).
Therefore, when the opponent’s position is completely uncertain, the root node has so many children that I cannot search very deeply.

One point to be aware of: even in nodes at depth 2 and beyond, where the opponent’s position has been fixed, the opponent’s position is still treated as uncertain when calculating the evaluation value.
Players estimate the opponent’s position and path as usual, evaluate, select actions, and back up the evaluation value to the root node.
When expanding a child node, we must simulate the result of the action, so we must use the unique opponent position selected at the root node.

The depth was very variable: when both players had exhausted their skills and were only trading torpedoes, it could explore very deeply, about 15 turns.
(I think one of the reasons it goes so deep is that I set the constant C of UCB1 to a very small value (=0.01).)
On the other hand, the depth is only 1 or 2 when the skill charges are full and the opponent’s position is uncertain.

The evaluation function is executed about 3000 times in one turn.

10 Likes

I was 14th. It was a nice surprise to have this contest during the lockdown, even if I lacked motivation at the beginning. It is probably the 30-day contest effect. Overall, I think I spent the same time as for a 10-day contest, but it was more diffuse :). Well, I thank CG and the game authors for this work. Thanks also to all participants, it was nice to chat on the days without motivation !

I code in Python and I was on the edge concerning the time limit. Thus, I had to optimize many aspects of my bot.

Pathfinding

So the first step was naturally to be able to choose a path that does not lead to a surface every 5 turns. I used a simple flood fill for this, at depth 2: that is to say, for every direction, I keep the largest of its adjacent connected components (sketched below). With depth 1 there is indeed the risk of cutting a big zone in two. At the end I added some features to pass close to visited cells; it actually lengthened the paths.
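
A minimal sketch of that depth-2 idea (illustrative only, not the actual code; is_water is assumed to return False outside the map, and the first step is assumed to be a legal move):

```python
from collections import deque

def component_size(start, blocked, is_water):
    """Size of the connected water component containing start, ignoring blocked cells."""
    if not is_water(*start) or start in blocked:
        return 0
    seen, queue = {start}, deque([start])
    while queue:
        x, y = queue.popleft()
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if is_water(*n) and n not in blocked and n not in seen:
                seen.add(n)
                queue.append(n)
    return len(seen)

def score_direction(pos, step, visited, is_water):
    """Depth-2 score of a first move: take the step, then keep the largest
    connected component adjacent to the new cell (visited cells are blocked)."""
    x, y = pos[0] + step[0], pos[1] + step[1]
    blocked = visited | {(x, y)}
    return max(component_size(n, blocked, is_water)
               for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)))
```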

Tracking

I won’t detail all the points of the tracking process, as many others already did it. I used a tuple structure (rpath, rpos, rmines). “r” is for relative; rpos is the position in the relative map (29x29), rpath denotes the visited cells and rmines the mining positions. rpath and rmines are bitboards, so this is a 3-integer tuple.

Then, for every potential cell of the opponent, I keep a set of these tuples together with hp states. When I compute a tuple transformation such as a “silence”, I use a hash map from tuple to transformed tuples so it is only computed once per tuple. Nevertheless, it was not sufficient for ultra-furtive opponents, you know, the ones that load no weapon other than silence… So if the number of different tuples grew too large, I cut it.
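
A minimal sketch of that memoized silence expansion (illustrative only; silence_moves, the function enumerating all legal 0-4 cell slides for a tuple, is an assumed helper):

```python
silence_cache = {}  # (rpath, rpos, rmines) -> list of resulting tuples

def expand_silence(state_tuple, silence_moves):
    """Expand an opponent SILENCE for one (rpath, rpos, rmines) tuple.
    silence_moves is the (assumed) function enumerating every legal slide of
    0..4 cells, respecting islands and the cells already set in rpath.
    Memoized, so each distinct tuple is transformed only once even when
    many candidate cells share the same tuple."""
    if state_tuple not in silence_cache:
        silence_cache[state_tuple] = silence_moves(state_tuple)
    return silence_cache[state_tuple]
```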

I did not consider probabilities of presence for the potential cells and the mines. Maybe this was a weakness of my bot.

Evaluation

My decision was based on a heuristic. I considered 8 cases: every direction, with and without surface. For each of them I evaluated 4 possibilities:

  • basic (MOVE)
  • attack (MOVE with TORPEDO before or after)
  • silence (MOVE then SILENCE)
  • attack+silence (MOVE then SILENCE with a TORPEDO anywhere)
    Then I complete with mines, triggers and sonars.

I did not have the SILENCE then MOVE option, and I already missed some last hits because of it.

Among the items evaluated there were:

  • keeping the biggest continuous component for move
  • expected damage done to the other
  • risk of mine and risk to be targeted by a torpedo
  • stealth (log of the number of potential positions)
  • item consumed (I had a price per load point, so silence basically costs twice a torpedo)
  • distance (if I was winning on hp, I tried to reduce the distance)

What did I miss ?

I noticed that a lot of the games I lost were in specific situations that I will call “zoning” and “the final fight”. I wish I had had the time, the imagination and the motivation to build a strategy switch for these situations. Let’s detail them.

Zoning
That’s when players cover the map with mines and globally split it in two. With some players like blasterpoard or Illedan, it happened 90% of the time and I generally lost on it. If a player tries to enter the other’s zone, he has a very small chance to win (except to last-hit him). So, in this game, one has to make efficient paths to save time and optimize the cases when players cross at the frontier and trade some torpedoes. An interesting point would be not to move every turn, to save time (on surface or mine).

Final fight
This is when both players’ positions are known, there is no silence left, and they just blast each other as fast as possible. It needed some depth to guess what would happen… For instance, 4 hp vs 4 hp: you think, “I move and strike first and will win”. But the other reloads after the first torpedo and launches the second one before you… This needed some depth of simulation and analysis, but I am pretty sure I could improve on it.

Well, now let’s sleep properly and get ready for next challenge ! :grinning:

8 Likes

Given all the post mortems above, I wanted to see if some players were statistically better in the early, mid or late game.

Here is a graph plotting game durations for the top 7, split on whether the game was a win or a loss.

[graph: game duration distributions for the top 7, wins vs. losses]

What I read here :

  • jolindien forces games to finish early, but doesn’t seem skewed for early game wins
  • kovi forces games to finish late, and he’s indeed slightly more effective later in the game
  • Saelyos doesn’t force games to finish early or late, but he’s more effective in winning early

EDIT : same graph for the top 15:
[graph: game duration distributions for the top 15]

12 Likes

Final rank: 88th (25th in Gold).

My first bot was in Python and it was pretty basic: it tracked only the final possible locations of the enemy, no mines, no sonar, very simple navigation. Still, it was able to reach rank #11 at some point. Though not for long.

Then I invented another algorithm, but I assumed that Python was not suitable for it for performance reasons, so I switched languages and created the new bot from scratch.

My second bot was in Java and it had a pretty good tracking algorithm.
First, it now tracked trajectories. A trajectory is:

  • Final location
  • Matrix of visited cells
  • List of locations from which mines were placed

Second, instead of just simple yes/no logic, I now used probabilities. Every trajectory has a probability. I analyzed battles and found that:

  • It is more likely that the opponent starts in the center cell (about a 12% chance) or in the top-left corner cell (about 4%).
  • During SILENCE, enemies usually travel 0 cells (about 40%) or 1 cell (40% in Silver, 20% in Gold). Travelling 3 cells away is the least likely.

Third, I used a DP approach to track damage. Every trajectory also has an “undistributed damage” field. When analyzing TORPEDO and TRIGGER commands (both the opponent’s and my own), trajectories that were hit had this value decreased. At the end of the analysis, all trajectories that still had undistributed damage left were dropped.
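
A minimal sketch of that filter (illustrative only; the Trajectory fields follow the description above, the damage pattern is the standard torpedo/trigger one):

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    position: tuple            # final location
    undistributed_damage: int  # hp loss observed this turn, not yet explained by explosions

def splash_damage(target, position):
    """2 for a direct hit, 1 for one of the 8 surrounding cells, 0 otherwise."""
    dx, dy = abs(target[0] - position[0]), abs(target[1] - position[1])
    if dx == 0 and dy == 0:
        return 2
    return 1 if dx <= 1 and dy <= 1 else 0

def apply_explosion(trajectories, target):
    """Subtract the damage each trajectory would take from an explosion at target."""
    for t in trajectories:
        t.undistributed_damage -= splash_damage(target, t.position)

def end_of_analysis(trajectories):
    """Drop trajectories whose observed damage was not fully explained by explosions."""
    return [t for t in trajectories if t.undistributed_damage <= 0]
```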

Fourth, it tracked mines pretty accurately: if the opponent’s TRIGGER command could be applied to several different mines, the trajectory forks: for every possible mine, a separate trajectory is created without that particular mine. So it was totally impossible for the bot to choose the wrong mine to explode: it always took all possibilities into account.

Of course, when the opponent TRIGGERs and a trajectory has no mine that fits the trigger location, that trajectory is dropped.

Fifth, I decided to drop self-tracking support. I replaced it with a simple “alert level” integer. When the bot performs a risky action (like SURFACE or TORPEDO), the alert level rises; when it performs SILENCE, the alert level decreases. And even that looked unnecessary.

Finally, using probabilities, I could build a “danger map”: the average amount of HP the bot would lose if it stayed in a given cell. This danger map was used as weights in a Dijkstra algorithm. Also, if the bot found itself in a too dangerous cell with SILENCE fully charged, it would jump to the least dangerous cell. This helped it to jump over mines or run away from the enemy.
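
A minimal sketch of Dijkstra with such danger weights (illustrative only; danger is assumed to be a 2D grid of expected HP loss per cell, and is_water an assumed helper):

```python
import heapq

def safest_path_cost(start, goal, danger, is_water, width=15, height=15):
    """Dijkstra where entering a cell costs its expected HP loss plus a small
    step cost, so the bot prefers routes that are both short and safe."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (x, y) = heapq.heappop(heap)
        if (x, y) == goal:
            return d
        if d > dist.get((x, y), float("inf")):
            continue  # stale heap entry
        for n in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            nx, ny = n
            if 0 <= nx < width and 0 <= ny < height and is_water(nx, ny):
                nd = d + 0.1 + danger[ny][nx]  # 0.1 = illustrative step cost
                if nd < dist.get(n, float("inf")):
                    dist[n] = nd
                    heapq.heappush(heap, (nd, n))
    return float("inf")
```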

Alas, the travelling algorithm was still pretty basic, so my bot ended up SURFACEing much more often than it should have. Also, the fatality was not implemented. I think that is what kept me from reaching Legend.

3 Likes

Legend 2nd.

First of all thanks to the creators and to CG. You did a great job especially considering the tight deadline !
My PM is a bit late, but I had a lot of work to catch up on. Please CG, don’t propose a month-long contest anymore :wink:

My strategy was based on two main algorithms : an MCTS when the opponent is close, and a Monte Carlo otherwise.

evaluation functions :yawning_face:

  • evaluate actions :
    • surface : -1
    • trigger / torpedo : expected damages
    • mine : .2 + nb free neighbors / 10
  • evaluate turn state :
    let the player be in position c,
    • self detection : 1 - proba to be in c - .5 * proba to be in a neighbor of c
    • opponent detection : (proba that actions will discriminate c) * (1 - this proba), e.g for sonar (proba that opponent is in sonar sector) * (1 - this proba)
    • hazards : proba there is an opponent mine in c neighborhood * self detection
    • cooldowns : - sum of cooldowns / 10
  • evaluate terminal state: nb reachable cells (floodfill) / nb cells that are not islands
  • decrease future evaluations : all evaluations are weighted with a decrease coefficient : 0.95^turn

guided monte carlo :face_with_monocle:

  • nb computed turns by simulation : 8
  • do not consider opponent actions
  • consider trigger / torpedo only during first turn as opponent moves are not predicted
  • guide the search :
    • player’s state is approximated by its position and path
    • after each simulation, update the best score obtained for the triplet (position, path, action)
    • during a simulation to choose an action :
      with a proba of .5, choose the best action found in (position, path)
      else choose a random action

mcts :kissing_heart:

  • fixed depth :
    after a node expansion, process a rollout similarly to monte carlo method until the fixed depth is reached (I tuned fixed depth to 4).
  • considered actions :
    • for my first turn, consider all relevant actions
    • for future turns and opponent actions, a difficulty is that trigger and torpedo validities may change depending on previous actions.
      Trigger and torpedo actions were substituted by generic TRIGGER/NO_TRIGGER and TORPEDO/NO_TORPEDO. When applied, the chosen target was the one with the best evaluation.
    • considered opponent : at each simulation the opponent state is chosen randomly in the set of its possible states

improving detection :nerd_face:
A key issue is to estimate the probability that the opponent is in a given state s.
The simple way to compute it is to assume an equal probability among all possible states : p(s) = 1 / number of possible states.
To improve this estimation, one can use the history of opponent states / actions.
The motivation is that if a possible state can only be explained by nonsense actions, it is very unlikely (for example a path that crosses several mines).
To this end we can compute a “nonsense score” that accumulates over the turns: for example, at each turn, the hazard of the position (or a nonsense torpedo, etc.).
Then the nonsense scores must be transformed into probabilities.
The so-called softmax function (often used in classification tasks) can do that. The probability of state s is computed with :
p(s) = exp(- nonsense score of s) / sum over all states s' of exp(- nonsense score of s')
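
A minimal, numerically stable sketch of that softmax over nonsense scores (illustrative representation, not the actual code):

```python
import math

def state_probabilities(nonsense_scores):
    """nonsense_scores maps each possible opponent state to its accumulated
    nonsense score (higher = more implausible). Returns p(s) as a softmax of
    the negated scores; shifting by the minimum keeps the exponentials stable."""
    min_score = min(nonsense_scores.values())
    weights = {s: math.exp(-(score - min_score)) for s, score in nonsense_scores.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}
```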

I wish I had the detection trick idea sooner – I added it only the day before the end – to tune it and test it more intensively.
Maybe a month was not enough after all. :upside_down_face:

13 Likes