Ocean of Code - Feedback & Strategies

Great contest, great time!
Unfortunately I could spend on it much less time, than in any of previous 3 shorter contests (maybe net 2-3 days in total), so my bottom Gold (~20% of field) result does not really justify sharing any strategies. (My opponent- and self-tracking was quite OK, but my heuristic action selection not so.)
Instead, an interesting observation: my last submit was 8 days before the close date. Despite this, my position fluctuated +/-25 places during the week (280-330). I was about to self-congratulate but in last 24 hours I lost 120 places… Which reminds me to https://youtu.be/qdmTw5L1YpY :slight_smile:

2 Likes

That is realy nice. I also liked your probability adjustment, i tried to come up with something like that, but not very succesfully

Thanks codingame and the creators @eulerscheZahl, @Illedan and @G-Rom for a great contest ! I really liked it despite having had very little time to play. I set myself the goal of reaching gold and finally did it after struggling with the silver boss.

When will the multi be available?

Looking forward for the next challenge ! (maybe calling it “spring” is not very global friendly, it’s autumn here in the south :slight_smile:)

2 Likes

Already available :slightly_smiling_face:

Here is a link to my post-mortem for the Ocean of Code contest.

27 Likes

#22
This is my second contest on CG (The first one was Code Royale back in 2018)
I’m very happy with the result considering I started April 11 but a bit disapointed not to have won the t-shirt :stuck_out_tongue:

I first started with PHP as it is my main language but switched to C++ because I didn’t want to spend time to optimize. This allowed me to try some nice new features of modern C++ (auto, closure, “foreach”, …) and it was very pleasant.

My bot was quite simple with 2 main components:

Tracking
Everybody in legend had good tracking and my tracking was working almost the same as what @Saelyos explained except it didn’t have weight.
I use the same tracking on myself.

At the begining of each turn I build a bitboard of mine - 1 if probability of mine is greater than a certain threshold (high at the begining, lower at the end) else 0 - to help my path finding (more on that later)

This part took me roughtly half of the time to be bug free.

Decision
Power: if (turn <=16): MINE*2+SILENCE+SONAR else: TORPEDO > SILENCE > MINE > SONAR
Start: Center of largest square area

FATALITY > SURFACE > TORPEDO > TRIGGER > MOVE > TORPEDO > TRIGGER > SILENCE > TRIGGER > MINE > SONAR

FATALITY: check every combination of SURFACE / MOVE / SILENCE / TORPEDO / TRIGGER
SURFACE: if there is no more reachable cells considering the mine bitboard
TORPEDO: if at least 1 damage can be made and safe for me
TRIGGER: if probability to hit opponent is greater than a certain threshold and safe for me
MOVE: if in danger (considering mine bitboard) find the shortest path to the largest safe area else consider in this order: safeness(mine bitboard), number of reachable cells (mine bitboard), avoit cut vertices, number of possible positions, cell with the most edges
SILENCE: if my possible positions are below a certain threshold, always 0
MINE: if at least 4 more cells are covered
SONAR: if possible sectors >= 2 and opponent used SILENCE , try my sector or sector 5 or random

My move heuristics is probably what allowed me to reach legend despite its relative simplicity and only depth 1.
I tried some heuristics for silence (always 1, or up to 4 if the move heuristics says 4 time the same direction) but it was worse than silence 0 and I reverted this change for final submit

Congratulation to @pb4 for the win. I loved watching his replays and see how he swimmed so easily in mine fields :star_struck:

And last but not least: Thanks so much to the authors of this great contest @Illedan, @eulerscheZahl and @G-Rom I had a great time doing it :slight_smile:

5 Likes

#72

I implemented tracking propagating constraints, based on opponent orders : moves, silences (duplication of paths, impossibility to come back to already visited spots and to go out of the map), torpedo and trigger locations (BFS distances), and cohesion between theorical and real remaining opponent life (about TRIGGER and TORPEDO actions) for each possible position.

I kept possible enemy positions and paths, and associated mines positions turn after turn, and reset paths on “new paths containing only each possible position” when the global number of paths was too high if the opponent used too much silences.

My move actions were chosen using a ponderation of

  • myNumberOfPossibleSpots (to be untrackable)
  • numberOfFutureAccessibleSpots (to avoid useless SURFACE), calculated with BFS algorithm.
  • opponent mine probability (only if he is almost detected and if I am also, else he wouldn’t trigger).
  • bonus if my torpedo cooldown is lower than his and I’m close to the opponent
  • malus if opponent_torpedo_cooldown < my_torpedo_cooldown and I’m close to him

I used silences only when the enemy is close (minDist short), when I’m almost detected, or for killerMoves, or to escape the opponent when opponent_torpedo_cooldown < my_torpedo_cooldown
Its length depends on the needs (distance from the opponent).

I use surface only for killerMoves when I need to go through my own current path.

I use torpedos only if the enemy is almost detected, on the most central position among several accessible positions.
Idem for triggers, but according to a less restrictive threshold (it may be used a little as a sonar).

I used sonar only if it is useful to shoot him a short time later and if there are more than 1 possible sector for the enemy (and if he may be close).

I put mines not too close to each other, and to map limits. I use a decreasing factor to put more mines at the beginning than at the ending, because the opponent will have more time to go near them if I put them early.

My priority is to load TORPEDO and then MINE or SILENCE depending on the turn number and when everything else is loaded, SONAR.

I tried to do killerMoves when I can and “move, then shoot and escape with silence” actions when I also can.

I assumed that the enemy would load torpedo and then silence when he can to calculate his minCooldowns.

Congrats to @pb4, @jolindien, @valgrowth

Thanks to @Illedan, @eulerscheZahl and @G-ROM for this amazing contest !

5 Likes

Legend 3rd.

Thank you for the great contest, I had a lot of fun for a month :slight_smile:

【Algorithm】
I used a bit modified MCTS.
In the expansion phase in general MCTS, one child node is randomly expanded and then it is evaluated, while in my method, evaluate all children nodes and the only one with the highest evaluation value is expanded.
If the kind of games in which the evaluation function cannot be designed at all, there is no choice but to randomly expand the child nodes like general MCTS. But in this game it is not difficult to design the evaluation function, greedily selecting and expanding child node can achieve more efficient simulation rather than randomly.

【Action Selection】
Permutation of MOVE, TORPEDO and SILENCE.
Then TRIGGER, MINE, SONAR.

  • MOVE
    All directions.
  • TORPEDO
    Fire only one target with the most high expected damage. Do not fire if expected damage is below threshold(=0.6).
  • SILENCE
    All directions and distances.
  • MINE
    Put the damage range does not overlap with other mines, if cannot do so in entire of map we can put it anywhere.
  • TRIGGER
    like torpedo target selection, trigger only one mine with the most high expected damage. Do not trigger if expected damage is below threshold(=0.9).
  • SONAR
    Use immediately if it works.

The priority of the skills to be charged is TORPEDO > SILENCE > MINE > SONAR.

【Evaluation function】
Use MyLife - OpponentLife as the basic evaluation value, and consider the following contents

  1. Damage likely to be taken by an opponent TORPEDO next turn.
  2. Damage likely to be taken by opponent mines next turn. (only if my position has been guessed to 15 coordinates or less)
  3. Damage that my MINE gives oppponent cannot avoid.

In simulating my turn, additionary consider the following contents.

  1. My TORPEDO, SILENCE cooldown
  2. My number of cells that can be moved
  3. The number of cells I may be in from opponent view

【Predict Opponent Position】

After strict narrowing down of opponent coordinates, I heuristically weighing them.
Doing flood fill to count the number of movable cells, and the higher the number of them, the higher the evaluation.
At the beginning of the game, this prediction doesn’t hit very well, so I used it after the opponent position was narrowed down to some extent.

16 Likes

I know what Jolindien will say about your use of MCTS :innocent:

I am actually very curious, and I’m not sure I fully understand what you did.

How deep would your MCTS go on average ? How deep would it go in the “principal variation” ?

How is the uncertainty on the opponent position and path handled ? I mean, if he might be in several positions you can’t apply the same action to all those positions ?

3 Likes

I had a lot of trouble dealing with uncertainty, but I thought as follows.
When the referee progresses the game turn, the positions of both players must be unique.
On the other hand, when a player evaluates and selects his action, opponent’s position should be uncertain.

More details are as follows.

In order to do simulations, it is necessary to uniquely determine the opponent’s position and path.
I do it when selecting a child node from root node, randomly determines one position and path.
Each iteration, ramdomly select.
The root node has at most as many child nodes as the number of actions × the number of opponent’s positions.
Therefore, when the opponent’s position is uncertain at all, the number of children of the root node is so large, cannot search too deeply.

As a point to be aware of, even in the nodes of depth 2 and after, where the opponent’s position is determined, the opponent’s position is treated uncertainly when calculating the evaluation value.
Players estimate opponent position and path as usual, evaluate, select actions, and back up evaluation value to the root node.
When expanding a child node, we must simulate the result of action, so we must use the unique opponent’s position selected in the root node.

Depth were very variable, when both of players had been exhausted their skills and only striking torpedoes, it could explore very deeply, with a depth of about 15 turns.
(I think the one of the reasons why it is so deeply is that I set the constant C of UCB1 to a very small value(=0.01))
On the other hand, the depth is only 1 or 2 when the skill charges are full and the opponent position is uncertain.

The evaluation function is executed about 3000 times in one turn.

10 Likes

I was 14th. It was a nice surprise to have this contest during the lockdown. Even if, I lacked of motivation on the beginning. It is probably the 30 days contest effect. Overall, I think I spent the same time than for a 10 days contest but it was more diffuse :). Well I thank CG and the game authors for this work. Thanks also to all participants, it was nice to chat the day without motivation !

I code in python and I was on the edge concerning time limit. Thus, I had to optimize many aspects of my bot.

Pathfinding

So first step was naturally to be able to choose a path that does not lead to a surface every 5 turns. I used a simple flood fill for this. Depth 2, that is to say, for every direction, I keep the largest of its adjacent continuous component. With depth 1, there is indeed the risk to cut off a big zone in two. At the end I added some features to pass close visited cells. It actually enlarged the paths.

Tracking

I won’t detail all the points of the tracking process as many others did it. I used a tuple structure (rpath, rpos, rmines). “r” is for relative, rpos is the position in the relative map (29x29). Rpath denotes the visited cells and rmines the mining positions. rpath and rmines are bitboards, so this is a 3 integers tuple.

Then for every potential cell of the opponent, I add a set of these tuple with hp states. Then, when I compute a tuple transformation like a “silence”, I do a hash map of the tuple to tuple transformation to only compute it once by tuple. Nevertheless, it was not sufficient for ultra-furtive opponents, you know, the ones that does not load other weapon that silence… So if the number of different tuple was to large, I cut it.

I did not consider probability of presence for the potential cases and the mines. Maybe this was a weakness of my bot.

Evaluation

My decision was based on a heuristic. I considered 8 cases, for every direction with and without surface. For each I evaluated 4 possibilites:

  • basic (MOVE)
  • attack (MOVE with TORPEDO before or after)
  • silence (MOVE then SILENCE)
  • attack+silence (MOVE then SILENCE with a TORPEDO anywhere)
    Then I complete with mines, trigger and sonars.

I did not have the SILENCE then MOVE option, and already missed some last hit because of it.

Among the items evaluated there were:

  • keeping the biggest continuous component for move
  • expected damage done to the other
  • risk of mine and risk to be targeted by a torpedo
  • stealth (log of the number of potential positions)
  • item consumed (I had a price by load point, so silence costs basically twice a torpedo)
  • distance (If i was winning on hp, I tried to reduce distance)

What I missed ?

I noticed that on the games I lost, a lot were on specific situations that I will call “zoning” and “the final fight”. I wished to have the time, the imagination and the motivation to build a strategy switch for these situations. Let’s detail it.

Zoning
That’s when players cover map with mines and globally split the map in two. With some players like blasterpoard or Illedan, it happened 90 % of time and I generally lost on it. If a player tries to enter the other zone, he has very small chance to win (except to last hit him). So, in this game, one has to make efficient paths to save time and optimize the cases when players cross at frontier and trade some torpedos. An interesting point would be not to move every turn to save time (on surface or mine).

Final fight This is when both players position are known, have no more silence and just blast each other as fast as possible. It needed some depth to guess what would happen… For instance 4 hp vs 4 hp, you think, “I move and strike first and will win”. But the other, reload after the first torpedo and launch the second before you… This needed some analysis of depth simulation but I am pretty sure I could improve on it.

Well, now let’s sleep properly and get ready for next challenge ! :grinning:

8 Likes

Given all the post mortems above, I wanted to see if some players were statistically better in the early, mid or late game.

Here is a graph plotting game durations for the top 7, split on whether the game was a win or a loss.

image

What I read here :

  • jolindien forces games to finish early, but doesn’t seem skewed for early game wins
  • kovi forces games to finish late, and he’s indeed slightly more effective later in the game
  • Saelyos doesn’t force games to finish early or late, but he’s more effective in winning early

EDIT : same graph for the top 15:
image

12 Likes

Final rank: 88th (25th in Gold).

My first bot was on Python and it was pretty basic: it tracked only final possible locations of enemy, no mines, no sonar, very simple navigation. Still it was able to reach #11 rank at some point. Though not for long.

Then I invented another algorythm, but I assumed that Python is not suitable for it due to performance reasons, so I switched language and created the new bot from the scratch.

My second bot was on Java and it had pretty good tracking algorithm.
First, now it tracked trajectories. Trajectory is:

  • Final location
  • Matrix of visited cells
  • List of locations from which mines were placed

Second, instead of just simple yes/no logic, now I used probabilities. Every trajectory has a probability. I analized battles and I found that:

  • It is more likely that opponent starts in center cell (about 12% chance) or in top-left corner cell (about 4% chance).
  • During SILENCE enemies usually travel 0 cell (about 40%) or 1 cell (40% in Silver, 20% in Gold) away. Travelling 3 cells away is the least likely.

Third, I used DP approach to track damage. Every trajectory also has “undistributed damage” field. When analizing TORPEDO and TRIGGER commands (both opponent’s and my own) trajectories that were hitted had this value decreased. In the end of analisys all trajectories that still had undistributed damage left were dropped.

Forth, it tracked mines pretty accurate: if opponent’s TRIGGER command can be applied to several different mines, trajectory forks: for every possible mine a separate trajectory was created without that paticular mine. So it was totally impossible that bot chooses a wrong mine to explode: it always took into account all possibilities.

Of course, when opponent TRIGGERs and trajectory has no mines that fits trigger location, that trajectory is dropped.

Fifth, I decided to drop self-tracking support. I replaced it with a simple “alert level” integer. When bot performs risky action (like SURFACE or TORPEDO) alert level raises, when it performs SILENCE, alert level decreases. And even that looked unnecessary.

Finally, using probabilities, I could build a “danger map”: what is the average amount of HP bot can lose if it stays in the cell. This danger map was used in Dejkstra algorithm as weight. Also, if bot finds it’s in too dangerous cell and SILENCE is fully charged, it will jump to the least dangerous cell. It helped to jump over mines or run away from enemy.

Alas, travelling algorithm was still pretty basic, so my bot ended up SURFACEing much more often than it should. Also fatality was not implemented. I think that was what didn’t allow me to reach Legend.

3 Likes

Legend 2nd.

First of all thanks to the creators and to CG. You did a great job especially considering the tight deadline !
My pm is a bit late, but I had a lot of work to catch up on. Please CG don’t propose a month contest anymore :wink:

My strategy was based on two main algorithms : a mcts when the opponent is close, a monte carlo otherwise.

evaluation functions :yawning_face:

  • evaluate actions :
    • surface : -1
    • trigger / torpedo : expected damages
    • mine : .2 + nb free neighbors / 10
  • evaluate turn state :
    let the player be in position c,
    • self detection : 1 - proba to be in c - .5 * proba to be in a neigbor of c
    • opponent detection : (proba that actions will discriminate c) * (1 - this proba), e.g for sonar (proba that opponent is in sonar sector) * (1 - this proba)
    • hazards : proba there is an opponent mine in c neighborhood * self detection
    • cooldowns : - sum of cooldowns / 10
  • evaluate terminal state: nb reachables cells (floodfill) / nb cells that are not island
  • decrease future evaluations : all evaluation are weighted with a decrease coefficient : 0.95^turn

guided monte carlo :face_with_monocle:

  • nb computed turns by simulation : 8
  • do not consider opponent actions
  • consider trigger / torpedo only during first turn as opponent moves are not predicted
  • guide the search :
    • player’s state is approximated by its position and path
    • after each simulation, update the best score obtained for the triplet (position, path, action)
    • during a simulation to choose an action :
      with a proba of .5, choose the best action found in (position, path)
      else choose a random action

mcts :kissing_heart:

  • fixed depth :
    after a node expansion, process a rollout similarly to monte carlo method until the fixed depth is reached (I tuned fixed depth to 4).
  • considered actions :
    • for my first turn, consider all relevant actions
    • for future turns and opponent actions, a difficulty is that trigger and torpedo validities may change depending on previous actions.
      Trigger and torpedo actions were substituted by generic TRIGGER/NO_TRIGGER and TORPEDO/NO_TORPEDO. When applied, the chosen target was the one with the best evaluation.
    • considered opponent : at each simulation the opponent state is chosen randomly in the set of its possible states

improving detection :nerd_face:
A key issue is to estimate the probability that opponent is in a given state s.
The simple way to compute it is to assume an equal probability among all possible states : p(s) = 1 / number of possible states.
To improve this estimation one can use the historic of opponent states / action.
The motivation is that if a possible state can be lied with nonsense actions, it is very unlikely (for example a path that cross several mines).
To this end we can compute a “nonsense score” that cumulate other turns. For example at each turn the hazard of the position (or a nonsense torpedo, etc.).
Then nonsense scores must be transformed in probabilities.
The so-called softmax function (often used in classifiaction tasks) can do that. The probabilty of state s is computed with :
p(s) = exp(- nonsense score of s) / sum(exp(- nonsense score))

I wish I had the detection trick idea sooner – I added it only the day before the end – to tune it and test it more intensively.
Maybe a month was not enough after all. :upside_down_face:

13 Likes

1 month was justified for this kind of game though.

If it’s too short and the game is complicated then there isn’t enough time to explore it.

2 Likes

Many thanks for this really rich and deep game.

I really appreciated the one-month format. The lockdown allowed me in one month about what I usually have in one week. I am now pretty sure there is a choice to make: go for legend or have kids! But still, one month was very nice.

I wanted to try and build some efficient functional-oriented algorithms to deal with grid-based games. This game was a perfect playground. I ended up with merging and intersecting lists of lists (list of (x, (list of (y, data))). Mixing that with an offset function, you get really efficient computations. It helps also in flood-filling or pathfinding algorithms.

I used only heuristics to choose the next moves, and I kept no records of the opponent’s mines. I could reach mid-silver this way, but I really should work harder if I want to reach legend some day!

2 Likes

Same types of questions as for kovi:

  • can you evaluate the maximum branching factor, both for your MCTS and MC ? I could consider 568 types of sequences on turn 0 when SILENCE and TORPEDO were available, I don’t get how this number allows any kind of meaningful simulation…
  • typical depth reached with the MCTS (not including the depth 4 rollout) ?

Can you also describe a bit more the rollouts that were used for the MCTS ?

I do like the “nonsense score” idea you brought to your AI :slight_smile:

EDIT: Also, I don’t get the logic of the “proba * (1 - proba)” you have described everywhere in your PM ?

1 Like

The branching factor can be reduced by pruning some actions. I pruned some without regret : torpedo/trigger that have no expected damages. I also pruned some (but not many) that seemed bad : silence when only one direction is allowed (only in monte carlo and after first turn), torpedo/trigger that have too low expectattions.
Hence the branching factor remained high, and my searches were not expected to find the global optimum. But when guided, one can be confident in finding good solutions.

In the case of mcts, I expanded a node after each action (not only after turn). Also I used two distinct (decoupled) trees (one for me, one for the opponent). A typical reached depth (for expanded nodes) was of 2 turns, but with some “knowledge” at depth 3-4. The rollouts are similar to MC but with mean update (not max).

The motivation for p(1-p) to reward the detection comes from the proportion of pruned opponent paths.
To illustrate this, consider a SONAR action and let p_i be the probability the opponent is in sector s_i. Assuming there is no other action in the turn that impact detection, if you choose to SONAR in s_i :
- you will success with proba p_i and will prune 1-p_i
- you will not succes with proba (1-p_i) and will prune p_i
In consequence you will expect to prune p_i (1-p_i) + (1-p_i) p_i = 2 p_i (1-p_i).
The same holds for trigger/torpedo where sector s_i is replaced with damage area.
Notice that it vanishes for p=0 and p=1 (nonsense actions :wink: ) and it is maximal for p=0.5 when you can split the set in two subsets of equal sizes.
image

4 Likes

Hello again everyone,
I have performed some improvements in my code in order to beat this terrible legend IA!
It was hard, I have added these functionalities to be able to do it:

  • Compute fatality, being able to do surface + move + silence + torpedo in any order to win (I didn’t include mines)
  • If in danger (mines or torpedo) and silence possible, compute the best combo move/torpedo + silence. Move and torpedo are optionals, but I think it’s better to perform them first. In this case, torpedo was launched if I have more than 70% chance of touching my opponent.
  • Timeout security. With the previous combo, I lost quite few games in timeout, I modified my code to compute between 1 and 3 turns in advance according the already consumed time.
  • Better starting position. Thanks to one of the PM, I am searching for the biggest square, if equality the one this the best neighbour squares
  • Modifying my computed score:
    3000 * MY_POSS_POSITIONS + 1000 * REMAINING_TILES - 500 * TORPEDO_RISK - 500 * MINES_RISK + DIST_TO_OPP
    MY_POSS_POSITIONS => Number of positions where my opponent think I am
    REMAINING_TILES => Number of accessible positions (computed with Connection Point, less effective than Tron but not that bad)
    TORPEDO_RISK => If my opponent could shoot near me, what are the percentage of covered possibilities by shoot near me + penalty if on my position (so silence with 0 or 4 would be better)
    MINES_RISK => Probability computed on each tile for mine, score is the sum of my path
    DIST_TO_OPP => Well, not sure if it is useful, but trying to be at more then 6 distances from opp, unless my torpedo cooldown is better than opp, in that case I try to be at a distance of 4.
  • Power-up order: TORPEDO > SILENCE > MINE > SONAR. And a little modification. If SILENCE cooldown is slower than torpedo, it is better to go for silence.

Now I am ranking 16 and truly happy. to be there =)
Thanks a lot to post the game that quickly in multi, it might seems weird but I didn’t had the time to finish all what I wanted to do during the month of the contest ahah.

P.S: I think my filling area (to compute remaining tiles) is not performant (basic BFS). If someone have a good idea to perform it (with connection points or something similar) it would be perfect =)

4 Likes

How can i detect if an island is around my sub?