#6 Legend
I wanted to be a bit original with my search algorithm, and the thought I started with was that I really, really disliked the idea of a UCT forest/smitsimax for this game and wanted to fix it. The reason I thought smitsimax wouldn’t do well here (and hey, I beat all the smitsimaxers, so I guess I was right) was the diving minigame: an opponent’s 2nd move (continue the diving streak or not) often depends heavily on whether I chose to dive on my 1st move, and in those cases the results of smitsimax would probably be garbage.
In an attempt to fix this, I combined smitsimax and DUCT: I still use three separate trees like smitsimax, but instead of one tree per player, each tree is a DUCT-like tree for a pair of players. Selection still works as in smitsimax, except that each player sums the UCT formulas from the 2 trees they appear in, and instead of a random rollout, I evaluate minigames in progress after one tree expands (skating only at depth 1; rollouts worked worse for me). I hoped that this way I could reach higher depth than DUCT would, without losing the player interactions the way smitsimax does.
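To make the selection step concrete, here is a minimal sketch under assumed data structures - the stats layout, the exploration constant `C`, and the function names are mine, not taken from the actual bot:

```python
import math

# Sketch of pairwise-tree selection: three trees, one per *pair* of players,
# and each player picks the action maximizing the sum of UCT scores from the
# two trees that player appears in. All constants and layouts are assumptions.

C = 1.4  # exploration constant (assumed value)

def uct(parent_visits, child_visits, child_total_value):
    if child_visits == 0:
        return float("inf")  # always try unexplored actions first
    exploit = child_total_value / child_visits
    explore = C * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def select_action(player, trees, num_actions):
    """trees maps frozenset({a, b}) -> node stats for that pair's tree:
    stats["n"] is the node's visit count, and stats[action] is a
    (visits, total value) tuple from `player`'s perspective."""
    best_action, best_score = None, -float("inf")
    for action in range(num_actions):
        score = 0.0
        for pair, stats in trees.items():
            if player not in pair:
                continue  # only the 2 trees containing this player contribute
            visits, total_value = stats[action]
            score += uct(stats["n"], visits, total_value)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

The point of summing the two UCT scores is that each player's choice is informed by its interaction with both opponents, rather than by a single per-player tree.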
Of course, it backfired.
Consider a game state where 2 players are 3 spaces away from the hurdles minigame’s goal, and the diving minigame is just starting with a first action other than RIGHT. This is a prisoner’s dilemma: if both players agree to split the hurdles gold medal one turn later, they can both start their diving streak and not fall behind the third player in that minigame, but there’s also the risk of one player just grabbing the hurdles gold alone in one turn. My algorithm consistently converged to cooperating with the other player, and it kept getting ‘betrayed’ on the leaderboard. My attempts to fix this were only partially successful: the medals gained by the opponents were already in my eval; decay made the bot play much worse (maybe it messed with the relative weights of the different medals); adding a bit of randomness to tree node selection helped a bit but didn’t solve the problem completely; paranoia and contempt did not work at all.
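The hurdles/diving dilemma above can be written as a toy payoff matrix. The numbers below are illustrative only, not actual medal values from the game:

```python
# Toy payoff matrix for the situation described above: each player chooses
# between rushing the hurdles finish ("betray") or starting the diving
# streak ("cooperate"). Entries are (player A's payoff, player B's payoff);
# the values are made up for illustration.

payoffs = {
    ("cooperate", "cooperate"): (2, 2),  # gold split later, diving streaks intact
    ("cooperate", "betray"):    (0, 3),  # B takes the hurdles gold alone
    ("betray",    "cooperate"): (3, 0),
    ("betray",    "betray"):    (1, 1),  # gold split now, both diving streaks lost
}

def best_response(their_action, me):
    # Pick the action maximizing my payoff, given the opponent's fixed action.
    def my_payoff(action):
        key = (action, their_action) if me == 0 else (their_action, action)
        return payoffs[key][me]
    return max(("cooperate", "betray"), key=my_payoff)
```

With these payoffs, betraying is a dominant strategy - it pays more no matter what the opponent does - which is why a search that converges to mutual cooperation keeps getting punished on the leaderboard.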
The main problem, even once I found some workarounds, was that my selfplay became mostly useless: any bot that switched to betraying more often was outcompeted by the cooperating ‘friendship forests’ in my local arena, so my attempts at parameter fiddling based on selfplay winrate did not go well - which is how I originally found out what was happening with my bot.
Not having reliable selfplay meant I had to come up with an evaluation function for unfinished minigames some other way. I assumed that the number of spaces a player advances in skating and hurdles is approximately normally distributed, and therefore the probability of one player being overtaken by another is a sigmoid function of that player’s lead. I downloaded a lot of top players’ games using the CodinGame API to verify that this is indeed the case, and to find functions that approximate the average winrate reasonably well. The same approach worked for archery (the sigmoid is a function of distance from the center divided by average wind speed) and diving (a function of the number of turns the player ahead would need to skip diving at the end of the game to be overtaken by the player behind). The approximation gets imprecise when only a couple of turns are left in a minigame, which would be a problem for skating evaluated at depth 1, so I also built a dataset of win probabilities (~1000 data points) for skating states close to the minigame’s end (how many spaces a player is ahead, each player’s risk, turns left). For illustration, here’s the relation between a player’s lead in hurdles, their position, and the average game result, based on ~10000 legend league games: