The topic where you can tell how you liked the challenge, & how you solved it.
Thanks to Codingame and Fiverr for organizing it, and congratulations to @MSz, @reCurse and @Nanaeda for the top 3!
I ended up around 56th place in Legend. This was an odyssey for me: I lost a lot of time to factors completely out of my control, and I was still stuck at the top of Gold in the final hours of the contest, but I reached Legend for the first time and I am very happy with my bot!
I believe part of the reason I got this far is that I wrote my bot in Rust. Historically I have used C++, but all the modern features and crates that come with Rust make everything much faster and easier for me to deal with.
The first thing I worked on when the contest started was simulating the game. For this I created a bunch of structs and methods for all of the minigames and the entire state of the game. When a minigame ended, I generated its random replacement using the logic from the game's source code on GitHub.
For example, to generate a new set of wind speeds for the archery game, I copied over the list of probabilities for each wind speed from the game's source code and randomly sampled from that list in my simulation. It didn't matter much if the code was suboptimal at this point; I could always optimize it later.
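Roughly, the sampling looked something like this (a sketch; the weight table and names are placeholders, not the referee's actual values):

```rust
use rand::Rng;

// weights[s] is the relative probability of wind speed s (illustrative values).
fn sample_wind(weights: &[u32], rng: &mut impl Rng) -> usize {
    let total: u32 = weights.iter().sum();
    let mut pick = rng.gen_range(0..total); // uniform in [0, total)
    for (speed, &w) in weights.iter().enumerate() {
        if pick < w {
            return speed; // cumulative weight reached: this speed is chosen
        }
        pick -= w;
    }
    unreachable!("pick is always below the total weight")
}
```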
The great thing about how I organized my code was that I could set up a local referee that used this simulation code, while my bot used the exact same code. That way I didn't have to worry about duplicated code introducing a mistake in one of them but not the other.
But what if both the referee and the bot are incorrectly simulating the game? For that I decided to set up some unit tests.
I played a game in the web browser and manually collected the inputs given to my bot for all 100 turns by printing them to the standard error stream with eprintln!(), and I also recorded the outputs of all three bots at every turn. I then created a big unit test locally to read in these inputs and outputs.
Next, for every turn I fed the recorded input string and each player's outputs into my simulation and compared the input string it produced for the next turn with the one I had recorded, ignoring the GPU string whenever a new minigame started, since I obviously couldn't predict that. (I did consider trying to crack the random seed, but CodinGame recently switched to a cryptographically secure random algorithm, so that was out of the question.)
To my surprise, the simulation was almost perfect out of the box; there were a few small mistakes, but I fixed them and then had accurate simulations!
Once I had the simulations working, the next step was to implement a framework that could take in an algorithm and use that to find the best action for me to take during every turn of the game.
I like making my code as general as possible because it allows me to reuse parts, so I created traits (you can think of them like interfaces if you have never used Rust) for game states, game actions and players. This way I could write my algorithms independently from the game code and avoid having to focus on specifics.
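The traits looked roughly like this (a simplified sketch; the real ones had more methods):

```rust
// Simplified sketch of the abstraction layer between algorithms and the game.
trait GameState: Clone {
    type Action: Copy;
    fn legal_actions(&self, player: usize) -> Vec<Self::Action>;
    fn apply(&mut self, actions: &[Self::Action]); // one action per player, applied simultaneously
    fn is_terminal(&self) -> bool;
    fn score(&self, player: usize) -> f64;
}

// An algorithm only needs to implement this to plug into the framework.
trait Player<S: GameState> {
    fn choose_action(&mut self, state: &S, player: usize) -> S::Action;
}
```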
Also, I decided to allow my algorithms to simulate until the end of the game even if the states were heavily impacted by randomness because algorithms such as Monte Carlo tree search should be able to adapt and learn from the random variation of these states.
Once my algorithm framework was set up, the first thing I tried was beam search. It failed miserably, but after debugging locally with VSCode and CodeLLDB I realized that the issue wasn't the algorithm but my scoring: I was using the product of the minigame scores, and that product was essentially zero for most of the game, so the beam search was picking moves more or less at random. To fix this, I wrote a heuristic scoring function based on different aspects of the games (how far ahead I was in hurdles, how close I was to the centre in archery, etc.), and this shot me straight up to the top of Silver.
Because I had a local referee, I could also write code to test different hyperparameters, which helped a lot; but after messing around with it for a while it became apparent that beam search wasn't going to get me higher than the very bottom of Gold.
The next thing I tried was Monte Carlo tree search with a branching factor of 4 moves ^ 3 players = 64 children per node. At first I implemented it with a plain UCT score, but then I read about Decoupled UCT (DUCT) and consulted a couple of papers to implement it in my code. I then optimized my hyperparameters, and surprisingly my heuristic wasn't helping anymore, so I got rid of it.
This sent me up to around the middle of gold, but yet again I couldn’t find anything else to improve in my code until I realized that I was only doing about 2000 simulations at every game turn. This was very surprising to me because I thought my code was efficient.
I was about to focus on optimizing my code when I noticed something. I saw MSmits in first place and a light bulb turned on in my head.
Everything was falling into place. I had read about Smitsimax before (MSmits has a very good write-up on the algorithm), and the main advantage I see in it is that there is a tree per player instead of one tree for the entire game: instead of num_moves ^ num_players child nodes at every level, you only get num_moves * num_players nodes (64 vs 12 for this challenge), greatly reducing the number of nodes that need to be stored in memory.
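The control flow is easy to sketch: each iteration, every player descends its own tree, the joint moves drive one shared simulation, and each tree is updated with its own player's reward. A condensed skeleton, with node selection and the game step stubbed out:

```rust
struct Tree {} // per-player UCT tree (details omitted)

impl Tree {
    fn select(&mut self, _depth: usize) -> usize { 0 } // UCB pick, stubbed
    fn backpropagate(&mut self, _reward: f64) {}
}

#[derive(Clone)]
struct Sim {} // full game state (details omitted)

impl Sim {
    fn step(&mut self, _moves: &[usize; 3]) {}
    fn score(&self, _player: usize) -> f64 { 0.0 }
}

fn smitsimax(root: &Sim, trees: &mut [Tree; 3], iterations: u32, max_depth: usize) {
    for _ in 0..iterations {
        let mut sim = root.clone();
        for depth in 0..max_depth {
            // Each player picks a move from its own tree.
            let moves = [
                trees[0].select(depth),
                trees[1].select(depth),
                trees[2].select(depth),
            ];
            sim.step(&moves);
        }
        // Each tree is updated with its own player's result only.
        for (p, tree) in trees.iter_mut().enumerate() {
            tree.backpropagate(sim.score(p));
        }
    }
}
```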
I strongly suspected (although this might not be the case) that MSmits was using Smitsimax, and since it's not too different from Monte Carlo tree search, it naturally felt like the next thing to try.
I finished working on it and it shot me up to around 100th place in gold. I knew there were more things I could optimize, but I was slowly running out of ideas and time was ticking.
There are a few major things that helped me clear the home stretch into legend:
I talked about this earlier, but I was achieving around 2,000 simulations per turn, and therealbeef mentioned on Discord that he was getting around 250k in C++, so I knew I was doing something wrong. I learnt that RustRover has a profiler (although I really dislike it), and using it I realized that the rand crate was eating up a LOT of time.
I decided to get rid of the crate and write my own random number generator with fancy bit magic, and that got me to around 30k simulations per turn.
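For the curious, a generator in the spirit of what I mean is xorshift64*, a few shifts and one multiply per number (the constants are the standard published ones; my exact variant may differ):

```rust
// xorshift64*: fast and random enough for rollouts.
struct XorShift64 {
    state: u64, // must be non-zero
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        Self { state: seed.max(1) }
    }

    fn next_u64(&mut self) -> u64 {
        self.state ^= self.state >> 12;
        self.state ^= self.state << 25;
        self.state ^= self.state >> 27;
        self.state.wrapping_mul(0x2545F4914F6CDD1D)
    }

    // One of the 4 moves, taken from the high (best-mixed) bits.
    fn next_move(&mut self) -> usize {
        (self.next_u64() >> 62) as usize
    }
}
```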
I also realized that I was allocating a lot of structures on the heap, so I replaced them with my own stack-friendly implementations written from scratch.
Additionally, I precomputed some of the GPU strings to save even more time. For skating there were only 24 possible strings, and for hurdles only 251; but for diving there were over 1 billion possible strings, and for archery almost a quadrillion, with different weights. So I wrote a script to simulate a whole bunch of strings for those two games and kept around 200 of the most frequently occurring ones for the precomputed arrays. These optimizations got me to around 43k simulations per turn.
I am sure that I could have squeezed out a lot more simulations out of my code if I had dedicated more time towards this, but for now this was going to have to do.
Looking on Discord I saw a comment by wontonimo:
A good eval has a smooth(-ish) gradient, and is an underestimate.
Followed by this comment by LazyMammal:
I’m guessing he’s talking about any mini-game that rewards medals. If you only count “whole” medals the reward function will be in discrete steps. It only gets worse if you are evaluating the combined score (product of medal points). Getting a reward of “zero” is much worse than a fractional reward.
This made a lot of sense, because at that point my score was just the product of the medal scores, which was still zero in most cases.
My old heuristic based on individual goals for each minigame didn't work, but adding the number of gold medals I had to the score made the evaluation function smoother and helped Smitsimax choose paths with more certainty. This really helped and shot me up to around 25th place in the Gold league.
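In sketch form, the evaluation became something like this (the gold-medal weighting here is illustrative, not my tuned value):

```rust
// Product of per-minigame medal scores, plus a gold-medal term to keep
// the gradient non-zero while the product is still stuck at zero.
fn evaluate(minigame_scores: &[u32; 4], gold_medals: u32) -> f64 {
    let product: u32 = minigame_scores.iter().product();
    product as f64 + gold_medals as f64
}
```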
There were only a few hours left and I was getting desperate. I couldn't find any more ways to improve my code and was facing a big refactor to raise my simulation count. I decided to re-write a few things such as Smitsimax, and while looking for copper I struck gold: there was a big mistake in my implementation of Smitsimax where I wasn't expanding the trees properly and was dropping the moves of some of the players.
I was honestly surprised that the algorithm even worked without some of the players' inputs, but I rushed to fix the bug and finally had an improvement. I submitted my code, and to my dismay I was losing a lot of games to timeouts. I tried lowering the time limit for Smitsimax from 50ms to 40ms, adding more places for the code to stop, and so on, but nothing worked. Then one of the re-submissions didn't have as many timeouts, and I ended up at around 5th place in the Gold league.
There was nothing left to do but wait anxiously to see whether I would climb to Legend: less than 12 hours remained and I didn't know why I was suddenly getting these timeouts. Luckily, after a few more hours I got the notification that I was in Legend!
I must say this has been a wild ride: a massive storm blew right past my city, leaving us without power and water for several days, and the transformer outside my house decided to blow up a few days before that, so I wasn't able to dedicate as much time to the contest as I had hoped.
Nevertheless this has been extremely fun and I learnt a lot by participating in this contest. Thank you to CodinGame and CGjupoulton for creating this contest, Fiverr for sponsoring it, and the rest of the community for your support and suggestions!
Hello everyone!
Thanks Codingame and Fiverr for this challenge, and GG to everyone!
I finished 170th globally with a heuristic bot based on the intersection of the best moves for every minigame.
More information here if you want: SummerChallenge2024 - Olymbits - Apofils PostMortem - Gold 100th - Global 150th · GitHub
Always a pleasure to participate in these challenges!
Hello!
I finished 172nd.
Here is my postmortem if you're interested: My post mortem for the codingame Summer Challenge 2024 | Yann Moisan
9th in Legend
I used a UCT Forest, with one tree per player.
I liked the game: discrete states and full observability. The only downside was the randomness of the games and the resulting noise in the ratings.
#7 in legend
I am using Smitsimax - so 3xMCTS; one per player. I do rollouts up to depth 30, replacing minigames as they finish. I keep a pool of random minigames generated in turn 0, so I can load them up quickly. If I am finishing the search before turn 100, I add a small amount of bonus points to all minigames, which represent the expected points players will get in the future minigames. I decay this bonus linearly as the game progresses.
I have a heuristic to rate all 4 moves given the game state: an array of size 4 in which the three minigames (hurdles, diving, archery) vote for moves by adding a heuristically chosen number of points to each move (U, L, D, R). For example, in some imaginary game state, the hurdles minigame might vote [0, 4, 4, 10], diving [0, 8, 0, 0] and archery [4, 5, -4, -5]; skating is ignored in this heuristic. The votes are then summed to [4, 17, 0, 5], and I perform the move with the maximum value.
I use this voting in two places: in random rollouts, where I add random noise to the votes (to increase playout diversity), and in progressive bias: an additional term in the UCB1 formula that encourages heuristically good moves while the number of visits in a node is low (without the random noise there).
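In miniature, the voting looks like this (vote values as in the example above; the noise closure returns 0 outside rollouts):

```rust
// Sum per-minigame vote arrays and play the argmax move.
fn vote_move(votes_per_game: &[[i32; 4]], mut noise: impl FnMut() -> i32) -> usize {
    let mut total = [0i32; 4]; // points for U, L, D, R
    for votes in votes_per_game {
        for (t, &v) in total.iter_mut().zip(votes.iter()) {
            *t += v + noise();
        }
    }
    (0..4usize).max_by_key(|&m| total[m]).unwrap()
}

// With the example votes above and no noise, totals are [4, 17, 0, 5]
// and the chosen move is L (index 1):
// vote_move(&[[0, 4, 4, 10], [0, 8, 0, 0], [4, 5, -4, -5]], || 0)
```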
For scoring I use just 0, 1 or 2 depending on the final result, plus a small bonus in the form of the difference of squared points between me and the opponents. The bonus encourages moves that close the point gap even when they don't yet bring victory.
I use an additional tweak to the random rollouts that gives a slight improvement: every odd iteration I do a fully random rollout without the voting mechanism. Not sure why it improves things, but it does. Perhaps my rollouts were too deterministic.
From a performance standpoint, I get about 17k rollouts (to depth 30) on the faster machines and 9k on the slower ones (there are various VMs on CG, and they have different speeds).
PS. Used psyleague, helped a lot!
BlitzProg: 798th (PHP - 1 turn heuristic)
Hi there. It’s been a while!
A game with only 4 commands made it look easy to build a simulation and rush to the Gold league fast. So I made one. Then I had no idea how to use it.
Eventually, I reached my goal without it.
Wood 2
Run until next to the hurdle, then jump.
Wood 1
Same thing, just combine all hurdles from all games into one GPU! (I don't know why the bronze boss loses to this.)
Bottom Silver:
Turns out you can still get half-decent performance with simple 1-turn heuristics. I force-win two medals in the diving game, then weight each move using wind strength x relative direction and how close the next hurdle is to my player.
Rank 2 silver to gold promotion
Wind x direction weighting is very naive, so a basic MC does it instead: random archery rollouts are performed for each move, and the best ending distance is then used to weight the moves.
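As a sketch (in Rust rather than my PHP, with the cursor bounds taken from the statement as I remember them):

```rust
// Tiny archery sim: the cursor moves by the wind speed in the chosen
// direction and is clamped to the arena.
#[derive(Clone)]
struct Archery {
    x: i32,
    y: i32,
    wind: Vec<i32>, // remaining wind speeds
}

impl Archery {
    fn step(&mut self, dir: usize) {
        let w = self.wind.remove(0);
        match dir {
            0 => self.y -= w, // UP
            1 => self.x -= w, // LEFT
            2 => self.y += w, // DOWN
            _ => self.x += w, // RIGHT
        }
        self.x = self.x.clamp(-20, 20);
        self.y = self.y.clamp(-20, 20);
    }
    fn done(&self) -> bool {
        self.wind.is_empty()
    }
    fn dist2(&self) -> i32 {
        self.x * self.x + self.y * self.y
    }
}

// Weight a candidate first move by the best (squared) ending distance
// found over a batch of random rollouts.
fn score_move(start: &Archery, first: usize, rollouts: u32, mut rand4: impl FnMut() -> usize) -> i32 {
    let mut best = i32::MAX;
    for _ in 0..rollouts {
        let mut s = start.clone();
        s.step(first);
        while !s.done() {
            s.step(rand4());
        }
        best = best.min(s.dist2());
    }
    best
}
```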
That’s pretty much it!
I developed an algorithm capable of emulating the four possible moves. For each move, I independently analyze each mini-game to evaluate the game itself and the potential medals.
Then I compare the emulation of each move by adding the four mini-game evals and the new total medal score.
If you want more details on my evaluations, on how I avoid 0-medal scores, and how I prioritized short-term over long-term medals: SummerChallenge2024 - Olymbits - jBigman PostMortem
85th, 23rd in gold
I didn't use any search, just trained a NN to pick the move it thinks is the best.
On the first day I trained directly against the referee using evolution strategies to see if a policy NN was feasible. It was, and that method alone was enough to reach mid Gold. But most of the time I used deep Q-learning to train the network.
Most of my time went into tweaking hyperparameters and the net architecture/inputs. I did several runs from scratch, which gave me several differently trained bots to test against. The bot was easily in the top 100 the whole time, but unfortunately that wasn't enough for Legend. I desperately tried several things at the end, even some search, but it didn't work, because I had never done serious simultaneous-move search before, let alone with 3 players plus randomness. All I can say with pride is that my bot doesn't run any simulation. It was weird throughout the runs to see my bot among the strongest, yet it would lose with 0 points against two default AIs in the IDE because it earned no gold or silver medals in hurdles; since it got all golds in the other minigames, it still thought it was winning (its move scores stayed high).
Given how easy the game was to simulate, I wonder why I didn't try that from the start; 2 weeks should be enough to learn about tree search for such games. Meh, maybe next time.
I really liked the design of the game, the pixel art and the simple rules, though the statement contained some errors and was not 100% clear in places. The viewer could show more detailed information, at least in debug mode; people on Discord were constantly asking why they got 0 points when they had the most gold medals :). It seems using AI for cover images is a tradition now.
Appendix:
The NN has 2 hidden layers with 200 nodes each, 214 inputs and 4 outputs: 214x200 + 200x200 + 200x4 weights. The inputs are from the perspective of me vs. the others. I experimented with various inputs; one tip from Lysk was to encode hurdles relative to the players, which visibly sped up the training (in my plot, the y-axis was win rate against test bots and green was the updated inputs).
Deep Q-learning params:
54th in Legend, 2nd in C.
My first legend finish in a contest!
Reproducible Tests
I did not create an offline tournament this time, as I just end up tuning my code to beat myself, which doesn't always translate into stronger bots.
Instead I found strong, deterministic bots (jack, sullyper) from among the other players and tested against those with a specific list of seeds.
Simulate Accurately
Try to avoid artificial weights and heuristics.
Fixed-depth look-ahead: compute all 4^depth move combinations and outcomes for my player, with depth = 8 (9 if I had time). Compare the player's simulated results to the best/worst possible opponent results, e.g. definitely 1st, possibly 2nd, etc.
For Skating, depth = 1. I considered all 16 combinations of opponent moves, but not equally: opponents are much more likely to move in the direction Diving wants or to jump a Hurdle.
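A sketch of the enumeration in Rust, encoding each of the 4^depth sequences as a base-4 integer; the closure stands in for my real simulation and outcome comparison:

```rust
// Try every move sequence of the given depth and keep the best one.
fn best_sequence(eval: impl Fn(&[usize]) -> i64, depth: u32) -> (i64, Vec<usize>) {
    let mut best = (i64::MIN, Vec::new());
    for code in 0..4u64.pow(depth) {
        // Decode the base-4 integer into one move per turn.
        let seq: Vec<usize> = (0..depth)
            .map(|d| ((code >> (2 * d)) & 3) as usize)
            .collect();
        let score = eval(&seq);
        if score > best.0 {
            best = (score, seq);
        }
    }
    best
}
```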
Combine Results
Weight games where I am behind in medals more heavily.
Games that are in GAME_OVER, won't finish before turn 100, or whose outcome can't be changed (1st/2nd/3rd already locked) can be ignored.
Hi @therealbeef, interesting, thanks! Can you explain the process to gather data and train the NN for each minigame?
#103 Global
Postmortem: GitHub - cegprakash/cg_olymbits_post_mortem
Feedback:
Legend 5th
Thank you CG and all the participants for this contest!
Base search algorithm: Decoupled PUCT without chance nodes.
Decoupled MCTS, an MCTS variant for simultaneous games: XMash Rush (CC07) - Feedback & Strategies - #38 by jolindien
No chance nodes, sometimes referred to as “open-loop”: monte carlo tree search - MCTS for non-deterministic games with very high branching factor for chance nodes - Artificial Intelligence Stack Exchange
Selection: PUCT instead of UCB.
I started with a heuristic similar to @gaha's, where each mini-game votes for its preferred move, to guide the search and prefer promising moves for each player.
Finally I trained a very small network to replicate (via supervised learning) the UCB policy for a single player, by self-playing lots of games with a high number of sims.
Simulation: rollouts until the end of the mini-games, with moves sampled from the same policy network.
Reward is my_score / total_score (with an epsilon-sized free silver medal for all players).
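For reference, the PUCT term is the usual AlphaZero-style one, with the prior coming from the policy network (the constant c is a tuning parameter):

```rust
// PUCT value for one child: exploitation + prior-weighted exploration.
// The child with the highest value is selected.
fn puct(q: f64, prior: f64, parent_visits: u32, child_visits: u32, c: f64) -> f64 {
    q + c * prior * (parent_visits as f64).sqrt() / (1.0 + child_visits as f64)
}
```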
Things that did not work:
Java #Legend.
First time I've done an MCTS during a contest (it always seemed a bit suicidal in Java).
I did a “DUCT”:
But I only get a few thousand iterations each round. If the contest had lasted longer, I would have tried to translate my code to C++.
That was a really fun contest. Really nice idea. Thanks Codingame.
Thanks to the organizers for the contest. I made rank 8, my best contest rank on CG in a long time, as I've mostly been focused on optimization contests on other platforms and boardgame arenas on CG. I'm quite happy with the result and hope to join again next time.
Here’s my post mortem
It’s a bit short, but if I need to explain more, just ask me a question. I’ll answer and change the post mortem also.
Thanks to CG for continuing the good work of setting up regular bot challenges. I reached bottom Legend with a lucky push.
I really liked the very idea of this game, where one action controls 4 different games. I originally thought I wouldn't have time for it, but the concept was so fascinating I had to participate. Although the game was frustrating in many aspects at the high level, I really enjoyed the first week of the challenge and the community discussions with all the familiar faces, like in old times.
Strategy-wise I tried ideas very similar to those mentioned by other players, trying to make Smitsimax work, although I had less luck in my chosen directions. I also think I created way more hyperparameters than needed and couldn't resist tweaking them, which was absolutely pointless considering the randomness of submits.
A few things that actually worked:
A result I'm very proud of! The contest was fun.
I've written a pretty standard DUCT algorithm, nothing too fancy, based on the MCTS algorithm:
In every node of the tree, each player chooses one of the 4 possible moves via the UCB formula. I stop expanding the tree when all 4 games are over. In practice, Rollers is only played at the root node, because beyond that its future states are not known.
The score I back-propagate from the leaves to the root is a weighted difference of the total scores (real or expected) after the 4 games are finished, with the same weights throughout the tree.
For instance, if A is currently winning, he wants to widen the gap with the second player, say B, so the moves A chose are rewarded with the score SCORE[A] - SCORE[B]. If A is right between B and C, I reward his moves with SCORE[A] - 0.5 * SCORE[B] - 0.5 * SCORE[C], and so forth. Locally, on my computer, I found that this score difference gives much better results than the raw score.
In the UCB formula I divide the sum of the rewards by a “normal” score, e.g. an average of the expected scores, so the values land roughly in [-1, 1]; it does not matter much if they don't.
It makes around 40k iterations per turn, and at the end of the search I choose the most visited move.
I use big arrays to get the expected final score of each game in O(1). I assume all players play randomly (like a rollout). For example, in Hurdles, if A, B and C are respectively on cells 13, 22 and 24, with no hurdles in sight, I compute that they have expected scores of 0, 1.58 and 2.92 respectively (under random play).
I couldn't store every full state as that would be too big, so I came up with this approximation: instead of doing A vs B vs C, I look at A vs B and A vs C separately. In hurdles, for instance, I have an array P where P[xA][xB][stunA][stunB] is the probability that A wins (or ties) against B alone under pure random play; the same array gives the probability against C. If the former is denoted pAB and the latter pAC, then the expected score of A in this game is:
E = 3*pAB*pAC + 1*(1-pAB)*pAC + 1*(1-pAC)*pAB
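As code, still assuming the two matchups are independent:

```rust
fn expected_score(p_ab: f64, p_ac: f64) -> f64 {
    3.0 * p_ab * p_ac             // gold: A beats both B and C
        + (1.0 - p_ab) * p_ac     // silver: A beats only C
        + (1.0 - p_ac) * p_ab     // silver: A beats only B
}
// e.g. expected_score(0.5, 0.5) == 1.25 and expected_score(1.0, 1.0) == 3.0
```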
Thanks to Codingame for this challenge!
438th global / 376th gold
I played only 3 of the games; the roller didn't exist for my bot.
My solution works in two parts.
Part 1: select paths to explore
I did it like a Monte Carlo, but I carefully selected the paths to explore instead of picking them randomly.
I started from the ideal path for swimming, in which I sacrificed the last 3 turns of archery to test the 64 possibilities.
Part 2: simulate
I simulated the 3 games (hurdles, archery and swimming), and every time I fell because of a hurdle, I added the path that would have avoided the fall to my list of paths to simulate. At the end of each simulated path, I compute a score and select the path with the best score.
In the last 2 days I added a feature to ignore the swimming game when it couldn't change the outcome (I had already won or already lost it, regardless of the actions). In that case, I simulated all possible paths to depth 6. This moved me up 50 to 100 places in the ranking.
Thanks CG for the game, it was really interesting; it was the first time I used simulation in a contest.
Thanks for the challenge. This was a very good game.
Ended 12th and 1st in C#, missing the frog.
Key features:
```csharp
5 + scores[gameId] / maxScore * (Program.Turn > 30 ? 20 : 5);
g.GetScore(playerId) * Priority[player][gameId] * (!done ? 0.8 * g.TimeLeft() : 1.0)
```
Please let us know about the t-shirt prior to the contest next time; I would have skipped sleep for a top-10 t-shirt.
And thanks for the screenshot prior to the contest. It made me create this interface plus Diving, Archery and an initial version of the search (I didn't get all the details correct in the sims, but it saved me some time once the contest started):
```csharp
interface IGame
{
    void Reset();
    void Save();
    void Move(int[] moves);
    void GenerateNew();
    double GetScore(int player);
    double GetScore(int player, int move);
    int GetState(int player);
    void SetFromState(int player, int state);
    bool IsDone();
    void ProgressTime();
}
```
I decided early on that I didn't want to spend much effort on the contest, so I would just focus on heuristics and not try anything fancy. I was pleasantly surprised to make it into the Gold league; in the end my solution was about 280 lines of Python.
I used heuristics to score each move in each minigame independently. To do this I structured each minigame into its own class with a method that returned a MovePrefs object (for example MovePrefs(UP=2, DOWN=2, RIGHT=3, LEFT=1)). The main loop would then take the weighted sum of these preferences to pick the best move.
Each minigame's preferences were weighted by 1/(current_score_from_minigame + 0.1), so the games where I had scored the least counted the most.
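In sketch form (Rust rather than my Python, with made-up names), the combining step looks like:

```rust
// Weighted sum of per-minigame move preferences; each game's weight is
// 1 / (its current score + 0.1), so low-scoring games get more influence.
fn pick_move(prefs_and_scores: &[([f64; 4], f64)]) -> usize {
    let mut total = [0.0f64; 4]; // UP, DOWN, RIGHT, LEFT
    for (prefs, score) in prefs_and_scores {
        let w = 1.0 / (score + 0.1);
        for i in 0..4 {
            total[i] += w * prefs[i];
        }
    }
    (0..4usize)
        .max_by(|&a, &b| total[a].partial_cmp(&total[b]).unwrap())
        .unwrap()
}
```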
I really liked the concept of having multiple simple minigames played at once. Compared to some of the previous challenges it was really easy to understand how the game works and get started with coding (I don't think it was even necessary to start with only one minigame). I think always having the same 4 minigames was a huge missed opportunity: a pool of about 8 minigames out of which 4 would be randomly chosen (with repeats!) for each match would have made the challenge much more interesting.