The move generation is great! Kudos to you!
Glad to help!
Spoilers: I'll provide some of the code I used, with more commentary than your original question called for, for everyone who might be interested in my modest solution.
First, I created some magic matrices.
The first one, Transformations, is easy to get. On paper, write down the base one:
0 1 2
3 4 5
6 7 8
then play with it; Axis V, for example, will be:
2 1 0
5 4 3
8 7 6
(the first and last columns are swapped)
etc…
In the end it gives this:
#define N_TRANSFO 8
static constexpr uint32_t Transformations[N_TRANSFO][9] = {
{0,1,2,3,4,5,6,7,8}, // Identity
{6,3,0,7,4,1,8,5,2}, // Rotation 90
{8,7,6,5,4,3,2,1,0}, // Rotation 180
{2,5,8,1,4,7,0,3,6}, // Rotation 270
{2,1,0,5,4,3,8,7,6}, // Axis V
{8,5,2,7,4,1,6,3,0}, // Axis TR
{6,7,8,3,4,5,0,1,2}, // Axis H
{0,3,6,1,4,7,2,5,8}, // Axis TL
};
Then the goal is to be able to answer the question: "If I have this grid, which was obtained by applying the transformation id_transfo, what was the original?"
For this, another matrix: InvTransformations.
I derived it by writing a script (because I lost myself trying to do it by hand):
int inverse[8][9] = { 0 };
for (int t = 0; t < 8; ++t) {
    for (int i = 0; i < 9; ++i) {
        inverse[t][Transformations[t][i]] = i;
    }
}
for (int x = 0; x < 8; x++) {
    for (int y = 0; y < 9; y++) {
        cerr << inverse[x][y] << ',';
    }
    cerr << endl;
}
This yields:
static constexpr uint32_t InvTransformations[N_TRANSFO][9] = {
{0,1,2,3,4,5,6,7,8},
{2,5,8,1,4,7,0,3,6},
{8,7,6,5,4,3,2,1,0},
{6,3,0,7,4,1,8,5,2},
{2,1,0,5,4,3,8,7,6},
{8,5,2,7,4,1,6,3,0},
{6,7,8,3,4,5,0,1,2},
{0,3,6,1,4,7,2,5,8},
};
Now we can create the function grid_hash_original, which will be used at the end to compute the final result.
…
static constexpr uint32_t divs_10[9] = {
100000000, 10000000, 1000000, 100000, 10000, 1000, 100, 10, 1
};
static constexpr uint32_t grid_hash_original(uint32_t grid, int32_t transformation_idx = 0) {
    const auto& perm = InvTransformations[transformation_idx];
    uint32_t result = 0;
    for (int idx = 0; idx < 9; ++idx) {
        uint32_t value = (grid >> (idx * 3)) & 0b111;
        result += value * divs_10[perm[idx]];
    }
    return result;
}
Even though it looks barbaric, it's actually quite simple:
The grid is stored in blocks of 3 bits (the values go up to 6, so we don't need more).
Doing (grid >> (idx * 3)) & 0b111 gives the value of the cell at idx (0 to 8).
Applying the inverse transform and multiplying by the correct power of 10 gives back the real placement of this value.
For the exploration, I opted for BFS. My reasoning was simple: "I don't see any gain in going deeper first, since I would have to explore all possibilities anyway."
I have a queue of what I called ‘State’ which is in fact just a structure:
struct State {
    uint32_t grid;
    uint32_t turn = 0;
};
Random thought: I was a bit sad about this structure; I could not fit everything in a single uint32_t: the grid takes 27 bits (9x3), and turn can go up to 40, so 6 bits, which gives 33 bits. Maybe I could have been smarter and found a way around it, but I did not.
Random thought 2: I don't actually need to store the turn up to 40, just the notion of 'previous'/'current'; the real turn number could be kept in a global elsewhere. Well… too late to change it now!
Each time I pop a grid from the queue, I create all its possible evolutions and then check whether we are at the end (i.e. no evolutions found or max depth reached). If it's time to sum the result, I use the grid_hash_original method.
To avoid putting too many states in the queue, that's where the cache + transformations play their part!
In addition to the queue, I have a (custom) Map that stores entries:
using TransformationCount = uint32_t[N_TRANSFO];
class Map {
    // ..
    struct Entry {
        uint32_t key = EMPTY_KEY;
        TransformationCount value;
    };
    // ..
    Entry table[MAX_SIZE];
};
(note: to save time, MAX_SIZE is hardcoded so there is enough memory for depth 40 starting from grid 0. I hope I didn't screw this one up for the last hidden validator of the contest. Otherwise I'm KO.)
Anyway, before adding a new grid to the queue, I check this map (named next_turn_states):
for (size_t i = 0; i < i_grids; ++i)
{
    uint32_t grid = all_next_grids[i];
    int32_t id_transformation = normalize_grid(grid);
    // Count states
    if (!next_turn_states.has(grid)) {
        // Add to path searched
        q.push({ grid, next_turn });
    }
    apply_and_add(id_transformation, next_turn_states[grid], current_transformation_count);
}
apply_and_add is the operation I referred to as count[normalized_grid] = transformation_id (x) count[normalized_grid], and it does this:
static constexpr void apply_and_add(int32_t transfo, TransformationCount& lhs, const TransformationCount& rhs)
{
    const auto& map = TransformationsMatrix[transfo];
    for (int i = 0; i < N_TRANSFO; ++i)
        lhs[i] += rhs[map[i]];
}
So the magic is in the TransformationsMatrix:
static constexpr uint32_t TransformationsMatrix[N_TRANSFO][N_TRANSFO] = {
{0,1,2,3,4,5,6,7},
{3,0,1,2,5,6,7,4},
{2,3,0,1,6,7,4,5},
{1,2,3,0,7,4,5,6},
{4,5,6,7,0,1,2,3},
{5,6,7,4,3,0,1,2},
{6,7,4,5,2,3,0,1},
{7,4,5,6,1,2,3,0},
};
I got it by first writing the first row on paper, then tinkering around with a script.
It can be read as a table that answers the question: "If I have a rotation 90, and then I mirror it along V, what transformation do I have in the end?"
The idea behind is that at first you only have the input grid, and its transformation count is {1,0,0,0,0,0,0,0} (i.e. 1 identity).
This one gives children.
For example: Let’s say A and B.
A is normalized. (Meaning, to get back to A, you have to apply the inverse of the transformation we used to normalize it.) If it's the first time we see A_norm, we put it in the queue.
Then, the count for A_norm will be the count from the parent of A with respect to the transformation for A_norm.
B is normalized. Let’s say we are lucky, and A and B have some symmetries. Thus, B_norm will be the same as A_norm. (With a different transformation).
We don’t put it in the queue.
However we have to count it. So we add to the count of A_norm, the count of the parent of B with respect to the transformation of B_norm.
At the end we have (in this example):
A_norm = {0,0,1,0,0,1,0,0}.
Then we continue to evolve these. The queue will pop A_norm, and so on; we keep applying this algorithm.
In the end, we are always one inverse transform away from the real (original) grid we would have had if we hadn't normalized everything.
This works because these transformations are reversible (no information is lost) and form a small group.
Voilà, hope it clarifies a bit my approach !
Here is the postmortem I wrote: CG Spring Challenge 2025 · GitHub (already shared it on Discord, but thought I would post it here as well)
Final rank after recompute: 12
I shared mine on discord too, probably should have posted here as well:
Postmortem from me (#2): CG Spring 2025 Postmortem · GitHub
Hi! Do you know if the challenge will be republished as a normal game?
I want to keep working on my code for this but I don’t know if I’ll have the full suite of test cases to test it on.
Here is my write-up (1st place):
Frictionless Bananas - CodinGame Spring Challenge 2025 Entry
BFS!? You savages! I gave it about 30s of thinking time, and thought: what can be done with BFS? I can pop a node off a stack. Generate its children…and push them onto a stack … and to get answers I need to loop through the whole DAG afterwards anyway? And saw no benefits.
I (#120 with 500ms) do almost exactly what @FrictionlessBananas does, except:
- I chose DFS, for the very reason that I can reuse computations that "bottom out" early. That is, if I find a branch where a depth-12 search finds that depth 9 was enough, any search of depth >9 can be looked up. This gave a modest 5-10% speedup.
- Not in a million years would I have thought of measuring the time before program start and input becoming available! Just assumed it was there “immediately”. Instead I tried to add some page touching after a bunch of prefetch instructions to group cache updates into one spot. Nice one.
- Storing the number of empty cells in the high bits… was the overhead of updating that worth it over __builtin_popcount((b | (b >> 1) | (b >> 2)) & 0111111111)?
(And maybe we had _mm256_popcnt_epi32? Apparently we had _pext_u32, which I thought we didn't… but then CG has virtually NO information about anything)
At least I am happy I was wrong about people being able to precompute the game compile time. When someone from Mountain View submitted a 0ms solution after like 24h, I was almost gutted enough to quit…
I really enjoyed the contest and learned a lot.
BFS, clearing HT between depths.
Let me share two interesting attempts that didn’t quite work out for me.
Rough tests suggested that the state count could be reduced by a sizeable amount if states appearing at different depths were merged together. What I tried was adding another count array to the state, representing counts for depth - 1.
The first problem was checking if a new state had already appeared at a previous depth. One option was to alternate between two hash tables, but checking each state in both tables would be too costly.
So I tried periodically not clearing the hash table and including a single bit for depth as part of each hash entry, to track which state array it belongs to. Then, if the stored state hadn’t yet been expanded at the current depth, I could move the counts.
This didn’t reduce the state count by much - often a new state has something in its own 2nd count array and hence we can’t get rid of this state.
Note that “faster” states have had larger captures, while slower states have had more smaller captures. I’m now thinking a better approach would be for the second count array to represent depth + 1, include a depth delta in the state (so it represents current depth + delta), and make “faster” states wait and expand after slower ones.
I assumed most transpositions come from different move orders starting from a common ancestor, rather than from board symmetries. So I looked for ways to avoid computing canonical boards and rotating counts unless necessary.
One idea was to come up with a non-unique board representation that could be computed quickly. There are several symmetry-invariant properties:
- value at cell 4
- popcount of the board and various masked subsets (e.g., mask using upper bits of even cells and lower 2 bits of odd cells)
- corner/side cell canonical value (imagine an empty board with only the corners possibly filled with dice). If a 12-bit lookup is too cache-costly, one can try extracting the lower 2 bits of each corner and looking up the canonical value in a 256-element char array.
- number of empty cells
- with a bit more memory, we can track things like the sum of dice
On a hash hit, check the raw board first (no symmetry) and only compute all rotations if needed.
Unfortunately, this increased collisions a lot—both due to different boards sharing the same representation and the fast hash not distributing similar inputs well enough. Also, the proportion of state matches resulting from symmetries was higher than I expected.
My postmortem (14th, 622ms before the rerun, in OCaml!): Spring Challenge 2025 - Rust/OCaml
My first optimization contest, and strangely it's my second best so far. My final result is far from impressive (363/2865), but I'm proud of myself that I managed to implement all of my ideas. And I see a small improvement in my skills.
The challenge was interesting, simple to start and simple to simulate. My first idea was to implement a "vanilla" DFS with a tree of states, a 3x3 board array, no pruning and nothing fancy. Even this took me a day: 3 sessions of around 2 hours each. I'm not sure where the problem is; I've implemented such algorithms many times, I regularly train on CodinGame, my day job is related to such algorithms, I even teach students these topics… I assume it's just a skill issue; my nature is just built this way.
After I managed to implement the above idea, I wanted pruning. I added a std::unordered_map whose key is a pair of the board hash and its depth, and whose value is the sum for the state. Then I wanted to remove the tree structure and rely only on a simple while loop; this again took me a day, and I encountered many bugs backpropagating the sum in my frontier array. But I managed it.
Then I wanted to simplify the board representation, with 3 bits per cell and 5 bits for the depth, and guess what: again more than a day of work…
The contest was nearing the end. I wanted to use my own hash map, I’ve tried a couple of implementations but no luck there.
Probably the most interesting part for me was that I asked ChatGPT for help a couple of times and it gave me very buggy code. I decided to scrap it and implemented my ideas myself. So I guess I kind of beat the chatbot, but maybe my prompts weren't specific enough.
I’m happy with my result, but still the dream to reach top 100 in a contest stays alive and still so far far away…
Thanks for posting. I was also wondering: not only did no one get 0ms in the rerun, but many of the top performers are also gone… so confusing
These are undeniably exigent and trying times for speedrunners, whose presence is no longer met with the warmth of welcome in this hallowed arena.
Verily, it may be posited that forthcoming contests shall offer a more felicitous stage upon which their prodigious skills and unwavering dedication might shine resplendently.
Let us thus hope that such events shall rise to the occasion, providing a fitting tribute to their artistry and indefatigable spirit.
Can anyone enlighten me on the approaches they used to achieve 100% in Python? While the top solutions are well covered (kudos to the authors for the delicious details), there is a lack of postmortems from Python programmers, despite many 100% solutions on the leaderboard across a wide range of scores. Even though it’s obvious that you won’t be able to win this kind of competition in Python, it’s still a great opportunity to sharpen your Swiss army knife.
What I tried: BFS, lru_cache, precomputed LUTs, some numpy hacks without any significant success in massive vectorization. I read about accounting for transformations on Discord, but implementing that, as well as trying DFS, was a bit beyond my capabilities in the time allotted.
P.S. Overall, I was completely hooked by the simple rules and the feeling of immediate progress with every little effort put in the right direction while trying to overcome each next test case. It’s been a while since I’ve had this much fun and motivation. A truly refreshing experience. Thank you, CG Team!
You can achieve 100% in Python (as well as in other languages) with a simple algorithm. I am not a master in Python, but I was able to help someone reach 100%.
What you need is a BFS. This will pass the first 5 tests. For the 6th one, you need some memoization: for each depth, keep the grids you have already seen at this depth in a map. The key is the grid and the value is the number of times you have seen it. This will save you a lot of time.
The important point is how you implement your grid. If it is a simple integer, this should be enough. Otherwise, you'll need to add an Equals method (or the equivalent in Python) for your map to work correctly. If you do not, your map will compare memory addresses and all grids will look different. With a proper Equals method, your map will work correctly and you can reach 100%.
This is very similar to my BFS implementation. I used a map (Counter from the collections module) to count the repeating grids, but used a string representation instead of an integer. Guess I have some room for improvement now. Thanks for the hints.
Will the challenge be available as a puzzle? I only got 66% in Java; I need to improve.
Hi all,
It was possible to share your code after a submission; how can I see other people's code, especially the top 10 contenders'?
Is it even possible? I want to study some solutions to improve my optimization-fu.
Thank you
There is a discussion on Discord, they have some problems with publishing the puzzle. Guess we will have to wait until next week.
Thanks for the news, I also would like to try some new ideas, before reading the PMs!
I found at first that this contest was a bit special, and saw it initially as a language optimization problem… So my first goal was simply to solve it with 100% in my day-to-day language, Java… Went down to 16k with that. Simply translating this algorithm to JavaScript allowed me to break down to 5k… So in the end I decided to try Rust, which I had never used before despite the trend, and that led me to 2.5k, still with the same algorithm, based on memoization of (depth, grid) -> score.
While I was in the top 5 in JS, I was not even in the top 50% in Rust… So I definitely changed my mind: it's not just about choosing the right language. And now I want to try soooo many things that I didn't explore enough.
Thanks for the contest, and for making me try Rust; that's a nice discovery.