[Community Puzzle] Sokoban

This makes a lot more sense. With the way I’m currently doing it, there are too many valid, but “useless” states considered, like just moving the character random directions w/o pushing a box, and that’s probably why my node count is so high. Guess this requires a whole rewrite of everything I’ve got.

Figuring out unreachable positions and reconstructing the character path sounds really expensive though?

If you ignore the character position and just generate box pushes, how can you tell if a push is unreachable without also knowing where the character is? Do you keep some kind of character reachable area info with each state?

Also, do you do a separate BFS within each iteration to scan for reachable positions and then one at the end to reconstruct the character path to each box push?

Djoum’s approach is very interesting but not mandatory at all, in fact I wanted to try this but finally didn’t had too.

If you want to go Djoum’s way, just keep track of the player position after each push. At each step you can use a small DFS or BFS (doesn’t matter) to find the next possible pushes.

EDIT: Disregard the previous post. I finally started tracking how many iterations my BFS was doing vs the actual size of my visited nodes. I was iterating more times than I was actually finding unique nodes, despite pruning before appending to the queue. Added a check at the top for uniqueness (before any computation work was done) and suddenly the whole things runs much faster and passes all tests. No idea why as I’m very much an Algo noob. Thanks for the help everyone.

Reconstructing the character path is a chained BFS itself, but we’re talking a few hundred iterations at worst so the cost is minimal.
For the unreachable cells, I number areas on the map (each one is separate from the other) and I check the moves that have the same number as the character. Any time a box is moved, you know the cell where the character would be and you can include it in your BFS state. Doing this incurs some cost, but it dropped my number of iterations so much that it ended up being a massive gain.

Solved in JS with simplest BFS (just a tree and a hash for already visited states), with map in a wxh long string ( :shushing_face:), no heuristics, no metrics, just “don’t push in a corner please”.
Test cases timings:
3-5: 312k nodes, 2.0s
3-7: 194k nodes, 1.3s
3-10: 263k nodes, 1.7s

1 Like

Great puzzle!! It was very fun and the puzzle is very good for learning BFS,
both for searching in maze and generating possible states.
With a good pruning you don’t have to use the second given in the first turn, I didn’t use it.

1 Like

Very nice puzzle! Took me quite a long time to optimise and understand why my “alreadySeen” cache wasn’t working properly (coded a custom hash for state, without handling collisions, in go).

Also thanks for the hint that boxes order should have been ignored when hashing.

To torture myself practice, I translated my original solution to several languages.
As these version use exactly the same (deterministic) search order, it can be used as a kind of language benchmark. Of course, relative speed depends on MANY things, so these results should not be generalized or taken too seriously.

These are the average running times for IDE TEST #25 in turn 0 (when the complete solution is found) on my local machine:

language msec relative
C++ (optimize pragma) 85 1.00
C++ (default options) 242 2.85
Java 421 4.95
PHP 3,747 44.08
Python 1,776 20.89


  • My code searches 226,877 nodes for this input.
  • The pragma line for the optimized C++ run was: #pragma GCC optimize "O3,omit-frame-pointer,inline,unroll-loops"
  • All solutions use standard library data types (vector + unordered_map, Vector + HashMap, array, list + dict), and don’t use language specific code optimizations. The game state representation is encoded to a single int64, involving lots of bitwise operations.
  • Online results in CG IDE was quite similar. However Java seems to be twice as fast locally. I don’t know if this comes from Java v8 vs Java v17, or from different default settings.)
  • While other languages running times are fairly stable, Java result fluctuated by ±15%, most likely because of the garbage collector.

Hi, i try to solve it, but this is my score :

  • 100% on “scoria” tests
  • 80% on “scoria 2” tests (2 timeout)
  • 0% on “scoria 3” test (10 timeout)

I optimized my code as much as possible, but i don’t think this is the problem.
I saw that some of you need less than 100,000 iterations to find a solution, and i’m over 1,000,000 (2,429,671 on scoria 2 level 10 and i don’t timeout).
Is there a heuristic to change the BFS into A* ? Or did I miss something else ?

(I wanted to do an A* but I didn’t find the heuristic, since we don’t know in advance which box is associated with which target)

How do you compute a hash ignoring boxes order ?

I just have a straigthforward BFS in Python.
My nodes are tuples of 3 int: player x, player y and a bitboard of boxes positions.

Same here, but all in only one int64.
But i don’t know how to bitboard boxes position AND ignore boxes order.


For reference, I’m finding the solution (85 moves) for scoria 2 level 10 with BFS in 49,894 iterations.
I think that “ignoring the boxes order” is referring to if you have 2 boxes on positions x1;y1 & x2;y2 it doesn’t matter if it’s box ID 1 on x1;y1 and box ID 2 on x2;y2 or if it’s ID 2 on x1;y1 and box ID 1 on x2;y2

Yes, that is my problem. My hash is computed like that :

for (int i=0; i<n; ++i)
   hash |= (box[i].x<<4|box[i].y)<<(8*i);

So, the order matters here.

What about sorting boxes from top-left to bottom-right before generating the hash?

I tryied to sort, but it was too long (timeout on the very first test). Maybe i don’t know the good way to sort.
I think I’ll try to use an int128 to bitboard boxes with “0” for empty cell and “1” with a cell with a box.

I didn’t try yet the 128 bits integer, but I avoid corners, then I tryied that :

The result is better but i still have timeouts.

Test Simple BFS Multi BFS Multi BFS + gcc optim
Scoria level 1 6ms 5ms (not played)
Scoria level 2 3ms 1ms (not played)
Scoria level 3 3ms 3ms (not played)
Scoria level 4 7ms 9ms (not played)
Scoria level 5 4ms 4ms (not played)
Scoria level 6 5ms 8ms (not played)
Scoria level 7 9ms 21ms (not played)
Scoria level 8 17ms 21ms (not played)
Scoria level 9 3ms 4ms (not played)
Scoria level 10 12ms 5ms (not played)
Scoria 2 level 1 1,153s 1,083s (not played)
Scoria 2 level 2 1,241s 1,165s (not played)
Scoria 2 level 3 0,430s 0,191s (not played)
Scoria 2 level 4 0,504s 0,180s (not played)
Scoria 2 level 5 1,071s 0,772s (not played)
Scoria 2 level 6 0,841s 0,732s (not played)
Scoria 2 level 7 0,572s 0,224s (not played)
Scoria 2 level 8 0,154s 0,081s (not played)
Scoria 2 level 9 timeout timeout timeout
Scoria 2 level 10 0,734s 0,303s 0,063s
Scoria 3 level 1 timeout 3,5s 0,700s
Scoria 3 level 2 timeout 1,67s 0,414s
Scoria 3 level 3 timeout 8,5s 1,89s
Scoria 3 level 4 timeout timeout timeout
Scoria 3 level 5 timeout timeout timeout
Scoria 3 level 6 timeout timeout timeout
Scoria 3 level 7 timeout timeout timeout
Scoria 3 level 8 timeout 8,2s 1,85s
Scoria 3 level 9 timeout 5,99s 1,24s
Scoria 3 level 10 timeout 10,01s 2,04s

Is there another heuristic to avoid bad positions (like corners) ?

Similar to avoiding corners you can avoid segments when you know that the box would be stuck on that segment.
A segment is a part of a row with only consecutive empty cells (no target) with a wall on the left and the right. If at the row below or above all the positions of the segment are walls then no boxes should be pushed on any of the positions of the segment.

For example Scoria - Level 10 is:

  • On line 1 you have a segment with positions: 1, 2 & 3, since at line 0 there’s a wall for the positions 1, 2 & 3 you know you can forbid the position 2;1 (positions 1;1 & 3;1 are already excluded because they are corners)

  • On line 5 you have a segment with positions: 1, 2, 3, 4 & 5, since at line 6 there’s a wall for the positions 1, 2, 3, 4 & 5 you know you can forbid the position 2;5 & 3;5 & 4;5 (positions 1;5 & 5;5 are already excluded because they are corners)

The same logic applies to segments on columns, so you can also forbid the positions 5;3 & 5;4

1 Like

Sorting 5 numbers should not take too much time… When (and only when) any box is moved, I re-sort the box positions before re-packing them into an int64 game state. It is good not just for the hashmap lookup in the BFS to avoid revisiting same game state twice, but also to make checking for the goal easier, i.e. a simple int comparison.
Pruning the state space, as cerdicdd described, is also essential.
Depending on language choice, turn off garbage collection if possible.
Pre-allocating big vectors, instead of letting them grow only as needed also saves some time.