[Community Puzzle] Sokoban

I just solved it with Python and indeed, a simple BFS was enough.

Very basic BFS with sorted tuples for boxes positions and no pruning → 76%
Replace sorted tuples with bitboard → 96%
Basic pruning (don’t push boxes in corners) → 100%

I would have added the following rule if it wasn’t enough:
don’t push a box near a wall if the wall ends with corners and there is no release spot along the wall
(those forbidden tiles can be precomputed)

For languages that have char representing integral values in range of [-128, 127] notice that you can encode each position as a value between 0 and 121 (if you flatten the grid, so it becomes one big array of size width * height), so you can encode each position in a char, and have your game state represented in a string of length number of boxes + 1 (1 for the pusher) where each character is a position in the grid, which is a perfect hashing function.

Anyone have anymore hints for this? I’m stuck at 80%. I’ve read the previous discussion here and the author states that this can be done without bitboards and with “plain breadth-first search”.

I’m using Python with BFS, representing my visited states as a tuple of current player position and a nested tuple of sorted box positions. Ex: (3, (21, 37, 54))

Also, I’m pruning corner deadlocks already, but my Scoria 3-2 (the last one that didn’t timeout) hits 90k nodes visited still! This is much larger than what others are claiming!

My questions are:

  1. Do I need a better state representation and/or better hashing?
  2. How can the tree be pruned further?

Thanks in advance for any help!

Hi all. I think I’ll sound a problem everyone is thinking about and noone told yet.
We have start position.
We are building possible/eligible positions in BFS manner and add them to “to visit” collection
We are happy if position we took from “to visit” is the solution
That means we need only initial state of map. Then we are going to generate different states of map according to possible movements and really does not need any input at all. Finally we came to situation when we need resolve all puzzle in “first step time” bounds.
QUESTION : how to make solution which would get only next step BUT guarantied that if solution exists this step would pe part of solution

P.S For now I only made my code readable so to speak without “x[a[p+1-fun()]][anything here]” expressions. It took a certain overhead from C# but… I believe there is an idea such that if you understand it you will solve puzzle in any level of overhead and if not then c++ wouldn’t help

Either you prune wall deadlocks, or just use bitboard.
Python is the most bitboard friendly language as it uses integers of arbitrary length.

For example your (21, 37, 54) would just be built like this:
boxes = 0
boxes |= 1 << 21
boxes |= 1 << 37
boxes |= 1 << 54

If you want to move 21 to 22 for example, the corresponding state would be:
boxes & (~(1 << 21)) | (1 << 22)

To check if there is a box in 22:
boxes & (1 << 22)



On Scoria 3-2 my solution makes 3795 BFS iterations, you need a better pruning system.

The first thing I do is to compute a general workable area : the cells that you can use to move the boxes, it prunes the corners but also the dead ends (cells you can’ go back from).
You can dynamically compute additional dead ends during the BFS (if you push a box against another one for example).
You can also ignore the option of having a box immediatly go back on its path (there’s no reason to go back immediatly, something else need to happen for that move to be interesting).
And so on, there’s a lot of possible optimizations.


J’ai rien compris!

If boxes = 0, its binary representation is 0b0…0

Now if you take 1 and shift it 21 times on the left, you get:


for 1 << 37 you get:


and for 1 << 54 you get:


Now if you take | between boxes and these numbers, as the rule is 0|1 = 1, you get:


It’s a convenient way to store the positions of you boxes.

From now on, add a new 1 in boxes can be done with | and remove a 1 with &~

explanation for &~:

1 << 21 is 0b1000000000000000000000
if you take its opposite with ~ you get:
and the rule for & is 1&x = x and 0&x = 0 so it won’t change boxes except at the place you want to put a 0

Note that you could also you ^ instead of &~, if you are sure that there is a 1 to remove (the rule for ^ is: 1^x = opposite of x and 0^x = x).
You can also use a ^ instead of | if you are sure that there is a 0 at the place you want to put a 1.

So for example,

boxes & (~(1 << 21)) | (1 << 22) would become just: boxes ^ (1 << 21) ^ (1 << 22)

1 Like

Nan mais ça représente quoi boxes = 0? Comment sont codées tes cases? La case (0,0) vaut 0, la case (8, 0) vaut 8, la case (0, 1) vaut 9, la case (x, y) vaut x + 9 * y? Je lis çà, je passe mon chemin comme la plupart des articles publiés sur CG malheureusement :frowning:

Can you please post in English @philRG ? Thanks.


Well hello @anon72424297 :slight_smile: What does represent boxes = 0? I mean just precise the context to facilitate reading from other ppl. I understand binary but not quite aware from the board encoding used in this post :wink:

Trying to answer your French question in the hope the deepl got the meaning right.

The state encoding is left up to you. You could assign IDs in a way that translates to x and y such as x+width*y. Or you can assign a number for each cell that you can possibly enter, just enumerating them.

Both of these ways have advantages and disadvantages.
The first representation allows to quickly find the next cell when moving a box.
For the second representation this adds a level of indirection (extract the cell from the bitmask to then look up the neighboring cell and encode it as a bit again). The advantage is that you get a more dense packing that way and can fit the entire state into a single 64bit integer.

Or you could store redundant info: keep the full state with a list of boxes and just use the bitboard as a hash function to detect duplicates.
For this use-case I also recommend looking into Zobrist hashing as it’s of more generic use (Sokoban is somewhat special as the state information is so small, other games have more complicated states)


With python, as you are never bounded in size, you can indeed just use the following:
pos = w*y + x
and if you ever need to recover x and y from pos:
y, x = divmod(pos, w)

1 Like

But do you know how Python handles this internally? I wouldn’t be surprised if it’s significantly faster when you stay within the 64bit limit - but really don’t know.


Apparently Python uses C structs internally for integers and the tresholds go like this:

x < 2**30 --> 28 bytes
2**30 <= x < 2**60 --> 32 bytes
2**60 <= x < 2**90 --> 36 bytes


As you can see, it’s heavy anyway so you usually don’t pay attention to these tresholds.


OK. As long as I solved it in C# I can give some hints.

  • when prunning “blocket states” be carefull to do not prune state where box is blocked BUT it is already stays in correct place
  • as long as state is immutable it worth to calculate its hash once and do not call GetHashCode() anymore
  • do path finding only when pick gamestate from queue (not before placing in) because the majority of states in queue you will not visit so it is no point to calculate distances in advance
  • check duplications before add next state to queue -it should only contain unique states
    Thats it
    P.S. You dont need all this brainf*ck with bit masks, flat arrays e.t.c. Time limits extended significantly so just make algorithm without mistakes and bugs and thats it

@Djoums Does your solution do any kind of pathfinding to the boxes, or player position normalization? Mine simply takes the player position and generates the 4 possible new positions, then prunes invalid or visited states.

Also, even with pre-computed static deadlocks, and basic dynamic deadlock detection, my node count is still far too high (~60k on Scoria 3-2 now) and i’m not sure how to prune duplicate push states, given the way I generate new states.

@pardouin Thank you for the bitboard suggestion. It sped up my code considerably.

My algorithm works on the boxes and ignores the character (well mostly, I ignore moves that would require an unreachable position).
Then when I have a solution (a sequence of box moves), I reconstruct the character path to make it happen.

My node count for the last testcase is 263811 (as I said, my pruning is very lazy).

And still I have way enough time to explore those nodes, my overall time for the whole search is 0.627 sec.
Maybe you use unefficient operations at crucial moments, like queue.pop(0) or queue.insert(0, node) or some slicing stuff, I don’t know ^^
Just keep it simple, append nodes in your “next exploration” list and that’s it.