[Community Puzzle] Data compression 1


Send your feedback or ask for help here!

Created by @BigBart,validated by @Djoums,@aymane211 and @Aayush.Curious.
If you have any issues, feel free to ping them.

If you can solve this puzzle while listening to https://www.youtube.com/watch?v=EX12iUS4F9c then you are eligible for double XP.


Will there be an issue if char ‘/’ appears in the plaintext?

Okay … give me my double XP :smiley:

Puzzle seems wrong to me. (in addition to… well, maybe another day)

  1. Count whole words only
  2. The reference has always to be shorter than the original, otherwise they are not counted in the index

Yet the lorem ipsum case doesn’t seem to index word 13 “dolore”. No good reason to skip it is given in the statement.

Rule 5 pretty much says anything goes as far as ordering goes, so I’m yet to read a sensible explanation for that.


“dolore” becomes “/2e” after indexing dolor. That makes it 3 letters long which is too short to be considered since the reference counting is 13 at this point.

1 Like

That’s a post-hoc rationalization. The statement says to index words, it’s a word. The statement does not say partial word replacement has to occur before indexing.

The puzzle imply a left to right compression with on the fly indexing, so you can’t index words before hand. I agree, it could be clearer.

I am stuck with exactly the same issue.
My solution for LOREM test case is off by 1 index above 13 because of this.

I even tried to add some extra logic to avoid it (don’t add word to dictionary if its beginning substring already there). With this one LOREM passes, but then COUNTING test gets fail which worked originaly.

The statement text leaves it a bit of guesswork, what is the exact rule of substitution and dictionary building… Actually my understanding was not on-the-fly indexing but a 2-pass solution.

Seriously, it does no such thing. That would actually be the sensible option.
The author solution performs left-to-right replacement, with FULL LOOKAHEAD indexing. That’s almost the perfect opposite of what most adaptive compressors do, that actually qualify as “left to right compression with on-the-fly model adjustment”…

The statement merely says future references are allowed.

Try “indolore dolor dolore”

“in/2 dolor /1e” would be a perfectly valid encoding.

1 Like

Just hardcode not indexing “dolore”. The validator is the same as the test case.

Yay mapquest.


That’s exactly what would be expected though.

Are you telling me with a straight face it would exactly be expected to reference a word that is not in the index?

Yes, that’s why there’s no index ! Or said otherwise, it’s a form of recursive indexing.
I’ll keep the need for clarification in mind, I agree the lookahead should be explained.

Yuck… I did as suggested, and now 100% pass.
The only thing I liked in this puzzle was the German tongue-twister.

1 Like

The lookahead makes this the most out-of-touch scheme I’ve ever seen. But sure, we’ve seen crazier puzzles. If only they didn’t try to invent a rationalization.

The basics of that compressor are to replace repeating parts of the text with a reference to the index of the first appearance of the word.

I don’t see why a replacement order should be implied (or even EXIST, considering rule 5), but “the first appearance of the word” seems very well-defined to me, independent to any replacement procedure, and is not contradicted by the rules.

Really, this puzzle reads like someone toyed with replacements on a small string and then “invented” a data format to reproduce what they did. And we get yet another of those puzzles where you have to read the author’s mind about how it has to be done, because no one CBA to write a proper specification and the statement only describes the solution when you already have it.

Guys, I respect both of you. I kindly ask not to make another ‘Revenge in rejecting contributions’ kind of thread from this. This puzzle does not worth the damage.

May Nico Haak be with you…


I hit on a very similar obstacle as @JBM in my first two trys with this puzzle. The meaning of “word” is not well defined. The approach to handle punctuation is not defined at all. The approach to handle space or tab or other invisible characters (are these letters or non letters or should they be counted as a word or a component in a word?) was not mentioned. Why “twenty-one” was not treated as two words and be indexed as soon as they appear - not defined. The reason for “dolore” not indexed in the test case was not explained in the statement.

I came upon multiple possibilities to handle data in multiple forks. The author or anyone with good knowledge of the untold definitions and assumptions please edit the statement to clarify it. Thanks.

1 Like

Please remember that you did not complain about the bona fides of the validation panel for this puzzle, if you should decide to do so for future puzzles.

CG claims they have 300k members. Not everyone of them has a chance to comment in the contribution stage before a contribution was accepted or rejected.

It is grateful there were 10 or so members giving comments in the contribution stage. It does not rule out the right of 300k-10 other members to enjoy the product when it comes out and giving feedback as a consumer of a contribution.

As an early consumer of this puzzle, I call for clarification because I wish other consumers coming later can have a better experience to enjoy it rather than feeling disappointed with it.