[Community Puzzle] Data compression 1

https://www.codingame.com/training/medium/data-compression-1

Send your feedback or ask for help here!

Created by @BigBart,validated by @Djoums,@aymane211 and @Aayush.Curious.
If you have any issues, feel free to ping them.

If you can solve this puzzle while listening to https://www.youtube.com/watch?v=EX12iUS4F9c then you are eligible for double XP.

3 Likes

Will there be an issue if char ā€˜/ā€™ appears in the plaintext?

Okay ā€¦ give me my double XP :smiley:

Puzzle seems wrong to me. (in addition toā€¦ well, maybe another day)

  1. Count whole words only
  2. The reference has always to be shorter than the original, otherwise they are not counted in the index

(updated)
Yet the lorem ipsum case doesnā€™t seem to index word 13 ā€œdoloreā€. No good reason to skip it is given in the statement.

Rule 5 pretty much says anything goes as far as ordering goes, so Iā€™m yet to read a sensible explanation for that.

2 Likes

ā€œdoloreā€ becomes ā€œ/2eā€ after indexing dolor. That makes it 3 letters long which is too short to be considered since the reference counting is 13 at this point.

1 Like

Thatā€™s a post-hoc rationalization. The statement says to index words, itā€™s a word. The statement does not say partial word replacement has to occur before indexing.

The puzzle imply a left to right compression with on the fly indexing, so you canā€™t index words before hand. I agree, it could be clearer.

I am stuck with exactly the same issue.
My solution for LOREM test case is off by 1 index above 13 because of this.

I even tried to add some extra logic to avoid it (donā€™t add word to dictionary if its beginning substring already there). With this one LOREM passes, but then COUNTING test gets fail which worked originaly.

The statement text leaves it a bit of guesswork, what is the exact rule of substitution and dictionary buildingā€¦ Actually my understanding was not on-the-fly indexing but a 2-pass solution.

Seriously, it does no such thing. That would actually be the sensible option.
The author solution performs left-to-right replacement, with FULL LOOKAHEAD indexing. Thatā€™s almost the perfect opposite of what most adaptive compressors do, that actually qualify as ā€œleft to right compression with on-the-fly model adjustmentā€ā€¦

The statement merely says future references are allowed.

Try ā€œindolore dolor doloreā€

ā€œin/2 dolor /1eā€ would be a perfectly valid encoding.

1 Like

Just hardcode not indexing ā€œdoloreā€. The validator is the same as the test case.

Yay mapquest.

2 Likes

Thatā€™s exactly what would be expected though.

Are you telling me with a straight face it would exactly be expected to reference a word that is not in the index?

Yes, thatā€™s why thereā€™s no index ! Or said otherwise, itā€™s a form of recursive indexing.
Iā€™ll keep the need for clarification in mind, I agree the lookahead should be explained.

Yuckā€¦ I did as suggested, and now 100% pass.
The only thing I liked in this puzzle was the German tongue-twister.

1 Like

The lookahead makes this the most out-of-touch scheme Iā€™ve ever seen. But sure, weā€™ve seen crazier puzzles. If only they didnā€™t try to invent a rationalization.

The basics of that compressor are to replace repeating parts of the text with a reference to the index of the first appearance of the word.

I donā€™t see why a replacement order should be implied (or even EXIST, considering rule 5), but ā€œthe first appearance of the wordā€ seems very well-defined to me, independent to any replacement procedure, and is not contradicted by the rules.

Really, this puzzle reads like someone toyed with replacements on a small string and then ā€œinventedā€ a data format to reproduce what they did. And we get yet another of those puzzles where you have to read the authorā€™s mind about how it has to be done, because no one CBA to write a proper specification and the statement only describes the solution when you already have it.

Guys, I respect both of you. I kindly ask not to make another ā€˜Revenge in rejecting contributionsā€™ kind of thread from this. This puzzle does not worth the damage.

May Nico Haak be with youā€¦
https://www.youtube.com/watch?v=a2begmUC7F8

2 Likes

I hit on a very similar obstacle as @JBM in my first two trys with this puzzle. The meaning of ā€œwordā€ is not well defined. The approach to handle punctuation is not defined at all. The approach to handle space or tab or other invisible characters (are these letters or non letters or should they be counted as a word or a component in a word?) was not mentioned. Why ā€œtwenty-oneā€ was not treated as two words and be indexed as soon as they appear - not defined. The reason for ā€œdoloreā€ not indexed in the test case was not explained in the statement.

I came upon multiple possibilities to handle data in multiple forks. The author or anyone with good knowledge of the untold definitions and assumptions please edit the statement to clarify it. Thanks.

1 Like

Please remember that you did not complain about the bona fides of the validation panel for this puzzle, if you should decide to do so for future puzzles.

CG claims they have 300k members. Not everyone of them has a chance to comment in the contribution stage before a contribution was accepted or rejected.

It is grateful there were 10 or so members giving comments in the contribution stage. It does not rule out the right of 300k-10 other members to enjoy the product when it comes out and giving feedback as a consumer of a contribution.

As an early consumer of this puzzle, I call for clarification because I wish other consumers coming later can have a better experience to enjoy it rather than feeling disappointed with it.

3 Likes