[Community Puzzle] Snake encoding

https://www.codingame.com/training/medium/snake-encoding

Send your feedback or ask for help here!

Hi,
For some reason my solution for the “Snake encoding” puzzle does not pass the “This is a mix!” validation testcase (it works on all other testcases). I checked over and over, ran thousands of tests on my computer, and I don’t see what I could have missed. My solution works for inputs of size 0 to 31 (only 1 to 20 is required), and it works with any number of encodings (only strictly positive numbers are required). Any clues?
Thanks!
durango

Your code doesn’t handle multibyte characters like é, £, €, …

Do you see my code?
Anyway, that can’t be the reason, given that the problem statement explicitly states that “The input text is composed of ASCII characters”…

No I can’t see your code, but I had the same failure until I changed my code to handle weird characters. The statement is clearly wrong here.

Okay, then I give up. I don’t know how to cleanly read UTF-8 (?) characters in pure C.

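/* Returns the byte length of the UTF-8 character whose lead byte is lb. */
int get_mbchar_length(unsigned char lb) {
    if (( lb & 0xE0 ) == 0xC0 ) return 2;   /* 110xxxxx */
    if (( lb & 0xF0 ) == 0xE0 ) return 3;   /* 1110xxxx */
    if (( lb & 0xF8 ) == 0xF0 ) return 4;   /* 11110xxx */
    return 1;                               /* ASCII, or an unexpected byte */
}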

Here is the code for determining the length of a multibyte character in UTF-8 encoding.
You should scan the string and check the length (in bytes) of the next character using this code.
Then you should handle each character separately. I just stored each character in a separate string.
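
For anyone following along, the scanning approach described above could look roughly like this. It is only a sketch: the Utf8Char type and the split_utf8 name are made up for illustration, and the input is assumed to be valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHAR_BYTES 4

/* Hypothetical type: one UTF-8 character stored as its own NUL-terminated string. */
typedef struct {
    char bytes[MAX_CHAR_BYTES + 1];
} Utf8Char;

/* Byte length of the UTF-8 sequence that starts with lead byte lb. */
static int get_mbchar_length(unsigned char lb)
{
    if ((lb & 0xE0) == 0xC0) return 2;
    if ((lb & 0xF0) == 0xE0) return 3;
    if ((lb & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Splits s into at most max_out characters and returns how many were stored.
   Assumes s is valid UTF-8; a truncated trailing sequence is cut short. */
size_t split_utf8(const char *s, Utf8Char *out, size_t max_out)
{
    assert(s != NULL && out != NULL);
    size_t count = 0;
    size_t remaining = strlen(s);
    while (remaining > 0 && count < max_out) {
        size_t len = (size_t)get_mbchar_length((unsigned char)*s);
        if (len > remaining)
            len = remaining;
        memcpy(out[count].bytes, s, len);   /* copy the whole character */
        out[count].bytes[len] = '\0';
        s += len;
        remaining -= len;
        count++;
    }
    return count;
}
```

Each element of the output array can then be moved around the grid like a single cell, whatever its byte length.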

Many thanks, magaiti.
Actually I just don’t want to use such a function just to pass a test.
And this only solves the problem of reading, then you have to write that character back at some point.
Unless there is a clean set of functions to do this in C, I’ll just stay away from these problems.

Another solution is to use the type wchar_t instead of char. Then, you can use the formats %lc and %ls for characters and strings with printf and scanf. Don’t forget to add setlocale(LC_CTYPE, "") at the beginning of your code, and to include locale.h and wchar.h.
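
A rough sketch of the %ls route, using sscanf and snprintf on in-memory strings rather than scanf and printf so it is easier to try out. The wide_round_trip name is invented for this example, and whether non-ASCII input converts correctly depends on the locale the environment provides, which is an assumption here.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: reads one whitespace-delimited word from src with
   %ls, then writes it back into dst with %ls. Returns the number of
   characters (not bytes) in the word, or 0 if either conversion fails. */
size_t wide_round_trip(const char *src, char *dst, size_t dst_len)
{
    wchar_t wide[256];
    assert(src != NULL && dst != NULL && dst_len > 0);
    if (sscanf(src, "%255ls", wide) != 1)   /* multibyte -> wide */
        return 0;
    int written = snprintf(dst, dst_len, "%ls", wide);   /* wide -> multibyte */
    if (written < 0 || (size_t)written >= dst_len)
        return 0;
    return wcslen(wide);
}
```

Call setlocale(LC_CTYPE, "") once before using it. Note that %ls stops at whitespace, so a row containing spaces would need a different way of reading.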

@durango
You can write the character back as it is. Just treat a single multibyte character as a sequence of several chars that must not be split.
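
To illustrate writing characters back unchanged, here is a small sketch that reverses a string character by character while keeping the bytes of each character together. The function names are hypothetical, and valid UTF-8 input is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence starting with lead byte lb. */
static size_t utf8_len(unsigned char lb)
{
    if ((lb & 0xE0) == 0xC0) return 2;
    if ((lb & 0xF0) == 0xE0) return 3;
    if ((lb & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Writes the characters of src into dst in reverse order.
   dst must have room for strlen(src) + 1 bytes. */
void reverse_utf8(const char *src, char *dst)
{
    assert(src != NULL && dst != NULL);
    size_t total = strlen(src);
    size_t in = 0;
    size_t out = total;                 /* fill dst from the end */
    while (in < total) {
        size_t len = utf8_len((unsigned char)src[in]);
        if (len > total - in)
            len = total - in;           /* truncated trailing sequence */
        out -= len;
        memcpy(dst + out, src + in, len);   /* bytes of one character stay together */
        in += len;
    }
    dst[total] = '\0';
}
```

No decoding is needed: the bytes are copied verbatim, only the order of whole characters changes.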

@Plopx
I tried to improvise with locale and wide characters, but couldn’t get it to work. I tried setlocale(), wchars, wstrings, mbtowc or whatever else. Only hand-counting the UTF-8 symbols’ length worked for me. I’ll try it again though, to see if I missed something.

One thing I can say, the name of the puzzle is unintentionally spot-on.

I too tried to improvise with locale and wide characters, and it didn’t work either. It is “well-known” that wide characters are broken and should in general not be used.

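#include <locale.h>
#include <wchar.h>
#include <cstdio>
#include <string>

// At the start of main(): pick up the environment's locale.
setlocale(LC_CTYPE, "");

// N and Square (a container of std::wstring) are declared elsewhere.
for (int i = 0; i < N; i++)
{
    wchar_t buffer[1000];
    scanf("%999ls", buffer);   // width limit keeps scanf inside the buffer
    Square[i] = std::wstring(buffer);
}

// ..encoding...

for (int i = 0; i < N; i++)
{
    printf("%ls\n", Square[i].c_str());
}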

Actually, this worked for me in C++, though it’s not the ideal solution because it uses C-style I/O.
I think I have figured out how to make this work with C++ I/O as well.

Again :confused: C-haters :smiley:

What do you mean, C-haters? how about I call you a C++ hater?

I mean the peeps doing these puzzles did not pay attention to C (& C++) coders, nothing more ^^ The “hater” term was pure irony.

I’m still a bit frustrated about the time I lost for useless multi-byte characters (and just in validators) though ^^ Quite vicious :stuck_out_tongue:

I personally don’t consider it a time lost. I have gained some knowledge about dealing with multibyte and wide characters, after all.

The time lost is the time spent working out why the validators don’t pass, not the time spent coding multibyte support. Btw, I deal with UTF-8 file names on a daily basis, and never had to use wchar_t for this :slight_smile:

UTF-8 is designed to be mostly transparent in most reasonable text processing. That isn’t the case here, where the text is used as a fixed-width presentation format in which each glyph’s byte count actually matters.
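
A quick way to see the mismatch between bytes and characters is the sketch below. The utf8_char_count name is made up, valid UTF-8 is assumed, and it counts code points, which is close to but not always the same as glyphs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>   /* strlen, for comparison with the byte count */

/* Counts UTF-8 characters (code points) by skipping continuation bytes,
   which always have the bit pattern 10xxxxxx. */
size_t utf8_char_count(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For a row made of é, £ and €, strlen reports 7 bytes while utf8_char_count reports 3 characters, so indexing the row by byte offset lands in the middle of a character.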

This is a rather severe shortcoming in the problem statement (which doesn’t even specify that UTF-8 is the encoding used), but it’s probably an honest one where the submitter didn’t realize he was using non-ASCII characters.

This should be a mathematical problem, not a text I/O one. Please replace the Unicode characters with ASCII and instead introduce a test case where N is vastly greater than X^2.

There are languages besides English. This puzzle stumped me until I found this discussion. This article helped me: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – Joel on Software