[Community Puzzle] Snake encoding

CommunityBot · November 6, 2020, 7:36pm

https://www.codingame.com/training/medium/snake-encoding

Send your feedback or ask for help here!

durango · April 13, 2016, 12:44pm

Hi,
For some reason my solution for the “Snake encoding” puzzle does not pass the “This is a mix!” validation testcase (it works on all other testcases). I checked over and over, ran thousands of tests on my computer, and I don’t see what I could have missed. My solution works for inputs of size 0 to 31 (only 1 to 20 is required) and with any number of encodings (only strictly positive numbers are required). Any clues?
Thanks!
durango

Plopx · April 13, 2016, 1:16pm

Your code doesn’t handle multibyte characters like é, £, €, …

durango · April 13, 2016, 1:34pm

Do you see my code?
Anyway, that can’t be the reason, given that the problem statement explicitly states that “The input text is composed of ASCII characters”…

Plopx · April 13, 2016, 1:38pm

No I can’t see your code, but I had the same failure until I changed my code to handle weird characters. The statement is clearly wrong here.

durango · April 13, 2016, 1:53pm

Okay, then I give up. I don’t know how to cleanly read UTF-8 (?) characters in pure C.

magaiti · April 13, 2016, 2:08pm

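/* Return the byte length of the UTF-8 character whose lead byte is lb. */
int get_mbchar_length(char lb) {
    if (( lb & 0xE0 ) == 0xC0 ) return 2;  /* lead byte 110xxxxx */
    if (( lb & 0xF0 ) == 0xE0 ) return 3;  /* lead byte 1110xxxx */
    if (( lb & 0xF8 ) == 0xF0 ) return 4;  /* lead byte 11110xxx */
    return 1;                              /* ASCII, or any other byte */
}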

Here is the code for determining the length of a multibyte character in UTF-8 encoding.
You should scan the string and use this code to check the length (in bytes) of the next character.
Then you should handle each character separately. I just stored each character in a separate string.
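
A minimal sketch of that scan, for anyone who wants to see it spelled out. The names utf8_char_length and split_utf8 and the 5-byte cell size are made up for this illustration, and it assumes the input is valid UTF-8:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 character that starts with lead byte lb
   (same logic as get_mbchar_length above). */
int utf8_char_length(unsigned char lb) {
    if ((lb & 0xE0) == 0xC0) return 2;
    if ((lb & 0xF0) == 0xE0) return 3;
    if ((lb & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Copy each character of `line` into its own NUL-terminated string.
   Each cell holds up to 4 bytes plus the terminator.
   Returns the number of characters stored (at most max_cells). */
size_t split_utf8(const char *line, char cells[][5], size_t max_cells) {
    assert(line != NULL && cells != NULL);
    size_t len = strlen(line);
    size_t pos = 0;
    size_t count = 0;
    while (pos < len && count < max_cells) {
        size_t n = (size_t)utf8_char_length((unsigned char)line[pos]);
        if (n > len - pos) n = len - pos;  /* truncated tail: take what is left */
        memcpy(cells[count], line + pos, n);
        cells[count][n] = '\0';
        pos += n;
        count++;
    }
    return count;
}
```

Calling split_utf8 on one row of the square gives one small string per visible character, so the rest of the solution can index cells instead of bytes.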

durango · April 13, 2016, 2:24pm

Many thanks, magaiti.
Actually I just don’t want to use such a function just to pass a test.
And this only solves the problem of reading; you still have to write that character back at some point.
Unless there is a clean set of functions to do this in C, I’ll just stay away from these problems.

Plopx · April 13, 2016, 2:41pm

Another solution is to use the type wchar_t instead of char. Then, you can use the formats %lc and %ls for characters and strings with printf and scanf. Don’t forget to add setlocale(LC_CTYPE, "") at the beginning of your code, and to include locale.h and wchar.h.
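
A rough sketch of the same idea using mbstowcs from the standard library, for strings that are already in memory. The helper name to_wide is made up for this example. The result depends on the locale: after setlocale(LC_CTYPE, "") in a UTF-8 environment a string like "é€" should convert to 2 wide characters, while in the plain "C" locale non-ASCII bytes may be rejected.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert a multibyte string to wide characters using the current locale.
   Returns the number of wide characters written (excluding the terminator),
   or (size_t)-1 if the bytes are not valid in the locale's encoding.
   `out` must have room for out_cap wide characters. */
size_t to_wide(const char *text, wchar_t *out, size_t out_cap) {
    assert(text != NULL && out != NULL && out_cap > 0);
    size_t n = mbstowcs(out, text, out_cap - 1);
    if (n == (size_t)-1) return n;  /* invalid sequence for this locale */
    out[n] = L'\0';                 /* mbstowcs may stop without terminating */
    return n;
}
```

Call setlocale(LC_CTYPE, "") once at the start of main before using it. Once the text is in a wchar_t array, each element is one character, so indexing into the square no longer depends on byte counts.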

magaiti · April 13, 2016, 3:00pm

@durango
You can write the character back as it is. Just treat a single multibyte character as a sequence of several chars that must not be split up.
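
To illustrate moving characters around without splitting them, here is a small sketch that reverses a string character by character. It is not the puzzle's encoding, only a stand-in for "reorder, then write back". The names lead_byte_length and reverse_utf8 are made up for this example, and valid UTF-8 input is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 character starting with lead byte lb. */
int lead_byte_length(unsigned char lb) {
    if ((lb & 0xE0) == 0xC0) return 2;
    if ((lb & 0xF0) == 0xE0) return 3;
    if ((lb & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Write the characters of `in` to `out` in reverse order.
   Each multibyte character is copied as one undivided group of bytes.
   `out` must hold at least strlen(in) + 1 bytes. */
void reverse_utf8(const char *in, char *out) {
    assert(in != NULL && out != NULL);
    size_t len = strlen(in);
    size_t pos = 0;    /* read position in `in` */
    size_t end = len;  /* write position in `out`, moving backwards */
    while (pos < len) {
        size_t n = (size_t)lead_byte_length((unsigned char)in[pos]);
        if (n > len - pos) n = len - pos;  /* tolerate a truncated tail */
        end -= n;
        memcpy(out + end, in + pos, n);    /* bytes keep their internal order */
        pos += n;
    }
    out[len] = '\0';
}
```

The output can then be printed with plain printf("%s"), since the bytes of every character are still contiguous and in their original order.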

@Plopx
I tried to improvise with locale and wide characters, but couldn’t get it to work. I tried setlocale(), wchars, wstrings, mbctowc or whatever else. Only hand-counting the UTF-8 symbols’ length worked for me. I’ll try it again though, to see if I missed something.

One thing I can say, the name of the puzzle is unintentionally spot-on.

durango · April 13, 2016, 3:09pm

I too tried to improvise with locale and wide characters, and it didn’t work either. It is “well-known” that wide characters are broken and should in general not be used.

magaiti · April 13, 2016, 3:50pm

#include <locale.h>
#include <wchar.h>

// N and Square (a container of wstring) are declared elsewhere.
setlocale(LC_CTYPE, "");

for (int i = 0; i < N; i++)
{
    wchar_t buffer[1000];
    scanf("%999ls", buffer);  // width limit keeps the read inside buffer
    Square[i] = wstring(buffer);
}

// ..encoding...

for (int i = 0; i < N; i++)
{
    printf("%ls\n", Square[i].c_str());
}

Actually, this worked for me in C++, though it’s not the ideal solution because it uses C-style I/O.
I think I have figured out how to make this work with C++ I/O as well.

Wildcat-SC · April 13, 2016, 11:32pm

Again C-haters

magaiti · April 14, 2016, 7:05am

What do you mean, C-haters? How about I call you a C++ hater?

Wildcat-SC · April 14, 2016, 7:20am

I mean the peeps doing these puzzles did not pay attention to C (& C++) coders, nothing more ^^ The “hater” term was pure irony.

Wildcat-SC · April 14, 2016, 7:35am

I’m still a bit frustrated about the time I lost on useless multi-byte characters (and only in the validators) though ^^ Quite vicious

magaiti · April 14, 2016, 8:40am

I personally don’t consider it lost time. I have gained some knowledge about dealing with multibyte and wide characters, after all.

Wildcat-SC · April 14, 2016, 8:57am

The time lost is the time spent working out why the validators don’t pass, not the time spent coding multibyte support. Btw, I deal with UTF-8 file names on a daily basis, and never had to use wchar_t for this

JBM · April 15, 2016, 9:24am

UTF-8 is designed to be mostly transparent in most reasonable text processing. This isn’t the case here, where it’s used as a fixed-width presentation format in which each glyph’s byte count actually matters.
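
A small sketch of why the byte count matters here. The helper name utf8_count is made up for this example. It counts code points by skipping continuation bytes, which matches the visible characters only as long as no combining marks are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count UTF-8 code points by counting every byte that is not a
   continuation byte (continuation bytes look like 10xxxxxx). */
size_t utf8_count(const char *s) {
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) count++;
    }
    return count;
}
```

For a row such as "a\xC3\xA9\xE2\x82\xAC" (the characters a, é, €), strlen reports 6 bytes while utf8_count reports 3 characters, so code that indexes the square by byte offset reads the wrong cells.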

This is a rather severe shortcoming in the problem statement (which doesn’t even specify that UTF-8 is the encoding used), but it’s probably an honest one where the submitter didn’t realize he was using non-ASCII characters.

This should be a mathematical problem, not a text I/O one. Please replace the Unicode with ASCII and introduce a test case where N is vastly larger than X^2 instead.

Dogwon · May 5, 2016, 10:51pm

There are languages besides English. This puzzle stumped me until I found this discussion. This article helped me: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) – Joel on Software