Coding Games and Programming Challenges to Code Better
Send your feedback or ask for help here!
Created by @David_Augusto_Villa, validated by @Eulero314, @TBali and @BerserkVl.
If you have any issues, feel free to ping them.
TL;DR: in C++, Unicode is a joy to work with.
In your test cases you use the Unicode character \u2019 (’, RIGHT SINGLE QUOTATION MARK) as an apostrophe instead of '. I treated it as a 2-byte character (like every other non-ASCII character you provide), but it turns out to be 3 bytes wide. For all test cases and validators except validator 2, this misinterpretation causes no problem. On validator 2, however, it does, as…
…skipping 1 char when encountering ’ in “l’âme” made me skip the “â” altogether, so I kept both French AND Italian as valid languages for lines 1 and 9 and couldn't pass validator 2 at first. I passed it by treating " è " as Italian-only, before figuring out that ’ was actually 3 bytes wide and coming here to rant.
Still enjoyed the solve, cheers.
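For anyone hitting the same problem: in UTF-8 the width of each character is encoded in its lead byte, so there is no need to assume that every non-ASCII character is 2 bytes. Letters such as â or è (U+0080 to U+07FF) take 2 bytes, while ’ (U+2019) falls in the U+0800 to U+FFFF range and takes 3. Below is a minimal PHP sketch of that rule, assuming valid UTF-8 input; utf8SequenceLength() and splitUtf8() are hypothetical helper names, and the same bit tests carry over directly to C++.

```php
<?php
// Sketch: walk a UTF-8 string using the lead byte of each sequence,
// instead of assuming every non-ASCII character is 2 bytes wide.
// Assumes valid UTF-8 input; helper names are hypothetical.

// Number of bytes in the UTF-8 sequence that starts with byte $lead (0-255).
function utf8SequenceLength(int $lead): int
{
    if ($lead < 0x80) {
        return 1;                 // 0xxxxxxx: ASCII
    }
    if (($lead & 0xE0) === 0xC0) {
        return 2;                 // 110xxxxx: U+0080..U+07FF, e.g. â, è
    }
    if (($lead & 0xF0) === 0xE0) {
        return 3;                 // 1110xxxx: U+0800..U+FFFF, e.g. ’ (U+2019)
    }
    if (($lead & 0xF8) === 0xF0) {
        return 4;                 // 11110xxx: U+10000..U+10FFFF
    }
    throw new InvalidArgumentException('Not a UTF-8 lead byte');
}

// Split a UTF-8 string into characters by jumping from lead byte to lead byte.
function splitUtf8(string $text): array
{
    $chars = [];
    $i = 0;
    $n = strlen($text);           // strlen() counts bytes, not characters
    while ($i < $n) {
        $len = utf8SequenceLength(ord($text[$i]));
        $chars[] = substr($text, $i, $len);
        $i += $len;
    }
    return $chars;
}

$word = "l\u{2019}\u{E2}me";      // l’âme
echo strlen($word), "\n";           // 8 (bytes: 1 + 3 + 2 + 1 + 1)
echo count(splitUtf8($word)), "\n"; // 5 (characters)
echo strlen("\u{2019}"), "\n";      // 3
```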
Thank you for reporting that. Yes, I'm aware there are difficulties, and it was not my intention to spring unexpected surprises like that on you. I will change all punctuation to characters that are 1 or 2 bytes wide in UTF-8. It would also be nice to link to some sort of guide on CodinGame that discusses the issues with Unicode, and in particular at least states which languages lack support for it.
This puzzle is supposed to have the tags: Natural language, Unicode, Constraint propagation, Exact cover.
This is a relatively new area for CodinGame. It used to be worse: the console never displayed non-ASCII characters correctly (so if you had published your contribution back then, I would have rejected it outright for that reason alone). I raised the issue a few months ago, and it took two or three rounds of fixes from the CodinGame staff before such characters were displayed properly.
If anybody has insights into how well different programming languages on CodinGame handle Unicode, I encourage you to start a new forum thread or a tech.io playground on the topic. Your contribution would be greatly appreciated!
I had the most difficulty with C/C++, Lua, Objective-C, Pascal, Perl, and PHP.
I had no issue solving this in PHP. You just have to remember to use the mbstring functions (mb_*) rather than the "normal" string functions.
One thing needed extra care: CG runs such an old PHP version that the "split a Unicode string into letters" function (mb_str_split, added in PHP 7.4) is missing, but there is a simple, documented workaround.
My code looks like this:
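// mb_str_split() only exists from PHP 7.4 on; older versions fall back to
// preg_split() with the /u (UTF-8) modifier, which splits between characters.
// "?: []" guards against preg_split() returning false (e.g. on invalid UTF-8).
if (version_compare(PHP_VERSION, '7.4.0', '<')) {
    $letters = preg_split('//u', $text, 0, PREG_SPLIT_NO_EMPTY) ?: [];
} else {
    // @phpstan-ignore-next-line function.notFound
    $letters = mb_str_split($text);
}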
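To make the difference between the "normal" and the Unicode-aware functions concrete, here is a small PHP comparison. It assumes UTF-8 text and the mbstring extension: strlen() and str_split() work on bytes, whereas preg_split() with the /u modifier (the same workaround as above) and mb_strlen() work on characters.

```php
<?php
// Sketch: byte-oriented vs. character-oriented handling of a UTF-8 string.
// Assumes UTF-8 input; mb_strlen() needs the mbstring extension.

$text = "perch\u{E9}";   // "perché": 6 characters, 7 bytes (é is 2 bytes)

// Byte-oriented: strlen() and str_split() count and cut bytes,
// so "é" is split into two meaningless fragments.
echo strlen($text), "\n";             // 7
echo count(str_split($text)), "\n";   // 7

// Character-oriented: the /u modifier makes preg_split() split between
// UTF-8 characters, and mb_strlen() counts characters.
$letters = preg_split('//u', $text, 0, PREG_SPLIT_NO_EMPTY) ?: [];
echo count($letters), "\n";           // 6
echo mb_strlen($text, 'UTF-8'), "\n"; // 6
echo $letters[5], "\n";               // é
```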
I am not sure whether my browser displays all the diacritics correctly, which I need in order to copy the information accurately. Below is what I see in my browser.
Could someone confirm that this display is correct (particularly the Irish line)?
It is correct:
Thanks, nicola.
Then there is a further problem.
In the Moby-Dick test case, the Irish string contains a capital Latin letter I with acute (Í). This letter is not included in the basic information table.
The table includes the lowercase letter i with acute (í). Does the puzzle assume that coders will convert lowercase letters with diacritics to uppercase themselves, or vice versa?
That could be difficult, because not everyone has a working knowledge of dozens of human languages.
It would be better for the table to list every form of each letter, or at least enough forms to cover all the test cases.
Yes, you can convert the text to lowercase. These languages are written in the Latin alphabet, which has both uppercase and lowercase letters.
As far as I know, in Turkish, the uppercase of ı is I and the uppercase of i is İ.
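As a minimal sketch of the lowercase-first approach in PHP, assuming UTF-8 input and the mbstring extension: mb_strtolower() applies the default Unicode case mapping, so it handles letters such as Í or È, but it is not language-aware and will not apply the Turkish I/ı rule mentioned above. The helper normaliseCase() and the letter table $irishLetters are hypothetical, for illustration only.

```php
<?php
// Sketch: lowercase text before looking its letters up in a table that
// only lists lowercase forms. Assumes UTF-8 input and mbstring.
// mb_strtolower() uses Unicode default case mapping, not Turkish rules:
// "I" becomes "i", never dotless "ı".

function normaliseCase(string $text): string
{
    return mb_strtolower($text, 'UTF-8');
}

echo normaliseCase("\u{CD}OSA"), "\n";   // íosa
echo normaliseCase("\u{C8} VERO"), "\n"; // è vero
echo normaliseCase("I"), "\n";           // i

// Hypothetical table of letters with diacritics, lowercase only: á é í ó ú
$irishLetters = ["\u{E1}", "\u{E9}", "\u{ED}", "\u{F3}", "\u{FA}"];
var_dump(in_array(normaliseCase("\u{CD}"), $irishLetters, true)); // bool(true)
```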
Welcome to natural language processing! Yes, languages are complicated. The uppercase of the German eszett (ß) is two characters, SS, and Turkish treats i differently, as nicola described. As far as I know, the only other letter-case exception is the Greek sigma (Σ), which has two lowercase forms (σ and ς) depending on its position in the word. But as far as language goes, we're barely scratching the surface here!
Anyway, in this puzzle you can actually ignore uppercase letters and still pass all the tests and validators. Whatever approach you choose will work, as long as you make it work. Personally, I think it's valuable to have real-world challenges and to truly earn the Natural Language and Unicode badges. Speaking of which, what happened to those tags?
Edit: I've updated the description to include an example in English which at least hints at there being uppercase variants.
I was surprised when I looked through the solutions, because the puzzle turns out to be a bit easier than intended. There's another case that never made it into any of the tests or validators. I know I needed to cover both cases at one point while creating it, so I don't know exactly when that changed. So far about 40 people have attempted it, and adding more tests would only affect thirty-some solutions. You would all hate me, though.
You can post your extra test cases here so that anyone can try them and improve their code, without affecting submitted solutions.
OK, this is very weird, but I may have been wrong about an additional case potentially being needed. It's hard to talk about without giving spoilers, but basically, for the check that currently needs to be done (the one all the working solutions are doing), what's being checked for always occurs, because the setup is nice by design and always has a solution. In situations where the algorithm might get stuck, there's another check that can be done, which I had assumed would be needed as well. However, that other case appears to occur only when there isn't a single solution. In these situations, the additional check will get you further than the first one alone, but ultimately, at least as far as I've been able to work out, there will be an unresolved ambiguity. I saw this a lot when designing the puzzle, because unresolvable scenarios came up often. I didn't realize that one check might actually be sufficient once those bad test cases were corrected or eliminated.