[Community Puzzle] Literary Alfabet Soupe - Puzzle discussion


Send your feedback or ask for help here!

Created by @David_Augusto_Villa, validated by @Eulero314, @TBali and @BerserkVl.
If you have any issues, feel free to ping them.

TLDR: in C++ unicodes are a joy to work with.

In your test cases you use the Unicode character \u2019 (’ == RIGHT SINGLE QUOTATION MARK) as an apostrophe instead of '. I treated it as a 2-byte character (like every other non-ASCII character you provide), but it turns out to be 3 bytes wide in UTF-8. For all test cases and validators except validator 2, this misinterpretation poses no problem. On validator 2, however, it does, as…

validator 2 spoiler

…skipping 1 char when we encounter ’ in “l’âme” made me skip the ‘â’ altogether, and therefore I kept both French AND Italian as valid languages for lines 1 and 9 and couldn’t pass validator 2 in the first place. I passed it by considering " è " as Italian-only before figuring out that ’ was actually 3 chars wide and coming here to rant.
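For anyone hitting the same byte-width problem, here is a minimal sketch of reading the width from the lead byte instead of assuming 2 bytes. It is written in Python for brevity even though the post is about C++; the same bit tests carry over. The names `utf8_width` and `split_utf8` are made up for this example, and the input is assumed to be valid UTF-8.

```python
def utf8_width(lead_byte):
    """Number of bytes in the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1  # 0xxxxxxx: plain ASCII
    if lead_byte >> 5 == 0b110:
        return 2  # 110xxxxx: e.g. accented Latin letters
    if lead_byte >> 4 == 0b1110:
        return 3  # 1110xxxx: e.g. U+2019
    if lead_byte >> 3 == 0b11110:
        return 4  # 11110xxx
    raise ValueError("continuation byte or invalid lead byte")


def split_utf8(data):
    """Split UTF-8 encoded bytes into a list of one-character strings."""
    chars = []
    i = 0
    while i < len(data):
        width = utf8_width(data[i])
        chars.append(data[i:i + width].decode("utf-8"))
        i += width
    return chars


sample = "l\u2019\u00e2me".encode("utf-8")  # the word from validator 2
letters = split_utf8(sample)
```

Here `sample` is 8 bytes long (1 + 3 + 2 + 1 + 1) and `letters` has 5 entries, so the â after the apostrophe is no longer swallowed.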

Still enjoyed the solve, cheers.


Thank you for reporting that. Yes, I’m aware there are difficulties, and it was not my intention to spring surprises like that on you. I will change all punctuation to 1- or 2-byte characters. It would also be nice to link to some sort of guide on CodinGame that discusses the issues with Unicode, or at least states which languages do not support it.

This puzzle is supposed to have the tags: Natural language, Unicode, Constraint propagation, Exact cover

This is a relatively new area for CodinGame. It was worse previously: the console had never displayed non-ASCII characters correctly (so I would have outright rejected your contribution for that reason alone if you had published it back then). I raised the issue a few months ago, and it took two or three rounds of fixes from the CodinGame staff before such characters were displayed properly.

If anybody has insights into how well different programming languages on CodinGame handle Unicode, I encourage you to start a new forum thread or a tech.io playground on the topic. Your contribution would be greatly appreciated!


I found the most difficulty in C/C++, Lua, Objective-C, Pascal, Perl, and PHP.

I had no issue solving this in PHP. You just have to remember to use the mbstring standard library functions and not the "normal" string functions.
One thing needed extra care: CG supports such an ancient PHP version that the “split Unicode string into letters” function (mb_str_split, added in PHP 7.4) is missing, but there is a simple documented workaround available.
My code is like this:

    if (version_compare(PHP_VERSION, '7.4.0', '<')) {
        $letters = preg_split('//u', $text, 0, PREG_SPLIT_NO_EMPTY) ?: [];
    } else {
        // @phpstan-ignore-next-line function.notFound
        $letters = mb_str_split($text);
    }

I am not sure whether my browser can display all diacritics correctly to allow me to copy the information correctly. Below is the image I can see from my browser.
dia1
Could someone check and confirm that this display is correct? (particularly on the Irish line)

It is correct:
image

Thanks nicola.
Then there is a further problem.
dia2

In the Moby-Dick test case, the Irish string contains a Latin capital letter I with acute (Í). This letter is not included in the basic information table.

The table includes the lower case letter i with acute (í). Is the puzzle assuming coders should convert all lower case letters with diacritics into upper case, or the reverse, themselves?

It is potentially difficult because not everyone has general knowledge of a dozen human languages.

  1. Nothing guarantees that whenever a lower case letter exists in a language, the upper case letter also exists.
  2. Could there be letters without an upper or lower case classification, so that forcing a case change causes an error? (Is the “dot above I” letter in Turkish upper case?)
  3. I am not sure whether all programming languages supported on CG have a .toUpperCase() function that works for all letters with diacritics.

It would be better for the table to be complete, with all possible forms of the letters, at least enough to cover all test cases.

Yes, you can translate the text into lowercase. Uppercase and lowercase exist in these languages written with Latin characters.
As far as I know, in Turkish, the uppercase of ı is I and the uppercase of i is İ.
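A rough sketch of that lowercase-first approach, in Python. The set below is illustrative only (the five acute vowels, as used in Irish) and is NOT the puzzle's actual table; `unknown_letters` is a made-up helper name.

```python
# Illustrative only: NOT the puzzle's real table, just five acute vowels.
ACCENTED_LOWER = set("\u00e1\u00e9\u00ed\u00f3\u00fa")  # á é í ó ú


def unknown_letters(text, table):
    """Non-ASCII letters of text that are missing from table after lowercasing."""
    return {
        ch
        for ch in text.lower()
        if ch.isalpha() and ord(ch) > 127 and ch not in table
    }


# "Í" (capital I with acute) is not in the table, but its lowercase form is.
capital_missing = "\u00cd" not in ACCENTED_LOWER
after_lowering = unknown_letters("\u00cdosa", ACCENTED_LOWER)
```

With this, a table that lists only lower case letters still covers a line that starts with a capital such as Í, since `after_lowering` comes out empty.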

Welcome to natural language processing! Yes, languages are complicated. The uppercase of German eszett (ß) is two characters SS, and Turkish differs in treatment of i as nicola described. As far as I know the only other exception in letter case is Greek sigma (Σ), which has two lower-case versions depending on the position in the word. But as far as language goes, we’re barely scraping the surface here!
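To see these exceptions concretely, here is a small Python sketch. It only shows how Python's built-in, locale-independent `str.upper()` and `str.lower()` behave; other languages on CodinGame may well differ, and the variable names are just for illustration.

```python
# German eszett uppercases to two characters.
eszett_upper = "\u00df".upper()

# Turkish dotless ı uppercases to a plain I.
dotless_upper = "\u0131".upper()

# Turkish dotted İ lowercases to 'i' plus a combining dot above (U+0307).
dotted_lower = "\u0130".lower()

# Without locale awareness, plain I lowercases to 'i', never to 'ı'.
plain_lower = "I".lower()

# Greek sigma: the word-final Σ becomes ς rather than σ.
sigma_word = "\u039f\u0394\u039f\u03a3".lower()  # "ΟΔΟΣ"
```

Note that `dotted_lower` is two code points long, so a naive "one letter in, one letter out" assumption fails even for lowercasing.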

Anyway, in this puzzle you can actually ignore uppercase letters and still pass all the tests and validators. However you choose to solve the puzzle will work, if you make it work. Personally I think it’s valuable to have real world challenges, and truly earn the Natural Language and Unicode badges. Speaking of which, what happened to those tags?

Edit: I’ve updated the description to include an example in English which at least hints at there being uppercase variants.

I was surprised when I looked thru the solutions, because the puzzle turns out to be a bit easier than intended. There’s another case that never made it into any of the tests or validators. I know I needed to cover both cases at one point during its creation, so I don’t know when exactly that changed. Thus far about 40 people have attempted it, and adding additional tests would only affect 30-some solutions. You all would hate me tho.

You can post your extra test cases here so that anyone can try them and improve their code, without affecting solutions that have already been submitted.


OK this is very weird but I may have been wrong about an additional case potentially being needed. It’s hard to talk about without giving spoilers, but basically, for the check that needs to be done (the one all the working solutions are doing), what’s being checked for always occurs, because the setup is nice by design and always has a solution. In situations where the algorithm might get stuck, there’s another check that can be done, which I had assumed would be needed as well. However, that other case appears to occur only when there isn’t a single solution. In these situations, the additional check will get you further than just the one, but ultimately, at least as far as I’ve been able to work out, there will be an unresolved ambiguity. I saw this a lot when designing the puzzle, because unresolvable scenarios kept popping up. I didn’t realize one check might actually be sufficient once those bad test cases were corrected or eliminated.