I just finished this puzzle and it seems like I took a substantially different approach than anyone else I see commenting.
I’m wondering if somebody than understands this puzzle better than me could comment on the computational complexity of my approach compared to what others’ have done? Is mine better? Is mine worse? Am I just doing the same thing as others with more steps?
I computed a finite state machine for parsing the morse code as a function of the word dictionary, then ran the morse string through my state machine. (Similar to regex compilation)
I forked the state whenever multiple moves were possible along the machine (i.e. a word ended, but more characters might still complete a longer word). I also forked the state whenever a state resulted in multiple local interpretations (ex: “ET” or “A”). I merged the state whenever multiple forks collided on the same state (different strings terminated at the same character). I kept track of how many machines had been merged together as I went.
At the end of the string, all valid interpretations would collide on the starting state of the machine, and I could count how many machines had been merged together on that state, which gave the number of valid interpretations.
If you merge the forks that collide into the same state, I’d say it’s equivalent to memoization (what most people used for this problem).
The purpose is the same : don’t launch the same exploration twice.
Pretty fun to solve, but the only reason this is a “Very Hard” problem is that the test cases suck Based on other replies, it looks like it’s easy to come up with a solution that works for most of the tests, but fails on the last one that’s WAY too big to work through manually to figure out what’s wrong. I’d recommend inserting another test case there to highlight what seems to be a very common error.
It was fun and instructive, I learnt the concept of memoizaiton (although I had used it before without knowing the name).
Just tere was something a little frustrating with the test cases.
My code had a mistake, when it found a matching word, it didn’t continue trying other words in the same level.
The frustrating thing is that this code passed all the testing tests, but it failed some of the validations tests. Therefore, it was hard to find the mistake, but I could do with the validation tests title.
This was the wrong code:
if (message.startsWith(w, start)) {
if (w.length == message.length - start) return 1; // WE SHOULD TRY THE REST OF WORDS!!!
levelCombinations += tryWords(message, words, combinations.concat([i]), start + w.length);
}
Hello
My solution gets 140737488355328 with @SinOfWrath 's example.
What I do at the beginning is store in the dictionary only the morse code of each word in the input, so when we reach to a root case where L[p:] == word-code, the number of possible solutions increases by R(word-code), being R(word-code) the number of times a word with the same morse code as word-code was declared while reading the dictionary input.
Looking into adding new IDE test cases for the puzzle, I think @SinOfWrath 's suggestions are great, I’m trying to see if it’s possible to improve them slightly.
The tests are as follow:
Big result (integer overflow gotcha):
Input:
..............................................
2
E
I
Expected output:
2971215073
This is basically the same as the original, but with dots and only 47 of them. I reduced the number of dots it to 47 so that the result is the 47th Fibonacci number, which is the first greater than the maximum positive value of a 32-bit signed integer (usual default when declaring variables as ‘int’). The hope is that this can ensure that the result, in the case where the count uses 32 bit signed integers, overflows once at the end and becomes negative, which I hope will be enough to point out the issue in the code.
The fact that it uses dots with I and E changes nothing in the end.
Ambiguous words (“words with the same morse code representation” gotcha)
Input:
-.-..---.-..---.-..--
5
CAT
KIM
TEXT
TREM
CEM
Expected output:
125
Same idea as for the second proposition, here, all words in the list are encoded as -.-..--, which is repeated 3 times in the input morse code. I am not sure about keeping the CEM and TREM words, as having the 3 others would already be enough. If these words are removed, the answer would become 27.
I am looking for feedback on these, so please tell me what you think about these test cases, whether you like them or not, and if not, why.
Regarding the ambiguous words test case. If you reduce the number of words to only 3, the result is 3^3 and based on my experience people get easily mistaken whether that’s lengthoptions or optionslength. Most people also tend to make a quick judgement call and code it immediately. Now that can cause some extra debugging for sure. I do realise that if you do it dynamically it should not be a problem though.
Also @Thetos, do you actually know someone who can add these test cases?
@SinOfWrath I’m not sure I understand what you are trying to say regarding the number of options, is it that you think keeping 5 options is better if there is 3 repeats ? or more generally, just not doing number of repeats == number of options ?
And yes, I do know people who can add these, such as myself :).
After all, I am currently an intern at CG. (Maybe I should have said that in the original post).
Second, I saw this is your first post and I thought you assumed we can edit this puzzle like we can with community contributions (after a certain level), hence my confusion. But you are an insider now.
And I do think number of repeats == number of options is making it harder, so 5 options, 3 repeats is better. The solution is both small and simple either way.
Lastly, and this is just me who has bad eyesight, I can count .s much harder than I can count -s, that’s why I suggested dashes but that it is really not a big deal. Have some fun naming the test cases.
Hi! The last test is not passing for me, too slow as well. For now I’m making a BFS of all indices that I’m reaching, and when the indice == string length, then I increment my “win counter” …
I’m memoizing the children for a sub sequence.
But it takes 8 minutes for the sixth test, compare to almost only 5ms for the rest.
So I checked the forum, and everybody is talking about trie algorithm. I’ve never used it, and I can’t understand how I could use it to solve this problem.
Can someone give me a hint to help me think about a solution using trie?
A trie is a type of data structure. You may read more about it here: https://en.wikipedia.org/wiki/Trie
It is more efficient than some other data structures or search algorithms.
You may also search further to understand it more.
Seeing that there are dozens of published solutions in PHP, I think it is doable. Maybe some of the discussion above will inspire you? Or maybe take a break and come back to the puzzle later with a fresh mind.
Could someone explain why is this marked as “very hard”?
I’m used to other “very hard” problems being highly abstract and taking me several hours and some amount of confusion to properly solved, this took no longer than a clash of code. Is memoization actually that little-known of a strategy? Or is there some other reason? Even that alone seems like it shouldn’t put it above hard if it is classified as an obscure strategy.
Hello,
I’m just wondering why in the third test (Simple message) i’m right when the output is 2 (solution : “HELL” and “HELLO”) and i’m wrong with the second test (Correction detection of a word) when the output is 2 (solution : “G” and “GOOD”) knowing that the expected output is 1 in this second test
Hello, I was stuck at 70% too in PHP yesterday, using a simple queue, but today I made a recursive function with memoization, and all the IDE tests are now below 0.4ms, except one who take an average 38ms. Got 100% in submission. My dictionnary is build with morse pattern of the word as key, and number of words for this pattern as value, if it can help. That fit in about 25 lines of (dirty) code.