Genome Sequencing puzzle discussion

I too would like to know what the “Multi-Solutions” actually test for? Is it that N is larger than 3? I can see that the limitations are 6 but how would you go about such a genome? When the sequence must contain all other sub-sequences

I was stuck on this one (“Almost superimposed”) for quite some time, and now I feel a bit stupid about it… if anyone’s having trouble passing this test case, I recommend the following custom case:

3
AAC
TC
TTAACCTT

Shortest string will be “TTAACCTTC” (9). Hope this helps!

2 Likes

The strange thing is - my code does return 9 on this testcase. And still fails in “Almost Superimposed”.

I need to read through the code and debug that code to figure out what is wrong.

I did end up writing it from scratch and solved it 100%.

Kudos to the codingame team for the history of submissions. I can revisit my old code and find out what went wrong. Will get on it once I go through all the other games.

My algorithm passed all the tests except
7. Sequence included
8. Sequence included, reverse order

Could anyone please suggest test inputs for this cases?
I’ve already tried the cases mentioned in this topic and they are all OK.

I’ve found the flaw in my logic and now it works fine.
It turns out I had incorrect priorities when deciding how to concatenate strings (I checked first if they intersect and then if they did not, I checked for inclusion, which was incorrect)

1 Like

I’m bit disappointed. I spent a lot of time trying to develop solution using dynamic programming with multi-dimension arrays. But this is not needed as simplest solution works as well, even notoptimized.

Thanks a lot man, i was stuck on this particular validator and your test case helped me find my mistake.
Cheers.

I also failed this test case only. The input that helped me is:
3
TAA
TTB
TAACCTT

and output must be:
TAACCTTB which is 8

2 Likes

Glad I could help! :slight_smile:

I am looking for Test Cases for:
-Multi-solutions
-Mirror, reverse order
-Mirror
-Example 3, but with the first two words nested

Thank you, was only missing this test, it made my day :slight_smile:

Hello! it’s possible to show some exemple for test 13 (Sequences included and disjointed) plz ?

Best test case for “Almost superimposed”, got me to 100%. Thanks!

Great task. But that A+C=AC should be included in the tests: don’t make people guess what’s wrong when they have 100% tests and 93% validations passed. Especially when the problematic validator has such misleading name (AC is input too, not output).

My code also fail on “Sequences included and disjointed” all test case from previous reply are here
[‘BACA’, ‘GATTACA’, ‘TT’] = 11
CA, TCAT, CAGG = 8
ATAC, TAC, GT = 6

I think it’s all correct, right ?
Have no idea anymore.

3 Likes

Hey, I’m mostly curious about whether if I solve this problem properly, would I actually be solving a real problem.

I mean is this genome sequencing problem like something that real geneticists deal with?

3
TT
AA
ACT
I think this gives TTAACT which length is 6 but the testcase output is 5
Can someone please explain me this!!!

The explanation is in the name of the test case :wink:

:sweat_smile: :roll_eyes: :roll_eyes:
All my testcases are Ok but this one :sneezing_face: :sneezing_face: :sneezing_face:

I have completed this puzzle today, with 2 sessions of 1h and 4h of coding. I thought i was going to spend the entire night (or the entire week lol).

The key of my program as the function finding the longest common substring problem between two strings.

I start by storing all the words given in a list, and while this list has more than 1 element, I pop the last two words and determine their LCS, then i concatenate the two words if the LCSP is the prefix of one and the suffix of the other. Ex: LCS(TGACA,CATGA) returns TGA, and the word concatenated is CATGACA. Then i put this word in the list and start again (unless it’s the only word).

Test 7 failed. Let’s see why. The words given are [CCCTG,TGACA,CATGA].
We pop CATGA and TGACA, and get the LCS TGA, so the list become [CCCTG, CATGACA].
LCS(CCCTG,CATGACA) is the empty word so we must concatenate them into CCCTGCATGACA, with a length of 12 characters.

I first failed to notice that this exemple (given by test 7) can in fact produce another concatenated word (TGA[CA]TGA). So the list would then be [CCCTG, TGACATGA], and LCS(CCCTG,TGACATGA) = TG, and since it’s a prefix of CCCTG and a prefix of TGACATGA, we can concatenate them into CCCTGACATGA, with a length of 11.

So we have to check all combinations because going greedy here does not always give the minimum.
In the case of test 7, there are now 2 instances of the problem : [CCCTG, TGACATGA] and [CCCTG, CATGACA]. We have to evaluate both of them and keep the shortest superstring.

Of course, we also have to check if the LCS is one entire word: LCS(ATG,TG) = TG so we can keep ATG as the concatenated word.

Sorry for potentially bad english.

1 Like