Genome Sequencing puzzle discussion

3
TT
AA
ACT
I think this gives TTAACT which length is 6 but the testcase output is 5
Can someone please explain me this!!!

The explanation is in the name of the test case :wink:

:sweat_smile: :roll_eyes: :roll_eyes:
All my testcases are Ok but this one :sneezing_face: :sneezing_face: :sneezing_face:

I have completed this puzzle today, with 2 sessions of 1h and 4h of coding. I thought i was going to spend the entire night (or the entire week lol).

The key of my program as the function finding the longest common substring problem between two strings.

I start by storing all the words given in a list, and while this list has more than 1 element, I pop the last two words and determine their LCS, then i concatenate the two words if the LCSP is the prefix of one and the suffix of the other. Ex: LCS(TGACA,CATGA) returns TGA, and the word concatenated is CATGACA. Then i put this word in the list and start again (unless it’s the only word).

Test 7 failed. Let’s see why. The words given are [CCCTG,TGACA,CATGA].
We pop CATGA and TGACA, and get the LCS TGA, so the list become [CCCTG, CATGACA].
LCS(CCCTG,CATGACA) is the empty word so we must concatenate them into CCCTGCATGACA, with a length of 12 characters.

I first failed to notice that this exemple (given by test 7) can in fact produce another concatenated word (TGA[CA]TGA). So the list would then be [CCCTG, TGACATGA], and LCS(CCCTG,TGACATGA) = TG, and since it’s a prefix of CCCTG and a prefix of TGACATGA, we can concatenate them into CCCTGACATGA, with a length of 11.

So we have to check all combinations because going greedy here does not always give the minimum.
In the case of test 7, there are now 2 instances of the problem : [CCCTG, TGACATGA] and [CCCTG, CATGACA]. We have to evaluate both of them and keep the shortest superstring.

Of course, we also have to check if the LCS is one entire word: LCS(ATG,TG) = TG so we can keep ATG as the concatenated word.

Sorry for potentially bad english.

1 Like

Sure, if you solve it, you can clone people… in Python. :shushing_face:

Another test that helped me to hunt down a bug in my logic was this one:

5
CAT
TAG
GTA
CATAGT
CATAGTA

Output: 7

General advice
If you suffer from not knowing which case exactly breaks your solution you can consider building a test case generator with reverse logic: generate a sequence with a random (but known) length, then grab random substrings from this full sequence and your test case is ready. Make sure your substrings cover the whole sequence though.
You can wrap this into a loop and feed generated cases to your main function (solution) to test if you evaluate correctly.
When it fails - you know what to debug.

1 Like

Hello, I can’t figure out what to do to pass test 10 “example 3 (identical sequences) natural order”.
May I ask the help of the community please?
I checked the forum and no one seem to have encountered a problem with this test.

As the puzzle was created by CG, I have no idea what the validator looks like. You may want to check the general advice given in this thread, e.g. (just randomly chosen, so not limited to the two links below)
https://forum.codingame.com/t/genome-sequencing-puzzle-discussion/24/5
https://forum.codingame.com/t/genome-sequencing-puzzle-discussion/24/89

I finally triggerred manually the right input to get the bug out.

Hi CG! My score is 86% with to remaining tests to validate:

  • test 7: Sequence included
  • test 8: Sequence included, reverse order
    I thought it was the same logic as test 4 (AGATTA - GAT) and test 5 (GAT - AGATTA) but it is not.
    Can you please give me input/output example for these 2 tests?

Does this help?
https://forum.codingame.com/t/genome-sequencing-puzzle-discussion/24/72

Not really, do you have example of input/output?
I don’t know what needs to be done here compare to test 4 and 5

Sorry, I don’t have any example either. This puzzle was created by CG instead of the community, so I don’t have any access to the validators.

You may consider reading all the posts in this discussion and see whether other hints help you.

Yes check out genome assembly Sequence assembly - Wikipedia
When you sequence DNA, you get lots of small fragments of DNA that you have to string together to have a full sequence.

It’s a lot more complicated IRL (because of repeats, the fragments are a lot longer, you might have lots of reads mapping to the same place on the genome, etc) but i think this is a good analogy for what you’re actually doing

Hello,
I do not know if it is disjointed or not, but my code failed trying to parse this sequence:

HINT: instead of contain, use endwith…

CGT
GTAT
ATTCA
CCGTAT
TATCA

1 Like

I have only failed in validation test 13 “Sequences included and disjointed”
what does that mean can i have the inputs for that test?

Only the CG staff know what that validator is. As to your question, you may search and refer to the previous discussion above, as some players gave suggestions and hints regarding that validator.

Ok i have found the problem
Thanks.

Hi,

I’m stucked at “Sequences included and disjointed”. I don’t understand what to do with this test. I can’t get a clue of the test case to highlight my code issue.
I already tested all sequences that have been posted here.

I covered A contains fully B, B contains fully A, A ends with B (partially ), B starts with A (partially )

@Doc415 ,
As you found the issue on your side, could you share a sample of sequence to hightlight the issue? thanks,

I am on a trip now and i dont have a computer with me. As far as i can remember i added two conditions to solve it. The solution should include

A starts with B or
B starts with A
A ends with B or B ends with A
Both should also be checked

This is all i can remember. If that is not enough, i would like to help when i go back home in few weeks. Good luck