[Community Puzzle] Codongame - Puzzle discussion


Send your feedback or ask for help here!

Created by @Rafarafa, validated by @JeZzElLutin, @Andriamanitra and @kayou.
If you have any issues, feel free to ping them.

Hi,

Should the following be added to the statement?
- The START codon (AUG) does generate ‘M’, which has to be added to the string even if no stop codon is present afterwards.

I’m sorry but that should not hold true unless I missed something.

Note that the sequences are only stored when scanning a stop codon (step 2). That implies that if, for a given index, the translation process terminates in an OPENED state, the current sequence is lost.
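A minimal Python sketch of that rule, for illustration only. The name `translate_frame` and the compact table construction are my own assumptions: the table is the standard genetic code, which I assume matches the `codon_table` given in the puzzle. Only the CLOSED/OPENED behaviour comes from the statement.

```python
from itertools import product

# Standard genetic code in UCAG order; '*' marks the stop codons.
# Assumption: the puzzle's own codon_table maps codons the same way.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

START, STOPS = "AUG", {"UAA", "UAG", "UGA"}


def translate_frame(rna, start):
    """Return the sequences completed when reading rna from index start."""
    sequences, current, opened = [], "", False
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if not opened:
            if codon == START:           # only AUG opens a sequence
                opened, current = True, CODON_TABLE[codon]  # AUG itself adds 'M'
        elif codon in STOPS:             # a stop codon stores the sequence
            sequences.append(current)
            opened, current = False, ""
        else:
            current += CODON_TABLE[codon]
    # Anything still OPENED here is discarded: it was never stored.
    return sequences


print(translate_frame("AUGAAUUGA", 0))  # ['MN']
print(translate_frame("AUGAAU", 0))     # []  (no stop codon, so 'MN' is lost)
```

In the second call the translation ends in the OPENED state, so nothing is returned, not even the ‘M’ from the start codon.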

Do you have an example of a test sequence whose solution does what you claim?

OK, I figured out my issue: I missed that the start codon actually adds a symbol to the string, so my code ended up with quite strange outputs.

My comment is mistaken, thank you for your quick feedback anyhow.


LOL, test case #5 even reveals Rick’s DNA… How did you get that?

Years of careful investigation.

I think it doesn’t hurt if I post the generator for those here: Generator - Pastebin.com

How can UGAUAAUGAAUGUGA give M?
AUGAAUGUGA starts with AUG but then AAU gives N, GUG brings V and a single A remains. Thus, there is no terminating sequence.

UGAUAAUGAAUGUGA

U-GAU-AAU-GAA-UGU-GA (No start codon => None)
UG-AUA-AUG-AAU-GUG-A (No end codon => None)
UGA-UAA-UGA-AUG-UGA- (M)
             |   |
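
The same three splits can be reproduced with a short Python sketch. The helper names `split_frame` and `has_closed_sequence` are made up for this post; the check only tells whether a frame contains an AUG followed later by a stop codon, it does not translate anything.

```python
STOPS = {"UAA", "UAG", "UGA"}


def split_frame(rna, offset):
    """Split rna into whole codons starting at offset; leftovers are dropped."""
    return [rna[i:i + 3] for i in range(offset, len(rna) - 2, 3)]


def has_closed_sequence(codons):
    """True if some AUG is followed later by a stop codon."""
    if "AUG" not in codons:
        return False
    first_start = codons.index("AUG")
    return any(c in STOPS for c in codons[first_start + 1:])


rna = "UGAUAAUGAAUGUGA"
for offset in range(3):
    codons = split_frame(rna, offset)
    print(offset, "-".join(codons), has_closed_sequence(codons))
# 0 UGA-UAA-UGA-AUG-UGA True
# 1 GAU-AAU-GAA-UGU False
# 2 AUA-AUG-AAU-GUG False
```

Only offset 0 has a start codon that is eventually closed, which is where the M comes from.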

For test = "CAUGAUGAUGUGACAUGUGAAUGUGACAUGUGA" (“4. Correct lengths”)

  • CAU GAU GAU GUG ACA UGU GAA UGU GAC [AUG UGA] => [‘M’]
  • [AUG AUG AUG UGA] CAU GUG AAU GUG ACA UGU GA => [‘MMM’]
  • UGA UGA UGU GAC [AUG UGA] [AUG UGA] CAU GUG A => [‘M’, ‘M’]

But it expects ‘MMM’ and not ‘M-M’, although the statement says:

For all three starting indices return the translation that yields the most amino acids. If that translation consists of multiple sequences, return them joined by a -.

MMM contains 3 M’s
M-M got 2 only

[‘MMM’] counts as 3 although it is one…

MMM is one sequence containing 3 amino acids
We know each M is an amino acid because it comes from a given table

The codon_table contains every codon and their corresponding amino acid.

To pick one from multiple possible answers we count amino acids.

return the translation that yields the most amino acids

I read every detail of the statement many times (without going to external resources) to come up with this understanding. Overall it is a well written statement. It is relatively hard to comprehend because the subject matter is itself complicated.
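
As a small illustration of that reading, here is a Python sketch that picks the translation with the most amino acids and joins its sequences with "-". The function names are hypothetical, and the three lists are the ones from the question above.

```python
def amino_acid_count(sequences):
    """Total number of amino acids (letters) across all sequences."""
    return sum(len(seq) for seq in sequences)


def best_translation(translations):
    """Pick the translation with the most amino acids and join it with '-'."""
    return "-".join(max(translations, key=amino_acid_count))


translations = [["M"], ["MMM"], ["M", "M"]]
print(best_translation(translations))  # MMM
```

[‘MMM’] wins with 3 amino acids against 2 for [‘M’, ‘M’], even though it is a single sequence.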


To complement @java_coffee_cup's explanations, there is also this line in the constraints:

For each rna string a (non empty) solution exists and is guaranteed to be unique.

In the same test: “4. Correct length”, the three translations of the first rna are:

[‘M’, ‘M’]
[‘M’, ‘M’]
[‘MMM’]

Only the number of amino acids yields a unique solution. If you use either the number of sequences or, let’s say, the final “-” joined string, the solution is no longer unique.
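
A quick Python check of that uniqueness argument on the three translations listed above. The `winners` helper is made up for this post, and reading “the final joined string” as its length is my own interpretation.

```python
translations = [["M", "M"], ["M", "M"], ["MMM"]]

# Three candidate ranking criteria; only the first one is what the statement asks for.
criteria = {
    "amino acids": lambda seqs: sum(len(s) for s in seqs),
    "sequences": len,
    "joined length": lambda seqs: len("-".join(seqs)),
}


def winners(translations, key):
    """Indices of all translations that reach the maximum score for key."""
    scores = [key(t) for t in translations]
    return [i for i, s in enumerate(scores) if s == max(scores)]


for name, key in criteria.items():
    print(name, winners(translations, key))
# amino acids [2]
# sequences [0, 1]
# joined length [0, 1, 2]
```

Only the amino acid count leaves a single winner.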

I do not understand why test case #4, “Correct length”, line 1:

AUGAUGAUGUGACAUGUGAAUGUGACAUGUGAAUGUGA

expects

MMM

I do find that when starting at 0, but starting at 2 my computation yields an 8-amino-acid result as follows (STDERR and STDOUT mixed together):

GUG CLOSED->OPENED
GUG -> V
ACA -> T
UGU -> C
GAA -> E
UGU -> C
GAC -> D
AUG -> M
UGA OPENED->CLOSED => VTCECDM
AUG CLOSED->OPENED
AUG -> M
UGA OPENED->CLOSED => M
AU GAU GAU GUG ACA UGU GAA UGU GAC AUG UGA AUG UGA
rna=AUGAUGAUGUGACAUGUGAAUGUGACAUGUGAAUGUGA start=2 str=VTCECDM-M count=8
VTCECDM-M

What did I miss?

Regards,

Only AUG lets you transition to the OPENED state.
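
To see the difference on that exact string, here is a rough Python sketch where the set of start codons is a parameter. The `translate` function and the partial table are assumptions of mine: the table only covers the codons that appear when starting at index 2, with values taken from the standard genetic code.

```python
# Partial table: only the codons that occur in this frame (standard genetic code).
TABLE = {"GAU": "D", "GUG": "V", "ACA": "T", "UGU": "C",
         "GAA": "E", "GAC": "D", "AUG": "M"}
STOPS = {"UAA", "UAG", "UGA"}


def translate(rna, start, start_codons):
    """Translate one frame; start_codons decides what may open a sequence."""
    sequences, current = [], None        # current is None while CLOSED
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        if current is None:
            if codon in start_codons:    # CLOSED -> OPENED
                current = TABLE[codon]
        elif codon in STOPS:             # OPENED -> CLOSED, store the sequence
            sequences.append(current)
            current = None
        else:
            current += TABLE[codon]
    return sequences


rna = "AUGAUGAUGUGACAUGUGAAUGUGACAUGUGAAUGUGA"
print(translate(rna, 2, {"AUG"}))         # ['M', 'M']
print(translate(rna, 2, {"AUG", "GUG"}))  # ['VTCECDM', 'M']
```

With AUG as the only start codon, starting at 2 gives M-M (2 amino acids), so the MMM found at index 0 is the best translation.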

Thank you. Did the Wikipedia reference table change since the puzzle was written?

https://en.wikipedia.org/wiki/DNA_and_RNA_codon_tables#Standard_RNA_codon_table

mentions GUG as a possible initiation codon, just like UUG and AUG

I don’t know, but when a rule is explicitly stated in the statement, the puzzle should be based on the stated rule first. And in this case, it’s stated:

Let’s also name UAA, UAG and UGA the stop codons, and AUG the start codon.

Oh, I had missed that. I relied on the default code that mentioned stop codons but not the start codon(s). That and mentioning a contradicting reference were the two confusing bits for me.

I am so confused as to what output you want here.

I tried outputting the longest sequence that can be read from any of the three reading frames.

I tried outputting three sequences, the longest one for each frame.

How else are we meant to interpret “For all three starting indices return the translation that yields the most amino acids.”?

It definitely doesn’t seem possible to get the four different lengths in “MNEVER-MGQNNA-MGIVE-MYQV-MVP”

Which case are you referring to, and could you explain the results you obtained?