https://www.codingame.com/training/hard/basic-decision-tree---1
Send your feedback or ask for help here!
Created by @Wei-1, validated by @bbb000bbbyyy, @kotobotov and @Bibou.
If you have any issues, feel free to ping them.
You might want to explain what to do when you get a group with different species but the same horn size.
The description only says to keep dividing until we get 0 entropy, but since we can’t divide a group where every sample has the same horn size, we get stuck.
It took me a while to understand that this was also a terminal condition.
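For anyone stuck on the same thing, here is a minimal sketch of the stopping test I ended up using. The sample layout and names below are my own guesses for illustration, not something taken from the statement:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a group becomes a leaf when its entropy is 0
// (all samples share one species) OR when it cannot be split any further
// because every sample in it has the same horn size.
class LeafCheck {
    // Each sample is {hornSize, species} (my own convention for this sketch).
    static boolean isLeaf(List<int[]> group) {
        boolean sameSpecies = group.stream().allMatch(s -> s[1] == group.get(0)[1]);
        boolean sameHorn    = group.stream().allMatch(s -> s[0] == group.get(0)[0]);
        return sameSpecies || sameHorn;
    }

    public static void main(String[] args) {
        // Same horn size (1) but different species (1 vs 2): still a leaf.
        List<int[]> stuck = Arrays.asList(new int[]{1, 1}, new int[]{1, 2});
        System.out.println(isLeaf(stuck)); // true
    }
}
```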
Also, there are a few grammar mistakes in the description, like “step 1. we choice 1 beetle pupa” (choice -> choose).
Nice puzzle overall.
Too bad I read Mercy38’s comment a bit late. I had exactly the same issue with TEST 04, which caused an extra hour of debugging that could have been avoided with a clearer puzzle statement…
2
0 1 1
1 1 2
What should be the output for this input?
My solver returns 0 if I add this as a custom test case.
Mine returns None, lol
Thanks, I think 0 is fair enough.
If we select pupa-1 as the separator, pupa-0 will be separated into one sub-group, and pupa-1,2,3 will be in the other sub-group.
Pupa-1,2,3’s group has a binary entropy of 0.9183.
Based on the example, I tried different sets of probabilities and finally found the correct approach.
In this case, there should only be two values of P in the bigger group (the one containing the 3 samples pupa-1,2,3). The array of P is [1/3, 2/3], because 1/3 of the samples are of species 1 and 2/3 of them are of species 2 (the column-3 values).
The binary entropy is -(1/3 * log2(1/3) + 2/3 * log2(2/3)) ≈ 0.91829.
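A minimal sketch of that calculation, just to make the probabilities and the minus sign explicit (the class and method names are mine, not from the puzzle):

```java
import java.util.Arrays;

// Binary (base-2) entropy of a group, computed from its per-class counts.
class EntropyDemo {
    static double entropy(int[] counts) {
        int total = Arrays.stream(counts).sum();
        double h = 0.0;
        for (int c : counts) {
            if (c == 0) continue;                 // treat 0 * log2(0) as 0
            double p = (double) c / total;
            h -= p * Math.log(p) / Math.log(2.0); // note the minus sign
        }
        return h;
    }

    public static void main(String[] args) {
        // Sub-group pupa-1,2,3: one sample of species 1, two of species 2.
        System.out.println(entropy(new int[]{1, 2})); // ~0.91829
    }
}
```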
Getting the probabilities right was the most difficult part for me, because P is not very precisely defined.
This formula implies that we should have complete prior knowledge of all samples, and know all the classification results, before the calculation. It also means that we cannot add more samples to the calculation midway.
A long time ago I touched on speech-recognition model training, and my impression is that samples are often added dynamically to improve the training result. In practice, do decision trees always use a static, “god-mode” sampling?
Wei-1,
I solved it with no calculations at all. 100%! I just looked at the pattern in the data.
I used Java, so you can look at my code.
I recommend increasing the number of validators and the amount of data in each. Then solutions like mine would fail, forcing me to actually work out the probabilities and then the entropy, which I never did. Perhaps my solution is correct; more validator data would demonstrate that too.
Also, why are you not parsing the three values in the generated starter code? I had to start another puzzle and copy-paste the parsing code from there.
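For reference, here is roughly the parsing I had to paste in myself. I’m assuming each line holds an index, a horn size and a species, which is my reading of the statement, so treat the column names as an assumption:

```java
import java.util.Scanner;

// Rough sketch of the input parsing the starter code could have provided:
// first line is N, then N lines of three integers each.
class Solution {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        int n = in.nextInt();
        int[] index = new int[n], horn = new int[n], species = new int[n];
        for (int i = 0; i < n; i++) {
            index[i]   = in.nextInt();
            horn[i]    = in.nextInt();
            species[i] = in.nextInt();
        }
        // ... decision-tree logic goes here ...
    }
}
```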
I love this kind of puzzle, but this one needs some work in my opinion.