[Community Puzzle] Basic Decision Tree - 1


Send your feedback or ask for help here!

Created by @Wei-1,validated by @bbb000bbbyyy,@kotobotov and @Bibou.
If you have any issues, feel free to ping them.

You might want to explain what to do when you get a group of different species but same horn size.
The description only says to keep dividing until we get 0 entropy but since we can’t divide a group with all same horn size, we get stuck.
It took me a while to understand that was also a final condition.

Also, there are a few grammar mistakes in the description, like “step 1. we choice 1 beetle pupa” (choice -> choose).

Nice puzzle overall.

1 Like

Too bad that I read Mercy38’s comment a bit late. I had exactly the same issue with TEST 04, causing an extra hour of debugging, which could have been avoided with a better puzzle statement…

0 1 1
1 1 2

what should be the output for this input? :face_with_monocle:

My solver returns 0, if I add this as custom test case.

Mine returns None, lol

Thanks, I think 0 is fair enough.

If we select pupa-1 as the separator, pupa-0 will be separated into one sub-group, pupa-1,2,3 will be in another sub-group.
Pupa-1,2,3’s group have a binary entropy of 0.9183.

Based on the example I tried different sets of probabilities and finally found the correct approach.

In this case, there should only be two P in the bigger group (containing 3 samples pupa-1,2,3 ). The array of P is [1/3, 2/3] because 1/3 of the samples are of type 1 and 2/3 of them are of type 2 (the column 3 values).
Binary entropy is 1/3 * log2(1/3) + 2/3 * log2(2/3), got 0.91829

Making the probabilities correct is the most difficult part for me because P is not very accurately defined

This formula implies we should have a prior complete knowledge of all samples, and know all the classification results before calculation. It also means that we cannot add additional samples to the calculation midway.

Long time ago I touched upon speech recognition model training and have an impression that samples are often dynamically added to improve the training result. In practice does decision tree always use a static “god-mode” sampling?


I solved it with no calculations at all. 100%! I just looked at the pattern in the data.

I used Java, so you can look at my code.

I recommend increasing the number of validators and the data in each. Then solutions like mine would fail – forcing me to figure out the probability and then entropy, but I did none of that. Perhaps my solution is correct, more validator data would demonstrate that too.

Also, why are you not parsing the three values in the generated starter code? I had to start another puzzle and copy paste the parsing code from there.

I love this kind of puzzle, but this one needs some work in my opinion.