Well, I think you can gain some insight even without fully understanding the code at the byte level.
I guess I'd only try doing it with some heuristic functions
(maybe something like sentiment analysis on code?)
and go for patterns or just plain strings (aiming for variable names) first,
instead of full-blown parsing. But let's maybe play with some numbers…
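As a rough illustration of the "plain strings first" idea, here is a minimal Python sketch. It is only an assumption of how one might start, not a worked-out method: it pulls identifier-like tokens out of source text with a regular expression and counts them, skipping a small hand-picked set of c++ keywords. The `KEYWORDS` set, the `identifier_counts` helper and the sample snippet are all hypothetical.

```python
import re
from collections import Counter

# Identifier-like tokens: a letter or underscore, then letters, digits or underscores.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Hypothetical, deliberately tiny stop list of c++ keywords/types; a real one would be longer.
KEYWORDS = {"int", "for", "if", "else", "while", "return", "void", "const",
            "auto", "struct", "class", "include", "using", "namespace", "std"}

def identifier_counts(source: str) -> Counter:
    """Count identifier-like strings in source code without parsing it."""
    return Counter(tok for tok in IDENT.findall(source) if tok not in KEYWORDS)

# Hypothetical sample input.
snippet = """
int bestScore = 0;
for (int depth = 0; depth < maxDepth; ++depth) {
    int score = simulate(grid, depth);
    if (score > bestScore) bestScore = score;
}
"""

print(identifier_counts(snippet).most_common(3))
```

Note that this also picks up words inside comments and string literals; for a first heuristic pass that may even be useful, since comments often name what the code is doing.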
Let's take as an example the language usage from the last contest, "Smash the Code":
A total of 2493 entries (according to the leaderboard).
Looking at the top 20, the counts per language (in order of first appearance when sorted by rank):
15x c++, 1x go, 1x java, 1x c#, 1x c, 1x objc
(I would maybe start with a c++-only evaluator at this point o.O)
And the languages' best placements:
#1 c++, #2 go, ... #15 java, #16 c#, #19 c, #20 objc ...
All places in between are filled by c++ entries.
On a side note: the first python3 entry comes in at #47, followed by the first js entry at #61.
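The top-20 breakdown can be reproduced with a few lines of Python. This sketch rebuilds the top 20 from the placements quoted above (c++ in every slot except #2, #15, #16, #19 and #20), then counts languages and records each language's best rank; the `others`, `top20` and `best` names are just illustrative.

```python
from collections import Counter

# Top 20 reconstructed from the quoted placements:
# c++ everywhere except go at #2, java at #15, c# at #16, c at #19, objc at #20.
others = {2: "go", 15: "java", 16: "c#", 19: "c", 20: "objc"}
top20 = [others.get(rank, "c++") for rank in range(1, 21)]

counts = Counter(top20)  # insertion order = order of first appearance by rank

best = {}
for rank, lang in enumerate(top20, start=1):
    best.setdefault(lang, rank)  # keep only the first (best) rank per language

print(", ".join(f"{n}x {lang}" for lang, n in counts.items()))
# prints: 15x c++, 1x go, 1x java, 1x c#, 1x c, 1x objc
print(best)
```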
The top 8 languages by number of entries:
language: c++, java, c#, py3, js, c, py2, php
their counts: 532, 528, 295, 279, 205, 157, 139, 101
accumulated sums: 532, 1060, 1355, 1634, 1839, 1996, 2135, 2236
and their percentage of all 2493 entries (truncated to one decimal): 21.3, 42.5, 54.3, 65.5, 73.7, 80.0, 85.6, 89.6
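The running sums and percentages above are easy to recompute from the counts. A small Python sketch follows; note that the listed percentages match truncation rather than rounding (1355/2493 is 54.35%, listed as 54.3), so the sketch truncates with integer division.

```python
from itertools import accumulate

TOTAL = 2493  # entries on the leaderboard
langs  = ["c++", "java", "c#", "py3", "js", "c", "py2", "php"]
counts = [532, 528, 295, 279, 205, 157, 139, 101]

acc = list(accumulate(counts))  # running total of entries covered
# 1000 * a // TOTAL gives tenths of a percent, truncated; divide by 10 for percent.
pct = [1000 * a // TOTAL / 10 for a in acc]

for lang, a, p in zip(langs, acc, pct):
    print(f"{lang:>4}: {a:5d} {p:5.1f}%")
```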
(for some reason the markdown here has no tables…)
So you can choose what (expected) coverage you'd like to see,
but I guess I would go by the best-ranked solutions rather than by popularity.
Let's also see what the best rank of the remaining, not-yet-covered languages is.
Top 10 languages by best rank:
language: c++, go, java, c#, c, objc, py3, js, lua, py2
best # : 1, 2, 15, 16, 19, 20, 47, 61, 125, 130
beatTop: 1, 14, 15, 18, 19, 46, 60, 124, 129, 133
acc.cnt: 532, 571, 1099, 1394, 1551, 1557, 1836, 2041, 2049, 2188
percent: 21.3, 22.9, 44.0, 55.9, 62.2, 62.4, 73.6, 81.8, 82.1, 87.7
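Of course, the "beatTop" number is purely hypothetical: it is how many top places you would beat
if analysing somehow guaranteed a win against any and all solutions in the languages covered so far (that language and every one to its left), i.e. all places above the best entry of the next language.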
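Reading the numbers, each beatTop value is the next language's best rank minus one, and acc.cnt is a running sum in ranking order. The Python sketch below recomputes both rows under that reading. The entry counts for go, objc and lua are not listed directly, so they are derived from the acc.cnt row (571 - 532, 1557 - 1551, 2049 - 2041); the others come from the popularity list. The 11th language's best rank is not given, so py2's beatTop is left open.

```python
from itertools import accumulate

TOTAL = 2493
langs = ["c++", "go", "java", "c#", "c", "objc", "py3", "js", "lua", "py2"]
best  = [1, 2, 15, 16, 19, 20, 47, 61, 125, 130]
# go, objc and lua counts derived from the acc.cnt row; the rest from the popularity list.
count = [532, 39, 528, 295, 157, 6, 279, 205, 8, 139]

# Places above the best entry of the next (not yet covered) language.
beat_top = [nxt - 1 for nxt in best[1:]]
acc = list(accumulate(count))
pct = [1000 * a // TOTAL / 10 for a in acc]  # truncated to one decimal, like the post

for i, lang in enumerate(langs):
    bt = beat_top[i] if i < len(beat_top) else "?"  # py2 needs the 11th language's rank
    print(f"{lang:>4}: best #{best[i]:<4} beatTop {bt!s:>4} acc {acc[i]:5d} {pct[i]:5.1f}%")
```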