Clash of Code “shortest” competitions use the wrong measure of code size. They count Unicode code points rather than the size of the source in bytes.
The most annoying abuse of this that I’ve seen is the Python .encode() hack:
exec('[gobbledygook]'.encode('utf-16'))
…where the [gobbledygook] in the string literal comes from reinterpreting the bytes of the original source as UTF-16 (roughly b'[original source]'.decode('utf-16')). Each resulting code point packs two of the original characters.
I claim it’s a cheat, since actually representing the gobbledygook string takes 2-3 bytes per code point — the submission is shorter in code points but not in bytes.
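To make the trick concrete, here’s a quick sketch with a toy program of my own (the real clash submissions vary in detail, e.g. some use the 'u16' codec alias and slice off the byte-order mark, as shown here):

```python
# A sketch of the UTF-16 packing trick on a toy program.
src = "print(sum(range(10)))"          # 21 ASCII characters = 21 bytes

# Pad to an even length, then reinterpret the bytes as UTF-16-LE text:
packed = src.encode().ljust(len(src) + len(src) % 2).decode("utf-16-le")

print(len(src), len(packed))            # 21 code points -> 11 code points
print(len(packed.encode("utf-8")))      # ...but 33 bytes of UTF-8 "gobbledygook"

# The submission then unpacks and runs it; 'u16' aliases 'utf-16', and the
# [2:] strips the byte-order mark the encoder prepends (this assumes a
# little-endian platform, as the real trick does):
exec(bytes(packed, "u16")[2:])          # prints 45
```

Counting code points, the packed string halves the program; counting UTF-8 bytes, it grows it by roughly half again, which is exactly the discrepancy I’m complaining about.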
I’m putting this in Platform Evolution, since implementing the idea (if it ever happens) would mean changing both the scoring of clashes and the editor. I’d suggest it for the Code Golf competition area too, but there the potential impact is bigger, since it would change scores and standings on existing submissions.
Of less concern, but maybe still an issue: charging extra for the UTF-8 encoding of non-ASCII characters would be culturally biased toward English (or toward any other language that can use the 26-letter Roman alphabet; Hawaiian, for example). I say “less concern” because the programming languages used here all have an ASCII bias built in.
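For a sense of scale, here’s a comparison of code-point counts against UTF-8 byte counts for a few short words (sample words are my own picks):

```python
# Code points vs UTF-8 bytes for a few sample words:
for word in ["kahuna",    # plain Roman letters: 1 UTF-8 byte per character
             "ʻokina",    # leading U+02BB ʻokina: 2 UTF-8 bytes
             "λάμδα"]:    # Greek letters: 2 UTF-8 bytes each
    print(word, len(word), len(word.encode("utf-8")))
```

Under byte scoring, the Greek word costs twice what an equally long Roman-alphabet word does, which is the bias I’m acknowledging above.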