For reasons that have been discussed at length on Discord, I believe the contest leaderboard that has been added in the last website update is largely inaccurate and unrepresentative of actual contest performance.
Out of curiosity, I decided to directly apply Kaggle’s battle-tested competition ranking formula to CG contests, and you can see the resulting leaderboard for yourself. It still features scaling with the number of participants and time decay, but in a more nuanced manner. In my opinion, it is much more interesting and accurate than what’s currently online, so I thought others might be interested in taking a look as well.
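For the curious, here’s roughly what that formula boils down to — a minimal Python sketch using the constants from Kaggle’s 2015 ranking announcement; treating every CG participant as a one-person team is my own simplification:

```python
import math

def contest_points(rank, n_players, days_since_end, team_size=1):
    """Kaggle's 2015 competition ranking formula, applied to one contest.

    rank           -- final placement in the contest (1 = winner)
    n_players      -- number of participants ("teams" in Kaggle terms)
    days_since_end -- days since the contest finished (time decay)
    team_size      -- always 1 on CG, kept for fidelity to the original
    """
    return (100000 / math.sqrt(team_size)            # base, split across a team
            * rank ** -0.75                          # steep decay by placement
            * math.log10(1 + math.log10(n_players))  # reward bigger fields
            * math.exp(-days_since_end / 500))       # points fade over time

# A player's leaderboard score is then just the sum over their contests:
# sum(contest_points(r, n, d) for r, n, d in player_history)
```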
Please pardon my poor and rushed HTML skills.
Enjoy!
EDIT: Now with bots and optims leaderboards, but without time decay for those.
Would you be able to run it for the top 500 or 1000? That might be useful to compare how it behaves in that range, and it would cover more people too.
The medals are a nice touch. Minor nitpick: it’s somewhat hard to read visually, because a bronze medal (when there’s no gold) will stack with a gold one, and because repeat counts aren’t visual. It might work better to tabulate them (a column for gold, one for silver, …) and/or to repeat the medal instead of showing a repeat count (rough sketch after the example):
1🥇 3🥈 2🥉 2🏅
becomes
🥇 🥈🥈🥈 🥉🥉 🏅🏅
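A tiny sketch of the rendering idea (the helper name and the fourth icon are placeholders of my own choosing):

```python
# Hypothetical helper: render medal counts as repeated icons instead of
# "count + icon" pairs. The icon set is an illustrative placeholder.
MEDALS = ["🥇", "🥈", "🥉", "🏅"]

def render_medals(counts):
    return " ".join(icon * n for icon, n in zip(MEDALS, counts) if n)

print(render_medals([1, 3, 2, 2]))  # 🥇 🥈🥈🥈 🥉🥉 🏅🏅
```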
Both new options are only marginally better, so I guess for me either is fine compared to what’s currently in place.
However, I cannot consider any of CG’s proposed formulas to be oriented towards proper competitiveness, and a good compromise seems unlikely at this point. So I will be looking to turn what I’ve posted into an “official” 3rd-party website updated automatically, much like others have made on various topics. Personally, that’s all I will pay attention to.
Thanks for the honest conversation though, it’s much appreciated.
The Kaggle formula seems to only care about the top 0.1%; below that it’s misery. In these leaderboards the 1st player has something like 3x or 4x the points of the 10th player (especially optim and multi). Thibaud’s proposal seems to have a less exaggerated decay. And the current formula also has a less steep curve, giving the 1st player between 1.16x and 1.38x the points of the 10th.
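To put numbers on that steepness (assuming the rank^-0.75 term from Kaggle’s 2015 formula): within a single contest every other factor is identical for all players and cancels out, so 1st place has p^0.75 times the points of place p:

```python
# 1st place vs p-th place within one contest, rank term rank**-0.75 only
for p in (2, 3, 10, 100):
    print(f"1st has {p ** 0.75:.2f}x the points of place {p}")
# 2 -> 1.68x, 3 -> 2.28x, 10 -> 5.62x, 100 -> 31.62x
```

The 3x–4x seen on the aggregated leaderboards is milder than the per-contest 5.6x only because each player’s total sums several contests.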
If the Excel proposal fits the three main goals (pushing players to participate, letting new players pass retired veterans, and rewarding the top 20%), I see it as the preferable option.
Not sure I understand your perspective. That only says something about the final point distribution, but nothing about the stated goals (push players to participate, new players can pass “retired” veterans, etc.), about components like time decay and rank rewards, or about the actual ordering mattering more than the score.
The way you portray Kaggle with these curves makes breaking in seem impossible, while in fact, if you look at the details, you can see good counterexamples like karliso reaching the top 5 with only 2 contests.
I don’t know, I think it’s misleading putting it that way.
Yes, I’m talking about the point distribution only. Having such a big difference between the 1st and even the 2nd player on a leaderboard doesn’t feel right. Leaderboards have inherent noise (even Kaggle had challenges that people “won” by something like 2.5 pixels out of 3.2 million pixels), and remember that a leaked bot placed both 30th and 70th with the exact same code.
Karliso has “only” 2 out of 3 challenges; I think teccles is a better example: 4 challenges, three of them 2nd places, and he’s on top of the leaderboard, both on the Kaggle formula and on Thibaud’s. Imho karliso should be in the global top 5 once he gets another good score and completes his 3 challenges.
The main purpose of a leaderboard is first and foremost to give as correct a ranking ordering as possible, according to whatever criteria. I could easily scale the exponential component of that curve down while keeping the same ordering, and it still wouldn’t tell you how easy or hard it is for someone new to break in, the importance of time falloff, etc. But I guess it would look fairer?
From my perspective it’s not right to expect a leaderboard like that to express how much “better” someone is relative to someone else. It’s not meant to be like TrueSkill. It’s a bit like sorting on two keys by scaling the more important one by 100 or 1000 (see the sketch below). What matters is the result, which is correctly prioritizing the ordering according to several criteria that sometimes conflict with each other. If the ordering is considered right, and the difficulty of breaking in or falling off is considered right, why should the points delta matter?
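To illustrate that two-keys analogy with a made-up example:

```python
# Toy example: (name, primary key, secondary key). Scaling the primary
# key by 100 makes it dominate the sort entirely; the resulting point
# gaps say nothing about skill gaps, only about which key decided.
players = [("A", 3, 97), ("B", 3, 12), ("C", 2, 99)]
for name, hi, lo in sorted(players, key=lambda p: p[1] * 100 + p[2],
                           reverse=True):
    print(name, hi * 100 + lo)  # A 397, B 312, C 299 — the order is the point
```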
And sure, some contests have more inherent ranking noise than others, though I would argue the ordering is more often right than wrong. I don’t think equalizing that noise everywhere by completely smoothing out any available contrast, like CG does, is the way to go.
Anyway, I get the feeling we won’t agree. I’ve also given up on changing CG’s mind; the current formula is clearly not going away. I’m hoping that hosting this alternative formula will satisfy the more competitively minded people, and that in the end we’ll have a win-win situation for everyone.