I apologize if this was already explained somewhere, but I couldn't find it on the forums (I'm still feeling a bit lost here).
I looked a bit at the scoreboard while the last tests were finishing, and it looked like you just added more tests without reducing the variance in the formula (or reducing the weight of each game). If I understand correctly, for the people at the top (including me), the current (final?) ranking really depends only on the last few games, since everyone at the top has a rather similar win ratio. Locally, I needed around 2-3K games to decisively tell that an AI with a 52-53% win ratio is actually better. I know you don't have the resources to measure it that precisely, but right now it looks like rolling a die to decide the top 3 (or maybe top 4, since it looks like eldidou was unfairly affected by a bug/feature). And spending a huge part of the night improving my AI, only for all of that effort to go to waste, feels a little disheartening.
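To show where the 2-3K figure comes from, here's a rough back-of-the-envelope sketch (my own, not anything official): the standard normal-approximation sample size for a one-sided binomial test of "win rate p vs. a fair 50%". The significance/power thresholds are my assumptions; different choices shift the numbers, but they land in the same few-thousand ballpark.

```python
from math import sqrt, ceil

def games_needed(p_true, p_null=0.5, z_alpha=1.645, z_beta=0.842):
    """Approximate number of games needed to detect a true win rate
    p_true against the null p_null, using the normal approximation.
    Defaults: ~5% one-sided significance (z_alpha=1.645) and
    ~80% power (z_beta=0.842) -- my assumed thresholds."""
    sigma0 = sqrt(p_null * (1 - p_null))  # std-dev per game under the null
    sigma1 = sqrt(p_true * (1 - p_true))  # std-dev per game under the alternative
    n = ((z_alpha * sigma0 + z_beta * sigma1) / (p_true - p_null)) ** 2
    return ceil(n)

# Both land in the low thousands; the edge is tiny, so n blows up:
print(games_needed(0.53))
print(games_needed(0.52))
```

The key point is that n scales with 1/(p_true - p_null)², so distinguishing a 52% bot from a 50% one takes far more games than distinguishing a 60% one.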
And I still hope that someday you'll get a proper ranking system that takes into account that AIs have fixed skill levels, not fluctuating ones. Thus, each game should have equal weight (except for the confidence stuff).
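To sketch what I mean by "equal weight except for the confidence stuff": one well-known option (my suggestion, not something the site uses) is to rank by the lower bound of the Wilson score interval on the win ratio. Every game counts equally, but a bot with few games gets a wide interval and is ranked conservatively.

```python
from math import sqrt

def wilson_lower_bound(wins, games, z=1.96):
    """Lower bound of the Wilson score interval (~95% for z=1.96).
    Ranking by this bound weights every game equally while still
    penalizing small samples -- the 'confidence stuff'."""
    if games == 0:
        return 0.0
    p = wins / games
    denom = 1 + z * z / games
    center = p + z * z / (2 * games)
    margin = z * sqrt(p * (1 - p) / games + z * z / (4 * games * games))
    return (center - margin) / denom

# A 55% ratio over 2000 games outranks a 60% ratio over only 50 games:
print(wilson_lower_bound(1100, 2000))
print(wilson_lower_bound(30, 50))
```

The nice property is that the bound converges to the true win ratio as games accumulate, which matches the "fixed skill level" assumption, unlike Elo-style systems where recent games dominate.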