As you can see, pb4 has a 63% win rate against DerRadikaleRusse, while Agade has a 72% win rate against him.
The problem is the following: at the end of the final rerun, DerRadikaleRusse was the last player still playing games (I don't know why; his AI is slower than the other AIs, maybe because it is very good at surviving until the end). When DerRadikaleRusse plays a Bo5 against pb4 or Agade, he will always lose.
The problem is that Agade has a better win rate against him, but the Bo5 erases this advantage: a Bo5 win counts the same whether your per-game win rate is 63% or 72%. In the end, Agade and pb4 had nearly the same ELO. If DerRadikaleRusse had played more games against pb4 than against Agade, pb4 would have been 2nd on the leaderboard.
Bo5 is a very nice feature for humans, because humans can adapt their level: they can hide a tactic and play it in the second game. An AI can't do this.
1000+ games per player is very, very good for the final rerun; don't change it. But I think the Bo5 should be removed.
Good catch IMO. The side effect of this is that pb4 & Agade were gaining TrueSkill points while DerRadikaleRusse, on the opposite side, was losing points.
But I think Bo5 can bring added value in cases where the bots' levels are really close, and it can attenuate an unexpected win/loss against a player with a very different score. For example, the top 1 losing against the top 10 would not happen in a Bo5, while it can happen in a single match.
The real problem you raise is that the bots that use more CPU end up playing alone at the end of the rerun, no? Maybe the fix could be for the algorithm that starts matches on the server pools to take that into account, instead of simply triggering a new match for a player as soon as its bot finishes one, so that in the end everyone finishes their matches at the same time.
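Something like this, maybe (entirely hypothetical, not the actual scheduler): whenever a server slot frees up, give it to the player who has completed the fewest games of their budget, rather than to whoever just finished.

```python
# Hypothetical pacing policy, not CG's real scheduler: hand the next free
# server slot to the player with the fewest completed games.
def pick_next_player(completed, budget):
    """completed: dict player -> games finished; budget: games per player."""
    pending = {p: n for p, n in completed.items() if n < budget}
    if not pending:
        return None  # rerun finished
    return min(pending, key=pending.get)

# Made-up numbers: the slow bot lags behind, so it gets priority now
# instead of being the only one left playing at the end of the rerun.
completed = {"pb4": 990, "Agade": 985, "DerRadikaleRusse": 400}
print(pick_next_player(completed, 1000))  # -> DerRadikaleRusse
```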
What do you think?
Yeah, definitely a good catch by Magus. Bo5 is questionable.
But the bigger issue is that TrueSkill is time-dependent.
As far as I remember, you can get the ranking of two players reversed just by permuting the order in which you feed the same battle data to TrueSkill. Magus' observation (i.e. one slower bot) just exacerbates this problem.
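Here is a minimal sketch with the Python `trueskill` package (my own toy example, not CG's setup) showing the order sensitivity: the same win/loss data fed in a different order produces different final scores.

```python
# Minimal demo with the Python `trueskill` package: same battle data,
# different feed order, different final conservative scores.
from trueskill import Rating, rate_1vs1

def replay(matches):
    ratings = {p: Rating() for p in "ABC"}
    for winner, loser in matches:
        ratings[winner], ratings[loser] = rate_1vs1(ratings[winner], ratings[loser])
    # Conservative score used for ranking: mean minus three uncertainties.
    return {p: round(r.mu - 3 * r.sigma, 2) for p, r in ratings.items()}

matches = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "C"), ("B", "A")]
print(replay(matches))        # one feed order
print(replay(matches[::-1]))  # same results, reversed order: different scores
```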
Furthermore, the TrueSkill ranking can sometimes go "rogue" against the judgement by pure win rate. This lies in the very nature of a system made to rank humans, who can gain or lose skill over time. AI bots on CG do not evolve after the last submission.
So I think there should be a final round between the top 5 (or top 10) players that produces a ranking on a purely win-rate basis, with a full everyone-against-everyone matching scheme. The model was already discussed by pb4 elsewhere on the forum (it uses a maximum likelihood estimator). For as few as 5 or 10 players, it's totally doable.
And with such super-finals, none of the issues discussed here would occur.
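For reference, here is a minimal sketch of one standard maximum likelihood estimator for pairwise win data: the Bradley-Terry model fitted with the classical minorization-maximization updates. I don't know pb4's exact model, so treat this as an assumption of what such a super-finals ranking could look like; the numbers are made up.

```python
# Bradley-Terry MLE via the classical MM updates (one possible model for a
# win-rate-based super-finals ranking; pb4's actual proposal may differ).
def bradley_terry(wins, iterations=200):
    """wins[i][j] = number of games player i won against player j."""
    n = len(wins)
    strength = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (strength[i] + strength[j])
                        for j in range(n) if j != i)
            updated.append(total_wins / denom if denom > 0 else strength[i])
        total = sum(updated)
        strength = [s / total for s in updated]  # normalize each pass
    return strength  # rank players by descending strength

# Made-up full round robin between 3 players, 100 games per pair.
wins = [[0, 55, 72],
        [45, 0, 63],
        [28, 37, 0]]
print(bradley_terry(wins))
```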
Manwe: Best-of-N systems are useful in one precise situation: when you need a single result with less volatility.
This is not the type of situation we are in: TrueSkill + averaging is perfectly able to deal with a lot of noisy information.
If the budget is a total number of games, I think it is much better to have 1000 Bo1 results than 200 Bo5 results.
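A quick back-of-the-envelope sketch of why (my own illustration, reusing the 63% and 72% win rates from the first post as per-game probabilities): each Bo5 series is a single observation, so a 1000-game budget gives 1000 data points as Bo1 but only 200 as Bo5, and the measured rate is accordingly noisier.

```python
# Each Bo5 series is one Bernoulli observation: a 1000-game budget yields
# 1000 data points as Bo1 but only 200 as Bo5.
from math import comb, sqrt

def bo5_win_prob(p):
    """Probability of winning a first-to-3 series with per-game win prob p."""
    # Win 3-0, 3-1 or 3-2; the last game of the series is always a win.
    return sum(comb(2 + k, k) * p**3 * (1 - p)**k for k in range(3))

for p in (0.63, 0.72):
    bo1_se = sqrt(p * (1 - p) / 1000)   # std error of win rate over 1000 games
    s = bo5_win_prob(p)
    bo5_se = sqrt(s * (1 - s) / 200)    # std error of series rate over 200 series
    print(f"p={p:.2f}  Bo1 err={bo1_se:.3f}  Bo5 prob={s:.3f}  Bo5 err={bo5_se:.3f}")
```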
Regarding the observations made during the contest:
- After 20% of the rerun was done, I don't think anybody in the top 3 had a red arrow indicating a decreasing score.
- Agade and I still saw our scores go up even when we lost a Bo5.
- The rerun ended with the scores still going up, as if they had not stabilized.
My interpretation of what happened is the following:
I strongly believe that there is some kind of "hidden score" used for averaging. This hidden score goes up during the rerun. When a match is played, this hidden score contributes a small fraction to the score displayed on the leaderboard. Since the hidden score is higher than the displayed score, the displayed score goes up every time a game is played.
Hence, it feels like the final winner was decided by the number of games played instead of by an actual win rate (granted, this feeling only arises if the actual win rates are close).
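To make that suspicion concrete, here is a toy sketch of the mechanism I have in mind; the update rule and the 0.05 fraction are pure guesses on my part, not CG's code.

```python
# Pure speculation about the suspected mechanism, not CG's actual code:
# the displayed score chases a higher hidden score by a fraction per match.
ALPHA = 0.05  # hypothetical smoothing fraction applied after each match

displayed, hidden = 30.0, 35.0
for _ in range(50):
    # Even if a match barely changes `hidden`, `displayed` keeps climbing
    # toward it, so simply playing more games raises the visible score.
    displayed += ALPHA * (hidden - displayed)
print(round(displayed, 2))  # approaches 35 regardless of match outcomes
```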
How to make things better:
STOP DESTABILIZING THE LEADERBOARD DURING A RERUN!
DerRadikaleRusse was the last one playing games during the rerun, but I don't think that's a problem; maybe he just has an AI specialized in surviving. Anyway, that is not the problem I'm pointing at.
I think that it’s not just your feeling, but a real systematic bias.
More precisely, TrueSkill uses a "mean" and an "uncertainty" parameter for each player.
The mean roughly tracks the win rate (though it also depends on the order in which you win and lose your battles).
The uncertainty generally decreases as more games are played.
And the rank is computed as the mean minus three times the uncertainty.
So of two players with the same mean, the one who played more games will be ranked higher, as the tiny illustration below shows.
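```python
# Two players with the same mean (the values here are made up): the one with
# more games played has a smaller uncertainty, so its conservative score
# mean - 3 * sigma is higher.
from trueskill import Rating

veteran  = Rating(mu=30.0, sigma=1.0)  # many games played, low uncertainty
newcomer = Rating(mu=30.0, sigma=4.0)  # same mean, fewer games played

for name, r in (("veteran", veteran), ("newcomer", newcomer)):
    print(name, r.mu - 3 * r.sigma)  # 27.0 vs 18.0
```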
Again, this makes sense in open-ended, evolving, massive human competitions.
But for CG-style AI bot contests, the "pure" win rate (everyone against everyone) seems to be the ultimate judge.
Obviously, we can't do it for all players. But I think this kind of super-finals would be cheap to run, and the winners would then be incontrovertible.
A note for people reading this topic now: the win rates that @Magus mentions in his original post aren't shown properly by cgstats anymore, because CG purged a lot of games from its database the day after.