Introduction:
This message will address the ranking system for multiplayer games. A lot has already been said about multiplayer contests (with a fixed end date) as opposed to multiplayer games (no end date). Between Smash the Code and CodeBusters, the ranking system for multiplayer contests was much improved. I will concentrate on multiplayer games in this post.
As a second introductory note, I do think that priority is rightly put on having a good ranking system for multiplayer contests. The continuous nature of multiplayer games makes it more challenging to rank AIs fairly : I can accept a system that is not perfect for multiplayer games. There are however a few improvements I’d like to see if they are not too costly to implement.
I’ll first address this by making a few observations. I’ll then describe some changes I’d like to see in the ranking system. Some lessons may also be learned and transferred to the ranking system for multiplayer contests.
Observations:
1 : Inability to reach a meaningful rank
An overall win rate of 91%, including 61% against Neumann and only two defeats against other players, doesn’t allow my AI to claim first place. Not even close. While I may have been lucky in this push, I do believe this type of match history should justify being first by a good margin. As a reminder, Jeff06 was convincingly first at the end of CSB with only a 51% win rate against his closest contender.
The same thing happened to Neumann a few days ago : the only way to claim 1st place is to push an AI with at least 70% win rate against the current first player.
2 : CSB is a 2-player game. (the best kind of game !!!). Owl and Jeff06 each pushed their AI to the leaderboard once, a long time ago. Since then, even if many people submit their AI, there will be no more games between Jeff06 and Owl. The ranking will gradually forget how they compare to each other.
3 : Nobody plays the game anymore. That’s not exactly true (hello smeagol, Lirkin, and soon Hohol !), but for ranking purposes it’s true. (only a few games are added to the system)
4 : If an AI plays more games than the others, the ranking will be skewed
During this last week, Neumann and I have exchanged the first place several times. In order to do so, we pushed our AIs very often. We basically have 100% win ratios against everybody else. The consequence is that the leaderboard is crushed down, with up to 10 people within 0.01 points of each other. The effect was already described on the chat : player #5 will play more often against my AI and lose. He loses 0.01 point, which brings him down to 15th place. Another AI takes his place and waits to be beaten down to the 15th place.
Hence my observation : if an AI plays more matches than the others, it will skew the rankings around it.
@CG_Maxime : After the CSB contest where Jeff06 and Owl played twice as many games as everybody else, I complained that this was unfair to me, the 3rd place player. You asked me whether I could provide an example. This observation is a perfect example : in the current CSB leaderboard Jeff06 has been seen as low as the 14th place (unbelievable !!!). Jeff06 is only back to a slightly better place because Magus re-pushed his AI very recently.
5 : “Unfair seeds” should not count towards the total number of games played.
@CG_Maxime : In another thread, you explained that games are played twice with positions reversed. This helps mitigate the possible unfairness of a seed if the game is not perfectly symmetrical. The reasoning is good, but I believe that the implementation is faulty. When the result is a tie, the TrueSkill score is not updated. However, the two games still count towards the total number of ranking games. In an extreme scenario, you could imagine an AI playing its 100 ranking matches without a single result being given to TrueSkill.
I believe that if an unfair seed results in a tie, the games should not count in the “ranking progress counter”.
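To make the proposal concrete, here is a minimal sketch of such a counter. It is purely illustrative and not CodinGame's actual code: the class name, the field names and the reading of a "tie" as "each AI won one of the two mirrored games" are all my assumptions.

```python
from dataclasses import dataclass


@dataclass
class RankingProgress:
    """Hypothetical per-submission progress counter (illustration only)."""

    target: int = 100   # ranking games required before the submission is "done"
    counted: int = 0    # games whose result was actually usable for the rating
    discarded: int = 0  # games cancelled by the "unfair seed" rule

    def record_mirrored_pair(self, first: str, second: str) -> None:
        """Record two games played on the same seed with positions swapped.

        Each argument is "win" or "loss" from our AI's point of view.
        Assumption: one win plus one loss means the seed decided the
        outcome, so the pair is discarded and does not advance progress.
        """
        if {first, second} == {"win", "loss"}:
            self.discarded += 2
        else:
            self.counted += 2

    @property
    def done(self) -> bool:
        # Only games that produced a usable result count towards completion.
        return self.counted >= self.target


progress = RankingProgress(target=4)
progress.record_mirrored_pair("win", "loss")   # unfair seed: ignored
progress.record_mirrored_pair("win", "win")    # counted
print(progress.counted, progress.discarded, progress.done)
```

With this rule, a submission keeps playing until it has accumulated the target number of games that really carried information, however many pairs were cancelled along the way.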
6 : Neumann has ~47 points. The rest of the top 10 has 33 points. When my AI is ranking up, it plays 90% of its games against the rest of the top 10. When it is approaching 44 or 45 points, this means that it only gains 0.01 points per game won. 90% of the time, my AI virtually doesn’t gain any points for a win.
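The size of the gain can be estimated with the textbook TrueSkill update for a two-player game without draws. This is only a rough sketch: I don't know the parameters CodinGame really uses, so beta is the TrueSkill default (25/6), the uncertainty sigma = 1.0 is a guess, and the displayed points are treated as the mean mu.

```python
import math


def normal_pdf(x: float) -> float:
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def winner_mean_gain(mu_w: float, sigma_w: float,
                     mu_l: float, sigma_l: float,
                     beta: float = 25.0 / 6.0) -> float:
    """Increase of the winner's mean after one win (no draw margin).

    Standard two-player TrueSkill formula:
        c = sqrt(2*beta^2 + sigma_w^2 + sigma_l^2)
        t = (mu_w - mu_l) / c
        gain = sigma_w^2 / c * pdf(t) / cdf(t)
    """
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    return sigma_w ** 2 / c * normal_pdf(t) / normal_cdf(t)


# Assumed values: sigma = 1.0 for both players, points taken as mu.
vs_top10 = winner_mean_gain(45.0, 1.0, 33.0, 1.0)  # beating a 33-point AI
vs_equal = winner_mean_gain(45.0, 1.0, 45.0, 1.0)  # beating an equally rated AI
print(f"gain vs 33-point AI: {vs_top10:.4f}")
print(f"gain vs equal AI:    {vs_equal:.4f}")
```

Under these assumptions a win against a 33-point opponent is worth on the order of 0.01 point, more than ten times less than a win against an equally rated opponent, which is consistent with what I see on the leaderboard.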
7 : On a good day, my AI plays 30 games against Neumann. 15 of them will be ties and discarded. 8 will be wins. 7 will be losses. With only 8 and 7 significant results to provide to TrueSkill, it is normal that my rank cannot stabilize.
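To give an idea of how little 15 decisive games say, here is a small sketch computing a 95% Wilson score interval for the win rate behind an 8-7 record. The function name is mine, and the interval is a generic statistical tool, not something the ranking system itself uses.

```python
import math


def wilson_interval(wins: int, games: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a win probability (z = 1.96 for ~95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    z2 = z * z
    denom = 1.0 + z2 / games
    center = (p + z2 / (2.0 * games)) / denom
    half = z * math.sqrt(p * (1.0 - p) / games + z2 / (4.0 * games ** 2)) / denom
    return center - half, center + half


low, high = wilson_interval(wins=8, games=15)
print(f"plausible win rate against Neumann: {low:.2f} to {high:.2f}")
```

The interval comfortably contains 50%, so one day of games cannot tell which of the two AIs is stronger.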
8 : This observation is linked to multiplayer contests instead of multiplayer games, but I’ll still make it. At the end of CodeBusters, Jeff06 had only completed 50% of his games while the rest of the leaderboard was finished. This may skew the results around Jeff06’s rank similarly to Observation 4.
Suggestions :
1 - For multiplayer games - increase the number of games that are done each time an AI is submitted. For CSB this might mean multiplying the number of games by 5 or 10.
2 - For multiplayer games - regularly add new games for people in the leaderboard. I don’t know how many, but enough so that the ranking is not skewed when an AI is pushed several times in a row.
3 - For multiplayer games & contests - if a game is discarded due to the “unfair seed” rule, this game should not count in the number of ranking games.
4 - For multiplayer games & contests - players should roughly play their games at the same moment. Jeff06 shouldn’t have had 50% of his games remaining when everybody else was done in CodeBusters.
5 - Consider introducing a condition in the game selection rule. Something like : “50% of ranking games should be played against an AI that is ranked higher than my current rank”. There was a time when I thought this was a bad idea, but I’m not sure anymore. Thoughts ?
My thoughts on #5 : player #2 will play many more games against player #1 than player #3 will. Player #2 will therefore be pushed down harder than player #3. This could shrink the score difference between players #2 and #3, similarly to what is described in observation #4. I would say it’s a bad idea if there are enough games played by every AI. However, if the number of games is too constrained, this modification could be an acceptable tradeoff.
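For discussion, the rule in suggestion #5 could look like the sketch below. Everything here is hypothetical: the function name, the uniform choice inside each group, and the decision to take the other 50% of games from lower-ranked AIs are my assumptions, and I have no idea how the real matchmaking picks opponents.

```python
import random


def pick_opponent(my_rank: int, leaderboard_size: int,
                  higher_share: float = 0.5, rng=random) -> int:
    """Return the rank of an opponent (1 = best), never my own rank.

    With probability `higher_share` the opponent is drawn from the AIs
    ranked above me, otherwise from the AIs ranked below me. The top and
    bottom players fall back to the only group available to them.
    """
    if leaderboard_size < 2:
        raise ValueError("need at least two players")
    higher = range(1, my_rank)                      # better-ranked AIs
    lower = range(my_rank + 1, leaderboard_size + 1)  # worse-ranked AIs
    if higher and (not lower or rng.random() < higher_share):
        return rng.choice(higher)
    return rng.choice(lower)


rng = random.Random(0)  # fixed seed so the example is reproducible
opponents = [pick_opponent(2, 20, rng=rng) for _ in range(1000)]
share_vs_first = sum(rank == 1 for rank in opponents) / len(opponents)
print(f"player #2 meets player #1 in {share_vs_first:.0%} of its games")
```

The sketch also shows the drawback discussed above: for player #2 the only higher-ranked AI is player #1, so about half of its games are against the leader.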
Final notes :
Given all these observations, I have asked Neumann whether he would accept temporary measures that can be taken at our level to make the rankings converge between his AI and mine.
I am thinking about this band-aid solution : push my AI on my main account. Create a second account, and push the same code several times. This second AI will have games against pb4608 and against Neumann, effectively increasing the number of games played. After several pushes of the second AI, I hope that the ranking for the main account will have stabilized.
If I start doing this, Neumann will also use the same method. We agreed to discuss this on the forum before going ahead…
Hence my question : is this behavior strongly prohibited by Codingame ? I’d like not to do this, but it’s hard to cope with the current rankings in multiplayer games…
Side note
I really like the system that was added to get a rough estimate of a player’s ranking. 10 games against AIs across the whole leaderboard : that’s neat !