Maxime,
The questions you ask are valid, but I'll draw a distinction between the ranking system itself and how it is used. There are two important questions:
- Based on a database of win/loss information, how can we rank people? (that's the ranking system)
- How can we fill in a database so that special situations are treated fairly? (that's the ranking information given to the ranking system)
Here is how I would approach the second question, assuming there is a good ranking system to which results are provided:
- On 1v1 games with fair (symmetric) seeds:
** Play one game
** If there is a winner, feed the result to the ranking system. Both players' game counters are incremented by one game.
** If there is a draw, the game should not be counted. The result is not fed to the ranking system, and the game counter is not incremented for either player.
- On 1v1 games with unfair (non-symmetric) seeds:
** Play two games between the players, with the same seed and reversed positions.
** If one player wins both, feed the ranking system with one win for that player and one loss for the other. Both players' game counters are incremented by one game.
** If no player wins both, the games should not be counted. The result is not fed to the ranking system, and the game counter is not incremented for either player.
- On 1v1v1 games (their seeds are always unfair / non-symmetric):
** Pick one seed and play the six permutations of player positions.
** Sum the finishing ranks for each player across the six games.
** For each pair (playerA, playerB), if the sum of ranks indicates that one is better than the other, feed the result to the ranking system. The game counter is incremented by one game for both players.
** For one series of six 1v1v1 games, this means the ranking system will receive between 0 and 3 1v1 game results. Draws are not provided to the ranking system.
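To make the 1v1v1 aggregation step concrete, here is a minimal Python sketch of the pairwise-extraction rule, assuming the six games have already been played and each player's finishing ranks have been summed (function and player names are illustrative, not from the thread):

```python
from itertools import combinations

def aggregate_1v1v1(rank_sums):
    """rank_sums: summed finishing ranks over the six seatings of one
    seed (lower sum = better).  Returns the (winner, loser) pairs to
    feed the ranking system; tied pairs produce no result, matching
    the rule above that draws are dropped."""
    results = []
    for a, b in combinations(sorted(rank_sums), 2):
        if rank_sums[a] < rank_sums[b]:
            results.append((a, b))
        elif rank_sums[b] < rank_sums[a]:
            results.append((b, a))
        # equal sums: a draw, nothing is reported
    return results

# One series of six games yields between 0 and 3 pairwise results:
print(aggregate_1v1v1({"A": 8, "B": 12, "C": 16}))
print(aggregate_1v1v1({"A": 10, "B": 10, "C": 16}))  # A/B tie is dropped
```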
A few additional comments :
Note that one set of 1v1v1 games will provide the ranking system with up to 3 results. A 1v1 game will provide a maximum of 1 result. I believe the number of 1v1 and 1v1v1 games should be adjusted so that there is a good balance of “result importance” between 1v1 and 1v1v1 games. Personal opinion : for previous contests such as TGE and BTTC, I would say the weight of 1v1v1 games was too strong.
I understand that for Coders Strike Back there was a system where players would have 10 1v1 games against the same opponent, and the average result was fed to the ranking system. While I understand the merit of such a system (it prevents unlucky losses against a much weaker opponent), I see more reasons why it should not be used. Consider a real scenario from CSB: pb4608 has a 30% chance to win against Jeff06, and Magus has only a 10% chance to win against Jeff06. In order to register a win with the ranking system, pb4608 has to win 6 games out of 10 against Jeff06. This is very rare when the chance to win each individual game is only 30%. It is so rare that the ranking system can't differentiate between pb4608 and Magus based on their games against Jeff06. To accurately differentiate players, it is better to feed the ranking system many small results than a smaller number of aggregated, stronger results.
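The rarity of winning 6 out of 10 can be checked with a quick binomial computation, using the 30% and 10% per-game figures from the example above:

```python
from math import comb

def p_win_series(p, n=10, need=6):
    """Probability of winning at least `need` of `n` independent games
    when each game is won with probability `p`."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(need, n + 1))

# pb4608 (30% per game) vs Magus (10% per game), both against Jeff06:
print(f"{p_win_series(0.30):.4f}")   # ≈ 0.0473
print(f"{p_win_series(0.10):.6f}")   # ≈ 0.000147
```

Both probabilities are tiny, so in practice the averaged-series system reports a loss for both players and the 30%-vs-10% difference in strength is lost.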
On the complexity of rank centrality as described by _anst: the complexity explodes if you want to do an exhaustive search over orderings of more than 10 players. The problem is somewhat similar to the traveling salesman problem and could advantageously be approached via genetic algorithms or simulated annealing.
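To illustrate the simulated-annealing idea, here is a sketch of swap-neighbour annealing over orderings, with a hypothetical "disagreement" cost (wins recorded against the ordering). This is only the generic heuristic, not rank centrality itself:

```python
import math
import random

def anneal_ranking(players, beats, steps=20000, t0=2.0, seed=0):
    """Search for an ordering of `players` that disagrees with as few
    observed pairwise results as possible.  `beats[(a, b)]` = number
    of recorded wins of a over b (a hypothetical input format).
    Neighbour move: swap two adjacent players."""
    rng = random.Random(seed)
    order = list(players)

    def cost(o):
        pos = {p: i for i, p in enumerate(o)}
        # count wins that go "against" the ordering
        return sum(w for (a, b), w in beats.items() if pos[a] > pos[b])

    c = cost(order)
    best, best_c = order[:], c
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9   # linear cooling
        i = rng.randrange(len(order) - 1)
        order[i], order[i + 1] = order[i + 1], order[i]
        nc = cost(order)
        if nc <= c or rng.random() < math.exp((c - nc) / t):
            c = nc
            if c < best_c:
                best, best_c = order[:], c
        else:
            order[i], order[i + 1] = order[i + 1], order[i]  # undo swap
    return best, best_c

# Transitive toy data: A beats B, B beats C, A beats C.
beats = {("A", "B"): 3, ("B", "C"): 3, ("A", "C"): 3}
print(anneal_ranking(["C", "B", "A"], beats))
```

With 10+ players this explores a tiny fraction of the n! orderings, which is exactly why a heuristic is attractive here.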
More on the complexity of rank centrality: this complexity comes from the discrete nature of ranking. This is the reason why I proposed an approach based on evaluating a player's strength instead of their rank. It seems easier to find an optimum strength within a continuous search space: methods such as gradient ascent are available to converge towards a proper player strength.
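For illustration, a continuous-strength fit via gradient ascent could look like the following Bradley-Terry sketch (the model choice is mine for illustration; the concrete proposal in the thread may differ):

```python
import math

def fit_strengths(results, n_players, lr=0.1, iters=500):
    """Gradient ascent on the Bradley-Terry log-likelihood: game
    (w, l) is won by w with probability sigmoid(s[w] - s[l]).
    The log-likelihood is concave in s, so plain gradient ascent
    converges towards the maximum-likelihood strengths."""
    s = [0.0] * n_players
    for _ in range(iters):
        grad = [0.0] * n_players
        for w, l in results:
            p = 1.0 / (1.0 + math.exp(-(s[w] - s[l])))  # P(w beats l)
            grad[w] += 1.0 - p   # push winner up by the surprise
            grad[l] -= 1.0 - p   # and loser down by the same amount
        for i in range(n_players):
            s[i] += lr * grad[i]
    return s

# Player 0 mostly beats 1, player 1 mostly beats 2:
games = [(0, 1)] * 3 + [(1, 0)] + [(1, 2)] * 3 + [(2, 1)]
print(fit_strengths(games, 3))
```

The search space is continuous, so there is no combinatorial explosion: each iteration is linear in the number of recorded games.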
Under a system where no matches are forgotten, Magus commented that early losses against a much worse player would be detrimental. Indeed, we would be tempted to resubmit our AI repeatedly until we get a "lucky start" with no losses. This was not an issue with the current use of TrueSkill, but may become one in a system where old games are not forgotten. My proposal is to keep the current version of TrueSkill during the contest; at the end, N games are relaunched for the top 100 players.
The previous point goes against my comment from a few days ago stating that opponents should be uniformly sampled (the top 5 players have as many chances to play against another top 5 player as against a top 1000 player). I retract that comment, as it seems to bring more difficulties than anything else…