@Manwe: it will be just like Tron and Game of Drones: no additional battles at the end of the challenge.
We studied your experiments with TrueSkill (again, thank you for that) and concluded that we could not use the “mean” of the TrueSkill score during the challenge, as it would make it very hard for new submitters to beat an old submitter. We could do it at the end of the challenge, starting from a score of 0 as you suggest. BUT let’s imagine for a minute that a player is first for the whole duration of the challenge; then the challenge ends, we recompute everything with the new method you suggest, the leaderboard changes completely, and that player is no longer first. You cannot imagine the level of complaints we would get from that. This is not acceptable. We need to have the same ranking method from the start of the challenge to the end.
We did do something to stabilize the top of the leaderboard, though. Let’s say our system triggers a battle between players for ranking. If at least one of these players is in the top 20, we play 5 different games with the same players, and only the combined result of the 5 games (obtained by adding up each player’s rank in each game) is fed to TrueSkill, which sees it as a single game. This considerably lowers the chance of one player beating another by luck, and as a result it gives a better leaderboard.
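To make the aggregation concrete, here is a minimal sketch in Python of how the 5 games could be folded into a single result. It is only an illustration, not our actual code: the function name `aggregate_ranks`, the input shape, and the way ties on the summed rank are handled (tied players share a place) are all assumptions.

```python
from collections import defaultdict

def aggregate_ranks(games):
    """Combine several games between the same players into one ranking.

    `games` is a list of dicts mapping player name -> rank in that game
    (1 = best). This input shape is hypothetical, chosen for illustration.
    """
    totals = defaultdict(int)
    for ranks in games:
        for player, rank in ranks.items():
            totals[player] += rank  # add up the rank obtained in each game

    # A lower summed rank is better. Assumption: players with the same
    # sum share the same final place (a draw).
    distinct_sums = sorted(set(totals.values()))
    return {player: distinct_sums.index(total) + 1
            for player, total in totals.items()}

# Five games between the same two players: A wins 3, B wins 2.
five_games = [
    {"A": 1, "B": 2},
    {"A": 2, "B": 1},
    {"A": 1, "B": 2},
    {"A": 1, "B": 2},
    {"A": 2, "B": 1},
]
final = aggregate_ranks(five_games)  # A sums to 7, B sums to 8
```

Only `final` would then be passed to TrueSkill, as the outcome of one game, so a single lucky win among the 5 does not move the ratings on its own.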
One more thing: as for Tron, we play the combination games for the asymmetrical 1vs1 and 1vs1vs1 games (so 2 games for 1vs1 and 6 for 1vs1vs1). For 1vs1vs1vs1, the ranking games are always symmetrical, so there is no need for combinations. If you are still with me, this means that when you are in the top 20, each of your 100 ranking battles is in fact [2x5 = 10 games] for a 2-player battle, [6x5 = 30 games] for a 3-player battle and 5 games for a 4-player battle. So on average, if you are in the top 20, you play (2+6+1)/3 x 5 x 100 = 1500 games per submission (plus all the games you play afterwards against new submitters).
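For those who want to check the arithmetic, here is a small Python sketch of the count. The constant and function names are made up for the example, and it assumes that 2-, 3- and 4-player battles are equally likely, which is what the simple average above implies.

```python
# Games needed to cover every combination, per battle size:
# 2 for 1vs1, 6 for 1vs1vs1, 1 for 1vs1vs1vs1 (symmetrical, no combinations).
COMBINATIONS = {2: 2, 3: 6, 4: 1}
REPEATS_IN_TOP_20 = 5   # each battle is repeated 5 times for top-20 players
RANKING_BATTLES = 100   # ranking battles per submission

def games_per_battle(player_count):
    """Games actually played for one ranking battle in the top 20."""
    return COMBINATIONS[player_count] * REPEATS_IN_TOP_20

def average_games_per_submission():
    """Average over battle sizes, assuming each size is equally likely."""
    per_battle = [games_per_battle(n) for n in COMBINATIONS]
    return sum(per_battle) / len(per_battle) * RANKING_BATTLES

print(games_per_battle(2), games_per_battle(3), games_per_battle(4))  # 10 30 5
print(average_games_per_submission())  # 1500.0
```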