#6 Legend
I wanted to be a bit original with my search algorithm, and the thought I started with was that I really, really disliked the idea of a UCT forest/smitsimax for this game and wanted to fix it. The reason I thought smitsimax wouldn’t do well here (and hey, I beat all the smitsimaxers, so I guess I was right) was the diving minigame: an opponent’s 2nd move (continue the diving streak or not) often depends heavily on whether I chose to dive on my 1st move, and in those cases the results of smitsimax would probably be garbage.
In an attempt to fix this, I combined smitsimax and DUCT: I still use three separate trees like smitsimax, but instead of one tree per player, each tree is a DUCT-like tree for a pair of players. Selection still works as in smitsimax, except that each player sums the UCT formulas from the 2 trees they appear in, and instead of a random rollout, I evaluate minigames in progress after one tree expands (skating only at depth 1; rollouts worked worse for me). I hoped that this way I could reach higher depth than DUCT would, without losing the player interactions the way smitsimax does.
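To make the selection step concrete, here is a minimal sketch under assumed data structures - the stats layout, the exploration constant `C`, and the function names are mine, not taken from the actual bot:

```python
import math

# Sketch of pairwise-tree selection: three trees, one per *pair* of players,
# and each player picks the action maximizing the sum of UCT scores from the
# two trees that player appears in. All constants and layouts are assumptions.

C = 1.4  # exploration constant (assumed value)

def uct(parent_visits, child_visits, child_total_value):
    if child_visits == 0:
        return float("inf")  # always try unexplored actions first
    exploit = child_total_value / child_visits
    explore = C * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore

def select_action(player, trees, num_actions):
    """trees maps frozenset({a, b}) -> node stats for that pair's tree:
    stats["n"] is the node's visit count, and stats[action] is a
    (visits, total value) tuple from `player`'s perspective."""
    best_action, best_score = None, -float("inf")
    for action in range(num_actions):
        score = 0.0
        for pair, stats in trees.items():
            if player not in pair:
                continue  # only the 2 trees containing this player contribute
            visits, total_value = stats[action]
            score += uct(stats["n"], visits, total_value)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

The point of summing the two UCT scores is that each player's choice is informed by its interaction with both opponents, rather than by a single per-player tree.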
Of course, it backfired.
Consider a game state where 2 players are 3 spaces away from the hurdles minigame’s goal, and the diving minigame is just starting with a first action other than RIGHT. This is a prisoner’s dilemma: if both players agree to split the hurdles gold medal one turn later, they can both start their diving streak and not fall behind the third player in that minigame, but there’s also the risk of one player just grabbing the hurdles gold alone in one turn. My algorithm consistently converged to cooperating with the other player, and it kept getting ‘betrayed’ on the leaderboard. My attempts to fix this were only partially successful: the medals gained by the opponents were already in my eval; decay made the bot play much worse (maybe it messed with the relative weights of the different medals); adding a bit of randomness to tree node selection helped a bit but didn’t solve the problem completely; paranoia and contempt did not work at all.
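The hurdles/diving dilemma above can be written as a toy payoff matrix. The numbers below are illustrative only, not actual medal values from the game:

```python
# Toy payoff matrix for the situation described above: each player chooses
# between rushing the hurdles finish ("betray") or starting the diving
# streak ("cooperate"). Entries are (player A's payoff, player B's payoff);
# the values are made up for illustration.

payoffs = {
    ("cooperate", "cooperate"): (2, 2),  # gold split later, diving streaks intact
    ("cooperate", "betray"):    (0, 3),  # B takes the hurdles gold alone
    ("betray",    "cooperate"): (3, 0),
    ("betray",    "betray"):    (1, 1),  # gold split now, both diving streaks lost
}

def best_response(their_action, me):
    # Pick the action maximizing my payoff, given the opponent's fixed action.
    def my_payoff(action):
        key = (action, their_action) if me == 0 else (their_action, action)
        return payoffs[key][me]
    return max(("cooperate", "betray"), key=my_payoff)
```

With these payoffs, betraying is a dominant strategy - it pays more no matter what the opponent does - which is why a search that converges to mutual cooperation keeps getting punished on the leaderboard.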
The main problem, even once I found some workarounds, was that my selfplay became mostly useless: any bot that switched to betraying more often was outcompeted by the cooperating ‘friendship forests’ in my local arena, so my attempts at parameter fiddling based on selfplay winrate did not go well - which is how I originally found out what was happening with my bot.
Not having reliable selfplay meant I had to come up with an evaluation function for unfinished minigames some other way. I assumed that the number of spaces a player advances in skating and hurdles is approximately normally distributed, and therefore the probability of one player being overtaken by another is a sigmoid function of that player’s lead. I downloaded a lot of top players’ games using the CodinGame API to verify that this is indeed the case, and to find functions that approximate the average winrate reasonably well. The same approach worked for archery (the sigmoid is a function of distance from the center divided by average wind speed) and diving (a function of the number of turns the player ahead would need to skip diving at the end of the game to be overtaken by the player behind). The approximation gets imprecise when only a couple of turns are left in a minigame, which would be a problem for skating evaluated at depth 1, so I also built a dataset of win probabilities (~1000 data points) for skating states close to the minigame’s end (how many spaces a player is ahead, each player’s risk, turns left). For illustration, here’s the relation between a player’s lead in hurdles, their position, and the average game result, based on ~10000 legend league games: