3rd place overall
I’d like to thank CodinGame and Fiverr for organizing and sponsoring this exciting competition. I liked it a lot!
Solution
My bot was based on Decoupled-UCT. Its tree search and node selection algorithms were borrowed from AlphaZero, but its value/policy evaluation functions were hand-crafted. I chose this design hoping for a future transition to RL-based NNs (a real AlphaZero), but that remained future work through the end of the competition.
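To make the selection rule concrete, here's a rough sketch (not my actual code) of what AlphaZero-style (PUCT) selection looks like in a decoupled setting, where each player independently picks its own action from per-player statistics stored in the node. `PlayerStats` and the constant `c` are illustrative:

```cpp
#include <cmath>
#include <vector>

// Per-player action statistics stored in a tree node (illustrative layout).
struct PlayerStats {
    std::vector<double> prior;     // policy prior P(a) per action
    std::vector<double> valueSum;  // accumulated value W(a) per action
    std::vector<int>    visits;    // visit count N(a) per action
    int totalVisits = 0;           // sum of N(a) over all actions
};

// AlphaZero-style PUCT rule: argmax_a Q(a) + c * P(a) * sqrt(N) / (1 + N(a)).
int selectAction(const PlayerStats& s, double c = 1.5) {
    int best = 0;
    double bestScore = -1e18;
    for (size_t a = 0; a < s.prior.size(); ++a) {
        double q = s.visits[a] > 0 ? s.valueSum[a] / s.visits[a] : 0.0;
        double u = c * s.prior[a] * std::sqrt((double)s.totalVisits) / (1.0 + s.visits[a]);
        if (q + u > bestScore) { bestScore = q + u; best = (int)a; }
    }
    return best;
}
```

The joint action at a node is then simply the tuple of the actions selected independently for each of the 3 players.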
Details:
- At the beginning of a turn, I constructed a tree with only a root node. In each tree search iteration, I traversed the tree to a leaf node, evaluated it, expanded it by one level, and back-propagated the evaluation (see the first sketch after this list). On node expansion, I simulated each mini-game and started a new game if necessary, except for skating: I simulated skating only for the first action, and its state stayed frozen for all deeper expansions.
- Node evaluation computed the final outcome of the entire game (i.e. the rating exchange on the leaderboard). I’ll write the details at the bottom of this post, but the final outcome was computed from the current scores, the number of remaining games, and a guessed medal distribution for each ongoing mini-game (e.g. player 2 gets a gold/silver/bronze medal with probabilities of 50/30/20%).
- Medal distributions were estimated differently for each mini-game, but they shared similar themes. For example, in archery, I computed a distribution over positions (e.g. in 2 turns, player 1 is at (1,3), (7,3), (4,6), (4,0) with probabilities of 50/10/10/30%). The position distribution at turn N was computed from the position distribution at turn N-1. Then, I computed the medal distribution by comparing the players’ position distributions at the final turn.
- The position distribution was computed independently per player. I assumed each player to be in one of three modes: random, reasonable, or serious. A player in the random mode moves uniformly at random (25/25/25/25%) until the end of the ongoing mini-game, while a player in the serious mode plays the ongoing mini-game perfectly on every turn. After computing the position distributions for all modes, I took a weighted sum of the distributions at the final turn (not at each turn); see the second sketch after this list.
- At first, I didn’t use this “one mode throughout the entire game” concept and took a weighted sum at each turn, but that made my bot abandon a diving combo earlier than necessary because it didn’t expect other players to sustain 10+ combos. So, I introduced the concept to capture players’ high-level strategies.
- My policy function was a rather simple heuristic based on statistics. I think the quality wasn’t so good.
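Here's a heavily stubbed sketch of one search iteration as described above. `Node`, `State`, and the helper names are placeholders, not my actual code:

```cpp
#include <array>
#include <vector>

// Placeholder types: in the real bot these hold the four mini-game states and
// per-player PUCT statistics (see the selection sketch above).
struct State { /* mini-game states, scores, ... */ };

struct Node {
    State state;
    bool isExpanded = false;
    std::vector<Node> children;        // one child per joint action
    std::array<double, 3> valueSum{};  // accumulated value per player
    int visits = 0;

    Node& selectChildByPuct() { return children.front(); }  // stub; real code uses decoupled PUCT
    void expandOneLevel() { isExpanded = true; }  // simulate mini-games here; skating frozen below depth 1
    void update(const std::array<double, 3>& v) {
        for (int p = 0; p < 3; ++p) valueSum[p] += v[p];
        ++visits;
    }
};

std::array<double, 3> evaluate(const State&) { return {}; }  // stub value function

// One iteration: select down to a leaf, evaluate it, expand it, back-propagate.
void searchOnce(Node& root) {
    std::vector<Node*> path{&root};
    while (path.back()->isExpanded && !path.back()->children.empty())
        path.push_back(&path.back()->selectChildByPuct());
    std::array<double, 3> value = evaluate(path.back()->state);
    path.back()->expandOneLevel();
    for (Node* n : path) n->update(value);
}
```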
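And a sketch of how an archery position distribution can be pushed one turn forward under the random mode, assuming a known wind strength per turn and coordinates clamped to the board. The function names and the clamp range are illustrative:

```cpp
#include <algorithm>
#include <map>
#include <utility>

using Pos = std::pair<int, int>;
using PosDist = std::map<Pos, double>;  // position -> probability

int clampCoord(int v) { return std::clamp(v, -20, 20); }  // board bounds (illustrative)

// One turn of the "random" mode: each of the 4 directions with probability 25%.
PosDist stepRandomMode(const PosDist& cur, int wind) {
    const int dx[4] = {wind, -wind, 0, 0};
    const int dy[4] = {0, 0, wind, -wind};
    PosDist next;
    for (const auto& [pos, p] : cur)
        for (int d = 0; d < 4; ++d) {
            Pos np{clampCoord(pos.first + dx[d]), clampCoord(pos.second + dy[d])};
            next[np] += 0.25 * p;  // uniform over the 4 moves
        }
    return next;
}
```

Starting from {current position: 1.0} and applying this once per remaining turn gives the final-turn distribution; the serious mode replaces the 25% fan-out with the single best move, and the players’ final distributions are then compared to obtain medal probabilities.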
Minor details about node evaluations:
- I computed the final outcome of the entire game. As described above, I first guessed the medal distribution of each ongoing mini-game. Then, I computed the final score distribution for each mini-game from the current score, the medal distribution, and the number of remaining games (e.g. if a player can get a gold/silver/bronze medal with 50/30/20% probability in the ongoing game while its current score is 8 and there’s 1 game remaining, it can end up with 8/9/10/11/12/14 points with probabilities like 8/18/8/23/…/17%); see the first sketch below. Finally, I computed the final outcome by multiplying the per-mini-game final score distributions together and comparing them among players.
- We get the final total score distribution by multiplying the scores of the 4 mini-games. However, it’s prohibitively expensive to compute all the combinations naively. So, I took the logarithm of the scores, then scaled and rounded them to integers. This converts the multiplications into additions (i.e. log(X * Y) = log(X) + log(Y)), so I could obtain the final distribution with polynomial multiplications (e.g. a score distribution of {32: 40%, 8: 20%, 4: 10%, 1: 30%} becomes 0.4 * y^5 + 0.2 * y^3 + 0.1 * y^2 + 0.3 * y^0 with a binary logarithm); see the second sketch below.
- Polynomial multiplication can be performed efficiently with FFT, Karatsuba, and other algorithms. I tried a few approaches, but a naive algorithm implemented with SIMD turned out to be the most efficient. That’s probably because I limited the size of the polynomials and the score distributions tend to be sparse (i.e. many 0s).
- My bot ran around 3000+ search iterations on the slowest turns (or 1000+ on the slower CG CPUs).
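For the first step, here's a sketch of the per-mini-game final score distribution, assuming a gold/silver/bronze medal is worth +3/+1/+0 points (which matches the 8 → {8, 9, 10, 11, 12, 14} example above). `addOneGame` is a hypothetical helper:

```cpp
#include <array>
#include <map>

using ScoreDist = std::map<int, double>;  // mini-game score -> probability

// Fold one game into the score distribution. medalProb[0/1/2] is the guessed
// probability of gold/silver/bronze for that game; a medal is assumed to be
// worth +3/+1/+0 points.
ScoreDist addOneGame(const ScoreDist& cur, const std::array<double, 3>& medalProb) {
    const int points[3] = {3, 1, 0};
    ScoreDist next;
    for (const auto& [score, p] : cur)
        for (int m = 0; m < 3; ++m)
            next[score + points[m]] += p * medalProb[m];
    return next;
}
```

Start from {current score: 1.0}, fold in the estimated distribution of the ongoing game, then one guessed distribution per remaining game.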
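And a sketch of the log trick for combining the 4 mini-games: index each score distribution by a scaled, rounded logarithm of the score, so that multiplying scores becomes adding exponents, i.e. a polynomial multiplication. The `SCALE` constant is illustrative, scores of 0 would need special handling, and the plain inner loop stands in for my actual SIMD implementation:

```cpp
#include <cmath>
#include <vector>

constexpr int SCALE = 8;  // resolution of the log grid (illustrative)

// Scaled, rounded log2 exponent of a score (score >= 1 assumed here).
int toExp(int score) { return (int)std::lround(SCALE * std::log2((double)score)); }

// Polynomial multiplication: c[i + j] += a[i] * b[j]. a[e] is the probability
// that the scaled log2 of the product accumulated so far equals e.
std::vector<double> convolve(const std::vector<double>& a, const std::vector<double>& b) {
    std::vector<double> c(a.size() + b.size() - 1, 0.0);
    for (size_t i = 0; i < a.size(); ++i) {
        if (a[i] == 0.0) continue;  // the distributions are sparse; skip zeros
        for (size_t j = 0; j < b.size(); ++j)
            c[i + j] += a[i] * b[j];
    }
    return c;
}
```

With SCALE = 1 this reproduces the binary-logarithm example above ({32: 40%, …} → 0.4y^5 + 0.2y^3 + 0.1y^2 + 0.3y^0). Convolving the 4 per-game exponent distributions gives each player’s total-score distribution in log space, and since log is monotonic, these can be compared directly among players.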