I am all about pure MCTS for this, because that is most (ehm!) relevant to my academic studies. Spending a lot of time on crafting an evaluation function may be less valuable.
At first I thought that moves were simultaneous, so I made an incredibly slow plain MCTS solution in Python. Each step involved generating up to 256 complete (!) immutable successor states, indexed by immutable tuples of moves; filtering out the ones that a player would never choose because that player's move would be suicide regardless of the other players' moves; trying to optimize each player's move-selection probabilities in light of the possible immediate outcomes; and finally sampling one successor for the playout. Then repeat. This approach allowed only three playouts within 100 ms, so I made sure they were all for different moves, and selected at random one of the moves with the best playout outcome…
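For the record, here is a rough C sketch (not my actual Python code) of that joint-move filtering idea; the grid size, structures and helper names are just placeholders for illustration:

```c
/* Sketch of the joint-move idea: with up to 4 players and 4 directions
 * each there are at most 4^4 = 256 combined moves.  A move is
 * "unconditionally suicidal" if it leaves the grid or hits an already
 * occupied cell, no matter what the other players do, so it can be
 * filtered out up front.  Grid size and structs are assumptions. */
#include <stdbool.h>

#define W 30
#define H 20

typedef struct { int x, y; bool alive; } Player;

static const int DX[4] = { 1, -1, 0, 0 };
static const int DY[4] = { 0, 0, 1, -1 };

/* occupied[y][x] is true for cells already taken on the board */
static bool suicidal(const bool occupied[H][W], const Player *p, int dir)
{
    int nx = p->x + DX[dir], ny = p->y + DY[dir];
    if (nx < 0 || nx >= W || ny < 0 || ny >= H) return true;
    return occupied[ny][nx];
}

/* Collect, for one player, the moves that are not unconditional suicide.
 * Joint moves are then enumerated only over these per-player lists
 * instead of over all 256 combinations. */
static int viable_moves(const bool occupied[H][W], const Player *p, int out[4])
{
    int n = 0;
    for (int dir = 0; dir < 4; dir++)
        if (!suicidal(occupied, p, dir))
            out[n++] = dir;
    return n;
}
```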
I kind of got tired of using Python, and realized that I could speed up and improve the thing a lot, not only by switching to C, but also by considering one player's turn at a time, checking whether a move is suicidal before testing it, and doing playouts in place (with a single copy of the state to begin with). This allowed somewhere near 14000 root-level playouts within 100 ms, and got me to roughly the top 160 in Silver. (I have since discovered a bug in my time-keeping calculation. If that fix were applied to this plain MCTS version, the number of playouts could easily reach 20000-40000, I think.)
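The in-place playout is the core of the speedup, so here is a simplified C sketch of what such a playout can look like; the state layout is again an assumption, and details such as removing a dead player's trail are left out:

```c
/* Sketch of an in-place random playout: the caller copies the game state
 * once, then the playout mutates that copy directly, picking uniformly
 * random non-suicidal moves one player turn at a time. */
#include <stdbool.h>
#include <stdlib.h>

#define W 30
#define H 20
#define MAX_PLAYERS 4

typedef struct { int x, y; bool alive; } Player;
typedef struct {
    bool occupied[H][W];
    Player players[MAX_PLAYERS];
    int num_players;
} State;

static const int DX[4] = { 1, -1, 0, 0 };
static const int DY[4] = { 0, 0, 1, -1 };

/* Play random moves until at most one player is left; returns the index
 * of the last survivor, or -1 if everyone died.  (Any rule about
 * removing a dead player's trail is omitted for brevity.) */
static int playout(State *s)            /* s is already a scratch copy */
{
    int alive = 0;
    for (int i = 0; i < s->num_players; i++)
        if (s->players[i].alive) alive++;

    while (alive > 1) {
        for (int i = 0; i < s->num_players && alive > 1; i++) {
            Player *p = &s->players[i];
            if (!p->alive) continue;

            /* gather non-suicidal moves for this player */
            int moves[4], n = 0;
            for (int d = 0; d < 4; d++) {
                int nx = p->x + DX[d], ny = p->y + DY[d];
                if (nx >= 0 && nx < W && ny >= 0 && ny < H &&
                    !s->occupied[ny][nx])
                    moves[n++] = d;
            }
            if (n == 0) { p->alive = false; alive--; continue; }

            int d = moves[rand() % n];
            p->x += DX[d];
            p->y += DY[d];
            s->occupied[p->y][p->x] = true;
        }
    }
    for (int i = 0; i < s->num_players; i++)
        if (s->players[i].alive) return i;
    return -1;
}
```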
The plain MCTS agent has very interesting behavior. It is the greatest coward of all time. Not only does it flee from its opponents, but it also avoids spaces where it “thinks” (based on playouts) that it could easily trap itself. To improve on this and be able to compete with minimax approaches, I am currently going for MCTS with UCT selection, which involves building a tree of promising options and sampling playouts from leaf nodes of that tree. Up to four players are handled by backpropagating an evaluation for all players, consisting simply of the average end result over all playouts from a node.
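To make the per-player backpropagation concrete, here is a minimal sketch of the node bookkeeping I have in mind; the names, the exploration constant and the per-player result vector are illustrative, not my exact code:

```c
/* Each node stores a visit count and, for every player, the sum of
 * playout end results, so one backup updates all players at once and
 * selection at a node uses the value of the player whose turn it is. */
#include <math.h>

#define MAX_PLAYERS  4
#define MAX_CHILDREN 4              /* one child per direction */

typedef struct Node {
    struct Node *parent;
    struct Node *children[MAX_CHILDREN];
    int num_children;
    int player;                     /* player to move at this node */
    int visits;
    double result_sum[MAX_PLAYERS]; /* summed end results per player */
} Node;

/* UCB1 score of a child, seen from the player who moves at the parent. */
static double uct_score(const Node *parent, const Node *child, double c)
{
    if (child->visits == 0) return 1e9;       /* force trying it first */
    double mean = child->result_sum[parent->player] / child->visits;
    return mean + c * sqrt(log((double)parent->visits) / child->visits);
}

static Node *select_child(Node *n, double c)
{
    Node *best = NULL;
    double best_score = -1e18;
    for (int i = 0; i < n->num_children; i++) {
        double s = uct_score(n, n->children[i], c);
        if (s > best_score) { best_score = s; best = n->children[i]; }
    }
    return best;
}

/* Back up one playout: result[p] is the end result for player p
 * (e.g. a rank-based score); every node on the path receives all of them. */
static void backpropagate(Node *leaf, const double result[MAX_PLAYERS])
{
    for (Node *n = leaf; n != NULL; n = n->parent) {
        n->visits++;
        for (int p = 0; p < MAX_PLAYERS; p++)
            n->result_sum[p] += result[p];
    }
}
```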
The behavior now is much more directed and looks smarter. But it is still somewhat of a coward: it will trap opponents in obvious cases, but it often wastes time going back and forth while its opponents are gaining ground. I have reached the top 75 of the Silver league (OK, more like top 85 now), but I notice some weird (buggy) moves, occasional timeouts, and sometimes being denied the amount of memory that I try to reserve. If I fix these errors, I believe I can reach Gold. Then I can put to the test my hypothesis that I will do better (relatively) with four players on the board than with three, because my approach naturally handles multiple players equally well, whereas some other approaches rely on simplifications. The dream scenario is jumping straight to the Legend league, but I guess that is too much to hope for.
During my experiments, I have implemented my own console-based visualization of the board, and also a procedure that optimizes the UCT’s exploration parameter by offline self-play.
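The tuning procedure is nothing fancy; conceptually it is a loop like the following sketch, where play_game is a hypothetical stand-in for running one full offline game between two bots with the given exploration constants:

```c
/* Sketch of offline tuning of the UCT exploration constant: play many
 * self-play games between the current best constant and a candidate,
 * and keep the candidate only if it wins clearly more often. */
#include <stdio.h>

/* Assumed helper: plays one game and returns the winner (0 or 1),
 * where the argument position selects which bot uses which constant. */
extern int play_game(double c_bot0, double c_bot1);

static double tune_exploration(double best_c, int games_per_candidate)
{
    const double candidates[] = { 0.3, 0.5, 0.7, 1.0, 1.4, 2.0 };
    const int n = sizeof candidates / sizeof candidates[0];

    for (int i = 0; i < n; i++) {
        double c = candidates[i];
        int wins = 0;
        for (int g = 0; g < games_per_candidate; g++) {
            /* swap sides every other game to cancel positional bias */
            int winner = (g % 2 == 0) ? play_game(c, best_c)
                                      : 1 - play_game(best_c, c);
            if (winner == 0) wins++;
        }
        printf("c = %.2f: %d/%d wins vs current best %.2f\n",
               c, wins, games_per_candidate, best_c);
        /* require a clear margin (roughly 55%), not just over 50% */
        if (wins * 2 > games_per_candidate + games_per_candidate / 10)
            best_c = c;
    }
    return best_c;
}
```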
BTW, this is my favorite on CodinGame so far. The rules are so clean and simple, allowing simulation without an enormous effort. Without simulation, one is stuck in heuristics hell. :-/