[Community Puzzle] Monte Carlo Tree Search exercise


I’m playing with MCTS and I’ve stuck because there some things I don’t understand in this puzzle. In test case “Small random test”:

  1. Search paths: ‘bbabb’, ‘bbaa’ are there twice in the list. I’d expect - not just with MCTS, but in case of every search algorithm - that you never walk the same path again.
  2. If the paths are the same and thus the terminal states are also the same, how can the outcomes be different (‘bbabb’: 15.25 vs. 5.71)? And what would be the reward in that case? That last one? or the average value of them?

Thanks for any tip, or explanation to these questions.

So it took me a while to read and re-read all the rules until I finally got that one small detail of the example was the most important part of the puzzle.
This is the line from the example:

  • Reading baa 30 will create child node labeled b with avg. score 30 and one visit. The MCTS tree root will have the same statistics. Note that we are adding only one new node at a time.

The node, “Note that we are adding only one new node at a time.” Changed everything.
I could not understand why the answer for the first one was only a. when we where given ab. basically nothing made sense.
Until I read the example and it had rules in it.

Maybe that is obvious to someone that been working with monte Carlo tree searches for a long time. But I think it should be explained in the rules as well as in the example.


I read an article that recommended recording all playouts as win or loss instead of scoring them. I don’t have any personal experience with it. Opinions?

Good puzzle to learn / practice MCTS but I found the statement very unclear for some points :

  • As already said previously in this discussion : only one node have to be added to the tree for each playout.
  • In the paragraph with the UCB1 formula, the parenthesis at the end of the sentence confused me. In fact M.s is the sum of the scores, M.v is the number of visits, so (M.s/M.v) is the average score. In the sentence “so the first component of the sum is average score for node M”, here “the sum” refers to the sum in the UCB1 formula, not the sum of the scores represented by M.s