It took me a while of reading and re-reading all the rules before I finally got that one small detail in the example was the most important part of the puzzle.
This is the line from the example:
- Reading baa 30 will create child node labeled b with avg. score 30 and one visit. The MCTS tree root will have the same statistics. Note that we are adding only one new node at a time.
That last sentence, “Note that we are adding only one new node at a time.”, changed everything.
I could not understand why the answer for the first one was only a when we were given ab. Basically nothing made sense
until I read the example and saw that it had rules in it.
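For anyone else stuck on this, here is a rough sketch of how I now read that rule. It is only my interpretation of the example, not the puzzle's reference solution: the `Node` class, the `add_playout` function, and the score of 50 used for `ab` are names and values I made up for illustration. Each playout walks down the nodes that already exist, creates at most one new node at the first missing move, and then updates the visit count and score total of every node on that path.

```python
class Node:
    """One tree node: children keyed by move, plus visit/score statistics."""

    def __init__(self):
        self.children = {}
        self.visits = 0
        self.total = 0.0

    @property
    def avg(self):
        # Average score over all playouts that passed through this node.
        return self.total / self.visits if self.visits else 0.0


def add_playout(root, moves, score):
    """Add one playout, creating at most ONE new node (assumed rule)."""
    path = [root]
    node = root
    for move in moves:
        child = node.children.get(move)
        if child is None:
            # First move not yet in the tree: expand here and stop.
            child = Node()
            node.children[move] = child
            path.append(child)
            break
        path.append(child)
        node = child
    # Update statistics on every node along the path, root included.
    for visited in path:
        visited.visits += 1
        visited.total += score


root = Node()
add_playout(root, "baa", 30)  # creates only node b, not b -> a -> a
add_playout(root, "ab", 50)   # creates only node a, not a -> b
```

Under this reading, `baa 30` leaves the tree with a single child `b` (average 30, one visit), and a playout `ab` on a tree that has no `a` yet adds just `a`, which would explain why the answer for the first one stops at a.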
Maybe that is obvious to someone who has been working with Monte Carlo tree search for a long time, but I think it should be explained in the rules as well as in the example.