In the AlphaGo Zero algorithm, the neural network returns a pair (p, v) consisting of a vector of move probabilities p and a scalar v estimating the probability of winning. The value v is used to evaluate leaf nodes during the MCTS simulations. But is p used anywhere in the MCTS procedure?

I understand that the network is trained so that p approximates the search probabilities π computed by MCTS, but I am not sure whether p is used during MCTS itself.
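
To make the training relationship I am referring to concrete, here is a minimal sketch of how I understand it: the search probabilities π are obtained from the root visit counts (π(a) proportional to N(a)^(1/τ)), and p is pushed towards π with a cross-entropy term. The function names, the visit counts and the values of p below are made up for illustration only, and the value and regularisation terms of the full loss are left out.

```python
import math

def search_probabilities(visit_counts, temperature=1.0):
    """Turn root visit counts N(a) into pi(a) proportional to N(a)^(1/temperature)."""
    powered = [n ** (1.0 / temperature) for n in visit_counts]
    total = sum(powered)
    return [x / total for x in powered]

def policy_loss(pi, p, eps=1e-12):
    """Cross-entropy between the search probabilities pi and the network output p."""
    return -sum(t * math.log(q + eps) for t, q in zip(pi, p))

# Hypothetical visit counts for three legal moves after one search.
visit_counts = [10, 30, 60]
pi = search_probabilities(visit_counts)

# Hypothetical network output for the same position.
p = [0.2, 0.3, 0.5]
loss = policy_loss(pi, p)
```

This only shows where p appears in training; it says nothing about whether p also enters the tree search, which is what I am asking.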
