From what I have studied, Q-learning does not need to know the transition probabilities. So how can an agent determine its new state after choosing an action if it knows nothing about the probabilities of transitioning to other states?
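
To make the question concrete, here is a minimal sketch of tabular Q-learning under the usual model-free assumption: the agent does not compute its new state, it only observes whatever state the environment (or a simulator) returns after each action. Everything named here is hypothetical and chosen for illustration: the `ChainEnv` class, its `slip` parameter, the `q_learning` function, and the hyperparameter values. The transition probabilities exist only inside `ChainEnv.step`; the learning loop never reads them.

```python
import random


class ChainEnv:
    """Hypothetical 5-state chain. The transition probabilities live
    inside the environment and are never exposed to the agent."""

    def __init__(self, n_states=5, slip=0.2, seed=0):
        self.n_states = n_states
        self.slip = slip                  # chance the move is reversed (assumed value)
        self.rng = random.Random(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 0 = left, 1 = right; the environment samples the outcome
        move = 1 if action == 1 else -1
        if self.rng.random() < self.slip:
            move = -move
        self.state = min(max(self.state + move, 0), self.n_states - 1)
        done = self.state == self.n_states - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=1):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(env.n_states)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action choice, with ties broken at random
            if rng.random() < epsilon or q[s][0] == q[s][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[s][0] > q[s][1] else 1
            # the agent observes the next state; it does not predict it
            s_next, r, done = env.step(a)
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])
            s = s_next
    return q


q = q_learning(ChainEnv())
```

In this sketch the update uses only the observed sample `(s, a, r, s_next)`, so the `slip` probability never appears in `q_learning`. Averaging over many such samples is what stands in for knowing the transition probabilities.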
