Could anybody with expertise in Q-learning please explain how the immediate reward is computed?
