Is there an example of a Markov Decision Process where the rewards depend on the actions but the transition probabilities do not?
For an MDP (S, A, r, p), I am looking for an example where the transition probabilities satisfy p(s'|s,a) = p(s'|s) for all s, s', a, while the reward r(s,a) still varies with a.
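To make the property concrete, here is a minimal sketch of what I have in mind, written in plain Python. The two-state, two-action MDP, the numbers in `P` and `R`, the discount `GAMMA`, and the helper names `transition` and `value_iteration` are all made up for illustration and do not come from any particular application. Because the next-state distribution ignores the action, I would expect the optimal policy to simply pick the action with the highest immediate reward in each state, and the sketch checks that with value iteration.

```python
# Hypothetical 2-state, 2-action MDP; every number here is an illustrative assumption.
STATES = [0, 1]
ACTIONS = [0, 1]
GAMMA = 0.9  # assumed discount factor

# P[s][s2] = p(s2 | s). There is no action index, so p(s'|s,a) = p(s'|s) by construction.
P = [[0.7, 0.3],
     [0.4, 0.6]]

# R[s][a] = r(s, a). The reward does depend on the action.
R = [[1.0, 0.0],
     [0.0, 2.0]]


def transition(s, a, s2):
    """Return p(s2 | s, a); the action argument is deliberately ignored."""
    return P[s][s2]


def value_iteration(tol=1e-10):
    """Standard value iteration; returns the state values and the Q-table."""
    v = [0.0 for _ in STATES]
    while True:
        q = [[R[s][a] + GAMMA * sum(transition(s, a, s2) * v[s2] for s2 in STATES)
              for a in ACTIONS]
             for s in STATES]
        new_v = [max(q[s]) for s in STATES]
        if max(abs(new_v[s] - v[s]) for s in STATES) < tol:
            return new_v, q
        v = new_v


values, q_values = value_iteration()

# Policy from the full Bellman backup versus policy from immediate reward only.
optimal_policy = [max(ACTIONS, key=lambda a: q_values[s][a]) for s in STATES]
greedy_policy = [max(ACTIONS, key=lambda a: R[s][a]) for s in STATES]

print(optimal_policy, greedy_policy)
```

In this toy case the two policies coincide, since the discounted future term in the Bellman backup is the same for every action and drops out of the argmax. What I am after is a natural, real-world problem with this structure rather than a constructed one.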
08 April 2021