I know that in a Markov decision process (MDP), the probability of transitioning to a new state depends only on the current state and the action chosen by the agent. In my model, however, an agent's new state also depends on its own previous action and on the previous actions of its neighbors. Can I solve this with a trick, namely treating the previous actions of the agent and its neighbors as part of its current state (as in the sketch below)? I would appreciate it if you could let me know whether there is a better solution.
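Concretely, the trick I have in mind would look something like the following Python sketch: a wrapper that appends the last joint action (the agent's own plus its neighbors') to the observation, so that the augmented state is Markovian again. The base environment interface here (`base_env`, `reset`, `step`, `num_neighbors`, `num_actions`) is just a placeholder I made up for illustration, not any real library API.

```python
import numpy as np

class ActionAugmentedEnv:
    """Hypothetical wrapper: augments the observation with the previous
    joint action so the resulting process satisfies the Markov property."""

    def __init__(self, base_env, num_neighbors, num_actions):
        self.base_env = base_env          # assumed to expose reset()/step()
        self.num_neighbors = num_neighbors
        self.num_actions = num_actions
        # one slot for the agent itself plus one per neighbor
        self.last_joint_action = np.zeros(1 + num_neighbors, dtype=int)

    def _augment(self, obs):
        # one-hot encode each remembered action and concatenate with the observation
        one_hot = np.eye(self.num_actions)[self.last_joint_action].ravel()
        return np.concatenate([np.asarray(obs, dtype=float), one_hot])

    def reset(self):
        self.last_joint_action[:] = 0
        obs = self.base_env.reset()
        return self._augment(obs)

    def step(self, own_action, neighbor_actions):
        obs, reward, done = self.base_env.step(own_action, neighbor_actions)
        # remember the joint action so the *next* state carries it
        self.last_joint_action = np.concatenate(
            [[own_action], np.asarray(neighbor_actions, dtype=int)]
        )
        return self._augment(obs), reward, done
```

In other words, the augmented state would be the pair (original state, previous joint action), which is what I mean by making the previous actions part of the state space.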
