My model contains many agents that interact only through the environment. I am solving it with Q-learning, and since the agents are homogeneous, they all share a single static Q-table (implemented in Java).

The environment is dynamic, and its time step is much smaller than the time step at which agent states change: an agent's state does not change until the environment has been updated through many sub-steps. Moreover, the agents and the environment influence each other.

This creates a timing problem. On the one hand, the Q-learning update needs an agent's next state, s(t+1), in order to compute max_a Q(s(t+1), a). On the other hand, I cannot postpone updating the Q-table until the next agent step, because the table is shared among all the agents, which keep reading it in the meantime.

Do you have any suggestions for handling this problem? For concreteness, a sketch of the loop structure follows.
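To make the timing issue concrete, here is a minimal sketch of the structure I have in mind, not my actual code: the names (`SharedQLearning`, `Env`, `K_SUBSTEPS`) and the stubbed environment dynamics are placeholders. The comments mark exactly where the conflict arises.

```java
import java.util.Random;

public class SharedQLearning {
    static final int N_STATES = 100, N_ACTIONS = 4, N_AGENTS = 50;
    static final int K_SUBSTEPS = 1000;          // env sub-steps per agent step (illustrative)
    static final double ALPHA = 0.1, GAMMA = 0.9, EPSILON = 0.1;

    // One static Q-table shared by all (homogeneous) agents.
    static final double[][] Q = new double[N_STATES][N_ACTIONS];
    static final Random rng = new Random();

    static class Agent { int state, action; double reward; }

    // Stand-in for the real environment: it advances in small sub-steps,
    // and an agent's new state only becomes observable after many of them.
    static class Env {
        void substep(Agent[] agents) { /* fast environment dynamics */ }
        int observeState(Agent a)    { return rng.nextInt(N_STATES); }
        double reward(Agent a)       { return rng.nextDouble(); }
    }

    // Epsilon-greedy action selection against the shared table.
    static int epsilonGreedy(int s) {
        if (rng.nextDouble() < EPSILON) return rng.nextInt(N_ACTIONS);
        int best = 0;
        for (int a = 1; a < N_ACTIONS; a++) if (Q[s][a] > Q[s][best]) best = a;
        return best;
    }

    public static void main(String[] args) {
        Env env = new Env();
        Agent[] agents = new Agent[N_AGENTS];
        for (int i = 0; i < N_AGENTS; i++) agents[i] = new Agent();

        for (int t = 0; t < 10_000; t++) {
            // 1. Every agent picks an action by reading the *shared* Q-table.
            for (Agent a : agents) a.action = epsilonGreedy(a.state);

            // 2. The environment advances through many small sub-steps
            //    before any agent state changes.
            for (int k = 0; k < K_SUBSTEPS; k++) env.substep(agents);

            // 3. Only now is s(t+1) observable, so only now can I compute
            //    max_a Q(s(t+1), a) -- but the other agents have been
            //    reading the shared table throughout step 2.
            for (Agent a : agents) {
                int next = env.observeState(a);
                a.reward = env.reward(a);
                double maxNext = Q[next][0];
                for (int act = 1; act < N_ACTIONS; act++)
                    maxNext = Math.max(maxNext, Q[next][act]);
                Q[a.state][a.action] +=
                    ALPHA * (a.reward + GAMMA * maxNext - Q[a.state][a.action]);
                a.state = next;
            }
        }
    }
}
```

In this structure, the Q-update in step 3 necessarily lags behind the action selections in step 1, and that stale window is exactly what I am asking about.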