Part of my problem requires solving a finite-horizon, discrete-time MDP in which the state in each slot is drawn i.i.d. and its distribution does not depend on the action. Are there any simple policies that achieve the optimal solution? What if we also add a total cost constraint? Thanks!
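Below is one way to state the problem formally. It is a sketch with assumed notation: the horizon $T$, state distribution $P$, action sets $\mathcal{A}(s)$, per-slot reward $r_t(s,a)$, per-slot cost $c_t(s,a)$, and budget $C$ are my own labels and are not part of the original problem.

```latex
% Unconstrained problem: s_1, ..., s_T are i.i.d. with distribution P,
% and the action a_t chosen by policy \pi does not affect s_{t+1}.
\max_{\pi}\; \mathbb{E}^{\pi}\Big[\sum_{t=1}^{T} r_t(s_t, a_t)\Big]

% Finite-horizon Bellman recursion, with V_{T+1} \equiv 0:
V_t(s) = \max_{a \in \mathcal{A}(s)} \Big\{ r_t(s, a) + \mathbb{E}_{s' \sim P}\big[V_{t+1}(s')\big] \Big\}

% Constrained version (assumed here to bound the expected total cost):
\max_{\pi}\; \mathbb{E}^{\pi}\Big[\sum_{t=1}^{T} r_t(s_t, a_t)\Big]
\quad \text{s.t.} \quad \mathbb{E}^{\pi}\Big[\sum_{t=1}^{T} c_t(s_t, a_t)\Big] \le C
```

Written this way, the continuation term $\mathbb{E}_{s' \sim P}[V_{t+1}(s')]$ does not depend on the chosen action $a$. If the constraint is instead a hard budget on the realized total cost, the remaining budget would have to be included in the state.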