I am running an experiment in a 7 × 7 grid world. The agent can perform 5 possible actions: move up, right, left, down, or wait. The agent takes actions in search of food, which is the goal and sits in one of the cells of the grid. I want to make the sequence of actions required to achieve the goal a bit more complex: for example, to achieve the goal, the agent must first arrive at the goal cell and then stay there for three consecutive time steps.
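For concreteness, here is a minimal sketch of the environment as I picture it (the class and method names like `GridWorld` and `step` are just placeholders of mine, not from any library, and I am counting the arrival step as the first of the three):

```python
import random

GRID = 7
ACTIONS = ["up", "right", "down", "left", "wait"]
STAY_REQUIRED = 3  # consecutive time steps the agent must remain on the food cell

class GridWorld:
    def __init__(self, goal=(3, 3)):
        self.goal = goal
        self.reset()

    def reset(self):
        # start the agent in a random cell with no time spent on the goal yet
        self.pos = (random.randrange(GRID), random.randrange(GRID))
        self.stay_count = 0
        return self.pos

    def step(self, action):
        moves = {"up": (-1, 0), "right": (0, 1), "down": (1, 0),
                 "left": (0, -1), "wait": (0, 0)}
        dr, dc = moves[action]
        r, c = self.pos
        # clamp the move so the agent stays inside the 7 x 7 grid
        self.pos = (min(max(r + dr, 0), GRID - 1),
                    min(max(c + dc, 0), GRID - 1))
        # the goal only counts after three consecutive steps on the food cell
        self.stay_count = self.stay_count + 1 if self.pos == self.goal else 0
        done = self.stay_count >= STAY_REQUIRED
        reward = 1.0 if done else 0.0
        return self.pos, reward, done
```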
I am trying to decide how best to structure my Q-table for this problem. The obvious choice is a 49 × 5 table (one row per cell, one column per action), but this makes it less intuitive to read off the action sequence needed to achieve the goal; in particular, nothing in the 49 rows records how many consecutive steps the agent has already spent on the goal cell. I am aware of option-based learning and other hierarchical reinforcement learning techniques, but I want to know whether traditional reinforcement learning can be useful for this kind of problem.
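For reference, this is roughly how I would set up and index the flat 49 × 5 table I describe above; mapping a `(row, col)` cell to a row index is the only bookkeeping involved (`state_index` is just a helper name I made up):

```python
import numpy as np

GRID = 7
N_ACTIONS = 5  # up, right, down, left, wait

# flat table: one row per grid cell, one column per action
Q = np.zeros((GRID * GRID, N_ACTIONS))

def state_index(pos):
    """Map a (row, col) cell to a row of the Q-table."""
    r, c = pos
    return r * GRID + c

# e.g. the Q-values for all 5 actions in cell (3, 3)
print(Q[state_index((3, 3))])
```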