RL algorithms require a long time to collect data points, which is not acceptable for online policy tasks (time complexity). Moreover, the number of Q-values grows exponentially with the number of state-space variables (space complexity).
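To make the space-complexity point concrete, here is a minimal sketch that counts the entries of a tabular Q-function. It assumes (for illustration only) that every state variable is discretized into the same number of values, so the number of states is values_per_variable ** num_variables and the table stores one Q(s, a) per state-action pair. The function name q_table_size and its parameters are hypothetical.

```python
def q_table_size(values_per_variable: int, num_variables: int, num_actions: int) -> int:
    """Entries in a tabular Q-function (hypothetical helper).

    Assumes each of num_variables state variables takes
    values_per_variable discrete values, so |S| = k ** n.
    """
    num_states = values_per_variable ** num_variables  # |S| grows exponentially in n
    return num_states * num_actions  # one Q(s, a) entry per state-action pair


# Show how the table grows as state variables are added
# (10 values per variable, 4 actions).
for n in (2, 4, 8, 16):
    print(n, q_table_size(10, n, 4))
```

With 10 values per variable and 4 actions, going from 2 to 8 state variables takes the table from 400 to 400,000,000 entries, which is why tabular Q-learning is usually replaced by function approximation for larger state spaces.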
