
Hello,

I am new to the reinforcement learning field. "Escaping a maze" is one of the games people commonly use to train and test their algorithms. In almost all blog posts I have seen, the maze is created once, an agent is trained to escape that maze in the shortest possible time (or to optimize some other reward metric), and testing is then conducted on the same maze that was used for training. If my understanding up to this point is correct, the following questions arise.
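For concreteness, here is roughly how I understand that typical setup (a minimal sketch, assuming a small fixed grid maze, tabular Q-learning, and made-up rewards and hyperparameters; the maze layout below is just an example I invented):

```python
import random

# 0 = free cell, 1 = wall; the SAME maze is used for training and testing.
MAZE = [
    [0, 0, 0, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
START, GOAL = (0, 0), (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] == 0:
        state = (r, c)
    # -1 per step pushes the agent toward the shortest escape path.
    return state, (0.0 if state == GOAL else -1.0), state == GOAL

Q = {}  # (state, action_index) -> estimated value
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for episode in range(2000):
    s = START
    for _ in range(200):  # cap steps per episode
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q.get((s, i), 0.0))
        s2, reward, done = step(s, ACTIONS[a])
        best_next = max(Q.get((s2, i), 0.0) for i in range(len(ACTIONS)))
        q_sa = Q.get((s, a), 0.0)
        Q[(s, a)] = q_sa + alpha * (reward + gamma * best_next - q_sa)
        s = s2
        if done:
            break

# "Testing" then just runs the greedy policy on this same MAZE,
# which is why I suspect the policy is tied to this one layout.
```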

1) Isn't the resulting policy overfitted, in the sense that the agent learns to solve only that one particular maze?

2) I have seen some general strategies (like following the wall) that can escape at least simple mazes (i.e., mazes without short-cuts via bridges or circular paths). Can we not train the agent to learn such a generalized strategy, one that is independent of the exact maze structure? I agree the agent won't take the shortest path in every maze, but at least it could be trained to escape any maze of that general type; a hand-coded version of what I mean is sketched below. Shouldn't RL strive to learn a general policy? I realize that many RL-trained policies can deal with stochasticity in an environment, but here I am interested in obtaining a general policy that is not tied to the exact structure of the environment. Since the RL agent learns from its experience, I feel obtaining such a policy should be possible.
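This is the kind of maze-independent "follow the wall" rule I have in mind (a right-hand-rule sketch of my own, using the same grid representation as above; the function name and step cap are just my illustration). My question is whether an agent could learn something equivalent from its own experience rather than having it hand-coded:

```python
def wall_follower(maze, start, goal, max_steps=10_000):
    """Right-hand rule: keep the wall on the agent's right-hand side."""
    headings = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left (clockwise)
    h, pos = 1, start  # start facing right (arbitrary choice)

    def is_open(cell):
        r, c = cell
        return 0 <= r < len(maze) and 0 <= c < len(maze[0]) and maze[r][c] == 0

    for _ in range(max_steps):
        if pos == goal:
            return True
        # Prefer turning right, then going straight, then left, then turning back.
        for turn in (1, 0, -1, 2):
            nh = (h + turn) % 4
            nxt = (pos[0] + headings[nh][0], pos[1] + headings[nh][1])
            if is_open(nxt):
                h, pos = nh, nxt
                break
    return False
```

On the small example maze above, wall_follower(MAZE, (0, 0), (3, 3)) does reach the goal, only not via the shortest path, and it works this way on any simply connected maze without ever looking at the full layout.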

Any comments/suggestions are welcome.

Thanks!
