I would like to know if there is a method, or set of methods, that can explain why a trained RL agent chose a particular decision, applied after training is complete. I'm not looking for different RL architectures or reformulations of the problem that try to be more transparent and self-explanatory. I know that RL agents try to maximize expected future rewards given their current state. The first and most obvious approach that comes to mind is data-driven test cases with a human in the loop. I have also read about secondary, simpler agents being used to approximate and explain the decisions of their more complex counterparts, as in the sketch below.
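
To make the "simpler agent" idea concrete, here is a minimal sketch of one common post-hoc variant: distilling the trained policy into an interpretable surrogate (a shallow decision tree), so each action can be explained by the tree's decision path. This assumes scikit-learn, and `trained_policy` is a hypothetical placeholder for whatever maps states to actions in your setup.

```python
# Hedged sketch: fit an interpretable surrogate to a trained policy.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def trained_policy(state):
    # Placeholder: substitute your agent's greedy action selection,
    # e.g. np.argmax(q_network(state)).
    return int(state[0] + state[1] > 0.0)

# 1. Collect (state, action) pairs by querying the trained agent
#    on states sampled from (or near) its operating distribution.
states = np.random.uniform(-1.0, 1.0, size=(5000, 4))
actions = np.array([trained_policy(s) for s in states])

# 2. Fit a shallow tree: a small depth keeps the explanation
#    readable, at the cost of fidelity to the original policy.
surrogate = DecisionTreeClassifier(max_depth=3).fit(states, actions)
print("fidelity to the original policy:", surrogate.score(states, actions))

# 3. The tree's rules are a global explanation; the decision path
#    for one state is a local explanation of that single action.
print(export_text(surrogate, feature_names=[f"s{i}" for i in range(4)]))
```

The fidelity score matters here: the tree's rules only explain the agent to the extent that it actually reproduces the agent's actions on the states of interest.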
