I'm training an agent to accomplish a reaching task. The agent controls a multi-joint robotic arm and has to reach for a target. So far, I've had some success with vanilla policy gradient but, to my surprise, I can't get it to work with actor-critic.
I'm wondering how I can diagnose what makes it fail. I've tried various reward functions, but none was robust enough, so I'd like to monitor the agent more closely during training. Which values do you think might give some insight?
I've thought of:
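To make the question concrete, here is a rough sketch of the kind of per-update diagnostics I have in mind (the metric names and function are just placeholders, not from any particular library):

```python
from statistics import mean, pvariance

def training_diagnostics(values, returns, advantages, entropies):
    """Illustrative per-batch statistics for debugging actor-critic.

    All arguments are per-timestep lists collected over one update:
    critic value estimates, empirical returns, computed advantages,
    and the policy's action-distribution entropies.
    """
    residuals = [v - r for v, r in zip(values, returns)]
    var_returns = pvariance(returns)
    return {
        # Critic quality: should shrink as the critic learns.
        "value_loss": mean(e * e for e in residuals),
        # Explained variance: ~1 means the critic tracks returns well,
        # <= 0 means it is no better than predicting the mean return.
        "explained_variance": 1.0 - pvariance(residuals) / (var_returns + 1e-8),
        # Advantage statistics: a near-zero std means the critic is
        # cancelling out the learning signal for the actor.
        "adv_mean": mean(advantages),
        "adv_std": pvariance(advantages) ** 0.5,
        # Policy entropy: a fast collapse toward 0 suggests the policy
        # became deterministic before the critic was reliable.
        "entropy": mean(entropies),
    }
```

This is only a sketch of what I could log; I'm not sure these are the most informative quantities, which is exactly what I'm asking about.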
Thanks!