Hello everyone,
I am working on a pursuit problem in which one point mass tries to catch another. The dynamics are correct and implemented with a PPO agent using MATLAB's Reinforcement Learning Toolbox. I have normalised the reward and the observations. The reward is 2*exp(-0.005*(e_x^2+e_z^2)), where e_x and e_z are the position errors in the x and z directions.
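For clarity, here is a sketch of that reward in MATLAB (the variable names target_x, agent_x, etc. are illustrative, not from my actual code):

```matlab
% Illustrative reward computation; peaks at 2 when both errors are zero
e_x = target_x - agent_x;              % position error in x
e_z = target_z - agent_z;              % position error in z
reward = 2*exp(-0.005*(e_x^2 + e_z^2));
```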
I am a little confused about tuning the hyperparameters. Currently I am using:
"ExperienceHorizon",100,...
"ClipFactor",0.1,...
"EntropyLossWeight",0.01,...
"MiniBatchSize",75,...
"NumEpoch",5,...
"AdvantageEstimateMethod","gae",...
"GAEFactor",0.95,...
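In context, these are passed to rlPPOAgentOptions roughly like this (a sketch; the variable name opts and the surrounding call are just how I set it up):

```matlab
% PPO agent options (Reinforcement Learning Toolbox)
opts = rlPPOAgentOptions( ...
    "ExperienceHorizon",100, ...
    "ClipFactor",0.1, ...
    "EntropyLossWeight",0.01, ...
    "MiniBatchSize",75, ...
    "NumEpoch",5, ...
    "AdvantageEstimateMethod","gae", ...
    "GAEFactor",0.95);
```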
Each episode lasts 10 seconds of simulated time (100 steps). The agent does not seem to learn even after 20,000 iterations.
I would be grateful if someone could point me in the right direction. Thanks in advance.