Hello everyone,

I am working on a problem in which a point mass is trying to catch another point mass. The dynamics are correct and implemented in MATLAB's Reinforcement Learning Toolbox using a PPO agent. I have normalised the reward and the observations. The reward is 2*exp(-0.005*(e_x^2 + e_z^2)), where e_x and e_z are the tracking errors in the x and z directions.
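For clarity, the reward is computed roughly as follows inside my step function (the variable names here are illustrative placeholders, not the exact names in my environment):

```matlab
% Illustrative reward computation (names are placeholders)
e_x = target_x - pursuer_x;              % position error in x
e_z = target_z - pursuer_z;              % position error in z
reward = 2*exp(-0.005*(e_x^2 + e_z^2));  % peaks at 2 when both errors are zero
```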

I am a little bit confused about tuning the hyperparameters. Currently I am using:

"ExperienceHorizon",100,...

"ClipFactor",0.1,...

"EntropyLossWeight",0.01,...

"MiniBatchSize",75,...

"NumEpoch",5,...

"AdvantageEstimateMethod","gae",...

"GAEFactor",0.95,...

Each episode lasts 10 seconds (100 steps). The agent does not seem to learn even after 20,000 iterations.

I would be grateful if someone could point me in the right direction. Thanks in advance.
