Compared to policy-based deep RL algorithms, does this category of algorithms have lower exploration efficiency?
I found this claim in the following article:
"Optimal energy management strategies for energy internet via deep reinforcement learning approach" (Hua et al., 2019), but the article doesn't cite a source for it. Is this common knowledge in the field?
Thank you!