06 April 2023

ChatGPT: what is the difference between using a reward to guide a policy and using a dataset of rewards to train a policy?

In the end, isn't good-quality data the goal in both cases?
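One way to make the two options in the question concrete is a toy sketch. Everything in it is an assumption chosen for illustration: the 3-armed bandit, the `TRUE_REWARD` values, the noise level, the temperature, and the function names `reward_guided` and `dataset_trained` are all hypothetical. Option (a) is sketched as a REINFORCE-style policy gradient, where the reward scores samples that the current policy draws itself. Option (b) is sketched as reward-weighted maximum likelihood on a fixed dataset collected beforehand by a uniform policy, with no new samples drawn during training.

```python
import math
import random

# Hypothetical toy problem: a 3-armed bandit with fixed expected reward per arm.
TRUE_REWARD = [0.2, 0.5, 0.9]


def reward_fn(action: int, rng: random.Random) -> float:
    """Noisy reward; stands in for a reward model or human feedback."""
    return TRUE_REWARD[action] + rng.gauss(0.0, 0.1)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reward_guided(steps=2000, lr=0.1, seed=0):
    """(a) Reward guides the policy online: REINFORCE on the policy's own samples."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    baseline = 0.0
    for _ in range(steps):
        probs = softmax(logits)
        a = rng.choices(range(3), weights=probs)[0]  # data comes from the current policy
        r = reward_fn(a, rng)
        baseline += 0.01 * (r - baseline)  # running-average baseline to reduce variance
        adv = r - baseline
        # Gradient of log softmax(logits)[a] w.r.t. logits is one_hot(a) - probs.
        for i in range(3):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr * adv * grad
    return softmax(logits)


def dataset_trained(n=3000, epochs=200, lr=0.5, temperature=0.1, seed=0):
    """(b) Policy fit to a fixed, pre-collected dataset of (action, reward) pairs
    by reward-weighted maximum likelihood; the dataset never changes."""
    rng = random.Random(seed)
    actions = [rng.randrange(3) for _ in range(n)]  # uniform behaviour policy
    data = [(a, reward_fn(a, rng)) for a in actions]
    # Higher reward -> exponentially larger weight (temperature is an assumption).
    weight = [0.0, 0.0, 0.0]
    for a, r in data:
        weight[a] += math.exp(r / temperature)
    total = sum(weight)
    target = [w / total for w in weight]
    logits = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        probs = softmax(logits)
        # Gradient of sum_a target[a] * log softmax(logits)[a] is target - probs.
        for i in range(3):
            logits[i] += lr * (target[i] - probs[i])
    return softmax(logits)
```

Running `print([round(p, 3) for p in reward_guided()])` and the same for `dataset_trained()` shows both policies favouring the highest-reward arm in this toy. The difference lies in where the data comes from: in (a) the policy keeps generating new data as it changes, while in (b) the outcome depends entirely on what the fixed dataset already covers (here, a uniform sample of all three arms).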

Asked by Tong Guo.