
For ChatGPT, the goal of human feedback is to correct the wrong data in the policy model's training dataset.

From this viewpoint, there is no essential difference between reinforcement learning and supervised learning here.

Is this correct?
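To make the comparison concrete, below is a minimal sketch (not ChatGPT's actual training code) that contrasts a supervised fine-tuning update with a REINFORCE-style update on the same toy policy. The linear policy, the random prompt features, and the binary reward derived from human-corrected labels are all hypothetical stand-ins chosen for illustration.

```python
# Hypothetical toy setup: a linear "policy model" over a small vocabulary.
# Not ChatGPT's training code; it only contrasts the two loss functions.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, hidden = 10, 16
policy = torch.nn.Linear(hidden, vocab_size)   # stand-in for a policy model
states = torch.randn(4, hidden)                # 4 hypothetical prompts
labels = torch.randint(0, vocab_size, (4,))    # human-corrected target tokens

# 1) Supervised learning: fit the human-provided (corrected) labels directly.
logits = policy(states)
sl_loss = F.cross_entropy(logits, labels)

# 2) REINFORCE-style update: sample from the policy, then weight each sample's
#    log-probability by a scalar reward derived from human feedback.
with torch.no_grad():
    samples = torch.distributions.Categorical(logits=logits).sample()
rewards = (samples == labels).float()          # toy reward: 1 if the sample matches the label
log_probs = F.log_softmax(logits, dim=-1).gather(1, samples.unsqueeze(1)).squeeze(1)
rl_loss = -(rewards * log_probs).mean()

print(f"supervised loss: {sl_loss.item():.4f}  reinforce loss: {rl_loss.item():.4f}")
```

Under this binary-reward assumption, the REINFORCE loss reduces to cross-entropy on the samples humans mark correct, which matches the view described above; note the sketch assumes the reward comes directly from corrected labels rather than from a separately trained reward model over sampled outputs.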
