01 January 1970

We collected [good]/[bad] feedback from the web page.

Then we removed the [bad] feedback data.

We then used only the [good] feedback data to train the text-generation policy model.

Specifically, the [good] feedback data was merged into the original training dataset of the policy model.
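The post does not describe how the feedback is stored. As a minimal sketch, assume each rating is a record with hypothetical `prompt`, `response`, and `label` fields, where `label` is the string `"good"` or `"bad"`. Removing the [bad] feedback then reduces to a filter on that label:

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FeedbackRecord:
    # Hypothetical schema: the original post does not specify the fields.
    prompt: str    # input that was shown to the model
    response: str  # model output the user rated
    label: str     # "good" or "bad", as collected from the web page


def drop_bad_feedback(records: Iterable[FeedbackRecord]) -> List[FeedbackRecord]:
    """Keep only records labeled "good"; every other record is discarded."""
    return [r for r in records if r.label == "good"]


if __name__ == "__main__":
    collected = [
        FeedbackRecord("Translate 'hi' to French", "Hallo", "bad"),
        FeedbackRecord("Translate 'hi' to French", "Salut", "good"),
    ]
    kept = drop_bad_feedback(collected)
    print(len(kept))  # 1
```

Matching on the exact string `"good"` is a simplifying assumption; real feedback exports may need label normalization first.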
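The post does not say how the merge is done. One possible sketch, assuming both datasets are lists of hypothetical `(prompt, target_response)` pairs, appends the [good] feedback pairs to the original dataset, skips pairs that already appear there, and shuffles with a fixed seed so the feedback examples are interleaved with the original ones during training:

```python
import random
from typing import List, Tuple

# Hypothetical training-example format: (prompt, target_response).
Example = Tuple[str, str]


def merge_into_original(
    original: List[Example], good_feedback: List[Example], seed: int = 0
) -> List[Example]:
    """Append good-feedback pairs not already in the dataset, then shuffle."""
    seen = set(original)
    merged = list(original)
    for ex in good_feedback:
        if ex not in seen:  # skip exact duplicates of existing pairs
            merged.append(ex)
            seen.add(ex)
    # Seeded shuffle so the merged order is reproducible across runs.
    random.Random(seed).shuffle(merged)
    return merged


if __name__ == "__main__":
    original = [("2+2?", "4"), ("Capital of France?", "Paris")]
    good = [("Capital of France?", "Paris"), ("Say hi", "Hello!")]
    print(len(merge_into_original(original, good)))  # 3
```

Deduplication and shuffling are design choices added for illustration; the policy model would then be fine-tuned on the merged list with whatever training setup was used for the original dataset.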
