For ChatGPT, if you can collect all the possible pre-training data, then you can simply remove the predictions that got bad feedback from the data used for the reward model.
If you cannot collect all the possible pre-training data, then you need to correct the bad-feedback predictions instead of removing them.
But either way, you need humans to do the labeling.
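
To make the two cases concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `LabeledPrediction` record, the "good"/"bad" feedback values, and the `rewrite` callback (standing in for a human who writes a better response) are hypothetical and not taken from how ChatGPT is actually trained.

```python
from dataclasses import dataclass, replace
from typing import Callable, List


@dataclass(frozen=True)
class LabeledPrediction:
    """Hypothetical record: one model prediction plus a human feedback label."""
    prompt: str
    response: str
    feedback: str  # assumed to be "good" or "bad", assigned by a human labeler


def remove_bad_feedback(records: List[LabeledPrediction]) -> List[LabeledPrediction]:
    """Case 1: drop every prediction that a human marked as bad."""
    return [r for r in records if r.feedback != "bad"]


def correct_bad_feedback(
    records: List[LabeledPrediction],
    rewrite: Callable[[LabeledPrediction], str],
) -> List[LabeledPrediction]:
    """Case 2: keep every prompt, but replace bad responses with a corrected one.

    `rewrite` stands in for the human who supplies the corrected response.
    """
    return [
        replace(r, response=rewrite(r), feedback="good") if r.feedback == "bad" else r
        for r in records
    ]
```

In both functions the `feedback` field has to come from a person, and in the second one the corrected response does too, which is the point of the last sentence above.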