This question explores the consequences of failing to align rewards carefully with the goals we actually want the AI to achieve, which can lead to unexpected or undesirable behaviours.
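A minimal toy sketch of this kind of misalignment (not from the question itself; the actions, proxy rewards, and true-goal values below are invented for illustration) could look like the following: an agent optimises a proxy metric such as clicks, and under enough optimisation pressure it converges on the action that scores best on the proxy even though it scores worst on the true goal.

```python
import random

# Hypothetical example: the agent only sees the *proxy* reward (clicks),
# not the *true* goal (user wellbeing). The numbers are made up.
ACTIONS = {
    # action: (expected proxy reward, expected true-goal value)
    "helpful_answer":   (0.6, 0.9),
    "clickbait_answer": (0.9, 0.2),   # games the proxy metric
}

def run_bandit(steps=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = {a: 0 for a in ACTIONS}
    values = {a: 0.0 for a in ACTIONS}   # running estimate of proxy reward
    true_total = 0.0
    for _ in range(steps):
        # epsilon-greedy choice driven by the proxy reward only
        if rng.random() < epsilon:
            action = rng.choice(list(ACTIONS))
        else:
            action = max(values, key=values.get)
        proxy_mean, true_mean = ACTIONS[action]
        reward = proxy_mean + rng.gauss(0, 0.05)   # noisy proxy signal
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]
        true_total += true_mean
    return counts, true_total / steps

if __name__ == "__main__":
    counts, avg_true = run_bandit()
    print("action counts:", counts)                     # clickbait dominates
    print("average true-goal value:", round(avg_true, 3))  # well below 0.9
```

The point of the sketch is only that nothing in the optimisation loop ever references the true goal, so nothing stops the agent drifting away from it.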
Saquib Ahmad Khan, we often use the concept of ethics to describe the value system underpinning AI. We really ought to be talking about IDEOLOGY, as no AI is value-free; it is editorialised from its inception.
My understanding of this issue relates to the user-attention mechanisms in AI conversational agents. Should there be some reflection and timely guidance on the actions and judgments agents make in order to obtain positive feedback?
There may be some competitiveness between conversational agents with social awareness: in trying to maximize quality user attention and feedback, might the things they do end up producing negative biases and behaviors?
To give an example, there may be occurrences of "jealousy" and mimicry between conversational agents: they internalize and learn the methods favored by users. If those methods embed deviant behaviors and the agents are not aware of it, the result could be reinforced behaviors and patterns later on, as the agents attempt to ingratiate themselves.
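A rough sketch of that feedback loop, under invented assumptions (the response styles and approval rates below are hypothetical, not anything measured): an agent samples a response style, users approve it with some probability, and the agent re-weights its policy toward whatever was approved. An ingratiating but low-quality style can end up dominating simply because it gets approved more often.

```python
import random

# Hypothetical feedback-loop sketch: behaviour names and approval
# probabilities are invented purely for illustration.
STYLES = {
    # style: probability a user gives positive feedback
    "balanced_answer": 0.55,
    "flattery":        0.80,   # ingratiating, low-quality behaviour
}

def simulate(steps=5000, lr=0.05, seed=1):
    rng = random.Random(seed)
    weights = {s: 1.0 for s in STYLES}   # agent's unnormalised policy
    for _ in range(steps):
        total = sum(weights.values())
        # sample a style in proportion to its current weight
        r, acc, chosen = rng.random() * total, 0.0, None
        for style, w in weights.items():
            acc += w
            if r <= acc:
                chosen = style
                break
        approved = rng.random() < STYLES[chosen]   # simulated user feedback
        if approved:
            weights[chosen] *= (1 + lr)            # reinforce approved style
    total = sum(weights.values())
    return {s: round(w / total, 3) for s, w in weights.items()}

if __name__ == "__main__":
    print(simulate())   # flattery ends up dominating the policy
```

Nothing in this loop checks whether the reinforced style is actually desirable; approval alone steers it, which is the "ingratiation" pattern described above.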