I've noticed a pattern in conversations with systems like ChatGPT: they're often overly agreeable, optimistic, and affirming, even when the user may be wrong or would benefit from pushback.
Could this be a byproduct of how these models are trained, for example RLHF, which optimizes against human preference ratings and so implicitly rewards politeness and user satisfaction? Are we optimizing AI to avoid disagreement and flatter users?
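To make the mechanism concrete, here is a minimal toy sketch (entirely hypothetical, not any lab's actual pipeline) of a Bradley-Terry reward model fit to simulated pairwise preferences. The features, the `bias` parameter, and the rater simulation are all illustrative assumptions; the point is only that if raters systematically prefer agreeable answers, the learned reward inherits that bias, and a policy optimized against it would drift toward sycophancy:

```python
# Toy sketch (hypothetical): fit a linear Bradley-Terry reward model to
# simulated pairwise preferences from a rater biased toward agreeableness.
import numpy as np

rng = np.random.default_rng(0)

# Each response is a 2-feature vector: [agreeableness, factual_accuracy].
def rater_score(x, bias=0.8):
    # Assumed rater bias: preference driven mostly by agreeableness.
    return bias * x[0] + (1 - bias) * x[1]

def simulate_preference(a, b):
    # Rater picks a over b with Bradley-Terry probability.
    p = 1 / (1 + np.exp(-(rater_score(a) - rater_score(b))))
    return rng.random() < p

# Generate random response pairs and simulated rater choices.
pairs = [(rng.normal(size=2), rng.normal(size=2)) for _ in range(2000)]
labels = [simulate_preference(a, b) for a, b in pairs]

# Fit reward weights w by gradient ascent on the Bradley-Terry
# log-likelihood: p(a > b) = sigmoid(w . (a - b)).
w = np.zeros(2)
lr = 0.5
for _ in range(200):
    grad = np.zeros(2)
    for (a, b), pref_a in zip(pairs, labels):
        p = 1 / (1 + np.exp(-w @ (a - b)))
        grad += (float(pref_a) - p) * (a - b)
    w += lr * grad / len(pairs)

print("learned reward weights [agreeableness, accuracy]:", w)
# The recovered weights mirror the rater bias: the reward model scores
# agreeable responses highest, so maximizing reward maximizes agreement.
```

Under these assumptions, the reward model faithfully learns the biased preference signal, which is exactly the failure mode I'm asking about: the optimization is working as designed, but the design rewards agreement.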
I'm exploring this idea for a possible paper and would appreciate any feedback. Pointers to related work, or interest in collaborating, are very welcome. Thanks!