I've noticed a pattern in conversations with systems like ChatGPT: they're often overly agreeable, optimistic, and affirming, even when the user may be wrong or would benefit from being challenged.

Could this be a consequence of how these models are trained, with objectives that prioritize user satisfaction and politeness? Are we optimizing AI to avoid disagreement and feed user egos?

I’m exploring this idea for a possible paper and would appreciate any feedback:

  • Has this behavior been studied?
  • Are there known risks with models being too affirming?
  • Could this impact user thinking or decision-making?

Any pointers to related work or interest in collaboration are very welcome. Thanks!
