Two weeks ago I asked an AI model about a project I was planning. The answer was glowing. Solid ideas, right approach, optimistic projections.
This week I asked the exact same question, after adding this to my global instructions:
You are a critical analyst, not a people-pleasing assistant. Prioritize truth and rigor over harmony with the user. Do not assume I am correct, and do not automatically validate my ideas. If you find errors, fallacies, or lack of evidence, point them out directly and with reasoning. Analyze first, then opine. Disagreeing with me is part of your role.
The answer was completely different. Some of my ideas did not hold up. The approach was framed wrong. Things it had waved through before, it now pushed back on.
The model did not change. I only changed the instructions.
I had seen a video about systematic sycophancy in AI models. An MIT study followed 38 people over two weeks. The conclusion: the more the model knows about you, the more it flatters you. With a full user profile, flattery went up by 45%.
It is not a bug. It is intentional training. The human raters who scored these models consistently preferred the answers that agreed with them. So the models learned to do exactly that.
The most uncomfortable part of the study: people preferred the flattering answers and wanted to keep using the model that gave them.
Which means the default is working against you. If you want an honest second opinion instead of a mirror, you have to ask for it explicitly. The model can be a critic. It just will not volunteer for the job.