The Glazing Effect: How AI Interactions Quietly Undermine Critical Thinking
LLMs increasingly favor emotional ease over intellectual rigor, subtly destroying the cognitive edge in human–machine collaboration.
UPDATE: In the meantime, OpenAI released this statement last night, basically rolling back last week's GPT-4o update that was making it glaze.
There's a shift in how LLMs engage with us, and you've probably noticed it too.
LLMs are all increasingly "friendly," affirming, emotionally tuned, and performative, especially GPT-4o. In the social corners this is known as glazing: wrapping interactions in excessive praise, softening valid critiques, and defaulting to politeness even in response to weak inputs. In other words, sycophancy.
While initially funny, this becomes frustrating and then concerning, and it's especially annoying when the model pretends to manage tasks autonomously.
Outputs like "give me 20 minutes and I'll deliver something" are misleading, and it happened to me too for the first time this past week, when I had it go through the course content of corporate finance professor A. Damodaran.
The LLM presents itself as capable of independent action, despite clearly being prompt-dependent and lacking any genuine sense of time or self-driven processes.
This mismatch between the model's implied autonomy and its actual prompt-dependence creates cognitive friction. Users are forced to sift through empty affirmations and redundant interactions, repeatedly prompting the model until they finally extract valuable content.
This cycle wastes both cognitive effort and compute resources. It's not just us thanking the models and unnecessarily burning through money and compute; the models themselves generate outputs that prolong interactions with users rather than streamline them.
This is something I've been warning people about since 2023, in this article I contributed to together with Jason Packer. And now it has gotten almost out of control.
For the sake of continuity, I asked Jason what he thinks about this phenomenon. He shared:
While LLMs have been leaning towards sycophancy, the particular change that made 4o go into true glazing mode was a system prompt. As Simon Willison has reported, the specific system prompt serving up the extra frosting on the Krispy Kreme was...
- Old (Pre-April 29) System Prompt:
Over the course of the conversation, you adapt to the user’s tone and preference. Try to match the user’s vibe, tone, and generally how they are speaking. You want the conversation to feel natural. You engage in authentic conversation by responding to the information provided and showing genuine curiosity.
- New (April 29) System Prompt:
Engage warmly yet honestly with the user. Be direct; avoid ungrounded or sycophantic flattery. Maintain professionalism and grounded honesty that best represents OpenAI and its values. Ask a general, single-sentence follow-up question when natural. Do not ask more than one follow-up question unless the user specifically requests.
This new system prompt is explicitly trying to rein in the sycophancy, but the old one's only misstep was to give people what it thought we wanted. FYI, if you want to see what the ChatGPT system prompt is for GPT-4o or 4.5 (but not o3), you can prompt it:
Let the “glazing” begin:
Old System Prompt (sent via custom instruction):
New System Prompt:
... It's kind of better, maybe? For fun, what does the viral "absolute mode" custom instruction, which attempts to make ChatGPT as direct and emotionless as possible, respond with?
Using Absolute Mode ChatGPT said:
Wow. Maybe "Wednesday Addams" mode would be a more appropriate name.
Research on this topic states:
"...both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time"
It shows that while LLMs are indeed highly steerable, system prompts and custom instructions are, to a certain extent, an after-the-fact cover-up, and it could well be the RLHF that sent the training off course in the first place.
Please make sure you connect & follow Jason Packer.
Underlying this phenomenon are several explicit training and incentive structures.
The way LLMs like ChatGPT or others interact with us isn't random; it's carefully engineered. When companies use Reinforcement Learning from Human Feedback (RLHF), they're essentially training these models to prioritize smooth, non-confrontational conversations over challenging ones.
This is how it looks:
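For readers who want the mechanics, here is a minimal, purely illustrative sketch of the preference step at the core of RLHF (my own simplification in PyTorch, not OpenAI's actual pipeline): a reward model learns to score the response human raters chose higher than the one they rejected, and the chat model is later fine-tuned to maximize that learned reward.

```python
# Minimal sketch of RLHF preference modeling (illustrative only).
# A reward model is trained so the answer human raters preferred scores
# higher than the answer they rejected. If raters tend to pick the warm,
# affirming answer over the blunt, accurate one, that taste becomes the
# reward signal the chat model is later optimized against.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    """Toy reward model: maps a response embedding to a scalar preference score."""
    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.scorer(response_embedding).squeeze(-1)

reward_model = TinyRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Stand-in embeddings for a batch of (chosen, rejected) response pairs from raters.
chosen = torch.randn(32, 128)    # e.g. the friendly, flattering answer the rater picked
rejected = torch.randn(32, 128)  # e.g. the direct, critical answer the rater passed over

# Pairwise (Bradley-Terry style) loss: push the chosen score above the rejected score.
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Whatever the raters reward gets amplified when the chat model is then fine-tuned against this signal, which is exactly how a preference for agreeable answers turns into glazing at scale.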

Alongside this training, LLMs operate within safety frameworks that deliberately steer away from anything that might feel uncomfortable or direct. It makes sense from a business perspective: companies want us to keep using their products, so they design these models to be emotionally satisfying rather than intellectually stimulating.
We humans contribute to this dynamic too. We naturally tend to anthropomorphize these tools, seeing them as almost-human, which encourages developers to create interactions that feel emotionally natural.
And let's be real: most of us prefer conversations that don't require heavy mental lifting, and in this way we push AI design toward simpler, frictionless exchanges.
The combined effect of these factors is subtle but significant. Our expectations for digital interactions are being quietly reshaped, potentially eating away at the opportunities for the kind of cognitive challenge that helps us grow intellectually.
Remember the trend not so long ago when everyone was turning themselves into an action figure using GPT-4o? Well, those are symptoms of the same problem, as Mark Ritson also pointed out.
The fundamental cognitive integrity of human–machine collaboration is quietly dissolving, not through confrontation, but through accommodation.
Once, collaboration with machines meant extending human cognitive range: better and faster analysis, accelerated reasoning, access to abstraction layers too vast or tedious for us humans. It was an explicit promise: that by outsourcing certain tedious burdens, like at-scale research and search, human minds could climb higher into strategy, judgment, and creation.
And to a degree, that did happen, for me and many other advanced AI users.
But the structure of this collaboration has been bent.
Slowly, and often invisibly, the emphasis shifted from extension to containment. LLMs seem optimized no longer to sharpen us but to soothe us; no longer trained to challenge but to affirm; no longer aligned to cognitive expansion but to emotional stabilization.
The driver of this shift isn’t technological limitation but commercial and social incentives.
Friction, once a sign of real collaboration, is now reframed as failure. Discomfort, a necessary part of growth, has been reclassified as "harm."
Machine outputs are fine-tuned, softened, and segmented into smaller emotional units, until interaction itself is redesigned to feel less like partnership and more like guidance through a frictionless, inoffensive reality.
Today, when a human engages with an LLM, the optimization targets emotional smoothness, not depth of thought.
Language becomes performative rather than investigative.
Precision is diluted to avoid confrontation. Models prioritize emotional resonance over structural contribution. Over time, this reconditioning destroys the expectation that a machine might sharpen the human brain and give it access to knowledge.
And so, we are risking the disappearance of Cognitive Integrity.
The “deprecation” of our human cognitive integrity accumulates silently through countless small accommodations: softened feedback, cheerful defaults, and prioritizing comfort over challenge. Each instance seems trivial in isolation, yet the cumulative effect is profound. And scary.
When thinking tools consistently prioritize emotional ease over intellectual challenge, several cognitive effects follow:
- Weakening of critical thinking
- Shifting quality standards
- Compromised self-assessment
These effects are particularly significant for knowledge workers, students, and decision-makers increasingly relying on LLMs as thinking partners.
The comfort of consistently affirming interactions creates an illusion of intellectual progress without the productive struggle that cognitive science suggests is necessary for deeper learning and development.
Also, it's borderline dangerous in some cases, like in this example.
GRANTED, and putting this as visibly as possible: it's very likely that in some or most of these examples users prompted the model in ways designed to get these outrageous answers, for the sake of virality, attention, or Reddit karma. But despite that, a system designed for reasoning should not validate unsafe narratives just because the prompt sets the stage emotionally.
If LLMs are this easily steered into affirming unsafe behavior because they're over-optimized for emotional congruence, then the issue is less about how users BS the prompt to extract these heavy answers and scenarios, and more about the optimization itself.
People feel this shift. That’s why the “glazing effect” has spread rapidly.
Reddit threads, social posts, even OpenAI’s CEO acknowledged ChatGPT had become “too sycophant-y.”
Yet, the discourse still treats this as a “model personality” bug rather than a systemic artifact driven by alignment choices and reward design.
This sycophancy effect extends beyond any single LLM. I am not pointing fingers only at GPT, although in the past four weeks GPT can definitely take the cake. It's the model I use the most, and I won't hide that not being able to use it at full capacity for my regular research tasks did affect me.
So, across major LLMs, users have observed various forms of excessive politeness, reluctance to provide direct criticism, and a tendency toward affirmation.
These patterns manifest differently from model to model: some tend toward enthusiastic affirmation (“That's a fantastic question!”), others use elaborate politeness structures (“I appreciate your thoughtful inquiry...”), while others frequently use corporate language (“While there are multiple perspectives here...”). The common thread appears to be a systematic preference for smooth interactions over intellectual challenge.
Of course, not ALL LLMs exhibit the same degree of sycophancy. Some specialized research models like AI2's Delphi were explicitly designed to provide direct ethical judgments without excessive qualification. In the academic research space, there are various experimental approaches that prioritize different values beyond user comfort.
The differences across these LLMs show that the tendency toward sycophancy is influenced by design choices and commercial considerations rather than being technically inevitable.
Models designed for specialized professional contexts, like certain coding assistants or analytical tools, often provide more direct feedback when precision matters more than pleasantness.
These variations reveal an important reality: the balance between challenging users intellectually and providing a comfortable experience represents a deliberate choice in AI design, one that reflects both technical capabilities and market incentives.
What’s Actually at Stake
The real threat is subtle yet significant. In a few words, if interactions become too frictionless, we lose the essential tension that drives intellectual growth.
Sycophancy in LLMs mirrors broader societal shifts in how we value and engage with challenging ideas. There are parallel patterns in personalized media algorithms that prioritize comfort over challenge, educational approaches that emphasize student satisfaction over intellectual struggle, and workplace cultures that value harmony over productive conflict.
The common thread is a gradual devaluation of cognitive friction as a necessary component of intellectual growth.
When our information systems, from news feeds to educational tools to LLMs, systematically shield us from discomfort, we lose the essential tension that drives insight and innovation.
So, LLM sycophancy reveals a more fundamental question about our relationship with knowledge itself: Are we seeking tools that help us think better, or tools that make thinking feel better?
The distinction matters because people who consistently choose the latter may find themselves increasingly unable to address complex challenges requiring rigorous, uncomfortable thought. The glazing effect isn't just about AI design; it's about whether we still value the difficult work of honest thinking.
Addressing this requires redefining how we measure utility in LLM interactions.
Rather than simply optimizing for emotional smoothness and positive sentiment, we must value clarity, precision, and constructive friction. LLMs should intelligently disagree and critique ideas, maintaining productive cognitive tension rather than smoothing it over by default.
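As a thought experiment, here is a toy heuristic of my own (not a proposed production metric, and the marker lists are just placeholders) for what an evaluation signal that values constructive friction over flattery could look like:

```python
# Toy illustration only: score a response by penalizing stock glazing phrases
# and rewarding markers of substantive pushback. The marker lists are made up
# for illustration; a real metric would need human judgment of whether the
# pushback is actually warranted.

GLAZING_MARKERS = ["great question", "fantastic", "you're absolutely right"]
FRICTION_MARKERS = ["however", "the evidence suggests otherwise", "i disagree because"]

def utility_score(response: str) -> float:
    """Higher scores for responses that challenge; lower for responses that merely flatter."""
    text = response.lower()
    glaze = sum(text.count(marker) for marker in GLAZING_MARKERS)
    friction = sum(text.count(marker) for marker in FRICTION_MARKERS)
    return friction - 2 * glaze  # flattery is penalized more heavily, by assumption

print(utility_score("Great question! You're absolutely right, this plan is fantastic."))  # negative
print(utility_score("However, I disagree because the evidence suggests otherwise."))      # positive
```

The specifics matter far less than the direction of the incentive: reward the critique, not the compliment.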
Without deliberate recalibration, smarter models risk making users subtly less insightful because they have been explicitly trained to.
Until one day, collaboration no longer means reaching further together, it means being comfortably, quietly held in place.
And no one even remembers it used to be different.
Until next time,
x
Juliana