The Hidden Flaws in How We Evaluate Digital Experiences
We’ve been evaluating digital experiences the wrong way for decades: biased, outdated, and surface-level. Let's change that.
On my mind this week
I came across a post from Jason Fried, Basecamp's CEO, in which he shared how Basecamp saw a 30% increase in conversion after improving onboarding. However, they didn’t know precisely what caused it, and they didn’t care to find out. Their team focused on doing rather than measuring every little thing in isolation. And then it hit me. We spend so much time dissecting each variable, yet sometimes progress comes from looking at the bigger picture, not just isolated optimizations.
It made me think almost immediately about heuristic evaluations.
If you’re unfamiliar with them, a heuristic evaluation is a method for identifying usability issues in an interface. Essentially, an expert (or, ideally, multiple experts) reviews a website or app against a set of usability principles, such as clarity of navigation, error prevention, and user control. The goal is to find friction points that could negatively impact the user experience.
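If you want to picture what one produces, here's a rough sketch in Python (purely illustrative, not any standard tool) of how findings might be recorded, each tied to the heuristic it violates and given a severity rating so multiple evaluators' lists can be merged:

```python
from dataclasses import dataclass

# Hypothetical structure for recording heuristic evaluation findings.
# Heuristic names loosely follow Nielsen's principles; severity runs
# from 0 (not a problem) to 4 (usability catastrophe).

@dataclass
class Finding:
    screen: str      # where the issue was observed
    heuristic: str   # which principle it violates
    severity: int    # 0-4
    note: str        # what the evaluator actually saw

findings = [
    Finding("checkout", "Error prevention", 3,
            "No confirmation before deleting a saved card"),
    Finding("signup", "Visibility of system status", 2,
            "No feedback while the account is being created"),
]

# With multiple evaluators, you'd merge lists like this one and sort by
# severity to decide what's worth turning into an actual experiment.
for f in sorted(findings, key=lambda f: -f.severity):
    print(f"[{f.severity}] {f.screen}: {f.note} ({f.heuristic})")
```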
Sounds useful, right? In theory, yes.
The method and its usability principles date back to the '90s, coined by Jakob Nielsen (the most recent version of the principles was released in 2005 💀).
While the world has moved on, they’re still considered the gold standard. The biggest problem is that they often rely on a single evaluator (which can be wildly biased), surface mostly the obvious issues, and don’t provide the depth of insight needed for meaningful experimentation.
Heuristic evaluations make it seem that optimization is just about tweaking buttons and layouts, which drives me insane. Experience is so much more than just UI; it also includes content, messaging, perception, and how everything fits together to guide a user’s decision-making. When we reduce optimization to interface fixes, we ignore the entire customer journey.

Also, let’s not forget that these evaluations were first developed in a desktop-first world, where user experiences were relatively static, and a narrow set of interface principles was enough to define usability…
Speaking of desktops, can we admit they’ve turned into the "serious stuff" machines? Millennials, Gen X, and Boomers are the primary users now, mainly booking flights or doing some online banking. (And let’s face it, even that feels like a chore. PS: I do it too, lol.)
Meanwhile, mobile has quietly taken over the world. According to the latest State of Mobile 2025 Report, people spent 5.6 hours daily on mobile devices last year, and mobile commerce hit $1 trillion in the U.S. alone. Mobile experimentation is the battlefield where customer experiences are won or lost.
Mobile experimentation is simply an entirely different beast. It’s not "just like web," and anyone who says otherwise hasn’t done it. Mobile apps operate in a walled garden, with unique constraints like app store approvals, release cycles, and limited screen real estate. The stakes are higher, yet hardly anyone is talking about it.
Better data sources matter. If your hypotheses are based on outdated usability checklists or gut feelings, you reinforce biases, not learning.
A meaningful hypothesis is only as good as the data feeding it, and if we’re working off assumptions rather than actual user signals, are we even optimizing?
Beyond the Mean
Hypotheses aren’t just educated guesses but frameworks built on data and observation. They rose to prominence alongside the scientific method, rooted in a need to test and prove ideas in controlled environments.
But what good is a well-structured hypothesis if the data behind it is garbage?
"Crap in, crap out" applies here more than ever. The quality of your data determines the quality of your insights, and yet, many times, we are still using outdated frameworks or surface-level observations to structure our hypotheses. (Most people won’t admit it, but sometimes we do not have a choice, even if we know better)
Here, I’ve got to shout out to Craig Sullivan for his Hypothesis Kit. It’s excellent for structuring hypotheses that are both actionable and data-driven. Craig emphasizes starting with precise observations, identifying the specific problem, and ensuring you have measurable outcomes before you even think about testing. His framework challenges you to be disciplined, ensuring your experiments are rooted in reality, not wishful thinking.
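If it helps to picture it, here's a rough sketch in Python (my own paraphrase of the kit's spirit, not Craig's exact wording, and every number in it is made up) of a template that refuses to run a test until the observation, the change, and the measurable outcome are all filled in:

```python
# Hypothetical template in the spirit of a structured hypothesis kit;
# every value below is invented for illustration.

hypothesis = {
    "observation": "42% of mobile users abandon checkout at the address form "
                   "(analytics), and session replays show repeated field errors",
    "change": "replacing the free-text address form with an address lookup",
    "audience": "mobile checkout visitors",
    "impact": "higher checkout completion",
    "metric": "checkout completion rate",
    "duration": "4 weeks",
}

# The discipline: no empty fields, no test.
missing = [k for k, v in hypothesis.items() if not v.strip()]
assert not missing, f"Not testable yet; missing: {missing}"

print(
    f"Because we saw {hypothesis['observation']}, "
    f"we expect that {hypothesis['change']} for {hypothesis['audience']} "
    f"will cause {hypothesis['impact']}, measured as {hypothesis['metric']} "
    f"over {hypothesis['duration']}."
)
```

The point isn't the code; it's that the structure makes a vague, wishful hypothesis impossible to submit.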
Building on that, insights from scientific literature emphasize that hypotheses serve as bridges between empirical data and theoretical understanding.
A 2008 article in Cell describes hypotheses as "tools for narrowing the infinite possibilities of observation," enabling researchers (and, by extension, marketers) to focus on testable, impactful variables rather than sprawling guesses. This precision is what elevates experimentation from chaotic to strategic.
Good data and diverse sources are essential for making hypotheses meaningful. A single data source or viewpoint is insufficient.
OK, before everyone clutches their pearls and tells me I’m wrong—yes, you can use surveys, analytics, and other data sources to complete the picture.
But let’s not pretend that’s how they’re usually applied.
Too often, they’re run in isolation, without real-world validation, and become an echo chamber of best practices rather than a true exploration of what matters to users.
And don’t get me started on the so-called CRO influencers on LinkedIn pushing best practices as if they were universal truths. Most of these ideas come from companies with massive traffic and mature teams, yet they dismiss anyone who doesn’t follow their playbook. This only shows how removed they are from the real world, where teams make decisions within organizational constraints, balancing priorities, resources, and the reality of how businesses actually operate. (As I said above, sometimes we do not have a choice, and most people won’t admit it.)
Heuristic evaluations are one puzzle piece; they don’t have the sauce to drive the whole strategy singlehandedly, especially when done in isolation, with a single evaluator making the calls and no checks and balances. And we all know how biased a single perspective can be. Anyone who’s spent five minutes on the internet can spot messy navigation or an obvious bug. That doesn’t mean they’re producing insightful observations.
Also, I’m not saying heuristic evaluations are useless. I’ve done plenty of them. However, I’ve always tried to elevate the process beyond basic usability checks. When I deliver these evaluations, I go the extra mile—integrating content evaluation, motivation measurement, and anything else that doesn’t fit neatly into best practices but actually matters to a business.
I don’t just check usability boxes; I look at SEO, discoverability, searchability, time spent on a page, and content consumption, all of which push companies beyond optimizing within a box. I’ll do heuristic evaluations only as part of a bigger strategy, not as some check-the-box CRO approach.
Expanding the data sources for building hypotheses
This brings me to my recent conversation with Talia Wolf. We discussed the limitations of traditional conversion optimization approaches and the need to go deeper. Too often, experimentation is treated as a mechanical process of A/B testing random elements rather than a holistic approach informed by meaningful data. Through her company, GetUplift, Talia's approach to experimentation prioritizes emotional targeting and understanding the motivations behind user behavior rather than just reacting to surface-level metrics.
One of the key takeaways from our conversation was that content is so much more than just blog posts and social media. It’s everything—navigation, structure, images, microcopy, the flow of an experience. Yet, the industry still boxes content into predefined categories for some reason. Instead of relying on limited on-site behavioral data, we should pull from alternative sources: search queries, Reddit discussions, YouTube comments, customer support conversations, and, essentially, any place where people openly express their frustrations and desires.
Talia and I agreed that experimentation needs to break out of the CRO team silo. You're missing out if you only optimize based on traditional data points. A better approach involves integrating social listening, customer feedback, and search intent analysis into your experimentation strategy. These alternative data sources can reveal what users need before landing on your site. Imagine classifying search queries, forum discussions, and social media sentiment to create hypotheses grounded in real customer concerns. Now, that’s experimentation with depth.
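To give you a flavor, here's a rough sketch in Python (simple keyword matching over made-up quotes; a real project would use proper text classification) of how raw user quotes from different channels could be bucketed into recurring themes worth testing:

```python
from collections import Counter

# Hypothetical quotes from search queries, Reddit threads, and support
# tickets; the themes and keywords are illustrative, not a real taxonomy.
quotes = [
    ("search",  "is brand X safe for sensitive skin"),
    ("reddit",  "took three tries to cancel my subscription, ridiculous"),
    ("support", "I can't tell the difference between the pro and basic plans"),
    ("reddit",  "shipping took two weeks and nobody told me"),
]

themes = {
    "trust":        ["safe", "scam", "legit"],
    "cancellation": ["cancel", "unsubscribe", "refund"],
    "clarity":      ["difference", "confusing", "unclear"],
    "fulfilment":   ["shipping", "delivery", "late"],
}

counts = Counter()
for channel, text in quotes:
    for theme, keywords in themes.items():
        if any(k in text.lower() for k in keywords):
            counts[theme] += 1

# The most frequent themes become candidate hypotheses, grounded in what
# people actually say rather than in a usability checklist.
for theme, n in counts.most_common():
    print(f"{theme}: {n} mention(s)")
```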
I recommend watching the recording of our conversation because we discussed these topics in depth. Talia is also one of the best practitioners I know in the CRO space, and she has long warned us about how little emotion factors into how we understand customer journeys.
She is also writing a book, so follow her on LinkedIn and keep an eye out for its release date.
The Future Is Perception
If perception is everything, why are we still measuring it in clicks and conversions? Brands that figure this out now will not only optimize experiences but also shape how their products and services are perceived, setting the narrative rather than reacting to it.
At Monks, my team and I are pioneering ways to measure perception in real-time. We combine social listening, customer intelligence, and search intent analysis. We focus less on tracking clicks or conversions and more on understanding how people feel about a brand. We use that understanding to drive better decisions.
We live in the Experience Economy, where consumers demand immersive and meaningful interactions over material goods. With the global market projected to grow from $5.2 trillion in 2019 to $12.8 trillion by 2028, brands that deliver exceptional experiences are defining the future. Experiences drive loyalty, increase customer spending, and solidify market leadership. This is the new competitive frontier.
Yet most brands still rely on outdated measurement tactics, optimizing based on yesterday’s problems instead of shaping tomorrow’s experiences. More on this in a future edition. ;)
Takeaways
Stop relying on outdated frameworks. Use high-quality, diverse data sources to form hypotheses that actually matter.
Rethink how you do and use heuristic evaluations. They have their place, but without multiple evaluators or diverse perspectives, they risk being biased and surface-level.
Don’t be afraid to go beyond the basics. A real evaluation considers SEO, discoverability, searchability, and content performance, not just usability and the interface.
Tap into social listening, search intent, and customer feedback to build more insightful tests.
Don’t settle for basic metrics. Invest in tools and methods that give you a real-time view of your brand's perception.
What’s on my radar
Have you checked out the latest episode of my podcast, Standard Deviation? Simo Ahava and I return for season 4 with an unhinged episode about fauna and trades. You can listen to it here.
We are recording next with Dave Cain, Head of Digital Marketing at Euro Car Parts. I am excited to talk to Dave because, besides having the privilege of working with him, he is one of the best people I know in performance marketing. As a fellow blues lover, I also look forward to hearing more about his guitar collection. 🎸

Craig Sullivan is developing a new CRO and experimentation Agency Directory. This industry-wide resource aims to map the landscape of agencies working in this space. Created as a community initiative and backed by Convert.com, it’s designed to help agencies gain visibility, influence industry research, and contribute to a broader report on the state of experimentation. If you're in the space, now’s the time to participate. Learn more here:
https://directory.paperform.co
https://aapresearch.paperform.co
Keynoat had a major update! Bhav Patel, the founder of CRAP Talks, has built a platform where event organizers can find a directory of some of the best speakers in the industry. With a fresh redesign and new features, Keynoat makes it easier for speakers to showcase their expertise and for event organizers to discover diverse talent. If you’re a speaker, you can create your profile and share it with your network to get in front of more event organizers.
Thank you
I hope you liked this week’s newsletter. I am grateful to all 174 of you who have subscribed so far. Also, a big shout-out to Milo, Josh, Marin, and Liridion for pledging to pay for my newsletter. I cannot explain how much this means to me. I won’t turn on payments yet, but knowing you appreciate my work enough to pay for it motivates me to deliver only the best content in this newsletter.
PS: OK, maybe heuristic evaluations are just a symptom. What’s really on my mind is how we limit innovation by clinging to outdated methodologies instead of pushing for deeper, more meaningful insights.
Thank you for reading this far,
See you next week!
x
Juliana