What metrics and KPIs do the experts use to measure UX effectiveness?

We take a comprehensive journey into the world of UX metrics, exploring both behavioural and attitudinal measurements.

What’s the difference between UX metrics and KPIs?

UX metrics are a set of quantitative data points used to measure, compare, and track the user experience of a website or app over time. They are vitally important for ensuring UX design decisions are made and evaluated using objective evidence rather than opinion.

KPIs (key performance indicators) reflect the overall goals of your business – such as revenue growth, retention, or increased user numbers. Metrics are all the measurements that go towards quantifying these higher goals.

So when you’re running any kind of UX research, such as UX Benchmarking, it’s important to choose metrics that reflect your objectives and the overall KPIs of your business.

But which metrics are the most valuable? What should you be measuring? NPS? AOV? TPI? SUS? CUS? Come see how many of these abbreviations I invented myself in this investigation into how the experts measure UX.

My huge thanks to Kuldeep Kelkar, our VP of Consulting and Professional Services, who provided valuable insight for many of these metrics.

Why do we need to measure UX?

It’s all well and good for us to sit in our ivory tower, yelling out of the window to passers-by about how great UX research is, but this can only get us so far.

Occasionally some of those people look up and yell back, “yeah I know! It just makes common sense to make design decisions based on actual human behavior” but then they often make the following point…

“But how can we measure that? If we run usability testing and make a change to a website that presumably improves the user experience based on our observations, how do we really know the change has worked? What UX metrics can we use to measure success? How do we prove to our bosses that the investment is worth it?”

It’s normally around the ‘metrics’ mark where we start to close the window and mumble something about “having to keep it shut because of the air-con, sorry I can’t hear you.”

Metrics have traditionally been a difficult subject when it comes to measuring the success, failure or shrugging indifference of your UX. Every other discipline has it made!

  • You want to measure how well your blog post did: Look at your traffic, see the time on page, notice how many times it has been shared, judge the quantity and quality of comments.
  • You want to measure your social channels: Look how many followers you have. Is there growth? Are they influential in your niche? Do they comment? Do they share? Are they entirely bots?
  • You want to measure the changes made to the order of the categories in your main menu: Well it definitely looks better to you! Have you run some more usability testing to see if people are still struggling with it? Perhaps traffic from the homepage to those categories has improved, but there’s no guarantee it’s because of the changes.
  • You want to measure the quality of your Bakewell Tart: Did I eat the whole damn thing? Probably, but that’s not a testament to how good it is. I’ll often eat an entire Bakewell Tart with the same ease that I take a breath. Did I demand that you make me another one? Yes! Now that’s a quality Bakewell Tart.

As we already know, data only shows part of the story. Google Analytics can tell you what’s happening but not why it’s happening. If you’re only going by analytics, you’re essentially guessing. Sure, it can be an educated, highly informed guess – but you won’t know exactly why things are happening on your site until you see real people using it.

But UX measurement doesn’t have to be an intangible mystery. As you’ll see below, there are many ways to prove the value of UX research.

What’s the difference between behavioral and attitudinal UX metrics?

We work with various companies across all industries and have noticed certain metrics that are most commonly used for benchmarking (either over a period of time or compared against competitors). We broadly divide them into two categories:

Behavior (what they do)

In the user research world, it’s critical to understand what people are doing and how they are using your products. Task-based usability testing is the industry-standard method for gathering this information. We don’t mean just ‘in-lab’ think-out-loud studies, but also remote unmoderated studies, which let you reach larger sample sizes efficiently.

Typical metrics you could capture include these task-level behavioral measurements:

  • Abandonment Rate
  • Pageviews
  • Problems and Frustrations
  • Task Success
  • Task Time

Attitude (what they say)

How users feel, what they say before, during or after using a product, and how this affects brand perception.

To measure this, you might want to capture these attitudinal metrics:

  • Loyalty (using scores such as NPS – more on this further down)
  • Usability (or ease of use, using scores such as SUS)
  • Credibility (taking things like trust, value and consideration into account)
  • Appearance (“oooooh pretty!” or “OW MY EYES!!!” etc.)

But how do you quantify opinion? How do you take these “oooooh pretty” or “OW MY EYES!!!” hot takes and turn them into a simple score that any busy executive can understand?

Let’s take a deeper dive into these individual metrics, and we’ll see how they can help to form a bigger picture.

For an in-depth guide to measuring UX and proving the value of research, download our free ebook on running both longitudinal and competitive benchmarking.

Behavioural UX metrics

Abandonment Rate

Quite simply, this is how many people come to your online retail store, put a bunch of products in their basket, and then leave without checking out. Behaviour that would make a real-world IKEA visit a treacherous assault course. The abandonment rate is the ratio of the number of abandoned shopping carts to the number of initiated transactions.
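If you want to calculate this yourself from raw counts, here is a minimal Python sketch. It assumes you can count initiated carts and completed checkouts for the same period, and that every completed order started as a cart; the function name and the figures are hypothetical.

```python
def cart_abandonment_rate(carts_initiated: int, checkouts_completed: int) -> float:
    """Ratio of abandoned carts to initiated carts (0.0 to 1.0).

    Assumes every completed checkout began as an initiated cart.
    """
    if carts_initiated <= 0:
        raise ValueError("carts_initiated must be positive")
    if not 0 <= checkouts_completed <= carts_initiated:
        raise ValueError("checkouts_completed must be between 0 and carts_initiated")
    abandoned = carts_initiated - checkouts_completed
    return abandoned / carts_initiated


# Hypothetical month: 1,200 carts started, 300 of them checked out.
print(f"{cart_abandonment_rate(1200, 300):.0%}")  # 75%
```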

AOV: Average Order Value

AOV means average order value, and this is simply your total revenue divided by the number of checkouts. According to VWO, this is a “direct indicator of what’s happening on the profits front.” If your UX efforts directly tie into increasing cross-selling or upselling, then AOV can indicate whether you’ve improved things or not.
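To make the formula concrete, here is a small Python sketch comparing AOV before and after a hypothetical upsell change. The revenue and checkout figures are invented for illustration; in practice you would pull them from your ecommerce analytics for comparable periods.

```python
def average_order_value(total_revenue: float, checkouts: int) -> float:
    """AOV = total revenue / number of completed checkouts."""
    if checkouts <= 0:
        raise ValueError("checkouts must be positive")
    return total_revenue / checkouts


# Hypothetical figures for two comparable periods, before and after
# a redesign of a cross-sell module.
before = average_order_value(total_revenue=12_000.0, checkouts=300)
after = average_order_value(total_revenue=14_100.0, checkouts=300)
print(f"AOV before: {before:.2f}, after: {after:.2f}")  # AOV before: 40.00, after: 47.00
```

A higher AOV after a UX change is suggestive rather than conclusive, since pricing, promotions or seasonality can move it too.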

Conversions

Helpful if there’s a specific thing triggered by a UX improvement. Say for instance a web-form completion, newsletter sign-up or some other task completion. If the site change directly impacts how many people are converting in that specific task, and you can measure that accurately, then you can be *fairly* confident you made an impact.

Just remember that having a higher conversion count may also be a result of marketing efforts, so be sure to measure the conversion rate (typically Number of Sales / Number of Visits).
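The difference between a conversion count and a conversion rate is easy to see with numbers. In this hypothetical Python sketch, a marketing campaign doubles traffic in the same period that a redesign ships: conversions go up, but the rate actually falls, which is the signal worth investigating.

```python
def conversion_rate(conversions: int, visits: int) -> float:
    """Conversion rate = conversions / visits (0.0 to 1.0)."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    return conversions / visits


# Hypothetical periods: traffic doubles thanks to a campaign.
before_count, before_visits = 200, 10_000
after_count, after_visits = 300, 20_000

print(f"Conversions: {before_count} -> {after_count}")
print(f"Rate: {conversion_rate(before_count, before_visits):.1%} -> "
      f"{conversion_rate(after_count, after_visits):.1%}")
```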

As NN/g suggests:

“The conversion rate measures what happens once people are on your website. Thus it’s greatly impacted by the design and it’s a key parameter to track for assessing whether your UX strategy is working.”

And because we like to argue both sides of the… uh… argument… here’s ecommerce whiz kid Dan Barker on why you shouldn’t necessarily trust conversion rate as the solution to all your problems. Remember that not all visitors to your website have the potential to convert, and that conversion rates vary wildly by visitor type.

Pageviews

Website pageviews and clicks are common metrics. For mobile apps, web applications or single-page web apps, you can measure some combination of clicks, taps, and the number of screens or steps.

If you are running an in-lab study, counting these can be extremely tedious. But if you are using a user research platform like ours, most of these metrics are captured automatically, which significantly reduces analysis and reporting time. In most cases, it is beneficial to combine these metrics with, or at least connect them to, analytics data from the live site or app.

Problems & Frustrations

These can be measured as the number of unique problems identified and/or the number (or %) of participants who encounter a given problem. We recommend running think-out-loud studies to identify problems, and then quantifying them via a large-sample study to find the % of the wider population that actually encounters each problem (with confidence intervals).
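Once problems have been logged per participant, both measures fall out of a simple count. This Python sketch assumes a hypothetical data shape (participant ID mapped to the set of problem IDs they hit); using sets means a participant who hits the same problem twice is only counted once.

```python
from collections import Counter

# Hypothetical observations: participant ID -> set of problem IDs encountered.
observations = {
    "p1": {"search-filter", "promo-code"},
    "p2": {"search-filter"},
    "p3": set(),
    "p4": {"promo-code", "shipping-cost"},
}


def problem_frequencies(obs: dict[str, set[str]]) -> dict[str, float]:
    """Share of participants (0.0 to 1.0) who encountered each unique problem."""
    counts = Counter(problem for problems in obs.values() for problem in problems)
    n = len(obs)
    return {problem: hits / n for problem, hits in counts.items()}


freqs = problem_frequencies(observations)
print(len(freqs))               # 3 unique problems
print(freqs["search-filter"])   # 0.5
```

Each frequency is a proportion, so in a large-sample study you can put a confidence interval around it in the same way as a task success rate.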

Most of these behavioral metrics are collected per task and then averaged for a given study and/or digital product. The averages are then compared over time (e.g. each quarter) or against competitors’ digital products.
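As a rough illustration of that roll-up, here is a Python sketch that averages per-task success rates into a single study-level figure and compares two quarterly rounds task by task. The task names and rates are hypothetical, and the simple unweighted mean is an assumption; you might weight tasks by importance instead.

```python
from statistics import mean

# Hypothetical per-task success rates from two quarterly benchmark rounds.
q1 = {"find-product": 0.70, "checkout": 0.60, "track-order": 0.80}
q2 = {"find-product": 0.80, "checkout": 0.75, "track-order": 0.80}


def study_average(per_task: dict[str, float]) -> float:
    """Unweighted mean of a per-task metric across all tasks in a study."""
    return mean(per_task.values())


def change_by_task(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Per-task difference for tasks present in both rounds."""
    return {task: new[task] - old[task] for task in old if task in new}


print(f"Q1 average: {study_average(q1):.1%}")
print(f"Q2 average: {study_average(q2):.1%}")
for task, delta in change_by_task(q1, q2).items():
    print(f"{task}: {delta:+.0%}")
```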

Task Success

Typically, a group of representative users is given a set of realistic tasks with a clear definition of task success. Examples of task success could be reaching a specific page in a checkout flow, finding the right answer on a marketing website, or reaching a step in a mobile app. Having a clear definition of success and/or failure is critical.

If eight out of 10 users completed the task successfully and two failed, then Task Success would be 80%. Because of the small sample size of 10, the Margin of Error at a 95% Confidence Level would be about ±25 percentage points. This means that we are 95% confident that the Task Success rate falls somewhere between 55% and 100% (the upper bound is capped at 100%).

But if 80 out of 100 users completed the given task successfully, then the Task Success rate would still be 80%, but with a Margin of Error of about 8 percentage points. Generally speaking, this means that we are 95% confident that the Task Success rate falls somewhere between 72% and 88%. The larger the sample size, the smaller the Margin of Error.
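If you want to check these numbers, the simplest approach is the normal-approximation (Wald) margin of error, z * sqrt(p * (1 - p) / n). In this Python sketch, the figures of about ±25 points for 10 users and about ±8 points for 100 users come out at 95% confidence; at 90% the same formula gives roughly ±21 and ±7. For small samples many practitioners prefer the adjusted Wald interval, so treat this as a back-of-the-envelope check. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wald_margin_of_error(successes: int, n: int, confidence: float = 0.95) -> float:
    """Normal-approximation (Wald) margin of error for a success rate."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sqrt(p * (1 - p) / n)


def wald_interval(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Wald interval, clamped to the possible 0-100% range."""
    p = successes / n
    moe = wald_margin_of_error(successes, n, confidence)
    return max(0.0, p - moe), min(1.0, p + moe)


for successes, n in [(8, 10), (80, 100)]:
    for confidence in (0.90, 0.95):
        low, high = wald_interval(successes, n, confidence)
        moe = wald_margin_of_error(successes, n, confidence)
        print(f"{successes}/{n} at {confidence:.0%}: ±{moe:.0%}, {low:.0%} to {high:.0%}")
```

At 95% confidence this reproduces the 55% to 100% and 72% to 88% ranges for the two sample sizes.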

Task Time

Usually an absolute duration, for example 3 minutes. For most task-based studies, where the user’s goal is to get something done as efficiently as possible, shorter task times are better. There are exceptions, though: if the goal is to keep the user engaged, such as staying on Facebook’s News Feed, then longer task times could be better. It really depends on what the task is. Even on Facebook’s News Feed, if the goal is to find a specific event, shorter task times might be a better outcome.

Organizations can look at the average task time either for successful users only or for all users.
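The two views can differ a lot, because failed attempts often drag on. This Python sketch uses hypothetical timings for a single task. It also reports a geometric mean, which some practitioners prefer for task times because they are usually right-skewed; that is an optional extra, not part of the method described here.

```python
from statistics import geometric_mean, mean

# Hypothetical (seconds, task_succeeded) pairs for one task.
attempts = [(95, True), (120, True), (80, True), (300, False), (110, True), (240, False)]


def task_times(rows: list[tuple[int, bool]], successful_only: bool) -> list[int]:
    """Select times for successful users only, or for everyone."""
    return [seconds for seconds, ok in rows if ok or not successful_only]


for label, only_ok in [("successful users", True), ("all users", False)]:
    times = task_times(attempts, only_ok)
    print(f"{label}: mean {mean(times):.1f}s, geometric mean {geometric_mean(times):.1f}s")
```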

Attitudinal UX metrics

Attitudinal metrics are where we ‘quantify’ qualitative data, such as appearance, loyalty, trust and usability. There are many different ‘scores’ on the market that will assign a number to attitudinal data, using various methods. Here’s an overview of the main ones…

CSAT: Customer Satisfaction Score

This measures customer satisfaction but, unlike NPS, it isn’t limited to a single question: you can ask anything from one question to a full-length survey. Results are expressed as a percentage.

Pro: unlimited customization. Con: the people who actually take the time to fill in a full-length survey are likely to be those who either love or hate your product.
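There are several ways to turn answers into a percentage; one common convention is to ask a 1 to 5 satisfaction question and report the share of respondents who choose 4 or 5. Here is a Python sketch of that convention. The threshold is a parameter because teams define “satisfied” differently, and the ratings are hypothetical.

```python
def csat_percent(ratings: list[int], satisfied_threshold: int = 4) -> float:
    """Percentage of 1-5 ratings at or above the 'satisfied' threshold."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must be on a 1-5 scale")
    satisfied = sum(1 for r in ratings if r >= satisfied_threshold)
    return 100 * satisfied / len(ratings)


# Hypothetical answers to "How satisfied were you with your visit today?"
print(csat_percent([5, 4, 3, 5, 2, 4, 5, 1]))  # 62.5
```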

NPS: Net Promoter Score

Net Promoter Score (NPS) is a survey you can include at the end of your UX tests. NPS helps you measure loyalty based on one direct question: How likely is it that you would recommend this company/product/service/experience to a friend or colleague?

Here’s how NPS works:

  • Those who respond with a score of 9 or 10 are called ‘promoters’. Loyal enthusiasts who recommend your services, products or brand to other people and will continue to buy from you in the future.
  • Those who respond with a score of 7 or 8 are called ‘passives’. They are happy with your service but have no real loyalty to you, so they may easily stray.
  • Finally there are the ‘detractors’, customers who responded with a score of 0 to 6. These are unhappy people who don’t want to see your product ever again.

The final NPS is then calculated by subtracting the percentage of customers who are detractors from the percentage of customers who are promoters: % Promoters – % Detractors = NPS. The result ranges from -100 to +100.
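Here is that calculation as a short Python sketch, using hypothetical 0 to 10 answers. Passives count towards the total number of respondents but not towards either group, which is why they still pull the score towards zero.

```python
def net_promoter_score(scores: list[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6), from -100 to +100."""
    if not scores:
        raise ValueError("need at least one response")
    if any(not 0 <= s <= 10 for s in scores):
        raise ValueError("scores must be between 0 and 10")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    # Passives (7-8) are only counted in the denominator.
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical answers: 5 promoters, 2 passives, 3 detractors.
print(net_promoter_score([10, 9, 9, 8, 7, 6, 3, 10, 5, 9]))  # 20.0
```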

SUPR-Q: Standardized User Experience Percentile Rank Questionnaire

This is an 8-item questionnaire for measuring the quality of the website user experience, providing measures of usability, credibility, loyalty and appearance. You can read more about SUPR-Q at www.suprq.com

SUS: System Usability Scale

Watch me go this whole section without saying how I’m going to ‘suss this out’. You’ll be so proud of me…

After a website usability test, users complete a short ten-item questionnaire, and a score is derived from their answers. Each item is rated on a Likert scale, which helps to ascribe a quantitative value to qualitative opinions.

Here are some of the statements users respond to, by choosing an option on a scale from strongly disagree to strongly agree:

  • I think that I would like to use this website frequently
  • I found the website unnecessarily complex
  • I thought the website was easy to use

The benefits of this measurement are that it’s very easy to administer, can be used with a small sample size and can clearly indicate whether a feature has improved or not. However, bear in mind that the scoring system is somewhat complex (the final 0 to 100 score is not a percentage), and it won’t tell you what’s wrong with your site; it merely classifies its ease of use.
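The standard SUS scoring rule is less daunting in code. The full questionnaire has ten statements that alternate between positive (odd-numbered) and negative (even-numbered) wording, each answered from 1 (strongly disagree) to 5 (strongly agree). Odd items contribute the response minus 1, even items contribute 5 minus the response, and the total is multiplied by 2.5. This Python sketch applies that rule to one participant’s hypothetical answers.

```python
def sus_score(responses: list[int]) -> float:
    """System Usability Scale score (0-100) for one participant's ten answers."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten responses")
    if any(not 1 <= r <= 5 for r in responses):
        raise ValueError("responses must be on a 1-5 scale")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # odd items: positively worded
        else:
            total += 5 - response   # even items: negatively worded
    return total * 2.5


# Hypothetical answers to items 1-10 from one participant.
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # 80.0
```

A study-level SUS is usually the mean of the participants’ scores. Note that 80 here is a score on a 0 to 100 scale, not 80%.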

TPI: Task Performance Indicator

Gerry McGovern gives an extensive breakdown of the method his team developed “to measure the impact of changes on customer experience.” With TPI you ask 10–12 ‘task questions’ created specifically for the ‘top tasks’ you want to test. These need to be repeatable, as they’ll be asked again when you rerun the test in 6–12 months’ time.

For each task, the user is presented with a task question via live chat. Once they have completed the task, they answer the question, and are then asked how confident they are in their answer. The theory is that if a task has a TPI score of 40 (out of 100), it has major issues. If you measure again in six months and nothing has changed, the task should again score a TPI of 40.

Is there just one single UX metric that can make my life easier?

At UserZoom we have our own single UX metric score, called the QXscore. This is a “quality of experience” score that combines various measurements, collecting both behavioural data (such as task success, task time, page views) and attitudinal data (such as ease of use, trust and appearance) – the purpose of this is to create a single benchmarking score for your product.

This single UX score is a simple, clear and persuasive tool for communicating user research results to stakeholders, and should help with getting future buy-in.

Final thoughts

I haven’t even remotely covered every possible UX metric here, because frankly that would take all week. What I have discovered, however, is that UXers have a broad range of measurements to rely on, blending user rating systems with qualitative feedback from usability testing.

It also depends on your own company goals, and what results your various stakeholders wish to see. The key is being clear on what is being measured and why.

Now to clear my throat, throw open the window and start bothering the neighbours again.