Data Cleaning 101: A User Researcher’s Guide
So much more than Clorox wiping your keyboard!
As part of our UX 101 education series, where we discuss the different types of studies and research methodologies you can use with our own user research platform, we’d like to introduce our readers to the exciting world of data cleaning.
Now you may be thinking that this isn’t a type of study or a research methodology, and you would be spot on. However, it’s such an integral part of the research process that we went ahead and included this in our 101 Researcher’s Guide collection as it applies to all data, regardless of whether you got it from a card sort or a remote moderated usability study.
So without further ado, let’s dive in!
Why clean your data?
Data cleaning is an important and necessary step in the research process because missing or incorrect data, or data from the wrong people, can impact the reliability and validity of your insights.
We work hard day in and day out to gather these insights, and so many users and customers depend on the experiences that are born from these labors, so the very last thing you want after finishing a research study is to have bad data!
Monitoring and cleaning your data during recruitment helps you catch problems while there is still time to fix them, and it also minimizes the effort needed to prepare your data for analysis.
Accessing your data
Step one is accessing your data. If you’ve found your way to this article by happenstance and aren’t a UserZoom customer, please keep in mind that we will be using the UserZoom platform as our example; the ideas behind cleaning your data, however, will be the same. Alright, onto the data!
There are two main ways to access your data in the UserZoom Manager: exporting the raw data and viewing the Results tab. Researchers often use both as part of their data cleaning process.
Exporting the raw data
Exporting your raw data files can be done in just a few easy steps.
Step 1: Under the Results tab, select Exports & Options
Open your study, and from the monitor page (or any page for that matter) simply click on the Results tab at the top of the screen. From there, click on the Exports & Options area on the left-hand side of the page.
Step 2: Select your format
Clicking the New Export button on the right side of the screen will open the following export choices.
For the purposes of cleaning your data, we recommend looking at the raw data on a per-participant level, and the easiest way to see this is by selecting either Excel or SPSS as your format.
While you’re on this page you’ll notice that you have a few more options available to you under the “Type” section. We would like to quickly point out two options to you.
Raw Data: Numerical Values
The important thing to note about selecting Raw Data: Numerical Values is that the UserZoom platform will automatically separate quantitative and qualitative responses onto different tabs. This also includes a codebook so that you can match your responses with your prompts.
Actual Answer Text in Condensed or Expanded Columns
If you select either one of these choices, then all your data will be included in one sheet. This includes all of your prompts and responses in text, meaning no codebook is required to match up responses with prompts.
Step 3: Click Export
Once you have finished making your format selections simply click the Export button and you’ll have your raw data in no time.
Accessing open-ended question results and videos
You can also access participant-level data in the UserZoom Manager under the Results tab. For the purpose of this article, we will primarily be focusing on open-ended questions and videos, if you have them in your study.
Find participant level responses for open-ended questions under the Ratio & Responses tab. Hint: You’re looking for gibberish answers, such as “asdfjkl;” to remove.
You will be able to see individual responses to the open-ended prompt in the box to the right.
Find participant level task and completion times under the Videos tab. Hint: You’re looking for completion times that seem fishy or out of place as well as any errors or abandons you want to watch later.
If a majority of users are successful but you notice an abandon, for example, or someone whose duration is wildly out of sync with the average, these are clues that you may want to watch the videos later to double-check what is happening. More on this in the next section.
Data cleaning methods
Alright! Now that you have all of the data that you’re interested in cleaning up in one form or another it’s time to don some metaphorical plastic gloves and get to work.
To kick things off, here’s a quick guide to how we recommend you approach cleaning your data by study type and what to look for.
Now let’s cover some data cleaning methods by a few different metrics and categories as well as red flags to keep your eyes open for.
Look over your open-ended questions to identify poor or nonsense responses (e.g. gibberish such as “asdfjkl;”, or single word answers like “yes” or “good” to your open-ended questions asking for more in depth answers).
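If you are working from an exported sheet rather than the Manager, a quick script can build a first-pass review list. Here is a minimal sketch, assuming you have already pulled each participant’s answer into a dictionary. The participant IDs, the three-word minimum, and the vowel-ratio cutoff are all illustrative assumptions, not UserZoom features, and anything flagged should still be read by a human before exclusion.

```python
import re

VOWELS = set("aeiou")

def flag_open_ended(answer, min_words=3, min_vowel_ratio=0.2):
    """Return a reason string if the answer looks low quality, else None.

    Heuristic only: min_words and min_vowel_ratio are assumed cutoffs.
    """
    words = re.findall(r"[a-z]+", answer.lower())
    letters = "".join(words)
    if not letters:
        return "empty"
    # Keyboard mashing such as "asdfjkl;" tends to contain very few vowels.
    vowel_ratio = sum(ch in VOWELS for ch in letters) / len(letters)
    if vowel_ratio < min_vowel_ratio:
        return "possible gibberish"
    if len(words) < min_words:
        return "too short"
    return None

# Hypothetical open-ended responses keyed by participant ID.
responses = {
    "UZ_101": "asdfjkl;",
    "UZ_102": "good",
    "UZ_103": "I could not find the checkout button on the cart page.",
}

flagged = {}
for pid, text in responses.items():
    reason = flag_open_ended(text)
    if reason is not None:
        flagged[pid] = reason

print(flagged)
```

A short answer is not always a bad answer, so treat the output as a list of responses to look at, not a list of participants to remove.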
You can use completion time as an indicator that something warrants further review. Low completion times may indicate a participant who is speeding through without following the directive. Longer task times may indicate a participant who went off task.
However, it is important to always review the video before excluding a participant based on task time as some participants are truly faster than others and longer task times can indicate a participant was truly encountering struggles during the task.
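One way to decide which videos to watch first is to compare each participant’s task time against the median. The sketch below assumes task durations in seconds keyed by hypothetical participant IDs. The cutoffs (under a third of the median, or more than three times the median) are arbitrary starting points that you would tune per study.

```python
from statistics import median

def flag_task_times(times_by_participant, low_ratio=0.33, high_ratio=3.0):
    """Flag durations far from the median as candidates for video review."""
    mid = median(times_by_participant.values())
    flags = {}
    for pid, seconds in times_by_participant.items():
        if seconds < mid * low_ratio:
            flags[pid] = "fast: possible speeding"
        elif seconds > mid * high_ratio:
            flags[pid] = "slow: possibly off task or struggling"
    return flags

# Hypothetical task durations in seconds.
times = {"UZ_101": 95, "UZ_102": 110, "UZ_103": 12, "UZ_104": 88, "UZ_105": 640}

print(flag_task_times(times))
```

The median is used here instead of the mean because a single very long session would otherwise drag the average up and hide the speeders. Either way, the output is a watch list, and the video has the final say.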
Remember that one kid in high school who used to answer all multiple choice questions with C? This is like that. Use rating scales to identify participants who give the same rating for every question, especially if reverse-worded survey questions were included in the study.
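Spotting these participants in a spreadsheet is tedious by eye but simple in code. This is a minimal sketch, assuming each participant’s rating-scale answers are available as a list of numbers in question order; the IDs and the three-item minimum are illustrative assumptions.

```python
def is_straightliner(ratings, min_items=3):
    """True when a participant gave the identical rating to every item."""
    return len(ratings) >= min_items and len(set(ratings)) == 1

# Hypothetical 5-point rating answers, in question order.
ratings_by_participant = {
    "UZ_101": [4, 5, 2, 4, 3],
    "UZ_102": [3, 3, 3, 3, 3],
    "UZ_103": [5, 5],  # too few items to judge
}

straightliners = [
    pid for pid, ratings in ratings_by_participant.items()
    if is_straightliner(ratings)
]

print(straightliners)
```

If the scale mixes regular and reverse-worded items, identical raw answers are even more suspicious, because agreeing with both versions of a statement is contradictory.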
Occasionally, researchers will include repeated questions in a survey. These are often intended to help identify respondents who provide different responses to the same question.
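If your study includes repeated questions, the check can be sketched as follows. The question IDs and the idea of listing repeated questions as pairs are assumptions made for illustration. The `tolerance` parameter defaults to an exact match, though on rating scales you might choose to allow a difference of one point.

```python
def inconsistent_pairs(answers, repeated_pairs, tolerance=0):
    """Return the repeated-question pairs whose numeric answers disagree."""
    return [
        (first, second)
        for first, second in repeated_pairs
        if abs(answers[first] - answers[second]) > tolerance
    ]

# Hypothetical setup: Q12 repeats Q3, and Q14 repeats Q5.
repeated = [("Q3", "Q12"), ("Q5", "Q14")]
participant_answers = {"Q3": 5, "Q12": 1, "Q5": 4, "Q14": 4}

print(inconsistent_pairs(participant_answers, repeated))
```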
Number of errors:
Too many errors may indicate low engagement and can be used to identify those who are speeding through without following the directive. Set a threshold for the number of errors allowed and exclude those who hit or exceed the threshold. For example, if there are 10 tree tests in a study, exclude participants who have 5 or more errors altogether.
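The threshold rule from the example above (10 tree tests, exclude at 5 or more errors) can be sketched like this. The outcome labels "success" and "error" and the participant IDs are placeholders, so map them to whatever your export actually contains.

```python
def over_error_threshold(outcomes_by_participant, max_errors=5):
    """Return participants whose error count hits or exceeds the threshold."""
    flagged = []
    for pid, outcomes in outcomes_by_participant.items():
        errors = sum(1 for outcome in outcomes if outcome == "error")
        if errors >= max_errors:
            flagged.append(pid)
    return flagged

# Hypothetical outcomes for a study with 10 tree tests.
results = {
    "UZ_101": ["success"] * 9 + ["error"],
    "UZ_102": ["error"] * 6 + ["success"] * 4,
    "UZ_103": ["success"] * 5 + ["error"] * 5,
}

print(over_error_threshold(results))
```

Note that the comparison is "hit or exceed", so a participant with exactly 5 errors is flagged.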
Quick review of videos (2x or 3x speed):
Quickly review videos in a task-based study by changing the speed of the video to 2x or 3x, or by gently using the slider to move throughout the video.
An abundance of nonsense selections (obviously wrong selections) may indicate low engagement and can help identify those who are speeding through without following the directive.
For example – let’s say your study has 10 tree tests, all of which involve streaming a movie, and you have identified a participant who consistently selected categories other than streaming (such as DVD). Set a threshold for a number of these kinds of errors (e.g., 3 or more nonsense errors) and exclude those who hit or exceed the threshold.
Placement of all cards into a single category:
Keep an eye out for card sort respondents who place all of the cards into a single category or group. This may indicate low engagement and can help identify those who are speeding through without following the directive.
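A minimal sketch of this check, assuming each participant’s sort is available as a mapping from card name to the category they chose. The card names, category names, and IDs below are invented for illustration.

```python
def single_category_sorters(sorts):
    """Return participants who placed every card into one category."""
    flagged = []
    for pid, placement in sorts.items():
        categories_used = set(placement.values())
        # Require more than one card so a one-card sort is not flagged.
        if len(placement) > 1 and len(categories_used) == 1:
            flagged.append(pid)
    return flagged

# Hypothetical card sort results: card -> chosen category.
sorts = {
    "UZ_101": {"Returns": "Help", "Shipping": "Help", "Wishlist": "Account"},
    "UZ_102": {"Returns": "Misc", "Shipping": "Misc", "Wishlist": "Misc"},
}

print(single_category_sorters(sorts))
```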
Clicking on outside areas in Screenshot Click Tests:
Keep an eye out for participants who consistently click in ‘outside areas’ during screenshot click tests (you know, those areas with no CTAs, no links, not even an image). Too many of these ‘outside’ errors may indicate low effort or lack of engagement. Set a threshold for the number of outside areas allowed and exclude those who hit or exceed the threshold.
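If you have click coordinates to work with, counting outside clicks is a simple geometry check. The sketch below assumes clicks as (x, y) pixel pairs and meaningful areas as (left, top, right, bottom) rectangles. Both formats, the sample regions, and the threshold of 3 are assumptions for illustration and may not match what your export provides.

```python
def is_outside(click, regions):
    """True when the click lands in none of the meaningful regions."""
    x, y = click
    return not any(
        left <= x <= right and top <= y <= bottom
        for left, top, right, bottom in regions
    )

def outside_click_count(clicks, regions):
    """Count how many clicks fall outside every region."""
    return sum(is_outside(click, regions) for click in clicks)

# Hypothetical regions for a CTA and a link block, in pixels.
regions = [(40, 20, 200, 60), (300, 400, 460, 450)]

# Hypothetical clicks per participant across several click tests.
clicks_by_participant = {
    "UZ_101": [(50, 30), (310, 410), (120, 40)],
    "UZ_102": [(5, 5), (250, 250), (600, 700), (45, 25)],
}

threshold = 3  # assumed cutoff
flagged = [
    pid for pid, clicks in clicks_by_participant.items()
    if outside_click_count(clicks, regions) >= threshold
]

print(flagged)
```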
Once you have identified the participants whose feedback you don’t want included in your final report, the next step is to remove them from your study.
Removing participants from your UserZoom study also removes their raw data. If you decide to re-include them, their data will be included once more, so don’t worry about permanently losing it. There are two ways to exclude participants in the UserZoom Manager: selecting them from the participant list or excluding them from their responses.
Click on the Participants List icon on the left hand side of the screen to see your full list of participants.
Select the participant you want to remove, check their “UZ ID” box, make sure the Action is set to Exclude, and then click Apply. Their data is now removed from your report (and again, if you want to re-include them you can do this by changing Exclude to Include and hitting Apply).
If you have an open-ended question, meaning that participants manually input a text answer, you will be able to see and select individual responses from Ratios & Responses.
Simply left-click anywhere on that line (#, Answer, or UZ_ID, for example) and you will have two options: “Show this participant only”, which filters all your data to look only at this person, or “Hide this participant”, which excludes their data from your report.
This concludes our introduction to data cleaning. Thank you for reading and don’t forget to check out the rest of our UX 101 education series to help you on your way!
Sarah Greene is a UX Researcher at UserZoom with a background in psychology and education policy research who primarily supports customers in the e-commerce and travel industries. She is excited by the overall process of utilizing data-driven insights to inform business and policy decisions. When not busy designing studies and gathering insights, she can be found flexing her creative muscles creating jewelry or crafting one of her many DIY creations.