Bot protection template for Qualtrics surveys
On this page, I’ll discuss what bots are, why they’re an issue for surveys, and give you a template to help you reduce bots in your surveys.
Please make sure you read the information on this page before using the template.
You don’t have to use the actual template - you could also just borrow the programming shown on this page and do it yourself in survey flow. The template just saves you some time.
If you know what you’re doing and just want to download the file, click here.
If you want to see the list of my videos that show people how to use Qualtrics, please click here.
I also discuss this issue in this video. Note that the names of some of the options have changed, but the idea is the same. Please see a comment under the video which explains the changes.
OVERVIEW OF BOTS
What is a bot?
When people create surveys, they want to test them. An easy way to test them is to create a script that answers questions randomly through the survey. You can then look at the data to check if the survey is functioning as intended – e.g., if people answer a certain way, do they see the questions they should see? Qualtrics offers this function (Generate Test Responses), and you should use it for testing your own surveys. These automated scripts are called bots.
Are bots a bad thing?
A lot of researchers offer prize draws, or even pay each participant to take part in their surveys. Some people have worked out that if they use a bot to take the survey hundreds or thousands of times, they increase their chance of winning a prize or being reimbursed. So while bots are great for helping test surveys before they launch, they’re obviously not great for researchers, whose datasets end up filled with junk responses.
What can we do about it?
Qualtrics has built-in protection to help identify bots. I have used these tools to create a template that you can use for your own surveys, to help identify bots and even to stop them from taking the survey. This document explains how it works.
GETTING THE TEMPLATE
What is this template?
In Qualtrics, you can program a survey and download a file that contains all of the programming. The file format is called .qsf. This file can then be uploaded by others so that they can run your survey, or they can start with your survey and modify it to their needs.
I’ve created a template that contains elements that will allow you to identify bots and/or duplicate responses, as well as some other pieces of programming. In this document, I’ll outline how it works. I’ll also talk about some different research scenarios and how you might adjust the template for these different scenarios.
Where do I get it?
You can download the template here.
USING THE TEMPLATE
Here’s an overview of the steps. I’ll spell each one out in detail below.
1) Import the template into your Qualtrics account. Be sure to do this before you start programming the rest of your survey.
2) Adjust the template to suit your needs.
3) Add the rest of your questions.
Step 1: Import the template into your Qualtrics account
You will need a Qualtrics account. If you’re at CQU, go to qualtrics.cqu.edu.au and log in with your usual student or staff credentials. You can use this template with any Qualtrics account, though. You don’t need to be at CQU to use it.
Download the template (this qsf file) and put it somewhere you can easily find, perhaps on your Desktop or similar.
To upload the template into your Qualtrics account:
Once you’re logged in to Qualtrics, look for a button that says “Create a new project”. Go to your Projects page if you can’t see that button.
You’ll see a page with lots of templates. You won’t find my template on that page - those are all templates made by the Qualtrics company. Select “Survey” from scratch, right at the top, and click “Get started”.
On the next page, where it says “How do you want to start your survey”, open the dropdown menu and select “Import a QSF file”. You might also like to change the name of your survey, and perhaps put it in a folder to make it easier to find later.
Click “Choose file”, find the qsf file that you downloaded earlier, and select it.
Click “Create project”.
For more information, please see this page.
Important – you must upload the qsf file before you start adding your own questions. When you upload the file, it creates a new survey for you within Qualtrics. If you’ve already started programming your survey, you can copy questions over into this new survey based on the template – it’ll just be a bit more work.
If you make changes to the survey and want to export it for others to use, there are also instructions on the same page that show how to export it.
Step 2: adjust the template so that it suits your needs
Different projects may need different settings. To explain this, first I’ll describe how the template works and then highlight a few scenarios where you might make some changes to the template.
How the template works
Once you’ve uploaded it, you’ll see the screen where you write questions, with not a lot of questions in there. This isn’t where the bot protection happens. Instead, it happens in survey flow.
To get to survey flow, look for the icons on the left of the screen. One of them looks like two rectangles connected with a line. Hover over it and you’ll see the words “Survey flow” appear. Click on that.
If you’re new to survey flow, you might want to watch my video that explains how it works. The interface has changed a bit since I made that video, but it’ll tell you what you need to know.
The key bits in the survey flow
The picture above shows the programming that I’ve used in survey flow to identify bots and duplicates (i.e., people taking the survey more than once). To do this, I’ve used tools that are built-in to Qualtrics. You can read more about these tools on this page.
The way it works is that the first green box uses Embedded Data to capture information from six variables into your survey. When you download your data, you’ll actually see these variables there, including values for them. For example, each participant will have a Recaptcha score. Some variables are empty unless they identify an issue (e.g., Ballot Box Stuffing).
Then, the “branch logic” uses these values to identify potential bots or duplicates.
The first branch identifies potential bots, using the variables Q_RecaptchaScore or Q_RelevantIDFraudScore. If you only want to use one of those, just remove the one you don’t want to use from the branch. If a response has a Recaptcha score below .5, or a RelevantIDFraudScore of 30 or more, then the survey flow will flag it as a potential bot. It does this by adding a variable to your dataset called PotentialBot, and a row of data (i.e., a response) will say “True” in that column if it is identified as a potential bot.
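If you’d like to re-check the same rule on your downloaded data, here is a minimal Python sketch of the logic in that first branch. It is only an illustration, not part of the template: the two cutoffs come from the paragraph above, while the function names and the choice to treat blank scores as “not flagged” are my own assumptions.

```python
from typing import Optional

RECAPTCHA_CUTOFF = 0.5  # from the text: scores below .5 are flagged
FRAUD_CUTOFF = 30       # from the text: scores of 30 or more are flagged


def to_number(value) -> Optional[float]:
    """Convert an exported cell to a number; blank or non-numeric cells become None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def is_potential_bot(row: dict) -> bool:
    """Flag a response if either measure crosses its cutoff.

    Assumption: a blank score is treated as "not flagged" rather than as a bot.
    """
    recaptcha = to_number(row.get("Q_RecaptchaScore"))
    fraud = to_number(row.get("Q_RelevantIDFraudScore"))
    low_recaptcha = recaptcha is not None and recaptcha < RECAPTCHA_CUTOFF
    high_fraud = fraud is not None and fraud >= FRAUD_CUTOFF
    return low_recaptcha or high_fraud


# Made-up example rows, using the variable names from the survey flow.
print(is_potential_bot({"Q_RecaptchaScore": "0.4", "Q_RelevantIDFraudScore": "0"}))
print(is_potential_bot({"Q_RecaptchaScore": "0.9", "Q_RelevantIDFraudScore": "10"}))
```

If you remove one of the two measures from the branch in survey flow, drop the matching condition here as well so that the two stay in step.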
Some of you may know Recaptchas as the things where you select which squares have buses, or stairs. This doesn’t add one of these to your survey, but you can add one of those if you want. See https://www.qualtrics.com/support/survey-platform/survey-module/editing-questions/question-types-guide/advanced/captcha-verification/
Duplicate screening works similarly to bot screening, but uses different measures: Q_BallotBoxStuffing, Q_RelevantIDDuplicate and Q_RelevantIDDuplicateScore. Again, if you don’t want to use all of them, just remove the ones you don’t want to use. This will create a second variable in the dataset, called PotentialDuplicate, and if people are identified as potential duplicates, their response will say “True” in this column.
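The duplicate check can be sketched in Python in the same way. Please treat this as a rough illustration rather than the template’s exact logic. The variable names are the ones listed above, but this page doesn’t state a cutoff for Q_RelevantIDDuplicateScore, so the value of 75 below is an assumption that you should check against the Qualtrics documentation. The function names are also made up for this example.

```python
DUPLICATE_SCORE_CUTOFF = 75  # ASSUMPTION: not stated on this page; check the Qualtrics docs


def is_true(value) -> bool:
    """Exported true/false cells arrive as text, so compare case-insensitively."""
    return str(value).strip().lower() == "true"


def is_potential_duplicate(row: dict, cutoff: float = DUPLICATE_SCORE_CUTOFF) -> bool:
    """Flag a response if any one of the three duplicate measures indicates a problem."""
    if is_true(row.get("Q_BallotBoxStuffing", "")):
        return True
    if is_true(row.get("Q_RelevantIDDuplicate", "")):
        return True
    try:
        score = float(row.get("Q_RelevantIDDuplicateScore", ""))
    except (TypeError, ValueError):
        return False  # blank score: nothing to flag on this measure
    return score >= cutoff


# Made-up example rows.
print(is_potential_duplicate({"Q_RelevantIDDuplicate": "true"}))
print(is_potential_duplicate({"Q_RelevantIDDuplicateScore": "10"}))
```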
Most importantly, there must be a survey block before this branch logic. In the template, I have included a block with a Recaptcha question, and instructions that are only shown if a particular embedded data variable = “blank”, so it shouldn’t show to participants. Please make sure that this block comes before the branch logic above.
When you do your data analysis, you can look at these variables to determine if any responses might be problematic and work out which ones you want to remove. Note that you should always do data quality checks before starting your analyses, because often people don’t take surveys seriously, but will still complete them for you, especially if you’re offering a reward. So, consider their other answers - do they seem reasonable? Are you sure you want to exclude them?
Note that these measures will only flag potential bots and duplicates in your data. The person (or bot) will still complete the survey. If you want to, you can adjust the script to automatically screen them out (i.e., make it so they do not complete the survey).
Automatically screening bots and duplicates out
One adjustment that you might like to make is to automatically exclude attempts that are identified as bots and/or duplicates.
A few things to be aware of. If you set it to automatically eject them from the survey, this happens right at the start of your survey, so they won’t answer any questions for you, and you can’t easily get them to come back if you’ve made a mistake. Also, the bot and duplicate checks are not perfect, so if you’re trying to survey people who are hard to get, you might want to just let the script flag them (like I described above) and then look at their data later to work out if they seem to be taking the survey seriously. However, if you’re running a prize draw, or paying everyone, and they’re allowed to complete the survey so you can check them later, sometimes people who run these bots can harass you to try to get payment. In some instances, it has become pretty unpleasant. So particularly if you’re paying a lot for each response, you might consider automatically ejecting them. Sometimes it can be difficult to work out what you want to do. If you’re unsure, don’t automatically exclude them, then during analysis, look at your data and determine which ones to exclude and which ones to keep.
To automatically eject bots and/or duplicates from the survey, in survey flow, we add the “End of Survey” element. To do this, under the Embedded Data that creates the PotentialBot variable, click on the “Add below” button, or on the “Add a New Element Here” button, and from the options that appear, select the red End of Survey element option.
If you want to eject both bots and duplicates, you need an end of survey element for each. Your screen should end up looking like the one below. (But please scroll down for one last step - adding gc and term values.)
A potential reason why you might want to allow duplicates
While the bot detection isn’t perfect, generally you’ll want to remove bots. But there are lots of situations where you might want to allow duplicates, for example if you’re expecting multiple people to take the survey on the same device (e.g., a family survey, or one where you provide the device to people to complete it). In those cases, it’s often best to keep
Step 3: add the rest of your questions
Once you’ve got the bot bits of the template working as you’d like them to, you can program the rest of your survey. Go back to the survey editor and add the questions and question blocks you want. Just keep the bot detection group before any questions.
WHAT ARE THE OTHER BITS IN THE SURVEY FLOW?
I’ve added some other bits in the survey flow that I often use. At the start of the survey, a variable called “gc” (for good complete) is created, and everyone has a value of 9. There is also a “term” variable (termination reason), and everyone has a value of “Survey started but not yet completed”.
If you scroll down a bit, you’ll see we have a block of questions called Intro block, then a branch that screens people out if they say they don’t consent. The branch logic says if they select No to consent, they reach an End of Survey element. Before they get to the End of Survey element, their gc and term values are updated. gc is now equal to 2, and term is “No consent”. When I download my data later, I can then work out how many people didn’t give consent.
Then at the end of the survey, the final thing that happens is that their gc value is updated to 1, and their term value is now “Survey completed!”. During analysis, I can look at these gc and term values in my data to work out how many people started but didn’t finish (they will still have gc = 9), and who actually finished (gc = 1). I can then remove anyone who did not complete the survey before I start my analysis, but I can also see where they got up to in the survey, to identify if there were any bits that were problematic.
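As a rough illustration of that analysis step, here is a small Python sketch that tallies responses by their gc value and keeps only the completes. The gc codes (1, 2 and 9) are the ones described above. The function names and the example rows are made up, and the sketch assumes each response is a dictionary with a “gc” entry.

```python
from collections import Counter

# gc codes as described in the text above.
GC_LABELS = {
    "1": "completed",
    "2": "no consent",
    "9": "started but not finished",
}


def summarise_gc(rows):
    """Count how many responses ended with each gc value."""
    counts = Counter(str(row.get("gc", "")).strip() for row in rows)
    return {GC_LABELS.get(code, f"other ({code})"): n for code, n in counts.items()}


def completed_only(rows):
    """Keep only the responses that reached the end of the survey (gc = 1)."""
    return [row for row in rows if str(row.get("gc", "")).strip() == "1"]


# Made-up example data.
example = [{"gc": "1"}, {"gc": "9"}, {"gc": "2"}, {"gc": "1"}]
print(summarise_gc(example))
print(len(completed_only(example)))
```

If you add your own gc codes (for example, for people screened out as bots or duplicates), add them to GC_LABELS so they are counted under a sensible name.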
If I’m automatically screening people out for being a bot or duplicate, then I’ll also want to adjust my gc and term values when I screen them out, because otherwise they’ll still appear as gc = 9, and term = Survey started but not yet completed. Sure, I can check the PotentialBot and PotentialDuplicate columns to remove them if I need, but it’s neater if the gc and term values work well too. To do this, I add gc and term adjustments into the Embedded Data before the End of Survey element.
WHAT DOES ALL THIS LOOK LIKE IN MY DATA?
Below is a snippet of a dataset with some duplicates identified. Here, I’ve just shown the variables that Qualtrics uses behind the scenes (e.g., Q_RecaptchaScore). You can see that some have been identified as duplicates, with Q_RelevantIDDuplicate coded as “true” in some cases. In those cases, you can also see that it shows the date (Q_RelevantIDLastStartDate) that they last took the survey, which helps you identify when the first attempt was.
I’ve sorted the file here by Q_RecaptchaScore. Typically, a score under .5 indicates that it is likely a bot. So, the first few rows are probably not bots, but some of them are duplicates. The last few rows have Recaptcha scores of .4, so should probably be discarded.
Note that you sometimes get different answers from the different measures. So Q_RecaptchaScore thinks these are bots, while Q_RelevantIDFraudScore doesn’t. Generally, I’ll exclude if any of the measures indicate issues, but I’m in a position where we get lots of participants and have the flexibility to do this. You might want to keep more of them. Just be open with readers when publishing about what you’ve done and why.
I haven’t shown the new variables we created (PotentialBot and PotentialDuplicate), but they’ll look just like the Q_RelevantIDDuplicate column - blank if there’s no problem, or True if the response has been flagged as a potential bot or duplicate.
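If you’d rather do this sorting in a script than in a spreadsheet, the Python sketch below shows one way. It assumes a CSV export in which the first row holds the variable names and the next two rows hold extra header information (question text and import IDs). Check your own file and adjust extra_header_rows if it looks different. The function names and the sample data are made up for illustration.

```python
import csv
import io


def read_qualtrics_csv(handle, extra_header_rows=2):
    """Read an export into a list of dicts, skipping the extra header rows.

    Assumption: row 1 holds variable names and the next `extra_header_rows`
    rows are descriptive headers rather than responses.
    """
    reader = csv.reader(handle)
    header = next(reader)
    for _ in range(extra_header_rows):
        next(reader, None)
    return [dict(zip(header, values)) for values in reader]


def sort_by_recaptcha(rows):
    """Sort highest score first, so the lowest (most bot-like) scores end up last."""
    def score(row):
        try:
            return float(row.get("Q_RecaptchaScore", ""))
        except (TypeError, ValueError):
            return float("-inf")  # blank scores go to the very end
    return sorted(rows, key=score, reverse=True)


# Made-up sample standing in for a downloaded file.
SAMPLE = """ResponseId,Q_RecaptchaScore,Q_RelevantIDDuplicate,Q_RelevantIDFraudScore
(question text),(question text),(question text),(question text)
(import id),(import id),(import id),(import id)
R_1,0.9,true,0
R_2,0.4,,0
R_3,1,,0
"""

rows = read_qualtrics_csv(io.StringIO(SAMPLE))
for row in sort_by_recaptcha(rows):
    print(row["ResponseId"], row["Q_RecaptchaScore"], row["Q_RelevantIDDuplicate"])
```

For a real file, replace io.StringIO(SAMPLE) with open("your_export.csv", newline="", encoding="utf-8"), where the filename is whatever you saved your download as.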
WHAT ELSE CAN I DO?
There are lots of creative ways to deal with bots.
One way is to add a Captcha verification at the front of the survey. In Qualtrics, from the dropdown menu where you can select a question type, scroll down until you find Captcha verification. This adds a question to the start of the survey where people check a box to indicate that they are not a robot. If the Captcha isn’t sure after the box is clicked, it may expand and ask the person to select the squares that contain something, like a pedestrian crossing or similar.
There are other versions of these, for example a series of letters and numbers presented in a way that is difficult for machines to read. This is fine, but sometimes they’re also really difficult for genuine participants to read. Remember, you’re asking people to give up their time to complete a survey for you. If these images are too difficult to read, they might not bother to complete the survey for you at all. I’m not sure these are the best solution. That said, in some cases they might be exactly what you need - a colleague of mine, Dr Philip Newall, used these quite creatively in this study. The idea was to get the participants to actually have to put some effort in to earn a reward (a $3 bonus) that they could then use in a subsequent gambling task. In gambling research, we can’t ask people to gamble their own money in a lab experiment. But, if they’ve earned something within the experiment (e.g., by completing a bunch of difficult Captcha-type questions), and can then bet with it, the findings may have some decent validity to them. Clever idea! That said, that’s not the general purpose of Captcha-type questions, and in fact it builds on the idea that they can be difficult sometimes. So, I’d suggest not doing this with your own surveys.
REMOVING BOTS DURING DATA ANALYSIS
The things above won’t necessarily catch all of the bots. Sometimes, during data analysis, it will become apparent to you that some have been missed. Or, it might not be that apparent, so here are some things to look for.
I’ve seen some pretty lazy bots before. For example, sometimes people just tell the bot to do every survey the same way. This will be most visible to you if you have something like 1,000 responses who all have the same score on a continuous scale - this is what it looked like in one instance for me. We removed them easily enough. In that instance, I was recruiting via a market research panel, so we didn’t have to pay for them. (Hot tip - market research panels don’t want bots infiltrating their panels, so won’t pay them and will not ask you to pay for them either, but you need to check for them.)
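One rough way to spot this kind of lazy bot in a script is to count how often exactly the same set of answers appears. The Python sketch below does that. The function name, the column names and the min_count threshold are all assumptions for illustration. Short surveys will naturally have some identical responses from real people, so pick a threshold that suits your own data.

```python
from collections import Counter


def find_repeated_answer_patterns(rows, answer_columns, min_count=20):
    """Return each identical answer pattern that appears at least `min_count` times."""
    patterns = Counter(
        tuple(row.get(column, "") for column in answer_columns) for row in rows
    )
    return {pattern: n for pattern, n in patterns.items() if n >= min_count}


# Made-up example: three identical responses and one different one.
example = [
    {"Q1": "3", "Q2": "4"},
    {"Q1": "3", "Q2": "4"},
    {"Q1": "3", "Q2": "4"},
    {"Q1": "1", "Q2": "5"},
]
print(find_repeated_answer_patterns(example, ["Q1", "Q2"], min_count=3))
```

A pattern turning up here is only a prompt to look more closely at those responses, not proof that they are bots.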
Bots are getting more sophisticated though. Most of the time these days, they answer surveys differently on each attempt. These can be a little harder to catch. Look for unlikely combinations of answers, e.g., an 18 year old who says they have 30 years of work experience (or similar things based on whatever variables you have in your data).
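A check like the age and work experience example can be written as a simple rule. The Python sketch below is a hedged illustration only: the minimum working age of 14 is an assumption, and the function name is made up. Swap in whatever variables and limits make sense for your own survey.

```python
MIN_WORKING_AGE = 14  # ASSUMPTION: adjust for your population


def implausible_experience(age, years_experience, min_working_age=MIN_WORKING_AGE):
    """True if someone reports more work experience than their age allows."""
    possible_years = max(age - min_working_age, 0)
    return years_experience > possible_years


# The example from the text: an 18 year old with 30 years of work experience.
print(implausible_experience(18, 30))
print(implausible_experience(45, 20))
```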
If they’re asked to enter their email address, look for patterns in email addresses. I had one recently where the email addresses were all of a format like [firstname][3letters]@gmail.com, and often the first names were misspelled. For example, agathaxyz@gmail.com or pietreguw@gmail.com. You won’t always capture email addresses, of course, but this is just an example of something to look for.
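For the specific pattern I described, a rough heuristic can be sketched in Python: take Gmail addresses made up only of letters, drop the last three letters, and see whether what is left is close to a known first name (close rather than exact, because the names were often misspelled). Everything here is an assumption for illustration, including the short name list, the 0.8 similarity cutoff and the function name. It will also match some genuine addresses (for example, a first name followed by a short surname), so use it only to decide which responses to look at more closely.

```python
import difflib
import re

# Hypothetical name list; in practice you would use a much longer one.
KNOWN_FIRST_NAMES = ["agatha", "pieter", "pietro", "maria", "john"]

# Letters only, ending in three letters, at gmail.com.
NAME_PLUS_THREE = re.compile(r"^([a-z]+)([a-z]{3})@gmail\.com$")


def looks_like_name_plus_three_letters(email, names=KNOWN_FIRST_NAMES, cutoff=0.8):
    """True if the address looks like [firstname][3letters]@gmail.com."""
    match = NAME_PLUS_THREE.match(email.strip().lower())
    if match is None:
        return False
    stem = match.group(1)  # everything before the final three letters
    # Fuzzy match so that misspelled first names are still caught.
    return bool(difflib.get_close_matches(stem, names, n=1, cutoff=cutoff))


for address in ["agathaxyz@gmail.com", "pietreguw@gmail.com", "research.team@university.edu"]:
    print(address, looks_like_name_plus_three_letters(address))
```

One or two matches mean very little. What stood out in my case was a large number of addresses that all followed the same format.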
Where possible, check your data before you pay your participants. If you pay people quickly, and then identify a bot, you won’t be getting that payment back.
WHAT DO I DO IF BOTS START CONTACTING ME?
I’ve had lots of people tell me that the people behind bots will start to chase them up for their rewards. Some of them can be nice enough, but some are very rude, and some make threats (e.g., “I’ll tell your boss about this”).
We need to strike a balance here. We want to make sure everyone who legitimately did the survey gets a reward, and it is possible that the bot protections, like Recaptcha scores, incorrectly identify legitimate people as bots. Similarly, duplicates can sometimes be expected, e.g., two people from the same house might complete your survey with the same IP address. They may be identified by the algorithms used by Qualtrics, but may still be legitimate. So, best not to assume.
But we also don’t want to reward bots, because it just encourages them. And if someone has sent a bunch of bots to your survey and one of them gets a reward, the person behind them will know you’ve given out rewards and may start contacting you from the other bot email addresses. There’s no perfect solution, but we should err on the side of making sure legitimate people are rewarded, even if a few bots do get rewards too. Otherwise, no one will want to take part in our research in the future.
If a bot contacts you about a reward or something else, and you’re sure it’s a bot, please feel free to block them. It’s usually best not to reply to them, too. Make sure you send a list of these blocked email addresses to your ethics team, including the reason you think they are a bot (e.g., their Recaptcha score). Ethics may start to receive complaints, and will need to be able to determine which complaints are legitimate and which are not. Give them the evidence they need to help you.
If you’re not sure if it’s a bot, you can try to work it out by replying to them and asking some questions. This can be a bit tricky, and it will depend on what your research is about. But, you could ask some questions about the weather, to determine if they’re where they say they are - look up the weather in their area when they reply. Usually, people who run bots are overseas - China and India seem to be pretty common locations. You can sometimes tell from their IP address if they have forgotten to mask it.
The main point is not to panic. The people behind bots can be pretty rude, and pretty mean. You haven’t done anything wrong. If you’re unsure, talk to your supervisor or the ethics team, and you can also reach out to Qualtrics support for advice, too.
WHERE CAN I GET MORE HELP?
While I’d love to be able to help everyone with all of their surveys, I simply don’t have enough time! Fortunately, there’s a lot of great support available to you for free.
Qualtrics support – Our CQU license includes support directly from Qualtrics. With your permission, they can log in to your survey and help with it. Qualtrics also updates their products regularly, and the support team is on top of changes. They are very fast to respond. If you’re from another institution, please check with your Qualtrics admins to see what level of support you have available, but I think it’s pretty standard that Qualtrics supports all paying accounts. See https://www.qualtrics.com/support/
Qualtrics XM Community – This is a community of users where people can ask and answer questions. There are a lot of very experienced people on here. While you might not always get an immediate answer for your question, you can search previous questions and answers by others. Chances are that if you’re asking something, someone else has asked it too. https://community.qualtrics.com