How to Establish Data Collection Standards

July 31, 2018

Part five in our series on getting started with mobile data collection explains why an app alone is not enough to ensure you collect consistent, clean data, and how data collection standards will set your project up for success.


Consistent, clean data is the gold standard that every data collection program targets. Achieving this standard starts with careful consideration of the design of program content, delivery, and data storage and security. Repeating the process of defining your data collection standards as you iterate on your program (after the preliminary design, the pilot phase, and initial data collection) will ensure that your data continues to serve your needs.

This post will outline the three categories of considerations that are important for the collection of unbiased, reliable data:

  • Content Design
  • Delivery Design and Method
  • Storage & Security


Content Design

Content design is about developing a survey or assessment that generates responses to best inform the goals of your project. This involves setting up clear questions that avoid bias, maintain consistency in phrasing, and are culturally appropriate.


Ask “what effect has this program had?”

Not “how has this program helped you?”


Offer “0-10 kg,” “10-30 kg,” and “30-50 kg.”

Not “light” and “heavy.”


Phrase questions in terms of “dollars” in the United States, but “francs” in Mali.


Spending a bit of time planning and strategically designing the content of your project (e.g., questions and surveys) will help ensure that you follow a systematic process for collecting data from your target beneficiaries, from the beginning to the end of the data collection period. It is not just about putting every question you can think of into the survey. With a consistent, well-thought-out design, the results that emerge from data collection will be cleaner and easier to compare with results collected at a different time point, or even by other programs and projects with similar objectives.

The primary components of content design are avoiding bias and adapting to cultural context, and there are even some widely used, validated surveys available that could be exactly what you need.


Avoiding bias in phrasing and questions

Avoiding bias is one of the hardest jobs of a survey developer, as the way you ask a question will impact the answer you receive.

To capture the full range of possible answers, the phrasing of your questions should remain neutral, which can be easier said than done. Depending on the topic, it can require extensive knowledge of the subject, including public perception, power dynamics, and even general controversies within the field.

A widely used Catalog of Biases in Questionnaires highlights many of the ways you may be unintentionally biasing your data. From problems with phrasing to leading questions, there are many design choices that can result in unreliable, messy data. For instance, a poorly designed version of the question “How often do you exercise?” might accept answers such as “regularly” or “occasionally,” which are vague. A better option, providing precise and quantifiable answers, would be:


“How often do you exercise?”

[ ] Twice a week or more often

[ ] Once a week

[ ] Less than once a week
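Fixed, non-overlapping categories also make responses easy to code consistently. If enumerators record an average number of exercise sessions per week instead of ticking a box, a small function can map each value onto exactly one of the three options above. This is a minimal sketch; the function name and the numeric input are assumptions for illustration, not features of any particular app.

```python
def exercise_category(sessions_per_week: float) -> str:
    """Map an average number of exercise sessions per week to one answer option."""
    if sessions_per_week < 0:
        raise ValueError("sessions_per_week cannot be negative")
    if sessions_per_week >= 2:
        return "Twice a week or more often"
    # Averages between 1 and 2 are counted as "Once a week" here; the options
    # do not say where that range belongs, so this boundary is an assumption.
    if sessions_per_week >= 1:
        return "Once a week"
    return "Less than once a week"
```

For example, 3 maps to "Twice a week or more often" and 0.5 maps to "Less than once a week". Writing the boundaries down in one place like this forces you to decide where edge cases fall before data collection starts.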


Another area to pay close attention to is the phrasing of questions intended to elicit subjective responses (i.e., opinions, feelings, or beliefs). The way you pose the question can influence beneficiaries’ responses in unintended ways.


A simple question on a data collection app, with straightforward answers and an image for clarity.


Addressing cultural expectations

Context matters.

Cultural references and word selection in survey questions may lead to variability in interpretations of the questions when applied to different populations.

For example, if your project targets beneficiaries in rural Tanzania, you could increase the accuracy of your data collection by including a multiple-choice question about the languages beneficiaries speak at home. The choices could include Kiswahili, English, and potentially other languages or dialects, based on prior knowledge of the population you are working with. And of course, it helps to have the survey itself available in the predominant language spoken by your respondents.

An example that might have more of an effect on project outcomes comes from Ethiopia, where the Red Cross logo was used as a hospital icon, but locals interpreted it as the symbol for a butcher.

Considerations like these should be balanced with the overall goal of the program: a culturally specific reference may help you get a reliable answer from one group of beneficiaries, but it may cause problems if the sample is later expanded.


The University of Michigan’s Institute for Social Research outlined a few ways to handle cultural references in survey questions, explaining that you can (1) translate your questions, (2) ask different questions, or (3) employ a mixed approach.

  1. Translating questions: Asking the same questions and translating them can mean literally translating a question, but often involves techniques such as decentering (i.e., removing all cultural references entirely) or including anchoring vignettes (asking respondents to rate both themselves and a hypothetical person). This is the best way to ensure your data can be compared later on.
  2. Asking different questions: This can be a little tricky, because instead of standardizing the question, you must standardize the responses. The opposite of decentering, it requires you to find appropriate examples for each population to ensure relevance.
  3. Mixed approach: Finally, a mixed approach could involve a standard question (e.g. “What language do you speak at home?”), but varied responses depending on cultural context (in Madagascar, Malagasy and French, but in Afghanistan, Pashto and Dari).
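The mixed approach can be sketched as one fixed question label paired with answer options that vary by context. The language lists below come from the examples in this post; the dictionary layout, the function name, and the added “Other” option are illustrative assumptions, not the format of any specific data collection platform.

```python
# The question wording stays identical everywhere (standardized question).
HOME_LANGUAGE_QUESTION = "What language do you speak at home?"

# Answer options vary by context (lists taken from the examples in this post).
LANGUAGE_CHOICES = {
    "Madagascar": ["Malagasy", "French"],
    "Afghanistan": ["Pashto", "Dari"],
    "Tanzania": ["Kiswahili", "English"],
}


def build_language_question(country: str) -> dict:
    """Return the standard question with context-specific answer options."""
    try:
        choices = LANGUAGE_CHOICES[country]
    except KeyError:
        raise ValueError(f"No language choices defined for {country!r}") from None
    # "Other" is an assumed catch-all so no respondent is forced into a wrong answer.
    return {"label": HOME_LANGUAGE_QUESTION, "choices": choices + ["Other"]}
```

Because the label never changes, answers can still be pooled across countries by question, while each population sees options that are relevant to it.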


Using validated surveys

Why recreate a survey instrument when one that is well aligned with your project objectives may already exist? There are databases of existing, validated surveys that allow you to reliably assess your results and compare them with those of other projects in the broader community. For example, RAND Health provides free online access to its surveys covering a variety of health topics.

In sum, the content of your survey is the primary determinant of the data you will collect, so put extra effort into ensuring it is both unbiased and relevant to your audience.


Delivery Design and Method

How you deliver a question is just as important as how you phrase your question. Determining the optimal delivery method is all about how to best structure and disseminate your survey. The survey structure and mode of communication are the primary considerations in this category.


Survey Structure

The structure of a survey, including the sequencing of questions and their available responses, will affect its outcomes. Here are a few questions to help you recognize and address structural issues:


Do certain questions depend on others?

If you have questions that will only make sense within the context of a previous answer, you can use skip logic (also known as display conditions) to control when they appear. Deciding which questions should appear (or disappear) depending on the answer to a prior question (or questions) is important for collecting clear, consistent data. For example, if you ask the question, “Are you taking any medications?” and the respondent says “Yes,” you can use skip logic to trigger the question “Which medications are you taking?” Otherwise, you can skip that question if the respondent answers “No.”
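The medication example can be sketched as a list of questions, each carrying an optional display condition that is evaluated against the answers given so far. The `Question` class and field names here are hypothetical; real data collection platforms express display conditions in their own syntax.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Question:
    qid: str
    label: str
    # Display condition: the question is shown only when this returns True.
    show_if: Optional[Callable[[dict], bool]] = None


FORM = [
    Question("taking_meds", "Are you taking any medications?"),
    Question(
        "which_meds",
        "Which medications are you taking?",
        show_if=lambda answers: answers.get("taking_meds") == "Yes",
    ),
]


def visible_questions(form: list, answers: dict) -> list:
    """Return the ids of questions whose display condition is met."""
    return [q.qid for q in form if q.show_if is None or q.show_if(answers)]
```

With `{"taking_meds": "No"}` only the first question is shown; with `{"taking_meds": "Yes"}` both are. Keeping the condition next to the question it controls makes the dependency easy to review during piloting.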


Which questions are required?

Often, with paper forms, you will find that important fields are left completely blank, which can render the whole submission useless. Delivering surveys electronically makes it possible to reduce the risk of incomplete datasets. As a general rule, it is good to make every question on your survey required, especially those needed to understand the overall results.

However, some items may make sense to keep optional. For instance, if you include an open-text question soliciting additional comments or feedback on the survey, the respondent may not have any comments to provide, so it is reasonable not to require a response. Alternatively, you might find that offering “Don’t Know” or “Refused” as answer options for certain questions clarifies why a response is missing, rather than leaving it blank. Find the right balance of required questions so that you capture all of the data essential to your project while keeping the survey from becoming too long or cumbersome to complete.
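A completeness check along these lines can be sketched as follows: required fields must not be blank, the comment box is optional, and explicit “Don’t Know” or “Refused” answers count as answered but are reported separately. The field names and the `check_submission` helper are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass

# Explicit non-response options, so a gap is explained rather than left blank.
NON_RESPONSE = {"Don't Know", "Refused"}


@dataclass
class Field:
    fid: str
    required: bool = True


def is_blank(value) -> bool:
    """Treat missing values and whitespace-only text as blank."""
    return value is None or (isinstance(value, str) and not value.strip())


def check_submission(fields: list, submission: dict) -> tuple:
    """Return (required fields left blank, fields answered with a non-response code)."""
    missing = [f.fid for f in fields if f.required and is_blank(submission.get(f.fid))]
    non_response = [
        f.fid
        for f in fields
        if isinstance(submission.get(f.fid), str) and submission[f.fid] in NON_RESPONSE
    ]
    return missing, non_response


# Hypothetical form: the open comment box is the only optional field.
FIELDS = [Field("age"), Field("income"), Field("comments", required=False)]
```

For example, `{"age": "34", "income": "Refused"}` passes with `income` flagged as a non-response, while `{"income": ""}` is rejected because both `age` and `income` are blank.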


What type of answers are you expecting?

When developing a structured-entry survey, consider whether a given question accepts only one response or multiple responses. While paper forms cannot enforce the number of responses a respondent provides, an electronic survey can. You may also want to restrict the type of responses that can be accepted: for instance, accepting only whole numbers for an item asking the respondent’s age, or restricting phone numbers to ten digits. By using validation conditions, you can prompt enumerators to correct mistakes on site that might otherwise go unnoticed until it is too late.
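The three checks described above (whole-number age, ten-digit phone number, single versus multiple responses) can be sketched as plain validation functions that raise an error the enumerator would see immediately. The function names, the 0 to 120 age bound, and the stripping of phone separators are assumptions for illustration, not rules from any particular platform.

```python
import re


def validate_age(raw: str) -> int:
    """Accept only whole numbers; the 0-120 range is an assumed plausibility bound."""
    text = raw.strip()
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError("Age must be a whole number")
    age = int(text)
    if age > 120:
        raise ValueError("Age is outside the expected range")
    return age


def validate_phone(raw: str) -> str:
    """Require exactly ten digits after removing spaces, hyphens, and parentheses."""
    digits = re.sub(r"[\s()\-]", "", raw)
    if not re.fullmatch(r"[0-9]{10}", digits):
        raise ValueError("Phone number must have exactly ten digits")
    return digits


def validate_choices(selected: list, options: list, multiple: bool) -> list:
    """Enforce single- vs multiple-response questions and reject unknown options."""
    if not multiple and len(selected) != 1:
        raise ValueError("Select exactly one option")
    unknown = [s for s in selected if s not in options]
    if unknown:
        raise ValueError(f"Unknown option(s): {unknown}")
    return selected
```

For example, `validate_phone("(555) 123-4567")` returns `"5551234567"`, while `validate_age("34.5")` is rejected on the spot rather than discovered during analysis.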


These are just a few of the questions you might ask about the structure of your survey. There are surely many more, such as whether to show one question per page or several. The important thing is to evaluate each step of your survey’s delivery and test it, making sure that your beneficiaries find it easy to answer and that your data is clean and consistent.


A health worker in South Africa reads questions aloud to a new mother from a mobile data collection app.


Mode of communication

Are enumerators reading questions off a paper form, or are respondents answering directly on a mobile device? This consideration focuses mostly on the user experience and how it can affect the reliability of respondents’ answers.

For instance, if you have a short survey and your respondents have access to mobile phones, you may want to consider an SMS-based data collection program. However, if you are unsure of your audience’s access to mobile phones, you may unknowingly restrict and bias your sample toward the kinds of people in that population who have access to a phone (e.g., male heads of households).

On the other hand, if you have many open-ended questions that require longer text responses, you may want to consider collecting responses verbally and/or entering responses via tablet or computer. Understanding how each mode of communication is used in your target location (and the norms and power dynamics associated with those types of interactions) will help you account for any bias as you design the delivery of your surveys and programs.


Storage & Security

Some sectors weigh this consideration more heavily than others. For certain beneficiary populations and projects, such as those working with HIV patient data, privacy concerns may be paramount. You might find some relevant answers in your evaluation of your existing data collection program, but in every case, understanding where the data you collect goes is vitally important.

The specific considerations will depend on the sector you are working in. For instance, projects in the public health domain might require you to consider patient confidentiality and, in the United States, HIPAA compliance. The FDA has published guidance on the use of electronic health record data that may be helpful for your project.

Private data can be hard to keep secure when it is recorded on paper.


As you consider the storage and privacy of your data, ask yourself the following:

  • Does the dataset need to be de-identified before exporting and sharing?
  • Do I need to protect certain data after it is entered?
  • Who can have access to the data?
  • How long can the data be stored?

Some of these considerations will be specific to the industry or sector you work in, while others will depend on local laws or even partner organizations’ codes of conduct. Make sure you are familiar with the requirements of all parties involved before you begin collecting data.
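For the first question in the list above, one common step before exporting is to drop direct identifiers and replace the participant ID with a keyed hash, so records can still be linked without exposing who they belong to. This is a minimal sketch with hypothetical field names; pseudonymization alone may not satisfy a given standard (remaining fields such as village and age can still identify someone), so check it against the rules that apply to your project.

```python
import hashlib
import hmac

# Direct identifiers to drop before sharing (hypothetical field names).
DIRECT_IDENTIFIERS = {"name", "phone", "address"}


def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and swap the participant ID for a keyed hash."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key != "participant_id"
    }
    if "participant_id" in record:
        # HMAC-SHA256 yields the same pseudonym for the same ID and key, so records
        # stay linkable, but the ID cannot feasibly be recovered without the key.
        digest = hmac.new(
            secret_key, str(record["participant_id"]).encode("utf-8"), hashlib.sha256
        )
        cleaned["pseudonym"] = digest.hexdigest()
    return cleaned
```

The secret key would be stored separately from the exported dataset and shared only with people authorized to re-link records, which ties back to the question of who can have access to the data.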


Why Does This Matter?

All of these design, delivery, and security considerations will affect your collection of clean, consistent data. Use them to understand what helps beneficiaries feel comfortable with your survey, and keep improving it. You will rarely, if ever, get clean data the first time, so use your pilot program to find out how your end users and beneficiaries respond. Then examine that data and modify your method of data collection as needed so that the responses are clean and consistent and allow you to meet the goals of your project.


Once you know which tool you are going to use and how you will collect your data in a clean, consistent way, the next step in our starter’s guide to mobile data collection is building your new solution: “How to Design & Test Your Mobile Data Collection App”

