
Wikipedia defines ‘convenience sampling’ (also known as grab sampling, accidental sampling or opportunity sampling) as a type of non-probability sampling that involves the sample being drawn from that part of the population that is close to hand. The sample is taken from a population that is easy to contact or to reach.

It didn’t take COVID-19 to stimulate interest in on-line rather than face-to-face surveys. The costs involved in collecting data through on-line surveys or mobile phones are typically many times lower than those of face-to-face polls.

In a typical nationwide study in Zimbabwe, for example, a properly designed sample will mean conducting interviews with people who live six or more hours from the city and who can only be reached along corrugated or washed-away dirt tracks in four-wheel-drive gas guzzlers – so you’re contributing to the demise of the polar bears’ territory too.

In addition, when you get to the said village and the particular respondent’s abode, what if he or she has gone to market, or to a funeral, and will not be back until late afternoon, by which time it is too late for the research team to leave the village? Can the funder of the survey afford to have an entire team stay overnight in the area, if indeed there is anywhere to stay? So you substitute that respondent, yes? That is already a compromise on a representative, randomly selected sample – it was inconvenient, and too costly, to wait around for the originally selected person. But if the researcher follows protocols correctly – adhering to the ‘random walk’ or other prescribed method of selecting the village, the household and the respondent within the household, and of substitution where that is required – it is usually considered an acceptable compromise. (Substitution and refusal rates should be reported, of course, but often aren’t.)

And then, who is checking to see that the random walk and subsequent selection of the individual to be surveyed was properly executed by every member of the team? What if the correct village and household to approach means a long walk on a hot October day? How big is the temptation to go to the village that’s closest to the road where the trusty and dusty Landcruiser is parked? Who is checking anyway? What does it matter if our data states that access to markets, hospitals or schooling is better than it actually is, because the research team compromised in order to make their working day shorter and more comfortable?  

Recently I read a report on research conducted on a population in Zimbabwe by a reputable research agency (from a country I will not name, but which is not Zimbabwe). I was astounded to read that in Mbare, a high-density suburb of Harare renowned for its high population of men who have come from rural areas to seek their fortunes in town, 80% or more of the population were women. The report’s author surmised that the men of Mbare had all gone to South Africa to look for jobs. A follow-up by the funder, whom I represented, revealed very quickly that the researchers involved had gone door to door in quick succession and interviewed anyone they found at home. No call-backs, no after-hours or weekend work, and no record in the report of the number of substitutions made because originally selected respondents were not at home. The survey was completed in record time, of course. But the data wasn’t worth the tablet it was written on.

Convenience sampling is not associated only with on-line research, as the examples above show. But let’s consider on-line surveys. There is a plethora of ODK-based and other software with which to design and administer a questionnaire. But who are you going to send it to? All very well if you are purposively targeting your own customers or project beneficiaries, whose details you hold and which are up to date – and assuming that they all have mobile phones, or computers and email addresses. There is still the question of who chooses to respond and who does not, but that is the nature of the beast with surveying: people are not obliged to respond, however they are sampled or approached.

What if you want to do a nationally representative poll to establish, for example, voting patterns in an election, or to measure access to information and services related to HIV or COVID-19? Perhaps one of the mobile phone providers in the country will offer to send out a few questions to their database of millions of subscribers and send you the data. Is that nationally representative? Is it even ethical for them to use their databases to make money like this? I can’t recall anyone asking me if they could pass my information on when I signed up for a mobile number. Before accepting such data, there are questions to ask:

  • Do they tell you what the response rate was, and if so, do you consider what effect this has on the interpretation of the findings?
  • Do you know how the profile of the respondents differs from the profile of the population based on the latest census – who is over- and who is under-represented?
  • How do you reach people living in areas of the country with no or poor network coverage?
  • What about collecting the views of those who do not have cell phones, or who have no power, or who have power outages and cannot easily charge their phones?
  • Are elderly people less likely to have the technology, and thus under-represented in the survey responses?
  • Do men and women have equal access to the technology? What about disabled people, such as people with visual impairment? Have the tool and methodology been designed to capture their views and are they represented at the same level as they are represented in the population?
  • Will one get equal representation from working and non-working people? The latter might have more time to fill in a questionnaire but might be less able to afford the data needed to receive it on WhatsApp, for example. Working people may have less time to fill it in – and may be less easily incentivized, in the event that financial or other incentives are offered for completing the survey.
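Several of the questions above come down to the same check: comparing the profile of those who responded with the profile of the population in the latest census, and reweighting where they differ. A minimal sketch of that post-stratification step, in Python – the groups and all figures are invented for illustration, and note that no amount of weighting can recover the views of people who had no chance of inclusion at all (no phone, no coverage, no power):

```python
# Hypothetical illustration: weight respondents so each group's share
# matches the census, rather than the convenience of who answered.
# All figures below are invented for the example.

census_share = {"men": 0.49, "women": 0.51}   # population proportions (census)
sample_counts = {"men": 40, "women": 160}     # who actually responded (n = 200)

n = sum(sample_counts.values())
weights = {}
for group, count in sample_counts.items():
    sample_share = count / n
    # Post-stratification weight: census share / sample share.
    weights[group] = census_share[group] / sample_share

# Men are heavily under-represented (20% of the sample vs 49% of the
# census), so each male respondent counts for more (a weight of ~2.45)
# in any weighted estimate, and each female respondent for less (~0.64).
print(weights)
```

Large weights like the 2.45 above are themselves a warning sign: they inflate the variance of every estimate, and they cannot correct for the possibility that the men who *did* respond differ from those who didn’t.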

What if the questions are personal and sensitive? Are people more likely to respond on-line or in person to such questions? 

I recall that some years ago the manufacturer of a renowned brand of face moisturiser was challenged regarding a claim that ran along these lines: ‘80% of women who used this moisturiser and who were surveyed reported a significant reduction in wrinkles over a 12-week period’. The claim was not untrue, but it was misleading. They had neglected to state two key facts: (i) that the number of women interviewed was just 20 (see our recent blog in the archives, ‘It’s all about the denominator’), and (ii) that they were selected from among the manufacturing company’s employees’ relatives.
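The arithmetic behind point (i) is worth making concrete. A rough sketch of the uncertainty around ‘80% of 20 respondents’, using the standard normal-approximation confidence interval for a proportion (itself a crude approximation at n = 20; the figures come from the moisturiser example above):

```python
import math

# Rough 95% confidence interval for "80% of 20 women" using the
# normal approximation (crude at this sample size, but it makes the point).
p, n = 0.80, 20
se = math.sqrt(p * (1 - p) / n)   # standard error of the proportion
margin = 1.96 * se                # ~95% margin of error
low, high = p - margin, p + margin
print(f"80% of n={n}: roughly {low:.0%} to {high:.0%}")
# The honest claim spans roughly 62% to 98% -- before even considering
# that the 20 women were relatives of the company's employees.
```

With a denominator of 20, the headline ‘80%’ is compatible with anything from under two-thirds to nearly everyone – and that is the sampling error alone, ignoring the selection bias.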

Next time you commission a survey, on-line or not, remember to ask about the denominator. How big? Who is in it? And how will they be selected? Allow for an adequate level of back-checking and oversight in the research budget; this means budgeting for random spot-checks by supervisors and managers, including at those remote and hard-to-reach sampling points. And next time you read a research report, check that it reports the refusal rate, the substitution rate, the limitations of the sampling method, and how the team ensured that the theoretically sound sampling approach outlined in the proposal was actually followed in practice. Better a smaller sample, correctly drawn, than a large one drawn from a population that is not representative of the population you are interested in.
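For the ‘how big?’ question, the textbook sample-size calculation for estimating a proportion gives a feel for the numbers. A sketch with hypothetical targets – a margin of error of ±5 percentage points at 95% confidence, under the worst-case assumption p = 0.5:

```python
import math

# Standard sample-size formula for a proportion: n = z^2 * p(1-p) / e^2.
# Hypothetical targets: +/-5 points at 95% confidence, worst-case p = 0.5.
z, p, e = 1.96, 0.5, 0.05
n = math.ceil(z**2 * p * (1 - p) / e**2)
print(n)  # 385 completed interviews
```

That 385 is completed interviews: refusals, substitutions and unreachable respondents all come on top of it, which is exactly why the refusal and substitution rates belong in the report.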