I am running data collection for an observational study (with follow-up) among 50,000 students. I would like your opinion about the response rate. What response rate would be sufficient? What about a good one?
The response rate is a huge myth and poorly understood in academic research. I have read papers claiming that because they have a response rate of, say, 60%, their sample has to be representative. This is a mistake. Size is but one factor that matters.
First, you want your sample to be representative of the overall population. In other words, you do not want to have a systematic bias when drawing your sample. Second, the response rate indeed matters, but only in connection with the representativeness of your sample. Just imagine that your response rate is an impressive 50% but for some obscure reason you do not have any female respondents. This would create a strong systematic bias if your population includes male and female students.
Ideally you use some form of probability sampling, in which all members of the population have a chance of being included in the sample, as opposed to nonprobability sampling. In your case, if the population is students, that would be great.
All things being equal, and if you succeed in avoiding a systematic bias, a higher response rate is obviously better than a lower one. However, there is one more caveat. If you force students to fill out surveys or provide huge incentives (e.g., monetary payments), you might run into the problem of getting poor data quality in spite of having a high response rate. The same problem might occur if your questionnaire is too long or too complicated.
Having said all that, getting a 10% response rate in an online survey would be really impressive. Just keep in mind that with a big sample size many effects will be significant if you use traditional statistics, so you might also want to consider effect sizes in your study.
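To make the large-sample point concrete, here is a minimal stdlib-only sketch (the group sizes and the tiny true effect of d = 0.05 are made-up illustration values, not from any real study): with 25,000 respondents per group, even a trivially small difference comes out "significant" in a classical test, while the standardized effect size stays negligible.

```python
import math
import random
import statistics as st

random.seed(42)

# Hypothetical illustration: 50,000 students split into two groups,
# with a trivially small true effect (Cohen's d = 0.05).
n = 25_000
a = [random.gauss(0.00, 1.0) for _ in range(n)]
b = [random.gauss(0.05, 1.0) for _ in range(n)]

# Welch-style t statistic, computed by hand with the stdlib
se = math.sqrt(st.variance(a) / n + st.variance(b) / n)
t = (st.mean(b) - st.mean(a)) / se

# Two-sided p-value via the normal approximation (fine at this n)
p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))

# Cohen's d: the standardized effect size, which remains tiny
d = (st.mean(b) - st.mean(a)) / math.sqrt((st.variance(a) + st.variance(b)) / 2)

print(f"t = {t:.2f}, p = {p:.2e}, Cohen's d = {d:.3f}")
```

The p-value will be far below 0.05, yet d stays around 0.05, well under even the conventional "small effect" threshold of 0.2; this is why, at n = 50,000, reporting effect sizes alongside p-values matters.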
Dear Horst Treiblmaier, thanks for your enlightening reply! It has been very useful and clear. I would like to ask you one more question. In order to avoid bias, I sent an email through the institutional email system of the University: this allowed me to potentially reach all students. All of them, if enrolled at the University, got an email. In your mind, would informing students via the students' web platforms and inviting them to fill out my questionnaire be a good way to increase the response rate?
Dennis Mazur : 10% was just an example of mine. However, if you believe that this response rate is low, please check out the typical response rates of online surveys, especially when organizations are involved (e.g., in the Marketing literature). Of course, you will get a response rate of 100% if you force students in a class to fill out a survey, but what about the data quality? Furthermore, does the population you are interested in only consist of students? Alternatively, you can pay people, for example, on Mechanical Turk (which is not necessarily a bad thing!). Strictly speaking, in this case you do not have a response rate, but a selection bias for sure. Trust me, you will not find any CEOs on Mechanical Turk filling out surveys for two dollars. If you are interested in CEOs you have to have a different strategy...
An even more extreme example is election polls, which usually include a very small portion of the population and are surprisingly reliable. Of course, in this case we are not talking about the response rate but about the appropriateness of a small sample to represent the whole population; still, the topic is similar. The key issue is external validity.
In a nutshell, a response rate of 10% of a representative sample might be far better than 50% of a heavily biased sample.
A very famous case is the 1936 US presidential election, in which George Gallup used a more representative sample of 50,000 people (as opposed to the Literary Digest with 2.4 million people) and correctly predicted the outcome of the election.
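The Gallup-versus-Literary-Digest lesson can be sketched in a few lines of stdlib Python. This is a toy model, not historical data: the population size, income split, and support rates below are invented to mimic the mechanism (the Digest's frame overrepresented wealthier households), and the 5% inclusion rate for the "biased" sample is likewise an assumption.

```python
import random

random.seed(1)

# Toy population: 100,000 voters; support differs sharply by
# (hypothetical) income group, so overall support is about 55%.
population = []
for _ in range(100_000):
    rich = random.random() < 0.30           # 30% high-income
    p_support = 0.35 if rich else 0.635     # ~0.30*0.35 + 0.70*0.635 = 0.55
    population.append((rich, random.random() < p_support))

true_rate = sum(s for _, s in population) / len(population)

# "Literary Digest"-style sample: huge, but drawn mostly from the rich
biased = [v for v in population if v[0] or random.random() < 0.05]
biased_est = sum(s for _, s in biased) / len(biased)

# "Gallup"-style sample: far smaller, but a simple random sample
srs = random.sample(population, 2_000)
srs_est = sum(s for _, s in srs) / len(srs)

print(f"true support:     {true_rate:.3f}")
print(f"biased, n={len(biased)}: {biased_est:.3f}")
print(f"random, n={len(srs)}: {srs_est:.3f}")
```

The biased sample is more than fifteen times larger, yet lands far from the true rate, while the small random sample lands within a couple of percentage points; size cannot repair a selection mechanism that correlates with the outcome.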
Fabio Porru You are welcome. Let me briefly answer your question:
*) You can never fully avoid bias, but the mere fact that you are considering it, shows that you are on the right track ;-) Shadish et al. (2001) list dozens of different types of bias.
*) I guess in your case the main question is what the population is from which your sample is drawn. Is it the general population? Then a student sample is heavily biased. Is it students all over the world? Still a strong bias since your sample is country-specific. Students from your country? It is getting better, but different universities might have different cultures. Ideally, you are only interested in students from your university.
*) If the focus of your study is less descriptive and more inferential (e.g., hypothesis testing), then a student sample might not be quite as bad as it seems. I highly recommend the short paper by Blair and Zinkhan on that matter:
Article Non-Response and Generalizability in Academic Research
*) I believe that the strategy you suggested (contacting students via the web platform) is the best you can do.
*) If this is an academic study, you can always acknowledge a low response rate in the limitations. Again, in the paper from Blair and Zinkhan you will find that there are different roads to generalizability.
Total elimination of bias is impossible to achieve, but a response rate above a certain critical value is still a must. There is nothing sacred about a response rate of 60%, 80% or 90%. The thing that matters is representativeness. Responders and non-responders vary in their characteristics. If many people opt out of the study and choose not to respond, those who remain might be a handful of people who are actually different from the reference or target population, jeopardizing the representativeness of the sample. Responders may not be actual representatives.
It is possible to achieve a 70-80% response rate within 5 to 10 minutes, but most times it is very difficult to eliminate response bias, an error attributable to the mindset and environment of the respondent at the time the questionnaire is administered.
Definitely the best response rate is 100% of an adequate and representative sample. I do remember a rule of thumb that a response rate should not be less than 60% of the assigned sample. Look for ways to improve the response rate via reminding, revisiting or sometimes replacing.
Interesting to read what numbers are posted here. Let me break down the subject in three parts: representativeness of the sample, response rate and data quality. At the end of the day we want the data we get to be meaningful (i.e., representative for the population and accurate):
(1) Selecting your sample: you have to select your sample such that it represents the population you are interested in. If you succeed, this is great; if not, you have some kind of sample selection bias. Marketers have shown that samples that are a fraction of a percent of the total population might suffice. Numerous sophisticated techniques exist to account for potential biases (e.g., stratified sampling).
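As a concrete illustration of one such technique, here is a minimal stdlib-only sketch of proportionate stratified sampling. The sampling frame, the faculty labels, and their proportions are all made up for the example; in practice the strata would come from your student registry.

```python
import random
from collections import Counter

random.seed(7)

# Hypothetical frame of 50,000 students tagged by faculty (assumed strata)
frame = [random.choices(["arts", "science", "medicine"], [0.5, 0.3, 0.2])[0]
         for _ in range(50_000)]

def stratified_sample(units, key, n):
    """Proportionate stratified sampling: each stratum contributes
    draws in proportion to its share of the frame."""
    by_stratum = {}
    for i, u in enumerate(units):
        by_stratum.setdefault(key(u), []).append(i)
    sample = []
    for stratum, idxs in by_stratum.items():
        k = round(n * len(idxs) / len(units))   # proportionate allocation
        sample.extend(random.sample(idxs, k))   # simple random draw per stratum
    return sample

sample = stratified_sample(frame, lambda u: u, 1_000)
print(Counter(frame[i] for i in sample))
```

By construction, the sample's faculty composition matches the frame's composition up to rounding, which removes one obvious route to sample selection bias even when the overall sample is small.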
(2) Nonresponse (the topic of this post): If the respondents represent your population, it is all good (more or less independent of the response rate). If you have a systematic bias, you most likely have a problem (even if you have a high response rate). A common procedure to measure the problem of nonresponse bias is extrapolation, i.e., assuming that late respondents somehow resemble nonrespondents.
Personally, I am highly skeptical of this procedure (originally published in the 1970s) and would like to see replication studies with Internet samples.
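For concreteness, the extrapolation idea can be sketched as a simple early-versus-late comparison. The records below are invented illustration data (reply day, answer on a 1-7 scale); the logic is just: sort by reply time, split into waves, and treat the last wave as a proxy for nonrespondents.

```python
import statistics as st

# Hypothetical respondent records: (days_until_reply, answer on a 1-7 scale)
responses = [(1, 5.8), (2, 5.5), (3, 5.6), (4, 5.1), (10, 4.4),
             (12, 4.2), (14, 4.0), (15, 4.1), (16, 3.9), (18, 3.8)]

responses.sort(key=lambda r: r[0])
cut = len(responses) // 2
early = [a for _, a in responses[:cut]]   # first wave
late = [a for _, a in responses[cut:]]    # last wave, proxy for nonrespondents

gap = st.mean(early) - st.mean(late)
print(f"early mean {st.mean(early):.2f}, late mean {st.mean(late):.2f}, "
      f"gap {gap:.2f}")
```

A sizeable gap between the waves is read, under the extrapolation assumption, as a warning that nonrespondents may differ too; the skepticism above applies precisely because that assumption is untestable with the collected data alone.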
(3) Data quality: Perhaps the most overlooked factor. The best response rate is useless if the data quality is poor. This can be caused by sloppy respondents (e.g., if the survey is too complicated or too long) or by the fact that people simply lie.
Wentland and Smith (1993) published an excellent book with empirical results that were triangulated:
Their results in a nutshell? People lie. Especially if you ask them about sensitive topics such as drug abuse or their financial situation.
Summarizing, the response rate is only one factor and should not be overestimated. It is therefore impossible to specify exactly how high a good response rate should be. Forcing students to fill out a questionnaire will most likely lead to 100%, but the data quality will suffer. Mailing a questionnaire to companies will inevitably lead to a comparatively small response rate, especially if you target managers.
Normally I would simply press the Recommend tab when reading a response such as yours above, but your advice is so comprehensive and just plain sensible that I have decided to take the extra step of thanking you in a separate post.
Your contribution here reminds me of the article that you published with Andreas Wieland and others in 2017 about scale purification. That article seems to me to be replete with clarity and solid argument, and I am planning to use and cite it in some research that a colleague and I are currently initiating.
I'll say it again: thank you. This is what RG should be about: helping others to conduct good research.
Thank you very much for your kind words. I enjoy being a researcher and I consider it to be an ongoing learning process. Most interestingly, I frequently discover that I need to reassess my own beliefs. The paper I mentioned above by Blair and Zinkhan (2006) is one example. In about half an hour the authors completely changed my understanding of external validity. I am now extremely skeptical when it comes to rules, regulations, guidelines etc. ;-) I am glad that you liked the paper that I co-authored with Andreas Wieland et al. Here are two more papers that I wrote on Paul Feyerabend that you might potentially enjoy. In a way they are also related to the topic of this post, since I consider epistemology the basis of methodology ;-)
Article The Philosopher's Corner: Paul Feyerabend and the Art of Ep...
Article Taking Feyerabend to the Next Level: On Linear Thinking, Ind...
Thank you, Horst, for the extra articles. I'll devour them with interest. I agree that epistemology is the basis of methodology, and suspect that too many researchers magnify what are simply methods by calling them methodology, and do so with barely an inkling about epistemology.