One of my students is interested in assessing the nutritional status of school-going children in a particular city. How can we calculate the sample size for this, and what formula should we use?
It depends upon the kind of data that are being collected (continuous, yes/no, and if qualitative, that is more nebulous). It depends also very much upon the variability of the data, which is the standard deviation for quantitative data, and also you need to design well to avoid bias.
It also depends upon your design. School studies might often benefit from cluster sampling. Stratification lowers the overall sample size needs.
Sample size "calculators" on the internet are generally only for yes/no questions, even though they typically do not tell you that. They also are generally for the worst case regarding standard deviations (setting p = q = 1/2), and ignore the finite population correction factor, so they may give you a sample size that is far too large, even if you are only interested in a yes/no question.
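To illustrate the point about worst-case calculators, here is a minimal sketch of the usual simple random sampling formula for a proportion, with an optional finite population correction. The function name, the 5% margin, the anticipated proportion of 0.10, and the population of 2000 are hypothetical values chosen only for illustration, and z = 1.96 assumes a 95% confidence level.

```python
import math

def srs_size_for_proportion(margin, p=0.5, z=1.96, population=None):
    """Sample size for estimating a proportion under simple random sampling.

    margin     : desired half-width of the confidence interval (e.g. 0.05)
    p          : anticipated proportion; 0.5 is the worst case
    z          : normal quantile for the confidence level (1.96 ~ 95%)
    population : finite population size N, or None to skip the correction
    """
    n0 = z**2 * p * (1 - p) / margin**2
    if population is not None:
        # Finite population correction for a proportion
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

# Hypothetical inputs, for illustration only
worst_case = srs_size_for_proportion(0.05)                       # p = 0.5, no fpc
informed = srs_size_for_proportion(0.05, p=0.10)                 # prior guess p = 0.10
informed_fpc = srs_size_for_proportion(0.05, p=0.10, population=2000)

print(worst_case, informed, informed_fpc)  # 385 139 130
```

With these made-up inputs, the worst-case answer is nearly three times the size you get once a plausible proportion and the population size are taken into account.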
You also need to consider each question, if asking more than one.
For simple random sampling, you can find "formulas" in Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons, and other such books. But one needs first to know something about the standard deviation of the population for each question.
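For a continuous measurement, the simple random sampling calculation for a mean can be sketched as follows. This is my own rendering of the textbook form n0 = (z * S / d)^2 with the correction n = n0 / (1 + n0 / N), not a quotation from Cochran. The standard deviation of 2.0, the margin of 0.25, and the population of 5000 are hypothetical; in practice the standard deviation would come from a pilot study or earlier data.

```python
import math

def srs_size_for_mean(sd, margin, z=1.96, population=None):
    """Sample size for estimating a mean under simple random sampling.

    sd         : guessed population standard deviation (e.g. from a pilot)
    margin     : desired half-width of the confidence interval, same units as sd
    z          : normal quantile for the confidence level (1.96 ~ 95%)
    population : finite population size N, or None to skip the correction
    """
    n0 = (z * sd / margin) ** 2
    if population is not None:
        n0 = n0 / (1 + n0 / population)  # finite population correction
    return math.ceil(n0)

# Hypothetical inputs, for illustration only
print(srs_size_for_mean(sd=2.0, margin=0.25))                   # 246
print(srs_size_for_mean(sd=2.0, margin=0.25, population=5000))  # 235
print(srs_size_for_mean(sd=4.0, margin=0.25))                   # 984
```

Doubling the guessed standard deviation roughly quadruples the required sample, which is why a rough idea of the variability is needed before any formula is useful.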
What I suggest is that your student do a pilot study - a small preliminary study - to get an idea as to what it will take to do a good job on the final study. This could apply to any survey. It could also help work out any other problems before committing to a full study.
I assume that no regressor/auxiliary data are being contemplated, so this is strictly a probability-based survey, and it is therefore very important to use randomized sampling to avoid bias.
One other suggestion for later: wherever appropriate, your student should consider presenting results with the aid of graphs.
Cheers - Jim
PS - Nonresponse bias is another problem. Try to collect from all of those randomly selected. Those who do not respond are likely not "typical." You want to know if you need a separate stratification by response propensity/homogeneity.
In addition to what Dr. James said, we could give a better answer if you provided the study hypotheses or research questions, and the measurement level of the variables. Sample size calculation depends on these elements. Each statistical test has its own formula.
For quantitative data you can use (1) hypothesis tests or (2) straightforward determination of standard errors of parameters like means, based on standard deviations of populations. Both depend upon standard deviations. But hypothesis tests are more difficult to properly/practically interpret. You cannot use just a p-value.
I suggest something like the sample size determination found in Cochran (1977), based first on simple random sampling, without bothering with hypothesis tests.
Article Practical Interpretation of Hypothesis Tests - letter to the...
To answer this question, several points need to be considered.
1. Research studies are usually carried out on a sample of subjects rather than on whole populations. The most challenging aspect of fieldwork is drawing a random sample from the target population to which the results of the study will be generalized.
2. The key to a good sample is that it has to be typical of the population from which it is drawn. When the information from a sample is systematically not typical of that in the population, we say that error has occurred. In actual practice, the task is so difficult that several types of error can occur, e.g. sampling error, non-sampling error, response error, processing error,…
3. In addition, the most important error is the Sampling error, which is statistically defined as the error caused by observing a sample instead of the whole population. The underlying principle that must be followed if we are to have any hope of making inferences from a sample to a population is that the sample be representative of that population.
4. A key way of achieving this is through the use of "randomization". There are several types of random samples, some of which are: Simple Random Sampling, Stratified Random Sampling, Double-stage Random Sampling... Moreover, the most important is the simple random sample, which is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen. To reduce the sampling error, use a simple random sample and a large sample size.
5. The following factors strongly affect the sample size and need to be identified:
Population Size,
Margin of Error,
Confidence Level (level of significance) and
Standard Deviation.
6. Then, the sample size can be estimated by,
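Necessary Sample Size = (z-score or t-value)^2 * p * (1 - p) / (margin of error)^2, where p is the anticipated proportion for a yes/no question (this is the quantity often labeled "StdDev" in this formula, with 0.5 as the conservative default).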
Once again, are you limited to yes/no questions? Usually not. Beware of one-size-fits-all formulas and "calculators." They have very limited uses. Consider type of data, standard deviations and designs.
The software tool I used for calculating sample size is G*Power, but you must determine and input all the parameters, such as the statistical test that you want to apply, the confidence level, the standard deviation, etc.
Yamane/Slovin is very limited. This goes with the warning I gave above, regarding many online sample size calculators: "Sample size 'calculators' on the internet are generally only for yes/no questions, even though they typically do not tell you that. They also are generally for the worst case regarding standard deviations (setting p = q = 1/2), and ignore the finite population correction factor, so they may give you a sample size that is far too large...."
Suppose we want to use continuous data? Suppose we want to use something other than simple random sampling? Cochran (1977), Sampling Techniques, 3rd ed., Wiley, goes on to show stratified random sampling and cluster sampling. You might also use regression if you have the data. For school children you might use a multilevel model, with students nested in schools. There are a lot of possibilities. One size does not fit all.
Apparently Yamane/Slovin has been very much misused for quite some time, and there is a paper in The Philippine Statistician pointed out by Johnny T. Amora regarding this. Please note my last two responses on https://www.researchgate.net/post/What_is_sampling_error on September 21, 2019. Oh, and there is also a restriction to a given confidence level. So it only applies to one very specific case. When I first looked into it years ago, for some time I was not sure it applied to anything at all. But as shown in the paper in The Philippine Statistician, there is one very specific case where it does apply, and I even forgot about that. Yamane is just not something you need to keep in your toolbox.
Apparently misconceptions about the usefulness of that "formula" in Yamane are widespread. It seems it is hard to stop that, once it gets started.
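As a rough check on how narrow that one case is, the sketch below compares the Yamane/Slovin expression n = N / (1 + N * e^2) with the simple random sampling formula for a proportion. The algebra is my own working, not taken from the paper mentioned above: the two agree when p = 0.5, z = 2 (roughly 95% confidence), and the simplified finite population correction n0 / (1 + n0 / N) is used. The values N = 2000 and e = 0.05 are hypothetical.

```python
import math

def yamane(population, margin):
    """Yamane/Slovin expression: n = N / (1 + N * e^2)."""
    return population / (1 + population * margin**2)

def srs_proportion(population, margin, p, z):
    """SRS sample size for a proportion with the simplified fpc n0 / (1 + n0 / N)."""
    n0 = z**2 * p * (1 - p) / margin**2
    return n0 / (1 + n0 / population)

# Hypothetical inputs, for illustration only
N, e = 2000, 0.05

n_yamane = yamane(N, e)
n_special_case = srs_proportion(N, e, p=0.5, z=2.0)   # the one case where they coincide
n_other_case = srs_proportion(N, e, p=0.1, z=2.0)     # same design, different proportion

print(round(n_yamane, 2), round(n_special_case, 2), round(n_other_case, 2))
print(math.isclose(n_yamane, n_special_case))  # True
```

As soon as the proportion is not near 0.5, or the data are continuous, or the design is not simple random sampling, the Yamane number no longer corresponds to anything in particular.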
Please note that the formulas posted by Abdullah Noori are only useful for yes/no questions under simple random sampling, and I do not see an explicit finite population correction (fpc) factor. Well, they are written in a different format, and having N in there may account for the fpc, but I don't recall seeing this from a previous derivation.
This is also covered in Asif Ahmad's class23.pdf file, as well as the simple random sampling, no fpc case, for means.
This is just the 'tip of the iceberg,' but does go further than Yamane.
You should always be aware of the foundations for whatever you apply. People very often apply a "formula" where it is not relevant. Sample size considerations can be far more complex.
Goli, a more practical design than simple random sampling for this problem might be cluster sampling or two-stage sampling. I keep mentioning Cochran's book though there are many others, and I have quite a few, but Cochran is well-written and packs a lot of information in one book, and I know more of what is in it. There (and elsewhere) you will see a tradeoff in sample size for two-stage sampling between number of primary and secondary units. You could also consider hierarchical modeling, with students found under classes. I don't know a good reference for that, but I'm sure that others on ResearchGate do.
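To give a feel for the tradeoff between the number of primary units (schools) and secondary units (students per school), here is a rough sketch using the common design-effect approximation deff = 1 + (m - 1) * rho. This is a swapped-in shortcut, not the two-stage derivation in Cochran, and it assumes equal numbers of students sampled per school. The function name, the simple random sampling size of 246, and the intraclass correlation rho = 0.05 are all hypothetical; rho would have to come from a pilot study or similar surveys.

```python
import math

def two_stage_plan(n_srs, per_school, icc):
    """Approximate two-stage plan from an SRS sample size.

    n_srs      : sample size that simple random sampling would need
    per_school : students sampled in each selected school (m)
    icc        : assumed intraclass correlation between students in a school
    Returns (design effect, number of schools, total students).
    """
    deff = 1 + (per_school - 1) * icc          # design-effect approximation
    n_total = math.ceil(n_srs * deff)          # inflate the SRS size
    schools = math.ceil(n_total / per_school)  # primary units needed
    return deff, schools, schools * per_school

# Hypothetical inputs, for illustration only
for m in (10, 20, 40):
    deff, schools, total = two_stage_plan(n_srs=246, per_school=m, icc=0.05)
    print(f"m={m:2d}  deff={deff:.2f}  schools={schools}  students={total}")
```

With these made-up numbers, sampling more students per school cuts the number of schools to visit but raises the total number of students measured, and the effect grows with the between-school variability.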
Goli Srinivasarao, I mentioned cluster or two-stage sampling above for this case, and said that for two-stage sampling there is a tradeoff between the number of primary and secondary units. Of course it is easier to collect data from the same place. But if the data you collect is expected to vary a good deal by primary unit, you may want to stratify first (usually a good idea, but very complicated here, and I don't know where you'd find it worked out) or else you might prioritize number of primary units over number of secondary units. If they don't vary a good deal, then a simple random sample of clusters, or unequal probability sample of clusters based on cluster size, might work well, and be much simpler. You may be able to guess high or low variance between primary units or clusters by the subject matter. What would you expect?
However, if you go with a simple random sample of clusters, you should do fine regardless, if the number of clusters is large enough.
A pilot study may be in order to collect preliminary information.