The online calculators are often for yes/no data, assume simple random sampling, assume the worst case where p = q = 0.5, and apply no finite population correction factor. However, it sounds like you have an idea of what p is, and such an initial estimate can give you a better result (a smaller estimated sample size requirement). Here, without the finite population correction (fpc) factor, an estimated variance of p (or q) is pq/(n-1), as shown on page 52 of Cochran, W. G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons.
On pages 75 and 76 of Cochran (1977), you will see that a "first approximation" for this case of simple random sampling gives you an estimate of n equal to pq/V, where V is your goal for the variance of the sample proportion (i.e., the standard error squared). There is much more in Cochran on this.
If you check yourself with one of those calculators which use p=q, the sample size the calculator suggests should be larger.
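If it helps, here is a rough Python sketch of that first approximation (my own illustrative code, not anything from Cochran or from an online calculator; the variance goal V below is just an example value), assuming simple random sampling and no fpc:

```python
# Cochran-style first approximation for simple random sampling, no fpc:
# n is roughly pq/V, where V is the target variance of the sample proportion.

def first_approx_n(p, V):
    return p * (1 - p) / V

V = 0.025 ** 2  # e.g., aiming for a standard error of about 0.025
print(round(first_approx_n(0.8, V)))  # about 256 with an anticipated p of 0.8
print(round(first_approx_n(0.5, V)))  # 400 in the worst case p = q = 0.5
```

As you can see, a reasonable prior guess at p can bring the estimated requirement well below the worst-case figure.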
Note that if you can stratify your population, an even smaller sample size will give you the same overall variance. But if you need to report for subpopulations, you may need larger sample sizes from each group.
Don't forget that you can bias results and add to variance if you are not careful in your data collection.
This all applies to each yes/no question on your survey individually. Some proportions will therefore be more accurately estimated than others with the same sample size and design.
More advanced methods using other auxiliary data might be helpful, but I am not an expert on more advanced methods for proportions, and I think the above references may be all you need here.
If you do not have a preliminary guess for p to use in estimating sample requirements for a key question, a pilot study could help. That could also help work out other issues such as data collection logistics ahead of the survey. Just a thought.
Please see (1) Leslie Kish, Survey Sampling, New York: John Wiley & Sons, and (2) C. A. Moser and G. Kalton, Survey Methods in Social Investigation, Section 7.1 in Chapter 7.
I used Survey System's link (https://www.surveysystem.com/SSCALC.HTM). Using a confidence interval of 9.15 and a confidence level of 95%, I got a sample size of 100 for a total population of 778. Does this make sense?
See my response above: Preliminary for simple random sampling is n = "...pq/V, where V is your goal for a variance of the sample proportion (i.e., standard error squared)."
I have not plugged it in - using my phone - but the worst case has p=q=0.5.
PS -
Note again that V is the square of the (estimated) standard error, se.
Half your confidence interval is d. Use t or z appropriate for your "confidence."
So t = d/se, or se = d/t.
Then n = pq/V = pq/(d/t)^2.
Are you saying 9.15 is 2d? That does not work here.
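In code form, the algebra in this PS looks like the sketch below (again, my own illustrative code, ignoring the fpc; the numbers are only there to show the arithmetic):

```python
import math

# se = d/t, V = se^2, and n = pq/V = pq/(d/t)^2 (no fpc applied here).
def n_needed(p, d, t=1.96):
    return p * (1 - p) / (d / t) ** 2

# Going the other way, the half-width d that a given n delivers:
def half_width(p, n, t=1.96):
    return t * math.sqrt(p * (1 - p) / n)

print(round(n_needed(0.5, 0.0915)))    # about 115 if 0.0915 is d itself
print(round(half_width(0.5, 100), 3))  # 0.098: the d that n = 100 gives at p = 0.5
```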
Thanks for your reply, James, but that's a bit too complicated for me, sorry! I'm looking for the simplest calculation so I can use the results of my survey in a report. As I said, the answers were pretty predictable and 80% of participants replied the same so I don't need any advanced statistics, just make sure that my results are statistically relevant :)
I checked on this calculator, and it appears that its version of a "confidence interval" means that you estimated whether a population value of 50% would fall into the range 40.85% to 59.15%. Is that what you intended to do?
A way of interpreting what David said is that if you have any questions for which the point estimate of p is about 0.5, then the confidence interval will be about plus or minus 0.09, so p would be between about 41% and 59% for your 95% confidence interval and that sample size.
For the same sample size, if the point estimate of p turns out to be substantially larger or smaller than 0.5, then your 95% confidence interval will be substantially shorter in width.
Note that this is not a really large population size for the sample size you contemplate, and the finite population correction (fpc) factor starts to matter. That means you would require a smaller sample size than calculated above, even for p approximately 0.5, and the fpc would be especially helpful if you want a shorter confidence interval.
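For a sense of how much the fpc can matter with a population of this size, here is a small sketch of the usual adjustment (my own code; the plus-or-minus 5 point target below is just an example, not your 9.15 figure):

```python
# Fold the finite population correction (fpc) into a required sample size.
# n0 is the simple-random-sampling size ignoring the fpc (n0 = pq/V);
# the adjusted requirement is n0 / (1 + (n0 - 1)/N).

def n_with_fpc(p, d, N, z=1.96):
    V = (d / z) ** 2                 # target variance of the sample proportion
    n0 = p * (1 - p) / V             # first approximation, "infinite" population
    return n0 / (1 + (n0 - 1) / N)   # fpc-adjusted requirement

# Worst case p = 0.5, +/- 5 points, N = 778:
print(round(0.25 / (0.05 / 1.96) ** 2))   # about 384 without the fpc
print(round(n_with_fpc(0.5, 0.05, 778)))  # about 257 with it
```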
When you say your answers were "predictable," I assume you have a better guess than p = 0.5. But you can underestimate your confidence by using that calculator.
When you get your results, you should estimate confidence intervals for the data obtained. That means the online calculator you found won't serve you, unless all of your questions really do have about half yeses and half no answers. You may find a better online calculator. There are others. I think I saw one once which allowed you to enter estimated p. Finding one that accounts for the fpc might also be possible. Maybe.
If you are saying that you already collected data from 100 randomly selected members of your population, and you cannot see from the above how to do a better job of this, then you can say that you are at least 95% "confident" that none of your estimated proportions are off by more than about nine "points," but some could be a good deal better, namely those with estimated p close to 1 or 0. That may not be very satisfactory, but at least you can say that.
Next time it would be better to plan ahead and see if your budget can include a good statistician. You probably want a larger sample size, unless you can live with lower 'confidence.' The fpc should probably be considered. Also, data quality needs to be maintained.
The statistical evaluation needs to be considered in the planning stage of your projects.
Best wishes - Jim
PS - Roughing it out, I think it looks like the fpc in your case means bringing 9.1 "points" down to about 8.5. With a larger sample size, the difference will become greater, quickly.
I agree with all of James's advice, including the observation that 778 is not a large population, since your sample size is quite a considerable proportion of the universe. Don't forget that those statistical principles assume an "infinite" population.
I assume that your units of analysis are the 778 supermarkets, not some kind of sub-units (customers, managers, etc.) inside them. In the latter case, the sample design would be a bit more complicated.
It is indeed 100 supermarkets out of 778. I asked them whether they had food waste reduction routines in place or not. I already knew most of them did, but wanted to know to what extent. It turned out 86% of supermarkets said yes.
... for the fpc adjustment, I think you just multiply d by sqrt(678/778), i.e., sqrt((N-n)/N).
... Last note:
Roughing this out in my head, I think you get approximately a 95% confidence interval of something like 0.83 to 0.89 (around the point estimate of 0.86), that is, 0.03 either side of 0.86, rather than 0.09 either side ... so that looks substantially better. But my arithmetic estimates in my head may be off a ways. You best check that arithmetic for that algebra. - You might look at the 90% confidence interval as well. That would be shorter ... just less 'confident.'
I checked my arithmetic that I rounded, and it was off. Maybe I plugged the numbers in incorrectly too.
So ...
n=pq/(d/t)^2 means d = t sqrt(pq/n)
Here is what the arithmetic should show when these numbers are "plugged in":
Using the calculator on my phone to do the arithmetic with p=0.86, q =0.14, t=1.96, and n=100, you get d=0.068, so rounding to 0.07, the 95% confidence interval is about 0.79 to 0.93.
I applied the fpc factor, sqrt(678/778), which multiplied by 0.068 gives about 0.064, so considering that, the interval is closer to going from 0.80 to 0.92.
So, a 95% confidence interval is fairly long here, even when p=0.86.
I looked to see what I would get using p=q=0.5, like the online sample size estimation "calculator" used in this thread above, and got d=0.098, but when I applied the finite population correction factor (fpc), I got 0.0915, which is what that "calculator" had ... so that particular online "calculator" apparently does account for the fpc.
If you need 95% "confidence," then I doubt that an interval as wide as 0.80 to 0.92 is very satisfactory. If you look at the 90% confidence interval here, you have less "confidence," but the interval is shorter. You can find it by using t = 1.645 instead of 1.96 above.
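For anyone who wants to reproduce that arithmetic without a hand calculator, here is a small sketch in Python (my own code, using d = t sqrt(pq/n) and the fpc factor sqrt((N-n)/N) as above; it is not what the online calculator runs):

```python
import math

def half_width_fpc(p, n, N, t):
    """Approximate half-width of a confidence interval for a proportion,
    with the finite population correction sqrt((N - n)/N) applied."""
    d = t * math.sqrt(p * (1 - p) / n)  # d = t * sqrt(pq/n)
    return d * math.sqrt((N - n) / N)   # fpc adjustment

p, n, N = 0.86, 100, 778
d95 = half_width_fpc(p, n, N, 1.96)   # about 0.063
d90 = half_width_fpc(p, n, N, 1.645)  # about 0.053
print(f"95% CI: {p - d95:.2f} to {p + d95:.2f}")  # about 0.80 to 0.92
print(f"90% CI: {p - d90:.2f} to {p + d90:.2f}")  # about 0.81 to 0.91
```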
I'm thinking that you probably should have a bigger sample size next time, if you can reasonably collect it.
.....
PS - Well, what do you know! I looked at that online calculator again, and it has a second window which does just what you want! Huh!! I did not expect that. - In the attached, I filled out what you had: n=100, N=778, p approximately 0.86, wanting a 95% confidence interval, and got 86% +/- 6%, as above. Well, it shows 6.35, not just 6, but the distribution is not completely normal, and you could have some other inaccuracy. By the way, it is really z=1.96, not quite t, though with these distributions they may be close. Perhaps this calculator even uses t instead of z.
At any rate, the 6.35 obtained from that online "calculator" is confirmed by the 6.4 that I found above in this response.
Using the calculator on the site you provided, with an estimate of 15% for a sample of 100 out of a population of 778, I got a confidence interval of 6.5% -- in other words there is a 95% chance that your actual percentage is between 8.5 and 21.5%.
That's good. Paula and others should be able to follow the way you laid it out and get results that are not on that calculator, such as the 90% confidence interval I gave a few responses back in this thread. Since Paula only gave two digits (0.86) here, I gave fewer "significant" digits, but other than that, we have the same results.
I was surprised that the online calculator included the fpc, but then, that isn't too hard. However, it is still a problem with online calculators that they usually cover nothing but simple random sampling, and usually only proportions, and often that is not clear to users. Many times on ResearchGate I have seen people using them, or trying to use them, as if they were magic, applicable to everything. But this one at least gives some explanation. I barely glanced at it, but it does seem better than most.
Still, everyone, these calculators should not be used without the user knowing the assumptions and applicability. Not everything is a yes/no question; in fact, I hardly saw anything like that at all in my own work, and I'm retired now. I have seen another "formula" that many people try to use for all continuous data. This is a problem. The field of statistics is not just plug-and-play "formulas." You need to understand what you are doing, in particular the reasons and methodologies.
Perhaps Paula should have stratified her population and done better, for example. That might have been possible, for all we know here.
You are right. People need to understand the principles behind those formulas. I did my calculations on an Excel sheet. I've never used an online calculator.
Determining the sample size involves resource and statistical issues. Usually, researchers regard 100 participants as the minimum sample size when the population is large. However, in most studies the sample size is determined effectively by two factors: (1) the nature of the data analysis proposed and (2) the estimated response rate.
For example, if you plan to use linear regression, a sample size of 50 + 8K is suggested, where K is the number of predictors. Some researchers believe it is desirable to have at least 10 respondents for each item being tested in a factor analysis. Further, up to 300 responses is not unusual for Likert scale development, according to other researchers.
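Those are heuristics rather than exact requirements, but as a small illustration of how they are typically applied (illustrative code only; the example inputs are made up):

```python
# Rule-of-thumb minimum sample sizes mentioned above (heuristics only).

def regression_rule(k):
    """50 + 8K rule of thumb for linear regression with K predictors."""
    return 50 + 8 * k

def factor_analysis_rule(items, per_item=10):
    """Roughly 10 respondents per item being tested in a factor analysis."""
    return items * per_item

print(regression_rule(5))        # 90 respondents for 5 predictors
print(factor_analysis_rule(20))  # 200 respondents for a 20-item instrument
```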
Another method of calculating the required sample size is using the Power and Sample size program (www.power-analysis.com).