I'm in need of some expert help here :) How does one identify a sample size (e.g., for a survey) when using non-probability sampling methods, i.e. convenience or snowball sampling? Many thanks in advance!
The decision to choose between convenience and snowball sampling depends on the nature of your sample.
If you know your sample in terms of size and location, then choose convenience sampling. If you do not know who the appropriate people to answer your survey are, then choose snowball sampling.
The problem with most nonprobability sampling is that it can be heavily biased, and it is hard to impossible (most often impossible) to find a way to validate the accuracy of your results. Cochran and most other sampling books rely on probability of selection (design-based sampling) to avoid bias and to provide a way to estimate variance (bias squared plus variance gives you the mean square error, the square root of which is a good measure of accuracy, though you need to pay attention to both sampling and nonsampling errors). But they mostly work with unbiased, or asymptotically unbiased, estimators based on randomized sampling, concentrate on estimating variance from that randomized sampling, and leave nonsampling error aside as a separate problem, even though the two really need to be considered together and have overlapping impact.
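To make that accuracy measure concrete, here is a minimal numerical sketch; the bias and variance values are invented purely for illustration.

# Minimal numerical sketch (made-up values) of the accuracy measure above:
# mean square error = bias^2 + variance, and its square root (RMSE).
import math

bias = 2.0       # hypothetical estimator bias
variance = 9.0   # hypothetical estimator variance

mse = bias**2 + variance   # mean square error
rmse = math.sqrt(mse)      # root mean square error, a rough overall measure of accuracy

print(f"MSE = {mse:.1f}, RMSE = {rmse:.2f}")  # MSE = 13.0, RMSE = 3.61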
Cochran, pages 158-160, does, however, cover estimation for my kind of nonprobability sampling, using regression modeling, and he has Chapter 13 on "Sources of Error in Surveys," but I think that is mostly concerned with situations where you start with a random sample but have nonresponse and other problems.
But I do know that Monroe Sirken did some pioneering work in network sampling, which included some kind of variance estimate. I have only heard of it and do not know the details, but I looked on the internet and saw the following paper, which looks like it has a good reference list to get you going if you are interested in this:
But generally speaking, nonprobability sampling is not reliable for inference, and we cannot even get a good idea as to how unreliable it may be.
The exception is when you have auxiliary data, say administrative data, a past census, a census of related data, or something that covers part of a population and something else that covers the remainder (Joel Douglas's "tiers"), which can be used as regressor data. If you can use regression modeling, you can do even better than design-based sampling and estimation (a rough sketch of the idea follows the references below). See, for example:
Särndal, C.E., Swensson, B., and Wretman, J. (1992), Model Assisted Survey Sampling, Springer-Verlag, or
Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press.
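Here is the rough sketch promised above, purely hypothetical and not the method of either book: fit a simple regression of the survey variable on an auxiliary variable that is known for every unit in the population, then use the fitted model to estimate the population total. All numbers are invented.

# Illustrative sketch only: a simple model-based (regression-through-the-origin)
# estimator using auxiliary data known for every unit in the population.
import numpy as np

# Auxiliary variable x known for the whole population (e.g., an administrative size measure)
x_pop = np.array([12, 15, 9, 20, 30, 25, 18, 22, 14, 16], dtype=float)

# Survey variable y observed only for a (nonprobability) sample of units
sample_idx = [0, 3, 5, 8]                       # indices of the sampled units
y_sample = np.array([30.0, 52.0, 66.0, 35.0])   # their observed responses
x_sample = x_pop[sample_idx]

# Fit a regression through the origin (ratio-type model y = b*x + error)
b_hat = np.sum(x_sample * y_sample) / np.sum(x_sample ** 2)

# Use the model to predict every unit and estimate the population total
total_estimate = b_hat * x_pop.sum()
print(f"slope estimate b = {b_hat:.3f}, estimated population total = {total_estimate:.1f}")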
In fact, in some situations, there are advantages to strictly model-based sampling, and I have papers on that on my ResearchGate profile, including an estimation of sample size requirements.
However, my guess is that you do not have access to auxiliary data on the population. So you might want to look into whatever Monroe Sirken and his 'successors' have done, though I cannot really tell you the details, as I do not know them. But I think that at least his kind of network sampling relates to snowball sampling, though I am not clear on that.
Whatever you do, I can think of just two other ideas for you to explore: (1) stratification, and (2) validation:
Stratified random sampling is often very useful, but even if you do not have randomized sampling, stratification can help. You could think of your sample as a census with a great deal of nonresponse. In that case you could look at "response propensity" groups, which are essentially strata, designed to reduce bias. And as always, anything you can do to test/validate your results would be good. You may need to get creative there.
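As a purely hypothetical illustration of how stratification can reduce bias in a convenience sample, here is a small post-stratification sketch: respondents are grouped into strata whose population shares are assumed known (say, from a census), and the stratum means are combined using those shares rather than the raw sample proportions. All numbers are invented.

# Hypothetical sketch of post-stratification for a convenience sample.
# Population shares per stratum are assumed known; all data are invented.

# Known population proportion of each stratum
pop_share = {"18-34": 0.30, "35-54": 0.40, "55+": 0.30}

# Convenience-sample responses grouped by stratum (over-represents the 18-34 group)
responses = {
    "18-34": [4, 5, 3, 4, 5, 4, 4, 5],
    "35-54": [3, 2, 3, 4],
    "55+":   [2, 3, 2],
}

# Unweighted (raw) mean -- reflects the lopsided sample
all_vals = [v for vals in responses.values() for v in vals]
raw_mean = sum(all_vals) / len(all_vals)

# Post-stratified mean: weight each stratum mean by its known population share
strat_means = {s: sum(v) / len(v) for s, v in responses.items()}
ps_mean = sum(pop_share[s] * strat_means[s] for s in pop_share)

print(f"raw mean = {raw_mean:.2f}, post-stratified mean = {ps_mean:.2f}")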
Best wishes - Jim
PS - Thanks for showing this question to me H.E.
PPS - Cochran is a very good book, and I can give you other good references, but they have little if anything to do with convenience sampling, which is not often considered useful for inference, though Mike Brick's work may provide some ideas about exceptions to this.
Article Summary Report of the AAPOR Task Force on Non-probability Sampling
Sample size requirements for randomized sample selection and design-based inference are based on the standard deviation of the population (or in each stratum, if stratified). At the heart of sample size requirements for model-based methods you will find the standard deviation of the estimated random factors of the estimated residuals. In both cases, the standard deviation is represented by a sigma. Also, this applies to each question on a survey, and if a relatively unimportant question needs a large sample size to obtain a reasonable standard error estimate for a mean, then you may need to reconsider that question. If you try to collect more data than you can reasonably do with your resources, then you may increase nonsampling error, which will artificially increase your sigma.
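For the design-based case, the usual simple random sampling sample-size formula for estimating a mean shows where that sigma enters; the sigma, margin of error, and population size below are hypothetical values for illustration only.

# Illustrative only: the standard simple-random-sampling sample-size formula for
# estimating a mean, showing the role of sigma. All inputs are hypothetical.
import math

sigma = 1.2     # assumed population standard deviation (per question!)
margin = 0.25   # desired margin of error for the estimated mean
z = 1.96        # z-value for roughly 95% confidence
N = 5000        # population size, used for the finite population correction

n0 = (z * sigma / margin) ** 2   # initial sample size, ignoring population size
n = n0 / (1 + (n0 - 1) / N)      # finite population correction
print(f"n0 = {n0:.1f}, with fpc: n = {math.ceil(n)}")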
For a convenience sample, I suppose you could keep sampling until your estimated mean for a critical result stops changing much. I saw something from the old US Bureau of Mines, 1971, by Katherine Harding and Arthur Berger that did something related to that, I think. (Also, I have been thinking in terms of continuous data or even yes/no, but I suppose that would also apply to other data, say Likert scales, with which I did not work. I guess a Likert scale could be treated like a more complex proportion scheme.) However, just because you have a handle on variance by sampling until results (hopefully) settle down does not mean you do not have a problem with bias, which is a concern for convenience sampling if you do not have regressor data. Stratification is your best hope for controlling bias in your convenience sample.
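A crude sketch of that "sample until the estimate settles down" idea follows; the simulated data stream, stopping threshold, and minimum sample size are all made up, and note that such a rule says something about stability of the estimate, not about bias.

# Crude illustration of sampling until the running mean stabilizes.
# It only suggests when added responses stop moving the estimate; it does not control bias.
import random

random.seed(1)

def next_response():
    # Stand-in for collecting one more survey response (hypothetical data stream)
    return random.gauss(3.5, 1.0)

threshold = 0.01   # stop when the running mean changes by less than this
min_n = 30         # do not stop before a minimal number of responses
values = []
prev_mean = None

while True:
    values.append(next_response())
    mean = sum(values) / len(values)
    if prev_mean is not None and len(values) >= min_n and abs(mean - prev_mean) < threshold:
        break
    prev_mean = mean

print(f"stopped after n = {len(values)} responses, running mean = {mean:.2f}")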
Hopefully you can find ways to validate your results, or at least indicate you are "in the ballpark."
I recommend you investigate the works of Brick et al., and of Sirken, noted above.