One of the best approaches to estimating sample size is a power calculation, as described by Cohen (1988). The terms small, medium, and large refer to the effect size between the variables under study. Accordingly, one needs a large sample to detect a small effect size, a moderate sample for a moderate effect, and a small sample for a large effect size.
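As an illustration of how effect size drives the required sample, here is a minimal sketch of a power-based sample-size calculation for comparing two group means. It uses the normal approximation (exact t-based calculations give slightly larger numbers), and assumes scipy is available:

```python
from math import ceil
from scipy.stats import norm

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample comparison
    of means (Cohen's d as effect size), via the normal approximation:
    n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2 per group."""
    z_alpha = norm.ppf(1 - alpha / 2)  # two-sided test
    z_beta = norm.ppf(power)
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# Cohen's conventions: small d = 0.2, medium d = 0.5, large d = 0.8.
# The smaller the effect, the larger the required sample.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

For a medium effect (d = 0.5) at alpha = .05 and 80% power, this gives roughly 63 per group, close to the figure of about 64 that exact t-based software reports.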
I observe that the terms large and small sample size refer to the convergence of the sampling distribution of a statistic to a particular probability distribution. Accordingly, there is no definite value for the size of a large or a small sample; it depends on the statistical tool used and the complexity of your model.
For example, for a very simple model such as a mean (under the normality assumption), the distribution of the t-statistic, which is Student's t-distribution, converges to (gets close to) the standard normal distribution at sample sizes larger than 30. Hence, when testing a mean based on a sample of 30 or more, we can use the standard normal distribution. We often call this the large-sample case.
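The convergence is easy to see numerically: the two-sided 5% critical value of Student's t shrinks toward the normal value of 1.96 as the degrees of freedom grow. A quick check, assuming scipy is available:

```python
from scipy.stats import norm, t

# Two-sided 5% critical values: Student's t approaches the normal 1.96
z_crit = norm.ppf(0.975)  # about 1.96
for df in (5, 10, 30, 100):
    # df = 30 already gives about 2.04, within 0.1 of the normal value
    print(f"df = {df:>3}: t crit = {t.ppf(0.975, df):.3f} (z = {z_crit:.3f})")
```

This is why 30 is the conventional (if somewhat arbitrary) cutoff: beyond it, the difference between the t and normal critical values is small in practice.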
Another example is structural equation modeling, which may accommodate a very complex model; here there are several rules of thumb for what counts as a large sample. For the maximum likelihood approach, one rule says a sample is large if it is at least 5 times the number of observed variables; another says 10 times. For the weighted least squares approach, 800 or 1,000 is considered large.
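These rules of thumb are simple enough to encode directly. A small sketch (the 5x/10x ratios are the ones quoted above, not a universal standard):

```python
def sem_ml_min_n(n_observed_vars, ratio=5):
    """Rule-of-thumb minimum sample size for ML estimation in SEM:
    'ratio' cases per observed variable (5 and 10 are the ratios
    commonly quoted)."""
    return ratio * n_observed_vars

# A model with 20 observed variables:
print(sem_ml_min_n(20))            # conservative 5x rule -> 100 cases
print(sem_ml_min_n(20, ratio=10))  # stricter 10x rule    -> 200 cases
# For weighted least squares, 800-1000 cases are often cited instead,
# regardless of the number of observed variables.
```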
I haven't heard of a "very large" sample size, probably because it does not make much sense: the purpose of a sample is to make inferences. Small samples are those under 30, as this is the threshold above which one can use the normal distribution to make statistical inferences. Also, do not confuse the notion of sample size with the minimum observation requirements for regression mentioned in the post above.
I think Adrian is right that this differentiation cannot be found in the literature. But I agree with Yusep: 30 can be seen as a lower-bound threshold for samples.
From my observation, sample sizes between 1,000 and 5,000 are taken as "large" for general population surveys (e.g. the General Social Surveys in the United States or Germany). These samples can be used for inferential statistics. A "very large" sample would be, for example, the annual German Micro Census, which covers 1% of all German households. That means it comprises about 500,000 cases (households) with about 800,000 people. This sample size has the advantage that it also allows inferential statistics for subsets of the sample, which usually cannot be done with merely "large" samples.
Take, for example, an employment survey. You can, with the appropriate care, infer from the income distribution in the sample to the income distribution in the population. (This has been shown to be a valid step in a number of countries; regular censuses are taken as benchmarks.) Now, if you want to analyse, say, the income distribution in service jobs in the tourism sector, you need a "very large" sample, or a "large" sample drawn from all employees in this sector.
Link to a description of the German Micro Census with information on sampling quality:
Hi Yusep, I agree with the previous answers: there is no fixed definition for these terms, and the appropriate size depends very much on your purpose. A statistical power analysis should ideally determine your sample size, but we don't always have that luxury and have to work with the data we've got. The only thing I wanted to add is that although a very large data set, say thousands of cases, may seem like a real luxury, one has to be careful: if we adhere to the commonly accepted critical p-values of .05 or .01, we might find a ton of "significant" results in large data sets that are associated with very small effect sizes of little real-world importance. There are many reasons we should all look beyond mindless use of p-values, and working with very large data sets is another reminder.
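The point about significance without importance is easy to demonstrate by simulation: with a huge sample, even a trivially small true difference between groups produces a tiny p-value. A sketch, assuming numpy and scipy are available:

```python
import numpy as np
from scipy.stats import ttest_ind

# Two groups whose true means differ by only d = 0.02 standard
# deviations -- a negligible effect in any practical sense.
rng = np.random.default_rng(0)
a = rng.normal(0.00, 1.0, size=1_000_000)
b = rng.normal(0.02, 1.0, size=1_000_000)

t_stat, p_value = ttest_ind(a, b)
pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = (b.mean() - a.mean()) / pooled_sd

# The p-value is astronomically small, yet the effect size is tiny:
print(f"p = {p_value:.2e}, Cohen's d = {cohens_d:.3f}")
```

With a million cases per group, the test is virtually certain to reject at p < .05, which is exactly why effect sizes, not p-values alone, should guide interpretation in very large data sets.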
I agree with Amir that there is no exact definition of "small sample" or "large sample". It depends on the nature of the study. For example, if one is testing a mean (average), then 30 or more is considered a large sample, and the z statistic is used, assuming the distribution is normal or approximately normal. Less than 30 is considered a small sample, and the t statistic is used.
Determining the sample size involves both resource and statistical issues. Usually, researchers regard 100 participants as the minimum sample size when the population is large. However, in most studies the sample size is determined effectively by two factors: (1) the nature of the data analysis proposed, and (2) the estimated response rate.
For example, if you plan to use linear regression, a sample size of 50 + 8k is required, where k is the number of predictors. Some researchers believe it is desirable to have at least 10 respondents for each item being tested in a factor analysis. Further, up to 300 responses is not unusual for Likert scale development, according to other researchers.
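These rules of thumb (the 50 + 8k regression rule is usually attributed to Green, 1991) translate into a few lines of code. A minimal sketch:

```python
def regression_min_n(k):
    """Rule of thumb for multiple regression (Green, 1991):
    at least 50 + 8k cases for k predictors."""
    return 50 + 8 * k

def factor_analysis_min_n(n_items, per_item=10):
    """Rough rule: about 10 respondents per item being factor-analysed."""
    return per_item * n_items

print(regression_min_n(5))        # 5 predictors -> 90 cases
print(factor_analysis_min_n(30))  # 30 items -> 300 respondents
```

Note that these are floors driven by the planned analysis; the estimated response rate then inflates the number of people you must actually approach.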
Another method of calculating the required sample size is to use the Power and Sample Size program (www.power-analysis.com).
In order to answer this question/problem, several points have to be considered.
1. General remarks:
Research studies are usually carried out on samples of subjects rather than whole populations. The most challenging aspect of fieldwork is drawing a random sample from the target population to which the results of the study will be generalized. The key to a good sample is that it must be typical of the population from which it is drawn. When the information from a sample differs from that in the population in a systematic way, we say that error has occurred. In practice the task is difficult, and several types of error can arise: sampling error, non-sampling error, response error, processing error, …
The most important of these is sampling error, which is statistically defined as the error caused by observing a sample instead of the whole population. The underlying principle that must be followed if we are to have any hope of making inferences from a sample to a population is that the sample be representative of that population. A key way of achieving this is through "randomization". There are several types of random samples, some of which are: simple random sampling, stratified random sampling, and two-stage random sampling. The most important is the simple random sample, which is selected in such a way that every possible sample of the same size is equally likely to be chosen. To reduce sampling error, the simple random sampling technique should be combined with a large sample size.
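For concreteness, drawing a simple random sample is a one-liner in most languages. A minimal sketch with hypothetical subject IDs (the population size and sample size here are illustrative, not prescriptive):

```python
import random

# A simple random sample: every subset of the chosen size is equally
# likely to be drawn. Here we take 30 subject IDs from a population
# of 1,000 (sampling without replacement).
population = list(range(1, 1001))  # hypothetical subject IDs
rng = random.Random(42)            # seeded only for reproducibility
sample = rng.sample(population, k=30)

print(sorted(sample))
```

Stratified sampling would instead apply `rng.sample` separately within each stratum (e.g. region or age group), which guarantees that every stratum is represented.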
2. Specific remarks:
The following factors strongly affect the sample size and need to be identified:
Population size,
Margin of error,
Confidence level (level of significance), and
Standard deviation (for a proportion, the estimated proportion p).
Then, the sample size can be estimated by:
Necessary sample size = (z-score or t-value)^2 * p * (1 - p) / (margin of error)^2,
where p is the estimated population proportion; p = 0.5 gives the most conservative (largest) sample size.
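The formula above is straightforward to compute. A minimal sketch, assuming scipy is available for the z-score lookup:

```python
from math import ceil
from scipy.stats import norm

def required_n(confidence=0.95, margin_of_error=0.05, p=0.5):
    """Sample size for estimating a proportion:
    n = z^2 * p * (1 - p) / e^2, with p = 0.5 as the most
    conservative (largest-n) choice."""
    z = norm.ppf(1 - (1 - confidence) / 2)  # two-sided z-score
    return ceil(z ** 2 * p * (1 - p) / margin_of_error ** 2)

print(required_n())  # 95% confidence, +/-5% margin -> 385
```

This is the familiar result that a +/-5% margin at 95% confidence needs roughly 385 respondents. Note the formula ignores population size; for small populations, a finite population correction can be applied to reduce n.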