Good evening, I'm doing my thesis in palynology and I was told that we always measure 30 pollen grains per species, but I can't find the reason for this sample size anywhere. Can someone help me with this question? Thank you.
This indeed partly belongs to folklore for some. The number of measurements (n) needed to approximate a particular mean value should be determined by investigating the variability/distribution of your particular population (see statistical handbooks). This will be (slightly) different for each plant. Are your variables normally distributed, etc.? However, for practical reasons measurements are often limited to n=30 in biology, one reason being that you often have only a limited number of pollen grains to measure (not even 30, e.g. from old herbarium specimens), and/or that you want a fixed and equal "n" for all plants being studied.
Clearly "30" is not a scientific "rule" or "law"...
The point to note is that there will be a limit on the number of plants to use beyond which the gain in accuracy may not be worth the effort and cost. That limit is about 30 plants.
The (a) "Central Limit Theorem" states that the distribution of the arithmetic mean m of n independent and identically distributed variables X1, X2, ...,Xn converges (for some meaning of the word "convergence" I won't discuss here) towards a Laplace-Gauss distribution with mean equal to the mean mu of the distribution common to the individual variables X1,..., Xn and variance equal to variance sigma2 of the distribution of individual variables divided by n (i. e. sigma2/n) when n increases to infinity (as long as mu and sigma exist).
[ Note that this supposes mu and sigma known. They are usually not known: all we can do is estimate mu by m and sigma^2 by s^2. ]
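To see what the CLT asserts in practice, here is a minimal Python sketch (not part of the original argument; the exponential distribution and all numeric values are arbitrary assumptions) that draws many samples of size n = 30 from a skewed distribution and checks that the sample means have mean close to mu and standard deviation close to sigma/sqrt(n):

import numpy as np

# Illustrative sketch only: the distribution choice (exponential) and the
# parameters below are arbitrary assumptions used to demonstrate the CLT.
rng = np.random.default_rng(0)
mu, n, reps = 1.0, 30, 10_000        # true mean, sample size, number of replicate samples

# means of `reps` independent samples of size n from Exp(mean = mu); sigma = mu for the exponential
sample_means = rng.exponential(scale=mu, size=(reps, n)).mean(axis=1)

print("mean of the sample means:", sample_means.mean())   # close to mu
print("sd of the sample means:  ", sample_means.std())    # close to sigma/sqrt(n)
print("theoretical sd:          ", mu / np.sqrt(n))       # sigma/sqrt(n) = 1/sqrt(30)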
Another well-known result is that the quotient sqrt(n)*(m - mu)/s has a distribution that converges to a Student's t distribution with n-1 degrees of freedom (under conditions similar to those needed to guarantee convergence to normality in the CLT: independence, identical distributions, existence of mu and sigma).
For n=30, the t_(n-1) distribution is practically indistinguishable from the normal distribution (as long as you stick to the densities themselves; but a look at a plot of the logs of the densities of a normal and of a t_29 will tell you that this is not the end of the story; this is important for computational reasons: think MCMC, for example...). This piece of statistical folklore has been with us at least since Fisher, and probably well before (Gosset's (Student's) paper is dated 1908).
This is the basis of an extremely useful and well-used (overused?) approximation attributing to the means of "large" (n >= 30) samples a normal (Laplace-Gauss) distribution, which enshrined n=30 as a "magical value", surpassed in holiness only by the even more magical "p=0.05" (for which Fisher didn't bother to give a rationale) in the statistical folklore...
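As a quick illustration of how close the t_29 and normal densities are in the body, and how different they are in the tails, here is a small Python sketch (not from the original answer; the grid of x values is an arbitrary choice) comparing the two densities and their logarithms:

import numpy as np
from scipy import stats

# Compare the standard normal and t_29 densities; the bodies agree closely,
# but the log-densities reveal the heavier tails of t_29.
x = np.linspace(-6, 6, 13)            # arbitrary grid for illustration
norm_pdf = stats.norm.pdf(x)
t29_pdf = stats.t.pdf(x, df=29)

for xi, p_n, p_t in zip(x, norm_pdf, t29_pdf):
    print(f"x = {xi:+.1f}   normal = {p_n:.3e}   t29 = {p_t:.3e}   "
          f"log(t29) - log(normal) = {np.log(p_t) - np.log(p_n):+.2f}")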
There is no basis that requires as many as 30 samples. Measurements are made to obtain a good estimate of the correct size, and the sample size needed varies, for example for pollen grains. The important thing to note is the accuracy of the estimation.
There is no such fixed number for a species; it is meaningless. You can take any number of pollen grains; the only requirement is that the sample size should be statistically meaningful.
If you have recorded more than 30 readings, it will help you statistically with the t-test; if you have fewer than 30 readings, you can only do an F-test. Your recordings will then be more accurate for the calculations in your experiments, and may give a very good conclusion about your research work.
Kalidass: It is not true that you cannot do a t-test with samples of less than 30. Furthermore, an F test is a t-test when comparing two groups (t-squared = F).
The use of n = 30 as a break point between small and large samples is an over-simplification, just as using 0.05 as a 'significance' level in hypothesis testing. In the latter case, a p-value is a function of sample size, and you need a power analysis or other sensitivity analysis to begin to assess what your data might support, so you should not use any one particular level for all problems. Unfortunately, people apparently noticed over time that 0.05 may turn out to be a reasonable level in many cases, and the original formulation of the problem was blurred. (You should do a power analysis, or better, use a confidence interval when appropriate; that would be more usefully interpretable.) Similarly with setting n = 30 as the breaking point for a small sample size: one might say that it is often a good number, but it has been highly overused and misused, and in particular it seems to have proliferated to different kinds of problems in different disciplines, where in some cases it may generally be too small, or even too large.
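For what a power analysis looks like in practice, here is a hedged Python sketch (the effect size, alpha, and power values are purely illustrative assumptions, not recommendations) using statsmodels for a two-sample t-test; note that for this assumed "medium" effect it already asks for more than 30 observations per group:

from statsmodels.stats.power import TTestIndPower

# Required sample size per group for a two-sided two-sample t-test;
# effect_size (Cohen's d), alpha, and power are assumed values for illustration.
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"required n per group: {n_per_group:.1f}")   # roughly 64 per group under these assumptions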
Setting n = 30 as the limit of a small sample is an over-simplified "rule-of-thumb" which ignores the real basis for making such a decision. If you can ignore bias and nonsampling error (say, measurement error), though it is often a bad idea to ignore them, you still have to consider variance in your population, so let's consider that (as a minimal consideration).
The greater the population variance, the greater the sample size needed. To understand that clearly, consider the hypothetical situation that you have a population with a variable, y, such that every value of y is the same. It would not take a large sample before you realized that a good deal more sampling is likely to be unnecessary, so maybe n = 10 could be considered large. On the other hand, if there is a lot of variance, this would not be the case.
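To make the dependence on variance concrete, here is a small sketch (the z value, the target margin of error E, and the sigma values are arbitrary assumptions) using the standard margin-of-error formula n >= (z*sigma/E)^2:

import math

# Required n for a given margin of error E at ~95% confidence, for several
# hypothetical population standard deviations; all numbers are assumptions.
z = 1.96      # ~95% confidence
E = 0.5       # desired half-width of the confidence interval

for sigma in (0.1, 1.0, 3.0):
    n = math.ceil((z * sigma / E) ** 2)
    print(f"sigma = {sigma:>4}: n >= {n}")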
Often it may be useful to obtain a few observations as a pilot test, to estimate what sample sizes would then be considered large or small for your case.
I worked with continuous data, using model-based estimation, and developed a method for estimating sample size needs (one of my featured publications), where the formula is remarkably close in format to the one found for simple random sampling (say, in Cochran (1977), Sampling Techniques). We need an estimate of sigma to estimate what is a reasonable sample size. (In finite population statistics, this is complicated slightly by the need for a finite population correction (fpc) factor. There are also considerations such as stratification.)
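As a hedged sketch of the simple-random-sampling version of such a formula (in the spirit of Cochran (1977), not the model-based method mentioned above; all numeric inputs are assumptions), including the finite population correction:

import math

# Simple-random-sampling sample size with a finite population correction (fpc);
# z, the estimated sd s, the margin of error e, and the population size N are assumed values.
z, s, e, N = 1.96, 2.0, 0.5, 200

n0 = (z * s / e) ** 2        # sample size ignoring the finite population
n = n0 / (1 + n0 / N)        # apply the finite population correction
print(f"n0 = {n0:.1f}; with fpc: n = {math.ceil(n)}")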
But the main thing you might want to consider is your population variance - and even the variance of the variance. That is, even if you have almost no variance in your population, it will take a large enough sample size to show that is the case. And there can always be a few extra-large or extra-small cases in a highly skewed distribution such as the ones with which I have dealt.
I do not pretend to understand your subject matter or kind of data collected, but it is rather universal that what should be considered a small sample will vary from application to application, and it is based on factors not the least of which is population variance.
You could start a pilot study and see how far you need to go before making your final sampling plan.
Interesting discussion. The key criterion for choosing a sample size n (= 30 here) needs to be the inherent variability of the population. In many situations, as probably here, this knowledge is not available or is difficult to get. The rule of 30 is then often used for the reasons already highlighted in the previous posts.