The aim of statistical testing is to detect a difference when it actually exists. Sample size is important because a larger sample increases the chance of detecting a real difference (the statistical power), but a smaller sample means less time keying in data (and less money, when time means money).
Most of the time, I use all my students in my research, so my sample size may be 40 to 100, depending on whether I'm teaching a larger lecture class or smaller tutorial classes. Thanks.
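To put rough numbers on the trade-off for class sizes like these, here is a minimal simulation sketch. It assumes two equal groups drawn from normal populations, a hypothetical true difference of 0.5 standard deviations, and a normal approximation for the critical value. The function name `estimate_power` and all parameter values are illustrative, not taken from the question.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def estimate_power(n_per_group, effect_size, reps=1000, alpha=0.05, seed=1):
    """Fraction of simulated studies in which a true difference is detected."""
    rng = random.Random(seed)
    # Normal approximation to the critical value; a t-test is slightly stricter.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    detected = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        standard_error = sqrt(
            variance(control) / n_per_group + variance(treated) / n_per_group
        )
        z = (mean(treated) - mean(control)) / standard_error
        if abs(z) > critical:
            detected += 1
    return detected / reps


# Total samples of 40 and 100 students, split into two equal groups.
for n in (20, 50):
    print(f"n per group = {n}: estimated power = {estimate_power(n, 0.5):.2f}")
```

With these assumed settings the larger class should detect the difference noticeably more often than the smaller one. The exact figures depend on the effect size you expect, which only you can judge for your own students.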
The dispersion of the student results for each combination of n and N nicely illustrates two central principles of sampling. First, the sample results depend on the specific random sample chosen and therefore are subject to sampling error. Second, the fact that there is a distribution of the sample results allows us to draw inferences from a single sample. In particular, the sample proportion is probably close to the population success probability. This observation sets up my promise that we will later see exactly how to quantify that probability—both theoretically and by using a lot more random samples than are currently on the board. Appendix IV shows a homework assignment that brings out these details.
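The two principles can be reproduced with a short simulation. This is only a sketch: it assumes n is the sample size and N the population size, draws samples without replacement, and uses a hypothetical success share of 0.3. None of these values come from the classroom exercise itself.

```python
import random
from statistics import mean, pstdev


def sample_proportions(population_size, success_share, sample_size,
                       n_samples=1000, seed=1):
    """Draw many random samples and return the success proportion of each."""
    rng = random.Random(seed)
    n_success = round(population_size * success_share)
    population = [1] * n_success + [0] * (population_size - n_success)
    return [
        sum(rng.sample(population, sample_size)) / sample_size
        for _ in range(n_samples)
    ]


for N in (200, 2000):          # population sizes (assumed)
    for n in (10, 50):         # sample sizes (assumed)
        props = sample_proportions(N, 0.3, n)
        print(f"N={N:5d} n={n:3d} mean={mean(props):.3f} spread={pstdev(props):.3f}")
```

Each sample gives a different proportion (sampling error), yet the proportions cluster around the population value, and the cluster tightens as n grows.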
Do you require more than 5 elephants and more than 5 mice to show they substantially differ in body size?
Are 5 randomly sampled elephants representative of the body size of the world's elephant population? This will depend on the scale of analysis and the type of study you are involved in. For instance, fossil samples consisting of only a few specimens have been used to define new extinct species, haven't they?
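A general rule of thumb, based on the central limit theorem (the normal distribution approximation), is that the mean of a sample with N>30 is treated as approximately normally distributed, as it would be for N-->infinity. How good the approximation is depends on how skewed the underlying data are.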
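How well the rule of thumb holds can be checked by simulation. The sketch below assumes a strongly skewed (exponential) population, chosen only for illustration, and measures how skewed the distribution of sample means still is at N = 5, 30 and 200. A skewness near 0 would indicate a near-normal shape.

```python
import random


def skewness(values):
    """Moment-based skewness: 0 for a symmetric distribution."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def skewness_of_sample_means(sample_size, reps=4000, seed=1):
    """Skewness of the means of many samples from an exponential population."""
    rng = random.Random(seed)
    means = [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(reps)
    ]
    return skewness(means)


for n in (5, 30, 200):
    print(f"N = {n:3d}: skewness of sample means = {skewness_of_sample_means(n):.2f}")
```

For this population the skewness shrinks steadily as N grows but is not yet zero at N = 30, so N>30 is a convenient threshold rather than a point where the sample behaves exactly as if N were infinite.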
Dear Ed, in relation to this question, I often find that my large samples (n=200) aren't normally distributed. But when I have smaller samples (n=50), they are often normally distributed! So should I start using smaller samples like n=50, and use the parametric tests that 'certain reviewers' may prefer? I used the Mann-Whitney test when the data weren't normally distributed. Have you experienced reviewers being biased towards parametric tests?
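One possible explanation is that a normality test has more power in a large sample, so the same mild departure from normality is flagged at n=200 but missed at n=50. The sketch below illustrates this under stated assumptions: the population is a mildly skewed gamma distribution (hypothetical), and the Jarque-Bera test is swapped in because it can be written with the standard library alone. It is not necessarily the test used in the question, and its p-value is only an asymptotic approximation.

```python
import random
from math import exp


def jarque_bera_p(values):
    """Approximate p-value of the Jarque-Bera normality test."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # Survival function of a chi-square with 2 degrees of freedom.
    return exp(-jb / 2.0)


def rejection_rate(sample_size, reps=500, alpha=0.05, seed=1):
    """Share of samples from the SAME skewed population flagged as non-normal."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(reps):
        sample = [rng.gammavariate(8.0, 1.0) for _ in range(sample_size)]
        if jarque_bera_p(sample) < alpha:
            rejected += 1
    return rejected / reps


for n in (50, 200):
    print(f"n = {n:3d}: flagged as non-normal in {rejection_rate(n):.0%} of samples")
```

If this is what is happening, the smaller sample is not "more normal"; the test simply has less chance of noticing the departure, so shrinking the sample would not make a parametric test more appropriate.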
The key is not always the sample size, but how the population has been sampled. Is the sampling method adjusted to the question asked?
Example:
You want to quantify the macrogeographic variation of clutch size in a bird.
You can take 1000 measurements from one southern population and one measurement from one northern population, or you can take 5 measurements from each of these populations. Which sampling method will give the best picture of the scale of variation you want to study?
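The comparison can be made concrete with a small simulation. This is a sketch with invented numbers: the population means (5 and 7 eggs) and the standard deviation (1 egg) are hypothetical, and clutch sizes are treated as continuous for simplicity.

```python
import random
from statistics import pstdev

# Hypothetical population values, for illustration only.
SOUTH_MEAN, NORTH_MEAN, CLUTCH_SD = 5.0, 7.0, 1.0


def estimated_difference(rng, n_south, n_north):
    """North-minus-south difference in mean clutch size from one survey."""
    south = [rng.gauss(SOUTH_MEAN, CLUTCH_SD) for _ in range(n_south)]
    north = [rng.gauss(NORTH_MEAN, CLUTCH_SD) for _ in range(n_north)]
    return sum(north) / n_north - sum(south) / n_south


def spread_of_estimate(n_south, n_north, reps=500, seed=1):
    """How much the estimated difference varies from survey to survey."""
    rng = random.Random(seed)
    estimates = [estimated_difference(rng, n_south, n_north) for _ in range(reps)]
    return pstdev(estimates)


print(f"1000 south + 1 north: spread = {spread_of_estimate(1000, 1):.2f}")
print(f"   5 south + 5 north: spread = {spread_of_estimate(5, 5):.2f}")
```

Under these assumptions the 5 + 5 design estimates the north-south difference more precisely than the 1000 + 1 design, despite using 10 measurements instead of 1001, because the single northern measurement dominates the uncertainty.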
Thanks Marcel, Ed. No, I don't do the kind of 'unbalanced sampling' described by Marcel. (Equal numbers from a northern and a southern population will surely be better.) My sample is always all the students that I teach, because I only have permission to use my own students. Yes Ed, I have learned a lot about stats on RG threads.
The way a data distribution is defined may depend on the background knowledge of whoever analyses the data. For instance, an ornithologist observes that clutch size in a tropical avian model species varies between 4 and 8 eggs, and obtains a bell-shaped frequency distribution when the X-axis is scaled as 4, 5, 6, 7 and 8. However, a fish specialist may not define the X-axis for the avian clutch sizes in the same way. Fish females usually lay between 10 and >1000 eggs. A fish biologist unfamiliar with clutch sizes in tropical birds might justify an X-axis that pools clutch sizes into classes of 1-10, 11-20, 21-30, etc., based on data from fish, poultry science or private observations. Domestic chickens lay an egg every 2-3 days, except during short-day winter periods, so why not define an avian clutch size axis with values exceeding 100 eggs? The fish biologist might then place all the clutch size data from the tropical bird in a single class (class 1-10) on an X-axis running from 0 to >100 eggs per clutch. Thus the data distribution, and the subsequent statistical analyses of the same data set, might vary depending on the baseline knowledge available.
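The effect of the axis definition is easy to show on a toy data set. The clutch counts below are invented for illustration, and `bin_counts` is a hypothetical helper that groups values into classes of a chosen width starting at 1.

```python
from collections import Counter

# Hypothetical clutch sizes for a tropical bird: 36 clutches of 4 to 8 eggs.
clutch_sizes = [4] * 3 + [5] * 8 + [6] * 14 + [7] * 8 + [8] * 3


def bin_counts(values, width):
    """Count values per class (low, high), with classes 1..width, width+1.., etc."""
    counts = Counter()
    for v in values:
        low = ((v - 1) // width) * width + 1
        counts[(low, low + width - 1)] += 1
    return dict(sorted(counts.items()))


print("Ornithologist's axis (width 1): ", bin_counts(clutch_sizes, 1))
print("Fish biologist's axis (width 10):", bin_counts(clutch_sizes, 10))
```

With classes of width 1 the counts rise and fall around 6 eggs; with classes of width 10 every clutch falls into the single class 1-10 and the shape of the distribution disappears.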