Some books, such as the one by Hanke and Wichern, state that the sample size for a time series should be at least 50.
However, there is no formula for determining the minimum sample size of a time series. The important thing is to choose sample points that correctly capture whatever might affect the trend of your series.
For example, suppose we are studying the average monthly temperature. Entering yearly information would be misleading. Considering monthly information over a very long period could be misleading too. Thus, increasing the sample size of a time series is not necessarily a good thing.
So if you choose your sample points correctly, the series is still valuable as long as its trend is close to currently known ones.
Yes, I agree with at least 50, but let's see what major scholars have done. For example, none have used fewer than 80 quarterly observations (or 300 monthly observations). "Major" here refers to, for example, D.F. Hendry, and Johansen and Juselius, in econometrics. Nevertheless, it is the researcher's responsibility to increase the number of observations for the sake of the central limit theorem. Where this is not possible, one can instead note "data limitations" when reporting the findings. This allows future researchers to continue the march of scientific inquiry.
Think of the sample size as the number of pixels in an image. More pixels allow better identification of the features inside (what they are and where they are).
If you see the data as the Question and the analysis as the Answer, the number of samples dictates the precision and/or the resolution of the result. It may be a wish, a must, or a threshold.
If the analysis is purely statistical (trend, fit, ...), the precision is given by the relative dispersion sigma, which scales as 1/sqrt(N). For example, with 50 samples sigma is about 14%, and with 80 samples about 11%. Most opinion polls quote an error of about 3% simply because they are based on 1000 interviews.
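To make the 1/sqrt(N) rule of thumb concrete, here is a minimal Python sketch. The function name `relative_error` is only illustrative, and the rule assumes roughly independent samples; autocorrelated time series usually carry less information per observation, so treat the numbers as optimistic.

```python
import math


def relative_error(n_samples: int) -> float:
    """Rough 1/sqrt(N) relative error, assuming N independent samples."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 1.0 / math.sqrt(n_samples)


# Sample sizes mentioned in this thread: 50, 80, and a 1000-interview poll.
for n in (50, 80, 1000):
    print(f"N = {n:4d}: about {relative_error(n):.1%}")
```

Because the error shrinks only with the square root of N, halving it requires about four times as many samples.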
If the analysis tries to reveal an underlying parameter or property (such as the frequency content in spectral space, the scale content in wavelet space, or some Rényi dimension), the number of samples in the time domain equals the number of samples in the reciprocal space (or twice that number if the space is complex).
If you can be more specific, I can elaborate further.
I like Adrian's explanation. It is consistent with mine and, moreover, in line with the central limit theorem. Adrian's explanation communicates the idea effectively!
Several general remarks on sample size, which may also be useful for time series, are given below:
1. Research studies are usually carried out on a sample of subjects rather than on whole populations. The most challenging aspect of fieldwork is drawing a random sample from the target population to which the results of the study will be generalized.
2. The key to a good sample is that it has to be typical of the population from which it is drawn. When the information from a sample differs from that in the population in a systematic way, we say that an error has occurred. In actual practice, the task is so difficult that several types of error can occur: sampling error, non-sampling error, response error, processing error, ...
In addition, the most important error is the sampling error, which is statistically defined as the error caused by observing a sample instead of the whole population.
3. The underlying principle that must be followed if we are to have any hope of making inferences from a sample to a population is that the sample be representative of that population. A key way of achieving this is through the use of "randomization". There are several types of random sampling, some of which are: simple random sampling, stratified random sampling, double-stage random sampling, ... The most important is the simple random sample, which is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen. To reduce the sampling error, simple random sampling and a large sample size should be used.
5. The following factors strongly affect the sample size and need to be identified:
Population size,
Margin of error,
Confidence level (level of significance) and
Standard deviation.
6. The Cochran formula allows you to calculate an ideal sample size given a desired level of precision, a desired confidence level, and the estimated proportion of the attribute present in the population.
7. Then the sample size may be estimated by:
Necessary Sample Size = (z-score or t-value)^2 * StdDev * (1 - StdDev) / (margin of error)^2.
Here "StdDev" plays the role of the estimated proportion p in the Cochran formula; 0.5 is the most conservative choice.
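As a rough illustration of the formula in point 7, the sketch below reads the "StdDev" term as the estimated proportion p from point 6. The function name `cochran_sample_size` and the example inputs (z = 1.96 for 95% confidence, p = 0.5, a 5% margin of error) are assumptions chosen for illustration, not values from this thread. It also leaves out any finite-population correction.

```python
import math


def cochran_sample_size(z: float, p: float, margin_of_error: float) -> int:
    """Sample size n = z^2 * p * (1 - p) / e^2, rounded up to a whole subject."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if margin_of_error <= 0.0:
        raise ValueError("margin_of_error must be positive")
    n = (z ** 2) * p * (1.0 - p) / (margin_of_error ** 2)
    return math.ceil(n)


# Illustrative inputs only: 95% confidence (z = 1.96), p = 0.5, 5% margin.
print(cochran_sample_size(z=1.96, p=0.5, margin_of_error=0.05))  # prints 385
```

Taking p = 0.5 maximizes p * (1 - p), so it gives the largest (safest) sample size when the true proportion is unknown.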
Thank you so much, Prof. Zuhair. By the way, which textbook do you recommend that I can cite for the above information? I am a PhD candidate writing my proposal.
The other question I had, Prof. Zuhair, is how to calculate the sample size using the effect size and the power. Also, how is sample size related to parametric and non-parametric statistics? Thank you for your assistance.