I have 36 data points sampled over 30 days. When I plotted the data, it did not look linear. So, may I go through nonlinear analysis with these 36 data points? Is that enough?
Sample size determination may be classified into two scenarios: (i) known population (finite population), and (ii) unknown population (infinite population). It does not matter whether the data at hand is univariate, bivariate, or time series. The function of the sample size is to guard against bias and to make the sample a good surrogate for the population.
KNOWN POPULATION
If the population is finite, the common means for minimum sample size determination is the Yamane approach. The Yamane equation is given by:
(1) n(Yamane) = N / (1 + N·e^2)
where N = population size and e is the pre-specified margin of error, often set equal to the alpha level, for instance e = 0.05 for a 95% confidence level. For example, if the population size is 20,000 and e = 0.05, the minimum sample size is 20,000 / (1 + 20,000 × 0.05^2) = 392.16, or about 400. However, this method is not efficient, i.e. the sample size tends to be large: as N grows, n approaches 1/e^2, so with e = 0.05 the sample size is effectively capped at 400.
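As a quick illustration, here is a minimal Python sketch of eq. (1); the function name yamane_n and the trial population sizes are mine, chosen only to show the cap at 400:

```python
# Yamane (1967) minimum sample size, eq. (1).
def yamane_n(N: int, e: float = 0.05) -> float:
    """Minimum sample size for a finite population of size N at margin of error e."""
    return N / (1 + N * e**2)

print(yamane_n(20_000))      # 392.16..., i.e. about 400
# As N grows, n approaches 1 / e^2 = 400 for e = 0.05:
print(yamane_n(10_000_000))  # 399.98...
```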
UNKNOWN POPULATION SIZE
If the population size is infinite (or unknown), the following equation may be used:
(2) n = (Z^2 · σ^2) / E^2
where Z = the unit normal critical value (e.g. Z = 1.645 at the 95% one-sided level, or 1.96 at the 95% two-sided level), σ = the estimated population standard deviation, which may be obtained through the Z-equation below, and E = the standard error (SE). These two terms, Z and SE, may be further defined thus:
(3) Z = (X̄ − μ) / (σ / sqrt(n))
Solve for μ first through the t-equation:
(4) t = (X̄ − μ) / (S / sqrt(n))
Solving for μ gives:
(5) μ = X̄ − t·(S / sqrt(n))
Now put μ into the Z-equation and solve for σ:
(6) σ = [(X̄ − μ) / Z] · sqrt(n)
Throughout this process, n = the initial (test) sample size. As for SE, the term may be further defined:
(7) SE = σ / sqrt(n)
In the present case, there are 36 observations; tag these 36 items as x1, x2, …, x36. Now use a computer to randomly partition them into groups of 6 items. You should end up with 6 test samples of 6 items each: n1, n2, …, n6, in a bootstrap-like form. Use the descriptive and inferential statistics of these six initial (test) samples to calculate the minimum sample size. You should obtain a sample size "less than 400," i.e. smaller than the Yamane equation would have suggested.
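A hedged Python sketch of this procedure (eqs. 2-7) follows; the 36 values are simulated stand-ins for the real data, and the choices Z = 1.96 and E = 0.5 are illustrative assumptions:

```python
# Pilot-sample estimate of the minimum sample size, eqs. (2)-(7).
# The data, Z, and E below are illustrative assumptions, not from the post.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=36)  # stand-in for the 36 observations

Z = 1.96   # two-sided 95% normal critical value (assumption)
E = 0.5    # pre-specified standard error / precision (assumption)

# Randomly partition the 36 items into six test samples of six items each.
idx = rng.permutation(36).reshape(6, 6)

n_estimates = []
for sample in x[idx]:
    n0 = len(sample)                                 # initial (test) sample size
    S = sample.std(ddof=1)                           # sample standard deviation
    t_crit = stats.t.ppf(0.975, df=n0 - 1)           # t critical value, eq. (4)
    mu = sample.mean() - t_crit * S / np.sqrt(n0)    # eq. (5)
    sigma = (sample.mean() - mu) / Z * np.sqrt(n0)   # eq. (6)
    n_estimates.append((Z**2 * sigma**2) / E**2)     # eq. (2)

print(round(np.mean(n_estimates)))  # pooled estimate; well below 400
```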
ALTERNATIVE METHOD
Break the infinite-population equation n = (Z^2 · σ^2) / E^2 into two stages. In the first step, obtain the following:
(8) n(1) = Z·σ / E
Note that the squares are dropped. In the second step, use the actual infinite-population value:
(9) n(2) = (Z^2 · σ^2) / E^2
Third step, find the difference between n(1) and n(2):
(10) n(1) – n(2)
Fourth step, obtain the median of the third step, thus:
(11) M = [n(1) – n(2)] / 2
Fifth step, put M into a bound between 0.1 and 100, i.e. a pseudo-sampling space (ω), by taking M/0.1 as the maximum and M/100 as the minimum, thus:
(12) m* = (M/0.1 − M/100)
Sixth step, find the median of m*, thus:
(13) ω = m* / 2
Lastly, the minimum sample size is simply the square root of omega, thus:
(14) n(ω) = sqrt(ω)
This number should come out at about 30; that is, the minimum sample size is about 30. For most circumstances in social science, this would be adequate.
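A literal Python sketch of steps (8)-(14); the values of Z, σ, and E are illustrative assumptions. Note that n(1) − n(2) is negative whenever n(1) > 1, so the sketch takes the magnitude before the square root:

```python
# Literal walk-through of the alternative method, eqs. (8)-(14).
# Z, sigma, E are illustrative assumptions, not values from the post.
import math

Z, sigma, E = 1.96, 2.0, 0.2

n1 = Z * sigma / E                # eq. (8): squares dropped
n2 = (Z**2 * sigma**2) / E**2     # eq. (9): usual infinite-population form
diff = n1 - n2                    # eq. (10)
M = diff / 2                      # eq. (11)
m_star = M / 0.1 - M / 100        # eq. (12): pseudo-sampling-space bounds
omega = m_star / 2                # eq. (13)
n_omega = math.sqrt(abs(omega))   # eq. (14); magnitude taken, see note above
print(n_omega)                    # ~30.2, i.e. "about 30"
```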
VERIFICATION OF DATA DISTRIBUTION
Plotting the data is only a preliminary step in determining the data distribution. It is necessary to verify the distribution through an empirical test in order to determine what type of distribution the data manifests. To that end, it is suggested that the Anderson-Darling test be used. The Anderson-Darling statistic is given by:
(15) A^2 = −n − S
where n = sample size (in this case n = 36) and S is given as:
(16) S = Σ(k=1 to n) a(k)·b(k)
where
a(k) = (2k − 1) / n
b(k) = ln F(Y(k)) + ln(1 − F(Y(n+1−k)))
with Y(1) ≤ … ≤ Y(n) the ordered sample values and F the cumulative distribution function of the hypothesized (normal) distribution.
This value is the observed statistic, A^2(obs). When the mean and variance are estimated from the data, it is adjusted per Stephens (1974) as:
(17) A* = A^2 · [1 + 0.75/n + 2.25/n^2]
and then compared with the appropriate critical value, e.g. 0.752 at the 5% significance level.
The decision rule follows:
If A* < critical value …… do not reject H(o); the data may be treated as normally distributed (a linear model may apply)
If A* > critical value …… reject H(o); the data is not normally distributed (a linear model may not apply)
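Rather than computing eqs. (15)-(17) by hand, one can use SciPy's Anderson-Darling implementation, which adjusts the critical values for sample size. A minimal sketch, with simulated values standing in for the real 36 observations:

```python
# Sketch: Anderson-Darling normality check for n = 36 observations.
# The data here are simulated stand-ins; replace x with the real values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=36)                 # placeholder for the 36 data points

result = stats.anderson(x, dist='norm')
print("A^2 =", result.statistic)
for cv, sl in zip(result.critical_values, result.significance_level):
    verdict = "reject H(o)" if result.statistic > cv else "do not reject H(o)"
    print(f"alpha = {sl}%: critical value = {cv} -> {verdict}")
```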
CHAOS
This term "chaos" is still loosely defined in the field. The tasks at hand here are: (i) there are 36 observations, does this meet minimum sample size requirement? (ii) what kind of data distribution does these 36 counts manifest? (iii) what modeling approach would best fit the type of data verified by (ii). As for the issue of chaos, defer that for now. The chances are that you might be working with stock or commodity price movement (inferred from what you said: 30 days of data collection); therefore, it may be more fruitful to engage (i), (ii) and (iii). Chaos theory may come in at the discussion section of the paper.
REFERENCES
(1) Sample Size:
-Yamane, Taro (1967). Statistics: An Introductory Analysis. New York: Harper and Row. p. 886.
-Montgomery, D. C., Runger, G. C., and Hubele, N. F. (2001). Engineering Statistics, 2nd ed. John Wiley & Sons. ISBN 0-471-38879-3. p. 172.
(2) Anderson-Darling Test:
-Anderson, T. W.; Darling, D. A. (1952). “Asymptotic theory of certain ‘goodness-of-fit’ criteria based on stochastic processes.” Annals of Mathematical Statistics 23: 193–212.
-Stephens, M. A. (1974). “EDF Statistics for Goodness of Fit and Some Comparisons.” Journal of the American Statistical Association 69: 730–737; and M. A. Stephens (1986). “Tests Based on EDF Statistics.” In D’Agostino, R.B. and Stephens, M.A. Goodness-of-Fit Techniques. New York: Marcel Dekker. ISBN 0-8247-7487-6.
Before we look into any tests: can a system of finite data (whatever the sample size) be called a 'chaotic system'? This is a valid query raised by Yair Zarmi, and I welcome it.
The number of data points depends on the analysis intended. If you would like to move in the direction of time series analysis, I agree with Dr. Nikulchev that 1000 is a starting point. In the paper Aguirre, L.A., Billings, S.A., "Retrieving Dynamical Invariants from Chaotic Data Using NARMAX Models", Int. J. Bifurcation and Chaos, 1995, 5(2):449–474, DOI: 10.1142/s0218127495000363, ISSN 0218-1274, Prof. Billings and I argued that one way around this is to estimate a model from a short data set (sometimes as short as 100 or 150 values, even for measured data) and then use the estimated model to produce a much longer series. Of course, the analysis of the long data set will NOT be any better than your model; in other words, you are analyzing the model. However, if the model is OK, then you can do something. You can find an example (using about 150 values of the sunspot time series) in: Letellier, C., Aguirre, L.A., Maquet, J., Gilmore, R., "Evidence for low dimensional chaos in sunspot cycles", Astronomy & Astrophysics, 449:379–387, 2006, DOI: 10.1051/0004-6361:20053947; and Aguirre, L.A., Letellier, C., Maquet, J., "Forecasting the time series of sunspot numbers", Solar Physics, 249(1):103–120, 2008, DOI: 10.1007/s11207-008-9160-5.
The difficulty of determining whether an observed system is chaotic is accentuated by the work of Calogero and Leyvraz. They show that you can convert any dynamical system into a periodic system which, for a while, looks like your "chaotic" system but then, after a long time, repeats itself. These are called isochronous systems.
I completely agree with L. A. Aguirre. The answer depends on the method used, but also on the research question.
For example, estimating the attractor dimension using the Grassberger–Procaccia algorithm requires a time series of length N with log N > (D2/2)·log(1/ρ), where ρ = ε/S is the fraction of the entire phase space of diameter S covered by the recurrence neighborhood of size ε. This means that, using ρ = 0.1 and decimal logarithms, finding D2 = 10 would require at least N = 100,000 data points (Eckmann & Ruelle, "Fundamental limitations for estimating dimensions and Lyapunov exponents in dynamical systems", Physica D, 1992).
Another example is estimating Lyapunov exponents: as a rough estimate, Wolf et al. claimed that the time series should have a length N of at least 10^D2 to 30^D2 (with attractor dimension D2) (Wolf et al., "Determining Lyapunov exponents from a time series", Physica D, 1985). E.g., a system with D2 = 3 requires 1,000–30,000 data points. Eckmann and Ruelle made a more exact analysis and found that the time series should satisfy log N > D2·log(1/ρ) (Eckmann & Ruelle, Physica D, 1992).
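A small Python sketch of these length requirements; the values D2 = 3 and ρ = 0.1 are illustrative assumptions:

```python
# Back-of-envelope check of the data-length requirements quoted above
# (Eckmann & Ruelle 1992; Wolf et al. 1985); D2 and rho are illustrative.
import math

D2, rho = 3.0, 0.1

# Correlation dimension (Grassberger-Procaccia): log10(N) > (D2 / 2) * log10(1 / rho)
N_dim = 10 ** (D2 / 2 * math.log10(1 / rho))
# Lyapunov exponents (Eckmann-Ruelle): log10(N) > D2 * log10(1 / rho)
N_lyap = 10 ** (D2 * math.log10(1 / rho))
# Wolf et al. rule of thumb: 10**D2 to 30**D2 points
N_wolf = (10 ** D2, 30 ** D2)

print(N_dim)   # ~31.6; for D2 = 10 this becomes 100,000, as quoted above
print(N_lyap)  # 1000.0
print(N_wolf)  # (1000.0, 27000)
```

With only 36 points, these criteria fail for anything but the smallest D2.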
Thus, using such methods will not be possible for your very short data set.
To add to the list of fancy methods that can work with short time series: recurrence plots and recurrence quantification analysis could be useful for short data (but have a look at N. Marwan, "How to avoid potential pitfalls in recurrence plot based data analysis", IJBC, 2011). Even so, your data will probably be too short for this method as well.
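For what it is worth, the basic recurrence matrix itself is easy to compute. Below is a minimal NumPy/Matplotlib sketch with a sine wave standing in for the 36 observations, no time-delay embedding, and an ad hoc threshold ε, which are exactly the kinds of shortcuts Marwan's pitfalls paper warns about:

```python
# Minimal recurrence-plot sketch for a short series (N = 36).
# No embedding; eps chosen ad hoc; sine data stand in for the real values.
import numpy as np
import matplotlib.pyplot as plt

x = np.sin(np.linspace(0, 8 * np.pi, 36))   # stand-in for the 36 observations
dist = np.abs(x[:, None] - x[None, :])      # pairwise distance matrix
eps = 0.1 * dist.max()                      # threshold: 10% of max distance
R = (dist <= eps).astype(int)               # recurrence matrix

plt.imshow(R, cmap="binary", origin="lower")
plt.xlabel("time i"); plt.ylabel("time j"); plt.title("Recurrence plot (N = 36)")
plt.show()
```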