I have 36 data points sampled over 30 days. When I plotted the data, it did not look linear. So, may I go through nonlinear analysis with these 36 data points? Is that enough?
Sample size determination may be classified into two scenarios: (i) known population (finite population), and (ii) unknown population (infinite population). It does not matter whether the data at hand is univariate, bivariate, or time series. The function of the sample size is to guard against bias and to make the sample a good surrogate for the population.
KNOWN POPULATION
If the population is finite, the common means for minimum sample size determination is the Yamane approach. The Yamane equation is given by:
(1) n(Yamane) = N / (1 + N·e^2)
where N = population size and e is the pre-specified margin of error, often set equal to the alpha level, for instance e = 0.05 for a 95% confidence level. For example, if the population size is 20,000 and e = 0.05, the minimum sample size is 20,000 / (1 + 20,000 × 0.05^2) = 392.16, or about 400. However, this method is not efficient, i.e. the sample size tends to be large: as N grows, n approaches 1/e^2, so with e = 0.05 the sample size is effectively capped at 400.
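As a quick illustration, here is a minimal Python sketch of eq. (1); the function name yamane_n and the trial population sizes are mine, chosen only to show the cap at 400:

```python
# Yamane (1967) minimum sample size, eq. (1).
def yamane_n(N: int, e: float = 0.05) -> float:
    """Minimum sample size for a finite population of size N at margin of error e."""
    return N / (1 + N * e**2)

print(yamane_n(20_000))      # 392.16..., i.e. about 400
# As N grows, n approaches 1 / e^2 = 400 for e = 0.05:
print(yamane_n(10_000_000))  # 399.98...
```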
UNKNOWN POPULATION SIZE
If the population size is infinite (or unknown), the following equation may be used:
(2) n = (Z^2 · σ^2) / E^2
where Z = the unit normal critical value (e.g. Z = 1.645 at the 95% one-sided level, or 1.96 at the 95% two-sided level), σ = the estimated population standard deviation, which may be obtained through the Z-equation below, and E = the standard error (SE). These two terms, Z and SE, may be further defined thus:
(3) Z = (X̄ − μ) / (σ / sqrt(n))
Solve for μ first through the t-equation:
(4) t = (X̄ − μ) / (S / sqrt(n))
Solving for μ gives:
(5) μ = X̄ − t·(S / sqrt(n))
Now put μ into the Z-equation and solve for σ:
(6) σ = [(X̄ − μ) / Z] · sqrt(n)
Throughout this process, n = the initial (test) sample size. As for SE, the term may be further defined:
(7) SE = σ / sqrt(n)
In the present case, there are 36 observations; tag these 36 items as x1, x2, …, x36. Now use a computer to randomly partition them into groups of 6 items. You should end up with 6 test samples of 6 items each: n1, n2, …, n6, in a bootstrap-like form. Use the descriptive and inferential statistics of these six initial (test) samples to calculate the minimum sample size. You should obtain a sample size "less than 400," i.e. smaller than the Yamane equation would have suggested.
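A hedged Python sketch of this procedure (eqs. 2-7) follows; the 36 values are simulated stand-ins for the real data, and the choices Z = 1.96 and E = 0.5 are illustrative assumptions:

```python
# Pilot-sample estimate of the minimum sample size, eqs. (2)-(7).
# The data, Z, and E below are illustrative assumptions, not from the post.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=10.0, scale=2.0, size=36)  # stand-in for the 36 observations

Z = 1.96   # two-sided 95% normal critical value (assumption)
E = 0.5    # pre-specified standard error / precision (assumption)

# Randomly partition the 36 items into six test samples of six items each.
idx = rng.permutation(36).reshape(6, 6)

n_estimates = []
for sample in x[idx]:
    n0 = len(sample)                                 # initial (test) sample size
    S = sample.std(ddof=1)                           # sample standard deviation
    t_crit = stats.t.ppf(0.975, df=n0 - 1)           # t critical value, eq. (4)
    mu = sample.mean() - t_crit * S / np.sqrt(n0)    # eq. (5)
    sigma = (sample.mean() - mu) / Z * np.sqrt(n0)   # eq. (6)
    n_estimates.append((Z**2 * sigma**2) / E**2)     # eq. (2)

print(round(np.mean(n_estimates)))  # pooled estimate; well below 400
```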
ALTERNATIVE METHOD
Break the infinite-population equation n = (Z^2 · σ^2) / E^2 into two stages. In the first step, obtain the following:
(8) n(1) = Z·σ / E
Note that the squares are dropped. In the second step, use the actual infinite-population value:
(9) n(2) = (Z^2 · σ^2) / E^2
Third step, find the difference between n(1) and n(2):
(10) n(1) – n(2)
Fourth step, obtain the median of the third step, thus:
(11) M = [n(1) – n(2)] / 2
Fifth step, put M into a bound between 0.1 and 100, i.e. a pseudo-sampling space (ω), by taking M/0.1 as the maximum and M/100 as the minimum, thus:
(12) m* = (M/0.1 − M/100)
Sixth step, find the median of m*, thus:
(13) ω = m* / 2
Lastly, the minimum sample size is simply the square root of omega, thus:
(14) n(ω) = sqrt(ω)
This number should come out at about 30; that is, the minimum sample size is about 30. For most circumstances in social science, this would be adequate.
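A literal Python sketch of steps (8)-(14); the values of Z, σ, and E are illustrative assumptions. Note that n(1) − n(2) is negative whenever n(1) > 1, so the sketch takes the magnitude before the square root:

```python
# Literal walk-through of the alternative method, eqs. (8)-(14).
# Z, sigma, E are illustrative assumptions, not values from the post.
import math

Z, sigma, E = 1.96, 2.0, 0.2

n1 = Z * sigma / E                # eq. (8): squares dropped
n2 = (Z**2 * sigma**2) / E**2     # eq. (9): usual infinite-population form
diff = n1 - n2                    # eq. (10)
M = diff / 2                      # eq. (11)
m_star = M / 0.1 - M / 100        # eq. (12): pseudo-sampling-space bounds
omega = m_star / 2                # eq. (13)
n_omega = math.sqrt(abs(omega))   # eq. (14); magnitude taken, see note above
print(n_omega)                    # ~30.2, i.e. "about 30"
```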
VERIFICATION OF DATA DISTRIBUTION
Plotting the data is only a preliminary step in determining the data distribution. It is necessary to verify the distribution through an empirical test in order to determine what type of distribution the data manifests. To that end, it is suggested that the Anderson-Darling test be used. The Anderson-Darling statistic is given by:
(15) A^2 = −n − S
where n = sample size (in this case n = 36) and S is given as:
(16) S = Σ(k=1 to n) a(k)·b(k)
where
a(k) = (2k − 1) / n
b(k) = ln F(Y(k)) + ln(1 − F(Y(n+1−k)))
with Y(1) ≤ … ≤ Y(n) the ordered sample values and F the cumulative distribution function of the hypothesized (normal) distribution.
This value is the observed statistic, A^2(obs). When the mean and variance are estimated from the data, it is adjusted per Stephens (1974) as:
(17) A* = A^2 · [1 + 0.75/n + 2.25/n^2]
and then compared with the appropriate critical value, e.g. 0.752 at the 5% significance level.
The decision rule follows:
If A* < critical value …… do not reject H(o); the data may be treated as normally distributed (a linear model may apply)
If A* > critical value …… reject H(o); the data is not normally distributed (a linear model may not apply)
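Rather than computing eqs. (15)-(17) by hand, one can use SciPy's Anderson-Darling implementation, which adjusts the critical values for sample size. A minimal sketch, with simulated values standing in for the real 36 observations:

```python
# Sketch: Anderson-Darling normality check for n = 36 observations.
# The data here are simulated stand-ins; replace x with the real values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(size=36)                 # placeholder for the 36 data points

result = stats.anderson(x, dist='norm')
print("A^2 =", result.statistic)
for cv, sl in zip(result.critical_values, result.significance_level):
    verdict = "reject H(o)" if result.statistic > cv else "do not reject H(o)"
    print(f"alpha = {sl}%: critical value = {cv} -> {verdict}")
```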
CHAOS
This term "chaos" is still loosely defined in the field. The tasks at hand here are: (i) there are 36 observations, does this meet minimum sample size requirement? (ii) what kind of data distribution does these 36 counts manifest? (iii) what modeling approach would best fit the type of data verified by (ii). As for the issue of chaos, defer that for now. The chances are that you might be working with stock or commodity price movement (inferred from what you said: 30 days of data collection); therefore, it may be more fruitful to engage (i), (ii) and (iii). Chaos theory may come in at the discussion section of the paper.
REFERENCES
(1) Sample Size:
-Yamane, Taro (1967). Statistics: An Introductory Analysis. New York: Harper and Row. p. 886.
-Montgomery, D. C., Runger, G. C., and Hubele, N. F. (2001). Engineering Statistics, 2nd ed. John Wiley & Sons. ISBN 0-471-38879-3. p. 172.
(2) Anderson-Darling Test:
-Anderson, T. W.; Darling, D. A. (1952). “Asymptotic theory of certain ‘goodness-of-fit’ criteria based on stochastic processes.” Annals of Mathematical Statistics 23: 193–212.
-Stephens, M. A. (1974). “EDF Statistics for Goodness of Fit and Some Comparisons.” Journal of the American Statistical Association 69: 730–737; and M. A. Stephens (1986). “Tests Based on EDF Statistics.” In D’Agostino, R.B. and Stephens, M.A. Goodness-of-Fit Techniques. New York: Marcel Dekker. ISBN 0-8247-7487-6.
Before we look into any tests: can a system of finite data (whatever the sample size) be called a 'chaotic system'? This is a valid query raised by Yair Zarmi, and I welcome it.
The number of data points depends on the analysis intended. If you would like to move in the direction of time series analysis, I agree with Dr. Nikulchev that 1000 is a starting point. In the paper Aguirre, L.A., Billings, S.A., "Retrieving Dynamical Invariants from Chaotic Data Using NARMAX Models", Int. J. Bifurcation and Chaos, 1995, 5(2):449–474, DOI: 10.1142/s0218127495000363, ISSN 0218-1274, Prof. Billings and I argued that one way around this is to estimate a model from a short data set (sometimes as short as 100 or 150 values, even for measured data) and then use the estimated model to produce a much longer series. Of course, the analysis of the long data set will NOT be any better than your model; in other words, you are analyzing the model. However, if the model is OK, then you can do something. You can find an example (using about 150 values of the sunspot time series) in: Letellier, C., Aguirre, L.A., Maquet, J., Gilmore, R., "Evidence for low dimensional chaos in sunspot cycles", Astronomy & Astrophysics, 449:379–387, 2006, DOI: 10.1051/0004-6361:20053947; and Aguirre, L.A., Letellier, C., Maquet, J., "Forecasting the time series of sunspot numbers", Solar Physics, 249(1):103–120, 2008, DOI: 10.1007/s11207-008-9160-5.
The difficulty of determining whether an observed system is chaotic is accentuated by the work of Calogero and Leyvraz. They show that you can convert any dynamical system into a periodic system which, for a while, looks like your "chaotic" system but then, after a long time, repeats itself. These are called isochronous systems.
I completely agree with L. A. Aguirre. The answer depends on the method used, but also on the research question.
For example, estimating the attractor dimension using the Grassberger–Procaccia algorithm requires a time series of length N with log N > (D2/2)·log(1/ρ), where ρ = ε/S is the fraction of the entire phase space of diameter S covered by the recurrence neighborhood of size ε. This means that, using ρ = 0.1 and decimal logarithms, finding D2 = 10 would require at least N = 100,000 data points (Eckmann & Ruelle, "Fundamental limitations for estimating dimensions and Lyapunov exponents in dynamical systems", Physica D, 1992).
Another example is estimating Lyapunov exponents: as a rough estimate, Wolf et al. claimed that the time series should have a length N of at least 10^D2 to 30^D2 (with attractor dimension D2) (Wolf et al., "Determining Lyapunov exponents from a time series", Physica D, 1985). E.g., a system with D2 = 3 requires 1,000–30,000 data points. Eckmann and Ruelle made a more exact analysis and found that the time series should satisfy log N > D2·log(1/ρ) (Eckmann & Ruelle, Physica D, 1992).
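A small Python sketch of these length requirements; the values D2 = 3 and ρ = 0.1 are illustrative assumptions:

```python
# Back-of-envelope check of the data-length requirements quoted above
# (Eckmann & Ruelle 1992; Wolf et al. 1985); D2 and rho are illustrative.
import math

D2, rho = 3.0, 0.1

# Correlation dimension (Grassberger-Procaccia): log10(N) > (D2 / 2) * log10(1 / rho)
N_dim = 10 ** (D2 / 2 * math.log10(1 / rho))
# Lyapunov exponents (Eckmann-Ruelle): log10(N) > D2 * log10(1 / rho)
N_lyap = 10 ** (D2 * math.log10(1 / rho))
# Wolf et al. rule of thumb: 10**D2 to 30**D2 points
N_wolf = (10 ** D2, 30 ** D2)

print(N_dim)   # ~31.6; for D2 = 10 this becomes 100,000, as quoted above
print(N_lyap)  # 1000.0
print(N_wolf)  # (1000.0, 27000)
```

With only 36 points, these criteria fail for anything but the smallest D2.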
Thus, using such methods will not be possible for your very short data set.
To add to the list of fancy methods that can work with short time series: recurrence plots and recurrence quantification analysis could be useful for short data (but have a look at N. Marwan, "How to avoid potential pitfalls in recurrence plot based data analysis", IJBC, 2011). Even so, your data will probably be too short for this method as well.
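For what it is worth, the basic recurrence matrix itself is easy to compute. Below is a minimal NumPy/Matplotlib sketch with a sine wave standing in for the 36 observations, no time-delay embedding, and an ad hoc threshold ε, which are exactly the kinds of shortcuts Marwan's pitfalls paper warns about:

```python
# Minimal recurrence-plot sketch for a short series (N = 36).
# No embedding; eps chosen ad hoc; sine data stand in for the real values.
import numpy as np
import matplotlib.pyplot as plt

x = np.sin(np.linspace(0, 8 * np.pi, 36))   # stand-in for the 36 observations
dist = np.abs(x[:, None] - x[None, :])      # pairwise distance matrix
eps = 0.1 * dist.max()                      # threshold: 10% of max distance
R = (dist <= eps).astype(int)               # recurrence matrix

plt.imshow(R, cmap="binary", origin="lower")
plt.xlabel("time i"); plt.ylabel("time j"); plt.title("Recurrence plot (N = 36)")
plt.show()
```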