According to Nunnally, in multiple regression modeling there should be at least 10 observations per predictor variable (X), i.e., for N variables there should be at least N*10 observations. See Nunnally's book "Psychometric Theory".
A sample size of about 30 is standard practice.
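If it helps, here is a minimal Python sketch of these two rules of thumb; the function name and the thresholds are just my own illustration of the heuristics quoted above, not anything canonical:

```python
def min_observations(n_predictors: int, per_predictor: int = 10, floor: int = 30) -> int:
    """Rule-of-thumb minimum sample size: 10 observations per predictor
    (Nunnally), but never below the common n ~ 30 floor."""
    return max(per_predictor * n_predictors, floor)

print(min_observations(5))  # 50: the per-predictor rule binds
print(min_observations(2))  # 30: the n ~ 30 floor binds
```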
You can also use this answer from ResearchGate by Paul Louangrath:
Once you have bias in hand (for your hopefully high-quality data, i.e., with measurement and other nonsampling errors accounted for), the driver of sample size is variance. You should look at estimates of variance: a larger sample is needed to shrink the standard error, so with a high standard deviation a large sample size is needed. If the standard deviation is very low, you may arrive at sample sizes so small that you should check the stability of your results by adding or subtracting an observation or two.
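To make the variance-drives-sample-size point concrete, here is a minimal sketch based on the standard z-interval formula n = (z*s/E)^2, i.e., solving z*s/sqrt(n) <= E for n; the numbers are made up:

```python
import math

def required_n(sd: float, margin: float, z: float = 1.96) -> int:
    """Sample size so that a z-based confidence interval for a mean has
    half-width <= margin: solve z * sd / sqrt(n) <= margin for n."""
    return math.ceil((z * sd / margin) ** 2)

# High variance forces a large sample ...
print(required_n(sd=15.0, margin=1.0))  # 865
# ... very low variance yields tiny "sufficient" samples, which is exactly
# when you should check stability by adding/removing an observation.
print(required_n(sd=0.5, margin=1.0))   # 1
```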
But this all depends upon variance, so no general answer can really be given. (The use of n=30 as a general figure is similar to another problem, the widespread use of p=0.05 as a decision threshold in hypothesis testing: it is very misleading and not based on the information needed to make a good decision.)
PS - I have dealt with regression/"prediction" and sampling for finite populations far more than with time series, but it occurs to me that you don't want to forget that with time series you have to be wary of breaks in your series, which can greatly change everything. So if conditions have changed such that your model has needed to change, it may be terribly unhelpful to use data from before that break just to increase your sample size.
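If you want a concrete way to check for such breaks, here is a rough sketch of one standard diagnostic, the Chow test, hand-rolled with numpy/scipy; the break point, coefficients, and noise level in the toy data are all invented for illustration:

```python
import numpy as np
from scipy import stats

def chow_test(X, y, split):
    """Chow test for a structural break at index `split`: compare the
    pooled OLS residual sum of squares with the sum of the RSS from
    fitting each sub-period separately."""
    def rss(Xs, ys):
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        r = ys - Xs @ beta
        return float(r @ r)

    n, k = X.shape
    rss_pooled = rss(X, y)
    rss_split = rss(X[:split], y[:split]) + rss(X[split:], y[split:])
    f = ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
    return f, stats.f.sf(f, k, n - 2 * k)

# Toy series whose slope changes halfway through
rng = np.random.default_rng(0)
t = np.arange(100.0)
X = np.column_stack([np.ones_like(t), t])
y = np.where(t < 50, 1.0 * t, 3.0 * t - 100.0) + rng.normal(0, 5, 100)
print(chow_test(X, y, split=50))  # small p-value -> evidence of a break
```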
There is no rule carved in stone. Obviously you need more observations than parameters (N > k) so that your parameters are identified and your model is not deterministic.
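A tiny numpy sketch of why: with as many parameters as observations the fit is (generically) exact, so there is nothing left for inference:

```python
import numpy as np

rng = np.random.default_rng(1)

# k = 4 parameters, n = 4 observations: X @ beta = y is exactly solvable,
# so residuals are zero and the "model" reproduces the data perfectly,
# regardless of whether it is any good.
n, k = 4, 4
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ beta, y))  # True: a deterministic fit
```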
The number of observations matters for inference. If your dependent variable follows a normal distribution conditional on X, then even with a very small number of observations your parameter estimates will be normally distributed.
But this is rarely the case, so you need enough observations for the CLT to kick in and make all those test assumptions valid. How many really depends on the data. If you have nice, iid data, you can expect it to happen quickly, so N = 20-30 should be all right (medical trials). If the data are not iid (dependent or heterogeneous), the convergence to the normal distribution may be considerably slower (almost everything in economics).
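Here is a quick Monte Carlo sketch of the CLT point, using the simplest possible regression (intercept only, so the OLS estimate is just the sample mean) with deliberately skewed errors; the error distribution and replication count are arbitrary choices:

```python
import numpy as np
from scipy import stats

def mean_sampling_skewness(n, reps=20000, seed=0):
    """Skewness of the Monte Carlo sampling distribution of the OLS
    estimate in an intercept-only model (the sample mean) when the
    errors are exponential, i.e. strongly skewed rather than normal."""
    rng = np.random.default_rng(seed)
    estimates = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    return float(stats.skew(estimates))

# The estimator's skewness shrinks roughly like 2/sqrt(n)
# as the CLT takes hold.
for n in (5, 30, 200):
    print(n, round(mean_sampling_skewness(n), 2))
```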
So with a low number of observations you are just fitting a curve (in a somewhat art-for-art's-sake fashion), but you cannot really make any meaningful inference or run hypothesis tests.
Also, the fewer observations you have, the higher the relative weight any quadratic objective function gives to outliers, which can bias your results and make them very sensitive to extreme observations. But this is a secondary problem; you can use least absolute deviations (LAD), for example.
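Here is a small sketch of that sensitivity, comparing the quadratic loss (OLS) with least absolute deviations, fitted here by direct minimization with scipy rather than any dedicated LAD routine; the data and the single outlier are invented:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 15
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, n)
y[-1] += 10.0  # one gross outlier at a high-leverage point

# OLS: quadratic loss, pulled hard toward the outlier
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# LAD: minimize the sum of absolute residuals instead
res = minimize(lambda b: np.abs(y - X @ b).sum(),
               x0=beta_ols, method="Nelder-Mead")
beta_lad = res.x

print("OLS slope:", round(beta_ols[1], 2))  # noticeably inflated
print("LAD slope:", round(beta_lad[1], 2))  # close to the true slope of 2
```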