The choice between ARIMA and regression for time series models comes down to a few issues. ARIMA generally requires at least 50 data points, and more than 100 is preferred. It is also a rather complex model to estimate, and agreement between experts on the right model specification is very low. It is also limited to a single series, unless more complex models are pieced together. Regression models, on the other hand, require as few as 4 observations, their specification and estimation are much more straightforward, and multiple series can be estimated within the same model.
I can provide supporting references for these statements if you need them...
You should ideally use historical values from all variables. Please check out my paper on using polynomial nonlinearities to build such a model: https://www.researchgate.net/publication/255909532_Sparse_model_from_optimal_nonuniform_embedding_of_time_series
A multivariable ARIMA is a RegARIMA. You want to use the model that provides the most parsimonious encompassing of the theory and the variance of the endogenous variable(s). I write "variable(s)" because a VAR or VECM model might be the most appropriate under particular circumstances. State space models may easily outperform either of these models if you are concerned with latent variables.
The VAR and VECM will have finite sample bias, but so will the ARIMA or RegARIMA if you have to transform your series to obtain stationarity. Box and Jenkins recommended about 50-100 observations per ARIMA model, but this is insufficient if you have to use an ADF test to determine whether the series is stationary. It will also be insufficient if you have to model seasonality or long-wave cycles. Under those circumstances you may need more than 250 observations to have the power to do the stationarity testing. That will vary with your signal-to-noise ratio.
If you use a VAR or VECM because the variables form an ensemble that forecasts better when the whole set of series is analyzed together, you will need more than 150 observations to avoid finite sample bias.
If you are using a regression, the signal-to-noise ratio will depend on the collinearity among the explanatory variables. The greater the condition number or VIF, the more difficult it will be to partial out correlated effects. In the event that you have regime-switching models, you will need enough observations to model each regime adequately. That may depend on whether you define your regimes as systems with different levels, variances, or auxiliary threshold parameters.
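To make the VIF remark concrete, here is a minimal sketch for the special case of exactly two explanatory variables, where the VIF of either one reduces to 1 / (1 - r^2), with r their Pearson correlation. The function names and the toy data are my own illustrations, not taken from any package or from real measurements.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x, z):
    """VIF of either predictor when the regression has exactly two of them."""
    r = pearson_r(x, z)
    return 1.0 / (1.0 - r * r)

# Toy data (illustrative only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z_collinear = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]   # tracks x almost exactly
z_unrelated = [2.0, -1.0, 0.5, 0.5, -1.0, 2.0]  # uncorrelated with x

vif_high = vif_two_predictors(x, z_collinear)
vif_low = vif_two_predictors(x, z_unrelated)
```

With more than two explanatory variables, each VIF comes from regressing one variable on all the others, so this shortcut no longer applies.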
So it behoves you to determine whether your data give you enough power, at least 0.80, to detect small-to-medium effect sizes. If the consequences of an error are more serious, you will need much larger sample sizes. The Q or signal/noise F ratio is usually determined by dividing the irregular or error variance into the variance of the other components. In this way you can assess whether you have significant cycles, trends, seasonals, or levels.
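As a rough illustration of that ratio, the sketch below splits a short series into a least-squares linear trend and an irregular remainder, then divides the variance of the trend by the variance of the irregular part. The decomposition, the function names, and the data are assumptions made for the example. It gives only a descriptive ratio, not a formal F test with degrees of freedom.

```python
from statistics import pvariance

def linear_trend(y):
    """Fitted values of a least-squares straight line through y over t = 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    slope = (
        sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
        / sum((t - t_mean) ** 2 for t in range(n))
    )
    intercept = y_mean - slope * t_mean
    return [intercept + slope * t for t in range(n)]

def signal_to_noise(component, irregular):
    """Variance of a modeled component divided by the variance of the irregular part."""
    return pvariance(component) / pvariance(irregular)

# Toy series (illustrative only): an upward trend plus small disturbances.
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2]
trend = linear_trend(y)
irregular = [v - f for v, f in zip(y, trend)]
ratio = signal_to_noise(trend, irregular)
```

The same function could be applied to a seasonal or cyclical component, provided you have already extracted it with whatever decomposition you trust.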
Actually, my problem is a kind of budget equation of the form
y(t) = f{ x(t), z(t), h(t), j(t)}
where x, z, h and j are all independent time series contributing to the variable y. Since each variable can be represented as an ARIMA model, I wonder whether it would be beneficial to first run a regression on the data for all variables to obtain the regression coefficients, and then replace each variable in the regression with its ARIMA model. I prefer this approach because any further scenarios can then be expressed in terms of the coefficients.
Here the coefficients a, b, c, d and e will be calculated from the regression procedure, and each of the variables X, Z, H and T will be replaced by an ARIMA time series model.
The goal is to derive an equation in which each of the coefficients a, b, c and d can be multiplied by another coefficient during scenario generation.
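The two-step idea described here can be sketched as follows, under several loud assumptions: the regression is taken to be linear with an intercept, only two explanatory series are used to keep it short, and a least-squares AR(1) with intercept stands in for a properly identified ARIMA model. All function names, the scenario multipliers, and the synthetic data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def ols(columns, y):
    """Least-squares coefficients of y on the given columns; the intercept is returned last."""
    rows = [list(vals) + [1.0] for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def ar1_forecast(series):
    """One-step-ahead forecast from an AR(1) with intercept (a stand-in for ARIMA)."""
    phi, const = ols([series[:-1]], series[1:])
    return const + phi * series[-1]

def scenario_forecast(coefs, next_values, multipliers, intercept):
    """Plug forecast regressors into the regression, scaling each coefficient."""
    return intercept + sum(m * c * v for m, c, v in zip(multipliers, coefs, next_values))

# Synthetic data (illustrative only): y is built exactly as 2*x - 3*z + 5.
x = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0, 6.0, 8.0]
z = [2.0, 1.0, 3.0, 2.5, 1.5, 4.0, 3.5, 2.0]
y = [2.0 * xi - 3.0 * zi + 5.0 for xi, zi in zip(x, z)]

# Step 1: regression coefficients a, b and intercept e.
a, b, e = ols([x, z], y)

# Step 2: forecast each explanatory series separately.
x_next = ar1_forecast(x)
z_next = ar1_forecast(z)

# Scenario generation: multipliers of 1.0 give the baseline.
baseline = scenario_forecast([a, b], [x_next, z_next], [1.0, 1.0], e)
stressed = scenario_forecast([a, b], [x_next, z_next], [1.0, 1.2], e)
```

Note that plugging point forecasts into the regression ignores the forecast uncertainty of the explanatory series and any autocorrelation left in the regression residuals, which is one reason a jointly estimated model may be preferable.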
Babak Vaheddoost, I wanted to know whether I can fit an ARIMA model to my data, which has only a monthly date and one other variable (weight), in order to predict the following months. Thank you.
Sebastian Asuka, if I understand correctly, you are trying to model a single time series of something that is recorded at monthly intervals, for example the monthly time series of average wind speed, streamflow, sunny hours, etc. Am I right?
- If the answer is yes, I must tell you that you can do it, but you have to be careful about physical relevance. I mean, you have to be careful not to do a meaningless analysis.