According to Nunnally, in multiple regression modeling there should be at least 10 observations per predictor variable (X), i.e., for N variables there should be at least N*10 observations. See Nunnally's book "Psychometric Theory".
A sample size of about 30 is standard practice.
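If it helps, here is a minimal Python sketch of these two rules of thumb; the function name and the thresholds are just my own illustration of the heuristics quoted above, not anything canonical:

```python
def min_observations(n_predictors: int, per_predictor: int = 10, floor: int = 30) -> int:
    """Rule-of-thumb minimum sample size: 10 observations per predictor
    (Nunnally), but never below the common n ~ 30 floor."""
    return max(per_predictor * n_predictors, floor)

print(min_observations(5))  # 50: the per-predictor rule binds
print(min_observations(2))  # 30: the n ~ 30 floor binds
```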
You can also use this answer from ResearchGate by Paul Louangrath:
Once you have bias in hand (for your hopefully high-quality data, i.e., with measurement and other nonsampling errors accounted for), the driver of sample size is variance. You should look at estimates of variance: a larger sample is needed to shrink the standard error, so with a high standard deviation a large sample size is needed. If the standard deviation is very low, you may arrive at sample sizes so small that you should check the stability of your results by adding or subtracting an observation or two.
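To make the variance-drives-sample-size point concrete, here is a minimal sketch based on the standard z-interval formula n = (z*s/E)^2, i.e., solving z*s/sqrt(n) <= E for n; the numbers are made up:

```python
import math

def required_n(sd: float, margin: float, z: float = 1.96) -> int:
    """Sample size so that a z-based confidence interval for a mean has
    half-width <= margin: solve z * sd / sqrt(n) <= margin for n."""
    return math.ceil((z * sd / margin) ** 2)

# High variance forces a large sample ...
print(required_n(sd=15.0, margin=1.0))  # 865
# ... very low variance yields tiny "sufficient" samples, which is exactly
# when you should check stability by adding/removing an observation.
print(required_n(sd=0.5, margin=1.0))   # 1
```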
But this all depends upon variance, so no general answer can really be given. (The use of n=30 as a general figure is similar to another problem, the widespread use of p=0.05 as a decision threshold in hypothesis testing: it is very misleading and not based on the information needed to make a good decision.)
PS - I have dealt with regression/"prediction" and sampling for finite populations far more than with time series, but it occurs to me that you don't want to forget that with time series you have to be wary of breaks in your series, which can greatly change everything. So if conditions have changed such that your model has needed to change, it may be terribly unhelpful to use data from before that break just to increase your sample size.
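If you want a concrete way to check for such breaks, here is a rough sketch of one standard diagnostic, the Chow test, hand-rolled with numpy/scipy; the break point, coefficients, and noise level in the toy data are all invented for illustration:

```python
import numpy as np
from scipy import stats

def chow_test(X, y, split):
    """Chow test for a structural break at index `split`: compare the
    pooled OLS residual sum of squares with the sum of the RSS from
    fitting each sub-period separately."""
    def rss(Xs, ys):
        beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        r = ys - Xs @ beta
        return float(r @ r)

    n, k = X.shape
    rss_pooled = rss(X, y)
    rss_split = rss(X[:split], y[:split]) + rss(X[split:], y[split:])
    f = ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
    return f, stats.f.sf(f, k, n - 2 * k)

# Toy series whose slope changes halfway through
rng = np.random.default_rng(0)
t = np.arange(100.0)
X = np.column_stack([np.ones_like(t), t])
y = np.where(t < 50, 1.0 * t, 3.0 * t - 100.0) + rng.normal(0, 5, 100)
print(chow_test(X, y, split=50))  # small p-value -> evidence of a break
```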
There is no rule carved in stone. Obviously you need more observations than parameters (N > k) so that your parameters are identified and your model is not deterministic.
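A tiny numpy sketch of why: with as many parameters as observations the fit is (generically) exact, so there is nothing left for inference:

```python
import numpy as np

rng = np.random.default_rng(1)

# k = 4 parameters, n = 4 observations: X @ beta = y is exactly solvable,
# so residuals are zero and the "model" reproduces the data perfectly,
# regardless of whether it is any good.
n, k = 4, 4
X = rng.normal(size=(n, k))
y = rng.normal(size=n)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(X @ beta, y))  # True: a deterministic fit
```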
The number of observations matters for inference. If your dependent variable follows a normal distribution conditional on X, then even with a very small number of observations your parameter estimates will be normally distributed.
But this is rarely the case, so you need enough observations for the CLT to kick in and make all those test assumptions valid. How many really depends on the data. If you have nice, iid data, you can expect it to happen quickly, so N = 20-30 should be all right (medical trials). If the data are not iid (dependent or heterogeneous), the convergence to the normal distribution may be considerably slower (almost everything in economics).
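Here is a quick Monte Carlo sketch of the CLT point, using the simplest possible regression (intercept only, so the OLS estimate is just the sample mean) with deliberately skewed errors; the error distribution and replication count are arbitrary choices:

```python
import numpy as np
from scipy import stats

def mean_sampling_skewness(n, reps=20000, seed=0):
    """Skewness of the Monte Carlo sampling distribution of the OLS
    estimate in an intercept-only model (the sample mean) when the
    errors are exponential, i.e. strongly skewed rather than normal."""
    rng = np.random.default_rng(seed)
    estimates = rng.exponential(1.0, size=(reps, n)).mean(axis=1)
    return float(stats.skew(estimates))

# The estimator's skewness shrinks roughly like 2/sqrt(n)
# as the CLT takes hold.
for n in (5, 30, 200):
    print(n, round(mean_sampling_skewness(n), 2))
```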
So with a low number of observations you are just fitting a curve (in a somewhat art-for-art's-sake fashion), but you cannot really make any meaningful inference or run hypothesis tests.
Also, the fewer observations you have, the higher the relative weight any quadratic objective function gives to outliers, which can bias your results and make them very sensitive to extreme observations. But this is a secondary problem; you can use least absolute deviations (LAD), for example.
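Here is a small sketch of that sensitivity, comparing the quadratic loss (OLS) with least absolute deviations, fitted here by direct minimization with scipy rather than any dedicated LAD routine; the data and the single outlier are invented:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
n = 15
x = np.linspace(0, 1, n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, n)
y[-1] += 10.0  # one gross outlier at a high-leverage point

# OLS: quadratic loss, pulled hard toward the outlier
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

# LAD: minimize the sum of absolute residuals instead
res = minimize(lambda b: np.abs(y - X @ b).sum(),
               x0=beta_ols, method="Nelder-Mead")
beta_lad = res.x

print("OLS slope:", round(beta_ols[1], 2))  # noticeably inflated
print("LAD slope:", round(beta_lad[1], 2))  # close to the true slope of 2
```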