Hi. I have panel data (daily) for 4 years and I want to split it into train and test sets. Many papers say 80:20, some 70:30. Is there a specific criterion? Any study? Thank you.
Generally, a 70:30 split is used for the training and testing datasets. Compared with 80:20, the 70:30 split reserves more data for testing, which gives a more reliable assessment of the fitted model's accuracy.
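If it helps, here is a minimal sketch of a chronological 70:30 split for a daily panel in Python. The entity and column names and the toy data are assumptions for illustration, not taken from the question.

```python
import numpy as np
import pandas as pd

# Toy daily panel: 3 entities observed over 4 years (illustrative data only).
dates = pd.date_range("2018-01-01", "2021-12-31", freq="D")
panel = pd.DataFrame({
    "entity": np.repeat(["A", "B", "C"], len(dates)),
    "date":   np.tile(dates, 3),
    "y":      np.random.default_rng(0).normal(size=3 * len(dates)),
})

# Split on time, not on random rows, so the test period lies strictly after
# the training period for every entity (avoids look-ahead leakage).
unique_dates = np.sort(panel["date"].unique())
cutoff = unique_dates[int(0.70 * len(unique_dates)) - 1]

train = panel[panel["date"] <= cutoff]
test  = panel[panel["date"] >  cutoff]
print(len(train) / len(panel), len(test) / len(panel))
```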
I have been working on macro forecasting for many years and have not found any paper on this matter. There may be some recent papers from the last 8 years that address it indirectly, since the question is really related to structural breaks. The 70:30 consensus does not consider the possibility of structural breaks in the sample. To illustrate, suppose there is just one such break, dated around the 35th observation of a 100-point sample. You would then want to estimate only on post-break data, so the estimation sample becomes {36:70} and the evaluation sample {71:100}, and such a split is no longer 70:30, quite apart from the fitted model's accuracy. Now suppose you have a 100-point sample of monthly observations WITHOUT any break. We know that in-sample accuracy does not guarantee out-of-sample accuracy, and the latter is usually the better performance measure. When evaluating many forecasting models, it therefore makes sense to find the model that uses the smallest estimation sample while maximizing, say, 12-month-ahead out-of-sample accuracy. Take some time to think about this.
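To make that last point concrete, here is a small sketch (my own illustration, not from any paper) that simulates a level shift around observation 35 and compares out-of-sample accuracy on a 12-observation holdout for estimation windows starting before and after the break. The AR(1) model, the break size, and the candidate window starts are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
y = rng.normal(size=n).cumsum() * 0.1 + 1.0
y[35:] += 3.0                      # hypothetical level shift after observation 35

holdout = 12                       # evaluate accuracy over the last 12 observations
train_end = n - holdout

def oos_rmse(start):
    """Fit y_t = a + b*y_{t-1} on y[start:train_end], then forecast the holdout."""
    ys = y[start:train_end]
    X = np.column_stack([np.ones(len(ys) - 1), ys[:-1]])
    a, b = np.linalg.lstsq(X, ys[1:], rcond=None)[0]
    # Iterate one-step forecasts through the holdout period.
    preds, last = [], ys[-1]
    for _ in range(holdout):
        last = a + b * last
        preds.append(last)
    return np.sqrt(np.mean((np.array(preds) - y[train_end:]) ** 2))

# Try several candidate starting points, including one just after the break.
for start in (0, 20, 36, 50):
    print(f"start={start:2d}  OOS RMSE={oos_rmse(start):.3f}")
```

The point is only that the best estimation window is an empirical question, not something a fixed 70:30 rule can settle.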
There is no definitive reason for a particular split. In practice, one sees splits of 80:20, 70:30, and even 90:10. However, in machine learning there are generally three sets: a training set, a validation set, and a test set. For example, if both a training and a validation set are used, the model is evaluated on the validation set after each training epoch. This is often helpful for detecting an overfitted model. The test set is not used until training is completed.
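As a concrete illustration of that three-way split, a chronological version might look like the sketch below. The 70/15/15 proportions are an assumption for the example, not a prescribed rule.

```python
import numpy as np
import pandas as pd

# Toy daily series over 4 years (illustrative data only).
dates = pd.date_range("2018-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"date": dates,
                   "y": np.random.default_rng(2).normal(size=len(dates))})

n = len(df)
i_train = int(0.70 * n)
i_val   = int(0.85 * n)

train = df.iloc[:i_train]          # used to fit the model
val   = df.iloc[i_train:i_val]     # checked after each epoch to spot overfitting
test  = df.iloc[i_val:]            # held back until training is finished
print(len(train), len(val), len(test))
```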
Generally, the 70:30 criterion is used for splitting to get accurate results. The more samples used for training, the better the goodness of fit of the model tends to be.