Suppose we have an unbalanced data set for a binary classification problem and we want to use 10-fold cross-validation for training and testing the fitted model.

* Is it correct that we should apply sampling methods (under-sampling, over-sampling, or SMOTE) only to the training data?

* If yes, how can we implement these sampling methods with 10-fold cross-validation? Should we re-sample the minority class before cross-validation, or do we need a modified k-fold cross-validation procedure?

* Is there any way to implement sampling methods with sliding-window validation? (I'm working on binary time-series prediction, one step ahead: whether the output at t+1 is an up-turn or down-turn compared to t, where t is time.)

* Would it be more appropriate to apply these sampling methods separately to each year of the data set?
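To make the second question concrete, here is a minimal sketch of resampling *inside* each fold rather than before splitting, so the test fold keeps the original class ratio and no duplicated minority point leaks into it. The data set, fold helper, and over-sampler below are all hypothetical stand-ins; in practice imbalanced-learn's `Pipeline` with `RandomOverSampler` or `SMOTE` achieves the same effect:

```python
# Sketch only: resample the TRAINING fold inside each CV iteration,
# never the whole data set before splitting.
import random

random.seed(0)

# Hypothetical imbalanced data: 80 majority (0) vs 20 minority (1).
data = [(random.random(), 0) for _ in range(80)] + \
       [(random.random(), 1) for _ in range(20)]
random.shuffle(data)

def kfold_indices(n, k):
    """Yield (train_indices, test_indices) for plain k-fold CV."""
    idx = list(range(n))
    fold = n // k
    for i in range(k):
        test = idx[i * fold:(i + 1) * fold]
        train = idx[:i * fold] + idx[(i + 1) * fold:]
        yield train, test

def oversample(samples):
    """Random over-sampling: duplicate minority-class points (with
    replacement) until every class matches the majority count."""
    by_class = {}
    for s in samples:
        by_class.setdefault(s[1], []).append(s)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced += members
        balanced += random.choices(members, k=target - len(members))
    return balanced

for train_idx, test_idx in kfold_indices(len(data), 10):
    train = oversample([data[i] for i in train_idx])  # balanced copy
    test = [data[i] for i in test_idx]                # left untouched
    # ... fit the classifier on `train`, evaluate on `test` ...
    counts = {c: sum(1 for _, y in train if y == c) for c in (0, 1)}
    assert counts[0] == counts[1]  # the training fold is now balanced
```

The key point is that `oversample` is called after the split, once per fold; resampling the whole data set first would place copies of the same minority observation in both the training and the test fold.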

Number of up-turn and down-turn samples in every year:

                Total   Up  Down
                -----  ---  ----
    2009          234  135    99
    2010          243  153    90
    2011          241  132   109
    2012          240  133   107
    2013          240  155    85
    2014          241  110   131
    2015          243  126   117
    2016           29   24     5
    All data     1711  968   743
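For the sliding-window question, here is one possible shape of a walk-forward split for one-step-ahead prediction: train on a fixed-length window, resample only that window, and predict the untouched next point. The series, window length, and over-sampler are hypothetical; scikit-learn's `TimeSeriesSplit` implements the expanding-window variant of the same idea:

```python
# Sketch only: walk-forward validation where resampling touches the
# training window but never the held-out point at t+1.
import random

random.seed(1)

# Hypothetical time-ordered series of (features, up/down label) pairs.
series = [(t, random.choice([0, 0, 1])) for t in range(50)]

def sliding_splits(n, window):
    """Yield (train_indices, test_index): train on [t-window, t),
    then predict the single step t."""
    for t in range(window, n):
        yield list(range(t - window, t)), t

def oversample(samples):
    """Duplicate minority-class points until the classes balance."""
    by_class = {}
    for s in samples:
        by_class.setdefault(s[1], []).append(s)
    target = max(len(v) for v in by_class.values())
    out = []
    for members in by_class.values():
        out += members + random.choices(members, k=target - len(members))
    return out

for train_idx, t in sliding_splits(len(series), window=20):
    train = oversample([series[i] for i in train_idx])  # resampled window
    x_next, y_next = series[t]                          # one step ahead
    # ... fit on `train`, predict the up/down label for step t ...
```

Because each training window is resampled independently, this would also accommodate the per-year idea: a window (or a grouping by year) gets its own balanced copy without ever mixing resampled points into the future observations being predicted.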
