Suppose we have an imbalanced data set for a binary classification problem, and we want to use 10-fold cross-validation for training and testing the fitted model.
* Is it correct that we should apply sampling methods (under-sampling, over-sampling, or SMOTE) only to the training data?
* If yes, how can we implement these sampling methods with 10-fold cross-validation? Should we re-sample the minority class before cross-validation, or do we need a modified k-fold cross-validation procedure that re-samples inside each fold?
* Is there any way to implement sampling methods with sliding validation? (I'm working on binary time-series prediction, one step ahead: whether the output at t+1 is an up-turn or a down-turn compared to t, where t is time.)
* Would it not be more appropriate to apply these sampling methods separately to each year of the data set?
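To make the second question concrete, here is a minimal sketch of what I mean by re-sampling inside each fold: split first, then balance only the training part, leaving the test fold untouched. The function names are mine, and I use simple random over-sampling as a stand-in for SMOTE just to keep the sketch self-contained:

```python
import random

def oversample_minority(X, y, seed=0):
    """Randomly duplicate minority-class samples (a crude stand-in for
    SMOTE) until both classes have the same count."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    idx = list(range(len(y))) + extra
    return [X[i] for i in idx], [y[i] for i in idx]

def kfold_with_resampling(X, y, k=10, seed=0):
    """Split first, then re-sample ONLY the training part of each fold;
    the held-out test fold keeps the original class ratio."""
    rng = random.Random(seed)
    order = list(range(len(y)))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in order if i not in test_set]
        X_tr, y_tr = oversample_minority([X[i] for i in train_idx],
                                         [y[i] for i in train_idx], seed)
        X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]
        yield X_tr, y_tr, X_te, y_te
```

The alternative, re-sampling the whole data set before splitting, would let duplicated (or synthesized) minority samples leak into the test folds, which is exactly what I want to avoid.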
Number of up-turn and down-turn samples in each year:
| Year     | Total | Up  | Down |
|----------|-------|-----|------|
| 2009     | 234   | 135 | 99   |
| 2010     | 243   | 153 | 90   |
| 2011     | 241   | 132 | 109  |
| 2012     | 240   | 133 | 107  |
| 2013     | 240   | 155 | 85   |
| 2014     | 241   | 110 | 131  |
| 2015     | 243   | 126 | 117  |
| 2016     | 29    | 24  | 5    |
| All data | 1711  | 968 | 743  |
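For the sliding-validation question, the setup I have in mind is a walk-forward scheme over the years above: train on all years up to year y (re-sampled), then test on year y+1 (untouched). A minimal sketch under those assumptions, again with random over-sampling standing in for SMOTE and with all names being mine:

```python
import random

def oversample(rows, seed=0):
    """Duplicate minority-class rows until classes balance
    (a stand-in for SMOTE); rows end with the 0/1 label."""
    rng = random.Random(seed)
    ups = [r for r in rows if r[-1] == 1]
    downs = [r for r in rows if r[-1] == 0]
    small, big = (ups, downs) if len(ups) < len(downs) else (downs, ups)
    return rows + [rng.choice(small) for _ in range(len(big) - len(small))]

def yearly_walk_forward(rows):
    """rows: tuples (year, ..., label).  Train on years <= y
    (re-sampled), test on year y+1 (original distribution),
    sliding the boundary one year at a time."""
    years = sorted({r[0] for r in rows})
    for y_train_end, y_test in zip(years, years[1:]):
        train = [r for r in rows if r[0] <= y_train_end]
        test = [r for r in rows if r[0] == y_test]
        yield oversample(train), test
```

This would also address the last question: re-sampling is done per training window, so each year's class ratio only affects the windows that actually contain it.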