I am using bagging for machine learning classification, and would like to know how to decide on the optimal number of ensemble members (base classifiers). I would also like to take into account the number of positive and negative training cases used for each member. I assume that an even distribution of training cases is optimal, and I have a heavy positive/negative skew in my target variable.

Example:

Training set size: 3000 cases

Positive: cases: 100

Negative cases: 2900

My procedure then does the following, sampling with replacement:

For each ensemble member:

1. Draw 100 negative cases and 100 positive cases, with replacement.

2. Train a classifier on this balanced sample.

3. Predict on a validation set.

Given this procedure, how can I then optimally decide on the number of ensemble members, and also on the number of positive and negative cases that go into each member?
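For reference, the procedure described above (balanced bagging via under-sampling the majority class, then growing the ensemble while watching validation performance) can be sketched as follows. This is a minimal illustration, not a definitive answer: the data, class sizes, feature count, `per_class` sample size, and the ensemble sizes checked are all assumptions made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the set-up in the question: 100 positives and
# 2900 negatives for training, plus a separate validation set.
def make_data(n_pos, n_neg, seed):
    r = np.random.default_rng(seed)
    X = np.vstack([r.normal(1.0, 1.0, (n_pos, 5)),
                   r.normal(0.0, 1.0, (n_neg, 5))])
    y = np.concatenate([np.ones(n_pos), np.zeros(n_neg)])
    return X, y

X_tr, y_tr = make_data(100, 2900, seed=1)
X_va, y_va = make_data(50, 1450, seed=2)

def fit_balanced_bag(X, y, n_members, per_class, seed=0):
    """Train n_members classifiers, each on per_class positives and
    per_class negatives drawn with replacement (balanced bagging)."""
    r = np.random.default_rng(seed)
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    members = []
    for _ in range(n_members):
        idx = np.concatenate([r.choice(pos, per_class, replace=True),
                              r.choice(neg, per_class, replace=True)])
        clf = DecisionTreeClassifier(random_state=0).fit(X[idx], y[idx])
        members.append(clf)
    return members

def ensemble_proba(members, X):
    # Average the positive-class probability over all ensemble members.
    return np.mean([m.predict_proba(X)[:, 1] for m in members], axis=0)

# Grow the ensemble and track validation AUC at a few sizes; a common
# heuristic is to stop adding members once this curve flattens.
members = fit_balanced_bag(X_tr, y_tr, n_members=50, per_class=100)
aucs = [roc_auc_score(y_va, ensemble_proba(members[:k], X_va))
        for k in (1, 5, 10, 25, 50)]
```

The same grid idea extends to `per_class`: repeat the curve for a few sample sizes and pick the combination with the best (and most stable) validation score.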
