The voting step in bagging (ensemble-based classification) relies on the subsets obtained by a bootstrap method. Is bagging sensitive to the particular bootstrap strategy used? If so, how can this be shown statistically?
The problem with non-vanilla bootstrap methods is that they introduce one or more parameters which have to be tuned for performance, which somewhat ruins the appealing simplicity of bagging.
Bagging is already somewhat "heavy" by itself, at least with large datasets (you have to build and deploy a lot of models); if you also have to wrap that in an even heavier cross-validation loop for parameter setting, well ...
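To make the cost concrete, here is a minimal sketch (my own illustration, not from the answer above) of tuning a single bagging parameter with a cross-validation grid search in scikit-learn; the dataset and parameter values are arbitrary placeholders.

```python
# Hedged sketch: tuning one bagging parameter (max_samples, the bootstrap
# sample fraction) with a CV grid search. Each extra candidate value
# multiplies the number of models that must be fit, which is the "heavier
# CV loop" cost mentioned above.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 4 candidate values x 5 CV folds x 50 trees = 1000 trees fit just for tuning.
grid = GridSearchCV(
    estimator=BaggingClassifier(n_estimators=50, random_state=0),
    param_grid={"max_samples": [0.25, 0.5, 0.75, 1.0]},
    cv=5,
    n_jobs=-1,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```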
The usual bootstrap algorithm samples i.i.d., and sometimes that is a problem, for instance when dealing with time series. Because the resampling is i.i.d., the bootstrap samples have no serial autocorrelation (unlike what is observed in most time series). This makes the set of resampled series very different from the sort of time series we actually get in the real world, which is why the block bootstrap is usually preferred to the ordinary bootstrap for time series. For example: split the series into blocks of 100 contiguous observations, resample ten whole blocks with replacement, and paste them together to construct each bootstrap time series.
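A small sketch of that non-overlapping block bootstrap (my own illustration; it assumes a 1-D NumPy array whose length is a multiple of the block size):

```python
# Hedged sketch of a non-overlapping block bootstrap. Blocks of contiguous
# observations are resampled with replacement and concatenated, so the serial
# correlation inside each block is preserved.
import numpy as np

def block_bootstrap(series, block_size=100, rng=None):
    rng = np.random.default_rng(rng)
    n_blocks = len(series) // block_size
    # Cut the series into contiguous blocks.
    blocks = series[: n_blocks * block_size].reshape(n_blocks, block_size)
    # Sample whole blocks with replacement and paste them back together.
    idx = rng.integers(0, n_blocks, size=n_blocks)
    return blocks[idx].ravel()

# Example: a random-walk-like series of length 1000, resampled as 10 blocks of 100.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=1000)) * 0.1
resampled = block_bootstrap(x, block_size=100, rng=1)
print(resampled.shape)  # (1000,)
```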
Generally speaking, an average of $B$ i.i.d. random variables, each with variance $\sigma^2$, has variance $\sigma^2/B$. If the variables are only i.d. (identically distributed, but not necessarily independent) with positive pairwise correlation $\rho$, the variance of the average (the averaging step of bagging) is

$$\rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 .$$
As B increases, the second term disappears, but the first remains, and hence the size of the correlation of pairs of bagged trees limits the benefits of averaging.
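For completeness, a one-line derivation of that formula, assuming all pairwise correlations are equal to $\rho$:

$$
\operatorname{Var}\!\left(\frac{1}{B}\sum_{i=1}^{B} X_i\right)
= \frac{1}{B^2}\Bigl[\, B\sigma^2 + B(B-1)\rho\sigma^2 \,\Bigr]
= \rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 .
$$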
For example, the idea in random forests is to improve the variance reduction of bagging by reducing the correlation between the trees, without increasing the variance too much. This is achieved in the tree-growing process through random selection of the input variables. Specifically, when growing a tree on a bootstrapped dataset: before each split, select $m \le p$ of the input variables at random as candidates for splitting (typically $m \approx \sqrt{p}$ for classification).
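As a quick illustration of that difference (my own sketch, with an arbitrary synthetic dataset), here is bagged trees versus a random forest in scikit-learn; conceptually the only change is the per-split feature subsampling controlled by max_features:

```python
# Hedged sketch: bagged trees (every split considers all p features) versus a
# random forest (every split considers a random subset of max_features
# candidates). The feature subsampling decorrelates the trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=25, n_informative=5,
                           random_state=0)

# Plain bagging of decision trees: every split sees all 25 features.
bagging = BaggingClassifier(n_estimators=200, random_state=0)

# Random forest: every split sees only sqrt(25) = 5 randomly chosen features.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0)

for name, model in [("bagging", bagging), ("random forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:>13}: {scores.mean():.3f} +/- {scores.std():.3f}")
```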
You can find more details in
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd ed., Springer, 2009. http://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf