The voting step in bagging (ensemble-based classification) relies on the subsets obtained by a bootstrap method. Is bagging sensitive to the particular bootstrap strategy used? If so, how can this be shown statistically?
The problem with non-vanilla bootstrap methods is that they introduce one or more parameters which have to be tuned for performance, which somewhat ruins the appealing simplicity of bagging.
Bagging is already somewhat "heavy" by itself, at least with large datasets (you have to build and deploy a lot of models); if you also have to wrap that in an even heavier cross-validation loop for parameter setting, well ...
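To make the cost concrete, here is a minimal sketch (my own illustration, not from the answer above) of tuning a single bagging parameter with a cross-validation grid search in scikit-learn; the dataset and parameter values are arbitrary placeholders.

```python
# Hedged sketch: tuning one bagging parameter (max_samples, the bootstrap
# sample fraction) with a CV grid search. Each extra candidate value
# multiplies the number of models that must be fit, which is the "heavier
# CV loop" cost mentioned above.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# 4 candidate values x 5 CV folds x 50 trees = 1000 trees fit just for tuning.
grid = GridSearchCV(
    estimator=BaggingClassifier(n_estimators=50, random_state=0),
    param_grid={"max_samples": [0.25, 0.5, 0.75, 1.0]},
    cv=5,
    n_jobs=-1,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```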
The usual bootstrap algorithm samples i.i.d., and sometimes that is a problem, for instance when dealing with time series. Because the resampling is i.i.d., the bootstrap samples have no serial autocorrelation (unlike what is observed in most time series). This makes the set of resampled series very different from the sort of time series we actually get in the real world, which is why the block bootstrap is usually preferred to the ordinary bootstrap for time series. For example: split the series into blocks of 100 contiguous observations, resample ten whole blocks with replacement, and paste them together to construct each bootstrap time series.
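A small sketch of that non-overlapping block bootstrap (my own illustration; it assumes a 1-D NumPy array whose length is a multiple of the block size):

```python
# Hedged sketch of a non-overlapping block bootstrap. Blocks of contiguous
# observations are resampled with replacement and concatenated, so the serial
# correlation inside each block is preserved.
import numpy as np

def block_bootstrap(series, block_size=100, rng=None):
    rng = np.random.default_rng(rng)
    n_blocks = len(series) // block_size
    # Cut the series into contiguous blocks.
    blocks = series[: n_blocks * block_size].reshape(n_blocks, block_size)
    # Sample whole blocks with replacement and paste them back together.
    idx = rng.integers(0, n_blocks, size=n_blocks)
    return blocks[idx].ravel()

# Example: a random-walk-like series of length 1000, resampled as 10 blocks of 100.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=1000)) * 0.1
resampled = block_bootstrap(x, block_size=100, rng=1)
print(resampled.shape)  # (1000,)
```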
Generally speaking, an average of $B$ i.i.d. random variables, each with variance $\sigma^2$, has variance $\sigma^2/B$. If the variables are only i.d. (identically distributed, but not necessarily independent) with positive pairwise correlation $\rho$, the variance of the average (the averaging step of bagging) is

$$\rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 .$$
As B increases, the second term disappears, but the first remains, and hence the size of the correlation of pairs of bagged trees limits the benefits of averaging.
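For completeness, a one-line derivation of that formula, assuming all pairwise correlations are equal to $\rho$:

$$
\operatorname{Var}\!\left(\frac{1}{B}\sum_{i=1}^{B} X_i\right)
= \frac{1}{B^2}\Bigl[\, B\sigma^2 + B(B-1)\rho\sigma^2 \,\Bigr]
= \rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2 .
$$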
For example, the idea in random forests is to improve the variance reduction of bagging by reducing the correlation between the trees, without increasing the variance too much. This is achieved in the tree-growing process through random selection of the input variables. Specifically, when growing a tree on a bootstrapped dataset: before each split, select $m \le p$ of the input variables at random as candidates for splitting (typically $m \approx \sqrt{p}$ for classification).
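As a quick illustration of that difference (my own sketch, with an arbitrary synthetic dataset), here is bagged trees versus a random forest in scikit-learn; conceptually the only change is the per-split feature subsampling controlled by max_features:

```python
# Hedged sketch: bagged trees (every split considers all p features) versus a
# random forest (every split considers a random subset of max_features
# candidates). The feature subsampling decorrelates the trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=25, n_informative=5,
                           random_state=0)

# Plain bagging of decision trees: every split sees all 25 features.
bagging = BaggingClassifier(n_estimators=200, random_state=0)

# Random forest: every split sees only sqrt(25) = 5 randomly chosen features.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0)

for name, model in [("bagging", bagging), ("random forest", forest)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name:>13}: {scores.mean():.3f} +/- {scores.std():.3f}")
```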
You can find more details in
T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning, 2nd ed., Springer, 2009. http://web.stanford.edu/~hastie/local.ftp/Springer/OLD/ESLII_print4.pdf