What reasons would you have (and not have) for using bootstrapping to estimate confidence intervals of independent variables in a logistic model of a randomized sample of the population?
Sometimes I have found bootstrapping to be of real practical value. If the results from asymptotic methods differ markedly from the bootstrap results, this indicates a problem in the data. There might be outliers in continuous data, a very skewed distribution in categorical data, or something else, but always something that needs to be understood. Examining the reason for this difference might give you valuable insight into the problem you are analyzing.
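Here is a minimal sketch of that check in Python, assuming numpy and statsmodels are available; the data are simulated stand-ins and all variable names are illustrative, not a prescribed workflow:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated stand-in for a real randomized sample.
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.8 * x))))
X = sm.add_constant(x)

# Asymptotic (Wald) 95% CI for the slope from the usual ML fit.
fit = sm.Logit(y, X).fit(disp=0)
wald_ci = fit.conf_int()[1]

# Nonparametric bootstrap: resample rows with replacement, refit, collect slopes.
B = 2000
slopes = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    slopes[b] = sm.Logit(y[idx], X[idx]).fit(disp=0).params[1]

boot_ci = np.percentile(slopes, [2.5, 97.5])
print("Wald CI:     ", wald_ci)
print("Bootstrap CI:", boot_ci)
# A large discrepancy between the two intervals is the warning sign
# described above: inspect the data before trusting either interval.
```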
Heteroskedasticity and other violations, like collinearity or not including important variables, are one thing, but there is something else: isn't bootstrapping meant to improve the effect estimates and narrow the confidence intervals, in order to avoid Type I error, i.e. declaring a difference when none actually exists? In logistic modeling for public health, however, the major concern is Type II error: missing significant differences when they actually exist.
Bootstrapping will only produce narrower confidence intervals when the data do not meet the model assumptions. You can check this by simulating a simple model, as in the sketch below.
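A minimal simulation along those lines, assuming numpy and statsmodels; the model and all settings are illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

def ci_widths(n=300, B=400):
    """Fit one simulated dataset; return (asymptotic width, bootstrap width)."""
    x = rng.normal(size=n)
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.7 * x))))
    X = sm.add_constant(x)
    lo, hi = sm.Logit(y, X).fit(disp=0).conf_int()[1]  # Wald 95% CI, slope
    slopes = []
    for _ in range(B):
        i = rng.integers(0, n, size=n)
        slopes.append(sm.Logit(y[i], X[i]).fit(disp=0).params[1])
    blo, bhi = np.percentile(slopes, [2.5, 97.5])
    return hi - lo, bhi - blo

widths = np.array([ci_widths() for _ in range(10)])
print(f"mean asymptotic width: {widths[:, 0].mean():.3f}")
print(f"mean bootstrap width:  {widths[:, 1].mean():.3f}")
# With a correctly specified model the two widths come out similar:
# bootstrapping buys no free precision when the assumptions hold.
```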
About Type I and Type II errors: keeping the sample size unchanged, every time you reduce Type I error you increase Type II error, and vice versa. To decrease both simultaneously you need bigger samples, which is always the major problem when dealing with human subjects...
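A quick numeric illustration of that trade-off, using statsmodels' power routines for a plain two-sample comparison (the specific test is incidental here, and the effect size and sample size are made up):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect, n_per_group = 0.4, 50  # hypothetical effect size and sample size

# Tightening alpha (Type I) at fixed n lowers power, i.e. raises Type II.
for alpha in (0.10, 0.05, 0.01):
    power = analysis.power(effect_size=effect, nobs1=n_per_group, alpha=alpha)
    print(f"alpha={alpha:.2f} -> power={power:.2f} (Type II rate = {1 - power:.2f})")

# Recovering 80% power after tightening alpha takes a larger sample.
n_needed = analysis.solve_power(effect_size=effect, alpha=0.01, power=0.80)
print(f"n per group for 80% power at alpha=0.01: {n_needed:.0f}")
```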
Bootstrapping provides a nonparametric approach to statistical inference when distributional assumptions may not be met. More accurate standard errors, confidence intervals, and even hypothesis tests for more complex samples can be derived using these methods.
Why would you not use bootstrapping with a large sample size? What if you were just interested in the total variance explained between models, not the estimates of the independent variables?
Reducing error variance or bias due to violation of distributional assumptions in logistic models should improve the ability to detect associations if they exist.
When dealing with designs where the experimental sampling units are limited, such as live human subjects, it is appropriate to invoke bootstrapping, especially in logistic regression models, to obtain accurate confidence intervals, standard errors, and tests of hypotheses.
It is usually useful in non-parametric tests (distribution-free settings).
It is very important to be careful with the idea of using the bootstrap and other resampling-based methods when you have a limited sample size. Although the bootstrap is useful in that situation, if you try to use it with a very small sample you can end up resampling from non-representative data and will obtain non-representative confidence intervals and estimates.
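A small simulation of that failure mode, assuming numpy; the skewed population and the sample sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def coverage(n, reps=2000, B=1000):
    """Share of percentile-bootstrap 95% CIs that cover the true mean."""
    true_mean = 1.0  # mean of an Exponential(1) population
    hits = 0
    for _ in range(reps):
        sample = rng.exponential(1.0, size=n)
        boots = rng.choice(sample, size=(B, n), replace=True).mean(axis=1)
        lo, hi = np.percentile(boots, [2.5, 97.5])
        hits += lo <= true_mean <= hi
    return hits / reps

print("coverage at n=8:  ", coverage(8))
print("coverage at n=100:", coverage(100))
# Coverage well below the nominal 95% at n=8 is exactly the
# "non-representative intervals" problem described above.
```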
No method is capable of saving bad data! Some people think that statistics can solve every problem... Garbage in, garbage out!
Assuming you have a large representative sample and your parameter of interest is as straightforward as you have implied (e.g. a beta coefficient from a logistic regression), I'm not sure whether bootstrapping would add much. Outliers might turn up. Influential points are more important than outliers, but they are more likely to occur in clusters in large samples and are therefore more difficult to identify (even with bootstrapping). I find the bootstrap estimate of bias quite useful (a minimal sketch is below). Also, when I have a parameter whose distribution/variance is difficult to determine, I usually calculate a bootstrap interval. It is quite a handy skill to have up your sleeve. If you are not familiar with the technique, one reason to use it in your situation (i.e. a bootstrap interval for a parameter from a logistic regression) is so that you learn to apply it later when you really need it.
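A minimal sketch of that bias estimate, assuming numpy and statsmodels; the data here are a simulated stand-in:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Simulated stand-in data.
n = 200
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 1.0 * x))))
X = sm.add_constant(x)

beta_hat = sm.Logit(y, X).fit(disp=0).params[1]  # original slope estimate

# Bootstrap bias estimate: mean of the replicates minus the original estimate.
B = 1000
reps = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    reps[b] = sm.Logit(y[idx], X[idx]).fit(disp=0).params[1]

bias = reps.mean() - beta_hat
print(f"estimate = {beta_hat:.3f}, bootstrap bias = {bias:.3f}")
# A bias-corrected estimate would be beta_hat - bias.
```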
It is important to anticipate the results you will get with and without bootstrapping. Bootstrapping will narrow your confidence intervals by repeatedly resampling your data with replacement (say, 1,000 times) and then estimating the statistic on each resample. The issue is that you will be more likely to find something statistically significant, so you will then have to evaluate whether the change between experimental conditions really is meaningful. In population health, for example, you will want to balance avoiding the mistaken observation that a difference exists when none actually does (Type I) against accepting that there is no change when one actually does exist (Type II). In a randomized trial you will not want to bootstrap, because you should have calculated your sample size based upon a functionally significant effect (in medicine that would be the clinical effect, for example a 10 mg pill lowering blood pressure by 10 mm Hg). Bootstrapping would be good for evaluating a pilot study to estimate an effect size for a larger, more generalizable study (a sketch of that use is below). Of course, engineering would also have applications for bootstrapping.
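A sketch of the pilot-study use, assuming numpy; the pilot outcomes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented pilot outcomes: 1 = responded, 0 = did not.
treated = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
control = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

# Percentile bootstrap of the risk difference between arms.
B = 5000
diffs = np.empty(B)
for b in range(B):
    t = rng.choice(treated, size=treated.size, replace=True)
    c = rng.choice(control, size=control.size, replace=True)
    diffs[b] = t.mean() - c.mean()

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"observed risk difference: {treated.mean() - control.mean():.2f}")
print(f"95% bootstrap interval:   ({lo:.2f}, {hi:.2f})")
# Powering the main trial on the conservative end of this interval
# guards against an over-optimistic pilot estimate.
```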
Can anyone suggest a paper that does a simulation study of bootstrapping in logit/probit models under different violations, say outliers, heteroscedasticity, skewed distributions, small samples (though that is not a violation), etc.?