What reasons would you have (and not have) for using bootstrapping to estimate confidence intervals of independent variables in a logistic model of a randomized sample of the population?
Sometimes I have found bootstrapping to be of real practical value. If the results from asymptotic methods differ markedly from the bootstrap results, this indicates a problem in the data. There might be outliers in continuous data, a very skewed distribution in categorical data, or something else, but always something that needs to be understood. Examining the reason for this difference might give you valuable insight into the problem you are analyzing.
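Here is a minimal sketch of that check in Python, assuming numpy and statsmodels are available; the data are simulated stand-ins and all variable names are illustrative, not a prescribed workflow:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated stand-in for a real randomized sample.
n = 500
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(-0.5 + 0.8 * x))))
X = sm.add_constant(x)

# Asymptotic (Wald) 95% CI for the slope from the usual ML fit.
fit = sm.Logit(y, X).fit(disp=0)
wald_ci = fit.conf_int()[1]

# Nonparametric bootstrap: resample rows with replacement, refit, collect slopes.
B = 2000
slopes = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    slopes[b] = sm.Logit(y[idx], X[idx]).fit(disp=0).params[1]

boot_ci = np.percentile(slopes, [2.5, 97.5])
print("Wald CI:     ", wald_ci)
print("Bootstrap CI:", boot_ci)
# A large discrepancy between the two intervals is the warning sign
# described above: inspect the data before trusting either interval.
```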
Heteroskedasticity and other violations, like collinearity or not including important variables, are one thing, but there is something else: isn't bootstrapping meant to improve the effect estimates and narrow the confidence intervals, in order to avoid Type I error, i.e. declaring a difference when none actually exists? In logistic modeling for public health, however, the major concern is Type II error: missing significant differences when they actually exist.
Bootstrapping will only produce narrower confidence intervals when the data do not meet the model assumptions. You can check this by simulating a simple model, as in the sketch below.
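A minimal simulation along those lines, assuming numpy and statsmodels; the model and all settings are illustrative:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)

def ci_widths(n=300, B=400):
    """Fit one simulated dataset; return (asymptotic width, bootstrap width)."""
    x = rng.normal(size=n)
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.7 * x))))
    X = sm.add_constant(x)
    lo, hi = sm.Logit(y, X).fit(disp=0).conf_int()[1]  # Wald 95% CI, slope
    slopes = []
    for _ in range(B):
        i = rng.integers(0, n, size=n)
        slopes.append(sm.Logit(y[i], X[i]).fit(disp=0).params[1])
    blo, bhi = np.percentile(slopes, [2.5, 97.5])
    return hi - lo, bhi - blo

widths = np.array([ci_widths() for _ in range(10)])
print(f"mean asymptotic width: {widths[:, 0].mean():.3f}")
print(f"mean bootstrap width:  {widths[:, 1].mean():.3f}")
# With a correctly specified model the two widths come out similar:
# bootstrapping buys no free precision when the assumptions hold.
```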
About Type I and Type II errors: keeping the sample size unchanged, every time you reduce Type I error you increase Type II error, and vice versa. To decrease both simultaneously you need bigger samples, which is always the major problem when dealing with human subjects...
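A quick numeric illustration of that trade-off, using statsmodels' power routines for a plain two-sample comparison (the specific test is incidental here, and the effect size and sample size are made up):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
effect, n_per_group = 0.4, 50  # hypothetical effect size and sample size

# Tightening alpha (Type I) at fixed n lowers power, i.e. raises Type II.
for alpha in (0.10, 0.05, 0.01):
    power = analysis.power(effect_size=effect, nobs1=n_per_group, alpha=alpha)
    print(f"alpha={alpha:.2f} -> power={power:.2f} (Type II rate = {1 - power:.2f})")

# Recovering 80% power after tightening alpha takes a larger sample.
n_needed = analysis.solve_power(effect_size=effect, alpha=0.01, power=0.80)
print(f"n per group for 80% power at alpha=0.01: {n_needed:.0f}")
```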
Bootstrapping provides a nonparametric approach to statistical inference when distributional assumptions may not be met. More accurate standard errors, confidence intervals, and even hypothesis tests for more complex samples can be derived using these methods.
Why would you not use bootstrapping with a large sample size? What if you were just interested in the total variance explained between models, not the estimates of the independent variables?
Reducing error variance or bias due to violation of distributional assumptions in logistic models should improve the ability to detect associations if they exist.
When dealing with designs where the experimental sampling units are limited, such as live human subjects, it is appropriate to invoke bootstrapping, especially in logistic regression models, to obtain accurate confidence intervals, standard errors, and tests of hypotheses.
It is usually useful in non-parametric tests (distribution-free settings).
It is very important to be careful with the idea of using the bootstrap and other resampling-based methods when you have a limited sample size. Although the bootstrap is useful in that situation, if you try to use it with a very small sample you can end up resampling from non-representative data and will obtain non-representative confidence intervals and estimates.
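A small simulation of that failure mode, assuming numpy; the skewed population and the sample sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def coverage(n, reps=2000, B=1000):
    """Share of percentile-bootstrap 95% CIs that cover the true mean."""
    true_mean = 1.0  # mean of an Exponential(1) population
    hits = 0
    for _ in range(reps):
        sample = rng.exponential(1.0, size=n)
        boots = rng.choice(sample, size=(B, n), replace=True).mean(axis=1)
        lo, hi = np.percentile(boots, [2.5, 97.5])
        hits += lo <= true_mean <= hi
    return hits / reps

print("coverage at n=8:  ", coverage(8))
print("coverage at n=100:", coverage(100))
# Coverage well below the nominal 95% at n=8 is exactly the
# "non-representative intervals" problem described above.
```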
No method is capable of saving bad data! Some people think that statistics can solve every problem... Garbage in, garbage out!
Assuming you have a large representative sample and your parameter of interest is as straightforward as you have implied (e.g. a beta coefficient from a logistic regression), I'm not sure whether bootstrapping would add much. Outliers might turn up. Influential points are more important than outliers, but they are more likely to occur in clusters in large samples and are therefore more difficult to identify (even with bootstrapping). I find the bootstrap estimate of bias quite useful (a minimal sketch is below). Also, when I have a parameter whose distribution/variance is difficult to determine, I usually calculate a bootstrap interval. It is quite a handy skill to have up your sleeve. If you are not familiar with the technique, one reason to use it in your situation (i.e. a bootstrap interval for a parameter from a logistic regression) is so that you learn to apply it later when you really need it.
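A minimal sketch of that bias estimate, assuming numpy and statsmodels; the data here are a simulated stand-in:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Simulated stand-in data.
n = 200
x = rng.normal(size=n)
y = rng.binomial(1, 1 / (1 + np.exp(-(0.2 + 1.0 * x))))
X = sm.add_constant(x)

beta_hat = sm.Logit(y, X).fit(disp=0).params[1]  # original slope estimate

# Bootstrap bias estimate: mean of the replicates minus the original estimate.
B = 1000
reps = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, size=n)
    reps[b] = sm.Logit(y[idx], X[idx]).fit(disp=0).params[1]

bias = reps.mean() - beta_hat
print(f"estimate = {beta_hat:.3f}, bootstrap bias = {bias:.3f}")
# A bias-corrected estimate would be beta_hat - bias.
```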
It is important to anticipate the results you will get with and without bootstrapping. Bootstrapping will narrow your confidence intervals by repeatedly resampling your data with replacement (say, 1,000 times) and then estimating the statistic on each resample. The issue is that you will be more likely to find something statistically significant, so you will then have to evaluate whether the change between experimental conditions really is meaningful. In population health, for example, you will want to balance avoiding the mistaken observation that a difference exists when none actually does (Type I) against accepting that there is no change when one actually does exist (Type II). In a randomized trial you will not want to bootstrap, because you should have calculated your sample size based upon a functionally significant effect (in medicine that would be the clinical effect, for example a 10 mg pill lowering blood pressure by 10 mm Hg). Bootstrapping would be good for evaluating a pilot study to estimate an effect size for a larger, more generalizable study (a sketch of that use is below). Of course, engineering would also have applications for bootstrapping.
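A sketch of the pilot-study use, assuming numpy; the pilot outcomes below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented pilot outcomes: 1 = responded, 0 = did not.
treated = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1])
control = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0])

# Percentile bootstrap of the risk difference between arms.
B = 5000
diffs = np.empty(B)
for b in range(B):
    t = rng.choice(treated, size=treated.size, replace=True)
    c = rng.choice(control, size=control.size, replace=True)
    diffs[b] = t.mean() - c.mean()

lo, hi = np.percentile(diffs, [2.5, 97.5])
print(f"observed risk difference: {treated.mean() - control.mean():.2f}")
print(f"95% bootstrap interval:   ({lo:.2f}, {hi:.2f})")
# Powering the main trial on the conservative end of this interval
# guards against an over-optimistic pilot estimate.
```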
Can anyone suggest a paper that does a simulation study of bootstrapping in logit/probit models under different violations, say outliers, heteroscedasticity, skewed distributions, small samples (though that is not a violation), etc.?