It seems that some journals are now considering Bonferroni adjustment of the p-values for individual terms within a multiple regression model. Has anyone else noticed this? What do you think of the trend?
The challenge of dealing with multiple testing in the same body of data is real. Performing many stochastically dependent tests may result in serious inflation of the type I error rate (i.e. accepting 'spurious' significance results as 'real'). There are many less conservative modifications that deal with the Bonferroni inequality. One approach that preserves much statistical power would be the false discovery rate approach (there was a nice review paper on that in about 2006 in Ecoscience).
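Just to make that inflation concrete, here is a minimal Python sketch (assuming independent tests, which is optimistic; with dependent tests the exact numbers differ) of how the familywise error rate grows with the number of tests at a nominal alpha of 0.05:

```python
# Familywise error rate (FWER) under the optimistic assumption of independent
# tests: P(at least one false positive) = 1 - (1 - alpha)^m
alpha = 0.05
for m in (1, 5, 10, 20, 50):
    fwer = 1 - (1 - alpha) ** m
    print(f"{m:>3} tests at alpha={alpha}: FWER ~ {fwer:.2f}")
```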
With regression models containing many predictors, one may alternatively go for a different approach. Instead of classical hypothesis testing, you could follow a model-building strategy, guided for example by Akaike's information criterion (AIC, or AICc for small sample sizes) or the Bayesian information criterion (BIC).
Both of these information criteria allow you to compare models on the basis of (a) how good the fit to the data is PLUS (b) how few parameters are required to achieve it. In this framework, the 'significance' of individual predictor variables no longer attracts much interest.
Frequently, you may then end up with a number of models of more or less equal goodness of fit and parsimony. In that case, model averaging rather than significance testing would be a viable strategy.
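As a minimal sketch of that strategy in Python with statsmodels (the data frame, the predictors x1-x3, and the candidate model set are made up for illustration): fit a few candidate models, rank them by AIC, and convert the AIC differences into Akaike weights that could feed a model average.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data frame with a response y and three candidate predictors.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=["y", "x1", "x2", "x3"])

# A small candidate set of models.
formulas = ["y ~ x1", "y ~ x1 + x2", "y ~ x1 + x2 + x3", "y ~ x2 + x3"]
fits = {f: smf.ols(f, data=df).fit() for f in formulas}

# Rank by AIC and compute Akaike weights: w_i = exp(-0.5*delta_i) / sum(exp(-0.5*delta_j))
aic = np.array([fit.aic for fit in fits.values()])
delta = aic - aic.min()
weights = np.exp(-0.5 * delta) / np.exp(-0.5 * delta).sum()

for f, a, w in zip(formulas, aic, weights):
    print(f"{f:<22} AIC={a:7.1f}  weight={w:.2f}")
```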
Anyhow, as long as you go for multiple hypothesis testing, it is certainly wise to keep the Bonferroni problem in mind. Otherwise marginally 'significant' relationships may easily be given undue weight.
The most famous ways to adjust for multiple comparisons are the Bonferroni test (sometimes the only one that some researchers know) and the Scheffé test. One should keep in mind that the Bonferroni test is very conservative. For the Bonferroni correction, you simply multiply each observed p-value by the number of tests you perform.
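For example, a minimal sketch of that multiplication in Python (the p-values are made up, and adjusted values are capped at 1):

```python
# Bonferroni: multiply each raw p-value by the number of tests, capping at 1.
raw_p = [0.012, 0.049, 0.20, 0.003]           # hypothetical p-values
m = len(raw_p)
bonferroni_p = [min(p * m, 1.0) for p in raw_p]
print(bonferroni_p)                            # [0.048, 0.196, 0.8, 0.012]
```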
The Šidák-Holm (Holm-Šidák) method for correcting for multiple comparisons is less well known, and it is also less conservative.
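If it helps, statsmodels exposes both corrections through the same function; a small sketch with made-up p-values (assuming statsmodels is installed):

```python
from statsmodels.stats.multitest import multipletests

raw_p = [0.012, 0.049, 0.20, 0.003]  # hypothetical p-values
for method in ("bonferroni", "holm-sidak"):
    reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method=method)
    print(method, adj_p.round(3), reject)
```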
I had thought that a major difference between multivariable fits and multiple bivariate fits (such as t-tests) is that considering several independent variables simultaneously allows you to look at the estimated effect of X1 after X2...Xn have been accounted for. The Bonferroni adjustment, on the other hand, treats each X-vs-Y comparison in a vacuum.
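To make that difference concrete, a small Python sketch (simulated data; statsmodels assumed): the coefficient of x1 from the joint model "y ~ x1 + x2" is generally not the same as the slope from the simple regression "y ~ x1", because the joint fit adjusts for x2.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data in which x1 and x2 are correlated and both drive y.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.7 * x1 + rng.normal(scale=0.7, size=200)
y = 1.0 * x1 + 2.0 * x2 + rng.normal(size=200)
df = pd.DataFrame({"y": y, "x1": x1, "x2": x2})

# Simple (bivariate) fit vs. joint fit: compare the x1 coefficients.
simple = smf.ols("y ~ x1", data=df).fit()
joint = smf.ols("y ~ x1 + x2", data=df).fit()
print("x1 alone:    ", round(simple.params["x1"], 2))  # absorbs part of x2's effect
print("x1 given x2: ", round(joint.params["x1"], 2))   # closer to the true 1.0
```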
I attach a file with the paper by Benjamini and Hochberg about this issue.
I find their approach rather simple and feasible (but I'm not a statistician!). My advice is to trust the plausibility of the associations being looked for more than the sifting of p-values! After all, 0.05 is an arbitrary cut-off. What I have to face is the request from producers of hundreds of data points to test hundreds of null hypotheses and then look for potentially meaningful significance. I think it should be stressed that one hypothesis should be specified and tested.
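For readers who want to see the Benjamini-Hochberg step-up rule in code, here is a minimal sketch with made-up p-values (statsmodels' 'fdr_bh' option implements the same procedure and is used as a cross-check):

```python
import numpy as np
from statsmodels.stats.multitest import multipletests

raw_p = np.array([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205])  # hypothetical
q = 0.05                                   # desired false discovery rate
m = len(raw_p)

# Benjamini-Hochberg step-up: find the largest k with p_(k) <= (k/m) * q,
# then reject the k hypotheses with the smallest p-values.
order = np.argsort(raw_p)
sorted_p = raw_p[order]
below = sorted_p <= (np.arange(1, m + 1) / m) * q
k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
print("number of rejections:", k)

# Cross-check with statsmodels' implementation of the same procedure.
reject, adj_p, _, _ = multipletests(raw_p, alpha=q, method="fdr_bh")
print("statsmodels rejects :", int(reject.sum()))
```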
Thanks for the paper, Vincenzo. Like the Holm method, this seems straightforward.
I wish people would not fixate on p-values. If you trim a multivariate model using AIC, which focuses on the overall performance of the model, often a few of the surviving parameters have p > 0.05.
Along these lines, I had a question about how best to correct for multiple comparisons when performing a few multiple regressions within a single study. Specifically, I am using 3 behavioral measures to predict functional connectivity in resting-state fMRI data. So, in the study I perform 3 separate regressions, and I have already performed an FWE cluster correction on the results from each regression.
Thanks for the question, Carissa! I have the same problem with measures of connectivity strength in 8 multiple regression models with the same independent variables and confounds. How would you control for multiple testing?
Could anyone recommend how to apply a correction for multiple comparisons (for example, FDR) in hierarchical linear regression? Should it be applied only to the final model, to each model separately, or to all models together? For example, I have 2 predictors in the first block, 3 in the second, and 6 in the third. Do I correct only the final model (11 predictors), the three models separately (2, 5, and 11 predictors), or all predictors across all models (2 + 5 + 11 = 18 predictors), even though some predictors are repeated?
Could "correction for multiple comparisons" be applicable for multivariable logistic regression? If so, how could it be done in SPSS? Please suggest me reading materials regaring this topic.