How Do I Control for Multiple Comparisons Using FDR in Beta Regressions?

24 March 2021

Good afternoon stats experts,

I know this may be excessive, but I am currently working on a project (N = 61) investigating the effects of white matter metrics on memory development in children. I have run 4 sets of 9 beta regressions (using betareg in R) whose formulae look like this: [Memory Measure] ~ [IQ] + [Control Tract] + [Tract of Interest]. The only thing that varies within the sets is the memory measure (there are 9), and the only thing that varies between the sets is the type of white matter index we are looking at (i.e. FA, MD, streamlines, etc.). IQ and the control tract index are our control measures. In every one of these models, the only variable we are truly interested in is our tract of interest. With the way I have things set up currently, I have run a whopping 36 beta regressions. Note that all of these analyses are theory- and hypothesis-driven.

It is also important to note that I run a likelihood-ratio test for each of these models (using lrtest() in R), which gives me a chi-square statistic and a p-value for whether including our variable of interest significantly changes the log-likelihood of the model.
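For concreteness, here is a minimal sketch of one such model and its likelihood-ratio test. The data are simulated stand-ins and every variable name (`memory`, `iq`, `control_tract`, `tract`) is hypothetical; only betareg() and lrtest() (from the lmtest package) come from the setup described above.

```r
library(betareg)  # beta regression
library(lmtest)   # lrtest()

# Simulated stand-in data (N = 61); not real results, names are hypothetical.
set.seed(1)
n <- 61
d <- data.frame(iq = rnorm(n), control_tract = rnorm(n), tract = rnorm(n))
mu <- plogis(0.2 * d$iq + 0.4 * d$tract)      # mean of the outcome, in (0, 1)
d$memory <- rbeta(n, mu * 20, (1 - mu) * 20)  # beta outcome with precision 20

# Model without and with the tract of interest
reduced <- betareg(memory ~ iq + control_tract, data = d)
full    <- betareg(memory ~ iq + control_tract + tract, data = d)

# Likelihood-ratio test for adding `tract`
lr <- lrtest(reduced, full)
print(lr)

# The p-value sits in the second row of the "Pr(>Chisq)" column
p_tract <- lr$`Pr(>Chisq)`[2]
```

Repeating this for every memory measure and white matter index gives one p-value per model, 36 in total.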

When I brought up this issue with my PI, she recommended that I control for multiple comparisons using an FDR correction due to the sheer number of models I have run - not due to the number of predictors in each model. However, I am unsure (1) if this is necessary, and (2) how to approach this technically speaking. I am trying to use the p.adjust() function in R, which requires you to input a vector of p-values for which you are trying to correct. If this is the correct approach to be taking, I need to know the following:

When listing the vector of p-values in the p.adjust() function, should I be doing this for just the three p-values yielded for each predictor? That is, should I do the adjustment separately for each model in each set (i.e. 36 times)? Should I include the p-value from the likelihood-ratio test as well?

Would it be correct instead for me to list all the p-values of all the models in each set in order to perform a correction?

Would it be correct for me to list all of the p-values across sets in order to perform the correction?

Is any correction needed at all?

Thank you in advance for your expertise! Any advice would be sincerely appreciated.

Kind regards,

Linda

Jochen Wilhelm

There will surely be different opinions, all with their pros and cons. Whatever you do here, you will likely find some expert saying that what you did is wrong. So don't expect to get "the correct" answer.

Your study seems to be quite exploratory, and if you step away from painting black-and-white pictures of your findings ("this is significant and that is not"), there is no need to do any correction. But it requires a careful interpretation. You can always say that your data was most consistent with this and most inconsistent with that hypothesis, leaving it open if the evidence from your data is sufficient to make some general or "absolute" claim.

If you want to pick "the most promising tract", you should not rely on the p-value anyway. You should look at the entire confidence intervals and interpret them (as a whole, from the upper to the lower limit; not just whether or not they include a logOR of zero). For this, I also don't see the necessity to adjust p-values.

If you use your models to screen for "effective tracts" and select the candidates using p-values (e.g. to give a more or less small list of candidates), then you should adjust the p-values. You should consider whether this list should have a small probability of containing at least one "false positive" (->FWER) or whether you want to control the proportion of false positives in this list (->FDR). The p-values to be adjusted are all p-values that are used to decide if a tract will make it into your list.
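As a rough sketch of that last point, assume the screening decision rests on one p-value per model for the tract of interest, i.e. 36 p-values in total. The values below are simulated placeholders, not results, and the label `index_4` is made up because the fourth white matter index is not named in the question. Only p.adjust() from base R is used.

```r
# Placeholder p-values: 9 memory measures x 4 white matter indices.
# Simulated purely for illustration; replace with the real p-values.
set.seed(42)
p_raw <- matrix(runif(36), nrow = 9, ncol = 4,
                dimnames = list(paste0("memory_", 1:9),
                                c("FA", "MD", "streamlines", "index_4")))

# All 36 p-values feed the same selection, so they are adjusted together.
# p.adjust() returns a plain vector; assigning into `[]` keeps the layout.
p_fdr <- p_raw
p_fdr[] <- p.adjust(p_raw, method = "BH")     # Benjamini-Hochberg, FDR

p_fwer <- p_raw
p_fwer[] <- p.adjust(p_raw, method = "holm")  # Holm, FWER

# Models that would enter the candidate list at an FDR of 5%
candidates <- which(p_fdr < 0.05, arr.ind = TRUE)
print(round(p_fdr, 3))
print(candidates)
```

If the selection were instead made separately within each white matter index, the same call could be applied column by column, e.g. `apply(p_raw, 2, p.adjust, method = "BH")`, so that each family holds 9 p-values.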

Linda Hoffman

Dear Jochen Wilhelm,

Thank you so much for your grounded and insightful response. I am going to go ahead and proceed without the correction, and instead construct confidence intervals around my beta coefficients and advance with caution in my interpretations. I would like to pick your brain about one last thing if you don't mind: given that this is not a standard linear regression, but rather a beta regression, what method for confidence interval construction do you feel would be the most prudent? I am thinking of using a bootstrapping method in R.
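Here is a minimal sketch of what I have in mind: a case-resampling percentile bootstrap for the tract coefficient, shown next to the Wald interval from confint(). The data are simulated stand-ins, the variable names are hypothetical, and the number of resamples (500) is an arbitrary choice for illustration.

```r
library(betareg)

# Simulated stand-in data (N = 61); not real results, names are hypothetical.
set.seed(1)
n <- 61
d <- data.frame(iq = rnorm(n), control_tract = rnorm(n), tract = rnorm(n))
mu <- plogis(0.2 * d$iq + 0.4 * d$tract)
d$memory <- rbeta(n, mu * 20, (1 - mu) * 20)

fit <- betareg(memory ~ iq + control_tract + tract, data = d)

# Resample children with replacement, refit, and keep the tract coefficient.
# A refit that fails to converge is recorded as NA instead of stopping the loop.
B <- 500
boot_tract <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  refit <- tryCatch(
    betareg(memory ~ iq + control_tract + tract, data = d[idx, ]),
    error = function(e) NULL
  )
  if (is.null(refit)) NA_real_ else coef(refit)[["tract"]]
})

# 95% percentile bootstrap interval vs. the Wald interval
ci_boot <- quantile(boot_tract, c(0.025, 0.975), na.rm = TRUE)
ci_wald <- confint(fit)["tract", ]
print(ci_boot)
print(ci_wald)
```

Both intervals are on the logit scale of the mean model, so they describe the change in the log-odds of the expected memory score per unit of the tract index.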

Kind regards,

Linda

Wim Kaijser

Would a constrained ordination (e.g. RDA) perhaps also be useful to explore the data before regression? You have 9 types of memory measures and three predictors ([IQ] + [Control Tract] + [Tract of Interest]), if I understand correctly? I am not sure whether it adds much, but it can give a general overview of the "relations" in the dataset. See: https://sites.google.com/site/mb3gustame/constrained-analyses/rda, or, for R code:

https://fukamilab.github.io/BIO202/06-B-constrained-ordination.html
