How to interpret reference levels and interaction terms in a negative binomial GLMM with multiple multi-level categorical variables?

05 July 2025

Hi everyone, I’m working on a statistical analysis to test the effects of various environmental conditions and planting techniques on plant survival in a revegetation project. I’d really appreciate any advice on interpreting my model output and choosing reference levels.

I chose a Generalized Linear Mixed Model (GLMM) because individual plants are nested within sectors of the site and plantings were made in different years, so observations are grouped by both sector and year. The response variable is survival, which follows a binomial distribution. All of my explanatory variables, both fixed and random effects, are categorical:

Model specifications:

Fixed effects: Slope (2 levels), Exposure (2 levels), Species (6 levels), Technique (2 levels), Ecosystem (4 levels)
Random effects: Monitoring year, Sector

I performed model selection using likelihood‐ratio tests (LRT) and then validated with residual simulations using the DHARMa package. After comparing different effect structures and checking residuals, I concluded that a negative‐binomial GLMM (nbinom2) fitted with glmmTMB provides the best fit:

glmmTMB(
  Alive ~ Species + Exposure + Species:Ecosystem + Technique:Exposure +
    (1 | Monitoring) + (1 | Sector) + offset(logPlantsTotal),
  family = nbinom2,
  data = my_data
)

(I attached an image of the summary, which is in Spanish.)

Specific questions:

Reference levels: In the model summary, I see 26 coefficients, but I’m unclear about which reference level is used for each factor. I understand that, by default, R uses the first level (alphabetically or by factor order), and that the intercept refers to this baseline. However, due to the number of variables and interactions, I find it difficult to interpret which levels are being compared in each case—especially in the interaction terms. Is there a systematic way to identify all reference levels from the glmmTMB output?

Reporting results: What is the recommended way to report main and interaction effects from this type of model?

Use of emmeans: I’ve attempted to use the emmeans package to conduct post-hoc pairwise comparisons between levels of my categorical predictors. However, I’m unsure whether this is valid in this model. Is emmeans appropriate in this context?
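On the reference-level question above, one systematic check is to look at the factor levels and at the column names of the fixed-effects design matrix, rather than at the summary table alone. The following is a minimal sketch in base R, assuming default treatment contrasts. The data frame `toy` and all level names (such as "sp_A" or "North") are invented placeholders, not the real data.

```r
# Hypothetical toy data: all level names are invented for illustration.
toy <- expand.grid(
  Species   = factor(c("sp_A", "sp_B", "sp_C")),
  Exposure  = factor(c("North", "South")),
  Technique = factor(c("Manual", "Mechanical")),
  Ecosystem = factor(c("Forest", "Shrubland"))
)

# Under R's default treatment contrasts, the first level of each factor
# is its reference level.
ref_levels <- sapply(toy[sapply(toy, is.factor)], function(f) levels(f)[1])
print(ref_levels)

# The design-matrix column names show which levels and level combinations
# get their own coefficient; a level that never appears in a term's
# columns is the baseline for that term.
X <- model.matrix(
  ~ Species + Exposure + Species:Ecosystem + Technique:Exposure,
  data = toy
)
print(colnames(X))

# The baseline is changed on the factor itself, before fitting the model.
toy$Exposure <- relevel(toy$Exposure, ref = "South")
print(levels(toy$Exposure))
```

Running the same `levels()` and `model.matrix()` calls on `my_data` with the fixed-effects part of the formula should list the same terms that appear in the fixed-effects table of the summary.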

Any insights or methodological suggestions would be highly appreciated. I also welcome recommendations on best practices for interpreting and presenting results from GLMMs of this kind.
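For the emmeans and reporting questions above, here is a hedged sketch of how estimated marginal means and pairwise comparisons might be obtained from a model of this form. It uses simulated data with invented level names and effect sizes, and a reduced version of the formula, so it only illustrates the calls and says nothing about the real results. It assumes the glmmTMB and emmeans packages are installed. Passing `offset = 0` is one possible choice that expresses the means per planted individual rather than per row.

```r
# Sketch only: simulated data with invented level names and effect sizes.
set.seed(1)
sim <- expand.grid(
  Technique  = factor(c("Manual", "Mechanical")),
  Exposure   = factor(c("North", "South")),
  Sector     = factor(paste0("S", 1:10)),
  Monitoring = factor(2021:2024)
)
sector_eff <- rnorm(10, sd = 0.3)  # simulated sector-level deviations
year_eff   <- rnorm(4, sd = 0.3)   # simulated year-level deviations
sim$PlantsTotal    <- sample(20:60, nrow(sim), replace = TRUE)
sim$logPlantsTotal <- log(sim$PlantsTotal)
log_rate <- log(0.5) + 0.3 * (sim$Technique == "Mechanical") +
  sector_eff[as.integer(sim$Sector)] + year_eff[as.integer(sim$Monitoring)]
# Negative binomial draws are not capped at PlantsTotal; this only mirrors
# the count-with-offset structure of the model in the post.
sim$Alive <- rnbinom(nrow(sim), mu = sim$PlantsTotal * exp(log_rate), size = 5)

# The fit and the comparisons need glmmTMB and emmeans to be installed.
if (requireNamespace("glmmTMB", quietly = TRUE) &&
    requireNamespace("emmeans", quietly = TRUE)) {
  fit <- glmmTMB::glmmTMB(
    Alive ~ Exposure + Technique:Exposure +
      (1 | Monitoring) + (1 | Sector) + offset(logPlantsTotal),
    family = glmmTMB::nbinom2, data = sim
  )
  # Estimated marginal means for Technique within each Exposure level,
  # back-transformed from the log link; offset = 0 requests rates per plant.
  emm <- emmeans::emmeans(fit, ~ Technique | Exposure,
                          type = "response", offset = 0)
  print(emm)
  # Pairwise Technique comparisons within Exposure, reported as ratios.
  print(pairs(emm))
}
```

Because the model uses a log link, the back-transformed pairwise comparisons are ratios, which are usually easier to report than raw coefficients for interaction terms.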

Luis Daniel Ramirez

Jochen Wilhelm

Thank you very much for your response. Your observations have certainly helped me broaden my perspective on the analysis.

Regarding your first comment, I hadn’t considered it in that way before, and I agree that it could pose a challenge for fitting an appropriate model. In this case, I should mention that my dataset is currently structured in an aggregated format, where each row represents all individuals of a given species observed within a sector during a specific year. Given this, would it be a good strategy to restructure the dataset in a disaggregated format, where each row corresponds to an individual observation? Another possibility I’ve considered is reducing the number of variables, but I wonder if this would be valid, considering that in the current model selection process, these variables were identified as significant.
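To make the restructuring idea concrete, here is a minimal base R sketch of expanding aggregated rows into one row per plant. The data frame `agg` and its values are hypothetical, and the column `Dead` is an assumed helper computed as total minus alive.

```r
# Hypothetical aggregated rows: one row per species x sector x year.
agg <- data.frame(
  Species     = c("sp_A", "sp_B"),
  Sector      = c("S1", "S1"),
  Monitoring  = c(2023, 2023),
  Alive       = c(3, 1),
  PlantsTotal = c(5, 4)
)
agg$Dead <- agg$PlantsTotal - agg$Alive

# Repeat each aggregated row once per planted individual.
idx  <- rep(seq_len(nrow(agg)), times = agg$PlantsTotal)
long <- agg[idx, c("Species", "Sector", "Monitoring")]
rownames(long) <- NULL

# Within each group, code the survivors as 1 and the dead plants as 0.
long$Survived <- unlist(Map(
  function(a, d) rep(c(1L, 0L), times = c(a, d)),
  agg$Alive, agg$Dead
))

print(long)
```

As far as I understand, when all predictors are measured at the group level, a binomial model on the aggregated counts (alive and dead per row) carries the same information as one on the expanded 0/1 rows, so disaggregating would mainly help if plant-level covariates were available.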

I also appreciate your second observation. It does appear that the data show some degree of zero inflation, so I believe the question you raised will be important to address in the written analysis.

Your clarifications to my specific questions were also very helpful and have given me a clearer understanding. Thank you again for your valuable feedback; it has been very useful in helping me move forward with the analysis.
