Hi everyone, I’m working on a statistical analysis to test the effects of various environmental conditions and planting techniques on plant survival in a revegetation project. I’d really appreciate any advice on interpreting my model output and choosing reference levels.

I chose a generalized linear mixed model (GLMM) because the data are grouped: each individual plant is nested within a sector of the site, and plantings took place in different years. The response variable is survival, which follows a binomial distribution. All of my explanatory variables, both fixed and random effects, are categorical:

Model specifications:

  • Fixed effects: Slope (2 levels), Exposure (2 levels), Species (6 levels), Technique (2 levels), Ecosystem (4 levels)
  • Random effects: Monitoring year, Sector

I performed model selection using likelihood-ratio tests (LRTs) and then validated the candidate models with simulation-based residual checks from the DHARMa package. After comparing different effect structures and checking the residuals, I concluded that a negative-binomial GLMM (nbinom2) fitted with glmmTMB provides the best fit:

glmmTMB(
  Alive ~ Species + Exposure + Species:Ecosystem + Technique:Exposure +
    (1 | Monitoring) + (1 | Sector) + offset(logPlantsTotal),
  family = nbinom2,
  data = my_data
)

(I have attached an image of the model summary, which is in Spanish.)
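
To make the fixed-effects part of the formula above easier to read, here is a minimal base-R sketch that expands it into design-matrix columns. It uses a synthetic, fully crossed design with the same numbers of levels as listed earlier. All level names (sp1, north, techA, eco1, and so on) are hypothetical placeholders and not my real data, and the random effects and offset are left out because they do not add fixed-effect coefficients.

```r
# Synthetic, fully crossed design with the factor sizes listed in the post.
# All level names are hypothetical placeholders, not the real data.
design <- expand.grid(
  Species   = paste0("sp", 1:6),
  Exposure  = c("north", "south"),
  Technique = c("techA", "techB"),
  Ecosystem = paste0("eco", 1:4)
)  # character inputs become factors (stringsAsFactors = TRUE by default)

# Fixed-effects part of the model formula (random effects and offset omitted).
fixed_formula <- ~ Species + Exposure + Species:Ecosystem + Technique:Exposure

# Under R's default treatment contrasts, each column of X corresponds to
# one fixed-effect coefficient.
X <- model.matrix(fixed_formula, data = design)
print(colnames(X))
n_coef <- ncol(X)

# The first level of each factor is its baseline under treatment contrasts.
baseline <- vapply(design, function(f) levels(f)[1], character(1))
print(baseline)
```

Because Ecosystem and Technique enter the formula only through interactions, R codes those terms as effects within each level of the other factor (for example, one set of Ecosystem contrasts per Species) rather than as departures from additivity. The column names printed by `colnames(X)` show which levels each coefficient refers to. The number of columns on real data may differ from this balanced sketch.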

Specific questions:

  • Reference levels: In the model summary, I see 26 coefficients, but I’m unclear about which reference level is used for each factor. I understand that, by default, R uses the first level (alphabetically or by factor order), and that the intercept refers to this baseline. However, due to the number of variables and interactions, I find it difficult to interpret which levels are being compared in each case—especially in the interaction terms. Is there a systematic way to identify all reference levels from the glmmTMB output?
  • Reporting results: What is the recommended way to report main and interaction effects from this type of model?
  • Use of emmeans: I’ve attempted to use the emmeans package to conduct post-hoc pairwise comparisons between levels of my categorical predictors. However, I’m unsure whether this is valid in this model. Is emmeans appropriate in this context?
  • Any insights or methodological suggestions would be highly appreciated. I also welcome recommendations on best practices for interpreting and presenting results from GLMMs of this kind.
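
For the reference-level question above, a minimal base-R sketch of how the baseline of a single factor can be inspected and changed. The `technique` vector and its level names ("planting", "seeding") are hypothetical stand-ins for a column of `my_data`. The sketch assumes R's default treatment contrasts for unordered factors.

```r
# Hypothetical stand-in for one factor column of my_data.
technique <- factor(c("seeding", "planting", "planting", "seeding"))

# Levels are sorted alphabetically by default; under treatment contrasts
# the first level is the reference.
print(levels(technique))
ref_default <- levels(technique)[1]

# relevel() moves the chosen level to the first position, which makes it
# the reference in any model fitted afterwards.
technique2 <- relevel(technique, ref = "seeding")
ref_new <- levels(technique2)[1]

# contrasts() shows the coding: the reference level is the all-zero row.
coding <- contrasts(technique2)
print(coding)
```

Applying `levels()` to each factor in the data before fitting gives the full set of baselines, and refitting after `relevel()` changes which level the intercept and the contrasts refer to.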
