Forgive the lack of a reproducible example in this question, as my problem stems from analysing a large (>50000 rows) dataset.

I want to ask a question about generalised linear mixed effects model (GLMM) diagnostics; I'm less familiar with handling GLMMs than GLMs. I am analysing a dataset where the response has a ‘fat-tailed’ distribution. Its characteristics are: continuous, strictly positive values with a strong positive skew and a maximum value of 1.

I have used the R packages lme4 and glmmTMB for the models themselves, and the packages DHARMa and MuMIn (plus base R) for my diagnostics.

I have fitted models with the following family(link) combinations: Gamma(inverse), Gamma(log), Beta(logit) and Gaussian(log). From some reading around, I'm using simulateResiduals() from DHARMa because a normal QQ plot isn't appropriate for most of these distributions.
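
For concreteness, here is a minimal glmmTMB/DHARMa sketch of the four candidate fits. The data, the predictor `x`, the grouping factor `grp` and the formula are all simulated stand-ins, not my actual variables:

```r
library(glmmTMB)
library(DHARMa)

# Hypothetical stand-in data: substitute your own response, predictors
# and random-effect grouping factor
set.seed(1)
dat <- data.frame(x = rnorm(200), grp = factor(rep(1:20, each = 10)))
dat$y <- plogis(0.5 + 0.8 * dat$x + rnorm(200, sd = 0.5))  # toy response in (0, 1)

f <- y ~ x + (1 | grp)
m_gamma_inv <- glmmTMB(f, family = Gamma(link = "inverse"),     data = dat)
m_gamma_log <- glmmTMB(f, family = Gamma(link = "log"),         data = dat)
m_beta      <- glmmTMB(f, family = beta_family(link = "logit"), data = dat)
m_gauss_log <- glmmTMB(f, family = gaussian(link = "log"),      data = dat)

# Simulation-based residual checks: QQ plot plus residuals vs predicted
plot(simulateResiduals(m_beta))
```

I run the same `simulateResiduals()` check on each of the four fits and compare the plots side by side.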

My issue is that I’ve fitted a selection of models to try to settle on the most appropriate one, and I get conflicting results from different diagnostics, so I’m not sure what to do next.

In order of best to worst, looking at the DHARMa QQ plot and residuals-vs-predicted plots, the ranking is:

  • Beta
  • Gaussian
  • Gamma(log)
  • Gamma(inverse)
When using AIC (or AICc or BIC), the order is:

  • Gamma(log)
  • Beta
  • Gamma(inverse)
  • Gaussian
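
For reference, this is roughly how I tabulate the information criteria with MuMIn. The model object names (m_gamma_log etc.) are placeholders for however your fits are stored:

```r
library(MuMIn)

# m_gamma_log, m_beta, m_gamma_inv, m_gauss_log are placeholder names
# for the four fitted model objects
model.sel(m_gamma_log, m_beta, m_gamma_inv, m_gauss_log)  # AICc, delta, weights

# Plain AIC and BIC side by side for the same candidates
cand <- list(gamma_log = m_gamma_log, beta = m_beta,
             gamma_inv = m_gamma_inv, gauss_log = m_gauss_log)
sapply(cand, AIC)
sapply(cand, BIC)
```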
When I fit the mean estimate to the response data and eyeball it, the order is:

  • Gaussian
  • Gamma(inverse)
  • Gamma(log)
  • Beta
When I look at the prediction intervals, the order is:

  • Beta
  • Gamma(log)
  • Gamma(inverse)
  • Gaussian
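
For what it's worth, I compute the prediction intervals by simulating from the fitted model; a sketch of that, with `m_beta` standing in for whichever model is being checked:

```r
# m_beta is a placeholder for any one of the fitted glmmTMB models
sims <- simulate(m_beta, nsim = 1000)  # one column per simulated dataset

# Empirical 95% prediction bounds for each observation
pi95 <- t(apply(as.matrix(sims), 1, quantile, probs = c(0.025, 0.975)))
colnames(pi95) <- c("lwr", "upr")
head(pi95)
```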
And if I look just at the confidence intervals for the fixed effects, the order is:

  • Gamma(inverse)
  • Beta
  • Gamma(log)
  • Gaussian
At the moment I am leaning towards the model with a beta family. Even though its mean estimate is the ‘worst’ of the four, it is still quite a good fit by eye (the logit link just flattens the estimate relative to the others); its prediction intervals and QQ plot are the best, and its AIC is OK. It is also the best one on paper in terms of matching the characteristics of the response data. Is that a reasonable assessment? Does anybody have other ideas, either about the checks I have already done or about other diagnostics I could run that I haven’t thought of?
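
One wrinkle I'm aware of with the beta choice: the beta distribution is only supported on the open interval (0, 1), so observations exactly equal to my maximum of 1 would break beta_family(). If that applies, a common workaround is the Smithson & Verkuilen (2006) squeeze; a toy illustration (`dat` here is made-up data, not mine):

```r
# Toy illustration: a response that touches the boundary at 1
dat <- data.frame(y = c(0.02, 0.37, 0.88, 1.00))

# Smithson & Verkuilen (2006) squeeze: pulls boundary values
# strictly inside (0, 1) before fitting a beta model
n <- nrow(dat)
dat$y_sq <- (dat$y * (n - 1) + 0.5) / n
range(dat$y_sq)  # strictly between 0 and 1
```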
