Dear all, I would like to start a discussion here on the use of generalised mixed effect (or additive) models to analyse count data over time. I reported here the "few" analyses I know in R for which I found GOOD (things) and LIMITS /DOUBTS. Please feel free to add/ comment further information and additional approaches to analyse such a dataset. Said that, generalised mixed effect modelling still requires further understanding (at least from me) and that my knowledge is limited, I would like to start here a fruitful discussion including both people which would like to know more about this topic, and people who knows more.

About my specific case: I have counted data (i.e., taxa richness of fish) collected over 30 years in multiple sites (each site collected multiple times). Therefore my idea is to fit a model to predict trends in richness over years using generalised (Poisson) mixed effect models with fixed factor "Year" (plus another couple of environmental factors such as elevation and catchment area) and random factor "Site". I also believe that since I am dealing with data collected over time I would need to account for potential serial autocorrelation (let us leave the spatial correlation aside for the moment!). So here some GOOD (things) and LIMITS I found in using the different approaches:

glmer (lme4):

GOOD: good model residual validation plot (fitted values vs residuals) and good estimation of the richness over years, at least based on the model plot produced.

LIMITS: i) it is not possible to include correction factor (e.g., corARMA) for autocorrelation.

glmmPQL(MASS):

GOOD: possible to include corARMA in the model

LIMITS: i) bad final residual vs fitted validation plot and completely different estimation of the richness over years compared to glmer; ii) How to compare different models e.g., to find the best autocorrelation structure (as far as I know, no AIC or BIC are produced)? iii) I read that glmmPQL it is not recommended for Poisson distributions (?).

gamm (mgcv):

GOOD: Possible to include corARMA, and smoothers for specific dependent variables (e.g., years) to add the non-linear component.

LIMITS (DOUBTS): i) How to obtain residual validation plot (residuals vs fitted)? ii) double output summary ($gam; $lme): which one to report? iii) in $gam output, variables with smoothers are not estimated (only degree of freedom and significance is given)? Is this reported somewhere else?

If you have any comment, please feel free to answer to this question. Also, feel free to suggest different methodologies.

Just try to keep the discussion at a level which is understandable for most of the readers, including not experts.

Thank you and best regards

More Alessandro Manfrin's questions See All
Similar questions and discussions