Say I would like to use species as the random-effects term in a model. I have 800 species in total. Among them, 600 species have only one record, 100 have 2-4 records, and 100 have 5-15 records (to be more specific, let's say 70 have 5-10 records and the remaining 30 have 11-15 records). In this case, the number of levels of the grouping factor for the random effects is large relative to the number of observations. Would this cause a statistical problem, given that most levels contain only one observation?
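To make the example concrete, here is a rough tally of the numbers above. It is only a sketch of the arithmetic: the variable names are mine, and the record counts are the hypothetical ranges from my example, not real data.

```python
# Hypothetical group sizes from the example above:
# (number of species, minimum records per species, maximum records per species)
groups = [
    (600, 1, 1),
    (100, 2, 4),
    (70, 5, 10),
    (30, 11, 15),
]

# Number of levels of the grouping factor (species)
n_species = sum(count for count, _, _ in groups)

# Smallest and largest possible total number of observations
min_obs = sum(count * lo for count, lo, _ in groups)
max_obs = sum(count * hi for count, _, hi in groups)

# Share of species that contribute exactly one record
singleton_share = 600 / n_species

print(f"species (levels): {n_species}")
print(f"observations: {min_obs} to {max_obs}")
print(f"levels / observations: {n_species / max_obs:.2f} to {n_species / min_obs:.2f}")
print(f"species with a single record: {singleton_share:.0%}")
```

So under these assumptions there are 800 levels for somewhere between 1480 and 2150 observations, and three quarters of the levels contain a single record.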

Is there a point at which I "should" or "shouldn't" treat species as the random-effects term in this case?

This kind of data structure is quite common in data sets compiled from the literature, i.e. the majority of species are studied just once, while a small proportion have been studied repeatedly in different papers. The same structure can also arise when genus is treated as the random effect; for an example, see the 2nd paragraph on page 325 of this paper (https://doi.org/10.1017/S0960258518000090).

Consider the extreme case in which the individual data record is used as the random-effects term, so that there is only one record in each level. R then prints a note: "Number of levels of a grouping factor for the random effects is equal to n, the number of observations". In this case it is NOT a nested design, and there is no grouping. As I understand it, a GLMM gives the same result as a GLM in this extreme case. Is my understanding correct?

Many thanks!
