Where can I find good material on the difference between mixed models and GEE models?

I presume we are discussing mixed models (that is, random-coefficient models) and GEE estimation.

There are two types of model for a panel study in which occasions are nested within individuals, subjects or clusters; that is, where there is dependency or autocorrelation over time in the response variable. The distinction is particularly important for discrete outcomes, e.g. the predicted probability of having a disease or not from an estimated model. (This also applies to multilevel models with, say, pupils nested in classes in schools, but I will stick to the panel, repeated-measures case.)

The two approaches to within subject variation are:

1) Marginal, i.e. population-average, models, estimated by GEE: the estimated slope is the effect of change in the whole population if everyone’s predictor variable changed by one unit.

2) Conditional, i.e. cluster-specific, mixed or multilevel, models: the slope is the effect on a particular ‘individual’ or subject of a unit change in a predictor.

Marginal estimates should not be used to make inferences about individuals (committing the ecological fallacy), while conditional estimates should not be used to make inferences about populations (the atomistic fallacy). The population-average approach underestimates the individual risk, and vice versa.

Importantly, GEE estimation is robust to assumptions about the higher-level, between-subject distribution; it gives correct SEs for the fixed part and gives the population-average value. But GEE does not give the higher-level variance; it is not extendible to random slopes, three-level models, etc.; and it does not give, nor can you derive from it, the cluster-specific estimates. These are all things that you get straightforwardly with the mixed, multilevel, random-effects approach.

Moreover, Gary King has argued that using robust SEs is like taking a canary down the mine: if the robust SEs give different results, there is something wrong with the model and it is not OK to interpret it. For what it is worth, I agree with him, and thus I think it is better to have an explicit model that takes the dependency into account rather than just treating it as a nuisance to be corrected.

http://gking.harvard.edu/publications/how-robust-standard-errors-expose-methodological-problems-they-do-not-fix

Consequently I believe, contra Hubbard et al., that random-effects analysis is the more generally useful method. Their paper is: To GEE or Not to GEE: Comparing Population Average and Mixed Models for Estimating the Associations Between Neighborhood Risk Factors and Health, Epidemiology, Volume 21(4), July 2010, pp. 467-474.

Indeed, population-averaged values can be obtained from a random-effects model in the MLwiN software by simulating predicted probabilities (but not vice versa); see below.

In practice there has to be a lot of difference between individuals, and therefore (equivalently) a lot of similarity over time within subjects, for the two estimates to differ much. Thus when the fixed estimate is -1.5 on the logit scale and the between-subject variance is 1.0 (i.e. some 23 percent of the variance lies at the higher, between-subject level, with the level-1 within-person variance of a standard logistic always constrained to 3.29), the cluster-specific result is a probability of 0.18 while the population-average value is 0.22. However, if the between-subject variance on the logit scale is 3 (so that 48% of the total variance lies at the higher level), the cluster-specific result remains a probability of 0.18 while the population-average value is now 0.27.
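As a rough check on these figures, here is a minimal Python sketch, assuming a random-intercepts logistic model with Normally distributed intercepts. It is not the MLwiN procedure: it replaces simulation with simple numerical integration over the random effect, and the function names (`expit`, `vpc`, `population_average`) are my own, purely for illustration.

```python
import math

# Level-1 variance of a standard logistic distribution, pi^2 / 3 (about 3.29).
LEVEL1_VAR = math.pi ** 2 / 3


def expit(x):
    """Inverse logit: convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def vpc(between_var):
    """Share of the total variance lying at the higher (between-subject) level."""
    return between_var / (between_var + LEVEL1_VAR)


def population_average(beta0, between_var, n_steps=4000, z_max=8.0):
    """Average expit(beta0 + u) over u ~ N(0, between_var).

    Uses the midpoint rule on a standard Normal variate z, with u = sd * z.
    """
    sd = math.sqrt(between_var)
    h = 2.0 * z_max / n_steps
    total = 0.0
    for i in range(n_steps):
        z = -z_max + (i + 0.5) * h
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += expit(beta0 + sd * z) * density * h
    return total


beta0 = -1.5  # fixed estimate on the logit scale
cluster_specific = expit(beta0)  # probability for a subject with u = 0

for between_var in (1.0, 3.0):
    print(
        f"between variance {between_var}: "
        f"higher-level share {vpc(between_var):.2f}, "
        f"cluster-specific {cluster_specific:.2f}, "
        f"population average {population_average(beta0, between_var):.2f}"
    )
```

The cluster-specific probability does not depend on the between-subject variance, because it is evaluated for a subject whose random effect is zero; only the population average moves as the variance grows.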

The GEE approach treats this clustering as a nuisance and does not give an estimate of the higher-level variance. The mixed multilevel random effects model treats the clustering as of substantive interest and you do get an estimate of its size and nature – but this generally requires distributional assumptions about the higher-level random effects; although non-parametric random effects procedures are possible.

There is more on mixed modelling for longitudinal analysis here (link below), and a long chapter considers different types of dependency over time, such as the Toeplitz model and the unstructured model, within the random-effects model. The essence of this is that the random-intercepts model may not be complex enough to capture the nature of the dependence, as it assumes that the autocorrelation between any two time points is the same (i.e. the compound symmetry assumption). The unstructured model estimates a different degree of dependence between each and every pair of time periods; this is flexible but not parsimonious, as the number of parameters grows rapidly. The Toeplitz model is quite flexible and quite parsimonious, as it assumes equal autocorrelation for periods the same lag apart. All of this is detailed in the following:

Developing multilevel models for analysing contextuality, heterogeneity and change using MLwiN, Volume 2, by Kelvyn Jones and VS Subramanian; available from
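To make the three dependence structures concrete, here is a small illustrative Python sketch. It only builds the implied correlation matrices and counts correlation parameters, assuming a constant variance across occasions; the function names and the example correlations are hypothetical and have nothing to do with MLwiN syntax.

```python
def compound_symmetry(n_occasions, rho):
    """Same correlation rho between any two different occasions."""
    return [
        [1.0 if i == j else rho for j in range(n_occasions)]
        for i in range(n_occasions)
    ]


def toeplitz(n_occasions, lag_corrs):
    """Correlation depends only on the lag; lag_corrs[k - 1] is the lag-k value."""
    if len(lag_corrs) != n_occasions - 1:
        raise ValueError("need one correlation per lag, 1 .. n_occasions - 1")
    return [
        [1.0 if i == j else lag_corrs[abs(i - j) - 1] for j in range(n_occasions)]
        for i in range(n_occasions)
    ]


def n_correlation_params(structure, n_occasions):
    """Number of distinct correlations each structure has to estimate."""
    if structure == "compound_symmetry":
        return 1
    if structure == "toeplitz":
        return n_occasions - 1
    if structure == "unstructured":
        return n_occasions * (n_occasions - 1) // 2
    raise ValueError(f"unknown structure: {structure}")


# Made-up correlations for four occasions, purely for illustration.
for row in toeplitz(4, [0.6, 0.4, 0.2]):
    print(row)

for n in (4, 8, 12):
    counts = {
        s: n_correlation_params(s, n)
        for s in ("compound_symmetry", "toeplitz", "unstructured")
    }
    print(n, counts)
```

The counts show the trade-off: compound symmetry always needs one correlation, Toeplitz grows linearly with the number of occasions, and the unstructured model grows quadratically.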

https://www.researchgate.net/publication/260772180_Developing_multilevel_models_for_analysing_contextuality_heterogeneity_and_change_using_MLwiN_Volume_2?ev=prf_pub

And the MLwiN procedure for getting both population-average and cluster-specific estimates via post-estimation simulation is documented here:

https://www.researchgate.net/publication/234015221_Manual_supplement_for_MLwiN_Version_2.14?ev=prf_pub

Apologies for going on at length and re-using some (most!) material from previous posts.


Swaroop Kher

I came across this paper (attached below): To GEE or Not to GEE: Comparing Population Average and Mixed Models for Estimating the Associations Between Neighborhood Risk Factors and Health. This may be of some use.

Also there is one more link I came across: http://www.lexjansen.com/mwsug/2005/Pharmaceutical_Healthcare/PH400.pdf

There was a SAS SUGI paper on the differences between linear mixed models and generalized estimating equations using GLMM; I can't find it online anymore.

Ariel Linden

I am a Stata user, so I can't speak to other software. That said, you should give the Stata manuals a read. They are very comprehensive and very good. Here are some links to the GEE and multilevel model manuals. They will help you understand what these models do, and help you decide which are appropriate for your data.

www.stata.com/manuals13/xtxtgee.pdf

www.stata.com/manuals13/me.pdf

Peter Moono

Hi Kelvyn,

Thanks for additional information.
