I have data with three variables: group (0, 1), visit (0, 1), and outcome. I have recently read some articles arguing that, in a randomized study, ANCOVA adjusting for the baseline outcome is better than the difference-in-differences method, because including the baseline outcome has two advantages. First, when randomization does not work well and the groups are imbalanced on the baseline outcome, adjusting for the baseline outcome can reduce bias. Second, the baseline outcome is usually a strong predictor of the outcome at the second visit, and in a randomized study, including a strong predictor, whether the outcome is binary or continuous, can give higher statistical efficiency (i.e., a smaller standard error).
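For reference, for a continuous outcome analysed with a linear model, the three analyses differ only in how much of the observed baseline group difference they subtract. In notation introduced here for illustration (not taken from the articles), let $\bar Y_{g,v}$ be the sample mean outcome of group $g$ at visit $v$:

$$
\hat\tau(b) = \left(\bar Y_{1,1} - \bar Y_{0,1}\right) - b\left(\bar Y_{1,0} - \bar Y_{0,0}\right),
\qquad
b = \begin{cases}
0 & \text{post-only comparison},\\
1 & \text{change score (difference-in-differences)},\\
\hat\beta & \text{ANCOVA},
\end{cases}
$$

where $\hat\beta$ is the pooled within-group least-squares slope of the follow-up outcome on the baseline outcome. Assuming a common variance $\sigma^2$ at both visits, within-subject correlation $\rho$, and group sizes $n_0$ and $n_1$, the standard large-sample variances are

$$
\operatorname{Var}\{\hat\tau(0)\} = \sigma^2\left(\tfrac{1}{n_0}+\tfrac{1}{n_1}\right),\quad
\operatorname{Var}\{\hat\tau(1)\} = 2\sigma^2(1-\rho)\left(\tfrac{1}{n_0}+\tfrac{1}{n_1}\right),\quad
\operatorname{Var}\{\hat\tau(\hat\beta)\} \approx \sigma^2(1-\rho^2)\left(\tfrac{1}{n_0}+\tfrac{1}{n_1}\right).
$$

Because $2(1-\rho) - (1-\rho^2) = (1-\rho)^2 \ge 0$ and $1-\rho^2 \le 1$, ANCOVA is asymptotically at least as precise as either alternative under these assumptions. The change score also subtracts the whole observed baseline difference, whereas under this model only about $\beta$ times that difference carries over to the follow-up visit.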
However, my colleague asked whether I could instead use GEE with adjustment for the baseline outcome. I found that GEE with and without the baseline outcome gives the same estimate of the group × visit interaction (i.e., the difference-in-differences effect). In contrast, when I simulate a baseline imbalance, adding the baseline outcome to the ANCOVA does change (adjust) the group estimate relative to the ANCOVA without it.
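A minimal base-R sketch of the same comparison, assuming an independence working correlation: for a Gaussian outcome the GEE point estimates then coincide with ordinary least squares, so `lm()` on the long data stands in for `gee()` here. The data mimic the simulation in the question (group 1 shifted by 5 at baseline only). The object names and the extra model with a visit-specific baseline slope (`baseline_y * visit`) are illustrative additions, not part of the original analysis.

```r
set.seed(1)
n <- 5
y0 <- c(rnorm(n, 0, 1), rnorm(n, 5, 1))  # baseline: group 1 shifted by 5
y1 <- rnorm(2 * n, 0, 1)                  # follow-up: no group difference
g  <- rep(c(0, 1), each = n)
id <- seq_len(2 * n)

# Long format: one row per subject per visit; baseline_y repeats within subject
long <- data.frame(
  id         = rep(id, times = 2),
  g          = rep(g, times = 2),
  visit      = rep(c(0, 1), each = 2 * n),
  y          = c(y0, y1),
  baseline_y = rep(y0, times = 2)
)

# OLS = Gaussian GEE point estimates under an independence working correlation
fit_did  <- lm(y ~ g * visit, data = long)
fit_base <- lm(y ~ g * visit + baseline_y, data = long)
# Hypothetical extension: baseline slope allowed to differ between visits
fit_int  <- lm(y ~ g * visit + baseline_y * visit, data = long)

# ANCOVA on the follow-up visit only
wide <- data.frame(g = g, y0 = y0, y1 = y1)
fit_ancova <- lm(y1 ~ g + y0, data = wide)

est <- c(
  did                     = unname(coef(fit_did)["g:visit"]),
  did_plus_baseline       = unname(coef(fit_base)["g:visit"]),
  visit_specific_baseline = unname(coef(fit_int)["g:visit"]),
  ancova                  = unname(coef(fit_ancova)["g"])
)
print(est)
```

In the long-format model, `baseline_y` takes the same value at both visits for a given subject, so with a single slope it shifts both of that subject's outcomes by the same amount and cancels out of the within-subject change that `g:visit` measures. When the baseline slope may differ by visit, the visit-0 part of the model fits exactly (there `baseline_y` equals `y`), its group coefficient is zero, and `g:visit` then equals the baseline-adjusted group effect at follow-up, i.e. the ANCOVA estimate. The same cancellation should also hold for the exchangeable working correlation used in the question when every subject has both visits, although this sketch does not check that.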
I provide the following R code to simulate this. Can someone explain why GEE with adjustment for the baseline outcome does not actually adjust for it?
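```r
library(dplyr)  # left_join(), arrange(), %>%
library(gee)    # gee()

set.seed(1)
n = 5
# Simulated outcomes: group 1 is shifted by 5 at baseline (visit 0) only
g0t0 = rnorm(n, 0, 1)
g1t0 = rnorm(n, 5, 1)
g0t1 = rnorm(n, 0, 1)
g1t1 = rnorm(n, 0, 1)

# Visit 0 (baseline) records
t0 = data.frame(y = c(g0t0, g1t0))
t0$id = 1:nrow(t0)
t0$g = rep(c(0, 1), each = n)
t0$visit = 0

# Visit 1 (follow-up) records for the same subjects
t1 = data.frame(y = c(g0t1, g1t1))
t1$id = 1:nrow(t1)
t1$g = rep(c(0, 1), each = n)
t1$visit = 1

# Long format, plus each subject's baseline outcome repeated on both rows
t = rbind(t0, t1)
t0_y = subset(t0, select = c(id, y))
t0_y$baseline_y = t0_y$y
t0_y$y = NULL
# gee() expects the rows of each cluster (id) to be contiguous
t = t %>% left_join(t0_y, by = "id") %>% arrange(id, g, visit)

# GEE
gee(formula = y ~ g * visit, id = id, data = t, family = gaussian, corstr = "exchangeable")
gee(formula = y ~ g * visit + baseline_y, id = id, data = t, family = gaussian, corstr = "exchangeable")
# the g:visit interaction is the same in these two GEE analyses

# ANCOVA (follow-up visit only)
t1 = subset(t, visit == 1)
lm(y ~ g, data = t1)
lm(y ~ g + baseline_y, data = t1)
```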
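To illustrate the efficiency claim in the first paragraph, here is a small Monte Carlo sketch under an assumed data-generating process that differs from the one in the question: baseline and follow-up are standard normal with correlation `rho = 0.6`, randomization is balanced with 50 subjects per arm, and the true treatment effect is `tau = 0.5` (all arbitrary choices). Each replicate fits the post-only, change-score (difference-in-differences) and ANCOVA estimators with `lm()`.

```r
set.seed(2)
n_per_group <- 50   # assumed sample size per arm
rho <- 0.6          # assumed baseline/follow-up correlation
tau <- 0.5          # assumed true treatment effect
n_sims <- 2000

one_trial <- function() {
  g  <- rep(c(0, 1), each = n_per_group)
  y0 <- rnorm(2 * n_per_group)  # baseline, independent of group (randomized)
  # follow-up: unit variance, correlation rho with baseline, shifted by tau in group 1
  y1 <- rho * y0 + sqrt(1 - rho^2) * rnorm(2 * n_per_group) + tau * g
  c(
    post_only = unname(coef(lm(y1 ~ g))["g"]),
    change    = unname(coef(lm(I(y1 - y0) ~ g))["g"]),
    ancova    = unname(coef(lm(y1 ~ g + y0))["g"])
  )
}

# replicate() returns one column per replicate; transpose to one row per replicate
sims <- t(replicate(n_sims, one_trial()))

# Large-sample SDs with sigma = 1 (the ANCOVA value is an approximation)
theory_sd <- sqrt(c(1, 2 * (1 - rho), 1 - rho^2) * (2 / n_per_group))

result <- data.frame(
  mean_estimate = colMeans(sims),
  empirical_sd  = apply(sims, 2, sd),
  theory_sd     = theory_sd
)
print(round(result, 3))
```

All three estimators should be centred near `tau` because the baseline is balanced in expectation; the empirical standard deviations should follow the ordering ANCOVA < change score < post-only, in line with `theory_sd`. With a weaker correlation (`rho` below 0.5) the change score becomes less precise than the post-only comparison, whereas ANCOVA is not worse than either in large samples.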