In Mastering 'Metrics, Joshua D. Angrist and Jörn-Steffen Pischke use the causal effect of health insurance on health as an example. They write: "... This in turn leads to a simple but important conclusion about the difference in average health by insurance status:
Difference in group means = Avgn[Yi|Di=1]-Avgn[Yi|Di=0]
=Avgn[Y1i|Di=1]-Avgn[Y0i|Di=0], (1.2)"
Next, they show that the difference in group means differs from the causal effect: "The constant-effects assumption allows us to write:
Y1i = Y0i + κ, (1.3)
or, equivalently, Y1i - Y0i = κ. In other words, κ is both the individual and average causal effect of insurance on health. Using the constant-effects model (equation (1.3)) to substitute for
Avgn[Y1i|Di = 1] in equation (1.2), we have:
Avgn[Y1i|Di = 1] - Avgn[Y0i|Di = 0]
= {κ + Avgn[Y0i|Di = 1]} - Avgn[Y0i|Di = 0]
= κ + {Avgn[Y0i|Di = 1] - Avgn[Y0i|Di = 0]},"
From this they obtain: "Difference in group means = Average causal effect + Selection bias."
I find parts of this derivation confusing. I think the relationship between the difference in group means and the average treatment effect (ATE) should be spelled out in more detail, as follows:
Difference in group means
= Avgn[Yi|Di=1] - Avgn[Yi|Di=0] = Avgn[Y1i|Di=1] - Avgn[Y0i|Di=0]
= Avgn[Y1i-Y0i] + {Avgn[Y0i|Di=1] - Avgn[Y0i|Di=0]} + {1 - Pr[Di=1]} * {Avgn[Y1i-Y0i|Di=1] - Avgn[Y1i-Y0i|Di=0]}
where Avgn[Y1i - Y0i] is the ATE
or,
= Avgn[Y1i-Y0i|Di=1]+Avgn[Y0i|Di=1]-Avgn[Y0i|Di=0]
where Avgn[Y1i - Y0i|Di=1] is the ATT (average treatment effect on the treated)
or,
= Avgn[Y1i-Y0i|Di=0]+Avgn[Y1i|Di=1]-Avgn[Y1i|Di=0]
where Avgn[Y1i - Y0i|Di=0] is the ATC (average treatment effect on the controls, i.e. the untreated)
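The extra term in the ATE version follows from the ATT version once the ATE is written as a weighted average of the ATT and the ATC. Here is a short sketch of that step, where p is shorthand I am introducing for Pr[Di=1] and E stands for Avgn (neither abbreviation is from the book):

```latex
\begin{align*}
\text{ATE} &= p\,\text{ATT} + (1-p)\,\text{ATC}, \qquad p = \Pr[D_i = 1] \\
\Rightarrow\quad \text{ATT} &= \text{ATE} + (1-p)\,(\text{ATT} - \text{ATC}) \\[4pt]
E[Y_{1i}\mid D_i=1] - E[Y_{0i}\mid D_i=0]
  &= \text{ATT} + \bigl\{E[Y_{0i}\mid D_i=1] - E[Y_{0i}\mid D_i=0]\bigr\} \\
  &= \text{ATE} + \bigl\{E[Y_{0i}\mid D_i=1] - E[Y_{0i}\mid D_i=0]\bigr\}
     + (1-p)\,(\text{ATT} - \text{ATC})
\end{align*}
```

The last term vanishes whenever ATT = ATC, which includes the constant-effects case.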
Under the constant-effects assumption (Y1i - Y0i = κ), Avgn[Y1i-Y0i] = Avgn[Y1i-Y0i|Di=1] = Avgn[Y1i-Y0i|Di=0] = κ.
So, starting from the ATE form,
Difference in group means
= Avgn[Y1i-Y0i] + Avgn[Y0i|Di=1] - Avgn[Y0i|Di=0] + 0
= Avgn[Y1i-Y0i] + Avgn[Y0i|Di=1] - Avgn[Y0i|Di=0]
= κ + Avgn[Y0i|Di=1] - Avgn[Y0i|Di=0].
or, starting from the ATT form,
= Avgn[Y1i-Y0i|Di=1] + Avgn[Y0i|Di=1] - Avgn[Y0i|Di=0]
= Avgn[Y1i-Y0i] + Avgn[Y0i|Di=1] - Avgn[Y0i|Di=0]
= κ + Avgn[Y0i|Di=1] - Avgn[Y0i|Di=0].
or, starting from the ATC form,
= Avgn[Y1i-Y0i|Di=0] + Avgn[Y1i|Di=1] - Avgn[Y1i|Di=0]
= Avgn[Y1i-Y0i] + Avgn[Y1i|Di=1] - Avgn[Y1i|Di=0]
= κ + Avgn[Y1i|Di=1] - Avgn[Y1i|Di=0].
Because Y1i = Y0i + κ, Avgn[Y1i|Di=1] - Avgn[Y1i|Di=0] = Avgn[Y0i|Di=1] - Avgn[Y0i|Di=0], so the last expression matches the first two.
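One way to sanity-check the algebra above numerically is a small simulation in which both potential outcomes are known for every unit. The sketch below is only an illustration: the data-generating process, the sample size, the function name `decompose`, and the value `kappa = 2.0` are all assumptions of mine, not anything from the book. A numerical match does not prove the identities, but a mismatch would expose an algebra error.

```python
import numpy as np

def decompose(y0, y1, d):
    """Difference in group means and the ATE/ATT/ATC decompositions of it."""
    d = d.astype(bool)
    p = d.mean()                                # sample analogue of Pr[Di=1]
    diff = y1[d].mean() - y0[~d].mean()         # Avgn[Y1i|Di=1] - Avgn[Y0i|Di=0]
    ate = (y1 - y0).mean()
    att = (y1 - y0)[d].mean()
    atc = (y1 - y0)[~d].mean()
    sel0 = y0[d].mean() - y0[~d].mean()         # selection bias in Y0
    sel1 = y1[d].mean() - y1[~d].mean()         # the analogous gap in Y1
    return {
        "diff": diff,
        "ate_form": ate + sel0 + (1 - p) * (att - atc),
        "att_form": att + sel0,
        "atc_form": atc + sel1,
        "ate": ate, "att": att, "atc": atc,
        "sel0": sel0, "sel1": sel1,
    }

# Hypothetical data: treatment is more likely when Y0 is high (selection on Y0).
rng = np.random.default_rng(0)
n = 100_000
y0 = rng.normal(size=n)
d = (y0 + rng.normal(size=n)) > 0

# Heterogeneous effects: the individual effect depends on Y0, so ATT != ATC.
y1_het = y0 + 1.0 + 0.5 * y0
het = decompose(y0, y1_het, d)

# Constant effects: Y1i = Y0i + kappa for every unit.
kappa = 2.0
y1_const = y0 + kappa
const = decompose(y0, y1_const, d)

for name, res in [("heterogeneous", het), ("constant", const)]:
    print(name, {k: round(float(v), 4) for k, v in res.items()})
```

In each case the three entries `ate_form`, `att_form`, and `atc_form` should agree with `diff` up to floating-point error. In the constant-effects case, `ate`, `att`, and `atc` should all equal `kappa`, and `sel0` should equal `sel1`.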
I am not sure whether my derivation above is correct.