In his very helpful online book on structural equation modeling, Jon Lefcheck writes the following concerning d-separation tests for SEMs:

"Once the model is fit, statistical independence is assessed with a t-, F-, or other test. If the resulting P-value is >0.05, then we fail to reject the null hypothesis that the two variables are conditionally independent. In this case, a high P-value is a good thing: it indicates that we were justified in excluding that relationship from our path diagram in the first place, because the data don’t support a strong linkage between those variables within some tolerance for error."

https://jslefche.github.io/sem_book/local-estimation.html
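
To make the procedure concrete, here is a minimal base-R sketch of the kind of test Lefcheck describes (his book automates this with piecewiseSEM; the DAG, variable names, and effect sizes below are purely illustrative). The hypothesized graph is x -> y -> z with no direct x -> z path, so the implied independence claim is tested by regressing z on its claimed parent y plus the omitted predictor x, and reading the t-test p-value on the coefficient of x:

## Illustrative sketch, not Lefcheck's code: test the independence claim
## implied by omitting the x -> z path from the DAG x -> y -> z.
set.seed(1)
n <- 200
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
z <- 0.5 * y + rnorm(n)           # generating model has no direct x -> z effect

fit <- lm(z ~ y + x)              # add the omitted predictor to the model for z
summary(fit)$coefficients["x", ]  # a large p-value here "passes" the d-sep test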

My understanding is that this is standard procedure in SEM, and more broadly in DAG-data consistency checks, not a quirk of Lefcheck's workflow. My question is: doesn't this amount to an attempt to use p-values to control the type II error rate?

Setting aside the inherent limitations of p-values even when used correctly, the fundamental problem is that p-values simply don't control the type II error rate; they control the type I error rate. An alpha of 0.05 caps the probability of wrongly rejecting a true independence claim, but it says nothing about the probability of failing to detect a real dependence; that probability is governed by statistical power (sample size, effect size, noise), which the d-separation procedure does not assess.
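
A quick simulation (again a base-R sketch with invented numbers) shows why this matters. Below, z genuinely depends on x given y, but the direct effect is weak and the sample is modest, so the conditional independence test fails to reject most of the time. Under the quoted logic, every one of those failures would be read as confirmation that the path was rightly omitted:

## Sketch: a weak but real direct x -> z effect at modest n. The 0.05
## threshold caps the type I error rate; the type II error rate below
## depends on power, which the d-sep procedure never examines.
set.seed(1)
misses <- replicate(2000, {
  n <- 50
  x <- rnorm(n)
  y <- 0.5 * x + rnorm(n)
  z <- 0.5 * y + 0.15 * x + rnorm(n)   # true direct effect of 0.15
  p <- summary(lm(z ~ y + x))$coefficients["x", "Pr(>|t|)"]
  p > 0.05                             # test misses the real dependence
})
mean(misses)  # the type II error rate: large, and invisible to the threshold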

The language in an online tutorial using the R package bnlearn, a suite of Bayesian network structure learning tools, is even more explicit in equating the failure to reject the null hypothesis of conditional independence with the acceptance of that hypothesis, i.e. the conclusion that the variables are "in fact" independent:

"Now, let’s check whether the global Markov property is satisfied in this example. We will use the survey data to check the Null hypothesis that S is independent of T given O and R. In R, we can use ci.test function from bnlearn to find conditional dependence. [...] As you can see, the p-value is greater than 0.05 threshold, hence we do not reject the Null hypothesis and conclude that in fact, S is independent of T given O and R."

https://rpubs.com/sarataheri/bnlearnCGM
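
For reference, the call behind the quoted passage is presumably something like the sketch below. The survey data and the variable names S, T, O, and R come from the standard bnlearn example in Scutari & Denis; that the data are loaded from survey.txt and tested with the mutual information statistic are my assumptions:

## Reconstruction of the quoted analysis, not the tutorial's actual code.
library(bnlearn)
survey <- read.table("survey.txt", header = TRUE, colClasses = "factor")
ci.test("S", "T", c("O", "R"), test = "mi", data = survey)
## Even taken at face value, a p-value above 0.05 licenses only "fail to
## reject"; the quoted leap to "in fact" independent is the issue here.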

As a convert to causal inference and Bayesian logic from the cult of cause-blind null-hypothesis statistical testing, I find it frustrating that when it comes time to validate our causal graphs --- after all our effort to construct principled causal models and estimate parameters informatively --- we fall back on p-values, and a dubious use of p-values at that.

Am I missing something? I hope to be corrected.
