Beyond the difference between incorporating manifest versus latent variables, Bollen and Pearl argue in this chapter for much deeper differences between regression analysis and SEM (including path analysis):
Bollen, K. A., & Pearl, J. (2013). Eight myths about causality and structural equation modeling. In S. L. Morgan (Ed.), Handbook of Causal Analysis for Social Research (pp. 301-328). Dordrecht: Springer.
First of all, the primary goal of regression analysis is mere prediction (i.e., fitting a regression plane to a multidimensional scatter of Y-values). The result is the conditional expectation E(Y | X), where X is a vector of weighted predictors.
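This purely predictive view can be sketched in a few lines of code. The following is an illustrative simulation (all variable names and values are made up, not taken from the chapter): ordinary least squares simply finds the weights that best reproduce E(Y | X).

```python
import numpy as np

# Illustrative sketch: regression as pure prediction of E(Y | X).
# Data are simulated; the coefficients have no causal content.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))                       # two predictors
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)

# Add an intercept column and solve ordinary least squares.
X_design = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

y_hat = X_design @ beta                           # fitted E(Y | X)
print(beta)                                       # close to [1.0, 2.0, -0.5]
```

The weights recover the data-generating values here only because the data were simulated that way; nothing in the fitting procedure itself distinguishes causes from mere correlates.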
The reason for including several predictors is mostly informational: does a predictor explain variance (i.e., add informational usefulness) beyond the other predictors? The regression coefficients are weights chosen to maximize prediction and carry no causal "content".
SEM/path analysis, in contrast, is based on strong and weak causal assumptions. Exposure variables are included because the researcher assumes they play a specific causal role in the system. Weak assumptions concern the assumed effects of variables; strong assumptions concern assumed NON-effects (the "holes in the cheese"). The parameters are estimated under this set of assumptions; hence, they carry causal meaning and fuse the data patterns with the causal assumptions. For instance, a full mediation model (X -> M -> Y) implies two weak assumptions (the X-M and M-Y paths) and the following strong assumptions:
a) no direct X-Y effect
b) no unobserved confounding of the X-M, M-Y, and X-Y links
c) no reverse causation
They are termed "strong" because only one value (zero) is consistent with the assumption; weak assumptions are consistent with a whole universe of nonzero values.
When estimating the parameters (most importantly, the two paths), the algorithm incorporates these assumptions into the estimation. This is why model fit is so important: a reasonable interpretation of the parameters PRESUMES the correctness of the assumptions. Otherwise the estimates may be biased or even useless.
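To make this concrete, here is a minimal sketch (simulated data, not from the chapter) of a full mediation model estimated with the simple two-regression approach. The strong assumptions are built into the data-generating process; under them, the indirect effect is the product of the two paths and the direct effect is zero.

```python
import numpy as np

# Sketch of a full-mediation model X -> M -> Y on simulated data.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
m = 0.6 * x + rng.normal(size=n)          # weak assumption: X affects M
y = 0.7 * m + rng.normal(size=n)          # weak assumption: M affects Y
# Strong assumptions built into the simulation: no direct X -> Y path,
# no unobserved confounding, no reverse causation.

def ols(X, y):
    """OLS coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

a = ols(x, m)[1]                                   # X -> M path
b, c_prime = ols(np.column_stack([m, x]), y)[1:]   # M -> Y path, direct X -> Y
print(a * b)       # indirect effect, ~0.42
print(c_prime)     # direct effect, ~0 under full mediation
```

If any strong assumption were violated in the data (say, an unobserved confounder of M and Y), these same formulas would still produce numbers, but the numbers would no longer mean what the model says they mean.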
The most important difference is that the structure (with its assumptions) implies testable implications (in contrast to regression).
Of course (and this is how regression is usually applied), the basis for a regression can be a causal model (with causal assumptions), but in that case the actual model behind the regression is indeed a SEM, and the regression is just a tool to control for confounders, not a model in itself. Because, again, if the underlying model is wrong, the regression will yield nonsense parameters.
Moreover, the set of assumptions behind a regression is most often poorly developed, making the regression problematic as a tool:
a) researchers have no clear idea what they have to control for (i.e., which variables to include in the regression), and
b) researchers don't think about the relationships among the predictors, which often results in controlling for mediators, post-treatment variables, or colliders.
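The collider case is the most counterintuitive of these, so a small simulated demonstration may help (hypothetical variables, not from the cited chapter): two variables with no causal connection appear related once their common effect is "controlled for".

```python
import numpy as np

# Sketch of collider bias: X and Y are causally unrelated, but both
# cause C. Conditioning on C in a regression induces a spurious
# association between X and Y. Simulated data.
rng = np.random.default_rng(2)
n = 20000
x = rng.normal(size=n)
y = rng.normal(size=n)                    # Y does NOT depend on X
c = x + y + rng.normal(size=n)            # C is a collider on the X-Y path

def ols(X, y):
    """OLS coefficients (intercept first)."""
    X = np.column_stack([np.ones(len(y)), X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_naive = ols(x, y)[1]                            # ~0: the correct null effect
b_adjusted = ols(np.column_stack([x, c]), y)[1]   # ~-0.5: spurious, biased
print(b_naive, b_adjusted)
```

Adding C to the regression, which looks like cautious "controlling", manufactures an effect out of nothing; only a causal model of how the predictors relate to each other can tell the researcher that C must be left out.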
Structural equation modeling is basically a version of regression that includes a "measurement model" for some of the concepts in the overall analysis. Rather than being represented by a single variable, these concepts are represented by multiple variables that are weighted in a fashion analogous to factor analysis. The advantage of SEM is that these multi-indicator concepts are usually more reliable than single-item indicators.
Ordinary least squares regression could be considered a limited, special case of structural equation modeling. In fact, there is an underlying structural equation that links the predictors (independent/exogenous variables) to the outcome measure (dependent/endogenous variable): Y-hat = b0 + b1*X1 + ... + bk*Xk.
Path analysis, as developed by Sewall Wright in the 1920s, is just a generalization of this idea to the possibility of having multiple dependent variables, but the arithmetic is no more complex.
As David L Morgan correctly notes, the chief advantage of SEM is the potential to add and investigate measurement models, in which latent variables (factors) are proposed and evaluated, and paths among these latent variables may be estimated and assessed.
How well does each approach work? Well, in regression, you always have the option of comparing observed scores on the dependent variable with estimated/predicted scores. The closer these are, the better the model. In SEM, there are numerous indicators of how well the proposed model can reproduce the relationships observed among the variables in the data set (so long as the model is not "saturated"). So, on that score, SEM offers more options for assessing model-data concordance.
Your post and the recommended literature are very enlightening, as many researchers assume that there aren't many differences between regression analysis and SEM.
Thank you all for your responses. Special thanks to David L Morgan's and Holger Steinmetz's answers. They were very important for my understanding of the differences between the two statistical methods.
SEM is used to validate a theoretically driven model, whereas no such model is implemented in regression.
SEM is ideal when testing theories that include latent variables.
SEM consists of a measurement model and a structural model. The structural model allows for the assessment of the relationships specified in the hypotheses. Specifically, the path coefficients are examined with attention to the strength, direction, and significance of the relationships. In addition, the model as a whole is assessed through goodness-of-fit indices.
Model validation comprises both the measurement model (assessed via CFA) and the structural model, which links the measurement models and observed variables. Validation checks whether the SEM explains the variance in the endogenous variable of the study.
Please check the following dissertation and manuscript, which I co-authored, to see SEM applied in practice:
SEM serves purposes similar to multiple regression, but in a more powerful way. It can be used as an extension of the general linear model, of which multiple regression is a part.
After validating the measures using exploratory factor analysis (EFA), you can use either regression or path analysis to test the hypotheses. However, for path analysis you should first run EFA, then CFA, and finally the path analysis within SEM to obtain the results of the hypothesis tests.
Also note that in regression analysis, principal component extraction with varimax rotation is typically used, while in path analysis, maximum likelihood extraction with varimax rotation is used.
Multiple regression is an excellent tool for predicting variance in an interval dependent variable based on linear combinations of interval, dichotomous, or dummy independent variables. However, multiple regression is observed-variable only (it does not model measurement error), whereas SEM is latent-variable (it models error explicitly). SEM has been applied to a variety of research problems, and within the SEM family there are many methodologies, including covariance-based and variance-based methods. Covariance analysis is also referred to as confirmatory factor analysis (CFA), causal modeling, causal analysis, simultaneous equation modeling, and analysis of covariance structures.