I am looking for a scholarly explanation and references regarding the number of modifications (i.e., error covariances between individual items) that are allowed in (confirmatory factor analysis) measurement models. Can anyone help?
a) Modifications should be theory-based, not driven by the modification indices (MI). There is research (see below) showing that blindly following the MI can turn a slightly misspecified model into complete nonsense, even though it attains "fit".
b) When you change the model post hoc (which, in my view, should be done if the model fails), you leave the "confirmatory / deductive road" and take the "abductive road". The result is a new hypothesis that should be tested further, not simply by replicating the study and model (which proves nothing), but by thinking through further implications of the model and its variables.
Best,
Holger
Kaplan, D. (1989). Model modification in covariance structure analysis: Application of the expected parameter change statistics. Multivariate Behavioral Research, 24(3), 285–305.
Hayduk, L. A. (1990). Should model modifications be oriented toward improving data fit or encouraging creative and analytical thinking? Multivariate Behavioral Research, 25(2), 193–196.
The problems with freeing covariances between measurement errors are:
a) You "open a valve" that may obscure a serious misspecification of the whole measurement model. For instance, a two-factor model falsely tested as a one-factor model can be made to fit by relaxing error covariances.
b) An additional similarity between items that gives rise to an error covariance could actually represent a further latent variable that influences both items. This latent variable, however, does not appear explicitly in the model but only as a covariance. This may be problematic: in extreme cases this latent variable may be an important predictor of other variables in the (non-CFA part of the) model but is completely left out. (A small sketch at the end of this post illustrates the difference between the implicit and the explicit specification.)
Beyond that, subsets of items that are more similar to each other than to the rest routinely occur in larger item sets that are semantically heterogeneous. This may point to a misspecified and more diverse latent structure.
See:
Landis, R. S., Edwards, B. D., & Cortina, J. M. (2009). On the practice of allowing correlated residuals among indicators in structural equation models. In C. E. Lance & R. J. Vandenberg (Eds.), Statistical and methodological myths and urban legends - Doctrine, verity and fable in the organizational and social sciences (pp. 193–215). New York: Routledge.
And with regard to "theory-based": if freeing error covariances is so theory-based, why not include them a priori? :)
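To make point (b) above more concrete, here is a minimal, purely hypothetical sketch in lavaan-style model syntax (as used, e.g., by the Python package semopy); the factor and item names (F, M, x1-x4) are placeholders and not taken from this thread. The first specification hides whatever two items share beyond the substantive factor in a residual covariance; the second models it explicitly as its own latent variable, which could then also be related to other variables in a full SEM.

```python
# Hypothetical illustration (Python strings in lavaan-style syntax, e.g. for semopy).
# All names are placeholders; the fitting calls are commented out because no data
# set is attached to this thread.

# (1) Implicit: the shared cause of x1 and x2 is hidden in a residual covariance.
desc_residual_cov = """
F =~ x1 + x2 + x3 + x4
x1 ~~ x2
"""

# (2) Explicit: the same shared influence is modeled as a latent variable M,
# which now exists in the model and can be related to other variables.
desc_explicit_factor = """
F =~ x1 + x2 + x3 + x4
M =~ x1 + x2
"""

# import semopy
# model = semopy.Model(desc_explicit_factor)
# model.fit(items_df)                   # items_df: pandas DataFrame with x1..x4
# print(semopy.calc_stats(model))       # chi-square, CFI, TLI, RMSEA, ...
```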
Just a short suggestion: if your sample size is large enough, you may divide the sample into two random halves and use the first as a calibration sample and the second as a validation sample. In the calibration sample you could add error covariances based on theory and modification indices, and this model could then be validated with the validation sample. That way you will know whether the error covariances are substantial or merely sample-dependent. (A rough sketch of this procedure is given below.)
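A rough sketch of this split-sample idea, assuming Python with pandas and the semopy package (lavaan-style model syntax); the file name, item names, and the added error covariance are hypothetical placeholders:

```python
# Split-sample cross-validation sketch (hypothetical names throughout).
import numpy as np
import pandas as pd
import semopy

data = pd.read_csv("items.csv")              # placeholder item-level data file

# Divide the sample into two random halves: calibration and validation.
rng = np.random.default_rng(seed=1)
shuffled = rng.permutation(len(data))
calibration = data.iloc[shuffled[: len(data) // 2]]
validation = data.iloc[shuffled[len(data) // 2:]]

# Model specified (and, if needed, modified) on the calibration sample only;
# the x2 ~~ x6 error covariance stands for a modification added in that phase.
desc = """
F1 =~ x1 + x2 + x3 + x4
F2 =~ x5 + x6 + x7 + x8
x2 ~~ x6
"""

for label, sample in [("calibration", calibration), ("validation", validation)]:
    model = semopy.Model(desc)
    model.fit(sample)
    print(label)
    print(semopy.calc_stats(model))          # chi-square, CFI, TLI, RMSEA, ...
```

If the freed error covariance holds up in the validation half, it is less likely to be purely sample-dependent.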
That depends on the number of items. If you have, e.g., 100 items, a sample size of N = 150 would not be enough. It also depends on the size of the correlation: for a low effect size the power would be quite small. So just give it a try and see whether any problems occur.
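Just to illustrate the "depends on the number of items" part: a back-of-the-envelope count of free parameters, assuming a plain CFA with correlated factors and marker-indicator identification, and using the common (and debatable) rule of thumb of at least 5 cases per free parameter. The factor count for the 100-item example is made up.

```python
def free_parameters(n_items: int, n_factors: int) -> int:
    """Free parameters in a plain CFA with correlated factors and one marker
    loading fixed per factor: loadings + error variances + factor (co)variances."""
    loadings = n_items - n_factors
    error_variances = n_items
    factor_var_cov = n_factors * (n_factors + 1) // 2
    return loadings + error_variances + factor_var_cov

q = free_parameters(n_items=100, n_factors=5)   # hypothetical 100-item, 5-factor model
print(q, "free parameters ->", 5 * q, "cases under a 5:1 rule of thumb")  # 210 -> 1050
```

With N = 150 that ratio would be well below 1:1, which is one way of seeing why 100 items with N = 150 cannot work.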
Take into account that every modification must have a theoretical explanation; thus, the more residuals are allowed to covary, the more complex, or even impossible to interpret, the model becomes.
Holger Steinmetz Thanks for your responses on ResearchGate, not only in this thread but also under other topics related to CFA and SEM. I was able to find answers to many of my questions.
I have one follow-up question to the current one: I have a 14-item questionnaire with 3 subscales (N = 240).
- The one-factor models (for both the 13-item and the 14-item versions) fit worse, so I take this as evidence for the multidimensional model.
My question is: which is preferable, deleting an item from the questionnaire or adding error covariances within a latent variable (i.e., between a positively stated item and a reverse-coded, negatively stated item in the same subscale)?
When I add 2 error covariances involving this negatively stated item within the same subscale, the 14-item model yields: χ² = 157.744, df = 72, CFI = .954, TLI = .942, RMSEA = .073, SRMR = .073.
Unfortunately, none of the models fits. It is a common misperception to prefer "better"-fitting models when no model fits, and to think that the better-fitting models are better or "more correct". You have to find out where the problems lie.
In this thread I refer to a recent paper in which I was able to demonstrate a way to proceed. Perhaps that helps (you don't have to read the whole paper, just Study 1).