I have a four-factor model determined a priori (from some emergent theory) to which my data fits quite well when I do confirmatory factor analysis (CFI > 0.98; TLI > 0.98; low RMSEA). However, a two-factor structure also fits the data well (CFI and TLI > 0.97; low RMSEA). How should I decide between the two models?
Model comparison is a difficult issue. Of course, you can take the easy road by complying with standards such as "use the lower BIC or AIC value". I would, however, suggest strictly testing each model (ideally by sticking to the chi-square test) and finding out what the problem(s) of the misspecified model is/are. Your results IMHO show why EFA can lead to nonsensical results. I would further bet that even your 4-factor model is (still a bit) misspecified, but that is only a guess.
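For the strict test, a minimal sketch in R with lavaan might look like this (the model string `model4` and data frame `dat` are placeholder names, not objects from the thread):

```r
library(lavaan)

# 'model4' is a placeholder for the a-priori four-factor CFA specification,
# 'dat' for the item data
fit4 <- cfa(model4, data = dat)

# exact-fit chi-square test: a significant p-value flags misspecification
fitMeasures(fit4, c("chisq", "df", "pvalue"))
```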
I would like to highlight another perspective that seems to be missing. Although you can follow the easy approach described by Professor Holger Steinmetz (comparing AIC and BIC), I am worried that you might be taking a purely data-driven decision about the underlying structure of the phenomenon. Instead, I think the number of factors to be investigated should come from the available literature and your own perspective on which factors play the main role in explaining the phenomenon.
It should be possible to conduct a direct comparison of the 2-factor versus the 4-factor CFA using a chi-square difference test (i.e., the difference in chi-square evaluated against the difference in degrees of freedom).
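Assuming both models are fitted with lavaan in R, the difference test could be run roughly like this (`model2`, `model4`, and `dat` are again placeholder names):

```r
library(lavaan)

# placeholders: 'model2'/'model4' are the two- and four-factor
# specifications over the same items, 'dat' the shared data set
fit2 <- cfa(model2, data = dat)
fit4 <- cfa(model4, data = dat)

# likelihood-ratio (chi-square difference) test of the nested models
anova(fit2, fit4)
```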
Thanks Holger Steinmetz, Belal Edries and David L Morgan for your kind and helpful answers. I shall go ahead and examine the chi-square statistics, beyond the AIC and BIC comparisons.
Yes, do that, but be aware that even if model A is "better" than model B, it may still be misspecified, perhaps in a trivial way or perhaps in a serious way. Furthermore, a better-fitting model may still be more severely wrong than a model with worse data fit. Data fit != correct causal specification.
Here's a simple example (albeit a non-factor model):
a) Model A describes a full mediation model (x --> m --> y); however, it has one error: it misses a direct effect that exists in the true data-generating model.
b) Model B describes a reverse-causal, albeit partial, mediation model (y --> m --> x and y --> x).
As is easily observable, model B is complete bogus, as none of the effects makes sense. However, it fits better than model A. The same is true for any model structure: data fit depends on several things, the number of estimated parameters being one of them. You can fit any nonsensical model to the data with an increasing number of estimated parameters.
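To make this concrete, here is a small simulation sketch in R/lavaan (the coefficients are arbitrary) in which the nonsensical reverse model fits perfectly while the nearly correct one is rejected:

```r
library(lavaan)
set.seed(1)

# simulate from a "true" model in which x affects y both via m and directly
n <- 500
x <- rnorm(n)
m <- 0.5 * x + rnorm(n)
y <- 0.4 * m + 0.3 * x + rnorm(n)
dat <- data.frame(x, m, y)

# Model A: full mediation, wrongly omits the direct effect (1 df)
fitA <- sem('m ~ x
             y ~ m', data = dat)

# Model B: reverse-causal partial mediation, saturated (0 df)
fitB <- sem('m ~ y
             x ~ m + y', data = dat)

fitMeasures(fitA, c("chisq", "df", "pvalue"))  # misfit is detected
fitMeasures(fitB, c("chisq", "df", "pvalue"))  # perfect fit despite the wrong causal story
```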
Take your time to read a bit into the graphical causal modeling literature to understand the concepts of path tracing, d-separation, testable implications, and equivalent models.
Here are two easy introductions.
Rohrer, J. M. (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27-42. doi:10.1177/2515245917745629
Keele, L., Stevenson, R. T., & Elwert, F. (2019). The causal interpretation of estimated associations in regression models. Political Science Research and Methods, 1-13. doi:10.1017/psrm.2019.31
Thanks Holger Steinmetz for the recommended reading and thoughts on causal inference with observational data. Thanks Belal Edries for highlighting the concern about a data-driven decision. Perhaps I should clarify my objective, to invite even sharper advice. My objective is to examine the measurement model for dimensionality and for patterns of association among aspects of the phenomenon; the observed variables are on a reflective scale, and the study is largely exploratory. I am not advancing past the measurement model to the structural model in this case.
I've chosen to skip the EFA, since I did not have a second sample to perform the CFA on (see the article "Structural Equation Modeling in Practice: A Review of Recomm...").
Going directly to the CFA, I find a discriminant validity issue in my data on the theory-based four-factor structure, consistent with the suggestion of combining the indicators of F1 with F2 and those of F3 with F4 into a two-factor model. This two-factor model appears to solve the discriminant validity problem (in my data). I proceed to evaluate three models: M1 is the two-factor structure, M2 the four-factor structure, and M3 the four-factor structure plus two second-order factors linking the two pairs of closely correlated (and highly covarying) factors. All three models have very good fit statistics; the highest chisq/df ratio among them is 1.046 (for M3). The best fit indices belong to the four-factor model (M2, specified a priori), followed by the two-factor model (M1), and then the model with the two second-order constructs (M3).
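For reference, the three competing structures could be specified in lavaan along these lines (the item names x1-x12, three per factor, are pure placeholders; the real indicator-factor assignments come from the theory):

```r
library(lavaan)

m1 <- '  # M1: two factors, merging F1/F2 and F3/F4
  G1 =~ x1 + x2 + x3 + x4 + x5 + x6
  G2 =~ x7 + x8 + x9 + x10 + x11 + x12
'
m2 <- '  # M2: four correlated first-order factors
  F1 =~ x1 + x2 + x3
  F2 =~ x4 + x5 + x6
  F3 =~ x7 + x8 + x9
  F4 =~ x10 + x11 + x12
'
m3 <- '  # M3: four factors plus two second-order factors
  F1 =~ x1 + x2 + x3
  F2 =~ x4 + x5 + x6
  F3 =~ x7 + x8 + x9
  F4 =~ x10 + x11 + x12
  H1 =~ F1 + F2
  H2 =~ F3 + F4
'
fits <- lapply(list(M1 = m1, M2 = m2, M3 = m3), cfa, data = dat)
sapply(fits, fitMeasures,
       fit.measures = c("chisq", "df", "cfi", "rmsea", "aic", "bic"))
```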
My new question is: Does the discriminant validity issue with the "best-fitting" four-factor model (M2) undermine my objective to examine the dimensionality and patterns of association enough to discard it for the two-factor model (M1) instead?
In an attempt to investigate the roots of the lack of discriminant validity, I would suggest you compute the latent variable scores and use them to calculate the VIF, e.g., in SPSS.
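Since the thread later mentions R, an equivalent sketch with lavaan and the car package might look like this (`fit4` and the factor names F1-F4 are placeholders):

```r
library(lavaan)
library(car)

# 'fit4' is a placeholder for the estimated four-factor CFA
scores <- as.data.frame(lavPredict(fit4))

# VIF of the remaining factors when predicting F1 from F2-F4;
# very large values point to the overlap driving the
# discriminant validity problem
vif(lm(F1 ~ F2 + F3 + F4, data = scores))
```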
Another approach is to use a correlation circle to plot the items and check their correlations. I have prepared code to run this in the R environment; it can help to identify items that the respondents perceived to be similar.
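Belal's code itself is not attached to the thread; a generic base-R sketch of such a correlation circle could look like this (`items` stands for a data frame containing only the indicator columns):

```r
# 'items' is a placeholder for a data frame of the observed indicators
pca <- prcomp(items, scale. = TRUE)
loads <- cor(items, pca$x[, 1:2])  # item correlations with the first two PCs

plot(loads, xlim = c(-1.1, 1.1), ylim = c(-1.1, 1.1), type = "n", asp = 1,
     xlab = "Dimension 1", ylab = "Dimension 2",
     main = "Correlation circle of the items")
symbols(0, 0, circles = 1, inches = FALSE, add = TRUE)  # unit circle
arrows(0, 0, loads[, 1], loads[, 2], length = 0.08, col = "grey40")
text(loads * 1.08, labels = rownames(loads), cex = 0.8)
# items whose arrows point in (nearly) the same direction were
# answered similarly by respondents
```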
Based on these actions, you can do the following:
1. Try to identify an item that is causing the discriminant validity issue; if you find one, you can drop it, but you should not go below two items per construct.
2. If option 1 is not available and the theory supports it, you may drop one of the factors that are causing the discriminant validity issue.
3. If option 2 is not viable, then I think you are left with one option: go with the two-factor model.
The two-factor model involves a single correlation between those factors, while the four-factor model involves either 6 correlations if you allow for the full set or 3 if you group them with second-order correlations. That means you can test the difference between the two-factor model and four-factor models with either 5 degrees of freedom or 2.
When you write "The best fit indices belong to the four-factor model (M2, specified a priori), followed by the two-factor model (M1), and then the model with the two second-order constructs (M3)", I doubt that you have read my post. BETTER fit does not necessarily mean "more correct".
Before you can look into issues of validity, the model has to fit; otherwise, loadings will be biased, or the whole factor structure with its involved factors may be nonsense.
What are the chi-square values, degrees of freedom, and N of the models?
Thanks Belal Edries for the pointers on digging deeper into the discriminant validity. I am using R, so the code goes a long way. Thanks David L Morgan for the pointers on the correlations and the degrees of freedom.
Holger Steinmetz, I could use further insight into your note that "even the 4-factor model might be a bit misspecified". You may notice that you partially inspired my laying aside of the EFA :). BTW, the N is 405. As regards the chi-square values and degrees of freedom, here we go, in case you see something:
This is fascinating: none of the chi-square tests is significant. The only way to uncover the black swan is to enlarge the model with further variables. These could ideally be validation criteria (i.e., variables that should show a differential pattern of effects or relationships with the latent variables).
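As a hypothetical sketch of such an enlargement in lavaan (`crit` and all item/factor names are assumptions, not variables from the thread), one would add a criterion that should relate to one factor but not its near-duplicate:

```r
library(lavaan)

# hypothetical: 'crit' is an external validation criterion expected to
# relate to F1 but not to F2; all names are placeholders
model_ext <- '
  F1 =~ x1 + x2 + x3
  F2 =~ x4 + x5 + x6
  crit ~ F1 + F2
'
fit_ext <- sem(model_ext, data = dat)
summary(fit_ext, standardized = TRUE)
# if F1 and F2 are genuinely distinct, crit should show the predicted
# differential pattern of effects across the two factors
```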