While validating a scale, I first used EFA and then CFA on the same data set. A reviewer of my paper suggested not performing the EFA, arguing that we cannot perform both EFA and CFA on the same data set.
Dear Arjun Shrestha,
I think it depends on your purpose: if you want to explore, use EFA, and if you want to confirm, use CFA.
I have not found any research that uses both at the same time, but for teaching purposes we can use both.
I don't see any conflict because confirmatory factor analysis has to "confirm" something, and that hypothesis might well arise from EFA. Still, there is little to be gained by arguing with a reviewer, so you might just start with the CFA and say that the model you were testing was based on a combination of "prior theory and empirical work."
If your data set is big enough, you can split it into two data sets; then you may do the EFA with one half and the CFA with the other. (You shouldn't run many statistical analyses on the same data - that's what we learn in introductory psychology courses.)
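For readers who want to try this, a minimal R sketch of such a random split might look as follows ('mydata' is a placeholder for the full item data frame):

# Random split-half: one half for the EFA, the other held out for the CFA
set.seed(123)
n <- nrow(mydata)
idx <- sample(n, size = floor(n / 2))   # random half of the rows
efa_half <- mydata[idx, ]               # explore here
cfa_half <- mydata[-idx, ]              # confirm here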
Yes, you can and should do so! EFA permits cross-loadings, so you can select the items with higher factor loadings on a given factor (>.3) and perform a CFA. However, cross-loadings are not permitted in CFA models. You could try bifactor models and second-order models. Another alternative is exploratory structural equation modeling (ESEM) alongside confirmatory structural equation modeling (SEM). Try the function omegaSem(data) in the psych R package; it fits a bifactor CFA (based on loadings >0.2 on three factors and one general factor).
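A minimal sketch of that omegaSem() suggestion, assuming 'mydata' is your item data frame and that three group factors are plausible:

library(psych)                    # omegaSem() also needs the lavaan package installed
omegaSem(mydata, nfactors = 3)    # EFA-based omega solution followed by a bifactor CFA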
There is nothing in CFA that prevents a cross-factor loading, but it implies that the item in question is related to all of the items in the second factor.
A more specific approach would be to include error correlations between pairs of items -- in this case, between items on the different factors. This implies that after you account for the expected covariance based on the larger model, there is still a substantial amount of "residual" correlation between a given pair of items.
There is no equivalent to that level of specificity in EFA.
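In lavaan syntax, a cross-loading and a residual correlation of that kind would be written as in the following sketch (item and factor names are illustrative only):

library(lavaan)
model <- '
  F1 =~ y1 + y2 + y3
  F2 =~ y4 + y5 + y6 + y2   # y2 is allowed to cross-load on F2
  y3 ~~ y5                  # residual ("error") correlation between items of different factors
'
fit <- cfa(model, data = mydata)
summary(fit, fit.measures = TRUE)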
The answer is a big "NO". No, never do that. EFA and CFA are two totally different things, and the data cannot be recycled!
EFA is carried out on pilot-study data. The reasons for doing EFA are to identify the dimensionality of the items and to drop items with low factor loadings, as well as redundant items, from the questionnaire to be used in the real study, normally termed the field study. After EFA, we compute Cronbach's alpha for the remaining items so that we know the reliability of the items to be used in the field study.
Once we obtain data from field study, we proceed to doing CFA to assess the Unidimensionality, Validity, and Reliability of the latent constructs. Once these three requirements are achieved, we proceed to modelling using SEM.
Remember - No EFA for field study data. And the pilot study data cannot be added to field study data for further analysis.
Normally, EFA is done to explore the possible underlying factors and CFA is used to verify the factor structure. Identification of factors and verification of factors cannot be done on the same data. If you have not tested your identified factors on other data, you cannot generalize your results.
It all depends on where you are in the validation cycle. In a first attempt at validation using factor analysis, I use EFA to examine the multidimensionality of the measuring instrument. Once the underlying structure has been identified, I make predictions about the factor structure, identify a new sample of participants with similar demographics, and use CFA to test my predictions. I do not use both EFA and CFA on the same data set and I rarely divide a large data set into two parts. I prefer to confirm my predictions on a different sample so as not to take advantage of "chance" factors.
Hi folks,
just a few comments.
(1) Both EFA and CFA apply the "common factor model", which proposes that the set of indicators is causally influenced by one or more underlying latent factors. That means this is a causal model, in which the factor represents some empirical, one-dimensional entity that exists independently of the measurement procedure and causes the correlations among the indicators.
Consequently, the question should not be to discover the "dimensionality of the indicators", which in my view is a very unclear goal (what does this actually mean?), but the correctness of the model - that is, a) whether the factor actually represents an existing entity (or is nonsense) and b) whether the supposed causal effects (i.e., factor loadings) are correctly specified.
(2) If you first explore such a structure with EFA and successively conduct a CFA (in the same sample), you test the validity of those restrictions implied by the CFA which were not part of the EFA (e.g., fixed cross-loadings, uncorrelated errors). If the structure is correct with respect to "a" and "b" above, this test makes sense - if the structure is wrong, the test is nonsense (what sense does a test of a loading = 0 make if the underlying factor is invalid?)
(3) Simply repeating the study and either re-doing the EFA, conducting a CFA after an EFA in the first sample, or conducting two successive CFAs does not validate the supposed structure - unfortunately. The only things that are tested (that's why replications are useful) are capitalization on chance, errors of the researcher, differences across samples, and so on.
The structure is not validated because models can be fitted to the stable part of the sample (the part that is stable across potential samples) despite being wrong. If you then repeat the study, you again fit the wrong model to the new data - with "success". This can easily be proven with a small simulation study: create a population model, specify a wrong model, and relax constraints until the model fits. Then draw a new sample and test the final model again. It will fit again.
Hence, as Gary said, it would be more useful to enlarge the model and include variables that should be correlated with the identified/tested factor if the factor is valid. Even better would be to move to a complete SEM with causally useful restrictions that should hold if the factor is valid. This can be done with the same sample because you add theory. I admit that I don't know whether the tests in this situation still have the same statistical properties, though.
And one final comment: cross-validation creates two (approximately) identical sub-samples. Hence, if the split has worked successfully, you should always get identical results. That proves nothing...
We should really move away from applying factor models the way psychology in particular has used them over the last 100 years - namely to "discover dimensionality", to "reduce items to interpretable factors", and with the overall goal of convenience - and instead apply realistic concepts such as existence, truth, and causality.
I hope this request is not too bold :)
Best,
Holger
Dear Colleagues:
Performing EFA and CFA on the same data set makes it appear as if there were no prior theory on the structure of the instrument subject to validation. I do not believe this is the case in the question at hand.
If a theory exists, it must be clearly stated and supported by reasonable argumentation.
Now, in the best of possible worlds, performing an EFA and then a CFA on different samples drawn from the same population is the strongest alternative.
If this is not possible, then a cross-validation study (Browne & Cudeck, 1980), splitting the data set into two randomly selected sub-samples, provides strong support for the accumulation of validity evidence.
However, if the theory is wrong or not carefully construed, no statistical resource will help.
"Quod Teoria non dat Statistica non praestat".
Manual, with all due respect, it is good practice to read the postings in a thread before posting oneself, to avoid redundancies.
As I argued above (and I welcome you to provide some arguments and I am willing to listen), performing an EFA and successive CFA provides no strong support for the model.
Similarly, cross-validation is a ritual that proves nothing.
Let me give a simulation to illustrate the issue. I will use the software R to
a) create data according to a true population model. This model, however, is not a factor model. This is intended to show the extreme difference between our factor model and the true model;
b) conduct an EFA (which reveals a one-factor solution; this factor is obviously nonsense, as it simply does not exist);
c) re-do the study and "validate" the found factor model with a CFA.
Everyone can copy the code into R to re-do the simulation (note: the packages "psych" and "lavaan" have to be downloaded and installed). I comment the code on the same line with "#"s.
# # # # # # First sample: EFA # # # # #
# Creating the true model (Note: the model is absolutely not a factor model - the x's influence each other - hence there is no "factor" at all)
x1 = rnorm(500)
x2 = .8*x1 + rnorm(500)
x3 = .7*x2 + .5*x1 + rnorm(500)
x4 = .4*x3 + .4*x2 + .3*x1 + rnorm(500)
data=as.data.frame(cbind(x1,x2,x3,x4))
library(psych)
fa.parallel(data)          # suggests a single factor
fa(data, nfactors = 1)     # EFA: a clean one-factor solution (which is nonsense)

# # # # # # Second sample: CFA # # # # #
# (re-run the data-generating lines above to draw a fresh sample of x1-x4)
library(lavaan)
cfamodel <- 'F =~ x1 + x2 + x3 + x4'
fit <- cfa(cfamodel, data = data)
summary(fit)   # output excerpt; the chi-square test is non-significant, the model "fits":
                   Estimate  Std.err  Z-value  P(>|z|)
Latent variables:
F =~
x1 1.000
x2 1.517 0.083 18.211 0.000
x3 1.931 0.103 18.756 0.000
x4 1.871 0.107 17.458 0.000
Hence, simply repeating a model or even a successful CFA does not validate the model.
Now - what happens if we include an additional variable W (an instrument for F)?
(Caveat: of course, this test depends on the actual causal role of W - in this example, I assume that it is only correlated with x1 - but any other variable would do, too)
# # # # # INCLUDING an Instrument for F # # # # #
W = rnorm(500)
x1 = .8*W + rnorm(500) #x1 is related to W
x2 = .8*x1 + rnorm(500)
x3 = .7*x2 + .5*x1 + rnorm(500)
x4 = .4*x3 + .4*x2 + .3*x1 + rnorm(500)
data=as.data.frame(cbind(x1,x2,x3,x4, W))
IVmodel <- '
  F =~ x1 + x2 + x3 + x4
  F ~ W                    # W specified as a cause of the supposed factor
'
fitIV <- sem(IVmodel, data = data)
summary(fitIV)   # the model now clearly misfits - the supposed "factor" falls apart
Holger,
Thank you for your input, and as soon as I can I will take the time to go through it. It is also good practice to review theoretical postulates before embarking on statistical simulations, which do help but do not entirely resolve the matter. It is also good practice to spell a colleague's name correctly.
Dear Manuel, sorry for the misspelling. It was a long text and simply a typo. And I agree totally with you (that's the bottom line of my simulation).
There is no way that excludes theory. That's why a theory-less EFA can sometimes lead down the totally wrong track.
With warm regards,
Holger
I agree with the opinion offered by David L. Morgan.
I see no reason why EFA and CFA cannot be combined in a single study. An important point to consider is the purpose of a study. My line of reasoning is as follows:
(1) The EFA is conducted as a pretest to evaluate the questionnaire items and see whether the proposed constructs or construct dimensions are reflected in the items' loadings on the factors.
(2) The scales are modified based on the EFA results, and the CFA is then performed so that a researcher can obtain a better assessment of construct validity. It may turn out, for example, that not all the factors have sufficient convergent validity (i.e., the constructs have a low proportion of variance in common with their indicators) or that they have insufficient discriminant validity (which indicates the construct in question is not sufficiently "unique" among the other constructs). This can happen despite the fact that the EFA results supported construct validity (and reliability).
(3) The instrument is then modified based on the findings from the CFA.
As this outline shows, combining the EFA and CFA allows a rigorous assessment of the instrument properties.
Another reason to question the proposed "incompatibility" of EFA and CFA in one study is that findings in statistical analysis reflect properties of the data set. Therefore, a different data set may (or will) yield a different outcome of the test. Conducting both EFA and CFA on the same data reduces such a possibility.
Further reading:
Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data analysis. Prentice Hall Pearson Education.
Hi, Pat,
I am not against - indeed, I am all for - conducting a second study. What I do not support is the rather dogmatic view that EFA and CFA should never be used in the same study. (Despite such a stance, this has been - and will be - done in practice for obvious reasons.)
If a researcher wants to look a little deeper into the properties of the instrument at hand, she or he would want to conduct the CFA on the same data set. It is practical. After the instrument is fine-tuned - benefiting from the insights provided by the findings of both the EFA and the CFA - one can (and normally would want to - and should) definitely try it on a different group of respondents.
Best wishes,
Larisa
Hi Larisa,
basically I agree with your overall perception that combining both should not be "prohibited". But the question is what you mean by "look a little deeper". When you do a CFA after an EFA, you test restrictions which the EFA does not imply (some/all double loadings fixed, and fixed error covariances). I don't know to what extent this enlarges your understanding of the structure and the meaning of the latent.
And please take into account that testing the CFA after the EFA in a new data set only tests for sampling error / capitalization on chance (as Pat points out). It does not validate the structure (see my simulation above). The better way is to think about which further variables and theoretical implications or restrictions should hold if the factor structure is correct and the latent has the proposed meaning. This, indeed, can be done in the same data set because it adds new theory to the model (beyond that, however, the sampling error problem is still valid).
Best,
Holger
Hi Holger,
Thank you for your message. Basically, "looking a little deeper" means, among other things, gaining a better understanding of whether dimensions that came up during the EFA, especially 'unexpected' dimensions, are promising (i.e., they "glue well" - in my own language - and are "unique" enough) and, therefore, should be considered in the process of instrument development. This is especially relevant if an instrument measures attitudes or other psychological constructs.
Needless to say, besides all the figures and numbers, there are also face and content validity. Combining all of these, i.e., the results of the EFA and CFA and the basic theoretical underpinning (face and content validity), gives a researcher a chance to "look a bit deeper".
Best wishes,
Larisa
Hi Larisa,
what you write in the first paragraph ("glue well" :) nice) is what I meant by investigating/testing the meaning of the (new) latent. The CFA only repeats the EFA and imposes further restrictions (the diverse fixations). This does not increase the understanding much. You would learn much more by placing the new latent(s) in a more complex network of associations (controlling for the "old"/established latents) to see if there is something substantial in it/them. This can be done in the same data set and allows you to move a bit further beyond the EFA in THEORETICAL terms.
I agree with your focus on content validity. Inspecting the wording of the items (and in combination with qualitative interviewing/cognitive probing) provides important insights into validity.
Best,
Holger
Hi Holger,
Thank you for the message. Seems the participants in this discussion have a general agreement regarding the question.
I hope everyone benefits, knowledge-wise, from this exchange of opinions.
Regards,
Larisa
EFA and CFA are NOT in conflict with each other; rather, they can be used as complements to each other.
I agree with you Massoud. EFA can be used for theory building then CFA for testing the theory.
If one is developing a test/instrument, one can use EFA because the dimensions of the scale need to be discovered - theory generating. In EFA, we can delete items with poor loadings and calculate validity as in CFA. If theory is available to specify the dimensions of a measure and this has already been ascertained, we can go directly to CFA - theory confirming - and estimate validity and composite reliability. In assessing internal consistency/reliability, composite reliability is likely to yield a higher value than Cronbach's alpha because of the estimation formula. I do not find any justification for using both in a single study unless there is a compelling reason to do so. The point is that the objective of the researcher should drive the analysis, not vice versa.
Hello everyone, may I answer this question? I am a psychology researcher and have conducted research using EFA in the development of a personality test. In my experience, we must first conduct EFA to determine the underlying factors of a construct, for example, Filipino adolescents' resiliency. After conducting the EFA and identifying the underlying factors, these factor structures are then subjected to confirmatory factor analysis (CFA) through the structural equation modeling technique. The CFA must be conducted on ANOTHER set of sample respondents in order to know whether the items retained in the EFA really fit the data, or what we call "model fit". Through this, we can determine whether the items of a scale are VALID and RELIABLE.
It is recommended but not necessary. The CFA is based on the EFA setup!
Hello everyone, it was nice to see all your feedback on EFA and CFA, and I really appreciate you sharing your views.
I saw that somebody says there is no need to use EFA in empirical studies, somebody says we should not use the same data set to perform both EFA and CFA, and others say there is nothing wrong with doing both.
But my question is: can we split our data set into two equal parts (400+ samples each) and use one for EFA and the other for CFA?
Kindly answer in a simple way with a brief explanation.
Hi Hulugesh,
splitting the sample randomly will result in two approximately identical halves. Hence "validating" in the second half anything that you have found in the first half proves nothing.
Take a simple example. You have a population model consisting of two factors, each measured by two indicators. In the first half of your split sample, you estimate a one-factor model, notice a misfit, and you fit the model by adding an error covariance. Subsequently, you take the second half and again estimate the one-factor-plus-error-covariance model. It is extremely likely that this model will fit. This can easily be simulated.
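For illustration, a minimal lavaan simulation of this example might look like the following sketch (the population values are arbitrary):

library(lavaan)
set.seed(1)
pop <- '
  F1 =~ 0.8*x1 + 0.8*x2
  F2 =~ 0.8*x3 + 0.8*x4
  F1 ~~ 0.4*F2
'
dat  <- simulateData(pop, sample.nobs = 1000)   # true model: two correlated factors
half <- sample(1000, 500)
wrong <- '
  F =~ x1 + x2 + x3 + x4
  x1 ~~ x2                  # the modification "found" in the first half
'
fit1 <- cfa(wrong, data = dat[half, ])    # exploration half: fits
fit2 <- cfa(wrong, data = dat[-half, ])   # "validation" half: fits again
fitMeasures(fit1, c("chisq", "df", "pvalue"))
fitMeasures(fit2, c("chisq", "df", "pvalue"))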
Actually, I cannot understand why this "cross-validation" approach was ever invented...
Best,
Holger
Hi Holger! Interesting debate! Have you written about this issue in some of your articles or is there an article you would recommend about the topic?
Best
John-Kåre
Hi John-Kare,
thanks for the comment. No, not yet. And I do not know of any article that criticizes cross-validation. However, it was not my idea, either :D. I heard it from Leslie Hayduk on the SEMNET mailing list years ago.
Best,
Holger
Thank you Holger for your reply.
And thank you, İrfan, for your answer; you have put it in a very simple and clear way. As you already mentioned in your previous answer (and now you have mentioned the same thing again), I collected 950 samples from the same population and then randomly divided them into two parts. After that I used one data set to perform EFA and the other data set for CFA.
And now my question is simple and straightforward: is what I have done right or wrong?
Because, as I already mentioned, a few authors say it is wrong to perform both EFA and CFA in one study, some authors say that you should not split your data into two parts and use them to perform both EFA and CFA, and a few people say that there is no use in doing both in a single study.
Hi Hulugesh. Not published yet? Just to get down to the pragmatics: As @David said: there is little to be gained by arguing with a reviewer. Those who have posted here may well be your next reviewers and you see for yourself that you may get different responses. You want to publish or not? You HAVE two choices:
#1) Send it to a new journal with the split-half approach and hope you don't get @Holger and @Zainudin as reviewers. #2) Do either EFA or CFA according to @Gary's advice: it depends on where you are in the validation cycle. If you have strong theory and some previous work with the scale, do CFA. If little has been done before, just do EFA. Few reviewers will have problems with #2.
Best
John kåre
Hello John kåre ,
Thank you for your answer, and yes, I have not published yet; that is why I am confused and wanted to clarify my doubts. Once again, thank you all.
Folks,
the "discussion culture" within researchgate starts to deeply concern me (not to say it is annoying).
In particular:
(1) If you want to contribute something (which I am sure you do), READ THE THREAD. It really leads nowhere if the same claims and counter-claims are made over and over again because each new contributor posts his/her view without noticing that the same claims have been made dozens of times before.
(2) If you make a claim, PROVIDE ARGUMENTS. If you contradict a claim, provide arguments as well. This ping-pong back and forth is not helpful for anyone. Most of the claims made in this thread recently had already been made over 3 years ago.
With regard to the topic:
Of course you can do an EFA followed by a CFA in the split-half sample or another sample. If the reviewer wants it, then do it.
The point I make (and made already 3 years ago), is that the hopes to support or validate the explored model are on very shaky grounds.
a) In a cross-validation situation (i.e. the split-half-scenario), you will most likely find what you've found in the exploration sample (as it is identical) even if the model is systematically misspecified. I will provide arguments below
b) In a pure replication situation (re-sampling from the same population), the consequences are exactly the same (as it resembles the split half-procedure). Simply re-doing the model will support nothing.
c) In a non-exact replication (e.g., repeating the study in a different population),
1) a successful replication AGAIN tells you little about the validity of the explored model in the first study (because of the reasons above)
2) a failed replication also tells you nothing about the validity of the explored model (as it may have been correct in the first population but not in the newly investigated population).
Why does a cross-validation or pure replication not support an explored structure?
Take the simple model A --> B --> C, which could be the result of the exploration. The data implications are 1) correlations between A and B, B and C, and A and C, and 2) the correlation between A and C disappears when B is controlled. Imagine you explored the model in half #1, then you test this model in half #2 and - BAM - it comes out again. Does this support the explored structure? No, not really, because if the true model is in fact C --> B --> A or A <-- B --> C, the exact same pattern will occur. That means simply repeating the model does not support it. I will note later something like a "(theoretically) enriched replication". This indeed adds a lot of value to the replication!
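A minimal R/lavaan sketch of this point (generating data from the confounder structure A <-- B --> C and fitting the wrong chain model, which will nevertheless fit in sample after sample):

set.seed(2)
B <- rnorm(500)
A <- 0.6*B + rnorm(500)   # B is a common cause of A and C
C <- 0.6*B + rnorm(500)
dat <- data.frame(A, B, C)

library(lavaan)
chain <- ' B ~ A
           C ~ B '                              # the (wrong) explored model A --> B --> C
fit <- sem(chain, data = dat)
fitMeasures(fit, c("chisq", "df", "pvalue"))    # fits, despite being wrong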
And again, I posted all of this 3 years ago above. And I provided a simulation that proves the point. In this simulation, I created a non-factor model, leading to inter-correlated items (x1-x4). Thrown into an EFA, the model resulted in a nice one-factor solution (albeit being nonsense). Then I drew a further sample (which resembles points a and b above) and voila, a CFA showed a clean fit with a non-significant chi-square test.
However, including one single external variable that was not part of the explored model but is now added in the "validation phase" leads this model to metaphorically explode, thus revealing that the explored (and now tested) model was nonsense.
Above, David said that the hypotheses for the CFA must come from somewhere. Fair point. Hence, I would not go so far as to say that a CFA after an EFA in a new or split-half sample is useless. But the things that are really *tested* are the restrictions not implied by the EFA (e.g., double loadings). This indeed may impose a challenge for the model, leading to misfit and, hence, invalidating the explored model. The problem with this perspective, however, is that the test of the model is biased towards fitting, as the major part of the model (i.e., the main factor structure) was explored in the *same sample* (albeit the other half). Specifying a model structure *based on theory* leads to a stronger, more challenging test than specifying the structure based on exploration *within the same sample*.
Drawing a new sample (pure repetition) may increase the challenge a bit (because of the sampling error) but the main issue with this repetition exercise is not the random part of the sample (varying across samples) but the stable part. The pattern of correlation for the A-B-C example above is the stable part, leading to a wrong-but-fitting model in the vast majority of drawn samples.
Incorporating NEW VARIABLES ("enriched replication"), instead provides a strong test because the model with its structure is challenged in ways not so far being done. The reason is that now the algorithm must account for the added covariances with the new variables and that will fail if the model is incorrect.
As an example, take again the A-->B-->C model, which is a false representation of the data (A <-- B --> C being the correct one). Because this model wrongly fits in the first sample, a repetition in a cross-validation, pure replication, or non-exact replication will also fit - thus showing nothing. Adding an instrument for A ("W"), however (W --> A --> B --> C), will provide a huge challenge for the model; it will fail and thus show the misspecification (because it has other data implications than the correct model). The same is true for simply adding external criteria or demographic variables as single-indicator "latent" variables to the CFA. Small change - huge benefit!
The wonderful thing is: you do not need to split the sample, nor do you have to run a new study. Although, again, the statistical quality of the chi-square test will suffer, practically the model will fail to fit against the added *theoretically based* restrictions. Hence, if the model fits, this is at least some support for the explored structure (instead of simply repeating it).
Having said all this, these are MY conclusions. I have absolutely no problem with being challenged or refuted, but provide arguments or evidence. This whole back-and-forth (yes you can - no you must not) for 3 years now makes me pessimistic that science overall progresses :)
In addition, of course, practical reasons to conduct an analysis in a certain way (e.g., demands by a reviewer) may not always match scientifically reasonable ways to conduct an analysis.
Best,
Holger
Hi Holger. I did not mean to offend or refute your concern about the split-half practice, I'm sorry if it sounded that way. My #1 above was meant as a good-humoured comment. I fully respect your competence and will take with me your concerns about the split-half practice. As it seemed like @Hulugesh had not published yet, I was merely trying to point to #2 as a way forward.
Best
John kåre
Hi John-Kare,
no offense taken :)
The issue is this years-long thread, where everybody throws in the same opinions without any reference to the exchange that has already taken place.
I fully support your advice (that's why I added the comments with respect to practical considerations).
Best,
Holger
It looks like issues such as (a) EFA followed by CFA and (b) split-sampling practices have been and continue to be ongoing concerns. I am puzzled because, as someone previously noted, with the availability of ESEM (e.g., with target rotations) and other advanced methods (bifactor IRT), is it not time to move on?
You can't do EFA and CFA with the same data, since it will give the same answer - you can't validate what has already been validated. You need to run EFA and CFA with different sets of data. But the real issue is: do you need to run EFA at all? Please read this paper carefully; it may help you understand better:
Hulland, J., Baumgartner, H., & Smith, K. M. Marketing survey research best practices: Evidence and recommendations from a review of JAMS articles.
Some will insist on separate samples. If N is large, you could split the original into validation and cross-validation (CFA) samples.
No you can't. If you try to confirm with the same data, the test differs in principle but you will get the same result; there is no need to test it - I tell you it will be the same. If you really want to run EFA and CFA, you need different samples: EFA with your pilot test, CFA with your real test. Or, if you have a big sample size, split it into two files and then run EFA and CFA.
Hi all, great thread!
I have a related question. I have two samples and would like to validate the factor structure of an existing measure in a new population. It is my understanding from the literature that it is recommended to conduct the EFA then a CFA (in the 2 different samples). What would be the benefits or drawbacks of completing two CFAs versus an EFA followed by a CFA?
Dear Dezarie,
it is really exhausting to constantly fight this nonsense :)
1) When you have a theory or at least expectations about the factor(s) and its/their relationships with the observed indicators, omit the EFA step and test the CFA. You only do EFA when you have a bunch of data and no clue about the latent structure. This EFA fetish is a consequence of mere tradition and, often even worse, a consequence of lumping two different goals together (identification of latent structures vs. data reduction).
In the following thread, I have argued that this "testing priority" holds even in cases where your expectations are as thin as in the extreme case of a dream that gave rise to them. This is an extreme example, but see for yourself whether you find some merit in it. I'd like to be convinced that going with EFA has an advantage over being precise and rigorous and testing one's ideas.
https://www.researchgate.net/post/EFA_exploratory_factor_analysis_CFA_confirmatory_factor_analysis_or_both#view=5c9937470f95f1a905604215
2) When you have 2 samples, what do you mean? Are they from the same population or from different ones? In either case, you could conduct a multigroup CFA that a) combines the Ns and thus increases testing power and precision, and b) in the latter case lets you test robustness vs. invariance across populations (a short lavaan sketch follows at the end of this post).
3) Better than a mere CFA would be to incorporate antecedents and/or consequences of the latent factor(s). CFAs are, as mentioned, much stronger than EFA models as they have testable restrictions but, at the same time, they have the weakness that the latent structure (i.e., the covariances among the factors) has absolutely no restrictions (i.e., everything correlates with everything). Going to a SEM is thus a much stronger test and - if passed - stronger support for the validity and meaning of your latent factors and the effects they have or receive.
Having said that, try to go the hard way and test your models with the chi-square test. It is an unfortunate development that everyone explains away *potentially* informative misfit and keeps failing models. As humans, we learn from errors. Everything else is (false-)hypothesis conservation. Innovation begins with learning that something is problematic and has to be improved.
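As a sketch of point (2), a multigroup CFA with an invariance check might look like this in lavaan ('model', 'dat', and the grouping variable 'sample' are placeholders):

library(lavaan)
fit.config <- cfa(model, data = dat, group = "sample")         # configural model
fit.metric <- cfa(model, data = dat, group = "sample",
                  group.equal = "loadings")                    # equal loadings across groups
anova(fit.config, fit.metric)                                  # chi-square difference test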
HTH
Holger
Did you adopt or adapt research instruments????
If the underlying structure of the items you are using is known, then go for CFA; if not, then you have to explore and go first for EFA.
you can get an idea on EFA to CFA from my paper;
Article An Easy Approach to Exploratory Factor Analysis: Marketing P...
Yes, it is suggested that you do not do the EFA and the CFA on the same data. This can be explained as follows. Say you take a set of samples and make a door as per their height needs (EFA). Then you test the height of the door to generalise it on a different sample (CFA). If you test it on the same sample, the results will be positive, as the door was constructed according to their height.
It's a laughing stock if you explore and confirm using the same data. Full stop.
By doing this you may face two potential dangers:
1. overfitting
2. rejection by many reviewers
https://psycnet.apa.org/record/2017-56618-001
Hi folks,
I would agree that it is of limited value to do first EFA and then CFA with the same sample (but also in different samples :).
But the question is: how can it be that in most cases doing that results in a misfitting CFA model? :) [BTW: it is a rhetorical question :)]
Best,
Holger
Hi,
For future readers I would like to balance the opinion of Holger Steinmetz, which weighs heavily in this thread, about 1) the "uselessness" of doing EFA on one sample and CFA on another sample when a researcher has no clear ideas about the latent structure, and 2) the "uselessness" of doing cross-validation.
I disagree with these two statements.
For the first point, which is actually not uncommon in the literature (CFA after EFA on a different sample), I would simply point to a book on the subject (cf. p. 193 of "Confirmatory Factor Analysis for Applied Research" by T. A. Brown) for clarity instead of paraphrasing it.
For the second point, cross-validation is an essential way of evaluating the generalizability of any model, and I would strongly disagree with "1) a successful replication tells you little about the validity of the explored model in the first study" and "2) a failed replication also tells you nothing about the validity of the explored model" as mentioned by Holger Steinmetz.
According to the goodness-of-fit metric in a given study, a "successful replication" (according to the value of this metric) in another sample will demonstrate the likely generalizability of your model to previously unseen data. This is essential in most scientific studies. Reciprocally, a failed generalization of your model to unseen data will suggest that your model is specific to your sample (often due to overfitting) and therefore of very limited interest to the scientific community. Obviously the two samples are assumed to come from the same population (as per the repeated suggestion in the thread to split the dataset in two -- i.e. to get two samples coming from the same population). Cross-validation typically assumes the data in each fold (i.e. each sample) comes from the same population.
Michael
Hi Michael Dayan , thanks for joining in.
perhaps our disagreement can be resolved by potentially differing perspectives on what a "model" is. My comment was focused on SEMs as inherently causal models that make a set of claims about existing causal effects. In contrast, purely predictive models (e.g., regression models or machine learning models) are descriptive/associative in nature. All that you wrote would make perfect sense (and I would agree) if you referred to the latter.
In the case of an SEM as a causal model, however, replication and cross-validation are surely not useless, as you note. They can test for capitalization on chance, problematic behavior or errors by the researchers, or - if you apply the model in a different population - boundary conditions or robustness of the underlying theory. This is the case for causal as well as predictive models.
My point, however (and I still think I made that clear in my posting from December 8th), is that a successful replication is no validation or test of the correctness of the causal structure and its inherent causal claims. The reason is the existence of equivalent models. Take a simple example: you test or explore a full mediation model with the sequence A --> B --> C. This model has one essential implication, that is, the conditional independence of A and C given B. The model fits, you are happy, but as an honest researcher you draw a new sample and re-test the model. The problem, however, is that unfortunately you got it totally wrong, as the relationship between A and C is totally spurious and B is not a mediator but a confounder. Thus, the true structure is A <-- B --> C. As a result of your replication, you will again find a decent fit, as your (false) model has exactly the same implications.
Hence, you can do this exercise a dozen more times, and you will find (within the limits of sampling error) a very decent fit. Replication is not enough, and that's why "a successful replication tells you little about the validity of the explored model in the first study".
The same example holds true for factor models. If you are interested, I can upload a simulation that I did, in which I created a population factor model, drew a sample, fitted a wrong model by implementing nonsense modifications (error covariances), then re-drew a new sample and fitted the wrong modified model again and again, and it fitted again and again.
Best,
Holger
Holger Steinmetz I understand the limitations of cross-validation in the case you mentioned, and this is a good point.
For the usefulness of cross-validation in general within the context of factor analysis I would like to point for example to this study proposing an approach based on cross-validation in the general case: Article Applied Psychometrics: The 3-Faced Construct Validation Meth...
I would need more time to examine your example and the simulation you described earlier in the thread (I went through most posts before writing my answer, as you suggested). It would certainly be interesting to continue the discussion at a later point (it may take some time for me to find the time to analyze your code and example properly).
Michael, the central issue is the concept of validity overall - no matter whether it concerns a claim about a causal effect or the supposed meaning of an underlying latent factor (the usual issue in the psychometric perspective on validity). These are claims about the unknown reality, and as you know, there is an essential gap between reality and observed data. Hence, no matter what data-related or statistical exercise you conduct: nothing follows from data (as the saying goes: the data is dumb).
Judea Pearl, in his "Book of why" has nicely addressed this with the "causal ladder", which on its first runge, has data, observations and associations. The second runge contains causal effects and the third contains counterfactuals. To build a connection between the first two runges, no matter whether it is from bottom to the top (exploration) or the top to the bottom (testing), you need some extra, i.e., non-data-related assumptions. As Nancy Cartwright said: "no causes in, no causes out".
An SEM, be it a factor model or a structural model, also contains surplus assumptions and meaning that are not contained in the data or the data-fitting result. Hence, simple replications surely reveal whether you have found something stable and reliable, but not whether the surplus assumptions are true. This is comparable to repeating a test result, which only shows that the measure is not random, but not whether you measured what you intended to measure. This is beyond the data. The only way out of this dilemma (and to bridge the essential gap) is to formulate new hypotheses or assumptions that must hold if your claim/model is true. Applied to the factor model, this simply means incorporating other variables, adding new implications not involved in the first attempt to explore/test the model. Or, applied to the full mediation example, you could incorporate an instrument for A, which would allow you to test your model and separate it from the confounder model, because IV --> A --> B --> C is NOT equivalent to IV --> A <-- B --> C. But even then, a successful test would not prove your model correct, as there are still some variants that are equivalent to this model. This is basically the problem of non-verifiability that Popper addressed.
Having said that, I don't know whether the forwarded article discusses these issues or what else you could be referring to with the "context of factor analysis". The psychological literature on factor analysis is historically strongly positivist, and factor models were for most of the time rather descriptive or data-reduction focused. Still today, many people don't know the fundamental difference between a factor and a component (from principal component analysis). I can only speculate that the proposed treatment of cross-validation in the paper is related to the latter (and again addresses important issues of stability, replicability, and reliability - but not the essential, hidden assumptions).
Best,
Holger
Holger Steinmetz I agree with you that cross-validation does not address the model's underlying assumptions (and I would never imply it does) and is mostly used for generalizability (or replicability / reliability, as you call it).
I would not say that "the data is dumb" however. The data is as dumb as the experimental design is. And you can perfectly evaluate causality with design such as stratified randomized trials. We would not be able to evalute the causational effect of medical treatments otherwise.
But we can always agree that no statistical method will tell you everything about everything...
Hi Michael,
99% agree, but a "design" (such as randomization) is only some kind of practical setup to make underlying assumptions more plausible. In an RCT, these assumptions are that the treatment is independent from ALL other causes of the outcome (N. Cartwright strongly criticized, see the nice video on youtube: https://www.youtube.com/watch?v=fuvXWnTl6_s ). Again, this is an untestable causal assumption which would not be validated when a successful experiment is successful replicated. That is, even in these cases, the data is just a mean difference between two groups from which nothing follows. It is the plausibility of the correctness of the underlying model which enriches this difference.
Best,
Holger
Nothing is ever completely independent; there are always known and unknown confounds that we ignore or don't take into account because we don't know they exist. However, if a stratified randomized experiment is replicated in a different environment (which therefore has mostly different confounds), then it supports the idea that the treatment can be considered mostly independent of these ignored and unknown confounds. So this provides more validation of the (treatment) causal assumption (that it treats), IMHO.
Hi Michael, yes, you're addressing a critical issue that also troubles me. But there are two things to distinguish:
a) the difference between a relationship on the population level and in the sample
b) the difference between a causal relationship and an associational relationship (for whatever reasons).
When I refer to the RCT as creating independence, I refer to the causal relationship at the population level (it is always vital to discuss causality at that level). That is, randomization interrupts potential effects of hidden confounders on the treatment. I think it is illogical to say that this is impossible; it's a probabilistic implication. At the sample level (especially with low N), however, it is very plausible to have a correlation between the treatment and the confounder (even if the assumption of exogeneity, i.e., a zero effect of the confounder on the treatment, holds in the population). That's why experiments don't deserve the status they are given (see the criticism by Cartwright in the video I posted).
If you compare the RCT between two environments, it does not matter whether the confounds differ - the only thing that matters is that in both cases the interruption of confounder effects works in the population and the treatment becomes independent in the sample. Otherwise, you get a bias in one environment (or both) and you are comparing apples with oranges. Of course, the degree of bias depends on the strength of the dependence. Anyway, you may then end up interpreting the environmental differences as revealing a theoretically interesting, causally related context effect, where in fact the true treatment effect is equal but the two cases differ in their bias. That is comparable to comparing any type of model (SEM or factor model) across contexts (also across time) when the only thing that varies is the degree of misspecification and bias.
Your statement seems to reflect a bit of a resignation à la "assumptions never hold - therefore anything goes". I don't think that is true (you don't know that), nor is it helpful (rather, it gives rise to sloppy science), nor is it good advice. The only thing we can do is to test and improve.
Best,
Holger
Holger Steinmetz I think you completely misunderstood what I wrote if you interpret it as saying "assumptions never hold - therefore anything goes". It is also terribly unconstructive to characterize it as bad advice and as supporting sloppy science (and not worth following up with any more messages other than this one, for other readers).
When I was talking about confounders, I meant experiment-level, sample-specific confounders (confounders due to the particular settings of the experiment which affect all subjects). This is why conducting a randomized experiment in a different environment (e.g., a different hospital, etc.) strengthens the validation of the causal assumption, with the treatment effect less likely to be caused by unknown local experimental-setup confounds.
Contrary to what you are saying, I urge anyone to invest resources in any kind of reproducibility approach. This may be as straightforward as experiment replication and adding a cross-validation step to the analysis. Whatever your reason to deem reproducibility unhelpful in general, I would avoid this kind of statement, which goes along the lines of what you were reproaching me for to start with (bad advice leading to sloppy science).
Performing both EFA & CFA on the same data set may not yield any insightful results. EFA helps in exploring the factor structure, whereas, CFA confirms the already explored structure. You can use a different dataset for CFA if your research objectives demand.
Dear Michael Dayan, I apologize if I misinterpreted your statement. On the other hand, I do not know what else would follow from your statement "Nothing is ever completely independent, there are always known and unknown confounds that we ignore or don't take into account". If you think that exogeneity or exclusion restrictions (i.e., fixed-to-zero effects) are always false, you reject the idea of model testability (as there is nothing that can be tested) and you reject the idea of getting unbiased effects even in "gold standard" situations like RCTs. I am sure this is not what you really think - but it follows from your statement. The reason for my reaction is that this statement is the No. 1 excuse for explaining away a model misfit (often associated with the wrongly applied George Box quotation that "all models are wrong"). Hence, sorry if I accidentally put you in that drawer.
When you regard a successful replication of an experiment as supporting the inherent causal claim, you get the logical sequence wrong: to be able to compare the estimates, you have to assume *as a precondition* the correctness of the inherent causal claims (i.e., exogeneity) in both contexts. You cannot turn that around and conclude from the equality/similarity of the estimates that the causal claims are correct, as the estimates may be biased in both contexts. This is why configural invariance (an equal and fitting structure in two contexts) is vital prior to comparing the estimates in question. I am starting to repeat myself: nothing (causally) follows from data. Causation does not follow from correlation, even if you repeat the sampling process and calculate the estimate again. The only chance to get it right is to challenge the model, enlarge it, and set up new barriers which it has to pass.
Finally, with regard to my stance towards replications and cross-validations, YOU (repeatedly) misinterpret ME. I say it for the third time: replication and cross-validation are *very* useful for a ton of reasons and should be done much more often - but they are insufficient for supporting the proposed causal structure.
I am sorry that I seem to have made you angry. That is and was not my intention. I cannot shake the impression that we are talking about different issues.
With best regards
Holger
I believe that this question was posed around 2014. Indeed, contemporary strategies for constructing and validating scores on our self-report instruments make such a query a non-issue.
-Psychological Assessment (2019), 31(12)--has a Special Issue section for those who are interested in several of the modern "methodological and statistical" strategies.
A common practice for most researchers that should be reviewed and discontinued: performing both EFA and CFA on the same data set may not yield any new results. You can use a different dataset for CFA if your research objectives demand it.
When we do CFA, some items might be removed from the constructs to improve validity, even though they were considered part of a component derived from the EFA.
Some comments here simply say that it is not possible or may not be useful to combine EFA and CFA, without any justification. I'm sorry, but I don't think that is any more helpful to the questioner than the original statement in the review he got. What about: Wismeijer, A. A. J. (2012). Dimensionality analysis of the Thought Suppression Inventory: Combining EFA, MSA, and CFA. Journal of Psychopathology and Behavioral Assessment, 34, 116-125. https://doi.org/10.1007/s10862-011-9246-5?
Chapter AYNI ÖRNEKLEME AÇIMLAYICI VE DOĞRULAYICI FAKTÖR ANALİZİ UYGU...
Holger Steinmetz, thanks for the helpful comments. If I understand correctly, i) dividing the sample or ii) using a totally different sample to do an EFA and then a CFA does not ensure the validation of the construct. Thus, using the same sample (EFA-CFA) would not be a problem, especially if the structure identified with the EFA is confirmed in the CFA. So the important thing would be identifying a useful and consistent relationship to the structure of the previously validated psychological construct. Could you recommend literature that supports the non-necessity of dividing the sample for EFA-CFA? (There is a lot of literature that recommends dividing the sample and little or nothing that says that not dividing it is unproblematic.) Thanks!
Claudia,
there is no literature about this, as SEM has been dominated by statisticians and statistically oriented folks for decades. Any issue of "validation" goes beyond data and statistics, however. If you have to do an EFA as a first step, fine. But then advance your approach by adding theory and meaning. That is, take your explored factor, interpret its meaning and ontological status based on the factor loadings, and then ask yourself what WOULD have to be true IF this factor represents something meaningful in the world (beyond mere data reduction) and represents the very attribute you think it represents. Then advance the model. Simply repeating the factor analysis won't do the job. And sure, replication is extremely useful for sampling issues (capitalization on chance) but not with regard to "validity".
I am sometimes puzzled why this is so controversial. You would never think that simply giving a questionnaire to a person, getting some answers, and repeating the exercise would ensure the validity of the responses. This is reliability - not validity. In causal inference (and factor modeling IS causal inference) the same holds true: cross-validation ensures reliability (do I always get the same result across samples?), while bringing theory in (before or after exploration) *supports* (not ensures) validity.
This paper is a fantastic blueprint and worth more than the entire literature on psychometrics:
Antonakis, J., & House, R. J. (2014). Instrumental leadership: Measurement and extension of transformational–transactional leadership theory. The Leadership Quarterly, 25(4), 746-771.
They present a multi-step study beginning from item generation up to causal tests.
Best,
Holger
Thanks very much Holger Steinmetz ,
The paper looks good.
I ask about the sample because I'm working with the argument-based approach to validation, with the aim of collecting evidence about the interpretation of the instruments' results in the situation of use that I am researching.
I carried out different steps for different aims: finding evidence based on content, response processes, internal structure, relations with other variables, and consequences of testing. In the third step (internal structure) I did a factor analysis and looked at the internal consistency (alpha).
Regarding factor analysis, I ran a parallel analysis, then a confirmatory analysis (because the literature is consistent about the possible dimensions and their meanings). Last, I tried a bifactor analysis, because it was reasonable to think of a general factor and specific ones. The results were very satisfactory and coherent with theory.
My doubt is whether I did well to run all the factor analyses (parallel, confirmatory, and bifactor) with the same sample (n = 798), or whether I should have divided it?
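For concreteness, a rough sketch of the three analyses described above (the nine items i1-i9 and the three-factor structure are placeholders, not the actual instrument):

library(psych)
library(lavaan)
fa.parallel(items)                                     # step 1: parallel analysis

cfa.model <- ' F1 =~ i1 + i2 + i3
               F2 =~ i4 + i5 + i6
               F3 =~ i7 + i8 + i9 '
fit.cfa <- cfa(cfa.model, data = items)                # step 2: theory-based CFA

bifac <- ' G  =~ i1 + i2 + i3 + i4 + i5 + i6 + i7 + i8 + i9
           F1 =~ i1 + i2 + i3
           F2 =~ i4 + i5 + i6
           F3 =~ i7 + i8 + i9 '
fit.bi <- cfa(bifac, data = items,
              std.lv = TRUE, orthogonal = TRUE)        # step 3: bifactor CFA
fitMeasures(fit.bi, c("chisq", "df", "cfi", "rmsea"))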
Hi Claudia,
OK, the EFA-CFA succession was not optimal, but it does not matter much whether you do it in the same sample or in two halves. Sample splits make sense in more complex, non-linear approaches in which it is likely that you fit the noise in the data. The data basis of (usually linear) factor models is... well, linear (covariances/correlations), which is not so sensitive to noise.
Best,
Holger
Hi again Holger Steinmetz, what do you mean by saying that the EFA-CFA succession was not optimal?
Hi Claudia,
sorry, I misinterpreted your text and thought you had done that. My fault.
Best
Holger
Hi Holger Steinmetz.
Just a follow up question for clarity.
Would it make sense then to do an exploratory factor analysis, get, say, 24 factors as per the factor loadings, and then use those 24 factors from the EFA in a CFA to confirm a theory, which can then be validated using statistical models like SEM?
Hello Bernard,
this thread should be informative:
https://www.researchgate.net/post/Can_we_do_exploratory_and_confirmatory_factor_analysis_in_the_same_data_set2#view=5486170ad4c118047c8b477f
In a nutshell, a simple repetition of a model (no matter whether a factor model or any other model) in the same data set (by cross-validation) or in a new data set provides *some sort* of support (against sampling error, capitalization on chance, technical errors, data handling errors) but not with regard to its causal structure. If you have a correlation between x and y and you conclude (= your explored model) that x causes y, and you repeat the process, draw a new sample, and again find the correlation, this does not fully support your model, as y could be the cause of x or both could result from the same hidden confounder. It is better to enlarge the model and to bring in some new variables with new testable implications.
Grüße
Holger
No need I think. If you are clear about your model and indicators then CFA is enough.
Thanks so much Holger Steinmetz and Muhammed Ashraful Alam. I followed the trend of the previous conversation. At this level, I do not clearly know the factors that influence the situation, so my reasoning is that I do an EFA first to explore and establish the factors.
As Holger Steinmetz indicated earlier, I may perhaps exclude some of the factors from the EFA when doing the CFA, and then proceed with SEM and regress the explanatory variables, perhaps including another variable at the regression stage to avoid the issue of using models on the same set of data. The purpose of regressing is to ascertain how each of the explanatory variables influences the dependent variable.
I do not know if this makes sense.
Scale validation requires testing the psychometric properties of an operational construct. Earlier this used to be done by checking correlations in multiple steps, such as item-to-item correlation, item-to-dimension correlation, and item-to-total correlation. Nowadays we primarily use EFA to explore the underlying factors of a theory/concept. These factors are then confirmed through CFA. It is advisable to use two different data sets (we can also split one data set into two) for EFA and CFA, because if the same set is used, the model is already fitted to those data, so there is no use in doing it. When you apply the factors that emerged from the EFA to another data set for CFA, it will give a valid result.
Regards
hi,
the enclosed article includes one example.
Article The Mediating Effect of Study Approaches between Perceptions...
Well, in my case it is the opposite: I conducted an EFA and followed those results with a repeated-measures ANOVA, but my reviewer says that I should confirm my EFA with a CFA.
João Teixeira Thank you for the comment. So do you mean that the reviewer suggests a different data set?
Technically, I am not sure how you can "confirm" your EFA with a CFA, but aside from that I think you can convince the reviewer by running a CFA and reporting adequate goodness of fit.
Junaid Ahmad The CFA would be conducted in the same sample. David L Morgan thank you for your suggestion. I was thinking of trying to argue that it was an exploratory study, hence the EFA, and suggesting that in a follow-up study a CFA could be conducted with a new sample, but perhaps it's simpler to just conduct a CFA now with this sample.
David L Morgan
You are right; however, I think, as he mentioned, he used the EFA for his exploratory study, so the reviewer suggested doing a CFA for the validity and reliability of the items and, as you also mentioned, for "goodness of fit" - so it is not to "confirm" the EFA with a CFA.
Hi,
If exploratory analysis does not mean principal component analysis, a CFA performed on the same data should tautologically confirm the results of the EFA. This is because both analyses are based on the common factor model and assume that causality flows from the construct to the indicators. So a model generated from a reflective EFA would be confirmed by a reflective CFA with the same estimation method.
An article related to this question is below.
Chapter AYNI ÖRNEKLEME AÇIMLAYICI VE DOĞRULAYICI FAKTÖR ANALİZİ UYGU...
If there is a hypothesis that a relationship between the observed variables and their underlying latent construct(s) exists, CFA is adequate. If the aim is to determine the possible underlying factor structure of a set of observed variables, then EFA should be preferred. Validating the model from the EFA also requires a new sample.
I hope the below link may give you an answer.
https://psycnet.apa.org/record/2017-56618-001
Thanks.
This is a very good question, since many researchers mistakenly believe that they can use the same dataset to conduct EFA and CFA. In fact, you need to split the dataset randomly into two sub-datasets: conduct EFA on the first and validate the model on the second one. Conducting both EFA and CFA on the same dataset is just confirming the data rather than the model. The objective is to ensure that the model obtained from EFA holds in other samples. FYI, it is rarely the case that a model obtained from EFA has a good model fit in CFA, particularly in the presence of many items with high cross-loadings. Remember that EFA is data driven and consequently does not constitute stable measurement. As a measurement specialist, I highly recommend fitting competing CFA models to the data, consistent with the intended interpretations and uses of scores. This is collecting validity evidence based on the internal structure of the measure (American Educational Research Association et al., 2014).
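A minimal lavaan sketch of what fitting such competing CFA models could look like (the one- vs. two-factor structures and item names are illustrative only):

library(lavaan)
m1 <- ' G  =~ y1 + y2 + y3 + y4 + y5 + y6 '   # competing model 1: one general factor
m2 <- ' F1 =~ y1 + y2 + y3
        F2 =~ y4 + y5 + y6 '                  # competing model 2: two correlated factors
fit1 <- cfa(m1, data = dat)
fit2 <- cfa(m2, data = dat)
anova(fit1, fit2)                             # compare the (nested) models
fitMeasures(fit2, c("cfi", "rmsea", "srmr"))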
Though it is ideal to use distinct samples for conducting EFA and CFA, I believe that the logic proposed by Holger Steinmetz should be considered: the correctness of the model - in terms of a) whether the factors actually represent an existing entity (or are nonsense) and b) whether the supposed causal effects (i.e., factor loadings) are correctly specified.
Conducting EFA and CFA on the same dataset does not serve any purpose. EFA is conducted to extract factors from a dataset for the first time, whereas CFA is conducted to validate the extracted factors on a different dataset.
Please read this thesis, a Monte Carlo study evaluating the split-data strategy in factor analysis: Evaluation of the split-data strategy in factor analysis | IDEALS (illinois.edu) https://hdl.handle.net/2142/116043
Hello Sagie,
thanks. I copied the relevant part from the abstract:
"Results show that the split-data strategy is less effective than the whole-sample strategy in evaluating the number of factors and cross-loadings in all simulation conditions. Using the split-data strategy is only acceptable, though not necessary, under conditions with large samples (greater than 1,000 for the investigated models) and good model quality (i.e., large primary loadings, no cross-loading, and small factor correlations)."
However, still better than the whole-sample strategy (simply repeating the EFA model with a CFA) would be to *enlarge* the model with validation criteria (variables that should affect the explored latents or be affected by them) or "competition criteria" (established variables that are similar to the explored latents).
The first would challenge the model and provide a more meaningful test of the explored structure - the second would deliver information on the extent of similarity or even identity between the new latents and established constructs.
It is really funny how often I see "validation studies" that present indicators you have already seen in other contexts (where they were supposed to measure totally different things) :)
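In lavaan, such an enlarged model might look like the following sketch (the factor F, items x1-x4, and the external variable 'criterion' are placeholders):

library(lavaan)
enlarged <- '
  F =~ x1 + x2 + x3 + x4
  criterion ~ F             # effect the factor should have if it is what we think it is
'
fit <- sem(enlarged, data = dat)
summary(fit, fit.measures = TRUE)   # misfit here challenges the explored structure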
Best,
Holger