What are the general suggestions regarding dealing with cross loadings in exploratory factor analysis? Do I have to eliminate those items that load above 0.3 with more than 1 factor?
There is some controversy about this. Normally, researchers use 0.50 as the threshold. However, others argue that what matters is that an item's loading on its main factor is higher than its loadings on the other factors (they do not provide any threshold). Others also indicate that there should be, at least, a difference of 0.20 between loadings. For example, if an item loads 0.80 on one factor, its highest loading on any other factor should be at most 0.60.
I have seen exactly what you mention about the 0.20 difference in some papers. In my case, I used a 0.4 criterion for suppression purposes, but I still have some cross-loadings (with less than a 0.2 difference). The general purpose of EFA is to retain the items that load highest on one factor, but do I have to eliminate the ones with cross-loadings in order to get independent (uncorrelated) factors?
It is up to you whether you use a criterion of 0.4 or 0.5, but you have to give a proper reference to support it. Cross-loadings meeting the criterion can be used for further analysis.
What if I use the 0.5 criterion and still see some cross-loadings that are significant? How should I deal with them: eliminate them or not? After suppressing, I tried eliminating some items (those that still load on other factors with a difference of less than 0.2), and it seems quite reasonable; the model performance has also improved. Afterwards I plan to run OLS, and I need independent factors.
Which software are you using? Recent guidance in PLS-SEM recommends establishing discriminant validity via a newer approach, the heterotrait-monotrait ratio of correlations (HTMT), which has been shown to be more reliable than the Fornell-Larcker criterion and cross-loading examination. According to this guidance, cross-loadings should only be checked when HTMT fails, in order to find the problematic items between constructs.
Read this paper: https://link.springer.com/article/10.1007/s11747-014-0403-8
SmartPLS computes the HTMT matrix directly, but I think you should be able to compute it manually using the formula (which is based on the correlations among the items of the constructs).
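If your software does not report HTMT, a rough manual computation is not difficult. Below is a minimal Python sketch; the DataFrame `data`, its column names, and the item-to-construct assignment are placeholders, not anyone's real data:

```python
# Minimal sketch of a manual HTMT computation: mean heterotrait-heteromethod
# correlation divided by the geometric mean of the mean monotrait-heteromethod
# correlations. Data, column names, and construct blocks are placeholders.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(200, 6)),
                    columns=["q1", "q2", "q3", "i1", "i2", "i3"])

def htmt(df, items_a, items_b):
    corr = df[items_a + items_b].corr().abs()
    # between-construct (heterotrait-heteromethod) correlations
    hetero = corr.loc[items_a, items_b].values.mean()
    # within-construct (monotrait-heteromethod) correlations, off-diagonal only
    def mean_within(items):
        c = corr.loc[items, items].values
        return c[~np.eye(len(items), dtype=bool)].mean()
    return hetero / np.sqrt(mean_within(items_a) * mean_within(items_b))

print(htmt(data, ["q1", "q2", "q3"], ["i1", "i2", "i3"]))  # compare with 0.85/0.90 cutoffs
```

The cutoffs usually discussed are 0.85 (conservative) or 0.90, as mentioned in the paper linked above.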
In general, we eliminate items with cross-loadings (i.e., items with loadings above 0.3 on more than one factor). But before eliminating these items, you can try several rotations.
In practice, I would look at the item statement. A cross-loading indicates that the item measures several factors/concepts. Such an item could also be a source of multicollinearity between the factors, which is not a desirable end product of the analysis, as we are looking for distinct factors. My point is: do not rely solely on the factor loading value or a specific cutoff; also take a look at the content of the item. The item statement could be too general.
I know that there are three types of orthogonal rotations: Varimax, Quartimax, and Equamax. The most widely used is Varimax; however, can you briefly tell me what the difference is between the Quartimax and Equamax rotation methods?
Maybe this helps: http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/multivariate/principal-components-and-factor-analysis/methods-for-orthogonal-rotation/
Thank you for your feedback. I checked the determinant to make sure high multicollinearity does not exist. Then I checked the reliability of the items (Cronbach's alpha), and it is quite high. Moreover, I looked at the corrected item-total correlations. It turned out that two items correlate quite low (less than 0.2) with the scale score of the remaining items, so I excluded them and ran the reliability analysis again; Cronbach's alpha improved. But I still have a few cross-loadings in the factor analysis that bother me, and as suggested, I should check other orthogonal rotations before eliminating the problematic items.
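For readers who want to replicate this kind of item screening outside SPSS, here is a minimal Python sketch of Cronbach's alpha and corrected item-total correlations; the DataFrame `items` and its contents are placeholders:

```python
# Minimal sketch: Cronbach's alpha and corrected item-total correlations.
# The DataFrame `items` contains placeholder responses (one column per item).
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
items = pd.DataFrame(rng.integers(1, 6, size=(150, 8)),
                     columns=[f"item{i}" for i in range(1, 9)])

def cronbach_alpha(df):
    k = df.shape[1]
    return (k / (k - 1)) * (1 - df.var(ddof=1).sum() / df.sum(axis=1).var(ddof=1))

def corrected_item_total(df):
    # correlation of each item with the sum of all *other* items
    return pd.Series({col: df[col].corr(df.drop(columns=col).sum(axis=1))
                      for col in df.columns})

print(round(cronbach_alpha(items), 3))
print(corrected_item_total(items).round(3))  # items below ~.2-.3 are removal candidates
```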
I think that eliminating cross-loading items will not necessarily make your factors orthogonal. I mean, if two constructs are correlated, they may remain correlated even after the problematic items are removed. Do all your factors relate to a single underlying construct? In that case, I would try a Schmid-Leiman transformation and check the loadings of both the general and the specific factors. Additionally, you may want to check confidence intervals for your factor loadings.
Yes, you are right: all the factors relate to the same construct (brand image). What do you mean by "general" and "specific" factors? I have never used the Schmid-Leiman transformation. What are the decision rules? Or can you suggest any material for a quick review?
Can the Schmid-Leiman transformation be used when I have results from a varimax rotation? I guess it needs pattern matrix results for the analysis, or am I wrong?
Davit, I'm attaching Wolff and Preising's paper for a quick and readable introduction to the S-L transformation. As for the actual computation, I don't know what software you're using, but Wolff and Preising present syntax for both SPSS and SAS. You can also do it by hand (I have an Excel file for this, but I don't have access to it now), but I'd suggest you use the free software FACTOR (http://psico.fcep.urv.es/utilitats/factor/). I'm also attaching Baglin's (2014) didactic tutorial about this program.
From a quick look through the first paper, the Schmid-Leiman technique is used to transform an oblique factor analysis solution containing a hierarchy of higher-order factors into an orthogonal solution. I have used varimax orthogonal rotation in principal component analysis. What would you suggest? What do you think about the heterotrait-monotrait ratio of correlations?
Have you tried oblique rotation (e.g., Promax)? That may reveal the multicollinearity via the "Factor Correlation Matrix" (in SPSS output, the last table). I assume that you are analyzing health-related data, so I wonder why you used orthogonal rotation. In my experience, most factors/domains in health sciences are better explained when they are allowed to correlate rather than kept orthogonal (i.e., factor-factor r = 0). The extracted factors are also easier to carry over to CFA whenever the rotation is oblique (a rough sketch follows after this reply).
In addition, very high Cronbach's alpha (>.9, ref: Streiner 2003, Starting at the beginning: an introduction to coefficient alpha and internal consistency) is also indicative of redundant items/factor, so you may need to look at the content of the items.
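To illustrate the suggestion above, here is a rough Python sketch using the factor_analyzer package (if available); the loadings table plays the role of the pattern matrix, and the factor-score correlations approximate SPSS's Factor Correlation Matrix. The data, the number of factors, and the column names are placeholders:

```python
# Rough sketch of an oblique (promax) EFA using the factor_analyzer package.
# The data, the number of factors, and the column names are placeholders.
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(2)
items = pd.DataFrame(rng.normal(size=(300, 12)),
                     columns=[f"item{i}" for i in range(1, 13)])

fa = FactorAnalyzer(n_factors=3, rotation="promax")
fa.fit(items)

# pattern matrix (rotated loadings)
print(pd.DataFrame(fa.loadings_, index=items.columns).round(2))

# factor correlations estimated from the factor scores
scores = fa.transform(items)
print(np.corrcoef(scores, rowvar=False).round(2))  # analogue of SPSS's Factor Correlation Matrix
```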
If I use oblique rotation, then I will have a problem in the linear regression. I need factors that are independent, with no multicollinearity issue, in order to be able to run linear regression. After I extract the factors, the goal is to regress brand likeness, measured on a 0 to 10 scale, on them. Plus, only with orthogonal rotation is it possible to get exact factor scores for the regression analysis. I have checked the correlation matrix and also the determinant, to make sure there is no excessive multicollinearity (no correlations above 0.9).
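A minimal sketch of that workflow (varimax factor scores fed into OLS), again using the factor_analyzer and statsmodels packages; `items`, `likeness`, and the number of factors are placeholder assumptions, not the actual data:

```python
# Sketch: varimax factor scores used as predictors of a 0-10 outcome via OLS.
# `items` and `likeness` are placeholders standing in for the real data.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from factor_analyzer import FactorAnalyzer

rng = np.random.default_rng(3)
items = pd.DataFrame(rng.normal(size=(300, 12)),
                     columns=[f"item{i}" for i in range(1, 13)])
likeness = pd.Series(rng.integers(0, 11, size=300), name="likeness")

fa = FactorAnalyzer(n_factors=3, rotation="varimax")
fa.fit(items)
scores = pd.DataFrame(fa.transform(items), columns=["f1", "f2", "f3"])

model = sm.OLS(likeness, sm.add_constant(scores)).fit()
print(model.summary())
```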
I have now checked oblique and promax rotations. In both scenarios, I do not have excessively high correlations. In any case, varimax also showed no multicollinearity issue.
I have one question. If I have a high multicollinearity issue between my variables (determinant less than 0.00001), should I first get rid of the variables causing it and then use oblique or promax rotations?
Given your explanation, using orthogonal rotation is well justified. In that case, you may need to look at the correlation matrix again (I find it easier to work with the correlation matrix by pasting the SPSS output into MS Excel). I would manually delete items that have substantial correlations with all or almost all other items (e.g., > .3) and run the EFA again. That might solve the cross-loading problem.
If the determinant is less than 0.00001, you have to look for the variables causing the excessive multicollinearity and possibly get rid of some of them.
I made a mistake while reading the correlation matrix determinant, which is actually 2.168E-9 = 0.000000002168 < 0.00001 (so I definitely have a high multicollinearity issue). First, I looked at items with correlations above 0.8 and eliminated them; the determinant still did not exceed the threshold. Then I omitted items with correlations above 0.7, and now my determinant is 0.00002095 > 0.00001. Of the 24 initial items I retained only 17, and now I can run the EFA. What do you think about this? Any comments/suggestions?
However, I would be very cautious about it, since the literature suggests that multicollinearity (e.g., a VIF) between 5 and 10 is already considered high. In factor analysis, it is important not to have high multicollinearity in order to be able to assign items to factors; otherwise the analysis will suffer from a lot of cross-loadings and you get correlated factors.
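For anyone who wants to reproduce the determinant screening described a couple of posts above, here is a rough Python sketch; the DataFrame `items`, the 0.7 cutoff, and the keep-the-first-item rule are illustrative choices only, not a recommendation:

```python
# Rough sketch of the determinant screening: drop one item of every pair whose
# absolute correlation exceeds a cutoff, then recheck the determinant of R.
# `items` and the 0.7 cutoff are placeholders; which item of a pair to keep
# is a substantive decision, not something to automate blindly.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
items = pd.DataFrame(rng.normal(size=(200, 24)),
                     columns=[f"v{i}" for i in range(1, 25)])

def det_of_corr(df):
    return np.linalg.det(df.corr().values)

def drop_highly_correlated(df, cutoff=0.7):
    corr = df.corr().abs()
    cols, to_drop = list(df.columns), set()
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if a not in to_drop and b not in to_drop and corr.loc[a, b] > cutoff:
                to_drop.add(b)  # keep the first item of the pair, drop the second
    return df.drop(columns=sorted(to_drop))

reduced = drop_highly_correlated(items, cutoff=0.7)
print(det_of_corr(items), det_of_corr(reduced))  # aim for a determinant above 0.00001
```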
It seems to be the case that your factors are correlated, and they will remain correlated no matter what you do. If you somehow manage to make them orthogonal, they may not be measuring the same construct anymore. My suggestion of a S-L transformation was to check whether items were more influenced by the general or by the specific factors. It might be the case that you will be able to extract those items that are clearly influenced only by their specific factors and not so much by the general one. Even then, however, you may not be able to achieve orthogonality or, if you do, you'll possibly be measuring only a specific aspect of the original construct. (For example, if you have items measuring anxiety and depression and you submit them to a S-L transformation, you may be left with items only related to physiological hyperarousal in the anxiety-specific factor.)
As Wan has already suggested, consider using SEM for creating a model that includes both the correlation between your factors and any reasonable cross-loadings that you have.
The problem here is that you can have VIF values under 3.3 (no multicollinearity), HTMT values under 0.90 (discriminant validity guaranteed, hence distinct constructs in your model), and the Fornell-Larcker criterion satisfied (again supporting discriminant validity). All these values show you can proceed with your model, and yet the cross-loadings criterion is not met.
For this reason, some researchers tell you not to care about cross-loadings and only explore VIF and HTMT values.
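If you want to inspect VIF values outside SmartPLS, a minimal Python sketch with statsmodels looks like this; `items` is a placeholder DataFrame of indicators:

```python
# Minimal sketch: variance inflation factors for a set of indicators,
# using statsmodels; `items` is a placeholder DataFrame.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
items = pd.DataFrame(rng.normal(size=(200, 10)),
                     columns=[f"v{i}" for i in range(1, 11)])

X = sm.add_constant(items)
vif = pd.Series([variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
                index=items.columns)
print(vif.round(2))  # rules of thumb: concern above 3.3-5, serious above 10
```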
Aurelius arlitha Chandra: Check whether the issue of cross-loading exists for that variable. If so, try removing the variable while checking the "Cronbach's Alpha if Item Deleted" value. Or check the communalities: if one is less than 0.3, remove the item.
Can anyone provide a reference for the idea that when an item loads on more than a single factor (cross-loading), it should be discarded if the difference in loadings is less than .2? I've read it on many statistics fora but would like to have a proper reference. Thank you.
I am doing factor analysis using STATA. After running the command for the "Rotated Component Matrix", there is one variable that shows a factor loading of 0.26. What should I do?
All of the responses above, and others out there on the internet, do not seem to be backed by scientific references. For that reason, this response aims to equip readers with proper knowledge from a book by a guru in statistics, Joseph F. Hair, Jr.
First, it must be noted that the term cross-loading stemmed from the idea that one variable has moderate-size loadings on several factors, all of which are significant, which makes the interpretation job more arduous.
A loading is considered significant (i.e., over a certain threshold) depending on the sample size needed for significance [1], as follows:
Factor loading - Sample size needed for significance
-----------------------------
.30 - 350
.35 - 250
.40 - 200
.45 - 150
.50 - 120
.55 - 100
.60 - 85
.65 - 70
.70 - 60
.75 - 50
-----------------------------
When a variable is found to have more than one significant loading (depending on the sample size), it is termed a cross-loading, which makes it troublesome to label the factors that share the same variable and thus hard to make those factors distinct and representative of separate concepts. The ultimate goal is to reduce the number of significant loadings in each row of the factor matrix (i.e., make each variable associate with only one factor). The first solution is to try different rotation methods to eliminate the cross-loadings and thus define a simpler structure. If the cross-loadings persist, the variable becomes a candidate for deletion. Another approach is to examine each variable's communality to assess whether the variables meet acceptable levels of explanation; all variables with communalities less than .50 are viewed as insufficient.
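A small sketch tying the sample-size table and the communality guideline together: it flags items with more than one significant loading, or with a communality below .50. The loading matrix shown is hypothetical, and the code assumes an orthogonal rotation (so communalities are just row sums of squared loadings):

```python
# Sketch combining the sample-size table above with the communality guideline:
# flag items with more than one significant loading or a communality below .50.
# The loading matrix is hypothetical and an orthogonal rotation is assumed.
import pandas as pd

def min_significant_loading(n):
    # thresholds from the Hair et al. (2009) table above
    table = [(350, .30), (250, .35), (200, .40), (150, .45), (120, .50),
             (100, .55), (85, .60), (70, .65), (60, .70), (50, .75)]
    for size, loading in table:
        if n >= size:
            return loading
    return .75  # below n = 50, be at least this strict

loadings = pd.DataFrame({"F1": [.72, .65, .48, .10],
                         "F2": [.15, .41, .52, .80]},
                        index=["item1", "item2", "item3", "item4"])

cutoff = min_significant_loading(n=150)                 # -> .45
n_significant = (loadings.abs() >= cutoff).sum(axis=1)
communality = (loadings ** 2).sum(axis=1)               # row sums of squared loadings

report = pd.DataFrame({"n_significant": n_significant, "communality": communality})
print(report[(report.n_significant > 1) | (report.communality < .50)])  # item3 is flagged
```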
RESPECIFY THE MODEL IF NEEDED
What if we should not eliminate a variable based on rigid statistics because of the true meaning that the variable carries? Problems include: (1) a variable has no significant loadings; (2) even with a significant loading, a variable's communality is deemed too low; (3) a variable has a cross-loading. In these cases, researchers can take any combination of the following remedies:
+ Ignore the problematic variables and interpret the solution as is, noting that the variables in question are poorly represented in the factor solution.
+ Consider possible deletion, depending on the variable's overall contribution to the research as well as its communality index. If the variable is of minor importance to the study's objective and also has an unacceptable communality value, delete it and derive a new factor solution with those variables omitted.
+ Employ an alternative rotation method: this could be an oblique method if only orthogonal rotation had been used.
+ Decrease/increase the number of factors retained: to see whether a smaller/larger factor structure will solve the problem.
+ Modify the type of factor model used (component versus common factor): to assess whether varying the type of variance considered affects the factor structure.
Note:
No matter which options are chosen, the ultimate objective is to obtain a factor structure with both empirical and conceptual support. As we can see, many tricks can be used to improve the structure, but the ultimate responsibility rests with the researcher and the conceptual foundation underlying the analysis. Indeed, some empirical studies chose to preserve the cross-loadings to support the narrative that a certain variable indeed has effects on several factors [2]. So, ultimately, it's your call whether or not to remove a variable, based on your empirical and conceptual knowledge/experience.
Reference:
[1] Hair, J. F., Black, W. C., Babin, B. J., & Anderson, R. E. (2009). Multivariate Data Analysis (7th ed.). Pearson Prentice Hall.
[2] Le, T. C., & Cheong, F. (2010). Perceptions of risk and risk management in Vietnamese Catfish farming: An empirical study. Aquaculture Economics & Management, 14(4), 282-314. https://doi.org/10.1080/13657305.2010.526019
I find Huy Dang's answer really convincing; I am yet to read Muayyad Ahmad's reference. I have found Huy's response quite helpful, as I have been struggling with the same problem. In fact, on looking at the significance aspect of the factor loadings, I discovered that all the cross-loadings were not significant for the factors they were not intended to measure in the first instance. Don't forget to consider your sample size when coming to this conclusion!
In fact, some rules can be found in the literature, for instance the .40-.30-.20 rule. This rule recommends that satisfactory variables (a) load onto their primary factor above 0.40, (b) load onto alternative factors below 0.30, and (c) demonstrate a difference of at least 0.20 between their primary and alternative factor loadings (see the sketch after the reference below).
Reference: Howard, M. (2015). A Review of Exploratory Factor Analysis (EFA) Decisions and Overview of Current Practices: What We Are Doing and How Can We Improve? International Journal of Human-Computer Interaction. DOI: 10.1080/10447318.2015.1087664
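A compact sketch of how that rule could be checked programmatically; the loading matrix `loadings` is hypothetical:

```python
# Compact sketch of the .40-.30-.20 rule applied to a hypothetical
# rotated loading matrix (items in rows, factors in columns).
import pandas as pd

loadings = pd.DataFrame({"F1": [.62, .45, .38],
                         "F2": [.12, .35, .33]},
                        index=["item1", "item2", "item3"])

def satisfies_40_30_20(row):
    abs_row = row.abs()
    primary = abs_row.max()
    alternatives = abs_row.drop(abs_row.idxmax())
    return (primary >= .40                                  # (a) loads on its primary factor
            and (alternatives < .30).all()                  # (b) stays low on the others
            and (primary - alternatives.max()) >= .20)      # (c) clear separation

print(loadings.apply(satisfies_40_30_20, axis=1))  # only item1 is satisfactory here
```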
A "crossloading" item is an item that loads at .32 or higher on two or more factors. The researcher needs to decide whether a crossloading item should be dropped from the analysis, which may be a good choice if there are several adequate to strong loaders (.50 or better) on each factor. If there are several crossloaders, the items may be poorly written or the a priori factor structure could be flawed.
Reference: Costello & Osborne (2005). Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most From Your Analysis. Practical Assessment, Research & Evaluation, Vol. 10, Article 7.
https://doi.org/10.7275/jyj1-4868
You can check a published article in Nature Sustainability that used the .40-.30-.20 rule.
Sandbrook, C., Fisher, J. A., Holmes, G., Luque-Lora, R., & Keane, A. (2019). The global conservation movement is diverse but not divided. Nature Sustainability, 2(4), 316–323. doi:10.1038/s41893-019-0267-5