These two methods may appear similar to the user, but aren't they quite different? What would you tell a person who is considering using them? Thank you for your expert advice.
In factor analysis you normally already have a model, and the objective is to predict the observed variables from theoretical latent factors, whereas in principal component analysis the objective is to extract linear composites of the observed variables.
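A minimal sketch of that contrast, in Python with NumPy (my choice of tool, not something the thread uses). The loadings, sample size, and single-factor structure are made-up assumptions for illustration only. The first half generates observed variables from a latent factor, which is the FA view; the second half builds a component as a weighted sum of the observed variables, which is the PCA view.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
loadings = np.array([0.8, 0.7, 0.6])       # hypothetical factor loadings

# FA view: observed variables are generated FROM a latent factor plus unique noise.
f = rng.standard_normal(n)                 # latent factor with variance 1
noise_sd = np.sqrt(1.0 - loadings**2)      # unique (error) standard deviations
X = np.outer(f, loadings) + rng.standard_normal((n, 3)) * noise_sd

# Under this model, off-diagonal correlations equal products of loadings.
implied = np.outer(loadings, loadings)
observed = np.corrcoef(X, rowvar=False)

# PCA view: a component is a weighted sum (linear composite) OF the observed variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each column
eigenvalues, eigenvectors = np.linalg.eigh(observed)
weights = eigenvectors[:, -1]              # eigenvector with the largest eigenvalue
component = Z @ weights

print("model-implied correlations:\n", np.round(implied, 2))
print("observed correlations:\n", np.round(observed, 2))
print("PCA weights for the first component:", np.round(weights, 2))
```

The direction of the arrow is the point: in the first half the factor produces the data, in the second half the data produce the component.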
If you think that "there may be some underlying theoretical relationship", but you are unsure of it, would you still choose Factor Analysis or PCA?
Say you suspect that certain cancer rates are somehow associated with air pollution. Could you use an FA model where you "throw in" all variables, with the goal of seeing whether the cancer variables appear in the same factors as air pollution?
I would say that FA is more about determining the underlying variables that explain why two other variables are correlated, while PCA is more about the distribution of individuals as explained by the principal components (i.e. by correlation between factors).
I would say that the choice depends on what you are most interested in: the factors or the individuals.
In the end, I found that the PCA in the FactoMineR package (in R) is the best compromise for multivariate analysis.
For example, my data are mainly spectroscopic, so I always check the physical meaning of the extracted components (factors). (Option in SPSS: check Scores, Save as variables.)
In other cases, look at the percentage of explained variance; higher is (sometimes) better. For example, when a high kappa is applied for promax rotation in the case of fluorescence emission, the spectral components become "over-fitted" and gain a high percentage.
Also check whether the qualitative meaning changes when you change the method. To date, only once or twice have I got a different grouping of variables (some HPLC data) when applying PCA and FA (with all possible options).
Principal components analysis is only a data reduction method. It was common many decades ago when computers were slow. I know it is the default method in many statistical applications but factor analysis seems to be superior.
You can take a look at the following article, which provides more information about this technique:
Costello, A. B., & Osborne, J. W. (2005). Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most From Your Analysis. Practical Assessment, Research & Evaluation, 10(7). Retrieved from http://pareonline.net/getvn.asp?v=10&n=7
If you need further guidance, don't hesitate to contact me.
My main interest in factor analysis is to study relationships between several types of diseases in the population, and how such variables are related to other variables from different fields. Then I aim to output factor scores and use those in a cluster analysis. Can this also be done with PCA?
Factor analysis (FA) is a group of statistical methods used to understand and simplify patterns of relationships underlying measured variables (Beavers, Lounsbury, Richards, Huck, Skolits, & Esquivel, 2013; Fabrigar, Wegener, MacCallum, & Strahan, 1999; Schmitt, 2011). Factor analysis is a concept that includes both exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) (Jennrich & Bentler, 2011).
CFA tests whether a known factor model can predict a set of observed data (DeCoster, 1998). Researchers use CFA to verify or confirm hypotheses or theory (Ruscio & Roche, 2012; Schmitt, 2011), establish the validity of the factor model, compare two models using the same data, test the significance of factor loadings, test relationships between factor loadings, test for correlation or lack of correlation of factors, and assess convergent and discriminant validity of measures (DeCoster, 1998).
EFA determines the number of common factors that influence a set of measures and the strength of the relationship between each common factor and the corresponding measure (DeCoster, 1998). Researchers use EFA to identify the nature of constructs that underlie responses given in a questionnaire, determine sets of items that interconnect, demonstrate the depth and breadth of measurement scales, classify the most important features of a group of items, and generate factor scores that represent the underlying constructs (DeCoster, 1998). Because EFA is a multivariate statistical approach, it is appropriate for reducing the number of variables, examining relationships between categories, and evaluating the construct validity of a measurement scale (Williams et al., 2010).
Exploratory factor analysis involves a series of steps. The first is the planning phase, in which the researcher selects the sample size and then, after collecting the data, creates a correlation matrix and tests for adequacy to determine whether the data are suitable for EFA. The second step is to extract factors. The third step is to determine the number of factors to retain. The fourth step is factor rotation. The fifth step is to interpret the factor structure.
Principal component analysis (PCA) is a method of factor extraction (the second step mentioned above). Researchers use PCA when they want to reduce the number of variables while retaining as much of the original variance as possible (Conway & Huffcutt, 2003).
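As a rough illustration of the extraction and retention steps, here is a small Python/NumPy sketch on simulated data. The data, the variable count, and the eigenvalue-greater-than-one rule are my assumptions; the rule is only one common heuristic for retention, and the post above does not prescribe any particular one.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: 300 cases, 6 variables; the first three share one source of variation.
shared = rng.standard_normal((300, 1))
X = rng.standard_normal((300, 6))
X[:, :3] += 2.0 * shared

# Step 1 (part): the correlation matrix. Step 2: extraction by principal components.
R = np.corrcoef(X, rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)[::-1]      # sorted largest first

# Step 3: how much variance each component retains, and a simple retention heuristic.
explained = eigenvalues / eigenvalues.sum()    # proportion of total variance
cumulative = np.cumsum(explained)
n_kaiser = int(np.sum(eigenvalues > 1.0))      # eigenvalue-greater-than-one rule

for k, (ev, pct, cum) in enumerate(zip(eigenvalues, explained, cumulative), start=1):
    print(f"component {k}: eigenvalue {ev:.2f}, {pct:6.1%} of variance, {cum:6.1%} cumulative")
print("components with eigenvalue > 1:", n_kaiser)
```

With pure noise variables in the mix, sample eigenvalues scatter around 1, so the count from this heuristic can overstate the number of meaningful components. A scree plot or parallel analysis is a reasonable cross-check.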
REFERENCES
Beavers, A. S., Lounsbury, J. W., Richards, J. K., Huck, S. W., Skolits, G. J., & Esquivel, S. L. (2013). Practical considerations for using exploratory factor analysis in educational research. Practical Assessment, Research & Evaluation, 18(6), 1-13. Retrieved from http://www.pareonline.net/pdf/v18n6.pdf
Conway, J. M., & Huffcutt, A. I. (2003). A review and evaluation of exploratory factor analysis practices in organizational research. Organizational Research Methods, 6, 147-168. doi:10.1177/1094428103251541
DeCoster, J. (1998). Overview of Factor Analysis. Retrieved from http://www.stat-help.com/factor.pdf
Fabrigar, L. R., Wegener, D. T., MacCallum, R. C., & Strahan, E. J. (1999). Evaluating the use of exploratory factor analysis in psychological research. Psychological Methods, 4, 272-299. doi:1082-989X/99/S3.00
Jennrich, R. I., & Bentler, P. M. (2011). Exploratory bi-factor analysis. Psychometrika, 76, 537-549. doi:10.1007/s11336-011-9218-4
Ruscio, J., & Roche, B. (2012). Determining the number of factors to retain in exploratory factor analysis using comparison data of known factorial structure. Psychological Assessment, 24(2), 282-292. doi:10.1037/a0025697
Schmitt, T. A. (2011). Current methodological considerations in exploratory and confirmatory factor analysis. Journal of Psychoeducational Assessment, 29(4), 304-321. doi:10.1177/0734282911406653
They are actually different techniques, based on different assumptions and used for different objectives. PCA is only a geometric or statistical transformation of the data in order to get new synthetic variables, while FA supposes a model with some assumptions about how the data were generated. I can provide you with the link to a publication where we compare both techniques in the financial context. I hope this helps.
Article Estimation of the underlying structure of systematic risk wi...
Just looking at the number of people who have viewed the responses to my question suggests that this topic is still not taught well (or not understood well).
PCA and FA are related. PCA assumes the correlation between identical items is 1 (so the major diagonal of the R matrix contains 1's), and FA assumes that the correlation between identical items is less than 1, and the estimated correlation is used in the diagonal. In studies involving novel applications, PCA solutions are often used as a rough cut, and then EFA or CFA is used to fine-tune the solution. Of course, the more complicated the analysis and the more assumptions required, the more important it is to conduct a replication analysis using an independent random sample (or some other form of validity analysis).
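The difference in the diagonal can be shown with a few lines of Python/NumPy. This is only a sketch: the correlation matrix holds made-up values, and squared multiple correlations are just one common starting estimate for the diagonal (my assumption; the post does not say which estimate to use).

```python
import numpy as np

# Hypothetical correlation matrix for four observed variables (illustrative values only).
R = np.array([
    [1.0, 0.6, 0.5, 0.4],
    [0.6, 1.0, 0.5, 0.4],
    [0.5, 0.5, 1.0, 0.4],
    [0.4, 0.4, 0.4, 1.0],
])

# PCA: decompose R with 1's on the diagonal, so all variance is analyzed.
pca_eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Common factor analysis (principal-axis style): replace the diagonal with
# estimates below 1; squared multiple correlations are used here as the estimate.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
fa_eigenvalues = np.linalg.eigvalsh(R_reduced)[::-1]

print("diagonal used by PCA:", np.diag(R))
print("diagonal used by FA :", np.round(smc, 3))
print("PCA eigenvalues:", np.round(pca_eigenvalues, 3))
print("FA eigenvalues :", np.round(fa_eigenvalues, 3))
```

The PCA eigenvalues sum to the number of variables, while the eigenvalues of the reduced matrix sum to the total estimated shared variance, which is smaller. That is why PCA components absorb unique and error variance and common factors do not.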
The following "pay for view" chapter has been cited more than 1,100 times, last I checked (the book is in many academic libraries, I imagine available "in-house" at your university campus, certainly available via inter-library loan).
Bryant, F.B., & Yarnold, P.R. Principal components, and exploratory and confirmatory factor analysis. In: L.G. Grimm and P.R. Yarnold (Eds.), Reading and Understanding Multivariate Statistics. Washington, DC: APA Books, 1995, 99-136.
However, canned packages can miss the forest and the trees.
The following "open access" article shows how to constrain the PCA solution to identify models that meet specific a priori quality criteria, using mathematical programming. It also shows how to spot paradoxical confounding in results for linear models. The variables identified using this procedure were used to make the most accurate long-term predictions of temperature and precipitation anomalies ever published.
It takes an insightful mind to recognize insightful discussion, my dear friend. And, it takes a kind colleague to say kind things. Not to be remiss, your clustering studies are a pleasure to read. In particular (for me), studies your laboratory produces regarding health anomalies associated with environmental factors.
I have been encountering some interesting challenges when using factor analysis. If, as a first step, we run a factor analysis and then output factor scores from the first few factors, how do we use the factor scores in a cluster analysis in the next step?
Specifically: Let's say that we want to use the factor scores from Factor 1. The loadings are large for some variables and small for other variables; some are positive and some are negative. What will we "see" contained in the factor scores for Factor 1? If we use the factor scores in a cluster analysis that can identify High clusters and also Low clusters, what exactly do such clusters mean, as related to the original Factor 1 variables?
Has anyone here done such a cluster analysis? Please share with us your thoughts.
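One way to make the question concrete is a toy example in Python/NumPy. Everything here is an assumption for illustration: the data are simulated, the scores are first-principal-component scores rather than scores from a fitted factor model, and the clustering is a bare-bones two-cluster k-means on a single score.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Hypothetical data: a latent trait raises variables A and B and lowers variable C.
trait = rng.standard_normal(n)
X = np.column_stack([
    trait + 0.5 * rng.standard_normal(n),     # A: positive loading expected
    trait + 0.5 * rng.standard_normal(n),     # B: positive loading expected
    -trait + 0.5 * rng.standard_normal(n),    # C: negative loading expected
])

Z = (X - X.mean(axis=0)) / X.std(axis=0)      # standardized variables
eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(X, rowvar=False))
w = eigenvectors[:, -1]
if w[0] < 0:          # eigenvector sign is arbitrary; fix it so A gets a positive weight
    w = -w
scores = Z @ w        # each score is a weighted sum of the standardized variables

# Minimal 1-D k-means with two clusters ("low" = 0, "high" = 1) on the scores.
centers = np.array([scores.min(), scores.max()])
for _ in range(50):
    labels = (np.abs(scores - centers[0]) > np.abs(scores - centers[1])).astype(int)
    centers = np.array([scores[labels == 0].mean(), scores[labels == 1].mean()])

high = Z[labels == 1].mean(axis=0)            # mean standardized A, B, C in the high cluster
low = Z[labels == 0].mean(axis=0)             # same for the low cluster
print("weights (A, B, C):", np.round(w, 2))
print("high-cluster means:", np.round(high, 2))
print("low-cluster means :", np.round(low, 2))
```

Read this way, a High cluster on the first score holds cases that are above average on the positively weighted variables and below average on the negatively weighted ones, and a Low cluster is the mirror image. Variables with small weights contribute little, so the clusters say little about them.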
Factors identify new composite variables. The conceptual meaning of the identified variables is deduced by interpreting the new variables (factor scores).
The pay-for-view chapter I cited discusses how this interpretation is deduced based on internal quantitative aspects of the variables defining a factor score (e.g., the contribution of each variable in the eigenvector to the corresponding eigenvalue). And, the chapter also discusses how to use PCA solutions as a starting-point for EFA and CFA--used to help clarify factor identification and interpretation.
The open-access article I cited discusses how to identify factors that satisfy specific a priori criteria. In that article Rob and I used mathematical programming to identify factors. We were looking for new atmospheric currents to improve upon the factors (i.e., currents) previously identified using PCA by the National Weather Service. One of the criteria was that variables in each factor had to be contiguous in space (spatial contiguity). Then we compared the factors that we found to the sloppy, strongly confounded PCA-based factors.
In all FA-based studies, finding the new variables is the starting point--the new variables (factor scores) need to be validated using another method, such as discriminant analysis (when constituent groups in the sample are known) or cluster analysis (when constituent groups in the sample are unknown). As in all research, this follow-up work can be exploratory or confirmatory.
Thank you for your detailed response to my question, Paul. While FA is widely taught by Psychology Departments, it is less often found in statistics programs.
Neither is taught in entomology departments. Of course, there aren't many options when your crowning achievement is 4 replicates.
There was a class at UC Davis in the late 80's in multivariate analysis that was required as part of getting a Minor in that subject. I know we went over PCA, but maybe not FA. I don't remember what textbook we used. The next encounter was about 5 years later, when I had a large data set for my Ph.D. I spent many happy hours stuffing my data through most of the procedures in the SAS-Stat user manual.