I have been studying the relationship between soil and plant variables. I have a total of 25 variables measured at 12 locations, and I'm a little confused about how many PCs to retain to best explain the relationship.
1. If you are doing PCA as a preprocessing step for supervised learning, then the optimal number of PCA dimensions should be chosen by cross-validation. I am a fan of 5-fold cross-validation repeated five times.
2. If you are using PCA as an unsupervised method to explore and visualize the data, then several options exist:
-a. As Clément suggested, a hard cutoff on the cumulative variance explained, such as 80% or even 95%: keep the smallest number of components that reaches it.
-b. Construct a scree plot of variance explained (or eigenvalues) against the number of components. Moving right, toward later components, the eigenvalues drop. Where the steep drop stops and the curve bends into a flatter decline (the elbow), Cattell's scree test says to drop all further components after the one at which the elbow starts.
-c. Kaiser criterion: drop all components with eigenvalues under 1.0. The rule assumes PCA on standardized variables (the correlation matrix), where the eigenvalues average exactly 1.
-d. Horn's parallel analysis, which I am a fan of. Horn's method contrasts the eigenvalues of the observed data with the eigenvalues produced by PCA on a number of random data sets of uncorrelated variables that have the same number of variables and observations as the experimental or observational data set. This yields component eigenvalues adjusted for sampling-error-induced inflation. Components with adjusted eigenvalues greater than one are retained.
More detail: http://pdxscholar.library.pdx.edu/commhealth_fac/27/
Here is a decent post on how to perform it in R: https://www.r-bloggers.com/determining-the-number-of-factors-with-parallel-analysis-in-r/
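For option 1, here is a minimal sketch of choosing the number of components by 5-fold cross-validation repeated five times. It assumes, hypothetically, that one variable is the response and is predicted by linear regression on the PCA scores of the others (principal component regression), with held-out mean squared error as the criterion. It uses only NumPy; `pcr_fit_predict`, `repeated_kfold_cv` and the synthetic data are illustrative stand-ins, not any package's API. Standardization and PCA are refit inside every training fold so the held-out rows never leak into the components.

```python
import numpy as np


def pcr_fit_predict(X_train, y_train, X_test, k):
    """Fit PCA on X_train, regress y_train on the first k scores, predict X_test."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant columns
    Z_train = (X_train - mu) / sd  # standardize with training-fold statistics only
    Z_test = (X_test - mu) / sd
    # Rows of Vt are the principal axes (loadings) of the standardized training data.
    _, _, Vt = np.linalg.svd(Z_train, full_matrices=False)
    W = Vt[:k].T  # p x k loading matrix
    T_train, T_test = Z_train @ W, Z_test @ W  # component scores
    # Ordinary least squares on the scores, with an intercept column.
    A_train = np.column_stack([np.ones(len(T_train)), T_train])
    A_test = np.column_stack([np.ones(len(T_test)), T_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef


def repeated_kfold_cv(X, y, max_k, n_splits=5, n_repeats=5, seed=0):
    """Mean held-out MSE for k = 1..max_k over n_repeats shuffles of n_splits folds."""
    rng = np.random.default_rng(seed)
    n = len(y)
    mse = np.zeros(max_k)
    for _ in range(n_repeats):
        # A fresh random partition into folds for every repeat.
        for test_idx in np.array_split(rng.permutation(n), n_splits):
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            for k in range(1, max_k + 1):
                pred = pcr_fit_predict(X[train_idx], y[train_idx], X[test_idx], k)
                mse[k - 1] += np.mean((y[test_idx] - pred) ** 2)
    return mse / (n_splits * n_repeats)


# Synthetic stand-in data: 60 samples, 25 variables driven by 3 latent factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(60, 3))
X = latent @ rng.normal(size=(3, 25)) + 0.3 * rng.normal(size=(60, 25))
y = latent @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=60)

cv_mse = repeated_kfold_cv(X, y, max_k=10)
best_k = int(np.argmin(cv_mse)) + 1
print("CV MSE by number of PCs:", np.round(cv_mse, 3))
print("Number of PCs with lowest CV error:", best_k)
```

Keep `max_k` below the rank of the smallest training fold: with only 12 locations each training fold has 9 or 10 rows, so about 8 components at most. In scikit-learn the same bookkeeping is usually done with a `Pipeline` of `StandardScaler`, `PCA` and a regressor, tuned by `GridSearchCV` with `RepeatedKFold(n_splits=5, n_repeats=5)`.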
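Options b and c above both start from the eigenvalues of the correlation matrix, i.e. PCA on standardized variables, which is the setting the Kaiser rule assumes. A minimal NumPy sketch, assuming the data sit in a samples × variables array; the random 12 × 25 matrix is only a placeholder for the real soil/plant table, and `correlation_eigenvalues` and `print_scree` are hypothetical helper names. The elbow for Cattell's test is still judged by eye; the text bars are a crude stand-in for a plotted scree curve.

```python
import numpy as np


def correlation_eigenvalues(X):
    """Eigenvalues of the correlation matrix of X (rows = samples), largest first."""
    R = np.corrcoef(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(R)[::-1]  # eigvalsh returns ascending order
    return np.clip(eigvals, 0.0, None)  # remove tiny negative round-off


def print_scree(eigvals, width=40):
    """Crude text scree plot: eigenvalue, share of variance, and a bar."""
    share = eigvals / eigvals.sum()
    for i, (ev, s) in enumerate(zip(eigvals, share), start=1):
        bar = "#" * int(round(width * ev / eigvals[0]))
        print(f"PC{i:<3d} {ev:7.3f} {s:7.1%}  {bar}")


# Placeholder data: 12 locations x 25 variables.
# With only 12 rows, at most 11 eigenvalues can be nonzero.
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 25))

eigvals = correlation_eigenvalues(X)
print_scree(eigvals)
kaiser_k = int(np.sum(eigvals > 1.0))  # Kaiser: keep components with eigenvalue > 1
print("Components retained by the Kaiser rule:", kaiser_k)
```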
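Horn's parallel analysis (option d) can be sketched along the lines described above: simulate data sets of uncorrelated normal variables with the same numbers of observations and variables, average their correlation-matrix eigenvalues, subtract each average's excess over 1 from the corresponding observed eigenvalue, and keep components whose adjusted eigenvalue exceeds 1. The function name, the 1,000 simulations, the use of the mean rather than a percentile, counting only the leading run of passing components, and the synthetic test data are all assumptions of this sketch; the R post linked above shows an established implementation.

```python
import numpy as np


def parallel_analysis(X, n_sims=1000, seed=0):
    """Horn's parallel analysis on the correlation matrix of X (rows = samples)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    random_eigs = np.empty((n_sims, p))
    for s in range(n_sims):
        noise = rng.normal(size=(n, p))  # uncorrelated variables, same n and p
        random_eigs[s] = np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    # The mean random eigenvalue minus 1 estimates the sampling-error inflation.
    inflation = random_eigs.mean(axis=0) - 1.0
    adjusted = observed - inflation
    # Retain the leading run of components whose adjusted eigenvalue exceeds 1.
    keep = adjusted > 1.0
    n_keep = p if keep.all() else int(np.argmin(keep))
    return n_keep, adjusted


# Synthetic stand-in data: three blocks of correlated variables plus noise.
rng = np.random.default_rng(7)
latent = rng.normal(size=(100, 3))
loadings = np.zeros((3, 25))
loadings[0, :8] = 1.0
loadings[1, 8:16] = 1.0
loadings[2, 16:] = 1.0
X = latent @ loadings + rng.normal(size=(100, 25))

n_keep, adjusted = parallel_analysis(X)
print("Adjusted eigenvalues:", np.round(adjusted[:6], 2))
print("Components retained:", n_keep)
```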
Hello Ekene, remember that when using PCA, your objective is to reduce the number of variables, or better still to "group" the variables into a smaller number of components, such that the loss of information is minimal. In practice, the PCs to consider are those that together explain the variability to a high degree. For example, if the first three PCs explain more than 80% of the variation, then consider three PCs; if it takes four PCs to reach that, then consider four. However, given that you have 25 variables, try to limit the PCs to a maximum of four or five, provided they explain the variability to a high extent (say > 80%).
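The rule in this reply amounts to taking the smallest number of components whose cumulative share of variance reaches a chosen threshold. A minimal NumPy sketch, assuming PCA on standardized variables; the 80% threshold and the cap of five PCs come from the reply, while the function name `n_pcs_for_variance` and the synthetic 12 × 25 data are illustrative assumptions.

```python
import numpy as np


def n_pcs_for_variance(X, threshold=0.80):
    """Smallest k whose first k PCs explain at least `threshold` of total variance."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardize each variable
    # Squared singular values are proportional to the component variances.
    s = np.linalg.svd(Z, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    # First index where the cumulative share reaches the threshold, as a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return k, cumulative


# Synthetic stand-in for a 12-location x 25-variable table.
rng = np.random.default_rng(3)
latent = rng.normal(size=(12, 3))
X = latent @ rng.normal(size=(3, 25)) + 0.5 * rng.normal(size=(12, 25))

k, cumulative = n_pcs_for_variance(X, threshold=0.80)
print("Cumulative variance explained:", np.round(cumulative[:6], 3))
print(f"{k} PCs reach 80%;", "within" if k <= 5 else "above", "the suggested cap of 5")
```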