I'd like to know whether the number of variables chosen for a principal component analysis affects the quality of the analysis. In other words, is a PCA with 15 variables better than one with 23 variables, for example? Thank you.
In my experience with PCA, it's not so much the fit of the analysis I would be worried about; you can always check that no matter how many variables you include. I would be more worried about interpreting PCs composed of a larger number of variables.
Generally, when you are interpreting PCs it is often easier to understand what is occurring with a smaller number of variables. The more variables you include the more difficult it can be to interpret the PCs.
Lastly, no matter how many variables you include, make sure you test covariance. If you find variables which are strongly correlated, you may be able to remove one!
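One way to run that check is to compute pairwise correlations and list the pairs above some cutoff. The sketch below is only an illustration: the variable names, the measurements, and the 0.9 threshold are all made up, and it assumes no variable is constant (a constant column would cause a division by zero).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither variable is constant

def correlated_pairs(data, threshold=0.9):
    """Return (name_a, name_b, r) for variable pairs with |r| >= threshold."""
    names = list(data)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(data[a], data[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

# Hypothetical measurements: each key is a variable, each position a sample.
data = {
    "length": [1.0, 2.0, 3.0, 4.0, 5.0],
    "width": [2.1, 3.9, 6.2, 8.0, 9.8],  # nearly proportional to length
    "mass": [5.0, 1.0, 4.0, 2.0, 3.0],
}
pairs = correlated_pairs(data)
print(pairs)
```

Here only "length" and "width" are flagged, so one of the two is a candidate for removal before running the PCA. The threshold is a judgment call, not a rule.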
In my opinion, the number of variables chosen for the PCA does not affect the quality of the analysis; the correlation structure among the variables considered is what matters most. There should be significant correlation among the variables that make up each principal component, and this is reflected in the label given to the PC.
Try "redundancy tests" or "stepwise procedures" if you want to decrease the number of variables you are using. Try a "rotation procedure" to see what happens. Sometimes you are using PCA because you have a lot of variables in the first place.
I would not be so worried about using 23 or 15... but it is nice to take away some variables that don't help much in explaining variation... The first two PCs may explain most of the variation, but maybe they are not the ones that help you see or emphasize what you wish. Good luck :-)
50% of the variation explained is a little bit low. Ideally, a value of at least 70% is what you want. You could check the % variation explained in PC3, which might contain variables of interest that increase the cumulative %, and, if it does increase, plot your points in a 3-D figure. This might help you better explain the spatial pattern in your points. It would be a good idea to do a redundancy analysis, as suggested above, beforehand.
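To make the cumulative-percentage check concrete, here is a small sketch. The eigenvalues are invented for illustration, and the 70% target is simply the rule of thumb mentioned above, not a fixed standard.

```python
from itertools import accumulate

def components_needed(eigenvalues, target=0.70):
    """Return (k, cumulative), where k is the number of leading PCs whose
    cumulative share of variance first reaches `target`."""
    total = sum(eigenvalues)
    ratios = [ev / total for ev in sorted(eigenvalues, reverse=True)]
    cumulative = list(accumulate(ratios))
    for k, share in enumerate(cumulative, start=1):
        if share >= target:
            return k, cumulative
    return len(cumulative), cumulative  # target never reached

# Hypothetical PCA eigenvalues (variance carried by each PC).
eigenvalues = [4.0, 1.5, 1.0, 0.8, 0.7]
k, cumulative = components_needed(eigenvalues)
print(k, [round(c, 4) for c in cumulative])
```

With these made-up numbers the first two PCs reach 68.75% and the third brings the total to 81.25%, so three components are needed to pass 70%.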
When I was working with phenotypes I found 50% to be reasonable. But jumping into SNP analysis I started to consider 50% a very high percentage. Nice discussion! Cheers everyone.
Marcelo is right: you should try a preliminary procedure to decrease your number of variables (and find a simple model that explains almost the same amount of variation as the complete model).
BTW, be careful: PCA is not the best analysis for species abundance data if you have zeros in your dataset! You should use a Bray-Curtis based multivariate analysis (such as PCoA or nMDS).
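For reference, the Bray-Curtis dissimilarity between two sites is the sum of absolute differences in abundance divided by the total abundance at both sites. A minimal sketch follows, with invented site counts; the resulting matrix is the kind of input a PCoA or nMDS routine would take, and the ordination step itself is not shown.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors (0 to 1)."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator  # assumes at least one non-zero count

def dissimilarity_matrix(sites):
    """Square matrix of pairwise Bray-Curtis dissimilarities."""
    return [[bray_curtis(a, b) for b in sites] for a in sites]

# Hypothetical abundances: rows are sites, columns are species.
sites = [
    [10, 0, 5],   # site A
    [5, 0, 10],   # site B, same species as A in different proportions
    [0, 20, 0],   # site C, shares no species with A or B
]
d = dissimilarity_matrix(sites)
for row in d:
    print([round(v, 3) for v in row])
```

Sites A and B come out at about 0.333, while site C has a dissimilarity of 1.0 to both, since it shares no species with them.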
The number of variables doesn't affect the quality of the PCA. As we move from PC(1) to PC(n), the variance explained decreases, so the initial components play the major role in the quality of the result.
The number of variables is not important, but there should be no redundant variables, that is, variables that express the same measurement even if they appear to be different. There must, however, be significant correlation between the variables for principal components analysis to be applicable.
I also have the same query. I have genotype data from a microarray, so my features are all of the genotyped SNPs. I ran a PCA using all the SNPs and samples I have. My first PC explains only 15% of the variance (out of 10 PCs), and my 10th PC explains 7% of the variance. As you can see, there is not much difference among these PCs in how much variability they explain. What does this mean? Does my data have no variability to explain?
I do not think the number of variables will affect the PCA, but it may make the data harder to interpret. In genomics experiments we commonly deal with thousands of variables, and PCA is very effective at reducing them to a smaller set of components that can then be passed to other dimensionality reduction methods, such as t-SNE, for data analysis and interpretation (see example https://reneshbedre.github.io/blog/tsne.html ).
Read more about PCA here https://reneshbedre.github.io/blog/pca_3d.html