The difference between principal component analysis (PCA) and hierarchical cluster analysis (HCA) in classifying bacterial strains through Fourier transform infrared spectroscopy
Hello, Dr. Ben Hassena Amal, I am interested in classifying bacterial strains using PCA and HCA and can explain the difference. You can send me an email at [email protected] for further discussion if you are interested.
Both are exploratory methods that take different approaches to the same objective: studying the variability in your dataset. I have a paper uploaded here on ResearchGate in which I use both PCA and HCA to study the covariance structure in resonant inelastic X-ray scattering spectroscopy. It's not the same technique as yours, but the idea is basically the same. You can see what information we get from each of these exploratory methods. I hope this helps.
Both methods are forms of unsupervised machine learning, because neither needs predefined groups (labels) to build the model.
PCA: projects a large set of correlated features, all measuring some variable of interest, onto a lower-dimensional space while keeping as much of the original variance as possible. It is sometimes described as finding the directions (components) in variable space along which the variation is maximal. The first few components therefore show the variation more clearly than the original p variables do. You can even go beyond PCA by fitting a modified model such as sparse PCA, which constrains the loadings to be sparse so that the extracted features are more stable and easier to interpret.
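To make the PCA description concrete, here is a minimal sketch in Python using NumPy. The data are synthetic stand-ins for spectra (20 samples by 50 channels, built from two hidden patterns plus noise); those sizes and the names `X`, `scores` and `loadings` are assumptions for illustration, not anything taken from a real FTIR dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for spectra: 20 samples x 50 channels, generated from
# 2 hidden patterns plus a little noise (purely illustrative data).
n_samples, n_channels = 20, 50
patterns = rng.normal(size=(2, n_channels))
weights = rng.normal(size=(n_samples, 2))
X = weights @ patterns + 0.01 * rng.normal(size=(n_samples, n_channels))

# Center each variable (column), then decompose with the SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Fraction of total variance carried by each principal component.
explained_variance_ratio = s**2 / np.sum(s**2)

scores = U * s   # coordinates of each sample on the components
loadings = Vt    # each row is one component direction in variable space

# Keep only the first k components as the reduced representation.
k = 2
scores_k = scores[:, :k]
print(explained_variance_ratio[:3])
```

Because the synthetic data really have only two underlying patterns, the first two components should carry almost all of the variance. With real spectra you would inspect `explained_variance_ratio` before choosing `k`.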
Hierarchical clustering assigns the n observations to a nested tree of clusters (a dendrogram), merging them step by step according to a linkage function.
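As a rough illustration of hierarchical clustering with a linkage function, the sketch below uses SciPy's `linkage` and `fcluster`. The two synthetic groups, the choice of Ward linkage, and the cut into two clusters are all assumptions made for the example; other linkage methods (single, complete, average) may suit real data better.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)

# Two well-separated synthetic groups of 5 observations with 4 variables each.
group_a = rng.normal(loc=0.0, scale=0.1, size=(5, 4))
group_b = rng.normal(loc=5.0, scale=0.1, size=(5, 4))
X = np.vstack([group_a, group_b])

# Build the nested tree: each row of Z records one merge of two clusters.
Z = linkage(X, method="ward", metric="euclidean")

# Cut the tree so that at most 2 flat clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The matrix `Z` is what `scipy.cluster.hierarchy.dendrogram` takes if you want to plot the tree.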
Cluster analysis is different from PCA. Cluster analysis groups observations while PCA groups variables rather than observations. PCA can be used as a final method (by adding rotation to perform factor analysis) or to reduce the number of variables to conduct another analysis, such as regression or other data mining (classifying etc.) techniques.
Thank you @Eyup Calik. This actually clarifies my confusion. So it would be good to run cluster analysis after PCA when I have many variables in my data set.
PCA reduces the variables of an experiment to a smaller number of principal components (PCs) based on the correlations among them; the observations can then be grouped based on their PC scores. Cluster analysis groups observations (or variables) based on their similarity.
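The workflow discussed in this thread, reducing the variables with PCA and then clustering the observations on their PC scores, could be sketched as follows. Everything here is hypothetical: three made-up "strains" with six synthetic spectra each, two retained components, and Ward linkage are illustrative assumptions, not a recommended protocol for FTIR data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)

# Hypothetical data: 3 "strains", 6 synthetic spectra each, 100 channels.
n_per_strain, n_channels = 6, 100
centers = rng.normal(scale=3.0, size=(3, n_channels))
X = np.vstack(
    [c + 0.2 * rng.normal(size=(n_per_strain, n_channels)) for c in centers]
)

# Step 1: PCA via SVD of the column-centered data; keep the first 2 PCs.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = (U * s)[:, :2]

# Step 2: hierarchical clustering of the observations in PC-score space.
Z = linkage(scores, method="ward")
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```

Clustering on a few PC scores rather than on all channels removes much of the noise and redundancy, which is the main reason for running PCA first when there are many variables.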