Is there any particular situation where applying Principal Component Analysis (PCA) to the data and then running K-Means clustering on the principal components gives better clustering results?
Also, please explain where it gives worse results, and why.
I think the answer depends on what you mean by "better" results (and your specific research goal/question). If you first use PCA, then you will have created a set of (up to) k linearly independent composite scores for each case (where k is the number of original variables). However, unless you rotate the PCA solution, the first component will have the most variance associated with it, and likely will be most influential in how clusters are formed, especially if your similarity/proximity matrix is based on distance. Standardizing the component scores will solve the immediate problem of distortion due to unequal SDs, but the fact remains that, upon extraction, the first component (in PCA or first factor in common factor analysis) always accounts for the most variance from the data set.
You'll have to judge whether that makes sense in light of what you're trying to accomplish.
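To make the point about the first component concrete, here is a minimal NumPy sketch. The data, the variable spreads, and the names `scores` and `distance_share` are made up for illustration; it only assumes unrotated PCA computed via SVD and squared Euclidean distance as the proximity measure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 cases, 3 variables with very different spreads.
X = rng.normal(size=(200, 3)) * np.array([10.0, 2.0, 0.5])

# Unrotated PCA via SVD of the centered data.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T  # component scores, one column per component


def distance_share(Z):
    """Share of the total squared Euclidean distance (summed over all
    pairs of cases) contributed by each column. That sum is proportional
    to the column's variance, so variances are enough here."""
    v = Z.var(axis=0, ddof=1)
    return v / v.sum()


raw_share = distance_share(scores)

# Standardize each component score to unit SD.
scores_std = scores / scores.std(axis=0, ddof=1)
std_share = distance_share(scores_std)

print("raw scores:         ", np.round(raw_share, 3))
print("standardized scores:", np.round(std_share, 3))
```

With the raw scores, almost all of the distance between cases comes from the first component. After standardizing, every component contributes equally, which is a different modeling choice rather than an automatically better one.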
The first question that you should ask is whether or not you need to apply a dimensionality reduction technique at all. If you have very few features compared to the number of samples, you probably do not need to reduce the number of features. On the other hand, if the number of features is larger than the number of samples, then you will likely run into the “curse of dimensionality”, and k-means may not produce good results. In this case, you do want to reduce the number of features that you have. There are several techniques you could use for dimensionality reduction. For example, you could use feature selection, where you keep only the features that you think are the most relevant for the problem at hand. Another approach is to use Principal Component Analysis (PCA), where you transform your data into a new coordinate system in which all the components are orthogonal to each other. The components are also sorted from the one that explains the most variance in the data to the one that explains the least. You would select a subset of the leading principal components as the features in your model, chosen so that they capture a majority of the variance. Note that the k-means clustering algorithm can be slow, and its running time depends on both the number of data points and the number of features in your data set, so fewer features also means less computation. In summary, it generally wouldn’t hurt to apply PCA before you apply k-means.
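Here is a rough sketch of that workflow using only NumPy. Everything in it is an assumption for illustration: the synthetic data (60 samples, 200 features, two groups), the 90% variance threshold, and the helper names `pca_reduce` and `kmeans`. The k-means part is a bare-bones Lloyd's algorithm; in practice you would more likely reach for a library implementation such as scikit-learn's `PCA` and `KMeans`.

```python
import numpy as np


def pca_reduce(X, var_target=0.90):
    """Project X onto the fewest leading components whose cumulative
    explained variance reaches var_target."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    n_keep = int(np.searchsorted(np.cumsum(explained), var_target)) + 1
    n_keep = min(n_keep, len(S))
    return Xc @ Vt[:n_keep].T, explained[:n_keep]


def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with randomly chosen initial centers."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centers; keep the old one if a cluster went empty.
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Hypothetical data: more features (200) than samples (60), with two
# groups that differ along one latent direction.
rng = np.random.default_rng(1)
truth = np.repeat([0, 1], 30)
latent = rng.normal(size=(60, 2))
latent[:, 0] += np.where(truth == 0, -4.0, 4.0)
W = rng.normal(size=(2, 200))
X = latent @ W + 0.5 * rng.normal(size=(60, 200))

Z, explained = pca_reduce(X)
labels, _ = kmeans(Z, k=2)

# Cluster labels are arbitrary, so check agreement up to relabeling.
agreement = max(np.mean(labels == truth), np.mean(labels != truth))
print("components kept:", Z.shape[1])
print("variance explained:", round(float(explained.sum()), 3))
print("agreement with true groups:", agreement)
```

Here k-means runs on a handful of component scores instead of 200 raw features. Note that this toy data was built so that the group structure lies along the high-variance directions; if the interesting structure sits in low-variance directions, dropping components can hide it.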