I go about it a little differently. My preferred approach is to first run a factor analysis on the transformed data to reduce the impact of outliers, followed by extracting the factor scores. Then I change the factor scores to normal quantiles before running a cluster analysis on them.
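The "normal quantiles" step above could look roughly like the sketch below. It assumes a rank-based transform (each score is replaced by the standard-normal quantile of its rank), which is only one way to read that step. The factor analysis itself is not shown, and `factor_scores` is made-up illustration data, as is the helper name `to_normal_quantiles`.

```python
from statistics import NormalDist

import numpy as np


def to_normal_quantiles(scores):
    """Map each column of `scores` to standard-normal quantiles via its ranks.

    Hypothetical helper: a rank-based transform is assumed here.
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    # Double argsort gives ranks within each column (ties broken by position).
    ranks = scores.argsort(axis=0).argsort(axis=0) + 1
    # Plotting positions strictly inside (0, 1), so the inverse CDF stays finite.
    probs = (ranks - 0.5) / n
    inv_cdf = np.vectorize(NormalDist().inv_cdf)
    return inv_cdf(probs)


# Made-up factor scores: 6 observations on 2 factors, with one outlier.
factor_scores = np.array([
    [0.1, -1.2],
    [0.4, 0.3],
    [-0.7, 0.8],
    [9.5, -0.1],   # outlier on the first factor
    [0.0, 1.5],
    [-0.3, -0.6],
])
normal_scores = to_normal_quantiles(factor_scores)
```

Because only the ranks are used, the outlier at 9.5 ends up as just the largest of six normal quantiles instead of dominating the distance calculations in the cluster analysis that follows.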
The two methods have different aims. While a strategy starting with a PCA has the advantages mentioned by Raid Amin, starting with k-means has neither practical nor theoretical advantages.
First, you should use PCA to reduce the dimensionality of the data and extract the signal from it. If the first two principal components account for more than 80% of the total variance, you can plot the data in a simple scatterplot and identify clusters visually.
Then, after the PCA, you should apply K-means or another clustering method to the PCA scores to form the clusters.
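As a rough illustration of the PCA-then-cluster order, here is a minimal sketch using scikit-learn. The data is synthetic, and the standardization step, the choice of two components, and `n_clusters=3` are my assumptions for the example, not something prescribed in this thread.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: three groups of 50 points in 10 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(50, 10)) for c in centers])

# Standardize (an assumption), then project onto the first two principal components.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
scores = pca.fit_transform(X_std)
print("variance explained by 2 PCs:", pca.explained_variance_ratio_.sum())

# Cluster the PCA scores rather than the raw features.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scores)
```

If the printed share of variance is high, a scatterplot of `scores[:, 0]` against `scores[:, 1]` colored by `labels` is a quick visual check on the clusters.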
The first question you should ask is whether you need a dimensionality reduction technique at all. If you have very few features compared to the number of samples, you probably do not need to reduce the number of features. On the other hand, if the number of features is larger than the number of samples, you will be dealing with the “curse of dimensionality”, and k-means is unlikely to produce good results. In this case, you do want to reduce the number of features. There are several techniques you could use for dimensionality reduction. For example, you could use feature selection, where you keep the features that you think are most relevant to the problem at hand. Another approach is Principal Component Analysis (PCA), which transforms your data into a new coordinate space in which all the components are orthogonal to each other. The components are also sorted from the one that explains the most variance in the data to the one that explains the least. You would then select a subset of the principal components that captures most of the variance and use them as the features in your model. Note that k-means can be slow, and its running time depends on the number of data points and features in your data set, so working with fewer features also speeds up the clustering. In summary, it wouldn’t hurt to apply PCA before you apply a k-means algorithm.
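To make the "select a subset of the principal components" step concrete, here is a small NumPy sketch that computes PCA through an SVD and keeps the fewest components needed to reach a variance threshold. The data is synthetic and the 90% cutoff is a hypothetical choice for illustration only.

```python
import numpy as np

# Synthetic stand-in data: 200 samples, 20 features driven by 3 latent directions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.1 * rng.normal(size=(200, 20))

# PCA via SVD of the centered data matrix; rows of Vt are the orthogonal components.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Squared singular values are proportional to each component's variance,
# and the SVD returns them sorted from largest to smallest.
explained_ratio = S**2 / np.sum(S**2)
cumulative = np.cumsum(explained_ratio)

# Smallest number of components whose cumulative share reaches the threshold.
threshold = 0.90  # hypothetical cutoff, not a rule from this discussion
n_components = int(np.searchsorted(cumulative, threshold) + 1)

# Scores on the retained components: these would be the input to k-means.
scores = X_centered @ Vt[:n_components].T
print(n_components, "components reach", threshold, "of the variance")
```

Here 20 features collapse to a handful of columns in `scores`, which is the smaller feature set you would hand to k-means.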