Is there a way to find the best features before clustering a categorical data set, or a clustering algorithm with embedded feature selection?
How you pre-process a categorical data set depends on the domain of each attribute. That said, there is also a method that converts a mixed-attribute data set into a uniform numerical format, based on the notion of co-occurrence between attribute values.
Ming-Yi Shih, Jar-Wen Jheng, Lien-Fu Lai. (2010). A Two-Step Method for Clustering Mixed Categorical and Numeric Data. Tamkang Journal of Science and Engineering, 13(1), 11-19.
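
To make the co-occurrence idea concrete, here is a minimal sketch in Python. It is not the algorithm from the paper above, only an illustration of one common way to score how strongly two categorical values co-occur: the number of rows that contain both values divided by the number of rows that contain either one. The function name `cooccurrence_similarity` and the toy data are made up for this example.

```python
def cooccurrence_similarity(rows, item_a, item_b):
    """Jaccard-style co-occurrence score between two (column, value) items.

    Hypothetical helper for illustration only: rows holding both items
    divided by rows holding at least one of them.
    """
    (col_a, val_a), (col_b, val_b) = item_a, item_b
    has_a = {i for i, row in enumerate(rows) if row[col_a] == val_a}
    has_b = {i for i, row in enumerate(rows) if row[col_b] == val_b}
    either = has_a | has_b
    return len(has_a & has_b) / len(either) if either else 0.0


# Toy categorical data set (assumed, not taken from the paper).
rows = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "square"},
    {"color": "blue", "shape": "square"},
]

sim_red_round = cooccurrence_similarity(rows, ("color", "red"), ("shape", "round"))
sim_blue_round = cooccurrence_similarity(rows, ("color", "blue"), ("shape", "round"))
sim_blue_square = cooccurrence_similarity(rows, ("color", "blue"), ("shape", "square"))

print(sim_red_round, sim_blue_round, sim_blue_square)
```

Values that frequently appear together ("red" and "round" here) get a high score, and values that never appear together get 0. Scores like these can then be used to place categorical values on a numeric scale before running an ordinary numeric clustering algorithm.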