In the context of clustering or classification, feature selection corresponds to selecting a subset of relevant features (variables, predictors) for use in model construction. These features are drawn from the original dataset. Several techniques are used for this, such as Sequential Feature Selection (SFS).
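The original answer does not name a library, but SFS is available in scikit-learn; a minimal sketch (the wrapped KNN classifier and the iris dataset are my assumptions, chosen only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Greedy forward selection: keep the 2 features that most improve the
# cross-validated accuracy of the wrapped classifier.
sfs = SequentialFeatureSelector(KNeighborsClassifier(n_neighbors=3),
                                n_features_to_select=2)
sfs.fit(X, y)
X_reduced = sfs.transform(X)
print(X_reduced.shape)  # (150, 2)
```

Note that this wrapper approach is supervised: it needs labels `y` to score candidate subsets.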
In contrast, unsupervised feature selection aims to remove redundant information, and in doing so often reduces the dimensionality of the data. Different approaches already exist. First, you can group features by similarity.
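One way to group features by similarity, as a hedged sketch: scikit-learn's `FeatureAgglomeration` clusters correlated features and pools each cluster into one representative feature (the synthetic redundant data below is my own illustration, not from the answer):

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))
# Build 6 features where each pair is a near-duplicate (redundant information).
X = np.hstack([base, base + 0.01 * rng.normal(size=(100, 3))])

# Merge similar features into 3 clusters, each averaged into one new feature.
agglo = FeatureAgglomeration(n_clusters=3)
X_reduced = agglo.fit_transform(X)
print(X_reduced.shape)  # (100, 3)
```

No class labels are used anywhere, so this is a purely unsupervised way of removing redundancy.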
Second, you can compress and summarize the information with a dimensionality reduction method, which captures the essential information in a small (reduced) number of features. For dimensionality reduction you have linear approaches, such as the well-known principal component analysis (PCA), or you can apply nonlinear approaches.
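A minimal PCA sketch with scikit-learn (the low-rank synthetic data is my assumption, built so that a few components carry almost all the variance):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
# 200 samples in 10 dimensions, generated from 3 latent factors plus noise,
# so most of the variance lies in a 3-dimensional subspace.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# Keep the smallest number of principal components that explain
# at least 95% of the total variance.
pca = PCA(n_components=0.95)
X_low = pca.fit_transform(X)
print(X_low.shape[1], pca.explained_variance_ratio_.sum())
```

Passing a float in (0, 1) to `n_components` lets PCA choose the component count from the explained-variance target instead of fixing it by hand.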
For clustering and classification, feature selection is sometimes required to overcome problems such as overfitting and inefficiency. In supervised feature selection, features are selected according to a metric computed with respect to the given class labels, while in unsupervised feature selection you can reduce the information via feature extraction. For dimensionality reduction there are approaches such as PCA, LLE, the Laplacian Score, etc.
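As a sketch of the supervised case described here (scoring features by a metric computed against the class labels), one common choice is mutual information; the metric and dataset below are my assumptions, since the answer names no specific score (the Laplacian Score it mentions is not in scikit-learn):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Supervised filter: score each feature by its mutual information with the
# class labels, then keep the top 2 features.
selector = SelectKBest(mutual_info_classif, k=2)
X_top = selector.fit_transform(X, y)
print(X_top.shape)  # (150, 2)
```

Unlike wrapper methods such as SFS, this filter approach scores each feature once against the labels, without repeatedly retraining a model.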
Preprocessing the data to obtain a smaller set of representative features, while retaining the salient characteristics of the data, not only decreases processing time but also leads to more compact learned models and better generalization. When class labels are available we use supervised feature selection; otherwise unsupervised feature selection is appropriate. In many data mining applications class labels are unknown, which underlines the significance of unsupervised feature selection there. This is the simplest distinction between supervised and unsupervised feature selection.
Unsupervised feature selection algorithms conduct feature selection globally by producing a common feature subset across all instances at the same time.
Feature selection can also be achieved by reducing the dimensionality of the data. I have found PCA (Principal Component Analysis) to be an effective method for dimensionality reduction.
On the result you can perform spectral clustering, assigning labels with either the K-Means or the K-Medoids algorithm.
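A minimal sketch of this last step with scikit-learn's `SpectralClustering`, which embeds the data via the graph Laplacian and then runs K-Means on the embedding (the two-moons dataset is my assumption, chosen because plain K-Means handles it poorly; K-Medoids is not in scikit-learn itself):

```python
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

# Two interleaving half-moons: the clusters are not linearly separable,
# so spectral clustering is a natural fit.
X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

# Build a nearest-neighbor affinity graph, embed with its Laplacian,
# and assign final labels by running K-Means in the embedded space.
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, assign_labels="kmeans",
                        random_state=0)
labels = sc.fit_predict(X)
print(labels.shape)  # (300,)
```

The `assign_labels="kmeans"` parameter is exactly the K-Means step mentioned in the answer; `"discretize"` is an alternative final-assignment strategy.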