I have some unlabeled datasets and want to find the most relevant features, i.e. those that define well-formed clusters. The datasets have between 300 and 500 features.
This is impossible to answer in general, cf. the 'no free lunch' theorem of machine learning: there is no best learner per se, and there is no best feature selection method per se -- there is only a best one per dataset.
I somewhat disagree with Chris's comment that "this is impossible to answer". Some feature sets generally yield better performance than others when evaluated across various datasets. Likewise, some machine learning algorithms generally perform better than others.
I would suggest applying some data preprocessing techniques to your dataset in order to balance it. Then you can apply a clustering algorithm of your choice and examine the results.
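As a minimal sketch of the preprocessing step, assuming numpy and using random placeholder data in place of your actual dataset: standardizing each feature to zero mean and unit variance keeps large-scale features from dominating distance-based clustering.

```python
import numpy as np

# Placeholder for one of your ~300-feature unlabeled datasets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 300))

# Standardize each feature (zero mean, unit variance) so that no single
# feature dominates the distance computations used by clustering.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
```

After this step, any distance-based clustering algorithm (k-means, hierarchical, etc.) can be applied to `X_scaled`.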
Wrapper-based feature selection algorithms use a classifier to evaluate the feature subset, so they cannot be applied to unlabeled datasets. You can, however, try state-of-the-art unsupervised methods in this area. Our experiments show that integrating ant colony optimization with graph-theoretic approaches improves the performance of unsupervised feature selection algorithms. We have also implemented several state-of-the-art unsupervised feature selection algorithms; the source code of our implementations can be found at the following link:
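The ant-colony/graph-based method mentioned above is not reproduced here, but as a simple baseline, here is a sketch of an unsupervised *filter* method that needs no labels: ranking features by variance and keeping the top k. The `variance_filter` helper is a hypothetical name for illustration, and the data is synthetic.

```python
import numpy as np

def variance_filter(X, k):
    """Rank features by variance and keep the top k -- a simple
    unsupervised filter that requires no class labels."""
    variances = X.var(axis=0)
    top = np.argsort(variances)[::-1][:k]
    return X[:, top], top

# Synthetic data: feature 10 is given a much larger spread, so a
# variance-based filter should retain it.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 400))
X[:, 10] *= 5.0

X_sel, idx = variance_filter(X, 50)
```

Variance is a crude criterion (it ignores feature redundancy), which is exactly why the more sophisticated unsupervised methods above exist; it is only meant to show the shape of an unsupervised filter.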
It depends on the dataset: as Mr. Chris Biemann rightly pointed out, there is no universal feature selection algorithm that works equally well on all datasets.
In that case, the advisable approach is to try a group of techniques and select the one with the best performance.
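One way to compare techniques without labels is an internal validity criterion such as within-cluster scatter. The sketch below, assuming numpy and synthetic data with two planted clusters, clusters the data under two candidate feature subsets and reports the within-cluster sum of squares for each; `kmeans` and `inertia` are minimal illustrative helpers, not a production implementation.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """A minimal k-means (illustrative only)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

def inertia(X, labels, centers):
    """Within-cluster sum of squared distances (lower = tighter clusters)."""
    return sum(((X[labels == j] - c) ** 2).sum()
               for j, c in enumerate(centers))

# Synthetic data: the first 5 features carry two well-separated blobs,
# the remaining 295 are pure noise.
rng = np.random.default_rng(0)
blob = np.vstack([rng.normal(0, 1, (100, 5)), rng.normal(8, 1, (100, 5))])
noise = rng.normal(0, 1, (200, 295))
X = np.hstack([blob, noise])

# Compare two candidate feature subsets by internal cluster quality.
scores = {}
for name, cols in [("informative", list(range(5))),
                   ("random", rng.choice(300, 5, replace=False))]:
    labels, centers = kmeans(X[:, cols], k=2)
    scores[name] = inertia(X[:, cols], labels, centers)
```

The subset with the tighter (lower-inertia) clustering is the better candidate; other internal indices (e.g. the silhouette coefficient) can be plugged in the same way.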