My aim is to cluster a set of molecules from a virtual screening (on the order of thousands of compounds). I want to cluster them by chemical similarity using hierarchical clustering, with a distance based on the Tanimoto index. Ideally, I want to identify a limited number of clusters (10-30), each of which can be conveniently represented by the molecule at its centroid.
My problem is that the results of hierarchical clustering seem to depend strongly on the linkage algorithm chosen. The optimal number of clusters (according to the Kelley criterion) varies a lot across the linkage algorithms I tested. In the end, I am afraid that my final clusters will depend too strongly on this choice.
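For illustration, the linkage dependence can be quantified along the lines of the minimal sketch below. It is a toy under stated assumptions, not a real pipeline: fingerprints are assumed to be sets of on-bit indices, the distance is taken as 1 minus the Tanimoto similarity, the agglomeration is a naive cubic-time loop suitable only for tiny inputs, and the Rand index is used as a simple agreement measure between partitions (it is not the Kelley criterion). The cluster representative is computed as a medoid, standing in for the "centroid" molecule. All function names and the toy fingerprints are hypothetical.

```python
from itertools import combinations


def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def agglomerate(fps, n_clusters, linkage):
    """Naive agglomerative clustering; returns clusters as lists of indices."""
    combine = {
        "single": min,
        "complete": max,
        "average": lambda ds: sum(ds) / len(ds),
    }[linkage]
    clusters = [[i] for i in range(len(fps))]
    while len(clusters) > n_clusters:
        def cluster_distance(pair):
            ci, cj = clusters[pair[0]], clusters[pair[1]]
            return combine(
                [tanimoto_distance(fps[i], fps[j]) for i in ci for j in cj]
            )
        # Merge the closest pair of clusters under the chosen linkage.
        a, b = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe because a < b
    return clusters


def medoid(fps, cluster):
    """Index of the member with the smallest summed distance to the others."""
    return min(
        cluster,
        key=lambda i: sum(tanimoto_distance(fps[i], fps[j]) for j in cluster),
    )


def rand_index(part_a, part_b, n):
    """Fraction of item pairs on which two partitions agree (1.0 = identical)."""
    label_a = {i: k for k, c in enumerate(part_a) for i in c}
    label_b = {i: k for k, c in enumerate(part_b) for i in c}
    pairs = list(combinations(range(n), 2))
    agree = sum(
        (label_a[i] == label_a[j]) == (label_b[i] == label_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy fingerprints: two well-separated groups of three molecules.
fps = [
    {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 4},
    {10, 11, 12}, {10, 11, 12, 13}, {10, 11, 13},
]

partitions = {
    name: agglomerate(fps, n_clusters=2, linkage=name)
    for name in ("single", "complete", "average")
}

for name_a, name_b in combinations(partitions, 2):
    score = rand_index(partitions[name_a], partitions[name_b], len(fps))
    print(name_a, "vs", name_b, "Rand index:", score)

for name, clusters in partitions.items():
    print(name, "medoids:", [medoid(fps, c) for c in clusters])
```

On this deliberately easy toy set the three linkages agree. A Rand index well below 1 between linkages on the real library would put a number on the dependence described above.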
Is there a way to perform clustering without this kind of bias? Can consensus clustering help remove this problem? If so, is there any software that performs consensus clustering on molecular libraries?