You can use hierarchical clustering or a nearest-neighbor clustering method. Both are implemented in InfoStat (http://www.infostat.com.ar/index.php?mod=page&id=46), Statgraphics, XLStat and R. R is the most powerful of these, but my favorite choice is hierarchical clustering with an agglomerative algorithm in InfoStat, because it is very intuitive to use.
The most common approach to cluster analysis is to compute Euclidean distances and apply an agglomerative algorithm. You then have to choose among the linkage options (for example single, complete or average linkage) and take a look at your results. Please read this link: http://en.wikipedia.org/wiki/Hierarchical_clustering.
If you decide to use the English version of InfoStat, the software comes with many example data sets and a practical guide on how to do the cluster analysis. I hope this answer helps you. Best regards.
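To make the "Euclidean distance plus agglomerative algorithm plus linkage choice" recipe concrete, here is a minimal, dependency-free Python sketch. It is only an illustration, not what InfoStat or R do internally: the function names (`euclidean`, `agglomerative`) and the toy points are made up for this example, and the naive search is far too slow for large data sets. In practice you would use a library routine such as those in SciPy's `scipy.cluster.hierarchy` module.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# How the distance between two clusters is derived from point-to-point distances.
LINKAGES = {
    "single": min,                            # closest pair of members
    "complete": max,                          # farthest pair of members
    "average": lambda ds: sum(ds) / len(ds),  # mean over all member pairs
}


def agglomerative(points, n_clusters, linkage="average"):
    """Start with one cluster per point and merge the two closest clusters
    until only n_clusters remain. Returns lists of point indices."""
    combine = LINKAGES[linkage]
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of cluster a, index of cluster b)
        for a, b in combinations(range(len(clusters)), 2):
            ds = [euclidean(points[i], points[j])
                  for i in clusters[a] for j in clusters[b]]
            d = combine(ds)
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy data (hypothetical): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for name in LINKAGES:
    print(name, agglomerative(points, n_clusters=2, linkage=name))
```

On well-separated toy data like this, all three linkages give the same two groups. On real data they often disagree, which is why it is worth running each one and looking at the results.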
I appreciate your valuable comments and suggestions.
So there are different types of clustering. Why should one prefer one over the other? In simple words, why hierarchical clustering and not spectral clustering or any other method?
I guess it depends on your question. In my case, I was clustering results of a docking experiment. So in order to decide which clustering method to use I performed test dockings, where I knew the correct answer, and tested which algorithm would reliably put all correct answers into one cluster.
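One possible way to score such a benchmark is sketched below in Python. This is an assumption about how the test could be quantified, not the exact procedure used above: `correct_pose_recovery` is a hypothetical helper, `labels` holds the cluster label each algorithm assigned to each pose, and `correct` lists the indices of the poses known to be right.

```python
from collections import Counter


def correct_pose_recovery(labels, correct):
    """Score a clustering against poses known to be correct.

    Returns (recall, purity) for the cluster that holds most correct poses:
      recall = fraction of all correct poses that landed in that cluster
      purity = fraction of that cluster's members that are correct
    """
    correct = set(correct)
    hits = Counter(labels[i] for i in correct)
    best_label, n_hits = hits.most_common(1)[0]
    cluster_size = sum(1 for lab in labels if lab == best_label)
    return n_hits / len(correct), n_hits / cluster_size


# Hypothetical test docking: six poses, of which poses 0, 1 and 2 are correct.
correct = [0, 1, 2]
labels_method_a = [0, 0, 0, 1, 1, 2]  # all correct poses share one cluster
labels_method_b = [0, 0, 1, 1, 1, 2]  # correct poses split over two clusters

print(correct_pose_recovery(labels_method_a, correct))
print(correct_pose_recovery(labels_method_b, correct))
```

A method that reliably reaches a recall of 1.0 on the test cases is the one that "puts all correct answers into one cluster"; purity tells you how many incorrect poses came along with them.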
Hierarchical clustering offers itself to biologists because it produces a tree, which has a straightforward analogy to the evolutionary process that generated these structures.
There are many clustering methods available for this work. As far as I have read, there is no specific method that always gives the best results. That is why we need to run several cluster analyses with different methods until we get the best results according to our own criteria (the clusters must make sense to you). Since cluster analysis is an exploratory method and not a statistical test per se, there is no unique way to obtain the best clustering.
"Clustering algorithms can be categorized based on their cluster model, as listed above. The following overview will only list the most prominent examples of clustering algorithms, as there are possibly over 100 published clustering algorithms. Not all provide models for their clusters and can thus not easily be categorized. An overview of algorithms explained in Wikipedia can be found in the list of statistics algorithms. There is no objectively "correct" clustering algorithm, but as it was noted, "clustering is in the eye of the beholder."[4] The most appropriate clustering algorithm for a particular problem often needs to be chosen experimentally, unless there is a mathematical reason to prefer one cluster model over another. It should be noted that an algorithm that is designed for one kind of model has no chance on a data set that contains a radically different kind of model.[4] For example, k-means cannot find non-convex clusters" (http://en.wikipedia.org/wiki/Cluster_analysis)
In my case, I run cluster analyses of bioclimatic variables, and I got the best results with a hierarchical method and an agglomerative algorithm, because I am looking for bioclimatic similarities across a wide geographical region. In your case, you need to explore your data using different methods. What kind of data do you want to analyze?
I am doing protein-protein docking to predict the interface. In my case I don't have the correct answer, so I think I should validate with some known interfaces. Should I follow this approach, or is there a better solution for this problem?
The data are RMSD values between different orientations of the same protein-protein complex. The size of the distance matrix can vary from 2,000 x 2,000 to 10,000 x 10,000. Would the Akaike information criterion be of any help in such a case for finding the best statistical model?
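For a sense of scale, the short Python sketch below estimates how many pairwise RMSD values such matrices contain and roughly how much memory they need. It assumes that only the upper triangle is stored (the matrix is symmetric with a zero diagonal) and that each value is an 8-byte float; `condensed_matrix_size` is a hypothetical helper name.

```python
def condensed_matrix_size(n_poses, bytes_per_value=8):
    """Return (number of unique pose pairs, size in megabytes) when only
    the upper triangle of an n_poses x n_poses distance matrix is stored.
    Assumes 8-byte floating point values unless told otherwise."""
    n_pairs = n_poses * (n_poses - 1) // 2
    return n_pairs, n_pairs * bytes_per_value / 1e6


for n in (2_000, 10_000):
    pairs, mb = condensed_matrix_size(n)
    print(f"{n} poses: {pairs} pairwise RMSDs, about {mb:.0f} MB")
# 2000 poses: 1999000 pairwise RMSDs, about 16 MB
# 10000 poses: 49995000 pairwise RMSDs, about 400 MB
```

Both sizes fit in the memory of an ordinary workstation, so hierarchical clustering on the full precomputed matrix is feasible, although the naive algorithms become slow at the upper end.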
Loss of solvent accessible surface area (SASA) upon complexation is one criterion for identifying surface residues at the interface. There are other distance-based criteria too. Do these work for your protein complex?
Can you be clearer? If you are asking about the distance matrix I have created, it is just the RMSD of each protein-protein complex to every other one. I am trying to find the best structural pose for the interaction model, and I am using Zdock to predict these complexes.