There are several methods to effectively assess the performance of your clustering algorithm.
First, compare it against one that is known to work well, and compare the results. Second, time both algorithms and compare their running times; if both produce good answers, you can also analyse how the quality of the solution improves over time. Third, try your algorithm on several instances of a problem: one not too challenging, one of medium difficulty, and one very hard. Finally, using an evolutionary algorithm to optimise the parameters of your clustering algorithm would test it well under duress and could point to directions for improvement. A minimal benchmarking sketch along these lines follows below.
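For instance, a small sketch of such a comparison in Python (assuming scikit-learn is available; KMeans stands in for the "known good" reference and AgglomerativeClustering for the algorithm under test, so swap in your own) could look like this:

```python
import time
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.metrics import adjusted_rand_score

# Three synthetic instances of increasing difficulty (more overlap = harder).
instances = {
    "easy":   make_blobs(n_samples=500,  centers=3, cluster_std=0.5, random_state=0),
    "medium": make_blobs(n_samples=2000, centers=5, cluster_std=1.5, random_state=0),
    "hard":   make_blobs(n_samples=5000, centers=8, cluster_std=3.0, random_state=0),
}

for name, (X, y_true) in instances.items():
    k = len(np.unique(y_true))
    for algo in (KMeans(n_clusters=k, n_init=10, random_state=0),
                 AgglomerativeClustering(n_clusters=k)):   # your algorithm here
        t0 = time.perf_counter()
        labels = algo.fit_predict(X)
        elapsed = time.perf_counter() - t0
        ari = adjusted_rand_score(y_true, labels)          # agreement with ground truth
        print(f"{name:6s} {type(algo).__name__:25s} time={elapsed:.3f}s  ARI={ari:.3f}")
```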
What kind of performance test do you want to perform? I mean, do you want to compare results between different algorithms, run a speed test, or validate your algorithm?
Basically, given a set of data points, a clustering algorithm groups them into a number of clusters so that: a) points within each cluster are similar to each other, and b) points from different clusters are dissimilar. To differentiate the data, a similarity criterion is defined using a distance measure (e.g., Euclidean, Cosine, Jaccard, etc.).
If you want to test the output of your algorithm against different datasets, you could produce a 2D plot (i.e., x vs. y), drawing the data points of each cluster in a different colour.
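For example, a quick sketch with matplotlib and scikit-learn (KMeans on toy data is only a placeholder for your own algorithm and data):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)   # 2D toy data
labels = KMeans(n_clusters=4, n_init=10, random_state=42).fit_predict(X)

# Colour each point by its cluster label (x vs. y).
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap="tab10", s=20)
plt.xlabel("x"); plt.ylabel("y"); plt.title("Cluster assignments")
plt.show()
```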
Basically, two types of performance evaluation are used for clustering techniques. The first is external evaluation, in which we have prior information about the data set; the second is internal evaluation, in which the evaluation is done with the data set itself.
For external evaluation, Accuracy, F-measure, Normalized Mutual Information (the average mutual information between every pair of clusters and their classes), the Rand Index, etc. are commonly used.
For internal performance measurement, many validity indices are defined in the literature, such as the Davies-Bouldin index, Silhouette index, Dunn index, Partition Coefficient, Entropy, Separation Index, and Xie and Beni's Index.
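As an illustration, a small sketch of both kinds of evaluation with scikit-learn on the Iris data: the external measures need ground-truth labels, the internal ones need only the data and the partition.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.metrics import (adjusted_rand_score, normalized_mutual_info_score,
                             silhouette_score, davies_bouldin_score)

X, y_true = load_iris(return_X_y=True)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# External evaluation (requires known classes):
print("Adjusted Rand index:", adjusted_rand_score(y_true, labels))
print("Normalized mutual information:", normalized_mutual_info_score(y_true, labels))

# Internal evaluation (data set itself):
print("Silhouette index:", silhouette_score(X, labels))
print("Davies-Bouldin index:", davies_bouldin_score(X, labels))
```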
1) Quality: Check what boundaries can be implemented. Can the algorithm implement convex boundaries only? This may not be sufficient for good-quality data (see Gomez/Stoop/Stoop, Bioinformatics 2014).
2) Does the algorithm correctly recover known data structures? This is conventionally measured by the Jaccard index (see e.g. Landis/Ott/Stoop, Neural Computation, 2010).
3) What are the run-time properties of the algorithm? How does it scale as a function of the size of the data set to be clustered? How does this compare to other approaches? (see e.g. Landis/Ott/Stoop, Neural Computation, 2010).
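For point 2, a minimal sketch of the pair-counting Jaccard index (one common definition: J = a / (a + b + c), where a counts pairs grouped together in both partitions and b, c count pairs grouped together in only one of them):

```python
from itertools import combinations

def jaccard_index(labels_true, labels_pred):
    a = b = c = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1            # pair grouped together in both partitions
        elif same_true:
            b += 1            # together only in the reference partition
        elif same_pred:
            c += 1            # together only in the computed partition
    return a / (a + b + c) if (a + b + c) else 1.0

print(jaccard_index([0, 0, 1, 1, 2], [0, 0, 1, 2, 2]))   # imperfect recovery -> 0.333
```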
In my eyes, a clustering is of good quality when it makes a clear distinction among records, in order to help draw conclusions from extensive data. The test of the clustering is in its use...
Clusters must help the analyst rationalise the data. When a clustering is good (can be trusted), it leads to automation, prevention, and improved management decisions. It is less important that the clustering method be efficient, as long as it adds informational value.
There are two methods to test the performance of a clustering algorithm: internal assessment and external assessment. In most cases external assessment is used, in which the clustering result is compared with an expert result. MoJo and MoJoFM are measures used for external assessment. For more detail please read the following paper...
You can check the performance of the algorithm using cluster validity indices. Which validity index to choose depends on your data set and your clustering algorithm.
The proof of the pudding is in the eating, i.e. the test is the number of conclusions that can be drawn based on that clustering. Some of the conclusions are expected to be new, which means the expert may reject them (this happened to me once; lesson learned :)).
For evaluating the performance of a clustering algorithm, I would suggest using cluster validity indices. In the literature, many scalar validity measures have been proposed, which turn out to be more or less appropriate depending on your data and the specific application (non-hierarchical, crisp, or fuzzy clustering). There are several: Root-mean-square standard deviation (RMSSTD) of the new cluster, Semi-partial R-squared (SPR), R-squared (RS), Distance between two clusters (CD), Partition Coefficient (PC), Classification Entropy (CE), Partition Index, Separation Index (S), Xie and Beni's Index (XB), Inter-Cluster Density (ID), Davies-Bouldin (DB) index, Dunn's Index (DI), Alternative Dunn Index (ADI), and so on.
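As an example, here is a rough sketch of one of the listed indices that is not in scikit-learn, Dunn's Index (DI), in its basic form (smallest between-cluster distance divided by the largest within-cluster diameter; higher is better):

```python
import numpy as np
from scipy.spatial.distance import cdist

def dunn_index(X, labels):
    clusters = [X[labels == c] for c in np.unique(labels)]
    # Largest within-cluster diameter.
    max_diam = max(cdist(c, c).max() for c in clusters)
    # Smallest distance between points of different clusters.
    min_sep = min(cdist(ci, cj).min()
                  for i, ci in enumerate(clusters)
                  for cj in clusters[i + 1:])
    return min_sep / max_diam

X = np.random.RandomState(0).rand(60, 2)
labels = (X[:, 0] > 0.5).astype(int)          # a crude two-cluster partition
print("Dunn index:", dunn_index(X, labels))
```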
Various F-measures exist for such evaluations. Try them and see which works best for your case. Here is a link to a wiki page that can help you out: http://en.wikipedia.org/wiki/F1_score. Moreover, you can check my papers on face clustering to see one way of evaluating with such measures.
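One common variant is the pair-counting F-measure; a small sketch built on scikit-learn's pair confusion matrix (assuming a version of scikit-learn that provides pair_confusion_matrix) could be:

```python
from sklearn.metrics.cluster import pair_confusion_matrix

def pairwise_f1(labels_true, labels_pred):
    C = pair_confusion_matrix(labels_true, labels_pred)   # [[TN, FP], [FN, TP]]
    tp, fp, fn = C[1, 1], C[0, 1], C[1, 0]
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(pairwise_f1([0, 0, 1, 1, 2], [0, 0, 1, 2, 2]))
```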
There are internal indexes and external indexes for evaluating clustering algorithm performance; for more information you can take a look at the following publication:
You can simply test it on the IRIS and WINE data sets. These are relatively small data sets, but challenging enough for clustering algorithms due to the significant overlap between the data belonging to different clusters.
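For example, a minimal sketch with scikit-learn's built-in copies of these data sets (WINE usually benefits from feature scaling because the attributes are on very different ranges):

```python
from sklearn.datasets import load_iris, load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

for name, loader in (("IRIS", load_iris), ("WINE", load_wine)):
    X, y = loader(return_X_y=True)
    X = StandardScaler().fit_transform(X)               # put features on one scale
    labels = KMeans(n_clusters=len(set(y)), n_init=10, random_state=0).fit_predict(X)
    print(name, "adjusted Rand index:", round(adjusted_rand_score(y, labels), 3))
```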
There are different ways of assessing the performance of a clustering algorithm; they depend on what we mean by performance. That is, we can be interested in evaluating the performance of a clustering algorithm on a specific dataset, or in evaluating its average performance (i.e., independent of specific data). In the first case, as various colleagues have already suggested, validation indexes can be useful. These indexes are frequently oriented towards measuring the fulfilment of a particular property, e.g., the separability between clusters, whether clusters are compact, among others, or a combination of them; therefore, they offer valuable information about specific aspects of the performance of the algorithm on THAT dataset.
Now, if the results of an algorithm on some datasets are not as good as you were expecting, this does not necessarily imply that the algorithm is inappropriate or incapable of performing well, even on those datasets where it initially failed. Classification algorithms have several pieces that must be carefully set (i.e., parameters). Recently, various articles studying clustering algorithms in terms of generic properties and concepts have been published: for example, the work of Kleinberg on the Impossibility Theorem, the articles of Ackermann and Ben-David, the paper of Carlsson and Memoli, an article of mine entitled "An indication of unification for different clustering approaches", and some due to Christian Hennig. In all these articles you can find generic concepts for assessing clustering algorithms from a general perspective, as well as some very instructive discussion about the philosophies of clustering methods.
I particularly think that the parameters of an algorithm are its soul. Most classical algorithms (e.g., k-means, linkage methods) are capable of yielding ANY possible partition of ANY dataset, so the partition that any of them returns on a specific instance depends on the values assigned to its parameters.
To test the stability of your clustering solution, you can try bootstrapping your data. By randomly drawing with replacement and repeating the cluster analysis, you change your data set (slightly). Objects (or variables) that remain in the same cluster most of the time during bootstrapped clustering are considered trustworthy, since small deviations from the original dataset do not lead to a change in cluster membership. On the other hand, objects/variables that switch a lot are not trustworthy, since their cluster membership can easily change when the data set is changed (slightly).
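A rough sketch of this idea, assuming scikit-learn and an algorithm (here KMeans) that can assign the original points after being fitted on a resample; it summarises stability globally via the adjusted Rand index across resamples rather than per object, but the per-object version follows the same pattern:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=1)
rng = np.random.RandomState(1)

reference = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
agreements = []
for _ in range(50):
    idx = rng.choice(len(X), size=len(X), replace=True)       # draw with replacement
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X[idx])
    labels = km.predict(X)                                     # assign original points
    agreements.append(adjusted_rand_score(reference, labels))

# High, consistent agreement -> stable clustering; large spread -> untrustworthy.
print("mean ARI over bootstraps:", np.mean(agreements).round(3),
      "+/-", np.std(agreements).round(3))
```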
Following on from Jyrko's suggestions: have you thought of using evolution to optimise the parameters? Evolutionary Algorithms have been designed with this purpose in mind, but other algorithms will work too.
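A toy sketch of the idea, evolving just one parameter (the number of clusters k for KMeans) with the silhouette score as fitness; this is a minimal (1+1)-style strategy rather than a full evolutionary algorithm:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=6, random_state=3)
rng = np.random.RandomState(3)

def fitness(k):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return silhouette_score(X, labels)

k, best = 2, fitness(2)
for _ in range(30):
    child = int(np.clip(k + rng.choice([-2, -1, 1, 2]), 2, 15))   # mutate the parameter
    f = fitness(child)
    if f > best:                        # keep the child only if it is fitter
        k, best = child, f

print("evolved k:", k, "silhouette:", round(best, 3))
```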
Three types of evaluation metrics are found: internal measures, external measures, and relative measures. The choice of measure depends on your situation.
The problem doesn't lie in choosing the measurement: accuracy and the adjusted Rand index are sufficient and do not make any assumptions about the data structure. The problem lies in finding the right benchmark system of data sets. I solved this problem in my book (download at no charge): http://www.springer.com/us/book/9783658205393
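For reference, a small sketch of both measures: clustering "accuracy" requires matching the predicted cluster labels to the true classes first (Hungarian assignment), whereas the adjusted Rand index needs no matching at all.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score

def clustering_accuracy(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    classes, clusters = np.unique(y_true), np.unique(y_pred)
    # Contingency table: rows = true classes, columns = predicted clusters.
    w = np.array([[np.sum((y_true == c) & (y_pred == k)) for k in clusters]
                  for c in classes])
    row, col = linear_sum_assignment(-w)         # best one-to-one matching
    return w[row, col].sum() / len(y_true)

y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [2, 2, 2, 0, 0, 1, 1, 1, 1]             # similar partition, permuted labels
print("accuracy:", clustering_accuracy(y_true, y_pred))
print("adjusted Rand index:", adjusted_rand_score(y_true, y_pred))
```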
Mainly two types of performance evaluation are used for clustering techniques. For external criteria, some well-known approaches are MoJo, EdgeSim, EdgeMoJo, MeCl, Precision, and Recall. For internal criteria, the Cophenetic Distance, the Silhouette Index, the RS index, the Compactness Metric, and the Dunn Index are very popular.
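As a brief illustration of one of the internal criteria, the cophenetic correlation for a hierarchical clustering can be computed with SciPy; it measures how faithfully the dendrogram's cophenetic distances preserve the original pairwise distances.

```python
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)
d = pdist(X)                               # original pairwise distances
Z = linkage(X, method="average")           # hierarchical clustering (dendrogram)
c, coph_dists = cophenet(Z, d)             # correlation and cophenetic distances
print("cophenetic correlation:", round(c, 3))
```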