** Given:

a) Ground-truth clusters for a dataset,

b) Clusters obtained by applying a clustering algorithm (e.g. DBSCAN) to the same data after preprocessing.

** Issue:

How can I evaluate the performance of a clustering technique on a specific dataset?

** NMI (Normalized Mutual Information) is a popular external measure for this. But in cases like the one below, it gives poor results:

E.g:

from sklearn.metrics import normalized_mutual_info_score

Ground_truth = [1, 1, 1, 1, 1]

DBSCAN_Clusters = [1, 1, 1, 1, 2]

nmi = normalized_mutual_info_score(Ground_truth, DBSCAN_Clusters)  # Python (scikit-learn)

** The value of the variable "nmi" is approximately equal to zero in this case.

** Note that nmi = 0 in spite of the fact that DBSCAN (the clustering algorithm) misassigned only one of the five points, while the other four match the ground truth.

** This typically happens when the ground truth contains only one cluster.
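
To make the example easier to inspect, here is a minimal sketch that computes the quantities NMI is built from (the entropy of each labeling and their mutual information) using only the Python standard library. It is not scikit-learn's implementation. The helper names (entropy, mutual_information, nmi_arithmetic) are made up for illustration, and normalizing by the arithmetic mean of the two entropies is an assumption; other normalizations exist.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return sum(-(c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two label assignments."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi_arithmetic(a, b):
    """NMI normalized by the arithmetic mean of the entropies (assumed choice)."""
    denom = (entropy(a) + entropy(b)) / 2
    if denom == 0:
        return 1.0  # both labelings put every point in a single cluster
    return mutual_information(a, b) / denom


ground_truth = [1, 1, 1, 1, 1]
dbscan_clusters = [1, 1, 1, 1, 2]

h_true = entropy(ground_truth)      # entropy of the ground-truth labeling
h_pred = entropy(dbscan_clusters)   # entropy of the DBSCAN labeling
mi = mutual_information(ground_truth, dbscan_clusters)
nmi = nmi_arithmetic(ground_truth, dbscan_clusters)

print("H(ground truth):", h_true)
print("H(DBSCAN):      ", h_pred)
print("MI:             ", mi)
print("NMI:            ", nmi)
```

Under these assumptions, the entropy of the single-cluster ground truth and the mutual information both come out as zero, so the normalized score is zero no matter how many of the five points agree.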

** Questions :

1) Why does this happen?

2) Does it mean that the clustering algorithm is performing badly?

3) Should I use other measures along with NMI? If so, which ones, and what are they for?

Thanks.
