** Given:
a) Ground-truth clusters for a dataset,
b) Clusters obtained by applying a clustering algorithm (e.g. DBSCAN) to the same data after preprocessing.
** Issue:
How do I evaluate the performance of a clustering technique on a specific dataset?
** NMI (Normalized Mutual Information) is a popular external measure for this. But in cases like the one below, it gives misleading results:
E.g.:
from sklearn.metrics import normalized_mutual_info_score  # scikit-learn
Ground_truth = [1, 1, 1, 1, 1]
DBSCAN_Clusters = [1, 1, 1, 1, 2]
nmi = normalized_mutual_info_score(Ground_truth, DBSCAN_Clusters)  # Python code
** The value of the variable "nmi" is approximately zero in this case.
** Note that nmi = 0 in spite of the fact that DBSCAN (the clustering algorithm) has misassigned only one cluster member, while the other four match the ground truth.
** This typically happens when the ground truth contains only one cluster.
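To make the numbers concrete, the quantities behind NMI can be worked out by hand for this example. The following is only a sketch in pure Python and does not call scikit-learn: the helper names `entropy` and `mutual_information` are illustrative, and the arithmetic-mean normalisation is an assumption (other normalisers exist). Because the ground truth carries a single label, its entropy is zero, and the mutual information cannot exceed that, so it is zero whatever the predicted labels are.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two label assignments."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # co-occurrence counts of (label in a, label in b)
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


ground_truth = [1, 1, 1, 1, 1]
dbscan_clusters = [1, 1, 1, 1, 2]

h_true = entropy(ground_truth)      # single label, so no uncertainty
h_pred = entropy(dbscan_clusters)   # two labels, so positive
mi = mutual_information(ground_truth, dbscan_clusters)

# Assumption: normalise by the arithmetic mean of the two entropies.
normalizer = (h_true + h_pred) / 2
# Convention assumed here: two single-cluster labelings count as a perfect match.
nmi = mi / normalizer if normalizer > 0 else 1.0

print("H(ground truth) =", h_true)
print("H(DBSCAN)       =", h_pred)
print("MI              =", mi)
print("NMI             =", nmi)
```

In this sketch the numerator is zero while the normaliser is positive, so the ratio is zero even though only one point differs.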
** Questions :
1) Why does this happen?
2) Does it mean that the clustering algorithm is performing badly?
3) Should I use other measures along with NMI? If so, which ones, and what are they for?
Thanks.