Is NMI (Normalized Mutual Information) alone a good measure for evaluating the quality of a clustering algorithm?

07 December 2018

** Given:

a) Ground-truth clusters for a dataset,

b) Clusters obtained by applying a clustering algorithm (e.g. DBSCAN) to the dataset after preprocessing it.

** Issue:

How can I evaluate the performance of the clustering technique when it is applied to a specific dataset?

** NMI (Normalized Mutual Information) is a popular external measure for this. But in cases like the one below, it gives misleading results:

E.g.:

from sklearn.metrics import normalized_mutual_info_score  # Python (scikit-learn)

Ground_truth = [1, 1, 1, 1, 1]

DBSCAN_Clusters = [1, 1, 1, 1, 2]

nmi = normalized_mutual_info_score(Ground_truth, DBSCAN_Clusters)

** The value of the variable "nmi" is approximately zero in this case.

** Note that nmi = 0 in spite of the fact that DBSCAN (the clustering algorithm) misassigned only one of the five points, while the other four match the ground truth.

** This typically happens when the ground truth contains only one cluster.
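The zero score can be reproduced by hand, which may help with the "why" question. The sketch below uses only the standard library; the helper names (`entropy`, `mutual_information`, `nmi_arithmetic`) are illustrative and not part of any library, and the arithmetic-mean normalization is an assumption (other NMI variants divide by the minimum, maximum, or geometric mean of the two entropies). The key point is that a single-cluster ground truth has zero entropy, and mutual information can never exceed the smaller of the two entropies, so the numerator of NMI is zero whichever normalization is used.

```python
from collections import Counter
from math import log

def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())

def mutual_information(truth, pred):
    """Mutual information (in nats) between two label assignments."""
    n = len(truth)
    joint = Counter(zip(truth, pred))   # counts of (truth label, predicted label)
    count_t = Counter(truth)
    count_p = Counter(pred)
    mi = 0.0
    for (t, p), c in joint.items():
        p_joint = c / n
        mi += p_joint * log(p_joint / ((count_t[t] / n) * (count_p[p] / n)))
    return mi

def nmi_arithmetic(truth, pred):
    """MI divided by the arithmetic mean of the two entropies (assumed variant)."""
    normalizer = (entropy(truth) + entropy(pred)) / 2
    if normalizer == 0:
        return 1.0  # both labelings are a single cluster, i.e. identical partitions
    return mutual_information(truth, pred) / normalizer

ground_truth = [1, 1, 1, 1, 1]
dbscan_clusters = [1, 1, 1, 1, 2]

h_truth = entropy(ground_truth)        # zero: one cluster carries no information
h_pred = entropy(dbscan_clusters)      # positive: the prediction has two clusters
mi = mutual_information(ground_truth, dbscan_clusters)
nmi_by_hand = nmi_arithmetic(ground_truth, dbscan_clusters)

print(h_truth, h_pred, mi, nmi_by_hand)
```

Since the ground-truth label is the same for every point, knowing the predicted cluster tells you nothing new about it, so the mutual information is zero no matter how few points were misassigned.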

** Questions:

1) Why does this happen?

2) Does it mean that the clustering algorithm is performing badly?

3) Should I use other measures along with NMI? If so, which ones, and what are they for?

Thanks.

Desmond Bala Bisandu

It is, but it depends on what improvement you are claiming over the baseline paper, or on whether you are proposing an entirely novel method.

Abhilash K Pai

Desmond Bala Bisandu: Thanks for the answer.

But I was trying to ask a general question.

Let me put it in a simple way:

"Why does the NMI value become close to zero when the ground truth has only one cluster and my clustering result does not exactly match the ground truth?

In such cases, what are the other options for evaluating my clustering technique?"

Yusra Al-Najjar

I think the following site could be useful for you:

https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html

Hoping it answers your question.

Soumaya Louhichi

I think you can add other measures, such as an error measure or the F-measure, so that you can discuss the obtained results more fully. If you are working with density-based clustering, you can also add internal evaluation metrics such as the CDbw index and the DBCV index.

Hoping this helps with your question.
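As a rough illustration of the F-measure suggestion, here is a minimal pair-counting sketch applied to the example from the question. It assumes the pairwise definition of precision and recall (a pair of points counts as "positive" when both points share a cluster), and it also reports the Rand index, which is not mentioned above but comes from the same counts. The function names `pair_counts` and `pairwise_scores` are made up for this example.

```python
from itertools import combinations

def pair_counts(truth, pred):
    """Classify every pair of points by whether it is co-clustered in truth and pred."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            tp += 1   # together in both
        elif same_pred:
            fp += 1   # together in pred only
        elif same_truth:
            fn += 1   # together in truth only
        else:
            tn += 1   # apart in both
    return tp, fp, fn, tn

def pairwise_scores(truth, pred):
    """Pairwise precision, recall, F1 and Rand index (needs at least two points)."""
    tp, fp, fn, tn = pair_counts(truth, pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    rand_index = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "rand_index": rand_index}

ground_truth = [1, 1, 1, 1, 1]
dbscan_clusters = [1, 1, 1, 1, 2]

scores = pairwise_scores(ground_truth, dbscan_clusters)
print(scores)
```

On this example, 6 of the 10 point pairs are kept together, so pairwise recall is 0.6, precision is 1.0, and the F-measure comes out at about 0.75. Unlike NMI, these scores reflect that most of the single ground-truth cluster was recovered.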

Thanks @Yusra and @Soumaya.

I hope I can find what I am looking for by combining both of your answers.

Sura Alrashid

Read this https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html
