How to evaluate the performance of a hierarchical classifier? Is there an adopted standard for calculating accuracy?

Thank you, Antonios, for your answer.

Yes, this is a straightforward way to evaluate a classifier when the classes have a flat (linear) topology. But for categories with a tree topology, I think we should not rate all errors the same way.

Let's take the class Sport_news, which includes two sub-classes: local_sport_news and international_sport_news. If a test sample labelled international_sport_news is classified as local_sport_news, that error should not be rated the same way as classifying it as social_news or political_news.

I hope my question is clearer now.
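To make the point concrete, here is a minimal sketch of one possible way to rate errors differently: count the edges between the true and the predicted class in the tree. The common root "news" and the `PARENT` mapping are assumptions added for illustration; only the class names come from the example above.

```python
# Hypothetical class tree (child -> parent). The root "news" is an
# assumption added so that all top-level classes share an ancestor.
PARENT = {
    "Sport_news": "news",
    "social_news": "news",
    "political_news": "news",
    "local_sport_news": "Sport_news",
    "international_sport_news": "Sport_news",
}


def path_to_root(label):
    """Return the list [label, parent, ..., root]."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def tree_distance(true_label, predicted_label):
    """Number of edges between two classes in the tree."""
    true_path = path_to_root(true_label)
    pred_path = path_to_root(predicted_label)
    # Depth of every node on the predicted path, measured from the prediction.
    pred_depth = {node: i for i, node in enumerate(pred_path)}
    # The first node of the true path that also lies on the predicted path
    # is the lowest common ancestor.
    for i, node in enumerate(true_path):
        if node in pred_depth:
            return i + pred_depth[node]
    raise ValueError("labels do not share a root")


print(tree_distance("international_sport_news", "local_sport_news"))  # sibling
print(tree_distance("international_sport_news", "social_news"))       # other branch
```

With this kind of cost, confusing the two sport sub-classes is a smaller error than sending the sample to a different top-level class, which is the behaviour I am asking about.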

Graham W Pulford

Hello

Are you perhaps talking about unsupervised clustering? In that case you don't have training and test/evaluation data sets, since there is no "truth". There are many different approaches to constructing a cost function that penalises poor clustering but prevents clusters from becoming too big. In a simulation, however, you still know the truth, so you can use it to evaluate performance even when your classifier is operating in an unsupervised learning mode. Sorry if this is not what you are looking for.

Mihir Shekhar

If the number of classes is not very large,

find the error for the top-level classes, and then hierarchically find the error in the sub-level classes of each top-level class. All the errors can then be merged. This can be done by setting some alpha threshold to decay the error according to the level of the classes, or to increase it. It depends on your specific problem and on which errors you want to be rectified.

If it's very large,

use some structured approach to create a tree for the errors, and then apply it recursively from top to bottom, or in reverse, to get the error.

If you don't know the number of classes,

use some hierarchical clustering technique, or some other technique, to identify the similarity between classes and get a tree structure as above, and then proceed.
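For the first case (a small number of classes), the level-by-level idea could be sketched roughly as follows. This is only one possible reading: each label is assumed to be a tuple giving the path from the top-level class down to the leaf, and the per-level errors are merged with weights `alpha ** level`. The function names and the normalisation are assumptions for illustration, not a standard.

```python
def level_errors(true_paths, pred_paths):
    """Error rate at each level; a sample is wrong at a level if its
    predicted path differs from the true path anywhere down to that level."""
    depth = max(len(p) for p in true_paths)
    errors = []
    for level in range(depth):
        wrong = sum(
            t[: level + 1] != p[: level + 1]
            for t, p in zip(true_paths, pred_paths)
        )
        errors.append(wrong / len(true_paths))
    return errors


def merged_error(true_paths, pred_paths, alpha=0.5):
    """Weighted average of the per-level errors.
    alpha < 1 decays the weight of deeper levels, alpha > 1 increases it."""
    errors = level_errors(true_paths, pred_paths)
    weights = [alpha ** level for level in range(len(errors))]
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)


# Toy data using the class names from the question.
true_paths = [
    ("Sport_news", "international_sport_news"),
    ("Sport_news", "local_sport_news"),
    ("social_news",),
    ("political_news",),
]
pred_paths = [
    ("Sport_news", "local_sport_news"),   # right branch, wrong leaf
    ("Sport_news", "local_sport_news"),   # correct
    ("social_news",),                     # correct
    ("Sport_news", "local_sport_news"),   # wrong at the top level
]

print(level_errors(true_paths, pred_paths))
print(merged_error(true_paths, pred_paths, alpha=0.5))
```

Whether alpha should be below or above 1 depends, as said, on which errors matter most for your problem.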

Attia Nehar

Thank you all,

@Graham W Pulford, actually we work on supervised learning, and the structure of the classes is well defined.

@Mihir Shekhar, how do you merge the errors? Is there a standard way?

Thank you again.

Juan Carlos Gomez

You can check the papers below. There are several ways to measure classification performance in a hierarchy of classes: you can use the hierarchical versions of precision, recall and F1, the example-based and label-based precision, recall and F1, the parent accuracy, (H)Delta-loss, multi-label graph-induced error, etc. Each measure considers different properties of the data and of the structure. The last link leads to another link where you can download a package with several measures implemented.

Kosmopoulos, A., Partalas, I., Gaussier, E., Paliouras, G., & Androutsopoulos, I. (2015), "Evaluation measures for hierarchical classification: a unified view and novel approaches", Data Mining and Knowledge Discovery, 29(3), 820-865.

Gomez, J.C., & Moens, M.-F. (2014), "A Survey of Automated Hierarchical Classification of Patents", Professional Search in the Modern World, Lecture Notes in Computer Science 8830, 215–249.

Silla Jr., C.N., Freitas, A.A. (2011), "A survey of hierarchical classification across different application domains", Data Mining and Knowledge Discovery 22(1-2), 31–72.

Tsoumakas, G., Katakis, I., Vlahavas, I. (2010), "Mining multi-label data", Data Mining and Knowledge Discovery Handbook, pp. 667–685.

Sokolova, M., Lapalme, G. (2009), "A systematic analysis of performance measures for classification tasks", Information Processing and Management 45(4), 427–437.

http://lshtc.iit.demokritos.gr/LSHTC4_EVALUATION
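As a rough illustration of the hierarchical precision, recall and F1 mentioned above, here is a minimal sketch of the common set-based formulation: each true and predicted class is extended with its ancestors, and the overlaps are micro-averaged over all samples. The `PARENT` mapping is a hypothetical tree built from the class names in the question, and the root is left out of the sets. This is not the code from the LSHTC package.

```python
# Hypothetical class tree (child -> parent); top-level classes have no entry.
PARENT = {
    "local_sport_news": "Sport_news",
    "international_sport_news": "Sport_news",
}


def with_ancestors(label):
    """Return the label together with all of its ancestors."""
    nodes = {label}
    while label in PARENT:
        label = PARENT[label]
        nodes.add(label)
    return nodes


def hierarchical_prf(true_labels, pred_labels):
    """Micro-averaged hierarchical precision, recall and F1."""
    overlap = pred_total = true_total = 0
    for t, p in zip(true_labels, pred_labels):
        t_set, p_set = with_ancestors(t), with_ancestors(p)
        overlap += len(t_set & p_set)
        pred_total += len(p_set)
        true_total += len(t_set)
    hp = overlap / pred_total
    hr = overlap / true_total
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf


true_labels = ["international_sport_news", "international_sport_news"]
pred_labels = ["local_sport_news", "social_news"]
print(hierarchical_prf(true_labels, pred_labels))
```

In this toy case the first prediction still gets partial credit for the shared ancestor Sport_news, while the second gets none, so the two errors are not rated the same way.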
