We want to build a classifier on hierarchical data, i.e., we have many classes, each with many sub-classes. How should a wrong decision be rated when an item is classified into a sibling class, as opposed to a completely different class?
Yes, this is a straightforward way to evaluate a classifier for classes with a linear (flat) topology. But for categories with a tree topology, I think we should not rate all errors the same way.
Let's take the class Sport_news, which includes two sub-classes: local_sport_news and international_sport_news. If a test item labelled international_sport_news is classified as local_sport_news, that error should not be rated the same way as classifying it as social_news or political_news.
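To make the idea concrete, here is a minimal sketch of one possible cost: the number of edges between the true and the predicted class in the tree. The root node "news" and the exact shape of the hierarchy are assumptions added for illustration, and tree distance is only one of several reasonable choices.

```python
# Hypothetical toy hierarchy: child -> parent (None marks the root).
# The root "news" is an assumption added for this example.
PARENT = {
    "news": None,
    "sport_news": "news",
    "social_news": "news",
    "political_news": "news",
    "local_sport_news": "sport_news",
    "international_sport_news": "sport_news",
}


def path_to_root(label):
    """Return the list [label, parent, ..., root]."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return path


def tree_distance(true_label, predicted_label):
    """Number of edges on the path between the two labels in the tree."""
    true_path = path_to_root(true_label)
    pred_path = path_to_root(predicted_label)
    # Depth of every node on the predicted label's path, counted from that label.
    pred_depth = {node: i for i, node in enumerate(pred_path)}
    # The first shared node on the way up is the lowest common ancestor.
    for i, node in enumerate(true_path):
        if node in pred_depth:
            return i + pred_depth[node]
    raise ValueError("labels are not in the same tree")


print(tree_distance("international_sport_news", "local_sport_news"))  # 2
print(tree_distance("international_sport_news", "political_news"))    # 3
```

With this cost, confusing the two sport sub-classes is penalised less than sending a sport item to political_news, and a correct prediction costs 0.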
You are perhaps talking about unsupervised clustering? In that case you don't have training and test/evaluation data sets, since there is no "truth". There are then many different approaches to constructing a cost function that penalises poor clustering but prevents clusters from becoming too big. In your simulation, you still know the truth, so you can use it to evaluate performance even when your classifier is operating in an unsupervised learning mode. Sorry if this is not what you are looking for.
Find the error for the top-level classes, and then hierarchically find the error in the sub-level classes of each top-level class. All the errors can then be merged. This can be done by setting an alpha factor that decays or increases the error with the level of the classes. It depends on your specific problem and on which errors you most want to correct.
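One possible reading of this suggestion, as a rough sketch: compute an error rate per level of the hierarchy and merge the rates with a weight of alpha**k at depth k. The function names, the path representation of labels, and the weighting scheme are assumptions, not a standard definition; all label paths are assumed to have the same depth.

```python
def level_errors(true_paths, pred_paths):
    """Error rate at each depth: share of samples whose label at that depth is wrong.

    Each label is a tuple (top_level_class, sub_class, ...); all tuples are
    assumed to have the same length.
    """
    depth = len(true_paths[0])
    n = len(true_paths)
    return [
        sum(t[k] != p[k] for t, p in zip(true_paths, pred_paths)) / n
        for k in range(depth)
    ]


def merged_error(true_paths, pred_paths, alpha=0.5):
    """Weighted average of the per-level errors, with weight alpha**k at depth k.

    alpha < 1 makes deeper (sub-class) errors count less; alpha > 1 makes
    them count more.
    """
    errors = level_errors(true_paths, pred_paths)
    weights = [alpha ** k for k in range(len(errors))]
    return sum(w * e for w, e in zip(weights, errors)) / sum(weights)


# Made-up labels for illustration.
true_paths = [("sport", "international"), ("sport", "international"),
              ("sport", "local"), ("politics", "national")]
pred_paths = [("sport", "international"), ("sport", "local"),
              ("politics", "national"), ("politics", "national")]

print(level_errors(true_paths, pred_paths))             # [0.25, 0.5]
print(merged_error(true_paths, pred_paths, alpha=0.5))  # about 0.333
```

Choosing alpha below 1 expresses that a top-level mistake matters more than a sub-class mistake; whether that is the right trade-off depends on the problem.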
If the hierarchy is very large,
use a structured approach: build a tree of errors and then apply the computation recursively, from top to bottom or in reverse, to get the overall error.
If you don't know the number of classes,
use a hierarchical clustering technique, or some other technique, to identify the similarity between classes, obtain a tree structure as above, and then proceed.
You can check the papers below. There are several ways to measure classification performance in a hierarchy of classes: you can use the hierarchical versions of precision, recall and F1, the example-based and label-based precision, recall and F1, the parent accuracy, the (H)Delta-loss, the multi-label graph-induced error, etc. Each measure considers different properties of the data and the structure. The last link provides you with another link to download a package with several measures implemented.
Kosmopoulos, A., Partalas, I., Gaussier, E., Paliouras, G., & Androutsopoulos, I. (2015), "Evaluation measures for hierarchical classification: a unified view and novel approaches", Data Mining and Knowledge Discovery, 29(3), 820-865.
Gomez, J.C., Moens, M.-F. (2014), "A Survey of Automated Hierarchical Classification of Patents", Professional Search in the Modern World, Lecture Notes in Computer Science 8830, 215–249.
Silla Jr., C.N., Freitas, A.A. (2011), "A survey of hierarchical classification across different application domains", Data Mining and Knowledge Discovery 22(1-2), 31–72.
Tsoumakas, G., Katakis, I., Vlahavas, I. (2010), "Mining multi-label data", Data Mining and Knowledge Discovery Handbook, pp. 667–685.
Sokolova, M., Lapalme, G. (2009), "A systematic analysis of performance measures for classification tasks", Information Processing and Management 45(4), 427–437.
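As an illustration of the hierarchical precision, recall and F1 mentioned above, here is a minimal sketch of the commonly used set-based formulation: each true and predicted label is extended with its ancestors (excluding the root), and precision and recall are computed on the overlap of these sets, summed over all samples. The toy hierarchy and the root "news" are assumptions for the example; the cited papers discuss variants that may differ in detail.

```python
# Hypothetical toy hierarchy: child -> parent (None marks the root).
PARENT = {
    "news": None,
    "sport_news": "news",
    "social_news": "news",
    "political_news": "news",
    "local_sport_news": "sport_news",
    "international_sport_news": "sport_news",
}


def with_ancestors(label):
    """Return the label together with all of its ancestors, excluding the root."""
    nodes = set()
    while PARENT[label] is not None:
        nodes.add(label)
        label = PARENT[label]
    return nodes


def hierarchical_prf(true_labels, pred_labels):
    """Hierarchical precision, recall and F1 over a list of single-label samples."""
    overlap = pred_total = true_total = 0
    for t, p in zip(true_labels, pred_labels):
        t_set, p_set = with_ancestors(t), with_ancestors(p)
        overlap += len(t_set & p_set)
        pred_total += len(p_set)
        true_total += len(t_set)
    precision = overlap / pred_total if pred_total else 0.0
    recall = overlap / true_total if true_total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Made-up predictions: one sibling confusion, one distant confusion.
true_labels = ["international_sport_news", "international_sport_news"]
pred_labels = ["local_sport_news", "political_news"]
hp, hr, hf = hierarchical_prf(true_labels, pred_labels)
print(hp, hr, hf)  # about 0.333, 0.25, 0.286
```

The sibling confusion still earns partial credit through the shared parent sport_news, while the political_news prediction earns none.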