16 September 2020

I would like to calculate the Inter-Annotator Agreement (IAA) for a dataset containing entities and relations. The dataset has been labelled by several annotators, and the labels themselves occur with different frequencies. I am getting a bit lost among the possible ways to calculate the agreement.

First of all, Scott's Pi (or Siegel & Castellan's Kappa) appears to handle annotator bias better than Cohen's Kappa (see for instance here: https://www.aclweb.org/anthology/J04-1005.pdf). In more recent related work I have seen the recommendation to use the F-score instead when dealing with entities (see for instance here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3540456/pdf/amia_2012_symp_0144.pdf). To calculate the agreement for relations, I would probably also apply the F-score.
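To make my setup concrete, here is a minimal sketch of how I would compute entity-level F-score between two annotators under exact span match. The (start, end, label) tuple format is just my own assumption for illustration, not a standard:

```python
# Entity-level agreement as exact-match F1 between two annotators.
# Entities are assumed to be (start, end, label) tuples.

def f1_between(ann_a, ann_b):
    """Exact-match F1 between two annotators' entity sets.

    F1 is symmetric here: swapping the annotators swaps precision
    and recall but leaves F1 unchanged, so no "gold" side is needed.
    """
    set_a, set_b = set(ann_a), set(ann_b)
    tp = len(set_a & set_b)        # entities both annotators marked
    if tp == 0:
        return 0.0
    precision = tp / len(set_b)    # treat A as reference, B as response
    recall = tp / len(set_a)
    return 2 * precision * recall / (precision + recall)

ann_1 = [(0, 5, "DRUG"), (12, 20, "DISEASE")]
ann_2 = [(0, 5, "DRUG"), (25, 30, "DISEASE")]
print(f1_between(ann_1, ann_2))  # 0.5
```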

If you disagree, you are welcome to give me suggestions here. However, the questions I actually have are the following:

1) If I have more than two annotators, how would I ideally compute the F-score? Should I do it pairwise? And if the labels are unbalanced, should I compute a micro F-score for each annotator pair and then a macro average over all pairs (a sketch of what I mean follows below)?
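This is the pairwise scheme I have in mind, reusing the f1_between() helper from the sketch above; the overall design is my own assumption, not an established standard:

```python
# Pairwise scheme for >2 annotators: micro F1 per annotator pair
# (all labels pooled together), then a macro average over all pairs.
# Assumes f1_between() from the previous sketch is in scope.
from itertools import combinations

def pairwise_macro_f1(annotations):
    """annotations: dict mapping annotator name -> list of entity tuples."""
    pair_scores = []
    for a, b in combinations(annotations, 2):
        # micro step: exact-match F1 for this pair, pooling all labels
        pair_scores.append(f1_between(annotations[a], annotations[b]))
    # macro step: unweighted mean over annotator pairs
    return sum(pair_scores) / len(pair_scores)

anns = {
    "ann1": [(0, 5, "DRUG"), (12, 20, "DISEASE")],
    "ann2": [(0, 5, "DRUG"), (25, 30, "DISEASE")],
    "ann3": [(0, 5, "DRUG"), (12, 20, "DISEASE")],
}
print(pairwise_macro_f1(anns))  # mean of the three pairwise F1 scores
```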

2) The annotated relations suffer from falsely annotated entities: if an annotator did not label the argument entities in the first place, that person could not annotate the relation at all, so simply calculating the F-score would "punish" the annotator twice. Would it therefore make sense to consider only those relations where both annotators marked the same argument entities (see the sketch below)? Any suggestions?
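Here is a sketch of the restriction I am considering: only score relations whose head and tail entities exist in both annotators' entity sets, so a missing entity is not punished a second time at the relation level. The relation format (head_entity, tail_entity, relation_label) is again my own assumption:

```python
# Relation-level F1 restricted to relations whose argument entities
# were annotated by both annotators.

def comparable_relations(rels, shared_entities):
    """Keep only relations whose head and tail both annotators marked."""
    return {(h, t, r) for (h, t, r) in rels
            if h in shared_entities and t in shared_entities}

def relation_f1(rels_a, rels_b, ents_a, ents_b):
    shared = set(ents_a) & set(ents_b)   # entities both annotators marked
    a = comparable_relations(set(rels_a), shared)
    b = comparable_relations(set(rels_b), shared)
    tp = len(a & b)
    if not a or not b or tp == 0:
        return 0.0
    precision, recall = tp / len(b), tp / len(a)
    return 2 * precision * recall / (precision + recall)

e1, e2 = (0, 5, "DRUG"), (12, 20, "DISEASE")
rels_1 = [(e1, e2, "TREATS")]
rels_2 = [(e1, e2, "TREATS")]
print(relation_f1(rels_1, rels_2, [e1, e2], [e1, e2]))  # 1.0
```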

Thank you!
