I would like to benchmark several semantic annotation systems. Is there a gold standard that consists of an ontology and a corpus of text annotated with it?
Here is something else you may want to look at regarding the methodology and some resources: "Benchmarking infrastructure for mutation text mining", http://www.jbiomedsem.com/content/5/1/11