I want to compare several plant species quantitatively with respect to how much is known about the functions of the known loci in their genomes.
Counting the tags assigned to each locus in functional annotation or protein databases (e.g. Pfam, PANTHER, KOG, GO BP) seems like it might simply reflect each species' evolutionary relationship to Arabidopsis, the species in which most of what we know about plant protein function was determined. Are there any publicly available datasets, comparable across species, that I could use to approximate the sum of all functional work done in each plant? How could I analyze them to roughly answer the question: "what proportion of the genome has been functionally characterized?"
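
To make the target quantity concrete, here is a rough sketch of the kind of calculation I have in mind. It assumes GO annotations in the standard tab-delimited GAF format and counts a locus as "characterized" only if it has at least one GO BP annotation with an experimental evidence code, which is my own working definition rather than an established one. The function name `characterized_fraction`, the locus IDs, and the toy rows are all made up for illustration.

```python
# GO evidence codes for annotations backed by direct experiments
# (as opposed to sequence similarity or electronic inference, e.g. ISS, IEA).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def characterized_fraction(gaf_lines, all_locus_ids, aspect="P"):
    """Fraction of loci with at least one experimentally supported GO annotation.

    gaf_lines     -- iterable of lines from a GAF file
    all_locus_ids -- every locus in the genome annotation (the denominator)
    aspect        -- GO aspect: "P" (process), "F" (function), "C" (component)
    """
    loci = set(all_locus_ids)
    supported = set()
    for line in gaf_lines:
        if not line.strip() or line.startswith("!"):  # blank or comment line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:  # malformed row
            continue
        # GAF columns (0-based): 1 = object ID, 3 = qualifier,
        # 6 = evidence code, 8 = aspect.
        locus_id, qualifier, evidence, row_aspect = cols[1], cols[3], cols[6], cols[8]
        if "NOT" in qualifier.split("|"):  # negated annotation, skip
            continue
        if evidence in EXPERIMENTAL_CODES and row_aspect == aspect and locus_id in loci:
            supported.add(locus_id)
    return len(supported) / len(loci) if loci else 0.0


def _toy_row(locus_id, qualifier, evidence, aspect):
    # Hypothetical GAF-like row; only the columns used above are meaningful.
    return "\t".join(["DB", locus_id, "sym", qualifier, "GO:0000000",
                      "REF:0", evidence, "", aspect])


toy_loci = ["locus_A", "locus_B", "locus_C", "locus_D"]
toy_gaf = [
    "!gaf-version: 2.2",
    _toy_row("locus_A", "involved_in", "IDA", "P"),      # experimental, counts
    _toy_row("locus_B", "involved_in", "IEA", "P"),      # electronic, ignored
    _toy_row("locus_C", "enables", "IMP", "F"),          # wrong aspect, ignored
    _toy_row("locus_D", "NOT|involved_in", "IMP", "P"),  # negated, ignored
]
fraction = characterized_fraction(toy_gaf, toy_loci)
print(fraction)
```

The denominator here is the full locus list from each genome annotation, so differences in gene-model quality between species would still affect the comparison.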