Dear Almo, this is one of the stronger papers on case-based reasoning in artificial intelligence; it is evaluated in terms of percentage match against the real outcome (spectral pattern, time series, decision, etc.).
Evaluating textual CBR systems is difficult, and there is currently no single best way. Crude evaluations have used IR-based techniques such as precision, recall and F-measure, where proposed textual solutions are broken into words (or n-grams) and treated as the equivalent of documents in IR. Machine translation evaluation measures such as BLEU and NIST can also be applied, but they correlate well with human judgement only where multiple versions of the actual textual solution are available; see link below. You might also want to look into natural language understanding and text summarisation for more evaluation ideas.
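To make the IR-style idea concrete, here is a minimal sketch (with hypothetical example texts) that treats the proposed and actual textual solutions as bags of words and scores their overlap with precision, recall and F-measure:

```python
def word_overlap_scores(proposed: str, actual: str):
    """Precision, recall and F1 over the word sets of two texts.

    A crude IR-style comparison: the proposed solution plays the role
    of a retrieved document, the actual solution the relevant one.
    """
    prop = set(proposed.lower().split())
    act = set(actual.lower().split())
    if not prop or not act:
        return 0.0, 0.0, 0.0
    overlap = len(prop & act)          # words shared by both texts
    precision = overlap / len(prop)    # fraction of proposed words that are correct
    recall = overlap / len(act)        # fraction of actual words that were proposed
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return precision, recall, f1

# Hypothetical proposed vs. actual textual solution:
p, r, f = word_overlap_scores(
    "the pump failed due to seal wear",
    "pump failure caused by worn seal",
)
print(p, r, f)
```

Note this is word-set overlap only; an n-gram variant would additionally reward correct word order, which is essentially what BLEU does.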
Stefanie Brüninghaus and Kevin D. Ashley (Brüninghaus & Ashley 1998) evaluate textual CBR approaches and adopt accuracy as the evaluation measure, since in their domain the non-assignment of categories is itself relevant.
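As a minimal sketch of that idea (with hypothetical category data, not Brüninghaus and Ashley's actual dataset), accuracy can be computed over category assignments where "no category" (here `None`) is a legitimate outcome, so a correct non-assignment counts toward the score:

```python
def accuracy(predicted, actual):
    """Fraction of cases where the predicted category (possibly None,
    i.e. non-assignment) matches the actual one."""
    assert len(predicted) == len(actual)
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)

# Hypothetical assignments; None means "no category assigned".
pred = ["tort", None, "contract", None]
gold = ["tort", None, "contract", "tort"]
print(accuracy(pred, gold))  # 3 of 4 match, including one correct non-assignment
```

Unlike precision/recall over assigned categories only, this treats non-assignment symmetrically with assignment, which is why accuracy suits domains where leaving a case unlabelled is meaningful.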