I have a set of audio files, each annotated by at least 5 annotators. Each annotation gives ratings of valence, activation, and dominance (continuous measures of affect). I want to measure the inter-rater agreement, and perhaps plot it. Which metric should I use here?