I have developed a real-time observation metric that produces a proportional breakdown of class activity across a number of domains. Can anyone suggest an approach or method for validating the metric (e.g., interrater agreement among multiple observers, multiple sites, etc.)?