I'm comparing several models of metabolism. I have run 10 compounds through each model and calculated the sensitivity (TP/(TP+FN)) and selectivity (TP/(TP+FP), which is often called precision) for each compound with each model. Plotting the per-compound sensitivities and selectivities for each model shows a lot of scatter; the 95% CIs are wide and overlap substantially between models.
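To make the two definitions concrete, here is a minimal Python sketch of the per-compound calculation. The function name and the counts are made up for illustration and are not taken from my data.

```python
def sensitivity_selectivity(tp, fp, fn):
    """Return (sensitivity, selectivity) for one compound under one model.

    sensitivity = TP / (TP + FN)
    selectivity = TP / (TP + FP)
    A metric whose denominator is zero is returned as NaN.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    selectivity = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    return sensitivity, selectivity


# Hypothetical counts for a single compound and a single model.
sens, sel = sensitivity_selectivity(tp=6, fp=4, fn=2)
print(f"sensitivity = {sens:.2f}, selectivity = {sel:.2f}")
```

Repeating this for every compound and model gives one sensitivity and one selectivity value per compound per model, which are the values I plotted.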
So, I am wondering how best to check whether any model has statistically better sensitivity or selectivity than the others, and which pair(s) of models differ significantly. The number of reported metabolites varies widely between compounds (from 2 to 40 in my set), so a test on pooled counts, such as a chi-square test, would be weighted heavily towards the compounds with many known metabolites. It therefore seems better to me to test using the per-compound sensitivity and selectivity values. However, these data violate the normality assumption and (in some cases) the equal-variance assumption, so I can't do a simple ANOVA. One person suggested logistic regression, but it is unclear to me how to apply it in this sort of case.
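The weighting concern can be illustrated with a small Python sketch. The compound names and counts below are hypothetical, chosen only to mirror the 2-versus-40 range of reported metabolites; they are not my actual results.

```python
# Hypothetical (TP, FN) counts for one model on two compounds.
counts = {
    "compound_A": (1, 1),    # 2 reported metabolites, 1 found
    "compound_B": (36, 4),   # 40 reported metabolites, 36 found
}

# Pooled sensitivity: sum the counts first, then take the ratio.
# Compounds with many metabolites dominate this value.
total_tp = sum(tp for tp, fn in counts.values())
total_fn = sum(fn for tp, fn in counts.values())
pooled = total_tp / (total_tp + total_fn)

# Per-compound mean: take the ratio for each compound, then average.
# Every compound contributes equally, whatever its metabolite count.
per_compound = [tp / (tp + fn) for tp, fn in counts.values()]
mean_per_compound = sum(per_compound) / len(per_compound)

print(f"pooled sensitivity       = {pooled:.3f}")
print(f"mean per-compound value  = {mean_per_compound:.3f}")
```

In this made-up case the pooled value sits close to compound_B's sensitivity, while the per-compound mean treats both compounds equally, which is why I would prefer to test on the per-compound values.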
So, what statistical tests would you recommend? Thank you for your help!