I'm asking because the number seems to vary widely. Some authors claim to have done "SAR" with 4–10 compounds, which seems silly: with so few data points, the apparent "relationship" between descriptors/structural features and biological activity could be random and not statistically significant.

SAR studies with 20–35 compounds begin to reveal the complexity of the chemistry–bioactivity relationship and to distinguish well-correlated from poorly correlated descriptors. SAR studies with 40–50 compounds allow the data to be divided into training and test sets and models to be compared, but of course one must account for redundancy (near-duplicate compounds) between the sets. I have heard talk of SAR studies with 100+ compounds, which sounds like a great expense, but such data could be used in many interesting ways. I would love to survey the literature and compare how descriptors and sample sizes relate to predictive power across series. Some statistics would help.
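On the redundancy point, one simple check is sketched below. Split the series at random, then flag any test compound whose nearest training neighbour is very similar. The sketch assumes fingerprints are already available as sets of "on" bits and uses Tanimoto similarity with a 0.8 cut-off. The fingerprints, compound IDs, test fraction and threshold are all hypothetical placeholders, not a recommended protocol.

```python
import random


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints stored as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def split_with_redundancy_check(fingerprints, test_fraction=0.2, threshold=0.8, seed=0):
    """Randomly split compound IDs into training and test sets, then report test
    compounds whose nearest training neighbour has Tanimoto similarity >= threshold."""
    if len(fingerprints) < 2:
        raise ValueError("need at least two compounds to split")
    ids = sorted(fingerprints)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Keep at least one compound in each set.
    n_test = min(len(ids) - 1, max(1, round(test_fraction * len(ids))))
    test, train = ids[:n_test], ids[n_test:]
    redundant = {}
    for t in test:
        nearest = max(tanimoto(fingerprints[t], fingerprints[s]) for s in train)
        if nearest >= threshold:
            redundant[t] = nearest
    return train, test, redundant


if __name__ == "__main__":
    # Hypothetical fingerprints; real ones would come from a cheminformatics toolkit.
    fps = {
        "cpd1": frozenset({1, 2, 3, 4}),
        "cpd2": frozenset({1, 2, 3, 5}),
        "cpd3": frozenset({7, 8, 9}),
        "cpd4": frozenset({1, 9, 10}),
        "cpd5": frozenset({4, 5, 6}),
    }
    train, test, redundant = split_with_redundancy_check(fps)
    print("train:", train, "test:", test, "redundant:", redundant)
```

A random split is the simplest option. Flagged compounds could then be moved, or the split redone by scaffold or cluster, so the test set isn't just near-copies of the training set.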

(And it would be nice if more people included their raw data in their Supporting Information (SI)... If you have already published your results, what do you have to hide?? #RantoftheDay)
