Recently a collaborator came to me with a project that aims to validate, in his setting, several prediction models for digestive bleeding: the Rockall score, the Glasgow-Blatchford score, the AIMS65 score, the Charlson comorbidity index, and the Child-Pugh and MELD classifications. His question was how to estimate a reasonable sample size.

I took a look at the original papers, and their validation sets ranged from 197 subjects with a 41% outcome rate to 32,504 subjects with a 2% outcome rate. It was not reassuring to see that, even in the largest validation set, there were significant coefficients as low as 0.3 with an SE of 0.18.

I also looked around for some guidance and found the following comments in Steyerberg's book, in the section on sample size for validation studies: "modest sample sizes are required for model validation"; "to have reasonable power, we need at least 100 events and at least 100 non-events in external validation studies, but preferably more (>250 events). With lower numbers the uncertainty in performance measures is large." However, the text also presents several simulation results showing that power depends a lot on the coefficients and their SEs, and that even with these numbers of events it can be as low as 50%.

Taking these rules of thumb, and expecting a 4% outcome rate in the validation cohort, it would be necessary to include 2,500 to 6,250 subjects (100/0.04 and 250/0.04, respectively). That is a pretty scary range, and a very wide one, which does not help much at the planning stage.

I also found a logistic regression sample size formula (http://www.dartmouth.edu/~eugened/power-samplesize.php), but it did not help much either: it allows only two predictors at a time, and as I permuted the predictors' coefficients and SEs in the formula, the resulting N ranged from a few hundred to tens of thousands.
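To make the rule-of-thumb arithmetic and the simulation point concrete, here is a minimal Python sketch. The 4% event rate, the single standard-normal linear predictor, and the slope beta = 1 are my own illustrative assumptions, not parameters from any of the scores above; the code simply shows how the required N follows from events/event-rate, and how wide the spread of an estimated c-statistic still is at those cohort sizes.

```
# A minimal planning sketch, not a definitive calculation. It (a) translates
# the "100-250 events" rule of thumb into a cohort size at an assumed 4%
# event rate, and (b) simulates external validation cohorts to show how much
# a performance measure (the c-statistic) varies at those sizes. The logistic
# model with a single standard-normal linear predictor and beta = 1 is an
# illustrative assumption, not taken from any of the scores above.
import numpy as np

rng = np.random.default_rng(2024)

# (a) Rule of thumb: cohort size = required events / expected event rate
event_rate = 0.04
for events_needed in (100, 250):
    print(f"{events_needed} events at {event_rate:.0%} -> "
          f"N = {events_needed / event_rate:,.0f}")

def c_statistic(y, lp):
    """c-statistic (AUC) via the Mann-Whitney rank formulation."""
    ranks = np.argsort(np.argsort(lp)) + 1.0  # ordinal ranks (continuous lp, no ties)
    n1 = y.sum()
    n0 = len(y) - n1
    return (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

def simulate(n, event_rate, beta=1.0, n_sims=1000):
    """Spread of the estimated c-statistic across simulated validation cohorts."""
    intercept = np.log(event_rate / (1 - event_rate))  # rough; marginal rate drifts a bit
    aucs = []
    for _ in range(n_sims):
        lp = beta * rng.normal(size=n)  # assumed linear predictor
        y = rng.binomial(1, 1 / (1 + np.exp(-(intercept + lp))))
        if 0 < y.sum() < n:  # need both events and non-events
            aucs.append(c_statistic(y, lp))
    lo, hi = np.percentile(aucs, [2.5, 97.5])
    print(f"N={n:>5}: c-statistic 95% range {lo:.3f}-{hi:.3f} (width {hi - lo:.3f})")

for n in (500, 2500, 6250):
    simulate(n, event_rate)
```

Replacing the assumed linear predictor with the actual scores applied to pilot data from the collaborator's setting would give a planning picture closer to the simulations Steyerberg describes.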
I would very much like to be comfortable recommending a sample size of 2,500 based on Steyerberg's rule of thumb. I would like to hear from those who have some experience with this issue.