Some collaborators and I have done an analysis using ordinary (single-level) multiple logistic regression, adjusting the standard errors by multiplying them by the square root of the design effect to account for clustering.
The data set has 2710 individuals clustered in 528 units (mean cluster size just over 5), but a lot of the clusters are singletons. One value of the dichotomous outcome variable has a prevalence of about 3%. Our most important predictor is a dichotomous level-2 variable (with approximately equal numbers of level-2 units for each value). We had an ICC of only 0.095, which according to Garson (2003) would not be statistically significant, since the intercept was not statistically significant in the null model (see my other recent question). However, one reviewer objected to the use of single-level regression rather than hierarchical methods, because the main predictor was at level 2 (and the most interesting results involved cross-level interactions).
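For concreteness, here is a minimal sketch of the size of that adjustment. It assumes the design effect comes from Kish's approximation, DEFF = 1 + (m - 1) * ICC, with m taken as the mean cluster size; that formula, and the function and variable names, are illustrative assumptions rather than necessarily the exact calculation used in the analysis. The approximation also treats clusters as equal in size, which is a rough fit when many clusters are singletons.

```python
import math


def kish_design_effect(mean_cluster_size, icc):
    """Kish's approximate design effect (assumes roughly equal cluster sizes)."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


# Figures quoted in the question.
n_individuals = 2710
n_clusters = 528
icc = 0.095

mean_size = n_individuals / n_clusters        # just over 5
deff = kish_design_effect(mean_size, icc)     # variance inflation factor
se_multiplier = math.sqrt(deff)               # factor applied to each standard error
effective_n = n_individuals / deff            # approximate effective sample size

print(f"mean cluster size: {mean_size:.2f}")
print(f"design effect:     {deff:.3f}")
print(f"SE multiplier:     {se_multiplier:.3f}")
print(f"effective n:       {effective_n:.0f}")
```

Under these assumptions the design effect is roughly 1.39, so each standard error is inflated by a factor of about 1.18.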
Tossing out the singletons loses a lot of our data (and we might have to lose even more by tossing out other small clusters to get good enough parameter estimates for a slopes-and-intercepts-as-outcomes model), and given the low prevalence of our outcome variable, I'm loath to do that.
No, we can't collect more data -- this is a sample from a national database.
Any suggestions on a better way to handle the effects of clustering in this situation (one that can be carried out using SPSS and/or HLM 7, the two packages we have and are comfortable using) would be much appreciated.