In finite population survey statistics, data are stratified to reduce the overall variance of an estimate. Sometimes, however, publishing results for the individual categories is itself important, and this may work against the goal of reducing overall variance for a given overall sample size. Further, the categories chosen for publication may not be the best ones for stratification purposes. For regression model-based methods, the goal is the same. There, scatterplots and estimated regression coefficients with their standard errors can be used to sort out which data should go into which strata. Thus, regression analysis is important for model-based sampling and/or estimation by prediction.
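As a rough illustration of comparing coefficients across candidate groupings, the sketch below fits a straight line through the origin separately to each candidate group and reports the slope with its standard error. Ordinary least squares through the origin is a simplifying assumption made here; a weighted fit such as the classical ratio estimator may suit a given data set better. The function name `slope_through_origin` and the data are hypothetical.

```python
import math

def slope_through_origin(points):
    """OLS slope of y on x with no intercept, and its standard error.

    points: list of (x, y) pairs; needs at least 2 points and some x != 0.
    """
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    b = sxy / sxx
    # Residual variance with n - 1 degrees of freedom (one parameter fitted).
    rss = sum((y - b * x) ** 2 for x, y in points)
    s2 = rss / (len(points) - 1)
    return b, math.sqrt(s2 / sxx)

# Hypothetical candidate groups: (x, y) = (auxiliary value, variable of interest).
groups = {
    "A": [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)],
    "B": [(1.0, 5.0), (2.0, 10.2), (3.0, 14.8)],
}
for name, pts in groups.items():
    b, se = slope_through_origin(pts)
    print(f"group {name}: slope = {b:.3f}, SE = {se:.3f}")
```

Slopes that differ by much more than their standard errors suggest that the groups behave differently and may be better kept as separate strata; similar slopes suggest the groups could be combined.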
For design-based sampling and estimation, there is Neyman allocation, but my question is not so much about how to allocate a sample to strata already defined as about how to define the strata in the first place. There must be some categorical variable to group by, but one might do better by being imaginative about which data groupings to try.
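For reference, Neyman allocation assigns the sample to existing strata in proportion to N_h * S_h, the stratum population size times the stratum standard deviation. A minimal sketch follows, assuming the stratum sizes and standard deviations are known or estimated in advance; the function name and the numbers are hypothetical, and the results are left unrounded.

```python
def neyman_allocation(n, sizes, sds):
    """Split a total sample size n across strata in proportion to N_h * S_h.

    sizes: stratum population sizes N_h
    sds:   stratum standard deviations S_h (known or estimated)
    Returns unrounded stratum sample sizes n_h.
    """
    weights = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    total = sum(weights)
    return [n * w / total for w in weights]

# Hypothetical example: three strata, total sample size 100.
allocation = neyman_allocation(100, [500, 300, 200], [10.0, 20.0, 50.0])
print([round(n_h, 2) for n_h in allocation])
```

The smallest but most variable stratum receives the largest share of the sample, which is why the way the strata are defined matters before any allocation is done.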
Sometimes better ways to stratify become apparent after the fact, and poststratification is used.
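A minimal sketch of a poststratified estimate of a population mean follows, assuming the population count in each poststratum is known from an outside source and that every poststratum has at least one sampled unit. The function name, labels, and data are hypothetical.

```python
from collections import defaultdict

def poststratified_mean(sample, pop_counts):
    """Estimate a population mean as sum over h of (N_h / N) * (sample mean in h).

    sample:     list of (poststratum_label, y) pairs
    pop_counts: dict mapping poststratum_label -> known population count N_h
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for label, y in sample:
        sums[label] += y
        counts[label] += 1
    missing = [h for h in pop_counts if counts[h] == 0]
    if missing:
        # Empty poststrata would need to be collapsed with neighbours first.
        raise ValueError(f"no sampled units in poststrata: {missing}")
    N = sum(pop_counts.values())
    return sum(pop_counts[h] / N * sums[h] / counts[h] for h in pop_counts)

# Hypothetical data: group "a" is over-represented in the sample.
sample = [("a", 1.0), ("a", 3.0), ("b", 10.0)]
pop_counts = {"a": 100, "b": 300}
print(poststratified_mean(sample, pop_counts))  # 8.0
```

The unweighted sample mean here would be about 4.67; reweighting by the known group sizes moves the estimate to 8.0 because group "b" makes up three quarters of the population.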
What tips or methods do you propose, and what examples do you have, for stratifying, whether for design-based sampling and estimation, model-based methodology, or model-assisted design-based methods?