It would help to know how many potential explanatory variables you are starting with, but I will assume it is many, given your comments about sparsity and multiple interactions.
If by "separation problem" you mean fitting issues near the boundaries of parameter spaces, e.g. when you have nearly perfect predictions, I know of 3 main approaches for dealing with this in logistic regression: (1) regularize; (2) focus on likelihood-ratio (LR) tests; (3) use exact methods.
(1) You might look into elastic-net regularized Cox proportional hazards (PH) regression, such as the coxnet fit implemented in the R package glmnet. As you probably know, you can trick Cox PH regression, usually used for survival data, into fitting the conditional logistic regression model for matched case-control data (see e.g. the clogit function in the R package survival). In addition to the other benefits of regularization - avoiding over-fitting and improving predictions in particular - you get a solution to separation problems for free.
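To make the trick concrete, here is a minimal sketch in R. The data frame d and its columns (case, set, x1, x2) are hypothetical, and I am going from memory that recent versions of glmnet (4.1 or later, I believe) accept stratified Cox responses via stratifySurv():

```r
library(survival)
library(glmnet)

## d is a hypothetical data frame with columns:
##   case (1 = case, 0 = control), set (matched-set id), x1, x2

## Unpenalized conditional logistic regression via the Cox trick:
fit_clogit <- clogit(case ~ x1 * x2 + strata(set), data = d)

## The elastic-net version: the same model as a stratified Cox fit in glmnet.
x <- model.matrix(~ x1 * x2, data = d)[, -1]            # drop the intercept
y <- stratifySurv(Surv(rep(1, nrow(d)), d$case), d$set) # one "event" per case
cv <- cv.glmnet(x, y, family = "cox", alpha = 0.5)      # alpha mixes L1 and L2
coef(cv, s = "lambda.min")                              # penalized coefficients
```

The Surv(rep(1, n), case) construction puts every member of a matched set in the same risk set, so the Cox partial likelihood reproduces the conditional logistic likelihood.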
I don't, however, know how well it will work with very sparse data. There ARE variations of this approach specifically designed for sparse solutions but that, I believe, is a different issue. Moreover, the interactions in your model present something of an additional problem. Usually you would not want main effects for a given variable to be dropped while retaining interactions involving that variable. To achieve that constraint you can in principle use "grouped regularization". Alas, I am not aware of publicly available software that gives you that option. (If there is some, I would love to hear about it!)
Also, you don't mention the goal of your analysis. For prediction, this approach is good. Doing inference using such approaches is possible but relatively undeveloped; I think Tibshirani and colleagues, among others, might have some recent publications in that area.
(2) IF you have few enough variables that regularization is not crucial, you might, depending on your goals, be able to deal with the separation problem by using likelihood-ratio tests. If I recall correctly, the LR test remains well-behaved under separation while the Wald test does not: under separation the coefficient estimate and its standard error both blow up, so the Wald statistic becomes meaningless (it can even shrink toward zero - the Hauck-Donner effect), whereas the comparison of likelihoods is still sound.
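A tiny simulated illustration of the contrast (all names hypothetical):

```r
set.seed(1)
x <- c(rnorm(20, -2), rnorm(20, 2))
y <- as.numeric(x > 0)            # perfectly separated by x, by construction

fit1 <- glm(y ~ x, family = binomial)   # warns: fitted probabilities 0 or 1
fit0 <- glm(y ~ 1, family = binomial)

## Wald: the estimate and its SE both diverge, so z tells you nothing.
summary(fit1)$coefficients

## LR: comparing the two likelihoods still gives a sensible (here, highly
## significant) answer.
anova(fit0, fit1, test = "LRT")
```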
(3) Finally, you might consider using so-called EXACT conditional logistic regression, which should help with both the sparsity and separation problems. I know this is available in SAS and LogXact, though they might choke on a sample of size 1200 (I'm not sure). In R, you might be able to use the elrm package IF you have matched pairs: with pairs, you can convert the problem into (unconditional) logistic regression, which is what elrm handles; see e.g. Breslow, N. E. (1982), "Covariance Adjustment of Relative-Risk Estimates in Matched Studies," Biometrics, 38, 661–672, and/or - dare I say it - the SAS manual, http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_logistic_sect062.htm .
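For the matched-pairs conversion Breslow describes, the conditional likelihood for 1:1 pairs reduces to an unconditional logistic regression on within-pair covariate differences, with every response set to 1 and no intercept. A sketch with hypothetical data frames and column names:

```r
## cases and controls are hypothetical data frames; row i of each holds the
## case and the control from matched pair i.
z <- as.matrix(cases[, c("x1", "x2")]) - as.matrix(controls[, c("x1", "x2")])
d_pairs <- data.frame(y = 1, z)

## All-ones response, no intercept: this IS the conditional likelihood.
fit <- glm(y ~ x1 + x2 - 1, family = binomial, data = d_pairs)
summary(fit)

## elrm would then be applied to this unconditional formulation (it expects
## collapsed binomial counts; see the package documentation for the details).
```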
Are you talking about a classification problem? In general, an SVM (support vector machine) gives good results, and there are also sparse variants of the SVM that use the L1 norm.
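A hedged sketch of the sparse variant, assuming the LiblineaR package (whose type = 5, if I recall correctly, requests L1-regularized L2-loss support vector classification); the data frame d and its response column y are hypothetical:

```r
library(LiblineaR)

## d is a hypothetical data frame with a 0/1 response y and numeric predictors.
X <- scale(model.matrix(~ . - y, data = d)[, -1])  # standardize, drop intercept
fit <- LiblineaR(data = X, target = d$y, type = 5, cost = 1)
fit$W  # the L1 penalty drives many weights to exactly zero
```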