In a problem I've been researching lately, we have found that many feature values are missing, and that they are not missing at random. For example, the proportion of missing values differs substantially between the positive and negative training samples. Can anybody point me to resources that specifically address this problem?
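
To make the pattern concrete, here is a minimal sketch of the kind of check that exposes it: the fraction of missing values for one feature, computed separately for each class. The helper name `missing_rate_by_class`, the feature name `x`, and the toy data are all made up for illustration; this is not my actual dataset.

```python
import math


def missing_rate_by_class(rows, labels, feature):
    """Return {label: fraction of rows with that label where `feature` is missing}.

    A value counts as missing if the key is absent, the value is None,
    or the value is a float NaN.
    """
    counts = {}  # label -> [n_missing, n_total]
    for row, label in zip(rows, labels):
        value = row.get(feature)
        is_missing = value is None or (
            isinstance(value, float) and math.isnan(value)
        )
        tally = counts.setdefault(label, [0, 0])
        tally[0] += int(is_missing)
        tally[1] += 1
    return {
        label: n_missing / n_total
        for label, (n_missing, n_total) in counts.items()
    }


# Toy data (hypothetical): "x" is missing more often in the positive class.
rows = [
    {"x": None}, {"x": None}, {"x": 1.0}, {"x": 2.0},          # label 1
    {"x": 3.0}, {"x": float("nan")}, {"x": 0.5}, {"x": 4.0},   # label 0
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

rates = missing_rate_by_class(rows, labels, "x")
print(rates)
```

A large gap between the per-class rates, as in this toy example, means the missingness pattern itself is related to the label.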