Please don't answer my question unless you have a real answer. I'm trying to get a serious answer, and some RGers just want to score undeserved points, which is an abuse of the system.

Here is the question I posted before:

I'm analyzing an ecological data set to compare species from the eastern and western USA. I want to use classification trees and Random Forest (RF) analyses for two things: first, to see which of 19 variables are most important for modeling parameters that vary between the two geographical regions (east and west), and then to train an RF model for making predictions. I have one binary (presence/absence) variable called Florida diversity (FD), which obviously applies only to the eastern USA. It is nonetheless an important variable for the overall model because it predicts the state of another binary variable, north-south diversity (NSD), which applies to both the eastern and western USA. FD is correlated with (or rather shares a common dependency with) NSD in a way that greatly improves the RF model's predictive capacity, but FD is not predictive by itself without NSD. The two are dependent on each other (synergistic, one might say), though NSD is reasonably predictive on its own. Together they are very predictive, but I have a nagging voice in my head that says I need to be able to defend using FD.

My question is this: is there any reason why I should not use a predictor that cannot possibly be coded as present for one class of my dependent variable? FD is one of 19 variables, but it is the third most important one in the models I've run thus far (NSD is the most important), so I don't want to toss it if I don't have to. The thing is, I can't find anything written on the subject. Obviously one wouldn't use such a variable in traditional statistics, but is it okay to use such a predictor in classification analyses, or does it violate some assumption somewhere?

It seems to me that, in the context of classification trees or Random Forest modeling, using an "accessory predictor" should be okay. It seems analogous to specifying a Bayesian prior, but I'm hoping to get a solid confirmation or an "absolutely not" reason.

If you know the answer, I would greatly appreciate your reply and a source, if you have one.
