I have two classes (“good” and “poor”) that are determined by several numeric variables (say, v1 … v20).

Conditions of classification:

If v1, v2, v3 and v4 are “high”, then the class is “poor”.

If v1, v2, v3 and v4 are “low”, then the class is “good”.
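The two conditions above can be written as a simple rule. This is only a sketch: the question does not define what “high” and “low” mean, so the 0.5 cut-offs in the hypothetical `THRESHOLDS` dictionary are placeholders, and samples that match neither condition are returned as `None` (left for the learned model to decide).

```python
# Hypothetical cut-offs: "high" is taken to mean "above the threshold";
# replace these with whatever definition fits the real data.
THRESHOLDS = {"v1": 0.5, "v2": 0.5, "v3": 0.5, "v4": 0.5}

def rule_class(row):
    """Return 'poor', 'good', or None (rules do not decide) for one sample."""
    flags = [row[name] > cut for name, cut in THRESHOLDS.items()]
    if all(flags):
        return "poor"  # v1..v4 all high
    if not any(flags):
        return "good"  # v1..v4 all low
    return None        # mixed case: not covered by the two rules

print(rule_class({"v1": 0.9, "v2": 0.8, "v3": 0.7, "v4": 0.6}))  # poor
print(rule_class({"v1": 0.1, "v2": 0.2, "v3": 0.3, "v4": 0.4}))  # good
```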

So far, everything works: using Random Forest, I get good accuracy, a good ROC curve and a low classification error.

However, I want to add a new variable, v21. I know from experience that if v21 is high, the class may not be “poor”, even when v1, v2, v3 and v4 are high.

In other words, the probability of a “poor” class is low when v21 is high, even if v1, v2, v3 and v4 are also high.

1) How can I use my knowledge about v21 in the classification to improve accuracy? Which classification technique is suitable?
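One possible approach (not the only one) is to encode the known exception as engineered features, so that any classifier can use it directly. A Random Forest can in principle learn the v1..v4 × v21 interaction from the raw variables, but an explicit flag may make it easier when the exception cases are rare. The function `add_domain_features`, the feature names and the 0.5 cut-off below are hypothetical, chosen only for illustration.

```python
CUT = 0.5  # hypothetical cut-off for "high"; not defined in the question

def add_domain_features(row, cut=CUT):
    """Return a copy of `row` with two engineered 0/1 flags (names are illustrative)."""
    core_high = all(row[f"v{i}"] > cut for i in range(1, 5))
    out = dict(row)
    out["core_high"] = int(core_high)                                    # v1..v4 all high
    out["core_high_and_v21_high"] = int(core_high and row["v21"] > cut)  # the v21 exception
    return out

sample = {"v1": 0.9, "v2": 0.8, "v3": 0.7, "v4": 0.6, "v21": 0.9}
print(add_domain_features(sample))
```

The enriched rows would then be turned into the feature vectors passed to the Random Forest, or to a logistic regression in which `core_high_and_v21_high` acts as an interaction term. Whether this actually improves accuracy has to be checked on held-out data.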

2) Since I have real data that matches my understanding, is there any way to calculate the probability of a “poor” class when v21 is high and v1, v2, v3 and v4 are also high?
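A direct empirical estimate is the fraction of “poor” samples among those meeting the condition, i.e. P(poor | v1..v4 high, v21 high) ≈ k / n. Because the subset where all five variables are high may be small, the sketch below also reports a 95% Wilson score interval. Comparing it with the same estimate for “v21 low” quantifies the effect of v21. The threshold `CUT`, the helper `row` and the four toy records are illustrative assumptions, not real data. Alternatively, a fitted probabilistic model (for example, scikit-learn's `predict_proba`) gives a probability conditional on the full feature vector.

```python
import math

CUT = 0.5  # hypothetical threshold for "high"; replace with your own definition

def core_high(r):
    """True if v1..v4 are all above the cut-off."""
    return all(r[f"v{i}"] > CUT for i in range(1, 5))

def conditional_poor_rate(rows, condition, z=1.96):
    """Estimate P(class == 'poor' | condition) with a Wilson score interval.

    Returns (p_hat, k, n, (low, high)), or None if no row meets the condition.
    """
    subset = [r for r in rows if condition(r)]
    n = len(subset)
    if n == 0:
        return None
    k = sum(1 for r in subset if r["class"] == "poor")
    p = k / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return p, k, n, (centre - half, centre + half)

def row(v21, label, core=0.9):
    """Build a toy record with v1..v4 set to `core` (illustrative only)."""
    r = {f"v{i}": core for i in range(1, 5)}
    r["v21"] = v21
    r["class"] = label
    return r

rows = [row(0.9, "good"), row(0.8, "poor"), row(0.1, "poor"), row(0.2, "poor")]
high = conditional_poor_rate(rows, lambda r: core_high(r) and r["v21"] > CUT)
low = conditional_poor_rate(rows, lambda r: core_high(r) and r["v21"] <= CUT)
print(high[:3])  # (0.5, 1, 2)
print(low[:3])   # (1.0, 2, 2)
```

With real data, the interval width shows how much the comparison can be trusted; a very wide interval means there are too few samples with all five variables high to draw a firm conclusion.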
