Hi Fellow Researchers and Professors,

I am working on feature selection methods for text classification, following the well-known Yang and Pedersen (1997) paper (attached).

I have successfully implemented Mutual Information and Chi-Square from the formulas given in the paper, which are expressed in terms of A, B, C, D and N.
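
For context, here is a minimal sketch of the count-based formulas as I read them in the paper. It assumes the paper's definitions: A is the number of documents in which term t and category c co-occur, B the number where t occurs without c, C the number where c occurs without t, D the number where neither occurs, and N = A + B + C + D. The class and method names are purely illustrative, the natural logarithm is an arbitrary choice (the base does not change the ranking of terms), and returning 0 for a zero denominator is an assumption rather than something the paper specifies.

```java
// Illustrative sketch only: names and edge-case handling are assumptions.
final class TermCategoryStats {

    // Mutual information estimate: log( A*N / ((A+C)*(A+B)) ).
    // Yields -Infinity when A == 0 (t and c never co-occur).
    static double mutualInformation(long a, long b, long c, long d) {
        double n = a + b + c + d;
        return Math.log(a * n / ((double) (a + c) * (a + b)));
    }

    // Chi-square: N*(A*D - C*B)^2 / ((A+C)*(B+D)*(A+B)*(C+D)).
    static double chiSquare(long a, long b, long c, long d) {
        double n = a + b + c + d;
        double diff = (double) a * d - (double) c * b;
        double denom = (double) (a + c) * (b + d) * (a + b) * (c + d);
        // Assumption: treat an empty row or column as "no association".
        return denom == 0 ? 0.0 : n * diff * diff / denom;
    }
}
```

With these definitions, a term that is independent of the category (for example A = B = C = D = 10) scores 0 on both measures.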

The problem arises with Information Gain. Its formula is given in terms of probabilities rather than A, B, C, D and N, and I could not find a count-based version anywhere. It would be great if you could provide a link to a paper or any other resource that might help.

I know it is already implemented in tools like Weka, but due to some other constraints, I'm coding this myself in Java.

I'm sure many of you can help me with this, so please do.

Thanks!

Farhan
