Different papers give different formulas for information gain. Can anyone tell me which formula is the most accurate, so that I can get a more accurate ranked list?
I believe you need to consider Stephan's reply above: the information-theoretic definition of information gain relates to the transmission of state encoded using a finite set of values (for example 0 or 1 for binary; 0, 1, 2 for ternary; and so on). If all values are equally likely to appear in all positions of a word, then, for a given set of values, the information contained in the word depends only on the word length. Informal use of "information" as a substitute for "meaning" can lead to confusion.
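To make the point above concrete, here is a minimal Python sketch. It assumes symbols are independent and equally likely, so a word of length n over k possible values carries n * log2(k) bits. It also shows the textbook entropy-based form of information gain, IG(Y; X) = H(Y) - H(Y | X), estimated from observed frequencies. The function names and the toy data are my own illustrations, not taken from any particular paper, and other papers may well define or normalise the quantity differently.

```python
import math
from collections import Counter


def word_information_bits(word_length, alphabet_size):
    """Bits carried by a word when every symbol is equally likely
    and the positions are independent (assumption of this sketch)."""
    return word_length * math.log2(alphabet_size)


def entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """H(labels) - H(labels | feature), using empirical frequencies."""
    total = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    # Weighted average of the label entropy within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# An 8-symbol binary word carries 8 bits under the uniform assumption.
print(word_information_bits(8, 2))  # 8.0

# Hypothetical toy data: rank two features by their information gain.
labels = [1, 1, 0, 0]
features = {
    "predictive": ["a", "a", "b", "b"],
    "irrelevant": ["a", "b", "a", "b"],
}
ranked = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)
print(ranked)
```

With this definition, a feature that separates the classes perfectly gets a gain equal to the entropy of the labels, and a feature that is independent of the labels gets zero, which is what makes it usable for a ranked list.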