Dear @Jesie, you are talking about a supervised model, known as the C4.5 algorithm developed by Quinlan. This kind of model involves training, pruning, and test stages. Training grows the tree: the data are split iteratively on the best feature at each node, and growing continues until a) there are no more data to split, b) a node holds too few examples to be split further, or c) all the data in a node belong to the same target class. A rough code sketch of this growing stage follows below.
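As an illustrative sketch only (not C4.5 itself: scikit-learn's DecisionTreeClassifier implements CART, a close relative, and the dataset and parameter values here are just placeholders), the stopping criteria above correspond to parameters such as min_samples_split and the purity check:

```python
# Rough sketch: CART with an entropy criterion, growing until the stopping rules fire.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(
    criterion="entropy",    # choose splits by information gain, as C4.5 does
    min_samples_split=5,    # stop when a node has too few samples (criterion b)
    random_state=0,
)
tree.fit(X_train, y_train)  # growing also stops when a node is pure (criterion c)
print("test accuracy:", tree.score(X_test, y_test))
```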
After training, a pruning stage is performed to avoid overfitting. Cross-validation (k = 10 folds) should be used for more reliable results. You could use the Weka software (https://www.cs.waikato.ac.nz/ml/weka/), where C4.5 is implemented as J48, to experiment with this algorithm. I hope it is useful for you.
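Again only as an illustrative sketch (scikit-learn prunes via cost-complexity pruning, whereas C4.5/J48 uses error-based pruning, and the ccp_alpha value here is arbitrary), 10-fold cross-validation of a pruned tree could look like this:

```python
# Sketch: 10-fold cross-validation of a tree with cost-complexity pruning.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

pruned_tree = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.01, random_state=0)
scores = cross_val_score(pruned_tree, X, y, cv=10)   # k = 10 folds
print("mean accuracy over 10 folds:", scores.mean())
```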
It becomes easier to understand if we go through the concepts of Information Gain and Entropy.
Information Gain
If you have acquired information over time that helps you accurately predict whether something is going to happen, then the news that the predicted event actually occurred carries little new information. But if things go south and an unexpected outcome occurs, that outcome counts as useful, new information.
The concept of information gain works the same way.
The more you know about a topic, the less new information you are apt to get about it. To be more concise: if you know an event is very probable, it is no surprise when it happens; that is, its occurrence gives you little information.
From the above we can formalise this: the information carried by an event decreases as its probability increases (its self-information is -log2 p). Entropy is the expected self-information, i.e. a measure of the uncertainty or impurity of a node, and the information gain of a split is the parent node's entropy minus the weighted entropy of its children. The more a split reduces the entropy remaining in the child nodes, the larger its information gain, and that is exactly the quantity C4.5 maximises when choosing the best feature. A small numeric example is given below.
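To make this concrete, here is a small numeric illustration with assumed toy counts: the self-information of a probable versus a rare event, the entropy of a node, and the information gain of a candidate split.

```python
# Toy numbers only: self-information, entropy, and information gain of a split.
from math import log2

def entropy(probs):
    """Entropy H = -sum p * log2(p), the expected self-information."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A rare event carries more information than a likely one:
print(-log2(0.9))   # ~0.15 bits: very probable, little information
print(-log2(0.1))   # ~3.32 bits: surprising, much information

# Parent node: 10 positives, 10 negatives
parent = entropy([0.5, 0.5])                       # 1.0 bit

# Candidate split: left child (8+, 2-), right child (2+, 8-)
left, right = entropy([0.8, 0.2]), entropy([0.2, 0.8])
gain = parent - (10/20) * left - (10/20) * right   # weighted child entropies
print("information gain:", gain)                   # ~0.28 bits
```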