The importance of features in a decision tree depends strongly on the splitting criterion, which may be Gini's diversity index (GDI), deviance, or node error [1]. The importance of a feature is computed as the difference between the risk for the parent node and the sum of the risks for its children. The risk of each node combines an impurity measurement with the node probability, where the node probability is defined as the number of records reaching the node divided by the total number of records [2]. In the case of GDI, the risk of node x is given by the following formula:
Risk(x) = GDI(x) * Probability(x)
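To make this concrete, here is a minimal sketch in Python. It assumes scikit-learn and the Iris dataset purely for illustration (neither is prescribed by [1] or [2]); the loop credits each splitting feature with the parent risk minus the sum of the child risks, exactly as described above.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Fit a tree with the Gini criterion (GDI); Iris is an arbitrary example.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

t = tree.tree_
n_total = t.weighted_n_node_samples[0]   # records reaching the root

def node_risk(n):
    # Risk(x) = GDI(x) * Probability(x), where the probability is the
    # fraction of records reaching node x.
    return t.impurity[n] * t.weighted_n_node_samples[n] / n_total

importance = np.zeros(X.shape[1])
for node in range(t.node_count):
    left, right = t.children_left[node], t.children_right[node]
    if left == -1:                        # leaf: no split, no risk reduction
        continue
    # Importance contribution: parent risk minus the sum of child risks.
    importance[t.feature[node]] += node_risk(node) - node_risk(left) - node_risk(right)

importance /= importance.sum()            # normalise to sum to one
print(np.allclose(importance, tree.feature_importances_))  # True

The final check confirms that scikit-learn's built-in feature_importances_ computes this same parent-minus-children risk reduction, normalised to sum to one.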
As a result, the root node is usually one of the most important features. For more information, I suggest taking a look at Section 4.2 (Discussion) of [2], which is accessible on ResearchGate [3].
Regards,
Reza Sadeghi
References:
[1]. Raileanu, L. E., & Stoffel, K. (2004). Theoretical comparison between the Gini index and information gain criteria. Annals of Mathematics and Artificial Intelligence, 41(1), 77–93.
[2]. Sadeghi, R., Banerjee, T., & Romine, W. (2018). Early Hospital Mortality Prediction using Vital Signals. Smart Health, 9–10, 265–274.
[3]. Early Hospital Mortality Prediction using Vital Signals, full text available on ResearchGate.