How to calculate the Gini gain of a decision tree (random forest)? Bootstrapping?

19 March 2020

Hi, I’m working on my master's thesis, and I would like to explain how the random forest algorithm works.

I’ve plotted one decision tree of the random forest, and I don’t understand how to calculate the Gini index and the Gini gain.

First of all, the number of samples shown in a node and the sum of its values are not the same. This is because of bootstrapping, right?

So bootstrapping turns the 1262 samples into the subsample set with the values [514, 488, 509, 505], correct?
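The mismatch between the two numbers can be reproduced with a small bootstrap experiment. This is only a sketch under assumptions: it takes the original training set to have 2016 rows (the sum of the value array), and it relies on the way scikit-learn's tree plots for a random forest are usually read, where `samples` counts the distinct rows that landed in the bootstrap and `value` counts each row as often as it was drawn. The function name `bootstrap_counts` is made up for illustration.

```python
import random
from collections import Counter


def bootstrap_counts(n_rows, seed=0):
    """Draw n_rows row indices with replacement and count how often each was drawn."""
    rng = random.Random(seed)
    drawn = [rng.randrange(n_rows) for _ in range(n_rows)]
    return Counter(drawn)


# Assumption: the training set has 514 + 488 + 509 + 505 = 2016 rows.
n_rows = 2016
counts = bootstrap_counts(n_rows)

n_unique = len(counts)          # distinct rows in the bootstrap sample
n_drawn = sum(counts.values())  # rows counted with multiplicity

print("distinct rows:", n_unique)
print("rows with multiplicity:", n_drawn)
```

On average roughly 63% of the rows appear at least once in a bootstrap sample of the same size, which for 2016 rows is in the neighbourhood of the 1262 reported in the plot.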

The attached file shows the calculation of the gini-Gain of the root-node.

Is my attached calculation of the gini-gain correct? So, for the calculation of the Gini-Index and Gini-Gain I just use the values and not the samples, is that right?

Kind regards, Ireno

David Eugene Booth

See the link:

https://www.bing.com/search?FORM=U528DF&PC=U528&q=random+forest+gini-gain

best, D. Booth

Muhammad Mudasser Afzal

Ireno Wälte, for a decision tree you have to calculate the impurity (entropy or Gini) of every feature and then subtract it from the impurity of the ground-truth labels. For gain ratio you choose the feature with the maximum value, and for the Gini index you choose the one with the minimum value. That gives you the root node, and for every further decision you repeat the calculation over all features. Your calculation looks different to me; I think it needs some changes to match the formulas. Also, the result doesn't depend on the number of data points, but if a feature has n categories, then there will be n edges leading out of its node.

Ireno Wälte

Muhammad Mudasser Afzal, I understand that I have to calculate the Gini index and Gini gain for every feature. But my tree is already built. The best feature is Peak_1 with the value 0.46.

I found the formula for the Gini index in the book:

An Introduction to Statistical Learning

http://faculty.marshall.usc.edu/gareth-james/ISL/

For the Gini gain I found this formula (https://victorzhou.com/blog/intro-to-random-forests/):

Gain = G_initial - p_left * G_left - p_right * G_right

How do I calculate p_left?

From the value_array? (514+488+509)/(514+488+509+505)=0.749

Or from the samples? 945/1262=0.7488

Is there a difference between information gain and Gini gain?
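The formula above can be tried out directly on the value array of the root node. The following is a minimal sketch, not the calculation from the attached file: the root counts [514, 488, 509, 505] come from the question, but the counts of the left and right child are hypothetical, because the thread does not show them. It also assumes that p_left and p_right are taken from the sums of the value arrays, which as far as I know is how a tree trained with sample weights (and a bootstrap is usually handled that way) weights its children.

```python
def gini(counts):
    """Gini index 1 - sum(p_k^2) for a list of per-class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_gain(parent, left, right):
    """Gain = G_initial - p_left * G_left - p_right * G_right."""
    n_parent = sum(parent)
    p_left = sum(left) / n_parent
    p_right = sum(right) / n_parent
    return gini(parent) - p_left * gini(left) - p_right * gini(right)


root = [514, 488, 509, 505]   # value array of the root node (from the question)
left = [500, 470, 490, 50]    # HYPOTHETICAL counts of the left child
right = [r - l for r, l in zip(root, left)]  # the remaining rows go right

print("Gini index of the root:", round(gini(root), 4))
print("Gini gain of the split:", round(gini_gain(root, left, right), 4))
```

With four almost equally frequent classes, the Gini index of the root is close to its maximum of 0.75. Under this reading, the `samples` numbers only tell how many distinct rows reached a node and do not enter the formula.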

Muhammad Mudasser Afzal

Ireno Wälte

Three popular attribute selection measure:

Information gain

Gain ratio

Gini index

These are the main methods, and you can also use them when working with the sklearn library. You first find the root node; then, for the next node, you take the features other than the root's and calculate their Gini index and information gain. I am sharing a slide on decision trees which will help you...
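The three measures listed above can be compared side by side on the same class counts. This is a sketch under assumptions: the helper names (`entropy`, `gini`, `split_gain`, `gain_ratio`) and the example counts are made up for illustration. Information gain and Gini gain are both "parent impurity minus weighted child impurity" and differ only in the impurity function; gain ratio, as used in C4.5, divides the information gain by the entropy of the split sizes.

```python
from math import log2


def entropy(counts):
    """Shannon entropy in bits for a list of per-class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def gini(counts):
    """Gini index 1 - sum(p_k^2) for a list of per-class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def split_gain(impurity, parent, children):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = sum(parent)
    weighted = sum(sum(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


def gain_ratio(parent, children):
    """Information gain divided by the split information (C4.5 style)."""
    split_info = entropy([sum(child) for child in children])
    if split_info == 0:
        return 0.0
    return split_gain(entropy, parent, children) / split_info


# HYPOTHETICAL two-class node and a candidate split into two children.
parent = [40, 60]
children = [[35, 10], [5, 50]]

print("information gain:", split_gain(entropy, parent, children))
print("Gini gain:", split_gain(gini, parent, children))
print("gain ratio:", gain_ratio(parent, children))
```

All three are maximised when choosing a split. Minimising the weighted Gini index of the children picks the same split as maximising the Gini gain, because the parent's impurity is the same for every candidate.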
