Hi, I’m working on my master’s thesis, and I would like to explain how the random forest algorithm works.

I’ve plotted one decision tree from the random forest, and I don’t understand how to calculate the Gini index and the Gini gain.

First of all, the number of samples and the values shown in a node are not the same. This is because of bootstrapping, right?

So bootstrapping takes the 1262 samples and gives me the subsample with the values [514, 488, 509, 505], correct?

The attached file shows my calculation of the Gini gain for the root node.

Is my attached calculation of the Gini gain correct? And for the calculation of the Gini index and the Gini gain, do I use only the values and not the samples?
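To make the question concrete, here is a minimal Python sketch of the calculation as I understand it. It assumes the usual textbook definitions (Gini index = 1 minus the sum of squared class proportions; Gini gain = parent index minus the size-weighted index of the child nodes) and it uses only the node's values, not its samples count. The function names `gini_index` and `gini_gain` are my own and do not come from any library.

```python
def gini_index(counts):
    """Gini index (impurity) of a node: 1 - sum(p_k^2) over the classes k."""
    total = sum(counts)
    if total == 0:
        return 0.0  # an empty node is treated as pure (assumption)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def gini_gain(parent_counts, children_counts):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    total = sum(parent_counts)
    weighted = sum(
        sum(child) / total * gini_index(child) for child in children_counts
    )
    return gini_index(parent_counts) - weighted


# Values of the root node from my plot (per-class counts).
root_values = [514, 488, 509, 505]
root_gini = gini_index(root_values)
print(sum(root_values), round(root_gini, 4))  # prints: 2016 0.7499
```

The values of the two child nodes would then be passed to `gini_gain` as one list of per-class counts per child.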

Kind regards,
Ireno
