Hi, I’m working on my master’s thesis, and I would like to explain how the random-forest algorithm works.
I’ve plotted one decision tree from the random forest, and I don’t understand how to calculate the Gini index and the Gini gain.
First of all, the number of samples and the values shown in each node are not the same. This is because of bootstrapping, right?
So bootstrapping draws, from the 1262 samples, the subsample set with the values [514, 488, 509, 505], correct?
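One way to check the relationship between the two numbers is a small simulation. The sketch below assumes (this is an assumption, not confirmed by the plot) that the tree comes from scikit-learn with bootstrap enabled, where "value" counts every bootstrap draw, including repeated rows, and "samples" counts only the distinct rows drawn. The values sum to 514 + 488 + 509 + 505 = 2016, so the sketch draws 2016 row indices with replacement from 2016 rows and counts how many are distinct. The helper name bootstrap_counts is hypothetical.

```python
import random

def bootstrap_counts(n_rows, seed=0):
    """Hypothetical helper: draw n_rows indices with replacement.

    Returns (total draws, number of distinct rows drawn).
    """
    rng = random.Random(seed)
    draws = [rng.randrange(n_rows) for _ in range(n_rows)]
    return len(draws), len(set(draws))

total, distinct = bootstrap_counts(2016)
print(total, distinct)  # distinct is typically about 63.2% of total
```

On average, a bootstrap sample of n rows contains about 1 - (1 - 1/n)^n ≈ 63.2% distinct rows, so the exact distinct count varies with the random seed.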
The attached file shows my calculation of the Gini gain for the root node.
Is that calculation correct? In other words, when calculating the Gini index and the Gini gain, do I use only the values and not the samples?
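For reference, a minimal sketch of the two formulas, assuming the class counts in "value" are the counts used for the split: the Gini index of a node is G = 1 - Σ p_k², where p_k is the fraction of class k, and the Gini gain is G(parent) minus the size-weighted average Gini of the children. The function names are hypothetical. The child counts in the usage line are toy numbers, because the children of the root node are not given here.

```python
def gini(counts):
    """Gini index from per-class counts: 1 - sum of squared class fractions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_gain(parent, children):
    """Parent Gini minus the weighted average Gini of the children.

    Assumes the children's class counts add up to the parent's counts.
    """
    n = sum(parent)
    weighted = sum(sum(child) / n * gini(child) for child in children)
    return gini(parent) - weighted

root = [514, 488, 509, 505]
print(round(gini(root), 4))                      # 0.7499
print(gini_gain([2, 2], [[2, 0], [0, 2]]))       # toy example: perfect split
```

With four nearly balanced classes, the root's Gini index is close to its maximum of 1 - 1/4 = 0.75.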
Kind regards,
Ireno