Can anyone recommend implementations that parallelize the "information gain" part of a decision tree building algorithm (such as C4.5)? Preferably using Hadoop, but I would also be interested in generic tips.
C5.0 has a parallel implementation of decision trees with boosting (http://rulequest.com/see5-comparison.html), but I do not think a Hadoop implementation of it has been released yet.
Thank you, Pks. The link explains that C5.0 is optimized and faster than C4.5, but it doesn't explain exactly how parallelization is used to achieve this. However, in light of the benchmarks, it seems we should be using C5.0 instead of C4.5.
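
On the generic-tips side, one common observation is that information gain depends only on counts of (attribute, value, class) combinations, and those counts add up across data partitions. That makes the counting a natural map step and the summing a natural reduce step. The sketch below is only an illustration of that idea in plain Python, not how C5.0 or any Hadoop library does it; the function names (`count_partition`, `merge_counts`, `information_gain`) and the toy data are made up for the example, and it assumes categorical attributes with the class label as the last field of each row.

```python
import math
from collections import Counter
from functools import reduce


def count_partition(rows):
    """Map step: count (attribute index, value, label) triples in one partition."""
    counts = Counter()
    for *features, label in rows:
        for i, value in enumerate(features):
            counts[(i, value, label)] += 1
    return counts


def merge_counts(a, b):
    """Reduce step: counts from different partitions simply add up."""
    return a + b


def entropy(class_counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c)


def information_gain(counts, attr):
    """Information gain of one attribute, computed from the merged counts only."""
    by_label = Counter()   # overall class distribution
    by_value = {}          # class distribution per attribute value
    for (i, value, label), n in counts.items():
        if i != attr:
            continue
        by_label[label] += n
        by_value.setdefault(value, Counter())[label] += n
    total = sum(by_label.values())
    remainder = sum(
        sum(c.values()) / total * entropy(c.values()) for c in by_value.values()
    )
    return entropy(by_label.values()) - remainder


# Hypothetical toy data: two partitions, rows are (outlook, windy, play).
partitions = [
    [("sunny", "yes", "no"), ("sunny", "no", "no")],
    [("rain", "yes", "yes"), ("rain", "no", "yes")],
]

# The built-in map stands in for whatever runs the map step in parallel.
total_counts = reduce(merge_counts, map(count_partition, partitions), Counter())
gains = {attr: information_gain(total_counts, attr) for attr in (0, 1)}
print(gains)
```

Here the built-in `map` could be swapped for a process pool's `map` or for Hadoop mappers, since each partition is counted independently and only the small count tables need to be combined. Continuous attributes would need an extra step (for example, candidate thresholds or binning) that this sketch leaves out.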