According to Friedman's stochastic gradient boosting algorithm, a regression tree with J leaf nodes partitions the variable space into J regions. Hastie et al. suggest, based on their empirical experiments, that J > 10 is unlikely to be required. My intuition, however, is that dividing the variable space into more regions should fit the residuals more accurately (though it may also lead to overfitting).

I understand that Hastie et al.'s recommendation is empirical, but what is the actual logic behind it? Should we always try larger numbers of leaf nodes, or should we simply pick a value between 4 and 8? And if a model with more than 10 leaf nodes gives better predictions, how far should I keep searching (see the sketch below for what I mean by searching over J) to find the number of leaf nodes that yields the best prediction model for my particular data set?
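For concreteness, here is a minimal sketch of what "checking higher numbers of leaf nodes" could look like in practice, assuming scikit-learn's GradientBoostingRegressor (the question itself does not name a library); its `max_leaf_nodes` parameter plays the role of J, and the candidate grid of J values is my own illustrative choice:

```python
# Sketch: tune the number of leaf nodes J by cross-validation instead of
# fixing it blindly. Assumes scikit-learn; the grid of J values is arbitrary.
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data as a stand-in for the real data set.
X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)

# Candidate values of J, including some above Hastie et al.'s ~10 cutoff.
param_grid = {"max_leaf_nodes": [2, 4, 6, 8, 10, 16, 32]}

search = GridSearchCV(
    GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                              random_state=0),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X, y)
print(search.best_params_)   # cross-validated choice of J for this data
```

If the cross-validated error keeps improving past J = 10 on a given data set, that would be exactly the situation the question is asking about.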
