As far as I remember, regression trees are very robust to outliers and skewed distributions in EXPLANATORY variables. For the response/dependent variable, that's a different story... Transformations can make sense.
Issues with log-transforming the response/dependent variable can be avoided by using the "regression random forest" machine learning approach instead of a classical CART analysis.
Yes, I understand the advantages of random forests and boosted regression trees, but a single regression tree is enough in this case. I just found the answer in the De'ath & Fabricius 2000 paper in Ecology: because of nonconstant variance, transforming the response variable is "often desirable".
I have read the De'ath & Fabricius 2000 Ecology paper too, but I have a question: if I transform the response variable, then at the terminal nodes I get the mean of log(response) within each group. Can that be interpreted as the mean of the response variable I would have gotten without the transformation?
Yes, after you back-transform it. Note, though, that in regression analysis a log-transformation induces a retransformation bias that should be corrected: exponentiating the mean of log(y) gives the geometric mean, not the arithmetic mean. But I suppose that regression trees are immune to that.
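To make the back-transformation point concrete, here is a minimal sketch (with made-up, hypothetical leaf values) showing that naively exponentiating the mean of log(y) recovers the geometric mean of the group, which is always at or below the arithmetic mean for skewed data:

```python
import math

# Hypothetical right-skewed response values falling in one terminal node
y = [1.0, 2.0, 3.0, 10.0, 50.0]

# Arithmetic mean on the original scale (what an untransformed tree's leaf stores)
arithmetic_mean = sum(y) / len(y)

# Mean of log(y) (what a tree fitted to the log-transformed response stores)
log_mean = sum(math.log(v) for v in y) / len(y)

# Naive back-transform: exp(mean(log y)) equals the geometric mean of y
naive_backtransform = math.exp(log_mean)

print(arithmetic_mean)      # 13.2
print(naive_backtransform)  # ~4.96, noticeably below the arithmetic mean
```

So the back-transformed leaf value is a biased-low estimate of the group mean on the original scale, which is exactly the bias the correction addresses.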
Bijoy Dey Normally, we don't do that. As for log-transforming the response variable, as far as I know there should be no big difference between fitting on the original and the log scale. Be careful with the metrics, though: to make a fair comparison you need to compute R2 on the original target, not on the after-log target.
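The metric point above can be sketched as follows. The data and predictions here are hypothetical, purely for illustration: a model fitted to log(y) should be scored by back-transforming its predictions and computing R2 on the original scale, since R2 computed on the log scale is not comparable with models fitted to y directly:

```python
import math

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical observations and predictions from a tree fitted to log(y)
y_true = [2.0, 5.0, 12.0, 30.0, 80.0]
log_pred = [0.8, 1.7, 2.4, 3.3, 4.5]   # predictions on the log scale

# R2 on the after-log target (not comparable with models fit on y directly)
r2_log = r2([math.log(v) for v in y_true], log_pred)

# R2 on the original target: back-transform the predictions first, then score
y_pred = [math.exp(p) for p in log_pred]
r2_orig = r2(y_true, y_pred)

print(round(r2_log, 3), round(r2_orig, 3))
```

In this toy example the log-scale R2 comes out higher than the original-scale R2, because the log transform shrinks the large residuals on the big observations; only the original-scale value describes fit to the actual target.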