Hello,

I'm trying to model my dataset with decision trees in Python. I have 15 categorical and 8 numerical attributes. Since I can't introduce the strings to the classifier, I applied one-hot encoding to the categories however some of the categories have 40 or above levels. Therefore dimensionality increased too much. Except the computational cost and overfitting possibility, I also realized that the categories are treated like continuous variables despite being one-hot encoded. For example, I saw silver

More Gülşah Gül's questions See All
Similar questions and discussions