Hi, I am trying to use logistic regression to understand how predictable ship recycling locations are, using a big data approach. My label column is multiclass, with about 15 categories. Initially I converted it to a binary label and ran a random train/test split, which gave an AUC of 54%. Then I used the original multiclass label and the model performed worse, scoring only 25%. It makes some sense why. Would adding more variables improve the results? Is there a better machine learning technique that I could apply in this case?
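
For context when comparing the two numbers: a binary AUC has a chance level of 50%, while the chance level for a 15-class problem depends on the class balance. The sketch below is only an illustration, not the actual pipeline. It assumes the 25% figure is plain accuracy (the post does not say which metric it is) and uses made-up, perfectly balanced labels with hypothetical names like `location_0`. It computes the two reference points a multiclass score could be compared against.

```python
from collections import Counter
import random


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def uniform_chance_accuracy(labels):
    """Expected accuracy of guessing uniformly among the observed classes."""
    return 1.0 / len(set(labels))


# Synthetic, made-up labels: 15 hypothetical classes, perfectly balanced.
# Real recycling-location labels are likely imbalanced, which raises
# the majority baseline above the uniform chance level.
random.seed(0)
classes = [f"location_{i}" for i in range(15)]
labels = classes * 100
random.shuffle(labels)

majority = majority_baseline_accuracy(labels)
chance = uniform_chance_accuracy(labels)
print(f"majority baseline: {majority:.3f}, uniform chance: {chance:.3f}")
```

On the real label column, the majority baseline is the more informative reference: a multiclass accuracy is only meaningful relative to how often the single most common class occurs.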

I would appreciate any thoughts on this.
