Hi everyone,

Hope you're all doing well!

I'm working on feature importance and feature selection on the LDOS-CoMoDa dataset, a reference dataset in CARS (Context-Aware Recommender Systems) for movie ratings and recommendation. It contains 12 contextual features, plus static features related to items and users.

My issue is that the dataset is entirely numerical (all features are stored as numerical values, even the categorical ones). Because it is a tabular dataset, I'm applying tree-based and ensemble methods such as Random Forest, XGBoost and other algorithms to assess feature importance and perform feature selection, but the results are mediocre (R² = 0.36 after optimization with Optuna).
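To make the kind of analysis concrete, here is a minimal, library-free sketch of permutation importance scored with R². It is only an illustration under stated assumptions, not my actual pipeline: the data is synthetic (not LDOS-CoMoDa), and `toy_predict` is a hypothetical stand-in for the `predict` method of a fitted Random Forest or XGBoost model.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in R^2 when each column of X is shuffled in turn."""
    rng = random.Random(seed)
    baseline = r2_score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, keeping every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances

# Synthetic stand-in data (NOT LDOS-CoMoDa): three integer-coded features,
# of which only features 0 and 1 influence the target.
data_rng = random.Random(42)
X = [[data_rng.randint(1, 4), data_rng.randint(1, 7), data_rng.randint(1, 5)]
     for _ in range(300)]
y = [2.0 * row[0] + 0.5 * row[1] + data_rng.gauss(0, 0.5) for row in X]

def toy_predict(rows):
    # Hypothetical stand-in for a fitted model's predict().
    return [2.0 * r[0] + 0.5 * r[1] for r in rows]

baseline, importances = permutation_importance(toy_predict, X, y)
print("baseline R^2:", round(baseline, 3))
print("mean R^2 drop per feature:", [round(v, 3) for v in importances])
```

In practice the importance should be computed on a held-out split rather than on the training rows, and a feature that the model ignores (feature 2 here) shows a drop of zero.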

My question is: how can I improve this result? I'm still focusing on feature preprocessing and engineering, but I'm getting lost and don't understand where the problem lies.

If someone has already worked with this dataset and carried out this kind of analysis, I would appreciate any further explanation.

The CSV file for the dataset is attached to this message.

Thank you!
