Hello all,
Please tell me how to do data mining on an imbalanced recommender-system dataset.
As far as I know, one needs to create training and testing files for each user, with the user's ratings as the class label. For example:
Suppose a user has rated Item1, Item4 and Item3 with 5, 5 and 1 respectively, and we want to predict their rating for Item6.
For User1, the training data will be:
User1_F1,User1_F2,......................Item1_F1,Item1_F2....,5
User1_F1,User1_F2,......................Item4_F1,Item4_F2....,5
User1_F1,User1_F2,......................Item3_F1,Item3_F2....,1
// User1_F1 denotes a feature of User1, Item3_F1 denotes a feature of Item3, and so on.
The testing data will be:
User1_F1,User1_F2,......................Item6_F1,Item6_F2....,?
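The per-user layout described above (user features, then item features, then the rating as the class label, with "?" for the unknown rating) can be sketched in Python. This is a minimal illustration only: the feature values, the dictionaries `user_features`, `item_features` and `ratings`, and the helpers `make_row` and `build_user_files` are all hypothetical stand-ins for whatever features your data actually has.

```python
# Hypothetical feature vectors; real features (User1_F1, Item1_F1, ...) depend on your data.
user_features = {"User1": [0.3, 1.0]}
item_features = {
    "Item1": [1.0, 0.0],
    "Item3": [0.0, 1.0],
    "Item4": [1.0, 1.0],
    "Item6": [0.5, 0.5],
}
# Ratings User1 has already given; these become the class labels.
ratings = {("User1", "Item1"): 5, ("User1", "Item4"): 5, ("User1", "Item3"): 1}


def make_row(user, item, label="?"):
    """Concatenate user features and item features, then append the label."""
    return user_features[user] + item_features[item] + [label]


def build_user_files(user, target_items):
    """Return (training rows, testing rows) for one user."""
    # Training rows: every item this user has rated, labelled with the rating.
    train = [make_row(u, i, r) for (u, i), r in ratings.items() if u == user]
    # Testing rows: items to predict; the label is unknown, written as "?".
    test = [make_row(user, i) for i in target_items]
    return train, test


train, test = build_user_files("User1", ["Item6"])
for row in train:
    print(row)
print(test)
```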
Please correct me if I am wrong.
As we can see, class label 1 appears only once while class label 5 appears twice. How can I deal with this class imbalance?
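One common remedy is random oversampling: copy rows of the rarer ratings at random until every rating has as many rows as the most frequent one. (Other options include undersampling the majority class or synthetic methods such as SMOTE; this sketch shows only plain random oversampling.) The function name `random_oversample` and the `seed` parameter are hypothetical, and the sketch assumes the row layout above with the rating as the last element. It should be applied to the training rows only, never to the test rows.

```python
import random
from collections import Counter


def random_oversample(rows, seed=0):
    """Duplicate minority-class rows at random until each class matches the majority count.

    Each row is a list whose last element is the class label (the rating).
    """
    rng = random.Random(seed)
    # Group rows by their class label.
    by_label = {}
    for row in rows:
        by_label.setdefault(row[-1], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Draw extra copies (with replacement) from this class's own rows.
        balanced.extend(list(rng.choice(group)) for _ in range(target - len(group)))
    return balanced


train = [
    [0.3, 1.0, 1.0, 0.0, 5],
    [0.3, 1.0, 1.0, 1.0, 5],
    [0.3, 1.0, 0.0, 1.0, 1],
]
balanced = random_oversample(train)
print(Counter(row[-1] for row in balanced))
```

With only one row labelled 1, oversampling can only repeat that single row, so on data this small it adds no new information; it mainly stops a classifier from always predicting the majority rating.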
Also, how should imbalanced data be handled in general, and is there any tool that can do this pre-processing before data mining techniques are applied?
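An alternative to changing the data is to keep it as-is and give rarer classes a larger weight during training. The sketch below computes inverse-frequency weights with the formula n_samples / (n_classes * class_count), which is the formula scikit-learn documents for its `class_weight="balanced"` option. The helper name `balanced_class_weights` is hypothetical, and the weights only help if the classifier you use accepts per-class or per-sample weights.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare ratings get larger weights, so a weight-aware classifier pays
    more attention to them without duplicating any rows.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * c) for label, c in counts.items()}


# Ratings from the User1 example: two 5s and one 1.
weights = balanced_class_weights([5, 5, 1])
print(weights)
```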