While investigating the effect of negative cases on model performance using the MovieLens 100k dataset, I ran into a question. I performed two experiments to evaluate model performance.

In the first experiment, 55,375 cases with ratings 4 and 5 were extracted from the MovieLens 100k dataset and labeled as positive cases (target=1), and 17,480 cases with ratings 1 and 2 were extracted as negative cases (target=0). After training the model on this data, the performance evaluation results were as follows.

--------------------------------------

              precision    recall  f1-score   support

           0       0.67      0.47      0.55      5200
           1       0.85      0.93      0.89     16657

    accuracy                           0.82     21857

AUC = 0.8306274331419916
RMSE = 0.36533634653541674
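For reference, the first dataset can be constructed roughly as in the sketch below. This is a simplified pandas illustration, not my exact pipeline; the file path and column names are assumed from the standard MovieLens 100k distribution (u.data is tab-separated: user id, item id, rating, timestamp).

import pandas as pd

# Load the MovieLens 100k ratings file (path and column names assumed).
ratings = pd.read_csv("ml-100k/u.data", sep="\t",
                      names=["user_id", "item_id", "rating", "timestamp"])

# Ratings 4-5 become positive cases (target=1), ratings 1-2 become explicit
# negative cases (target=0); ratings of 3 are dropped.
positives = ratings[ratings["rating"] >= 4].assign(target=1)
negatives = ratings[ratings["rating"] <= 2].assign(target=0)
labeled = pd.concat([positives, negatives], ignore_index=True)

print(labeled["target"].value_counts())  # roughly 55,375 positives vs 17,480 negatives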

In the second experiment, the same 55,375 cases with ratings 4 and 5 were extracted from the MovieLens 100k dataset as positive cases (target=1), while the 17,480 negative cases were randomly sampled from unknown cells (user-item pairs with no rating). The training data was constructed from these, and the model's performance was evaluated. The evaluation results were as follows.

-----------------------------------

              precision    recall  f1-score   support

           0       0.77      0.60      0.67      5292
           1       0.88      0.94      0.91     16565

    accuracy                           0.86     21857

AUC = 0.8838642248327038
RMSE = 0.325668345531158
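For the second experiment, the random negatives from unknown cells can be drawn roughly as in the sketch below. Again this is illustrative rather than my exact code; it reuses the ratings frame from the previous snippet, and the rejection-sampling loop is just one simple way to avoid already-rated pairs.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Every (user, item) pair that already has a rating; everything else is an "unknown cell".
known = set(zip(ratings["user_id"], ratings["item_id"]))
users = ratings["user_id"].unique()
items = ratings["item_id"].unique()

# Rejection-sample 17,480 unknown cells as negatives. With roughly 1.49 million
# unrated cells in the 943 x 1682 matrix, collisions with known ratings are rare.
n_neg = 17_480
sampled = set()
while len(sampled) < n_neg:
    pair = (rng.choice(users), rng.choice(items))
    if pair not in known:
        sampled.add(pair)

neg_random = pd.DataFrame(list(sampled), columns=["user_id", "item_id"]).assign(target=0)
pos = ratings[ratings["rating"] >= 4][["user_id", "item_id"]].assign(target=1)
labeled_random = pd.concat([pos, neg_random], ignore_index=True)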

Initially, I expected the first experiment to perform better than the second. My reasoning was that the explicit negative cases (ratings 1 and 2) still carry user preference patterns, whereas randomly sampled negative cases do not. For example, when negative cases are generated at random for a user who likes the SF genre, other SF movies that the user might actually enjoy can be included as negatives. Mixing signals that the user both likes and dislikes SF movies should, I assumed, degrade the performance of the recommendation model. However, the results showed the opposite: the second model, trained with randomly generated negative cases, outperformed the first. Is this because there is a lot of noise in the user rating information? What do you think is the reason?
