While investigating the effect of negative cases on model performance using the MovieLens 100k dataset, I ran into a question. I performed two experiments to evaluate model performance.

In the first experiment, 55,375 cases with ratings 4 and 5 were extracted from the MovieLens 100k dataset and labeled as positive cases (target=1), and 17,480 cases with ratings 1 and 2 were extracted as negative cases (target=0). After training the model on this data, the performance evaluation results were as follows.

--------------------------------------

              precision    recall  f1-score   support

           0       0.67      0.47      0.55      5200
           1       0.85      0.93      0.89     16657

    accuracy                           0.82     21857

AUC = 0.8306274331419916
RMSE = 0.36533634653541674
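For reference, the first dataset can be constructed roughly as in the sketch below. This is a simplified pandas illustration, not my exact pipeline; the file path and column names are assumed from the standard MovieLens 100k distribution (u.data is tab-separated: user id, item id, rating, timestamp).

import pandas as pd

# Load the MovieLens 100k ratings file (path and column names assumed).
ratings = pd.read_csv("ml-100k/u.data", sep="\t",
                      names=["user_id", "item_id", "rating", "timestamp"])

# Ratings 4-5 become positive cases (target=1), ratings 1-2 become explicit
# negative cases (target=0); ratings of 3 are dropped.
positives = ratings[ratings["rating"] >= 4].assign(target=1)
negatives = ratings[ratings["rating"] <= 2].assign(target=0)
labeled = pd.concat([positives, negatives], ignore_index=True)

print(labeled["target"].value_counts())  # roughly 55,375 positives vs 17,480 negatives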

In the second experiment, the same 55,375 cases with ratings 4 and 5 were extracted from the MovieLens 100k dataset as positive cases (target=1), while the 17,480 negative cases were randomly sampled from unknown cells (user-item pairs with no rating). The training data was constructed from these, and the model's performance was evaluated. The evaluation results were as follows.

-----------------------------------

              precision    recall  f1-score   support

           0       0.77      0.60      0.67      5292
           1       0.88      0.94      0.91     16565

    accuracy                           0.86     21857

AUC = 0.8838642248327038
RMSE = 0.325668345531158
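For the second experiment, the random negatives from unknown cells can be drawn roughly as in the sketch below. Again this is illustrative rather than my exact code; it reuses the ratings frame from the previous snippet, and the rejection-sampling loop is just one simple way to avoid already-rated pairs.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Every (user, item) pair that already has a rating; everything else is an "unknown cell".
known = set(zip(ratings["user_id"], ratings["item_id"]))
users = ratings["user_id"].unique()
items = ratings["item_id"].unique()

# Rejection-sample 17,480 unknown cells as negatives. With roughly 1.49 million
# unrated cells in the 943 x 1682 matrix, collisions with known ratings are rare.
n_neg = 17_480
sampled = set()
while len(sampled) < n_neg:
    pair = (rng.choice(users), rng.choice(items))
    if pair not in known:
        sampled.add(pair)

neg_random = pd.DataFrame(list(sampled), columns=["user_id", "item_id"]).assign(target=0)
pos = ratings[ratings["rating"] >= 4][["user_id", "item_id"]].assign(target=1)
labeled_random = pd.concat([pos, neg_random], ignore_index=True)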

Initially, I expected the first experiment to perform better than the second. My reasoning was that the explicit negative cases (ratings 1 and 2) still carry user preference patterns, whereas randomly sampled negative cases do not. For example, when negative cases are generated at random for a user who likes the SF genre, other SF movies that the user might actually enjoy can be included as negatives. Mixing signals that the user both likes and dislikes SF movies should, I assumed, degrade the performance of the recommendation model. However, the results showed the opposite: the second model, trained with randomly generated negative cases, outperformed the first. Is this because there is a lot of noise in the user rating information? What do you think is the reason?
