Any advice on data mining with an imbalanced dataset for recommender systems?

Michael James Siers

Hi Nidhi,

The simplest way of approaching the class imbalance problem is by sampling during pre-processing.

Sampling can be either oversampling or undersampling. Oversampling is done by increasing the number of minority class examples, whereas undersampling is done by decreasing the number of majority class examples.

In your example, the minority class is "1" and the majority class is "5". If you were to choose oversampling, you could duplicate the third record. If you were to choose undersampling, you could remove either the first or second record.
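For illustration, here is a minimal Python sketch of random oversampling and random undersampling. The original example records are not shown in this thread, so the (user, item, rating) rows below are hypothetical, and treating the rating value as the class label is an assumption.

```python
import random
from collections import Counter

# Hypothetical (user, item, rating) records; made up for illustration only.
records = [
    ("u1", "i1", 5),
    ("u2", "i1", 5),
    ("u3", "i2", 1),
]

def group_by_class(rows, label_of):
    """Split rows into lists keyed by their class label."""
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    return groups

def random_oversample(rows, label_of, seed=0):
    """Duplicate randomly chosen rows of the smaller classes (sampling with
    replacement) until every class is as large as the largest one."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_of)
    target = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(g)
        out.extend(rng.choice(g) for _ in range(target - len(g)))
    return out

def random_undersample(rows, label_of, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_of)
    target = min(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out.extend(rng.sample(g, target))
    return out

rating = lambda r: r[2]
print(Counter(map(rating, random_oversample(records, rating))))   # each rating appears twice
print(Counter(map(rating, random_undersample(records, rating))))  # each rating appears once
```

Oversampling keeps all the original data but repeats minority records; undersampling discards majority records, so it loses information when the dataset is small.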

In answer to your question about pre-processing: Sampling is typically done as a pre-processing step.

I would personally recommend SMOTE, a highly popular oversampling algorithm (3000 citations). There is an implementation of SMOTE available in WEKA.

I've attached a link to the SMOTE paper which describes the class imbalance problem very well.
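A minimal plain-Python sketch of the SMOTE idea, under stated assumptions: each minority-class record has already been turned into a numeric feature vector, and base samples are visited round-robin. Each synthetic point is placed at a random position on the line segment between a minority sample and one of its k nearest minority neighbours. The function name and the toy vectors are hypothetical; for real work the WEKA implementation (or another maintained library) is the safer choice.

```python
import math
import random

def smote(minority, n_synthetic, k=5, seed=0):
    """SMOTE-style synthetic oversampling (after Chawla et al.).

    minority: list of numeric feature vectors (tuples) from the minority class.
    n_synthetic: how many synthetic vectors to create.
    k: how many nearest minority neighbours to choose from.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for n in range(n_synthetic):
        i = n % len(minority)  # visit minority samples round-robin
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        # k nearest minority neighbours of `base` by Euclidean distance.
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        chosen = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append(tuple(b + gap * (c - b) for b, c in zip(base, chosen)))
    return synthetic

# Hypothetical 2-D feature vectors for the minority class.
minority = [(1.0, 1.0), (2.0, 1.0), (1.5, 2.0)]
new_points = smote(minority, n_synthetic=4, k=2)
print(new_points)
```

Unlike plain duplication, the new points are not copies of existing records, which is why SMOTE tends to overfit less than random oversampling.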

I hope this helped!

Mike.

https://www.jair.org/media/953/live-953-2037-jair.pdf

Mani A.

The question is not very clear.

Whatever the case, you could use association rule mining on the data.
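One way to read this suggestion, sketched under assumptions: treat each user's liked items as a transaction and mine rules of the form "users who liked X also liked Y", scored by support and confidence. This toy version only counts item pairs (it is not a full Apriori implementation), and the baskets and thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical "baskets": the set of items each user liked.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]

def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return rules (x, y, support, confidence) meaning 'liked x -> likes y'."""
    n = len(baskets)
    item_count = {}
    pair_count = {}
    for basket in baskets:
        for item in basket:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(sorted(basket), 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n  # fraction of users who liked both items
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]  # P(likes y | liked x)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

for x, y, support, confidence in pair_rules(baskets):
    print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
```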

Turgay Temel

Hello,

You can use a collaborative learning (collaborative filtering) algorithm.

1- Build a matrix in which the rows are customers/clients and the columns are the items to be recommended.

2- Assume that the known entries can be represented by a regression model, then generalize it to predict what the nearest neighbour of each unknown entry might be.

3- Optimize the cost as if all entries were known.

4- Find the best matching prediction.
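A minimal sketch of one common way to realise these steps: latent-factor matrix factorization trained with stochastic gradient descent. This is a swapped-in technique, not necessarily Turgay's exact method: it has no nearest-neighbour step, and its squared-error cost is summed over the known entries only (the usual formulation) rather than over all entries. The rating matrix, function names and hyperparameters (k, lr, reg, epochs) are hypothetical.

```python
import math
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=5000, seed=0):
    """Fit user factors P and item factors Q so that the dot product
    P[u] . Q[i] approximates every observed rating r_ui.

    ratings: list of (user_index, item_index, rating) for the known entries only.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error plus L2 penalty (constant factors folded into lr).
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    return sum(p * q for p, q in zip(P[u], Q[i]))

# Step 1: a hypothetical user x item rating matrix; None marks an unknown entry.
R = [
    [5, 3, None, 1],
    [4, None, None, 1],
    [1, 1, None, 5],
    [1, None, None, 4],
    [None, 1, 5, 4],
]
known = [(u, i, r) for u, row in enumerate(R) for i, r in enumerate(row) if r is not None]

# Steps 2-3: fit the model to the known entries.
P, Q = factorize(known, n_users=len(R), n_items=len(R[0]))

# Step 4: fill in the unknown entries with the model's predictions.
for u, row in enumerate(R):
    for i, r in enumerate(row):
        if r is None:
            print(f"user {u}, item {i}: predicted {predict(P, Q, u, i):.2f}")
```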

I hope this will help.

Turgay

Nidhi Kushwaha

Thanks Michael,

But doesn't this introduce noisy data?