Usually, I split my dataset into two subsets: training (70 or 80%) and testing (30 or 20%), and I run 10-fold cross-validation on the training subset only, so I am sure that all training observations take part in the learning stage even when the class proportions are imbalanced, e.g., 85% for level 1 and 15% for level 2. You can also tune the parameters of the selected model, for example the shrinkage rate (learning rate), which controls the speed of learning.
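A minimal sketch of this setup with scikit-learn, assuming a binary target and a gradient-boosting classifier; the synthetic data, the 70/30 split, the learning-rate grid, and the scoring metric are illustrative choices, not prescriptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative imbalanced data: roughly 85% class 0, 15% class 1
X, y = make_classification(n_samples=2000, weights=[0.85, 0.15], random_state=0)

# 70/30 split, stratified so both subsets keep the 85/15 class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# 10-fold cross-validation on the training subset only,
# tuning the shrinkage (learning rate) of the boosting model
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"learning_rate": [0.01, 0.05, 0.1]},
    cv=cv,
    scoring="balanced_accuracy",
)
grid.fit(X_train, y_train)

print("Best learning rate:", grid.best_params_)
print("Balanced accuracy on the held-out test set:", grid.score(X_test, y_test))
```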
First, extract the features from the data that carry the highest weight and label them as class 1, then split the data into training, testing, and validation subsets.
Second, apply any classification algorithm you think is suited to your data.
The predictions will tell you whether new data belong to class 1; otherwise, the data are classified as class 2.
If you have more than two classes, repeat the same process to extract features for each class, regardless of their weight, and classify new data according to the constraints and criteria.
The question posed by Viswapriya Elangovan is very complex, and there is a vast literature attempting to provide an answer.
Inès François makes valid points but does not directly address the specific question about new techniques to overcome the imbalanced class problem in Data Science.
On the other hand, Oger Amanuel's response is cryptic and not very helpful: dividing features into classes is a pointless operation, unless Oger meant something else and simply confused the terminology, in which case what he suggests is redundant.
However, to clarify, it is not easy to determine when one is dealing with an imbalanced class problem in Data Mining. Typically, an imbalance index is calculated as the ratio between the number of elements in the positive (minority) class and the negative (majority) class; there are also other ways to calculate the index, for example using entropy. There is no agreed threshold of the index beyond which one can speak of an imbalanced class, but certainly, in my experience, 85% and 15% do not pose a problem for the majority of machine learning algorithms, and Inès does not bring anything new to the table.
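For concreteness, the ratio-based index mentioned above can be computed directly from the class counts; this is only a minimal sketch, and the 85/15 example gives a much milder ratio than the under-1% scenario discussed next:

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of minority-class to majority-class counts (1.0 = perfectly balanced)."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())

# An 85/15 split gives ~0.18, while a 99/1 split gives ~0.01
print(imbalance_ratio([0] * 85 + [1] * 15))
print(imbalance_ratio([0] * 99 + [1] * 1))
```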
The approaches to overcome the imbalanced class problem (for example, when less than 1% of the data points in the dataset belong to the positive class) essentially remain two:
1) Data resampling: random undersampling (drawing several sub-samples of the majority class and then, for example, using the MCC to decide which one to keep), oversampling with the creation of synthetic data, or a combination of under- and oversampling.
2) Cost-sensitive learning: using a cost matrix during the learning phase (see the sketch below).
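As an illustration of the cost-sensitive route, many scikit-learn classifiers accept per-class weights that play the role of a simple diagonal cost matrix; the 10:1 weighting and the synthetic data below are only assumptions for the sketch, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative data with less than 1% positive examples
X, y = make_classification(n_samples=2000, weights=[0.99, 0.01], random_state=0)

# Penalize errors on the minority class (1) ten times more than on the majority class (0)
clf = LogisticRegression(class_weight={0: 1, 1: 10}, max_iter=1000).fit(X, y)

# Alternatively, "balanced" sets weights inversely proportional to class frequencies
clf_auto = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
```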
Other techniques are less general and may depend on the nature of the problem and the data being analyzed.
A widely adopted and perhaps the most straightforward method for dealing with highly imbalanced datasets is called resampling. It consists of removing samples from the majority class (under-sampling) and/or adding more examples from the minority class (over-sampling).
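A minimal sketch of both directions, assuming the imbalanced-learn library is installed and using SMOTE as one common way to create synthetic minority samples:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=2000, weights=[0.99, 0.01], random_state=0)
print("original:", Counter(y))

# Under-sampling: randomly drop majority-class samples until the classes are balanced
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("under-sampled:", Counter(y_under))

# Over-sampling: create synthetic minority-class samples with SMOTE
X_over, y_over = SMOTE(random_state=0).fit_resample(X, y)
print("over-sampled:", Counter(y_over))
```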