How can I deal with imbalanced data in regression problems?

Jochen Wilhelm

" Do you think this conclusion is correct? "

Yes, as you have little data for middle speeds, the prediction is uncertain, leading to many "quite false" point estimates.

It is striking that the observed speeds are either quite low or quite high. Speeds in between seem to occur only during transitions from high to low speed (and vice versa), so they are not stable and may not be well reflected by the current situation, which makes predicting them even more difficult, if not impossible, and arguably not meaningful.

For such data, a simpler "classification" into high-speed vs. low-speed would be more appropriate, to my understanding (which might be very wrong, though). The "in-between" speeds would then be assigned to "high" and "low" with roughly similar probability.

I would take this observation as a reason to investigate why these intermediate speeds are so rare, and to find a plausible real-world explanation that would then help to model the data better.
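The high/low classification suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions: the single predictor, the synthetic data, and the cut-off of 33.5 (the midpoint of the sparse [22, 45] range mentioned later in the thread) are all made up for the example, and the logistic model is fitted with plain gradient descent to keep the snippet free of dependencies.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, labels, lr=0.1, epochs=2000):
    """Fit p(high | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. z
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Synthetic data (assumption): one standardized predictor, two speed regimes.
random.seed(0)
rows = []
for _ in range(200):
    rows.append((random.gauss(-1.5, 0.7), random.uniform(46, 70)))  # high speed
    rows.append((random.gauss(1.5, 0.7), random.uniform(5, 21)))    # low speed

SPEED_CUTOFF = 33.5  # hypothetical threshold separating "low" from "high"
xs = [x for x, _ in rows]
labels = [1 if speed > SPEED_CUTOFF else 0 for _, speed in rows]

w, b = fit_logistic(xs, labels)

def prob_high(x):
    """Estimated probability that the speed is in the high regime."""
    return sigmoid(w * x + b)

for x in (-1.5, 0.0, 1.5):
    print(f"x = {x:+.1f}  p(high) = {prob_high(x):.2f}")
```

Inputs that look typical of one regime get a probability near 0 or 1, while inputs between the two regimes get a probability closer to 0.5, which is the behaviour described for the "in-between" speeds.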

Masoud Masoumi Moghadam

Thanks for the answer. I have an idea for tackling the problem, but I am not sure whether it will work.

Add a categorical feature to the data, setting "class A" for speeds in the range [22, 45] (where data points are lacking) and "class B" for all other data.

Then use SMOTE to oversample class A, i.e. generate synthetic data for class A.

Could this be a good solution, assuming there is no problem with how the data were observed?

I don't know. I still think the relevant problem (is the speed high or low) can be tackled in a considerably simpler way, unless there is some valuable information content in the slight speed differences within these groups (which I think is not the case). But since I don't understand the details of your research, I might be wrong. It's just my impression "from outside"...

Xiaoqun Yu

If you are interested in the SMOTE method, see https://towardsdatascience.com/machine-learning-multiclass-classification-with-imbalanced-data-set-29f6a177c1a. For details, you could check the imblearn module in Python.

Mekonnen H Daba

Interesting question and discussion...

Paula Branco

There are already some methods to deal with imbalanced regression problems. For instance, you can use SMOTE for regression (conference paper: "SMOTE for Regression").

You also have an R package (UBL) available on CRAN (https://cran.r-project.org/web/packages/UBL/index.html) with more pre-processing methods that allow you to deal with this type of problem.

This package has SMOTE for regression implemented as well as several other alternative pre-processing methods.
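To give a rough idea of what SMOTE-style oversampling does when the target is continuous, here is a minimal sketch in plain Python. It is not the UBL or imblearn implementation and it simplifies the published method: rows whose speed falls in the sparse [22, 45] range quoted earlier in the thread are treated as rare, and synthetic rows are created by linear interpolation between a rare row and one of its nearest rare neighbours, interpolating the target with the same weight. The data, the function names and the parameter values are hypothetical.

```python
import math
import random

RARE_LOW, RARE_HIGH = 22.0, 45.0  # sparse speed range quoted in the thread

def is_rare(row):
    """A row is (features, speed); it is rare if speed lies in the sparse range."""
    return RARE_LOW <= row[1] <= RARE_HIGH

def oversample_rare(rows, n_new, k=3, rng=random):
    """Create n_new synthetic rows by interpolating between rare rows and
    their k nearest rare neighbours (distance measured in feature space)."""
    rare = [r for r in rows if is_rare(r)]
    if len(rare) < 2:
        raise ValueError("need at least two rare rows to interpolate")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rare)
        neighbours = sorted(
            (r for r in rare if r is not base),
            key=lambda r: math.dist(r[0], base[0]),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # position along the segment base -> other
        features = tuple(a + t * (b - a) for a, b in zip(base[0], other[0]))
        speed = base[1] + t * (other[1] - base[1])  # target interpolated too
        synthetic.append((features, speed))
    return synthetic

def random_features():
    # Two made-up numeric predictors.
    return (random.gauss(0, 1), random.gauss(0, 1))

# Synthetic data (assumption): many low/high speeds, very few in between.
random.seed(1)
data = [(random_features(), random.uniform(5, 21)) for _ in range(100)]
data += [(random_features(), random.uniform(46, 70)) for _ in range(100)]
data += [(random_features(), random.uniform(22, 45)) for _ in range(8)]

new_rows = oversample_rare(data, n_new=50)
augmented = data + new_rows
print(sum(is_rare(r) for r in data), "rare rows before,",
      sum(is_rare(r) for r in augmented), "after")
```

Because every synthetic row lies between two observed rare rows, the method can only fill in the region already covered by the few intermediate observations; it adds no information about why those speeds are rare in the first place.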

Nick Kunz

As previously mentioned by Paula Branco, I think what might help you given your problem is her Synthetic Minority Over-Sampling Technique for Regression (SMOTER).

If you're a Python user, I'm currently working to improve my implementation of the SMOGN algorithm, a variant of SMOTER. https://github.com/nickkunz/smogn

Also, there are a few examples on Kaggle where SMOGN has been applied to improve prediction results. https://www.kaggle.com/aleksandradeis/regression-addressing-extreme-rare-cases