It would be very helpful if you could give more information about the problem you are facing. I assume there is a concrete task behind your question.
I can only assume your task is classification. How many classes do you have? How imbalanced is the dataset? Is the distribution of classes in your training data the same as in the target application (test data)? How long are the feature vectors? Do you want to use a specific learning method? Does your data set cause you any specific problems?
For example, with two-class SVMs and neural networks you rarely have to do anything special for unbalanced training sets. Usually it is sufficient to calibrate the final classifier properly (set a proper decision threshold).
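To make the threshold-calibration point concrete, here is a minimal sketch (my own illustration, assuming scikit-learn and a synthetic imbalanced set): train a plain classifier, then pick the decision threshold on held-out data instead of using the default 0.5.

```python
# Sketch: calibrate the decision threshold of a binary classifier trained on
# an imbalanced set, instead of resampling the data. All names and numbers
# here are illustrative assumptions, not from the original question.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]

# Compare the default 0.5 threshold with a threshold chosen to maximize F1
thresholds = np.linspace(0.05, 0.95, 19)
scores = [f1_score(y_te, probs >= t) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
print(f"default F1: {f1_score(y_te, probs >= 0.5):.3f}")
print(f"best threshold {best_t:.2f}, F1: {max(scores):.3f}")
```

In practice you would pick the threshold on a validation split (not the test set) and optimize whatever metric matters for your application.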
Regards.
PS: I can't help it, I have to respond to the answer of Indrajit Mandal. Indrajit, do you think your answer is helpful in any way? I would strongly question the value of what you wrote.
How large is the dimension? For very high-dimensional data, PCA or Random Projection (RP) can be used to reduce the dimension. RP is a data-independent transformation and has shown a good ability to densify the data space. For imbalanced data sets, SMOTE is well known as a way to address the issue.
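As a rough sketch of combining the two ideas (my own example, assuming scikit-learn; in practice you would use imbalanced-learn's SMOTE, but here a minimal SMOTE-style interpolation is written by hand to keep the snippet self-contained):

```python
# Sketch: (1) data-independent Random Projection to reduce dimension,
# (2) SMOTE-style oversampling of the minority class by interpolating
# between a sample and one of its k nearest minority-class neighbors.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.random_projection import GaussianRandomProjection

rng = np.random.default_rng(0)
X_major = rng.normal(0, 1, size=(500, 1000))   # majority class, 1000-dim
X_minor = rng.normal(2, 1, size=(20, 1000))    # minority class

# 1) Random Projection: the projection matrix does not depend on the data
rp = GaussianRandomProjection(n_components=50, random_state=0)
X_all = rp.fit_transform(np.vstack([X_major, X_minor]))
X_minor_rp = X_all[len(X_major):]

# 2) SMOTE-style synthesis: new point = x_i + gap * (x_j - x_i)
k = 5
nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minor_rp)
_, idx = nn.kneighbors(X_minor_rp)
synthetic = []
for i in range(len(X_minor_rp)):
    j = idx[i, rng.integers(1, k + 1)]   # skip position 0 (the point itself)
    gap = rng.random()
    synthetic.append(X_minor_rp[i] + gap * (X_minor_rp[j] - X_minor_rp[i]))
synthetic = np.asarray(synthetic)
print(synthetic.shape)  # (20, 50)
```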
There are a number of methods to reduce dimensionality, such as SVD (Singular Value Decomposition), PCA (Principal Component Analysis), and ICA (Independent Component Analysis). Select the method based on your application; for text data, SVD is commonly used.
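For the text case mentioned above, a small sketch (assuming scikit-learn; the toy documents are my own) of SVD applied to a TF-IDF matrix, i.e. the usual latent semantic analysis setup:

```python
# Sketch: truncated SVD on a sparse TF-IDF document-term matrix.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "churn prediction with imbalanced data",
    "dimensionality reduction for text data",
    "svd and pca reduce feature dimension",
    "customer churn and class imbalance",
]
tfidf = TfidfVectorizer().fit_transform(docs)        # sparse doc-term matrix
svd = TruncatedSVD(n_components=2, random_state=0)   # keep 2 latent dimensions
reduced = svd.fit_transform(tfidf)
print(reduced.shape)  # (4, 2)
```

TruncatedSVD works directly on sparse matrices, which is why it is preferred over plain PCA for text.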
You may be interested in the paper "Learning from Imbalanced Data" from He and Garcia. In particular, one of the issues over which they focus is "the combination of imbalanced data and the small sample size problem" (small sample size being the major problem of high dimensional input). For example, they mention a few techniques for dealing with both problems in Section 3.4.
@Michal Hradis: it seems to me that Mandal is making a large number of such comments, together with downvoting a lot of answers that contradict him. Maybe we should signal this to the RG staff? (Sorry for going off-topic.)
Thanks to all. Well, we are facing both problems, high dimensionality and imbalance of examples, in our churn-prediction problem (Michal: our dataset has 50,000 samples and 200 features, with a minority class of 3.7%; it is a binary-class problem).
As mentioned by Natalia, we had used SMOTE earlier (data preprocessing). But we have not tried any special algorithms customized for imbalanced examples, such as the SVM variants reported for this purpose. Similarly, we did use PSO and mRMR for dimensionality reduction. But the problem is that once you decide to do both dimensionality reduction and balancing, which one should be performed first? They are going to affect each other. In my opinion, data balancing should be performed first. What do you suggest?
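One way to sketch the "balance first, then reduce" order, done inside the training split only so that neither step sees test data. This is my own illustration under loud assumptions: random oversampling stands in for SMOTE, and SelectKBest stands in for mRMR/PSO, purely to keep the example self-contained with scikit-learn.

```python
# Sketch: (1) balance the training split, (2) fit dimensionality reduction on
# the balanced training data, (3) apply the SAME transform to the test data.
# Class ratio mimics the 3.7% minority mentioned in the thread; everything
# else (dataset, oversampler, selector) is a stand-in assumption.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=200, n_informative=20,
                           weights=[0.963, 0.037], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# 1) balance the training split (random oversampling of the minority class)
rng = np.random.default_rng(0)
minority = np.flatnonzero(y_tr == 1)
extra = rng.choice(minority, size=np.sum(y_tr == 0) - len(minority))
X_bal = np.vstack([X_tr, X_tr[extra]])
y_bal = np.concatenate([y_tr, y_tr[extra]])

# 2) fit feature selection on the balanced data only
selector = SelectKBest(f_classif, k=30).fit(X_bal, y_bal)
X_bal_red = selector.transform(X_bal)
X_te_red = selector.transform(X_te)   # same transform applied to test data
print(X_bal_red.shape[1], X_te_red.shape[1])  # 30 30
```

The key point either way round is to keep both steps inside the training fold; reversing the order only requires swapping steps 1 and 2, so both variants can be compared by cross-validation.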