Usually, I split my dataset into two subsets: training (70 or 80%) and testing (30 or 20%), and I run 10-fold cross-validation on the training subset only, so I am sure that all training observations take part in the learning stage even when the class proportions are imbalanced, e.g., 85% for level 1 and 15% for level 2. You can also tune the parameters of the selected model, for example the shrinkage rate (learning rate), which controls the speed of learning.
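A minimal sketch of this setup with scikit-learn, assuming a binary target and a gradient-boosting classifier; the synthetic data, the 70/30 split, the learning-rate grid, and the scoring metric are illustrative choices, not prescriptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, StratifiedKFold, GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

# Illustrative imbalanced data: roughly 85% class 0, 15% class 1
X, y = make_classification(n_samples=2000, weights=[0.85, 0.15], random_state=0)

# 70/30 split, stratified so both subsets keep the 85/15 class proportions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# 10-fold cross-validation on the training subset only,
# tuning the shrinkage (learning rate) of the boosting model
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"learning_rate": [0.01, 0.05, 0.1]},
    cv=cv,
    scoring="balanced_accuracy",
)
grid.fit(X_train, y_train)

print("Best learning rate:", grid.best_params_)
print("Balanced accuracy on the held-out test set:", grid.score(X_test, y_test))
```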
First, extract the features from the data that carry the highest weight and label them as class 1, then split the data into training, testing, and validation subsets.
Second, apply any classification algorithm you think is suited to your data.
The predictions will tell you whether new data belong to class 1; otherwise, the data are classified as class 2.
If you have more than two classes, repeat the same process to extract features for each class, regardless of their weight, and classify new data according to the constraints and criteria.
The question posed by Viswapriya Elangovan is very complex, and there is a vast literature attempting to provide an answer.
Inès François makes valid points but does not directly address the specific question about new techniques to overcome the imbalanced class problem in Data Science.
On the other hand, Oger Amanuel's response is cryptic and not very helpful: dividing features into classes is a pointless operation, unless Oger meant something else and simply confused the terminology, in which case what he suggests is redundant.
However, to clarify, it is not easy to determine when one is dealing with an imbalanced class problem in Data Mining. Typically, an imbalance index is calculated as the ratio between the number of elements in the positive (minority) class and the negative (majority) class; there are also other ways to calculate the index, for example using entropy. There is no agreed threshold of the index beyond which one can speak of an imbalanced class, but certainly, in my experience, 85% and 15% do not pose a problem for the majority of machine learning algorithms, and Inès does not bring anything new to the table.
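For concreteness, the ratio-based index mentioned above can be computed directly from the class counts; this is only a minimal sketch, and the 85/15 example gives a much milder ratio than the under-1% scenario discussed next:

```python
from collections import Counter

def imbalance_ratio(labels):
    """Ratio of minority-class to majority-class counts (1.0 = perfectly balanced)."""
    counts = Counter(labels)
    return min(counts.values()) / max(counts.values())

# An 85/15 split gives ~0.18, while a 99/1 split gives ~0.01
print(imbalance_ratio([0] * 85 + [1] * 15))
print(imbalance_ratio([0] * 99 + [1] * 1))
```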
The approaches to overcome the imbalanced class problem (for example, when less than 1% of the data points in the dataset belong to the positive class) essentially remain two:
1) Data resampling: random undersampling (drawing several sub-samples of the majority class and then, for example, using the MCC to decide which one to keep), oversampling with the creation of synthetic data, or a combination of under- and oversampling.
2) Cost-sensitive learning: using a cost matrix during the learning phase (see the sketch below).
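As an illustration of the cost-sensitive route, many scikit-learn classifiers accept per-class weights that play the role of a simple diagonal cost matrix; the 10:1 weighting and the synthetic data below are only assumptions for the sketch, not a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Illustrative data with less than 1% positive examples
X, y = make_classification(n_samples=2000, weights=[0.99, 0.01], random_state=0)

# Penalize errors on the minority class (1) ten times more than on the majority class (0)
clf = LogisticRegression(class_weight={0: 1, 1: 10}, max_iter=1000).fit(X, y)

# Alternatively, "balanced" sets weights inversely proportional to class frequencies
clf_auto = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
```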
Other techniques are less general and may depend on the nature of the problem and the data being analyzed.
A widely adopted and perhaps the most straightforward method for dealing with highly imbalanced datasets is called resampling. It consists of removing samples from the majority class (under-sampling) and/or adding more examples from the minority class (over-sampling).
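A minimal sketch of both directions, assuming the imbalanced-learn library is installed and using SMOTE as one common way to create synthetic minority samples:

```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE

X, y = make_classification(n_samples=2000, weights=[0.99, 0.01], random_state=0)
print("original:", Counter(y))

# Under-sampling: randomly drop majority-class samples until the classes are balanced
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
print("under-sampled:", Counter(y_under))

# Over-sampling: create synthetic minority-class samples with SMOTE
X_over, y_over = SMOTE(random_state=0).fit_resample(X, y)
print("over-sampled:", Counter(y_over))
```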