09 September 2018

I have a couple of datasets, each with 10,000 to 40,000 rows. The formats, columns, and data differ from one dataset to the next. Each dataset contains anomalies (less than 2% of the rows); for example, a 24,000-row dataset has 172 anomalies. While building models to predict the anomalies (binary classification), we are seeing a lot of errors, which may be due to the skewed class distribution. I am looking for pointers, approaches, or methods for handling datasets with such skewed (imbalanced) classes. Thanks in advance for your help.
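To put the skew in numbers, here is a small Python sketch using the 24,000-row example above (172 anomalies). It is plain arithmetic, not a model: it shows why overall accuracy can look excellent while no anomaly is caught, and what inverse-frequency ("balanced") class weights, one common reweighting scheme, would come out to for these counts. The variable names are illustrative only.

```python
# Illustrative arithmetic for the 24,000-row example in the question:
# 172 anomalies (positive class) vs. the remaining normal rows (negative class).
n_total = 24_000
n_anomalies = 172
n_normal = n_total - n_anomalies

# Fraction of rows that are anomalies.
anomaly_rate = n_anomalies / n_total

# A trivial "model" that always predicts "normal" is correct on every
# normal row and misses every anomaly.
baseline_accuracy = n_normal / n_total
baseline_recall = 0 / n_anomalies  # no anomaly is ever flagged

# Inverse-frequency ("balanced") class weights: n_total / (n_classes * n_class).
# scikit-learn's class_weight="balanced" uses this same formula.
n_classes = 2
weight_normal = n_total / (n_classes * n_normal)
weight_anomaly = n_total / (n_classes * n_anomalies)

print(f"anomaly rate:      {anomaly_rate:.2%}")
print(f"baseline accuracy: {baseline_accuracy:.2%}")
print(f"baseline recall:   {baseline_recall:.0%}")
print(f"weight (normal):   {weight_normal:.3f}")
print(f"weight (anomaly):  {weight_anomaly:.3f}")
```

With these counts the always-"normal" baseline scores about 99.28% accuracy with 0% recall on anomalies, and each anomaly would receive a weight roughly 139 times that of a normal row, so both classes contribute equally to a weighted loss.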
