01 January 2018

Dear respected researchers,

I am working on a structured biomedical dataset that has many data type inconsistencies, outliers, and missing values (instances) across seven independent variables (attributes). I am considering pre-processing methods such as data standardization and imputation to address these issues. However, there are two versions of these pre-processing methods: supervised and unsupervised.

My two main questions about common practice are:

1. Should I apply an unsupervised discretisation method to the dataset to resolve the data type issues if I subsequently conduct cluster analysis using the k-means clustering algorithm?

2. After completing the clustering task above, should I apply a supervised discretisation method to the same dataset when I train a model for a classification task using supervised machine learning algorithms?
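To make the terminology concrete, here is a minimal sketch of what I mean by unsupervised pre-processing: median imputation, standardization, and equal-width discretisation, none of which look at a class label. The function names and the sample values are hypothetical and only for illustration, and the sketch assumes a numeric attribute that is not constant. A supervised discretisation method would differ in that it chooses the bin edges using the class labels.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def standardise(values):
    """Rescale to zero mean and unit (population) standard deviation.

    Assumes the attribute is not constant, otherwise the deviation is zero.
    """
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def equal_width_bins(values, n_bins):
    """Unsupervised discretisation: the bin edges depend only on the
    attribute's own range, never on class labels."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical attribute with two missing values (not taken from my dataset).
raw = [4.0, None, 6.0, 8.0, None, 10.0, 12.0]
filled = impute_median(raw)          # missing values become the median
scaled = standardise(filled)         # continuous input, e.g. for k-means
bins = equal_width_bins(filled, 3)   # discretised version of the attribute
```

In this sketch `scaled` keeps the attribute continuous while `bins` replaces it with three ordinal categories, which is the choice my first question is about.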

Thank you for sharing your knowledge and experience.
