Dear experts in Machine Learning,

As you all know, pre-processing the data set is an important step toward obtaining robust results with any machine learning algorithm. I know that scikit-learn can be used for this.

I usually pre-process the NIR spectra with a Savitzky-Golay second derivative and then carry out PLS (Partial Least Squares) regression. While developing the regression, I remove outlier samples that show large leverage and high residual variance. However, because I am a beginner in machine learning, I wonder how I should pre-process my data, in particular the NIR spectra:

1- Do I have to pre-process the NIR spectra separately?

2- Is it OK to standardize the whole data set (NIR spectra, color, and physical properties) together using the standardization in scikit-learn?

3- Do you have any experience pre-processing this type of data set (NIR spectra, color, and physical properties)?

Any example, book, paper, link, etc. will be appreciated.

Thanks in advance
