Dear experts in Machine Learning,

As you all know, pre-processing the data set is an important step toward obtaining robust results with any machine learning algorithm. I know that scikit-learn can be used for this.

I usually pre-process the NIR spectra with a second-derivative Savitzky-Golay filter and then carry out PLS (Partial Least Squares) regression. While developing the regression, I remove outlier samples that show large leverage and high residual variance. However, because I am a beginner in machine learning, I wonder how I should pre-process my data, in particular the NIR spectra:

1- Do I have to preprocess the NIR spectra separately?

2- Is it OK to process the whole data set (NIR spectra, color, and physical properties) with the standardization in scikit-learn?

3- Do you have any experience pre-processing this type of data set (NIR spectra, color, and physical properties)?

Any example, book, paper, link, etc. will be appreciated.

Thanks in advance
