Dear experts in Machine Learning,
As you all know, pre-processing the data set is an important step for obtaining robust results with any machine learning algorithm. I know that scikit-learn can be used for this.
I usually pre-process the NIR spectra with a Savitzky-Golay 2nd derivative and then carry out PLS (Partial Least Squares) regression. Then, while developing the regression, I remove outlier samples that have large leverage and high residual variance. However, because I am a beginner in Machine Learning, I wonder how I should pre-process my data, in particular the NIR spectra:
1- Do I have to preprocess the NIR spectra separately?
2- Is it OK to process the whole data set (NIR spectra, color, and physical properties) with scikit-learn's standardization (e.g. StandardScaler)?
3- Do you have any experience pre-processing this type of data set (NIR spectra, color, and physical properties)?
Any example, book, paper, link, etc. would be appreciated.
Thanks in advance