Please share some more details. An outlier can occur at random, or it can result from a failure of the experiment or of the observation. From a statistical perspective, the reason matters less than the fact that the point is an outlier and not part of the sample's distribution. To justify that, you sometimes need more repetitions or a larger sample to be sure it is an outlier you can neglect.
If you do not mind sharing your dataset, we could take a look. I cannot promise a solution, just fresh eyes.
Outliers in a chemometric model are observations with unexpected responses that do not fit the model properly. Explaining the presence of outliers is often very difficult, and it may be one of the hardest problems in a chemometric study. Experimental errors in the response data are one possible reason, though some samples in a particular dataset may also act through different mechanisms of action. Moreover, the intrinsic noise in the experimental data, as well as the methods used to develop the chemometric model, may lead to outliers.
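To make "observations that do not fit the model" a bit more concrete, here is a minimal sketch, not a recommended procedure. It assumes a simple straight-line calibration model, and it flags points whose residual has a large modified z-score (based on the median and the median absolute deviation). The data, the function names and the 3.5 cut-off are illustrative assumptions, not something taken from your dataset. Note also that the least-squares fit itself is pulled towards the outlier, so a flag is only a reason to go back to the raw data.

```python
from statistics import median


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def flag_residual_outliers(x, y, threshold=3.5):
    """Return indices whose residual has a modified z-score above threshold."""
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    if mad == 0:
        return []  # residuals too uniform to define a robust scale
    scores = [0.6745 * (r - med) / mad for r in residuals]
    return [i for i, s in enumerate(scores) if abs(s) > threshold]


# Hypothetical calibration data: the response at index 5 is deliberately off.
concentration = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
signal = [2.1, 3.9, 6.2, 8.0, 9.9, 20.0, 14.1, 16.0]
suspects = flag_residual_outliers(concentration, signal)
print(suspects)
```

A flagged index only says that the response is unexpected under this particular model. Whether it is an experimental error or a sample acting through a different mechanism still has to be judged from the chemistry.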
- Outliers are objects that lie far from the others or do not fit the model well. However, this does not mean that they should be removed. The data analyst must decide only after making sure that they are true outliers rather than merely extreme data. Before deciding, we should check the raw data and, if needed, consult the person who collected the data.
Possible reasons that the data are different:
1) Erroneous value caused by an instrument
2) Reading mistake
3) Typing / data transcription mistake
4) Sample may have been collected under different conditions
5) It may be an accidental extreme
Some tips about outliers:
- Atypical objects or variables
- If it is a result of an error, it should be eliminated
- If not, it may give us very important info
- Don’t include true outliers, don’t remove false outliers
- How to decide? Experience is very important
- Not removing outliers that should be removed causes us to model noise instead of important info
- Removing outliers that should not be removed makes us miss important info (we may not describe all phenomena hidden in the data).
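One way to support the decision in the tips above is to check how much the model depends on the suspect point. The sketch below is only an illustration under assumptions: a straight-line model, made-up data, and a hypothetical helper `slope_shift_without` that refits the line with one observation left out. It does not tell you whether the point is an error; it only shows what keeping or removing it would do to the model.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def slope_shift_without(x, y, index):
    """Return (slope using all points, slope with the point at index left out)."""
    _, slope_all = fit_line(x, y)
    x_kept = x[:index] + x[index + 1:]
    y_kept = y[:index] + y[index + 1:]
    _, slope_kept = fit_line(x_kept, y_kept)
    return slope_all, slope_kept


# Made-up data: the last response is far from the trend of the other five.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 30.0]
slope_all, slope_kept = slope_shift_without(x, y, 5)
print(f"slope with all points: {slope_all:.3f}")
print(f"slope without point 5: {slope_kept:.3f}")
```

If the two slopes differ a lot, the single point is driving the model, which is a good reason to go back to the raw data and to the person who collected it before deciding anything.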