I am tackling an industrial research problem in which massive-scale data, mostly arriving as a stream, has to be processed for outlier detection. The catch is that although some labels for the outliers of interest exist in the data, they are not reliable, so we have to discard them.

My approach to the problem revolves mainly around unsupervised techniques, whereas my employer insists on a trainable supervised technique, which would require an outlier label for every individual data point. In other words, he does not trust unsupervised techniques.
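To make the unsupervised direction concrete, this is roughly what I have in mind for the streaming case: score each point as it arrives with an online detector and keep only the score, without ever producing a hard label. The sketch below uses River's HalfSpaceTrees purely as an illustration; the library choice, feature names, parameter values, and the assumption that features are pre-scaled to [0, 1] are all my own placeholders, not a fixed part of the setup.

# Rough sketch only: streaming outlier scoring with an online detector.
# River / HalfSpaceTrees is an assumed tooling choice for illustration.
from river import anomaly

detector = anomaly.HalfSpaceTrees(
    n_trees=25,       # assumed values, would need tuning on the real stream
    height=8,
    window_size=250,
    seed=42,
)

def stream():
    # Placeholder for the real data source; yields one feature dict per record.
    # HalfSpaceTrees expects bounded features, so values are assumed scaled to [0, 1].
    yield {"temperature": 0.42, "pressure": 0.51, "flow_rate": 0.37}

for x in stream():
    score = detector.score_one(x)   # higher score = more anomalous
    detector.learn_one(x)           # update the model after scoring (prequential)
    # Downstream we would only use the score; no hard outlier label is produced here.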

My question is therefore: is there any established, valid approach to generating outlier labels, at least to some meaningful extent, especially for massive-scale data? I have done some research on this and also have experience in outlier/anomaly detection; nevertheless, it would be an honor to learn from other scholars here.
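To show the kind of label-generation compromise I am currently imagining (and asking whether anything more principled exists), here is a rough sketch: score the data with an unsupervised detector, turn the most extreme scores into pseudo-labels, and train the supervised model my employer wants on those pseudo-labels. Everything here (IsolationForest, the 1% contamination threshold, the gradient-boosting classifier, the random placeholder data) is an assumption for illustration, not an established recipe.

# Rough sketch: unsupervised scores -> pseudo-labels -> supervised model.
# All model and threshold choices below are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import IsolationForest, HistGradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100_000, 20))          # placeholder for a large feature matrix

# 1) Unsupervised scoring (fit on a manageable subsample for massive data).
iso = IsolationForest(n_estimators=100, random_state=0)
iso.fit(X[rng.choice(len(X), 20_000, replace=False)])
scores = -iso.score_samples(X)              # higher = more anomalous

# 2) Pseudo-labels: flag the top 1% most anomalous points (assumed outlier rate).
threshold = np.quantile(scores, 0.99)
pseudo_y = (scores >= threshold).astype(int)

# 3) Train the supervised model my employer asks for on the pseudo-labels.
clf = HistGradientBoostingClassifier()
clf.fit(X, pseudo_y)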

Much appreciated
