Given a dataset, is there a systematic way to determine a maximum classification accuracy, beyond which no algorithm or classifier can improve?
In the idealized case with a given closed dataset and a classification problem (fixed number of classes), the maximum accuracy would be 100%: every sample is sorted into the correct class. You can't do better than that. But...
...the problems arise when you introduce new data. The generalization capability of your approach, and possible overfitting, can only be tested on data the model has not seen. You should split your dataset before training (train, validation, test sets), but every time you rerun your models on the same held-out data you reuse it and risk overfitting to it. A minimal sketch of such a split is shown below.
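One possible way to do this split with scikit-learn is sketched below; the variable names, split ratios, and toy data are only illustrative assumptions, not something given in the question.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Assumed toy data: X are features, y are class labels.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 3, size=1000)

# First carve out a test set that is never touched during model development.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Then split the remainder into train and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0
)

# Model selection should use only (X_train, y_train) and (X_val, y_val);
# (X_test, y_test) is evaluated once, at the very end.
```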
Hi Michal Rapczynski, I shouldn't have used the words "theoretical maximum". Rather, I meant the practical maximum in the case of noisy true labels. How would we tackle this situation?
By "the target data is noisy", do you mean that the dataset does not have a correct class label for every sample? In that case, the minimal error would be very hard to quantify. It would depend on the size of the dataset and the fraction of wrong labels, and it would differ for each model and training run if you shuffle your dataset before training. The sketch below illustrates how a given label-noise rate alone already caps the accuracy you can measure.
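A rough illustrative simulation, under the assumption that labels are flipped uniformly at random with some known rate (the noise rate and class count here are made up for the example): even a perfect classifier, scored against the noisy labels, cannot exceed roughly 1 minus the noise rate.

```python
import numpy as np

# Assumption for illustration: each label is flipped to a different class
# with probability noise_rate, uniformly at random.
rng = np.random.default_rng(seed=0)
n_samples, n_classes, noise_rate = 100_000, 3, 0.10

true_labels = rng.integers(0, n_classes, size=n_samples)

# Corrupt a random subset of labels to simulate noisy annotations.
noisy = true_labels.copy()
flip = rng.random(n_samples) < noise_rate
shifts = rng.integers(1, n_classes, size=flip.sum())
noisy[flip] = (noisy[flip] + shifts) % n_classes

# An oracle that always predicts the true class is still scored against
# the noisy labels, so its measured accuracy is capped.
measured_accuracy = np.mean(true_labels == noisy)
print(f"measured accuracy of a perfect classifier: {measured_accuracy:.3f}")
# roughly 1 - noise_rate, i.e. about 0.90 here
```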
For a linear model, there might be a way to solve it mathematically. With a non-linear model, you would probably have to test all possible permutations, which could take a very long time (days to years depending on the model and data).