In machine learning, especially for classification, a high-quality training dataset is important for training the classifier. In practice, however, the labels (class names) in the training dataset are not always correct, e.g., when they are assigned by human judgement. For example, an instance that should be labelled "A" may have been labelled "B". Such a training dataset is of lower quality, which may lead to poor classification performance.
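To make "quality" concrete, here is a minimal sketch of what I mean, assuming quality is simply the fraction of labels that match the true class. The helper names (`flip_labels`, `label_quality`) and the uniform label-flipping noise are hypothetical and only for illustration. In a real dataset the true labels are unknown, which is exactly why validation is hard.

```python
import random

def flip_labels(true_labels, classes, noise_rate, seed=0):
    """Return a copy of true_labels where each label is replaced, with
    probability noise_rate, by a different class chosen uniformly."""
    rng = random.Random(seed)
    noisy = []
    for label in true_labels:
        if rng.random() < noise_rate:
            # Simulate a labelling mistake: pick any class except the true one.
            noisy.append(rng.choice([c for c in classes if c != label]))
        else:
            noisy.append(label)
    return noisy

def label_quality(observed_labels, true_labels):
    """Fraction of observed labels that match the true labels."""
    matches = sum(o == t for o, t in zip(observed_labels, true_labels))
    return matches / len(true_labels)

# Synthetic example: 10,000 instances, 3 classes, about 30% of labels wrong.
classes = ["A", "B", "C"]
rng = random.Random(42)
true_labels = [rng.choice(classes) for _ in range(10_000)]
noisy_labels = flip_labels(true_labels, classes, noise_rate=0.3)

quality = label_quality(noisy_labels, true_labels)
print(f"label quality: {quality:.1%}")  # close to 70% by construction
```

Under this assumption, a dataset with "70% quality" is one where roughly 30% of the instances carry the wrong class name.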
(1) How can I validate the quality of a training dataset?
(2) If validation shows that the quality is moderate, i.e., not very high but not very low either (e.g., around 70% rather than below 60%, where 100% means every label is correct), how can I still make effective use of this labelled dataset for a classification problem?
Regards,