Different tasks require different data quality measures. However, many data scientists and researchers agree on a few dimensions of high-quality datasets that they consider for big data projects. First and foremost, the dataset itself matters. The balance and variety of data points within it indicate how well the algorithm will be able to generalize to similar points and patterns. As an example, consider an autonomous vehicle training dataset intended to train an AI to differentiate between moving and motionless vehicles. If it contains 90% images of moving cars but only 10% of parked ones, it is considered imbalanced, which naturally leads to a high chance of error. To address this issue, techniques such as oversampling, downsampling or weight balancing are used.
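As a minimal sketch of one of these techniques, here is random oversampling in plain NumPy: minority-class samples are duplicated (sampled with replacement) until every class matches the largest one. The 90/10 "moving vs. parked" split and all variable names are illustrative assumptions, not from a specific library.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced dataset: 90 "moving" (label 1) vs 10 "parked" (label 0).
X = rng.normal(size=(100, 4))
y = np.array([1] * 90 + [0] * 10)

def random_oversample(X, y, rng):
    """Duplicate minority-class samples until all classes are balanced."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Sample with replacement only for classes smaller than the target.
        picked = rng.choice(idx, size=target, replace=count < target)
        X_parts.append(X[picked])
        y_parts.append(y[picked])
    return np.concatenate(X_parts), np.concatenate(y_parts)

X_bal, y_bal = random_oversample(X, y, rng)
print(np.bincount(y_bal))  # both classes now have 90 samples
```

In practice a library such as imbalanced-learn offers more refined variants (e.g. SMOTE), but the idea is the same: equalize class counts before training.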
1) Is there a criterion for minimum class label participation? If so, how can it be satisfied?
With machine learning in general it is best to have balanced classes in your dataset. As Shafagat Mahmudova mentioned, you can use techniques such as weight balancing, downsampling and oversampling. In your case, with a 90%+ dominant class, I would recommend a mixture of all three techniques for the best performance: downsample your largest class, oversample your smallest class (but not too much, as this leads to overfitting on those classes), and then apply weight balancing.
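For the weight-balancing part, a common heuristic is to weight each class inversely to its frequency. The sketch below uses the same formula as scikit-learn's `class_weight='balanced'` (`n_samples / (n_classes * count_per_class)`); the 90/10 label vector is an assumed example.

```python
import numpy as np

def balanced_class_weights(y):
    """Per-class weights inversely proportional to class frequency:
    n_samples / (n_classes * count_per_class)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# 90 majority samples (label 1) vs 10 minority samples (label 0).
y = np.array([1] * 90 + [0] * 10)
print(balanced_class_weights(y))  # minority class 0 gets weight 5.0
```

Most frameworks accept such a mapping directly, e.g. as `class_weight` in scikit-learn estimators or per-class loss weights in deep learning libraries.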
2) What algorithms and benchmarks exist for validating labelling quality?
My research has been about using tiny datasets for deep learning, so my first question is: how big is your dataset? In my research I have found an inverse correlation between the number of images you have per class and the accuracy of ground-truth labels needed.
If you have fewer than 500 images I would recommend very accurate annotations, and with fewer than 100 images, near pixel-perfect annotations. Above 1,000 images the annotations can fit more loosely around your objects, because the models can learn the boundaries themselves.
I want to let you know how grateful I am to Shafagat Mahmudova for those oversampling, downsampling and weight-balancing methods, and to Thomas Smith for recommending that annotation accuracy be adapted to the size of the dataset. I would also like to thank Abdelhameed Ibrahim and Aravinda C V for the helpful articles.