What are the common percentages for splitting an image dataset when training YOLOv3/YOLOv4 models? Is there any formal research on this question?
Dr. Alaa Alshebeb, thank you for your interest. At the moment I am using a 90-10 split, but this seems more like a rule of thumb than an actual science.
If you have any suggestions or comments, I would be very interested to know about them.
I would answer the question in the context of machine learning in general. A train-test split is needed in order to guard against overfitting and to estimate how well the learning algorithm generalizes to future unseen instances. In general, 10-fold cross-validation tends to be the most preferred technique, but there are cases where a 70-30 or 80-20 split is used. This issue has not attracted much research interest, though. I will share one research work related to this.
Dobbin, K.K., Simon, R.M. Optimally splitting cases for training and testing high dimensional classifiers. BMC Med Genomics 4, 31 (2011). https://doi.org/10.1186/1755-8794-4-31
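To make the two options discussed in this thread concrete, here is a minimal sketch in plain Python of a single hold-out split (for example 90-10) and of 10-fold cross-validation over a list of image paths. The function names `holdout_split` and `k_fold_splits`, the fixed seed, and the `img_0000.jpg`-style file names are all assumptions made for illustration; nothing here comes from the YOLO tooling itself, and a library such as scikit-learn could be used instead.

```python
import random


def holdout_split(items, train_fraction=0.9, seed=0):
    """Shuffle a copy of items and return (train, held_out) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


def k_fold_splits(items, k=10, seed=0):
    """Yield k (train, validation) pairs; each item is validated exactly once."""
    shuffled = list(items)
    if not 2 <= k <= len(shuffled):
        raise ValueError("k must be between 2 and the number of items")
    random.Random(seed).shuffle(shuffled)
    # Striding by k gives folds whose sizes differ by at most one item.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation


# Hypothetical image file names, used only to exercise the functions.
image_paths = [f"img_{i:04d}.jpg" for i in range(1000)]

train, val = holdout_split(image_paths, train_fraction=0.9)
print(len(train), len(val))

for fold_index, (fold_train, fold_val) in enumerate(k_fold_splits(image_paths, k=10)):
    print(fold_index, len(fold_train), len(fold_val))
```

With a hold-out split the model is trained once, so it is the cheaper choice for a detector that takes hours or days to train. With k-fold the model is trained k times, which costs more but gives a less noisy estimate of generalization on small datasets. Either way, the two resulting lists could be written to the text files that the training setup expects.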