I have observed that a machine learning model trained on a larger dataset performs less convincingly than one trained on a smaller (different) dataset. What quantitative measures can I use to describe the variance across these datasets, so that I can interpret the difference in the model's decision-making ability? Is there a generalized measure for describing such variance across datasets that could be used to choose the best set for model training?