AI datasets can serve a different purpose in each phase of a project (a minimal split sketch follows the list below):
-1) training dataset
Used to learn the weights of the neurons.
-2) checking dataset (validation/test set)
An independent dataset used to assess that the trained model works properly for the problem being studied.
-3) operational real-life dataset
Once the model is trained and checked, the data it receives when used in automatic/independent mode (production).
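For illustration only, here is a minimal sketch of how a single labelled dataset is commonly split into the first two roles above, with the operational data arriving separately at inference time. It assumes scikit-learn and NumPy; the arrays X and y are synthetic placeholders, not data from the question.

```python
# Minimal sketch: splitting one labelled dataset into the training and
# checking (validation/test) roles described above.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))          # 1000 samples, 10 features (synthetic)
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # synthetic binary labels

# 1) training set: used to fit the model's weights
# 2) checking set: held out, used only to assess the trained model
X_train, X_check, y_train, y_check = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 3) operational data is unlabelled and arrives later, once the model is
#    deployed; e.g. X_new = load_incoming_batch()  (hypothetical helper)
```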
Are there criteria on such sets to ensure they are fit for purpose?
Size, quality, homogeneity, representativeness, statistical relevance?