I would like to write my master's thesis on how well machine learning evaluation metrics align with the properties of statistical estimators.
For example, can the F1 score be used to draw conclusions about whether an estimator of a population parameter is unbiased (or consistent)?
As far as I can tell, there is no literature on this topic yet. Has anyone come across work on this question?
I would really appreciate any response!