I am working on an imbalanced dataset of 1,567 samples, and I am confused about how to evaluate machine learning models. Some papers apply K-fold cross-validation to the whole dataset and report the mean of the K metrics as the model's performance. Others first split the data into training and test sets, use K-fold cross-validation on the training set for model building and hyperparameter tuning, and then evaluate the final model on the held-out test set.
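
To make the two protocols concrete, here is a rough sketch of what I understand each to be doing, written with scikit-learn; the classifier, metric, parameter grid, and the synthetic data are placeholders I chose for illustration, not taken from any specific paper. I used stratified splits because the data is imbalanced.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (
    GridSearchCV, StratifiedKFold, cross_val_score, train_test_split,
)

# Synthetic imbalanced data standing in for the real 1,567-sample dataset.
X, y = make_classification(n_samples=1567, weights=[0.9, 0.1], random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Protocol 1: K-fold CV on the whole dataset; report the mean of the K metrics.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=cv, scoring="f1")
print(f"Protocol 1: mean F1 over {cv.get_n_splits()} folds = {scores.mean():.3f}")

# Protocol 2: hold out a test set, tune hyperparameters with CV on the
# training set only, then evaluate once on the untouched test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid={"max_depth": [3, 5, None]},
                      cv=cv, scoring="f1")
search.fit(X_tr, y_tr)
print(f"Protocol 2: test-set F1 = {search.score(X_te, y_te):.3f}")
```

Which of these two protocols is the right way to evaluate a model on a dataset of this size?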