Hello everyone,
I am interested in comparing three models for a regression problem: Random Forest (RF), eXtreme Gradient Boosting (XGBoost), and a baseline model that simply predicts the mean of the target in the training set. I plan to use nested cross-validation to tune the hyperparameters of the RF and XGBoost models. The evaluation metrics I plan to use are MSE and MAE.
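For concreteness, here is a minimal sketch of the setup I have in mind, assuming scikit-learn. The synthetic dataset, the tiny parameter grids, and the fold counts are placeholders, not my real configuration. scikit-learn's GradientBoostingRegressor stands in for XGBoost so the snippet runs without extra packages; I assume xgboost.XGBRegressor could be dropped in at that spot.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, cross_validate

# Placeholder data; the real problem would supply its own X and y.
X, y = make_regression(n_samples=150, n_features=8, noise=10.0, random_state=0)

# Inner loop tunes hyperparameters, outer loop estimates generalization error.
inner_cv = KFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=2)

models = {
    # Baseline: always predicts the mean of the training target.
    "baseline": DummyRegressor(strategy="mean"),
    "rf": GridSearchCV(
        RandomForestRegressor(n_estimators=30, random_state=0),
        param_grid={"max_depth": [3, None]},  # illustrative grid only
        cv=inner_cv,
        scoring="neg_mean_squared_error",
    ),
    # Stand-in for XGBoost (assumption: swap in xgboost.XGBRegressor here).
    "gbm": GridSearchCV(
        GradientBoostingRegressor(n_estimators=30, random_state=0),
        param_grid={"learning_rate": [0.05, 0.2]},  # illustrative grid only
        cv=inner_cv,
        scoring="neg_mean_squared_error",
    ),
}

scoring = {"mse": "neg_mean_squared_error", "mae": "neg_mean_absolute_error"}

# Every model sees the same outer folds, so the per-fold scores are paired.
fold_scores = {}
for name, model in models.items():
    res = cross_validate(model, X, y, cv=outer_cv, scoring=scoring)
    # scikit-learn reports negated errors; flip the sign back.
    fold_scores[name] = {"mse": -res["test_mse"], "mae": -res["test_mae"]}

for name, scores in fold_scores.items():
    print(name, "mean MSE:", scores["mse"].mean(), "mean MAE:", scores["mae"].mean())
```

This gives one MSE and one MAE value per outer fold for each model, which is the data my questions below refer to.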
I want to find out which model performs significantly better than the others:
- Should I perform a Wilcoxon signed-rank test, or a Friedman test combined with the Nemenyi post-hoc test?
- Should I compare the models using their scores on each individual fold, or only using their performance averaged over the folds on the training set?
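To make the two options concrete, this is roughly how I would run the tests on per-fold scores, assuming SciPy. The score arrays are randomly generated placeholders, not real results, and the variable names are my own. As far as I know SciPy does not include the Nemenyi post-hoc test, so that step is left out of the sketch.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

# Placeholder per-fold MSE values (NOT real results), one entry per outer fold.
rng = np.random.default_rng(0)
n_folds = 10
baseline_mse = rng.normal(30.0, 2.0, n_folds)
rf_mse = rng.normal(20.0, 2.0, n_folds)
xgb_mse = rng.normal(19.0, 2.0, n_folds)

# Option 1: Friedman test across all three models at once.
# Each fold is treated as a block in which the models are ranked.
friedman_stat, p_friedman = friedmanchisquare(baseline_mse, rf_mse, xgb_mse)

# Option 2: Wilcoxon signed-rank test on one pair of models,
# using the paired per-fold differences.
wilcoxon_stat, p_wilcoxon = wilcoxon(rf_mse, xgb_mse)

print("Friedman p-value:", p_friedman)
print("Wilcoxon (RF vs XGBoost) p-value:", p_wilcoxon)
```

Both tests need several paired observations per model, so they would have to be run on the per-fold scores; a single averaged score per model would leave nothing to test. I am also aware that fold scores from the same dataset are not independent, so I would treat the resulting p-values as approximate.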
Any help will be highly appreciated.