Suppose we have two algorithms (A and B) for solving a multi-objective problem. Each algorithm provides a set of solutions. Which statistical test is appropriate for comparing these algorithms? Is the Wilcoxon test appropriate?
To check the statistical significance of pairwise differences between the two algorithms' solutions, the Wilcoxon test is fine, and the Chi-square test would also work when the results are expressed as counts in categories. For these tests, the null hypothesis is that there is no difference between the two sets of solutions, and it is tested at a chosen significance level (e.g., 5%). The p value and z value are used to assess the significance of the difference. When the p value is less than the significance level (0.05), which for a two-sided test under the normal approximation corresponds to a z value outside the critical values (below −1.96 or above +1.96), the null hypothesis is rejected, meaning that the performance of the algorithms is significantly different.
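As a rough illustration of the decision rule above, here is a minimal Python sketch using SciPy's `wilcoxon` (the signed-rank test for paired samples). It assumes each algorithm has been summarized by one quality score per run (for example a hypervolume value) and that run i of A is paired with run i of B. The scores, the variable names, and the helper `compare_paired` are all hypothetical and only serve the example. The z value is computed by hand from the normal approximation, without a tie correction.

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon


def compare_paired(a, b, alpha=0.05):
    """Two-sided Wilcoxon signed-rank test on paired scores (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    # p value from SciPy's Wilcoxon signed-rank test.
    result = wilcoxon(a, b, alternative="two-sided")

    # z value from the normal approximation (no tie correction):
    # rank the absolute non-zero differences, sum the ranks of the positive ones.
    d = a - b
    d = d[d != 0]
    n = d.size
    ranks = rankdata(np.abs(d))
    w_plus = ranks[d > 0].sum()
    mean_w = n * (n + 1) / 4
    sd_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / sd_w

    # Reject the null hypothesis when p is below the significance level.
    return float(result.pvalue), float(z), bool(result.pvalue < alpha)


# Made-up scores for 10 paired runs (illustrative only, not real results).
scores_a = [0.82, 0.85, 0.80, 0.88, 0.83, 0.86, 0.81, 0.87, 0.84, 0.89]
scores_b = [0.80, 0.81, 0.75, 0.80, 0.74, 0.75, 0.69, 0.73, 0.68, 0.70]

p_value, z, significant = compare_paired(scores_a, scores_b)
print(f"p = {p_value:.4f}, z = {z:.2f}, reject H0: {significant}")
```

With small samples SciPy may compute an exact p value, so the p value and the hand-computed z can differ slightly in how close they sit to their thresholds. In that case the p value is the safer basis for the decision.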