As you know, the No Free Lunch Theorem (NFLT) states that:
...for certain types of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method.
If your problem has some readily identifiable features (or sub-features), for example long, extremely narrow valleys or other high-condition-number characteristics, then a hybrid method like Loshchilov and Glasmachers' "Anytime Bi-Objective Optimization with a Hybrid Multi-Objective CMA-ES (HMO-CMA-ES)" might be a good choice. See: arxiv.org/pdf/1605.02720.pdf
Most practitioners recommend using more than one algorithm on difficult problems. That strategy can avoid difficulties where a single algorithm will not succeed reliably, but it does not get around the limitations dictated by the NFLT.
The NFLT is valid only for a set of problems that is closed under permutation (c.u.p.), which never happens in practice.
And for a set of problems that is _not_ c.u.p., it can be proved that there exists a best algorithm.
Unfortunately the proof is not constructive.
So it is worth trying to improve algorithms and, indeed, we need to know how to compare them.
Here is what I usually do for two stochastic algorithms A1 and A2 on a given problem:
- run A1 100 times with a given search effort (usually a fixed number of evaluations), and plot CDF_A1, the cumulative distribution function of the 100 final best results.
- do the same for A2 => CDF_A2
If, on the figure, CDF_A1 lies entirely "above" CDF_A2, then A1 can safely be said to be "better" for this function.
And vice versa, of course.
If the two curves cross at, say, a value r, the conclusion is not as clear, unless you consider only final best values smaller than r. Then you have to be more precise, with a statement like: "If I accept only final results smaller than r, then _this_ algorithm is better".
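For concreteness, here is a minimal Python sketch of this CDF comparison. The functions `run_A1` and `run_A2` are hypothetical placeholders (simulated here with random draws); replace them with single runs of your actual algorithms, each given the same evaluation budget:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Hypothetical stand-ins for "run algorithm X once, return the final best value".
# Replace these with real runs of A1 and A2 under the same evaluation budget.
def run_A1():
    return rng.lognormal(mean=-1.0, sigma=0.5)

def run_A2():
    return rng.lognormal(mean=-0.7, sigma=0.5)

n_runs = 100
results_A1 = np.sort([run_A1() for _ in range(n_runs)])
results_A2 = np.sort([run_A2() for _ in range(n_runs)])

# Empirical CDF: fraction of runs whose final best value is <= x.
ecdf_levels = np.arange(1, n_runs + 1) / n_runs

plt.step(results_A1, ecdf_levels, where="post", label="CDF_A1")
plt.step(results_A2, ecdf_levels, where="post", label="CDF_A2")
plt.xlabel("final best objective value (lower is better)")
plt.ylabel("empirical CDF")
plt.legend()
plt.show()

# A1 is "above" A2 if, for every target value x, CDF_A1(x) >= CDF_A2(x),
# i.e. A1 is at least as likely as A2 to reach any given target.
grid = np.union1d(results_A1, results_A2)
cdf = lambda sorted_vals, x: np.searchsorted(sorted_vals, x, side="right") / n_runs
a1_above = all(cdf(results_A1, x) >= cdf(results_A2, x) for x in grid)
a2_above = all(cdf(results_A2, x) >= cdf(results_A1, x) for x in grid)
print("A1 above A2 everywhere:", a1_above, "| A2 above A1 everywhere:", a2_above)
```

If neither curve is above the other everywhere, the two curves cross and you are in the "more precise statement needed" case described above.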
Comparing evolutionary algorithms properly is very important, and the chosen comparison methodology can significantly impact the conclusions. There are many good papers on this topic that might help you in your research:
- Benchmarking in optimization: Best practice and open issues (https://arxiv.org/abs/2007.03488)
- Best practices for comparing optimization algorithms, Optimization and Engineering (https://arxiv.org/abs/1709.08242)
- A critical note on experimental research methodology in EC (https://ieeexplore.ieee.org/document/1006991)
- Fairness in bioinspired optimization research: A prescription of methodological guidelines for comparing meta-heuristics (https://arxiv.org/abs/2004.09969)
- A conceptual comparison of several metaheuristic algorithms on continuous optimisation problems (https://link.springer.com/article/10.1007/s00521-019-04132-w)
In addition to Miha Ravber's answer, you should launch several runs for each meta-parameter configuration and objective function, and report the best, worst, and median results, as well as the standard deviation.
You should apply a statistical test based on a p-value to check whether there is a significant difference between the two algorithms' results.
For fairness, you have to be sure that, for the two algorithms, the objective function is called the same number of times...
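As an illustration, here is a minimal Python sketch of this reporting and testing step, assuming the final best values of both algorithms have already been collected (from runs with equal evaluation budgets). The Mann-Whitney U (Wilcoxon rank-sum) test is used here as one common non-parametric choice; the advice above does not mandate a specific test:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def summarize(name, results):
    """Print the usual descriptive statistics for one algorithm's final values."""
    results = np.asarray(results)
    print(f"{name}: best={results.min():.4g}  worst={results.max():.4g}  "
          f"median={np.median(results):.4g}  std={results.std(ddof=1):.4g}")

def compare(results_A1, results_A2, alpha=0.05):
    summarize("A1", results_A1)
    summarize("A2", results_A2)
    # Non-parametric test of whether the two samples of final values come
    # from the same distribution (no normality assumption, unlike a t-test).
    stat, p_value = mannwhitneyu(results_A1, results_A2, alternative="two-sided")
    print(f"Mann-Whitney U p-value: {p_value:.4g}")
    if p_value < alpha:
        print("Significant difference at level", alpha)
    else:
        print("No significant difference detected at level", alpha)

# Example with fabricated numbers standing in for real runs; both "algorithms"
# must have been given the same number of objective-function evaluations.
rng = np.random.default_rng(1)
compare(rng.lognormal(-1.0, 0.5, size=100), rng.lognormal(-0.7, 0.5, size=100))
```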