I'm working on a wide range of problems, from combinatorial to continuous optimization, including multi-objective and multi-modal optimization. I'm also applying the heuristics to benchmark functions (such as BBOB and LSGO) as well as to real-world applications such as molecular docking. Thus, I was hoping for a general and broad discussion with different opinions from different domains (instead of posting one question for each specific problem).
Your enumeration of problems suggests that you are mostly working on optimization. There are several possibilities for assessing the quality of an optimization algorithm. Among them are:
1. The time until the algorithm stops
2. The time until the algorithm finds a global optimum
3. The time until the algorithm finds a feasible point whose objective function value is within x% of the global optimum
4. The time until the algorithm finds the first feasible point
5.-8. The number of evaluations of the objective/constraint functions until each of the events in 1.-4. above (instead of time)
9. The objective function value / constraint violation after N function evaluations
10. The objective function value / constraint violation after time T
11. The maximal problem dimension for which the algorithm produced a result
12. The order p such that the solution time scales as O(n^p) with the problem dimension n (a small fitting sketch follows this list)
13. The percentage of problems for which the algorithm failed (e.g. did not find a feasible point, crashed, did not stop, etc.)
14. The percentage of problems for which the algorithm got stuck far away from the global minimum
15. The percentage of problems for which the algorithm was the optimal solver
16. The percentage of problems for which the algorithm was within x% of the optimal solver (in time, objective function value, constraint violation, etc.)
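For measure 12, here is a minimal sketch (in Python) of how the exponent p could be estimated, assuming you have recorded solution times for a few problem dimensions; the numbers below are purely illustrative:

```python
# Sketch for measure 12: estimate the empirical scaling exponent p from
# recorded (dimension, solution time) pairs, assuming t ~ c * n^p.
# The timings are made up for illustration.
import numpy as np

dims = np.array([10, 20, 40, 80, 160])          # problem dimensions n
times = np.array([0.4, 1.7, 6.9, 28.0, 115.0])  # measured solution times [s]

# Linear fit in log-log space: log t = p * log n + log c
p, log_c = np.polyfit(np.log(dims), np.log(times), 1)
print(f"estimated order p ~ {p:.2f}")
```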
In comparison tests you can determine the "optimal solver", i.e., the algorithm that produced the best solution among all algorithms you have tested, and compare your results against this optimal solver. You should, however, make sure that at least one of the state-of-the-art solvers (CMA-ES, Baron, etc.) is among the solvers you compare; for that you might want to look at the comparison tests by Nick Sahinidis and Nikolaus Hansen.
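As an illustration of measures 15 and 16, the following sketch (assuming minimization; all numbers and algorithm names are made up) counts how often each algorithm matches, or comes within x% of, the best result obtained by any tested algorithm:

```python
# Sketch for measures 15/16: one row per problem, one column per algorithm;
# entries are the best objective values found (minimization). All numbers
# and algorithm names are illustrative.
import numpy as np

algorithms = ["my-heuristic", "CMA-ES", "other-solver"]
results = np.array([
    [0.01, 0.00, 0.30],
    [1.25, 1.10, 1.12],
    [5.00, 4.80, 4.79],
])

best = results.min(axis=1, keepdims=True)    # per-problem "optimal solver" value
x = 0.05                                     # tolerance: within 5%
is_best = np.isclose(results, best)          # measure 15
within = results <= best + x * np.abs(best)  # measure 16

for j, name in enumerate(algorithms):
    print(f"{name}: best on {is_best[:, j].mean():.0%} of problems, "
          f"within {x:.0%} on {within[:, j].mean():.0%}")
```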
Given all these different ways to assess an algorithm, you can probably understand my question about the application: depending on your application, one or the other of the above measures is appropriate. If you are developing an algorithm for very expensive objective functions (e.g. large simulations, physical measurements, etc.), you will compare the number of function evaluations rather than solution time, because in the real application the function evaluations will eventually dominate the overall solution time, in complete contrast to the situation where your function evaluations are cheap.
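If function evaluations are the relevant budget, a simple counting wrapper around the objective is often enough; a minimal sketch (the sphere function is just a placeholder for your own objective):

```python
# Sketch of an evaluation-counting wrapper: hand the wrapped objective to
# the optimizer and read off the number of evaluations afterwards.
class CountingObjective:
    def __init__(self, func):
        self.func = func
        self.evaluations = 0

    def __call__(self, x):
        self.evaluations += 1
        return self.func(x)

def sphere(x):                      # placeholder objective
    return sum(xi * xi for xi in x)

f = CountingObjective(sphere)
f([1.0, 2.0, 3.0])                  # the optimizer would call f like this
print("evaluations used:", f.evaluations)
```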
In real-time applications, it might be interesting to measure how long an algorithm takes to find the first feasible point, or how good its best point is after 10 ms, because in the real application the algorithm will always be asked for its result after 10 ms.
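A minimal sketch of such an "anytime" measurement, using plain random search as a stand-in for the algorithm under test and a 10 ms wall-clock budget:

```python
# Sketch of a budgeted run: keep the best value found so far and stop when
# the wall-clock budget is exhausted. Random search is only a stand-in.
import random
import time

def sphere(x):                      # placeholder objective
    return sum(xi * xi for xi in x)

def run_with_budget(func, dim, budget_seconds=0.010):
    best = float("inf")
    deadline = time.perf_counter() + budget_seconds
    while time.perf_counter() < deadline:
        x = [random.uniform(-5, 5) for _ in range(dim)]
        best = min(best, func(x))
    return best

print("best value after 10 ms:", run_with_budget(sphere, dim=10))
```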
If the algorithm is used as the heuristic starting phase of a complete (i.e. deterministic) global optimization solver (e.g. branch-and-bound), it will be interesting to know in what percentage of problems it actually finds the global optimum.
There are several more aspects that need to be considered for a good comparison test. Taking a close look at already existing comparison tests might provide additional information (http://coco.gforge.inria.fr/doku.php?id=bbob-2010, https://plus.google.com/photos/101835671426479336232/albums?banner=pwa, http://archimedes.cheme.cmu.edu/?q=dfocomp, etc.)
1. Measuring performance. You must decide what performance means for you (convergence, CPU time, robustness, etc.) and measure it.
2. Comparing the performance of different algorithms. As we are dealing with heuristic and, as your question suggests, stochastic algorithms, you need to repeat the experiments a sufficient number of times and apply statistical tests to compare the results of the methods (see the sketch below).
This topic has to do with the Design of Experiments (http://en.wikipedia.org/wiki/Design_of_experiments) area.
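A minimal sketch of such a comparison for two algorithms on one problem, using a non-parametric test from scipy.stats (the result vectors are made up):

```python
# Sketch of a statistical comparison: final objective values of several
# independent runs per algorithm, compared with a Mann-Whitney U test.
# For more than two algorithms, a Friedman test with post-hoc procedures
# (as discussed in the Garcia & Herrera paper below) is the usual route.
from scipy.stats import mannwhitneyu

algo_a = [0.12, 0.10, 0.15, 0.09, 0.11, 0.14, 0.10, 0.13, 0.12, 0.11]
algo_b = [0.20, 0.18, 0.22, 0.19, 0.25, 0.17, 0.21, 0.23, 0.20, 0.19]

stat, p_value = mannwhitneyu(algo_a, algo_b, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p_value:.4f}")
```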
I think that these readings can be helpful to clarify these points:
Cohen, P. R. (1995). Empirical Methods for Artificial Intelligence (Vol. 139). Cambridge: MIT Press.
Bartz-Beielstein, T. (2006). Experimental Research in Evolutionary Computation: The New Experimentalism. Springer. (http://link.springer.com/book/10.1007%2F3-540-32027-X)
Garcia, S., & Herrera, F. (2008). An Extension on “Statistical Comparisons of Classifiers over Multiple Data Sets” for all Pairwise Comparisons. Journal of Machine Learning Research, 9, 2677–2694. (http://www.jmlr.org/papers/v9/garcia08a.html)