I'm looking for ideas to design a search heuristic which, from a given initial solution, can converge as fast as possible to the nearest (local) optimum. By "fast" I mean using as few function evaluations as possible.
It depends on your problem, your data, and so on, but generally speaking a genetic algorithm can be appropriate, especially if you are not experienced in writing your own algorithms, because you can use the MATLAB toolbox to search for the best solution.
I would rather guess that - provided the objective function is Lipschitz continuous - you will fare better by utilizing an interpolating method, in tandem with a local search method for the model function that you produce. A global method utilizing this approach is found, for example, here:
Not knowing what type of problem you have (for example, how expensive the objective function actually is to compute), this is as detailed as I can get at the moment.
Adding to the genetic algorithms response above... I would say Differential Evolution is much simpler to implement and better in practice for continuous problems (http://en.wikipedia.org/wiki/Differential_evolution). The problem of few evaluations is really open-ended... How many are you thinking about? Tens? A simple hill climber would perhaps be best. Hundreds? Some specialized Hessian approximation could do it. Thousands? Differential Evolution or another evolutionary algorithm.
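For reference, the basic DE/rand/1/bin scheme really is only a few dozen lines. Here is a minimal Python sketch (the function name and parameter defaults are my own illustrative choices, not from any particular library):

```python
import numpy as np

def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           max_evals=2000, seed=0):
    """Minimal DE/rand/1/bin sketch: scaled difference-vector mutation,
    binomial crossover, greedy one-to-one selection."""
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    pop = rng.uniform(lo, hi, size=(pop_size, dim))
    fitness = np.array([f(x) for x in pop])
    evals = pop_size
    while evals + pop_size <= max_evals:
        for i in range(pop_size):
            # pick three distinct individuals different from i
            a, b, c = pop[rng.choice([j for j in range(pop_size) if j != i],
                                     3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True   # at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            f_trial = f(trial)
            evals += 1
            if f_trial <= fitness[i]:         # greedy replacement
                pop[i], fitness[i] = trial, f_trial
    best = np.argmin(fitness)
    return pop[best], fitness[best]

# Example: minimize the 3-D sphere function within [-5, 5]^3
x, fx = differential_evolution(lambda v: float(np.sum(v**2)), [(-5, 5)] * 3)
```

With a couple of thousand evaluations this easily finds the sphere minimum; with only tens of evaluations it barely gets started, which is exactly the trade-off discussed above.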
Genetic Algorithms and all other meta-heuristic methods mentioned above are totally off-the-mark, as they are NOT methods for finding a nearby local optimum. By nature, all meta-heuristics mentioned (GA, DE, Imperialist Competitive Algorithms etc.) are methods that don't even bother to use an initial starting point that you may want to provide, and instead generate their own populations of solutions and start "exploring" the search space, with a mission to find a good point. Special-purpose meta-heuristics may be devised to perform some kind of "local search" instead, constraining their search space, but they will no longer be "black box": you will have to implement them yourself.
If you want a really fast method for locating your nearest optimum (or some other close-by optimum), you should try one of the following:
Assuming your objective function is differentiable, (preferably, with uniformly continuous derivatives, so some nice theorems with impressive names, such as the "global convergence theorem" apply):
1. Approximate line search using the Armijo rule for step-size determination, or the bracketing-sectioning schemes proposed by Fletcher and others
2. Trust-region methods
3. Conjugate-gradient-based methods
All of the above, are guaranteed to take you to a close-by local optimum extremely fast (also in terms of function evaluations). If you haven't heard of these terms before, you really need to start reading a good introduction to "nonlinear programming algorithms" (not meta-heuristics).
If your objective function is not differentiable, you may try at first
1. simple coordinate descent methods (if you have relatively few dimensions), or
2. bundle methods as the best alternative otherwise.
I disagree... Metaheuristics can very well be initialized with prior information, and while typically they are good for global optimization, they can be used for "local" optimization. It all depends on the scale of local and global. Nothing guarantees global optimization except for exhaustive search.
The point of metaheuristics is that they are really black-box, so no differentiability is assumed. Several ready-to-use libraries offer the ability to specify the search space thus obviating the above comment.
The question as stated is severely underspecified and no good answer exists.
Sonia: nice query - there is a really interesting literature now on exactly this question of carrying out function optimization in a way that is as "sample-efficient" as possible, which is what you want whenever the function evaluations are expensive. This really starts from Jones 1998 (look for "Efficient Global Optimization of Expensive Black-Box Functions"), which has been cited about 1500 times now. There's a recent tutorial that looks quite good here: http://arxiv.org/abs/1012.2599
(My own puny effort in this direction is at http://tinyurl.com/lo5ruoh).
I agree with the comment that GAs and other metaheuristics are off the mark, but for a more fundamental reason. Any kind of meta-heuristic or evolutionary approach throws samples all over the place by its nature. Although doing this may be useful in scenarios where the cost of function evaluation is very very low, you're looking for something smarter. I'm quite surprised to see people actually recommending metaheuristics and evolutionary algorithms here, as they're precisely what you DON'T want to do when evaluations are expensive, it seems to me! The more you "pay" per evaluation, the more you want to build (learn) and use a model of the search space that represents your uncertainty about where the maxima might be, rather than just combining old solutions willy-nilly via heuristics, meta or not!
I suggest using gradient descent techniques. If the number of parameters of your model is not large, you can use, as suggested by Ioannis, a batch second-order method such as conjugate gradient or Levenberg-Marquardt. If your model is large, you can use stochastic gradient methods, which are very robust and very fast. See for instance the paper by Bottou & Bousquet, "The Tradeoffs of Large Scale Learning".
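As a toy illustration of the stochastic gradient idea from that line of work (the problem, learning rate and epoch count below are my own illustrative choices):

```python
import numpy as np

def sgd_least_squares(X, y, lr=0.01, epochs=100, seed=0):
    """Plain SGD for linear least squares: one randomly drawn sample per
    update, the setting studied in large-scale learning."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        for i in rng.permutation(n):
            err = X[i] @ w - y[i]
            w -= lr * err * X[i]     # gradient of 0.5 * (x.w - y)^2
    return w

# Recover known weights from noiseless synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
w_hat = sgd_least_squares(X, X @ w_true)
```

Note that each update touches a single data point, which is what makes this family attractive when the "model" (and hence the batch gradient) is large.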
Sonia's question was about finding the nearest local minimum as fast as possible. Do you really insist that the best methods for local optimization are GAs (or EAs etc.)???
1. I remember attending the Math. Prog. symposium in Ann Arbor back in 1995, where John Holland himself stated in the keynote lecture that GAs are only good for taking the search to a "good neighborhood", and from that point, a local search would be needed. Do you have any data to the contrary? If so, please send us some problem definitions for which GAs (any library that does NOT use some local search step inside) are better than, say, a CG method or a TR (Trust Region, aka Restricted Step) method. On the other hand, I do have several functions for which traditional GAs have significant problems locating a good solution, whereas CG or other methods can find the optimum very quickly: try the TRID function (a convex quadratic), defined as f(x) = sum_{i=1..d} (x_i - 1)^2 - sum_{i=2..d} x_i * x_{i-1}.
2. What standard mechanisms in a GA (x-over, mutation, inversion etc.) would guarantee taking you to a point satisfying KKT conditions?
3. The argument in favor of GAs because of the mention of "black-box optimization" is not quite valid, because the question already specifies that we are concerned with "continuous optimization", and therefore, either conjugate-gradient, or at least bundle methods for non-smooth optimization definitely apply.
In my experience, for "local optimization", the old methods of NLP are ALWAYS preferable to meta-heuristics. If you know of any systematic comparisons that show otherwise, please let me know...
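For anyone who wants to try the comparison, here is a sketch of the TRID function as I understand its standard definition, minimized with SciPy's conjugate-gradient routine (this assumes SciPy is available; with no analytic gradient supplied, SciPy falls back on finite differences):

```python
import numpy as np
from scipy.optimize import minimize

def trid(x):
    """TRID benchmark: convex quadratic with known minimizer x_i = i*(d+1-i)."""
    return float(np.sum((x - 1) ** 2) - np.sum(x[1:] * x[:-1]))

d = 5
res = minimize(trid, np.zeros(d), method='CG')

# Known solution for d = 5: [5, 8, 9, 8, 5], with f* = -d(d+4)(d-1)/6 = -30
x_expected = np.array([i * (d + 1 - i) for i in range(1, d + 1)])
```

On this convex quadratic, CG locates the optimum in very few iterations, which is the point being made above about classical NLP methods for local optimization.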
Hi Everyone, thanks for all your answers and for the interesting discussion.
Allow me please to add some ideas, comments and doubts about my questions and your answers.
First, I want to recall that I'm going to work with "black-box functions", i.e. there is no information available about differentiability or continuity. To begin with, I will most likely work with the BBOB set of benchmark functions used at the GECCO conference.
It is true that I haven't provided much information about the exact nature of my "challenge", but I was looking for a more general debate before getting into the details :) My goal is to develop a hybrid algorithm for multi-modal optimization. At an early stage of the optimization process, a metaheuristic should provide a set of initial solutions, which in theory will belong to the attraction basins of different local optima. Therefore I need to develop a search technique capable of reaching the corresponding local optimum. Since the initial metaheuristic will consume a significant amount of the available function evaluations, and since there will be a "large" number of initial solutions, this method needs to be as efficient as possible in its use of evaluations.
Thus, I agree with the comment that metaheuristics are not the best options. I believe that in general any methods which include mechanisms to promote exploration in order to escape local optima are not well suited for the task. I would rather like a local search method specifically designed for these conditions. Take into account that I may have an "initial guess" about how far (Euclidean distance) the corresponding optimum might be. Perhaps something a little bit more sophisticated than hill climbing...
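One simple derivative-free option in this spirit - a step up from naive hill climbing, and able to exploit a distance guess through its initial step size - is compass (coordinate pattern) search. A minimal sketch, with my own parameter choices:

```python
import numpy as np

def compass_search(f, x0, step, shrink=0.5, min_step=1e-6, max_evals=500):
    """Compass search: poll +/- each coordinate with the current step size;
    if no poll point improves, shrink the step. `step` can be initialized
    from a prior guess of the distance to the optimum."""
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    evals = 1
    while step > min_step and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for sign in (+1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * step
                f_trial = f(trial)
                evals += 1
                if f_trial < fx:
                    x, fx = trial, f_trial
                    improved = True
                    break
            if improved:
                break
        if not improved:
            step *= shrink      # no improvement anywhere: refine the mesh
    return x, fx

# Quadratic with minimum at (3, -1); the initial step reflects a distance guess
x, fx = compass_search(lambda v: (v[0] - 3) ** 2 + (v[1] + 1) ** 2,
                       [0.0, 0.0], step=2.0)
```

It assumes nothing about differentiability, and the budget is controlled explicitly through max_evals, which matches the constraint you describe.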
@Ioannis In the interest of intelligent discourse I will not reply to your condescending tone, but to your objective points.
Evolutionary algorithms have come a fair way since 1995. You cannot possibly suggest otherwise. Do I really insist that the best methods for local optimization are GAs? Nope, I never said such a thing! Read my original post! Strawman arguments and ad hominem attacks will not be tolerated.
Do I say that sometimes EAs might be a good quick and dirty solution to a local optimization problem with a black-box function in continuous space? Yes. Under some circumstances it might be good, even very good if you take into account the ability to use them out-of-the-box, no pun intended.
Also, the false dichotomy of global vs local search and how EAs are always global is just simplistic. It might be driven by good intuition or reassuring appeal, but true evolutionary processes are not global optimizers and, in fact, behave much more like local optimizers... it is a matter, again, of scale.
And if we must resort to anecdotal evidence, I should say that I have been at conferences where researchers took very seriously the task of adapting EAs to problems with very expensive objective function evaluations - say, solving a problem when you can only evaluate the objective function 30 times. An EA could be appropriate for such a use case. Then again, it might not. Pretty much everything is problem-dependent.
Sonia, I think the algorithms you are looking for belong to the class of memetic algorithms. There you have one strategy for global search and one for local search. An example is (I believe) Variable Neighbourhood Search, for which data on the BBOB benchmark exist. I think it was part of the first or second BBOB competition.
Further, finding the local optimum fast will strongly depend on your fitness function. Which in turn, is the reason why we have so many algorithms and discussions about what is good and what is not.
It is a very interesting discussion you have going on here. Although you have a precise idea of what your goals are, I agree with Steffen that results will ultimately depend on the fitness function... and other characteristics such as dimensionality, available function evaluations and actual distance to the optimum. So I guess you will have to test different algorithms and ideas and see which ones provide the best results.
I would suggest trying simplex methods such as Nelder-Mead. These methods are efficient in their use of evaluations and very good at unimodal problems (at least in standard dimensions). Since you are not attempting to escape local optima, but to find the nearest optimum, this may be a useful feature. And if you work with MATLAB, the fminsearch() function is a very good implementation of Nelder-Mead.
My short experience with Nelder-Mead is reflected in this paper. The results there support my statements.
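For those working in Python rather than MATLAB, SciPy's minimize() exposes essentially the same Nelder-Mead simplex algorithm as fminsearch(). A quick sketch on the 2-D Rosenbrock function (starting point and tolerances are my own choices):

```python
import numpy as np
from scipy.optimize import minimize

# 2-D Rosenbrock: unimodal, with its minimum at (1, 1)
rosen = lambda v: (1 - v[0]) ** 2 + 100 * (v[1] - v[0] ** 2) ** 2

# Nelder-Mead needs no gradients, only function values
res = minimize(rosen, [-1.2, 1.0], method='Nelder-Mead',
               options={'xatol': 1e-8, 'fatol': 1e-8})
```

The `res.nfev` field reports how many function evaluations were spent, which makes it easy to compare against other methods under a fixed budget.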
Just a couple of things that came to mind: first, are you after combinatorial optimization or continuous-space optimization? Also, are you optimizing something in the real world or on theoretical benchmarks? If it is real-world, forget using a standard method (and by real-world I mean taken straight from the real world, with real data and without simplification). If it is theoretical benchmarks (CEC, BBOB, and so on) in continuous space, and time complexity is not important to you, a safe choice is CMA-ES, or Levenberg-Marquardt with some restarts. If it is combinatorial, design your EA (with some mutation and maybe crossover), but ADD PROBLEM-SPECIFIC KNOWLEDGE (such as fast heuristics to improve the solutions). Also, be careful about the local convergence property: not many methods have been proven to be locally convergent (PSO in its standard form is not). Finally, if your problem is separable, direction-wise optimization (e.g. PSO) can be a good option, but if it is non-separable, CMA-ES is way better. Note that CMA-ES has computational complexity O(d^2), which makes it slow at large scale (d is the number of dimensions), while PSO is O(d).
BTW: the assumption is you are not talking about constraints and noise (robustness). If constraints or noise are there, the story becomes different.
the notion that "Any kind of meta-heuristic or evolutionary approach throws samples all over the place by its nature." is quite misleading. Let me give a prominent example where I have some expertise.
Evolution Strategies have always been considered as stochastic _local_ search methods, in particular when the population size is small (say, not considerably larger than 10). They exhibit linear (i.e. geometrically fast) convergence to the optimum (on a large class of functions), with a rate of up to exp(-0.5/dimension) per evaluation, in practice we observe roughly exp(-0.1/dimension).
Throwing samples all over the place is a rather unsuitable search method except in a 2- or 3-dimensional search space: in larger dimensions, only small variances can achieve feasible success probabilities and convergence rates. The optimal variance is typically proportional to 1 over the dimension and proportional to the squared population size (for not too large population sizes). The latter naturally explains the importance of population size for local vs "global" behavior.
The above results are mainly a consequence of search space volume phenomena and not at all tightly connected to Evolution Strategies as a method.
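To make the point concrete, here is a minimal (1+1)-ES with the classical 1/5th success rule (the step-size adaptation constants are common textbook values, not tuned); on a sphere function one can observe the geometric convergence described above:

```python
import numpy as np

def one_plus_one_es(f, x0, sigma=1.0, max_evals=2000, seed=0):
    """(1+1)-ES with the 1/5th success rule: isotropic Gaussian mutation;
    sigma grows on success and shrinks on failure, so it self-adjusts to
    roughly a 1/5 success rate and tracks the distance to the optimum."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_evals - 1):
        y = x + sigma * rng.normal(size=x.shape)
        fy = f(y)
        if fy <= fx:
            x, fx = y, fy
            sigma *= 1.5              # success: enlarge the step
        else:
            sigma *= 1.5 ** -0.25     # failure: shrink (balances at ~1/5 success)
    return x, fx

# 5-D sphere function: geometric (linear) convergence toward the origin
x, fx = one_plus_one_es(lambda v: float(np.sum(v ** 2)), np.ones(5) * 3.0)
```

Nothing here "throws samples all over the place": the adapted sigma stays proportional to the distance to the optimum, which is precisely the stochastic local-search behavior described above.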
@Ioannis,
which method is locally faster for sure depends on the function. On benign functions (like the above mentioned TRID), classical NLP methods (and/or modern "variants") are certainly the methods of choice. On (much) more nasty functions they might be very slow or just fail.
@Antonio,
the Nelder-Mead method is fast and robust in small dimensions (in particular when applied with a smart restart schedule), but breaks entirely down on many non-trivial problems even in moderate dimensions, say ten or twenty, because the simplex tends to collapse.
Hi, is the function you mentioned convex or non-convex? For convex function optimization, gradient-based methods work fine. Since you want to reach the local optimum, Newton's method is a good fit for your problem.
For non-convex functions, several families of optimization methods exist. Evolutionary algorithms, for example, work well in practice but come with no theoretical bounds, while tree-search-based algorithms (MCTS, SOO) do provide loss bounds. However, these algorithms focus on the global optimum and may not be suitable for your problem.
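A minimal sketch of Newton's method for the convex case (the example function is my own; on a quadratic the method lands on the minimizer in a single step):

```python
import numpy as np

def newton_minimize(grad, hess, x0, tol=1e-10, max_iter=50):
    """Newton's method for smooth local minimization: solve H(x) p = -g(x)
    and step to x + p; converges quadratically near a strict local minimum."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break                         # stationary point reached
        x = x + np.linalg.solve(hess(x), -g)
    return x

# Convex quadratic: f(x) = x0^2 + x0*x1 + 2*x1^2 - 4*x1, minimum at (-4/7, 8/7)
grad = lambda v: np.array([2 * v[0] + v[1], v[0] + 4 * v[1] - 4])
hess = lambda v: np.array([[2.0, 1.0], [1.0, 4.0]])
x_star = newton_minimize(grad, hess, [5.0, 5.0])
```

For non-convex functions the raw step can move toward saddle points, so in practice one combines it with a line search or trust region, as discussed earlier in the thread.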
It is important to remark that convergence does not mean finding the optimum; it means that the algorithm finds a solution (with a high probability of being a local optimum).