GAs and other evolutionary algorithms seem to be quite popular among researchers in Computer Science and Management, but researchers in Mathematics just don't like them. Is there any particular reason for that?
We know that with these algorithms we can't prove that our solution is optimal, but in most cases we still end up with a solution that is optimal or very close to it. Apart from that, these algorithms are very easy to implement and fast.
Actually, I don't think it is the case that mathematicians don't like evolutionary algorithms. I think what mathematicians really don't like is the way that some researchers claim to come up with "new" meta-heuristics every month or two, based on things like bats, fireflies and water droplets. Very good discussions of this phenomenon can be found in the following papers.
http://onlinelibrary.wiley.com/doi/10.1111/itor.12001/abstract
http://www.iztok-jr-fister.eu/static/publications/Stu2016.pdf
http://www.sciencedirect.com/science/article/pii/S221471601500010X
Hi, you are right: since there is no mathematical proof of convergence for these methods, the mathematicians involved in optimization theory do not appreciate these approaches.
Dear Mohammad Asim Nomani,
In my opinion, the fashion for heuristic solutions will soon pass, since these solutions are not science but only craft. Suppose that there are two algorithms "A" and "B". Algorithm "A" produces a good solution A(D) for the initial data D, but the algorithm does not know anything about its quality. Algorithm "B" produces a worse solution B(D) > A(D), but this algorithm knows how good its solution is, because it has a LOWER BOUND LB(D) and can measure how bad the solution is as 100% * (B(D) - LB(D)) / LB(D), where LB(D) < A(D) < B(D).
Now I want to ask: which algorithm is valuable for science, "A" or "B"? My answer would be "B". My opinion is the following: a solution without any evaluation of its quality IS WORTHLESS, whereas a solution accompanied by such an evaluation is of practical interest.
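To make the measure above concrete, here is a minimal sketch (my own illustration, not from this post) of the gap computation for a minimisation problem, assuming a solution value and a valid positive lower bound are already known:

```python
# Sketch: the worst-case relative error 100% * (B(D) - LB(D)) / LB(D)
# for a minimisation problem with a known lower bound.

def optimality_gap_percent(solution_value: float, lower_bound: float) -> float:
    """Return 100 * (solution - LB) / LB, an upper bound on the relative error."""
    if lower_bound <= 0:
        raise ValueError("This simple formula assumes a positive lower bound.")
    return 100.0 * (solution_value - lower_bound) / lower_bound

# Example: algorithm B returns 108 and we know LB = 100, so the solution is
# provably within 8% of the (unknown) optimum.
print(optimality_gap_percent(108.0, 100.0))  # 8.0
```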
Hi all,
Very interesting discussion. I do agree that it is related to the proof of convergence, which can be guaranteed if you apply, for instance, simulated annealing. However, from my point of view, approaches in Engineering are completely different from those in Mathematics, and therefore many suboptimal solutions are sufficiently good to tackle real problems with constraints and nonlinearities in many variables.
Dear Prof. Fedulov, thank you very much for your answer. I think what I wanted to know is quite clear and understandable now from your example.
Dear Prof. Guerra, I agree. And these different approaches and problems lead people from engineering to rely on solutions which can't be proven optimal.
But is it the case that these evolutionary algorithms are faster for highly nonlinear optimization problems?
Hi,
Mathematics professors are very concerned with laws and postulates, whereas researchers from Computer Science and Management are drawn to evolutionary algorithms because the constraints can be modified to suit the user. They feel more comfortable with these algorithms because they are not required to prove them against such laws.
A first step is to accept that not all mathematicians have problems accepting EAs. Yes, there are a few, but there are also EA researchers who do not see why we are still using conventional methods, which is just as bad.
The second issue comes down to the structure of the problems being solved; not just whether the problems are of theoretical interest only, or are real problems. If the problem is unimodal and quite well behaved, then sometimes an analytical solution can be found, or classic gradient approaches work extremely well. An EA could solve the same unimodal problem, but rarely as efficiently (the few exceptions are often for very high-dimensional problems where calculating the gradient is expensive). If the problem is multi-modal, then there is not a single solution, and direct analytic solutions cannot in general be found. A gradient-based method will not guarantee finding the true optimum either; many people forget that a random-start gradient method is actually a meta-heuristic algorithm and therefore no different from an EA in terms of the arguments relating to the 'stochastic' content vs. the 'deterministic' content of the optimisation process.
The third issue is very contentious and relates to the 'reality' of convergence proofs. For the gradient methods, the proof of convergence is for UNIMODAL functions; there are no general proofs for MULTI-MODAL functions that do not require a second optimiser to be combined with the gradient method (e.g. a random restart etc.). For simulated annealing, to have guaranteed convergence as per the proof, the cooling rate needs to be so slow as to be impractical; therefore a realistic algorithm does not actually satisfy the proof. For an EA (this is where it gets *really* controversial) I have heard it said that "the lack of a convergence proof is more a statement of the current state of mathematics than a fault with the EA concept itself"; I do not fully agree with the statement, but it does make one think! Being realistic, there was a time when a proof for simulated annealing did not exist either, so a proof for EAs may well exist and would make them just as acceptable as gradient approaches; it is just that we have not found it yet.
In general, the optimiser needs to be matched to the problem. Analytic solutions and classic gradient methods should always be considered first because, if they work, they will often be the fastest and most accurate. Then meta-heuristics such as combinations of a small EA + gradient should be considered, and only when all other optimisers have failed should a full EA solution be tried. Without a proof of convergence (or at least something close to it), applying an EA is a black art and fraught with difficulties in practice. The other alternative is often to use the concept of "if you have a problem that is hard to solve, try changing the problem"; the concept is used routinely in EAs where different objective functions or chromosome structures are used. In classic optimisation, solving a least-squares formulation may be the most common approach, but it is not always the only one that can be mathematically useful.
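As a rough sketch of the "small EA + gradient" combination mentioned above (my own illustration, with arbitrary test function and parameters, not anyone's published method): a crude evolutionary search explores the multi-modal landscape, and a local gradient-based polish finishes the job.

```python
import numpy as np
from scipy.optimize import minimize

def rastrigin(x):
    # A standard multi-modal test function (many local optima, one global).
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

rng = np.random.default_rng(0)
dim, pop_size, generations = 5, 20, 30
pop = rng.uniform(-5.12, 5.12, size=(pop_size, dim))

# Very small "EA": keep the best half, mutate it to refill the population.
for _ in range(generations):
    fitness = np.array([rastrigin(ind) for ind in pop])
    parents = pop[np.argsort(fitness)[: pop_size // 2]]
    children = parents + rng.normal(scale=0.3, size=parents.shape)
    pop = np.vstack([parents, children])

best = pop[np.argmin([rastrigin(ind) for ind in pop])]

# Local gradient-based refinement started from the best EA individual.
result = minimize(rastrigin, best, method="L-BFGS-B")
print(result.x, result.fun)
```

The EA only has to land in the right basin of attraction; the gradient method then converges quickly and accurately within that basin.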
Dear Prof. Nomani,
We would like to add, based on our Millennium Prize solution of the P vs. NP problem and the invention of the Mallick-Hamburger-Mallick D-Branes String Functor Algebra Calculus (Mallick (2016, 2017, submitted)), and may we say so humbly, that both Genetic Algorithms and Particle Swarm Optimisation are, as we have discovered theoretically, mathematically and experimentally, bound by certain natural laws of systems (Computer Science) and String Theory (above references), as well as by mathematical computation methods such as the Newton-Raphson method or the trapezoidal method of numerical analysis. Therefore, at least in practice, they are both mathematical and scientific, and, as we have discovered, they are conformal with Field Theory logic and trigger a flow of information and energy, hence are relativistic. Sorry for overstating the answer, but I thought you and the other readers might be interested.
Soumitra K. Mallick
for Soumitra K Mallick, Nick Hamburger, Sandipan Mallick
A confident guide is better than many random guides, because it saves both time and energy in reaching the desired goal. A similar philosophy lies behind preferring analytical approaches over heuristics.
with best regards
Let me add my 2 cents from the aerodynamics viewpoint, a field which heavily uses optimization techniques to find the optimum shape of aerodynamic bodies.
Aerodynamic optimization is highly expensive, which hinders the use of metaheuristics for such cases unless you have access to supercomputers. Luckily for aerodynamicists, they have access to gradient information that can be obtained from the adjoint solver, at a cost roughly equal to one function evaluation for an arbitrary number of variables. When you have the gradient, you can mathematically check whether your solution is already optimal or not; by using a gradient-based algorithm, convergence will also be very fast! Here you can see that some researchers have devoted their time to developing efficient gradient computation methods. They are also happy when they know that their optimizer has reached the global optimum, since it mathematically makes sense. Also, most of them are mathematically trained, hence their approach is also mathematical.
Now, you could use metaheuristics to solve such cases. But the problem is that they are too expensive and they cannot guarantee optimality! It is much better to use a multi-start gradient-based search to find the global optimum than metaheuristics (please see this reference: http://www.tandfonline.com/doi/abs/10.3166/remn.17.103-126?needAccess=true&journalCode=tecm20).
So you see that people in aerodynamics frequently use gradient-based algorithms because of their practical (faster convergence) and mathematical advantages.
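As a tiny illustration of what "mathematically check whether your solution is already optimal" means (my own sketch with a toy analytic objective, not the adjoint workflow or the reference above): with a gradient available, first-order stationarity of an unconstrained candidate is simply a vanishing gradient norm.

```python
import numpy as np

def objective(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def gradient(x):
    # Here the gradient is analytic; in aerodynamic design it would come
    # from the adjoint solver at roughly the cost of one extra solve.
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

candidate = np.array([1.0, -2.0])
is_stationary = np.linalg.norm(gradient(candidate)) < 1e-6
print(is_stationary)  # True: the first-order optimality condition is satisfied
```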
I would like to add my 2 cents to the discussion. Gradient-based deterministic methods are not only restricted to smooth problems (OK, one can use subgradients etc. if one is tough enough), but, as already said, they also converge only to the local solution closest to the initial guess. Gradient-free deterministic methods require evaluating an exponential number of possibilities, and if the problem has many variables this approach can simply be considered intractable. Some problems also cannot be dealt with by gradient-based methods at all because they have a discrete nature, and for those problems it is very easy to end up with a combinatorial kind of complexity.
Evolutionary methods were born exactly to get around those difficulties, so there IS a point in their existence and usage. Moreover, there IS a convergence theorem for evolutionary algorithms in general search spaces (Rudolph, 1996), so there is some strong mathematical justification for those as well.
From a practical point of view, a limitation of ALL algorithms (evolutionary and gradient based!) is the absence of an upper bound on the number of iterations required: all methods converge in a finite but arbitrarily large number of iterations. If time to solution is an issue, this aspect has to be carefully considered.
A final consideration: optimisation is a wonderfully vast and rich field, and one has to choose the right tools for the given problem and the means available; there is no silver bullet, thus it's not for everyone. I had the honour of participating in an extremely tough optimisation competition, the GTOC, where each team has only 1 month to solve a monstrously complex space trajectory optimisation problem (seriously, Google it). The only way to tackle those problems is with a mixture of gradient and evolutionary methods, a very smart approach to the problem and a lot of insight. Teams that didn't use both approaches either couldn't return a feasible solution or returned rather poor ones. Teams that used a mixture of approaches (ours included) managed to return feasible and very good solutions.
@Gennady Fedulov: knowing the lower bound of a function is extremely useful and can also be used in evolutionary methods.
In most cases the knowledge of a lower bound is problem-dependent, but for some problems you can refine it iteratively with an algorithm (as with Lipschitz optimisation or Branch and Bound). Those methods usually tend to perform a dense sampling of the search space, so the number of function evaluations can grow extremely fast.
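As a quick illustration of the Lipschitz idea (my own sketch with a toy 1-D objective and an assumed Lipschitz constant, not any specific published algorithm): each sample x_i gives f(x) >= f(x_i) - L*|x - x_i|, so the pointwise maximum of these cones is a valid lower-bounding function that can be refined iteratively.

```python
import numpy as np

def f(x):
    return np.sin(3 * x) + 0.5 * x          # toy objective on [0, 4]

L = 3.5                                      # assumed Lipschitz constant of f
grid = np.linspace(0.0, 4.0, 2001)           # bound evaluated on a grid for simplicity
samples = [0.0, 4.0]

for _ in range(10):
    values = np.array([f(x) for x in samples])
    # Lower envelope: max over samples of f(x_i) - L*|x - x_i|.
    cones = values[:, None] - L * np.abs(grid[None, :] - np.array(samples)[:, None])
    envelope = np.max(cones, axis=0)
    lower_bound = envelope.min()
    incumbent = min(f(x) for x in samples)
    print(f"LB = {lower_bound:.3f}, best found = {incumbent:.3f}, gap = {incumbent - lower_bound:.3f}")
    # Refine where the bound is weakest; the sampling densifies quickly.
    samples.append(grid[envelope.argmin()])
```

The printed gap shrinks as the bound is refined, which is exactly the kind of guaranteed quality measure discussed earlier in the thread, and it also shows why the number of evaluations grows fast.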
The fact that some algorithms are just "craft" and others are "science" is mostly a matter of theoretical refinement, in my opinion. There is some very serious work going on by many scientists in that field, mathematicians included.
Finally, the judgement of a solution is not something that an algorithm can really provide, so only a field expert can tell whether a solution is "worthless". An algorithm can converge to a provably optimal solution, but it can't tell whether the problem was posed in a meaningful way: only a competent person can perform that kind of judgement on the solution.
Dear friend,
Actually, it is a branch of applied mathematics, so we cannot say that mathematicians don't like it. Researchers in pure mathematics may not like it, but mathematicians in applied fields take great interest in it.
Dear He Yichao,
I completely agree with your phrase "Because GA and other evolutionary algorithms are randomized algorithms and lack of rigorous mathematical theory, most of the work of mathematicians rely on rigorous deductive reasoning, which is not consistent with randomness, so they are not very acceptable".
At the same time, I want to note that there is a very beautiful science, "Probability Theory", which is very much consistent with randomness. As for me, I really like this theory.
OK,
since there appears to be some agreement about the absence of rigorous mathematical proofs for evolutionary algorithms, I'm linking to one such proof I was mentioning:
http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=542332
The paper gives a theoretical proof of global convergence for any kind of evolutionary algorithm on any search space, provided the algorithm has some features. The demonstration uses Probability Theory and Markov chains, and it is, in my opinion, very deep and very simple at the same time.
From a practical point of view, as I already said, this proof unfortunately does not tell us how far we are from the global optimum, so we don't have a rigorous stopping criterion. That's where the "art" comes into play.
Another interesting aspect is that, for single-objective problems, almost all evolutionary algorithms I've seen so far are more or less contained within this one, so almost all "new" algorithms can be seen as rehashes of this general template. The main difference in performance between evolutionary algorithms then boils down to how efficient (tailored?) some heuristics are at solving a specific problem, but we already know from the No Free Lunch theorem that they'll perform poorly on some other problems.
Note that I'm not saying that an algorithm must necessarily follow this template, only that if it does it is backed by rigorous proofs.
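To make the template concrete, here is a minimal sketch of the kind of elitist scheme discussed above: the best solution found so far is never lost, and the mutation distribution can reach any region of the search space with non-zero probability. These are (paraphrased) the sort of conditions under which global convergence results such as Rudolph (1996) apply; this is my own toy illustration, not the paper's algorithm.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x**2))

rng = np.random.default_rng(42)
dim, pop_size, generations = 3, 10, 200
pop = rng.uniform(-10, 10, size=(pop_size, dim))
best_x = min(pop, key=sphere)                 # elite individual
best_f = sphere(best_x)

for _ in range(generations):
    # Gaussian mutation has full support over R^dim: every region stays reachable.
    offspring = pop + rng.normal(scale=1.0, size=pop.shape)
    # Greedy replacement: keep whichever of parent/offspring is better.
    pop = np.array([o if sphere(o) < sphere(p) else p for o, p in zip(offspring, pop)])
    # Elitism: never discard the best solution seen so far.
    candidate = min(pop, key=sphere)
    if sphere(candidate) < best_f:
        best_x, best_f = candidate, sphere(candidate)

print(best_x, best_f)
```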
Mathematicians want formulas and rigorous demonstrations. And I cannot say that they are wrong, even if they sometimes exaggerate.
I think it's good and useful to use all the tools that help us solve a problem. These instruments (math and EC) should not eliminate each other.
Deep learning/neural networks/machine learning are similar to EAs: like a black box, they need iterations to arrive at solutions. These methods have been deemed very efficient and practical tools for research in AI, especially in the BIG DATA era. So far, I have not heard that mathematicians don't like them. What is your opinion of the tools mentioned above compared to EAs?
Hi Sf Chien,
Artificial neural networks (and therefore DNNs) are simply general-purpose mapping functions that map points in one input space over to another space, often with a different dimensionality between the input and output: they are not optimisers in their own right. The nets have many free parameters that must be set in order to generate a specific mapping action. The structure of the nets with their hidden layers and non-linear elements allows for a very rich range of mappings to be created; the ability to handle extreme non-linearities, such as a softmax layer on the output, also makes the structures very useful for classification applications.
To determine useful values for the free parameters, an optimiser is needed; if the net is trained with supervised learning then the optimisation routine is explicit, but even self-organising systems use an optimisation step to adjust the weights. The vast majority of ANNs use a gradient descent optimisation routine of some form; back-propagation being a common example. The optimisation problem for ANNs is not only multi-modal, but is also multi-global and is therefore not trivial (i.e. swap over two neurons in the same layer and you get exactly the same net behaviour, but a different vector of weights to describe the network, leading to multiple global optima). Meta-heuristics, such as back-propagation with a random restart or back-propagation with momentum terms and random restart, are practical ways to train the network, however with methods based on propagating gradient information backwards, there is a real problem with the loss of gradient information as we work backwards from the output nodes, leading to great difficulty in training the early layers; the DNN techniques such as rectified linear activation functions have improved this, but not solved it.
The other problem is determining the net structure. The more weights there are to optimise then the larger the optimisation decision variable space dimensionality and the more difficult the optimisation problem. The DNN approach is primarily to use convolutional nodes to allow weight sharing and then pooling to reduce the dimensionality further. Other methods try to build up the network complexity over time, such as with NEAT etc.
Overall, Artificial neural networks are a really good example of an application where meta-heuristics, such as random restart back-propagation, are the normal approach for producing very useful optimisation results.
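As a small sketch of that separation (my own illustration, not a practical training recipe): the network is just a parametric mapping from inputs to outputs, and a separate optimiser, here plain gradient descent with finite-difference gradients purely for brevity, adjusts its free parameters against a training set.

```python
import numpy as np

def net(params, x):
    """A tiny 1-hidden-layer mapping R -> R with 3 hidden tanh units."""
    w1, b1, w2, b2 = params[:3], params[3:6], params[6:9], params[9]
    return float(np.dot(w2, np.tanh(w1 * x + b1)) + b2)

def loss(params, xs, ys):
    return sum((net(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = np.linspace(-2, 2, 21)
ys = np.sin(xs)                              # data the mapping should reproduce
params = np.random.default_rng(1).normal(size=10)

# The "learning" is nothing more than an optimisation loop over the weights;
# back-propagation, random restarts or an EA could all be swapped in here.
lr, eps = 0.05, 1e-5
for _ in range(2000):
    grad = np.zeros_like(params)
    for i in range(len(params)):
        bumped = params.copy()
        bumped[i] += eps
        grad[i] = (loss(bumped, xs, ys) - loss(params, xs, ys)) / eps
    params -= lr * grad

print(loss(params, xs, ys))
```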
Dear Evan Huges,
thank you for your answer. I'm not an expert in ANN/Deep Learning, but you confirmed a personal point of view about them.
They are a very good and flexible way to generate a surrogate model, without having to explicitly define the structural function of the model.
The "learning" aspect is just an optimisation of those weights to maximise the matching of the model wrt given data, and this optimisation can be performed in several ways.
All in all, it seems to me that ANN is basically a rebranding of something we could already do decades (if not centuries) ago: regression.
Obviously, the incredible results we're getting with ANN are undeniable, but I suspect it is more a matter of having a large scale surrogate model and the computational power to train it, than some intrinsic feature of ANN itself.
Hi Lorenzo,
Yes a very simple 1 input, 1 output linear net (i.e. a single neuron) could be y=mx+c and linear regression does indeed often provide a very good solution. With many neurons in hidden layers and careful choice of the non-linearities, an ANN can be so versatile and useful, and when coupled with a good optimiser, can provide for fantastic surrogate models.
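For instance (my own quick illustration), the single linear neuron y = m*x + c trained by least squares is exactly ordinary linear regression, solvable in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.5 * x + 1.0 + rng.normal(scale=0.5, size=x.size)   # noisy line

# "Training" the one-neuron net in closed form: ordinary least squares.
m, c = np.polyfit(x, y, deg=1)
print(m, c)   # recovers roughly 2.5 and 1.0
```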
The ANN structure is also an interesting example of processes that can be optimised well with combinations of gradient methods, and also evolutionary approaches in particular. For example, instead of using the most basic meta-heuristic of a random restart for initialising back-propagation runs, a simple evolutionary programme is quite often far more effective for deciding where to start each back-propagation attempt, allowing many of the local optima to be avoided.
The Neuro-Evolution methods (i.e. NEAT etc.) can allow excellent network structures to be optimised; network structure optimisation is not a task that is well suited at all to classic gradient methods alone.
There are also interesting uses of co-evolution for optimising neural nets, in particular for developing game-playing engines. Here there is no directly quantifiable error function; all we can say is "net A plays better than net B". Despite no good gradient data being available, the co-evolutionary methods can work with the weak information to allow something useful to be achieved.
Although evolutionary and other stochastic methods can train and design neural nets, if gradient information is available and a well-formulated gradient-based algorithm can be found that addresses the design problem, then the gradient method would usually do better. For example, try back-propagation to train a fairly simple network on some training data, then try an EA to do the same job. Conversely, try gradient methods for deciding on how many neurons to use and in how many layers.
Hi Sf Chien,
The terms 'conventional' and 'unconventional' are not appropriate, and ANN/Deep learning and EAs are incomparable too as they do not do the same job: ANN/DNN are mapping functions that rely on using an optimiser; an EA is an optimiser.
ANN/DNN are (in general) highly non-linear parametric modelling functions, in comparison to other parametric models, such as Spline surfaces or Fourier Series, which are well suited to modelling specific types of data sets. Therefore saying ANN/DNN are non-linear general purpose modelling tools is appropriate, but they are not optimisers.
One of the best ways of comparing EAs to classical optimisation algorithms is to consider whether the gradient magnitude and direction are used explicitly or implicitly. In a gradient descent algorithm, we calculate the gradient explicitly at the current sample of interest, and then generate the next sample based on the gradient direction and magnitude.
An EA can make multiple uses of 'gradient' information. In the population management process that is common to all EAs, the gradient is used implicitly; i.e. the gradient is not actually calculated and only an approximate gradient direction is generated. For example, an EA may use a logical comparison to either rank solutions so that the best can be selected, or perform pair-wise comparisons to decide whether a solution should replace its parent, etc. The second common use of implicit gradient information may be in crossover operations; selection of parents may be based on their performance, and in techniques such as Differential Evolution, difference vectors are generated between multiple parents and used to create new trial points.
Therefore the terms 'explicit gradient' and 'implicit gradient' are pretty good at an approximate classification of the more classical optimisers and the EA approaches.
We must be very careful, however, because for multi-modal optimisation we must re-start a gradient-based optimiser. Therefore we fundamentally have a meta-heuristic that uses, say, a random search to start the gradient optimiser; we are using the explicit gradient to find the local peaks, then implicit gradient information to select the best overall solution after our random search process.
A random search is probably the oldest optimiser (and therefore also the most conventional), followed by a 'trial and error' approach of trying a new version of the thing being optimised based on our last attempt. A random search of N points is equivalent to an Evolutionary Programme with a population of N and 1 generation, and the 'trial and error' approach could be modelled by an EP with a population of 1 and N generations. Therefore making comparisons of algorithms is not a simple process.
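The Differential Evolution point above is perhaps the clearest example of an 'implicit gradient'; here is a minimal sketch (my own, the classic DE/rand/1/bin step on a toy function, with arbitrary parameter values): no derivative is ever computed, yet the difference vectors between population members point in directions the population has already found to be productive, and the pair-wise replacement is another implicit use of ranking information.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x**2))

rng = np.random.default_rng(3)
pop_size, dim, F, CR = 15, 4, 0.8, 0.9
pop = rng.uniform(-5, 5, size=(pop_size, dim))

for _ in range(100):
    for i in range(pop_size):
        # Pick three distinct members other than i.
        idx = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
        a, b, c = pop[idx]
        mutant = a + F * (b - c)                 # implicit 'gradient' step
        cross = rng.random(dim) < CR
        trial = np.where(cross, mutant, pop[i])
        # Pair-wise comparison: replace the parent only if the trial is no worse.
        if sphere(trial) <= sphere(pop[i]):
            pop[i] = trial

print(min(pop, key=sphere))
```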
Visit the links below for the MATLAB code:
https://youtu.be/OQ3T575sbI8
http://dg-algorithm.blogspot.com/
I think they simply want a mathematical proof of the method's claims.
A short answer would be that metaheuristics don't perform well in benchmarks, compared to other methods. (I agree that the issue of proofs is a bit contentious.)
This is especially true for small function evaluation budgets (not thousands or tens of thousands, but hundreds), which is relevant for costly scenarios such as CFD or other simulations. If function evaluation budgets are very large, algorithmic performance matters less (naturally).
See my own work (e.g., Model-based Optimization for Architectural Design—Optimizing...
), as well as black-box optimization competitions, and Rios LM, Sahinidis NV (2013) Derivative-free optimization: a review of algorithms and comparison of software implementations. Journal of Global Optimization 56(3), pp. 1247–1293.
Maybe in the past there was, in general, a nonconformity between metaheuristics and mathematics. But I see that there are now a number of works that use mathematics to analyse, for instance, the convergence and the behaviour of metaheuristics.
Not to mention Flying Elephants
Editor’s Note on the MIC 2013 Special Issue of the Journal o...
This is a very interesting topic, similar to: why do control engineers not like to use fuzzy logic or neural networks? Mathematicians and engineers are educated to implement algorithms for which they can demonstrate/certify some properties (local convergence, for example, or robustness), which enables a level of confidence and trust. Similarly, control engineers are required to determine features like stability margins and robustness. Many algorithms can be implemented that are successful in some specific contexts, but nobody is capable of demonstrating any general property; you are asked to trust them only on the basis of their success in some application. This is against our culture and should be done carefully: in certain conditions these algorithms could produce wrong solutions, and in critical applications they could be extremely dangerous. I am a user of genetic algorithms, simple and fast to use, but I would never sell a solution using them, because I cannot certify the results.
In fact, there is already a verdict regarding Evolutionary Algorithms from https://www.researchgate.net/post/Can_anyone_tell_me_the_formula_to_compute_average_standard_deviation_from_optimal_solution_for_the_RCPSP_problem_using_metaheuristics
Michael Patriksson: "In fact - within a few days I will give an oral presentation about my experience of research work since 1988. In my presentation I will provide a series of examples of fraudulent science - or pseudo-science, among which I sometimes count metaheuristics. They promise so much, and deliver so meagerly."
Therefore, there is no need to pay any attention to these algorithms, as this is a road to nowhere.