I have implemented an evolutionary algorithm in which each gene is a floating-point number representing a parameter in an inter-atomic potential. The goal is to generate parameter sets that can be used for MD simulations. Training is done by matching force predictions from the candidate individuals against those from quantum mechanical computations, and the cost value is defined as the mean squared error of the vector difference between the reference and candidate force predictions.
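To make the cost concrete, here is a minimal sketch of that definition (the array shapes, and the reading of "MSE of the vector difference" as the mean squared norm over atoms, are my assumptions):

```python
import numpy as np

def force_cost(predicted, reference):
    """Mean squared error between candidate and reference forces.

    predicted, reference: (n_atoms, 3) arrays of Cartesian force vectors.
    Interpreted here as the mean over atoms of the squared norm of the
    difference vector (a per-component mean would differ only by a
    constant factor of 3).
    """
    diff = predicted - reference
    return float(np.mean(np.sum(diff**2, axis=1)))
```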

The algorithm is implemented as follows:

  • Starting guesses are drawn from normal distributions whose means and variances reflect the expected variation of each parameter type (charge, force constants, equilibrium bond distance, etc.).
  • Costs are evaluated on a fixed fraction (usually ~10-50%) of stochastically chosen training data points, and performance is measured on a separate set of test data, shown in the plot below.
  • Parents are drawn from a softmax/Boltzmann distribution over cost, so that high-cost individuals are exponentially suppressed. An individual may be selected more than once into the parent pool, giving it a higher probability of producing more offspring.
  • Until the new population is filled, parents selected in the previous step are drawn uniformly and combined by 2-point or 3-point cross-over. Both complements are used, so one cross-over event produces two offspring. Cross-over swaps whole floating-point values, i.e. it does not cut at random bit positions.
  • After cross-over, mutation is applied to all genes of all individuals, i.e. every floating-point value in the new generation. The mutation is a small numeric perturbation of the value, proportional to the standard deviation of the initial population for that gene.
  • In each generation, the few fittest individuals are propagated unchanged into the next generation (while also remaining eligible as parents).
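For reference, the loop described above can be sketched roughly as follows. All names, the example priors, the selection temperature, and the use of 2-point cross-over only are illustrative placeholders, not our actual potential or settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-gene priors (mean/std per parameter type); the real
# potential has its own parameters -- these three are placeholders,
# e.g. a charge, a force constant, and an equilibrium bond distance.
init_mean = np.array([1.0, 450.0, 1.5])
init_std = np.array([0.2, 100.0, 0.1])
n_genes = init_mean.size

def init_population(pop_size):
    # Starting guesses drawn from per-gene normal distributions
    return rng.normal(init_mean, init_std, size=(pop_size, n_genes))

def select_parents(costs, n_parents, temperature=1.0):
    # Boltzmann/softmax selection: high cost exponentially suppressed.
    # Sampling WITH replacement lets a fit individual appear repeatedly.
    logits = -costs / temperature
    logits -= logits.max()          # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(costs), size=n_parents, replace=True, p=probs)

def crossover(a, b):
    # 2-point cross-over on whole genes (3-point omitted for brevity);
    # both complements are kept, so one event yields two offspring
    i, j = sorted(rng.choice(n_genes + 1, size=2, replace=False))
    c1, c2 = a.copy(), b.copy()
    c1[i:j], c2[i:j] = b[i:j], a[i:j]
    return c1, c2

def mutate(pop, amplitude=0.05):
    # Perturb every gene by noise proportional to the initial std
    return pop + rng.normal(0.0, amplitude * init_std, size=pop.shape)

def evolve(cost_fn, pop_size=200, n_parents=100, n_elites=5,
           n_generations=50):
    pop = init_population(pop_size)
    for _ in range(n_generations):
        costs = np.array([cost_fn(ind) for ind in pop])
        elites = pop[np.argsort(costs)[:n_elites]].copy()  # unchanged
        parents = pop[select_parents(costs, n_parents)]
        children = []
        while len(children) < pop_size - n_elites:
            a, b = parents[rng.choice(n_parents, size=2)]  # uniform draw
            children.extend(crossover(a, b))
        children = mutate(np.array(children[:pop_size - n_elites]))
        pop = np.vstack([elites, children])
    costs = np.array([cost_fn(ind) for ind in pop])
    return pop[np.argmin(costs)]
```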

What happens in practice when we apply this algorithm (not in a simpler toy case, finding Fourier coefficients of a sawtooth function, where it works fine, but in the case described above of finding interaction parameters) is that the cost drops rapidly by a few orders of magnitude and then plateaus at prediction errors on the order of 100% of the magnitude of the forces. We have also observed that, very rapidly, the linear bonding force constants (k in Hooke's law) decrease by ~2 orders of magnitude, which we know is not reasonable. Our guess is that the whole population is quickly pulled into a broad local minimum in which bond forces are almost switched off, and from which it is hard or impossible to reach physical solutions.

Some representative hyperparameters we have tried are:

  • Population size: 200-500
  • Number of parents: 500-100
  • Mutation amplitude: 1-10% of the initial variation
  • Number of elites propagated directly to the next generation: 5-10

Can anyone suggest what to change, in either the algorithm or the parameter choices, to avoid the problems we are having?

I attached a semilog plot showing how the minimum cost value changes over time. As can be seen, it looks like a sum of three exponentials with very different slopes. We are not quite sure what to make of this, although the orders-of-magnitude decrease of the bonding force constants seems to be the main driver of the second slope, up to ~300 iterations.
