I have implemented an evolutionary algorithm in which each gene is a floating-point number representing a parameter in an inter-atomic potential. The goal is to generate parameter sets that can be used for MD simulations. Training is done by matching force predictions from the candidate individuals against those from quantum mechanical computations, and the cost value is defined as the mean squared error of the vector difference between the reference and candidate force predictions.
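To make the cost concrete, here is a minimal sketch of that definition (the array shapes, and the reading of "MSE of the vector difference" as the mean squared norm over atoms, are my assumptions):

```python
import numpy as np

def force_cost(predicted, reference):
    """Mean squared error between candidate and reference forces.

    predicted, reference: (n_atoms, 3) arrays of Cartesian force vectors.
    Interpreted here as the mean over atoms of the squared norm of the
    difference vector (a per-component mean would differ only by a
    constant factor of 3).
    """
    diff = predicted - reference
    return float(np.mean(np.sum(diff**2, axis=1)))
```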

The algorithm is implemented as follows:

  • Starting guesses are drawn from normal distributions whose means and variances reflect the expected variation of each parameter type (charge, force constants, equilibrium bond distance, etc.).
  • Costs are evaluated on a fixed fraction (usually ~10-50%) of stochastically chosen training data points, and performance is measured on a separate set of test data, shown in the plot below.
  • Parents are drawn from a softmax/Boltzmann distribution over cost, so that high-cost individuals are exponentially suppressed. An individual may be selected more than once into the parent pool, giving it a higher probability of producing more offspring.
  • Until the new population is filled, parents selected in the previous step are drawn uniformly and combined by 2-point or 3-point cross-over. Both complements are used, so one cross-over event produces two offspring. Cross-over swaps whole floating-point values, i.e. it does not cut at random bit positions.
  • After cross-over, mutation is applied to all genes of all individuals, i.e. every floating-point value in the new generation. The mutation is a small numeric perturbation of the value, proportional to the standard deviation of the initial population for that gene.
  • In each generation, the few fittest individuals are propagated unchanged into the next generation (while also remaining eligible as parents).
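For reference, the loop described above can be sketched roughly as follows. All names, the example priors, the selection temperature, and the use of 2-point cross-over only are illustrative placeholders, not our actual potential or settings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-gene priors (mean/std per parameter type); the real
# potential has its own parameters -- these three are placeholders,
# e.g. a charge, a force constant, and an equilibrium bond distance.
init_mean = np.array([1.0, 450.0, 1.5])
init_std = np.array([0.2, 100.0, 0.1])
n_genes = init_mean.size

def init_population(pop_size):
    # Starting guesses drawn from per-gene normal distributions
    return rng.normal(init_mean, init_std, size=(pop_size, n_genes))

def select_parents(costs, n_parents, temperature=1.0):
    # Boltzmann/softmax selection: high cost exponentially suppressed.
    # Sampling WITH replacement lets a fit individual appear repeatedly.
    logits = -costs / temperature
    logits -= logits.max()          # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(costs), size=n_parents, replace=True, p=probs)

def crossover(a, b):
    # 2-point cross-over on whole genes (3-point omitted for brevity);
    # both complements are kept, so one event yields two offspring
    i, j = sorted(rng.choice(n_genes + 1, size=2, replace=False))
    c1, c2 = a.copy(), b.copy()
    c1[i:j], c2[i:j] = b[i:j], a[i:j]
    return c1, c2

def mutate(pop, amplitude=0.05):
    # Perturb every gene by noise proportional to the initial std
    return pop + rng.normal(0.0, amplitude * init_std, size=pop.shape)

def evolve(cost_fn, pop_size=200, n_parents=100, n_elites=5,
           n_generations=50):
    pop = init_population(pop_size)
    for _ in range(n_generations):
        costs = np.array([cost_fn(ind) for ind in pop])
        elites = pop[np.argsort(costs)[:n_elites]].copy()  # unchanged
        parents = pop[select_parents(costs, n_parents)]
        children = []
        while len(children) < pop_size - n_elites:
            a, b = parents[rng.choice(n_parents, size=2)]  # uniform draw
            children.extend(crossover(a, b))
        children = mutate(np.array(children[:pop_size - n_elites]))
        pop = np.vstack([elites, children])
    costs = np.array([cost_fn(ind) for ind in pop])
    return pop[np.argmin(costs)]
```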

What happens in practice when we apply this algorithm (not in a simpler toy case, finding Fourier coefficients of a sawtooth function, where it works fine, but in the case described above of finding interaction parameters) is that the cost drops rapidly by a few orders of magnitude and then plateaus at prediction errors on the order of 100% of the magnitude of the forces. We have also observed that, very rapidly, the linear bonding force constants (k in Hooke's law) decrease by ~2 orders of magnitude, which we know is not reasonable. Our guess is that the whole population is quickly pulled into a broad local minimum in which bond forces are almost switched off, and from which it is hard or impossible to reach physical solutions.

Some representative hyperparameters we have tried are:

  • Population size: 200-500
  • Number of parents: 500-100
  • Mutation amplitude: 1-10% of the initial variation
  • Number of elites propagated directly to the next generation: 5-10

Can anyone suggest what to change, in either the algorithm or the parameter choices, to avoid the problems we are having?

I attached a semilog plot showing how the minimum cost value changes over time. As can be seen, it looks like a sum of three exponentials with very different slopes. We are not quite sure what to make of this, although the orders-of-magnitude decrease of the bonding force constants seems to be the main driver of the second slope, up to ~300 iterations.
