In evolution strategies, the normal (Gaussian) distribution is generally regarded as the best choice for mutation. I've often seen authors cite the original work by Schwefel and others and settle on the idea that Gaussian distributions are better.
I have seen far less attention paid to other distributions, and my first ideas for improving and/or benchmarking my own ES implementations revolve around changing the mutation distribution. I believe there are some strong theoretical reasons for using a normal distribution, but I have not found an explicit proof of this, or any real insight into why it should hold.
Is this true? Are there sound theoretical advantages to using a Gaussian distribution, such that all research with different distributions in ES is nonsense? Or is it more of a rule of thumb: in the absence of prior knowledge, the normal distribution is expected to be the best choice?
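
To make the kind of comparison I have in mind concrete, here is a minimal sketch of a (1+1)-ES with a fixed step size where only the mutation sampler is swapped. Everything in it is an assumption chosen for illustration, not taken from any particular paper or library: the names `sphere`, `gaussian_step`, `cauchy_step` and `one_plus_one_es` are hypothetical, the sphere function is just a toy objective, there is no step-size adaptation, and Cauchy is only one example of a heavier-tailed alternative.

```python
import math
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def gaussian_step(rng, sigma):
    """Draw one mutation step from N(0, sigma^2)."""
    return rng.gauss(0.0, sigma)


def cauchy_step(rng, sigma):
    """Draw one mutation step from a Cauchy distribution with scale sigma.

    Uses inverse-CDF sampling: sigma * tan(pi * (u - 0.5)) with u uniform in [0, 1).
    """
    return sigma * math.tan(math.pi * (rng.random() - 0.5))


def one_plus_one_es(objective, x0, step, sigma=0.1, iters=2000, seed=0):
    """Elitist (1+1)-ES with a fixed step size and a pluggable mutation sampler.

    `step(rng, sigma)` returns one scalar perturbation; it is applied
    independently to every coordinate of the parent.
    """
    rng = random.Random(seed)
    parent, f_parent = list(x0), objective(x0)
    for _ in range(iters):
        child = [v + step(rng, sigma) for v in parent]
        f_child = objective(child)
        # Keep the child only if it is at least as good as the parent.
        if f_child <= f_parent:
            parent, f_parent = child, f_child
    return parent, f_parent


if __name__ == "__main__":
    x0 = [5.0] * 5
    for name, step in [("gaussian", gaussian_step), ("cauchy", cauchy_step)]:
        _, best = one_plus_one_es(sphere, x0, step)
        print(name, best)
```

A single run on a single objective obviously proves nothing either way; the point is only that swapping the distribution is a one-line change, which is why I would like to know whether theory already rules it out.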