That's an interesting question! The first thing that came to my mind was that many people use Genetic Algorithms to generate test data for other algorithms, so questioning the correctness of the GA itself is highly relevant.
Typically, I separate the problem into three stages:
Stage 1. Fitness calculation and crossover logic
These can be tested by normal unit/injection tests.
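For illustration, here is a minimal unit-test sketch in Python; `one_point_crossover` and `fitness` are hypothetical stand-ins for your own deterministic GA components, not any particular library API.

```python
# Minimal sketch of unit tests for the deterministic GA pieces.
# `one_point_crossover` and `fitness` are hypothetical stand-ins
# for your own implementations.
import unittest

def one_point_crossover(a, b, point):
    """Swap the tails of two equal-length genomes at a fixed cut point."""
    return a[:point] + b[point:], b[:point] + a[point:]

def fitness(genome):
    """Toy fitness: sum of genes (replace with your objective)."""
    return sum(genome)

class TestGAComponents(unittest.TestCase):
    def test_crossover_preserves_length_and_genes(self):
        a, b = [0, 0, 0, 0], [1, 1, 1, 1]
        c1, c2 = one_point_crossover(a, b, point=2)
        self.assertEqual(len(c1), len(a))
        self.assertEqual(len(c2), len(b))
        # The gene multiset is conserved across both children.
        self.assertEqual(sorted(c1 + c2), sorted(a + b))

    def test_fitness_on_known_input(self):
        self.assertEqual(fitness([1, 2, 3]), 6)

if __name__ == "__main__":
    unittest.main()
```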
Stage 2. Testing of the Monte Carlo engine
1. Check the induced distribution of the random numbers: are they correlated? Injection testing is more complex here, because it must be done for each probability distribution involved. (A code sketch of points 1-3 follows this list.)
2. Check the spectrum of the random numbers as well: do single frequencies stand out?
3. Fix the seed and the whole random number series, then rerun the analytical part multiple times. Does it produce the same results, or is there hidden randomness buried in the algorithmic procedure?
4. Try out typical stochastic scenarios, such as Brownian motion, which can be checked analytically.
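A rough sketch of how points 1-3 might look in NumPy; the tolerances are illustrative assumptions, and `run` stands for whatever deterministic entry point your GA exposes.

```python
# Sketch of the RNG checks from points 1-3 above, using NumPy.
# The thresholds are illustrative assumptions, not universal constants.
import numpy as np

def lag1_autocorrelation(x):
    """Point 1: lag-1 autocorrelation; should be near 0 for i.i.d. draws."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

def spectrum_peak_ratio(x):
    """Point 2: largest spectral peak relative to the mean power.
    A very large ratio hints at a periodic artifact in the stream."""
    power = np.abs(np.fft.rfft(x - x.mean())) ** 2
    return power[1:].max() / power[1:].mean()

def is_reproducible(run, seed=12345):
    """Point 3: same seed, same result? `run` is a hypothetical GA entry
    point that takes a NumPy Generator and returns a result."""
    return run(np.random.default_rng(seed)) == run(np.random.default_rng(seed))

rng = np.random.default_rng(0)
x = rng.random(10_000)
assert abs(lag1_autocorrelation(x)) < 0.05   # illustrative tolerance
assert spectrum_peak_ratio(x) < 30           # illustrative tolerance
```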
Stage 3. Assembly test of the GA
1. If possible, find analytically tractable regimes to verify the algorithmic prediction, at least at the boundaries.
2. Rerun the algorithm with small additive stochastic perturbations to your parameters (similar to Lyapunov testing); see the sketch after this list. This should give you an understanding of the stability of your results.
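One possible shape for such a perturbation test, assuming a hypothetical `run_ga(params)` entry point that returns a best fitness value:

```python
# Sketch of point 2: rerun the GA with small additive noise on the
# parameters and look at the spread of the returned optima.
# `run_ga` is a hypothetical entry point returning a best fitness value.
import numpy as np

def stability_check(run_ga, params, eps=1e-3, trials=20, seed=0):
    """Return mean and std of results under perturbed parameters."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(trials):
        perturbed = params + eps * rng.standard_normal(params.shape)
        results.append(run_ga(perturbed))
    results = np.asarray(results)
    # A std much larger than eps suggests the result is unstable.
    return results.mean(), results.std()
```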
Different optimization functions have different properties (e.g., uni-modal, multi-modal, separable, ...), and a set of functions that covers all these properties is of interest. To the best of my knowledge, the CEC benchmark functions presented at the Congress on Evolutionary Computation have such properties and are widely used to test new algorithms in the literature. You can download them at the following link.
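For instance, two of the most widely used multi-modal functions in this family, Rastrigin and Ackley, are short enough to implement directly; a sketch in NumPy, with the known global minimum of 0 at the origin used as a sanity check:

```python
# Rastrigin and Ackley benchmark functions in a form that can be
# dropped into a test harness. Both have a known global minimum of 0
# at the origin, which makes the expected answer easy to assert.
import numpy as np

def rastrigin(x):
    x = np.asarray(x, dtype=float)
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def ackley(x):
    x = np.asarray(x, dtype=float)
    n = x.size
    return (-20 * np.exp(-0.2 * np.sqrt(np.sum(x**2) / n))
            - np.exp(np.sum(np.cos(2 * np.pi * x)) / n)
            + 20 + np.e)

assert rastrigin(np.zeros(5)) == 0.0
assert abs(ackley(np.zeros(5))) < 1e-12
```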
Thank you very much for your answers, time, and resources. The CEC benchmark functions suggested by @Hojjat Rakhshani were particularly useful!
I decided to test it on both Rastrigin's function and Ackley's function for now, as these seem to be the most common benchmarks for GAs. But I will consider others in the CEC set.