I am trying to reproduce, as closely as possible, the theoretical results in "A Capacity Scaling Law for Artificial Neural Networks" by Gerald Friedland and myself. So far I have found that L-BFGS does a very good job of optimizing nets to fit the data perfectly (100% training accuracy). But is there a better (global) optimizer? It would also be good to know how well other optimizers handle class imbalance, and where I could find implementations. I am well aware that overfitting and testing on the training data are usually undesirable, but for this evaluation they are required.
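For context, here is a minimal sketch of the kind of setup I mean, using scikit-learn's `MLPClassifier` with `solver='lbfgs'` (the dataset sizes, layer widths, and seeds are just illustrative choices, not the ones from the paper):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Small synthetic dataset; with enough hidden units the net
# should be able to memorize (interpolate) it completely.
X, y = make_classification(n_samples=30, n_features=5, random_state=0)

# L-BFGS as the solver; capacity deliberately oversized so the
# optimizer can drive training error to (near) zero.
clf = MLPClassifier(hidden_layer_sizes=(64,),
                    solver='lbfgs',
                    max_iter=10000,
                    random_state=0)
clf.fit(X, y)

# Evaluating on the TRAINING data on purpose -- the point of the
# evaluation is to measure memorization capacity, not generalization.
acc = clf.score(X, y)
print(acc)
```

Swapping `solver='lbfgs'` for `'adam'` or `'sgd'` in the same script is one easy way to compare how other optimizers behave on the same memorization task.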
Thanks in advance.
https://arxiv.org/abs/1708.06019