What is the best optimizer for an artificial neural network, if I want perfect overfitting on the training data and don't care about testing?

Mario Michael Krell @Mario_Michael_Krell

09 September 2017

I am trying to get as close as possible, in an evaluation, to the theoretical results in "A Capacity Scaling Law for Artificial Neural Networks" by Gerald Friedland and myself. So far, I have found that L-BFGS does a very good job of fitting nets perfectly to the data (100% training accuracy). But is there a better (global) optimizer? It would also be good to know how well other optimizers can handle imbalance, or where I could get an implementation. I am well aware that overfitting and testing on the training data are usually not desired, but for this evaluation it is required.

Thanks in advance.

https://arxiv.org/abs/1708.06019

John C Cancilla

For the most commonly used type of supervised artificial neural network (ANN), the multilayer perceptron, if you want to reach an overfit model, you can design a network topology with a large number of hidden neurons (giving a high weight-to-sample ratio; typically undesired, but useful in your case) and avoid any kind of verification process.

Many ANN packages will automatically hold out 10-20% of the data to verify the model (verification set) and avoid overfitting, stopping training early when the error on this verification set increases over a series of training cycles or epochs (usually around 6). Therefore, make sure the model is using 100% of your data as the training set. If you do this, the estimation error of your ANN should tend asymptotically to zero, and your training cycles can even run into the thousands (you can set an upper limit on training cycles to make it stop, maybe at around 10,000).

Note that the verification set I mention is different from a test set, which is used to test the ANN once it has been trained. The verification set is used internally by the ANN during training to prevent overfitting, which is why you should avoid it completely.
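To make the advice above concrete, here is a minimal sketch assuming scikit-learn's MLPClassifier (the thread does not name a package, so this choice is mine). The helper name training_accuracy and the toy data are hypothetical. With solver="lbfgs" scikit-learn holds out no verification data, so the whole dataset is used for training; I also switch off L2 regularization and tighten the tolerance, which are my own additions rather than anything suggested in the thread.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def training_accuracy(X, y, n_hidden, max_iter=10000, seed=0):
    """Fit a one-hidden-layer MLP on all of (X, y); return training accuracy."""
    clf = MLPClassifier(
        hidden_layer_sizes=(n_hidden,),
        solver="lbfgs",        # full-batch quasi-Newton; no verification split
        alpha=0.0,             # assumption: turn off L2 regularization
        early_stopping=False,  # only relevant for "sgd"/"adam"; kept explicit
        max_iter=max_iter,     # upper limit on training iterations
        tol=1e-8,              # assumption: tighter tolerance than the default
        random_state=seed,
    )
    clf.fit(X, y)              # 100% of the data is the training set
    return float(clf.score(X, y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 5))     # hypothetical toy inputs
    y = np.array([0, 1] * 20)        # arbitrary labels: a memorization task
    print(training_accuracy(X, y, n_hidden=20))
```

Whether this reaches 100% depends on the data, the network size, and the random start, so the printed value should be checked rather than assumed.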

Hope this helps you and good luck. Let me know if you need help with anything else.

John

Mario Michael Krell

It's definitely helpful to know that I should switch off the internal validation process.

However, your approach does not work for my research question. I cannot increase the number of weights because I am looking for the minimum number of weights and the maximum number of samples that still allow perfect training. Indeed, the objective is to experimentally verify those turning points, where a network is too small and cannot learn everything anymore.

John C Cancilla

Hmmm... You could try to heuristically test (trial-and-error) a wide window of hidden neurons and compare your statistical results. Also, you could vary the number of training cycles (maybe in blocks of 100) to see at which point each network topology (or architecture) meets your requirements.
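The trial-and-error search could be sketched roughly as follows, again assuming scikit-learn. The names fit_accuracy and smallest_perfect_width are hypothetical, and the random restarts are my addition: L-BFGS is a local optimizer, so one failed run does not show that a network is too small. The fitting function is passed in as fit_fn so that another optimizer can be swapped in.

```python
import warnings

from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier


def fit_accuracy(X, y, n_hidden, max_iter, seed):
    """Train on all of (X, y) with L-BFGS and return the training accuracy."""
    clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), solver="lbfgs",
                        alpha=0.0, max_iter=max_iter, random_state=seed)
    with warnings.catch_warnings():
        # Hitting max_iter is expected while sweeping, so silence the warning.
        warnings.simplefilter("ignore", ConvergenceWarning)
        clf.fit(X, y)
    return float(clf.score(X, y))


def smallest_perfect_width(X, y, widths, max_iter=1000, n_restarts=5,
                           fit_fn=fit_accuracy):
    """Return (width, results): the smallest width with 100% training accuracy.

    results maps each tried width to its best accuracy over the restarts.
    width is None if no candidate reached 100%.
    """
    results = {}
    for h in sorted(widths):
        # Keep the best of several random initializations for this width.
        best = max(fit_fn(X, y, h, max_iter, seed) for seed in range(n_restarts))
        results[h] = best
        if best == 1.0:
            return h, results
    return None, results
```

Calling smallest_perfect_width(X, y, widths=range(1, 21)) and repeating it with max_iter raised in blocks of 100 would cover both suggestions. A width that fails here only shows that this optimizer did not find a perfect fit within the given budget, not that none exists.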
