Determining the network architecture is one of the most important and difficult tasks in ANN model development. What is the best way to find the optimum number of layers and the number of nodes in each of these layers?
Hello Majid, I have been working with neural networks for the last few years. I am not an expert, but I can tell you what I have learned from my experience.
Also, in the link attached, you will find a good discussion about determining the number of layers and nodes in each layer of a neural network.
In summary, the discussion says that the choice of the number of hidden layers must be preceded by an understanding of the nature of the input and output variables of the model. If the data is linearly separable, you will be able to get good results with no hidden layers at all, but this will often not be the case, because neural networks are usually applied to more complex relationships between input and output. Beyond that, in most cases one hidden layer will be enough (read through the link for more detail).
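As a small illustration of that point, here is a minimal sketch (assuming scikit-learn, which the thread does not mention) contrasting a model with no hidden layer against one with a single hidden layer on data that is not linearly separable (the two-moons toy set):

```python
# Sketch: linearly inseparable data needs at least one hidden layer.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# No hidden layer: equivalent to a linear decision boundary.
linear = LogisticRegression().fit(X_tr, y_tr)

# One hidden layer with a handful of nodes captures the curvature.
mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)

print("no hidden layer:", linear.score(X_te, y_te))   # noticeably lower
print("1 hidden layer: ", mlp.score(X_te, y_te))      # close to 1.0
```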
Regarding the number of nodes in the hidden layer, it is common practice to use a number between the number of input nodes and the number of output nodes. There are a few recommended ways to optimize the number of hidden nodes; mainly, using separate training and validation sets and comparing the training error with the validation error for different configurations until the minimum validation error is achieved.
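A minimal sketch of that search procedure (again assuming scikit-learn; the candidate sizes are arbitrary choices for illustration):

```python
# Sketch: sweep hidden-layer sizes, keep the lowest validation error.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=2000, noise=0.25, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3,
                                            random_state=1)

best_size, best_err = None, float("inf")
for n_hidden in (2, 4, 8, 16, 32):            # candidate configurations
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                        random_state=1).fit(X_tr, y_tr)
    train_err = 1 - net.score(X_tr, y_tr)
    val_err = 1 - net.score(X_val, y_val)
    print(f"{n_hidden:>3} nodes  train_err={train_err:.3f}  "
          f"val_err={val_err:.3f}")
    if val_err < best_err:
        best_size, best_err = n_hidden, val_err

print("chosen hidden-layer size:", best_size)
```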
There are a couple of information-based methods to determine the number of neurons automatically using kernel learning; please see my recent publications. Regards, Dr. Song
In order to solve this problem it is necessary to clearly define its terms. What do we have at the input, and what do we expect at the output? What type of neural network are we focusing on: a feedforward network or a recurrent neural network? What should the memory capacity of the neural network be? In principle, from a fully connected (single- or double-layer) recurrent neural network we can form a network of any configuration. One approach to finding the optimal structure of a neural network is considered in the article: Osipov V. Yu., Optimization of Associative Intelligent Systems, Mehatronika, avtomatizacija, upravlenie, 03/2011.
Much of the formal theory concerning feedforward neural networks comes from the work of Kolmogorov and, later, from Cybenko's universal approximation theorem.
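For reference, a rough statement of Cybenko's result (paraphrased here; see the 1989 paper for the precise hypotheses):

```latex
% Universal approximation (Cybenko, 1989), roughly: finite sums of
% sigmoidal ridge functions are dense in C([0,1]^n).
\forall f \in C([0,1]^n),\ \forall \varepsilon > 0,\
\exists N \in \mathbb{N},\ \alpha_j, b_j \in \mathbb{R},\
w_j \in \mathbb{R}^n \ \text{such that}
\sup_{x \in [0,1]^n}
\Bigl|\, f(x) - \sum_{j=1}^{N} \alpha_j\,
\sigma\bigl(w_j^{\top} x + b_j\bigr) \Bigr| < \varepsilon,
```

where sigma is any continuous sigmoidal activation function.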
In practice, a single hidden layer is sufficient to approximate any continuous function, and two layers are sufficient for any computable function (continuous or not). Deep learning methods and convolutional networks have shown us that additional hidden layers can generate more useful features within the layers of the network. Further to this, internal architectures such as receptive fields from one layer to the next provide a useful inductive bias (building in invariance to scale, rotation, and translation) for problems such as object classification in images.
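One way to see why receptive fields are such a strong inductive bias is a parameter count. A minimal sketch (assuming PyTorch, not mentioned in the thread): a 3x3 convolution shares its weights across the whole image, so it needs far fewer parameters than a dense layer producing the same number of outputs.

```python
# Sketch: local receptive fields vs. a fully connected layer.
import torch.nn as nn

# 8 filters of size 3x3 on a 1-channel input: 8*(3*3) + 8 = 80 params.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3)

# A dense layer producing the same 8 x 26 x 26 output map from a
# 28 x 28 input needs a weight for every input-output pair.
dense = nn.Linear(28 * 28, 8 * 26 * 26)

print(sum(p.numel() for p in conv.parameters()))   # 80
print(sum(p.numel() for p in dense.parameters()))  # ~4.2 million
```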
You can never use too many hidden nodes, provided you ensure the model is not over-fitted to the training data (by using a validation set, weight decay, or some other form of regularization). The only cost is the increased training time.
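A minimal sketch of that idea (scikit-learn assumed; the sizes and penalty are arbitrary illustration values): a deliberately oversized hidden layer kept in check by weight decay and early stopping on a held-out validation split.

```python
# Sketch: a large hidden layer regularized rather than shrunk.
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=2000, noise=0.25, random_state=2)

big_net = MLPClassifier(hidden_layer_sizes=(500,),  # far more nodes than needed
                        alpha=1e-2,                 # weight decay (L2 penalty)
                        early_stopping=True,        # stop when validation stalls
                        validation_fraction=0.2,
                        max_iter=2000,
                        random_state=2).fit(X, y)
print("training accuracy:", big_net.score(X, y))
```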
There are basically two different approaches: controlling the number of layers and controlling the number of neurons. For a recurrent single hidden layer, please see my Jordan network paper. For the kernel method, the hidden layer has infinite dimension, and the number of output-layer nodes can be sparsified; see my linear kernel papers. Both are uploaded on ResearchGate.
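To illustrate the general kernel view only (this is not the specific method of the papers cited above), here is a minimal sketch using scikit-learn's kernel ridge regression: an RBF kernel corresponds to an implicit, infinite-dimensional feature (hidden) layer, so no hidden-layer size has to be chosen at all.

```python
# Sketch: the kernel trick replaces an explicit hidden layer.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

# RBF kernel = implicit infinite-dimensional feature map.
model = KernelRidge(kernel="rbf", alpha=0.1, gamma=0.5).fit(X, y)
print("fit quality (R^2):", model.score(X, y))
```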
Your question is one of the first kinds of questions asked about neural networks when they began to be actively investigated in the late 1980s and early 1990s. You will find good answers by going back to papers from that timeframe and by reading some classic books on neural networks.
One of the most classic papers in the field is Gorman and Sejnowski's work:
- Gorman, R. P., & Sejnowski, T. J. (1988). Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets. Neural Networks, 1, 75-89. (PDF: http://papers.cnl.salk.edu/PDFs/Analysis%20of%20Hidden%20Units%20in%20a%20Layered%20Network%20Trained%20to%20Classify%20Sonar%20Targets%201988-2996.pdf)
If you can get hold of my earlier book, "Handbook of Neural Computing Applications" (Academic, 1990), take a look at the REFERENCES at the ends of Chapters 7 & 8 (Multilayer Feed-forward Networks, Parts I & II) and 15 (Configuring and Optimizing the Back-Propagation Network).
While this is going back in time a bit, you'll find that researchers from this timeframe were very interested in this question, and there are a lot of good papers from this era that will be useful for you.