I have 18 input features for a prediction network. How many hidden layers should I use, and how many nodes should be in those hidden layers? Is there any formula for deciding this, or is it trial and error?
You need to use cross-validation to test the accuracy on the test set. The optimal number of hidden units could easily be smaller than the number of inputs; there is no rule like "multiply the number of inputs by N". If you have a lot of training examples, you can use many hidden units, but sometimes just 2 hidden units work best with little data. Usually people use one hidden layer for simple tasks, but nowadays research in deep neural network architectures shows that many hidden layers can be fruitful for difficult object-, handwritten-character-, and face-recognition problems.
The introduction of hidden layer(s) makes it possible for the network to exhibit non-linear behavior. I do not know the nature of your problem. You also have to decide whether you expect your network to learn your training set to perfection, or whether you are content with, e.g., 95% performance. In order to secure the network's ability to generalize, the number of nodes has to be kept as low as possible. If you have a large excess of nodes, your network becomes a memory bank that can recall the training set to perfection, but does not perform well on samples that were not part of the training set.
Generally 2 hidden layers will enable the network to model any arbitrary function. Check out this URL:
http://www.heatonresearch.com/node/707
But you may want to optimise the number of layers, nodes, etc. Network growth and pruning algorithms have been around for a long time. You can also try using a genetic algorithm to define the network structure; a rough sketch follows.
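As a hedged illustration only (not a specific published method), here is a stripped-down genetic-style search over topologies, using selection and mutation but no crossover. It assumes scikit-learn; the population size, mutation step, and search ranges are arbitrary choices:

```python
# A rough sketch of evolving a network topology with a genetic-style search.
# Fitness = validation accuracy of an MLP with the candidate hidden layers.
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = random.Random(0)
X, y = make_classification(n_samples=500, n_features=18, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

def random_genome():
    # Genome = tuple of hidden-layer sizes, e.g. (12, 5).
    return tuple(rng.randint(2, 30) for _ in range(rng.randint(1, 3)))

def fitness(genome):
    net = MLPClassifier(hidden_layer_sizes=genome, max_iter=2000,
                        random_state=0).fit(X_tr, y_tr)
    return net.score(X_val, y_val)

def mutate(genome):
    # Perturb one layer's size by a small random amount.
    g = list(genome)
    i = rng.randrange(len(g))
    g[i] = max(2, g[i] + rng.randint(-4, 4))
    return tuple(g)

population = [random_genome() for _ in range(6)]
for generation in range(5):
    ranked = sorted(population, key=fitness, reverse=True)
    parents = ranked[:3]                              # selection: keep the fittest
    children = [mutate(rng.choice(parents)) for _ in range(3)]
    population = parents + children                   # next generation
print("best topology found:", max(population, key=fitness))
```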
@David: Can you please explain in detail how to define a network using a genetic algorithm (or give any book references)? I have a little knowledge of genetic algorithms.
I agree with Wiering: there is no rule of thumb to find out how many hidden layers you need. In many cases one hidden layer works well, but in order to justify this for a specific problem, you have to apply a heuristic method such as cross-validation. Using cross-validation, you divide your data into two parts, namely a training set and a validation set (also called a test set).
You use the training set for training your network, and the validation set to identify how well your neural network performed. To do this you need to predict the labels of your validation set.
In order to minimize the effect of sampling, you do this more than once. For example, using five-fold cross-validation you do it five times, then look into the results and take an average. By results I mean one or more performance measurements, such as specificity, sensitivity, MCC, misclassification rate, ...
This is a commonly used method to answer questions such as:
How many hidden layers do I need?
What is the best learning rate?
..........
I know that there is a very good implementation of cross-validation and neural networks in R, in the package called CMA. But if you implement your own brand-new neural network, it is a good idea to also implement some kind of cross-validation routine.
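For illustration only, here is a minimal sketch of this procedure in Python (assuming scikit-learn rather than R's CMA); the candidate hidden-unit counts are arbitrary, not a recommendation:

```python
# Five-fold cross-validation to compare hidden-layer sizes.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=18, random_state=0)

for n_hidden in (2, 5, 10, 18, 30):
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                        random_state=0)
    scores = cross_val_score(net, X, y, cv=5)  # 5-fold cross-validation
    print(f"{n_hidden:>2} hidden units: mean accuracy = {scores.mean():.3f}")
```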
How many hidden layers should I use? : http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-9.html (mirror: http://francky.me/aifaq/FAQ-comp.ai.neural-net.pdf)
How many hidden units should I use? : http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-10.html (mirror: http://francky.me/aifaq/FAQ-comp.ai.neural-net.pdf)
What is genetic algorithm? : https://www.researchgate.net/post/What_is_genetic_algorithm1
Before you start implementing genetic algorithms to optimize the topology of your neural net, you should first find out whether a neural network is appropriate for solving your problem. You mention that you have a prediction problem with 18 inputs. I recommend starting with a simulation tool such as RapidMiner and designing an experiment that compares the generalization performance (averaged test error) of several algorithms. Start with weak learners (linear regression, or logistic regression in case you have classification problems); then you can proceed to experiment with neural nets of increasing capacity.
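As a hedged sketch of that experiment (in Python with scikit-learn instead of RapidMiner; the models and sizes are illustrative):

```python
# Compare a weak learner against MLPs of increasing capacity
# on the same cross-validation folds.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=18, random_state=1)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "MLP (5 units)":  MLPClassifier(hidden_layer_sizes=(5,),  max_iter=2000, random_state=1),
    "MLP (20 units)": MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=1),
}
for name, model in models.items():
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```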
I think it depends on the number of features (neurons in the input layer). A higher number of hidden layers increases the order of the weights, and that helps to produce a higher-order decision boundary.
A NN with N hidden layers can produce an (N+1)-order decision boundary.
Example: a perceptron without a hidden layer (N=0) can only draw a first-order (0+1=1) decision boundary.
A multi-layer perceptron with one hidden layer (N=1) is capable of drawing a second-order (1+1=2) or lower-order decision boundary.
So I believe an MLP with N hidden layers can surely solve your (N-1)-feature problem.
The upper bound on the number of hidden neurons that won't result in over-fitting is:
Nh = Ns / (α * (Ni + No))
Ni = number of input neurons.
No = number of output neurons.
Ns = number of samples in training data set.
α = an arbitrary scaling factor usually 2-10.
Others recommend setting α to a value between 5 and 10, but I find a value of 2 will often work without overfitting. As explained in the excellent NN Design text, you want to limit the number of free parameters in your model (its degree, or number of nonzero weights) to a small portion of the degrees of freedom in your data. The degrees of freedom in your data is the number of samples times the degrees of freedom (dimensions) of each sample, or Ns * (Ni + No) (assuming they're all independent). So α is a way to indicate how general you want your model to be, or how much you want to prevent overfitting.
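A minimal sketch of this rule of thumb, where the sample count and the single output are hypothetical values chosen only to match the asker's 18 inputs:

```python
def max_hidden_neurons(n_samples, n_inputs, n_outputs, alpha=2):
    """Upper bound on hidden neurons to help avoid over-fitting:
    Nh = Ns / (alpha * (Ni + No)). alpha=2 is just the value the
    poster suggests trying first."""
    return n_samples / (alpha * (n_inputs + n_outputs))

# Hypothetical example: 1000 training samples, 18 inputs, 1 output.
print(max_hidden_neurons(1000, 18, 1))  # ≈ 26.3 hidden neurons at most
```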
There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers, such as the following:
1. The number of hidden neurons should be between the size of the input layer and the size of the output layer.
2. The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
3. The number of hidden neurons should be less than twice the size of the input layer.
These three rules provide a starting point for you to consider. Ultimately, the selection of an architecture for your neural network will come down to trial and error. But what exactly is meant by trial and error? You do not want to start throwing random numbers of layers and neurons at your network. To do so would be very time consuming. Chapter 8, “Pruning a Neural Network” will explore various ways to determine an optimal structure for a neural network.
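As a hedged illustration, here is what the three rules give for the asker's 18 inputs, assuming (hypothetically) a single output:

```python
# The three rules of thumb above, evaluated for 18 inputs and 1 output.
n_in, n_out = 18, 1
print("rule 1: between", min(n_in, n_out), "and", max(n_in, n_out))  # 1..18
print("rule 2:", round(2 / 3 * n_in + n_out))                        # 13
print("rule 3: fewer than", 2 * n_in)                                # < 36
```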
I think normalization changes the input vector into another one, so what guarantees that the produced ANN will map all inputs to their desired outputs?
I think we should make a further adjustment, because normalization is not a feature of the ANN itself but a transformation of the input vectors.
I think the produced ANN will map a normalized input to the desired output only if it is an element of the training set.
The size of the hidden layer is normally between the size of the input layer and that of the output layer. It should be about 2/3 the size of the input layer plus the size of the output layer, and the number of hidden neurons should be less than twice the size of the input layer.
Just trial and error, because the performance of deep neural networks depends on the structure of the data.
Or you can find the number of layers and neurons by using a global optimization algorithm, such as particle swarm optimization, simulated annealing, pattern search, Bayesian optimization, etc., to minimize the validation error (see the sketch below, after the third method).
A third method is a type of rule of thumb called the geometric pyramid rule, which can determine the number of neurons in each hidden layer.
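As a hedged stand-in for those global optimizers (a plain random search, much simpler than PSO or simulated annealing, but minimizing the same validation error), assuming scikit-learn; the last lines also show the common one-hidden-layer form of the geometric pyramid rule, Nh = sqrt(Ni * No), with a single output assumed:

```python
# Random search over topologies; search space and the budget of 20 trials
# are illustrative only.
import math
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = random.Random(2)
X, y = make_classification(n_samples=600, n_features=18, random_state=2)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=2)

best_layers, best_err = None, 1.0
for _ in range(20):
    layers = tuple(rng.randint(2, 30) for _ in range(rng.randint(1, 3)))
    net = MLPClassifier(hidden_layer_sizes=layers, max_iter=2000,
                        random_state=2).fit(X_tr, y_tr)
    err = 1.0 - net.score(X_val, y_val)  # validation error to minimize
    if err < best_err:
        best_layers, best_err = layers, err
print("best topology:", best_layers, "validation error:", round(best_err, 3))

# Geometric pyramid rule (one hidden layer): Nh = sqrt(Ni * No).
print("pyramid rule suggests:", round(math.sqrt(18 * 1)), "neurons")
```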
" From Introduction to Neural Networks for Java (second edition) by Jeff Heaton - preview freely available at Google Books and previously at author's website:
The Number of Hidden Layers

There are really two decisions that must be made regarding the hidden layers: how many hidden layers to actually have in the neural network and how many neurons will be in each of these layers. We will first examine how to determine the number of hidden layers to use with the neural network. Problems that require two hidden layers are rarely encountered. However, neural networks with two hidden layers can represent functions with any kind of shape. There is currently no theoretical reason to use neural networks with any more than two hidden layers. In fact, for many practical problems, there is no reason to use any more than one hidden layer.
The Number of Neurons in the Hidden Layers

Deciding the number of neurons in the hidden layers is a very important part of deciding your overall neural network architecture. Though these layers do not directly interact with the external environment, they have a tremendous influence on the final output. Both the number of hidden layers and the number of neurons in each of these hidden layers must be carefully considered.

Using too few neurons in the hidden layers will result in something called underfitting. Underfitting occurs when there are too few neurons in the hidden layers to adequately detect the signals in a complicated data set.

Using too many neurons in the hidden layers can result in several problems. First, too many neurons in the hidden layers may result in overfitting. Overfitting occurs when the neural network has so much information processing capacity that the limited amount of information contained in the training set is not enough to train all of the neurons in the hidden layers. A second problem can occur even when the training data is sufficient. An inordinately large number of neurons in the hidden layers can increase the time it takes to train the network. The amount of training time can increase to the point that it is impossible to adequately train the neural network. Obviously, some compromise must be reached between too many and too few neurons in the hidden layers.

There are many rule-of-thumb methods for determining the correct number of neurons to use in the hidden layers, such as the following:

1. The number of hidden neurons should be between the size of the input layer and the size of the output layer.
2. The number of hidden neurons should be 2/3 the size of the input layer, plus the size of the output layer.
3. The number of hidden neurons should be less than twice the size of the input layer.

These three rules provide a starting point for you to consider. Ultimately, the selection of an architecture for your neural network will come down to trial and error. But what exactly is meant by trial and error? You do not want to start throwing random numbers of layers and neurons at your network. To do so would be very time consuming. Chapter 8, “Pruning a Neural Network” will explore various ways to determine an optimal structure for a neural network."
-- jj
I quoted from this link, so you can follow it for a better understanding:
Flores, Juan J., Mario Graff, and Hector Rodriguez. "Evolutive design of ARMA and ANN models for time series forecasting." Renewable Energy 44 (2012): 225-230.
A few years back it was believed that one layer offered modeling capabilities equivalent to any number of layers. That is true in principle, but the required number of neurons grows rapidly with a single layer. The trend nowadays is to design ANNs with more hidden layers (hence deep ANNs).
The number of neurons in the hidden layer is selected based on the following formula: (no. of inputs + no. of outputs)^0.5 + (1 to 10). To fix the constant value (the last part, 1 to 10), use trial and error and find the optimal number of hidden-layer neurons for the minimum Mean Square Error.
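A minimal sketch of this formula, sweeping the constant from 1 to 10; the values shown assume the asker's 18 inputs and, hypothetically, a single output:

```python
# Candidate hidden-neuron counts from sqrt(Ni + No) + c for c in 1..10;
# each would then be evaluated for validation MSE.
import math

n_in, n_out = 18, 1
for c in range(1, 11):
    n_hidden = round(math.sqrt(n_in + n_out) + c)
    print(f"c = {c:>2}: {n_hidden} hidden neurons")
```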
@Ines Abdeljaoued, do you have a reference for that? Did you mean this paper: Sheela, K. G., & Deepa, S. N. (2013). Review on methods to fix number of hidden neurons in neural networks. Mathematical Problems in Engineering, 2013.
Alternatively, you can implement a nested for-loop: one over the number of hidden layers and the other over the number of nodes in each layer. Then record the error for each topology and select the one that has the minimum error.
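A hedged sketch of that nested-loop (grid) search, assuming scikit-learn; the layer and node ranges are illustrative only:

```python
# Exhaustive grid over (number of hidden layers) x (nodes per layer),
# keeping the topology with the lowest validation error.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=18, random_state=3)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=3)

results = {}
for n_layers in (1, 2, 3):          # outer loop: number of hidden layers
    for n_nodes in (5, 10, 20):     # inner loop: nodes in each layer
        layers = (n_nodes,) * n_layers
        net = MLPClassifier(hidden_layer_sizes=layers, max_iter=2000,
                            random_state=3).fit(X_tr, y_tr)
        results[layers] = 1.0 - net.score(X_val, y_val)  # validation error

best = min(results, key=results.get)
print("best topology:", best, "validation error:", round(results[best], 3))
```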