For example, if we consider a one-hidden-layer MLP, the number of neurons in the hidden layer, together with the weights and biases, can be defined as the design variables of an optimization problem. Any evolutionary algorithm (GA, PSO, ...) can then be used to find the combination of variables that produces the least error.
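A minimal sketch of that idea (not a specific published setup): a plain genetic algorithm in Python/NumPy that evolves both the hidden-layer size and the weights/biases of a one-hidden-layer MLP so as to minimise the mean squared error on a toy regression task. The encoding, population size and mutation rates are illustrative choices; a GA library or a PSO implementation would work just as well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = sin(x).
X = np.linspace(-3, 3, 200).reshape(-1, 1)
Y = np.sin(X)

MAX_HIDDEN = 16

def random_individual():
    """One candidate: hidden-layer size plus weights/biases of a 1-hidden-layer MLP."""
    h = int(rng.integers(1, MAX_HIDDEN + 1))
    return {
        "h": h,
        "W1": rng.normal(0.0, 1.0, (1, h)), "b1": rng.normal(0.0, 1.0, h),
        "W2": rng.normal(0.0, 1.0, (h, 1)), "b2": rng.normal(0.0, 1.0, 1),
    }

def predict(ind, X):
    hidden = np.tanh(X @ ind["W1"] + ind["b1"])
    return hidden @ ind["W2"] + ind["b2"]

def fitness(ind):
    # Negative mean squared error: higher is better.
    return -float(np.mean((predict(ind, X) - Y) ** 2))

def mutate(ind, sigma=0.1, p_resample=0.1):
    # Occasionally resample the whole architecture (new hidden size);
    # otherwise just perturb the existing weights and biases.
    if rng.random() < p_resample:
        return random_individual()
    child = {"h": ind["h"]}
    for k in ("W1", "b1", "W2", "b2"):
        child[k] = ind[k] + rng.normal(0.0, sigma, ind[k].shape)
    return child

# Plain (mu + lambda) evolution: keep the best half, refill with mutated copies.
pop = [random_individual() for _ in range(40)]
for generation in range(200):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:20]
    pop = parents + [mutate(parents[int(rng.integers(len(parents)))]) for _ in range(20)]

best = max(pop, key=fitness)
print("best hidden size:", best["h"], " MSE:", -fitness(best))
```

Note that in this sketch the hidden-layer size only changes through the occasional "resample" mutation and through selection; a more serious GA/PSO setup would use a proper variable-length encoding and a crossover operator.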
First you have to decide the purpose of your ANN: will it be used for unsupervised, supervised or reinforcement learning? This will give you a good hint about the general topology to use.
In general you can model most learning problems as a layered structure with inputs X on one side and outputs Y on the other, since your goal is to fit a function Y = f(X), with the special case of auto-encoding when Y = X.
In that sense you can see f() as a transformation of space occurring through a dimensionality reduction/expansion mechanism. As X is projected onto a hidden layer of neurons, it is transformed into another representation of the space, which is then used as the input of the next layer, and so on.
In this sense, the number of layers defines the number of successive transformations of the input space. The number of neurons in each layer depends on your problem and is usually tuned manually or explored systematically with optimization algorithms like the ones mentioned by Behrouz in the previous answer. However, you can understand this number as a constraint that you impose on the transformation of your data: the smaller the number, the stronger the compression. A fun way to understand this is to have an ANN compress an input space into a 3-dimensional layer and then plot the activity of this layer when the network is presented with stimuli. You will see that the different classes are represented in 3D as different "interleaved threads", as shown in the picture below (the network was learning how to manipulate objects with different characteristics along an axis; check the full publication for more details).
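As a rough illustration of that visualisation trick (on assumed synthetic data, not the network or the task from the publication), here is a sketch that trains a one-hidden-layer auto-encoder with a 3-neuron bottleneck using scikit-learn's MLPRegressor and scatter-plots the hidden-layer activity in 3D, coloured by class:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data: 3 classes in a 10-dimensional input space.
X, labels = make_classification(n_samples=600, n_features=10, n_informative=6,
                                n_classes=3, n_clusters_per_class=1, random_state=0)

# Auto-encoder: one hidden layer of 3 tanh units, trained to reconstruct its input.
ae = MLPRegressor(hidden_layer_sizes=(3,), activation="tanh",
                  max_iter=3000, random_state=0)
ae.fit(X, X)

# Recompute the 3-D hidden representation from the learned weights.
hidden = np.tanh(X @ ae.coefs_[0] + ae.intercepts_[0])

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(hidden[:, 0], hidden[:, 1], hidden[:, 2], c=labels, cmap="viridis", s=10)
ax.set_title("Activity of the 3-neuron hidden layer, coloured by class")
plt.show()
```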
The general principle is: the more you compress, the more generic the represented classes become, but the higher the reconstruction error will be.
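A rough way to see that trade-off numerically (again on assumed synthetic data, reusing scikit-learn's MLPRegressor as the auto-encoder) is to sweep the bottleneck size and compare the reconstruction error, which generally grows as the hidden layer shrinks:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

X, _ = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

for h in (2, 3, 5, 8, 10):
    ae = MLPRegressor(hidden_layer_sizes=(h,), activation="tanh",
                      max_iter=3000, random_state=0)
    ae.fit(X, X)  # auto-encoding: the target is the input itself
    err = mean_squared_error(X, ae.predict(X))
    print(f"hidden neurons: {h:2d}  reconstruction MSE: {err:.4f}")
```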
Conference Paper Multiple Object Manipulation: is structural modularity neces...
Sorry, the picture is here. In the 3-neuron hidden layer, the input parameters (acceleration, speed, position) and the output parameters (force to apply) end up creating separate internal representations that correspond to the different objects. We can see here that the red and green objects share the same dynamics over some portion of the input space.