For example, if we consider a one-hidden-layer MLP, the number of neurons in the hidden layer, along with the weights and biases, can be defined as the design variables of an optimization problem. Any evolutionary algorithm (GA, PSO, ...) can then be used to find the combination of variables that produces the least error.
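To make this concrete, here is a minimal sketch of that idea (not a reference implementation): a simple mutation-based evolutionary search in pure NumPy that treats the flattened weights and biases of a one-hidden-layer MLP as design variables and minimizes the mean squared error. For simplicity the hidden-layer size is swept in an outer loop rather than encoded in the genome, and the toy sine-fitting problem and function names (forward, fitness, evolve_mlp) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(params, n_in, n_hidden, X):
    """Unpack a flat parameter vector and run the one-hidden-layer MLP forward."""
    i = 0
    W1 = params[i:i + n_in * n_hidden].reshape(n_in, n_hidden); i += n_in * n_hidden
    b1 = params[i:i + n_hidden]; i += n_hidden
    W2 = params[i:i + n_hidden].reshape(n_hidden, 1); i += n_hidden
    b2 = params[i]
    h = np.tanh(X @ W1 + b1)      # hidden-layer transformation
    return h @ W2 + b2            # linear output

def fitness(params, n_in, n_hidden, X, y):
    """Mean squared error of a candidate network (lower is better)."""
    return np.mean((forward(params, n_in, n_hidden, X).ravel() - y) ** 2)

def evolve_mlp(X, y, n_hidden, generations=200, offspring=20, sigma=0.1):
    """Simple evolutionary hill climber over the flat weight/bias vector."""
    n_in = X.shape[1]
    n_params = n_in * n_hidden + n_hidden + n_hidden + 1
    best = rng.normal(0.0, 0.5, n_params)
    best_err = fitness(best, n_in, n_hidden, X, y)
    for _ in range(generations):
        for _ in range(offspring):
            child = best + rng.normal(0.0, sigma, n_params)   # Gaussian mutation
            err = fitness(child, n_in, n_hidden, X, y)
            if err < best_err:                                # keep the improvement
                best, best_err = child, err
    return best, best_err

# Toy regression problem: fit y = sin(x) and compare several hidden-layer sizes.
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()
errors = {n: evolve_mlp(X, y, n)[1] for n in (2, 4, 8, 16)}
print(errors)   # pick the hidden size with the lowest error
```

In practice you would replace the hill climber with a full GA or PSO and evaluate the error on a held-out set, but the encoding of the network as a flat design vector is the same.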
First, you have to decide the purpose of your ANN: will it be used for unsupervised, supervised or reinforcement learning? This will give you a good hint about the general topology to use.
In general, you can model most learning problems as a layered structure with inputs X on one side and outputs Y on the other, since your goal is to fit a function Y = f(X), with auto-encoding being the special case where Y == X.
In that sense you can see f() as a transformation of the input space occurring through a dimensionality reduction/expansion mechanism. As X is projected onto a hidden layer of neurons, it gets transformed into another representation, which is then used as the input to the next layer, and so on.
In this sense, the number of layers defines the number of successive transformations of the input space. The number of neurons in each layer depends on your problem and is usually tuned manually or explored systematically with optimization algorithms like the ones mentioned by Behrouz in the previous answer. However, you can understand this number as a constraint that you impose on the transformation of your data: the smaller the number, the stronger the compression. A fun way to understand this is to have an ANN compress an input space into a three-dimensional layer and then plot the activity of that layer when the network is presented with stimuli. You will see that the different classes are represented in 3D as different "interleaved threads", as shown in the picture below (the network was learning how to manipulate objects with different characteristics along an axis; check the full publication for more details).
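As an illustration of that kind of plot (this is not the code from the publication), here is a rough sketch that trains a network with a 3-neuron hidden layer, recovers the hidden-layer activations directly from the fitted weights, and scatter-plots them in 3D; the Iris dataset and the variable names are my own assumptions, chosen just to show the compressed representation.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Four inputs compressed through a 3-neuron hidden layer.
net = MLPClassifier(hidden_layer_sizes=(3,), activation="tanh",
                    max_iter=5000, random_state=0).fit(X, y)

# Hidden-layer activity: tanh(X W1 + b1), computed from the fitted weights.
H = np.tanh(X @ net.coefs_[0] + net.intercepts_[0])

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(H[:, 0], H[:, 1], H[:, 2], c=y)   # each class forms its own cluster/thread
ax.set_xlabel("hidden unit 1")
ax.set_ylabel("hidden unit 2")
ax.set_zlabel("hidden unit 3")
plt.show()
```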
The general principle is: the more you compress, the more generic the represented classes become, but the higher the reconstruction error.
Conference paper: Multiple Object Manipulation: is structural modularity neces...
Sorry, here is the picture. On the 3-neuron hidden layer, the input parameters (acceleration, speed, position) and the output parameter (force to apply) end up creating separate internal representations that correspond to the different objects. We can see here that the red and green objects share the same dynamics over some portion of the input space.
After a lot of trial and error, I think the best way to approach this is to generate a grid of, e.g., 100 different NN architectures by looping over a range of hidden neuron counts (e.g. 2:2:20) and training dataset proportions (e.g. 40%:5%:90%). To keep this manageable in terms of computing time, I suggest normalizing the inputs and outputs and applying principal component analysis to both. As a rule of thumb, choose a range of hidden neurons that lies between the number of principal-component inputs and the number of principal-component outputs. Then, for each NN, calculate the MSE on the validation dataset and use it to select the optimal architecture from the grid.
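A minimal sketch of this grid approach is below (the synthetic data, parameter ranges and names are illustrative assumptions, not the actual satellite problem): normalize and PCA-compress inputs and outputs, loop over hidden-layer sizes and training proportions, and keep the combination with the lowest validation MSE.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))                       # toy MIMO data: 12 inputs,
Y = np.column_stack([X[:, :4].sum(axis=1),           # 3 outputs
                     X[:, 4:8].prod(axis=1),
                     np.sin(X[:, 8:]).sum(axis=1)])

# Normalize, then compress both sides with PCA (keep 95% of the variance).
Xp = PCA(n_components=0.95).fit_transform(StandardScaler().fit_transform(X))
Yp = PCA(n_components=0.95).fit_transform(StandardScaler().fit_transform(Y))

best = None
for n_hidden in range(2, 21, 2):                     # 2:2:20 hidden neurons
    for train_frac in np.arange(0.40, 0.91, 0.05):   # 40%:5%:90% training data
        X_tr, X_val, Y_tr, Y_val = train_test_split(
            Xp, Yp, train_size=float(train_frac), random_state=0)
        net = MLPRegressor(hidden_layer_sizes=(n_hidden,),
                           max_iter=2000, random_state=0).fit(X_tr, Y_tr)
        mse = mean_squared_error(Y_val, net.predict(X_val))
        if best is None or mse < best[0]:
            best = (mse, n_hidden, float(train_frac))

print("best validation MSE %.4f with %d hidden neurons and %.0f%% training data"
      % (best[0], best[1], 100 * best[2]))
```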
I recently applied this to a multiple-input multiple-output (MIMO) problem, converting satellite measurements to target aerosol microphysical and optical parameters, with moderate success.
There isn't any rule for choosing an ANN topology; we have to use a step-by-step, one-by-one methodology to obtain an optimal solution, but it takes a lot of time and effort.