As you increase the model complexity (number of hidden layers), you can reduce the training error to practically zero. However, if the model is too complex, the network may overfit the data, resulting in poor generalization (poor predictions on unseen data). So you should increase model complexity only as much as needed to avoid overfitting, or use regularization methods such as L1/L2 regularization, dropout, or early stopping.
In summary, increasing the model complexity improves prediction capabilities at first, but beyond a certain point the network starts to overfit and prediction quality degrades.
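For illustration, here is a minimal sketch of the three regularization options mentioned above (L2 weight penalty, dropout, early stopping), assuming TensorFlow/Keras and a made-up regression problem with 10 input features; the layer sizes and hyperparameters are placeholders, not recommendations:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # L2 penalty on weights
    tf.keras.layers.Dropout(0.3),   # randomly zero 30% of activations during training
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Early stopping: halt training when validation loss stops improving.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

# Dummy data just so the sketch runs end to end.
X_train = np.random.rand(500, 10).astype("float32")
y_train = np.random.rand(500, 1).astype("float32")

model.fit(X_train, y_train, validation_split=0.2, epochs=200,
          callbacks=[early_stop], verbose=0)
```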
The ratio between the number of training samples and the number of weights in the network is important. You should have at least twice as many training samples as weights, and the higher the ratio the better. If the ratio is too low, the network cannot reliably learn the relationship between inputs and outputs.
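As a toy illustration of that rule of thumb, here is a short snippet that counts the weights of a fully connected network; the layer sizes are invented for the example:

```python
def count_weights(layer_sizes):
    """Total trainable parameters (weights + biases) of a dense network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

layers = [10, 8, 1]          # 10 inputs, one hidden layer of 8, 1 output
w = count_weights(layers)    # 10*8 + 8 + 8*1 + 1 = 97
print(f"{w} weights -> want at least {2 * w} training samples")
```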
To avoid overfitting, the data should be divided into training (60-70% of all cases), validation, and test sets. When the MSE on the validation set starts to grow, that is the signal to stop training the ANN. As you can see in the attached figure, the training error (blue line) is still going down, but after the 19th epoch the validation error rises, which indicates overfitting.
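For concreteness, here is a minimal sketch of that 60/20/20 split with validation-based stopping, assuming scikit-learn; the data is synthetic and the patience value is arbitrary (note that calling `fit` repeatedly with `max_iter=1` trains one iteration per call but will emit convergence warnings):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                              # placeholder data
y = X @ rng.normal(size=10) + rng.normal(scale=0.1, size=1000)

# 60% train, 20% validation, 20% test
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=1, warm_start=True)
best_mse, patience, bad_epochs = np.inf, 5, 0
for epoch in range(200):
    model.fit(X_train, y_train)          # one pass per call (max_iter=1, warm_start=True)
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    if val_mse < best_mse:
        best_mse, bad_epochs = val_mse, 0
    else:
        bad_epochs += 1                  # validation MSE grew this epoch
    if bad_epochs >= patience:
        print(f"Stopping at epoch {epoch}: validation MSE is rising")
        break

print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```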
Thank you so much for your generous input. I am kind of new to neural networks and I am trying to learn more deeply. This is definitely a great platform for me to learn from all the experts!