Let's assume we have a standard feedforward ANN with just a single hidden layer. It is standard practice to normalize the input data, usually to [0,1] or [-1,1]. Let's assume min-max normalization. If we have a sigmoid activation function, wouldn't it be more sensible to normalize to a range like [-4,4] or [-5,5]? The sigmoid is essentially linear on roughly [-2,2], so if we normalize to [-1,1] the network operates mostly in that near-linear region and the approximated function is linear for the most part. One might argue that, for certain weights, the input to the sigmoid can still fall outside the normalized range, but that is generally the exception (and depends on what values the weights take).
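
To illustrate the near-linearity claim, here is a minimal sketch (Python/NumPy, with the two candidate ranges above; the tangent line at 0 has slope 0.25 and intercept 0.5):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Compare the sigmoid to its tangent line at 0 over the two candidate input ranges.
for lo, hi in [(-1.0, 1.0), (-4.0, 4.0)]:
    z = np.linspace(lo, hi, 201)
    linear_approx = 0.25 * z + 0.5
    max_dev = np.max(np.abs(sigmoid(z) - linear_approx))
    print(f"range [{lo}, {hi}]: max deviation from linearity = {max_dev:.3f}")

# On [-1,1] the deviation stays around 0.02, i.e. the unit behaves almost linearly;
# on [-4,4] it grows to about 0.5, so the nonlinearity is actually exercised.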

As for how to initialize the weights: a common rule is to draw them uniformly from the range [-b,b], where b = 1 / sqrt(N_input + N_hidden) (assuming a sigmoid activation function).
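
A sketch of that initialization rule (the layer sizes here are hypothetical; NumPy assumed):

import numpy as np

n_input, n_hidden, n_output = 10, 5, 1      # hypothetical layer sizes
b = 1.0 / np.sqrt(n_input + n_hidden)       # the bound discussed above

# Input-to-hidden and hidden-to-output weights drawn uniformly from [-b, b]
W_ih = np.random.uniform(-b, b, size=(n_hidden, n_input))
W_ho = np.random.uniform(-b, b, size=(n_output, n_hidden))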

It is usually said that small weights are preferable (see also regularized error functions that penalize large weights), since large weights are more likely to lead to overfitting.
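
As one concrete example of such a penalty, an L2 (weight decay) term can be added to the data error; this is only a sketch, with lam as a free hyperparameter:

import numpy as np

def regularized_error(data_error, weights, lam=0.01):
    # L2 penalty: large weights increase the total error, which pushes
    # training toward smaller weights and smoother, less overfit functions.
    return data_error + 0.5 * lam * sum(np.sum(W**2) for W in weights)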

Any thoughts?
