In short, it is not about which distribution you need to feed in, but about how easy it is to make the network learn. The normal distribution is simply one convenient choice.
When training deep networks, we generally find that the closer the input data is to a normal distribution, the easier and more stable training becomes.
What is really at work here is data standardization: a feature-processing step that rescales each feature as x' = (x − μ) / σ, so that it has zero mean and unit variance, matching the first two moments of the standard normal distribution N(0, 1). (Note that this does not make the data actually normal; it only fixes its mean and variance.)
For example, if one of your features is around 10,000 and another is around 0.2, feeding both into almost any learning model makes learning very slow: the gradients for the two weights differ by orders of magnitude, so no single learning rate suits both. If the two features are around 0.1 and 0.3 instead, learning converges much faster.
As for the data itself, there is no hard requirement that it follow a normal distribution, as long as its distribution is not pathological. But to speed up convergence, standardization is worth adopting.
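Here is a minimal sketch of this kind of standardization; the feature values are made up to mirror the 10,000-vs-0.2 example above:

```python
import numpy as np

# Hypothetical feature matrix: column 0 lives around 10,000, column 1
# around 0.2, mimicking the scale mismatch described above.
X = np.array([[10_000.0, 0.20],
              [ 9_500.0, 0.25],
              [10_500.0, 0.15],
              [10_200.0, 0.22]])

# Z-score standardization: subtract the per-feature mean and divide by
# the per-feature standard deviation, so each column ends up with
# mean 0 and variance 1.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_std = (X - mean) / std

print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # ~[1, 1]
```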
There is also a technique in deep neural networks called Batch Normalization, which normalizes the activations at every layer. This helps avoid vanishing gradients and makes each layer less sensitive to shifts in the distribution of the previous layer's outputs.
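As a rough NumPy sketch of what one batch-normalization step computes for a layer's activations (gamma and beta stand in for the learnable scale and shift; in a real network they are trained parameters):

```python
import numpy as np

def batch_norm(h, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of activations h (shape: batch x units) to zero
    mean and unit variance per unit, then apply the learnable scale
    (gamma) and shift (beta). The defaults here are placeholders."""
    mu = h.mean(axis=0)
    var = h.var(axis=0)
    h_hat = (h - mu) / np.sqrt(var + eps)
    return gamma * h_hat + beta

# Activations of a hypothetical hidden layer: a batch of 4 samples, 3 units.
h = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(4, 3))
out = batch_norm(h)
print(out.mean(axis=0))  # ~[0, 0, 0]
print(out.std(axis=0))   # ~[1, 1, 1]
```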
This is because one of the best practices for training a neural network is to normalize your data so that it has a mean close to 0 (and variance close to 1). Normalizing your data generally speeds up learning and leads to faster convergence.
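To see the effect, here is a small self-contained experiment (all values are made up for illustration) comparing plain gradient descent on raw versus standardized features:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data with the scale mismatch from the earlier answer:
# one feature near 10,000, the other near 0.2.
X = np.column_stack([rng.normal(10_000, 500, size=200),
                     rng.normal(0.2, 0.05, size=200)])
y = X @ np.array([0.001, 50.0]) + rng.normal(0, 0.1, size=200)
y = y - y.mean()  # center the target so no intercept term is needed

def final_loss(X, y, lr, steps=500):
    """Plain batch gradient descent on mean squared error; returns the final loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * 2 * X.T @ (X @ w - y) / len(y)
    return float(np.mean((X @ w - y) ** 2))

X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Raw features force a tiny learning rate (larger ones diverge), so the
# small-scale feature's weight barely moves; standardized features tolerate
# a moderate rate and get close to the optimum within a few hundred steps.
print(final_loss(X, y, lr=1e-9))
print(final_loss(X_std, y, lr=0.1))
```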
Because those methods are based on parametric statistics. I recommend using nonparametric or machine-learned models instead; then the normality assumption does not need to be checked.