Hi all,
One of the most common techniques for addressing the vanishing gradient problem in deep neural networks is to use skip connection (shortcut) schemes, which allow gradients to be backpropagated directly to earlier layers of a deep network. Do you think it is worthwhile to investigate whether skip connection modules could also be useful in shallow neural networks?
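To make the question concrete, here is a minimal sketch of what I have in mind (assuming PyTorch; the framework, layer sizes, and module names are just placeholders, not part of any specific paper):

```python
# A minimal sketch of a shallow MLP whose hidden block has an additive skip connection.
import torch
import torch.nn as nn


class ShallowSkipMLP(nn.Module):
    def __init__(self, in_dim=32, hidden_dim=32, out_dim=10):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        h = torch.relu(self.fc1(x))
        # Skip connection: add the block's input back to its output,
        # giving the gradient a direct path around fc2.
        h = torch.relu(self.fc2(h) + h)
        return self.out(h)


# Example usage with dummy data
model = ShallowSkipMLP()
x = torch.randn(8, 32)
print(model(x).shape)  # torch.Size([8, 10])
```

With only two hidden layers the gradient path is already short, so I am curious whether such a shortcut would still offer any optimization or accuracy benefit.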
Thanks so much.