"Residual connections are the same thing as 'skip connections'. They are used to allow gradients to flow through a network directly, without passing through non-linear activation functions. Non-linear activation functions, by nature of being non-linear, cause the gradients to explode or vanish (depending on the weights). "
Hi Junayed, thank you for the answer. Just as a follow-up question: aren't residual connections supposed to increase accuracy? In my case the exact opposite is happening.
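To make the quoted point about gradient flow concrete, here is a minimal, self-contained sketch (my own toy example with made-up sizes, not the asker's network): the `+ x` skip term gives the gradient a path back to the input that does not pass through the layer's weights.

```python
import torch
import torch.nn as nn

# Toy illustration (not the asker's model): a single residual step,
# with the activation applied after the addition, as in the original ResNet.
layer = nn.Linear(8, 8)    # stands in for the learned transformation F(x)
act = nn.ReLU()

x = torch.randn(1, 8, requires_grad=True)
out = act(layer(x) + x)    # residual form: F(x) + x, then the non-linearity
out.sum().backward()

# Because of the "+ x" skip path, x.grad contains a direct identity
# contribution in addition to whatever flowed back through layer's weights.
print(x.grad)
```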
Can I ask why you identified a residual CNN as the most appropriate architecture for your use case? Generally speaking, ResNets evolved from simpler architectures like VGGNet with the intent of being able to "go deeper", i.e. use more layers, while avoiding the vanishing-gradient problem that comes with depth by letting gradients bypass layers through the residual (skip) connections.
In this context, a 10-layer architecture seems too shallow for residual connections to add much benefit.
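For reference, this is roughly what a residual block looks like in code. It is a hedged sketch only: the class name `ResidualConvBlock`, the channel count, and the number of stacked blocks are made up for illustration, and it assumes the channel dimension is unchanged so `x` can be added back directly. The point is that the residual formulation pays off when many such blocks are stacked; with only a handful of layers, a plain CNN usually trains just as well.

```python
import torch
import torch.nn as nn

# Illustrative sketch of how residual connections are wired in a small CNN.
# Assumes the number of channels stays the same, so x can be added directly.
class ResidualConvBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        out = self.act(self.conv1(x))
        out = self.conv2(out)
        # The convolutions learn a residual F(x); the skip connection adds x
        # back, which is what allows very deep stacks of such blocks to train.
        return self.act(out + x)

# Stacking several blocks; the depth is where residual connections matter.
net = nn.Sequential(*[ResidualConvBlock(16) for _ in range(4)])
y = net(torch.randn(1, 16, 32, 32))
```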