The module essentially acts as several convolution filters of different sizes, applied in parallel to the same input alongside a pooling operation, with the results then concatenated. This allows the model to take advantage of multi-level feature extraction: for instance, it extracts general (5x5) and local (1x1) features at the same time.
Using features from multiple filters improves the performance of the network. Beyond that, there is another property that makes the Inception architecture stand out. All architectures prior to Inception performed convolution over the spatial and channel-wise domains together. By performing a 1x1 convolution, the Inception block computes cross-channel correlations while ignoring the spatial dimensions. This is followed by cross-spatial and cross-channel correlations via the 3x3 and 5x5 filters.
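To make this concrete, here is a minimal PyTorch-style sketch of such a block. The branch channel counts are illustrative assumptions, not the exact GoogLeNet configuration:

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Minimal Inception-style block: parallel 1x1, 3x3, 5x5 and pooling
    branches whose outputs are concatenated along the channel axis.
    Channel counts are illustrative, not the GoogLeNet values."""
    def __init__(self, in_ch):
        super().__init__()
        # 1x1 branch: pure cross-channel correlation, no spatial context
        self.branch1x1 = nn.Conv2d(in_ch, 16, kernel_size=1)
        # 3x3 branch: 1x1 reduction first, then cross-spatial correlation
        self.branch3x3 = nn.Sequential(
            nn.Conv2d(in_ch, 16, kernel_size=1),
            nn.Conv2d(16, 24, kernel_size=3, padding=1),
        )
        # 5x5 branch: larger receptive field for more "general" features
        self.branch5x5 = nn.Sequential(
            nn.Conv2d(in_ch, 8, kernel_size=1),
            nn.Conv2d(8, 12, kernel_size=5, padding=2),
        )
        # pooling branch followed by a 1x1 projection
        self.branch_pool = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, 12, kernel_size=1),
        )

    def forward(self, x):
        # every branch keeps the spatial size, so outputs can be concatenated
        outs = [self.branch1x1(x), self.branch3x3(x),
                self.branch5x5(x), self.branch_pool(x)]
        return torch.cat(outs, dim=1)  # concatenate along channels

# e.g. an input of shape (1, 64, 28, 28) yields (1, 16+24+12+12, 28, 28)
```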
The Inception Module is based on a pattern recognition network which mimics the animal visual cortex. After being shown many example images, the network becomes attuned to small details, mid-sized features, or almost whole images if they come up very often. Each layer of the deep network reinforces the features it thinks are there and passes them on to the next. If it has been trained to recognize faces, for instance, the first layer detects edges, the second the overall shape, the third eyes, mouth and nose, the fourth the face, and the fifth the mood.
According to the universal approximation theorem, a feedforward network with a single hidden layer, given enough capacity, is sufficient to represent any function. However, that layer might have to be massive, and such a network is prone to overfitting the data. Hence the common trend in the research community: network architectures need to go deeper.
However, increasing network depth does not work by simply stacking layers together. Deep networks are hard to train because of the notorious vanishing gradient problem: as the gradient is back-propagated to earlier layers, repeated multiplication may make it vanishingly small. As a result, as the network goes deeper, its performance saturates or even starts degrading rapidly.
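As a toy illustration of this effect (the per-layer factor below is an assumed number, not measured from any real network), repeated multiplication alone is enough to shrink the signal by orders of magnitude:

```python
# Toy illustration: the gradient that reaches the first layer is a product
# of per-layer factors; if each factor is below 1, the product shrinks
# exponentially with depth.
depth = 50
per_layer_factor = 0.9   # assumed magnitude of each layer's local derivative
grad = 1.0
for _ in range(depth):
    grad *= per_layer_factor
print(grad)   # 0.9 ** 50 ~= 0.005: early layers barely receive a signal
```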
Before ResNet, there had been several ways to deal with the vanishing gradient issue, e.g. adding an auxiliary loss at a middle layer as extra supervision, but none seemed to really tackle the problem once and for all.
The core idea of ResNet is to introduce a so-called “identity shortcut connection” that skips one or more layers.
The authors of ResNet argue that stacking layers shouldn’t degrade network performance, because we could simply stack identity mappings (layers that do nothing) on top of the current network, and the resulting architecture would perform the same. This implies that a deeper model should not produce a training error higher than its shallower counterpart. They hypothesize that letting the stacked layers fit a residual mapping is easier than letting them directly fit the desired underlying mapping, and the residual block explicitly allows them to do precisely that.
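A minimal sketch of such a residual block in PyTorch, with illustrative layer sizes rather than the exact configuration from the paper, might look like this:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block sketch (layer sizes are illustrative).
    The stacked layers learn a residual F(x); the identity shortcut
    adds x back, so the block outputs F(x) + x."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        residual = self.relu(self.bn1(self.conv1(x)))
        residual = self.bn2(self.conv2(residual))
        # identity shortcut: if the weights drive F(x) toward zero,
        # the block degenerates to an identity mapping
        return self.relu(residual + x)
```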
DenseNet (Densely Connected Convolutional Networks) is one of the more recent neural networks for visual object recognition. It is quite similar to ResNet but has some fundamental differences.
For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
The ResNet architecture proposed the residual connection, from previous layers to the current one. Roughly speaking, the input to the present layer is obtained by summing the outputs of previous layers.
So, let’s imagine we have an image of shape (28, 28, 3). First, we expand the image to an initial 24 channels, obtaining a tensor of shape (28, 28, 24). Every subsequent convolution layer generates k = 12 feature maps and keeps the width and height the same. The output of layer Lᵢ will be (28, 28, 12), but the input to Lᵢ₊₁ will be (28, 28, 24 + 12), to Lᵢ₊₂ (28, 28, 24 + 12 + 12), and so on.
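A minimal dense-block sketch in PyTorch using the numbers from this example (24 initial channels, growth rate k = 12; the exact layer composition here is an assumption for illustration) could look like this:

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Minimal dense-block sketch: start from 24 channels, each layer adds
    k = 12 feature maps, and every layer sees the concatenation of all
    previous outputs along the channel axis."""
    def __init__(self, in_ch=24, growth_rate=12, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1),
            ))

    def forward(self, x):
        features = [x]                               # (N, 24, 28, 28)
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # k = 12 new feature maps
            features.append(out)
        return torch.cat(features, dim=1)            # (N, 24 + 3*12, 28, 28)

# x = torch.randn(1, 24, 28, 28); DenseBlock()(x).shape -> (1, 60, 28, 28)
```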
Inception uses many 1x1 convolutions to reduce the dimensionality of the features, while the 3x3 and 5x5 convolutions are performed in parallel within the Inception block.
ResNet is famous for its shortcut connection, which is basically the feeding (by summation) of features from preceding layers into later layers. This strengthens the features, and ultimately higher accuracies are achieved.
DenseNets are somewhat similar to ResNets; the difference is that the summation is replaced by concatenation, which forwards the features from all preceding layers to all subsequent layers through direct feed-forward concatenation connections.