
Residual Neural Network (ResNet)

  • neovijayk
  • Jul 6, 2020
  • 3 min read

In this article we will look at how ResNet works, as described in the paper Deep Residual Learning for Image Recognition.

What is the vanishing gradient problem?


  1. When deeper networks start converging, a degradation problem is exposed: as the network depth increases, accuracy gets saturated and then degrades rapidly.

  2. This happens because, as the gradient is back-propagated to earlier layers, repeated multiplication can make the gradient extremely small (see the small numeric sketch after this list).

  3. As a result, as the network goes deeper, its performance saturates or even starts degrading rapidly.
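To make point 2 concrete, here is a tiny, self-contained Python sketch (not from the paper, and the numbers are only illustrative): back-propagation multiplies the gradient by one local derivative per layer, and if those derivatives are below 1 (the sigmoid's derivative is at most 0.25, for example) the surviving gradient shrinks exponentially with depth.

# A minimal numeric sketch (not from the paper): back-propagation multiplies
# the gradient by one local derivative per layer, and if those derivatives
# are below 1 the surviving gradient shrinks exponentially with depth.

local_derivative = 0.25   # the sigmoid's derivative is at most 0.25

for depth in (5, 20, 50):
    gradient = 1.0
    for _ in range(depth):
        gradient *= local_derivative   # repeated multiplication through the layers
    print(f"depth={depth:3d}  surviving gradient = {gradient:.2e}")

# depth=  5  surviving gradient = 9.77e-04
# depth= 20  surviving gradient = 9.09e-13
# depth= 50  surviving gradient = 7.89e-31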

What is ResNet?

  1. ResNet is a type of artificial neural network (ANN).

  2. It utilises skip connections, or shortcuts, to jump over some layers.

  3. Skip connections are used to carry information from earlier layers in the model to later layers.

  4. Other architectures built around skip connections include U-Net and the Fully Convolutional Network (FCN); in these architectures the skip connections pass information from the downsampling layers to the upsampling layers (a small sketch of this follows the list).
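As a side note on point 4, in encoder-decoder architectures such as U-Net the skip connection typically concatenates an encoder (downsampling) feature map with the corresponding decoder (upsampling) feature map. Below is a minimal sketch, assuming PyTorch; the tensor shapes and the fuse layer are illustrative choices, not taken from any specific U-Net implementation.

import torch
import torch.nn as nn

# Encoder (downsampling path) feature map saved before pooling: (batch, channels, H, W)
encoder_feat = torch.randn(1, 64, 56, 56)
# Decoder (upsampling path) feature map after the corresponding upsampling step
decoder_feat = torch.randn(1, 64, 56, 56)

# U-Net-style skip connection: pass encoder information directly to the decoder
# by concatenating along the channel dimension.
merged = torch.cat([encoder_feat, decoder_feat], dim=1)   # shape (1, 128, 56, 56)

# The decoder then mixes the two sources with a convolution (illustrative layer).
fuse = nn.Conv2d(128, 64, kernel_size=3, padding=1)
out = fuse(merged)
print(out.shape)   # torch.Size([1, 64, 56, 56])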

What is a skip connection?


Picture 1: Skip connection


  1. ResNet uses a skip connection to add the output from an earlier layer to a later layer.

  2. As shown in Picture 1, we stack convolution layers (layer 1 and layer 2) and also add the original input to the output of the convolution block. This connection is called a skip connection (a minimal code sketch follows below).
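Here is a minimal sketch of the skip connection from Picture 1, assuming PyTorch; the two convolution layers stand in for layer 1 and layer 2 in the picture, and the channel and image sizes are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(64, 64, kernel_size=3, padding=1)   # "layer 1" in Picture 1
conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)   # "layer 2" in Picture 1

x = torch.randn(1, 64, 32, 32)   # original input to the block

out = F.relu(conv1(x))           # pass through the stacked convolution layers
out = conv2(out)
out = out + x                    # skip connection: add the original input
out = F.relu(out)

print(out.shape)                 # torch.Size([1, 64, 32, 32])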

More about skip connections:

  1. ResNet is implemented with double- or triple-layer skips that contain non-linearities (ReLU) and batch normalization in between.

  2. An additional weight matrix may be used to learn the skip weights; such models are known as HighwayNets.

  3. Models with several parallel skips are referred to as DenseNets.

Why are skip connections used?

  1. To avoid the problem of vanishing gradients,

  2. by reusing activations from a previous layer until the adjacent layer learns its weights.

  3. During training, the weights adapt to mute the upstream layer and amplify the previously skipped layer. In the simplest case, only the weights for the adjacent layer's connection are adapted, with no explicit weights for the upstream layer.

  4. Skip connections allow the model to learn an identity function, which ensures that a higher layer will perform at least as well as a lower layer, and not worse.

Residual Block

What is a Residual Block and what is a Residual Function?


  1. Could a shallow network and a deeper variant of it give the same output? In principle yes: the extra layers could simply learn an identity mapping.

  2. So in the worst case, the shallow network and its deeper variant should give the same accuracy.

  3. In the best case, the deeper model should give better accuracy than its shallower counterpart.

  4. But experiments reveal that deeper models do not perform this well.

  5. In other words, simply making the network deeper degrades the performance of the model.

  6. ResNet tries to solve this problem using the deep residual learning framework.


  1. Identity mapping in residual blocks.

  2. ResNet incorporates identity shortcut connections which essentially skip the training of one or more layers, creating a residual block.

  3. Instead of learning a direct mapping H(x): x -> y with a few stacked non-linear layers, let us define the residual function F(x) = H(x) - x, which can be reframed as H(x) = F(x) + x, where F(x) and x represent the stacked non-linear layers and the identity function (input = output) respectively.

  4. If the identity mapping is optimal, it is easier to push the residual to zero (F(x) = 0) than to fit an identity mapping with a stack of non-linear layers. In simple language, it is much easier for a stack of non-linear conv layers to learn F(x) = 0 than to learn the identity mapping H(x) = x directly (a small sketch of this follows the list).

  5. This function F(x) is what the authors call the residual function.
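A small sketch of that argument, assuming PyTorch: in the residual block below, H(x) = F(x) + x, and if the weights of the stacked layers F are pushed to zero the block reduces to the identity mapping H(x) = x. The block structure (two convolutions with a ReLU in between, no batch norm) is a simplification for illustration, not the paper's exact block.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """H(x) = F(x) + x, where F is a small stack of non-linear layers."""

    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(                              # the residual function F(x)
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.f(x) + x                                 # H(x) = F(x) + x

block = ResidualBlock(8)

# Push the residual to zero (F(x) = 0) by zeroing every weight and bias in F.
for p in block.parameters():
    nn.init.zeros_(p)

x = torch.randn(1, 8, 16, 16)
print(torch.allclose(block(x), x))   # True: the block now behaves as the identity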

There are two kinds of residual connections:


Residual block



  1. Residual block function when the input and output dimensions are the same:

  2. The identity shortcut (x) can be used directly when the input and output are of the same dimensions: y = F(x) + x

  3. Residual block function when the input and output dimensions are not the same, i.e. when the dimensions change:

  4. The shortcut can still perform identity mapping, with extra zero entries padded for the increased dimensions, or a projection shortcut can be used to match the dimensions (done by a 1×1 convolution): y = F(x) + W_s x

The first case adds no extra parameters; the second adds extra parameters in the form of the projection matrix W_s (a code sketch of both cases follows below).
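Below is a sketch of both cases, assuming PyTorch. When the spatial size or channel count changes, a 1×1 convolution plays the role of W_s on the shortcut path; otherwise the input is added unchanged. The batch norm placement and stride handling follow common ResNet implementations and should be read as assumptions, not as the paper's exact recipe.

import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # F(x): two convolutions with batch normalization and a ReLU in between
        self.f = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        if stride != 1 or in_channels != out_channels:
            # Projection shortcut W_s: a 1x1 convolution matches the dimensions
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            # Identity shortcut: no extra parameters
            self.shortcut = nn.Identity()

    def forward(self, x):
        return torch.relu(self.f(x) + self.shortcut(x))   # y = F(x) + x  or  y = F(x) + W_s x

same = ResidualUnit(64, 64)              # same dimensions    -> y = F(x) + x
proj = ResidualUnit(64, 128, stride=2)   # changed dimensions -> y = F(x) + W_s x

x = torch.randn(1, 64, 32, 32)
print(same(x).shape, proj(x).shape)      # (1, 64, 32, 32) and (1, 128, 16, 16)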

Was ResNet Successful?

  1. ResNet won 1st place in the ILSVRC 2015 classification competition with a top-5 error rate of 3.57% (an ensemble model).

  2. It also won 1st place in the ILSVRC and COCO 2015 competitions in ImageNet detection, ImageNet localization, COCO detection and COCO segmentation.

  3. Replacing the VGG-16 layers in Faster R-CNN with ResNet-101 gave a relative improvement of 28%.

  4. Networks with 100 layers, and even 1000 layers, were also trained efficiently.

Results using ResNet:

The following networks are studied in the Deep Residual Learning for Image Recognition paper:

ResNet Architectures


Each ResNet block is either 2 layers deep (used in smaller networks like ResNet 18 and 34) or 3 layers deep (ResNet 50, 101, 152); a code sketch of the 3-layer bottleneck block follows the picture below.


ResNet 2 layer and 3 layer Block
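For illustration, here is a sketch of the deeper 3-layer "bottleneck" block used in ResNet 50/101/152, assuming PyTorch: a 1×1 convolution reduces the channels, a 3×3 convolution works on the reduced representation, and a final 1×1 convolution expands them again, with the skip connection added on top. The channel numbers below match the common choice for the first ResNet-50 stage but are assumptions here.

import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """3-layer block: 1x1 reduce -> 3x3 -> 1x1 expand, plus the skip connection."""

    expansion = 4   # output channels = mid_channels * 4

    def __init__(self, in_channels, mid_channels, stride=1):
        super().__init__()
        out_channels = mid_channels * self.expansion
        self.f = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 1, bias=False),    # 1x1: reduce channels
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3, stride=stride, padding=1, bias=False),  # 3x3
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, 1, bias=False),   # 1x1: expand channels
            nn.BatchNorm2d(out_channels),
        )
        self.shortcut = nn.Identity()
        if stride != 1 or in_channels != out_channels:
            # Projection shortcut to match the expanded dimensions
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        return torch.relu(self.f(x) + self.shortcut(x))

block = BottleneckBlock(in_channels=64, mid_channels=64)   # illustrative first stage of ResNet 50
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)   # torch.Size([1, 256, 56, 56])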


To know more about ResNet, a good starting point is the original paper, Deep Residual Learning for Image Recognition.

I am going to provide more information regarding ResNet in upcoming articles, so please check them out on the Deep Learning page as well.

If you have any questions regarding the above blog, feel free to ask in the comment section below. Please also like and subscribe to the blog. 🙂
