
Residual Neural Network (ResNet)

  • neovijayk
  • Jul 6, 2020
  • 3 min read

In this article we will look at how ResNet works, as described in the paper Deep Residual Learning for Image Recognition.

What is the vanishing gradient problem?


  1. When deeper networks start converging, a degradation problem is exposed: as the network depth increases, accuracy gets saturated and then degrades rapidly.

  2. This happens because, as the gradient is back-propagated to earlier layers, repeated multiplication can make the gradient extremely small (see the small numeric sketch after this list).

  3. As a result, as the network goes deeper, its performance saturates or even starts degrading rapidly.
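To make point 2 concrete, here is a tiny, self-contained Python sketch (not from the paper, and the numbers are only illustrative): back-propagation multiplies the gradient by one local derivative per layer, and if those derivatives are below 1 (the sigmoid's derivative is at most 0.25, for example) the surviving gradient shrinks exponentially with depth.

# A minimal numeric sketch (not from the paper): back-propagation multiplies
# the gradient by one local derivative per layer, and if those derivatives
# are below 1 the surviving gradient shrinks exponentially with depth.

local_derivative = 0.25   # the sigmoid's derivative is at most 0.25

for depth in (5, 20, 50):
    gradient = 1.0
    for _ in range(depth):
        gradient *= local_derivative   # repeated multiplication through the layers
    print(f"depth={depth:3d}  surviving gradient = {gradient:.2e}")

# depth=  5  surviving gradient = 9.77e-04
# depth= 20  surviving gradient = 9.09e-13
# depth= 50  surviving gradient = 7.89e-31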

What is ResNet?

  1. ResNet is a type of artificial neural network (ANN).

  2. It utilises skip connections, or shortcuts, to jump over some layers.

  3. Skip connections are used to carry information from earlier layers in the model to later layers.

  4. Other architectures built around skip connections include U-Net and the Fully Convolutional Network (FCN); in these architectures the skip connections pass information from the downsampling layers to the upsampling layers (a small sketch of this follows the list).
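As a side note on point 4, in encoder-decoder architectures such as U-Net the skip connection typically concatenates an encoder (downsampling) feature map with the corresponding decoder (upsampling) feature map. Below is a minimal sketch, assuming PyTorch; the tensor shapes and the fuse layer are illustrative choices, not taken from any specific U-Net implementation.

import torch
import torch.nn as nn

# Encoder (downsampling path) feature map saved before pooling: (batch, channels, H, W)
encoder_feat = torch.randn(1, 64, 56, 56)
# Decoder (upsampling path) feature map after the corresponding upsampling step
decoder_feat = torch.randn(1, 64, 56, 56)

# U-Net-style skip connection: pass encoder information directly to the decoder
# by concatenating along the channel dimension.
merged = torch.cat([encoder_feat, decoder_feat], dim=1)   # shape (1, 128, 56, 56)

# The decoder then mixes the two sources with a convolution (illustrative layer).
fuse = nn.Conv2d(128, 64, kernel_size=3, padding=1)
out = fuse(merged)
print(out.shape)   # torch.Size([1, 64, 56, 56])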

What is a skip connection?


Picture 1: Skip connection


  1. ResNet uses a skip connection to add the output from an earlier layer to a later layer.

  2. As shown in Picture 1, we stack convolution layers (layer 1 and layer 2) and also add the original input to the output of the convolution block. This connection is called a skip connection (a minimal code sketch follows below).
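Here is a minimal sketch of the skip connection from Picture 1, assuming PyTorch; the two convolution layers stand in for layer 1 and layer 2 in the picture, and the channel and image sizes are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(64, 64, kernel_size=3, padding=1)   # "layer 1" in Picture 1
conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)   # "layer 2" in Picture 1

x = torch.randn(1, 64, 32, 32)   # original input to the block

out = F.relu(conv1(x))           # pass through the stacked convolution layers
out = conv2(out)
out = out + x                    # skip connection: add the original input
out = F.relu(out)

print(out.shape)                 # torch.Size([1, 64, 32, 32])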

More about skip connections:

  1. ResNet is implemented with double- or triple-layer skips that contain non-linearities (ReLU) and batch normalization in between.

  2. An additional weight matrix may be used to learn the skip weights; such models are known as HighwayNets.

  3. Models with several parallel skips are referred to as DenseNets.

Why are skip connections used?

  1. To avoid the problem of vanishing gradients,

  2. by reusing activations from a previous layer until the adjacent layer learns its weights.

  3. During training, the weights adapt to mute the upstream layer and amplify the previously skipped layer. In the simplest case, only the weights for the adjacent layer's connection are adapted, with no explicit weights for the upstream layer.

  4. Skip connections allow the model to learn an identity function, which ensures that a higher layer will perform at least as well as a lower layer, and not worse.

Residual Block

What is a Residual Block and what is a Residual Function?


  1. Could a shallow network and a deeper variant of it give the same output? In principle yes: the extra layers could simply learn an identity mapping.

  2. So in the worst case, the shallow network and its deeper variant should give the same accuracy.

  3. In the best case, the deeper model should give better accuracy than its shallower counterpart.

  4. But experiments reveal that deeper models do not perform this well.

  5. In other words, simply making the network deeper degrades the performance of the model.

  6. ResNet tries to solve this problem using the deep residual learning framework.


  1. Identity mapping in residual blocks.

  2. ResNet incorporates identity shortcut connections which essentially skip the training of one or more layers, creating a residual block.

  3. Instead of learning a direct mapping H(x): x -> y with a few stacked non-linear layers, let us define the residual function F(x) = H(x) - x, which can be reframed as H(x) = F(x) + x, where F(x) and x represent the stacked non-linear layers and the identity function (input = output) respectively.

  4. If the identity mapping is optimal, it is easier to push the residual to zero (F(x) = 0) than to fit an identity mapping with a stack of non-linear layers. In simple language, it is much easier for a stack of non-linear conv layers to learn F(x) = 0 than to learn the identity mapping H(x) = x directly (a small sketch of this follows the list).

  5. This function F(x) is what the authors call the residual function.
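A small sketch of that argument, assuming PyTorch: in the residual block below, H(x) = F(x) + x, and if the weights of the stacked layers F are pushed to zero the block reduces to the identity mapping H(x) = x. The block structure (two convolutions with a ReLU in between, no batch norm) is a simplification for illustration, not the paper's exact block.

import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """H(x) = F(x) + x, where F is a small stack of non-linear layers."""

    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(                              # the residual function F(x)
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return self.f(x) + x                                 # H(x) = F(x) + x

block = ResidualBlock(8)

# Push the residual to zero (F(x) = 0) by zeroing every weight and bias in F.
for p in block.parameters():
    nn.init.zeros_(p)

x = torch.randn(1, 8, 16, 16)
print(torch.allclose(block(x), x))   # True: the block now behaves as the identity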

There are two kinds of residual connections:


Residual block



  1. Residual block function when the input and output dimensions are the same:

  2. The identity shortcut (x) can be used directly when the input and output are of the same dimensions: y = F(x) + x

  3. Residual block function when the input and output dimensions are not the same, i.e. when the dimensions change:

  4. The shortcut can still perform identity mapping, with extra zero entries padded for the increased dimensions, or a projection shortcut can be used to match the dimensions (done by a 1×1 convolution): y = F(x) + W_s x

The first case adds no extra parameters; the second adds extra parameters in the form of the projection matrix W_s (a code sketch of both cases follows below).
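Below is a sketch of both cases, assuming PyTorch. When the spatial size or channel count changes, a 1×1 convolution plays the role of W_s on the shortcut path; otherwise the input is added unchanged. The batch norm placement and stride handling follow common ResNet implementations and should be read as assumptions, not as the paper's exact recipe.

import torch
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        # F(x): two convolutions with batch normalization and a ReLU in between
        self.f = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        if stride != 1 or in_channels != out_channels:
            # Projection shortcut W_s: a 1x1 convolution matches the dimensions
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            # Identity shortcut: no extra parameters
            self.shortcut = nn.Identity()

    def forward(self, x):
        return torch.relu(self.f(x) + self.shortcut(x))   # y = F(x) + x  or  y = F(x) + W_s x

same = ResidualUnit(64, 64)              # same dimensions    -> y = F(x) + x
proj = ResidualUnit(64, 128, stride=2)   # changed dimensions -> y = F(x) + W_s x

x = torch.randn(1, 64, 32, 32)
print(same(x).shape, proj(x).shape)      # (1, 64, 32, 32) and (1, 128, 16, 16)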

Was ResNet Successful?

  1. ResNet won 1st place in the ILSVRC 2015 classification competition with a top-5 error rate of 3.57% (an ensemble model).

  2. It also won 1st place in the ILSVRC and COCO 2015 competitions in ImageNet detection, ImageNet localization, COCO detection and COCO segmentation.

  3. Replacing the VGG-16 layers in Faster R-CNN with ResNet-101 gave a relative improvement of 28%.

  4. Networks with 100 layers, and even 1000 layers, were also trained efficiently.

Results using ResNet:

The following networks are studied in the Deep Residual Learning for Image Recognition paper:

ResNet Architectures


Each ResNet block is either 2 layers deep (used in smaller networks like ResNet 18 and 34) or 3 layers deep (ResNet 50, 101, 152); a code sketch of the 3-layer bottleneck block follows the picture below.


ResNet 2 layer and 3 layer Block
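For illustration, here is a sketch of the deeper 3-layer "bottleneck" block used in ResNet 50/101/152, assuming PyTorch: a 1×1 convolution reduces the channels, a 3×3 convolution works on the reduced representation, and a final 1×1 convolution expands them again, with the skip connection added on top. The channel numbers below match the common choice for the first ResNet-50 stage but are assumptions here.

import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """3-layer block: 1x1 reduce -> 3x3 -> 1x1 expand, plus the skip connection."""

    expansion = 4   # output channels = mid_channels * 4

    def __init__(self, in_channels, mid_channels, stride=1):
        super().__init__()
        out_channels = mid_channels * self.expansion
        self.f = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 1, bias=False),    # 1x1: reduce channels
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3, stride=stride, padding=1, bias=False),  # 3x3
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, 1, bias=False),   # 1x1: expand channels
            nn.BatchNorm2d(out_channels),
        )
        self.shortcut = nn.Identity()
        if stride != 1 or in_channels != out_channels:
            # Projection shortcut to match the expanded dimensions
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )

    def forward(self, x):
        return torch.relu(self.f(x) + self.shortcut(x))

block = BottleneckBlock(in_channels=64, mid_channels=64)   # illustrative first stage of ResNet 50
x = torch.randn(1, 64, 56, 56)
print(block(x).shape)   # torch.Size([1, 256, 56, 56])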


To know more about ResNet, a good starting point is the original paper, Deep Residual Learning for Image Recognition.

I am going to provide more information regarding ResNet in upcoming articles, so please check them out on the Deep Learning page as well.

If you have any questions regarding the above blog, feel free to ask in the comment section below. Please also like and subscribe to the blog. 🙂
