VGG and ResNet

ResNet: Residual Network

Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Microsoft Research
• Very deep network
• Up to 152 layers
• Won 1st place in the ILSVRC 2015 classification task
Stacking CNN Layers Deeper

• A deeper network should perform at least as well as a shallower one, but in practice deep plain networks perform worse
• The problem is attributed to optimization: deeper plain networks are harder to optimize, and the degradation is not caused by overfitting
• The authors introduced deep residual learning
• Hypothesis: it is easier to optimize the residual mapping than to optimize the original mapping of a plain network
• Networks built on this hypothesis performed better
• Use layers to fit a residual mapping rather than directly fitting the underlying mapping
Residual Block

Fitting the residual:
F(x) := H(x) - x
H(x) = F(x) + x
• H(x) is the underlying mapping to be learned
• F(x) + x can be realized by a feedforward neural network with "shortcut connections" that perform identity mapping
– the shortcut allows the gradient to be backpropagated directly to earlier layers
• Identity shortcuts add no extra parameters
• Training is still done end to end by backpropagation with SGD
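The relation H(x) = F(x) + x can be illustrated with a minimal NumPy sketch. It assumes a toy two-layer fully connected F(x) instead of the convolutional layers used in ResNet and omits the final ReLU; the names `residual_block`, `w1`, and `w2` are illustrative only. With all-zero weights the block reduces to the identity mapping.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Toy residual block: H(x) = F(x) + x, with F(x) = relu(x @ w1) @ w2."""
    f = np.maximum(x @ w1, 0.0) @ w2  # residual branch F(x)
    return f + x                      # shortcut adds the input back (no extra parameters)

x = np.array([[1.0, -2.0, 3.0]])
zeros = np.zeros((3, 3))

# With all-zero weights F(x) = 0, so the block is exactly the identity mapping.
h = residual_block(x, zeros, zeros)
```

This is why adding residual blocks to a network should not make it harder to represent what the shallower network already computes: the extra layers only need to push F(x) towards zero.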
Full ResNet architecture
• Stack residual blocks
• Each residual block has two 3x3 conv layers
• Periodically double the number of filters and downsample spatially using stride 2
• An additional conv layer at the beginning
• No FC layers at the end, apart from the final layer that outputs the classes
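The bullets above can be checked with a little bookkeeping. The sketch below assumes the usual ResNet-34 setting: a 224x224 input, 64 base filters, and [3, 4, 6, 3] blocks per stage. The function name `resnet34_stages` is hypothetical.

```python
# Assumed ResNet-34 configuration: residual blocks per stage.
BLOCKS_PER_STAGE = [3, 4, 6, 3]

def resnet34_stages(input_size=224, base_filters=64):
    """Return (n_blocks, filters, spatial_size) for each stage."""
    size = input_size // 2  # initial 7x7 conv with stride 2
    size = size // 2        # 3x3 max pool with stride 2
    filters = base_filters
    stages = []
    for i, n_blocks in enumerate(BLOCKS_PER_STAGE):
        if i > 0:
            filters *= 2    # double the number of filters
            size //= 2      # downsample spatially with stride 2
        stages.append((n_blocks, filters, size))
    return stages

stages = resnet34_stages()
# Weighted layers: 1 initial conv + 2 convs per block + 1 final FC layer.
depth = 1 + 2 * sum(n for n, _, _ in stages) + 1
```

Under these assumptions the four stages run at 56, 28, 14, and 7 pixels with 64, 128, 256, and 512 filters, and the weighted layers add up to 34.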
34-layer plain network vs. 34-layer residual network
Training Parameters
• Batch Normalization after every CONV layer
• Xavier/2 (He) initialization instead of plain random initialization
• Learning rate: starts at 0.1 and is reduced to 0.01 (divided by 10) when the error plateaus
• Mini-batch size 256
• No dropout used
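As a rough illustration of two of these settings, the sketch below draws weights with Xavier/2 (He) initialization, i.e. a zero-mean Gaussian with variance 2 / fan_in, and applies a divide-by-10 learning-rate step. The helper names `he_normal` and `step_lr` are hypothetical, and the layer sizes are arbitrary.

```python
import numpy as np

def he_normal(fan_in, fan_out, rng):
    """Xavier/2 (He) initialization: N(0, 2 / fan_in)."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def step_lr(base_lr, n_drops):
    """Learning rate after dividing by 10 `n_drops` times."""
    return base_lr * (0.1 ** n_drops)

rng = np.random.default_rng(0)
w = he_normal(512, 256, rng)   # sample standard deviation is close to sqrt(2 / 512)
lr_start = step_lr(0.1, 0)     # initial learning rate
lr_after = step_lr(0.1, 1)     # after the first plateau
```

The factor of 2 compensates for ReLU zeroing roughly half of the activations, which keeps the signal variance stable across layers.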
Batch Normalization
• Layer that normalizes the output of the previous layer
• Also acts as a form of regularization, which helps reduce overfitting
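A minimal NumPy sketch of the training-time forward pass, assuming channels-last input of shape (batch, height, width, channels). The function name `batch_norm_forward` is illustrative. At inference time a real layer would use running averages of the mean and variance, which are left out here.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Normalize each channel over the batch and spatial axes, then scale and shift."""
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta  # gamma and beta are learned per channel

rng = np.random.default_rng(0)
x = rng.normal(5.0, 3.0, size=(8, 4, 4, 2))
y = batch_norm_forward(x, gamma=np.ones(2), beta=np.zeros(2))
```

With gamma = 1 and beta = 0, each output channel has approximately zero mean and unit variance.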
Building ResNet34: Identity Block
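import tensorflow as tf

def identity_block(x, filters):
    # Keep a copy of the input for the shortcut connection
    x_skip = x

    # First 3x3 conv -> BN -> ReLU
    x = tf.keras.layers.Conv2D(filters, (3, 3), padding='same')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.Activation('relu')(x)

    # Second 3x3 conv -> BN (no activation before the addition)
    x = tf.keras.layers.Conv2D(filters, (3, 3), padding='same')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)

    # Add the shortcut, then apply ReLU
    x = tf.keras.layers.Add()([x, x_skip])
    x = tf.keras.layers.Activation('relu')(x)
    return x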
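The ResNet34 builder on these slides also calls a `convolutional_block`, whose definition is not shown. The following is one plausible sketch, not necessarily the author's version. It assumes the block halves the spatial size with stride 2 in the first conv and uses a 1x1 conv with stride 2 on the shortcut so that both paths have matching shapes before the addition.

```python
import tensorflow as tf

def convolutional_block(x, filters):
    """Residual block that downsamples by 2 and changes the channel count (assumed design)."""
    x_skip = x

    # Main path: the first 3x3 conv downsamples with stride 2
    x = tf.keras.layers.Conv2D(filters, (3, 3), strides=(2, 2), padding='same')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.Activation('relu')(x)
    x = tf.keras.layers.Conv2D(filters, (3, 3), padding='same')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)

    # Shortcut path: 1x1 conv with stride 2 so the shapes match
    x_skip = tf.keras.layers.Conv2D(filters, (1, 1), strides=(2, 2))(x_skip)

    x = tf.keras.layers.Add()([x, x_skip])
    x = tf.keras.layers.Activation('relu')(x)
    return x
```

Unlike the identity block, this shortcut has parameters (the 1x1 projection), because the input and output of the block no longer have the same shape.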
Putting It Together

def ResNet34(shape=(32, 32, 3), classes=10):
    # Note: convolutional_block (the downsampling residual block) must be
    # defined alongside identity_block before this function is called.

    # Step 1 (Setup Input Layer)
    x_input = tf.keras.layers.Input(shape)
    x = tf.keras.layers.ZeroPadding2D((3, 3))(x_input)
    # Step 2 (Initial Conv layer along with maxPool)
    x = tf.keras.layers.Conv2D(64, kernel_size=7, strides=2, padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    x = tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x)
    # Define size of sub-blocks and initial filter size
    block_layers = [3, 4, 6, 3]
    filter_size = 64
    # Step 3 Add the ResNet Blocks
    for i in range(4):
        if i == 0:
            # First stage: identity blocks only, no downsampling
            for j in range(block_layers[i]):
                x = identity_block(x, filter_size)
        else:
            # One Residual/Convolutional Block followed by Identity blocks
            # The filter size will go on increasing by a factor of 2
            filter_size = filter_size * 2
            x = convolutional_block(x, filter_size)
            for j in range(block_layers[i] - 1):
                x = identity_block(x, filter_size)
    # Step 4 End Dense Network
    x = tf.keras.layers.AveragePooling2D((2, 2), padding='same')(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(512, activation='relu')(x)
    x = tf.keras.layers.Dense(classes, activation='softmax')(x)
    model = tf.keras.models.Model(inputs=x_input, outputs=x, name="ResNet34")
    return model
VGG [Simonyan and Zisserman, 2014]

• Uses only small 3x3 conv filters
• Deeper than AlexNet (8 layers)
• Training procedure similar to AlexNet
• Simple architecture
• Top-5 error rate of 7.3% on ImageNet
• 16-layer CNN
• 138M parameters
• Trained on 4 Nvidia Titan Black GPUs for two to three weeks
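The 138M figure can be reproduced by counting weights and biases layer by layer. This sketch assumes the standard VGG16 configuration, which the bullets above do not spell out: a 224x224x3 input, 13 conv layers with 3x3 kernels, five 2x2 max pools, and three fully connected layers of 4096, 4096, and 1000 units. The names `VGG16_CFG` and `vgg16_param_count` are illustrative.

```python
# Assumed VGG16 layout: output channels of each conv layer, "M" = 2x2 max pool.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]

def vgg16_param_count(input_size=224, in_channels=3, classes=1000):
    total, channels, size = 0, in_channels, input_size
    for v in VGG16_CFG:
        if v == "M":
            size //= 2                            # pooling halves the spatial size
        else:
            total += (3 * 3 * channels + 1) * v   # 3x3 weights plus one bias per filter
            channels = v
    fc_in = size * size * channels                # flattened feature map (7 * 7 * 512)
    for fc_out in (4096, 4096, classes):
        total += (fc_in + 1) * fc_out             # dense weights plus biases
        fc_in = fc_out
    return total

n_params = vgg16_param_count()
```

Most of the parameters sit in the first fully connected layer, which alone accounts for over 100M of the total.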
Why use 3x3 filters?
• Stacking small filters several times gives a larger receptive field
• A stack of three 3x3 conv layers (stride 1) has the same effective receptive field (ERF) as one 7x7 conv layer
• Receptive field (field of view): a unit in a convolutional network depends only on a region of the input; that region is the receptive field of the unit
• ERF: the idea that not all pixels in the receptive field contribute equally to the output unit's response
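The equivalence between three 3x3 layers and one 7x7 layer can be verified with a short calculation. The sketch below computes the theoretical receptive field of stacked stride-1 convolutions and, as an extra comparison not stated on the slide, the number of weights each option needs when every layer has C input and C output channels. The function names and the choice C = 64 are assumptions for illustration.

```python
def stacked_receptive_field(n_layers, kernel=3):
    """Receptive field of n stacked stride-1 convolutions with a square kernel."""
    rf = 1
    for _ in range(n_layers):
        rf += kernel - 1  # each stride-1 layer widens the field by (kernel - 1)
    return rf

def conv_weights(kernel, channels, n_layers=1):
    """Weights (biases ignored) in n conv layers with `channels` in and out."""
    return n_layers * kernel * kernel * channels * channels

C = 64
rf_three_3x3 = stacked_receptive_field(3)      # three 3x3 layers
w_three_3x3 = conv_weights(3, C, n_layers=3)   # 3 * 9 * C^2 = 27 * C^2
w_one_7x7 = conv_weights(7, C)                 # 49 * C^2
```

The stack covers the same 7x7 region with fewer weights and two extra non-linearities between the layers.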
VGG16 Keras Code
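from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPool2D

model = Sequential()
# 224x224x3 is the standard ImageNet input size for VGG16 (assumed here)
model.add(Input(shape=(224, 224, 3)))
# Five conv blocks (2, 2, 3, 3, 3 conv layers), each closed by 2x2 max pooling
model.add(Conv2D(filters=64, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=64, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2,2), strides=(2,2)))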

