VGG and ResNet


ResNet – Residual Network (34, 50, 101, 152 layers)
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
Microsoft Research
• Very deep network: up to 152 layers
• Won 1st place in the ILSVRC 2015 classification task
Stacking CNNs Deeper

• Deeper networks should perform better, yet plain deep networks often perform worse than shallower ones
• The problem is a difficulty in optimization, not overfitting
• The authors introduced a deep residual learning framework
• They hypothesize that it is easier to optimize the residual mapping than the original mapping (a plain network)
• The proposed residual networks indeed performed better
• Layers are used to fit a residual mapping rather than fitting the underlying mapping directly
Residual Block

F(x) := H(x) – x
Fitting the residual:
H(x) = F(x) + x
• H(x) is the underlying mapping
• F(x) + x is realized by feedforward neural networks with "shortcut connections" that perform identity mapping
• The shortcut allows the gradient to be backpropagated directly to earlier layers
• It creates no extra parameters and no extra computation
• Training is done by backpropagation with SGD
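A one-line check of the gradient claim: differentiating H(x) = F(x) + x with respect to x gives
∂H/∂x = ∂F/∂x + 1,
so even when the gradient through the stacked layers F is small, the identity term carries the error signal back to earlier layers intact.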
Full ResNet Architecture
• Residual blocks are stacked
• Each residual block has two 3x3 conv layers
• Periodically, the number of filters is doubled and spatial downsampling is done with stride 2
• An additional conv layer (7x7) at the beginning
• No hidden FC layers at the end (global average pooling feeds the final classification layer)
34-layer Plain vs. 34-layer Residual (architecture comparison figure)
Training Parameters
• Batch Normalization after every conv layer
• Xavier/2 (He) initialization instead of random weights
• SGD
• Learning rate: 0.1, divided by 10 when the error plateaus
• Mini-batch size: 256
• No dropout used
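A minimal Keras sketch of this recipe, assuming an already-built `model` (e.g., from the ResNet34 code below) and hypothetical training arrays `x_train`, `y_train`, `x_val`, `y_val`; momentum 0.9 is the paper's setting, not shown on the slide:

import tensorflow as tf

# SGD optimizer as described above
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.9)
model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Divide the learning rate by 10 when the validation loss plateaus
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss',
                                                 factor=0.1, patience=5)
model.fit(x_train, y_train, batch_size=256, epochs=100,
          validation_data=(x_val, y_val), callbacks=[reduce_lr])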
Batch Normalization
• A layer that normalizes the output of the previous layer using the mini-batch mean and variance
• Also acts as a form of regularization, helping to avoid overfitting
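For reference, the transform batch normalization applies (with mini-batch mean μ and variance σ², a small constant ε, and learned scale/shift parameters γ, β):

x̂ = (x – μ) / sqrt(σ² + ε)
y = γ·x̂ + β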
Building ResNet34 – Identity Block

import tensorflow as tf

def identity_block(x, filter):
    # Save the input tensor for the shortcut connection
    x_skip = x
    # First 3x3 conv -> BN -> ReLU
    x = tf.keras.layers.Conv2D(filter, (3, 3), padding='same')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.Activation('relu')(x)
    # Second 3x3 conv -> BN (no ReLU yet)
    x = tf.keras.layers.Conv2D(filter, (3, 3), padding='same')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    # Add the identity shortcut, then apply ReLU: H(x) = F(x) + x
    x = tf.keras.layers.Add()([x, x_skip])
    x = tf.keras.layers.Activation('relu')(x)
    return x
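The ResNet34 assembly below also calls a convolutional_block that is not shown in these slides. A minimal sketch consistent with the identity block above (following the linked article's approach: a stride-2 first conv for downsampling, plus a 1x1 projection on the shortcut so the shapes match):

def convolutional_block(x, filter):
    # Save the input for the projection shortcut
    x_skip = x
    # First 3x3 conv downsamples spatially with stride 2
    x = tf.keras.layers.Conv2D(filter, (3, 3), padding='same', strides=(2, 2))(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    x = tf.keras.layers.Activation('relu')(x)
    # Second 3x3 conv -> BN
    x = tf.keras.layers.Conv2D(filter, (3, 3), padding='same')(x)
    x = tf.keras.layers.BatchNormalization(axis=3)(x)
    # 1x1 conv projects the shortcut to the new spatial size and filter count
    x_skip = tf.keras.layers.Conv2D(filter, (1, 1), strides=(2, 2))(x_skip)
    # Add the projected shortcut, then apply ReLU
    x = tf.keras.layers.Add()([x, x_skip])
    x = tf.keras.layers.Activation('relu')(x)
    return x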
Putting It Together

Adapted from: https://www.analyticsvidhya.com/blog/2021/08/how-to-code-your-resnet-from-scratch-in-tensorflow/

def ResNet34(shape=(32, 32, 3), classes=10):
    # Step 1: Set up the input layer
    x_input = tf.keras.layers.Input(shape)
    x = tf.keras.layers.ZeroPadding2D((3, 3))(x_input)
    # Step 2: Initial conv layer along with max pool
    x = tf.keras.layers.Conv2D(64, kernel_size=7, strides=2, padding='same')(x)
    x = tf.keras.layers.BatchNormalization()(x)
    x = tf.keras.layers.Activation('relu')(x)
    x = tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x)
    # Define size of sub-blocks and initial filter size
    block_layers = [3, 4, 6, 3]
    filter_size = 64
    # Step 3: Add the ResNet blocks
    for i in range(4):
        if i == 0:
            # First stage uses identity blocks only (no downsampling)
            for j in range(block_layers[i]):
                x = identity_block(x, filter_size)
        else:
            # One convolutional block (downsamples, doubles the filters),
            # followed by identity blocks
            filter_size = filter_size * 2
            x = convolutional_block(x, filter_size)
            for j in range(block_layers[i] - 1):
                x = identity_block(x, filter_size)
    # Step 4: End dense network
    x = tf.keras.layers.AveragePooling2D((2, 2), padding='same')(x)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(512, activation='relu')(x)
    x = tf.keras.layers.Dense(classes, activation='softmax')(x)
    model = tf.keras.models.Model(inputs=x_input, outputs=x, name="ResNet34")
    return model
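A quick usage check (hypothetical) to verify that the model builds:

model = ResNet34(shape=(32, 32, 3), classes=10)
model.summary()  # prints the layer-by-layer architecture and parameter counts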
VGGNet
[Simonyan and Zisserman, 2014]

• Uses small 3x3 conv filters throughout
• Deeper than AlexNet (8 layers): 16–19 layers
• Similar training procedure to AlexNet
• Simple, homogeneous design
• Top-5 error rate of 7.3% on ImageNet
• VGG16: a 16-layer CNN
• 138M parameters
• Trained on 4 Nvidia Titan Black GPUs for two to three weeks
Why 3x3 Filters?
• Applied multiple times in a stack, small filters build up large receptive fields
• A stack of three 3x3 conv layers (stride 1) has the same effective receptive field as one 7x7 conv layer (see the worked check after this list)
• Effective receptive field (ERF): not all pixels in the receptive field contribute equally to the output unit's response
A key concept in deep CNNs is the receptive field, or field of view: a unit in a convolutional network depends only on a region of the input, and that region is the unit's receptive field.
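A quick check of the 3x3 vs. 7x7 claim: with stride 1, each additional 3x3 layer grows the receptive field by 2 pixels per side, so three stacked layers see 3 + 2 + 2 = 7 pixels in each direction, exactly like a single 7x7 layer. With C channels in and out, the stack also uses fewer parameters:

3 · (3² · C²) = 27C²  vs.  7² · C² = 49C²

and inserts two extra ReLU nonlinearities along the way.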
VGG16
VGG16 Keras Code

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D

model = Sequential()
# Block 1: two 64-filter 3x3 convs, then 2x2 max pool
model.add(Conv2D(input_shape=(224, 224, 3), filters=64, kernel_size=(3, 3),
                 padding="same", activation="relu"))
model.add(Conv2D(filters=64, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
# Block 2: two 128-filter convs
model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(Conv2D(filters=128, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
# Block 3: three 256-filter convs
model.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(Conv2D(filters=256, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
# Block 4: three 512-filter convs
model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
# Block 5: three more 512-filter convs
model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(Conv2D(filters=512, kernel_size=(3, 3), padding="same", activation="relu"))
model.add(MaxPool2D(pool_size=(2, 2), strides=(2, 2)))
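The code above covers only the convolutional stages; VGG16 ends with three fully connected layers. A sketch of the standard classifier head (Flatten, then two 4096-unit layers and a 1000-way softmax for ImageNet):

from tensorflow.keras.layers import Flatten, Dense

model.add(Flatten())
model.add(Dense(units=4096, activation="relu"))
model.add(Dense(units=4096, activation="relu"))
model.add(Dense(units=1000, activation="softmax"))  # 1000 ImageNet classes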
