Convolutional Neural Network Models
CNN Models
Convolutional Neural Network
ILSVRC
AlexNet (2012)
ZFNet (2013)
VGGNet (2014)
GoogLeNet (2014)
ResNet (2015)
Conclusion
A Convolutional Neural Network (CNN) is a multi-layer neural network.
A CNN comprises one or more convolutional layers (often each followed by a pooling layer), followed by one or more fully connected layers.
A convolutional layer acts as a feature extractor: it extracts features of the input such as edges, corners, and endpoints.
A pooling layer reduces the resolution of the feature map, which makes the output less sensitive to translations (shifts and distortions) of the input.
A fully connected layer has full connections to all activations in the previous layer.
Output size = (Input size - Filter size + 2 x Padding) / Stride + 1
Conv 3x3 with stride=1, padding=0: 6x6 image -> 4x4 output
Conv 3x3 with stride=1, padding=1: 4x4 image -> 4x4 output
Conv 3x3 with stride=2, padding=0: 7x7 image -> 3x3 output
Conv 3x3 with stride=2, padding=1: 5x5 image -> 3x3 output
MaxPooling 2x2 with stride=2: 4x4 image -> 2x2 output
MaxPooling 3x3 with stride=2: 7x7 image -> 3x3 output
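The output sizes in the convolution and pooling examples above can be checked with a short calculation. This is a minimal sketch that assumes square inputs and square windows, and that positions where the window would overhang the input are dropped. The function name `conv_output_size` is illustrative and does not come from any library.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output width/height of a conv or pooling layer on a square input."""
    padded = input_size + 2 * padding
    if kernel_size > padded:
        raise ValueError("window is larger than the padded input")
    # Floor division drops positions where the window would overhang the input.
    return (padded - kernel_size) // stride + 1


# (description, input size, window size, stride, padding) as listed on the slides.
EXAMPLES = [
    ("Conv 3x3, stride=1, padding=0", 6, 3, 1, 0),
    ("Conv 3x3, stride=1, padding=1", 4, 3, 1, 1),
    ("Conv 3x3, stride=2, padding=0", 7, 3, 2, 0),
    ("Conv 3x3, stride=2, padding=1", 5, 3, 2, 1),
    ("MaxPooling 2x2, stride=2", 4, 2, 2, 0),
    ("MaxPooling 3x3, stride=2", 7, 3, 2, 0),
]

for name, size, kernel, stride, padding in EXAMPLES:
    out = conv_output_size(size, kernel, stride, padding)
    print(f"{name}: {size}x{size} -> {out}x{out}")
```

Pooling uses the same arithmetic as convolution with padding 0, which is why one helper covers both kinds of layer.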
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an image classification challenge: build a model that correctly classifies an input image into one of 1,000 separate object categories.
AlexNet achieved a 15.3% Top-5 error rate in the ILSVRC 2012 competition, compared with 26.2% for the second-best entry.
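The Top-5 error rate counts a prediction as wrong only when the true label is not among the model's five highest-scoring classes. A minimal sketch of that metric follows. The function name `top_k_error` and the toy scores are illustrative assumptions, not ILSVRC data.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest scores."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, cut to the top k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example: 3 images, 6 classes (ILSVRC itself has 1,000 classes).
scores = [
    [2.0, 9.0, 1.0, 0.5, 3.0, 4.0],  # true class 1 has the highest score
    [5.0, 4.0, 3.0, 2.0, 1.0, 0.0],  # true class 5 has the lowest score
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],  # true class 3 has the third-highest score
]
labels = [1, 5, 3]

print("Top-1 error:", top_k_error(scores, labels, k=1))
print("Top-5 error:", top_k_error(scores, labels, k=5))
```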
AlexNet has 8 layers, not counting the pooling layers.
AlexNet was trained on two GTX 580 GPUs for five to six days.
Image -> Conv1 -> Pool1 -> Conv2 -> Pool2 -> Conv3 -> Conv4 -> Conv5 -> Pool3 -> FC1 -> FC2 -> FC3
AlexNet Model
Layer 0: Input image
Layer 0: 227 x 227 x 3
Layer 1: Convolution with 96 filters, size 11×11, stride 4, padding 0
Output size = 55 x 55 x 96, because (227 - 11)/4 + 1 = 55
Memory: 55 x 55 x 96 x 3 (because of ReLU and LRN (Local Response Normalization))
Weights (parameters): 11 x 11 x 3 x 96
Layer 1: 55 x 55 x 96
Layer 2: Max-Pooling with 3×3 filter, stride 2
Output size = 27 x 27 x 96, because (55 - 3)/2 + 1 = 27
Memory: 27 x 27 x 96
Layer 2: 27 x 27 x 96
Layer 3: Convolution with 256 filters, size 5×5, stride 1, padding 2
Output size = 27 x 27 x 256; the input size is preserved because of padding: (27 + 2×2 - 5)/1 + 1 = 27
Memory: 27 x 27 x 256 x 3 (because of ReLU and LRN)
Weights: 5 x 5 x 96 x 256
Layer 3: 27 x 27 x 256
Layer 4: Max-Pooling with 3×3 filter, stride 2
Output size = 13 x 13 x 256, because (27 - 3)/2 + 1 = 13
Memory: 13 x 13 x 256
Layer 4: 13 x 13 x 256
Layer 5: Convolution with 384 filters, size 3×3, stride 1, padding 1
Output size = 13 x 13 x 384; the input size is preserved because of padding: (13 + 2 - 3)/1 + 1 = 13
Memory: 13 x 13 x 384 x 2 (because of ReLU)
Weights: 3 x 3 x 256 x 384
Layer 5: 13 x 13 x 384
Layer 6: Convolution with 384 filters, size 3×3, stride 1, padding 1
Output size = 13 x 13 x 384; the input size is preserved because of padding
Memory: 13 x 13 x 384 x 2 (because of ReLU)
Weights: 3 x 3 x 384 x 384
Layer 6: 13 x 13 x 384
Layer 7: Convolution with 256 filters, size 3×3, stride 1, padding 1
Output size = 13 x 13 x 256; the input size is preserved because of padding
Memory: 13 x 13 x 256 x 2 (because of ReLU)
Weights: 3 x 3 x 384 x 256
Layer 7: 13 x 13 x 256
Layer 8: Max-Pooling with 3×3 filter, stride 2
Output size = 6 x 6 x 256, because (13 - 3)/2 + 1 = 6
Memory: 6 x 6 x 256
Layer 8: 6 x 6 x 256 = 9216 values are fed to the fully connected layers
Layer 9: Fully Connected with 4096 neurons
Memory: 4096 x 3 (because of ReLU and Dropout)
Weights: 4096 x (6 x 6 x 256)
Layer 9: Fully Connected with 4096 neurons
Layer 10: Fully Connected with 4096 neurons
Memory: 4096 x 3 (because of ReLU and Dropout)
Weights: 4096 x 4096
Layer 10: Fully Connected with 4096 neurons
Layer 11: Fully Connected with 1000 neurons
Memory: 1000
Weights: 4096 x 1000
Total (label and softmax not included)
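The layer-by-layer sizes and weight counts above can be reproduced with a short script. This is a sketch under stated assumptions: it uses the layer settings exactly as listed on the slides, ignores biases, and ignores the split of some convolutions across the two GPUs in the original AlexNet, so the total is a count for the slide's simplified model. The names `LAYERS`, `out_size` and `walk` are illustrative.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kind, filters or neurons, kernel, stride, padding) as listed on the slides.
LAYERS = [
    ("conv1", "conv", 96, 11, 4, 0),
    ("pool1", "pool", None, 3, 2, 0),
    ("conv2", "conv", 256, 5, 1, 2),
    ("pool2", "pool", None, 3, 2, 0),
    ("conv3", "conv", 384, 3, 1, 1),
    ("conv4", "conv", 384, 3, 1, 1),
    ("conv5", "conv", 256, 3, 1, 1),
    ("pool3", "pool", None, 3, 2, 0),
    ("fc1", "fc", 4096, None, None, None),
    ("fc2", "fc", 4096, None, None, None),
    ("fc3", "fc", 1000, None, None, None),
]


def walk(layers, size=227, channels=3):
    """Return (name, output shape, weight count) for every layer."""
    rows = []
    features = None  # flattened feature count, set when the first FC layer is reached
    for name, kind, units, kernel, stride, padding in layers:
        weights = 0
        if kind == "conv":
            weights = kernel * kernel * channels * units
            size = out_size(size, kernel, stride, padding)
            channels = units
            shape = (size, size, channels)
        elif kind == "pool":
            size = out_size(size, kernel, stride, padding)
            shape = (size, size, channels)
        else:  # fully connected
            if features is None:
                features = size * size * channels
            weights = features * units
            features = units
            shape = (units,)
        rows.append((name, shape, weights))
    return rows


rows = walk(LAYERS)
for name, shape, weights in rows:
    print(f"{name:6s} output={shape} weights={weights:,}")
total_weights = sum(weights for _, _, weights in rows)
print(f"total weights (no biases): {total_weights:,}")
```

Most of the weights sit in the first fully connected layer (9216 x 4096), which is one reason later architectures reduce or remove the large fully connected layers.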
AlexNet was the first ILSVRC-winning network to use ReLU activations.
ZFNet was the winner of the ILSVRC 2013 competition with a 14.8% Top-5 error rate.
ZFNet was built by Matthew Zeiler and Rob Fergus.
ZFNet follows the AlexNet architecture, but:
• CONV1: changed from (11x11, stride 4) to (7x7, stride 2)
Keep it deep. Keep it simple.
VGGNet was the runner-up of the ILSVRC 2014 competition with a 7.3% Top-5 error rate.
VGGNet uses only 3x3 filters, which is quite different from AlexNet's 11x11 filters in the first layer and ZFNet's 7x7 filters.
Two stacked 3x3 conv layers have an effective receptive field of 5x5.
Three stacked 3x3 conv layers have an effective receptive field of 7x7.
VGGNet was trained on 4 Nvidia Titan Black GPUs for two to three weeks.
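The receptive-field claims for stacked 3x3 layers can be checked with a few lines, along with the weight saving that comes from stacking small filters. This is a minimal sketch that assumes stride-1 convolutions with the same number of input and output channels and no biases. The function names and the channel count of 64 are illustrative.

```python
def stacked_receptive_field(num_layers, kernel=3):
    """Effective receptive field of `num_layers` stacked stride-1 convolutions."""
    field = 1
    for _ in range(num_layers):
        field += kernel - 1  # each stride-1 layer widens the field by kernel - 1
    return field


def conv_weights(kernel, channels):
    """Weights of one kernel x kernel conv with `channels` inputs and outputs."""
    return kernel * kernel * channels * channels


channels = 64  # illustrative channel count
for num_layers in (2, 3):
    field = stacked_receptive_field(num_layers)
    stacked = num_layers * conv_weights(3, channels)
    single = conv_weights(field, channels)
    print(
        f"{num_layers} stacked 3x3 layers: receptive field {field}x{field}, "
        f"{stacked:,} weights vs {single:,} for one {field}x{field} layer"
    )
```

The stack covers the same input region with fewer weights and adds an extra non-linearity between the layers.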
The number of filters doubles after each maxpool layer until it reaches 512. This reinforces the idea of shrinking spatial dimensions while growing depth.
VGGNet uses a ReLU after each conv layer and was trained with mini-batch gradient descent.
Image
Low-level features: Conv, Conv, Pool, Conv, Conv, Pool
Mid-level features: Conv, Conv, Conv, Pool, Conv, Conv, Conv, Pool
High-level features: Conv, Conv, Conv, Pool
Classifier: FC, FC, FC
VGGNet 16
Input (224x224x3) -> Conv3-64 -> Conv3-64 -> Maxpool -> Conv3-128 -> Conv3-128 -> Maxpool -> ...
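The "shrinking spatial dimensions, growing depth" pattern of VGGNet 16 can be traced with a short sketch. The details here are assumptions taken from the published VGG-16 configuration rather than from the slides: 3x3 convolutions with stride 1 and padding 1 (which keep width and height), 2x2 max-pooling with stride 2 (which halves them), and fully connected layers of 4096, 4096 and 1000 neurons. The names `VGG16`, `FC` and `vgg_shapes` are illustrative.

```python
# Numbers are 3x3 conv filter counts; "M" marks a 2x2 max-pool with stride 2.
VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]
FC = [4096, 4096, 1000]  # assumed classifier sizes


def vgg_shapes(config, size=224, channels=3):
    """Output shape (height, width, channels) after every conv or pool layer."""
    shapes = []
    for item in config:
        if item == "M":
            size //= 2  # pooling halves width and height
        else:
            channels = item  # padded 3x3 conv keeps width and height
        shapes.append((size, size, channels))
    return shapes


shapes = vgg_shapes(VGG16)
for item, shape in zip(VGG16, shapes):
    label = "Maxpool" if item == "M" else f"Conv3-{item}"
    print(f"{label:10s} -> {shape}")

# The "16" counts layers with weights: the conv layers plus the FC layers.
weight_layers = sum(1 for item in VGG16 if item != "M") + len(FC)
print("layers with weights:", weight_layers)
```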
GoogLeNet was the winner of the ILSVRC 2014 competition with a 6.7% Top-5 error rate.
GoogLeNet uses 9 Inception modules in the whole architecture.
GoogLeNet uses average pooling instead of a fully connected layer to go from a 7x7x1024 volume to a 1x1x1024 volume. This saves a huge number of parameters.
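A rough calculation shows the size of that saving. The sketch below compares a hypothetical fully connected layer applied directly to the flattened 7x7x1024 volume with a linear layer applied after average pooling, both mapping to 1,000 classes and both without biases. The comparison setup and the helper `global_average_pool` are illustrative assumptions, not the authors' code.

```python
def global_average_pool(volume):
    """Average each channel of an H x W x C volume (nested lists) to C values."""
    height, width, channels = len(volume), len(volume[0]), len(volume[0][0])
    totals = [0.0] * channels
    for row in volume:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [total / (height * width) for total in totals]


H, W, C, CLASSES = 7, 7, 1024, 1000
fc_on_flattened = H * W * C * CLASSES  # FC layer on the full 7x7x1024 volume
fc_after_pooling = C * CLASSES         # linear layer on the pooled 1x1x1024 volume

print(f"FC on 7x7x1024:       {fc_on_flattened:,} weights")
print(f"Linear after avgpool: {fc_after_pooling:,} weights")

# Tiny 2x2x2 volume to show what the pooling itself computes.
tiny = [[[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]]]
print(global_average_pool(tiny))
```

The pooling step itself has no weights at all; it only averages each 7x7 feature map to a single value.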
Inception module
GoogLeNet
Input (224x224x3) -> Conv7/2-64 -> Maxpool3/2 -> Conv1 -> Conv3/1-192 -> Maxpool3/2 -> ... -> Dropout 40% -> FC-1000 -> Softmax-1000
Type          Size/Stride  Output      Depth  Conv1  Conv3 reduce  Conv3  Conv5 reduce  Conv5  Pool proj  Params  Ops
Conv          7x7/2        112x112x64  1      -      -             -      -             -      -          2.7K    34M
Maxpool       3x3/2        56x56x64    0      -      -             -      -             -      -          -       -
Conv          3x3/1        56x56x192   2      -      64            192    -             -      -          112K    360M
Maxpool       3x3/2        28x28x192   0      -      -             -      -             -      -          -       -
Inception 3a  -            28x28x256   2      64     96            128    16            32     32         159K    128M
Inception 3b  -            28x28x480   2      128    128           192    32            96     64         380K    304M
Maxpool       3x3/2        14x14x480   0      -      -             -      -             -      -          -       -
Inception 4a  -            14x14x512   2      192    96            208    16            48     64         364K    73M
Inception 4b  -            14x14x512   2      160    112           224    24            64     64         437K    88M
Inception 4c  -            14x14x512   2      128    128           256    24            64     64         463K    100M
Inception 4d  -            14x14x528   2      112    144           288    32            64     64         580K    119M
Inception 4e  -            14x14x832   2      256    160           320    32            128    128        840K    170M
Maxpool       3x3/2        7x7x832     0      -      -             -      -             -      -          -       -
Inception 5a  -            7x7x832     2      256    160           320    32            128    128        1072K   54M
Inception 5b  -            7x7x1024    2      384    192           384    48            128    128        1388K   71M
Avgpool       7x7/1        1x1x1024    0      -      -             -      -             -      -          -       -
Dropout .4    -            1x1x1024    0      -      -             -      -             -      -          -       -
Linear        -            1x1x1000    1      -      -             -      -             -      -          1000K   1M
Softmax       -            1x1x1000    0      -      -             -      -             -      -          -       -
Total Layers 22
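The Inception rows of the table can be sanity-checked in code. An Inception module concatenates its four branches along the depth axis, so the output depth should equal the sum of the 1x1, 3x3, 5x5 and pool-projection filter counts. The sketch assumes the six numeric columns are, in order: 1x1 filters, 3x3 reduce, 3x3 filters, 5x5 reduce, 5x5 filters and pool projection, following the layout of the GoogLeNet paper's table. Weight counts ignore biases, and the function names are illustrative.

```python
# (name, input depth, conv1, conv3 reduce, conv3, conv5 reduce, conv5,
#  pool proj, output depth listed in the table)
MODULES = [
    ("3a", 192, 64, 96, 128, 16, 32, 32, 256),
    ("3b", 256, 128, 128, 192, 32, 96, 64, 480),
    ("4a", 480, 192, 96, 208, 16, 48, 64, 512),
    ("4b", 512, 160, 112, 224, 24, 64, 64, 512),
    ("4c", 512, 128, 128, 256, 24, 64, 64, 512),
    ("4d", 512, 112, 144, 288, 32, 64, 64, 528),
    ("4e", 528, 256, 160, 320, 32, 128, 128, 832),
    ("5a", 832, 256, 160, 320, 32, 128, 128, 832),
    ("5b", 832, 384, 192, 384, 48, 128, 128, 1024),
]


def output_depth(conv1, conv3, conv5, pool_proj):
    """Depth after concatenating the four branches of an Inception module."""
    return conv1 + conv3 + conv5 + pool_proj


def module_weights(in_depth, conv1, reduce3, conv3, reduce5, conv5, pool_proj):
    """Weights of all convolutions in one Inception module (no biases)."""
    return (
        in_depth * conv1              # 1x1 branch
        + in_depth * reduce3          # 1x1 reduction before the 3x3 conv
        + 3 * 3 * reduce3 * conv3     # 3x3 branch
        + in_depth * reduce5          # 1x1 reduction before the 5x5 conv
        + 5 * 5 * reduce5 * conv5     # 5x5 branch
        + in_depth * pool_proj        # 1x1 projection after max-pooling
    )


for name, in_depth, c1, r3, c3, r5, c5, pool, listed in MODULES:
    depth = output_depth(c1, c3, c5, pool)
    weights = module_weights(in_depth, c1, r3, c3, r5, c5, pool)
    print(f"Inception {name}: depth {depth} (table: {listed}), {weights:,} weights")
```

The 1x1 reduce columns explain why the modules stay small: the expensive 3x3 and 5x5 convolutions operate on a reduced number of input channels.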
ResNet was the winner of the ILSVRC 2015 competition with a 3.6% Top-5 error rate.
ResNet is mainly inspired by the philosophy of VGGNet.
ResNet proposes a residual learning approach to ease the difficulty of training deeper networks. It builds on the design ideas of Batch Normalization (BN) and small convolutional kernels.
ResNet is a new 152-layer network architecture.
ResNet was trained on an 8-GPU machine for two to three weeks.
Residual network
Keys:
No max pooling
No hidden fc
No dropout
Basic design (VGG-style)
All 3x3 conv (almost)
Batch normalization
Preserving base information: the conv layers can treat the perturbation (the residual).
Residual block
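The idea of a residual block is that the layers learn a correction F(x) that is added back to the input x, so the block outputs relu(F(x) + x). The sketch below shows only that wiring on plain Python lists. The functions standing in for F (`zero_residual`, `small_residual`) are hypothetical placeholders for the conv layers of a real block.

```python
def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def residual_block(x, residual_fn):
    """Return relu(F(x) + x), where F is `residual_fn` and keeps the length of x."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must match the shape of x for an identity shortcut")
    return relu([f + v for f, v in zip(fx, x)])


def zero_residual(x):
    """Hypothetical F that has learned nothing: the block passes x through."""
    return [0.0 for _ in x]


def small_residual(x):
    """Hypothetical F that adds a small perturbation to x."""
    return [0.1 * v for v in x]


x = [1.0, 2.0, 3.0]
identity_out = residual_block(x, zero_residual)
perturbed_out = residual_block(x, small_residual)
print(identity_out)
print(perturbed_out)
```

When F outputs zeros the block reduces to the identity on non-negative inputs, which is why adding more such blocks should not make a network harder to fit than its shallower version.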
A residual bottleneck block consists of a 1×1 layer for reducing dimension, a 3×3 layer, and a 1×1 layer for restoring dimension.
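The benefit of the bottleneck shows up in the weight count. The sketch below compares a bottleneck block (256 -> 64 -> 64 -> 256 channels, as in the Conv-2 stage of the 50-layer model) with a hypothetical plain block of two 3x3 convolutions working directly on 256 channels. Biases and batch-normalization parameters are ignored, and the function names are illustrative.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Weights of one kernel x kernel convolution (no biases)."""
    return kernel * kernel * in_channels * out_channels


def bottleneck_weights(channels, reduced):
    """1x1 reduce -> 3x3 -> 1x1 restore."""
    return (
        conv_weights(1, channels, reduced)
        + conv_weights(3, reduced, reduced)
        + conv_weights(1, reduced, channels)
    )


def plain_block_weights(channels):
    """Two 3x3 convolutions at full width."""
    return 2 * conv_weights(3, channels, channels)


bottleneck = bottleneck_weights(256, 64)
plain = plain_block_weights(256)
print(f"bottleneck block: {bottleneck:,} weights")
print(f"plain block:      {plain:,} weights")
```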
ResNet Model
Figure: ResNet layer diagram. Image -> Conv7/2-64 -> Pool/2 -> Conv3-64 stage -> Conv3/2-128 and Conv3-128 stage -> Conv3/2-256 and Conv3-256 stage -> Conv3/2-512 and Conv3-512 stage
Layer   Output   18-Layer               34-Layer               50-Layer                           101-Layer                          152-Layer
Conv-1  112x112  7x7/2-64 (all variants)
Conv-2  56x56    3x3 Maxpooling/2 (all variants), followed by:
                 2x [3x3, 64; 3x3, 64]  3x [3x3, 64; 3x3, 64]  3x [1x1, 64; 3x3, 64; 1x1, 256]    3x [1x1, 64; 3x3, 64; 1x1, 256]    3x [1x1, 64; 3x3, 64; 1x1, 256]
Conv-3  28x28    2x [3x3, 128; 3x3, 128]  4x [3x3, 128; 3x3, 128]  4x [1x1, 128; 3x3, 128; 1x1, 512]  4x [1x1, 128; 3x3, 128; 1x1, 512]  8x [1x1, 128; 3x3, 128; 1x1, 512]
Conv-4  14x14    2x [3x3, 256; 3x3, 256]  6x [3x3, 256; 3x3, 256]  6x [1x1, 256; 3x3, 256; 1x1, 1024]  23x [1x1, 256; 3x3, 256; 1x1, 1024]  36x [1x1, 256; 3x3, 256; 1x1, 1024]
Conv-5  7x7      2x [3x3, 512; 3x3, 512]  3x [3x3, 512; 3x3, 512]  3x [1x1, 512; 3x3, 512; 1x1, 2048]  3x [1x1, 512; 3x3, 512; 1x1, 2048]  3x [1x1, 512; 3x3, 512; 1x1, 2048]
        1x1      Avgpool-FC1000-Softmax (all variants)
Flops            1.8x10^9               3.6x10^9               3.8x10^9                           7.6x10^9                           11.3x10^9
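The variant names in the table follow from counting the layers that carry weights: the first 7x7 convolution, the convolutions inside the stacked blocks of Conv-2 to Conv-5, and the final FC-1000 layer. A small sketch of that count, with illustrative names, assuming two convolutions per block in the 18- and 34-layer models and three per bottleneck block in the deeper ones:

```python
# variant -> (conv layers per block, number of blocks in Conv-2, Conv-3, Conv-4, Conv-5)
VARIANTS = {
    18: (2, [2, 2, 2, 2]),
    34: (2, [3, 4, 6, 3]),
    50: (3, [3, 4, 6, 3]),
    101: (3, [3, 4, 23, 3]),
    152: (3, [3, 8, 36, 3]),
}


def weighted_layers(layers_per_block, blocks):
    """Conv-1, plus every conv inside the blocks, plus the final FC-1000 layer."""
    return 1 + layers_per_block * sum(blocks) + 1


for name, (layers_per_block, blocks) in VARIANTS.items():
    print(f"{name}-Layer: {weighted_layers(layers_per_block, blocks)} layers with weights")
```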
Implement ResNet using TFLearn
Top-5 error rate (%) on ILSVRC:
Before 2012: 26.2
AlexNet 2012: 15.3
ZFNet 2013: 14.8
VGGNet 2014: 7.3
GoogLeNet 2014: 6.7
ResNet 2015: 3.6
facebook.com/mloey linkedin.com/in/mloey
mohamedloey@gmail.com
mloey@fci.bu.edu.eg
twitter.com/mloey
mloey.github.io
THANKS FOR
YOUR TIME