K-Max Pooling Operation
2
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• Sentiment Analysis by CNN
3
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• Sentiment Analysis by CNN
4
Image Recognition
http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
5
Image Recognition
6
Local Connectivity
Neurons connect to a small region of the input
7
Parameter Sharing
• The same feature in different positions
Neurons share the same weights
8
Parameter Sharing
• Different features in the same position
Neurons have different weights
9
Convolutional Layers
(Figure: the structure of a convolutional layer, showing the filter weights and the height and depth of the volume.)
10
Convolutional Layers
depth = 1 → depth = 2
b1 = wb1 a1 + wb2 a2
c1 = wc1 a1 + wc2 a2
b2 = wb1 a2 + wb2 a3
c2 = wc1 a2 + wc2 a3
(Inputs a1, a2, a3 produce two output channels b and c; each channel's weights are shared across positions.)
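A minimal numpy sketch (input and weight values are made up for illustration) of the shared-weight computation above: one input channel, two filters b and c, each slid across the input.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])    # a1, a2, a3 (assumed values)
w_b = np.array([0.5, -1.0])      # wb1, wb2 (shared across positions)
w_c = np.array([2.0, 0.25])      # wc1, wc2 (shared across positions)

def conv1d_valid(x, w):
    # output[i] = w[0]*x[i] + w[1]*x[i+1], i.e. b1 = wb1*a1 + wb2*a2, ...
    return np.array([w @ x[i:i + len(w)] for i in range(len(x) - len(w) + 1)])

b = conv1d_valid(a, w_b)   # [b1, b2]
c = conv1d_valid(a, w_c)   # [c1, c2]
print(b, c)
```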
11
Convolutional Layers
depth = 2 → depth = 2
c1 = a1 wc1 + b1 wc2 + a2 wc3 + b2 wc4
c2 = a2 wc1 + b2 wc2 + a3 wc3 + b3 wc4
(With two input channels a and b, each output in channel c combines both channels at two positions through the shared weights wc1–wc4; channel d is computed analogously.)
12
Convolutional Layers
depth = 2 → depth = 2
c1 = a1 wc1 + b1 wc2 + a2 wc3 + b2 wc4
d1 = a1 wd1 + b1 wd2 + a2 wd3 + b2 wd4
c2 = a2 wc1 + b2 wc2 + a3 wc3 + b3 wc4
d2 = a2 wd1 + b2 wd2 + a3 wd3 + b3 wd4
13
Convolutional Layers
(Figure: stacked feature maps labelled A, B, C and A, B, C, D.)
14
Hyper-parameters of CNN
• Stride (e.g. Stride = 1, Stride = 2)
• Padding (e.g. Padding = 0, Padding = 1: pad the border with zeros)
15
Example
Input volume: 7x7x3 (with Padding = 1), Filter: 3x3x3, Stride = 2 → Output volume: 3x3x2
http://cs231n.github.io/convolutional-networks/
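A quick sanity check of the example above using the standard output-size formula (W − F + 2P)/S + 1, assuming the 7x7x3 volume shown is a 5x5x3 input displayed with its padding.

```python
def conv_output_size(w, f, stride, pad):
    # Standard formula: (W - F + 2P) / S + 1
    return (w - f + 2 * pad) // stride + 1

# Assuming a 5x5x3 input padded with 1 (shown as 7x7x3 above):
print(conv_output_size(5, 3, stride=2, pad=1))   # -> 3, matching the 3x3x2 output
```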
16
Convolutional Layers
http://cs231n.github.io/convolutional-networks/
17
Convolutional Layers
http://cs231n.github.io/convolutional-networks/
18
Convolutional Layers
http://cs231n.github.io/convolutional-networks/
19
Relationship with Convolution
y[n] = Σ_k x[k] · w[n − k]
(Figure: the signal x[n], the flipped kernel w[0 − k], the kernel w[n], and the output y[n].)
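The same relationship expressed with numpy (the example signal and kernel are assumptions): np.convolve computes y[n] = Σ_k x[k] · w[n − k].

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])    # x[n] (assumed example signal)
w = np.array([0.5, 1.0, 0.5])          # w[n] (assumed example kernel)

y = np.convolve(x, w, mode="full")     # y[n] = sum_k x[k] * w[n - k]
print(y)

# Note: a CNN "convolutional" layer actually computes a correlation
# (the kernel is not flipped); flipping w recovers true convolution.
```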
20
Relationship with Convolution
y[n] = Σ_k x[k] · w[n − k]
(Figure: the signal x[n], the flipped and shifted kernel w[1 − k], the kernel w[n], and the output y[n].)
21
Relationship with Convolution
y[n] = Σ_k x[k] · w[n − k]
(Figure: the signal x[n], the flipped and shifted kernel w[2 − k], the kernel w[n], and the output y[n].)
22
Relationship with Convolution
y[n] = Σ_k x[k] · w[n − k]
(Figure: the signal x[n], the flipped and shifted kernel w[4 − k], the kernel w[n], and the output y[n].)
23
Nonlinearity
• Rectified Linear (ReLU)
n_out = n_in  if n_in > 0
n_out = 0     otherwise
Example: ReLU maps [1, 4, −3, 1]ᵀ to [1, 4, 0, 1]ᵀ.
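A one-line numpy sketch of ReLU applied element-wise to a vector like the one above.

```python
import numpy as np

relu = lambda v: np.maximum(v, 0)               # n_out = n_in if n_in > 0 else 0
print(relu(np.array([1.0, 4.0, -3.0, 1.0])))    # [1. 4. 0. 1.]
```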
24
Why ReLU?
• Easy to train
• Avoid gradient vanishing problem
(Figure: the sigmoid saturates, so its gradient ≈ 0 in the saturated regions; ReLU does not saturate for positive inputs.)
25
Why ReLU?
• Biological reason
(Figure: a neuron's response over time to strong vs. weak stimulation, compared with the ReLU activation: strong stimulation produces output, weak stimulation produces none.)
26
Pooling Layer
Input (4x4):
1 3 2 4
5 7 6 8
0 0 3 3
5 5 0 0
Maximum Pooling (2x2, no overlap):
7 8
5 3
Average Pooling (2x2, no overlap):
4   5
2.5 1.5
e.g. Max(1,3,5,7) = 7, Avg(1,3,5,7) = 4, Max(0,0,5,5) = 5
Pooling has no weights; here depth = 1.
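A small numpy sketch of the 2x2, non-overlapping max and average pooling shown above, applied to the same 4x4 input.

```python
import numpy as np

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [0, 0, 3, 3],
              [5, 5, 0, 0]], dtype=float)

# Split the 4x4 map into non-overlapping 2x2 blocks, then reduce each block.
blocks = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)

print(blocks.max(axis=-1))    # [[7. 8.], [5. 3.]]
print(blocks.mean(axis=-1))   # [[4. 5.], [2.5 1.5]]
```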
27
Why “Deep” Learning?
28
Visual Perception of Human
http://www.nature.com/neuro/journal/v8/n8/images/nn0805-975-F1.jpg
29
Visual Perception of Computer
(Figure: Input Layer → Convolutional Layer → Pooling Layer → Convolutional Layer → Pooling Layer; each convolutional layer looks at receptive fields of the layer below.)
30
Visual Perception of Computer
(Figure: Input Layer → Convolutional Layer with receptive fields of width = 3, height = 3 → Max-pooling Layer → filter responses → class label, e.g. "7".)
32
Visual Perception of Computer
• Alexnet
http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
http://vision03.csail.mit.edu/cnn_art/data/single_layer.png
33
Training
• Forward Propagation
(Figure: neuron n1 feeds neuron n2 through weight w21; forward propagation computes n1(out) → n2(in) → n2(out).)
34
Training
• Update weights
(Figure: neuron n1 feeds neuron n2 through weight w21; cost function J.)
∂J/∂w21 = (∂J/∂n2(out)) · (∂n2(out)/∂n2(in)) · (∂n2(in)/∂w21)
Gradient descent update:
w21 ← w21 − η ∂J/∂w21
⇒ w21 ← w21 − η (∂J/∂n2(out)) · (∂n2(out)/∂n2(in)) · (∂n2(in)/∂w21)
35
Training
• Update weights
(Figure: the same two-neuron network, with cost function J, used to illustrate the weight update.)
37
Training Convolutional Layers
• example:
(Figure: a convolutional layer with inputs a1, a2, a3, outputs b1, b2, and shared weights wb1, wb2.)
38
Training Convolutional Layers
• Forward propagation
(Convolutional layer with inputs a1, a2, a3 and shared weights wb1, wb2:)
b1 = wb1 a1 + wb2 a2
b2 = wb1 a2 + wb2 a3
39
Training Convolutional Layers
• Update weights
(Cost function J depends on b1 and b2, both of which depend on wb1.)
∂J/∂wb1 = (∂J/∂b1)(∂b1/∂wb1) + (∂J/∂b2)(∂b2/∂wb1)
wb1 ← wb1 − η ( (∂J/∂b1)(∂b1/∂wb1) + (∂J/∂b2)(∂b2/∂wb1) )
40
Training Convolutional Layers
• Update weights
b1 = wb1 a1 + wb2 a2  ⇒  ∂b1/∂wb1 = a1
b2 = wb1 a2 + wb2 a3  ⇒  ∂b2/∂wb1 = a2
wb1 ← wb1 − η ( (∂J/∂b1) a1 + (∂J/∂b2) a2 )
41
Training Convolutional Layers
• Update weights
∂J/∂wb2 = (∂J/∂b1)(∂b1/∂wb2) + (∂J/∂b2)(∂b2/∂wb2)
wb2 ← wb2 − η ( (∂J/∂b1)(∂b1/∂wb2) + (∂J/∂b2)(∂b2/∂wb2) )
42
Training Convolutional Layers
• Update weights
b1 = wb1 a1 + wb2 a2  ⇒  ∂b1/∂wb2 = a2
b2 = wb1 a2 + wb2 a3  ⇒  ∂b2/∂wb2 = a3
wb2 ← wb2 − η ( (∂J/∂b1) a2 + (∂J/∂b2) a3 )
43
Training Convolutional Layers
• Propagate to the previous layer
∂J/∂a1 = (∂J/∂b1)(∂b1/∂a1)
∂J/∂a2 = (∂J/∂b1)(∂b1/∂a2) + (∂J/∂b2)(∂b2/∂a2)
∂J/∂a3 = (∂J/∂b2)(∂b2/∂a3)
44
Training Convolutional Layers
• Propagate to the previous layer
b1 = wb1 a1 + wb2 a2  ⇒  ∂b1/∂a1 = wb1,  ∂b1/∂a2 = wb2
b2 = wb1 a2 + wb2 a3  ⇒  ∂b2/∂a2 = wb1,  ∂b2/∂a3 = wb2
∂J/∂a1 = (∂J/∂b1) wb1
∂J/∂a2 = (∂J/∂b1) wb2 + (∂J/∂b2) wb1
∂J/∂a3 = (∂J/∂b2) wb2
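A minimal numpy sketch (input, weight, and gradient values are made up) of the updates derived above: given upstream gradients ∂J/∂b1 and ∂J/∂b2, compute the weight gradients and the gradients propagated to a1, a2, a3.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])        # a1, a2, a3 (assumed values)
w = np.array([0.5, -1.0])            # wb1, wb2 (shared weights)
dJ_db = np.array([0.1, -0.2])        # upstream gradients dJ/db1, dJ/db2 (assumed)
eta = 0.01                           # learning rate

# Weight gradients: dJ/dwb1 = dJ/db1*a1 + dJ/db2*a2, dJ/dwb2 = dJ/db1*a2 + dJ/db2*a3
dJ_dw = np.array([dJ_db @ a[0:2], dJ_db @ a[1:3]])
w_new = w - eta * dJ_dw

# Gradients propagated to the inputs:
# dJ/da1 = dJ/db1*wb1, dJ/da2 = dJ/db1*wb2 + dJ/db2*wb1, dJ/da3 = dJ/db2*wb2
dJ_da = np.array([dJ_db[0] * w[0],
                  dJ_db[0] * w[1] + dJ_db[1] * w[0],
                  dJ_db[1] * w[1]])
print(w_new, dJ_da)
```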
45
Max-Pooling Layers during Training
• Pooling layers have no weights
• No need to update weights
(Max-pooling over a1, a2, a3, assuming a1 > a2 and a2 > a3:)
b1 = max(a1, a2)
b2 = max(a2, a3) = a2 if a2 ≥ a3, a3 otherwise
∂b2/∂a2 = 1 if a2 ≥ a3, 0 otherwise
46
Max-Pooling Layers during Training
• Propagate to the previous layer
(Assuming a1 > a2 and a2 > a3:)
∂b1/∂a1 = 1,  ∂b1/∂a2 = 0
∂b2/∂a2 = 1,  ∂b2/∂a3 = 0
so ∂J/∂b1 flows entirely to a1, ∂J/∂b2 flows entirely to a2, and a3 receives no gradient.
47
Max-Pooling Layers during Training
• What if a1 = a2?
◦ Choose the node with the smaller index
(Figure: with a1 = a2 = a3, ∂J/∂b1 is routed to a1 and ∂J/∂b2 is routed to a2.)
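A small numpy sketch of the max-pooling forward and backward pass described above, with overlapping windows of size 2 (b1 = max(a1, a2), b2 = max(a2, a3)) and the smaller-index tie-breaking rule.

```python
import numpy as np

def maxpool1d_forward(a):
    # np.argmax returns the FIRST maximal element, which implements the
    # "choose the node with the smaller index" tie-breaking rule.
    idx = [i + np.argmax(a[i:i + 2]) for i in range(len(a) - 1)]
    return a[idx], idx

def maxpool1d_backward(dJ_db, idx, n):
    # Each upstream gradient flows only to the input that produced the max.
    dJ_da = np.zeros(n)
    for g, i in zip(dJ_db, idx):
        dJ_da[i] += g
    return dJ_da

a = np.array([3.0, 3.0, 1.0])              # a1 = a2: tie resolved toward a1
b, idx = maxpool1d_forward(a)               # b = [3, 3], idx = [0, 1]
print(maxpool1d_backward(np.array([0.5, 0.2]), idx, len(a)))   # [0.5 0.2 0. ]
```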
48
Avg-Pooling Layers during Training
• Pooling layers have no weights
• No need to update weights
(Avg-pooling over a1, a2, a3:)
b1 = (a1 + a2) / 2
b2 = (a2 + a3) / 2
∂b2/∂a2 = 1/2,  ∂b2/∂a3 = 1/2
49
Avg-Pooling Layers during Training
• Propagate to the previous layer
∂b1/∂a1 = ∂b1/∂a2 = 1/2,  ∂b2/∂a2 = ∂b2/∂a3 = 1/2
∂J/∂a1 = (1/2) ∂J/∂b1
∂J/∂a2 = (1/2) (∂J/∂b1 + ∂J/∂b2)
∂J/∂a3 = (1/2) ∂J/∂b2
50
ReLU during Training
n_out = n_in  if n_in > 0,  0 otherwise
∂n_out/∂n_in = 1  if n_in > 0,  0 otherwise
51
Training CNN
52
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• Sentiment Analysis by CNN
53
LeNet
◦ Paper:
http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf
54
ImageNet Challenge
• ImageNet Large Scale Visual Recognition Challenge
◦ http://image-net.org/challenges/LSVRC/
• Dataset:
◦ 1000 categories
◦ Training: 1,200,000
◦ Validation: 50,000
◦ Testing: 100,000
http://vision.stanford.edu/Datasets/collage_s.png
55
ImageNet Challenge
http://www.qingpingshan.com/uploads/allimg/160818/1J22QI5-0.png
56
AlexNet (2012)
• Paper:
http://www.cs.toronto.edu/~fritz/absps/imagenet.pdf
• The resurgence of Deep Learning
VGGNet
(Figure: the VGG configuration table; column D is VGG16, column E is VGG19. All filters are 3x3.)
58
VGGNet
• More layers with smaller (3x3) filters work better
• More non-linearity, fewer parameters
59
VGG 19
60
GoogLeNet (2014)
• Paper:
http://www.cs.unc.edu/~wliu/papers/GoogLeNet.pdf
Inception
Module
61
Inception Module
• Best size?
◦ 3x3? 5x5?
62
Inception Module
(Figure: the previous layer feeds four parallel branches: 1x1 convolution, 3x3 convolution, 5x5 convolution, and 3x3 max-pooling; their outputs are joined by filter concatenation.)
63
Inception Module with Dimension Reduction
• Use 1x1 filters to reduce dimension
64
Inception Module with Dimension Reduction
(Figure: at each spatial position, a 1x1 convolution with weights of shape 1x1x256x128 maps an input of size 1x1x256 from the previous layer to an output of size 1x1x128, reducing the depth from 256 to 128.)
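A quick numpy sanity check of what the 1x1 convolution above does to shapes and parameter counts; the 28x28 spatial size is an assumed example, only the depth matters here.

```python
import numpy as np

h, w, c_in, c_out = 28, 28, 256, 128        # 28x28 spatial size is assumed
x = np.random.randn(h, w, c_in)             # input volume
kernel = np.random.randn(c_in, c_out)       # a 1x1 convolution is a per-position matrix multiply

y = x @ kernel                               # shape (28, 28, 128): depth reduced 256 -> 128
print(y.shape)
print("parameters:", 1 * 1 * c_in * c_out)   # 1x1x256x128 = 32768 weights
```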
65
ResNet (2015)
• Paper: https://arxiv.org/abs/1512.03385
• Residual Networks
• 152 layers
66
ResNet
• Residual learning: a building block
(Figure: the stacked layers in the block learn the residual function.)
67
Residual Learning with Dimension Reduction
• using 1x1 filters
68
Pretrained Model Download
• http://www.vlfeat.org/matconvnet/pretrained/
◦ AlexNet: http://www.vlfeat.org/matconvnet/models/imagenet-matconvnet-alex.mat
◦ VGG19: http://www.vlfeat.org/matconvnet/models/imagenet-vgg-verydeep-19.mat
◦ GoogLeNet: http://www.vlfeat.org/matconvnet/models/imagenet-googlenet-dag.mat
◦ ResNet: http://www.vlfeat.org/matconvnet/models/imagenet-resnet-152-dag.mat
69
Using Pretrained Model
• Lower layers: edge, blob, texture (more general)
• Higher layers: object parts (more specific)
http://vision03.csail.mit.edu/cnn_art/data/single_layer.png
70
Transfer Learning
• The pretrained model is trained on the ImageNet dataset
• If your data is similar to the ImageNet data:
◦ Fix all CNN layers
◦ Train the FC layer
(Figure: left, the pretrained network of conv layers and an FC layer trained on ImageNet data; right, the same conv layers fixed, with only the FC layer retrained on your labeled data.)
71
Transfer Learning
• The pretrained model is trained on the ImageNet dataset
• If your data is very different from the ImageNet data:
◦ Fix the lower CNN layers
◦ Train the higher CNN layers and the FC layer
(Figure: left, the pretrained network trained on ImageNet data; right, the lower conv layers fixed, with the higher conv layers and the FC layer retrained on your labeled data.)
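A minimal tf.keras sketch of the "similar data" recipe above (freeze the convolutional layers, train only a new classification head); the image size, number of classes, and dataset are placeholders.

```python
import tensorflow as tf

# Pretrained convolutional layers (ImageNet weights), without the original FC layers.
base = tf.keras.applications.VGG19(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                      # fix all CNN layers

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(256, activation="relu"),    # new FC layer to train
    tf.keras.layers.Dense(10, activation="softmax"),  # 10 classes is an assumption
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(your_images, your_labels, epochs=5)       # your labeled data
```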
72
Tensorflow Transfer Learning Example
• https://www.tensorflow.org/versions/r0.11/how_tos/style_guide.html
75
Visualizing CNN
http://vision03.csail.mit.edu/cnn_art/data/single_layer.png
76
Visualizing CNN
(Figure: feeding a flower image through the CNN produces a filter response; feeding random noise through the same CNN also produces a filter response.)
Gradient Ascent
• Magnify the filter response
(Figure: random noise x is fed through the CNN to produce the filter response f; a larger response gives a higher score.)
score: F = Σ_{i,j} f_{i,j}
gradient: ∂F/∂x
78
Gradient Ascent
• Magnify the filter response
(Figure: the random noise x is updated so that the filter response f, and hence the score F, increases.)
update x:  x ← x + η ∂F/∂x
(η: learning rate;  ∂F/∂x: gradient)
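A toy numpy sketch of the gradient-ascent loop above. Instead of a real CNN filter it uses a simple linear stand-in response f = W·x (an assumption for illustration); with a real network the gradient ∂F/∂x would come from backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 16))        # stand-in "filter": response f = W @ x
x = rng.standard_normal(16)             # start from random noise
eta = 0.1                               # learning rate

for _ in range(100):
    f = W @ x                           # filter response
    F = f.sum()                         # score F = sum_{i,j} f_ij
    grad = W.sum(axis=0)                # dF/dx for this linear stand-in
    x = x + eta * grad                  # gradient ascent: magnify the response

print(F)
```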
79
Gradient Ascent
80
Different Layers of Visualization
CNN
81
Multiscale Image Generation
(Figure: visualize at a small scale, resize the image, visualize again, resize again, and visualize at the final scale.)
82
Multiscale Image Generation
83
Deep Dream
• https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html
• Source code: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/deepdream/deepdream.ipynb
• Example images: http://download.tensorflow.org/example_images/flower_photos.tgz
84
Deep Dream
85
Deep Dream
86
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• Sentiment Analysis by CNN
87
Neural Art
• Paper: https://arxiv.org/abs/1508.06576
• Source code : https://github.com/ckmarkoh/neuralart_tensorflow
content + style → artwork
content: http://www.taipei-101.com.tw/upload/news/201502/2015021711505431705145.JPG
style: https://github.com/andersbll/neural_artistic_style/blob/master/images/starry_night.jpg?raw=true
88
The Mechanism of Painting
Artist Brain
90
Content Generation
(Figure: the content image provides neural stimulation to the artist's brain; the artist draws on the canvas to minimize the difference between the two stimulations.)
Content Generation
(Figure: the content image is passed through VGG19 to obtain filter responses of size Width*Height x Depth; the canvas is passed through the same network, and the colors of its pixels are updated to minimize the difference between the two sets of filter responses, giving the result.)
92
Content Generation
Input Layer l’s Filter
Input Layer l’s Filter l
Photo: Responses:
Canvas: Responses:
Depth (i)
Depth (i)
Width*Height (j) Width*Height (j)
93
Content Generation
• Backward Propagation
Layer l’s Filter l
VGG19 Responses:
Input
Canvas:
Update
Canvas
Learning Rate
94
Content Generation
95
Content Generation
VGG19
96
Style Generation
(Figure: the artwork is passed through VGG19 to obtain filter responses of size Width*Height x Depth, which are position-dependent; from them the Depth x Depth Gram matrix G is computed, which is position-independent.)
97
Style Generation
Layer l’s Filter Responses
Width*Height Gram Matrix
1. .5 1. .5 .25 1.
Depth
.5 .5 .25 .5
Depth
.5 .25 .25
1. 1. .5 1.
k1 k2
Depth
k2
k1
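A short numpy sketch of the Gram-matrix computation implied by the figure (the response values here are made up): filter responses are flattened over positions, and each Gram entry is the inner product of two filters' responses, so all position information is discarded.

```python
import numpy as np

# Layer l filter responses: Depth x (Width*Height). Values are illustrative only.
F = np.array([[1.0, 0.5, 0.5],
              [0.5, 0.5, 1.0],
              [1.0, 0.0, 0.5]])

G = F @ F.T       # Gram matrix, Depth x Depth: G[k1, k2] = sum_j F[k1, j] * F[k2, j]
print(G)
```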
98
Style Generation
Input Layer l’s Input Layer l’s
Artwork: Gram Matrix Canvas: Gram Matrix
Layer l’s
Filter Responses
99
Style Generation
(Figure: the style image and the canvas are each passed through VGG19; their filter responses give Gram matrices G, and the canvas is updated to minimize the difference between the two Gram matrices.)
101
Style Generation
VGG19
Gram Matrix
103
Artwork Generation
(Figure: the content is matched at VGG19 layer Conv4_2, while the style is matched at layers Conv1_1, Conv2_1, Conv3_1, Conv4_1, and Conv5_1.)
104
Artwork Generation
105
Content vs. Style
(Figure: results for different content/style weight ratios: 0.15, 0.05, 0.02, 0.007.)
106
Neural Doodle
• Paper: https://arxiv.org/abs/1603.01768
• Source code: https://github.com/alexjc/neural-doodle
(Figure: content image, style image, semantic maps, and the result.)
107
Neural Doodle
• Image analogy
108
Neural Doodle
• Image analogy
(Disturbing link, open with caution!)
https://raw.githubusercontent.com/awentzonline/image-analogies/master/examples/images/trump-image-analogy.jpg
109
Real-time Texture Synthesis
• Paper: https://arxiv.org/pdf/1604.04382v1.pdf
◦ GAN: https://arxiv.org/pdf/1406.2661v1.pdf
◦ VAE: https://arxiv.org/pdf/1312.6114v10.pdf
110
Outline
• CNN (Convolutional Neural Networks) Introduction
• Evolution of CNN
• Visualizing the Features
• CNN as Artist
• Sentiment Analysis by CNN
111
A Convolutional Neural Network for Modelling Sentences
• Paper: https://arxiv.org/abs/1404.2188
• Source code:
https://github.com/FredericGodin/DynamicCNN
112
Drawbacks of Recursive Neural Networks (RvNN)
• Need a human-labeled syntax tree during training
(Figure: to train on the sentence "This is a dog", the word vectors are combined by RvNN units following the human-labeled syntax tree.)
113
Drawbacks of Recursive Neural Networks (RvNN)
• Ambiguity in natural language
http://3rd.mafengwo.cn/travels/info_weibo.php?id=2861280
http://www.appledaily.com.tw/realtimenews/article/new/20151006/705309/
114
Element-wise 1D operations on word vectors
• 1D Convolution or 1D Pooling
(Figure: each word of "This is a" is represented by its word vector; a 1D convolution or 1D pooling operation is applied element-wise across the word vectors.)
115
From RvNN to CNN
• RvNN: the same RvNN unit is applied recursively at every node
• CNN: different convolutional layers (e.g. conv3) are stacked
116
CNN with Max-Pooling Layers
• Similar to syntax tree
• But human-labeled syntax tree is not needed
117
Sentiment Analysis by CNN
• Use softmax layer to classify the sentiments
(Figure: a softmax layer on top of conv2 classifies one sentence as positive and another as negative.)
118
Sentiment Analysis by CNN
• Build the “correct syntax tree” by training
(Figure: when the softmax output ("negative") is wrong, the error is backpropagated through the softmax, conv2, and pool1 layers.)
119
Sentiment Analysis by CNN
• Build the “correct syntax tree” by training
(Figure: after the weights of conv2 are updated, the prediction changes from negative to positive; the pool1 layers select different nodes.)
120
Multiple Filters
• Richer features than RvNN
(Figure: multiple filters applied to the word vectors of "This is a".)
121
Sentence can’t be easily resized
• Image can be easily resized • Sentence can’t be easily
resized
全台灣最高樓在台北市
resize
resize
全台灣最高的高樓在台北市
全台灣最高樓在台北
台灣最高樓在台北
122
Various Input Size
• Convolutional layers and pooling layers
◦ can handle inputs of various sizes
123
Various Input Size
• Fully-connected layer and softmax layer
◦ need fixed-size input
(Figure: networks ending in an fc layer and a softmax layer.)
124
k-max Pooling
• choose the k-max values
• preserve the order of input values
• variable-size input, fixed-size output
Example (3-max pooling):
[12, 5, 21, 15, 7, 4, 9]  →  [12, 21, 15]
[13, 4, 1, 7, 8]          →  [13, 7, 8]
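A small numpy sketch of k-max pooling, using the example values above: keep the k largest values but in their original order, so inputs of different lengths give a fixed-size output.

```python
import numpy as np

def k_max_pooling(x, k):
    # Indices of the k largest values, sorted so the original order is preserved.
    idx = np.sort(np.argsort(x)[-k:])
    return x[idx]

print(k_max_pooling(np.array([12, 5, 21, 15, 7, 4, 9]), 3))   # [12 21 15]
print(k_max_pooling(np.array([13, 4, 1, 7, 8]), 3))           # [13  7  8]
```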
125
Wide Convolution
• Ensures that all weights reach the entire sentence
126
Dynamic k-max Pooling
k_l = max( k_top, ⌈ ((L − l) / L) · s ⌉ )
l : index of the current layer
k_l : k of the current layer
k_top : k of the top layer
L : total number of layers
s : length of the input sentence
(Figure: a stack of wide convolution & k-max pooling layers; the lower layers use k_l, the top layer uses k_top.)
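A one-line sketch of the dynamic k-max formula above, reproducing the three examples on the following slides (s = 10, 14, 8 with k_top = 3, L = 2, l = 1).

```python
from math import ceil

def dynamic_k(l, L, s, k_top):
    # k_l = max(k_top, ceil((L - l) / L * s))
    return max(k_top, ceil((L - l) / L * s))

for s in (10, 14, 8):
    print(dynamic_k(l=1, L=2, s=s, k_top=3))   # -> 5, 7, 4
```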
127
Dynamic k-max Pooling
k_l = max( k_top, ⌈ ((L − l) / L) · s ⌉ )
k_top = 3, L = 2, s = 10
k_1 = max(3, ⌈((2 − 1)/2) × 10⌉) = 5
(Figure: two stages of conv & pooling.)
128
Dynamic k-max Pooling
k_l = max( k_top, ⌈ ((L − l) / L) · s ⌉ )
k_top = 3, L = 2, s = 14
k_1 = max(3, ⌈((2 − 1)/2) × 14⌉) = 7
(Figure: two stages of conv & pooling.)
129
Dynamic k-max Pooling
k_l = max( k_top, ⌈ ((L − l) / L) · s ⌉ )
k_top = 3, L = 2, s = 8
k_1 = max(3, ⌈((2 − 1)/2) × 8⌉) = 4
(Figure: two stages of conv & pooling.)
130
Dynamic k-max Pooling
131
Convolutional Neural Networks for Sentence
Classification
• Paper: http://www.aclweb.org/anthology/D14-1181
• Source code:
https://github.com/yoonkim/CNN_sentence
132
Static & Non-Static Channel
• Pretrained by word2vec
• Static: fix the values during training
• Non-Static: update the values during training
133
About the Lecturer
Mark Chang
HTC Research & Healthcare
Deep Learning Algorithms Research Engineer
134