Pattern Recognition
CS 4243
S1-Y2024/25
1
Lesson 6 - Part 1
Computer Vision and Deep Learning
2
ARTIFICIAL NEURAL NETWORKS,
HISTORY
HTTPS://MEDIUM.COM/ANALYTICS-VIDHYA/BRIEF-HISTORY-OF-NEURAL-NETWORKS-44C2BF72EEC
3
LET'S DIVE INTO IT
Program: ann2.ipynb
ANN2.IPYNB
4
ANN EXAMPLE: A SIMPLE ADDER
• Learning from samples/examples
5
ANN EXAMPLE: A SIMPLE ADDER
6
HOW DOES IT WORK?
Delta Rule
7
ANN EXAMPLE: A SIMPLE
SUBTRACTOR
• w1 and w2 are selected randomly, e.g. -0.4 and 0.6
• New training samples mean … new functionality.

Target samples:
i1  i2  O
 4   1   3
 5   1   4
 7   0   7
 8   2   6
8
ANN EXAMPLE: A SIMPLE
SUBTRACTOR
$o = i_1 w_1 + i_2 w_2$

Training our subtractor:

E,S   i1,i2   w1     w2     o      error   Δw1   Δw2
1,1   4,1    -0.4    0.6   -1       4       +     +
1,2   5,1    -0.3    0.7   -0.8     4.8     +     +
1,4   3,3    -0.2    0.8    1.8    -1.8     -     -
…
N,M   8,2    ~1     ~-1    ~6      ~0       0     0

Target samples:
i1  i2  O
 4   1   3
 5   1   4
 3   3   0
 7   0   7
 8   2   6
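The table suggests a simple delta-rule training loop; here is a minimal Python sketch of it, where the learning rate, epoch count, and the exact update form are illustrative assumptions rather than the ann2.ipynb notebook's actual settings:

```python
import numpy as np

# Target samples from the table above: o should become i1 - i2
samples = np.array([[4, 1], [5, 1], [3, 3], [7, 0], [8, 2]], dtype=float)
targets = samples[:, 0] - samples[:, 1]          # [3, 4, 0, 7, 6]

w = np.array([-0.4, 0.6])                        # randomly selected initial weights (slide values)
eta = 0.01                                       # learning rate, illustrative

for epoch in range(1000):
    for x, t in zip(samples, targets):
        o = x @ w                                # o = i1*w1 + i2*w2
        error = t - o
        w += eta * error * x                     # delta rule: Δw_i = η · error · i_i

print(np.round(w, 3))                            # approaches [1, -1]
```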
9
HOW DOES IT WORK?
10
PERCEPTRON
Two implementations of a single-neuron Perceptron, able to carry out any 2-input, 1-output linear logical mapping.

Example: AND, with w1 = w2 = 1, θ = 1.5

i1  i2  O = i1 · i2 (AND)
 0   0   0
 0   1   0
 1   0   0
 1   1   1
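A quick check of this AND mapping in Python (a sketch using the slide's weights and threshold, not taken from the course notebooks):

```python
import numpy as np

w = np.array([1.0, 1.0])   # w1 = w2 = 1
theta = 1.5                # threshold from the slide

def perceptron(i1, i2):
    # step activation: fire (1) only when the weighted sum reaches the threshold
    return int(np.dot([i1, i2], w) >= theta)

for i1, i2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(i1, i2, perceptron(i1, i2))   # reproduces the AND truth table
```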
11
PERCEPTRON
We may try a Perceptron network to materialize a multi-input / multi-output function.
An example is a 1-out-of-c maximum classifier.
In many cases, it is more accurate than having a single output, in particular when there are many classes.

$out_j = \mathrm{sign}\!\left(\sum_{i=1}^{n} in_i\, w_{ij} - \theta_j\right)$
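A minimal NumPy sketch of such a layer; the sizes and the (untrained) weights and thresholds are placeholders only:

```python
import numpy as np

n, c = 4, 3                              # n inputs, c classes (illustrative sizes)
rng = np.random.default_rng(0)
W = rng.normal(size=(n, c))              # w_ij, untrained placeholder weights
theta = rng.normal(size=c)               # one threshold θ_j per output neuron

def one_out_of_c(inp):
    net = inp @ W - theta                                # Σ_i in_i · w_ij − θ_j for each output j
    return np.sign(net), int(np.argmax(net))             # per-class outputs and the maximum ("1-out-of-c") winner

outputs, winner = one_out_of_c(rng.normal(size=n))
print(outputs, winner)
```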
12
PERCEPTRON
NN_EXAM4_PERCEPTRON.IPYNB , PERCEPTRON.XLSX
13
PRACTICE: TRAIN IT AS AN ‘OR’
$o = \mathrm{sign}(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}$
14
PRACTICE: TRAIN IT AS AN ‘XOR’
$o = \mathrm{sign}(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}$
15
PERCEPTRON/ SINGLE NEURON
DISADVANTAGES: XOR
$o = \mathrm{sign}(x) = \begin{cases} 1 & x \ge 0 \\ 0 & x < 0 \end{cases}$
16
MULTI LAYERED PERCEPTRON AND
XOR
• There is no way to train a single-layered
perceptron to carry out non-linear tasks.
• We need a few things to guarantee non-linear behavior (see the sketch below):
  • At least 1 hidden layer of neurons between the input and output layers
  • With at least 2 neurons
  • With a non-linear activation function
(Figure: the XOR problem)
• Next we need a good training algorithm,
e.g. Error Back-Propagation Algorithm
• Then we can have a fantastic non-linear
system …
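A minimal Keras sketch of such a network: one hidden layer with two non-linear neurons, trained by back-propagation. The activation choices, learning rate, and epoch count are illustrative, and a run can occasionally land in a poor local minimum on so small a problem:

```python
import numpy as np
import tensorflow as tf

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)               # XOR targets

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(2, activation="tanh"),      # 1 hidden layer, 2 non-linear neurons
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.05), loss="binary_crossentropy")
model.fit(X, y, epochs=1000, verbose=0)               # tiny problem, so many cheap epochs

print(model.predict(X, verbose=0).round().ravel())    # ideally [0, 1, 1, 0]
```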
17
MULTI LAYERED PERCEPTRON AND
XOR
18
MULTILAYERED PERCEPTRON
19
UNDERFITTING / OVERFITTING
(Figure: example model fits illustrating underfitting vs. a good fit)
20
HOW TO EMPLOY AN ANN
1. Select your training and test data
2. Determine the structure of your ANN
3. Configure your training algorithm
4. Train your network
5. Test and evaluate your ANN
21
ACTIVATION FUNCTIONS
1. Sigmoid Function   2. Hyperbolic Tangent   3. Step Function   4. ReLU   5. Piecewise Linear Function

4 (ReLU): $f(x) = \begin{cases} x & x \ge 0 \\ 0 & x < 0 \end{cases}$

6 (Sign Function): $f(x) = \begin{cases} -1 & x < 0 \\ 0 & x = 0 \\ 1 & x > 0 \end{cases}$

1, 2: invertible, differentiable; 1, 4: popular; 4, 5: partially differentiable; 1: Logistic; 4: Rectified Linear Unit
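The same activation functions written out in NumPy for reference (the piecewise-linear variant shown is one common clipped form, an assumption):

```python
import numpy as np

def sigmoid(x):                 # 1: logistic sigmoid, invertible and differentiable
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                    # 2: hyperbolic tangent
    return np.tanh(x)

def step(x):                    # 3: step function, here 1 for x >= 0 and 0 otherwise
    return (x >= 0).astype(float)

def relu(x):                    # 4: Rectified Linear Unit, x for x >= 0, else 0
    return np.maximum(0.0, x)

def piecewise_linear(x, lo=-1.0, hi=1.0):   # 5: linear in the middle, clipped outside
    return np.clip(x, lo, hi)

def sign(x):                    # 6: sign function, -1 / 0 / +1
    return np.sign(x)
```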
22
ACTIVATION FUNCTIONS
23
TRAINING
• Vanishing Gradient Problem
• Not Zero-Centered
• Error/Loss/Cost Function
Complicated and heavy optimization
24
ANN, HOW TO BUILD
Formulating neural network solutions for particular problems is a
multi-stage process:
1. Understand and specify your problem, inputs and outputs
2. Take the simplest form of network that might be able to solve your problem
3. Try to find appropriate connection weights, i.e. training, and other parameters
4. Evaluate training and test errors and measure under-/over-fitting
5. If the network doesn’t perform well enough, go back to stage 3
6. If the network still doesn’t perform well enough, go back to stage 2 and try
harder.
7. If the network still doesn’t perform well enough, go back to stage 1 and try
harder.
8. Problem solved – move on to next problem.
25
ANN TRAINING
26
ANN TRAINING
27
ERROR BACKPROPAGATION
ALGORITHM
An algorithm for
supervised learning of
artificial neural
networks using gradient
descent.
A general optimization
method.
WWW.TOWARDSDATASCIENCE.COM
28
ERROR BACKPROPAGATION
ALGORITHM
1. Forward pass/propagation: set the inputs, compute the outputs
2. Evaluate the error signal for each layer
3. Use the error signal to compute the error gradients
4. Update the layer parameters using the error gradients with an optimization algorithm such as GD
(The last 3 boxes show the back-propagation stages.)
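The four stages written out for a tiny two-layer network with sigmoid activations and squared error; the sizes, seed, and learning rate are illustrative, not values from the course notebooks:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)   # input -> hidden
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)   # hidden -> output
eta = 0.5

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.0, 1.0])
y = np.array([1.0])

# 1. Forward pass: set the inputs, compute the outputs
h = sigmoid(x @ W1 + b1)
o = sigmoid(h @ W2 + b2)

# 2. Error signal (delta term) for each layer, for squared error 0.5*(o - y)^2
delta_o = (o - y) * o * (1 - o)
delta_h = (delta_o @ W2.T) * h * (1 - h)

# 3. Error gradients with respect to the parameters
dW2, db2 = np.outer(h, delta_o), delta_o
dW1, db1 = np.outer(x, delta_h), delta_h

# 4. Update the parameters with gradient descent
W2 -= eta * dW2; b2 -= eta * db2
W1 -= eta * dW1; b1 -= eta * db1
```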
29
ERROR BACKPROPAGATION
ALGORITHM
Your Comments:
30
ERROR BACKPROPAGATION
ALGORITHM
Your Comments:
31
ERROR BACKPROPAGATION
ALGORITHM
Your Comments:
32
ADAM TRAINING ALGORITHM
• Introduced in 2015
• Adaptive Moment Estimation Algorithm
• Advantages:
• Straightforward to implement.
• Computationally efficient.
• Little memory requirements.
• Invariant to diagonal rescaling of the gradients.
• Well suited for problems that are large in terms of data and/or parameters.
• Appropriate for non-stationary objectives.
• Appropriate for problems with very noisy and/or sparse gradients.
• Hyper-parameters have intuitive interpretation and typically require little tuning.
33
ADAM TRAINING ALGORITHM
34
ADAM TRAINING ALGORITHM
ADAM combines two ideas:
• Adaptive Gradient Algorithm (AdaGrad)
• Root Mean Square Propagation (RMSP)
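A sketch of the resulting Adam update rule; the default hyper-parameter values below are the ones proposed in the original 2015 paper:

```python
import numpy as np

def adam_step(w, grad, m, v, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of parameters w for gradient grad at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # 1st moment: moving average of gradients (momentum-like)
    v = beta2 * v + (1 - beta2) * grad ** 2   # 2nd moment: moving average of squared gradients (RMSProp/AdaGrad-like)
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero-initialized moments
    v_hat = v / (1 - beta2 ** t)
    w = w - eta * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```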
35
ADAM TRAINING ALGORITHM
• Adam is effective and popular in the field of deep learning because it achieves good results fast.
• Comparison of Adam to
Other Optimization
Algorithms Training a
Multilayer Perceptron
WWW.TOWARDSDATASCIENCE.COM
36
PRACTICAL CONSIDERATIONS
37
PRACTICAL CONSIDERATIONS
38
PRACTICAL CONSIDERATIONS
8. Use the test set every now and then, or monitor the
training error curve
39
LEARNING RATE AND
MOMENTUM
40
LEARNING WITH MOMENTUM
41
ANN PARAMETERS
42
ANN PARAMETERS
These 4 factors together increase/decrease the ANN's overfitting probability:
• Number of training epochs
• Number of hidden neurons
• Training algorithm
• α
(Constraints vs. degrees of freedom)
43
ANN Parameters
• A rule in system science:
• To keep your system general (no overfitting), the number of constraints should be at least k = 4 times larger than the system's degrees of freedom.
• Question: What is the degree of freedom in an
MLP? What is the number of constraints?
44
Overfitting: System Eng Viewpoint
45
TRAINING ALGORITHM
The best? Sadly, no rule of thumb! Common choices:
• ADAM
• Limited-Memory BFGS (Broyden–Fletcher–Goldfarb–Shanno algorithm)
• Conjugate Gradient and Scaled Conjugate Gradient
46
HOW MANY HIDDEN NEURONS?
It depends in a complex way on many factors, including:
47
DIFFERENT LEARNING RATES FOR
DIFFERENT LAYERS?
It is often quicker to just use the same rate η for all the weights and thresholds, rather than spending time trying to work out appropriate differences. A very powerful approach is to use evolutionary strategies to determine good learning rates.
48
PREVENTING UNDER-FITTING AND
OVER-FITTING
To prevent under-fitting we
need to make sure that:
1. The network has enough hidden units to represent the required mappings.
2. We train the network for long enough so that the sum-squared-error cost function is sufficiently minimized.
49
PREVENTING UNDER-FITTING AND
OVER-FITTING
50
INTRODUCTION TO
Deep Learning
51
DEEP LEARNING
WWW.WIKIPEDIA.COM
52
A POINT TO THINK ABOUT
53
IN THE REAL WORLD: DRIVERLESS
CARS
55
SO, WHAT IS DEEP LEARNING?
56
SO, WHAT IS DEEP LEARNING?
HTTPS://THENEWSTACK.IO/DEMYSTIFYING-DEEP-LEARNING-AND-ARTIFICIAL-INTELLIGENCE/
57
COMMENTS
• The training dataset configures and sets the parameters of the hidden feature-extractor layers.
• A "from details to blocks to objects" scenario is followed across the layers of the deep convolutional network, from input towards output.
58
FILTERING AND FILTER RESPONSES
• We have developed a test in Octave to make convolution, image features and filter responses clearer.
• You can ask it to filter a given image with two edge-detection filters, horizontal and vertical, and show you the results and the filter responses, i.e. the power of the filtered images.
• It uses convolution to do the filtering:

$(f * h)(x, y) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(u, v)\, h(x - u,\, y - v)\, du\, dv$
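A comparable experiment can be sketched in Python (the Octave script itself is not reproduced here); the Prewitt-style filters and the "power" measure are assumptions matching the description above:

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(64, 64)                    # stand-in for a real grey-level image

h_edge = np.array([[-1, -1, -1],
                   [ 0,  0,  0],
                   [ 1,  1,  1]], dtype=float)    # horizontal edge detector (Prewitt-like)
v_edge = h_edge.T                                 # vertical edge detector

resp_h = convolve2d(image, h_edge, mode="same")   # filtering done by convolution
resp_v = convolve2d(image, v_edge, mode="same")

# "filter response" taken as the power of each filtered image
print(np.sum(resp_h ** 2), np.sum(resp_v ** 2))
```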
60
OVERFITTING
Training error is very low, while testing error is rather high → Overfitting (overtraining) has happened → The ANN has lost its generalization ability
61
OVERFITTING
62
OVERFITTING
How to avoid overfitting? When the model capacity increases, the model
gradually changes from underfitting to overfitting.
63
REGULARIZATION
• In deep learning, we wish to minimize the following loss/cost/error function:

$\mathcal{L}(w_1, b_1, \ldots, w_n, b_n) = \frac{1}{m} \sum_{i=1}^{m} E(\hat{y}^{(i)}, y^{(i)})$

• $\mathcal{L}$ can be any loss, $E$ is any difference function.
• For L2 regularization, we add a component that will penalize large weights:

$\mathcal{L}(w_1, b_1, \ldots, w_n, b_n) = \frac{1}{m} \sum_{i=1}^{m} E(\hat{y}^{(i)}, y^{(i)}) + \frac{\lambda}{2m} \sum_{i=1}^{n} \lVert w_i \rVert_F^2$

• λ is the regularization coefficient/parameter.
• Usually, L1 uses the absolute-value norm, while L2 uses the squared norm.
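In Keras, this penalty is usually attached per layer; a minimal sketch, where the layer sizes and λ value are illustrative (note that Keras' `l2(lam)` adds `lam · Σw²` to the loss, without the 1/2m factor of the slide's formula):

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

lam = 0.01                                   # regularization coefficient λ (illustrative value)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,)),
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(lam)),    # penalizes large weights in this layer
    layers.Dense(10, activation="softmax",
                 kernel_regularizer=regularizers.l2(lam)),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```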
64
REGULARIZATION
• The higher the λ, the higher the penalty rate for larger weights.
• Large weights will be driven down in order to minimize the cost function.
• Output of each neuron before applying the activation function: $z = W^T x + b$
• By reducing the values in the weight matrix, z will also be reduced, which in turn decreases the effect of the activation function. Therefore, a less complex function will be fit to the data, effectively reducing overfitting.
HTTPS://TOWARDSDATASCIENCE.COM
65
DROPOUT
HTTPS://TOWARDSDATASCIENCE.COM
66
Dropout
https://towardsdatascience.com
67
Learning
• $\Delta w = \Gamma + A + B = -\eta \frac{\partial E}{\partial w} + \alpha\, \Delta w_{n-1} + \beta r$
• $\Gamma$ = gradient descent term
• $A$ = momentum term
• $B$ = random term, $r$ = random Gaussian
• $\eta, \alpha, \beta$ = constants
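The same update written as a small Python helper; the constant values are placeholders, not course-specified:

```python
import numpy as np

def weight_update(grad, dw_prev, eta=0.1, alpha=0.9, beta=0.01, rng=None):
    """Δw = -η·∂E/∂w (gradient descent) + α·Δw_{n-1} (momentum) + β·r (random Gaussian)."""
    rng = np.random.default_rng() if rng is None else rng
    r = rng.normal(size=np.shape(grad))          # r: random Gaussian term
    return -eta * np.asarray(grad) + alpha * np.asarray(dw_prev) + beta * r
```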
68
Batch Normalization
• Batch normalization is a technique for training deep neural networks that normalizes the contributions to a layer for every mini-batch. This has the effect of stabilizing the learning process and drastically decreasing the number of training epochs required to train deep neural networks.
• Any modification of weights changes many things inside your network.
• Batch normalization provides an elegant way of reparametrizing almost any deep network. The reparametrization significantly reduces the problem of coordinating updates across many layers.
69
Batch Normalization
• For any mini-batch of samples during training, for any input or hidden layer $H_m$, normalize the output over the mini-batch using

$A'(H_m) = \frac{A(H_m) - \mu[A(H_m)]}{\sigma[A(H_m)]}$

before sending it to the next layer $H_{m+1}$.
• A is the activation function.
• So, considering a complete mini-batch, we normalize the output of a layer before forward-passing it to the next layer.
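The same normalization in NumPy for one mini-batch of activations (a small eps is added for numerical safety; the learned scale and shift parameters of full batch norm are left out, as on the slide):

```python
import numpy as np

def batch_normalize(a, eps=1e-5):
    """Normalize activations a of shape (batch_size, n_units) over the mini-batch."""
    mu = a.mean(axis=0)                 # μ[A(H_m)] per unit
    sigma = a.std(axis=0)               # σ[A(H_m)] per unit
    return (a - mu) / (sigma + eps)     # A'(H_m), passed on to layer H_{m+1}

batch = np.random.randn(32, 8) * 3.0 + 5.0
normalized = batch_normalize(batch)
print(normalized.mean(axis=0).round(3), normalized.std(axis=0).round(3))  # ≈ 0 and ≈ 1
```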
70
YOUR COMPUTER, NO GPU
71
Fine Tuning and Transfer Learning
72
FINE TUNING AND TRANSFER
LEARNING
WWW.MATHWORKS.COM
73
Fine Tuning and Transfer Learning
• In transfer learning for deep neural networks, it is common to reuse the earlier layers of the network rather than its last layers. Transfer learning involves taking a pre-trained model (often trained on a large dataset for a related task) and fine-tuning it for a specific task or dataset of interest.
• The reason for reusing the earlier layers is that the early layers of a deep neural network capture general features and patterns that are often transferable across different tasks and domains. These layers learn low-level features like edges, textures, and basic shapes, which tend to be similar in many different types of data.
• In deep-learning fine-tuning, the extent to which the parameters of the early layers versus the last layers are changed can vary depending on several factors, including the specific problem, the architecture of the neural network, and the amount of available training data. There is no fixed rule, and it depends on how you configure the fine-tuning process. However, in many transfer learning scenarios, the early layers of a pre-trained model tend to undergo less change compared to the later layers.
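A typical Keras sketch of this recipe: keep the early, general-purpose layers of a pre-trained model frozen and train a new task-specific head. The base model, input size, and head are illustrative choices, not from the course material:

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                          input_shape=(160, 160, 3), pooling="avg")
base.trainable = False                     # early, general-purpose layers are kept fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # new task-specific head
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
# Optionally unfreeze the last few base layers later and re-compile with a lower
# learning rate, so the later layers change more than the early ones.
```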
74
Deep Neural Network Models
• Convolutional Neural Networks
• Generative Adversarial Networks
• Recurrent Neural Networks
• Long Short-Term Memory
75
LONG SHORT-TERM MEMORY,
APPLICATIONS
• Robot control
• Protein homology detection
• Time series anomaly detection
• Time series prediction
• Sign language translation
• Business process management
76
LONG SHORT-TERM MEMORY,
CASE STUDIES
77
Applications: DL for CV
• Image Classification: Deep Convolutional Neural Networks (CNN)
• Object Detection and segmentation: Regional CNN (RCNN)
• Tracking: Discriminative tracker (DSST), Siamese trackers
78
DL for Image Classification: Cat
or Dog?
• Python program, using Tensorflow and Keras
• Open Kaggle cats & dogs image data set (dataset has been cleaned before)
• Configure the training and validation sets
• Build the deep model
• Show the images
(see the sketch below)

Deep5.ipynb
79
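A hedged outline of what such a notebook might contain; the directory path, image size, and layer stack are assumptions rather than the actual Deep5.ipynb contents:

```python
import tensorflow as tf

# Kaggle cats & dogs images, already cleaned, arranged as data/train/<class>/... (assumed path)
train_ds, val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/train", image_size=(180, 180), batch_size=32,
    validation_split=0.2, subset="both", seed=42)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(180, 180, 3)),
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),   # cat vs. dog
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)
```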
CAT OR DOG EXAMPLE
DEEP5.IPYNB
80
CAT OR DOG EXAMPLE
DEEP5.IPYNB
81
CAT OR DOG EXAMPLE
DEEP5.IPYNB
82
AlexNet Structure
Schematic of the ZFNet architecture. This schematic is very similar to that for AlexNet.
Notice that AlexNet contains 7 hidden layers whereas ZFNet contains 8 hidden layers (these
figures count S layers as parts of the corresponding C layers). Also, note that ZFNet is
implemented using only a single GPU and its architecture is unsplit.
83
VGGNet Architecture
84
VGGNet Architecture
• Note that the convolution layers all have unit stride, and that their input fields are limited to a maximum size of 3 × 3; the subsampling layers all have 2 × 2 input fields and 2 × 2 strides.
85
SqueezeNet
SqueezeNet is another popular deep-learning model designed
specifically for image classification tasks with a focus on reducing
the model’s size and computational requirements. It was developed
by researchers from UC Berkeley and DeepScale in 2016.
86
SqueezeNet
Fire Modules (see the sketch below):
• Squeeze Layer (1x1 Convolution)
• Expand Layer (1x1 and 3x3 Convolutions)
Max Pooling Layers
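A sketch of one fire module in Keras; the filter counts and the stem before it follow common SqueezeNet-style choices but are not the paper's exact values:

```python
import tensorflow as tf
from tensorflow.keras import layers

def fire_module(x, squeeze_filters=16, expand_filters=64):
    # Squeeze layer: 1x1 convolutions reduce the number of channels
    s = layers.Conv2D(squeeze_filters, 1, activation="relu")(x)
    # Expand layer: parallel 1x1 and 3x3 convolutions, concatenated along channels
    e1 = layers.Conv2D(expand_filters, 1, activation="relu")(s)
    e3 = layers.Conv2D(expand_filters, 3, padding="same", activation="relu")(s)
    return layers.Concatenate()([e1, e3])

inputs = tf.keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(96, 7, strides=2, activation="relu")(inputs)
x = layers.MaxPooling2D(3, strides=2)(x)
x = fire_module(x)
model = tf.keras.Model(inputs, x)
```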
87
SqueezeNet
88
INCEPTION V.3
89
Architecture
Improvement Features
• Factorized Convolutions
• Auxiliary Classifiers
• Label Smoothing
90
Inception v.3 structure:
• Initial Convolution and Pooling Layers (Conv2D layers)
• Factorized Convolutions (1x1 and 3x3 convolutions)
• Inception Modules
• Auxiliary Classifier
• Reduction Modules
91
Inception v.3
92
Inception v.3
93
IMPORTANT POINTS
• Error Backpropagation, how it works
• Momentum
• Regularization and Dropout
• Training Algorithms
• Underfitting and Overfitting
• Deep Convolutional Neural Networks for image classification
94
References
1. Main:
• E. R. Davies and M. Turk (eds.), Advanced Methods and Deep Learning in Computer Vision, 1st ed., Elsevier, 2021 (Ch. 1, 2, and 9)
95
References
2. Auxiliary:
• Yann LeCun, Yoshua Bengio & Geoffrey Hinton, "Deep learning", Nature, vol. 521, pp. 436–444, 2015
• Jojo Moolayil, Learn Keras for Deep Neural Networks, 2019
• https://keras.io/examples/vision/
• Samira Pouyanfar et al., "A Survey on Deep Learning: Algorithms, Techniques, and Applications", ACM Comput. Surv. 51(5), September 2018, https://doi.org/10.1145/3234150
• Simon S. Haykin, Neural Networks and Learning Machines, 3rd ed.
• Ian Goodfellow et al., Deep Learning, 2016
96
THAT’S IT …
Thank You!
Any Questions?
97