
Computer Vision and

Pattern Recognition
CS 4243
S1-Y2024/25

1
Lesson 6 - Part 1
Computer Vision and Deep Learning

100 billion CPUs, 500 trillion connections, OMG!!!

2
ARTIFICIAL NEURAL NETWORKS,
HISTORY

HTTPS://MEDIUM.COM/ANALYTICS-VIDHYA/BRIEF-HISTORY-OF-NEURAL-NETWORKS-44C2BF72EEC

3
LET'S DIVE INTO IT

Use Jupyter Notebook / Anaconda / Python

You need the TensorFlow package too

Program: ann2.ipynb

This is a function-estimation example

ANN2.IPYNB
4
ANN EXAMPLE: A SIMPLE ADDER

o = i1·w1 + i2·w2

• w1 and w2 are selected randomly, e.g. 0.3 and −0.7.
• Then we will try to find the best w1 and w2 to have the output equal (or close enough) to the O values of the training data samples.
• Iteratively.
• Learning by samples/examples.

Training data:
i1  i2  O
1   2   3
1   3   4
1   1   2
3   3   6
7   1   8
8   1   9

5
ANN EXAMPLE: A SIMPLE ADDER

o = i1·w1 + i2·w2

Training our adder:
E,S   i1,i2   w1    w2    Or    error   Δw1   Δw2
1,1   1,2     0.3  −0.7  −1.1   4.1     +     +
1,2   1,3     0.4  −0.6  −1.4   5.4     +     +
1,3   1,1     0.5  −0.5   0     2       +     +
1,4   3,3     0.6  −0.4   0.6   5.4     +     +
…     …
N,M   1,1    ~1    ~1    ~2    ~0       0     0

Real targets (Ot):
i1  i2  Ot
1   2   3
1   3   4
1   1   2
3   3   6
7   1   8
8   1   9

6
HOW IT WORKS?
Delta Rule

Or = Σ_j i_j w_j
w_j ← w_j + Δw_j
Δw_j = η (Ot − Or) i_j
η = learning rate
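A minimal Python sketch of this delta-rule training applied to the adder data above; this is illustrative, not the code in ann2.ipynb, and the learning rate and number of epochs are assumed values:

```python
import numpy as np

# Training pairs from the adder table: inputs (i1, i2) and target outputs Ot
samples = np.array([[1, 2], [1, 3], [1, 1], [3, 3], [7, 1], [8, 1]], dtype=float)
targets = np.array([3, 4, 2, 6, 8, 9], dtype=float)

w = np.array([0.3, -0.7])   # randomly chosen initial weights, as on the slide
eta = 0.01                  # learning rate (an assumed value)

for epoch in range(500):
    for x, Ot in zip(samples, targets):
        Or = x @ w                      # Or = sum_j i_j * w_j
        w += eta * (Ot - Or) * x        # delta rule: dw_j = eta * (Ot - Or) * i_j

print(w)   # both weights approach ~1, so the network behaves as an adder
```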

7
ANN EXAMPLE: A SIMPLE
SUBTRACTOR

o = i1·w1 + i2·w2

• w1 and w2 are selected randomly, e.g. −0.4 and 0.6.
• Learning by examples and iterations once again.
• New training samples mean new functionality.

Training data:
i1  i2  O
4   1   3
5   1   4
3   2   1
3   3   0
7   0   7
8   2   6

8
ANN EXAMPLE: A SIMPLE
SUBTRACTOR
o = i1·w1 + i2·w2

Training our subtractor:
E,S   i1,i2   w1     w2    Or    error   Δw1   Δw2
1,1   4,1    −0.4    0.6  −1     4       +     +
1,2   5,1    −0.3    0.7  −0.8   4.8     +     +
1,3   3,2    −0.2    0.8   1     0       0     0
1,4   3,3    −0.2    0.8   1.8  −1.8     −     −
…     …
N,M   8,2    ~1     ~−1   ~6    ~0       0     0

Training targets:
i1  i2  O
4   1   3
5   1   4
3   2   1
3   3   0
7   0   7
8   2   6

9
HOW IT WORKS?

Inverse matrix scheme


In matrix format: O = I W
⇒ W = I⁻¹ O
We most likely need to compute the pseudo-inverse.
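A small numpy sketch of this inverse-matrix view, assuming the six adder training pairs from the earlier slide; since I is not square, the pseudo-inverse (or least squares) gives the solution:

```python
import numpy as np

# Stack the training inputs into I (one row per sample) and the targets into O
I = np.array([[1, 2], [1, 3], [1, 1], [3, 3], [7, 1], [8, 1]], dtype=float)
O = np.array([3, 4, 2, 6, 8, 9], dtype=float)

# O = I W  =>  W = pinv(I) O   (I is not square, so we need the pseudo-inverse)
W = np.linalg.pinv(I) @ O
print(W)          # ~[1, 1] for the adder data

# Equivalent least-squares formulation
W_ls, *_ = np.linalg.lstsq(I, O, rcond=None)
```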

10
PERCEPTRON
Two implementations of a single-neuron Perceptron, able to do any linearly separable 2-input, 1-output logical mapping.

AND example, with w1 = w2 = 1, th = 1.5:
i1  i2  O = i1 · i2 (AND)
0   0   0
0   1   0
1   0   0
1   1   1

11
PERCEPTRON
We may try a Perceptron network to materialize a multi-input / multi-output function.
An example is a 1-out-of-c maximum classifier.
In many cases, it is more accurate than having a single output, in particular when there are many classes.

out_j = sign( Σ_{i=1..n} in_i w_ij − θ_j )
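A small numpy sketch of this layer, with made-up weights and thresholds for illustration (not from the course notebooks); the 1-out-of-c decision picks the output with the largest activation:

```python
import numpy as np

def perceptron_outputs(x, W, theta):
    # out_j = sign(sum_i x_i w_ij - theta_j)
    return np.sign(x @ W - theta)

def predict_class(x, W, theta):
    # 1-out-of-c maximum classifier: pick the strongest output
    return int(np.argmax(x @ W - theta))

# Toy example: 2 inputs, 3 output neurons (assumed values)
W = np.array([[ 0.8, -0.3, 0.1],
              [-0.2,  0.9, 0.4]])
theta = np.array([0.5, 0.5, 0.5])
x = np.array([1.0, 0.0])
print(perceptron_outputs(x, W, theta), predict_class(x, W, theta))
```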

12
PERCEPTRON

NN_EXAM4_PERCEPTRON.IPYNB , PERCEPTRON.XLSX
13
PRACTICE: TRAIN IT AS AN ‘OR’

o = sign(x) = 1 if x ≥ 0, 0 if x < 0

What are w1, w2, and b?

i1  i2  O = i1 ∨ i2 (OR)
0   0   0
0   1   1
1   0   1
1   1   1

14
PRACTICE: TRAIN IT AS AN ‘XOR’

o = sign(x) = 1 if x ≥ 0, 0 if x < 0

What are w1, w2, and b?

i1  i2  O = i1 xor i2
0   0   0
0   1   1
1   0   1
1   1   0

15
PERCEPTRON/ SINGLE NEURON
DISADVANTAGES: XOR

o = sign(x) = 1 if x ≥ 0, 0 if x < 0

To deal with such problems, a Perceptron needs a multi-neuron hidden layer with a non-linear activation function.

i1  i2  O = i1 xor i2
0   0   0
0   1   1
1   0   1
1   1   0

16
MULTI LAYERED PERCEPTRON AND
XOR
• There is no way to train a single-layered perceptron to carry out non-linear tasks.
• We need a few things to guarantee non-linear behavior:
  • At least 1 hidden layer of neurons between the input and output layers
  • With at least 2 neurons
  • With a non-linear activation function
• Next we need a good training algorithm, e.g. the Error Back-Propagation algorithm.
• Then we can have a fantastic non-linear system … (see the sketch below).

[Figure: the XOR problem]
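A hedged Keras sketch of the smallest such network for XOR: one hidden layer with 2 neurons and a non-linear activation, trained by error back-propagation. Layer sizes, optimizer, and epoch count are illustrative choices, not prescribed by the slides:

```python
import numpy as np
import tensorflow as tf

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
y = np.array([[0], [1], [1], [0]], dtype=np.float32)

model = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation='tanh', input_shape=(2,)),  # hidden layer: >= 2 neurons, non-linear
    tf.keras.layers.Dense(1, activation='sigmoid')                  # output layer
])
model.compile(optimizer=tf.keras.optimizers.Adam(0.1), loss='mse')
model.fit(X, y, epochs=500, verbose=0)   # back-propagation does the training
print(model.predict(X).round())          # should approach [0, 1, 1, 0]
# Note: with only 2 hidden neurons, training can occasionally get stuck in a
# local minimum; re-running (re-initializing the weights) usually fixes it.
```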

17
MULTI LAYERED PERCEPTRON AND
XOR

A multi-layered perceptron: multi-line borders, or using partial lines to estimate functions.

i1  i2  O = i1 xor i2
0   0   0
0   1   1
1   0   1
1   1   0

18
MULTILAYERED PERCEPTRON

Input, hidden, and output layers

• In a classification problem, an MLP, after proper training, can classify the samples using a boundary made of line pieces/segments.
• The number of line pieces depends on the number of hidden neurons, training samples, training epochs, and the training algorithm.

19
UNDERFITTING / OVERFITTING

[Figure: underfitting / good fit / overfitting]

• Underfitting: both the training and testing errors are high.
• Overfitting: the training error is low but the testing error is high.

20
HOW TO EMPLOY AN ANN

Select your training and test data → Configure the structure of your ANN → Determine your training algorithm → Train your network → Test and evaluate your ANN

21
ACTIVATION FUNCTIONS
1. Sigmoid Function (Logistic)
2. Hyperbolic Tangent
3. Step Function
4. ReLU (Rectified Linear Unit):  f(x) = x if x ≥ 0, 0 if x < 0
5. Piecewise Linear Function
6. Sign Function:  f(x) = −1 if x < 0, 0 if x = 0, 1 if x > 0

1, 2: invertible, differentiable; 1, 4: popular; 4, 5: partially differentiable.
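A short numpy sketch of these activations, following the definitions above (the piecewise-linear one is omitted since its breakpoints are not specified on the slide):

```python
import numpy as np

def sigmoid(x):            # 1: logistic function, invertible and differentiable
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):               # 2: hyperbolic tangent, invertible and differentiable
    return np.tanh(x)

def step(x):               # 3: step function, 1 for x >= 0 else 0
    return np.where(x >= 0, 1.0, 0.0)

def relu(x):               # 4: rectified linear unit, x for x >= 0 else 0
    return np.maximum(x, 0.0)

def sign(x):               # 6: sign function, -1 / 0 / +1
    return np.sign(x)

x = np.linspace(-3, 3, 7)
print(relu(x), sigmoid(x))
```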

22
ACTIVATION FUNCTIONS

23
TRAINING
• Vanishing Gradient Problem
• Not Zero-Centered
• Error/Loss/Cost Function

[Diagram labels:]
• Optimization: complicated and heavy
• Training and testing data: non-overlapped, complete, reflexive
• Weights and other parameters of the net
• Training algorithms are almost the most challenging part of neural computing.

24
ANN, HOW TO BUILD
Formulating neural network solutions for particular problems is a
multi-stage process:
1. Understand and specify your problem, inputs and outputs
2. Take the simplest form of network that might be able to solve your problem
3. Try to find appropriate connection weights, i.e. training, and other parameters
4. Evaluate training and test errors and measure under-/over-fitting
5. If the network doesn’t perform well enough, go back to stage 3
6. If the network still doesn’t perform well enough, go back to stage 2 and try
harder.
7. If the network still doesn’t perform well enough, go back to stage 1 and try
harder.
8. Problem solved – move on to next problem.

25
ANN TRAINING

26
ANN TRAINING

27
ERROR BACKPROPAGATION
ALGORITHM

An algorithm for supervised learning of artificial neural networks using gradient descent (itself a general optimization method).

WWW.TOWARDSDATASCIENCE.COM

28
ERROR BACKPROPAGATION
ALGORITHM
1. Forward pass/propagation: set the inputs, compute the outputs.
2. Evaluate the error signal for each layer.
3. Use the error signal to compute the error gradients.
4. Update the layer parameters using the error gradients with an optimization algorithm such as GD.

The last 3 steps form the back-propagation stages. (A minimal sketch follows.)
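A compact numpy sketch of those four stages for a one-hidden-layer network with sigmoid activations and squared error; variable names and the plain-GD update are my illustrative choices, not the course implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, t, W1, b1, W2, b2, eta=0.5):
    # 1. Forward pass: set the inputs, compute the outputs
    h = sigmoid(x @ W1 + b1)
    y = sigmoid(h @ W2 + b2)

    # 2. Evaluate the error signal for each layer (output first, then hidden)
    delta_out = (y - t) * y * (1 - y)             # derivative of squared error w.r.t. pre-activation
    delta_hid = (delta_out @ W2.T) * h * (1 - h)  # error signal propagated back to the hidden layer

    # 3. Use the error signals to compute the error gradients
    gW2, gb2 = np.outer(h, delta_out), delta_out
    gW1, gb1 = np.outer(x, delta_hid), delta_hid

    # 4. Update the layer parameters with an optimizer (plain gradient descent here)
    W2 -= eta * gW2; b2 -= eta * gb2
    W1 -= eta * gW1; b1 -= eta * gb1
    return W1, b1, W2, b2
```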

29
ERROR BACKPROPAGATION
ALGORITHM

Your Comments:

30
ERROR BACKPROPAGATION
ALGORITHM

Your Comments:

31
ERROR BACKPROPAGATION
ALGORITHM

Your Comments:

32
ADAM TRAINING ALGORITHM

• Introduced in 2015
• Adaptive Moment Estimation Algorithm
• Advantages:
  • Straightforward to implement.
  • Computationally efficient.
  • Little memory requirements.
  • Invariant to diagonal rescaling of the gradients.
  • Well suited for problems that are large in terms of data and/or parameters.
  • Appropriate for non-stationary objectives.
  • Appropriate for problems with very noisy and/or sparse gradients.
  • Hyper-parameters have intuitive interpretation and typically require little tuning.

D. KINGMA, J. BA, “ADAM: A METHOD FOR STOCHASTIC OPTIMIZATION”, ICLR, 2015.

33
ADAM TRAINING ALGORITHM

• SGD maintains a single learning rate (termed α) for all weight updates, and the learning rate does not change during training.
• In ADAM, a learning rate is maintained for each network weight (parameter) and separately adapted as learning unfolds.
• ADAM computes individual adaptive learning rates for different parameters from estimates of the first and second moments of the gradients.
• ADAM combines ideas from the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSP). (See the sketch below.)
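A hedged numpy sketch of one Adam update for a parameter vector, following the update rule of the cited paper with its default hyper-parameters; this is illustrative, not the TensorFlow implementation:

```python
import numpy as np

def adam_step(w, grad, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. t is the time step, starting at 1."""
    # First and second moment estimates of the gradient (kept per parameter)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction for the zero-initialized moment estimates
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Each parameter gets its own effective step size: alpha / (sqrt(v_hat) + eps)
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```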

34
ADAM TRAINING ALGORITHM
[Diagram: ADAM combines the Adaptive Gradient Algorithm (AdaGrad) and Root Mean Square Propagation (RMSP)]

35
ADAM TRAINING ALGORITHM
• Adam is effective and popular in the field of deep learning because it achieves good results fast.
• Comparison of Adam to other optimization algorithms training a multilayer perceptron (figure taken from “Adam: A Method for Stochastic Optimization”, 2015).

WWW.TOWARDSDATASCIENCE.COM

36
PRACTICAL CONSIDERATIONS

37
PRACTICAL CONSIDERATIONS

1. Sometimes, normalization, etc.

2. Randomly

3. Start with a larger η, then make it smaller in later epochs.

4. Online vs. batch training, and the batch size.

5. Recently, ReLU is more popular; formerly, Sigmoid was.

38
PRACTICAL CONSIDERATIONS

6. Try a momentum term; also try to avoid 0 and 1 as your outputs.

7. Learning rate and momentum.

8. Use the test set every now and then, or monitor the training error curve.

39
LEARNING RATE AND
MOMENTUM

40
LEARNING WITH MOMENTUM

• We simply add a momentum term, which is the weight change of the previous step times a momentum parameter α.
• If α is zero, then we have the standard online training algorithm used before.
• As we increase α towards 1, each step includes increasing contributions from previous training patterns.
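Written out (this is the standard form; a later slide adds a random term to this same rule):

```latex
\Delta w_t \;=\; -\,\eta\,\frac{\partial E}{\partial w} \;+\; \alpha\,\Delta w_{t-1}
```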

41
ANN PARAMETERS

42
ANN PARAMETERS

These 4 factors together increase/decrease the ANN's overfitting probability.

[Diagram labels: Number of training epochs · Number of hidden neurons · Training algorithm · Constraints · α · Degree of freedom]

43
ANN Parameters
• A rule in system science:
• To keep your system general (no overfitting), the
number of constraints should be at least k=4 times
bigger than the system’s degree of freedom.
• Question: What is the degree of freedom in an
MLP? What is the number of constraints?

44
Overfitting: System Eng Viewpoint

• There are 2 factors which determine the system's generality: the Degree of Freedom (Fo) and the Number of Constraints (#C).
• To avoid loss of generality in a system, #C must be k times more than Fo, k = 4, 5, …, 10:
  #C ≥ k·Fo
• In a neural network, #C is the number of training samples, and Fo is the number of amendable parameters (mostly weights and biases). (A small check of this rule is sketched below.)
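A small Python sketch of that check: count an MLP's weights and biases as Fo, take the number of training samples as #C, and test #C ≥ k·Fo. The layer sizes and sample count below are assumed example values:

```python
def degrees_of_freedom(layer_sizes):
    """Weights + biases of a fully connected MLP, e.g. [2, 4, 1]."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

layers = [2, 4, 1]               # example MLP: 2 inputs, 4 hidden neurons, 1 output
Fo = degrees_of_freedom(layers)  # 2*4+4 + 4*1+1 = 17 amendable parameters
num_samples = 100                # #C = number of training samples (assumed)
k = 4
print(Fo, num_samples >= k * Fo) # generality rule of thumb: #C >= k * Fo
```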

45
TRAINING ALGORITHM

• Evolutionary Learning
• Error Back-Propagation
• ADAM
• Limited-Memory BFGS
• Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm
• Conjugate Gradient and Scaled Conjugate Gradient algorithms

The best? Sadly, no rule of thumb!

46
HOW MANY HIDDEN NEURONS?
It depends in a complex way on many factors, including:

• The numbers of input and output units
• The number of training patterns
• The amount of noise in the training data
• The complexity of the function or classification to be learned
• The type of hidden unit activation function
• The training algorithm

47
DIFFERENT LEARNING RATES FOR
DIFFERENT LAYERS?

It is often quicker to just use the same rate η for all the weights and thresholds, rather than spending time trying to work out appropriate differences. A very powerful approach is to use evolutionary strategies to determine good learning rates.

48
PREVENTING UNDER-FITTING AND
OVER-FITTING
To prevent under-fitting we need to make sure that:
1. The network has enough hidden units to represent the required mappings.
2. We train the network for long enough so that the sum-squared-error cost function is sufficiently minimized.

49
PREVENTING UNDER-FITTING AND
OVER-FITTING

To prevent over-fitting we can:


1. Stop the training early – before it has had time to learn the training
data too well.
2. Restrict the number of adjustable parameters the network has, e.g.
by reducing the number of hidden units, or by forcing connections
to share the same weight values.
3. Add some form of regularization term to the error function to
encourage smoother network mappings.
4. Add noise to the training patterns to smear out the data points.

50
INTRODUCTION TO
Deep Learning
51
DEEP LEARNING

• Deeper is better than fatter !!!


• Deep learning is a class of machine
learning algorithms that uses
multiple layers to progressively
extract higher-level features from
the raw input.
• Most modern deep learning models are based on artificial neural networks, specifically Convolutional Neural Networks (CNNs).

WWW.WIKIPEDIA.COM

52
A POINT TO THINK ABOUT

53
IN THE REAL WORLD: DRIVERLESS
CARS

WWW.YOUTUBE.COM , MOBILE GEEKS


54
WHAT IS IT ALL ABOUT?
• We need an AI to classify objects/ entities/
samples for us.
• We know the class of each object.
• We extract the features of each object
• Feature: a number, quantity, or tag, which
represents (an aspect of) the object.
• Now, we need to train our AI-Classifier to show it how to classify.
• From now on, let's focus on convolutional neural networks (CNNs).

55
SO, WHAT IS DEEP LEARNING?

• We get closer to the biological


brain’s structure.
• We get closer to the mental
system’s process and
functionality.
• Therefore, we get closer to its
performance too.
• It is important, in particular, when
  • the data is semi-structured or unstructured, or
  • the problem at hand is hard to solve.

56
SO, WHAT IS DEEP LEARNING?

HTTPS://THENEWSTACK.IO/DEMYSTIFYING-DEEP-LEARNING-AND-ARTIFICIAL-INTELLIGENCE/

57
COMMENTS

• The training dataset configures and sets the parameters of the hidden feature-extractor layers.
• A “from details to blocks to objects” scenario is followed across the layers of a deep convolutional network, from input towards output.

58
FILTERING AND FILTER RESPONSES
• We have developed a test in Octave to make convolution, image features, and filter responses clearer.
• You can ask it to filter a given image with two (horizontal and vertical) edge-detection filters, and show you the results and the filter responses, i.e. the power of the filtered images.
• It uses convolution to do the filtering:

  b[m, n] = h[m, n] ∗ x[m, n] = Σ_{i=−∞..∞} Σ_{j=−∞..∞} h[i, j] · x[m − i, n − j]

  Power(a) = (1 / MN) Σ_{i=1..M} Σ_{j=1..N} a²(i, j)

IMAGE4.IPYNB , CNN_FILTERS1.IPYNB , CNN_FILTERS2.IPYNB
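A hedged Python sketch of the same idea (not the actual Octave/notebook code, and assuming scipy is available): convolve an image with horizontal and vertical edge filters and compare the power of the two filter responses.

```python
import numpy as np
from scipy.signal import convolve2d

# Horizontal / vertical edge-detection kernels (Sobel-like, chosen for illustration)
h_horiz = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
h_vert = h_horiz.T

def power(a):
    # Power(a) = (1 / MN) * sum of a^2 over all pixels
    return np.mean(a.astype(float) ** 2)

img = np.zeros((64, 64)); img[:, 32:] = 1.0      # toy image with one vertical edge

resp_h = convolve2d(img, h_horiz, mode='same')   # filtering done by convolution
resp_v = convolve2d(img, h_vert, mode='same')
print(power(resp_h), power(resp_v))  # the vertical-edge filter responds much more strongly
```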


59
TRAINING

Good for deep


learning

60
OVERFITTING
• Training error is very low, while testing error is rather high.
• Overfitting or overtraining has happened.
• The ANN has lost its generalization ability.

61
OVERFITTING

Top: Function estimation without overfitting; reds are training samples and blues are testing results; the underlying sinusoidal function is revealed clearly. 4 hidden neurons, 200 training epochs.
Bottom: Overfitting; the fit passes through all training samples but shows no sign of the sinusoidal shape. 100 hidden neurons, 10000 training epochs.

62
OVERFITTING
How to avoid overfitting? When the model capacity increases, the model
gradually changes from underfitting to overfitting.

63
REGULARIZATION
• In deep learning, we wish to minimize the following loss/cost/error function:

  L(w1, b1, …, wn, bn) = (1/m) Σ_{i=1..m} E(ŷ^(i), y^(i))

• L can be any loss, E is any difference function.
• For L2 regularization, we add a component that will penalize large weights:

  L(w1, b1, …, wn, bn) = (1/m) Σ_{i=1..m} E(ŷ^(i), y^(i)) + (λ/2m) Σ_{i=1..n} ‖wi‖²_F

• λ is the regularization coefficient/parameter.
• Usually, L1 is the absolute-value norm, while L2 is the squared norm.
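A hedged Keras sketch of adding an L2 penalty to layer weights; note that Keras adds λ·Σw² directly to the loss (it does not divide by 2m), so λ here absorbs that scaling. The layer sizes and λ value are assumed:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

lam = 0.01   # regularization coefficient lambda (an assumed value)

model = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(20,),
                 kernel_regularizer=regularizers.l2(lam)),   # penalize large weights
    layers.Dense(64, activation='relu',
                 kernel_regularizer=regularizers.l2(lam)),
    layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy')
# The penalty lam * sum(w^2) is added to the loss for each regularized layer,
# driving large weights down and giving a smoother fitted function.
```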

64
REGULARIZATION

• The higher the λ, the higher the penalty rate for larger weights.
• Large weights will be driven down in order to minimize the cost function.
• Output of each neuron before applying the activation function: z = Wᵀx + b
• By reducing the values in the weight matrix, z will also be reduced, which in turn decreases the effect of the activation function. Therefore, a less complex function will be fit to the data, effectively reducing overfitting.

HTTPS://TOWARDSDATASCIENCE.COM

65
DROPOUT

Consider a rate θ, 0 < θ < 100.

Randomly select and eliminate θ% of your ANN's nodes after each training epoch and evaluation.

Does it really work? Yes, it does.

HTTPS://TOWARDSDATASCIENCE.COM

66
Dropout

• Using Dropout, the neural network cannot rely on any particular input node.
• So, the neural network will be reluctant to give high weights to certain features, because they might disappear.
• Consequently, the weights are spread across all features, making them smaller.
• This effectively shrinks the model and regularizes it.
• We usually apply dropout on hidden layers.

https://towardsdatascience.com
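A minimal Keras sketch of dropout applied to hidden layers only; the rate of 0.2 (20% of units dropped) and the layer sizes are assumed values. Keras drops units per training step and disables dropout automatically at test time:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Dense(128, activation='relu', input_shape=(20,)),
    layers.Dropout(0.2),                  # randomly zero 20% of hidden activations during training
    layers.Dense(64, activation='relu'),
    layers.Dropout(0.2),                  # dropout on hidden layers only, not on the output
    layers.Dense(1, activation='sigmoid')
])
```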
67
Learning
• Δw = Γ + A + B = −η ∂E/∂w + α Δw_{n−1} + β r
• Γ = gradient descent term
• A = momentum term
• B = random term, r = random Gaussian
• η, α, β = coefficients

68
Batch Normalization
• Batch normalization is a technique for training deep neural networks that normalizes the contributions to a layer for every mini-batch. This has the effect of stabilizing the learning process and drastically decreasing the number of training epochs required to train deep neural networks.
• Any modification of the weights changes many things inside your network.
• Batch normalization provides an elegant way of reparametrizing almost any deep network. The reparametrization significantly reduces the problem of coordinating updates across many layers.

69
Batch Normalization
• For any mini-batch of samples during training, for any input or hidden layer H_m, normalize the output over the mini-batch using

  A(H_m) ← ( A(H_m) − μ[A(H_m)] ) / σ[A(H_m)]

  before sending it to the next layer H_{m+1}.
• A is the activation function.
• So, considering a complete mini-batch, we normalize the output of a layer before forward-passing it to the next layer.
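A small numpy sketch of exactly that per-mini-batch normalization (the learnable scale/shift parameters γ and β used in practice are omitted, since the slide's formula does not include them):

```python
import numpy as np

def batch_normalize(activations, eps=1e-5):
    """Normalize a mini-batch of layer activations, shape (batch_size, n_units)."""
    mu = activations.mean(axis=0)       # mean over the mini-batch, per unit
    sigma = activations.std(axis=0)     # standard deviation over the mini-batch, per unit
    return (activations - mu) / (sigma + eps)   # normalized A(H_m), sent to the next layer

A_Hm = np.random.randn(32, 10) * 3.0 + 5.0   # toy mini-batch of activations
A_norm = batch_normalize(A_Hm)
print(A_norm.mean(axis=0).round(3), A_norm.std(axis=0).round(3))  # ~0 and ~1 per unit
```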

70
YOUR COMPUTER, NO GPU

71
Fine Tuning and Transfer Learning

• Fine-tuning, in general, means making small adjustments to a process to achieve the desired output or performance. Fine-tuning in deep learning involves using the weights of a previously trained deep network as the starting point for training another, similar deep network.
• Transfer learning: a neural network model is first trained on a problem similar to the problem that is being solved. One or more layers from the trained model are then used in a new model trained on the problem of interest.

72
FINE TUNING AND TRANSFER
LEARNING

WWW.MATHWORKS.COM

73
Fine Tuning and Transfer Learning
• In transfer learning for deep neural networks, it is common to transfer the earlier layers rather than the last layers of the network. Transfer learning involves taking a pre-trained model (often trained on a large dataset for a related task) and fine-tuning it for a specific task or dataset of interest.
• The reason for reusing the earlier layers is the idea that the early layers of a deep neural network capture general features and patterns that are often transferable across different tasks and domains. These layers learn low-level features like edges, textures, and basic shapes, which tend to be similar in many different types of data.
• In deep-learning fine-tuning, the extent to which the parameters of the early layers versus the last layers are changed can vary depending on several factors, including the specific problem, the architecture of the neural network, and the amount of available training data. There is no fixed rule, and it depends on how you configure the fine-tuning process. However, in many transfer-learning scenarios, the early layers of a pre-trained model tend to undergo less change than the later layers. (A sketch follows.)
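A hedged Keras sketch of the usual recipe: load a pre-trained backbone, freeze its earlier layers, and train a new head. The choice of backbone (MobileNetV2 here), the number of frozen layers, the head size, and the 5-class output are all illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Pre-trained backbone (ImageNet weights), without its original classification head
base = tf.keras.applications.MobileNetV2(weights='imagenet', include_top=False,
                                          input_shape=(224, 224, 3), pooling='avg')

# Freeze the earlier layers: they hold general features (edges, textures, shapes)
for layer in base.layers[:-20]:
    layer.trainable = False          # early layers change little (or not at all)

# New head for the task of interest
model = tf.keras.Sequential([
    base,
    layers.Dense(128, activation='relu'),
    layers.Dense(5, activation='softmax')   # e.g. 5 target classes (assumed)
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),   # small learning rate for fine-tuning
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])
```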

74
Deep Neural Network Models

• Convolutional Neural Networks
• Generative Adversarial Networks
• Recurrent Neural Networks / Long Short-Term Memory

75
LONG SHORT-TERM MEMORY,
APPLICATIONS
Robot control, protein homology detection, time series anomaly detection, time series prediction, sign language translation, business process management, speech recognition, action recognition, prediction in medical care, drug design, rhythm learning, semantic parsing, short-term traffic forecast, OCR, music composition, grammar learning, object segmentation, airport management.

76
LONG SHORT-TERM MEMORY,
CASE STUDIES

77
Applications: DL for CV

• Image Classification: Deep Convolutional Neural Networks (CNN)
• Object Detection: Regional CNN (RCNN)
• Tracking: Discriminative tracker (DSST); Siamese trackers and segmentation

78
DL for Image Classification: Cat
or Dog?
• Python program, using TensorFlow and Keras
• Open the Kaggle cats & dogs image data set (the dataset has been cleaned beforehand)
• Build the deep model
• Configure the training and validation sets
• Show the images
• Train the deep model
• Validate it

Deep5.ipynb 79
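The notebook itself is not reproduced here; the sketch below is a plausible minimal version of such a binary cat/dog classifier in Keras. The directory layout, image size, and layer sizes are assumptions, not taken from Deep5.ipynb:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Assumed directory layout: data/train/cats, data/train/dogs, data/val/...
train_ds = tf.keras.utils.image_dataset_from_directory(
    'data/train', image_size=(150, 150), batch_size=32, label_mode='binary')
val_ds = tf.keras.utils.image_dataset_from_directory(
    'data/val', image_size=(150, 150), batch_size=32, label_mode='binary')

model = tf.keras.Sequential([
    layers.Rescaling(1.0 / 255, input_shape=(150, 150, 3)),
    layers.Conv2D(32, 3, activation='relu'), layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation='relu'), layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation='relu'), layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(1, activation='sigmoid')          # cat vs. dog
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_ds, validation_data=val_ds, epochs=5)   # 5 epochs, as in the results slide
```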
CAT OR DOG EXAMPLE

DEEP5.IPYNB

80
CAT OR DOG EXAMPLE

DEEP5.IPYNB

81
CAT OR DOG EXAMPLE

Results after 5 training epochs:

DEEP5.IPYNB

82
AlexNet Structure

Schematic of the ZFNet architecture. This schematic is very similar to that for AlexNet.
Notice that AlexNet contains 7 hidden layers whereas ZFNet contains 8 hidden layers (these
figures count S layers as parts of the corresponding C layers). Also, note that ZFNet is
implemented using only a single GPU and its architecture is unsplit.
83
VGGNet Architecture

84
VGGNet Architecture

• Note that the convolution layers all have unit stride and that their input fields are limited to a maximum size of 3 × 3; the subsampling layers all have 2 × 2 input fields and 2 × 2 strides. (A sketch of one such block follows.)
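A tiny Keras sketch of one VGG-style block matching that configuration (3 × 3 convolutions with stride 1, then 2 × 2 max pooling with stride 2); the filter counts are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

def vgg_block(x, filters, n_convs=2):
    # 3x3 convolutions, unit stride, 'same' padding, followed by 2x2 / stride-2 max pooling
    for _ in range(n_convs):
        x = layers.Conv2D(filters, kernel_size=3, strides=1, padding='same',
                          activation='relu')(x)
    return layers.MaxPooling2D(pool_size=2, strides=2)(x)

inputs = tf.keras.Input(shape=(224, 224, 3))
x = vgg_block(inputs, 64)
x = vgg_block(x, 128)
model = tf.keras.Model(inputs, x)   # stack more blocks and a classifier head for a full VGGNet
```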

85
SqueezeNet
SqueezeNet is another popular deep-learning model designed specifically for image classification tasks, with a focus on reducing the model's size and computational requirements. It was developed by researchers from UC Berkeley and DeepScale in 2016.

Key features of SqueezeNet: compact model size; efficient parameter reduction.

SqueezeNet's architecture is designed to deliver a lightweight yet powerful model by focusing on efficient parameter use. This makes it an excellent choice for scenarios where memory, power, or processing capacity is a limiting factor.

86
SqueezeNet

Initial Convolution Layer


• 7x7 Convolution

Fire Modules
• Squeeze Layer (1x1 Convolution)
• Expand Layer (1x1 and 3x3 Convolutions)
Max Pooling Layers

Final Convolution Layer


• 1x1 Convolution

Global Average Pooling Layer

Softmax Output Layer
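A hedged Keras sketch of the fire module at the heart of this structure: a 1x1 "squeeze" convolution followed by parallel 1x1 and 3x3 "expand" convolutions whose outputs are concatenated. The filter counts match SqueezeNet's first fire module, but the snippet only sketches the opening of the network:

```python
import tensorflow as tf
from tensorflow.keras import layers

def fire_module(x, squeeze_filters, expand_filters):
    # Squeeze layer: 1x1 convolutions reduce the number of channels
    s = layers.Conv2D(squeeze_filters, 1, activation='relu', padding='same')(x)
    # Expand layer: parallel 1x1 and 3x3 convolutions, concatenated along channels
    e1 = layers.Conv2D(expand_filters, 1, activation='relu', padding='same')(s)
    e3 = layers.Conv2D(expand_filters, 3, activation='relu', padding='same')(s)
    return layers.Concatenate()([e1, e3])

inputs = tf.keras.Input(shape=(224, 224, 3))
x = layers.Conv2D(96, 7, strides=2, activation='relu', padding='same')(inputs)  # initial 7x7 conv
x = layers.MaxPooling2D(3, strides=2)(x)
x = fire_module(x, squeeze_filters=16, expand_filters=64)
model = tf.keras.Model(inputs, x)   # further fire modules, pooling, a 1x1 conv, GAP and softmax follow
```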

87
SqueezeNet

88
INCEPTION V.3

The Inception v3 model is a popular deep learning model designed for image classification tasks. It was developed by researchers at Google and is part of the Inception family of models, which focus on improving the efficiency and accuracy of convolutional neural networks (CNNs).

Inception v3 is widely used in real-world image classification problems and transfer learning. Due to its pretrained weights on large datasets like ImageNet, it can serve as a powerful feature extractor for other custom image datasets (a short sketch follows).

89
Inception v.3 architecture:
• Improved version of the earlier Inception models
• Multiple “Inception modules”: parallel convolutional layers with different filter sizes (e.g., 1x1, 3x3, 5x5)
• Factorized convolutions

Improvement features:
• Factorized Convolutions
• Auxiliary Classifiers
• Label Smoothing

90
Inception v.3 layer structure:
• Initial Convolution and Pooling Layers (Conv2D layers)
• Factorized Convolutions (1x1 and 3x3 convolutions)
• Inception Modules
• Auxiliary Classifier
• Reduction Modules
• Global Average Pooling Layer
• Dense Layer and Softmax Output

91
Inception v.3

92
Inception v.3

93
IMPORTANT POINTS

• Error Backpropagation, how it works
• Regularization and Dropout
• Momentum
• Training Algorithms
• Underfitting and Overfitting
• Convolutional Deep Neural Networks for image classification
• Batch Normalization
• Perceptron and MLP features
• Perceptron and MLP training

94
References
1. Main:
• E. R. Davies and M. Turk (eds.), Advanced Methods and Deep Learning in Computer Vision, 1st ed., Elsevier, 2021 (Ch. 1, 2, and 9)

95
References
2. Auxiliary:
• Yann LeCun, Yoshua Bengio & Geoffrey Hinton, “Deep learning”, Nature, vol. 521, pp. 436–444, 2015
• Jojo Moolayil, Learn Keras for Deep Neural Networks, 2019
• https://keras.io/examples/vision/
• Samira Pouyanfar et al., “A Survey on Deep Learning: Algorithms, Techniques, and Applications”, ACM Comput. Surv. 51, 5, September 2018, https://doi.org/10.1145/3234150
• Simon S. Haykin, Neural Networks and Learning Machines, 3rd ed.
• Ian Goodfellow et al., Deep Learning, 2016

96
THAT’S IT …

Thank You!

Any Questions?

97
