Week 8 - Machine Learning
22/03/2021
What is machine learning?
[Diagram: a typical pipeline: feature extraction → classification/interpretation → interaction]
What is machine learning?
[Diagram: classifying images of bananas vs. images of oranges. A heuristics-based approach asks hand-written questions of each image (elongated? spherical?) to decide between bananas, oranges and others. A learning-based approach trains a model on the labelled images and then applies it to a query image, e.g. returning "Orange!".]
Why machine learning
• Machine learning is often considered when it is very challenging for human experts to derive explicit instructions.
Why machine learning
• Examples:
  o Face recognition (note that this is different to face detection)
    - What features to use?
  o Email spam and malware filtering
    - A large list of rules
  o Disease diagnosis
    - Lung cancer, for example (widened mediastinum? Reduced vascularity? …)
Types of machine learning
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
  (the above are distinguished by whether or not the model is trained with human-labelled data)
• Reinforcement learning (keywords: agent, policy, reward, penalty, …)
• …
Supervised vs unsupervised learning
• Training data and labels are provided – supervised learning
• Only training data (but not labels) are provided – unsupervised learning
[Diagram: two scatter plots of the same data in a 2D feature space, Feature 1 (e.g. colour) vs Feature 2 (e.g. shape). In the unsupervised case all points are unlabelled; in the supervised case points belong to Class 1 or Class 2, with an unknown query point to be classified.]
Supervised learning
• K Nearest Neighbours (KNN)
• Artificial Neural Network (Multilayer Perceptron)
[Diagram: a 2D feature space (Feature 1 vs Feature 2) with labelled points from Class 1 and Class 2, and an unknown query point to be classified.]
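As a concrete illustration of KNN, here is a minimal sketch (assuming Euclidean distance, a made-up two-feature toy dataset and k = 3; all names and values are purely illustrative):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # Majority vote over their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

# Toy data: Feature 1 (e.g. colour) and Feature 2 (e.g. shape), two classes
X_train = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
y_train = np.array([1, 1, 2, 2])
print(knn_predict(X_train, y_train, np.array([0.85, 0.75])))  # -> 2
```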
Limitations of KNN
• Large datasets and/or high dimensionality limit efficiency
• “Curse of dimensionality”
[Diagram: an MLP. The input (e.g. a flattened image) is connected to a hidden layer by weights W_{i,j}, and the hidden layer to the output layer by weights W_{j,k}.]
MLP
• An example – forward path
[Worked example (figure): a network with six inputs x1–x6, hidden units a1, a2, a3 and one output. Each hidden activation is a weighted sum of the inputs; in the figure a1 = 24, and the output unit computes 24 × 0.2 + a2 × 2.0 + a3 × 0.5.]
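The same forward-path calculation can be written in a few lines of NumPy. This is a minimal sketch: the input and input-to-hidden weight values below are illustrative rather than the exact numbers from the slide figure; only the hidden-to-output weights 0.2, 2.0, 0.5 are taken from the slide.

```python
import numpy as np

x = np.array([4.0, 8.0, 0.0, 0.0, 0.0, 5.0])   # inputs x1..x6 (illustrative)
W_ij = np.array([[2.0, 0.1, 0.0],
                 [2.0, 0.2, 0.1],
                 [0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0],
                 [0.0, 0.1, 0.0]])              # input -> hidden weights (illustrative)
W_jk = np.array([0.2, 2.0, 0.5])                # hidden -> output weights (from the slide)

a = x @ W_ij        # hidden activations: each a_j is a weighted sum of the inputs
y = a @ W_jk        # output = a1*0.2 + a2*2.0 + a3*0.5
print(a, y)         # with these illustrative weights, a1 = 24.0
```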
MLP
• What happens next?
o Update the weights until the difference between the output and the label
(i.e. the ground truth) is zero (or small enough), or stops decreasing.
o A loss function is used to measure this difference. Training a model is
the process of minimising this loss function (gradient descent,
backpropagation) - a small sketch follows below.
[Diagram: during training, the images are the INPUT and their labels provide the ground truth.]
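To make "minimising a loss function" concrete, here is a minimal sketch of gradient descent on a mean-squared-error loss for a single linear unit - a toy stand-in for a full network; the data, initial weights and learning rate are all made up:

```python
import numpy as np

# Toy data: learn weights w so that X @ w matches the labels y (the ground truth)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 3.0]])
y = np.array([5.0, 4.0, 9.0])
w = np.zeros(2)          # initial weights
lr = 0.01                # learning rate

for step in range(2000):
    pred = X @ w                               # forward path
    loss = np.mean((pred - y) ** 2)            # loss: mean squared difference
    grad = 2 * X.T @ (pred - y) / len(y)       # gradient of the loss w.r.t. the weights
    w -= lr * grad                             # gradient-descent update

print(w, loss)           # w approaches [1, 2] and the loss approaches 0
```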
MLP
• What happens next?
o When the weights (the learned parameters) are fixed, the model is trained
o For each query image, use these parameters to perform the forward-path
calculation and determine the output (i.e. the prediction)
Adding nonlinearity
• Linear models may struggle to represent complex problems
• Adding nonlinearities: apply a nonlinear function (an activation function) after each weighted sum
[Diagram: the same MLP, now with a nonlinear activation function f applied to each weighted sum (weights W_{i,j} into the hidden layer, W_{j,k} into the output layer).]
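Why the nonlinearity matters can be seen in a couple of lines: without it, stacking linear layers collapses into a single linear map. A minimal sketch (the weights are made up; ReLU is chosen here as one common activation, the slide only calls it f):

```python
import numpy as np

W1 = np.array([[1.0, -2.0],
               [0.5,  1.0]])
W2 = np.array([2.0, 1.0])
x = np.array([3.0, 4.0])

# Without a nonlinearity, two "layers" collapse into one linear map:
print(x @ W1 @ W2, x @ (W1 @ W2))     # identical results

# With a nonlinearity f between the layers (here ReLU), they no longer collapse:
f = lambda z: np.maximum(0.0, z)
print(f(x @ W1) @ W2)                 # a genuinely nonlinear function of x
```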
Question
• What does it mean when the weights represented by red lines have zero values?
[Diagram: the MLP again, with the weights on the red lines, connecting input x5 to the hidden units a1, a2, a3, equal to zero.]
• Learning of weights
[Diagram: an input patch
    0    1    3
    2  200    5
    7   10    4
is convolved (⊗) with a filter whose weights (? ? ? …) are to be learned (CONV), followed by a nonlinear activation. An a × a input produces an output feature map of size b × b.]
• Is a ≠ b or a = b? Refer to the convolution lecture (hint: use of padding).
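A minimal sketch of the padding question, using the 3 × 3 patch above and SciPy's 2-D convolution (the filter weights are random stand-ins for the values a CNN would learn):

```python
import numpy as np
from scipy.signal import convolve2d

image  = np.array([[0,   1, 3],
                   [2, 200, 5],
                   [7,  10, 4]])
kernel = np.random.rand(3, 3)   # in a CNN these weights are learned

out_valid = convolve2d(image, kernel, mode='valid')   # no padding: output shrinks
out_same  = convolve2d(image, kernel, mode='same')    # zero padding: size preserved
print(out_valid.shape, out_same.shape)                # (1, 1) vs (3, 3): a = b only with padding
```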
Layers in a CNN
[Diagram: a stack of CONV + nonlinear activation blocks. The first block produces 4 feature maps of size a × a; another set of filters (6) then produces 6 feature maps, and so on.]
1. Convolutional layers
2. Nonlinear layers/Activation layers
Layers in a CNN
3. Pooling layers: progressively reduce the spatial size of the representation to
reduce the number of parameters and the amount of computation in the network
Input (4 × 4):
  1 3 2 7
  6 2 6 5
  1 5 7 2
  4 8 5 9
Output (2 × 2):
  6 7
  8 9
e.g. max pooling with a 2×2 window and a stride of 2
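The slide's example can be reproduced in NumPy; a minimal sketch of 2 × 2 max pooling with stride 2, implemented by reshaping into non-overlapping blocks:

```python
import numpy as np

x = np.array([[1, 3, 2, 7],
              [6, 2, 6, 5],
              [1, 5, 7, 2],
              [4, 8, 5, 9]])

# 2x2 max pooling, stride 2: split into non-overlapping 2x2 blocks, keep each block's max
pooled = x.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)    # [[6 7]
                 #  [8 9]]
```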
Layers in a CNN
3. Pooling layers
• fewer parameters in the subsequent layers
• does not affect the depth (number of feature maps)
[Diagram: after pooling, the six d × d feature maps are flattened into a vector of length 6 · d².]
Layers in a CNN
[Diagram: the flattened vector feeds a fully connected layer - a high parameter count.]
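To see why the fully connected layer after flattening is expensive, a minimal back-of-the-envelope sketch (the map size d and the number of output neurons are hypothetical, chosen only for illustration):

```python
# Flattening six d x d feature maps gives a vector of length 6 * d**2.
# A fully connected layer mapping it to n_out neurons then needs
# (6 * d**2) * n_out weights plus n_out biases.
d, n_out = 56, 1000              # hypothetical sizes
flat_len = 6 * d * d
n_params = flat_len * n_out + n_out
print(flat_len, n_params)        # 18816 inputs -> about 18.8 million parameters
```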
Feature extraction and classification in a CNN
[Diagram: the convolutional layers perform feature extraction and the fully connected layers perform classification; the weights are updated during training.]
Features (traditional approaches):
• Raw pixel values
• More commonly seen: handcrafted features
Features (CNN):
• Automatically extracted by filters
• End-to-end
e.g. the VGG architecture
Applications
Deep learning
Eye centre localisation
• Eye tracking
• Human-computer interaction
• Psychology studies and medical applications
Example applications: directed advertising, eye centre localisation, attention monitoring, 3D face reconstruction.
Shown in the video: a (2D+3D) face being automatically classified as a YOUNG MALE. Real-time eye tracking allows the user to issue gaze gestures to interact with the system. Personalised advertisements are displayed and are also manipulated by gaze gestures.
Eye morphology
• Gradient based voting
Zhang, W., Smith, M.L., Smith, L.N. and Farooq, A., 2016. Gender and gaze gesture recognition for
human-computer interaction. Computer Vision and Image Understanding, 149, pp.32-50.
Eye saccade analysis for dementia diagnosis
[Diagram: a CNN-based regression model. Input: image data; output: the predicted eye-centre coordinates (c_xl, c_yl, c_xr, c_yr, …).]
Zhang, W. and Smith, M., 2019, July. Eye Centre Localisation with Convolutional Neural Network Based Regression.
In 2019 IEEE 4th International Conference on Image, Vision and Computing (ICIVC) (pp. 88-94). IEEE.
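For intuition, a tiny hypothetical CNN regressor of this kind can be sketched in Keras: image in, four coordinates out. The layer sizes and the 64 × 64 greyscale input below are made up and this is not the architecture of the cited paper; it only shows that a regression output replaces the classification output.

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64, 64, 1)),           # hypothetical greyscale eye image
    layers.Conv2D(16, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(4)                              # regression output: (c_xl, c_yl, c_xr, c_yr)
])
model.compile(optimizer='adam', loss='mse')      # minimise the coordinate error
model.summary()
```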
Visualising features/filters
• Not all pixels make an equal contribution
Handcrafted features:
• Can have physical meanings (e.g. face landmarks)
• Can be explicitly modelled
• Often do not require a large training set
• Intuitive to visualise and analyse
• Dimensionality is often lower compared to automatically extracted features

Automatically extracted (learned) features:
• Data driven and learning based, therefore tailored to a specific problem
• Do not require human experts to extract a given set of carefully chosen characteristics
• Generate multiple levels of representation (e.g. high-level and low-level) at the same time
• Often generated by a single model rather than following a multi-step process