Week8 - Machine Learning
What is machine learning?
[Figure: two stages labelled "Feature extraction" and "Classification/interpretation"]
What is machine learning
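[Figure: classifying images of bananas and oranges. Heuristics based: a decision tree asks "elongated?" (yes: bananas) and then "spherical?" (yes: oranges; no: others). Learning based: a model built from images of bananas and images of oranges classifies a query image.]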
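The heuristics-based route in the figure can be written down directly as hand-made rules. A minimal sketch, assuming the two yes/no features (`elongated`, `spherical`) have already been extracted from the image; the function name is hypothetical:

```python
def classify_by_rules(elongated: bool, spherical: bool) -> str:
    """Hand-written decision rules; nothing is learned from data."""
    if elongated:
        return "banana"
    if spherical:
        return "orange"
    return "other"


print(classify_by_rules(elongated=True, spherical=False))   # banana
print(classify_by_rules(elongated=False, spherical=True))   # orange
print(classify_by_rules(elongated=False, spherical=False))  # other
```

A learning-based approach replaces these hand-written rules with a model fitted to example images.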
Why machine learning
• Machine learning is often considered when it is very
challenging for human experts to derive explicit
Why machine learning
• Examples:
o Face recognition (note that this is different to face detection)
- What features to use?
o Email spam and malware filtering
- A large list of rules
o Disease diagnosis
- Lung cancer, for example (widened mediastinum? reduced vascularity? …)
Types of machine learning
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
(These three are distinguished by whether or not they are trained with human-labelled data)
• Reinforcement learning (keywords: agent, policy, reward, penalty…)
Supervised vs unsupervised learning
• Training data and labels are provided – supervised learning
• Only training data (but not labels) are provided – unsupervised learning
[Figure: two scatter plots of Feature 1 (e.g. colour) against Feature 2 (e.g. shape). Unsupervised learning: points without labels. Supervised learning: points labelled Class 1 or Class 2.]
Supervised learning
• K Nearest Neighbours (KNN)
• Artificial Neural Network (Multilayer Perceptron)
[Figure: scatter plot of Feature 1 against Feature 2 showing Class 1 points, Class 2 points and an unknown point to be classified]
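KNN labels an unknown point by a majority vote among its k closest training points. A minimal sketch, assuming Euclidean distance and k = 3; the feature values below are made up for illustration:

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D features, e.g. (colour, shape), scaled to 0..1.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.8, 0.9), (0.9, 0.8), (0.85, 0.75)]
labels = ["Class 1", "Class 1", "Class 1", "Class 2", "Class 2", "Class 2"]

print(knn_predict(points, labels, query=(0.2, 0.2)))  # Class 1
print(knn_predict(points, labels, query=(0.7, 0.8)))  # Class 2
```

Note that every prediction computes a distance to every training point, which is where the efficiency limitation comes from.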
Limitations of KNN
• Large dataset and/or high dimensionality limit efficiency
• “Curse of dimensionality”
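One way to see the "curse of dimensionality" is that distances between random points become more and more alike as the number of features grows, so the "nearest" neighbour is barely nearer than the farthest one. A rough sketch, assuming points drawn uniformly from the unit cube; the function name and sample sizes are illustrative only:

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Ratio of the nearest to the farthest distance from one random query."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return min(distances) / max(distances)


for dim in (2, 10, 100, 1000):
    print(dim, round(nearest_to_farthest_ratio(dim), 3))
```

The printed ratio should move towards 1 as `dim` increases, and each query also costs `n_points × dim` operations.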
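[Figure: multilayer perceptron. The image is flattened into the input layer; weights indexed 𝑖, 𝑗 connect the input layer to the hidden layer, and weights indexed 𝑗, 𝑘 connect the hidden layer to the output layer.]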
• An example – forward path
[Figure: worked forward-path example. Inputs 𝑥 are multiplied by the weights on their connections and summed to give the hidden activations a1, a2 and a3; the output is 24*0.2 + a2*2.0 + a3*0.5, where 24 is the value of a1.]
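The forward path is a chain of weighted sums. A minimal sketch without activation functions; the input pixels and hidden-layer weights are hypothetical, and only the output weights (0.2, 2.0, 0.5) are taken from the slide:

```python
def dense_layer(inputs, weights):
    """Weighted sum for each neuron: one row of `weights` per neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# Hypothetical flattened input pixels and hidden-layer weights.
x = [8.0, 4.0, 0.0, 1.0]
w_hidden = [
    [1.0, 0.5, 0.0, 2.0],  # weights into a1
    [0.5, 0.0, 1.0, 1.0],  # weights into a2
    [0.0, 1.0, 3.0, 2.0],  # weights into a3
]
# Output weights as on the slide: a1*0.2 + a2*2.0 + a3*0.5.
w_out = [[0.2, 2.0, 0.5]]

a = dense_layer(x, w_hidden)       # hidden activations [a1, a2, a3]
output = dense_layer(a, w_out)[0]  # single output neuron

print(a)                 # [12.0, 5.0, 6.0]
print(round(output, 3))  # 15.4
```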
• What happens next?
o Update weights until the difference between the output and the label
(i.e. the ground truth) is zero (or small enough), or does not decrease.
o A loss function is used to measure this difference. Training a model is
the process of minimising this loss function (Gradient descent,
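Weight updating can be illustrated on the smallest possible model: one input, one weight. This is a sketch under stated assumptions, namely a squared-error loss and a fixed learning rate, neither of which is specified on the slide:

```python
def train_single_weight(x, label, w=0.0, learning_rate=0.01, steps=200):
    """Gradient descent on the squared-error loss (w*x - label)**2."""
    for _ in range(steps):
        error = w * x - label          # output minus ground truth
        gradient = 2.0 * error * x     # d(loss)/dw
        w -= learning_rate * gradient  # step against the gradient
    return w


w = train_single_weight(x=3.0, label=6.0)
loss = (w * 3.0 - 6.0) ** 2
print(round(w, 4), loss < 1e-9)
```

The weight should settle near 2.0, the value at which the output matches the label and the loss is (almost) zero.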
• What happens next?
o When the weights (the model parameters, as distinct from hyper-parameters such as the learning rate, which are set before training) are fixed, the model is trained
o For each query image, use these parameters to perform a forward path
calculation to determine the output (i.e. the prediction)
• Linear models may struggle to represent complex problems
• Adding nonlinearities, e.g. a nonlinear activation function
[Figure: the network with weights indexed 𝑖, 𝑗 and 𝑗, 𝑘, a hidden layer and an output layer, with a nonlinear activation function added]
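Why a nonlinearity matters: stacked linear layers collapse into a single linear function, however many there are. A small sketch with one input and made-up weights; ReLU is used here as one common choice of activation function, not necessarily the one intended on the slide:

```python
def relu(z):
    """Rectified linear unit: passes positives, zeroes negatives."""
    return max(0.0, z)


def two_linear_layers(x):
    # Two stacked linear steps: 3 * (2 * x - 1) collapses to 6 * x - 3.
    return 3.0 * (2.0 * x - 1.0)


def two_layers_with_relu(x):
    # Same weights, but a nonlinearity sits between the layers.
    return 3.0 * relu(2.0 * x - 1.0)


for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, two_linear_layers(x), two_layers_with_relu(x))
```

With ReLU in between, the output is flat for small inputs and rises afterwards, a shape that no single linear layer can produce.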
• What does it mean when the weights represented by red lines have zero values?
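[Figure: the network with some weights drawn as red lines; labels include input x5, hidden activation a1 = 0, the hidden and output layers, and weights 𝑊 indexed 𝑗, 𝑖 and 𝑘, 𝑗]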
• Learning of weights
0   1   3       ?  ?  ?
2  200  5   ⊗   ?  ?  ?   …
7  10   4       ?  ?  ?
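The "?" entries are the filter weights that training has to find. Once they are known, the filter response at one position is an element-wise product followed by a sum. A sketch using the 3×3 patch from the slide and a hypothetical filter:

```python
def filter_response(patch, kernel):
    """Element-wise product of a patch and a filter, summed to one number."""
    return sum(
        p * k
        for patch_row, kernel_row in zip(patch, kernel)
        for p, k in zip(patch_row, kernel_row)
    )


# 3x3 image patch from the slide.
patch = [
    [0, 1, 3],
    [2, 200, 5],
    [7, 10, 4],
]
# Hypothetical filter weights; during training these are the learned "?" values.
kernel = [
    [0, -1, 0],
    [-1, 4, -1],
    [0, -1, 0],
]

print(filter_response(patch, kernel))  # 782
```

This particular filter responds strongly because the centre pixel (200) is much brighter than its neighbours.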
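[Figure: an 𝑎 × 𝑎 input is filtered and passed through a nonlinear activation]
• Is the output size b different from the input size a (a ≠ b?) or the same (a = b?)
Refer to the convolution lecture; hint: use of padding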
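The a versus b question can be checked with the standard output-size formula for convolution, b = floor((a + 2·padding − kernel) / stride) + 1. A short sketch; the input size of 28 and the 3×3 filter are assumed values for illustration:

```python
def conv_output_size(a, kernel_size, padding=0, stride=1):
    """Spatial size b of the output for an a x a input."""
    return (a + 2 * padding - kernel_size) // stride + 1


print(conv_output_size(28, kernel_size=3))             # 26, so a != b
print(conv_output_size(28, kernel_size=3, padding=1))  # 28, so a == b
```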
Layers in a CNN
[Figure: 𝑎 × 𝑎 input; feature maps of depth 4 followed by a nonlinear activation, then another set of filters (6)]
1. Convolutional layers
2. Nonlinear layers/Activation layers
Layers in a CNN
3. Pooling layers: progressively reduce the spatial size of the representation to
reduce the number of parameters and the amount of computation in the network
1 3 2 7
6 2 6 5        6 7
1 5 7 2   →    8 9
4 8 5 9
e.g. max pooling with a 2×2 window and a stride of 2
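The max pooling example above (2×2 window, stride 2) can be reproduced with a few lines. A minimal sketch, assuming the window tiles the image exactly; the function name is illustrative:

```python
def max_pool_2d(image, window=2, stride=2):
    """Max pooling over a 2-D list; assumes the window tiles the image exactly."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(
                image[r + dr][c + dc]
                for dr in range(window)
                for dc in range(window)
            )
            for c in range(0, cols - window + 1, stride)
        ]
        for r in range(0, rows - window + 1, stride)
    ]


image = [
    [1, 3, 2, 7],
    [6, 2, 6, 5],
    [1, 5, 7, 2],
    [4, 8, 5, 9],
]
print(max_pool_2d(image))  # [[6, 7], [8, 9]]
```

The 4×4 input becomes 2×2, so the spatial size is quartered while each feature map is pooled independently.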
Layers in a CNN
3. Pooling layers
• smaller number of parameters
• does not affect depth
[Figure: 𝑎 × 𝑎 input with feature-map depths 4, 6, 6; the final feature maps are flattened into a vector of length 6 ∗ 𝑑²]
Layers in a CNN
[Figure: the same network (𝑎 × 𝑎 input; depths 4, 6, 6), with one part annotated "A high parameter count"]
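One plausible reading of the annotation is that the densely connected part after flattening dominates the parameter count, because every flattened value connects to every neuron. A rough comparison under assumed sizes (3×3 filters, d = 12, 100 neurons in the dense layer), none of which come from the slide:

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights plus one bias per filter."""
    return out_channels * (kernel_size * kernel_size * in_channels + 1)


def dense_params(n_inputs, n_outputs):
    """One weight per input-output pair plus one bias per output."""
    return n_inputs * n_outputs + n_outputs


d = 12                  # hypothetical spatial size after pooling
flattened = 6 * d ** 2  # flattened length 6 * d^2

print(conv_params(4, 6, 3))          # 222
print(flattened)                     # 864
print(dense_params(flattened, 100))  # 86500
```

Convolutional filters are shared across all image positions, which keeps their parameter count small regardless of the image size.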
Feature extraction and classification in a CNN
[Figure: the CNN (𝑎 × 𝑎 input; depths 4, 6, 6), annotated "weights updating…"]
• Raw pixel values
• More commonly seen: handcrafted features
• Automatically extracted by filters
• End-to-end
e.g. the VGG architecture
Deep learning
Eye centre localisation
• Eye tracking
• Human-computer interaction
• Psychology studies and medical applications
Directed Advertising
Eye centre localisation
Attention monitoring
3D face reconstruction
The video shows a (2D+3D) face being automatically classified as a YOUNG MALE. Real-time eye tracking allows the user to issue gaze gestures to
interact with the system. Personalised advertisements are displayed, and these can also be manipulated by gaze gestures.
Eye morphology
• Gradient based voting
Zhang, W., Smith, M.L., Smith, L.N. and Farooq, A., 2016. Gender and gaze gesture recognition for
human-computer interaction. Computer Vision and Image Understanding, 149, pp.32-50.
Eye saccade analysis for dementia diagnosis
Input: image data (with eye coordinates)
Output: predicted coordinates (c_xl, c_yl, c_xr, c_yr)
Zhang, W. and Smith, M., 2019, July. Eye Centre Localisation with Convolutional Neural Network Based Regression.
In 2019 IEEE 4th International Conference on Image, Vision and Computing (ICIVC) (pp. 88-94). IEEE.
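In a regression setting the network outputs numbers rather than a class, so the loss compares predicted and labelled coordinates directly. A sketch assuming a mean-squared-error loss and the coordinate order (c_xl, c_yl, c_xr, c_yr); both are assumptions for illustration, not details confirmed by the cited paper, and the values are made up:

```python
def mean_squared_error(predicted, target):
    """Average squared difference over the four coordinates."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


# Hypothetical pixel coordinates in the order (c_xl, c_yl, c_xr, c_yr).
label = [30.0, 42.0, 70.0, 41.0]
prediction = [32.0, 42.0, 69.0, 43.0]

print(mean_squared_error(prediction, label))  # 2.25
```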
Visualising features/filters
• Not all pixels make an equal contribution
Handcrafted features:
• Can have physical meanings (e.g. face landmarks)
• Can be explicitly modelled
• Often do not require a large training set
• Intuitive to visualise and analyse
• Dimensionality is often lower compared to automatically extracted features
Automatically extracted features:
• Data driven and learning based, therefore tailored to a specific problem
• Do not require human experts to extract a given set of carefully chosen characteristics
• Generate multiple levels of representation (e.g. high-level and low-level) at the same time
• Often generated by a single model rather than following a multi-step process