(Slide) Multi Task Learning
All-in-One Course
Module 10 - Project
Multi-Task Learning
AI VIET NAM
Nguyen Quoc Thai
Year 2023
Objectives
! Multi-task Learning for Computer Vision
[Figure: course overview — feature-based MTL and parameter-based MTL share knowledge across tasks' training data and models to improve generalization]
Introduction
! Single-Task Learning
Ø Image Classification
Introduction
! Single-Task Learning
Ø Image Segmentation
[Figure: a segmentation model (UNet) maps the input image to a per-pixel label mask — 0 = background, 1 = dog, 2 = cat]
Introduction
! Single-Task Learning
Ø Object Detection
[Figure: a detection model predicts bounding boxes and class labels for the objects in the image]
Introduction
! Multi-Task Learning
What to Share?
[Figure: tasks share training data and model components to improve generalization]
How to Share?
Ø Feature-based MTL
o Aims to learn common features shared across different tasks
Ø Parameter-based MTL
o Uses the model parameters learned for one task to help learn the parameters of other tasks
Ø Instance-based MTL
o Identifies data instances from one task that are useful for other tasks
Introduction
! MTL Methods (categorized by how sharing is done)
Ø Feature-based MTL
o Feature learning approach
o Deep learning approach
Ø Parameter-based MTL
o Low-Rank approach
Introduction
! Feature Learning Approach
Introduction
! Low-Rank Approach
Introduction
! Deep Learning Approach
Outline
Ø Introduction
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Ø Experiment
Deep Multi-Task Architectures
! Deep Multi-Task Architectures used in Computer Vision
[Figure: taxonomy of deep multi-task architectures]
Deep Multi-Task Architectures
! Encoder-Focused
[Figure: a shared encoder (soft or hard parameter sharing) feeds task-specific heads]
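The hard-sharing variant of the encoder-focused design can be sketched in a few lines of NumPy: one encoder whose weights are literally shared by all tasks, plus one small head per task. All sizes and names below (`W_shared`, 13 segmentation classes, 1 depth value) are illustrative assumptions, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical sizes: 64-dim input features, 32-dim shared representation,
# 13 segmentation classes, 1 depth value.
W_shared = rng.normal(scale=0.1, size=(64, 32))  # hard-shared encoder weights
W_seg = rng.normal(scale=0.1, size=(32, 13))     # task-specific head: segmentation logits
W_depth = rng.normal(scale=0.1, size=(32, 1))    # task-specific head: depth regression

x = rng.normal(size=(4, 64))   # a batch of 4 input feature vectors
h = relu(x @ W_shared)         # one shared forward pass serves every task
seg_logits = h @ W_seg         # each head reads the same representation
depth_pred = h @ W_depth

print(seg_logits.shape, depth_pred.shape)  # (4, 13) (4, 1)
```

Hard sharing means every task's gradient updates the same `W_shared`; soft sharing would instead keep per-task encoders and penalize the distance between their parameters.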
Deep Multi-Task Architectures
! Encoder-Focused
Ø Cross-Stitch Networks
o Shares the activations amongst all single-task networks in the encoder
o A cross-stitch unit learns a linear combination of the two tasks' activations:
$\tilde{x}_A = \alpha_{AA}\,x_A + \alpha_{AB}\,x_B, \qquad \tilde{x}_B = \alpha_{BA}\,x_A + \alpha_{BB}\,x_B$
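A cross-stitch unit is just a learned 2×2 mixing matrix applied element-wise across the two tasks' feature maps. A minimal NumPy sketch, with `alpha` fixed by hand for illustration (in the paper it is a learned parameter):

```python
import numpy as np

def cross_stitch(x_a, x_b, alpha):
    """Linearly recombine the activations of two single-task networks.

    alpha is a 2x2 matrix: row 0 produces the new task-A activation,
    row 1 the new task-B activation.
    """
    x_a_new = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    x_b_new = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return x_a_new, x_b_new

x_a = np.ones((2, 3))            # toy activations for task A
x_b = np.full((2, 3), 2.0)       # toy activations for task B
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])   # near-identity: mostly task-specific sharing
x_a_new, x_b_new = cross_stitch(x_a, x_b, alpha)
print(x_a_new[0, 0], x_b_new[0, 0])
```

An `alpha` close to the identity keeps the networks nearly independent; larger off-diagonal values force more feature sharing, and training can settle anywhere in between.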
Deep Multi-Task Architectures
! Encoder-Focused
Ø Cross-Stitch Networks
o Shares the activations amongst all single-task networks in the encoder
o Cross connections: the recombined activations pass through task-specific convolutions before the next layer
[Figure: cross-stitch units inserted between the convolutional blocks of two task-specific networks]
Deep Multi-Task Architectures
! Decoder-Focused
[Figure: a shared encoder (soft/hard) followed by task-specific decoders that exchange information]
Deep Multi-Task Architectures
! Decoder-Focused
Ø PAD-Net
o Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous
Depth Estimation and Scene Parsing
Deep Multi-Task Architectures
! Decoder-Focused
Ø PAD-Net
o Deep Multimodal Distillation
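PAD-Net's distillation idea — refine each task's features by adding transformed features from the other tasks' intermediate predictions — can be sketched as follows. This is a simplified stand-in: the paper uses convolutions on spatial feature maps, while here plain matrices on vectors (`maps`, dimension `d = 8`) are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def distill(features, maps):
    """Simple multimodal distillation: each task's refined feature is its
    own feature plus transformed features from every other task.

    features: dict task -> (d,) vector
    maps:     dict (src, dst) -> (d, d) matrix (stand-in for a convolution)
    """
    out = {}
    for k, f_k in features.items():
        msg = sum(maps[(i, k)] @ f_i for i, f_i in features.items() if i != k)
        out[k] = f_k + msg   # residual message passing between tasks
    return out

tasks = ["depth", "seg"]
d = 8
features = {t: rng.normal(size=d) for t in tasks}
maps = {(i, k): rng.normal(scale=0.1, size=(d, d))
        for i in tasks for k in tasks if i != k}
refined = distill(features, maps)
print(refined["depth"].shape)  # (8,)
```

The residual form means each task keeps its own signal and only receives cross-task information additively, which is why the module is safe to insert between an encoder and task-specific decoders.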
Outline
Ø Introduction
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Ø Experiment
Optimization Strategy
! Task Balancing Approaches
$\mathcal{L}_{MTL} = \sum_{i} w_i \, \mathcal{L}_i$
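The weighted sum above is trivial to compute; the hard part in practice is choosing the weights. A tiny NumPy example with hand-picked (hypothetical) losses and static weights:

```python
import numpy as np

task_losses = np.array([0.80, 0.25, 1.40])  # e.g. segmentation, depth, normals (made-up values)
weights     = np.array([1.0, 10.0, 0.5])    # hand-tuned static task weights

total = float(np.sum(weights * task_losses))
print(total)  # 1.0*0.80 + 10.0*0.25 + 0.5*1.40 ≈ 4.0
```

Static weights like these require expensive grid search and ignore how tasks evolve during training, which motivates the adaptive schemes on the next slides.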
Optimization Strategy
! Uncertainty Weighting
$\mathcal{L}(W, \sigma_1, \sigma_2) = \dfrac{1}{2\sigma_1^2}\,\mathcal{L}_1(W) + \dfrac{1}{2\sigma_2^2}\,\mathcal{L}_2(W) + \log \sigma_1 \sigma_2$
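Evaluating this objective is a one-liner once the per-task losses are known. A minimal sketch; the loss values are invented, and parameterizing $\log\sigma$ instead of $\sigma$ is a common stability trick rather than something stated on the slide:

```python
import numpy as np

def uncertainty_weighted_loss(l1, l2, log_sigma1, log_sigma2):
    """Homoscedastic-uncertainty weighting of two task losses.

    Each sigma down-weights its task's loss, while the log term
    penalizes making sigma arbitrarily large.
    """
    s1, s2 = np.exp(log_sigma1), np.exp(log_sigma2)
    return l1 / (2 * s1**2) + l2 / (2 * s2**2) + np.log(s1 * s2)

# Invented example: task 2's loss is 4x larger, but its sigma of 2
# scales it back down to the same effective contribution as task 1.
loss = uncertainty_weighted_loss(l1=1.0, l2=4.0,
                                 log_sigma1=0.0, log_sigma2=np.log(2.0))
print(loss)
```

In training, the `log_sigma` values are learnable parameters optimized jointly with the network weights, so the balance adapts on its own.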
Optimization Strategy
! Dynamic Weight Averaging (DWA)
Ø Learns to average task weighting over time by considering the rate of change of loss
for each task
$w_k(t) = \dfrac{N \exp\big(r_k(t-1)/T\big)}{\sum_i \exp\big(r_i(t-1)/T\big)}, \qquad r_k(t-1) = \dfrac{\mathcal{L}_k(t-1)}{\mathcal{L}_k(t-2)}$
where $t$ is the training step, $N$ is the number of tasks, $r_k$ is the relative loss change, and the temperature $T$ controls the softness of the task weighting
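The DWA formula maps directly to a softmax over relative loss changes. A short NumPy sketch, with made-up loss values from two previous epochs:

```python
import numpy as np

def dwa_weights(prev_losses, prev_prev_losses, T=2.0):
    """Dynamic Weight Averaging: softmax over each task's relative
    loss change, rescaled so the weights sum to the number of tasks."""
    r = np.asarray(prev_losses) / np.asarray(prev_prev_losses)  # r_k(t-1)
    e = np.exp(r / T)
    return len(r) * e / e.sum()

# Task 0's loss halved last epoch (fast progress); task 1's is flat.
w = dwa_weights(prev_losses=[0.5, 1.0], prev_prev_losses=[1.0, 1.0], T=2.0)
print(w)  # task 1 (slower progress) gets the larger weight
```

A slowly improving task gets a larger weight so optimization pays more attention to it; a large `T` flattens the weights toward uniform.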
Optimization Strategy
! Other methods
Ø Gradient Normalization
Ø Dynamic Task Prioritization
Quiz
Outline
Ø Introduction
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Ø Experiment
Experiment
! NYUD-v2 Dataset
Experiment
! Model
Experiment
! Code
Summary
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Thanks!
Any questions?