(Slide) Multi-Task Learning

The document discusses multi-task learning in computer vision. It introduces different multi-task learning architectures, including encoder-focused approaches that share features in the encoding stage like hard parameter sharing, soft parameter sharing, and cross-stitch networks.


AI VIETNAM

All-in-One Course

Module 10 - Project

Multi-Task Learning

Nguyen Quoc Thai

1
Year 2023
Objectives
! Multi-task Learning for Computer Vision

[Diagram: Tasks 1–3, each with its own training data and model, improve generalization by sharing features (Feature-based MTL) or parameters (Parameter-based MTL)]
2
Outline
Ø Introduction
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Ø Experiment

3
Introduction
! Single-Task Learning

Ø Image Classification

[Diagram: input image → MODEL (LeNet, ResNet, …) → Class: CAT]

4
Introduction
! Single-Task Learning

Ø Image Segmentation

[Diagram: input image → MODEL (UNet) → 8×8 per-pixel label map: 0 = background, 1 = DOG, 2 = CAT]
5
Introduction
! Single-Task Learning

Ø Object Detection

[Diagram: input image → MODEL (e.g., Faster R-CNN) → DOG – 0.98, CAT – 0.87]

Assigns labels and bounding boxes to objects in the image
6
Introduction
! Single-Task Learning

[Diagram: Tasks 1–3 are trained independently; each has its own training data, its own model, and its own training process and generalization]
7
Introduction
! Multi-Task Learning

[Diagram: Tasks 1–3 are trained jointly in a single training process; the shared training improves generalization on every task]
8
Introduction
! Motivation

Ø Learning multiple tasks jointly with the aim of mutual benefit


Ø Improves generalization on other tasks
o Caused by the inductive bias provided by the auxiliary task

9
Introduction
! Multi-Task Learning

[Diagram: Tasks 1–3 with their training data and models; two central questions for MTL: What to share? How to share?]
10
Introduction
! MTL Methods (based on what to share?)

Ø Feature-based MTL
o Aims to learn common features among different tasks
Ø Parameter-based MTL
o Uses the model parameters of one task to help learn the parameters of other tasks
Ø Instance-based MTL
o Identifies useful data instances in one task to help learn other tasks

11
Introduction
! MTL Methods (based on how to share?)

Ø Feature-based MTL
o Feature learning approach
o Deep learning approach
Ø Parameter-based MTL
o Low-Rank approach

12
Introduction
! Feature Learning Approach

Ø Why need to learn common feature representations?


o Original features may not have enough expressive power
Ø Two sub-categories
o Feature transformation approach
o Feature selection approach

13
Introduction
! Feature Learning Approach

Ø Feature transformation approach


o The learned features are a linear or nonlinear transformation of the original
feature representation
o Multi-task feedforward NN

[Diagram: multi-task feedforward NN — inputs 1 … d pass through shared hidden layers into separate outputs for task 1 and task 2]

14
Introduction
! Feature Learning Approach

Ø Feature selection approach


o Select a subset of the original features as the learned representation
o Eliminates useless features based on different criteria

15
Introduction
! Low-Rank Approach

Ø Assumes the model parameters of different


tasks share a low-rank subspace

16
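As a concrete illustration (not from the slides), the low-rank assumption can be sketched in NumPy: the task parameter vectors are stacked column-wise into a matrix W, which a rank-k factorization then recovers. All names and dimensions here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, k = 8, 4, 2  # feature dim, number of tasks, shared subspace rank

# Construct task parameter vectors that truly lie in a k-dim subspace
U_true = rng.standard_normal((d, k))
V_true = rng.standard_normal((k, T))
W = U_true @ V_true              # d x T matrix, one column per task

# Recover the shared low-rank structure with a truncated SVD
U, s, Vt = np.linalg.svd(W, full_matrices=False)
W_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

err = np.linalg.norm(W - W_hat)  # ~0: the tasks share a rank-k subspace
```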
Introduction
! Deep Learning Approach

Ø Deep Multi-Task Architectures


o Encoder-Focused
o Decoder-Focused
Ø Optimization Strategy Methods
o Task Balancing
o Other: Heuristics, Gradient Sign Dropout

17
Outline
Ø Introduction
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Ø Experiment

18
Deep Multi-Task Architectures
! Deep Multi-Task Architectures used in Computer Vision

Deep Multi-Task Architectures

Ø Encoder-Focused: MTL Baseline, Cross-Stitch Networks, NDDR-CNN, MTAN
Ø Decoder-Focused: PAD-Net, PAP-Net, MTI-Net
Ø Other: ASTMT
19
Deep Multi-Task Architectures
! Encoder-Focused

Ø Share the task features in the encoding stage

[Diagram: a shared encoder (soft/hard sharing) feeds task-specific branches for Tasks A, B, C]
20
Deep Multi-Task Architectures
! Encoder-Focused

Ø Hard Parameter Sharing


o Generally applied by sharing the hidden layers between all tasks
o Keeps several task-specific output layers

[Diagram: shared hidden layers branch into task-specific output layers for Tasks A, B, C]
21
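The hard-sharing layout above can be sketched as a minimal NumPy forward pass (illustrative dimensions, random weights, and task heads; a real model would use a deep learning framework with trained parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared hidden layer (hard parameter sharing): one set of weights for all tasks
W_shared = rng.standard_normal((16, 4)) * 0.1
# Task-specific output layers
W_task_a = rng.standard_normal((3, 16)) * 0.1   # e.g. 3-way classification
W_task_b = rng.standard_normal((1, 16)) * 0.1   # e.g. scalar regression

def forward(x):
    h = np.maximum(W_shared @ x, 0.0)   # shared representation (ReLU)
    return W_task_a @ h, W_task_b @ h   # task-specific heads

x = rng.standard_normal(4)
out_a, out_b = forward(x)
```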
Deep Multi-Task Architectures
! Encoder-Focused

Ø Soft Parameter Sharing


o Each task has its own model with its own parameters
o Uses a linear combination in every layer of the task-specific networks

[Diagram: separate per-task networks for Tasks A, B, C, with their parameters linked across the task-specific columns]
22
Deep Multi-Task Architectures
! Encoder-Focused

Ø Cross-Stitch Networks
o Shares activations among all single-task networks in the encoder

[Diagram: a cross-stitch unit combines the Task A and Task B activations with learned weights α, sharing parameters between the two networks]
23
Deep Multi-Task Architectures
! Encoder-Focused

Ø Cross-Stitch Networks
o Shares activations among all single-task networks in the encoder
o Cross connections: after each conv layer, each task's next input is a learned linear combination (weights α) of both tasks' activations

[Diagram: Task A and Task B conv stacks joined by cross-stitch connections]
24
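A single cross-stitch unit can be sketched as follows; the α values are learned in the real network and are fixed here only for illustration:

```python
import numpy as np

# A cross-stitch unit: the next layer's inputs for tasks A and B are learned
# linear combinations of both tasks' current activations.
alpha = np.array([[0.9, 0.1],
                  [0.1, 0.9]])   # rows: output task, cols: input task

def cross_stitch(x_a, x_b, alpha):
    x_a_new = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    x_b_new = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return x_a_new, x_b_new

x_a = np.array([1.0, 2.0])
x_b = np.array([3.0, 4.0])
y_a, y_b = cross_stitch(x_a, x_b, alpha)
# y_a = 0.9*x_a + 0.1*x_b = [1.2, 2.2]
```

With α close to the identity matrix, each task mostly keeps its own features; off-diagonal values control how much the tasks share.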
Deep Multi-Task Architectures
! Encoder-Focused

Ø Multi-Task Attention Networks


o Uses a shared backbone network in conjunction with task-specific attention
modules in the encoder

[Diagram: a shared encoder backbone; per-task attention modules select task-specific features for Tasks A, B, C from the shared features]
25
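A minimal sketch of the attention-based feature selection, using one feature vector instead of feature maps; all weights are random and illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Shared backbone features (a single vector here; feature maps in the real network)
h_shared = rng.standard_normal(16)

# Per-task attention module: a learned projection produces a soft mask in
# (0, 1) that selects task-relevant features from the shared representation
W_att_a = rng.standard_normal((16, 16)) * 0.1
mask_a = sigmoid(W_att_a @ h_shared)
h_task_a = mask_a * h_shared        # element-wise gating
```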
Deep Multi-Task Architectures
! Decoder-Focused

[Diagram: decoder-focused — a shared encoder (soft/hard sharing) produces initial task outputs; decoder-stage modules then exchange information across Tasks A, B, C to refine the final predictions]
26
Deep Multi-Task Architectures
! Decoder-Focused

Ø PAD-Net
o Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous
Depth Estimation and Scene Parsing

27
Deep Multi-Task Architectures
! Decoder-Focused

Ø PAD-Net
o Deep Multimodal Distillation

28
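A rough sketch of the multimodal distillation idea, assuming vector features and linear messages in place of PAD-Net's convolutional message passing; the task names and shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each task's intermediate features are refined by messages distilled from
# the other tasks' intermediate predictions
F = {k: rng.standard_normal(8) for k in ("depth", "parsing", "normals")}
W_msg = {k: rng.standard_normal((8, 8)) * 0.1 for k in F}

def distill(F, W_msg, task):
    out = F[task].copy()
    for other, feat in F.items():
        if other != task:
            out += W_msg[other] @ feat   # message from the other task
    return out

F_depth = distill(F, W_msg, "depth")
```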
Outline
Ø Introduction
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Ø Experiment

29
Optimization Strategy
! Task Balancing Approaches

Ø Set a unique weight for each task

$$\mathcal{L}_{MTL} = \sum_i w_i \, \mathcal{L}_i$$

Ø Use SGD to minimize the objective


$$W_{shared} \leftarrow W_{shared} - \gamma \sum_i w_i \frac{\partial \mathcal{L}_i}{\partial W_{shared}}$$

30
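The weighted-sum objective and the corresponding SGD step on a shared parameter can be checked with plain NumPy (all numbers are illustrative):

```python
import numpy as np

# Fixed task weights w_i; the total loss is the weighted sum of per-task losses
w = np.array([1.0, 0.5, 0.25])
task_losses = np.array([2.0, 4.0, 8.0])
total_loss = float(np.sum(w * task_losses))   # 1*2 + 0.5*4 + 0.25*8 = 6.0

# One SGD step on a shared parameter: the per-task gradients are combined
# with the same weights (gradient values are made up for the example)
gamma = 0.1
grads = np.array([0.2, -0.4, 0.8])            # dL_i / dW_shared
W_shared = 1.0
W_shared -= gamma * float(np.sum(w * grads))  # 1.0 - 0.1 * 0.2 = 0.98
```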
Optimization Strategy
! Uncertainty Weighting

Ø Use the homoscedastic uncertainty to balance the single-task losses


Ø Optimize the model weights W and the noise parameters σ1, σ2

$$\mathcal{L}(W, \sigma_1, \sigma_2) = \frac{1}{2\sigma_1^2} \mathcal{L}_1(W) + \frac{1}{2\sigma_2^2} \mathcal{L}_2(W) + \log \sigma_1 \sigma_2$$

31
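A minimal sketch of the two-task uncertainty-weighted loss, assuming scalar per-task losses and the formula above:

```python
import math

def uncertainty_weighted_loss(l1, l2, sigma1, sigma2):
    """Homoscedastic-uncertainty weighting for two tasks: each loss is scaled
    by 1/(2*sigma^2), and the log(sigma1*sigma2) term keeps the noise
    parameters from growing without bound."""
    return (l1 / (2 * sigma1 ** 2)
            + l2 / (2 * sigma2 ** 2)
            + math.log(sigma1 * sigma2))

# With sigma = 1 for both tasks, the weights are 1/2 and the log term vanishes
loss = uncertainty_weighted_loss(2.0, 4.0, 1.0, 1.0)   # 1.0 + 2.0 + 0.0 = 3.0
```

In training, σ1 and σ2 (or their logs, for numerical stability) are optimized jointly with W, so tasks with higher observation noise automatically receive lower weight.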
Optimization Strategy
! Dynamic Weight Averaging (DWA)

Ø Learns to average task weighting over time by considering the rate of change of loss
for each task
Relative loss change and task weights:

$$r_k(t-1) = \frac{\mathcal{L}_k(t-1)}{\mathcal{L}_k(t-2)}, \qquad w_k(t) = \frac{N \exp\left(r_k(t-1)/T\right)}{\sum_i \exp\left(r_i(t-1)/T\right)}$$

where T is a temperature controlling the softness of the task weighting and N is the number of tasks.

32
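The DWA weights can be computed directly from the last two recorded losses per task; a sketch in plain Python:

```python
import math

def dwa_weights(prev_losses, prev2_losses, T=2.0):
    """Dynamic Weight Averaging: weight each task by the softmax (temperature
    T) of its relative loss change r_k = L_k(t-1) / L_k(t-2), scaled by the
    number of tasks N so the weights sum to N."""
    N = len(prev_losses)
    r = [l1 / l2 for l1, l2 in zip(prev_losses, prev2_losses)]
    exps = [math.exp(rk / T) for rk in r]
    Z = sum(exps)
    return [N * e / Z for e in exps]

# If every task's loss shrinks at the same rate, all weights equal 1
w = dwa_weights([0.8, 1.6, 2.4], [1.0, 2.0, 3.0])   # r_k = 0.8 for each task
```

A task whose loss is shrinking more slowly (larger r_k) gets a larger weight, pulling optimization effort toward it.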
Optimization Strategy
! Other methods

Ø Gradient Normalization
Ø Dynamic Task Prioritization

33
Quiz

34
Outline
Ø Introduction
Ø Deep Multi-Task Architectures
Ø Optimization Strategy
Ø Experiment

35
Experiment
! NYUD-v2 Dataset

36
Experiment
! Model

[Diagram: Hard Parameter Sharing (one shared trunk with task-specific heads for Tasks A, B, C) vs. Soft Parameter Sharing (separate per-task networks with linked parameters)]

37
Experiment
! Code

38
Summary

Deep Multi-Task Architectures
Ø Encoder-Focused: MTL Baseline, Cross-Stitch Networks, NDDR-CNN, MTAN
Ø Decoder-Focused: PAD-Net, PAP-Net, MTI-Net
Ø Other: ASTMT

Optimization Strategy
Ø Task Balancing: Uncertainty Weighting, Gradient Normalization, DWA, DTP
39
Thanks!
Any questions?

40
