Introduction To Face Processing With Computer Vision

This document provides an introduction to face processing with computer vision. It discusses the theory behind key tasks like face detection, recognition, and other tasks. It covers early approaches using hand-crafted features and recent deep learning methods. State-of-the-art models like Faster R-CNN, MTCNN, and RetinaFace are described. The document also discusses open challenges around cross-factor recognition, security, privacy, and other tasks like alignment and 3D reconstruction. It emphasizes the many tools and APIs available for rapid prototyping of face processing systems at scale.


Introduction to

Face Processing with Computer Vision


Gabriel Bianconi
Founder, Scalar Research
AI & Data Science Consulting Firm

Previously at the Stanford AI Lab


Agenda
• Theory
• Detection
• Recognition
• Other Tasks

• Practice
• Rapid Prototyping
• Scaling

3
Theory
4
Face Detection

5
Haar-Like Features
• Summarize image based on simple intensity patterns (light vs. dark rectangular regions)
• Manually determined feature extractors (kernels)

• Leveraged for first real-time face detector (2001)

Ref: Viola & Jones (2001). Image: Wikimedia 6
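A minimal sketch of how a Haar-like feature can be evaluated, assuming a grayscale patch stored as a list of rows. Viola & Jones compute rectangle sums with an integral image (summed-area table), so each rectangle costs four lookups. The function names and the toy patch are illustrative, not taken from the slides.

```python
def integral_image(img):
    """Return a summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle feature: left half minus right half."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Toy 4x4 patch (hypothetical values): bright left half, dark right half.
patch = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(patch)
edge_response = two_rect_feature(ii, 0, 0, 4, 4)  # large value: strong vertical edge
```

A uniform patch gives a response of zero, which is why these features act as simple edge and line detectors.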


7
8
Histogram of Oriented Gradients (HOG)
• Summarize image by distribution of color gradients
• Gradient intensities and orientations represent edges, etc.
• Captures more information than simple Haar-like features

Ref: Shu et al. (2011). 9
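A simplified sketch of the per-cell step of HOG, assuming a grayscale cell stored as a list of rows. It uses central differences, unsigned orientations (0 to 180 degrees), and hard binning; full HOG pipelines also interpolate between bins and normalize over blocks of cells, which is omitted here. The function name and the 9-bin default are illustrative choices.

```python
import math


def cell_histogram(cell, bins=9):
    """Histogram of unsigned gradient orientations, weighted by gradient magnitude."""
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    # Border pixels are skipped because central differences need both neighbors.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist


# Hypothetical 4x4 cells: intensity changes along x in one, along y in the other.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
horizontal_edge = [[0] * 4, [0] * 4, [10] * 4, [10] * 4]
hist_v = cell_histogram(vertical_edge)
hist_h = cell_histogram(horizontal_edge)
```

The two cells put all of their gradient energy in different bins, which is the information a Haar-like rectangle difference cannot separate as finely.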


Ref: Shu et al. (2011) 10
Ref: Shu et al. (2011) 11
Ref: Rojas et al. (2011) 12
Ref: Rojas et al. (2011) 13
R-CNN
• Introduces CNNs for object detection
• CNNs learn how to extract features from data
• Breakthrough in performance
• Beats previous SOTA methods by huge margin
• However, detection is extremely slow

Ref: Girshick et al. (2014). 14


CNN Features

Ref: Lee et al. (2009). 15


CNN Features

Ref: Lee et al. (2009). 16


CNN Features

Ref: Lee et al. (2009). 17


CNN Features

Ref: Lee et al. (2009). 18


R-CNN

Ref: Girshick et al. (2014). 19


Fast R-CNN
• Improvement to R-CNN that leverages a CNN for
classification and bounding-box regression
• Other than region proposals, the system is now trained end-to-end,
vs. three components trained greedily in R-CNN.
• Predictions are 200x+ faster with better performance
• Region proposals are still a bottleneck; total inference time is ~2s.

Ref: Girshick (2015). 20


Fast R-CNN

Ref: Girshick (2015). 21


Faster R-CNN
• Leverages CNN for region proposals as well
• “Region Proposal Network”
• Finally an end-to-end system with deep learning
• About 10x faster than Fast R-CNN, with better performance
• Total inference time is ~0.2s

Ref: Ren et al. (2016). 22


Faster R-CNN

Ref: Ren et al. (2016). 23


MTCNN
• Many models for face detection draw heavily from
general object detection methods.
• MTCNN, for example, trains a multi-task system for
detection and alignment.

Ref: Zhang et al. (2015). 24


MTCNN

Ref: Zhang et al. (2015). 25


RetinaFace
• The current SOTA method combines many
techniques such as multi-task learning
• R-CNN family uses a two-stage approach
(proposals → refinement)
• RetinaFace uses a single-stage approach (faster,
higher recall, more false positives)

Ref: Deng et al. (2019). 26


RetinaFace

Ref: Deng et al. (2019). 27


Are we there yet?

WIDER Face (Easy): ~97% AP
WIDER Face (Medium): ~96% AP
WIDER Face (Hard): ~92% AP

Ref: Yang (2016). 28
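For readers unfamiliar with the AP metric quoted above, here is a minimal sketch of one common (uninterpolated) definition of average precision. It assumes each detection has already been marked as a true or false positive by matching it against ground-truth boxes; the benchmark's official evaluation protocol may differ in its details. The function name and inputs are illustrative.

```python
def average_precision(detections, num_ground_truth):
    """detections: list of (score, is_true_positive) pairs.

    Each true positive raises recall by 1 / num_ground_truth, so AP is the
    sum of the precision at each true positive times that recall step.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    ap = 0.0
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
            ap += (true_positives / rank) / num_ground_truth
    return ap


# Hypothetical detector output on an image set containing two faces.
example_ap = average_precision([(0.9, True), (0.8, False), (0.7, True)], 2)
```

A false positive that outranks a true positive lowers AP, and a face that is never detected caps it below 1.0.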


Facial Recognition

29
Facial Recognition
• Facial recognition actually corresponds to a group of
different tasks.
• Verification vs. Identification vs. Grouping vs. …
• Closed-Set vs. Open-Set

30
Closed-Set Recognition
• Every identity appears in training set
• Example: recognizing celebrities
• Effectively a classification problem
• Model aims to learn separable features

31
Closed-Set Identification

Test Sample → Model → Label Confidences (Label 0, Label 1, …)
Images: Wikimedia 32
Closed-Set Verification

Test Sample A → Model → Label Confidences
Test Sample B → Model → Label Confidences

Images: Wikimedia 33
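A minimal sketch of the two closed-set diagrams, assuming the model outputs one raw score (logit) per known identity. Identification takes the most confident label; verification is read here as "both samples receive the same predicted label", which is an assumption about the diagram rather than something the slides state. All names and numbers are illustrative.

```python
import math


def softmax(logits):
    """Convert raw scores into confidences that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def identify_closed_set(logits, labels):
    """Return the most confident label and the full list of confidences."""
    confidences = softmax(logits)
    best = max(range(len(labels)), key=lambda i: confidences[i])
    return labels[best], confidences


def verify_closed_set(logits_a, logits_b, labels):
    """Two samples match if the classifier assigns them the same label."""
    label_a, _ = identify_closed_set(logits_a, labels)
    label_b, _ = identify_closed_set(logits_b, labels)
    return label_a == label_b


labels = ["Label 0", "Label 1", "Label 2"]
predicted, confidences = identify_closed_set([2.0, 0.5, -1.0], labels)
```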
Open-Set Recognition
• Not every identity appears in training set
• Example: Facebook Photos
• Effectively a metric learning problem
• Model aims to learn large-margin features (embeddings)

34
Embeddings
• Map each sample to a vector (coordinate system)
• Used for words, graphs, faces, etc.
• Embeddings preserve similarity
• Similar samples close to each other
• Dissimilar samples far from each other

35
Images: Wikimedia 36
Embeddings
• “Similar” depends on the training data
• Same person, physical characteristic, etc.
• Embeddings represent latent information
• High-dimensional embeddings trained on large datasets
learn to represent latent information about the person (e.g.
physical characteristics)

37
Open-Set Identification

Test Sample → Model → Embedding + Distance (vs. Emb. 0, Emb. 1, Emb. 2, …)

Images: Wikimedia 38
Open-Set Verification

Test Sample A → Model → Embedding A
Test Sample B → Model → Embedding B
Distance (A, B) vs. Threshold

Images: Wikimedia 39
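A minimal sketch of open-set verification, assuming embeddings are plain lists of floats and L2 distance is the metric. The threshold is not given in the slides; one simple way to choose it, shown here as an assumption, is to pick the value that maximizes accuracy on a set of labeled pairs. Function names and the sample distances are hypothetical.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def verify(embedding_a, embedding_b, threshold):
    """Return (is_same_person, distance) for a pair of embeddings."""
    distance = l2_distance(embedding_a, embedding_b)
    return distance <= threshold, distance


def pick_threshold(labeled_distances):
    """labeled_distances: list of (distance, is_same_person) pairs.

    Tries each observed distance as a threshold and keeps the most accurate.
    """
    best_threshold, best_accuracy = None, -1.0
    for candidate, _ in sorted(labeled_distances):
        correct = sum((d <= candidate) == same for d, same in labeled_distances)
        accuracy = correct / len(labeled_distances)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = candidate, accuracy
    return best_threshold, best_accuracy


# Hypothetical validation pairs: two genuine pairs, two impostor pairs.
validation = [(0.31, True), (0.50, True), (0.63, False), (0.69, False)]
threshold, accuracy = pick_threshold(validation)
```

In practice the threshold trades false accepts against false rejects, so the right value depends on the application's risk tolerance.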
Metric Learning

Ref: Liu et al. (2018) 40
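The figure for this slide is not recoverable here. As an illustration of one common metric learning objective, the sketch below implements a triplet loss of the kind used by FaceNet. It is not necessarily the loss discussed in Liu et al. (2018), and the margin value is an illustrative default.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Penalize triplets where the negative is not at least `margin`
    farther from the anchor than the positive (in squared distance)."""
    return max(0.0, squared_l2(anchor, positive) - squared_l2(anchor, negative) + margin)


# Hypothetical 2-D embeddings; real face embeddings have many more dimensions.
easy = triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0])  # negative already far away
hard = triplet_loss([0.0, 0.0], [0.1, 0.0], [0.2, 0.0])  # negative too close
```

Minimizing this loss pulls same-identity samples together and pushes different identities apart, which is the large-margin behavior described for open-set recognition.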


Are we there yet?

LFW (Labeled Faces in the Wild), Verification: 99.85%+ accuracy

Ref: Yan et al. (2019); Learned-Miller et al. (2016) 41


Cross-Factor
Facial Recognition

42
Cross-Age

Ref: Zheng et al. (2017) 43


Cross-Pose

Ref: Li et al. (2011) 44


Cross-Makeup

Ref: Chen et al. (2013) 45


Further Research

46
Security
• How do we deal with adversarial users?
• Real face goes undetected or misclassified
• Fake face gets recognized
• Private data is extracted from model
•…

47
Security

Ref: Grigory Bakunov (2017) 48


Biometrics & Multi-Modal Data
• How do we deal with…
• Identical twins?
• Plastic surgery?
• ...

Ref: Singh et al. (2010) 49


Ref: Singh et al. (2010) 50
Biometrics & Multi-Modal Data
• Combine with other biometric data
• Biometric traits (e.g. hand)
• Multiple sensors (e.g. 2D + 3D)
• Multiple pictures (e.g. viewpoints, sequences)
•…

Ref: Singh et al. (2010); Ross & Jain (2004); Ross & Govindarajan (2005) 51
Ref: Apple 52
Privacy
• How do we deal with…
• Models that can predict gender, race, …?
• Models that leak the data?
• Predictions without sharing the raw data?
•…

Ref: Singh et al. (2010) 53


Other Tasks

54
Alignment & Pose Estimation

Ref: Ruiz et al. (2018) 55


Face Landmarks

56
Classification

Neutral

Happy
Happy

57
3D Reconstruction

Ref: Sela et al. (2017) 58


Practice
59
Rapid Prototyping

60
61
Dozens of Tools
[Figure: tools plotted on accuracy vs. simplicity axes: RetinaFace, InsightFace, MTCNN, FaceNet, OpenCV, face_recognition]
APIs
• There are dozens of APIs providing low-cost face
processing at scale
• Most services charge less than $1 per 1000 images
• Depending on the use case, might be cheaper than provisioning GPUs
and deploying your own models (esp. if considering developer time)

• Often these APIs can achieve performance that’s
close to state-of-the-art

62
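A back-of-the-envelope sketch of the API vs. self-hosting comparison. The $1 per 1000 images figure is the upper bound from the slide; the self-hosted monthly cost is a hypothetical input that should include GPUs, operations, and developer time.

```python
def monthly_api_cost(images_per_month, price_per_1000=1.00):
    """Cost of sending every image to a hosted API."""
    return images_per_month / 1000 * price_per_1000


def break_even_images(self_hosted_monthly_cost, price_per_1000=1.00):
    """Monthly volume above which self-hosting costs less than the API."""
    return self_hosted_monthly_cost / price_per_1000 * 1000


# Hypothetical workload and self-hosting budget.
api_cost = monthly_api_cost(250_000)
break_even = break_even_images(1500.0)
```

Below the break-even volume the API is the cheaper option under these assumptions; above it, running your own models starts to pay off.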
APIs – Example: Azure
• Detection
• Classification
• Gender, age, emotion, hair, smile, eyes, glasses, makeup, …
• Landmarks
• Pose Estimation
• Recognition
• Verification, identification, grouping, similarity search, …

63
Embeddings
• Face embeddings are typically used for open-set
recognition systems
• They can be leveraged to quickly train models for
downstream tasks (e.g. classification)
• Tools
• face_recognition (GitHub): extremely fast, reliable for frontal faces
• FaceNet: based on deep learning, strong across the board
64
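A minimal sketch of using embeddings for a downstream classification task, assuming embeddings are lists of floats (in practice they could come from a call such as `fr.face_encodings`). A nearest-centroid classifier is used because it needs almost no training; any standard classifier could take its place. The labels and 2-D vectors are hypothetical.

```python
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fit_centroids(embeddings, labels):
    """Compute one mean embedding per class label."""
    grouped = {}
    for embedding, label in zip(embeddings, labels):
        grouped.setdefault(label, []).append(embedding)
    return {label: centroid(vectors) for label, vectors in grouped.items()}


def predict(centroids, embedding):
    """Return the label whose centroid is closest in L2 distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], embedding))


# Toy 2-D "embeddings" with hypothetical attribute labels.
train_embeddings = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
train_labels = ["glasses", "glasses", "no_glasses", "no_glasses"]
centroids = fit_centroids(train_embeddings, train_labels)
```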
Example – Facebook Photos
• Task: open-set face identification
• Strategy:
1. Detect faces and compute embeddings for known photos
of users; store for future use.
2. Whenever a photo is uploaded, do the same and compare
against known set.

65
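The two-step strategy above can be sketched as a small in-memory gallery. This assumes embeddings have already been computed for each detected face, and that a linear scan over stored embeddings is fast enough. `FaceGallery` and its methods are hypothetical names; the default threshold of 0.6 mirrors the default tolerance of `compare_faces` in face_recognition and should be treated as a starting point.

```python
import math


class FaceGallery:
    """Stores known (user_id, embedding) pairs and matches new faces against them."""

    def __init__(self, threshold=0.6):
        self.threshold = threshold
        self._entries = []  # list of (user_id, embedding)

    def enroll(self, user_id, embedding):
        """Step 1: store an embedding computed from a known photo of a user."""
        self._entries.append((user_id, list(embedding)))

    def identify(self, embedding):
        """Step 2: return (user_id, distance) for the closest known face,
        or (None, distance) if no stored face is within the threshold."""
        if not self._entries:
            return None, math.inf
        user_id, distance = min(
            ((uid, math.dist(known, embedding)) for uid, known in self._entries),
            key=lambda pair: pair[1],
        )
        if distance > self.threshold:
            return None, distance
        return user_id, distance


# Toy 2-D embeddings and hypothetical user IDs.
gallery = FaceGallery()
gallery.enroll("alice", [0.0, 0.0])
gallery.enroll("bob", [1.0, 1.0])
match, match_distance = gallery.identify([0.1, 0.0])
stranger, _ = gallery.identify([5.0, 5.0])
```

Returning `None` for faces beyond the threshold is what makes this open-set: people who were never enrolled are not forced onto the nearest known identity.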
Example – Detection

import face_recognition as fr

# Load the image as a NumPy array (RGB).
image = fr.load_image_file("file.jpg")

# One (top, right, bottom, left) box per detected face.
face_locations = fr.face_locations(image)

Ref: github.com/ageitgey/face_recognition 66
Example – Embedding

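import face_recognition as fr

image = fr.load_image_file("file.jpg")

# One 128-dimensional embedding per detected face; [0] raises IndexError if no face is found.
face_embedding = fr.face_encodings(image)[0]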

Ref: github.com/ageitgey/face_recognition 67
Example – L2 Distance

- 0.31 0.59 0.69

0.31 - 0.52 0.63

0.59 0.52 - 0.50

0.69 0.63 0.50 -


Images: Wikimedia 68
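A minimal sketch of how such a distance matrix can be computed and thresholded, assuming embeddings are lists of floats. The distances below are copied from the slide's matrix, with the diagonal set to 0.0. The 0.6 threshold is an assumption borrowed from the default tolerance of `compare_faces` in face_recognition, and the slides do not say which images show the same person.

```python
import math


def pairwise_l2(embeddings):
    """Symmetric matrix of L2 distances between all pairs of embeddings."""
    n = len(embeddings)
    return [[math.dist(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]


def matching_pairs(distances, threshold=0.6):
    """Index pairs (i, j), i < j, whose distance is within the threshold."""
    n = len(distances)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if distances[i][j] <= threshold]


# Distances from the slide's matrix.
slide_distances = [
    [0.00, 0.31, 0.59, 0.69],
    [0.31, 0.00, 0.52, 0.63],
    [0.59, 0.52, 0.00, 0.50],
    [0.69, 0.63, 0.50, 0.00],
]
loose_matches = matching_pairs(slide_distances)        # threshold 0.6
strict_matches = matching_pairs(slide_distances, 0.5)  # stricter threshold
```

The set of matches changes noticeably between the two thresholds, which is why the threshold should be tuned on data that resembles the deployment setting.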
Face Landmarks
• Face landmarks can also be quickly extracted with
pretrained models and used for a number of
downstream tasks.

69
Example – Face Landmarks

# One dict per detected face, mapping feature name to a list of (x, y) points.
face_landmarks = fr.face_landmarks(image)[0]

print(face_landmarks.keys())
# left_eyebrow, right_eyebrow, bottom_lip, top_lip, …

Ref: github.com/ageitgey/face_recognition 70
71
Example – Snapchat Filters
• Task: face manipulation
• Strategy:
1. Detect face and localize landmarks in image
2. Add objects, reshape image, etc. based on landmarks

72
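Step 2 of the strategy can be sketched as pure geometry: given landmarks for both eyes, compute where an overlay (for example, a pair of glasses) should be centered, how wide it should be, and how much to rotate it. The landmark format matches the dict returned by `fr.face_landmarks` (feature name mapped to a list of (x, y) points). `overlay_placement` and `width_scale` are hypothetical names, and the coordinates are made up.

```python
import math


def center(points):
    """Mean (x, y) of a list of landmark points."""
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def overlay_placement(landmarks, width_scale=2.0):
    """Center, width, and rotation for an overlay spanning both eyes."""
    left = center(landmarks["left_eye"])
    right = center(landmarks["right_eye"])
    dx, dy = right[0] - left[0], right[1] - left[1]
    eye_distance = math.hypot(dx, dy)
    return {
        "center": ((left[0] + right[0]) / 2, (left[1] + right[1]) / 2),
        "width": eye_distance * width_scale,
        # Angle of the line between the eyes, in image coordinates (y points down).
        "angle_degrees": math.degrees(math.atan2(dy, dx)),
    }


# Hypothetical landmarks for a level face.
level_face = {
    "left_eye": [(10, 20), (20, 20)],
    "right_eye": [(50, 20), (60, 20)],
}
placement = overlay_placement(level_face)
```

The returned values can then drive an image library's resize, rotate, and paste operations to draw the overlay.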
Example – Snapchat Filters
from PIL import Image, ImageDraw

# image and face_landmarks come from the earlier examples.
pil_image = Image.fromarray(image)
d = ImageDraw.Draw(pil_image, 'RGBA')
lip_fill = (150, 0, 0, 128) # shade of red, 50% alpha
d.polygon(face_landmarks['top_lip'], fill=lip_fill)
d.polygon(face_landmarks['bottom_lip'], fill=lip_fill)

73
Scaling

75
Bias
• People & Demographics
• Is your training set… Coworkers? Single location?
• Environment
• Does it cover… Day and night? Seasons? Lighting
conditions? Backgrounds?
• Sensors
• Did you consider… Diverse hardware? Calibration?
Viewpoint (angle)? Resolution? Occlusion?
76
Optimizations
• It is often easier to simplify the real-world task than
to drastically improve ML models.

77
Optimizations

[Figure: performance vs. time (weeks). Multiple model optimizations ($$$ in developer time, etc.)]

78
Optimizations

[Figure: performance vs. time (weeks). Install a new light ($)]

79
Risks
• What happens when your model makes a mistake?
• How can you deal with adversarial users?
• What is your threat model?

80
Other Considerations
• How do you handle…
• Model getting stale over time?
• Growing search space?
• Large amounts of real-time data?
• Detecting or tracking people vs. faces?
• Speed vs. cost vs. performance trade-offs?

81
Thank you.
gabriel@scalarresearch.com

82
