AIS412 - Lecture 1


Fall 24

Deep Learning for Computer Vision
AIS412
MUSTAFA ELATTAR

*This course material is sourced from Carnegie Mellon University for Computer Vision and Stanford University for the CNN for Visual Recognition course.
What is computer vision?

AIS412 - DEEP LEARNING FOR COMPUTER VISION MUSTAFA ELATTAR 2




Photo by Svetlana Lazebnik

What a person sees


What a computer sees
Photo by Svetlana Lazebnik

Why are we able to interpret this image?


The goal of computer vision is to give computers (super) human-level perception
typical perception pipeline
representation: what should we look at? (image features)
‘fancy math’: easy to get lost in the techniques
output: what can we understand? (semantic segmentation)
The parts that we are most interested in
Important note:
In general, computer vision does not work
(except in certain situations/conditions)
Applications of computer vision

Machine Vision

Automated visual inspection


Object Recognition

Toshiba Tech IS-910T 2013

DataLogic LaneHawk LH4000 2012


Face detection

Age recognition

Sony Cyber-shot

Smile recognition
Face makeovers
BMW 5 series

BMW night vision


“Around view” camera
Infiniti EX
Image stitching
Photosynth
Tango
Ball Possession
Virtual Fitting
Deep Face
Deep Dream
Facebook video style transfer 2016
Industry aggressively hiring CV faculty from universities
Industry aggressively hiring CV graduates, or even students!
(strong dominant industrial presence at conferences for recruitment)
ITCS Vision and Mission
Vision:
To be a world-class school, recognized as one of the top in
the region in research, education and entrepreneurship.

Mission:
The mission of the school is to contribute to the development
of cultural values and to information technology-driven
economies in the region through the pursuit of education,
research, innovation and entrepreneurship at the highest
levels of excellence.



AIS412 - Course Aim
The aim of this course is to provide students with a
comprehensive understanding of deep learning methods
and their practical applications in Computer Vision. By the
end of the course, students will have gained a solid
foundation in the basic components of deep learning
algorithms, and will be able to apply them to real-life
scenarios. The course aims to equip students with the skills
and knowledge necessary to contribute to cutting-edge
developments in the field of computer vision and deep
learning, and to inspire them to think creatively and
innovatively in their use of these methods.




AIS412 - ILOs
A1 Explain the basic components and workings of deep learning algorithms, including CNNs, RNNs, attention mechanisms, encoder-decoder models, and generative models.
A2 List and describe real-life use cases of deep learning, including voice, NLP, and vision applications.
A3 Identify the limitations of traditional machine learning and the advantages of deep learning over machine learning.
B1 Apply deep learning algorithms to real-life scenarios, including developing and training neural networks for various applications.
B2 Investigate and analyze the performance of deep learning models and identify areas for improvement.
B3 Combine and test different deep learning techniques to solve complex problems and explore creative applications.
C1 Design and suggest deep learning solutions for specific use cases and evaluate their effectiveness.
C2 Present and report on the results of deep learning projects, including analysis of performance and areas for improvement.
C3 Link and judge the ethical and social implications of deep learning, including issues of bias and fairness in algorithmic decision-making.
D1 Collaborate with peers in developing and testing deep learning solutions.
D2 Apply entrepreneurial skills to develop and explore creative applications of deep learning.




AIS412 - Syllabus
Date | Topics | Lab | Assignments
30 Sep | Topic 1: Course Introduction; Topic 2: Advanced Image processing - Hough transforms | - | -
7 Oct | Topic 2: Advanced Image processing - Hough transforms; Topic 3: Feature Detection - Corner detection | Lab 1 | -
14 Oct | Topic 3: Feature Detection - Corner detection; Topic 4: Feature Detection - Feature descriptors | Lab 2 | Assignment 1
21 Oct | Topic 5: Stereo Vision | Lab 3 | Assignment 1 Deadline
28 Oct | Topic 6: Motion Tracking: Optical flow (LK, HS) - Tracking (KLT, Mean-Shift); Topic 7: Image Registration: Correspondence Finding | Lab 4 | Assignment 2
4 Nov | Topic 8: Object Features Learning: K-means, Bag of words | Lab 5 | Assignment 2 Deadline
11 Nov | Midterm Week | - | -
18 Nov | Topic 9: Classification, Loss Functions and Optimization; Topic 10: Convolutional Neural Networks; Topic 11: Training Neural Networks | Lab 6 | Project registration (groups of five students)
25 Nov | Topic 12: CNN Architectures (VGG, GoogLeNet, ResNet, etc.); Topic 13: Recurrent Neural Networks | Lab 7 | Assignment 3
2 Dec | Topic 14: Detection and Segmentation | Lab 8 | Assignment 3 Deadline
9 Dec | Topic 15: Generative Models | Lab 9 | -
16 Dec
23 Dec Backup Week
30 Dec
Project Submission, Discussion, and Oral Presentation
6 Jan Study Week
AIS412 – Grading Scheme and Resources
Grading Policy:
● Attendance 3%
● 3 Assignments 18% (3*6)
● Tutorial and Lab 14% (9*1.5)
● Quizzes 5%
● Project 15%
● Midterm 15%
● Final 30%
Handouts:
● Lectures + Labs
● Textbook
● Computer Vision: Algorithms and Applications, 2nd ed. by Richard Szeliski
● Deep Learning by Ian Goodfellow



Team
Instructor:
Dr. Mustafa A. Elattar
Associate Professor
melattar@nu.edu.eg
Office: 210
Office hours: Tuesday 8:30 to 10:00

TA:
Eng. Aly Abdelmegeid
alymohamed@nu.edu.eg
Office: 220
Office hours: Monday 11:00 to 1:00



AIS412
Lecture 2: Hough Transform
MUSTAFA ELATTAR

Hough transform

LECTURE 2: HOUGH TRANSFORM AND CORNER DETECTION 40


Slide Credits
Most of these slides were adapted from:

• Kris Kitani (15-463, Fall 2016).

Some slides were inspired or taken from:

• Fredo Durand (MIT).

• James Hays (Georgia Tech).



Lecture Overview

Finding boundaries

Line fitting

Line parameterizations

Hough transform

Hough circles

Finding Boundaries

Where are the object boundaries?
Human annotated boundaries
Edge detection
Multi-scale edge detection
Edge strength does not necessarily correspond to our
perception of boundaries
Defining boundaries is hard for us too
Where is the boundary of the mountain top?
Applications

Autonomous vehicles (lane line detection)
Tissue engineering (blood vessel counting)
Behavioral genetics (earthworm contours)
Autonomous vehicles (semantic scene segmentation)
Computational photography (image inpainting)
Line Fitting

Line fitting
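Given: Many pairs $(x_i, y_i)$, $i = 1, \dots, N$

Find: Parameters $(m, c)$ of the line $y = mx + c$

Minimize: Average square distance: $E = \frac{1}{N} \sum_i (y_i - m x_i - c)^2$

Using: $\frac{\partial E}{\partial m} = 0$ and $\frac{\partial E}{\partial c} = 0$

Note: with $\bar{x} = \frac{1}{N} \sum_i x_i$ and $\bar{y} = \frac{1}{N} \sum_i y_i$, the solution is $c = \bar{y} - m \bar{x}$ and $m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$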

What are some problems with the approach?
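Before turning to its problems, the least-squares line fit described on this slide can be sketched in a few lines. This is a minimal illustration using NumPy, assuming the slope-intercept model y = m*x + c; the function name `fit_line_least_squares` and the sample data are hypothetical and not part of the lecture material.

```python
import numpy as np


def fit_line_least_squares(x, y):
    """Closed-form least-squares fit of the line y = m*x + c.

    Returns the slope m, the intercept c, and the average squared
    vertical distance E between the points and the fitted line.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of co-deviations divided by sum of squared x deviations.
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the centroid of the points.
    c = y_mean - m * x_mean
    # Average squared vertical distance, the quantity being minimized.
    error = np.mean((y - (m * x + c)) ** 2)
    return m, c, error


# Hypothetical sample: four points lying exactly on y = 2x + 1.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + 1.0
m, c, error = fit_line_least_squares(xs, ys)
```

Note that the slope formula divides by the spread of the x values, so a set of points with identical x (a vertical line) cannot be fitted this way.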


Problems with parameterizations
Where is the line that minimizes E?

Huge E!

Line that minimizes E!!


Problems with noise

Least-squares error fit: squared error heavily penalizes outliers
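A small numerical experiment can make the outlier sensitivity concrete. The sketch below uses `np.polyfit` with degree 1 as the least-squares fit; the data are made up for illustration (ten points on y = x, with one point then moved far off the line).

```python
import numpy as np

# Ten points lying exactly on the line y = x (hypothetical data).
x = np.arange(10, dtype=float)
y_clean = x.copy()

# Same points, but the last measurement is a gross outlier.
y_outlier = y_clean.copy()
y_outlier[-1] = 50.0

# Degree-1 polyfit returns the least-squares [slope, intercept].
slope_clean, intercept_clean = np.polyfit(x, y_clean, 1)
slope_outlier, intercept_outlier = np.polyfit(x, y_outlier, 1)

print(f"clean fit:   slope={slope_clean:.3f}, intercept={intercept_clean:.3f}")
print(f"with outlier: slope={slope_outlier:.3f}, intercept={intercept_outlier:.3f}")
```

In this example a single bad point out of ten more than triples the fitted slope, because its residual enters the error squared.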


Model fitting is difficult because…
• Extraneous data: clutter or multiple models
– We do not know what is part of the model
– Can we pull out models with a few parts from much larger
amounts of background clutter?
• Missing data: only some parts of model are present
• Noise

• Cost:
– It is not feasible to check all combinations of features by
fitting a model to each possible subset

So what can we do?


Line parameterizations

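Slope intercept form

$y = mx + c$ (slope $m$, y-intercept $c$)

Double intercept form

$\frac{x}{a} + \frac{y}{b} = 1$ (x-intercept $a$, y-intercept $b$)

Derivation (similar slope): $\frac{y - b}{x - 0} = \frac{0 - b}{a - 0}$, so $ya + xb = ab$, which gives $\frac{x}{a} + \frac{y}{b} = 1$

Normal Form

$x \cos\theta + y \sin\theta = \rho$

Derivation: $\cos\theta = \frac{\rho}{a}$ gives $a = \frac{\rho}{\cos\theta}$, and $\sin\theta = \frac{\rho}{b}$ gives $b = \frac{\rho}{\sin\theta}$; plug into $\frac{x}{a} + \frac{y}{b} = 1$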
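To connect the parameterizations above, the following sketch converts a slope-intercept line into normal form (rho, theta) and checks that points on the line satisfy x*cos(theta) + y*sin(theta) = rho. The function name and the choice to return a non-negative rho are assumptions made for this illustration.

```python
import math


def slope_intercept_to_normal(m, c):
    """Convert y = m*x + c into normal form x*cos(theta) + y*sin(theta) = rho.

    Rewriting the line as -m*x + y = c and dividing by sqrt(m^2 + 1)
    gives a unit normal (cos(theta), sin(theta)) and the signed distance rho.
    Convention assumed here: return rho >= 0, with theta in [0, 2*pi).
    """
    norm = math.hypot(m, 1.0)
    theta = math.atan2(1.0, -m)   # angle of the normal vector (-m, 1)
    rho = c / norm                # signed distance from the origin
    if rho < 0:
        # Same line, written with the opposite normal direction.
        rho = -rho
        theta += math.pi
    return rho, theta


# Hypothetical example: the line y = -x + 2, i.e. x + y = 2.
rho, theta = slope_intercept_to_normal(-1.0, 2.0)

# A line with a negative intercept: y = x - 1.
rho2, theta2 = slope_intercept_to_normal(1.0, -1.0)
residual = 1.0 * math.cos(theta2) + 0.0 * math.sin(theta2) - rho2  # point (1, 0)
```

A vertical line has no finite slope, so it cannot be passed to this function at all, yet it has a perfectly ordinary normal form (theta = 0, rho = its x position).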
Hough transform

Hough transform
• Generic framework for detecting a parametric model

• Edges don’t have to be connected

• Lines can be occluded


• Key idea: edges vote for the possible models

Image and parameter space
Image space: $y = mx + c$ (variables $x, y$; parameters $m, c$)
Parameter space: $c = -xm + y$ (variables $m, c$; parameters $x, y$)

A line in image space becomes a point in parameter space.

What would a point in image space become in parameter space?
A point in image space becomes a line in parameter space.

Two points become? Two lines in parameter space, which intersect at the $(m, c)$ of the image-space line through both points.

Three points become? Four points become? One line per point; if the points are collinear, all of these lines pass through the same $(m, c)$.

Image space        Parameter space

How would you find the best fitting line?

Image space Parameter space

Is this method robust to measurement noise?


Is this method robust to outliers?
Line Detection by Hough Transform

Algorithm:

1. Quantize parameter space $(m, c)$

2. Create accumulator array $A(m, c)$

3. Set $A(m, c) = 0$ for all $(m, c)$

4. For each image edge point $(x_i, y_i)$:
   For each element in $A(m, c)$:
      If $(m, c)$ lies on the line $c = -x_i m + y_i$:
         Increment $A(m, c) = A(m, c) + 1$

5. Find local maxima in $A(m, c)$
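The voting procedure above can be sketched as follows. This is an illustrative version, not the lecture's own code: instead of testing every accumulator cell, it loops over the quantized slopes and solves for the matching intercept, which casts the same votes with less work. The function name, the grid ranges, and the sample points are all assumptions.

```python
import numpy as np


def hough_lines_mc(points, m_values, c_values):
    """Vote in a quantized (m, c) accumulator for lines y = m*x + c.

    m_values and c_values are evenly spaced 1-D arrays that define the
    quantization of parameter space.
    """
    acc = np.zeros((len(m_values), len(c_values)), dtype=int)
    c_step = c_values[1] - c_values[0]
    for x, y in points:
        for mi, m in enumerate(m_values):
            # The point (x, y) lies on every line with c = -x*m + y.
            c = y - m * x
            ci = int(round((c - c_values[0]) / c_step))
            if 0 <= ci < len(c_values):
                acc[mi, ci] += 1
    return acc


# Hypothetical edge points lying on y = 2x + 1.
points = [(0, 1), (1, 3), (2, 5), (3, 7)]
m_values = np.linspace(-5.0, 5.0, 21)    # slope bins, step 0.5
c_values = np.linspace(-10.0, 10.0, 41)  # intercept bins, step 0.5

acc = hough_lines_mc(points, m_values, c_values)
mi, ci = np.unravel_index(np.argmax(acc), acc.shape)
m_best, c_best = m_values[mi], c_values[ci]
```

The grids here are truncated to a small range by hand; any line whose slope or intercept falls outside them simply receives no votes.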
Problems with parameterization
How big does the accumulator need to be for the parameterization $(m, c)$?

The space of m is huge! The space of c is huge! (Both $m$ and $c$ are unbounded.)


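Better Parameterization
Use normal form: $x \cos\theta + y \sin\theta = \rho$

Given points $(x_i, y_i)$, find $(\rho, \theta)$

Each point in image space maps to a sinusoid in Hough space.

Both $\theta$ and $\rho$ are bounded (finite accumulator array size).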
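A minimal sketch of voting with the normal form is given below. Each edge point traces a sinusoid rho(theta) through a finite accumulator. The function name, the bin counts, the theta range [0, pi) with signed rho, and the sample points are assumptions chosen for this illustration.

```python
import numpy as np


def hough_lines_normal(points, rho_max, n_theta=180, n_rho=301):
    """Vote in a (rho, theta) accumulator using x*cos(theta) + y*sin(theta) = rho.

    theta is sampled in [0, pi) and rho in [-rho_max, rho_max], so the
    accumulator has a fixed, finite size.
    """
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    rhos = np.linspace(-rho_max, rho_max, n_rho)
    rho_step = rhos[1] - rhos[0]
    acc = np.zeros((n_rho, n_theta), dtype=int)
    theta_idx = np.arange(n_theta)
    for x, y in points:
        # One image point becomes a sinusoid in Hough space.
        rho_curve = x * np.cos(thetas) + y * np.sin(thetas)
        rho_idx = np.round((rho_curve - rhos[0]) / rho_step).astype(int)
        valid = (rho_idx >= 0) & (rho_idx < n_rho)
        acc[rho_idx[valid], theta_idx[valid]] += 1
    return acc, rhos, thetas


# Hypothetical edge points on the vertical line x = 3, which the
# slope-intercept form cannot represent.
points = [(3.0, float(y)) for y in range(10)]
acc, rhos, thetas = hough_lines_normal(points, rho_max=15.0)

ri, ti = np.unravel_index(np.argmax(acc), acc.shape)
rho_peak, theta_peak = rhos[ri], thetas[ti]
```

In practice rho_max would be set from the image size (for example the length of the image diagonal), which is what keeps the accumulator finite.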
Image and parameter space
Image space: $y = mx + c$ (variables $x, y$; parameters $m, c$)
Parameter space: $x \cos\theta + y \sin\theta = \rho$ (variables $\rho, \theta$; parameters $x, y$)

A point in image space becomes a wave (sinusoid) in parameter space.

A line in image space becomes a point in parameter space.

Wait … why is rho negative?

There are two ways to write the same line through the point:

Positive rho version: $x \cos\theta + y \sin\theta = \rho$

Negative rho version: $x \cos(\theta + \pi) + y \sin(\theta + \pi) = -\rho$

Recall: $\cos(\theta + \pi) = -\cos\theta$ and $\sin(\theta + \pi) = -\sin\theta$

Two points become? Three points become? Four points become? One sinusoid per point; if the points are collinear, the sinusoids intersect at the $(\rho, \theta)$ of the line through them.

Image space        Parameter space


Implementation

NOTE: Watch your coordinates. Image origin is top left!


Image space Votes
Basic shapes
(in parameter space)

can you guess the shape?

line, rectangle, circle


More complex image
In practice, measurements are noisy…

Image space Votes


Too much noise …

Image space Votes


Effects of noise level
Number of votes for a line of 20 points with increasing noise

(Plot: maximum number of votes vs. noise level)

More noise, fewer votes (in the right bin)


Effects of noise level

(Plot: maximum number of votes vs. number of noise points)

More noise, more votes (in the wrong bin)


Real-world example

Original, Edges, Parameter space, Hough lines


Hough Circles

Let’s assume radius known
Image space: $(x - a)^2 + (y - b)^2 = r^2$ (variables $x, y$; parameters $a, b$)
Parameter space: $(a - x)^2 + (b - y)^2 = r^2$ (variables $a, b$; parameters $x, y$)

What is the dimension of the parameter space? With $r$ known, the parameter space $(a, b)$ is 2D.

What does a point in image space correspond to in parameter space? A circle of radius $r$ centered at $(x, y)$.
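For the known-radius case, each edge point votes for every candidate center at distance r from it, which is a circle in (a, b) space. The sketch below is one possible way to implement that voting; the function name, the angular sampling, the accumulator layout (rows index b, columns index a), and the sample circle are assumptions for illustration.

```python
import numpy as np


def hough_circles_known_radius(points, radius, shape, n_angles=360):
    """Vote for circle centers (a, b) when the radius is known.

    acc[b, a] counts how many edge points are consistent with a circle
    of the given radius centered at column a, row b.
    """
    acc = np.zeros(shape, dtype=int)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    for x, y in points:
        # Candidate centers lie on a circle of the same radius around (x, y).
        a = np.round(x - radius * np.cos(angles)).astype(int)
        b = np.round(y - radius * np.sin(angles)).astype(int)
        # De-duplicate so each edge point votes at most once per cell.
        cells = np.unique(np.stack([b, a], axis=1), axis=0)
        valid = ((cells[:, 0] >= 0) & (cells[:, 0] < shape[0]) &
                 (cells[:, 1] >= 0) & (cells[:, 1] < shape[1]))
        acc[cells[valid, 0], cells[valid, 1]] += 1
    return acc


# Hypothetical edge points: 12 samples on a circle of radius 8
# centered at (a, b) = (20, 15).
t = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
points = list(zip(20.0 + 8.0 * np.cos(t), 15.0 + 8.0 * np.sin(t)))

acc = hough_circles_known_radius(points, radius=8.0, shape=(40, 40))
b_peak, a_peak = np.unravel_index(np.argmax(acc), acc.shape)
```

All twelve voting circles pass through the true center, so that cell collects one vote per edge point.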
What if radius is unknown?
Parameter space: variables $a, b, r$; parameters $x, y$

If radius is not known: 3D Hough Space!

Use accumulator array $A(a, b, r)$

Surface shape in Hough space is complicated
Using Gradient Information
Gradient information can save a lot of computation:

Edge location: $(x_i, y_i)$
Edge direction: $\phi_i$

Assume radius is known: $a = x_i - r \cos\phi_i$, $b = y_i - r \sin\phi_i$

Need to increment only one point in accumulator!
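When an edge direction is available for every edge point, the voting loop shrinks to one increment per point. The sketch below assumes that each direction phi is the angle of the vector pointing from the circle center to the edge point; with the opposite convention the signs flip. The function name and the sample data are hypothetical.

```python
import numpy as np


def hough_circles_with_gradient(points, directions, radius, shape):
    """Cast one vote per edge point using its edge direction.

    Assumption: directions[i] is the angle (radians) of the vector from
    the circle center to edge point i, so the center is found by stepping
    back by the radius along that direction.
    """
    acc = np.zeros(shape, dtype=int)  # acc[b, a]: rows are b, columns are a
    for (x, y), phi in zip(points, directions):
        a = int(round(x - radius * np.cos(phi)))
        b = int(round(y - radius * np.sin(phi)))
        if 0 <= b < shape[0] and 0 <= a < shape[1]:
            acc[b, a] += 1
    return acc


# Hypothetical edge points: 12 samples on a circle of radius 8
# centered at (a, b) = (20, 15), with exact outward directions.
t = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
points = list(zip(20.0 + 8.0 * np.cos(t), 15.0 + 8.0 * np.sin(t)))

acc = hough_circles_with_gradient(points, t, radius=8.0, shape=(40, 40))
b_peak, a_peak = np.unravel_index(np.argmax(acc), acc.shape)
```

Compared with voting along a full circle, the total number of votes here equals the number of edge points, at the price of depending on how accurate the estimated directions are.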
Penny Hough detector, Quarter Hough detector
The Hough transform
Deals with occlusion well?

Detects multiple instances?

Robust to noise?

Good computational complexity?

Easy to set parameters?


Application of Hough transforms
Detecting shape features

F. Jurie and C. Schmid, Scale-invariant shape features for recognition of object categories, CVPR 2004

Original images

Laplacian circles Hough-like circles

Which feature detector is more consistent?


Robustness to scale and clutter
