Pedestrian Detection - Kristina Pickl


Pedestrian Detection

Kristina Pickl
Motivation

basic question:
stopping or crossing?

• guarantee safety
• prevent accidents

→ pedestrian-detection system (PDS)


Overview
• Basics of a PDS
• Different Imageries
• Single Optical Camera Approach
• Multi Camera Approaches
• Two Camera Approach by BMW
• Focused PDS
• Conclusion
Overview
→ Basics of a PDS
– Layout of a PDS
– Classifier
• Different Imageries
• Single Optical Camera Approach
• Multi Camera Approaches
• Two Camera Approach by BMW
• Focused PDS
• Conclusion
Layout of a PDS
PDS

feature extraction
• well selected
→ ensures efficiency
• appearance or motion
(rare cases: both)

classifier
• well structured
• real-time performance

training
• large number of high-quality positive and negative samples
(all same size)
Classifier
• performance depends on
– features
– quality of samples

• if badly designed or trained


– either
• low positive rate
• low false positive rate
• high detection speed
– or
• high positive rate
• low false positive rate
• but no real-time detection speed
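The two rates named on this slide can be computed from labelled test windows. The sketch below assumes that "positive rate" means the detection (true positive) rate; the function name and the sample data are illustrative only.

```python
def rates(labels, predictions):
    """Return (detection_rate, false_positive_rate) for binary labels/predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y and p)          # pedestrians found
    fn = sum(1 for y, p in pairs if y and not p)      # pedestrians missed
    fp = sum(1 for y, p in pairs if not y and p)      # false alarms
    tn = sum(1 for y, p in pairs if not y and not p)  # background rejected
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_positive_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_positive_rate


# Hypothetical evaluation: 4 pedestrian windows, 6 background windows.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
detection_rate, false_positive_rate = rates(labels, predictions)
```

Here three of four pedestrians are found and one of six background windows raises a false alarm.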
Overview
• Basics of a PDS
→ Different Imageries
– Video Camera
– Infrared Camera
– Single Camera Approach
– Camera Combination
• Single Optical Camera Approach
• Multi Camera Approaches
• Two Camera Approach by BMW
• Focused PDS
• Conclusion
Video Camera

• captures light properties of a scene
• long-distance detection
• can also recognise trees, animals, etc. after training
• may already be installed in the car (e.g. lane departure warning)
• low cost
Infrared Camera

• captures thermal properties of a scene
• heat dependent
• high contrast, even in sparsely lit situations
• can be used during night-time
• cannot distinguish between different objects (e.g. animals, humans)
Single Camera Approach

Video Image Features
• component-based gradient responses
• image contours with mean field models
• implicit shape models
• local receptive fields
• histograms of oriented gradients

Infrared Image Features
• thermal hotspots
• body-model templates
• shape-independent multidimensional histograms
• inertial and contrast based features
Camera Combination
• necessary, e.g. consider driving during night-time
• hard to reach acceptable detection rate & speed

Why?
• nearly contrary information
→ hard to find common denominator
• precise depth information not yet possible, too little information
• most standard stereo algorithms fail
Overview
• Basics of a PDS
• Different Imageries
→ Single Optical Camera Approach
– Detection Procedure
– Module 1: Recognition
– Module 2: Distance & Direction
– Training
– Results
• Multi Camera Approaches
• Two Camera Approach by BMW
• Focused PDS
• Conclusion
Detection Procedure

Module 1:
Recognise
human body

Module 2:
Estimate Distance
& Direction
Module 1: Recognition
Input: 2 sequential frames
Output: information of pedestrian

1) intercept region of interest (ROI)
2) constant Z (0 < Z < 1), value N = 0
zoom-factor: Z^N
for (N = 0; N < 7; N++) {
a) intercept 2 images in sliding window (from top left to bottom right) from zoomed ROI
b) extract appearance & motion features (shifting & subtracting image techniques)

Cascaded Classifier
→ statistical-learning classifier
i) compute key features of sliding window
ii) calculate 10 values of classification function with corresponding weights
iii) if sum ≥ threshold → pedestrian candidate (about 100)
→ decomposed SVM classifier
i) compute key features of candidate
ii) calculate f(x) = sgn(∑ support vectors)
iii) if f(x) = 1 → pedestrian, otherwise remove
}
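The scan over zoom levels can be sketched as a window generator. This is an illustration only: the zoom factor 0.8, the step of 8 pixels and the 16 x 32 window (taken from the 32x16 template size, orientation assumed) are assumptions, not values confirmed by the slides.

```python
def sliding_windows(roi_width, roi_height, zoom=0.8, levels=7,
                    win_w=16, win_h=32, step=8):
    """Yield (level, x, y, w, h) boxes expressed in original ROI coordinates."""
    for n in range(levels):
        scale = zoom ** n                                  # zoom factor Z^N
        zoomed_w = int(roi_width * scale)
        zoomed_h = int(roi_height * scale)
        for y in range(0, zoomed_h - win_h + 1, step):     # top to bottom
            for x in range(0, zoomed_w - win_w + 1, step):  # left to right
                # Map the fixed-size window back to the unzoomed ROI.
                yield (n, x / scale, y / scale, win_w / scale, win_h / scale)


windows = list(sliding_windows(64, 64, zoom=0.5, levels=3))
```

Because the window size stays fixed while the ROI shrinks, higher levels correspond to larger (closer) pedestrians in the original image.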
Module 1: Recognition
• Why cascaded classifier?
– statistical-learning classifier was very fast with low false
negative but high false positive rate
– decomposed SVM classifier had high positive and low false
positive rate but slower

1. statistical-learning classifier scans very quickly, reduces the number of candidates
2. decomposed SVM classifier performs accurate classification of remaining objects
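A minimal sketch of such a two-stage cascade follows. The weighted vote stands in for the statistical-learning classifier and a linear-kernel SVM decision function stands in for the decomposed SVM; both stand-ins, all names and all numbers are assumptions made for illustration.

```python
def stage1_pass(weak_values, weights, threshold):
    """Fast stage: weighted sum of weak classifier outputs against a threshold."""
    return sum(w * v for w, v in zip(weights, weak_values)) >= threshold


def svm_sign(x, support_vectors, coeffs, bias):
    """Slow stage: f(x) = sgn(sum_i coeff_i * <s_i, x> + bias), linear kernel assumed."""
    total = bias + sum(c * sum(a * b for a, b in zip(s, x))
                       for s, c in zip(support_vectors, coeffs))
    return 1 if total >= 0 else -1


def cascade(windows, weights, threshold, support_vectors, coeffs, bias):
    """windows: list of (weak_values, feature_vector); returns indices of detections."""
    detections = []
    for i, (weak_values, features) in enumerate(windows):
        if not stage1_pass(weak_values, weights, threshold):
            continue                      # rejected cheaply, the SVM never runs
        if svm_sign(features, support_vectors, coeffs, bias) == 1:
            detections.append(i)
    return detections


windows = [([1, 1], [1.0, 0.0]),   # passes both stages
           ([1, 0], [1.0, 0.0]),   # rejected by the fast stage
           ([1, 1], [0.0, 1.0])]   # rejected by the SVM stage
found = cascade(windows, [1, 1], 1.5, [[1.0, 0.0]], [1.0], -0.5)
```

The expensive classifier only sees the windows that survive the cheap one, which is where the speed-up comes from.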
Module 2: Distance & Direction
1. Distance Estimation
according to zoom scale
2. Direction Identification
i) small scale weighted template tree
– template = typical human shape
obtained from real traffic
– built with coevolutionary algorithm
(large search space, self-adaptive
subpopulation size)
– typically 4000 templates, fewer than 50 chosen
ii) distance-transform (DT) algorithm
– selects 30 representative templates
and matches the pedestrian by
scaling similarity between 2 binary
images
– results in sum of DT values
(small value: similar)
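The DT matching step can be illustrated on tiny binary images. This sketch uses a brute-force Euclidean distance transform for brevity; the slides do not state which distance metric or implementation is used, and the function names are hypothetical.

```python
import math


def distance_transform(edges):
    """Distance from every pixel to the nearest edge pixel (needs at least one edge)."""
    points = [(r, c) for r, row in enumerate(edges)
              for c, v in enumerate(row) if v]
    return [[min(math.hypot(r - pr, c - pc) for pr, pc in points)
             for c in range(len(edges[0]))]
            for r in range(len(edges))]


def dt_score(template, dt, top, left):
    """Sum of DT values under the template's edge pixels; a small value means similar."""
    return sum(dt[top + r][left + c]
               for r, row in enumerate(template)
               for c, v in enumerate(row) if v)


edges = [[0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0]]
template = [[1], [1], [1]]          # a vertical stroke as a toy "shape"
dt = distance_transform(edges)
aligned = dt_score(template, dt, 1, 2)
shifted = dt_score(template, dt, 1, 3)
```

The aligned placement scores lower than the shifted one, so it would be preferred; choosing among several templates works the same way, by taking the minimum score.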
Training

[Figure: training samples for the Statistical-Learning Classifier and the Decomposed SVM Classifier; rows: motion, appearance]
Results
• sample pairs
– 3 600 positive & 3 000 negative (manually generated)
– 1 000 000 negative for classifier training (automatically)
• detection performance
– depends on traffic condition
– template size: 32x16
– suitable for 60 km/h and 25 m distance
• estimation error
– pedestrian 12 m away: around 2.5 m
– pedestrian 18 m away: around 5 m
• shelter (occlusion) ratio has more influence than distance

► precise depth estimation not possible!
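One plausible reading of "distance estimation according to zoom scale" is a pinhole-camera estimate from the zoom level at which the fixed-size window matched. The sketch below is built entirely on assumptions: a zoom factor of 0.8, a focal length of 800 pixels, a pedestrian height of 1.7 m and a 32-pixel window height are illustrative values, not figures from the slides.

```python
def distance_for_level(n, zoom=0.8, focal_px=800.0,
                       person_height_m=1.7, win_h=32):
    """Pinhole estimate: distance = focal length * real height / pixel height."""
    pixel_height = win_h / zoom ** n      # pedestrian height in the unzoomed image
    return focal_px * person_height_m / pixel_height


# Only seven discrete distances are available, one per zoom level.
levels = [round(distance_for_level(n), 1) for n in range(7)]
```

Under these assumptions neighbouring levels differ by a fixed ratio, so the gap between representable distances grows with range, which fits the coarse estimates reported on this slide.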


Overview
• Basics of a PDS
• Different Imageries
• Single Optical Camera Approach
→ Multi Camera Approaches
– Disparity Based Obstacle Detection
– Result
– Trifocal Framework
– Results
– Reduction to Two Cameras
– Matching
– Results
• Two Camera Approach by BMW
• Focused PDS
• Conclusion
Disparity Based Obstacle Detection
• input
– 2 cameras (IR & colour) to test
which has better results
• stereo based detection
– reliable estimation of 3D position
Dense Stereo Matching

• produces disparity estimates


• correspondence-matching
algorithm
• can be used for colour and IR
images
Disparity Image Generation
• disparity image
– histogram with density of disparity
values for each column or row
• u-disparity image (top)
– 3 horizontal lines: pedestrians
– top horizontal line: background
plane
• v-disparity image (bottom)
– front peak: pedestrians
– back peak: background plane
– downward-sloping trend: estimated
ground plane
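The two histogram images can be built directly from a disparity map. The sketch below assumes integer disparities stored as a list of rows; the toy map is made up, with ground disparity growing towards the bottom and one obstacle at disparity 3.

```python
def v_disparity(disp, max_d):
    """v_disp[row][d] = number of pixels in that image row with disparity d."""
    out = [[0] * (max_d + 1) for _ in disp]
    for r, row in enumerate(disp):
        for d in row:
            if 0 <= d <= max_d:
                out[r][d] += 1
    return out


def u_disparity(disp, max_d):
    """u_disp[d][col] = number of pixels in that image column with disparity d."""
    out = [[0] * len(disp[0]) for _ in range(max_d + 1)]
    for row in disp:
        for c, d in enumerate(row):
            if 0 <= d <= max_d:
                out[d][c] += 1
    return out


disp = [[0, 0, 0, 0],
        [1, 3, 3, 1],
        [2, 3, 3, 2],
        [3, 3, 3, 3]]
v_img = v_disparity(disp, 3)
u_img = u_disparity(disp, 3)
```

In `u_img` the obstacle appears as a run of high counts in the row for disparity 3; in `v_img` the ground shows up as a diagonal trend across rows.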
Ground Plane Estimation

• v-disparity image
– for each column: select lowest
pixel location as candidate
ground plane point
– line fit: weighted least squares
and bisquare weighting
function
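A weighted least-squares line fit with bisquare weighting is usually run as iteratively reweighted least squares. The sketch below is a generic version of that scheme; the tuning constant 4.685, the MAD-based scale and the iteration count are common defaults assumed here, not values from the slides.

```python
import statistics


def weighted_line_fit(xs, ys, ws):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def bisquare_line_fit(xs, ys, iterations=10, c=4.685):
    """Refit repeatedly, down-weighting points with large residuals."""
    ws = [1.0] * len(xs)
    for _ in range(iterations):
        slope, intercept = weighted_line_fit(xs, ys, ws)
        res = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        scale = statistics.median(abs(r) for r in res) / 0.6745  # robust sigma
        if scale < 1e-12:
            break                                  # fit is already (almost) exact
        ws = [(1 - (r / (c * scale)) ** 2) ** 2 if abs(r) < c * scale else 0.0
              for r in res]
    return slope, intercept


# Candidate ground points on d = 2 * v + 1, plus one outlier from an obstacle.
rows = list(range(10))
disparities = [2 * v + 1 for v in rows]
disparities[5] = 100
slope, intercept = bisquare_line_fit(rows, disparities)
```

The outlier receives zero weight after the first pass, so the fitted line follows the remaining candidate points.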
Bounding Box Generation
• ROI
– u-disparity image: scan rows for continuous spans where disparity is above threshold
– v-disparity image: select columns where sum of disparity above ground plane exceeds threshold
• Bounding Box
– widths of bounding boxes: ROI of u-disparity image
– heights of bounding boxes: ROI of v-disparity image
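Scanning a u-disparity row for continuous spans above a threshold can be sketched as follows; the function name and the sample row are illustrative assumptions.

```python
def spans_above(row, threshold):
    """Return inclusive (start, end) index ranges where values exceed the threshold."""
    spans, start = [], None
    for i, value in enumerate(row):
        if value > threshold and start is None:
            start = i                          # a span opens here
        elif value <= threshold and start is not None:
            spans.append((start, i - 1))       # the span closed on the previous index
            start = None
    if start is not None:
        spans.append((start, len(row) - 1))    # span runs to the end of the row
    return spans


u_row = [0, 3, 4, 0, 0, 5, 5, 5]
box_columns = spans_above(u_row, 2)
```

Each span gives the left and right column of one candidate bounding box.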
Candidate Filtering and Merging
• overlapping bounding boxes
(if a pedestrian is close to the camera, disparities span a range of values)
→ merge boxes if disparities are close together
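A minimal sketch of the merging rule follows, assuming boxes are stored as (x0, y0, x1, y1, disparity) tuples. The disparity tolerance of 2 and the averaging of the merged disparity are assumptions made for illustration.

```python
def overlaps(a, b):
    """True if two (x0, y0, x1, y1, disparity) boxes intersect in the image."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def merge_boxes(boxes, max_disp_gap=2):
    """Repeatedly merge overlapping boxes whose disparities are close together."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if overlaps(a, b) and abs(a[4] - b[4]) <= max_disp_gap:
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]),
                                (a[4] + b[4]) / 2)
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break                      # restart the scan after every merge
    return boxes


candidates = [(0, 0, 10, 20, 30), (5, 0, 15, 20, 31), (5, 0, 15, 20, 10)]
result = merge_boxes(candidates)
```

The first two boxes are fused into one; the third overlaps them in the image but lies at a very different disparity, so it stays separate.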
Results
• very accurate detection
• low false positive rate

BUT images only contained fully visible pedestrians and no other objects
→ not sufficient in real-world driving scenarios
• additional filtering necessary (if non-pedestrian and pedestrian bounding boxes overlap)
• more robust bounding-box features necessary (e.g. size, disparity, aspect ratio are not sufficient)

→ combination of video and IR camera to obtain better disparity features
Trifocal Framework
• combines benefits of colour, disparity and IR image features
• 2 colour, 1 IR camera

• disparity based obstacle detection
• calculate element histogram: gradient, width × height × orientation

trifocal tensor
• set of matrices relating to correspondence between 3 images
• estimation: minimisation of algebraic error of point correspondence
• normally 7 point-to-point, in practice more for error smoothing
Results
• combinations colour + IR and colour + disparity: outperformed by colour trained detector
• suspected reason: gradient based features not suitable for distinguishing in low-contrast disparity and IR images
• adding only disparity or IR just adds more noise

• nevertheless best performance: colour + disparity + IR

→ anticipated accuracy gain from using more dissimilar features


Reduction to Two Cameras

- cross-spectral stereo approach: 1 colour, 1 IR camera
- optical-flow techniques for motion
- foreground-matching
Matching
1) for a given column i: fix window on foreground of 1st, slide over 2nd
image at column i+d
2) maximise mutual information between 2 correspondence windows
→ choose best disparity at each pixel → entry in disparity voting
matrix
3) good match → for single person at certain distance: large number
of votes for single disparity value
4) best disparity value has corresponding confidence value
5) high confidence → mutual information maximised for large number
of correspondence windows → value more likely to be accurate
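Steps 1 and 2 can be illustrated on one image row. The sketch assumes pixel values already quantised to a few levels and shows only the per-window search; the voting matrix and confidence value are omitted, and all names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (bits) between two equally long sequences of pixel levels."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def best_disparity(first_row, second_row, col, width, max_d):
    """Fix a window at `col` in the first row, slide it over the second row."""
    window = first_row[col:col + width]
    scores = {}
    for d in range(max_d + 1):
        if col + d + width <= len(second_row):
            scores[d] = mutual_information(window, second_row[col + d:col + d + width])
    return max(scores, key=scores.get)


colour_row = [0, 0, 1, 1, 0, 0, 0, 0]
ir_row = [1, 1, 1, 1, 0, 0, 1, 0]     # inverted contrast, shifted by 2 columns
d = best_disparity(colour_row, ir_row, col=0, width=4, max_d=4)
```

Mutual information rewards consistent co-occurrence rather than equal intensities, so it still peaks at the right shift when the IR contrast is inverted relative to the colour image.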

[Figure: “normal” disparity image, IR foreground pixels, colour foreground pixels]


Results
• suitable for low-speed
(~15 km/h)
• useful in
– parking lots
– residential and shopping
areas
– starting and stopping at
traffic signals
• detects static objects via
long-term tracking
• could be adapted to
higher speeds
Overview
• Basics of a PDS
• Different Imageries
• Single Optical Camera Approach
• Multi Camera Approaches
→ Two Camera Approach by BMW
– Setup
– Multi Spectral Stereo Algorithm
– Epipolar Constraint
– Classification
– Results
• Focused PDS
• Conclusion
Setup
Multi Spectral Stereo Algorithm
• based on Active Contour Models (contour features)
• aim: energy minimum
– internal energies: depend on model shape
(continuous and smooth curve)
– external energies: depend on image content
(gradient magnitude or illuminance)
• stable state: no more change in model outline
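The energy terms can be made concrete with a small greedy contour update. This is a generic sketch, not the algorithm from the slides: the greedy neighbourhood search, the weights `alpha` and `beta`, and the use of a gradient-magnitude grid as external energy are all assumptions chosen for brevity.

```python
def local_energy(prev, cur, nxt, grad, alpha, beta):
    """Internal energy (continuity + curvature) minus external gradient magnitude."""
    continuity = (cur[0] - prev[0]) ** 2 + (cur[1] - prev[1]) ** 2
    curvature = ((prev[0] - 2 * cur[0] + nxt[0]) ** 2
                 + (prev[1] - 2 * cur[1] + nxt[1]) ** 2)
    return alpha * continuity + beta * curvature - grad[cur[0]][cur[1]]


def greedy_snake(points, grad, alpha=0.1, beta=0.1, max_iterations=100):
    """Move each (row, col) point to its best 3x3 neighbour until nothing changes."""
    points = list(points)
    rows, cols = len(grad), len(grad[0])
    for _ in range(max_iterations):
        moved = False
        for i in range(len(points)):
            prev, nxt = points[i - 1], points[(i + 1) % len(points)]
            r0, c0 = points[i]
            best = (r0, c0)
            best_e = local_energy(prev, best, nxt, grad, alpha, beta)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    cand = (r0 + dr, c0 + dc)
                    if 0 <= cand[0] < rows and 0 <= cand[1] < cols:
                        e = local_energy(prev, cand, nxt, grad, alpha, beta)
                        if e < best_e:
                            best, best_e = cand, e
            if best != (r0, c0):
                points[i], moved = best, True
        if not moved:
            break                      # stable state: the outline no longer changes
    return points


ramp = [[0, 1, 2], [1, 2, 3], [2, 3, 4]]          # toy gradient-magnitude map
attracted = greedy_snake([(0, 0)], ramp, alpha=0, beta=0)
```

With the internal terms switched off, the single point simply climbs to the strongest gradient; with non-zero `alpha` and `beta` the contour is also pulled towards a short, smooth shape.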
Multi Spectral Stereo Algorithm
Remarks
– exclude image gradient magnitudes whose directions
differ by more than a threshold from the
active contour model gradient
(strong background gradients could be included in active
contour)
– gradient vector flow: extend the gradient map
away from edges to homogeneous regions
(active contours have to be guided towards boundary
concavities e.g. head-body region )
Epipolar Constraint
• restrict entire search space
• set of models cover restricted
search space
• obtained by back-projection
between max. and min.
distance
• distance of pedestrian
deduced via triangulation
(might be inaccurate, solve
minimisation problem with
singular value decomposition)
Classification
• starting ellipse selected based on thermal hotspots
in IR image
• feature set based on normalised histograms of
oriented gradient descriptors
• classification: support vector machine trained
with feature vectors
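A normalised histogram of oriented gradients for a single cell can be sketched as below. It is a simplified illustration: 9 unsigned orientation bins, central differences and hard binning without interpolation are assumptions, and block-level normalisation is left out.

```python
import math


def hog_cell_histogram(cell, bins=9):
    """L2-normalised histogram of unsigned gradient orientations for one cell."""
    hist = [0.0] * bins
    for r in range(1, len(cell) - 1):
        for c in range(1, len(cell[0]) - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]      # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]      # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0   # unsigned, 0..180
            hist[int(angle // (180.0 / bins)) % bins] += magnitude
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0  # avoid dividing by zero
    return [h / norm for h in hist]


left_to_right = [[0, 1, 2, 3] for _ in range(4)]      # intensity rises along columns
top_to_bottom = [[r] * 4 for r in range(4)]           # intensity rises along rows
h_edge = hog_cell_histogram(left_to_right)
v_edge = hog_cell_histogram(top_to_bottom)
```

Concatenating such histograms over all cells of a candidate window gives the feature vector that a support vector machine would be trained on.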
Results
• best performance if pedestrian is centred and
surrounded by a small margin
• real-time performance on a 2.16 GHz dual-core processor
(calculation of one model)
• calculation time
– 10% gradient image
– 55% active contour model
– 15% triangulation
– 20% classification
Overview
• Basics of a PDS
• Different Imageries
• Single Optical Camera Approach
• Multi Camera Approaches
• Two Camera Approach by BMW
→ Focused PDS
– Pedestrian – But What’s Next?
– Enhanced PDS
– Example
• Conclusion
Pedestrian – But What’s Next?
so far
detection with no
correspondence to
surrounding environment

[Figure: dangerous position vs. safe position]

→ alarm driver or brake in case of dangerous position or if system fails
Enhanced PDS
1) assess scenario (laser scanner system)
2) locate critical areas (laser scanner system)
3) search for pedestrian only in critical areas (monocular vision system)
→ reduce computational time

Requirements
• quick detection
• as soon as they
appear
Example: 50 km/h, 40 m
1. classification (only by size and shape)
• possible pedestrian
• road border
• L-shaped obstacle
• moving obstacle
• generic obstacle
2. classification
• moving (little overlapping)
→ discarded
• static (much overlapping)
→ important
• changing shape (overlapping in some regions)
→ basic importance
• new (no correspondence between old and new)
→ vision system

[Figure legend: position provided by previous scans; mobile obstacles; static obstacles; changing shape obstacles; left and right border of risk area]
Overview
• Basics of a PDS
• Different Imageries
• Single Optical Camera Approach
• Multi Camera Approaches
• Two Camera Approach by BMW
• Focused PDS
→ Conclusion
Conclusion
• Enhanced PDS
– important for future development
– replace laser scanner system by low cost system
– include more typical urban traffic scenarios

• Future Work
– hardware/software co-designed PDS
– resolution improvement of camera
– include higher speeds, e.g. motorway
– interaction with rest of the car (collision avoidance management)
– integrate driving dynamics
– integrate tracking of pedestrians
– intelligent infrastructures
Thank you for your attention.

Any questions?
