
Computational Photography

SIGGRAPH 2006 Course 15 Notes

Organizers

Ramesh Raskar
Senior Research Scientist
MERL - Mitsubishi Electric Research Labs
201 Broadway
Cambridge, MA 02139
Email: raskar@merl.com
http://www.merl.com/people/raskar/

Jack Tumblin
Assistant Professor
Computer Science Dept., Northwestern University
1890 Maple Avenue, Evanston IL 60201,
Email: jet@cs.northwestern.edu
http://www.cs.northwestern.edu/~jet

Course Web Page

http://www.merl.com/people/raskar/photo/

Contents

• Cover
o Course Description
o Prerequisites
o Course Schedule
o Lecturer Biographies
o Course Syllabus

• Course Slides

• Papers
• Code
• Bibliography

Acknowledgements
We would like to thank Amit Agrawal for his help on several aspects of this course and for
providing the pseudo-code included in these notes.

We would also like to thank the following for providing us their slides and videos which have
been extensively used throughout this course.

Stanford University: Bennett Wilburn, Vaibhav Vaish, Ren Ng

University of Washington: Aseem Agarwala

MERL/Brown University: Morgan McGuire

University of Pennsylvania: Jianbo Shi

Course Description
Digital photography is evolving rapidly with advances in electronic sensing, processing
and storage. The emerging field of computational photography attempts to exploit
cheaper and faster computing to overcome the physical limitations of a camera, such as
dynamic range, resolution or depth of field, and to extend the possible range of
applications. These computational techniques range from modifying imaging parameters
during capture to applying modern reconstruction methods to the captured samples.

Many ideas in computational photography are still relatively new to digital artists and
programmers, even though they are familiar with photography and image manipulation
techniques. A larger problem is that a multi-disciplinary field combining ideas from
computational methods and modern digital photography involves a steep learning curve.
For example, photographers are not always familiar with the advanced algorithms now
emerging to capture high dynamic range images, while image processing researchers
may be unfamiliar with the capture and noise issues in digital cameras. These
topics, however, can be learned without extensive background. The goal of this
course is to present both aspects in a compact form.

The new capture methods include sophisticated sensors, electromechanical actuators
and on-board processing. Examples include adaptation to sensed scene depth and
illumination, taking multiple pictures while varying camera parameters, or actively
modifying the flash illumination. A class of modern reconstruction methods is also
emerging: these methods can achieve a 'photomontage' by optimally fusing information
from multiple images, improve the signal-to-noise ratio, and extract scene features such as
depth edges. The course briefly reviews fundamental topics in digital imaging and then
provides a practical guide to underlying techniques beyond image processing, such as
gradient-domain operations, graph cuts, bilateral filters and optimizations.

The participants learn about topics in image capture and manipulation methods for
generating compelling pictures for computer graphics and for extracting scene
properties for computer vision, with several examples. We hope to provide enough
fundamentals to satisfy the technical specialist without intimidating the curious graphics
researcher interested in photography.

Prerequisites
A basic understanding of camera operation and image processing is required.
Familiarity with concepts of linear systems, convolution, and machine vision will be
useful.

Photographers, digital artists, image processing programmers and vision researchers
using or building applications for digital cameras or images will learn about camera
fundamentals and powerful computational tools, along with many real-world examples.

Course Schedule
A.1 Introduction (Raskar, 5 mins)
A.2 Concepts in Computational Photography (Tumblin, 15 mins)
A.3 Understanding Film-like Photography (Tumblin, 10 mins)
A.4 Image Processing Tools (Raskar, 10 mins)
Q&A (5 mins)

B.1 Image Reconstruction Techniques (Tumblin, 20 mins)
B.2 Computational Camera: Convergence of Optics and Software (Nayar, 35 mins)
Q&A (5 mins)
Break (15 mins)

C.1 Computational Imaging in the Sciences (Levoy, 35 mins)
Q&A (5 mins)

D.1 Computational Illumination (Raskar, 15 mins)
D.2 Smart Optics, Modern Sensors and Future Cameras (Raskar, 20 mins)

E.1 Panel Discussion and Q&A (Nayar, Levoy, Raskar, Tumblin, 30 mins)

Topics not covered: film cameras, advanced optics, traditional image processing, image-based
rendering (IBR) and novel view synthesis, lighting and illumination hardware technologies,
camera-assisted projector display systems, geometric and photometric calibration
techniques, compression, storage, photo-editing software packages, file formats and standards.

Course Web Page http://www.merl.com/people/raskar/photo/

Lecturer Biographies

Marc Levoy
Marc Levoy is an Associate Professor of Computer Science and Electrical
Engineering at Stanford University. He received his PhD in Computer
Science from the University of North Carolina in 1989. In the 1970's Levoy
worked on computer animation, developing an early computer-assisted
cartoon animation system. In the 1980's Levoy worked on volume rendering,
a family of techniques for displaying sampled three-dimensional functions,
such as CT and MR data. In the 1990's he worked on technology and
algorithms for 3D scanning. This led to the Digital Michelangelo Project, in
which he and a team of researchers spent a year in Italy digitizing the statues
of Michelangelo using laser rangefinders. His current interests include light
field sensing and display, computational imaging, and digital photography.
Levoy received the NSF Presidential Young Investigator Award in 1991 and
the SIGGRAPH Computer Graphics Achievement Award in 1996 for his work
in volume rendering.

http://graphics.stanford.edu/~levoy/

Shree Nayar
Shree K. Nayar received his PhD degree in Electrical and Computer
Engineering from the Robotics Institute at Carnegie Mellon University in 1990.
He heads the Columbia Automated Vision Environment (CAVE), which is
dedicated to the development of advanced computer vision systems. His
research is focused on three areas: the creation of novel vision sensors, the
design of physics-based models for vision, and the development of algorithms
for scene understanding. His work is motivated by applications in the fields of
digital imaging, computer graphics, and robotics. Professor Nayar has
received best paper awards at ICCV 1990, ICPR 1994, CVPR 1994, ICCV
1995, CVPR 2000 and CVPR 2004. He is the recipient of the David and
Lucile Packard Fellowship (1992), the National Young Investigator Award
(1993), the NTT Distinguished Scientific Achievement Award (1994), and the
Keck Foundation Award for Excellence in Teaching (1995). He has published
over 100 scientific papers and has several patents on inventions related to
vision and robotics.
http://www.cs.columbia.edu/~nayar/

Ramesh Raskar
Ramesh Raskar is a Senior Research Scientist at MERL. His research
interests include projector-based graphics, computational photography and
non-photorealistic rendering. He has published several articles on imaging
and photography including multi-flash photography for depth edge detection,
image fusion, gradient-domain imaging and projector-camera systems. His
papers have appeared in SIGGRAPH, EuroGraphics, IEEE Visualization,
CVPR and many other graphics and vision conferences. He was a course
organizer at Siggraph 2002 through 2005. He was the panel organizer at the
Symposium on Computational Photography and Video in Cambridge, MA in
May 2005 and taught a graduate level class on Computational Photography at
Northeastern University, Fall 2005. He is a member of the ACM and IEEE.

http://www.merl.com/people/raskar/

Jack Tumblin
Jack Tumblin is an assistant professor of computer science at Northwestern
University. His interests include novel photographic sensors and lighting
devices to assist museum curators in historical preservation, computer
graphics and visual appearance, and image-based modeling and rendering.
During his doctoral studies at Georgia Tech and post-doc at Cornell, he
investigated tone-mapping methods to depict high-contrast scenes. His MS in
Electrical Engineering (December 1990) and BSEE (1978), also from Georgia
Tech, bracketed his work as co-founder of IVEX Corp., (>45 people as of
1990) where his flight simulator design work was granted 5 US Patents. He
was co-organizer of a course on Computational Photography at Siggraph
2005. He is an Associate Editor of ACM Transactions on Graphics.

http://www.cs.northwestern.edu/~jet

Copyright Information

The following copyright information appears on the first page of each corresponding paper.

© 2006 IEEE. Reprinted, with permission, from (…).

© 2006 ACM. Reprinted, with permission, from (…).

All trademarks are the property of their respective owners.


Papers and Photos are copyrighted and shall not be reproduced without
permission.

Course 15: Computational Photography

Course Web Page:
http://www.merl.com/people/raskar/photo

Organizers:
Ramesh Raskar, Mitsubishi Electric Research Labs
Jack Tumblin, Northwestern University

Course Evaluation: http://www.siggraph.org/courses_evaluation

Welcome

• Understanding Film-like Photography
  – Parameters, Nonlinearities, Ray-based concepts
• Image Processing and Reconstruction Tools
  – Multi-image Fusion, Gradient domain, Graph Cuts
• Improving Camera Performance
  – Better dynamic range, focus, frame rate, resolution
• Future Directions
  – HDR cameras, Gradient sensing, Smart optics/lighting

Speaker: Marc Levoy
Marc Levoy is an Associate Professor of Computer Science and Electrical Engineering at Stanford University (see Lecturer Biographies).
http://graphics.stanford.edu/~levoy/

Speaker: Shree Nayar
Shree K. Nayar is a Professor at Columbia University (see Lecturer Biographies).
http://www.cs.columbia.edu/~nayar/

Speaker: Ramesh Raskar
Ramesh Raskar is a Senior Research Scientist at MERL (see Lecturer Biographies).
http://www.merl.com/people/raskar/raskar.html
Speaker: Jack Tumblin
Jack Tumblin is an Assistant Professor of Computer Science at Northwestern University (see Lecturer Biographies). He is an Associate Editor of ACM Transactions on Graphics, was a member of the SIGGRAPH Papers Committee (2003, 2004), and in 2001 was a Guest Editor of IEEE Computer Graphics and Applications.
http://www.cs.northwestern.edu/~jet

Opportunities – Unlocking Photography
• How to expand camera capabilities
• Digital photography that goes beyond film-like photography
– Opportunities
• Computing corrects for lens, sensor and lighting limitations
• Computing merges results from multiple images
• Computing reconstructs from coded image samples
• Cameras benefit from computerized light sources
– Think beyond post-capture image processing
• Computation well before image processing and editing
– Learn how to build your own camera-toys

Traditional Photography
[Diagram: Lens → Detector → Pixels → Image]

Computational Photography: Optics, Sensors and Computations
[Diagram: Generalized Optics (4D Ray Bender) → Generalized Sensor (up to 4D Ray Sampler) → Computations (Ray Reconstruction) → Picture]

Computational Photography
[Diagram, repeated with progressive additions across several slides: Novel Cameras (Generalized Optics → Generalized Sensor → Processing), Novel Illumination (Light Sources, Modulators, 4D Incident Lighting), the Scene as an 8D Ray Modulator, a 4D Light Field, and a Display that recreates the 4D light field]

A Teaser: Dual Photography
[Setup: a projector illuminates the scene and a photocell measures the reflected light]
3
A Teaser: Dual Photography
[Setup: projector, photocell, camera and scene]

The 4D transport matrix: the contribution of each projector pixel to each camera pixel (projector → scene → camera).
The 4D transport matrix: which projector pixels contribute to each camera pixel? (Sen et al., SIGGRAPH 2005)
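The transport matrix view above can be written as a linear system: if p is the vector of projector pixel intensities and c the vector of camera pixel intensities, then c = T p, and (by Helmholtz reciprocity) the dual photograph is obtained with the transpose of T. A minimal numpy sketch, not from the course notes; the tiny random "transport matrix" here is made up purely for illustration:

```python
import numpy as np

# Hypothetical toy transport matrix T: T[i, j] is the fraction of light from
# projector pixel j that reaches camera pixel i (Sen et al., SIGGRAPH 2005).
rng = np.random.default_rng(0)
T = rng.random((16, 9)) * 0.1        # 16 camera pixels, 9 projector pixels

p = np.ones(9)                        # projector pattern (all pixels on)
c = T @ p                             # primal photograph: camera measurement

# Dual photography: swap the roles of projector and camera via T transpose.
p_dual = np.ones(16)                  # "dual light" pattern on the camera side
c_dual = T.T @ p_dual                 # dual photograph, as seen from the projector
print(c.shape, c_dual.shape)          # (16,) (9,)
```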

Dual photography from diffuse reflections: the camera's view (Sen et al., SIGGRAPH 2005)

A Brief History of Images
1558: Camera Obscura, Gemma Frisius

4
A Brief History of Images
1558: Camera Obscura, Gemma Frisius
1568: Lens-Based Camera Obscura
1837: Still Life, Louis Jacques Mandé Daguerre
1970: Silicon Image Detector
1994: Digital Cameras

Dream of A New Photography

                         Old          New
• People and Time        ~Cheap       Precious
• Each photo             Precious     Free
• Lighting               Critical     Automated*
• External Sensors       No           Yes
• 'Stills / Video'       Disjoint     Merged
• Exposure Settings      Pre-select   Post-Process
• Exposure Time          Pre-select   Post-Process
• Resolution/noise       Pre-select   Post-Process
• 'HDR' range            Pre-select   Post-Process

5
Course: Computational Photography

A2:
What is the Ideal
Photographic Signal?

Jack Tumblin
Northwestern University

What is Photography?

Safe answer:

A wholly new,
expressive medium
(ca. 1830s)

• Manipulated display of what we think, feel, want, …


– Capture a memory, a visual experience in tangible form
– ‘painting with light’; express the subject’s visual essence
– "Exactitude is not the truth." –Henri Matisse

1
What is Photography?
• A ‘bucket’ word: a neat container for messy notions
(e.g. aviation, music, comprehension)

• A record of what we see,


or would like to see,
in tangible form.
• Does photography
always capture it? no.

• So, what do we see?


Harold ‘Doc’ Edgerton 1936

‘Film-Like’ Photography
Film Camera designs still dominate:
– ‘Instantaneous’ light measurement…
– Of focal plane image behind a lens.
– Reproduce those amounts of light;
– EXACT MATCH!

Implied:
“What we see is ≅
focal-plane intensities.”
well, no…

2
Why we like Photography
[Diagram: the PHYSICAL 3D scene (light sources, BRDFs, shapes, positions, movements; eyepoint position, movement, projection) passes through Light & Optics and Exposure Control to form the image I(x,y,λ,t); a tone map produces the displayed RGB(x,y,tn), the tangible record, editable and storable as film or pixels; Display and Vision then yield the PERCEIVED scene.]

What’s Beyond Film-Like Photography?


Thought Experiment:
Side-by-side digital camera & film camera.

• COMPARE:
– Digital Camera result.
– Digitized (Scanned) Film result.

? Can we See more, Do more, Feel more?


? Has photography really changed yet?

3
Goals for a New Photography
[Diagram: the PHYSICAL 3D scene (light sources, BRDFs, shapes, positions, movements, eyepoint) passes through Light & Optics to Sensor(s) and Computing, producing a Visual Stimulus and a Tangible Record (estimates we can capture, edit, store and display); Vision then yields the PERCEIVED or UNDERSTOOD "3?D" scene and its meaning.]

Missing: Dynamic Display, Interactive…

What other ways better reveal appearance to human viewers (without direct shape measurement)?

Can you understand this shape better?
[Time for space wiggle. Gasparini, 1998.]

4
Missing: More Revealing Sets of Rays
"Multiple-Center-of-Projection Images", Rademacher, P., Bishop, G., SIGGRAPH '98

Taking Images versus Taking Pictures

Image: A copy of light intensities.
(Just one kind of picture, made by copying a scaled map of scene light intensities, as a lens might)

Visual Appearance: What we think we see.
(Consciously-available estimates of our surroundings, made from the light reaching our eyes)

Picture: A 'container' for visual appearance.
(Something we make to hold what we see, or what we would like to see. Includes non-photorealistic drawings)

5
Photographic Signal: Pixels Rays
• Core ideas are ancient, simple, seem obvious:
– Lighting: ray sources
– Optics: ray bending/folding devices
– Sensor: measure light
– Processing: assess it
– Display: reproduce it

• Ancient Greeks:
‘eye rays’ wipe the world
to feel its contents…
http://www.mlahanas.de/Greeks/Optics.htm

The Photographic Signal Path

Computing can improve every component:
light sources, optics, the "scene" (rays), optics again, sensors, data types and processing, and the display.

6
Review: How many Rays are there?
(Gortler et al. ‘96) (Levoy et al. SIGG’96)

4-D set; infinitesimal members. Imagine:


– Convex Enclosure of a 3D scene
– Inward-facing ray camera at every surface point
– Pick the rays you need for ANY camera outside.
– 2D surface of cameras,
+ 2D ray set for each camera,
→ 4D set of rays.

4-D Light Field / Lumigraph


Measure all the outgoing light rays.

7
4-D Illumination Field
Same Idea: Measure all the incoming light rays

4D x 4D = 8-D Reflectance Field


Ratio: R_ij = (outgoing ray_i) / (incoming ray_j)

8
Is a 4-D Light Source Required?

[Debevec et al. 2000] [Masselus et al. 2002]


[Matusik et al. 2002]

[Debevec et al. 2002] [Masselus et al. 2003] [Malzbender et al. 2002]

Is A 4D Camera Required?
e.g. MIT Dynamic Light Field Camera 2002
Is this the whole answer?

• Multiple dynamic
Virtual Viewpoints
• Efficient Bandwidth usage:
‘send only what you see’
• Yang, et al 2002
• 64 tightly packed commodity
CMOS webcams
• 30 Hz, Scalable, Real-time:

or is it just "more film-like cameras, but now with computers!"?

9
Or do Ray Changes Convey Appearance?
5 ray sets → explicit geometric occlusion boundaries

Ramesh Raskar, MERL, 2004

Or do Ray Changes Convey Appearance?


• These rays + all these rays give me…

• MANY more useful


details I can examine…

10
Mild Viewing & Lighting Changes: Are these Enough?
Convincing visual appearance: is accurate depth really necessary?
A few good 2-D images may be enough…

"Image jets, Level Sets, and Silhouettes", Lance Williams, talk at Stanford, 1998.

‘The Ideal Photographic Signal’


I CLAIM IT IS:
All Rays? Some Rays? Changes in Rays

Photographic ray space is vast and boring.


>8 dimensions: 4D view, 4D light, time, λ,
Gather only ‘visually significant’ ray changes

? What rays should we measure ?


? How should we combine them ?
? How should we display them ?

11
Future Photography
[Diagram: Novel Illuminators (Lights, Modulators, General Optics: 4D Ray Benders, 4D Incident Lighting), Novel Cameras (General Optics: 4D Ray Benders, Generalized Sensor, 4D Ray Sampler, Ray Reconstructor, Generalized Processing), a viewed 4D Light Field, Novel Displays (Generalized Display, recreated 4D light field), and the Scene as an 8D Ray Modulator.]

Beyond ‘Film-Like’ Photography


‘Computational Photography’;
To make ‘meaningful ray changes’ tangible,

• Sensors can do more…


• Displays can do more…
• Light Sources can do more…
• Optics can do more…
• Ray Descriptors can do more…
by applying low-cost storage,
computation, and control.

12
Course : Computational Photography

A3: Film-like Photography:


The Ray-Optics Model

Jack Tumblin
Northwestern University

‘Film-Like’ Photography

Film Camera designs still dominate:


– ‘Instantaneous’ light measurement…
– Of focal plane image behind a lens.
– Reproduce those amounts of light;
– Display ‘exactly matches’ the scene

Implied:
“What we see is ≅
focal-plane intensities.”
well, no…

1
‘Film-Like Photography’: Ray Model

Light + 3D Scene: Image:


Illumination, Planar 2D map of
shape, movement, light intensities
surface BRDF,…
Image Plane
I(x,y)

Position (x,y)
Angle (θ,ϕ)

'Center of Projection'
(P3 or P2 origin)

Film-Like Photography
• Lighting: Ray Sources (external)

• Scene: Ray Modulator (external)

• Optics: Ray Benders (thin lens approx.)

• Sensors: Ray Bundle Sensors (irradiance measurement E(x,y))

• Processing: normalized E(x,y)

• Display: Recreate rays from normalized E(x,y)

2
Film-like Optics: ‘Thin Lens Law’

• Focal length f: where parallel rays converge


• Object at distance S1 forms image at S2
• Focus at infinity: Adjust for S2=f
Larger S2 for closer focus

http://www.nationmaster.com/encyclopedia/Lens-(optics)
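For reference, the thin-lens relation implied by this slide (a standard optics result, not spelled out in the notes), with a quick worked example using illustrative numbers:

```latex
% Thin-lens relation (standard result assumed by the slide):
%   1/S_1 + 1/S_2 = 1/f
% Worked example (illustrative numbers, not from the notes):
%   f = 50 mm, object at S_1 = 1000 mm
\[
  \frac{1}{S_1} + \frac{1}{S_2} = \frac{1}{f},
  \qquad
  S_2 = \left(\frac{1}{f} - \frac{1}{S_1}\right)^{-1}
      = \left(\frac{1}{50\,\mathrm{mm}} - \frac{1}{1000\,\mathrm{mm}}\right)^{-1}
      \approx 52.6\,\mathrm{mm}.
\]
```

As the slide notes, S2 = f when focused at infinity, and S2 grows larger for closer focus.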

Rays Are Doubly Differential

• Lens Systems: approximate rays with bundles


• Finite angle, not rays (lens aperture)
• Finite area, not points (circle of confusion)

http://www.nationmaster.com/encyclopedia/Lens-(optics)

3
Ray BUNDLES approximate Rays

• Rays are doubly infinitesimal!


– A ‘ray’ leaves a span of infinitesimal area 0+
– And covers a span of infinitesimal directions 0+

• Ray Bundles:
Finite, measurable power from combined rays
– A finite span of SOLID ANGLE, and
– A finite span of SURFACE AREA

Ray BUNDLES approximate Rays


• Rays are doubly infinitesimal!
A ‘ray’ Leaves 0+ area in 0+ directions

EXAMPLE:
• Power from 1 point on a spherical lamp?
  0+ = 60 W / (∞ points)
  BUT it has a finite, measurable ratio (flux/area):
  60 W / 30 cm² of area = 2 W/cm²

4
Ray BUNDLES approximate Rays
• Rays are doubly infinitesimal!
A ‘ray’ Leaves 0+ area in 0+ directions

EXAMPLE:
• Power from a spherical lamp in just 1 direction?
  0+ = 60 W / (∞ directions)
  BUT it has a finite ratio (flux/solid angle):
  60 W / 4π steradians ≈ 4.77 W/sr

Ray Measurement: Radiance L


• Incoming light directions form hemisphere Ω;
Ray == one point on the hemisphere
THUS
'Incident rays' are measured in radiance units L:
irradiance per unit solid angle,
L = (watts / area) / sr
(sr = steradian: solid angle, measured as surface area on the unit sphere)

5
Ray ‘Bundles’

• Rays have no surface area (just a point)


• Rays have no solid angle (just a point)
THUS:
• A Ray carries infinitesimal power (0+ Watts).
• Only BUNDLES of rays are measurable!

How can we estimate the 'Photographic Signal' when we can't directly measure it?

Lens Flaws: Depth of Focus


For the same focal length:
• Larger lens
– Gathers a wider ray bundle:
– More light: brighter image
– Shallower depth-of-focus

• Smaller lens
– Gathers a narrower ray
bundle:
– Less light: dimmer image
– Deeper depth-of-focus

6
Lens Flaws: Geometric Aberration

• Aberrations:
Real lenses don’t converge rays perfectly
• Spherical: edge rays ≠ center rays
• Coma: diagonal rays focus deeper at edge

http://www.nationmaster.com/encyclopedia/Lens-(optics)

Radial Distortion
(e.g. ‘Barrel’ and ‘pin-cushion’)
straight lines curve around the image center

7
Lens Flaws: Chromatic Aberration
• Dispersion: wavelength-dependent refractive index
– (enables prism to spread white light beam into rainbow)
• Modifies ray-bending and lens focal length: f(λ)

• color fringes near edges of image


• Corrections: add ‘doublet’ lens of flint glass, etc.
http://www.swgc.mun.ca/physics/physlets/opticalbench.html

Lens Flaws: Chromatic Aberration

• Lens Design Fix: Multi-element lenses


Complex, expensive, many tradeoffs!
• Computed Fix: Geometric warp for R,G,B.
Near Lens Center Near Lens Outer Edge

8
Lens Flaws: Intensity Aberrations
Image ‘Vignette’: bright at center, dark at edges.

Several compounded causes:


• Internal shadowing—angle-dependent Ray bundles
• Longer paths for off-axis Rays; Dark Glass
• Planar detector: outer pixels span greater angle
• Compensation:
  – Use anti-vignetting filters (darkest at center)
  – OR position-dependent pixel-detector sensitivity.
http://homepage.ntlworld.com/j.houghton/vignette.htm

Polarization

Sunlit haze is often


strongly polarized.
Polarization filter yields
much richer sky colors

9
Film-like Color Sensing

• Visible Light: narrow band of emag spectrum


• λ ≈ 400-700 nm (nm = 10-9 meter wavelength)
• (humans: <1 octave ↔ honey bees: 3-4 'octaves'; do honey bees sense harmonics, see color 'chords'?)

← Equiluminant curve defines 'luminance' vs. wavelength

http://www.yorku.ca/eye/photopik.htm

Film-like Color Sensing

• Visible Light: narrow band of emag spectrum


• λ ≈ 400-700 nm (nm = 10-9 meter wavelength)
• At least 3 spectral bands required (e.g. R,G,B)

← RGB spectral curves

Vaytek CCD camera with Bayer grid
www.vaytek.com/specDVC.htm

10
Color Sensing

• 3-chip vs. 1-chip: quality vs. cost

http://www.cooldictionary.com/words/Bayer-filter.wikipedia

Practical Color Sensing:


Bayer Grid

• Estimate RGB at 'G' cells from neighboring values
http://www.cooldictionary.com/words/Bayer-filter.wikipedia
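A minimal sketch of the idea (not the course's pseudo-code): bilinear interpolation of the missing color samples on a Bayer mosaic, assuming an RGGB layout and a single-channel float numpy array as input.

```python
import numpy as np
from scipy.ndimage import convolve

def demosaic_bilinear(raw):
    """Bilinear demosaicing of an RGGB Bayer mosaic (H x W float array)."""
    H, W = raw.shape
    r_mask = np.zeros((H, W)); r_mask[0::2, 0::2] = 1   # R at even rows/cols
    b_mask = np.zeros((H, W)); b_mask[1::2, 1::2] = 1   # B at odd rows/cols
    g_mask = 1.0 - r_mask - b_mask                      # G everywhere else

    k = np.array([[1., 2., 1.],
                  [2., 4., 2.],
                  [1., 2., 1.]]) / 4.0                  # bilinear kernel

    out = np.zeros((H, W, 3))
    for c, mask in enumerate([r_mask, g_mask, b_mask]):
        # normalized convolution: average the known neighbors of this color
        num = convolve(raw * mask, k, mode='mirror')
        den = convolve(mask, k, mode='mirror')
        out[..., c] = num / np.maximum(den, 1e-8)
    return out

# usage (hypothetical data): rgb = demosaic_bilinear(raw.astype(float))
```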

11
Conclusions
• Film-like photography methods limit digital
photography to film-like results or less.

• Broaden, unlock our views of photography:

• 4D, 8D, even 10D Ray Space holds the photographic


signal. Look for new solutions by creating, gathering,
processing RAYS, not focal-plane intensities.

• Choose the best, most expressive sets of rays,


THEN find the best way to measure them.

Useful links:

• Interactive Thin Lens Demo


(or search ‘physlet optical bench’)
www.swgc.mun.ca/physics/physlets/opticalbench.html
For more about color:
– Prev. SIGGRAPH courses (Stone et al.)
– Good: www.cs.rit.edu/~ncs/color/a_spectr.html
– Good: www.colourware.co.uk/cpfaq.htm
– Good: www.yorku.ca/eye/toc.htm

12
Image Processing and Reconstruction Tools
Ramesh Raskar
Mitsubishi Electric Research Labs, Cambridge, MA

Image Tools
• Gradient domain operations
  – Applications in tone mapping, fusion and matting
• Graph cuts
  – Applications in segmentation and mosaicing
• Bilateral and Trilateral filters
  – Applications in image enhancement

Intensity Gradient in 1D
Gradient at x (forward difference): G(x) = I(x+1) − I(x)
For n intensity values, about n gradients.

Reconstruction from Gradients (1D)
Given G(x), recover I(x) by 1D integration (cumulative sum):
I(x) = I(x−1) + G(x)

Intensity Gradient in 2D
Gradients at (x,y) as forward differences:
Gx(x,y) = I(x+1, y) − I(x,y)
Gy(x,y) = I(x, y+1) − I(x,y)
G(x,y) = (Gx, Gy)
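A minimal numpy sketch of the 1D case above (illustrative, not the course's pseudo-code): forward differences, then reconstruction by cumulative sum up to the unknown constant I(0).

```python
import numpy as np

I = np.array([1., 1., 1., 105., 105., 105.])   # toy 1D intensities (as in the slide)
G = np.diff(I)                                  # forward differences G(x) = I(x+1) - I(x)

# Reconstruction: I(x) = I(x-1) + G(x), i.e. a cumulative sum plus the constant I(0)
I_rec = np.concatenate(([I[0]], I[0] + np.cumsum(G)))
assert np.allclose(I, I_rec)
```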

1
Intensity Gradient Vectors in Images

Reconstruction from Gradients (2D)
Given G(x,y) = (Gx, Gy), how do we compute I(x,y) for the image?
For n² image pixels, 2n² gradients! (Grad X and Grad Y → 2D integration)

Intensity Gradient in 2D: Recovering the Original Image
Grad X, Grad Y → 2D Integration → image

Intensity Gradient Manipulation: Recovering a Manipulated Image
Grad X, Grad Y → Gradient Processing → New Grad X, New Grad Y → 2D Integration → image

Reconstruction from Gradients: A Common Pipeline
Grad X, Grad Y → Gradient Processing → New Grad X, New Grad Y → 2D Integration

Euler-Lagrange Equation
Minimizing the squared difference between the image gradient and the target gradient field G gives

2( ∂²I/∂x² − ∂Gx/∂x ) + 2( ∂²I/∂y² − ∂Gy/∂y ) = 0,   i.e.   ∇²I = div G

Application: Compressing Dynamic Range
How could you put all this information into one image?
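A minimal numpy sketch of 2D integration by solving the Poisson equation ∇²I = div G above (illustrative only, not the course's pseudo-code); it uses a slow Jacobi relaxation rather than a fast solver.

```python
import numpy as np

def poisson_integrate(gx, gy, n_iter=2000):
    """Recover I (up to a constant) from forward-difference gradients gx, gy
    by iteratively solving the discrete Poisson equation lap(I) = div(G)."""
    H, W = gx.shape
    # divergence of G (adjoint of the forward differences), with zero-flux boundaries
    div = np.zeros((H, W))
    div[:, 1:] += gx[:, 1:] - gx[:, :-1]
    div[:, 0]  += gx[:, 0]
    div[1:, :] += gy[1:, :] - gy[:-1, :]
    div[0, :]  += gy[0, :]

    I = np.zeros((H, W))
    for _ in range(n_iter):                     # Jacobi relaxation
        Ip = np.pad(I, 1, mode='edge')          # Neumann boundary conditions
        neighbors = Ip[:-2, 1:-1] + Ip[2:, 1:-1] + Ip[1:-1, :-2] + Ip[1:-1, 2:]
        I = (neighbors - div) / 4.0
    return I - I.mean()                         # fix the free constant
```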

Attenuate High Gradients
Maintain local detail at the cost of global range (Fattal et al., SIGGRAPH 2002).

Basic Assumptions
• The eye responds more to local intensity differences (ratios) than to global illumination
• An HDR image must have some large-magnitude gradients
• Fine details consist only of smaller-magnitude gradients

3
Gradient Compression in 1D: Gradient Domain Method

Basic Method Summary (Intensity Gradient Manipulation)
• Take the log of the luminances
• Calculate the gradient at each point
• Scale the magnitudes of the gradients with a progressive scaling function (large magnitudes are scaled down more than small magnitudes)
• Re-integrate the gradients and invert the log to get the final image
(Pipeline: Grad X, Grad Y → Gradient Processing → New Grad X, New Grad Y → 2D Integration)
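A minimal numpy sketch of that summary (illustrative only): the attenuation function below is the common Fattal-style form (α/|∇I|)·(|∇I|/α)^β with made-up parameter values, and poisson_integrate is the 2D integration sketch shown earlier.

```python
import numpy as np

def compress_hdr(lum, alpha_scale=0.1, beta=0.85):
    """Gradient-domain dynamic range compression of an HDR luminance image."""
    logL = np.log(lum + 1e-6)                        # 1. log of the luminances

    # 2. forward-difference gradients (last row/column left at zero)
    gx = np.zeros_like(logL); gx[:, :-1] = logL[:, 1:] - logL[:, :-1]
    gy = np.zeros_like(logL); gy[:-1, :] = logL[1:, :] - logL[:-1, :]

    # 3. progressive scaling: large gradient magnitudes are attenuated more
    mag = np.sqrt(gx**2 + gy**2) + 1e-6
    alpha = alpha_scale * mag.mean()
    scale = (alpha / mag) * (mag / alpha) ** beta

    # 4. re-integrate the scaled gradients and invert the log
    logL_new = poisson_integrate(gx * scale, gy * scale)   # from the earlier sketch
    return np.exp(logL_new)
```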

Graphs and Images
(Credits: Jianbo Shi)

Agarwala et al., Digital Photomontage, SIGGRAPH 2004:
source images + brush strokes → computed labeling → composite (a "perceived" photomontage assembled from a set of actual originals).
Image objective examples: 0 for any label; or 0 if red, ∞ otherwise.

Graph Based Image Segmentation

Segmentation = graph partition; find minimum-cost cuts in the graph.
Cut: set of edges whose removal makes the graph disconnected.
Cost of a cut between A and B: the sum of the weights Wij of the removed edges.

V (graph nodes): image pixels
E (edges connecting nodes): links to neighboring pixels
Wij (edge weight): similarity between pixel i and pixel j
6
Problem with min cuts: the minimum cut favors isolated clusters.

Normalized cuts in a graph: Ncut = balanced cut. NP-Hard!
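For reference, the standard definitions behind these slides (the cut cost and normalized cut of Shi & Malik, not spelled out in the notes):

```latex
% Cut cost and normalized cut (Shi & Malik):
\[
  \mathrm{cut}(A,B) = \sum_{i \in A,\; j \in B} W_{ij}, \qquad
  \mathrm{assoc}(A,V) = \sum_{i \in A,\; j \in V} W_{ij},
\]
\[
  \mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}
                     + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}.
\]
```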

Graph Cuts for Segmentation and Mosaicing
Cut ~ a string on a height field (brush strokes → computed labeling)

Bilateral and Trilateral Filter
(Input, bilateral result, trilateral result)

Bilateral and Trilateral Filtering: Outline
• Unilateral filtering: smoothing using filtering
• Bilateral filtering: strengths and 3 weaknesses
• Trilateral filtering: key ideas; application in tone mapping; detail-preserving contrast reduction

'Unilateral' Filter
Traditional, linear, FIR filters. Key idea: convolution.
– Output(x) = local weighted average of inputs
– Weights vary within a 'window' of nearby x
– Note that the weights always sum to 1.0
Smoothes away details, BUT blurs the result.
7
'Unilateral' Filter
Forces a tradeoff:
– Broad window: better detail removal
– OR narrow window: better preservation of large structure
But we want BOTH...

Bilateral Filter
A 2-D filter window: weights vary with intensity [Tomasi & Manduchi 1998].
Further analysis: [Black 99] [Elad 02] [Durand & Dorsey 02], ...
– c: distance from the input position (domain of the input)
– s: difference from the input value (range of the input)
Two Gaussian weights; their product gives an ellipsoidal footprint.

Why it works: graceful segmentation.
• Filtering in one region ignores filtering in another
• The Gaussian on s acts as a 'filtered region' finder
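A minimal brute-force numpy sketch of the bilateral filter described above, with a spatial Gaussian on the domain distance c and an intensity Gaussian on the range difference s (illustrative only; practical implementations use the fast approximations cited in the course):

```python
import numpy as np

def bilateral_filter(img, sigma_c=3.0, sigma_s=0.1, radius=6):
    """Brute-force bilateral filter of a 2-D grayscale image (float array)."""
    H, W = img.shape
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(xs**2 + ys**2) / (2 * sigma_c**2))     # domain weight (c)

    pad = np.pad(img, radius, mode='reflect')
    out = np.zeros_like(img)
    for i in range(H):
        for j in range(W):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            rng = np.exp(-(patch - img[i, j])**2 / (2 * sigma_s**2))  # range weight (s)
            w = spatial * rng
            out[i, j] = (w * patch).sum() / w.sum()           # weights normalized to 1
    return out
```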

Bilateral Filter: Strengths
Piecewise smooth result:
• averages local small details, ignores outliers
• preserves steps, large-scale ramps, and curves, ...
• equivalent to anisotropic diffusion and robust statistics [Black 98, Elad 02, Durand 02]
• simple & fast (especially with the FFT-based speedup of [Durand 02])

Bilateral Filter: 3 Difficulties
• Poor smoothing in high-gradient regions
• Smoothes and blunts cliffs, valleys & ridges
• Can combine disjoint signal regions (the output is the average of a tiny region)
8
New Solution: the 'Trilateral' Filter (Bilateral → Trilateral)
Three key ideas:
• Tilt the filter window according to bilaterally-smoothed gradients
• Limit the filter window to connected regions of similar smoothed gradient
• Adjust parameters from measurements of the windowed signal
Keeps the best features of the bilateral filter and adds more; corner sharpening resembles PDE shocks; the user sets 1 parameter (with good defaults for 7 internal ones).

9
Application: Tone Mapping
• The filter removes details.
• Goal: detail-preserving contrast reduction
• In the log domain, difference == contrast
• Remove details, compress contrast, replace details

[Pipeline: In → log10 → detail-removing filter → Base; Details = In − Base; Out = 10^(γ·Base + w·Details), with γ < 1 for compression and w < 1 for detail strength]
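A minimal numpy sketch of that pipeline (illustrative; it reuses the bilateral_filter sketch above as the detail-removing filter, with made-up parameter values):

```python
import numpy as np

def tonemap(lum, gamma=0.4, w=0.9):
    """Detail-preserving contrast reduction of an HDR luminance image."""
    logL = np.log10(lum + 1e-6)                               # work in the log domain
    base = bilateral_filter(logL, sigma_c=8.0, sigma_s=0.4)   # detail-removing filter
    detail = logL - base                                      # details = input - base
    out_log = gamma * base + w * detail                       # compress base, keep details
    return 10.0 ** out_log                                    # invert the log
```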

More Trilateral Results
• Comparable to gradient attenuation [Fattal et al. 2002]
• Similar to LCIS [Tumblin & Turk 99] [Bertozzi 03]
• Simple, robust
10
Course 15: Computational Photography
B1: Image Fusion and Reconstruction
Ramesh Raskar, Mitsubishi Electric Research Labs
Jack Tumblin, Northwestern University
Course Web Page: http://www.merl.com/people/raskar/photo

• Epsilon Photography
  – Vary time, view
  – Vary focus, exposure, polarization, illumination
  – Better than any one photo
• Achieve effects via multi-image fusion
• Understand computer vision methods
• Exploit lighting

Time-Lapse
• Duchamp, Nude Descending a Staircase
(Richard Hundley 2001)

Shape Time Photography (Freeman et al. 2003)

Varying Focus: Extended Depth of Field
(Agarwala et al., Digital Photomontage, SIGGRAPH 2004)

1
Source images → computed labeling → composite

Computer Vision Techniques
• Photometric Stereo
  – Varying light source positions
  – Estimate surface normal from shading
  – Diffuse objects: minimum of 3 lights (see the sketch after this list)
• Depth from Defocus
  – Varying focus
• Defogging
  – Varying time and polarization
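A minimal numpy sketch of Lambertian photometric stereo as described in the first bullet above (illustrative only): with at least 3 images under known distant light directions, solve a per-pixel least-squares problem for the albedo-scaled normal.

```python
import numpy as np

def photometric_stereo(images, light_dirs):
    """images: K x H x W stack of grayscale images of a diffuse (Lambertian) object,
    light_dirs: K x 3 array of unit light direction vectors (K >= 3).
    Returns unit surface normals (H x W x 3) and albedo (H x W)."""
    K, H, W = images.shape
    I = images.reshape(K, -1)                       # K x (H*W) intensities
    # Lambertian model: I = L @ g, where g = albedo * normal (3 x pixels)
    g, *_ = np.linalg.lstsq(light_dirs, I, rcond=None)
    albedo = np.linalg.norm(g, axis=0)
    normals = (g / np.maximum(albedo, 1e-8)).T.reshape(H, W, 3)
    return normals, albedo.reshape(H, W)
```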

Varying Focus: Depth from Defocus (Nayar, Watanabe and Noguchi 95)
[Lens/detector diagrams: scene points P and Q imaged through a lens with a given aperture at near and far focus settings; image detectors at distance i, object at distance o, focal length f]
Thin lens law: 1/o + 1/i = 1/f
Previous work: Pentland 87, Subbarao 88, Nayar 89.

Real-Time Defocus Depth Camera (Nayar, Watanabe, Noguchi 95)
Performance: 512 x 480 depth map at 30 frames per second.

Clear Day from Foggy Days (Shree Nayar, Srinivasa Narasimhan 00)
Two different foggy conditions (3 PM and 5:30 PM) → deweathering → clear day image.
2
Varying Polarization
• Schechner, Narasimhan, Nayar
• Instant dehazing of images using polarization

Varying Polarization (Yoav Y. Schechner, Nir Karpel 2005)
Best polarization state, worst polarization state, recovered image.
[Left] The raw images taken through a polarizer. [Right] White-balanced results:
the recovered image is much clearer, especially at distant objects, than the raw image.

Varying Wavelength: Multispectral Fusion
Vegetation mapping of a forest: SAR + optical Landsat.

Varying IR Wavelength: Image Fusion
NIR, SWIR, LWIR; uniform fusion across the image vs. adaptive fusion by sub-region.

Non-photorealistic Camera: Depth Edge Detection and Stylized Rendering using Multi-Flash Imaging
Ramesh Raskar, Karhan Tan, Rogerio Feris, Jingyi Yu, Matthew Turk
Mitsubishi Electric Research Labs (MERL), Cambridge, MA; U of California at Santa Barbara; U of North Carolina at Chapel Hill

3
Our Method vs. Canny Intensity Edge Detection

Image Fusion and Reconstruction (recap)
• Epsilon Photography
  – Vary focus, exposure, polarization, illumination
  – Vary time, view
  – Better than any one photo
• Achieve effects via multi-image fusion
• Understand computer vision methods
• Exploit lighting

Improving FILM-LIKE Camera Performance
What would make it 'perfect'?
• Dynamic Range

Film-Style Camera: Dynamic Range Limits
Problem: map the scene to the display.
Domain of human vision: from ~10^-6 to ~10^+8 cd/m² (starlight, moonlight, office light, daylight, flashbulb).
Range of typical displays (pixel values 0-255): from ~1 to ~100 cd/m².
Under-exposure: highlight details captured, shadow details lost.
Over-exposure: highlight details lost, shadow details captured.
4
High Dynamic Range Capture (HDR)
• Overcomes one of photography's key limitations:
  – negative film = 250:1 (8 stops)
  – paper prints = 50:1
  – [Debevec 97] = 250,000:1 (18 stops)
  – hot topic at recent SIGGRAPHs

Debevec '97 (see www.HDRshop.com)
[Plot: pixel value Z vs. log radiance logL, with samples from images i at fixed locations j]
STEP 1: number the images 'i' and pick fixed spots (xj, yj) that sample the scene's radiance values logLi well.
© 2004 Marc Levoy

Debevec '97 (see www.HDRshop.com)
STEP 2: collect the pixel values Zij (from image i, location j); all of them sample the response curve f(logL).
• Use the multiple samples to reconstruct the response curve f(logL);
• then use the inverse response curve to reconstruct the intensities that caused the responses.
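A minimal numpy sketch of the merging step (illustrative only; it assumes the response curve has already been inverted, and here simply takes the camera to be linear, which is a simplification of Debevec & Malik '97):

```python
import numpy as np

def merge_hdr(images, exposure_times):
    """images: list of float arrays in [0,1] (same scene, different exposures),
    exposure_times: matching list of exposure times in seconds.
    Returns a relative radiance map (weighted average in the log domain)."""
    acc = 0.0
    wsum = 0.0
    for Z, t in zip(images, exposure_times):
        w = 1.0 - np.abs(2.0 * Z - 1.0)          # hat weight: trust mid-range pixels
        w = np.maximum(w, 1e-4)
        logE = np.log(Z + 1e-6) - np.log(t)      # linear response assumed: Z ≈ E * t
        acc += w * logE
        wsum += w
    return np.exp(acc / wsum)
```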

5
HDR Direct Sensing?
• An open problem! (especially for video...)
• A direct (and expensive) solution:
  – flying spot radiometer: brute-force instrument; costly, slow, delicate
• Some other novel image sensors:
  – line-scan cameras (e.g. Spheron: multi-detector)
  – logarithmic CMOS circuits (e.g. Fraunhofer Inst.)
  – self-resetting pixels (e.g. sMaL / Cypress Semi)
  – gradient detectors (CVPR 2005, Tumblin, Raskar et al.)

HDR From Multiple Measurements
Captured images → computed image (Courtesy Shree Nayar, Tomoo Mitsunaga 99)
Ginosar et al. 92, Burt & Kolczynski 93, Madden 93, Tsai 94, Saito 95, Mann & Picard 95, Debevec & Malik 97, Mitsunaga & Nayar 99, Robertson et al. 99, Kang et al. 03

MANY ways to make multiple exposure measurements
Sequential exposure change:
Ginosar et al. 92, Burt & Kolczynski 93, Madden 93, Tsai 94, Saito 95, Mann 95, Debevec & Malik 97, Mitsunaga & Nayar 99, Robertson et al. 99, Kang et al. 03

Mosaicing with spatially varying filter:
Schechner and Nayar 01, Aggarwal and Ahuja 01

Multiple image detectors (trade resolution for multiple exposures/colors):
Doi et al. 86, Saito 95, Saito 96, Kimura 98, Ikeda 98, Aggarwal & Ahuja 01, …

Multiple sensor elements in a pixel:
Handy 86, Wen 89, Murakoshi 94, Konishi et al. 95, Hamazaki 96, Street 98

Assorted pixels / generalized Bayer grid (R, G, B pattern with varying exposures):
Nayar and Mitsunaga 00, Nayar and Narasimhan 02

Computational pixels (pixel sensitivity set by its illumination):
Brajovic & Kanade 96, Ginosar & Gnusin 97, Serafini & Sodini 00

Assorted-Pixel Camera Prototype (Courtesy: Sony Kihara Research Lab)
Digital still camera vs. camera with assorted pixels.

Another approach: locally adjusted sensor sensitivity.

6
Sensor: LCD Adaptive Light Attenuator
[Diagram: incoming light passes through an attenuator element (transmittance Tt+1, set by a controller from the detector output It) before reaching the detector element; the LCD light attenuator limits the image intensity reaching the 8-bit sensor. Shown: unprotected 8-bit sensor output vs. attenuator-protected 8-bit sensor output.]

Improving FILM-LIKE Camera Performance
• Vary focus point-by-point

High Depth-of-Field (Levoy et al., SIGGRAPH 2005)
Idea:
• adjacent views use different focus settings
• for each pixel, select the sharpest view [Haeberli 90]
(close focus, distant focus, composite)

Single-Axis Multi-Parameter Camera (SAMP)
Cameras + beamsplitters: place MANY (8) cameras at the same virtual location.
2005: Morgan McGuire (Brown), Wojciech Matusik (MERL), Hanspeter Pfister (MERL), Fredo Durand (MIT), John Hughes (Brown), Shree Nayar (Columbia)

SAMP Prototype System (Layout)

Multiple Simultaneous Focus Depths
'Co-located' lenses; fore and back focal planes (zF, zB).
Strongly desired in microscopy, too: see
http://www.micrographia.com/articlz/artmicgr/mscspec/mscs0100.htm

7
Focus Adjustment: Sum of Bundles; long-range synthetic aperture photography

Improving FILM-LIKE Camera Performance
• Field of view vs. resolution?
Are we done? Almost EVERY digital camera has panoramic stitching, and seamless mosaicing isn't hard. No; much more is possible.

A tiled camera array
• 12 × 8 array of VGA cameras
• abutted: 7680 × 3840 pixels
• overlapped 50%: half of this
• total field of view = 29° wide
• cameras individually metered
• approximately the same center of projection

Tiled panoramic image (before geometric or color calibration)
Tiled panoramic image (after calibration and blending)

8
Improving FILM-LIKE Camera Performance
• Exposure time and frame rate

[Exposure maps across the tiled array: the same exposure in all cameras (1/60 everywhere) vs. individually metered (1/120 to 1/30), with 50% overlap]

High Speed Video
Say you want 120 frame per second (fps) video.
• You could get one camera that runs at 120 fps
• Or… get 4 cameras running at 30 fps.

52 Camera Cluster, 1560 FPS (Levoy et al., SIGGRAPH 2005)

Conclusions
• Multiple measurements:
  – multi-camera, multi-sensor, multi-optics, multi-lighting
• Intrinsic limits seem to require it
  – lens diffraction limits, noise, available light power.
• Are we eligible for Moore's law? Or will lens making and mechanics limit us?

9
Computational Cameras:
Convergence of Optics and Software

Shree K. Nayar

Computer Science
Columbia University

http://www.cs.columbia.edu/CAVE/

Support:
NSF, ONR, Packard Foundation
T. C. Chang Endowed Chair

Traditional  Camera

Detector

Lens

Pixels

c  Shree Nayar, Columbia University

1
Computational  Cameras

Detector

Computations New  Optics

Pixels

Vision

c  Shree Nayar, Columbia University

Wide Angle Imaging

Multiple Cameras (examples): Disney 55, McCutchen 91, Nalwa 96, Swaminathan & Nayar 99, Cutler et al. 02
Catadioptric Imaging (examples): Rees 70, Charles 87, Nayar 88, Yagi 90, Hong 91, Yamazawa 95, Bogner 95, Nalwa 96, Nayar 97, Chahl & Srinivasan 97

c  Shree Nayar, Columbia University

2
What's the Mirror's Shape? (with Simon Baker, ICCV 98)
[Diagram: camera lens, mirror profile z(r), scene, single viewpoint]

Complete Class of Mirrors:
(z − c/2)² − r²(k/2 − 1) = (c²/4)·((k − 2)/k),   k ≥ 2
(z − c/2)² + r²(1 + c²/(2k)) = (2k + c²)/4,      k > 0

OneShot 360 by RemoteReality

(Nayar 97)

4 Megapixel (2000 x 2000)
360  degree  still  camera

c  Shree Nayar, Columbia University

3
c  Shree Nayar, Columbia University

(with Venkat Peri 96)


c  Shree Nayar, Columbia University

4
Omnidirectional Periscope Wide Area Surveillance

Vehicle Navigation
Perimeter Monitoring

Commercial
Security

Virtual Tours
c  Shree Nayar, Columbia University Near Vehicle Awareness (Courtesy : RemoteReality Inc.)

Radial  Stereoscopic  Imaging

(with Sujit Kuthirummal, SIGGRAPH  06)
c  Shree Nayar, Columbia University

5
c  Shree Nayar, Columbia University

c  Shree Nayar, Columbia University

6
Mosaicing

c  Shree Nayar, Columbia University

……Redundant Measurements

c  Shree Nayar, Columbia University

7
Generalized Mosaicing (Schechner and Nayar, ICCV 2001)
[A camera with a spatially varying optical filter sweeps the scene, so each scene point is measured under several filter settings, extending field of view, dynamic range, spectrum, depth of field and polarization.]

Examples: exposure, spectrum, polarization, focus.

8
High Dynamic Range Mosaicing

Attenuation

c  Shree Nayar, Columbia University

High Dynamic Range Mosaic

88 - 18,794

c  Shree Nayar, Columbia University

9
Multispectral Mosaicing

Spectral

λ
x
c  Shree Nayar, Columbia University

Multispectral Mosaic

400 500 600 λ 700 400 500 600 λ 700 400 500 600 λ 700

c  Shree Nayar, Columbia University

10
High Dynamic Range Imaging: Assorted Pixels (with Tomoo Mitsunaga, CVPR 2000)
[Spatially Varying Exposures (SVE): an optical mask with a repeating pattern of attenuations is placed over the image detector, so neighboring pixels measure the scene at different exposures.]

c  Shree Nayar, Columbia University

11
Sony Cybershot Sony Cybershot with Assorted Pixels

c  Shree Nayar, Columbia University

Motion Blur

Object Motion

Camera Motion
c  Shree Nayar, Columbia University

12
Fundamental Trade-Off in Imaging (with Moshe Ben-Ezra, CVPR 2003)
[Plot: temporal resolution (fps) vs. spatial resolution (pixels). A high-resolution camera: ~3M pixels (2048x1536) at ~3 fps; a low-resolution camera: ~75K pixels (320x240) at ~130 fps.]

Deblurring Approach: Hybrid Imaging
[Pipeline: a low-resolution camera observes the scene over the same time period as the high-resolution camera; motion analysis of the low-resolution video yields a PSF estimate, which is used to deconvolve the blurred high-resolution image.]
13
Hybrid Imaging System: Prototype

Primary Detector
(2048x1536)

Secondary Detector
(360x240)

Resolution Ratio of 1 : 36
c  Shree Nayar, Columbia University

Example: Blurred High Resolution Image

f = 633mm, Exp. Time 1 Sec (> ‐9 stops)
c  Shree Nayar, Columbia University

14
Example: PSF Estimation from Motion
[Estimated PSF, plotted over the X, Y pixel coordinates of the low-resolution video]

Example: Deblurring Result
Blurred image, deblurred image, and tripod image (ground truth).

15
Super-Resolution using Jitter Video (with Moshe Ben-Ezra and Assaf Zomet, CVPR 2004)
[Space-time sampling diagrams: conventional video vs. jitter video, in which the sampling grid is shifted between frames]

Jitter Camera
[Lens, detector, and a micro-actuator that shifts the detector; the jitter is instantaneous and synchronized with frame capture.]

[Prototype: board camera with computer-controlled X and Y micro-actuators behind the lens.]

17
Super‐Resolution using Jitter Video

1 (out of 4) Jitter Camera Image Super‐Resolution

c  Shree Nayar, Columbia University

18
Imaging Through Micro-Mirrors (with Vlad Branzoi and Terry Boult, 2004)
[Setup: a mirror array redirects scene rays toward the detector; a mirror with normal ni either directs a scene ray oi(xi) to the viewpoint or points it at a black surface with normal nb.]
Geometry (ray orientation): oi(xi) = G(ni)
Photometry (ray attenuation): ai(xi) = t(ni) / (t(ni) + t(nb)), the fraction of the integration time spent in the 'scene' orientation.

Digital Micromirror Device (DMD)
(by Texas Instruments)

DMD Array and Micromirror Architecture: 14 μm mirrors, each tilting between −10° and +10°.

DMD for imaging: Malbet et al. 95, Kearney et al. 98, Castracane et al. 99, Christensen et al. 02

19
Programmable Imaging System
[Components: imaging lens, DMD and its electronics, camera electronics, a tilted CCD, and a lens focused on the DMD]

Modulation: Examples
[Scene image * DMD image = camera image, for several DMD patterns]

20
Adaptive Dynamic Range Imaging (ADR)
Pixel-wise dynamic range control (Nayar & Branzoi 03; Christensen et al. 02).
[Normal (constant exposure) video vs. ADR video, with the corresponding DMD control video]

Camera with a Lens

Scene

Aperture

Lens

Image Detector
c  Shree Nayar, Columbia University

21
Lensless Camera with Volumetric Aperture

Scene

Volumetric
Aperture

Image Detector
(with Assaf Zomet, 2005)
c  Shree Nayar, Columbia University

Single Aperture Layer
[Geometry: scene radiance S(u,v) with u = tan(α), a single attenuating aperture layer with transmittance function T at distance f from the image detector]

Pixel brightness: I(x, y) = ∫∫ S(u, v) T(x − f·u, y − f·v) du dv

22
Initial  Implementation: LCD Attenuator

Camera without Lens LCD Aperture

LCD
Controller

c  Shree Nayar, Columbia University

Panning without Moving Parts

LCD Attenuator

Image Detector

Captured Video

c  Shree Nayar, Columbia University

23
Multiple Aperture Layers
[Geometry: scene radiance S(u,v) with u = tan(α), and N attenuating layers with transmittance functions Tj at distances fj from the image detector]

Pixel brightness: I(x, y) = ∫∫ S(u, v) · Π_{j=1..N} Tj(x − fj·u, y − fj·v) du dv

Conventional View Desired View

c  Shree Nayar, Columbia University

24
Split Field of View using Multiple Layers
[Pinholes in the attenuating layers direct three different fields of view (FOV 1, 2, 3) onto the image detector]

Split Field of View

Lens
Camera

Lensless
Camera

c  Shree Nayar, Columbia University

25
Computational Cameras

Detector

Computations New  Optics

Pixels

Vision

c  Shree Nayar, Columbia University

Programmable Imaging

Detector

Computations New  Optics

Pixels

Vision Programmable Controller

c  Shree Nayar, Columbia University

26
Light field
photography and videography

Marc Levoy

Computer Science Department


Stanford University

List of projects

• high performance imaging


using large camera arrays
• light field photography
using a handheld plenoptic camera
• dual photography

© 2005 Marc Levoy

Time = 2
High performance imaging
using large camera arrays
Bennett Wilburn, Neel Joshi, Vaibhav Vaish, Eino-Ville Talvala, Emilio Antunez,
Adam Barth, Andrew Adams, Mark Horowitz, Marc Levoy

(Proc. SIGGRAPH 2005)

Time = 3
Stanford multi-camera array

• 640 × 480 pixels ×


30 fps × 128 cameras

• synchronized timing
• continuous streaming
• flexible arrangement

© 2005 Marc Levoy

Time = 4
Ways to use large camera arrays

• widely spaced light field capture


• tightly packed high-performance imaging
• intermediate spacing synthetic aperture photography

© 2005 Marc Levoy

Time = 5
Intermediate camera spacing:
synthetic aperture photography

© 2005 Marc Levoy

Time = 6
Example using 45 cameras
[Vaish CVPR 2004]

© 2005 Marc Levoy

•Reference:
–Vaish, V., Wilburn, B., Joshi, N., Levoy, M., Using
Plane + Parallax for Calibrating Dense Camera
Arrays,
Proc. CVPR 2004.

Time = 7
•At left is a single view, at right is a sum of all views,
appropriately shifted.
•For the movie, see 2nd bullet of “Slides and videos” on
http://graphics.stanford.edu/projects/array/

Time = 8
Video

© 2005 Marc Levoy

•Video available at
http://graphics.stanford.edu/papers/CameraArray/

Time = 9
Tiled camera array
Can we match the image quality of a cinema camera?

• world’s largest video camera


• no parallax for distant objects
• poor lenses limit image quality
• seamless mosaicing isn’t hard

•poor lenses limit image quality


–we set out to answer the question, “Can we match
the image quality of an SLR?”

Time = 10
Tiled panoramic image
(before geometric or color calibration)

Time = 11
Tiled panoramic image
(after calibration and blending)

Time = 12
Tiled camera array
Can we match the image quality of a cinema camera?

• world’s largest video camera


• no parallax for distant objects
• poor lenses limit image quality
• seamless mosaicing isn’t hard
• per-camera exposure metering
• HDR within and between tiles

Time = 13
same exposure
in all cameras

individually
metered

checkerboard
of exposures

Time = 14
High-performance photography
as multi-dimensional sampling

• spatial resolution
• field of view
• frame rate
• dynamic range
• bits of precision
• depth of field
• focus setting
• color sensitivity

© 2005 Marc Levoy

Time = 15
Spacetime aperture shaping

• shorten exposure time to


freeze motion → dark
• stretch contrast to restore
level → noisy
• increase (synthetic) aperture
to capture more light →
decreases depth of field

© 2005 Marc Levoy

Time = 16
• center of aperture: few cameras, long exposure →
high depth of field, low noise,
but action is blurred
• periphery of aperture: many cameras, short exposure →
freezes action, low noise,
but low depth of field

Time = 17
Time = 18
Time = 19
Light field photography using a
handheld plenoptic camera
Ren Ng, Marc Levoy, Mathieu Brédif,
Gene Duval, Mark Horowitz and Pat Hanrahan

(Proc. SIGGRAPH 2005


and TR 2005-02)

© 2005 Marc Levoy

•Light field capture not using an array of cameras, but


–using a single, handheld camera
•What we’ll do with these light fields is not seeing through
crowds, but
–refocusing a picture after we take it, and
–moving the observer (slightly) after we take the
picture

Time = 20
Conventional versus light field camera

© 2006 Marc Levoy

Time = 21
Conventional versus light field camera

uv-plane st-plane

© 2006 Marc Levoy

Time = 22
Conventional versus light field camera

st-plane uv-plane

© 2006 Marc Levoy

Time = 23
Prototype camera

Contax medium format camera Kodak 16-megapixel sensor

Adaptive Optics microlens array 125μ square-sided microlenses

4000 × 4000 pixels ÷ 292 × 292 lenses = 14 × 14 pixels per lens

Time = 24
Mechanical design

• microlenses float 500μ above sensor


• focused using 3 precision screws
© 2005 Marc Levoy

Time = 25
Time = 26
Prior work

• integral photography
– microlens array + film
– application is autostereoscopic effect

• [Adelson 1992]
– proposed this camera
– built an optical bench prototype using relay lenses
– application was stereo vision, not photography

© 2006 Marc Levoy

•Reference:
–Adelson, E.H., Wang, J.Y.A., Single Lens Stereo
with a Plenoptic Camera ,
–IEEE Transactions on Pattern Analysis and
Machine Intelligence (PAMI),
Vol. 14, No. 2, 1992, pp. 99-106.

Time = 27
Digitally stopping-down

• stopping down = summing only the


central portion of each microlens
© 2006 Marc Levoy

Time = 28
Digital refocusing

• refocusing = summing windows


extracted from several microlenses
© 2006 Marc Levoy
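In code, refocusing reduces to a shear followed by a sum over the aperture samples. A minimal sketch, assuming the light field has already been resampled into a 4D array LF(v,u,t,s) of sub-aperture images and that alpha (a hypothetical parameter here) controls the amount of shear, i.e. the chosen refocus depth:

% Minimal digital refocusing sketch: LF(v,u,t,s) holds sub-aperture images;
% alpha sets the refocus depth (alpha = 0 keeps the original focal plane).
[NV, NU, NT, NS] = size(LF);
img = zeros(NT, NS);
for v = 1:NV
    for u = 1:NU
        dy = round(alpha * (v - (NV+1)/2));   % shift proportional to the
        dx = round(alpha * (u - (NU+1)/2));   % sample's position in the aperture
        img = img + circshift(squeeze(LF(v,u,:,:)), [dy dx]);
    end
end
img = img / (NV*NU);   % refocused photograph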

Time = 29
A digital refocusing theorem

• an f / N light field camera, with P × P pixels


under each microlens, can produce views as
sharp as an f / (N × P) conventional camera

– or –

• it can produce views with a shallow depth of


field ( f / N ) focused anywhere within the
depth of field of an f / (N × P) camera

© 2006 Marc Levoy

Time = 30
Example of digital refocusing

© 2006 Marc Levoy

•For full sequence, see


http://graphics.stanford.edu/papers/lfcamera/

Time = 31
Refocusing portraits

© 2006 Marc Levoy

•For full sequence, see


http://graphics.stanford.edu/papers/lfcamera/

Time = 32
Action photography

© 2006 Marc Levoy

•For full sequence, see


http://graphics.stanford.edu/papers/lfcamera/

Time = 33
Extending the depth of field

conventional photograph, main lens at f / 4      conventional photograph, main lens at f / 22      light field, main lens at f / 4, after all-focus algorithm [Agarwala 2004]

•main lens at f/22


–captured with light field camera and f/4 lens,
–computed by extracting only the middle pixel of
that image
–would be the same image if no microlenses and
larger pixels
•Reference:
–Agarwala, A., Dontcheva, M., Agrawala, M.,
Drucker, S., Colburn, A., Curless, B., Salesin, D.,
Cohen, M., Interactive digital photomontage,
Proc. SIGGRAPH 2004.

Time = 34
Macrophotography

© 2005 Marc Levoy

Time = 35
Digitally moving the observer

• moving the observer = moving the


window we extract from the microlenses
© 2006 Marc Levoy

Time = 36
Example of moving the observer

© 2006 Marc Levoy

Time = 37
Moving backward and forward

© 2006 Marc Levoy

Time = 38
Implications

• cuts the unwanted link between exposure


(due to the aperture) and depth of field
• trades off (excess) spatial resolution for ability to
refocus and adjust the perspective
• sensor pixels should be made even smaller,
subject to the diffraction limit
36mm × 24mm ÷ 2μ pixels = 216 megapixels
18K × 12K pixels
1800 × 1200 pixels × 10 × 10 rays per pixel

© 2006 Marc Levoy

Time = 39
Dual Photography

Pradeep Sen, Billy Chen, Gaurav Garg, Steve Marschner,


Mark Horowitz, Marc Levoy, Hendrik Lensch

(Proc. SIGGRAPH 2005)

Time = 40
Helmholtz reciprocity
light

camera

scene

Time = 41
Helmholtz reciprocity
camera

light

scene

Time = 42
Measuring transport along a set of paths
projector photocell

scene

Time = 43
Reversing the paths
camera
point light

scene

•The transport will be the same


–up to a global scaling factor
–because we replaced a projector by a different kind
of light

Time = 44
Forming a dual photograph
projector photocell

scene

Time = 45
Forming a dual photograph
“dual” camera
“dual” light

image of
scene

scene

Time = 46
Physical demonstration

• light replaced with projector


• camera replaced with photocell
• projector scanned across the scene

conventional photograph, with light coming from right      dual photograph, as seen from the projector's position and as illuminated from the photocell's position

Time = 47
Related imaging methods

• time-of-flight scanner
– if they return reflectance as well as range
– but their light source and sensor are typically coaxial

• scanning electron microscope

Velcro® at 35x magnification,


Museum of Science, Boston
© 2006 Marc Levoy

Time = 48
The 4D transport matrix
projector camera

scene

Time = 49
The 4D transport matrix
projector camera

P (pq × 1)      T (mn × pq)      C (mn × 1)

scene

Time = 50
The 4D transport matrix

C = T P      (mn × 1) = (mn × pq) (pq × 1)

Time = 51
The 4D transport matrix

C = T P,  with P = [1 0 0 0 0]ᵀ      (mn × 1) = (mn × pq) (pq × 1)

Time = 52
The 4D transport matrix

C = T P,  with P = [0 1 0 0 0]ᵀ      (mn × 1) = (mn × pq) (pq × 1)

Time = 53
The 4D transport matrix

C = T P,  with P = [0 0 1 0 0]ᵀ      (mn × 1) = (mn × pq) (pq × 1)

Time = 54
The 4D transport matrix

C = T P      (mn × 1) = (mn × pq) (pq × 1)

Time = 55
The 4D transport matrix
C = T P      (mn × 1) = (mn × pq) (pq × 1)

applying Helmholtz reciprocity...

C' = Tᵀ P'      (pq × 1) = (pq × mn) (mn × 1)

•This lets us relight the scene


–as viewed from the projector’s position, and
–not just as illuminated by a uniform point light, but
–as illuminated by a point source with arbitrary
directional control,
–i.e. as illuminated by a programmable video
projector
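The primal and dual imaging equations are easy to exercise on a toy example. A minimal sketch, with a random stand-in for a measured transport matrix (the sizes pq and mn are illustrative only, not from the paper):

% Toy illustration of C = T P and its dual C' = T' P'.
pq = 16; mn = 12;                 % assumed projector / camera pixel counts
T  = rand(mn, pq);                % stand-in for a measured transport matrix
P  = zeros(pq, 1); P(5) = 1;      % turn on one projector pixel
C  = T * P;                       % primal: what the camera records
Pd = zeros(mn, 1); Pd(3) = 1;     % dual: "illuminate" one camera pixel
Cd = T' * Pd;                     % dual photograph, as seen from the projector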

Time = 56
Example

conventional photograph, with light coming from right      dual photograph, as seen from the projector's position
© 2006 Marc Levoy

Video available at
http://graphics.stanford.edu/papers/dual_photography/

Time = 57
Properties of the transport matrix

• little interreflection → sparse matrix


• many interreflections → dense matrix
• convex object → diagonal matrix
• concave object → full matrix

Can we create a dual photograph entirely from diffuse reflections?

© 2006 Marc Levoy

Time = 58
Dual photography
from diffuse reflections

the camera’s view


© 2006 Marc Levoy

Time = 59
The relighting problem

Paul Debevec’s
Light Stage 3

• subject captured under multiple lights


• one light at a time, so subject must hold still
• point lights are used, so can’t relight with cast shadows
© 2006 Marc Levoy

Time = 60
The 6D transport matrix

Time = 61
The 6D transport matrix

Time = 62
The advantage of dual photography

• capture of a scene as illuminated by


different lights cannot be parallelized

• capture of a scene as viewed by different


cameras can be parallelized

© 2006 Marc Levoy

Time = 63
Measuring the 6D transport matrix
mirror array
projector

scene

Time = 64
Relighting with complex illumination
camera array
projector

C' = Tᵀ P'      C': pq × 1,  Tᵀ: pq × (mn·uv),  P': (mn·uv) × 1

scene

• step 1: measure 6D transport matrix T


• step 2: capture a 4D light field
• step 3: relight scene using captured light field
© 2006 Marc Levoy

Time = 65
Running time

• the different rays within a projector can in fact


be parallelized to some extent

• this parallelism can be discovered using a


coarse-to-fine adaptive scan

• can measure a 6D transport matrix in 5 minutes

© 2006 Marc Levoy

•5 minutes
–using a video-rate camera
–and (effectively) measuring 1M x 1M transport
entries
–for scenes having average amounts of diffuse
interreflection
–everything depends on the density of the T matrix

Time = 66
Can we measure an 8D transport matrix?
projector array camera array

scene

•8D transport matrix


–the full scattering function
–if this were a surface, we’d call it the BSSRDF
–what should we call this? the bidirectional light
field transport distribution function (BLFTDF)?

Time = 67
C = T P      (mn × 1) = (mn × pq) (pq × 1)

http://graphics.stanford.edu © 2005 Marc Levoy

Time = 68
Computational Imaging
in the Sciences (and Medicine)

Marc Levoy
Due to copyright restrictions, some images have been removed from this version of the slides.
To see presentation with these images intact, go to:
http://graphics.stanford.edu/courses/cs448a-06-winter/
and look under the heading “Lectures for SIGGRAPH 2006 course on Computational Photography”.

Computer Science Department


Stanford University

•Based on lectures given in:


–Stanford CS 448A (Computational
Photography), Winter quarter 2006

34:15 total + 30% = ~45 minutes

Time = 1
Some examples

• medical imaging
– rebinning (an inspiration for light field rendering)
✓ – transmission tomography
– reflection tomography (for ultrasound)
• geophysics
✓ – borehole tomography
– seismic reflection surveying
• applied physics
✓ – diffuse optical tomography
✓ – diffraction tomography      (✓ = in this lecture)
  time-of-flight or wave-based
– inverse scattering ©2006 Marc Levoy

Time = 2
• biology
✓ – confocal microscopy (applicable at macro scale too)
✓ – deconvolution microscopy (related to tomography)
• astronomy
✓ – coded-aperture imaging
– interferometric imaging
• airborne sensing
– multi-perspective panoramas
✓ – synthetic aperture radar

©2006 Marc Levoy

Time = 3
• optics
✓ – holography
– wavefront coding

©2006 Marc Levoy

Time = 4
Confocal scanning microscopy

light source
pinhole

© 2004 Marc Levoy

•typical reference:
–Corle, T.R., Kino, G.S., Confocal Scanning Optical Microscopy and Related Imaging Systems, Academic Press, 1996.
•if you introduce a pinhole
–only one point on the focal plane will be
illuminated

Time = 5
Confocal scanning microscopy

r
light source
pinhole

pinhole
photocell © 2004 Marc Levoy

•...and a matching optical system,


–hence the word confocal
•this green dot
–will be both strongly illuminated and sharply
imaged
•while this red dot
–will have less light falling on it by the square of
distance r,
–because the light is spread over a disk
–and it will also be more weakly imaged by the square of distance r,
–because its image is blurred out over a disk on the pinhole mask, and only a little bit is permitted through
•so the extent to which the red dot contributes to the final
image
–falls off as the fourth power of r, the distance from
the focal plane

Time = 6
Confocal scanning microscopy

light source

pinhole

pinhole
photocell © 2004 Marc Levoy

•of course, you’ve only imaged one point


–so you need to move the pinholes
–and scan across the focal plane

Time = 7
Confocal scanning microscopy

light source

pinhole

pinhole
photocell © 2004 Marc Levoy

Time = 8
Image removed due to
copyright restrictions

[UMIC SUNY/Stonybrook]

Image removed due to


copyright restrictions

•the object in the lower-right image is actually


spherical,
–but portions of it that are off the focal plane
are both blurry and dark,
–effectively disappearing

Time = 9
Synthetic confocal scanning
[Levoy 2004]

light source

→ 5 beams
→ 0 or 1 beams

©2006 Marc Levoy

•our goal
–is to approximate this effect at the large
scale
•we can understand the photometry of this setup
–using a simplified counting argument
•Reference:
–Levoy, M., Chen, B., Vaish, V., Horowitz,
M., McDowall, I., Bolas, M., Synthetic
aperture confocal imaging,
Proc. SIGGRAPH 2004.

Time = 10
Synthetic confocal scanning

light source

→ 5 beams
→ 0 or 1 beams

©2006 Marc Levoy

•5:0 or 5:1
–if we had 5 cameras as well as 5 projectors,
then the ratio would be 25:0 or 25:1

Time = 11
Synthetic confocal scanning

→ 5 beams
→ 0 or 1 beams

dof
• works with any number of projectors ≥ 2
• discrimination degrades if point to left of
• no discrimination for points to left of
• slow!
• poor light efficiency ©2006 Marc Levoy

•depth of field
–a microscoper would call it the axial
resolution
–to make the depth of field shallower,
spread out the projectors, i.e. a larger
synthetic aperture

Time = 12
Synthetic coded-aperture
confocal imaging

• different from coded aperture imaging in astronomy


• [Wilson, Confocal Microscopy by Aperture Correlation, 1996]

©2006 Marc Levoy

•Reference:
–Wilson, T., Juskaitis, R., Neil, M.A.A.,
Kozubek, M., Confocal microscopy by
aperture correlation,
Optics Letters, Vol. 21, No. 23, December 1,
1996.

Time = 13
Synthetic coded-aperture
confocal imaging

©2006 Marc Levoy

Time = 14
Synthetic coded-aperture
confocal imaging

©2006 Marc Levoy

Time = 15
Synthetic coded-aperture
confocal imaging

©2006 Marc Levoy

Time = 16
Synthetic coded-aperture
confocal imaging
100 trials
→ 2 beams × 50/100 trials = 1
→ ~1 beam × 50/100 trials = 0.5

©2006 Marc Levoy

Time = 17
Synthetic coded-aperture
confocal imaging
100 trials
→ 2 beams × 50/100 trials = 1
→ ~1 beam × 50/100 trials = 0.5

floodlit
→ 2 beams
→ 2 beams

trials – ¼ × floodlit
→ 1 – ¼ ( 2 ) = 0.5
→ 0.5 – ¼ ( 2 ) = 0

©2006 Marc Levoy

Time = 18
Synthetic coded-aperture
confocal imaging
100 trials
→ 2 beams × 50/100 trials = 1
→ ~1 beam × 50/100 trials = 0.5

floodlit
→ 2 beams
→ 2 beams

trials – ¼ × floodlit
→ 1 – ¼ ( 2 ) = 0.5
→ 0.5 – ¼ ( 2 ) = 0
• works with relatively few trials (~16)
• 50% light efficiency
• works with any number of projectors ≥ 2
• discrimination degrades if point vignetted for some projectors
• no discrimination for points to left of
• needs patterns in which the illumination of different tiles is uncorrelated
©2006 Marc Levoy

•note all the tildas in the formulas


–this algorithm is statistical in nature
–for example, if we flip a coin to decide whether to illuminate a particular tile on a particular trial
–the binomial theorem tells us how much variability we’ll
get over a given number of trials
•the effect of this variability
–the image of our focal plane will be slightly non-uniform,
and
–objects off the focal plane won’t be entirely dark after the
confocal subtraction
•but for visual purposes
–our technique works well with a modest number of trials,
like 16
–far fewer than would be required to scan out the focal
plane, as in the usual confocal scanning algorithm
–we need patterns in which the illumination of different
tiles are spatially uncorrelated
•patterns
–Hadamard patterns don’t work well because
»they need to be square
»they are too structured, creating aliasing on foreground
objects
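The counting argument above amounts to a two-term combination of the captured images. A minimal sketch, assuming trials is an H x W x T stack of images taken under the random half-on patterns and flood is the floodlit image:

% Minimal sketch of the coded-aperture confocal subtraction:
% average the trial images, then subtract one quarter of the floodlit image.
confocal = mean(trials, 3) - 0.25 * flood;
confocal = max(confocal, 0);      % clamp the small negative residuals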

Time = 19
Example pattern

©2006 Marc Levoy

Time = 20
What are good patterns?
pattern one trial 16 trials

Time = 21
Patterns with less aliasing

multi-phase
sinusoids?
[Neil 1997]

©2006 Marc Levoy

•reference:
–Neil, M.A.A., Juskaitis, R., Wilson, T.,
Method of obtaining optical sectioning by
using structured light in a conventional
microscope,
Optics Letters, Vol. 22, No. 24, December
15, 1997.

Time = 22
Implementation
using an array of mirrors

©2006 Marc Levoy

•video projector is orange dot at right, off-axis


screen (not used) is red line, mirrors are yellow
lines, scene is green dot, focal planes are
superimposed yellow lines at green dot
•virtual projectors are orange dots at left
•this is an interactive program that let’s you adjust
all the parameters
•main tradeoff is between angular spread of rays
arriving at scene versus depth of field of patterns
arriving there

Time = 23
Implementation using an
array of mirrors

©2006 Marc Levoy

•URL of SIGGRAPH 2004 paper and movie is:


–http://graphics.stanford.edu/papers/confoc
al/

Time = 24
Confocal imaging in scattering media

• small tank
– too short for attenuation
– lit by internal reflections

© 2006 Marc Levoy

•I observed a confocal effect


–but it was modest
–theory said the effect should be stronger

Time = 25
Experiments in a large water tank

50-foot flume at Wood’s Hole Oceanographic Institution (WHOI)


© 2004 Marc Levoy

Time = 26
Experiments in a large water tank

• 4-foot viewing distance to target


• surfaces blackened to kill reflections
• titanium dioxide in filtered water
• transmissometer to measure turbidity © 2004 Marc Levoy

•titanium dioxide
–the stuff in white paint

Time = 27
Experiments in a large water tank

• stray light limits performance


• one projector suffices if no occluders
© 2004 Marc Levoy

Time = 28
Seeing through turbid water

floodlit scanned tile


© 2004 Marc Levoy

•this is very turbid water


–the “attenuation length” (a technical term
that roughly translates to “how far you can
see clearly”) is about 8 inches
–and I’m trying to see through 4 feet
•if you contrast enhance these images,
–you can see the improvement in signal-to-
noise ratio
•Reference:
–M. Levoy, Improving underwater vision
using confocal imaging. In preparation.

Time = 29
Other patterns

sparse grid
staggered grid

swept stripe
© 2004 Marc Levoy

•to speed things up,


–one can use patterns that illuminate several
tiles at once
–similar strategies have been used in
confocal microscopy
•the problem with this one
–the illumination beams (coming in from the
right side) intersect the lines of sight to
other tiles, degrading contrast
•here’s a pattern in which they don’t
–a staggered grid
•here’s another pattern in which they don’t
–a simple swept stripe
–with the light coming in from the right side

Time = 30
Other patterns

floodlit      swept stripe      scanned tile      © 2004 Marc Levoy


•here’s how a swept stripe stacks up against the
other patterns
–somewhere in the middle, in terms of
quality
•a swept stripe has a number of advantages,
though...

Time = 31
Stripe-based illumination

• if vehicle is moving, no sweeping is needed!


• can triangulate from leading (or trailing) edge
of stripe, getting range (depth) for free

Image removed due to


copyright restrictions

© 2004 Marc Levoy

•no sweeping is needed


–the forward motion of the vehicle will sweep out
the stripe!
•in addition, since the stripe is coming from the side...
–essentially constitutes a structured-light
rangefinder
•this is not a new idea
–Jules Jaffe proposed it 14 years ago
»Jaffe, J.S., Computer modeling and the design of
optimal underwater imaging systems,
IEEE J. Oceanic Eng. 15(2), 101-111 (1990).
–but he had no technology to implement it
–compact video projectors provide that technology
•so we can easily envision
–video projectors being mounted on future
underwater vehicles
–this is the Hercules remotely operated vehicle
–exploring the wreck of the Titanic two months ago
in the North Atlantic

Time = 32
Application to
underwater exploration

Image removed due to


copyright restrictions

[Ballard/IFE 2004]
Image removed due to
copyright restrictions

© 2004 Marc Levoy

•so I think we can expect


–video projectors being mounted on future
underwater vehicles
–this is the Hercules remotely operated vehicle
–exploring the wreck of the Titanic two months ago
in the North Atlantic
–Pictures from Robert Ballard, Institute for
Exploration, 2004.
•the question is
–can you produce an overhead view like this, of the
Titanic
–in a single shot taken from far away using shaped
illumination
–rather than by mowing the lawn with the
underwater vehicle
–which is difficult, dangerous, time consuming, and
produces a mosaic with parallax errors

Time = 33
Shaped illumination in a
computer vision algorithm

transpose of the light field

• low variance within one block = stereo constraint


• sharp differences between adjacent blocks = focus constraint
• both algorithms are confused by occluding objects © 2006 Marc Levoy

Time = 34
Shaped illumination in a
computer vision algorithm

transpose of the light field

• confocal estimate of projector mattes → re-shape projector beams


• re-capture light field → run vision algorithm on new light field
• re-estimate projector mattes from model and iterate © 2006 Marc Levoy

Time = 35
Confocal imaging versus
triangulation rangefinding
• triangulation
– line sweep of W pixels or log(W) time sequence of stripes, W ≈ 1024
– projector and camera lines of sight must be unoccluded, so requires S scans, 10 ≤ S ≤ 100
– one projector and camera
– S log(W) ≈ 100-1000

• confocal
– point scan over W² pixels or time sequence of T trials, T ≈ 32-64
– works if some fraction of aperture is unoccluded, but gets noisier; max aperture ≈ 90°, so 6-12 sweeps?
– multiple projectors and cameras
– 6 T = 200-800, no moving parts

© 2006 Marc Levoy

Time = 36
The Fourier projection-slice theorem
(a.k.a. the central section theorem) [Bracewell 1956]

Pθ(t)      Gθ(ω)
[Image removed due to copyright restrictions]  (from Kak)

• Pθ(t) is the integral of g(x,y) in the direction θ
• G(u,v) is the 2D Fourier transform of g(x,y)
• Gθ(ω) is a 1D slice of this transform taken at angle θ
• ℱ⁻¹{ Gθ(ω) } = Pθ(t) !
©2006 Marc Levoy

•References:
–Bracewell, R. N. Strip Integration in Radio
Astronomy,
Australian Journal of Physics, Vol. 9, 1956,
No. 2, pp. 198-217.
–Kak, A, Slaney, M., Principles of
Computerized Tomographic Imaging,
IEEE Press, 1988.

Time = 37
Reconstruction of g(x,y)
from its projections

Pθ(t)      Gθ(ω)      Pθ(t, s)
[Image removed due to copyright restrictions]  (from Kak)

Gθ(ω) = G(u, v) |θ

Pθ(t) = ∫ Pθ(t, s) ds      (s from −∞ to +∞)

• add slices Gθ(ω) into (u,v) at all angles θ and inverse transform to yield g(x,y), or
• add 2D backprojections Pθ(t, s) into (x,y) at all angles θ
©2006 Marc Levoy

Time = 38
The need for filtering before
(or after) backprojection
[figure: sum of slices gives a 1/ω hot spot in the (u,v) plane; correct with the |ω| ramp filter]

• sum of slices would create a 1/ω hot spot at the origin
• correct by multiplying each slice by |ω|, or
• convolve Pθ(t) by ℱ⁻¹{ |ω| } before backprojecting
• this is called filtered backprojection
©2006 Marc Levoy

Time = 39
ℱ⁻¹{ |ω| } = Hilbert transform of (∂/∂t) Pθ(t)
           = −1/(π t) ∗ (∂/∂t) Pθ(t)
           = ℱ⁻¹{ lim(ε→0) |ω| e^(−ε|ω|) }      (~2nd derivative)

[Images removed due to copyright restrictions]  (from Bracewell)

• sum of slices would create a 1/ω hot spot at the origin
• correct by multiplying each slice by |ω|, or
• convolve Pθ(t) by ℱ⁻¹{ |ω| } before backprojecting
• this is called filtered backprojection

•Reference:
–Bracewell, R.N., The Fourier Transform
and its Applications,
2nd ed., McGraw-Hill, 1985.
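Filtered backprojection itself is only a few lines. A minimal sketch, assuming sino is a sinogram with one column per projection angle in thetas (degrees): each projection is multiplied by the |ω| ramp in the frequency domain and then smeared back across the image (nearest-neighbor interpolation, for brevity):

% Minimal filtered backprojection sketch.
% sino: nt x nAngles sinogram; thetas: projection angles in degrees.
[nt, nAngles] = size(sino);
ramp = abs(linspace(-1, 1, nt))';              % |omega| ramp filter (shifted order)
[x, y] = meshgrid((1:nt) - nt/2);
recon = zeros(nt, nt);
for a = 1:nAngles
    fproj = fftshift(fft(sino(:, a)));             % projection to frequency domain
    fproj = real(ifft(ifftshift(fproj .* ramp)));  % filter and return to space
    th = thetas(a) * pi / 180;
    t  = round(x*cos(th) + y*sin(th)) + nt/2;      % detector bin hit by each pixel
    valid = (t >= 1) & (t <= nt);
    bp = zeros(nt, nt);
    bp(valid) = fproj(t(valid));                   % backproject the filtered projection
    recon = recon + bp;
end
recon = recon * pi / (2 * nAngles);                % scale the sum of backprojections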

Time = 40
Summing filtered
backprojections

Image removed due to


copyright restrictions

(from Kak)

©2006 Marc Levoy

Time = 41
Example of reconstruction
by filtered backprojection

[Images removed due to copyright restrictions]  (from Herman)
panels: X-ray, sinogram, filtered sinogram, reconstruction
©2006 Marc Levoy

•Reference:
–Herman, G.T., Image Reconstruction from
Projections,
Academic Press, 1980.

Time = 42
More examples

CT scan
of head

volume renderings      the effect of occlusions
©2006 Marc Levoy

Time = 43
Shape from light fields
using filtered backprojection

panels: scene, sinogram, backprojection, occupancy, reconstruction


©2006 Marc Levoy

•Reference:
–M. Levoy, unpublished.
•but also see:
–Brady, D.J., Stack, R., Feller, S., Cull, E.,
Fernandez, L., Kammeyer, D., and Brady,
R., Information Flow in Streaming 3D
Video,
Proc. SPIE, Vol. CR76-13, 2000.

Time = 44
Relation to Radon Transform

P(r, θ)
[Image removed due to copyright restrictions]

• Radon transform

P(r, θ) = ∫ g(r cos θ − s sin θ, r sin θ + s cos θ) ds      (s from −∞ to +∞)

• Inverse Radon transform

g(x, y) = −1/(2π²) ∫ dθ ∫ (1/q) P₁(x cos θ + y sin θ + q, θ) dq      (θ from 0 to 2π, q from −∞ to +∞)

where P₁ is the partial derivative of P with respect to t

©2006 Marc Levoy

Time = 45
= [ B H₁ D₁ P ](r, θ),  where
D₁ is partial differentiation wrt the 1st variable,
H₁ is the Hilbert transform wrt the 1st variable, and
B is backprojection

• Radon transform

P(r, θ) = ∫ g(r cos θ − s sin θ, r sin θ + s cos θ) ds      (s from −∞ to +∞)

• Inverse Radon transform

g(x, y) = −1/(2π²) ∫ dθ ∫ (1/q) P₁(x cos θ + y sin θ + q, θ) dq      (θ from 0 to 2π, q from −∞ to +∞)

where P₁ is the partial derivative of P with respect to t

Time = 46
Higher dimensions

• Fourier projection-slice theorem in ℝⁿ
– Gξ(ω), where ξ is a unit vector in ℝⁿ, ω is the basis for a hyperplane in ℝⁿ⁻¹, and G contains integrals over lines
– in 2D: a slice (of G) is a line through the origin at angle θ; each point on ℱ⁻¹ of that slice is a line integral (of g) perpendicular to θ
– in 3D: each slice is a plane through the origin at angles (θ, φ); each point on ℱ⁻¹ of that slice is a line integral perpendicular to the plane
(from Deans)

• Radon transform in ℝⁿ
– P(r, ξ), where ξ is a unit vector in ℝⁿ, r is a scalar, and P contains integrals over (n−1)-D hyperplanes
– in 2D: each point (in P) is the integral along the line (in g) perpendicular to a ray connecting that point and the origin
– in 3D: each point is the integral across a plane normal to a ray connecting that point and the origin
[Image removed due to copyright restrictions]
©2006 Marc Levoy

•Reference:
–Deans, S., The Radon Transform and Some
of its Applications,
Krieger, 1983.

Time = 47
Frequency domain volume rendering
[Totsuka and Levoy, SIGGRAPH 1993]

volume renderings: X-ray; with depth cueing; with directional shading; with depth cueing and shading

©2006 Marc Levoy

•Reference:
–Totsuka, T. and Levoy, M., Frequency Domain
Volume Rendering,
Proc. SIGGRAPH 1993.
•Example depth cueing
–sinusoidal falloff = multiplication of volume by large
sinusoid
–= convolution of spectrum by F(sinusoid) =
convolution by spike = shifting the 3D spectrum
before extracting slice!
•Example directional shading
–we can’t compute |N.L| or max(N.L,0)
–Lambertian under hemispherical shading = ½ + ½
N.L, which smoothly maps N.L to 0..1
–N.L = first derivative of volume in direction x of
pole = directional first moment (x f(x)) of spectrum,
which again can be computed while extracting a slice

Time = 48
Other issues in tomography

• resample fan beams to parallel beams


• extendable (with difficulty) to cone beams in 3D
• modern scanners use helical capture paths
• scattering degrades reconstruction

©2006 Marc Levoy

Time = 49
Limited-angle projections

Image removed due to


copyright restrictions

(from Olson)

©2006 Marc Levoy

•Reference:
–Olson, T., Jaffe, J.S., An explanation of the
effects of squashing in limited angle
tomography,
IEEE Trans. Medical Imaging, Vol. 9, No.
3., September 1990.

Time = 50
Reconstruction using Algebraic
Reconstruction Technique (ART)

pi = Σ (j = 1 … N) wij fj ,      i = 1, 2, …, M

[Image removed due to copyright restrictions]  (from Kak)

M projection rays
N image cells along a ray
pi = projection along ray i
fj = value of image cell j (n² cells)
wij = contribution by cell j to ray i (a.k.a. resampling filter)

• applicable when projection angles are limited


or non-uniformly distributed around the object
• can be under- or over-constrained, depending on N and M
©2006 Marc Levoy

Time = 51
[Images removed due to copyright restrictions]

f(k) = f(k−1) − [ ( f(k−1) • wi − pi ) / ( wi • wi ) ] wi

f(k) = kth estimate of all cells
wi = weights ( wi1, wi2, …, wiN ) along ray i

Procedure
• make an initial guess, e.g. assign zeros to all cells
• project onto p1 by increasing cells along ray 1 until Σ = p1
• project onto p2 by modifying cells along ray 2 until Σ = p2, etc.
• to reduce noise, reduce by α Δf(k) for α < 1
•Formula is derived in Kak, chapter 7, p. 278
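The update formula translates almost directly into code. A minimal ART (Kaczmarz) sketch, assuming W is the M x N matrix of ray weights, p the M measured projections, and alpha a relaxation factor:

% Minimal ART / Kaczmarz sketch: W is M x N (one row of weights per ray),
% p is M x 1 measured projections, alpha < 1 damps noise.
f = zeros(size(W, 2), 1);            % initial guess: all cells zero
alpha = 0.5;
for sweep = 1:20                     % a few passes over all rays
    for i = 1:size(W, 1)
        wi = W(i, :);
        f  = f + alpha * ((p(i) - wi*f) / (wi*wi')) * wi';
        f  = max(f, 0);              % optional positivity constraint
    end
end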

Time = 52
• linear system, but big, sparse, and noisy
• ART is solution by method of projections [Kaczmarz 1937]
• to increase angle between successive hyperplanes, jump by 90°
• SART modifies all cells using f (k-1), then increments k
• overdetermined if M > N, underdetermined if missing rays
• optional additional constraints:
• f > 0 everywhere (positivity)
• f = 0 outside a certain area

Procedure
• make an initial guess, e.g. assign zeros to all cells
• project onto p1 by increasing cells along ray 1 until Σ = p1
• project onto p2 by modifying cells along ray 2 until Σ = p2, etc.
• to reduce noise, reduce by α Δf (k ) for α < 1
•SIRT = Simultaneous Iterative Reconstruction Technique
•SART = Simultaneous ART

Time = 53
• linear system, but big, sparse, and noisy
• ART is solution by method of projections [Kaczmarz 1937]
• to increase angle between successive hyperplanes, jump by 90°
• SART modifies all cells using f (k-1), then increments k
• overdetermined if M > N, underdetermined if missing rays
• optional additional constraints:
• f > 0 everywhere (positivity)
• f = 0 outside a certain area

Image removed due to


copyright restrictions

(Olson)

•Reference:
–Olson, T., A stabilized inversion for limited
angle tomography. Manuscript.
–35 degrees missing

Time = 54
Image removed due to
copyright restrictions

Image removed due to


copyright restrictions

(Olson)

•Nonlinear constraints
–f = 0 outside of circle (oval?)

Time = 55
Shape from light fields
using iterative relaxation

©2006 Marc Levoy

•Reference:
–M. Levoy, unpublished
•but also see:
–DeBonet, J., Viola, P., Poxels:
responsibility weighted 3D volume
reconstruction,
Proc. ICCV 1999.

Time = 56
Borehole tomography

Image removed due to


copyright restrictions

(from Reynolds)

• receivers measure end-to-end travel time


• reconstruct to find velocities in intervening cells
• must use limited-angle reconstruction method (like ART)

©2006 Marc Levoy

•Reference:
–Reynolds, J.M., An Introduction to Applied
and Environmental Geophysics,
Wiley, 1997.

Time = 57
Applications

Image removed due to


copyright restrictions

mapping a seismosaurus in sandstone mapping ancient Rome using


using microphones in 4 boreholes and explosions in the subways and
explosions along radial lines microphones along the streets?

©2006 Marc Levoy

•Left picture is from Reynolds, right picture is from


Stanford’s Forma Urbis Romae project

Time = 58
From microscope light fields
to volumes
• 4D light field → digital refocusing →
3D focal stack → deconvolution microscopy →
3D volume data

Image removed due to Image removed due to


copyright restrictions copyright restrictions

• 4D light field → tomographic reconstruction →


3D volume data

Image removed due to


copyright restrictions

© 2006 Marc Levoy

•Reference:
–http://www.api.com/lifescience/DeltaVision
RT.html

Time = 59
3D deconvolution

[McNally 1999]

Image removed due to


copyright restrictions

focus stack of a point in 3-space is the 3D PSF of that imaging system

• object ∗ PSF → focus stack
• ℱ{object} × ℱ{PSF} → ℱ{focus stack}
• ℱ{focus stack} ÷ ℱ{PSF} → ℱ{object}
[Image removed due to copyright restrictions]

• spectrum contains zeros, due to missing rays
• imaging noise is amplified by division by ~zeros
• reduce by regularization (smoothing) or completion of spectrum
• improve convergence using constraints, e.g. object > 0
© 2006 Marc Levoy

•Reference:
–McNally, J.G., Karpova, T., Cooper, J.,
Conchello, J.A., Three-Dimensional Imaging
by Deconvolution Microscopy,
Methods, Vol. 19, 1999.
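The Fourier-domain relations above suggest the simplest possible (regularized) inverse filter; in practice iterative, constrained deconvolution is used instead, but a Wiener-style sketch makes the division-by-near-zeros issue concrete. It assumes stack and psf are same-sized 3D arrays with the PSF centered, and eps2 is an assumed regularization constant:

% Minimal regularized (Wiener-style) 3D deconvolution sketch.
eps2 = 1e-3;                                   % regularization strength
H = fftn(ifftshift(psf));                      % PSF spectrum (PSF centered in array)
D = fftn(stack);
obj = real(ifftn(D .* conj(H) ./ (abs(H).^2 + eps2)));
obj = max(obj, 0);                             % positivity constraint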

Time = 60
Example: 15μ hollow fluorescent bead

[figure: focal stack and deconvolved volumetric model, for the conventional microscope (top row) and the light field microscope (bottom row)]

© 2006 Marc Levoy

•Images from:
– Levoy, M., Ng, R., Adams, A., Footer, M.,
Horowitz, M., Light field microscopy,
ACM Transactions on Graphics (Proc.
SIGGRAPH), 25(3), 2006.

Time = 61
Silkworm mouth
(collection of B.M. Levoy)

slice of focal stack slice of volume volume rendering

© 2006 Marc Levoy

Time = 62
Legs of unknown insect
(collection of B.M. Levoy)

© 2006 Marc Levoy

Time = 63
Tomography and 3D deconvolution:
how different are they?
Fourier domain

• deconvolution
– 4D LF → refocusing → ℱ → 3D spectrum → ÷ ℱ{PSF} → ℱ⁻¹ → volume
• tomography
– 4D LF → ℱ → 2D slices in 3D spectrum → × |ω| → ℱ⁻¹ → volume

spatial domain

• deconvolution
– 4D LF → refocusing → 3D stack → inverse filter → volume
• tomography
– 4D LF → backprojection → backprojection filter → volume
© 2006 Marc Levoy
•Full proof appears in [Levoy 2006] (previously
cited)

Time = 64
For finite apertures,
they are still the same

• deconvolution
– nonblind iterative deconvolution with
positivity constraint on 3D reconstruction

• limited-angle tomography
– Simultaneous Algebraic Reconstruction
Technique (SART) with same constraint

© 2006 Marc Levoy

Time = 65
Their artifacts are also the same

• tomography from limited-angle projections

Image removed due to


copyright restrictions

Image removed due to


copyright restrictions

[Delaney 1998]

• deconvolution from finite-aperture images

[Images removed due to copyright restrictions]

[McNally 1999]
© 2006 Marc Levoy

•References:
–Delaney, A.H., Bresler, Y., Globally
Convergent Edge-Preserving Regularized
Reconstruction: An Application to Limited-
Angle Tomography,
IEEE Transactions on Image Processing,
Vol. 7, No. 2, February 1998.
–others are previously cited

Time = 66
Diffraction tomography

[Images removed due to copyright restrictions]  (from Kak)
limit as λ → 0 (relative to object size) is the Fourier projection-slice theorem

• Wolf (1969) showed that a broadband hologram gives the 3D structure of a semi-transparent object
• Fourier Diffraction Theorem says ℱ{scattered field} = arc in ℱ{object} as shown above; can use to reconstruct object
• assumes weakly refractive media and coherent plane illumination; must record amplitude and phase of forward scattered field
©2006 Marc Levoy
•Wolf
–applicable only to semi-transparent objects
–Wolf E 1969, Three-dimensional structure
determination of semi-transparent objects
from holographic data, Opt. Commun. 1
153–6

Time = 67
[Devaney 2005]

[Images removed due to copyright restrictions]  (from Kak)
limit as λ → 0 (relative to object size) is the Fourier projection-slice theorem

• Wolf (1969) showed that a broadband hologram gives the 3D structure of a semi-transparent object
• Fourier Diffraction Theorem says ℱ{scattered field} = arc in ℱ{object} as shown above; can use to reconstruct object
• assumes weakly refractive media and coherent plane illumination; must record amplitude and phase of forward scattered field
•measuring phase
–typically requires a reference beam and
interference between it and the main beam,
i.e. a holographic procedure
•Reference:
–Devaney, A., Inverse scattering and optical
diffraction tomography,
Powerpoint presentation.

Time = 68
Inversion by
filtered backpropagation

backprojection
backpropagation
Image removed due to
copyright restrictions

(Jebali)

• depth-variant filter, so more expensive than tomographic


backprojection, also more expensive than Fourier method
• applications in medical imaging, geophysics, optics
©2006 Marc Levoy

•Reference:
–Jebali, A., Numerical Reconstruction of
semi-transparent objects in Optical
Diffraction Tomography,
Diploma Project, Ecole Polytechnique,
Lausanne, 2002.

Time = 69
Diffuse optical tomography

Image removed due to


copyright restrictions

(Arridge)

• assumes light propagation by multiple scattering


• model as diffusion process (similar to Jensen01)

©2006 Marc Levoy

•References:
–Arridge, S.R., Methods for the Inverse
Problem in Optical Tomography,
Proc. Waves and Imaging Through Complex
Media, Kluwer, 307-329, 2001.
–Schweiger, M., Gibson, A., Arridge, S.R.,
“Computational Aspects of Diffuse Optical
Tomography,”
IEEE Computing, Vol. 5, No. 6, Nov./Dec.,
2003. (for image)
–Jensen, H.W., Marschner, S., Levoy, M.,
Hanrahan, P., A Practical Model for
Subsurface Light Transport,
Proc. SIGGRAPH 2001.

Time = 70
The optical diffusion equation

D ∇²φ(x) = σa φ(x) − Q0(x) + 3D ∇ • Q1(x)      (from Jensen)

• D = diffusion constant = 1/(3σ′t), where σ′t is a reduced extinction coefficient
• φ(x) = scalar irradiance at point x
• Qn(x) = nth-order volume source distribution, i.e.
  Q0(x) = ∫4π Q(x, ω) dω      Q1(x) = ∫4π Q(x, ω) ω dω
• in DOT, σa, the source, and σt are unknown
©2006 Marc Levoy

Time = 71
Diffuse optical tomography

Image removed due to


Image removed due to
copyright restrictions
copyright restrictions

female breast with


sources (red) and
detectors (blue)

• assumes light propagation by multiple scattering


• model as diffusion process (similar to Jensen01)
• inversion is non-linear and ill-posed
• solve using optimization with regularization (smoothing)
©2006 Marc Levoy

•acquisition
–81 source positions, 81 detector positions
–for each source position, measure light at
all detector positions
–use time-of-flight measurement to estimate an initial guess for absorption, to reduce cross-talk between absorption and scattering

Time = 72
Coded aperture imaging

Image removed due to Image removed due to


copyright restrictions copyright restrictions

(from Zand)
(source assumed infinitely distant)

• optics cannot bend X-rays, so they cannot be focused


• pinhole imaging needs no optics, but collects too little light
• use multiple pinholes and a single sensor
• produces superimposed shifted copies of source
©2006 Marc Levoy

•Reference:
–Zand, J., Coded aperture imaging in high
energy astronomy,
http://lheawww.gsfc.nasa.gov/docs/cai/coded
_intr.html

Time = 73
Reconstruction by
matrix inversion
d = C s
s = C⁻¹ d

⎡ d1 ⎤   ⎡ C1 C2 C3 C4 ⎤ ⎡ s1 ⎤
⎢ d2 ⎥ = ⎢ C4 C1 C2 C3 ⎥ ⎢ s2 ⎥
⎢ d3 ⎥   ⎢ C3 C4 C1 C2 ⎥ ⎢ s3 ⎥
⎣ d4 ⎦   ⎣ C2 C3 C4 C1 ⎦ ⎣ s4 ⎦
detector      mask (0/1)      source

• ill-conditioned unless auto-correlation of mask is a delta function

[Images removed due to copyright restrictions]  (from Zand)
if the source is larger than the detector, the system is underconstrained; collimators restrict source directions to those from which the projection of the mask falls completely within the detector
©2006 Marc Levoy

Time = 74
Reconstruction
by backprojection

Image removed due to


copyright restrictions

(from Zand)

• backproject each detected pixel through each hole in mask


• superimposition of projections reconstructs source
• essentially a cross correlation of detected image with mask
• also works for non-infinite sources; use voxel grid
• assumes non-occluding source ©2006 Marc Levoy

•Another example:
–Carlisle, P., Coded aperture imaging ("Mmm, gumballs..."), http://paulcarlisle.net/old/codedaperture.html
–cross correlation is just convolution (of
detected image by mask) without first
reversing detected image in x and y
–conversion of blacks to -1’s in “decoding
matrix” just serves to avoid normalization of
resulting reconstruction
–performing this on an image of gumballs,
rather than a 3D gumball scene, is equivalent
to assuming the gumballs cover the sky at
infinity, i.e. they are an angular function
•assumes non-occluding source
–otherwise it’s the voxel coloring problem

Time = 75
Interesting techniques
I didn’t have time to cover

• reflection tomography
• synthetic aperture radar & sonar
• holography
• wavefront coding

©2006 Marc Levoy

Time = 76
Computational Illumination

Ramesh Raskar
Computational
Mitsubishi Electric Research Labs
Illumination

Course WebPage :
http://www.merl.com/people/raskar/photo/course/
Ramesh Raskar, Computational Illumination Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005

Computational Photography (block diagram): Novel Illumination (programmable 4D illumination field + time + wavelength), Novel Cameras (Generalized Sensor, Generalized Optics, Programmable Processing), Display (recreate 4D lightfield); Scene: 8D Ray Modulator

Computational Illumination: Programmable 4D Illumination Field + Time + Wavelength
• Presence or Absence
– Flash/No-flash
• Light position
– Multi-flash for depth edges
– Programmable dome (image re-lighting and matting)
• Light color/wavelength
• Spatial Modulation
– Synthetic Aperture Illumination
• Temporal Modulation
– TV remote, Motion Tracking, Sony ID-cam, RFIG
• Exploiting (uncontrolled) natural lighting condition
– Day/Night Fusion
Ramesh Raskar, Computational Illumination

Flash and Ambient Images


[ Agrawal, Raskar, Nayar, Li, SIGGRAPH 2005 ]
panels: Ambient, Flash, Result, Reflection Layer

Computational Illumination
• Presence or Absence
– Flash/No-flash
• Light position
– Multi-flash for depth edges
– Programmable dome (image re-lighting and matting)
• Light color/wavelength
• Spatial Modulation
– Synthetic Aperture Illumination
• Temporal Modulation
– TV remote, Motion Tracking, Sony ID-cam, RFIG
• General lighting condition
– Day/Night
Ramesh Raskar, Computational Illumination

1
Capturing HDR Image
Varying exposure time      Varying flash brightness      Varying both

Flash-Exposure Sampling  (axes: flash brightness vs. exposure time)
Flash-Exposure HDR: varying both
Ramesh Raskar, Computational Illumination

Denoising Challenging Images

Available light (no-flash):  + nice lighting, + color;  − noise/blurriness
Flash:  + details;  − flat/artificial, − color
Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005

Elmar Eisemann and Frédo Durand, Flash Photography Enhancement via Intrinsic Relighting
Georg Petschnigg, Maneesh Agrawala, Hugues Hoppe, Richard Szeliski, Michael Cohen, Kentaro Toyama, Digital Photography with Flash and No-Flash Image Pairs
Use the no-flash image to relight the flash image  (panels: No-flash, Flash, Result)

Computational Illumination
• Presence or Absence
– Flash/No-flash
• Light position
– Multi-flash for depth edges
– Programmable dome (image re-lighting and matting)
• Light color/wavelength
• Spatial Modulation
– Synthetic Aperture Illumination
• Temporal Modulation
– TV remote, Motion Tracking, Sony ID-cam, RFIG
• General lighting condition
– Day/Night
Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005      Ramesh Raskar, Computational Illumination

2
Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

Synthetic Lighting: Paul Haeberli, Jan 1992

Non-photorealistic Camera: Depth Edge Detection and Stylized Rendering using Multi-Flash Imaging
Ramesh Raskar, Karhan Tan, Rogerio Feris, Jingyi Yu, Matthew Turk
Mitsubishi Electric Research Labs (MERL), Cambridge, MA
U of California at Santa Barbara
U of North Carolina at Chapel Hill
Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005

Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

Depth Edge Camera

Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

3
4

Depth Discontinuities

Internal and external


Shape boundaries, Occluding contour, Silhouettes

Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

Sigma = 9 Sigma = 5

Depth
Edges
Canny Intensity Edge Detection

Sigma = 1

Our method captures shape edges

5
Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

Photo

Our Method
Canny Our Method

Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

Photo Result

Our Method Canny Intensity


Edge Detection

Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

Our Method

Canny Intensity Edge Detection

6
Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

Imaging Geometry

Shadow lies along epipolar ray

Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

Imaging Geometry
Shadow lies along the epipolar ray; the shadow and the epipole are on opposite sides of the depth edge

Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

Input Normalized U{depth edges}


Depth Edge Camera
Left Flash Left / Max

Right Flash Right / Max

Light epipolar rays are horizontal or vertical

7
Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

Input Normalized U{depth edges} Input Normalized Plot U{depth edges}

Left Flash Left / Max Left Flash Left / Max

Right Flash Right / Max Right Flash Right / Max

Negative transition along epipolar ray is depth edge

Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk Mitsubishi Electric Research Labs MultiFlash NPR Camera Raskar, Tan, Feris, Yu, Turk

Input → Normalized → Plot → U{depth edges}
Left Flash → Left / Max      Right Flash → Right / Max
Negative transition along the epipolar ray is a depth edge.  No magic parameters!

% Max composite
maximg = max( max(left, right), max(top, bottom) );

% Normalize by computing ratio images
r1 = left ./ maximg;
r3 = right ./ maximg;
r2 = top ./ maximg;
r4 = bottom ./ maximg;

% Compute confidence map
v = fspecial( 'sobel' ); h = v';
d1 = imfilter( r1, v ); d3 = imfilter( r3, v ); % vertical sobel
d2 = imfilter( r2, h ); d4 = imfilter( r4, h ); % horizontal sobel

% Keep only negative transitions along each flash's epipolar direction
silhouette1 = d1 .* (d1>0);
silhouette2 = abs( d2 .* (d2<0) );
silhouette3 = abs( d3 .* (d3<0) );
silhouette4 = d4 .* (d4>0);

% Pick max confidence in each pixel
confidence = max( max(silhouette1, silhouette2), max(silhouette3, silhouette4) );
imwrite( confidence, 'confidence.bmp' );

Debevec et al. 2002: 'Light Stage 3'
Image-Based Actual Re-lighting, Debevec et al., SIGGRAPH 2001
Light the actress in Los Angeles

Film the background in Milan,


Measure incoming light,

Matched LA and Milan lighting.

Matte the background

8
Photomontage

courtesy of A Agrawala

courtesy of P. Debevec

Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005 Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005

From Jack Tumblin


“Light Waving”
Tech Sketch (Winnemoller, Mohan, Tumblin, Gooch)

Computational Illumination
• Presence or Absence
– Flash/No-flash
• Light position
– Multi-flash for depth edges
– Programmable dome (image re-lighting and matting)
• Light color/wavelength
• Spatial Modulation
– Synthetic Aperture Illumination
• Temporal Modulation
– TV remote, Motion Tracking, Sony ID-cam, RFIG
• General lighting condition
– Day/Night
Ramesh Raskar, Computational Illumination

Synthetic Aperture Illumination: Comparison with long-range synthetic aperture photography
• width of aperture 6'
• number of cameras 45
• spacing between cameras 5"
• camera's field of view 4.5°
© 2004 Marc Levoy

9
The scene
• distance to occluder 110'
• distance to targets 125'
• field of view at target 10'
© 2004 Marc Levoy

Synthetic aperture photography using an array of mirrors
• 11-megapixel camera (4064 × 2047 pixels)
• 18 × 12 inch effective aperture, 9 feet to scene
• 22 mirrors, tilted inwards → 22 views, each 750 × 500 pixels
© 2004 Marc Levoy

Synthetic aperture illumination
• technologies
– array of projectors
– array of microprojectors
– single projector + array of mirrors
© 2004 Marc Levoy

What does synthetic aperture illumination look like?
© 2004 Marc Levoy

What are good patterns?


pattern one trial 16 trials

10
Underwater confocal imaging with and without SAP

6-D Methods and beyond...
Relighting with 4D Incident Light Fields, Vincent Masselus, Pieter Peers, Philip Dutre and Yves D. Willems, SIGGRAPH 2003

© 2004 Marc Levoy

Computational Illumination
• Presence or Absence
– Flash/No-flash
• Light position
– Multi-flash for depth edges
– Programmable dome (image re-lighting and matting)
• Light color/wavelength
• Spatial Modulation
– Synthetic Aperture Illumination
• Temporal Modulation
– TV remote, Motion Tracking, Sony ID-cam, RFIG
• General lighting condition
– Day/Night
Ramesh Raskar, Computational Illumination

Demodulating Cameras
• Simultaneously decode signals from blinking LEDs and get an image
– Sony ID Cam
– Phoci
• Motion Capture Cameras
– Visualeyez™ VZ4000 Tracking System
– PhaseSpace motion digitizer

Demodulating Cameras

• Decode signals from blinking LEDs + image


– Sony ID Cam
– Phoci

• Motion Capture Cameras

11
Raskar, Beardsley, vanBaar, Wang, Raskar, Beardsley, vanBaar, Wang,
Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher

RFIG Lamps: Interacting with a Self-describing World via Photosensing Wireless Tags and Projectors
Ramesh Raskar, Paul Beardsley, Jeroen van Baar, Yao Wang, Paul Dietz, Johnny Lee, Darren Leigh, Thomas Willwacher
Mitsubishi Electric Research Labs (MERL), Cambridge, MA

Radio Frequency Identification Tags (RFID): no batteries, small size, cost a few cents (antenna + microchip)

Raskar, Beardsley, vanBaar, Wang, Raskar, Beardsley, vanBaar, Wang,


Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher

Applications: warehousing, routing, livestock tracking, library, baggage handling, currency

Conventional Passive RFID: tag (memory + microcontroller) ↔ RF ↔ reader (computer)

Raskar, Beardsley, vanBaar, Wang, Raskar, Beardsley, vanBaar, Wang,


Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher

Tagged Books in a Library: Where are boxes with products close to expiry date?

✓ Id: easy to get a list of books in RF range
– No precise location data: difficult to find whether the books are in sorted order, or which book is upside down

12
Raskar, Beardsley, vanBaar, Wang, Raskar, Beardsley, vanBaar, Wang,
Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher

Conventional RF tag: memory + microcontroller, RF data link to a reader (computer)
Photo-sensing RF tag: adds a photosensor, so a projector can address tags with light (directional transfer, AR with image overlay) while remaining compatible with RFID size and power needs

Raskar, Beardsley, vanBaar, Wang, Raskar, Beardsley, vanBaar, Wang,


Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher

a. Photosensing RFID tags are queried via RF
b. Projector beams a time-varying pattern, unique for each (x,y) pixel, which is decoded by tags
c. Tags respond via RF, with date and precise (x,y) pixel location. Projector beams 'O' or 'X' at that location for visual feedback
d. Multiple users can simultaneously work from a distance without RF collision

Raskar, Beardsley, vanBaar, Wang, Raskar, Beardsley, vanBaar, Wang,


Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher

Prototype Tag
RFID (Radio Frequency Identification): RF tag
RFIG (Radio Frequency Id and Geometry): RF tag + photosensor

13
Raskar, Beardsley, vanBaar, Wang, Raskar, Beardsley, vanBaar, Wang,
Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher

Projected Sequential Frames
Pattern MSB, Pattern MSB-1, …, Pattern LSB
• Handheld projector beams binary coded stripes
• Tags decode temporal code

Raskar, Beardsley, vanBaar, Wang, Dietz, Lee, Leigh, Willwacher (Mitsubishi Electric Research Labs, RFIG Lamps)

Projected Sequential Frames
Pattern MSB, Pattern MSB-1, …, Pattern LSB
0 1 1 0 0  →  X = 12
• Handheld projector beams binary coded stripes
• Tags decode temporal code

For each tag
a. From light sequence, decode x and y coordinate
b. Transmit back to RF reader (Id, x, y)
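The per-tag decoding step is just binary accumulation of the sensed bit sequence. A minimal sketch, assuming bits holds the photosensed values with the MSB first (the example sequence above gives X = 12):

% Minimal sketch of a tag decoding its x coordinate from the binary-coded
% stripes (bits(1) is the MSB sensed by the photosensor).
bits = [0 1 1 0 0];                  % example sequence from the slide
x = 0;
for b = bits
    x = 2*x + b;                     % x = 12 for this sequence
end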

14
Raskar, Beardsley, vanBaar, Wang,
Mitsubishi Electric Research Labs R F I G Lamps Dietz, Lee, Leigh, Willwacher

Visual feedback of 2D position
a. Receive via RF {(x1,y1), (x2,y2), …} pixels
b. Illuminate those positions

Computational Illumination
• Presence or Absence
– Flash/No-flash
• Light position
– Multi-flash for depth edges
– Programmable dome (image re-lighting and matting)
• Light color/wavelength
• Spatial Modulation
– Synthetic Aperture Illumination
• Temporal Modulation
– TV remote, Motion Tracking, Sony ID-cam, RFIG
• Natural lighting condition
– Day/Night Fusion
Ramesh Raskar, Computational Illumination

A Night Time Scene: objects are difficult to understand due to lack of context
(labels: dark bldgs, reflections on bldgs, unknown shapes)

Enhanced Context: all features from the night scene are preserved, but the background is clear
(labels: 'well-lit' bldgs, reflections in bldgs windows, tree and street shapes)
Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005

Night Image      Day Image      Result: Enhanced Image
Background is captured from the day-time scene using the same fixed camera
Mask is automatically computed from scene contrast
Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005

15
Pixel Blending

But, Simple Pixel Blending Creates


Ugly Artifacts

Our Method:
Integration of
blended Gradients
Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005 Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005

Nighttime image I1 → gradient field (G1x, G1y)
Daytime image I2 → gradient field (G2x, G2y)
Importance image W → mixed gradient field G = (Gx, Gy) → final result I'

Reconstruction from Gradient Field
• Problem: minimize error |∇I' − G|
• Estimate I' so that G = ∇I'
• Poisson equation: ∇²I' = div G
• Full multigrid solver

Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005 Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005
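The slides solve the Poisson equation with a full multigrid solver; as a compact stand-in, an FFT-based solve (periodic boundaries assumed) shows the same reconstruction step. The sketch assumes Gx and Gy hold the mixed gradient field:

% Minimal sketch: reconstruct I' from the mixed gradient field (Gx, Gy)
% by solving  del^2 I' = div G  with an FFT Poisson solver.
[H, W] = size(Gx);
divG = [Gx(:,1), diff(Gx,1,2)] + [Gy(1,:); diff(Gy,1,1)];   % divergence of G
[wx, wy] = meshgrid(2*pi*(0:W-1)/W, 2*pi*(0:H-1)/H);
denom = (2*cos(wx) - 2) + (2*cos(wy) - 2);                  % Laplacian eigenvalues
denom(1,1) = 1;                                             % avoid divide-by-zero at DC
Ihat = fft2(divG) ./ denom;
Ihat(1,1) = 0;                                              % mean of I' is unconstrained
Iprime = real(ifft2(Ihat));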

Video Enhancement using Fusion: Details
– Video from fixed cameras; combine day and night-time images
• Night videos have low contrast, areas with no detail
• Improve low-quality infrared video using high-quality visible video
• Same camera during the day can capture static information
• Fill in dark areas, enhance change in intensity
• Dark areas of the night video are replaced to provide context
• Moving object (from night) + static scene (from day)
• Output style: better context
– Current demo: fusion of night video and daytime image

Modified Surveillance Camera: night-time video (or photo) + day-time photograph → combine pixels depending on context, image and temporal gradient → enhanced night video (or photo) with context; an easy-to-understand non-photorealistic (NPR) image or video (compared with the original video frame)
Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005 Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005

16
Video Enhancement: Overview of Process
Input: original night-time traffic camera video (320×240); day-time image obtained by averaging 5 seconds of day video
Mask frame (for the frame above): encodes pixels with intensity change
Output: enhanced video. Note: exit ramp, lane dividers, and buildings are not visible in the original night video, but clearly seen here.
Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005 Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005

Algorithm
Frame N Gradient field

Mixed gradient field


TimeAveraged
importance mask

Processed binary mask

Final result
Gradient field

Frame N-1 Daytime image Ramesh Raskar, CompPhoto Class Northeastern, Fall 2005

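The time-averaged importance mask in the flow above can be sketched as a running average of a thresholded temporal difference; the threshold and decay constants below are illustrative assumptions, not values from the notes.

import numpy as np

def update_importance_mask(prev_mask, frame_n, frame_n_minus_1,
                           thresh=10.0, decay=0.9):
    # Pixels whose intensity changed between consecutive night frames are
    # treated as moving and kept from the night video; static areas can be
    # filled from the day-time image. 'thresh' and 'decay' are assumed values.
    change = np.abs(frame_n.astype(float) - frame_n_minus_1.astype(float))
    binary = (change > thresh).astype(float)            # processed binary mask
    return decay * prev_mask + (1.0 - decay) * binary   # time-averaged mask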
17
D.2: Smart Optics, Modern
Sensors and Future Cameras

Ramesh Raskar
Mitsubishi Electric Research Labs
Jack Tumblin
Northwestern University

Course WebPage :
http://www.merl.com/people/raskar/photo

Computational Photography
[Block diagram: Novel Illumination (light sources) and a Novel Camera (generalized optics, generalized sensor, processing) surround the Scene, modeled as an 8D ray modulator; the output feeds a Display that can recreate the 4D light field.]

1
Future Directions

• Smart Lighting
– Light stages, Domes, Light waving, Towards 8D

• Computational Imaging outside Photography


– Tomography, Coded Aperture Imaging

• Smart Optics
– Handheld Light field camera, Programmable imaging/aperture

• Smart Sensors
– HDR Cameras, Gradient Sensing, Line-scan Cameras, Demodulators

• Speculations

Wavefront Coding:
10X Depth of Field

• In-focus:
small ‘Circle of Confusion’:

• Out-of-focus: LARGE
'circle of confusion'

• Coma-like distortion:
Make Circle MOVE
as focus changes:

http://www.cdm-optics.com/site/extended_dof.php

2
Wavefront Coding:
10X Depth of Field

• In-focus:
small ‘Circle of Confusion’:

• Out-of-focus: LARGE
'circle of confusion'

• Coma-like distortion
allows us to
De-convolve,
sharpen out-of-focus
items

http://www.cdm-optics.com/site/extended_dof.php

Light field photography using a


handheld plenoptic camera
Ren Ng, Marc Levoy, Mathieu Brédif,
Gene Duval, Mark Horowitz and Pat Hanrahan

3
Conventional versus light field
camera

Conventional versus light field camera

uv-plane st-plane

4
Conventional versus light field
camera

st-plane uv-plane

Prototype camera

Contax medium format camera Kodak 16-megapixel sensor

Adaptive Optics microlens array 125μ square-sided microlenses

4000 × 4000 pixels ÷ 292 × 292 lenses = 14 × 14 pixels



5
Digital refocusing

refocusing = summing windows


extracted from several microlenses

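A minimal sketch of "refocusing = summing windows extracted from several microlenses": treat the light field as a stack of sub-aperture images and shift-and-add them. The array layout and the shift_per_lenslet parameter are assumptions for illustration; varying the shift sweeps the synthetic focal plane (0 reproduces the in-camera focus).

import numpy as np

def refocus(lf, shift_per_lenslet):
    """lf: 4D array [nv, nu, height, width] of sub-aperture images (assumed layout).
    shift_per_lenslet: pixels of shift per unit angular offset."""
    nv, nu, h, w = lf.shape
    cv, cu = (nv - 1) / 2.0, (nu - 1) / 2.0
    out = np.zeros((h, w))
    for v in range(nv):
        for u in range(nu):
            dy = int(round((v - cv) * shift_per_lenslet))
            dx = int(round((u - cu) * shift_per_lenslet))
            out += np.roll(lf[v, u], (dy, dx), axis=(0, 1))   # shift then add
    return out / (nv * nu)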
6
Example of digital refocusing

Extending the depth of field

(Left) conventional photograph, main lens at f/4. (Middle) conventional photograph, main lens at f/22. (Right) light field, main lens at f/4, after all-focus algorithm
[Agarwala 2004]

7
Programmable Imaging

[Block diagram: new optics and a detector produce pixels that feed computations and a vision module; a programmable controller closes the loop back to the optics and detector.]

Imaging Through Micro‐Mirrors
(Nayar,  Branzoi and Boult, 2004)
[Figure: rays from the scene reach the detector via a mirror array; a mirror can instead aim its pixel's view at a black surface near the viewpoint.]

Geometry: Ray Orientation — the outgoing scene ray for detector pixel xi is determined by the mirror normal ni: oi = G(xi, ni).
Photometry: Ray Attenuation — the pixel's attenuation is set by how long the mirror points at the scene versus at the black surface: ai = t(ni) / (t(ni) + t(nb)).

8
Digital Micromirror Device (DMD)
(by Texas Instruments)

DMD Array / Micromirror Architecture: each ~14 µm mirror tilts between −10° and +10°.

DMD for Imaging: 
(Malbet et al. 95, Kearney et al. 98, Castracane et al. 99, Christensen et al. 02)

Programmable Imaging System
[Photo labels: imaging lens, DMD and DMD electronics, camera electronics, tilted CCD, lens focused on the DMD.]

9
Modulation: Examples
[Figure: three examples in which the camera image is the pixel-wise product of the scene and the DMD image (Scene × DMD Image = Camera Image).]

Optical Intra-Pixel Feature Detection

f * g = f * g+ − f * g−

Laplacian: the signed kernel is split into two non-negative kernels that can be realized on the DMD (each CCD pixel a–i integrates over a block of DMD pixels):

  1   4   1       1   4   1       0   0   0
  4 −20   4   =   4   0   4   −   0  20   0
  1   4   1       1   4   1       0   0   0

Laplacian Image: the two non-negative kernels are tiled across the DMD, and the difference of the two resulting camera images gives the Laplacian image.
(A convolution sketch of this split follows below.)

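A small sketch of the kernel split used above, here computed digitally with scipy rather than optically on the DMD: the signed Laplacian is separated into two non-negative kernels and the two filtered images are subtracted.

import numpy as np
from scipy.signal import convolve2d

# The signed kernel is split into two non-negative kernels, as required when
# the modulation (DMD duty cycle) can only attenuate light.
g = np.array([[1,   4, 1],
              [4, -20, 4],
              [1,   4, 1]], dtype=float)
g_plus = np.maximum(g, 0)     # [[1,4,1],[4,0,4],[1,4,1]]
g_minus = np.maximum(-g, 0)   # [[0,0,0],[0,20,0],[0,0,0]]

def split_convolve(f):
    # f * g computed as f * g_plus - f * g_minus (optically: two exposures).
    return (convolve2d(f, g_plus, mode='same')
            - convolve2d(f, g_minus, mode='same'))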
10
Optical Edge Detection

Scene Video Edge Video

Generalized Optics and Sensors

• Smart Optics
– Handheld Light field camera,
– Programmable imaging/aperture

• Smart Sensors
– HDR Cameras,
– Gradient Sensing,
– Line-scan Cameras,
– Demodulators

11
Future Directions

• Smart Lighting
– Light stages, Domes, Light waving, Towards 8D

• Computational Imaging outside Photography


– Tomography, Coded Aperture Imaging

• Smart Optics
– Handheld Light field camera, Programmable imaging/aperture

• Smart Sensors
– HDR Cameras, Gradient Sensing, Line-scan Cameras, Demodulators

• Speculations

Foveon: Thick Sensor

12
Pixim

High Dynamic Range

http://www.cybergrain.com/tech/hdr/

Fuji's SuperCCD S3 Pro camera has a chip with high and low sensitivity sensors
per pixel location to increase dynamic range

13
Gradient Camera

Sensing Pixel Intensity Difference with


Locally Adaptive Gain

Ramesh Raskar, MERL


Work with Jack Tumblin, Northwestern U,
Amit Agrawal, U of Maryland

Natural Scene Properties


[Figure: plots of intensity and of gradient along x, with the corresponding intensity histogram (range 1 to 10^5) and gradient histogram (range −10^5 to 10^5).]

14
Original Image: intensity values ranging from 0 to 1800; an intensity ramp plus a low-contrast logo.

Intensity Camera Image: 8-bit camera for 1:1000 range. Problem: saturation at high-intensity regions.

Log Camera Image: 8-bit log for 1:10^6 range. Problem: visible quantization effects at high intensities.

Locally Adaptive Gain: each pixel divided by the average of its local neighborhood; the low-frequency content is lost and only detail remains.

Gradient Camera Image: in the proposed method, we sense intensity differences. We use an 8-bit A/D with a resolution of log(1.02) to capture a 2% contrast change between adjacent pixels. Notice that the details at both high and low intensities are captured.

Gradient Camera

• Two main features

1. Sense the difference between neighboring pixel intensities
   At each pixel, measure (∇x, ∇y), where ∇x = Ix+1,y − Ix,y and ∇y = Ix,y+1 − Ix,y
2. With locally adaptive gain

• A gradient camera is very similar to a locally adaptive gain camera

• Locally Adaptive Gain Camera
– Gain is different for each pixel
– Problem: loses low-frequency detail and preserves only high-frequency features (edges)
• Gradient Camera
– The gain is the same for four adjacent pixels
– The difference between two pixels is measured with the same gain on both pixels
– Reconstruct the original image in software from the pixel differences by solving a linear system
(a Poisson equation; a minimal sketch follows below)

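A minimal sketch of the gradient-camera model: simulate the sensed forward differences, then recover the image by integration. The exact cumulative-sum integration below only applies to consistent, noise-free differences; with quantized or noisy measurements one would solve the Poisson equation (least squares) as the slide states.

import numpy as np

def sense_gradients(I):
    # Forward differences: the quantities a gradient camera would report.
    gx = np.zeros_like(I, dtype=float)
    gy = np.zeros_like(I, dtype=float)
    gx[:, :-1] = I[:, 1:] - I[:, :-1]     # grad_x = I(x+1, y) - I(x, y)
    gy[:-1, :] = I[1:, :] - I[:-1, :]     # grad_y = I(x, y+1) - I(x, y)
    return gx, gy

def integrate_exact(gx, gy, i00=0.0):
    # Exact integration for consistent differences: walk down the first
    # column with gy, then across each row with gx.
    h, w = gx.shape
    I = np.zeros((h, w))
    I[0, 0] = i00
    I[1:, 0] = i00 + np.cumsum(gy[:-1, 0])
    I[:, 1:] = I[:, [0]] + np.cumsum(gx[:, :-1], axis=1)
    return I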
15
Camera Pipeline
On-board hardware: difference between pixels → local gain adaptive to the difference.
Software: 2D integration to reconstruct the image.

Detail Preserving
[Comparison panels: Intensity Camera, Log Intensity Camera, Gradient Camera.]
Intensity cameras capture detail but lose range; log cameras capture range but lose detail.

16
Quantization
[Figure: an original image with its intensity histogram (1 to 10^5) and gradient histogram (−10^5 to 10^5); panels compare uniform quantization with 3 bits, log quantization with 3 bits, and log-uniform quantization of the gradients with 3 bits.]
GradCam requires fewer bits: in the reconstructed image, the error is pushed to high-gradient pixel positions, which is visually imperceptible.

High Dynamic Range Images
[Figure: a scene with the intensity camera's saturation map and the gradient camera's saturation map.]
The intensity camera fails to capture the range; gradients saturate at very few isolated pixels.

17
Line Scan Camera: PhotoFinish 2000 Hz

18
3D Cameras

• Time of flight
– ZCam (Shuttered Light Pulse)
• Phase Decoding of modulated illumination
– Canesta (Phase comparison)
– Phase difference = depth
– Magnitude = reflectance

• Structured Light
– Binary coded light and triangulation

ZCam (3Dvsystems), Shuttered Light Pulse

Resolution:
1 cm for 2–7 meters

19
Graphics can be inserted behind and between characters

Canesta: Modulated Emitter

Phase ~ distance
Amplitude ~ reflectance

20
Demodulating Cameras

• Simultaneously decode signals from blinking


LEDs and get an image
– Sony ID Cam
– Phoci

• Motion Capture Cameras


– Visualeyez™ VZ4000 Tracking System
– PhaseSpace motion digitizer

21
Demodulating Cameras

• Decode signals from blinking LEDs + image


– Sony ID Cam
– Phoci

• Motion Capture Cameras

Fluttered Shutter Camera


Raskar, Agrawal, Tumblin Siggraph2006

22
Figure 2 results

Input Image

Rectified Image to make motion lines parallel to scan lines.

23
Approximate cutout of the blurred image containing the
taxi (vignetting on left edge). Exact alignment of cutout
with taxi extent is not required.

Image Deblurred by solving a linear system. No post-processing

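A 1-D toy of the deblurring step: with a fluttered (coded) exposure, motion blur is convolution with the shutter's on/off code, so the sharp signal can be recovered by solving a well-conditioned linear system. The binary code below is an arbitrary stand-in, not the sequence used in the paper.

import numpy as np

code = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1], dtype=float)  # assumed code

def blur_matrix(n_sharp, code):
    # Linear operator mapping a sharp 1-D signal (length n_sharp) to its
    # motion-blurred observation of length n_sharp + len(code) - 1.
    k = len(code)
    A = np.zeros((n_sharp + k - 1, n_sharp))
    for j in range(n_sharp):
        A[j:j + k, j] = code
    return A

# Simulate and invert: with a broadband code the least-squares solve is
# well-conditioned, unlike a flat (all-ones) exposure.
rng = np.random.default_rng(0)
x = rng.random(40)                      # sharp scanline
A = blur_matrix(len(x), code)
b = A @ x + 0.001 * rng.standard_normal(A.shape[0])
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)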
Coded Exposure Photography


Raskar, Agrawal, Tumblin Siggraph2006

Short Exposure Traditional MURA Coded

24
Coded Exposure Photography

Discrete Fourier Transform of Convolving Filter

Converting Deblurring into a Well-posed Problem

25
Novel Sensors

• Gradient sensing
• HDR Camera, Log sensing
• Line-scan Camera
• Demodulating
• Motion Capture
• Fluttered Shutter
• 3D

Fantasy Configurations
• 'Cloth-cam' / 'Wallpaper-cam': elements for 4D light emission and 4D capture in
the surface of a cloth…

• Floating Cam: ad-hoc wireless networks form


camera arrays in environment…

• Other ray sets:


Multilinear cameras, canonical ‘basis’ cameras
(linear combination of 8 types)
McMillan’04, ‘05

26
Dream of A New Photography

Old New
• People and Time ~Cheap Precious
• Each photo Precious Free
• Lighting Critical Automated?
• External Sensors No Yes
• ‘Stills / Video’ Disjoint Merged
• Exposure Settings Pre-select Post-Process
• Exposure Time Pre-select Post-Process
• Resolution/noise Pre-select Post-Process
• ‘HDR’ range Pre-select Post-Process

Computational Photography
[Block diagram: Novel Illumination (light sources, modulators) provides 4D incident lighting to the Scene, modeled as an 8D ray modulator; the resulting 4D light field enters a Novel Camera whose generalized optics act as a 4D ray bender and an up-to-4D ray sampler in front of a generalized sensor; processing performs ray reconstruction; the output feeds a Display that can recreate the 4D light field.]

27
Film-like Photography ("photography with bits") versus Computational Photography, which spans the Computational Camera and Smart Light:

• Digital Photography: image processing applied to captured images to produce "better" images. Examples: interpolation, filtering, enhancement, dynamic range compression, color management, morphing, hole filling, artistic image effects, image compression, watermarking.

• Computational Processing: processing of a set of captured images to create "new" images. Examples: mosaicing, matting, super-resolution, multi-exposure HDR, light field from multiple views, structure from motion, shape from X.

• Computational Imaging/Optics: capture of optically coded images and computational decoding to produce "new?" images. Examples: coded aperture, optical tomography, diaphanography, synthetic-aperture microscopy, integral imaging, assorted pixels, catadioptric imaging, holographic imaging.

• Computational Sensor: detectors that combine sensing and processing to create "smart" pixels. Examples: artificial retina, retinex sensors, adaptive dynamic range sensors, edge-detect chips, focus-of-expansion chips, motion sensors.

• Computational Illumination: adapting and controlling illumination to create "revealing" images. Examples: flash/no-flash, lighting domes, multi-flash for depth edges, dual photos, polynomial texture maps, 4D light source.

Acknowledgements
• MERL, Northwestern Graphics Group
• Amit Agrawal
• Shree Nayar
• Marc Levoy
• Jianbo Shi
• Ankit Mohan, Holger Winnemoller

• Image Credits
– Ren Ng, Vaibhav Vaish, William Bennet
– Fredo Durand, Aseem Agrawala
– Morgan McGuire, Paul Debevec
– And more

28
Non-photorealistic Camera:
Depth Edge Detection and Stylized Rendering using Multi-Flash Imaging
Ramesh Raskar∗ Kar-Han Tan Rogerio Feris Jingyi Yu† Matthew Turk
Mitsubishi Electric Research Labs (MERL) UC Santa Barbara MIT UC Santa Barbara

Figure 1: (a) A photo of a car engine (b) Stylized rendering highlighting boundaries between geometric shapes. Notice the four spark plugs
and the dip-stick which are now clearly visible (c) Photo of a flower plant (d) Texture de-emphasized rendering.

Abstract

We present a non-photorealistic rendering approach to capture and convey shape features of real-world scenes. We use a camera with multiple flashes that are strategically positioned to cast shadows along depth discontinuities in the scene. The projective-geometric relationship of the camera-flash setup is then exploited to detect depth discontinuities and distinguish them from intensity edges due to material discontinuities.
We introduce depiction methods that utilize the detected edge features to generate stylized static and animated images. We can highlight the detected features, suppress unnecessary details or combine features from multiple images. The resulting images more clearly convey the 3D structure of the imaged scenes.
We take a very different approach to capturing geometric features of a scene than traditional approaches that require reconstructing a 3D model. This results in a method that is both surprisingly simple and computationally efficient. The entire hardware/software setup can conceivably be packaged into a self-contained device no larger than existing digital cameras.

Keywords: non-photorealistic rendering, image enhancement, depth edges

∗ e-mail: [raskar,tan]@merl.com, [rferis,turk]@cs.ucsb.edu
† e-mail: jingyi@graphics.csail.mit.edu

1 Introduction

Our goal is to create stylized images that facilitate viewer comprehension of the shape contours of the objects depicted. Non-photorealistic rendering (NPR) techniques aim to outline the shapes of objects, highlight the moving parts to illustrate action, and reduce visual clutter such as shadows and texture details [Gooch and Gooch 2001]. The result is useful for imaging low contrast and geometrically complex scenes such as mechanical parts (Figure 1), plants or the internals of a patient (in endoscopy).
When a rich 3D model of the scene is available, rendering subsets of view-dependent contours is a relatively well-understood task in NPR [Saito and Takahashi 1990]. Extending this approach to real scenes by first creating 3D scene models, however, remains difficult. In this paper, we show that it is possible to bypass geometry acquisition, and directly create stylized renderings from images. In the place of expensive, elaborate equipment for geometry acquisition, we propose using a camera with a simple extension: multiple strategically positioned flashes. Instead of having to estimate the full 3D coordinates of points in the scene, and then look for depth discontinuities, our technique reduces the general 3D problem of depth edge recovery to one of intensity step edge detection.
Exploiting the imaging geometry for rendering results in a simple and inexpensive solution for creating stylized images from real scenes. We believe that our camera will be a useful tool for professional artists and photographers, and we expect that it will also enable the average user to easily create stylized imagery.

1.1 Overview

Our approach is based on taking successive photos of a scene, each with a different light source close to and around the camera's center of projection. We use the location of the shadows abutting depth discontinuities as a robust cue to create a depth edge map in both static and dynamic scenes.

Contributions Our main contribution is a set of techniques for detecting and rendering shape contours of scenes with low-contrast or high geometric complexity. Our technical contributions include the following.
• A robust edge classification scheme to distinguish depth edges
from texture edges
• A collection of rendering and reconstruction techniques for
creating images highlighting shape boundaries from 2D data
without creating 3D representations, using qualitative depths
• An image re-synthesis scheme that allows abstraction of tex-
tured regions while preserving geometric features
• A technique to detect depth edges in dynamic scenes
rather than to detect depth edges. Depth discontinuities present dif-
ficulties for traditional stereo: it fails due to half-occlusions, i.e.,
occlusion of scene points in only one of the two views, which con-
fuse the matching process [Geiger et al. 1992]. Few techniques
try to model the discontinuities and occlusions directly [Birchfield
1999; Kang et al. 2001; Scharstein and Szeliski 2002]. Active il-
lumination methods, which generally give better results, have been
used for depth extraction, shape from shading, shape-time stereo
and photometric stereo but are unfortunately unstable around depth
discontinuities [Sato et al. 2001]. An interesting technique has
been presented to perform logical operations on detected inten-
sity edges, captured under widely varying illumination, to preserve
shape boundaries [Shirai and Tsuji 1972] but it is limited to uniform
Figure 2: Traditional image enhancement by improving (Left) albedo scenes. Using photometric stereo, it is possible to analyze
brightness and (Right) contrast. Low contrast depth edges remain the intensity statistics to detect high curvature regions at occluding
difficult to perceive. contours or folds [Huggins et al. 2001]. But the techniques assume
that the surface is locally smooth which fails for a flat foreground
We introduce the concept of a self-contained stylized imaging de- object like a leaf or piece of paper, or view-independent edges such
vice, a ‘non-photorealistic camera’, which can directly generate im- as corner of a cube. They detect regions near occluding contours
ages highlighting contours of geometric shapes in a scene. It con- but not the contours themselves.
tains a traditional camera and embedded flashes, and can be readily
Techniques for shape from shadow (or darkness) build a contin-
and inexpensively built. We attempt to address two important is-
uous representation (shadowgram) from a moving light source from
sues in NPR [Gooch and Gooch 2001] [Strothotte and Schlechtweg
which continuous depth estimates are possible [Raviv et al. 1989;
2002], detecting shape contours that should be enhanced and iden-
Langer et al. 1995; Daum and Dudek 1998]. However, it involves
tifying features that should be suppressed. We propose a new ap-
a difficult problem of estimating continuous heights and requires
proach to take image-stylization beyond the processing of a photo-
accurate detection of start and end of shadows. Good reviews of
graph, to actively changing how the photographs are taken.
shadow-based shape analysis methods are available in [Yang 1996]
The output images or videos can be rendered in many ways,
[Kriegman and Belhumeur 2001] [Savarese et al. 2001].
e.g., technical illustration, line art or cartoon-like style. We high-
A common limitation of existing active illuminations methods is
light depth discontinuities, suppress material and illumination tran-
that the light sources need to surround the object, in order to create
sitions, and create renderings with large, smoothly colored regions
significant shading and shadow variation from (estimated or known
outlined with salient contours [Durand 2002]. We describe several
3D) light positions. This necessitates a fixed lighting rig, which
applications: imaging complex mechanical parts, improving im-
limits the application of these techniques to industrial settings, and
ages for endoscopes, anatomical drawings and highlighting changes
they are impossible to build into a self-contained camera.
in a scene. Our approach shares the disadvantages of NPR: rele-
vant details may be lost as an image is simplified, so tunable ab- We believe our proposed method for extracting depth edges is
straction is needed (Section 3.3), and the usefulness of the output is complementary with many existing methods for computing depth
often difficult to quantify. and 3D surface shape, as depth edges often violate smoothness as-
sumptions inherent in many techniques. If the locations of depth
discontinuities can be reliably detected and supplied as input, we
1.2 Related Work believe that the performance of many 3D surface reconstruction al-
gorithms can be significantly enhanced.
NPR from images, rather than 3D geometric models has recently To find depth edges, we avoid the dependence on solving a corre-
received a great deal of attention. The majority of the available spondence problem or analyzing pixel intensity statistics with mov-
techniques for image stylization involve processing a single image ing lights, and we do not attempt to estimate any continuous value.
as the input applying morphological operations, image segmenta- In our search, we have not seen a photometric or other type of stereo
tion, edge detection and color assignment. Some of them aim for method successfully applied to complex scenes where the normals
stylized depiction [DeCarlo and Santella 2002] [Hertzmann 1998] change rapidly– such as a potted plant, or a scene with high depth
while others enhance legibility. Interactive techniques for stylized complexity or low intensity changes, such as a car engine or bone.
rendering such as rotoscoping have been used as well [Waking Life
2001; Avenue Amy 2002], but we aim to automate tasks where
meticulous manual operation was previously required. Our work 1.3 Outline
belongs to an emerging class of techniques to create an enhanced
image from multiple images, where the images are captured from Our method for creating a stylized image of a static scene consists
the same viewpoint but under different conditions, such as under of the following.
different illumination, focus or exposure [Cohen et al. 2003; Akers  Capture a series of images of the scene under shifted light
et al. 2003; Raskar et al. 2004]. positions
Aerial imagery techniques find shadow evidence by threshold-  Process these images to automatically detect depth edges
ing a single intensity image, assuming flat ground and uniform  Identify the subset of intensity edges that are illumination and
albedo to infer building heights [Huertas and Nevatia 1988; Irvin texture edges
and McKeown 1989; Lin and Nevatia 1998]. Some techniques im-  Compute qualitative depth relationships
prove shadow capture with novel shadow extraction techniques to  Enhance or simplify detected features for rendering
compute new shadow mattes [Chuang et al. 2003] or remove them  Insert processed scene appearance for stylization
to improve scene segmentation [Toyama et al. 1999]. Some other
techniques remove shadows without explicitly detecting them, such We use the term depth edges to refer to the C0 discontinuities
as using intrinsic images [Weiss 2001]. in a depth map. Depth edges correspond to internal or external oc-
Stereo techniques including passive and active illumination are cluding contours (or silhouettes) or boundaries of physical objects.
generally designed to compute depth values or surface orientation The depth edges recovered are signed: in the local neighborhood,
Figure 3: Imaging geometry. Shadows of the gray object are cast along the epipolar ray. We ensure that depth edges of all orientations create shadow in at least one image while the same shadowed points are lit in some other image. (Labels: light sources Pk with epipoles e1, e2, e3 and ek; a point shadowed by light 1 but not by 2 and 3; camera image Ik.)

Figure 4: Detecting depth edges. (a) Photo (b) Ratio image (c) Plot along an epipolar ray, the arrows indicate negative transitions (d) Detected edges

the side with lower depth value, foreground, is considered positive while the opposite side is background and negative. Texture edges are reflectance changes or material discontinuities. Texture edges typically delineate textured regions.
In Section 2, we describe our approach to capturing important features using a multi-flash setup. In Section 3, we discuss methods to use the information to render the images in novel styles. In Section 4, we address the problem of extending the technique to dynamic scenes. We describe our results in Section 5 and conclude with discussion of limitations and future directions.

2 Capturing Edge Features

The image capturing process consists of taking successive pictures of a scene with a point light source close to the camera's center of projection (COP). Due to a small baseline distance between the camera COP and the light source, a narrow sliver of shadow appears abutting each edge in the image with depth discontinuities; its width depends on the distance from the object edge to the background surface. By combining information about abutting cast shadow from two or more images with distinct light source positions, we can find the depth edges.

2.1 Depth Edges

The method for detecting depth edges is the foundation for our approach. The idea is very simple, in retrospect. It allows us to classify other edges by a process of elimination.
Our method is based on two observations regarding epipolar shadow geometry, as shown in Figure 3. The image of the point light source at Pk is at pixel ek in the camera image, and is called the light epipole. The images of the pencil rays originating at Pk are the epipolar rays originating at ek. (When Pk is behind the camera center, away from the image plane, the epipolar rays wrap around at infinity.) First, note that, a shadow of a depth edge pixel is constrained to lie along the epipolar ray passing through that pixel. Second, the shadow is observed if and only if the background pixel is on the side of the depth edge opposite the epipole along the epipolar ray. Hence, in general, if two light epipoles lie on opposite sides of an edge, a cast shadow will be observed at the depth edge in one image but not the other.
We detect shadows in an image by taking a ratio of the image with the maximum composite of all the images. The ratio image accentuates shadows, which abut the depth edges, and de-emphasizes texture edges. During epipolar traversal in the ratio image, the entry point of a shadowed region indicates a depth edge. The basic algorithm is as follows (a code sketch follows below): Given n light sources positioned at P1, P2 ... Pn,
• Capture ambient image I0
• Capture n pictures Ik+, k = 1..n with a light source at Pk
• Compute Ik = Ik+ − I0
• For all pixels x, Imax(x) = maxk(Ik(x)), k = 1..n
• For each image k,
  Create a ratio image, Rk, where Rk(x) = Ik(x)/Imax(x)
• For each image Rk
  Traverse each epipolar ray from epipole ek
  Find pixels y with step edges with negative transition
  Mark the pixel y as a depth edge
With a number of light sources (minimum 2, but typically 4 to 8 are used) placed strategically around the camera, depth edges of all orientation with sufficient depth differences can be detected. In each image, as long as the epipolar ray at a depth edge pixel is not parallel to the image-space orientation of the depth edge, a step edge with negative transition (from lit part to shadowed part) will be detected. If the depth edge is oriented along the epipolar ray, the step edge cannot be detected.
Let us look at the algorithm in detail. Note that, the image Ik has ambient component removed, i.e., Ik = Ik+ − I0, where I0 is an image taken with only ambient light and none of the n light sources on. The base image is the maximum composite image, Imax, which is an approximation of the image with light source at the camera COP, and in general has no shadows from any of the n light sources. The approximation is close if the n light sources are evenly distributed around the camera COP, have the same magnitude and the baseline is sufficiently smaller than the depth of the scene being imaged.
Consider the image of a 3D point X, given in camera coordinate system, imaged at pixel x. The intensity, Ik(x), if X is lit by the light source at Pk, under lambertian assumption, is given by

Ik(x) = µk ρ(x) (L̂k(x) · N(x))

Otherwise, Ik(x) is zero. The scalar µk is the magnitude of the light intensity and ρ(x) is the reflectance at X. L̂k(x) is the normalized light vector Lk(x) = Pk − X, and N(x) is the surface normal, all in the camera coordinate system.
Thus, when X is seen by Pk, the ratio is as follows.

Rk(x) = Ik(x) / Imax(x) = µk (L̂k(x) · N(x)) / maxi (µi (L̂i(x) · N(x)))

It is clear that, for diffuse objects with nonzero albedo ρ(x), Rk(x) is independent of the albedo ρ(x) and only a function of the local geometry. Further, if the light source-camera baseline |Pk| is small compared to the distance to the point, i.e., |X| ≫ |Pk|, then this ratio is approximately µk / maxi(µi), which is a constant for a set of omni-directional light sources in the imaging setup.
The ratio values in (Rk = Ik/Imax) are close to 1.0 in areas lit by light source k and close to zero in shadowed regions. (In general, the values are not zero due to interreflections). The intensity profile along the epipolar ray in the ratio image shows a sharp negative transition at the depth edge as we traverse from non-shadowed foreground to shadowed background, and a sharp positive transition as we traverse from shadowed to non-shadowed region on the
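A compact sketch of the basic algorithm above for the special case of a left/right flash pair, where the epipolar rays coincide with image scanlines; the step threshold is an illustrative assumption and no sub-pixel edge localization is attempted.

import numpy as np

def depth_edges_left_right(ambient, left, right, step=0.3):
    I = [np.maximum(im.astype(float) - ambient, 0.0) for im in (left, right)]  # Ik = Ik+ - I0
    Imax = np.maximum(I[0], I[1])
    edges = np.zeros(Imax.shape, dtype=bool)
    for k, Ik in enumerate(I):
        Rk = Ik / np.maximum(Imax, 1e-6)      # ratio image: ~1 when lit, ~0 in shadow
        if k == 1:                            # right flash: traverse from the right epipole
            Rk = Rk[:, ::-1]
        neg = np.zeros(Rk.shape, dtype=bool)
        neg[:, :-1] = np.diff(Rk, axis=1) < -step   # entering a shadowed region
        if k == 1:
            neg = neg[:, ::-1]
        edges |= neg
    return edges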
Figure 5: A stylized imaging camera to capture images under four different flash conditions and our prototype.

Figure 6: (a) Relationship between baseline and width of shadow (b) Condition where shadow detaches. (Labels: camera and flash separated by baseline B, focal length f; depths z1, z2; shadow width d; a narrow object of width T producing a detached shadow; shadow sliver.)
depth edge cannot be detected in the ratio image Rk or a false pos-
background (Figure 4). This reduces the depth edge detection prob- itive when other conditions create spurious transitions in Rk . The
lem to an intensity step edge detection problem. A 1D edge detec- depth edges can be missed due to detached shadows, lack of back-
tor along the epipolar ray detects both positive and negative tran- ground, low albedo of background, holes and valleys, or if depth
sitions, and we mark the negative transitions as depth edges. As edges lie in shadowed region. The low albedo of background makes
mentioned earlier, since we are detecting a transition and not a con- it difficult to detect increase in radiance due to a flash, but this prob-
tinuous value, noise and interreflections only affect the accuracy of lem can be reduced with a higher intensity flash. The problems due
the position but not the detection of presence of the depth edge. to holes/valleys or shadowed depth edges, where the visible back-
In summary, there are essentially three steps: (a) create a ratio ground is shadowed for a majority of the flashes, are rare and fur-
image where the values in shadowed regions are close to zero; (b) ther reduced when the flash baseline is small. Below, we only dis-
carry out intensity edge detection on each ratio image along epipo- cuss the problem due to detached shadows and lack of background.
lar rays marking negative step edges as depth edges (c) combine the Some pixels may be mislabeled as depth edge pixels due to specu-
edge maps from all n images to obtain the final depth edge map. larities or near silhouettes of curved surfaces. We discuss both these
issues. We have studied these problems in detail and the solutions
will be provided in a technical report. Here we describe the main
Self-contained Prototype An ideal setup should satisfy the ideas.
constraint that each depth pixel be imaged in both conditions, the
negative side of the edge is shadowed at least in one image and not
shadowed in at least one other image. We propose using the follow- Curved surfaces The silhouettes on curved surfaces vary
ing configuration of light sources: four flashes at left, right, top and smoothly with change in viewpoint and the ratio Rk (x) is very low
bottom positions (Figure 5). near depth edges when the 3D contours corresponding to silhou-
This setup makes the epipolar ray traversal efficient. If the light ettes with respect to neighboring flash positions
 are sufficiently
 dif-
source is in the plane parallel to the image plane that contains the ferent. This is because the dot product L̂k (x) · N (x) ≈ 0 and the
center of projection, the light epipole is at infinity and the corre- dot product for  light
 sources on  the ‘opposite’ side will be larger
sponding epipolar rays are parallel in the image plane. In addition, L̂i (x) · N (x) > L̂k (x) · N (x) . Thus Rk (x) decreases rapidly even
we place the epipoles such that the epipolar rays are aligned with though the pixel is not in a shadowed region. However, as seen in
the camera pixel grid. For the left-right pair, the ray traversal is examples shown here, this is not a major issue and simply results in
along horizontal scan lines and for the top-bottom pair, the traver- a lower slope at the negative transition in Rk . Unlike the problems
sal is along vertical direction. below, it does not lead to a reversal of intensity gradient along the
epipolar ray.

2.2 Material Edges Tradeoff in choosing the baseline A larger baseline distance
In addition to depth edges, we also need to consider illumination between the camera and the flash is better to cast a wider detectable
and material edges in the image. Illumination edges are boundaries shadow in the image, but a smaller baseline is needed to avoid sep-
between lit and shadowed regions due to ambient light source(s), aration of shadow from the associated depth edge.
rather than the flashes attached to our camera. Since the individual The width of the abutting shadow in the image is d =
images Ik , are free of ambient illumination, they are free of ambient f B (z2 − z1 ) / (z1 z2 ), where f is the focal length, B is baseline in
illumination edges. In general, since material edges are indepen-
dent of illumination direction, they can be easily classified by a
process of elimination. Material edges are intensity edges of Imax
minus the depth edges.
This edge classification scheme works well and involves a mini-
mal number of parameters for tuning. The only parameters we need
are those for intensity edge detection of ratio images and Imax im-
age, to detect depth and material edges, respectively.

2.3 Issues
The technique we presented to detect depth edges is surprisingly Figure 7: (Left) Minimum composite of image with flash FS and
robust and reliable. We discuss the few conditions in which the ba- FL . (Right) Plot of intensity along a scanline due to FS , FL and
sic algorithm fails: a false negative when a negative transition at a min(IS , IL ).
Figure 8: Specularities and lack of background. First column: Imax and corresponding result showing artifacts. Second column: For the
yellow line marked on dumbbell (x=101:135); Top plot, Ile f t (red) with Imax (light blue). Bottom plot, ratio Rle f t . Note the spurious negative
transition in Rle f t , at the arrow, which gets falsely identified as a depth edge. Third column: Top plot, gradient of Ile f t (red), Iright (green),
Itop (blue) and Median of these gradients (black). Bottom plot, reconstructed intrinsic image (black) compared with Imax (light blue). Fourth
column: Top, intrinsic image. Bottom, resulting depth edge map. Fifth column: Top, Scene without a background to cast shadow. Bottom,
Edges of I0 /Imax , in white plus detected depth edges in red.

mm, and z1 , z2 are depths, in mm, to the shadowing and shadowed ages as seen in Figure 8. Although methods exist to detect specu-
edge. (See Figure 6) larities in a single image [Tan et al. 2003], detecting them reliably
Shadow detachment occurs
 when the width, T , of the object is in textured regions is difficult.
smaller than (z2 − z1 ) B / z2 . So a smaller baseline, B, will allow Our method is based on the observation that specular spots shift
narrower objects (smaller T) without shadow separation. Fortu- according to the shifting of light sources that created them. We need
nately, with rapid miniaturization and sophistication of digital cam- to consider three cases of how specular spots in different light posi-
eras, we can choose small baseline while increasing the pixel reso- tions appear in each image: (i) shiny spots remain distinct (e.g., on
lution (proportional to f ), so that the product fB remains constant, highly specular surface with a medium curvature) (ii) some spots
allowing depth detection of narrow objects. overlap and (iii) spots overlap completely (e.g., on a somewhat
When camera resolutions are limited, we can exploit a hierar- specular, fronto-parallel planar surface). Case (iii) does not cause
chical baseline method to overcome this tradeoff. We can detect spurious gradients in ratio images.
small depth discontinuities (with larger baselines) without creat- We note that although specularities overlap in the input images,
ing shadow separation at narrow objects (using narrow baselines). the boundaries (intensity edges) around specularities in general do
In practice, we found two different baselines were sufficient. We, not overlap. The main idea is to exploit the gradient variation in the
however, now have to deal with spurious edges due to shadow sep- n images at a given pixel (x,y). If (x,y) is in specular region, in cases
aration in the image with larger baseline flash FL . The image with (i) and (ii), the gradient due to specularity boundary will be high in
smaller baseline flash, FS , may miss small depth discontinuities. only one or a minority of the n images under different lighting. The
How can we combine the information in those two images? There median of the n gradients at that pixel will remove this outlier(s).
are essentially four cases we need to consider at depth edges (Fig- Our method is motivated by the intrinsic image approach by [Weiss
ure 7) (a) FS creates a undetectable narrow shadow, FL creates a de- 2001], where the author removes shadows in outdoor scenes by not-
tectable shadow (b) FS creates a detectable small width shadow and ing that shadow boundaries are not static. We reconstruct the image
FL creates a larger width shadow. (c) FS creates detectable shadow by using median of gradients of input images as follows.
but FL creates a detached shadow that overlaps with FS shadow and • Compute intensity gradient, Gk (x, y) = ∇Ik (x, y)
(iv) same as (d) but the shadows of FS and FL do not overlap.
Our strategy is based on simply taking the minimum composite • Find median of gradients, G(x,y) = mediank (Gk (x, y))
of the two images. In the first three cases, this conveniently in- • Reconstruct image I  which minimizes |∇I  − G|
creases the effective width of the abutting shadow without creating Image reconstruction from gradients fields, an approximate invert-
any artifacts, and hence can be treated using the basic algorithm ibility problem, is still a very active research area. In R2 , a modified
without modifications. For the fourth case, a non-shadow region gradient vector field G may not be integrable. We use one of the di-
separates the two shadows in the min composite, so that the shadow rect methods recently proposed [Elder 1999] [Fattal et al. 2002].
in FL appears spurious. The least square estimate of the original intensity function, I  , so
Our solution is as follows. We compute the depth edges using FS that G ≈ ∇I’, can be obtained by solving the Poisson differential
and FL (Figure 7). We then traverse the epipolar ray. If the depth equation ∇2 I’ = div G, involving a Laplace and a divergence opera-
edge appears in FS (at D1 ) but not in FL we traverse the epipolar ray tor. We use the standard full multigrid method [Press et al. 1992] to
in FL until the next detected depth edge. If this depth edge in FL , solve the Laplace equation. We pad the images to square images of
there is no corresponding depth edge in FS , we mark this edge as a size the nearest power of two before applying the integration, and
spurious edge. then crop the result image back to the original size [Raskar et al.
The solution using min-composite, however, will fail to detect 2004]. We use a similar gradient domain technique to simplify sev-
minute depth discontinuities where even FL does not create a de- eral rendering tasks as described later.
tectable shadow. It will also fail for very thin objects where even FS The resultant intrinsic image intensity, I  (x, y) is used as the de-
creates a detached shadow. nominator for computing the ratio image, instead of the  max com-
posite, Imax (x, y). In specular regions, the ratio Ik (x, y) I  (x, y) now
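A minimal sketch of the median-of-gradients step described above for suppressing shifting specularities: take per-pixel medians of the gradients of the n flash images; the intrinsic image I' is then reconstructed from this field with the Poisson solve discussed in the text.

import numpy as np

def median_gradient_field(images):
    """images: list of n flash images (float arrays of the same shape)."""
    gx = np.stack([np.diff(im, axis=1, append=im[:, -1:]) for im in images])
    gy = np.stack([np.diff(im, axis=0, append=im[-1:, :]) for im in images])
    # The per-pixel median rejects the minority of images whose gradient is
    # dominated by a specularity boundary at that pixel.
    return np.median(gx, axis=0), np.median(gy, axis=0)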
Specularities Specular highlights that appear at a pixel in one is larger than 1.0. This is clamped to 1.0 so that the negative transi-
image but not others can create spurious transitions in the ratio im- tions in the ratio image do not lie in specular parts.
Figure 9: (a) An edge rendering with over-under style. (b) Rendering edges with width influenced by orientation. (c) and (d) Normal
Interpolation for toon rendering exploiting over-under mattes.

Lack of Background Thus far we assumed that depth edges edge orientation. Since the edge orientation in 3D is approximately
casting shadows on a background are within a finite distance. What the same as the orientation of its projection in image plane, the
if the background is significantly far away or not present? This turns thickness is simply proportional to the dot product of the image
out to be a simple situation to solve because in these cases only the space normal with a desired light direction (Figure 9(b)).
outermost depth edge, the edge shared by foreground and distant Color variation We can indicate color of original object by ren-
background, is missed in our method. This can be easily detected dering the edges in color. From signed edges, we pick up a fore-
with a foreground-background estimation technique. In Imax image ground color along the normal at a fixed pixel distance, without
the foreground pixels are lit by at least one of the flashes but in the crossing another depth or intensity edge. The foreground colored
ambient image, I0 , neither the foreground nor the background is lit edges can also be superimposed onto a segmented source image as
by any flash. Hence, the ratio of I0 /Imax , is near 1 in background seen in Figure 10(c).
and close to zero in interior of the foreground. Figure 8 shows
intensity edges of this ratio image combined with internal depth
edges. 3.2 Color Assignment
Since there is no 3D model of the scene, rendering non-edge pixels
requires different ways of processing captured 2D images.
3 Image Synthesis Normal interpolation For smooth objects, the depth edge cor-
responds to the occluding contour where the surface normal is per-
Contour-based comprehensible depiction is well explored for 3D
pendicular to the viewing direction. Hence the normals at depth
input models [DeCarlo et al. 2003] but not for photographs. In the
edges lie in the plane of the image and we can predict normals at
absence of a full 3D representation of the scene, we exploit the
other pixels. We solve this sparse interpolation problem by solving
following 2D cues to develop novel rendering algorithms.
a 2D Poisson differential equation. Our method is inspired by the
(a) The sign of the depth edge, Lumo [Johnston 2002] where the over-under mattes are manually
(b) Relative depth difference based on shadow width, created. In our case, signed depth edges allow normal interpolation
(c) Color near the signed edges, and while maintaining normal discontinuity at depth edges.
(d) Normal of a smooth surface at the occluding contour Image attenuation We accentuate the contrast at shape bound-
aries using an image attenuation maps (Figure 10(a)) as follows.
We aim to automate tasks for stylized rendering where meticulous Depth edges are in white on a black background. We convolve with
manual operation was originally required, such as image editing or a filter that is the gradient of an edge enhancement filter. Our filter
rotoscoping [Waking Life 2001] . is a Guassian minus an impulse function. When we perform a 2D
integration on the convolved image, we get a sharp transition at the
3.1 Rendering Edges depth edge.
Depicting Change Some static illustrations demonstrate action
We create a vectorized polyline representation of the depth edges e.g., changing oil in a car, by making moving parts in the fore-
by linking the depth edge pixels into a contour. The polyline is ground brighter. Foreground detection via intensity-based schemes,
smoothed and allows us to stylize the width and color of the con- however, is difficult when the colors are similar and texture is lack-
tour maintaining spatial coherency. While traversing the marked ing, e.g., detecting hand gesture in front of other skin colored parts
depth edge pixels to create a contour, at T-junctions, unlike tradi- (Figure 11). We take two separate sets of multi-flash shots, with-
tional methods that choose the next edge pixel based on orientation out and with the hand in front of the face to capture the reference
similarity, we use the information from the shadows to resolve the and changed scene. We note that any change in a scene is bounded
connected component. Two edge pixel are connected only if they by new depth edges introduced. Without explicitly detecting fore-
are connected in the intensity edges of all the n ratio images. ground, we highlight interiors of regions that contribute to new
Signed edges At the negative transition along the epipolar ray depth edges.
in the ratio image, Rk, the side of edge with higher intensity is the We create a gradient field where pixels marked as depth edges
foreground and lower intensity (corresponding to shadowed region) in changed scene but not in reference, are assigned a unit magni-
is background. This qualitative depth relationship can be used to tude gradient. The orientation matches the image space normal to
clearly indicate foreground-background separation at each edge. the depth edge. The gradient at other pixels is zero. The recon-
We emulate the over-under style used by artists in mattes. The structed image from 2D integration is a pseudo-depth map – least
foreground side is white while the background side is black. Both squared error solution via solving Poisson equation. We threshold
are rendered by displacing depth contour along the normal (Figure this map at 1.0 to get the foreground mask which is brightened.
9(a)). Note, the shadow width along the epipolar ray is proportional to
Light direction We use a commonly known method to convey the ratio of depth values on two sides of the edge. Hence instead
light direction by modifying the width of edges depending on the of a unit magnitude gradient, we could assign a value proportional
Figure 10: Color assignment. (a) Attenuation Map (b) Attenuated Image (c) Colored edges on de-emphasized texture

a mask image, γ , to attenuate the gradients away from depth edges.


The mask image is computed as follows.
γ (x, y) = a if (x, y) is a texture edge pixel
= a · d(x, y) if (x, y) is a featureless pixel
= 1.0 if (x, y) is a depth edge pixel
The factor d(x, y) is the ratio of the distance field of texture pixels
by the distance field of depth edge pixels. The distance field value
at a pixel is the Euclidean distance to the nearest (texture or depth)
edge pixel. As shown in Figure 12, the parameter a controls the
degree of abstraction, and textures are suppressed for a = 0. The
procedure is as follows.
• Create a mask image γ (x,y)
• Compute intensity gradient ∇I(x, y)
• Modify masked gradients G(x,y) = ∇I(x, y)γ (x,y)
• Reconstruct image I’ to minimize |∇I  − G|
• Normalize I'(x, y) colors to closely match I(x, y)

The image reconstruction follows the solution of a Poisson equation via a multi-grid approach as in the specularity attenuation technique in Section 2.

Figure 11: Change Detection. (Left column) Reference image, changed image, and pseudo depth map of new depth edges (Right) Modified depth edge confidence map.
a higher quality pseudo-depth map. Unfortunately, we found that
the positive transition along the ray is not strong due to the use of a
non-point light source and interreflections. In principle, estimated
shadow widths could be used for say, tunable abstraction to elimi-
nate edges with small depth difference.

Figure 12: Tunable abstraction for texture de-emphasis. Depth edge


3.3 Abstraction followed by abstraction with a = 1, a = 0.5 and a = 0.
One way to reduce visual clutter in an image and emphasize object
shape is to simplify details not associated with the shape bound-
aries (depth edges) of the scene, such as textures and illumination
variations [Gooch and Gooch 2001]. Our goal is to create large 4 Dynamic Scenes
flat colored regions separated by strokes denoting important shape
boundaries. Traditional NPR approaches based on image segmen- Our method for capturing geometric features thus far requires tak-
tation achieve this by assigning a fixed color to each segment [De- ing multiple pictures of the same static scene. We examine the
Carlo and Santella 2002]. However, image segmentation may miss lack of simultaneity of capture for scenes with moving objects
a depth edge leading to merger of foreground and background near or a moving camera. Again, a large body of work exists for esti-
this edge into a single colored object. Although image segmenta- mating motion in image sequences, and a sensible approach is to
tion can be guided by the computed depth edges, the segmentation use the results from the static algorithm and apply motion compen-
scheme places hard constraint on closed contours and does not sup- sation techniques to correct the artifacts introduced. Finding op-
port smalls gaps in contours. We propose a method that is concep- tical flow and motion boundaries, however, is a challenging prob-
tually simple and easy to implement. lem especially in textureless regions [Papademetris and Belhumeur
Our method reconstructs image from gradients without those at 1996; Birchfield 1999]. Fortunately, by exploiting properties of our
texture pixels. No decision need to be made about what intensity unique imaging setup, in most cases, movement of depth edges in
values to use to fill in holes, and no feathering and blurring need be dynamic scenes can still be detected by observing the correspond-
done, as is required with conventional pixel-based systems. We use ing movement in shadowed regions. As in the static case, we bypass
the hard problem of finding the rich per-pixel motion representation min(Im−1 , Im+1 ) or max(Im−1 , Im+1 ) will both have a flat re-
and focus directly on finding the discontinuities i.e., depth edges in gion around the depth edge in frame m. Similarly, images
motion. The setup is similar to the static case with n flashes around min(Im−1 , Im , Im+1 ) and max(Im−1 , Im , Im+1 ) both are bound to
the camera, but triggered in a rapid cyclic sequence, one flash per have a flat region around texture edge in frame m. Since the cast
frame. We find depth edges in a given frame and connect edges shadow region at the depth edge in frame m is darker than the fore-
found in adjacent frames into a complete depth edge map. ground and background objects in the scene, the shadow is pre-
served in min(Im−1 , Im , Im+1 ) but not in max(Im−1 , Im , Im+1 ). This
leads to the following algorithm:

• Compute shadow preserving It = min(Im−1 , Im , Im+1 )


• Compute shadow free Id = min(Im−1 , Im+1 )
• Compute ratio image, Rm , where Rm = It /Id
• Traverse along epipolar ray from em and mark negative tran-
sition

This ratio image is free of unwanted transitions and the same epipo-
lar ray traversal method can be applied to localize the depth edges.
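A minimal sketch of the shadow-preserving and shadow-free composites for frame m; the returned ratio image Rm is then traversed along epipolar rays exactly as in the static case.

import numpy as np

def dynamic_ratio_image(I_prev, I_curr, I_next, eps=1e-6):
    It = np.minimum(np.minimum(I_prev, I_curr), I_next)   # keeps the cast shadow
    Id = np.minimum(I_prev, I_next)                       # flat around the depth edge
    return It / np.maximum(Id, eps)                       # Rm = It / Id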
Figure 13 shows the algorithm in action. We tested the algorithm
with synthetic sequences to investigate the set of conditions under
which the algorithm is able to correctly localize the depth edges
and also experimented with this algorithm in real dynamic scenes.
An example frame from a dynamic sequence is shown in Figure 14.
A full stylized example with human subjects can be seen in the ac-
companying video. While we are very encouraged by the simplicity
of the algorithm as well as the results we were able to achieve with
it, the simplifying assumptions made about the monotonicity and
magnitude of motion are still fairly restrictive. For thin objects or
objects with high frequency texture, large motions between succes-
Figure 13: Depth edge detection for dynamic scenes. (Top) Three sive frames creates spurious edges. We plan to continue our in-
frames from multi-flash sequence of a toy example showing a red vestigation in this area and designing algorithms that require fewer
square with a green triangle texture moving from left to right. We assumptions and work under a wider range of conditions.
are interested in detecting the depth edge in frame m. A single
scan line shown in blue is used for the plots. (Middle) The three
scan lines plots. The position of the correct depth edge position
is indicated with a vertical blue line. (Bottom) Plot of minimum
composite and ratio images computed using the static and dynamic
algorithms. The motion induced unwanted edges in the static ratio
image but not in the dynamic ratio image. The correct depth edge
can then be detected from the ratio image using the same traversal
procedure as before.

Figure 14: (Left) A frame from a video sequence, shadows due to


left flash. (Right) Detected depth edges merged from neighboring
4.1 Depth Edges in Motion
frames.
To simplify the discussion, consider using just the left and right
flashes to find vertical depth edges. Images from three frames,
Im−1 , Im and Im+1 , from a toy example are shown in Figure 13. In
the sequence, a red square with a green triangle texture is shown
4.2 Edges and Colors
moving from left to right, and the three frames are captured under The depth edges in a given frame, m, are incomplete since they
left, right, and left flashes, as can be easily inferred from the cast span only limited orientations. In a dynamic scene a union of depth
shadows. edges from all n successive frames may not line up creating discon-
In presence of scene motion, it is difficult to reliably find shadow tinuous contours. We match signed depth edges corresponding to
regions since the base image to compare with, e.g., the max com- the same flash i.e., m and m + n and interpolate the displacement
posite, Imax , exhibits misaligned features. A high speed camera can for intermediate frames. To assign colors, we take the maximum of
reduce the amount of motion between frames but the lack of simul- three successive frames. Our video results can also be considered as
taneity cannot be assumed. tools for digital artists who traditionally use rotoscoping for finding
We make two simplifying assumptions (a) motion in image space shape boundaries in each frame.
is monotonic during the image capture from the start of frame m-1
to the end of frame m+1 and (b) the motion is also small enough
that the depth and texture edges in the frames do not cross, i.e., the 5 Implementation
motion is restricted to the spacing between adjacent edges on the
scan line. Our basic prototype makes use of a 4 MegaPixel Canon Power-
Due to the left-right switch in illumination, a shadow near shot G3 digital camera. The dynamic response in the images is lin-
a depth edge disappears in alternate frame images, Im−1 and earized. The four booster (slaved Quantarray MS-1) 4ms duration
Im+1 , while a moving texture edge appears in all three frames. flashes are triggered by optically coupled LEDs turned on sequen-
Monotonicity of motion without crossing over edges means tially by a PIC microcontroller, which in turn is interrupted by the
hot-shoe of the camera. Our video camera is a PointGrey Dragon-
Fly camera at 1024x768 pixel resolution, 15 fps which drives the
attached 5W LumiLeds LED flashes in sequence. We used a Lu-
mina Wolf endoscope with 480x480 resolution camera.
It takes 2 seconds to capture each image. Our basic algorithm
to detect depth edges executes in 5 seconds in C++ on a Pentium4
3GHz PC. The rendering step for 2D Poisson takes about 3 minutes.

6 Results

We show a variety of examples of real scenes, from millimeter scale objects to room sized environments.

Figure 15: Room sized scene: Right flash image and depth edge map.

Objects and room sized scenes   We examine imaging a mechanical (car engine, Figure 1(b)), organic (plant, Figure 1(d)) and anatomical (bone, Figure 9) object. For organic objects, such as the flower plant, the geometric shape is complex with specular highlights, probably challenging for many shape-from-x algorithms. Note the individual stems and leaves that are clear in the new synthesis. The white bone, with complex geometry, is enhanced with different shape contour styles. In all these scenes, intensity edge detection and color segmentation produce poor results because the objects are almost uniformly colored. The method can be easily used with room-sized scenes (Figure 15).

Figure 16: (Left) Enhanced endoscope, with only left lights turned on; input image and depth edge superimposed image. (Right) Skeleton and depth edge superimposed image.

Milli-scale scene   Medical visualization can also benefit from multi-flash imaging. We manipulated the two light sources available near the tip of an endoscopic camera. The baseline is 1mm for a 5mm wide endoscope (Figure 16, left). From our discussions with medical doctors and researchers who work with such images, extension to video appears to be a promising aid in examination [Tan et al. 2004]. A similar technique can also be used in boroscopes that are used to check for gaps and cracks inside inaccessible mechanical parts such as engines or pipes.

Comparison with other strategies   We compared our edge rendering technique for comprehension with intensity edge detection using the Canny operator, and with segmentation. We also compared with active illumination stereo 3D scanning methods, using a state of the art 3Q scanner. Edges captured via intensity edge detection are sometimes superimposed on scenes to improve comprehension. While this works in high contrast imagery, sharp changes in image values do not necessarily imply object boundaries, and vice versa [Forsyth and Ponce 2002]. The Canny edge detection or segmentation based NPR approaches unfortunately also fail in low-contrast areas, e.g., in the plant, bone or engine (Figure 17, left) example. The 3D scanner output is extremely high quality in the interior of objects as well as near the depth edges. But due to partial occlusions, the depth edges are noisy (Figure 17).

Figure 17: (Left) Intensity edge detection (Canny) for the engine of Figure 1(a). (Right Top) Depth map from the 3Q scanner; notice the jagged depth edges on the neck. (Right Bottom) Depth edge confidence map using our technique.

7 Discussion

Feature capture   For comprehensible imagery, other shape cues such as high curvature regions (ridges, valleys and creases) and self-shadowing boundaries from external point light sources are also useful, and are not captured in our system. Our method is highly dependent on being able to detect the scene radiance contributed by the flash, so bright outdoor or distant scenes are a problem. Given the dependence on shadows of opaque objects, our method cannot handle transparent, translucent, luminous, and mirror-like objects.

Many hardware improvements are possible. Note that the depth edge extraction scheme could be used for spectra other than visible light that create ‘shadows’, e.g., infrared, sonar, X-ray and radar imaging. Specifically, we envision the video-rate camera being fitted with infrared light sources invisible to humans, so the resulting flashes are not distracting. In fact, one can use a frequency division multiplexing scheme to create single-shot multi-flash photography: the flashes simultaneously emit four different colors (wavelengths), and a Bayer-mosaic-like pattern of filters on the camera imager decodes the four separate wavelengths.

Applications of depth edges   Detecting depth discontinuities is fundamental to image understanding and can be used in many applications [Birchfield 1999]. Although current methods rely primarily on outermost silhouettes of objects, we believe a complete depth edge map can benefit problems in visual hulls, segmentation, layer resolving and aspect graphs. Aerial imaging techniques [Lin and Nevatia 1998] can improve building detection by looking at multiple time-lapsed images of cast shadows from known sun directions before and after local noon. In addition, effects such as depth of field during post-processing, synthetic aperture using a camera array, and screen matting for virtual sets (with arbitrary backgrounds) require high quality signed depth edges.
Edge-based or area-based stereo correspondence can be improved by matching signed depth edges, constraining dynamic programming to segments within depth edges, and modifying correlation filters to deal with partial occlusions [Scharstein and Szeliski 2002]. Edge classification can provide a confidence map to assist color and texture segmentation in low-contrast images. Shape contours can also improve object or gesture recognition [Feris et al. 2004].

8 Conclusion

We have presented a simple yet effective method to convey shape boundaries by rendering new images and videos of real world scenes. We exploit the epipolar relationship between light sources and cast shadows to extract geometric features from multiple images of a scene. By making use of image space discontinuity rather than relying on 3D scene reconstruction, our method can robustly capture the underlying primitives for rendering in different styles.

We have presented basic prototypes, related feature capturing and rendering algorithms, and demonstrated applications in technical illustration and video processing. Finally, since a depth edge is such a basic primitive, we have suggested ways in which this information can be used in applications beyond NPR.

Minor modification to camera hardware enables this method to be implemented in a self-contained device no larger than existing digital cameras. We have proposed one possible approach to leveraging the increasing sophistication of digital cameras to easily produce useful and interesting stylized images.

Acknowledgements   We thank the anonymous reviewers for useful comments and guidance. We thank Adrian Ilie, Hongcheng Wang, Rebecca Xiong, Paul Beardsley, Darren Leigh, Paul Dietz, Bill Yerazunis and Joe Marks for stimulating discussions, James Kobler (MEEI), Takashi Kan and Keiichi Shiotani for providing motivating applications, Narendra Ahuja and the Beckman Institute Computer Vision and Robotics Lab for suggestions and support, and many members of MERL for help in reviewing the paper.

References

Akers, D., Losasso, F., Klingner, J., Agrawala, M., Rick, J., and Hanrahan, P. 2003. Conveying Shape and Features with Image-Based Relighting. In IEEE Visualization.

Avenue Amy, 2002. Curious Pictures.

Birchfield, S. 1999. Depth and Motion Discontinuities. PhD thesis, Stanford University.

Chuang, Y.-Y., Goldman, D. B., Curless, B., Salesin, D. H., and Szeliski, R. 2003. Shadow matting and compositing. ACM Trans. Graph. 22, 3, 494–500.

Cohen, M. F., Colburn, A., and Drucker, S. 2003. Image stacks. Tech. Rep. MSR-TR-2003-40, Microsoft Research.

Daum, M., and Dudek, G. 1998. On 3-D Surface Reconstruction using Shape from Shadows. In CVPR, 461–468.

DeCarlo, D., and Santella, A. 2002. Stylization and Abstraction of Photographs. In Proc. SIGGRAPH 02, ACM Press.

DeCarlo, D., Finkelstein, A., Rusinkiewicz, S., and Santella, A. 2003. Suggestive contours for conveying shape. ACM Trans. Graph. 22, 3, 848–855.

Durand, F. 2002. An Invitation to Discuss Computer Depiction. In Proceedings of NPAR 2002.

Elder, J. 1999. Are Edges Incomplete? International Journal of Computer Vision 34, 2/3, 97–122.

Fattal, R., Lischinski, D., and Werman, M. 2002. Gradient Domain High Dynamic Range Compression. In Proceedings of SIGGRAPH 2002, ACM SIGGRAPH, 249–256.

Feris, R., Turk, M., Raskar, R., Tan, K., and Ohashi, G. 2004. Exploiting Depth Discontinuities for Vision-based Fingerspelling Recognition. In IEEE Workshop on Real-time Vision for Human-Computer Interaction (in conjunction with CVPR'04).

Forsyth, and Ponce. 2002. Computer Vision, A Modern Approach.

Geiger, D., Ladendorf, B., and Yuille, A. L. 1992. Occlusions and Binocular Stereo. In European Conference on Computer Vision, 425–433.

Gooch, B., and Gooch, A. 2001. Non-Photorealistic Rendering. A K Peters, Ltd., Natick.

Hertzmann, A. 1998. Painterly Rendering with Curved Brush Strokes of Multiple Sizes. In ACM SIGGRAPH, 453–460.

Huertas, A., and Nevatia, R. 1988. Detecting buildings in aerial images. Computer Vision, Graphics and Image Processing 41, 2, 131–152.

Huggins, P., Chen, H., Belhumeur, P., and Zucker, S. 2001. Finding Folds: On the Appearance and Identification of Occlusion. In IEEE CVPR, vol. 2, 718–725.

Irvin, R., and McKeown, D. 1989. Methods for exploiting the relationship between buildings and their shadows in aerial imagery. IEEE Transactions on Systems, Man and Cybernetics 19, 6, 1564–1575.

Johnston, S. F. 2002. Lumo: Illumination for cel animation. In Proceedings of NPAR, ACM Press, 45–52.

Kang, S. B., Szeliski, R., and Chai, J. 2001. Handling occlusions in dense multi-view stereo. In IEEE CVPR, vol. 1, 102–110.

Kriegman, D., and Belhumeur, P. 2001. What Shadows Reveal About Object Structure. Journal of the Optical Society of America, 1804–1813.

Langer, M., Dudek, G., and Zucker, S. 1995. Space Occupancy using Multiple Shadow Images. International Conference on Intelligent Robots and Systems, 390–396.

Lin, C., and Nevatia, R. 1998. Building detection and description from a single intensity image. Computer Vision and Image Understanding: CVIU 72, 2, 101–121.

Papademetris, X., and Belhumeur, P. N. 1996. Estimation of motion boundary location and optical flow using dynamic programming. In Proc. Int. Conf. on Image Processing.

Press, W. H., Teukolsky, S., Vetterling, W. T., and Flannery, B. P. 1992. Numerical Recipes in C: The Art of Scientific Computing. Pearson Education.

Raskar, R., Ilie, A., and Yu, J. 2004. Image Fusion for Context Enhancement and Video Surrealism. In Proceedings of NPAR.

Raviv, D., Pao, Y., and Loparo, K. A. 1989. Reconstruction of Three-dimensional Surfaces from Two-dimensional Binary Images. In IEEE Transactions on Robotics and Automation, vol. 5(5), 701–710.

Saito, T., and Takahashi, T. 1990. Comprehensible Rendering of 3-D Shapes. In ACM SIGGRAPH, 197–206.

Sato, I., Sato, Y., and Ikeuchi, K. 2001. Stability issues in recovering illumination distribution from brightness in shadows. IEEE Conf. on CVPR, 400–407.

Savarese, S., Rushmeier, H., Bernardini, F., and Perona, P. 2001. Shadow Carving. In ICCV.

Scharstein, D., and Szeliski, R. 2002. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. In International Journal of Computer Vision, vol. 47(1), 7–42.

Shirai, Y., and Tsuji, S. 1972. Extraction of the Line Drawing of 3-Dimensional Objects by Sequential Illumination from Several Directions. Pattern Recognition 4, 4, 345–351.

Strothotte, T., and Schlechtweg, S. 2002. NonPhotorealistic Computer Graphics: Modeling, Rendering and Animation. Morgan Kaufmann, San Francisco.

Tan, P., Lin, S., Quan, L., and Shum, H.-Y. 2003. Highlight Removal by Illumination-Constrained Inpainting. In Ninth IEEE International Conference on Computer Vision.

Tan, K., Kobler, J., Dietz, P., Feris, R., and Raskar, R. 2004. Shape-Enhanced Surgical Visualizations and Medical Illustrations with Multi-Flash Imaging. In MERL TR/38.

Toyama, K., Krumm, J., Brumitt, B., and Meyers, B. 1999. Wallflower: Principles and Practice of Background Maintenance. In ICCV, 255–261.

Waking Life, 2001. Waking Life, the movie.

Weiss, Y. 2001. Deriving intrinsic images from image sequences. In Proceedings of ICCV, vol. 2, 68–75.

Yang, D. K.-M. 1996. Shape from Darkness Under Error. PhD thesis, Columbia University.
Eurographics Symposium on Rendering (2005)
Kavita Bala, Philip Dutré (Editors)

Table-top Computed Lighting for Practical Digital Photography

Ankit Mohan1† , Jack Tumblin1 , Bobby Bodenheimer2 , Cindy Grimm3 , Reynold Bailey3

1 Northwestern University, 2 Vanderbilt University, 3 Washington University in St. Louis

Abstract

We apply simplified image-based lighting methods to reduce the equipment, cost, time, and specialized skills
required for high-quality photographic lighting of desktop-sized static objects such as museum artifacts. We place
the object and a computer-steered moving-head spotlight inside a simple foam-core enclosure, and use a camera
to quickly record low-resolution photos as the light scans the box interior. Optimization guided by interactive user
sketching selects a small set of frames whose weighted sum best matches the target image. The system then repeats
the lighting used in each of these frames, and constructs a high resolution result from re-photographed basis
images. Unlike previous image-based relighting efforts, our method requires only one light source, yet can achieve
high resolution light positioning to avoid multiple sharp shadows. A reduced version uses only a hand-held light,
and may be suitable for battery-powered, field photography equipment that fits in a backpack.
Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional
Graphics and Realism I.4.1 [Image Processing and Computer Vision]: Digitization and Image Capture I.3.3 [Com-
puter Graphics]: Picture/Image Generation

1. Introduction

Modern digital cameras have made picture-taking much easier and more interactive. However, lighting a scene for good photography is still difficult, and practical methods to achieve good lighting have scarcely changed at all. We show that sketch-guided optimization and simplified forms of image-based lighting can substantially reduce the cost, equipment, skill, and patience required for small-scale studio-quality lighting.

Good studio lighting is difficult because it is a 4D inverse problem that photographers must solve by making successive approximations guided by years of experience. For non-experts, good studio lighting can be surprisingly frustrating. Most people can specify the lighting they want in screen space (e.g., “get rid of this obscuring highlight, make some shadows to reveal rough texture here, but fill in the shadows there”), but determining what kind of lights to use, where to place them, and how to orient them is never easy.

We are especially interested in camera-assisted lighting for human-scale, desktop-sized static objects. We want lighting that accurately reveals the shape, texture, materials, and most visually meaningful features of the photographed item. In particular, we seek a method to help museum curators as they gather digital photographic archives of their vast collections of items.

Pioneering work in image-based lighting [DHT∗00, HCD01, DWT∗02, MPDW03] offers promising approaches that can help with the photographic lighting problem. Unfortunately, most require too many precise measurements and adjustments for day-to-day use outside the laboratory. Precision is required to address more ambitious goals such as recovering shape, BRDF, and appearance under arbitrary viewing and lighting conditions. For the much smaller, yet more widespread problem of photographic lighting, we need a method that requires less time, expense, and complexity, yet allows users who are not lighting experts to quickly find the lighting they want.

† ankit@cs.northwestern.edu

This paper offers three contributions. We extend existing image-based lighting ideas to reduce the required equipment to a single light source and single camera; we replace trial-and-error light repositioning with optimization and on-screen painting; and we reduce the need for high dynamic range photography, thus reducing the capture time. The result is a novel and inexpensive system that a novice can use to intuitively describe and obtain the desired lighting for a photograph.

Figure 1: Light placement for obtaining high quality photographs can be extremely tedious and time consuming (left). Our system uses a simple setup with a steerable spotlight and an uncalibrated enclosure (center) to obtain results comparable to professional lighting, even when used by novice users (right).

2. Related Work

Lighting has long been recognized as a hard problem in computer graphics, and many papers have explored optimization for light placement and other parameters [SDS∗93, KPC93, PF95, CSF99, SL01]. Some of these systems used painting interfaces to specify desired lighting in a 3D scene [SDS∗93, PF95, PRJ97], and we use a similar approach to make lighting for photography more intuitive. The system by Shacked et al. [SL01] was even able to provide fully automatic lighting by applying image quality metrics. Marschner et al. [MG97] used inverse rendering techniques to estimate and alter the directional distribution of incident light in a photograph. However, all these systems require 3D information unavailable in our photographic application.

Several commercial photographic products have also used lighting enclosures similar to ours, but they achieve very soft lighting with limited user controls. Moreover, they do not help users solve light placement problems. These systems include diffusive tents [Pho], photo-boxes [MK] and translucent back-lit platforms with an array of individually dimmed light sources [Ast].

Image-based methods have also been used to permit arbitrary relighting of well-measured objects. Most methods, including ours, perform relighting using a weighted sum of differently lit basis images, done first by [NSD94]. However, prior efforts used more elaborate and expensive equipment because their goals were different from ours. These include measurement of a 4D slice of the reflectance field of the human face [DHT∗00], museum artifacts measured by a rotating-arm light stage [HCD01], an ingenious but extensive system by Debevec et al. [DWT∗02] for real-time video playback and measurement of light fields, a dome of electronic flashes for real-time image relighting [MGW01], a free-form light stage to enable portable gathering of light-field data with some calibration [MDA02], and full 4D incident light measurements by Masselus et al. [MPDW03]. In all of these cases, data-gathering required either customized equipment or collection times much longer than would be practical for photographic lighting.

Three recent systems also offered sketch-guided relighting from basis images. Akers et al. [ALK∗03] used a robotic light-positioning gantry to gather precisely lit images and, like us, provided a painting interface to guide relighting. But unlike us, they used spatially varying weights that could produce physically impossible lighting. Digital Photomontage [ADA∗04] used sketch-guided graph-cut segmentation coupled with gradient domain fusion to seamlessly merge several photographs. They demonstrated merging differently lit photographs to create novel illumination conditions. Though their interaction scheme worked well for a small number of images (∼10), it may be impractical for the hundreds of images required for complete control over lighting directions. Also, their system does nothing to help the user with light placement, and may produce physically unrealizable results. Anrys and Dutré [AD04] used a Debevec-style light stage with around 40 fixed, low powered light sources and a painting interface to guide lighting. Their optimization only found light intensities, and light placement was still left up to the user. Also, their point light sources could cause multiple shadows and highlights, which might be undesirable for archival purposes. The data capture time was high since they captured high-dynamic-range (HDR) photos for every light location.
Unlike previous attempts, our system does not require users to decide on correct or complete light source placement. This is possible because our capture process is significantly different, and better suited for the task of photography. We require less than five minutes to complete the initial image capture and a few more minutes to get the final result. The equipment required is minimal and portable, and our hand-held version can be carried in a backpack. Also, HDR capture is reduced to a minimum in our system.

Figure 2: All possible lighting angles parameterized by light position (θp, φp) and direction (θa, φa). Point light sources (on the left side of the hemisphere) result in multiple hard shadows, while overlapping area light sources (on the right) can be used to simulate a larger light source.

3. Simplifications: HDR and 2D lighting

Our goal is to do what a good photographer does, but with computational help. We want to light a scene for a particular photograph, not build a calibrated 4D data set to reconstruct every possible form of illumination. Photographers make consistent choices about which types of lights to use, how to adjust them, and where to place them. We will show how our streamlined image-based method follows these same choices.

Like most previous image-based lighting methods, we apply the observations formalized by Nimeroff [NSD94] that lights and materials interact linearly. If a fixed camera makes an image Ii from a fixed scene lit only by a light Li, then the same scene lit by many lights scaled by weights wi will make an image Iout = ∑i wi Ii. Adjusting weights lets us “relight” the image, as if the weights modulate the lights rather than the images. As we collect more images Ii, we can simulate more lighting possibilities.

How many images do we really need to gather? We only need enough images to span the kind of lighting a skilled photographer might explore to get good results in a photo studio. Several common practices in studio lighting can help us.

First, professional photographers choose lamps with broad, nearly uniform beams of light, often with a reflector and lens to help direct more light forward. Second, they adjust light placement angles carefully, but not their distances from the object. Distance to the light affects foreshortening of shadow shapes, but these effects are subtle and rarely noticed. Third, they adjust lights to control shadow softness versus sharpness. Light sources (or more accurately, the shadows they form) become ‘softer’ by increasing the angular extent as measured from the lit object. Fourth, they seek out lighting arrangements that produce a simple set of shadows and highlights that best reveal the object’s shape, position, and surface qualities. They avoid complex overlapping shadows, lack of shadows due to overly-soft light, and contrast extremes due to large specular highlights or very dark shadows. Simpler shadows usually mean fewer lights, and thus fewer basis images.

Accordingly, we use commercially available light sources instead of custom or special-purpose devices. We place light sources at a moderate distance (typically around 1 meter) from the object. We use small-to-moderate area ‘soft’ light sources instead of the much sharper point-like sources often used in earlier approaches. Overlapped soft shadows blend far less noticeably than sharp shadows from the same light positions (as shown in Figure 2), thus requiring fewer images to avoid multiple shadow artifacts. Also, overlapping area light sources can be combined to produce a larger area light source.

Note that we do not need to know the light positions or their absolute intensities for our images; we select weights wi and images Ii by their ability to match the lighting target images a user sketches for us. Instead of calibration, we only need consistency in the aiming direction of a single, commercially available steerable light, and consistency in the light response curve of a commercially available digital camera. We also avoid the use of HDR photographs where possible, as these typically require multiple calibrated exposures and computation to merge them [DM97]. Instead, we rely on the camera’s automatic exposure adjustments to capture what we call light-aiming images suitable for interactive lighting design. We photograph high resolution basis images afterwards, for construction of the output image, and only resort to HDR capture methods for a basis image with large over-exposed regions. Under-exposed regions can be ignored, as their contributions are already invisible, and are further reduced as their weights are less than one (wi ≤ 1).
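The linear superposition relation above (Iout = ∑i wi Ii) is simple enough to state directly in code. The sketch below is illustrative only: it assumes the basis photos have already been captured, and the inverse_response/response callables standing in for the camera curve are placeholders, not functions defined in the paper.

```python
import numpy as np

def relight(basis_images, weights, inverse_response, response):
    """Weighted-sum relighting sketch: I_out = sum_i w_i * I_i.

    basis_images: list of H x W x 3 photos, each lit by a single light.
    weights: iterable of scalars in [0, 1], one per basis image.
    inverse_response / response: hypothetical callables mapping pixel
    values to linear radiance and back (e.g., recovered offline with a
    Debevec-Malik style calibration).
    """
    linear = np.zeros(np.asarray(basis_images[0], dtype=np.float64).shape)
    for w, img in zip(weights, basis_images):
        linear += w * inverse_response(np.asarray(img, dtype=np.float64))
    return response(linear)   # convert back to display values
```

Because the sum is performed in linearized units, the result corresponds to a physically realizable lighting condition, which is why the paper insists on applying the camera's inverse response curve before combining images.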
Formally, arbitrary external illumination is four-dimensional for a desktop scene: L(θp, φp, θa, φa) = L(Θ). Suppose that the photographed object receives all its light from a hemisphere of tiny, invisible, inward-pointing video projectors, each at a distance r from the object. Each projector’s position in desktop polar coordinates is (θp, φp). Each projector’s centermost pixel P(θa = 0, φa = 0) forms a ray that illuminates the center point of our desktop, and in the projector’s polar coordinates the other pixels are P(θa, φa), as shown in Figure 2. All projectors’ light output is the 4-D incident light field, and describes all possible lighting. To simulate all possible lighting, we would need a new image Ii to capture light from each pixel of each video projector! Instead, we use only broad beams of light (P(θa, φa) ≈ cos(θa)cos(φa)), regular sampling of light placement angles (θp, φp), and specify ‘softer’ to ‘sharper’ shadows by varying the angular extent (θp, φp) as measured from the lit object. This angular extent should not be confused with the lamp’s beam width (θa, φa); in our ‘hemisphere of video projectors’ analogy, beam width sets the image from a projector, but angular extent sets the number of adjacent projectors that emit this same image.

In summary, rather than recreate arbitrary 4D incident light fields, we use weighted sums of basis images that represent the type of lighting used by professional photographers. This method is much more practical and efficient, with little, if any, loss of useful generality.

4. Method

We construct a high quality user-guided picture in three steps. First the system automatically captures low-resolution light-aiming photos for densely sampled lighting angles around the photographed object. These quick photos are used only to guide the lighting design, not to form the final output. Second, the user iteratively paints the desired lighting by simple lighten-darken operations to generate a target image. The system finds weights wi for each light-aiming photo such that their weighted sum matches the target image in the least-squares sense. Finally the system takes a few selected high resolution basis images by relighting the scene from light source positions that have weights wi greater than a threshold. A weighted sum of these high resolution images gives the final result. If the result is not satisfactory, the user can sketch on the current result for use as the next iteration’s target image.

4.1. Enclosed Light Source & Aiming Images

Freed from photometric and angular calibration requirements as discussed in Section 3, we are able to build a much simpler and cost-effective controlled light source. We place the object and a gimbal-mounted moving-head spotlight inside an enclosure of almost any convenient size, shape and material. The powerful computer-aimed light pivots to any desired pan and tilt angle with good repeatability (≤ ±0.5°) to light any desired spot inside our enclosure. The enclosure acts as a reflector, and effectively provides a controllable 2D area light source around the object. The size and shape of the enclosure is almost irrelevant as long as the light is close enough to the object to keep parallax low, and the light is powerful enough for the camera to get a reasonable exposure.

We built a 1 × 1 × 1.5 m³ sized box of white 1/2” foam-core board as our enclosure, and chose an inexpensive moving-head spotlight. The 150-watt American DJ Auto Spot 150 disco-light, shown in Figure 1, can tilt 270°, pan 540°, and includes 9 color filters, gobos and several other fun features.

Figure 3: The disco-light setup. The object and disco light are both enclosed in a white foam box, with the camera looking in through a window in the enclosure wall farthest from the light.

Computer control by the DMX512 protocol is easy to program with the SoundLight USB DMX controller. Our foam-core enclosure resembles a hemi-cube around a pair of tables. We place the gimbal light on a small table that lowers its rotation center to the plane of an adjacent taller table holding the photographed object, as shown in Figure 3. Using adjacent but separate tables reduces vibration, permits gimbal angles to approximate hemisphere angles, and separates the object from the swiveling lamp. We place the camera behind a small opening cut in the enclosure wall on the end farthest from the light source.

The system gathers aiming images rapidly and automatically. Through the DMX512 controller we direct the gimbal light to scan the upper hemisphere of light aiming directions in equal-angle increments as we record low-resolution aiming images, either by collecting viewfinder video (320 × 240@10Hz) or by individual computer-triggered photographs using auto-exposure. We are able to record hundreds of individual aiming images per minute, and can complete all the data gathering in less than five minutes using a Pentium 2GHz computer and a Canon Powershot G3 camera.

To the best of our knowledge, no other image-based lighting work exploits these movable and controllable lights. Enclosed pivoting lights retain many advantages of the more sophisticated lighting systems, avoid multiple sharp shadows, can offer variable ‘softness’ by spot size adjustment, and are much simpler and cheaper to construct. Of course, they do not easily provide accurate lighting direction calibration or point-light illumination, but these features are not needed for our goals.

whole images are accurate representations of physically real-


izeable lighting. We then down-sample the linearized aiming
image dataset to 64 × 64 for use as the aiming basis set for
the following optimization step.

4.2. Sketch-Guided Lighting Optimization

After gathering aiming images, users can interactively spec-


ify and refine lighting by sketching the desired intensity on
a target intensity image. This grey-scale image (examples in
Figure 5) approximates the final output image the user would
like to see. For editing the target image, the user starts off ei-
ther with a simple grey wash (such as uniform grey, or light
grey fading to dark grey across the image, etc.), or the pre-
vious iteration’s result. The user then carries out a series of Figure 4: Light source with attached foam-core diffuse re-
lighten and darken operations in the different regions of the flector used for hand-held data gathering.
image to approximate the desired results. The process is ex-
tremely simple and intuitive, and takes a few of minutes at
most.
4.3. Output Assembly
Given a target image, the optimization finds weights wi for
each aiming image that produces the best match to the target The user now has the desired visually pleasing, but low-
image. We take a constrained least-squares approach, solv- resolution, image that is a weighted sum of a small sub-
ing for weights wi for each of the small, luminance-only set of the linearized aiming images. For high-quality re-
aiming basis images. Let N be the number of images in the sults, we wish to replace each of these aiming images with
aiming image set, each of size m × n. We formulate the opti- an image taken at the maximum resolution available from
mization problem as follows: the camera. We re-take just those photos that correspond
to the aiming images with significant weights wi , again us-
min |Aw − t|2 ing auto-exposure on the camera, and record a set of high-
w
resolution photos called basis images. Recall that we can ex-
actly replicate the lighting using the gimballed spotlight; the
subject to 0 ≤ wi ≤ 1 ∀ i ∈ (1 . . . N) only things that change are the camera settings.
where w is the N-dimensional vector of weights, A is an We capture HDR photographs for images that contain large
(m × n) × N matrix of basis images (that is, each basis im- over-exposed regions as a result of the camera’s autoexpo-
age is treated as a vector), t is the (m × n) vector repre- sure. As discussed in Section 3, under-exposed regions do
senting the target image painted by the user, and |.| repre- not require HDR photos. We then linearize each basis image
sents the L2 norm of the vector. We solve this bound con- to remove effects of the camera response curve. As before,
strained quadratic optimization problem using an active set we construct a linear output image as a weighted sum of ba-
method [NW99]. The optimization is quite fast and takes sis images, using the weights determined by the optimization
around 1-2 minutes on a 2GHz Pentium 4 desktop machine. to match the target image. Finally, we re-apply the camera’s
response function to the linear output image to get the de-
The result is a least-squares optimal match to the supplied sired high resolution result.
target image. As the objective function is quadratic, weights
for images with weak contributions are rapidly driven to
zero. In our experience, the number of significant nonzero 5. Portable, Hand-held Method
weights is consistently small (5 − 15). This greatly reduces
the number of images needed for the final lighting solution. Even a foam-core box and a moving-head spotlight are im-
practical to carry around everywhere. However, the ‘Free-
After finding the wi weights, we apply them to the lin- form light-stage’ [MDA02] showed that it is possible to
earized color aiming images, then re-apply the camera re- gather calibrated image sets suitable for 2D relighting with
sponse function to display a preview of the output image. nothing more than four small light-probe-like spheres, a
The user then has the option of replacing the target with a digital camera on a tripod, a hand-held point-light source,
grayscale version of this result and can repeat the sketching possibly battery-powered, and approximately 30 minutes of
and optimization cycle until satisfied with the color preview time to take several hundred digital photographs. Pang et
of the output image. al. [PWH04] also used a similar approach by mounting a


c The Eurographics Association 2005.
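The bound-constrained least-squares problem of Section 4.2 can be reproduced with an off-the-shelf solver. The sketch below uses SciPy's bounded least-squares routine in place of the active set method cited in the paper; the array shapes follow the description above, and the pruning threshold is illustrative.

```python
import numpy as np
from scipy.optimize import lsq_linear

def solve_weights(aiming_images, target):
    """Sketch of the sketch-guided weight optimization:
    minimize |A w - t|^2 subject to 0 <= w_i <= 1.

    aiming_images: list of N luminance arrays, each m x n (the 64 x 64
    down-sampled aiming basis set). target: m x n user-painted target.
    """
    A = np.stack([img.ravel() for img in aiming_images], axis=1)  # (m*n) x N
    t = np.asarray(target, dtype=np.float64).ravel()
    result = lsq_linear(A, t, bounds=(0.0, 1.0))
    w = result.x
    w[w < 1e-3] = 0.0   # drop negligible weights; threshold is illustrative
    return w
```

The surviving nonzero weights identify which lighting directions to re-photograph at full resolution and sum, after linearization, in the output assembly step of Section 4.3.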
5. Portable, Hand-held Method

Even a foam-core box and a moving-head spotlight are impractical to carry around everywhere. However, the ‘Free-form light stage’ [MDA02] showed that it is possible to gather calibrated image sets suitable for 2D relighting with nothing more than four small light-probe-like spheres, a digital camera on a tripod, a hand-held point-light source, possibly battery-powered, and approximately 30 minutes of time to take several hundred digital photographs. Pang et al. [PWH04] also used a similar approach by mounting a camera on the light source and using camera calibration techniques to estimate lighting directions with reasonable accuracy. While these methods try to meet the ambitious goal of incident light field capture, they would tax anyone’s patience to record more than just a few items. We present a faster and simpler variant that serves our purposes better.

In the method of Section 4, we required repeatable light source positioning. However, if we record all of our ‘aiming images’ at the final output resolution, and if we either ignore over-exposed specular highlights or record high dynamic range images when needed, then repeatability is not needed. This allows us to use a hand-held light source instead. As shown in Figure 4, we use a small 250W hand-held light intended for television news cameras, attached to a diffuse reflector (foam core again), and limit the beam width with barn-doors to form a well-defined area light source.

To gather all photos, we hold the light outstretched and “dance” (see video). We sample the hemisphere of lighting directions by a polar-coordinate scan in φ-major order as the camera takes sequential photographs. A Nikon D70 camera takes a steady stream of photos at about 3 frames per second using autoexposure for each frame. The user stands facing the object, and holds the light at arm’s length while moving the lamp in an arc that passes directly over the object. The user moves the lamp from one side of the table to the other, scanning by π radians in the θ axis with constant φ, and the natural alignment of their shoulders helps aim the light’s centerline directly at the object. After each pass over the object with the light, the user steps sideways to change the φ angle for the next scan, and makes enough of these passes to cover 0 ≤ φ < π radians. In practice the user can be more careless with the light, as long as the hemisphere of light directions is well-sampled and the images are not over-exposed. After the image capture dance is complete, we downsample all images to construct aiming photos, and proceed with the sketch-guided lighting design as before.

We find this process is quite simple and pleasing, and in under three minutes we can gather around 150 high-quality aiming/basis photos. An experienced user might not need to scan the whole hemisphere, but can quickly illuminate just from the useful and interesting lighting directions.

6. Results

Images in Figure 5 show results from our sketch-guided lighting system. Both the moving-head light and the hand-held methods are equally successful at creating arbitrary cleanly-lit images of desktop-sized objects. The data sets gathered by either method are sufficiently dense to allow easy lighting design. Additionally, our system yields reasonable results even when presented with unrealistic targets or highly reflective objects.

Figure 5(a) demonstrates a user interaction sequence with the system. Starting from a uniform grayscale image as the target, the user guides the optimization, iteratively improving the target until she gets the desired output. Figure 5(b) shows how simple approximate sketching on the target image can give an interesting sidelighting effect. Figure 5(c) shows how the highlight can bring out the underlying texture in a surface.

Figure 5(d) shows lighting for a highly specular object. Good lighting for such smooth, highly reflective objects is always difficult, as the light source itself is visible in the reflection. Our system produces results similar to the target image without large, objectionable saturated regions. In future systems we may hide the enclosure seams by constructing wide smooth rounded corners resembling a photographer’s ‘cyc’.

Figure 5(f) shows results from the handheld method of Section 5. The data gathering time was under 3 minutes, and the results are comparable to the moving-head light method. While the handheld method is not practical for photographing a large collection of objects, it can be an invaluable tool for well-lit photography in the field.

7. Discussion and Future Work

The ability to have large area light sources is crucial for photographing highly specular objects. Light source size also affects the sharpness of shadows and highlights. Our system has a unique advantage in that larger area light sources can be simulated by combining pictures illuminated with overlapping light sources. We could extend our optimization to penalize each distinct light source cluster, thus preventing disjoint highlights. The softness of the light can also be controlled by varying the beam width between a point-source and a large area source as it quickly sweeps over the hemisphere of lighting directions. More advanced moving-head spotlights usually provide controllable spot sizes suitable for this purpose.

Even though our system is aimed primarily at non-professional photographers, a few simple additions can make it a flexible tool for a creative expert to experiment with different lighting designs more easily. For example, the user might specify a simple weighting mask to set the importance of different image regions and influence the optimization process. While weighting masks would make the system more flexible, they would complicate the target sketching process. We do not know yet if the results would warrant the increase in complexity. Also, tools to directly tweak the light position and size on a virtual hemisphere around the object might also aid expert users.

There are several possible ways of dealing with the ambient light in the reflective enclosure. Underexposing all images using exposure compensation on the camera, using a larger enclosure, or using one made of materials with special reflective properties would greatly minimize the ambient component. Finally, it might also be possible to explicitly subtract the ambient term from the basis images.
This paper takes the problem of good lighting for desktop photography and finds a simple and practical solution using image-based relighting techniques. More sophisticated image-based measurements might also be achievable while maintaining the simplicity and elegance of the system. For example, we could estimate the incoming light direction by calibrating the ad-hoc enclosure setup with a light-probe, or by using dimensionality reduction [WMTG05] for the hand-held case. Combined with surface normals, such calibration might suffice for image-based estimates of BRDF.

8. Acknowledgements

We would like to thank the computer graphics group at Northwestern University for their support and feedback and the anonymous reviewers for their helpful comments and suggestions. Thanks to Jingjing Meng and Xing Hu for help in early stages of the project, Kelli Johnson and Nathan Matsuda for help with the construction of the enclosure, Vincent Masselus and Amy Gooch for help with Figure 2, and Holger Winnemöller for help with the video. This research was funded in part by NSF grants 0238062 and 0237621.

References

[AD04] Anrys F., Dutré P.: Image based lighting design. In The 4th IASTED International Conference on Visualization, Imaging, and Image Processing (2004). 2

[ADA∗04] Agarwala A., Dontcheva M., Agrawala M., Drucker S., Colburn A., Curless B., Salesin D., Cohen M.: Interactive digital photomontage. In SIGGRAPH (2004), vol. 23, pp. 294–302. 2

[ALK∗03] Akers D., Losasso F., Klingner J., Agrawala M., Rick J., Hanrahan P.: Conveying shape and features with image-based relighting. In IEEE Visualization (2003). 2

[Ast] Astron Systems: Orbiculight lighting system. http://www.astronsys.com. 2

[CSF99] Costa A. C., Sousa A. A., Ferreira F. N.: Lighting design: a goal based approach using optimisation. In Proceedings of Eurographics Workshop on Rendering (1999), vol. 10, pp. 317–328. 2

[DHT∗00] Debevec P., Hawkins T., Tchou C., Duiker H.-P., Sarokin W., Sagar M.: Acquiring the reflectance field of a human face. In SIGGRAPH (2000), pp. 145–156. 1, 2

[DM97] Debevec P. E., Malik J.: Recovering high dynamic range radiance maps from photographs. In SIGGRAPH (1997), vol. 31, pp. 369–378. 3, 4

[DWT∗02] Debevec P., Wenger A., Tchou C., Gardner A., Waese J., Hawkins T.: A lighting reproduction approach to live-action compositing. In SIGGRAPH (2002), pp. 547–556. 1, 2

[HCD01] Hawkins T., Cohen J., Debevec P.: A photometric approach to digitizing cultural artifacts. In Proceedings of conference on Virtual reality, archeology, and cultural heritage (2001), pp. 333–342. 1, 2

[KPC93] Kawai J. K., Painter J. S., Cohen M. F.: Radioptimization: goal based rendering. In SIGGRAPH (1993), pp. 147–154. 2

[MDA02] Masselus V., Dutré P., Anrys F.: The free-form light stage. In Proceedings of the 13th Eurographics workshop on Rendering (2002), pp. 247–256. 2, 5

[MG97] Marschner S. R., Greenberg D. P.: Inverse lighting for photography. In Proceedings of IS&T/SID Fifth Color Imaging Conference (1997), pp. 262–265. 2

[MGW01] Malzbender T., Gelb D., Wolters H.: Polynomial texture maps. In SIGGRAPH (2001), ACM Press, pp. 519–528. 2

[MK] MK Digital: Photo light box. http://www.mkdigitaldirect.com. 2

[MPDW03] Masselus V., Peers P., Dutré P., Willems Y. D.: Relighting with 4D incident light fields. In SIGGRAPH (2003), vol. 22, pp. 613–620. 1, 2

[NSD94] Nimeroff J. S., Simoncelli E., Dorsey J.: Efficient re-rendering of naturally illuminated environments. In Proceedings of the Fifth Eurographics Workshop on Rendering (1994), pp. 359–373. 2, 3

[NW99] Nocedal J., Wright S. J.: Numerical Optimization. Springer-Verlag New York, 1999. 5

[PF95] Poulin P., Fournier A.: Painting surface characteristics. In Proceedings of Eurographics Workshop on Rendering (1995), pp. 160–169. 2

[Pho] Photek: Digital lighthouse. http://www.photekusa.com. 2

[PRJ97] Poulin P., Ratib K., Jacques M.: Sketching shadows and highlights to position lights. In Proceedings of Conference on Computer Graphics International (1997), pp. 56–63. 2

[PWH04] Pang W.-M., Wong T.-T., Heng P.-A.: Estimating light vectors in real time. IEEE Computer Graphics and Applications 24, 3 (2004), 36–43. 5

[SDS∗93] Schoeneman C., Dorsey J., Smits B., Arvo J., Greenburg D.: Painting with light. In SIGGRAPH (1993), pp. 143–146. 2

[SL01] Shacked R., Lischinski D.: Automatic lighting design using a perceptual quality metric. Computer Graphics Forum 20, 3 (2001), 215–226. 2

[WMTG05] Winnemöeller H., Mohan A., Tumblin J., Gooch B.: Light waving: Estimating light positions from photographs alone. Computer Graphics Forum 24, 3 (2005), to appear. 7
Figure 5: Sample target images and lit photographs. (a) Sequence showing successive sketching/optimization iterations to get the desired lighting; the first result uses a constant grayscale target, while the others use previous results as starting points for the target image. (b) Strategic placement of highlights in the target results in an interesting side-lit image. (c) Positioning of highlights reveals underlying texture in the surface. (d) Lighting a highly specular object by forcing the background to be dark. (e) Target results in an image suggesting illumination from the right. (f) Data captured by the handheld method; the image on the left uses a smooth grayscale gradient as the target image.
Assorted Pixels:
Multi-Sampled Imaging with Structural Models

Shree K. Nayar and Srinivasa G. Narasimhan

Department of Computer Science, Columbia University


New York, NY 10027, U.S.A.
{nayar, srinivas}@cs.columbia.edu
Abstract. Multi-sampled imaging is a general framework for using pix-
els on an image detector to simultaneously sample multiple dimensions
of imaging (space, time, spectrum, brightness, polarization, etc.). The
mosaic of red, green and blue spectral filters found in most solid-state
color cameras is one example of multi-sampled imaging. We briefly de-
scribe how multi-sampling can be used to explore other dimensions of
imaging. Once such an image is captured, smooth reconstructions along
the individual dimensions can be obtained using standard interpolation
algorithms. Typically, this results in a substantial reduction of resolution
(and hence image quality). One can extract significantly greater resolu-
tion in each dimension by noting that the light fields associated with
real scenes have enormous redundancies within them, causing different
dimensions to be highly correlated. Hence, multi-sampled images can be
better interpolated using local structural models that are learned off-
line from a diverse set of training images. The specific type of structural
models we use are based on polynomial functions of measured image in-
tensities. They are very effective as well as computationally efficient. We
demonstrate the benefits of structural interpolation using three specific
applications. These are (a) traditional color imaging with a mosaic of
color filters, (b) high dynamic range monochrome imaging using a mo-
saic of exposure filters, and (c) high dynamic range color imaging using
a mosaic of overlapping color and exposure filters.

1 Multi-Sampled Imaging
Currently, vision algorithms rely on images with 8 bits of brightness or color at
each pixel. Images of such quality are simply inadequate for many real-world
applications. Significant advances in imaging can be made by exploring the fun-
damental trade-offs that exist between various dimensions of imaging (see Figure
1). The relative importances of these dimensions clearly depend on the applica-
tion at hand. In any practical scenario, however, we are given a finite number
of pixels (residing on one or more detectors) to sample the imaging dimensions.
Therefore, it is beneficial to view imaging as the judicious assignment of resources
(pixels) to the dimensions of imaging that are relevant to the application.
Different pixel assignments can be viewed as different types of samplings of
the imaging dimensions. In all cases, however, more than one dimension is simul-
taneously sampled. In the simplest case of a gray-scale image, image brightness
and image space are sampled, simultaneously. More interesting examples result
from using image detectors made of an assortment of pixels, as shown in Figure 2.

Fig. 1. A few dimensions of imaging (space, time, spectrum, brightness, polarization, depth). Pixels on an image detector may be assigned to multiple dimensions in a variety of ways depending on the needs of the application.

Figure 2(a) shows the popular Bayer mosaic [Bay76] of red, green and blue
spectral filters placed adjacent to pixels on a detector. Since multiple color mea-
surements cannot be captured simultaneously at a pixel, the pixels are assigned
to specific colors to trade-off spatial resolution for spectral resolution. Over the
last three decades various color mosaics have been suggested, each one resulting
in a different trade-off (see [Dil77], [Dil78], [MOS83], [Par85], [KM85]).
Historically, multi-sampled imaging has only been used in the form of color
mosaics. Only recently has the approach been used to explore other imaging
dimensions. Figure 2(b) shows the mosaic of neutral density filters with different
transmittances used in [NM00] to enhance an image detector’s dynamic range.
In this case, spatial resolution is traded-off for brightness resolution (dynamic
range). In [SN01], spatially varying transmittance and spectral filters were used
with regular wide FOV mosaicing to yield high dynamic range and multi-spectral
mosaics. Figure 2(c) shows how space, dynamic range and color can be sampled
simultaneously by using a mosaic of filters with different spectral responses and
transmittances. This type of multi-sampling is novel and, as we shall show,
results in high dynamic range color images. Another example of assorted pixels
was proposed in [BE00], where a mosaic of polarization filters with different
orientations is used to estimate the polarization parameters of light reflected by
scene points. This idea can be used in conjunction with a spectral mosaic, as
shown in Figure 2(d), to achieve simultaneous capture of polarization and color.
Multi-sampled imaging can be exploited in many other ways. Figure 2(e)
shows how temporal sampling can be used with exposure sampling. This ex-
ample is related to the idea of sequential exposure change proposed in [MP95],
[DM97] and [MN99] to enhance dynamic range. However, it is different in that
the exposure is varied as a periodic function of time, enabling the generation of
high dynamic range, high framerate video. The closest implementation appears
to be the one described in [GHZ92] where the electronic gain of the camera is
varied periodically to achieve the same effect. A more sophisticated implementa-
tion may sample space, time, exposure and spectrum, simultaneously, as shown
in Figure 2(f).
The above examples illustrate that multi-sampling provides a general frame-
work for designing imaging systems that extract information that is most pertinent to the application. Though our focus is on the visible light spectrum, multi-sampling is, in principle, applicable to any form of electromagnetic radiation. Therefore, the pixel assortments and reconstruction methods we describe in this paper are also relevant to other imaging modalities such as X-ray, magnetic resonance (MR) and infra-red (IR). Furthermore, the examples we discuss are two-dimensional, but the methods we propose are directly applicable to higher-dimensional imaging problems such as ones found in tomography and microscopy.

Fig. 2. A few examples of multi-sampled imaging using assorted pixels. (a) A color mosaic. Such mosaics are widely used in solid-state color cameras. (b) An exposure mosaic. (c) A mosaic that includes different colors and exposures. (d) A mosaic using color and polarization filters. (e), (f) Multi-sampling can also involve varying exposure and/or color over space and time.

2 Learned Structural Models for Reconstruction


How do we reconstruct the desired image from a captured multi-sampled one?
Nyquist’s theory [Bra65] tells us that for a continuous signal to be perfectly re-
constructed from its discrete samples, the sampling frequency must be at least
twice the largest frequency in the signal. In the case of an image of a scene,
the optical image is sampled at a frequency determined by the size of the de-
tector and the number of pixels on it. In general, there is no guarantee that
this sampling frequency satisfies Nyquist’s criterion. Therefore, when a tradi-
tional interpolation technique is used to enhance spatial resolution, it is bound
to introduce errors in the form of blurring and/or aliasing. In the case of multi-
sampled images (see Figure 2), the assignment of pixels to multiple dimensions
causes further undersampling of scene radiance along at least some dimensions.
As a result, conventional interpolation methods are even less effective.
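As a point of reference for what "conventional interpolation" means in this setting, the sketch below performs plain bilinear demosaicing of a Bayer mosaic by convolving each masked color plane with fixed kernels. The kernels and layout are the standard textbook choice, assumed here rather than taken from the paper; it is exactly this kind of fixed, signal-agnostic interpolation that blurs or aliases undersampled data.

```python
import numpy as np
from scipy.ndimage import convolve

def bilinear_demosaic(mosaic):
    """Conventional bilinear interpolation of a Bayer mosaic (rows R G / G B).

    mosaic: H x W array of raw sensor values; returns an H x W x 3 RGB estimate.
    """
    h, w = mosaic.shape
    r_mask = np.zeros((h, w)); r_mask[0::2, 0::2] = 1
    b_mask = np.zeros((h, w)); b_mask[1::2, 1::2] = 1
    g_mask = 1 - r_mask - b_mask

    k_g  = np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]], float) / 4.0  # green sites
    k_rb = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], float) / 4.0  # red/blue sites

    out = np.empty((h, w, 3))
    out[..., 0] = convolve(mosaic * r_mask, k_rb, mode='mirror')
    out[..., 1] = convolve(mosaic * g_mask, k_g,  mode='mirror')
    out[..., 2] = convolve(mosaic * b_mask, k_rb, mode='mirror')
    return out
```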
Our objective is to overcome the limits imposed by Nyquist’s theory by using
prior models that capture redundancies inherent in images. The physical struc-
tures of real-world objects, their reflectances and illuminations impose strong
constraints on the light fields of scenes. This causes different imaging dimen-
sions to be highly correlated with each other. Therefore, a local mapping func-
tion can be learned from a set of multi-sampled images and their corresponding
correct (high quality) images. As we shall see, it is often beneficial to use multi-
ple mapping functions. Then, given a novel multi-sampled image, these mapping
functions can be used to reconstruct an image that has enhanced resolution in
each of the dimensions of interest. We refer to these learned mapping functions
as local structural models.
The general idea of learning interpolation functions is not new. In [FP99], a
probabilistic Markov network is trained to learn the relationship between sharp
and blurred images, and then used to increase spatial resolution of an image.
In [BK00], a linear system of equations is solved to estimate a high resolution
image from a sequence of low resolution images wherein the object of interest
is in motion. Note that both these algorithms are developed to improve spatial
resolution, while our interest is in resolution enhancement along multiple imaging
dimensions.
Learning based algorithms have also been applied to the problem of interpo-
lating images captured using color mosaics. The most relevant among these is
the work of Wober and Soini [WS95] that estimates an interpolation kernel from
training data (high quality color images of test patterns and their correspond-
ing color mosaic images). The same problem was addressed in [Bra94] using a
Bayesian method.
We are interested in a general method that can interpolate not just color
mosaic images but any type of multi-sampled data. For this, we propose the use
of a structural model where each reconstructed value is a polynomial function
of the image brightnesses measured within a local neighborhood. The size of the
neighborhood and the degree of the polynomial vary with the type of multi-
sampled data being processed. It turns out that the model of Wober and Soini
[WS95] is a special instance of our model as it is a first-order polynomial applied
to the specific case of color mosaic images. As we shall see, our polynomial
model produces excellent results for a variety of multi-sampled images. Since it
uses polynomials, our method is very efficient and can be easily implemented
in hardware. In short, it is simple enough to be incorporated into any imaging
device (digital still or video camera, for instance).

3 Training Using High Quality Images


Since we wish to learn our model parameters, we need a set of high quality
training images. These could be real images of scenes, synthetic images generated
by rendering, or some combination of the two. Real images can be acquired using
professional grade cameras whose performance we wish to emulate using lower
quality multi-sampling systems. Since we want our model to be general, the set
of training images must adequately represent a wide range of scene features.
For instance, images of urban settings, landscapes and indoor spaces may be
included. Rotated and magnified versions of the images can be used to capture
the effects of scale and orientation. In addition, the images may span the gamut
of illumination conditions encountered in practice, varying from indoor lighting
to overcast and sunny conditions outdoors. Synthetic images are useful as one
can easily include in them specific features that are relevant to the application.

Fig. 3. Some of the 50 high quality images (2000 x 2000 pixels, 12 bits per color
channel) used to train the local structural models described in Sections 4, 5 and 6.

Some of the 50 high quality images we have used in our experiments are
shown in Figure 3. Each of these is a 2000 x 2000 color (red, green, blue) image
with 12 bits of information in each color channel. These images were captured
using a film camera and scanned using a 12-bit slide scanner. Though the total
number of training images is small, they include a sufficiently large number of
local (say, 7 × 7 pixels) appearances for training our structural models.
Given such high quality images, it is easy to generate a corresponding set
of low-quality multi-sampled images. For instance, given a 2000 × 2000 RGB
image with 12-bits per pixel, per color channel, simple downsampling in space,
color, and brightness results in a 1000 × 1000, 8 bits per pixel multi-sampled
image with the sampling pattern shown in Figure 2(c). We refer to this process
of generating multi-sampled images from high quality images as downgrading.
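For concreteness, the space-and-color part of this downgrading step can be sketched in a few lines of Matlab. The [R G; G B] tile orientation of Figure 4, the file name, the plain 2x decimation, and the division by 16 for the 12-to-8 bit brightness downgrade are illustrative assumptions of this sketch, not details prescribed by the text.

% Sketch of SVC downgrading: downsample in space, keep one color per pixel
% (Bayer layout of Figure 4), and reduce the bit depth from 12 to 8 bits.
rgb = double(imread('high_quality.png'));     % hypothetical 12-bit training image
rgb = rgb(1:2:end, 1:2:end, :);               % spatial downsample, e.g. 2000x2000 -> 1000x1000
[H, W, C] = size(rgb);
svc = zeros(H, W);
svc(1:2:end, 1:2:end) = rgb(1:2:end, 1:2:end, 1);   % R samples
svc(1:2:end, 2:2:end) = rgb(1:2:end, 2:2:end, 2);   % G samples
svc(2:2:end, 1:2:end) = rgb(2:2:end, 1:2:end, 2);   % G samples
svc(2:2:end, 2:2:end) = rgb(2:2:end, 2:2:end, 3);   % B samples
svc = floor(svc / 16);                        % brightness downgrade: 12 bits -> 8 bits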
With the high quality images and their corresponding (downgraded) multi-
sampled images in place, we can learn the parameters of our structural model. A
structural model is a function f that relates measured data M(x, y) in a multi-
sampled image to a desired value H(i, j) in the high quality training image:

H(i, j) = f (M(1, 1), . . . , M(x, y), . . . , M(X, Y ))   (1)

where X and Y define some neighborhood of measured data around, or close
to, the high quality value H(i, j). Since our structural model is a polynomial, it
is linear in its coefficients. Therefore, the coefficients can be efficiently computed
from training data using linear regression.
Note that a single structural model may be inadequate. If we set aside the
measured data and focus on the type of multi-sampling used (see Figure 2), we
see that pixels can have different types of neighborhood sampling patterns. If
we want our models to be compact (small number of coefficients) and effective
we cannot expect them to capture variations in scenes as well as changes in the
sampling pattern. Hence, we use a single structural model for each type of local
sampling pattern. Since our imaging dimensions are sampled in a uniform man-
ner, in all cases we have a small number of local sampling patterns. Therefore,
only a small number of structural models are needed. During reconstruction,
given a pixel of interest, the appropriate structural model is invoked based on
the pixel’s known neighborhood sampling pattern.
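Because each mosaic considered here repeats a fixed tile (2 x 2 for the Bayer pattern of Section 4, 4 x 4 for the combined color-and-exposure mosaic of Section 6), the pattern index of a pixel follows directly from its coordinates. A minimal sketch, where the variable names and the row-major numbering of patterns within the tile are assumptions:

% Sketch: which of the P local sampling patterns applies at pixel (i,j)?
% The mosaic repeats a tileH x tileW tile, so the pattern index depends
% only on (i,j) modulo the tile size (row-major numbering assumed).
r = mod(i - 1, tileH);          % 0 .. tileH-1
c = mod(j - 1, tileW);          % 0 .. tileW-1
pattern = r * tileW + c + 1;    % 1 .. tileH*tileW

For the Bayer mosaic the tile is 2 x 2, giving the four patterns p1 . . . p4 of Figure 4; for the mosaic of Figure 10 the tile is 4 x 4, giving sixteen.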

4 Spatially Varying Color (SVC)


Most color cameras have a single image detector with a mosaic of red, green and
blue spectral filters on it. The resulting image is hence a widely used type of
multi-sampled image. We refer to it as a spatially varying color (SVC) image.
When one uses an NTSC color camera, the output of the camera is nothing but
an interpolated SVC image. Color cameras are notorious for producing inade-
quate spatial resolution and this is exactly the problem we seek to overcome
using structural models. Since this is our first example, we will use it to describe
some of the general aspects of our approach.

4.1 Bayer Color Mosaic


Several types of color mosaics have been implemented in the past [Bay76], [Dil77],
[Dil78], [MOS83], [Par85], [KM85]. However, the most popular of these is the
Bayer pattern [Bay76] shown in Figure 4. Since the human eye is more sensitive
to the green channel, the Bayer pattern uses more green filters than it does red
and blue ones. Specifically, the spatial resolutions of green, red and blue are
50%, 25% and 25%, respectively. Note that the entire mosaic is obtained by
repeating the 2 × 2 pattern shown on the right in Figure 4. Therefore, given a
neighborhood size, all neighborhoods in a Bayer mosaic must have one of four
possible sampling patterns. If the neighborhood is of size 3 × 3, the resulting
patterns are p1, p2, p3 and p4 shown in Figure 4.

Fig. 4. Spatially varying color (SVC) pattern on a Bayer mosaic. Given a neighborhood
size, all possible sampling patterns in the mosaic must be one of four types. In the case
of a 3 × 3 neighborhood, these patterns are p1, p2, p3 and p4.

4.2 SVC Structural Model


From the measured SVC image, we wish to compute three color values (red,
green and blue) at each pixel, even though each pixel in the SVC image provides
a single color measurement. Let the measured SVC image be denoted by M
and the desired high quality color image by H. A structural model relates each
color value in H to the measured data within a small neighborhood in M. This
neighborhood includes measurements of different colors and hence the model
implicitly accounts for correlations between different color channels.
As shown in Figure 5, let Mp be the measured data in a neighborhood with
sampling pattern p, and Hp (i + 0.5, j + 0.5, λ) be the high quality color value at
the center of the neighborhood. (The center is off-grid because the neighborhood
is an even number of pixels in width and height.) Then, a polynomial structural
model can be written as:

H_p(i + 0.5, j + 0.5, \lambda) = \sum_{(x,y) \in S_p(i,j)} \sum_{(k,l) \in S_p(i,j)} \sum_{n=0}^{N_p} \sum_{q=0}^{N_p - n} C_p(a, b, c, d, \lambda, n) \, M_p^{n}(x, y) \, M_p^{q}(k, l) .   (2)

Sp (i, j) is the neighborhood of pixel (i, j), Np is the order of the polynomial
and Cp are the polynomial coefficients for the pattern p. The coefficient indices
(a, b, c, d) are equal to (x − i, y − j, k − i, l − j).
The product Mp (x, y) Mp (k, l) explicitly represents the correlation between
different pixels in the neighborhood. For efficiency, we have not used these cross-
terms in our implementations. We found that very good results are obtained
Fig. 5. The measured data Mp in the neighborhood Sp (i, j) around pixel (i, j) are
related to the high quality color values Hp (i + 0.5, j + 0.5, λ) via a polynomial with
coefficients Cp .
Fig. 6. The mapping function in (3) can be expressed as a linear system using matrices
and vectors. For a given pattern p and color λ, Ap is the measurement matrix, Cp (λ)
is the coefficient vector and Hp (λ) is the reconstruction vector.

even when each desired value is expressed as just a sum of polynomial functions
of the individual pixel measurements:
H_p(i + 0.5, j + 0.5, \lambda) = \sum_{(x,y) \in S_p(i,j)} \sum_{n=0}^{N_p} C_p(a, b, \lambda, n) \, M_p^{n}(x, y) .   (3)

The mapping function (3), for each color λ and each local pattern type p,
can be conveniently rewritten using matrices and vectors, as shown in Figure 6:

Hp (λ) = Ap Cp (λ) . (4)

For a given pattern type p and color λ, Ap is the measurement matrix. The
rows of Ap correspond to the different neighborhoods in the image that have the
pattern p. Each row includes all the relevant powers (up to Np ) of the measured
data Mp within the neighborhood. The vector Cp (λ) includes the coefficients
of the polynomial mapping function and the vector Hp (λ) includes the desired
high quality values at the off-grid neighborhood centers. The estimation of the
model parameters Cp can then be posed as a least squares problem:

C_p(\lambda) = \left( A_p^{T} A_p \right)^{-1} A_p^{T} H_p(\lambda) ,   (5)

When the signal-to-noise characteristics of the image detector are known, (5) can
be rewritten using weighted least squares to achieve greater accuracy [Aut01].
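As a concrete sketch of the regression in (5): assuming the neighborhoods of one pattern p have already been gathered into the rows of Ap (each row holding the u x v measurements and their powers up to Np) and Hp holds the corresponding high quality values, the coefficients follow from a single solve. The function name is illustrative.

function Cp = fit_structural_model(Ap, Hp)
% Least-squares estimate of the coefficient vector in (5):
% Cp = (Ap' * Ap)^(-1) * Ap' * Hp, written with a matrix solve.
Cp = (Ap' * Ap) \ (Ap' * Hp);

Numerically, Cp = Ap \ Hp (a QR-based least-squares solve) returns the same estimate and is usually better conditioned than forming Ap' * Ap explicitly.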
4.3 Total Number of Coefficients


The number of coefficients in the model (3) can be calculated as follows. Let
the neighborhood size be u × v, and the polynomial order corresponding to each
pattern p be Np . Let the number of distinct local patterns in the SVC image be
P and the number of color channels be Λ. Then, the total number of coefficients
needed for structural interpolation is:

|C| = \left( P + \sum_{p=1}^{P} u \, v \, N_p \right) \Lambda .   (6)

For the Bayer mosaic, P = 4 and Λ = 3 (R, G, B). If we use Np = 2 and
u = v = 6, the total number of coefficients is 876. Since these coefficients are
learned from real data, they yield greater precision during interpolation than
standard interpolation kernels. In addition, they are very efficient to apply. Since
there are P = 4 distinct patterns, only 219 (a quarter) of the coefficients are used
for computing the three color values at a pixel. Note that the polynomial model
is linear in the coefficients. Hence, structural interpolation can be implemented
in real-time using a set of linear filters that act on the captured image and its
powers (up to Np ).
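The filter-bank view can be sketched as follows for one color channel; the variable names (M for the captured SVC image, kernels{p, n+1} for the learned u x v coefficient kernel of pattern p and power n, pattern_of for the per-pixel pattern index) are assumptions of this sketch, and the half-pixel offset in (3) is ignored here.

% Sketch of structural interpolation as a set of linear filters acting on
% the captured image and its powers (one color channel shown).
Hrec = zeros(size(M));
for p = 1:P
    Fp = zeros(size(M));
    for n = 0:Np
        Fp = Fp + conv2(M.^n, kernels{p, n+1}, 'same');   % filter the n-th power of M
    end
    idx = (pattern_of == p);
    Hrec(idx) = Fp(idx);      % keep the response only where pattern p applies
end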

4.4 Experiments
A total of 30 high quality training images (see Figure 3) were used to compute
the structural model for SVC image interpolation. Each image is downgraded
to obtain a corresponding Bayer-type SVC image. For each of the four sampling
patterns in the Bayer mosaic, and for each of the three colors, the appropriate
image neighborhoods were used to compute the measurement matrix Ap and the
reconstruction vector Hp (λ). While computing these, one additional step was
taken; each measurement is normalized by the energy within its neighborhood
to make the structural model insensitive to changes in illumination intensity
and camera gain. The resulting Ap and Hp (λ) are used to find the coefficient
vector Cp (λ) using linear regression (see (5)). In our implementation, we used
the parameter values P = 4 (Bayer), Np = 2, u = v = 6 and Λ = 3 (R,G, B),
to get a total of 876 coefficients.
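The energy normalization mentioned above can be sketched as below; since the precise definition of neighborhood energy is not given here, the mean used in this sketch is only an assumption.

% Normalize one u x v block of measurements by its energy so the model is
% insensitive to overall intensity and gain changes (mean used as energy).
energy = mean(nbhd(:)) + eps;        % eps guards against all-black patches
nbhd_normalized = nbhd / energy;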
The above structural model was used to interpolate 20 test SVC images that
are different from the ones used for training. In Figure 7(a), a high quality (8-
bits per color channel) image is shown. Figure 7(b) shows the corresponding
(downgraded) SVC image. This is really a single channel 8-bit image and its
pixels are shown in color only to illustrate the Bayer pattern. Figure 7(c) shows
a color image computed from the SVC image using bi-cubic interpolation. As is
usually done, the three channels are interpolated separately using their respective
data in the SVC image. The magnified image region clearly shows that bi-cubic
interpolation results in a loss of high frequencies; the edges of the tree branches
and the squirrels are severely blurred. Figure 7(d) shows the result of applying
structural interpolation. Note that the result is of high quality with minimal loss
of details.
Fig. 7. (a) Original (high quality) color image with 8-bits per color channel. (b) SVC
image obtained by downgrading the original image. The pixels in this image are shown
in color only to illustrate the Bayer mosaic. Color image computed from the SVC
image using (c) bi-cubic interpolation and (d) structural interpolation. (e) Histograms
of luminance error (averaged over 20 test images). The RMS error is 6.12 gray levels
for bi-cubic interpolation and 3.27 gray levels for structural interpolation.
We have quantitatively verified our results. Figure 7(e) shows histograms of
the luminance error for bi-cubic and structural interpolation. These histograms
are computed using all 20 test images (not just the one in Figure 7). The
difference between the two histograms may appear to be small but
is significant because a large fraction of the pixels in the 20 images
belong to “flat” image regions that are easy to interpolate for both
methods. The RMS errors (computed over all 20 images) are 6.12 and 3.27 gray
levels for bi-cubic and structural interpolation, respectively.

5 Spatially Varying Exposures (SVE)


In [NM00], it was shown that the dynamic range of a gray-scale image detector
can be significantly enhanced by assigning different exposures (neutral density
filters) to pixels, as shown in Figure 8. This is yet another example of a multi-
sampled image and is referred to as a spatially varying exposure (SVE) image. In
[NM00], standard bi-cubic interpolation was used to reconstruct a high dynamic
range gray-scale image from the captured SVE image; first, saturated and dark
pixels are eliminated, then all remaining measurements are normalized by their
exposure values, and finally bi-cubic interpolation is used to find the brightness
values at the saturated and dark pixels. As expected, the resulting image has
enhanced dynamic range but lower spatial resolution. In this section, we apply
structural interpolation to SVE images and show how it outperforms bi-cubic
interpolation.
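For reference, the [NM00]-style bi-cubic baseline described above can be sketched as follows. The saturation and darkness thresholds, the variable names (sve for the captured 8-bit image, expo for the per-pixel exposure map) and the use of griddata for the scattered interpolation are assumptions of this sketch, not the exact implementation of [NM00].

% Sketch of the bi-cubic SVE baseline: discard saturated and dark pixels,
% normalize the rest by their exposure, then interpolate the holes.
valid = sve > 5 & sve < 250;                       % thresholds are illustrative
radiance = double(sve) ./ expo;                    % normalize by exposure
[X, Y] = meshgrid(1:size(sve, 2), 1:size(sve, 1));
hdr = griddata(X(valid), Y(valid), radiance(valid), X, Y, 'cubic');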
Fig. 8. The dynamic range of an image detector can be improved by assigning different
exposures to pixels. In this special case of 4 exposures, any 6 × 6 neighborhood in the
image must belong to one of four possible sampling patterns shown as p1 . . . p4.

5.1 SVE Structural Model


As in the SVC case, let the measured SVE data be M and the corresponding
high dynamic range data be H. If the SVE detector uses only four discrete
exposures (see Figure 8), it is easy to see that a neighborhood of any given size
can have only one of four different sampling patterns (P = 4). Therefore, for
each sampling pattern p, a polynomial structural model is used that relates the
captured data Mp within the neighborhood to the high dynamic range value
Hp at the center of the neighborhood:
H_p(i + 0.5, j + 0.5) = \sum_{(x,y) \in S_p(i,j)} \sum_{n=0}^{N_p} C_p(a, b, n) \, M_p^{n}(x, y) ,   (7)
where, as before, (a, b) = (x − i, y − j), Sp (i, j) is the neighborhood of pixel
(i, j), Np is the order of the polynomial mapping, and Cp are the polynomial
coefficients for the pattern p. Note that there is only one channel in this case
(gray-scale) and hence the parameter λ is omitted. The above model is rewritten
in terms of a measurement matrix Ap and a reconstruction vector Hp , and
the coefficients Cp are found using (5). The number of coefficients in the SVE
structural model is determined as:
|C| = P + \sum_{p=1}^{P} u \, v \, N_p .   (8)

In our implementation, we have used P = 4, Np = 2 and u = v = 6, which
gives a total of 292 coefficients. Since P = 4, only 73 coefficients are needed for
reconstructing each pixel in the image.

5.2 Experiments

The SVE structural model was trained using 12-bit gray-scale versions of 6 of
the images shown in Figure 3 and their corresponding 8-bit SVE images. Each
SVE image was obtained by applying the exposure pattern shown in Figure 8
(with e4 = 4e3 = 16e2 = 64e1) to the original image, followed by a downgrade
from 12 bits to 8 bits. The structural model was tested using 6 test images, one
of which is shown in Figure 9. Figure 9(a) shows the original 12-bit image, Figure
9(b) shows the downgraded 8-bit SVE image, Figure 9(c) shows a 12-bit image
obtained by bi-cubic interpolation of the SVE image, and Figure 9(d) shows the
12-bit image obtained by structural interpolation. The magnified images shown
on the right are histogram equalized to bring out the details (in the clouds and
walls) that are lost during bi-cubic interpolation but extracted by structural
interpolation. Figure 9(e) compares the error histograms (computed using all
6 test images) for the two cases. The RMS errors were found to be 33.4 and
25.5 gray levels (in a 12-bit range) for bi-cubic and structural interpolations,
respectively. Note that even though a very small number (6) of images were
used for training, our method outperforms bi-cubic interpolation.

6 Spatially Varying Exposure and Color (SVEC)


Since we are able to extract high spatial and spectral resolution from SVC images
and high spatial and brightness resolution from SVE images, it is natural to
explore how these two types of multi-sampling can be combined into one. The
result is the simultaneous sampling of space, color and exposure (see Figure 10).
We refer to an image obtained in this manner as a spatially varying exposure
and color (SVEC) image. If the SVEC image has 8-bits at each pixel, we would
like to compute at each pixel three color values, each with 12 bits of precision.
Since the same number of pixels on a detector are now being used to sample
three different dimensions, it should be obvious that this is a truly challenging
interpolation problem. We will see that structural interpolation does remarkably
well at extracting the desired information.
Fig. 9. (a) Original 12-bit gray scale image. (b) 8-bit SVE image. (c) 12-bit (high
dynamic range) image computed using bi-cubic interpolation. (d) 12-bit (high dynamic
range) image computed using structural models. (e) Error histograms for the two cases
(averaged over 6 test images). The RMS errors for the 6 images are 33.4 and 25.5 gray
levels (on a 12-bit scale) for bi-cubic and structural interpolation, respectively. The
magnified image regions on the right are histogram equalized.
Fig. 10. A combined mosaic of 3 spectral and 4 neutral density filters used to simulta-
neously sample space, color and exposures using a single image detector. The captured
8-bit SVEC image can be used to compute a 12-bit (per channel) color image by
structural interpolation. For this mosaic, for any neighborhood size, there are only 16
distinct sampling patterns. For a 4 × 4 neighborhood size, the patterns are p1 . . . p16.

Color and exposure filters can be used to construct an SVEC sensor in many
ways. All possible mosaics that can be constructed from Λ colors and E exposures
are derived in [Aut01]. Here, we will focus on the mosaic shown in Figure 10,
where 3 colors and 4 exposures are used. For any given neighborhood size, it is
shown in [Aut01] that only 16 different sampling patterns exist (see Figure 10).

6.1 SVEC Structural Model


The polynomial structural model used in the SVEC case is the same as the one
used for SVC, and is given by (3). The only caveat is that in the SVEC case we
need to ensure that the neighborhood size used is large enough to adequately
sample all the colors and exposures. That is, the neighborhood size is chosen
such that it includes all colors and all exposures of each color.
The total number of polynomial coefficients needed is computed the same
way as in the SVC case and is given by (6). In our experiments, we have used
the mosaic shown in Figure 10. Therefore, P = 16, Λ = 3 (R, G, and B), Np = 2
for each of the 16 patterns, and u = v = 6, to give a total of 3504 coefficients.
Once again, at each pixel, for each color, only 3504/48 = 73 coefficients are used.
Therefore, even for this complex type of multi-sampling, our structural models
can be applied to images in real-time using a set of linear filters.
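A quick arithmetic check of these counts, as a sketch (variable names are illustrative):

% Coefficient count of (6) for the SVEC mosaic of Figure 10.
P = 16; Lambda = 3; Np = 2; u = 6; v = 6;
total_coeffs = (P + P*u*v*Np) * Lambda              % = 3504
per_pixel_per_color = total_coeffs / (P * Lambda)   % = 3504/48 = 73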

6.2 Experiments
The SVEC structural model was trained using 6 of the images in Figure 3. In
this case, the 12-bit color images in the training set were downgraded to 8-bit
SVEC images. The original and SVEC images were used to compute the 3504
coefficients of the model. The model was then used to map 10 different test
SVEC images to 12-bit color images. One of these results is shown in Figure
11. The original 12-bit image shown in Figure 11(a) was downgraded to obtain
the 8-bit SVEC image shown in Figure 11(b). This image has a single channel
Fig. 11. (a) Original 12-bit color image. (b) 8-bit SVEC Image. 12-bit color images
reconstructed using (c) bi-cubic interpolation and (d) structural interpolation. (e) Lu-
minance error histogram computed using 10 test images. RMS luminance errors were
found to be 118 and 80 (on a 12-bit scale) for bi-cubic and structural interpolation,
respectively. The magnified images on the right are histogram equalized.
and is shown here in color only to illustrate the effect of simultaneous color and
exposure sampling. Figures 11(c) and (d) show the results of applying bi-cubic
and structural interpolation, respectively. It is evident from the magnified images
on the right that structural interpolation yields greater spectral and spatial
resolution. The two interpolation techniques are compared in Figure 11(e) which
shows error histograms computed using all 10 test images. The RMS luminance
errors were found to be 118 gray-levels and 80 gray-levels (on a 12 bit scale) for
bi-cubic and structural interpolations, respectively.
Acknowledgements: This research was supported in part by an ONR/ARPA
MURI Grant (N00014-95-1-0601) and an NSF ITR Award (IIS-00-85864).

References
[Aut01] Authors. Resolution Enhancement Along Multiple Imaging Dimensions Using
Structural Models. Technical Report, Authors’ Institution, 2001.
[Bay76] B. E. Bayer. Color imaging array. U.S. Patent 3,971,065, July 1976.
[BE00] M. Ben-Ezra. Segmentation with Invisible Keying Signal. In CVPR, 2000.
[BK00] S. Baker and T. Kanade. Limits on Super-Resolution and How to Break
Them. In Proc. of CVPR, pages II:372–379, 2000.
[Bra65] R. N. Bracewell. The Fourier Transform and Its Applications. McGraw Hill,
1965.
[Bra94] D. Brainard. Bayesian method for reconstructing color images from trichro-
matic samples. In Proc. of IS&T 47th Annual Meeting, pages 375–380,
Rochester, New York, 1994.
[Dil77] P. L. P. Dillon. Color imaging array. U.S. Patent 4,047,203, September 1977.
[Dil78] P. L. P. Dillon. Fabrication and performance of color filter arrays for solid-
state imagers. In IEEE Trans. on Electron Devices, volume ED-25, pages
97–101, 1978.
[DM97] P. Debevec and J. Malik. Recovering High Dynamic Range Radiance Maps
from Photographs. Proc. of ACM SIGGRAPH 1997, pages 369–378, 1997.
[FP99] W. Freeman and E. Pasztor. Learning Low-Level Vision. In Proc. of ICCV,
pages 1182–1189, 1999.
[GHZ92] R. Ginosar, O. Hilsenrath, and Y. Zeevi. Wide dynamic range camera. U.S.
Patent 5,144,442, September 1992.
[KM85] K. Knop and R. Morf. A new class of mosaic color encoding patterns for
single-chip cameras. In IEEE Trans. on Electron Devices, volume ED-32,
1985.
[MN99] T. Mitsunaga and S. K. Nayar. Radiometric Self Calibration. In Proc. of
CVPR, pages I:374–380, 1999.
[MOS83] D. Manabe, T. Ohta, and Y. Shimidzu. Color filter array for IC image sensor.
In Proc. of IEEE Custom Integrated Circuits Conference, pages 451–455, 1983.
[MP95] S. Mann and R. Picard. Being ‘Undigital’ with Digital Cameras: Extending
Dynamic Range by Combining Differently Exposed Pictures. Proc. of IS&T's
48th Annual Conf., pages 442–448, May 1995.
[NM00] S. K. Nayar and T. Mitsunaga. High Dynamic Range Imaging: Spatially
Varying Pixel Exposures. In Proc. of CVPR, pages I:472–479, 2000.
[Par85] K. A. Parulski. Color filters and processing alternatives for one-chip cameras.
In IEEE Trans. on Electron Devices, volume ED-32, 1985.
[SN01] Y. Y. Schechner and S. K. Nayar. Generalized mosaicing. In ICCV, 2001.
[WS95] M. A. Wober and R. Soini. Method and apparatus for recovering image data
through the use of a color test pattern. U.S. Patent 5,475,769, Dec. 1995.
Code A: Matlab Code for Poisson Image Reconstruction from Image Gradients
% Read Input Gray Image
imgstr = ’test.png’; disp(sprintf(’Reading Image %s’,imgstr));
img = imread(imgstr); [H,W,C] = size(img); img = double(img);
% Find gradients using forward differences
gx = zeros(H,W); gy = zeros(H,W); j = 1:H-1; k = 1:W-1;
gx(j,k) = (img(j,k+1) - img(j,k)); gy(j,k) = (img(j+1,k) - img(j,k));

% Reconstruct image from gradients for verification


img_rec = poisson_solver_function(gx,gy,img);

figure;imagesc(img);colormap gray;colorbar;title(’Image’)
figure;imagesc(img_rec);colormap gray;colorbar;title(’Reconstructed’);
figure;imagesc(abs(img_rec-img));colormap gray;colorbar;title(’Abs error’);

function [img_direct] = poisson_solver_function(gx,gy,boundary_image);


% function [img_direct] = poisson_solver_function(gx,gy,boundary_image)
% Inputs: Gx and Gy -> Gradients
% Boundary Image -> Boundary image intensities
% Gx Gy and boundary image should be of same size
[H,W] = size(boundary_image);
gxx = zeros(H,W); gyy = zeros(H,W);
f = zeros(H,W); j = 1:H-1; k = 1:W-1;

% Laplacian
gyy(j+1,k) = gy(j+1,k) - gy(j,k); gxx(j,k+1) = gx(j,k+1) - gx(j,k);
f = gxx + gyy; clear j k gxx gyy

% boundary image contains image intensities at boundaries


boundary_image(2:end-1,2:end-1) = 0;
disp(’Solving Poisson Equation Using DST’); tic
j = 2:H-1; k = 2:W-1; f_bp = zeros(H,W);
f_bp(j,k) = -4*boundary_image(j,k) + boundary_image(j,k+1) + ...
    boundary_image(j,k-1) + boundary_image(j-1,k) + boundary_image(j+1,k);
clear j k

f1 = f - reshape(f_bp,H,W);% subtract boundary points contribution


clear f_bp f

% DST Sine Transform algo starts here


f2 = f1(2:end-1,2:end-1); clear f1
%compute sine transform
tt = dst(f2); f2sin = dst(tt’)’; clear f2

%compute Eigen Values


[x,y] = meshgrid(1:W-2,1:H-2); denom = (2*cos(pi*x/(W-1))-2) + (2*cos(pi*y/(H-1)) - 2) ;

%divide
f3 = f2sin./denom; clear f2sin x y

%compute Inverse Sine Transform


tt = idst(f3); clear f3; img_tt = idst(tt’)’; clear tt

time_used = toc; disp(sprintf(’Time for Poisson Reconstruction = %f secs’,time_used));

% put solution in inner points; outer points obtained from boundary image
img_direct = boundary_image;
img_direct(2:end-1,2:end-1) = 0;
img_direct(2:end-1,2:end-1) = img_tt;
function b=dst(a,n)
%DST Discrete sine transform (Used in Poisson reconstruction)
% Y = DST(X) returns the discrete sine transform of X.
% The vector Y is the same size as X and contains the
% discrete sine transform coefficients.
% Y = DST(X,N) pads or truncates the vector X to length N
% before transforming.
% If X is a matrix, the DST operation is applied to each
% column. This transform can be inverted using IDST.

error(nargchk(1,2,nargin));

if min(size(a))==1
if size(a,2)>1
do_trans = 1;
else
do_trans = 0;
end
a = a(:);
else
do_trans = 0;
end
if nargin==1, n = size(a,1); end
m = size(a,2);

% Pad or truncate a if necessary


if size(a,1)<n,
aa = zeros(n,m); aa(1:size(a,1),:) = a;
else
aa = a(1:n,:);
end

y=zeros(2*(n+1),m); y(2:n+1,:)=aa; y(n+3:2*(n+1),:)=-flipud(aa);


yy=fft(y); b=yy(2:n+1,:)/(-2*sqrt(-1));

if isreal(a), b = real(b); end


if do_trans, b = b.’; end

function b=idst(a,n)
%IDST Inverse discrete sine transform (Used in Poisson reconstruction)
%
% X = IDST(Y) inverts the DST transform, returning the
% original vector if Y was obtained using Y = DST(X).
% X = IDST(Y,N) pads or truncates the vector Y to length N
% before transforming.
% If Y is a matrix, the IDST operation is applied to
% each column.

if nargin==1
if min(size(a))==1
n=length(a);
else
n=size(a,1);
end
end

nn=n+1; b=2/nn*dst(a,n);
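The same solver can be used for simple gradient-domain editing by modifying the gradient fields before reconstruction. The sketch below, with an arbitrary attenuation rule in the spirit of the gradient-domain papers listed in the bibliography, assumes a gray-scale test.png as in the driver above; it is an illustrative use of poisson_solver_function, not part of the original code.

% Gradient-domain editing example: attenuate strong gradients, then reconstruct.
img = double(imread('test.png')); [H,W] = size(img);
gx = zeros(H,W); gy = zeros(H,W); j = 1:H-1; k = 1:W-1;
gx(j,k) = img(j,k+1) - img(j,k); gy(j,k) = img(j+1,k) - img(j,k);
gx = gx ./ (1 + 0.05*abs(gx));   % compress strong horizontal gradients
gy = gy ./ (1 + 0.05*abs(gy));   % compress strong vertical gradients
img_edit = poisson_solver_function(gx, gy, img);
figure;imagesc(img_edit);colormap gray;colorbar;title('Gradient-domain edit');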
Code B: Matlab Code for Graph Cuts on Images
% Read gray scale image
I = imread(’test.png’); [H,W,C] = size(I);

% Find graph cut


Ncut = graphcuts(I,33,255);

figure;imagesc(I);colormap gray;title(’Image’);
figure;imagesc(Ncut);colormap gray;title(’Segmentation’);

function [Ncut] = graphcuts(I,pad,MAXVAL)


% function [Ncut] = graphcuts(I,pad,MAXVAL)
% Input:  I      -> gray-scale image
%         pad    -> spatial connectivity (half window size), e.g. 3
%         MAXVAL -> maximum image value, e.g. 255 for 8-bit images
% Output: Ncut   -> binary map (0 or 1) giving the image segmentation

I = double(I); [H,W] = size(I);

% Find weights between nodes I1 and I2, w = exp(a*abs(I1-I2));


% Set a to have a weight of 0.01 for diff = MAXVAL
a = log(0.01)/MAXVAL; x = [0:MAXVAL/100:MAXVAL]’; y = exp(a*x);
figure;plot(x,y);xlabel(’intensity diff’);ylabel(’weights’); title(’weights’)

ws = 2*pad + 1;
if (ws <= 3), ws = 3; end

%Build the weight matrix


disp(’Building Weight Matrix’); close all; tic

WM = zeros(H*W,H*W); countWM = 0;
for kk = 1:W
for jj = 1:H
mask = false(H,W);
cs = kk-pad; ce = kk+pad; rs = jj-pad; re = jj+pad;
if (cs<1), cs = 1; end
if (ce>W), ce = W; end
if (rs<1), rs = 1; end
if (re>H), re = H; end
mask(rs:re,cs:ce) = 1;
idx = find(mask==1);
p = abs(I(idx) - I(jj,kk)); p = exp(a*p);
countWM = countWM + 1; WM(countWM,idx) = p(:)’;
end
end
ttime = toc; disp(sprintf(’Time for generating weight matrix = %f’,ttime)); clear countWM

% Weight between a node and itself is 0


for jj = 1:H*W, WM(jj,jj) = 0; end; WM = sparse(WM);

% Shi and Malik Algorithm: second smallest eigen vector


disp(’Finding Eigen Vector’);
d = sum(WM,2); D = diag(d); tic
B = (D-WM); B = (B+B’)/2; OPTS.disp = 0;
[v,d,flag] = eigs(B,D,2,’SA’,OPTS); ttime = toc;
disp(sprintf(’Time for finding eigen vector = %f’,ttime)); clear OPTS
y = v(:,2);
Ncut = reshape(y,H,W);
Ncut = Ncut > 0;
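A practical note on the listing above: the weight matrix is built densely as (H*W) x (H*W) before being made sparse, so memory grows quickly with image size. Segmenting a downsampled copy first keeps the demo tractable; the resize factor below is an illustrative assumption (imresize requires the Image Processing Toolbox).

% Run the segmentation on a downsampled copy to limit the weight matrix size.
I = imread('test.png');
Ismall = imresize(I, 0.25);
Ncut = graphcuts(Ismall, 3, 255);
figure;imagesc(Ncut);colormap gray;title('Segmentation (downsampled input)');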
Code C: Matlab Code for Bilateral Filtering on Images

function [img1] = bilateral_filtering(img,winsize,sigma)

% Bilateral Filtering(img,winsize,sigma)
% Input -> Image img
% -> winsize: spatial filter width
% -> sigma for the intensity-difference (range) weighting
% -> sigma for spatial filter = winsize/6
% Output -> Filtered Image
% Author: Amit Agrawal, 2004

disp(’Bilateral Filtering’);

[H,W] = size(img);

%Gaussian spatial filter


g_filter = fspecial(’gaussian’,winsize,winsize/6);
padnum = (winsize-1)/2;

A = padarray(img,[padnum padnum],’replicate’,’both’);
img1 = zeros(size(img));

for jj = padnum+1:(padnum+1+H-1)
for kk = padnum+1:(padnum+1+W-1)

% Get a local neighborhood


imgwin = A(jj-padnum:jj+padnum,kk-padnum:kk+padnum);

% Find weights according to intensity diffs


Wwin = exp(-abs(imgwin - imgwin(padnum+1,padnum+1))/sigma^2);

% Find composite filter


newW = Wwin.*g_filter;

t = sum(sum(newW));
if(t>0)
newW = newW/t;
end

img1(jj-padnum,kk-padnum) = sum(sum(imgwin.*newW));
end
end
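A minimal usage example for the routine above; the window size and sigma values are illustrative only.

% Example call to bilateral_filtering on a gray-scale image.
img = double(imread('test.png'));
img_smooth = bilateral_filtering(img, 9, 20);
figure;imagesc(img);colormap gray;title('Input');
figure;imagesc(img_smooth);colormap gray;title('Bilateral filtered');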
Bibliography

Fusion of Images Taken by Varying Camera and Scene Parameters


General

AKERS, D., LOSASSO, F., KLINGNER, J., AGRAWALA, M., RICK, J., AND HANRAHAN, P. 2003.
Conveying shape and features with image-based relighting. In IEEE Visualization, 349–354.

BURT, P., AND KOLCZYNSKI, R. 1993. Enhanced image capture through fusion. In International Conference on
Computer Vision (ICCV 93), 173–182.

LEVIN, A., ZOMET, A., PELEG, S., AND WEISS, Y. 2004. Seamless image stitching in the gradient domain. In
European Conference on Computer Vision (ECCV 04).

MASSEY, M., AND BENDER, W. 1996. Salient stills: Process and practice. IBM Systems Journal 35, 3&4, 557–
574.

MUTTER, S., AND KRAUSE, M. 1992. Surrational Images: Photomontages. University of Illinois Press.

ROBINSON, H. P. 1869. Pictorial Effect in Photography: Being Hints on Composition and Chiaroscuro for
Photographers. Piper & Carter.

MUYBRIDGE, E. 1955. The human figure in motion. Dover Publications, Inc.

AGARWALA, A., DONTCHEVA, M., AGRAWALA, M., DRUCKER, S., COLBURN, A., CURLESS, B., SALESIN, D.,
AND COHEN, M. 2004. Interactive digital photomontage. ACM Transactions on Graphics (Proceedings of
SIGGRAPH 2004).

Time

BRAUN, M. 1992. Picturing Time: The Work of Etienne-Jules Marey. The University of Chicago Press.

FREEMAN, W. T., AND ZHANG, H. 2003. Shape-time photography. In Conference on Computer Vision and
Pattern Recognition (CVPR 03), 151–157.

Exposure

FATTAL, R., LISCHINSKI, D., AND WERMAN, M. 2002. Gradient domain high dynamic range compression.
ACM Transactions on Graphics 21, 3, 249–256.

REINHARD, E., STARK, M., SHIRLEY, P., AND FERWERDA, J. 2002. Photographic tone reproduction for
digital images. ACM Transactions on Graphics 21, 3, 267– 276.

DEBEVEC, AND MALIK. 1997. Recovering high dynamic range radiance maps from photographs. In Proc.
SIGGRAPH.

DURAND, AND DORSEY. 2002. Fast bilateral filtering for the display of high-dynamic-range images. ACM
Trans. on Graphics 21, 3.

MANN, AND PICARD. 1995. Being 'undigital' with digital cameras: Extending dynamic range by combining
differently exposed pictures. In Proc. IS&T's 48th annual conference.

TUMBLIN, AND TURK. 1999. LCIS: A boundary hierarchy for detail-preserving contrast reduction. In Proc.
SIGGRAPH.

DICARLO, J., AND WANDELL, B. 2000. Rendering high dynamic range images. Proc. SPIE: Image Sensors
3965, 392–401.

Focus

HAEBERLI, P. 1994. Grafica Obscura web site. http://www.sgi.com/grafica/.

MORGAN MCGUIRE, MATUSIK, PFISTER, HUGHES, AND DURAND, Defocus Video Matting, ACM
Transactions on Graphics, Vol 24, No 3, July 2005, (Proceedings of ACM SIGGRAPH 2005).

Passive Illumination

RASKAR, R., ILIE, A., AND YU, J. 2004. Image fusion for context enhancement and video surrealism. In NPAR
2004: Third International Symposium on Non- Photorealistic Rendering.

WEISS, Y. 2001. Deriving intrinsic images from image sequences. In International Conference On Computer Vision
(ICCV 01), 68–75.

Polarization

Y. Y. SCHECHNER, S. G. NARASIMHAN and S. K. NAYAR, Instant Dehazing of Images Using Polarization,
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, December 2001.

S. K. NAYAR, X. FANG, and T. E. BOULT, Removal of Specularities using Color and Polarization, Proceedings
of IEEE Conference on Computer Vision and Pattern Recognition,

Wavelength

D. A. SOCOLINSKY, "Dynamic range constraints in image fusion and realization." Proc. IASTED Int. Conf.
Signal and Image Processing, 349-354 (2000).

Y. Y. SCHECHNER and S. K. NAYAR, Uncontrolled Modulation Imaging, Proceedings of IEEE Conference on
Computer Vision and Pattern Recognition, Washington DC, June 2004.

Location

B. Wilburn, N. Joshi, V. Vaish, E. Talvala, E. Antunez, A. Barth, A. Adams, M. Horowitz, M. Levoy, High-
Performance Imaging Using Large Camera Arrays. ACM Transactions on Graphics, Vol 24, No 3, July 2005, pp
765-776 (Proceedings of ACM SIGGRAPH 2005).

Matting

CHUANG, Y.-Y., CURLESS, B., SALESIN, D., AND SZELISKI, R. 2001. A Bayesian approach to digital
matting. In Proceedings of Computer Vision and Pattern Recognition (CVPR 2001), vol. II, 264 – 271.

PORTER, T., AND DUFF, T. 1984. Compositing digital images. In Computer Graphics (Proceedings of ACM
SIGGRAPH 84), vol. 18, 253–259.

SMITH, A. R., AND BLINN, J. F. 1996. Blue screen matting. In Proceedings of ACM SIGGRAPH 96, 259–268.
Jian SUN, Jiaya JIA, Chi-Keung TANG and Heung-Yeung SHUM, Poisson Matting, ACM Transactions on
Graphics, also in SIGGRAPH 2004, vol. 23, no. 3, July 2004, pages 315-321.

Techniques
General

DANIELSSON, P.-E. 1980. Euclidean distance mapping. Computer Graphics and Image Processing 14, 227–248.

LUCAS, B. D., AND KANADE, T. 1981. An iterative image registration technique with an application to stereo
vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI ’81), 674–679.

MORTENSEN, E. N., AND BARRETT, W. A. 1995. Intelligent scissors for image composition. In Proceedings of
SIGGRAPH 95, Computer Graphics Proceedings, Annual Conference Series, 191–198.

Graph Cuts

BOYKOV, Y., VEKSLER, O., AND ZABIH, R. 2001. Fast approximate energy minimization via graph cuts. IEEE
Transactions on Pattern Analysis and Machine Intelligence 23, 11, 1222–1239.

KWATRA, V., SCHÖDL, A., ESSA, I., TURK, G., AND BOBICK, A. 2003. Graphcut textures: Image and video
synthesis using graph cuts. ACM Transactions on Graphics 22, 3, 277–286.

SHI, J., AND MALIK, J. Normalized Cuts and Image Segmentation. IEEE Conf. Computer Vision and Pattern
Recognition (CVPR), June 1997, Puerto Rico

Gradient Domain

PEREZ, P., GANGNET, M., AND BLAKE, A. 2003. Poisson image editing. ACM Transactions on Graphics 22, 3,
313–318.

Smoothing, Bilateral and Trilateral Filter

C. TOMASI, AND R. MANDUCHI, Bilateral Filtering of gray and colored images, Proc. IEEE Intl. Conference on
Computer Vision, pp. 836-846, 1998.

CHOUDHURY, P., TUMBLIN, J., "The Trilateral Filter for High Contrast Images and Meshes", Proc. of the
Eurographics Symposium on Rendering, Per. H. Christensen and Daniel Cohen eds., pp. 186-196, 2003

Feature Extraction
Shape/Material/Illumination, Surface normals

BASRI, R. JACOBS, D. Photometric stereo with general, unknown lighting, Computer Vision and Pattern
Recognition, 2001

B. K. P. HORN, "Shape from shading: A method for obtaining the shape of a smooth opaque object from one view,"
MIT Project MAC Int. Rep. TR-79 and MIT AI Lab, Tech. Rep. 232, Nov. 1970.
TODD ZICKLER, PETER N. BELHUMEUR, AND DAVID J. KRIEGMAN, "Helmholtz Stereopsis: Exploiting
Reciprocity for Surface Reconstruction." Proc. 7th European Conference on Computer Vision, May 2002. Vol. III,
pp 869-884.

ANDREAS WENGER, A GARDNER, CHRIS TCHOU, J UNGER, T HAWKINS, P DEBEVEC, Postproduction
Relighting and Reflectance Transformation With Time-Multiplexed Illumination, SIGGRAPH 2005

Depth edges

Ramesh RASKAR , Karhan TAN, Rogerio FERIS, Jingyi YU, Matthew TURK, Non-photorealistic Camera: Depth
Edge Detection and Stylized Rendering Using a Multi-Flash Camera, SIGGRAPH 2004

Depth

S. K. NAYAR and Y. NAKAGAWA,, Shape from Focus, IEEE Transactions on Pattern Analysis and Machine
Intelligence, Vol. 16, No. 8, pp. 824-831,

Transfer and denoising


Flash to no-flash

Elmar EISEMANN and Frédo DURAND, Flash Photography Enhancement Via Intrinsic Relighting, SIGGRAPH
2004

Georg PETSCHNIGG, Maneesh AGRAWALA, Hugues HOPPE, Richard SZELISKI, Michael COHEN, Kentaro
TOYAMA. Digital Photography with Flash and No-Flash Image Pairs.ACM Transactions on Graphics (Proceedings
of SIGGRAPH 2004), 2004.

DICARLO, J. M., XIAO, F., AND WANDELL, B. A. 2001. Illuminating illumination. In 9th Color Imaging
Conference, 27–34.

Noise

P. MEER, J. JOLION, AND A. ROSENFELD, "A Fast Parallel Algorithm For Blind Estimation of Noise Variance,"
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, no. 2, pp. 216-223, 1990.

Eric P. BENNETT and Leonard McMILLAN "Video Enhancement Using Per-Pixel Virtual Exposures"
SIGGRAPH 2005

Geometric Operations
Panorama

DAVIS, J. 1998. Mosaics of scenes with moving objects. In Computer Vision and Pattern Recognition (CVPR 98),
354–360.

UYTTENDAELE, M., EDEN, A., AND SZELISKI, R. 2001. Eliminating ghosting and exposure artifacts in image
mosaics. In Conference on Computer Vision and Pattern
Recognition (CVPR 01), 509–516.

SZELISKI, R., AND SHUM, H.-Y. 1997. Creating full view panoramic mosaics and environment maps. In
Proceedings of SIGGRAPH 97, Computer Graphics Proceedings,
Annual Conference Series, 251–258.
Synthetic Aperture

Marc LEVOY, Billy CHEN, Vaibhav VAISH, Mark HOROWITZ, Ian MCDOWALL, Mark BOLAS,
Synthetic Aperture Confocal Imaging. ACM SIGGRAPH 2004.

REN NG, Fourier Slice Photography, SIGGRAPH 2005

A. STERN and B. JAVIDI, "3-D computational synthetic aperture integral imaging (COMPSAII)," Opt. Express 11,
2446-2451 (2003),

C. OLIVER and S. QUEGAN, Understanding Synthetic Aperture Radar Images. London: Artech House, 1998.

Deblurring and Superresolution

M. BEN-EZRA AND S. K. NAYAR , Motion Deblurring using Hybrid Imaging, In Proc. IEEE Computer Vision
and Pattern Recognition (CVPR), Wisconsin, June 2003.

ZHOUCHEN LIN, HEUNG-YEUNG SHUM, Fundamental Limits of Reconstruction-Based Superresolution
Algorithms under Local Translation, PAMI, January 2004, Vol. 26, No. 1, pp. 83-9

O. LANDOLT, A. MITROS, AND C. KOCH, “Visual Sensor with Resolution Enhancement by Mechanical
Vibrations,” Proc. 2001 Conf. Advanced Research in VLSI, pp. 249-264, 2001.

Smart, Unconventional Cameras

MEMS Technology

S. K. NAYAR, V. BRANZOI, AND T. BOULT. Programmable Imaging using a Digital Micromirror Array,
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, Washington DC, June 2004.

High Speed Imaging

B. WANDELL, P. CATRYSSE, J. DICARLO, D. YANG AND A. EL GAMAL Multiple Capture Single Image
Architecture with a CMOS Sensor , In Proceedings of the International Symposium on Multispectral Imaging and
Color Reproduction for Digital Archives, pp. 11-17, Chiba, Japan, October 21-22 1999. (Society of Multispectral
Imaging of Japan.)

S. KLEINFELDER, S.H. LIM, X.Q. LIU AND A. EL GAMAL A 10,000 Frames/s CMOS Digital Pixel Sensor, In
IEEE Journal of Solid State Circuits, Vol.36, No.12, Pages 2049-2059, December 2001

X.Q. LIU AND ABBAS EL GAMAL, Synthesis of High Dynamic Range Motion Blur Free Image From Multiple
Captures, In IEEE Transactions on circuits and systems (TCASI), VOL. 50, NO. 4, pp 530-539, APRIL 2003

Ramesh RASKAR, Amit AGRAWAL, Jack TUMBLIN, Coded Exposure Photography: Motion Deblurring using
Fluttered Shutter, ACM SIGGRAPH 2006.

Programmable SIMD

JOHANSSON, R., LINDGREN, L., MELANDER, J., AND MOLLER, B. 2003. A multi-resolution 100 gops 4
gpixels/s programmable cmos image sensor for machine vision. In Proceedings of the 2003 IEEE Workshop on
Charge-Coupled Devices and Advanced Image Sensors, IEEE.
Advanced, Programmable, Demodulating Cameras and Temporal Correlation

CANESTA Inc, 2004

PIXIM Inc, 2004

FOVEON Inc, 2004

JENOPTIK Inc, 2004

IVP Inc, Ranger Camera, 2004

F. XIAO, J. DICARLO, P. CATRYSSE AND B. WANDELL, Image Analysis using Modulated Light Sources, In
Proceedings of the SPIE Electronic Imaging '2001 conference, Vol. 4306, San Jose, CA, January 2001.

ANDO, S., K. NAKAMURA, AND T. SAKAGUCHI. Ultrafast Correlation Image Sensor: Concept, Design, and
Applications. In Proc. IEEE CCD/AIS Workshop. 1997. Bruges, Belgium: IEEE.

ANDO, S. AND A. KIMACHI. Time-Domain Correlation Image Sensor: First CMOS Realization of Demodulator
Pixels Array. in Proc. '99 IEEE CCD/AIS Workshop. 1999.