Computer Vision: Cse 576 Ali Farhadi


Computer Vision
CSE 576
Ali Farhadi

Many  slides  from  Steve  Seitz,  Larry  Zitnick,  Yang  Wang  


Course Information
• Time:
– Monday, Wednesday 1:30-2:50
• Location:
– MGH 238
• Contact:
– ali@cs.uw.edu, CSE 652
• TA:
– Dun-Yu Hsiao
– dyhsiao@cs.washington.edu
• Website:
– http://www.cs.washington.edu/education/courses/cse576/15sp/
What  does  it  mean  to  see?  
The car is in front of the pole
Image labels: Sky, Person, Road, White, Horse, Car, Shadow, 1m, Wheel
Computer Vision

• Low Level Vision
– Measurements
– Enhancements
– Region segmentation
– Features
• Mid Level Vision
– Reconstruction
– Depth
– Motion Estimation
• High Level Vision
– Category detection
– Activity recognition
– Deep understandings
Vision as Measurement Device

Real-time stereo on Mars


Physics-based Vision

Structure from Motion
Virtualized Reality


Slide  Credit:  Alyosha  Efros  
Measurement

Brightness

Slide Credit: Alyosha Efros

Measurement

Length

Müller-Lyer Illusion
http://www.michaelbach.de/ot/sze_muelue/index.html
Slide Credit: Alyosha Efros
Image Enhancement

Image Inpainting, M. Bertalmío et al.

http://www.iua.upf.es/~mbertalmio//restoration.html
Seam Carving

[Avidan & Shamir, SIGGRAPH 2007]

Traditional resizing

Content-aware resizing
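The content-aware resizing shown above can be sketched in a few lines. The following is a minimal illustration, not the published implementation: it assumes a grayscale image stored as a list of rows, uses a simple neighbor-difference gradient as the energy (one common choice), and finds the cheapest top-to-bottom seam by dynamic programming. All function names here are hypothetical.

```python
def energy(img):
    """Per-pixel energy: |horizontal difference| + |vertical difference|.
    Neighbors outside the image are clamped to the border (an assumption)."""
    h, w = len(img), len(img[0])
    e = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            left, right = img[y][max(x - 1, 0)], img[y][min(x + 1, w - 1)]
            up, down = img[max(y - 1, 0)][x], img[min(y + 1, h - 1)][x]
            e[y][x] = abs(right - left) + abs(down - up)
    return e


def find_vertical_seam(e):
    """Return one column index per row for the connected seam of least total energy."""
    h, w = len(e), len(e[0])
    cost = [row[:] for row in e]
    # Forward pass: the cheapest way to reach (y, x) comes from one of the
    # up-to-three pixels directly above it.
    for y in range(1, h):
        for x in range(w):
            lo, hi = max(x - 1, 0), min(x + 1, w - 1)
            cost[y][x] += min(cost[y - 1][lo:hi + 1])
    # Backward pass: start at the cheapest bottom pixel and walk upward.
    x = min(range(w), key=lambda i: cost[h - 1][i])
    seam = [x]
    for y in range(h - 1, 0, -1):
        lo, hi = max(x - 1, 0), min(x + 1, w - 1)
        x = min(range(lo, hi + 1), key=lambda i: cost[y - 1][i])
        seam.append(x)
    seam.reverse()
    return seam


def remove_vertical_seam(img, seam):
    """Delete one pixel per row, shrinking the width by one."""
    return [row[:x] + row[x + 1:] for row, x in zip(img, seam)]


def carve(img, n):
    """Remove n vertical seams, recomputing the energy after each removal."""
    for _ in range(n):
        img = remove_vertical_seam(img, find_vertical_seam(energy(img)))
    return img
```

Calling `carve(img, n)` narrows the image by `n` columns while tending to leave high-gradient regions intact, which is what separates it from uniform rescaling.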


Computer Vision

• Low Level Vision
– Measurements
– Enhancements
– Region segmentation
– Features
• Mid Level Vision
– Reconstruction
– Depth
– Motion Estimation
• High Level Vision
– Category detection
– Activity recognition
– Deep understandings

The car is in front of the pole
Reconstruction
Input Image (1 of 45), with three views of the reconstruction
Source: S. Seitz

Input Image (1 of 100), with views of the reconstruction
Yasutaka Furukawa and Jean Ponce, Carved Visual Hulls for Image-Based Modeling, ECCV 2006.
Google’s 3D Maps
Structure estimation from tourist photos
Apple’s 3D maps
Computer Vision

• Low Level Vision
– Measurements
– Enhancements
– Region segmentation
– Features
• Mid Level Vision
– Reconstruction
– Depth
– Motion Estimation
• High Level Vision
– Category detection
– Activity recognition
– Deep understandings
– Pose estimation

Image labels: Sky, Person, Road, Car, Horse
Visual Recognition?
• What does it mean to “see”?
– “What” is “where”, Marr 1982

• Get computers to “see”


Visual Recognition
Verification:
Is this a car?
Visual Recognition
Classification:
Is there a car in this picture?
Visual Recognition
Detection:
Where is the car in this picture?
Visual Recognition
Pose Estimation:
Visual Recognition
Activity Recognition:

What is he doing?


Visual Recognition
Object Categorization:

Sky

Person
Tree

Horse
Car

Person
Bicycle
Road
Visual Recognition
Segmentation

Sky

Tree

Car

Person
How  hard  is  computer  vision?  
“In 1966, Minsky hired a first-year
undergraduate student and assigned him
a problem to solve over the summer:
connect a television camera to a
computer and get the machine to
describe what it sees.”
Crevier 1993, pg. 88

Marvin Minsky, MIT
Turing award, 1969
Gerald Sussman, MIT
“You’ll notice that Sussman never worked in vision again!” – Berthold Horn
Why is vision so hard?
• Ill-posed problem

[Sinha and Adelson 1993]


Challenges 1: viewpoint variation

Michelangelo 1475-1564
slide by Fei-Fei, Fergus & Torralba

Challenges 2: illumination

slide credit: S. Ullman

Challenges 3: occlusion

Magritte, 1957
slide by Fei-Fei, Fergus & Torralba

Challenges 4: scale

slide by Fei-Fei, Fergus & Torralba

Challenges 5: deformation

Xu, Beihong 1943
slide by Fei-Fei, Fergus & Torralba

Challenges 6: background clutter

Klimt, 1913
slide by Fei-Fei, Fergus & Torralba

Challenges 7: object intra-class variation

slide by Fei-Fei, Fergus & Torralba

Challenges 8: local ambiguity

slide by Fei-Fei, Fergus & Torralba

Challenges 9: the world behind the image

Slide Credit: Alyosha Efros


What  Works  Today?  
•  Reading license plates, zip codes, checks

Svetlana Lazebnik
Biometrics

Fingerprint scanners on many new laptops, other devices

Face recognition systems now beginning to appear more widely
http://www.sensiblevision.com/

Source: S. Seitz


Mobile visual search: Google Goggles
Face detection

• Many new digital cameras now detect faces
– Canon, Sony, Fuji, …

Source: S. Seitz
Smile detection

Sony Cyber-shot® T70 Digital Still Camera
Source: S. Seitz

Face recognition: Apple iPhoto, Facebook, Google, etc.
Object recognition (in supermarkets)

LaneHawk by Evolution Robotics

“A smart camera is flush-mounted in the checkout lane, continuously watching
for items. When an item is detected and recognized, the cashier verifies the
quantity of items that were found under the basket, and continues to close the
transaction. The item can remain under the basket, and with LaneHawk, you
are assured to get paid for it…”
Safety
Security
Automotive safety

• Mobileye: Vision systems in high-end BMW, GM, Volvo models
– Pedestrian collision warning
– Forward collision warning
– Lane departure warning
– Headway monitoring and warning
Source: A. Shashua, S. Seitz
Google cars

Oct 9, 2010. "Google Cars Drive Themselves, in Traffic". The New York Times. John Markoff
June 24, 2011. "Nevada state law paves the way for driverless cars". Financial Post. Christine Dobby
Aug 9, 2011. "Human error blamed after Google's driverless car sparks five-vehicle crash". The Star (Toronto)
Vision-based interaction: Xbox Kinect
Kinect Fusion
Augmented reality, consumer products

http://nconnex.com/wp/
Special effects: shape and motion capture

Source: S. Seitz


Vision for robotics, space exploration

NASA's Mars Exploration Rover Spirit captured this westward view from atop
a low plateau where Spirit spent the closing months of 2007.

Vision systems (JPL) used for several tasks

• Panorama stitching
• 3D terrain modeling
• Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al.
Source: S. Seitz
Medical  imaging  

Image  guided  surgery  


3D  imaging  
Grimson  et  al.,  MIT  
MRI,  CT  
Computer vision in other scientific fields

Computer vision research in biology

http://www.vision.caltech.edu/visipedia/
http://leafsnap.com/
Computer vision in cosmology

http://astrometry.net/
Computer vision research in healthcare

assisted living, patient monitoring

autism screening
[Lan et al, PAMI 2012]
http://www.gatech.edu/newsroom/release.html?nid=60509
Computer vision in the real world
• Most examples are less than 5 years old
• Very active research area. Many new applications to come.
• A website of computer vision industries maintained by Prof. David Lowe (UBC):

http://www.cs.ubc.ca/~lowe/vision.html
Tentative Syllabus
• Image Processing (2 weeks)
– filtering, convolution
– image pyramids
– edge detection
– feature detection (corners, lines)
– Hough transform

• Image Transformation (2 weeks)
– image warping (parametric transformations, texture mapping)
– image compositing (alpha blending, color mosaics)
– segmentation and matting (snakes, scissors)

• Motion Estimation (1 week)
– optical flow
– image alignment
– image mosaics
– feature tracking
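To make the alpha blending item under image compositing concrete, here is a minimal sketch. It assumes grayscale images stored as lists of rows and a per-pixel matte `alpha` in [0, 1]; the function name is hypothetical. Each output pixel is `alpha * foreground + (1 - alpha) * background`.

```python
def alpha_blend(fg, bg, alpha):
    """Composite fg over bg using a per-pixel matte.

    fg, bg, alpha are same-sized lists of rows; alpha values lie in [0, 1],
    where 1 keeps the foreground pixel and 0 keeps the background pixel.
    """
    return [
        [a * f + (1.0 - a) * b for f, b, a in zip(f_row, b_row, a_row)]
        for f_row, b_row, a_row in zip(fg, bg, alpha)
    ]
```

For color images the same formula would be applied to each channel separately.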
Syllabus
• 3D Modeling (1 week)
– projective geometry
– camera modeling
– single view metrology
– camera calibration
– stereo

• Computational Photography (1 week)
– Super resolution
– Alpha Matting
– Blur removal
– Poisson Blending

• Visual Recognition (3 weeks)
– Eigenfaces
– Category Recognition
– Object Detection
– Kinect
Grading
• Four assignments (10 points each + extra points)
– Mix of coding and written answers.
– Using Qt (cross-platform UI in C++) qt.nokia.com
– Use of interactive UIs for exploring and gaining intuition
1. Filters and edge detection
2. Creating panoramas
3. Computing depth from stereo
4. Face detection
• FINAL PROJECT (60 points + 20 extra points)
Assignment  1:    Image  Filtering  
10  Points  
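For intuition about image filtering, the sketch below slides a small kernel over a grayscale image and uses it for edge detection. It is an illustration under stated assumptions, not the assignment's required implementation: the course uses C++ with Qt, while this is pure Python; border pixels are replicated; and the Sobel kernels are one common choice of edge filter. Function names are hypothetical.

```python
def correlate(img, kernel):
    """Cross-correlate img with kernel, replicating border pixels."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    oy, ox = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy = min(max(y + j - oy, 0), h - 1)  # clamp row index
                    xx = min(max(x + i - ox, 0), w - 1)  # clamp column index
                    acc += kernel[j][i] * img[yy][xx]
            out[y][x] = acc
    return out


def convolve(img, kernel):
    """Convolution is correlation with the kernel flipped in both axes."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return correlate(img, flipped)


SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges


def edge_magnitude(img):
    """Gradient magnitude from the two Sobel responses."""
    gx = correlate(img, SOBEL_X)
    gy = correlate(img, SOBEL_Y)
    return [
        [(a * a + b * b) ** 0.5 for a, b in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]
```

A blur uses the same `correlate` routine with a different kernel, for example a 3×3 box filter whose nine weights are each 1/9.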
Assignment 2: Panorama Stitching
10 Points
Assignment 3: Stereo Reconstruction
10  Points  
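As a rough picture of what computing depth from stereo involves, the sketch below matches a small window along one scanline of a rectified image pair and converts the resulting disparity to depth with the standard relation Z = f · B / d. This is an assumption-laden illustration, not the assignment's method: it assumes rectified images, a sum-of-squared-differences cost, focal length `focal_px` in pixels, and baseline `baseline_m` in meters. All names are hypothetical.

```python
def best_disparity(left_row, right_row, x, half_window, max_disparity):
    """Find the shift d that best aligns a window around left_row[x]
    with a window around right_row[x - d], using SSD as the cost.
    The caller must ensure x + half_window is inside the row."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - half_window < 0:
            break  # window would fall off the left edge of the right image
        cost = sum(
            (left_row[x + k] - right_row[x - d + k]) ** 2
            for k in range(-half_window, half_window + 1)
        )
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d; zero disparity corresponds to a point at infinity."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px
```

Larger disparities mean closer points, so depth resolution degrades with distance as disparities shrink toward zero.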
Assignment 4: Face Detection
10  Points  
Final Project
60 Points + 20 Extra points
• Big Project
– Better if related to your own research
– Demo is a BIG plus
• Proposal is due on 4/6
– One Paragraph,
– Crisp final outcome/deliverable
• Progress Reports are due on
– 4/15, 4/29, 5/13, 5/27
– What has changed since last report
• Final Presentation will be on 6/3,
– Demo/Posters @ CSE atrium
Sample  Projects    
From  Taskar  Center  for  Accessible  Technology  
Project:  My  Kingdom  
 
Sample  Projects    
From  Taskar  Center  for  Accessible  Technology  
Project:  Curb  Alert  
 
Sample  Projects    
From  Taskar  Center  for  Accessible  Technology  
Project:  Silent  Movie  
 
Samples of Previous Projects
• Visual Calculator
• Seam Carving
• X-ray bone fracture detection
• Pipe leak detection
• Is it gonna be viral?
• Deep learning for object recognition
• …
Project Ideas
• Features
– Learning Features
– Features for regions
– Comparison of features in the literature
• Action Recognition
– Human pose
– Objects and Interactions
– Using Kinect
– Detecting unattended luggage
– Egocentric
• Language & Vision
• Grab cut
• Seam Carving
• Video Stabilization
• Detecting Shadows
• RGBD object Detection
• Object Detection in Videos
• Video Google
• Matching Images and Videos in the wild
• Reading Street Signs
• Wearable Cameras for visually impaired users
• Auto Zooming
• Visual Odometer
• Smart stop lights
Books  
Calibration
• How many of you
– have taken an undergrad vision course?
– have taken an ML course?
– have taken a Graphics course?
– remember your undergrad linear algebra course?
– have any concerns about programming?

Do these words remind you of something?

Interest Point
SIFT
Laplacian
Eigenvalue
SVD
SVM
MRF
STEREO
Random Forest
Graph cut
Preferences  
•  Low  level  vision?  

•  Mid  level  vision?  

 
•  High  level  vision?  
