edgar-collado-alvarez
Computer Vision and Image Processing affecting Robotics and their design. Some introductory slides are presented from Stanford University.
Silvio Savarese 21-Jan-15
Professor Silvio Savarese
Computational Vision and Geometry Lab
CS223A Vision in Robotics
Sensing is the future
Everything is a sensor
Modern vision sensors
night
Kinect
thermal
w/ gravity
Sensing is not the hard problem
Intelligent understanding of the sensing data is the challenge!
What does intelligent understanding of the sensing data mean?
Sensing device
Extract information
Interpretation
Computer vision
Information: features, 3D structure, motion flows, etc.
Interpretation: recognize objects, scenes, actions, events
Computational device
Computer vision studies the tools and theories that enable the design of machines that can extract useful information from imagery data (images and videos) toward the goal of interpreting the world
Computer vision and Applications
Timeline: 1990, 2000, 2010
EosSystems
Fingerprint biometrics
Augmentation with 3D computer graphics
3D object prototyping
Photomodeler, EosSystems
Computer vision and Applications
EosSystems, Autostitch
New feature detectors/descriptors
CV leverages machine learning
Face detection
Web applications
Photometria
Panoramic Photography
kolor
3D modeling of landmarks
Computer vision and Applications
Kooaba
A9
Kinect
EosSystems, Autostitch
Google Goggles
Large scale image matching
Efficient SLAM/SFM
Better clouds
More bandwidth
Increased computational power
Movies, news, sports
Image search engines
http://www.picsearch.com/
Google Goggles
Visual search and landmarks recognition
Augmented reality
Motion sensing and gesture recognition
Automotive safety
Mobileye: Vision systems in high-end BMW, GM, Volvo models
Source: A. Shashua, S. Seitz
http://www.mobileye.com/
Computer vision and Applications
Vision for robotics, space exploration
Factory inspection
Surveillance
Autonomous driving, robot navigation
Assistive technologies
Sources: K. Grauman, L. Fei-Fei, S. Lazebnik
Security
http://www.scottcamazine.com/photos/SecurityXrays/images/briefcase7_jpg.jpg
Computer vision and Applications
2D
3D
Computer vision
3D Reconstruction: 3D shape recovery, 3D scene reconstruction, camera localization, pose estimation
2D Recognition: object detection, texture classification, target tracking, activity recognition
Camera systems
Establish a mapping from 3D to 2D
Pinhole perspective projection
Pinhole camera
f = focal length
c = center of the camera
$(x, y, z) \rightarrow \left(f \frac{x}{z},\ f \frac{y}{z}\right)$, a mapping $E^3 \rightarrow E^2$
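The pinhole mapping above can be sketched in a few lines. This is a minimal illustration, assuming the point is already expressed in camera coordinates with z pointing away from the camera; the function name `project_pinhole` and the sample values are hypothetical.

```python
def project_pinhole(point, f):
    """Project a 3D point (x, y, z) in camera coordinates to (f*x/z, f*y/z)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (f * x / z, f * y / z)


# Same (x, y), different depth: the farther point lands closer to the
# image center, which is why distant objects look smaller.
near = project_pinhole((1.0, 2.0, 4.0), f=2.0)
far = project_pinhole((1.0, 2.0, 8.0), f=2.0)
print(near, far)
```

Doubling the depth halves both image coordinates, since the mapping divides by z.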
f = focal length
uo, vo = offset
non-square pixels
Projective camera
World reference frame: [Ow, iw, jw, kw]
$P = M P_w = K [R\ T] P_w$
K: internal parameters
R, T: external parameters
θ = skew angle
R, T = rotation, translation
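As a rough sketch of the projective camera, the snippet below composes M = K [R T] and projects one world point. The form of K used here (pixel scales `alpha` and `beta`, offset `u0`, `v0`, and a plain skew entry instead of the skew angle) is a simplifying assumption, and all helper names and numeric values are illustrative only.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def intrinsics(alpha, beta, u0, v0, skew=0.0):
    """Assumed simplified K: pixel scales, offset (u0, v0) and a skew entry."""
    return [[alpha, skew, u0], [0.0, beta, v0], [0.0, 0.0, 1.0]]


def camera_matrix(K, R, T):
    """M = K [R T], a 3x4 matrix."""
    Rt = [list(R[i]) + [T[i]] for i in range(3)]
    return matmul(K, Rt)


def project(M, Pw):
    """Map a world point to pixel coordinates via homogeneous division."""
    homogeneous = [[c] for c in Pw] + [[1.0]]
    u, v, w = (row[0] for row in matmul(M, homogeneous))
    return (u / w, v / w)


K = intrinsics(alpha=500.0, beta=500.0, u0=320.0, v0=240.0)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # camera aligned with world
T = [0.0, 0.0, 0.0]
M = camera_matrix(K, R, T)
pixel = project(M, (0.2, -0.1, 2.0))
print(pixel)
```

With the identity rotation and zero translation, the result reduces to the pinhole formula scaled to pixels and shifted by the offset.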
Properties of Projection
- Points project to points
- Lines project to lines
- Distant objects look smaller
- Angles are not preserved
- Parallel lines meet!
Parallel lines in the world intersect in the image at a vanishing point
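A small numeric check of this property, assuming an ideal pinhole camera with focal length f: points far along two parallel 3D lines project close to the same image location, which depends only on the shared direction. The helper names and the sample lines are hypothetical.

```python
def project_pinhole(point, f=1.0):
    """Ideal pinhole projection of a point in camera coordinates."""
    x, y, z = point
    return (f * x / z, f * y / z)


def point_on_line(origin, direction, t):
    """Point origin + t * direction on a 3D line."""
    return tuple(o + t * d for o, d in zip(origin, direction))


def vanishing_point(direction, f=1.0):
    """Image of the point at infinity along a 3D direction (dx, dy, dz)."""
    dx, dy, dz = direction
    if dz == 0:
        raise ValueError("lines parallel to the image plane have no finite vanishing point")
    return (f * dx / dz, f * dy / dz)


direction = (1.0, 0.0, 2.0)   # direction shared by both lines
line_a = (-1.0, 1.0, 4.0)     # a point on the first line
line_b = (3.0, -2.0, 4.0)     # a point on the second line

far_a = project_pinhole(point_on_line(line_a, direction, 1e9))
far_b = project_pinhole(point_on_line(line_b, direction, 1e9))
vp = vanishing_point(direction)
print(far_a, far_b, vp)
```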
One-point perspective: Masaccio, Trinity, Santa Maria Novella, Florence, 1425-28
How to calibrate a camera?
Estimate camera parameters such as pose or focal length
Calibration Problem
P1, ..., Pn with known positions in [Ow, iw, jw, kw]
p1, ..., pn with known positions in the image
Goal: compute intrinsic and extrinsic parameters
Calibration rig
How many correspondences do we need?
M has 11 unknowns (12 entries, defined up to scale)
We need 11 equations
Each correspondence gives 2 equations, so 6 correspondences would do it
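The counting argument can be checked with a short sketch. It assumes the standard linear (DLT-style) formulation, in which each 3D-2D correspondence contributes two homogeneous equations in the 12 entries of M. The camera `M_true` and the rig points are made-up values used only to verify the equations, and the sketch does not solve for M.

```python
import math


def dlt_rows(P, p):
    """Two homogeneous linear equations in the 12 entries of M for one correspondence."""
    X = [P[0], P[1], P[2], 1.0]
    u, v = p
    zeros = [0.0] * 4
    return [
        X + zeros + [-u * c for c in X],
        zeros + X + [-v * c for c in X],
    ]


def project(M, P):
    """Project a 3D point with a 3x4 camera matrix."""
    X = [P[0], P[1], P[2], 1.0]
    u, v, w = (sum(m * c for m, c in zip(row, X)) for row in M)
    return (u / w, v / w)


def min_correspondences(unknowns=11, equations_per_point=2):
    """Smallest number of correspondences giving at least `unknowns` equations."""
    return math.ceil(unknowns / equations_per_point)


# Hypothetical ground-truth camera and non-coplanar rig points.
M_true = [[500.0, 0.0, 320.0, 10.0], [0.0, 500.0, 240.0, 20.0], [0.0, 0.0, 1.0, 2.0]]
rig_points = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (1, 0, 1)]

A = [row for P in rig_points for row in dlt_rows(P, project(M_true, P))]
m = [entry for row in M_true for entry in row]   # M flattened row by row
residuals = [sum(a * b for a, b in zip(row, m)) for row in A]
print(len(A), min_correspondences(), max(abs(r) for r in residuals))
```

Six correspondences yield a 12 x 12 system, and the true M lies in its null space. A full calibration would recover M from that null space (for example via SVD) and then factor it into K, R and T.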
Calibration Procedure
Camera Calibration Toolbox for Matlab
J. Bouguet [1998-2000]
http://www.vision.caltech.edu/bouguetj/calib_doc/index.html#examples
Once the camera is calibrated...
$M = K [R\ T]$
- Internal parameters K are known
- R, T are known, but these can only relate C to the calibration rig
Can I estimate P from the measurement p from a single image?
No - in general [P can be anywhere along the line defined by C and p]
Recovering structure from a single view
Why is it so difficult?
Intrinsic ambiguity of the mapping from 3D to image (2D)
Recovering structure from a single view
Courtesy slide S. Lazebnik
Two eyes help!
This is called triangulation
K = known for both cameras
R, T
$X = l \times l'$
Find X that minimizes
$d^2(x_1, M_1 X) + d^2(x_2, M_2 X)$
Triangulation
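A minimal sketch of triangulation follows. Note that it uses the simpler midpoint method (the point halfway between the two viewing rays at their closest approach), not the reprojection-error minimization stated on the slide. It assumes the camera centers and ray directions are already expressed in a common world frame; all names and values are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(o1, d1, o2, d2):
    """Midpoint of the shortest segment between rays o1 + s*d1 and o2 + t*d2."""
    w = [a - b for a, b in zip(o1, o2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; depth cannot be recovered")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = [o + s * k for o, k in zip(o1, d1)]
    p2 = [o + t * k for o, k in zip(o2, d2)]
    return [(x + y) / 2.0 for x, y in zip(p1, p2)]


# Two camera centers one unit apart, both looking at the same scene point.
X_true = [1.0, 2.0, 5.0]
O1, O2 = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
ray1 = [x - o for x, o in zip(X_true, O1)]
ray2 = [x - o for x, o in zip(X_true, O2)]
X_est = triangulate_midpoint(O1, ray1, O2, ray2)
print(X_est)
```

With noise-free rays the two lines intersect and the midpoint coincides with the true point. With noisy measurements the rays generally miss each other, which is why a cost such as the one above is minimized in practice.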
Stereo-view geometry
Correspondence: Given a point in one image, how can I find the corresponding point x in another one?
Camera geometry: Given corresponding points in two images, find camera matrices, position and pose.
Scene geometry: Find coordinates of 3D point from its projection into 2 or multiple images.
Epipolar geometry
- Epipolar plane
- Baseline
- Epipolar lines
- Epipoles e1, e2
  = intersections of baseline with image planes
  = projections of the other camera center
Example: Converging image planes
Example: Parallel image planes
Baseline intersects the image plane at infinity
Epipoles are at infinity
Epipolar lines are parallel to x axis
Epipolar Constraint
$p_1^T F p_2 = 0$
$F p_2$ is the epipolar line associated with $p_2$ ($l_1 = F p_2$)
$F^T p_1$ is the epipolar line associated with $p_1$ ($l_2 = F^T p_1$)
$F e_2 = 0$ and $F^T e_1 = 0$
F is a 3x3 matrix; 7 DOF
F is singular (rank two)
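These properties can be verified numerically in a simplified setting. The sketch assumes normalized cameras (K equal to the identity for both views), so that F reduces to the cross-product matrix of T times R. It also assumes the convention that a point with coordinates P2 in camera 2 has coordinates R P2 + T in camera 1. The rotation, translation and scene point are made-up values.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_matrix(t):
    """Matrix [t]x such that [t]x v equals the cross product t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def rotation_y(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


R = rotation_y(math.radians(10.0))
T = [1.0, 0.0, 0.2]
F = matmul(cross_matrix(T), R)          # valid only under the K = I assumption

P2 = [0.3, -0.4, 5.0]                   # scene point in camera-2 coordinates
P1 = [a + b for a, b in zip(matvec(R, P2), T)]
p1 = [c / P1[2] for c in P1]            # normalized image points (homogeneous)
p2 = [c / P2[2] for c in P2]

l1 = matvec(F, p2)                      # epipolar line of p2 in image 1
constraint = dot(p1, l1)                # p1^T F p2

R_transposed = [list(col) for col in zip(*R)]
e2 = [-c for c in matvec(R_transposed, T)]   # camera-1 center seen from camera 2
F_e2 = matvec(F, e2)
print(constraint, F_e2)
```

Both quantities come out at rounding-error level: p1 lies on the line F p2, and the epipole e2 spans the null space of F, consistent with F being rank two.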
Why is F useful?
- Suppose F is known
- No additional information about the scene and camera is given
- Given a point on left image, how can I find the corresponding point on right image?
$l = F^T x$
Why is F useful?
F captures information about the epipolar geometry of 2 views + camera parameters
MORE IMPORTANTLY: F gives constraints on how the scene changes under view point transformation (without reconstructing the scene!)
Powerful tool in: 3D reconstruction Multi-view object/scene matching
Multiple view geometry
Structure from motion problem
Given m images of n fixed 3D points
$x_{ij} = M_i X_j,\quad i = 1, \dots, m,\quad j = 1, \dots, n$
From the m x n correspondences $x_{ij}$, estimate:
m projection matrices $M_i$ (motion)
n 3D points $X_j$ (structure)
Structure from motion ambiguity
$M_i = K_i [R_i\ T_i]$, $x_{ij} = M_i X_j$
SFM can be solved up to an N-degree-of-freedom ambiguity
In the general case (nothing is known) the ambiguity is expressed by an arbitrary affine or projective transformation $H$:
$X_j \rightarrow H X_j$, $M_i \rightarrow M_i H^{-1}$
$x_{ij} = M_i X_j = (M_i H^{-1})(H X_j)$
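The ambiguity can be illustrated numerically: transforming the scene by H and the camera by the inverse of H leaves the image measurement unchanged. The sketch below uses a scale-plus-translation H (a special case of an affine transformation) whose inverse is written out by hand; the camera matrix and point are arbitrary example values.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Example camera (3x4) and homogeneous scene point (4x1).
M = [[500.0, 0.0, 320.0, 10.0], [0.0, 500.0, 240.0, 20.0], [0.0, 0.0, 1.0, 2.0]]
X = [[1.0], [2.0], [5.0], [1.0]]

# H doubles the scene scale and shifts it along x; H_inv is its exact inverse.
H = [[2.0, 0.0, 0.0, 1.0], [0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 2.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
H_inv = [[0.5, 0.0, 0.0, -0.5], [0.0, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.0], [0.0, 0.0, 0.0, 1.0]]

identity = matmul(H, H_inv)
x_original = matmul(M, X)                                # M X
x_transformed = matmul(matmul(M, H_inv), matmul(H, X))   # (M H^-1)(H X)
print(x_original, x_transformed)
```

Because the two products agree, the images alone cannot distinguish the pair (M, X) from the pair (M H^-1, H X).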
Affine ambiguity
Projective ambiguity
Self-calibration
Condition                                                    N. Views
Constant internal parameters                                 3
Aspect ratio and skew known; focal length and offset vary    4
Aspect ratio and skew known; focal length and offset vary    5
skew = 0, all other parameters vary                          8
Prior knowledge on cameras or scene can be used to add constraints and remove ambiguities
Obtain metric reconstruction (up to scale)
Bundle adjustment
Non-linear method for refining structure and motion
Minimizing re-projection error
It can be used before or after metric upgrade
$E(M, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D(x_{ij}, M_i X_j)^2$
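The cost that bundle adjustment minimizes can be written down directly. The sketch below only evaluates the re-projection error and does not perform the non-linear optimization. Taking D as the Euclidean image distance and marking missing observations with `None` are assumptions made for illustration, as are the example cameras and points.

```python
def project(M, X):
    """Project a 3D point with a 3x4 camera matrix."""
    Xh = list(X) + [1.0]
    u, v, w = (sum(m * c for m, c in zip(row, Xh)) for row in M)
    return (u / w, v / w)


def reprojection_error(cameras, points, observations):
    """Sum over views i and points j of the squared distance between x_ij and M_i X_j.

    observations[i][j] is an (u, v) tuple, or None if point j is not seen in view i.
    """
    total = 0.0
    for M, row in zip(cameras, observations):
        for X, x in zip(points, row):
            if x is None:
                continue  # missing measurement contributes nothing
            u, v = project(M, X)
            total += (x[0] - u) ** 2 + (x[1] - v) ** 2
    return total


cameras = [
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
    [[1.0, 0.0, 0.0, -1.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]],
]
points = [(0.0, 0.0, 4.0), (1.0, 2.0, 5.0)]

perfect = [[project(M, X) for X in points] for M in cameras]
noisy = [list(row) for row in perfect]
u, v = noisy[0][1]
noisy[0][1] = (u + 3.0, v + 4.0)   # one measurement off by (3, 4) pixels
noisy[1][0] = None                 # one measurement missing

error_perfect = reprojection_error(cameras, points, perfect)
error_noisy = reprojection_error(cameras, points, noisy)
print(error_perfect, error_noisy)
```

A bundle adjustment solver would pass a cost of this form to a non-linear least-squares routine and update all cameras and points jointly.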
Bundle adjustment
Advantages
- Handle large number of views
- Handle missing data
Limitations
- Large minimization problem (parameters grow with number of views)
- Requires good initial condition
Results and applications
Courtesy of Oxford Visual Geometry Group
Levoy et al., 00
Hartley & Zisserman, 00
Dellaert et al., 00
Rusinkiewicz et al., 02
Nistér, 04
Brown & Lowe, 04
Schindler et al, 04
Lourakis & Argyros, 04
Colombo et al. 05
Golparvar-Fard, et al. JAEI 10
Pandey et al. IFAC , 2010
Pandey et al. ICRA 2011
Microsoft's PhotoSynth
Snavely et al., 06-08
Schindler et al., 08
Agarwal et al., 09
Frahm et al., 10
Lucas & Kanade, 81
Chen & Medioni, 92
Debevec et al., 96
Levoy & Hanrahan, 96
Fitzgibbon & Zisserman, 98
Triggs et al., 99
Pollefeys et al., 99
Kutulakos & Seitz, 99
M. Pollefeys et al., 98
Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," ACM Transactions on Graphics (SIGGRAPH Proceedings), 2006,
http://phototour.cs.washington.edu/Photo_Tourism.pdf
Computer vision
3D Reconstruction: 3D shape recovery, 3D scene reconstruction, camera localization, pose estimation
2D Recognition: object detection, texture classification, target tracking, activity recognition
Classification: Does this image contain a building? [yes/no]
Yes!
Classification: Is this a beach?
Image Search
Organizing photo collections
Detection: Does this image contain a car? [where?]
Detection: Which objects does this image contain? [where?] (example labels: car, building, clock, person)
Detection: Accurate localization (segmentation)
Object detection is useful
Surveillance
Assistive technologies
Security
Assistive driving
Computational photography
http://www.scottcamazine.com/photos/SecurityXrays/images/briefcase7_jpg.jpg
Categorization vs single instance recognition
Which building is this? Marshall Field building in Chicago
Where is the crunchy nut?
Recognizing landmarks in mobile platforms (+ GPS)
Applications of computer vision
Detection: Estimating object semantic & geometric attributes
Object: Person, back; 1-2 meters away
Object: Police car, side view, 4-5 m away
Object: Building, 45° pose, 8-10 meters away. It has bricks
Activity or Event recognition
What are these people doing?
Visual Recognition
Design algorithms that are able to
Classify images or videos
Detect and localize objects
Estimate semantic and geometrical attributes
Classify human activities and events
Why is this challenging?
How many object categories are there?
Challenges: viewpoint variation
Michelangelo 1475-1564 slide credit: Fei-Fei, Fergus & Torralba
Challenges: illumination
image credit: J. Koenderink
Challenges: scale
slide credit: Fei-Fei, Fergus & Torralba
Challenges: deformation
Challenges: occlusion
Magritte, 1957 slide credit: Fei-Fei, Fergus & Torralba
Challenges: background clutter
Kilmeny Niland. 1995
Challenges: intra-class variation
Basic properties
Representation
How to represent an object category; which classification scheme?
Learning
How to learn the classifier, given training data
Recognition
How the classifier is to be used on novel data
Representation
- Building blocks: Sampling strategies
Interest operators
Multiple interest operators
Dense, uniformly
Randomly
Image credits: F-F. Li, E. Nowak, J. Sivic
Representation
- Building blocks: Choice of descriptors
[SIFT, HOG, codewords.]
Representation
Appearance only or location and appearance
Representation
Invariances
View point
Illumination
Occlusion
Scale
Deformation
Clutter
etc.
Representation
To handle intra-class variability, it is convenient to describe object categories using probabilistic models
Object models: Generative vs discriminative vs hybrid
Object categorization:
the statistical viewpoint
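$p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$
Bayes rule: $p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$
$\frac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})} = \frac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})} \cdot \frac{p(\text{zebra})}{p(\text{no zebra})}$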
posterior ratio = likelihood ratio × prior ratio
Discriminative methods model the posterior
Generative methods model the likelihood and prior
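The ratio form of Bayes rule can be made concrete with a tiny worked example. The likelihood and prior values below are invented for illustration, and the function names are hypothetical.

```python
def posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra):
    """posterior ratio = likelihood ratio * prior ratio."""
    likelihood_ratio = lik_zebra / lik_no_zebra
    prior_ratio = prior_zebra / (1.0 - prior_zebra)
    return likelihood_ratio * prior_ratio


def posterior_from_ratio(ratio):
    """Convert p(zebra|image) / p(no zebra|image) back to p(zebra|image)."""
    return ratio / (1.0 + ratio)


# Invented numbers: the image is 8 times more likely under "zebra",
# but zebras are rare a priori (20%).
ratio = posterior_ratio(lik_zebra=0.8, lik_no_zebra=0.1, prior_zebra=0.2)
posterior = posterior_from_ratio(ratio)
decision = "zebra" if ratio > 1.0 else "no zebra"
print(ratio, posterior, decision)
```

A generative method would estimate the two likelihoods and the prior separately and combine them as above, whereas a discriminative method would model the posterior (or the decision boundary) directly.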
Discriminative models
Support Vector Machines: Guyon, Vapnik, Heisele, Serre, Poggio
Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006
Nearest neighbor (10^6 examples): Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005...
Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998
Latent SVM, Structural SVM: Felzenszwalb 00; Ramanan 03
Slide adapted from Antonio Torralba; courtesy of Vittorio Ferrari
Slide credit: Kristen Grauman
Generative models
Naïve Bayes classifier: Csurka, Bray, Dance & Fan, 2004
Hierarchical Bayesian topic models (e.g. pLSA and LDA)
- Object categorization: Sivic et al. 2005, Sudderth et al. 2005
- Natural scene categorization: Fei-Fei et al. 2005
2D part based models
- Constellation models: Weber et al 2000; Fergus et al 200
- Star models: ISM (Leibe et al 05)
3D part based models
- Multi-aspects: Sun, et al, 2009
Learning
- Learning parameters: What are you maximizing? Likelihood (Gen.) or performance on train/validation set (Disc.)
- Level of supervision: manual segmentation; bounding box; image labels; noisy labels
- Batch/incremental
- Priors
- Training images: issue of overfitting; negative images for discriminative methods
Recognition
Recognition task: classification, detection, etc.
Search strategy: Sliding Windows
- Simple
- Computational complexity (x, y, S, θ, N of classes)
  - BSW by Lampert et al 08
  - Also, Alexe, et al 10
- Localization: objects are not boxes
Viola, Jones 2001
Segmentation
Bottom up segmentation
Semantic segmentation
Felzenszwalb and Huttenlocher, 2004
Malik et al. 01
Maire et al. 08
Duygulu et al. 02
Recognition
Search strategy: Sliding Windows
- Prone to false positives
Non max suppression: Canny 86; Desai et al, 2009
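To make the search strategy concrete, here is a minimal sketch of single-scale window enumeration followed by greedy non-maximum suppression based on intersection over union (IoU). This is one common way to prune overlapping detections and is not necessarily the formulation used in the works cited above. The box format (x1, y1, x2, y2), the threshold and the example detections are assumptions.

```python
def sliding_windows(width, height, window, stride):
    """Yield square windows (x1, y1, x2, y2) covering the image at one scale."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y, x + window, y + window)


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the best-scoring box among each group of heavily overlapping boxes."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, other) <= iou_threshold for other, _ in kept):
            kept.append((box, score))
    return kept


windows = list(sliding_windows(width=8, height=8, window=4, stride=2))

# Hypothetical classifier outputs: two overlapping hits on one object, one elsewhere.
detections = [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((20, 20, 30, 30), 0.7)]
kept = non_max_suppression(detections)
print(len(windows), kept)
```

The window count grows with every extra search dimension (position, scale, orientation, class), which is the complexity issue listed above, and the suppression step addresses the multiple responses a single object tends to produce.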
Successful methods using sliding windows
[Dalal & Triggs, CVPR 2005]
- Subdivide scanning window
- In each cell compute histogram of gradient orientations.
Code available: http://pascal.inrialpes.fr/soft/olt/
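The per-cell step can be sketched as follows. This is a heavily simplified illustration rather than the published descriptor: it uses central differences, 9 unsigned orientation bins and magnitude-weighted votes, and it omits bin interpolation and block normalization. The function name, cell size and test images are assumptions.

```python
import math


def cell_histogram(image, x0, y0, cell=8, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations in one cell.

    `image` is a list of rows of intensities; border pixels are skipped because
    central differences need both neighbours.
    """
    hist = [0.0] * bins
    height, width = len(image), len(image[0])
    for y in range(max(y0, 1), min(y0 + cell, height - 1)):
        for x in range(max(x0, 1), min(x0 + cell, width - 1)):
            gx = image[y][x + 1] - image[y][x - 1]
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0   # unsigned orientation
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    return hist


# Synthetic images: intensity increasing along x, and along y.
ramp_x = [[x for x in range(10)] for _ in range(10)]
ramp_y = [[y for _ in range(10)] for y in range(10)]

hist_x = cell_histogram(ramp_x, 1, 1)
hist_y = cell_histogram(ramp_y, 1, 1)
print(hist_x)
print(hist_y)
```

A horizontal intensity ramp puts all votes in the first bin (orientation 0°), while a vertical ramp puts them in the bin containing 90°. A detector would concatenate such histograms over all cells of the scanning window and feed them to a classifier.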
[Ferrari & al, PAMI 2008]
- Subdivide scanning window
- In each cell compute histogram of codewords of adjacent segments
Code available: http://www.vision.ee.ethz.ch/~calvin
Recognition
Recognition task
Search strategy: Probabilistic heat maps
Fergus et al 03; Leibe et al 04
Search strategy: Hypothesis generation + verification
Category: car
Azimuth = 225°
Zenith = 30°
Savarese, 2007
Sun et al 2009
Liebelt et al., 08, 10
Farhadi et al 09
- It has metal
- it is glossy
- has wheels
Farhadi et al 09
Lampert et al 09
Wang & Forsyth 09
Recognition task
Search strategy
Attributes
Recognizing 3D objects
CHAIR
BED
TABLE
Xiang & Savarese, 2012-2014
CAR
Semantic: Torralba et al 03
Rabinovich et al 07
Gupta & Davis 08
Heitz & Koller 08
L-J Li et al 08
Bang & Fei-Fei 10
Recognition
Recognition task
Search strategy
Attributes
Context
Geometric: Hoiem, et al 06
Gould et al 09
Bao, Sun, Savarese 10
LabelMe dataset [Russell et al., 08]
Recognition in context
Bao, Sun, Savarese CVPR 2010; BMVC 2010; CIVC 2011 (editor choice); IJCV 2012
Recognition
Recognition task
Search strategy
Attributes
Context
Tracking
State-of-the-art Object Tracking
Xiang & Savarese, 2012-2014
Object tracking from Lidar
Held, Thrun & Savarese, RSS 2014
Current state of computer vision
3D Reconstruction: 3D shape recovery, 3D scene reconstruction, camera localization, pose estimation
2D Recognition: object detection, texture classification, target tracking, activity recognition
Perceiving the World in 3D!
Biederman, Mezzanotte and Rabinowitz, 1982
Sensibility as human perception
V1
Where pathway (dorsal stream)
What pathway (ventral stream)
Pre-frontal cortex
From images to the 3D scenes
Choi & Savarese, 2013
A 3D Geometric Phrase (3DGP) encodes geometric and semantic relationships between groups of objects and space elements which frequently co-occur in spatially consistent configurations.
Training Dataset, 3DGPs
Object classes: Sofa, Coffee Table, Chair, Bed, Dining Table, Side Table
Estimated Layout, 3D Geometric Phrases
From images to 3D scenes
Bao & Savarese, 2011-2013
Labels: Car, Person, Tree, Sky, Street, Building, Else
From videos to 3D dynamic scenes
Choi & Savarese, 2011-2014
- Monocular cameras
- Un-calibrated cameras
- Arbitrary motion
- Highly cluttered scenes: occlusion, background clutter
- Almost in real time!
Sensors
Objects
Summary
3D physical environment