Silvio Savarese Silvio Savarese 21-Jan-15 Professor Silvio Savarese Computational Vision and Geometry Lab CS223A Vision in Robotics

Vision

DESCRIPTION

Computer Vision and Image Processing affecting Robotics and their design. Some introductory slides are presented from Stanford University.

  • Silvio Savarese, 21-Jan-15

    Professor Silvio Savarese

    Computational Vision and Geometry Lab

    CS223A Vision in Robotics

  • Sensing is the future

  • Everything is a sensor

  • Modern vision sensors

    night

    Kinect

    thermal

    w/ gravity

  • Sensing is not the hard problem

    Intelligent understanding of the sensing data is the challenge!

    What does intelligent understanding of the sensing data mean?

  • Sensing device

    Extract information

    Interpretation

    Computer vision

    Information: features, 3D structure, motion flows, etc.

    Interpretation: recognize objects, scenes, actions, events

    Computational device

    Computer vision studies the tools and theories that enable the design of machines that can extract useful information from imagery data (images and videos) toward the goal of interpreting the world

  • Computer vision and Applications

    Timeline: 1990, 2000, 2010

    EosSystems

  • Fingerprint biometrics

  • Augmentation with 3D computer graphics

  • 3D object prototyping

    Photomodeler, EosSystems

  • Computer vision and Applications

    Timeline: 1990, 2000, 2010

    EosSystems, Autostitch

    New feature detectors/descriptors

    CV leverages machine learning

  • Face detection

  • Web applications

    Photometria

  • Panoramic Photography

    kolor

  • 3D modeling of landmarks

  • Computer vision and Applications

    Timeline: 1990, 2000, 2010

    Kooaba

    A9

    Kinect

    EosSystems, Autostitch

    Google Goggles

    Large-scale image matching

    Efficient SLAM/SFM

    Better clouds

    More bandwidth

    Increased computational power

  • Movies, news, sports

    Image search engines

    http://www.picsearch.com/

  • Google Goggles

    Visual search and landmark recognition

  • Augmented reality

  • Motion sensing and gesture recognition

  • Automotive safety

    Mobileye: Vision systems in high-end BMW, GM, Volvo models

    Source: A. Shashua, S. Seitz

    http://www.mobileye.com/

  • Computer vision and Applications

    Vision for robotics, space exploration

    Factory inspection

    Surveillance

    Autonomous driving, robot navigation

    Assistive technologies

    Security

    Sources: K. Grauman, L. Fei-Fei, S. Lazebnik

    http://www.scottcamazine.com/photos/SecurityXrays/images/briefcase7_jpg.jpg

  • Computer vision and Applications

    Timeline: 1990, 2000, 2010

    Kooaba

    A9

    Kinect

    EosSystems, Autostitch

    Google Goggles

  • Computer vision and Applications

    Timeline: 1990, 2000, 2010

    EosSystems

    Google Goggles

    2D

    3D

  • Computer vision

    3D Reconstruction 3D shape recovery 3D scene reconstruction Camera localization Pose estimation

    2D Recognition Object detection Texture classification Target tracking Activity recognition

  • Camera systems

    Establish a mapping from 3D to 2D

  • Pinhole perspective projection

    Pinhole camera

    f = focal length

    c = center of the camera

    $(x, y, z) \rightarrow \left( f \dfrac{x}{z},\; f \dfrac{y}{z} \right)$, mapping $\mathbb{R}^3 \rightarrow \mathbb{R}^2$
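
The pinhole mapping above can be sketched in a few lines. This is a minimal illustration, assuming the point is given in camera coordinates with z pointing away from the camera; the function name `pinhole_project` and the sample numbers are hypothetical.

```python
def pinhole_project(point, f):
    """Project a 3D point (x, y, z) in camera coordinates onto the image plane.

    Implements (x, y, z) -> (f * x / z, f * y / z).
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (f * x / z, f * y / z)


# The same point twice as far away projects half as far from the image center,
# which is why distant objects look smaller.
near = pinhole_project((1.0, 2.0, 4.0), f=2.0)   # (0.5, 1.0)
far = pinhole_project((1.0, 2.0, 8.0), f=2.0)    # (0.25, 0.5)
```

The division by z is what makes the mapping non-linear in Euclidean coordinates; homogeneous coordinates turn it into a matrix product.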

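  • Projective camera

    $P' = M P_w = K \, [R \;\; T] \, P_w$

    Internal parameters (K):

    f = focal length

    $u_o, v_o$ = offset

    non-square pixels

    $\theta$ = skew angle

    External parameters:

    R, T = rotation, translation

    (Figure: camera frame $O_c$, world frame $O_w$ with axes $i_w, j_w, k_w$, world point P and its image P')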
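
A minimal NumPy sketch of the projective camera, assuming square pixels and zero skew so that K depends only on f and the offset. The helper names and the numeric values are illustrative assumptions, not taken from the slides.

```python
import numpy as np


def intrinsic_matrix(f, u0, v0):
    """Intrinsic matrix K, assuming square pixels and zero skew."""
    return np.array([[f, 0.0, u0],
                     [0.0, f, v0],
                     [0.0, 0.0, 1.0]])


def projection_matrix(K, R, T):
    """Build the 3x4 matrix M = K [R | T]."""
    return K @ np.hstack([R, T.reshape(3, 1)])


def project(M, P_w):
    """Project a 3D world point to pixel coordinates."""
    p = M @ np.append(P_w, 1.0)   # homogeneous image point
    return p[:2] / p[2]


K = intrinsic_matrix(f=500.0, u0=320.0, v0=240.0)
R = np.eye(3)                      # camera axes aligned with the world axes
T = np.array([0.0, 0.0, 5.0])      # world origin sits 5 units in front of the camera
M = projection_matrix(K, R, T)
pixel = project(M, np.array([1.0, -1.0, 5.0]))   # [370., 190.]
```

Non-square pixels and skew would only change the entries of K; the product M = K [R | T] stays the same.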

  • Properties of Projection

    Points project to points

    Lines project to lines

    Distant objects look smaller

  • Properties of Projection

    Angles are not preserved

    Parallel lines meet!

    Parallel lines in the world intersect in the image at a vanishing point

  • One-point perspective

    Masaccio, Trinity, Santa Maria Novella, Florence, 1425-28

  • How to calibrate a camera?

    Estimate camera parameters such as pose or focal length

  • Calibration Problem

    $P_1, \dots, P_n$ with known positions in $[O_w, i_w, j_w, k_w]$

    $p_1, \dots, p_n$ with known positions in the image

    Goal: compute intrinsic and extrinsic parameters

    (Figure: calibration rig and its image)

  • Calibration Problem

    How many correspondences do we need?

    M has 11 unknowns

    We need 11 equations

    6 correspondences would do it (each correspondence gives 2 equations)

    (Figure: calibration rig and its image)
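
One common way to solve this linear problem is the direct linear transform (DLT). The slides do not say which solver is used, so the sketch below is an assumption. It stacks two equations per correspondence and takes the null vector of the system with an SVD. The function name `calibrate_dlt` and the synthetic data are hypothetical, and coordinate normalization, which helps with noisy data, is left out.

```python
import numpy as np


def calibrate_dlt(P_world, p_image):
    """Estimate the 3x4 projection matrix M from n >= 6 correspondences.

    Each correspondence contributes two linear equations in the 12 entries
    of M. The solution is defined up to scale (11 unknowns) and is taken as
    the right singular vector with the smallest singular value.
    """
    P_world = np.asarray(P_world, dtype=float)
    p_image = np.asarray(p_image, dtype=float)
    if len(P_world) < 6:
        raise ValueError("need at least 6 correspondences")
    rows = []
    for (X, Y, Z), (u, v) in zip(P_world, p_image):
        Ph = [X, Y, Z, 1.0]
        rows.append(Ph + [0.0] * 4 + [-u * c for c in Ph])   # m1.P - u * m3.P = 0
        rows.append([0.0] * 4 + Ph + [-v * c for c in Ph])   # m2.P - v * m3.P = 0
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)


# Synthetic check: points on a non-planar rig, projected with a known M.
rng = np.random.default_rng(0)
M_true = np.array([[500.0, 0.0, 320.0, 100.0],
                   [0.0, 500.0, 240.0, 50.0],
                   [0.0, 0.0, 1.0, 5.0]])
P_world = rng.uniform(-1.0, 1.0, size=(8, 3))
proj = (M_true @ np.hstack([P_world, np.ones((8, 1))]).T).T
p_image = proj[:, :2] / proj[:, 2:3]

M_est = calibrate_dlt(P_world, p_image)
M_est = M_est / M_est[2, 3] * M_true[2, 3]   # fix the free scale for comparison
```

The rig points must not all lie on one plane, otherwise the linear system loses rank and M is not uniquely determined.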

  • Calibration Procedure

    Camera Calibration Toolbox for Matlab

    J. Bouguet [1998-2000]

    http://www.vision.caltech.edu/bouguetj/calib_doc/index.html#examples

  • Pinhole perspective projection

    Once the camera is calibrated...

    $M = K \, [R \;\; T]$

    - Internal parameters K are known

    - R, T are known, but these can only relate C to the calibration rig

    Can I estimate P from the measurement p from a single image?

    No, in general: P can be anywhere along the line defined by C and p

  • Pinhole perspective projection

    Recovering structure from a single view

    (Figure: camera K with center C, world frame $O_w$, calibration rig, scene point P and its image p)

    Why is it so difficult?

    Intrinsic ambiguity of the mapping from 3D to image (2D)

  • Recovering structure from a single view

    Courtesy slide S. Lazebnik

  • Two eyes help!

    (Figure: two cameras with centers $O_1$, $O_2$ observe X at image points $x_1$, $x_2$)

    K = known for both cameras

    R, T

    This is called triangulation

  • Triangulation

    Find X that minimizes

    $d^2(x_1, M_1 X) + d^2(x_2, M_2 X)$
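
A hedged sketch of triangulation with two known projection matrices. It uses the linear (DLT) method, which minimizes an algebraic error rather than the re-projection cost above. In practice the linear estimate is typically used to initialize a non-linear minimization of that cost. All names and numbers are illustrative assumptions.

```python
import numpy as np


def triangulate_linear(M1, M2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    M1, M2 are 3x4 projection matrices; x1, x2 are (u, v) observations.
    Returns the 3D point as a length-3 array.
    """
    A = np.array([
        x1[0] * M1[2] - M1[0],
        x1[1] * M1[2] - M1[1],
        x2[0] * M2[2] - M2[0],
        x2[1] * M2[2] - M2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]


def observe(M, X):
    """Project a 3D point X with M and return pixel coordinates."""
    p = M @ np.append(X, 1.0)
    return p[:2] / p[2]


def reprojection_cost(M1, M2, x1, x2, X):
    """d^2(x1, M1 X) + d^2(x2, M2 X), the cost to be minimized."""
    return float(np.sum((observe(M1, X) - x1) ** 2)
                 + np.sum((observe(M2, X) - x2) ** 2))


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
M1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
M2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])  # second camera shifted along x
X_true = np.array([0.5, 0.2, 4.0])
x1, x2 = observe(M1, X_true), observe(M2, X_true)

X_est = triangulate_linear(M1, M2, x1, x2)
cost = reprojection_cost(M1, M2, x1, x2, X_est)
```

With noise-free observations the two viewing rays intersect exactly, so the linear estimate already has near-zero re-projection cost.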

  • Stereo-view geometry

    Correspondence: Given a point in one image, how can I find the corresponding point x in another one?

    Camera geometry: Given corresponding points in two images, find camera matrices, position and pose.

    Scene geometry: Find coordinates of 3D point from its projection into 2 or multiple images.

  • Epipolar geometry

    Epipolar plane

    Baseline

    Epipolar lines

    Epipoles $e_1$, $e_2$

    = intersections of baseline with image planes

    = projections of the other camera center

  • Example: Converging image planes

  • Example: Parallel image planes

    Baseline intersects the image plane at infinity

    Epipoles are at infinity

    Epipolar lines are parallel to x axis

  • Example: Parallel Image Planes

    (Figure: both epipoles e at infinity)

  • Epipolar Constraint

    $p_1^T F \, p_2 = 0$

    $F p_2$ is the epipolar line associated with $p_2$ ($l_1 = F p_2$)

    $F^T p_1$ is the epipolar line associated with $p_1$ ($l_2 = F^T p_1$)

    $F e_2 = 0$ and $F^T e_1 = 0$

    F is a 3x3 matrix; 7 DOF

    F is singular (rank two)
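
The properties of F listed here can be checked numerically. The sketch below assumes a specific convention that the slides do not state: a point with coordinates X2 in the camera-2 frame has coordinates X1 = R X2 + T in the camera-1 frame, which gives F = K1^{-T} [T]x R K2^{-1}. Function names and numbers are hypothetical.

```python
import numpy as np


def skew(t):
    """Matrix [t]_x such that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])


def fundamental_from_pose(K1, K2, R, T):
    """F with p1^T F p2 = 0, assuming X1 = R @ X2 + T relates the camera frames."""
    return np.linalg.inv(K1).T @ skew(T) @ R @ np.linalg.inv(K2)


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
angle = np.deg2rad(10.0)
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
T = np.array([1.0, 0.1, 0.2])
F = fundamental_from_pose(K, K, R, T)

X2 = np.array([0.3, -0.2, 5.0])   # scene point in camera-2 coordinates
X1 = R @ X2 + T                   # same point in camera-1 coordinates
p1 = K @ X1 / X1[2]               # homogeneous pixel in image 1
p2 = K @ X2 / X2[2]               # homogeneous pixel in image 2

residual = p1 @ F @ p2            # epipolar constraint, close to zero
l1 = F @ p2                       # epipolar line of p2 in image 1; p1 lies on it
e2 = K @ (-R.T @ T)               # epipole in image 2 (image of camera center 1)
singular_values = np.linalg.svd(F, compute_uv=False)
```

The smallest singular value is zero up to rounding, which is the rank-two property, and `F @ e2` vanishes as stated above.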

  • Why is F useful?

    - Suppose F is known

    - No additional information about the scene and camera is given

    - Given a point x on the left image, how can I find the corresponding point on the right image?

    It must lie on the epipolar line $l' = F^T x$

  • Why is F useful?

    F captures information about the epipolar geometry of 2 views + camera parameters

    MORE IMPORTANTLY: F gives constraints on how the scene changes under viewpoint transformation (without reconstructing the scene!)

    Powerful tool in:

    3D reconstruction

    Multi-view object/scene matching

  • Multiple view geometry

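  • Structure from motion problem

    Given m images of n fixed 3D points

    $x_{ij} = M_i X_j, \quad i = 1, \dots, m, \quad j = 1, \dots, n$

    (Figure: cameras $M_1, M_2, \dots, M_m$ observe point $X_j$ at $x_{1j}, x_{2j}, \dots, x_{mj}$)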

  • Structure from motion problem

    From the m x n correspondences $x_{ij}$, estimate:

    m projection matrices $M_i$ (motion)

    n 3D points $X_j$ (structure)

  • Structure from motion ambiguity

    $M_i = K_i \, [R_i \;\; T_i]$, $\quad x_{ij} = M_i X_j$

    $X_j \rightarrow H X_j$, $\quad M_i \rightarrow M_i H^{-1}$

    $x_{ij} = M_i X_j = (M_i H^{-1})(H X_j)$

    SFM can be solved up to an N-degree-of-freedom ambiguity

    In the general case (nothing is known) the ambiguity is expressed by an arbitrary affine or projective transformation
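
A small numeric illustration of the ambiguity: transforming the point by an invertible 4x4 matrix H and the camera by its inverse leaves the image measurement unchanged. The particular M, X and H below are arbitrary illustrative values.

```python
import numpy as np

M = np.array([[500.0, 0.0, 320.0, 0.0],
              [0.0, 500.0, 240.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])        # one projection matrix M_i
X = np.array([0.5, 0.2, 4.0, 1.0])          # one homogeneous 3D point X_j
H = np.array([[2.0, 0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 0.0],
              [0.1, 0.0, 0.0, 1.0]])        # invertible; last row makes it projective

M_alt = M @ np.linalg.inv(H)                # transformed camera  M_i H^{-1}
X_alt = H @ X                               # transformed point   H X_j

x = M @ X                                   # original measurement (homogeneous)
x_alt = M_alt @ X_alt                       # identical up to rounding
```

Both (M, X) and (M_alt, X_alt) explain the same image point, so image measurements alone cannot tell the two reconstructions apart.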

  • Affine ambiguity

  • Projective ambiguity

  • Self-calibration

    Condition | N. Views

    Constant internal parameters | 3

    Aspect ratio and skew known; focal length and offset vary | 4

    Aspect ratio and skew known; focal length and offset vary | 5

    Skew = 0, all other parameters vary | 8

    Prior knowledge on cameras or scene can be used to add constraints and remove ambiguities

    Obtain metric reconstruction (up to scale)

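  • Bundle adjustment

    Non-linear method for refining structure and motion

    Minimizing re-projection error

    $E(M, X) = \sum_{i=1}^{m} \sum_{j=1}^{n} D(x_{ij}, M_i X_j)^2$

    It can be used before or after metric upgrade

    (Figure: points $x_{1j}, x_{2j}, x_{3j}$ observed in images $P_1, P_2, P_3$ versus the re-projections $M_1 X_j, M_2 X_j, M_3 X_j$ of $X_j$)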

  • Bundle adjustment

    Advantages

    - Handle large number of views

    - Handle missing data

    Limitations

    - Large minimization problem (parameters grow with number of views)

    - Requires good initial condition
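
The re-projection error E(M, X) that bundle adjustment minimizes can be evaluated as follows. This sketch only computes the cost. A real bundle adjuster would pass it to a non-linear least-squares solver, which is not shown. Marking missing observations with NaN is an assumption made here for illustration, as are all names and values.

```python
import numpy as np


def project_all(Ms, Xs):
    """Project n points with m cameras; returns pixel coordinates of shape (m, n, 2)."""
    Xh = np.hstack([Xs, np.ones((len(Xs), 1))])     # (n, 4) homogeneous points
    proj = np.einsum("mij,nj->mni", Ms, Xh)         # (m, n, 3)
    return proj[..., :2] / proj[..., 2:3]


def reprojection_error(Ms, Xs, xs):
    """E(M, X) = sum_i sum_j D(x_ij, M_i X_j)^2.

    Ms: (m, 3, 4) projection matrices, Xs: (n, 3) points,
    xs: (m, n, 2) observations; NaN marks a missing observation.
    """
    sq = np.sum((project_all(Ms, Xs) - xs) ** 2, axis=-1)   # (m, n) squared distances
    return float(np.nansum(sq))                             # missing entries are skipped


K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
Ms = np.stack([
    K @ np.hstack([np.eye(3), np.zeros((3, 1))]),
    K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])]),
])
Xs = np.array([[0.0, 0.0, 4.0], [1.0, 0.5, 5.0], [-0.5, 0.2, 6.0]])
xs = project_all(Ms, Xs)                    # noise-free observations

noisy = xs.copy()
noisy[0, 1] += np.array([3.0, 4.0])         # one observation off by 5 pixels
noisy[1, 2] = np.nan                        # one observation missing

E_clean = reprojection_error(Ms, Xs, xs)
E_noisy = reprojection_error(Ms, Xs, noisy)
```

Skipping unobserved (i, j) pairs is one simple way to realize the "handle missing data" advantage listed above.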

  • Results and applications

    Courtesy of Oxford Visual Geometry Group

    Levoy et al., 00

    Hartley & Zisserman, 00

    Dellaert et al., 00

    Rusinkiewicz et al., 02

    Nistér, 04

    Brown & Lowe, 04

    Schindler et al, 04

    Lourakis & Argyros, 04

    Colombo et al. 05

    Golparvar-Fard, et al. JAEI 10

    Pandey et al. IFAC , 2010

    Pandey et al. ICRA 2011

    Microsoft's PhotoSynth

    Snavely et al., 06-08

    Schindler et al., 08

    Agarwal et al., 09

    Frahm et al., 10

    Lucas & Kanade, 81

    Chen & Medioni, 92

    Debevec et al., 96

    Levoy & Hanrahan, 96

    Fitzgibbon & Zisserman, 98

    Triggs et al., 99

    Pollefeys et al., 99

    Kutulakos & Seitz, 99

  • M. Pollefeys et al 98---

    Results and applications

  • Results and applications

    Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," ACM Transactions on Graphics (SIGGRAPH Proceedings), 2006

    http://phototour.cs.washington.edu/Photo_Tourism.pdf

  • Computer vision

    3D Reconstruction 3D shape recovery 3D scene reconstruction Camera localization Pose estimation

    2D Recognition Object detection Texture classification Target tracking Activity recognition

  • Classification: Does this image contain a building? [yes/no]

    Yes!

  • Classification: Is this a beach?

  • Image Search

    Organizing photo collections

  • Detection: Does this image contain a car? [where?]

    car

  • Detection: Which object does this image contain? [where?]

    Building, clock, person, car

  • Detection: Accurate localization (segmentation)

    clock

  • Object detection is useful

    Surveillance

    Assistive technologies

    Security

    Assistive driving

    Computational photography

    http://www.scottcamazine.com/photos/SecurityXrays/images/briefcase7_jpg.jpg

  • Categorization vs Single instance recognition

    Which building is this? Marshall Field building in Chicago

  • Categorization vs Single instance recognition

    Where is the crunchy nut?

  • Applications of computer vision

    Recognizing landmarks in mobile platforms

    + GPS

  • Detection: Estimating object semantic & geometric attributes

    Object: Person, back; 1-2 meters away

    Object: Police car, side view, 4-5 m away

    Object: Building, 45° pose, 8-10 meters away. It has bricks

  • Activity or Event recognition

    What are these people doing?

  • Visual Recognition

    Design algorithms that are able to

    Classify images or videos

    Detect and localize objects

    Estimate semantic and geometrical attributes

    Classify human activities and events

    Why is this challenging?

  • How many object categories are there?

  • Challenges: viewpoint variation

    Michelangelo 1475-1564 slide credit: Fei-Fei, Fergus & Torralba

  • Challenges: illumination

    image credit: J. Koenderink

  • Challenges: scale

    slide credit: Fei-Fei, Fergus & Torralba

  • Challenges: deformation

  • Challenges: occlusion

    Magritte, 1957 slide credit: Fei-Fei, Fergus & Torralba

  • Challenges: background clutter

    Kilmeny Niland. 1995

  • Challenges: intra-class variation

  • Basic properties

    Representation

    How to represent an object category; which classification scheme?

    Learning

    How to learn the classifier, given training data

    Recognition

    How the classifier is to be used on novel data

  • Representation

    - Building blocks: Sampling strategies

    Randomly

    Multiple interest operators

    Interest operators

    Dense, uniformly

    Image credits: F-F. Li, E. Nowak, J. Sivic

  • Representation

    - Building blocks: Choice of descriptors

    [SIFT, HOG, codewords.]

  • Representation

    Appearance only or location and appearance

  • Representation

    Invariances

    View point

    Illumination

    Occlusion

    Scale

    Deformation

    Clutter

    etc.

  • Representation

    To handle intra-class variability, it is convenient to describe object categories using probabilistic models

    Object models: Generative vs Discriminative vs hybrid

  • Object categorization: the statistical viewpoint

    $p(\text{zebra} \mid \text{image})$ vs. $p(\text{no zebra} \mid \text{image})$

    Bayes rule: $p(A \mid B) = \dfrac{p(B \mid A)\, p(A)}{p(B)}$

    $\dfrac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})} = \dfrac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})} \cdot \dfrac{p(\text{zebra})}{p(\text{no zebra})}$

    posterior ratio = likelihood ratio x prior ratio

  • Object categorization: the statistical viewpoint

    Bayes rule:

    $\dfrac{p(\text{zebra} \mid \text{image})}{p(\text{no zebra} \mid \text{image})} = \dfrac{p(\text{image} \mid \text{zebra})}{p(\text{image} \mid \text{no zebra})} \cdot \dfrac{p(\text{zebra})}{p(\text{no zebra})}$

    posterior ratio = likelihood ratio x prior ratio

    Discriminative methods model the posterior

    Generative methods model the likelihood and prior
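
A worked example of the ratio form of Bayes' rule. The likelihood and prior values are made-up numbers chosen only to show how a strong likelihood ratio can still lose to a small prior.

```python
def posterior_ratio(lik_zebra, lik_no_zebra, prior_zebra):
    """p(zebra | image) / p(no zebra | image) = likelihood ratio * prior ratio."""
    likelihood_ratio = lik_zebra / lik_no_zebra
    prior_ratio = prior_zebra / (1.0 - prior_zebra)
    return likelihood_ratio * prior_ratio


# Hypothetical numbers: the image is 20 times more likely under "zebra",
# but zebras appear in only 1% of images.
ratio = posterior_ratio(lik_zebra=0.02, lik_no_zebra=0.001, prior_zebra=0.01)
p_zebra_given_image = ratio / (1.0 + ratio)        # about 0.168
decision = "zebra" if ratio > 1.0 else "no zebra"  # "no zebra"
```

A generative method would estimate the two likelihoods and the prior separately, as here, whereas a discriminative method would model the posterior (or its ratio) directly.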

  • Discriminative models

    Nearest neighbor ($10^6$ examples)

    Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005...

    Neural networks

    LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998

    Support Vector Machines

    Guyon, Vapnik; Heisele, Serre, Poggio

    Boosting

    Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006

    Latent SVM, Structural SVM

    Felzenszwalb 00; Ramanan 03

    Slide adapted from Antonio Torralba. Courtesy of Vittorio Ferrari. Slide credit: Kristen Grauman

  • Generative models

    Naïve Bayes classifier: Csurka, Bray, Dance & Fan, 2004

    Hierarchical Bayesian topic models (e.g. pLSA and LDA)

    - Object categorization: Sivic et al. 2005, Sudderth et al. 2005

    - Natural scene categorization: Fei-Fei et al. 2005

    2D part-based models

    - Constellation models: Weber et al 2000; Fergus et al 200

    - Star models: ISM (Leibe et al 05)

    3D part-based models

    - Multi-aspects: Sun, et al, 2009

  • Basic properties

    Representation

    How to represent an object category; which classification scheme?

    Learning

    How to learn the classifier, given training data

    Recognition

    How the classifier is to be used on novel data

  • Learning

    Learning parameters: What are you maximizing? Likelihood (Gen.) or performance on train/validation set (Disc.)

    Level of supervision: manual segmentation; bounding box; image labels; noisy labels

    Batch/incremental

    Training images: issue of overfitting; negative images for discriminative methods

    Priors

  • Basic properties

    Representation

    How to represent an object category; which classification scheme?

    Learning

    How to learn the classifier, given training data

    Recognition

    How the classifier is to be used on novel data

  • Recognition

    Recognition task: classification, detection, etc.

  • Recognition

    Recognition task

    Search strategy: Sliding Windows

    Viola, Jones 2001

    Simple

    Computational complexity (x, y, S, θ, N of classes)

    - BSW by Lampert et al 08

    - Also, Alexe et al 10

    Localization

    Objects are not boxes

  • Segmentation

    Bottom up segmentation

    Semantic segmentation

    Felzenszwalb and Huttenlocher, 2004

    Malik et al. 01

    Maire et al. 08

    Duygulu et al. 02

  • Recognition

    Recognition task

    Search strategy: Sliding Windows

    Viola, Jones 2001

    Simple

    Computational complexity (x, y, S, θ, N of classes)

    - BSW by Lampert et al 08

    - Also, Alexe et al 10

    Localization

    Objects are not boxes

    Prone to false positives

    Non-max suppression: Canny 86, ..., Desai et al., 2009
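
A toy sketch of the sliding-window search followed by greedy non-max suppression. The scoring function is a stand-in for a real classifier (here it simply measures overlap with a known target box), and the greedy, IoU-based suppression is one common variant, not necessarily the one used in the cited papers. All names, sizes and thresholds are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sliding_windows(width, height, win, stride):
    """Yield square windows (x1, y1, x2, y2) covering a width x height image."""
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield (x, y, x + win, y + win)


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs: keep a box only if it does not
    overlap an already kept, higher-scoring box by more than the threshold."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept


def toy_score(box, target=(40, 40, 72, 72)):
    """Stand-in for a classifier score: overlap with a known target box."""
    return iou(box, target)


windows = list(sliding_windows(width=128, height=128, win=32, stride=8))
detections = [(w, toy_score(w)) for w in windows if toy_score(w) > 0.3]
kept = non_max_suppression(detections, iou_threshold=0.3)
```

In this toy setup 169 windows are scored at a single scale, 13 of them fire around the target, and suppression keeps only the best one. Adding scales, orientations and classes multiplies the number of windows, which is the computational-complexity issue noted above.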

  • Successful methods using sliding windows

    [Dalal & Triggs, CVPR 2005]

    - Subdivide scanning window

    - In each cell compute histogram of gradient orientations

    Code available: http://pascal.inrialpes.fr/soft/olt/

    [Ferrari & al, PAMI 2008]

    - Subdivide scanning window

    - In each cell compute histogram of codewords of adjacent segments

    Code available: http://www.vision.ee.ethz.ch/~calvin

  • Recognition

    Recognition task

    Search strategy: Probabilistic heat maps

    Fergus et al 03

    Leibe et al 04

  • Recognition

    Recognition task

    Search strategy: Hypothesis generation + verification

  • Recognition

    Recognition task

    Search strategy

    Attributes

    Category: car; Azimuth = 225°; Zenith = 30°

    Savarese, 2007; Sun et al 2009; Liebelt et al., 08, 10; Farhadi et al 09

    - It has metal

    - It is glossy

    - It has wheels

    Farhadi et al 09; Lampert et al 09; Wang & Forsyth 09

  • Recognizing 3D objects

    CHAIR

    BED

    TABLE

    Xiang & Savarese, 2012-2014

    CAR

  • Recognition

    Recognition task

    Search strategy

    Attributes

    Context

    Semantic: Torralba et al 03; Rabinovich et al 07; Gupta & Davis 08; Heitz & Koller 08; L-J Li et al 08; Bang & Fei-Fei 10

    Geometric: Hoiem et al 06; Gould et al 09; Bao, Sun, Savarese 10

  • Recognition in context

    LabelMe dataset [Russell et al., 08]

    Bao, Sun, Savarese CVPR 2010; BMVC 2010; CIVC 2011 (editor choice); IJCV 2012

  • Recognition

    Recognition task

    Search strategy

    Attributes

    Context

    Tracking

  • State-of-the-art Object Tracking

    Xiang & Savarese, 2012-2014

  • Object tracking from Lidar

    Held, Thrun & Savarese, RSS 2014

  • Current state of computer vision

    3D Reconstruction 3D shape recovery 3D scene reconstruction Camera localization Pose estimation

    2D Recognition Object detection Texture classification Target tracking Activity recognition

    Perceiving the World in 3D!

  • Sensibility as human perception

    Biederman, Mezzanotte and Rabinowitz, 1982

  • Sensibility as human perception

    V1

    where pathway (dorsal stream)

    what pathway (ventral stream)

    Pre-frontal cortex

  • From images to the 3D scenes

    Choi & Savarese, 2013

    A 3DGP (3D Geometric Phrase) encodes geometric and semantic relationships between groups of objects and space elements which frequently co-occur in spatially consistent configurations.

    Training Dataset, 3DGPs

    Sofa, Coffee Table, Chair, Bed, Dining Table, Side Table

    Estimated Layout, 3D Geometric Phrases

  • From images to 3D scenes

    Bao & Savarese, 2011-2013

    Car, Person, Tree, Sky, Street, Building, Else

  • From videos to 3D dynamic scenes

    Choi & Savarese, 2011-2014

    Monocular cameras

    Un-calibrated cameras

    Arbitrary motion

    Highly cluttered scenes

    Occlusion

    Background clutter

    Almost in real time!

  • Summary

    Sensors

    Objects

    3D physical environment