3D Vision

3D Computer Visionand Video Computing 3D Vision

Topic 4 of Part II Visual Motion

CSc I6716Fall 2011

Cover Image/video credits: Rick Szeliski, MSR

Zhigang Zhu, City College of New York [email protected]

3D Computer Visionand Video Computing Outline of Motion

n Problems and Applicationsl The importance of visual motionl Problem Statement

n The Motion Field of Rigid Motionl Basics – Notations and Equationsl Three Important Special Cases: Translation, Rotation and Moving Planel Motion Parallax

n Optical Flowl Optical flow equation and the aperture probleml Estimating optical flowl 3D motion & structure from optical flow

n Feature-based Approachl Two-frame algorithml Multi-frame algorithml Structure from motion – Factorization method

n Advanced Topics l Spatio-Temporal Image and Epipolar Plane Imagel Video Mosaicing and Panorama Generationl Motion-based Segmentation and Layered Representation

3D Computer Visionand Video ComputingThe Importance of Visual Motion

n Structure from Motionl Apparent motion is a strong visual clue for 3D reconstruction

n More than a multi-camera stereo system

n Recognition by motion (only) l Biological visual systems use visual motion to infer properties of

3D world with little a priori knowledge of itn Blurred image sequence

n Visual Motion = Video ! [Go to CVPR 2004-2010 Sites for Workshops]l Video Coding and Compression: MPEG 1, 2, 4, 7…l Video Mosaicing and Layered Representation for IBRl Surveillance (Human Tracking and Traffic Monitoring)l HCI using Human Gesture (video camera)l Image-based Renderingl …

3D Computer Visionand Video Computing Blurred Sequence

An up-sampling from images of resolution 15x20 pixels From: James W. Davis. MIT Media Lab

Recognition by Actions: Recognize object from motion even if we cannot distinguish it in any images …

3D Computer Visionand Video Computing Problem Statement

n Two Subproblemsl Correspondence: Which elements of a frame correspond to which

elements in the next frame?l Reconstruction :Given a number of correspondences, and possibly

the knowledge of the camera’s intrinsic parameters, how to recovery the 3-D motion and structure of the observed world

n Main Difference between Motion and Stereol Correspondence: the disparities between consecutive frames are

much smaller due to dense temporal samplingl Reconstruction: the visual motion could be caused by multiple

motions ( instead of a single 3D rigid transformation)n The Third Subproblem, and Fourth….

l Motion Segmentation: what are the regions the the image plane corresponding to different moving objects?

l Motion Understanding: lip reading, gesture, expression, event…

3D Computer Visionand Video Computing Approaches

n Two Subproblemsl Correspondence:

n Differential Methods - >dense measure (optical flow)n Matching Methods -> sparse measure

l Reconstruction : More difficult than stereo since n Motion (3D transformation betw. Frames) as well as

structure needs to be recoveredn Small baseline causes large errors

n The Third Subprobleml Motion Segmentation: Chicken and Egg problem

n Which should be solved first? Matching or Segmentation Segmentation for matching elements Matching for Segmentation

3D Computer Visionand Video ComputingThe Motion Field of Rigid Objects

n Motion: l 3D Motion ( R, T):

n camera motion (static scene) n or single object motion n Only one rigid, relative motion between the camera and the scene

(object)l Image motion field:

n 2D vector field of velocities of the image points induced by the relative motion.

n Data: Image sequencel Many frames

n captured at time t=0, 1, 2, …l Basics: only consider two consecutive frames

n We consider a reference frame and its consecutive framel Image motion field

n can be viewed disparity map of the two frames captured at two consecutive camera locations ( assuming we have a moving camera)


n Notationsl P = (X,Y,Z)T: 3-D point in the camera

reference framel p = (x,y,f)T : the projection of the scene

point in the pinhole camera

n Relative motion between P and the cameral T= (Tx,Ty,Tz)T: translation component of

the motionl w=(wx, wy,wz)T: the angular velocity

n Note:l How to connect this with stereo geometry

(with R, T)?l Image velocity v= ?

PpZf

=

PωTV =

p

OX

P V

fZ

Y v


n Notationsl P = (X,Y,Z)T: 3-D point in the camera

reference framel p = (x,y,f)T : the projection of the scene

point in the pinhole camera

n Relative motion between P and the cameral T= (Tx,Ty,Tz)T: translation component of

the motionl w=(wx, wy,wz)T: the angular velocity

n Note:l How to connect this with stereo geometry

(with R, T)?

PpZf

=

PωTV =

PTVPP

==

00

0

xy

xz

yz

wwwwww

TPP

=

11

1

xy

xz

yz

wwwwww

=

coscoscossinsinsincossinsincossincoscossincoscossinsinsinsincoscossinsin

sinsincoscoscosR

3D Computer Visionand Video ComputingBasic Equations of Motion Field

n Notes:l Take the time derivative of both

sides of the projection equation

l The motion field is the sum of two componentsn Translational part n Rotational part

l Assume known intrinsic parameters

)(2 PVv zVZZf

=

PpZf

=PωTV =

=

z

y

x

z

y

x

y

x

T

TT

yfxf

Zfxxyfyfyfxxy

fvv

001)(1

22

22

w

ww

Rotation part: no depth information

Translation part: depth Z

3D Computer Visionand Video Computing Motion Field vs. Disparity

n Correspondence and Point Displacements

Stereo Motion

Disparity Motion field

Displacement – (dx, dy) Differential concept – velocity (vx, vy), i.e. time derivative (dx/dt, dy/dt)

No such constraint Consecutive frame close to guarantee good discrete approximation

3D Computer Visionand Video Computing Special Case 1: Pure Translation

n Pure Translation (w =0)

n Radial Motion Field (Tz <> 0)l Vanishing point p0 =(x0, y0)T :

n motion directionl FOE (focus of expansion)

n Vectors away from p0 if Tz < 0 l FOC (focus of contraction)

n Vectors towards p0 if Tz > 0l Depth estimation

n depth inversely proportional to magnitude of motion vector v, and also proportional to distance from p to p0

n Parallel Motion Field (Tz= 0)l Depth estimation:

n depth inversely proportional to magnitude of motion vector v

=

z

y

x

y

x

T

TT

yfxf

Zvv

001

=

0

0yyxx

ZT

vv zy

x

=

y

x

z TT

Tf

yx

0

0

20

20 )()( yyxxTZ z =

v

=

y

x

y

xTT

Zf

vv

Tz =0

22yx TTfZ =

v

3D Computer Visionand Video Computing Special Case 2: Pure Rotation

n Pure Rotation (T =0)l Does not carry 3D information

n Motion Field (approximation)l Small motionl A quadratic polynomial in image

coordinates (x,y,f)T

n Image Transformation between two frames (accurate)l Motion can be largel Homography (3x3 matrix) for all points

n Image mosaicing from a rotating camera l 360 degree panorama

=

z

y

x

y

x

fxxyfyfyfxxy

fvv

w

ww

22

22 )(1

RPP ='

PpZf

=

Rpp '

'''' PpZf

=

3D Computer Visionand Video Computing Special Case 3: Moving Plane

n Planes are common in the man-made world

n Motion Field (approximation)l Given small motion

l a quadratic polynomial in image

n Image Transformation between two frames (accurate)l Any amount of motion (arbitrary)l Homography (3x3 matrix) for all pointsl See Topic 5 Camera Models

n Image Mosaicing for a planar scenel Aerial image sequencel Video of blackboard

dZf

fnynxn zyx = )(

d=PnT

=

z

y

x

z

y

x

y

x

T

TT

yfxf

Zfxxyfyfyfxxy

fvv

001)(1

22

22

w

ww

App '

Only has 8 independent parameters (write it out!)

3D Computer Visionand Video Computing Special Cases: A Summary

n Pure Translationl Vanishing point and FOE (focus of expansion)l Only translation contributes to depth estimation

n Pure Rotationl Does not carry 3D informationl Motion field: a quadratic polynomial in image, or l Transform: Homography (3x3 matrix R) for all pointsl Image mosaicing from a rotating camera

n Moving Planel Motion field is a quadratic polynomial in image, orl Transform: Homography (3x3 matrix A) for all pointsl Image mosaicing for a planar scene

3D Computer Visionand Video Computing Motion Parallax

n [Observation 1] The relative motion field of two instantaneously coincident pointsl Does not depend on the rotational component of motionl Points towards (away from) the vanishing point of the

translation direction

n [Observation 2] The motion field of two frames after rotation compensation l only includes the translation component l points towards (away from) the vanishing point p0 ( the

instantaneous epipole)l the length of each motion vector is inversely proportional to

the depth, and also proportional to the distance from point p to the vanishing point p0 of the translation direction

l Question: how to remove rotation? n Active vision : rotation known approximately?


n [Observation 1] The relative motion field of two instantaneously coincident pointsl Does not depend on the rotational component of motionl Points towards (away from) the vanishing point of the

translation direction (the instantaneous epipole)

Epipole (x0, y0)

At instant t, three pairs of points happen to be coincident

The difference of the motion vectors of each pair cancels the rotational components

. … and the relative motion field point in ( towards or away from) the VP of the translational direction (Fig 8.5 ???)

0

0xxyy

vv

x

y

=


n [Observation 2] The motion field of two frames after rotation compensation

l only includes the translation component

l points towards (away from) the vanishing point p0 ( the instantaneous epipole)

l the length of each motion vector is inversely proportional to the depth,

l and also proportional to the distance from point p to the vanishing point p0 of the translation direction (if Tz <> 0)

Question: how to remove rotation? n Active vision : rotation known approximately?n Rotation compensation can be done by image

warping after finding three (3) pairs of coincident points

0

0xxyy

v

vTx

Ty

=

FOEp0

pv

20

20 )()( yyxx

ZTz =v

3D Computer Visionand Video Computing Summary

n Importance of visual motion (apparent motion)l Many applications…l Problems:

n correspondence, reconstruction, segmentation, understanding in x-y-t space

n Image motion field of rigid objectsl Time derivative of both sides of the projection equation

n Three important special casesl Pure translation – FOE l Pure rotation – no 3D information, but lead to mosaicingl Moving plane – homography with arbitrary motion

n Motion parallax l Only depends on translational component of motion

3D Computer Visionand Video Computing

n Next lecture

3D Computer Visionand Video Computing Notion of Optical Flow

n The Notion of Optical Flow l Brightness constancy equation

n Under most circumstance, the apparent brightness of moving objects remain constant

l Optical Flow Equationn Relation of the apparent motion

with the spatial and temporal derivatives of the image brightness

n Aperture probleml Only the component of the motion

field in the direction of the spatial image gradient can be determined

l The component in the direction perpendicular to the spatial gradient is not constrained by the optical flow equation

0),,(=

dttyxdE

0= tyx EvEuE

?

3D Computer Visionand Video Computing Estimating Optical Flow

n Constant Flow Methodl Assumption: the motion field is well approximated by a

constant vector within any small region of the image planel Solution: Least square of two variables (u,v) from NxN

Equations – NxN (=5x5) planar patchl Condition: ATA is NOT singular (null or parallel gradients)

n Weighted Least Square Methodl Assumption: the motion field is approximated by a constant

vector within any small region, and the error made by the approximation increases with the distance from the center where optical flow is to be computed

l Solution: Weighted least square of two variables (u,v) from

NxN Equations – NxN patch n Affine Flow Method

l Assumption: the motion field is well approximated by a affine parametric model uT = ApT+b (a plane patch with arbitrary orientation)

l Solution: Least square of 6 variables (A,b) from NxN

Equations – NxN planar patch

3D Computer Visionand Video Computing Using Optical Flow

n 3D motion and structure from optical flow (p 208- 212)l Input:

n Intrinsic camera parametersn dense motion field (optical flow) of single rigid motion

l Algorithm n ( good comprise between ease of implementation and quality of results)n Stage 1: Translation direction

Epipole (x0, y0) through approximate motion parallax Key: Instantaneously coincident image points Approximation: estimating differences for ALMOST coincident image points

n Stage 2: Rotation flow and Depth Knowns: flow vector, and direction of translational component One point, one equation (without depth)–

w Least square approximation of the rotational component of flow From motion field to depth

l Outputn Direction of translation (f Tx/Tz, f Ty/Tz, f) = (x0, y0, f)n Angular velocity n 3-D coordinates of scene points (up to a common unknown scale)

=

y

x

z TT

Tf

yx

0

0

0

0xxyy

v

vTx

Ty

=

3D Computer Visionand Video Computing Some Details

n Step 1. Get (Tx, Ty, Tz) = s (x0,y0,f)n Step 2. For every point (x,y,f) with known v, get one

equation about w from the motion equation (by eliminate Z since it’s different from point to point)

n Step 3. Get Z (up to a scale s) given T/s and w

=

z

y

x

z

y

x

y

x

T

TT

yfxf

Zfxxyfyfyfxxy

fvv

001)(1

22

22

w

ww

Rotation part: no depth information

Translation part: depth Z

3D Computer Visionand Video Computing Feature-Based Approach

n Two frame method - Feature matchingl An Algorithm Based on the Constant Flow Method

n Features – corners detection by observing the coefficient matrix of the spatial gradient evaluation (2x2 matrix ATA)

n Iteration approach: estimation – warping – comparison

n Multiple frame method - Feature trackingl Kalman Filter Algorithm

n Estimating the position and uncertainty of a moving feature in the next frame

n Two parts: prediction (from previous trajectory) and measurement from feature matching

n Using a sparse motion field l 3D motion and structure by feature tracking over framesl Factorization method

n Orthographic projection modeln Feature tracking over multiple framesn SVD

3D Computer Visionand Video Computing Motion-Based Segmentation

n Change Detectionl Stationary camera(s), multiple moving subjectsl Background modeling and updatingl Background subtractionl Occlusion handling

n Layered representation (I)– rotating cameral Rotating camera + Independent moving objectsl Sprite - background mosaicingl Synopsis – foreground object sequences

n Layered representation (II)– translating (and rotating) cameral Arbitrary camera motion l Scene segmentation into layers

3D Computer Visionand Video Computing Summary

n After learning motion, you should be able tol Explain the fundamental problems of motion analysisl Understand the relation of motion and stereol Estimate optical flow from a image sequencel Extract and track image features over time l Estimate 3D motion and structure from sparse motion

fieldl Extract Depth from 3D ST image formation under

translational motionl Know some important application of motion, such as

change detection, image mosaicing and motion-based segmentation

3D Computer Visionand Video Computing Next

n Reviews, Exam and Projects

Exam&

Project Presentations

n Homework #4 due in May 03, 2011 before class

Documents

3D Vision