The Measurement of Visual Motion P. Anandan Microsoft Research



WHY BOTHER

• Visual Motion can be annoying
– Camera instabilities, jitter
– Measure it. Remove it.

• Visual Motion indicates dynamics in the scene
– Moving objects, behavior
– Track objects and analyze trajectories

• Visual Motion reveals spatial layout of the scene
– Motion parallax

Video Enhancement

• Visual Motion can be annoying
– Camera instabilities, jitter
– Measure it. Remove it.

Temporal Information

• Visual Motion indicates dynamics in the scene
– Moving objects, behavior
– Track objects and analyze trajectories

Spatial Layout

• Visual Motion reveals spatial layout of the scene
– Motion parallax

Sprite Viewer Demo

Classes of Techniques

• Feature-based methods
– Extract salient visual features (corners, textured areas) and track them over multiple frames
– Analyze the global pattern of motion vectors of these features
– Sparse motion fields, but possibly robust tracking
– Suitable especially when image motion is large (10s of pixels)

• Direct methods
– Directly recover image motion from spatio-temporal image brightness variations
– Global motion parameters directly recovered without an intermediate feature motion calculation
– Dense motion fields, but more sensitive to appearance variations
– Suitable for video and when image motion is small (< 10 pixels)

Our Focus Today

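The Brightness Constraint

Brightness Constancy Equation:

$$ J(x,y) = I(x + u(x,y),\; y + v(x,y)) $$

Or, better still, minimize:

$$ E(u,v) = \left( J(x,y) - I(x+u,\, y+v) \right)^2 $$

Linearizing (assuming small (u,v)):

$$ J(x,y) \approx I(x,y) + I_x(x,y)\,u(x,y) + I_y(x,y)\,v(x,y) $$

Gradient Constraint (or the Optical Flow Constraint)

$$ E(u,v) = (I_x u + I_y v + I_t)^2, \qquad I_t = I(x,y) - J(x,y) $$

Minimizing:

$$ \frac{\partial E}{\partial u} = \frac{\partial E}{\partial v} = 0 $$

$$ I_x (I_x u + I_y v + I_t) = 0 $$

$$ I_y (I_x u + I_y v + I_t) = 0 $$

The gradient constraint – only one constraint for each pixel

In general $I_x, I_y \neq 0$. Hence,

$$ I_x u + I_y v + I_t = 0 $$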

Aperture Problem and Normal Flow

The gradient constraint:

$$ I_x u + I_y v + I_t = 0 $$

$$ \nabla I \cdot U + I_t = 0, \qquad U = (u, v)^T $$

Defines a line in the (u,v) space

[Figure: the constraint line drawn in the (u, v) plane]

Normal Flow:

$$ u_\perp = -\frac{I_t}{|\nabla I|}\,\frac{\nabla I}{|\nabla I|} $$
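As a small illustration of the normal-flow formula, the sketch below computes, for a single pixel, the point on the constraint line closest to the origin. It is only a sketch in plain Python: the function name `normal_flow` and the `eps` threshold are illustrative choices and do not come from the slides.

```python
def normal_flow(ix, iy, it, eps=1e-12):
    """Return the normal-flow vector (u, v) for one pixel.

    ix, iy are the spatial gradient components and it is the temporal
    derivative. The result is the point on the line
    ix*u + iy*v + it = 0 that is closest to the origin, i.e. the flow
    component along the gradient direction.
    """
    mag2 = ix * ix + iy * iy
    if mag2 < eps:
        return None  # no gradient: the constraint carries no information
    scale = -it / mag2
    return (scale * ix, scale * iy)


# Example: gradient (3, 4) with temporal derivative -5.
u, v = normal_flow(ix=3.0, iy=4.0, it=-5.0)
residual = 3.0 * u + 4.0 * v - 5.0  # should vanish on the constraint line
```

The component of the flow along the edge (perpendicular to the gradient) is left undetermined by this computation, which is the aperture problem in its simplest form.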

Local Patch Analysis

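Combining Local Constraints

[Figure: constraint lines from several pixels drawn in the (u, v) plane]

$$ \nabla I^{1} \cdot U = -I_t^{1}, \qquad \nabla I^{2} \cdot U = -I_t^{2}, \qquad \nabla I^{3} \cdot U = -I_t^{3}, \qquad \text{etc.} $$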

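Patch Translation

Assume a single velocity for all pixels within an image patch

$$ E(u,v) = \sum_{x,y} \left( I_x(x,y)\,u + I_y(x,y)\,v + I_t \right)^2 $$

Minimizing

$$ \begin{pmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = -\begin{pmatrix} \sum I_x I_t \\ \sum I_y I_t \end{pmatrix} $$

$$ \left( \sum \nabla I\, \nabla I^T \right) U = -\sum \nabla I\, I_t $$

On the LHS: sum of the 2x2 outer product tensor of the gradient vector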

Singularities and the Aperture Problem

Let

$$ M = \sum \nabla I\, \nabla I^T \qquad \text{and} \qquad b = -\begin{pmatrix} \sum I_x I_t \\ \sum I_y I_t \end{pmatrix} $$

• Algorithm: At each pixel compute $U$ by solving $M U = b$

• M is singular if all gradient vectors point in the same direction
-- e.g., along an edge
-- of course, trivially singular if the summation is over a single pixel
-- i.e., only normal flow is available (aperture problem)

• Corners and textured areas are OK
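The patch solve can be sketched as follows: accumulate the entries of M and b from the per-pixel derivatives, then invert the 2x2 system, reporting failure when M is (near) singular. This is a minimal sketch in plain Python; the function name `solve_patch_flow`, the tuple layout of `grads`, and the absolute determinant threshold `det_tol` are assumptions made for illustration.

```python
def solve_patch_flow(grads, det_tol=1e-9):
    """Solve M U = b for one patch.

    grads is an iterable of (ix, iy, it) tuples, one per pixel in the
    patch. Returns (u, v), or None when M is (near) singular, which is
    the aperture-problem case.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for ix, iy, it in grads:
        sxx += ix * ix
        sxy += ix * iy
        syy += iy * iy
        sxt += ix * it
        syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < det_tol:
        return None
    # b = -(sxt, syt); apply the closed-form inverse of the 2x2 matrix M.
    u = (-sxt * syy + syt * sxy) / det
    v = (-syt * sxx + sxt * sxy) / det
    return (u, v)


# A "corner": gradients in several directions, consistent with flow (0.5, -0.25).
true_u, true_v = 0.5, -0.25
corner = [(ix, iy, -(ix * true_u + iy * true_v))
          for ix, iy in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0)]]
corner_flow = solve_patch_flow(corner)

# An "edge": every gradient is parallel to (1, 2), so M has rank one.
edge = [(ix, iy, -(ix * true_u + iy * true_v))
        for ix, iy in [(1.0, 2.0), (2.0, 4.0), (-1.0, -2.0)]]
edge_flow = solve_patch_flow(edge)
```

In practice one would more likely test the smaller eigenvalue of M against a threshold rather than the raw determinant; the determinant test is used here only to keep the sketch short.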

Iterative Refinement

• Estimate velocity at each pixel using one iteration of Lucas and Kanade estimation

• Warp one image toward the other using the estimated flow field (easier said than done)

• Refine estimate by repeating the process
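The estimate, warp, repeat loop can be illustrated on a 1-D signal. The sketch below assumes the convention J(x) ≈ I(x + u), uses linear interpolation for the warp and central differences for the gradient, and accumulates one gradient-based update per iteration. All names (`sample`, `refine_shift`, `margin`) and the sinusoidal test signal are illustrative assumptions, not taken from the slides.

```python
import math


def sample(signal, x):
    """Linearly interpolate signal at real-valued position x (clamped)."""
    x = min(max(x, 0.0), len(signal) - 1.0)
    i = min(int(x), len(signal) - 2)
    frac = x - i
    return (1.0 - frac) * signal[i] + frac * signal[i + 1]


def refine_shift(I, J, iterations=5, margin=8):
    """Estimate u such that J(x) ~ I(x + u).

    Each iteration warps I by the current estimate, solves the
    linearized least-squares problem for a correction, and adds it.
    """
    u = 0.0
    for _ in range(iterations):
        num = den = 0.0
        for x in range(margin, len(I) - margin):
            warped = sample(I, x + u)
            # Central-difference gradient of the warped signal.
            grad = 0.5 * (sample(I, x + u + 1) - sample(I, x + u - 1))
            num += grad * (J[x] - warped)
            den += grad * grad
        if den == 0.0:
            break  # no intensity structure in the window
        u += num / den
    return u


n, true_u = 64, 1.7
I = [math.sin(2 * math.pi * x / 32.0) for x in range(n)]
J = [math.sin(2 * math.pi * (x + true_u) / 32.0) for x in range(n)]
one_step = refine_shift(I, J, iterations=1)
refined = refine_shift(I, J, iterations=5)
```

On this smooth test signal a single step already lands close to the true shift, and the repeated warp-and-solve steps remove most of the remaining linearization error.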

Motion and Gradients

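Consider 1-d signal; assume linear function of x

[Figure: intensity I plotted against x at t=0 and t=1; the signal is displaced by u]

$$ \frac{dI}{dt} = \frac{\partial I}{\partial x}\,u + \frac{\partial I}{\partial t} = 0 \;\;\Rightarrow\;\; I_x u + I_t = 0 $$

Iterative refinement

[Figure: iterative refinement on the 1-d signal; labels x and t]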

BUT!!

Limits of the gradient method

1. Fails when intensity structure within window is poor

2. Fails when the displacement is large (typical operating range is motion of 1 pixel)

– Linearization of brightness is suitable only for small displacements

Also, brightness is not strictly constant in images
– actually less problematic than it appears, since we can pre-filter images to make them look similar

Pyramids

• Pyramids were introduced as a multi-resolution image computation paradigm in the early 80s.

• The most popular pyramid is the Burt pyramid, which foreshadows wavelets

Two kinds of pyramids:

• Low pass or “Gaussian pyramid”

• Band-pass or “Laplacian pyramid”

Gaussian Pyramid

• Convolve image with a small Gaussian kernel
– Typically 5x5

• Subsample (decimate by 2) to get lower resolution image

• Repeat for more levels

• A sequence of low-pass filtered images

$$ G_0(x,y) = I(x,y) $$

$$ \hat{G}_l(x,y) = G_l(x,y) * g(x,y) $$

$$ G_{l+1}(x,y) = \hat{G}_l(2x,\, 2y) $$
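The blur-then-decimate recursion can be sketched in a few lines. The example below uses plain Python lists and a separable 5-tap binomial kernel [1, 4, 6, 4, 1]/16 as the "small Gaussian kernel"; that specific kernel, the border replication, and the function names are assumptions of this sketch rather than details given on the slide.

```python
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # 5-tap binomial, sums to 1


def blur_rows(img):
    """Convolve every row with KERNEL, replicating border pixels."""
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(w * row[min(max(x + k - 2, 0), n - 1)]
                for k, w in enumerate(KERNEL))
            for x in range(n)
        ])
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def blur(img):
    """Separable 5x5 blur: filter the rows, then the columns."""
    return transpose(blur_rows(transpose(blur_rows(img))))


def gaussian_pyramid(img, levels):
    """Return [G_0, G_1, ...] with G_{l+1}(x, y) = blur(G_l)(2x, 2y)."""
    pyramid = [img]
    for _ in range(levels - 1):
        smoothed = blur(pyramid[-1])
        pyramid.append([row[::2] for row in smoothed[::2]])
    return pyramid


# A 16x16 checkerboard reduced over three levels.
image = [[float((x + y) % 2) for x in range(16)] for y in range(16)]
pyr = gaussian_pyramid(image, levels=3)
sizes = [(len(g), len(g[0])) for g in pyr]

# A constant image must stay constant because the kernel sums to one.
flat = gaussian_pyramid([[5.0] * 8 for _ in range(8)], levels=2)[1]
```

Because the kernel is separable, the 5x5 convolution costs two 5-tap passes per pixel instead of 25 multiplications.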

Laplacian pyramid

• Laplacian as Difference of Gaussian

• Band-pass filtered images

• Highlights edges at different spatial scales

• For matching, this is less sensitive to image illumination changes

• But more noisy than using Gaussians

$$ L_l(x,y) = G_l(x,y) - \hat{G}_l(x,y) $$

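Coarse-to-Fine Estimation

[Figure: pyramid of image J and pyramid of image I. At each level, J is warped by the current estimate a to give J_w, which is refined against I to update a to a + Δa.]

u=10 pixels

u=5 pixels

u=2.5 pixels

u=1.25 pixels

$I_x u + I_y v + I_t \approx 0$ ==> small u and v ...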
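The arithmetic behind the pyramid levels is simple: each decimation by 2 halves the apparent displacement. The sketch below lists the displacement seen at each level and counts how many levels are needed before the motion at the coarsest level falls within a chosen limit. The function names and the default limit of 1 pixel are assumptions for illustration.

```python
def displacement_per_level(u_full, levels):
    """Displacement in pixels seen at each level (level 0 = full resolution)."""
    return [u_full / (2 ** level) for level in range(levels)]


def levels_for_small_motion(u_full, limit=1.0):
    """Smallest number of levels so that the coarsest-level motion <= limit."""
    levels = 1
    u = abs(u_full)
    while u > limit:
        u /= 2.0
        levels += 1
    return levels


per_level = displacement_per_level(10.0, 4)
needed = levels_for_small_motion(10.0, limit=1.0)
```

When moving from a coarse level to the next finer one, the estimate obtained so far is multiplied by 2 before it is used to warp the finer image.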

Global Motion Models

2D Models:
• Affine
• Quadratic
• Planar projective transform (Homography)

3D Models:
• Instantaneous camera motion models
• Homography+epipole
• Plane+Parallax

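Example: Affine Motion

$$ u(x,y) = a_1 + a_2 x + a_3 y $$

$$ v(x,y) = a_4 + a_5 x + a_6 y $$

Substituting into the B.C. Equation $I_x u + I_y v + I_t \approx 0$:

$$ I_x (a_1 + a_2 x + a_3 y) + I_y (a_4 + a_5 x + a_6 y) + I_t \approx 0 $$

Each pixel provides 1 linear constraint in 6 global unknowns (minimum 6 pixels necessary)

Least Square Minimization (over all pixels):

$$ Err(a) = \sum \left[ I_x (a_1 + a_2 x + a_3 y) + I_y (a_4 + a_5 x + a_6 y) + I_t \right]^2 $$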
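The least-squares problem for the six affine parameters reduces to a 6x6 linear system (the normal equations). The sketch below builds that system from per-pixel samples and solves it with a small Gaussian elimination routine, so that it needs nothing beyond the standard library. The function names, the sample layout `(x, y, ix, iy, it)`, and the synthetic gradients are assumptions made for this example.

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: gradients lack variety")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def estimate_affine(samples):
    """Return [a1..a6] minimizing
    sum (ix*(a1 + a2*x + a3*y) + iy*(a4 + a5*x + a6*y) + it)^2
    over samples given as (x, y, ix, iy, it) tuples."""
    AtA = [[0.0] * 6 for _ in range(6)]
    Atb = [0.0] * 6
    for x, y, ix, iy, it in samples:
        row = [ix, ix * x, ix * y, iy, iy * x, iy * y]
        for i in range(6):
            Atb[i] -= row[i] * it
            for j in range(6):
                AtA[i][j] += row[i] * row[j]
    return solve_linear(AtA, Atb)


# Synthetic check: derivatives that satisfy the constraint exactly.
true_a = [0.5, 0.01, -0.02, -0.3, 0.03, 0.015]
samples = []
for y in range(-3, 4):
    for x in range(-3, 4):
        ix = math.sin(1.3 * x + 2.1 * y)  # made-up gradients, varied directions
        iy = math.cos(0.7 * x - 1.9 * y)
        u = true_a[0] + true_a[1] * x + true_a[2] * y
        v = true_a[3] + true_a[4] * x + true_a[5] * y
        samples.append((x, y, ix, iy, -(ix * u + iy * v)))
a_hat = estimate_affine(samples)
```

With real images the derivatives are noisy and the recovered parameters are only approximate; the exact recovery here is a property of the synthetic data.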

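Other 2D Motion Models

Quadratic – instantaneous approximation to planar motion

$$ u = q_1 + q_2 x + q_3 y + q_7 x^2 + q_8 xy $$

$$ v = q_4 + q_5 x + q_6 y + q_7 xy + q_8 y^2 $$

Projective – exact planar motion

$$ x' = \frac{h_1 + h_2 x + h_3 y}{h_7 + h_8 x + h_9 y}, \qquad y' = \frac{h_4 + h_5 x + h_6 y}{h_7 + h_8 x + h_9 y} $$

and

$$ u = x' - x, \qquad v = y' - y $$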
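To make the two parametric models concrete, the sketch below evaluates the flow (u, v) at a point under the quadratic model and under the projective model, assuming the parameter ordering x' = (h1 + h2 x + h3 y)/(h7 + h8 x + h9 y) and y' = (h4 + h5 x + h6 y)/(h7 + h8 x + h9 y). The function names are illustrative.

```python
def quadratic_flow(q, x, y):
    """Flow under the 8-parameter quadratic model, q = (q1, ..., q8)."""
    q1, q2, q3, q4, q5, q6, q7, q8 = q
    u = q1 + q2 * x + q3 * y + q7 * x * x + q8 * x * y
    v = q4 + q5 * x + q6 * y + q7 * x * y + q8 * y * y
    return (u, v)


def projective_flow(h, x, y):
    """Flow under the projective model, h = (h1, ..., h9)."""
    h1, h2, h3, h4, h5, h6, h7, h8, h9 = h
    w = h7 + h8 * x + h9 * y
    xp = (h1 + h2 * x + h3 * y) / w
    yp = (h4 + h5 * x + h6 * y) / w
    return (xp - x, yp - y)


# With this parameter ordering the identity mapping is h2 = h6 = h7 = 1.
identity = (0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0)
no_motion = projective_flow(identity, 3.0, -2.0)

# A pure translation by (2, -1) only changes h1 and h4.
shift = (2.0, 1.0, 0.0, -1.0, 0.0, 1.0, 1.0, 0.0, 0.0)
translation = projective_flow(shift, 3.0, -2.0)

# The quadratic model with only q1 and q4 set is the same translation.
quad_translation = quadratic_flow((2.0, 0, 0, -1.0, 0, 0, 0, 0), 3.0, -2.0)
```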

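3D Motion Models

Instantaneous camera motion:

Global parameters: $\Omega_X, \Omega_Y, \Omega_Z, T_X, T_Y, T_Z$

Local Parameter: $Z(x,y)$

$$ u = -xy\,\Omega_X + (1 + x^2)\,\Omega_Y - y\,\Omega_Z + \frac{T_X - T_Z x}{Z} $$

$$ v = -(1 + y^2)\,\Omega_X + xy\,\Omega_Y + x\,\Omega_Z + \frac{T_Y - T_Z y}{Z} $$

Homography+Epipole

Global parameters: $h_1, \ldots, h_9, t_1, t_2, t_3$

Local Parameter: $\gamma(x,y)$

$$ x' = \frac{h_1 x + h_2 y + h_3 + \gamma t_1}{h_7 x + h_8 y + h_9 + \gamma t_3}, \qquad y' = \frac{h_4 x + h_5 y + h_6 + \gamma t_2}{h_7 x + h_8 y + h_9 + \gamma t_3} $$

and: $u = x' - x, \quad v = y' - y$

Residual Planar Parallax Motion

Global parameters: $t_1, t_2, t_3$

Local Parameter: $\gamma(x,y)$

$$ u = x^w - x = \frac{\gamma}{1 + \gamma t_3}\,(t_3 x - t_1) $$

$$ v = y^w - y = \frac{\gamma}{1 + \gamma t_3}\,(t_3 y - t_2) $$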

Correlation and SSD

• For larger displacements, do template matching

– Define a small area around a pixel as the template

– Match the template against each pixel within a search area in next image.

– Use a match measure such as correlation, normalized correlation, or sum-of-squares difference

– Choose the maximum (or minimum) as the match

– Sub-pixel interpolation also possible
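A minimal version of the template search, using sum-of-squares difference as the match measure, might look like the sketch below. It works on images stored as nested lists; the names `ssd` and `match_template`, the window half-size, the search radius, and the synthetic texture are all assumptions for illustration.

```python
def ssd(img_a, img_b, ax, ay, bx, by, half):
    """Sum of squared differences between two (2*half+1)^2 windows."""
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            d = img_a[ay + dy][ax + dx] - img_b[by + dy][bx + dx]
            total += d * d
    return total


def match_template(I, J, x, y, half=2, search=3):
    """Find the integer (u, v) minimizing the SSD between the template
    around (x, y) in I and the window around (x + u, y + v) in J.
    Also returns the full SSD surface as a dict keyed by (u, v)."""
    best = None
    surface = {}
    for v in range(-search, search + 1):
        for u in range(-search, search + 1):
            e = ssd(I, J, x, y, x + u, y + v, half)
            surface[(u, v)] = e
            if best is None or e < best[0]:
                best = (e, u, v)
    return best[1], best[2], surface


def texture(x, y):
    """Synthetic pattern that does not repeat within the search range."""
    return float((3 * x * x + 7 * y + 5 * x * y) % 11)


size = 20
I = [[texture(x, y) for x in range(size)] for y in range(size)]
# J shows the same content moved by (2, 1): J[y + 1][x + 2] == I[y][x].
J = [[texture(x - 2, y - 1) for x in range(size)] for y in range(size)]
u, v, surface = match_template(I, J, x=10, y=10)
```

The returned `surface` is the SSD surface for that pixel: it has a sharp minimum in textured areas, a valley along an edge, and is nearly flat in homogeneous areas.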

SSD Surface – Textured area

SSD Surface -- Edge

SSD Surface – homogeneous area

Discrete Search vs. Gradient Based Estimation

Consider image I translated by $(u_0, v_0)$

$$ I_0(x,y) = I(x,y) $$

$$ I_1(x + u_0,\, y + v_0) = I(x,y) + \eta_1(x,y) $$

$$ E(u,v) = \sum_{x,y} \left( I(x,y) - I_1(x + u,\, y + v) \right)^2 $$

$$ \phantom{E(u,v)} = \sum_{x,y} \left( I(x,y) - I(x - u_0 + u,\, y - v_0 + v) - \eta_1(x,y) \right)^2 $$

The discrete search method simply searches for the best estimate.

The gradient method linearizes the intensity function and solves for the estimate.

Uncertainty in Local Estimation

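$$ J(\mathbf{x} + \mathbf{u}_0) = I(\mathbf{x}) + \eta(\mathbf{x}), \qquad \mathbf{u}_0 = (u_0, v_0) $$

where $\eta$ is Gaussian noise: $N(0, \sigma^2)$

Now,

$$ P(\mathbf{u} \mid J; I) = \frac{P(J \mid \mathbf{u}; I)\,P(\mathbf{u})}{P(J)} \;\propto\; P(J \mid \mathbf{u}; I) \;\propto\; e^{-\frac{1}{2\sigma^2} E_d(\mathbf{u})} $$

where $E_d(\mathbf{u}) = \sum_{\mathbf{x}} \left( I(\mathbf{x}) - J(\mathbf{x} + \mathbf{u}) \right)^2$

This assumes uniform priors on the velocity field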

Quadratic Approximation

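When $\mathbf{u}_0, \mathbf{u}$ are small

$$ E_d(\mathbf{u}) = \sum_{\mathbf{x}} \left( I(\mathbf{x}) - J(\mathbf{x}) - \nabla J^T \mathbf{u} \right)^2 = \sum_{\mathbf{x}} \left( \nabla J^T \mathbf{u} + I_t \right)^T \left( \nabla J^T \mathbf{u} + I_t \right) $$

where $I_t = J(\mathbf{x}) - I(\mathbf{x})$

After some fiddling around, we can show

$$ E_d(\mathbf{u}) = (\mathbf{u} - \mathbf{u}^*)^T A\,(\mathbf{u} - \mathbf{u}^*) + \text{const} $$

where $A = \sum \nabla J\, \nabla J^T$ and $\mathbf{u}^* = -A^{-1} \sum \nabla J\, I_t$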

Posterior uncertainty

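$$ P(\mathbf{u} \mid J; I) \;\propto\; e^{-\frac{1}{2\sigma^2} E_d(\mathbf{u})} \;\propto\; e^{-\frac{1}{2\sigma^2} (\mathbf{u} - \mathbf{u}^*)^T A\,(\mathbf{u} - \mathbf{u}^*)} $$

At edges $A$ is singular, but just take pseudo-inverse

Note that the error is always convex, since $A$ is positive semi-definite, i.e., even for occluded points and other false matches, this is the case… seems a bit odd!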

Match plus confidence

• Numerically compute error $E_d(\mathbf{u})$ for various $\mathbf{u}$

• Search for the peak

• Numerically fit a quadratic to $E_d$ around the peak

• Find sub-pixel estimate for $\mathbf{u}$ and covariance

• If the matrix $A$ is negative, it is a false match

• Or even better, if you can afford it, simply maintain a discrete sampling of $E_d$ and $P(\mathbf{u} \mid J; I)$
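In one dimension, the "fit a quadratic around the peak" step has a closed form: a parabola through three neighboring error samples gives both a sub-pixel offset and a curvature, which plays the role of the matrix A. The sketch below is a 1-D illustration only; the function name and the choice to return `None` for non-positive curvature are assumptions of this example.

```python
def subpixel_minimum(e_minus, e_zero, e_plus):
    """Fit a parabola through error samples taken at u-1, u and u+1.

    Returns (offset, curvature), where offset is the sub-pixel
    correction to add to u and curvature is the second derivative of
    the fitted parabola. Returns None when the curvature is not
    positive, i.e. the samples do not bracket a minimum.
    """
    curvature = e_minus - 2.0 * e_zero + e_plus
    if curvature <= 0.0:
        return None  # treat as a false match
    offset = 0.5 * (e_minus - e_plus) / curvature
    return (offset, curvature)


def error(d):
    """Example error curve with its minimum at d = 0.3."""
    return 3.0 * (d - 0.3) ** 2 + 1.0


fit = subpixel_minimum(error(-1.0), error(0.0), error(1.0))
rejected = subpixel_minimum(1.0, 2.0, 1.0)  # a local maximum, not a minimum
```

A large curvature corresponds to a sharply peaked posterior (a confident match), while a small curvature indicates a broad, uncertain one.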

Choosing the Correlation Window Size

• Small windows lead to more false matches

• Large windows are better this way, but…
– Neighboring flow vectors will be more correlated (since the template windows have more in common)
– Flow resolution also lower (same reason)
– More expensive to compute

Another way to look at this:

• Small windows are good for local search: more precise and less smooth

• Large windows are good for global search: less precise and more smooth

Robust Estimation

Standard Least Squares Estimation allows too much influence for outlying points

$$ E(m) = \sum_i \rho(x_i), \qquad \rho(x_i) = (x_i - m)^2 $$

Influence:

$$ \psi(x_i) = \frac{\partial \rho}{\partial x_i} \propto (x_i - m) $$

Robust Estimation

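Robust gradient constraint:

$$ E_d(u_s, v_s) = \sum \rho\left( I_x u_s + I_y v_s + I_t \right) $$

Robust SSD:

$$ E_d(u_s, v_s) = \sum \rho\left( I(x,y) - J(x + u_s,\, y + v_s) \right) $$

Influence functions and Redescending Estimators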

GO TO THE BOARD

Reweighted Least-Squares

Robust Minimization (over all pixels):

$$ Err(a) = \sum W(x,y) \left( I_x(x,y)\,u(x,y) + I_y(x,y)\,v(x,y) + I_t(x,y) \right)^2 $$
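The reweighting idea is easiest to see on the scalar problem of estimating a single value m from samples that include an outlier. The sketch below alternates between computing weights from the current residuals and solving the weighted least-squares problem (here just a weighted mean). The Geman-McClure error function, whose weights are proportional to 1/(σ² + r²)², is an assumption of this example; the slides do not name a particular ρ. The function name and the value of `sigma` are likewise illustrative.

```python
def robust_mean(values, sigma=1.0, iterations=10):
    """Iteratively reweighted least-squares estimate of a location m.

    Weights come from the Geman-McClure function
    rho(r) = r^2 / (sigma^2 + r^2), for which the weight psi(r)/r is
    proportional to 1 / (sigma^2 + r^2)^2, so large residuals get
    almost no influence.
    """
    m = sum(values) / len(values)  # start from ordinary least squares
    for _ in range(iterations):
        weights = [1.0 / (sigma ** 2 + (x - m) ** 2) ** 2 for x in values]
        m = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return m


data = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]  # five inliers near 1, one outlier
least_squares = sum(data) / len(data)
robust = robust_mean(data)
final_weights = [1.0 / (1.0 + (x - robust) ** 2) ** 2 for x in data]
```

The final weights double as an outlier map: the sample at 10.0 ends up with a weight several orders of magnitude below that of the inliers. For motion estimation the same loop applies with the weighted mean replaced by the weighted solve for the motion parameters.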

An Outlier Measure

$W(x,y)$ is inversely related to $O(x,y)$, i.e. $W(x,y) \sim 1 / O(x,y)$

$$ O(x,y) = \frac{\sum |I_t|\,|\nabla I|}{\sum |\nabla I|^2} $$

Residual normal flow

Outlier rejection

[Figure: original frame, detected outliers, synthesized frame]

Locking Property

[Figure: original vs. enhanced]

Dense 3D Reconstruction (Plane+Parallax)

[Figure: original sequence, plane-aligned sequence, recovered shape. “Block sequence” [Kumar-Anandan-Hanna’94]]

Dense 3D Reconstruction (Plane+Parallax)

[Figure: original sequence, plane-aligned sequence, recovered shape]

Dense 3D Reconstruction

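[Figure: the epipolar line through point p and the epipole, together with the brightness constancy constraint line]

Brightness Constancy constraint:

$$ I_x u + I_y v + I_t = 0 $$

The intersection of the two line constraints uniquely defines the displacement.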

Limitations

• Limited search range (up to 10% of the image size).

• Brightness constancy.

Multi-sensor Alignment

Originals: IR and EO images

Direct Correlation Based Alignment

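Example: Affine Motion

$$ u(x,y) = a_1 + a_2 x + a_3 y $$

$$ v(x,y) = a_4 + a_5 x + a_6 y $$

Substituting into the B.C. Equation $I_x u + I_y v + I_t \approx 0$:

$$ I_x (a_1 + a_2 x + a_3 y) + I_y (a_4 + a_5 x + a_6 y) + I_t \approx 0 $$

Each pixel provides 1 linear constraint in 6 global unknowns (minimum 6 pixels necessary):

$$ \begin{pmatrix} I_x & I_x x & I_x y & I_y & I_y x & I_y y \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \\ a_5 \\ a_6 \end{pmatrix} + I_t \approx 0 $$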

The Brightness Constraint

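• Brightness Constancy Equation:

$$ I(x,y) = J(x + u(x,y),\; y + v(x,y)) $$

• Linearizing J (assume small (u,v)):

$$ I(x,y) \approx J(x,y) + J_x(x,y)\,u(x,y) + J_y(x,y)\,v(x,y) $$

$$ J_x u + J_y v + I_t \approx 0 \quad \text{where} \quad I_t = J(x,y) - I(x,y) $$

• In practice (although not necessary) one assumes $J_x \approx I_x,\; J_y \approx I_y$:

$$ I_x u + I_y v + I_t \approx 0 $$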

Motion Models

• Motion vector (u,v) has two unknowns (at each pixel)

• Brightness Constraint provides one equation

• One approach is to use a regularizer (e.g., Horn and Schunck)

• Alternative: Global Motion Model Constraint
– e.g., Affine, Quadratic, Projective (planar) transform (2D Models)
– Instantaneous camera motion, homography+epipole, Plane+Parallax, etc. (3D Models)

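Coarse-to-Fine Estimation

[Figure: pyramid construction for J and for I. At each level, J is warped by the current estimate a to give J_w, which is refined against I to produce a + Δa. The input estimate is a_in and the output estimate is a_out.]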

Affine Motion Estimation

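Each pixel provides one constraint

$$ \begin{pmatrix} I_x & I_x x & I_x y & I_y & I_y x & I_y y \end{pmatrix} a + I_t \approx 0 \quad \text{where} \quad a = (a_1\; a_2\; a_3\; a_4\; a_5\; a_6)^T $$

Least square estimation; Minimize:

$$ Err(a) = \sum \left( \begin{pmatrix} I_x & I_x x & I_x y & I_y & I_y x & I_y y \end{pmatrix} a + I_t \right)^2 $$

$$ \phantom{Err(a)} = \sum \left( I_x a_1 + I_x a_2 x + I_x a_3 y + I_y a_4 + I_y a_5 x + I_y a_6 y + I_t \right)^2 $$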

Coarse-to-Fine Estimation

Problem:

Brightness linearization assumes small (u,v).

Solution:
• Iterative refinement
• Coarse-to-fine estimation within a pyramid
