9. Tracking
Computer Vision
Zoltan Kato
http://www.inf.u-szeged.hu/~kato/
Zoltan Kato: Computer Vision
Tracking
• Identify targets to track
  • Features like corners
  • Whole objects (shape)
• Follow targets over subsequent frames
Tracking Applications
• Robotics
  • Manipulation, grasping [Hong, 1995]
  • Mobility, driving [Taylor et al., 1996]
  • Localization [Dellaert et al., 1998]
• Surveillance/Activity monitoring
  • Street, highway [Koller et al., 1994; Stauffer & Grimson, 1999]
  • Aerial [Cohen & Medioni, 1998]
• Human-computer interaction
  • Expressions, gestures [Kaucic & Blake, 1998; Starner & Pentland, 1996]
  • Smart rooms/houses [Shafer et al., 1998; Essa, 1999]
Feature matching vs. tracking
• What is a good feature?
• Feature matching (e.g., stereo correspondence): extract features independently in each image, then match them by comparing descriptors
• Feature tracking: extract features in the first image, then try to find the same features again in the next view
Feature tracking
• Identify features and track them over video
  • Small difference between frames
  • Potentially large difference overall
• Approaches:
  • Standard: KLT
    • Kanade-Lucas-Tomasi [Kanade-Lucas 81] [Shi-Tomasi 94]
  • Kalman filter
  • State of the art: CONDENSATION
    • CONditional DENSity propagATION [Isard-Blake 98]
Kanade-Lucas-Tomasi tracker
• Identify good feature points to track
  • ~Corners
• Track feature points
  • Assume small (~1 pixel) displacement between subsequent frames → translational motion
  • Assume pixels in a small window around the feature point have the same displacement → constant flow
• What about large motion?
  • Use a multi-scale technique
  • Top-down (coarse-to-fine) strategy in a Gaussian pyramid
KLT Tracker
• Compute translation ∆=(dx,dy) assuming it is small:
• Differentiate:
• Affine motion is also possible (6x6 instead of 2x2):
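The equations for these bullets did not survive in the text. One standard form of the derivation is sketched here under assumed notation that is not taken from the slide: I is the previous frame, J the current frame, W the window around the feature, and g the spatial gradient of J.

```latex
% Assumed notation: I = previous frame, J = current frame, W = feature window,
% g(x) = \nabla J(x) = (J_x, J_y)^T, \Delta = (dx, dy)^T.
\begin{align*}
\epsilon(\Delta) &= \sum_{x \in W} \bigl[ J(x + \Delta) - I(x) \bigr]^2
  \approx \sum_{x \in W} \bigl[ J(x) - I(x) + g(x)^T \Delta \bigr]^2 \\
\frac{\partial \epsilon}{\partial \Delta}
  &= 2 \sum_{x \in W} g(x) \bigl[ J(x) - I(x) + g(x)^T \Delta \bigr] = 0 \\
\underbrace{\Bigl( \sum_{x \in W} g(x)\, g(x)^T \Bigr)}_{M \;(2 \times 2)} \, \Delta
  &= \sum_{x \in W} \bigl[ I(x) - J(x) \bigr] \, g(x)
\end{align*}
```

The first line uses the small-displacement assumption to linearize J; the last line is a 2x2 linear system in ∆. With an affine motion model the unknown has six parameters, which is presumably why the slide mentions a 6x6 system.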
Feature point extraction
• Approximate SSD for small displacement ∆
  • Image difference, squared difference for a pixel
  • SSD for window
Feature point extraction
[Figure: example windows labeled homogeneous, edge, corner]
• Find points for which min(λ1, λ2) is maximum, where λ1, λ2 are the eigenvalues of M
  • i.e. maximize the smallest eigenvalue of M
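As a rough illustration of the criterion, the sketch below scores a window by the smallest eigenvalue of M. It assumes M is the 2x2 matrix of summed gradient products over the window (the slide's own formula is not preserved), uses central differences for gradients, and stores images as plain lists of rows. The function name `min_eigenvalue` is hypothetical.

```python
import math

def min_eigenvalue(img, cx, cy, r=1):
    """Smallest eigenvalue of M = sum over the window of g g^T (assumed form).

    img is a list of rows; the window is (2r+1)x(2r+1) centered at (cx, cy)
    and must leave a 1-pixel border for the central differences.
    """
    a = b = c = 0.0
    for y in range(cy - r, cy + r + 1):
        for x in range(cx - r, cx + r + 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            a += gx * gx
            b += gx * gy
            c += gy * gy
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]].
    half_trace = (a + c) / 2.0
    root = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return half_trace - root

# Three synthetic 7x7 windows matching the slide's labels.
flat = [[1.0] * 7 for _ in range(7)]
edge = [[1.0 if x >= 3 else 0.0 for x in range(7)] for y in range(7)]
corner = [[1.0 if (x >= 3 and y >= 3) else 0.0 for x in range(7)] for y in range(7)]

scores = {name: min_eigenvalue(im, 3, 3)
          for name, im in (("homogeneous", flat), ("edge", edge), ("corner", corner))}
```

Only the corner window gets a clearly positive score: a homogeneous window has no gradient at all, and an edge has gradient in one direction only, so one eigenvalue of M stays at zero.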
Good features to track
• Use the same window in feature selection as for tracking itself
• Maximize the minimal eigenvalue of M
• Strategy:
  • Look for strong, well-distributed features, typically a few hundred
  • Initialize and then track; renew features when too many are lost
Tracking as probabilistic inference
• We know something about object shape and dynamics, but we want to estimate the state
• There is also uncertainty due to
  • noise,
  • unpredictability of motion,
  • etc.
Bayesian inference
$$P(X \mid Z) = \frac{P(Z \mid X)\, P(X)}{P(Z)}$$
(posterior on X = likelihood × prior on X / evidence)
• For tracking, these random variables have common names:
  • X is the state
  • Z is the measurement
• These are multi-valued and time-indexed, so:
$$P(\mathbf{X}_t \mid \mathbf{Z}_t) = \alpha\, P(\mathbf{Z}_t \mid \mathbf{X}_t)\, P(\mathbf{X}_t)$$
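A tiny numeric sketch of the rule, assuming a state that takes only three discrete values (the slides treat general, vector-valued states). The function name and the numbers are illustrative only.

```python
def bayes_posterior(prior, likelihood):
    """Posterior over discrete states for one observed measurement z.

    prior[i]      = P(X = i)
    likelihood[i] = P(Z = z | X = i)
    """
    unnormalized = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalized)   # P(Z = z)
    alpha = 1.0 / evidence         # the normalizing constant
    return [alpha * u for u in unnormalized]

prior = [0.5, 0.3, 0.2]
likelihood = [0.2, 0.6, 0.2]
posterior = bayes_posterior(prior, likelihood)
```

The measurement favors the second state strongly enough to overturn the prior, which preferred the first.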
The Notion of State
• State Xt is a vector of the parameters we are trying to estimate
  • Changing over time
• Some possibilities:
  • Position: image coordinates, world coordinates (i.e., depth)
  • Orientation (2-D or 3-D)
  • Rigid “pose” of entire object (e.g., a car)
  • Joint angle(s) if the object is articulated (e.g., a person’s arm)
  • Curvature if the object is “bendable” (e.g., lip reading)
  • Differential quantities like velocity, acceleration, etc.
• Example: state = image coord. + velocity
Measurements
• Zt is what we observe at one moment
  • For example, image position, image dimensions, color, etc.
• Measurement likelihood P(Zt|Xt): probability of the measurement given the state
• Implicitly contains:
  • Measurement prediction function H(X) mapping states to measurements
    • e.g., perspective projection
    • e.g., removal of velocity terms unobservable in a single image
  • Comparison function such that the probability decreases as |Zt-H(Xt)| grows
• Example:
  • State = position & velocity
  • Measurement = position
  • Measurement prediction = remove velocity
Dynamics
• The prior probability on the state P(Xt) depends on previous states: P(Xt|Xt-1,Xt-2, ...)
• Dynamics (Markov property):
  • 1st-order: Only consider t-1
    • E.g., random walk, constant velocity
  • 2nd-order: Only use t-1 and t-2
    • E.g., changes of direction, periodic motion
    • Can be represented as a 1st-order process by doubling the size of the state to “remember” the last value
• Implicitly contains:
  • State prediction function F(X) mapping current state to future
  • Comparison function: bigger |Xt-F(Xt-1)| → less likely Xt
• Example (continued): state, measurement, and measurement prediction as on the Measurements slide
  • State prediction = constant velocity
[Figure: sequence of states Xt-3, Xt-2, Xt-1, Xt with corresponding measurements Zt-3, Zt-2, Zt-1, Zt]
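The running example (state = position and velocity, measurement = position) can be written out as a minimal sketch. The tuple layout `(x, y, vx, vy)`, the time step `dt`, and the function names are assumptions made for illustration.

```python
import math

def predict_state(state, dt=1.0):
    """F(X): constant-velocity state prediction for state = (x, y, vx, vy)."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt, vx, vy)

def predict_measurement(state):
    """H(X): drop the velocity terms, which a single image cannot observe."""
    x, y, _, _ = state
    return (x, y)

def measurement_residual(z, state):
    """|Z - H(X)|: the larger this is, the less likely the measurement."""
    hx, hy = predict_measurement(state)
    return math.hypot(z[0] - hx, z[1] - hy)

previous = (10.0, 20.0, 2.0, -1.0)
predicted = predict_state(previous)                     # (12.0, 19.0, 2.0, -1.0)
residual = measurement_residual((12.5, 19.0), predicted)  # 0.5
```

Both comparison functions on the slides are distances of this kind; turning a distance into a probability (for instance through a Gaussian) is a separate modeling choice.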
Inference by MAP criterion
• Want best estimate of state given current measurement zt and previous state xt-1:
  • Use, for example, the Maximum A Posteriori criterion (zt and xt-1 are fixed):
• For general measurement likelihood & state prior, obtaining the best estimate requires iterative search
  • Can confine the search to the region of state space near F(xt-1) for efficiency, since this is where probability mass is concentrated
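The criterion itself is missing from the text. A common way to write it, using the notation of the surrounding slides and assuming the measurement depends only on the current state, is:

```latex
% Assumption: z_t is conditionally independent of x_{t-1} given the current state X.
\begin{align*}
\hat{x}_t &= \arg\max_{X} \; P(X \mid z_t, x_{t-1}) \\
          &= \arg\max_{X} \; P(z_t \mid X) \, P(X \mid x_{t-1})
\end{align*}
```

The second line drops the evidence term, which does not depend on X, leaving the product of the measurement likelihood and the dynamics prior.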
Feature Tracking
• Detect corner-type features
• State xt
  • Position of template image (original found corner)
  • Optional: velocity, acceleration terms
  • Rotation, perspective: for a planar feature, a homography describes the full range of possibilities
• Measurement likelihood P(zt|X): similarity of match (e.g., SSD/correlation) between the template and zt, which is a patch of the image
[Figure panels: zt, H(xt), |zt - H(xt)|]
Feature Tracking
• Dynamics P(X|xt-1): static or with displacement prediction
• Inference is simple: gradient descent on the match function starting at the predicted feature location
  • Can actually do this in one step assuming a small enough displacement
  • Image pyramid representation (i.e., Gaussian) can help with larger motions
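The one-step solution can be sketched as a single translational KLT update: linearize the SSD match function around zero displacement and solve the resulting 2x2 system. This is a simplified illustration with assumed details (central-difference gradients, images as lists of rows, no pyramid, no iteration), and `klt_step` is a hypothetical name.

```python
def klt_step(I, J, cx, cy, r):
    """Estimate (dx, dy) such that J(x + d) ~ I(x) over a (2r+1)^2 window.

    I is the previous frame and J the current one. The window centered at
    (cx, cy) must leave a 1-pixel border for the central differences.
    """
    m11 = m12 = m22 = b1 = b2 = 0.0
    for y in range(cy - r, cy + r + 1):
        for x in range(cx - r, cx + r + 1):
            gx = (J[y][x + 1] - J[y][x - 1]) / 2.0
            gy = (J[y + 1][x] - J[y - 1][x]) / 2.0
            diff = I[y][x] - J[y][x]
            m11 += gx * gx
            m12 += gx * gy
            m22 += gy * gy
            b1 += diff * gx
            b2 += diff * gy
    det = m11 * m22 - m12 * m12
    if abs(det) < 1e-12:
        raise ValueError("window has too little texture to track")
    # Solve [[m11, m12], [m12, m22]] @ (dx, dy) = (b1, b2) by Cramer's rule.
    return (m22 * b1 - m12 * b2) / det, (m11 * b2 - m12 * b1) / det

def bowl(ox, oy, size=7):
    """Synthetic image: a quadratic bowl whose minimum sits at (3+ox, 3+oy)."""
    return [[(x - 3 - ox) ** 2 + (y - 3 - oy) ** 2 for x in range(size)]
            for y in range(size)]

dx, dy = klt_step(bowl(0.0, 0.0), bowl(0.3, 0.2), cx=3, cy=3, r=2)
```

On this synthetic pair the estimate lands close to the true shift (0.3, 0.2) but slightly short of it, because the linearization ignores second-order terms. Repeating the step, or starting it at a coarser pyramid level, is the usual remedy for larger motions.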
Snakes (= Active Contours)
• Idea: Track contours such as silhouettes, road lines using edge information
• Dynamics
  • Low-dimensional warp of shape template [Blake et al., 1993]
    • Translation, in-plane rotation, affine, etc.
  • Or more general non-rigid deformations of curve
• Measurement likelihood
  • Error measure = mean distance from predicted curve to nearest Canny edge
  • Or integrate gradient orthogonal to curve along it
Contour based hand tracking
Kalman filtering
• Used to optimize feature tracking results
• It relies on the assumption that the dynamics and measurement equations are linear and that the posterior distribution is Gaussian
• Optimal linear estimation
• Assume: linear system with uncertainties
  • State x
  • Dynamical (system) model: xt = Φ xt-1 + ε
  • Measurement model: zt = H xt + µ
  • ε, µ indicate white, zero-mean, Gaussian noise with covariances Q, R respectively
    • Q, R set from real data if possible, but ad-hoc numbers may also work
• Want best state estimate at each instant
Kalman Filter
• Essentially an online version of least squares
• Provides best linear unbiased estimate
Slide adapted from CS5245 Computer Vision and Graphics for Special Effects, Dr. Ng Teck Khim
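Filtering algorithm
• Input: initial estimate $\hat{x}_0$ with covariance $P_0$, and measurements $z_0, z_1, \ldots$
• Output: state estimates $\hat{x}_0, \hat{x}_1, \ldots$
• Loop over k:
  1. Compute the gain $K_k$
  2. Update the estimate: $\hat{x}_k \leftarrow \hat{x}_k + K_k (z_k - H_k \hat{x}_k)$
  3. Compute the covariance $P_k$
  4. Predict the next step: $\hat{x}_{k+1} = \phi_k \hat{x}_k$, $P_{k+1} = \phi_k P_k \phi_k^T + Q_k$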
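The loop can be sketched for the simplest case, a scalar state, where every matrix becomes a number. The gain and covariance formulas used here are the textbook ones and are an assumption, since the slide names those steps without giving the formulas. The default values for `phi`, `h`, `q`, `r` are arbitrary illustrations.

```python
def kalman_1d(measurements, x0, p0, phi=1.0, h=1.0, q=1e-3, r=0.25):
    """Scalar Kalman filter: x_k = phi * x_{k-1} + eps, z_k = h * x_k + mu.

    q and r are the variances of eps and mu. Returns the updated estimate
    after each measurement.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        k = p * h / (h * p * h + r)   # 1. gain (standard formula, assumed)
        x = x + k * (z - h * x)       # 2. update the estimate with z
        p = (1.0 - k * h) * p         # 3. covariance of the updated estimate
        estimates.append(x)
        x = phi * x                   # 4. predict the next state ...
        p = phi * p * phi + q         #    ... and its covariance
    return estimates

# A vague initial guess (large p0) is pulled to the data almost immediately.
estimates = kalman_1d([5.1, 4.9, 5.2, 4.8, 5.0], x0=0.0, p0=1000.0)
```

A large `p0` says the initial estimate is not to be trusted, so the first gain is close to 1 and the first estimate nearly equals the first measurement. Later gains shrink as the covariance shrinks, and the filter averages over measurements.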
Example
• State: 2D position, velocity
• Kalman-estimated states
[Figure courtesy of K. Murphy]
Particle Filtering
• Idea: Stochastic approximation of the state posterior with a set of N weighted particles (a.k.a. samples) {s(i), π(i)}, where s(i) is a possible state and π(i) is its weight
• Simulation instead of an analytic solution; the underlying probability distribution may take any form
• Example: CONDENSATION, a particle filter developed for person tracking [Isard & Blake, 1996]
[Figure from Isard & Blake, 1998: 15 samples with size proportional to weight]
Particle Filtering Basics
• Each particle s(i) is a possible state, which has a likelihood π(i) associated with it that is easily computable
• The posterior distribution is approximated by the ensemble of weights on all of these sampled states
  • By keeping track of state samples with non-zero probability, we imply that the rest of the distribution has zero probability
• Simulate deterministic and probabilistic motion of particles, update weights using measurement likelihood
• More particles → better approximation (and more expensive), but there’s no formula for the “right amount”
Updating the Particle Set
1. Sample: Randomly select N particles based on weights (same particle may be picked multiple times)
2. Predict: Move particles according to deterministic dynamics (drift), then perturb individually (diffuse)
3. Measure: Get a likelihood for each new sample by making a prediction about the image’s local appearance and comparing; then update weight on particle accordingly
Particle Filtering: Initial Particle Set
• Particles at t=0 drawn from a wide prior because of large initial uncertainty
  • Gaussian with large covariance
  • Uniform distribution
[Figure from MacCormick & Blake, 1998: state includes shape & position; prior more constrained for shape]
Particle Filtering: Sampling
• Normalize N particle weights so that they sum to 1
• Resample particles by picking randomly and uniformly in the [0,1] range N times
  • Analogous to spinning a roulette wheel with arc-lengths of bins equal to particle weights
• Adaptively focuses on promising areas of state space
[Figure courtesy of D. Fox: roulette wheel with bins π(1), π(2), π(3), …, π(N-1), π(N)]
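The roulette-wheel step can be sketched with cumulative weights and a binary search. This is one straightforward way to do it, not necessarily the one used in CONDENSATION; the function name and the optional `rng` argument are illustrative.

```python
import bisect
import itertools
import random

def resample(particles, weights, rng=random):
    """Draw len(particles) particles with probability proportional to weight."""
    total = sum(weights)
    # Right edges of the roulette-wheel bins on [0, 1].
    cumulative = list(itertools.accumulate(w / total for w in weights))
    cumulative[-1] = 1.0  # guard against floating-point round-off
    picks = []
    for _ in range(len(particles)):
        u = rng.random()                          # uniform in [0, 1)
        i = bisect.bisect_right(cumulative, u)    # bin that contains u
        picks.append(particles[i])
    return picks

rng = random.Random(0)
new_particles = resample(["a", "b", "c", "d"], [0.1, 0.6, 0.2, 0.1], rng)
```

Heavily weighted particles tend to be duplicated and lightly weighted ones tend to disappear, which is how the particle set concentrates on promising regions of state space.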
Particle Filtering: Prediction
• Update each particle using the generative form of dynamics: new state = deterministic component (a.k.a. “drift”) + random component (a.k.a. “diffusion”)
• Drift may be nonlinear (i.e., different displacement for each particle)
• Each particle diffuses independently
  • Typically modeled with a Gaussian
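A minimal sketch of this step for scalar states, assuming the drift is supplied as a function and the diffusion is zero-mean Gaussian noise with standard deviation `sigma`. Both names are illustrative.

```python
import random

def predict(particles, drift, sigma, rng=random):
    """Apply drift to every particle, then add independent Gaussian diffusion."""
    return [drift(s) + rng.gauss(0.0, sigma) for s in particles]

# A nonlinear drift: the displacement depends on where the particle is.
def pull_toward_ten(s):
    return s + 0.5 * (10.0 - s)

rng = random.Random(42)
moved = predict([0.0, 4.0, 8.0], pull_toward_ten, sigma=0.3, rng=rng)
```

Because each particle draws its own noise, particles that were duplicated by resampling spread apart again after this step.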
Particle Filtering: Measurement
• For each particle s(i), compute new weight π(i) as measurement likelihood π(i)=P(z|s(i))
• Enforcing plausibility: Particles that represent impossible configurations are given 0 likelihood
  • E.g., positions outside of image
[Figure from MacCormick & Blake, 1998: a snake measurement likelihood method]
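As an illustration only, the sketch below weights scalar position particles with an unnormalized Gaussian likelihood around a measured position `z` and zeroes out particles outside the valid range. The Gaussian form and all names are assumptions; the slides' figure uses a contour-based likelihood instead.

```python
import math

def measure(particles, z, sigma, lo, hi):
    """Weight each particle by an (unnormalized) Gaussian likelihood of z."""
    weights = []
    for s in particles:
        if not (lo <= s <= hi):
            weights.append(0.0)   # impossible configuration, e.g. outside the image
        else:
            weights.append(math.exp(-0.5 * ((z - s) / sigma) ** 2))
    return weights

weights = measure([5.0, 6.0, -1.0], z=5.0, sigma=1.0, lo=0.0, hi=10.0)
```

The weights need not sum to 1 at this point, since they are normalized before the next resampling step.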
Particle Filtering Steps (CONDENSATION)
[Figure from Isard & Blake, 1998: one iteration of the algorithm, showing where sampling occurs, followed by drift, diffuse, and measure (weighting by the measurement likelihood)]
CONDENSATION: 1D example
• The animation shows a few cycles of the algorithm applied to a one-dimensional system.
  • The green spheres correspond to the members of the sample set, where the size of the sphere is an indication of the sample weight.
  • The red line is the measurement density function.
http://www.robots.ox.ac.uk/~misard/condensation.html
Obtaining a State Estimate
• Note that there’s no explicit state estimate maintained, just a “cloud” of particles
• Can obtain an estimate at a particular time by querying the current particle set
• Some approaches
  • “Mean” particle
    • Weighted sum of particles
    • Confidence: inverse variance
  • Really want a mode finder: mean of tallest peak
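The “mean” particle approach can be sketched for scalar states as a weighted mean, with the inverse of the weighted variance as a confidence value. The function name and the treatment of zero variance are assumptions; a mode finder would need clustering or density estimation and is not shown.

```python
def estimate(particles, weights):
    """Return (weighted mean, confidence), with confidence = 1 / variance."""
    total = sum(weights)
    mean = sum(w * s for s, w in zip(particles, weights)) / total
    var = sum(w * (s - mean) ** 2 for s, w in zip(particles, weights)) / total
    confidence = float("inf") if var == 0.0 else 1.0 / var
    return mean, confidence

mean, confidence = estimate([1.0, 3.0], [1.0, 3.0])   # mean 2.5, variance 0.75
```

The weakness the slide hints at shows up with a two-peaked particle cloud: the weighted mean can land between the peaks, in a region that holds almost no particles.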
Condensation: Estimating Target State
[Figure from Isard & Blake, 1998: state samples (thickness proportional to weight) and the mean of the weighted state samples]
CONDENSATION in action
• Tracking agile motion:
  • a video sequence of a girl dancing to a Scottish reel is tracked
  • a leaf blowing in the wind, against a background of similar leaves
• Effective anticipation by the computer of likely movements is crucial to enable it to "see" such agile movements.
http://www.robots.ox.ac.uk/~misard/condensation.html