Motion and Tracking, Eng-Jon Ong, University of Surrey


From BMVA summer school 2014


Page 1: Motion and tracking

Motion and Tracking

Eng-Jon Ong, University of Surrey

Page 2: Motion and tracking

Introduction

Many kinds of objects have been tracked in the past.

Whole objects: cars, bicycles, human bodies.

Source: YouTube: Intelligent Traffic Surveillance

Page 3: Motion and tracking

What objects have been tracked?

Many kinds of objects have been tracked in the past.

Medium-level features: heads, hands, small objects, etc.

Page 4: Motion and tracking

What objects have been tracked?

Many kinds of objects have been tracked in the past.

Fine-level features: facial feature points, finger positions, etc.

Page 5: Motion and tracking

Overview

The task of visual tracking involves locating the position of a tracked target by a combination of features and motion models.

There is a strong relationship between the task of object detection and tracking.

Visual model + Detector

Motion Model

Page 6: Motion and tracking

Overview

One can think of tracking as detection constrained by a motion model. Detection on the whole image tends to be expensive, so the motion model restricts where the detector needs to look.

Visual model + Detector

Motion Model

Page 7: Motion and tracking

Overview
– Introduction
– Object models
– Simple search strategies
– Using linear dynamics
– Optimisation search strategies
– Summary

Page 8: Motion and tracking

Object Models and Evaluation

Page 9: Motion and tracking

Representation of Tracked Objects

The first question: how do we computationally represent an object we want to track?
– Image template
– Combination of low level information (e.g. lines)
– Contour information

Page 10: Motion and tracking

Evaluation of different models “fitness”

We need a measure of model fitness on an image given a set of parameters (e.g. position + scale).

For images, we have template matching using different scores: the sum of squared pixel differences (SSD) is the most basic, and normalised cross correlation (NCC) is a common alternative that tolerates brightness and contrast changes.
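To make the idea of a template fitness score concrete, here is a minimal sketch of exhaustive template matching with two scores, the sum of squared differences (SSD) and normalised cross correlation (NCC). The function names, the random test image and the brightness change are assumptions made for illustration; they are not taken from the slides.

```python
import numpy as np

def ssd(patch, template):
    """Sum of squared differences: 0 is a perfect match, larger is worse."""
    diff = patch.astype(float) - template.astype(float)
    return float(np.sum(diff ** 2))

def ncc(patch, template):
    """Normalised cross correlation in [-1, 1]: 1 is a perfect match."""
    p = patch.astype(float) - patch.mean()
    t = template.astype(float) - template.mean()
    denom = np.sqrt(np.sum(p ** 2) * np.sum(t ** 2))
    return float(np.sum(p * t) / denom) if denom > 0 else 0.0

def best_match(image, template, score=ssd, maximise=False):
    """Slide the template over every position and return the best (row, col)."""
    th, tw = template.shape
    best_pos, best_val = None, None
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            val = score(image[r:r + th, c:c + tw], template)
            better = best_val is None or (val > best_val if maximise else val < best_val)
            if better:
                best_pos, best_val = (r, c), val
    return best_pos, best_val

# Hypothetical test data: a random image and a template cut out of it.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(40, 40)).astype(float)
template = image[12:20, 25:33].copy()

pos_ssd, _ = best_match(image, template, ssd)
# NCC still finds the template after a brightness and contrast change.
pos_ncc, _ = best_match(1.5 * image + 20.0, template, ncc, maximise=True)
```

SSD is minimised while NCC is maximised, which is why `best_match` takes a `maximise` flag.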

Page 11: Motion and tracking

Evaluation of different models “fitness”

There are more sophisticated methods for matching a template to an image:
– Boosted detectors are a popular choice.
– Boosting is a method that combines a set of very simple object detectors together to yield a strong detector.

Page 12: Motion and tracking

Boosted Cascade

[Figure: a cascade of layers 1 to n. Each layer rejects about 90% of the windows it receives and passes about 10% on to the next layer; a window that passes layer n is reported as a detected face.]
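The early-rejection behaviour of a cascade can be sketched in a few lines. This is only an illustration under assumed names: the window is a dictionary of made-up features, and the weak classifiers, weights and thresholds are hypothetical rather than a trained detector.

```python
def cascade_detect(window, layers):
    """Run the window through each layer in turn and stop at the first rejection.

    Returns (accepted, number_of_layers_passed).
    """
    for layer_index, (weak_classifiers, threshold) in enumerate(layers):
        # Boosted layer score: weighted vote of very simple classifiers.
        score = sum(weight * clf(window) for weight, clf in weak_classifiers)
        if score < threshold:
            return False, layer_index
    return True, len(layers)

# Hypothetical two-layer cascade: a cheap first layer, a stricter second one.
layers = [
    ([(1.0, lambda w: w["mean"] > 0.2)], 1.0),
    ([(0.6, lambda w: w["contrast"] > 0.5),
      (0.4, lambda w: w["symmetry"] > 0.5)], 0.6),
]

background = {"mean": 0.1, "contrast": 0.9, "symmetry": 0.9}
near_miss = {"mean": 0.5, "contrast": 0.2, "symmetry": 0.9}
face_like = {"mean": 0.5, "contrast": 0.8, "symmetry": 0.9}

results = [cascade_detect(w, layers) for w in (background, near_miss, face_like)]
```

Most windows are discarded by the cheap early layers, so the expensive later layers run on only a small fraction of the image.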

Page 13: Motion and tracking

Boosted Cascade

– Layer 1: 2 classifiers
– Layer 2: 5 classifiers
– Layer 3: 5 classifiers
– Layer 4: 20 classifiers
– Layer 5: 50 classifiers
– Layer 6: 50 classifiers
– Layer 7: 128 classifiers
– Layer 8: 132 classifiers
– Layer 9: 100 classifiers

Page 14: Motion and tracking

Detecting and Tracking Humans in Images

Page 15: Motion and tracking

Constrained Detection: Simple Search Strategies

Page 16: Motion and tracking

Simple Tracking Strategies

Detection/Global Search

Goal: where to place the contour on the image?

Page 17: Motion and tracking

Simple Tracking Strategies

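Contours and Costs
– Search along contour normal for edges
– Move contour x, y, scale & rotation

[Figure: a contour with control points (x1,y1) to (x4,y4) and unit normals n1 to n4; the image intensity I and its derivative dI/dn are plotted along a normal n.]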

Page 18: Motion and tracking

Evaluation of different models “fitness”

For lines and contours, we can use distances to nearest edges.

But different configurations of contour searches can have different results.

Run demos: 3tracescanline.exe, 4tracescanlinelong.exe

[Figure: a contour with control points (x1,y1) to (x4,y4) and unit normals n1 to n4; the image intensity I and its derivative dI/dn are plotted along a normal n.]

Page 19: Motion and tracking

Simple Tracking Strategies

Global Search
– If the parameter space of the search is low in dimensionality then a simple global search of the image is sufficient


Page 21: Motion and tracking

Simple Tracking Strategies

Global Search
– If the parameter space of the search is low in dimensionality then a simple global search of the image is sufficient
– Not practical for most applications

Page 22: Motion and tracking

Detecting and Tracking Humans in Images

We can track just using global search if the detectors are fast enough

Page 23: Motion and tracking

Iterative Tracking

Most tracking schemes work on the assumption that an object will make small iterative movements between frames

Using this assumption only a local search is required to update model parameters

Tracking is typically posed as a 2 step process:
– Initialisation (Global/Detection)
– Iteration (Local)

Page 24: Motion and tracking

Iterative Tracking Example 1

Assume the initial position is known

Assume the object won't move far

Search locally to find movement that maximises some fitness function
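The three assumptions above can be sketched as a small local-search tracker. This is a toy example on synthetic frames: the pattern, frame size and search radius are assumptions, and the fitness is taken to be negative SSD (so the code minimises SSD rather than maximising a fitness).

```python
import numpy as np

PATTERN = np.arange(1.0, 26.0).reshape(5, 5)   # hypothetical object appearance

def make_frame(top_left, size=32):
    """Synthetic frame: the pattern pasted on a black background."""
    frame = np.zeros((size, size))
    r, c = top_left
    frame[r:r + 5, c:c + 5] = PATTERN
    return frame

def local_search(frame, template, prev_pos, radius=3):
    """Return the position within `radius` pixels of prev_pos with the lowest SSD."""
    th, tw = template.shape
    best_pos, best_cost = prev_pos, np.inf
    for dr in range(-radius, radius + 1):
        for dc in range(-radius, radius + 1):
            r, c = prev_pos[0] + dr, prev_pos[1] + dc
            if r < 0 or c < 0 or r + th > frame.shape[0] or c + tw > frame.shape[1]:
                continue
            cost = np.sum((frame[r:r + th, c:c + tw] - template) ** 2)
            if cost < best_cost:
                best_pos, best_cost = (r, c), cost
    return best_pos

# The initial position is assumed known; the object moves at most 2 pixels per frame.
true_path = [(5, 5), (6, 7), (8, 8), (9, 10), (11, 12)]
pos = true_path[0]
track = [pos]
for target in true_path[1:]:
    pos = local_search(make_frame(target), PATTERN, pos, radius=3)
    track.append(pos)

# A jump larger than the search radius breaks the small-movement assumption.
lost = local_search(make_frame((20, 20)), PATTERN, (5, 5), radius=3)
```

The last call shows the failure mode: once the object leaves the search window, the local search has nothing to lock on to.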


Page 26: Motion and tracking

Iterative Tracking Example 2

Again:
– requires good initialisation
– relies on small inter-frame movements

Page 27: Motion and tracking

Iterative Tracking Example 2

Example of contour tracking failing due to indistinct edges

A better example of tracking but highly susceptible to initialisation

Increasing the local search provides better initialisation but decreases tracking performance

1BadContour.exe

2BetterContour.exe

4TraceScanLineLong.exe

Page 28: Motion and tracking

Constrained Detection: Optimisation Search Strategies

Page 29: Motion and tracking

Tracking as an Optimisation Problem

Tracking can be thought of as an optimisation where some cost function represents how well a model fits an image.

Model fitting is done by attempting to find the model parameters that minimise/maximise this cost function

This can be done at each frame to track objects through a video sequence

Page 30: Motion and tracking

Using Gradient Descent

The previous approaches of iteratively refining a model with a local search are effectively a gradient descent optimisation

This will only work if the initial pose of the model is very close to the ideal position, as energy surfaces typically have many local minima

[Figure: cost plotted against a model parameter]

Page 31: Motion and tracking

Using Gradient Descent

Energy surfaces are typically very complex and impossible to visualise due to high dimensionality

In the figure there is one global minimum but many local minima that are almost as good

Unless our model is very close to the ideal location, a gradient descent approach will converge on a local minimum and get trapped

We've already seen this in action on the contour tracker

[Figure: cost plotted against a model parameter]
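The trapping behaviour is easy to reproduce on a one-dimensional toy cost. The quartic below is a made-up cost surface, not one from the demos; it has a global minimum near v = -1.47 and a shallower local minimum near v = 1.35.

```python
def cost(x):
    """Hypothetical cost surface with one global and one local minimum."""
    return x ** 4 - 4.0 * x ** 2 + x

def gradient(x):
    """Analytic derivative of the cost above."""
    return 4.0 * x ** 3 - 8.0 * x + 1.0

def gradient_descent(x, step=0.01, iters=500):
    """Plain fixed-step gradient descent from the starting value x."""
    for _ in range(iters):
        x -= step * gradient(x)
    return x

x_left = gradient_descent(-2.0)    # starts in the basin of the global minimum
x_right = gradient_descent(2.0)    # starts in the basin of the local minimum
```

Both runs converge, but only the one initialised on the correct side of the hump reaches the lower cost; the other stays in the local minimum however long it runs.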

Page 32: Motion and tracking

Choosing a cost function

Returning to the contour example, let's formulate a cost function as the Euclidean distance between a model and the strongest features in the image

We can visualise the cost surface across a single parameter

Notice the surface has a global minimum but it is not distinct

3TraceScanLine.exe

Page 33: Motion and tracking

Choosing a cost function

We can do the same after increasing the local search (by extending our search along normals) to see how this affects the cost surface

Note it makes the minimum more distinct, but this image has no background clutter. Additional clutter would further complicate the surface

4TraceScanLineLong.exe

Page 34: Motion and tracking

Choosing a cost function

Let's choose a different cost function

This time we will take the edge strength supporting the model pose

Notice the surface has inverted and we now seek to find the maximum

It has a very clear maximum which corresponds to the global solution which SHOULD be easy to find!!!

5cost2TraceScanLine.exe

Page 35: Motion and tracking

Lucas-Kanade Tracking

Remember Gradient Descent

[Figure: cost plotted against a model parameter]

Well, if we know more about the surface we can speed things up:
– If we assume the cost surface is a parabola, then given a position, a gradient and a curvature we can move to the minimum in one move

Page 36: Motion and tracking

Lucas-Kanade Tracking

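Newton-Raphson convergence

$v_{n+1} = v_n - \frac{f'(v_n)}{f''(v_n)}$

In more than one dimension, $f'$ becomes the Jacobian and $f''$ becomes the Hessian.

• Two differences

• LK uses the sum of squared differences across the entire image.

• v is a multi-dimensional warp parameter.

[Figure: plot of f(v) against v]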
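As a quick check of the "one move" claim, here is a minimal sketch of a single Newton-Raphson step on a parabola. The particular parabola, f(v) = 3(v - 2)^2 + 1, is an arbitrary choice for illustration.

```python
def newton_step(v, first_derivative, second_derivative):
    """One Newton-Raphson update: v - f'(v) / f''(v)."""
    return v - first_derivative(v) / second_derivative(v)

# f(v) = 3 (v - 2)^2 + 1, so f'(v) = 6 (v - 2) and f''(v) = 6.
f_prime = lambda v: 6.0 * (v - 2.0)
f_double_prime = lambda v: 6.0

v1 = newton_step(10.0, f_prime, f_double_prime)
```

Starting from v = 10, the single step lands on the minimum at v = 2. On a surface that is only approximately parabolic, the same update has to be iterated.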

Page 37: Motion and tracking

Lucas-Kanade Tracking

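Sum of squared differences between the warped image and the template, where I is the image, T the template and w(x, v) the warp of pixel x with parameters v:

$ssd(v) = \sum_x d(x)^2$, where $d(x) = I(w(x, v)) - T(x)$

Jacobian:

$\frac{\partial\, ssd}{\partial v} = 2 \sum_x d(x) \, \frac{\partial I(w)}{\partial v}$, with $\frac{\partial I(w)}{\partial v} = \left[ \frac{\partial I(w)}{\partial x}, \; \frac{\partial I(w)}{\partial y} \right] \frac{\partial w}{\partial v}$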

Page 38: Motion and tracking

Lucas-Kanade Tracking

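$ssd(v) = \sum_x d(x)^2$, where $d(x) = I(w(x, v)) - T(x)$

Jacobian:

$\frac{\partial\, ssd}{\partial v} = 2 \sum_x d(x) \, \frac{\partial I(w)}{\partial v}$, with $\frac{\partial I(w)}{\partial v} = \left[ \frac{\partial I(w)}{\partial x}, \; \frac{\partial I(w)}{\partial y} \right] \frac{\partial w}{\partial v}$

Hessian:

$\frac{\partial^2 ssd}{\partial v^2} = 2 \sum_x \left( \frac{\partial I(w)}{\partial v} \right)^T \frac{\partial I(w)}{\partial v} + O(d)$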
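A minimal sketch of these updates for the simplest warp, a pure translation w(x, v) = x + v, is given below. It drops the O(d) term in the Hessian (the usual Gauss-Newton approximation). The Gaussian-blob test images, the bilinear sampler and all function names are assumptions made for illustration.

```python
import numpy as np

def blob(shape, cx, cy, sigma=5.0):
    """Hypothetical test image: a smooth Gaussian blob centred at (cx, cy)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]].astype(float)
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def bilinear(img, xs, ys):
    """Sample img at real-valued (xs, ys) with bilinear interpolation."""
    xs = np.clip(xs, 0, img.shape[1] - 1.001)
    ys = np.clip(ys, 0, img.shape[0] - 1.001)
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    ax, ay = xs - x0, ys - y0
    return ((1 - ax) * (1 - ay) * img[y0, x0] + ax * (1 - ay) * img[y0, x0 + 1]
            + (1 - ax) * ay * img[y0 + 1, x0] + ax * ay * img[y0 + 1, x0 + 1])

def lucas_kanade_translation(image, template, v0=(0.0, 0.0), iters=30):
    """Estimate v = (vx, vy) such that image(x + v) matches template(x)."""
    ys, xs = np.mgrid[0:template.shape[0], 0:template.shape[1]].astype(float)
    gy, gx = np.gradient(image)              # image gradients dI/dy, dI/dx
    v = np.array(v0, dtype=float)
    for _ in range(iters):
        wx, wy = xs + v[0], ys + v[1]        # warped pixel positions w(x, v)
        d = (bilinear(image, wx, wy) - template).ravel()
        # For a translation dw/dv is the identity, so dI(w)/dv is the image gradient.
        J = np.stack([bilinear(gx, wx, wy).ravel(),
                      bilinear(gy, wx, wy).ravel()], axis=1)
        jacobian = 2.0 * J.T @ d
        hessian = 2.0 * J.T @ J              # Gauss-Newton: O(d) term dropped
        step = np.linalg.solve(hessian, jacobian)
        v -= step                            # Newton-Raphson update
        if np.linalg.norm(step) < 1e-6:
            break
    return v

template = blob((48, 48), cx=24.0, cy=24.0)
image = blob((48, 48), cx=26.5, cy=22.0)     # the blob moved by (+2.5, -2.0)
v_est = lucas_kanade_translation(image, template)
```

The estimate should come out close to (2.5, -2.0). As with any gradient-based method, it relies on the initial guess being within the basin of the correct minimum.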

Page 39: Motion and tracking

Lucas-Kanade Tracking

Page 40: Motion and tracking

Lucas-Kanade Tracking

Youtube: vision: optical flow detection

Page 41: Motion and tracking

Mean-shift

We can look for local maxima in object detector outputs using mean-shift


Page 43: Motion and tracking

Mean shift

Example of simple mean-shift tracking. The object “detector” is the distance to an RGB histogram

Youtube: Mean shift tracking of red ball, normalised RGB and 64 bin histogram
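The mode-seeking step can be sketched as follows. Here a synthetic weight map stands in for the detector output (for example, per-pixel similarity to a colour histogram), and a Gaussian kernel is used for the window; both choices, and the function name, are assumptions for illustration.

```python
import numpy as np

def mean_shift(weights, start, bandwidth=5.0, iters=100, tol=1e-4):
    """Move to the kernel-weighted centroid of `weights` until the position settles."""
    rows, cols = np.mgrid[0:weights.shape[0], 0:weights.shape[1]].astype(float)
    pos = np.array(start, dtype=float)
    for _ in range(iters):
        kernel = np.exp(-((rows - pos[0]) ** 2 + (cols - pos[1]) ** 2)
                        / (2 * bandwidth ** 2))
        w = weights * kernel
        total = w.sum()
        if total <= 0:
            break
        new_pos = np.array([(rows * w).sum(), (cols * w).sum()]) / total
        converged = np.linalg.norm(new_pos - pos) < tol
        pos = new_pos
        if converged:
            break
    return pos

# Hypothetical detector output: a single smooth peak at row 30, column 18.
rows, cols = np.mgrid[0:50, 0:50].astype(float)
weight_map = np.exp(-((rows - 30.0) ** 2 + (cols - 18.0) ** 2) / (2 * 4.0 ** 2))

mode = mean_shift(weight_map, start=(24.0, 24.0))
```

In a tracker, the start position for each frame would be the mode found in the previous frame, which again assumes small inter-frame movement.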

Page 44: Motion and tracking

Regression-based Tracking

Page 45: Motion and tracking

Regression-based Tracking

Up to now, tracking has been treated as a constrained detection problem: essentially template matching, searching a parameter space to optimise a matching fitness function.

Another approach is to pose the problem as a regression problem: Given template difference, predict the translational offset to the correct position. (no explicit search needed!)

Page 46: Motion and tracking

Linear Predictors (Robust Facial Feature Tracking using Shape Constrained Multi Resolution Selected Linear Predictors, Ong et al)

– Reference point + support pixels (a, b, c)
– Linear mapping (H) from support pixel intensity difference to translation vector

$P = [\, I_a - I'_a, \; I_b - I'_b, \; I_c - I'_c \,]$

$X = HP$

[Figure: a reference point with support pixels a, b and c]
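One way to obtain H is to learn it from synthetic displacements: offset the support pixels by known amounts, record the intensity differences P, and solve X = HP in the least-squares sense. The sketch below does this on a made-up smooth image; the `intensity` function, the number of support pixels, the offset range and the small ridge term are all assumptions, not details from the cited paper.

```python
import numpy as np

def intensity(x, y):
    """Hypothetical smooth image used in place of real video frames."""
    return np.sin(0.15 * x) + np.cos(0.10 * y) + 0.5 * np.sin(0.08 * (x + y))

rng = np.random.default_rng(1)
reference = np.array([50.0, 40.0])
support = reference + rng.uniform(-10, 10, size=(30, 2))   # support pixel positions
ref_values = intensity(support[:, 0], support[:, 1])       # I_a, I_b, I_c, ...

def difference_vector(offset):
    """P = I - I', where I' is sampled with the support pixels displaced by `offset`."""
    moved = support + offset
    return ref_values - intensity(moved[:, 0], moved[:, 1])

# Training set: random offsets and the difference vectors they produce.
offsets = rng.uniform(-3, 3, size=(200, 2))
P = np.stack([difference_vector(t) for t in offsets], axis=1)   # 30 x 200
X = offsets.T                                                    # 2 x 200

# Regularised least squares for H in X = H P (the ridge term is an assumption,
# added only to keep the matrix inverse well conditioned).
ridge = 1e-6
H = X @ P.T @ np.linalg.inv(P @ P.T + ridge * np.eye(P.shape[0]))

# Prediction is a single matrix-vector product, with no search.
test_offset = np.array([2.0, -1.5])
predicted = H @ difference_vector(test_offset)
```

At run time the tracker only evaluates `H @ P`, which is why no explicit search over positions is needed.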

Page 47: Motion and tracking

Linear Predictors

Linear Predictor “Bunches”
– Single LPs are not stable enough for tracking image features
– Use a set (“bunch”) of LPs instead
– Final prediction = consensus of the most common predicted translation


Page 49: Motion and tracking

Linear Predictors

“Tracking context” is very important.

We only want to use surrounding visual information if it helps the tracking

[Figure annotations: “We want to track this point” and “BUT, we should use visual information around here for tracking it! Other regions have too much variation.”]

Page 50: Motion and tracking

Linear Predictors

We can find the tracking context by evaluating the accuracy of trackers using local patches, and gradually removing the bad ones

Page 51: Motion and tracking

Linear Predictors

Cascaded linear predictors:
– Linear predictors trained to overcome large offsets are robust but not accurate.
– LPs trained to overcome small offsets are accurate but not robust.
– Solution, cascade them: use big-offset LPs, then pass the results to smaller ones for refinement.

[Figure: errors of a “large” LP predicting from an offset position (blue is medium prediction error)]

[Figure: errors of a “small” LP predicting from an offset position (white is small prediction error)]

Page 52: Motion and tracking

Linear Predictors

Page 53: Motion and tracking

Linear Predictors

Page 54: Motion and tracking

Linear Predictors

Page 55: Motion and tracking

Non-Linear Predictors (Non-linear Predictors for Facial feature Tracking, FG2013, Sheerman-Chase et al.)

$P = [\, I_a - I'_a, \; I_b - I'_b, \; I_c - I'_c \,]$

$X = H(P)$

Replace linear mapping with the non-linear mapping of regression trees

Input still support pixel differences, output still offsets

[Figure: a reference point with support pixels a, b and c]

Page 56: Motion and tracking

Non-Linear Predictors

Replace linear mapping with the non-linear mapping of regression trees

Input still support pixel differences, output still offsets

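[Figure: example regression tree. The root node tests S1 < 0.4; one branch is a leaf predicting dy = 23, the other tests S50 < 0.1 and leads to leaves predicting dy = 32 and dy = -10.]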

Page 57: Motion and tracking

Non-Linear Predictors

Results: More robust tracking able to handle larger amounts of pose and expression variations.


Page 59: Motion and tracking

Non-Linear Predictors

Allows us to do freaky things like this:

Page 60: Motion and tracking

Background to template update problem

No update
– Misrepresentation error
– Catastrophic

Naïve update
– Drift error
– Slow accumulation

[Figure: frames 1 to 5 showing the true feature with its old appearance, the true feature with its new appearance and a false feature, with plots of error against time for each update strategy.]

Page 61: Motion and tracking

Background template update (Mutual information for Lucas Kanade tracking (MILK): An inverse compositional formulation, Dowson et al, PAMI 08)

Page 62: Motion and tracking

Building a Model of Templates

Appearance space

Page 63: Motion and tracking

LP SMAT

Page 64: Motion and tracking

SMAT

Page 65: Motion and tracking

Incorporating Motion Models for Tracking

Page 66: Motion and tracking

Temporal Consistency

This sequence shows a surveillance application tracking subjects as they move.

The technique uses a per pixel mixture of Gaussians to model background colour distributions and perform dynamic background subtraction.

Page 67: Motion and tracking

Tracking with Motion Models

The task of visual tracking involves locating the position of a tracked target by a combination of features and motion models.

There is a strong relationship between the task of object detection and tracking.

Visual model + Detector

Motion Model

Page 68: Motion and tracking

Using Motion

Objects often exhibit consistent motion

Page 69: Motion and tracking

Kalman Filter

To exploit this motion consistency, many authors model it with simple dynamics in what is called the Kalman filter

A Kalman filter is simply an optimal recursive data processing algorithm. It makes predictions based on previous estimates and current observations

Page 70: Motion and tracking

Kalman Filter

Suppose we have some hidden information to recover (i.e. not directly observable) that takes the form of a state vector, e.g. X = [x, y, v]: the position and velocity of a tracked object

This object has a true state at time t, $X_t$, which we do not know

But suppose we think this object’s dynamics work in a linear fashion: $X_t = F X_{t-1}$

BUT this may not be exactly the case, it might be slightly off, thus we have $X_t = F X_{t-1} + w_t$, where $w_t \sim N(0, Q)$

Page 71: Motion and tracking

Kalman Filter

Suppose we have some sensors that provide measurements of the tracked object in the form of a measurement vector: Z = [a, b]

These sensor measurements originate from the hidden state vector X in the form: $Z_t = H X_t$

BUT, in reality this sensor can be imperfect, noisy, etc. We deal with this by saying $Z_t = H X_t + v_t$, where the measurement noise $v_t \sim N(0, R)$. R is called the sensor’s error covariance

Page 72: Motion and tracking

Kalman Filter

We want to recover some hidden information about a tracked object: X = [x, y, v]

We can predict its movements “blindly” using: $X'_{t|t-1} = F X'_{t-1|t-1}$

But this model is inaccurate in a Gaussian sense: the true dynamics include noise $w_t \sim N(0, Q)$

We have some sensors that provide observations to indirectly tell us how accurate our predictions are: $Z_t - H X'_{t|t-1}$

BUT, we need to take this with a pinch of salt, since our sensors are inaccurate as well ($Z_t$ has Gaussian noise with covariance R)


Page 74: Motion and tracking

Kalman Filter

So, the task at hand: how do we best combine our prediction of a tracked object's state with the sensor observations, given that both have Gaussian noise?

That is what a Kalman filter does in an optimal sense (provided your noise IS Gaussian and your dynamics ARE linear)

$X'_{t|t} = X'_{t|t-1} + K ( Z_t - H X'_{t|t-1} )$

K is called the “Kalman gain”. Essentially, if sensor noise is small and prediction noise large, K becomes $H^{-1}$, meaning trust the observations. Conversely, if sensor noise is large, K becomes 0, meaning trust the prediction
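The predict and update steps can be sketched with the standard Kalman filter equations. The slides do not spell out the covariance update or the formula for K, so those lines follow the textbook form; the one-dimensional constant-velocity model, the noise levels and the noise-free measurements are assumptions chosen to keep the example small.

```python
import numpy as np

def kalman_predict(x, P, F, Q):
    """Blind prediction: propagate the state estimate and its covariance."""
    return F @ x, F @ P @ F.T + Q

def kalman_update(x_pred, P_pred, z, H, R):
    """Blend the prediction with the measurement z using the Kalman gain K."""
    S = H @ P_pred @ H.T + R                      # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)           # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x, P, K

# Assumed model: state = [position, velocity], sensor measures position only.
F = np.array([[1.0, 1.0], [0.0, 1.0]])            # position += velocity each frame
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[1.0]])

x, P = np.zeros(2), 100.0 * np.eye(2)             # vague initial estimate
for t in range(1, 21):                            # object moves 2 units per frame
    x, P = kalman_predict(x, P, F, Q)
    x, P, K = kalman_update(x, P, np.array([2.0 * t]), H, R)
x_tracked = x.copy()

# Occlusion: no measurements for 5 frames, so only the prediction step runs.
for _ in range(5):
    x, P = kalman_predict(x, P, F, Q)
x_after_occlusion = x.copy()

# Gain behaviour: tiny sensor noise gives K near 1 for the observed component,
# huge sensor noise gives K near 0.
_, _, K_trust_sensor = kalman_update(np.zeros(2), np.eye(2), np.array([1.0]), H, np.array([[1e-9]]))
_, _, K_trust_model = kalman_update(np.zeros(2), np.eye(2), np.array([1.0]), H, np.array([[1e9]]))
```

During the occlusion the estimate keeps moving at the learned velocity, which is how prediction carries a tracker through short gaps in the observations.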

Page 75: Motion and tracking

Kalman Filter Operation

From: Kalman filter for dummies

Page 76: Motion and tracking

Using a Kalman Filter to Track

How prediction overcomes occlusion issues

Youtube: kalman Filter result on real aircraft & Result of Kalman Filter on a Moving Aircraft


Page 79: Motion and tracking

Extended Kalman Filter-EKF

The Kalman filter addresses the problem of dynamics estimation with linear equations

Most problems are non-linear

EKF attempts to address this by making the state prediction $X_t = F( X_{t-1} ) + w$

F can be any differentiable non-linear function, which the EKF linearises around the current estimate

See www.cs.unc.edu/~welch for introductory tutorials and sample code

Page 80: Motion and tracking

Exploring a parameter space for the global solution

We could try every single model configuration to find the lowest cost solution but this can be unfeasible (640x480x100x360=11,059,200,000)

We could just randomly pick model configurations in the hope that we find a low cost solution but this does not guarantee that we will find it and as the dimensionality and complexity increase so must the number of random samples

These are common problems and hence standard optimisation techniques can be employed– e.g. Simulated Annealing, Genetic Algorithms

7RandomSample.exe

Page 81: Motion and tracking

Tracking as an Optimisation Problem

In simulated annealing we try to use some simple heuristic to reduce the number of samples we need to test

In Genetic Algorithms we try and guide our random search through observation to again reduce the complexity of the search

However, these are blind optimisations and we often know much more about the problem we are trying to solve such as the nature of observations or the dynamics we are expecting (remember the Kalman Filter)

Page 82: Motion and tracking

Tracking as an Optimisation Problem

Example of using simulated annealing for tracking the body pose

N. Lehment, M. Kaiser, D. Arsic, and G. Rigoll. Cue-Independent Extending Inverse Kinematics For Robust Pose Estimation in 3D Point Clouds. Proc. IEEE Intern. Conf.on Image Processing (ICIP2010)

Page 83: Motion and tracking

Factored Sampling

We have seen how the KF uses a simple Gaussian to model observations but what happens if observations are non-Gaussian?

Factored Sampling can be used to search a static image in these cases

We want to calculate the posterior probability that an object X exists in an image given the observed data obj – P(X |obj)

Page 84: Motion and tracking

Factored Sampling

This is difficult to achieve for continuous complex non-Gaussian distributions

Luckily Bayes’ formula says that the posterior density is proportional to the product of a prior density P0(X) and an observation density P(obj|X)
– P(X|obj) ∝ P(obj|X) P0(X)

Factored sampling estimates the posterior by generating samples from the prior and weighting them according to the observation density

Page 85: Motion and tracking

Factored Sampling

A set of n points s(n), the centres of the blobs in the figure, is sampled randomly from the prior density P0(X)

Each sample is then assigned a weight (depicted by blob area) based upon the observation density P(obj|X = s(n))

If n is sufficiently large then the weighted set represents the posterior density P(X|obj)

[Figure: probability plotted against state X, showing the posterior density and the weighted samples]
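Factored sampling is short enough to sketch directly: draw samples from the prior, weight each by the observation density, and normalise. The uniform prior and the two-peaked observation density below are made-up examples of a non-Gaussian case.

```python
import numpy as np

rng = np.random.default_rng(0)

def factored_sampling(sample_prior, observation_density, n):
    """Return n prior samples and normalised weights approximating the posterior."""
    samples = sample_prior(n)
    weights = observation_density(samples)
    return samples, weights / weights.sum()

def sample_prior(n):
    """Assumed prior: the state is equally likely anywhere in [0, 10]."""
    return rng.uniform(0.0, 10.0, size=n)

def observation_density(x):
    """Assumed non-Gaussian observation density with peaks at 3 and 7."""
    return (0.7 * np.exp(-0.5 * ((x - 3.0) / 0.3) ** 2)
            + 0.3 * np.exp(-0.5 * ((x - 7.0) / 0.3) ** 2))

samples, weights = factored_sampling(sample_prior, observation_density, 5000)
posterior_mean = float(np.sum(weights * samples))
mass_near_first_peak = float(weights[samples < 5.0].sum())
```

The weighted set keeps both peaks, which a single Gaussian could not represent; about 70% of the weight should sit near the first peak.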

Page 86: Motion and tracking

CONDENSATION and Particle Filtering

CONDitional DENsity propagATION, also known as particle filtering, is the natural extension of the KF to factored sampling

Basically:
– Randomly generate a distribution from the prior pdf and apply a model of dynamics (i.e. predict)
– Fit each sample to the image (i.e. measure)
– Weight samples accordingly to generate a new posterior pdf that will serve as the prior for the next iteration
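The three steps above can be sketched as a one-dimensional particle filter. The constant-velocity drift, the Gaussian process noise, the Gaussian observation density and the noise-free measurements are assumptions chosen to keep the example small; a real tracker would replace the observation density with an image-based fitness.

```python
import numpy as np

def particle_filter_step(particles, weights, measurement, rng,
                         velocity=1.0, process_sigma=0.5, obs_sigma=0.5):
    """One cycle: sample from the previous posterior, predict, measure, reweight."""
    n = len(particles)
    # The previous weighted set acts as the prior for this time step.
    resampled = particles[rng.choice(n, size=n, p=weights)]
    # Predict: apply the dynamics model plus random diffusion.
    predicted = resampled + velocity + rng.normal(0.0, process_sigma, size=n)
    # Measure: weight each sample by the observation density.
    likelihood = np.exp(-0.5 * ((measurement - predicted) / obs_sigma) ** 2) + 1e-300
    return predicted, likelihood / likelihood.sum()

rng = np.random.default_rng(0)
n_particles = 500
particles = rng.uniform(-5.0, 5.0, size=n_particles)   # vague initial prior
weights = np.full(n_particles, 1.0 / n_particles)

estimates = []
for t in range(1, 31):                                 # the object is at position t
    particles, weights = particle_filter_step(particles, weights, float(t), rng)
    estimates.append(float(np.sum(weights * particles)))
```

Because the posterior is carried as a set of weighted samples, nothing in the loop requires the dynamics to be linear or the observation density to be Gaussian.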

Page 87: Motion and tracking

CONDENSATION and Particle Filtering

[Figure: one cycle of the algorithm, showing the predict and measure steps]

Page 88: Motion and tracking

CONDENSATION and Particle Filtering

The animation shows a few cycles of the algorithm applied to a one-dimensional system. The green spheres correspond to the members of the sample set, where the size of the sphere is an indication of the sample weight. The red line is the measurement density function.

This animation shows a short sequence of the CONDENSATION filter tracking a leaf exhibiting non-linear motion with occlusion and clutter.

Movie sequences taken from http://www.dai.ed.ac.uk/CVonline/LOCAL_COPIES/ISARD1/condensation.html

Page 89: Motion and tracking

CONDENSATION and Particle Filtering

We can extend our random sampler to a simple PF using Gaussian noise as our dynamics/drift term

Notice how the population quickly homes in on the area of highest probability, as we saw in the random sampling

It quickly converges on incorrect local solutions. Increasing the noise term helps explore the space further, but the global maximum is at the bottom of the image

8ParticleFilter.exe

Page 90: Motion and tracking

CONDENSATION and Particle Filtering

We can further try to change the model to better fit the head and ensure the global maximum is at the correct position

Tracking is better but easily lost to other maxima

As the population size is increased we start to see multiple hypothesis tracking

By combining both the PF and a gradient descent method we can get the best results for the lowest population, but our cost function is still flawed

9Particle filter.exe

10ParticleFilter.exe

Page 91: Motion and tracking

CONDENSATION and Particle Filtering

Advantages
– Allows complex non-Gaussian systems
– Easy to add non-linear dynamics
– Provides support for multiple hypotheses (!!!)

Disadvantages
– Large numbers of samples make the techniques extremely slow for high-dimensional parameter spaces
– Not a global optimisation, so it has the tendency to converge upon good observations at the cost of other observations

There are many schemes for overcoming these problems, but they are beyond the scope of this lecture

Page 92: Motion and tracking

Interesting Applications of Motion Tracking

Page 93: Motion and tracking

Lip-Reading

Facial features of a subject are tracked, specifically the mouth regions.

Mouth texture and shape are extracted and used to build discriminative patterns called sequential patterns

Page 94: Motion and tracking

Lip-Reading

Results:

Page 95: Motion and tracking

Sign Language Recognition

Tracking required for extracting the motions of the hands and head.

Movement features of the hands and hand shapes are extracted

Again, discriminative movement patterns uniquely identifying a sign are extracted

These patterns will be used to detect whether a sign is present in a video sequence or not

Page 96: Motion and tracking

Sign Language Recognition

Results:

Page 97: Motion and tracking

Group Behaviour Profiling

Even when tracking is not very accurate or robust, it can still be used to do useful things!

Example: Use simple trackers (e.g. Lucas Kanade trackers) to “track” people in a crowd

These will only last a short while, but can form short trajectories.

The analysis of these trajectories can be used to profile crowd behaviours.

Page 98: Motion and tracking

Group Behaviour Profiling

Results:

Page 99: Motion and tracking

Summary

We have looked at a variety of tracking strategies from very simple schemes to those which can learn and predict complex non-linear motion in cluttered environments. This talk is not exhaustive but should give you a basic understanding of the types of techniques used in modern computer vision systems.

For more details on many of the examples see my website http://www.surrey.ac.uk/personal/e.ong

For a good introduction on the temporal mechanics of tracking I would recommend reading “Active Contours” by Isard and Blake

Page 100: Motion and tracking

Things to remember!!!

When tracking:
– Tracking is only as good as your model and data
  • A bad metric will give bad results
  • The larger the parameter space the more difficult things become
– Make things as simple as possible
  • Constrain your environment
  • Use appropriate techniques and dynamics
  • e.g. if you're tracking someone jumping up and down, don’t use a Kalman filter
– Don’t try to reinvent the wheel
  • But if you're going to use black box techniques, ensure you know what they will and won't do for you