Human-Computer Interaction

Tracking

Hanyang University

Jong-Il Park

Tracking in computer vision

Definition
  The problem of generating an inference about the motion of an object given a sequence of images

Applications
  Motion capture
  Recognition from motion
  Surveillance

Who is doing what? E.g. security, HCI (Kinect…)

Tracking

Establish the state of an object using a time sequence
  The state could be: position; position + velocity; position + velocity + acceleration; or something more complex, e.g. all joint angles for a person

Biggest problem: data association
  Which image pixels are informative, and which are not?

Key ideas
  Tracking by detection: if we know what an object looks like, that selects the pixels to use
  Tracking through flow: if we know how an object moves, that selects the pixels to use

Appearance vs. Flow

Tracking by Detection

Assume
  a very reliable detector (e.g. faces; backs of heads)
  detections that are well spaced in images (or have distinctive properties), e.g. news anchors; heads in public

Link detections across time
  only one: easy
  multiple: weighted bipartite matching
  but what if one is missing?

Better: create abstract tracks
  link detections to tracks
  create tracks, reap tracks as required
  clean up spacetime paths
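Linking detections to tracks can be sketched as a small assignment problem. The sketch below is an illustration under stated assumptions, not the method of the slides: the cost is the Euclidean distance between a track's last position and a detection, the function name `link_detections` and the penalty `max_dist` for a missing detection are hypothetical, and the matching is found by exhaustive search, which only works for a handful of objects (a real system would use a proper weighted bipartite matching solver).

```python
from itertools import permutations
import math

def link_detections(tracks, detections, max_dist=50.0):
    """Match track positions to detections by minimum total distance.

    Exhaustive search over assignments, usable only for a few objects.
    Returns a list of (track_index, detection_index or None).
    """
    n_t, n_d = len(tracks), len(detections)
    # Pad with None so that a track may stay unmatched (missing detection).
    candidates = list(range(n_d)) + [None] * n_t
    best_cost, best = math.inf, ()
    for perm in permutations(candidates, n_t):
        cost = 0.0
        for t, d in enumerate(perm):
            if d is None:
                cost += max_dist  # assumed penalty for a missed detection
            else:
                cost += math.dist(tracks[t], detections[d])
        if cost < best_cost:
            best_cost, best = cost, perm
    return list(enumerate(best))
```

A track that receives `None` has no detection in this frame; whether to keep it alive or reap it is a policy decision left to the tracker.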

Tracking by Known Appearance

Even if we don't have a detector
  Know the rectangle in image (n)
  Want to find the corresponding rectangle in image (n+1)
  Search over nearby rectangles to find the one that minimizes the SSD (sum of squared differences) error, where the sum is over the pixels in the rectangle

Application: stabilize players in TV sport
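The search over nearby rectangles might look like the following minimal sketch. It assumes grayscale images stored as lists of rows and a square search window of `radius` pixels; the names `ssd`, `track_rectangle` and `radius` are illustrative and do not come from the slides.

```python
def ssd(img_a, top_a, left_a, img_b, top_b, left_b, h, w):
    """Sum of squared differences between two h x w rectangles."""
    total = 0
    for r in range(h):
        for c in range(w):
            diff = img_a[top_a + r][left_a + c] - img_b[top_b + r][left_b + c]
            total += diff * diff
    return total

def track_rectangle(img_n, img_n1, top, left, h, w, radius=2):
    """Search rectangles within `radius` pixels of (top, left) in img_n1.

    Returns ((best_top, best_left), best_error).
    """
    rows, cols = len(img_n1), len(img_n1[0])
    best_err, best_pos = None, (top, left)
    for dt in range(-radius, radius + 1):
        for dl in range(-radius, radius + 1):
            t, l = top + dt, left + dl
            if t < 0 or l < 0 or t + h > rows or l + w > cols:
                continue  # candidate rectangle falls outside the image
            err = ssd(img_n, top, left, img_n1, t, l, h, w)
            if best_err is None or err < best_err:
                best_err, best_pos = err, (t, l)
    return best_pos, best_err
```

The cost grows with the square of the search radius, which is one reason a good prediction of the new position matters.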

Building Tracks

Start at scattered points in image 1
  perhaps corner detector responses

For each point in image (n), compute its position in image (n+1) as on the previous slide

Now check the tracks
  the patch in image (n+1) should look like an affine transform of the patch in image 1

Prune bad tracks

What if the patch deforms?

E.g. a football player's jersey
  Colors are "similar" but SSD won't work
  Idea: the patch histogram is stable

To track, repeat
  predict the location of the new patch
  search nearby for the patch whose histogram best matches the original
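A minimal sketch of the histogram search step follows. The slides do not say how histograms are compared, so the L1 distance used here is an assumption, as are the intensity-only histogram, the bin count, and the names `patch_histogram`, `histogram_distance` and `track_by_histogram`.

```python
def patch_histogram(img, top, left, h, w, bins=4, max_val=256):
    """Normalised intensity histogram of an h x w patch."""
    hist = [0.0] * bins
    for r in range(top, top + h):
        for c in range(left, left + w):
            hist[img[r][c] * bins // max_val] += 1.0
    n = float(h * w)
    return [v / n for v in hist]

def histogram_distance(p, q):
    """L1 distance between two normalised histograms (assumed measure)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def track_by_histogram(img, ref_hist, pred_top, pred_left, h, w,
                       radius=2, bins=4):
    """Search near the predicted location for the best-matching patch."""
    rows, cols = len(img), len(img[0])
    best_pos, best_dist = None, None
    for t in range(pred_top - radius, pred_top + radius + 1):
        for l in range(pred_left - radius, pred_left + radius + 1):
            if t < 0 or l < 0 or t + h > rows or l + w > cols:
                continue  # patch would fall outside the image
            hist = patch_histogram(img, t, l, h, w, bins)
            d = histogram_distance(ref_hist, hist)
            if best_dist is None or d < best_dist:
                best_pos, best_dist = (t, l), d
    return best_pos, best_dist
```

Because the histogram ignores where each pixel sits inside the patch, a patch whose pixels have been rearranged still matches, which is the property that SSD lacks.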

When are motions easy?

Current procedure
  predict the state
  obtain a measurement from the prediction by search
  correct the state

Easy when the object is close to where you expect it to be
  e.g. the object is guaranteed to move only a little

Large motions can be easy when they're "predictable"
  e.g. ballistic motion
  e.g. constant velocity

Need a theory to fuse this procedure with a motion model

General Model

We assume there are moving objects, which have an underlying state X

There are measurements Y, some of which are functions of this state

There is a clock
  at each tick, the state changes
  at each tick, we get a new observation

Examples
  object is a ball, state is 3D position + velocity, measurements are stereo pairs
  object is a person, state is body configuration, measurements are frames, clock is in the camera (30 fps)

Tracking as an Abstract Inference Problem

Internal state: X

Measurement: Y

Major steps:

1. Prediction

2. Data association: determining which data are informative

3. Correction

Independence Assumption

Only the immediate past matters

* X: Markov process

Measurements depend only on the current state
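In symbols, the two assumptions can be written as below. This is the standard hidden Markov model form; the subscripted notation X_i, Y_i for the state and measurement at tick i is an assumption, since only X and Y appear in the text.

```latex
% Only the immediate past matters (X is a Markov process)
P(X_i \mid X_1, \ldots, X_{i-1}) = P(X_i \mid X_{i-1})

% Measurements depend only on the current state
P(Y_i \mid X_1, \ldots, X_i, Y_1, \ldots, Y_{i-1}) = P(Y_i \mid X_i)
```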

Bayes Rule

Prediction

Correction
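For reference, the prediction and correction steps take the following standard form under the two independence assumptions. The notation is assumed: X_i is the state at tick i and y_0, ..., y_i are the measurements observed so far.

```latex
% Prediction: propagate the previous posterior through the dynamics
P(X_i \mid y_0, \ldots, y_{i-1})
  = \int P(X_i \mid X_{i-1})\, P(X_{i-1} \mid y_0, \ldots, y_{i-1})\, dX_{i-1}

% Correction: apply Bayes' rule with the new measurement y_i
P(X_i \mid y_0, \ldots, y_i)
  = \frac{P(y_i \mid X_i)\, P(X_i \mid y_0, \ldots, y_{i-1})}
         {\int P(y_i \mid X_i)\, P(X_i \mid y_0, \ldots, y_{i-1})\, dX_i}
```

The output of the correction step at tick i is the input to the prediction step at tick i+1.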

Linear Dynamic Model

Model
  D: transition matrix
  M: measurement matrix

The Kalman filter is well suited to this type of tracking

Read Ch. 11.3, Forsyth & Ponce, Computer Vision, 2012.

Kalman filter
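A minimal sketch of one predict/correct cycle, for the scalar case only. It assumes the model x_i ~ N(d x_{i-1}, var_d) and y_i ~ N(m x_i, var_m), where the scalars `d` and `m` play the roles of the transition matrix D and the measurement matrix M; the function name and default values are illustrative.

```python
def kalman_step(x, p, y, d=1.0, m=1.0, var_d=0.01, var_m=1.0):
    """One scalar Kalman filter cycle.

    x, p : mean and variance of the previous state estimate
    y    : new measurement
    Returns the corrected mean and variance.
    """
    # Prediction: push the estimate through the dynamics.
    x_pred = d * x
    p_pred = d * d * p + var_d
    # Correction: blend prediction and measurement using the Kalman gain.
    gain = p_pred * m / (m * m * p_pred + var_m)
    x_new = x_pred + gain * (y - m * x_pred)
    p_new = (1.0 - gain * m) * p_pred
    return x_new, p_new

# Usage: filter a short sequence of noisy measurements of a static value.
x, p = 0.0, 1.0
for y in [1.2, 0.9, 1.1, 1.0]:
    x, p = kalman_step(x, p, y)
```

The gain weighs the measurement more heavily when the predicted variance is large relative to the measurement noise, and less heavily otherwise.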

E.g. Kalman filter

Figure legend: * predicted; x measured; + estimated; o true; bar: 3

Nonlinear Dynamics

Problems tend not to be normal
  the distributions may not be Gaussian (quite common in vision problems)
  multiple, well-separated modes

E.g. Nonlinear model

E.g. Nonlinear model: time evolution & pdf

Difficulties in Complicated Models

Maintaining the pdf of the state requires
  handling multiple peaks
  handling a multi-dimensional state vector

Particle filtering is a useful approach.

How?

Sampled Representation

The representation of a pdf is
  NOT for representing the pdf itself
  BUT for computing some expectation

Computing an expectation using a sampled representation

Monte Carlo Integration

Terms in the formula: the sampling distribution and the weight

Obtaining a sampled representation of a probability distribution

Computing an expectation using a set of samples
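The following sketch shows one common way to compute an expectation from weighted samples, namely self-normalised importance sampling. The notation is assumed rather than taken from the slides: samples u are drawn from a sampling distribution s, each gets the weight w = p(u) / s(u), and E[f(U)] under p is estimated as sum(w f(u)) / sum(w). The Gaussian target and sampling distribution are chosen only for illustration.

```python
import math
import random

def expectation(f, samples, weights):
    """Weighted-sample estimate of E[f(U)]: sum(w * f(u)) / sum(w)."""
    total_w = sum(weights)
    return sum(w * f(u) for u, w in zip(samples, weights)) / total_w

def normal_pdf(u, mean, std):
    """Density of a 1D Gaussian."""
    z = (u - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

rng = random.Random(0)
# Target p = N(mean 1, std 1); sampling distribution s = N(mean 0, std 2).
samples = [rng.gauss(0.0, 2.0) for _ in range(20000)]
weights = [normal_pdf(u, 1.0, 1.0) / normal_pdf(u, 0.0, 2.0) for u in samples]
# Estimate of the mean of p, which should be close to 1.
mean_est = expectation(lambda u: u, samples, weights)
```

The sampling distribution is deliberately broader than the target, so that every region where p is large receives samples.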

Transformation

Transforming a sampled representation of a prior into that of a posterior

Naive particle filter

E.g. Naive particle filter

Poor results due to sample impoverishment!
  Most of the weights get small very fast

The way to get accurate estimates is to have the samples lie where p is likely to be large

Overcoming Sample Impoverishment

Equivalent to maintaining a set of good particles

Then, how?

Resampling the prior

1. Expand the sample set non-uniformly using the weights: form a new set of samples consisting of a union of N_k copies of (s_k, 1) for each k.

2. Subsample the sample set uniformly

Practical particle filter

Initialization

Prediction

Correction

Resampling:

1. Normalise the weights so that they sum to 1.

2. Compute the variance of the normalised weights. If (var > Th), construct a new set of samples by drawing, with replacement, N samples from the old set, using the weights as the probability that a sample will be drawn. The weight of each sample is now 1/N.
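The resampling step above can be sketched as follows. The function names and the `threshold` parameter (the Th of the slide) are illustrative, and the population variance is used; the slide does not say which variance estimator is intended.

```python
import random

def normalise(weights):
    """Scale the weights so that they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def weight_variance(weights):
    """Population variance of the weights."""
    mean = sum(weights) / len(weights)
    return sum((w - mean) ** 2 for w in weights) / len(weights)

def resample_if_needed(samples, weights, threshold, rng=random):
    """Resample with replacement when the weight variance exceeds threshold.

    Returns (samples, weights); after resampling every weight is 1/N.
    """
    weights = normalise(weights)
    if weight_variance(weights) > threshold:
        n = len(samples)
        # Each old sample is drawn with probability equal to its weight.
        samples = rng.choices(samples, weights=weights, k=n)
        weights = [1.0 / n] * n
    return samples, weights
```

When the weights are already nearly uniform the set is left alone, which avoids discarding particle diversity without need.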

Naive particle filter

Consequence of resampling

Particles that tend to reflect the state rather well usually reappear in the resampled set

Many particles lie within one standard deviation of the mean of the posterior

Particle filters

Different community -> different name
  statistics: particle filter
  AI: survival of the fittest
  computer vision: condensation

Tracking people

Essential components
  Motion model
  Likelihood model: P(image features | person present at given configuration)

Motion model
  Strong motion model: markers, angles, …
  Weak motion model: drift model

Likelihood model
  SSD, edges, other features, …

Likelihood computation

Boundary information

Non-background information

Sample points

Annealed particle filter

To overcome the problems of
  high dimensionality of the state
  many local peaks in the likelihood

Annealing
  Starts from smooth approximations to the likelihood and moves to less smooth approximations
  Repeats weighting and resampling

[Deutscher, Blake, Reid, CVPR 2000]

Finding people

As an initialization of a people tracker
  There is no person tracker that represents the configuration of the body and can start automatically

Known approaches

1. Template matching

2. Finding faces

3. Search over correspondence

* Challenging topic: learning models from data
