Papers: Pfinder: Real-Time Tracking of the Human Body, Wren, C., Azarbayejani, A., Darrell, T., and Pentland, P. Tracking and Labelling of Interacting Multiple Targets, J. Sullivan and S. Carlsson

Pedestrians Detection and Tracking


Page 1: Pedestrians Detection and Tracking

Papers:•Pfinder: Real-Time Tracking of the Human Body,

Wren, C., Azarbayejani, A., Darrell, T., and Pentland, P.

•Tracking and Labelling of Interacting Multiple Targets,

J. Sullivan and S. Carlsson

Page 2: Pedestrians Detection and Tracking

This talk will cover two distinct tracking algorithms:
• Pfinder: Real-Time Tracking of the Human Body
• Multi-target tracking and labeling

For each of them we will present:
• Motivation and previous approaches
• Review of relevant techniques
• Algorithm details
• Applications and demos

Page 3: Pedestrians Detection and Tracking

There is always a major trade-off between generality and accuracy.

Because we know we are trying to identify and track human beings, we can start making assumptions about our objects.

If we have more specific information (example: tracking players in a football game), we can add even more specific assumptions.

These kinds of assumptions help us track more accurately.

Page 4: Pedestrians Detection and Tracking

Tracking Algorithm #1

Pfinder: Real-Time Tracking of the Human Body

Page 5: Pedestrians Detection and Tracking

Motivation

Page 6: Pedestrians Detection and Tracking

Introduction

• Pfinder is a tracking algorithm that:
– Detects human motion in real time.
– Segments the person’s body.
– Analyzes internal features (head, body, hands, and feet).

Page 7: Pedestrians Detection and Tracking
Page 8: Pedestrians Detection and Tracking

• Many tracking algorithms use a static model: for each frame, similar pixels are searched for in the vicinity of the previous frame's bounding box.
• We will use a dynamic model, one that learns over time.
• Most tracking algorithms need some user input for initialization.
• The presented algorithm initializes automatically.

Page 9: Pedestrians Detection and Tracking

Covariance

For a domain of dimension $n$, we define the sampling domain’s variables $x_1, \ldots, x_n$. The covariance of two variables $x_i, x_j$ is defined:

$$\operatorname{cov}(x_i, x_j) = E\left[(x_i - \mu_i)(x_j - \mu_j)\right], \quad \text{where } \mu_i = E[x_i]$$

The covariance is a measure of how much the two variables change together.

Page 10: Pedestrians Detection and Tracking

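The Covariance Matrix (marked $\Sigma$) is defined:

$$\Sigma_{ij} = \operatorname{cov}(x_i, x_j)$$

The normal distribution of a variable $x$ is defined:

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$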

Page 11: Pedestrians Detection and Tracking

The more general multivariate normal distribution is defined:

$$p(x_1, \ldots, x_N) = \frac{1}{(2\pi)^{N/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right)$$

Page 12: Pedestrians Detection and Tracking

Mahalanobis distance: the distance from a sample vector $x = (x_1, \ldots, x_N)^T$ to a group of samples with mean $\mu = (\mu_1, \ldots, \mu_N)^T$ and covariance matrix $S$ is defined:

$$D_M(x) = \sqrt{(x - \mu)^T S^{-1} (x - \mu)}$$
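The Mahalanobis distance can be illustrated with a short NumPy sketch. The function name `mahalanobis` and the sample data are made up for this example; the covariance is estimated from the samples themselves.

```python
import numpy as np

def mahalanobis(x, mu, S):
    """Mahalanobis distance of sample x from a group with mean mu and covariance S."""
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    # Solve S z = diff instead of forming the explicit inverse of S.
    z = np.linalg.solve(np.asarray(S, dtype=float), diff)
    return float(np.sqrt(diff @ z))

# Illustrative samples: wide spread along the first axis, narrow along the second.
samples = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
mu = samples.mean(axis=0)
S = np.cov(samples, rowvar=False, bias=True)

d_wide = mahalanobis([2.0, 0.0], mu, S)    # step of 2 along the wide axis
d_narrow = mahalanobis([0.0, 2.0], mu, S)  # step of 2 along the narrow axis
```

The same Euclidean step counts as a larger distance along the axis where the samples vary less, which is why this distance suits per-pixel color models with unequal channel variances.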

Page 13: Pedestrians Detection and Tracking

1. (Automatic) Initialization
• The background is modeled from a few seconds of video in which the person does not appear.
• When the person enters the scene, he is detected and modeled.

2. The analysis loop
• After the background and person models are initialized, each pixel in the next frame is checked against all models.

Page 14: Pedestrians Detection and Tracking

The first step in the algorithm is to build a preliminary representation of the person and the surrounding scene.

First, we need to acquire a video sequence of the scene that does not contain a person, in order to model the background.

Page 15: Pedestrians Detection and Tracking

The algorithm assumes a mostly-static background.

However, it needs to be robust to illumination changes and able to recover from changes in the scene (e.g. a book that was moved from one place to another).

Page 16: Pedestrians Detection and Tracking

• The images in the video use the YUV color representation (Y = luminance component, UV = chrominance components).
• A transformation matrix converts the RGB representation to YUV.
• The algorithm models the background by fitting to each pixel a Gaussian that describes the pixel’s mean and distribution.
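As a minimal sketch of the RGB to YUV transformation, the matrix below uses the common BT.601 coefficients. This is an assumption made for illustration: the slides do not say which matrix the Pfinder system used, and the function name `rgb_to_yuv` is hypothetical.

```python
import numpy as np

# BT.601 coefficients (assumed here; the exact matrix used by the system is not given).
RGB_TO_YUV = np.array([
    [ 0.299,    0.587,    0.114  ],   # Y: luminance
    [-0.14713, -0.28886,  0.436  ],   # U: chrominance
    [ 0.615,   -0.51499, -0.10001],   # V: chrominance
])

def rgb_to_yuv(image_rgb):
    """Convert an (H, W, 3) RGB array with values in [0, 1] to YUV."""
    return np.asarray(image_rgb, dtype=float) @ RGB_TO_YUV.T

white = rgb_to_yuv(np.ones((1, 1, 3)))   # pure white: full luminance, no chrominance
```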

Page 17: Pedestrians Detection and Tracking

We do this by measuring the pixel’s YUV mean and distribution over time.

A pixel has some YUV value in this frame; in the next frame it might change. We therefore denote its mean by $\mu_0(x, y)$ and its covariance matrix by $K_0(x, y)$.

Page 18: Pedestrians Detection and Tracking

After the scene has been modeled, Pfinder watches for large deviations from this model.

This is done by measuring the Mahalanobis distance in color space between the new pixel’s value and the scene model’s values at the corresponding location.

If the distance is large enough and the change is visible over a sufficient number of pixels, we begin to build a model of a person.
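The background model and the deviation test can be sketched as follows. This is a simplified illustration, not the Pfinder implementation: the function names, the threshold of 3.0, and the small regularization term are all assumptions.

```python
import numpy as np

def fit_background(frames):
    """frames: (T, H, W, 3) YUV stack of the empty scene.
    Returns the per-pixel mean (H, W, 3) and covariance (H, W, 3, 3)."""
    frames = np.asarray(frames, dtype=float)
    mean = frames.mean(axis=0)
    centered = frames - mean
    cov = np.einsum("thwi,thwj->hwij", centered, centered) / frames.shape[0]
    # Small diagonal term keeps every covariance invertible (an assumption of this sketch).
    cov += 1e-6 * np.eye(3)
    return mean, cov

def deviation_mask(frame, mean, cov, threshold=3.0):
    """True where the pixel's Mahalanobis distance from the background exceeds threshold."""
    diff = np.asarray(frame, dtype=float) - mean
    solved = np.linalg.solve(cov, diff[..., None])[..., 0]
    dist = np.sqrt(np.einsum("hwi,hwi->hw", diff, solved))
    return dist > threshold

rng = np.random.default_rng(0)
frames = 0.5 + 0.01 * rng.standard_normal((50, 4, 4, 3))   # synthetic empty-scene frames
mean, cov = fit_background(frames)
new_frame = mean.copy()
new_frame[1, 2] += 0.5                                     # one pixel changes strongly
mask = deviation_mask(new_frame, mean, cov)
```

A real system would additionally require the mask to cover enough pixels before starting to model a person, as described above.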

Page 19: Pedestrians Detection and Tracking

The algorithm represents the detected person’s body parts using blobs.

Blobs are a 2D representation of a Gaussian distribution of the spatial statistics.

Also, a support map is built for each blob $k$:

$$S_k(x, y) = \begin{cases} 1 & (x, y) \in k \\ 0 & \text{otherwise} \end{cases}$$

Page 20: Pedestrians Detection and Tracking

To initialize the blob models, Pfinder uses a 2D contour shape analysis that attempts to identify the head, hands, and feet location.

A blob is created for each identified location.

Page 21: Pedestrians Detection and Tracking

The class analyzer finds the location of body features by using statistics of their position and color in the previous frames.

Because no statistics have been gathered yet (these are the first frames in which the person appears), the algorithm uses ready-made statistical priors.

Page 22: Pedestrians Detection and Tracking

Hand and face blobs have strong flesh-colored color priors (it appears that normalized skin color is constant across different skin pigmentation levels).

The other blobs are initialized to cover the clothing regions

Page 23: Pedestrians Detection and Tracking

The contour analyzer can find features in a single frame, but the results tend to be noisy.

The class analyzer produces accurate results, but it depends on the stability of the underlying models (i.e. no occlusion).

A blend of contour analysis and class model is used to find the feature in the next frame.

Page 24: Pedestrians Detection and Tracking

[Figure: original image and its extracted contour]

Page 25: Pedestrians Detection and Tracking

After the initialization step of the algorithm, the information is divided into scene and person models.
• The scene (background) model consists of the color space distribution for each pixel.
• The person model consists of the spatial space and color space distributions for each blob.
– The spatial space determines the blob’s location and size.
– The color space determines the distribution of color in the blob.

Page 26: Pedestrians Detection and Tracking

Given a person model and a scene model, we can now acquire a new image, interpret it, and update the scene and person models.

Page 27: Pedestrians Detection and Tracking

1. Update the spatial model associated with each blob using the blob’s measured statistics, to yield the blob’s predicted spatial distribution for the current image. This is done with a Kalman filter assuming simple Newtonian dynamics.

Page 28: Pedestrians Detection and Tracking

Measurements taken from a video sequence can sometimes be very inaccurate.

Page 29: Pedestrians Detection and Tracking

Without some kind of filtering it would be impossible to make any short-term forward predictions.

Also, each measurement is used as a seed for the tracking algorithm at the next frame.

Some kind of filtering is needed to make the measurements more accurate.

Page 30: Pedestrians Detection and Tracking

Each tracked object is represented with a state vector (usually location)

With each new frame, a linear operator is applied to the state to generate the new state, with some noise mixed in, and some information from the controls on the system

Usually, Newton’s laws are applied.

Page 31: Pedestrians Detection and Tracking

The noise added is a Gaussian noise with mean 0 and a covariance matrix.

The predicted state is then updated with the real measurement to create the estimate for the next frame.
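The predict/update cycle can be sketched with a constant-velocity Kalman filter for one blob centre. The state layout `(x, y, vx, vy)`, the noise covariances `Q` and `R`, and the synthetic measurements are assumptions chosen for illustration, and no control input is modeled.

```python
import numpy as np

dt = 1.0
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # Newtonian constant-velocity dynamics
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only the position is measured
Q = 1e-3 * np.eye(4)   # process noise covariance (assumed value)
R = 1e-1 * np.eye(2)   # measurement noise covariance (assumed value)

def predict(x, P):
    """Apply the linear operator to the state and grow the uncertainty."""
    return A @ x, A @ P @ A.T + Q

def update(x_pred, P_pred, z):
    """Correct the predicted state with the real measurement z."""
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(4) - K @ H) @ P_pred
    return x, P

x, P = np.zeros(4), np.eye(4)
for t in range(1, 21):
    z = np.array([2.0 * t, 1.0 * t])   # blob centre moving at (2, 1) pixels per frame
    x, P = predict(x, P)
    x, P = update(x, P, z)
```

After a few frames the velocity components of `x` settle near the true motion, which is what makes short-term forward prediction possible.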

Page 32: Pedestrians Detection and Tracking

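2. When a new image is acquired, we measure the likelihood of each pixel being a member of each of the blob models and of the scene model. The vector $p = (x, y, Y, U, V)$ is defined as the location and color of each pixel. For each class $k$, the log likelihood is measured:

$$d_k = -\frac{1}{2}(p - \mu_k)^T K_k^{-1} (p - \mu_k) - \frac{1}{2}\ln|K_k| - \frac{m}{2}\ln(2\pi)$$

where $\mu_k$ and $K_k$ are the mean and covariance of class $k$, and $m$ is the dimension of $p$.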

Page 33: Pedestrians Detection and Tracking

3. Each pixel is now assigned to a particular class: either one of the blobs or the background. A support map is built which indicates which pixel belongs to which class:

$$s(x, y) = \arg\max_k \, d_k(x, y)$$
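A minimal sketch of the likelihood computation and class assignment follows. It treats every class, including the scene, as a single Gaussian over the full feature vector, which is a simplification; the names `log_likelihood` and `classify` and the toy classes are hypothetical.

```python
import numpy as np

def log_likelihood(p, mu, K):
    """Gaussian log likelihood of feature vectors p (N, m) for a class with mean mu and covariance K."""
    m = mu.shape[0]
    diff = p - mu
    maha = np.einsum("ni,ni->n", diff, np.linalg.solve(K, diff.T).T)
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * maha - 0.5 * logdet - 0.5 * m * np.log(2 * np.pi)

def classify(p, classes):
    """classes: list of (mu, K). Returns, per pixel, the index of the most likely class."""
    scores = np.stack([log_likelihood(p, mu, K) for mu, K in classes])
    return scores.argmax(axis=0)

# Two toy classes over (x, y, Y, U, V) and two pixels, one close to each class mean.
classes = [(np.zeros(5), np.eye(5)), (np.full(5, 5.0), np.eye(5))]
pixels = np.array([[0.1] * 5, [4.8] * 5])
labels = classify(pixels, classes)
```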

Page 34: Pedestrians Detection and Tracking

Connectivity constraints are enforced by iterative morphological growing from a single central point, to produce a connected region.

First, a foreground region comprising all the blob classes is grown.

Then, each individual blob is grown, with the constraint that it remains confined to the foreground region.

Page 35: Pedestrians Detection and Tracking

4. Now the statistical model for each class is updated. For the blob classes, the new mean and covariance are calculated:

$$\mu_k = E[p], \qquad K_k = E\left[(p - \mu_k)(p - \mu_k)^T\right]$$

The Kalman filter statistics are also updated at this time.

Background pixels are also updated, so that the model can recover from changes in the scene.
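The statistics update for a single class can be sketched as below, where the expectations are taken as plain averages over the pixels currently assigned to the class. The function name `update_blob` and the sample data are illustrative assumptions.

```python
import numpy as np

def update_blob(p, support):
    """Re-estimate the mean and covariance of one class from the pixels assigned to it.
    p: (N, m) feature vectors; support: (N,) boolean membership of the class."""
    members = p[support]
    mu = members.mean(axis=0)
    centered = members - mu
    K = centered.T @ centered / len(members)
    return mu, K

p = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [9.0, 9.0]])
support = np.array([True, True, True, True, False])   # the last pixel belongs elsewhere
mu_k, K_k = update_blob(p, support)
```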

Page 36: Pedestrians Detection and Tracking

• The algorithm employs several domain-specific assumptions in order to track accurately.
• If one of the assumptions breaks, the system degrades.
• However, the system can recover after a few frames if the assumptions hold again.
• The system can track only a single person.

Page 37: Pedestrians Detection and Tracking

RMS (Root Mean Square) errors were found to be on the order of a few pixels:

Test                 Hand                          Arm
Translation (X, Y)   0.7 pixels (0.2% relative)    2.1 pixels (0.8% relative)
Rotation (θ)         4.8 degrees (5.2% relative)   3.0 degrees (3.1% relative)

Page 38: Pedestrians Detection and Tracking

A Modular Interface - An application that provides programmers with tracking, segmentation, and feature detection.

The ALIVE application places 3D animated characters that interact with the person according to his gestures.

Here, Rexy!

Page 39: Pedestrians Detection and Tracking

The SURVIVE application uses the recorded movement of the person to navigate a 3D virtual game environment.

I guess you can’t get any nerdier than this.

Page 40: Pedestrians Detection and Tracking

Recognition of American Sign Language
• Pfinder was used as a pre-processing step for recognizing a 40-word subset of ASL.
• It achieved 99% sign accuracy.

Page 41: Pedestrians Detection and Tracking

Avatars and Telepresence
• The model of the person is translated into several blobs, which can be used to model 2D characters.

Page 42: Pedestrians Detection and Tracking

Tracking Algorithm #2

Multi-Target Tracking and Labeling

uses slides by Josephine Sullivan

from http://www.csc.kth.se/~sullivan/

Page 43: Pedestrians Detection and Tracking

Motivation

Page 44: Pedestrians Detection and Tracking

Introduction

• The multi-target tracking and labeling algorithm:
– Tracks multiple targets over long periods of time.
– Recovers robustly from collisions.
– Labels targets even when they are interacting.

Page 45: Pedestrians Detection and Tracking

Multi Tracking and Labeling

Sometimes Easy Sometimes Hard

Page 46: Pedestrians Detection and Tracking

The algorithm addresses the problem of the surveillance and tracking of multiple persons over a wide area.

Previous multi-target tracking algorithms are based on Kalman filtering and on more advanced particle filtering techniques.

Tracking algorithms often fail if occlusion or interaction between the targets occurs.

Page 47: Pedestrians Detection and Tracking

This work’s specific goal is to track and label the players in a football game.

This is especially hard when players collide and interact

Page 48: Pedestrians Detection and Tracking

The researchers used a wide-screen video which was produced using the video from four calibrated cameras.

The images were stitched after the homography between the images was computed.

This produces a high-resolution video which gives good tracking results

Page 49: Pedestrians Detection and Tracking

1. Background modeling and subtraction
2. Build an interaction graph
3. Resolve split/merge situations
4. Recover identities of temporally separated player trajectories

Page 50: Pedestrians Detection and Tracking

A probabilistic model of the image gradient of each pixel in the background is obtained.

The gradient is used to handle situations where the player’s uniform has the same color as the background.

Page 51: Pedestrians Detection and Tracking

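Let $g_x^t$ denote the image gradient at pixel $x$ in frame $t$.

Each background pixel gradient $g_x^b$ is modeled by a mixture of three bivariate normal distributions with means $\mu_x^i$ and covariance matrices $\Sigma_x^i$:

$$g_x^b \sim \sum_{i=1}^{3} \pi_x^i \, N_2\left(\mu_x^i, \Sigma_x^i\right)$$

where $0 \le \pi_x^i \le 1$ and $\sum_{i=1}^{3} \pi_x^i = 1$.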

Page 52: Pedestrians Detection and Tracking

A pixel $x$ in frame $t$ is considered a foreground pixel if $(g_x^t - \mu_x)^T \Sigma_x^{-1} (g_x^t - \mu_x)$ is larger than a threshold.
• Let $F_t$ be the set of foreground pixels at time $t$.
• Let $B_t$ be the set of background pixels at time $t$.

Connected components are then identified and processed by deleting small components or joining them to neighboring larger ones. This is done to make sure that each connected component corresponds to at least one whole player.
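The clean-up of connected components can be sketched as follows. This version only deletes small components; joining them to larger neighbors is left out. The 4-connectivity, the function name, and the `min_size` parameter are assumptions of the sketch.

```python
import numpy as np

def remove_small_components(mask, min_size):
    """Keep only 4-connected foreground components with at least min_size pixels."""
    mask = np.asarray(mask, dtype=bool)
    keep = np.zeros_like(mask)
    seen = np.zeros_like(mask)
    h, w = mask.shape
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            # Flood-fill one component starting from this unvisited foreground pixel.
            stack, component = [(sy, sx)], []
            seen[sy, sx] = True
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        stack.append((ny, nx))
            if len(component) >= min_size:
                ys, xs = zip(*component)
                keep[ys, xs] = True
    return keep

foreground = np.array([[1, 1, 0, 0],
                       [1, 1, 0, 1],
                       [0, 0, 0, 0]])
cleaned = remove_small_components(foreground, min_size=2)
```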

Page 53: Pedestrians Detection and Tracking

The set of ellipses representing the detected connected components (marked by bounding boxes) is defined:

$$E_t = \left\{ E_t^i \right\}_{i=1}^{n_t}$$

with $n_t$ being the number of ellipses detected in frame $t$.

Page 54: Pedestrians Detection and Tracking

The first aim is to put the ellipses in $E_t$ and in $E_{t+1}$ in correspondence.

Definition: ellipses $E_1$ and $E_2$ are an exact match if their size and orientation are sufficiently similar and the distance between their centers is sufficiently small.

Page 55: Pedestrians Detection and Tracking

Define a relation $\sim$:
• $E_t^i \sim E_{t+1}^j$ if $E_t^i$ and $E_{t+1}^j$ are an exact match.
• If no such exact match exists for $E_t^i$ in $E_{t+1}$, then $E_t^i \sim E_{t+1}^j$ if $\operatorname{Area}(E_t^i \cap E_{t+1}^j) > 0$ and $E_{t+1}^j$ has no exact match in $E_t$.

Define forward and backward mappings:
• Forward mapping: $j \in F_t(i) \iff E_t^i \sim E_{t+1}^j$
• Backward mapping: $k \in B_t(i) \iff i \in F_t(k)$

Page 56: Pedestrians Detection and Tracking

With the forward and backward mappings, we can define events at each frame:

Event       Signal
Split       $|F_t(i)| > 1$
Merge       $|B_t(j)| > 1$
Disappear   $|F_t(i)| = 0$
Appear      $|B_t(j)| = 0$
Stable      $|F_t(i)| = |B_t(F_t(i))| = 1$
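The forward and backward mappings and the resulting events can be sketched in plain Python. The relation is given here as a ready-made set of index pairs; how ellipses are matched is not modeled, and the function names and toy data are hypothetical.

```python
def forward_backward(relation, n_t, n_next):
    """relation: set of (i, j) pairs meaning ellipse i in frame t relates to ellipse j in frame t+1."""
    F = {i: {j for (a, j) in relation if a == i} for i in range(n_t)}
    B = {j: {i for (i, b) in relation if b == j} for j in range(n_next)}
    return F, B

def events(F, B):
    """Label ellipses of frame t (key 't') and frame t+1 (key 't+1') with their event."""
    out = {}
    for i, targets in F.items():
        if len(targets) == 0:
            out[("t", i)] = "disappear"
        elif len(targets) > 1:
            out[("t", i)] = "split"
        elif len(B[next(iter(targets))]) == 1:
            out[("t", i)] = "stable"
    for j, sources in B.items():
        if len(sources) == 0:
            out[("t+1", j)] = "appear"
        elif len(sources) > 1:
            out[("t+1", j)] = "merge"
    return out

# Ellipse 0 continues, 1 splits into two, 2 and 3 merge, 4 vanishes, and a new one appears.
relation = {(0, 0), (1, 1), (1, 2), (2, 3), (3, 3)}
F, B = forward_backward(relation, n_t=5, n_next=5)
ev = events(F, B)
```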

Page 57: Pedestrians Detection and Tracking

A maximal sequence of stable events sandwiched between non-stable events is termed a track.

A player track is a track that corresponds to exactly one player

Page 58: Pedestrians Detection and Tracking

If the event sequence is track → split or merge → track, then the track involves multiple players.

If the event sequence is {split, appear} → track → {merge, disappear}, then the track may be a player track.

If such a track is long enough and its ellipse size is not too big, it is considered a player track.

Other tracks are called multiple-player tracks.

Page 59: Pedestrians Detection and Tracking
Page 60: Pedestrians Detection and Tracking

Because we’re dealing with a football game, we know that players are divided into 3 categories: Team A, Team B and officials.

This will help us in cases where players from different teams appear in multiple-player tracks.

Page 61: Pedestrians Detection and Tracking
Page 62: Pedestrians Detection and Tracking

Given the labeling of the tracks and their interactions through merging and splitting, the game can be summarized by a graph structure called the target interaction graph.

• White and gray nodes correspond to team A / team B player tracks.
• Black nodes correspond to multiple-player tracks.
• This graph is a small section of the ~5000-node graph describing 10 minutes of analyzed gameplay.

Page 63: Pedestrians Detection and Tracking

By examining the player interaction graph, it is possible to isolate situations where n player tracks merge and then split into n player tracks.

These merge-split situations are resolved by finding correspondence between input and output tracks.

Page 64: Pedestrians Detection and Tracking

Input and output tracks are each a set of n tracks.

We wish to find the assignment of the input tracks to the output tracks. It is a bijective mapping

$$M : \{1, \ldots, n\} \to \{1, \ldots, n\}$$

where $M(i) = j$ implies that tracks $T_i$ and $T_j$ are the same player.

Not all assignments are physically possible.

Page 65: Pedestrians Detection and Tracking
Page 66: Pedestrians Detection and Tracking

For each valid assignment, we estimate the intermediate tracks by exploiting the properties of maintaining continuity of motion and relative depth ordering.

Page 67: Pedestrians Detection and Tracking

We investigate whether any of the intermediary tracks can be described by a constant-velocity motion model. This is done by linearly interpolating between the last ellipse of $T_i$ and the first ellipse of $T_{M(i)}$.

If there is sufficient image data to support this, the penalty for this estimation is 0.

Page 68: Pedestrians Detection and Tracking

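The overall estimation for each assignment is scored:

$$Sc_M = \sum_{i=1}^{n} \left[ \operatorname{Dist}(T_i, T_{M(i)}) + \operatorname{Pen}(T_i, T_{M(i)}) \right]$$

where $\operatorname{Dist}(T_i, T_{M(i)})$ is the distance traveled during the hypothesized trajectory, and

$$\operatorname{Pen}(T_i, T_{M(i)}) = \begin{cases} 1 & \text{if } T_i \text{ is not consistent with the relevant } T_{M(i)} \\ 0 & \text{otherwise} \end{cases}$$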
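Choosing the minimum-score assignment can be sketched by enumerating all bijections. Several simplifications are assumed here: the traveled distance is replaced by the straight-line distance between track end points, distance and penalty are simply added, and the consistency check is a caller-supplied hypothetical function.

```python
from itertools import permutations
from math import dist

def best_assignment(inputs, outputs, is_consistent):
    """inputs[i]: centre of the last ellipse of input track i;
    outputs[j]: centre of the first ellipse of output track j.
    Returns (score, M) where M[i] is the output track assigned to input track i."""
    n = len(inputs)
    best = None
    for M in permutations(range(n)):          # every bijective mapping
        score = sum(
            dist(inputs[i], outputs[M[i]]) + (0 if is_consistent(i, M[i]) else 1)
            for i in range(n)
        )
        if best is None or score < best[0]:
            best = (score, M)
    return best

inputs = [(0.0, 0.0), (10.0, 0.0)]
outputs = [(11.0, 1.0), (1.0, 1.0)]
score, M = best_assignment(inputs, outputs, lambda i, j: True)
```

Enumerating permutations grows factorially with the number of merging targets, so a sketch like this is only practical for small groups.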

Page 69: Pedestrians Detection and Tracking
Page 70: Pedestrians Detection and Tracking

If the minimum-score assignment is explained solely by linear interpolation, and its score is lower than a threshold, then we accept this assignment.

Otherwise, we repeat this process at constant time intervals. This is called relative depth ordering.

Page 71: Pedestrians Detection and Tracking

Intermediary tracks that cannot be explained by simple linear interpolation are analyzed every m-th frame in the interval between the merge and the split.

Starting with the first interval, we define the region $R_t^{k,j}$ as the union of all ellipses and try to interpolate over smaller distances.

Page 72: Pedestrians Detection and Tracking

The aim at each interval is to maximize the intersection of $R_t^{k,j}$ with the foreground pixels and minimize its intersection with the background pixels.

Again, the penalty is set to 1 if this intersection is not consistent.

Then the score is recalculated and the assignment with the minimum score is chosen.

Page 73: Pedestrians Detection and Tracking
Page 74: Pedestrians Detection and Tracking

This process was found to work when the number of merging targets was at most 5.

Nonetheless, the examined sequence contained roughly 200 merge-split situations, of varying complexity, all resolved.

Page 75: Pedestrians Detection and Tracking

At this step, it is interesting to see how frequently a player was assigned a player track.

Page 76: Pedestrians Detection and Tracking

Not all split/merge situations were accurately resolved.

Usually, other features can be used to resolve the identity of player tracks.

In a football game, a player’s identity can be obtained from his position relative to his teammates.

The easiest example is the goalkeeper who is always behind his teammates.

Page 77: Pedestrians Detection and Tracking

We can look at the problem as a partitioning problem

Page 78: Pedestrians Detection and Tracking

This is specific to a football game, but a variation can be used for other applications.

The feature vector for each player $i \in \{1, \ldots, 11\}$ at frame $t$ is $v_t^i = (r_t^i, l_t^i, f_t^i, b_t^i)$, which counts the number of players in the team to the right of, to the left of, in front of, and behind the player.

Page 79: Pedestrians Detection and Tracking
Page 80: Pedestrians Detection and Tracking
Page 81: Pedestrians Detection and Tracking

We assign an index to every possible configuration (feature vector) and for each unlabeled player track, we make a histogram of the configuration over the track’s ellipses.
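The relative-position feature and the per-track histogram can be sketched as follows. The coordinate convention (x grows to the right, y grows toward the front) and the function names are assumptions made for this example.

```python
from collections import Counter

def feature_vector(i, positions):
    """positions: list of (x, y) for one team's players in a single frame.
    Returns (r, l, f, b): teammates to the right, left, in front of and behind player i."""
    x, y = positions[i]
    others = [p for k, p in enumerate(positions) if k != i]
    r = sum(px > x for px, _ in others)
    l = sum(px < x for px, _ in others)
    f = sum(py > y for _, py in others)
    b = sum(py < y for _, py in others)
    return (r, l, f, b)

def configuration_histogram(track_frames, i):
    """Count how often each configuration occurs along a track (one positions list per frame)."""
    return Counter(feature_vector(i, positions) for positions in track_frames)

track_frames = [
    [(0, 0), (1, 2), (2, 1)],
    [(0, 0), (1, 2), (2, 1)],
    [(3, 0), (1, 2), (2, 1)],   # player 0 has moved to the far right
]
hist = configuration_histogram(track_frames, 0)
```

Using the configuration tuple itself as the histogram key plays the role of the configuration index described above.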

Page 82: Pedestrians Detection and Tracking

We start by considering only long player tracks (over 40 seconds).

Build their distance matrix:

The distance between every pair of player tracks is shown. Darker values indicate smaller distances.

Page 83: Pedestrians Detection and Tracking
Page 84: Pedestrians Detection and Tracking

We grow and merge clusters by using player tracks of decreasing lengths. This clustering considers tracks that are 750 frames long.

Page 85: Pedestrians Detection and Tracking

Clustering with tracks of 250 frames:

Errors begin to occur.

Page 86: Pedestrians Detection and Tracking
Page 87: Pedestrians Detection and Tracking

• We’ve seen two algorithms.
• One deals with single-person tracking, the other with multi-target tracking.
• Both algorithms make specific assumptions: the first about the human body and its motion, the other about motion and the conditions of a football game.