
Detection and Tracking of Moving Objects from a Moving Platform

Gérard Medioni

Institute of Robotics and Intelligent Systems, Computer Science Department, Viterbi School of Engineering

University of Southern California

Problem Definition

• Scenario: rigidly moving objects + moving camera

• Goal
  • Motion segmentation: motion regions / background area
  • Tracking of multiple objects: consistent track(s) over time
  • Geo-registration and geo-tracking: geo-referenced mosaic and tracks

Scenario example 1 – moving cameras

[Figure: processing diagram with the stages moving cameras, image stabilization, motion segmentation, tracking, and mosaic + tracks]

Scenario example 2 - moving cameras with a map

[Figure: processing diagram with the stages moving camera, map, geo-registration, image stabilization, motion segmentation, tracking, global data association, and geo-referenced mosaic + tracks]

Challenges & Applications

• Information sources
  • Pixel colors + 2D coordinates
  • Object model information (optional)

• Difficulties
  • Camera motion
  • 3D static structures (parallax)
  • Multiple moving objects

• Applications
  • Video surveillance
  • Video compression and indexing
  • …

Outline

• Introduction

• 2D Motion segmentation

• Tracking of multiple moving objects

• Geo-registration and geo-tracking

• Summary and Discussion

Motion Segmentation – Overview

• Task: to segment motion region and background

• Assumptions
  • General camera motion
  • Distant scene
  • Textured background

Feature Extraction & Matching

• Salient parts of the scene

• Extraction
  • Harris corners
    • Multi-scale
    • Multi-orientation
    • Sub-pixel accuracy

• Matching
  • Small inter-frame motion
    • Gray-scale windows
    • Cross correlation
  • Large viewpoint change
    • Gradient histogram
    • Vector angle
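The small-motion matching case (gray-scale windows compared by cross correlation) can be sketched as follows. This is a minimal illustration, not the implementation used in the talk: the function names `ncc` and `best_match`, the window half-size, and the use of the zero-mean normalized form of cross correlation are all assumptions.

```python
import numpy as np

def ncc(window_a, window_b):
    """Zero-mean normalized cross correlation of two equally sized gray windows."""
    a = np.asarray(window_a, dtype=float).ravel()
    b = np.asarray(window_b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # A flat window has no texture; treat it as "no match" (assumed policy).
        return 0.0
    return float(a @ b / denom)

def best_match(image_a, corner, image_b, candidates, half=3):
    """Pick the candidate corner in image_b whose window best matches `corner`.

    Corners are (row, col) tuples and must lie at least `half` pixels
    away from the image border.
    """
    def window(img, rc):
        r, c = rc
        return img[r - half:r + half + 1, c - half:c + half + 1]

    ref = window(image_a, corner)
    scores = [ncc(ref, window(image_b, cand)) for cand in candidates]
    k = int(np.argmax(scores))
    return candidates[k], scores[k]
```

Restricting the candidates to corners near the original position is what makes this cheap when the inter-frame motion is small.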

Multiple Image Registration

• Frame motion model
  • Assumptions:
    • Small inter-frame motion
    • Distant planar scene
  • 2D affine transform

• Robust estimation
  • Random Sample Consensus (RANSAC)
  • Keep the model with the largest number of inliers
  • Non-linear refinement over the inliers

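\[
\begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix}
=
\begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix},
\qquad p_2 = A\,p_1
\]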
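The robust estimation step can be illustrated with a short RANSAC loop for the affine model p2 = A p1. This is a sketch under stated assumptions: the function names, iteration count, and inlier tolerance are hypothetical, and the final refit is a linear least-squares fit standing in for the non-linear refinement named on the slide.

```python
import numpy as np

def fit_affine(p1, p2):
    """Least-squares 3x3 affine A (last row 0 0 1) such that p2 ~ A p1."""
    X = np.hstack([p1, np.ones((len(p1), 1))])   # n x 3 homogeneous points
    M, *_ = np.linalg.lstsq(X, p2, rcond=None)   # solves X @ M = p2, M is 3 x 2
    A = np.eye(3)
    A[:2, :] = M.T
    return A

def apply_affine(A, p):
    """Apply A to an n x 2 array of (u, v) points."""
    return p @ A[:2, :2].T + A[:2, 2]

def ransac_affine(p1, p2, n_iter=200, tol=1.0, seed=0):
    """Keep the model with the most inliers, then refit on those inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(p1), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(p1), size=3, replace=False)   # minimal sample
        A = fit_affine(p1[idx], p2[idx])
        err = np.linalg.norm(apply_affine(A, p1) - p2, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers.sum() < 3:
        raise ValueError("too few inliers to fit an affine transform")
    # Linear refit over the inliers (stand-in for non-linear refinement).
    return fit_affine(p1[best_inliers], p2[best_inliers]), best_inliers
```

Matches on moving objects behave as outliers to the dominant background motion, which is why a consensus-based estimator suits this setting.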

Motion Segmentation

• Two-frame pixel-level segmentation?

• Segmentation within a temporal window
  • Accumulate the pixels warped from adjacent frames
  • K-Means to find the most representative pixel
  • Frame differencing and thresholding: $|I_{original} - I_{model}| > \Delta I$

[Figure: temporal window spanning frame t-w, frame t, and frame t+w; t: reference frame, w: half size of the window]
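The temporal-window idea can be sketched as below, assuming the 2w+1 frames have already been warped into the reference frame. Note the substitution: a per-pixel temporal median is used here as a simple stand-in for the K-Means step that selects the most representative pixel. The function name `motion_mask` and its arguments are hypothetical.

```python
import numpy as np

def motion_mask(warped_frames, reference_index, delta_i):
    """Threshold the difference between the reference frame and a background model.

    warped_frames: sequence of 2w+1 gray frames (H x W), already registered
    to the reference frame. Returns (mask, model).
    """
    stack = np.asarray(warped_frames, dtype=float)
    # Background model: temporal median per pixel (stand-in for K-Means).
    model = np.median(stack, axis=0)
    # Frame differencing and thresholding: |I_original - I_model| > delta_i.
    diff = np.abs(stack[reference_index] - model)
    return diff > delta_i, model
```

A pixel that is covered by a moving object in only a few frames of the window still receives a background value in the model, so the object stands out in the difference image.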

Experimental Results (1)

[Figure: two result sets, each showing original images, motion probability maps, initial detection results, and tracking results]

[Video: a synthesized video without motion regions]

Tracking of multiple moving objects


Problem statement: multiple target tracking

• Input: foreground regions in each frame
• Output: trajectories with consistent track IDs
• Challenges:
  • Noisy foreground regions
  • Occlusions

Problematic underlying assumption

• One-to-one assumption
  • One target can correspond to at most one observation
  • One observation can be associated with at most one target
  • Appropriate for punctual (point-like) observations

• The underlying one-to-one assumption may not hold for visual tracking

[Figure: example observations from radar, a stationary camera, and a UAV camera]

Related work

• (Pasula et al., 99) Gibbs sampling to compute joint DA
  • MAP, multi-scan, uniform prior (no missing or false detection)
• (Cong et al., 04) Approximate association probabilities in JPDAF
  • MMSE, MCMC outperforms JPDAF, one-scan/multi-scan
• (Sastry et al., 04) MCMC to compute joint DA with unknown number of targets
  • MAP, multi-scan, outperforms MHT, considers temporal association only
• (Dellaert et al., 03) MCMC for SfM without correspondence
  • MMSE, single scan, similar to JPDAF

• Our method: overcome the one-to-one assumption
  • MAP, multi-scan, considers both spatial and temporal association

[Slide side label: one-to-one assumption]

Anatomy of the problem

• “Explain” foreground regions:
  • It is hard at one frame without using any model information
  • It is solvable if smoothness in motion and appearance is used

Explanation of foreground regions

• Two ways to explain foreground regions

Labeling of foreground regions (precise)
  • The label(s) of a pixel indicate the track ID
  • Each pixel can have multiple labels to represent occlusions
  • Accurate but expensive!

Cover of foreground regions (approximate)
  • A set of shapes (rectangles)
  • Rectangles can overlap one another to represent occlusions
  • Approximate but efficient!

Our formulation

• Given
  • A set of noisy observations (foreground regions)

• Find
  • A cover ω of foreground regions over time
  • $\tau_k$ is a sequence of shapes (rectangles)

Solution space

• Solution space Ω is a collection of spatio-temporal covers of observation Y.
  • “Joint association event”

• Two kinds of data association
  • Spatial data association: change the cover at one instant
  • Temporal data association: form consistent tracks

• Uncovered area belongs to false alarms

[Figure: (a) observations Y, (b) one possible cover of Y]

\[
\omega = \{\tau_1, \tau_2, \ldots, \tau_K\}
\]

Bayesian formulation

• MAP estimate

\[
\omega^{*} = \arg\max_{\omega}\, p(\omega \mid Y), \qquad p(\omega \mid Y) \propto p(Y \mid \omega)\, p(\omega)
\]

• Prior model p(ω)
  • A small number of long tracks
  • One track should overlap little with other tracks unless necessary

\[
p(\omega) = p(L)\, p(K)\, p(O)
\]

• Likelihood p(Y | ω)
  • Smoothness in both motion and appearance
  • Areas of uncovered false alarms, p(F)

\[
p(Y \mid \omega) = p(F) \prod_{k=1}^{K} \prod_{i=1}^{|\tau_k|-1} L\big(\tau_k(t_{i+1}) \mid \tau_k(t_i)\big)
\]

where $L(\cdot \mid \cdot)$ combines the motion likelihood and the appearance likelihood.

Motion and appearance likelihood

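• Motion

\[
x^{k}_{t+1} = A^{k} x^{k}_{t} + w, \qquad y^{k}_{t} = H^{k} x^{k}_{t} + v, \qquad w \sim N(0, Q), \quad v \sim N(0, R)
\]

\[
L_M\big(\tau_k(t_{i+1}) \mid \tau_k(t_i)\big) \equiv p\big(\tau_k(t_{i+1}) \mid \hat{\tau}_k(t_{i+1})\big)
\]

where $\hat{\tau}_k(t_{i+1})$ is taken here to be the state predicted from $\tau_k(t_i)$ by the linear model above.

• Appearance

\[
L_A\big(\tau_k(t_{i+1}) \mid \tau_k(t_i)\big) = \frac{1}{z_3} \exp\Big(-\lambda_3\, D\big(\tau_k(t_i), \tau_k(t_{i+1})\big)\Big)
\]

where $D\big(\tau_k(t_i), \tau_k(t_{i+1})\big)$ is the Kullback-Leibler (KL) distance between two RGB color histograms.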
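The two likelihood terms can be sketched numerically as follows. Several points are assumptions rather than details from the slides: the plain (asymmetric) KL divergence is used for the KL distance, histograms are smoothed with a small `eps` to avoid division by zero, the motion term is evaluated as the Gaussian density of the next observation under a Kalman prediction, and all function and parameter names are hypothetical.

```python
import numpy as np

def kl_distance(h1, h2, eps=1e-9):
    """KL divergence D(h1 || h2) between two histograms, normalized internally."""
    p = np.asarray(h1, dtype=float).ravel() + eps
    q = np.asarray(h2, dtype=float).ravel() + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

def appearance_likelihood(h_i, h_next, lam3=1.0, z3=1.0):
    """L_A = (1 / z3) * exp(-lam3 * D(h_i, h_next))."""
    return float(np.exp(-lam3 * kl_distance(h_i, h_next)) / z3)

def motion_likelihood(x, P, y_next, A, H, Q, R):
    """Gaussian density of y_next under the prediction from state x, covariance P.

    Model: x_{t+1} = A x_t + w, y_t = H x_t + v, w ~ N(0, Q), v ~ N(0, R).
    """
    x_pred = A @ x                      # predicted state
    P_pred = A @ P @ A.T + Q            # predicted state covariance
    S = H @ P_pred @ H.T + R            # innovation covariance
    r = y_next - H @ x_pred             # innovation
    norm = np.sqrt((2.0 * np.pi) ** len(r) * np.linalg.det(S))
    return float(np.exp(-0.5 * r @ np.linalg.solve(S, r)) / norm)
```

Both terms reward smoothness: similar consecutive histograms give a small D and a high appearance likelihood, and an observation close to the predicted position gives a high motion likelihood.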

MAP of full posterior p(ω |Y)

• Computing the MAP estimate of such a posterior is not a trivial task
  • Even determining the parameters of such a posterior is not an easy task

• Solution to MAP:
  • Sampling-based method to avoid enumerating all possible solutions
  • Two types of proposal moves (temporal and spatial moves)
  • Symmetric temporal information

\[
p(\omega \mid Y) \propto \exp\big( C_0 S_{len} - C_1 K - C_2 F - C_3 S_{olp} - C_4 S_{app} - S_{mot} \big)
\]

MAP estimation is equivalent to minimizing an energy function.

Markov Chain Monte Carlo

• Basic idea: construct a Markov chain that converges to the target distribution
  • The state of the Markov chain is defined in Ω
  • Transitions of the Markov chain are guided by a proposal distribution

• Metropolis-Hastings algorithm
  • Propose a new state ω' from the previous state: $\omega' \sim q(\omega' \mid \omega^{(i)})$
  • Accept ω' with probability

\[
\min\left(1, \frac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right)
\]

• Properties
  • No need to compute the global p(ω), only the local ratio p(ω')/p(ω)
  • For MAP, no need to keep the whole chain, only the current state and the best one

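Metropolis-Hastings algorithm

1. Initialize $\omega^{(0)}$.
2. For $i = 0$ to $N-1$:
   - Sample $u \sim U[0,1]$.
   - Propose $\omega' \sim q(\omega' \mid \omega^{(i)})$.
   - Compute $A(\omega^{(i)}, \omega') = \min\left\{1, \dfrac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right\}$.
   - If $u < A(\omega^{(i)}, \omega')$, set $\omega^{(i+1)} = \omega'$; else set $\omega^{(i+1)} = \omega^{(i)}$.
   Endfor

The chain $\omega^{(0)}, \ldots, \omega^{(N)}$ converges to $p(\omega)$ as $N \to \infty$.

N is the length of the Markov chain.

q() is called the proposal distribution.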
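The loop above can be sketched in a few lines on a toy discrete state space. This is a generic illustration, not the tracker itself: the function names and the toy target are assumptions, and log-probabilities are used so that only the local ratio is ever evaluated. Following the MAP remark, the sketch keeps just the current state and the best state seen so far.

```python
import math
import random

def metropolis_hastings(log_p, propose, log_q, omega0, n_iter, seed=0):
    """Return (best_state, current_state) after n_iter Metropolis-Hastings steps.

    log_p(w): unnormalized log target density.
    propose(w, rng): draws a candidate w' from q(. | w).
    log_q(w_to, w_from): log q(w_to | w_from).
    """
    rng = random.Random(seed)
    current = best = omega0
    for _ in range(n_iter):
        u = rng.random()
        cand = propose(current, rng)
        # log of min(1, p(w') q(w | w') / (p(w) q(w' | w)))
        log_a = min(0.0, log_p(cand) + log_q(current, cand)
                    - log_p(current) - log_q(cand, current))
        if u < math.exp(log_a):
            current = cand
        if log_p(current) > log_p(best):
            best = current
    return best, current

# Toy target on states 0..9, peaked at 7, with a symmetric +/-1 proposal.
def toy_log_p(w):
    return -0.5 * (w - 7) ** 2

def toy_propose(w, rng):
    return (w + rng.choice((-1, 1))) % 10

def toy_log_q(w_to, w_from):
    return math.log(0.5)
```

Calling `metropolis_hastings(toy_log_p, toy_propose, toy_log_q, 0, 2000)` runs the chain from state 0; with a symmetric proposal the q terms cancel and the acceptance test reduces to the ratio p(ω')/p(ω).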

Two types of q(ω’ | ω)

• Temporal moves and spatial moves drive the Markov chain

• Data-driven proposal: $q(\omega' \mid \omega) \rightarrow q(\omega' \mid \omega, D)$

• Spatial moves are made only after enough temporal information is collected

• Symmetric temporal information
  • Forward and backward (e.g. extension)
  • Deal with occlusions at the very beginning

[Figure: temporal moves: birth/death, extension/reduction, split/merge, switch; spatial moves: segmentation/aggregation, diffusion]

MCMC Data Association

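1. Initialize $\omega^{(0)}$.
2. For $i = 0$ to $N-1$:
   - Sample $u \sim U[0,1]$.
   - Sample: if $i < \varepsilon \cdot N$, $\omega' \sim q_{Temporal}(\omega' \mid \omega^{(i)})$; else $\omega' \sim q_{All}(\omega' \mid \omega^{(i)})$.
   - Compute $A(\omega^{(i)}, \omega') = \min\left\{1, \dfrac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right\}$.
   - If $u < A(\omega^{(i)}, \omega')$, set $\omega^{(i+1)} = \omega'$; else set $\omega^{(i+1)} = \omega^{(i)}$.
   Endfor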
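The only change from plain Metropolis-Hastings is the proposal schedule: temporal moves alone for the first ε·N iterations, then all move types. A minimal sketch of that schedule follows. The uniform choice among move types is an assumption made for brevity (the slides describe the proposals as data-driven), and the helper name `choose_move` is hypothetical.

```python
import random

TEMPORAL_MOVES = ("birth", "death", "extension", "reduction",
                  "split", "merge", "switch")
SPATIAL_MOVES = ("segmentation", "aggregation", "diffusion")

def choose_move(i, n_iter, eps, rng):
    """Pick a move type for iteration i of an n_iter-step chain.

    While i < eps * n_iter only temporal moves are proposed, so that
    temporal information is collected before any spatial move is tried.
    """
    if i < eps * n_iter:
        return rng.choice(TEMPORAL_MOVES)
    return rng.choice(TEMPORAL_MOVES + SPATIAL_MOVES)
```

For example, with `n_iter=1000` and `eps=0.3`, the first 300 iterations draw only from the temporal moves.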

Determining Parameters

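• Determine the parameters in the full posterior
  • A casual parameter setting can make the posterior of the ground truth, p(ω_gt | Y), much lower than that of the “solution”.
  • Take advantage of the property of MCMC

\[
p(\omega \mid Y) \propto \exp\big( C_0 S_{len} - C_1 K - C_2 F - C_3 S_{olp} - C_4 S_{app} - S_{mot} \big)
\]

Degenerate ω_gt to ω' and require that the ground truth remain at least as probable:

\[
\frac{p(\omega_{gt})}{p(\omega')} \ge 1
\;\Rightarrow\;
\begin{cases}
A\,[C_0, C_1, C_2, C_3, C_4]^{T} \le b \\
C_0, C_1, C_2, C_3, C_4 \ge 0 \\
\max\,(C_0 + C_1 + C_2 + C_3 + C_4)
\end{cases}
\]

This is solved by linear programming (GNU Linear Programming Kit).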
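The reason the ratio test yields linear constraints can be spelled out as follows. The notation f and g is introduced here only for illustration: f(ω) collects the terms of the exponent that are multiplied by the parameters C = (C_0, …, C_4), with their signs, and g(ω) collects the terms that carry no parameter (the motion term).

\[
\log p(\omega \mid Y) = C^{T} f(\omega) + g(\omega) + \text{const}
\]

\[
\frac{p(\omega_{gt} \mid Y)}{p(\omega' \mid Y)} \ge 1
\;\Longleftrightarrow\;
C^{T}\big(f(\omega') - f(\omega_{gt})\big) \le g(\omega_{gt}) - g(\omega')
\]

Each degenerated solution ω' therefore contributes one row $f(\omega') - f(\omega_{gt})$ to A and one entry $g(\omega_{gt}) - g(\omega')$ to b, and stacking the rows gives $A\,C \le b$. The normalizing constant cancels in the ratio, which is the same local-ratio property that MCMC exploits.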

Simulation experiments

• Settings
  • K (unknown number) moving discs in 200x200
  • Independent color appearance and motion
  • Static occlusion and inter-occlusion
  • False alarms

[Figure: original video and tracking result]

• Quantitative comparison
  • MHT (I. Cox, 94), JPDAF (J. Kang, 03), temporal only
  • STDA score in VACE-II eval
  • Same motion and appearance likelihood
  • Averaged over multiple sequences and multiple runs

[Figure: two plots, one for FA=0, W=50, 10K MCMC iterations and one for K=5, W=50, 10K MCMC iterations]

• Online implementation
  • Sliding window W
  • Initialize $\omega_t$ with $\omega^{*}_{t-1}$

[Figure: online vs. offline comparison, T=1000]

Real Scenarios

Experiments

[Figure: tracking results on CLEAR (320x240) and Vivid-II (320x240) sequences]

• Can handle occlusion at the beginning by using symmetric temporal information

Geo-registration and geo-tracking


Geo-registration

• Use 2D homography to compensate inter-frame (2-view) motion

• Refine the homography between map and images

\[
H_{i+1,M} = (H_{i,i+1})^{-1}\, H_{i,M}\, H_{update}
\]

[Figure: consecutive frames registered to the map M, labeled with $H_{i,M}$, $H_{i,i+1}$, $H_{i+1,M}$, and $H_{update}$]

Geo-registration results

Geo-mosaicing 2000 frames on top of the reference frame.

Experimental results

• Results are shown on two UAV data sets
  • Map is acquired from Google Earth®
  • Geo-registration is performed every 50 frames
  • Local data association (MCMCDA) window: 50 frames

[Figure: geo-registration without geo-refinement vs. with geo-refinement]



System implementation

• C++ implementation
• Xeon Dual Core P4 3.0GHz
• Preliminary time performance

Procedure                                    Time (seconds) on 320x240
Image registration                           ~0.25
Motion detection (moving cameras)            ~2 (CPU) / ~0.1 (GPU)
Object detection after motion segmentation   ~0.25
Geo-registration                             ~6 every 50 frames
Tracking                                     ~0.4
Total                                        ~1 (GPU)


Summary & Discussion

• Detection and tracking in dynamic scenes
  • Moving camera + rigid moving objects
  • 2D motion segmentation and geometric analysis of background
  • Spatial and temporal (2D+t) data association of moving objects
  • Tracking with geo-registration

• Highlights
  • Solutions to practical problems in detection and tracking
  • Encouraging results and extensive applications

• Future directions
  • Multi-view geometry + object recognition
  • Automatic determination of applicable tasks

References

• Qian Yu and Gérard Medioni, "A GPU-based implementation of Motion Detection from a Moving Platform," to appear in IEEE Workshop on Computer Vision on GPU, in conjunction with CVPR'08

• Qian Yu and Gérard Medioni, "Integrated Detection and Tracking for Multiple Moving Objects using Data-Driven MCMC Data Association," IEEE Workshop on Motion and Video Computing (WMVC'08), 2008

• Qian Yu, Gérard Medioni, and Isaac Cohen, "Multiple Target Tracking Using Spatio-Temporal Monte Carlo Markov Chain Data Association," IEEE Conference on Computer Vision and Pattern Recognition (CVPR'07), 2007, pp. 1-8

• Qian Yu and Gérard Medioni, "Map-Enhanced Detection and Tracking from a Moving Platform with Local and Global Data Association," IEEE Workshop on Motion and Video Computing (WMVC'07), 2007

• Yuping Lin, Qian Yu, and Gérard Medioni, "Map-Enhanced UAV Image Sequence Registration," Workshop on Applications of Computer Vision (WACV'07), 2007

• Qian Yu, Isaac Cohen, Gérard Medioni, and Bo Wu, "Boosted Markov Chain Monte Carlo Data Association for Multiple Target Detection and Tracking," Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), Vol. 2, pp. 675-678

Q&A

Thank you!
