Detection and Tracking of Moving Objects from a Moving Platform
Gérard Medioni
Institute of Robotics and Intelligent Systems
Computer Science Department
Viterbi School of Engineering
University of Southern California
Problem Definition
• Scenario: rigidly moving objects + moving camera
• Goal
  • Motion segmentation: motion regions / background area
  • Tracking of multiple objects: consistent track(s) over time
  • Geo-registration and geo-tracking: geo-referenced mosaic and tracks
Scenario example 1 – moving cameras
[Figure: processing pipeline with stages Moving cameras, Image stabilization, Motion segmentation, Tracking, Mosaic + tracks]
Scenario example 2 - moving cameras with a map
[Figure: processing pipeline with stages Moving camera, Map, Geo-registration, Image stabilization, Motion segmentation, Tracking, Global data association, Geo-referenced mosaic + tracks]
Challenges & Applications
• Information sources
  • Pixel colors + 2D coordinates
  • Object model information (optional)
• Difficulties
  • Camera motion
  • 3D static structures (parallax)
  • Multiple moving objects
• Applications
  • Video surveillance
  • Video compression and indexing
  • …
Outline
Introduction
2D Motion segmentation
• Tracking of multiple moving objects
• Geo-registration and geo-tracking
• Summary and Discussion
Motion Segmentation – Overview
• Task: to segment motion region and background
• Assumptions
  • General camera motion
  • Distant scene
  • Textured background
Feature Extraction & Matching
• Salient parts of the scene
• Extraction
  • Harris corners
  • Multi-scale
  • Multi-orientation
  • Sub-pixel accuracy
• Matching
  • Small inter-frame motion
    • Gray-scale windows
    • Cross correlation
  • Large viewpoint change
    • Gradient histogram
    • Vector angle
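For the small inter-frame motion case, matching gray-scale windows by cross correlation can be sketched as follows. This is a minimal illustration, assuming the zero-mean normalized variant of cross correlation and windows stored as lists of rows; the function names `ncc` and `best_match` are hypothetical and not taken from the slides.

```python
import math

def ncc(window_a, window_b):
    """Zero-mean normalized cross correlation of two equally sized windows.

    Each window is a list of rows of gray-scale intensities. The result lies
    in [-1, 1]; 1 means the windows agree up to a gain and an offset.
    """
    a = [p for row in window_a for p in row]
    b = [p for row in window_b for p in row]
    if len(a) != len(b) or not a:
        raise ValueError("windows must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [p - mean_a for p in a]
    db = [p - mean_b for p in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat window: correlation is undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom

def best_match(window, candidates):
    """Return (index, score) of the candidate window with the highest NCC."""
    scores = [ncc(window, c) for c in candidates]
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return best, scores[best]
```

In practice the candidate windows would be centered on corners detected in the next frame within a small search radius, which is what the small inter-frame motion assumption permits.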
Multiple Image Registration
• Frame motion model
  • Assumptions:
    • Small inter-frame motion
    • Distant planar scene
  • 2D affine transform
• Robust estimation
  • Random Sample Consensus (RANSAC)
  • Keep the model with the largest number of inliers
  • Non-linear refinement over the inliers
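$$p_2 = A\,p_1, \qquad \begin{bmatrix} u_2 \\ v_2 \\ 1 \end{bmatrix} = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u_1 \\ v_1 \\ 1 \end{bmatrix}$$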
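The RANSAC step for the affine frame motion model can be sketched as below. This is a minimal sketch under stated assumptions: minimal samples of three correspondences, a fixed pixel tolerance `tol`, and a fixed iteration count, none of which are specified in the slides. The non-linear refinement over the inliers is omitted, and all function names are hypothetical.

```python
import random

def solve3(m, r):
    """Solve the 3x3 system m x = r by Cramer's rule; return None if singular."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    if abs(d) < 1e-12:
        return None
    out = []
    for col in range(3):
        mc = [row[:] for row in m]
        for i in range(3):
            mc[i][col] = r[i]
        out.append(det(mc) / d)
    return out

def affine_from_3(pairs):
    """Exact affine rows [A11, A12, A13], [A21, A22, A23] from 3 (p1, p2) pairs."""
    m = [[p1[0], p1[1], 1.0] for p1, _ in pairs]
    row_u = solve3(m, [p2[0] for _, p2 in pairs])
    row_v = solve3(m, [p2[1] for _, p2 in pairs])
    if row_u is None or row_v is None:
        return None  # the three source points are collinear
    return [row_u, row_v]

def apply_affine(a, p):
    """Map p = (u1, v1) to (u2, v2) with the top two rows of A."""
    u, v = p
    return (a[0][0] * u + a[0][1] * v + a[0][2],
            a[1][0] * u + a[1][1] * v + a[1][2])

def ransac_affine(pairs, iterations=200, tol=1.0, seed=0):
    """Keep the affine model with the largest number of inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        model = affine_from_3(rng.sample(pairs, 3))
        if model is None:
            continue
        inliers = []
        for p1, p2 in pairs:
            q = apply_affine(model, p1)
            if (q[0] - p2[0]) ** 2 + (q[1] - p2[1]) ** 2 <= tol * tol:
                inliers.append((p1, p2))
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

The returned inlier set is what a subsequent least-squares or non-linear refinement would operate on.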
Motion Segmentation
• Two-frame pixel-level segmentation?
• Segmentation within a temporal window
  • Accumulate the pixels warped from adjacent frames
  • K-Means to find the most representative pixel
  • Frame differencing and thresholding: $|I_{original} - I_{model}| > \Delta I$
[Figure: temporal window spanning Frame t-w, Frame t, Frame t+w]
t: reference frame; w: half size of the window
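The temporal-window segmentation can be illustrated with the sketch below. It assumes the 2w+1 frames have already been warped into the reference frame, and it swaps in the per-pixel temporal median as a simple stand-in for the K-Means step that picks the most representative pixel. Function names and the list-of-rows image layout are hypothetical.

```python
from statistics import median

def background_model(warped_frames):
    """Per-pixel median over frames already warped to the reference frame.

    The median replaces the K-Means step here; it is an assumption of this
    sketch, not the method described on the slide.
    """
    height = len(warped_frames[0])
    width = len(warped_frames[0][0])
    return [[median(frame[y][x] for frame in warped_frames)
             for x in range(width)]
            for y in range(height)]

def motion_mask(reference, model, delta):
    """Mark pixels where |I_original - I_model| exceeds the threshold delta."""
    return [[abs(r - m) > delta for r, m in zip(ref_row, model_row)]
            for ref_row, model_row in zip(reference, model)]
```

A pixel that belongs to a moving object in the reference frame is covered by background in most of the other frames of the window, so the model value differs from the observed one and the pixel is flagged.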
Experimental Results (1)
[Figure panels: Original Images, Tracking Results, Initial Detection Results, Motion Prob. Maps]
Experimental Results (2)
[Figure panels: Initial Detection Results, Motion Prob. Maps, Tracking Results, Original Images]
Outline
Introduction
2D Motion segmentation
Tracking of multiple moving objects
• Geo-registration and geo-tracking
• Summary and Discussion
Problem statement – multiple target tracking
• Input: foreground regions in each frame
• Output: trajectories with consistent track IDs
• Challenges:
  • Noisy foreground regions
  • Occlusions
Problematic underlying assumption
• One-to-one assumption
  • One target can correspond to at most one observation
  • One observation can be associated with at most one target
  • Appropriate for point-like (punctual) observations
• The underlying one-to-one assumption may not hold for visual tracking
[Figure panels: Radar, Stationary camera, UAV camera]
Related work
• (Pasula et al., 99) Gibbs sampling to compute joint data association (DA)
  • MAP, multi-scan, uniform prior (no missing or false detections)
• (Cong et al., 04) Approximate association probabilities in JPDAF
  • MMSE, MCMC outperforms JPDAF, one-scan/multi-scan
• (Sastry et al., 04) MCMC to compute joint DA with an unknown number of targets
  • MAP, multi-scan, outperforms MHT, considers temporal association only
• (F. Dellaert et al., 03) MCMC for SfM without correspondence
  • MMSE, single scan, similar to JPDAF
• Our method: overcomes the one-to-one assumption
  • MAP, multi-scan, considers both spatial and temporal association
[Figure label: One-to-one assumption]
Anatomy of the problem
• “Explain” foreground regions:
• It is hard at one frame without using any model information
• It is solvable if smoothness in motion and appearance is used
Explanation of foreground regions
• Two ways to explain foreground regions
Labeling of foreground regions (precise)
• The label(s) of a pixel indicate the track ID
• Each pixel can have multiple labels to represent occlusions
• Accurate but expensive!
Cover of foreground regions (approximate)
• A set of shapes (rectangles)
• Each rectangle can overlap with others to represent occlusions
• Approximate but efficient!
Our formulation
• Given
  • A set of noisy observations (foreground regions)
• Find
  • A cover ω of foreground regions over time
$\tau_k$ is a sequence of shapes (rectangles)
Solution space
• Solution space Ω is a collection of spatio-temporal covers of the observations Y.
  • “Joint association event”
• Two kinds of data association
  • Spatial data association: change the cover at one instant
  • Temporal data association: form consistent tracks
• Uncovered area belongs to false alarms
[Figure: (a) Observations Y; (b) one possible cover of Y]
$$\omega = \{\tau_1, \tau_2, \ldots, \tau_K\}$$
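One way to hold such a cover in memory is sketched below: each track is a mapping from frame index to a rectangle, and foreground pixels that no rectangle covers are treated as false alarms. The representation (dicts, `(x0, y0, width, height)` tuples, sets of pixel coordinates) and the function names are assumptions made for illustration only.

```python
def rect_contains(rect, x, y):
    """rect is (x0, y0, width, height); right and bottom edges are exclusive."""
    x0, y0, w, h = rect
    return x0 <= x < x0 + w and y0 <= y < y0 + h

def uncovered_pixels(foreground, cover, t):
    """Foreground pixels at frame t that no track rectangle covers.

    foreground: set of (x, y) foreground pixels observed at frame t
    cover: list of tracks, each a dict mapping frame index -> rectangle
    """
    rects = [track[t] for track in cover if t in track]
    return {(x, y) for (x, y) in foreground
            if not any(rect_contains(r, x, y) for r in rects)}

# A cover with K = 2 tracks; the rectangles overlap at frame 1 (an occlusion).
cover = [
    {0: (0, 0, 2, 2), 1: (1, 0, 2, 2)},
    {1: (2, 0, 2, 2)},
]
foreground_t1 = {(1, 0), (2, 1), (3, 1), (9, 9)}
false_alarms_t1 = uncovered_pixels(foreground_t1, cover, 1)
```

Changing a rectangle at a single frame corresponds to spatial data association, while adding or removing frame entries in a track corresponds to temporal data association.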
Bayesian formulation
• MAP estimate
$$\omega^* = \arg\max_{\omega}\, p(\omega \mid Y), \qquad p(\omega \mid Y) \propto p(Y \mid \omega)\, p(\omega)$$
• Prior model p(ω)
  • A small number of long tracks
  • One track should overlap other tracks only when necessary
$$p(\omega) = p(L)\, p(K)\, p(O)$$
• Likelihood p(Y | ω)
  • Smoothness in both motion and appearance
  • Areas of uncovered false alarms p(F)
$$p(Y \mid \omega) = p(F) \prod_{k=1}^{K} \prod_{i=1}^{|\tau_k|-1} L\big(\tau_k(t_{i+1}) \mid \tau_k(t_i)\big)$$
where $L(\cdot \mid \cdot)$ comprises a motion likelihood and an appearance likelihood
Motion and appearance likelihood
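• Motion
$$x^k_{t+1} = A^k x^k_t + w, \qquad w \sim N(0, Q)$$
$$y^k_t = H^k x^k_t + v, \qquad v \sim N(0, R)$$
$$L_M\big(\tau_k(t_{i+1}) \mid \tau_k(t_i)\big) \equiv p\big(\tau_k(t_{i+1}) \mid \hat{\tau}_k(t_{i+1})\big)$$
where $\hat{\tau}_k(t_{i+1})$ is the state predicted by the motion model from $\tau_k(t_i)$
• Appearance
$$L_A\big(\tau_k(t_{i+1}) \mid \tau_k(t_i)\big) = \frac{1}{z_3} \exp\Big(-\lambda_3\, D\big(\tau_k(t_i), \tau_k(t_{i+1})\big)\Big)$$
where $D\big(\tau_k(t_i), \tau_k(t_{i+1})\big)$ is the Kullback-Leibler (KL) distance between two RGB color histograms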
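The appearance term can be sketched as follows. The slide names a KL distance between two RGB color histograms but does not say whether it is symmetrized or how empty bins are handled, so this sketch assumes the plain (asymmetric) KL divergence with a small additive smoothing `eps`. The parameter names `lambda3` and `z3` mirror the slide's symbols; their default values and the function names are placeholders.

```python
import math

def normalize(hist, eps=1e-9):
    """Turn bin counts into a probability vector, smoothing empty bins."""
    total = sum(hist) + eps * len(hist)
    return [(h + eps) / total for h in hist]

def kl_distance(hist_p, hist_q):
    """KL divergence D(p || q) between two histograms with matching bins."""
    p, q = normalize(hist_p), normalize(hist_q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def appearance_likelihood(hist_prev, hist_next, lambda3=1.0, z3=1.0):
    """(1 / z3) * exp(-lambda3 * D(hist_prev, hist_next))."""
    return math.exp(-lambda3 * kl_distance(hist_prev, hist_next)) / z3
```

Identical histograms give a distance of zero and therefore the maximum likelihood 1/z3; the likelihood decays as the color distributions of consecutive rectangles in a track diverge.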
MAP of full posterior p(ω |Y)
• Finding the MAP estimate of such a posterior is not a trivial task
  • Even determining the parameters of such a posterior is not easy
• Solution to MAP:
  • Sampling-based method to avoid enumerating all possible solutions
  • Two types of proposal moves (temporal and spatial moves)
  • Symmetric temporal information
$$p(\omega \mid Y) \propto \exp\{C_0 S_{len} - C_1 K - C_2 F - C_3 S_{olp} - C_4 S_{app} - S_{mot}\}$$
MAP estimation is equivalent to minimizing an energy function.
Markov Chain Monte Carlo
• Basic idea: construct a Markov chain that converges to the target distribution
  • The state of the Markov chain is defined in Ω
  • Transitions of the Markov chain are guided by a proposal distribution
• Metropolis-Hastings algorithm
  • Propose a new state ω’ from the previous state $\omega^{(i)}$: $\omega' \sim q(\omega' \mid \omega^{(i)})$
  • Accept ω’ with probability
$$\min\left(1, \frac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right)$$
• Properties
  • No need to compute the global p(ω), only the local ratio p(ω’)/p(ω)
  • For MAP, no need to keep the whole chain, only the current state and the best one
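Metropolis-Hastings algorithm
1. Initialize $\omega^{(0)}$.
2. For $i = 0$ to $N-1$:
   - Sample $u \sim U[0, 1]$
   - Propose $\omega' \sim q(\omega' \mid \omega^{(i)})$
   - Compute $A(\omega^{(i)}, \omega') = \min\left(1, \dfrac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right)$
   - If $u < A(\omega^{(i)}, \omega')$ then $\omega^{(i+1)} = \omega'$, else $\omega^{(i+1)} = \omega^{(i)}$
   End for
As $N \to \infty$, the chain $\omega^{(0)}, \ldots, \omega^{(N)}$ converges to $p(\omega)$.
N is the length of the Markov chain.
q() is called the proposal distribution.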
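The loop above can be written as a short generic routine. This is a minimal sketch, assuming the target and proposal densities are supplied as log-density callables (a numerical-stability choice, not something the slides specify) and that only the current and best states are kept, as suits MAP estimation. All names are hypothetical.

```python
import math
import random

def metropolis_hastings(log_p, propose, log_q, init, n_steps, seed=0):
    """Run Metropolis-Hastings and return (final_state, best_state).

    log_p(w):        log of the (unnormalized) target density p(w)
    propose(w, rng): draws a candidate w' ~ q(w' | w)
    log_q(a, b):     log q(a | b), the log density of proposing a from b
    """
    rng = random.Random(seed)
    state = best = init
    for _ in range(n_steps):
        cand = propose(state, rng)
        # log of p(w') q(w | w') / (p(w) q(w' | w)); only the ratio is needed
        log_ratio = (log_p(cand) + log_q(state, cand)
                     - log_p(state) - log_q(cand, state))
        accept_prob = math.exp(min(0.0, log_ratio))
        if rng.random() < accept_prob:
            state = cand
        if log_p(state) > log_p(best):
            best = state  # MAP search keeps only the best state seen so far
    return state, best
```

For example, with a target peaked at 7 on the integers 0..10 and a symmetric step of ±1, the proposal terms cancel and the best state found after a few thousand steps is the mode.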
Two types of q(ω’ | ω)
• Temporal moves and spatial moves drive the Markov chain
• Data-driven proposal
• Spatial moves are made only after enough temporal information is collected
• Symmetric temporal information
• Forward and backward (e.g. extension)
• Deal with occlusions from the very beginning
$$q(\omega' \mid \omega) \rightarrow q(\omega' \mid \omega, D)$$
Temporal moves:
• Birth/Death
• Extension/Reduction
• Split/Merge
• Switch
Spatial moves:
• Segmentation/Aggregation
• Diffusion
MCMC Data Association
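1. Initialize $\omega^{(0)}$.
2. For $i = 0$ to $N-1$:
   - Sample $u \sim U[0, 1]$
   - Sample: if $i < \varepsilon \cdot N$, $\omega' \sim q_{Temporal}(\omega' \mid \omega^{(i)})$; else $\omega' \sim q_{All}(\omega' \mid \omega^{(i)})$
   - Compute $A(\omega^{(i)}, \omega') = \min\left(1, \dfrac{p(\omega')\, q(\omega^{(i)} \mid \omega')}{p(\omega^{(i)})\, q(\omega' \mid \omega^{(i)})}\right)$
   - If $u < A(\omega^{(i)}, \omega')$ then $\omega^{(i+1)} = \omega'$, else $\omega^{(i+1)} = \omega^{(i)}$
   End for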
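The only difference from plain Metropolis-Hastings is the proposal schedule: temporal moves alone for the first ε·N iterations, then all moves. The sketch below shows just that schedule. The split of the move names into temporal and spatial groups is an assumption based on the order in which the earlier slide lists them, and choosing uniformly among the allowed moves is a simplification of the data-driven proposal.

```python
import random

# Move names as listed on the "Two types of q" slide; the grouping is assumed.
TEMPORAL_MOVES = ["birth", "death", "extension", "reduction",
                  "split", "merge", "switch"]
SPATIAL_MOVES = ["segmentation", "aggregation", "diffusion"]

def choose_move(i, n_iterations, epsilon, rng):
    """Pick the move type for iteration i.

    While i < epsilon * n_iterations only temporal moves are proposed, so
    that temporal information accumulates before any spatial move is tried.
    """
    if i < epsilon * n_iterations:
        return rng.choice(TEMPORAL_MOVES)
    return rng.choice(TEMPORAL_MOVES + SPATIAL_MOVES)

rng = random.Random(0)
schedule = [choose_move(i, 100, 0.3, rng) for i in range(100)]
```

Each chosen move would then generate a candidate cover ω’, which is accepted or rejected with the usual acceptance ratio.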
Determining Parameters
• Determine the parameters in the full posterior
  • A casual setting can make the ground-truth posterior p(ω_gt | Y) much lower than that of the “solution”
  • Take advantage of the property of MCMC
$$p(\omega \mid Y) \propto \exp\{C_0 S_{len} - C_1 K - C_2 F - C_3 S_{olp} - C_4 S_{app} - S_{mot}\}$$
Degenerate $\omega_{gt}$ to $\omega'$ and require
$$\frac{p(\omega_{gt})}{p(\omega')} \geq 1$$
$$\Rightarrow\quad A\,[C_0, C_1, C_2, C_3, C_4]^T \leq b, \qquad C_0, C_1, C_2, C_3, C_4 \geq 0, \qquad \max(C_0 + C_1 + C_2 + C_3 + C_4)$$
Solved by linear programming (GNU Linear Programming Kit)
Simulation experiments
• Settings
  • K (unknown number) moving discs in 200x200
  • Independent color appearance and motion
  • Static occlusion and inter-occlusion
  • False alarms
[Figure panels: Original video, Tracking result]
Simulation experiments
• Quantitative comparison
  • MHT (I. Cox 94), JPDAF (J. Kang 03), temporal only
  • STDA score in VACE-II eval
  • Same motion and appearance likelihood
  • Average over multiple sequences and multiple runs
[Figure panels: FA=0, W=50, 10K MCMC iterations; K=5, W=50, 10K MCMC iterations]
Simulation experiments
• Online implementation
  • Sliding window W
  • Initialize $\omega_t$ with $\omega^*_{t-1}$
[Figure: online vs. offline comparison, T=1000]
Outline
Introduction
2D Motion segmentation
Tracking of multiple moving objects
Geo-registration and geo-tracking
• Summary and Discussion
Geo-registration
• Use 2D homography to compensate inter-frame (2-view) motion
• Refine the homography between map and images
$$H_{i+1,M} = (H_{i,i+1})^{-1}\, H_{i,M}\, H_{update}$$
[Figure labels: $H_{i,M}$, $H_{i,i+1}$, $H_{i+1,M}$, $H_{update}$]
Experimental results
• Results are shown on two UAV data sets
  • Map is acquired from Google Earth®
  • Geo-registration is performed every 50 frames
  • Local data association (MCMCDA) window: 50 frames
System implementation
• C++ implementation
• Xeon Dual Core P4 3.0GHz
• Preliminary time performance

Procedure                                     Time (seconds) on 320x240
Image registration                            ~0.25
Motion detection (moving cameras)             ~2 (CPU) / ~0.1 (GPU)
Object detection after motion segmentation    ~0.25
Geo-registration                              ~6 every 50 frames
Tracking                                      ~0.4
Total                                         ~1 (GPU)
Outline
Introduction
2D Motion segmentation
Tracking of multiple moving objects
Geo-registration and geo-tracking
Summary and Discussion
Summary & Discussion
• Detection and tracking in dynamic scenes
  • Moving camera + rigid moving objects
  • 2D motion segmentation and geometric analysis of background
  • Spatial and temporal (2D+t) data association of moving objects
  • Tracking with geo-registration
• Highlights
  • Solution to practical problems in detection and tracking
  • Encouraging results and extensive applications
• Future directions
  • Multi-view geometry + object recognition
  • Automatic determination of applicable tasks