UC Berkeley Computer Vision Group
ICCV 2003
Recognizing Action at a Distance
A.A. Efros, A.C. Berg, G. Mori, J. Malik
UC Berkeley
Looking at People
• Far field: the 3-pixel man
  – Blob tracking
  – Vast surveillance literature
• Near field: the 300-pixel man
  – Limb tracking
  – e.g. Yacoob & Black, Rao & Shah, etc.
Medium-field Recognition
The 30-Pixel Man
Appearance vs. Motion
Jackson Pollock, Number 21 (detail)
Goals
• Recognize human actions at a distance
  – Low resolution, noisy data
  – Moving camera, occlusions
  – Wide range of actions (including non-periodic)
Our Approach
• Motion-based approach
  – Non-parametric; use large amount of data
  – Classify a novel motion by finding the most similar motion from the training set
• Related Work
  – Periodicity analysis
    • Polana & Nelson; Seitz & Dyer; Bobick et al.; Cutler & Davis; Collins et al.
  – Model-free
    • Temporal Templates [Bobick & Davis]
    • Orientation histograms [Freeman et al.; Zelnik & Irani]
    • Using MoCap data [Zhao & Nevatia, Ramanan & Forsyth]
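The non-parametric classification step named on this slide (label a novel motion with the label of its most similar training motion) can be sketched as below. This is a minimal illustration, not the authors' code: the function name `classify_by_nearest_motion`, the toy similarity scores, and the label list are all assumptions made for the example.

```python
import numpy as np

def classify_by_nearest_motion(similarity, train_labels):
    """Label each novel frame with the label of its best-matching training frame.

    similarity: array of shape (n_novel, n_train); larger means more similar.
    train_labels: sequence of n_train action labels.
    """
    similarity = np.asarray(similarity, dtype=float)
    best = similarity.argmax(axis=1)  # index of the most similar training frame
    return [train_labels[i] for i in best]

# Toy similarity scores (hypothetical values, not results from the talk).
sim = np.array([[0.9, 0.1, 0.3],
                [0.2, 0.4, 0.8]])
labels = ["run", "walk left", "jog"]
predicted = classify_by_nearest_motion(sim, labels)
```

A k-nearest-neighbor vote over the top few matches would be a natural variation; the slide itself only mentions the single most similar motion.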
Gathering action data
• Tracking
  – Simple correlation-based tracker
  – User-initialized
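The slide gives no details of the tracker beyond "correlation-based" and "user-initialized". One plausible reading, sketched here under that assumption, is a template search by normalized cross-correlation in a small window around the previous position. The helper names `ncc` and `track_step`, the `radius` parameter, and the random test frame are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def track_step(frame, template, prev_xy, radius=4):
    """Search a (2*radius+1)^2 window around prev_xy for the best template match.

    prev_xy is the (x, y) top-left corner of the figure in the previous frame;
    for the first frame it would come from the user's initialization.
    """
    h, w = template.shape
    px, py = prev_xy
    best_score, best_xy = -np.inf, prev_xy
    for y in range(max(0, py - radius), min(frame.shape[0] - h, py + radius) + 1):
        for x in range(max(0, px - radius), min(frame.shape[1] - w, px + radius) + 1):
            score = ncc(frame[y:y + h, x:x + w], template)
            if score > best_score:
                best_score, best_xy = score, (x, y)
    return best_xy, best_score

# Synthetic check: cut a template out of a random frame and find it again.
rng = np.random.default_rng(0)
frame = rng.random((40, 40))
template = frame[9:17, 12:20].copy()
found_xy, found_score = track_step(frame, template, prev_xy=(10, 10))
```

Cropping each frame at the tracked position is what would produce the stabilized, figure-centric volume used in the rest of the talk.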
Figure-centric Representation
• Stabilized spatio-temporal volume
  – No translation information
  – All motion caused by person’s limbs
• Good news: indifferent to camera motion
• Bad news: hard!
• Good test to see if actions, not just translation, are being captured
Remembrance of Things Past
• “Explain” novel motion sequence by matching to previously seen video clips
  – For each frame, match based on some temporal extent
Challenge: how to compare motions?
[Figure: input sequence, motion analysis, and a database of clips labeled run, walk left, swing, walk right, jog]
How to describe motion?
• Appearance
  – Not preserved across different clothing
• Gradients (spatial, temporal)
  – Same problem (e.g. contrast reversal)
• Edges/Silhouettes
  – Too unreliable
• Optical flow
  – Explicitly encodes motion
  – Least affected by appearance
  – …but too noisy
Spatial Motion Descriptor
Image frame → Optical flow $F_{x,y}$ → $F_x, F_y$ → $F_x^+, F_x^-, F_y^+, F_y^-$ → blurred $F_x^+, F_x^-, F_y^+, F_y^-$
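The pipeline on this slide can be sketched as follows, assuming (as in the accompanying ICCV 2003 paper) that the four channels are the half-wave rectified positive and negative parts of the horizontal and vertical flow, each blurred afterwards. The optical flow itself is taken as given; the Gaussian blur, its `sigma` and `radius` values, and all function names are assumptions for illustration.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """1-D Gaussian kernel normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(channel, sigma=1.5, radius=4):
    """Separable Gaussian blur with zero padding.

    Assumes each image side is at least 2*radius + 1 pixels long.
    """
    k = gaussian_kernel(sigma, radius)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, channel)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def spatial_motion_descriptor(fx, fy, sigma=1.5):
    """Stack the four blurred, half-wave rectified flow channels.

    Returns an array of shape (4, H, W): blurred Fx+, Fx-, Fy+, Fy-.
    """
    channels = [np.maximum(fx, 0), np.maximum(-fx, 0),
                np.maximum(fy, 0), np.maximum(-fy, 0)]
    return np.stack([blur(c, sigma) for c in channels])

# Toy flow field: one rightward and one leftward motion vector.
fx = np.zeros((16, 16))
fx[8, 8] = 1.0
fx[4, 4] = -2.0
fy = np.zeros((16, 16))
desc = spatial_motion_descriptor(fx, fy)
```

Rectifying before blurring keeps opposite motions in separate, non-negative channels, so nearby leftward and rightward vectors do not cancel each other out when the noisy flow is smoothed.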
Spatio-temporal Motion Descriptor
[Figure: Sequence A and Sequence B along time t, with temporal extent E. Panels: frame-to-frame similarity matrix (A × B); E × E identity matrix I; E × E blurry I; motion-to-motion similarity matrix (A × B)]
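One way to read the figure on this slide is that the motion-to-motion similarity at (i, j) aggregates frame-to-frame similarities along the diagonal through (i, j) over the temporal extent E, which is what filtering with an E × E identity kernel does. The sketch below follows that reading; the normalized-correlation frame similarity, the zero padding at sequence ends, and the function names are assumptions, and the "blurry I" weighting from the figure is left out for brevity.

```python
import numpy as np

def frame_similarity(desc_a, desc_b):
    """Normalized correlation between every frame of A and every frame of B.

    desc_a, desc_b: per-frame descriptors of shape (n_frames, ...).
    """
    a = desc_a.reshape(len(desc_a), -1).astype(float)
    b = desc_b.reshape(len(desc_b), -1).astype(float)
    a /= np.linalg.norm(a, axis=1, keepdims=True) + 1e-12
    b /= np.linalg.norm(b, axis=1, keepdims=True) + 1e-12
    return a @ b.T

def motion_similarity(s_ff, extent):
    """Sum frame similarities along the diagonal over `extent` frames.

    out[i, j] = sum over k in [-extent//2, extent//2] of s_ff[i + k, j + k],
    treating entries outside the matrix as zero. `extent` is assumed odd.
    """
    half = extent // 2
    n_a, n_b = s_ff.shape
    padded = np.pad(s_ff, half, mode="constant")
    out = np.zeros((n_a, n_b), dtype=float)
    for k in range(-half, half + 1):
        out += padded[half + k: half + k + n_a, half + k: half + k + n_b]
    return out

# Toy check: identical 5-frame sequences with a 3-frame temporal extent.
s = np.eye(5)
m = motion_similarity(s, 3)
```

With this aggregation, two frames only score highly when their temporal neighborhoods also line up, which separates actions that share a single similar pose.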
Football Actions: matching
[Figure: input sequence and matched frames]
Football Actions: classification
10 actions; 4500 total frames; 13-frame motion descriptor
Classifying Ballet Actions
16 actions; 24800 total frames; 51-frame motion descriptor. Men used to classify women and vice versa.
Classifying Tennis Actions
6 actions; 4600 frames; 7-frame motion descriptor. Woman player used for training, man for testing.
Classifying Tennis
• Red bars show classification results
Querying the Database
[Figure: input sequence and database, with outputs labeled Action Recognition (run, walk left, swing, walk right, jog) and Joint Positions]
2D Skeleton Transfer
• We annotate database with 2D joint positions
• After matching, transfer data to novel sequence
  – Adjust the match for best fit
Input sequence:
Transferred 2D skeletons:
3D Skeleton Transfer
• We populate database with rendered stick figures from 3D Motion Capture data
• Matching as before, we get 3D joint positions (kind of)!
Input sequence:
Transferred 3D skeletons:
“Do as I Do” Motion Synthesis
• Matching two things:
  – Motion similarity across sequences
  – Appearance similarity within sequence (like VideoTextures)
• Dynamic Programming
[Figure: input sequence and synthetic sequence]
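The slide names dynamic programming but does not state the objective. A minimal sketch, assuming the goal is to pick one target frame per input frame so that the total of motion similarity (across sequences) plus a weighted appearance-smoothness score between consecutive picks (within the target sequence) is maximal, is a Viterbi-style recursion. The function name `do_as_i_do`, the `weight` parameter, and the toy scores are hypothetical.

```python
import numpy as np

def do_as_i_do(motion_sim, transition, weight=1.0):
    """Choose one target frame per input frame by dynamic programming.

    motion_sim: (T, N) similarity of input frame t to target frame n.
    transition: (N, N) appearance-smoothness score for showing target
                frame j immediately after target frame i.
    Returns a list of T target-frame indices maximizing the total score.
    """
    motion_sim = np.asarray(motion_sim, dtype=float)
    transition = np.asarray(transition, dtype=float)
    n_steps, n_frames = motion_sim.shape
    score = motion_sim[0].copy()
    back = np.zeros((n_steps, n_frames), dtype=int)
    for t in range(1, n_steps):
        # cand[i, j]: best score ending at i, then moving on to j
        cand = score[:, None] + weight * transition
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + motion_sim[t]
    # Trace the best path backwards from the best final frame.
    path = [int(score.argmax())]
    for t in range(n_steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Toy case 1: no smoothness term, so each step follows the best motion match.
path_free = do_as_i_do(np.eye(3), np.zeros((3, 3)))

# Toy case 2: jumps between target frames are heavily penalized.
sim = [[1.0, 0.0], [0.0, 0.5], [1.0, 0.0]]
path_smooth = do_as_i_do(sim, -10.0 * (1.0 - np.eye(2)))
```

The `weight` parameter trades off following the source motion against keeping the synthesized video visually continuous.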
“Do as I Do”
[Figure: Source Motion, Source Appearance, Result]
3400 Frames
“Do as I Say” Synthesis
• Synthesize given action labels
  – e.g. video game control
[Figure: action labels (run, walk left, swing, walk right, jog) and the resulting synthetic sequence]
“Do as I Say”
• Red box shows when constraint is applied
Actor Replacement
[Video: GregWorldCup.avi, DivX]
Conclusions
• In the medium field, action is about motion
• What we propose:
  – A way of matching motions at coarse scale
• What we get out:
  – Action recognition
  – Skeleton transfer
  – Synthesis: “Do as I Do” & “Do as I Say”
• What we learned:
  – A lot to be said for the “little guy”!