CVPR 2011 Ioannis Patras
Localisation and Recognition of Human Actions
Ioannis Patras
School of Electronic Engineering and Computer Science
Queen Mary University of London
In collaboration with A. Oikonomopoulos and M. Pantic, Imperial College London,
and I. Kotsia and Guo Weiwei, Queen Mary University of London
Related research in QMUL
• Scene analysis (Izquierdo, Diplaros)
Object Detection/ Semantic segmentation
• Motion Analysis (Lagendijk, Hendriks, Hancock)
Motion estimation / segmentation
Object Tracking
• Facial (Expression) Analysis (Pantic, Koelstra, Rudovic)
Head tracking/Facial Feature Tracking
Facial expression recognition
• Action / Gesture Recognition (Kotsia, Guo, Kumar, Pantic)
Spatio-temporal representations for action recognition
Pose estimation
• Brain Computer Interfaces
Dynamic Vision
Looking at / sensing people
Static Analysis
URL: www.eecs.qmul.ac.uk/~ioannisp/
Looking at/sensing people
• Facial (Expression) Analysis
Head tracking/Facial Feature Tracking
Facial expression recognition
• Action / Gesture Recognition
Action recognition and localisation
Pose estimation
Tensor-based space-time analysis
• Brain Computer Interfaces
Localisation of Human Actions
Oikonomopoulos, Patras, Pantic, IEEE Transactions on Image Processing, Mar. 2011.
Goal:
Recognize categories of actions
Localize them in terms of their bounding box (space + time)
Challenges:
Occlusions, clutter, variations, …
Hypothesis: Analysis can be restricted to a set of spatiotemporally "interesting"/salient events
Information theoretical spatial saliency
T. Kadir and M. Brady, IJCV, Nov. 2001
Proposal: Use signal unpredictability as an indicator of saliency
[Figure: example image regions with local entropy H_D = 3.866 and H_D = 7.201]
Spatial Saliency: Unpredictability in a single frame
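As a minimal sketch of "unpredictability as saliency", the snippet below computes the Shannon entropy of the grey-level histogram of a patch. The function name `patch_entropy` and the toy patches are illustrative assumptions, not the detector of Kadir and Brady; the point is only that a flat region scores low and a varied region scores high.

```python
import math
from collections import Counter

def patch_entropy(patch):
    """Shannon entropy (in bits) of the grey-level histogram of a 2D patch."""
    values = [v for row in patch for v in row]
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical toy patches: one perfectly predictable, one highly varied.
flat = [[128] * 8 for _ in range(8)]                          # a single grey level
varied = [[r * 8 + c for c in range(8)] for r in range(8)]    # 64 distinct levels

flat_entropy = patch_entropy(flat)      # no uncertainty about the grey level
varied_entropy = patch_entropy(varied)  # maximal uncertainty for 64 equally likely levels
```

A saliency map would evaluate such an entropy around every pixel; here the histogram is taken over raw grey levels, which is an assumption made for brevity.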
[Plot: entropy against scale (circle radius); the scales 29 and 59 are marked]
Towards scale invariance
The entropy maxima reveal the spatial scale(s) of a salient region
Detected salient points in a single frame
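The scale-selection idea can be illustrated with a small sketch: evaluate the histogram entropy inside circular windows of growing radius and keep the radii at which the entropy has a local maximum. The helper names (`window_entropy`, `entropy_peaks`), the 16-bin histogram, and the synthetic disc image are assumptions made for illustration only.

```python
import numpy as np

def window_entropy(image, cy, cx, radius, bins=16):
    """Entropy (bits) of the grey-level histogram inside a circular window."""
    yy, xx = np.ogrid[:image.shape[0], :image.shape[1]]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    hist, _ = np.histogram(image[mask], bins=bins, range=(0, 256))
    p = hist[hist > 0] / hist.sum()
    return float(-(p * np.log2(p)).sum())

def entropy_peaks(radii, entropies):
    """Radii at which the entropy is a strict local maximum over scale."""
    return [radii[i] for i in range(1, len(radii) - 1)
            if entropies[i] > entropies[i - 1] and entropies[i] > entropies[i + 1]]

# Synthetic frame: a bright disc of radius 10 on a dark background.
image = np.zeros((64, 64), dtype=np.uint8)
yy, xx = np.ogrid[:64, :64]
image[(yy - 32) ** 2 + (xx - 32) ** 2 <= 10 ** 2] = 255

radii = list(range(2, 25))
entropies = [window_entropy(image, 32, 32, r) for r in radii]
peaks = entropy_peaks(radii, entropies)
```

For this toy image the entropy is zero while the window stays inside the disc and peaks once the window holds roughly equal amounts of disc and background, i.e. at a radius somewhat larger than the disc itself.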
Spatial and spatiotemporal saliency
Oikonomopoulos, Patras, Pantic, IEEE Transactions on SMC, Part B, 2006
Spatiotemporal Saliency:
Driven by signal unpredictability in a spatiotemporal volume (cylinder / sphere)
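Examine entropy:
$Y_D(v_k) = H_D(v_k)\, w_D(v_k)$
where $H_D$ is the entropy's "height" and $w_D$ the entropy's "peakness":
$w_D(s, d, u) = s \int_{q \in D} \left| \frac{\partial}{\partial s} p(q, s, d, u) \right| dq + d \int_{q \in D} \left| \frac{\partial}{\partial d} p(q, s, d, u) \right| dq$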
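A rough numerical sketch of a "height times peakness" measure follows. It assumes that `s` and `d` are the spatial and temporal scales of the volume, that a histogram over descriptor values `q` is available at every scale pair, and that the scale derivatives can be approximated by finite differences; the function name `saliency_metric` and the toy inputs are hypothetical.

```python
import numpy as np

def saliency_metric(p, s_values, d_values):
    """p[a, b, :] is a histogram (pdf over q) at scales s_values[a], d_values[b].

    Returns entropy 'height' times 'peakness' for each scale pair, where
    peakness is a finite-difference estimate of how fast the pdf changes
    with the spatial and temporal scale.
    """
    p = np.asarray(p, dtype=float)
    s = np.asarray(s_values, dtype=float)
    d = np.asarray(d_values, dtype=float)
    safe = np.where(p > 0, p, 1.0)                  # avoid log(0); 0 * log(1) = 0
    height = -(p * np.log2(safe)).sum(axis=-1)
    dp_ds = np.gradient(p, s, axis=0)               # derivative w.r.t. spatial scale
    dp_dd = np.gradient(p, d, axis=1)               # derivative w.r.t. temporal scale
    peakness = (s[:, None] * np.abs(dp_ds).sum(axis=-1)
                + d[None, :] * np.abs(dp_dd).sum(axis=-1))
    return height * peakness

s_values, d_values = [2, 4, 6], [1, 2, 3]

# A pdf that does not change with scale has no peakness, hence zero saliency.
p_const = np.tile([0.5, 0.5], (3, 3, 1))
y_const = saliency_metric(p_const, s_values, d_values)

# A pdf that changes with the spatial scale yields a positive response.
p_var = np.zeros((3, 3, 2))
for a in range(3):
    t = 0.1 * (a + 1)
    p_var[a, :, :] = [1.0 - t, t]
y_var = saliency_metric(p_var, s_values, d_values)
```

Salient volumes would then be the scale pairs and positions at which this product is large; how candidates are selected and clustered is not covered by this sketch.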
Descriptor extraction – codebook creation
Optical Flow after median subtraction
Spatiotemporal Salient Point Detection
Codebook (class-specific): c1, c2, …, cN
Optical Flow
Input sequence (time axis t)
Feature ensembles, O. Boiman & M. Irani [ICCV'05]
Feature selection
Ensemble codewords
Optical Flow + Spatial Gradient Descriptors: bin in histograms and concatenate.
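The "bin in histograms and concatenate" step could look roughly like the sketch below. It assumes magnitude-weighted orientation histograms with 8 bins for both the optical flow and the spatial gradient; the binning scheme, the normalisation, and the names `orientation_histogram` and `describe` are illustrative assumptions rather than the descriptor used in the talk.

```python
import numpy as np

def orientation_histogram(dx, dy, bins=8):
    """Magnitude-weighted histogram of vector orientations, normalised to sum to 1."""
    angles = np.arctan2(dy, dx)                     # orientations in [-pi, pi]
    magnitudes = np.hypot(dx, dy)
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi),
                           weights=magnitudes)
    total = hist.sum()
    return hist / total if total > 0 else hist

def describe(flow_u, flow_v, grad_x, grad_y, bins=8):
    """Concatenate the optical-flow and spatial-gradient orientation histograms."""
    return np.concatenate([orientation_histogram(flow_u, flow_v, bins),
                           orientation_histogram(grad_x, grad_y, bins)])

# Hypothetical 5x5 neighbourhood around a salient point.
flow_u = np.full((5, 5), 1.0)    # flow pointing mostly to the right
flow_v = np.full((5, 5), 0.5)
grad_x = np.full((5, 5), -0.2)   # gradient pointing mostly upwards
grad_y = np.full((5, 5), 1.0)

descriptor = describe(flow_u, flow_v, grad_x, grad_y)
```

Descriptors of this kind could then be clustered per class to obtain codewords c1, …, cN; the clustering step is not shown here.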
Class-dependent Spatio-temporal probabilistic voting
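[Figure: timeline of an action of duration T with the current frame at time t; the offsets to the start and end of the action are -t and T-t]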
• Parameters stored for each ensemble $e_d$ in the training set:
X: average spatial position of the ensemble with respect to the subject center and lower bound
T: distance in frames of the activated ensemble from the start/end of the action
S: average spatiotemporal scale of the ensemble
• Localisation model learned for codeword/cluster $c_i$, e.g. $p_X(x \mid c_i)$:
$p(\theta \mid e_d) = \sum_i w_i \, p(\theta \mid c_i) \, p(c_i \mid e_d)$, where $\theta$ stands for the localisation parameters (X, T, S)
Discriminative learning
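• Higher weights for pdfs with low localisation entropy:
$w_i = \exp\left( \int p(\theta \mid c_i) \log p(\theta \mid c_i) \, d\theta \right)$, where $p(\theta \mid c_i)$ is the localisation pdf of codeword $c_i$
• Class dictionary comprises discriminative codewords
• AdaBoost on the codeword similarities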
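The entropy-based weighting can be sketched for a discretised localisation pdf: a codeword whose votes concentrate in one place gets a weight near 1, while one that votes everywhere gets a small weight. The function name `localisation_weight` and the discretisation into bins are assumptions made for illustration.

```python
import math

def localisation_weight(pdf):
    """Weight exp(sum p log p) = exp(-entropy) of a discretised localisation pdf."""
    total = sum(pdf)
    probs = [v / total for v in pdf if v > 0]
    negative_entropy = sum(p * math.log(p) for p in probs)
    return math.exp(negative_entropy)

# Hypothetical vote histograms of two codewords over four location bins.
peaked = [1.0, 0.0, 0.0, 0.0]     # always votes for the same location
spread = [0.25, 0.25, 0.25, 0.25]  # votes are uninformative

w_peaked = localisation_weight(peaked)
w_spread = localisation_weight(spread)
```

With the natural logarithm the weight of a uniform pdf over K bins is 1/K, so less informative codewords are down-weighted smoothly rather than discarded.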
Discriminative learning
Higher weights for pdfs with low temporal localisation entropy
Spatio-temporal probabilistic voting
Extension in the space time domain of ‘Implicit Shape Model’, Leibe et al., ECCV’04
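A minimal sketch of voting in space and time: each activated codeword casts weighted votes for the action centre at its stored spatial and temporal offsets, and hypotheses are read off where votes accumulate. For brevity the maxima are found here with a coarse binned accumulator rather than mean shift, and all names (`cast_votes`, `best_hypothesis`), offsets, and cell sizes are hypothetical.

```python
from collections import defaultdict

def cast_votes(observations, offsets, cell=(10, 10, 5)):
    """Accumulate weighted votes for (x, y, t) centres in a coarse space-time grid.

    observations: list of (x, y, t, codeword, match_probability)
    offsets: codeword -> list of ((dx, dy, dt), weight) learned in training
    """
    accumulator = defaultdict(float)
    for x, y, t, codeword, p_match in observations:
        for (dx, dy, dt), weight in offsets.get(codeword, []):
            centre = (x + dx, y + dy, t + dt)
            key = tuple(int(c // size) for c, size in zip(centre, cell))
            accumulator[key] += weight * p_match
    return accumulator

def best_hypothesis(accumulator, cell=(10, 10, 5)):
    """Centre of the grid cell with the highest accumulated vote, and its score."""
    key = max(accumulator, key=accumulator.get)
    centre = tuple((k + 0.5) * size for k, size in zip(key, cell))
    return centre, accumulator[key]

# Hypothetical learned offsets and detections.
offsets = {"a": [((10, 20, -5), 1.0)], "b": [((-10, -10, -7), 0.5)]}
observations = [
    (100, 50, 20, "a", 1.0),   # votes for centre (110, 70, 15)
    (120, 80, 22, "b", 0.8),   # votes for the same centre
    (300, 10, 5, "a", 0.5),    # isolated vote, e.g. from clutter
]
accumulator = cast_votes(observations, offsets)
centre, score = best_hypothesis(accumulator)
```

Because consistent votes reinforce each other while stray votes stay isolated, a hypothesis can survive even if some of the contributing features are missing.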
Hypothesis verification with Relevance Vector Machine classification
• Mean-shift responses used as features in RVM-based classification
• Two class classification problem (one-vs-all)
• Select class l that maximizes the posterior probability
$F = \{f_1, \ldots, f_i, \ldots\}$
$K(F, F') = e^{-D_{\chi^2}(F, F') / 2\sigma^2}$
$c_l(F; w) = \sum_{j=1}^{N} w_{lj} K(F, F_j) + w_{l0}$
$p(l \mid F) = \left( 1 + e^{-c_l(F; w)} \right)^{-1}$
Localisation of single actions
Localisation accuracy (KTH)
Localisation accuracy (KTH)
[SS-PE] Shechtman, E., Irani, M.: Matching local self-similarities across images and videos. CVPR 2007
Action recognition
• KTH dataset – average: 88%
• HoHA dataset – average: 37%
Localisation under artificial occlusions (KTH)
Localisation under clutter (KTH)
Conclusions
• Voting schemes based on local descriptors are robust to occlusions
• Good localisation and recognition accuracy
• Relies on annotation in terms of action localisation.
• More suitable for gestures rather than less "structured" actions
Support Tensor Learning
I. Kotsia and I. Patras, "Support Tucker Machines," CVPR 2011, Thursday afternoon
I. Kotsia and I. Patras, "Relative Margin Support Tensor Machines for gait and action recognition," in CIVR 2010.
Motivation
$\min_{w, b, \xi} \frac{1}{2} w^T w + C \sum_{j=1}^{N} \xi_j$
s.t. $y_j \left( w^T \varphi(g_j) + b \right) \geq 1 - \xi_j$, $\xi_j \geq 0$, $j = 1, \ldots, N$
Vector-based methods ignore the space (time) structure of the visual data
Large dimensionality in the case of linear SVMs
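To make the soft-margin objective concrete, the sketch below evaluates it for a given weight vector, taking each slack variable at its smallest feasible value. It assumes a linear feature map (the mapping φ is the identity) and labels in {+1, -1}; the function name `primal_objective` and the toy data are hypothetical.

```python
def primal_objective(w, b, samples, labels, C=1.0):
    """Value of 0.5 * w.w + C * sum(slacks) with the smallest feasible slacks.

    For fixed (w, b) the smallest slack satisfying the margin constraint of
    sample j is max(0, 1 - y_j * (w.x_j + b)).
    """
    def score(x):
        return sum(wi * xi for wi, xi in zip(w, x)) + b

    slacks = [max(0.0, 1.0 - y * score(x)) for x, y in zip(samples, labels)]
    value = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return value, slacks

# Toy data: two samples outside the margin, one inside it.
samples = [(2.0, 0.0), (-2.0, 0.0), (0.5, 0.0)]
labels = [1, -1, 1]
value, slacks = primal_objective([1.0, 0.0], 0.0, samples, labels, C=1.0)
```

For image or video data the vector `w` has one entry per pixel or voxel, which is the dimensionality issue the slide points to.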
Tensor Machines
Variants of linear SVMs, where constraints are imposed on the separating tensorplane
$\min_{W, b, \xi} f(W) + C \sum_{j=1}^{N} \xi_j$, where $f(W)$ is a regularisation term, e.g. $f(W) = \langle W, W \rangle$,
s.t. $y_j \left( \langle X_j, W \rangle + b \right) \geq 1 - \xi_j$, $\xi_j \geq 0$, $j = 1, \ldots, N$
Smaller dimensionality, structural constraints
Support Tensor Machines: [16] D. Tao et al., Knowledge and Information Systems, 13(1):1–42, 2007; I. Kotsia, I. Patras, CVPR 2011
Support Tucker Machines: I. Kotsia, I. Patras, CVPR 2011
Σ/Σw Support Tucker Machines: I. Kotsia, I. Patras, CVPR 2011
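One way to see the "smaller dimensionality" claim is to compare an unconstrained weight tensor with a rank-one weight tensor, which is the structural constraint assumed in this sketch. The inner product with the data tensor can then be computed from the three factor vectors directly; the tensor sizes and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 4, 5, 6                      # hypothetical height, width, frames
X = rng.normal(size=(I, J, K))         # one data sample kept as a tensor

# Rank-one weight tensor W = a o b o c (outer product of three vectors).
a, b, c = rng.normal(size=I), rng.normal(size=J), rng.normal(size=K)
W = np.einsum('i,j,k->ijk', a, b, c)

# The decision value <X, W> + bias needs only the factors, never W itself.
inner_full = float(np.sum(X * W))
inner_factored = float(np.einsum('ijk,i,j,k->', X, a, b, c))

params_unconstrained = I * J * K       # free parameters of a full weight tensor
params_rank_one = I + J + K            # free parameters of the three factors
```

The gap between the two parameter counts grows quickly with the size of the data, while the factored form still respects the row, column, and time arrangement of the input.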
Non-convex optimization problem w.r.t. A, B, C and core tensor G.
But: Convex w.r.t. A or B or C or G alone
Block coordinate optimization:
- e.g. optimization w.r.t G keeping A, B, C fixed
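Each step can be reduced to a vector-based SVM-like constrained optimization problem, e.g.
$\min_{G, b, \xi} \frac{1}{2} \left( A^{(1)} \mathrm{vec}(G) \right)^T \left( A^{(1)} \mathrm{vec}(G) \right) + C \sum_{i=1}^{M} \xi_i$,
s.t. $y_i \left[ \left( A^{(1)} \mathrm{vec}(G) \right)^T \mathrm{vec}(X_i) + b \right] \geq 1 - \xi_i$, $\xi_i \geq 0$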
Supervised learning
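The reduction of one block-coordinate step to a vector problem can be checked numerically. Assuming the weight tensor has the Tucker form W = G ×1 A ×2 B ×3 C, the sketch below verifies that, with A, B and C fixed, the decision value is linear in vec(G) once each sample is projected onto the factor matrices; sizes and variable names are illustrative, and no SVM solver is run.

```python
import numpy as np

rng = np.random.default_rng(1)
I, J, K = 6, 5, 4            # hypothetical data tensor size
P, Q, R = 3, 2, 2            # hypothetical core tensor size

X = rng.normal(size=(I, J, K))
G = rng.normal(size=(P, Q, R))
A = rng.normal(size=(I, P))
B = rng.normal(size=(J, Q))
C = rng.normal(size=(K, R))

# Tucker-structured weight tensor: W = G x1 A x2 B x3 C.
W = np.einsum('pqr,ip,jq,kr->ijk', G, A, B, C)
decision_tensor = float(np.sum(X * W))

# With A, B, C fixed, project the sample and treat vec(G) as an ordinary weight vector.
Z = np.einsum('ijk,ip,jq,kr->pqr', X, A, B, C)
decision_vector = float(G.ravel() @ Z.ravel())
```

Since both decision values agree, the step over G can be handed to a vector-based solver on the projected samples; the steps over A, B and C follow the same pattern with the other blocks held fixed.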
Gait Recognition (USF dataset)
Probe set  State of the art  SVMs   STMs [16]   STMs        RMSTMs      STuMs       Σw-STuMs
           (five methods)           (w vector)  (W tensor)  (W tensor)  (W tensor)  (W tensor)
A          100/100           80/97  92/100      99/100      100/100     99/100      100/100
B          89/90             79/93  81/90       85/93       89/97       85/93       87/95
C          83/88             68/85  73/88       79/93       83/95       79/90       81/91
D          39/55             30/54  47/67       53/72       56/75       53/71       55/74
E          33/55             23/46  48/79       62/88       65/91       63/86       65/90
F          30/46             24/49  29/49       41/71       44/74       42/63       44/66
G          29/48             12/37  31/71       50/88       53/90       52/87       54/90
Average    -                 45/62  57/68       67/86       70/89       68/84       69/87
• Significant improvements in comparison to state of the art
KTH recognition
[7] T.K. Kim and R. Cipolla, "Canonical correlation analysis of video volume tensors for action categorization and detection," IEEE Trans. PAMI, vol. 31, no. 8, pp. 1415-1428, August 2009.
Input features: Dense oriented gradients (at each pixel)
Results comparable to state of the art, using very simple features
Conclusions
• Tensors exploit topology of data better than vectors
• The proposed algorithms (STuMs and Σ/Σw-STuMs) consistently outperform previous approaches, producing state of the art results
Limitations:
• Requires good alignment of the input data
• More suitable for gestures rather than less "structured" actions
References
• A. Oikonomopoulos, I. Patras and M. Pantic, "Spatiotemporal Localization and Categorization of Human Actions in Unsegmented Image Sequences," IEEE Trans. Image Processing, vol. 20, no. 4, pp. 1126-1140, Mar. 2011.
• I. Kotsia and I. Patras, "Support Tucker Machines," Int'l Conf. Computer Vision and Pattern Recognition, Jun. 2011, Colorado, USA.
• I. Kotsia and I. Patras, "Relative Margin Support Tensor Machines for gait and action recognition," Int'l Conf. Image and Video Retrieval, 5-7 July 2010, Xi'an, China.
• S. Koelstra, M. Pantic and I. Patras, "A Dynamic Texture based Approach to Recognition of Facial Actions and their Temporal Models," IEEE Trans. Pattern Analysis and Machine Intelligence, Nov. 2010.
• O. Rudovic, I. Patras and M. Pantic, "Coupled Gaussian Process Regression for pose-invariant facial expression recognition," European Conf. Computer Vision (ECCV'10), pp. 350-363, Heraklion, Crete, Greece, Sept. 2010.