CVPR 2011

Localisation and Recognition of Human Actions

Ioannis Patras

School of Electronic Engineering and Computer Science

Queen Mary University of London

in collaboration with

A. Oikonomopoulos and M. Pantic, Imperial College London

I. Kotsia and Guo Weiwei, Queen Mary University of London




Related research in QMUL

Static Analysis

• Scene analysis (Izquierdo, Diplaros)

Object Detection / Semantic segmentation

Dynamic Vision

• Motion Analysis (Lagendijk, Hendriks, Hancock)

Motion estimation / segmentation

Object Tracking

Looking at / sensing people

• Facial (Expression) Analysis (Pantic, Koelstra, Rudovic)

Head tracking / Facial Feature Tracking

Facial expression recognition

• Action / Gesture Recognition (Kotsia, Guo, Kumar, Pantic)

Spatio-temporal representations for action recognition

Pose estimation

• Brain Computer Interfaces

URL: www.eecs.qmul.ac.uk/~ioannisp/


Looking at/sensing people

• Facial (Expression) Analysis

Head tracking/Facial Feature Tracking

Facial expression recognition

• Action / Gesture Recognition

Action recognition and localisation

Pose estimation

Tensor-based space-time analysis

• Brain Computer Interfaces


Localisation of Human Actions

Oikonomopoulos, Patras, Pantic, IEEE Transactions on Image Processing, Mar. 2011.

Goal:

Recognize categories of actions

Localize them in terms of their bounding box (space + time)

Challenges:

Occlusions, clutter, variations, …

Hypothesis: Analysis can be restricted to a set of spatiotemporally 'interesting'/salient events


Information-theoretic spatial saliency

T. Kadir and M. Brady, IJCV, Nov. 2001

Proposal: Use signal unpredictability as an indicator of saliency

[Figure: example image regions with entropy HD = 3.866 and HD = 7.201]

Spatial Saliency: Unpredictability in a single frame
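As a minimal sketch of the idea, unpredictability can be measured as the Shannon entropy of the intensity histogram of an image patch. The function name `patch_entropy` and the toy patches below are illustrative assumptions, not taken from the slides or from Kadir and Brady's implementation.

```python
import math
from collections import Counter

def patch_entropy(patch):
    """Shannon entropy (in bits) of the intensity histogram of a 2-D patch."""
    values = [v for row in patch for v in row]
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical 4x4 patches: one uniform, one with 16 distinct grey levels.
flat = [[128] * 4 for _ in range(4)]
textured = [[(4 * r + c) * 16 for c in range(4)] for r in range(4)]

print(patch_entropy(flat))      # fully predictable patch
print(patch_entropy(textured))  # every grey level equally likely
```

The uniform patch has zero entropy, while the patch with 16 equally likely grey levels reaches the maximum of 4 bits, so the latter would be ranked as more salient under this measure.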


[Plot: entropy versus scale (circle radius, 0 to 80), with scales 29 and 59 marked]

Towards scale invariance

The entropy maxima reveal the spatial scale(s) of a salient region

Detected salient points in a single frame
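The scale selection step can be sketched as follows: compute the entropy inside discs of increasing radius around a point and keep the radii at which the entropy has a local maximum. The helper names `disc_entropy` and `entropy_peaks`, and the synthetic image, are assumptions made for illustration only.

```python
import math
from collections import Counter

def disc_entropy(image, cx, cy, radius):
    """Entropy (bits) of the intensities inside a disc centred at (cx, cy)."""
    h, w = len(image), len(image[0])
    values = [image[y][x]
              for y in range(max(0, cy - radius), min(h, cy + radius + 1))
              for x in range(max(0, cx - radius), min(w, cx + radius + 1))
              if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2]
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def entropy_peaks(entropies):
    """Indices of strict local maxima in a sequence of entropies."""
    return [i for i in range(1, len(entropies) - 1)
            if entropies[i - 1] < entropies[i] > entropies[i + 1]]

# Hypothetical 21x21 image: a textured blob of radius 4 on a flat background.
image = [[(x * 7 + y * 13) % 8 if (x - 10) ** 2 + (y - 10) ** 2 <= 16 else 0
          for x in range(21)] for y in range(21)]
radii = list(range(1, 10))
entropies = [disc_entropy(image, 10, 10, r) for r in radii]
salient_scales = [radii[i] for i in entropy_peaks(entropies)]
print(salient_scales)
```

Only strict interior maxima are reported here; a real detector would also have to handle plateaus and weight each peak, which this sketch leaves out.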


[Plot axis: Entropy (HD)]

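Spatial and spatiotemporal saliency

Oikonomopoulos, Patras, Pantic, IEEE Transactions on SMC, Part B, 2006

Spatiotemporal Saliency:

Driven by signal unpredictability in a spatiotemporal volume (cylinder / sphere)

Examine entropy:

$$Y(v_k) = H(v_k)\, w(v_k)$$

where $H$ is the entropy's 'height' and $w$ is the entropy's 'peakness':

$$w(s, d, u) = s \int \left| \frac{\partial}{\partial s}\, p_D(q, s, d, u) \right| dq \;+\; d \int \left| \frac{\partial}{\partial d}\, p_D(q, s, d, u) \right| dq$$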


Descriptor extraction – codebook creation

• Input sequence (over time t)

• Optical flow

• Optical flow after median subtraction

• Spatiotemporal salient point detection

• Descriptors: optical flow + spatial gradient, binned in histograms and concatenated

• Feature ensembles (O. Boiman & M. Irani, ICCV'05)

• Feature selection

• Ensemble codewords

• Class-specific codebook: c1, c2, …, cN
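The descriptor step can be illustrated with a small sketch: subtract the median flow, bin flow vectors and spatial gradient vectors into orientation histograms, then concatenate the two histograms. The bin count, the magnitude weighting, the L1 normalisation and all function names are assumptions chosen for the example; the slides do not specify them.

```python
import math
from statistics import median

def subtract_median(flow):
    """Remove the component-wise median flow (a rough global-motion estimate)."""
    mx = median(dx for dx, _ in flow)
    my = median(dy for _, dy in flow)
    return [(dx - mx, dy - my) for dx, dy in flow]

def orientation_histogram(vectors, bins=8):
    """Magnitude-weighted, L1-normalised histogram of vector orientations."""
    hist = [0.0] * bins
    for dx, dy in vectors:
        mag = math.hypot(dx, dy)
        if mag == 0.0:
            continue  # zero vectors carry no orientation
        angle = math.atan2(dy, dx) % (2 * math.pi)
        hist[min(int(angle / (2 * math.pi) * bins), bins - 1)] += mag
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist

def describe(flow_vectors, gradient_vectors, bins=8):
    """Concatenate the flow histogram and the gradient histogram."""
    return (orientation_histogram(flow_vectors, bins)
            + orientation_histogram(gradient_vectors, bins))

# Hypothetical vectors sampled around one salient point.
descriptor = describe([(2, 1), (2, 1), (1, 2)], [(-1, -3)])
print(len(descriptor))
```

With 8 bins per histogram the descriptor has 16 entries; the first half summarises motion direction and the second half summarises appearance.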


Class-dependent Spatio-temporal probabilistic voting

[Figure: timeline around the current frame, labelled T, t, -t and T-t]

• Parameters stored for each ensemble in the training set:

X: average spatial position of the ensemble with respect to the subject center and lower bound

T: distance in frames of the activated ensemble from the start/end of the action

S: average spatiotemporal scale of the ensemble

• Localisation model learned for each codeword/cluster $c_i$:

[Equations garbled in extraction: conditional pdfs over X, T and S given the codeword $c_i$ and the ensemble $e_d$]


Discriminative learning

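• Higher weights for pdfs with low localisation entropy

• Class dictionary comprises discriminative codewords

• AdaBoost on the codeword similarities

[Equation garbled in extraction: the weight $w_i$ of codeword $c_i$ is an exponential of $p(\cdot \mid c_i) \log p(\cdot \mid c_i)$ terms]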


Discriminative learning

Higher weights for pdfs with low temporal localisation entropy
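One plausible reading of this weighting, offered only as a sketch, is to set each codeword's weight to the exponential of the negative entropy of its discretised localisation pdf, so that sharply peaked pdfs receive weights close to 1. The exact formula on the original slide could not be recovered, so the form `exp(-H)` and the names below are assumptions.

```python
import math

def entropy(pdf):
    """Entropy (in nats) of a discrete pdf given as a list of probabilities."""
    return -sum(p * math.log(p) for p in pdf if p > 0)

def codeword_weight(pdf):
    """Assumed weighting: exp(-entropy), so low entropy gives a high weight."""
    return math.exp(-entropy(pdf))

# Hypothetical temporal localisation pdfs over four bins.
peaked = [0.0, 1.0, 0.0, 0.0]       # codeword always fires at the same offset
spread = [0.25, 0.25, 0.25, 0.25]   # codeword fires anywhere in the action

print(codeword_weight(peaked))
print(codeword_weight(spread))
```

Under this assumed form, a pdf spread uniformly over n bins gets weight 1/n, which makes the penalty for poor localisation easy to interpret.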


Spatio-temporal probabilistic voting

Extension in the space time domain of ‘Implicit Shape Model’, Leibe et al., ECCV’04
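A simplified, temporal-only sketch of such voting is given below: each activated codeword casts a weighted vote for the start frame of the action, using an offset learned in training, and the votes are accumulated in a coarse histogram. The histogram with an argmax is a stand-in for the mean-shift mode search mentioned on the next slide, and the model layout (`codeword id -> (offset, weight)`) is a hypothetical structure chosen for the example.

```python
from collections import defaultdict

def vote_for_start(detections, model, bin_size=5):
    """Accumulate weighted votes for the action start frame.

    detections: iterable of (frame, codeword_id) activations.
    model: dict mapping codeword_id -> (offset_from_start, weight).
    Returns (estimated_start_frame, score), or None if nothing voted.
    """
    accumulator = defaultdict(float)
    for frame, codeword in detections:
        if codeword not in model:
            continue  # codeword is not part of this class dictionary
        offset, weight = model[codeword]
        predicted_start = frame - offset
        accumulator[predicted_start // bin_size] += weight
    if not accumulator:
        return None
    best_bin = max(accumulator, key=accumulator.get)
    return best_bin * bin_size, accumulator[best_bin]

# Hypothetical class model and activations.
model = {0: (10, 1.0), 1: (20, 0.5), 2: (5, 0.8)}
detections = [(42, 0), (51, 1), (36, 2), (90, 0)]
start, score = vote_for_start(detections, model)
print(start, score)
```

Three of the four activations agree on a start near frame 30, so that hypothesis wins over the isolated vote near frame 80. This agreement among independent local votes is what makes such schemes tolerant to missing (occluded) features.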


Hypothesis verification with Relevance Vector Machine classification

• Mean-shift responses used as features in RVM-based classification:

$$F = \{f_1, \ldots, f_i, \ldots\}$$

• Two class classification problem (one-vs-all):

$$c(F; w^l) = \sum_{j=1}^{N} w_j^l\, K(F, F_j) + w_0^l, \qquad K(F, F') = e^{-\frac{D_C^2(F, F')}{2\sigma^2}}$$

• Select class l that maximizes the posterior probability:

$$p(l \mid F) = \frac{1}{1 + e^{-c(F; w^l)}}$$
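The verification step can be sketched as a kernel decision function passed through a sigmoid, evaluated once per one-vs-all model. This is only the prediction side; the sparse Bayesian training of an RVM is not shown. The Euclidean distance used inside the kernel, the bandwidth `sigma`, and the toy models are assumptions made for illustration.

```python
import math

def kernel(f, g, sigma=1.0):
    """Gaussian kernel on a squared distance (Euclidean here, as an assumption)."""
    d2 = sum((a - b) ** 2 for a, b in zip(f, g))
    return math.exp(-d2 / (2 * sigma ** 2))

def decision(f, vectors, weights, bias, sigma=1.0):
    """c(F; w) = sum_j w_j K(F, F_j) + w_0."""
    return bias + sum(w * kernel(f, v, sigma) for w, v in zip(weights, vectors))

def posterior(f, model, sigma=1.0):
    """p(l | F) = 1 / (1 + exp(-c(F; w^l)))."""
    c = decision(f, model["vectors"], model["weights"], model["bias"], sigma)
    return 1.0 / (1.0 + math.exp(-c))

def classify(f, models, sigma=1.0):
    """Pick the class whose one-vs-all model gives the highest posterior."""
    return max(models, key=lambda label: posterior(f, models[label], sigma))

# Hypothetical trained models, one relevance vector per class.
models = {
    "walk": {"vectors": [[0.0, 0.0]], "weights": [4.0], "bias": -2.0},
    "run": {"vectors": [[3.0, 3.0]], "weights": [4.0], "bias": -2.0},
}
print(classify([0.1, 0.0], models))
print(classify([2.9, 3.1], models))
```

Because each class has its own binary model, the posteriors need not sum to one; only their ranking is used to select the class.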


Localisation of single actions


Localisation accuracy (KTH)


Localisation accuracy (KTH)

[SS-PE] Shechtman, E., Irani, M.: Matching local self-similarities across images and videos. CVPR 2007


Action recognition

• KTH dataset – average: 88%

• HoHA dataset – average: 37%


Localisation under artificial occlusions (KTH)


Localisation under clutter (KTH)


Conclusions

• Voting schemes based on local descriptors are robust to occlusions

• Good localisation and recognition accuracy

• Relies on annotation in terms of action localisation.

• More suitable for gestures than for less 'structured' actions


Support Tensor Learning

I. Kotsia and I. Patras, "Support Tucker Machines", CVPR 2011, Thursday afternoon

I. Kotsia and I. Patras, "Relative Margin Support Tensor Machines for gait and action recognition," in CIVR 2010.


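Motivation

$$\min_{w, b, \xi}\ \frac{1}{2} w^T w + C \sum_{j=1}^{N} \xi_j \quad \text{s.t.} \quad y_j \left( w^T \phi(g_j) + b \right) \ge 1 - \xi_j, \quad \xi_j \ge 0, \quad j = 1, \ldots, N$$

Vector-based methods ignore the space (time) structure of the visual data

Large dimensionality in the case of linear SVMs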


Tensor Machines

Variants of Linear SVMs, where constraints are imposed on the separating tensorplane

$$\min_{W, b, \xi}\ f(W) + C \sum_{j=1}^{N} \xi_j, \quad \text{where } f(W) \text{ is a regularisation term, e.g. } f(W) = \langle W, W \rangle,$$

$$\text{s.t.} \quad y_j \left( \langle X_j, W \rangle + b \right) \ge 1 - \xi_j, \quad \xi_j \ge 0, \quad j = 1, \ldots, N$$

Smaller dimensionality, structural constraints

• Support Tensor Machines: [16] D. Tao et al., Knowledge and Information Systems, 13(1):1–42, 2007; I. Kotsia, I. Patras, CVPR 2011

• Support Tucker Machines: I. Kotsia, I. Patras, CVPR 2011

• Σ/Σw Support Tucker Machines: I. Kotsia, I. Patras, CVPR 2011


Supervised learning

Non-convex optimization problem w.r.t. A, B, C and core tensor G.

But: Convex w.r.t. A or B or C or G alone

Block coordinate optimization:

- e.g. optimization w.r.t. G keeping A, B, C fixed

Each step can be reduced to a vector-based SVM-like constrained optimization problem, e.g.

$$\min_{G, b, \xi}\ \frac{1}{2} \left( A^{(1)} \mathrm{vec}(G) \right)^T \left( A^{(1)} \mathrm{vec}(G) \right) + C \sum_{i=1}^{M} \xi_i$$

$$\text{s.t.} \quad y_i \left[ \left( A^{(1)} \mathrm{vec}(G) \right)^T \mathrm{vec}(X_i) + b \right] \ge 1 - \xi_i, \quad \xi_i \ge 0$$
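To see why the step over the core tensor reduces to a vector problem, the sketch below assumes the usual Tucker form W = G ×1 A ×2 B ×3 C (an assumption, since the slide's own definition did not survive extraction) and checks numerically that the inner product ⟨W, X⟩ equals ⟨G, Z⟩, where Z is X projected through A, B and C. With A, B, C fixed, the decision function is therefore linear in vec(G). All function names and the small integer tensors are illustrative.

```python
from itertools import product

def tucker(G, A, B, C):
    """W[i][j][k] = sum_{p,q,r} G[p][q][r] * A[i][p] * B[j][q] * C[k][r]."""
    P, Q, R = len(G), len(G[0]), len(G[0][0])
    return [[[sum(G[p][q][r] * A[i][p] * B[j][q] * C[k][r]
                  for p, q, r in product(range(P), range(Q), range(R)))
              for k in range(len(C))] for j in range(len(B))] for i in range(len(A))]

def project(X, A, B, C):
    """Z[p][q][r] = sum_{i,j,k} X[i][j][k] * A[i][p] * B[j][q] * C[k][r]."""
    I, J, K = len(A), len(B), len(C)
    return [[[sum(X[i][j][k] * A[i][p] * B[j][q] * C[k][r]
                  for i, j, k in product(range(I), range(J), range(K)))
              for r in range(len(C[0]))] for q in range(len(B[0]))] for p in range(len(A[0]))]

def inner(U, V):
    """Inner product of two 3-way tensors stored as nested lists."""
    return sum(u * v
               for Um, Vm in zip(U, V)
               for Ur, Vr in zip(Um, Vm)
               for u, v in zip(Ur, Vr))

# Hypothetical small example: a 1x2x1 core and a 2x2x2 data tensor.
G = [[[1], [2]]]
A = [[1], [2]]
B = [[1, 0], [1, 1]]
C = [[3], [1]]
X = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

lhs = inner(tucker(G, A, B, C), X)   # <W, X> computed on the full tensor
rhs = inner(G, project(X, A, B, C))  # <G, Z> computed on the small core
print(lhs == rhs)
```

In a block coordinate scheme, the projected tensors Z_i would play the role of the feature vectors in an SVM-like solve for vec(G), followed by analogous steps for A, B and C in turn.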


Gait Recognition (USF dataset)

| Probe set | State of the art (five methods) | SVMs | STMs [16] (w vector) | STMs (W tensor) | RMSTMs (W tensor) | STuMs (W tensor) | Σw-STuMs (W tensor) |
|---|---|---|---|---|---|---|---|
| A | 100/100 | 80/97 | 92/100 | 99/100 | 100/100 | 99/100 | 100/100 |
| B | 89/90 | 79/93 | 81/90 | 85/93 | 89/97 | 85/93 | 87/95 |
| C | 83/88 | 68/85 | 73/88 | 79/93 | 83/95 | 79/90 | 81/91 |
| D | 39/55 | 30/54 | 47/67 | 53/72 | 56/75 | 53/71 | 55/74 |
| E | 33/55 | 23/46 | 48/79 | 62/88 | 65/91 | 63/86 | 65/90 |
| F | 30/46 | 24/49 | 29/49 | 41/71 | 44/74 | 42/63 | 44/66 |
| G | 29/48 | 12/37 | 31/71 | 50/88 | 53/90 | 52/87 | 54/90 |
| Average | - | 45/62 | 57/68 | 67/86 | 70/89 | 68/84 | 69/87 |

• Significant improvements in comparison to state of the art


KTH recognition

[7] T.-K. Kim and R. Cipolla, 'Canonical Correlation analysis of video volume tensors for action categorization and detection,' IEEE PAMI, vol. 31, no. 8, pp. 1415-1428, August 2009

Input features: Dense oriented gradients (at each pixel)

Results comparable to state of the art, using very simple features


Conclusions

• Tensors exploit topology of data better than vectors

• The proposed algorithms (STuMs and Σ/Σw-STuMs) consistently outperform previous approaches, producing state of the art results

Limitations:

• Requires good alignment of the input data

• More suitable for gestures than for less 'structured' actions


References

• A. Oikonomopoulos, I. Patras and M. Pantic, "Spatiotemporal Localization and Categorization of Human Actions in Unsegmented Image Sequences", IEEE Trans. Image Processing, vol. 20, no. 4, pp. 1126-1140, Mar. 2011

• I. Kotsia and I. Patras, "Support Tucker Machines", Int'l Conf. Computer Vision and Pattern Recognition, Jun. 2011, Colorado, USA

• I. Kotsia and I. Patras, "Relative Margin Support Tensor Machines for gait and action recognition", Int'l Conf. Image and Video Retrieval, 5-7 July 2010, Xi'an, China

• S. Koelstra, M. Pantic and I. Patras, "A Dynamic Texture based Approach to Recognition of Facial Actions and their Temporal Models", IEEE Trans. Pattern Analysis and Machine Intelligence, Nov. 2010

• O. Rudovic, I. Patras and M. Pantic, "Coupled Gaussian Process Regression for pose-invariant facial expression recognition", European Conf. Computer Vision (ECCV'10), pp. 350-363, Heraklion, Crete, Greece, Sept. 2010