Learning Realistic Human Actions from Movies
Ivan Laptev*, Marcin Marszałek**, Cordelia Schmid**, Benjamin Rozenfeld***
* INRIA Rennes, France; ** INRIA Grenoble, France; *** Bar-Ilan University, Israel
Presented by: Nils Murrugarra, University of Pittsburgh

Source: people.cs.pitt.edu/~kovashka/cs3710_sp15/actions_nils.pdf



Motivation

The Internet hosts a huge and still growing amount of video:

• 150,000 uploads every day
• Human actions are very common in movies, TV news, personal video, …

Action recognition is useful for:

• Content-based browsing, e.g. fast-forwarding to the next goal-scoring scene
• Human scientists, e.g. studying the influence of smoking in movies on adolescent smoking

Motivation

• Actions in current datasets: KTH action dataset
• Actions “in the wild”

[3] Slides version of "Learning realistic human actions from movies." Source: http://www.di.ens.fr/~laptev/actions/

Context

How to find real actions?

Web video search (Google Video, YouTube, MyspaceTV, …)

• Useful for some action classes: kissing, hand shaking
• Results are noisy and not useful for most actions

Web image search

– Useful for learning action context: static scenes and objects
– See also [Li-Jia & Fei-Fei ICCV07]

Context

Movies contain many classes and many examples of realistic actions.

Problems:

• Only a few samples per class in each movie
• Manual annotation is very time-consuming

How to annotate automatically?

Method – Annotation [1]

• Scripts are available, but without time synchronization
• Subtitles come with time information

How to use this information?

• Identify an action in the script and transfer the subtitle times to the script by text alignment

Subtitles:

1172
01:20:17,240 --> 01:20:20,437
Why weren't you honest with me?
Why'd you keep your marriage a secret?

1173
01:20:20,640 --> 01:20:23,598
It wasn't my secret, Richard.
Victor wanted it that way.

1174
01:20:23,800 --> 01:20:26,189
Not even our closest friends
knew about our marriage.

Movie script (time transferred from the subtitles: 01:20:17 to 01:20:23):

RICK
Why weren't you honest with me? Why
did you keep your marriage a secret?

Rick sits down with Ilsa.

ILSA
Oh, it wasn't my secret, Richard.
Victor wanted it that way. Not even
our closest friends knew about our
marriage.

[1] Everingham, M., Sivic, J., & Zisserman, A. (2006). Hello! My name is... Buffy -- automatic naming of characters in TV video.

Method – Annotation

On the good side:

• Realistic variation of actions: subjects, views, etc.
• Many classes and many examples per action
• No additional work for new classes
• Character names may be used to resolve “who is doing a certain action?”

Problems:

• No spatial localization (no bounding box)
• Temporal localization may be poor
• Missing actions: e.g. scripts do not always follow the movie (not aligned)
• Annotation is incomplete, so it cannot serve as ground truth at the test stage
• Large within-class variability per action in text

Method – Annotation – Evaluation

1. Annotate action samples in text
2. Perform automatic script-video alignment
3. Check the correspondence based on manual annotation

Example of a “visual false positive”: “A black car pulls up, two army officers get out.”

a: quality of subtitle-script matching

a = (# matched words) / (# all words)
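The matching score a can be illustrated with a minimal Python sketch. The tokenisation, the choice of the script passage as the denominator, and the multiset rule for counting a word as "matched" are assumptions made for illustration; the slides only give the ratio itself.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def matching_score(script_text, subtitle_text):
    """a = (# matched words) / (# all words), counted over the script words."""
    script_words = tokenize(script_text)
    if not script_words:
        return 0.0
    # Each subtitle word can be matched at most as often as it occurs.
    available = Counter(tokenize(subtitle_text))
    matched = 0
    for word in script_words:
        if available[word] > 0:
            available[word] -= 1
            matched += 1
    return matched / len(script_words)

script = "Why weren't you honest with me? Why did you keep your marriage a secret?"
subtitle = "Why weren't you honest with me? Why'd you keep your marriage a secret?"
a = matching_score(script, subtitle)  # 12 of the 14 script words are matched
```

A passage like this one would pass a threshold such as a > 0.5, even though the contraction "Why'd" prevents two script words from matching.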

How to improve?

Method – Annotation – Text Approach

Problem: Text can express the same action in different ways.

Action: GetOutCar

• “… Will gets out of the Chevrolet. …”
• “… Erin exits her new truck …”

Potential false positives: “… About to sit down, he freezes …”

Solution: Supervised text classification approach

• Given a scene description, predict whether a target action is present or not
• Based on a bag-of-words representation


Method – Annotation – Text Approach


Features:

• Words

• Adjacent pair of words

• Non-adjacent pair of words within a small window
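The three feature types listed above could be extracted roughly as follows. This is a sketch only: the window size of 3, the tokenisation, and the "_" / "~" naming of pair features are hypothetical choices, not taken from the paper.

```python
import re

def text_features(description, window=3):
    """Return the set of word, adjacent-pair and windowed-pair features."""
    words = re.findall(r"[a-z']+", description.lower())
    features = set(words)  # single words
    for i, first in enumerate(words):
        # Adjacent pair: the word immediately following.
        if i + 1 < len(words):
            features.add(first + "_" + words[i + 1])
        # Non-adjacent pairs: words 2..window positions ahead.
        for j in range(i + 2, min(i + window, len(words) - 1) + 1):
            features.add(first + "~" + words[j])
    return features

features = text_features("Will gets out of the Chevrolet.")
```

The resulting set can be turned into a binary bag-of-words style vector and passed to any supervised text classifier.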

Method – Annotation – Data

[Figure: dataset overview. Recoverable labels: 12 movies; 20 different movies; a > 0.5; video length <= 1000 frames; 60%]

Goal

• Compare performance when training on manually annotated data versus automatically annotated data

Method – Action Classifier [Overview]

Bag of space-time features + multi-channel SVM [4], [5], [6]

Pipeline: collection of space-time patches → HOG & HOF patch descriptors → histogram of visual words → multi-channel SVM classifier

[3] Slides version of "Learning realistic human actions from movies." Source: http://www.di.ens.fr/~laptev/actions/

Method – Action Classifier - Features

Space-time corner detector [7]

• Dense scale sampling (no explicit scale selection)
• Multi-scale detection

Method - Action Classifier - Descriptor

Multi-scale space-time patches from corner detector

• Histogram of oriented spatial gradients (HOG): 3x3x2x4 bins HOG descriptor
• Histogram of optical flow (HOF): 3x3x2x5 bins HOF descriptor

Public code available at www.irisa.fr/vista/actions

Method - Action Classifier - Descriptor

Visual Vocabulary Construction (bag-of-features)

• Used a subset of 100,000 features sampled from training videos
• Identified 4000 clusters with k-means
• Centroids = visual vocabulary words

Method - Action Classifier - Descriptor

Vector BOF Generation

• Compute all features
• Assign each feature to the closest vocabulary word
• Compute the vector of visual word occurrences

vw1 vw2 vw3 . . . vwn
17 8 . . . 2 39
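The assignment and counting steps can be sketched in plain Python. The toy 2-D vocabulary and features below are made up for illustration (the real descriptors are HOG/HOF vectors and the vocabulary has 4000 words), and Euclidean distance is assumed as the notion of "closest".

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def bof_histogram(features, vocabulary):
    """Count, for each vocabulary word, how many features are closest to it."""
    histogram = [0] * len(vocabulary)
    for feature in features:
        nearest = min(range(len(vocabulary)),
                      key=lambda k: squared_distance(feature, vocabulary[k]))
        histogram[nearest] += 1
    return histogram

# Hypothetical toy data: 3 visual words, 4 local features.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
features = [(0.1, -0.1), (0.9, 1.2), (1.1, 0.8), (4.0, 5.5)]
hist = bof_histogram(features, vocabulary)
```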

Method - Action Classifier - Descriptor

Global spatio-temporal grids

In the spatial domain:

• 1x1 (standard BoF)
• 2x2, o2x2 (50% overlap)
• h3x1 (horizontal), v1x3 (vertical)
• 3x3

In the temporal domain:

• t1 (standard BoF), t2, t3 and centre-focused ot2

[Figure: examples of spatio-temporal grids]

Method - Action Classifier - Descriptor

Global spatio-temporal grids

Entire action sequence: one histogram of visual words

vw1 vw2 vw3 . . . vwn
17 8 . . . 2 39

Action sequence split in 2 over time: one histogram per half, normalized

1st half:
vw1 vw2 vw3 . . . vwn
17 8 . . . 2 39

2nd half:
vw1 vw2 vw3 . . . vwn
10 8 . . . 35 1
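A temporal grid such as t2 can be sketched as follows: each feature falls into a time cell, every cell gets its own histogram, and the per-cell histograms are normalized and concatenated. The per-cell L1 normalization and the use of time stamps rescaled to [0, 1) are assumptions for illustration.

```python
def temporal_grid_histogram(word_ids, times, n_words, n_cells=2):
    """Build one visual-word histogram per temporal cell and concatenate them.

    word_ids[i] is the visual word index of feature i;
    times[i] is its time stamp, rescaled to [0, 1) over the sequence.
    """
    cells = [[0] * n_words for _ in range(n_cells)]
    for word, t in zip(word_ids, times):
        cell = min(int(t * n_cells), n_cells - 1)
        cells[cell][word] += 1
    result = []
    for cell in cells:
        total = sum(cell)
        # L1-normalize each cell; an empty cell stays all zeros.
        result.extend(c / total if total else 0.0 for c in cell)
    return result

# Hypothetical toy sequence: 4 features, vocabulary of 3 words, grid t2.
descriptor = temporal_grid_histogram([0, 1, 1, 2], [0.1, 0.2, 0.6, 0.9], n_words=3)
```

With n_cells=1 the same function reduces to the standard bag-of-features histogram (t1).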


Method - Action Classifier - Descriptor


Global spatio-temporal grids

Method - Action Classifier - Learning

Non-Linear SVM:

• Map the original feature space to a higher-dimensional space, where the data is separable

Method - Action Classifier - Learning

Multi-channel chi-square kernel

• A channel c is a combination of a descriptor (HOG or HOF) and a spatio-temporal grid
• D_c(H_i, H_j) is the chi-square distance between histograms H_i and H_j
• A_c is the mean value of the distances between all training samples for the channel c
• The best set of channels C for a given training set is found based on a greedy approach
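The kernel formula itself did not survive on the slide. In the paper [2] it has the form K(H_i, H_j) = exp(-sum over c in C of D_c(H_i, H_j) / A_c), with D_c(H_i, H_j) = 1/2 * sum over n of (h_in - h_jn)^2 / (h_in + h_jn). The sketch below computes it for histograms stored per channel; the dictionary layout and the channel name are hypothetical, and empty bins are skipped to avoid division by zero.

```python
import math

def chi2_distance(h1, h2):
    """Chi-square distance: 0.5 * sum((a - b)^2 / (a + b)) over non-empty bins."""
    return 0.5 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def multichannel_kernel(sample_i, sample_j, mean_distance):
    """K = exp(-sum_c D_c(H_i, H_j) / A_c).

    sample_i, sample_j: dict mapping channel name -> histogram.
    mean_distance: dict mapping channel name -> A_c, the mean chi-square
    distance between training samples for that channel.
    """
    total = sum(chi2_distance(sample_i[c], sample_j[c]) / mean_distance[c]
                for c in mean_distance)
    return math.exp(-total)

# Hypothetical single-channel example.
x = {"hog_1x1_t1": [0.5, 0.5]}
y = {"hog_1x1_t1": [1.0, 0.0]}
k_xy = multichannel_kernel(x, y, {"hog_1x1_t1": 1.0 / 3.0})
```

Dividing by A_c puts channels with different typical distance ranges on a comparable scale before they are summed.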

Evaluation - Action Classifier

Findings

• Combining different grids and channels improves performance
• HOG performs better for realistic actions (context, image content)

Evaluation - Action Classifier

[Table: Number of occurrences for each channel component within the optimized channel combinations for the KTH action dataset and our manually labelled movie dataset]

Evaluation - Action Classifier

[Figure: Sample frames from the KTH actions sequences; all classes (columns) and scenarios (rows) are presented]

Evaluation - Action Classifier

[Table: Average class accuracy on the KTH actions dataset]

[Figure: Confusion matrix for the KTH actions]

Evaluation - Action Classifier

Noise Robustness

Why? Automatic annotation avoids the cost of human annotation.

With p the probability of a wrong label in the training set:

• p <= 0.2: performance decreases insignificantly
• p = 0.4: performance decreases by around 10%

Evaluation - Action Classifier

Evaluation in Real-World Videos

[Figure legend:]

• Correct prediction
• Class not present, prediction says YES
• Class present, prediction says NO

Evaluation - Action Classifier

Evaluation in Real-World Videos

[Figure: Action classification example results based on automatically annotated data]

Evaluation - Action Classifier

Evaluation based on average precision (AP) over actions.

• Clean = manually annotated training data
• Chance = random classifier

Demo - Action Classifier

Test episodes from the movies “The Graduate”, “It’s a Wonderful Life” and “Indiana Jones and the Last Crusade”

Conclusion

Summary

• Automatic generation of realistic action samples
• New action dataset available at www.irisa.fr/vista/actions
• Bag-of-features expanded to the video domain
• Best performance on the KTH benchmark
• Promising results for actions in the “wild”

Disadvantages

• Automatic annotation still needs improvement: only 60% was achieved.
• The parameters for the grid of cuboids are not well justified: how were they determined? The same applies to the number of visual words for the k-means algorithm.
• K-means is susceptible to outliers.
• A greedy approach for determining the best set of channels can yield sub-optimal results.

Future directions

• Automatic action class discovery
• Internet-scale video search


Questions


References

[1] Everingham, M., Sivic, J., & Zisserman, A. (2006). Hello! My name is... Buffy -- automatic naming of characters in TV video.

[2] Laptev, I., Marszalek, M., Schmid, C., & Rozenfeld, B. (2008, June). Learning realistic human actions from movies. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on (pp. 1-8). IEEE.

[3] Slides version of "Learning realistic human actions from movies." Source: http://www.di.ens.fr/~laptev/actions/

[4] Schuldt, C., Laptev, I., & Caputo, B. (2004, August). Recognizing human actions: a local SVM approach. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on (Vol. 3, pp. 32-36). IEEE.

[5] Niebles, J. C., Wang, H., & Fei-Fei, L. (2008). Unsupervised learning of human action categories using spatial-temporal words. International Journal of Computer Vision, 79(3), 299-318.

[6] Zhang, J., Marszałek, M., Lazebnik, S., & Schmid, C. (2007). Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision, 73(2), 213-238.

[7] Laptev, I. (2005). On space-time interest points. International Journal of Computer Vision, 64(2-3), 107-123.


Action Recognition Using a Distributed Representation of Pose and Appearance
Subhransu Maji 1, Lubomir Bourdev 1,2, and Jitendra Malik 1
1 University of California at Berkeley; 2 Adobe Systems, Inc.
Presented by: Nils Murrugarra, University of Pittsburgh

Goal

[3] Poster: http://people.cs.umass.edu/~smaji/presentations/action-cvpr11-poster.pdf

Motivation

• Humans can easily recognize pose and actions from limited views in a single image
• Actions and poses are identified from body parts, possibly occluded, at different locations and scales

Poselets

Poselet:

• Body-part detectors based on the joint locations of people in images
• They are used to find patches related to a given configuration of joints

[3] Poster: http://people.cs.umass.edu/~smaji/presentations/action-cvpr11-poster.pdf

Poselets - People

L. Bourdev, S. Maji, T. Brox and J. Malik, Detecting People Using Mutually Consistent Poselet Activations, ECCV 2010

Robust Representation of Pose and Appearance

Poselet Activation Vector

• Poselet annotations are reused from a previous article
• Each example is represented by the poselets that are active

Estimate 3D Orientation of Head and Torso

Data Collection

3D pose of head and torso annotations

• Collected with Amazon Mechanical Turk

Manual Verification

• Discard images with high disagreement
• Low resolution and high occlusion
• Only rotation about the Y axis was used

Human Error

• Small error in canonical views (front, back, left and right)
• Measured as the average standard deviation

3D Estimation - Goal

Goal

• Given the bounding box of a person, estimate the 3D orientation of the head and torso

3D Estimation - Descriptor

Poselet activation vector: each entry corresponds to a poselet type

pt1 pt2 pt3 . . . ptn
0.7 0.8 . . . 0.2 0.9

Procedure

• Discretize the 3D orientation range [-180, 180] into 8 bins [classification]
• Estimate the angle by interpolating between:
  • the highest predicted bin
  • its two adjacent neighbors
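One possible reading of the interpolation step is sketched below. The slides do not say how the interpolation is done, so the score-weighted circular mean, the bin centres (bin k covers 45 degrees starting at -180) and the clipping of negative scores to zero are all assumptions for illustration.

```python
import math

def estimate_angle(scores):
    """Interpolate an angle in degrees from per-bin classifier scores.

    Uses the highest-scoring bin and its two circular neighbours, weighting
    each bin centre by its (non-negative) score.
    """
    n = len(scores)
    width = 360.0 / n
    best = max(range(n), key=lambda k: scores[k])
    x = y = 0.0
    for k in (best - 1, best, best + 1):
        k %= n  # wrap around: the first and last bins are neighbours
        centre = math.radians(-180.0 + (k + 0.5) * width)
        weight = max(scores[k], 0.0)
        x += weight * math.cos(centre)
        y += weight * math.sin(centre)
    return math.degrees(math.atan2(y, x))

# Hypothetical scores for the 8 orientation bins.
angle_peak = estimate_angle([0, 0, 0, 0, 1, 0, 0, 0])
angle_between = estimate_angle([0, 0, 0, 0, 1, 1, 0, 0])
```

Averaging on the circle rather than on raw degrees keeps the estimate sensible when the best bin sits next to the -180/180 wrap-around.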


3D Estimation – Example Results


3D Estimation – Evaluation

• Head orientation: 62.1 %
• Torso orientation: 61.71 %


2D Action Classifier - Goal


Goal

• Given a bounding box, estimate an action category

2D Action Classifier - Method

Joint Locations Annotation

Pose alone is not enough to identify actions

2D Action Classifier - Method

Appearance information would help

Solution

• Learn the appearance of poselets per action category
• Based on HOG and SVM

2D Action Classifier - Method

Approach

• Find the poselet k-nearest neighbors
• Select the most discriminative ones
• Learn an appearance model based on HOG and SVM

2D Action Classifier - Method

Can object interaction help?

• Interactions with horse, motorbike, bicycle and TV were considered
• A spatial model of person-object location was learnt [object activation vector]

2D Action Classifier - Method

Can context still help us?

Add action classifiers for the other people in the image


2D Action Classifier - Evaluation


Conclusion

Summary

A method for action recognition in static images was presented. It is based mainly on:

• Poselet features
• An appearance model
• Object interaction
• Context information

Disadvantages

• The use of bounding boxes is not realistic. A better scenario is that, given an image, an algorithm detects all people's actions automatically.
• For the Poselet Activation Vector, an intersection threshold of 0.15 is defined. How was this threshold determined? A similar situation arises with the Spatial Model of Object Interaction.


Questions


References

[1] Maji, Subhransu, Lubomir Bourdev, and Jitendra Malik. "Action recognition from a distributed representation of pose and appearance." In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp. 3177-3184. IEEE, 2011.

[2] Bourdev, Lubomir, Subhransu Maji, Thomas Brox, and Jitendra Malik. "Detecting people using mutually consistent poselet activations." In Computer Vision–ECCV 2010, pp. 168-181. Springer Berlin Heidelberg, 2010.

[3] Poster version of "Action recognition from a distributed representation of pose and appearance." Source: http://people.cs.umass.edu/~smaji/presentations/action-cvpr11-poster.pdf