1
2D Human Pose Estimation: New Benchmark and State of the Art Analysis Mykhaylo Andriluka 1,3 , Leonid Pishchulin 1 , Peter Gehler 2 and Bernt Schiele 1 1 Max Planck Institute for Informatics, 2 Max Planck Institute for Intelligent Systems, 3 Stanford University, Germany Germany USA Overview We analyse the state of the art in articulated human pose estimation using a new large-scale benchmark dataset. • Our MPII Human Pose benchmark includes more than 40,000 images systematically collected using an established taxonomy of human activities [1]. The collected images cover 410 human activities in total. • Our analysis is based on the rich annotations that include body pose, body part occlusion, torso and head viewpoint, and person activity. • Dataset and web-based evaluation toolkit are available at human-pose.mpi-inf.mpg.de. Related Datasets Dataset #training #test img. type Full body pose datasets Parse [Ramanan, NIPS’06] 100 205 diverse LSP [Johnson&Everingham, BMVC’10] 1,000 1,000 sports (8 types) PASCAL Person Layout [Everingham et al., IJCV’10] 850 849 everyday Sport [Wang et al., CVPR’11] 649 650 sports UIUC people [Wang et al., CVPR’11] 346 247 sports (2 types) LSP extended [Johnson&Everingham, CVPR’11] 10,000 - sports (3 types) FashionPose [Dantone et al., CVPR’13] 6,530 775 fashion blogs J-HMDB [Jhuang et al., ICCV’13] 31,838 - diverse (21 act.) Upper body pose datasets Buffy Stickmen [Ferrari et al, CVPR’08] 472 276 TV show (Buffy) ETHZ PASCAL Stickmen [Eichner&Ferrari, BMVC’09] - 549 PASCAL VOC Human Obj. Int. (HOI) [Yao&Fei-Fei, CVPR’10] 180 120 sports (6 types) We Are Family [Eichner&Ferrari, ECCV’10] 350 imgs. 175 imgs. group photos Video Pose 2 [Sapp et al., CVPR’11] 766 519 TV show (Friends) FLIC [Sapp&Taskar, CVPR’13] 6,543 1,016 feature movies Sync. Activities [Eichner&Ferrari, PAMI’12] - 357 imgs. dance / aerobics Armlets [Gkioxari et al., CVPR’13] 9,593 2,996 PASCAL VOC/Flickr MPII Human Pose (this paper) 28,821 11,701 diverse (491 act.) (a) (b) (c) Visualization of the upper body pose variability. (a) color coding of the body parts, (b) annotations from the Armlets dataset [Gkioxari et al. CVPR’13], and (c) annotations from our MPII Human Pose dataset. sports occupation water activities home activities condition. exerc. fishing and hunt. religious activ. winter activ. walking dancing lawn and garden self care bicycling miscellaneous home repair running music playing transportation hockey, ice bakery skindiving, scuba child care home exercise trapping game sitting in church skiing, downhill descending stairs Anishinaabe Jingle planting trees grooming, washing unicycling chess game, sitting home repair running flute, sitting pushing plane rope skipping typing, electric swimming, synchr. polishing floors ski machine hunting, birds serving food skiing, cross-countr. backpacking general dancing raking roof with snow taking medication bicycling, racing board game playing put on and remov. jogging accordion, sitting riding in a car trampoline locksmith swimming, breaststr. kitchen activity stair-treadmill hunting kneeling in church skiing, climbing up walking, for exerc. aerobic, step watering lawn or gard. showering, toweling bicycling, BMX copying documents caulking, chinking running, training conducting orchestra motor scooter, motor rock climbing masonry, concrete tubing, floating playing with child. elliptical trainer fishing, ice sitting, playing skating, ice dancing pushing a wheelchair ballet, modern driving tractor hairstyling bicycling standing, talking repairing appliances running, stairs, up piano, sitting riding in a bus cricket, batting fire fighter, haul. windsurfing carrying groceries slimnastics, jazzer. hunting, bow and arr. standing, talking dog sledding climbing hills caribbean dance clearing brush eating, sitting bicycling, mountain sitting, arts and cr. carpentry, general running, marathon cello, sitting driving automobile Randomly chosen activities and images from 18 top level categories of our MPII Human Pose dataset. One image per activity is shown. The full dataset is available at human-pose.mpi-inf.mpg.de. Analysis of the state of the art • Performance of the upper-body and full-body state-of-the-art approaches: Setting Torso Upper Lower Upper Fore- Head Upper Full leg leg arm arm body body Gkioxari et al. [2] 51.3 - - 28.0 12.4 - 26.4 - Sapp&Taskar [5] 51.3 - - 27.4 16.3 - 27.8 - Yang&Ramanan [6] 61.0 36.6 36.5 34.8 17.4 70.2 33.1 38.3 Pishchulin et al. [3] 63.8 39.6 37.3 39.0 26.8 70.7 39.1 42.3 Gkioxari et al. [2] + loc 65.1 - - 33.7 14.9 - 32.4 - Sapp&Taskar [5] + loc 65.1 - - 32.6 19.2 - 33.7 - Yang&Ramanan [6] + loc 67.2 39.7 39.4 37.4 18.6 75.7 35.8 41.4 Pishchulin et al. [3] + loc 66.6 40.5 38.2 40.4 27.7 74.5 40.6 43.9 Yang&Ramanan [6] retrained 69.3 39.5 38.8 43.4 27.7 74.6 42.3 44.7 Pishchulin et al. [3] retrained 68.4 42.7 42.8 42.0 29.2 76.3 42.1 46.1 PS and FMP models improve significantly after retraining. Full-body approaches perform better than upper-body ones. • We define complexity measures with respect to body pose, viewpoint of the torso, body part length, occlusion, and truncation, and analyze sensitivity of the state-of-the-art approaches to each of these factors: 0 1000 2000 3000 4000 5000 6000 7000 30 40 50 60 70 80 Number of images PCPm, % Occlusion Part length Pose VP torso Truncation 0 1000 2000 3000 4000 5000 6000 7000 30 40 50 60 70 80 Number of images PCPm, % Occlusion Part length Pose VP torso Truncation Pishchulin et al. [3] Yang&Ramanan [6] 0 1000 2000 3000 4000 5000 6000 7000 30 40 50 60 70 80 Number of images PCPm, % Occlusion Part length Pose VP torso Truncation 0 1000 2000 3000 4000 5000 6000 7000 30 40 50 60 70 80 Number of images PCPm, % Occlusion Part length Pose VP torso Truncation Sapp&Taskar [5] Gkioxari et al. [2] Complexity measures: Occlusion: number of occluded body parts m occ ( L )= N i =1 ρ i Part length: average deviation from the mean part length m f ( L )= N i =1 | d (l i ) - m i |/m i Pose: deviation from the mean pose rep- resented via PS prior distribution m pose ( L )= Q (i , j )E p ps (l i | l j ) VP torso: deviation of the torso from the frontal viewpoint m v ( L )= 3 i =1 α i Truncation: number of truncated body parts m t ( L )= N i =1 τ i Body pose has a significantly more profound effect on performance than occlusion and truncation. •Detailed performance analysis for activity, viewpoint and body poses types: 20 25 30 35 40 45 50 55 60 65 PCPm, % sports occupation conditioning exercise dancing water activities home repair home activities lawn and garden fishing and hunting music playing winter activities walking miscellaneous bicycling running inactivity quiet/light transportation self care religious activities volunteer activities Pishchulin et al. Yang&Ramanan Sapp&Taskar Gkioxari et al. 20 30 40 50 60 70 PCPm, % sports occupation conditioning exercise dancing water activities home repair home activities lawn and garden fishing and hunting music playing winter activities walking miscellaneous bicycling running inactivity quiet/light transportation self care religious activities volunteer activities Pishchulin et al. Pishchulin et al. retrained Yang&Ramanan Yang&Ramanan retrained Striking performance differences for all approaches across activities and viewpoints even after retraining. Results on sports activities are the best, even though they have been previously considered among the most difficult for pose estimation. Detailed analysis reveals differences between FMP and PS, even though overall performance is comparable. 0 10 20 30 40 50 60 70 80 PCPm, % 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 Pishchulin et al. Yang&Ramanan Sapp&Taskar Gkioxari et al. Performance (mPCP) for different pose types. Activity recognition We evaluate the performance of the state-of-the-art activity recognition methods on our benchmark in [4]. 1 50 100 150 200 250 300 350 400 0 50 100 150 200 250 activity # examples/activity train test 1 50 100 150 200 250 300 350 400 0 5 10 15 20 25 30 # activities mAP Dense trajectories (DT) GT single pose (GT) GT single pose + track (GT-T) PS single pose + track (PS-T) PS multi-pose (PS-M) PS-M + DT PS-M filter DT (a) (b) (a) number of examples per activity. (b) Comparison of the activity recognition performance of [Wang et al., ICCV’13] (DT) and variants of posed-based approach of [Jhuang et al.,ICCV’13]. The plot shows performance (mAP) as a function of the number of activities. Activities are sorted by the number of training examples. yoga, bicycling, skiing, cooking or skate- rope softball, forestry power mountain downhill food prep. boarding skipping general Dense trajectories (DT) 10.6 14.5 51.9 0.5 11.4 36.0 12.7 8.4 PS multi-pose (PS-M) 18.3 34.0 27.3 2.6 17.2 90.5 3.0 5.2 PS-M + DT 19.6 40.7 32.9 2.2 19.5 88.7 3.9 7.2 PS-M filter DT 16.1 20.4 52.2 0.8 13.5 55.7 4.2 10.6 carpentry, bicycling, golf rock ballet, aerobic resistance total general racing climbing modern step training Dense trajectories (DT) 5.5 5.5 33.0 41.5 12.7 24.5 16.5 19.0 PS multi-pose (PS-M) 3.4 8.6 47.9 4.7 22.9 10.4 7.2 20.2 PS-M + DT 5.0 12.1 51.9 14.4 23.7 17.1 14.4 23.5 PS-M filter DT 6.1 15.5 15.9 38.6 7.1 25.8 9.6 19.5 Activity recognition results (mAP) on the 15 largest classes. References [1] B. Ainsworth, W. Haskell, S. Herrmann, N. Meckes, D. Bassett, C. Tudor-Locke, J. Greer, J. Vezina, M. Whitt-Glover, and A. Leon. 2011 compendium of physical activities: a second update of codes and MET values. MSSE’11. [2] G. Gkioxari, P. Arbelaez, L. Bourdev, and J. Malik. Articulated pose estimation using discriminative armlet classifiers. In CVPR’13. [3] L. Pishchulin, M. Andriluka, P. Gehler, and B. Schiele. Strong appearance and expressive spatial models for human pose estimation. In ICCV’13. [4] L. Pishchulin, M. Andriluka, and B. Schiele. Fine-grained activity recognition with holistic and pose based features. arXiv:1406.1881, 2014. [5] B. Sapp and B. Taskar. Multimodal decomposable models for human pose estimation. In CVPR’13. [6] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts. PAMI’13.

2D Human Pose Estimation: New Benchmark and State of the ...human-pose.mpi-inf.mpg.de/contents/andriluka14cvpr_poster.pdf · 2D Human Pose Estimation: New Benchmark and State of the

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 2D Human Pose Estimation: New Benchmark and State of the ...human-pose.mpi-inf.mpg.de/contents/andriluka14cvpr_poster.pdf · 2D Human Pose Estimation: New Benchmark and State of the

2D Human Pose Estimation: New Benchmark and State of the Art AnalysisMykhaylo Andriluka1,3, Leonid Pishchulin1, Peter Gehler2 and Bernt Schiele1

1Max Planck Institute for Informatics, 2Max Planck Institute for Intelligent Systems, 3Stanford University,Germany Germany USA

Overview

We analyse the state of the art in articulated human poseestimation using a new large-scale benchmark dataset.

• Our MPII Human Pose benchmark includes morethan 40,000 images systematically collected using anestablished taxonomy of human activities [1]. Thecollected images cover 410 human activities in total.

• Our analysis is based on the rich annotations thatinclude body pose, body part occlusion, torso andhead viewpoint, and person activity.

• Dataset and web-based evaluation toolkit areavailable at human-pose.mpi-inf.mpg.de.

Related Datasets

Dataset #training #test img. type

Full body pose datasets

Parse [Ramanan, NIPS’06] 100 205 diverse

LSP [Johnson&Everingham, BMVC’10] 1,000 1,000 sports (8 types)

PASCAL Person Layout [Everingham et al., IJCV’10] 850 849 everyday

Sport [Wang et al., CVPR’11] 649 650 sports

UIUC people [Wang et al., CVPR’11] 346 247 sports (2 types)

LSP extended [Johnson&Everingham, CVPR’11] 10,000 - sports (3 types)

FashionPose [Dantone et al., CVPR’13] 6,530 775 fashion blogs

J-HMDB [Jhuang et al., ICCV’13] 31,838 - diverse (21 act.)

Upper body pose datasets

Buffy Stickmen [Ferrari et al, CVPR’08] 472 276 TV show (Buffy)

ETHZ PASCAL Stickmen [Eichner&Ferrari, BMVC’09] - 549 PASCAL VOC

Human Obj. Int. (HOI) [Yao&Fei-Fei, CVPR’10] 180 120 sports (6 types)

We Are Family [Eichner&Ferrari, ECCV’10] 350 imgs. 175 imgs. group photos

Video Pose 2 [Sapp et al., CVPR’11] 766 519 TV show (Friends)

FLIC [Sapp&Taskar, CVPR’13] 6,543 1,016 feature movies

Sync. Activities [Eichner&Ferrari, PAMI’12] - 357 imgs. dance / aerobics

Armlets [Gkioxari et al., CVPR’13] 9,593 2,996 PASCAL VOC/Flickr

MPII Human Pose (this paper) 28,821 11,701 diverse (491 act.)

(a) (b) (c)

Visualization of the upper body pose variability. (a) color coding ofthe body parts, (b) annotations from the Armlets dataset [Gkioxariet al. CVPR’13], and (c) annotations from our MPII Human Pose

dataset.

sports occupation water activities home activities condition. exerc. fishing and hunt. religious activ. winter activ. walking dancing lawn and garden self care bicycling miscellaneous home repair running music playing transportation

hockey, ice bakery skindiving, scuba child care home exercise trapping game sitting in church skiing, downhill descending stairs Anishinaabe Jingle planting trees grooming, washing unicycling chess game, sitting home repair running flute, sitting pushing plane

rope skipping typing, electric swimming, synchr. polishing floors ski machine hunting, birds serving food skiing, cross-countr. backpacking general dancing raking roof with snow taking medication bicycling, racing board game playing put on and remov. jogging accordion, sitting riding in a car

trampoline locksmith swimming, breaststr. kitchen activity stair-treadmill hunting kneeling in church skiing, climbing up walking, for exerc. aerobic, step watering lawn or gard. showering, toweling bicycling, BMX copying documents caulking, chinking running, training conducting orchestra motor scooter, motor

rock climbing masonry, concrete tubing, floating playing with child. elliptical trainer fishing, ice sitting, playing skating, ice dancing pushing a wheelchair ballet, modern driving tractor hairstyling bicycling standing, talking repairing appliances running, stairs, up piano, sitting riding in a bus

cricket, batting fire fighter, haul. windsurfing carrying groceries slimnastics, jazzer. hunting, bow and arr. standing, talking dog sledding climbing hills caribbean dance clearing brush eating, sitting bicycling, mountain sitting, arts and cr. carpentry, general running, marathon cello, sitting driving automobile

Randomly chosen activities and images from 18 top level categories of our MPII Human Pose dataset. One image per activity is shown. The full dataset is available at human-pose.mpi-inf.mpg.de.

Analysis of the state of the art

• Performance of the upper-body and full-bodystate-of-the-art approaches:

Setting Torso Upper Lower Upper Fore- Head Upper Fullleg leg arm arm body body

Gkioxari et al. [2] 51.3 - - 28.0 12.4 - 26.4 -Sapp&Taskar [5] 51.3 - - 27.4 16.3 - 27.8 -Yang&Ramanan [6] 61.0 36.6 36.5 34.8 17.4 70.2 33.1 38.3Pishchulin et al. [3] 63.8 39.6 37.3 39.0 26.8 70.7 39.1 42.3

Gkioxari et al. [2] + loc 65.1 - - 33.7 14.9 - 32.4 -Sapp&Taskar [5] + loc 65.1 - - 32.6 19.2 - 33.7 -Yang&Ramanan [6] + loc 67.2 39.7 39.4 37.4 18.6 75.7 35.8 41.4Pishchulin et al. [3] + loc 66.6 40.5 38.2 40.4 27.7 74.5 40.6 43.9

Yang&Ramanan [6] retrained 69.3 39.5 38.8 43.4 27.7 74.6 42.3 44.7Pishchulin et al. [3] retrained 68.4 42.7 42.8 42.0 29.2 76.3 42.1 46.1

⇒ PS and FMP models improve significantly afterretraining.⇒ Full-body approaches perform better than

upper-body ones.

• We define complexity measures with respect to bodypose, viewpoint of the torso, body part length, occlusion,and truncation, and analyze sensitivity of thestate-of-the-art approaches to each of these factors:

0 1000 2000 3000 4000 5000 6000 7000

30

40

50

60

70

80

Number of images

PCPm

, %

OcclusionPart lengthPoseVP torsoTruncation

0 1000 2000 3000 4000 5000 6000 7000

30

40

50

60

70

80

Number of images

PCPm

, %

OcclusionPart lengthPoseVP torsoTruncation

Pishchulin et al. [3] Yang&Ramanan [6]

0 1000 2000 3000 4000 5000 6000 7000

30

40

50

60

70

80

Number of images

PCPm

, %

OcclusionPart lengthPoseVP torsoTruncation

0 1000 2000 3000 4000 5000 6000 7000

30

40

50

60

70

80

Number of images

PCPm

, %

OcclusionPart lengthPoseVP torsoTruncation

Sapp&Taskar [5] Gkioxari et al. [2]

Complexity measures:

Occlusion: number of occluded bodypartsmocc(L) =∑N

i=1ρi

Part length: average deviation from themean part lengthm f (L) =∑N

i=1 |d(li)−mi|/mi

Pose: deviation from the mean pose rep-resented via PS prior distributionmpose(L) =∏

(i, j)∈E pps(li|l j)

VP torso: deviation of the torso from thefrontal viewpointmv(L) =∑3

i=1αi

Truncation: number of truncated bodypartsmt(L) =∑N

i=1τi

⇒Body pose has a significantly more profound effecton performance than occlusion and truncation.

• Detailed performance analysis for activity, viewpointand body poses types:

20

25

30

35

40

45

50

55

60

65

activity categories

PCPm

, %

sport

s

occu

patio

n

cond

itionin

g exe

rcise

danc

ing

water a

ctivitie

s

home r

epair

home a

ctivitie

s

lawn a

nd ga

rden

fishing

and h

untin

g

music p

laying

winter

activi

ties

walking

miscella

neou

s

bicycl

ing

runnin

g

inactiv

ity qu

iet/lig

ht

trans

porta

tion

self c

are

religio

us ac

tivitie

s

volun

teer a

ctivitie

s

Pishchulin et al.Yang&RamananSapp&TaskarGkioxari et al.

20

30

40

50

60

70

activity categories

PCPm

, %

sport

s

occu

patio

n

cond

itionin

g exe

rcise

danc

ing

water a

ctivitie

s

home r

epair

home a

ctivitie

s

lawn a

nd ga

rden

fishing

and h

untin

g

music p

laying

winter

activi

ties

walking

miscella

neou

s

bicycl

ing

runnin

g

inactiv

ity qu

iet/lig

ht

trans

porta

tion

self c

are

religio

us ac

tivitie

s

volun

teer a

ctivitie

s

Pishchulin et al.Pishchulin et al. retrainedYang&RamananYang&Ramanan retrained

⇒ Striking performance differences for all approachesacross activities and viewpoints even after retraining.⇒Results on sports activities are the best, even though

they have been previously considered among themost difficult for pose estimation.⇒Detailed analysis reveals differences between FMP

and PS, even though overall performance iscomparable.

0

10

20

30

40

50

60

70

80

PCPm

, %

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51

Pishchulin et al.Yang&RamananSapp&TaskarGkioxari et al.

108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161

162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215

CV

PR

#2022C

VP

R#2022

CV

PR

2014S

ubmission

#2022.C

ON

FIDE

NTIA

LR

EV

IEW

CO

PY.D

ON

OT

DIS

TRIB

UTE

.

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

180−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

180−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

−150 −100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

180−150 −100 −50 0 50 100 150

0

50

100

150

200

−100 −50 0 50 100 150

−20

0

20

40

60

80

100

120

140

160

−100 −80 −60 −40 −20 0 20 40 60 80

−20

0

20

40

60

80

100

120

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

−80 −60 −40 −20 0 20 40 60 80

−20

0

20

40

60

80

100

−60 −40 −20 0 20 40 60 80 100 120

−20

0

20

40

60

80

100

120

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

180−80 −60 −40 −20 0 20 40 60

−20

0

20

40

60

80

−100 −50 0 50 100

−50

0

50

100

150

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

−60 −40 −20 0 20 40 60

−20

−10

0

10

20

30

40

50

60

70

−100 −80 −60 −40 −20 0 20 40 60 80 100

−20

0

20

40

60

80

100

120

140

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

180−80 −60 −40 −20 0 20 40 60 80

−20

0

20

40

60

80

100

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

−150 −100 −50 0 50 100

−40

−20

0

20

40

60

80

100

120

140

160

−150 −100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

180

−80 −60 −40 −20 0 20 40 60

−20

0

20

40

60

80

100

−80 −60 −40 −20 0 20 40 60 80 100

−20

0

20

40

60

80

100

120

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

180−100 −80 −60 −40 −20 0 20 40 60 80 100

−20

0

20

40

60

80

100

120

140−100 −80 −60 −40 −20 0 20 40 60 80

−20

0

20

40

60

80

100

120

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160−150 −100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

180

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

−80 −60 −40 −20 0 20 40 60 80 100 120

−20

0

20

40

60

80

100

120

140

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

−80 −60 −40 −20 0 20 40 60 80

0

20

40

60

80

100

120−100 −80 −60 −40 −20 0 20 40 60 80 100

−20

0

20

40

60

80

100

120

140

−80 −60 −40 −20 0 20 40 60 80

−20

0

20

40

60

80

100

120−50 0 50 100

−20

0

20

40

60

80

100

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160−100 −80 −60 −40 −20 0 20 40 60

−20

0

20

40

60

80

−80 −60 −40 −20 0 20 40 60 80 100

0

20

40

60

80

100

120

140−120 −100 −80 −60 −40 −20 0 20 40

−20

0

20

40

60

80

−80 −60 −40 −20 0 20 40 60 80

−20

0

20

40

60

80

100

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

−80 −60 −40 −20 0 20 40 60 80

−20

0

20

40

60

80

100

−100 −80 −60 −40 −20 0 20 40 60

−20

0

20

40

60

80

100−60 −40 −20 0 20 40 60

−20

0

20

40

60

80−100 −50 0 50 100 150

−20

0

20

40

60

80

100

120

140

160

180

−100 −50 0 50 100

−20

0

20

40

60

80

100

120

140

160

2

Performance (mPCP) for different pose types.

Activity recognition

We evaluate the performance of the state-of-the-artactivity recognition methods on our benchmark in [4].

1 50 100 150 200 250 300 350 4000

50

100

150

200

250

activity

# ex

ampl

es/a

ctiv

ity

traintest

1 50 100 150 200 250 300 350 4000

5

10

15

20

25

30

# activities

mA

P

Dense trajectories (DT)GT single pose (GT)GT single pose + track (GT−T)PS single pose + track (PS−T)PS multi−pose (PS−M)PS−M + DTPS−M filter DT

(a) (b)

(a) number of examples per activity. (b) Comparison of the activityrecognition performance of [Wang et al., ICCV’13] (DT) andvariants of posed-based approach of [Jhuang et al.,ICCV’13]. Theplot shows performance (mAP) as a function of the number ofactivities. Activities are sorted by the number of training examples.

yoga, bicycling, skiing, cooking or skate- rope softball, forestrypower mountaindownhill food prep. boarding skipping general

Dense trajectories (DT) 10.6 14.5 51.9 0.5 11.4 36.0 12.7 8.4PS multi-pose (PS-M) 18.3 34.0 27.3 2.6 17.2 90.5 3.0 5.2PS-M + DT 19.6 40.7 32.9 2.2 19.5 88.7 3.9 7.2PS-M filter DT 16.1 20.4 52.2 0.8 13.5 55.7 4.2 10.6

carpentry, bicycling, golf rock ballet, aerobic resistance totalgeneral racing climbing modern step training

Dense trajectories (DT) 5.5 5.5 33.0 41.5 12.7 24.5 16.5 19.0PS multi-pose (PS-M) 3.4 8.6 47.9 4.7 22.9 10.4 7.2 20.2PS-M + DT 5.0 12.1 51.9 14.4 23.7 17.1 14.4 23.5PS-M filter DT 6.1 15.5 15.9 38.6 7.1 25.8 9.6 19.5

Activity recognition results (mAP) on the 15 largest classes.

References[1] B. Ainsworth, W. Haskell, S. Herrmann, N. Meckes, D. Bassett, C. Tudor-Locke, J. Greer,

J. Vezina, M. Whitt-Glover, and A. Leon. 2011 compendium of physical activities: a secondupdate of codes and MET values. MSSE’11.

[2] G. Gkioxari, P. Arbelaez, L. Bourdev, and J. Malik. Articulated pose estimation usingdiscriminative armlet classifiers. In CVPR’13.

[3] L. Pishchulin, M. Andriluka, P. Gehler, and B. Schiele. Strong appearance and expressivespatial models for human pose estimation. In ICCV’13.

[4] L. Pishchulin, M. Andriluka, and B. Schiele. Fine-grained activity recognition with holistic andpose based features. arXiv:1406.1881, 2014.

[5] B. Sapp and B. Taskar. Multimodal decomposable models for human pose estimation. InCVPR’13.

[6] Y. Yang and D. Ramanan. Articulated human detection with flexible mixtures of parts.PAMI’13.