FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding. Dian Shao, Yue Zhao, Bo Dai, Dahua Lin. CVPR 2020 Oral. STRUCT Group Paper Reading, presented by Xiao Wu, 2020.4.19.

  • FineGym: A Hierarchical Video Dataset for Fine-grained Action Understanding

    Dian Shao, Yue Zhao, Bo Dai, Dahua Lin

    CVPR 2020 Oral

    STRUCT Group Paper Reading

    Presented by Xiao Wu, 2020.4.19

  • Outline

    ◦ Authorship
    ◦ Introduction
    ◦ Dataset
    ◦ Experiments
    ◦ Conclusion

    2020/4/25 STRUCT PAPER READING 2

  • Introduction

    ◦ Coarse-grained action datasets:
      ◦ UCF101: category labels only
      ◦ THUMOS, ActivityNet: category labels + temporal locations
      ◦ or: + spatio-temporal bounding boxes

    ◦ Problem:
      ◦ Background > action: classes such as hockey vs. gymnastics can be told apart from the scene alone

  • Introduction

    ◦ Fine-grained action datasets:
      ◦ Breakfast: each action is split into many units, e.g., juice = cut orange + …
      ◦ Diving48: each label is composed of 4 attributes, e.g., diving = back + 15som + 15twist + free

    ◦ Problems:
      ◦ Limited number of classes (~50)
      ◦ Limited structural hierarchy

  • Introduction

    ◦ FineGym with rich annotation
      ◦ Supports recognition, detection, auto-scoring, generation…
      ◦ 3 semantic levels, 2 temporal levels

  • Outline

    ◦ Authorship
    ◦ Introduction
    ◦ Dataset
    ◦ Experiments
    ◦ Conclusion

  • Dataset

    ◦ Gymnastics = 10 actions (6 men's + 4 women's)
      ◦ 4 women's actions (vault, balance beam, floor exercise, uneven bars)
      ◦ = 15 sub-actions (e.g., turns on the balance beam)
      ◦ = 530 sub-sub-actions (e.g., pike with 3 twists)

    ◦ Release:
      ◦ Gym99: balanced distribution
      ◦ Gym288, Gym530

  • Dataset

    ◦ Stats:
      ◦ Average duration: action (~55 s), sub-action (~2 s)
      ◦ Mostly 720p+

  • Outline

    ◦ Authorship
    ◦ Introduction
    ◦ Dataset
    ◦ Experiments
    ◦ Conclusion

  • Experiments

    ◦ Event/set (action, sub-action) recognition
      ◦ 3 frames are enough for event- and set-level recognition
      ◦ RGB > flow at this level

  • Experiments

    ◦ Element (sub-sub-action) recognition
      ◦ Overfitting on the long tail
      ◦ Flow > RGB at the fine-grained level
      ◦ TSM, TRN > TSN
      ◦ Temporal dynamics are important
      ◦ ImageNet pretraining ≈ Kinetics pretraining
      ◦ Skeleton-based methods suffer from unreliable skeleton estimation
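One way to see why a long-tailed label distribution matters is to compare overall top-1 accuracy with accuracy averaged per class. The sketch below is illustrative only: the function name `top1_and_mean_class_accuracy` and the toy labels are assumptions, and the numbers are not results from the paper.

```python
from collections import defaultdict

def top1_and_mean_class_accuracy(y_true, y_pred):
    """Return (overall top-1 accuracy, accuracy averaged over classes)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    top1 = sum(correct.values()) / len(y_true)
    mean_cls = sum(correct[c] / total[c] for c in total) / len(total)
    return top1, mean_cls

# Toy data (hypothetical): 9 samples of a frequent class, 1 of a rare class,
# and a model that always predicts the frequent class.
y_true = ["head"] * 9 + ["tail"]
y_pred = ["head"] * 10
top1, mean_cls = top1_and_mean_class_accuracy(y_true, y_pred)
```

In this toy case the overall accuracy looks high while the per-class average exposes the ignored rare class, which is the kind of gap a long tail can produce.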

  • Experiments

    ◦ Temporal action localization
      ◦ Localizing sub-actions is much more challenging
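A possible intuition for this gap, assuming localization is scored with temporal IoU (a common choice; the slide does not name the metric): the same boundary error costs far more on a ~2 s sub-action than on a ~55 s action. The function `temporal_iou` and the 0.5 s shift are assumptions made for illustration.

```python
def temporal_iou(pred, gt):
    """Intersection-over-union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

shift = 0.5  # hypothetical boundary error in seconds

# A ~55 s action and a ~2 s sub-action, each predicted 0.5 s too late.
long_iou = temporal_iou((10.0 + shift, 65.0 + shift), (10.0, 65.0))
short_iou = temporal_iou((10.0 + shift, 12.0 + shift), (10.0, 12.0))
```

Under these assumptions the long segment keeps an IoU above 0.98, while the short one drops to 0.6.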

  • Experiments

    ◦ Ablation on sparse sampling
      ◦ Accuracy saturates slowly as the number of frames increases
      ◦ Every frame counts in fine-grained actions
      ◦ Sampling rate: UCF101 (2.7%, 5 frames) vs. FineGym (30%, 12 frames)
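The sampling figures above can be made concrete with a small sketch. It assumes segment-based sparse sampling in the style of TSN (one frame per equal-length segment, here the segment centre) and reads the percentages as sampled frames divided by clip length. The helper names `sparse_sample_indices` and `implied_clip_length` are hypothetical, and the implied clip lengths are back-of-the-envelope arithmetic, not numbers stated in the slides.

```python
def sparse_sample_indices(num_frames, num_segments):
    """Pick the centre frame index of each of `num_segments` equal segments."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    # Integer arithmetic for the centre of segment i: (i + 0.5) * num_frames / num_segments
    return [((2 * i + 1) * num_frames) // (2 * num_segments)
            for i in range(num_segments)]

def implied_clip_length(frames_sampled, sample_rate):
    """Clip length in frames implied by a frame count and a sampling rate."""
    return frames_sampled / sample_rate

ucf_len = implied_clip_length(5, 0.027)   # 5 frames at 2.7%
gym_len = implied_clip_length(12, 0.30)   # 12 frames at 30%
indices = sparse_sample_indices(40, 12)   # 12 frames from a 40-frame clip
```

Read this way, 5 frames cover a clip of roughly 185 frames on UCF101, whereas 12 frames are needed for a clip of only about 40 frames on FineGym.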

  • Experiments

    ◦ Other ablations
      ◦ (a) Flow contributes to sub-sub-action recognition
      ◦ (b) Frame order matters significantly for TRN
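The frame-order ablation in (b) amounts to feeding a model the same frames in a random temporal order. A minimal sketch of that perturbation follows; the helper `shuffle_frame_order` and the string placeholders standing in for decoded frames are assumptions, and no model is involved.

```python
import random

def shuffle_frame_order(frames, seed=0):
    """Return a copy of `frames` in random temporal order; the input is untouched."""
    rng = random.Random(seed)
    shuffled = list(frames)
    rng.shuffle(shuffled)
    return shuffled

# Placeholder stand-ins for the decoded frames of one clip.
frames = [f"frame_{t}" for t in range(8)]
shuffled = shuffle_frame_order(frames)
```

A model that relies on temporal order should lose accuracy on `shuffled`, while an order-agnostic one should be largely unaffected.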

  • Experiments

    ◦ Other ablations
      ◦ (c) TSN is more robust than TSM when the number of test frames varies
      ◦ (d) On UCF101, pretraining lifts I3D from 84.5% to 97.9%; on FineGym, it does not help
      ◦ Hypothesis: a gap in temporal patterns between the datasets

  • Experiments

    ◦ Challenging classes from the confusion matrix
      ◦ Intense motion (e.g. salto, often < 1 s)
      ◦ Subtle spatial semantics (e.g. legs bent or straight)
      ◦ Complex temporal dynamics (e.g. direction of motion, degree of rotation, counting the number of saltos)
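Challenging classes of this kind are typically found by ranking the off-diagonal entries of a confusion matrix. The sketch below shows one way to do that; the function name `most_confused_pairs`, the class names, and the counts are all invented for illustration and are not taken from the paper.

```python
def most_confused_pairs(confusion, labels, top_k=3):
    """Rank off-diagonal (count, true label, predicted label) triples, largest first."""
    pairs = []
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if i != j and count > 0:
                pairs.append((count, labels[i], labels[j]))
    pairs.sort(key=lambda p: -p[0])
    return pairs[:top_k]

# Toy confusion matrix (rows: true class, columns: predicted class).
labels = ["class_a", "class_b", "class_c"]
confusion = [
    [50, 8, 2],
    [12, 40, 1],
    [0, 3, 60],
]
top = most_confused_pairs(confusion, labels, top_k=2)
```

Here `top` lists the two most frequent mistakes, which is where one would start looking for causes such as those listed above.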

  • Outline

    ◦ Authorship
    ◦ Introduction
    ◦ Dataset
    ◦ Experiments
    ◦ Conclusion

  • Conclusion

    ◦ From coarse- to fine-grained recognition, the stronger modality shifts from RGB to flow

    ◦ Temporal localization is not well solved on fine-grained data

    ◦ Sparse sampling does not hold up on fine-grained data

    ◦ Shuffling frames degrades TRN; increasing the number of test frames degrades TSM

    ◦ Pretrained models are hard to transfer

  • Thank you!

    Presented by Xiao Wu