1
At the end of each trial, the correct category was revealed and the subjects recorded the accuracy of their category guess. Tiny People dataset We introduce a new Tiny People dataset, which contains 200 real scene images capturing people at distance (the dataset is used for testing only) Tiny People Pose Lukáš Neumann and Andrea Vedaldi Introduction 1. Andriluka, M., Pishchulin, L., Gehler, P., Bernt, S.: 2d human pose estimation: New benchmark and state of the art analysis, CVPR 2014 2. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation, ECCV 2016 3. Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2d pose estimation using part affinity fields, CVPR 2017 4. Hu, P., Ramanan, D.: Finding tiny faces, CVPR 2017 Visual Geometry Group, Department of Engineering Science, University of Oxford Goal : recognizing pose from tiny images of people, down to 24px high Tiny People are important in surveillance, autonomous driving, etc. Challenge: low resolution data is extremely ambiguous Approach Models data uncertainty via probability distributions A CNN that outputs a distribution over possible body joints from a small image of a person A probabilistic variant of keypoint localization using dense heat maps Evaluation Downsampled standard benchmarks (MPII Human Pose and MS-COCO) New Tiny People dataset of “real” low-resolution people [email protected] Our method outperforms both standard models References This research was supported by Probabilistic Formulation Our model emits a continuous Gaussian distribution for each keypoint u by estimating Gaussian Mixture Model parameters over a coarse 16 × 16 feature map Ω d generated over the whole image Output: a continuous distribution over possible body joint configurations: Training: minimum neg. log-likelihood of the ground truth keypoint locations MPII Human Pose dataset Evaluation on the standard MPII Human Pose dataset 1 , where we reduced the resolution to approximate people seen at a distance Standard-height (128px): comparable to the state-of-the-art Stacked Hourglass 2 and better than Part Affinity Fields 3 Small people (<64px): our method performs significantly better detection accuracy model surprise regression error person height histogram of human pose datasets We also train the Tiny Faces detector 4 for person detection and use it as an input for our method We have shown modelling uncertainty explicitly in a deep network can significantly boosts accuracy for ambiguous data, such as low-resolution pose recognition Standard Formulation The standard approach 2 for landmark detection is to output one heatmap per joint Heatmaps are fitted to the ground truth data by a L2 per-pixel regression of a heuristic Gaussian-like kernel around the ground truth landmark location Limitations: Heatmaps only convey information about the location – they may appear to encode uncertainty, but they do not The accuracy is bound by the heatmap resolution The model allows for modelling data uncertainty (by increasing the variance of corresponding Gaussians), as well as for sub-pixel accuracy

Tiny People Posevgg/publications/2018/... · 2018-12-04 · Tiny People dataset • We introduce a new Tiny People dataset, which contains 200 real scene images capturing people at

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Tiny People Posevgg/publications/2018/... · 2018-12-04 · Tiny People dataset • We introduce a new Tiny People dataset, which contains 200 real scene images capturing people at

• At the end of each trial, the correct category was revealed and the subjects recorded the accuracy of their category guess.

Tiny People dataset

• We introduce a new Tiny People dataset, which contains 200 real scene images

capturing people at distance (the dataset is used for testing only)

Tiny People Pose

Lukáš Neumann and Andrea Vedaldi

Introduction

1. Andriluka, M., Pishchulin, L., Gehler, P., Bernt, S.: 2d human pose estimation: New benchmark and state of the art analysis, CVPR 2014

2. Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation, ECCV 2016

3. Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2d pose estimation using part affinity fields, CVPR 2017

4. Hu, P., Ramanan, D.: Finding tiny faces, CVPR 2017

Visual Geometry Group, Department of Engineering Science, University of Oxford

• Goal: recognizing pose from tiny images of people, down to 24px high

• Tiny People are important in surveillance, autonomous driving, etc.

• Challenge: low resolution data is extremely ambiguous

• Approach

• Models data uncertainty via probability distributions

• A CNN that outputs a distribution over possible body joints from a small

image of a person

• A probabilistic variant of keypoint localization using dense heat maps

• Evaluation

• Downsampled standard benchmarks (MPII Human Pose and MS-COCO)

• New Tiny People dataset of “real” low-resolution people

[email protected]

• Our method outperforms both standard models

References

This research was supported by

Probabilistic Formulation

• Our model emits a continuous Gaussian distribution for each keypoint u by

estimating Gaussian Mixture Model parameters over a coarse 16 × 16 feature

map Ωd generated over the whole image

• Output: a continuous distribution over possible body joint configurations:

• Training: minimum neg. log-likelihood of the ground truth keypoint locations

MPII Human Pose dataset

• Evaluation on the standard MPII Human Pose dataset1, where we reduced the

resolution to approximate people seen at a distance

• Standard-height (128px): comparable to the state-of-the-art Stacked Hourglass2

and better than Part Affinity Fields3

• Small people (<64px): our method performs significantly better

detection accuracy model surpriseregression error

person height histogram

of human pose datasets

• We also train the Tiny Faces detector4 for person detection and use it as an

input for our method

• We have shown modelling uncertainty explicitly in a deep network can significantly boosts accuracy for ambiguous data, such as low-resolution pose

recognition

Standard Formulation

• The standard approach2 for landmark detection is to output one heatmap per

joint

• Heatmaps are fitted to the ground truth data by a L2 per-pixel regression of a

heuristic Gaussian-like kernel around the ground truth landmark location

• Limitations:

• Heatmaps only convey information about the

location – they may appear to encode uncertainty, but they do not

• The accuracy is bound by the heatmap

resolution

• The model allows for modelling data uncertainty (by increasing the variance of

corresponding Gaussians), as well as for sub-pixel accuracy