Semi-Supervised Hierarchical Models for 3D Human Pose Reconstruction


Atul Kanaujia, CBIM, Rutgers

Cristian Sminchisescu, TTI-C

Dimitris Metaxas, CBIM, Rutgers

3D Human Pose Inference Difficulties
Towards automatic monocular methods

• Background clutter
• Geometric transforms
  – Scale & viewpoint change
• Illumination changes
• Fast motions
• (Self-)occlusion
• Variability in human body proportions

Standard Discriminative Approach

• Train a structured model to predict 3D human poses given image descriptor inputs
  – A multi-valued predictor is necessary to represent multiple plausible pose interpretations
• Train using images and corresponding 3D human poses
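A minimal sketch of what a multi-valued predictor returns for one image descriptor, assuming softmax gates and linear experts. Both choices, and the names `predict_modes`, `gate_weights` and `expert_weights`, are illustrative assumptions rather than details taken from the slides.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array of gate scores."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def predict_modes(r, gate_weights, expert_weights):
    """Return one candidate pose per expert, ranked by gate confidence.

    r              : (D,) image descriptor
    gate_weights   : (M, D) hypothetical linear gate parameters
    expert_weights : (M, P, D) hypothetical linear expert parameters
    """
    gates = softmax(gate_weights @ r)   # (M,) confidence of each expert
    poses = expert_weights @ r          # (M, P) one pose hypothesis per expert
    order = np.argsort(-gates)          # most confident expert first
    return gates[order], poses[order]
```

The point of the design is that an ambiguous descriptor yields several ranked pose hypotheses instead of a single averaged pose.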

Problems… we address

• Lack of typical training data (lab, quasi-real)
• Image descriptors are unstable w.r.t. geometric deformations and background clutter

Figure panels: Clean, Quasi-real (QReal), Real

Predictor cannot generalize!

This Talk

1. Learning hierarchical image descriptors
  – Multi-level / coarse-to-fine encodings stable w.r.t. deformation and misalignment in the training set
  – Metrics for noise suppression (clutter removal)
2. Semi-supervised generalization to multi-valued prediction

Diagram labels: Hierarchical Image Encodings, Distance Metric Learning, Multi-Valued Semi-supervised Learning

Why do we need better features?

Global histograms (bag of features) are robust to local deformation but sensitive to background clutter

Regular grid descriptors can be made robust to clutter but are sensitive to training set misalignments & local deformations

Hierarchical Image Descriptors

Coarse-to-fine encodings designed to represent multiple degrees of selectivity & invariance

Progressively relax rigid local spatial encodings to weaker models of geometry accumulated over larger regions
– Layered encodings (e.g. spatial pyramid) or bottom-up, hierarchical aggregation based on successive template matching + max pooling (e.g. HMAX)

Hierarchical Image Descriptors

HMAX (Poggio et al, 2002-06)

S1: 16 Gabor filter responses

Select patches / object parts to match against results from previous layer

Ω = MAX

Encoding
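The filter-then-MAX idea on this slide can be sketched in a few lines. This is a simplified, single-orientation stage (Gabor filtering followed by local max pooling), not the full HMAX model; the function names and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Zero-mean, unit-norm Gabor filter; `size` is assumed to be odd."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
    g = g * np.cos(2 * np.pi * xr / wavelength)
    g -= g.mean()
    return g / np.linalg.norm(g)

def filter_valid(image, kernel):
    """Absolute filter response at every position where the kernel fits."""
    windows = np.lib.stride_tricks.sliding_window_view(image, kernel.shape)
    return np.abs(np.einsum("ijkl,kl->ij", windows, kernel))

def max_pool(response, block):
    """MAX over non-overlapping block x block neighbourhoods."""
    h = response.shape[0] // block * block
    w = response.shape[1] // block * block
    r = response[:h, :w].reshape(h // block, block, w // block, block)
    return r.max(axis=(1, 3))
```

Taking the maximum over a neighbourhood keeps the strongest response while discarding its exact position, which is what buys tolerance to small shifts and deformations.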

Multilevel Image Descriptors

SIFT Descriptor, followed by Vector quantization

Votes accumulation in a spatial region

Concatenate to descriptor

Spatial Pyramid (Lazebnik et al, 2006)
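The quantize, accumulate and concatenate steps above can be sketched as follows, assuming local descriptors have already been vector-quantized into visual-word indices. The function name and arguments are hypothetical, and the per-level weighting used in the published spatial pyramid kernel is left out for brevity.

```python
import numpy as np

def spatial_pyramid(word_ids, xs, ys, width, height, vocab_size, levels=3):
    """Concatenate per-cell visual-word histograms over 1x1, 2x2, 4x4, ... grids."""
    word_ids = np.asarray(word_ids)
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    blocks = []
    for level in range(levels):
        cells = 2 ** level
        # Grid cell of every feature, clipped so border points stay in the last cell.
        cx = np.minimum((xs / width * cells).astype(int), cells - 1)
        cy = np.minimum((ys / height * cells).astype(int), cells - 1)
        # One histogram bin per (cell, visual word) pair.
        flat = (cy * cells + cx) * vocab_size + word_ids
        hist = np.bincount(flat, minlength=cells * cells * vocab_size)
        blocks.append(hist.astype(float))
    return np.concatenate(blocks)
```

The coarsest level is a global bag of features, while finer levels add progressively stricter spatial layout, so the concatenated vector mixes invariance and selectivity.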

Dealing with Background Clutter

Multi-level encodings are still perturbed by background

Need to suppress noise

Feature selection based on e.g. sparse linear regression tends to be ineffective for global descriptors

Learn a Mahalanobis distance that maximizes similarity within chunklets = sets of images of people in similar poses, but differently proportioned and placed on different backgrounds
– Relevant Component Analysis (Bar-Hillel et al., 2003)

Chunklet 1

Chunklet 2
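A minimal sketch of the core Relevant Component Analysis step, assuming each chunklet is given as an array of descriptors. The pooled within-chunklet covariance is estimated and then whitened, which down-weights directions that vary inside a chunklet (e.g. background). The `ridge` term is an added assumption for numerical stability, and the function name is hypothetical.

```python
import numpy as np

def rca_transform(chunklets, ridge=1e-6):
    """Return the whitening matrix A = C^(-1/2) of the within-chunklet covariance C.

    chunklets : list of (n_i, D) arrays, one per chunklet.
    Euclidean distance after x -> A @ x equals the Mahalanobis distance under C^(-1).
    """
    dim = chunklets[0].shape[1]
    scatter = np.zeros((dim, dim))
    total = 0
    for chunk in chunklets:
        centered = chunk - chunk.mean(axis=0)   # remove the chunklet mean
        scatter += centered.T @ centered
        total += len(chunk)
    cov = scatter / total + ridge * np.eye(dim)  # ridge keeps cov invertible
    eigval, eigvec = np.linalg.eigh(cov)
    return eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
```

For a ~1500d descriptor the covariance is large, so in practice a dimensionality reduction step would likely precede this; that choice is outside what the slides state.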

Suppressing Background Clutter

The distance between the learned descriptors computed on different backgrounds is diminished

Clean Quasi-real (QReal) Real

This talk … Flexible training

1. Learning hierarchical image descriptors
  – Multi-level / coarse-to-fine encodings stable w.r.t. deformation and misalignment in the training set
  – Metrics for noise suppression & clutter removal

Diagram labels: Hierarchical Image Encodings, Multi-Valued Semi-supervised Learning

x – 3D Human Pose

r – Image Descriptor

Semi-supervised Multi-valued Prediction

Manifold Assumption
If two image descriptors are close in their intrinsic geometry (e.g. encoded by the graph Laplacian), their 3D outputs should vary smoothly

Figure labels: x1, x2, x3
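The manifold assumption is commonly expressed as a graph-Laplacian penalty on the predicted outputs. The sketch below builds a k-nearest-neighbour graph over descriptors r and measures how much poses x vary along its edges; the Gaussian edge weights, `k`, `bandwidth` and the function names are assumptions for illustration.

```python
import numpy as np

def knn_graph_laplacian(R, k=5, bandwidth=1.0):
    """Unnormalized Laplacian L = D - W of a symmetric kNN graph over descriptors R (n, D)."""
    sq = ((R[:, None, :] - R[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq / (2 * bandwidth ** 2))
    np.fill_diagonal(W, 0.0)
    # Keep the k strongest edges of each node, then symmetrize the edge set.
    idx = np.argsort(-W, axis=1)[:, :k]
    mask = np.zeros_like(W, dtype=bool)
    mask[np.arange(len(R))[:, None], idx] = True
    W = np.where(mask | mask.T, W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def smoothness_penalty(L, X):
    """trace(X^T L X) = 0.5 * sum_ij W_ij * ||x_i - x_j||^2 for poses X (n, P)."""
    return np.trace(X.T @ L @ X)
```

Because the graph is built from descriptors alone, unlabeled images contribute edges, which is how they can influence the learned predictor.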

Semi-supervised Multi-valued Prediction

Expert Ranking Assumption (Mixture of Experts)
If two image descriptors are close in their intrinsic image geometry (graph Laplacian), their 3D outputs should be smooth only if predicted by the same expert (prevents smoothing across partitions)
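One way to picture the expert ranking assumption is to scale every graph edge by the probability that both of its endpoints are handled by the same expert. This is an illustrative reading, not the formulation used in the talk; `expert_aware_weights`, `pairwise_penalty` and the gate matrix layout are assumptions.

```python
import numpy as np

def expert_aware_weights(W, gates):
    """Down-weight edges whose endpoints are assigned to different experts.

    W     : (n, n) symmetric graph weights between image descriptors
    gates : (n, M) per-sample expert responsibilities, rows summing to 1
    """
    same_expert = gates @ gates.T   # probability that i and j share an expert
    return W * same_expert

def pairwise_penalty(W, X):
    """0.5 * sum_ij W_ij * ||x_i - x_j||^2 for outputs X (n, P)."""
    diff = X[:, None, :] - X[None, :, :]
    return 0.5 * (W * (diff ** 2).sum(axis=2)).sum()
```

With hard gate assignments, two nearby descriptors that map to different pose interpretations no longer pull each other's outputs together.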

Experiments

Multi-level encodings
– 5 hierarchical descriptors
– ~1500d image descriptor

Dataset of human poses
– 56d human joint angle state vector
– 5 motions obtained with motion capture
  • Walk, Pantomime, Bending Pickup, Dancing and Running
– 3247 x 3 images + 1000 unlabeled images

Multi-valued predictor uses 5 experts

Prediction Accuracy: Multilevel vs. Global Descriptors

Multilevel / hierarchical descriptors perform significantly better than global histograms or single-layer (fine) grids of local descriptors

Prediction Accuracy: Before / after Metric Learning

Metric learning reduces the prediction error for global histogram descriptors

Run Lola Run Movie: Automatic 3D Pose Reconstruction

Integrated scanning detection window + 3D prediction
Notice: scale change, occlusions (trees), self-occlusion, fast motion

Run Lola Run Movie: 3D Pose Reconstruction

Integrated scanning detection window + 3D prediction
Notice: scale change, fast motion, transparencies

We have argued for

1. Learning of hierarchical image descriptors for better generalization under shape variability and background clutter

Ongoing work
– Jointly learn the features and the predictor
– Scaling to large datasets (> 500K samples)
