48
Matthew Krenik Advisor: Fabrizio Pece Hand Pose Estimation

Hand Pose Estimation - ETH Z

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Hand Pose Estimation - ETH Z

Matthew Krenik Advisor: Fabrizio Pece

Hand Pose Estimation

Page 2: Hand Pose Estimation - ETH Z

§  What is Hand Pose Estimation?

§  Why does it matter?

§  How does it work?

§  What has been done?

2

Agenda

Page 3: Hand Pose Estimation - ETH Z

§  Estimate full Degree of Freedom (DOF) of a hand from depth images

§  This is a tough problem, especially to perform in real time! §  Not to be confused with “hand shape estimation”

3

What is Hand Pose Estimation?

Page 4: Hand Pose Estimation - ETH Z

4

Page 5: Hand Pose Estimation - ETH Z

§  More than just gestures §  Ideal for continuous

input applications §  Links your hand

dexterity into a computer model

§  Will it redefine how we interact with computers??

5

Why Does it Matter?

Page 6: Hand Pose Estimation - ETH Z

6

Gaming

Page 7: Hand Pose Estimation - ETH Z

7

Design / Engineering

Page 8: Hand Pose Estimation - ETH Z

8

Robot Hand Control– Surgery? Industry?

Page 9: Hand Pose Estimation - ETH Z

9

Communication – Sign Language

Page 10: Hand Pose Estimation - ETH Z

§  Its going to take some time to explain

§  Starting from the ground up! §  Decision trees §  Ensemble techniques §  Random forests §  Body Pose estimation §  Hand Pose Estimation

§  Assumption is that everyone has a very basic idea of what machine learning is and does

10

How Does it Work?

Page 11: Hand Pose Estimation - ETH Z

§  Goal: §  Given training data T with entries (𝒙, 𝒚) §  Find a model that estimates 𝒚 for unseen 𝒙 §  This is called prediction

§  Quality Measurement: §  Minimize the probability of model prediction errors on future data

§  What are some models? §  Linear Regression §  Support Vector Machines §  Decision Trees!

11

Machine Learning

Page 12: Hand Pose Estimation - ETH Z

§  Very intuitive §  Each node asks a question

about a feature of the data §  Propagates through the tree

depending on the answer to each question

§  When algorithm gets to the end, the decision tree makes a classification

12

Decision Trees

Page 13: Hand Pose Estimation - ETH Z

§  In what order do we ask the questions (test features)? §  Each possible tree has an amount of entropy §  Test out all possible questions for a node, and choose the one

that reduces the entropy the most (largest information gain)

§  How do nodes make decisions based on the features? §  Same way! §  Choose a decision boundary that gives the largest information

gain

13

How to grow a tree from data?

Page 14: Hand Pose Estimation - ETH Z

14

How to grow a tree from data?

Page 15: Hand Pose Estimation - ETH Z

15

Decision Trees: A Pretty Good Model!

Page 16: Hand Pose Estimation - ETH Z

§  Two competing methodologies: §  Traditional: Build one really good model §  Ensemble: Build many models and average the results

§  Build a ton of “pretty good” models §  Combine them into one “pretty awesome” prediction! §  Important for individual models to not be correlated,

otherwise there is a strong tendency to overfit §  So we add randomness!

16

Ensemble Learning

Page 17: Hand Pose Estimation - ETH Z

§  Bootstrap Aggregation (Bagging) §  Take a random subsample from the training set T, with replacement §  Train each model on a different subsample §  Classification is the majority vote; Regression is the average

§  Random Forests: Multiple, randomized decision trees 1.  Bagging 2.  Randomized Node Optimization: choose random set of questions

§  Number of questions affects the correlation of the trees 3.  Decision boundary of the decision trees: conic, linear, etc. 4.  Depth of the component decision trees

§  More depth means there will be more overfitting 17

Ensemble Techniques

Page 18: Hand Pose Estimation - ETH Z

18

Example: Different Trees

Page 19: Hand Pose Estimation - ETH Z

19

Example: Different Trees

Page 20: Hand Pose Estimation - ETH Z

20

Example: Different Trees

Page 21: Hand Pose Estimation - ETH Z

21

Example: Random Decision Forest

Page 22: Hand Pose Estimation - ETH Z

22

Example: Multi-class Decision Trees

Page 23: Hand Pose Estimation - ETH Z

23

Example: Comparison to SVM Model

Page 24: Hand Pose Estimation - ETH Z

24

A quick look at body pose estimation

§  Body Pose Estimation Pipeline §  Technology found in consumer devices, like the Kinect §  Very similar to hand pose estimation

Page 25: Hand Pose Estimation - ETH Z

25

Hand Pose Estimation Pipeline

Page 26: Hand Pose Estimation - ETH Z

§  Hand is much smaller than the body, but still has 22 DOF §  Self occlusion is very common and severe §  Can be rotated in any direction (body is always upright) §  Real depth data can be difficult to label

26

What makes Hand Pose tough?

Page 27: Hand Pose Estimation - ETH Z

§  Restrict the viewing area of the hand §  One Advantage: Hands are fairly invariant among humans §  Train with synthetic data, rendered from 3D models

27

Some ideas..

Page 28: Hand Pose Estimation - ETH Z

§  Use 3D hand models to generate data

§  Train the Random Decision Forests using this data

28

Train based on Synthetic Data

Page 29: Hand Pose Estimation - ETH Z

29

Hand Pose Estimation Pipeline

Page 30: Hand Pose Estimation - ETH Z

30

Pixel Classification

One Tree Two Trees Three Trees

Page 31: Hand Pose Estimation - ETH Z

§  Algorithm used to determine where the joints are

§  Each pixel is given a weighted Gaussian kernel §  Weight is determined by class probability times depth §  Gradient ascent from many points finds the local maxima §  Highest local maxima determines the joint §  Threshold the scores to filter out non-visible joints

31

Mean shift local mode finding

Page 32: Hand Pose Estimation - ETH Z

32

Joint Determination

Page 33: Hand Pose Estimation - ETH Z

Strengths §  Very fast §  Robust to fast movements and noise §  No initialization needed §  Can run on a GPU for interface applications or games

Issues §  Training must be done offline §  Number of images ~1-10M, takes 25-250 GB of data §  Number of operations is huge even with simple algorithm

33

Hand Pose Estimation Algorithm

Page 34: Hand Pose Estimation - ETH Z

§  Difficult to generate every possible hand pose §  Dataset size is huge! §  Hard to capture the variation in the data set §  More variation à deeper trees à more RAM/memory

§  Solution: Divide into sub problems and solve with separate RDFs

§  Lower variation à lower complexity à less RAM/memory

34

Limitations of Single Layer RDF

Page 35: Hand Pose Estimation - ETH Z

35

Page 36: Hand Pose Estimation - ETH Z

36

Multi-layered RDFs for Hand Pose

Page 37: Hand Pose Estimation - ETH Z

§  Local Expert Network §  Hand Shape Classification gives each pixel a label §  Train local expert forests for each pixel label §  Expert forest depends on pixel label; each pixel is classified

§  Global Expert Network §  Hand Shape Classification gives each pixel a label §  The hand shape is determined by pixel voting §  Train global expert forests for each pixel label §  Expert forest depends on hand shape label; each pixel is classified

37

Two Structures of Multi-layer RDFs

Page 38: Hand Pose Estimation - ETH Z

38

Local Expert Network

Page 39: Hand Pose Estimation - ETH Z

39

Global Expert Network

Page 40: Hand Pose Estimation - ETH Z

§  Given the same data as before (hand shape not given)

1.  Cluster the data 2.  Train Hand Shape Classifier based on all clusters 3.  Train each Pixel Classifier based on a specific cluster

40

Training a Multi-layer RDF

Page 41: Hand Pose Estimation - ETH Z

§  Global Expert Networks average class distributions à More robust to noise

§  Local Expert Networks use info from each pixel à

Better at generalizing unseen data

41

Which is better? GEN or LEN

Page 42: Hand Pose Estimation - ETH Z

42

Test: American Sign Language

Page 43: Hand Pose Estimation - ETH Z

§  Huge improvement over single-layer RDFs

43

Results

Page 44: Hand Pose Estimation - ETH Z

§  Remaining errors are concentrated on very similar poses

44

Results

Page 45: Hand Pose Estimation - ETH Z

§  What is Hand Pose Estimation? Determine the joint positions to fix all DOFs of the hand

§  Why does it matter? Continuous Input Applications

§  How does it work? Randomized Decision Forests

§  What has been done? Add multiple layers for increased performance.

45

Summary

Page 46: Hand Pose Estimation - ETH Z

§  [1] Keskin- Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests

§  [2] Thompson-Real Time Continuous Pose Recovery of Human Hands Using Convolutional Networks

§  [3] Qian- Realtime and Robust Hand Tracking from Depth §  [4] Tang- Latent Regression Forest: Structured Estimation of 3D Articulated

Hand Posture §  [5] Oikonomidis - Evolutionary Quasi-random Search for Hand Articulations

Tracking §  [6] Wang - 6D Hands: Markerless Hand Tracking for Computer Aided Design §  [7] Hilliges - Advanced topics in Gesture Recognition Part II

46

References

Page 47: Hand Pose Estimation - ETH Z

47

Questions?

Page 48: Hand Pose Estimation - ETH Z

§  Hand shape is just shape information “fist”, “flat”, etc. §  Hand pose is specific joint angles for every DOF

§  With hand pose, can use SVM to determine hand shape very robustly

48

Appendix: Getting Hand Shape from Hand Pose