Shape-Based Human Detection and Segmentation via Hierarchical Part-Template Matching

Zhe Lin, Member, IEEE, and Larry S. Davis, Fellow, IEEE

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, APRIL 2010

Overview

• Introduction
• Previous Work
• Proposed Approach
  – Hierarchical Part-Template Matching
  – Pose-Adaptive Descriptors
  – Combining With Calibration and Background Subtraction
• Experiment Result
• Conclusion


Introduction

• Robust human tracking and identification depend heavily on reliable human detection and segmentation.

• Detection remains challenging because of variation in body posture, illumination, occlusion, and viewpoint.

• Goal: Develop a robust and efficient approach to human detection and segmentation.

• Method: Shape-based, part-template matching


Previous Work

• Shape feature extraction schemes
  – Model human shapes globally [1], [2], [3]
  – Model shapes using sparse local features [9], [10], [11]

• Learning perspective
  – Generative approach: tree-based data structure [6], [7], [8]
  – Discriminative approach: SVMs as the test classifiers [3]

• Surveillance scenarios
  – Motion blob information [35], [36]


Proposed Approach

• A hierarchical part-template matching approach combined with discriminative learning.


Hierarchical Part-Template Matching

• Generating the part-template tree model
  – Synthesizing global shape models
  – Generating parts by decomposition
  – Constructing an initial tree model using parts

• Learning the part-template tree

• Hierarchical part-template matching

Synthesizing Global Shape Models

• The articulation of the human body is analyzed in terms of six regions.
  – Head, torso, pair of upper legs, pair of lower legs
  – The parameters of these regions are quantized into {3,2,3,3,3,3} levels.
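To make the size of the synthesized model set concrete, the sketch below enumerates every combination of the quantized parameters. It assumes the six counts in {3,2,3,3,3,3} apply in the order head, torso, two upper legs, two lower legs, and that the parameters combine independently; the region labels and the function name `enumerate_shape_models` are illustrative, not taken from the paper.

```python
from itertools import product

# Quantization levels per articulation parameter, in the order listed on the slide.
# The key names are illustrative labels (assumption), not the paper's notation.
LEVELS = {
    "head": 3,
    "torso": 2,
    "upper_leg_a": 3,
    "upper_leg_b": 3,
    "lower_leg_a": 3,
    "lower_leg_b": 3,
}


def enumerate_shape_models(levels=LEVELS):
    """Return every combination of quantized parameter indices as a tuple."""
    ranges = [range(n) for n in levels.values()]
    return list(product(*ranges))


models = enumerate_shape_models()
print(len(models))  # 3 * 2 * 3 * 3 * 3 * 3 = 486
```

Under these assumptions each tuple identifies one candidate global shape model, which is why decomposing the models into shared parts keeps the matching cost manageable.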

Generating Parts by Decomposition

• Binarize (a) to obtain (b), then extract the boundaries of the silhouettes to get (c).

• Silhouettes are decomposed into three parts (head-torso, upper legs, and lower legs).

• The silhouette parameters are denoted by θj and consist of an index and a location.

Constructing an Initial Tree Model Using Parts

• A part-template tree is constructed by placing each decomposed part region or fragment into a tree.

• The tree has four layers, L0 to L3, corresponding to the root, head-torso, upper legs, and lower legs, respectively.

• The tree consists of 186 part-templates (6 head-torso models, 18 upper-leg models, and 162 lower-leg models).

• A much larger set of part-templates improves performance only slightly.

• A fast hierarchical shape matching scheme is applied.

Learning the Part-Template Tree

• The tree doesn’t contain any prior statistics from real human silhouettes.

• The learning is performed by matching the tree to a set of real human silhouette images.

• The goal is to explicitly estimate branching probability distributions (conditional probability distributions).

Learning the Part-Template Tree

• Learning method:
  – Each training silhouette is passed through the tree from the root to estimate the matching score and find the optimal path.
  – Based on the set of paths, a branching probability distribution is estimated for each node.
  – Each node contains a binary image of the part-template, its sample point coordinates, and a branching probability.
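The path-counting step above can be sketched as follows. This is a minimal illustration, assuming each training silhouette has already been reduced to its optimal root-to-leaf path and that the branching distribution is the plain relative frequency of each child; the node names and the function `estimate_branching_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_branching_probabilities(paths):
    """Estimate P(child | parent) from optimal paths of training silhouettes.

    Each path is a sequence of node ids from the root down to a leaf.
    """
    counts = defaultdict(Counter)
    for path in paths:
        # Count every parent -> child transition taken along the path.
        for parent, child in zip(path, path[1:]):
            counts[parent][child] += 1

    probs = {}
    for parent, child_counts in counts.items():
        total = sum(child_counts.values())
        probs[parent] = {child: n / total for child, n in child_counts.items()}
    return probs


# Hypothetical optimal paths for four training silhouettes.
paths = [
    ("root", "ht0", "ul1", "ll4"),
    ("root", "ht0", "ul2", "ll7"),
    ("root", "ht1", "ul3", "ll9"),
    ("root", "ht0", "ul1", "ll5"),
]
probs = estimate_branching_probabilities(paths)
print(probs["root"])
```

No smoothing is applied here, so a child that no training path visits simply receives no entry; whether the original method smooths these estimates is not stated on the slide.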

Hierarchical Part-Template Matching

• Similar to the model used for tree learning.

• The overall matching score for a detection window is modeled as the sum of the scores of all nodes along the path.

• The score of a node is the product of its part-template matching score and the probability of the node.

• The matching method is similar to Chamfer matching [6].
  – The matching score of a sample point on the contour is measured by edge-orientation matching to find the optimal human pose.

[6] D.M. Gavrila and V. Philomin, “Real-Time Object Detection for SMART Vehicles,” Proc. IEEE
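The scoring rule described above (sum over the path of matching score times node probability) can be sketched as a search over the tree. This is a sketch under stated assumptions: the search is exhaustive, the root contributes no score, and `match` is a stand-in callable for the edge-orientation matching of a part-template against the detection window. The `Node` class and `best_path` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    name: str
    prob: float = 1.0  # branching probability of reaching this node
    children: List["Node"] = field(default_factory=list)


def best_path(node: Node, match: Callable[[str], float],
              is_root: bool = True) -> Tuple[float, List[str]]:
    """Return (score, path) for the root-to-leaf path with the largest summed score.

    Each non-root node contributes match(name) * prob; the root is a placeholder.
    """
    own = 0.0 if is_root else match(node.name) * node.prob
    if not node.children:
        return own, [node.name]
    child_score, child_path = max(
        (best_path(child, match, is_root=False) for child in node.children),
        key=lambda result: result[0],
    )
    return own + child_score, [node.name] + child_path


# Hypothetical two-layer tree and matching scores for one detection window.
tree = Node("root", children=[
    Node("ht0", 0.75, [Node("ul0", 0.5), Node("ul1", 0.5)]),
    Node("ht1", 0.25, [Node("ul2", 1.0)]),
])
scores = {"ht0": 0.8, "ht1": 0.9, "ul0": 0.4, "ul1": 0.6, "ul2": 0.7}
score, path = best_path(tree, scores.__getitem__)
print(path, round(score, 3))
```

The returned path identifies one part-template per layer, which together describe the estimated pose for the window.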


Pose-Adaptive Descriptors

• Introduces a pose-adaptive feature computation method for detecting humans in images using an SVM.

• Candidate detection windows are obtained in a way similar to the HOG descriptor approach [3].

• Given a candidate detection window, hierarchical part-template matching is performed to estimate the optimal pose.

• After the pose is estimated, the block features closest to each pose contour point are collected.

[3] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” Proc. IEEE Conf.

Low-Level Features

• Similar to [3].

• Given an image, calculate gradient magnitudes |G| and edge orientations O.

• Quantize the image into 8x8 nonoverlapping cells, each represented by a histogram of edge orientations.
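The low-level feature computation can be sketched with NumPy. The sketch assumes cells of 8x8 pixels, nine unsigned orientation bins of width π/9, magnitude-weighted voting, and the gradient direction used as the orientation; these choices follow common HOG practice and are assumptions rather than details confirmed by the slides. The function name `cell_orientation_histograms` is hypothetical.

```python
import numpy as np


def cell_orientation_histograms(image, cell=8, bins=9):
    """Return an array of shape (rows, cols, bins) of magnitude-weighted histograms."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)            # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)           # |G|
    orientation = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation O in [0, pi)
    # Bin index per pixel; the clip guards against rounding up to exactly pi.
    bin_index = np.minimum((orientation / (np.pi / bins)).astype(int), bins - 1)

    rows, cols = image.shape[0] // cell, image.shape[1] // cell
    hist = np.zeros((rows, cols, bins))
    for r in range(rows):
        for c in range(cols):
            window = (slice(r * cell, (r + 1) * cell),
                      slice(c * cell, (c + 1) * cell))
            # Each pixel votes with its gradient magnitude into its orientation bin.
            hist[r, c] = np.bincount(
                bin_index[window].ravel(),
                weights=magnitude[window].ravel(),
                minlength=bins,
            )
    return hist


# A horizontal intensity ramp: every pixel has gradient (1, 0), so all votes land in bin 0.
ramp = np.tile(np.arange(16.0), (16, 1))
h = cell_orientation_histograms(ramp)
print(h.shape)
```

Pixels outside a whole number of cells are ignored in this sketch; a production implementation would typically pad the image or interpolate votes between neighboring bins and cells.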

Pose Inference on The Low-Level Features

• An optimal tree path is estimated based on the matching score.

• Within the matching score, the part-template score is measured as an average of gradient magnitudes.

• The matching score is given by Eq. (1), where B(t) = [O(t)/(π/9)] and h is the orientation histogram.

• The average score of the part-template is given by Eq. (2).
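Equations (1) and (2) are not reproduced in these slides, so the sketch below covers only the two ingredients that are stated: the orientation bin index B(t) and the averaging of per-point scores over a part-template. It assumes the brackets in B(t) denote rounding down and that orientations lie in [0, π); both function names are hypothetical, and the per-point scores are taken as given rather than computed from Eq. (1).

```python
import math


def orientation_bin(theta, n_bins=9):
    """B(t): index of the (pi / n_bins)-wide bin containing orientation theta in [0, pi)."""
    # The clip keeps a value that rounds to exactly pi inside the last bin.
    return min(int(theta // (math.pi / n_bins)), n_bins - 1)


def average_template_score(point_scores):
    """Average of the per-sample-point scores of one part-template."""
    return sum(point_scores) / len(point_scores)


print(orientation_bin(math.pi / 2))            # pi/2 falls in bin 4 (covers 4*pi/9 to 5*pi/9)
print(average_template_score([0.2, 0.4, 0.6]))
```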

Representation Using Pose-Adaptive Descriptors

• The global shape models are represented as a set of boundary points with corresponding edge orientations.


Scene-to-Camera Calibration

• To obtain a mapping between head points and foot points in the image, estimate the homography between the head plane and the foot plane in the image.

• The head point is then obtained as ph = f(pf), where pf is an arbitrary foot point.
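A mapping of this kind can be sketched by applying a 3x3 homography in homogeneous coordinates. The matrix values below are a hypothetical example (a pure vertical shift), not a calibration result from the paper, and the function name `foot_to_head` is an assumption.

```python
import numpy as np


def foot_to_head(H, foot_point):
    """Map an image foot point to its head point using a 3x3 homography H."""
    x, y = foot_point
    # Lift to homogeneous coordinates, apply H, then divide by the last component.
    u, v, w = np.asarray(H, dtype=float) @ np.array([x, y, 1.0])
    return np.array([u / w, v / w])


# Hypothetical homography: the head lies 100 pixels above the foot everywhere.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, -100.0],
              [0.0, 0.0, 1.0]])
head = foot_to_head(H, (320.0, 400.0))
print(head)  # head point at (320, 300)
```

In a real scene the homography has a nontrivial last row, so the foot-to-head offset shrinks for people farther from the camera; the division by `w` handles that case.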

Combining With Background Subtraction

• Find foot regions Rfoot = {x | ϒx ≥ ξ}.

• Use part-template matching to find regions that may be legs.

• Given the estimated human vertical axis vx and an adaptive rectangular window W(x,(w0,h0)), obtain the human detection.

• Obtain the human segmentation.
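The foot-region test can be illustrated on a binary foreground mask. The slides do not define ϒx, so this sketch assumes it is the fraction of foreground pixels inside a window standing on pixel x. It also replaces the adaptive window aligned with the vertical axis vx by a fixed, axis-aligned window of size (w0, h0). The function name `foot_candidates` is hypothetical.

```python
import numpy as np


def foot_candidates(foreground, window_size, xi):
    """Return (row, col) pixels whose window foreground fraction is at least xi.

    The window of size (w0, h0) stands on the pixel: its bottom row is the
    pixel's row and it is horizontally centered on the pixel's column.
    """
    w0, h0 = window_size
    fg = np.asarray(foreground, dtype=float)
    rows, cols = fg.shape
    left = w0 // 2
    hits = []
    for r in range(h0 - 1, rows):
        for c in range(left, cols - (w0 - 1) // 2):
            window = fg[r - h0 + 1:r + 1, c - left:c - left + w0]
            if window.mean() >= xi:  # assumed meaning of the test against xi
                hits.append((r, c))
    return hits


# Hypothetical 6x6 mask with a 3x3 foreground blob.
mask = np.zeros((6, 6), dtype=int)
mask[2:5, 1:4] = 1
hits = foot_candidates(mask, window_size=(3, 3), xi=1.0)
print(hits)
```

Lowering `xi` admits windows that are only partly covered by foreground, which is one way such a test could tolerate imperfect background subtraction.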

Combining With Calibration and Background Subtraction


Experiment Result

• Results of the human detector are presented on two public pedestrian data sets (INRIA and MIT-CBCL).

• Results of the multiple occluded human detector are presented on three crowded image and video data sets.

• The approach is compared with other approaches using DET curves.

Experiment of Detection Result


• Better performance than HOG-SVM.

• Not only detects humans but also segments human poses.

• Can be further improved because it can be extended to cover more poses or articulations.

• Successfully detects difficult poses that the HOG-based detector missed.



Experiment of Segmentation Result

• The pose model and the probabilistic hierarchical part-template matching algorithm give very accurate segmentation on the MIT-CBCL and INRIA data sets.

Experiment Without Subtraction

Experiment With Subtraction

• Data sets
  – CAVIAR benchmark data set
  – Munich Airport data set collected by Siemens Corporate Research

• Good results are obtained even with poor and inaccurate background subtraction.




Conclusion

• A hierarchical part-template matching approach is employed to match human shapes with images, detecting and segmenting humans simultaneously.

• Many misdetections are due to pose estimation failures.

• Future work
  – Investigate adding color and texture statistics to the local contextual descriptor to improve detection and segmentation performance.
