1
3D Deep Shape Descriptor Yi Fang 1 , Jin Xie 1 , Guoxian Dai 1 , Meng Wang 1 , Fan Zhu 1 , Tiantian Xu 2 , Edward Wong 2 , 1 Department of Electrical and Computer Engineering, New York University Abu Dhabi 2 Polytechnic School of Engineering, New York University Shape descriptor refers to an informative description that provides a 3D ob- ject with an identification as a member of some category. The development of an effective and efficient 3D shape descriptor poses several technical chal- lenges, including, in particular, the high data complexity of 3D models and their representations, the structural variations, noise, and incompleteness present in 3D models [1, 2]. To address these challenges, we develop a deep shape descriptor (DeepSD), which includes 1) the heat shape descrip- tor (HeatSD) that uses the point-based heat kernel signature (HKS) and 2) the combination of the eigen-shape descriptor (ESD) and the fisher-shape descriptor (FSD). Our deep shape descriptor has high discriminative power that tends to maximizes the inter-class margin while minimizing the intra- class variance. DeepSD is mainly constitute of four components, where the illustration on how these four components are mapped onto a deep neural network is illustrated in Figure 1. The first component is a 3D shape database where a large volume of shapes are stored. The second component is shape fea- ture extraction where two features: heat kernel signature (HKS) and heat shape descriptor (HeatSD), are extracted. The third component is a deep neural network for learning deep shape descriptor. A multi-layer deep neu- ral network is used in our method. A collection of HeatSDs are used in the training of principal component analysis (PCA) and linear discriminant analysis (LDA) to generate the Eigen-shape descriptor (FSD) and Fisher- shape descriptor (ESD) respectively. The fourth component is the target value of DNN where pre-computed ESD and FSD are used as target values in the training the DNN. Figure 1: Pipeline of learning deep shape descriptor. Given input shapes, three steps are included along with the pipeline: 1) Heat kernel signatures are extracted for each shape in the database. Heat shape descriptor are com- puted based on HKS. 2) Heat shape descriptors are fed into two deep neural networks with target values using ESD and FSD, respectively. 3)The deep shape descriptor is formed by concatenating nodes in hidden layer (circled by yellow dash lines). Heat kernel signature: The heat kernel H t (i, j) aggregates heat flow through all possible paths between two vertices on the meshed surface, and a heat kernel signature can be defined as: HKS( p)=(H t 1 ( p, p), H t 2 ( p, p),..., H t n ( p, p)) (1) This is an extended abstract. The full paper is available at the Computer Vision Foundation webpage. Figure 2: Illustration of heat shape descriptor. (A) illustrates the HeatSD for three centaur models undergone isometric transformation. (B) illustrates the HeatSD for three dinosaur models with moderate structural variations. where p denotes a point on the surface, HKS( p) denotes the heat kernel signature at point p, H t ( p, p) denotes the heat kernel value at point p, t n denotes the diffusion time of the nth sample point. Heat shape descriptor: HetaSD is developed using probability distri- bution of HKS values at all vertices and at all scales. At each scale, HeatSD is defined based on the probability distribution of HKS at that scale. HeatSD is a multi-scale shape descriptor, thereby it can provide a complete and local- to-global description of 3D shape. Figure 2 displays 3D objects and their corresponding HeatSDs. Deep shape descriptor: We use the architecture of a many-toone en- coder neural network to develop our encoder for deep shape descriptor. By enforcing the target value to be unique for input HeatSDs from the same group but with structural variations, the deep shape descriptor represented by the neurons in the hidden layer is invariant to within-group structural variations but will discriminate against other groups. We set the target value of the neural network as pre-computed Eigen-shape descriptor and Fisher- shape descriptor for each group. We formulate the objective function of the proposed sparse many-to-one encoder by the square-loss function with sparse constraint on the weights as: argmin W,b 1 2 i, j Y i - h(x j i , W, b) 2 2 + λ 2 kW k 2 F , (2) where L is the number of layers in the deep neural network, W is the weight matrix of the multiple-hidden-layer neural network, b is the bias matrix of the neural network, x j i represents the j-th training sample from the i- th group, h(x j i , W, b) in general is a non-linear mapping from the input x j i to the output. The parameter λ is the weight of the regularizer, and Y i is the target value for the i-th group. [1] Natraj Iyer, Subramaniam Jayanti, Kuiyang Lou, Yagnanarayanan Kalyanaraman, and Karthik Ramani. Three-dimensional shape search- ing: state-of-the-art review and future trends. Computer-Aided Design, 37(5):509 – 530, 2005. [2] Johan WH Tangelder and Remco C Veltkamp. A survey of content based 3d shape retrieval methods. Multimedia tools and applications, 39(3):441–471, 2008.

3D Deep Shape Descriptor · 2015-05-24 · 3D Deep Shape Descriptor Yi Fang1, Jin Xie1, Guoxian Dai1, Meng Wang1, Fan Zhu1, Tiantian Xu2, Edward Wong2, 1Department of Electrical and

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: 3D Deep Shape Descriptor · 2015-05-24 · 3D Deep Shape Descriptor Yi Fang1, Jin Xie1, Guoxian Dai1, Meng Wang1, Fan Zhu1, Tiantian Xu2, Edward Wong2, 1Department of Electrical and

3D Deep Shape Descriptor

Yi Fang1, Jin Xie1, Guoxian Dai1, Meng Wang1, Fan Zhu1, Tiantian Xu2, Edward Wong2,1Department of Electrical and Computer Engineering, New York University Abu Dhabi2Polytechnic School of Engineering, New York University

Shape descriptor refers to an informative description that provides a 3D ob-ject with an identification as a member of some category. The developmentof an effective and efficient 3D shape descriptor poses several technical chal-lenges, including, in particular, the high data complexity of 3D models andtheir representations, the structural variations, noise, and incompletenesspresent in 3D models [1, 2]. To address these challenges, we develop adeep shape descriptor (DeepSD), which includes 1) the heat shape descrip-tor (HeatSD) that uses the point-based heat kernel signature (HKS) and 2)the combination of the eigen-shape descriptor (ESD) and the fisher-shapedescriptor (FSD). Our deep shape descriptor has high discriminative powerthat tends to maximizes the inter-class margin while minimizing the intra-class variance.

DeepSD is mainly constitute of four components, where the illustrationon how these four components are mapped onto a deep neural network isillustrated in Figure 1. The first component is a 3D shape database wherea large volume of shapes are stored. The second component is shape fea-ture extraction where two features: heat kernel signature (HKS) and heatshape descriptor (HeatSD), are extracted. The third component is a deepneural network for learning deep shape descriptor. A multi-layer deep neu-ral network is used in our method. A collection of HeatSDs are used inthe training of principal component analysis (PCA) and linear discriminantanalysis (LDA) to generate the Eigen-shape descriptor (FSD) and Fisher-shape descriptor (ESD) respectively. The fourth component is the targetvalue of DNN where pre-computed ESD and FSD are used as target valuesin the training the DNN.

Figure 1: Pipeline of learning deep shape descriptor. Given input shapes,three steps are included along with the pipeline: 1) Heat kernel signaturesare extracted for each shape in the database. Heat shape descriptor are com-puted based on HKS. 2) Heat shape descriptors are fed into two deep neuralnetworks with target values using ESD and FSD, respectively. 3)The deepshape descriptor is formed by concatenating nodes in hidden layer (circledby yellow dash lines).

Heat kernel signature: The heat kernel Ht(i, j) aggregates heat flowthrough all possible paths between two vertices on the meshed surface, anda heat kernel signature can be defined as:

HKS(p) = (Ht1(p, p),Ht2(p, p), . . . ,Htn(p, p)) (1)

This is an extended abstract. The full paper is available at the Computer Vision Foundationwebpage.

Figure 2: Illustration of heat shape descriptor. (A) illustrates the HeatSD forthree centaur models undergone isometric transformation. (B) illustrates theHeatSD for three dinosaur models with moderate structural variations.

where p denotes a point on the surface, HKS(p) denotes the heat kernelsignature at point p, Ht(p, p) denotes the heat kernel value at point p, tndenotes the diffusion time of the nth sample point.

Heat shape descriptor: HetaSD is developed using probability distri-bution of HKS values at all vertices and at all scales. At each scale, HeatSDis defined based on the probability distribution of HKS at that scale. HeatSDis a multi-scale shape descriptor, thereby it can provide a complete and local-to-global description of 3D shape. Figure 2 displays 3D objects and theircorresponding HeatSDs.

Deep shape descriptor: We use the architecture of a many-toone en-coder neural network to develop our encoder for deep shape descriptor. Byenforcing the target value to be unique for input HeatSDs from the samegroup but with structural variations, the deep shape descriptor representedby the neurons in the hidden layer is invariant to within-group structuralvariations but will discriminate against other groups. We set the target valueof the neural network as pre-computed Eigen-shape descriptor and Fisher-shape descriptor for each group. We formulate the objective function ofthe proposed sparse many-to-one encoder by the square-loss function withsparse constraint on the weights as:

argminW,b12 ∑

i, j

∥∥∥Yi−h(x ji ,W,b)

∥∥∥2

2+

λ

2‖W‖2

F , (2)

where L is the number of layers in the deep neural network, W is the weightmatrix of the multiple-hidden-layer neural network, b is the bias matrixof the neural network, x j

i represents the j-th training sample from the i-th group, h(x j

i ,W,b) in general is a non-linear mapping from the input x ji to

the output. The parameter λ is the weight of the regularizer, and Yi is thetarget value for the i-th group.

[1] Natraj Iyer, Subramaniam Jayanti, Kuiyang Lou, YagnanarayananKalyanaraman, and Karthik Ramani. Three-dimensional shape search-ing: state-of-the-art review and future trends. Computer-Aided Design,37(5):509 – 530, 2005.

[2] Johan WH Tangelder and Remco C Veltkamp. A survey of contentbased 3d shape retrieval methods. Multimedia tools and applications,39(3):441–471, 2008.