


Deep AutoEncoder-based Lossy Geometry Compression for Point Clouds

Wei Yan 1, Yiting Shao 2, Shan Liu 3, Thomas H. Li 4, Zhu Li 5, Ge Li 6,*

1,2,6 School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School
3 Tencent America
4 Advanced Institute of Information Technology, Peking University
5 School of Electronic and Computer Engineering, Peking University Shenzhen Graduate School

1 [email protected], 6,* Corresponding Author: [email protected]

Abstract

Point cloud is a fundamental 3D representation that is widely used in real-world applications such as autonomous driving. As a newly developed media format characterized by complexity and irregularity, the point cloud creates a need for compression algorithms that are more flexible than existing codecs. Recently, autoencoders (AEs) have shown their effectiveness in many visual analysis tasks as well as in image compression, which inspires us to employ them in point cloud compression. In this paper, we propose a general autoencoder-based architecture for lossy geometry point cloud compression. To the best of our knowledge, it is the first autoencoder-based geometry compression codec that directly takes point clouds as input rather than voxel grids or collections of images. Compared with handcrafted codecs, this approach adapts much more quickly to previously unseen media contents and media formats while achieving competitive performance. Our architecture consists of a pointnet-based encoder, a uniform quantizer, an entropy estimation block and a nonlinear synthesis transformation module. In lossy geometry compression of point clouds, results show that the proposed method outperforms the test model for categories 1 and 3 (TMC13) published by the MPEG-3DG group at its 125th meeting: on average, a 73.15% BD-rate gain is achieved.

1. Introduction

Thanks to recent developments in 3D sensing technology, the point cloud has become a useful representation of holograms that enables free-viewpoint viewing. It has been used in many fields such as Virtual/Augmented/Mixed Reality (VR/AR/MR), smart city, robotics and automated driving[24]. Point cloud compression has become an increasingly important technique for efficiently processing and transmitting this type of data, and it has attracted much attention from researchers as well as the MPEG Point Cloud Compression (PCC) group[24]. Geometry compression and attribute compression are two fundamental problems of static point cloud compression. Geometry compression aims at compressing the point locations; attribute compression targets reducing the redundancy among points' attribute values, given the point locations. This paper focuses on geometry compression.

A point cloud is a set of unordered points that are irregularly distributed in Euclidean space. Because there is no uniform grid structure like that of 2D pictures, traditional image and video compression schemes cannot work effectively, and in recent years many researchers have been devoted to developing compression methods for point clouds. Octrees[12][18][23] are commonly used to compress the geometry information of point clouds and have been developed for both intra- and inter-frame coding. In the MPEG PCC group, point cloud compression is divided into two profiles: a video-coding-based method named V-PCC and a geometry-based method named G-PCC. All these methods are carefully designed by human experts who apply various heuristics to reduce the amount of information that needs to be preserved and to transform the resulting code in a way that is amenable to lossless compression. However, when designing a point cloud codec, human experts usually focus on a specific type of point cloud and tend to make assumptions about its features because of the diversity of point clouds. For example, an early version of the G-PCC reference software, TMC13, was divided into two parts, one for compressing point clouds belonging to category 1 and the other for point clouds belonging to category 3, which shows how difficult it is to build a universal point cloud codec. Therefore, given a particular type of point cloud, designing a codec that adapts quickly to its characteristics and achieves better compression efficiency is a problem worth exploring.

In recent years, 3D machine learning has made great progress in high-level vision tasks such as classification and detection[6][22][30]. A natural question is whether we can employ this useful class of methods to further develop the point cloud codec, especially for types of point clouds for which we do not have a carefully designed codec. Usually, the design of a new point cloud codec can take years, but a compression framework based on neural networks may be able to adapt much more quickly to those niche tasks.


Figure 1. Compression architecture. (Diagram: an n×3 input point cloud passes through the sampling layer to m×3 points, a 5-layer shared perceptron to m×k features, and max pooling to a 1×k latent code; the entropy bottleneck layer produces the 1×k quantized code, which the decoder maps back to an m×3 output point cloud. The rate-distortion loss combines the estimated rate with the Chamfer distance between input and output.)


In this work, we consider point cloud compression as an analysis/synthesis problem with a bottleneck in the middle. A number of works aim to teach neural networks to discover compressive representations. Achlioptas et al.[1] proposed an end-to-end deep auto-encoder that directly takes point clouds as input. Yang et al.[29] further proposed a graph-based encoder and a folding-based decoder. These auto-encoders are able to extract compressive representations from point clouds, as measured by transfer classification accuracy. These works motivate us to develop a novel autoencoder-based lossy geometry point cloud codec. Our proposed architecture consists of four modules: a pointnet-based encoder, a uniform quantizer, an entropy estimation block and a nonlinear synthesis transformation module. To the best of our knowledge, it is the first autoencoder-based geometry compression codec that directly takes point clouds as input rather than voxel grids or collections of images.

2. Related works

Several approaches for point cloud geometry compression have been proposed in the literature. Sparse Voxel Octrees (SVOs), also called octrees[12][18], are commonly used to compress the geometry information of point clouds[23][10][8][19][13]. R. Schnabel et al.[23] first used octrees in point cloud compression; their work predicts occupancy codes by surface approximations and uses the octree structure to encode color information. Y. Huang et al.[10] further developed octrees for progressive point cloud coding. They reduced the entropy by bit reordering in the subdivision bytes, and their method also includes attribute coding such as color coding based on frequency of occurrence and normal coding using spherical quantization. Several methods adopt inter- and intra-frame coding in point cloud geometry compression[8][7]. Kammerl et al.[13] developed a prediction octree and used XOR as an inter-coding tool. Mekuria et al.[19] further proposed an octree-based intra and inter coding system. In MPEG PCC[24], the depth of the octree is constrained at a certain level, and each leaf node of the octree is regarded as a single point, a list of points or a geometric model. Triangulations, also called triangle soups, are used as the geometric model in PCC. Pavez et al.[21] first explored the polygon-soup representation of geometry for point cloud compression.

Recently, 3D machine learning has made great progress. Deep networks that directly handle points in a point set are state-of-the-art for supervised learning tasks on point clouds such as classification and segmentation. Qi et al.[6][22] first proposed a deep neural network that directly takes point clouds as input. After that, many other networks were proposed for high-level analysis problems with point clouds[15][9][17][28]. There are also a few works that focus on 3D autoencoders. Achlioptas et al.[1] proposed an end-to-end deep auto-encoder that directly takes point clouds as input. Yang et al.[29] further proposed a graph-based encoder and a folding-based decoder. To the best of our knowledge, there are few machine-learning-based works focusing on point cloud compression, but several autoencoder-based methods have been proposed to enhance the performance of image compression. Toderici et al.[27] proposed to use recurrent neural networks (RNNs) for image compression. Theis et al.[25] achieved multiple bitrates by learning a scaling parameter that changes the effective quantization coarseness. Ballé et al.[2][3][4] used a similar autoencoder architecture and replaced the non-differentiable quantization function with a continuous relaxation by adding uniform noise.

3. Formulation of point cloud geometry compression

We consider a point cloud codec generally as an analysis/synthesis problem with a bottleneck in the middle. The input point cloud is represented as a set of 3D points $\{P_i \mid i = 1, \ldots, n\}$, where each point $P_i$ is a vector of its $(x, y, z)$ coordinates. To compress the input point cloud, the encoder transforms it from Euclidean space $\mathbb{R}^3$ into a higher-dimensional feature space $\mathcal{H}$. In the feature space, we can discard some tiny components by quantization, which reduces the redundancy in the information, and obtain the compressive representation as the latent code $z$. Then, the decoder transforms the compressive representation from the feature space $\mathcal{H}$ back into Euclidean space $\mathbb{R}^3$, yielding the reconstructed point set $\{P'_i \mid i = 1, \ldots, n\}$.

4. Proposed geometry compression architecture

In this section, we describe the proposed compression architecture; the details of each component are discussed in the subsections. Our proposed architecture consists of four modules: a pointnet-based encoder, a uniform quantizer, an entropy estimation block and a nonlinear synthesis transformation module. We adopt the auto-encoder[16] as our basic compression platform; its structure is shown in Figure 1. First, the input point cloud is downsampled by the sampling layer $S$ to create a point cloud with a different point density. Then, the downsampled point set goes through the autoencoder-based codec. The codec consists of an encoder $E$ that takes an unordered point set as input and produces a compressive representation, a quantizer $Q$, and a decoder $D$ that takes the quantized representation produced by $Q$ and produces a reconstructed point cloud. Thus, our compression architecture can be formulated as:

$$x' = D(Q(E(S(x)))), \qquad (1)$$

where $x$ is the original unordered point set and $x'$ is the reconstructed point cloud.

4.1. Sampling layer

In G-PCC, the octree-based geometry coding uses a quantization scale to control the lossy geometry compression[24]. Let $(X_i = (x_i, y_i, z_i))_{i=1 \ldots N}$ be the set of 3D positions associated with the points of the input point cloud. The G-PCC encoder computes the quantized positions $(\tilde{X}_i)_{i=1 \ldots N}$ as follows:

$$\tilde{X}_i = \lfloor (X_i - X_{\text{shift}}) \times s \rfloor, \qquad (2)$$

where $X_{\text{shift}}$ and $s$ are user-defined parameters that are signaled in the bitstream. After quantization, there will be many duplicate points sharing the same quantized positions. A common approach is to merge those duplicate points, which reduces the number of points in the input point cloud.
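As a reference for this pre-processing step, here is a minimal NumPy sketch of Eq. (2) followed by duplicate merging; the function name and the use of `np.unique` are our own illustration, not code from the G-PCC reference software.

```python
import numpy as np

def quantize_and_merge(X, shift, scale):
    """Quantize positions as in Eq. (2), then merge duplicate points.

    X: (N, 3) array of 3D positions; shift and scale correspond to
    X_shift and s, the user-defined parameters signaled in the bitstream.
    """
    Xq = np.floor((X - shift) * scale).astype(np.int64)
    # Merging duplicates reduces the number of points in the cloud.
    return np.unique(Xq, axis=0)
```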

Inspired by G-PCC, we use a sampling layer[22] to achieve the downsampling step. Given input points $\{x_1, x_2, \ldots, x_n\}$, we adopt iterative farthest point sampling (FPS)[22] to select a subset of points $\{x_{i_1}, x_{i_2}, \ldots, x_{i_m}\}$ such that each $x_{i_j}$ is the farthest point (in metric distance) from the already-selected set $\{x_{i_1}, x_{i_2}, \ldots, x_{i_{j-1}}\}$ among the remaining points. In contrast to random sampling, the point density of the resulting point set is more uniform, which better preserves the shape characteristics of the original object.
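The following is a minimal NumPy sketch of iterative FPS as described above; the greedy loop and the random seed point are the standard formulation, though the exact implementation in [22] may differ.

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Greedily select m points, each farthest from those chosen so far.

    points: (n, 3) array; returns the (m, 3) subsampled point set.
    """
    n = points.shape[0]
    selected = np.empty(m, dtype=np.int64)
    selected[0] = np.random.randint(n)      # arbitrary seed point
    # Distance from every point to its nearest already-selected point.
    nearest = np.full(n, np.inf)
    for j in range(1, m):
        d = np.linalg.norm(points - points[selected[j - 1]], axis=1)
        nearest = np.minimum(nearest, d)
        selected[j] = np.argmax(nearest)    # farthest from the current set
    return points[selected]
```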

4.2. Encoder and Decoder

Generally, an autoencoder can be regarded as an analysis function, $y = f_e(x; \theta_e)$, and a synthesis function, $\hat{x} = f_d(y; \phi_d)$, where $x$, $\hat{x}$, and $y$ are the original point cloud, the reconstructed point cloud, and the compressed data, respectively. $\theta_e$ and $\phi_d$ are the optimized parameters of the analysis and synthesis functions.

To learn the encoded compressive representation, we consider the pointnet architecture[6][1]. There are $n$ points in the input point set (an $n \times 3$ matrix). Every point is encoded by several 1-D convolution layers with kernel size 1. Each convolution layer is followed by a ReLU[20] and a batch-normalization layer[11]. To make the model invariant to input permutations, a feature-wise maximum layer follows the last convolutional layer to produce a $k$-dimensional latent code. This latent code is quantized, and the quantized latent code is encoded by the entropy encoder to obtain the final bitstream. In our experiments, we use five 1-D convolutional layers, whose numbers of filters are 64, 128, 128, 256 and $k$, respectively. $k$ is decided by the number of input points; see the experimental results for details.
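A minimal tf.keras sketch of this analysis transform is given below; the paper was implemented in TensorFlow 1.12, so treat this modern-API version as illustrative rather than the authors' exact code.

```python
import tensorflow as tf

def make_encoder(k):
    """PointNet-style encoder: a shared per-point MLP realized as Conv1D
    layers with kernel size 1, then a permutation-invariant max pool.
    Filter counts follow the paper: 64, 128, 128, 256, k."""
    points = tf.keras.Input(shape=(None, 3))        # (n, 3) point set
    x = points
    for filters in (64, 128, 128, 256, k):
        x = tf.keras.layers.Conv1D(filters, 1)(x)
        x = tf.keras.layers.ReLU()(x)               # ReLU, then batch norm,
        x = tf.keras.layers.BatchNormalization()(x) # as stated in the text
    code = tf.keras.layers.GlobalMaxPooling1D()(x)  # 1 x k latent code
    return tf.keras.Model(points, code)
```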

Currently, there are two kinds of decoders for point clouds: the fully-connected decoder[1] and the folding-based decoder[29]. Both decoders are able to produce reconstructed point clouds. The folding-based decoder has far fewer parameters than the fully-connected decoder, but it introduces more hyperparameters to choose, such as the number of grid points and the interval between grid points. Thus, we choose the fully-connected decoder, which interprets the latent code using 3 fully-connected layers to produce an $n \times 3$ reconstructed point cloud. Each fully-connected layer is followed by a ReLU, and the numbers of nodes in the layers are 256, 256 and $n \times 3$, respectively.
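Correspondingly, a sketch of the fully-connected synthesis transform might look as follows. Note one labeled assumption: we leave the final layer linear so reconstructed coordinates can be negative, whereas the text places a ReLU after every layer.

```python
import tensorflow as tf

def make_decoder(k, n):
    """Fully-connected synthesis transform: hidden sizes 256 and 256
    with ReLU, then n*3 outputs reshaped into an (n, 3) point cloud."""
    code = tf.keras.Input(shape=(k,))
    x = tf.keras.layers.Dense(256, activation="relu")(code)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dense(n * 3)(x)             # linear output layer (our assumption)
    points = tf.keras.layers.Reshape((n, 3))(x)
    return tf.keras.Model(code, points)
```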

4.3. Quantization

To reduce the amount of information needed for storage and transmission, quantization is a key step in media compression. However, the derivative of a quantization function such as rounding is zero or undefined everywhere. Theis et al.[25] replaced the derivative in the backward pass of backpropagation with the derivative of a smooth approximation $r$, so that the rounding function's derivative becomes:

$$\frac{d}{dz}[z] = \frac{d}{dz} r(z). \qquad (3)$$

During backpropagation, the derivative of the rounding function is computed by equation (3), while in the forward pass the rounding function is not replaced by the smooth approximation[25]. This is because, if the rounding function were replaced by the smooth approximation entirely, the decoder might learn to invert the smooth approximation, thereby affecting the entropy bottleneck layer that learns the entropy model of the latent code. In [25], $r(z) = z$ was found to work as well as more sophisticated choices.

In contrast to [25], Ballé et al.[3] replaced quantization by additive uniform noise,

$$[f(x)] \approx f(x) + u, \qquad (4)$$

where $u$ is uniform random noise. In our experiments, we find that the performance of these two methods is very similar; in our implementation, we use the second method.
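A compact sketch of this training-time relaxation, assuming plain rounding at test time:

```python
import tensorflow as tf

def soft_quantize(z, training):
    """Uniform-noise relaxation of rounding (Eq. (4)): add U(-0.5, 0.5)
    noise during training, round to the nearest integer at test time."""
    if training:
        return z + tf.random.uniform(tf.shape(z), -0.5, 0.5)
    return tf.round(z)
```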

4.4. Rate-distortion Loss

In the lossy compression problem, one must trade off between the entropy of the discretized latent code (rate) and the error caused by the compression (distortion). Our goal is therefore to minimize the weighted sum of the rate ($R$) and the distortion ($D$), $\lambda D + R$, over the parameters of the encoder, the decoder and the rate estimation model discussed below, where $\lambda$ controls the tradeoff.

Entropy rate estimation has been studied by many researchers who use neural networks to compress images[25][2][4]. In our architecture, the encoder $E$ transforms the input point cloud $x$ into a latent representation $z$, using a non-linear function $f_e(x; \theta_e)$ with parameters $\theta_e$. The latent code $z$ is then quantized by the quantizer $Q$ to form $\hat{z}$. Since its values are discrete, $\hat{z}$ can be losslessly compressed by entropy coding techniques such as arithmetic coding. The rate $R$ of the discrete code $\hat{z}$ is lower-bounded by the entropy of the discrete probability distribution of $\hat{z}$, $H[m_{\hat{z}}]$, that is:

$$R_{\min} = \mathbb{E}_{\hat{z} \sim m}[-\log_2 m_{\hat{z}}(\hat{z})], \qquad (5)$$

where $m(\hat{z})$ is the actual marginal distribution of the discrete latent code. However, $m(\hat{z})$ is unknown to us, and we need to estimate it by building a probability model according to some prior knowledge. Suppose we obtain an estimate $p_{\hat{z}}(\hat{z})$ of the probability model. Then the actual rate is given by the Shannon cross entropy between the marginal $m(\hat{z})$ and the prior $p_{\hat{z}}(\hat{z})$:

$$R = \mathbb{E}_{\hat{z} \sim m}[-\log_2 p_{\hat{z}}(\hat{z})]. \qquad (6)$$

Therefore, if the estimated model distribution is identical to the actual marginal distribution, the rate is minimal and the estimated rate is the most accurate. Similar to [4], we use the entropy bottleneck layer¹, which models the prior $p_{\hat{z}}(\hat{z})$ using a non-parametric, fully factorized density model:

$$p_{\hat{z}|\psi}(\hat{z} \mid \psi) = \prod_i \left( p_{z_i|\psi^{(i)}}(\psi^{(i)}) * \mathcal{U}\!\left(-\tfrac{1}{2}, \tfrac{1}{2}\right) \right)(\hat{z}_i), \qquad (7)$$

where the vector $\psi^{(i)}$ represents the parameters of each univariate distribution $p_{z_i|\psi^{(i)}}$. Note that each non-parametric density is convolved with a standard uniform density, which enables a better match of the prior to the marginal[4].
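The tensorflow-compression package referenced in the footnote provides this layer; a hedged usage sketch, assuming the 1.x-era API in which the layer returns the perturbed code together with its likelihoods, is:

```python
import tensorflow as tf
import tensorflow_compression as tfc  # package from the footnoted docs

def code_and_rate(z, training=True):
    """Model the factorized prior of Eq. (7) with an entropy bottleneck
    and estimate the rate of Eq. (6) from the returned likelihoods.
    Note: the API differs across tensorflow-compression versions."""
    entropy_bottleneck = tfc.EntropyBottleneck()
    z_hat, likelihoods = entropy_bottleneck(z, training=training)
    rate_bits = tf.reduce_sum(-tf.math.log(likelihoods)) / tf.math.log(2.0)
    return z_hat, rate_bits
```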

The distortion is computed by the Chamfer distance. Suppose there are $n$ points in the original point cloud, represented by an $n \times 3$ matrix whose rows are the 3D positions $(x, y, z)$, and the reconstructed point cloud is represented by an $m \times 3$ matrix. The number of original points $n$ may differ from $m$ because of the lossy compression. Let the original point cloud be $S_1$ and the reconstructed point set be $S_2$. Then the reconstruction error is computed by the Chamfer distance:

$$d_{CH}(S_1, S_2) = \sum_{x \in S_1} \min_{y \in S_2} \|x - y\|_2^2 + \sum_{y \in S_2} \min_{x \in S_1} \|x - y\|_2^2. \qquad (8)$$

Finally, our rate-distortion loss function is:

$$L[f_e, f_d, p_{\hat{z}}] = \lambda \, \mathbb{E}[d_{CH}(S_1, S_2)] + \mathbb{E}[-\log_2 p_{\hat{z}}], \qquad (9)$$

where $f_e$ is the non-linear function of the encoder, $f_d$ is the non-linear function of the decoder and $p_{\hat{z}}$ is the estimated probability model. The expectations are approximated by averages over a training set of point clouds.
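For reference, Eq. (8) can be evaluated directly with NumPy broadcasting; this is fine at the paper's point counts, though it is quadratic in memory.

```python
import numpy as np

def chamfer_distance(S1, S2):
    """Symmetric Chamfer distance of Eq. (8) between an (n, 3) and an
    (m, 3) point set: nearest-neighbor squared distances, summed both ways."""
    d2 = np.sum((S1[:, None, :] - S2[None, :, :]) ** 2, axis=-1)  # (n, m)
    return d2.min(axis=1).sum() + d2.min(axis=0).sum()
```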

¹ https://tensorflow.github.io/compression/docs/entropy_bottleneck.html


Figure 2. Objective results. (Rate-distortion curves for four categories: Airplane (-79.8% BD-rate), Car (-80.1% BD-rate), Chair (-71.4% BD-rate), Table (-61.3% BD-rate).)

5. Experimental results

5.1. Datasets

Since our data-driven method requires a large number of point clouds for training, we use the ShapeNet dataset[5]. Shapes from the ShapeNet dataset are axis-aligned and centered in the unit sphere. The point-cloud format of the ShapeNet dataset is obtained by uniformly sampling points on the triangles of the mesh models in the dataset. Unless otherwise stated, we train models with point clouds from a single object class, and the train/test split is 90%-10%.

5.2. Implementation Details

The number of points in the original point cloud is 2048 and the latent code's dimension is 512. To better compare with the reconstructed point clouds compressed by TMC13, we downsample the original point cloud to 1024, 512, 256 and 128 points, with corresponding latent code sizes of 256, 128, 64 and 32. Because TMC13 cannot compress a normalized point cloud directly, we scale normalized point clouds up so that TMC13 can compress them properly. To be fair, we perform the same operation when compressing point clouds with our model: we add an extra normalization before feeding point clouds into our network, and scale reconstructed point clouds back up when computing their distortion.

We implement our model on the Python 2.7 platform with TensorFlow 1.12. We run our model on a computer with an i7-8700 CPU and a GTX 1070 GPU (with 8 GB memory). We use the Adam[14] optimizer to train our network with a learning rate of 0.0005. We train the model with entropy optimization for 1200 epochs and the model without entropy optimization for 500 epochs. The batch size is 8, as limited by our GPU memory.

5.3. Compression results

We compare our method with the latest TMC13 anchor released at the MPEG 125th meeting. We experiment on four types of point clouds: chair, airplane, table and car, which cover a rich variety of point cloud shapes. The chair category contains 6101 point clouds for training and 678 for testing; the airplane category contains 3640 for training and 405 for testing; the table category contains 7658 for training and 851 for testing; the car category contains 6747 for training and 750 for testing. Rate-distortion performance for each type of point cloud is shown in Figure 2. To avoid unfairly penalizing TMC13 due to the unavoidable cost of file headers, we exclude the header size from the bitstream produced by TMC13. The distortion in Figure 2 is the point-to-point geometry PSNR obtained from the pc_error MPEG tool[26]. Rate-distortion curves are obtained by averaging over all test point clouds. Results show that our method outperforms TMC13 on all types of point clouds at all bitrates. On average, a 73.15% BD-rate gain is achieved.
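The BD-rate figures quoted here and in the ablation below follow Bjøntegaard's method; a minimal sketch of the usual computation (cubic fit of log-rate against PSNR, averaged over the overlapping quality range) is given below, though the exact tool used in the MPEG evaluation may differ in detail.

```python
import numpy as np

def bd_rate(rate_ref, psnr_ref, rate_test, psnr_test):
    """Bjontegaard-delta rate between two RD curves, in percent;
    negative values mean the test codec saves bitrate on average."""
    p_ref = np.polyfit(psnr_ref, np.log(rate_ref), 3)
    p_test = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(np.min(psnr_ref), np.min(psnr_test))   # overlapping
    hi = min(np.max(psnr_ref), np.max(psnr_test))   # quality range
    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    avg_ref = (np.polyval(int_ref, hi) - np.polyval(int_ref, lo)) / (hi - lo)
    avg_test = (np.polyval(int_test, hi) - np.polyval(int_test, lo)) / (hi - lo)
    return (np.exp(avg_test - avg_ref) - 1.0) * 100.0
```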

Figure 3. Subjective results. Left: original point clouds (GT); middle: proposed; right: TMC13. Color for better display. (Per-example rates and qualities, proposed vs. TMC13:
0.282227 bpp / 11.2439 dB vs. 0.761719 bpp / 11.0722 dB;
0.255371 bpp / 10.8908 dB vs. 0.589844 bpp / 10.4177 dB;
0.269531 bpp / 15.2908 dB vs. 0.535156 bpp / 15.2666 dB;
0.290039 bpp / 15.84 dB vs. 0.308594 bpp / 15.8073 dB;
0.249 bpp / 20.8142 dB vs. 0.796974 bpp / 20.2003 dB;
0.2764 bpp / 17.8893 dB vs. 1.16797 bpp / 17.485 dB;
0.2363 bpp / 12.8006 dB vs. 1.23828 bpp / 12.8038 dB.)


In Figure 3 we show some test point clouds compressed to low bit rates. In line with the objective results, we find that our method uses fewer bits per point than TMC13 at similar PSNR reconstruction quality. The reconstructed point clouds of the proposed method are also denser than those compressed by TMC13.

Figure 4. An ablation study of the entropy estimation module. The results are tested on point clouds of the chair class. A 19.3% BD-rate gain is obtained.

To further analyze the entropy estimation module in our method, we conduct a simple ablation study. We take the model without the entropy bottleneck layer as our baseline. The comparison of the RD curves between the baseline and our proposed model on chair point clouds is presented in Figure 4. Results show that entropy estimation effectively reduces the size of the bitstream, yielding a 19.3% BD-rate gain.

6. Conclusion

In this paper, we propose a general deep autoencoder-based architecture for lossy geometry point cloud compression. Compared with handcrafted codecs, this approach not only achieves better coding efficiency, but also adapts much more quickly to new media contents and new media formats. Experimental evaluation demonstrates that, on the given benchmark, the proposed model outperforms TMC13 in rate-distortion performance, achieving an average BD-rate gain of 73.15%.

To the best of our knowledge, it is the first autoencoder-based geometry compression codec that directly takes point clouds as input rather than voxel grids or collections of images. The algorithms that we present may also be extended to attribute compression of point clouds or even point cloud sequence compression. To encourage future work, we will make all the materials public.

References

[1] P. Achlioptas, O. Diamanti, I. Mitliagkas, and L. Guibas. Learning representations and generative models for 3d point clouds, 2018.
[2] J. Ballé, V. Laparra, and E. Simoncelli. End-to-end optimized image compression, Nov. 2016.
[3] J. Ballé, V. Laparra, and E. P. Simoncelli. End-to-end optimization of nonlinear transform codes for perceptual quality. In Picture Coding Symposium, 2016.
[4] J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston. Variational image compression with a scale hyperprior, 2018.
[5] A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu. ShapeNet: An information-rich 3D model repository. Technical Report arXiv:1512.03012 [cs.GR], Stanford University / Princeton University / Toyota Technological Institute at Chicago, 2015.
[6] R. Q. Charles, H. Su, K. Mo, and L. J. Guibas. PointNet: Deep learning on point sets for 3d classification and segmentation. In IEEE Conference on Computer Vision & Pattern Recognition, 2017.
[7] R. L. de Queiroz and P. A. Chou. Motion-compensated compression of point cloud video. In 2017 IEEE International Conference on Image Processing (ICIP), pages 1417-1421, Sep. 2017.
[8] D. C. Garcia and R. L. de Queiroz. Context-based octree coding for point-cloud video. In 2017 IEEE International Conference on Image Processing (ICIP), pages 1412-1416, Sep. 2017.
[9] B. S. Hua, M. K. Tran, and S. K. Yeung. Point-wise convolutional neural network, 2017.
[10] Y. Huang, J. Peng, C. C. Kuo, and M. Gopi. A generic scheme for progressive point cloud coding. IEEE Transactions on Visualization & Computer Graphics, 14(2):440-453, 2008.
[11] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, 2015.
[12] C. L. Jackins and S. L. Tanimoto. Oct-trees and their use in representing three-dimensional objects. Computer Graphics & Image Processing, 14(3):249-270, 1980.
[13] J. Kammerl, N. Blodow, R. B. Rusu, M. Beetz, E. Steinbach, and S. Gedikli. Real-time compression of point cloud streams. In IEEE International Conference on Robotics & Automation, 2012.
[14] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[15] R. Klokov and V. Lempitsky. Escape from cells: Deep kd-networks for the recognition of 3d point cloud models. In 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
[16] A. Krizhevsky and G. E. Hinton. Using very deep autoencoders for content-based image retrieval. In ESANN, 2011.
[17] Y. Li, R. Bu, M. Sun, and B. Chen. PointCNN, Jan. 2018.
[18] D. Meagher. Geometric modeling using octree encoding. Computer Graphics & Image Processing, 19(2):129-147, 1982.
[19] R. Mekuria, K. Blom, and P. Cesar. Design, implementation and evaluation of a point cloud codec for tele-immersive video. IEEE Transactions on Circuits & Systems for Video Technology, PP(99):1-1, 2016.
[20] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In International Conference on Machine Learning, 2010.
[21] E. Pavez and P. A. Chou. Dynamic polygon cloud compression. In IEEE International Conference on Acoustics, Speech and Signal Processing, 2017.
[22] C. R. Qi, L. Yi, H. Su, and L. J. Guibas. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Advances in Neural Information Processing Systems 30, pages 5099-5108. Curran Associates, Inc., 2017.
[23] R. Schnabel and R. Klein. Octree-based point-cloud compression. In Eurographics, 2006.
[24] S. Schwarz, M. Preda, V. Baroncini, M. Budagavi, P. Cesar, P. Chou, R. Cohen, M. Krivokuća, S. Lasserre, Z. Li, J. Llach, K. Mammou, R. Mekuria, O. Nakagami, E. Siahaan, A. Tabatabai, A. M. Tourapis, and V. Zakharchenko. Emerging MPEG standards for point cloud compression. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, PP:1-1, Dec. 2018.
[25] L. Theis, W. Shi, A. Cunningham, and F. Huszár. Lossy image compression with compressive autoencoders. CoRR, abs/1703.00395, 2017.
[26] D. Tian, H. Ochimizu, C. Feng, R. Cohen, and A. Vetro. Geometric distortion metrics for point cloud compression. Pages 3460-3464, Sep. 2017.
[27] G. Toderici, S. M. O'Malley, S. J. Hwang, D. Vincent, D. Minnen, S. Baluja, M. Covell, and R. Sukthankar. Variable rate image compression with recurrent neural networks. Computer Science, 2015.
[28] W. Wang, R. Yu, Q. Huang, and U. Neumann. SGPN: Similarity group proposal network for 3d point cloud instance segmentation, 2017.
[29] Y. Yang, C. Feng, Y. Shen, and D. Tian. FoldingNet: Point cloud auto-encoder via deep grid deformation. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
[30] Y. Zhou and O. Tuzel. VoxelNet: End-to-end learning for point cloud based 3d object detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018.
