
Coding of Three-dimensional Video Content

Depth Image Coding by Diffusion

Yun Li

Department of Information and Communication Systems
Mid Sweden University

Licentiate Thesis No. 100
Sundsvall, Sweden

2013


Mittuniversitetet, Naturvetenskap, teknik och medier (Mid Sweden University, Science, Technology and Media)

ISBN 978-91-87103-76-6
ISSN 1652-8948
SE-851 70 Sundsvall, SWEDEN

Academic thesis which, with the permission of Mid Sweden University, is presented for public examination for the degree of Licentiate of Technology on Friday, 5 June 2013, in O102, Mid Sweden University, Holmgatan 10, Sundsvall.

© Yun Li, June 2013

Print: Mid Sweden University printing office (Tryckeriet Mittuniversitetet)


To My Family and Friends


Abstract

Three-dimensional (3D) movies in theaters have become a massive commercial success during recent years, and it is likely that, with the advancement of display technologies and the production of 3D content, TV broadcasting in 3D will play an important role in home entertainment in the not too distant future. 3D video content contains at least two views from different perspectives, one for the left and one for the right eye of the viewer. The amount of coded information is doubled if these views are encoded separately. Moreover, for multi-view displays (i.e. displays that present different perspectives of a scene in 3D to the viewer at the same time through different angles), either video streams of all the required views must be transmitted to the receiver, or the display must synthesize the missing views from a subset of the views. The latter approach has been widely proposed to reduce the amount of data being transmitted. The virtual views can be synthesized by the Depth Image Based Rendering (DIBR) approach from textures and associated depth images. However, the amount of information for the textures plus the depths still presents a significant challenge for the network transmission capacity. An efficient compression will, therefore, increase the availability of content access and provide a better video quality under the same network capacity constraints.

In this thesis, the compression of depth images is addressed. These depth images can be assumed to be piece-wise smooth. Starting from the properties of depth images, a novel depth image model based on edges and sparse samples is presented, which may also be utilized for depth image post-processing. Based on this model, a depth image coding scheme that explicitly encodes the locations of depth edges is proposed, and the coding scheme has a scalable structure. Furthermore, a compression scheme for block-based 3D-HEVC is also devised, in which diffusion is used for intra prediction. In addition to the proposed schemes, the thesis illustrates several evaluation methodologies, in particular the subjective test with the stimulus-comparison method. It is suitable for evaluating the quality of two impaired images, as the objective metrics are inaccurate with respect to synthesized views.

The MPEG test sequences were used for the evaluation. The results showed that virtual views synthesized from depth images post-processed using the proposed model are better than those synthesized from the original depth images. More importantly, the proposed coding schemes using such a model produced better synthesized views than state-of-the-art schemes. As a result, the outcome of the thesis can lead to a better quality of 3DTV experience.


Acknowledgements

I would like to thank my main supervisor Prof. Mårten Sjöström for his dedication to my study in all aspects, my second supervisor Dr. Ulf Jennehag who has led me through the kingdom of video coding and beyond, and my third supervisor Dr. Roger Olsson who has guided me across the wonderland of 3D and further. I am grateful for all of your help.

Moreover, I would like to thank all the members in the Realistic 3D group, who have been great colleagues as well as classmates, and with whom the discussion about 3D has always been inspiring. I am thankful to all staff in IKS, who have created such a nice working environment, and who are always helpful beyond my academic field. I am also thankful to my family and friends.

In addition, thanks to all of our project partners, especially Ericsson AB, whose involvements and contributions have been so beneficial to my study.

Last but not least, thanks to our funders. This work has been supported by grant 2009/0264 of the Knowledge Foundation, Sweden, by grant 00156702 of the EU European Regional Development Fund, Mellersta Norrland, Sweden, and by grant 00155148 of Länsstyrelsen Västernorrland, Sweden.


Contents

Abstract v

Acknowledgements vii

List of Papers xiii

Terminology xvii

1 Introduction 1

1.1 Background 1
1.1.1 3D history 2
1.1.2 3DTV transmission chain 2
1.1.3 Human depth perception for 3D content 2
1.1.4 3D display 4
1.1.5 Coding of 3D video content 5
1.2 Problem motivation 5
1.3 Overall aim 5
1.4 Scope 6
1.5 Concrete and verifiable goals 6
1.6 Outline 7
1.7 Contributions 7

2 Depth image coding 9

2.1 Depth image acquisition and view synthesis 9
2.2 Depth image coding using standard codecs 10
2.3 Previous works in depth image coding 12


2.3.1 Distortion measurement 12
2.3.2 Filtering 12
2.3.3 Edge-preserving transform 12
2.3.4 Regional segmentation 13
2.3.5 Tree and mesh 13
2.3.6 Improvement in standard codecs 14
2.3.7 Other techniques 14
2.4 Preserving edges in depth images 14

3 Depth image modeling, processing, and coding by diffusion 17

3.1 Depth image modeling 17
3.2 Depth image processing 18
3.3 Scalable depth image coding using depth edges 18
3.3.1 Encoding 19
3.3.2 Decoding 20
3.3.3 Scalability 20
3.4 Diffusion modes in 3D-HEVC using texture edges 21
3.5 Evaluation methodologies 21

4 Contributions 23

4.1 Paper I: A depth image post-processing method 23
4.1.1 Novelty 23
4.1.2 Evaluation and results 24
4.2 Paper II: Scalable coding approach for high quality depth image coding 24
4.2.1 Novelty 24
4.2.2 Evaluation and results 24
4.3 Paper III: Subjective evaluation of the proposed coding scheme 24
4.3.1 Novelty 25
4.3.2 Evaluation and results 25
4.4 Paper IV: Diffusion modes in 3D-HEVC 25
4.4.1 Novelty 25
4.4.2 Evaluation and results 26

5 Conclusions and future work 27


5.1 Overview 27
5.2 Outcome 27
5.3 Impact 28
5.4 Future work 29

Bibliography 31


List of Papers

This thesis is mainly based on the following papers, herein referred to by their Roman numerals:

I Li, Y., Sjöström, M., Jennehag, U., and Olsson, R. A depth image post-processing method for diffusion. In Proc. SPIE, San Francisco, USA, Feb. 2013.

II Li, Y., Sjöström, M., Jennehag, U., and Olsson, R. A scalable coding approach for high quality depth image compression. In Proc. 3DTV Con., Zurich, Switzerland, Oct. 2012.

III Li, Y., Sjöström, M., Jennehag, U., Olsson, R., and Sylvain T. Subjective test for an edge-based depth image compression scheme. In Proc. SPIE, San Francisco, USA, Feb. 2013.

IV Li, Y., Sjöström, M., Jennehag, U., and Olsson, R. Depth map compression with diffusion modes in 3D-HEVC. In Proc. MMedia, Venice, Italy, Apr. 2013.


List of Figures

1.1 The MUSCADE project signal processing and data transmission chain [mus]. 3

1.2 Two images from different views of Poznan Street [DGKK09]. 3

1.3 Illustration of binocular cues. Different perspectives of a scene are projected into two eyes. 4

1.4 Synthesized artifacts resulting from depth compression: (a) compressed depth and (b) synthesized view. 6

2.1 Projection of 3D point M to cameras [KES05]. 10

2.2 (a) texture image and (b) its associated depth image. 11

3.1 Depth image reconstruction and evaluation system. 19

3.2 Depth image coding system. 20


Terminology

Abbreviations and Acronyms

2D      Two-Dimensional
3D      Three-dimensional
3DTV    Three-dimensional Television
AVC     Advanced Video Coding
DIBR    Depth Image Based Rendering
DMM     Depth Modelling Mode
DVB     Digital Video Broadcasting
GOP     Group Of Pictures
HD      High Definition
HEVC    High Efficiency Video Coding
HVS     Human Vision System
IEC     International Electrotechnical Commission
ISO     International Standards Organization
ITU     International Telecommunications Union
JPEG    Joint Photographic Experts Group
JCT-3V  Joint Collaborative Team on 3D Video Coding Extension Development
JCT-VC  Joint Collaborative Team on Video Coding
JVT     Joint Video Team
MOS     Mean Opinion Score
MPEG    Moving Picture Experts Group
MVC     Multi-view Video Coding
MVD     Multi-view Video plus Depth
MVP     Multi-view Profile
PSNR    Peak Signal-to-Noise Ratio
QP      Quantization Parameter
SSIM    Structural Similarity
TV      Television
VSRS    View Synthesis Reference Software


Mathematical Notation

⊕       Morphological dilation
∂       Partial differential operator
d       Pixel values in a depth image
l       Left camera
K       Camera calibration matrix
M, m = (x, y, z)^T   Euclidean 3D point
r       Right camera
R       Rotation matrix
C       Camera projection center
Cn      Canny edge thresholds
e(n)    Edge set with index n
Ec(n)   Canny detected edges with threshold multiplication factor n
Ed      Canny detected edges
Em      Edge mask
f       A 2D image
fc      Canny edge detector
S3      Square structure element of size 3
T       Matrix transpose
Te      Edge threshold multiplication factor
Tes     Edge sampling factor
Trs     Uniform reconstruction sparse sampling factor
Ts      Uniform sparse sampling factor
Tq      Coding image with quantization parameter q
v       Virtual camera
z       Actual depth of a scene


Chapter 1

Introduction

Three-dimensional (3D) media is not merely an extension of the two-dimensional (2D), but a new form of art [BM12]. Not only can 3D provide audiences with the impression of depth but also with the realism of pervasiveness. 3D markets in terms of content, cinemas and displays have been expanding during recent years. The advancement of 3D display technology enables 3D visions from different perspectives of a scene, so that viewers are able to watch the scene in 3D from their preferred perspective. For this purpose, a multi-view display is required, and multiple views for different perspectives must be provided or generated for the display. Therefore, the huge amount of data associated with multiple-view content presents a significant challenge for transmission and storage, if all views are transmitted. Fortunately, the intermediate views can be synthesized from a small number of views plus their associated depth information. Unfortunately, traditional video and image encoders introduce disturbing artifacts into the synthesized views when depth images are coded by these encoders. More explicitly, they blur the depth transitions in the depth images, and an alternative coding scheme that preserves the depth transitions can be beneficial for view synthesis.

In this thesis, a novel depth image modeling approach is presented, and coding schemes utilizing such a model are proposed for improving the quality of the synthesized views.

1.1 Background

This section illustrates the motivation behind devising an alternative coding scheme for depth images. The fundamentals of 3D are also presented to give an overview of the important aspects of this research work and to underline the problems associated with the coding.


1.1.1 3D history

3D has gone through a long development. The first stereoscopy device, for still images, was invented in 1838 by a British researcher, Sir Charles Wheatstone [Feh05]. In 1903, the first 3D short movie was shown on the stereoscope at the world fair in Paris, but only one viewer at a time could watch the screening. The first full 3D movie, Power of Love, was shown in Los Angeles in 1922. The separation of views for the left and right eyes was accomplished by an anaglyphic process, and multiple viewers could watch the movie at the same time. However, the real 3D movie boom occurred after the 1950s, when Hollywood turned 3D movies to commercial purposes. In particular, from 1952 to 1954, over 60 3D movies were produced. The movie industry attempted to highlight the sensation introduced by the depth impression to attract audiences.

The boom subsided after 1954, caused by several factors (e.g. inexperienced photographers, inadequate quality controls, and poorly operated projection systems) [Lip82]. Screenings influenced by such factors result in headaches and eye-strain for viewers. It was not until recently, with the advancement of 3D production techniques, that 3D movies began to attract the attention of viewers back to IMAX 3D theaters.

3D has also been part of broadcast television. The first experimental 3DTV broadcast was conducted in the US in 1953, but the first 'non-experimental' 3D program was broadcast more than two decades later, in December 1980. With the transition from analogue to digital signals in TV broadcasting in the early 1990s, compression technologies were developed by the Moving Picture Experts Group (MPEG) for the coding of 3D video content [IL02]. Later coding techniques such as H.264/AVC [Adv12] are also used for transmitting 3D video content.

In the context of this thesis, 3DTV represents not only 3D television systems for home entertainment but also other aspects of 3D, e.g. 3D medical applications [NBM+12], 3D mobile devices [GAC+], and telepresence [KS05].

1.1.2 3DTV transmission chain

An entire processing chain for 3DTV broadcasting is illustrated in Figure 1.1 for the European project Multimedia Scalable 3D for Europe (MUSCADE) [mus]. It consists of 3D content generation, coding, transmission, and rendering. The 3D content can be captured by cameras or generated by computer animation. The content is then post-processed to reach the desired effects and is then compressed. After transmission through networks, the compressed content is decompressed and played back in 2D or 3D mode according to the type of display.

1.1.3 Human depth perception for 3D content

Before the discussion of 3D techniques, it will be of interest to understand how human beings interpret the closeness of an object.


Figure 1.1: The MUSCADE project signal processing and data transmission chain [mus].

Figure 1.2: Two images from different views of Poznan Street [DGKK09].

The human vision system can perceive depth through a combination of monocular and binocular cues (information). A more accurate depth perception results from more consistent cues [IJM05].

Monocular cues consist of occlusion, linearity, aerial effects, accommodation, motion parallax, relative density and relative size. The depth perception from monocular cues is manifested in such a way that depth can still be perceived when we close one eye. Based on the monocular cues, a viewer can perceive that the light gray car is in front of the building in Figure 1.2 (a).

Binocular cues include stereopsis and vergence. The separation of our two eyes provides each eye with the ability to capture a unique perspective of the same scene, e.g. Figure 1.2 (a) for the right eye and Figure 1.2 (b) for the left eye. Depth can then be perceived from the retinal disparity of the images captured by the two eyes for the scene. The impression of depth by such a perception is stereopsis. Figure 1.3 shows an example of an image perceived to be behind the display. In addition, both of our eyes rotate to fixate on an object in the scene. This simultaneous eye movement is vergence. Fairly good depth can be perceived by means of stereopsis alone, even if the monocular cues are inconsistent [Jul71].


Figure 1.3: Illustration of binocular cues. Different perspectives of a scene are projected into two eyes.

However, the retinal disparity should be within a certain range (i.e., Panum's fusional area) so that the images from the two viewpoints can be fused into one. In addition, the accommodation-convergence conflict and different types of stereoscopic distortions can cause visual discomfort. A more detailed discussion concerning these aspects can be found in [IJM05].

1.1.4 3D display

As discussed above, depth can be perceived using monocular cues, which explains the perceived depth from 2D displays. A stereoscopic 3D display, however, aims at providing stereopsis, i.e. presenting different viewpoints to each eye. 3D displays can be categorized into two groups based on the techniques that deliver the different perspective views to the left and right eyes: aided viewing and free viewing.

Aided viewing displays: different techniques for carrying signals to the respective eyes have been proposed. A common solution is to utilize glasses to channel the signals, e.g. active shutter glasses and passive polarized glasses.

Autostereoscopic displays: light is directed from the display screen to specific areas. If the eye positions coincide with these areas, the perspective views are channeled to the respective eyes.

Multi-view autostereoscopic displays require an input of multiple views, which presents a challenge for transmitting and storing such large amounts of data. A modern approach to addressing this problem is to synthesize virtual views using the Depth Image Based Rendering (DIBR) approach with data in the Multi-view Video plus Depth (MVD) format. Depth images from the MVD data represent the depth of the scene for the corresponding texture images. The MPEG View Synthesis Reference Software (VSRS [vsr10]) has been widely used for view synthesis.


1.1.5 Coding of 3D video content

In today's digital and networked society, coding is necessary for transmission through a limited network capacity, even if arbitrary views can be generated by view synthesis from the MVD data. The amount of data to be transmitted is much larger than in a single texture view scenario. The current trend of synthesizing multiple views (e.g. 24 or 48 views) for a multi-view autostereoscopic display requires an input of two views plus depth or three views plus depth from the MVD data. The size of the encoded MVD data is proportional to the number of views if the data is encoded in a simulcast manner (i.e. the different views are encoded separately). The portion of data for transmitting the encoded depth is around 10 to 20 percent of the encoded texture [Feh05].

Coding of still images for 3D content can be accomplished by using state-of-the-art video codecs in intra mode or by using dedicated image encoders (e.g. JPEG2000). For video sequences, the current coding standard H.264/AVC can be employed. This codec has several extensions that further exploit the redundancies of 3D content. A more efficient video coding scheme, High Efficiency Video Coding (HEVC [JV10]), has recently been developed and will soon become standardized.

1.2 Problem motivation

Traditional video and image encoders introduce distortions that are perceived as acceptable by the Human Vision System (HVS) in the images. The distortion is manifested as edge blurring because the encoders tend to remove the high frequencies in an image. Depth images, on the other hand, are not viewed directly by viewers and instead provide the geometric translation for the view synthesis. The edge blurring of depth images will result in disturbing artifacts in the synthesized view [LLT+10a]. Figure 1.4 illustrates the synthesized artifacts resulting from depth compression for a portion of the Lovebird1 [jtc11] MVD images. Traditional encoders have not been adapted to the special characteristics of depth images. This provides the motivation for the research work involved in devising an alternative scheme for depth image coding, one that can preserve the edges (i.e. depth transitions) in depth images.

1.3 Overall aim

The overall aim is to improve the 3DTV experience under a limited transmission capacity for 3D video content. The transmission capacity limits the quality of the content as well as the amount of content being transmitted. An efficient transmission by means of coding that reduces the consumption of networking resources can promote both the prevalence of 3D and the comfort of 3D viewing.


Figure 1.4: Synthesized artifacts resulting from depth compression: (a) compressed depth and (b) synthesized view.

1.4 Scope

The scope of this thesis is to propose a depth image model and coding schemes utilizing such a model. The current work is focused on, but not limited to, still image and intra frame coding. Synthesized views produced by the model and the coding schemes are evaluated. VSRS is used for view synthesis. The MPEG MVD test sequences are the data sources for the evaluation, and tested images are selected from these sequences.

The evaluation methodologies are chosen so that the evaluation results are comparable with the results from other research works. These methodologies are the image quality assessment metrics PSNR and SSIM. To overcome the shortcomings of these objective metrics, the results are further assessed by visual inspections and subjective tests.

1.5 Concrete and verifiable goals

The depth images can be assumed to be piece-wise smooth gray level images [SFWK11]. Centering on this assumption, the following goals are defined:

I. Propose a model and representation of depth images based on their intrinsic property, i.e. the piece-wise smooth assumption.

II. Propose a coding scheme for depth images that preserves the fundamental properties of depth images.

III. Propose a coding scheme for depth images for MVD sequences based on hybrid video coding schemes.


1.6 Outline

The thesis is organized as follows: Chapter 2 briefly overviews previous work on depth image coding; Chapter 3 describes the proposed scheme of depth image coding by diffusion; and the contributions of the thesis are discussed in Chapter 4. Chapter 5 concludes the thesis and reflects on the aims, the goals, and the possible impacts of the work.

1.7 Contributions

The author's contributions to the papers are listed in the section containing the list of papers. The author is responsible for the ideas, evaluation criteria and test set-up, result analysis, and the presentation of the papers. The co-authors have also partially contributed throughout the entire chain of the research work for the papers. The contributions of the thesis are described below:

I. Introduce a depth image post-processing scheme by diffusion for a better quality of synthesized views.

II. Propose a scalable depth image compression scheme by diffusion for high quality depth images, aiming at improving the subjective quality of synthesized views.

III. Analyze the accuracy of the objective metric SSIM by conducting a subjective evaluation for the proposed depth image compression scheme.

IV. Introduce diffusion modes in 3D-HEVC. The modes utilize texture edges and are integrated into the Depth Modeling Mode (DMM) framework.


Chapter 2

Depth image coding

This chapter overviews depth image coding using standard codecs and some previous work on depth image coding.

2.1 Depth image acquisition and view synthesis

Depth images are assumed to be piece-wise smooth gray level images [SFWK11] and can be obtained from multiview-to-stereo approaches or range cameras. Figure 2.2 shows the depth image and its associated texture from the first frame of the MVD sequence Poznan Street [DGKK09]. The range of the depth per pixel d is from 0 to 255, which represents the farthest to the closest distance relative to a viewer. The actual depth z is obtained from the pixel value d of a depth image by

z = \frac{1}{\frac{d}{255}\left(\frac{1}{z_{min}} - \frac{1}{z_{max}}\right) + \frac{1}{z_{max}}},  (2.1)

where z_min and z_max are the minimum and the maximum depth of the scene, respectively. The disparity of a pixel in different views can be established using the depth values. Figure 2.1 illustrates a camera set-up [KES05]. An image point m can be projected from a point M in the 3D scene by

z \cdot m = K R^T (M - C).  (2.2)

Therefore, the 3D point M is transformed from the image point m by

M = z (K R^T)^{-1} m + C.  (2.3)

Consequently, an image point m is projected to the virtual image point m_v by

m_v = z (K_v R_v^T)(K R^T)^{-1} m + (C - C_v).  (2.4)

Therefore, in Figure 2.1, the image point m_v can be warped from the image point m_l from the left camera and from the point m_r from the right camera by

m_v = z_l (K_v R_v^T)(K_l R_l^T)^{-1} m_l + (C_l - C_v)  (2.5)


Figure 2.1: Projection of 3D point M to cameras [KES05].

and

m_v = z_r (K_v R_v^T)(K_r R_r^T)^{-1} m_r + (C_r - C_v),  (2.6)

respectively.

Equations (2.2), (2.3), and (2.4) are described in [KES05]. The parameter K is the camera calibration matrix, and R is the camera rotation matrix. C represents the camera projection center, and T denotes the matrix transpose. The subscripts v, l and r specify the virtual camera, the left camera and the right camera, respectively.
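As an illustration of how Eqs. (2.1)-(2.4) are used in DIBR, the following sketch converts a depth-image value to metric depth and warps an image point into a virtual camera by composing Eqs. (2.3) and (2.2). It is a minimal sketch in Python/NumPy (the thesis implementation used Matlab and VSRS); all function and variable names are illustrative and not taken from the thesis.

```python
import numpy as np

def pixel_to_depth(d, z_min, z_max):
    """Eq. (2.1): convert an 8-bit depth-image value d (0..255) to metric depth z."""
    return 1.0 / ((d / 255.0) * (1.0 / z_min - 1.0 / z_max) + 1.0 / z_max)

def warp_to_virtual(m, z, K, R, C, K_v, R_v, C_v):
    """Warp a homogeneous image point m = (x, y, 1)^T with depth z from a reference
    camera (K, R, C) to a virtual camera (K_v, R_v, C_v): back-project to the 3D
    point M with Eq. (2.3), then project M with Eq. (2.2) for the virtual camera."""
    M = z * np.linalg.inv(K @ R.T) @ m + C       # Eq. (2.3)
    m_v = K_v @ R_v.T @ (M - C_v)                # Eq. (2.2) applied to the virtual camera
    return m_v / m_v[2]                          # normalize homogeneous coordinates
```

Synthesizing a full view repeats this warping per pixel and additionally handles occlusions and holes, which VSRS takes care of internally.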

State-of-the-art two-frame stereo correspondence approaches are listed on the Middlebury stereo website [mid]. These approaches can achieve a competitive depth quality when compared to the ground truth depth. Depth images of lower resolution can also be acquired from range cameras. Various methods (e.g. Joint Bilateral filtering [GB09]) have been proposed to improve the quality of depth images.

2.2 Depth image coding using standard codecs

The Joint Photographic Experts Group (JPEG) in the International Standards Organization (ISO) is responsible for developing still image coding standards. These standards include JPEG [JpeEG] and JPEG2000 [Jpe00]. JPEG2000 is a successor of JPEG and aims at more efficient image compression. It is a wavelet-based encoder, and different quantization strategies can be applied [MLB+00] in lossy compression.


Figure 2.2: (a) texture image and (b) its associated depth image.

For the coding of video content, the Moving Picture Experts Group (MPEG) in ISO and the International Electrotechnical Commission (IEC) has developed video and audio compression standards such as MPEG-1, MPEG-2, and MPEG-4. In addition, the Video Coding Experts Group (VCEG) of the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) also develops video coding standards, e.g. H.261, H.263 and H.26L. In 2001, the Joint Video Team (JVT) was set up by the two groups (MPEG and VCEG) to convert the H.26L project into the standard H.264/MPEG-4 Part 10, namely Advanced Video Coding (AVC). In 2010, the two groups established the Joint Collaborative Team on Video Coding (JCT-VC), which has been working on the next generation of video coding, the High Efficiency Video Coding (HEVC). HEVC has a block-based coding structure similar to that of H.264 and aims at higher compression efficiency. In 2012, the Joint Collaborative Team on 3D Video Coding Extension Development (JCT-3V) was formed by ITU-T and ISO/IEC to develop 3D video coding standards.

Both H.264 and HEVC have multi-view (MV) [VIWS11] [SBB12] and 3D extensions [atm11][htm]. The multi-view extension exploits inter-view redundancies to compress the video sources of different perspectives more efficiently, whereas the 3D extension further utilizes the correlation between textures and depths for compression. The quantization process in these encoders removes the high frequencies of an image or a video frame.

The standard video and image codecs were originally devised for the coding of texture images. The removal of high frequencies results in acceptable distortion in texture images, but not in depth images [MMW11]. It was not until recently that the 3D extensions of these codecs started to take the properties of depth images into account, i.e. preserving depth edges in a rate-distortion sense [MBM12]. In addition to the standard codecs, as of today, there are many other approaches designed specifically for depth image coding.


2.3 Previous works in depth image coding

Depth image coding schemes from previous works are categorized into seven groups in this thesis for clarity. These groups are Distortion measurement, Filtering, Edge-preserving transform, Regional segmentation, Tree and mesh, Improvement in standard codecs, and Other techniques. Some of the approaches from these groups can also be integrated into the standard codecs, as in Improvement in standard codecs.

As the coding of 3D video content has gone through rapid developments and is still attracting massive research attention, the schemes illustrated in this chapter are only a brief survey of the entire body of research work in depth coding.

2.3.1 Distortion measurement

In the rate-distortion optimization process of image or video compression (e.g. in H.264), the distortion measurement on images does not reflect the distortion in synthesized views when coding depth images. Metrics that take the texture information into consideration were presented in the research works [OLP11] [TSMT12].

2.3.2 Filtering

A Joint Trilateral Filtering that utilizes the edges of associated texture frames was implemented in H.264/AVC for depth image compression to strengthen the properties of decoded depth images in paper [LLT+10a]. The down-sampling and up-sampling approach, in which depth images are down-sampled before encoding and up-sampled after decoding, is a straightforward approach for reducing the amount of data to be encoded. Paper [NMD00] proposes an up-sampling approach for H.264-compressed down-sampled depth images. In the research work [SD09], depth images were compressed via compressed sensing, in which the original depth images were sub-sampled by a measurement matrix, and the decoded images were reconstructed by the conjugate gradient method, a method for solving particular linear equation systems.

2.3.3 Edge-preserving transform

Depth edge preserving coding using graph-based wavelets [SSO09] transmits the binary edge maps of depth images to the decoder and chooses filters adaptively by minimizing a cost function. Transform coefficients are encoded by using the compression method Set Partitioning in Hierarchical Trees (SPIHT), and the edge maps by the bi-level compression scheme JBIG. The Shape Adaptive Discrete Wavelet Transform (SA-DWT) [LLSWun] extrapolates pixels on the other side of the edges by symmetric extension, whereas edges are encoded by a differential Freeman chain code and the bit-stream by SPIHT. The adaptive wavelet coding [Til08] applies short filters around the edges of depth images and long filters for other areas.


Cellular automata [Kar05] were applied for depth image coding in the work [CCRCK09], in which the gray level depth image is converted into gray code and represented by bit planes. Inter-plane (i.e. between bit planes) correlations are utilized for prediction, and bit-streams are produced by an arithmetic encoder.

Some transform bases, e.g. Bandelets [LM05], which decompose an image according to the geometric flow of image intensity variation, and Contourlets [DV05], which capture the geometry of image edges, are also suitable for depth image compression. In paper [NW11], the performance of the Discrete Cosine Transform (DCT), the Karhunen-Loève Transform (KLT), and Block Truncation Coding (BTC) was compared. An adaptive BTC that adaptively selects the size of a block for BTC was devised.

2.3.4 Regional segmentation

Papers [Got11] [Jag11] illustrate algorithms that losslessly encode the edge contours of a depth image and the coefficients of the polynomials utilized to approximate the smooth areas bounded by depth edges. In paper [MC10], depth images were segmented into depth layers based on a rate-distortion function. Each layer was approximated by the mean value of that layer, and the residue of the prediction was encoded by H.264/AVC. An edge-based depth image coding scheme that compresses the edges by JBIG and reconstructs the depth image by diffusion was proposed in [GM12].

Ramp functions and Bézier curves have been utilized for depth coding in the patent [MH01]. The coding scheme uses Bézier curves to represent edges and models the areas belonging to objects (inside closed edges) by means of ramp functions.

2.3.5 Tree and mesh

Depth image coding using tri-tree [SZD10] employs a tri-tree decomposition to subdivide a depth image. The depth values at the three vertices of each triangle from the decomposition form a plane that approximates the depth of that triangular region. The sparse points from the triangle vertices are encoded by differential and arithmetic coding.
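The plane approximation inside a triangle can be illustrated with a generic sketch (a barycentric/plane fit, not the implementation of [SZD10]): the depths at the three vertices determine a plane depth = a*x + b*y + c, which can then be evaluated at any pixel inside the triangle.

```python
import numpy as np

def plane_from_vertices(v0, v1, v2):
    """Fit the plane depth = a*x + b*y + c through three vertices given as (x, y, depth)."""
    A = np.array([[v0[0], v0[1], 1.0],
                  [v1[0], v1[1], 1.0],
                  [v2[0], v2[1], 1.0]])
    return np.linalg.solve(A, np.array([v0[2], v1[2], v2[2]]))  # coefficients (a, b, c)

def depth_at(x, y, plane):
    """Evaluate the fitted plane at pixel (x, y)."""
    a, b, c = plane
    return a * x + b * y + c
```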

Quadtree decomposition for depth image coding was proposed in [MF07]. Each block from the decomposition is approximated by a modeling function. The decomposition and the selection of the modeling function are subject to a rate-distortion criterion. Coefficients from the modeling functions are encoded by an adaptive arithmetic encoder.

Mesh-based depth coding using hierarchical decomposition of the depth map [KH07] decomposes a depth image into a layer descriptor and disjoint images. Five types of modeling modes with mesh grids are used to represent the edge regions, and the non-edge regions are represented by feature points.


The layer descriptors are encoded by H.264/AVC, and the disjoint images are then merged together and also compressed by H.264/AVC.

2.3.6 Improvement in standard codecs

Five extra prediction modes were introduced [LLT+10b] in H.264/AVC for depth image compression. They utilize texture information for prediction. In edge-aware intra prediction for depth image coding [SKOL10], the prediction is performed by using the available reconstructed signals that are within the same region, without crossing depth edges.

The Edge Adaptive Transform (EAT) [SKN+] has been implemented in H.264 for coding blocks containing depth edges. A rate-distortion optimizer chooses between the Discrete Cosine Transform (DCT) and the EAT for the coding of a depth block in which depth edges are present. The joint motion vector scheme [DTPP09], which exploits the correlation of motion vectors between texture and depth, was introduced in MPEG-2. Similar approaches for motion vector sharing can be found in the 3DV HEVC Test Model (3DV-HTM) [htm].

2.3.7 Other techniques

There are many other novel techniques available for depth image compression. In [KPS10], a lossless depth image coding scheme was proposed, in which depth images were converted into the gray code representation and separated into bit planes. Intra bit plane and inter bit plane (i.e. inside one bit plane and between different bit planes) as well as inter frame correlations were utilized for prediction. A context-based arithmetic encoder was employed to encode the bit planes. In depth coding for a multi-view scenario, certain depth images can be predicted from the decoded depth images of adjacent views using geometrical information [EWaK09]. Coding of depth by using Wyner-Ziv coding for the MVD format data [PCDPP11] exploits texture motion information and uses motion vector interpolation to encode depth images.

2.4 Preserving edges in depth images

Depth images must be compressed efficiently with respect to the quality of the synthesized views. Therefore, the coding of depth images in this respect differs from the coding of conventional images. A straightforward approach is to measure distortions on the synthesized views for the rate-distortion optimization process in a coding system, as described in Distortion measurement. However, measuring distortions on synthesized views is likely to be either inaccurate [HE12] or to increase the coding complexity dramatically [TSMT12]. The majority of approaches in the groups Filtering, Edge-preserving transform, Regional segmentation, and Tree and mesh utilize the properties of depth images and focus on preserving depth edges, which leads to a better quality of synthesized views.


In addition, the Tree and mesh approaches can facilitate rendering when rendering speed is a consideration.


Chapter 3

Depth image modeling, processing, and coding by diffusion

Depth images can be assumed to be piece-wise smooth, i.e. they consist of smooth areas bounded by sharp edges. Distortions in the depth images introduce disturbing effects in the synthesized views, and previous work shows that preserving depth edges in depth coding is a feasible approach for improving the quality of synthesized views. Therefore, following the idea of preserving depth edges, a depth image model is proposed in this chapter, inspired by the edge-based compression scheme using diffusion for cartoon images presented in [MW09]. A depth image coding scheme using such a model is devised, and the scheme is further incorporated into the state-of-the-art 3D video encoder, the HEVC.

3.1 Depth image modeling

A depth image consists of sharp edges and smooth areas. Firstly, the sharp edges represent the depth transitions of the scene in a video sequence. The level of perceivable depth depends on the magnitude of the retinal disparities [DRE+11], which is also related to the depth edges. In this thesis, the significance of an edge is relative to the magnitude of the first order derivatives of the depth image: larger magnitudes imply higher significance. Depth transitions of small magnitude might not introduce any perceivable depth but still produce synthesis artifacts. Secondly, the smooth areas can be reconstructed from a set of sparse sampling points, if the sampling frequency satisfies the Nyquist sampling theorem. Given these rationales, depth images are modeled as edges and sparse sampling points in this thesis.

The edges are identified by the Canny edge detector with a threshold multiplication factor T_e from a depth image, and the sparse sampling points are obtained by uniformly sub-sampling the depth image with a sampling factor T_s. The pixels on both sides of the edges are also extracted, together with the pixels on the detected edges. For this edge extraction, an edge mask E_m is created by morphological dilation

E_m = E_d \oplus S_3,  (3.1)

where S_3 is a square structure element of size 3, E_d is the detected edges, and ⊕ is the morphological dilation. The pixels inside the mask are the edge pixels required for the later reconstruction using the model.

Figure 3.1 illustrates the depth image reconstruction and evaluation system for the proposed model. The reconstruction part is marked by a red box with dashed lines. The reconstructed depth image is diffused, by using the Laplace equation, from the edge pixels, the uniform sparse samples, and the border of the depth image. The Laplace equation

\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0,  (3.2)

in the implementation of this thesis is approximated by

f(x, y) = [f(x-1, y) + f(x+1, y) + f(x, y-1) + f(x, y+1)]/4.  (3.3)

Eq. 3.3 sets each unknown pixel equal to the average of its four surrounding pixels. An overdetermined system can be constructed from these equations, and the unknowns can be solved by a least squares method. Elevating T_e eliminates the insignificant edges, and the sampling factor T_s governs the smoothness versus the fidelity of the reconstruction.
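A minimal sketch of this reconstruction, assuming Python with SciPy rather than the Matlab implementation used in the thesis: one linear equation is written per pixel (a fixed-value equation for constrained pixels, Eq. (3.3) for the rest) and the system is handed to a sparse least-squares solver. The constraint mask would contain the dilated edge pixels, the uniform sparse samples, and the image border; the names and the loop-based construction are illustrative only.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import lsqr

def diffuse_reconstruction(depth, known_mask):
    """Reconstruct a depth image by homogeneous diffusion: keep pixels in known_mask
    fixed and enforce Eq. (3.3) at every other pixel, then solve in a least-squares sense."""
    h, w = depth.shape
    idx = lambda y, x: y * w + x
    rows, cols, vals, rhs = [], [], [], []
    eq = 0
    for y in range(h):
        for x in range(w):
            if known_mask[y, x]:
                rows.append(eq); cols.append(idx(y, x)); vals.append(1.0)
                rhs.append(float(depth[y, x]))             # fixed constraint
            else:
                rows.append(eq); cols.append(idx(y, x)); vals.append(1.0)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    yy = min(max(y + dy, 0), h - 1)        # replicate at the border
                    xx = min(max(x + dx, 0), w - 1)
                    rows.append(eq); cols.append(idx(yy, xx)); vals.append(-0.25)
                rhs.append(0.0)                            # Eq. (3.3)
            eq += 1
    A = sparse.csr_matrix((vals, (rows, cols)), shape=(eq, h * w))
    f = lsqr(A, np.asarray(rhs))[0]
    return f.reshape(h, w)
```

With this construction there is exactly one equation per pixel; the thesis formulates an overdetermined system, but a least-squares solver handles either case.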

3.2 Depth image processing

A depth image can be reconstructed by diffusion using the model, and the model is then applied to depth image post-processing. The quality of the depth images can be evaluated from the virtual views synthesized from the post-processed depth images and the original textures. The evaluation process for the post-processing is illustrated in Figure 3.1.

3.3 Scalable depth image coding using depth edges

The model is also used for depth image compression. Encoding, decoding, and scalability of the coding system are presented in this section.


Figure 3.1: Depth image reconstruction and evaluation system.

3.3.1 Encoding

Preserving the locations of depth edges is probably the more important aspect of depth coding for high quality synthesized views. One means of preserving these locations is to encode the edges explicitly. Figure 3.2 shows the coding system. As mentioned, the continuous edge contours are detected by using the Canny edge detector with thresholds. These edge contours are then represented by the Freeman chain code. The scheme requires the starting location of the contour (X, Y) and the length L, together with the directions of the contour, for the reconstruction at the decoder side. Pixels on both sides of the contours are uniformly sub-sampled; before this, an edge pre-processing is applied to render more coherent depth values on both sides of the edge contours. In the edge pre-processing, the pixels inside the mask E_m are removed. The pixels in E_m but outside the contour E_d are assumed to be similar to their adjacent horizontal and vertical pixels:

f(x, y) = f(x+1, y) = f(x, y+1), \quad \text{with } (x, y), (x+1, y), (x, y+1) \notin E_d.  (3.4)

An overdetermined set of linear equations for the pixels within the mask is formed. The unknowns of the equations can be solved by using the least square error method.

The edge contours and the edge samples are encoded by differential coding followed by arithmetic coding. A low quality depth image compressed by HEVC is also transmitted to the decoder. Sparse sampling points are obtained from this low quality depth image for the reconstruction.
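A small sketch of the Freeman chain code representation used for the contours (illustrative only; the subsequent differential and arithmetic coding stages are not shown): an ordered contour is reduced to its starting coordinate (X, Y), its length L, and one 8-connected direction symbol per step.

```python
# 8-connected Freeman directions relative to the previous pixel, given as (row, col) offsets.
DIRECTIONS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
              (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def freeman_chain_code(contour):
    """Encode an ordered contour (list of (row, col) pixels) as ((X, Y), L, directions)."""
    y0, x0 = contour[0]
    codes = [DIRECTIONS[(y2 - y1, x2 - x1)]
             for (y1, x1), (y2, x2) in zip(contour, contour[1:])]
    return (x0, y0), len(contour), codes
```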


Figure 3.2: Depth image coding system.

3.3.2 Decoding

At the decoder side, the edge contours are reconstructed after entropy decoding. Pixels are linearly interpolated on both sides of the contours, and the pixels on the contours are filled from the pixels on their right-hand side. The uniform sparse sampling points and the border of the depth image are obtained from the decoded low quality depth image. The depth image is then reconstructed by solving the Laplace equation using the least square method.

3.3.3 Scalability

Scalability in this thesis refers to layers of progressive refinement. The scalability is relative to the number of significant edge contours and their associated pixels on both sides of the contours. The more of this edge information that is received and involved in the reconstruction, the better the reconstruction quality will be. In more detail, assume that E_c(n) = f_c(C_n) are the edges extracted by the Canny edge detector with thresholds C_n, which are changed by the threshold multiplication factor T_e = n, where n = 1, 2, ..., N. The function f_c(·) detects edges using the Canny edge detector, and the edges are represented by E_c(n). An edge set e can then be built by means of e(n) = E_c(n) - E_c(n+1). The significance of the edges in descending order is E_c(N), e(N-1), e(N-2), ..., e(1). However, in the coding of the edge contours, connectivity must be considered, so that a connected edge contour is not split into two edge sets. Otherwise, additional coordinate information for the starting points of the edge contours would have to be encoded, which would increase the bit-rate, because the coordinate information (X, Y, L) costs more bits to encode than the directions of the contours.
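The edge sets can be illustrated with a short sketch using scikit-image's Canny detector (the thesis used Matlab; the threshold values and parameter names are illustrative, and the depth image is assumed to be a float array in [0, 1]): E_c(n) is computed for increasing threshold factors, and the refinement layers e(n) = E_c(n) - E_c(n+1) are formed by set difference.

```python
from skimage import feature  # Canny edge detector

def edge_layers(depth, sigma=1.0, low=0.02, high=0.05, N=4):
    """Return [Ec(N), e(N-1), ..., e(1)], the edge layers in descending significance."""
    # Ec[n-1] holds the Canny edges with thresholds scaled by the factor Te = n.
    Ec = [feature.canny(depth, sigma=sigma,
                        low_threshold=n * low, high_threshold=n * high)
          for n in range(1, N + 1)]
    layers = [Ec[N - 1]]                      # most significant edges, Ec(N)
    for n in range(N - 1, 0, -1):
        layers.append(Ec[n - 1] & ~Ec[n])     # e(n) = Ec(n) - Ec(n+1)
    return layers
```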


3.4 Diffusion modes in 3D-HEVC using texture edges

There are two issues associated with the coding scheme in Section 3.3. First, the explicit coding of depth edge contours is expensive in terms of bit-rate. This can be solved by utilizing edges from the associated textures. The second issue is the lack of a rate-distortion control that optimizes the compression efficiency. This can be overcome by implementing the diffusion process in the HEVC. In this thesis, the Depth Modeling Modes (DMM) [MBM12] in 3D-HEVC are targeted. Two of the DMMs using Constant Partition Value (CPV) coding are replaced by two diffusion modes, which employ a diffusion process called two-step diffusion in the thesis. The first step diffuses a block with edge constraints from the reconstructed depth of the top and the left sides, and the second step diffuses only isolated regions. The edges within a block of the depth image are detected from the associated texture, either by an efficient wedgelet search or by means of a thresholding process. The diffusion is conducted by an iterative process described in [CYD+12]. In addition, a time constraint is imposed by the maximum number of iterations.
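The first, edge-constrained diffusion step can be illustrated with a simplified iterative sketch (it is neither the 3D-HEVC integration nor the exact process of [CYD+12]; the handling of isolated regions in the second step is omitted, and all names are illustrative). Known pixels come from the reconstructed depth above and to the left of the block, and the averaging never reads from pixels marked as edges, so values do not leak across depth transitions.

```python
import numpy as np

def diffuse_block(block_init, known_mask, edge_mask, max_iter=64, tol=0.5):
    """Edge-constrained diffusion of a depth block: pixels in known_mask stay fixed,
    the rest are repeatedly replaced by the average of their non-edge neighbours.
    The iteration count is capped, mirroring the time constraint mentioned above."""
    f = block_init.astype(np.float64).copy()
    h, w = f.shape
    for _ in range(max_iter):
        prev = f.copy()
        for y in range(h):
            for x in range(w):
                if known_mask[y, x]:
                    continue
                acc, cnt = 0.0, 0
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w and not edge_mask[yy, xx]:
                        acc += prev[yy, xx]
                        cnt += 1
                if cnt:
                    f[y, x] = acc / cnt
        if np.max(np.abs(f - prev)) < tol:    # early exit once the block has converged
            break
    return f
```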

3.5 Evaluation methodologies

The Peak Signal to Noise Ratio (PSNR) and the structural similarity (SSIM) [ZBSS04] index have been widely used for evaluating image quality. The edges of synthesized views corresponding to the depth transitions are more distorted due to the removal of high frequencies when depth images are coded by traditional video or image encoders. Therefore, an evaluation approach, namely Edge SSIM, is proposed that only takes the pixels around significant edges into the calculation with the SSIM metric.
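A sketch of the Edge SSIM idea (not the exact definition used in Paper II): compute the local SSIM map over the whole synthesized view, then average it only inside a dilated mask around the significant edges. It assumes 8-bit grayscale images and uses scikit-image and SciPy; the dilation size is illustrative.

```python
import numpy as np
from scipy.ndimage import binary_dilation
from skimage.metrics import structural_similarity

def edge_ssim(reference, test, edge_mask, dilate=5):
    """Mean of the local SSIM map restricted to pixels around significant edges."""
    _, ssim_map = structural_similarity(reference, test, full=True)
    region = binary_dilation(edge_mask, structure=np.ones((dilate, dilate), bool))
    return float(ssim_map[region].mean())
```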

The objective metrics (e.g. PSNR, SSIM) are inaccurate with respect to synthesized views [HE12]. To evaluate the perceived quality, subjective tests are the de facto criteria. ITU-R BT.500-13 [Bt212] specifies the methodologies of subjective quality assessment for television pictures. Stimulus-comparison methods can be applied when assessing the quality of two impaired images in relation to each other.

The BD-PSNR model [Bjo01] calculates the average change in PSNR or bit-rate over four bit-rate points. In the model, a curve is fitted through the PSNR values at the four bit-rate points. The average difference between two curves is the difference between the integrals of the curves divided by the integration interval. The curve fitting can be conducted using the PSNR as a function of the bit-rate for calculating average PSNR differences, or using the bit-rate as a function of the PSNR for obtaining average bit-rate changes. Although the PSNR is inaccurate in evaluating the quality of synthesized views, it is used for its simplicity and for obtaining results that are comparable with other research.
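The calculation can be summarized in a few lines (a sketch of the delta-PSNR direction of [Bjo01]; swapping the roles of PSNR and log bit-rate gives the average bit-rate change instead):

```python
import numpy as np

def bd_psnr(rates_a, psnr_a, rates_b, psnr_b):
    """Average PSNR difference (curve B minus curve A) over four rate points:
    fit cubic polynomials of PSNR versus log10(bit-rate), integrate both fits over
    the overlapping log-rate interval, and divide by the interval length."""
    la = np.log10(np.asarray(rates_a, float))
    lb = np.log10(np.asarray(rates_b, float))
    pa, pb = np.polyfit(la, psnr_a, 3), np.polyfit(lb, psnr_b, 3)
    lo, hi = max(la.min(), lb.min()), min(la.max(), lb.max())
    int_a = np.polyval(np.polyint(pa), hi) - np.polyval(np.polyint(pa), lo)
    int_b = np.polyval(np.polyint(pb), hi) - np.polyval(np.polyint(pb), lo)
    return (int_b - int_a) / (hi - lo)
```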

The depth image modeling and processing were presented in Paper I [LSJO], and the scalable depth compression scheme utilizing such a model was proposed in Paper II [li]. The subjective evaluation of the proposed compression scheme was described in Paper III [lisp2], and Paper IV [limmedia] introduced the diffusion modes into 3D-HEVC.


The next chapter lists the separate contributions of the papers.


Chapter 4

Contributions

In this chapter, the novelties and the evaluation results are presented for the following four papers:

I. A depth image post-processing method by diffusion

II. A scalable coding approach for high quality depth image compression

III. Subjective evaluation of an edge-based depth image compression scheme

IV. Depth map compression with diffusion modes in 3D-HEVC

The implementation and evaluation of these research works were performed in Matlab and Microsoft Visual Studio.

4.1 Paper I: A depth image post-processing method

Paper I introduces a depth image model, in which the depth image is modeled by edges and uniform sparse samples. The depth image is reconstructed by diffusion. The model was evaluated objectively by varying the edge-threshold multiplication factor and the uniform sampling factor. In addition, the model was further utilized for depth image post-processing.
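
The reconstruction can be stated as a boundary-value problem. The formulation below is a hedged summary of the description above; the symbols D, D-hat, Omega, and S are introduced here only for illustration.

```latex
\begin{aligned}
  \Delta D(x,y) &= 0,            && (x,y) \in \Omega \setminus S,\\
  D(x,y)        &= \hat{D}(x,y), && (x,y) \in S,
\end{aligned}
```

where Omega is the image domain, S is the set of constraint pixels (edge pixels together with the uniform sparse samples), D-hat holds their stored depth values, and the discrete Laplacian is evaluated so that no averaging takes place across an edge contour.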

4.1.1 Novelty

The novelty of this modeling and evaluation work is to (1) represent depth images by edges and uniform sparse samples and apply diffusion based on the Laplace equation for depth image post-processing, and (2) investigate the consequences of the parameterization.


4.1.2 Evaluation and results

An improvement is registered with respect to the objective metric PSNR for one of the tested sequences. The improvement is further confirmed by visual inspection. However, there is no improvement for the sequence whose depth images are characterized by planar areas bounded by sharp edges, partially because the proposed scheme cannot process them to be even more piece-wise smooth. The results demonstrate that the proposed depth image post-processing scheme using the model can achieve a better quality of synthesized views.

4.2 Paper II: Scalable coding approach for high quality depth image coding

The proposed depth image model was utilized for depth image compression. The locations of depth edges are encoded explicitly, and the smooth areas of the depth image are interpolated by diffusion from sparse points of a low-quality decoded depth image and edge pixels. The quality of synthesized views using the decoded depth images was evaluated objectively and, in addition, by visual inspection.

4.2.1 Novelty

The novelties of the scheme are: (1) it applies an edge pre-processing scheme to render more coherent depth values on both sides of depth edges before the sampling of edge pixels and encoding; (2) the coding scheme is scalable, which is inherited from the coding structure; (3) a new evaluation approach that better represents the quality of edges was proposed.

4.2.2 Evaluation and results

Objectively, the proposed scheme produces better synthesized views than HEVC. This can be seen not only at the texture edges that correspond to significant depth edges, but also on the full synthesized views when the depth images are encoded at lower bit-rates. In addition, the visual inspection results from viewers revealed that the proposed scheme outperforms HEVC at the majority of bit-rate points for the tested images.

4.3 Paper III: Subjective evaluation of the proposed coding scheme

As mentioned, discrepancies exist between the objective test results and the visual inspection. A subjective test was then conducted to clarify these discrepancies. Stimulus-comparison methods of ITU-R BT.500-13 were employed for evaluating the quality of two impaired images in relation to each other.

4.3.1 Novelty

The novelty of this paper is to conduct a subjective test for the edge-based diffusion depth image compression scheme proposed in Paper II and to demonstrate the improvement in experienced quality.

4.3.2 Evaluation and results

The virtual views synthesized in the work of Paper II were selected for the subjective evaluation. Two synthesized views were paired such that they used coded depth images of similar bit-rates, where the depth images were coded using the proposed scheme and HEVC, respectively. The paired images were presented in random order on a full HD 2D display, and the viewers rated the quality of the paired images in relation to each other. The images were assessed in 2D so that the quality of experience of the viewer is not affected by 3D factors (e.g. crosstalk and 3D viewing comfort). The ITU-R BT.500 testing procedure was followed. The mean and the 95 % confidence interval are plotted for the results.
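
The statistics reported per image pair can be reproduced with a few lines of Python. The sketch below assumes a Student's t confidence interval over the panel of viewers and a comparison scale from -3 to +3; the exact rating scale used in the test is not restated here, so the example scores are hypothetical.

```python
import numpy as np
from scipy import stats

def mean_and_ci(scores, level=0.95):
    """Mean comparison score and its confidence interval, assuming
    Student's t statistics over the panel of viewers."""
    scores = np.asarray(scores, dtype=float)
    n = scores.size
    mean = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(n)
    half = stats.t.ppf(0.5 + level / 2.0, df=n - 1) * sem
    return mean, (mean - half, mean + half)

# Hypothetical comparison scores of one image pair on a -3..+3 scale:
# mean_and_ci([1, 2, 0, 1, 2, 1, -1, 2, 1, 1])
```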

The test results show that the proposed scheme outperforms HEVC subjectively for the majority of bit-rates, the exception being at a very low bit-rate. The exception arises because the proposed scheme uses a scalable coding approach, which assigns higher priorities (higher bit-rates) to significant edges. This results in more distortions in the areas of the synthesized view corresponding to the areas of the depth images that contain less significant edges. At a very low bit-rate, these distortions attract the attention of the viewers. Furthermore, the results demonstrate that SSIM fails to predict the importance of preserving edges in the synthesized views.

4.4 Paper IV: Diffusion modes in 3D-HEVC

This paper further introduces the diffusion process into the state-of-the-art video encoder 3D-HEVC. The bit-rate savings of the proposed scheme relative to the original 3D-HEVC were measured for both the depth images and the synthesized views.

4.4.1 Novelty

The novelties of this paper are: (1) it introduces two diffusion modes into 3D-HEVC; (2) texture edges are utilized as diffusion constraints; (3) the diffusion is conducted in two steps when isolated regions exist.


4.4.2 Evaluation and results

The results demonstrate that the proposed scheme achieves better compression than the original 3D-HEVC. The diffusion process approximates the depth images better than the constant partition value coding in 3D-HEVC. Further bit-rate savings are likely to be obtained as the proposed scheme is improved further.


Chapter 5

Conclusions and future work

The previous chapter summarized the contributions of the papers in this thesis separately. In this chapter, an overview of the thesis work is presented, followed by an analysis of the goals. The contributions to the 3D community are also discussed, and, last but not least, future research directions are listed.

5.1 Overview

The works presented in the thesis start with an overall aim of improving the 3DTV experience by means of more efficient coding for 3D video contents.

A depth image model based on edges and sparse samples, with reconstruction by diffusion, was devised for depth image post-processing. A scalable depth image compression scheme using the model was proposed in order to render a better quality of synthesized views. A subjective test was conducted to evaluate the perceived quality of the synthesized views, which showed that the proposed scheme outperforms HEVC intra subjectively. The diffusion process was further introduced into 3D-HEVC as intra coding diffusion modes, by which bit-rate savings were achieved for a constant quality of synthesized views.

The experimental results from the research work show that the task of more efficient coding of 3D video contents, specifically of still depth images, was accomplished. This will indirectly lead to a better 3DTV experience.

5.2 Outcome

The specific goals were stated in Section 1.5 of the introduction. A discussion of the outcome of the presented papers with respect to the goals is given here.

Goal I. Propose a model and representation of depth images based on their intrinsic property, i.e. the piece-wise smooth assumption.

In Paper I, depth images were modeled as significant edges and uniform sparse sampling points, and the missing areas were interpolated by diffusion based on the Laplace equation. When the model was applied to depth image post-processing, the virtual views synthesized with the processed depth images were better than the virtual views synthesized with the original depth images, both objectively and subjectively. Therefore, the proposed scheme using the model can improve the quality of the depth images and preserve their intrinsic property.

Goal II. Propose a coding scheme for depth images that preserves the fundamental properties of depth images.

Paper II presents the scalable coding scheme that applies the depth image model. In this scheme, the locations of depth edge contours are explicitly encoded, and smooth areas are interpolated by Laplace diffusion. The scalability is inherited from the coding structure and is relative to the significance of the edges. Less significant edge information will be discarded when the network bandwidth is subject to a constraint. Therefore, the most significant depth transitions were better preserved by the proposed scheme than by traditional video and image encoders. The advantage of the proposed scheme was further confirmed by a subjective test using the stimulus-comparison methods in Paper III. The evaluation results show that the proposed coding scheme, by preserving the fundamental properties of depth images, outperforms HEVC subjectively.
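
The scalable structure can be illustrated with a small sketch: edge contours are written to the stream in order of decreasing significance, so truncating the stream under a bandwidth constraint drops only the least significant edges. The contour coder below is a trivial stand-in and every name is chosen for illustration; the actual bitstream syntax of Paper II is not reproduced here.

```python
import numpy as np

def encode_contour(points):
    """Stand-in for the real contour coder: one byte per coordinate."""
    return np.asarray(points, dtype=np.uint8).tobytes()

def pack_scalable_edge_stream(contours, significance, budget_bytes):
    """Write contours in order of decreasing significance and stop when
    the byte budget is exhausted, so less significant edges are the
    first to be dropped."""
    order = np.argsort(np.asarray(significance))[::-1]
    stream, kept = bytearray(), []
    for i in order:
        payload = encode_contour(contours[i])
        if len(stream) + len(payload) > budget_bytes:
            break
        stream += payload
        kept.append(int(i))
    return bytes(stream), kept
```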

Goal III. Propose a coding scheme for depth images for MVD sequences based on hybrid video coding schemes.

In Paper IV, the diffusion process was implemented in 3D-HEVC as intra prediction modes. The scheme diffuses a coding block by using the Laplace equation with texture edge constraints. Two-step diffusion was applied when isolated parts exist in the block. The experimental results show that the proposed scheme achieves bit-rate savings with respect to the quality of both depth images and synthesized views for the MVD sequences.

5.3 Impact

Compression is part of the 3DTV transmission chain. With the advancement of display technologies, 3DTV with multi-view, free-viewpoint, and even holographic viewing might be a future trend for home entertainment. This implies an explosion of the capacity required for storing and transmitting video data. An efficient compression scheme therefore enables this video data to be transmitted through channels of limited capacity, but with a better quality of presentation. As a result, a more efficient compression technique will enhance the 3DTV experience.

The proposed depth image post-processing and coding schemes result in a better quality of synthesized views than the state-of-the-art schemes. Therefore, the proposed schemes provide a more convincing and satisfactory quality for the 3DTV viewers while saving network resources. As a consequence, the research will increase the availability of contents for end users, can indirectly boost 3D content production, and can probably accelerate the advancement of 3D displays. The research might also boost the development of 3D mobile applications, because the network capacity in a mobile environment is further limited. Furthermore, home entertainment users can enjoy a more realistic virtual 3D program, while some professionals, e.g. medical doctors, may benefit from high quality 3D transmission when examining a patient remotely.

5.4 Future work

There are still problems associated with the proposed coding schemes. Therefore, the schemes are intended to be further improved in future research. One problem relates to the decoding complexity. As the diffusion is performed by solving overdetermined systems or by an iterative process, the decoding complexity increases accordingly. This problem can be addressed by utilizing other fast interpolation or filtering methods.

The second problem is the coding of depth edges, which costs a significant amount of bits. For video sequences, efficient compression may be difficult to achieve by using temporal prediction (i.e. predicting a frame from its adjacent frames) for edge contours. Although texture edges can be utilized to approximate depth edges, the texture edges are probably inaccurate. This problem may be solved by means of a joint encoding scheme for texture edges and depth edges, which is subject to future research.

In addition, transforms that preserve edges for residue coding might be beneficial for standard video codecs (e.g. HEVC) in depth coding. The area of transform coding can be explored further in the future.

As the quality of synthesized views relies on the combination of texture and depth from MVD sequences, future research should also be focused on the compression of texture images.


Bibliography

[Adv12] Advanced video coding. Document ISO/IEC 14496-10, Telecommunications Union, 2012.

[atm11] Tentative test model for AVC-based 3D video coding (3DV-TM). ISO/IEC JTC1/SC29/WG11 MPEG2010/N11631, Nov. 2011. Doc. M22842, Geneva, Switzerland.

[Bjo01] G. Bjøntegaard. Calculation of average PSNR differences between RD-curves. ITU-T VCEG-M33, Mar. 2001.

[BM12] B. Bruce and F. Ma. Big S small 3D: What makes stereoscopic video so compelling? In Proc. IC3D, pages 2–6, 2012.

[Bt212] Methodology for the subjective assessment of the quality of television pictures. Recommendation ITU-R BT.500-13, Jan. 2012.

[CCRCK09] L. Cappellari, C. Cruz-Reyes, G. Calvagno, and J. Kari. Lossy to Lossless Spatially Scalable Depth Map Coding with Cellular Automata. 2009 Data Compression Conference, pages 332–341, March 2009.

[CYD+12] J. Chen, F. Ye, J. Di, C. Liu, and A. Men. Depth map compression via edge-based inpainting. Picture Coding Symposium, pages 57–60, 2012.

[DGKK09] M. Domanski, T. Grajek, K. Klimaszewski, and M. Kurc. Poznan Multiview Video Test Sequences and Camera Parameters. ISO/IEC JTC1/SC29/WG11, 2009.

[DRE+11] P. Didyk, T. Ritschel, E. Eisemann, K. Myszkowski, and H.-P. Seidel. A perceptual model for disparity. ACM Transactions on Graphics, 2011.

[DTPP09] I. Daribo, C. Tillier, and B. Pesquet-Popescu. Motion Vector Sharing and Bitrate Allocation for 3D Video-Plus-Depth Coding. EURASIP Journal on Advances in Signal Processing, 2009:1–14, 2009.

[DV05] M. N. Do and M. Vetterli. The contourlet transform: an efficient directional multiresolution image representation. IEEE Transactions on Image Processing, 14(12):2091–2106, December 2005.


[EWaK09] E. Ekmekcioglu, S. Worrall, and A. M. Kondoz. A Temporal Subsampling Approach for Multiview Depth Map Compression. IEEE Transactions on Circuits and Systems for Video Technology, 19(8):1209–1213, August 2009.

[Feh05] C. Fehn. 3DTV Broadcasting. In O. Schreer, P. Kauff, and T. Sikora, editors, 3D videocommunication, pages 23–36. John Wiley & Sons Ltd., 2005.

[GAC+] A. Gotchev, G. Akar, T. Capin, D. Strohmeier, and A. Boev. Three-dimensional media for mobile devices. Proceedings of the IEEE, 99(4):708–741, 2011.

[GB09] O. Gangwal and R. Berretty. Depth map post-processing for 3D-TV. In Consumer Electronics, 2009. ICCE'09. Digest of Technical Papers International Conference on, pages 1–2. IEEE, 2009.

[GM12] J. Gautier and O. Le Meur. Efficient depth map compression based on lossless edge coding and diffusion. Picture Coding Symposium, pages 81–84, 2012.

[Got11] S. Goto. Region-interior painting in Contour Based Depth map Coding System. 2011 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS), pages 1–5, December 2011.

[HE12] P. Hanhart and T. Ebrahimi. Quality assessment of a stereo pair formed from decoded and synthesized views using objective metrics. In Proc. 3DTV-CON, 2012.

[htm] 3DV HEVC Test Model (3DV-HTM) version 4.1. Online: https://hevc.hhi.fraunhofer.de/svn/svn_3DVCSoftware/tags/HTM-4.1/, retrieved: 09, 2010.

[IJM05] W. A. IJsselsteijn, P. J. H. Seuntiens, and L. M. Meesters. Human Factors of 3D Displays. In O. Schreer, P. Kauff, and T. Sikora, editors, 3D videocommunication, pages 219–231. John Wiley & Sons Ltd., 2005.

[IL02] H. Imaizumi and A. Luthra. Stereoscopic Video Compression Standard "MPEG-2 Multiview Profile". In Three-Dimensional Television, Video, and Display Technologies, pages 169–181. Springer, Berlin, Germany, 2002.

[Jag11] F. Jager. Contour-based segmentation and coding for depth map compression. 2011 Visual Communications and Image Processing (VCIP), pages 1–4, November 2011.

[Jpe00] Information technology - JPEG 2000 image coding system. ISO/IEC 15444, 2000.

[JpeEG] Digital compression and coding of continuous tone still images. ISO/IEC 10918-1/ITU-T Recommendation T.81, 1992 (JPEG).

[jtc11] Call for Proposals on 3D Video Coding Technology. ISO/IEC JTC1/SC29/WG11, Mar. 2011.


[Jul71] B. Julesz. Foundations of Cyclopean Perception. University of Chicago Press, Chicago, IL, 1971.

[JV10] JCT-VC. WD1: Working Draft 1 of High-Efficiency Video Coding. JCTVC-C403, Oct. 2010. Guangzhou, China.

[Kar05] J. Kari. Theory of cellular automata: a survey. Theor. Comput. Sci., 334:3–33, 2005.

[KES05] R. Koch and J.-F. Evers-Senne. View synthesis and rendering methods. In O. Schreer, P. Kauff, and T. Sikora, editors, 3D videocommunication, pages 152–174. John Wiley & Sons Ltd., 2005.

[KH07] S. Kim and Y. Ho. Mesh-based depth coding for 3D video using hierarchical decomposition of depth maps. Image Processing, 2007. ICIP 2007, pages 117–120, 2007.

[KPS10] K. Y. Kim, G. H. Park, and D. Y. Suh. Bit-plane-based lossless depth-map coding. Optical Engineering, 49(6):067403, 2010.

[KS05] P. Kauff and O. Schreer. Immersive videoconferencing. In O. Schreer, P. Kauff, and T. Sikora, editors, 3D videocommunication, pages 219–231. John Wiley & Sons Ltd., 2005.

[Lip82] L. Lipton. A study in depth. Foundations of the Stereoscopic Cinema, pages 2–6, 1982. Van Nostrand Reinhold, New York, NY, USA.

[LLSWun] S. Li, W. Li, H. Sun, and Z. Wu. Shape adaptive wavelet coding. In Circuits and Systems, 1998. ISCAS '98. Proceedings of the 1998 IEEE International Symposium on, volume 5, pages 281–284, May–Jun. 1998.

[LLT+10a] S. Liu, P. Lai, D. Tian, C. Gomila, and C. Chen. Joint trilateral filtering for depth map compression. Image Processing, 7744:77440F1–10, 2010.

[LLT+10b] S. Liu, P. Lai, D. Tian, C. Gomila, and C. Chen. Sparse dyadic mode for depth map compression. In Image Processing (ICIP), 2010 17th IEEE International Conference on, pages 3421–3424. IEEE, 2010.

[LM05] E. Le Pennec and S. Mallat. Sparse geometric image representations with bandelets. Image Processing, IEEE Transactions on, 14(4):423–438, 2005.

[LSJO] Y. Li, M. Sjöström, U. Jennehag, and R. Olsson. A depth image post-processing method by diffusion. In Proc. SPIE, Feb. 2013.

[MBM12] P. Merkle, C. Bartnik, and K. Müller. 3D video: Depth coding based on inter-component prediction of block partitions. In Picture Coding Symposium, pages 149–152, 2012.

[MC10] S. Milani and G. Calvagno. A depth image coder based on progressive silhouettes. Signal Processing Letters, IEEE, 17(8):711–714, 2010.


[MF07] Y. Morvan and D. Farin. Depth-Image compression based on an RD optimized quadtree decomposition for the transmission of multiview images. In Image Processing, 2007. ICIP 2007. IEEE International Conference on, volume 5, pages V–105. IEEE, 2007.

[MH01] A. Millin and P. Harman. Depth map compression technique, May 2001. Patent: WO/2001/039125.

[mid] Middlebury stereo evaluation. Online: http://vision.middlebury.edu/stereo/eval/, retrieved: 03, 2013.

[MLB+00] M. W. Marcellin, M. A. Lepley, A. Bilgin, T. J. Flohr, T. C. Chinen, and J. H. Kasner. An overview of quantization in JPEG 2000. Signal Processing: Image Communication, pages 73–84, 2000.

[MMW11] K. Müller, P. Merkle, and T. Wiegand. 3-D Video Representation Using Depth Maps. Proceedings of the IEEE, 99(4), 2011.

[mus] Multimedia Scalable 3D for Europe. Online: http://www.muscade.eu/overview.html, retrieved: 02, 2013.

[MW09] M. Mainberger and J. Weickert. Edge-Based Image Compression with Homogeneous Diffusion. Computer Analysis of Images and Patterns, pages 476–483, 2009.

[NBM+12] M. Neuman, G. Baura, S. Meldrum, O. Soykan, M. Valentinuzzi, R. Leder, S. Micera, and Z. Yuan-Ting. Advances in medical devices and medical electronics. Proceedings of the IEEE, 100(Special Centennial Issue):1537–1550, 2012.

[NMD00] V.-A. Nguyen, D. Min, and M. N. Do. Efficient Techniques for Depth Video Compression Using Weighted Mode Filtering. pages 1–13, 2000.

[NW11] H. Nayyar and A. Wei. A Comparative Study of Depth-Map Coding Schemes for 3D Video. In Proc. IEEE, Apr. 2011.

[OLP11] B. T. Oh, J. Lee, and D.-S. Park. Depth map coding based on synthesized view distortion function. Selected Topics in Signal Processing, IEEE Journal of, 5(7):1344–1352, Nov 2011.

[PCDPP11] G. Petrazzuoli, M. Cagnazzo, F. Dufaux, and B. Pesquet-Popescu. Wyner-Ziv coding for depth maps in multiview video-plus-depth. 2011 18th IEEE International Conference on Image Processing, pages 1817–1820, September 2011.

[SBB12] H. Schwarz, C. Bartnik, and S. Bosse. 3D video coding using advanced prediction, depth modeling, and encoder control methods. Picture Coding Symposium, pages 3–6, 2012.

[SD09] M. Sarkis and K. Diepold. Depth map compression via compressed sensing. In Image Processing (ICIP), 2009 16th IEEE International Conference on, pages 737–740. IEEE, 2009.


[SFWK11] D. V. S. X. D. Silva, W. A. C. Fernando, S. Worrall, and A. Kondoz. A Depth Map Post-Processing Technique for 3D-TV Systems based on Compression Artifact Analysis. Signal Processing, pages 1–30, 2011.

[SKN+] G. Shen, W.-S. Kim, S. Narang, A. Ortega, J. Lee, and H. Wey. Edge-adaptive transforms for efficient depth map coding. In Picture Coding Symposium (PCS), 2010, pages 566–569.

[SKOL10] G. Shen, W. Kim, A. Ortega, and J. Lee. Edge-aware intra prediction for depth-map coding. Image Processing (ICIP), pages 3393–3396, 2010.

[SSO09] A. Sanchez, G. Shen, and A. Ortega. Edge-preserving depth-map coding using graph-based wavelets. In Signals, Systems and Computers, 2009 Conference Record of the Forty-Third Asilomar Conference on, pages 578–582, Nov. 2009.

[SZD10] M. Sarkis, W. Zia, and K. Diepold. Fast depth map compression and meshing with compressed tritree. In H. Zha, R.-i. Taniguchi, and S. Maybank, editors, Computer Vision - ACCV 2009, volume 5995 of Lecture Notes in Computer Science, pages 44–55. Springer Berlin Heidelberg, 2010.

[Til08] C. Tillier. Adaptive Wavelet Coding of the Depth Map for Stereoscopic View Synthesis. Image Processing, pages 413–417, 2008.

[TSMT12] G. Tech, H. Schwarz, K. Müller, and T. Wiegand. 3D video coding using the synthesized view distortion change. Picture Coding Symposium, pages 25–28, 2012.

[VIWS11] A. Vetro, T. Wiegand, and G. J. Sullivan. Overview of the Stereo and Multiview Video Coding Extensions of the H.264/MPEG-4 AVC Standard. Proceedings of the IEEE, 99(4), 2011.

[vsr10] Report on experimental framework for 3D video coding. ISO/IEC JTC1/SC29/WG11 MPEG2010/N11631, Oct. 2010. Guangzhou, China.

[ZBSS04] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. Image Processing, IEEE Transactions on, 13(4):600–612, April 2004.
