
New Fast Depth Image-Based Rendering Method for 3DTV

Linwei Zhu1, Gangyi Jiang1, Yun Zhang2

1. Faculty of Information Science and Engineering, Ningbo University, Ningbo, China

[email protected]

Mei Yu1, Zongju Peng1, Feng Shao1

2. Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS), Shenzhen, China

[email protected]

Abstract—Virtual view rendering is one of the key technologies for realizing a Three-Dimensional Television (3DTV) system. This paper presents a novel fast virtual view rendering method in which holes are divided into large and small hole regions and processed according to their size. We also propose a new rendering scheme in which some pixels reuse the warping relation of the previous frame while the remaining pixels recompute it in the current frame. To select the threshold that separates these two pixel regions, we propose a mathematical model, derived from extensive experiments, that adapts the threshold automatically. Experimental results show that the proposed method is effective: it substantially reduces rendering time while keeping the loss in virtual image quality within an acceptable range.

Keywords—3DTV; virtual view rendering; holes; threshold

I. INTRODUCTION

Depth Image Based Rendering (DIBR) is a technology that uses a color image and its associated depth map to generate virtual views [1]. At the video encoder, only the color image information of a few views is compressed, which greatly reduces the amount of data to be encoded and transmitted; at the video decoder, color image information for additional views can be obtained by fast virtual view rendering, so that the audience can watch the video from arbitrary viewpoints. The technology has been recognized by the MPEG organization as an alternative for virtual view rendering in 3DTV systems [2].

In recent years, scholars have been studying virtual view rendering based on depth maps. Fehn proposed carrying out virtual view rendering with the DIBR technique [3]. Zhao et al. used constraint conditions in the warping process to solve the problem of boundary noise caused by inaccurate depth values around object edges in the virtual view image [4]. Lee presented an adaptive edge-oriented smoothing process [5] that smooths the depth map before warping; it reduces the number of holes to a certain extent, but the quality of the virtual image declines because of geometric distortion. Zinger [6] solved the re-sampling problem by inverse warping and used depth-map information during hole filling, obtaining better rendering quality.

II. DEPTH IMAGE BASED RENDERING

Virtual view rendering based on a depth map generally includes the following steps: 1) depth map pre-processing, 2) 3D image warping, 3) hole filling, and 4) post-processing.

A. Depth map

A depth map is a grayscale image that reflects the distance of every pixel to the camera; the real distance of each pixel is quantized to the range 0 to 255. Larger values in the depth map indicate that the corresponding pixels are closer to the camera, as shown in Fig. 1. Fig. 2 shows the relationship between real depth values and quantized depth values.
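As an illustration of Fig. 2, a quantized value can be mapped back to metric depth as in the sketch below. The inverse-depth quantization used here is a convention common in the DIBR literature, and z_near, z_far are hypothetical clipping distances; the paper does not state its exact formula.

```python
import numpy as np

def dequantize_depth(d, z_near, z_far):
    # Map 8-bit depth-map values d (0..255) back to metric depth Z.
    # Larger d means closer to the camera, matching Fig. 1.
    # Inverse-depth quantization is assumed; illustrative only.
    d = np.asarray(d, dtype=np.float64)
    return 1.0 / (d / 255.0 * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far)
```

With this convention, d = 255 maps to z_near (closest) and d = 0 maps to z_far (farthest), consistent with the curve in Fig. 2.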

Figure 1. Depth map of “bookarrival” test sequence

Figure 2. Relationship between realistic and quantitative depth values



B. 3D Image Warping

The core idea of rendering is: (1) all pixels in the reference image are projected into 3-D space using the depth information; (2) the pixels in 3-D space are then projected onto the target image plane. This 2-D to 3-D and back to 2-D process is called 3D image warping, as shown in Fig. 3.

Figure 3. Schematic diagram of 3D image warping

3D image warping can be expressed as [3]:

Z_2 p_2 = Z_1 A_2 R_2 R_1^{-1} A_1^{-1} p_1 - A_2 R_2 R_1^{-1} t_1 + A_2 t_2    (1)

where p_1 and p_2 are the pixel coordinates in the reference and virtual image planes, Z_1 and Z_2 are the depth values in the reference and virtual camera coordinate systems, A_1 and A_2 are the intrinsic parameter matrices of the reference and virtual cameras, and R_1, R_2 and t_1, t_2 are the rotation matrices and translation vectors of the reference and virtual cameras.
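A minimal per-pixel sketch of Eq. (1), assuming the standard pinhole model Z p = A (R X + t); the helper name is ours, not the authors':

```python
import numpy as np

def warp_pixel(p1, Z1, A1, R1, t1, A2, R2, t2):
    # p1 is the homogeneous pixel (u, v, 1) in the reference view.
    # Back-project: Z1*p1 = A1*(R1*X + t1)  =>  world point X.
    X = np.linalg.inv(R1) @ (Z1 * np.linalg.inv(A1) @ p1 - t1)
    # Re-project into the virtual camera: Z2*p2 = A2*(R2*X + t2).
    q = A2 @ (R2 @ X + t2)
    Z2 = q[2]
    return q / Z2, Z2  # homogeneous p2 and its depth Z2
```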

Figure 4. Flow chart of proposed method

III. PROPOSED FAST DEPTH IMAGE BASED RENDERING METHOD

Fig. 4 shows the flow chart of the proposed method. First, the holes are marked during 3D image warping for subsequent treatment. Then, by counting the number of non-hole pixels in a window around each hole pixel, each hole in the hole mask is classified as large or small, and the large and small hole regions are processed separately; in the post-processing stage, image restoration [7] and hole filling are used. Fig. 5 shows the hole regions marked during warping; the white areas are hole regions.
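A sketch of this classification step; the window size and the fraction of non-hole pixels that qualifies a hole as small are illustrative assumptions, as the paper does not give concrete values:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def classify_holes(hole_mask, win=7, small_ratio=0.5):
    # hole_mask: boolean array, True where warping left a hole.
    # Fraction of non-hole pixels inside a win x win window.
    valid_frac = uniform_filter((~hole_mask).astype(np.float64), size=win)
    # Holes mostly surrounded by valid pixels are treated as small.
    small = hole_mask & (valid_frac >= small_ratio)
    large = hole_mask & ~small
    return small, large
```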

Fig. 6 illustrates the proposed rendering scheme. For the first frame of each group, the warping relation is computed for every pixel; for the other frames in the group, part of the pixels reuse the warping relation of the previous frame while the remaining pixels update it, until the next group begins. Fig. 7 shows the flow chart of the whole rendering process.

Figure 5. (a), (c) hole masks from left and right viewpoint rendering; (b), (d) small-hole masks from left and right viewpoint rendering.

Figure 6. Proposed rendering scheme

A. Determining the two regions

The critical step is to determine the two regions, namely, the pixels that reuse the warping relation of the previous frame and the pixels that update it. In Eq. (1), once the reference and virtual viewpoints are fixed, the camera parameters are fixed, so M and N in Eqs. (3) and (4) are both constant, and Eq. (1) can be rewritten as:

Z_2 p_2 = M Z_1 p_1 + N    (2)

M = A_2 R_2 R_1^{-1} A_1^{-1}    (3)

N = A_2 t_2 - A_2 R_2 R_1^{-1} t_1    (4)

Considering the depth maps of the previous and current frames, the depth value Z_1 at the same image-plane position p_1 is almost the same between adjacent frames, so evaluating Eq. (2) gives the same position p_2. This indicates that most pixels can reuse the warping relation of the previous frame.
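Since M and N are constant once the two viewpoints are fixed, they can be precomputed and reused for every pixel; a sketch in the notation of Eqs. (2)-(4):

```python
import numpy as np

def warp_constants(A1, R1, t1, A2, R2, t2):
    # Eqs. (3) and (4): constant once both viewpoints are fixed.
    R1_inv = np.linalg.inv(R1)
    M = A2 @ R2 @ R1_inv @ np.linalg.inv(A1)
    N = A2 @ t2 - A2 @ R2 @ R1_inv @ t1
    return M, N

# Eq. (2) per pixel then reduces to: Z2 * p2 = M @ (Z1 * p1) + N
```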

According to [8], when the depth value changes within a certain range, the warping relation may stay the same because of rounding, so the rendering result is not affected. We therefore use the following equation to determine the two regions:

r(x,y) = C, if |D_p(x,y) - D_c(x,y)| > T
r(x,y) = R, if |D_p(x,y) - D_c(x,y)| ≤ T    (5)

where T is the threshold, and D_p(x,y) and D_c(x,y) are the depth values at position (x,y) in the previous and current frames, respectively. When a pixel position (x,y) is marked "C", the warping relation must be updated at that position; when it is marked "R", the warping relation of the previous frame can be reused there.
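Eq. (5) amounts to thresholding the per-pixel depth difference between frames; a minimal sketch:

```python
import numpy as np

def split_regions(D_prev, D_cur, T):
    # True  -> marked "C": recompute the warping relation.
    # False -> marked "R": reuse the previous frame's relation.
    diff = np.abs(D_prev.astype(np.int16) - D_cur.astype(np.int16))
    return diff > T
```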

Figure 7. Flow chart of proposed rendering method

B. The two regions under different thresholds

Fig. 8 shows the two segmented regions under different thresholds for the third viewpoint of the "ballet" sequence. Fig. 8(a) is the depth-map difference between frame 1 and frame 0; the other images show the two regions determined under different thresholds. The white areas are the regions whose warping relation must be updated; the black areas are the regions that can reuse the warping relation of the previous frame.

C. Virtual image quality under different thresholds

We use the fourth and sixth viewpoints of the "breakdancers" sequence to render the fifth viewpoint and test the rendering quality under different threshold values, with group sizes of 2 and 100. The resulting PSNR curves are shown in Figs. 9(a) and 9(b), where "previous" denotes the method that updates the warping relation at every pixel location.

Figure 8. Two segmented regions. (a) depth map difference, (b) threshold=0, (c) threshold=1, (d) threshold=2, (e) threshold=3, (f) threshold=4.

Figure 9. Rendering quality under different thresholds. (a) PSNR curves without "cumulative error"; (b) PSNR curves with "cumulative error".

D. Mathematical model

Through a large number of experiments we obtained the mathematical relationships in Eqs. (6) and (7), where a_1~f_1 and a_2~e_2 are constants. We propose the cost function in Eq. (8), where λ is a constant, T is the threshold, and G is the group size, to measure how good the current rendering process is and to adjust the threshold so as to trade off rendering time against quality. From Eqs. (6), (7), and (8) it can be seen that, once the group size is fixed, the value of the cost function decreases as the threshold increases.

PSNR = a_1 + b_1 T + c_1 G + d_1 T^2 + e_1 G^2 + f_1 TG    (6)

Time = a_2 + b_2 e^{c_2 G} + d_2 e^{e_2 T}    (7)

J = Time + λ / PSNR    (8)

Let J_0 be the cost of the current rendering and J_mean the mean cost of the previous renderings; initially we set T = 5. Eq. (9) is then used to adapt the threshold toward the least rendering cost, with the value of T restricted to the range 0 to 10:

T = T + 2, if J_0 > J_mean
T = T,     if J_0 = J_mean
T = T - 2, if J_0 < J_mean    (9)
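A sketch of this adaptive update; the step size of 2 and the clamping follow our reading of Eq. (9) and the stated range of T:

```python
def adapt_threshold(T, J0, J_mean):
    # If the current cost exceeds the running mean, raise the threshold,
    # which lowers the cost according to Eqs. (6)-(8); otherwise lower it.
    if J0 > J_mean:
        T += 2
    elif J0 < J_mean:
        T -= 2
    return min(10, max(0, T))  # T is selected from 0 to 10
```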

IV. EXPERIMENTAL RESULTS AND ANALYSIS

To verify the validity of the proposed method, we use the 100-frame video test sequences "ballet" and "breakdancers", with color images and corresponding depth maps provided by Microsoft Research (MSR) [9], and the 100-frame sequences "bookarrival", "laptop", and "doorflower", with color images and corresponding depth maps provided by HHI, Germany [10]. "ballet" and "breakdancers" were captured with a convergent multi-view camera setup, while "bookarrival", "laptop", and "doorflower" were captured with a parallel multi-view camera setup; all images have a resolution of 1024×768.

A. Result of the proposed method

We use the third and fifth viewpoints of the "ballet" sequence to render the fourth viewpoint; the results are shown in Fig. 10. As Fig. 10(a) shows, the method precisely finds the small holes caused by inaccurate depth values and warping errors, fills them in a different way than the large holes, and finally obtains a good warping result, as shown in Fig. 10(b).

B. Subjective comparison of warping results

To verify that the proposed rendering scheme is effective, we compare the method that updates the warping relation at every pixel location with the proposed method, in which part of the pixels reuse the warping relation of the previous frame and the rest update it. We choose "breakdancers" and "bookarrival" as test sequences: for "breakdancers", the fourth and sixth viewpoints are used to render the fifth viewpoint; for "bookarrival", the eighth and tenth viewpoints are used to render the ninth. The subjective results, both for frame 1, are shown in Fig. 11.

From Fig. 11 we can see that the results of the two methods are the same, which indicates that the proposed scheme of partly reusing and partly updating the warping relation is reasonable.

Figure 10. Result of rendering. (a) hole mask, (b) result after small holes are processed (PSNR=32.07 dB).

Figure 11. Result of rendering. Previous method: (a) PSNR=32.96 dB, (c) PSNR=33.01 dB. Proposed method: (b) PSNR=32.97 dB, (d) PSNR=33.00 dB.

C. Experimental results of the proposed rendering scheme

For each sequence we also test different group sizes, measuring the time spent on rendering and the quality of the rendered image; PSNR is computed between the original camera-captured sequence and the virtual viewpoint image produced by depth image based rendering.

In Table I, the sequences s1~s5 stand for "ballet", "breakdancers", "bookarrival", "laptop", and "doorflower"; G denotes the group size, where group=1 is the previous method and group=2, 4, 6, 8 is the proposed method. Time is the total over 100 frames and PSNR is the average over 100 frames; Time1 is the time used to render the 100 images, and Time2 is the time used in the core part of the rendering (the warping). The PSNR and time changes are computed as:

Δpsnr = psnr_present - psnr_previous    (10)

Δtime = (time_previous - time_present) / time_previous × 100%    (11)
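For concreteness, Eqs. (10) and (11) can be computed as:

```python
def quality_and_time_change(psnr_prev, psnr_pres, time_prev, time_pres):
    # Eq. (10): PSNR change of the presented method vs. the previous one.
    d_psnr = psnr_pres - psnr_prev
    # Eq. (11): relative time saving, in percent.
    d_time = (time_prev - time_pres) / time_prev * 100.0
    return d_psnr, d_time
```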

From Table I we can see that as the group size increases, i.e., as the number of warping-relation updates decreases, the image quality also decreases: each increase of the group size by 1 lowers the PSNR by roughly 0.02~0.03 dB. Within this limited quality loss, the proposed method effectively reduces the rendering time.


Table I. Results for the test sequences

G  Items       s1       s2       s3       s4       s5
1  PSNR(dB)    33.37    32.86    33.37    35.2     34.3
   Time1(ms)   353936   358461   275399   280644   282957
   Time2(ms)   120173   119182   119697   119610   118575
   Δpsnr(dB)   0        0        0        0        0
   Δtime1      0        0        0        0        0
   Δtime2      0        0        0        0        0
2  PSNR(dB)    33.35    32.84    33.37    35.18    34.28
   Time1(ms)   308232   325344   229076   231416   233540
   Time2(ms)   72318    81157    73732    70492    69670
   Δpsnr(dB)   -0.02    -0.02    0        -0.02    -0.02
   Δtime1      12.91%   9.24%    16.82%   17.54%   17.46%
   Δtime2      39.82%   31.90%   38.40%   41.07%   41.24%
4  PSNR(dB)    33.31    32.72    33.35    35.14    34.24
   Time1(ms)   286762   306547   206463   208108   209800
   Time2(ms)   47751    55022    49427    45961    44738
   Δpsnr(dB)   -0.06    -0.14    -0.02    -0.06    -0.06
   Δtime1      18.98%   14.48%   25.03%   25.85%   25.85%
   Δtime2      60.26%   53.83%   58.71%   61.57%   62.27%
6  PSNR(dB)    33.28    32.68    33.33    35.11    34.2
   Time1(ms)   284027   303581   201343   203079   202812
   Time2(ms)   41503    46933    43800    38366    37074
   Δpsnr(dB)   -0.09    -0.18    -0.04    -0.09    -0.1
   Δtime1      19.75%   15.31%   26.89%   27.64%   28.32%
   Δtime2      65.46%   60.62%   63.41%   67.92%   68.73%
8  PSNR(dB)    33.26    32.59    33.31    35.08    34.16
   Time1(ms)   281035   302330   195405   203554   199486
   Time2(ms)   36451    40793    38344    36588    32970
   Δpsnr(dB)   -0.11    -0.27    -0.06    -0.12    -0.14
   Δtime1      20.60%   15.66%   29.05%   27.47%   29.50%
   Δtime2      69.67%   65.77%   67.97%   69.41%   72.19%

V. CONCLUSIONS

This paper presents a new fast rendering method based on depth maps, which separates the holes produced during warping into large and small hole regions and processes them according to their size. We also propose a rendering scheme in which part of the pixels reuse the warping relation of the previous frame and the rest update it; this scheme exploits both spatial and temporal coherence. Extensive experiments show that the proposed method can effectively reduce rendering time with negligible quality degradation. In future work we will consider how to improve the rendering quality while still effectively reducing the rendering time.

ACKNOWLEDGEMENTS

This work was supported in part by the Natural Science Foundation of China (61171163, 60902096, 61071120, 60832003, 61102088), the Natural Science Foundation of Zhejiang Province (Y1101240), and the Natural Science Foundation of Ningbo (2011A610200, 2011A610197).

REFERENCES

[1] D. Ismael and P. Beatrice, "Depth-aided image inpainting for novel view synthesis," IEEE International Workshop on Multimedia Signal Processing (MMSP'10), France, pp. 167-170, October 4-6, 2010.
[2] J. Xue, M. Xi, D. Li, and M. Zhang, "A new virtual view rendering method based on depth image," 2010 Asia-Pacific Conference on Wearable Computing Systems, pp. 147-150, April 17-18, 2010.
[3] C. Fehn, "Depth-image-based rendering (DIBR), compression and transmission for a new approach on 3D-TV," Proceedings of SPIE Stereoscopic Displays and Virtual Reality Systems XI, pp. 93-104, 2004.
[4] Y. Zhao, Z. Chen, D. Tian, C. Zhu, and L. Yu, "Suppressing texture-depth misalignment for boundary noise removal in view synthesis," 28th Picture Coding Symposium (PCS 2010), Nagoya, Japan, pp. 30-33, December 8-10, 2010.
[5] P. J. Lee and Effendi, "Nongeometric distortion smoothing approach for depth map preprocessing," IEEE Transactions on Multimedia, vol. 13, no. 2, pp. 246-254, April 2011.
[6] S. Zinger, L. Do, and P. H. N. de With, "Free-viewpoint depth image based rendering," Journal of Visual Communication and Image Representation, vol. 21, pp. 533-541, 2010.
[7] T. F. Chan and J. Shen, "Mathematical models for local nontexture inpaintings," SIAM Journal on Applied Mathematics, vol. 62, no. 3, pp. 1019-1043, 2002.
[8] Y. Zhao, C. Zhu, Z. Chen, and L. Yu, "Depth no-synthesis-error model for view synthesis in 3-D video," IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2221-2228, August 2011.
[9] C. L. Zitnick, S. B. Kang, M. Uyttendaele, S. Winder, and R. Szeliski, "High-quality video view interpolation using a layered representation," ACM SIGGRAPH and ACM Transactions on Graphics, vol. 23, no. 3, pp. 600-608, August 2004.
[10] ftp://ftp.hhi.de/HHIMPEG3DV