[IEEE 2010 2nd IEEE International Conference on Network Infrastructure and Digital Content (IC-NIDC 2010) - Beijing, China (2010.09.24-2010.09.26)] 2010 2nd IEEE InternationalConference

Proceedings of IC-NIDC2010

ZERO COEFFICIENT-AWARE FAST IQ-IDCT ALGORITHM

Kiho Choi, Ki hoon Lee, Eun Ji Kim, Euee S. Jang

Department of Electronics and Computer Engineering, Hanyang University, Seoul, Korea


Abstract: In this paper, we propose a zero quantized DCT coefficient-aware algorithm for the implementation of fast inverse quantization (IQ) and inverse discrete cosine transform (IDCT). In our prior work [4], we presented a zero coefficient-aware IDCT algorithm for fast decoding. In this paper, we extend that zero-skipping IDCT (Z-IDCT) by incorporating IQ into it. By adaptively skipping the computations for zero quantized coefficients, the proposed method significantly reduces the computation time of IQ and IDCT in the decoder. With the proposed method, the decoding time of IQ and IDCT showed a 36.7 percent speedup on average compared to that of MPEG-4 simple profile, and a 9.6 percent speedup on average compared to that of Z-IDCT with MPEG-4 simple profile.

Keywords: Fast decoding, Inverse quantization, Inverse discrete cosine transform, zero coefficient

1 Introduction

Among the major processing components of video coding, quantization after the DCT aims to reduce the total number of bits by increasing the probability of zero coefficients. Quantization with the DCT creates many zero quantized coefficients during encoding, so that zero becomes the dominant input value of inverse quantization (IQ) and the inverse discrete cosine transform (IDCT) in the decoder. Zero DCT coefficients need not take part in the computation of IQ and IDCT, because they do not influence the reconstructed values. Using this fact, several researchers have tried to reduce the computational complexity of the encoder and decoder [1]-[3].

Xuan et al. proposed an algorithm that uses the sum of absolute differences (SAD) of each motion-compensated block as the criterion for skipping DCT, Q, IQ, and IDCT. The results showed that Xuan's method can detect up to 40 percent of all-zero DCT coefficient blocks, thereby reducing the computational complexity [1].

Jun et al. proposed an improved SAD criterion for early detection of all-zero coefficients, recognizing that some all-zero blocks are misjudged when the method of [1] is applied to H.263 head-and-shoulder sequences [2]. Their results showed that up to 60 percent of blocks can be correctly determined to be all-zero DCT coefficient blocks.

Recently, Ji et al. addressed early determination of zero-quantized 8×8 DCT coefficients for fast video encoding, using several sufficient conditions to determine early whether an 8×8 prediction error block is an all-zero or a partial-zero block [3]. The experimental results showed that up to 74.35 percent of addition, 81.34 percent of shift, and 81.12 percent of comparison operations can be saved.

All of the aforementioned algorithms detect all-zero blocks early, exploiting the fact that most values are zero after the DCT and quantization. By eliminating redundancy in the DCT, Q, IQ, and IDCT processes, they enable fast encoding and decoding. However, these algorithms are not primarily concerned with low computational complexity on the decoder side; therefore, they cannot adaptively skip the computation of individual zero coefficients in a decoding-only process. In this paper, we propose a zero quantized DCT coefficient skipping algorithm for fast IQ and IDCT computation on the decoder side. To adaptively skip zero quantized DCT coefficients, we extend our previous work, the zero coefficient-aware fast IDCT (Z-IDCT) [4], to the IQ process. Grounded on the concept of zero coefficient awareness, the proposed method can significantly reduce the consecutive computations of IQ and IDCT, which are unnecessary when a quantized DCT coefficient is zero.

978-1-4244-6853-9/10/$26.00 ©2010 IEEE

The organization of this paper is as follows. In Section 2, we analyze the quantized DCT coefficients and propose a combined design of IQ and IDCT based on the zero coefficient-aware concept. In Section 3, the running time of the proposed method is evaluated with the MPEG-4 simple profile (SP) decoder and the experimental results are presented. Finally, Section 4 concludes this paper.

2 Proposed algorithm

2.1 Zero coefficient aware IQ-IDCT

In lossy compression, quantization is the step that introduces signal loss in exchange for better compression. In the decoder, inverse quantization is performed by simply scaling the quantized data by the quantization step size Q_s as follows:

$$X(u, v) = Q_s \cdot Q(u, v) \tag{1}$$

where Q(u, v) is the quantized DCT coefficient and X(u, v) is the DCT coefficient at (u, v) in the transform domain.
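As a minimal sketch of (1), assume the quantized block is held as a flat list of 64 integers in raster order and that `q_s` stands for the step size Q_s. The function name `inverse_quantize` and the sample block are hypothetical and chosen only for illustration.

```python
def inverse_quantize(q_block, q_s):
    """Apply X(u, v) = Q_s * Q(u, v) to every entry of a quantized block."""
    return [q_s * q for q in q_block]


# A block with two non-zero quantized coefficients (raster positions 2 and 28).
q_block = [0] * 64
q_block[2] = 2
q_block[28] = 7

x_block = inverse_quantize(q_block, q_s=16)
nonzero = [(pos, x) for pos, x in enumerate(x_block) if x != 0]
print(nonzero)  # [(2, 32), (28, 112)]
```

All 64 multiplications are carried out here although only two of them produce a non-zero result.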

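Through the IDCT, one block of 8×8 DCT coefficients is transformed into the pixel domain via the following equations:

$$Y(i, j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, X(u, v) \cos\left[\frac{(2i+1)u\pi}{16}\right] \cos\left[\frac{(2j+1)v\pi}{16}\right] \tag{2}$$

and

$$C(u), C(v) = \begin{cases} 1/\sqrt{2}, & \text{for } u, v = 0, \\ 1, & \text{otherwise,} \end{cases}$$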

where Y(i, j) is the pixel value at (i, j), and C(u) and C(v) are the scaling factors.
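The direct evaluation of (2) can be sketched as follows. This is a plain reference implementation for illustration, not an optimized transform; the names `c` and `idct_8x8` are hypothetical, and `x[u][v]` is assumed to hold the DCT coefficient X(u, v).

```python
import math


def c(k):
    """Scaling factor C(k) from (2): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def idct_8x8(x):
    """Direct evaluation of (2); returns y with y[i][j] = Y(i, j)."""
    y = [[0.0] * 8 for _ in range(8)]
    for i in range(8):
        for j in range(8):
            total = 0.0
            for u in range(8):
                for v in range(8):
                    total += (c(u) * c(v) * x[u][v]
                              * math.cos((2 * i + 1) * u * math.pi / 16)
                              * math.cos((2 * j + 1) * v * math.pi / 16))
            y[i][j] = total / 4.0
    return y


# A block whose only non-zero coefficient is the DC term X(0, 0) = 80
# reconstructs to a flat block: (1/4) * C(0) * C(0) * 80 = 10 at every pixel.
x = [[0.0] * 8 for _ in range(8)]
x[0][0] = 80.0
y = idct_8x8(x)
```

Each of the 64 output pixels loops over all 64 coefficients, so a zero coefficient still costs 64 multiply-accumulate steps in this form.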

In (1) and (2), only the quantized DCT coefficients are variable, while the other terms, such as the scaling factors and the basis functions, are fixed by the values of (u, v) and (i, j). Based on this, (2) can be rewritten using (1) as follows:

$$Y(i, j) = \sum_{u=0}^{7} \sum_{v=0}^{7} A_{i,j}(u, v)\, Q(u, v) \tag{3}$$

where

$$A_{i,j}(u, v) = \frac{1}{4}\, C(u)\, C(v)\, Q_s \cos\left[\frac{(2i+1)u\pi}{16}\right] \cos\left[\frac{(2j+1)v\pi}{16}\right]$$
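The step from (1) and (2) to (3) can be made explicit. Substituting X(u, v) = Q_s · Q(u, v) into (2) and grouping every factor that does not depend on the coefficient value gives:

$$\begin{aligned}
Y(i, j) &= \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\, C(v)\, \bigl[Q_s\, Q(u, v)\bigr] \cos\left[\frac{(2i+1)u\pi}{16}\right] \cos\left[\frac{(2j+1)v\pi}{16}\right] \\
&= \sum_{u=0}^{7} \sum_{v=0}^{7} \underbrace{\frac{1}{4}\, C(u)\, C(v)\, Q_s \cos\left[\frac{(2i+1)u\pi}{16}\right] \cos\left[\frac{(2j+1)v\pi}{16}\right]}_{A_{i,j}(u, v)}\; Q(u, v)
\end{aligned}$$

The grouped factor depends only on the positions (i, j), (u, v) and on Q_s, so the single factor of 1/4 from (2) is carried inside A and the quantized coefficient Q(u, v) is the only per-block variable left.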

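For a block of 8×8 pixels, (3) can be rewritten in a matrix form (Y = AQ) as follows:

$$\begin{bmatrix} Y(0,0) \\ Y(0,1) \\ \vdots \\ Y(7,7) \end{bmatrix} = \begin{bmatrix} A_{0,0}(0,0) & A_{0,0}(0,1) & \cdots & A_{0,0}(7,7) \\ A_{0,1}(0,0) & A_{0,1}(0,1) & \cdots & A_{0,1}(7,7) \\ \vdots & \vdots & \ddots & \vdots \\ A_{7,7}(0,0) & A_{7,7}(0,1) & \cdots & A_{7,7}(7,7) \end{bmatrix} \begin{bmatrix} Q(0,0) \\ Q(0,1) \\ \vdots \\ Q(7,7) \end{bmatrix} \tag{4}$$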

In (4), the pixel value Y(i, j) is the inner product of a row vector of A and the column vector Q. Equation (4) can be rewritten as a linear combination of the column vectors of the matrix A, weighted by the scalar entries of Q:

$$\begin{bmatrix} Y(0,0) \\ Y(0,1) \\ \vdots \\ Y(7,7) \end{bmatrix} = Q(0,0) \begin{bmatrix} A_{0,0}(0,0) \\ A_{0,1}(0,0) \\ \vdots \\ A_{7,7}(0,0) \end{bmatrix} + \cdots + Q(7,7) \begin{bmatrix} A_{0,0}(7,7) \\ A_{0,1}(7,7) \\ \vdots \\ A_{7,7}(7,7) \end{bmatrix} \tag{5}$$

Equation (5) shows that all the multiplication operations contain Q(u, v) as a term. This means that the multiplication between Q(u, v) and the column vector of A is not necessary if the term Q(u, v) is zero. Accordingly, (5) can be written as follows:

$$\begin{bmatrix} Y(0,0) \\ Y(0,1) \\ \vdots \\ Y(7,7) \end{bmatrix} = K_0 \begin{bmatrix} A^{K_0}_{0,0} \\ A^{K_0}_{0,1} \\ \vdots \\ A^{K_0}_{7,7} \end{bmatrix} + \cdots + K_{m-1} \begin{bmatrix} A^{K_{m-1}}_{0,0} \\ A^{K_{m-1}}_{0,1} \\ \vdots \\ A^{K_{m-1}}_{7,7} \end{bmatrix} \tag{6}$$

where $K_l$ designates the l-th DCT coefficient out of the m non-zero quantized DCT coefficients and $A^{K_l}_{i,j}$ is the entry of the column vector associated with $K_l$. We call (6) the zero coefficient-aware IQ-IDCT (Z-IQIDCT) design, as it reduces the number of operations according to the number of non-zero coefficients (m ≪ 64). To further reduce the number of multiplications in (6), we employ the table look-up method as follows:

$$\begin{bmatrix} Y(0,0) \\ Y(0,1) \\ \vdots \\ Y(7,7) \end{bmatrix} = \sum_{l=0}^{m-1} \begin{bmatrix} K_l A^{K_l}_{0,0} \\ K_l A^{K_l}_{0,1} \\ \vdots \\ K_l A^{K_l}_{7,7} \end{bmatrix} = \sum_{l=0}^{m-1} \begin{bmatrix} T_l(0) \\ T_l(1) \\ \vdots \\ T_l(63) \end{bmatrix} \tag{7}$$

where $T_l(e) = K_l A^{K_l}_{i,j}$ with $e = (i \times 8) + j$.

Figure 1. An example of Z-IQIDCT where QP is 16. The block of quantized DCT coefficients contains two non-zero values, 2 at l = 2 and 7 at l = 28. Z-IQIDCT takes the 64 terms from the table at l = 2 and the 64 terms from the table at l = 28 to produce the following reconstruction values:

48 -30 -40 22 22 -40 -30 48
5 14 3 -21 -21 3 14 5
-29 47 36 -54 -54 36 47 -29
-11 29 18 -36 -36 18 29 -11
36 -18 -29 11 11 -29 -18 36
54 -36 -47 29 29 -47 -36 54
21 -3 -14 -5 -5 -14 -3 21
-22 40 30 -48 -48 30 40 -22

Equation (7) is the core concept of the proposed Z-IQIDCT design, which avoids multiplication by using pre-stored values ($[T_l(0)\; T_l(1)\; \cdots\; T_l(63)]^T$) in a table.
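The zero skipping of (6) and the table look-up of (7) can be sketched together as follows. This is an illustrative sketch, not the authors' implementation: the plain scaling of (1) is assumed, the tables are kept in a dictionary keyed by (raster position, coefficient value), and the names `column`, `build_tables`, `z_iqidct`, and `dense_iqidct` as well as the `max_level` bound are hypothetical.

```python
import math


def c(k):
    """Scaling factor C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def column(pos, q_s):
    """Column of A for raster position pos = 8*u + v, indexed by e = 8*i + j."""
    u, v = divmod(pos, 8)
    return [0.25 * c(u) * c(v) * q_s
            * math.cos((2 * i + 1) * u * math.pi / 16)
            * math.cos((2 * j + 1) * v * math.pi / 16)
            for i in range(8) for j in range(8)]


def build_tables(q_s, max_level):
    """Pre-store T(e) = K * A(e) for each position and each non-zero value K."""
    tables = {}
    for pos in range(64):
        col = column(pos, q_s)
        for k in range(-max_level, max_level + 1):
            if k != 0:
                tables[(pos, k)] = [k * a for a in col]
    return tables


def z_iqidct(q_flat, tables):
    """Evaluate (7): add one pre-stored 64-term vector per non-zero coefficient."""
    y = [0.0] * 64
    for pos, k in enumerate(q_flat):
        if k == 0:
            continue  # zero coefficients contribute nothing and are skipped
        t = tables[(pos, k)]
        for e in range(64):
            y[e] += t[e]  # additions only; the products were precomputed
    return y


def dense_iqidct(q_flat, q_s):
    """Reference for comparison: the full product Y = AQ of (4)."""
    cols = [column(pos, q_s) for pos in range(64)]
    return [sum(cols[pos][e] * q_flat[pos] for pos in range(64))
            for e in range(64)]


q_flat = [0] * 64
q_flat[2], q_flat[28] = 2, 7
tables = build_tables(q_s=16, max_level=8)
y_fast = z_iqidct(q_flat, tables)
y_ref = dense_iqidct(q_flat, q_s=16)
max_diff = max(abs(a - b) for a, b in zip(y_fast, y_ref))
```

For this block the look-up path touches two table vectors (128 additions) where the dense product performs 64 × 64 multiply-accumulate steps, and `max_diff` stays at floating-point rounding level.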

A key premise behind Z-IQIDCT is that all of the zero quantized DCT coefficients can be ignored before the IQ process, and only the non-zero coefficients are adaptively taken by Z-IQIDCT. This is the main difference between Z-IQIDCT and the conventional all-zero block skipping methods. Because Z-IQIDCT works on the decoder side and skips zero quantized DCT coefficients at the coefficient level rather than the block level, it can limit the fast IQ-IDCT process to the computations that actually contribute to the result. A typical example is shown in Fig. 1: with two non-zero coefficients, two memory accesses of 64 terms each are enough for the IQ-IDCT computation.
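The example of Fig. 1 can be checked numerically. The sketch below rests on one assumption that the text does not state: the dequantized magnitude follows the H.263-style rule of the MPEG-4 second inverse quantization method, |X| = QP · (2|Q| + 1), minus 1 when QP is even, rather than the plain scaling of (1). The names `dequantize_h263_style`, `basis`, and `pixel` are hypothetical.

```python
import math


def dequantize_h263_style(level, qp):
    """Assumed rule: |X| = qp * (2|level| + 1), minus 1 when qp is even."""
    if level == 0:
        return 0
    magnitude = qp * (2 * abs(level) + 1) - (1 if qp % 2 == 0 else 0)
    return magnitude if level > 0 else -magnitude


def basis(u, v, i, j):
    """Fixed part of (2) for coefficient (u, v) and pixel (i, j)."""
    cu = 1.0 / math.sqrt(2.0) if u == 0 else 1.0
    cv = 1.0 / math.sqrt(2.0) if v == 0 else 1.0
    return (0.25 * cu * cv
            * math.cos((2 * i + 1) * u * math.pi / 16)
            * math.cos((2 * j + 1) * v * math.pi / 16))


# The two non-zero quantized coefficients of Fig. 1: raster positions
# l = 2 and l = 28, i.e. (u, v) = (0, 2) and (3, 4).
nonzero = {(0, 2): 2, (3, 4): 7}


def pixel(i, j, qp=16):
    """Sum the contributions of the non-zero coefficients only."""
    return sum(dequantize_h263_style(level, qp) * basis(u, v, i, j)
               for (u, v), level in nonzero.items())


samples = [round(pixel(0, 0)), round(pixel(0, 1)),
           round(pixel(1, 0)), round(pixel(7, 7))]
print(samples)  # [48, -30, 5, -22]
```

Under that assumption the four sampled pixels agree with the corresponding reconstruction values in Fig. 1, and each pixel is obtained from only two terms.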

For the implementation of Z-IQIDCT, it is important to determine how the table of the terms in (7) is implemented. If the table includes all the possible terms over the defined range of the DCT coefficients, its memory size is not practical (e.g., 67 MB for MPEG-4 SP). For practicality, a smaller table can be prepared per QP: 6.8 MB, 2.1 MB, or 1.3 MB for QP values of 5, 16, and 25, respectively. The appropriate table is then chosen according to the QP value.
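The order of magnitude of these table sizes can be reproduced with simple arithmetic. The sketch below assumes one 64-term vector per (position, coefficient value) pair, 4-byte terms, and a 12-bit coefficient range of 4096 values for the full table; none of these figures is stated in the text, and the function name `table_bytes` and the count of 128 values are hypothetical.

```python
def table_bytes(num_values, positions=64, terms=64, bytes_per_term=4):
    """Memory for one 64-term vector per (position, coefficient value) pair."""
    return num_values * positions * terms * bytes_per_term


# Full table: every value of an assumed 12-bit range (-2048..2047).
full_table = table_bytes(4096)
print(full_table)  # 67108864

# A reduced table covering a hypothetical 128 coefficient values.
reduced_table = table_bytes(128)
print(reduced_table)  # 2097152
```

Under these assumptions the full table comes to about 67 MB, in line with the figure quoted above, and restricting the covered values shrinks the table proportionally.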

3 Experimental result

For the performance evaluation of the proposed method, we measured the decoding time of IQ-IDCT against the existing coding standard MPEG-4 SP [5]. Additionally, we compared the decoding time of IQ-IDCT with that of our previous work, Z-IDCT with MPEG-4 SP. The proposed Z-IQIDCT method was implemented in the MPEG-4 SP reference decoder. For an accurate comparison, the speedup rate is computed as follows:

$$\text{Speedup\_rate} = \frac{T_{REF} - T_{PROP}}{T_{REF}} \times 100$$
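The speedup rate can be computed as in the short sketch below, reading T_REF as the IQ-IDCT time of the reference decoder and T_PROP as that of the proposed method. This reading of the symbols is an assumption, although it reproduces the rates listed in Table 2; the function name `speedup_rate` is hypothetical.

```python
def speedup_rate(t_ref, t_prop):
    """Speedup rate in percent: (T_REF - T_PROP) / T_REF * 100."""
    return (t_ref - t_prop) / t_ref * 100.0


# Times reported in Table 2 for Akiyo at QP 5.
a_vs_prop = speedup_rate(901_942, 559_605)   # MPEG-4 (A) vs. proposed
b_vs_prop = speedup_rate(690_450, 559_605)   # MPEG-4 with Z-IDCT (B) vs. proposed
print(f"{a_vs_prop:.2f} {b_vs_prop:.2f}")  # 37.96 18.95
```

A negative rate means the proposed method was slower than the reference for that sequence and QP.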

More details on the decoding environment are described in Table 1.

Table 1 Test environment of the MPEG-4 SP decoder

| Item | Value |
|---|---|
| Test sequences | Akiyo, Foreman, Mobile, Pedestrian, Riverbed, Rush-hour, and Sunflower |
| Sequence resolution | CIF (352×288) and HD (1920×1080) |
| Total frames to be coded | CIF (300 frames) and HD (250 frames) |
| Profile | Simple profile |
| Quantization parameter | 5, 16, and 25 |

Table 2 shows the total IQ-IDCT decoding times of the MPEG-4 SP, the MPEG-4 SP with Z-IDCT, and the proposed method. The table shows that the proposed method outperformed the MPEG-4 SP for all sequences and QPs, and outperformed the MPEG-4 SP with Z-IDCT in 17 of the 21 cases. The overall time reduction is about 37 percent compared to MPEG-4 SP and about 10 percent compared to MPEG-4 SP with Z-IDCT. More specifically, for the CIF sequences the running time of the proposed method was 34 percent less than that of the MPEG-4 SP and 12 percent less than that of the MPEG-4 SP with Z-IDCT. For the HD sequences, the running time of the proposed method was 39 percent less than that of the MPEG-4 SP and 7 percent less than that of the MPEG-4 SP with Z-IDCT.

Table 2 The IQ-IDCT running time comparison

| Sequence | QP | MPEG-4 (A), time (ms) | MPEG-4 with Z-IDCT (B), time (ms) | Z-IQIDCT (Prop), time (ms) | A vs Prop, speedup rate (%) | B vs Prop, speedup rate (%) |
|---|---|---|---|---|---|---|
| akiyo | 5 | 901,942 | 690,450 | 559,605 | 37.96 | 18.95 |
| akiyo | 16 | 328,615 | 222,149 | 195,185 | 40.60 | 12.14 |
| akiyo | 25 | 282,078 | 186,431 | 156,027 | 44.69 | 16.31 |
| foreman | 5 | 2,949,468 | 2,387,060 | 2,114,152 | 28.32 | 11.43 |
| foreman | 16 | 804,208 | 571,904 | 519,990 | 35.34 | 9.08 |
| foreman | 25 | 483,000 | 304,839 | 290,341 | 39.89 | 4.76 |
| mobile | 5 | 6,169,879 | 6,222,492 | 5,553,646 | 9.99 | 10.75 |
| mobile | 16 | 3,060,901 | 2,235,168 | 2,017,542 | 34.09 | 9.74 |
| mobile | 25 | 1,799,292 | 1,351,743 | 1,101,870 | 38.76 | 18.49 |
| CIF Average | | | | | 34.40 | 12.40 |
| pedestrian | 5 | 38,583,914 | 21,719,184 | 21,902,309 | 43.23 | -0.84 |
| pedestrian | 16 | 13,897,594 | 7,913,402 | 7,493,061 | 46.08 | 5.31 |
| pedestrian | 25 | 11,334,754 | 6,433,304 | 6,472,382 | 42.90 | -0.61 |
| riverbed | 5 | 70,478,743 | 54,793,777 | 49,930,241 | 29.16 | 8.88 |
| riverbed | 16 | 31,987,476 | 23,161,006 | 20,521,864 | 35.84 | 11.39 |
| riverbed | 25 | 24,357,448 | 15,280,056 | 13,738,910 | 43.59 | 10.09 |
| rush_hour | 5 | 23,217,263 | 14,682,358 | 15,109,147 | 34.92 | -2.91 |
| rush_hour | 16 | 5,727,311 | 3,704,090 | 3,731,149 | 34.85 | -0.73 |
| rush_hour | 25 | 4,919,064 | 3,076,991 | 2,884,327 | 41.36 | 6.26 |
| sunflower | 5 | 18,334,807 | 12,670,775 | 11,614,519 | 36.65 | 8.34 |
| sunflower | 16 | 5,720,118 | 3,394,655 | 2,946,800 | 48.48 | 13.19 |
| sunflower | 25 | 4,941,466 | 2,790,635 | 2,606,189 | 47.26 | 6.61 |
| HD Average | | | | | 39.13 | 6.79 |
| Total Average | | | | | 36.76 | 9.60 |

In the results, the proposed method (Z-IQIDCT) is more effective for the HD sequences when compared to MPEG-4 SP, whereas it is more effective for the CIF sequences when compared to MPEG-4 SP with Z-IDCT. This is because our previous work (Z-IDCT) skips zero coefficients more effectively where zero coefficients are abundant, as in the HD sequences. Thus, the good performance of Z-IDCT on the HD sequences leaves a relatively small additional reduction compared to the CIF sequences. Nevertheless, it should be noted that the proposed method (Z-IQIDCT) is up to 19 percent faster than the MPEG-4 with Z-IDCT and up to 48 percent faster than the MPEG-4. Overall, the experimental results show that the proposed method significantly reduced the time complexity of IQ-IDCT.

4 Conclusions

In this paper, we proposed a zero coefficient-aware fast IQ-IDCT algorithm for the decoder. To adaptively skip the computations related to zero quantized DCT coefficients, we extended our previous work, the zero coefficient-aware fast IDCT (Z-IDCT) [4], to the IQ process. Experimental results demonstrate that the proposed method reduced the time complexity of IQ-IDCT by 37 percent on average compared to MPEG-4 SP.

Acknowledgements

This work was supported by National Research Laboratory, Korea.

References

[1] X. Zhou, Z. Yu, and S. Yu, “Method for detecting all-zero DCT coefficients ahead of discrete cosine transformation and quantization,” Electron. Lett., vol. 34, no. 19, pp. 1839–1840, Sep. 1998.

[2] S. Jun and S. Yu, “Efficient method for early detection of all-zero DCT coefficients,” Electron. Lett., vol. 27, no. 3, pp. 160–161, Feb. 2001.

[3] X. Ji, S. Kwong, D. Zhao, H. Wang, C.-C. J. Kuo, and Q. Dai, “Early determination of zero-quantized 8×8 DCT coefficients,” IEEE Trans. Circuits Syst. Video Technol., vol. 19, no. 12, pp. 1755–1765, Dec. 2009.

[4] K. Choi, S. Lee, and E. S. Jang, “Zero coefficient-aware IDCT algorithm for fast video decoding,” submitted to IEEE Trans. Consum. Electron., 2010.

[5] ISO/IEC 14496-5:2000, Coding of Audio-Visual Objects, Part 5: Reference Software, 2000.
