[IEEE 2011 IEEE First International Conference on Consumer Electronics - Berlin (ICCE-Berlin) - Berlin, Germany (2011.09.6-2011.09.8)] 2011 IEEE International Conference on Consumer

FIXED-POINT ZERO COEFFICIENT-AWARE FAST IQ-IDCT ALGORITHM

Kihoon Lee, Kiho Choi, and Euee S. Jang

Digital Media Laboratory, Hanyang University, Seoul, Korea

[email protected]

Abstract—In this paper, we propose a fixed-point zero coefficient-aware fast IQ-IDCT algorithm that reduces the computational complexity of the discrete cosine transform and avoids the mismatch of decoded data between encoder and decoder. The approach is based on zero coefficient-aware design, which reduces the computational complexity of inverse DCT algorithms by skipping the unnecessary computations caused by zero DCT coefficients. We extend the zero coefficient-aware design to the inverse quantization stage to further reduce the computational complexity of inverse quantization and inverse DCT by skipping computations on zero quantized DCT coefficients. To maximize the complexity reduction while preserving the precision of the ideal IQ-IDCT process, the proposed method applies a fixed-point approximation to all computational procedures of the IQ-IDCT, based on table-lookup operations with accurate dyadic terms. As a result, we achieve an average speedup factor of 3.1 over the standard fixed-point 8x8 inverse discrete cosine transform.

Keywords-inverse quantization; inverse discrete cosine transform; zero coefficient; DCT; IDCT; fixed-point

I. INTRODUCTION

In many video coding standards (e.g., MPEG-1, MPEG-2, and MPEG-4), the discrete cosine transform (DCT) and the inverse discrete cosine transform (IDCT) have been used extensively because their energy compaction performance is comparable to that of the Karhunen-Loève transform (KLT) [1]. DCT is adopted because its overall complexity is lower than that of KLT, while its energy compaction performance is comparable. As video resolution increases, reducing the computational complexity of DCT/IDCT has become a major research topic in video coding.

Various approaches have thus been developed to reduce the computational complexity of IDCT while preserving data integrity and visual quality. One of the most successful is the butterfly-based algorithm, which reduces the number of computations to one sixth of that of the conventional DCT algorithm [2]. Another interesting approach is the design of fast DCT/IDCT that exploits zero DCT coefficients in a butterfly architecture. More specifically, Ji et al. proposed an early determination method for zero DCT coefficients based on a butterfly architecture [3].

The introduction of the fixed-point algorithm in IDCT design has brought a further reduction in computational complexity. Originally, the fixed-point algorithm was introduced to solve the mismatch of decoded data between encoder and decoder, which is caused by the drift that floating-point operations induce. With the adoption of the fixed-point algorithm in MPEG (i.e., ISO/IEC 23002-1 and ISO/IEC 23002-2), it is now possible to double the decoding speed compared to that of the floating-point algorithm [4]-[5].

In our previous study, we investigated zero coefficient-aware IDCT (Z-IDCT) design as an alternative approach to further reduce the computational complexity of IDCT. The Z-IDCT approach makes it possible to skip IDCT computations when the input DCT coefficients are zeros [6]. We have extended our research on Z-IDCT to the fixed-point algorithm, which was shown to be twice as fast as the ISO/IEC 23002-2 implementation [7].

In this paper, we propose a fixed-point zero coefficient-aware fast IQ-IDCT algorithm that extends our previous zero coefficient-aware fast IQ-IDCT (Z-IQIDCT) to a fixed-point approximation scheme [8]. To decrease the computational complexity that floating-point operations cause in Z-IQIDCT and to increase the precision relative to the ideal IQ-IDCT, we designed a table that holds fixed-point dyadic approximation terms for each quantized DCT coefficient, together with a procedure that performs the IQ-IDCT calculation only for non-zero quantized DCT coefficients. The experimental results show that the proposed method is faster than the conventional fixed-point 8×8 IDCT and IQ process while preserving the accuracy required by ISO/IEC 23002-1.

The paper is organized as follows. Section II presents the theoretical basis of the proposed method, along with the considerations for choosing a sufficient number of bits for the fixed-point algorithm. The experimental results are shown in Section III. Finally, Section IV concludes the paper.

II. THE PROPOSED METHOD

A. Mathematical derivation

In a conventional decoding process, IQ and IDCT are implemented as a concatenated process, as shown in Fig. 1-(a).

This work was supported by Seoul R&BD Program (PA100094), Korea.

978-1-4577-0234-1/11/$26.00 ©2011 IEEE

More specifically, the IQ process can be described by the following equation:

$$X(u,v) = Q_s \times Q(u,v), \tag{1}$$

where $u$ and $v$ indicate the horizontal and vertical positions in the transform domain, respectively, $X(u,v)$ is the DCT coefficient, $Q(u,v)$ is the quantized DCT coefficient, and $Q_s$ is the scalar determined by the quantization parameter (QP) (e.g., $Q_s = 8$ for QP = 1 through 4, $Q_s = 2 \times \mathrm{QP}$ for QP = 5 through 8, $Q_s = \mathrm{QP} + 8$ for QP = 9 through 24, and $Q_s = 2 \times \mathrm{QP} - 16$ for QP > 24).
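The piecewise rule for $Q_s$ and the IQ step of Eq. (1) can be sketched in a few lines of Python. This is only an illustration of the rule quoted above: the function names `quant_scale` and `inverse_quantize` and the nested-list block layout are assumptions of this sketch, not part of the reference decoder.

```python
def quant_scale(qp: int) -> int:
    """Scalar Q_s for a given QP, following the piecewise rule quoted in the text."""
    if qp < 1:
        raise ValueError("QP must be >= 1")
    if qp <= 4:
        return 8
    if qp <= 8:
        return 2 * qp
    if qp <= 24:
        return qp + 8
    return 2 * qp - 16


def inverse_quantize(q_block, qp):
    """Eq. (1): X(u, v) = Q_s * Q(u, v) for every position of the block."""
    qs = quant_scale(qp)
    return [[qs * q for q in row] for row in q_block]
```

For example, QP = 18 falls in the third range, so $Q_s = 18 + 8 = 26$ and a quantized coefficient of $-2$ is reconstructed as $-52$.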

After IQ, the IDCT process is conducted as follows:

$$Y(i,j) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7} C(u)\,C(v)\,X(u,v) \cos\!\left[\frac{(2i+1)u\pi}{16}\right] \cos\!\left[\frac{(2j+1)v\pi}{16}\right], \tag{2}$$

$$C(u),\, C(v) = \begin{cases} 1/\sqrt{2}, & \text{for } u, v = 0, \\ 1, & \text{otherwise}, \end{cases}$$

where $i$ and $j$ indicate the horizontal and vertical positions in the pixel domain, respectively, $Y(i,j)$ is the reconstructed pixel value, $X(u,v)$ is the DCT coefficient at position $(u,v)$, and $C(u)$ and $C(v)$ are the scalars determined by the horizontal position $u$ and the vertical position $v$.
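As a point of reference, Eq. (2) can be evaluated directly. The following Python sketch does so with floating-point arithmetic and four nested loops; it is meant only to make the indexing explicit, not to be fast. The helper names `c` and `idct_8x8`, and the convention that `x[u][v]` holds $X(u,v)$, are assumptions of this sketch.

```python
import math

N = 8  # block size


def c(k: int) -> float:
    """Scalar C(k): 1/sqrt(2) for k = 0, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def idct_8x8(x):
    """Direct floating-point evaluation of Eq. (2); x[u][v] holds X(u, v)."""
    y = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            acc = 0.0
            for u in range(N):
                for v in range(N):
                    acc += (c(u) * c(v) * x[u][v]
                            * math.cos((2 * i + 1) * u * math.pi / 16)
                            * math.cos((2 * j + 1) * v * math.pi / 16))
            y[i][j] = acc / 4.0
    return y
```

For a block whose only non-zero entry is $X(0,0) = 80$, every output pixel equals $80/8 = 10$, since $C(0)C(0)/4 = 1/8$ and both cosines are 1.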

From Eq. (1) and Eq. (2), we can derive an equation that allows direct transformation of quantized DCT coefficients to pixel values as follows:

$$Y(i,j) = \sum_{u=0}^{7} \sum_{v=0}^{7} Q(u,v)\,A_{i,j}(u,v), \tag{3}$$

$$A_{i,j}(u,v) = \frac{1}{4}\,Q_s\,C(u)\,C(v) \cos\!\left[\frac{(2i+1)u\pi}{16}\right] \cos\!\left[\frac{(2j+1)v\pi}{16}\right],$$

where $A_{i,j}(u,v)$ collects all the input-independent factors and scalars, such as $Q_s$, $C(u)$, $C(v)$, and the basis function. Eq. (3) represents the proposed IQ-IDCT algorithm, as illustrated in Fig. 1-(b).
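To make the merging of IQ and IDCT concrete, the following Python sketch folds $Q_s$, $C(u)$, $C(v)$, the factor $1/4$, and the cosine basis into a single weight and applies it to the quantized coefficients directly. The names `basis_weight` and `iq_idct_direct` are assumptions of this sketch, and the weights are recomputed on every call purely for readability.

```python
import math

N = 8  # block size


def c(k: int) -> float:
    """Scalar C(k): 1/sqrt(2) for k = 0, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def basis_weight(i, j, u, v, qs):
    """Input-independent factor A_{i,j}(u, v) for pixel (i, j)."""
    return (0.25 * qs * c(u) * c(v)
            * math.cos((2 * i + 1) * u * math.pi / 16)
            * math.cos((2 * j + 1) * v * math.pi / 16))


def iq_idct_direct(q_block, qs):
    """Map quantized coefficients Q(u, v) straight to pixel values Y(i, j)."""
    return [[sum(q_block[u][v] * basis_weight(i, j, u, v, qs)
                 for u in range(N) for v in range(N))
             for j in range(N)]
            for i in range(N)]
```

With $Q(0,0) = 3$ as the only non-zero coefficient and $Q_s = 26$, every pixel evaluates to $3 \times 26 / 8 = 9.75$, which is the same value obtained by scaling first and transforming afterwards.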

Eq. (3) can be simplified further by removing the zero quantized DCT coefficients, which contribute nothing to the IQ-IDCT computation. Our previous work [6] observed that around 60 out of 64 quantized DCT coefficients are zeros regardless of sequence and QP, so removing those zeros substantially reduces the number of quantized DCT coefficients involved in the IQ-IDCT computation. The equation without zero quantized DCT coefficients can be represented as follows:

$$Y(i,j) = \sum_{l=0}^{m-1} K_l\,A_{i,j}^{K_l}, \tag{4}$$

where $K_l$ indicates the $l$-th of the $m$ non-zero quantized DCT coefficients and $A_{i,j}^{K_l}$ indicates the column vector corresponding to $K_l$. Eq. (4) is the key equation that expresses the IQ-IDCT algorithm with the zero coefficient-aware scheme. However, Eq. (4) is better suited for floating-point operations, since the cosine basis function and the scalars $C(u)$ and $C(v)$ within the term $A_{i,j}^{K_l}$ are real numbers, including irrational ones. Thus, any implementation of Eq. (4) could induce the mismatch problem because of limited precision.
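The zero coefficient-aware accumulation can be sketched as follows: the block is scanned once for non-zero coefficients, and only those $m$ entries contribute to the output. This is a floating-point illustration under the assumptions that `q_block[u][v]` holds $Q(u,v)$ and that the weights are computed on the fly; the names `weight` and `iq_idct_zero_aware` are hypothetical.

```python
import math

N = 8  # block size


def c(k: int) -> float:
    """Scalar C(k): 1/sqrt(2) for k = 0, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def weight(i, j, u, v, qs):
    """Input-independent factor applied to the coefficient at (u, v)."""
    return (0.25 * qs * c(u) * c(v)
            * math.cos((2 * i + 1) * u * math.pi / 16)
            * math.cos((2 * j + 1) * v * math.pi / 16))


def iq_idct_zero_aware(q_block, qs):
    """Accumulate the contributions of the m non-zero coefficients only.

    Returns the 8x8 pixel block and m.
    """
    nonzero = [(u, v, q_block[u][v])
               for u in range(N) for v in range(N)
               if q_block[u][v] != 0]
    y = [[0.0] * N for _ in range(N)]
    for u, v, k in nonzero:
        for i in range(N):
            for j in range(N):
                y[i][j] += k * weight(i, j, u, v, qs)
    return y, len(nonzero)
```

The inner work is proportional to $m$ rather than to 64, so a block with only a handful of non-zero coefficients needs a correspondingly small fraction of the multiply-accumulate operations of the dense sum.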

Figure 1. IQ and IDCT in decoding process

The mismatch problem can be solved by applying the fixed-point scheme as follows:

$$Y(i,j) = \frac{1}{2^b} \sum_{l=0}^{m-1} T_l(i,j) + \frac{1}{2^b} \sum_{l=0}^{m-1} E_l(i,j),$$

$$T_l(i,j) = \left\lfloor 2^b K_l A_{i,j}^{K_l} + \frac{1}{2} \right\rfloor, \qquad E_l(i,j) = 2^b K_l A_{i,j}^{K_l} - \left\lfloor 2^b K_l A_{i,j}^{K_l} + \frac{1}{2} \right\rfloor, \tag{5}$$

where $b$ indicates the number of mantissa bits, $T_l(i,j)$ the dyadic rational approximation for the given input value $K_l$, and $E_l(i,j)$ the error term (i.e., $|E_l(i,j)| \le 1/2$). If the error term is kept within the tolerable range, Eq. (5) can be represented without $E_l(i,j)$. Eq. (5) thus isolates the precision error within $E_l(i,j)$, which makes it possible to avoid the mismatch problem.

Eq. (5) is the core concept of the fixed-point zero coefficient-aware algorithm. Note that the proposed method inherits the advantage of Z-IQIDCT, which takes only the non-zero quantized DCT coefficients into the IQ-IDCT operation. In addition, the calculation of $T_l(i,j)$ can be reduced to a single memory access, replacing seven multiplications (including the computation of $A_{i,j}^{K_l}$), a round-off operation, and a shift operation.
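A minimal Python sketch of the fixed-point accumulation is given below. Two points are assumptions of this sketch rather than details taken from the paper: the dyadic terms are computed on the fly (the proposed method reads them from a precomputed table instead), and the accumulated sum is brought back to pixel scale with a rounding right shift. The names `dyadic_term` and `iq_idct_fixed` are hypothetical.

```python
import math

N = 8   # block size
B = 10  # mantissa bits


def c(k: int) -> float:
    """Scalar C(k): 1/sqrt(2) for k = 0, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def dyadic_term(k, i, j, u, v, qs, b=B):
    """Integer term floor(2^b * K * A + 1/2) for one coefficient and one pixel."""
    a = (0.25 * qs * c(u) * c(v)
         * math.cos((2 * i + 1) * u * math.pi / 16)
         * math.cos((2 * j + 1) * v * math.pi / 16))
    return math.floor((1 << b) * k * a + 0.5)


def iq_idct_fixed(q_block, qs, b=B):
    """Sum integer dyadic terms of non-zero coefficients, then shift back by b bits."""
    acc = [[0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            k = q_block[u][v]
            if k == 0:
                continue  # zero coefficients are skipped entirely
            for i in range(N):
                for j in range(N):
                    acc[i][j] += dyadic_term(k, i, j, u, v, qs, b)
    half = 1 << (b - 1)  # assumption: round to nearest before the final shift
    return [[(val + half) >> b for val in row] for row in acc]
```

Because each term is rounded to the nearest integer, it deviates from $2^b K_l A_{i,j}^{K_l}$ by at most one half, so the accumulated sum before the final shift deviates from the exact value by at most $m / 2^{b+1}$ in pixel units. Once the terms are integers, the accumulation uses only integer additions and one shift, which is what makes the result reproducible across platforms.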

B. Considerations for Implementation

For the implementation of the proposed method, it is important to consider the size of the table, the search method of the index, and the selection of the mantissa bit.

The size of the table should be considered for the practicality of the proposed method. A non-zero input value of IQ-IDCT may hold 4095 possible values within the following range:

$$-2048 \le K_l \le 2047, \tag{6}$$

where $K_l$ indicates a non-zero input value of IQ-IDCT. Each value of $K_l$ holds 64 intermediate dyadic terms, and these terms differ depending on which of the 64 positions the input occupies. To operate in an ideal condition, the proposed method should keep a table that contains all the terms, which requires a table as large as 65 megabytes (MB). Although a storage requirement of 65 MB may not be an issue for most decoders, it can be reduced if the QP values are known beforehand. Table I shows the reduced table size according to the QP values.
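The order of magnitude of the full table can be checked with simple arithmetic. The counts below come from the text (4095 non-zero inputs, 64 positions, 64 terms each); the width of one stored term is not stated in the paper, so the 4 bytes used here are an assumption of this sketch, as are the constant and function names.

```python
NONZERO_INPUTS = 4095   # values of K_l in -2048..2047, excluding 0
POSITIONS = 64          # (u, v) positions the input coefficient can occupy
TERMS_PER_ENTRY = 64    # one dyadic term per output pixel (i, j)
BYTES_PER_TERM = 4      # assumption: 32-bit integers (not stated in the paper)


def table_entries() -> int:
    """Number of dyadic terms in the full table."""
    return NONZERO_INPUTS * POSITIONS * TERMS_PER_ENTRY


def table_bytes(bytes_per_term: int = BYTES_PER_TERM) -> int:
    """Storage for the full table at the given term width."""
    return table_entries() * bytes_per_term
```

Under the 4-byte assumption, the table holds 16,773,120 terms and occupies 67,092,480 bytes (just under 64 MiB), which is of the same order as the 65 MB quoted above.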

TABLE I. SIZE OF THE TABLE FOR THE CORRESPONDING QP

| QP | 18 | 24 | 30 |
| --- | --- | --- | --- |
| Size (MB) | 2.46 | 1.99 | 1.45 |

When using the table look-up method, it is important to minimize the memory access time of each look-up operation. To shorten the memory access time, the proposed method employs a direct indexing method. The index is calculated by the following equation:

$$\mathit{Index} = (K_l + 2048) \times 64, \tag{7}$$

where $K_l$ indicates the input, which is a non-zero quantized DCT coefficient. Eq. (7) allows the input value to be used directly as an index that points to the position of the corresponding 64 intermediate dyadic terms in the table.
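The direct indexing of Eq. (7) can be sketched in Python as follows. The flat-list layout (one run of 64 consecutive terms per input value, for a single coefficient position) and the names `table_index` and `lookup_terms` are assumptions of this sketch.

```python
TERMS = 64                   # dyadic terms stored per input value
K_MIN, K_MAX = -2048, 2047   # input range of Eq. (6)


def table_index(k: int) -> int:
    """Eq. (7): offset of the first of the 64 terms stored for input k."""
    if k == 0 or not K_MIN <= k <= K_MAX:
        raise ValueError("k must be a non-zero value in [-2048, 2047]")
    return (k + 2048) * TERMS


def lookup_terms(table, k):
    """Return the 64 dyadic terms for input k from a flat table."""
    start = table_index(k)
    return table[start:start + TERMS]
```

Because the offset is a single addition and a multiplication by 64 (a shift by 6 bits), no search or comparison is needed to locate the terms; the smallest input, $K_l = -2048$, maps to offset 0.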

Another important consideration is the selection of the mantissa bit $b$, which requires careful attention. The mantissa bit is the key variable that determines the precision. Only if the precision is adequate can the error term be kept to a minimum and the mismatch problem, which is due to the accumulation of rounding errors during the IDCT operation, be prevented. The proposed method sets the mantissa bit to 10 to limit the error term to within $10^{-8}$.

III. EXPERIMENTAL RESULT

For the performance evaluation, the proposed method was implemented on the MPEG-4 Simple Profile (SP) reference decoder [9]. We compared the IQ-IDCT run-time of the fixed-point Z-IQIDCT with that of the standard fixed-point algorithm (ISO/IEC 23002-2) [5]. The testing environment is specified in Table II.

TABLE II. TESTING ENVIRONMENT ON MPEG-4 SP

| Test sequences | BasketBallDrive, Kimono1, ParkScene |
| --- | --- |
| Sequence resolution | HD (1920×1080) |
| Total frames coded | BasketBallDrive: 114 frames; Kimono1: 217 frames; ParkScene: 240 frames |
| Profile | Simple profile |
| Quantization parameter | 18, 24, and 30 |

For the evaluation of the proposed method, we measured the speedup rate as follows:

$$\mathit{Speedup} = \frac{T_{REF}}{T_{PROP}},$$

where $T_{REF}$ designates the decoding time of the reference fixed-point algorithm and $T_{PROP}$ the decoding time of the proposed method.

Table III shows the total IQ-IDCT decoding time of the reference fixed-point algorithm and the proposed method. The proposed method outperforms the reference method for all sequences and QPs. More precisely, the proposed method was 3.1 times as fast as the standard on average. The highest speedup was observed for the ParkScene sequence, with a factor of 3.7 at QP 18, and the lowest for BasketBallDrive, with a factor of 2.6 at QP 18.

Table IV shows the comparison between Z-IQIDCT and the proposed method. The proposed method was at least three times as fast as Z-IQIDCT in most cases; the exception is the BasketBallDrive sequence, for which the performance of the reference and Z-IQIDCT was about the same. The highest factor was 5.6 for the Kimono1 sequence at QP 30, and the lowest was 1.8 for BasketBallDrive at QP 18.

TABLE III. IQ-IDCT DECODING TIME OF MPEG-4 SP

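| Sequence | QP | MPEG-4 SP: IQ | MPEG-4 SP: ISO/IEC 23002-2 Intra | MPEG-4 SP: ISO/IEC 23002-2 Inter | MPEG-4 SP: SUM (A) | Z-IQIDCT: Intra | Z-IQIDCT: Inter | Z-IQIDCT: SUM (B) | Speedup (A/B) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BasketBallDrive | 18 | 925,946 | 2,573,302 | 2,552,745 | 6,051,992 | 865,952 | 1,427,055 | 2,293,007 | 2.6 |
| BasketBallDrive | 24 | 673,112 | 1,526,426 | 2,791,149 | 4,990,686 | 865,100 | 848,073 | 1,713,173 | 2.9 |
| BasketBallDrive | 30 | 554,009 | 1,051,751 | 2,938,515 | 4,544,275 | 885,493 | 533,325 | 1,418,818 | 3.2 |
| Kimono1 | 18 | 368,167 | 861,599 | 1,521,670 | 2,751,435 | 534,408 | 483,362 | 1,017,770 | 2.7 |
| Kimono1 | 24 | 273,718 | 456,359 | 1,543,424 | 2,273,501 | 446,778 | 225,377 | 672,155 | 3.4 |
| Kimono1 | 30 | 241,702 | 281,064 | 1,606,878 | 2,129,643 | 464,115 | 132,721 | 596,836 | 3.6 |
| ParkScene | 18 | 542,630 | 1,279,098 | 1,590,760 | 3,412,488 | 568,131 | 348,168 | 916,299 | 3.7 |
| ParkScene | 24 | 363,133 | 656,956 | 1,639,026 | 2,659,115 | 559,352 | 347,601 | 906,953 | 2.9 |
| ParkScene | 30 | 280,377 | 386,417 | 1,654,271 | 2,321,065 | 521,489 | 210,341 | 731,830 | 3.2 |
| Average | | | | | | | | | 3.1 |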
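The speedup column of Table III can be recomputed from the two SUM columns. The short Python sketch below copies the SUM (A) and SUM (B) values from the table and applies the speedup ratio defined above; the dictionary layout and the function names are assumptions of this sketch.

```python
# (sequence, QP) -> (SUM (A), SUM (B)), copied from Table III
TABLE_III_SUMS = {
    ("BasketBallDrive", 18): (6_051_992, 2_293_007),
    ("BasketBallDrive", 24): (4_990_686, 1_713_173),
    ("BasketBallDrive", 30): (4_544_275, 1_418_818),
    ("Kimono1", 18): (2_751_435, 1_017_770),
    ("Kimono1", 24): (2_273_501, 672_155),
    ("Kimono1", 30): (2_129_643, 596_836),
    ("ParkScene", 18): (3_412_488, 916_299),
    ("ParkScene", 24): (2_659_115, 906_953),
    ("ParkScene", 30): (2_321_065, 731_830),
}


def speedup(t_ref: int, t_prop: int) -> float:
    """Ratio of the reference decoding time to the proposed decoding time."""
    return t_ref / t_prop


def speedups(table):
    """Speedup for every (sequence, QP) pair in the table."""
    return {key: speedup(a, b) for key, (a, b) in table.items()}


def average(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)
```

Rounded to one decimal place, the nine ratios reproduce the Speedup (A/B) column, and their mean rounds to the reported average of 3.1.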

TABLE IV. IQ-IDCT DECODING TIME OF MPEG-4 SP

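| Sequence | QP | Z-IQIDCT (A) | Fixed-point Z-IQIDCT (B) | Speedup (A/B) |
| --- | --- | --- | --- | --- |
| BasketBallDrive | 18 | 4,078,667 | 2,293,007 | 1.8 |
| BasketBallDrive | 24 | 4,781,874 | 1,713,173 | 2.8 |
| BasketBallDrive | 30 | 3,023,458 | 1,418,818 | 2.1 |
| Kimono1 | 18 | 4,090,491 | 1,017,770 | 3.0 |
| Kimono1 | 24 | 3,384,250 | 672,155 | 5.0 |
| Kimono1 | 30 | 3,345,931 | 596,836 | 5.6 |
| ParkScene | 18 | 4,743,316 | 916,299 | 5.2 |
| ParkScene | 24 | 4,019,826 | 906,953 | 4.4 |
| ParkScene | 30 | 3,345,931 | 731,830 | 4.6 |
| Average | | | | 3.8 |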

These results stem from the reduction in computations achieved by eliminating the zero quantized DCT coefficients and by replacing the calculation of the dyadic rational approximation with a table look-up. Moreover, replacing floating-point operations with fixed-point operations further increased the IQ-IDCT computation speed.

IV. CONCLUSION

In this paper, we have proposed a fixed-point zero coefficient-aware fast IQ-IDCT algorithm for the decoder. Extending Z-IQIDCT, we applied the fixed-point scheme to further reduce the computational complexity and to solve the mismatch problem that floating-point operations induce. Experimental results show that adopting the fixed-point algorithm speeds up Z-IQIDCT by a factor of 3.8 on average. Moreover, the average speedup factor of 3.1 of the proposed method over ISO/IEC 23002-2 shows that the zero coefficient-aware algorithm remains effective in reducing the computational complexity of IQ-IDCT under a fixed-point implementation.

ACKNOWLEDGMENTS

This work was supported by Seoul R&BD Program (PA100094), Korea.

REFERENCES

[1] N. Ahmed, T. Natarajan, and K. Rao, "Discrete cosine transform," IEEE Transactions on Computers, vol. C-23, no. 1, pp. 90-93, Jan. 1974.

[2] W. H. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Transactions on Communications, vol. COM-25, no. 9, pp. 1004-1009, Sept. 1977.

[3] X. Ji, S. Kwong, D. Zhao, H. Wang, C.-C. J. Kuo, and Q. Dai, "Early determination of zero-quantized 8x8 DCT coefficients," IEEE Transactions on CSVT, vol. 19, no. 12, pp. 1755-1765, Dec. 2009.

[4] ISO/IEC 23002-1:2006, MPEG video technologies, Part 1: Accuracy requirements for implementation of integer-output 8×8 inverse discrete cosine transform, 2006.

[5] ISO/IEC 23002-2:2008, MPEG video technologies, Part 2: Fixed-point 8×8 inverse discrete cosine transform and discrete cosine transform, 2008.

[6] K. Choi, S. Lee, and E. S. Jang, "Zero coefficient-aware IDCT algorithm for fast video decoding," IEEE Trans. Consum. Electron., vol. 56, no. 3, 2010.

[7] K. Choi, K. Lee, E. J. Kim, and E. S. Jang, "Zero coefficient-aware fast IQ-IDCT algorithm," in Proceedings of the IEEE International Conference on Network Infrastructure and Digital Content, 2010, pp. 327-331.

[8] K. Choi and E. S. Jang, "Scaled zero coefficient-aware IDCT algorithm for fast video decoding," Electronics Letters, vol. 46, no. 25, Dec. 2010.

[9] ISO/IEC 14496-5:2000, Coding of Audio-Visual Objects, Part 5: Reference Software, 2000.
