
[IEEE 2010 IEEE 14th International Symposium on Consumer Electronics - (ISCE 2010) - Braunschweig, Germany (2010.06.7-2010.06.10)] IEEE International Symposium on Consumer Electronics


Frame-level Adaptive Variable Length Coding for MPEG-4 AVC/H.264

Sinwook Lee, Gyeong Gi Noh, and Euee S. Jang
Digital Media Laboratory, Hanyang University
Seoul, Korea

[email protected]

Abstract: In conventional video coding, transforms such as the DCT play a crucial role by providing good data compaction for further compression with quantization and entropy coding. Traditionally, transformed coefficients after quantization are represented by the nonzero coefficients and their locations (or patterns) in a block. The coefficient patterns have often been compressed with run-length coding, which is also the case in MPEG-4 AVC/H.264. In this paper, we propose a new method that compresses the coefficient patterns adaptively on a frame basis. In the proposed method, we first categorize all possible coefficient patterns (i.e., $2^{16} - 1 = 65\,535$ for the 4×4 transform in AVC) into 16 different classes based on the number of nonzero coefficients in a block. The proposed method exploits the fact that the distribution of coefficient patterns depends strongly on the video sequence and the quantization parameter. The probability of each coefficient pattern is updated after encoding each frame and is then used to generate a new variable length code for the pattern in the next frame. Experimental results show that the proposed method provides up to 9.4% bit savings over the MPEG-4 AVC/H.264 baseline profile encoder.

Keywords: RLC; entropy coding; frame-level

I. INTRODUCTION

In conventional video coding standards, run-level coding (RLC) has been widely used to compress the quantized coefficients efficiently. In RLC, the quantized coefficients are described as pairs of non-zero coefficients and run-lengths (the numbers of consecutive zero coefficients between non-zero coefficients) in a predefined order (e.g., zigzag scan order). Many DCT coefficients become zero after quantization, and RLC provides an efficient way to separate non-zero coefficients from zero coefficients.
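The pairing described above can be illustrated with a minimal Python sketch. It assumes the coefficients are already in scan order and that each run counts the zeros immediately preceding a non-zero coefficient; the function name `run_level_pairs` is hypothetical, and trailing zeros are simply dropped here, whereas a real codec signals the end of the block.

```python
def run_level_pairs(coeffs):
    """Return (run, level) pairs for coefficients given in scan order."""
    pairs = []
    run = 0  # zeros seen since the previous non-zero coefficient
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    return pairs  # zeros after the last non-zero coefficient are not represented


scanned = [7, 0, 0, -2, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(run_level_pairs(scanned))  # [(0, 7), (2, -2), (0, 1), (3, 1)]
```

Four pairs are enough to describe this 16-coefficient block, which is why the representation pays off when most quantized coefficients are zero.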

Traditionally, RLC has been designed for a block size of 8×8 because the 8×8 DCT was the popular choice in many video coding standards up to MPEG-4 Part 2 [1]. In recent video coding standards such as MPEG-4 AVC/H.264 [2], the transform block size is mainly 4×4 instead of 8×8. With the smaller block size, RLC still plays an important role in describing the non-zero coefficients among the 16 quantized coefficients.

As an alternative to RLC, direct compression of the binary pattern of non-zero and zero coefficients in an 8×8 block is theoretically possible. It is not practical, however, because the number of possible 64-bit binary patterns ($2^{64} \approx 1.8 \times 10^{19}$) is too large to model efficiently for compression. This is why RLC has been effective for 8×8 block-based coding. For a 4×4 block, the number of binary patterns drops to 65 536 (or $2^{16}$), and direct compression of the binary patterns becomes feasible.

In this paper, we investigate the feasibility of directly compressing the binary patterns to replace the existing RLC structure. The binary patterns are variable length coded, as in Huffman coding. Unlike the transformed coefficients, the binary patterns were observed to depend strongly on the video sequence and the quantization parameter. Such characteristics should be exploited in the VLC design.

A good way to design an optimal VLC is to design a VLC for each video sequence and/or quantization parameter. In [3], multiple adaptive VLC tables are generated and transmitted for each image to increase the coding efficiency. For video, unlike still images, the transmission of VLC tables has not been popular, because fixed VLC tables were preferred to customized VLC tables for applications such as broadcasting. However, the latest video codecs employ adaptive entropy coding such as context adaptive binary arithmetic coding (CABAC). CABAC generally outperforms context adaptive VLC (CAVLC) because the probability table is updated whenever a symbol is encoded [4].

In this paper, we propose frame-level adaptive VLC (FAVLC) for encoding the binary patterns of DCT coefficients. FAVLC has two advantages over fixed VLC: the VLC tables can be updated every frame and thus customized for each video sequence, and the computational complexity of updating the VLC tables is lower than that of CABAC and comparable to that of CAVLC. In the experiments, the proposed method demonstrated better compression performance with reasonable computational complexity.

This paper is organized as follows. Section II gives an overview of CAVLC, especially run-level coding, and Section III presents the proposed VLC coding. Experimental results are presented in Section IV, and Section V concludes the paper.

978-1-4244-6673-3/10/$26.00 ©2010 IEEE

II. RUN LEVEL CODING

In RLC, there are several elements to encode: the sign and magnitude of each non-zero coefficient and the run length of the consecutive zero coefficients between non-zero coefficients. Traditionally, the coding elements have been represented as triplets of (sign, magnitude, zero run). These triplets are encoded with unique codewords from a pre-defined VLC table. In a codec such as MPEG-4 Part 2, the pair of magnitude and zero run is treated as one symbol prior to VLC. In the more recent MPEG-4 AVC/H.264 standard, however, magnitude and zero run are encoded separately. This largely results from the change of the transform block size from 8×8 to 4×4.

The 4×4 block size opens many possible ways to improve the RLC design because the data to process is no longer 64 elements, but only 16. For example, the smaller block size enables a more efficient VLC design in that the probability distribution of 16 elements is easier to model than that of 64 elements. In the CAVLC of MPEG-4 AVC/H.264, four different VLC types are used to represent and encode the 16 coefficients, as shown in Fig. 1: 1) the combination of the total number of non-zero coefficients (TC), the number of trailing ones (TO), and the sign values of TO, 2) levels, 3) the total number of zeros (TZ), and 4) zero runs.
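To make the four element types concrete, the following Python sketch derives them from one block of coefficients in scan order. It is only an illustration under the usual CAVLC conventions (at most three trailing ones, TZ counted before the last non-zero coefficient, levels and runs listed in reverse scan order); the function name and the returned dictionary layout are hypothetical, and no bitstream is produced.

```python
def cavlc_elements(coeffs):
    """Derive TC, TO (with signs), levels, TZ and zero runs from scan-ordered coefficients."""
    nz = [i for i, c in enumerate(coeffs) if c != 0]  # positions of non-zero coefficients
    tc = len(nz)                                      # TC: total number of non-zero coefficients
    if tc == 0:
        return {"TC": 0, "TO": 0, "signs": [], "levels": [], "TZ": 0, "runs": []}

    # TO: up to three consecutive +/-1 values, counted back from the last non-zero coefficient.
    to = 0
    for i in reversed(nz):
        if abs(coeffs[i]) != 1 or to == 3:
            break
        to += 1

    reversed_levels = [coeffs[i] for i in reversed(nz)]
    signs = ["+" if v > 0 else "-" for v in reversed_levels[:to]]
    levels = reversed_levels[to:]                     # remaining levels, reverse scan order

    tz = nz[-1] + 1 - tc                              # TZ: zeros before the last non-zero coefficient

    # Zero run directly before each non-zero coefficient, listed in reverse scan order.
    forward_runs = [nz[0]] + [b - a - 1 for a, b in zip(nz, nz[1:])]
    runs = forward_runs[::-1]

    return {"TC": tc, "TO": to, "signs": signs, "levels": levels, "TZ": tz, "runs": runs}


block = [0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8
print(cavlc_elements(block))
```

For this block the sketch reports TC = 5, TO = 3, TZ = 3 and the runs 1, 0, 0, 1, 1. Note that TZ and the runs together describe only where the non-zero coefficients sit, independent of their values.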

An example of how RLC is done in MPEG-4 AVC/H.264 is given in Fig. 2. As the figure shows, multiple VLC tables are defined for VLC types 1 and 2, which increases the compression efficiency through context adaptive selection of VLC tables. For VLC types 3 and 4, only one VLC table is defined for each type. For VLC types 1 and 2, the correlation between neighboring blocks and between non-zero coefficients within a block is used as the context information in the selection of VLC tables. However, such correlation is hard to apply to the pattern of non-zero coefficient occurrence in a block. For example, the TZ value in the current block depends more on the quantization parameter and/or the image activity than on the TZ values of the neighboring blocks, and the same holds for zero runs. Therefore, it would be desirable to design a context model for VLC types 3 and 4 that differs from that for VLC types 1 and 2 to further increase the compression efficiency.

III. THE PROPOSED METHOD

In this section, we propose a new model that describes the binary pattern of non-zero coefficient occurrence in a block, in order to design an alternative to VLC types 3 and 4 in MPEG-4 AVC/H.264.

A. Binary Pattern

VLC types such as TZ and zero runs can be replaced by the binary pattern. Instead of representing the zero coefficient occurrence in a block with many symbols (i.e., TZ and zero runs), the binary pattern of non-zero coefficient occurrence can be represented as a vector $B$ as follows:

$$B = (b_1, b_2, \ldots, b_N), \qquad (1)$$

where

$$b_i = \begin{cases} 1, & C_i \neq 0 \\ 0, & C_i = 0 \end{cases}$$

$N$ is the number of coefficients in the block, and $C_i$ denotes the magnitude of the coefficient located in the $i$-th position in the scan order. If the block size is 4×4, the number of possible values of $B$ is $2^{16}$ (= 65 536). This is far fewer than when the block is 8×8 (i.e., $2^{64} \approx 1.8 \times 10^{19}$ cases).
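As a small illustration, the binary pattern of a block can be obtained by marking each scan position whose coefficient is non-zero. The Python sketch below also packs the 16 bits into one integer so that a pattern can serve as a table key; the helper names and the choice of the first scan position as the most significant bit are assumptions made for this example.

```python
def binary_pattern(coeffs):
    """Return one bit per scan position: 1 if the coefficient is non-zero, else 0."""
    return [1 if c != 0 else 0 for c in coeffs]


def pattern_index(bits):
    """Pack the bits into an integer, first scan position as the most significant bit."""
    index = 0
    for b in bits:
        index = (index << 1) | b
    return index


block = [0, 3, 0, 1, -1, -1, 0, 1] + [0] * 8
bits = binary_pattern(block)
print(bits)                 # [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print(pattern_index(bits))  # 23808 (0x5D00)
print(sum(bits))            # 5, the number of non-zero coefficients (TC)
```

A single 16-bit index carries the same position information that TZ and the individual zero runs carry in CAVLC.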

Figure 1. The Block Diagram of CAVLC

Figure 2. An example of RLC of 16 coefficients in MPEG-4 AVC/H.264

B. Binary Pattern Representation

There can be 65 534 binary patterns in a 4×4 block, excluding the two cases in which the coefficients are all zero or all non-zero. One simple way to design an entropy coder for the binary patterns is to build a single VLC table based on the probability of each binary pattern. However, the binary patterns can be divided into 15 groups by the number of non-zero coefficients (TC). Because TC is already encoded, subdividing the binary patterns into groups incurs no further cost.

The number of binary patterns differs from group to group because it is determined by the number of non-zero coefficients in a 16-bit sequence. The number of binary patterns in the $i$-th group ($N_i$) can be described as follows:

$$N_i = C(16, i) = \frac{16!}{i!\,(16 - i)!}, \qquad (2)$$

where $i$ denotes the number of non-zero coefficients (or TC) and $C(a, b)$ denotes the number of $b$-combinations from a set of $a$ elements. The number of binary patterns in each group is listed in Table I. It varies from 16 to 12 870 depending on the value of TC, which makes it possible to represent the binary patterns in each group with 4 to 14 bits without entropy coding. The table shows that context modeling of the binary patterns using TC results in a further reduction in data representation.
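The group sizes follow directly from the binomial coefficient $N_i = C(16, i)$, so they can be checked with a few lines of Python. This is only a verification sketch; it relies on `math.comb` from the standard library (Python 3.8 or later) and prints the fixed-length cost $\log_2 N_i$ next to each group size.

```python
from math import comb, log2

# Group i holds every 16-bit pattern with exactly i non-zero positions (i = 1 .. 15).
group_sizes = {i: comb(16, i) for i in range(1, 16)}

for i, n in group_sizes.items():
    print(f"{i:2d} {n:6d} {log2(n):5.1f}")

# The 15 groups together cover all patterns except all-zero and all-non-zero.
print(sum(group_sizes.values()))  # 65534
```

The largest group is the one with eight non-zero coefficients, where $C(16, 8) = 12\,870$.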

C. Frame-level Adaptive VLC

The binary patterns in Table I can be encoded with a dedicated VLC table for each group, as shown in Fig. 3. In designing VLC tables for the binary patterns, we found that the statistics of the binary patterns differ from one video sequence to another as well as from one QP to another. Fig. 4 shows the probability distribution of binary patterns per group for the Mother&Daughter and Mobile sequences. As can be seen from the figure, the binary patterns generated in the Mobile sequence are more active than those generated in the Mother&Daughter sequence. Based on this observation, we adopted a frame-level adaptive VLC design that updates the codewords in the VLC tables every frame so that the VLC adaptively follows the local statistics of the binary patterns.

Fig. 5 shows a flowchart for encoding binary patterns using FAVLC. In addition to the VLC tables, probability tables of the binary patterns are maintained to accumulate the occurrences of individual binary patterns. Whenever a new frame starts, the probability tables are used to generate new codewords in the VLC tables, as shown in the figure. Compared to fixed VLC, FAVLC is slightly more complex in that the probability tables are updated whenever a binary pattern is encoded. However, this update is marginal in complexity because it involves only an increment in a probability table. An additional increase in complexity comes from the generation of new VLC tables, which is not severe because it is done only once per frame.
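The update cycle described above can be sketched in Python for a single group. This is not the authors' implementation: it assumes Huffman codes built with `heapq`, an initial count of one for every pattern so that each pattern always has a codeword, and hypothetical names (`build_huffman`, `FrameAdaptiveVLC`). A decoder would have to apply the same count updates to stay synchronized.

```python
import heapq
from itertools import count


def build_huffman(counts):
    """Build a prefix code (symbol -> bit string) from positive occurrence counts."""
    if len(counts) == 1:
        return {sym: "0" for sym in counts}
    tie = count()  # tie-breaker so the heap never has to compare dictionaries
    heap = [(n, next(tie), {sym: ""}) for sym, n in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


class FrameAdaptiveVLC:
    """One group's tables: counts change per pattern, codewords change per frame."""

    def __init__(self, patterns):
        self.counts = {p: 1 for p in patterns}  # assumed initial statistics
        self.table = build_huffman(self.counts)

    def start_frame(self):
        # Once per frame: regenerate the codewords from the accumulated counts.
        self.table = build_huffman(self.counts)

    def encode(self, pattern):
        bits = self.table[pattern]  # codeword from the table fixed for this frame
        self.counts[pattern] += 1   # cheap per-pattern update: a single increment
        return bits


# Group TC = 1: the 16 patterns with exactly one non-zero position.
group = [1 << k for k in range(16)]
codec = FrameAdaptiveVLC(group)
first_frame = [codec.encode(group[3]) for _ in range(50)]
codec.start_frame()
print(len(first_frame[0]), len(codec.encode(group[3])))  # 4 1
```

In the first frame every pattern costs four bits because the counts are uniform. After the table is regenerated, the frequently used pattern receives a one-bit codeword, which mirrors how the method follows the local statistics at a cost of one table rebuild per frame.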

TABLE I. THE NUMBER OF BINARY PATTERNS IN EACH GROUP

 i        Ni    Bits to represent (log2 Ni)
 1        16     4.0
 2       120     6.9
 3       560     9.1
 4      1820    10.8
 5      4368    12.1
 6      8008    13.0
 7     11440    13.5
 8     12870    13.7
 9     11440    13.5
10      8008    13.0
11      4368    12.1
12      1820    10.8
13       560     9.1
14       120     6.9
15        16     4.0
Average         12.9

Figure 3. An example of the binary pattern VLC

Figure 4. The probabilities of binary pattern occurrences in each group (Mobile and Mother&Daughter)

Figure 5. Flowchart of the Proposed Method

[Figure 4 plot: data series Mother&Daughter and Mobile; horizontal axis ticks 1 to 16; vertical axis ticks 0 to 30]


When a binary pattern is encoded using a given VLC table, it must be noted that the statistics of the binary patterns differ from sequence to sequence. Fig. 6 shows an example of the difference in probability distribution between the Mobile and Mother&Daughter sequences in VLC table 2 (TC = 2). For the first binary pattern in the table of Fig. 6, the occurrence of that pattern differed by about 17 percent. In addition, we found that about one-fifth of the binary patterns never occurred when we ran the test with eight different test sequences. This strongly implies that a context-adaptive design is necessary in VLC coding. As a context-adaptive design, we introduce dual VLC tables (i.e., customized and default tables).

Fig. 7 illustrates the process of binary pattern coding with dual VLC. In dual VLC, each group in Table I has a customized and a default VLC table. The customized VLC table maintains the binary patterns used by the given video sequence at a certain QP, whereas the default VLC table is a pre-defined VLC table selected according to the QP value. The basic idea of dual VLC is to maximize the use of the customized VLC tables because they provide higher compression efficiency. If a binary pattern is not found in the customized VLC table, the pattern is encoded with the default VLC table. In this case, two symbols (i.e., an escape code in the customized table and a codeword in the default table) are encoded. The default tables are less efficient than the customized tables because the extra escape code has to be encoded. Whenever a new binary pattern is encoded using the default VLC table, the pattern is added to the customized VLC table, and from the next frame on it is encoded using the customized VLC table. Therefore, the use of the default VLC table for a new binary pattern is limited to the frame in which it first appears.
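The escape mechanism can be sketched as follows for the group with TC = 2. Several details are assumptions made for this illustration and are not taken from the paper: the default table is a fixed-length 7-bit code rather than a QP-dependent VLC, the customized table is a Huffman code rebuilt once per frame, the escape symbol is counted like any other symbol, and all names (`ESC`, `build_huffman`, `DualVLC`) are hypothetical.

```python
import heapq
from itertools import combinations, count

ESC = "ESC"  # escape symbol kept in the customized table


def build_huffman(counts):
    """Build a prefix code (symbol -> bit string) from positive occurrence counts."""
    if len(counts) == 1:
        return {sym: "0" for sym in counts}
    tie = count()  # tie-breaker so the heap never has to compare dictionaries
    heap = [(n, next(tie), {sym: ""}) for sym, n in counts.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))
    return heap[0][2]


class DualVLC:
    """Customized table with an escape into a fixed default table (one group)."""

    def __init__(self, default_table):
        self.default = default_table            # pattern -> bit string, never changes
        self.counts = {ESC: 1}                  # statistics behind the customized table
        self.custom = build_huffman(self.counts)

    def start_frame(self):
        # Patterns first seen in the previous frame enter the customized table here.
        self.custom = build_huffman(self.counts)

    def encode(self, pattern):
        if pattern in self.custom:
            bits = self.custom[pattern]
        else:
            # Two symbols: escape code from the customized table, then the default codeword.
            bits = self.custom[ESC] + self.default[pattern]
            self.counts[ESC] += 1
        self.counts[pattern] = self.counts.get(pattern, 0) + 1
        return bits


# Group TC = 2: the 120 patterns with exactly two non-zero positions.
patterns = [(1 << a) | (1 << b) for a, b in combinations(range(16), 2)]
default = {p: format(k, "07b") for k, p in enumerate(patterns)}

codec = DualVLC(default)
escaped = codec.encode(patterns[0])  # new pattern: escape code plus 7 default bits
codec.start_frame()
direct = codec.encode(patterns[0])   # next frame: found in the customized table
print(len(escaped), len(direct))     # 8 1
```

The customized table stays small because it holds only the patterns that have actually occurred, so its codewords are short. The price is the extra escape code paid the first time a pattern appears.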

IV. EXPERIMENTAL RESULTS

In order to evaluate the performance of the proposed method, we compared it with CAVLC. More specifically, we compared the number of bits spent on the binary patterns in FAVLC with the number spent on TZ and zero runs in CAVLC. For an accurate comparison between the two methods, the compression efficiency is computed as follows:

$$\text{Efficiency}\ (\%) = \frac{\text{Bits}_{\text{CAVLC}} - \text{Bits}_{\text{FAVLC}}}{\text{Bits}_{\text{CAVLC}}} \times 100. \qquad (3)$$
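As a quick check of this measure, the Python sketch below computes the relative bit saving of FAVLC over CAVLC and applies it to two bit counts reported in Table III. The definition used here (saved bits divided by CAVLC bits, in percent) is inferred from those reported figures, and the function name is hypothetical.

```python
def efficiency_percent(bits_cavlc, bits_favlc):
    """Relative bit saving of FAVLC over CAVLC, in percent."""
    return 100.0 * (bits_cavlc - bits_favlc) / bits_cavlc


# Bit counts for QP 22 as reported in Table III.
print(round(efficiency_percent(3_459_599, 3_244_201), 2))    # Foreman: 6.23
print(round(efficiency_percent(12_856_399, 11_709_043), 2))  # Mobile: 8.92
```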

We evaluated the performance of the proposed method with four CIF and four HD sequences, as shown in Table II. To measure the performance of CAVLC, we used the MPEG-4 AVC/H.264 reference software JM 11.0 [5] in the baseline profile. The proposed method was also implemented in JM 11.0, replacing CAVLC with FAVLC.

As Table III shows, FAVLC always outperformed CAVLC in compressing the binary patterns. Compared with CAVLC, the proposed method achieved a compression gain of 2.17 to 8.92 percent for the CIF sequences and 1.94 to 10.88 percent for the HD sequences. The average compression efficiency was 5.05 percent for the CIF sequences and 5.39 percent for the HD sequences. As expected, the context-adaptive VLC design of the binary patterns is found to be effective on the test sequences. As the table shows, the performance of the proposed method varies from sequence to sequence as well as from one QP to another.

Figure 6. Probability Distribution Difference of Binary Patterns in NumVLC2 between Mobile and Mother&Daughter

Figure 7. Example of Dual VLC coding

TABLE II. TEST ENVIRONMENT

Reference software version   JM 11.0 baseline profile
Test sequences               CIF (Foreman, Mobile, Mother&Daughter, Stefan) and HD (Traffic, Crowd run, Kimono, Park scene)
Sequence resolutions         CIF (352 × 288) and HD (1920 × 1080)
Quantization parameters      22, 28, and 34
Frame structure              IPPP ... P

[Figure 6 plot: horizontal axis "Index in the VLC table" (ticks 1 to 116); vertical axis "percentage (%)" (ticks 0 to 18)]


TABLE III. COMPRESSION EFFICIENCY COMPARISON OF CAVLC AND FAVLC

AVC and Proposed are bit counts; Eff. is the efficiency in percent; Avg. is the average efficiency in percent.

CIF (352 × 288)
                               QP 22                           QP 28                           QP 34
Sequence                  AVC     Proposed   Eff.         AVC     Proposed   Eff.         AVC     Proposed   Eff.    Avg.
Foreman             3,459,599    3,244,201   6.23     833,947      798,670   4.23     179,578      175,686   2.17    4.21
Mobile             12,856,399   11,709,043   8.92   4,863,280    4,553,937   6.36   1,122,846    1,092,955   2.66    5.98
Stefan              2,047,942    1,916,144   6.44     695,897      667,163   4.13     173,901      166,818   4.07    4.88
Mother&Daughter       656,196      633,640   3.44     163,434      152,967   6.40      34,814       32,902   5.49    5.11
Average                                      6.26                            5.28                            3.60    5.05

HD (1920 × 1080)
                               QP 22                           QP 28                           QP 34
Sequence                  AVC     Proposed   Eff.         AVC     Proposed   Eff.         AVC     Proposed   Eff.    Avg.
Traffic            34,356,841   33,051,693   3.80   9,211,550    8,873,134   3.67   2,175,282    2,094,074   3.73    3.73
Kimono             19,804,941   17,649,753  10.88   4,703,251    4,310,654   8.35   1,402,410    1,273,684   9.18    9.47
Crowd run         212,456,772  201,181,355   5.31  64,180,643   62,933,181   1.94  18,473,987   17,650,303   4.46    3.90
Park scene         37,567,030   36,414,160   3.07   8,716,414    8,454,121   3.01   2,050,016    1,900,492   7.29    4.46
Average                                      5.77                            4.24                            6.17    5.39

Table IV shows the number of binary patterns used in the customized VLC tables. Out of 65 534 patterns, only a portion actually appears, depending on the video sequence and the QP value. One clear tendency is that the number of occurring binary patterns decreases as the QP value increases. Further analysis is needed to determine the correlation between the compression efficiency and the number of occurring binary patterns. It should be noted, however, that the maximum compression gain among the CIF sequences is achieved for the Mobile sequence at QP 22, where the number of occurring binary patterns is also the largest among the CIF sequences. Another interesting observation is that the compression gain of the Mother&Daughter sequence at QP 34 (5.49 percent) is higher than at QP 22 (3.44 percent), although its number of occurring binary patterns at QP 34 is the lowest of all test sequences and QPs.

We also evaluated how efficient the dual VLC algorithm was in the proposed method. From Table IV, it is clear that the default VLC tables alone are not well suited to adapting to the local statistics of different sequences and QPs. We measured the usage ratio ($R$) of the customized VLC tables as follows:

$$R = \frac{N_C}{N_C + N_D} \times 100, \qquad (4)$$

where $N_C$ denotes the number of binary patterns encoded by the customized VLC table and $N_D$ denotes the number encoded by the default VLC table. Table V shows the usage ratio of the customized VLC tables for the test sequences. More than 94 percent of the binary patterns are encoded using the customized VLC tables in all test sequences. If the number of occurring binary patterns is high (e.g., the Mobile sequence), the usage ratio is relatively lower than in the other cases. Conversely, if the number of occurring binary patterns is low, the usage ratio becomes higher.
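The usage ratio is a simple share of the customized-table hits among all encoded binary patterns. The Python sketch below states it explicitly, assuming the ratio is expressed in percent as in Table V; the function name and the counts in the example are hypothetical and are not measurements from the paper.

```python
def usage_ratio(n_custom, n_default):
    """Share of binary patterns coded with the customized table, in percent."""
    return 100.0 * n_custom / (n_custom + n_default)


# Hypothetical counts: 975 patterns coded directly, 25 coded through the escape path.
print(usage_ratio(975, 25))  # 97.5
```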

V. CONCLUSION

In this paper, we proposed a new frame-level context adaptive VLC method that is suitable for encoding the binary patterns in a 4×4 block. Considering that the probabilities of the binary patterns differ depending on the given video sequence, we developed a frame-level adaptive VLC design that updates the codewords in the VLC tables every frame. Moreover, the compression efficiency can be further increased by using the dual VLC design for context-adaptive coding. In the experimental results, the proposed FAVLC method increased the compression efficiency by 5.05 percent for the CIF sequences and 5.39 percent for the HD sequences on average, compared to CAVLC in MPEG-4 AVC/H.264.

TABLE IV. THE NUMBER OF BINARY PATTERNS THAT OCCURRED IN THE CUSTOMIZED VLC TABLES

Sequence           QP = 22   QP = 28   QP = 34
Foreman             15,044     3,662       912
Mobile              46,882    20,331     6,208
Stefan              13,801     4,614     1,351
Mother&Daughter      2,138       525       137
Traffic             24,394     8,030     1,769
Crowd run           64,740    33,730     7,764
Kimono               6,839     1,301       255
Park scene          36,133    11,072     2,103

TABLE V. THE USAGE OF THE CUSTOMIZED VLC TABLES (IN %)

Sequence           QP = 22   QP = 28   QP = 34
Foreman              97.45     98.38     98.83
Mobile               95.93     96.86     97.12
Stefan               94.31     97.03     98.00
Mother&Daughter      99.13     99.54     99.66
Traffic              99.70     99.77     99.88
Crowd run            99.76     99.75     99.89
Kimono               99.89     99.94     99.95
Park scene           99.52     99.60     99.79

ACKNOWLEDGMENT

This work was supported by National Research Laboratory, Korea.


REFERENCES

[1] ISO/IEC 14496-2:2004, Information technology -- Coding of audio-visual objects -- Part 2: Visual.
[2] ISO/IEC 14496-10, Draft of Version 4 of MPEG-4 AVC/H.264, 2005.
[3] G. Lakhani, "Optimal Huffman coding for DCT blocks," IEEE Trans. Circuits and Systems for Video Technology, vol. 14, no. 4, pp. 522-527, Apr. 2004.
[4] D. Marpe, H. Schwarz, and T. Wiegand, "Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard," IEEE Trans. Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 620-636, July 2003.
[5] JVT Reference Software Version 11.0, http://iphome.hhi.de//suehring/tml/download/jm_old/jm11.0.zip