
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 52, NO. 3, MARCH 2004 359

Balance of 0, 1 Bits for Huffman and Reversible Variable-Length Coding

Jia-Yu Lin, Ying Liu, and Ke-Chu Yi

Abstract—This letter proposes a novel algorithm to obtain a suboptimal solution for the balance of bit distribution after Huffman coding. The algorithm is simple, and can be embedded in the conventional Huffman coding process. In addition, the letter also discusses the bit-balance problem for reversible variable-length codes (RVLCs) based on Huffman coding. Analytical and experimental results suggest that the new algorithm is very useful in improving the 0/1 balance property for Huffman codes and RVLCs.

Index Terms—Balance of 0/1 bits, Huffman codes, reversible variable-length codes (RVLCs).

I. INTRODUCTION

HUFFMAN coding [1] and its variants have been widely used in data compression, audio coding, image coding, and video coding. Normally, the performance of Huffman encoders is measured not only by the compression effectiveness, but also by other criteria [2], such as self-synchronizing ability, memory, and searching efficiency. To improve error-resiliency capabilities, reversible variable-length codes (RVLCs) [4]–[7] based on Huffman coding have been introduced.

In this letter, the problem of the probability distribution of zeros and ones in binary Huffman and RVLC streams is discussed. In general, the more balanced the zeros and ones are in the bit stream, the better the bit stream is for further processing and transmission. Bit balance in the output stream of the source encoder minimizes the influence of source statistics on the channel-coding performance. For error correction, it is always assumed that the input data sequence is basically random, with equally probable zeros and ones [8]. Correspondingly, the assumption that each codeword is equally likely to be transmitted is usually made when analyzing the probability of decoding error for the binary symmetric channel [3]. Furthermore, the performance of subsystems such as bit-timing recovery, frame synchronization, time-domain equalization, and the time-frequency properties of modulation signals could be improved by bit-balanced transmission.

However, the bit distribution in conventional Huffman codes may be significantly unbalanced. In [3], Montgomery pointed out that a source has balanced bit probabilities for any optimal source code if and only if the source is dyadic. This is a rare situation. Even for codes that are 97% to 99% efficient, the probability of the more likely bit may be significantly greater than 1/2 [3]. In [3], upper bounds on the maximum and minimum probability values of the more likely bit are given. However, no work was done on algorithms to minimize the difference in bit probabilities. This letter proposes a suboptimal algorithm for the construction of Huffman codes with balanced bit probabilities, and discusses the bit balance in bidirectionally decodable streams [4] and RVLCs [5]–[7].

Paper approved by K. Rose, the Editor for Source-Channel Coding of the IEEE Communications Society. Manuscript received January 13, 2002; revised November 7, 2002; April 5, 2003; and July 24, 2003. This work was supported by NSFC 60172029.

J.-Y. Lin and Y. Liu are with the School of Electronic Science and Engineering, National University of Defence Technology, Changsha, Hunan 410073, China (e-mail: [email protected]; [email protected]; [email protected]).

K.-C. Yi is with the State Key Laboratory on Integrated Service Networks, Xidian University, Xi'an, Shanxi 710071, China (e-mail: [email protected]).

Digital Object Identifier 10.1109/TCOMM.2004.823568

II. CONSTRUCTION OF BIT-BALANCED HUFFMAN CODES

Generally, there are two steps in constructing Huffman codes. First, we construct a Huffman tree according to the occurrence probabilities of the source symbols; then, we assign either zero or one (in the binary case) to each branch of the Huffman tree. Usually, the assignment of zeros or ones to the left or right branches is fixed. We will refer to this kind of Huffman code as a "conventional Huffman code." However, we may simply reverse the assignment for each pair of sibling branches independently, and still obtain a Huffman coder satisfying the prefix condition. This idea allows us to balance the bit probabilities in the final bit stream.

For $N$ source symbols occurring with probabilities $p_1, p_2, \ldots, p_N$, we denote the corresponding code lengths as $l_1, l_2, \ldots, l_N$. Then the average code length is given by $\bar{l} = \sum_{i=1}^{N} p_i l_i$. Suppose that in the $i$th codeword, the numbers of one and zero bits are $a_i$ and $b_i$, respectively. We have $a_i + b_i = l_i$. We define the average numbers of ones and zeros in all the codewords as $\bar{a} = \sum_{i=1}^{N} p_i a_i$ and $\bar{b} = \sum_{i=1}^{N} p_i b_i$. The bit probabilities in the code stream are $P_1 = \bar{a}/\bar{l}$ and $P_0 = \bar{b}/\bar{l}$.
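As an illustration, the quantities just defined can be computed for a small prefix code. This is only a sketch; the symbol probabilities and codewords below are invented for illustration, not taken from Table I.

```python
# Sketch: bit probabilities P1, P0 of a prefix-code stream.
# The probabilities and codewords here are illustrative examples.
probs = [0.4, 0.3, 0.2, 0.1]
codewords = ["0", "10", "110", "111"]

# Average code length: l_bar = sum_i p_i * l_i
l_bar = sum(p * len(c) for p, c in zip(probs, codewords))

# Average numbers of ones (a_bar) and zeros (b_bar) per symbol
a_bar = sum(p * c.count("1") for p, c in zip(probs, codewords))
b_bar = sum(p * c.count("0") for p, c in zip(probs, codewords))

P1 = a_bar / l_bar
P0 = b_bar / l_bar
print(l_bar, P1, P0)  # P1 + P0 is 1 (up to floating-point rounding)
```

Even this nearly dyadic toy source yields $P_1 \neq P_0$, which is the imbalance the letter sets out to minimize.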

Assuming that a Huffman tree has been generated, and that bits have been assigned to the branches, we can calculate statistically the occurrence frequencies of one and zero according to the weights of the branches. Let $W_0$ be the set of weights of all branches labeled as zero. Let $W_1$ be defined analogously. Let $w_{i,0}$ and $w_{i,1}$ denote the weights of the sibling pair of branches connected to the $i$th internal node, with $w_{i,0} \in W_0$ and $w_{i,1} \in W_1$. We have $\bar{b} = \sum_{i=1}^{N-1} w_{i,0}$ and $\bar{a} = \sum_{i=1}^{N-1} w_{i,1}$ (note that there are $N-1$ internal nodes, including the root). Thus, the difference between $\bar{b}$ and $\bar{a}$ is $\Delta = \bar{b} - \bar{a} = \sum_{i=1}^{N-1} (w_{i,0} - w_{i,1})$, and our goal is to minimize $|\Delta|$.

After the construction of the Huffman tree (before the assignment of bit labels), the weights of the left and right branches connected to the $i$th internal node are settled, denoted as $u_i$ and $v_i$, respectively. We may assume $u_i \ge v_i$, according to the common routine to construct the Huffman tree. We denote the difference of weights in a sibling pair of branches as $d_i = u_i - v_i \ge 0$.

0090-6778/04$20.00 © 2004 IEEE


TABLE I: HUFFMAN CODES AND RVLCS FOR THE ENGLISH ALPHABET (C1: CONVENTIONAL HUFFMAN CODES; C2: OUR BIT-BALANCED HUFFMAN CODES; C3: SYMMETRICAL RVLC IN [6]; C4: OUR CORRESPONDING SYMMETRICAL RVLC; C5: ASYMMETRICAL RVLC IN [7]; C6: OUR CORRESPONDING ASYMMETRICAL RVLC)

Now, when assigning zero and one to every pair of sibling branches in the conventional way (without loss of generality, assuming that zero and one are assigned to the left and right branches, respectively), we have $w_{i,0} = u_i$ and $w_{i,1} = v_i$. Thus, $w_{i,0} - w_{i,1} = d_i$. But the assignment of zero and one to any pair of sibling branches could be reversed, resulting in $w_{i,0} - w_{i,1} = -d_i$. So, we have $\Delta = \sum_{i=1}^{N-1} x_i d_i$, with the conventional assignment for a sibling pair of branches leading to $x_i = +1$, and the reversed assignment to $x_i = -1$. As to the bit assignment in conventional Huffman codes, $x_i = +1$ for all the pairs of sibling branches. Obviously, this makes no attempt to minimize $|\Delta|$. As far as the bit-balance criterion is concerned, it could be the worst code-assignment scheme.

We should be able to find the most suitable $x = (x_1, \ldots, x_{N-1})$ to minimize $|\Delta|$, which is a special scheme to assign the labels for the $N-1$ pairs of sibling branches accordingly. We can find the optimal solution by exhaustively searching the $2^{N-1}$ possible cases of $x$. But when $N$ is large, it is computationally expensive to find such a global optimum solution. We present a suboptimal algorithm, which involves much less computational complexity, as follows.

First of all, we sort $d_1, d_2, \ldots, d_{N-1}$ in increasing order with respect to their values, and denote the sorted list by $e_1, e_2, \ldots, e_{N-1}$, where $e_1 \le e_2 \le \cdots \le e_{N-1}$. The problem can then be restated as minimizing $|\Delta| = |\sum_{k=1}^{N-1} y_k e_k|$, subject to the constraint $y_k \in \{-1, +1\}$.

The search for $y = (y_1, \ldots, y_{N-1})$ is carried out iteratively as follows, where $S_k$ denotes the running partial sum $\sum_{j=k}^{N-1} y_j e_j$.

Step 0: We may arbitrarily set $y_{N-1} = 1$; then $S_{N-1} = e_{N-1}$. Set $k = N-2$.
Step 1: If $S_{k+1} > 0$, set $y_k = -1$; otherwise, set $y_k = +1$.
Step 2: Set $S_k = S_{k+1} + y_k e_k$ and $k = k - 1$. If $k \ge 1$, go to Step 1; otherwise, end.

We get the proper $y$, whose values are taken as $-1$ and $+1$. A suboptimal solution for the $x_i$ is then obtained accordingly, by mapping the $y_k$ back through the sorting order.
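The iteration above amounts to a short greedy loop: process the sorted differences from largest to smallest, always choosing the sign that pushes the running sum back toward zero. A minimal sketch, with our own variable names:

```python
def balance_signs(d):
    """Greedy sign assignment: given nonnegative weight differences d,
    choose signs y[k] in {-1, +1} so that |sum_k y[k] * d[k]| is small.
    The sorted values are processed from largest to smallest."""
    order = sorted(range(len(d)), key=lambda k: d[k])  # e_1 <= ... <= e_{N-1}
    y = [0] * len(d)
    s = 0.0  # running partial sum S_k
    for k in reversed(order):        # largest difference first
        y[k] = -1 if s > 0 else 1    # push the sum back toward zero
        s += y[k] * d[k]
    return y, s  # y is indexed by the original (unsorted) positions

y, delta = balance_signs([0.05, 0.30, 0.10, 0.20])
print(y, delta)  # y == [1, 1, -1, -1]; |delta| is about 0.05
```

Apart from the initial sort, each difference is visited once, matching the letter's claim of low complexity.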

Although the solution may not be optimal, the algorithm is simple to implement. Except for the ordering of the $d_i$, only $N-2$ additions and comparisons are needed. The algorithm may be embedded in the assignment of Huffman code bits. When constructing the Huffman tree, the weight differences in pairs of sibling branches can be recorded as the internal nodes are produced. Then, the suboptimal $x$ is found using the algorithm shown above, and the bit-label assignment to the branches is decided accordingly. That is, when $x_i$ is $+1$, assign zero and one to the left and right branches of internal node $i$, respectively. When $x_i$ is $-1$, the assignment of zero and one is reversed.
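The embedding described above can be sketched end to end: build the tree, record each sibling pair's weight difference, run the greedy sign search, then label branches according to the signs. This is a hypothetical implementation under our own naming; heap ties are broken here by insertion order, which real implementations may resolve differently.

```python
import heapq
import itertools

def balanced_huffman(probs):
    """Sketch: Huffman construction with the greedy bit-balancing
    label assignment embedded.  Returns {symbol index: codeword}."""
    counter = itertools.count()  # unique tie-breaker for the heap
    heap = [(p, next(counter), i) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    diffs, children = [], []     # d_i and (left, right) per internal node
    while len(heap) > 1:
        w1, _, light = heapq.heappop(heap)
        w2, _, heavy = heapq.heappop(heap)
        children.append((heavy, light))  # left branch = heavier, so u_i >= v_i
        diffs.append(w2 - w1)            # d_i = u_i - v_i >= 0
        heapq.heappush(heap, (w1 + w2, next(counter), ("node", len(children) - 1)))
    root = heap[0][2]

    # Greedy sign search: largest d_i first, pushing the sum toward zero.
    x, s = [0] * len(diffs), 0.0
    for k in sorted(range(len(diffs)), key=lambda k: diffs[k], reverse=True):
        x[k] = -1 if s > 0 else 1
        s += x[k] * diffs[k]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):   # internal node ("node", i)
            i = node[1]
            # x_i = +1: zero on the left (heavier) branch; -1: reversed.
            labels = ("0", "1") if x[i] == 1 else ("1", "0")
            for child, bit in zip(children[i], labels):
                walk(child, prefix + bit)
        else:                          # leaf: a symbol index
            codes[node] = prefix
    walk(root, "")
    return codes

codes = balanced_huffman([0.4, 0.3, 0.2, 0.1])
print(codes)  # a valid prefix code with Huffman-optimal lengths
```

The codeword lengths are identical to those of a conventional Huffman code; only the 0/1 labels differ, so compression performance is unchanged.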

The English alphabet [5] is shown in Table I. The occurrence probabilities of the letters, the conventional Huffman codes (C1), and our bit-balanced Huffman codes (C2) are listed. The two Huffman codes have the same average code length. Both schemes minimize the code variances,


resulting in the value 0.888 613. However, the two bit probabilities of the conventional Huffman codes are 0.456 410 72 and 0.543 589 28, with an absolute difference of 0.087 178 55, while the bit probabilities of our Huffman codes are both close to 0.5, with an absolute difference of only 0.000 096 35. The result shows that the new method is much better than the conventional one under the bit-balance criterion.

III. BALANCE OF ZEROS AND ONES FOR RVLCS

 A. Bidirectionally Decodable Streams

The bidirectionally decodable stream [4] is generated from a Huffman code by reversing the original codewords, and performing a bitwise exclusive-OR operation on the original and reversed bit streams, where the codebook of the Huffman code is left unchanged. So, our method depicted above can be applied here. This section discusses the relationship between the bit balance in the original Huffman code and the bit balance in the corresponding bidirectionally decodable stream.

Assume the bit probabilities in the original encoded stream are $P_0$ and $P_1$. According to the bitwise exclusive-OR operation, the bit probabilities in the bidirectionally decodable stream are $P_0' = P_0^2 + P_1^2$ and $P_1' = 2 P_0 P_1$, ignoring the leading and trailing zeros [4]. So, with the difference between the bit probabilities in the original encoded stream being $\delta = |P_0 - P_1|$, the one in the bidirectionally decodable stream is $\delta' = |P_0' - P_1'| = (P_0 - P_1)^2 = \delta^2$. Since $\delta \le 1$, we have $\delta' \le \delta$. That is, the bidirectionally decodable stream decreases the bit-probability difference, compared with the original one of the Huffman code stream.
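The relation $\delta' = \delta^2$ can be checked with a quick Monte Carlo sketch. Note this simply XORs an i.i.d. bit stream with its own reversal, a simplification of the exact stream construction of [4]:

```python
import random

random.seed(1)
n = 200_000
P1 = 0.8  # deliberately unbalanced source stream
bits = [1 if random.random() < P1 else 0 for _ in range(n)]

# XOR the stream with its reversal, as in a bidirectionally
# decodable stream (simplified: whole-stream reversal).
xored = [b ^ r for b, r in zip(bits, reversed(bits))]

p1 = sum(bits) / n                  # empirical P1 of the original
p1x = sum(xored) / n                # empirical P1' after the XOR
delta = abs(2 * p1 - 1)             # |P1 - P0|
delta_x = abs(2 * p1x - 1)          # |P1' - P0'|
print(delta, delta_x, delta ** 2)   # delta_x is close to delta squared
```

With $\delta \approx 0.6$ here, the XOR'd stream's difference drops to roughly $0.36$, squaring the imbalance as predicted.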

 B. RVLC With Redesigned Codebook 

There are symmetrical [5], [6] and asymmetrical [5]–[7] RVLCs constructed from a given Huffman code. These approaches design new codebooks based on the original Huffman codes. The redesigned codebooks satisfy both the prefix and the suffix condition. Our method discussed above cannot be applied to either of them, since the suffix condition may be violated when the bit assignment is changed. However, if RVLC codewords with the same length are permuted, i.e., their assignment is changed to different source symbols which have the same codeword length, the prefix and suffix conditions will still be satisfied, and the compression effectiveness and coding efficiency will be retained. This is an adjustment of the bit-probability distribution at the codeword level, whereas the adjustment of the Huffman codes discussed above is performed at the bit level.

We call the group of source symbols with equal codeword length "a source symbol segment," in which permutations of the codewords' assignment are tried. Since the number of source symbols in a source symbol segment is usually not large (the maximum is eight, in the example of Table I), we can try all of the permutations. We process from the source symbol segments with shorter codeword lengths (with larger occurrence probabilities) to those with longer lengths (with smaller probabilities). We search in each source symbol segment $k$ to minimize $|\Delta_k|$, where $\Delta_k = \Delta_{k-1} + \sum_{i,j} t_{ij}\, p_i (b_j - a_j)$ means the partial sum of the probability difference up to symbol segment $k$, and $i$ and $j$ belong to the index set of symbol segment $k$. Here, $t_{ij}$ is the element of the permutation matrix $T$, and when $t_{ij} = 1$, codeword $j$ is assigned to symbol $i$.
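The per-segment exhaustive search can be sketched as follows. This is a hypothetical helper under our own naming: `segment` lists the symbol indices sharing one codeword length, and each assigned codeword contributes its probability times (zeros minus ones) to the running difference:

```python
from itertools import permutations

def reassign_segment(probs, codewords, segment, running_diff):
    """Try every permutation of the codewords inside one equal-length
    segment; keep the one minimizing the absolute partial difference."""
    def contrib(c):
        # A codeword's contribution per unit probability: zeros - ones.
        return c.count("0") - c.count("1")
    best_diff, best_perm = None, None
    for perm in permutations(segment):
        d = running_diff + sum(probs[i] * contrib(codewords[j])
                               for i, j in zip(segment, perm))
        if best_diff is None or abs(d) < abs(best_diff):
            best_diff, best_perm = d, perm
    assignment = {i: codewords[j] for i, j in zip(segment, best_perm)}
    return assignment, best_diff

# Toy segment: symbols 1 and 2 share the length-2 codewords "00" and "11".
assignment, d = reassign_segment([0.5, 0.3, 0.2], ["0", "00", "11"], [1, 2], -0.1)
print(assignment, d)  # keeping "00" on symbol 1 yields the smaller |difference|
```

Because only same-length codewords are exchanged, the prefix and suffix conditions and the average code length are untouched, exactly as the text requires.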

In Table I, C3 is the symmetrical code from [6], with a bit-probability difference of 0.018 681 04, which could be decreased to 0.001 824 64 (see C4), without loss in compression performance. C5 is the asymmetrical RVLC from [7], with a bit-probability difference of 0.048 581 26, which could be decreased to 0.004 790 58 (see C6).

IV. CONCLUSION

In this letter, we have discussed the problem of Huffman codes and RVLCs with balanced zeros and ones in the encoded bit streams. This letter proposes an effective algorithm to make the bit probabilities balanced. The algorithm can be embedded in the construction of the Huffman codebook, with little complexity. In the analysis of RVLCs based on Huffman codes, we showed that the bidirectionally decodable stream has good performance under the bit-balance criterion, and that it can be combined with the proposed algorithm to further decrease the bit-probability difference. For symmetrical and asymmetrical RVLCs, probability differences can be decreased by reassigning codewords to source symbols after the creation of the codebooks. The analytic and experimental results suggest that the proposed algorithm is quite promising in designing Huffman codes with balanced zero and one probabilities.

ACKNOWLEDGMENT

The authors would like to thank the reviewers, who provided very valuable feedback. References [2], [3], and [7] were recommended by them. This paper was clarified substantially with their help. The authors would also like to thank Prof. W.-D. Kou, the Director of the State Key Lab of Integrated Service Networks (Xidian University), China, and Prof. K. Rose, the Editor, who provided considerable help in modifying the text of the letter.

REFERENCES

[1] D. A. Huffman, "A method for the construction of minimum redundancy codes," Proc. IRE, vol. 40, pp. 1098–1101, Sept. 1952.

[2] J. Abrahams, "Code and parse trees for lossless source encoding," Commun. Inform. Syst., vol. 1, no. 2, pp. 113–146, Apr. 2001.

[3] B. L. Montgomery, H. Diamond, and B. Kumar, "Bit probabilities of optimal binary source codes," IEEE Trans. Inform. Theory, vol. 36, pp. 1446–1450, June 1990.

[4] B. Girod, "Bidirectionally decodable streams of prefix codewords," IEEE Commun. Lett., vol. 3, pp. 245–247, Aug. 1999.

[5] Y. Takishima, M. Wada, and H. Murakami, "Reversible variable-length codes," IEEE Trans. Commun., vol. 43, pp. 158–162, Mar. 1995.

[6] C. W. Tsai and J. L. Wu, "On constructing the Huffman code-based reversible variable-length codes," IEEE Trans. Commun., vol. 49, pp. 1506–1509, Sept. 2001.

[7] K. Lakovic and J. Villasenor, "On design of error-correcting reversible variable-length codes," IEEE Commun. Lett., vol. 6, pp. 337–339, Aug. 2002.

[8] J. G. Proakis, Digital Communications, 3rd ed. New York: McGraw-Hill, 1998.