8/11/2019 Adaptive Huffman Coding[1]
1/26
Adaptive Huffman Coding
http://www.vidyarthiplus.com/
Why Adaptive Huffman Coding? Huffman coding suffers from the fact that the
decompressor needs some knowledge of the probabilities of the symbols in the compressed file
this can require more bits to encode the file
if this information is unavailable, compressing the file requires two passes
first pass: find the frequency of each symbol and construct the Huffman tree
second pass: compress the file
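The two passes above can be sketched as follows. This is a minimal illustration (the function names are my own, not from the slides), assuming the whole input fits in memory:

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """First pass: count symbol frequencies, then build the Huffman
    tree bottom-up and read off one codeword per symbol."""
    freq = Counter(data)
    if len(freq) == 1:                       # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # heap entries: (weight, tiebreak, {symbol: partial codeword})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        for s in c1:
            c1[s] = "0" + c1[s]              # left branch
        for s in c2:
            c2[s] = "1" + c2[s]              # right branch
        c1.update(c2)
        heapq.heappush(heap, (w1 + w2, tiebreak, c1))
        tiebreak += 1
    return heap[0][2]

def compress(data):
    """Second pass: emit the codeword of every symbol."""
    codes = huffman_codes(data)
    return "".join(codes[s] for s in data)
```

Both passes read the data, which is exactly the cost the adaptive scheme avoids.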
The key idea The key idea is to build a Huffman tree
that is optimal for the part of the message already seen, and to reorganize it when needed, to maintain its optimality
Pro & Con - I Adaptive Huffman determines the mapping to
codewords using a running estimate of the source symbol probabilities
Effective exploitation of locality
For example, suppose that a file starts out with a series of a character that is not repeated again in the file. In static Huffman coding, that character will be low down on the tree because of its low overall count, thus taking lots of bits to encode. In adaptive Huffman coding, the character will be inserted at the highest leaf possible to be decoded, before eventually getting pushed down the tree by higher-frequency characters
Pro & Con - II only one pass over the data
overhead
In static Huffman, we need to transmit somehow the model used for compression, i.e. the tree shape. This costs about 2n bits in a clever representation. As we will see, in adaptive schemes the overhead is n log n.
sometimes encoding needs some more bits w.r.t. static Huffman (without overhead), but adaptive schemes generally compare well with static Huffman if overhead is taken into account
Some history Adaptive Huffman coding was first conceived
independently by Faller (1973) and Gallager (1978)
Knuth contributed improvements to the original algorithm (1985) and the resulting algorithm is referred to as algorithm FGK
A more recent version of adaptive Huffman coding is described by Vitter (1987) and called algorithm V
An important question Better exploiting locality, adaptive Huffman
coding is sometimes able to do better than static Huffman coding, i.e., for some messages, it can achieve a greater compression
... but we've assessed the optimality of static Huffman coding, in the sense of minimal redundancy
Is there a contradiction?
Algorithm FGK - I The basis for algorithm FGK is the Sibling
Property (Gallager 1978)
A binary code tree with nonnegative weights has the sibling property if each node (except the root) has a sibling and if the nodes can be numbered in order of nondecreasing weight with each node adjacent to its sibling. Moreover, the parent of a node is higher in the numbering
A binary prefix code is a Huffman code if and only if the code tree has the sibling property
Algorithm FGK - II
Note that node numbering corresponds to the order in which the nodes are combined by Huffman's algorithm, first nodes 1 and 2, then nodes 3 and 4 ...
[Figure: example Huffman tree. Leaves: a=2, b=3, d=5, c=5, e=6, f=11; internal weights: 5, 10, 11, 21, 32 (root); nodes are numbered 1-11 in order of nondecreasing weight, with each node adjacent to its sibling]
Algorithm FGK - III In algorithm FGK, both encoder and decoder
maintain dynamically changing Huffman code trees. For each symbol the encoder sends the codeword for that symbol in the current tree and then updates the tree
The problem is to change quickly the tree that is optimal after t symbols (not necessarily distinct) into the tree optimal for t+1 symbols
If we simply increment the weight of the (t+1)-th symbol and of all its ancestors, the sibling property may no longer be valid: we must rebuild the tree
Algorithm FGK - IV Suppose the next
symbol is b
if we update the weights... the sibling property is violated!!
This is no longer a Huffman tree
[Figure: the same tree after naively incrementing b and all its ancestors: the weights along b's path become 4, 6, 11, 22, 33, and the nodes are no longer ordered by nondecreasing weight]
Algorithm FGK - V The solution can be described as a two-phase process
first phase: the original tree is transformed into another valid Huffman tree for the first t symbols, one that has the property that the simple increment process can be applied successfully
second phase: the increment process, as described previously
Algorithm FGK - V The first phase starts at the leaf of the (t+1)-th
symbol
We swap this node and all its subtree, but not its numbering, with the highest numbered node of the same weight
The new current node is the parent of this latter node
The process is repeated until we reach the root
Algorithm FGK - VI First phase
Node 2: nothing to be done
Node 4: to be swapped with node 5
Node 8: to be swapped with node 9
Root reached: stop!
Second phase
[Figure: the example tree during the update for b: the first phase swaps node 4 with node 5 and node 8 with node 9; the second phase then increments the weights along b's path to 4, 6, 12, 33]
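The whole update (both phases) can be sketched in code. This is an illustrative implementation under a node representation of my own, not the slides' code; `update` walks up from the leaf, swapping each visited node with the highest-numbered node of equal weight, and then increments the weights on the resulting path:

```python
class Node:
    def __init__(self, num, weight, sym=None):
        self.num, self.weight, self.sym = num, weight, sym
        self.parent = self.left = self.right = None

def link(parent, left, right):
    parent.left, parent.right = left, right
    left.parent = right.parent = parent

def swap(u, v):
    """Exchange the subtrees rooted at u and v; the numbers stay
    attached to the positions, so u and v trade numbers."""
    u_slot = 'left' if u.parent.left is u else 'right'
    v_slot = 'left' if v.parent.left is v else 'right'
    setattr(u.parent, u_slot, v)
    setattr(v.parent, v_slot, u)
    u.parent, v.parent = v.parent, u.parent
    u.num, v.num = v.num, u.num

def update(leaf, nodes):
    """FGK update after one more occurrence of leaf's symbol."""
    # first phase: swap each visited node with the highest-numbered
    # node of the same weight (never with its own parent)
    node = leaf
    while node.parent is not None:
        leader = max((x for x in nodes if x.weight == node.weight),
                     key=lambda x: x.num)
        if leader is not node and leader is not node.parent:
            swap(node, leader)
        node = node.parent
    # second phase: increment the weights on the (new) path to the root
    node = leaf
    while node is not None:
        node.weight += 1
        node = node.parent
```

On the example tree, updating leaf b performs exactly the swaps 4-5 and 8-9 listed on the previous slide, and the weights on b's path become 4, 6, 12, 33.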
Why FGK works? The two-phase procedure builds a valid
Huffman tree for t+1 symbols, as the sibling property is satisfied
In fact, we swap each node whose weight is to be increased with the highest numbered node of the same weight
After the increment process there is no higher-numbered node with the previous weight
The Not Yet Seen problem - I
When the algorithm starts, and sometimes during the encoding, we encounter a symbol that has not been seen before.
How do we face this problem? We use a single 0-node (with weight 0) that represents all the unseen symbols. When a new symbol appears, we send the code for the 0-node and some bits to discern which is the new symbol.
As each time we send log n bits to discern the symbol, the total overhead is n log n bits
It is possible to do better, sending only the index of the symbol in the list of the current unseen symbols.
In this way we can save some bits, on average
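The index trick can be sketched as a helper of my own (not from the slides): instead of a fixed log n bits, the new symbol is identified by its index in the shrinking list of unseen symbols:

```python
import math

def new_symbol_bits(sym, unseen):
    """Bits sent after the 0-node codeword: the index of sym in the
    current list of unseen symbols, using just enough bits for the
    current list length (rather than a fixed log n)."""
    m = len(unseen)
    width = math.ceil(math.log2(m)) if m > 1 else 0
    return format(unseen.index(sym), '0{}b'.format(width)) if width else ''
```

As symbols are seen for the first time, the list shrinks, so later first occurrences cost fewer bits; that is the average saving mentioned above.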
The Not Yet Seen problem - II
Then the 0-node is split into two leaves, that are siblings: one for the new symbol, with weight 1, and a new 0-node
Then the tree is recomputed as seen before, in order to satisfy the sibling property
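A sketch of the split step, using plain dicts for nodes (my own representation; node renumbering and rebalancing are left to the update procedure described earlier):

```python
def split_zero_node(zero, sym):
    """The 0-node becomes an internal node whose children are a fresh
    0-node and a weight-1 leaf for the newly seen symbol sym."""
    new_zero = {'weight': 0, 'sym': None, 'left': None, 'right': None}
    leaf = {'weight': 1, 'sym': sym, 'left': None, 'right': None}
    zero['sym'] = None                 # the old 0-node is no longer a leaf
    zero['left'], zero['right'] = new_zero, leaf
    zero['weight'] = 1                 # 0 + 1; ancestors are updated next
    return leaf
```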
Algorithm FGK - summary
The algorithm starts with only one leaf node, the 0-node. As the symbols arrive, new leaves are created and each time the tree is recomputed
Each symbol is coded with its codeword in the current tree, and then the tree is updated
Unseen symbols are coded with the 0-node codeword, and some other bits are needed to specify the symbol
Algorithm FGK - VII
Algorithm FGK compares favourably with static Huffman coding, if we consider also overhead costs (it is used in the Unix utility compact)
Exercise Construct the static Huffman tree and the FGK tree
for the message e eae de eabe eae dcf and evaluate the number of bits needed for the coding with both the algorithms, ignoring the overhead for Huffman
SOL. FGK 60 bits, Huffman 52 bits
FGK is obtained using the minimum number of bits for the element in the list of the unseen symbols
Algorithm FGK - VIII
If T = total number of bits transmitted by algorithm FGK for a message of length t containing n distinct symbols, then
S - n + 1 <= T <= 2S + t - 4n + 2
where S is the performance of the static Huffman (Vitter 1987)
So the performance of algorithm FGK is never much worse than twice optimal
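The bound can be read off as a function; the numbers in the test below are illustrative values of S, t, and n, not taken from the slides:

```python
def fgk_bounds(S, t, n):
    """Vitter's 1987 bounds on T, the bits sent by FGK, for a message
    of length t with n distinct symbols; S = static Huffman bits."""
    return (S - n + 1, 2 * S + t - 4 * n + 2)
```

The upper bound contains 2S plus roughly one extra bit per symbol, which is the sense in which FGK is "never much worse than twice optimal".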
Algorithm V - I
Vitter in his work of 1987 introduces two improvements over algorithm FGK, calling the new scheme algorithm Λ (lambda)
As a tribute to his work, the algorithm has become famous... with the letter flipped upside-down... algorithm V
The key ideas - I
swapping of nodes during encoding and decoding is onerous
In the FGK algorithm the number of swappings (considering a double cost for the updates that move a swapped node two levels higher) is bounded by 2 d_t, where d_t is the length of the added symbol in the old tree (this bound requires some effort to be proved and is due to the work of Vitter)
In algorithm V, the number of swappings is bounded by 1
The key ideas - II
Moreover, algorithm V not only minimizes Σ_i w_i l_i, as Huffman and FGK do, but also max_i l_i, i.e. the height of the tree, and Σ_i l_i, i.e. it is better suited to code the next symbol, given it could be represented by any of the leaves of the tree
These two objectives are reached through a new numbering scheme, called implicit numbering
An invariant
The key to minimizing the other kind of interchanges is to maintain the following invariant:
for each weight w, all leaves of weight w precede (in the implicit numbering) all internal nodes of weight w
The interchanges, in algorithm V, are designed to restore the implicit numbering, when a new symbol is read, and to preserve the invariant
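The invariant can be stated as a check (an illustration of my own, not the slides' code), given each node's implicit number, weight, and whether it is a leaf:

```python
def invariant_holds(nodes):
    """nodes: list of (implicit number, weight, is_leaf).
    For each weight w, every leaf of weight w must precede every
    internal node of weight w in the implicit numbering."""
    by_weight = {}
    for num, w, is_leaf in nodes:
        by_weight.setdefault(w, []).append((num, is_leaf))
    for group in by_weight.values():
        group.sort()                       # order by implicit number
        seen_internal = False
        for _, is_leaf in group:
            if not is_leaf:
                seen_internal = True
            elif seen_internal:
                return False               # a leaf after an internal node
    return True
```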
Algorithm V - II
If T = total number of bits transmitted by algorithm V for a message of length t containing n distinct symbols, then
S - n + 1 <= T <= S + t - 2n + 1
At worst, then, Vitter's adaptive method may transmit one more bit per codeword than the static Huffman method
Empirically, algorithm V slightly outperforms algorithm FGK
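Algorithm V's bound can likewise be written as a function; the test values are illustrative, not from the slides:

```python
def vitter_bounds(S, t, n):
    """Bounds on T for algorithm V: the upper bound is S plus at most
    one extra bit per symbol (the +t term), minus 2n - 1."""
    return (S - n + 1, S + t - 2 * n + 1)
```

Compared with FGK's upper bound of 2S + t - 4n + 2, the leading term drops from 2S to S, which is the "one more bit per codeword" guarantee.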