Cjb0912010 lz algorithms

LIMPEL ZIV ALGORITHMS

BYS T RAJAN

CSN-CJB0912010

M.S Ramaiah School of Advanced Studies –Bangalore 1

INTRODUCTION• Lempel-Ziv is a lossless date compression method algorithm• The generalized idea comes from pigeonhole principle .• If N items are placed into M pigeonholes where n>m

.

Fig: A pigeonhole

• An image of pigeons in holes. Here there are n = 10 pigeons in m = 9 holes. Since 10 is greater than 9, the pigeonhole principle says that at least one hole has more than one pigeon

• This concept may varies with input.so it cannot applied every time.M.S Ramaiah School of Advanced Studies -BangaloreM.S Ramaiah School of Advanced Studies – 2

Lempel Ziv Algorithms• Lossless Data compression is technique used to produce the original

information from a compressed data.• Like Huffman coding ,run length coding ,Arithmetic coding etc., Lempel-

Ziv is a lossless data compression technique used more often.• The Lempel Ziv algorithms belong to yet another category of lossless

compression techniques known as dictionary coders.• Abraham Lempel & Jacob ziv together published their first compression

method is sometimes referred to as "LZ77," for the year 1977, in which the duo published an article entitled "A Universal Algorithm for Sequential Data Compression" .The pair wrote another paper in 1978 outlining another dictionary approach know as LZ78 algorithm which was modified by Terry Welch in 1984.


Limpel Ziv Algorithm Family

APPLICATIONS: APPLICATIONS:• ZIP GIF• GZIP V.42• STACKER COMPRESS

Fig 1: Limpel Ziv Algorithm Family


LZ77

LZR LZHLZSS LZB

LZ78

LZFG LZT

LZMW

LZW

LZJ

LZC

Types of Dictionary • The dictionary holds a list of strings of symbols and it may be static or

dynamic (adaptive).• Static dictionary – permanent, sometimes allowing the addition of strings

but no deletions• Dynamic dictionary – holding strings previously found in the input

stream, allowing for additions and deletions of strings as new input is being read

• LZ Algorithms are used in “ADAPTIVE DICTIONARY”• The dictionary is being built in a single pass, while at the same time

encoding take places.• It continuously rewrites the dictionary for a file, discarding patterns it

previously included and adding new ones when necessary.


LZ77


General approach• Dictionary is a portion of the previously encoded sequence• Use a sliding window for compression

Mechanism• Find the maximum length match for the string pointed to by the search pointer in the search buffer, and encode it

Rationale• If patterns tend to repeat locally, we should be able to get more efficient representation

LZ77• Sliding window is composed of a search buffer and a look ahead buffer

(note: window size W = S + LA).

Match pointer search pointer

a _ r r a _

look ahead buffer

Search buffer (size LA=7)

(size S=8)

_ a b r a - a d a b r a r r a


Explanation

• Offset = search pointer – match pointer (o = 7)• Length of match = number of consecutive letters matched

(l = 4)• Code word (c = C(r)), where C(r) is the code word for r• Encoding triple: <o, l, c> = <7, 4, C(r)>• If FLC is used and alphabet size is |A|, <o, l, c> can be

encoded with [log2S] + [log2W] + [log2|A|] bits.


Possible Cases for Triples

• There could be three different possibilities that may

be encountered during the coding process:

-No match for the next character to be encoded in the window

-There is a match

-The matched string extends inside the look-ahead buffer• For each of these cases, we have a triple to signal

the case to the decoder.


ENCODING

• Sequence cabracadabrarrarrad - |cadabrar|rarrad|

W = 13, S = 7 |cadabrar|rarrad|

- |cabraca|dabrar|rarrad |cadabrar|rarrad| no match for d send <3, 3, C(r)> send <0, 0, C(d)> Could we do better?

-|abracad|abrarr|arrad Send <3, 5, C(d)> instead

|abracad|abrarr|arrad

|abracad|abrarr|arrad

|abracad|abrarr|arrad send <7, 4, C(r)>


DECODING

• Current input: <0, 0, C(d)> <7, 4, C(r)> <3, 5, C(d)>• Current output: cabraca Decode: <0, 0, C(d)>

Decode C(d): c|abracad| Decode: <7, 4, C(r)>

Start with the first ‘a’, copy four letters: cabra|cadabra

Decode C(r): cabrac|adabrar Decode: <3, 5, C(d)>

Start with the first ‘r’, copy three letters: cabracada|brarrar|

Copy two more letters: cabracadabr|arrarar|

Decode C(d): cabracadabrarrarard


Algorithm

while (lookAheadBuffer not empty) {

get a reference (position, length) to longest match;

if (length > 0) {

output (position, length, next symbol);

shift the window length+1 positions along;

} else {

output (0, 0, first symbol in the lookahead buffer);

shift the window 1 character along;

}

}


Points

• For LZ77, we have -Adaptive scheme, no prior knowledge

-Asymptotically approaches the source statistics

- Assumes that recurring patterns close to each others

• Possible improvements

-Variable-bit encoding: PKZip, zip, gzip, …, etc., uses a

variable-length coder to encode <o, l, c>.

-Variable buffer size: larger buffer requires faster searches

- Elimination of <0, 0, C(x)>

-LZSS sends a flag bit to signal whether the next “token” is an

<o, l> pair or the codeword of a symbol


Improvements

• LZR

The Lempel - Ziv - Renau modification allows pointers to reference anything that has been encoded without being limited by the length of the search.• LZSS The popular modification by Storer and Szymanski (1982) which is used for the mandatory inclusion of the next non-matching symbol into each codeword will lead to situations in which the symbol is being explicitly coded despite the possibility of it being part of the next match.• LZB LZB uses an elaborate scheme for encoding the references and lengths with varying sizes.• LZH The LZH implementation employs Huffman coding to compress the pointers.M.S Ramaiah School of Advanced Studies –Bangalore 14

http://en.wikipedia.org/wiki/Abraham_Lempel

http://en.wikipedia.org/wiki/Jacob_Ziv

LZ78

• LZ78 improvements from LZ77

-No search buffer – explicit dictionary instead

-Encoder/decoder must build dictionary in sync

- Encoding: <i, c>

i = index in dictionary table

c = code of the following character

• Example: encode the following contents

wabba_wabba_wabba_wabba_woo_woo_woo


EXAMPLE-1

• Input: wabba_wabba_wabba_wabba_woo_woo_woo• Dictionaries: Final Dictionary

Initial dictionary is empty Encoder output index entry


index entry<0,c(w)> 1 w

<0,c(a)> 2 a

<0,c(b)> 3 b

<3,c(a)> 4 ba

<0,c(_)> 5 _

<1,c(a)> 6 wa

<3,c(b)> 7 bb

<2,c(_)> 8 a_

EXAMPLE(Continue..)

Encoder output

index entry

<6,c(b)> 9 wab

<4,c(_)> 10 ba_

<9,c(b)> 11 wabb

<8,c(w)> 12 a_w

<0,c(o)> 13 o

<13,c(_)> 14 o_

<13,c(o)> 15 wo

<1,c(w)> 16 o_w

<13,c(o)> 17 oo


Remarks

• Observation

If we keep on encoding, the dictionary will keep on growing

• Possible solutions Stop growing the dictionary

Effectively switch to a static dictionary

Prune it

Based on usage statistics

Reset it

Start all over again

• The best solution depends on the knowledge of the

sourceM.S Ramaiah School of Advanced Studies –Bangalore 18

Improvements

• LZ78 has limitation as it grows explicitly. • LZW was developed by Terry Welch • The dictionary has to be initialized with all the symbols of the input

alphabet and this initial dictionary needs to be made known to the decoder.• IDEA• Instead of <i, c>, encode i only


• Input: wabba_wabba_wabba_wabba_woo_woo_woo• OUTPUT: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4 Final

• Dictionaries:

Intial DictionaryINDEX ENTRY INDEX ENTRY1 _ 14 a_w2 a 15 wabb3 b 16 ba_4 o 17 _wa5 w 18 abb6 wa 19 ba_w7 ab 20 wo8 bb 21 oo9 ba 22 o_10 a_ 23 _wo11 _w 24 oo_12 wab 25 _woo12 bba

INDEX ENTRY1 _2 a3 b4 o5 w


Algorithm while (!done)

read next symbol into a

if (p*a) is in dictionary // Note: ‘*’ stands for concatenation

p = p*a

else

send out index of p

add p*a to the dictionary

p = a

end


APPLICATION:COMPRESS

• An early implementation of LZW• Adaptive dictionary, starts with 2^bmax–1entries• Dictionary grows up to double in size (2bmax)• User can configure max codeword length bmax = 9~16• When dictionary reaches 2bmax entries, it becomes a static dictionary

encoder• If compression ratio falls below a threshold, dictionary is reset.

APPLICATION :

GIF IMAGES

PNG IMAGES


References• BELL, T. C., CLEARY, J. G., AND WITTEN, I. H. Text Compression.

Prentice Hall, Upper Sadle River, NJ, 1990.• SAYOOD, K. Introduction to Data Compression. Academic Press, San

Diego, CA, 1996, 2000.• ZIV, J., AND LEMPEL, A. A universal algorithm for sequential data

compression. IEEE Transactions on Information Theory 23 (1977),

337• ZIV, J., AND LEMPEL, A. Compression of individual sequences via

variable-rate coding. IEEE Transactions on Information Theory 24

(1978),530–536.


Documents

Cjb0912010 lz algorithms