Huffman Coding

Compression

• Representing data using less space than the original representation

• Encoding data using fewer bits than the original encoding

Compression

• Lossless compression
  – The original encoding can be fully reconstructed from the compressed encoding
  – No information is lost

• Lossy compression
  – Information is lost; the original encoding cannot be fully reconstructed from the compressed encoding

Compression Examples

• Lossless
  – Lempel-Ziv
  – Huffman Encoding

• Lossy
  – JPEG
  – MP3

Huffman Encoding

• Create variable length encodings for each distinct character in the input

• Encodings are created so characters that appear more frequently are given shorter encodings compared to characters that appear less frequently

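• In a fixed-width text encoding each character uses the same number of bits
  – ASCII (7 bits per character, usually stored in 8)
  – Fixed-width Unicode encodings such as UTF-32 (32 bits per character)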

Huffman Encoding

• 1. Find the frequency of each character in the input file

• 2. Build a Huffman tree from the frequency data

• 3. Traverse the Huffman tree and build the encodings for each character found in the input file

• 4. Write a representation of the Huffman tree to the output file

• 5. Write the number of characters in the input file to the output file

• 6. For each character in the input file, write the bits of the Huffman encoding to the output file.
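Steps 1 to 3 can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names `build_tree` and `build_codes`, the nested-tuple tree representation, and the convention that the lower-weight subtree becomes the 0 branch are all assumptions made for this example. The input is assumed to be non-empty.

```python
import heapq
from collections import Counter
from itertools import count

def build_tree(freqs):
    """Step 2: build a Huffman tree from a {symbol: frequency} mapping.

    A leaf is a 1-tuple (symbol,); an internal node is a 2-tuple (left, right).
    """
    tiebreak = count()  # unique counter so the heap never compares two nodes
    heap = [(weight, next(tiebreak), (sym,)) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # lowest weight
        w2, _, right = heapq.heappop(heap)   # second lowest weight
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    return heap[0][2]

def build_codes(node, prefix=""):
    """Step 3: traverse the tree, appending 0 going left and 1 going right."""
    if len(node) == 1:                       # leaf
        return {node[0]: prefix or "0"}      # "0" covers a one-symbol input
    left, right = node
    codes = build_codes(left, prefix + "0")
    codes.update(build_codes(right, prefix + "1"))
    return codes

text = "ABRACADABRA"                         # hypothetical input
freqs = Counter(text)                        # step 1: character frequencies
codes = build_codes(build_tree(freqs))
encoded = "".join(codes[ch] for ch in text)  # the bits written in step 6
print(codes)
print(len(encoded), "bits instead of", 8 * len(text))
```

Steps 4 and 5 (writing the tree and the character count) are left out here because the slides do not specify a file format for them.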

Huffman Decoding

• 1. Read the representation of the Huffman tree from the input file and rebuild the Huffman tree

• 2. Read the number of characters from the input file

• 3. Starting at the root of the Huffman tree, read each bit from the input file and walk down the Huffman tree. When a leaf is reached, write the character value in the leaf to the output file and go back to the root of the Huffman tree.

• 4. Repeat step 3 until the number of characters written to the output file matches the value read in step 2.
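The tree walk in steps 3 and 4 might look like the sketch below. The function name `decode`, the nested-tuple tree (a leaf is a 1-tuple, an internal node is a `(left, right)` pair), and the three-symbol example tree are assumptions for illustration; the tree is assumed to have at least two leaves.

```python
def decode(tree, bits, n_chars):
    """Walk the tree bit by bit and stop after n_chars symbols."""
    out = []
    node = tree
    for bit in bits:
        if len(out) == n_chars:    # step 4: stop once the count is reached
            break
        node = node[0] if bit == "0" else node[1]
        if len(node) == 1:         # reached a leaf
            out.append(node[0])
            node = tree            # go back to the root
    return "".join(out)

# Hypothetical tree with codes A = 0, B = 10, C = 11
tree = (("A",), (("B",), ("C",)))

# "010110" encodes ABCA; the final "00" stands in for padding bits
# that fill out the last byte of the file.
print(decode(tree, "01011000", 4))
```

Without the character count from step 2, the two padding bits would be decoded as two extra A characters, which is why the count is stored in the file.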

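Huffman Tree

[Figure: Huffman tree, shown here as an indented outline. Each * is an internal node; its first child listed is the left child.]

*
  *
    *
      S
      *
        *
          N
          U
        F
    A
  *
    I
    *
      *
        *
          C
          (space)
        !
      E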

Huffman Encoding

• Find the binary representation for each character in the Huffman tree shown on the previous slide

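Huffman Tree

With each left branch labeled 0 and each right branch labeled 1, the codes are:

Char Bits

A 01

I 10

S 000

E 111

F 0011

! 1101

N 00100

U 00101

C 11000

(space) 11001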

Huffman Encoding

• Decode the following bit string based on the Huffman tree shown in the previous slide.

• 11000000110011000011001001100101001001101

• 11000 000 11001 10 000 11001 0011 00101 00100 1101

• C S (space) I S (space) F U N !

• Decoded text: CS IS FUN!
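The decoding exercise can be checked with a short Python sketch. The code table is an assumption: it is reconstructed from the leaves of the tree figure and the worked split of the bit string, since the tree itself did not survive as text. The function name `decode_with_table` is also made up for this example.

```python
# Code table reconstructed from the worked answer (an assumption, see above)
CODES = {
    "A": "01", "I": "10", "S": "000", "E": "111", "F": "0011",
    "!": "1101", "N": "00100", "U": "00101", "C": "11000", " ": "11001",
}

def decode_with_table(bits, codes):
    """Decode a prefix-free code by growing a buffer until it matches a code."""
    by_bits = {b: ch for ch, b in codes.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in by_bits:         # safe because no code is a prefix of another
            out.append(by_bits[buf])
            buf = ""
    if buf:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(out)

message = decode_with_table("11000000110011000011001001100101001001101", CODES)
print(message)
```

Matching the buffer against a table is equivalent to walking the tree: each extra bit corresponds to one step down, and a match corresponds to reaching a leaf.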

Huffman Encoding

• Encode the following text based on the Huffman tree shown in the previous slide.

• FUN CASE

• FUN CASE

• F U N (space) C A S E

• 0011 00101 00100 11001 11000 01 000 111
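Encoding is a table lookup per character. The sketch below uses a code table reconstructed from the tree figure and the worked answers (an assumption), and also checks that no code is a prefix of another, which is what makes the output decodable without separators. The names `is_prefix_free` and `pieces` are hypothetical.

```python
# Code table reconstructed from the tree figure (an assumption)
CODES = {
    "A": "01", "I": "10", "S": "000", "E": "111", "F": "0011",
    "!": "1101", "N": "00100", "U": "00101", "C": "11000", " ": "11001",
}

def is_prefix_free(codes):
    """True if no code word is a prefix of another code word."""
    words = sorted(codes.values())
    # After sorting, any word that extends `a` sits directly after `a`
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

pieces = [CODES[ch] for ch in "FUN CASE"]   # one code per character
encoded = "".join(pieces)

print(is_prefix_free(CODES))
print(" ".join(pieces))
print(encoded)
```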

Build a Huffman Tree Given the Following Frequency Counts

Character Frequency

A 100

B 30

C 40

D 10

E 130

F 20

G 5

H 25

I 57

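Huffman Tree

[Figure: completed Huffman tree, shown here as an indented outline. Internal nodes show the sum of their children's frequencies; the first child listed is the left child.]

417
  175
    75
      35
        15
          G 5
          D 10
        F 20
      C 40
    A 100
  242
    112
      55
        H 25
        B 30
      I 57
    E 130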

Huffman Tree

Building the tree step by step. At each step the two lowest-weight nodes are merged into a new node:

• Start with nine leaves: G 5, D 10, F 20, C 40, A 100, H 25, B 30, I 57, E 130

• 1. G (5) + D (10) = 15

• 2. 15 + F (20) = 35

• 3. H (25) + B (30) = 55

• 4. 35 + C (40) = 75

• 5. 55 + I (57) = 112

• 6. 75 + A (100) = 175

• 7. 112 + E (130) = 242

• 8. 175 + 242 = 417 (root)
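The order in which the internal nodes appear can be reproduced with a min-heap. The sketch below tracks only the node weights, not the tree shape; the function name `merge_weights` is an assumption for this example.

```python
import heapq

# Frequency counts from the table above
FREQS = {"A": 100, "B": 30, "C": 40, "D": 10, "E": 130,
         "F": 20, "G": 5, "H": 25, "I": 57}

def merge_weights(freqs):
    """Return the internal-node weights in the order they are created."""
    heap = list(freqs.values())
    heapq.heapify(heap)
    created = []
    while len(heap) > 1:
        a = heapq.heappop(heap)    # lowest weight
        b = heapq.heappop(heap)    # second lowest weight
        created.append(a + b)
        heapq.heappush(heap, a + b)
    return created

print(merge_weights(FREQS))
# [15, 35, 55, 75, 112, 175, 242, 417]
```

The last weight, 417, is the root and equals the total number of characters.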

Huffman Bit Encodings

Char Bits

A 01

B 1001

C 001

D 00001

E 11

F 0001

G 00000

H 1000

I 101
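To see what these encodings save, the total number of bits can be compared with a fixed-width code. The sketch assumes the frequency counts and bit encodings from the tables above; the 4-bit fixed-width baseline (the fewest bits that can distinguish nine symbols) is an assumption chosen for comparison.

```python
FREQS = {"A": 100, "B": 30, "C": 40, "D": 10, "E": 130,
         "F": 20, "G": 5, "H": 25, "I": 57}
CODES = {"A": "01", "B": "1001", "C": "001", "D": "00001", "E": "11",
         "F": "0001", "G": "00000", "H": "1000", "I": "101"}

total_chars = sum(FREQS.values())
huffman_bits = sum(FREQS[ch] * len(CODES[ch]) for ch in FREQS)

# Fewest bits per symbol that can distinguish len(FREQS) symbols
fixed_width = (len(FREQS) - 1).bit_length()
fixed_bits = total_chars * fixed_width

print(huffman_bits)                          # 1126
print(fixed_bits)                            # 1668
print(round(huffman_bits / total_chars, 2))  # 2.7 bits per character
```

The Huffman total also equals the sum of the internal-node weights in the tree (15 + 35 + 55 + 75 + 112 + 175 + 242 + 417 = 1126), because each character contributes its frequency once per level of depth.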

Encoding

• Input – ABCDEFGHIAA

• Output –

Decoding

• Input – 101001110000101001100011

• Output –
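Answers to the two exercises can be checked with a small round-trip script. It assumes the bit encodings from the table above; the function names `encode` and `decode` are hypothetical.

```python
CODES = {"A": "01", "B": "1001", "C": "001", "D": "00001", "E": "11",
         "F": "0001", "G": "00000", "H": "1000", "I": "101"}

def encode(text):
    """Concatenate the code of every character."""
    return "".join(CODES[ch] for ch in text)

def decode(bits):
    """Grow a buffer bit by bit and emit a character when it matches a code."""
    by_bits = {b: ch for ch, b in CODES.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in by_bits:
            out.append(by_bits[buf])
            buf = ""
    return "".join(out)

print(encode("ABCDEFGHIAA"))
print(decode("101001110000101001100011"))

# Round trip: decoding an encoding gives back the original text
assert decode(encode("ABCDEFGHIAA")) == "ABCDEFGHIAA"
```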