
Multimedia Systems

Entropy Coding

Mahdi Amiri

October 2015

Sharif University of Technology

Course Presentation

Assumptions:

Single source and user

Unlimited complexity and delay


Source and Channel Coding: Shannon's Separation Principle

Coding-related elements in a communication system:

Information source: generates the information we want to transmit or store.
Source coding: reduces the number of bits needed to store or transmit the relevant information.
Channel coding: increases the number of bits, or changes them, to protect against channel errors.

What about joint source and channel coding?

Claude E. Shannon, 1916-2001

Ref.: en.wikipedia.org/wiki/Information_theory
Information source: en.wikipedia.org/wiki/Information_source
Source coding: en.wikipedia.org/wiki/Data_compression
Channel coding: en.wikipedia.org/wiki/Forward_error_correction


Source Coding: Motivation

Data storage and transmission cost money.

Use the fewest number of bits to represent the information source.

Pro:
Less memory, less transmission time.

Cons:
Extra processing required.
Distortion (if using lossy compression).
Data has to be decompressed before it can be presented, which may cause delay.


Source Coding: Principles

Example: the source coder shall represent the video signal by the minimum number of (binary) symbols without exceeding an acceptable level of distortion.

Two principles are utilized:

1. Properties of the information source that are known a priori result in redundant information that need not be transmitted ("redundancy reduction").

2. The human observer does not perceive certain deviations of the received signal from the original ("irrelevancy reduction").

Approaches:

Lossless coding: completely reversible; exploits principle 1 only.

Lossy coding: not reversible; exploits principles 1 and 2.


Data Compression: Lossless and Lossy

Lossless

Exact reconstruction is possible.

Applied to general data.

Lower compression rates.

Examples: Run-length, Huffman, Lempel-Ziv.

Lossy

Higher compression rates.

Applied to audio, image and video.

Examples: CELP, JPEG, MPEG-2.


Data Compression: Codec (Encoder and Decoder)

General structure of a codec:

Encoder: the original signal passes through a transform/prediction stage (T) and quantization (Q), and an entropy encoder (E) produces the compressed bit-stream.

Decoder: an entropy decoder (E⁻¹), dequantization (Q⁻¹), and the inverse transform (T⁻¹) turn the bit-stream back into the reconstructed signal.

In information theory, an entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium.

Ref.: en.wikipedia.org/wiki/Entropy_(information_theory)
en.wikipedia.org/wiki/Entropy_encoding
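To make the pipeline concrete, here is a minimal sketch in Python; the identity transform, uniform quantizer, and JSON-based "entropy coder" are toy stand-ins chosen for illustration, not the methods of any particular codec:

```python
import json
import numpy as np

def T(x):                 # transform/prediction (identity, for simplicity)
    return x

def T_inv(y):
    return y

def Q(y, step=0.5):       # uniform quantization: the only lossy stage
    return np.round(y / step).astype(int)

def Q_inv(q, step=0.5):
    return q * step

def E(q):                 # stand-in for an entropy coder such as Huffman
    return json.dumps(q.tolist())

def E_inv(s):
    return np.array(json.loads(s))

x = np.array([0.12, 0.94, 1.03, 0.11])
bitstream = E(Q(T(x)))
x_hat = T_inv(Q_inv(E_inv(bitstream)))
print(x_hat)              # differs from x only because of Q
```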

Entropy Coding: Selected Topics and Algorithms

Run-length encoding
Fixed Length Coding (FLC)
Variable Length Coding (VLC)
Huffman Coding Algorithm
Entropy, Definition
Lempel-Ziv (LZ77)
Lempel-Ziv-Welch (LZW)
Arithmetic Coding

Lossless Compression: Run-Length Encoding (RLE)

BBBBHHDDXXXXKKKKWWZZZZ → 4B2H2D4X4K2W4Z

Image of a rectangle, one line per row of (value, run-length) pairs:

0,40
0,40
0,10 1,20 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,20 0,10
0,40

RLE is used in fax machines.
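As a quick illustration, a minimal RLE encoder/decoder sketch in Python; the count-then-character token format mirrors the example above (it is not the fax G3/G4 scheme):

```python
import re
from itertools import groupby

def rle_encode(s: str) -> str:
    # "BBBBHH..." -> "4B2H...": one (count, character) token per run
    return "".join(f"{len(list(g))}{ch}" for ch, g in groupby(s))

def rle_decode(code: str) -> str:
    # Expand each (count, character) token back into a run
    return "".join(ch * int(n) for n, ch in re.findall(r"(\d+)(\D)", code))

msg = "BBBBHHDDXXXXKKKKWWZZZZ"
packed = rle_encode(msg)
print(packed)                     # 4B2H2D4X4K2W4Z
assert rle_decode(packed) == msg  # round-trips exactly (lossless)
```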


Lossless Compression: Fixed Length Coding (FLC)

A simple example

The message to code: ►♣♣♠☻►♣☼►☻

5 different symbols → at least 3 bits per symbol

Message length: 10 symbols

Total bits required to code: 10 × 3 = 30 bits

Codeword table
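The bit count follows directly from the alphabet size; a one-liner check using the example message:

```python
from math import ceil, log2

msg = "►♣♣♠☻►♣☼►☻"
alphabet = set(msg)                          # 5 distinct symbols
bits_per_symbol = ceil(log2(len(alphabet)))  # ceil(log2(5)) = 3
print(len(msg) * bits_per_symbol)            # 10 * 3 = 30 bits
```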


Lossless Compression: Variable Length Coding (VLC)

Intuition: symbols that are more frequent should get shorter codes; but since the codeword lengths then differ, there must be a way of telling where each codeword ends.

The message to code: ►♣♣♠☻►♣☼►☻

Total bits required to code: 3×2 + 3×2 + 2×2 + 3 + 3 = 22 bits

Codeword table: to identify the end of a codeword as soon as it arrives, no codeword may be a prefix of another codeword (a prefix-free code).
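A sketch of prefix decoding in Python; the codeword table below is hypothetical (the slide's table is not reproduced in this text), but its lengths of 2, 2, 2, 3, 3 match the bit count above:

```python
# Hypothetical prefix-free codeword table for the five symbols
codes = {"►": "00", "♣": "01", "☻": "10", "♠": "110", "☼": "111"}
decode_table = {v: k for k, v in codes.items()}

msg = "►♣♣♠☻►♣☼►☻"
bits = "".join(codes[s] for s in msg)
print(len(bits))                  # 22 bits, vs. 30 with fixed-length codes

# Prefix property: extend the buffer until it matches a full codeword
decoded, buf = [], ""
for b in bits:
    buf += b
    if buf in decode_table:
        decoded.append(decode_table[buf])
        buf = ""
assert "".join(decoded) == msg
```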

How to find the optimal codeword table?


Lossless Compression: VLC, Example Application

Morse code is a non-prefix code: it needs a separator symbol (a pause) between letters for unique decodability.
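A small demonstration of why the separator is needed, using a few standard Morse codewords:

```python
# E "." is a prefix of I "..", so concatenated codewords are ambiguous
morse = {"E": ".", "I": "..", "T": "-", "A": ".-", "N": "-."}

def to_morse(word: str) -> str:
    return "".join(morse[c] for c in word)

print(to_morse("EE"))  # ".."
print(to_morse("I"))   # ".." -- same symbols, different message
```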


Lossless Compression: Huffman Coding Algorithm

Step 1: Take the two least probable symbols in the alphabet. (They receive the longest codewords, of equal length, differing only in the last digit.)

Step 2: Combine these two symbols into a single symbol, and repeat.

P(n): probability of symbol number n.

Here there are 9 symbols; e.g., the symbols can be the alphabet letters 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'.
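A minimal Huffman sketch in Python using a heap; the six-symbol distribution below is illustrative (the slide's 9-symbol probabilities are not reproduced in this text):

```python
import heapq
from itertools import count

def huffman(probs):
    # Each heap entry: (probability, tiebreak, {symbol: codeword so far})
    tiebreak = count()
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)   # the two least probable nodes
        p2, _, c2 = heapq.heappop(heap)
        # Prepend a distinguishing bit to every codeword on each side
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tiebreak), merged))
    return heap[0][2]

probs = {"a": 0.3, "b": 0.2, "c": 0.15, "d": 0.15, "e": 0.1, "f": 0.1}
print(huffman(probs))  # a prefix code; frequent symbols get short codewords
```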


Lossless Compression: Huffman Coding Algorithm (continued)

David A. Huffman, 1925-1999

Paper: "A Method for the Construction of Minimum-Redundancy Codes", 1952

Results in "prefix-free codes".

Most efficient: no other mapping will produce a smaller average output size, provided the actual symbol frequencies agree with those used to create the code.

Cons:

Has to run through the entire data in advance to find the symbol frequencies.

'Minimum redundancy' is not favorable for error-correction techniques (the bits are not predictable if, e.g., one is missing).

Does not support blocks of symbols: Huffman is designed to code single characters only, so at least one bit is required per character; e.g., a word of 8 characters requires at least an 8-bit code.


Entropy Coding: Entropy, Definition

The entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X.

A measure of information content (in bits).

A quantitative measure of the disorder of a system.

It is impossible to compress the data such that the average number of bits per symbol is less than the Shannon entropy of the source (over a noiseless channel).

The Intuition Behind the Formula

$$H(X) = \sum_{x \in X} P(x) \cdot \log_2 \frac{1}{P(x)}$$

P(x) ↑ ⇒ amount of uncertainty ↓ ⇒ H ∼ 1/P(x)

Bringing it to the world of bits: H ∼ \log_2(1/P(x)) = I(x), the information content of x.

Weighted average: multiplying each I(x) by P(x) and summing gives the average number of bits required to encode each possible value.

Claude E. Shannon, 1916-2001

X → the information source
P(x) → the probability that symbol x in X will occur

Information theory point of view
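As a worked check, the entropy of the VLC example message's empirical symbol distribution (treating the observed frequencies as P(x)):

```python
from collections import Counter
from math import log2

msg = "►♣♣♠☻►♣☼►☻"
counts = Counter(msg)
n = len(msg)
H = sum((c / n) * log2(n / c) for c in counts.values())
print(f"H = {H:.3f} bits/symbol")  # ~2.171
```

The 22-bit variable-length code above spends 2.2 bits/symbol, just above this message's empirical entropy bound of roughly 21.7 bits.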


Lossless Compression: Lempel-Ziv (LZ77)

Algorithm for compression of character sequences.

Assumption: sequences of characters are repeated.

Idea: replace a character sequence by a reference to an earlier occurrence (a sketch follows the steps below).

1. Define a search buffer = (a portion of) the recently encoded data, and a look-ahead buffer = the not-yet-encoded data.

2. Find the longest match between the first characters of the look-ahead buffer and an arbitrary character sequence in the search buffer.

3. Produce the output: offset + length = the reference to the earlier occurrence, and next_character = the first character following the match in the look-ahead buffer.
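A minimal LZ77 encoder sketch; the (offset, length, next_character) token format follows the steps above, while the window size and the example string are illustrative choices:

```python
def lz77_encode(data: str, window: int = 16):
    tokens, i = [], 0
    while i < len(data):
        start = max(0, i - window)              # search buffer boundary
        best_off, best_len = 0, 0
        for j in range(start, i):               # candidate match positions
            length = 0
            # Stop one short of the end so a next_character always exists
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        nxt = data[i + best_len]                # character after the match
        tokens.append((best_off, best_len, nxt))
        i += best_len + 1
    return tokens

print(lz77_encode("abracadabra"))
# [(0, 0, 'a'), (0, 0, 'b'), (0, 0, 'r'), (3, 1, 'c'), (5, 1, 'd'), (7, 3, 'a')]
```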


Lossless Compression: Lempel-Ziv-Welch (LZW)

Drops the search buffer and keeps an explicit dictionary.

Produces only dictionary indices as output (no explicit next_character).

Used by Unix "compress", GIF, V.42bis, TIFF.

Example: wabba␣wabba␣wabba␣wabba␣woo␣woo␣woo (␣ denotes a blank)

Dictionary progress at the 12th entry.

Encoder output sequence so far: 5 2 3 3 2 1


Lossless Compression: Lempel-Ziv-Welch (LZW) (continued)

Example: wabba␣wabba␣wabba␣wabba␣woo␣woo␣woo

Dictionary progress at the end of the example.

Encoder output sequence: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4
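A minimal LZW encoder sketch; the initial dictionary {␣:1, a:2, b:3, o:4, w:5} is inferred from the slide's output sequence, and with it the sketch reproduces that sequence exactly:

```python
def lzw_encode(data: str, initial: dict) -> list:
    dictionary = dict(initial)          # pattern -> index
    out, s = [], ""
    for c in data:
        if s + c in dictionary:         # keep growing the current pattern
            s += c
        else:                           # emit the index, learn the pattern
            out.append(dictionary[s])
            dictionary[s + c] = len(dictionary) + 1
            s = c
    out.append(dictionary[s])           # flush the final pattern
    return out

initial = {" ": 1, "a": 2, "b": 3, "o": 4, "w": 5}
msg = "wabba wabba wabba wabba woo woo woo"
print(lzw_encode(msg, initial))
# [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]
```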


Lossless Compression: Arithmetic Coding

Encodes the block of symbols into a single number, a fraction n where 0.0 ≤ n < 1.0.
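A bare-bones sketch of the interval-narrowing idea behind arithmetic coding; it uses floating point, so it only suits short messages (real coders use integer arithmetic with renormalization), and the three-symbol model below is illustrative:

```python
def arithmetic_encode(msg: str, probs: dict) -> float:
    # Assign each symbol a sub-interval of [0, 1) by cumulative probability
    cum, lo = {}, 0.0
    for s, p in probs.items():
        cum[s] = (lo, lo + p)
        lo += p
    low, high = 0.0, 1.0
    for s in msg:                       # narrow the interval per symbol
        span = high - low
        s_lo, s_hi = cum[s]
        low, high = low + span * s_lo, low + span * s_hi
    return (low + high) / 2             # any fraction in [low, high) works

probs = {"a": 0.5, "b": 0.3, "c": 0.2}
print(arithmetic_encode("abca", probs))  # a single number in [0, 1)
```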