Multimedia Systems
Entropy Coding
Mahdi Amiri
October 2015
Sharif University of Technology
Course Presentation
Source and Channel Coding Shannon's Separation Principle
Assumptions:
Single source and user
Unlimited complexity and delay
Block diagram: Information Source → Source Coding → Channel Coding
Information Source: generates the information we want to transmit or store.
Source Coding: reduces the number of bits needed to store or transmit the relevant information.
Channel Coding: increases the number of bits, or changes them, to protect against channel errors.
What about joint source and channel coding?
Coding-related elements in a communication system.
Claude E. Shannon, 1916-2001
Ref.: en.wikipedia.org/wiki/Information_theory
Information source: en.wikipedia.org/wiki/Information_source
Source coding: en.wikipedia.org/wiki/Data_compression
Channel coding: en.wikipedia.org/wiki/Forward_error_correction
Source Coding Motivation
Data storage and transmission cost money.
Use the fewest number of bits to represent the information source.
Pro:
Less memory, less transmission time.
Cons:
Extra processing required.
Distortion (if lossy compression is used).
Data has to be decompressed before it can be presented, which may introduce delay.
Source Coding Principles
Example
The source coder shall represent the video signal by the minimum number of
(binary) symbols without exceeding an acceptable level of distortion.
Two principles are utilized:
1. Properties of the information source that are known a priori result in redundant information that need not be transmitted ("redundancy reduction").
2. The human observer does not perceive certain deviations of the received signal from the original ("irrelevancy reduction").
Approaches:
Lossless coding: completely reversible; exploits principle 1 only.
Lossy coding: not reversible; exploits principles 1 and 2.
Data Compression Lossless and Lossy
Lossless
Exact reconstruction is possible.
Applied to general data.
Lower compression rates.
Examples: Run-length, Huffman, Lempel-Ziv.
Lossy
Higher compression rates.
Applied to audio, image and video.
Examples: CELP, JPEG, MPEG-2.
Data Compression Codec (Encoder and Decoder)
General structure of a codec:
Encoder: Original signal → T (transform, prediction) → Q (quantization) → E (entropy encoder) → Compressed bit-stream
Decoder: Compressed bit-stream → E⁻¹ (entropy decoder) → Q⁻¹ (dequantization) → T⁻¹ (inverse transform) → Reconstructed signal
In information theory, an entropy encoding is a lossless data compression scheme that is independent of the specific characteristics of the medium.
Ref.: en.wikipedia.org/wiki/Entropy_(information_theory)
en.wikipedia.org/wiki/Entropy_encoding
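A minimal sketch of this pipeline in Python, assuming a toy 1-D integer signal, delta prediction as the transform T, a uniform quantizer with step q, and run-length coding standing in for the entropy stage (all of these choices are illustrative, not the design of any particular standard):

    import itertools

    def transform(signal):                      # T: first sample, then differences from the previous sample
        return [signal[0]] + [b - a for a, b in zip(signal, signal[1:])]

    def inverse_transform(residual):            # T^-1: integrate the differences back
        return list(itertools.accumulate(residual))

    def quantize(residual, q):                  # Q: uniform quantizer (the only lossy step)
        return [round(r / q) for r in residual]

    def dequantize(indices, q):                 # Q^-1
        return [i * q for i in indices]

    def entropy_encode(indices):                # E: run-length coding as a stand-in entropy stage
        return [(k, len(list(g))) for k, g in itertools.groupby(indices)]

    def entropy_decode(runs):                   # E^-1
        return [k for k, n in runs for _ in range(n)]

    signal = [10, 10, 10, 12, 14, 14, 14, 14, 9, 9]
    q = 2
    bitstream = entropy_encode(quantize(transform(signal), q))
    reconstructed = inverse_transform(dequantize(entropy_decode(bitstream), q))
    print(bitstream)       # compressed representation
    print(reconstructed)   # approximation of the original (lossy because of Q)

Only the quantization step Q discards information; T and E are perfectly reversible, which is why entropy coding on its own is lossless.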
Entropy Coding Selected Topics and Algorithms
Run-length encoding
Fixed Length Coding (FLC)
Variable Length Coding (VLC)
Huffman Coding Algorithm
Entropy, Definition
Lempel-Ziv (LZ77)
Lempel-Ziv-Welch (LZW)
Arithmetic Coding
Lossless Compression Run-Length Encoding (RLE)
BBBBHHDDXXXXKKKKWWZZZZ → 4B2H2D4X4K2W4Z
Image of a rectangle
0, 40
0, 40
0,10 1,20 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,1 0,18 1,1 0,10
0,10 1,20 0,10
0,40
RLE is used in fax machines.
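A minimal run-length encoder/decoder sketch in Python (the function names are illustrative); applied to the string above it produces 4B2H2D4X4K2W4Z:

    from itertools import groupby
    import re

    def rle_encode(text):
        # Replace each run of identical characters by "<count><char>".
        return "".join(f"{len(list(group))}{char}" for char, group in groupby(text))

    def rle_decode(code):
        # Invert the encoding: expand every "<count><char>" pair.
        return "".join(char * int(count) for count, char in re.findall(r"(\d+)(\D)", code))

    message = "BBBBHHDDXXXXKKKKWWZZZZ"
    encoded = rle_encode(message)        # "4B2H2D4X4K2W4Z"
    assert rle_decode(encoded) == message
    print(encoded)

This naive textual form assumes the data itself contains no digit characters; binary RLE, as in the rectangle image above, stores (value, run-length) pairs instead.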
Lossless Compression Fixed Length Coding (FLC)
A simple example
The message to code: ►♣♣♠☻►♣☼►☻
5 different symbols → at least 3 bits per symbol
Message length: 10 symbols
Total bits required to code: 10*3 = 30 bits
Codeword table
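A quick fixed-length coding sketch in Python, just to confirm the 30-bit figure; the letters below are stand-ins for the glyphs above (same frequencies, 5 distinct symbols):

    from math import ceil, log2

    message = list("XAABCXADXC")                         # 10 symbols, 5 distinct
    alphabet = sorted(set(message))
    bits_per_symbol = ceil(log2(len(alphabet)))          # 5 symbols -> 3 bits each
    codebook = {s: format(i, f"0{bits_per_symbol}b") for i, s in enumerate(alphabet)}
    encoded = "".join(codebook[s] for s in message)
    print(codebook)
    print(len(encoded))                                  # 10 symbols * 3 bits = 30 bits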
Lossless Compression Variable Length Coding (VLC)
Intuition: symbols that occur more frequently should get shorter codewords; since the codewords no longer all have the same length, there must be a way to tell where each codeword ends.
The message to code: ►♣♣♠☻►♣☼►☻
Total bits required to code: 3*2 + 3*2 + 2*2 + 3 + 3 = 22 bits
Codeword table
To identify the end of a codeword as soon as it arrives, no codeword can be a prefix of another codeword (a prefix-free code).
How to find the optimal codeword table?
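A small sketch of prefix-free encoding and decoding in Python; the codeword table below is an assumed example (frequent symbols get 2-bit codes, rare ones 3-bit codes), not necessarily the exact table from the slide:

    # Assumed prefix-free codeword table: no code is a prefix of another.
    codebook = {"X": "00", "A": "01", "C": "10", "B": "110", "D": "111"}

    def vlc_encode(message, codebook):
        return "".join(codebook[s] for s in message)

    def vlc_decode(bits, codebook):
        # Because the code is prefix-free, a symbol can be emitted as soon as a codeword matches.
        inverse = {code: sym for sym, code in codebook.items()}
        out, current = [], ""
        for bit in bits:
            current += bit
            if current in inverse:
                out.append(inverse[current])
                current = ""
        return "".join(out)

    message = "XAABCXADXC"                       # same stand-in message as before
    bits = vlc_encode(message, codebook)
    print(len(bits))                             # 22 bits instead of 30 with fixed-length codes
    assert vlc_decode(bits, codebook) == message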
Lossless Compression VLC, Example Application
Morse code: a non-prefix code; it needs a separator symbol (a pause) for unique decodability.
Lossless Compression Huffman Coding Algorithm
Step 1: Take the two least probable symbols in the alphabet
(these two get the longest codewords, of equal length, differing only in the last digit).
Step 2: Combine these two symbols into a single symbol, and repeat.
P(n): probability of symbol number n.
Here there are 9 symbols; e.g. the symbols can be the alphabet letters 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'.
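A compact sketch of this construction in Python using a priority queue; the nine probabilities below are made-up placeholders (the slide's actual values live in its figure):

    import heapq

    def huffman_code(probabilities):
        # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
        heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
        heapq.heapify(heap)
        counter = len(heap)
        while len(heap) > 1:
            # Step 1: take the two least probable entries.
            p1, _, code1 = heapq.heappop(heap)
            p2, _, code2 = heapq.heappop(heap)
            # Step 2: combine them into a single entry, prefixing 0 / 1, and repeat.
            merged = {s: "0" + c for s, c in code1.items()}
            merged.update({s: "1" + c for s, c in code2.items()})
            heapq.heappush(heap, (p1 + p2, counter, merged))
            counter += 1
        return heap[0][2]

    # Placeholder probabilities for 9 symbols 'a'..'i' (not the slide's actual values).
    probs = {"a": 0.20, "b": 0.18, "c": 0.15, "d": 0.12, "e": 0.10,
             "f": 0.08, "g": 0.07, "h": 0.06, "i": 0.04}
    codes = huffman_code(probs)
    for sym in sorted(codes, key=lambda s: len(codes[s])):
        print(sym, codes[sym])

As expected, the more probable symbols come out with the shorter codewords.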
Lossless Compression Huffman Coding Algorithm
David A. Huffman, 1925-1999
Paper: "A Method for the Construction of Minimum-Redundancy Codes", 1952
Results in "prefix-free codes"
Most efficient: no other symbol-to-codeword mapping produces a smaller average output size, provided the actual symbol frequencies agree with those used to create the code.
Cons:
Has to run through the entire data in advance to find the symbol frequencies.
"Minimum redundancy" is not favorable for error-correction techniques (the bits are not predictable if, e.g., one is missing).
Does not support blocks of symbols: Huffman is designed to code single characters only, so at least one bit is required per character; e.g. a word of 8 characters requires at least an 8-bit code.
Entropy Coding Entropy, Definition
The entropy, H, of a discrete random variable X is a measure of the
amount of uncertainty associated with the value of X.
Measure of information content (in bits)
A quantitative measure of the disorder of a system
It is impossible to compress the data such that the average number of bits per symbol is less than the Shannon entropy of the source (assuming a noiseless channel).
The Intuition Behind the Formula
H(X) = Σ_{x ∈ X} P(x) · log2( 1 / P(x) )

P(x) ↑  ⇒  uncertainty about x ↓  ⇒  amount of uncertainty ∼ 1 / P(x)
Bringing it to the world of bits: log2( 1 / P(x) ) = I(x), the information content of x.
Weighted average: multiply each I(x) by P(x) and sum ⇒ the average number of bits required to encode each possible value.

Claude E. Shannon, 1916-2001
X → the information source
P(x) → probability that symbol x in X will occur
(Information theory point of view)
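A small sketch computing the entropy of a source in Python; as an illustrative input it uses the symbol frequencies of the 10-symbol message from the VLC example above:

    from math import log2

    def entropy(probabilities):
        # H(X) = sum over x of P(x) * log2(1 / P(x)); terms with P(x) = 0 contribute nothing.
        return sum(p * log2(1 / p) for p in probabilities if p > 0)

    # Frequencies 3, 3, 2, 1, 1 out of 10 symbols (the VLC example message).
    probs = [3/10, 3/10, 2/10, 1/10, 1/10]
    H = entropy(probs)
    print(round(H, 3))           # about 2.171 bits per symbol
    print(round(10 * H, 1))      # lower bound of about 21.7 bits for the whole 10-symbol message

The variable-length encoding above comes close to, and cannot beat, this lower bound of roughly 21.7 bits.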
Lossless Compression Lempel-Ziv (LZ77)
Algorithm for compression of character sequences
Assumption: Sequences of characters are repeated
Idea: Replace a character sequence by a reference to an earlier occurrence.
1. Define:
search buffer = (a portion of) the recently encoded data
look-ahead buffer = the data not yet encoded
2. Find the longest match between the first characters of the look-ahead buffer and an arbitrary character sequence in the search buffer.
3. Produce the output:
offset + length = reference to the earlier occurrence
next_character = the first character following the match in the look-ahead buffer
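A minimal LZ77 encoder/decoder sketch in Python following these three steps; the buffer sizes and the (offset, length, next_character) triple format are illustrative assumptions, and real implementations add many refinements:

    def lz77_encode(data, search_size=16, lookahead_size=8):
        # Emits (offset, length, next_character) triples.
        triples, pos = [], 0
        while pos < len(data):
            search_start = max(0, pos - search_size)
            best_offset, best_length = 0, 0
            # Step 2: find the longest match between the look-ahead buffer
            # and any position in the search buffer.
            for start in range(search_start, pos):
                length = 0
                while (length < lookahead_size
                       and pos + length < len(data) - 1
                       and data[start + length] == data[pos + length]):
                    length += 1
                if length > best_length:
                    best_offset, best_length = pos - start, length
            # Step 3: output offset + length plus the first character after the match.
            next_char = data[pos + best_length]
            triples.append((best_offset, best_length, next_char))
            pos += best_length + 1
        return triples

    def lz77_decode(triples):
        out = []
        for offset, length, next_char in triples:
            for _ in range(length):
                out.append(out[-offset])      # copy from the earlier occurrence
            out.append(next_char)
        return "".join(out)

    text = "abracadabra abracadabra"
    encoded = lz77_encode(text)
    assert lz77_decode(encoded) == text
    print(encoded)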
Lossless Compression Lempel-Ziv-Welch (LZW)
Drops the search buffer and keeps an explicit dictionary instead.
Produces only dictionary indices as output.
Used by Unix "compress", GIF, V.42bis, TIFF.
Example: wabbapwabbapwabbapwabbapwoopwoopwoo
Progress snapshot at the 12th dictionary entry
Encoder output sequence so far: 5 2 3 3 2 1
Lossless Compression Lempel-Ziv-Welch (LZW)
Example: wabbapwabbapwabbapwabbapwoopwoopwoo
Progress snapshot at the end of the example above
Encoder output sequence: 5 2 3 3 2 1 6 8 10 12 9 11 7 16 5 4 4 11 21 23 4
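A compact LZW encoder sketch in Python; seeding the dictionary with the index assignment implied by the output above (1: 'p', 2: 'a', 3: 'b', 4: 'o', 5: 'w', where 'p' plays the role of the word separator) reproduces the slide's output sequence:

    def lzw_encode(data, initial_dictionary):
        dictionary = dict(initial_dictionary)      # string -> index
        next_index = len(dictionary) + 1
        output, current = [], ""
        for char in data:
            if current + char in dictionary:
                current += char                    # keep extending the match
            else:
                output.append(dictionary[current]) # emit index of the longest known string
                dictionary[current + char] = next_index
                next_index += 1
                current = char
        output.append(dictionary[current])         # flush the final match
        return output

    text = "wabbapwabbapwabbapwabbapwoopwoopwoo"
    initial = {"p": 1, "a": 2, "b": 3, "o": 4, "w": 5}
    print(lzw_encode(text, initial))
    # [5, 2, 3, 3, 2, 1, 6, 8, 10, 12, 9, 11, 7, 16, 5, 4, 4, 11, 21, 23, 4]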
Lossless Compression Arithmetic Coding
Encodes a whole block of symbols into a single number: a fraction n where 0.0 ≤ n < 1.0.