Data compression
Data compression - why?
• Money, money, money…
– Data storage
more data > more storage space >> more money
– Data transmission
more data > more time >> more money
Data compression - how??
• With loss of data
– JPEG, MPEG, MP3
• Without loss of data
– Ziv-Lempel, fax, Huffman, TIFF
Data compression - how???
• Keyword: Entropy
Example 1
• YSTRDY LL M TRBLS SMD S FR WY H
BLV N YSTRDY
Example 1
• YESTERDAY ALL MY TROUBLES
SEEMED SO FAR AWAY OH I BELIEVE
IN YESTERDAY
– 69 letters × 8 bits >> 552 bits (the V in BELIEVE is not counted)
Example 1
Symbol  Count   Symbol  Count
space   12      B       2
Y        6      F       1
E       11      W       1
S        5      I       3
T        3      H       1
R        4      N       1
D        3      STOP    1
A        6
L        4
M        2
O        3
U        1
Total: 70 bytes = 560 bits
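The keyword "entropy" can be made concrete with the counts above. The following is a minimal sketch, not part of the original slides: it computes the Shannon entropy of the symbol distribution, which is a lower bound on the average number of bits per symbol for any code that gives each symbol its own codeword. The names `COUNTS` and `entropy_bits_per_symbol` are assumptions made for this example; the counts are copied from the slide as they stand.

```python
import math

# Symbol counts as given on the slide (STOP is the end-of-message marker).
COUNTS = {
    "space": 12, "Y": 6, "E": 11, "S": 5, "T": 3, "R": 4, "D": 3,
    "A": 6, "L": 4, "M": 2, "O": 3, "U": 1, "B": 2, "F": 1, "W": 1,
    "I": 3, "H": 1, "N": 1, "STOP": 1,
}

def entropy_bits_per_symbol(counts):
    """Shannon entropy H = -sum(p * log2(p)) of the symbol distribution."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

h = entropy_bits_per_symbol(COUNTS)
total = sum(COUNTS.values())
print(f"{h:.2f} bits per symbol, at least {math.ceil(h * total)} bits in total")
```

The total comes out well below the 560 bits of the fixed 8-bit encoding, which is the room a symbol-by-symbol code has to work with.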
Example 1
sp 12
E 11
Y 6
A 6
S 5
R 4
L 4
T 3
O 3
I 3
D 3
M 2
B 2
U 1
F 1
W 1
H 1
N 1
stop 1
Example 1
Merging the two smallest counts at each step creates nodes with these counts, in order:
2, 2, 2, 4, 4, 5, 6, 7, 8, 9, 11, 12, 15, 20, 23, 27, 43, 70
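The tree construction shown step by step above can be sketched with a priority queue. This is an illustrative implementation, not the author's: it repeatedly merges the two smallest counts, records the count of each new node, and tracks how deep every symbol ends up. The names `COUNTS` and `huffman_lengths` are assumptions; the counts are the ones from the slide. Ties may be broken differently than on the slide, so individual code lengths can differ, but the node counts and the total number of bits do not depend on tie-breaking.

```python
import heapq

# Symbol counts from the slide, already sorted by frequency.
COUNTS = {
    "sp": 12, "E": 11, "Y": 6, "A": 6, "S": 5, "R": 4, "L": 4, "T": 3,
    "O": 3, "I": 3, "D": 3, "M": 2, "B": 2, "U": 1, "F": 1, "W": 1,
    "H": 1, "N": 1, "stop": 1,
}

def huffman_lengths(counts):
    """Return (code length per symbol, counts of the merged nodes in order)."""
    # Heap entries: (count, unique tie-breaker, symbols below this node).
    heap = [(count, i, [symbol]) for i, (symbol, count) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    merged = []
    tiebreak = len(heap)
    while len(heap) > 1:
        count1, _, group1 = heapq.heappop(heap)
        count2, _, group2 = heapq.heappop(heap)
        for symbol in group1 + group2:
            lengths[symbol] += 1  # every merge adds one bit to these symbols
        merged.append(count1 + count2)
        heapq.heappush(heap, (count1 + count2, tiebreak, group1 + group2))
        tiebreak += 1
    return lengths, merged

lengths, merged = huffman_lengths(COUNTS)
total_bits = sum(COUNTS[s] * lengths[s] for s in COUNTS)
print(merged)      # [2, 2, 2, 4, 4, 5, 6, 7, 8, 9, 11, 12, 15, 20, 23, 27, 43, 70]
print(total_bits)  # 270
```

The total is also the sum of all merged node counts, which is why every valid Huffman tree for these counts costs the same 270 bits.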
Example 1
Symbol  Count  Code    Bits    Symbol  Count  Code    Bits
space   12     111     36      B       2      00011   10
Y        6     011     18      F       1      000100   6
E       11     110     33      W       1      000011   6
S        5     1011    20      I       3      10010   15
T        3     0010    12      H       1      000010   6
R        4     1010    16      N       1      000001   6
D        3     10001   15      STOP    1      000000   6
A        6     010     18
L        4     0011    16
M        2     10000   10
O        3     10011   15
U        1     000101   6
Uncoded: 560 bits = 70 bytes
Huffman: 270 bits = 34 bytes (49%)
Example 1
Code tree:
0
  00
    000
      0000
        00000
          000000 stop
          000001 N
        00001
          000010 H
          000011 W
      0001
        00010
          000100 F
          000101 U
        00011 B
    001
      0010 T
      0011 L
  01
    010 A
    011 Y
1
  10
    100
      1000
        10000 M
        10001 D
      1001
        10010 I
        10011 O
    101
      1010 R
      1011 S
  11
    110 E
    111 sp
Example 1
00001011,00011001,11001100,00000000
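The four bytes above can be read with the code table from the preceding slide. That they are meant as a message in this code is an assumption, but under it they decode cleanly. The sketch below is illustrative; `CODES` and `decode` are names chosen for this example. It reads bits one at a time, emits a symbol as soon as the collected bits match a codeword, and stops at the STOP codeword, ignoring the padding bits that fill the last byte.

```python
# Code table from the slide; " " stands for the space symbol.
CODES = {
    " ": "111", "E": "110", "Y": "011", "A": "010", "S": "1011",
    "R": "1010", "L": "0011", "T": "0010", "O": "10011", "I": "10010",
    "D": "10001", "M": "10000", "B": "00011", "U": "000101",
    "F": "000100", "W": "000011", "H": "000010", "N": "000001",
    "STOP": "000000",
}
DECODE = {code: symbol for symbol, code in CODES.items()}

def decode(bits):
    """Decode a bit string; a prefix code needs no separators between codewords."""
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in DECODE:
            if DECODE[current] == "STOP":
                break
            symbols.append(DECODE[current])
            current = ""
    return "".join(symbols)

message = "00001011,00011001,11001100,00000000".replace(",", "")
print(decode(message))  # HELLO
```

Five letters plus the stop marker take 28 bits here, so the message fits in four bytes with four padding bits.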
Example 1
• That was the Huffman code, which is a
prefix code: no codeword is the beginning of another.
• Used by the ARJ utility
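The prefix property can be checked mechanically for the codewords of Example 1. The sketch below is illustrative and uses names (`CODEWORDS`, `is_prefix_free`) that are assumptions for this example. After sorting, a codeword that is a prefix of another one is always directly followed by a codeword that starts with it, so comparing neighbours is enough. The Kraft sum is included as a second check: a value of exactly 1 means the code tree has no unused leaves.

```python
from fractions import Fraction

# The 19 codewords from the Example 1 table.
CODEWORDS = [
    "111", "110", "011", "010", "1011", "1010", "0011", "0010",
    "10011", "10010", "10001", "10000", "00011",
    "000101", "000100", "000011", "000010", "000001", "000000",
]

def is_prefix_free(codewords):
    """True if no codeword is a prefix of (or equal to) another codeword."""
    ordered = sorted(codewords)
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))

kraft_sum = sum(Fraction(1, 2 ** len(code)) for code in CODEWORDS)
print(is_prefix_free(CODEWORDS), kraft_sum)  # True 1
```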
Example 2
• Fax machine
Example 2
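(X = marked pixel, . = blank pixel)
..........
.XXXXXXXX.
.XXXXXXXX.
.X..XX..X.
....XX....
....XX....
....XX....
....XX....
....XX....
....XX....
....XX....
....XX....
....XX....
....XX....
....XX....
....XX....
....XX....
...XXXX...
..........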
10, -1, 1, 8, 1, -1, 1, 8, 1, -1, 1, 1, 2, 2, 2, 1, 1, -1,
4, 2, 4, -1, 4, 2, 4, -1, 4, 2, 4, -1, 4, 2, 4, -1, 4, 2, 4, -1,
4, 2, 4, -1, 4, 2, 4, -1, 4, 2, 4, -1, 4, 2, 4, -1, 4, 2, 4, -1,
4, 2, 4, -1, 4, 2, 4, -1, 4, 2, 4, -1, 3, 4, 3, -1, 10, -1, -1
77 bytes
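The number sequence above is consistent with a simple run-length code: each row is written as the lengths of its alternating blank and marked runs, starting with a blank run, `-1` ends a row, and a further `-1` ends the image. The sketch below is a reading of the slide, not a real fax standard, and the 10 by 19 pixel bitmap in `ROWS` is reconstructed from the run lengths themselves, so treat it as an assumption. The function names are likewise chosen for this example.

```python
# Bitmap reconstructed from the slide's run lengths: "X" marked, "." blank.
ROWS = [
    "..........",
    ".XXXXXXXX.",
    ".XXXXXXXX.",
    ".X..XX..X.",
    *["....XX...."] * 13,
    "...XXXX...",
    "..........",
]

def run_lengths(row):
    """Lengths of the alternating runs in one row, starting with a blank run."""
    runs, current, count = [], ".", 0
    for pixel in row:
        if pixel == current:
            count += 1
        else:
            runs.append(count)  # a row starting with "X" yields a leading 0
            current, count = pixel, 1
    runs.append(count)
    return runs

def encode(rows):
    """Run lengths of every row, -1 after each row, one more -1 at the end."""
    numbers = []
    for row in rows:
        numbers.extend(run_lengths(row))
        numbers.append(-1)
    numbers.append(-1)
    return numbers

encoded = encode(ROWS)
print(len(encoded))  # 77
```

With one byte per number this gives the 77 bytes stated on the slide, against 190 pixels in the bitmap.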
Example 2
10, -1, 1, 8, 1, -1, 10, -1, 2, 2, 2, 2,
2, -1, 1, 1, 6, 1, 1, -1, 10, -1, 10, -1,
10, -1, 10, -1, 10, -1, 10, -1, 10, -1,
10, -1, 10, -1, 10, -1, 10, -1, 10, -1,
3, 1, 2, 1, 3, -1, 3, 4, 3, -1, -1
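55 bytes >> 71.43%
(X = marked pixel, . = blank pixel)
..........
.XXXXXXXX.
..........
..XX..XX..
.X......X.
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
..........
...X..X...
...XXXX...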
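The shorter sequence on this slide is reproduced exactly if every row is first replaced by its pixel-wise difference (XOR) from the row above, with an imaginary blank row above the first one, and the result is then run-length coded as before. The slide does not name this step, so the interpretation is an assumption, as are the reconstructed bitmap `ROWS` and the function names. The sketch shows why it helps: identical consecutive rows turn into blank rows, which cost only two numbers each.

```python
from itertools import groupby

# Bitmap reconstructed from the slide's run lengths: "X" marked, "." blank.
ROWS = [
    "..........",
    ".XXXXXXXX.",
    ".XXXXXXXX.",
    ".X..XX..X.",
    *["....XX...."] * 13,
    "...XXXX...",
    "..........",
]

def xor_rows(rows):
    """Mark only the pixels that differ from the row above."""
    previous = "." * len(rows[0])
    changed = []
    for row in rows:
        changed.append("".join("X" if a != b else "." for a, b in zip(row, previous)))
        previous = row
    return changed

def run_length_encode(rows):
    """Alternating run lengths per row (blank first), -1 after each row and at the end."""
    numbers = []
    for row in rows:
        lengths = [len(list(group)) for _, group in groupby(row)]
        if row[0] == "X":
            lengths.insert(0, 0)  # keep the blank-first convention
        numbers.extend(lengths + [-1])
    return numbers + [-1]

plain = run_length_encode(ROWS)
delta = run_length_encode(xor_rows(ROWS))
print(len(delta), f"{len(delta) / len(plain):.2%}")  # 55 71.43%
```

Decoding reverses the two steps: expand the runs, then XOR each row with the previously decoded row.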
Example 3
• Ziv-Lempel: table-based lookup algorithm
YESTERDAY ALL MY TROUBLES SEEMED SO FAR AWAY OH I BELIEVE IN YESTERDAY
@ ALL MY TROUBLES SEEMED SO FAR AWAY OH I BELIEVE IN @\@YESTERDAY
70 characters (spaces included) >> 65 characters
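The substitution shown above can be sketched in a few lines. This is a toy dictionary substitution in the spirit of the slide, not LZ77 or LZ78 proper: one repeated phrase is replaced by a token, and the table entry (token followed by phrase) is appended after a separator. The function names and the choice of `@` and `\` as token and separator follow the slide's output; everything else is an assumption for illustration.

```python
TEXT = "YESTERDAY ALL MY TROUBLES SEEMED SO FAR AWAY OH I BELIEVE IN YESTERDAY"

def substitute(text, phrase, token="@", separator="\\"):
    """Replace every occurrence of phrase by token and append the table entry."""
    if token in text or separator in text:
        raise ValueError("token and separator must not occur in the text")
    return text.replace(phrase, token) + separator + token + phrase

def restore(packed, separator="\\"):
    """Undo substitute(): split off the table entry and expand the token."""
    body, entry = packed.split(separator, 1)
    token, phrase = entry[0], entry[1:]
    return body.replace(token, phrase)

packed = substitute(TEXT, "YESTERDAY")
print(packed)
print(len(TEXT), len(packed))   # 70 65
print(restore(packed) == TEXT)  # True
```

The gain is small because the phrase occurs only twice; each further occurrence would save another eight characters.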
Compression with loss of data
• Type 1: YESTERDAY >> YSTRDY
– Based on human reason and cognitive skills
Compression with loss of data
• Type 2: JPEG, MPEG, MP3
– Cheating the human senses
– XXXXXXX (38, 168, 24 in the middle)
– XXXXXXX (36, 163, 23 in the middle)
Compression with loss of data
• Type 3: Original data is not needed
– I only want to check it
3 billion bytes
Spies!
Compression with loss of data
• Type 3: Original data is not needed
– I only want to check it
1 byte = 1 USD
Compression with loss of data
• Type 3: Original data is not needed
– Transmission is secure.
– Both sides calculate the total of the 3 billion bytes and
divide it by the same randomly selected 150-digit
number. They send the divisor
and the remainder.
– If the remainder is the same on both sides:
probability of error < (1E-150) squared
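The check described above can be sketched as follows. The sketch reads "the total" as the whole data taken as one large integer, which is an assumption, and it makes no claim about the error probability stated on the slide. The function names are hypothetical. Only the divisor and the remainder, a few hundred digits in all, need to travel instead of the data.

```python
import secrets

def random_divisor(digits=150):
    """A random integer with exactly the given number of decimal digits."""
    low = 10 ** (digits - 1)
    return low + secrets.randbelow(9 * low)

def remainder(data: bytes, divisor: int) -> int:
    """Treat the data as one big integer and reduce it modulo the divisor."""
    return int.from_bytes(data, "big") % divisor

def same_data(local: bytes, divisor: int, remote_remainder: int) -> bool:
    """Compare the local remainder with the one received from the other side."""
    return remainder(local, divisor) == remote_remainder

divisor = random_divisor()
original = b"YESTERDAY ALL MY TROUBLES SEEMED SO FAR AWAY"
print(same_data(original, divisor, remainder(original, divisor)))  # True
```

For data in the gigabyte range the remainder would be accumulated block by block rather than by building one huge integer; that refinement is left out here.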
Data enlarging
• To recognize an error
– CRC
• To correct an error
– Audio CD
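Error recognition by enlarging the data can be illustrated with the CRC-32 function from Python's standard `zlib` module. The framing below (payload followed by a 4-byte checksum) and the function names are assumptions made for this sketch, not a particular file or network format. The message grows by four bytes, and in return a flipped bit is noticed on arrival.

```python
import zlib

def with_checksum(payload: bytes) -> bytes:
    """Append the CRC-32 of the payload as four big-endian bytes."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def is_intact(frame: bytes) -> bool:
    """Recompute the CRC-32 of the payload and compare it with the stored one."""
    payload, stored = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored

frame = with_checksum(b"YESTERDAY")
damaged = bytes([frame[0] ^ 0x01]) + frame[1:]  # flip one bit of the first byte
print(len(frame), is_intact(frame), is_intact(damaged))  # 13 True False
```

A CRC only reports that something is wrong; correcting the error, as on an audio CD, needs a code with more redundancy.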
Compression utilities
• WinZip
• Windows Commander
• Windows XP
• …