Data compession

Chameli Devi Group Of Institutions

Chameli Devi School Of Engineering

Guided By:-

Shadab Pasha

Submitted By:-

Arvind Carpenter

Contents

1. Introduction

2. Categorization of Compression

3. Lossless Compression

4. Run-length Encoding

5. Huffman Coding

6. Lempel Ziv (LZ) Encoding

7. Lossy Compression

8. Image Compression (JPEG) Encoding

9. Video Compression (MPEG) Encoding

10. Audio Compression (MP3)

11 Conclusion

12 References

Why

Video: 30 pictures per second

Each picture = 200,000 dots or pixels

8-bits to represent each primary color

For RGB = 28 x 28 x 28

Bits required for one second movie = 503316480 pixels

Two hour movie requires = 2 x 60 x 60 x 503316480

Compression is a way to reduce the number of bits in a

frame but retaining its meaning.

Decreases space, time to transmit, and cost

Technique is to identify redundancy and to eliminate it

If a file contains only capital letters, we may encode all

the 26 alphabets using 5-bit numbers instead of 8-bit

ASCII code

If the file had n-characters, then the savings = (8n-5n)/8n

=> 37.5%

Introduction

Categories of Compression

Lossless Compression

In lossless data compression:-

o The integrity of the data is preserved.

o The original data and the data after compression and

decompression are exactly the same.

o No data loss.

o Redundant data is removed in compression and added

during decompression.

o Lossless compression methods are normally used

when we cannot afford to lose any data.

Run-length Encoding

Run-length encoding is simple and lossless

Here

How

It Works

Is

Notice that here are 9

pieces of fruits

We can store these information as is.....

There is a much better way.......

Check

It

Out !

Currently to read the line

of fruits aloud exactly

it appears you would say.

Kind of redundant.......

To save on space We can

“Compress” The

Information.....

Notice that there are multiples of

certain fruits....

Simplify...

Now if we read these aloud it’s not

So weird

“Three apples, two pears, one banana, two oranges

and one apple”

.........And it saves SPACE

Now to translate into

computer terms...

A scan line contains a run of numbers...

55556987444425555611111988888222222222

...Using run-length Encoding

(4,5) (1,6) (1,9) (1,8) (1,7)

(4,4) (1,2) (4,5) (1,6) (5,1)

(1,9) (5,8) (9,2)

Run-length encoding (RLE) is a very simple

form of data compression in which runs of data

(that is, sequences in which the same data

value occurs in many consecutive data

elements) are stored as a single data value

and count, rather than as the original run

To Sum it up.....

In Wikipedia terms.....

http://en.wikipedia.org/wiki/Data_compression

Huffman Coding

Huffman coding is credited to David Albert Huffman

Huffman coding is an entropy encoding algorithm used

for lossless data compression.

Huffman coding is a method of storing strings of data as

binary code in efficient manner

Huffman coding uses variable length coding which

means that symbols in the data you are encoded are

converted in to a binary symbol based on how often that

symbol is used

There is a way to decide what binary code to give to each

character using trees

The (Real) Basic Algorithm

Scan text to be compressed and tally occurrence of all

characters.

Sort or prioritize characters based on number of

occurrences in text.

Build Huffman code tree based on prioritized list.

Perform a traversal of tree to determine all code words.

Scan text again and create new file using the Huffman

codes.

CS 102

Consider the following short text:

Eerie eyes seen near lake.

Count up the occurrences of all characters in the text

Building a TreeScan the original text

CS 102


What characters are present?

E e r i space

y s n a r l k .


CS 102


What is the frequency of each character in the

text?

Char FreqE 1

e 8

r 2

i 1

Space 4

y 1

s 2

n 2

Char Freqa 2

l 1

k 1

. 1


CS 102

The queue after inserting all nodes

Null Pointers are not shown

E

1

i

1

y

1

l

1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

8

Building a Tree

CS 102

E

1

i

1

y

1

l

1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

8

BUILDING A TREE

CS

102

E1

i1

y

1

l

1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

8

2

BUILDING A TREE

CS

102

E1

i1

y

1

l

1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

82

BUILDING A TREE

CS

102

E1

i1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

82

y1

l1

2

BUILDING A TREE

CS

102

E1

i1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

8

2

y1

l1

2

BUILDING A TREE

CS

102

BUILDING A TREE

E1

i1

r

2

s

2

n

2

a

2

sp

4

e

8

2

y1

l1

2

k1

.1

2

CS

102

BUILDING A TREE

E1

i1

r

2

s

2

n

2

a

2

sp

4

e

8

2

y1

l1

2

k1

.1

2

CS

102

BUILDING A TREE

E1

i1

n

2

a

2

sp

4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

CS

102

E1

i1

n

2

a

2

sp

4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

BUILDING A TREE

CS

102

E1

i1

sp

4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

BUILDING A TREE

CS

102

E1

i1

sp

4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

BUILDING A TREE

CS

102

E1

i1

sp

4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

4

BUILDING A TREE

CS

102

E1

i1

sp

4

e

82

y1

l1

2k1

.1

2

r2

s2

4

n2

a2

4 4

BUILDING A TREE

CS

102

E1

i1

sp4

e

82

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4 4

6

BUILDING A TREE

CS

102

E1

i1

sp4

e

82

y1

l1

2

k1

.1

2r2

s2

4

n2

a2

4 4 6

What is happening to the characters with a low number of occurrences?

BUILDING A TREE

CS

102

E1

i1

sp4

e

82

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

BUILDING A TREE

CS

102

E1

i1

sp4

e

82

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46 8

BUILDING A TREE

CS

102

E1

i1

sp4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

10

BUILDING A TREE

CS

102

E1

i1

sp4

e

8

2

y1

l1

2

k1

.1

2r2

s2

4

n2

a2

4 46

8 10

BUILDING A TREE

CS

102

E1

i1

sp4

e8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

10

16

BUILDING A TREE

CS

102

E1

i1

sp4

e82

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

1016

BUILDING A TREE

CS

102

E1

i1

sp4

e8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

1016

26

BUILDING A TREE

CS

102

E1

i1

sp4

e8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

1016

26

After enqueueing this node

there is only one node left

in priority queue.

BUILDING A TREE

CS

10

2

Perform a traversal of the

tree to obtain new code

words

Going left is a 0 going right

is a 1

code word is only

completed when a leaf

node is reached E1

i1

sp4

e8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

1016

26

Encoding the FileTraverse Tree for Codes

CS

10

2ENCODING THE FILE

TRAVERSE TREE FOR CODES

Char CodeE 0000i 0001y 0010l 0011k 0100. 0101space 011e 10r 1100s 1101n 1110a 1111

E1

i1

sp4

e8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

1016

26

CS

10

2

ENCODING THE FILE

Rescan text and encode file

using new code words


Char CodeE 0000i 0001y 0010l 0011k 0100. 0101space 011e 10r 1100s 1101n 1110a 1111

0000101100000110011100010101101101

00111110101111110001100111111010010

0101

Why is there no need for a

separator character?.

CS

10

2ENCODING THE FILE

RESULTS

Have we made things any

better?

73 bits to encode the text

ASCII would take 8 * 26 =

208 bits

0000101100000110011100010101101101

00111110101111110001100111111010010

0101

Lemple Ziv (LZ) Encoding

Data compression up until the late 1970's mainly directed

towards creating better methodologies for Huffman coding.

An innovative, radically different method was introduced

in1977 by Abraham Lempel and Jacob Ziv.

This technique ( called Lempel-Ziv) actually consists of two

considerably different algorithms, LZ77 and LZ78.

Due to patents, LZ77 and LZ78 led to many variants.

LZHLZBLZSSLZRLZ77

Variants

LZFGLZJLZMWLZTLZCLZWLZ78

Variants

The zip and unzip use the LZH technique while UNIX's compress methods belong to the LZW and LZC classes

EXAMPLE : LZ78 COMPRESSIONEncode (i.e., compress) the string ABBCBCABABCAABCAAB using the LZ78 algorithm.

The compressed message is: (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)

Note: The above is just a representation, the commas and parentheses are not transmitted;

we will discuss the actual form of the compressed message later on in slide 12.

EXAMPLE : LZ78 COMPRESSION (CONT’D)

1. A is not in the Dictionary; insert it

2. B is not in the Dictionary; insert it

3. B is in the Dictionary.

BC is not in the Dictionary; insert it.


BC is in the Dictionary.

BCA is not in the Dictionary; insert it.


BA is not in the Dictionary; insert it.



BCA is in the Dictionary.

BCAA is not in the Dictionary; insert it.



BCA is in the Dictionary.

BCAA is in the Dictionary.

BCAAB is not in the Dictionary; insert it.

Lossy Compression Methods

Used for compressing images and video files

(our eyes cannot distinguish subtle changes, so

lossy data is acceptable).

These methods are cheaper, less time and

space.

Several methods:

JPEG: compress pictures and graphics

MPEG: compress video

MP3: compress audio

JPEG Encoding

Used to compress pictures and graphics.

In JPEG, a grayscale picture is divided into 8x8

pixel blocks to decrease the number of

calculations.

Basic idea: Change the picture into a linear (vector) sets of numbers that

reveals the redundancies.

The redundancies is then removed by one of lossless

compression methods.

JPEG Encoding - DCT

DCT: Discrete Concise Transform

DCT transforms the 64 values in 8x8 pixel block

in a way that the relative relationships between

pixels are kept but the redundancies are

revealed.

Example:

A gradient grayscale

Quantization & Compression

Quantization: After T table is created, the values are quantized to reduce the

number of bits needed for encoding.

Quantization divides the number of bits by a constant, then

drops the fraction. This is done to optimize the number of bits

and the number of 0s for each particular application.

• Compression: Quantized values are read from the table and redundant 0s are

removed.

To cluster the 0s together, the table is read diagonally in an

zigzag fashion. The reason is if the table doesn’t have fine

changes, the bottom right corner of the table is all 0s.

JPEG usually uses lossless run-length encoding at the

compression phase.

JPEG Encoding

MPEG Encoding

Used to compress video.

Basic idea:

Each video is a rapid sequence of a set of

frames. Each frame is a spatial combination

of pixels, or a picture.

Compressing video =

spatially compressing each frame

+

temporally compressing a set of

frames.

MPEG Encoding

• Spatial Compression• Each frame is spatially compressed by JPEG.

• Temporal Compression• Redundant frames are removed.

• For example, in a static scene in which someone is talking,

most frames are the same except for the segment around the

speaker’s lips, which changes from one frame to the next.

Audio Compression

Used for speech or music Speech: compress a 64 kHz digitized signal

Music: compress a 1.411 MHz signal

Two categories of techniques: Predictive encoding

Perceptual encoding

•Predictive Encoding•Only the differences between samples are encoded, not

the whole sample values.

•Several standards: GSM (13 kbps), G.729 (8 kbps), and

G.723.3 (6.4 or 5.3 kbps)

•Perceptual Encoding: MP3•CD-quality audio needs at least 1.411 Mbps and cannot

be sent over the Internet without compression.

•MP3 (MPEG audio layer 3) uses perceptual encoding

technique to compress audio.

Audio Encoding

Conclusion

Compression is used in all types of data

to save space and time. There are two

types of data compression-lossy and

lossless. Lossy techniques are used for

images, videos and audios, where we

can bear data loss. Lossless technique

is used for textual data it can be

encoded through run-length, Huffman

and Lempel Ziv.

References

http://www.csie.kuas.edu.tw/course/cs/englis

h/ch-15.ppt

CS157B-Lecture 19 by Professor Lee

http://cs.sjsu.edu/~lee/cs157b/cs157b.html

“The essentials of computer organization

and architecture” by Linda Null and Julia

Nobur

.

http://www.wekipedia.com

http://www.csie.kuas.edu.tw/course/cs/english/ch-15.ppt

http://cs.sjsu.edu/~lee/cs157b/cs157b.html

ThankYou

Data Compression

Questions

Education

Data compession