68
Chameli Devi Group Of Institutions Chameli Devi School Of Engineering Guided By: - Shadab Pasha Submitted By: - Arvind Carpenter

Data compession

Embed Size (px)

Citation preview

Page 1: Data compession

Chameli Devi Group Of Institutions

Chameli Devi School Of Engineering

Guided By:-

Shadab Pasha

Submitted By:-

Arvind Carpenter

Page 2: Data compession

Contents

1. Introduction

2. Categorization of Compression

3. Lossless Compression

4. Run-length Encoding

5. Huffman Coding

6. Lempel Ziv (LZ) Encoding

7. Lossy Compression

8. Image Compression (JPEG) Encoding

9. Video Compression (MPEG) Encoding

10. Audio Compression (MP3)

11 Conclusion

12 References

Page 3: Data compession

Why

Page 4: Data compession

Video: 30 pictures per second

Each picture = 200,000 dots or pixels

8-bits to represent each primary color

For RGB = 28 x 28 x 28

Bits required for one second movie = 503316480 pixels

Two hour movie requires = 2 x 60 x 60 x 503316480

Page 5: Data compession
Page 6: Data compession

Compression is a way to reduce the number of bits in a

frame but retaining its meaning.

Decreases space, time to transmit, and cost

Technique is to identify redundancy and to eliminate it

If a file contains only capital letters, we may encode all

the 26 alphabets using 5-bit numbers instead of 8-bit

ASCII code

If the file had n-characters, then the savings = (8n-5n)/8n

=> 37.5%

Introduction

Page 7: Data compession

Categories of Compression

Page 8: Data compession

Lossless Compression

In lossless data compression:-

o The integrity of the data is preserved.

o The original data and the data after compression and

decompression are exactly the same.

o No data loss.

o Redundant data is removed in compression and added

during decompression.

o Lossless compression methods are normally used

when we cannot afford to lose any data.

Page 9: Data compession

Run-length Encoding

Run-length encoding is simple and lossless

Here

How

It Works

Is

Page 10: Data compession

Notice that here are 9

pieces of fruits

We can store these information as is.....

Page 11: Data compession
Page 12: Data compession

There is a much better way.......

Check

It

Out !

Page 13: Data compession

Currently to read the line

of fruits aloud exactly

it appears you would say.

Kind of redundant.......

Page 14: Data compession

To save on space We can

“Compress” The

Information.....

Page 15: Data compession

Notice that there are multiples of

certain fruits....

Page 16: Data compession

Simplify...

Page 17: Data compession

Now if we read these aloud it’s not

So weird

“Three apples, two pears, one banana, two oranges

and one apple”

.........And it saves SPACE

Page 18: Data compession

Now to translate into

computer terms...

A scan line contains a run of numbers...

55556987444425555611111988888222222222

...Using run-length Encoding

(4,5) (1,6) (1,9) (1,8) (1,7)

(4,4) (1,2) (4,5) (1,6) (5,1)

(1,9) (5,8) (9,2)

Page 19: Data compession

Run-length encoding (RLE) is a very simple

form of data compression in which runs of data

(that is, sequences in which the same data

value occurs in many consecutive data

elements) are stored as a single data value

and count, rather than as the original run

To Sum it up.....

In Wikipedia terms.....

Page 20: Data compession

Huffman Coding

Huffman coding is credited to David Albert Huffman

Huffman coding is an entropy encoding algorithm used

for lossless data compression.

Huffman coding is a method of storing strings of data as

binary code in efficient manner

Huffman coding uses variable length coding which

means that symbols in the data you are encoded are

converted in to a binary symbol based on how often that

symbol is used

There is a way to decide what binary code to give to each

character using trees

Page 21: Data compession

The (Real) Basic Algorithm

Scan text to be compressed and tally occurrence of all

characters.

Sort or prioritize characters based on number of

occurrences in text.

Build Huffman code tree based on prioritized list.

Perform a traversal of tree to determine all code words.

Scan text again and create new file using the Huffman

codes.

Page 22: Data compession

CS 102

Consider the following short text:

Eerie eyes seen near lake.

Count up the occurrences of all characters in the text

Building a TreeScan the original text

Page 23: Data compession

CS 102

Eerie eyes seen near lake.

What characters are present?

E e r i space

y s n a r l k .

Building a TreeScan the original text

Page 24: Data compession

CS 102

Eerie eyes seen near lake.

What is the frequency of each character in the

text?

Char FreqE 1

e 8

r 2

i 1

Space 4

y 1

s 2

n 2

Char Freqa 2

l 1

k 1

. 1

Building a TreeScan the original text

Page 25: Data compession

CS 102

The queue after inserting all nodes

Null Pointers are not shown

E

1

i

1

y

1

l

1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

8

Building a Tree

Page 26: Data compession

CS 102

E

1

i

1

y

1

l

1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

8

BUILDING A TREE

Page 27: Data compession

CS

102

E1

i1

y

1

l

1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

8

2

BUILDING A TREE

Page 28: Data compession

CS

102

E1

i1

y

1

l

1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

82

BUILDING A TREE

Page 29: Data compession

CS

102

E1

i1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

82

y1

l1

2

BUILDING A TREE

Page 30: Data compession

CS

102

E1

i1

k

1

.

1

r

2

s

2

n

2

a

2

sp

4

e

8

2

y1

l1

2

BUILDING A TREE

Page 31: Data compession

CS

102

BUILDING A TREE

E1

i1

r

2

s

2

n

2

a

2

sp

4

e

8

2

y1

l1

2

k1

.1

2

Page 32: Data compession

CS

102

BUILDING A TREE

E1

i1

r

2

s

2

n

2

a

2

sp

4

e

8

2

y1

l1

2

k1

.1

2

Page 33: Data compession

CS

102

BUILDING A TREE

E1

i1

n

2

a

2

sp

4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

Page 34: Data compession

CS

102

E1

i1

n

2

a

2

sp

4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

BUILDING A TREE

Page 35: Data compession

CS

102

E1

i1

sp

4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

BUILDING A TREE

Page 36: Data compession

CS

102

E1

i1

sp

4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

BUILDING A TREE

Page 37: Data compession

CS

102

E1

i1

sp

4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

4

BUILDING A TREE

Page 38: Data compession

CS

102

E1

i1

sp

4

e

82

y1

l1

2k1

.1

2

r2

s2

4

n2

a2

4 4

BUILDING A TREE

Page 39: Data compession

CS

102

E1

i1

sp4

e

82

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4 4

6

BUILDING A TREE

Page 40: Data compession

CS

102

E1

i1

sp4

e

82

y1

l1

2

k1

.1

2r2

s2

4

n2

a2

4 4 6

What is happening to the characters with a low number of occurrences?

BUILDING A TREE

Page 41: Data compession

CS

102

E1

i1

sp4

e

82

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

BUILDING A TREE

Page 42: Data compession

CS

102

E1

i1

sp4

e

82

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46 8

BUILDING A TREE

Page 43: Data compession

CS

102

E1

i1

sp4

e

8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

10

BUILDING A TREE

Page 44: Data compession

CS

102

E1

i1

sp4

e

8

2

y1

l1

2

k1

.1

2r2

s2

4

n2

a2

4 46

8 10

BUILDING A TREE

Page 45: Data compession

CS

102

E1

i1

sp4

e8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

10

16

BUILDING A TREE

Page 46: Data compession

CS

102

E1

i1

sp4

e82

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

1016

BUILDING A TREE

Page 47: Data compession

CS

102

E1

i1

sp4

e8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

1016

26

BUILDING A TREE

Page 48: Data compession

CS

102

E1

i1

sp4

e8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

1016

26

After enqueueing this node

there is only one node left

in priority queue.

BUILDING A TREE

Page 49: Data compession

CS

10

2

Perform a traversal of the

tree to obtain new code

words

Going left is a 0 going right

is a 1

code word is only

completed when a leaf

node is reached E1

i1

sp4

e8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

1016

26

Encoding the FileTraverse Tree for Codes

Page 50: Data compession

CS

10

2ENCODING THE FILE

TRAVERSE TREE FOR CODES

Char CodeE 0000i 0001y 0010l 0011k 0100. 0101space 011e 10r 1100s 1101n 1110a 1111

E1

i1

sp4

e8

2

y1

l1

2

k1

.1

2

r2

s2

4

n2

a2

4

46

8

1016

26

Page 51: Data compession

CS

10

2

ENCODING THE FILE

Rescan text and encode file

using new code words

Eerie eyes seen near lake.

Char CodeE 0000i 0001y 0010l 0011k 0100. 0101space 011e 10r 1100s 1101n 1110a 1111

0000101100000110011100010101101101

00111110101111110001100111111010010

0101

Why is there no need for a

separator character?.

Page 52: Data compession

CS

10

2ENCODING THE FILE

RESULTS

Have we made things any

better?

73 bits to encode the text

ASCII would take 8 * 26 =

208 bits

0000101100000110011100010101101101

00111110101111110001100111111010010

0101

Page 53: Data compession

Lemple Ziv (LZ) Encoding

Data compression up until the late 1970's mainly directed

towards creating better methodologies for Huffman coding.

An innovative, radically different method was introduced

in1977 by Abraham Lempel and Jacob Ziv.

This technique ( called Lempel-Ziv) actually consists of two

considerably different algorithms, LZ77 and LZ78.

Due to patents, LZ77 and LZ78 led to many variants.

LZHLZBLZSSLZRLZ77

Variants

LZFGLZJLZMWLZTLZCLZWLZ78

Variants

The zip and unzip use the LZH technique while UNIX's compress methods belong to the LZW and LZC classes

Page 54: Data compession

EXAMPLE : LZ78 COMPRESSIONEncode (i.e., compress) the string ABBCBCABABCAABCAAB using the LZ78 algorithm.

The compressed message is: (0,A)(0,B)(2,C)(3,A)(2,A)(4,A)(6,B)

Note: The above is just a representation, the commas and parentheses are not transmitted;

we will discuss the actual form of the compressed message later on in slide 12.

Page 55: Data compession

EXAMPLE : LZ78 COMPRESSION (CONT’D)

1. A is not in the Dictionary; insert it

2. B is not in the Dictionary; insert it

3. B is in the Dictionary.

BC is not in the Dictionary; insert it.

4. B is in the Dictionary.

BC is in the Dictionary.

BCA is not in the Dictionary; insert it.

5. B is in the Dictionary.

BA is not in the Dictionary; insert it.

6. B is in the Dictionary.

BC is in the Dictionary.

BCA is in the Dictionary.

BCAA is not in the Dictionary; insert it.

7. B is in the Dictionary.

BC is in the Dictionary.

BCA is in the Dictionary.

BCAA is in the Dictionary.

BCAAB is not in the Dictionary; insert it.

Page 56: Data compession

Lossy Compression Methods

Used for compressing images and video files

(our eyes cannot distinguish subtle changes, so

lossy data is acceptable).

These methods are cheaper, less time and

space.

Several methods:

JPEG: compress pictures and graphics

MPEG: compress video

MP3: compress audio

Page 57: Data compession

JPEG Encoding

Used to compress pictures and graphics.

In JPEG, a grayscale picture is divided into 8x8

pixel blocks to decrease the number of

calculations.

Basic idea: Change the picture into a linear (vector) sets of numbers that

reveals the redundancies.

The redundancies is then removed by one of lossless

compression methods.

Page 58: Data compession

JPEG Encoding - DCT

DCT: Discrete Concise Transform

DCT transforms the 64 values in 8x8 pixel block

in a way that the relative relationships between

pixels are kept but the redundancies are

revealed.

Example:

A gradient grayscale

Page 59: Data compession

Quantization & Compression

Quantization: After T table is created, the values are quantized to reduce the

number of bits needed for encoding.

Quantization divides the number of bits by a constant, then

drops the fraction. This is done to optimize the number of bits

and the number of 0s for each particular application.

• Compression: Quantized values are read from the table and redundant 0s are

removed.

To cluster the 0s together, the table is read diagonally in an

zigzag fashion. The reason is if the table doesn’t have fine

changes, the bottom right corner of the table is all 0s.

JPEG usually uses lossless run-length encoding at the

compression phase.

Page 60: Data compession

JPEG Encoding

Page 61: Data compession

MPEG Encoding

Used to compress video.

Basic idea:

Each video is a rapid sequence of a set of

frames. Each frame is a spatial combination

of pixels, or a picture.

Compressing video =

spatially compressing each frame

+

temporally compressing a set of

frames.

Page 62: Data compession

MPEG Encoding

• Spatial Compression• Each frame is spatially compressed by JPEG.

• Temporal Compression• Redundant frames are removed.

• For example, in a static scene in which someone is talking,

most frames are the same except for the segment around the

speaker’s lips, which changes from one frame to the next.

Page 63: Data compession

Audio Compression

Used for speech or music Speech: compress a 64 kHz digitized signal

Music: compress a 1.411 MHz signal

Two categories of techniques: Predictive encoding

Perceptual encoding

Page 64: Data compession

•Predictive Encoding•Only the differences between samples are encoded, not

the whole sample values.

•Several standards: GSM (13 kbps), G.729 (8 kbps), and

G.723.3 (6.4 or 5.3 kbps)

•Perceptual Encoding: MP3•CD-quality audio needs at least 1.411 Mbps and cannot

be sent over the Internet without compression.

•MP3 (MPEG audio layer 3) uses perceptual encoding

technique to compress audio.

Audio Encoding

Page 65: Data compession

Conclusion

Compression is used in all types of data

to save space and time. There are two

types of data compression-lossy and

lossless. Lossy techniques are used for

images, videos and audios, where we

can bear data loss. Lossless technique

is used for textual data it can be

encoded through run-length, Huffman

and Lempel Ziv.

Page 66: Data compession

References

http://www.csie.kuas.edu.tw/course/cs/englis

h/ch-15.ppt

CS157B-Lecture 19 by Professor Lee

http://cs.sjsu.edu/~lee/cs157b/cs157b.html

“The essentials of computer organization

and architecture” by Linda Null and Julia

Nobur

.

http://www.wekipedia.com

Page 67: Data compession

ThankYou

Page 68: Data compession

Data Compression

Questions