seminarDataCompression.ppt


  • 8/14/2019 seminarDataCompression.ppt


    DATA COMPRESSION

    & ITS TECHNIQUES

    Presented By:

    Gursheen Kaur Kohli

    Roll No.- 133517

    M.Tech CSE


    DATA COMPRESSION

    The process of reducing the volume of data by applying a compression technique is called compression. The resulting data is called compressed data. The reverse process of reproducing the original data from compressed data is called decompression. The resulting data is called decompressed data.
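The two terms can be made concrete with a minimal sketch that uses Python's standard-library `zlib` module (DEFLATE, a lossless method) as a stand-in compression technique; the sample input is hypothetical.

```python
import zlib

# Hypothetical sample input: highly repetitive text compresses well.
original = b"data compression " * 100

compressed = zlib.compress(original)        # compression -> compressed data
decompressed = zlib.decompress(compressed)  # decompression -> decompressed data

print(f"original: {len(original)} bytes, compressed: {len(compressed)} bytes")
# zlib is lossless, so the decompressed data equals the original exactly.
print("identical:", decompressed == original)
```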


    REASONS TO COMPRESS

    Make optimal use of limited storage space

    Save time and help to optimize resources

    If compression and decompression are done in the I/O processor, less time is required to move data to or from the storage subsystem, freeing the I/O bus for other work

    When sending data over a communication line: less time to transmit and less storage needed at the host


    TYPES OF COMPRESSION TECHNIQUES

    Compression techniques can be categorized based on the following considerations:

    Lossless or lossy

    Symmetrical or asymmetrical

    Software or hardware


    TYPES OF COMPRESSION TECHNIQUES

    1. Lossless or lossy

    If the decompressed data is the same as the original data, it is referred to as lossless compression; otherwise the compression is lossy.

    2. Symmetrical or asymmetrical

    In symmetrical compression, the times required to compress and to decompress are roughly the same.

    In asymmetrical compression, the time taken for compression is usually much longer than the time taken for decompression.

    3. Software or hardware

    A compression technique may be implemented either in hardware or in software. Compared to software codecs (coder and decoder), hardware codecs typically offer better quality and performance.
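The lossless/lossy distinction can be illustrated with a small sketch. The lossless side is a `zlib` round trip; the lossy side uses simple quantization to a step size as a stand-in for a lossy step (an illustrative assumption, not a complete lossy codec). The sample values and `STEP` are hypothetical.

```python
import zlib

# Hypothetical 8-bit sample values (e.g. pixel intensities).
samples = [12, 15, 14, 90, 91, 93, 40, 41]

# Lossless: a zlib round trip reproduces every value exactly.
lossless = list(zlib.decompress(zlib.compress(bytes(samples))))

# Lossy: keep only the nearest multiple of STEP (an illustrative step size);
# the discarded detail cannot be recovered on decompression.
STEP = 10
lossy = [STEP * round(s / STEP) for s in samples]

print("lossless:", lossless == samples)  # True
print("lossy max error:", max(abs(a - b) for a, b in zip(samples, lossy)))
```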


    DATA COMPRESSION METHODS

    Data compression is about storing and sending a smaller number of bits.

    There are two major categories of methods to compress data: lossless and lossy methods.


    LOSSLESS COMPRESSION

    METHODS

    In lossless methods, the original data and the data after compression and decompression are exactly the same.

    Redundant data is removed during compression and added back during decompression.

    Lossless methods are used when we can't afford to lose any data: legal and medical documents, computer programs.


    RUN-LENGTH ENCODING

    The simplest method of compression.

    How: replace each run of consecutive repeated occurrences of a symbol with a single occurrence of the symbol together with the number of occurrences (in the example below, the count is written before the symbol).

    Example: Consider the string Xtmprsqzntwlfb

    After RLE encoding, this string becomes:

    1X1t1m1p1r1s1q1z1n1t1w1l1f1b

    Because this string contains no runs, RLE doubles its length from 14 to 28 characters. RLE schemes are simple and fast, but their compression efficiency depends on the type of data being encoded.

    Example: A black-and-white image that is mostly white, such as the page of a book, will encode very well, due to the large amount of contiguous data that is all the same color. An image with many colors that is very busy in appearance, however, such as a photograph, will not encode very well. This is because the complexity of the image is expressed as a large number of different colors, and because of this complexity there will be relatively few runs of the same color.
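A minimal RLE sketch in Python, assuming text input; the function names are hypothetical. It keeps the encoding as (count, symbol) pairs and only renders the count-then-symbol string for display, because a flat string is ambiguous when the symbols themselves are digits.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[int, str]]:
    """Collapse each run of identical symbols into a (count, symbol) pair."""
    return [(len(list(run)), symbol) for symbol, run in groupby(text)]


def rle_decode(pairs: list[tuple[int, str]]) -> str:
    """Expand every (count, symbol) pair back into its run."""
    return "".join(symbol * count for count, symbol in pairs)


def rle_to_string(pairs: list[tuple[int, str]]) -> str:
    """Render pairs in the slide's count-then-symbol notation."""
    return "".join(f"{count}{symbol}" for count, symbol in pairs)


# No runs: the output is twice as long as the input.
print(rle_to_string(rle_encode("Xtmprsqzntwlfb")))  # 1X1t1m1p1r1s1q1z1n1t1w1l1f1b
# Long runs (like a mostly white page): the output is much shorter.
print(rle_to_string(rle_encode("W" * 12 + "B" + "W" * 4)))  # 12W1B4W
```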


    HUFFMAN CODING

    Assign fewer bits to symbols that occur more frequently and more bits to symbols that appear less often.

    There is no unique Huffman code (ties and the 0/1 labelling of branches can be resolved differently), but every Huffman code for the same symbol probabilities has the same average code length.

    Algorithm:

    1. Make a leaf node for each code symbol, labelled with the symbol's probability of occurrence.

    2. Take the two nodes with the smallest probabilities (leaf nodes or previously merged nodes) and connect them under a new node.

    Label one of the two branches 1 and the other 0.

    The probability of the new node is the sum of the probabilities of the two connected nodes.

    3. If only one node is left, the code construction is complete; the codeword for each symbol is the sequence of branch labels on the path from the root to its leaf. If not, go back to step 2.
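The construction steps above can be sketched with a priority queue. This is a minimal illustration, assuming a non-empty table of symbol probabilities; the function name and the example probabilities are hypothetical. Each heap entry carries the partial codewords of its subtree, so prepending a branch label at each merge yields root-to-leaf codewords.

```python
import heapq
from itertools import count


def huffman_code(probabilities: dict[str, float]) -> dict[str, str]:
    """Build a Huffman code {symbol: codeword} from non-empty symbol probabilities."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    # Step 1: one leaf per symbol, weighted by its probability.
    heap = [(p, next(tiebreak), {symbol: ""}) for symbol, p in probabilities.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a single symbol still needs a 1-bit codeword.
        return {symbol: "0" for symbol in probabilities}
    # Steps 2-3: repeatedly merge the two least probable nodes.
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        # Label one branch 0 and the other 1; prepending builds root-to-leaf codewords.
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


probs = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.1, "E": 0.1}  # hypothetical
code = huffman_code(probs)
avg_len = sum(probs[s] * len(code[s]) for s in probs)
print(code, avg_len)
```

Breaking ties differently can give a code with other codeword lengths, but the average length stays the same, as the slide notes.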


    HUFFMAN CODING

    Example


    HUFFMAN CODING

    Encoding

    Decoding
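Encoding and decoding with a finished code can be sketched as follows. The code table is a hypothetical example (it is one Huffman code for the probabilities A=0.4, B=0.2, C=0.2, D=0.1, E=0.1). Decoding works bit by bit because a Huffman code is prefix-free.

```python
# Hypothetical prefix-free code table; it is one Huffman code for the
# probabilities A=0.4, B=0.2, C=0.2, D=0.1, E=0.1.
CODE = {"A": "11", "B": "00", "C": "01", "D": "100", "E": "101"}


def huffman_encode(message: str, code: dict[str, str] = CODE) -> str:
    """Encoding: concatenate the codeword of every symbol."""
    return "".join(code[symbol] for symbol in message)


def huffman_decode(bits: str, code: dict[str, str] = CODE) -> str:
    """Decoding: read bits until they form a codeword, emit its symbol, repeat."""
    inverse = {codeword: symbol for symbol, codeword in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        # No codeword is a prefix of another, so the first match is unambiguous.
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(out)


bits = huffman_encode("ABAD")
print(bits)                  # 110011100
print(huffman_decode(bits))  # ABAD
```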


    LEMPEL ZIV ENCODING

    It is a dictionary-based encoding method.

    Dictionary coding techniques rely upon the observation that there are correlations between parts of the data (recurring patterns). The basic idea is to replace those repetitions by (shorter) references to a "dictionary" containing the original.

    The dictionary-based method may be static or dynamic, depending upon how the dictionary is created and used.

    A static dictionary is prepared before the encoded message is communicated to the receiver's end. All possible chars/words/phrases are inserted into the dictionary and indexed.

    The main drawback of the static method is that its performance depends upon the text to be encoded and is highly dependent on the organization of the chars/words/phrases in the dictionary.

    Secondly, if a word is not in the dictionary, the method fails.

    The solution to this problem is dynamic dictionary compression. In this method, the dictionary is built while the text is being encoded.

    LZ77, LZ78 and LZW use the dynamic dictionary compression technique.
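The static-dictionary idea and its failure mode can be shown in a few lines. This is a toy sketch assuming word-level coding; the dictionary contents and function names are hypothetical.

```python
# Hypothetical static dictionary, agreed on by sender and receiver in advance.
STATIC_DICTIONARY = ["the", "data", "is", "compressed", "and", "sent"]
WORD_TO_INDEX = {word: i for i, word in enumerate(STATIC_DICTIONARY)}


def static_encode(text: str) -> list[int]:
    """Replace each word with its dictionary index (raises KeyError for unknown words)."""
    return [WORD_TO_INDEX[word] for word in text.split()]


def static_decode(indices: list[int]) -> str:
    """Look each index up in the shared dictionary."""
    return " ".join(STATIC_DICTIONARY[i] for i in indices)


print(static_encode("the data is compressed and sent"))  # [0, 1, 2, 3, 4, 5]
try:
    static_encode("the data is encrypted")
except KeyError as missing:
    # The static method fails on any word that was not put in the dictionary.
    print("cannot encode, word not in dictionary:", missing)
```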


    LZ77 (LEMPEL-ZIV) COMPRESSION

    TECHNIQUE

    The dictionary used is actually a portion of the input text that has recently been encoded.

    The text that needs to be encoded is compared with the strings of symbols in the dictionary.

    The longest matched string in the dictionary is characterized by a pointer (sometimes called a token), which is represented by a triple of data items.

    Note that this triple functions as an index to the dictionary.

    In this way, a variable-length string of symbols is mapped to a fixed-length pointer.

    There is a sliding window in the LZ77 algorithm. The window consists of two parts: a search buffer and a look-ahead buffer.

    The search buffer contains the portion of the text stream that has recently been encoded (the dictionary).

    The look-ahead buffer contains the text to be encoded next.

    The window slides through the input text stream from beginning to end during the entire encoding process.
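The sliding-window search can be sketched in Python. This is a minimal, unoptimized illustration, assuming string input and small hypothetical buffer sizes (`search_size=7`, `lookahead_size=6`); real implementations use hashing instead of a linear scan. Each step emits an (offset, length, next symbol) triple.

```python
def lz77_encode(data: str, search_size: int = 7, lookahead_size: int = 6):
    """Encode data as (offset, length, next_symbol) triples using a sliding window."""
    triples = []
    pos = 0  # boundary between the search buffer (left) and look-ahead buffer (right)
    while pos < len(data):
        search_start = max(0, pos - search_size)
        # Reserve one look-ahead symbol for the third element of the triple.
        max_len = min(lookahead_size - 1, len(data) - pos - 1)
        best_offset, best_length = 0, 0
        # Scan the search buffer right to left so that ties keep the closest match.
        for candidate in range(pos - 1, search_start - 1, -1):
            length = 0
            # A match may run on into the look-ahead buffer (overlapping copy).
            while length < max_len and data[candidate + length] == data[pos + length]:
                length += 1
            if length > best_length:
                best_offset, best_length = pos - candidate, length
        triples.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1  # slide the window past the match and the extra symbol
    return triples


print(lz77_encode("abababc"))  # [(0, 0, 'a'), (0, 0, 'b'), (2, 4, 'c')]
```

Note the third triple: the match starts 2 symbols back but is 4 symbols long, so it overlaps the text being encoded.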


    LZ77 (LEMPEL-ZIV) COMPRESSION

    TECHNIQUE

    3. Once the longest match has been found, the encoder encodes it with a triple <o, l, c>, where o is the offset, l is the length of the match, and c is the codeword corresponding to the symbol in the look-ahead buffer that follows the match.

    In the diagram, the longest match is the first a of the search buffer. The offset o in this case is 2, l is 4, and the symbol in the look-ahead buffer following the match is r.

    The reason for sending the third element of the triple is to handle the situation where no match for the symbol in the look-ahead buffer can be found in the search buffer. In this case, the offset and the match length are set to 0, and the third element of the triple is the code for the symbol itself.

    The decoding process is basically a table look-up procedure and can be done by reversing the encoding procedure.

    The limitation of the approach is that if the distance between repeated patterns in the input text stream is larger than the size of the search buffer, the approach cannot exploit that structure to compress the text. The longest possible match is roughly the size of the look-ahead buffer.
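Reversing the encoding procedure can be sketched as follows, assuming (offset, length, next symbol) triples; the triples below are a hypothetical example. Copying one symbol at a time lets a match overlap the text it is producing.

```python
def lz77_decode(triples: list[tuple[int, int, str]]) -> str:
    """Rebuild text from (offset, length, next_symbol) triples."""
    out: list[str] = []
    for offset, length, symbol in triples:
        start = len(out) - offset
        # Copy one symbol at a time so a match may overlap the text it is producing.
        for k in range(length):
            out.append(out[start + k])
        out.append(symbol)  # the explicit symbol after the match (or a lone literal)
    return "".join(out)


# Hypothetical triples: two literals, then "copy 4 symbols starting 2 back" plus "c".
print(lz77_decode([(0, 0, "a"), (0, 0, "b"), (2, 4, "c")]))  # abababc
```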


    LZ78 (LEMPEL-ZIV) COMPRESSION

    TECHNIQUE

    No sliding window is used.

    The encoded text is used as a dictionary which, potentially, does not have a fixed size.

    Each time a pointer (token) is issued, the newly encoded string (the matched string plus the symbol that follows it) is added to the dictionary.

    Once a preset limit on the dictionary size has been reached, either the dictionary is frozen for future use (if the coding efficiency is good), or it is reset to empty, i.e., it must be restarted.

    Instead of the triples used in LZ77, only pairs are used in LZ78.

    Specifically, only the index of the matched string in the dictionary and the symbol following the matched string need to be encoded.
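A minimal LZ78 encoder sketch, assuming string input, index 0 for the empty prefix, and no dictionary size limit (the freeze/reset policy described above is not implemented). It is applied to the string S from the example that follows; its index numbering may differ from the one used in the figure.

```python
def lz78_encode(data: str) -> list[tuple[int, str]]:
    """Encode data as (dictionary index, next symbol) pairs."""
    dictionary = {"": 0}  # index 0 stands for the empty prefix
    pairs = []
    prefix = ""
    for symbol in data:
        candidate = prefix + symbol
        if candidate in dictionary:
            prefix = candidate  # keep extending the match
        else:
            pairs.append((dictionary[prefix], symbol))
            dictionary[candidate] = len(dictionary)  # new phrase gets the next index
            prefix = ""
    if prefix:
        # Input ended in the middle of a known phrase: emit it with no following symbol.
        pairs.append((dictionary[prefix], ""))
    return pairs


S = "001212121021012101221011"
print(lz78_encode(S))
```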


    Example: The string S = 001212121021012101221011 is to be encoded. The figure shows the encoding process.


    DECODING PROCESS
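LZ78 decoding rebuilds the same dictionary as the encoder, one phrase per pair. A minimal sketch, assuming index 0 means the empty prefix; the pairs below are what a straightforward LZ78 encoder with that numbering produces for S (the figure may number entries differently).

```python
def lz78_decode(pairs: list[tuple[int, str]]) -> str:
    """Rebuild text from (dictionary index, next symbol) pairs."""
    phrases = [""]  # index 0 = empty prefix
    out = []
    for index, symbol in pairs:
        phrase = phrases[index] + symbol  # known phrase extended by one symbol
        out.append(phrase)
        phrases.append(phrase)  # the decoder learns the same new entry as the encoder
    return "".join(out)


pairs = [(0, "0"), (1, "1"), (0, "2"), (0, "1"), (3, "1"),
         (5, "0"), (6, "1"), (7, "2"), (7, "1")]
print(lz78_decode(pairs))  # 001212121021012101221011
```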


    LZW COMPRESSION

    TECHNIQUE

    An improved version of the original LZ78 algorithm, LZW is perhaps the most famous modification and is sometimes even mistakenly referred to as the Lempel-Ziv algorithm.

    It basically applies to LZ78 the principle of not explicitly transmitting the next non-matching symbol. The only remaining output of this improved algorithm is fixed-length references to the dictionary (indexes).

    The dictionary has to be initialized with all the symbols of the input alphabet, and this initial dictionary needs to be made known to the decoder.
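A minimal LZW sketch, assuming every input symbol appears in an alphabet string shared by encoder and decoder (the alphabet and test string are hypothetical). It outputs integer indexes only; packing them into fixed-width bit fields is not shown. The decoder handles the one case where a code refers to the entry it is about to define.

```python
def lzw_encode(data: str, alphabet: str) -> list[int]:
    """LZW: emit only dictionary indexes; no explicit next symbol is sent."""
    # The dictionary starts with every symbol of the input alphabet.
    dictionary = {symbol: i for i, symbol in enumerate(alphabet)}
    prefix, codes = "", []
    for symbol in data:
        if prefix + symbol in dictionary:
            prefix += symbol  # keep extending the match
        else:
            codes.append(dictionary[prefix])  # output the longest match
            dictionary[prefix + symbol] = len(dictionary)  # learn the new string
            prefix = symbol  # the mismatched symbol starts the next match
    if prefix:
        codes.append(dictionary[prefix])
    return codes


def lzw_decode(codes: list[int], alphabet: str) -> str:
    """Rebuild the same dictionary on the fly from the shared initial alphabet."""
    if not codes:
        return ""
    dictionary = {i: symbol for i, symbol in enumerate(alphabet)}
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # The code refers to the entry being defined right now (e.g. "aaa...").
            entry = previous + previous[0]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[len(dictionary)] = previous + entry[0]
        previous = entry
    return "".join(out)


text = "TOBEORNOTTOBEORTOBEORNOT"
alphabet = "BENORT"  # assumed initial alphabet, known to both encoder and decoder
codes = lzw_encode(text, alphabet)
print(codes)
print(lzw_decode(codes, alphabet) == text)  # True
```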


    LOSSY COMPRESSION

    METHODS

    Used for compressing images and video files (our eyes cannot distinguish subtle changes, so the loss of some data is acceptable).

    These methods are cheaper and need less time and space. Several methods:

    JPEG: compress pictures and graphics

    MPEG: compress video

    MP3: compress audio


    THE JPEG STANDARD

    JPEG, named after the Joint Photographic Experts Group, is a widely used standard compression technique for still images.

    Lossy compression

    Employs a transform coding method using the DCT (Discrete Cosine Transform)

    Main Steps in JPEG Image Compression

    1. Transform RGB to YIQ or YUV and subsample color.

    2. DCT on image blocks.

    3. Quantization.

    4. Zig-zag ordering and run-length encoding.

    5. Entropy coding.
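Step 4 can be sketched as follows. This is a simplified illustration under stated assumptions: the block is an already quantized 8 × 8 list of integers, each nonzero AC coefficient becomes a hypothetical (zeros_before_it, value) pair, and a trailing run of zeros becomes "EOB". Real JPEG additionally uses size categories and a special symbol for runs of 16 zeros.

```python
def zigzag_order(n: int = 8) -> list[tuple[int, int]]:
    """(row, col) positions of an n x n block in JPEG zig-zag order."""
    def key(pos):
        i, j = pos
        s = i + j  # index of the anti-diagonal
        # Even diagonals run up and to the right (row decreasing), odd ones down and to the left.
        return (s, -i if s % 2 == 0 else i)
    return sorted(((i, j) for i in range(n) for j in range(n)), key=key)


def zigzag_run_lengths(block: list[list[int]]) -> list:
    """Zig-zag scan a quantized block, then run-length code the AC coefficients."""
    scan = [block[i][j] for i, j in zigzag_order(len(block))]
    pairs, run = [], 0
    for value in scan[1:]:  # scan[0] is the DC coefficient, coded separately
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    if run:
        pairs.append("EOB")  # end of block: everything left is zero
    return pairs


quantized = [[0] * 8 for _ in range(8)]  # hypothetical quantized block
quantized[0][0], quantized[0][1], quantized[2][0] = 50, 3, -2
print(zigzag_run_lengths(quantized))  # [(0, 3), (1, -2), 'EOB']
```

The zig-zag order visits low-frequency coefficients first, so the many zeros left by quantization cluster at the end and collapse into one end-of-block marker.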


    BLOCK DIAGRAM FOR JPEG

    ENCODER


    DCT ON IMAGE BLOCKS

    Each image is divided into 8 × 8 blocks. The 2-D DCT is applied to each block image f(i, j), with the output being the DCT coefficients F(u, v) for each block.

    By applying the Discrete Cosine Transform (DCT), the data in the spatial domain can be transformed into the frequency domain.
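A direct (unoptimized) sketch of the 2-D DCT on one 8 × 8 block, assuming the block is a list of lists of sample values; the function name and the uniform test block are hypothetical. The normalization used matches JPEG's forward DCT scaling for N = 8. For a flat block, all energy lands in the single low-frequency (DC) coefficient F(0, 0).

```python
import math

N = 8  # JPEG block size


def dct_2d(block: list[list[float]]) -> list[list[float]]:
    """2-D DCT-II of an N x N block f(i, j), returning coefficients F(u, v)."""
    def c(k: int) -> float:
        # Normalization factors; for N = 8 these match JPEG's (1/4) C(u) C(v) scaling.
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for i in range(N):
                for j in range(N):
                    total += (block[i][j]
                              * math.cos((2 * i + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * j + 1) * v * math.pi / (2 * N)))
            coeffs[u][v] = c(u) * c(v) * total
    return coeffs


flat = [[10.0] * N for _ in range(N)]  # hypothetical uniform block
F = dct_2d(flat)
print(round(F[0][0], 6))  # 80.0: all energy lands in the DC coefficient
```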


    QUANTIZATION IN JPEG

    $$\hat{F}(u, v) = \mathrm{round}\left(\frac{F(u, v)}{Q(u, v)}\right)$$

    where F(u, v) represents a DCT coefficient, Q(u, v) is a quantization matrix entry, and $\hat{F}(u, v)$ represents the quantized DCT coefficient, which JPEG uses in the subsequent entropy coding.

    The quantization step is the main source of loss in JPEG compression.
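The quantization formula and its inverse can be sketched element by element. The 2 × 2 coefficient and quantization values below are hypothetical, chosen only to show where the loss appears: after dequantization, -13 comes back as -11 and 2 comes back as 0.

```python
def quantize(F: list[list[float]], Q: list[list[int]]) -> list[list[int]]:
    """F_hat(u, v) = round(F(u, v) / Q(u, v)) for every coefficient."""
    # Python's round() sends exact halves to the even integer; this example avoids ties.
    return [[round(f / q) for f, q in zip(f_row, q_row)] for f_row, q_row in zip(F, Q)]


def dequantize(F_hat: list[list[int]], Q: list[list[int]]) -> list[list[int]]:
    """Decoder side: multiply back by Q(u, v); the rounding error is not recovered."""
    return [[fq * q for fq, q in zip(fq_row, q_row)] for fq_row, q_row in zip(F_hat, Q)]


# Hypothetical 2x2 corner of a block of DCT coefficients and of a quantization matrix.
F = [[80.0, -13.0], [7.0, 2.0]]
Q = [[16, 11], [12, 12]]
F_hat = quantize(F, Q)
print(F_hat)                 # [[5, -1], [1, 0]]
print(dequantize(F_hat, Q))  # [[80, -11], [12, 0]]
```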


    MPEG

    ENCODING

    Used to compress video.

    Basic idea: each video is a rapid sequence of frames. Each frame is a spatial combination of pixels, or a picture.

    Compressing video =

    spatially compressing each frame

    +

    temporally compressing a set of frames.


    MPEG

    ENCODING

    Spatial compression: each frame is spatially compressed using JPEG-style (DCT-based) coding.

    Temporal compression: redundancy between frames is removed.

    For example, in a static scene in which someone is talking, most frames are the same except for the segment around the speaker's lips, which changes from one frame to the next.
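The idea behind temporal compression can be shown with a toy sketch that stores only the pixels that changed since the previous frame. This is an illustrative assumption, not MPEG's actual method: real MPEG uses block-based motion compensation and DCT-coded residuals. Frames are hypothetical flattened lists of pixel values.

```python
def frame_difference(previous: list[int], current: list[int]) -> dict[int, int]:
    """Keep only the pixels whose value changed since the previous frame."""
    return {i: c for i, (p, c) in enumerate(zip(previous, current)) if p != c}


def apply_difference(previous: list[int], diff: dict[int, int]) -> list[int]:
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in diff.items():
        frame[i] = value
    return frame


# Hypothetical 4x4 grayscale frames (flattened): only two "lip" pixels change.
frame1 = [50] * 16
frame2 = list(frame1)
frame2[9], frame2[10] = 80, 75
diff = frame_difference(frame1, frame2)
print(diff)  # {9: 80, 10: 75}
```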


    THREE TYPES OF FRAMES

    Intra frames (I-frames, same as JPEG): self-contained frames

    Predictive frames (P-frames): encoded from the previous I or P reference frame

    Bi-directional frames (B-frames): encoded from the previous and future I or P frames

    Fig: example sequence of I, P and B frames
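The dependency rules for the three frame types can be sketched by listing, for each frame of a sequence in display order, which frames it is predicted from. The sequence string and function name are hypothetical, and the sketch assumes every P frame has an earlier I or P frame and every B frame has one on each side.

```python
def reference_frames(frame_types: str) -> list[tuple[int, ...]]:
    """For each frame (in display order) list the indices of the frames it is predicted from."""
    anchors = [i for i, t in enumerate(frame_types) if t in "IP"]  # I and P frames can be references
    refs = []
    for i, t in enumerate(frame_types):
        if t == "I":
            refs.append(())  # self-contained
        elif t == "P":
            previous = [a for a in anchors if a < i]
            refs.append((previous[-1],))  # nearest earlier I or P frame
        elif t == "B":
            previous = [a for a in anchors if a < i]
            following = [a for a in anchors if a > i]
            refs.append((previous[-1], following[0]))  # nearest earlier and later I/P frames
        else:
            raise ValueError(f"unknown frame type {t!r}")
    return refs


print(reference_frames("IBBPBBI"))
```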


    3-D SCALABLE MEDICAL IMAGE

    COMPRESSION WITH OPTIMIZED VOLUME

    OF INTEREST CODING

    The method employs the 3-D integer wavelet transform and EBCOT (Embedded Block Coding with Optimized Truncation) to create the bit stream.

    Provides random access.

    Provides resolution and quality scalability.

    High reconstruction quality.

    Optimized VOI (volume of interest) coding.
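Integer wavelet transforms map integers to integers and are exactly invertible, which makes lossless reconstruction possible. As an illustration only (the slide's method uses a 3-D integer wavelet transform whose exact filters are not given here), this sketch shows one level of the 1-D integer Haar (S) transform via lifting; a 3-D transform would typically apply such 1-D steps along each of the three axes. Names and sample values are hypothetical.

```python
def integer_haar_forward(x: list[int]) -> tuple[list[int], list[int]]:
    """One level of the 1-D integer Haar (S) transform via lifting.

    Returns (low-pass averages, high-pass differences); x must have even length.
    """
    low, high = [], []
    for a, b in zip(x[0::2], x[1::2]):
        diff = b - a              # predict step: detail coefficient
        low.append(a + diff // 2)  # update step: integer (floored) average
        high.append(diff)
    return low, high


def integer_haar_inverse(low: list[int], high: list[int]) -> list[int]:
    """Undo the lifting steps in reverse order; reconstruction is exact."""
    x = []
    for s, d in zip(low, high):
        a = s - d // 2
        x.extend([a, a + d])
    return x


samples = [5, 8, 8, 5, -3, 4, 100, 100]  # hypothetical voxel intensities along one axis
low, high = integer_haar_forward(samples)
print(low, high)  # [6, 6, 0, 100] [3, -3, 7, 0]
print(integer_haar_inverse(low, high) == samples)  # True
```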


    3-D SCALABLE MEDICAL IMAGE

    COMPRESSION WITH OPTIMIZED VOLUME

    OF INTEREST CODING

    Block Diagram


    LOW BIT-RATE EFFICIENT COMPRESSION FOR SEISMIC DATA

    Based on a combination of five techniques:

    1. Fast 2-D wavelet transform.

    2. Rearrangement of wavelet coefficients for efficient processing.

    3. Zerotree coding of correlated coefficients.

    4. Gradual successive approximation of the wavelet coefficients.

    5. Lossless entropy coding of the quantized coefficients using either adaptive arithmetic coding, which is slower but gives a better compression rate, or adaptive run-length coding, which is faster but compresses less well than arithmetic coding.
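Technique 4 can be illustrated with a generic successive-approximation sketch. This is an assumption-labelled illustration, not the paper's exact scheme: the threshold starts at the largest power of two not exceeding the biggest coefficient magnitude and halves on every pass, and each pass moves a coefficient's approximation by ±threshold when its residual is significant. After each pass, every error is smaller than that pass's threshold. The coefficient values are hypothetical.

```python
def successive_approximation(coeffs: list[float], passes: int) -> list[list[float]]:
    """Refine an approximation of coeffs, halving the threshold on every pass."""
    largest = max(abs(c) for c in coeffs)
    if largest == 0:
        return [[0.0] * len(coeffs) for _ in range(passes)]
    threshold = 1.0
    while threshold * 2 <= largest:  # largest power of two <= the biggest magnitude
        threshold *= 2
    while threshold > largest:  # handle magnitudes below 1
        threshold /= 2
    approx = [0.0] * len(coeffs)
    history = []
    for _ in range(passes):
        for i, c in enumerate(coeffs):
            residual = c - approx[i]
            if abs(residual) >= threshold:  # significant at this threshold
                approx[i] += threshold if residual > 0 else -threshold
        history.append(list(approx))
        threshold /= 2  # the next pass refines with half the step
    return history


coeffs = [37.0, -12.5, 5.0, 0.3]  # hypothetical wavelet coefficients
history = successive_approximation(coeffs, 4)
for approx in history:
    print(approx)
```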


    LOW BIT-RATE EFFICIENT COMPRESSION FOR SEISMIC DATA

    Fig: Compression Algorithm (blocks: wavelet coefficients, set threshold, classify pass, coding pass, arithmetic coder, output; a loop labelled "The quantization loop")


    THANK YOU