04 32 bit loss less comp.doc


www.1000projects.com
www.fullinterview.com
www.chetanasprojects.com

    CHAPTER 1


    Introduction to VLSI

    1.1 Historical perspective:

The electronics industry has achieved phenomenal growth over the last two decades, mainly due to rapid advances in integration technologies and large-scale systems design - in short, due to the advent of VLSI. The number of applications of integrated circuits in high-performance computing, telecommunications, and consumer electronics has been rising steadily, and at a very fast pace. Typically, the required computational power (in other words, the intelligence) of these applications is the driving force for the fast development of this field. The current leading-edge technologies (such as low bit-rate video and cellular communications) already provide end-users with a certain amount of processing power and portability. This trend is expected to continue, with very important implications for VLSI and systems design.

One of the most important characteristics of information services is their increasing need for very high processing power and bandwidth (in order to handle real-time video, for example). The other important characteristic is that information services tend to become more and more personalized (as opposed to collective services such as broadcasting), which means that the devices must be more intelligent to answer individual demands, and at the same time must be portable to allow more flexibility and mobility. As more and more complex functions are required in various data processing and telecommunications devices, the need to integrate these functions in a small system/package is also increasing. The level of integration, as measured by the number of logic gates in a monolithic chip, has been steadily rising for almost three decades, mainly due to rapid progress in processing and interconnect technology.


1.2 Advantages of ICs:

The most important message here is that the logic complexity per chip has been (and still is) increasing exponentially. The monolithic integration of a large number of functions on a single chip usually provides:

- Less area/volume and therefore compactness
- Less power consumption
- Fewer testing requirements at the system level
- Higher reliability, mainly due to improved on-chip interconnects
- Higher speed, due to significantly reduced interconnection length
- Significant cost savings

1.3 Levels of ICs:

Digital circuits are constructed with integrated circuits. An integrated circuit (IC) is a small silicon semiconductor crystal, called a chip, containing the electronic components for the digital gates. The various gates are interconnected inside the chip to form the circuit.


Digital ICs are categorized according to their circuit complexity, as measured by the number of logic gates in a single package:

- Small Scale Integration (SSI)
- Medium Scale Integration (MSI)
- Large Scale Integration (LSI)
- Very Large Scale Integration (VLSI)

1.4 Classification of ICs by device count:


Nomenclature | Active Device Count | Functions                               | Technology
SSI          | 1-100               | Gates, op-amps, many linear applications | Bipolar
MSI          | 100-1,000           | Registers, filters, etc.                 | Bipolar (TTL, ECL)
LSI          | 1,000-10,000        | Microprocessors                          | MOS: NMOS, PMOS
VLSI         | 100,000-1,000,000   | Memories, computers, signal processors   | CMOS

Very Large Scale Integration:

A VLSI chip is a microelectronic chip with millions of logical components (billions of physical components) integrated (embedded) on a single IC. The feature size (physical dimension) of a component placed on a VLSI chip is measured in microns. A CMOS IC fabricated with Very Deep Sub-Micron (VDSM) technology has a feature size of 0.09 micron or less.

1.5 VLSI Design Flow:

1. Design Specification:

The first step in the high-level design flow is the design specification process, which involves specifying the behavior expected of the final design. The specification describes the expected function and behavior of the design using textual descriptions and graphic elements.

    2. Behavioral Description:


A behavioral description is created to analyze the functionality and algorithm of the design; the description is then framed, and its performance and compliance to standards are verified.

VLSI Design Flow:

Design Specification
-> Behavioral Description
-> RTL Description (VHDL)
-> Functional Verification & Testing
-> Logic Synthesis
-> Gate-Level Netlist
-> Logic Verification & Testing
-> Floor Planning, Automatic Place & Route
-> Physical Layout


3. RTL Description (VHDL):

Once the algorithm is scrutinized, the code is written keeping in mind the functionality and its ability to be synthesized. The RTL description can be written at the gate, dataflow, or behavioral level. A standard VHDL simulator can be used to read the RTL description and verify the correctness of the design.
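As a small illustration (entity and signal names are hypothetical, not taken from this project), a dataflow-style RTL description of a 2-to-1 multiplexer might look like:

```vhdl
-- Hypothetical dataflow-style RTL sketch of a 2-to-1 multiplexer
library ieee;
use ieee.std_logic_1164.all;

entity mux2 is
  port (
    a, b : in  std_logic;  -- data inputs
    sel  : in  std_logic;  -- select line
    y    : out std_logic   -- output
  );
end entity mux2;

architecture rtl of mux2 is
begin
  -- concurrent conditional signal assignment (dataflow style)
  y <= a when sel = '0' else b;
end architecture rtl;
```

The same function could equally be written at the behavioral level with a process, or structurally out of gate instances; all three forms are legal RTL inputs to a simulator.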

4. Functional Verification & Testing:

The VHDL simulator reads the VHDL description, compiles it into an internal format, and then executes the compiled format using test vectors. If compilation reports any syntax errors, they have to be removed and the design recompiled. After analyzing the results of the simulation, stimulus for the design has to be added; this may be a file of input stimulus or a file of output stimulus created with a waveform editor, and the resulting output waveforms are observed to test the functionality of the design.
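A minimal testbench sketch showing the kind of stimulus-and-observe loop described above (the instantiated entity `mux2` and its ports are hypothetical, assumed from the earlier illustration, not from this project):

```vhdl
-- Hypothetical testbench sketch: apply vectors, wait, check outputs
library ieee;
use ieee.std_logic_1164.all;

entity tb is
end entity tb;

architecture sim of tb is
  signal a, b, sel, y : std_logic;
begin
  -- instantiate the design under test (entity name is an assumption)
  dut : entity work.mux2 port map (a => a, b => b, sel => sel, y => y);

  -- apply test vectors and check the observed output
  stimulus : process
  begin
    a <= '1'; b <= '0'; sel <= '0';
    wait for 10 ns;
    assert y = '1' report "mux selected wrong input" severity error;
    sel <= '1';
    wait for 10 ns;
    assert y = '0' report "mux selected wrong input" severity error;
    wait;  -- suspend forever: end of simulation
  end process stimulus;
end architecture sim;
```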

5. Logic Synthesis:

Once the code is validated, VHDL synthesis tools are used to implement the design. The goal of the VHDL synthesis step is to create a design that implements the required functionality and meets the constraints provided. The logic synthesis tool converts the given RTL code into an optimized gate-level netlist.


6. Gate Level:

A gate-level netlist is a description of the design (circuit) in terms of gates and the connections between them. The gate-level netlist is the input to the automatic place-and-route tool.

7. Logic Verification & Testing:

The VHDL synthesis tool reports syntax and synthesis errors as well as warnings, including any mismatches it finds between the RTL simulation results and the output netlist simulation results. If the design is error-free, the next step is to map it to the target device.

8. Floor Planning, Automatic Placing and Routing:

Place-and-route tools take the design netlist and implement the design on the target technology device.

9. Physical Layout:

In this step, each component or primitive from the netlist is placed on the target device according to the design architecture. The signals from one module to another are then connected to form the physical layout.

1.6 INTRODUCTION TO VHDL

1.6.1 What is VHDL?

VHDL stands for VHSIC Hardware Description Language, where VHSIC stands for Very High Speed Integrated Circuit. As the name implies, VHDL is a language for describing the behavior of digital hardware: it is another way of describing what outputs of a digital circuit are desired when it is given certain inputs. The critical difference between VHDL and other design descriptions is that VHDL can be readily interpreted by software, enabling the computer to accomplish much of your design work for you.

As the size and complexity of digital systems increase, more computer-aided design tools are introduced into the hardware design process. The early paper-and-pencil design methods have given way to sophisticated design entry, verification, and automatic hardware generation tools. The newest


addition to this design methodology is the hardware description language (HDL). Although the concept of HDLs is not new, their widespread use in digital system design is no more than a decade old. Based on HDLs, new digital system CAD tools have been developed and are now being utilized by hardware designers.

1.6.2 VHDL History:

In 1980 the US government launched the Very High Speed Integrated Circuit (VHSIC) project to enhance the electronic design process, technology, and procurement, spawning the development of many advanced integrated circuit (IC) process technologies. This was followed by the arrival of the VHSIC Hardware Description Language (VHDL).

1.6.3 Why Use VHDL?

There are many reasons why it makes good design sense to use VHDL:

1. Portability:

Technology changes so quickly in the digital industry that designs built from discrete digital devices require constant rework to remain current. VHDL is designed to be device-independent: if you describe your circuit in VHDL, as opposed to designing it with discrete devices, changing hardware becomes a (relatively) trivial process.

2. Flexibility:

Most working engineers can recall a situation where they felt frustrated with a customer, supervisor, or team member because the design specification they were working with was constantly changing. Sometimes these changes can't be helped. Design work is usually focused on creating small, easily maintainable components and then integrating them into a larger device. On larger projects, different teams of engineers each design separate parts of the project at the same time, which can mean that if one component changes, all of the components must change, even those being worked on by other teams. Suppose you were told to design a simple counter that set an output bit after it had counted to 100, and the software engineer on the project then discovered that the entire design could be radically simplified if your counter


could count down from 300 instead of up to 100. If you had implemented your design in discrete circuits, you would have to start over from scratch; if you had designed it in VHDL, all you would have to do is change your code.

1.6.4 VHDL Features:

General features:

VHDL can be used for design documentation, high-level design, simulation, synthesis, and testing of hardware, and as a driver for a physical design tool.

1. Concurrency:

In VHDL, transfer statements, descriptions of components, and instantiations of gates or logical units can all be executed such that, in the end, they appear to have been executed simultaneously.
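A short sketch of this concurrency (the half-adder entity and its signal names are illustrative, not part of this project): the two assignments below are concurrent statements, so their textual order does not matter.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity half_adder is
  port (a, b : in std_logic; sum, carry : out std_logic);
end entity half_adder;

architecture concurrent_demo of half_adder is
begin
  -- Both assignments are concurrent statements: each re-evaluates
  -- whenever a signal on its right-hand side changes, so their
  -- textual order relative to each other is irrelevant.
  sum   <= a xor b;
  carry <= a and b;
end architecture concurrent_demo;
```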

2. Support for design hierarchy:

In VHDL, the operation of a system can be specified based on its functionality, or it can be specified structurally in terms of its smaller subcomponents.

3. Library support:

User- and system-defined primitives and descriptions reside in the library system. VHDL provides a mechanism for accessing various libraries, and different designers can access these libraries.

4. Sequential statements:

VHDL provides mechanisms for executing sequential statements, which offer an easy method for modeling hardware components based on their functionality. Sequential or procedural capability is only for convenience; the overall structure of the VHDL language remains highly concurrent.
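For example (a hypothetical D flip-flop, not from this project), the statements inside a process execute sequentially, while the process as a whole still runs concurrently with the rest of the design:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff is
  port (clk, d : in std_logic; q : out std_logic);
end entity dff;

architecture behav of dff is
begin
  -- Inside the process, statements execute in order (sequentially);
  -- the process itself is a single concurrent statement.
  process (clk)
  begin
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;
end architecture behav;
```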

    5. Type declaration and usage:

    www.1000projects.com

    www.fullinterview.comwww.chetanasprojects.com

    http://www.1000projects.com/http://www.fullinterview.com/http://www.1000projects.com/http://www.fullinterview.com/http://www.1000projects.com/http://www.fullinterview.com/http://www.1000projects.com/http://www.fullinterview.com/
  • 7/27/2019 04 32 bit loss less comp.doc

    11/111

    www.1000projects.com

    www.fullinterview.comwww.chetanasprojects.com

VHDL is not limited to just bit or boolean types; it also supports integer, floating-point, enumerated, and user-defined types. In addition, VHDL allows array-type declarations and composite-type definitions.
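A sketch of such declarations gathered in a package (all names here are illustrative assumptions):

```vhdl
-- Hypothetical package showing enumerated, array, and composite types
package example_types is
  -- enumerated type, e.g. for a state machine
  type state_t is (idle, load, compress, flush);

  -- constrained array type (a small register file)
  type word_array_t is array (0 to 7) of bit_vector(31 downto 0);

  -- composite (record) type
  type pixel_t is record
    red, green, blue : integer range 0 to 255;
  end record;
end package example_types;
```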

6. Use of subprograms:

VHDL allows the use of functions and procedures, which can be used in type conversions, logic unit definitions, operator redefinitions, new operation definitions, and other applications.

7. Timing control:

VHDL allows the designer to schedule values to signals and delay the actual assignment of values until a later time. It also allows the use of any number of explicitly defined clock signals, and it provides features for edge detection, delay specification, setup and hold time specification, pulse width checking, and setting various time constraints.

8. Structural specification:

VHDL allows the designer to describe a generic 1-bit design and use it when describing multibit regular structures in one or more dimensions.
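One common way to do this is a generate statement that replicates a 1-bit component across N bits. In the sketch below, `full_adder_1bit` is an assumed, previously described 1-bit component; all names are hypothetical:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch: replicating a 1-bit design across N bits with a generate
-- statement (full_adder_1bit is an assumed 1-bit component).
entity ripple_adder is
  generic (N : positive := 8);
  port (
    a, b : in  std_logic_vector(N-1 downto 0);
    cin  : in  std_logic;
    sum  : out std_logic_vector(N-1 downto 0);
    cout : out std_logic
  );
end entity ripple_adder;

architecture structural of ripple_adder is
  signal carry : std_logic_vector(N downto 0);
begin
  carry(0) <= cin;
  -- instantiate one 1-bit full adder per bit position
  gen_bits : for i in 0 to N-1 generate
    bit_i : entity work.full_adder_1bit
      port map (a => a(i), b => b(i), cin => carry(i),
                sum => sum(i), cout => carry(i+1));
  end generate gen_bits;
  cout <= carry(N);
end architecture structural;
```

Changing the generic N resizes the whole structure without touching the 1-bit description.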

    1.7 Advantages of VHDL:

    VHDL offers the following advantages for digital design:

1. Standard:

VHDL is an IEEE standard. Like any standard (such as the X Window graphics standard, bus communication interface standards, or high-level programming languages), it reduces confusion and makes interfaces between tools, companies, and products easier. Any development that follows the standard has a better chance of lasting longer and less chance of becoming obsolete due to incompatibility with others.

2. Government support:

VHDL is a result of the VHSIC program; hence, it is clear that the US government supports the VHDL standard for electronic procurement. The Department of Defense (DoD) requires contractors to supply VHDL for all Application Specific Integrated Circuit (ASIC) designs.


3. Industry support:

With the advent of more powerful and efficient VHDL tools has come the growing support of the electronics industry. Companies use VHDL tools not only for defense contracts but also for their commercial designs.

4. Portability:

The same VHDL code can be simulated and used in many design tools and at different stages of the design process. This reduces dependency on a single set of design tools whose limited capability may not be competitive in later markets. The VHDL standard also makes transferring design data much easier than the design database of a proprietary design tool does.

5. Modeling capability:

VHDL was developed to model all levels of design, from electronic boxes to transistors. VHDL can accommodate behavioral constructs and mathematical routines that describe complex models, such as queuing networks and analog circuits. It allows multiple architectures to be associated with the same design during various stages of the design process. VHDL can describe everything from low-level transistors up to very large systems.

6. Reusability:

Certain common designs can be described, verified, and modified slightly in VHDL for future use. This eliminates reading and marking changes on schematic pages, which is time consuming and subject to error. For example, parameterized multiplier VHDL code can easily be reused by changing the width parameters, so that the same VHDL code can do either 16-by-16 or 12-by-8 multiplication.
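A sketch of the kind of parameterized multiplier described above (entity and generic names are hypothetical): changing the generics reuses the same code for 16-by-16, 12-by-8, or any other width.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical parameterized multiplier: widths set by generics
entity mult is
  generic (WIDTH_A : positive := 16;
           WIDTH_B : positive := 16);
  port (
    a : in  unsigned(WIDTH_A-1 downto 0);
    b : in  unsigned(WIDTH_B-1 downto 0);
    p : out unsigned(WIDTH_A+WIDTH_B-1 downto 0)
  );
end entity mult;

architecture rtl of mult is
begin
  -- numeric_std "*" yields a product of width WIDTH_A + WIDTH_B
  p <= a * b;
end architecture rtl;
```

Instantiating it with `generic map (WIDTH_A => 12, WIDTH_B => 8)` gives the 12-by-8 variant from the same source.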

7. Technology and foundry independence:

The functionality and behavior of the design can be described with VHDL and verified, making it foundry- and technology-independent. This frees the designer to proceed without having to wait for the foundry and technology to be selected.

    8. Documentation:


The design and its documentation can be kept in a single place by embedding the documentation in the code. Combining comments with the code that actually dictates what the design should do reduces the ambiguity between specification and implementation.

9. New design methodology:

Using VHDL and synthesis creates a new methodology that increases design productivity, shortens the design cycle, and lowers costs. It amounts to a revolution comparable to that introduced by the automatic semi-custom layout synthesis tools of the last few years. Synthesis, in the domain of digital design, is a process of translation and optimization. For example, layout synthesis is a process of taking a design netlist and translating it into a form of data that facilitates placement and routing, optimizing timing and/or chip size. Logic synthesis, on the other hand, is the process of taking a form of input (VHDL), translating it into another form (Boolean equations, specific to the synthesis tool), and then optimizing in terms of propagation delay and/or area. After the VHDL code is translated into an internal form, the optimization process can be performed based on constraints such as speed, area, and power.


CHAPTER 2


    INTRODUCTION TO LOSSLESS COMPRESSION

    2.1. Objective

With the increase in silicon densities, it is becoming feasible for multiple compression systems to be implemented in parallel on a single chip. A 32-bit system with distributed memory architecture is based on having multiple data compression and decompression engines working independently on different data at the same time, with the data stored in memory distributed to each processor. The objective of the project is to design a lossless parallel data compression system that operates at high speed to achieve a high compression rate. By using a parallel architecture of compressors, the data compression rates are significantly improved, and the inherent scalability of the parallel architecture becomes possible. The main parts of the system are the two XMatchPro-based data compressors in parallel and the control blocks providing control signals for the data compressors, allowing appropriate control of the routing of data into and from the system. Each data compressor can process four bytes of data into and from a block of data every clock cycle. The data entering the system needs to be clocked in at a rate of 4n bytes every clock cycle, where n is the number of compressors in the system. This ensures that adequate data is present for all compressors to process


    rather than being in an idle state.

2.2. Goal of the Thesis

To achieve higher compression rates using a 32-bit compression/decompression architecture with the least increase in latency.

2.3. LITERATURE SURVEY

    2.3.1. Compression Techniques

At present there is an insatiable demand for ever-greater bandwidth in communication networks and for ever-greater storage capacity in computer systems. This has led to the need for efficient compression techniques. Compression is the process that is required either to reduce the volume of information to be transmitted (text, fax, and images) or to reduce the bandwidth that is required for its transmission (speech, audio, and video). The compression technique is first applied to the source information prior to its transmission. Compression algorithms can be classified into two types, namely:

- Lossless Compression
- Lossy Compression

2.3.1.1. Lossless Compression

In a lossless compression algorithm, the aim is to reduce the amount of source information to be transmitted in such a way that, when the compressed information is decompressed, there is no loss of information. Lossless compression is therefore said to be reversible; that is, data is not altered or lost in the process of compression or decompression, and decompression generates an exact replica of the original object. The various lossless compression techniques are:

- Packbits encoding
- CCITT Group 3 1D
- CCITT Group 3 2D
- Lempel-Ziv-Welch (LZW)
- Huffman


- Arithmetic

Example applications of lossless compression are transferring data over a network as a text file (since, in such applications, it is normally imperative that no part of the source information is lost during either the compression or decompression operations), file storage systems (tapes, hard disk drives, solid-state storage, file servers), and communication networks (LAN, WAN, wireless).

2.3.1.2. Lossy Compression

The aim of lossy compression algorithms is normally not to reproduce an exact copy of the source information after decompression, but rather a version of it that is perceived by the recipient as a true copy.

The lossy compression algorithms are:

- JPEG (Joint Photographic Experts Group)
- MPEG (Moving Picture Experts Group)
- CCITT H.261 (Px64)

Example applications of lossy compression are the transfer of digitized images and of audio and video streams. In such cases, the sensitivity of the human eye or ear is such that any fine details that may be missing from the original source signal after decompression are not detectable.

2.3.1.3. Text Compression

There are three different types of text (unformatted, formatted, and hypertext), and all are represented as strings of characters selected from a defined set. The compression algorithm associated with text must be lossless, since the loss of just a single character could modify the meaning of a complete string. Text compression is restricted to the use of entropy encoding and, in practice, statistical encoding methods. There are two types of statistical encoding methods which are used with text: one which uses single characters as the basis for deriving an optimum set of code words, and the


other which uses variable-length strings of characters. Two examples of the former are the Huffman and arithmetic coding algorithms; an example of the latter is the Lempel-Ziv (LZ) algorithm.

The majority of work on hardware approaches to lossless parallel data compression has used an adapted form of the dictionary-based Lempel-Ziv algorithm, in which a large number of simple processing elements are arranged in a systolic array [1], [2], [3], [4].

2.3.2. Previous Work on Lossless Compression Methods

A second Lempel-Ziv method used a content addressable memory (CAM) capable of performing a complete dictionary search in one clock cycle [5], [6], [7]. The search for the most common string in the dictionary (normally the most computationally expensive operation in the Lempel-Ziv algorithm) can be performed by the CAM in a single clock cycle, while the systolic array method uses a much slower deep-pipelining technique to implement its dictionary search. However, compared to the CAM solution, the systolic array method has advantages in terms of reduced hardware costs and lower power consumption, which may be more important criteria in some situations than faster dictionary searching. In [8], the authors show that hardware main-memory data compression is both feasible and worthwhile; they also describe the design and implementation of a novel compression method, the XMatchPro algorithm, and exhibit the substantial impact such memory compression has on overall system performance. The adaptation of compression code for parallel implementation is investigated by Jiang and Jones [9], who recommended the use of a processing array arranged in a tree-like structure. Although compression can be implemented in this manner, implementing the decompressor's search and decode stages in parallel hardware would greatly increase the complexity of the design, and it is likely that these aspects would need to be implemented sequentially. An FPGA implementation of a parallel binary arithmetic coding architecture that is able to process 8 bits per clock cycle, compared to the standard 1 bit per cycle, is described by Stefo et al. [10]. Although little research has been performed on architectures involving several independent compression units working in a concurrent, cooperative manner, IBM has introduced the MXT


    chip [11], which has four independent compression engines operating on a shared memory area.

    The four Lempel-Ziv compression engines are used to provide data throughput sufficient for

    memory compression in computer servers. Adaptation of software compression algorithms to make use

of multiple CPU systems was demonstrated in the research of Penhorn [12] and of Simpson and Sabharwal

    [13]. Penhorn used two CPUs to compress data using a technique based on the Lempel-Ziv

    algorithm and showed that useful compression rate improvements can be achieved, but only at

the cost of increasing the learning time for the dictionary. Simpson and Sabharwal described the

software implementation of a compression system for a multiprocessor system based on the

parallel architecture developed by Gonzalez-Smith and Storer [14].

    2.3.2.1. Statistical Methods

Statistical modeling in a lossless data compression system is based on assigning values to

events according to their probability: the more probable an event, the higher its value. The accuracy

    with which this frequency distribution reflects reality determines the efficiency of the model. In

    Markov modeling, predictions are done based on the symbols that precede the current symbol.

Statistical methods in hardware are restricted to simple higher-order modeling using binary

alphabets, which limits speed, or to simple multisymbol alphabets using zeroth-order models, which

limit compression. Binary alphabets limit speed because only a few bits (typically a single bit)

    are processed in each cycle while zeroth order models limit compression because they can only

    provide an inexact representation of the statistical properties of the data source.
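To make the zeroth-order limitation concrete, the sketch below computes the ideal coded size of a block under a zeroth-order (context-free) model. The function and data are illustrative, not part of any cited design: a context-free model charges a full bit per symbol for alternating data even though every symbol is perfectly predictable from the one before it, which is exactly the compression loss described above.

```python
from collections import Counter
from math import log2

def zeroth_order_cost(data: bytes) -> float:
    """Ideal coded size in bits under a zeroth-order model, which uses
    per-symbol frequencies only and ignores all preceding context."""
    counts = Counter(data)
    total = len(data)
    return sum(-n * log2(n / total) for n in counts.values())

# A zeroth-order model sees only p(a) = p(b) = 0.5, i.e. 1 bit/symbol,
# even though each symbol is perfectly predictable from the previous one.
print(zeroth_order_cost(b"abababababab"))  # 12.0
```

A first-order (Markov) model of the same data would assign each symbol a conditional probability near 1 and code it in almost zero bits.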

    2.3.2.2. Dictionary Methods

    Dictionary Methods try to replace a symbol or group of symbols by a dictionary location code.

Some dictionary-based techniques use simple uniform binary codes to process the information

    supplied. Both software and hardware based dictionary models achieve good throughput and

competitive compression.

The UNIX utility compress uses the Lempel-Ziv-2 (LZ2) algorithm, and the data

compression Lempel-Ziv (DCLZ) family of compressors, initially invented by Hewlett-Packard [16]

and currently being developed by AHA [17], [18], also use LZ2 derivatives. Bunton


    and Borriello present another LZ2 implementation in [19] that improves on the Data

    Compression Lempel-Ziv method. It uses a tag attached to each dictionary location to identify which

    node should be eliminated once the dictionary becomes full.
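As a minimal illustration of the dictionary principle behind these LZ2-style compressors (a toy LZ78-flavoured sketch, not the compress, DCLZ, or [19] implementations), the coder below replaces repeated phrases with dictionary location codes and simply stops growing the dictionary once it is full:

```python
def lz78_compress(data: str, max_entries: int = 4096):
    """Toy LZ78-style coder: emit (dictionary index, next character) pairs.
    Real LZ2/DCLZ hardware differs in coding details and dictionary upkeep."""
    dictionary = {"": 0}                  # phrase -> dictionary location
    out, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch                  # keep extending the current match
        else:
            out.append((dictionary[phrase], ch))
            if len(dictionary) < max_entries:
                dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:
        out.append((dictionary[phrase], ""))  # flush a trailing match
    return out

print(lz78_compress("abababa"))  # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
```

The tag-based eviction scheme of Bunton and Borriello addresses precisely the question this sketch dodges: which entry to discard once `max_entries` is reached.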

    2.4. XMatchPro Based System

    The Lossless data compression system is a derivative of the XMatchPro Algorithm which

    originates from previous research of the authors [15] and advances in FPGA technology. The

    flexibility provided by using this technology is of great interest since the chip can be adapted to the

    requirements of a particular application easily. The drawbacks of some of the previous methods are

overcome by using the XMatchPro algorithm in the design. The objective is then to obtain better

    compression ratios and still maintain a high throughput so that the compression/decompression

    processes do not slow the original system down.


CHAPTER 3


    FUNCTIONS OF LOSSLESS COMPRESSION

    3.1. BASICS OF COMMUNICATION

    A sender can compress data before transmitting it and a receiver can decompress the data after

    receiving it, thus effectively increasing the data rate of the communication channel. Lossless data

    compression is the process of encoding a body of data into a smaller body of data that can at a

    later time be uniquely decoded back to the original data.

    Lossless compression removes redundant information from the data while they are being

    transmitted or before they are stored in memory, and lossless decompression reintroduces the

    redundant information to recover fully the original data. In the same way, the data is

compressed before it is stored and decompressed when it is retrieved, thus increasing the effective capacity of the storage device.
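The round trip described above can be demonstrated with any lossless codec; in the sketch below, Python's zlib stands in for the hardware compressor, and the assertion checks the defining property that decompression recovers the original data exactly:

```python
import zlib

# Redundant data compresses well; decompression restores it bit for bit,
# so the channel or memory effectively carries more user data.
original = b"the quick brown fox " * 100
compressed = zlib.compress(original)

assert zlib.decompress(compressed) == original   # lossless: exact recovery
print(len(original), "->", len(compressed))      # 2000 -> far fewer bytes
```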

    3.2. Proposed Method


In [1], the author discusses a parallel algorithm that can be implemented for

high-speed data compression. The author gives the basic idea of how data compression is

carried out using the Lempel-Ziv algorithm and how it could be altered to introduce parallelism into the

algorithm. The author describes the Lempel-Ziv algorithm as a very efficient universal data

    compression technique, based upon an incremental parsing technique, which maintains codebooks

    of parsed phrases at the transmitter and at the receiver. An important feature of the algorithm is

that it is not necessary to determine a model of the source that generates the data. According to the

    author, in an attempt to increase the speed of the algorithm on general-purpose processors, the

    algorithm has been parallelised to run on two processors.

    3.3. Background

The author explains a novel architecture for a high-performance lossless data compressor

that is organized around a selectively shiftable content addressable memory which permits full

matching; the processor offers very high performance with good compression of computer-based

    data. The author also gives details about the operation, architecture and performance of the

    Data Compression Techniques. He also introduces the XMatchPro lossless data compressor. In [3],

the authors discuss parallelism in data compression techniques and

explain a parallel architecture for high-speed data compression. In this paper, the authors

present data compression as an essential component of high-speed data communication and

storage. In [4], the authors discuss the various methods of data compression, their

techniques and drawbacks, and propose a new methodology for high-speed parallel lossless

data compression. The authors describe the research and hardware implementation of a high-performance

parallel multi-compressor chip able to meet the intensive data processing

demands of highly concurrent systems. The authors also investigate the performance of

alternative input and output routing strategies; results for realistic data sets demonstrate that the design

of parallel compression devices involves important trade-offs that affect compression performance,

latency and throughput. The compression ratio achieved by the proposed universal code uniformly

approaches the lower bounds on the compression ratios attainable by block-to-variable codes and

variable-to-block codes designed to match a completely specified source.


    3.4. Usage of XMatchPro Algorithm

    The Lossless Parallel Data Compression system designed uses the XMatchPro Algorithm.

    The XMatchPro algorithm uses a fixed-width dictionary of previously seen data and attempts to

    match the current data element with a match in the dictionary. It works by taking a 4-byte word and

    trying to match or partially match this word with past data. This past data is stored in a dictionary,

    which is constructed from a content addressable memory. As each entry is 4 bytes wide, several

types of matches are possible. If none of the bytes match any data present in the dictionary,

the tuple is transmitted with an additional miss bit. If the bytes are matched, then the match location

and match type are coded and transmitted, and the matched entry is then moved to the front of the dictionary.

    The dictionary is maintained using a move to front strategy whereby a new tuple is placed at the front

    of the dictionary while the rest move down one position. When the dictionary becomes full the

    tuple placed in the last position is discarded leaving space for a new one.

    The coding function for a match is required to code several fields as follows: A zero followed

    by:

1). Match location: the binary code associated with the matching location.

2). Match type: indicates which bytes of the incoming tuple have matched.

3). Literals: the characters that did not match, transmitted in literal form.

    A description of the XMatchPro algorithm in pseudo-code is given in the figure below.

clear the dictionary;
set the next free location (NFL) to 0;
DO
{
    read in a tuple T from the data stream;
    search the dictionary for tuple T;
    IF (full or partial hit)
    {
        determine the best match location ML and match type MT;
        output 0;
        output the codes for ML and MT;
        output any required literal characters of T;
    }
    ELSE
    {
        output 1;
        output tuple T;
    }
    IF (full hit)
    {
        move dictionary entries 0 to ML-1 down by one location;
    }
    ELSE
    {
        move all dictionary entries down by one location;
        increment NFL (if dictionary is not full);
    }
    copy tuple T to dictionary location 0;
}
WHILE (more data is to be compressed);

Fig.3.2. Pseudo Code for XMatchPro Algorithm
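The pseudo-code above can be sketched in software as follows. This is an illustrative Python model, not the hardware implementation: the output is a list of symbolic codes rather than a packed bit-stream, the two-byte partial-match threshold follows Section 4.2, and the "best match" here simply maximises matched bytes as a stand-in for the shortest-output-bits criterion.

```python
def xmatchpro_compress(tuples, dict_size=16):
    """Software model of the pseudo-code: symbolic codes, not a packed
    bit-stream. A hit needs at least two matching bytes (a partial match)."""
    dictionary = []                       # location 0 is the dictionary front
    out = []
    for t in tuples:                      # each t is a 4-byte tuple, e.g. b"ABCD"
        best_loc, best_bytes = None, 0
        for loc, entry in enumerate(dictionary):
            same = sum(a == b for a, b in zip(t, entry))
            if same >= 2 and same > best_bytes:
                best_loc, best_bytes = loc, same
        if best_loc is not None:          # full or partial hit
            match_type = tuple(a == b for a, b in zip(t, dictionary[best_loc]))
            literals = bytes(a for a, m in zip(t, match_type) if not m)
            out.append((0, best_loc, match_type, literals))
            if best_bytes == 4:           # full hit: rotate matched entry to front
                dictionary.insert(0, dictionary.pop(best_loc))
                continue
        else:                             # miss: 1 followed by the literal tuple
            out.append((1, bytes(t)))
        dictionary.insert(0, bytes(t))    # move-to-front insertion
        if len(dictionary) > dict_size:
            dictionary.pop()              # discard the last entry when full
    return out

codes = xmatchpro_compress([b"ABCD", b"ABCD", b"ABXY"])
print(codes)
```

Running the example yields a miss for the first tuple, a full match at location 0 for the second, and a partial match (two literal bytes `XY`) for the third.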

    With the increase in silicon densities, it is becoming feasible for multiple XMatchPros to

    be implemented in parallel onto a single chip. A parallel system with distributed memory

    architecture is based on having multiple data compression and decompression engines working

    independently on different data at the same time. This data is stored in memory distributed to each

    processor. There are several approaches in which data can be routed to and from the compressors that

    will affect the speed, compression and complexity of the system. Lossless compression removes

    redundant information from the data while they are transmitted or before they are stored in memory.

    Lossless decompression reintroduces the redundant information to recover fully the original data.

    There are two important contributions made by the current parallel compression &

    decompression work, namely, improved compression rates and the inherent scalability. Significant


    improvements in data compression rates have been achieved by sharing the computational

    requirement between compressors without significantly compromising the contribution made by

    individual compressors. The scalability feature permits future bandwidth or storage demands to be

    met by adding additional compression engines.

    3.4.1. The XMatchPro based Compression system

    Previous research on the lossless XMatchPro data compressor has been on optimising

    and implementing the XMatchPro algorithm for speed, complexity and compression in hardware.

    The XMatchPro algorithm uses a fixed width dictionary of previously seen data and attempts to

    match the current data element with a match in the dictionary. It works by taking a 4-byte word and

    trying to match this word with past data. This past data is stored in a dictionary, which is constructed

    from a content addressable memory.

Initially, all the entries in the dictionary are empty, and 4 bytes are added to the front of the

dictionary, while the rest move one position down, if a full match has not occurred. The larger the

dictionary, the greater the number of address bits needed to identify each memory location, reducing

compression performance. Since the number of bits needed to code each location address is a

function of the current dictionary size, greater compression is obtained in comparison to the case where

a fixed-size dictionary uses fixed address codes for a partially full dictionary.

    In the parallel XMatchPro system, the data stream to be compressed enters the

    compression system, which is then partitioned and routed to the compressors. For parallel

    compression systems, it is important to ensure all compressors are supplied with sufficient data by

    managing the supply so that neither stall conditions nor data overflow occurs.

    3.4.2. The Main Component- Content Addressable Memory

    Dictionary based schemes copy repetitive or redundant data into a lookup table (such as

    CAM) and output the dictionary address as a code to replace the data. The compression architecture is

    based around a block of CAM to realize the dictionary. This is necessary since the search

operation must be done in parallel in all the entries in the dictionary to allow high and data-independent

throughput.

    Fig.3.3. Conceptual view of CAM

    The number of bits in a CAM word is usually large, with existing implementations

ranging from 36 to 144 bits. A typical CAM employs a table size ranging from a few

hundred entries to 32K entries, corresponding to an address space ranging from 7 to 15 bits. The

    length of the CAM varies with three possible values of 16, 32 or 64 tuples trading complexity for

    compression.

The number of tuples present in the dictionary has an important effect on compression. In principle,

    the larger the dictionary the higher the probability of having a match and improving

compression. On the other hand, a bigger dictionary uses more bits to code its locations, degrading

compression when processing small data blocks that only use a fraction of the dictionary length

available. The width of the CAM is fixed at 4 bytes/word. Content addressable memory (CAM)

    compares input search data against a table of stored data, and returns the address of the matching

    data. CAMs have a single clock cycle throughput making them faster than other hardware and

    software-based search systems.

The input to the system is the search word, which is broadcast onto the search lines to the table

of stored data. Each stored word has a matchline that indicates whether the search word and the

stored word are identical (the match case) or different (a mismatch case, or miss). The matchlines

are fed to an encoder that generates a binary match location corresponding to the matchline that is

in the match state. An encoder is used in systems where only a single match is expected. The overall

function of a CAM is to take a search word and return the matching memory location.
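This behaviour can be modelled in a few lines. The sketch below is a hypothetical software model, not a hardware description: a real CAM compares all stored words in the same clock cycle, whereas the loop here only emulates the priority encoder that picks the lowest matching address.

```python
def cam_search(cam_words, search_word):
    """Behavioural model of one CAM lookup: every stored word is compared
    against the search word at once; a priority encoder then returns the
    lowest matching address, or None on a miss."""
    matchlines = [word == search_word for word in cam_words]  # parallel compare
    for address, hit in enumerate(matchlines):                # priority encode
        if hit:
            return address
    return None

cam = [b"\x00\x00\x00\x00", b"ABCD", b"EFGH", b"ABCD"]
print(cam_search(cam, b"EFGH"))  # 2
print(cam_search(cam, b"ZZZZ"))  # None
```

Note that the duplicate `b"ABCD"` at address 3 can never win: the lower address always has priority, which is the property the next section relies on for dictionary initialization.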

    3.4.2.1. Managing Dictionary entries

    Since the initialization of a compression CAM sets all words to zero, a possible input

word formed by zeros will generate multiple full matches in different locations. The XMatchPro

compression system simply selects the full match closest to the top. This operational mode initializes

    the dictionary to a state where all the words with location address bigger than zero are declared

    invalid without the need for extra logic. The reason is that location x can never generate a match until

    the data contents of location x-1 are different from zero because locations closer to the top have

higher priority when generating matches. Also, to increase dictionary efficiency, only one dictionary

position contains repeated information and, in the best case, all the dictionary positions contain

different data.


    CHAPTER 4


    XMATCHPRO LOSSLESS COMPRESSION SYSTEM

4.1. DESIGN METHODOLOGY

The XMatchPro algorithm is efficient at compressing the small blocks of data necessary

    with cache and page based memory hierarchies found in computer systems. It is suitable for high

    performance hardware implementation. The XMatchPro hardware achieves a throughput 2-3

times greater than other high-performance hardware implementations. The core component of the

    system is the XMatchPro based Compression / Decompression system. The XMatchPro is a high-

    speed lossless dictionary based data compressor. The XMatchPro algorithm works by taking an

incoming four-byte tuple of data and attempting to fully or partially match the tuple with the

    past data.

    4.2. FUNCTIONAL DESCRIPTION

    The XMatchPro algorithm maintains a dictionary of data previously seen and attempts to

    match the current data element with an entry in the dictionary, replacing it with a shorter code

    referencing the match location. Data elements that do not produce a match are transmitted in full

    (literally) prefixed by a single bit. Each data element is exactly 4 bytes in width and is referred to

as a tuple. This feature gives a guaranteed input data rate during compression and thus also guaranteed

    data rates during decompression, irrespective of the data mix. Also the 4-byte tuple size gives an

    inherently higher throughput than other algorithms, which tend to operate on a byte stream.


The dictionary is maintained using a move-to-front strategy, whereby the current tuple is

    placed at the front of the dictionary and the other tuples move down by one location as

    necessary to make space. The move to front strategy aims to exploit locality in the input data. If the

    dictionary becomes full, the tuple occupying the last location is simply discarded.

    A full match occurs when all characters in the incoming tuple fully match a Dictionary

entry. A partial match occurs when at least two of the characters in the incoming tuple

    match exactly with a dictionary entry, with the characters that do not match being transmitted

    literally.

    The use of partial matching improves the compression ratio when compared with

allowing only 4-byte matches, but still maintains high throughput. If neither a full nor

    partial match occurs, then a miss is registered and a single miss bit of 1 is transmitted followed by the

    tuple itself in literal form. The only exception to this is the first tuple in any compression operation,

    which will always generate a miss as the dictionary begins in an empty state. In this case no miss bit is

    required to prefix the tuple.

    At the beginning of each compression operation, the dictionary size is reset to zero. The

    dictionary then grows by one location for each incoming tuple being placed at the front of the

    dictionary and all other entries in the dictionary moving down by one location. A full match

    does not grow the dictionary, but the move-to-front rule is still applied. This growth of the

    dictionary means that code words are short during the early stages of compressing a block. Because the

    XMatchPro algorithm allows partial matches, a decision must be made about which of the

    locations provides the best overall match, with the selection criteria being the shortest possible

    number of output bits.

4.3. Parallel XMatchPro Compression

The input router of the system divides the data to be processed, and the output router concatenates

the results to form the output of the parallel compression system. The data split by the input

router are sent to each of the XMatchPro compression engines, where the

data is compressed and then sent to the output router, which merges the compressed data and sends it out as

the compressed output.


    For multiple compression systems, it is important to ensure all compressors are supplied

    with sufficient data by managing the supply so that neither stall conditions nor data overflow occurs.

    There are several approaches in which data can be routed in and out of the compressors. The

basic method for input routing used in this project is to take an input twice the width of a single

XMatchPro compressor: the lower 32 bits are given to Compressor 0 and the higher 32 bits are

given to Compressor 1. The same method is used for output routing, with additional output

pins assigned for both Compressor 0 and Compressor 1.

    4.4. DATA FLOW FOR PARALLEL XMATCHPRO COMPRESSOR

The figure below shows graphically the general concept of this approach. The data

stream to be compressed enters the compression system, which is then partitioned and routed to

    the compressors. Appropriate methods for routing the data are discussed below, but to achieve

    good compression performance, it is important that the partitioning mechanism supplies the

compressors with sufficient data to keep them active for as great a proportion as possible of the time that

the stream is entering the system.

    As the compressors operate independently, each producing its own compressed data

    stream, a mechanism is required to merge these streams in such a way that subsequent

    decompression can be performed correctly. Also, subsequent decompression needs to be capable of

    operating in an appropriate parallel fashion, otherwise, a disparity in compression and decompression


    speeds will occur, reducing overall throughput.

The data flow for the parallel compression system is given in Figure 3 below.

4.5. INPUT ROUTING

As per the algorithm, XMatchPro can process four bytes of data per clock cycle; to ensure

    that all are busy, data must enter the system at a rate of 4n bytes per clock cycle, where n is the

number of compressors in the system. This can be achieved by two methods:

    1. Interleaved input method

    2. Blocked Input method

4.5.1. INTERLEAVED INPUT METHOD

In the interleaved input approach, the router divides the input data into 4-byte-wide

data streams that are fed into the compressors. This is illustrated in the figure below for two

    compressors, but the technique can be extended to supply data to any required number of

    compressors.


[Figure: the input router (IR) deals incoming tuples alternately to two XMatchPro compressors, sending tuples 1, 3, 5, 7 to one and tuples 2, 4, 6, 8 to the other.]

Fig.4.3. Interleaved Input Routing

    The interleaved method avoids the need for input buffering as data are continuously fed

    to the compressors and acted upon immediately on arrival. This minimization of latency is an

    important advantage of the approach.
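The interleaved routing rule can be sketched in software as follows. This is an illustrative model only (the function name and byte-oriented framing are assumptions); it deals successive 4-byte tuples round-robin to n compressors, matching the tuple numbering of Fig.4.3:

```python
def interleave_route(stream: bytes, n_compressors: int = 2):
    """Split the input into 4-byte tuples and deal them round-robin to
    the compressors, as in the interleaved input method."""
    lanes = [bytearray() for _ in range(n_compressors)]
    tuples = [stream[i:i + 4] for i in range(0, len(stream), 4)]
    for k, t in enumerate(tuples):
        lanes[k % n_compressors] += t     # tuple k goes to compressor k mod n
    return [bytes(lane) for lane in lanes]

lanes = interleave_route(bytes(range(32)), 2)
print(lanes[0].hex())  # tuples 1, 3, 5, 7 of the original stream
```

Because each lane receives a tuple every n cycles, no input buffering is required, which is the latency advantage noted above.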

    4.5.2. BLOCKED INPUT METHOD

    In the blocked input approach, a fixed length block of data is sent from the incoming data

    stream to each of the compressors in turn, as shown in the following figure. In this scheme, the

    data has to arrive at the dedicated memory of the compressor at a rate slower than it can be processed,

    thereby allowing the memory to be filled with data.


    To minimize the latency introduced in blocked mode, compressors need to start

    processing data as it arrives. It is also important to ensure that sufficient data are available for the

    compressor to work on while data are being routed to the other compressors, as no new data can be

    added to the dedicated memory until this process has been completed.

    4.5.3. PROPOSED INPUT ROUTING

In this project, the blocked input routing method is used for inputting data to the compression

system, as it is more advantageous than the interleaved input approach. The advantage of

this method is that the complexity in designing and coding is reduced, which helps in achieving a

superior compression ratio. At the same time, the number of input pins increases, as another

set of pins is assigned for the second XMatchPro compressor. The input data size for one

XMatchPro compressor is 32 bits, so another 32 bits are required for the second XMatchPro

compressor. To achieve this, the parallel compressor's input is defined as 64 bits: the lower-order

32 bits are sent to one XMatchPro compressor and the higher-order 32 bits are sent to the second

XMatchPro compressor. Both XMatchPro compressors are thus supplied with data simultaneously,

which increases the speed of compression.
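The 64-bit split described above amounts to a simple bit-level operation, sketched here in illustrative Python (not the hardware description itself):

```python
def split_input(word64: int):
    """Route a 64-bit input word to two compressors: the low 32 bits to
    Compressor 0 and the high 32 bits to Compressor 1."""
    low32 = word64 & 0xFFFFFFFF           # routed to XMatchPro compressor 0
    high32 = (word64 >> 32) & 0xFFFFFFFF  # routed to XMatchPro compressor 1
    return low32, high32

low, high = split_input(0x1122334455667788)
print(hex(low), hex(high))  # 0x55667788 0x11223344
```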

    4.6. OUTPUT ROUTING

    The lengths of the compressed data output blocks from an array of parallel compressors

will generally not be constant due to the variability of redundancy in the data. Since, in

decompression, the system would not know the data boundaries of each block, these data cannot be sent

directly to the output bus, and additional manipulation is needed in order to guarantee that the original

    data can be recovered.

This is achieved by three methods, namely:

    1. Single Compressed Block

    2. Multiple Compressed Block

    3. Interleaved Compressed Block

    4.6.1. SINGLE COMPRESSED BLOCK

    In this method, it is assumed that the data enters the system using the blocked mode

technique and that the compressed data are collected in the compressors' output buffers. The

    buffer outputs are routed in strict order of the compressor number and a boundary tag that


    contains information on the block length is added so as to precede the data. As the tag will enter the

    decompression system, first, it will know the length of the compressed data input belonging to any

    given decompression engine. The introduction of tags is detrimental to the compression ratio, but this

    effect diminishes as the block length is increased, as the overhead of one tag per block of compressed

    data is largely constant.

    One of the drawbacks of this approach is that the data output may contain idle time.

    This arises since a whole block of data needs to be compressed before the appropriate tag

    values can be determined and, so, a compressor may still be compressing its data when router becomes

    available.
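The tag-and-recover scheme can be sketched in a few lines of Python (a behavioral illustration only; the 2-byte length tag assumed here stands in for the hardware tag format):

```python
def tag_block(compressed: bytes) -> bytes:
    """Prefix a compressed block with a fixed-size length tag so the
    decompression system can find the block boundary."""
    return len(compressed).to_bytes(2, "big") + compressed

def untag_stream(stream: bytes) -> list[bytes]:
    """Recover the per-compressor blocks from a tagged output stream."""
    blocks, i = [], 0
    while i < len(stream):
        n = int.from_bytes(stream[i:i + 2], "big")   # read the boundary tag
        blocks.append(stream[i + 2:i + 2 + n])       # then n bytes of data
        i += 2 + n
    return blocks
```

The fixed per-block tag is the overhead referred to above: it is constant per block, so its relative cost shrinks as the block length grows.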

    4.6.2. MULTIPLE COMPRESSED BLOCK

Figure 2.7 illustrates the format of an output data stream containing multiple blocks. This is similar to the single-block scheme, but instead of sending each compressor's block as soon as that compressor finishes, all compressors must finish compressing their blocks before the data are sent. In this technique, the tag provides information on the length of the compressed data to enable correct decompression. As all compressors need to have completed their operations before an output can be produced, this approach has a greater latency than the single compressed block case, but, as fewer tags are needed, the effect on the compression ratio is reduced. The combined tag is shorter than the sum of the individual tags because the output bus granularity is of fixed width. Output tags are sized in accordance with the output bus width in order to simplify the routing architecture and decoding operations, even though fewer bits would suffice to determine the block size boundaries.


    4.6.3. INTERLEAVED COMPRESSED BLOCK

The figure illustrates the interleaved approach for routing multiple compressed blocks of data to the output stream. Instead of waiting for a whole block to be compressed, a predefined fixed length of compressed data is always sent to the output. If a compressor has not yet completed its operations, the system must wait until that block of data has been produced.

    There are two benefits of this approach compared with the previously discussed methods.

    First, there is a reduction in latency since data can be sent to the output before the whole block is

    compressed. Second, since no boundary tags are required, the compression ratio is improved.

At the end of a compression sequence, the interleaved approach needs to add dummy data to the output stream. On receipt of the stop signal, output routing continues until all compressors have completed operations on their input blocks. The final interleaved block from each compressor is likely to contain insufficient data to fill the required fixed output length, so dummy data are added as required in order to maintain the interleave length.
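The round-robin interleaving with dummy padding can be modeled as follows (a Python sketch; the 4-byte interleave length and zero-byte padding are assumptions for illustration, not values from the hardware design):

```python
def interleave(outputs: list[bytes], chunk: int = 4) -> bytes:
    """Round-robin fixed-length chunks from each compressor's output,
    padding the final short chunk with dummy bytes so the interleave
    length is always maintained."""
    padded = []
    for out in outputs:
        if len(out) % chunk:                       # final chunk is short:
            out += b"\x00" * (chunk - len(out) % chunk)  # add dummy data
        padded.append(out)
    rounds = max(len(p) // chunk for p in padded)
    stream = b""
    for r in range(rounds):
        for p in padded:
            piece = p[r * chunk:(r + 1) * chunk]
            # a compressor that has already run out of data still
            # contributes a dummy chunk to keep the interleave regular
            stream += piece if piece else b"\x00" * chunk
    return stream
```

Because the chunks are emitted in a fixed rotation, no boundary tags are needed to separate the compressors' data, which is why this method has no detrimental effect on the compression ratio.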


    4.6.4. PROPOSED OUTPUT ROUTING

In this project, the interleaved technique was selected as the output routing method, as it imparts no overhead to maintain compressed data boundaries and so has no detrimental effect on the compression ratio. A further advantage is that the complexity in design and coding is reduced. At the same time, the number of output pins increases, as another set of pins is assigned to the second XMatchPro compressor. The output of one 32-bit compressor is either 7 bits (when a match is found) or 33 bits (when no match is found), so another set of 7-bit and 33-bit outputs is required for the second compressor. Accordingly, the parallel compressor is designed with two 7-bit as well as two 33-bit output ports. Both compressors thus deliver their outputs simultaneously, and the output data are transmitted via the external bus.

    4.7. IMPLEMENTATION OF XMATCHPRO BASED COMPRESSOR

The block diagram gives the details of the components of a single 32-bit compressor. There are three components, namely the COMPARATOR, the ARRAY, and the CAM COMPARATOR. The comparator compares two 32-bit data words and sets its output bit to 1 for equal and 0 for unequal. The CAM comparator searches the CAM dictionary entries for a full match of the given input data. The reason for choosing a full match is to obtain a prototype of the high-throughput XMatchPro compressor with reduced complexity and high performance.

If a full match occurs, the match-hit signal is generated and the corresponding match location is given as output by the CAM comparator. If no full match occurs, the data word presented at the input at that time is passed to the output.
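The match/miss behavior just described can be modeled in a few lines (a Python sketch of the simplified full-match policy, not the VHDL implementation; the class name is illustrative, and replacement of dictionary entries once the CAM is full is ignored here):

```python
class XMatchModel:
    """Behavioral model of the full-match compressor step: search the
    64-entry dictionary; on a hit emit (1, location), on a miss store
    the word in the next free location and emit (0, word)."""
    def __init__(self, size: int = 64):
        self.cam: list[int] = []   # dictionary contents, in fill order
        self.size = size

    def compress_word(self, word: int):
        if word in self.cam:                     # CAM full match
            return (1, self.cam.index(word))     # match hit + location
        if len(self.cam) < self.size:            # store in next free slot
            self.cam.append(word)
        return (0, word)                         # miss: emit literal word
```

A hit therefore costs 1 + 6 = 7 bits (match bit plus a 6-bit location for 64 entries), while a miss costs 1 + 32 = 33 bits, which matches the output widths stated for the 32-bit compressor.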

The array consists of 64 locations of 32 bits each (64×32). It stores unmatched incoming data; when a new data word arrives, it is compared with all the data stored in this array. If a match occurs, the corresponding match location is sent as output; otherwise the incoming data word is stored in the next free location of the array and is itself sent as output. The last component, the CAM comparator, outputs the match location of the CAM dictionary when a match has occurred, using the match information it receives from the comparator.

If the output of the comparator goes high for any input, a match has been found, and the corresponding address is retrieved and sent as output along with one bit indicating the match. If no match occurs, the incoming data word is stored in the array and sent as the output. These are the functions of the three components of the compressor. The hardware descriptions of these modules are written in VHDL. VHDL stands for VHSIC (Very High Speed Integrated Circuit) Hardware Description Language. It can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level.

    The VHDL language can be regarded as an integrated amalgamation of the following

    languages:

    o Sequential language

    o Concurrent language

    o Net-list language

    o Timing specifications

    o Waveform generation language.

The language therefore has constructs for expressing the concurrent or sequential behavior of a digital system, with or without timing. It also allows a system to be modeled as an interconnection of components, and test waveforms can be generated using the same constructs. The language defines not only the syntax but also very clear simulation semantics for each language construct, so models written in VHDL can be verified using a VHDL simulator. VHDL is event-driven, to allow efficient simulation of hardware: computations are performed only when some data has changed (an event has occurred).


    CHAPTER 5


    DESIGN OF PARALLEL LOSSLESS

    COMPRESSION/DECOMPRESSION SYSTEM

    5.1. DESIGN OF COMPRESSOR / DECOMPRESSOR

The block diagram [Fig.12] gives the details of the components of a single 32-bit compressor/decompressor. The same design approach is used for the 64-bit compression/decompression system, which is used essentially for comparison of the compression rates achieved by the 64-bit lossless parallel high-speed data compression system. There are three components, namely the COMPRESSOR, the DECOMPRESSOR, and the CONTROL. The compressor has the following components: COMPARATOR, ARRAY, and CAM COMPARATOR. The comparator compares two 32-bit data words and sets its output bit to 1 for equal and 0 for unequal.

The array consists of 64 locations of 32 bits each (64×32). It stores unmatched incoming data; when the next new data word arrives, it is compared with all the data stored in this array. If the incoming data word matches any of the stored entries, the comparator generates a match signal and sends it to the CAM comparator. The last component, the CAM comparator, feeds the incoming data word and the stored array entries, one by one, to the comparator. If the output of the comparator goes high for any input, a match has been found, and the corresponding address (match location) is retrieved and sent as output along with one bit indicating the match. If no match is found, the incoming data word is stored in the array and sent as output. These are the functions of the three components of the XMatchPro-based compressor.

The decompressor has the following components: an array and a processing unit. The array has the same function and length as the array used in the compressor. The processing unit checks the incoming match-hit bit: if it is 0, the data word is not present in the array, so the unit stores it in the array; if it is 1, the data word is present in the array, so the unit fetches it from the array using the address input and sends it to the data output.
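A corresponding software sketch of the processing unit (Python, illustrative only; the class name is an assumption and array-full handling is omitted):

```python
class DeXMatchModel:
    """Behavioral model of the decompressor's processing unit: a match
    hit of 1 reads the word back from the array at the given location;
    a hit of 0 stores the literal word and passes it through."""
    def __init__(self):
        self.array: list[int] = []   # mirrors the compressor's dictionary

    def decompress_word(self, hit: int, value: int) -> int:
        if hit:                      # value is the match location
            return self.array[value]
        self.array.append(value)     # value is the literal data word
        return value
```

Because the decompressor fills its array in exactly the same order as the compressor filled its dictionary, the match locations received in the compressed stream always point at the correct word.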


    Fig.5.1. Block Diagram of 32 bit Compressor/Decompressor


The control unit has an input bit called C/D (Compression/Decompression), which indicates whether compression or decompression is to be performed: when its value is 0 compression is started, and when its value is 1 decompression is done.

    5.2. DESIGN OF 64 BIT SINGLE COMPRESSION/DECOMPRESSION SYSTEM

The 64-bit single compression/decompression system is implemented to compare its compression rate and area with those of the parallel compression/decompression system, which gives higher throughput. The design and functionality of the 64-bit single compression system are the same as those of the 32-bit compressor/decompressor discussed above, except that the input is widened from 32 to 64 bits and, to accommodate more data in the CAM dictionary, the array size is increased from 64×32 to 128×64. The match location is now given by 7 bits for the fixed 128 locations of the memory (2^7 = 128).

    In the Compression system, the comparator compares the incoming 64 bit data with data

    entries that are previously stored in the memory. If any of the dictionary entries matches with the

    incoming data, then a match signal is generated to provide the corresponding match location as

    output along with match signal. If no match occurs, then the incoming data is stored in the dictionary

    entry and the data is given as output of the compressor.

The decompression system accordingly receives either the 64-bit data word (if no match occurred) or a 1-bit match signal and a 7-bit match location, which is processed by the 128×64 array in the decompressor to give the data held at the match location as output. The block diagram of the 64-bit compression/decompression system is given below.


    Fig.5.2. Block Diagram of 64 bit Compression / Decompression system

    5.3. PARALLEL COMPRESSION / DECOMPRESSION SYSTEM


    5.3.1 DESIGN OF PARALLEL COMPRESSION SYSTEM

The block diagram gives the details of the components of the parallel compression system. Here the compressor is instantiated twice, once for each of the two parallel engines, so the numbers of input and output pins are twice those of the single compressor. The components of each instantiated compressor are the same as those of the 32-bit compressor, namely the COMPARATOR, the ARRAY, and the CAM COMPARATOR.

The comparator compares two 32-bit data words and sets its output bit to 1 for equal and 0 for unequal. The array consists of 64 locations of 32 bits each. It stores unmatched incoming data; when a new data word arrives, it is compared with all the data stored in this array for a match. If no match occurs, the incoming data word is stored in the next free location of the array. The last component, the CAM comparator, feeds the incoming data word and the stored array entries, one by one, to the comparator.


If the output of the comparator goes high for any input, a match has been found, and the corresponding address is retrieved and sent as output along with one bit indicating the match. If no match is found, the incoming data word is stored in the array and sent as output. These are the functions of the three components of the 32-bit compressor.

    5.3.2 DESIGN OF PARALLEL DECOMPRESSION SYSTEM

The parallel decompression system is likewise implemented by concatenating the outputs of the two compressors in the parallel architecture and giving those data as input to the parallel decompression system, which comprises two of the 32-bit decompressors discussed above for the single compression system. The 32-bit decompressor has two components: an array and a processing unit. The array has the same function and length as the array used in the compressor. The processing unit checks the incoming match-hit bit: if it is 0, the data word is not present in the array, so the unit stores it in the array; if it is 1, the data word is present in the array, so the unit fetches it from the array using the address input (match location) and sends it to the data output.
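Putting the pieces together, the parallel path (split the 64-bit word, compress and decompress each half with its own independent engine, recombine) can be sketched as a software round trip (Python model under the same simplifying assumptions as before: full-match only, no dictionary-full handling, illustrative names):

```python
MASK32 = (1 << 32) - 1

def make_compressor():
    """One 32-bit engine: returns a compress step over its own dictionary."""
    cam: list[int] = []
    def compress(w):
        if w in cam:
            return (1, cam.index(w))   # hit: match bit + location
        cam.append(w)
        return (0, w)                  # miss: match bit + literal word
    return compress

def make_decompressor():
    """One 32-bit decompressor: rebuilds the same dictionary on the fly."""
    arr: list[int] = []
    def decompress(hit, v):
        if hit:
            return arr[v]
        arr.append(v)
        return v
    return decompress

def parallel_roundtrip(words64):
    """Compress and decompress a stream of 64-bit words using two
    independent 32-bit engines on the low and high halves."""
    c_lo, c_hi = make_compressor(), make_compressor()
    d_lo, d_hi = make_decompressor(), make_decompressor()
    out = []
    for w in words64:
        lo, hi = w & MASK32, (w >> 32) & MASK32
        rlo, rhi = d_lo(*c_lo(lo)), d_hi(*c_hi(hi))
        out.append((rhi << 32) | rlo)
    return out
```

Each half-stream is self-contained: the low-word dictionary and the high-word dictionary never interact, which is what allows the two engines to run simultaneously in hardware.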

    5.4. SIMULATION RESULTS

The design, coded in VHDL, is simulated using ModelSim from Mentor Graphics. The obtained waveforms are as follows.

    Fig.5.4.Comparator


    Fig.5.5. Cam Comparator


    Fig.5.6.Content Addressable Memory


    Fig.5.7. 32-bit Single Compression Top Module

    Fig.5.8. 32-bit Single Compression Top Module Decimal inputs


    Fig.5.9. 64-bit Single Compression System -Top module

    Fig.5.10. 64-bit Single Compression System -Test bench Waveform


    Fig.5.11. 32-bit Single Decompression Top Module

    Fig.5.12. 32-bit Single Decompression- Test bench Waveform


    Fig.5.13. Parallel Compression System - 64-bit input Top module

    Fig.5.14. Parallel Compression System - 64-bit input Test bench


    5.5. RTL SCHEMATIC

The RTL schematics for the VHDL code are generated using Xilinx Project Navigator 8.1i.

    Fig.5.15. 32 bit Single Compression System

    Fig.5.16. 32 bit Single Compression System


    Fig.5.17. 64 bit Single Compression System

    Fig.5.18. RTL Schematic for 64 bit Single Compression System


    Fig5.19. 64 bit Parallel Compression System

    Fig.5.20. RTL Schematic for 64 bit Parallel Compression System

    5.6. Xilinx Synthesis Results for Target Device xc2v1500bg575-6


    5.6.1. 32-bit Single Compression System

===============================================================
* Synthesis Options Summary *
===============================================================
---- Source Parameters
Input File Name                  : "xmatchpro.prj"
Input Format                     : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name                 : "xmatchpro"
Output Format                    : NGC
Target Device                    : xc2v1500-6-bg575
===============================================================
* HDL Compilation *
===============================================================
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/comparator.vhd" in Library work.
Architecture arch_comp of Entity comparator is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/camcomp.vhd" in Library work.
Architecture arch_cam64 of Entity camcomp is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/cam.vhd" in Library work.
Architecture arch_cam of Entity cam is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/xmatchpro.vhd" in Library work.
Architecture arch_xmatch of Entity xmatchpro is up to date.

Table 5.1. 32-bit Single Compression System - HDL Synthesis Report

Macro Statistics                    No.
# ROMs                              64
  4x1-bit ROM                       64
# Adders/Subtractors                1
  32-bit adder                      1
# Registers                         68
  1-bit register                    1
  32-bit register                   66
  6-bit register                    1
# Latches                           2
  1-bit latch                       1
  6-bit latch                       1
# Comparators                       64
  32-bit comparator equal           64


5.6.2. 64-bit Single Compression System
===============================================================
* Synthesis Options Summary *
===============================================================
---- Source Parameters
Input File Name                  : "xmatchpro.prj"
Input Format                     : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name                 : "xmatchpro"
Output Format                    : NGC
Target Device                    : xc2v1500-6-bg575
===============================================================
* HDL Compilation *
===============================================================
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/comparator.vhd" in Library work.
Architecture arch_comp of Entity comparator is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/camcomp.vhd" in Library work.
Architecture arch_cam64 of Entity camcomp is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/cam.vhd" in Library work.
Architecture arch_cam of Entity cam is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/xmatchpro.vhd" in Library work.
Architecture arch_xmatchpro of Entity xmatchpro is up to date.

    Table 5.2. 64-bit Single Compression System - HDL Synthesis Report

Macro Statistics                    Nos.
# ROMs                              128
  4x1-bit ROM                       128
# Adders/Subtractors                1
  32-bit adder                      1
# Registers                         132
  1-bit register                    1
  32-bit register                   1
  64-bit register                   129
  7-bit register                    1
# Latches                           2
  1-bit latch                       1
  7-bit latch                       1
# Comparators                       128
  64-bit comparator equal           128


5.6.4. 64-bit Parallel Decompression System
===============================================================
* Synthesis Options Summary *
===============================================================
---- Source Parameters
Input File Name                  : "LL_decomp.prj"
Input Format                     : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name                 : "LL_decomp"
Output Format                    : NGC
Target Device                    : xc2v1500-6-bg575
===============================================================
* HDL Compilation *
===============================================================
Compiling vhdl file "E:/proj/xilinx/dual_decomp/dual_decomp/de_xmatchpro.vhd" in Library work.
Architecture arch_de_camcomparator of Entity de_xmatchpro is up to date.
Compiling vhdl file "E:/proj/xilinx/dual_decomp/dual_decomp/LL_decomp.vhd" in Library work.
Architecture arch_dualdecomp of Entity ll_decomp is up to date.

    Table 5.4. 64-bit Parallel Decompression System - HDL Synthesis Report

Macro Statistics                    Nos.
# Adders/Subtractors                2
  32-bit adder                      2
# Latches                           130
  32-bit latch                      130
# Multiplexers                      2
  32-bit 64-to-1 multiplexer        2


CHAPTER 6

ANALYSIS OF RESULTS

    6.1. Device Utilization of Various Modules

    Table 6.1.

    Compression Device Utilization Summary for Selected Device: xc2v1500bg575-6

Module                   32-bit Single Compression   64-bit Single Compression   64-bit Parallel Compression
Number of Slices         1756 out of 7680 (22%)      6819 out of 7680 (88%)      3560 out of 7680 (46%)
Number of Slice
Flip Flops               2064 out of 15360 (13%)     8168 out of 15360 (53%)     4206 out of 15360 (27%)
Number of 4-input LUTs   1368 out of 15360 (8%)      4776 out of 15360 (31%)     2930 out of 15360 (19%)
Number of bonded IOBs    74 out of 392 (18%)         139 out of 392 (35%)        145 out of 392 (36%)
IOB Flip Flops           39                          72                          78
Number of GCLKs          2 out of 16 (12%)           2 out of 16 (12%)           2 out of 16 (12%)


    6.2. CADENCE RTL Compiler Reports

The hardware designs are compiled with the Cadence RTL Compiler, and the results are as follows:

    6.2.1. 32-bit Single Compression System

    6.2.1.1. Area Report

============================================================
Generated by:         Encounter(r) RTL Compiler v06.10-p003_1
Generated on:         Apr 17 2007 08:42:56 PM
Module:               scomp_32
Technology libraries: typical 1.3
                      tpz973gtc 230
                      ram_128x16A 0.0
                      ram_256x16A 0.0
                      rom_512x16A 0.0
                      pllclk 4.3
Operating conditions: typical (balanced_tree)
Wireload mode:        segmented
============================================================
Instance   Cells   Cell Area   Net Area   Wireload
----------------------------------------------------------------------
scomp_32   5393    116863      0          TSMC32K_Conservative (S)

6.2.1.2. Power Report
============================================================
Generated by:         Encounter(r) RTL Compiler v06.10-p003_1
Generated on:         Apr 17 2007 08:43:13 PM
Module:               scomp_32
Technology libraries: typical 1.3
                      tpz973gtc 230
                      ram_128x16A 0.0
                      ram_256x16A 0.0
                      rom_512x16A 0.0
                      pllclk 4.3
Operating conditions: typical (balanced_tree)
Wireload mode:        segmented
============================================================
                    Leakage     Internal      Net Switching
Instance   Cells    Power(nW)   Power(nW)     Power(nW)       Power(nW)
-----------------------------------------------------------------------
scomp_32   5393     4.255       5832894.166   2001783.940     7834678.105

6.2.1.3. Timing Report
============================================================
Generated by:         Encounter(r) RTL Compiler v06.10-p003_1
Generated on:         Apr 17 2007 08:42:21 PM
Module:               scomp_32
Technology libraries: typical 1.3
                      tpz973gtc 230
                      ram_128x16A 0.0
                      ram_256x16A 0.0
                      rom_512x16A 0.0
                      pllclk 4.3
Operating conditions: typical (balanced_tree)
Wireload mode:        segmented
============================================================
Pin                         Type     Fanout   Load   Slew   Delay   Arrival
                                              (fF)   (ps)   (ps)    (ps)
----------------------------------------------------------------------
(clock clk)                 launch                                  0 R
u3 Wr_addr_reg_reg[31]/CK   setup    0                      +365    7451 R
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(clock clk)