  • Implementation Of Low Power Consumption Convolution Encoder And Viterbi Decoder Using

    Verilog HDL

    Department of ECE, GITAM UNIVERSITY 1

    Chapter 1

    INTRODUCTION

    1.1 Overview of the project

    Convolutional coding has been used in communication systems

    including deep space communications and wireless communications. It offers an

    alternative to block codes for transmission over a noisy channel. An advantage of

    convolutional coding is that it can be applied to a continuous data stream as well as to

    blocks of data. IS-95, a wireless digital cellular standard for CDMA (code division

    multiple access), employs convolutional coding. A third generation wireless cellular

    standard, under preparation, plans to adopt turbo coding, which stems from

    convolutional coding.

    The Viterbi decoding algorithm, proposed in 1967 by Viterbi, is a

    decoding process for convolutional codes in memory-less noise. The algorithm can

    be applied to a host of problems encountered in the design of communication

    systems. The Viterbi decoding algorithm provides both a maximum-likelihood

    and a maximum a posteriori algorithm. A maximum a posteriori algorithm

    identifies a code word that maximizes the conditional probability of the decoded

    code word given the received code word; in contrast, a maximum likelihood

    algorithm identifies a code word that maximizes the conditional probability of

    the received code word given the decoded code word. The two algorithms give the

    same results when the source information has a uniform distribution.
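
    The equivalence of the two criteria for a uniform source follows from Bayes' rule. As a short sketch, write c for a candidate code word and r for the received word (these symbols are introduced here for illustration only and are not part of the original text):

```latex
\hat{c}_{\mathrm{MAP}} = \arg\max_{c} P(c \mid r), \qquad
\hat{c}_{\mathrm{ML}} = \arg\max_{c} P(r \mid c), \qquad
P(c \mid r) = \frac{P(r \mid c)\, P(c)}{P(r)}
```

    Since P(r) does not depend on c, and P(c) is the same constant for every c when the source is uniformly distributed, maximizing P(c | r) and maximizing P(r | c) select the same code word.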

    Traditionally, performance and silicon area are the two most important

    concerns in VLSI design. Recently, power dissipation has also become an important

    concern, especially in battery-powered applications, such as cellular phones, pagers

    and laptop computers. Power dissipation can be classified into two categories: static

    power dissipation and dynamic power dissipation. Typically, static power dissipation

    is due to various leakage currents, while dynamic power dissipation is a result

    of charging and discharging the parasitic capacitance of transistors and wires. Since

    the dynamic power dissipation accounts for about 80 to 90 percent of overall power

    dissipation in CMOS circuits, numerous techniques have been proposed to reduce

    dynamic power dissipation. These techniques can be applied at different levels of

    digital design, such as the algorithmic level, the architectural level, the gate level, and

    the circuit level.
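
    The dependence of dynamic power on the charging and discharging of capacitance is commonly summarized by a first-order textbook relation. The symbols below (switching activity factor, switched load capacitance, supply voltage and clock frequency) are standard notation assumed here for illustration; they are not quantities defined in this report:

```latex
P_{\mathrm{dyn}} = \alpha \, C_{L} \, V_{DD}^{2} \, f
```

    Under this approximation, reducing the switching activity, for example by keeping part of a circuit idle when it is not needed, lowers dynamic power in direct proportion.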

    A Viterbi decoder uses the Viterbi algorithm for decoding a bit

    stream that has been encoded using forward error correction based on a

    convolutional code. The Viterbi algorithm is commonly used in a wide range of

    communications and data storage applications. It is used for decoding convolutional

    codes, in baseband detection for wireless systems, and also for detection of recorded

    data in magnetic disk drives. The requirements for the Viterbi decoder or Viterbi

    detector, which is a processor that implements the Viterbi algorithm, depend on the

    applications where they are used. This results in a very wide range of required data

    throughputs and power or area requirements.

    Viterbi detectors are used in cellular telephones with low data rates, on

    the order of 1 Mb/s or below, but with very low energy dissipation requirements. They are

    used for trellis code demodulation in telephone line modems, where the throughput is

    in the range of tens of kb/s, with restrictive limits in power dissipation and the

    area/cost of the chip. On the opposite end, very high speed Viterbi detectors are used

    in magnetic disk drive read channels, with throughputs over 600Mb/s. But at these

    high speeds, area and power are still limited.

    1.2 Motivation

    Unlike wired digital networks, wireless digital networks are much

    more prone to bit errors. Packets of bits that are received are more likely to be

    damaged and considered unusable in a packetized system. Error detection and

    correction mechanisms are vital and numerous techniques exist for reducing the effect

    of bit-errors and trying to ensure that the receiver eventually gets an error free version

    of the packet. The major techniques used are error detection with Automatic Repeat

    Request (ARQ), Forward Error Correction (FEC) and hybrid forms of ARQ and FEC

    (H-ARQ).

    This project focuses on FEC techniques. Forward Error Correction

    (FEC) is the method of transmitting error correction information along with the

    message. At the receiver, this error correction information is used to correct any bit-

    errors that may have occurred during transmission. The improved performance comes

    at the cost of introducing a considerable amount of redundancy in the transmitted

    code. There are various FEC codes in use today for the purpose of error correction.

    Most codes fall into either of two major categories: block codes and convolutional

    codes. Block codes work with fixed length blocks of code. Convolutional codes deal

    with data sequentially (i.e. taken a few bits at a time) with the output depending on

    both the present input as well as previous inputs.
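
    The statement that each output depends on the present input as well as previous inputs can be illustrated with a small software model. The following is a minimal Python sketch, not the Verilog design of this project; the rate 1/2 code, the constraint length of 3 and the generator polynomials 7 and 5 (octal) are illustrative assumptions, and the function name conv_encode is hypothetical.

```python
def conv_encode(bits, generators=(0b111, 0b101), constraint_length=3):
    """Non-recursive convolutional encoder (illustrative parameters).

    Assumed for this sketch: rate 1/2, constraint length 3,
    generator polynomials 7 and 5 in octal.
    """
    state = 0  # shift register holding the K-1 most recent input bits
    out = []
    for b in bits:
        # Current bit sits in the top position, older bits below it
        reg = (b << (constraint_length - 1)) | state
        for g in generators:
            # Each output bit is the XOR (parity) of the taps selected by g
            out.append(bin(reg & g).count("1") % 2)
        # Shift: the oldest bit drops out, the current bit becomes memory
        state = reg >> 1
    return out


encoded = conv_encode([1, 0, 1, 1])
print(encoded)  # [1, 1, 1, 0, 0, 0, 0, 1]
```

    Each input bit produces two output bits, which is the redundancy that the decoder later uses to correct errors.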

    In terms of implementation, block codes become very complex as

    their length increases. Convolutional codes, in comparison, are less complex and therefore easier to implement. In

    packetized digital networks convolutionally coded data would still be transmitted as

    packets or blocks. However, these blocks would be much larger in comparison to those

    used by block codes. The fact that convolutional codes are easier to implement,

    coupled with the emergence of a very efficient convolutional decoding algorithm,

    known as the Viterbi algorithm, is one of the reasons for convolutional codes becoming

    the preferred method for real time communication technologies.

    This project studies the use of various error detection and correction

    techniques for mobile networks with a focus on non-recursive convolutional coding

    and the Viterbi Algorithm. The constraint length of a non-recursive convolutional

    code results from the number of stages present in the combinatorial logic of the

    encoder. The error correction power of a convolutional code increases with its

    constraint length. However, decoding complexity increases exponentially as the

    constraint length increases. Fortunately, the efficiency of the Viterbi algorithm allows

    the use of convolutional coding with quite reasonable constraint lengths in many

    applications. Due to its high accuracy in finding the most likely sequence of states, the

    Viterbi algorithm is used in many applications, ranging from communication

    networks to optical character recognition and even DNA sequence analysis.
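
    The exponential growth of decoding complexity can be seen in a small software model of a hard decision Viterbi decoder, which keeps one survivor path for each of the 2^(K-1) trellis states. The following is a minimal Python sketch under stated assumptions, not the decoder developed in this project: the rate 1/2 code with constraint length 3 and generators 7 and 5 (octal) is illustrative, the encoder is assumed to start in the all-zero state, no tail bits are used, and the name viterbi_decode is hypothetical.

```python
def viterbi_decode(received, generators=(0b111, 0b101), constraint_length=3):
    """Hard decision Viterbi decoder (illustrative parameters)."""
    n = len(generators)
    n_states = 1 << (constraint_length - 1)  # 2^(K-1) trellis states
    INF = float("inf")

    def branch(state, bit):
        # Next state and expected output symbol for one trellis branch
        reg = (bit << (constraint_length - 1)) | state
        out = [bin(reg & g).count("1") % 2 for g in generators]
        return reg >> 1, out

    # Assumption: the encoder starts in the all-zero state
    metrics = [0] + [INF] * (n_states - 1)
    paths = [[] for _ in range(n_states)]

    for i in range(0, len(received), n):
        symbol = received[i:i + n]
        new_metrics = [INF] * n_states
        new_paths = [None] * n_states
        for state in range(n_states):
            if metrics[state] == INF:
                continue  # state not reachable yet
            for bit in (0, 1):
                nxt, expected = branch(state, bit)
                # Branch metric: Hamming distance to the received symbol
                dist = sum(r != e for r, e in zip(symbol, expected))
                cand = metrics[state] + dist
                if cand < new_metrics[nxt]:  # add-compare-select
                    new_metrics[nxt] = cand
                    new_paths[nxt] = paths[state] + [bit]
        metrics, paths = new_metrics, new_paths

    # Pick the survivor with the smallest accumulated path metric
    best = min(range(n_states), key=lambda s: metrics[s])
    return paths[best]


clean = [1, 1, 1, 0, 0, 0, 0, 1]   # encoding of 1 0 1 1 with the assumed code
noisy = [1, 1, 0, 0, 0, 0, 0, 1]   # same sequence with one bit flipped
print(viterbi_decode(clean))  # [1, 0, 1, 1]
print(viterbi_decode(noisy))  # [1, 0, 1, 1]
```

    Storing a full path per state keeps the sketch short; a hardware decoder would normally use a bounded traceback memory instead. Increasing the constraint length by one doubles the number of states that must be updated for every received symbol.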

    Recently, interest has grown in the use of certain error correction

    codes that provide much superior performance. Two of these codes are Low Density

    Parity Check codes and Turbo Codes. The ideas presented in this thesis are likely to

    be relevant to these more advanced codes as well as non-recursive convolutional

    codes, but this thesis will concentrate on convolutional codes. Since preservation of

    battery energy is a major concern for mobile devices, it is desirable that the error

    detection and correction mechanism take the minimum amount of energy to execute.

    This project explores the possibility of improving the energy efficiency of the Viterbi

    decoder and develops an algorithm to achieve this.

    1.3 Outline and Context of the Report

    This project focuses on the use of Viterbi Algorithm for forward error

    correction in mobile networks. It is desirable to keep energy consumption at a

    minimum in order to optimize use of available battery energy. In order to get good

    error correcting capabilities, the constraint length must be kept high and since the

    complexity of a convolutional decoder increases exponentially with its constraint

    length, optimizing the decoding mechanism with respect to energy consumption

    becomes a worthwhile goal. The growing need for improved energy efficiency of

    decoders has resulted in several approaches being explored.

    The main focus of the project is to explore an idea, proposed by Barry

    Cheetham, which is to switch off the Viterbi decoder and use a simpler decoder when

    no bit-errors are occurring. It is possible that by doing this, a significant amount of

    energy could be saved. When bit-errors are detected, the Viterbi decoder can be

    switched back on to take advantage of its error correction functionality. This process

    at the receiver depends on having a memory of previous bits received. Correctly

    maintaining and using this previous memory (previous history) when switching

    between the two decoders is one of the main technical challenges in the project. The

    energy saving mechanism proposed by Barry Cheetham is based on an earlier idea

    published by Wei Shao, though it is hoped that the new approach will be easier to

    implement.

    This algorithm can be developed using Verilog, though it will require a

    custom designed version of the Viterbi algorithm to be developed from scratch, and

    then adapted to the new energy saving idea. Possible problems that may affect the

    accuracy and energy saving capabilities of the algorithm must be analyzed and

    solutions to these problems must be developed. The performance of the resulting

    algorithm must be studied in terms of bit-error performance, packet loss rates and

    processing time.

    In principle, evaluating the performance of the new technique requires

    profiling of the energy consumption of the two algorithms involved. To do this

    accurately would require resources beyond the scope of the project. Verilog provides

    some profiling facilities, but relating the information obtained to energy consumption as

    would be observed in a VLSI implementation of the code is a complex issue.

    Nevertheless, it is believed that the execution times of particular parts of the

    algorithms can give some idea of the likely relationship between the energy

    consumption of these particular parts. Hence, in place of quoting estimations of the

    likely energy consumption of different techniques, execution times will be quoted

    with an implicit assumption that this gives a first order approximation to the likely

    energy consumption. By comparison with the standard Viterbi decoder available

    in Verilog, an analysis will be made of whether this method provides a significant

    improvement over existing mechanisms.

    1.4 Contributions and main objectives

    The main objectives of this project are as follows:

    1. An understanding of the background literature relevant to error detection and

    error control mechanisms as currently used in packetized digital communication

    networks.

    2. A detailed understanding of the concept of convolutional coding, and decoding

    using the Viterbi algorithm.

    3. An implementation of the Viterbi algorithm in Verilog to obtain a custom

    designed version called My Viterbi, and a check that it is working correctly by

    comparing its performance with that of the Viterbi decoder function provided by

    Verilog. (A custom designed Viterbi decoder is needed because Verilog does not

    provide access to the code.)

    4. A resolution of questions that still need to be answered about the new

    algorithm including the correct initialization of component decoders and the stability

    of the feedback mechanism.

    5. An implementation in Verilog of the new algorithm as a modification of the

    custom designed Viterbi algorithm.

    6. An evaluation of the new algorithm in terms of its accuracy and capacity for

    achieving energy savings. The analysis will be performed on the basis of bit-error

    performance, packet loss rates and execution time (considered to provide a first order

    approximation to energy).

    1.5 Scope of the Project

    This project is intended to further develop and implement the energy saving decoding

    algorithm developed by Barry Cheetham, and to provide solutions to some issues that still

    remained to be resolved at the beginning of this project. The main focus of this project is to

    provide a working demonstration of the algorithm by implementation in Verilog and to

    analyze its performance by comparison with the standard Viterbi decoder available in

    Verilog. The system will be developed using a hard decision Viterbi decoder but may

    be extended to using a soft decision decoder. The project does not consider the circuit

    level design of the algorithm but uses a high level approach to test the proposed

    algorithm. This may be considered in future work if it is found that this algorithm

    promises considerable benefits over existing mechanisms.

    Chapter 2

    2.1 Overview of VLSI

    The first semiconductor chips held one transistor each. Subsequent

    advances added more and more transistors, and, as a consequence, more individual

    functions or systems were integrated over time. The first integrated circuits held only

    a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors,

    making it possible to fabricate one or more logic gates on a single device. This level is

    now known retrospectively as "small-scale integration" (SSI). Improvements in technique

    led to devices with hundreds of logic gates, known as medium-scale integration (MSI),

    and then to large-scale integration (LSI), i.e. systems with at least a thousand logic

    gates. Current technology has moved far past

    this mark and today's microprocessors have many millions of gates and hundreds of

    millions of individual transistors.

    At one time, there was an effort to name and calibrate various levels of large-scale

    integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were used.

    But the huge number of gates and transistors available on common devices has

    rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of

    integration are no longer in widespread use. Even VLSI is now somewhat quaint,

    given the common assumption that all microprocessors are VLSI or better.

    As of early 2008, billion-transistor processors are commercially available, an example

    of which is Intel's Montecito Itanium chip. This is expected to become more

    commonplace as semiconductor fabrication moves from the current generation of 65

    nm processes to the next 45 nm generations (while experiencing new challenges such

    as increased variation across process corners). Another notable example is NVIDIA's

    280 series GPU.

    This microprocessor is unique in that its 1.4 billion transistor count, capable

    of a teraflop of performance, is almost entirely dedicated to logic (Itanium's transistor

    count is largely due to the 24MB L3 cache). Current designs, as opposed to the

    earliest devices, use extensive design automation and automated logic synthesis to lay

    out the transistors, enabling higher levels of complexity in the resulting logic

    functionality. Certain high-performance logic blocks like the SRAM cell, however,

    are still designed by hand to ensure the highest efficiency (sometimes by bending or

    breaking established design rules to obtain the last bit of performance by trading

    stability).

    2.2 INTRODUCTION OF VLSI

    Very-large-scale integration (VLSI) is the process of creating integrated

    circuits by combining thousands of transistor-based circuits into a single chip. VLSI

    began in the 1970s when complex semiconductor and communication technologies

    were being developed. The microprocessor is a VLSI device. The term is no longer as

    common as it once was, as chips have increased in complexity into the hundreds of

    millions of transistors.

    2.3 What is VLSI?

    VLSI stands for "Very Large Scale Integration". This is the field which involves

    packing more and more logic devices into smaller and smaller areas.

    VLSI

    Simply put, an integrated circuit is many transistors on one chip.

    Design and manufacturing of extremely small, complex circuitry using modified

    semiconductor material.

    An integrated circuit (IC) may contain millions of transistors, each a few µm in

    size.

    Applications are wide ranging: most electronic logic devices.

    2.3.1 History of scale integration

    late 40s    Transistor invented at Bell Labs
    late 50s    First IC (JK-FF by Jack Kilby at TI)
    early 60s   Small Scale Integration (SSI): 10s of transistors on a chip
    late 60s    Medium Scale Integration (MSI): 100s of transistors on a chip
    early 70s   Large Scale Integration (LSI): 1000s of transistors on a chip
    early 80s   VLSI: 10,000s of transistors on a chip (later 100,000s and now 1,000,000s)

    Ultra LSI is sometimes used for 1,000,000s

    SSI - Small-Scale Integration (0 to 10^2)

    MSI - Medium-Scale Integration (10^2 to 10^3)

    LSI - Large-Scale Integration (10^3 to 10^5)

    VLSI - Very Large-Scale Integration (10^5 to 10^7)

    ULSI - Ultra Large-Scale Integration (>= 10^7)
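
    The ranges above can be read as a simple lookup. The following Python sketch is illustrative only: it assumes the counts refer to devices per chip and that each range includes its lower bound and excludes its upper bound, since the list does not say how the shared boundary values are assigned. The name integration_scale is hypothetical.

```python
# Exclusive upper bounds taken from the ranges listed above
SCALES = [
    (10**2, "SSI"),
    (10**3, "MSI"),
    (10**5, "LSI"),
    (10**7, "VLSI"),
]


def integration_scale(device_count):
    """Return the integration scale label for a given device count."""
    if device_count < 0:
        raise ValueError("device_count must be non-negative")
    for upper, name in SCALES:
        if device_count < upper:
            return name
    return "ULSI"  # 10^7 devices or more


print(integration_scale(50))      # SSI
print(integration_scale(10**6))   # VLSI
```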

    2.3.2 Advantages of ICs over discrete components

    While we will concentrate on integrated circuits, the properties of integrated circuits

    (what we can and cannot efficiently put in an integrated circuit) largely determine the

    architecture of the entire system. Integrated circuits improve system characteristics in

    several critical ways. ICs have three key advantages over digital circuits built from

    discrete components:

    Size. Integrated circuits are much smaller: both transistors and

    wires are shrunk to micrometer sizes, compared to the millimeter or centimeter scales

    of discrete components. Small size leads to advantages in speed and power

    consumption, since smaller components have smaller parasitic resistances,

    capacitances, and inductances.

    Speed. Signals can be switched between logic 0 and logic 1

    much quicker within a chip than they can between chips. Communication within a

    chip can occur hundreds of times faster than communication between chips on a

    printed circuit board. The high speed of circuits on-chip is due to their small size:

    smaller components and wires have smaller parasitic capacitances to slow down the

    signal.

    Power consumption. Logic operations within a chip also take

    much less power. Once again, lower power consumption is largely due to the small

    size of circuits on the chip: smaller parasitic capacitances and resistances require less

    power to drive them.

    2.3.3 VLSI and systems

    These advantages of integrated circuits translate into advantages at the system level:

    Smaller physical size. Smallness is often an advantage in

    itself: consider portable televisions or handheld cellular telephones.

    Lower power consumption. Replacing a handful of standard

    parts with a single chip reduces total power consumption. Reducing power

    consumption has a ripple effect on the rest of the system: a smaller, cheaper power

    supply can be used; since less power consumption means less heat, a fan may no

    longer be necessary; a simpler cabinet with less electromagnetic

    shielding may be feasible, too.

    Reduced cost. Reducing the number of components, the

    power supply requirements, cabinet costs, and so on, will inevitably reduce system

    cost. The ripple effect of integration is such that the cost of a system built from

    custom ICs can be less, even though the individual ICs cost more than the standard

    parts they replace.

    Understanding why integrated circuit technology has such profound influence on the

    design of digital systems requires understanding both the technology of IC

    manufacturing and the economics of ICs and digital systems.

    Applications

    Electronic system in cars.

    Digital electronics control VCRs

    Transaction processing system, ATM

    Personal computers and Workstations

    Medical electronic systems.

    2.4 Applications of VLSI

    Electronic systems now perform a wide variety of tasks in daily life. Electronic

    systems in some cases have replaced mechanisms that operated mechanically,

    hydraulically, or by other means; electronics are usually smaller, more flexible, and

    easier to service. In other cases electronic systems have created totally new

    applications. Electronic systems perform a variety of tasks, some of them visible,

    some more hidden:

    Personal entertainment systems such as portable MP3

    players and DVD players perform sophisticated algorithms with remarkably little

    energy.

    Electronic systems in cars operate stereo systems and

    displays; they also control fuel injection systems, adjust suspensions to varying

    terrain, and perform the control functions required for anti-lock braking (ABS)

    systems.

    Digital electronics compress and decompress video, even at

    high-definition data rates, on-the-fly in consumer electronics.

    Low-cost terminals for Web browsing still require

    sophisticated electronics, despite their dedicated function.

    Personal computers and workstations provide word-

    processing, financial analysis, and games. Computers include both central processing

    units (CPUs) and special-purpose hardware for disk access, faster screen display, etc.

    Medical electronic systems measure bodily functions and

    perform complex processing algorithms to warn about unusual conditions. The

    availability of these complex systems, far from overwhelming consumers, only creates

    demand for even more complex systems.

    2.5 VERILOG HDL

    Verilog HDL is a hardware description language that can be used to model a

    digital system at many levels of abstraction ranging from the algorithmic-level to the

    gate-level to the switch-level. The complexity of the digital system being modelled

    could vary from that of a simple gate to a complete electronic digital system, or

    anything in between. The digital system can be described hierarchically and timing

    can be explicitly modelled within the same description.

    The Verilog HDL language includes capabilities to describe the behavioural

    nature of a design, the dataflow nature of a design, a design's structural composition,

    delays and a waveform generation mechanism including aspects of response

    monitoring and verification, all modelled using one single language. In addition, the

    language provides a programming language interface through which the internals of a

    design can be accessed during simulation including the control of a simulation run.

    The language not only defines the syntax but also defines very clear simulation

    semantics for each language construct. Therefore, models written in this language can

    be verified using a Verilog simulator. The language inherits many of its operator

    symbols and constructs from the C programming language. Verilog HDL provides an

    extensive range of modelling capabilities, some of which are quite difficult to

    comprehend initially. However, a core subset of the language is quite easy to learn

    and use. This is sufficient to model most applications.

    2.5.1 History:

    The Verilog HDL language was first developed by Gateway Design Automation in

    1983 as a hardware modelling language for their simulator product. At that time it was

    a proprietary language. Because of the popularity of the simulator product, Verilog

    HDL gained acceptance as a usable and practical language by a number of designers.

    In an effort to increase the popularity of the language, the language was placed in the

    public domain in 1990. Open Verilog International (OVI) was formed to promote

    Verilog. In 1992 OVI decided to pursue standardization of Verilog HDL as an IEEE

    standard. This effort was successful and the language became an IEEE standard in

    1995. The complete standard is described in the Verilog hardware description

    language reference manual. The standard is called IEEE Std 1364-1995.

    2.5.2 Major Capabilities:

    Listed below are the major capabilities of the Verilog hardware description language:

    Primitive logic gates, such as and, or and nand, are built into the language.

    Flexibility of creating a user-defined primitive (UDP). Such a primitive could

    either be a combinational logic primitive or a sequential logic primitive.

    Switch-level modelling primitive gates, such as pmos and nmos, are also built

    into the language.

    Explicit language constructs are provided for specifying pin-to-pin delays,

    path delays and timing checks of a design.

    A design can be modelled in three different styles or in a mixed style. These

    styles are: behavioural style - modelled using procedural constructs; dataflow style -

    modelled using continuous assignments; and structural style - modelled using gate and

    module instantiations.

    There are two data types in Verilog HDL; the net data type and the register

    data type. The net type represents a physical connection between structural elements

    while a register type represents an abstract data storage element.

    Figure 2.1 shows the mixed-level modeling capability of Verilog HDL, that is,

    in one design, each module may be modeled at a different level.

    Verilog HDL also has built-in logic functions such as & (bitwise-and) and |

    (bitwise-or).

    High-level programming language constructs such as conditionals, case

    statements, and loops are available in the language.

    Notion of concurrency and time can be explicitly modelled.

    Powerful file read and write capabilities are provided.

    The language is non-deterministic under certain situations, that is, a model

    may produce different results on different simulators; for example, the ordering of

    events on an event queue is not defined by the standard.

    2.6 Verilog synthesis

    Synthesis is the process of constructing a gate level netlist from a register-transfer

    level model of a circuit described in Verilog HDL. Figure 2.2 shows such a process.

    A synthesis system may, as an intermediate step, generate a netlist that is comprised of

    register-transfer level blocks such as flip-flops, arithmetic-logic-units, and

    multiplexers, interconnected by wires. In such a case, a second program called the

    RTL module builder is necessary. The purpose of this builder is to build, or acquire

    Fig2.1: Mixed level modelling

  • Implementation Of Low Power Consumption Convolution Encoder And Viterbi Decoder Using

    Verilog HDL

    Department of ECE, GITAM UNIVERSITY 15

    from a library of predefined components, each of the required RTL blocks in the user-

    specified target technology.

    Once a gate-level netlist has been produced, a logic optimizer reads in the netlist and optimizes the circuit for the user-specified area and timing constraints. These area and

    timing constraints may also be used by the module builder for appropriate selection or

    generation of RTL blocks. In this book, we assume that the target netlist is at the gate

    level. The logic gates used in the synthesized netlists are described in Appendix B.

    The module building and logic optimization phases are not described in this book.

    The above figure shows the basic elements of Verilog HDL and the elements used in

    hardware. A mapping mechanism or a construction mechanism has to be provided that

    translates the Verilog HDL elements into their corresponding hardware elements as

    shown in the figure.

    Figure 2.2: Synthesis process


    2.7 Tools used and explanation

    Requirements:

    Xilinx 9.1i

    Modelsim 6.2

    2.8 Introduction about the Software:

    Xilinx ISE 8.2i software includes the new Xilinx Smart Compile technology, which makes run times up to 6 times faster than the previous version while maintaining exact design preservation of unchanged logic.

    ModelSim SE 6.2C is a verification and simulation tool for VHDL, Verilog, SystemVerilog, and mixed-language designs.


    CHAPTER-3

    3.1 THE VITERBI DECODER ALGORITHM

    The Viterbi decoding algorithm is a decoding process for convolutional codes transmitted over a memory-less channel. Figure 3.1 depicts the normal flow of information over a noisy channel. For the purpose of error recovery, the encoder adds redundant information to the original information, and the output is transmitted through a channel. The input at the receiver end (r) is the information with redundancy and, possibly, noise. The receiver tries to extract the original information through a decoding algorithm and generates an estimate (e). A decoding algorithm that maximizes the probability p(r|e) is a maximum likelihood (ML) algorithm. An algorithm that maximizes p(e|r) through the proper selection of the estimate (e) is called a maximum a posteriori (MAP) algorithm. The two algorithms have identical results when the source information has a uniform distribution.
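    The equivalence of the ML and MAP criteria under a uniform source distribution follows from Bayes' rule. As a brief sketch, using r for the received sequence and e for the estimate as in this section:

```latex
p(e \mid r) = \frac{p(r \mid e)\, p(e)}{p(r)}
\quad\Longrightarrow\quad
\arg\max_{e}\, p(e \mid r)
  = \arg\max_{e}\, p(r \mid e)\, p(e)
  = \arg\max_{e}\, p(r \mid e)
\quad \text{when } p(e) \text{ is the same for every } e .
```

    The denominator p(r) does not depend on e, so it does not affect the choice of estimate.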

    Figure 3.1 The Convolutional Decoding

    The Viterbi algorithm was developed by Andrew J. Viterbi and first published in the IEEE Transactions on Information Theory in 1967. It is a maximum likelihood decoding algorithm for convolutional codes. This algorithm provides a method of finding the path through the trellis diagram that has the highest probability of matching the actual transmitted sequence of bits. Since being discovered, it has become one of the most popular algorithms in use for convolutional decoding. Apart from being an efficient and robust error-correction method, it has the advantage of having a fixed decoding time. This makes it suitable for hardware implementation. The

    algorithm has found universal application in decoding the convolutional codes used in

    both CDMA and GSM digital cellular, dial-up modems, satellite, deep-space

    communications, and 802.11 wireless LANs. It is now also commonly used in speech


    recognition, speech synthesis, keyword spotting, computational linguistics,

    and bioinformatics. For example, in speech-to-text (speech recognition), the acoustic

    signal is treated as the observed sequence of events, and a string of text is considered

    to be the "hidden cause" of the acoustic signal. The Viterbi algorithm finds the most

    likely string of text given the acoustic signal. The terms Viterbi path and Viterbi

    algorithm are also applied to related dynamic programming algorithms that discover

    the single most likely explanation for an observation. For example, in statistical

    parsing a dynamic programming algorithm can be used to discover the single most

    likely context-free derivation (parse) of a string, which is sometimes called the Viterbi

    parse.

    3.2 Convolutional Encoders

    Like any error-correcting code, a convolutional code works by adding some structured redundant information to the user's data and then correcting errors using this information. A convolutional encoder is a linear system. A binary convolutional encoder can be represented as a shift register. The outputs of the encoder are modulo-2 sums of the values in certain register cells. The input to the encoder is either the unencoded sequence (for non-recursive codes) or the unencoded sequence added to the values of some register cells (for recursive codes).

    In telecommunication, a convolutional code is a type of error-correcting code in which:

    (i) each m-bit information symbol (each m-bit string) to be encoded is transformed into an n-bit symbol, where m/n is the code rate (n ≥ m), and

    (ii) the transformation is a function of the last k information symbols, where k is the constraint length of the code.

    Convolutional codes can be systematic or non-systematic. Systematic codes are those in which the unencoded sequence is a part of the output sequence. Recursive codes are almost always systematic; conversely, non-recursive codes are almost always non-systematic. Convolutional codes are used extensively in numerous applications in order to achieve reliable data transfer, including digital video, radio, mobile communication, and satellite communication. These codes are often implemented in concatenation with a hard-decision code, particularly Reed-Solomon. Prior to turbo codes, such constructions were the most efficient, coming closest to the Shannon limit.

    To convolutionally encode data, start with k memory registers, each holding 1 input bit. Unless otherwise specified, all memory registers start with a value of 0. The encoder has n modulo-2 adders (a modulo-2 adder can be implemented with a single Boolean XOR gate, where the logic is: 0+0 = 0, 0+1 = 1, 1+0 = 1, 1+1 = 0), and n generator polynomials, one for each adder (see figure below). An input bit m_1 is fed into the leftmost register. Using the generator polynomials and the existing values in the remaining registers, the encoder outputs n bits. Now bit-shift all register values to the right (m_1 moves to m_0, m_0 moves to m_{-1}) and wait for the next input bit. If there are no remaining input bits, the encoder continues output until all registers have returned to the zero state.

    The figure below is a rate 1/3 (m/n) encoder with a constraint length (k) of 3. Generator polynomials are G1 = (1,1,1), G2 = (0,1,1), and G3 = (1,0,1). Therefore, output bits are calculated (modulo 2) as follows:

    n_1 = m_1 + m_0 + m_{-1}

    n_2 = m_0 + m_{-1}

    n_3 = m_1 + m_{-1}
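    The rate 1/3 encoder described above can be sketched as a short behavioural model. This is an illustrative Python sketch rather than the project's Verilog code; the function name and the choice of two zero tail bits to flush the registers are assumptions made for the example.

```python
def encode_rate_third(bits):
    """Behavioural model of the rate 1/3, k = 3 encoder with
    G1 = (1,1,1), G2 = (0,1,1), G3 = (1,0,1)."""
    m0 = m_minus1 = 0                      # memory cells start at zero
    out = []
    # Two zero tail bits (an assumption) return the registers to the zero state.
    for m1 in list(bits) + [0, 0]:
        n1 = m1 ^ m0 ^ m_minus1            # XOR is the modulo-2 adder
        n2 = m0 ^ m_minus1
        n3 = m1 ^ m_minus1
        out.append((n1, n2, n3))
        m0, m_minus1 = m1, m0              # shift register contents to the right
    return out

impulse = encode_rate_third([1])
```

    Feeding a single 1 into the model gives the impulse response of the encoder; reading each output column over time reproduces the three generator tuples.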

    The combination of register cells that forms one of the output streams (or that is added to the input stream for recursive codes) is defined by a polynomial. Let m be the maximum degree of the polynomials constituting a code; then K = m+1 is the constraint length of the code.

    Figure 3.2 A standard convolutional encoder with polynomials (171,133)

    For example, for the encoder in Figure 3.2, the polynomials are:

    g1(z) = 1 + z + z^2 + z^3 + z^6

    g2(z) = 1 + z^2 + z^3 + z^5 + z^6

    Encoder polynomials are usually denoted in octal notation. For the above example, these designations are 1111001 = 171 and 1011011 = 133. The constraint length of this code is 7. An example of a recursive convolutional encoder is shown in Figure 3.3.

    Figure 3.3. A recursive convolutional encoder
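    The octal designation of the (171,133) polynomials can be checked with a few lines of Python. This is only an illustrative sketch; the helper name is hypothetical, and it assumes the coefficient of z^0 is written as the leftmost binary digit, as in the strings 1111001 and 1011011 quoted in the text.

```python
def poly_to_octal(exponents):
    """Return (binary string, octal string) for a generator polynomial
    given as the set of exponents of z that have coefficient 1."""
    degree = max(exponents)
    # Assumption: the z^0 coefficient is the leftmost (most significant) digit.
    bits = "".join("1" if e in exponents else "0" for e in range(degree + 1))
    return bits, format(int(bits, 2), "o")

g1 = poly_to_octal({0, 1, 2, 3, 6})   # 1 + z + z^2 + z^3 + z^6
g2 = poly_to_octal({0, 2, 3, 5, 6})   # 1 + z^2 + z^3 + z^5 + z^6
constraint_length = max({0, 1, 2, 3, 6}) + 1   # K = m + 1
```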

    3.2.1 Trellis Diagram

    A convolutional encoder is often seen as a finite state machine. Each state corresponds to some value of the encoder's register. Given the input bit value, from a certain state the encoder can move to one of two other states. These state transitions constitute a diagram which is called a trellis diagram. A trellis diagram for the code in Figure 3.3 is depicted in Figure 3.4. A solid line corresponds to input 0, a dotted line to input 1 (note that encoder states are designated in such a way that the rightmost bit is the newest one).

    Each path on the trellis diagram corresponds to a valid sequence from the encoder's output. Conversely, any valid sequence from the encoder's output can be represented as a path on the trellis diagram. One of the possible paths is marked in red as an example. Note that each state transition on the diagram corresponds to a pair of output bits. There are only two allowed transitions for every state, so there are two allowed pairs of output bits, and the two other pairs are forbidden.

    If an error occurs, it is very likely that the receiver will get a set of forbidden

    pairs, which don't constitute a path on the trellis diagram. So, the task of the decoder

    is to find a path on the trellis diagram which is the closest match to the received

    sequence.


    Figure 3.4 A trellis diagram corresponding to the encoder in Figure 3.3

    Let us define the free distance df as the minimal Hamming distance between two different allowed binary sequences (the Hamming distance is defined as the number of differing bits).


    Chapter -4

    4. Viterbi decoder

    A Viterbi decoder uses the Viterbi algorithm for decoding a bit stream that

    has been encoded using a convolutional code. There are other algorithms for decoding

    a convolutionally encoded stream (for example, the Fano algorithm). The Viterbi algorithm is the most resource-consuming, but it performs maximum-likelihood decoding. It is most often used for decoding convolutional codes with

    constraint lengths k


    For each state, the Hamming distance between the received bits and the expected bits is calculated. The Hamming distance between two symbols of the same length is the number of bits that differ between them. These branch metric values are passed to Block 2. If soft decision inputs were to be used, the branch metric would be calculated as the squared Euclidean distance between the received and expected symbols. The squared Euclidean distance is given as (a_1 - b_1)^2 + (a_2 - b_2)^2 + (a_3 - b_3)^2, where a_1, a_2, a_3 and b_1, b_2, b_3 are the three soft decision values of the received and expected symbols respectively.

    Value   Meaning
    000     strongest 0
    001     relatively strong 0
    010     relatively weak 0
    011     weakest 0
    100     weakest 1
    101     relatively weak 1
    110     relatively strong 1
    111     strongest 1
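    The two branch metrics described above can be sketched as follows. This is a minimal Python illustration, not the project's hardware description; the function names are hypothetical, and soft values are assumed to be the integers 0 to 7 from the table, with an expected 0 mapped to 0 and an expected 1 mapped to 7.

```python
def hamming_branch_metric(received, expected):
    """Hard decision: number of bit positions in which the symbols differ."""
    return sum(r != e for r, e in zip(received, expected))

def euclidean_branch_metric(received, expected):
    """Soft decision: squared Euclidean distance between the symbols."""
    return sum((a - b) ** 2 for a, b in zip(received, expected))

hard = hamming_branch_metric((1, 1), (1, 0))
# Assumed mapping: expected bits (1, 0) become the soft levels (7, 0).
soft = euclidean_branch_metric((6, 1), (7, 0))
```

    In this example the hard metric counts one differing bit, while the soft metric stays small because both received values are close to the expected levels.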

    Figure 4.2 A recursive convolutional encoder


    4.2. Path Metric Computation and Add-Compare-Select (ACS) Unit

    A path metric unit (PMU) summarizes branch metrics to get metrics for 2^(K-1) paths, where K is the constraint length of the code, one of which can eventually be chosen as optimal. On every clock cycle it makes decisions, discarding paths that are known to be non-optimal. The results of these decisions are written to the memory of a traceback unit.

    The core elements of a PMU are ACS (Add-Compare-Select) units. The way in which they are connected to one another is defined by the specific code's trellis diagram. Since branch metrics are always non-negative, there must be an additional circuit preventing the metric counters from overflowing (it is not shown in the image). An alternative method that eliminates the need to monitor the path metric growth is to allow the path metrics to "roll over". To use this method, it is necessary to make sure the path metric accumulators contain enough bits to prevent the "best" and "worst" values from coming within 2^(n-1) of each other. The compare circuit is essentially unchanged.
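    The "roll over" comparison can be illustrated with modular arithmetic. The sketch below is a Python illustration under the assumption that the true difference between any two path metrics stays below 2^(n-1); the function name and the 8-bit width are chosen for the example only.

```python
def wrapped_less_than(a, b, n_bits):
    """Compare two path metrics stored modulo 2**n_bits.
    Valid only while the true metrics differ by less than 2**(n_bits - 1)."""
    diff = (a - b) % (1 << n_bits)          # n-bit subtraction with wrap-around
    return diff >= (1 << (n_bits - 1))      # top bit set means a - b is negative

# True metrics 250 and 260 stored in 8-bit accumulators: 260 wraps to 4.
stored_a = 250 % 256
stored_b = 260 % 256
a_is_smaller = wrapped_less_than(stored_a, stored_b, 8)
```

    Although the stored value 4 looks smaller than 250, the wrapped comparison still identifies 250 as the smaller metric, which is why no overflow monitoring is needed in this scheme.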

    Figure 4.3 ACS Unit

    It is possible to monitor the noise level on the incoming bit stream by monitoring the

    rate of growth of the "best" path metric. A simpler way to do this is to monitor a

    single location or "state" and watch it pass "upward" through say four discrete levels

    within the range of the accumulator. As it passes upward through each of these

    thresholds, a counter is incremented that reflects the "noise" present on the incoming

    signal.

    The path metric or error probability for each transition state at a particular time

    instant is measured as the sum of the path metric for its preceding state and the branch

    metric between the previous state and the present state. The initial path metric at the


    first time instant is infinity for all states except state 0. For each state, there are two possible predecessors. The mechanism for calculating the predecessors (and successors) is described later in this chapter. The path metrics from both these predecessors are compared and the one with the smallest path metric is selected. This is the most probable transition that occurred in the original message. In addition, a single bit is stored for each state, which specifies whether the lower or upper predecessor was selected.

    Figure 4.4 A sample implementation of a path metric unit for a specific k=4 decoder

    In cases where both paths result in the same path metric to the state, either the higher

    or lower state may consistently be chosen as the surviving predecessor. For the

    purpose of this project the higher state is consistently chosen as the surviving

    predecessor. Finally, the state with the least accumulated path metric at the current

    time instant is located. This state is called the global winner and is the state from

    which traceback operation will begin. This method of starting the traceback operation

    from the global winner instead of an arbitrary state was described by Linda

    Brackenbury in her design of an asynchronous Viterbi decoder. This greatly improves the probability of finding the correct traceback path quickly and hence reduces the amount of history information that needs to be maintained. It also reduces the number

    of updates required to the surviving path. Both these measures result in improved

    energy savings. The values for the surviving predecessors (also called local winners)

    and the global winner are passed to Block 3.
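    The add-compare-select step and the global winner search can be sketched as follows. This is an illustrative Python model, not the project's hardware; the function names are hypothetical, the decision bit is assumed to be 1 when the higher-numbered predecessor survives, and ties go to the higher state as stated above.

```python
INF = float("inf")

def add_compare_select(pm_low, bm_low, pm_high, bm_high):
    """One ACS operation for a single state.
    Returns (new path metric, decision bit); decision 1 = higher predecessor."""
    cand_low = pm_low + bm_low        # add
    cand_high = pm_high + bm_high
    if cand_high <= cand_low:         # compare; a tie keeps the higher state
        return cand_high, 1           # select
    return cand_low, 0

def global_winner(path_metrics):
    """Index of the state with the least accumulated path metric."""
    return min(range(len(path_metrics)), key=path_metrics.__getitem__)

tie = add_compare_select(2, 1, 3, 0)          # both candidates equal 3
start = add_compare_select(0, 1, INF, 0)      # unreachable predecessor loses
winner = global_winner([3, 1, 2, 1])
```

    When several states share the least metric, this sketch returns the lowest-numbered one; the text does not specify that case, so it is an assumption of the example.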


    Figure 4.5 A sample implementation of an ACS Unit

    4.3. Survivor memory unit or Trace back Unit

    The traceback unit restores an (almost) maximum-likelihood path from the decisions made by the PMU. Since it does so in the reverse direction, a Viterbi decoder includes a FILO (first-in-last-out) buffer to reconstruct the correct order. Note that the implementation shown in the image requires double frequency. There are some tricks that eliminate this requirement.

    The global winner for the current state is received from Block 2. Its predecessor is selected in the manner described above. In this way, working backwards through the trellis, the path with the minimum accumulated path metric is selected. This path is known as the traceback path. A diagrammatic description will help visualize this process. Figure 4.6 shows the trellis diagram for a K=3 (7,5) coder with sample input taken as the received data.

    The general approach to traceback is to accumulate path metrics for up to five times the constraint length (5 × (K - 1)), find the node with the largest accumulated cost, and begin traceback from this node.

    However, computing the node which has accumulated the largest cost (either the largest or smallest integral path metric) involves finding the maxima or minima of several (usually 2^(K-1)) numbers, which may be time consuming when implemented on embedded hardware systems.

    Most communication systems employ Viterbi decoding involving data packets of fixed sizes, with a fixed bit/byte pattern at the beginning and/or at the end of


    the data packet. By using the known bit/byte pattern as reference, the start node may be set to a fixed value, thereby obtaining a perfect maximum-likelihood path during traceback.

    Figure 4.6 Selected minimum error path for a k=3(7, 5) coder

    The state having minimum accumulated error at the last time instant is State 10 and traceback is started here. Moving backwards through the trellis, the minimum error path out of the two possible predecessors from that state is selected. This path is marked in blue. The actual received data is shown at the bottom, while the expected data is written in blue along the selected path. It is observed that at time slot three there was an error in the received data (11). This was corrected to (10) by the decoder.
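    The complete flow for a K=3 (7,5) code, from branch metrics through traceback from the global winner, can be sketched in a few lines. This is a behavioural Python illustration with hard decisions, not the project's Verilog design; the state numbering (newest bit on the left), the test message and the position of the injected error are assumptions made for the example, not the data of Figure 4.6.

```python
G = (0b111, 0b101)        # generators (7,5) in octal, K = 3
N_STATES = 4              # 2**(K-1) states
INF = float("inf")

def parity(x):
    return bin(x).count("1") & 1

def step(state, bit):
    """Return (next state, output pair) for one input bit."""
    reg = (bit << 2) | state                      # input bit plus two memory cells
    return reg >> 1, tuple(parity(reg & g) for g in G)

def encode(bits):
    state, out = 0, []
    for b in bits:
        state, pair = step(state, b)
        out.append(pair)
    return out

def viterbi_decode(pairs):
    pm = [0] + [INF] * (N_STATES - 1)             # only state 0 is possible at start
    history = []                                  # local winners per time slot
    for rx in pairs:
        new_pm, choice = [INF] * N_STATES, [None] * N_STATES
        for s in range(N_STATES):
            if pm[s] == INF:
                continue
            for bit in (0, 1):
                nxt, out = step(s, bit)
                metric = pm[s] + sum(a != b for a, b in zip(rx, out))
                if metric < new_pm[nxt]:          # keep the survivor
                    new_pm[nxt], choice[nxt] = metric, (s, bit)
        pm = new_pm
        history.append(choice)
    state = min(range(N_STATES), key=pm.__getitem__)   # global winner
    bits = []
    for choice in reversed(history):              # traceback
        state, bit = choice[state]
        bits.append(bit)
    return bits[::-1]                             # restore forward order

message = [1, 0, 1, 1, 0, 0]
received = encode(message)
received[2] = (received[2][0], 1 - received[2][1])   # one bit error in time slot three
decoded = viterbi_decode(received)
```

    With a single bit error in the third time slot, the minimum error path still corresponds to the original message, mirroring the correction described for Figure 4.6.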

    Local winner information must be stored for five times the constraint length. For a K = 7 decoder, this results in storing history for 7 × 5 = 35 time slots. The state of the decoder at the time instant 35 time slots prior can then be accurately determined. This state value is passed to Block 4. At the next time slot, all the trellis values are shifted left to the previous time slot. The path metric for the most recently received data is then calculated and the minimum error path is computed. If the global winner at this stage is not a child of the previous global winner, the traceback path has to be updated accordingly until the traceback state is a child of the previous state [22].


    Figure 4.7 Trace back path unit

    Multiple traceback paths are possible and it may be thought that traceback up to

    the first bit is necessary to correctly determine the surviving path. However, it was

    found that all possible paths converge within a certain distance or depth of traceback.

    This information is useful as it allows the setting of a certain traceback depth beyond

    which it is neither necessary nor advantageous to store path metric and other

    information. This greatly reduces memory storage requirements and hence energy

    consumption of the decoder. Empirical observations showed that a depth of five times the constraint length was sufficient to ensure merging of paths. Therefore, local winner information is stored for 35 slots (five times seven) in the decoder used for this project.

    Block 4. Data Input Determination

    Now going forwards through the traceback path, the state transitions at successive time intervals are studied and the data bit that would have caused each transition is determined. This represents the decoded output.

    Determining Successors to a Particular State

    Each state is represented by 6 shift registers (in the case of a K=7 encoder or decoder). The next state can therefore be obtained by a right shift of the values of the shift registers. The first shift register is given a value of 0. The resulting state represents the next state of the coder if the input bit was 0. Adding 32 (1 × 2^5) to this value gives the next state of the coder if the input bit was 1.

    Determining Predecessors to a Particular State

    In a similar way, the first predecessor can be calculated, this time by a left shift of the values of the shift registers. By adding one (1 × 2^0) to this value, the value of the second predecessor to the state is derived.
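    The shift rules for successors and predecessors translate directly into bit operations. The sketch below is an illustrative Python model for K = 7; the function names are hypothetical, and it assumes that the bit shifted out on a left shift is discarded so that the state stays within 6 bits.

```python
K = 7
STATE_BITS = K - 1                      # 6 shift registers, 64 states
MASK = (1 << STATE_BITS) - 1

def successors(state):
    """Next states for input 0 and input 1."""
    next0 = state >> 1                               # right shift, first register = 0
    return next0, next0 + (1 << (STATE_BITS - 1))    # adding 32 sets it to 1

def predecessors(state):
    """The two states that can lead to this state."""
    prev0 = (state << 1) & MASK                      # left shift, drop the overflow bit
    return prev0, prev0 + 1                          # adding 1 gives the second one

succ_of_45 = successors(0b101101)
pred_of_45 = predecessors(0b101101)
```

    Every state produced by `predecessors` lists the original state among its `successors`, which is a quick consistency check on the two rules.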

    4.3.1 State Metric Storage

    The block stores the partial path metric of each state at the current stage.

    4.3.2 Output Generator:

    This block generates the decoded output sequence. In the traceback approach, the

    block incorporates combinational logic, which traces back along the survivor path and

    latches the path (equivalently the decoded output sequence) to a register.

    Figure 4.8 the block diagram of a general Viterbi Decoder

    4.4. Encoding Mechanism

    Data is coded by using a convolutional encoder, as described. It consists of a series of

    shift registers and an associated combinatorial logic. The combinatorial logic is

    usually a series of exclusive-or gates. The conventional encoder K=7, (171,133) is

    used for the purpose of this project. The octal numbers 171 and 133 when represented


    in binary form correspond to the connection of the shift registers to the upper and lower exclusive-or gates respectively. Figure 4.9 represents this convolutional encoder, which will be used for the project. The encoder consists of a series of XOR gates for the mechanism of encoding.

    Figure 4.9: Rate=1/2 k=7, (171,133) Convolution Encoder
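    A behavioural model of the rate 1/2, K=7 (171,133) encoder is sketched below in Python. It is an illustration only, not the project's Verilog source; the names are hypothetical, and it assumes that the leftmost binary digit of each octal number is the tap on the current input bit.

```python
G_UPPER = 0o171      # 1111001, taps of the upper XOR gate
G_LOWER = 0o133      # 1011011, taps of the lower XOR gate

def parity(x):
    """Modulo-2 sum (XOR) of all bits of x."""
    return bin(x).count("1") & 1

def encode_k7(bits):
    state = 0                                # six shift-register cells, all zero
    out = []
    for b in bits:
        reg = (b << 6) | state               # assumption: input is the leftmost tap
        out.append((parity(reg & G_UPPER), parity(reg & G_LOWER)))
        state = reg >> 1                     # shift right; the input enters the first cell
    return out

impulse = encode_k7([1, 0, 0, 0, 0, 0, 0])
upper_stream = [u for u, _ in impulse]
lower_stream = [l for _, l in impulse]
```

    A single 1 followed by six zeros walks through every tap, so the two output streams spell out the binary forms of 171 and 133.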

    4.5. Decoding Mechanism

    There are two main mechanisms by which Viterbi decoding may be carried out, namely the register exchange mechanism and the traceback mechanism. Register exchange mechanisms, as explained by Ranpara and Sam Ha, store the partially decoded output sequence along the path. The advantage of this approach is that it eliminates the need for traceback and hence reduces latency. However, at each stage the contents of each register need to be copied to the next stage. This makes the hardware complex and more energy consuming than the traceback mechanism. Traceback mechanisms use a single bit to indicate whether the survivor branch came from the upper or lower path. This information is used to trace back the surviving path from the final state to the initial state. This path can then be used to obtain the decoded sequence. Traceback mechanisms prove to be less energy consuming and will hence be the approach followed in this project.

    Decoding may be done using either hard decision inputs or soft

    decision inputs. Inputs that arrive at the receiver may not be exactly zero or one.


    Having been affected by noise, they will have values in between and even higher or

    lower than zero and one. The values may also be complex in nature.

    In the hard decision Viterbi decoder, each input that arrives at the

    receiver is converted into a binary value (either 0 or 1). In the soft decision Viterbi

    decoder, several levels are created and the arriving input is categorized into a level

    that is closest to its value. If the possible values are split into 8 decision levels, these levels may be represented by 3 bits and this is known as a 3-bit soft decision.
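    The difference between the two input types can be sketched with a simple quantizer. This is an illustrative Python example; the function names are hypothetical, and it assumes that a transmitted 0 nominally arrives as 0.0 and a transmitted 1 as 1.0, with eight evenly spaced decision levels in between.

```python
def hard_decision(sample, threshold=0.5):
    """Convert a received value to a single bit."""
    return 1 if sample > threshold else 0

def soft_decision_3bit(sample, low=0.0, high=1.0):
    """Map a received value to the closest of 8 levels:
    0 (strongest 0) up to 7 (strongest 1)."""
    scaled = (sample - low) / (high - low) * 7
    return max(0, min(7, round(scaled)))     # clamp values beyond the nominal range

samples = [-0.2, 0.45, 0.55, 1.3]
hard = [hard_decision(s) for s in samples]
soft = [format(soft_decision_3bit(s), "03b") for s in samples]
```

    The samples 0.45 and 0.55 become a plain 0 and 1 under hard decision, while the soft decision records them as the weakest 0 and the weakest 1, information that a soft decision decoder can exploit.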

    This project uses a hard decision Viterbi decoder for the purpose of developing and verifying the new energy saving algorithm. Once the algorithm is verified, a soft decision Viterbi decoder may be used in place of the hard decision decoder. Figure 4.10 shows the various stages required to decode data using the Viterbi algorithm. The decoding mechanism comprises three major stages, namely the Branch Metric Computation Unit, the Path Metric Computation and Add-Compare-Select (ACS) Unit, and the Traceback Unit. A schematic representation of the decoder is shown below.

    Figure 4.10: Schematic representation of the Viterbi decoding block


    CHAPTER-5

    METHODS AND TYPES OF VITERBI DECODER

    5.1 REGISTER EXCHANGE METHOD

    The register exchange (RE) method is conceptually the simplest and a commonly used technique. Because of the large power consumption and large area required in VLSI implementations of the RE method, the traceback (TB) method is the preferred method in the design of large constraint length, high performance Viterbi decoders. In the register exchange method, a register assigned to each state contains information bits for the survivor path from the initial state to the current state. In fact, the register keeps the partially decoded output sequence along the path, as illustrated in Figure 5.1. The register of state S1 at t=3 contains '101'. This is the decoded output sequence along the bold path from the initial state.

    Figure 5.1 Register Exchange Method

    The register-exchange method eliminates the need to trace back

    since the register of the final state contains the decoded output sequence. However,

    this method results in complex hardware due to the need to copy the contents of all the

    registers in a stage to the next stage. The survivor path information is applied to the

    least significant bit of each register, and all the registers perform a shift left operation

    at each stage to make room for the next bits. Hence, each register fills in the survivor


    path information from the least significant bit toward the most significant bit. The

    scheme is called shift update.

    The shift update method is simple in implementation but causes high

    switching activity due to the shift operation and, hence, results in high power

    dissipation.
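    The shift update scheme can be sketched as a single function applied once per trellis stage. This is an illustrative Python model; the function name, the four-state trellis, the state numbering and the survivor decisions are hypothetical values chosen for the example, not those of Figure 5.1.

```python
def register_exchange_step(registers, decisions):
    """One stage of shift update.
    decisions[s] = (surviving predecessor of state s, information bit)."""
    # Every register is rewritten at every stage: copy the survivor's
    # register, shift left, and place the new bit in the least significant bit.
    return [(registers[pred] << 1) | bit for pred, bit in decisions]

registers = [0, 0, 0, 0]                      # one register per state
stages = [
    [(0, 0), (2, 0), (0, 1), (2, 1)],         # hypothetical survivor decisions, t = 1
    [(0, 0), (2, 0), (0, 1), (2, 1)],         # t = 2
    [(0, 0), (2, 0), (1, 1), (3, 1)],         # t = 3
]
for decisions in stages:
    registers = register_exchange_step(registers, decisions)

sequence_in_state_2 = format(registers[2], "03b")
```

    After three stages one register holds the partially decoded sequence 101 without any traceback, but all four registers were copied and shifted at every stage, which is the switching activity the text identifies as the source of high power dissipation.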

    5.2 Trace back mechanism

    Register exchange mechanisms, as explained by Ranpara and Sam Ha, store the partially decoded output sequence along the path. The advantage of this approach is that it eliminates the need for traceback and hence reduces latency. However, at each stage the contents of each register need to be copied to the next stage. This makes the hardware complex and more energy consuming than the traceback mechanism. Traceback mechanisms use a single bit to indicate whether the survivor branch came from the upper or lower path. This information is used to trace back the surviving path from the final state to the initial state. This path can then be used to obtain the decoded sequence. Traceback mechanisms prove to be less energy consuming and will hence be the approach followed in this project.

    5.3 TYPES OF VITERBI DECODING

    In order to realize a certain coding scheme, a suitable measure of similarity, or distance metric, between two code words is vital. The two important metrics used to measure the distance between two code words are the Hamming distance and the Euclidean distance; the decoder adopts one of them depending on the code scheme, required accuracy, channel characteristics and demodulator type.

    5.3.1 HARD DECISION VITERBI DECODING

    In hard-decision decoding, the path through the trellis is determined using the Hamming distance measure. Thus, the optimal path through the trellis is the path with the minimum Hamming distance. The Hamming distance can be defined as the number of bits that differ between the observed symbol at the decoder and the symbol sent from the encoder. Furthermore, hard-decision decoding applies one-bit quantization to the received bits. Hard-decision decoding takes a stream of bits, say


    from the 'threshold detector' stage of a receiver, where each bit is considered definitely one or zero. For example, for binary signalling, received pulses are sampled and the resulting voltages are compared with a single threshold. If a voltage is greater than the threshold, it is considered to be definitely a 'one', say, regardless of how close it is to the threshold. If it is less, it is definitely a 'zero'.

    5.3.2 SOFT DECISION VITERBI DECODING

    Soft-decision decoding is applied for maximum likelihood decoding when the data is transmitted over a Gaussian channel. In contrast to hard-decision decoding, soft-decision decoding uses multi-bit quantization for the received bits, and the Euclidean distance as a distance measure instead of the Hamming distance. The demodulator input is now an analog waveform and is usually quantized into different levels in order to help the decoder decide more easily. A 3-bit quantization results in an 8-ary output. Soft-decision decoding requires a stream of 'soft bits' where we get not only the 1 or 0 decision but also an indication of how certain we are that the decision is correct. One way of implementing this would be to make the threshold detector generate, instead of 0 or 1, say:

    000 (definitely 0), 001 (probably 0), 010 (maybe 0), 011 (guess 0),

    100 (guess 1), 101 (maybe 1), 110 (probably 1), 111 (definitely 1).

    We may call the last two bits 'confidence' bits. This is easy to do with seven voltage thresholds rather than one. This helps when we anticipate errors and have some 'forward error correction' coding built into the transmission.


    CHAPTER-6

    Applications

    The Viterbi algorithm has a wide range of applications, from satellite and space
    communications to DNA sequence analysis and optical character recognition.

    Optical character recognition of text was investigated by Neuhoff. The initial
    approach considered was to create a dictionary which simulated vocabularies. Each
    time a character was read by the optical reader, the dictionary was searched for
    the most likely estimate. The huge computational and storage requirements of this
    approach made it impractical. Another approach makes use of statistical
    information about the language, such as the relative frequency of letter pairs.
    The maximum a posteriori (MAP) estimate of a word is determined based on its
    probability as the output of the source model. The Viterbi algorithm may then be
    used to perform this MAP sequence estimation.
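The letter-pair idea can be sketched as a small Viterbi search in Python. Everything numeric here is invented for illustration: the three-letter alphabet, the starting, letter-pair and reader-confusion probabilities, and the function name `viterbi_map` are assumptions, not values from Neuhoff's work.

```python
import math


def viterbi_map(observed, alphabet, start_p, pair_p, confuse_p):
    """Most probable letter sequence given the characters the reader reported."""
    # score[c] = best log-probability of any sequence ending in letter c
    score = {c: math.log(start_p[c]) + math.log(confuse_p[c][observed[0]])
             for c in alphabet}
    back = []
    for obs in observed[1:]:
        new_score, pointers = {}, {}
        for c in alphabet:
            prev = max(alphabet, key=lambda p: score[p] + math.log(pair_p[p][c]))
            new_score[c] = (score[prev] + math.log(pair_p[prev][c])
                            + math.log(confuse_p[c][obs]))
            pointers[c] = prev
        score = new_score
        back.append(pointers)
    # Trace back from the best final letter
    path = [max(alphabet, key=lambda c: score[c])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path))


ALPHABET = "quv"
START = {"q": 0.4, "u": 0.3, "v": 0.3}
# PAIR[a][b]: probability that letter b follows letter a (placeholder values)
PAIR = {"q": {"q": 0.01, "u": 0.98, "v": 0.01},
        "u": {"q": 0.2, "u": 0.2, "v": 0.6},
        "v": {"q": 0.2, "u": 0.6, "v": 0.2}}
# CONFUSE[true][seen]: probability the reader reports `seen` for `true`
CONFUSE = {"q": {"q": 0.9, "u": 0.05, "v": 0.05},
           "u": {"q": 0.05, "u": 0.6, "v": 0.35},
           "v": {"q": 0.05, "u": 0.35, "v": 0.6}}

print(viterbi_map("qv", ALPHABET, START, PAIR, CONFUSE))   # qu
```

Taken on its own, the second character would be read as 'v'. The letter-pair statistics make 'u' after 'q' so much more likely that the sequence estimate becomes "qu".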

    An interesting application, discussed by Metzner, is the use of Viterbi decoding
    with soft decision to increase the probability of successfully transmitting a
    data packet during a meteor burst. Since meteor trails are made up of ionized
    material, they reflect radio signals and can be used for communications. Some

    characteristics of such meteor burst communication and descriptions of its practical

    applications are detailed in. Metzner showed that convolutional codes with soft

    decision were considerably better for meteor burst applications as compared to Reed-

    Solomon codes.

    Low-power applications of the Viterbi decoder are particularly relevant to many
    digital communication and recording systems today. As described by Kawokgy and
    Salama, such systems are increasingly used in wireless applications which, being
    battery operated, require low power consumption. In addition, these systems
    require processing speeds of over 100 Mbps to allow multimedia transmission.
    Following this trend, many papers have been written on designing low-power
    Viterbi decoding algorithms targeted at next-generation wireless applications,
    particularly CDMA systems. Some of the energy-saving ideas that have been
    investigated are described in the next section.


    6.1 Research Work

    In mobile networks, decoding capabilities are limited by the receiver which is

    a mobile handset. As such, it has limited resources of energy and computation power.

    Another factor that affects wireless communication is that bandwidth is expensive.

    Therefore, there is a high demand for codes that can correct errors very efficiently

    while at the same time utilizing minimum energy. Hence, a lot of the past research has

    been focused on how this may be achieved.

    The fixed T-algorithm is an optimization of the Viterbi algorithm which applies
    a pruning threshold to the accumulated path metrics of the Viterbi decoder.
    Instead of storing the survivor paths for all 2^(K-1) states, where K is the
    constraint length, only the most likely paths are kept at every trellis stage.
    This results in fewer paths being found and stored. Figure 6.1 shows the result
    of an experiment conducted by Henning and Chakrabarti [34], which compares
    normalized energy estimates for the Viterbi and the fixed T-algorithm decoders
    as the signal to noise ratio (Eb/No) and code rate vary.

    Figure 6.1: Normalised energy estimates for the Viterbi and fixed T-algorithm
    (Tf) decoders as code rate and signal to noise ratio (Eb/No) vary.

    From the graph, it is estimated that a 33% to 83% reduction in energy
    consumption can be achieved when the signal to noise ratio is between 2.1 and
    4 dB. Another approach has been to develop an adaptive T-algorithm, which
    adjusts the parameters of the decoder based on real-time variations in signal to
    noise ratio (SNR), code rate and maximum acceptable bit-error rate. The
    parameters adjusted are the truncation length and pruning threshold of the
    T-algorithm, along with trace-back memory management. Henning and Chakrabarti
    demonstrate in their paper how this can achieve a potential energy reduction of
    70% to 97.5% compared to Viterbi decoding. Truncation length refers to the
    number of bits a path is followed back before a decision is made on the bit that
    was encoded. By reducing the truncation length, more bits can be decoded per
    traceback. Similarly, lowering the pruning threshold means fewer paths need to
    be found and stored. Both of these measures can reduce the number of memory
    accesses required by the decoder and hence reduce energy consumption.

    However, these measures may cause significant reduction in the error

    correcting capability of the decoder.
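The pruning step of the T-algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions: path metrics are distances (smaller is better), a path survives when its metric is within the threshold of the best metric at that trellis stage, and the function name and example metrics are hypothetical.

```python
def prune_paths(path_metrics, threshold):
    """Keep only states whose metric is within `threshold` of the best one.

    `path_metrics` maps a trellis state to its accumulated path metric
    (smaller is better, as with Hamming or Euclidean distance).
    """
    best = min(path_metrics.values())
    return {state: metric for state, metric in path_metrics.items()
            if metric - best <= threshold}


metrics = {0b00: 2, 0b01: 7, 0b10: 3, 0b11: 9}
survivors = prune_paths(metrics, threshold=2)
print(survivors)        # {0: 2, 2: 3}
```

With a threshold of 2 only two of the four paths are stored. A threshold of 0 would keep the single best path, which saves the most memory accesses but leaves no alternative if that path later turns out to be wrong.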

    Nevertheless, adjusting these parameters based on real-time changes in the
    channel can optimize energy consumption. Figure 6.2 shows the results of an
    experiment conducted by Henning and Chakrabarti [34] in which the pruning
    threshold and truncation length are adapted to keep the bit-error rate below
    0.0037. From the graph, it is estimated that an energy consumption reduction of
    70% to 97.5% compared to the Viterbi decoder can be achieved when the signal to
    noise ratio is between 2.1 and 4 dB.

    However, the adaptive T-algorithm does require an additional overhead in

    terms of monitoring the real-time variations and choosing the appropriate truncation

    and threshold parameters from a lookup table. Since these operations are not complex

    it is assumed that their energy consumption is negligible.
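The lookup-table step might look like the Python sketch below. All table entries are placeholders invented for the example and are not taken from Henning and Chakrabarti. For brevity the table is indexed by SNR only, whereas the adaptive T-algorithm described above also takes code rate and the acceptable bit-error rate into account.

```python
# (minimum Eb/No in dB, truncation length, pruning threshold).
# All numbers are placeholders invented for this example.
PARAMETER_TABLE = [
    (4.0, 20, 4),
    (3.0, 30, 6),
    (2.1, 40, 8),
]
DEFAULT = (50, 10)   # used below the lowest table entry (also a placeholder)


def select_parameters(snr_db):
    """Return (truncation_length, pruning_threshold) for the measured SNR."""
    for min_snr, trunc_len, threshold in PARAMETER_TABLE:
        if snr_db >= min_snr:
            return trunc_len, threshold
    return DEFAULT


print(select_parameters(3.2))   # (30, 6)
```

Because the selection is a handful of comparisons per update, its cost is small next to the path-metric and traceback memory accesses it helps to avoid.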


    Figure 6.2: Normalised energy estimates for the Viterbi and adaptive T-algorithm
    (Ta) decoders as code rate and signal to noise ratio (Eb/No) vary while
    maintaining bit-error rate below 0.0037.

    Yet another approach, put forward by Jie Jin and Chi-Ying Tsui at the 2006
    International Symposium on Low Power Electronics and Design, was to integrate
    the T-algorithm with a Scarce-State-Transition (SST) decoder structure. The SST
    structure first pre-decodes the received data (Rx) by performing the inverse
    operation of the encoder. The pre-decoded signal (Pre-Dec) contains the original
    message along with bit errors. Pre-Dec is re-encoded and XORed with Rx, the
    original received data. The result consists mainly of 0s, with 1s only where
    errors have occurred. This output is fed to the Viterbi decoder, which corrects
    the errors. Finally, Pre-Dec is added to the decoded output of the Viterbi
    decoder using modulo-2 addition. When the channel bit-error rate is low, most of
    the Viterbi decoder output bits are zero, which reduces switching activity.
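The SST data flow can be traced with a small Python model. To keep it short, the sketch assumes a rate-1/2, K=3 code with generators 7 and 5 (octal) rather than the K=9 code of this project, a message ending in two tail zeros, and hard-decision decoding. For this particular code the inverse of the encoder is simply y1[t] XOR y2[t] = u[t-1]. All function names are hypothetical.

```python
def encode(bits):
    """Rate-1/2, K=3 encoder with generators 7 and 5 (octal)."""
    s1 = s2 = 0                       # previous two input bits
    out = []
    for b in bits:
        out.append((b ^ s1 ^ s2, b ^ s2))
        s1, s2 = b, s1
    return out


def pre_decode(rx):
    """Inverse of the encoder: for this code y1[t] ^ y2[t] = u[t-1]."""
    est = [a ^ b for a, b in rx[1:]]
    return est + [0]                  # the last input is a known tail zero


def viterbi(rx):
    """Hard-decision Viterbi decoder for the same code (4 states)."""
    INF = float("inf")
    metric = [0, INF, INF, INF]       # start in the all-zero state
    paths = [[], [], [], []]
    for r in rx:
        new_metric, new_paths = [INF] * 4, [None] * 4
        for state in range(4):
            if metric[state] == INF:
                continue
            s1, s2 = state >> 1, state & 1
            for b in (0, 1):
                y = (b ^ s1 ^ s2, b ^ s2)
                m = metric[state] + (y[0] != r[0]) + (y[1] != r[1])
                nxt = (b << 1) | s1
                if m < new_metric[nxt]:
                    new_metric[nxt], new_paths[nxt] = m, paths[state] + [b]
        metric, paths = new_metric, new_paths
    return paths[metric.index(min(metric))]


def sst_decode(rx):
    """Return (decoded message, residual sequence fed to the Viterbi decoder)."""
    pre = pre_decode(rx)                          # message plus bit errors
    re_encoded = encode(pre)
    residual = [(a ^ c, b ^ d) for (a, b), (c, d) in zip(rx, re_encoded)]
    correction = viterbi(residual)                # mostly zeros when errors are rare
    return [p ^ c for p, c in zip(pre, correction)], residual


message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]          # last two bits are tail zeros
rx = encode(message)
rx[5] = (rx[5][0] ^ 1, rx[5][1])                  # one channel bit error
decoded, residual = sst_decode(rx)
print(decoded == message)                         # True
```

In this run only two of the ten residual symbols are non-zero, so the Viterbi decoder spends most of its time on the all-zero path, which is the source of the reduced switching activity.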

    The SST structure was used to reduce the switching activity of the decoder and
    was combined with the T-algorithm to reduce the average number of
    Add-Compare-Select calculations. In their experiments, Jie Jin and Chi-Ying Tsui
    achieved a 30% to 76% reduction in power consumption over the traditional
    Viterbi design for SNR values ranging from 4 dB to 12 dB.

    A different approach, investigated by Sherif Welsen Shaker, Salwa Hussein
    Elramly and Khaled Ali Shehata and presented at a telecommunications forum held
    in Belgrade in 2009, was to use the traceback approach with clock gating. In
    clock gating, the clock of each register is enabled only when the register
    updates its survivor path information, which reduces power dissipation. Their
    simulations showed a 30% reduction in dynamic power dissipation, which gives a
    good indication of the power reduction to expect on implementation.
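The effect of clock gating on register activity can be estimated with a simple behavioural count in Python. This is a rough sketch, not the authors' method: it assumes a register needs a clock edge only in cycles where its stored value changes, and the function name and the example history are hypothetical.

```python
def count_clock_events(register_history):
    """Compare clock activity with and without gating.

    `register_history[t][i]` is the value register i holds after cycle t.
    Without gating every register is clocked every cycle; with gating a
    register is clocked only in cycles where its contents change.
    """
    ungated = gated = 0
    previous = register_history[0]
    for current in register_history[1:]:
        ungated += len(current)
        gated += sum(1 for old, new in zip(previous, current) if old != new)
        previous = current
    return ungated, gated


history = [[0, 0, 0, 0], [1, 0, 0, 0], [1, 0, 1, 0], [1, 0, 1, 0]]
print(count_clock_events(history))   # (12, 2)
```

The ratio of the two counts indicates how much clock activity gating could remove. It is not a power figure, since the gating logic itself also consumes power.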

    A similar approach, investigated by Ranpara and Sam Ha and presented at the
    International ASIC Conference in Washington in 1999, was the use of clock gating
    in combination with a concept known as toggle filtering. Signals may arrive at
    the inputs of a combinational block at different times, which causes the block
    to go through several intermediate transitions before it stabilizes. By blocking
    early signals until all input signals have arrived, the number of intermediate
    transitions can be reduced and hence power dissipation can be minimized. This
    mechanism, called toggle filtering, was used by Ranpara et al. to reduce the
    energy consumption of the Viterbi decoder.

    More recently, a new approach targeted at wireless applications has been
    introduced [38], which involves a pre-traceback architecture for the survivor
    path memory unit. The start state of decoding is obtained directly through a
    pointer register pointing to the target traceback state, instead of being
    estimated through a recursive traceback operation. This approach makes use of
    the similarity between the bit write and decode traceback operations to
    introduce the pre-traceback operation. Effectively a trace-forward type of
    operation, it results in a 50% reduction in survivor memory read operations.
    Apart from improving latency by 25%, implementation results predict up to 11.9%
    better energy efficiency compared to a conventional traceback architecture for
    typical wireless applications.


    6.3 Low power consumption

    For the branch metric of the Viterbi decoder, our design employs a soft-decision
    method to improve its error-correction capability. In order to find the survivor
    path efficiently, we modify the classical Viterbi decoding algorithm. The
    modified algorithm is similar to the register-exchange method with lower
    latency, but uses RAM instead of register banks for recording the output
    bit-stream of the survivor path, which lowers the power consumption of the
    design. The chip uses about 28.6 K gates in TSMC 0.18 µm CMOS technology. The
    power consumption of the chip is about 19.5 mW at 100 MHz. The power usage in
    the implementation is around 367 mW.

    Figure 6.3: Demonstration Of Power Consumption


    6.4 Summary

    This chapter has explained the decoding mechanism of the Viterbi decoder in
    detail and described a few of its applications. A number of energy-saving
    techniques that have been investigated in the past have been discussed. The next
    chapter gives a detailed description of the proposed energy-saving algorithm
    that will be used in this project.


    CHAPTER-7

    SYNTHESIS AND SIMULATION RESULTS

    7.1 Sample code

    Code for D flip-flop

    /******************************************************/
    module pDFF(DATA, QOUT, CLOCK, RESET);
    /******************************************************/
    parameter WIDTH = 1;
    input  [WIDTH-1:0] DATA;
    input              CLOCK, RESET;
    output [WIDTH-1:0] QOUT;
    reg    [WIDTH-1:0] QOUT;

    // Positive-edge-triggered register with asynchronous active-low reset
    always @(posedge CLOCK or negedge RESET)
      if (~RESET) QOUT <= 0;
      else        QOUT <= DATA;

    endmodule


    wire [`WD_CODE-1:0]

    wire [`WD_DIST-1:0] D0,D1,D2,D3,D4,D5,D6,D7;// output distances

    reg [`WD_CODE-1:0] CodeRegister ;

    always @( posedge Clock2 or negedge Reset)

    begin

    if (~Reset) CodeRegister


    Code for Hamming distance calculation

    module HARD_DIST_CALC (InputSymbol, BranchOutput, OutputDistance);
    // desc. : performs 2-bit Hamming distance calculation
    /*-----------------------------------*/
    input  [`WD_CODE-1:0] InputSymbol, BranchOutput;
    output [`WD_DIST-1:0] OutputDistance;
    reg    [`WD_DIST-1:0] OutputDistance;
    wire MS, LS;

    // Per-bit differences between the received symbol and the branch output
    assign MS = (InputSymbol[1] ^ BranchOutput[1]);
    assign LS = (InputSymbol[0] ^ BranchOutput[0]);

    // Distance = MS + LS, written out as a half adder
    always @(MS or LS)
    begin
      OutputDistance[1] = MS & LS;
      OutputDistance[0] = MS ^ LS;
    end

    endmodule


    assign wA = PolyA & BranchID;
    assign wB = PolyB & BranchID;

    always @(wA or wB)
    begin
      EncOut[1] = (((wA[0]^wA[1]) ^ (wA[2]^wA[3])) ^ ((wA[4]^wA[5]) ^ (wA[6]^wA[7])) ^ wA[8]);
      EncOut[0] = (((wB[0]^wB[1]) ^ (wB[2]^wB[3])) ^ ((wB[4]^wB[5]) ^ (wB[6]^wB[7])) ^ wB[8]);
    end

    Code for Viterbi encoder

    /******************************************************/
    module viterbi_encode9(X, Y, Clock, Reset);
    /******************************************************/
    input        X, Clock, Reset;
    output [1:0] Y;

    wire [1:0] Y, Yt;
    wire       X, Clock, Reset;
    wire [8:0] PolyA, PolyB;
    wire [8:0] wA, wB, ShReg;

    // Generator polynomials. Only one pair may drive PolyA/PolyB; the pair
    // commented out below is the same generators with the tap order reversed.
    // assign PolyA = 9'b111_101_011;
    // assign PolyB = 9'b101_110_001;
    assign PolyA = 9'b110_101_111;
    assign PolyB = 9'b100_011_101;

    assign wA = PolyA & ShReg;
    assign wB = PolyB & ShReg;

    // 9-bit shift register: ShReg[8] is the current input bit
    assign ShReg[8] = X;
    pDFF dff7(ShReg[8], ShReg[7], Clock, Reset);
    pDFF dff6(ShReg[7], ShReg[6], Clock, Reset);
    pDFF dff5(ShReg[6], ShReg[5], Clock, Reset);
    pDFF dff4(ShReg[5], ShReg[4], Clock, Reset);
    pDFF dff3(ShReg[4], ShReg[3], Clock, Reset);
    pDFF dff2(ShReg[3], ShReg[2], Clock, Reset);
    pDFF dff1(ShReg[2], ShReg[1], Clock, Reset);
    pDFF dff0(ShReg[1], ShReg[0], Clock, Reset);

    // Each output bit is the parity of the tapped shift-register bits
    assign Yt[1] = wA[0] ^ wA[1] ^ wA[2] ^ wA[3] ^ wA[4] ^ wA[5] ^ wA[6] ^ wA[7] ^ wA[8];
    assign Yt[0] = wB[0] ^ wB[1] ^ wB[2] ^ wB[3] ^ wB[4] ^ wB[5] ^ wB[6] ^ wB[7] ^ wB[8];

    // Registered outputs
    pDFF dffy1(Yt[1], Y[1], Clock, Reset);
    pDFF dffy0(Yt[0], Y[0], Clock, Reset);

    endmodule
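A behavioural reference model of this encoder can be useful for generating test vectors. The Python sketch below is an illustration, not part of the project code. It assumes the second pair of generator constants in the listing (110_101_111 and 100_011_101) is the active one, and it ignores the one-cycle delay added by the output flip-flops. The function names are hypothetical.

```python
POLY_A = 0b110_101_111   # taps for Y[1]
POLY_B = 0b100_011_101   # taps for Y[0]


def parity(value):
    """XOR of all bits of `value`."""
    return bin(value).count("1") & 1


def encode(bits):
    """Return a list of (Y[1], Y[0]) pairs, one per input bit."""
    shreg = 0                             # models ShReg[8:0]; all zeros after reset
    out = []
    for x in bits:
        shreg = (x << 8) | (shreg >> 1)   # newest bit enters at ShReg[8]
        out.append((parity(shreg & POLY_A), parity(shreg & POLY_B)))
    return out


impulse = encode([1] + [0] * 8)
print("".join(str(y1) for y1, _ in impulse))   # 110101111
print("".join(str(y0) for _, y0 in impulse))   # 100011101
```

Feeding a single 1 followed by zeros reads the two generator constants back out bit by bit, which is a quick check that the taps are wired in the intended order.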

    Code for Viterbi decoder

    // Module      : VITERBIDECODER
    // File        : decoder.v
    // Description : Top Level Module of Viterbi Decoder
    module VITERBIDECODER (Reset, CLOCK, Active, Code, DecodeOut);

    input Reset, CLOCK, Active;
    input [`WD_CODE-1:0] Code;
    output DecodeOut;

    wire [`WD_DIST*2*`N_ACS-1:0] Distance;       // BMG Output
    wire [`WD_FSM-1:0] ACSSegment;
    wire [`WD_DEPTH-1:0] ACSPage;                // Control Output
    wire CompareStart, Hold, Init;
    wire [`N_ACS-1:0] Survivors;                 // ACS Output
    wire [`WD_STATE-1:0] LowestState;
    wire TB_EN;
    wire RAMEnable;
    wire ReadClock, WriteClock, RWSelect;
    wire [`WD_RAM_ADDRESS-1:0] AddressRAM;       // RAM AddressBus, generated by TBU and ACSU
    wire [`WD_RAM_DATA-1:0] DataRAM;             // RAM Databus
    wire [`WD_RAM_DATA-1:0] DataTB;
    wire [`WD_RAM_ADDRESS-`WD_FSM-1:0] AddressTB;
    wire Clock1, Clock2;

    // for metric memory connection
    wire [`WD_METR*2*`N_ACS-1:0] MMPathMetric;
    wire [`WD_METR*`N_ACS-1:0] MMMetric;
    wire [`WD_FSM-2:0] MMReadAddress;
    wire [`WD_FSM-1:0] MMWriteAddress;
    wire MMBlockSelect;

    // instantiation of Viterbi Decoder Modules
    CONTROL ctl (Reset, CLOCK, Clock1, Clock2, ACSPage, ACSSegment,
                 Active, CompareStart, Hold, Init, TB_EN);
    BMU bmu (Reset, Clock2, ACSSegment, Code, Distance);
    ACSUNIT acs (Reset, Clock1, Clock2, Active, Init, Hold, CompareStart,
                 ACSSegment, Distance, Survivors, LowestState,
                 MMReadAddress, MMWriteAddress, MMBlockSelect, MMMetric,
                 MMPathMetric);
    MMU mmu (CLOCK, Clock1, Clock2, Reset, Active, Hold, Init, ACSPage,
             ACSSegment[`WD_FSM-1:1], Survivors,
             DataTB, AddressTB,
             RWSelect, ReadClock, WriteClock, RAMEnable, AddressRAM,
             DataRAM);
    TBU tbu (Reset , Clock1, Clock2, TB_EN, Init , Hold, LowestState

    Code for main memory unit

    // Module      : MMU
    // Description : Description of MMU Unit in Viterbi Decoder
    module MMU (CLOCK, Clock1, Clock2, Reset, Active, Hold, Init, ACSPage,
                ACSSegment_minusLSB, Survivors,
                DataTB, AddressTB,
                RWSelect, ReadClock, WriteClock,
                RAMEnable, AddressRAM, DataRAM);

    // connection from Control
    input CLOCK, Clock1, Clock2, Reset, Active, Hold, Init;
    input [`WD_DEPTH-1:0] ACSPage;
    input [`WD_FSM-2:0] ACSSegment_minusLSB;
    // connection from ACS Unit
    input [`N_ACS-1:0] Survivors;
    // connection from/to TB Unit
    output [`WD_RAM_DATA-1:0] DataTB;
    input [`WD_RAM_ADDRESS-`WD_FSM-1:0] AddressTB;
    // connection from/to RAM
    output RWSelect, ReadClock, WriteClock, RAMEnable;
    output [`WD_RAM_ADDRESS-1:0] AddressRAM;
    inout [`WD_RAM_DATA-1:0] DataRAM;

    wire [`WD_RAM_DATA-1:0] WrittenSurvivors;
    reg dummy, SurvRDY;
    reg [`WD_RAM_ADDRESS-1:0] AddressRAM;
    reg [`WD_DEPTH-1:0] TBPage;
    wire [`WD_DEPTH-1:0] TBPage_;
    wire [`WD_DEPTH-1:0] ACSPage;
    wire [`WD_TB_ADDRESS-1:0] AddressTB;

    // Read and Write clock
    // Dummy variable used because Write Clock only occurs every 2 Clocks.
    always @(posedge Clock2 or negedge Reset)
    if (~Reset) dummy


    // every negedge Clock2 : - TBPage is decreased by 1, OR
    //                        - When Init is Active, TBPage equals ACSPage - 1
    always @(negedge Clock2 or negedge Reset)
    begin
    if (~Reset) begin
    TBPage


    end

    end

    endmodule

    module ACSSURVIVORBUFFER (Reset, Clock1, Active, SurvRDY, Survivors,
                              WrittenSurvivors);
    //
    // To accommodate the use of 8 bit wide RAM DATA BUS, the Survivor
    // (which is only 4 on every clock) must be buffered first.
    /*-----------------------------------*/
    input Reset, Clock1, Active, SurvRDY;
    input [`N_ACS-1:0] Survivors;
    output [`WD_RAM_DATA-1:0] WrittenSurvivors;
    wire [`WD_RAM_DATA-1:0] WrittenSurvivors;
    reg [`N_ACS-1:0] WrittenSurvivors_;

    always @(posedge Clock1 or negedge Reset)
    begin
      if (~Reset) WrittenSurvivors_ = 0;
      else if (Active)
        WrittenSurvivors_ = Survivors;
    end

    Code for ACS unit (add, compare and select)

    // Module      : ACSUNIT
    // File        : acs.v
    // Description : Description of ACS Unit in Viterbi Decoder
    module ACSUNIT (Reset, Clock1, Clock2, Active, Init, Hold, CompareStart,
                    ACSSegment, Distance, Survivors, LowestState,
                    MMReadAddress, MMWriteAddress, MMBlockSelect, MMMetric,
                    MMPathMetric);
    /*-----------------------------------*/
    // ACS UNIT consists of :
    // - 4 ACS modules (ACS)
    // - RAM Interface
    // - State with smallest metric finder (LOWESTPICK)
    /*-----------------------------------*/

    input Reset , Clock1, Cloc