vishnupriya-akinapelli
1. INTRODUCTION
1.1. INTRODUCTION:
A filter is a component that passes a certain band of frequencies and attenuates the other frequency components. Filters are basic building blocks in Digital Signal Processing (DSP) applications. Digital filters fall into two classes: Finite Impulse Response (FIR) filters and Infinite Impulse Response (IIR) filters.
An FIR filter is a digital filter whose output depends on a finite number of input samples. Its impulse response settles to zero after a finite number of sample intervals, whereas the impulse response of an IIR filter can continue over an infinite number of samples.
In this project we designed an FIR filter that uses fewer resources and has lower delay by applying the Distributed Arithmetic (DA) algorithm. Implementing an FIR filter with the direct method, i.e. Multiply and Accumulate (MAC), consumes considerable area (resources) and is expensive on an FPGA. DA overcomes this drawback with a multiplier-less architecture, which makes it a very efficient solution for LUT-based FPGA architectures.
The main problem of DA is that the LUT size will increase exponentially with the order
of the filter. To overcome this problem a hardware-efficient DA architecture is used which
reduces the LUT size by modifying the architecture of the filter to achieve high performance.
CVR College of Engineering (VLSI) Page 1
1.2. FIR Filter:
The transfer function of an FIR filter is described by a single coefficient polynomial B(z), with no feedback polynomial. An FIR filter needs a much higher order polynomial to match the response of an equivalent IIR filter, which results in a longer delay.
$$H(z) = \frac{B(z)}{z^{N}}$$
$$y[n] = b_0 x[n] + b_1 x[n-1] + b_2 x[n-2] + \dots + b_N x[n-N]$$
N is the filter order; an Nth-order filter has (N + 1) terms on the right-hand side, which are commonly referred to as taps.
This equation can also be expressed as a convolution of the coefficient sequence $b_i$ with the input signal:
$$y[n] = \sum_{i=0}^{N} b_i x[n-i]$$
That is, the filter output is a weighted sum of the current and a finite number of previous
values of the input.
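The weighted sum can be sketched in a few lines of Python. This is only an illustration of the difference equation, not the hardware design of this project; the function name fir_filter and the 2-tap averaging coefficients are assumptions chosen for the example, and samples before n = 0 are taken as zero.

```python
def fir_filter(b, x):
    """Direct evaluation of y[n] = sum_{i=0}^{N} b[i] * x[n-i], with x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for i, coeff in enumerate(b):
            if n - i >= 0:                 # skip samples before the start of the input
                acc += coeff * x[n - i]
        y.append(acc)
    return y

b = [0.5, 0.5]            # hypothetical 2-tap (first-order) averaging filter
x = [1, 2, 3, 4, 5]       # example input samples
print(fir_filter(b, x))   # each output is the mean of the current and previous input
```

Each output sample needs one multiplication per tap, which is the cost that the DA approach discussed later tries to avoid.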
1.3. Block Diagram of FIR filter:
Fig1.1:Block diagram of FIR filter
1.4. Spartan-3E features:
The Spartan-3E family reduces system cost by offering the lowest cost-per-logic of any FPGA family, supporting the lowest-cost configuration solutions including commodity serial (SPI) and parallel flash memories, and efficiently integrating the functions of many chips into a single FPGA.
Advanced, Low-Cost Features
Five devices with 100K to 1.6M system gates
From 66 to 376 I/Os with package and density migration
Up to 648K bits of block RAM and up to 231K bits of distributed RAM
Up to 36 embedded 18x18 multipliers for high-performance DSP applications
Up to eight Digital Clock Managers
Cost-Saving System Interfaces and Solutions
Support for Xilinx Platform Flash as well as commodity serial(SPI) and byte-wide flash memory for configuration
Easy-to-implement interfaces to DDR memory
Support for 18 common I/O standards, including PCI-X, mini-LVDS, and RSDS
Industry-Leading Design Tools and IP
ISE design tools to shorten design and verification time
Hundreds of pre-verified, pre-optimized Intellectual Property (IP) cores and reference designs
ChipScope Pro system-debugging environment
Easy-to-Use, Low-Cost FPGA Development Systems
Complete Spartan-3E Standard Kit available for only $149 USD
Includes XC3S500E FPGA, SPI Flash, 32Mb DDR memory support for USB2.0
1.5. CONCLUSION:
In this chapter we discussed the FIR filter and its block diagram, and described the features of the Spartan-3E FPGA family.
2. LITERATURE SURVEY
2.1 INTRODUCTION:
A signal carries information from a source to a destination, and there are different types of signals. Filters play an essential role in Digital Signal Processing (DSP). A filter is a system that passes certain frequency components and rejects others. Filters are designed to meet specifications for the desired properties of the system. An FPGA is a prototyping device that can be used to implement such algorithms in hardware.
2.2. Signal:
In the field of communications, signal processing and in electrical engineering more
generally, a signal is any time-varying or spatial-varying quantity.
In the physical world, any quantity measurable through time or over space can be taken as a signal. Within a complex society, any set of human information or machine data can also be taken as a signal. Such information or machine data must all be part of systems existing in the physical world, either living or non-living.
Despite the complexity of such systems, their outputs and inputs can often be represented as
simple quantities measurable through time or across space. In the latter half of the 20th century,
electrical engineering itself separated into several disciplines, specializing in the design and
analysis of physical signals and systems, on one hand and in the functional behavior and
conceptual structure of the complex human and machine systems, on the other. These
engineering disciplines have led the way in the design, study, and implementation of systems that
take advantage of signals as simple measurable quantities in order to facilitate
the transmission, storage and manipulation of information.
2.2.1. Definition of the signal:
In information theory, a signal is a codified message, that is, the sequence of states in a
communication channel that encodes a message.
In the context of signal processing, arbitrary binary data streams are not considered as signals,
but only analog and digital signals that are representations of analog physical quantities.
In a communication system, a transmitter encodes a message into a signal, which is
carried to a receiver by the communications channel. For example, the words "Mary had a little
lamb" might be the message spoken into a telephone. The telephone transmitter converts the
sounds into an electrical voltage signal. The signal is transmitted to the receiving telephone by
wires; and at the receiver it is reconverted into sounds.
In telephone networks, signaling, for example common channel signaling, refers to phone
number and other digital control information rather than the actual voice signal.
Signals can be categorized in various ways. The most common distinction is between
discrete and continuous spaces that the functions are defined over, for example discrete and
continuous time domains. Discrete-time signals are often referred to as time series in other
fields. Continuous-time signals are often referred to as continuous signals even when the signal
functions are not continuous; an example is a square-wave signal.
A second important distinction is between discrete-valued and continuous-valued. Digital
signals are sometimes defined as discrete-valued sequences of quantified values that may or may
not be derived from an underlying continuous-valued physical process. In other contexts, digital
signals are defined as the continuous-time waveform signals in a digital system, representing a
bit-stream. In the first case, a signal that is generated by means of a digital modulation method is
considered as converted to an analog signal, while it is considered as a digital signal in the
second case.
2.2.2. Types of Signals:
2.2.2.1. Discrete-time and continuous time signal:
If for a signal, the quantities are defined only on a discrete set of times, we call it a
discrete-time signal. In other words, a discrete-time real (or complex) signal can be seen as a
function from the set of integers to the set of real (or complex) numbers. Discrete signals can be analyzed in the frequency domain; the Z-transform is usually used to obtain the frequency response of a discrete signal. Discrete signals are denoted by u(k), where k = …, -1, 0, 1, 2, 3, …
A continuous-time real (or complex) signal is any real-valued (or complex-
valued) function which is defined for all time t in an interval, most commonly an infinite
interval. Continuous signals have a continuous frequency spectrum. The Fourier Transform (FT) is used to obtain the frequency response. Continuous signals are denoted by u(t), where t is continuous.
2.2.2.2. Analog and Digital signal:
There are mainly two types of signals encountered in practice, analog and digital. In
short, the difference between them is that digital signals are discrete and quantized, as defined
below, while analog signals possess neither property.
DISCRETIZATION:
One of the fundamental distinctions between different types of signals is
between continuous and discrete time. In the mathematical abstraction, the domain of a
continuous-time (CT) signal is the set of real numbers (or some interval thereof), whereas the
domain of a discrete-time (DT) signal is the set of integers (or some interval). What these
integers represent depends on the nature of the signal.
DT signals often arise via sampling of CT signals. An audio signal, for example consists
of a continually fluctuating voltage on a line that can be digitized by an ADC circuit, wherein the
circuit will read the voltage level on the line, say, every 50 µs. The resulting stream of numbers forms a discrete-time signal that is stored as digital data. Computers and other digital devices are restricted to discrete time.
QUANTIZATION:
If a signal is to be represented as a sequence of numbers, it is impossible to maintain
arbitrarily high precision - each number in the sequence must have a finite number of digits. As a
result, the values of such a signal are restricted to belong to a finite set; in other words, it
is quantized.
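The two steps, discretization and quantization, can be illustrated with a short Python sketch. The 1 kHz test tone, the 4-bit word length and the helper names sample and quantize are assumptions made for this example; only the 50 µs sampling interval comes from the text above.

```python
import math

def sample(signal, period_s, count):
    """Discretization: read a continuous-time function at multiples of the sampling period."""
    return [signal(n * period_s) for n in range(count)]

def quantize(value, bits, full_scale=1.0):
    """Quantization: map a value to the nearest code of a signed `bits`-bit uniform quantizer."""
    levels = 2 ** (bits - 1)                      # codes run from -levels to levels - 1
    code = round(value / full_scale * levels)
    return max(-levels, min(levels - 1, code))    # clip to the representable range

def tone(t):
    """Hypothetical 1 kHz sine wave used as the analog input."""
    return math.sin(2 * math.pi * 1000 * t)

samples = sample(tone, 50e-6, 20)          # one reading every 50 µs (20 kHz sampling rate)
codes = [quantize(s, 4) for s in samples]  # each sample restricted to one of 16 codes
print(codes)
```

With 4 bits the signal can take only 16 distinct values, which shows how a finite word length restricts the signal to a finite set.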
2.3 Filters in signal processing:
In signal processing, a filter is a device or process that removes some unwanted component or feature from a signal. In general, it takes an input that is a function of time and produces an output that is a function of time (usually delayed from the input).
Filtering is a class of signal processing, the defining feature of filters being the complete
or partial suppression of some aspect of the signal. Most often, this means removing
some frequencies and not others in order to suppress interfering signals and reduce
background noise. However, filters do not exclusively act in the frequency domain; especially in
the field of image processing many other targets for filtering exist.
There are many different bases for classifying filters, and these overlap in many different ways; there is no simple hierarchical classification. Filters may be:
analog or digital
discrete-time (sampled) or continuous-time
linear or non-linear
passive or active type of continuous-time filter
infinite impulse response (IIR) or finite impulse response (FIR) type of discrete-time or digital filter
2.3.1. Analog Filter:
Analog filters are a basic building block of signal processing much used in electronics.
Amongst their many applications are the separation of an audio signal before application
to bass, mid-range and tweeter loudspeakers; the combining and later separation of multiple
telephone conversations onto a single channel; the selection of a chosen radio station in a radio
receiver and rejection of others.
Passive linear electronic analogue filters are those filters which can be described
with linear differential equations (linear); they are composed of capacitors, inductors and,
sometimes, resistors (passive) and are designed to operate on continuously varying (analogue)
signals. There are many linear filters which are not analogue in implementation (digital filters), and there are many electronic filters which may not have a passive topology; both of these may have the same transfer function as the filters described here. Analogue filters are most often used in wave filtering applications, that is, where it is required to pass particular frequency components and to reject others from analogue (continuous-time) signals.
2.3.2. Digital Filters:
In electronics, computer science and mathematics, a digital filter is a system that
performs mathematical operations on a sampled, discrete-time signal to reduce or enhance
certain aspects of that signal. This is in contrast to the other major type of electronic filter,
the analog filter, which is an electronic circuit operating on continuous-time analog signals. An
analog signal may be processed by a digital filter by first being digitized and represented as a
sequence of numbers, then manipulated mathematically, and then reconstructed as a new analog
signal. In an analog filter, the input signal is "directly" manipulated by the circuit.
A digital filter system usually consists of an analog-to-digital converter (to sample the
input signal), a microprocessor (often a specialized digital signal processor), and a digital-to-
analog converter. Software running on the microprocessor can implement the digital filter by
performing the necessary mathematical operations on the numbers received from the ADC. In
some high performance applications, an FPGA or ASIC is used instead of a general purpose
microprocessor.
Digital filters may be more expensive than an equivalent analog filter due to their
increased complexity, but they make practical many designs that are impractical or impossible as
analog filters. Since digital filters use a sampling process and discrete-time processing, they
experience latency (the difference in time between the input and the response), which is almost
irrelevant in analog filters.
Digital filters are commonplace and an essential element of everyday electronics such
as radios, cell phones, and stereo receivers.
2.3.3. Passive filter:
Passive implementations of linear filters are based on combinations
of resistors (R), inductors (L) and capacitors (C). These types are collectively known as passive
filters, because they do not depend upon an external power supply and/or they do not contain
active components such as transistors.
Inductors block high-frequency signals and conduct low-frequency signals,
while capacitors do the reverse. A filter in which the signal passes through an inductor, or in
which a capacitor provides a path to ground, presents less attenuation to low-frequency signals
than high-frequency signals and is a low-pass filter. If the signal passes through a capacitor, or
has a path to ground through an inductor, then the filter presents less attenuation to high-
frequency signals than low-frequency signals and is a high-pass filter. Resistors on their own
have no frequency-selective properties, but are added to inductors and capacitors to determine
the time-constants of the circuit, and therefore the frequencies to which it responds.
The inductors and capacitors are the reactive elements of the filter. The number of
elements determines the order of the filter. In this context, an LC tuned circuit being used in a
band-pass or band-stop filter is considered a single element even though it consists of two
components.
At high frequencies (above about 100 megahertz), sometimes the inductors consist of
single loops or strips of sheet metal, and the capacitors consist of adjacent strips of metal. These
inductive or capacitive pieces of metal are called stubs.
2.3.4. Active Filter:
Active filters are implemented using a combination of passive and active (amplifying)
components, and require an outside power source. Operational amplifiers are frequently used in
active filter designs. These can have high Q, and can achieve resonance without the use of
inductors. However, their upper frequency limit is limited by the bandwidth of the amplifiers
used.
2.3.5. Linear- Continuous time filter:
Linear continuous-time circuit is perhaps the most common meaning for filter in the
signal processing world, and simply "filter" is often taken to be synonymous. These are filters
that are designed to remove certain frequencies and allow others to pass. Such a filter is, of
necessity, a linear filter. Any non-linearity will result in the output signal containing components
of frequency which were not present in the input signal.
The modern design methodology for linear continuous-time filters is called network
synthesis. Some important filter families designed in this way are:
Chebyshev filter, which has the best approximation to the ideal response of any filter for a specified order and ripple.
Butterworth filter, which has a maximally flat frequency response.
Bessel filter, which has a maximally flat phase delay.
Elliptic filter, which has the steepest cutoff of any filter for a specified order and ripple.
The difference between these filter families is that they all use a different polynomial
function to approximate to the ideal filter response. This results in each having a
different transfer function.
An older methodology, now largely obsolete but still occasionally encountered, is the image parameter method. Filters designed by this methodology are archaically called "wave filters". Some important filters designed by this method are:
Constant k filter, the original and simplest form of wave filter.
M-derived filter, a modification of the constant k filter with improved cutoff steepness and impedance matching.
2.3.6. Terminology to classify linear filter:
Some terms used to describe and classify linear filters:
The frequency response can be classified into a number of different band
forms describing which frequencies the filter passes (the pass band) and which it rejects
(the stop band);
Low-pass filter – low frequencies are passed, high frequencies are attenuated.
High-pass filter – high frequencies are passed, low frequencies are attenuated.
Band-pass filters – only frequencies in a frequency band are passed.
Band-stop filter or band-reject filters – only frequencies in a frequency band are
attenuated.
Notch filter – rejects just one specific frequency - an extreme band-stop filter.
Comb filter – has multiple regularly spaced narrow pass bands giving the band form the
appearance of a comb.
All-pass filter – all frequencies are passed, but the phase of the output is modified.
Cutoff frequency is the frequency beyond which the filter will not pass signals. It is
usually measured at a specific attenuation such as 3dB.
Roll-off is the rate at which attenuation increases beyond the cut-off frequency.
Transition band is the (usually narrow) band of frequencies between a pass band and a stop band.
Ripple is the variation of the filter's insertion loss in the pass band.
The order of a filter is the degree of the approximating polynomial and in passive filters
corresponds to the number of elements required to build it. Increasing order increases
roll-off and brings the filter closer to the ideal response.
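Several of these terms can be made concrete by evaluating the frequency response of a small digital filter numerically. The sketch below is an illustration under assumptions: the 4-tap moving-average coefficients are hypothetical, the helper names freq_response and cutoff_3db are not from the original text, and the cutoff search assumes a low-pass response.

```python
import cmath
import math

def freq_response(b, omega):
    """H(e^{j*omega}) = sum_k b[k] * exp(-j*omega*k) for an FIR filter with coefficients b."""
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(b))

def cutoff_3db(b, steps=10000):
    """Scan 0..pi for the first frequency whose gain falls to 1/sqrt(2) (about -3 dB)
    of the DC gain. Assumes a low-pass response; returns None if none is found."""
    dc_gain = abs(freq_response(b, 0.0))
    for i in range(1, steps + 1):
        omega = math.pi * i / steps
        if abs(freq_response(b, omega)) <= dc_gain / math.sqrt(2):
            return omega
    return None

b = [0.25, 0.25, 0.25, 0.25]                 # hypothetical 4-tap moving average (low-pass)
print(abs(freq_response(b, 0.0)))            # pass band: gain at DC
print(abs(freq_response(b, math.pi / 2)))    # stop band: gain at a quarter of the sampling rate
print(cutoff_3db(b))                         # cutoff frequency in radians per sample
```

The frequencies between the reported cutoff and the first null of the response form the transition band of this example filter.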
2.3.7. FIR Filter:
A Finite Impulse Response (FIR) filter is a type of a digital filter. The impulse response,
the filter's response to a Kronecker delta input, is finite because it settles to zero in a finite
number of sample intervals. This is in contrast to Infinite Impulse Response (IIR) filters, which
have internal feedback and may continue to respond indefinitely. The impulse response of an
Nth-order FIR filter lasts for N + 1 samples, and then dies to zero.
The difference equation that defines the output of an FIR filter in terms of its input is:
$$y[n] = b_0 x[n] + b_1 x[n-1] + b_2 x[n-2] + \dots + b_N x[n-N]$$
where:
x[n] is the input signal,
y[n] is the output signal,
$b_i$ are the filter coefficients, and
N is the filter order; an Nth-order filter has (N + 1) terms on the right-hand side, which are commonly referred to as taps.
This equation can also be expressed as a convolution of the coefficient sequence $b_i$ with the input signal:
$$y[n] = \sum_{i=0}^{N} b_i x[n-i]$$
That is, the filter output is a weighted sum of the current and a finite number of previous
values of the input.
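The finite length of the impulse response can be checked with a tapped delay line, which is also how the filter is usually pictured in hardware. This is a sketch under assumptions: the function name fir_stream and the third-order coefficient values are hypothetical, and the delay line is taken to start out filled with zeros.

```python
from collections import deque

def fir_stream(b, samples):
    """Tapped delay line: shift each new sample in and output the weighted sum of the taps."""
    taps = deque([0] * len(b), maxlen=len(b))   # delay line, initially all zeros
    out = []
    for s in samples:
        taps.appendleft(s)                      # newest sample first; the oldest one drops off
        out.append(sum(c * t for c, t in zip(b, taps)))
    return out

b = [3, -1, 2, 5]                    # hypothetical coefficients of a 3rd-order (4-tap) filter
impulse = [1, 0, 0, 0, 0, 0, 0, 0]   # Kronecker delta input
print(fir_stream(b, impulse))        # the coefficients appear in order, followed by zeros
```

The response reproduces the N + 1 = 4 coefficients and is exactly zero afterwards, as expected for a filter without feedback.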
2.3.8. IIR filters:
Infinite Impulse Response (IIR) is a property of signal processing systems. Systems with
this property are known as IIR systems or, when dealing with filter systems, as IIR filters. IIR
systems have an impulse response function that is non-zero over an infinite length of time. This
is in contrast to FIR, which have fixed-duration impulse responses. The simplest analog IIR filter
is an RC filter made up of a single resistor (R) feeding into a node shared with a
single capacitor (C). This filter has an exponential impulse response characterized by an RC time
constant.
IIR filters may be implemented as either analog or digital filters. In digital IIR filters, the
output feedback is immediately apparent in the equations defining the output. Note that unlike
with FIR filters, in designing IIR filters it is necessary to carefully consider "time zero" case in
which the outputs of the filter have not yet been clearly defined.
Design of digital IIR filters is heavily dependent on that of their analog counterparts
because there are plenty of resources, works and straightforward design methods concerning
analog feedback filter design while there are hardly any for digital IIR filters. As a result,
usually, when a digital IIR filter is going to be implemented, an analog filter (e.g. Chebyshev
filter, Butterworth filter, Elliptic filter) is first designed and then is converted to a digital filter by
applying discretization techniques such as Bilinear transform or Impulse invariance.
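Digital filters are often described and implemented in terms of the difference equation that defines how the output signal is related to the input signal:
$$y[n] = \frac{1}{a_0}\left(b_0 x[n] + b_1 x[n-1] + \dots + b_P x[n-P] - a_1 y[n-1] - a_2 y[n-2] - \dots - a_Q y[n-Q]\right)$$
where:
$P$ is the feed forward filter order
$b_i$ are the feed forward filter coefficients
$Q$ is the feedback filter order
$a_i$ are the feedback filter coefficients
$x[n]$ is the input signal
$y[n]$ is the output signal.
A more condensed form of the difference equation is:
$$y[n] = \frac{1}{a_0}\left(\sum_{i=0}^{P} b_i x[n-i] - \sum_{j=1}^{Q} a_j y[n-j]\right)$$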
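The effect of the feedback terms can be seen in a minimal Python sketch of a digital IIR filter, in which each output depends on earlier outputs as well as on the inputs. The function name iir_filter and the first-order coefficients are assumptions for illustration, and outputs before n = 0 are taken as zero (the "time zero" case mentioned above).

```python
def iir_filter(b, a, x):
    """y[n] = (sum_i b[i]*x[n-i] - sum_{j>=1} a[j]*y[n-j]) / a[0], with zero initial state."""
    y = []
    for n in range(len(x)):
        acc = sum(b[i] * x[n - i] for i in range(len(b)) if n - i >= 0)      # feed forward part
        acc -= sum(a[j] * y[n - j] for j in range(1, len(a)) if n - j >= 0)  # feedback part
        y.append(acc / a[0])
    return y

impulse = [1.0] + [0.0] * 7
# Hypothetical first-order filter: y[n] = x[n] + 0.5 * y[n-1]
h = iir_filter([1.0], [1.0, -0.5], impulse)
print(h)   # each value is half the previous one; the response decays but never reaches zero
```

Unlike the FIR case, the impulse response here halves at every step and so has no final non-zero sample.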
2.4. FPGA:
FPGAs offer an opportunity to accelerate your digital signal processing application up to
1000 times over a traditional DSP microprocessor.
Microprocessors are slow:
Digital signal processing has traditionally been done using enhanced microprocessors. While the
high volume of generic product provides a low cost solution, the performance falls seriously
short for many applications. Until recently, the only alternatives were to develop custom
hardware (typically board level or ASIC designs), buy expensive fixed function processors (e.g.
an FFT chip), or use an array of microprocessors.
FPGAs accelerate DSP:
Recent increases in Field Programmable Gate Array performance and size offer a new
hardware acceleration opportunity. FPGAs are an array of programmable logic cells
interconnected by a matrix of wires and programmable switches. Each cell performs a simple
logic function defined by a user's program. An FPGA has a large number (64 to over 20,000) of
these cells available to use as building blocks in complex digital circuits. Custom hardware has
never been so easy to develop.
Performance up to 1000x:
The ability to manipulate the logic at the gate level means you can construct a custom
processor to efficiently implement the desired function. By simultaneously performing all of the
algorithm’s sub functions, the FPGA can outperform a DSP by as much as 1000:1.
Fig 2.1. Comparison of DSP and FPGA.
DSP performance is limited by the serial instruction stream. FPGAs are a better solution
in the region above the curve.
FPGA DSPs are flexible:
Like microprocessors, many FPGAs can be infinitely reprogrammed in-circuit in only a
fraction of a second. Design revisions, even for a fielded product, can be implemented quickly
and painlessly. Hardware can also be reduced by taking advantage of reconfiguration.
Highly integrated:
The programmable logic in an FPGA can absorb much of the interface and ‘glue’ logic
associated with microprocessors. The tighter integration can make a product smaller, lighter,
cheaper and lower power.
Competitively priced:
FPGAs are a generic product customized at the point of use. They enjoy the cost
advantages of high production volumes. There are also none of the NRE charges or fabrication
delays associated with ASIC development, which helps get a product to market on time.
The FPGA’s flexibility eliminates the long design cycle associated with ASICs. With
FPGAs there are no delays for prototypes or early production volume. Design revisions are
easily implemented, often taking less than a day. The devices are fully tested by the
manufacturer, eliminating production test development.
2.5. CONCLUSION:
In this chapter we discussed signals and their types, filters and their types, and the use of FPGAs in Digital Signal Processing.
3. DESIGN METHODOLOGY
3.1. INTRODUCTION:
A Finite Impulse Response (FIR) filter is a type of digital filter. The direct implementation of an FIR filter requires a large number of resources. To reduce resource usage, Distributed Arithmetic (DA) replaces multiplications with additions and shifts. To reduce the ROM size, the proposed DA algorithm uses multiplexers; this LUT-less algorithm uses multiplexers to remove the need for ROM memory.
3.2. DIRECT IMPLEMENTATION OF FIR FILTER:
Generally FIR filter is designed using Multiply and Accumulate (MAC) principle where
the filter coefficients undergo multiplication and additions. The MAC principle is common in
Digital Signal Processing algorithms.
The following expression explains the MAC operation:
$$y = \sum_{k=0}^{K-1} h_k x_k$$
Note a few points:
h = [h0, h1, h2, …, hK-1] is a matrix of "constant" values
x = [x0, x1, x2, …, xK-1] is a matrix of input "variables"
Each hk is of M bits
Each xk is of N bits
y should be large enough to accommodate the result
A numerical example:
Fig 3.1. Block diagram of 1-tap filter using direct implementation.
Fig 3.2. Block diagram of 4-tap FIR filter using direct implementation.
The direct implementation follows the Multiply and Accumulate (MAC) operation: each filter coefficient is multiplied directly by the corresponding input variable, and the products are added to give the final result. In a 1-tap filter, the coefficient h0 is multiplied by the variable x0 and the product is assigned to the output. In a 4-tap filter, the four coefficients are multiplied by their corresponding variables, and the outputs of the four multipliers are added and assigned to the result. This method requires four multipliers, which consume many resources. To reduce resource utilization and improve speed we use the Distributed Arithmetic (DA) algorithm, which is a multiplier-less architecture.
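The MAC operation for a 4-tap filter can be sketched as follows. The coefficient and input values are assumed for illustration and are not the figures of the original numerical example; the helper result_bits gives a sufficient (not necessarily minimal) accumulator width for K products of signed M-bit and N-bit operands.

```python
def mac(h, x):
    """Multiply and accumulate: one multiplication and one addition per tap."""
    acc = 0
    for hk, xk in zip(h, x):
        acc += hk * xk
    return acc

def result_bits(K, M, N):
    """Sufficient width for y: M + N bits per product plus ceil(log2(K)) bits of growth."""
    return M + N + (K - 1).bit_length()

h = [3, -1, 2, 5]    # hypothetical constant coefficients h0..h3
x = [7, 4, -6, 1]    # hypothetical input variables x0..x3
print(mac(h, x))              # 3*7 + (-1)*4 + 2*(-6) + 5*1
print(result_bits(4, 8, 8))   # accumulator width for four 8-bit by 8-bit products
```

The loop body corresponds to one hardware multiplier per tap in a fully parallel design, which is the resource cost that motivates the multiplier-less approach.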
3.3. IMPLEMENTING FIR FILTER USING DISTRIBUTED ARITHMETIC:
Distributed arithmetic is a bit level rearrangement of a multiply accumulate to avoid the
multiplications. It is a powerful technique for reducing the size of a parallel hardware multiply-
accumulate that is well suited to FPGA designs. It can also be extended to other sum functions
such as complex multiplies, Fourier transforms and so on.
In most of the multiply accumulate applications in signal processing, one of the
multiplicands for each product is a constant. Usually each multiplication uses a different
constant.
Using our most compact multiplier, the scaling accumulator, we can construct a multiple
product term parallel multiply-accumulate function in a relatively small space if we are willing to
accept a serial input. In this case, we feed four parallel scaling accumulators with unique
serialized data. Each multiplies that data by a possibly unique constant, and the resulting
products are summed in an adder tree as shown below
Fig 3.3. 4-tap FIR filter using DA algorithm.
If we stop to consider that the scaling accumulator multiplier is really just a sum of
vectors, then it becomes obvious that we can rearrange the circuit.
Here, the adder tree combines the 1 bit partial products before they are accumulated by
the scaling accumulator. All we have done is rearranged the order in which the 1xN partial
products are summed. Now instead of individually accumulating each partial product and then
summing the results, we postpone the accumulate function until after we’ve summed all the 1xN
partials at a particular bit time. This simple rearrangement of the order of the adds has effectively
replaced N multiplies followed by an N input add with a series of N input adds followed by a
multiply. This arithmetic manipulation directly eliminates N-1 Adders in an N product term
multiply-accumulate function. For larger numbers of product terms, the savings become
significant.
Fig 3.4. Block diagram of 4-tap filter using LUT-less algorithm.
Further hardware savings are available when the coefficients Cn are constants. If that is
true, then the adder tree shown above becomes a Boolean logic function of the 4 serial inputs.
The combined 1xN products and adder tree is reduced to a four input look up table. The sixteen
entries in the table are sums of the constant coefficients for all the possible serial input
combinations. The table is made wide enough to accommodate the largest sum without overflow.
Negative table values are sign extended to the width of the table, and the input to the scaling
accumulator should be sign extended to maintain negative sums.
Fig 3.5. block diagram which explains MUX operations.
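The sixteen-entry table described above can be generated with a short Python sketch. The coefficient values, the bit ordering of the address (bit k selects coefficient k) and the helper names build_da_lut and signed_width are assumptions for illustration; the original design may order the address bits differently.

```python
def build_da_lut(coeffs):
    """One entry per combination of the serial input bits: the sum of the selected coefficients."""
    K = len(coeffs)
    lut = []
    for address in range(2 ** K):
        total = 0
        for k in range(K):
            if (address >> k) & 1:      # address bit k is the current bit of input k
                total += coeffs[k]
        lut.append(total)
    return lut

def signed_width(values):
    """Smallest two's complement width that holds every table entry without overflow."""
    return 1 + max(v.bit_length() if v >= 0 else (-v - 1).bit_length() for v in values)

coeffs = [3, -1, 2, 5]          # hypothetical constant coefficients
lut = build_da_lut(coeffs)
print(lut)                      # 16 entries, from 0 (no bits set) to the sum of all coefficients
print(signed_width(lut))        # table width needed for the largest sum
```

Negative entries would be stored sign extended to this width, as the text requires.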
Obviously the serial inputs limit the performance of such a circuit. As with most hardware
applications, we can obtain more performance by using more hardware. In this case, more than
one bit sum can be computed at a time by duplicating the LUT and adder tree as shown here. The
second bit computed will have a different weight than the first, so some shifting is required
before the bit sums are combined. In this 2 bit at a time implementation, the odd bits are fed to
one LUT and adder tree, while the even bits are simultaneously fed to an identical tree. The odd
bit partials are left shifted to properly weight the result and added to the even partials before
accumulating the aggregate. Since two bits are taken at a time, the scaling accumulator has to
shift the feedback by 2 places.
Fig 3.6. block diagram which explains MUX operations for more number of inputs
This paralleling scheme can be extended to compute more than two bits at a time. In the
extreme case, all input bits can be computed in parallel and then combined in a shifting adder
tree. No scaling accumulator is needed in this case, since the output from the adder tree is the
entire sum of products. This fully parallel implementation has a data rate that matches the serial
clock, which can be greater than 100 MS/S in today's FPGAs.
Fig 3.7. Block diagram which explains shifting and addition operations.
Most often, we have more than 4 product terms to accumulate. Increasing the size of the LUT might look attractive until you consider that the LUT size grows exponentially. Considering the construction of the logic we stuffed into the LUT, it becomes obvious that we can combine the results from the LUTs in an adder tree. The area of the circuit grows by roughly $2n-1$ using adder trees to expand it rather than the $2^n$ growth experienced by increasing LUT size. For FPGAs, the most efficient use of the logic occurs when we use the natural LUT size (usually a 4-LUT, although an 8-LUT would make sense if we were using an 8 input block RAM) for the LUTs and then add the outputs of the LUTs together in an adder tree, as shown below:
Fig 3.8. Block diagram of 8-tap FIR filter.
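The idea of combining several natural-size LUTs in an adder tree can be sketched as below. This is a minimal Python model under stated assumptions (unsigned 4-bit inputs, MSB-first bit-serial processing, hypothetical function names and example values), not the circuit of the figure itself.

```python
def build_lut(coeffs):
    """Entry `addr` is the sum of the coefficients whose address bit is 1."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_partitioned(coeffs, samples, width, lut_inputs=4):
    """Bit-serial DA sum of products using several small LUTs plus an adder tree."""
    starts = range(0, len(coeffs), lut_inputs)
    luts = [build_lut(coeffs[s:s + lut_inputs]) for s in starts]
    acc = 0
    for bit in range(width - 1, -1, -1):       # MSB first
        partial = 0
        for s, lut in zip(starts, luts):
            group = samples[s:s + lut_inputs]
            addr = sum(((x >> bit) & 1) << k for k, x in enumerate(group))
            partial += lut[addr]               # adder tree combining the LUT outputs
        acc = (acc << 1) + partial             # scaling accumulator
    return acc


coeffs = [3, 1, 4, 2, 5, 7, 6, 8]              # 8 taps
samples = [5, 9, 12, 7, 3, 15, 0, 10]          # 4-bit inputs
result = da_partitioned(coeffs, samples, width=4)

small_lut_words = 2 * 2 ** 4                   # two 4-input LUTs: 32 words
single_lut_words = 2 ** 8                      # one 8-input LUT: 256 words
```

For the 8-tap case the two 4-input tables hold 32 words in total, against 256 words for a single 8-input table, at the cost of one extra adder.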
3.4. MATHEMATICAL ANALYSIS OF DISTRIBUTED ARITHMETIC:
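General equation of FIR filter is
$$y = \sum_{k=1}^{K} A_k x_k \tag{1}$$
where $A_k$ are the filter coefficients and $x_k$ are the input samples.
Let $x_k$ be an N-bit scaled two’s complement number i.e.
$|x_k| < 1$
$x_k: \{b_{k0}, b_{k1}, b_{k2}, \ldots, b_{k(N-1)}\}$
where $b_{k0}$ is the sign bit
We can express $x_k$ as
$$x_k = -b_{k0} + \sum_{n=1}^{N-1} b_{kn} 2^{-n} \tag{2}$$
Now by substituting (2) in (1), we get
$$y = \sum_{k=1}^{K} A_k \left[ -b_{k0} + \sum_{n=1}^{N-1} b_{kn} 2^{-n} \right] \tag{3}$$
And now, by expanding the term we get
$$y = -\sum_{k=1}^{K} A_k b_{k0} + \sum_{k=1}^{K} \sum_{n=1}^{N-1} A_k b_{kn} 2^{-n}$$
Now by expanding the sigma term we get the following equation
$$y = -\left[ b_{10} A_1 + b_{20} A_2 + \cdots + b_{K0} A_K \right] + \left[ b_{11} A_1 2^{-1} + b_{12} A_1 2^{-2} + \cdots + b_{1(N-1)} A_1 2^{-(N-1)} \right] + \cdots + \left[ b_{K1} A_K 2^{-1} + b_{K2} A_K 2^{-2} + \cdots + b_{K(N-1)} A_K 2^{-(N-1)} \right]$$
By taking common multiples into consideration we can rearrange the equation in the
following fashion.
$$y = -\left[ b_{10} A_1 + b_{20} A_2 + \cdots + b_{K0} A_K \right] + \left[ b_{11} A_1 + b_{21} A_2 + \cdots + b_{K1} A_K \right] 2^{-1} + \cdots + \left[ b_{1(N-1)} A_1 + b_{2(N-1)} A_2 + \cdots + b_{K(N-1)} A_K \right] 2^{-(N-1)}$$
Finally the equation is reduced in the following way.
$$y = -\sum_{k=1}^{K} A_k b_{k0} + \sum_{n=1}^{N-1} \left[ \sum_{k=1}^{K} A_k b_{kn} \right] 2^{-n} \tag{4}$$
The equation 4 is the final formula of the distributed arithmetic.
For ROM construction the equation 4 is reduced in the following fashion. The bracketed term
$$\sum_{k=1}^{K} A_k b_{kn} \tag{5}$$
has only $2^K$ possible values, since each $b_{kn}$ is either 0 or 1.
(5) can be pre-calculated for all possible values of $b_{1n} b_{2n} \ldots b_{Kn}$.
We can store these in a look-up table of $2^K$ words addressed by K bits i.e. $b_{1n} b_{2n} \ldots b_{Kn}$.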
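As a numerical check of equation 4, the following Python sketch evaluates the DA form for N-bit scaled two's complement inputs and compares it with the direct sum of products. It is an illustration only: the coefficient values, the input codes and all function names are hypothetical, and exact fractions are used so that no rounding enters the comparison.

```python
from fractions import Fraction


def bits_msb_first(code, n_bits):
    """Return [b0, b1, ..., b(N-1)] of an N-bit code, b0 being the sign bit."""
    return [(code >> (n_bits - 1 - n)) & 1 for n in range(n_bits)]


def value(code, n_bits):
    """Scaled two's complement value: -b0 + sum of b_n * 2^-n."""
    b = bits_msb_first(code, n_bits)
    return -b[0] + sum(Fraction(b[n], 2 ** n) for n in range(1, n_bits))


def da_output(coeffs, codes, n_bits):
    """Evaluate equation 4 using a look-up table of 2^K pre-calculated sums."""
    K = len(coeffs)
    # Term (5): one pre-calculated sum for every pattern of the K input bits
    lut = [sum(coeffs[k] for k in range(K) if (addr >> k) & 1)
           for addr in range(2 ** K)]
    bits = [bits_msb_first(c, n_bits) for c in codes]

    def addr(n):
        return sum(bits[k][n] << k for k in range(K))

    y = Fraction(-lut[addr(0)])                 # sign-bit term enters negatively
    for n in range(1, n_bits):
        y += Fraction(lut[addr(n)], 2 ** n)     # weight 2^-n
    return y


coeffs = [3, -1, 4, 2]
codes = [0b0101, 0b1001, 0b1100, 0b0111]        # 0.625, -0.875, -0.5, 0.875
direct = sum(c * value(x, 4) for c, x in zip(coeffs, codes))
da = da_output(coeffs, codes, 4)
```

Both `direct` and `da` evaluate to 5/2 for these values, and the table holds 2^4 = 16 words.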
3.5. Block Diagram of FIR filter using DA algorithm:
In our project we design a 4-tap FIR filter. The original LUT-based DA
implementation of the FIR filter is shown in the following figure.
Fig:3.9. Block diagram of 4-tap FIR filter using DA based Algorithm
The block diagram of the LUT-based DA implementation of the FIR filter consists of three units: the
shift register unit, the DA-LUT unit and the adder/shifter unit.
The four input signals, each four bits wide, are given to parallel-in serial-out (PISO) shift registers.
The output of each PISO register is a single bit. The pre-calculated sums of the filter coefficients
are stored in the look-up table, and the output bits of the four PISO registers together select one
value from it. The output of the look-up table is given to the adder and shifter
unit, which adds this value to the left-shifted previous output and passes the result to
the output. This process is repeated for four clock cycles, after which we get the
required output.
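The interaction of the three units can be modelled clock by clock as in the following Python sketch. It is a behavioural illustration under assumptions (unsigned 4-bit inputs, hypothetical function name and example values), not the VHDL of the project.

```python
def da_fir_4tap(coeffs, samples, width=4):
    """Behavioural model: PISO registers -> DA-LUT -> adder/shifter."""
    # DA-LUT unit: entry `addr` holds the sum of the coefficients selected by addr
    lut = [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
           for addr in range(1 << len(coeffs))]
    piso = list(samples)                 # shift register unit, one register per input
    mask = (1 << width) - 1
    acc = 0                              # adder/shifter unit, cleared by reset
    for _ in range(width):               # one pass per clock cycle
        addr = 0
        for k in range(len(piso)):
            msb = (piso[k] >> (width - 1)) & 1   # serial output of register k
            addr |= msb << k
            piso[k] = (piso[k] << 1) & mask      # shift the next bit into place
        acc = (acc << 1) + lut[addr]     # left-shifted previous output plus LUT value
    return acc


coeffs = [3, 1, 4, 2]
samples = [5, 9, 12, 7]
result = da_fir_4tap(coeffs, samples)
print(result)  # 86
```

After four passes the accumulator equals 3*5 + 1*9 + 4*12 + 2*7 = 86 without any multiplication having been performed.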
3.5.1. Shift Register unit:
A serial-in/parallel-out shift register is similar to the serial-in/serial-out
shift register in that it shifts data into internal storage elements and shifts data out at the
serial-out (data-out) pin. It is different in that it makes all the internal stages available as outputs.
Therefore, a serial-in/parallel-out shift register converts data from serial format
to parallel format. If four data bits are shifted in by four clock pulses via a single wire at data-in,
the data becomes available simultaneously on the four outputs QA to QD after the fourth
clock pulse.
Fig 3.10. Serial in parallel out shift register with 4- stages.
The practical application of the serial-in/parallel-out shift register is to convert data
from serial format on a single wire to parallel format on multiple wires. Perhaps, we will
illuminate four LEDs (Light Emitting Diodes) with the four outputs (QA QB QC QD ).
Fig 3.11. serial in parallel out shift register in detail.
The above details of the serial-in/parallel-out shift register are fairly simple. It looks like
a serial-in/ serial-out shift register with taps added to each stage output. Serial data
shifts in at SI (Serial Input). After a number of clocks equal to the number of stages, the first data
bit in appears at SO (QD) in the above figure. In general, there is no SO pin. The last stage
(QD above) serves as SO and is cascaded to the next package if it exists.
Note that serial-in/serial-out shift registers come in lengths greater than 8 bits, from 18 to 64 bits.
It is not practical to offer a 64-bit serial-in/parallel-out shift register requiring that many output
pins. See waveforms below for above shift register.
Fig 3.12. Serial in parallel out register waveforms.
The shift register has been cleared prior to any data by CLR', an active low signal, which clears
all type D Flip-Flops within the shift register. Note the serial data 1011 pattern presented at
the SI input. This data is synchronized with the clock CLK. This would be the case if it is being
shifted in from something like another shift register, for example, a parallel-in/serial-out
shift register (not shown here). On the first clock at t1, the data 1 at SI is shifted
from D to Q of the first shift register stage. After t2 this first data bit is at QB. After t3 it is at QC.
After t4 it is at QD. Four clock pulses have shifted the first data bit all the way to the last
stage QD. The second data bit, a 0, is at QC after the 4th clock. The third data bit, a 1, is at QB. The
fourth data bit, another 1, is at QA. Thus, the serial data input pattern 1011 is
contained in (QD QC QB QA). It is now available on the four outputs.
It will be available on the four outputs from just after clock t4 to just before t5. This parallel data
must be used or stored between these two times, or it will be lost due to shifting out the QD stage
on following clocks t5 to t8 as shown above.
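The waveform walk-through above can be reproduced with a few lines of Python. This is only a behavioural sketch; the function name is hypothetical and the list positions stand for the stages QA, QB, QC and QD.

```python
def sipo_shift(stages, serial_bit):
    """One clock pulse: SI enters QA and every stage moves one place toward QD."""
    return [serial_bit] + stages[:-1]


stages = [0, 0, 0, 0]            # QA, QB, QC, QD after CLR'
history = []
for bit in [1, 0, 1, 1]:         # serial pattern 1011, first bit first
    stages = sipo_shift(stages, bit)
    history.append(tuple(stages))

qa, qb, qc, qd = stages
print(qd, qc, qb, qa)  # 1 0 1 1
```

The `history` list holds the register contents after t1 to t4, so the first data bit can be followed from QA to QD.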
3.5.2. Look Up Table unit:
The binary data is stored in the solid-state devices. Those storage "cells" within solid-state
memory devices are easily addressed by driving the "address" lines of the device with the proper
binary value(s). Suppose we had a ROM memory circuit written, or programmed, with certain
data, such that the address lines of the ROM served as inputs and the data lines of the ROM
served as outputs, generating the characteristic response of a particular logic function.
Theoretically, we could program this ROM chip to emulate whatever logic function we wanted
without having to alter any wire connections or gates.
Consider the following example of a 4 x 2 bit ROM memory (a very small memory!)
programmed with the functionality of a half adder:
Fig 3.13. Functionality of Half Adder.
If this ROM has been written with the above data (representing a half-adder's truth table),
driving the A and B address inputs will cause the respective memory cells in the ROM chip to be
enabled, thus outputting the corresponding data as the Σ (Sum) and Cout bits. Unlike the half-
adder circuit built of gates or relays, this device can be set up to perform any logic function at all
with two inputs and two outputs, not just the half-adder function. To change the logic function,
all we would need to do is write a different table of data to another ROM chip. We could even
use an EPROM chip which could be re-written at will, giving the ultimate flexibility in function.
It is vitally important to recognize the significance of this principle as applied to digital
circuitry. Whereas the half-adder built from gates or relays processes the input bits to arrive at a
specific output, the ROM simply remembers what the outputs should be for any given
combination of inputs. This is not much different from the "times tables" memorized in grade
school: rather than having to calculate the product of 5 times 6 (5 + 5 + 5 + 5 + 5 + 5 = 30),
school-children are taught to remember that 5 x 6 = 30, and then expected to recall this product
from memory as needed. Likewise, rather than the logic function depending on the functional
arrangement of hard-wired gates or relays (hardware), it depends solely on the data written into
the memory (software).
Such a simple application, with definite outputs for every input, is called a look-up table, because
the memory device simply "looks up" what the output(s) should be for any given combination
of input states.
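The half-adder ROM discussed above can be mimicked with a small table in Python. The sketch is illustrative; the names are hypothetical, the address is formed from A and B, and each stored word holds the Sum and Cout bits.

```python
def read_rom(rom, a, b):
    """Drive the address lines with A and B and return the stored word."""
    return rom[(a << 1) | b]


# 4 x 2 bit ROM programmed with the half-adder truth table: (Sum, Cout)
HALF_ADDER_ROM = [
    (0, 0),  # A=0, B=0
    (1, 0),  # A=0, B=1
    (1, 0),  # A=1, B=0
    (0, 1),  # A=1, B=1
]

# Writing different data gives a different logic function: here (A AND B, A OR B)
AND_OR_ROM = [(0, 0), (0, 1), (0, 1), (1, 1)]

sum_bit, carry_bit = read_rom(HALF_ADDER_ROM, 1, 1)
```

The read function never changes; only the stored data decides which two-input, two-output function is realised.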
3.5.3. Adder and Shifter unit:
The adder and shifter unit consists mainly of two blocks: a shifter and an accumulator.
The input to the adder and shifter unit is the output of the LUT. The input is added to the left-shifted
previous output and the sum is assigned to the output. Here we use a 16-bit adder. This process is
repeated k times to obtain the final output, where k is the number of input bits. Here we designed a
4-tap filter whose inputs are 4 bits wide, so it requires four clock cycles to get the required output.
The adder and shifter unit eliminates the multiplication process by using a shift-and-accumulate
process.
3.6. Proposed Distributed Arithmetic:
In the original LUT-based DA implementation of the FIR filter, each entry in the lower half of the
LUT is the corresponding entry in the upper half plus h[3]. The lower half is nothing but the locations where b3=1
and the upper half is the locations where b3=0. To avoid this wastage of memory we use the
proposed DA, where the LUT size is reduced by half with an additional 2*1 multiplexer and
full adder, as shown in the following figure.
By using this proposed Distributed Arithmetic the LUT size is reduced to half of its original size.
The output of the fourth input, i.e. b3, is given to the multiplexer. If b3 is one then h[3] will
be the output of the multiplexer, and if b3 is zero then zero will be the output of the
multiplexer. The output of the multiplexer is added to the output of the LUT and then given to
the adder/shifter unit.
Fig 3.14. Block diagram of 4-tap FIR filter using proposed DA algorithm.
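The LUT halving can be illustrated with the following Python sketch, which keeps an 8-word table for the first three coefficients and lets a 2*1 multiplexer contribute h[3]. It is a behavioural model under assumptions (unsigned 4-bit inputs, hypothetical names and example values).

```python
def build_lut(coeffs):
    """Entry `addr` is the sum of the coefficients whose address bit is 1."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def proposed_da_fir(coeffs, samples, width=4):
    """DA with a half-size LUT; the last coefficient is handled by a multiplexer."""
    *lut_coeffs, h_last = coeffs
    lut = build_lut(lut_coeffs)                    # 8 words instead of 16
    acc = 0
    for bit in range(width - 1, -1, -1):           # MSB first
        addr = sum(((x >> bit) & 1) << k for k, x in enumerate(samples[:-1]))
        b3 = (samples[-1] >> bit) & 1
        mux = h_last if b3 else 0                  # 2*1 multiplexer: h[3] or zero
        acc = (acc << 1) + lut[addr] + mux         # extra adder, then adder/shifter
    return acc


coeffs = [3, 1, 4, 2]
samples = [5, 9, 12, 7]
result = proposed_da_fir(coeffs, samples)
half_lut_words = len(build_lut(coeffs[:-1]))
full_lut_words = len(build_lut(coeffs))
```

The result matches the full-LUT version while the table shrinks from 16 to 8 words; the price is one more addition in every clock cycle.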
3.7. LUT-less Distributed Arithmetic:
The LUT reduction procedure discussed above will be further developed to obtain LUT-less
DA architecture. The LUT-less DA architecture is as shown below:
Fig 3.15. Block diagram of 4-tap FIR filter using LUT less algorithm.
In this procedure all LUTs are replaced by multiplexers and full adders, so that
LUT memory is eliminated completely. The output of each parallel-in serial-out register is given to a
multiplexer: when the bit is one the respective coefficient value is selected,
otherwise zero is selected. The outputs of the multiplexers are added and the sum is given to the
adder/shifter unit of the filter.
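A behavioural sketch of the LUT-less structure is given below in Python. As with the other sketches it assumes unsigned 4-bit inputs, and the function name and example values are hypothetical.

```python
def lutless_da_fir(coeffs, samples, width=4):
    """DA without any LUT: one multiplexer per tap followed by adders."""
    acc = 0
    for bit in range(width - 1, -1, -1):           # MSB first
        # each multiplexer passes its coefficient when the serial bit is one
        mux_outputs = [c if (x >> bit) & 1 else 0
                       for c, x in zip(coeffs, samples)]
        acc = (acc << 1) + sum(mux_outputs)        # adders, then adder/shifter
    return acc


coeffs = [3, 1, 4, 2]
samples = [5, 9, 12, 7]
result = lutless_da_fir(coeffs, samples)
print(result)  # 86
```

No table is stored at all, but the four multiplexer outputs have to be added in every cycle, which lengthens the combinational path.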
3.8. VLSI implementation methods:
At the engineering level, digital VLSI chips are classified by the approach used to
implement the circuit. Several design styles can be considered for chip implementation of
specified algorithms or logic functions. Each design style has its own merits and demerits, and thus a proper
choice has to be made by designers in order to provide the functionality at low cost.
3.8.1. PLD (PROGRAMMABLE LOGIC DEVICE):
PLD’s are standard ICs that are available in standard configurations from a catalog of parts
and are sold in very high volume to many different customers. PLD’s may be configured or
programmed to create a part customized to a specified application, and so they also belong to the
family of ASIC’s. PLD’s use different technologies to allow programming of the device.
There are four types of PLD’s
1. Programmable Logic Array (PLA)
2. Programmable Array Logic (PAL)
3. Complex Programmable Logic Device (CPLD)
4. Field Programmable Gate Array (FPGA)
1. Programmable Logic Array (PLA):
A Programmable Logic Array is a small PLD that contains two levels of logic, an
AND-plane and an OR-plane, where both levels are programmable.
2. Programmable Array Logic (PAL):
A Programmable Array Logic is a small PLD that has programmable AND plane
followed by a fixed OR plane.
3. Complex Programmable Logic Device (CPLD):
A Complex Programmable Logic Device is a PLD that consists of an arrangement
of multiple PLA/PAL like blocks on a single chip.
4. Field Programmable Gate Array (FPGA):
A Field Programmable Gate Array is a PLD that offers a much higher logic capacity
than a CPLD.
3.8.2. Features of PLD:
1. No customized mask layers or logic cells.
2. Fast design turnaround.
3. Single large blocks of programmable interconnect.
3.8.3. FPGA (Field Programmable Gate Array):
3.8.3.1. Introduction:
Field Programmable Gate Arrays are specific integrated circuits that can be user-
programmed easily. The FPGA contains versatile functions, configurable interconnects and
input/output interface to adapt to the user specification. FPGA allow rapid prototyping using
custom logic structures, and are very popular for limited production products. Modern FPGA are
extremely dense, with complexity of several millions of gates which enable the emulation of
very complex hardware such as parallel microprocessors, mixture of processor and signal
processing. One key advantage of FPGA is their ability to be reprogrammed, in order to create a
completely different hardware by modifying the logic gate array. FPGAs exist not only as simple
components, but also as CPU RAM blocks in system-on-chip designs. An FPGA consists of slices,
where each slice consists of 2 look-up tables and 2 D flip-flops.
3.8.3.2. Look Up Table (LUT):
A Look Up Table (LUT) is a one-bit wide memory array, where the address lines of the
memory are the inputs of the logic block and the one-bit output from the memory is the input for the
next block. A LUT with n inputs corresponds to a (2^n)*1 bit memory and can realize any logic
function of its n inputs by programming the logic function's truth table directly into the memory.
Fig 3.16 LUT’s in FPGA.
3.8.3.3. Classification of FPGA:
FPGAs are classified based on Switching Technology.
1. SRAM based FPGAs
2. ANTIFUSE based FPGAs
XILINX and ALTERA are the leading manufacturers in SRAM based FPGA.
ACTEL, QUICKLOGIC, CYPRESS are the leading manufacturers in
ANTIFUSE based FPGA.
3.8.3.4. FPGA Design flow:
The first step in implementing a design on an FPGA is the system specification.
Specifications refer to the kinds of inputs and outputs and the range of values that the kit can
take. Based on these system specifications we move on to the next step, i.e. the architecture, which
describes the interconnections between all the blocks involved in our design.
Each and every block in the Architecture along with their interconnections is modeled in
either VHDL or Verilog depending on our ease. All these blocks are then simulated and the
outputs are verified for correct functioning.
From this simulation step we head towards the next step i.e. Synthesis. This is a very
important step in knowing whether our design can be implemented on an FPGA kit or not.
Synthesis converts our VHDL code into its functional components, which are vendor specific.
After performing synthesis we can look at the RTL schematic and the Technology schematic.
We can also see the timing delays that will be present in the FPGA if the design is implemented
on it.
Place & Route is the next step in which the tool places all the components on a FPGA die
for optimum performance both in terms of area and speed. We also see the interconnections,
which will be made, in this part of the implementation flow. In post place and route simulation
step the actual delays, which will be involved on the FPGA kit, are considered by the tool and
simulation is performed taking into consideration these delays, which will be present in the
implementation on the kit. Delays here mean electrical loading effect, wiring delays, stray
capacitances.
Fig:3.17. FPGA implementation design flow
[Flow chart: System specifications (initials) -> Architecture (block diagram) -> VHDL module (coding) -> Simulation (functional verification) -> Synthesis (generating a net list) -> Place and route (placing of FPGA die and interconnections) -> Post place and route simulation (timing verification) -> Generating bit map file -> Download on to FPGA -> Configuring FPGA]
After post place and route comes generating the bit-map file, which means converting the
VHDL code into a bit stream that is used to configure the FPGA kit. A bit file is generated after
we perform this step.
After this comes the final step of downloading the bit map file on to the FPGA board, which is
done by connecting the computer to the FPGA board with the help of a JTAG (Joint Test Action
Group) cable; JTAG is an IEEE standard. The bit map file contains the whole design, which is placed on
the FPGA die; the outputs can now be observed on the FPGA LEDs or multiplexed seven
segment displays. This step completes the whole process of implementing our design on an
FPGA.
3.8.3.5. Characteristics of FPGA:
None of the mask layers are customized.
A method for programming the basic logic cells and the interconnect.
The core is a regular array of programmable basic logic cells that can implement
combinational as well as sequential logic (flip-flops).
A matrix of programmable interconnect surround the basic logic cells.
Programmable I/O cells surround the core.
Design turnaround is a few hours.
3.9. Applications of FPGA:
1. Device controllers
2. Random logic
3. Emulation of hardware
4. Integrating multiple SPLD’s
3.10. Conclusion:
This chapter discussed design methodologies, the implementation of the FIR filter by different methods, and the FPGA design flow.
4. DESIGN ANALYSIS
4.1. INTRODUCTION:
After implementation of the design, the next step is to analyze it. This chapter
discusses the analysis of the design after the implementation of the FIR filter using all four different
algorithms. In this chapter the flow chart of the algorithm is presented. The synthesis and
simulation reports are then discussed in view of the performance of the design.
4.2. Flow chart:
Fig.4.1 flow chart
4.3. Simulation:
4.3.1. Direct implementation of FIR filter:
Fig 4.2. Simulation report of Direct Implementation of FIR filter.
In the direct implementation we use the Multiply and Accumulate (MAC) operation. The filter coefficients are taken as constants and the inputs as variables, and the inputs are multiplied by the filter coefficients according to the FIR equation. The output is obtained in one clock cycle, but the delay is larger. The clock period is 20 ns. For the first 20 ns reset is one, so the output is zero.
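For comparison with the DA variants, the direct MAC form can be sketched in Python as below. The sketch is a software illustration only; the function name, coefficient values and test signal are hypothetical, and the delay line is assumed to start cleared, as after reset.

```python
def fir_direct(coeffs, signal):
    """Direct-form FIR: one multiplication per tap for every input sample."""
    taps = [0] * len(coeffs)                 # delay line, cleared as after reset
    outputs = []
    for x in signal:
        taps = [x] + taps[:-1]               # newest sample enters, oldest leaves
        outputs.append(sum(h * t for h, t in zip(coeffs, taps)))  # multiply and accumulate
    return outputs


coeffs = [1, 2, 3, 4]
impulse_response = fir_direct(coeffs, [1, 0, 0, 0, 0])
print(impulse_response)  # [1, 2, 3, 4, 0]
```

Feeding a unit impulse returns the coefficients themselves followed by zeros, which is the finite impulse response that gives the filter its name.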
4.3.2. One tap filter:
Fig 4.3 Simulation report of one tap filter
We apply the input to the Parallel in Serial Out (PISO) register. After a clock event we check reset: if reset is one, all intermediate variables, signals, the counter and the output port are set to zero; if reset is zero, in the first clock cycle we get the MSB of the input from the PISO register. Based on the output of the PISO register we select a value from the Look Up Table (LUT). This value is added to the left-shifted previous output, and the result is given to the output.
Here, for the one-tap filter, we have one filter coefficient and one 4-bit input. We store the pre-calculated values (in this case h0 and “0000”) in the ROM and give the input to the PISO register. At first reset is one, so the output is zero. After reset becomes zero, on the clk event we load the input port with the input. During the first clock cycle we get the MSB of the input from the PISO register. Based on this bit we decide whether to select h0 or “0000”: if the bit from the PISO is zero we select “0000”, and if it is one we select h0. The output of the LUT is added to the left-shifted previous output. During the 1st clock cycle the previous output is zero, so shifting it gives zero, and this is added to the output of the LUT. During the second clock cycle we get the second MSB from the PISO; based on this we select a value from the LUT, which is added to the left-shifted previous output. After four clock cycles we get the required output.
4.3.3. Implementation of FIR filter using DA algorithm:
Fig 4.4. Simulation report of DA algorithm based FIR filter.
We apply the inputs to the Parallel in Serial Out (PISO) registers. After a clock event we check reset: if reset is one, all intermediate variables, signals, the counter and the output port are set to zero; if reset is zero, in the first clock cycle we get the MSBs of the inputs from the PISO registers. Based on the outputs of the PISO registers we select a value from the Look Up Table (LUT). This value is added to the left-shifted previous output, and the result is given to the output.
Here, for the 4-tap filter, we have four filter coefficients of 16 bits each and four 4-bit inputs. We store the pre-calculated values in the ROM and give the inputs to the PISO registers. At first reset is one, so the output is zero. After reset becomes zero, on the clk event we load the input ports with the inputs. During the first clock cycle we get the MSBs of all four inputs from the PISO registers. Based on the outputs of the PISO registers we select a value from the LUT. The output of the LUT is added to the left-shifted previous output. During the 1st clock cycle the previous output is zero, so shifting it gives zero, and this is added to the output of the LUT. During the second clock cycle we get the second MSBs from the PISO registers; based on these we select a value from the LUT, which is added to the left-shifted previous output. After four clock cycles we get the required output.
4.3.4. Implementation of FIR filter using Proposed DA algorithm:
Fig:4.5. Simulation report of proposed DA based FIR filter.
We apply the inputs to the Parallel in Serial Out (PISO) registers. After a clock event we check reset: if reset is one, all intermediate variables, signals, the counter and the output port are set to zero; if reset is zero, in the first clock cycle we get the MSBs of the inputs from the PISO registers. The output of one PISO register is given to the MUX and the remaining three outputs from the other PISO registers are given to the 3-tap LUT. Based on the outputs of these three PISO registers we select a value from the Look Up Table (LUT). This value, together with the MUX output, is added to the left-shifted previous output, and the result is given to the output.
Here, for the 4-tap filter, we have four filter coefficients of 16 bits each and four 4-bit inputs. We give the inputs to the PISO registers. At first reset is one, so the output is zero. After reset becomes zero, on the clk event we load the input ports with the inputs. During the first clock cycle we get the MSBs of the inputs from the PISO registers. Three are given to the LUT and the remaining one is given to the MUX. The outputs of the MUX and the LUT are added, and this result is added to the left-shifted previous output. During the 1st clock cycle the previous output is zero, so shifting it gives zero, and this is added to the sum of the LUT and MUX outputs. During the second clock cycle we get the second MSBs from the PISO registers; based on these we select values from the LUT and the MUX, and their sum is added to the left-shifted previous output. After four clock cycles we get the required output.
4.3.5. Implementation of FIR filter using LUT-Less algorithm:
Fig 4.6. Simulation report of LUT-Less based FIR filter.
We apply the inputs to the Parallel in Serial Out (PISO) registers. After a clock event we check reset: if reset is one, all intermediate variables, signals, the counter and the output port are set to zero; if reset is zero, in the first clock cycle we get the MSBs of the inputs from the PISO registers. The output of each PISO register is fed to one of four different MUXes. The outputs of all the MUXes are added and the result is given to the output.
Here, for the 4-tap filter, we have four filter coefficients of 16 bits each and four 4-bit inputs. We give one input to each of the four PISO registers. At first reset is one, so the output is zero. After reset becomes zero, on the clk event we load the input ports with the inputs. During the first clock cycle we get the MSBs of the inputs from the PISO registers. The output of each PISO register is given to its own MUX. In each MUX we decide, based on this bit, whether to select the filter coefficient or “0000”: if the bit from the PISO is zero we select “0000”, and if it is one we select the corresponding filter coefficient. The outputs of the MUXes are added, and the result is added to the left-shifted previous output. During the 1st clock cycle the previous output is zero, so shifting it gives zero, and this is added to the sum of the MUX outputs. During the second clock cycle we get the second MSBs from the PISO registers; based on these we get values from the MUXes, which are added, and the result is added to the left-shifted previous output. After four clock cycles we get the required output.
4.4. Synthesis report:
4.4.1. Direct implementation:
=========================================================================
* Advanced HDL Synthesis *
=========================================================================
Loading device for application Rf_Device from file '3s400.nph' in environment C:\Xilinx92i.
=========================================================================
Advanced HDL Synthesis Report
Macro Statistics
# Multipliers : 7
4x16-bit multiplier : 7
# Adders/Subtractors : 6
20-bit adder : 6
# Registers : 80
Flip-Flops : 80
=========================================================================
Timing Summary
=========================================================================
Speed Grade: -5
Minimum period: No path found
Minimum input arrival time before clock: 11.542ns
Maximum output required time after clock: 6.216ns
Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
=========================================================================
Timing constraint: Default OFFSET IN BEFORE for Clock 'clk'
Total number of paths / destination ports: 77620 / 160
4.4.2. DA algorithm based FIR filter:
=========================================================================
* Advanced HDL Synthesis *
=========================================================================
Loading device for application Rf_Device from file '3s400.nph' in environment C:\Xilinx92i.
=========================================================================
Advanced HDL Synthesis Report
Macro Statistics
# ROMs : 1
16x16-bit ROM : 1
# Adders/Subtractors : 2
16-bit adder : 1
3-bit adder : 1
# Registers : 53
Flip-Flops : 53
# Comparators : 4
4-bit comparator not equal : 4
=========================================================================
Timing Summary
Speed Grade: -5
Minimum period: 9.602ns (Maximum Frequency: 104.140MHz)
Minimum input arrival time before clock: 10.130ns
Maximum output required time after clock: 6.280ns
Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
=========================================================================
Timing constraint: Default period analysis for Clock 'clk'
Clock period: 9.602ns (frequency: 104.140MHz)
Total number of paths / destination ports: 7762 / 68
4.4.3. Proposed DA algorithm:
=========================================================================
* Advanced HDL Synthesis *
=========================================================================
Loading device for application Rf_Device from file '3s400.nph' in environment C:\Xilinx92i.
=========================================================================
Advanced HDL Synthesis Report
Macro Statistics
# ROMs : 1
8x16-bit ROM : 1
# Adders/Subtractors : 3
16-bit adder : 2
3-bit adder : 1
# Registers : 53
Flip-Flops : 53
# Comparators : 4
4-bit comparator not equal : 4
Timing Summary
=========================================================================
Speed Grade: -5
Minimum period: 10.188ns (Maximum Frequency: 98.154MHz)
Minimum input arrival time before clock: 10.672ns
Maximum output required time after clock: 6.280ns
Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
=========================================================================
Timing constraint: Default period analysis for Clock 'clk'
Clock period: 10.188ns (frequency: 98.154MHz)
Total number of paths / destination ports: 8655 / 68
4.4.4. LUT-Less algorithm:
=========================================================================
* Advanced HDL Synthesis *
=========================================================================
Loading device for application Rf_Device from file '3s400.nph' in environment C:\Xilinx92i.
=========================================================================
Advanced HDL Synthesis Report
Macro Statistics
# Adders/Subtractors : 5
16-bit adder : 4
3-bit adder : 1
# Registers : 53
Flip-Flops : 53
# Comparators : 4
4-bit comparator not equal : 4
=========================================================================
Timing Summary
Speed Grade: -5
Minimum period: 12.615ns (Maximum Frequency: 79.268MHz)
Minimum input arrival time before clock: 13.055ns
Maximum output required time after clock: 6.280ns
Maximum combinational path delay: No path found
Timing Detail:
--------------
All values displayed in nanoseconds (ns)
=========================================================================
Timing constraint: Default period analysis for Clock 'clk'
Clock period: 12.615ns (frequency: 79.268MHz)
Total number of paths / destination ports: 64845 / 68
4.5. Conclusion:
In this chapter the simulation reports and synthesis reports were observed.
5. RESULT
This project report comprises several chapters devoted to the design of an
FIR filter using four different algorithms. Here the comparison tables are presented to
give an overview of the resource usage and the timing. A simple overview
of the project, including the scope, motivation and objectives, has been discussed. Finally an
FIR filter is designed and the output is observed on the Spartan III FPGA kit.
Resource comparison table:
| Resource | Specification | Direct implementation | Distributed arithmetic algorithm | Proposed DA algorithm | LUT less algorithm |
|---|---|---|---|---|---|
| ROM | 8*16 bit | | - | 1 | - |
| ROM | 16*16 bit | | 1 | - | - |
| Adder/subtractor | 16 bit adder | | 1 | 2 | 4 |
| Adder/subtractor | 3 bit adder | | 1 | 1 | 1 |
| Registers | Flip-flop | | 53 | 53 | 53 |
| Comparator | 4-bit comparator | | 4 | 4 | 4 |
Delay comparison:
| Techniques | Time delay (ns) | Frequency (MHz) |
|---|---|---|
| Direct implementation | -- | -- |
| Distributed Arithmetic | 9.602 | 104.14 |
| Proposed DA algorithm | 10.188 | 98.154 |
| LUT less algorithm | 12.615 | 79.268 |
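The frequency column follows from the minimum clock period as f = 1/T. The short Python check below recomputes it from the delays listed above; the dictionary and function names are hypothetical, and small differences in the last digits are expected because the reported periods are themselves rounded.

```python
# Minimum period (ns) and reported maximum frequency (MHz) from the comparison above
reported = {
    "Distributed Arithmetic": (9.602, 104.14),
    "Proposed DA algorithm": (10.188, 98.154),
    "LUT less algorithm": (12.615, 79.268),
}


def max_frequency_mhz(period_ns):
    """f [MHz] = 1000 / T [ns]."""
    return 1000.0 / period_ns


recomputed = {name: max_frequency_mhz(period)
              for name, (period, _) in reported.items()}

# Relative increase of the clock period with respect to the original DA
base_period = reported["Distributed Arithmetic"][0]
slowdown = {name: period / base_period for name, (period, _) in reported.items()}
```

The `slowdown` values express how much clock period each LUT reduction step costs relative to the original DA implementation.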
6. CONCLUSION
This project presents the proposed DA architectures for the FIR filter. The architectures
reduce the memory usage by half at every iteration of LUT reduction, at the cost of a limited
decrease of the system frequency. We
can also divide high-order filters into several groups of small filters, and hence reduce the LUT
size further. To obtain a fast implementation of the FIR filter, the proposed DA algorithm is
adopted.
We have successfully implemented a highly efficient 4-tap FIR filter, using both the original DA
architecture and the proposed DA architecture, on a Spartan III FPGA kit. It shows that the
proposed DA architecture is hardware efficient for FPGA implementation.
7. FUTURE SCOPE
The speed of the filter can be further increased using the pipelining principle, where parallel
processing can be implemented. By using the pipelining principle the filter can be extended to
higher orders. A 70-tap filter can be implemented using a symmetrical structure, so that we can reduce
it to 35 taps. The 35-tap filter can then be divided into 7 smaller filters, each with a 5-tap DA-LUT unit that
could be implemented by a 4-input LUT with an additional 2*1 multiplexer and a full adder.
Thus our 4-tap filter can be extended to 70 taps and to even higher-order filters.
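The folding of a symmetric 70-tap filter into 35 taps, and the resulting table sizes, can be checked with the Python sketch below. The coefficient and sample values are arbitrary test data, and all names are hypothetical; the sketch only assumes that the coefficients are symmetric, i.e. h[k] = h[69-k].

```python
def fir_symmetric_folded(half_coeffs, window):
    """Symmetric FIR output: pre-add the mirrored samples, then use half the taps."""
    n = len(window)
    return sum(h * (window[k] + window[n - 1 - k])
               for k, h in enumerate(half_coeffs))


half_coeffs = list(range(1, 36))                  # 35 distinct coefficients
full_coeffs = half_coeffs + half_coeffs[::-1]     # symmetric 70-tap filter
window = [(7 * i) % 11 for i in range(70)]        # 70 most recent input samples

folded = fir_symmetric_folded(half_coeffs, window)
direct = sum(h * x for h, x in zip(full_coeffs, window))

groups = 35 // 5                                  # 7 smaller 5-tap filters
words_single_lut = 2 ** 35                        # one LUT for all 35 taps
words_5_input_luts = groups * 2 ** 5              # seven 5-input LUTs
words_reduced_luts = groups * 2 ** 4              # seven 4-input LUTs plus multiplexers
```

Seven 4-input tables need 112 words in total, compared with 2^35 words for a single table over all 35 folded taps.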
APPENDIX-A
Xilinx FPGA
Field Programmable Gate Arrays (FPGAs) are integrated circuits that can be easily programmed by the
user. An FPGA contains versatile logic functions, configurable interconnects and an input/output
interface that adapt to the user's specifications. FPGAs allow rapid prototyping using custom logic
structures and are very popular for limited-production products. Modern FPGAs are extremely dense,
with a complexity of several million gates, which enables the emulation of very complex hardware
such as parallel microprocessors or a mixture of processors and signal processing. One key advantage
of FPGAs is their ability to be reprogrammed, creating completely different hardware by
modifying the logic gate array.
Advantages of FPGA:
1. Short turnaround time
2. Design independent
3. Flexibility
Classification of FPGAs:
The FPGAs are classified based on switching technology.
1. SRAM based FPGAs
2. ANTIFUSE based FPGAs
XILINX and ALTERA are the leading manufacturers of SRAM-based FPGAs.
ACTEL, QUICKLOGIC and CYPRESS are the leading manufacturers of ANTIFUSE-based FPGAs.
XILINX FPGA
Xilinx is a developer of FPGA and CPLD devices that are used in numerous applications within
telecommunications, consumer, defense, and other fields. Xilinx offers device families for glue
logic (CoolRunner, CoolRunner-II), low-cost (Spartan), and high-end (Virtex) applications. Xilinx
also provides application-oriented optimized FPGA series: LX (for logic), SX (for signal
processing) and FX (fully featured).
Xilinx develops IP (intellectual property) cores designed in HDL, which allow designers to minimize
time to market. These IP cores range from simple functions (such as BCD encoders and counters) to
complex systems (such as multi-gigabit networking cores and custom embedded microcontrollers like the
fully-featured MicroBlaze soft microprocessor and the compact PicoBlaze microcontroller). In addition,
Xilinx Design Services (XDS) can create custom cores.
Xilinx offers Electronic Design Automation (EDA) tools for use with its devices. Chief among these
is ISE, which offers a complete EDA flow. Domain-specific tools include Xilinx's Embedded Development
Kit (EDK), which is aimed primarily at designers wishing to use the embedded PowerPC 405 core in the
Virtex-II Pro and Virtex-4, or Xilinx's own soft microprocessor/microcontroller, in their designs. Other
domain-specific tools include Xilinx's System Generator for DSP, which provides simulation
and implementation of high-performance DSP designs on Xilinx FPGAs. A design is implemented on a
Xilinx FPGA using the basic logic blocks of the device: the desired functionality is built from the
components available within each logic block.
Basic Logic Block
In a Xilinx FPGA, the basic blocks used for design are Configurable Logic Blocks (CLBs).
Each CLB consists of slices, and each slice consists of two LUTs and two D flip-flops.
Look-up Table (LUT):
The Look-up Table (LUT) is a one-bit-wide memory array, where the address lines of the memory are the
inputs of the logic block and the one-bit output from the memory is the LUT output. A LUT with n inputs
corresponds to a 2^n x 1-bit memory and can realize any logic function of its n inputs by
programming the function's truth table directly into the memory. The LUT can therefore be used to
design any combinational circuit that has a single output.
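As an illustration of this idea, the sketch below models an n-input LUT as a list of 2^n one-bit entries filled from a function's truth table. It is a behavioural model only, not the device's actual configuration format, and all names are hypothetical.

```python
# Behavioural sketch of an n-input LUT: a 2**n x 1-bit memory whose
# address lines are the logic inputs. Names are illustrative only.
from typing import Callable, List

def program_lut(n_inputs: int, func: Callable[..., int]) -> List[int]:
    """Store the truth table of `func` in a 2**n_inputs x 1-bit table."""
    table = []
    for address in range(2 ** n_inputs):
        # Input i is taken from bit i of the address.
        bits = [(address >> i) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table

def read_lut(table: List[int], *inputs: int) -> int:
    """Form the address from the input bits and return the stored bit."""
    address = sum((bit & 1) << i for i, bit in enumerate(inputs))
    return table[address]

def majority(a: int, b: int, c: int) -> int:
    """Example single-output combinational function of three inputs."""
    return 1 if a + b + c >= 2 else 0

maj_lut = program_lut(3, majority)
print(len(maj_lut))              # number of 1-bit entries in the table
print(read_lut(maj_lut, 1, 0, 1))
```

Any other single-output function of three inputs could be realized by reprogramming the same eight entries, which is the property the paragraph above describes.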
XILINX FPGA DESIGN FLOW:
The Xilinx FPGA design flow is shown in Figure A-1. The first step in implementing a
design on an FPGA is the specification. The specification consists of the number of inputs, the number
of outputs and the range of values that the kit can take in. Based on this specification the architecture
is designed. The architecture describes the interconnections between all the blocks involved in the
design.
Every block in the architecture, along with its interconnections, is modeled in either VHDL
or Verilog depending on the requirement. All these blocks are then simulated and the outputs are verified
for correct functionality. Simulation can be done at various levels of abstraction. Besides functional
simulation, there are post-synthesis simulation, post-place-and-route simulation, etc. Simulation needs
a set of test vectors for checking the functionality.
Once the functional simulation is correct, the next step is synthesis. Synthesis
converts the HDL description into a netlist. The netlist gives information about the
functional hardware elements. The synthesis step gives two views of the design: one is
technology dependent and the other is technology independent. The technology-independent view
gives the design information in terms of gates and other components that do
not depend on any technology. The technology-dependent view gives the
hardware information in terms of LUTs and other components that are specific to
Xilinx technology.
Place & Route is the next step, in which the tool places all the components on an FPGA die
for optimum performance in terms of both area and speed. After the components are placed, the
interconnections between them are routed.
In the post-place-and-route simulation step, the tool takes into account the actual delays
involved in the design and performs the simulation with these delays. These
delays arise from electrical loading effects, wiring delays and stray capacitances.
After post-place-and-route simulation comes generation of the bit-map file, which means
converting the implemented VHDL/Verilog design into a bit stream used to configure the FPGA kit.
A .bit file is generated after this step.
The final step is downloading the bit-map file onto the FPGA board, which is
done by connecting the computer to the FPGA board with a JTAG (Joint Test Action
Group) cable; JTAG is an IEEE standard. The bit-map file contains the whole design to be
used on the FPGA board.
APPENDIX-B
Spartan-3 Starter Kit – Introduction:
The Xilinx Spartan-3 Starter Kit provides a low-cost, easy-to-use development and
evaluation platform for Spartan-3 FPGA designs.
Figure B-1 shows the Spartan-3 Starter Kit board, which includes the following
components and features:
200,000-gate Xilinx Spartan-3 XC3S400 FPGA in a 256-ball thin Ball Grid Array package
(XC3S400FT256) [1]
Table B-1 shows the device information of the Xilinx Spartan-3 FPGA.
Family Name      Xilinx Spartan-3
Device Name      XC3S400
Capacity         200,000 gates
Package          Ball Grid Array
Speed Grade      -4/-5
Table B-1: Device Information of Spartan-3
The components and capacity of the Xilinx Spartan-3 FPGA are shown below.
4,320 logic cell equivalents
Twelve 18K-bit block RAMs (216K bits)
Twelve 18x18 hardware multipliers
Four Digital Clock Managers (DCMs)
Up to 173 user-defined I/O signals
2Mbit Xilinx XCF02S Platform Flash, in-system programmable configuration PROM [2]
1Mbit non-volatile data or application code storage available after FPGA configuration
Jumper options allow FPGA application to read PROM data or FPGA configuration from
other sources[3]
1M-byte of Fast Asynchronous SRAM (bottom side of board) [4]
Two 256Kx16 ISSI IS61LV25616AL-10T 10 ns SRAMs
Configurable memory architecture
- Single 256Kx32 SRAM array, ideal for MicroBlaze code images
- Two independent 256Kx16 SRAM arrays
Individual chip select per device
Individual byte enables
3-bit, 8-color VGA display port[5]
9-pin RS-232 Serial Port [6]
DB9 9-pin female connector (DCE connector)
Maxim MAX3232 RS-232 transceiver/translator[7]
Uses straight-through serial cable to connect to computer or workstation serial port
Second RS-232 transmit and receive channel available on board test points[8]
PS/2-style mouse/keyboard port[9]
Four-character, seven-segment LED display[10]
Eight slide switches[11]
Eight individual LED outputs[12]
Four momentary-contact push button switches[13]
50MHz crystal oscillator clock source (as in Figure B-2) [14]
Socket for an auxiliary crystal oscillator clock source[15]
FPGA configuration mode selected via jumper settings[16]
Push button switch to force FPGA reconfiguration (FPGA configuration happens
automatically at power-on)[17]
LED indicates when FPGA is successfully configured[18]
Three 40-pin expansion connection ports to extend and enhance the Spartan-3 Starter Kit
Board[19][20][21]
See www.xilinx.com/s3board for compatible expansion cards
Compatible with Digilent, Inc. peripheral boards
https://digilent.us/Sales/boards.cfm#Peripheral
FPGA serial configuration interface signals available on the A2 and B1 connectors -
PROG_B, DONE, INIT_B, CCLK, DONE
JTAG port [22] for low-cost download cable[23]
Digilent JTAG download/debugging cable connects to PC parallel port [23].
JTAG download/debug port compatible with the Xilinx Parallel Cable IV and MultiPRO
Desktop Tool [24].
AC power adapter input for included international unregulated +5V power supply[25].
Power-on indicator LED [26].
On-board 3.3V [27], 2.5V [28], and 1.2V [29] regulators
Component Locations
Figure B-2 indicates the component locations on the top side and bottom side of the
board, respectively.
Figure B-2: XILINX Spartan-3 Starter Kit