DDFS Report

8/13/2019 DDFS Report


Direct Digital Frequency Synthesizer

E&C 1

ABSTRACT

This project investigates a novel direct digital frequency synthesizer architecture.

The new approach allows reducing the total number of segments with respect to the well-

known uniform segmentation. In this way the size of the coefficient ROM is also reduced

with beneficial effects in terms of speed and power. We show that the optimal nonuniform

segmentation (that maximizes the spurious-free dynamic range for a given number of

nonuniform segments) can be obtained as the solution of a mixed-integer linear

programming problem. Three simple, suboptimal, nonuniform segmentation schemes

(which lend themselves to efficient hardware implementation) are proposed in this paper.

We also present several design examples and VLSI implementation results, which

demonstrate the effectiveness of the developed technique.

Key-Words: DDFS, direct-digital frequency synthesizer, nonuniform segmentation,

piecewise linear approximation, polynomial interpolation.



CHAPTER 1

INTRODUCTION

A frequency synthesizer is defined as a system that generates one or many

frequencies derived from a single time base (frequency reference), in such a way that the

ratio of the output to the reference frequency is a rational fraction. The frequency

synthesizer output frequency preserves the long-term frequency stability (the accuracy) of

the reference and operates as a device whose function is to generate frequencies that are

multiples of the reference frequency (multiples by a single or many numbers). These

multiples may be whole or fractions, but since only linear operations are used (in the

frequency domain), these numbers can only be rational.

Three main, conventional techniques are being used currently for sine-wave

synthesizers and are common throughout the industry. The most common and most

popular technique uses phase-locked loop (PLL) synthesis. PLL synthesizers can be found

in the most sophisticated radar systems or the most demanding satellite communications

terminals as well as in car radios and stereo systems for home entertainment. The PLL is

a feedback mechanism locking its output frequency to a reference. PLL synthesizers

gained popularity for their simplicity and economics.

Another synthesizer technique is known as direct analog (DA) frequency synthesis.

In this technique, a group of reference frequencies is derived from the main reference; and

these frequencies are mixed and filtered, added, subtracted, or divided according to the

required output. However, there are no feedback mechanisms in the basic technique. The

DA frequency synthesis technique offers excellent spectral purity, especially close to the

carrier, and excellent switching speed, which is a critical parameter in many designs and

determines how fast the synthesizer can hop from one frequency to another.

The DA technique is usually much more complicated than PLL to execute and is

therefore more expensive. DA synthesizers found applications in medical imaging and

spectrometers, fast-switching anti-jam communications and radar, electronic warfare



(EW) simulation, automatic test equipment (ATE), radar cross-section (RCS)

measurement, and such uses where the advantages of the DA technique are a must at a

premium cost.

The third technique is direct digital synthesis (DDS), which is a digital signal

processing (DSP) discipline and uses digital circuitry and techniques to create, manipulate,

and modulate a signal, digitally, and eventually convert the digital signal to its analog form

by using a digital-to-analog converter (DAC).

Although the direct digital synthesizer [sometimes referred to as numerically

controlled oscillator (NCO)] was invented almost 30 years ago, it started to attract attention

only in the last 10 to 12 years. Due to the enormous evolution of digital technology and its

tools, the technique evolved remarkably into an economical, high-performance tool and is

now a major frequency synthesis method used by almost all synthesizer designers from

instrument makers to applications like satellite communications, radar, medical imaging,

and cellular telephony and amateur radios (most of which are anything but amateur).

Direct digital synthesizers offer fast switching speed, high resolution (the step size

of the synthesizer), small size and low power, good economics, and the reliability and

producibility of digital designs. In addition, since the signal is manipulated digitally, it is

easy to modulate and achieve accuracies not attained by analog techniques and to

conveniently interface with the computing machines that usually control the synthesizer.

A direct digital frequency synthesizer (DDFS) uses digital signal processing to

generate frequency and phase tunable output signals. The generated output frequency is a

division of the reference clock frequency. The division factor is set in a binary tuning word.

The DDFS has the advantages of fast frequency switching, fine frequency resolution,

direct digital phase and frequency modulation in the digital domain and low phase noise.

DDFS has a variety of applications from instrumentations and measurements to modern

digital communication systems. For example, a DDFS can be utilized as a clock generator

that produces output frequencies with a resolution of $f_{clk}/2^{N}$, where N is the resolution of its phase accumulator. This

characteristic is useful for the systems that need multiple clock frequencies with no integer

relationship between them and they need to be changed rapidly and frequently. In modern

communication systems, DDFS seems to be an alternative to phase-locked loops (PLL).



Fast switching speed is becoming more and more important in today's wireless

communication systems, such as in spread spectrum communication systems. The

limitation of the tuning speed of the PLL comes from the produced delay due to its internal

feedback.

Aside from these advantages, DDFS is only capable of producing the exact integer

division of the reference clock frequency when the FCW is 2 to the power of an integer.

However, PLL has the ability to lock its output to the input phase of a reference clock.

Moreover, PLL is capable of producing higher output frequencies. In order to take

advantages of both PLL and DDFS, some applications use a hybrid frequency synthesizer,

combining PLL and DDFS. Moreover, conventional direct digital frequency synthesizers

are considered power-hungry systems due to the use of a ROM look-up table in their

architecture. Consequently, ROM-Less architectures have been introduced. The first

approach in ROM-Less DDFS architecture was to use an all-thermometer sine-weighted

DAC. However, this approach needed a huge number of current cells. Therefore, to

decrease the number of current cells, a segmentation algorithm for nonlinear DACs was

proposed. The segmentation of nonlinear DACs is more complicated than that of linear ones,

and this architecture suffers from more complexity. The second approach in ROM-Less

DDFS design is to use the triangle to sine wave conversion. This method uses the parabolic

approximation, and utilizes the exponential current-voltage relationship of the transistors

to implement it electronically. This method shows a moderate precision in triangle to sine

wave conversion.



CHAPTER 2

DDFS PRINCIPLES AND ARCHITECTURES

As stated earlier, a direct digital frequency synthesizer (DDFS) uses digital

signal processing to generate frequency and phase tunable output signals. In order to

change the frequency of the output signal, the frequency control word (FCW) or the frequency

of the reference clock can be changed. In this chapter the DDFS principles are described

by explaining the conventional DDFS architecture. Also, the most common DDFS

architectures will be presented.

2.1 CONVENTIONAL DDFS

The block diagram of a conventional DDFS is shown in figure 2.1. The DDFS

consists of a phase accumulator, a phase to sinusoid amplitude converter (PAC) and a

digital to analog converter (DAC) followed by a filter. The phase accumulator consists of

a counter and a register. The register stores the frequency control word (FCW), which is

the jump size of the counter. With each clock cycle, the FCW is added to the current

counter value, which wraps around when it overflows. The result of this counting is the phase information of the

sine wave. The output of the phase accumulator will be fed to PAC, which converts the

phase information of the sine wave to amplitude. The discrete-time, discrete-amplitude

information of the sine will be converted to analog by passing through a DAC. The final

block of the system is an anti-aliasing filter. The functionality of each block is described in

more detail in the following sections.

Fig 2.1 Block Diagram of Conventional DDFS.



2.1.1 PHASE ACCUMULATOR

The phase accumulator is basically a counter which has the responsibility of

generating the phase information of the sine wave. In order to understand how the

frequency is synthesized using a phase accumulator, consider the phase wheel in figure

2.2.

Fig 2.2 Phase Wheel.



Each point on the phase wheel corresponds to a phase of the sine

wave. A complete rotation of the phase wheel with constant speed will generate one

complete period of a sine wave. In every clock cycle, the FCW, which is stored in the

phase accumulator register, is added to the counter value. Consequently, FCW

determines how fast the counter travels around the phase wheel. As a result of a higher

jump size, the counter completes one rotation around the phase wheel faster, and

consequently a higher output frequency will be synthesized. The resolution of the phase

accumulator (N) determines how many phase points the phase wheel contains, and

consequently it determines the resolution of the synthesized output frequency. For

example, if N is taken to be 32, then an FCW of 0000...0001 causes the counter to

overflow after $2^{32}$ reference clock cycles (a complete rotation) and gives the lowest possible

output frequency. An FCW of 0111...1111 causes the counter to overflow after only about

two reference clock cycles (a complete rotation). The output of phase accumulator is

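shown in figure 2.3. The relation between the reference clock frequency, the output frequency,

the FCW and the resolution of the phase accumulator is given in equation 2.1:

$f_{out} = \dfrac{P \cdot f_{clk}}{2^{j}}$ (2.1)

where $P$ is the FCW and $j$ is the resolution of the phase accumulator in bits (denoted N above).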
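
The accumulator behaviour and the tuning equation can be sketched in a few lines of Python. This is only an illustrative model, not the report's hardware: the function names `phase_accumulator` and `output_frequency` and the 4-bit toy width are assumptions chosen for readability.

```python
def phase_accumulator(fcw, n_bits, n_cycles):
    """Return the accumulator value after each of n_cycles clock edges."""
    modulus = 1 << n_bits              # number of points on the phase wheel
    phase = 0
    values = []
    for _ in range(n_cycles):
        phase = (phase + fcw) % modulus  # add the FCW, wrap around on overflow
        values.append(phase)
    return values


def output_frequency(fcw, f_clk, n_bits):
    """Tuning equation: f_out = FCW * f_clk / 2**n_bits."""
    return fcw * f_clk / (1 << n_bits)


# Toy 4-bit accumulator with FCW = 3: the phase wraps after the fifth step.
print(phase_accumulator(3, 4, 8))      # [3, 6, 9, 12, 15, 2, 5, 8]
# Smallest frequency step of a 32-bit accumulator clocked at 100 MHz (in Hz).
print(output_frequency(1, 100e6, 32))
```

A larger FCW makes the phase wrap sooner, which is the "faster rotation around the phase wheel" described above.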



2.1.2 The phase to amplitude converter

After the phase information is generated by the phase accumulator, it will be fed to the

phase to amplitude converter, which is a ROM look up table in the conventional DDFS.

The look up table contains the amplitude information corresponding to each of the phase

points of the phase wheel. In order to avoid a very large look up table, it is common to use

only a fraction of the most significant bits of the phase accumulator output in order

to produce a sine wave. In this case we say that the DDFS is truncated from j bits to k bits,

for example from 32 bits to 12 bits. The truncation results in spurs in the output spectrum

of the DDFS, which will be discussed in the next chapter. However, 12 bits still results in

a large look up table. A large look up table decreases the speed of the synthesizer and

increases the power consumption and die area; moreover, a high-resolution DAC will be

needed. Therefore, a tremendous amount of work has been done to reduce the size of the

look up table. A very basic one is to use the quarter wave symmetry of the sine wave. The

block diagram of this method is shown in the figure 2.3. In this case only the amplitude

information of the 0 to π/2 portion of the sine wave is stored in the ROM, and the two most

significant bits of the phase accumulator output are used to distinguish the quarter of the

sine wave. The most significant bit indicates the sign of the sine wave amplitude and the

second most significant bit is used to determine whether the amplitude is increasing or

decreasing. The output of the phase to amplitude converter is shown in figure 2.4.

Other ROM compression techniques include the Sunderland architecture, Nicholas

architecture, polynomial approximation and CORDIC algorithm. In the Sunderland

architecture the large look up table is divided into two smaller memories. The Nicholas

architecture has improved the Sunderland architecture and hence has achieved a higher

ROM compression. In the polynomial approximation, the coefficients of the polynomial

are stored in the ROM. In this method the interval [0, π/2] is divided into smaller subintervals

and the sine/cosine is produced for each of them. The CORDIC algorithm has its

advantage over ROM when the needed accuracy is more than 9 bits. Using this algorithm

the needed hardware is not growing exponentially when the output word size is increasing.



Fig 2.4 Phase to Amplitude Converter.

2.1.3 Exploitation of sine function symmetry:

A well-known technique is to store only π/2 radians of sine information and to

generate the sine look-up table samples for the full range of 2π by exploiting the

quarter-wave symmetry of the sine function, as mentioned earlier. The decrease in the look-up

table capacity is paid for by the additional logic necessary to generate the complements of

the accumulator and the look-up table output. The details of this method are shown in

figure 2.5. The two most significant bits are used to decode the quadrant, while the

remaining k-2 bits are used to address a one quadrant sine look-up table. The most

significant bit determines the required sign of the result, and the second most significant bit determines whether the amplitude is increasing or decreasing. The

accumulator output is used as is for the first and the third quadrants. The bits must be

complemented so that the slope of the saw-tooth is inverted for the second and fourth

quadrant. As shown in figure 2.5, the sampled waveform at the output of the look-up table

is a full wave rectified version of the desired sine wave. The final output sine wave is then

generated by multiplying the full wave rectified version by -1, when the phase is between

π and 2π.

In most practical DDFS digital implementations, the numbers are represented in 2's

complement format. Therefore 2's complement must be used to invert the phase and



multiply the output of the look-up table by -1. However, it can be shown that if a 1/2 LSB

offset is introduced into a number that is to be complemented, then a 1's complement may

be used in the place of the 2's complement without introducing error. This provides

savings in hardware since a 1's complement may be implemented as a set of simple

Exclusive-OR gates. This 1/2 LSB offset is provided by choosing look-up table samples

such that there is a 1/2 LSB offset in both the phase and the amplitude of the samples, as

shown in figure 2.6. In figure 2.6, the phase offset has been used to reduce the address bits

by two. If there is no phase offset, 0 and π/2 have the same phase address, and one more

address bit is needed to distinguish between these two values.
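
The quarter-wave scheme with 1's complement mirroring can be sketched as follows. This is a minimal software model under stated assumptions, not the circuit of figure 2.5: the names `build_quarter_table` and `sine_sample`, the 6-bit address and 8-bit amplitude widths, and the convention that a stored code `q` stands for the level `q + 1/2` are all illustrative choices.

```python
import math


def build_quarter_table(addr_bits, amp_bits):
    """Quarter-wave sine table sampled at (a + 1/2) phase steps (1/2 LSB phase offset)."""
    size = 1 << addr_bits
    scale = 1 << (amp_bits - 1)          # magnitude levels per polarity
    table = []
    for a in range(size):
        s = math.sin(math.pi / 2 * (a + 0.5) / size)
        # Stored code q represents the level q + 1/2 (1/2 LSB amplitude offset).
        table.append(min(int(s * scale), scale - 1))
    return table


def sine_sample(phase, k, table):
    """Map a k-bit phase word to a signed amplitude code using the quarter table."""
    addr_bits = k - 2
    mask = (1 << addr_bits) - 1
    sign = (phase >> (k - 1)) & 1        # MSB: sign of the sine
    mirror = (phase >> (k - 2)) & 1      # second MSB: rising or falling quadrant
    addr = phase & mask
    if mirror:
        addr = ~addr & mask              # 1's complement of the address bits
    q = table[addr]
    return ~q if sign else q             # 1's complement of the amplitude (-q - 1)


table = build_quarter_table(6, 8)
samples = [sine_sample(p, 8, table) for p in range(256)]
print(samples[:4], samples[128:132])
```

Because every code represents `code + 1/2`, the 1's complement `-q - 1` represents exactly `-(q + 1/2)`, so no adder is needed for the negation.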

Fig 2.5 Logic to exploit quarter wave symmetry of sine wave.

Fig 2.6 Phase addresses with LSB phase offset.



2.1.4 The Digital to Analog Converter

As shown in figure 2.1, the discrete-time, discrete-amplitude information of the

sine wave is fed to a digital to analog converter to be converted to a continuous-amplitude,

continuous-time sine wave. Current-steering DACs are the best choice for high speed

applications because of their fast switching speed. They can be implemented in binary

weighted, thermometer coded and segmented architectures. The segmented architecture

combines the binary weighted and thermometer coded architectures to take advantage of

the benefits of both architectures. It uses thermometer coding for its most significant bits

(MSBs) and binary weighting for its least significant bits (LSBs). The binary weighted

architecture has the advantage of small area and low power consumption. However, it

suffers from differential nonlinearity (DNL), and the presence of glitches degrades its

dynamic performance. On the other hand, thermometer coded architecture has more

complexity and higher power consumption, but it has improved DNL, low glitches and

small switching errors. In this architecture, all the current sources are equal. The digital

input code is first fed to a thermometer decoder, and the thermometer code turns on the

switches accordingly. The segmented architecture uses the thermometer coded for its most

significant bits, which are more responsible for the dynamic performance, and binary

weighted for its least significant bits. A dummy decoder should be used for the binary

weighted part to compensate for the delay of the thermometer decoder of the thermometer

coded part. It should be noted that in the DDFS the dynamic performance of the DAC

plays a significant role in the spectral purity of the output spectrum.
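
The segmented decoding described above can be modelled in software as follows. This is a behavioural sketch only: the function names and the 10-bit word with a 4-bit thermometer-coded MSB segment are assumptions, and ideal (perfectly matched) current sources are assumed.

```python
def thermometer_code(value, n_bits):
    """Return 2**n_bits - 1 unit-cell switch states; the first `value` cells are on."""
    n_cells = (1 << n_bits) - 1
    return [1 if i < value else 0 for i in range(n_cells)]


def segmented_dac_output(code, total_bits, msb_bits):
    """Ideal output (in LSB currents) of a segmented current-steering DAC."""
    lsb_bits = total_bits - msb_bits
    msb_part = code >> lsb_bits                  # drives the thermometer decoder
    lsb_part = code & ((1 << lsb_bits) - 1)      # drives binary weighted cells
    unit_weight = 1 << lsb_bits                  # each thermometer cell = 2**lsb_bits LSBs
    msb_current = unit_weight * sum(thermometer_code(msb_part, msb_bits))
    lsb_current = sum(((lsb_part >> b) & 1) << b for b in range(lsb_bits))
    return msb_current + lsb_current


print(thermometer_code(3, 3))                    # [1, 1, 1, 0, 0, 0, 0]
print(segmented_dac_output(0b1000000000, 10, 4)) # 512
```

In the thermometer segment a one-step increase of the MSB field only turns one additional cell on, which is why this part contributes little glitch energy compared with a binary weighted major-carry transition.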

2.1.5 Anti-aliasing Filter

As will be discussed in more detail in the next chapter, the DDFS is a sampling system.

Therefore, there will be images at the frequencies $n f_{clk} \pm f_o$ ($n = 1, 2, \ldots$) in the output

spectrum, with $f_o$ the output frequency and $f_{clk}$ the sampling clock. As a result of the

zero-order hold functionality of the DAC, the amplitudes of the images are weighted by the

$\sin(x)/x$ (sinc) function. For most applications, these images are undesirable. In order to remove

these images, a filter with an inverse sinc response, called an anti-aliasing filter, is used at

the end of the system. Ideally, this filter should have unity response over the Nyquist

bandwidth and zero beyond that. However, designing such a filter is not practical;

consequently, some percentage of available bandwidth will be unusable. Therefore, the



2.2 ROM-Less Direct Digital Synthesizers

As stated earlier, the ROM look up table is the speed, power and area bottleneck of

direct digital synthesizers. Although a tremendous amount of work has been done to compress the

ROM look up table, direct digital synthesizers using this method still have high power

consumption and limitations in higher frequency operations. Consequently, ROM-Less

architectures have been introduced. The two most common ones are described briefly in the

following sections.

2.2.1 Direct digital synthesizer using a sine weighted DAC

In order to reduce the power consumption of direct digital synthesizers, ROM-Less

architectures based on sine weighted DACs have been proposed. The block diagram of a

DDFS using a sine weighted DAC is shown in figure 2.8. In this architecture the

sine/cosine mapping and the digital to analog conversion are performed in the same block,

called the sine weighted DAC. The design challenges of the sine weighted DAC are mostly the

same as those of the linear DAC. The main difference between the sine weighted DAC and the linear

DAC is that in the linear DAC the current sources are identical to each other or

power-of-two weighted. However, in the sine weighted DAC the current sources are

weighted according to the amplitude of the sine wave. In this architecture, for each phase of

the sine wave the sine weighted DAC switches the corresponding amount of current to the

output. The two most significant bits are used to exploit the quarter wave symmetry of the

sine wave. Initially, these architectures used all thermometer sine weighted DACs. In order

to reduce the number of DAC cells, segmentation techniques were proposed. Segmentation

techniques for nonlinear DACs are more complicated than for linear ones, and these

architectures suffer from complexity when the resolution is high.



Fig 2.8 DDFS Block Diagram using sine weighted DAC

2.2.2 Direct digital synthesizer using triangle to sine wave converter

The block diagram of a DDFS using a triangle to sine wave converter is shown in figure

2.9. This architecture uses the most significant bit to exploit the half wave symmetry of

the sine wave; consequently, it decreases the truncation error. The output of the

complementor is then fed to a linear DAC. The linear DAC produces a triangle wave

which contains the analog phase information of the sine wave. The triangle wave is then

converted to a sine wave using an analog sine-mapping methodology. This methodology

uses the parabolic approximation.

Fig 2.9 DDFS Block Diagram using triangle to sine wave converter



CHAPTER 3

NOISE ANALYSIS OF DDFS OUTPUT SPECTRUM

The direct digital frequency synthesizer has four sources of spurs, which are shown in

figure 3.1. These error sources include the truncation error of the phase accumulator, the

phase to amplitude conversion error, the errors due to the nonlinearity of the DAC and also

the phase noise. In this chapter these error sources and their effect on the output spectrum

of the DDFS are discussed.

Fig 3.1 DDFS Spur Sources

3.1 Spurious related to the phase truncation error

As stated earlier, in order to have fine frequency resolution we would like to increase

the resolution of the phase accumulator. However, this would result in large circuits that

are needed to convert the phase data to amplitude data. Therefore, the output of the phase

accumulator is usually truncated from J bits to K bits. This truncation will result in a

phase error between the generated phase by the accumulator, and the phase that is used by

the PAC for amplitude generation; consequently, there will be an error in the generated

amplitude. This error is periodic in the time domain and hence shows itself as spurs in the

frequency domain. The periodic nature of the error is due to the fact that after sufficient



rotation of the phase wheel the accumulator phase and the truncated phase will coincide

and there will be no phase error. The pattern will continue as the phase accumulator

continues to count. However, certain frequency control words result in the maximum level

of the phase truncation spurs while some result in no error. The control words that yield

the maximum spur level must satisfy equation 3.1:

$\mathrm{GCD}(FCW, 2^{J-K}) = 2^{J-K-1}$ (3.1)

where GCD denotes the greatest common divisor of the two arguments in the

parentheses. Hence, any control word with 1 in the bit position of $2^{J-K-1}$ and 0 in all

less significant bit positions will result in the maximum truncation spur level. Moreover,

the control words that yield no truncation error must satisfy equation 3.2:

$\mathrm{GCD}(FCW, 2^{J-K}) = 2^{J-K}$ (3.2)

Hence, any control word with 1 in the bit position of $2^{J-K}$ and 0 in all less significant

bit positions will result in no phase truncation spurs. The generated spurs due to the phase

truncation are the most significant spurs, if we consider the DAC ideal. They will be mixed

by the DDFS output frequency, and will generate spurs at multiples of the output

frequency, which is calculated by the following equation:

$f_{spurs} = f_{clk} \cdot \dfrac{\mathrm{GCD}(FCW, 2^{J-K})}{2^{J-K}}$ (3.3)
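
Equations 3.1 to 3.3 are easy to check numerically. The sketch below classifies a frequency control word by the GCD conditions and evaluates equation 3.3 as written; the function names and the J = 32, K = 12 example are assumptions made for illustration.

```python
from math import gcd


def truncation_spur_class(fcw, j, k):
    """Classify an FCW by the GCD conditions of equations 3.1 and 3.2."""
    b = j - k                        # number of discarded phase bits
    g = gcd(fcw, 1 << b)
    if g == 1 << b:
        return "none"                # equation 3.2: no phase truncation spurs
    if g == 1 << (b - 1):
        return "worst"               # equation 3.1: maximum spur level
    return "intermediate"


def spur_frequency(fcw, j, k, f_clk):
    """Equation 3.3: f_spurs = f_clk * GCD(FCW, 2**(J-K)) / 2**(J-K)."""
    b = j - k
    return f_clk * gcd(fcw, 1 << b) / (1 << b)


J, K = 32, 12
print(truncation_spur_class(1 << 19, J, K))   # worst
print(truncation_spur_class(3 << 20, J, K))   # none
print(truncation_spur_class(1, J, K))         # intermediate
print(spur_frequency(1 << 19, J, K, 100e6))   # 50000000.0
```

Only the lowest J-K bits of the FCW matter here, since the GCD with a power of two depends solely on the position of the lowest set bit.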

3.2 Spurious related to the DAC's finite resolution

The finite resolution of the DAC and consequently the finite number of quantization levels

of the DAC will result in an error, called the quantization error. The quantization error is

basically the difference between the amplitude of the reconstructed sine wave and the ideal

sine wave, which is due to the limited resolution of the DAC. This error will show itself as spurs

in the output spectrum of DDFS. The quantization error can be decreased by increasing

the resolution of the DAC. The relationship between the resolution of the DAC and the

amount of quantization distortion can be quantified with the following equation:

$SQR = 1.76 + 6.02P$ dB (3.4)



Where, P is the number of bits of the DAC and SQR is the ratio of the signal power to

quantization noise power. It should be noted that this equation does not provide any

information about the total SFDR of the system, and only considers the spurs due to the

quantization error.
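
As a quick numerical illustration of equation 3.4, the sketch below tabulates the SQR for a few DAC resolutions. The second function is the standard closed form for an ideal uniform quantizer driven by a full-scale sine wave, from which the rounded constants 1.76 and 6.02 follow; both function names are illustrative.

```python
import math


def sqr_db(p_bits):
    """Equation 3.4: signal to quantization noise ratio in dB for a P-bit DAC."""
    return 1.76 + 6.02 * p_bits


def sqr_db_exact(p_bits):
    """Unrounded form, 10*log10(1.5 * 2**(2P)), for an ideal uniform quantizer."""
    return 10 * math.log10(1.5 * 4 ** p_bits)


for p in (8, 10, 12, 14):
    print(p, round(sqr_db(p), 2), round(sqr_db_exact(p), 2))
```

Each additional bit of DAC resolution improves the SQR by about 6 dB.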

3.3 Spurious related to the nonlinearities of the DAC

The most dominant spurs in the output spectrum of the DDFS are the spurs related to the

nonlinearities of the DAC. Both static and dynamic nonlinearities will be discussed in the

following section; however, in high-sampling-rate circuits the dynamic nonlinearities play

the significant role and being statically linear is the prerequisite for the DAC to have a

good dynamic linearity.

3.3.1 Static performance

The static specifications of a digital to analog converter include offset error, gain error,

integral nonlinearity (INL) and differential nonlinearity (DNL). These errors will result in a

nonlinear relation between the actual output level produced by the DAC and the ideal

output level that the designer expects; consequently, there will be harmonic distortions at

the output spectrum of the digital to analog converter. Figure 3.2 shows the ideal and actual

transfer functions of a three bit DAC, together with the corresponding static nonlinearities.

Fig 3.2 Transfer Characteristics of a DAC.



Offset error: offset error is the shift in the transfer function of the DAC on the vertical

axis, and it shows that for an input value of zero, the DAC will output an analog value, not

equal to zero.

Gain error: In the transfer function of the DAC, the difference between the actual slope

and the ideal slope is defined as the gain error. The gain error is not of big concern when

a single converter is being used, because rather than the absolute accuracy, the relative

accuracy is of concern.

Monotonicity: The monotonicity of a digital to analog converter is its ability to decrease

or increase in the same direction of its input signal.

Integral nonlinearity (INL) and differential nonlinearity (DNL): If we consider a line

that passes through the end points of the transfer function of the DAC, the integral

nonlinearity (INL) would be the maximum deviation between that line and the actual

analog output of the DAC. The differential nonlinearity (DNL) is the difference between

the actual step size and the ideal one least significant bit step size in the transfer function

of the DAC. These errors are shown in the figure 3-2.
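
The INL and DNL definitions above can be turned into a short calculation. The sketch below uses the end-point line described in the text; the function name and the output levels of the 3-bit example are made-up illustrative values, not measurements from this project.

```python
def dnl_inl(levels):
    """Return (DNL, INL) in LSB units using the end-point line of the transfer function."""
    n_steps = len(levels) - 1
    lsb = (levels[-1] - levels[0]) / n_steps     # ideal step from the end points
    dnl = [(levels[i + 1] - levels[i]) / lsb - 1 for i in range(n_steps)]
    inl = [(levels[i] - (levels[0] + i * lsb)) / lsb for i in range(len(levels))]
    return dnl, inl


# Hypothetical 3-bit DAC whose code 2 output is half an LSB too high.
measured = [0.0, 1.0, 2.5, 3.0, 4.0, 5.0, 6.0, 7.0]
dnl, inl = dnl_inl(measured)
print(dnl)                            # [0.0, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0]
print(max(abs(v) for v in inl))       # 0.5
```

A DNL below -1 LSB at any step would indicate a non-monotonic transfer function.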

3.3.2 Dynamic performance

The dynamic errors of the digital to analog converter include glitches, settling time and

feed through effects. These errors are shown in figure 3.3. Dynamic errors have a

significant impact on the performance of the DAC and they become even more critical for

higher output frequencies and sampling rates. These errors are presented in the following

section.



Fig 3.3 DAC's Full Scale Transition.

Glitches: Glitches happen as a result of an unmatched switching time between different

bits, which can be due to skew between bits in the digital part or the timing mismatch in

the switches of the DAC. The result is a signal-dependent error from the inputs to the

output of the DAC during the code transitions. For example, consider the case that the

input code is changing from 0111 to 1000. If the switching times of all the current cells are

not synchronized, the analog equivalent of an intermediate code such as 1111 may appear for a very short

period at the output; consequently, a glitch will occur in the output. This

phenomenon is much more severe at high frequencies. Careful layout and thermometer

decoding can be used to reduce this effect.

Settling time: the settling time is defined as the time needed for the analog output to settle within

the accepted error band of its final value and is due to the parasitic capacitances of the

circuit. The settling time should be kept as small as possible to have a low distortion on

the analog output signal.

Feed through effects: feed through effects have two sources in a DAC cell. The first one

is the feed through of the digital signal through the parasitic capacitances of the switch transistors, which actually

results in distortion in the Nyquist bandwidth of the output spectrum, since it is a code



dependent error. This error can be minimized by careful layout and switch sizing. The

second one is the feed through of the clock to the analog output, which also can be reduced

by minimizing the size of the switches and hence reducing the capacitive coupling of the

switches to the output. All the dynamic nonlinearities associated with the switches can be

addressed by using the return to zero (RTZ) technique, which can be implemented with either

analog or digital solutions. In the analog return to zero technique the output of the current cells

is forced to zero when the clock is low and their current is switched to the output only

when the clock is high; consequently, the switching transients do not appear in the DAC's

output. As stated earlier, the finite output impedance of the DAC will also result in

dynamic nonlinearities.

3.4 The phase noise of the DDFS

The dominant contributor to the DDFS phase noise is the phase noise of the reference

clock. In fact, because DDFS is a divider of the sampling clock, the purity of its output

spectrum is directly affected by the purity of its reference clock. However, DDFS has a

great advantage over the PLL with regard to its phase noise. This is because the PLL multiplies the

phase noise of the reference clock in its feedback loop, but the DDFS is a feed forward system

whose output is a fractional division of the reference clock; consequently, the phase

noise present in the output spectrum of the DDFS decreases by $20 \log_{10}(N)$ dB, where N is

the division ratio. Moreover, because the DDFS is a sampling system in which the time interval between

the samples is important, the jitter of the reference clock plays an important role

in the output spectral purity.
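
As a small worked example of the 20 log(N) relation, the sketch below computes the ideal phase noise improvement for a given clock and output frequency. The function name and the 100 MHz / 10 MHz figures are illustrative assumptions, and other noise contributions (DAC, truncation spurs) are ignored.

```python
import math


def phase_noise_improvement_db(f_clk, f_out):
    """Ideal reduction of reference phase noise at the DDFS output, 20*log10(N)."""
    n = f_clk / f_out                 # division ratio N
    return 20 * math.log10(n)


# A 10 MHz output derived from a 100 MHz reference clock (N = 10).
print(phase_noise_improvement_db(100e6, 10e6))   # 20.0
```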



CHAPTER 4

DESIGNED DIRECT DIGITAL FREQUENCY

SYNTHESIZER

4.1 Concept of the architecture used

Instead of a ROM LUT, a hardware-optimized phase-to-sine amplitude converter

approximates the first quadrant of the sine function with eight equal length piecewise

linear segments. The main goal is to maintain low system complexity and reduce power

consumption and chip area requirements. The second aim is to achieve a specified spectral

purity, which is defined as the ratio of the power in the desired frequency to the power in

the greatest harmonic, across the synthesizer's tuning bandwidth. Spectral purity is an

essential design parameter for synthesizers used in communication systems, ensuring that

undesired in-band signals remain below a given threshold and are not detected.

In order to achieve the first goal, we approximate a sinusoid as a series of eight

equal-length piecewise continuous linear segments $s_i$, where

$s_i(x) = m_i \left(x - \dfrac{i}{8}\right) + y_i, \quad i \in [0, 7]$ (4.1)

Here $m_i$ is the slope of each segment and is carefully selected to eliminate the requirement for

multiplication by representing each one as a sum of at most two powers of two. This

is a well-known and often used technique. We also restrict the precision of slope

representation, i.e., the difference between the smallest and the largest powers of two used,

in effect putting an upper bound on the adder's width. Equal length segments are selected

to reduce the control system circuitry costs. In order to achieve a desired spectral purity,

different sets of mi and yi coefficients are evaluated and the best one meeting the

requirements is selected.
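
To make equation (4.1) concrete, the sketch below evaluates an eight-segment approximation of the first quadrant. The coefficients are NOT the ones selected in this project: as an assumption for illustration, each slope is simply the allowed value closest to the derivative at the segment midpoint, and the intercepts follow from continuity at the segment boundaries.

```python
import math

# Slope values permitted by the architecture (set quoted in section 4.1).
ALLOWED = [1.5, 1.25, 1.125, 1, .875, .75, .625, .5, .375, .25, .125, 0]


def pick_slopes():
    """Illustrative rule: nearest allowed slope to the derivative at each midpoint."""
    slopes = []
    for i in range(8):
        x_mid = (i + 0.5) / 8
        ideal = (math.pi / 2) * math.cos(math.pi / 2 * x_mid)
        slopes.append(min(ALLOWED, key=lambda m: abs(m - ideal)))
    return slopes


def intercepts(slopes):
    """y_i values that make the segments continuous, starting from y_0 = 0."""
    y = [0.0]
    for m in slopes[:-1]:
        y.append(y[-1] + m / 8)
    return y


def approx_sine(x, slopes, y):
    """Equation (4.1): s_i(x) = m_i * (x - i/8) + y_i, with x scaled to [0, 1]."""
    i = min(int(x * 8), 7)
    return slopes[i] * (x - i / 8) + y[i]


m = pick_slopes()
y = intercepts(m)
worst = max(abs(approx_sine(k / 1024, m, y) - math.sin(math.pi / 2 * k / 1024))
            for k in range(1025))
print(m)
print(worst)
```

This naive choice minimizes nothing in particular; the point of the optimization described here is that a different coefficient set, judged by spectral purity rather than amplitude error, can perform better.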

The first important feature of our architecture is that we constrain the quantization of the

segment slopes such that they are represented with at most two non-zero binary digits. We

exploit the well-known principle that multiplication by a power of two can be



accomplished with a trivial bit shift, and that multiplication by a factor equal to a sum of

two powers of two can be accomplished with at most two trivial bit shifts and one addition.

Consequently, implementing the multiplication in equation (4.1) requires at most one

addition.
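The shift-and-add principle can be illustrated with a small behavioral model. This is only a sketch in Python, not the hardware description used in this project; the function name `shift_add_multiply` and the (sign, shift) encoding of a slope are assumptions made for illustration.

```python
def shift_add_multiply(x, terms):
    """Multiply integer x by a slope given as signed powers of two.

    terms holds at most two (sign, shift) pairs (hypothetical encoding);
    each pair contributes sign * (x >> shift), i.e. x * 2**-shift
    truncated to an integer.
    """
    if len(terms) > 2:
        raise ValueError("slope must use at most two powers of two")
    total = 0
    for sign, shift in terms:
        total += sign * (x >> shift)
    return total

# Slope 0.75 = 2**-1 + 2**-2 and slope 0.875 = 2**0 - 2**-3.
SLOPE_0_75 = [(+1, 1), (+1, 2)]
SLOPE_0_875 = [(+1, 0), (-1, 3)]

print(shift_add_multiply(64, SLOPE_0_75))   # 32 + 16 = 48
print(shift_add_multiply(64, SLOPE_0_875))  # 64 - 8 = 56
```

In both cases the product is formed from two shifted copies of the operand and a single addition or subtraction, which is the property the slope quantization is meant to guarantee.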

The second important feature of our architecture is that we limit the dynamic range of each slope m_i so that each can be expressed with four bits. This implies 16 possibilities; however, we use only a subset of 12. We discard those slopes with more than two nonzero digits but accept -1 as a valid digit. We scale first-quadrant angles from the interval [0, π/2] to the interval [0, 1] in order to represent them as binary fractions. Hence, the first derivative of the sine function in the first quadrant is scaled from the range [0, 1] to [0, π/2] ≈ [0, 1.57]. Consequently, we select segment slopes from the following set:

{1.5, 1.25, 1.125, 1, 0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125, 0}.
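The twelve admissible slopes can be checked by enumeration. The sketch below assumes that the four slope bits carry the weights 1, 1/2, 1/4 and 1/8 (an assumption consistent with the set above, but not stated explicitly in the text) and that each digit may be -1, 0 or +1.

```python
from itertools import combinations

# Assumed weights of the four slope bits: 2**0 .. 2**-3.
WEIGHTS = [1.0, 0.5, 0.25, 0.125]

def two_digit_slopes():
    """Return all non-negative values expressible with at most two
    nonzero digits from {-1, 0, +1} over the four bit weights."""
    values = {0.0}
    values.update(WEIGHTS)                  # exactly one nonzero digit
    for a, b in combinations(WEIGHTS, 2):   # two nonzero digits, a > b
        values.add(a + b)
        values.add(a - b)                   # uses the digit -1
    return sorted(values, reverse=True)

slopes = two_digit_slopes()
print(len(slopes))
print(slopes)
```

Under these assumptions the enumeration yields exactly twelve values; the four remaining multiples of 1/8 below 2 (1.375, 1.625, 1.75 and 1.875) all need three nonzero digits.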

As mentioned previously, common wisdom in designing a DDF synthesizer says that one should minimize the amplitude error of the sinusoid calculated for any phase angle. While this may be an important performance parameter for a sine function block, it is not necessarily so for a DDF synthesizer. Spectral purity, defined as the ratio of the power in the desired frequency to the power in the greatest harmonic across the synthesizer's tuning bandwidth, is much more important. Spectral purity is an essential design parameter for synthesizers in communication systems, ensuring that undesired in-band signals remain below a given threshold and are not detected.

To achieve a desired spectral purity, we evaluate different sets of eight pairs of m_i and y_i coefficients and select the best one meeting our requirements. We solve this optimization problem with a genetic algorithm whose fitness function is the spectral purity. All calculations take finite bit-width effects into account. Equation (4.2) below gives the slopes and y approximations that we have used for this architecture. They meet the requirement of 60 dBc spectral purity. Figure 4.1 shows the corresponding output for angles in the first quadrant.



(4.2)

Fig 4.1 First Quadrant Sine Approximation.



In Figure 4.1, the 8 segments are noticeable, as are the amplitude quantization effects for

each angle. Discontinuities at quadrant transitions may also be observed. The maximum

amplitude is equal to 123/128 or 0.9609375. Taking the Discrete Fourier Transform of a

full period of data reveals that the amplitude of the fundamental is approximately

123.1/128. This reduction from a maximum of 127/128 in full-scale output is

inconsequential from a system perspective.
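The spectral-purity figure can be estimated from one full period of output samples with a direct DFT. The following is a minimal sketch, not the evaluation code used in this project: the function name `spectral_purity_dbc` is hypothetical, the fundamental is assumed to fall in bin 1, and the test waveform is synthetic because the coefficients of equation (4.2) are not reproduced here.

```python
import cmath
import math

def spectral_purity_dbc(samples):
    """Ratio in dB of the fundamental power (bin 1) to the largest
    other bin between bin 2 and the Nyquist bin."""
    n = len(samples)
    powers = []
    for k in range(1, n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        powers.append(abs(acc) ** 2)
    fundamental = powers[0]
    worst_spur = max(powers[1:])
    return 10.0 * math.log10(fundamental / worst_spur)

# Synthetic check: a unit sine plus a third harmonic 1000 times smaller.
N = 64
test_wave = [math.sin(2 * math.pi * t / N)
             + 0.001 * math.sin(2 * math.pi * 3 * t / N)
             for t in range(N)]
print(round(spectral_purity_dbc(test_wave), 1))  # 60.0
```

An amplitude ratio of 1000 corresponds to 60 dB, so the synthetic waveform sits exactly at the 60 dBc target quoted for this design.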

4.2 SYSTEM ARCHITECTURE

The system architecture is shown in Figure 4.2.

The phase accumulator's 16 bits are truncated to 12. This limits spurs due to phase truncation to approximately -72 dBc. The two MSBs are used for quadrant symmetry. The first MSB determines the sign of the output data. It controls a format converter block, which converts the sign-and-magnitude format to the two's complement format required by the DAC. The second MSB controls a 1's complement block, which inverts the remaining phase accumulator bits for angles in quadrants 2 and 4. The consequence is that the ramp output from the phase accumulator is converted to a triangular wave of equal frequency and twice the amplitude.
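The -72 dBc figure agrees with the common rule of thumb that each retained phase bit lowers the worst-case truncation spur by about 6.02 dB. The sketch below only encodes that rule of thumb; it is an approximation, not a derivation from this design, and the function name is hypothetical.

```python
def phase_truncation_spur_dbc(phase_bits):
    """Rule-of-thumb worst-case spur level for a phase word truncated
    to phase_bits bits (about -6.02 dB per retained bit)."""
    return -6.02 * phase_bits

print(round(phase_truncation_spur_dbc(12), 2))  # -72.24
```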

The next three MSBs identify one of eight linear segments, and thus they control the

multiplexers that implement equation (4.2), which is defined in 8 parts. The remaining 7

bits identify different sub-angles, or positions along any of the 8 segments. In equations

(4.1) and (4.2), these 7 bits are equal to the quantity (x - xi), so this operation does not

require any processing.
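The way the 12 truncated phase bits are split into control fields can be sketched as follows. This is a behavioral Python model under the bit assignment described above; the function name `decode_phase` and the field names are assumptions for illustration.

```python
def decode_phase(phase12):
    """Split a 12-bit truncated phase word into its control fields."""
    if not 0 <= phase12 < (1 << 12):
        raise ValueError("expected a 12-bit phase word")
    sign = (phase12 >> 11) & 1      # first MSB: sign of the output
    mirror = (phase12 >> 10) & 1    # second MSB: set in quadrants 2 and 4
    angle = phase12 & 0x3FF         # remaining 10 bits
    if mirror:
        angle ^= 0x3FF              # 1's complement folds the ramp
    segment = angle >> 7            # next three MSBs: segment index i
    offset = angle & 0x7F           # last 7 bits: the quantity (x - xi)
    return sign, mirror, segment, offset

print(decode_phase(0b10_101_0000011))  # (1, 0, 5, 3)
print(decode_phase(0b01_000_0000000))  # (0, 1, 7, 127)
```

The second call shows the folding: the first angle of quadrant 2 is mapped onto the last position of segment 7, so the same first-quadrant approximation is reused in reverse.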

The two upper multiplexers select shifted versions of the 7 least significant phase bits, passing them to the three-operand adder according to the corresponding segment. In the figure, the notation {>>n} signifies a right shift by n bits, or division by 2^n. The addition of two shifted versions of an angle x realizes the multiplication of the angle x by a slope m_i in equation (4.2). The bottom multiplexer selects one of eight initial approximations and also passes it to the three-operand adder.



The output from the multiplexers is shown to be 13 bits wide, in order to properly align

the three terms to be added. In actual fact, the first three bits of the two upper multiplexers

are 0, as are the last three bits of the lower multiplexer.

The three-operand adder adds the multiplexer outputs together and rounds the result to 7

bits. The rounding operation is accomplished by adding the 8th bit to the truncated 7-bit

sum.
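The rounding step can be modeled in a few lines. The sketch assumes that the three-operand sum fits in 13 bits and that no carry out of the 7-bit result occurs; both are assumptions of this illustration rather than statements from the design, and the function name is hypothetical.

```python
def round_to_7_bits(total13):
    """Round a 13-bit sum to 7 bits by adding the 8th bit
    (the first discarded bit) to the truncated value."""
    truncated = total13 >> 6          # keep the 7 most significant bits
    round_bit = (total13 >> 5) & 1    # 8th bit, just below the cut
    return truncated + round_bit

print(round_to_7_bits(0b1000000_100000))  # 65: discarded part is one half
print(round_to_7_bits(0b1000000_011111))  # 64: discarded part is below one half
```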

This architecture is significantly less complex than all those listed in section 2 for a similar output spectral purity. It does not include a ROM, and neither multipliers nor squaring circuits are required. Equal-length segments are used to simplify the control circuitry. Only 3 integers need to be added, and the multiplexers shown in Figure 4.2 can be optimized by combining similar inputs and implemented with combinational logic.

Fig 4.2 Proposed Architecture.



4.3 IMPLEMENTATION DETAILS

The system was described in VHDL with fewer than 200 lines of code. During placement and routing with automated tools, a clock constraint of 125 MHz was easily met without having to add pipelining registers. Pipelining would increase this maximum clock rate, but at the expense of a longer latency when changing the synthesizer's output frequency.

Power consumption is estimated at under 10 mW for a 100 MHz clock, or 0.1 mW/MHz. The Frequency Control Word is 16 bits wide, yielding a frequency resolution of approximately 1526 Hz for a 100 MHz reference clock. The 8-bit wide output data is in two's complement format, compatible with most commercial DACs. As stated above, this design is severely I/O bound. This is a direct consequence of the tremendous reduction in complexity when compared to other previously reported designs for similar spectral purity.
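The quoted frequency resolution follows from the standard DDFS relation f_out = FCW × f_clk / 2^N. The sketch below works through the numbers for the 16-bit accumulator and 100 MHz clock; the function names are hypothetical.

```python
ACC_BITS = 16      # phase accumulator width
F_CLK_HZ = 100e6   # reference clock

def frequency_resolution(f_clk_hz=F_CLK_HZ, acc_bits=ACC_BITS):
    """Smallest output frequency step, f_clk / 2**N."""
    return f_clk_hz / (1 << acc_bits)

def tuning_word(f_out_hz, f_clk_hz=F_CLK_HZ, acc_bits=ACC_BITS):
    """Frequency control word closest to the requested output frequency."""
    return round(f_out_hz * (1 << acc_bits) / f_clk_hz)

def output_frequency(fcw, f_clk_hz=F_CLK_HZ, acc_bits=ACC_BITS):
    """Output frequency produced by a given frequency control word."""
    return fcw * f_clk_hz / (1 << acc_bits)

print(frequency_resolution())   # 1525.87890625
fcw = tuning_word(1e6)
print(fcw, output_frequency(fcw))
```

With a 32-bit accumulator the same relation gives a step of about 0.023 Hz, which is the reason wider accumulators are common.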

Due to the limited allocation of silicon area, it was decided not to increase the phase accumulator width to 32 bits, as is common. This would have added 16 pins to the chip and approximately 300 µm to each side of the die. The phase control word input could also have been serialized, but that would have increased the tuning latency.

If system frequency resolution requirements called for a 32-bit wide accumulator and the same 125 MHz clock rate, a modest increase in system complexity would follow, because several pipelining registers would be required. Alternatively, a more efficient adder configuration would have to be used, with a corresponding increase in the number of cells. In any case, the present core is very small, which makes it an ideal building block in a system-on-chip digital receiver.



Fig 4.3 Simulation Results.



CHAPTER 5

CONCLUSION

We have presented a low-power sine-output Direct Digital Frequency Synthesizer (DDFS) realized in 0.18 µm CMOS that achieves 60 dBc spectral purity from DC to the Nyquist frequency. It includes no ROM and no multipliers but requires an external DAC if an analog output is desired. Power consumption is 10 mW for a 100 MHz clock, which is significantly less than figures reported previously. System complexity is greatly reduced by using an efficient linear interpolation scheme to approximate a sinusoid.



APPENDIX A

INTRODUCTION TO PLATFORMS

What is an FPGA?

Before the advent of programmable logic, custom logic circuits were built at the board level using standard components, or at the gate level in expensive application-specific (custom) integrated circuits. The FPGA is an integrated circuit that contains many (64 to over 10,000) identical logic cells that can be viewed as standard components. Each logic cell can independently take on any one of a limited set of personalities. The individual cells are interconnected by a matrix of wires and programmable switches. A user's design is implemented by specifying the simple logic function for each cell and selectively closing the switches in the interconnect matrix. The arrays of logic cells and interconnect form a fabric of basic building blocks for logic circuits. Complex designs are created by combining these basic blocks to create the desired circuit.

What does a logic cell do?

The logic cell architecture varies between different device families. Generally speaking,

each logic cell combines a few binary inputs (typically between 3 and 10) to one or two

outputs according to a boolean logic function specified in the user program. In most

families, the user also has the option of registering the combinatorial output of the cell, so

that clocked logic can be easily implemented. The cell's combinatorial logic may be

physically implemented as a small look-up table memory (LUT) or as a set of multiplexers

and gates. LUT devices tend to be a bit more flexible and provide more inputs per cell than

multiplexer cells at the expense of propagation delay.

So what does 'Field Programmable' mean?

Field Programmable means that the FPGA's function is defined by a user's program rather than by the manufacturer of the device. A typical integrated circuit performs a particular function defined at the time of manufacture. In contrast, the FPGA's function is defined by a program written by someone other than the device manufacturer. Depending on the particular device, the program is either burned in permanently or semi-permanently as part of a board assembly process, or is loaded from an external memory each time the device is powered up. This user programmability gives the user access to complex integrated designs without the high engineering costs associated with application-specific integrated circuits.

How are FPGA programs created?

Individually defining the many switch connections and cell logic functions would be a daunting task. Fortunately, this task is handled by special software. The software translates a user's schematic diagrams or textual hardware description language code, then places and routes the translated design. Most software packages have hooks that allow the user to influence implementation, placement and routing to obtain better performance and utilization of the device. Libraries of more complex function macros (e.g., adders) further simplify the design process by providing common circuits that are already optimized for speed or area.

Gates

1987: 9,000 gates, Xilinx
1992: 600,000, Naval Surface Warfare Department
Early 2000s: Millions

Market size

1985: First commercial FPGA technology invented by Xilinx
1987: $14 million
~1993: >$385 million
2005: $1.9 billion
2010 estimates: $2.75 billion



CPLDs and FPGAs

The primary differences between CPLDs and FPGAs are architectural. A CPLD has a

somewhat restrictive structure consisting of one or more programmable sum-of-products

logic arrays feeding a relatively small number of clocked registers. The result of this is

less flexibility, with the advantage of more predictable timing delays and a higher logic-

to-interconnect ratio. The FPGA architectures, on the other hand, are dominated by

interconnect. This makes them far more flexible (in terms of the range of designs that are

practical for implementation within them) but also far more complex to design for.

Another notable difference between CPLDs and FPGAs is the presence in most FPGAs of higher-level embedded functions (such as adders and multipliers) and embedded memories, as well as logic blocks that implement decoders or mathematical functions.

Security considerations

With respect to security, FPGAs have both advantages and disadvantages as

compared to ASICs or secure microprocessors. FPGAs' flexibility makes malicious

modifications during fabrication a lower risk. For many FPGAs, the loaded design is

exposed while it is loaded (typically on every power-on). To address this issue, some

FPGAs support bit stream encryption.

Applications

Digital signal processing, radio, aerospace and defence systems, ASIC prototyping, medical imaging, speech recognition, cryptography, bioinformatics, computer hardware emulation, radio astronomy, metal detection and a growing range of other areas.

FPGAs especially find applications in any area or algorithm that can make use of the massive parallelism offered by their architecture. One such area is code breaking, in particular brute-force attacks on cryptographic algorithms.

FPGAs are increasingly used in conventional high-performance computing applications, where computational kernels such as FFT or convolution are performed on the FPGA instead of a microprocessor.



APPENDIX B

FPGA: SPARTAN II

INTRODUCTION

The Spartan-II Field Programmable Gate Array family gives users high performance, abundant logic resources, and a rich feature set, all at an exceptionally low price. The six-member family offers densities ranging from 15,000 to 200,000 system gates. System performance is supported up to 200 MHz. Features include block RAM (to 56K bits), distributed RAM (to 75,264 bits), 16 selectable input-output standards, and four DLLs. Fast, predictable interconnect means that successive design iterations continue to meet timing requirements.

The Spartan-II family is a superior alternative to mask-programmed ASICs. The

FPGA avoids the initial cost, lengthy development cycles, and inherent risk of

conventional ASICs. Also, FPGA programmability permits design upgrades in the field

with no hardware replacement necessary (impossible with ASICs).

FEATURES

- Second generation ASIC replacement technology
  - Densities as high as 5,292 logic cells with up to 200,000 system gates
  - Streamlined features based on Virtex FPGA architecture
  - Unlimited reprogrammability
  - Very low cost
  - Cost-effective 0.18 micron process
- System level features
  - SelectRAM hierarchical memory
    - 16 bits/LUT distributed RAM
    - Configurable 4K bit block RAM
    - Fast interfaces to external RAM
  - Fully PCI compliant
  - Low-power segmented routing architecture
  - Full readback ability for verification/observability
  - Dedicated carry logic for high-speed arithmetic
  - Efficient multiplier support
  - Cascade chain for wide-input functions
  - Abundant registers/latches with enable, set, reset
  - Four dedicated DLLs for advanced clock control
  - Four primary low-skew global clock distribution nets
  - IEEE 1149.1 compatible boundary scan logic
- Versatile I/O and packaging
  - Pb-free package options
  - Low-cost packages available in all densities
  - Family footprint compatibility in common packages
  - 16 high-performance interface standards
  - Hot swap Compact PCI friendly
  - Zero hold time simplifies system timing
- Core logic powered at 2.5V and I/Os powered at 1.5V, 2.5V, or 3.3V
- Fully supported by powerful Xilinx ISE development system
  - Fully automatic mapping, placement, and routing

Table Spartan-II FPGA Family Members



GENERAL OVERVIEW

The Spartan-II family of FPGAs has a regular, flexible, programmable architecture of Configurable Logic Blocks (CLBs), surrounded by a perimeter of programmable Input-Output Blocks (IOBs). There are four Delay-Locked Loops (DLLs), one at each corner of the die. Two columns of block RAM lie on opposite sides of the die, between the CLBs and the IOB columns. These functional elements are interconnected by a powerful hierarchy of versatile routing channels.

Spartan-II FPGAs are customized by loading configuration data into internal static

memory cells. Unlimited reprogramming cycles are possible with this approach. Stored

values in these cells determine logic functions and interconnections implemented in the

FPGA. Configuration data can be read from an external serial PROM (master serial mode),

or written into the FPGA in slave serial, slave parallel, or Boundary Scan modes.

Spartan-II FPGAs are typically used in high-volume applications where the

versatility of a fast programmable solution adds benefits. Spartan-II FPGAs are ideal for

shortening product development cycles while offering a cost-effective solution for high

volume production.

Spartan-II FPGAs achieve high-performance, low-cost operation through

advanced architecture and semiconductor technology. Spartan-II devices provide system

clock rates up to 200 MHz.

In addition to the conventional benefits of high-volume programmable logic

solutions, Spartan-II FPGAs also offer on-chip synchronous single-port and dual-port

RAM (block and distributed form), DLL clock drivers, programmable set and reset on all

flip-flops, fast carry logic, and many other features.



Fig 5.1 Basic Spartan-II Family FPGA Block Diagram

SPARTAN-II PRODUCT AVAILABILITY

The table below shows the maximum user I/Os available on the device and the number of user I/Os available for each device/package combination. The four global clock pins are usable as additional user I/Os when not used as global clock pins. These pins are not included in the user I/O counts.

Table Spartan-II FPGA User I/O Chart



ARCHITECTURAL DESCRIPTION

SPARTAN-II FPGA ARRAY

The Spartan-II field-programmable gate array is composed of five major configurable elements:

- IOBs provide the interface between the package pins and the internal logic.
- CLBs provide the functional elements for constructing most logic.
- Dedicated block RAM memories of 4096 bits each.
- Clock DLLs for clock-distribution delay compensation and clock domain control.
- Versatile multi-level interconnect structure.

The CLBs form the central logic structure with easy access to all support and

routing structures. The IOBs are located around all the logic and memory elements for

easy and quick routing of signals on and off the chip.

Values stored in static memory cells control all the configurable logic elements and

interconnect resources. These values load into the memory cells on power-up, and can

reload if necessary to change the function of the device.

INPUT/OUTPUT BLOCK

The Spartan-II FPGA IOB features inputs and outputs that support a wide variety of I/O signalling standards. These high-speed inputs and outputs are capable of supporting various state-of-the-art memory and bus interfaces. The table lists several of the supported standards along with the reference, output and termination voltages needed to meet them.



Fig Spartan-II FPGA Input-Output Block (IOB)

The three IOB registers function either as edge-triggered D-type flip-flops or as level-sensitive latches. Each IOB has a clock signal (CLK) shared by the three registers and independent Clock Enable (CE) signals for each register. In addition to the CLK and CE control signals, the three registers share a Set/Reset (SR) signal. For each register, this signal can be independently configured as a synchronous Set, a synchronous Reset, an asynchronous Preset, or an asynchronous Clear.

A feature not shown in the block diagram, but controlled by the software, is polarity

control. The input and output buffers and all of the IOB control signals have independent

polarity control.

LOOK-UP TABLES

Spartan-II FPGA function generators are implemented as 4-input look-up tables

(LUTs). In addition to operating as a function generator, each LUT can provide a 16 x 1-

bit synchronous RAM. Furthermore, the two LUTs within a slice can be combined to

create a 16x2-bit or 32 x 1-bit synchronous RAM, or a 16 x 1-bit dual port synchronous

RAM. The Spartan-II FPGA LUT can also provide a 16-bit shift register that is ideal for

capturing high-speed or burst-mode data. This mode can also be used to store data in

applications such as Digital Signal Processing.



BOUNDARY SCAN

Spartan-II devices support all the mandatory boundary-scan instructions specified in IEEE standard 1149.1. A Test Access Port (TAP) and registers are provided that implement the EXTEST, SAMPLE/PRELOAD, and BYPASS instructions. The TAP also supports two USERCODE instructions and internal scan chains.

The TAP uses dedicated package pins that always operate using LVTTL. For TDO to operate using LVTTL, the VCCO for bank 2 must be 3.3V. Otherwise, TDO switches rail-to-rail between ground and VCCO. TDI, TMS, and TCK have a default internal weak pull-up resistor, and TDO has no default resistor. Bitstream options allow setting any of the four TAP pins to have an internal pull-up, pull-down, or neither.

Boundary-scan operation is independent of individual IOB configurations and unaffected by package type. All IOBs, including unbonded ones, are treated as independent 3-state bidirectional pins in a single scan chain. Retention of the bidirectional test capability after configuration facilitates the testing of external interconnections.

The public boundary-scan instructions are available prior to configuration. After configuration, the public instructions remain available together with any USERCODE instructions installed during the configuration. While the SAMPLE and BYPASS instructions are available during configuration, it is recommended that boundary-scan operations not be performed during this transitional period.

In addition to the test instructions outlined above, the boundary-scan circuitry can

be used to configure the FPGA, and also to read back the configuration data.

To facilitate internal scan chains, the User Register provides three outputs (Reset,

Update and Shift) that represent the corresponding states in the boundary-scan internal

state machine.

The table lists the boundary-scan instructions supported in Spartan-II FPGAs. Internal signals can be captured during EXTEST by connecting them to unbonded or unused IOBs. They may also be connected to the unused outputs of IOBs defined as unidirectional input pins.



Boundary-Scan Command   Binary Code [4:0]   Description

EXTEST                  00000               Enables boundary-scan EXTEST operation
SAMPLE                  00001               Enables boundary-scan SAMPLE operation
USR1                    00010               Access user-defined register 1
USR2                    00011               Access user-defined register 2
CFG_OUT                 00100               Access the configuration bus for readback
CFG_IN                  00101               Access the configuration bus for configuration
INTEST                  00111               Enables boundary-scan INTEST operation
USRCODE                 01000               Enables shifting out USER code
IDCODE                  01001               Enables shifting out of ID code
HIZ                     01010               Disables output pins while enabling the Bypass Register
JSTART                  01100               Clock the start-up sequence when StartupClk is TCK
BYPASS                  11111               Enables BYPASS
RESERVED                All other codes     Xilinx reserved instructions

Table Boundary-Scan Instruction set



CONFIGURATION

Configuration is the process by which the Bitstream of a design, as generated by

the Xilinx software, is loaded into the internal configuration memory of the FPGA.

Spartan-II devices support both serial configuration, using the master/slave serial and

JTAG modes, as well as byte-wide configuration employing the Slave Parallel mode.

CONFIGURATION FILE

Spartan-II devices are configured by sequentially loading frames of data that have been concatenated into a configuration file. The table shows how much non-volatile storage space is needed for Spartan-II devices.

It is important to note that, while a PROM is commonly used to store configuration data before loading them into the FPGA, it is by no means required. Any of a number of different kinds of underpopulated non-volatile storage already available either on or off the board (e.g., hard drives, FLASH cards, etc.) can be used.

Table Spartan-II Configuration File Size

Device      Configuration File Size (Bits)

XC2S15      197,696
XC2S30      336,768
XC2S50      559,200
XC2S100     781,216
XC2S150     1,040,096
XC2S200     1,335,840



5.7 MODES

Spartan-II devices support the following four configuration modes:

- Slave Serial mode
- Master Serial mode
- Slave Parallel mode
- Boundary-scan mode

The Configuration mode pins (M2, M1, M0) select among these configuration

modes with the option in each case of having the IOB pins either pulled up or left

floating prior to the end of configuration. The selection codes are listed in table.

Configuration through the boundary-scan port is always available, independent of

the mode selection. Selecting the boundary-scan mode simply turns off the other modes.

The three mode pins have internal pull-up resistors, and default to a logic High if left

unconnected.

Table Configuration Modes

SLAVE SERIAL MODE

In slave serial mode, the FPGA's CCLK pin is driven by an external source,

allowing FPGAs to be configured from other logic devices such as microprocessors or in

a daisy-chain configuration. A Spartan-II device in slave serial mode should be connected

as shown for the third device from the left. Slave Serial mode is selected by a on

the mode pins (M0, M1, M2).



Fig Master-Slave Serial Configuration Circuit Diagram

The serial bitstream must be set up at the DIN input pin a short time before each rising edge of an externally generated CCLK. Multiple FPGAs in Slave Serial mode can be daisy-chained for configuration from a single source. The maximum amount of data that can be sent to the DOUT pin for a serial daisy chain is 2^20 - 1 (1,048,575) 32-bit words, or 33,554,400 bits, which is approximately 25 XC2S200 bitstreams. The configuration bitstream of downstream devices is limited to this size.
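The daisy-chain limit can be checked with a short calculation. The sketch below only reproduces the arithmetic from the figures quoted in this appendix; the constant and function names are hypothetical.

```python
# Configuration file size of the XC2S200, from the table in this appendix.
XC2S200_BITSTREAM_BITS = 1_335_840

def daisy_chain_limit_bits(counter_bits=20, word_bits=32):
    """Maximum number of bits that can be passed to DOUT:
    (2**counter_bits - 1) words of word_bits bits each."""
    return ((1 << counter_bits) - 1) * word_bits

limit = daisy_chain_limit_bits()
print(limit)                             # 33554400
print(limit // XC2S200_BITSTREAM_BITS)   # 25 complete XC2S200 bitstreams
```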

After an FPGA is configured, data for the next device is routed to the DOUT pin. Data on the DOUT pin changes on the rising edge of CCLK. Configuration must be delayed until the INIT pins of all daisy-chained FPGAs are High.



Fig Slave Serial Mode Timing

5.8 PIN TYPES

Most pins on a Spartan-II FPGA are general-purpose, user-defined Input-Output pins. There are, however, different functional types of pins on Spartan-II FPGA packages.

Table Spartan-II Family Package Options



Fig XC2S100TQ144 DEVICE

Fig DIP Switch & DAC Interface



XC2S100TQ144 DEVICE PINOUTS

XC2S100 Pad Name           XC2S100 Pad Name
Function Bank TQ144        Function Bank TQ144

GND - P143 VCCO 5 P107

TMS - P142 M2 - P106

I/O 7 P141 I/O 5 P103

I/O 7 P140 I/O, VREF 5 P102

I/O, VREF 7 P139 I/O 5 P101

I/O 7 P138 I/O, VREF 5 P100

I/O, VREF 7 P137 I/O 5 P99

I/O 7 P136 GND - P98

GND - P135 VCCINT - P97

I/O 7 P134 I/O 5 P96

I/O 7 P133 I/O 5 P95

I/O, VREF 7 P132 I/O, VREF 5 P94

I/O 7 P131 I/O 5 P93

I/O 7 P130 VCCINT - P92

I/O, IRDY 7 P129 I, GCK1 5 P91


VCCO 7 P127 VCCO 4 P90

VCCO 6 P127 GND - P89

I/O, TRDY 6 P126 I, GCK0 4 P88

VCCINT - P125 I/O 4 P87

I/O 6 P124 I/O 4 P86



I/O 6 P123 I/O, VREF 4 P85

I/O,VREF 6 P122 I/O 4 P84

I/O 6 P121 I/O 4 P83


GND 6 P119 GND - P81

I/O 6 P118 I/O 4 P80


I/O 6 P116 I/O 4 P78


I/O 6 P114 I/O 4 P76

I/O 6 P113 I/O 4 P75

I/O 6 P112 I/O 4 P74

M1 - P111 GND - P73

GND - P110 DONE 3 P72

M0 - P109 VCCO 4 P71


XC2S100 Pad Name           XC2S100 Pad Name
Function Bank TQ144        Function Bank TQ144

PROGRAM - P69 TDO 2 P34

I/O (INIT) 3 P68 GND - P33

I/O (D7) 3 P67 TDI - P32

I/O 3 P66 I/O (CS) 1 P31

I/O, VREF 3 P65 I/O(WRITE) 1 P30

I/O 3 P64 I/O 1 P29




I/O (D6) 3 P62 I/O, VREF 1 P27

GND - P61 I/O 1 P26

I/O (D5) 3 P60 GND - P25


I/O, VREF 3 P58 I/O 1 P23

I/O (D4) 3 P57 I/O 1 P22

I/O 3 P56 I/O, VREF 1 P21

VCCINT - P55 I/O 1 P20

I/O, TRDY 3 P54 I/O 1 P19

VCCO 3 P53 I, GCK2 1 P18

VCCO 2 P53 GND - P17


I/O, IRDY 2 P51 VCCO 0 P16

I/O 2 P50 I, GCK3 0 P15

I/O (D3) 2 P49 VCCINT - P14

I/O, VREF 2 P48 I/O 0 P13


I/O (D2) 2 P46 I/O 0 P11

GND - P45 I/O 0 P10

I/O (D1) 2 P44 VCCINT - P9

I/O, VREF 2 P43 GND - P8

I/O 2 P42 I/O 0 P7





I/O (DIN, D0) 2 P39 I/O 0 P4

I/O (DOUT, BUSY) 2 P38 I/O 0 P3

CCLK 2 P37 TCK - P2



Pins P104 and P105 are not connected.



APPENDIX C

CODES

Phase Accumulator:

library IEEE;

use IEEE.STD_LOGIC_1164.ALL;

use ieee.std_logic_arith.all;

use ieee.std_logic_unsigned.all;

entity phase_accumulator is

Port ( clk : in STD_LOGIC;

rst : in STD_LOGIC;

freq_offset : in STD_LOGIC_VECTOR (5 downto 0);

dout : out STD_LOGIC_VECTOR (19 downto 0);

comp1: out std_logic);

end phase_accumulator;

architecture Behavioral of phase_accumulator is

signal temp : std_logic_vector(19 downto 0);

begin

process(clk,rst,freq_offset)

begin



if(rst = '1') then

temp <= (others => '0');

elsif(clk'event and clk = '1') then

temp



Complimenter:

library IEEE;




entity complimenter is

Port ( clk : in std_logic;

phase_out : in STD_LOGIC_VECTOR (9 downto 0);

comp : in STD_LOGIC;

comp_out : out STD_LOGIC_VECTOR (9 downto 0));

end complimenter;

architecture Behavioral of complimenter is

begin

process(phase_out,comp,clk)

begin

if(comp = '0') then



comp_out



Phase accumulator and Complimenter:

library IEEE;


entity phase1 is


rst : in STD_LOGIC;


comp_out : out STD_LOGIC_VECTOR (9 downto 0);

phase_out : out std_logic_vector(19 downto 0));

end phase1;

architecture Behavioral of phase1 is

component phase_accumulator is


rst : in STD_LOGIC;


dout : out STD_LOGIC_VECTOR (19 downto 0);



comp1: out std_logic);

end component;

component complimenter is


phase_out : in STD_LOGIC_VECTOR (9 downto 0);

comp : in STD_LOGIC;

comp_out : out STD_LOGIC_VECTOR (9 downto 0));

end component;

signal dout : std_logic_vector(19 downto 0);

signal comp1: std_logic;

signal comp_in : std_logic_vector(9 downto 0);

begin

phase_out



Explanation of Code:

Entry: clk, rst, freq_offset

Out: comp_out, phase_out

The code for the phase accumulator and the complementer is combined using port mapping.

The inputs are those of the phase accumulator.

The outputs are those of both blocks.

Mux Tree:

library IEEE;




entity mux_tree is

Port ( din : in STD_LOGIC_VECTOR (12 downto 0);

sel : in STD_LOGIC_VECTOR (2 downto 0);

dout1 : out STD_LOGIC_VECTOR (12 downto 0);




dout3 : out STD_LOGIC_VECTOR (12 downto 0));

end mux_tree;

architecture Behavioral of mux_tree is

component mux is

Port (din1 : in STD_LOGIC_VECTOR (12 downto 0);

din2 : in STD_LOGIC_VECTOR (12 downto 0);







sel : in STD_LOGIC_vector(2 downto 0);

dout : out STD_LOGIC_vector(12 downto 0));

end component;

signal shift1 : std_logic_vector(12 downto 0);





signal y0,y1,y2,y3,y4,y5,y6,y7 : std_logic_vector(12 downto 0);

constant zero : std_logic_vector(12 downto 0):="0000000000000";

begin

y0



Mux1 : mux port map(din,din,din,din,din,shift1,shift1,zero,sel,dout1);

Mux2 : mux port map(shift1,shift1,shift2,shift2,shift3,shift3,zero,zero,sel,dout2);

Mux3 : mux port map(y0,y1,y2,y3,y4,y5,y6,y7,sel,dout3);

end Behavioral;


Inputs: din, sel

Outputs: dout1, dout2, dout3

The mux tree combines three 8:1 muxes that share the select input sel. All mux inputs are 13 bits wide (12 downto 0). The first mux receives din, din, din, din, din, shift1, shift1, zero. The second receives shift1, shift1, shift2, shift2, shift3, shift3, zero, zero. The third receives the values from the ROM lookup table: y0, y1, y2, y3, y4, y5, y6, y7.
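Read as a piecewise-linear evaluator, the first two muxes pick a slope for the segment selected by sel and the third picks the segment's offset. The definitions of shift1 to shift3 did not survive in the listing, so the table below assumes that shiftk is din shifted right by k bits (din/2^k) and ignores the truncation of the shifted-out bits. Under that assumption the sum of the first two mux outputs equals m_sel times din. This is an illustrative reading, not a statement of the report's coefficient values.

```latex
\[
\text{dout1} + \text{dout2} = m_{\text{sel}} \cdot \text{din}, \qquad
\begin{array}{c|c|c|c}
\text{sel} & \text{dout1} & \text{dout2} & m_{\text{sel}} \\ \hline
000 & \text{din}   & \text{din}/2 & 1.5   \\
001 & \text{din}   & \text{din}/2 & 1.5   \\
010 & \text{din}   & \text{din}/4 & 1.25  \\
011 & \text{din}   & \text{din}/4 & 1.25  \\
100 & \text{din}   & \text{din}/8 & 1.125 \\
101 & \text{din}/2 & \text{din}/8 & 0.625 \\
110 & \text{din}/2 & 0            & 0.5   \\
111 & 0            & 0            & 0
\end{array}
\]
```

With these assumed shifts the slope falls as sel rises, which is the shape expected for the rising quarter of a sine wave, whose derivative decreases toward zero.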

Component of Mux used in Mux tree:

library IEEE;

use IEEE.STD_LOGIC_1164.ALL;

entity mux is

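Port (din1 : in STD_LOGIC_VECTOR (12 downto 0);

din2 : in STD_LOGIC_VECTOR (12 downto 0);

din3 : in STD_LOGIC_VECTOR (12 downto 0);

din4 : in STD_LOGIC_VECTOR (12 downto 0);

din5 : in STD_LOGIC_VECTOR (12 downto 0);

din6 : in STD_LOGIC_VECTOR (12 downto 0);

din7 : in STD_LOGIC_VECTOR (12 downto 0);

din8 : in STD_LOGIC_VECTOR (12 downto 0);
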
sel : in STD_LOGIC_vector(2 downto 0);

dout : out STD_LOGIC_vector(12 downto 0));

end mux;

architecture Behavioral of mux is

begin

process(din1,din2,din3,din4,din5,din6,din7,din8,sel)

begin

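if(sel = "000") then

dout <= din1;

elsif(sel = "001") then

dout <= din2;

elsif(sel = "010") then

dout <= din3;

elsif(sel = "011") then

dout <= din4;

elsif(sel = "100") then

dout <= din5;

elsif(sel = "101") then

dout <= din6;

elsif(sel = "110") then

dout <= din7;

else

dout <= din8;

end if;

end process;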

end Behavioral;


Inputs: din1, din2, din3, din4, din5, din6, din7, din8, sel

Outputs: dout

The mux used here is an 8:1 mux. All 8 inputs are 13 bits wide (12 downto 0). The output is chosen by the select input: the values 000 to 111 select din1 to din8 respectively. The output is also 13 bits wide.
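The selection rule described above can be checked in simulation with a small self-checking testbench. The sketch below is illustrative only: mux8_sketch is a hypothetical stand-in for the report's mux entity, written with a selected signal assignment instead of the if/elsif chain, and the testbench name and test values are assumptions made for the example.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Hypothetical 8:1 mux with the same port list as the entity in the listing.
entity mux8_sketch is
  port ( din1, din2, din3, din4 : in  std_logic_vector(12 downto 0);
         din5, din6, din7, din8 : in  std_logic_vector(12 downto 0);
         sel                    : in  std_logic_vector(2 downto 0);
         dout                   : out std_logic_vector(12 downto 0) );
end mux8_sketch;

architecture rtl of mux8_sketch is
begin
  -- "000" selects din1, ..., "111" (and any other value) selects din8.
  with sel select
    dout <= din1 when "000",
            din2 when "001",
            din3 when "010",
            din4 when "011",
            din5 when "100",
            din6 when "101",
            din7 when "110",
            din8 when others;
end rtl;

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity mux8_sketch_tb is
end mux8_sketch_tb;

architecture sim of mux8_sketch_tb is
  type word_array is array (0 to 7) of std_logic_vector(12 downto 0);
  signal d    : word_array;
  signal sel  : std_logic_vector(2 downto 0) := "000";
  signal dout : std_logic_vector(12 downto 0);
begin
  -- Give every input a distinct value: input i carries the number i + 1.
  gen : for i in 0 to 7 generate
    d(i) <= std_logic_vector(to_unsigned(i + 1, 13));
  end generate;

  dut : entity work.mux8_sketch
    port map (d(0), d(1), d(2), d(3), d(4), d(5), d(6), d(7), sel, dout);

  stim : process
  begin
    -- Step through all eight select codes and compare against the expected input.
    for i in 0 to 7 loop
      sel <= std_logic_vector(to_unsigned(i, 3));
      wait for 10 ns;
      assert dout = d(i)
        report "mux output does not match the selected input"
        severity error;
    end loop;
    wait;  -- stop the stimulus process
  end process;
end sim;
```

A run that finishes without assertion messages indicates that each select code routes the matching input to dout.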

Summer:

library IEEE;

use IEEE.STD_LOGIC_1164.ALL;

entity sum_out is

Port ( dout1 : in STD_LOGIC_VECTOR (12 downto 0);

dout2 : in STD_LOGIC_VECTOR (12 downto 0);

dout3 : in STD_LOGIC_VECTOR (12 downto 0);

sum : out STD_LOGIC_VECTOR (14 downto 0));

end sum_out;

architecture Behavioral of sum_out is

begin

sum



Format Converter:

library IEEE;

use IEEE.STD_LOGIC_1164.ALL;

entity format_converter is

Port ( clk : in STD_LOGIC;

sum : in STD_LOGIC_VECTOR (14 downto 0);

sel : in std_logic;

dfs_out : out STD_LOGIC_VECTOR (14 downto 0));

end format_converter;

architecture Behavioral of format_converter is

begin

process(sum,sel,clk)

begin

if(clk'event and clk = '1') then

if(sel = '1') then

dfs_out



else

dfs_out



Complete Architecture:

library IEEE;

use IEEE.STD_LOGIC_1164.ALL;

entity dfs_arch is

Port ( clk,rst : in std_logic;

freq_offset : in STD_LOGIC_VECTOR (5 downto 0);

dfs_out : out STD_LOGIC_VECTOR (14 downto 0));

end dfs_arch;

architecture Behavioral of dfs_arch is

component phase1 is

Port ( clk : in STD_LOGIC;

rst : in STD_LOGIC;

freq_offset : in STD_LOGIC_VECTOR (5 downto 0);

comp_out : out STD_LOGIC_VECTOR (9 downto 0);

phase_out : out std_logic_vector(19 downto 0));

end component;

component mux_tree is

Port ( din : in STD_LOGIC_VECTOR (12 downto 0);

sel : in STD_LOGIC_VECTOR (2 downto 0);

dout1 : out STD_LOGIC_VECTOR (12 downto 0);

dout2 : out STD_LOGIC_VECTOR (12 downto 0);

dout3 : out STD_LOGIC_VECTOR (12 downto 0));

end component;

component sum_out is

Port ( dout1 : in STD_LOGIC_VECTOR (12 downto 0);

dout2 : in STD_LOGIC_VECTOR (12 downto 0);

dout3 : in STD_LOGIC_VECTOR (12 downto 0);

sum : out STD_LOGIC_VECTOR (14 downto 0));

end component;

component format_converter is

Port ( clk : in STD_LOGIC;

sum : in STD_LOGIC_VECTOR (14 downto 0);

sel : in std_logic;

dfs_out : out STD_LOGIC_VECTOR (14 downto 0));

end component;

signal phase_out : std_logic_vector(9 downto 0);

signal full_phase : std_logic_vector(19 downto 0);

signal mux_in : std_logic_vector(12 downto 0);

signal dout1,dout2,dout3 : std_logic_vector(12 downto 0);

signal sum : std_logic_vector(14 downto 0);

signal sel_format : std_logic;

begin

sel_format



U4 : format_converter port map(clk,sum,sel_format,dfs_out);

end Behavioral;


Inputs: freq_offset, clk, rst

Outputs: dfs_out

The RTL schematic is shown in the figure below. All the blocks explained above are combined into the dfs architecture by port mapping.

The data input is that of the phase accumulator, a 6-bit binary word. The output is a 15-bit binary stream whose digital values make up the sine wave once converted to analog by a DAC.
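The link between the 6-bit input word and the frequency of the generated sine wave follows the standard DDFS tuning relation. The worked example below is a sketch under stated assumptions: the accumulator is taken to be 20 bits wide (suggested by the 20-bit dout signal in the listings), freq_offset (written FCW) is assumed to be added at the least significant end on every clock, and the symbols N, f_clk and Δf are introduced here for illustration.

```latex
\[
f_{\text{out}} = \frac{\text{FCW}}{2^{N}}\, f_{\text{clk}}, \qquad
\Delta f = \frac{f_{\text{clk}}}{2^{N}}, \qquad
N = 20, \quad 0 \le \text{FCW} \le 2^{6} - 1 = 63
\]
\[
f_{\text{out,max}} = \frac{63}{2^{20}}\, f_{\text{clk}} \approx 6.0 \times 10^{-5}\, f_{\text{clk}}
\]
```

Under these assumptions the frequency resolution is set by the accumulator width, while the width of the input word only limits the highest frequency that can be requested.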
