2010 International Conference on Advances in Recent Technologies in Communication and Computing. 978-0-7695-4201-0/10 $26.00 © 2010 IEEE. DOI 10.1109/ARTCom.2010.94

[IEEE 2010 International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom 2010) - Kottayam (2010.10.16-2010.10.17)] 2010 International Conference


VLSI Architecture Design and Implementation for Application Specific CORDIC Processor

Amritakar Mandal^1, K. C Tyagi^2, Brajesh Kumar Kaushik^3

^1 Electronic Engineering and Installation Unit, New Delhi, INDIA
^2 Department of ECE, Dev Bhoomi Institute of Technology, Dehradun, INDIA
^3 Department of Electronics and Computer Engineering, Indian Institute of Technology-Roorkee, INDIA

Email: [[email protected]; [email protected]; [email protected]]

Abstract: The COordinate Rotation DIgital Computer (CORDIC) algorithm has become a widely researched topic in the field of vector-rotation-based Digital Signal Processing (DSP) applications due to its simplicity. In this paper, we present the design of a pipelined architecture for the computation of sine and cosine values based on an application specific CORDIC processor. The CORDIC design in the circular rotation mode gives a high system throughput because its pipelined architecture reduces the latency of each individual pipeline stage. Saving area on the silicon substrate is essential to the design of a pipelined CORDIC, and this can be achieved by optimizing the number of micro-rotations. The computed quantization error is also minimized by using the required number of iterations. The pipelined architecture can be easily integrated in VLSI technology due to its regularity and modularity.

Keywords: CORDIC, Digital Signal Processing, Pipelined Architecture, micro-rotation, Quantization error.

I. INTRODUCTION

The CORDIC algorithm was developed by Jack E. Volder in 1959 [4]. It uses only simple shift-and-add operations to calculate trigonometric functions such as sine and cosine, as well as magnitude and phase, with great precision [1-3]. The same functions could be implemented using multipliers, variable shift registers or Multiply Accumulator (MAC) units. However, saving silicon area on a chip is a primary criterion in VLSI technology, which is why CORDIC-based hardware is better suited than a MAC- or multiplier-based system.

In digital communication, numerous matrix-based adaptive signal processing algorithms require the solution of matrix equations for the computation of eigenvalues, eigenvectors or singular values. All these functions can be implemented in digital hardware using processing elements that perform vector rotations. CORDIC offers the opportunity to calculate all the desired functions in a simple and efficient way [7]. Owing to the simplicity of the operations involved, the CORDIC algorithm is very well suited to VLSI hardware design and implementation. The pipelined CORDIC unit is coded in Verilog HDL, and simulation of the architecture for sine and cosine is shown.

II. CORDIC ALGORITHM

CORDIC is an acronym for COordinate Rotation DIgital Computer. It is a class of shift-add algorithms for rotating vectors in a plane, usually used in DSP applications for the calculation of trigonometric functions, multiplication, division and conversion between binary and mixed radix number systems. Jack E. Volder's CORDIC algorithm [4] is derived from the general equations for vector rotation. The principle of CORDIC computation is to decompose the desired rotation angle into a weighted sum of a set of predefined elementary rotation angles, each of which can be accomplished with simple shift-add operations. For a desired rotation angle $\theta$ and $M$ iterations on an input vector $(x, y)^T$, with initial conditions $x_0 = x$, $y_0 = y$ and $z_0 = \theta$, this can be represented as

$$z_f = \theta - \sum_{i=0}^{M-1} \delta_i \alpha_i .$$

If $z_f = 0$ holds, then

$$\theta = \sum_{i=0}^{M-1} \delta_i \alpha_i ,$$

i.e. the total accumulated rotation angle is equal to $\theta$. The $\delta_i$, $0 \le i \le M-1$, denote a sequence of ±1s that determine the direction of each elementary rotation. When $M$ is the total number of elementary rotation angles, the $i$-th angle $\alpha_i$ is given by:

$$\alpha_{m,i} = \frac{1}{\sqrt{m}} \tan^{-1}\!\left[\sqrt{m}\, 2^{-s(m,i)}\right] = \begin{cases} 2^{-s(0,i)}, & m = 0 \\ \tan^{-1} 2^{-s(1,i)}, & m = 1 \\ \tanh^{-1} 2^{-s(-1,i)}, & m = -1 \end{cases}$$

where $s(m,i)$ is the shift sequence and $m = 0$, $1$ and $-1$ correspond to the rotation operation in a linear, circular and hyperbolic coordinate system respectively. For a given value of $\theta$, the CORDIC iteration is given by:

$$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \begin{bmatrix} 1 & -\delta_i 2^{-i} \\ \delta_i 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} \quad \text{and} \quad z_{i+1} = z_i - \delta_i \alpha_i ,$$

where $\alpha_i = \tan^{-1} 2^{-i}$. For a counterclockwise rotation of a vector, the recursive update equations in sine and cosine form can be written as:

$$x_{i+1} = x_i \cos(\delta_i \alpha_i) - y_i \sin(\delta_i \alpha_i)$$

$$y_{i+1} = y_i \cos(\delta_i \alpha_i) + x_i \sin(\delta_i \alpha_i) .$$

The above equations can be simplified and written as

$$x_{i+1} = \cos(\delta_i \alpha_i)\,\bigl(x_i - y_i \tan(\delta_i \alpha_i)\bigr)$$

$$y_{i+1} = \cos(\delta_i \alpha_i)\,\bigl(y_i + x_i \tan(\delta_i \alpha_i)\bigr) .$$

Here, $\tan(\delta_i \alpha_i)$ is restricted to $\pm 2^{-i}$, so the multiplication is converted into an arithmetic right shift. Since cosine is an even function, $\cos(\alpha) = \cos(-\alpha)$, and the iterative equations reduce to

$$x_{i+1} = K_i\,(x_i - y_i\,\delta_i\,2^{-i}) \quad \text{and} \quad y_{i+1} = K_i\,(y_i + x_i\,\delta_i\,2^{-i}) ,$$

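where $K_i = \cos(\arctan 2^{-i}) = 1/\sqrt{1 + 2^{-2i}}$ is the scaling factor of each iteration. When only the shift-add part of each iteration is performed, every step lengthens the vector by $1/K_i$. If $M$ iterations are performed, the scale factor $K$ is the product of these per-iteration gains:

$$K = \prod_{i=0}^{M-1} \frac{1}{K_i} = \prod_{i=0}^{M-1} \sqrt{1 + 2^{-2i}} .$$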
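As a numerical illustration, the following minimal Python sketch evaluates the scale factor for a few iteration counts. It assumes that $K$ is the accumulated gain of the unscaled shift-add iterations, $\prod_i \sqrt{1 + 2^{-2i}}$; the function name `cordic_gain` is hypothetical and not part of the original design.

```python
import math


def cordic_gain(num_iterations: int) -> float:
    """Accumulated gain of M unscaled circular micro-rotations (i = 0 .. M-1).

    Assumption: each shift-add step lengthens the vector by sqrt(1 + 2**(-2i)).
    """
    gain = 1.0
    for i in range(num_iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


if __name__ == "__main__":
    for m in (4, 8, 16):
        k = cordic_gain(m)
        # 1/K is the value that can be preloaded into x0 to cancel the gain.
        print(f"M={m:2d}  K={k:.6f}  1/K={1.0 / k:.6f}")
```

The gain converges quickly as M grows, which is why a single constant can be preloaded for a fixed number of stages.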

The elementary functions sine and cosine can be computed using the rotation mode of the CORDIC algorithm if the initial vector starts on the x-axis at $(x_0, 0)$. For the input values $x_0 = 1$, $y_0 = 0$ and $z_0 = \theta$, the final outputs of the CORDIC are as follows:

$$x_f = K\cos\theta, \quad y_f = K\sin\theta \quad \text{and} \quad z_f = 0 .$$

Since the scale factor is constant for a given number of iterations, $x_0 = 1/K$ can be set to obtain $\sin\theta$ and $\cos\theta$ directly.
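The rotation-mode computation described above can be sketched in floating-point Python as follows. This is only an illustrative model under stated assumptions, not the paper's Verilog design: the function name `cordic_sin_cos` and the default of 16 iterations are hypothetical, and the input angle is assumed to lie within the convergence range of the circular mode (roughly ±99.9°).

```python
import math


def cordic_sin_cos(theta: float, num_iterations: int = 16):
    """Return (sin(theta), cos(theta)) using rotation-mode CORDIC.

    theta is in radians; the iteration count is an assumed parameter.
    """
    # Gain of the unscaled iterations; preloading x0 = 1/K cancels it.
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(num_iterations))
    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(num_iterations):
        delta = 1.0 if z >= 0.0 else -1.0      # direction of this micro-rotation
        shift = 2.0 ** -i                      # a right shift by i bits in hardware
        x, y = x - delta * y * shift, y + delta * x * shift
        z -= delta * math.atan(shift)          # drive the residual angle to zero
    return y, x


if __name__ == "__main__":
    s, c = cordic_sin_cos(math.radians(30.0))
    print(f"sin = {s:.5f}, cos = {c:.5f}")
```

The tuple assignment updates x and y simultaneously, mirroring the fact that both adders of a stage read the values latched by the previous stage.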

III. PIPELINED ARCHITECTURE OF CORDIC

In this CORDIC architecture, a number of rotational modules are incorporated, each responsible for one elementary rotation. The modules are cascaded through intermediate latches (Fig. 3). Every stage within the pipelined CORDIC architecture uses only adder/subtractors. The shift operations are hardwired using permanent oblique bus connections to perform multiplications by $2^{-i}$, which saves the large silicon area that barrel shifters would require. The pre-computed values of the $i$-th iteration angle $\alpha_i$ required at each module, as given in Table I, can be stored in memory. The delay can be adjusted by using a proper bit-length in the shift register. Since no sign detection is needed to force $z_f = 0$, carry-save adders are well suited to this architecture, and their use reduces the stage delay significantly. With the pipelined architecture, the propagation delay of the multiplier is reduced to the delay of a single adder, so the throughput of the architecture increases manyfold, as it is given by 1/(delay of a single adder). If an iterative implementation of the CORDIC were used, the processor would take several clock cycles to produce the output for a given input. In the pipelined architecture, each pipeline stage takes exactly one clock cycle, so once the pipeline is full a new output is produced every clock cycle.
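To make the latency and throughput behaviour concrete, here is a small cycle-by-cycle Python model of such a pipeline. It is a sketch under assumptions, not a description of the actual Verilog code: the word length (`FRAC_BITS = 16`), the stage count (`NUM_STAGES = 12`) and all names are hypothetical, and plain integer add/subtract stands in for the carry-save adders.

```python
import math

FRAC_BITS = 16                 # assumed fractional word length
NUM_STAGES = 12                # assumed number of micro-rotation stages
SCALE = 1 << FRAC_BITS

# Pre-computed angle of each stage, quantized to the assumed word length.
ANGLES = [round(math.atan(2.0 ** -i) * SCALE) for i in range(NUM_STAGES)]
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(NUM_STAGES))
X0 = round(SCALE / GAIN)       # preloaded 1/K so the outputs need no post-scaling


def stage(i, x, y, z):
    """One stage: two hardwired right shifts and three add/subtract operations."""
    if z >= 0:
        return x - (y >> i), y + (x >> i), z - ANGLES[i]
    return x + (y >> i), y - (x >> i), z + ANGLES[i]


def run_pipeline(thetas):
    """Clock the pipeline; return (cycle, sin, cos) for every input angle (radians)."""
    regs = [None] * NUM_STAGES                      # latch at the output of each stage
    results = []
    stream = list(thetas) + [None] * NUM_STAGES     # extra cycles flush the pipeline
    for cycle, theta in enumerate(stream):
        # Update from the last latch backwards so each stage sees last cycle's data.
        for i in range(NUM_STAGES - 1, 0, -1):
            regs[i] = stage(i, *regs[i - 1]) if regs[i - 1] is not None else None
        regs[0] = stage(0, X0, 0, round(theta * SCALE)) if theta is not None else None
        if regs[-1] is not None:
            x, y, _ = regs[-1]
            results.append((cycle, y / SCALE, x / SCALE))
    return results


if __name__ == "__main__":
    for cycle, s, c in run_pipeline([0.0, math.pi / 6, -math.pi / 4]):
        print(f"cycle {cycle}: sin={s:+.4f} cos={c:+.4f}")
```

In this model the first result appears after `NUM_STAGES - 1` further clock cycles, and each later input emerges exactly one cycle after its predecessor.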

The most recurrent problem in a CORDIC implementation is overflow. Since the first tangent value is $2^0 = 1$, the rotation range is $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$. The difference in binary representation between these two angles is one bit. Overflow arises when a rotation angle crosses from a positive right angle to a negative one. To avoid overflow, an overflow control is added. It checks the signs of the operands involved in the addition or subtraction and the sign of the result. If overflow is produced, the result keeps its last sign without affecting the final result. In the overflow control, the sign of $z_i$ determines whether addition or subtraction is to be performed.
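The sign check described above can be illustrated with a short Python sketch of two's-complement addition. The text does not specify the exact circuit, so this is one possible reading and the details are assumptions: a 16-bit word (`WIDTH`), and saturation to the largest value of the operands' sign as the way the result "keeps its sign". The names `wrap` and `checked_add` are hypothetical.

```python
WIDTH = 16                                  # assumed word length in bits
MAX_VAL = (1 << (WIDTH - 1)) - 1
MIN_VAL = -(1 << (WIDTH - 1))


def wrap(value: int) -> int:
    """Interpret the low WIDTH bits of value as a two's-complement number."""
    value &= (1 << WIDTH) - 1
    return value - (1 << WIDTH) if value > MAX_VAL else value


def checked_add(a: int, b: int):
    """Add two WIDTH-bit values and return (result, overflow_flag).

    Overflow can only occur when both operands have the same sign and the
    wrapped result has the opposite sign. In that case the result is
    saturated so that it keeps the operands' sign (an assumed policy).
    """
    raw = wrap(a + b)
    same_sign_operands = (a < 0) == (b < 0)
    overflow = same_sign_operands and ((raw < 0) != (a < 0))
    if overflow:
        return (MIN_VAL if a < 0 else MAX_VAL), True
    return raw, False


if __name__ == "__main__":
    print(checked_add(30000, 10000))    # positive operands that exceed the range
    print(checked_add(100, -50))        # mixed signs can never overflow
```

A subtraction can be checked the same way by negating the second operand before the call.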

IV. ERRORS IN CORDIC ARCHITECTURE AND THEIR OPTIMISATION

Theoretically, a CORDIC realization with an infinite number of iterations yields an exact result. In practice, a CORDIC realization uses a finite number of iterations, which causes approximation error. Angle errors and finite word length errors are of this kind.

A. Angle Errors

The shift sequence $\{s(m,i);\ 0 \le i \le M-1\}$ determines the convergence of the CORDIC iteration as well as the magnitude of the scale factor. The $M$ finite elementary rotation angles $\alpha_{m,i}$ will never be sufficient to accomplish exactly $z_f = 0$, i.e. it is not possible to represent an arbitrary rotation angle $\theta$ without error. Take as an example the pseudo-rotation for the angle $\theta = 30^\circ$ (all values in degrees):

$$30.0 \approx 45.0 - 26.6 + 14.0 - 7.1 + 3.6 + 1.8 - 0.9 + 0.4 - 0.2 + 0.1 = 30.1$$

So the angle approximation error can be defined as:

$$\varepsilon = \theta - \sum_{i=0}^{M-1} \delta_i \alpha_{m,i} ,$$

where $\varepsilon$ is the residual angle still to be rotated after completion of the CORDIC iterations. For any given rotation, the angle approximation error is bounded by the last elementary angle: $|\varepsilon| \le \alpha_{m,M-1}$.
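The residual angle can be checked numerically with the following Python sketch for the circular mode. It uses unrounded elementary angles $\arctan(2^{-i})$, so its intermediate values may differ slightly from a hand calculation with rounded angles; the function name `residual_angle_deg` is hypothetical.

```python
import math


def residual_angle_deg(theta_deg: float, num_iterations: int):
    """Return (epsilon, deltas): residual angle in degrees and rotation directions."""
    z = theta_deg
    deltas = []
    for i in range(num_iterations):
        alpha = math.degrees(math.atan(2.0 ** -i))   # i-th elementary angle
        delta = 1 if z >= 0 else -1                  # rotate towards zero residual
        deltas.append(delta)
        z -= delta * alpha
    return z, deltas


if __name__ == "__main__":
    for m in (8, 10, 16):
        eps, _ = residual_angle_deg(30.0, m)
        bound = math.degrees(math.atan(2.0 ** -(m - 1)))
        print(f"M={m:2d}  residual={eps:+.5f} deg  last angle={bound:.5f} deg")
```

Increasing M shrinks the last elementary angle and therefore the bound on the residual, which is the trade-off behind adjusting the number of micro-rotations.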

B. Truncation Errors

The truncation error is due to the finite word length effect. If the internal word length of the CORDIC has a finite number of bits in the fractional part, then the quantization error [1], including the scaling error, can be shown by plotting the number of bits (b) against the number of iterations (M), as shown in Fig. 1. When the total quantization error (e) is simulated in MATLAB, it shows that the optimal number of fractional bits of the internal word length required to keep the latency minimum is 17 (Fig. 2). Fig. 4 shows the simulation results obtained. The post-scaling operation increases the dynamic range of the signal in the CORDIC block by the gain factor of the CORDIC.
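The effect of the fractional word length can be explored with a small fixed-point experiment. The Python sketch below is not the MATLAB simulation referred to in the text and does not reproduce its figures; it is an assumed model in which shifts truncate, angles are rounded to b fractional bits, and the error is measured as the worst deviation of the outputs from the true sine and cosine over the range $[-\pi/2, \pi/2]$. All names and the sample count are hypothetical.

```python
import math


def fixed_point_sin_cos(theta: float, frac_bits: int, num_iterations: int):
    """Rotation-mode CORDIC on integers with frac_bits fractional bits."""
    scale = 1 << frac_bits
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(num_iterations))
    x, y, z = round(scale / gain), 0, round(theta * scale)
    for i in range(num_iterations):
        alpha = round(math.atan(2.0 ** -i) * scale)   # quantized elementary angle
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - alpha
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + alpha
    return y / scale, x / scale


def max_error(frac_bits: int, num_iterations: int, samples: int = 181) -> float:
    """Worst absolute error of sin and cos over samples angles in [-pi/2, pi/2]."""
    worst = 0.0
    for k in range(samples):
        theta = -math.pi / 2 + math.pi * k / (samples - 1)
        s, c = fixed_point_sin_cos(theta, frac_bits, num_iterations)
        worst = max(worst, abs(s - math.sin(theta)), abs(c - math.cos(theta)))
    return worst


if __name__ == "__main__":
    for b in range(12, 21, 2):
        print(f"b={b:2d}  max error={max_error(b, 16):.2e}")
```

Sweeping both b and the iteration count in this way shows where adding fractional bits stops helping because the angle approximation error dominates.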

Fig. 1. Computation of number of fractional bits vs. iterations

[Plot not reproduced. Axis labels: values of b, values of m; tick ranges 16 to 22 and 16 to 16.25.]


Fig. 2. Plot between M Vs e

Table I. Pre-Computed Angles

V. CONCLUSION

The CORDIC architecture is efficiently coded using Verilog HDL. The architecture is pipelined to have an internal critical path of a single adder. To minimize the angle approximation error, the number of micro-rotations has been adjusted. To reduce the total quantization error, including the scale factor error, the pipelined CORDIC architecture has been optimized. As a result, a high throughput is maintained. The inherent overflow issue of CORDIC has been resolved. The architecture can be used as a digital sine and cosine generator in various digital signal processing applications.

Fig. 3. Pipelined CORDIC Architecture

Fig. 4. Simulation Result

REFERENCES

[1] Y. H. Hu, "The quantization effects of the CORDIC algorithm," IEEE Trans. Signal Processing, vol. 40, no. 4, pp. 834-844, Apr. 1992.

[2] Y. H. Hu, "CORDIC-Based VLSI Architectures for Digital Signal Processing," IEEE Signal Processing Magazine, vol. 9, no. 3, pp. 16-35, 1992.

[3] N. Takagi, T. Asada and S. Yajima, "Redundant CORDIC Methods with a Constant Scale Factor for Sine and Cosine Computation," IEEE Trans. on Computers, vol. C-40, no. 9, pp. 989-995, 1991.

[4] J. E. Volder, "The CORDIC Trigonometric Computing Technique," IRE Transactions on Electronic Computers, vol. EC-8, pp. 330-334, Sept. 1959.

[5] J. S. Walther, "A Unified Algorithm for Elementary Functions," Proc. Spring Joint Computer Conference, pp. 379-385, 1971.

[6] R. A. Andraka, "Survey of CORDIC algorithms for FPGA based computers," Proceedings of the 1998 ACM/SIGDA Sixth International Symposium on FPGAs, pp. 191-200, Monterey, California, Feb. 22-24, 1998.

[7] M. Chakraborty, A. S. Dhar and Moon Ho Lee, "A Trigonometric formulation of the LMS algorithm for realisation of pipelined CORDIC," IEEE Trans. Circuits and Systems, vol. 52, no. 9, pp. 530-534, Sep. 2005.

[Fig. 2 plot not reproduced. Panel title: "Plot between M & e"; x-axis: values of M; y-axis: values of e; both axes ticked from 0 to 20.]

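Table I. Pre-Computed Angles

| i | $2^{-i} = \tan\alpha_i$ | $\alpha_i = \arctan(2^{-i})$ (degrees) | $\alpha_i$ (radians) |
|---|---|---|---|
| 0 | 1 | 45° | 0.7854 |
| 1 | 0.5 | 26.565° | 0.4636 |
| 2 | 0.25 | 14.036° | 0.2450 |
| 3 | 0.125 | 7.125° | 0.1244 |
| 4 | 0.0625 | 3.576° | 0.0624 |
| 5 | 0.03125 | 1.7899° | 0.0312 |
| 6 | 0.015625 | 0.8952° | 0.0156 |
| 7 | 0.0078125 | 0.4476° | 0.0078 |
| ... | ... | ... | ... |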
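The entries of Table I follow directly from $\alpha_i = \arctan(2^{-i})$ and can be regenerated with a few lines of Python. This is only a convenience sketch; the function name `angle_table` and the default of eight rows are assumptions made for illustration.

```python
import math


def angle_table(num_rows: int = 8):
    """Return rows (i, 2**-i, alpha_i in degrees, alpha_i in radians)."""
    rows = []
    for i in range(num_rows):
        tangent = 2.0 ** -i
        alpha = math.atan(tangent)
        rows.append((i, tangent, math.degrees(alpha), alpha))
    return rows


if __name__ == "__main__":
    print(" i   2^-i        degrees    radians")
    for i, tangent, deg, rad in angle_table():
        print(f"{i:2d}   {tangent:<10.7f}  {deg:8.4f}   {rad:.4f}")
```

In a hardware design these constants would be quantized to the chosen word length before being stored for each stage.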

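[Fig. 3 block diagram not reproduced. Recoverable labels: inputs $x_0$, $y_0$ and $z_0 = \theta$; cascaded add/subtract (+/-) units separated by latches (D); hardwired shifts $2^{-0}$, $2^{-1}$, ..., $2^{-n+1}$; pre-computed angles $\alpha_0$, $\alpha_1$, ..., $\alpha_{n-1}$; outputs $x_f$, $y_f$ and $z_f$.]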
