VLSI Architecture Design and Implementation for Application Specific CORDIC Processor
Amritakar Mandal (1), K. C. Tyagi (2), Brajesh Kumar Kaushik (3)
(1) Electronic Engineering and Installation Unit, New Delhi, INDIA
(2) Department of ECE, Dev Bhoomi Institute of Technology, Dehradun, INDIA
(3) Department of Electronics and Computer Engineering, Indian Institute of Technology-Roorkee, INDIA
Email: [[email protected]; [email protected]; [email protected]]
Abstract—The COordinate Rotation DIgital Computer (CORDIC) algorithm has become a widely researched topic in vector-rotation-based Digital Signal Processing (DSP) applications due to its simplicity. In this paper, we present the design of a pipelined architecture for the computation of sine and cosine values based on an application specific CORDIC processor. The CORDIC design in the circular rotation mode gives a high system throughput because pipelining reduces the latency of each individual stage. Saving area on the silicon substrate is essential to the design of a pipelined CORDIC, and this is achieved by optimizing the number of micro-rotations. The computed quantization error is also minimized by using the required number of iterations. The pipelined architecture can be easily integrated in VLSI technology due to its regularity and modularity.
Keywords— CORDIC, Digital Signal Processing, Pipelined Architecture, micro-rotation, Quantization error.
I. INTRODUCTION
The CORDIC algorithm was developed by Jack E. Volder in 1959 [4]. It uses simple shift-and-add operations to calculate trigonometric functions such as sine and cosine, as well as magnitude and phase, with great precision [1-3]. The same functions could be implemented using multipliers, variable shift registers or Multiply Accumulator (MAC) units, but saving silicon area on a chip is a primary criterion in VLSI technology. That is why CORDIC based hardware is better suited than MAC or multiplier based systems. In digital communication, numerous matrix based adaptive signal processing algorithms require the solution of matrix equations for the computation of eigenvalues, eigenvectors or singular values. All these functions can be implemented in digital hardware using processing elements that perform vector rotations. CORDIC offers the opportunity to calculate all the desired functions in a simple and efficient way [7]. Due to the simplicity of the operations involved, the CORDIC algorithm is very well suited to VLSI hardware design and implementation. The pipelined CORDIC unit is coded in Verilog HDL, and simulation of the architecture for sine and cosine is shown.
II. CORDIC ALGORITHM
CORDIC is an acronym for COordinate Rotation DIgital Computer. It is a class of shift-add algorithms for rotating vectors in a plane, usually used for the calculation of trigonometric functions, multiplication, division and conversion between binary and mixed radix number systems in DSP applications. Volder's CORDIC algorithm [4] is derived from the general equations for vector rotation. The idea of CORDIC computation is to decompose the desired rotation angle into a weighted sum of predefined elementary rotation angles, each of which can be accomplished with simple shift-add operations. For a desired rotation angle $\theta$ applied over $M$ iterations to an input vector $(x, y)^T$, with initial conditions $x_0 = x$, $y_0 = y$ and $z_0 = \theta$, the final residual angle is

$$z_f = \theta - \sum_{i=0}^{M-1} \delta_i \alpha_i .$$

If $z_f = 0$ holds, then

$$\theta = \sum_{i=0}^{M-1} \delta_i \alpha_i ,$$

i.e. the total accumulated rotation angle is equal to $\theta$. The $\delta_i$, $0 \le i \le M-1$, denote a sequence of ±1s that determine the direction of each elementary rotation. With $M$ the total number of elementary rotation angles, the $i$-th angle $\alpha_i$ is given by:

$$\alpha_{m,i} = \frac{1}{\sqrt{m}} \tan^{-1}\left[\sqrt{m}\, 2^{-s_{m,i}}\right] = \begin{cases} 2^{-s_{0,i}}, & m = 0, \\ \tan^{-1} 2^{-s_{1,i}}, & m = 1, \\ \tanh^{-1} 2^{-s_{-1,i}}, & m = -1, \end{cases}$$

where $m = 0$, $1$ and $-1$ correspond to the rotation operation in a linear, circular and hyperbolic coordinate system respectively, and $s_{m,i}$ is the shift sequence. For a given value of $\theta$, the CORDIC iteration is given by:
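$$\begin{bmatrix} x_{i+1} \\ y_{i+1} \end{bmatrix} = \begin{bmatrix} 1 & -\delta_i 2^{-i} \\ \delta_i 2^{-i} & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} \quad\text{and}\quad z_{i+1} = z_i - \delta_i \alpha_i ,$$

where $\alpha_i = \tan^{-1} 2^{-i}$. For a counter-clockwise rotation of a vector, the recursive update equations can be written in sine and cosine form as:

$$x_{i+1} = x_i \cos(\delta_i \alpha_i) - y_i \sin(\delta_i \alpha_i)$$
$$y_{i+1} = y_i \cos(\delta_i \alpha_i) + x_i \sin(\delta_i \alpha_i) .$$

The above equations can be simplified and written as

$$x_{i+1} = \cos(\delta_i \alpha_i)\left(x_i - y_i \tan(\delta_i \alpha_i)\right)$$
$$y_{i+1} = \cos(\delta_i \alpha_i)\left(y_i + x_i \tan(\delta_i \alpha_i)\right) .$$

Here, $\tan(\delta_i \alpha_i)$ is restricted to $\pm 2^{-i}$.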
2010 International Conference on Advances in Recent Technologies in Communication and Computing
978-0-7695-4201-0/10 $26.00 © 2010 IEEE
DOI 10.1109/ARTCom.2010.94
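Because $\tan(\delta_i \alpha_i) = \pm 2^{-i}$, the multiplication is converted into an arithmetic right shift. Since cosine is an even function, $\cos(-\alpha) = \cos(\alpha)$, and the iterative equations can be reduced to

$$x_{i+1} = K_i\left(x_i - y_i \delta_i 2^{-i}\right) \quad\text{and}\quad y_{i+1} = K_i\left(y_i + x_i \delta_i 2^{-i}\right),$$

where $K_i = \cos(\arctan 2^{-i}) = 1/\sqrt{1 + 2^{-2i}}$. The shift-add hardware leaves this factor out, so each iteration stretches the vector by $1/K_i = \sqrt{1 + 2^{-2i}}$, known as the gain factor for that iteration. If $M$ iterations are performed, the scale factor $K$ is the product of the per-iteration gain factors:

$$K = \prod_{i=0}^{M-1} \frac{1}{K_i} = \prod_{i=0}^{M-1} \sqrt{1 + 2^{-2i}} .$$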
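As a small numerical illustration of the scale factor, the sketch below evaluates the product of the per-iteration gains $\sqrt{1 + 2^{-2i}}$ for a few iteration counts. The function name `cordic_gain` is an assumption made for this example and does not come from the paper.

```python
import math

def cordic_gain(num_iterations):
    """Product of sqrt(1 + 2^(-2i)) for i = 0 .. num_iterations - 1."""
    gain = 1.0
    for i in range(num_iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain

if __name__ == "__main__":
    # The gain settles quickly, so it can be treated as a constant once M is fixed.
    for m in (1, 2, 4, 8, 16):
        k = cordic_gain(m)
        print(f"M = {m:2d}  K = {k:.6f}  1/K = {1.0 / k:.6f}")
```

Because the product depends only on the number of iterations, its reciprocal can be computed once and stored as a constant.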
The elementary functions sine and cosine can be computed using the rotation mode of the CORDIC algorithm if the initial vector starts on the x-axis with unit length, i.e. at $(1, 0)$. The final outputs of the CORDIC for the input values $x_0 = 1$, $y_0 = 0$ and $z_0 = \theta$ are as follows:

$$x_f = K\cos\theta, \quad y_f = K\sin\theta \quad\text{and}\quad z_f = 0 .$$

Since the scale factor is constant for a given number of rotations, $x_0 = 1/K$ can be set to obtain the pure $\sin\theta$ and $\cos\theta$ values.
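The rotation-mode computation described in this section can be sketched in a few lines of floating-point Python. This is only an illustrative model, not the Verilog implementation of the paper: the function name `cordic_sin_cos` and the default of 16 iterations are assumptions, and the multiplications by $2^{-i}$ stand in for the hardware shifts.

```python
import math

def cordic_sin_cos(theta, num_iterations=16):
    """Return (cos(theta), sin(theta)) for |theta| <= pi/2 via rotation-mode CORDIC."""
    # Elementary angles alpha_i = arctan(2^-i), i.e. the pre-computed angle table.
    alphas = [math.atan(2.0 ** -i) for i in range(num_iterations)]
    # Scale factor K; starting from x0 = 1/K cancels it in the outputs.
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(num_iterations))
    x, y, z = 1.0 / gain, 0.0, theta
    for i, alpha in enumerate(alphas):
        delta = 1.0 if z >= 0.0 else -1.0      # direction of this micro-rotation
        # Both updates use the old x and y (tuple assignment evaluates the right side first).
        x, y = x - delta * y * 2.0 ** -i, y + delta * x * 2.0 ** -i
        z -= delta * alpha                     # drive the residual angle towards zero
    return x, y

if __name__ == "__main__":
    for deg in (-60, 0, 30, 45, 90):
        c, s = cordic_sin_cos(math.radians(deg))
        print(f"{deg:4d} deg  cos = {c:+.5f}  sin = {s:+.5f}")
```

The sign of the residual angle $z_i$ alone selects the direction $\delta_i$, which is why each iteration needs only shifts and add/subtract operations.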
III. PIPELINED ARCHITECTURE OF CORDIC
In this CORDIC architecture, a number of rotational modules are incorporated, and each module is responsible for one elementary rotation. The modules are cascaded through intermediate latches (Fig. 3). Within each stage of the pipelined CORDIC architecture, only adder/subtractors are used. The shift operations are hardwired using permanent oblique bus connections to perform multiplications by $2^{-i}$, which saves the large silicon area that barrel shifters would require. The pre-computed values of the $i$-th iteration angle $\alpha_i$ required at each module, as given in Table I, can be stored at a memory location. The delay can be adjusted by using proper bit-length in the shift register. Since no sign detection is needed to force $z_f = 0$, carry save adders are well suited to this architecture. The use of these adders reduces the stage delay significantly. With the pipelined architecture, the critical path delay is the total delay of a single adder. So ultimately the throughput of the architecture increases manyfold, as the throughput is given by 1/(delay due to a single adder). If an iterative implementation of the CORDIC were used, the processor would take several clock cycles to give output for a given input. In the pipelined architecture, each pipeline stage takes exactly one clock cycle to pass one output.
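To make the latency and throughput behaviour of the cascaded stages concrete, here is a rough cycle-by-cycle software model of such a pipeline. It is a sketch under stated assumptions, not the paper's Verilog design: the word length `FRAC_BITS`, the stage count `NUM_STAGES` and all function names are chosen for illustration, and Python's `>>` on integers plays the role of the hardwired arithmetic shift.

```python
import math

FRAC_BITS = 16     # assumed fractional word length for this sketch
NUM_STAGES = 12    # assumed number of pipeline stages (micro-rotations)
SCALE = 1 << FRAC_BITS

# Pre-computed angle table in fixed point, one entry per stage.
ANGLES = [round(math.atan(2.0 ** -i) * SCALE) for i in range(NUM_STAGES)]
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(NUM_STAGES))
X0 = round(SCALE / GAIN)   # x0 = 1/K so the outputs come out unscaled

def stage(i, x, y, z):
    """One stage: two fixed shifts (plain wiring in hardware) and three add/subtracts."""
    if z >= 0:
        return x - (y >> i), y + (x >> i), z - ANGLES[i]
    return x + (y >> i), y - (x >> i), z + ANGLES[i]

def run_pipeline(thetas):
    """Clock the pipeline once per input; return (cycle, cos, sin) per result."""
    regs = [None] * NUM_STAGES          # regs[i] models the latch after stage i
    pending = list(thetas)
    results = []
    total_cycles = len(pending) + NUM_STAGES - 1
    for cycle in range(1, total_cycles + 1):
        # Update from the last latch backwards so each stage reads last cycle's value.
        for i in range(NUM_STAGES - 1, 0, -1):
            regs[i] = stage(i, *regs[i - 1]) if regs[i - 1] is not None else None
        if pending:
            regs[0] = stage(0, X0, 0, round(pending.pop(0) * SCALE))
        else:
            regs[0] = None
        if regs[-1] is not None:
            x, y, _ = regs[-1]
            results.append((cycle, x / SCALE, y / SCALE))
    return results

if __name__ == "__main__":
    for cycle, c, s in run_pipeline(math.radians(d) for d in (-60, -30, 0, 30, 60)):
        print(f"cycle {cycle:2d}: cos = {c:+.4f}  sin = {s:+.4f}")
```

In this model the first result appears after `NUM_STAGES` clock cycles, and one further result follows on every cycle after that, which is the throughput argument made above.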
The most recurrent problem in a CORDIC implementation is overflow. Since the first tangent value is $2^0 = 1$, the rotation range is $\left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$. The difference in binary representation between these two angles is one bit. Overflow arises when a rotation angle crosses from a positive right angle to a negative one. To avoid overflow, an overflow control is added. It checks the signs of the operands involved in an addition or subtraction and the sign of the result. If overflow is produced, the result keeps its last sign without affecting the final result. In the overflow control, the sign of $z_i$ determines whether addition or subtraction is to be performed.
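The sign check described above can be illustrated with a short two's complement model. The paper does not spell out how the result "keeps its last sign", so the clamping to the range limit used here is an assumed policy, and the name `add_with_overflow_control` and the default width are hypothetical.

```python
def add_with_overflow_control(a, b, width=16):
    """Add two signed `width`-bit integers; return (result, overflowed)."""
    lo = -(1 << (width - 1))
    hi = (1 << (width - 1)) - 1
    # Wrap the sum to `width` bits, as a plain hardware adder would.
    wrapped = (a + b - lo) % (1 << width) + lo
    # Overflow: both operands share a sign but the wrapped result has the other one.
    overflowed = (a < 0) == (b < 0) and (wrapped < 0) != (a < 0)
    if overflowed:
        # Assumed policy: keep the operands' sign by clamping to the range limit.
        wrapped = lo if a < 0 else hi
    return wrapped, overflowed

if __name__ == "__main__":
    print(add_with_overflow_control(100, 100, width=8))
    print(add_with_overflow_control(-100, -100, width=8))
    print(add_with_overflow_control(50, 20, width=8))
```

A subtraction can reuse the same check by negating the second operand before the call, provided the negated value is still representable in `width` bits.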
IV. ERRORS IN CORDIC ARCHITECTURE AND THEIR OPTIMISATION
Theoretically, a CORDIC realization with an infinite number of iterations leads to an exact result. In practice, a CORDIC realization uses a finite number of iterations, which causes approximation error. Angle errors and finite word length errors are of this kind.
A. Angle Errors
The shift sequence $(s_{m,i};\ 0 \le i \le M-1)$ determines the convergence of the CORDIC iteration as well as the magnitude of the scale factor. The $M$ finite elementary rotation angles $\alpha_{m,i}$ will never be sufficient to accomplish exactly $z_f = 0$, i.e. it is not possible to represent an arbitrary rotation angle $\theta$ without error. Take as an example the pseudo-rotation for the angle $\theta = 30^\circ$ (values in degrees):

$$30.0 \approx 45.0 - 26.6 + 14.0 - 7.1 + 3.6 + 1.8 - 0.9 + 0.4 - 0.2 + 0.1 = 30.1$$

The angle approximation error can therefore be defined as:

$$\varepsilon = \theta - \sum_{i=0}^{M-1} \delta_i \alpha_{m,i} ,$$

where $\varepsilon$ is the residual angle still to be rotated after completion of the CORDIC iterations. For any given rotation, the angle approximation error is bounded by the last elementary angle: $|\varepsilon| \le \alpha_{m,M-1}$.
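The residual angle can be checked numerically with a short decomposition routine for the circular case ($m = 1$, $s_{1,i} = i$). The function name `decompose_angle` is an assumption for this sketch, and the directions are chosen from the sign of the running residual, so individual signs can differ from a hand-worked example that uses rounded angles.

```python
import math

def decompose_angle(theta_deg, num_iterations):
    """Greedy decomposition of theta (degrees) into +/- arctan(2^-i) steps.

    Returns (deltas, alphas_deg, residual_deg).
    """
    alphas = [math.degrees(math.atan(2.0 ** -i)) for i in range(num_iterations)]
    residual = theta_deg
    deltas = []
    for alpha in alphas:
        delta = 1 if residual >= 0 else -1   # rotate towards zero residual
        deltas.append(delta)
        residual -= delta * alpha
    return deltas, alphas, residual

if __name__ == "__main__":
    deltas, alphas, eps = decompose_angle(30.0, 10)
    terms = " ".join(f"{'+' if d > 0 else '-'}{a:.3f}" for d, a in zip(deltas, alphas))
    print("30 deg ~", terms)
    print(f"residual = {eps:+.4f} deg, last elementary angle = {alphas[-1]:.4f} deg")
```

Adding iterations shrinks the last elementary angle and therefore the bound on the residual, at the cost of one more stage per iteration.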
B. Truncation Errors
The truncation error is due to the finite word length effect. If the internal word length of the CORDIC has a finite number of bits in the fractional part, then the quantization error [1], including the scaling error, can be shown by plotting the number of bits (b) against the number of iterations (M), as shown in Fig. 1. When the total quantization error (e) is simulated in MATLAB, it shows that the optimal number of fractional bits of the internal word length is 17 to keep the latency minimum (Fig. 2). Figure 4 shows the simulation results obtained. The post scaling operation will increase the dynamic range of the signal in the CORDIC block by the amount of the gain factor of CORDIC.
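The effect of the fractional word length can be explored with a small integer model. The sketch below is not the MATLAB simulation behind Fig. 1 and Fig. 2 and is not expected to reproduce those plots; the function names, the 5 degree test grid and the truncating shifts are all assumptions made for illustration.

```python
import math

def cordic_fixed(theta, num_iterations, frac_bits):
    """Rotation-mode CORDIC on integers with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    gain = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(num_iterations))
    x, y, z = round(scale / gain), 0, round(theta * scale)
    for i in range(num_iterations):
        alpha = round(math.atan(2.0 ** -i) * scale)   # quantized table entry
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - alpha
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + alpha
    return x / scale, y / scale

def max_error(num_iterations, frac_bits, step_deg=5):
    """Largest |error| in cos or sin over a grid of angles in [-90, 90] degrees."""
    worst = 0.0
    for deg in range(-90, 91, step_deg):
        t = math.radians(deg)
        c, s = cordic_fixed(t, num_iterations, frac_bits)
        worst = max(worst, abs(c - math.cos(t)), abs(s - math.sin(t)))
    return worst

if __name__ == "__main__":
    for b in (8, 12, 16, 17, 20):
        print(f"b = {b:2d}  max error = {max_error(16, b):.2e}")
```

Sweeping `frac_bits` for a fixed iteration count shows where the word length stops being the dominant error source and the angle approximation error takes over.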
Fig. 1. Computation of number of fractional bits Vs iterations
[Fig. 1 plot: axes labelled "Values of b" and "Values of m"; tick ranges 16 to 22 and 16 to 16.25]
Fig. 2. Plot between M Vs e
Table I. Pre-Computed Angles
V. CONCLUSION
The CORDIC architecture is efficiently coded using Verilog HDL. The architecture is pipelined to have an internal critical path of a single adder. To minimize the angle approximation error, the number of micro-rotations has been adjusted. To reduce the total quantization error, including the scale factor error, the pipelined CORDIC architecture has been optimized. As a result, a high throughput is maintained. The overflow problem inherent to CORDIC has been resolved. The architecture can be used as a digital sine and cosine generator in various digital signal processing applications.
Fig. 3. Pipelined CORDIC Architecture
Fig. 4. Simulation Result
REFERENCES
[1] Y. H. Hu, "The quantization effects of the CORDIC algorithm," IEEE Trans. Signal Processing, vol. 40, no. 4, pp. 834-844, Apr. 1992.
[2] Y. H. Hu, "CORDIC-Based VLSI Architectures for Digital Signal Processing," IEEE Signal Processing Magazine, vol. 9, no. 3, pp. 16-35, 1992.
[3] N. Takagi, T. Asada and S. Yajima, "Redundant CORDIC Methods with a Constant Scale Factor for Sine and Cosine Computation," IEEE Trans. on Computers, vol. C-40, no. 9, pp. 989-995, 1991.
[4] J. E. Volder, "The CORDIC Trigonometric Computing Technique," IRE Transactions on Electronic Computing, vol. EC-8, pp. 330-334, Sept. 1959.
[5] J. S. Walther, "A Unified Algorithm for Elementary Functions," Proc. Spring Joint Computers Conference, pp. 379-385, 1971.
[6] R. A. Andraka, "Survey of CORDIC algorithms for FPGA based computers," Proceedings of the 1998 ACM/SIGDA sixth international symposium on FPGAs, pp. 191-200, Monterey, California, Feb. 22-24, 1998.
[7] M. Chakraborty, A. S. Dhar and Moon Ho Lee, "A Trigonometric formulation of the LMS algorithm for realisation of pipelined CORDIC," IEEE Trans. Circuits and Systems, vol. 52, no. 9, pp. 530-534, Sep. 2005.
[Fig. 2 plot, titled "Plot between M & e": x-axis "Values of M" (0 to 20), y-axis "Values of e" (0 to 20)]
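| $i$ | $2^{-i} = \tan\alpha_i$ | $\alpha_i = \arctan(2^{-i})$ in degrees | $\alpha_i$ in radians |
|---|---|---|---|
| 0 | 1 | 45° | 0.7854 |
| 1 | 0.5 | 26.565° | 0.4636 |
| 2 | 0.25 | 14.036° | 0.2450 |
| 3 | 0.125 | 7.125° | 0.1244 |
| 4 | 0.0625 | 3.576° | 0.0624 |
| 5 | 0.03125 | 1.7899° | 0.0312 |
| 6 | 0.015625 | 0.8952° | 0.0156 |
| 7 | 0.0078125 | 0.4476° | 0.0078 |
| ... | ... | ... | ... |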
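The pre-computed angles can be regenerated for any number of stages with a few lines of Python. The helper name `angle_table` is an assumption for this sketch; in hardware these values would be stored as fixed-point constants rather than computed at run time.

```python
import math

def angle_table(num_entries):
    """Rows of (i, 2^-i, alpha_i in degrees, alpha_i in radians)."""
    rows = []
    for i in range(num_entries):
        tangent = 2.0 ** -i
        alpha = math.atan(tangent)
        rows.append((i, tangent, math.degrees(alpha), alpha))
    return rows

if __name__ == "__main__":
    print(" i   2^-i        degrees    radians")
    for i, tangent, deg, rad in angle_table(8):
        print(f"{i:2d}   {tangent:<10.7f}  {deg:8.4f}   {rad:.4f}")
```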
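[Fig. 3 block diagram labels: inputs $x_0$, $y_0$ and $z_0 = \theta_0$; adder/subtractor units (+/-) and registers (D) in each stage; hardwired shifts $2^{-0}$, $2^{-1}$, ..., $2^{-n+1}$; pre-computed angles $\alpha_0$, $\alpha_1$, ..., $\alpha_{n-1}$; outputs $x_f$, $y_f$, $z_f$]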