[IEEE 2012 International Conference on Control Engineering and Communication Technology (ICCECT) - Shenyang, Liaoning, China (2012.12.7-2012.12.9)] 2012 International Conference on

Implementation of Single-Precision Floating-Point Trigonometric Functions with Small Area

Chen Dong School of Information and Electrics

Beijing Institute of Technology Beijing, China

[email protected]

Chen He School of Information and Electrics


[email protected]

Sun Xing School of Information and Electrics


[email protected]

Pang Long School of Information and Electrics


[email protected]

Abstract-The computation of floating-point trigonometric functions is important in a wide variety of scientific applications, where area cost, error and latency are key requirements. This paper presents an architecture based on the CORDIC algorithm that implements single-precision floating-point trigonometric functions with small area. Using a mathematical transformation of the input angle and high-precision fixed-point arithmetic in place of floating-point operations, it addresses three problems of single-precision floating-point trigonometric functions: an insufficient range of angles, large area and low operating frequency. The method is implemented on an FPGA platform, and the results show that it reduces the area effectively while preserving the accuracy of the computation.

Keywords-CORDIC;floating-point;FPGA;trigonometric functions;

I. INTRODUCTION

The CORDIC (Coordinate Rotation Digital Computer) algorithm was presented by Volder as an elegant and cost-effective method to perform rotations on vectors in the 2-D plane [1]. Walther extended the algorithm to rotations in circular, linear and hyperbolic coordinate systems [2]. Since then, many implementations of CORDIC have been made, for both fixed-point and floating-point inputs [3], [4], [5].

The main drawbacks of computing on floating-point data with the classic CORDIC algorithm are the limited range of angles and the large area. Like other floating-point units, single-precision floating-point trigonometric functions consume a large amount of logic resources, so the computing units occupy a large area and run at a low frequency.

In this paper we map the entire circle onto the range of angles covered by the CORDIC algorithm, which expands the range of the input angle. In addition, by using high-precision fixed-point arithmetic instead of floating-point operations, the resources are reduced effectively and the operating frequency is improved.

The results presented in this paper are for circular rotations only, but the same techniques can be applied to derive floating-point algorithms for the hyperbolic and linear coordinate systems.

II. CORDIC BACKGROUND

The CORDIC algorithm was first introduced by Volder as an efficient method to perform plane rotations. It was later generalized by Walther to rotations in circular, linear and hyperbolic systems. Because the CORDIC algorithm is an iterative method based on shifts and additions, it is well suited to hardware implementation and has received widespread attention.

The circular coordinate system is mainly used for trigonometric functions (sine, cosine and arctangent). The i-th iterative equations of the circular coordinate system are:

$$
\begin{aligned}
x_{i+1} &= k_i \left( x_i - \sigma_i 2^{-i} y_i \right) \\
y_{i+1} &= k_i \left( y_i + \sigma_i 2^{-i} x_i \right) \\
z_{i+1} &= z_i - \sigma_i \alpha_i
\end{aligned}
\tag{1}
$$

Here $\alpha_i$ is the i-th rotation angle, given by:

$$
\alpha_i = \arctan(2^{-i}) \tag{2}
$$

The circular coordinate system has two modes of operation, namely rotation and vectoring. Rotation mode is used to compute cosine and sine. Vectoring mode is used to compute the arctangent.

2012 International Conference on Control Engineering and Communication Technology

978-0-7695-4881-4/12 $26.00 © 2012 IEEE

DOI 10.1109/ICCECT.2012.186

The $\sigma_i$ indicates the direction of the micro-rotations. These "sigma-bits" $\sigma_i$ can be either 1 or -1, signifying the direction (clockwise or counterclockwise) of each micro-rotation. The values of $\sigma_i$ in the two modes are given by:

$$
\sigma_i =
\begin{cases}
\operatorname{sign}(z_i), & \text{rotation mode} \\
-\operatorname{sign}(y_i), & \text{vectoring mode}
\end{cases}
\tag{3}
$$

In the rotation mode, after n iterations, the equations become:

$$
\begin{aligned}
x_{n+1} &= k_n \left( x_0 \cos z_0 - y_0 \sin z_0 \right) \\
y_{n+1} &= k_n \left( x_0 \sin z_0 + y_0 \cos z_0 \right) \\
z_{n+1} &= 0
\end{aligned}
\tag{4}
$$

Setting $x_0 = 1/k_n$ and $y_0 = 0$, equation (4) is transformed to:

$$
\begin{aligned}
x_{n+1} &= \cos z_0 \\
y_{n+1} &= \sin z_0
\end{aligned}
\tag{5}
$$
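The rotation mode described by (1) to (5) can be illustrated with a minimal Python sketch. It is a floating-point software model, not the hardware design of this paper. The function name `cordic_rotate` and the default of 22 iterations are choices made for the illustration, and the sketch assumes that the per-step factor $k_i$ of (1) is absorbed into the accumulated gain $k_n$, so that each step is a plain shift-and-add update.

```python
import math

def cordic_rotate(angle, iterations=22):
    """Rotation-mode CORDIC model: returns (cos(angle), sin(angle)).

    Valid only while |angle| lies inside the convergence range of the
    iteration sequence i = 0, 1, 2, ... (roughly 99.88 degrees).
    """
    # Rotation angles alpha_i = arctan(2^-i), as in (2).
    alphas = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Accumulated gain k_n = product of 1/cos(alpha_i).
    gain = 1.0
    for a in alphas:
        gain /= math.cos(a)
    # Pre-scaling with x0 = 1/k_n and y0 = 0 yields cos and sin directly.
    x, y, z = 1.0 / gain, 0.0, angle
    for i, a in enumerate(alphas):
        sigma = 1.0 if z >= 0.0 else -1.0      # sigma_i = sign(z_i)
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * a
    return x, y

if __name__ == "__main__":
    c, s = cordic_rotate(0.5)
    print(c, math.cos(0.5))
    print(s, math.sin(0.5))
```

After the last step the residual angle is bounded by the final rotation angle $\arctan(2^{-21})$, which is what limits the accuracy of this model.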

The use of unnormalized micro-rotations in the recursion causes the vector to be lengthened, or scaled, by $1/\cos\alpha_i$ at every step. Hence the need for a division by $k_n$, where $k_n$ is the accumulated scaling factor given by:

$$
k_n = \prod_{i=0}^{N-1} \frac{1}{\cos\alpha_i} \tag{6}
$$
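As a small numerical illustration of (6), the following Python sketch evaluates the accumulated scaling factor for a few iteration counts. The function name `cordic_gain` is hypothetical. The sketch uses the identity $1/\cos(\arctan t) = \sqrt{1 + t^2}$ with $t = 2^{-i}$.

```python
import math

def cordic_gain(iterations):
    """Accumulated scaling factor: product of 1/cos(arctan(2^-i)), i < N."""
    gain = 1.0
    for i in range(iterations):
        # 1/cos(arctan(t)) equals sqrt(1 + t^2) with t = 2^-i.
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain

if __name__ == "__main__":
    for n in (1, 4, 8, 22):
        k = cordic_gain(n)
        print(f"N = {n:2d}: k_n = {k:.10f}, 1/k_n = {1.0 / k:.10f}")
```

The factor converges quickly to about 1.6468, so its reciprocal (about 0.6073) can be stored as a constant initial value for $x_0$.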

In the vectoring mode, after n iterations, the equations become:

$$
\begin{aligned}
x_{n+1} &= k_n \sqrt{x_0^2 + y_0^2} \\
y_{n+1} &= 0 \\
z_{n+1} &= z_0 + \tan^{-1}(y_0 / x_0)
\end{aligned}
\tag{7}
$$

Setting $z_0 = 0$, we obtain the value of the arctangent:

$$
z_{n+1} = \tan^{-1}(y_0 / x_0) \tag{8}
$$
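The vectoring mode of (7) and (8) can be sketched in the same way. The function name `cordic_vector` is hypothetical, the model works in floating point, and it assumes $x_0 > 0$ so that the angle stays inside the convergence range. As before, the per-step factor $k_i$ is assumed to be absorbed into the accumulated gain.

```python
import math

def cordic_vector(x0, y0, iterations=22):
    """Vectoring-mode CORDIC model for x0 > 0.

    Returns (sqrt(x0^2 + y0^2), arctan(y0 / x0)).
    """
    gain = 1.0
    x, y, z = x0, y0, 0.0                      # z0 = 0, as in (8)
    for i in range(iterations):
        sigma = -1.0 if y >= 0.0 else 1.0      # sigma_i = -sign(y_i)
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # x carries the gain k_n; divide it out to recover the magnitude.
    return x / gain, z

if __name__ == "__main__":
    magnitude, angle = cordic_vector(3.0, 4.0)
    print(magnitude, angle, math.atan2(4.0, 3.0))
```

Each step drives $y$ towards zero while $z$ accumulates the applied rotation angles, so the arctangent is read from $z$ after the last iteration.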

III. DRAWBACK AND IMPROVEMENT

The main drawback in the computations on floating-point data with the classic CORDIC algorithm lies in the range of angle and large area. This section describes these two issues, and the corresponding solutions.

A. Range of Angle

Assume that the number of iterations is N. With the iterative sequence (n = 0, 1, 2, 3, ..., N-1) proposed by Walther, the range of angle of the classic CORDIC algorithm is:

$$
-\sum_{n=0}^{N-1} \arctan(2^{-n}) \le \theta \le \sum_{n=0}^{N-1} \arctan(2^{-n}) \tag{9}
$$

According to (9), we could calculate the ranges of angle with N, shown as follows:

TABLE I. THE RANGES OF ANGLE WITH N

N      θmax (°)     N      θmax (°)
1      45           8      99.44
2      71.56        9      99.67
3      85.60        10     99.77
4      92.73        11     99.83
5      96.30        12     99.85
6      98.09        13     99.87
7      98.99        ≥14    99.88

According to the table, the largest range is $-99.88^\circ \le \theta \le 99.88^\circ$, which cannot cover the entire circle.
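The bound in (9) is easy to evaluate numerically. The following Python sketch, with the hypothetical helper `convergence_range_deg`, prints the one-sided range in degrees for N = 1 to 14, so the entries of the table can be checked.

```python
import math

def convergence_range_deg(iterations):
    """Sum of arctan(2^-n) for n = 0 .. N-1, expressed in degrees."""
    return sum(math.degrees(math.atan(2.0 ** -n)) for n in range(iterations))

if __name__ == "__main__":
    for n in range(1, 15):
        print(f"N = {n:2d}: +/- {convergence_range_deg(n):.2f} deg")
```

The sum grows monotonically with N and saturates just below 99.89 degrees, because each additional term is roughly half of the previous one.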

To expand the range, a mathematical transformation is applied to the input data. Using trigonometric identities, it maps the entire circle onto the range of angles covered by the CORDIC algorithm, which expands the range indirectly.

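We know that any angle or vector on the circle can be transformed into the sector $[-\pi/4, \pi/4]$ with the trigonometric identities shown in (10), where $\theta'$ denotes the transformed angle:

$$
\begin{aligned}
\sin(\theta' \pm \pi/2) &= \pm\cos\theta' \\
\cos(\theta' \pm \pi/2) &= \mp\sin\theta' \\
\sin(\theta' \pm \pi) &= -\sin\theta' \\
\cos(\theta' \pm \pi) &= -\cos\theta'
\end{aligned}
\tag{10}
$$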

So we only need to ensure that the range of angles of the CORDIC algorithm covers $[-\pi/4, \pi/4]$. Even the original first-level iteration, which is the 45° iteration step, can be discarded, because the remaining steps still cover about ±54.88°. This reduces both resources and system latency.

We also should record the quadrant of input angles for the compensation of output data. We divide the circumference as follows:

TABLE II. PARTITION OF QUADRANTS

Quadrant    Scale
0           (-π/4, π/4]
1           (π/4, 3π/4]
2           (-3π/4, -π/4]
3           [-π, -3π/4] ∪ [3π/4, π]
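The range reduction and the later compensation can be sketched as follows. This is an illustrative Python model, not the pre-processing logic of the design. The names `reduce_angle` and `compensate` are hypothetical, the quadrant codes follow Table II, boundary angles are assigned by the half-open intervals, and the mapping from quadrant code to compensation formula is an assumption derived from the identities in (10).

```python
import math

def reduce_angle(theta):
    """Map theta in [-pi, pi] to (quadrant, theta_r) with |theta_r| <= pi/4."""
    q = math.pi / 4
    if -q < theta <= q:
        return 0, theta
    if q < theta <= 3 * q:
        return 1, theta - math.pi / 2
    if -3 * q < theta <= -q:
        return 2, theta + math.pi / 2
    # Quadrant 3: the remaining angles near +pi or -pi.
    return 3, theta - math.copysign(math.pi, theta)

def compensate(quadrant, cos_r, sin_r):
    """Recover (cos(theta), sin(theta)) from the reduced-angle results."""
    if quadrant == 0:
        return cos_r, sin_r
    if quadrant == 1:              # theta = theta_r + pi/2
        return -sin_r, cos_r
    if quadrant == 2:              # theta = theta_r - pi/2
        return sin_r, -cos_r
    return -cos_r, -sin_r          # theta = theta_r +/- pi

if __name__ == "__main__":
    theta = 2.5
    quadrant, theta_r = reduce_angle(theta)
    # math.cos/math.sin stand in for the CORDIC core in this sketch.
    c, s = compensate(quadrant, math.cos(theta_r), math.sin(theta_r))
    print(quadrant, theta_r, c, s)
```

Only the quadrant code has to travel alongside the data through the pipeline, since the compensation amounts to swapping and negating the two outputs.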

B. Resources

Single-precision floating-point data consist of a sign bit, an exponent and a mantissa, with the advantages of high accuracy and large dynamic range. However, this format also makes the computation more complicated and increases the consumption of resources [6]. Table III shows the resources of single-precision floating-point addition.

TABLE III. SINGLE-PRECISION FLOATING-POINT ADDITION

Size                         FPGA                  Synthesis Tool    LUTs    REG
IEEE 754 single-precision    xc6vlx240t-1ff1156    XST               430     256

As we know, an iterative cell consists of three additions. To ensure the accuracy of the operation, we use 22 iterations, so a single-precision floating-point trigonometric module needs at least 66 floating-point additions. This is a large cost in resources. Table IV shows the resources of the single-precision floating-point trigonometric module.

TABLE IV. SINGLE-PRECISION FLOATING-POINT TRIGONOMETRIC

FPGA                  Synthesis Tool    LUTs     REG
xc6vlx240t-1ff1156    XST               29745    17029

It can be seen from the above two tables that a trigonometric module built from floating-point units consumes a lot of resources. The main cells of the CORDIC algorithm are additions, so the cost of an addition has a direct impact on the total resources. Compared with single-precision floating-point addition, fixed-point addition occupies far fewer resources.

TABLE V. 32BITS FIXED-POINT ADDITION

Size                  FPGA                  Synthesis Tool    LUTs    REG
32-bit fixed-point    xc6vlx240t-1ff1156    XST               32      0

The 32-bit fixed-point addition occupies less than 10% of the resources of the single-precision floating-point addition (32 LUTs against 430 LUTs and 256 registers). To reduce the consumption of resources effectively, we use high-precision fixed-point addition instead of single-precision floating-point addition.
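The adder budget behind this comparison can be worked out directly from the figures in Tables III and V. The Python sketch below is only a rough estimate: it counts adder LUTs alone and ignores shifters, format conversion, control logic and pipeline registers, and the constant names are chosen for the illustration.

```python
# Figures taken from Tables III and V (xc6vlx240t-1ff1156, XST).
FLOAT_ADD_LUTS = 430
FIXED_ADD_LUTS = 32

ITERATIONS = 22
ADDS_PER_CELL = 3
additions = ITERATIONS * ADDS_PER_CELL       # additions in the whole datapath

float_luts = additions * FLOAT_ADD_LUTS      # adders built as floating point
fixed_luts = additions * FIXED_ADD_LUTS      # adders built as 32-bit fixed point
lut_ratio = FIXED_ADD_LUTS / FLOAT_ADD_LUTS  # cost of one fixed-point adder

if __name__ == "__main__":
    print(f"additions: {additions}")
    print(f"adder LUTs, floating point: {float_luts}")
    print(f"adder LUTs, fixed point:    {fixed_luts}")
    print(f"fixed/float LUT ratio per adder: {lut_ratio:.1%}")
```

Under these assumptions the 66 floating-point adders alone account for 28380 LUTs, which is close to the 29745 LUTs reported in Table IV, while the same number of fixed-point adders needs 2112 LUTs.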

The main drawback of high-precision fixed-point arithmetic is its small dynamic range. However, most operations in the trigonometric functions are carried out on fractional values, so a large dynamic range is not required [7]. This is an important reason to use high-precision fixed-point cells to implement single-precision floating-point trigonometric functions.

IV. HARDWARE IMPLEMENTATIONS

The architectures have been described in VHDL using the Xilinx ISE 12.4 development tool. All the floating-point cores are based on the IEEE-754 standard and are parameterizable by bit-width and number of iterations.

The architecture is composed of three units: a pre-processing unit (CORDIC_PRE), a core processing unit (CORDIC_CORE) and a post-processing unit (CORDIC_POST). Fig. 1 shows the relationship between the three units.

Figure 1. Hardware architectures.

Pre-processing unit: Expands the range of angles and converts the data format. First, it transforms the input data into the sector $[-\pi/4, \pi/4]$ and records the quadrant. Then the transformed data are converted from floating-point to fixed-point.

Core processing unit: Implements the iterative computation of the CORDIC algorithm; it is the core of the architecture.

In order to improve the computing speed and data throughput, the design uses pipeline architecture. Fig. 2 shows the pipeline architecture.

Figure 2. Pipeline architecture.

Each iterative cell consists of three additions and three shifts, which are easy to implement in hardware. Fig. 3 shows the hardware implementation of the i-th iterative cell.

Figure 3. Implementation of the i-th iterative cell (inputs $x_i$, $y_i$, $z_i$ and the control signal $\sigma_i$; outputs $x_{i+1}$, $y_{i+1}$, $z_{i+1}$).
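A software model of one iterative cell, working on integers with shifts and additions only, may help to make the figure concrete. The sketch below is not the VHDL of this design. The fixed-point format (30 fractional bits inside a 32-bit word), the names `to_fixed`, `cordic_cell` and `cordic_fixed`, and the use of rotation mode are all assumptions made for the illustration.

```python
import math

FRAC_BITS = 30   # assumed format: 30 fractional bits in a 32-bit word

def to_fixed(value):
    """Quantize a real number to the assumed fixed-point format."""
    return int(round(value * (1 << FRAC_BITS)))

def cordic_cell(x, y, z, i, alpha_i):
    """One rotation-mode micro-rotation on integers: shifts and adds only."""
    if z >= 0:                                    # sigma_i = +1
        return x - (y >> i), y + (x >> i), z - alpha_i
    return x + (y >> i), y - (x >> i), z + alpha_i   # sigma_i = -1

def cordic_fixed(angle, iterations=22):
    """Chain the cells one after another, as a model of the pipeline."""
    gain = math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    x, y, z = to_fixed(1.0 / gain), 0, to_fixed(angle)
    for i in range(iterations):
        # The constant alpha_i would be hard-wired in each pipeline stage.
        x, y, z = cordic_cell(x, y, z, i, to_fixed(math.atan(2.0 ** -i)))
    scale = float(1 << FRAC_BITS)
    return x / scale, y / scale

if __name__ == "__main__":
    c, s = cordic_fixed(0.6)
    print(c, math.cos(0.6))
    print(s, math.sin(0.6))
```

Python's `>>` on negative integers is an arithmetic shift, which matches the sign-extending shift a hardware cell would use. In a pipelined implementation each call of `cordic_cell` corresponds to one stage with its own fixed shift amount.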


Post-processing unit: Data compensation and data format conversion. First, it compensates the data from core processing unit with the quadrants from pre-processing unit. Then the compensated data will be converted from fixed-point to floating-point.
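The format conversions performed by the pre-processing and post-processing units can be illustrated by a short Python sketch that operates on the IEEE-754 bit fields. The fixed-point format (30 fractional bits) and the function names `float32_to_fixed` and `fixed_to_float32` are assumptions, the input is assumed to be finite and already reduced to a small magnitude, and subnormal inputs are simply flushed to zero.

```python
import struct

FRAC_BITS = 30   # assumed fixed-point format (not stated in the paper)

def float32_to_fixed(value):
    """Convert an IEEE-754 single to fixed point using its bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        return 0                                 # zero or subnormal input
    mantissa = fraction | (1 << 23)              # restore the hidden leading 1
    shift = (exponent - 127) - 23 + FRAC_BITS    # align the binary point
    fixed = mantissa << shift if shift >= 0 else mantissa >> -shift
    return -fixed if sign else fixed

def fixed_to_float32(fixed):
    """Convert back, rounding to the nearest single-precision value."""
    value = fixed / float(1 << FRAC_BITS)
    return struct.unpack(">f", struct.pack(">f", value))[0]

if __name__ == "__main__":
    for v in (0.5, -0.25, 0.75, 0.1):
        f = float32_to_fixed(v)
        print(v, f, fixed_to_float32(f))
```

In hardware the first function corresponds to a barrel shift of the mantissa controlled by the exponent, and the second to a leading-one detection followed by normalization.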

V. RESULTS

The design has been implemented and tested on an FPGA platform. All modules have been described in VHDL and synthesized with the Xilinx ISE 12.4 development tool. Table VI shows the synthesis results of the single-precision floating-point trigonometric module built from 32-bit fixed-point cells.

TABLE VI. SYNTHESIZED RESULTS

FPGA                  Synthesis Tool    LUTs         REG
xc6vlx240t-1ff1156    XST               2789 (1%)    2536 (1%)

It can be seen from the table that this computing unit takes few chip resources (about 1% of the LUTs and registers of the device), so multiple trigonometric computing units can be implemented on one chip. Fig. 4 and Fig. 5 show the simulation results of sine, cosine and arctangent. The design covers the entire circle.

Figure 4. Simulation results of Cosine and Sine.

Figure 5. Simulation results of Arctangent.

To verify the accuracy of the arithmetic units, we take 2000 points in $[0, \pi/2]$ and generate the simulation data with MATLAB. The modules are then simulated with MODELSIM and the results are written to data files. Finally, the simulation results are analyzed with MATLAB. Fig. 6 and Fig. 7 show the relative error of cosine, sine and arctangent. The relative error is on the order of $10^{-7}$.

Figure 6. Relative error of Cosine and Sine.

Figure 7. Relative error of Arctangent.
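The verification flow can be imitated in software. The Python sketch below samples 2000 evenly spaced angles in $[0, \pi/2]$ and measures the relative error of a floating-point CORDIC model against the library functions. It does not reproduce the MATLAB and MODELSIM flow or the figures of this paper: the model `cordic_sin_cos` is a stand-in for the hardware, and skipping samples whose reference value is below `1e-3` is an assumption made because a relative error is not meaningful near zero.

```python
import math

def cordic_sin_cos(angle, iterations=22):
    """Stand-in software model (floating-point rotation-mode CORDIC)."""
    x = 1.0 / math.prod(math.sqrt(1.0 + 4.0 ** -i) for i in range(iterations))
    y, z = 0.0, angle
    for i in range(iterations):
        sigma = 1.0 if z >= 0.0 else -1.0
        x, y = x - sigma * y * 2.0 ** -i, y + sigma * x * 2.0 ** -i
        z -= sigma * math.atan(2.0 ** -i)
    return x, y

def relative_errors(points=2000, floor=1e-3):
    """Relative errors of sine and cosine on evenly spaced angles in [0, pi/2]."""
    sin_err, cos_err = [], []
    for k in range(points):
        theta = (math.pi / 2) * k / (points - 1)
        c, s = cordic_sin_cos(theta)
        # Skip samples whose reference value is too close to zero.
        if math.sin(theta) > floor:
            sin_err.append(abs(s - math.sin(theta)) / math.sin(theta))
        if math.cos(theta) > floor:
            cos_err.append(abs(c - math.cos(theta)) / math.cos(theta))
    return sin_err, cos_err

if __name__ == "__main__":
    sin_err, cos_err = relative_errors()
    print(f"max relative error, sine:   {max(sin_err):.3e}")
    print(f"max relative error, cosine: {max(cos_err):.3e}")
```

The same harness can be pointed at results read back from a simulator by replacing `cordic_sin_cos` with a function that looks up the recorded output for each angle.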

VI. CONCLUSION

In this paper, we use high-precision fixed-point cells instead of single-precision floating-point cells, and expand the range of angles with a mathematical transformation. With these methods, we implement single-precision floating-point trigonometric functions that occupy a small area and cover the entire circle. The architecture has been implemented and tested on an FPGA platform, and the relative error of the trigonometric functions has been analyzed with MATLAB and MODELSIM. The results show that the design overcomes the drawbacks described above and is viable. As future work, we intend to optimize the algorithm itself and to develop a better architecture.

REFERENCES

[1] J. E. Volder, “The CORDIC Trigonometric Computing Technique,” IRE Trans. Electron. Computers, vol. EC-8, no. 3, pp. 330-334, Sept. 1959.

[2] J. S. Walther, “A Unified Algorithm for Elementary Functions,” Proc. Spring Joint Computer Conf., pp. 379-385, May 1971.

[3] J. Valls, M. Kuhlmann, and K. Parhi, “Evaluation of CORDIC Algorithms for FPGA Design,” Journal of VLSI Signal Processing, vol. 32, pp. 207-222, 2002.

[4] R. Gutierrez and J. Valls, “Low-Power FPGA-implementation of atan(Y/X) using LUT Methods for Communication Applications,” Springer J. Proc. Syst., 2008.

[5] G. J. Hekstra and Ed F. Deprettere, “Floating-point CORDIC: Algorithm and Architecture for a Word-Serial Implementation,” Ph.D. dissertation, Dept. of Electrical Eng., Delft Univ., 1998.

[6] D. Muñoz, D. Sanchez, C. Llanos, and M. Ayala-Rincón, “Tradeoff of FPGA design of floating-point transcendental,” Proc. IEEE Conf., pp. 1-4, Oct. 2009.

[7] D. Llamoca and C. Agurto, “A fixed-point implementation of the expanded hyperbolic CORDIC algorithm,” Lat. Am. Appl., vol. 37, no. 1, pp. 83-91, 2007.
