
[IEEE 2014 9th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS) - Santorini, Greece (2014.5.6-2014.5.8)] 2014 9th IEEE International Conference


Generation and validation of multioperand carry save adders from the web

Minas Dasygenis

Department of Informatics and Telecommunications Engineering, University of Western Macedonia, Kozani, 50100, Greece, [email protected]

Abstract

Many arithmetic circuits utilize multioperand addition, usually implemented with carry-save adder (CSA) trees. Automatic generation of custom VHDL models for these CSA trees allows the designer to perform a time-efficient design space exploration. Although CSA trees are heavily utilized in modern digital circuits, there is no tool, accessible from the web, that generates the HDL description of such multioperand designs. To the best of our knowledge, our novel tool is the first one to automate the design of optimized CSA trees and simultaneously provide custom testbenches to verify their correctness. Our synthesized circuits on a Xilinx Virtex 6 FPGA operate at up to 724 MHz.

I. Introduction

The design automation and test (DAT) processes play a crucial role in the contemporary multi-billion transistor era. One aspect of DAT is the fast parametrized generation of bit-accurate models and their test vectors in a hardware description language (HDL). This enables designers to perform a rapid design space exploration and select the best custom implementation. HDL generators for circuits that are required in almost every digital system are especially important. Addition is by far the most important elementary operation within digital systems [14]. One very effective technique for adding multiple vectors is the carry save adder (CSA) tree.

Even though CSA trees were introduced some decades ago [4], and they are still very popular as parts of other circuits (multipliers [10], FIR filters [7], cryptography [16], DSP and others), tools that support automatic HDL generation of CSA trees for multioperand vectors of arbitrary bit patterns are absent from the EDA landscape, due to the complexity of their design. Designing CSA trees is a very tedious process, especially when the designer requires the addition of multioperand vectors with non-standard patterns (such as vectors carrying non-continuous input bit patterns).

All the commercial IP block generators can create HDL descriptions of adders using CSA, but they only support specific bitwidths (8, 16, 32, 64, etc.), require every vector of the module to have the same size, and require every vector to carry only continuous bit patterns. This means that the derived circuit will be unoptimized if the vectors are sparse and utilize only some bit positions. The further the design deviates from the standard parameters that the commercial tools accept, the less optimized the result will be. Designing circuits optimized for performance and energy consumption demands an IP block generator with greater flexibility. Finally, EDA tools that are publicly accessible from the web are very scarce [12].

Carry save adders are used intensively in the design of computer arithmetic modules. One of the areas that makes heavy use of them is the Residue Number System (RNS) [15], a non-conventional, non-weighted arithmetic system. Due to its nature, this arithmetic system requires the addition of various bit patterns. During the last fifty years many authors have presented circuits that use carry save adders as core components, such as multioperand modular adders [8], multipliers [11], RNS to binary (forward and reverse) converters [13] and a multitude of other modules. All these circuits consist of carry save adders that process non-standard sparse input vectors, and cannot be designed automatically using EDA IP block generators. In fact, the complete absence from the EDA landscape of a flexible tool that produces syntactically correct and synthesizable HDL code for adding custom input vectors was the primary motivation for our work.

Even though the CSA is a heavily used arithmetic module, and thus an important module in circuit designs, nobody until now has presented a tool to automate the design and test of parametrized CSA trees. The derived CSA trees are optimized according to the heuristic of using the smallest number of components, and are thoroughly tested using random input vectors, which are supplied to the end user. Therefore, the novelty of this paper lies in the fact that we present the first web based tool to automate the optimized CSA design process and increase the productivity of every engineer who requires a custom CSA module. Such custom CSA modules cannot be designed with IP block generators from major EDA companies (like Altera or Xilinx). Our tool is user friendly and accessible using any standard web browser, even from a mobile device.

2014 9th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS)

978-1-4799-4972-4/14/$31.00 ©2014 IEEE

To the best of our knowledge, no CSA tool exists to date that accepts an arbitrary number of input vectors, each one consisting of several continuous or sparse bit patterns of various bit lengths, that is accessible to everybody via a typical browser, and that generates a syntactically correct register transfer level (RTL) description in HDL, accompanied by a randomly generated testbench to validate its correctness. Motivated by this, we worked on developing such a tool. Here, we present the outcome of our work: a web based tool to assist in the rapid design space exploration of CSA tree generation. Our tool accepts as inputs the number of vectors, their bit patterns and weights, and generates a synthesizable RTL description in the VHDL language. The CSA generation is optimized in terms of using the smallest number of components, because the tool can decide whether to use a full adder, to use a half adder, or to defer the placement of an adder to a future design iteration for a more efficient component utilization. Furthermore, it generates a custom testbench file to verify the circuit's correctness, and a schematic. Finally, it reports metrics of performance (delay) and hardware (number of components and transistors).

The rest of the paper is structured as follows: The next section (Section II) discusses related work in the field of automating HDL generation. In Section III we present our web based tool and describe its modules in detail, while in Section IV we provide test cases with performance and hardware metrics. Finally, in Section V we give the concluding remarks.

II. Related Work

Reconfigurable computing systems, System-on-Chip, embedded systems and other complex systems usually have many design requirements, sometimes conflicting, that demand a thorough design space exploration. In a recent research work [18] the authors pointed out that even though multiple research projects have worked on automating HDL generation during the last 15 years, the outcomes are not encouraging. Furthermore, they conclude that "despite all these efforts the automated hardware generation is yet to become a widely adopted industrial practice", due to (a) lack of automation and (b) lack of optimization. Since we agree with these authors, we constructed our tool to contribute to the EDA automation process.

All major EDA companies (Altera, Xilinx, MathWorks, etc.) support customized IP blocks for various functions. Xilinx provides the "Core Generator", while Altera provides the "MegaWizard Generator". These generators can be used to create HDL descriptions of typical adders for two vectors of a specific bitwidth, but not of fully parametrized carry save adders.

Automating VHDL generation for various design modules has been a research topic for more than 15 years. Daveau et al. [3] presented an approach for generating VHDL from SDL models. The work was limited to the specification, and no EDA tool was ever created. Daitx et al. [2] presented a tool that creates VHDL descriptions of FIR filters according to a coefficient specification file. Their tool is not available online, in contrast with ours.

On the other hand, automating high-level HDL generation is the focus of multiple research projects. All these generators accept a high-level description of the code, usually in C, and create an HDL description of it. The major drawback of high-level synthesis is that the input code must adhere to many requirements, such as regular memory access, perfectly nested loops and, in some cases, absence of pointers, while all the tools have a limited application domain. The ROCCC project [1] mainly supports streaming applications. The Streams-C project [6] and DWARV [17] also support a very limited application domain. In addition, the majority of EDA companies in the field of IC design provide high-level synthesis solutions (Vivado, Synphony, Altium, C-to-Silicon, Triton, etc.), but all of them support typical design requirements and cannot provide the granularity required to design optimized CSA trees. MathWorks also provides the "HDL Coder", which is a high-level synthesizer for Matlab functions, models and charts. It cannot be used to create parametrized CSA modules, because CSA modules demand bit-level arithmetic rather than high-level design.

As can be observed, our novelty lies in the automated design, from the web, of the optimized HDL of a custom CSA tree. Our tool can produce syntactically correct code for multiple input vectors, either sparse or continuous, of various input lengths, something that is not supported by other EDA tools.

III. The web CSA hardware compiler

Our tool, which has been installed on a public web server¹, utilizes a number of technologies (PHP, Python, JSON) in order to deliver a syntactically correct and synthesizable VHDL description. The tool is partitioned into two parts according to their function: the front-end and the back-end. These parts exchange information using the JavaScript Object Notation (JSON) format [5].

The tool's back-end provides the core functions, and for this reason we will focus only on it. The back-end consists of three modules: (i) the CSA design module, which analyzes the user inputs and creates the specific design description in a special netlist format, (ii) the HDL Generator module, which takes this netlist as input and creates signals, networks, assignments, and connections, resulting in the output description in VHDL, and (iii) the VHDL Testbench creator, which takes as input the data structures constructed by the previous module and generates a full VHDL testbench, with handles for automatic design validation.

¹ The tool is available at http://arch.icte.uowm.gr/hdl/csa.php

A. CSA design module

The first module is the CSA design module, which creates a netlist in an internal format developed at our laboratory, which we call the α-HDL format. It operates in three stages. In the first stage, it accepts as input an arbitrary array of vectors supplied by the user. Each vector carries a series of '1' and '0' in a weighted list, starting from the least significant bit (LSB) on the left, up to the most significant bit on the right. If a location carries '1', then this location provides a bit that has to be taken into consideration; otherwise this location is not used. Using this notation, the user can supply sparse vectors, that is, vectors that carry bits only in some positions and are not continuous. For example, a sparse vector that carries information only in bit positions 0 and 2 can be written (the left bit is the LSB) as X-X-, with X being a position that should be taken into consideration for the computation, and - an empty position. The possible values of this vector are 1-1- (5), 1-0- (1), 0-1- (4) and 0-0- (0), with the corresponding decimal number in parentheses. Even though sparse vectors may seem unusual, circuits for special arithmetic systems, such as the residue number system, make heavy use of them.
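As a minimal sketch of this notation, the following Python snippet enumerates the values that a sparse vector can take, treating index 0 as the LSB. The function name `possible_values` is hypothetical and is not part of the tool.

```python
from itertools import product


def possible_values(mask):
    """Enumerate the integers a sparse vector can take.

    mask is a list of 0/1 flags with index 0 as the LSB;
    a 1 marks a position that carries a bit, a 0 an empty position.
    """
    used = [pos for pos, flag in enumerate(mask) if flag]
    values = set()
    # Try every 0/1 assignment of the used positions.
    for bits in product((0, 1), repeat=len(used)):
        values.add(sum(bit << pos for bit, pos in zip(bits, used)))
    return sorted(values)


print(possible_values([1, 0, 1, 0]))  # mask X-X- gives [0, 1, 4, 5]
```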

The input vectors accepted at the beginning of the CSA design module define the operands of the CSA tree, with a '1' in a given vector defining a bit that has to be used for the computation and a '0' defining an empty position. For example, the vector v1 = 00a2a3a4a500a8, or --XXXX--X, is described with 001111001 or, in Python syntax, [0, 0, 1, 1, 1, 1, 0, 0, 1]. Similarly, the vector v2 = b0b1b2b30 is described with 11110 or, in Python syntax, [1, 1, 1, 1, 0]. When a vector has '1' in a position, it contributes one bit to the corresponding column. For these inputs, the two dimensional array of vectors will be: [[0, 0, 1, 1, 1, 1, 0, 0, 1], [1, 1, 1, 1, 0]].
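To illustrate how such an array translates into work for the adder tree, the sketch below counts how many bits each column receives. The helper name `column_heights` is an assumption made for illustration; the tool's internal representation is not described in this paper.

```python
def column_heights(vectors):
    """Count the bits contributed to each column (index 0 is the LSB column).

    vectors is a list of 0/1 masks of possibly different lengths.
    """
    width = max(len(vector) for vector in vectors)
    return [
        sum(vector[j] for vector in vectors if j < len(vector))
        for j in range(width)
    ]


vectors = [[0, 0, 1, 1, 1, 1, 0, 0, 1], [1, 1, 1, 1, 0]]
print(column_heights(vectors))  # [1, 1, 2, 2, 1, 1, 0, 0, 1]
```

Only columns 2 and 3 hold more than one bit here, so only those columns need adders.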

The second stage of the CSA module is the reduction, which consists of many iterations. In every iteration i the reduction stage scans all columns j, starting from the least significant column, locates the columns that have more than 1 bit, and places full adders (FA) or half adders (HA). The adders are placed so as to minimize the total number of FAs and HAs, as follows. While the number of bits to be added in a column exceeds 2, full adders are placed in the netlist, with their output carry registered for processing at the next iteration (i + 1) in the next column (j + 1), and their output sum registered for processing at the next iteration (i + 1) in the same column (j). If the number of bits to be added is 2, the tool examines whether to add an HA, or to delay the insertion of the HA in favor of a better utilization in a future iteration. The tool will not add an HA when a carry has been registered at the next iteration (i + 1) for this column (j), because an FA can be used in the next iteration to add all three bits. Also, the tool will not add an HA when two bits have been registered at the next iteration (i + 1) of the previous column (j - 1), because in iteration (i + 1) a carry bit will be created and registered at iteration (i + 2), column (j). Thus, in iteration (i + 2), an FA can be used to add all three bits. When the total number of bits in a column is less than 2 the reduction stage completes.
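The iteration scheme can be sketched in Python on column bit counts alone. This is a deliberately simplified illustration and not the tool's algorithm: it places full adders only, leaves pairs of bits untouched instead of applying the half adder deferral rules described above, and stops as soon as no column holds more than two bits. The function name `reduce_columns` is hypothetical.

```python
def reduce_columns(heights):
    """Simplified full-adder-only column reduction.

    heights[j] is the number of bits in column j (index 0 is the LSB).
    Returns (final_heights, full_adder_count, iterations).
    Assumption: half adder placement and deferral are NOT modeled.
    """
    cols = list(heights)
    fa_total = 0
    iterations = 0
    while any(h > 2 for h in cols):
        nxt = [0] * (len(cols) + 1)
        for j, h in enumerate(cols):
            fas = h // 3              # each FA consumes three bits of column j
            fa_total += fas
            nxt[j] += fas + h % 3     # sum bits stay in column j, leftovers pass through
            nxt[j + 1] += fas         # carry bits move to column j + 1
        if nxt[-1] == 0:
            nxt.pop()                 # drop an unused most significant column
        cols = nxt
        iterations += 1
    return cols, fa_total, iterations


# Four continuous 4-bit operands: every column starts with four bits.
print(reduce_columns([4, 4, 4, 4]))  # ([2, 1, 2, 2, 2], 7, 2)
```

Each full adder replaces three bits of weight w by one bit of weight w and one of weight 2w, so the largest representable sum is preserved (60 before and after in the example).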

The third stage of the CSA module is the final addition using a ripple carry adder. This stage, which is also optimized, places the most suitable number and types of adders. It checks the total number of bits to be added in every column (0, 1 or 2), and then decides whether to connect the column directly to the output (when the column carries 0 or 1 bits and no carry has been generated in the previous column), to place an HA (when the column carries two bits and no carry bit has been generated in the previous column, or the column carries one bit and a carry bit has been generated in the previous column), or to place an FA (when 3 bits have to be added).
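The decision rule of this stage can be sketched as follows. The function name `plan_final_adder` and the labels `"wire"`, `"HA"` and `"FA"` are illustrative assumptions. The text does not state what happens when an empty column receives a carry; the sketch assumes that the carry is then wired directly to the output.

```python
def plan_final_adder(heights):
    """Choose a component per column for the final ripple carry addition.

    heights[j] is the number of bits (0, 1 or 2) left in column j after
    the reduction. Returns (plan, carry_out), where plan[j] is "wire",
    "HA" or "FA" and carry_out tells whether an extra output bit is needed.
    """
    plan = []
    carry_in = False
    for h in heights:
        total = h + (1 if carry_in else 0)
        if total <= 1:
            plan.append("wire")   # nothing to add: connect directly to the output
            carry_in = False
        elif total == 2:
            plan.append("HA")     # two bits, or one bit plus an incoming carry
            carry_in = True
        else:
            plan.append("FA")     # two bits plus an incoming carry
            carry_in = True
    return plan, carry_in


print(plan_final_adder([2, 1, 2, 2, 2]))
# (['HA', 'HA', 'FA', 'FA', 'FA'], True)
```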

The CSA design module also accepts as input the option to pipeline the design or not. The pipelined design uses D flip flops (DFF) to delay input and output bit columns, and increases the throughput of the design at the cost of increased hardware (Section IV). The tool carefully adds delay units to both input and output columns, so that every bit experiences a uniform delay. Currently, there is no pipeline optimization or parametrization: a DFF is placed after every adder.

B. HDL Generator Module

This is a general purpose VHDL generator module that can be easily connected to many different generators. The module accepts as input a special and compact netlist format, which we name abstracted HDL (α-HDL). This is a special type of netlist that can be visualized as a hypercube (Fig. 1), in which every node corresponds to a component and carries the vectors that define its input connections. These vectors convey only the component that generated the signal, the output port of that component, the bitwidth and the signal type. By defining only the input connection vectors, we obtain a netlist format that is very compact and easy to generate. The netlist is encoded as a single JavaScript Object Notation (JSON) object. The details of this format are outside the scope of this paper, and thus we will not describe it further.

Fig. 1. The visualization of the α-HDL netlist format forms a hypercube

The HDL Generator module consists of five stages: (i) validation, (ii) top level input output, (iii) port mapping, (iv) HDL generator, and (v) schematic generator (Figure 2).

Fig. 2. The HDL Generator accepts our compact netlist and outputs the VHDL files and the schematic

At the validation stage, the α-HDL format is checked for inconsistencies, such as vectors that carry invalid coordinates. Also, every component used in the netlist is checked against the HDL library of the tool to verify that it is supported. The HDL generator maintains a library of many primitive components (FA, HA, logic gates and more). Every library entry provides the architectural description, the component description, the entity description and the number of in, out, inout and buffer ports. The number and type of ports is important at the port mapping stage.

At the second stage, the top level input and output analysis is performed. The tool examines every vector and registers the input vectors whose origin is an input port of the design. It should be noted that the netlist carries no explicit input and output port information; the input and output ports are discovered from the vectors. The vectors are analyzed and the top level input ports are defined in terms of port number, port bitwidth, signal type, and port type. In a similar way, this stage defines the output ports of the design. The results of this stage are two structures that define the input and output ports.

At the third stage, the port mapping operation occurs. In this operation signals are created and connected to specific port numbers and port types. For every component, the input vectors are examined in order to form the connection pair between the originating component and the destination component. If a signal name already exists with the same attributes (originating component, originating port, type and bitwidth), this signal is registered in the portmap structure; otherwise a new signal name is created. The portmap structure of every component carries four sections, 'in', 'out', 'inout' and 'buffer', mirroring the VHDL port types. Every port section is an ordered collection of signals, whose positions define the port number within this port type (for example, a signal name located at position '2' of the out port section signifies that this signal will be connected to out2 during the VHDL creation). Indexed ports that have no connection are marked 'open' or connected to a 'ground' signal. The outcome of this stage is the port mapping data structure and the signals data structure.
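The signal reuse rule can be sketched with a dictionary keyed by the four attributes listed above. The function `get_signal`, the component and port names, and the `signalN` naming scheme are assumptions for illustration; the tool's actual data structures are not described in this paper.

```python
def get_signal(signals, component, port, sig_type, width):
    """Return the signal name for a connection, creating it on first use.

    signals maps (originating component, originating port, type, bitwidth)
    to a signal name, so identical connections share one signal.
    """
    key = (component, port, sig_type, width)
    if key not in signals:
        signals[key] = f"signal{len(signals)}"  # hypothetical naming scheme
    return signals[key]


signals = {}
a = get_signal(signals, "FA0", "out1", "std_logic", 1)
b = get_signal(signals, "FA0", "out1", "std_logic", 1)  # same attributes: reused
c = get_signal(signals, "FA0", "out2", "std_logic", 1)  # different port: new signal
print(a, b, c)  # signal0 signal0 signal1
```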

At the fourth stage, the HDL generation occurs. Currently, only VHDL can be generated, but this stage can be extended to cover Verilog HDL generation. This stage outputs two VHDL files: (1) a VHDL file that carries the entities and the architectural descriptions of all primitive components used in the design, and (2) a VHDL file that describes the design derived from the netlist. Here, the tool places the appropriate library declarations, which depend on the type of signals used. Then, it instantiates every primitive component used. Using the input and output ports data structure, it creates the entity definition of the design. The next task is to define signals, using the signal data structure created before. After this, the port mapping declarations, which strictly adhere to the VHDL syntax, are emitted using the appropriate data structure. Following this step, the signal assignment phase occurs, in which signals are connected to constant values, or signals are suitably connected to other signals of different bitwidth.

At the fifth stage, the block schematic of the design is created for a visual representation. Using the DOT visualization language, a graph is constructed and rendered as a PNG picture. This stage makes use of the port mapping, input port and output port data structures in order to complete all connections. Furthermore, it annotates each connection with the signal name, the input and output port of the connection, and the bitwidth. Finally, it uses colors to indicate the input and output paths.

Along with these files, the tool reports some metrics to the designer: the quantity of every component used and the delay, measured in stages. It has to be noted that the tool does not perform any synthesis at all. The transistor counts are computed using standard figures; for example, a one-bit full adder requires 28 transistors, a 2-input OR gate requires 6 transistors, and so on [9]. The delay corresponds to the worst case path from an input vector to the computation of the output vector, and is measured by counting the D flip flops that are used as pipeline latches. Finally, the transistor count is reported both per component and in total, in case the user would like to implement the design in CMOS [9]. Our tool does not synthesize the circuit, because its purpose is different: it generates optimized VHDL code, which engineers will use in their own synthesis tools.
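The transistor report amounts to a table lookup and a sum, as the following sketch shows. It uses only the two per-component figures quoted above; the names `TRANSISTORS` and `transistor_report` and the component counts in the example are hypothetical.

```python
# Transistors per component; only the two figures quoted in the text are listed.
TRANSISTORS = {"FA": 28, "OR2": 6}


def transistor_report(component_counts):
    """Return (per-component transistor counts, total transistor count).

    component_counts maps a component name to how many instances are used.
    A component missing from TRANSISTORS raises KeyError.
    """
    per_component = {
        name: count * TRANSISTORS[name]
        for name, count in component_counts.items()
    }
    return per_component, sum(per_component.values())


# Hypothetical design with 8 full adders and 2 OR gates.
print(transistor_report({"FA": 8, "OR2": 2}))
# ({'FA': 224, 'OR2': 12}, 236)
```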

C. HDL Testbench Generator

This module is of utmost importance, because it creates multiple test vectors that can be used to test the correctness of the design in an HDL simulator. Evidently, multioperand CSA design is a very complicated process, and the result should be tested thoroughly. Our tool accepts as input the number of test cases to create, and generates the testbench VHDL file. To do this, it first creates an empty entity declaration, then it instantiates the top level component and creates signals for every input and output port. Furthermore, it creates a clock process and a function that converts bits to an integer. The next step is to create the requested number of input test cases. For every test case, the following loop is performed: for every operand a random number is created and Boolean ANDed with the operand bitmask that was given at the beginning. For example, if the user had supplied the mask XX-- or 1100 for the first input vector, then every randomly generated value will be Boolean ANDed with 1100, resulting in one of the following values (LSB is the left bit): 1100 (3), 1000 (1), 0100 (2), 0000 (0). This number is converted to binary, extended to the full bitwidth of the operand, and assigned to the signal that is associated with this input. The tool sums all the generated operands and precomputes the final result. After the value assignments, it inserts a wait clause for the delay, which was computed in the previous stage, and then constructs an 'assert' statement to check the output. For example, if one vector has the random value (LSB is the right bit) 100000b (32), and another has the random value 100011b (35), then the following lines will be added to the generated VHDL file:

-- input vector: 32
signal0<="100000" AFTER 8 ns ;
-- input vector: 35
signal1<="100011" AFTER 8 ns ;
wait for waittime * 1 ns ;
assert (vec2int(signal4) /= 67 )
report "TESTBENCH OK" severity note ;
assert (vec2int(signal4) = 67 )
report "TESTBENCH Output:"&integer'image(vec2int(signal4))&"Cor:"&integer'image(67) severity error ;

In the HDL simulator, if the output value of the design is the same as the precomputed value, then the test passes and a success message is printed. Otherwise, a testbench failure message is reported, with specific details. The CSA designs that are created by our tool passed all the tests in our experiments, which anonymous users of our web tool can verify for themselves.
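The generation of one test case can be sketched in Python as follows. This is an illustration under our own assumptions rather than the tool's code: the helper names are hypothetical, masks are given LSB first as in Section III-A, and the VHDL literal is written MSB first as in the generated lines above.

```python
import random


def mask_to_int(mask):
    """Convert an LSB-first 0/1 mask such as [1, 1, 0, 0] to an integer."""
    return sum(flag << pos for pos, flag in enumerate(mask))


def make_test_case(masks, rng):
    """Draw one random operand per mask and precompute the expected sum."""
    operands = [rng.getrandbits(len(mask)) & mask_to_int(mask) for mask in masks]
    return operands, sum(operands)


def to_vhdl_literal(value, width):
    """Render a value as an MSB-first bit string padded to the operand width."""
    return format(value, f"0{width}b")


rng = random.Random(0)  # fixed seed so the sketch is repeatable
operands, expected = make_test_case([[1, 1, 0, 0], [1, 0, 1, 0]], rng)
for index, value in enumerate(operands):
    print(f'signal{index}<="{to_vhdl_literal(value, 4)}" ;')
print("expected sum:", expected)
```

With the mask XX-- the first operand can only be 0, 1, 2 or 3, matching the example in the text.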

All the testbench vectors are created randomly, according to the number of tests requested by the user. Also, all the checks are done automatically, which means that the designer can load the testbench file into an HDL synthesis and simulation tool and execute it without any other intervention. If a vector fails, the message is registered automatically and can be reviewed by the user at any time. Using the automated testbenches the user can easily and reliably test very large numbers of long vectors (for example 10x64 bit). Because our tool has been tested thoroughly, we do not expect any error to be reported.

IV. Experimental Results

In order to evaluate the efficiency of our web tool, we generated a large number of VHDL descriptions for different design parameters. We have to note that, due to the absence of a similar tool from the EDA ecosystem, the results cannot be compared with the output of other available tools. For this reason we provide some indicative results of using our tool, which everyone can easily verify by visiting our tool's web page.

Some of these designs are summarized in Table I. Specifically, the nrvectors and bits columns give the number of input vectors and the number of bits per vector, respectively. The signals, adders and DFF columns show the number of internal signals used, the total number of adders and the number of D flip flops. This table illustrates the importance of a tool that generates syntactically and operationally correct hardware descriptions of circuits, especially for non-trivial bitwidths and inputs.

TABLE I. Automatically Generated CSA codes

#nrvectors  #bits  signals  delay  adders  DFF
3           4      50       6      8       19
3           8      144      10     16      85
6           16     654      20     80      395
8           32     2359     37     223     1652
10          64     8459     71     585     6636

Concerning the runtime of the tool, Table II shows the execution time of each module and the maximum runtime memory footprint for each of the parameter sets of the previous table, running on an Intel Core2 Duo E7600 at 3.06 GHz with the FreeBSD operating system. Our EDA tool has very low memory requirements, even for complicated design parameters, which is important for a web application, where many users are expected to use it simultaneously. Also, the execution time is reasonably low: less than 2 minutes for very demanding design parameters (e.g. 20×64 bits), and a few seconds for less complex cases; this can be easily verified anonymously by visiting the tool's web page. It should be noted that the testbench module always creates the testbench in less than 2 seconds, while the module that creates the schematic has an execution time linear in the product of the number of components and the number of signals. Of course, generating a single block schematic in which 7000 elements are placed with their connections is neither useful nor practical for the designer, and such a schematic is not used in a hierarchical design.

TABLE II. Performance Metrics of the webtool

#testcase (#nrvectors×#bits)  #CSA mod(s)  HDL mod(s)  Mem(MB)
3x4 bits                      1            1           30
3x8 bits                      1            1           30
6x16 bits                     4            3           40
8x32 bits                     7            4           45
10x64 bits                    40           58          50

Additionally, we synthesized the generated VHDL codes with Leonardo Spectrum, Xilinx Vivado 2013.2 and Altera Quartus II 12.0. The synthesis results from Xilinx Vivado (Virtex 6, speed grade -2) show that for complicated input patterns (like 10 vectors of 64 bits each, 10x64), the pipelined version occupies 1166 slices, with an operating frequency of 667 MHz and a power consumption of 4.4 W. For other bit patterns (like 8x32), the operating frequency is 724 MHz, with 502 occupied slices and a power consumption of 2.8 W. As expected, circuits without pipelining have a very low demand in slices, but operate at very low frequencies, due to the long critical path. For example, the 10x64 design without pipelining requires 520 LUT slices and exhibits an operating frequency of 58 MHz (17 ns critical path).
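The relation between critical path and operating frequency quoted above is simply f = 1/T. The following minimal Python sketch illustrates the conversion; the function names are hypothetical and are not part of our tool. The small difference between the computed 58.8 MHz and the reported 58 MHz is presumably due to rounding of the 17 ns figure.

```python
def max_frequency_mhz(critical_path_ns: float) -> float:
    """Return the maximum clock frequency in MHz for a critical path in ns."""
    if critical_path_ns <= 0:
        raise ValueError("critical path must be positive")
    # f = 1 / T; with T in nanoseconds, 1 / T is in GHz, so scale by 1000 for MHz.
    return 1000.0 / critical_path_ns


def clock_period_ns(frequency_mhz: float) -> float:
    """Return the clock period in ns for a frequency given in MHz."""
    if frequency_mhz <= 0:
        raise ValueError("frequency must be positive")
    return 1000.0 / frequency_mhz


# Non-pipelined 10x64 design: a 17 ns critical path gives roughly 58 MHz.
print(round(max_frequency_mhz(17.0), 1))  # 58.8
# Pipelined 8x32 design: 724 MHz corresponds to a stage delay of about 1.38 ns.
print(round(clock_period_ns(724.0), 2))   # 1.38
```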

Concerning the validation, for every testcase our tool constructed 100 random test vectors that were used in Modelsim to check automatically the correctness of the design. Every CSA tree passed successfully all tests. This validation can be easily performed by every designer, by visiting the web page, downloading the VHDL files, compiling them on their workstation and simulating them.
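The idea behind this random-vector check can be illustrated with a small software reference model. The Python sketch below is not the VHDL testbench that our tool generates; it assumes a plain 3:2 compression scheme on full-width, non-negative operands, and all function names are hypothetical. It reduces the operands with carry-save steps and compares the result against ordinary addition for 100 random vectors per testcase.

```python
import random
from typing import List, Tuple


def carry_save_step(a: int, b: int, c: int) -> Tuple[int, int]:
    """Compress three operands into a sum word and a carry word (3:2)."""
    partial_sum = a ^ b ^ c                     # bitwise sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1  # majority bits, one position higher
    return partial_sum, carry


def csa_tree_sum(operands: List[int]) -> int:
    """Add non-negative operands by repeated 3:2 compression."""
    words = list(operands)
    while len(words) > 2:
        reduced = []
        # Compress every full group of three words.
        for i in range(0, len(words) - 2, 3):
            reduced.extend(carry_save_step(words[i], words[i + 1], words[i + 2]))
        # Leftover words (zero, one or two) pass to the next level unchanged.
        reduced.extend(words[len(words) - len(words) % 3:])
        words = reduced
    return sum(words)  # models the final carry-propagate adder


def random_test(nr_vectors: int, bits: int, tests: int = 100, seed: int = 0) -> bool:
    """Check the model against plain addition on random input vectors."""
    rng = random.Random(seed)
    for _ in range(tests):
        operands = [rng.getrandbits(bits) for _ in range(nr_vectors)]
        if csa_tree_sum(operands) != sum(operands):
            return False
    return True


# The (nrvectors, bits) pairs of Table I.
cases = [(3, 4), (3, 8), (6, 16), (8, 32), (10, 64)]
print(all(random_test(n, b) for n, b in cases))  # True
```

Because Python integers are unbounded, this model does not capture output bitwidth or pipelining; it only demonstrates the arithmetic identity that the generated testbenches exercise.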

V. Conclusions

Design automation and fast circuit verification are the foundations for increasing productivity and meeting tight time-to-market constraints. The multioperand carry save adder tree is an important component that is used in almost every digital system. Our contribution is a novel web tool for designing custom multioperand CSA devices, bearing no restriction on the number, size, or bit patterns of the input vectors. The tool outputs the synthesizable VHDL description, a custom and automated testbench, a block schematic, and other metrics, and is available to every anonymous user over the Internet.

Synthesis results indicate the high performance on the Xilinx Virtex 6 FPGA, with operational frequencies up to 724 MHz. The generated code can be synthesized in FPGA or in ASIC projects.

References

[1] J. Computing, “ROCCC 2.0 (riverside optimizing compiler for configurable computing), C to HDL,” (On-line) http://www.jacquardcomputing.com/roccc/, 2014.

[2] F. Daitx, V. Rosa, E. Costa, P. Flores, and S. Bampi, “VHDL generation of optimized FIR filters,” in Signals, Circuits and Systems, 2008. SCS 2008. 2nd International Conference on, 2008, pp. 1–5.

[3] J.-M. Daveau, G. F. Marchioro, C. A. Valderrama, and A. A. Jerraya, “VHDL generation from SDL specifications,” 1996.

[4] P. Denyer and D. Myers, “Carry-save arrays for VLSI signal processing,” in Proc. VLSI 81, J. Gray, Ed., 1981, pp. 151–160.

[5] I. E. T. Force, “Introducing JSON,” September 2013. [Online]. Available: http://www.json.org/

[6] M. Gokhale, J. Stone, J. Arnold, and M. Kalinowski, “Stream-oriented FPGA computing in the Streams-C high level language,” in Field-Programmable Custom Computing Machines, 2000 IEEE Symposium on, 2000, pp. 49–56.

[7] K.-Y. Khoo, Z. Yu, and A. N. Willson, Jr., “Efficient implementation of FIR filters using bit-level optimized carry-save additions,” 2000.

[8] S. J. Piestrak, “Design of residue generators and multioperand modular adders using carry-save adders,” IEEE Transactions on Computers, vol. 43, no. 1, pp. 68–77, January 1994.

[9] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed. Prentice Hall, 2004.

[10] B. S. Raminder Preet Pal Singh, Parveen Kumar, “Performance analysis of 32-bit array multiplier with a carry save adder and with a carry-look-ahead adder,” International Journal of Recent Trends in Engineering, vol. 2, no. 6, pp. 83–86, November 2009.

[11] M. R. Reshadinezhad and F. K. Saman, “A novel low complexity combinational RNS multiplier using parallel prefix adder,” IJCSI International Journal of Computer Science Issues, vol. 10, no. 2, pp. 430–440, March 2013.

[12] A. Rettberg and W. Thronicke, “Embedded system design based on webservices,” in Proceedings of the Design, Automation and Test in Europe, 2002.

[13] A. S. Shubham Kaushik, “Design of RNS converters for moduli sets with dynamic ranges up to 6n-bit,” IOSR Journal of VLSI and Signal Processing, vol. 2, no. 6, pp. 14–19, July 2013.

[14] E. E. Swartzlander, Computer Arithmetic. Los Alamitos, CA: IEEE Computer Society Press, 1990, vol. 1.

[15] N. Szabo and R. Tanaka, Residue Arithmetic and its Application to Computer Technology. New York: McGraw-Hill, 1967.

[16] H. Thapliyal and M. Zwolinski, “Reversible logic to cryptographic hardware: A new paradigm,” in Circuits and Systems, MWSCAS’06, vol. 1, Aug 2006, pp. 342–346.

[17] Y. Yankova, G. Kuzmanov, K. Bertels, G. Gaydadjiev, Y. Lu, and S. Vassiliadis, “DWARV: Delftworkbench automated reconfigurable VHDL generator,” in Field Programmable Logic and Applications, 2007. FPL 2007. International Conference on, 2007, pp. 697–701.

[18] Y. Yankova, K. Bertels, S. Vassiliadis, R. Meeuws, and A. Virginia, “Automated HDL generation: Comparative evaluation.” in ISCAS. IEEE, 2007, pp. 2750–2753. [Online]. Available: http://dblp.uni-trier.de/db/conf/iscas/iscas2007.html#YankovaBVMV07
