Timing-driven logic restructuring for nano-hybrid circuits


Zhufei Chu (a), Yinshui Xia (a), William N. N. Hung (b), Xiaoyu Song (c) & Lunyao Wang (a)

(a) School of Information Science and Engineering, Ningbo University, Ningbo 315211, China
(b) Synopsys, Inc., Mountain View, CA, USA
(c) Department of Electrical and Computer Engineering, Portland State University, Portland, OR, USA

Published online: 21 Sep 2012.

To cite this article: Zhufei Chu, Yinshui Xia, William N. N. Hung, Xiaoyu Song & Lunyao Wang (2013) Timing-driven logic restructuring for nano-hybrid circuits, International Journal of Electronics, 100:5, 669-685, DOI: 10.1080/00207217.2012.720945

To link to this article: http://dx.doi.org/10.1080/00207217.2012.720945



Zhufei Chu (a), Yinshui Xia (a)*, William N. N. Hung (b), Xiaoyu Song (c) and Lunyao Wang (a)

(a) School of Information Science and Engineering, Ningbo University, Ningbo 315211, China; (b) Synopsys, Inc., Mountain View, CA, USA; (c) Department of Electrical and Computer Engineering, Portland State University, Portland, OR, USA

(Received 21 October 2011; final version received 15 July 2012)

As the feature size of integrated circuits (ICs) scales down, nano-hybrid circuits look promising for extending Moore's Law. However, mapping a circuit to a nano-fabric structure is difficult because of connectivity constraints. A mainstream methodology transforms the circuit into a nano-fabric-preferred structure by inserting buffers at high fan-out gates, but this may degrade timing. Logic replication is a traditional way to split high fan-out gates in logic synthesis, but it may not be suitable for high fan-out gates that also have high fan-ins. In this article, a timing-driven logic restructuring framework at the gate level is proposed. The proposed framework identifies the high fan-out gates in a given gate netlist according to a fan-out threshold, and then restructures them through logic replication and buffer insertion. To improve circuit timing from a global perspective, latent critical edges are identified so that restructuring does not create new critical paths. Experimental results on ISCAS benchmarks indicate that, on average, an 8.51% timing improvement and a 6.13% CPU time reduction can be obtained at the cost of a 4.16% area increase.

Keywords: nano-hybrid circuit; logic restructuring; timing; optimisation

1. Introduction

As the feature size of integrated circuits (ICs) scales down, nano-hybrid circuits look promising for extending Moore's Law. Many nano-hybrid architectures consist of nanowires (Yan et al. 2011), programmable molecular 'memristive' devices (Strukov, Snider, Stewart, and Williams 2008; Borghetti et al. 2010) and CMOS logic. The inputs and outputs of a nanoscale circuit ultimately have to interconnect with sub-micron CMOS. The CMOL (Cmos/nanowire/MOLecular hybrid) structure proposed by Likharev et al. is one such architecture; it accomplishes this by connecting two external cones of different heights and sharp tips to two levels of the crossbar (Likharev and Strukov 2005; Strukov and Likharev 2005; Strukov and Mishchenko 2010). The field-programmable nanowire interconnect (FPNI) architecture proposed by Snider et al. generalises CMOL with larger pads and a sparser crossbar for easier fabrication (Snider and Williams 2007). However, the nanowires of both CMOL and FPNI are periodically broken, which imposes a physical routing constraint. By programming the diode-like molecular switches, a gate can be connected directly to only a limited number of neighbouring gates, which form the so-called connectivity domain.

*Corresponding author. Email: [email protected]

Several published methods address the CMOL cell mapping problem that results from the connectivity domain. Hung, Gao, Song, and Hammerstrom (2008) encoded the CMOL cell mapping problem as a satisfiability problem, that is, the Boolean constraints are satisfiable if and only if there exists a solution that maps the circuit to CMOL cells. However, this method has two shortcomings: first, it does not seem to scale with the circuit size; second, when the number of gate fan-outs is larger than the total number of cells in the connectivity domain, the circuit cannot be mapped. Chen, Song, and Hu (2009) showed that some circuits can be transformed into an equivalent circuit by buffer insertion and then become placeable under a reasonable connectivity domain size. However, that method may cause severe timing degradation. Our previous works (Chu, Xia, Hung, Wang, and Song 2010; Xia, Chu, Hung, Wang, and Song 2011) revealed that high fan-out gates give rise to connectivity constraints, and further experimental results showed that the timing of the mapping solution and the CPU time of the algorithm are closely related to those high fan-out gates. We define those high fan-out gates as critical gates. These findings motivate us to address the cell mapping problem of nano-hybrid circuits through circuit restructuring, in order to achieve routability and avoid timing degradation. There are two typical methods for circuit restructuring: logic replication and buffer insertion.

Logic replication is a common restructuring technique for improving circuit performance by replicating one or more logic cells while maintaining logic function equivalence. It is widely used for thermal reduction in very large-scale integrated (VLSI) circuits (Schafer and Kim 2009) and in logic synthesis (Hrkic, Lillis, and Beraudo 2006; Kim and Lillis 2008). Generally, it can be coupled with a placement or mapping procedure, restructuring the logic circuit to optimise some parameters. Although logic replication has been investigated for increasing the reliability of nanoscale digital logic circuits (Chen 2007), there is little research on using it to improve nano-hybrid circuit timing.

Buffer insertion is used to optimise delay or noise in VLSI circuits (Zhou, Wong, Liu, and Aziz 2000; Sze, Alpert, Hu, and Shi 2007). Since the inverter is the basic CMOS logic gate in nano-hybrid fabric, a buffer can be easily implemented by connecting a pair of inverters. Buffer insertion can be used in two ways:

(1) Splitting buffer: before mapping the netlist to nano-hybrid fabric, buffer insertion is used for high fan-out gate splitting at the gate level.

(2) Routing buffer: after the mapping algorithm terminates, buffer insertion is used to extend the connection for those gates in the netlist that violate the connectivity constraints, so that the effective connectivity domain is enlarged.

Circuit restructuring converts a circuit into a logically equivalent one that can be mapped to a fabric-preferred structure under connectivity domain constraints. It is important because a good logic restructuring can reduce mapping complexity and the circuit timing penalty with less CPU time. In this work, a timing-driven logic restructuring framework is proposed for nano-hybrid circuit mapping. We sort all the gates in a circuit by their fan-out number and then determine the critical gates according to the limited area constraint. For critical gates, a quadratic equation in the gate's fan-in/fan-out numbers is formulated to estimate the mapping complexity before and after restructuring. Then, critical paths and timing slack are found by traditional static timing analysis. To avoid a timing penalty after restructuring, the latent critical edge is introduced, defined as an edge that is not on a critical path but may end up on one after restructuring. Finally, each critical gate is restructured through logic replication and buffer insertion under the timing constraints. The resulting circuit netlist is then passed to the mapping algorithm, and the results are compared with those of the original circuit mapping (Xia et al. 2011).

Our main contributions in this article are as follows:

• Developing a method for determining critical gates from a circuit netlist, based on the fan-out number of gates and the size of the nano-hybrid fabric area.

• Proposing a combination of logic replication and buffer insertion to split critical gates synergistically, based on a quadratic mapping complexity estimation equation and critical path information, without timing penalty.

The rest of this article is organised as follows. Preliminaries are described in Section 2. Then the main logic restructuring framework is demonstrated in Section 3. Experimental results are shown in Section 4 and conclusions are presented in the last section.

2. Preliminaries

Before we describe the details of the proposed framework, preliminaries about nano-hybrid circuit cell mapping and the mathematical model are introduced, as they are the basis of the entire flow.

2.1. Nano-hybrid circuit cell mapping

Nano-hybrid circuits such as CMOL and FPNI can be abstracted as a two-dimensional array of size $x \times y$; Figure 1 shows an example of a $7 \times 7$ cell array. A circle with radius $r$ is centred at the core of Cell A. The cells completely surrounded by the circle form the connectivity domain of Cell A. The radius of the circle is also called the radius of the connectivity domain. In the following, we use $\Omega_r^A$ to denote the connectivity domain of Cell A with radius $r$.

Figure 1. The connectivity domain.

A circuit can be modelled as a directed acyclic graph $G = (I, V, E, O)$, which is composed of the primary input (PI) set $I$, gate set $V$, edge set $E$ and primary output (PO) set $O$. Take the circuit shown in Figure 2 for example: $I = \{i_1, i_2, \ldots, i_7\}$, $V = \{g_1, g_2, \ldots, g_{12}\}$, $E = \{e(i_1, g_4), e(i_2, g_1), \ldots, e(g_{12}, o_3)\}$ and $O = \{o_1, o_2, o_3\}$. Each gate $g_i \in V$ has its fan-in set $F_{in}(g_i)$ and fan-out set $F_{out}(g_i)$. For instance, $F_{in}(g_1) = \{i_2, i_3\}$ and $F_{out}(g_1) = \{g_4, g_6\}$. Additionally, the number of elements in the fan-in (fan-out) set of $g_i \in V$ is called the number of gate fan-ins (fan-outs), denoted $D_{in}(g_i)$ and $D_{out}(g_i)$, respectively. For instance, $D_{in}(g_1) = 2$ and $D_{out}(g_1) = 2$.

Generally, given a circuit $G$ and a nano-hybrid cell array $\Omega$, nano-hybrid circuit cell mapping is to find a legal mapping $P: G \to \Omega$ from the topological to the physical domain within the connectivity domain constraints. For an edge $e(g, g') \in E$, where $g$ and $g'$ are adjacent in the netlist, the mapping location of each gate should be within the other's connectivity domain, such that $g \in \Omega_r^{g'}$ and $g' \in \Omega_r^{g}$. Figure 3 shows a simple circuit composed of two NOR gates with three PIs and one PO, which is mapped to a $4 \times 3$ nano-hybrid cell array. Note that each gate, PI or PO must be mapped to exactly one nano-hybrid cell.

Figure 2. Logic structure of NOR gate-based circuit s27.

Figure 3. An example of nano-hybrid circuit cell mapping: (a) a simple sub-circuit; (b) the mapping solution in a nano-hybrid cell array.


Meanwhile, the mapping solution should satisfy the connectivity constraints, which are indicated by the dotted-line groupings in Figure 3(b).
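The legality conditions of this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: cells are taken to be unit squares, a cell counts as "completely surrounded" when its farthest corner lies within Euclidean distance r of the centre cell's core, and the names `in_domain` and `is_legal_mapping`, the edge list and the placement are all hypothetical (the exact netlist and placement of Figure 3 are not reproduced here).

```python
import math

def in_domain(cell_a, cell_b, r):
    """True if cell_b lies completely inside the circle of radius r
    centred at the core of cell_a (unit-square cells assumed)."""
    if cell_a == cell_b:
        return False
    # distance from the centre of cell_a to the farthest corner of cell_b
    dx = abs(cell_a[0] - cell_b[0]) + 0.5
    dy = abs(cell_a[1] - cell_b[1]) + 0.5
    return math.hypot(dx, dy) <= r

def is_legal_mapping(edges, place, r):
    """Check one-node-per-cell and the connectivity constraint on every edge."""
    cells = list(place.values())
    if len(set(cells)) != len(cells):  # two nodes share a cell
        return False
    return all(in_domain(place[u], place[v], r) for u, v in edges)

# Hypothetical two-gate circuit with three PIs and one PO on a 4 x 3 array.
edges = [("i1", "g1"), ("i2", "g1"), ("g1", "g2"), ("i3", "g2"), ("g2", "o1")]
place = {"g1": (1, 1), "g2": (2, 1), "i1": (0, 1),
         "i2": (1, 0), "i3": (2, 0), "o1": (3, 1)}
legal = is_legal_mapping(edges, place, r=2.0)

# Moving o1 to a diagonal neighbour of g2 breaks the constraint for r = 2.
bad_place = dict(place, o1=(3, 2))
illegal = is_legal_mapping(edges, bad_place, r=2.0)
```

Because the distance test is symmetric, checking one direction per edge covers both membership conditions of the mapping definition.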

2.2. Timing estimation

The following timing model is based on topological estimation, without considering false paths or interconnect delay. However, the model is adequate for nano-hybrid circuit timing estimation. Apart from the model adopted, our framework can work with any timing estimation engine used in academic or commercial tools.

The NOR gate-based circuit s27 shown in Figure 2 is used to illustrate the definitions and theorems. Generally, the timing estimation method is similar to traditional static timing analysis in VLSI circuits.

Definition 1: For a gate $g_i \in V$, the maximum number of gates along any path from a PI to the output of $g_i$ is defined as the gate's logic level, denoted $L(g_i)$.

From Definition 1, $L(\mathrm{PI}) = 0$. Take $g_4$ in Figure 2 for instance: three paths start at a PI and end at $g_4$, namely $(i_1 \to g_4)$, $(i_2 \to g_1 \to g_4)$ and $(i_3 \to g_1 \to g_4)$. Thus, the maximum number of gates along a path from a PI to the output of gate $g_4$ is 2 and hence $L(g_4) = 2$.

The gate's logic level can be calculated with a topological sorting algorithm. Let $D_{in}(g) = m$ and $F_{in}(g) = \{g_0, g_1, \ldots, g_{m-1}\}$; then

$$L(g) = \max_{0 \le i < m} \bigl( L(g_i) + 1 \bigr) \tag{1}$$

As shown in Figure 2, the gate logic levels are marked by vertical dashed lines, e.g. $L(g_4) = L(g_5) = 2$.
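Equation (1) can be evaluated in one pass over a topological order. The following minimal Python sketch assumes the netlist is given as a dictionary from each gate to the list of its distinct fan-ins; the function name `logic_levels` and the data layout are illustrative choices, not part of the original framework. The example uses only the fragment of s27 quoted in the text (the fan-ins of $g_1$ and the three paths ending at $g_4$).

```python
from collections import deque

def logic_levels(fanins, primary_inputs):
    """Compute L(g) for every node with Kahn's topological ordering."""
    level = {pi: 0 for pi in primary_inputs}  # L(PI) = 0
    fanouts = {}
    pending = {}  # number of fan-ins of each gate not yet processed
    for gate, ins in fanins.items():
        pending[gate] = len(ins)
        for src in ins:
            fanouts.setdefault(src, []).append(gate)

    queue = deque(primary_inputs)
    while queue:
        node = queue.popleft()  # level[node] is final at this point
        for gate in fanouts.get(node, []):
            # Equation (1): L(g) = max over fan-ins of (L(g_i) + 1)
            level[gate] = max(level.get(gate, 0), level[node] + 1)
            pending[gate] -= 1
            if pending[gate] == 0:
                queue.append(gate)
    return level

# Fragment of s27: F_in(g1) = {i2, i3}, F_in(g4) = {i1, g1}.
fanins = {"g1": ["i2", "i3"], "g4": ["i1", "g1"]}
levels = logic_levels(fanins, ["i1", "i2", "i3"])
```

On this fragment the sketch reproduces $L(g_4) = 2$ from the worked example.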

Definition 2: Circuit delay $D_C$ is the maximum gate logic level along a path from PIs to POs.

Regardless of interconnect delay, the maximum number of logic gates along a path from PIs to POs is the timing delay of the circuit, also called the logic depth. Hence, the circuit delay can be obtained by computing $L(O)$:

$$D_C = \max_{o_k \in O} \bigl( L(o_k) \bigr) \tag{2}$$

Definition 3: The slack of an edge, $S(e(g, g'))$, is the number of gates that may be added to edge $e(g, g')$ before the paths on which the edge is located become critical paths.

To compute each edge's slack, we first compute the required logic level $RL(g)$ of each gate in the circuit. $RL(g)$ is similar to the conventional required arrival time in static timing analysis: it is the required gate logic level that will not cause timing degradation. Using the circuit delay obtained above, we set the required logic level of all POs to $D_C$ and propagate backwards from POs to PIs. Let $D_{out}(g) = m$ and $F_{out}(g) = \{g_0, g_1, \ldots, g_{m-1}\}$; then

$$RL(g) = \min_{0 \le i < m} \bigl( RL(g_i) - 1 \bigr) \tag{3}$$

The required logic level is labelled at the upper right of each logic gate symbol in Figure 2, e.g. $RL(g_4) = 7$, $RL(g_5) = 2$. Then, the edge's slack $S(e(g, g'))$ is defined as

$$S(e(g, g')) = RL(g') - L(g) \tag{4}$$
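Equations (2) to (4) amount to one backward pass over the netlist. The sketch below is a minimal Python illustration under two assumptions that are not spelled out in the text: primary outputs are represented by the gates that drive them, and a node without fan-outs that is not an output simply receives $D_C$. The function names and the small example netlist are hypothetical.

```python
def required_levels(fanins, level, outputs):
    """Return (D_C, RL) following Equations (2) and (3)."""
    dc = max(level[o] for o in outputs)  # Equation (2)
    fanouts = {}
    for gate, ins in fanins.items():
        for src in ins:
            fanouts.setdefault(src, []).append(gate)

    required = {}
    # Fan-outs always sit at a higher level, so visiting nodes from the
    # deepest level downwards guarantees RL of every fan-out is known.
    for node in sorted(level, key=level.get, reverse=True):
        candidates = [required[s] - 1 for s in fanouts.get(node, [])]
        if node in outputs:
            candidates.append(dc)  # RL(PO) = D_C
        required[node] = min(candidates) if candidates else dc
    return dc, required

def edge_slack(fanins, required, level):
    """Equation (4): S(e(g, g')) = RL(g') - L(g) for every edge."""
    return {(src, gate): required[gate] - level[src]
            for gate, ins in fanins.items() for src in ins}

# Hypothetical netlist: chain a -> g1 -> g2 -> g3 plus a shorter branch b -> g4 -> g3.
fanins = {"g1": ["a"], "g2": ["g1"], "g4": ["b"], "g3": ["g2", "g4"]}
level = {"a": 0, "b": 0, "g1": 1, "g4": 1, "g2": 2, "g3": 3}
dc, required = required_levels(fanins, level, {"g3"})
slack = edge_slack(fanins, required, level)
```

In this example the edges of the long chain get slack 1, while the two edges of the shorter branch get slack 2.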



Definition 4: Edge $e(g, g')$ is a critical edge if it is on a critical path.

Definition 5: Edge $e(g, g')$ is a latent critical edge if it is not on a critical path but will become a critical edge if buffers are inserted on this edge during restructuring.

Theorem 2.1: For a specific circuit $G$, edge $e(g, g') \in E$ is a critical edge if $S(e(g, g')) = 1$.

Proof: For edge $e(g, g')$, $L(g') = \max_{g_k \in F_{in}(g')} \bigl( L(g_k) + 1 \bigr)$ according to Equation (1), hence

$$L(g') - L(g) \ge 1 \tag{5}$$

If $S(e(g, g')) = RL(g') - L(g) = 1$, then $L(g) = RL(g') - 1$. Substituting this into Equation (5), we obtain $L(g') - RL(g') + 1 \ge 1$, hence

$$L(g') \ge RL(g') \tag{6}$$

Additionally, from the definitions, the edge's slack is non-negative and $L(g') > L(g)$. We obtain $S(e(g, g')) = RL(g') - L(g) > RL(g') - L(g') \ge 0$, hence

$$L(g') \le RL(g') \tag{7}$$

Combining Equations (6) and (7), we have $L(g') = RL(g')$. Hence, edge $e(g, g') \in E$ is on a critical path. The critical edges are shown as bold dashed lines in Figure 2. □

Theorem 2.2: For a specific circuit $G$, edge $e(g, g') \in E$ is a latent critical edge if $S(e(g, g')) = 2$.

Proof: In nano-hybrid circuit cell mapping, let $g \notin \Omega_r^{g'}$ and $g' \notin \Omega_r^{g}$ for edge $e(g, g')$. Hence $g$ and $g'$ cannot be connected directly due to the connectivity domain constraints. However, an inverter pair $inv_1$ and $inv_2$ can be inserted as an intermediate buffer to improve routability. As shown in Figure 4, the logic function is equivalent to the original one because the two inverters act as relay gates. If $inv_1 \in \Omega_r^{g}$, $inv_2 \in \Omega_r^{inv_1}$ and $g' \in \Omega_r^{inv_2}$, then the mapping is successful. Before the buffer insertion, $L(g') - L(g) \ge 1$; after the buffer insertion, $L(g') - L(g) \ge 3$. If $S(e(g, g')) = RL(g') - L(g) = 2$, then $L(g') - RL(g') + 2 \ge 3$. Hence $L(g') \ge RL(g') + 1$. This contradicts $L(g') \le RL(g')$, so the insertion will cause timing degradation. We call such an edge a latent critical edge: it is not a critical edge but becomes critical after restructuring. Latent critical edges are shown as bold solid lines in Figure 2. □
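Taken together, Theorems 2.1 and 2.2 turn edge classification into a lookup on the slack value. The following Python sketch shows one way to label edges and compute the share of each class; the function names and the slack values in the example are made up for illustration and do not come from any benchmark.

```python
def classify_edges(slack):
    """Label each edge by its slack: 1 -> critical, 2 -> latent critical."""
    labels = {}
    for edge, s in slack.items():
        if s == 1:
            labels[edge] = "critical"          # Theorem 2.1
        elif s == 2:
            labels[edge] = "latent critical"   # Theorem 2.2
        else:
            labels[edge] = "non-critical"
    return labels

def share(labels, kind):
    """Percentage of edges carrying the given label."""
    hits = sum(1 for label in labels.values() if label == kind)
    return 100.0 * hits / len(labels)

# Hypothetical slack values for four edges.
slack = {("g1", "g2"): 1, ("g2", "g3"): 1, ("g4", "g3"): 2, ("g5", "g3"): 3}
labels = classify_edges(slack)
critical_pct = share(labels, "critical")
latent_pct = share(labels, "latent critical")
```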

Figure 4. Buffer insertion and the resulting mapping solution.

We counted the critical and latent critical edges in several ISCAS benchmarks. The percentage of (latent) critical edges is shown in Table 1. It can be seen that a significant percentage of edges is (latent) critical. The average percentage of critical edges is 43.4% while that of latent critical edges is 15.4%. In other words, on average 58.8% of edges should satisfy the connectivity constraints to avoid timing degradation. Consequently, restructuring at the gate level for the nano-hybrid circuit can not only reduce the mapping complexity but also avoid timing degradation.

3. Proposed timing-driven logic restructuring

3.1. Inspiration

Table 1. Percentage of (latent) critical edges.

Circuit        Latent critical (%)   Critical (%)   Total (%)
s444           34.9                  15.7           50.6
s510           44.9                  6.90           51.8
s526           35.7                  17.4           53.1
s641           35.1                  8.10           43.2
s713           32.9                  7.30           40.2
s820           43.2                  22.9           66.1
s832           38.7                  26.5           65.2
s838           45.1                  9.30           54.4
s1196          43.4                  11.7           55.1
s1238          42.1                  9.10           51.2
C432           46.2                  8.60           54.8
C499           61.8                  25.1           86.9
C880           44.4                  13.4           57.8
C1355          55.1                  23.2           78.3
C1908          48.2                  26.3           74.5
Average (%)    43.4                  15.4           58.8

Figure 5. Illustration of critical gate restructuring: (a) the original sub-circuit; (b) by buffer insertion; (c) by logic replication.

The main difficulty in nano-hybrid circuit cell mapping arises from high fan-out gates and connectivity domain constraints. As pointed out in Chen et al. (2009), high fan-out gates in CMOL cell mapping can be resolved by inserting fan-out splitting buffers, which maintains the logic equivalence of the circuit. Figure 5(a) shows a circuit topology. It has six gates and six edges. Obviously, $g_2$ is the high fan-out gate, with $F_{in}(g_2) = \{g_1\}$, $F_{out}(g_2) = \{g_3, g_4, g_5, g_6\}$ and $D_{out}(g_2) = 4$. Let path $g_1 \to g_2 \to g_6$ be a critical path. The merits and demerits of buffer insertion and logic replication for splitting high fan-out gates can be compared as follows.

(1) Buffer insertion: with buffer insertion, we take away $g_5$, $g_6$ and insert two NOT gates $i_1$, $i_2$ (inverters) to form a buffer as in Chen et al. (2009), shown in Figure 5(b). The logic function is still equivalent while $D_{out}(g_2)$ is reduced to 3. However, two extra gates and two extra edges are added to the original circuit. Furthermore, the logic depth of the critical path is increased from 3 to 5.

(2) Logic replication: replicating $g_2$, we obtain $g_2'$ and split the fan-outs of $g_2$ as shown in Figure 5(c). Consequently, one extra gate and one extra edge are added. Then, $D_{out}(g_2)$ is reduced to 2 while the logic depth of the critical path is the same as in Figure 5(a).

Comparing the two in this specific case, logic replication has advantages over buffer insertion in terms of area overhead and timing degradation.

However, if the gate has high fan-outs together with high fan-ins, logic replication may consume significant extra interconnect resources to maintain logic function equivalence. Take Figure 6(a) as an example. Considering $g_2$, we have $F_{in}(g_2) = \{g_1, g_7, g_8, g_9, g_{10}\}$ and $F_{out}(g_2) = \{g_3, g_4, g_5, g_6\}$. Five extra edges are added as fan-ins of $g_2'$ if logic replication is applied for fan-out splitting, as shown in Figure 6(c). This increases the mapping complexity, since each edge connection must satisfy the connectivity domain constraints. However, as shown in Figure 6(b), if path $g_1 \to g_2 \to g_6$ is not a latent critical path, the buffer insertion method consumes only two additional edges and causes no timing issue.

Therefore, both buffer insertion and logic replication have their merits and demerits, as shown in Table 2. Inspired by this, we propose to combine the two to optimise circuit restructuring. In the next sub-sections, a mathematical model and a detailed case analysis are given to show how the two methods can be properly utilised to improve the circuit performance.

Figure 6. Illustration of critical gate restructuring with high fan-ins: (a) the original sub-circuit; (b) by buffer insertion; (c) by logic replication.

Table 2. Comparison results of logic restructuring.

Method              Merits                                        Demerits
Buffer insertion    Does not take gate fan-in number into         More area overhead; may cause timing
                    account                                       degradation
Logic replication   Less area overhead; no timing degradation     Must take gate fan-in number into account
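The overhead figures quoted for Figures 5 and 6 follow a simple pattern: a splitting buffer always costs two inverters and two edges and lengthens the buffered branch by two levels, whereas replication costs one gate plus one edge per fan-in of the replicated gate. The Python sketch below merely tabulates this pattern; the function name and dictionary keys are illustrative and not taken from the original work.

```python
def restructuring_overhead(method, fan_in):
    """Extra gates, extra edges and added logic depth for one fan-out split."""
    if method == "buffer":
        # one inverter pair on the split-off branch
        return {"gates": 2, "edges": 2, "added_depth": 2}
    if method == "replication":
        # the replica needs every fan-in of the original gate
        return {"gates": 1, "edges": fan_in, "added_depth": 0}
    raise ValueError("method must be 'buffer' or 'replication'")

# Figure 5: g2 has a single fan-in, so replication is the cheaper option.
fig5_replication = restructuring_overhead("replication", fan_in=1)
fig5_buffer = restructuring_overhead("buffer", fan_in=1)

# Figure 6: g2 has five fan-ins, so replication adds five edges.
fig6_replication = restructuring_overhead("replication", fan_in=5)
```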

3.2. Determination of critical gates

Determining critical gates is the key to logic restructuring optimisation in terms of area and timing. We first propose the judgment condition for critical gates by discussing logic replication and buffer insertion in logic restructuring, and then present when it is beneficial to use logic replication or buffer insertion for a specific critical gate.

Logic replication: To maintain the logic equivalence of a circuit, the replicated gate $g'$ should have the same fan-ins as gate $g$. Let gate $g \in V$, $F_{in}(g) = \{x_0, x_1, \ldots, x_{m-1}\}$ and $F_{out}(g) = \{y_0, y_1, \ldots, y_{n-1}\}$. The replicated gate $g'$ has to connect to the same $m$ fan-in edges and to part of the fan-out edges according to the fan-out splitting. The question is, given $m$ and $n$, when replication is worthwhile in terms of mapping complexity. To answer it, a cost function is required. Given that the mapping complexity is directly proportional to the number of edges, the exponential judgment function is used so that a larger fan-in/fan-out number incurs a larger penalty. Hence,

$$f(g) = \sum_{\text{all fan-ins}} D_{in}(g)^s + \sum_{\text{all fan-outs}} D_{out}(g)^s \tag{8}$$

where $s$ is an integer greater than 0. Initially, the fan-in cost is $m^s$ and the fan-out cost is $n^s$. Hence, before logic replication, $f(g)_1 = m^s + n^s$. After logic replication, the number of fan-ins is doubled, so the fan-in cost becomes $2m^s$. Assume the $n$ fan-outs are split into two sets that contain $x$ and $(n - x)$ fan-outs, respectively. Then the fan-out cost is $x^s + (n - x)^s$. Hence $f(g)_2 = 2m^s + x^s + (n - x)^s$. When $x = n/2$, $f(g)_2$ reaches the minimum value $2m^s + n^s / 2^{s-1}$. Intuitively, we seek to reduce the complexity cost, that is,

$$\Delta Cost = f(g)_2 - f(g)_1 = m^s - \frac{2^{s-1} - 1}{2^{s-1}}\, n^s < 0 \tag{9}$$

Thus the first condition is $m < h(s)\, n$, where $h(s) = \sqrt[s]{\dfrac{2^{s-1} - 1}{2^{s-1}}}$. In addition, the lower the $\Delta Cost$, the greater the complexity reduction. Given $n$ and $s$ with $m < n$, a smaller $m$ yields a lower $\Delta Cost$. From the definition of $h(s)$, we obtain $h(s)|_{s=1} = 0$, $h(s)|_{s=2} \approx 0.7$ and $h(s)|_{s \ge 3} > 0.9$. As $s$ increases, $h(s)$ gradually approaches 1. Since $h(s)|_{s=1} = 0$ leads to $m < 0$, this choice is impractical. Given $n$, a smaller $h(s)$ leads to a smaller $m$, and from the above discussion a smaller $m$ reduces the complexity. Therefore, we choose $s = 2$. The exponential estimation function is then quadratic and hence

$$m < \sqrt{2}\, n / 2 \tag{10}$$
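The values of $h(s)$ and the sign of $\Delta Cost$ are easy to check numerically. The Python sketch below evaluates both for an even split $x = n/2$; the function names `h` and `delta_cost` are illustrative, and the fan-in/fan-out pairs are small examples rather than data from the benchmarks.

```python
def h(s):
    """h(s) = s-th root of (2^(s-1) - 1) / 2^(s-1)."""
    return ((2 ** (s - 1) - 1) / 2 ** (s - 1)) ** (1.0 / s)

def delta_cost(m, n, s=2):
    """f(g)_2 - f(g)_1 for an even fan-out split (x = n/2)."""
    before = m ** s + n ** s                    # f(g)_1
    after = 2 * m ** s + n ** s / 2 ** (s - 1)  # minimum of f(g)_2
    return after - before

h_values = {s: h(s) for s in (1, 2, 3)}

# With n = 4 the quadratic condition m < sqrt(2) * n / 2 is about m < 2.83.
gain_small_fan_in = delta_cost(m=1, n=4)   # negative: replication reduces cost
loss_large_fan_in = delta_cost(m=5, n=4)   # positive: replication adds cost
```

A negative result means replication lowers the estimated mapping complexity, which for $s = 2$ coincides with Equation (10).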

However, if $n = 2$, then $m = 1 < \sqrt{2}$ satisfies Equation (10). This would cause the majority of gates in the circuit to be replicated, so that the circuit size increases rapidly and the mapping complexity may also increase. In addition, what counts as a high fan-out number varies from circuit to circuit. In a real design flow, the shape of the targeted nano-hybrid fabric is determined before mapping. Thus, logic restructuring that focuses on timing improvement without considering the limited area constraint is impractical. Under this area constraint, only a fraction of the gates will be restructured to keep the circuit size reasonable. Therefore, a fan-out threshold $D_T$ is introduced as the judgement condition for critical gates.

For each circuit, the threshold $D_T$ is determined according to the fan-out numbers of the gates and the size of the nano-hybrid fabric area. First, we sort the fan-out numbers of the gates into an array $A[N]$ in descending order, where $N$ is the number of gates in the circuit. Then, the area of the nano-hybrid fabric (the total number of cells, $x \times y$) minus $N$ is the area available to the restructured gates. Hence, at most $N_r = x \times y - N$ gates may be restructured by logic replication, and $D_T$ can be determined from the array $A[N]$, that is, $D_T = A[N_r - 1]$. A gate is determined to be critical if its fan-out number is larger than $D_T$. However, since routing buffers may be inserted after mapping, we cannot use all $N_r$ blank cells for logic restructuring. In practice, $D_T = A[(N_r - 1) \times p\%]$, where $p$ is an integer in the range (0, 100).
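The threshold rule can be sketched as follows. This Python fragment assumes that the index $(N_r - 1) \times p\%$ is rounded down to an integer and clamped to the array length; both choices are assumptions, since the text does not specify them. The function names and the six-gate example are hypothetical.

```python
def fanout_threshold(fanout_counts, fabric_cells, p):
    """D_T = A[(N_r - 1) * p%] with A sorted in descending order."""
    a = sorted(fanout_counts, reverse=True)
    n = len(a)
    n_r = fabric_cells - n  # blank cells left after placing all gates
    if n_r <= 0:
        raise ValueError("fabric has no spare cells for restructuring")
    index = (n_r - 1) * p // 100  # assumed: round down
    index = min(index, n - 1)     # assumed: clamp to the array
    return a[index]

def critical_gates(fanout, fabric_cells, p):
    """Gates whose fan-out number exceeds the threshold D_T."""
    d_t = fanout_threshold(fanout.values(), fabric_cells, p)
    return [g for g, d in fanout.items() if d > d_t]

# Hypothetical circuit: 6 gates on a 4 x 4 fabric (16 cells), p = 20.
fanout = {"g1": 5, "g2": 4, "g3": 3, "g4": 2, "g5": 1, "g6": 1}
d_t = fanout_threshold(fanout.values(), fabric_cells=16, p=20)
critical = critical_gates(fanout, fabric_cells=16, p=20)
```

Here $N_r = 10$, the index evaluates to 1, and only the gate with five fan-outs exceeds the resulting threshold.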

Buffer insertion: Alternatively, if $m \ge \sqrt{2}\, n / 2$, logic replication may make mapping even more complex. For example, let $n = 4$ and $m = 5$. If logic replication is applied, five extra edges must be added to maintain the same logic function, which also increases the mapping complexity. In contrast, buffer insertion can be applied on a non-latent critical path for high fan-out splitting with fewer extra edges and without potential timing degradation.

Hence, the combination of logic replication and buffer insertion can efficiently split high fan-out gates without timing degradation.

In summary, for gates that are candidates for restructuring, the primary condition is that their fan-out number must be larger than the fan-out threshold $D_T$. A gate is then replicated only if the following conditions are satisfied.

(1) The mapping complexity estimated by the quadratic equation is reduced (Equation (10));

(2) The gate has not been restructured in the previous restructuring process.

Since logic replication causes no timing degradation, the current edge can be either a (latent) critical edge or a non-(latent) critical edge. Alternatively, a buffer is inserted at the gate under stricter conditions.

(1) The mapping complexity is increased if the logic replication method is used (Equation (10) is violated);

(2) The current edge must be a non-latent critical edge;

(3) The gate has not been restructured in the previous restructuring process.
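The two condition lists can be read as a small decision procedure. The Python sketch below is one possible reading, not the authors' implementation: in particular, "non-latent critical edge" is interpreted here as an edge with slack greater than 2, that is, neither critical nor latent critical. The function name and argument names are hypothetical.

```python
import math

def choose_restructuring(m, n, d_t, slack, already_restructured):
    """Pick 'replicate', 'buffer' or 'none' for a gate with m fan-ins
    and n fan-outs, given the slack of the edge being processed."""
    if n <= d_t or already_restructured:
        return "none"  # not a critical gate, or handled earlier
    if m < math.sqrt(2) * n / 2:
        return "replicate"  # Equation (10) holds; any edge type is allowed
    if slack > 2:
        return "buffer"  # assumed reading: edge is neither critical nor latent critical
    return "none"

low_fan_in = choose_restructuring(m=1, n=4, d_t=3, slack=1, already_restructured=False)
high_fan_in = choose_restructuring(m=5, n=4, d_t=3, slack=3, already_restructured=False)
latent_edge = choose_restructuring(m=5, n=4, d_t=3, slack=2, already_restructured=False)
```

Replication is tried first because it carries no timing penalty; buffering is only the fallback when Equation (10) fails and the edge has enough slack.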

3.3. Fanout splitting

For all gates $g \in G$, a 2-tuple $Flag(g) = (R_f(g), B_f(g))$ is defined to record whether the fan-out of gate $g$ is split by logic replication or by buffer insertion. Initially, $R_f(g)$ and $B_f(g)$ are both set to FALSE. If gate $g$ is replicated, then $R_f(g) = \mathrm{TRUE}$. Similarly, $B_f(g) = \mathrm{TRUE}$ if a buffer is inserted to connect gate $g$. Note that for the same gate $g$, $R_f(g)$ and $B_f(g)$ cannot both be TRUE.

During the mapping, the solutions are evaluated according to the flag information $Flag(g)$. Take Figure 7 as an example: when evaluating an edge $e(g_2, g_3)$, if $Flag(g_2) = (\mathrm{TRUE}, \mathrm{FALSE})$, then we know that gate $g_2$ has been replicated. Let the replicated gate be $g_2'$, which has the same logic function as $g_2$; then $g_3$ can be mapped within either connectivity domain $\Omega_r^{g_2}$ or $\Omega_r^{g_2'}$. Consequently, the fan-out set $F_{out}(g_2)$ of the original circuit can be mapped within a joint connectivity domain $\Omega_r = \Omega_r^{g_2} \cup \Omega_r^{g_2'}$. The following mapping results are all legal.

$$\{g_3\} \subseteq \Omega_r^{g_2} \text{ and } \{g_4, g_5, g_6\} \subseteq \Omega_r^{g_2'}$$
$$\{g_3, g_4\} \subseteq \Omega_r^{g_2} \text{ and } \{g_5, g_6\} \subseteq \Omega_r^{g_2'}$$
$$\{g_3, g_4, g_5\} \subseteq \Omega_r^{g_2} \text{ and } \{g_6\} \subseteq \Omega_r^{g_2'}$$
$$\{g_3, g_4, g_5, g_6\} \subseteq \Omega_r^{g_2} \text{ and } \Omega_r^{g_2'} = \emptyset$$
$$\cdots$$

Hence, in the case shown, there are

$$N = C_4^0 C_4^4 + C_4^1 C_4^3 + C_4^2 C_4^2 + C_4^3 C_4^1 + C_4^4 C_4^0 = 70$$

legal results in total. In other words, the connectivity domain constraint on the critical gate is relaxed. The fan-out splitting method for buffer insertion is similar.

From the example, either connectivity domain $\Omega_r^{g_2}$ or $\Omega_r^{g_2'}$ can be the empty set $\emptyset$. Note that $\Omega_r^{g_2} = \emptyset$ just means that no fan-out gate of $g_2$ is mapped to $\Omega_r^{g_2}$. Hence a check-and-remove algorithm is developed to check the mapping solution. A gate or replicated gate is identified as redundant, and hence removed from the solution, if its connectivity domain is $\emptyset$ during checking.
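The count of 70 legal results stated earlier in this subsection can be verified by evaluating the sum of binomial products directly. The short Python check below uses `math.comb` from the standard library (available from Python 3.8); the variable names are illustrative.

```python
from math import comb

# Sum of C(4, k) * C(4, 4 - k) for k = 0..4, as in the expression for N.
terms = [comb(4, k) * comb(4, 4 - k) for k in range(5)]
total = sum(terms)

# By Vandermonde's identity the same sum equals C(8, 4).
closed_form = comb(8, 4)
```

The individual terms are 1, 16, 36, 16 and 1, which add up to 70.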

3.4. Algorithm overview

As an overview of the proposed method, we summarise all the major algorithmic elements in the flowchart shown in Figure 8. To preserve the logic function during restructuring, we proceed with the logic replication and buffer insertion step by step.

The algorithm iterates until all edges in the given gate netlist have been traversed. Its output is an equivalent circuit in which some of the gates have been replicated or have had buffers inserted. The resulting circuit is then mapped to a nano-hybrid circuit by the mapping algorithm. As shown in Figure 9, the traditional flow goes straight from a NOR gate-based circuit to the solution obtained by the mapping algorithm. The proposed flow, however, adds a timing-driven logic restructuring step before the mapping algorithm for timing improvement and mapping complexity reduction.
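To make the claim of logic equivalence concrete, the following Python sketch replicates one gate in a toy NOR netlist and checks exhaustively that every original gate keeps its value. It is a minimal illustration under stated assumptions, not the algorithm of Figure 8: the netlist encoding (gate name mapped to its list of fan-ins), the helper names, the primary inputs `a`, `b`, `c` and the choice of which fan-outs move to the replica are all hypothetical.

```python
from itertools import product


def replicate_gate(fanins, gate, moved_sinks):
    """Duplicate `gate` and let the gates in `moved_sinks` read the copy instead."""
    copy_name = gate + "'"
    new_fanins = {g: list(srcs) for g, srcs in fanins.items()}
    new_fanins[copy_name] = list(fanins[gate])  # same inputs, hence same function
    for sink in moved_sinks:
        new_fanins[sink] = [copy_name if s == gate else s for s in new_fanins[sink]]
    return new_fanins


def fanouts(fanins, gate):
    """Gates that read the output of `gate`."""
    return sorted(g for g, srcs in fanins.items() if gate in srcs)


def evaluate(fanins, inputs):
    """Evaluate every gate as a NOR of its fan-ins for one input assignment."""
    values = dict(inputs)

    def value(node):
        if node not in values:
            values[node] = not any(value(s) for s in fanins[node])
        return values[node]

    return {g: value(g) for g in fanins}


# Toy sub-circuit: g2 drives four fan-out gates (names follow Figure 7 loosely).
original = {
    "g2": ["a", "b"],
    "g3": ["g2", "c"],
    "g4": ["g2", "c"],
    "g5": ["g2", "c"],
    "g6": ["g2", "c"],
}
restructured = replicate_gate(original, "g2", ["g5", "g6"])

# Exhaustive equivalence check over all assignments of a, b, c.
equivalent = all(
    evaluate(restructured, dict(zip("abc", bits)))[g]
    == evaluate(original, dict(zip("abc", bits)))[g]
    for bits in product([False, True], repeat=3)
    for g in original
)
```

After the call, `g2` drives only `g3` and `g4`, the replica `g2'` drives `g5` and `g6`, and `equivalent` is `True`. The extra edges from `a` and `b` to `g2'` are the cost of the split.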

Figure 7. Illustration of fan-out splitting for real mapping: (a) the sub-circuit after logic restructuring; (b) the corresponding solution.


4. Experiments

To validate the proposed approach, it is applied to several ISCAS benchmarks under the Linux operating system on an Intel Pentium (R) Dual-Core CPU E5400 with 2 GB RAM. The experiments are based on our published mapping algorithm (Xia et al. 2011). The results from Xia et al. (2011) and from the proposed method are listed for comparison. We evaluate the proposed method in terms of area, timing delay and

Figure 8. The flowchart of the proposed framework.

CPU time. The radius of the connectivity domain is set to 9 and p is set to 20 to determine the fan-out threshold DT for the experimental comparison.

The results are shown in Table 3. The 'I/O' column shows the number of PIs/POs. '#G' is the number of gates restructured by the proposed method. 'Area' is the total number of cells occupied by gates, buffers (splitting buffers and routing buffers) and replicated gates. 'Delay' is the logic depth. 'CPU time' is the running time of the mapping algorithm. In each category, the sub-column 'Original' gives the results obtained by the mapping algorithm without logic restructuring, while

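Table 3. Performance comparison results.

| Circuit | I/O | #G | Area: Original | Area: Restructure | Area: Red. (%) | Delay: Original | Delay: Restructure | Delay: Red. (%) | CPU time (s): Original | CPU time (s): Restructure | CPU time: Red. (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| s444 | 24/27 | 10 | 187 | 197 | -5.35 | 11 | 11 | 0.00 | 2.45 | 1.78 | 27.35 |
| s510 | 25/13 | 17 | 308 | 321 | -4.22 | 18 | 18 | 0.00 | 18.31 | 6.41 | 64.99 |
| s526 | 24/27 | 21 | 273 | 294 | -7.69 | 11 | 11 | 0.00 | 7.87 | 5.16 | 34.43 |
| s641 | 54/42 | 26 | 313 | 329 | -5.11 | 20 | 18 | 10.00 | 32.23 | 38.27 | -18.74 |
| s713 | 54/42 | 18 | 345 | 339 | 1.74 | 23 | 19 | 17.39 | 31.88 | 37.58 | -17.88 |
| s820 | 23/24 | 23 | 497 | 488 | 1.81 | 15 | 12 | 20.00 | 105.76 | 68.66 | 35.08 |
| s832 | 23/24 | 24 | 526 | 490 | 6.84 | 18 | 12 | 33.33 | 88.00 | 87.65 | 0.40 |
| s838 | 66/33 | 49 | 614 | 677 | -10.26 | 26 | 24 | 7.69 | 126.74 | 230.33 | -81.73 |
| s1196 | 31/31 | 37 | 704 | 741 | -5.26 | 26 | 24 | 7.69 | 244.22 | 207.71 | 14.95 |
| s1238 | 31/31 | 39 | 787 | 800 | -1.65 | 30 | 28 | 6.67 | 224.91 | 249.45 | -10.91 |
| C432 | 36/7 | 21 | 267 | 288 | -7.87 | 29 | 29 | 0.00 | 2.85 | 2.94 | -3.16 |
| C499 | 41/32 | 49 | 782 | 831 | -6.27 | 27 | 26 | 3.70 | 220.03 | 236.34 | -7.41 |
| C880 | 60/26 | 43 | 630 | 669 | -6.19 | 27 | 26 | 3.70 | 169.00 | 137.29 | 18.76 |
| C1355 | 41/32 | 57 | 890 | 949 | -6.63 | 32 | 29 | 9.38 | 395.35 | 313.76 | 20.64 |
| C1908 | 33/25 | 78 | 987 | 1049 | -6.28 | 37 | 34 | 8.11 | 543.55 | 460.83 | 15.22 |
| Average | | | | | -4.16 | | | 8.51 | | | 6.13 |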

Figure 9. A brief nano-hybrid circuit cell mapping flow. [Blocks in the figure: NOR gate-based circuit; timing-driven logic restructuring; mapping algorithm; solution.]


'Restructure' gives the results of the proposed timing-driven logic restructuring method. 'Red.' is the percentage reduction achieved by the proposed method, calculated as

$$\mathrm{Red.} = \frac{\mathrm{Original} - \mathrm{Restructure}}{\mathrm{Original}} \times 100\%$$

A negative value of Red. therefore indicates an increase. We plot the area, delay and CPU time of the proposed method versus the original approach in Figure 10.
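As a worked check of the Red. formula, the Python sketch below recomputes a few entries of Table 3 and the three values in its 'Average' row. The function name `reduction_percent` and the variable names are illustrative assumptions; the numbers are copied from Table 3. The check suggests that each value in the 'Average' row is the arithmetic mean of the per-circuit Red. values rather than a reduction computed from summed totals.

```python
from statistics import mean


def reduction_percent(original, restructured):
    """Red. = (Original - Restructure) / Original * 100."""
    return (original - restructured) / original * 100.0


# (circuit, metric) -> (Original, Restructure), copied from Table 3
samples = {
    ("s444", "area"): (187, 197),
    ("s832", "delay"): (18, 12),
    ("s510", "cpu"): (18.31, 6.41),
    ("s838", "cpu"): (126.74, 230.33),
}
red = {key: round(reduction_percent(*pair), 2) for key, pair in samples.items()}

# Per-circuit Red. (%) columns of Table 3, in row order
area_red = [-5.35, -4.22, -7.69, -5.11, 1.74, 1.81, 6.84, -10.26,
            -5.26, -1.65, -7.87, -6.27, -6.19, -6.63, -6.28]
delay_red = [0.00, 0.00, 0.00, 10.00, 17.39, 20.00, 33.33, 7.69,
             7.69, 6.67, 0.00, 3.70, 3.70, 9.38, 8.11]
cpu_red = [27.35, 64.99, 34.43, -18.74, -17.88, 35.08, 0.40, -81.73,
           14.95, -10.91, -3.16, -7.41, 18.76, 20.64, 15.22]

averages = tuple(round(mean(col), 2) for col in (area_red, delay_red, cpu_red))
print(red)
print(averages)  # (-4.16, 8.51, 6.13)
```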

4.1. Area

From the results, the proposed scheme incurs an average area increase of 4.16% compared with the original approach. The area overhead is mainly caused by the logic restructuring, which adds extra gates through logic replication and buffer insertion.

However, from Figure 10(a), it can be seen that there are three cases (s713, s820, s832) in which the proposed approach consumes even less area than the original one. This is mainly because the proposed approach pre-analyses the circuit for high fan-out splitting, so that fewer routing buffers are inserted. This can be explained as follows.

Figure 10. The comparison results of the proposed method versus the original approach: (a) area comparison; (b) delay comparison; (c) CPU time comparison. [Each panel has axes labelled 'Logic restructuring' and 'Original', with logarithmic tick marks.]

Suppose the original approach has to insert M pairs of routing buffers (2M inverters) to accomplish routing, whereas the proposed approach replicates N gates and inserts P pairs of splitting buffers (2P inverters) without any routing buffers. When $N + 2P < 2M$, the proposed approach consumes less area.
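The area condition can be written as a one-line predicate. The Python sketch below is only an illustration of the inequality above: the function name and the example values of N, P and M are hypothetical and are not taken from the experiments.

```python
def restructuring_saves_area(n_replicated, p_split_pairs, m_routing_pairs):
    """True when N + 2P < 2M, i.e. the cells added by restructuring are fewer
    than the routing-buffer inverters the original approach would insert."""
    cells_added = n_replicated + 2 * p_split_pairs
    cells_avoided = 2 * m_routing_pairs
    return cells_added < cells_avoided


# Hypothetical values, for illustration only
print(restructuring_saves_area(2, 1, 3))  # 2 + 2 = 4 < 6  -> True
print(restructuring_saves_area(4, 2, 3))  # 4 + 4 = 8 >= 6 -> False
```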

4.2. Delay

Logic depth is used as the delay metric in the experiments, and the original benchmarks are already highly optimised for timing by the mapping algorithm proposed in our previous works (Chu et al. 2010; Xia et al. 2011). The improvements obtained in this article on such competitive benchmarks are therefore quite substantial. The delay is reduced for 11 of the 15 circuits. From Table 3 and Figure 10(b), it can be seen that the delay is improved by 8.51% on average. In particular, a delay reduction of 33.33% is achieved for circuit s832. Since the circuits were already optimised for timing delay by the original approach, there is no delay improvement for four circuits: s444, s510, s526 and C432. However, the CPU time is reduced for all of these except C432, which shows the effectiveness of our method.

4.3. CPU time

In terms of CPU time, the proposed approach requires 6.13% less CPU time on average than the original approach. This is because the running time of the mapping algorithm is strongly related to the critical gates. As described in Section 3.3, the critical gate which is constrained by its connectivity domain is relaxed; hence, the algorithm can converge faster. There are some exceptions, however. For example, circuit s838 runs for much longer than with the original approach (230.33 s versus 126.74 s). The main reason is that the mapping complexity is increased for such circuit architectures. In general, the proposed approach adds extra edges to connect replicated gates or splitting buffers in order to maintain logic equivalence, which increases the mapping complexity, while fan-out splitting reduces the mapping complexity by relaxing connectivity constraints. Hence, carefully selecting the critical gates for gate replication or buffer insertion, so as to balance the mapping complexity, has a significant impact on the results.

5. Conclusions

In this article, a timing-driven logic restructuring method is proposed that aims at splitting critical gates. For each circuit, we first identify all the gates and sort them by their number of fan-outs, and then choose the fan-out threshold according to a limited area constraint. A quadratic judgement function is then formulated to estimate whether replication or buffer insertion can reduce the mapping complexity. Finally, a method is proposed to determine the critical gates for replication or buffer insertion using this judgement function and critical path information. The post-restructuring circuits are then passed to the mapping algorithm and the results are compared with those of the original circuits. Experimental results indicate that, on average, the proposed approach improves the circuit timing by 8.51% and reduces the CPU time by 6.13%, at the cost of a 4.16% area overhead.

High density and low fabrication cost are the advantages of the nano-hybrid circuit. However, limited connection resources increase its mapping complexity and may increase


the timing delay significantly. Timing-driven logic restructuring can significantly improve the circuit performance while reducing the mapping complexity. In addition, the nano-hybrid circuit is prone to defects and faults, and logic replication-based restructuring may help to achieve high system reliability or defect tolerance. This work will be the basis of our forthcoming work on defect-tolerant mapping and optimisation.

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grant No. 61131001, Zhejiang Provincial Natural Science Foundation of China under Grant No. Z1090622, Doctoral Fund of Ministry of Education of China under Grant No. 20113305110001, Graduate Student Scientific Research Innovation Project of Zhejiang Province, the Outstanding (Postgraduate) Dissertation Growth Foundation of Ningbo University (No. PY20110001) and K.C. Wong Magna Fund in Ningbo University.

References

Borghetti, J., Snider, G.S., Kuekes, P.J., Yang, J.J., Stewart, D.R., and Williams, R.S. (2010), ‘Memristive Switches Enable Stateful Logic Operations via Material Implication’, Nature, 464, 873–876.

Chen, C. (2007), ‘Reliability-driven Gate Replication for Nanometer-scale Digital Logic’, IEEE Transactions on Nanotechnology, 6, 303–308.

Chen, G., Song, X., and Hu, P. (2009), ‘A Theoretical Investigation on CMOL FPGA Cell Assignment Problem’, IEEE Transactions on Nanotechnology, 8, 322–329.

Chu, Z., Xia, Y., Hung, W.N.N., Wang, L., and Song, X. (2010), ‘A Memetic Approach for Nanoscale Hybrid Circuit Cell Mapping’, in 13th Euromicro Conference on Digital System Design, Lille, France, pp. 681–688.

Hrkic, M., Lillis, J., and Beraudo, G. (2006), ‘An Approach to Placement-coupled Logic Replication’, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 25, 2539–2551.

Hung, W.N.N., Gao, C., Song, X., and Hammerstrom, D. (2008), ‘Defect Tolerant CMOL Cell Assignment via Satisfiability’, IEEE Sensors Journal, 8, 823–830.

Kim, H., and Lillis, J. (2008), ‘A Layout-level Logic Restructuring Framework for LUT-based FPGAs’, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27, 2120–2132.

Likharev, K.K., and Strukov, D.B. (2005), CMOL: Devices, Circuits, and Architectures. Berlin: Springer.

Schafer, B., and Kim, T. (2009), ‘Autonomous Temperature Control Technique in VLSI Circuits Through Logic Replication’, IET Computers & Digital Techniques, 3, 62–71.

Snider, G.S., and Williams, R.S. (2007), ‘Nano/CMOS Architectures Using a Field-programmable Nanowire Interconnect’, Nanotechnology, 18, 035204.

Strukov, D.B., and Likharev, K.K. (2005), ‘CMOL FPGA: A Reconfigurable Architecture for Hybrid Digital Circuits with Two-terminal Nanodevices’, Nanotechnology, 16, 888–900.

Strukov, D., and Mishchenko, A. (2010), ‘Monolithically Stackable Hybrid FPGA’, in Proceeding of the Design, Automation and Test in Europe, Dresden, Germany, pp. 661–666.

Strukov, D.B., Snider, G.S., Stewart, D.R., and Williams, R.S. (2008), ‘The Missing Memristor Found’, Nature, 453, 80–83.

Sze, C., Alpert, C., Hu, J., and Shi, W. (2007), ‘Path-based Buffer Insertion’, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 26, 1346–1355.

Xia, Y., Chu, Z., Hung, W., Wang, L., and Song, X. (2011), ‘An Integrated Optimization Approach for Nanohybrid Circuit Cell Mapping’, IEEE Transactions on Nanotechnology, 10, 1275–1284.

Yan, H., Choe, H.S., Nam, S., Hu, Y., Das, S., Klemic, J.F., Ellenbogen, J.C., and Lieber, C.M. (2011), ‘Programmable Nanowire Circuits for Nanoprocessors’, Nature, 470, 240–244.

Zhou, H., Wong, D., Liu, I.M., and Aziz, A. (2000), ‘Simultaneous Routing and Buffer Insertion With Restrictions on Buffer Locations’, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19, 819–824.

