Upload
lunyao
View
212
Download
0
Embed Size (px)
Citation preview
This article was downloaded by: [Southern Taiwan University of Science and Technology]On: 24 November 2014, At: 01:01Publisher: Taylor & FrancisInforma Ltd Registered in England and Wales Registered Number: 1072954 Registeredoffice: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK
International Journal of ElectronicsPublication details, including instructions for authors andsubscription information:http://www.tandfonline.com/loi/tetn20
Timing-driven logic restructuring fornano-hybrid circuitsZhufei Chu a , Yinshui Xia a , William N. N. Hung b , Xiaoyu Song c
& Lunyao Wang aa School of Information Science and Engineering , NingboUniversity , Ningbo 315211 , Chinab Synopsys, Inc. , Mountain View , CA , USAc Department of Electrical and Computer Engineering , PortlandState University , Portland , OR , USAPublished online: 21 Sep 2012.
To cite this article: Zhufei Chu , Yinshui Xia , William N. N. Hung , Xiaoyu Song & Lunyao Wang(2013) Timing-driven logic restructuring for nano-hybrid circuits, International Journal ofElectronics, 100:5, 669-685, DOI: 10.1080/00207217.2012.720945
To link to this article: http://dx.doi.org/10.1080/00207217.2012.720945
PLEASE SCROLL DOWN FOR ARTICLE
Taylor & Francis makes every effort to ensure the accuracy of all the information (the“Content”) contained in the publications on our platform. However, Taylor & Francis,our agents, and our licensors make no representations or warranties whatsoever as tothe accuracy, completeness, or suitability for any purpose of the Content. Any opinionsand views expressed in this publication are the opinions and views of the authors,and are not the views of or endorsed by Taylor & Francis. The accuracy of the Contentshould not be relied upon and should be independently verified with primary sourcesof information. Taylor and Francis shall not be liable for any losses, actions, claims,proceedings, demands, costs, expenses, damages, and other liabilities whatsoever orhowsoever caused arising directly or indirectly in connection with, in relation to or arisingout of the use of the Content.
This article may be used for research, teaching, and private study purposes. Anysubstantial or systematic reproduction, redistribution, reselling, loan, sub-licensing,systematic supply, or distribution in any form to anyone is expressly forbidden. Terms &
Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
Timing-driven logic restructuring for nano-hybrid circuits
Zhufei Chua, Yinshui Xiaa*, William N. N. Hungb, Xiaoyu Songc
and Lunyao Wanga
aSchool of Information Science and Engineering, Ningbo University, Ningbo 315211, China;bSynopsys, Inc., Mountain View, CA, USA; cDepartment of Electrical and Computer
Engineering, Portland State University, Portland, OR, USA
(Received 21 October 2011; final version received 15 July 2012)
As the feature size of the integrated circuits (ICs) scales down, the future of nano-hybrid circuit looks bright in extending Moore’s Law. However, mapping acircuit to a nano-fabric structure is vexing due to connectivity constraints.A mainstream methodology is that a circuit is transformed into a nano-fabricpreferred structure by buffer insertion to high fan-out gates. However, it mayresult in timing degradation. Logic replication is a traditional way to split highfan-out gates in logic synthesis but may not be suitable for high fan-out gates withhigh fan-ins. In this article, a timing-driven logic restructuring framework at thegate level is proposed. The proposed framework identifies the high fan-out gatesfrom a given gate netlist according to the fan-out threshold, following by therestructuring of high fan-out gates through the application of logic replicationand buffer insertion. To improve circuit timing from a global perspective, latentcritical edges are identified to avoid entrapping critical paths during therestructuring. Experimental results on ISCAS benchmarks indicate that 8.51%timing improvement and 6.13% CPU time reduction can be obtained traded with4.16% area increase on an average.
Keywords: nano-hybrid circuit; logic restructuring; timing; optimisation
1. Introduction
As the feature size of the integrated circuits (ICs) scales down, the future of nano-hybridcircuit looks bright in extending Moore’s Law. A good many nano-hybrid architecturesconsist of nanowires (Yan et al. 2011), programmable molecule ‘memristive’ devices(Strukov, Snider, Stewart, and Williams 2008; Borghetti et al. 2010) and CMOS logic. Theinputs and outputs of a nanoscale circuit have to interconnect with sub-micron CMOSultimately. The CMOL (Cmos/nanowire/MOLecular hybrid) structure proposed byLikharev et al. is one of them and accomplishes this by connecting two external cones ofdifferent heights and sharp tips to two levels of the crossbar (Likharev and Strukov 2005;Strukov and Likharev 2005; Strukov and Mishchenko 2010). The field-programmablenanowire interconnect (FPNI) architecture proposed by Snider et al. generalises theCMOL with larger pads and sparser crossbar for easy fabrication (Snider and Williams2007). However, the nanowires of both CMOL and FPNI are periodic break, which causesa physical routing constraint. By programming the diode-like molecule switches, we can
*Corresponding author. Email: [email protected]
International Journal of Electronics, 2013
Vol. 100, No. 5, 6 –
© 2013 Taylor & Francis
669 85, http://dx.doi.org/10.1080/00207217.2012.720945
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
only connect a gate to limited neighbor gates directly, which forms the so-calledconnectivity domain.
There are some methods in the public domain to solve CMOL cell mapping problemresulted from connectivity domain. Hung, Gao, Song, and Hammerstrom (2008) encodedthe CMOL cell mapping problem as a satisfiability problem, that is, the Booleanconstraints are satisfiable if and only if there exists a solution to map a circuit to CMOLcells. However, there are two shortcomings for this method: first, it does not seem scalablewith the circuit size; second, when the number of gate fan-outs is larger than the totalnumber of cells in the connectivity domain, the circuit cannot be mapped. Chen, Song, andHu (2009) revealed that some circuits can be transformed to an equivalence circuit bybuffer insertion and are placeable under reasonable connectivity domain size. However,the proposed method may cause severe timing degradation. Our previous works (Chu, Xia,Hung, Wang, and Song 2010; Xia, Chu, Hung, Wang, and Song 2011) revealed that highfan-out gates result in connectivity constraints and further experimental results show thatthe timing of the mapping solution and the CPU time of the algorithm have a closerelationship with those high fan-out gates. We define those high fan-out gates as criticalgates. These findings motivate us to address the cell mapping problem of nano-hybridcircuits through circuit restructuring to solve routability and avoid timing degradation.There are two typical methods for circuit restructuring: logic replication and bufferinsertion.
Logic replication is a common restructuring technique for improving the circuitperformance by replicating one or more logic cells while maintaining logic functionequivalence. It is widely used for thermal reduction in very large-scale integrated (VLSI)circuits (Schafer and Kim 2009) and logic synthesis (Hrkic, Lillis, and Beraudo 2006; Kimand Lillis 2008). Generally, it can be coupled with placement or mapping procedure byrestructuring logic circuit to optimise some parameters. Though logic replication wasinvestigated for increasing the reliability of nanoscale digital logic circuit (Chen 2007),there is little research on improving the nano-hybrid circuit timing issue.
Buffer insertion is used to optimise delay or noise in VLSI circuits (Zhou, Wong, Liu,and Aziz 2000; Sze, Alpert, Hu, and Shi 2007). Given the basic CMOS logic gate, inverter,in nano-hybrid fabric, a buffer can be easily implemented by connecting a pair of inverters.Buffer insertion can be used in two aspects:
(1) Splitting buffer: before mapping the netlist to nano-hybrid fabric, buffer insertionis used for high fan-out gate splitting at the gate level.
(2) Routing buffer: after the mapping algorithm terminates, buffer insertion is used toextend the connection for those gates in the netlist which violate the connectivityconstraints, so that effect connectivity domain is enlarged.
Circuit restructuring is that a circuit is converted into a logic equivalent one, which canbe mapped to a fabric preferred structure under connectivity domain constraints. It isimportant because a good logic restructuring can reduce mapping complexity and circuittiming penalty with less CPU time. In this work, a timing-driven logic restructuringframework is proposed for nano-hybrid circuit mapping. We identify and sort all the gatesby the fan-out number of gates in a circuit, then determine critical gates according to thelimited area constraint. For critical gates, a quadratic equation is formulated with regardto the gate’s fan-in/out number to estimate the mapping complexity before and afterrestructuring. Then, critical paths and timing slack are found by the traditional statictiming analysis. To obtain no timing penalty after restructuring, latent critical edge is
Z. Chu et al.670
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
introduced and defined as that not on critical path but may entrap in critical path afterrestructuring. Finally, critical gate is restructured through the application of logicreplication and buffer insertion under the timing constraints. The resulting circuit netlist isthen applied to the mapping algorithm to evaluate the results in comparison to the originalcircuit mapping (Xia et al. 2011).
Our main contributions in this article are as follows:
. Developing a determination method of critical gates from a circuit netlist, which isbased on the fan-out number of gates and the size of the nano-hybrid fabric area.
. Proposing a combination of logic replication and buffer insertion to split criticalgates synergistically based on a quadratic mapping complexity estimationequation and critical path information without timing penalty.
The rest of this article is organised as follows. Preliminaries are described in Section 2.Then the main logic restructuring framework is demonstrated in Section 3. Experimentalresults are shown in Section 4 and conclusions are presented in the last section.
2. Preliminaries
Before we describe the details of the proposed framework, preliminaries about nano-hybrid circuit cell mapping and mathematic model are introduced, as they are the basis ofthe entire flow.
2.1. Nano-hybrid circuit cell mapping
Nano-hybrid circuits such as CMOL and FPNI can be abstracted to a two-dimensionalarray, with sizes x� y, the example of a 7� 7 cells array as shown in Figure 1. A circle withradius r is formed with a centre at the core of Cell A. The cells completely surrounded bythe circle form the connectivity domain of Cell A. The radius of the circle is also called the
Aa
x
y
0 1 2 3 4 5 6
0
1
2
3
4
5
r
6
Figure 1. The connectivity domain.
International Journal of Electronics 671
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
radius of the connectivity domain. In the following, we use �Ar to denote the connectivity
domain of Cell A with the radius r.A circuit can be modelled as a directed acyclic graph G¼ (I,V,E,O), which is
composed of primary input (PI) set I, gate set V, edge set E and primary output (PO) set O.Take the circuit shown in Figure 2 for example, I¼ {i1, i2, . . . , i7}, V¼ {g1, g2, . . . , g12},E¼ {e(i1, g4), e(i2, g1), . . . , e(g12, o3)} and O¼ {o1, o2, o3}. Each gate gi2V has its fan-in setFin(gi) and fan-out set Fout(gi). For instance, Fin(g1)¼ {i2, i3} and Fout(g1)¼ {g4, g6}.Additionally, the number of elements in the fan-in (fan-out) set of gi2V is called thenumber of gate fan-ins (fan-outs), which are denoted as Din(gi) and Dout(gi). For instance,Din(g1)¼ 2 and Dout(g1)¼ 2.
Generally, given a circuit G and a nano-hybrid cell array �, the nano-hybrid circuit cellmapping is to find a legal mapping P: G!� from the topological to the physical withinthe connectivity domain constraints. For an edge e(g, g0)2E where g, g0 are adjacent in thenetlist, the mapping location of each gate should be within each other’s connectivitydomain such that g2�g0
r and g0 2�gr . Figure 3 shows a simple circuit composed of two NOR
gates with three PIs and one PO, which is mapped to a 4� 3 nano-hybrid cell array.Note that each gate, PI or PO must be mapped to only and just only one nano-hybrid cell.
i1
i2
i3
i4
i5
i6
i7
o2
o3
g1
g2
g3
g4
g5
g6
g7
g8 g9 g10 g11
g12
level 0 level 1 level 2 level 3 level 4 level 5 level 6 level 7
2
1
6
7
2
3
3
4 5 6 7
7
critical edgelatent critical edge
o1
Figure 2. Logic structure of NOR gate-based circuit s27.
g1
g2
i1 i2 i3
o1
(a)
g1 g2
i1 i2 i3
o1
(b)
Figure 3. An example of nano-hybrid circuit cell mapping: (a) a simple sub-circuit; (b) the mappingsolution in a nano-hybrid cell array.
Z. Chu et al.672
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
Meanwhile, the mapping solution should satisfy the connectivity constraints, grouping bydot lines as shown in Figure 3(b).
2.2. Timing estimation
The following timing model is based on topological estimation without considering thefalse path and the interconnect delay. However, the model is adequate for nano-hybridcircuit timing estimation. Apart from the model adopted, our framework can work withany timing estimation engine used in both academic and commercial tools.
The NOR gate-based circuit s27 shown in Figure 2 is taken for illustration of definitionsand theorems. Generally, the timing estimation method is similar to traditional statictiming analysis in VLSI circuits.
Definition 1: For gates gi2V, the maximum number of gates along the path from the PIto the output of gi is defined as the gate’s logic level, denoted as L(gi).
From Definition 1, L(PI)¼ 0. Take g4 in Figure 2 for instance, there are three pathswhich start at the PI and end at g4, such as (i1! g4), (i2! g1! g4) and (i3! g1! g4).Thus, the maximum number of gates gi2G along the path from the PI to the output ofgate g4 is 2 and hence L(g4)¼ 2.
The method to calculate the gate’s logic level can be accomplished by the topologicalsorting algorithm. Let Din(g)¼m and Fin(g)¼ {g0, g1, . . . , gm�1} then
Lð gÞ ¼Max0�i5mðLð giÞ þ 1Þ ð1Þ
As shown in Figure 2, the gate logic level is calculated and shown by vertical dash lines,e.g. L(g4)¼L(g5)¼ 2.
Definition 2: Circuit delay DC is the maximum gate logic level along a path from PIsto POs.
Regardless of interconnect delay, the maximum number of logic gates along a pathfrom PIs to POs is the timing delay of the circuit, also called the logic depth. Hence, thecircuit delay can be obtained by computing L(O).
DC ¼Maxok 2OðLðokÞÞ ð2Þ
Definition 3: Edge’s slack S(e(g, g0)) is the number of gates that may be added to edgee(g, g0) before the paths what the edge is located become critical paths.
To compute each edge’s slack, we first compute the required gate’s logic level RL(g) ofeach gate in the circuit. RL(g) is similar to conventional required arrived time in the statictiming analysis. It is the required gate’s logic level that will not cause timing degradation.Considering the circuit delay obtained, we set the required gate’s logic level of all POsas DC and propagate backwards from POs to PIs. Let Dout(g)¼m and Fout(g)¼{g0, g1, . . . , gm�1}, then
RLð gÞ ¼Min0�i5mðRLð giÞ � 1Þ ð3Þ
The required gate’s logic level is labelled in the upper right side of the logic gate symbol inFigure 2, e.g. RL(g4)¼ 7, RL(g5)¼ 2. Then, the edge’s slack S(e(g, g0)) is defined as
Sðeð g, g0ÞÞ ¼ RLð g0Þ � Lð gÞ ð4Þ
International Journal of Electronics 673
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
Definition 4: Edge e(g, g0) is a critical edge if it is on the critical path.
Definition 5: Edge e(g, g0) is a latent critical edge if it is not on a critical path, but willbecome critical edge if the buffers are inserted on this edge during restructuring.
Theorem 2.1: For a specific circuit G, edge e(g, g0)2E is a critical edge if S(e(g, g0))¼ 1.
Proof: For edge e(g, g0), Lð g0Þ ¼Maxgk2Finð g0ÞðLð gkÞ þ 1Þ according to Equation (1),hence
Lð g0Þ � Lð gÞ � 1 ð5Þ
If S(e(g, g0))¼RL(g0)�L(g)¼ 1, then L(g)¼RL(g0)� 1, substitute this into Equation (5),we obtain L(g0)�RL(g0)þ 1� 1, hence
Lð g0Þ � RLð g0Þ ð6Þ
Additionally, from the definitions, the edge’s slack is non-negative and L(g0)4L(g). Weobtain S(e(g, g0))¼RL(g0)�L(g)4RL(g0)�L(g0)� 0, hence
Lð g0Þ � RLð g0Þ ð7Þ
Combining Equations (6) and (7), we have L(g0)¼RL(g0). Hence, edge e(g, g0)2E is on acritical path. The critical edges are shown as bold dashed lines in Figure 2. œ
Theorem 2.2: For a specific circuit G, edge e(g, g0)2E is a latent critical edge ifS(e(g, g0))¼ 2.
Proof: In nano-hybrid circuit cell mapping, let g 6 2 �g0
r and g0 6 2 �gr for edge e(g, g
0). Henceg and g0 cannot be connected directly due to connectivity domain constraints. However, aninverter pair inv1 and inv2 can be inserted as an intermediate buffer to strengthen theroutability. As shown in Figure 4, the logic function is equivalent to the original one for twoinverters act as relay gates. If inv1 2�g
r , inv2 2�inv1r and g0 2�inv2
r , then the mapping issuccessful. Before the buffer insertion, L(g0)�L(g)� 1, after the buffer insertion,L(g0)�L(g)� 3. If S(e(g, g0))¼RL(g0)�L(g)¼ 2, then L(g0)�RL(g)þ 2� 3. HenceL(g0)�RL(g0)þ 1. This is contradicted with L(g0)�RL(g0). Therefore it will cause timingdegradation. We call this edge as the latent critical edge, which is not a critical edge butbecome critical after restructuring. They are shown as a bold solid line in Figure 2. œ
We tested the number of critical or latent critical edges on several ISCAS benchmarks.The percentage of (latent) critical edges is shown in Table 1. It can be seen that a
g
g’
inv1
inv2
(a)
g
inv1
inv2g’
(b)
Figure 4. Buffer insertion and the resulting mapping solution.
Z. Chu et al.674
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
significant percentage of edges is (latent) critical. The average percentage of critical edges is43.4% while that of latent critical edges is 15.4%. In other words, up to 58.8% of edgesshould satisfy the connectivity constraints for avoiding timing delay degradation.Consequently, the restructuring at the gate level for the nano-hybrid circuit can notonly reduce the mapping complexity but also avoid timing degradation.
3. Proposed timing-driven logic restructuring
3.1. Inspiration
The main issue for nano-hybrid circuit cell mapping is raised from high fan-out gates andconnectivity domain constraints. As pointed out in Chen et al. (2009), the high fan-outgates in CMOL cell mapping can be resolved by inserting fan-out splitting buffers andmaintains the logic equivalence of the circuit. Figure 5(a) shows a circuit topology. It hassix gates and six edges. Obviously, g2 is the high fan-out gate and we have Fin(g2)¼ {g1},
Table 1. Percentage of (latent) critical edges.
Circuit Latent critical (%) Critical (%) Total (%)
s444 34.9 15.7 50.6s510 44.9 6.90 51.8s526 35.7 17.4 53.1s641 35.1 8.10 43.2s713 32.9 7.30 40.2s820 43.2 22.9 66.1s832 38.7 26.5 65.2s838 45.1 9.30 54.4s1196 43.4 11.7 55.1s1238 42.1 9.10 51.2C432 46.2 8.60 54.8C499 61.8 25.1 86.9C880 44.4 13.4 57.8C1355 55.1 23.2 78.3C1908 48.2 26.3 74.5
Average(%) 43.4 15.4 58.8
g1
g2
g3 g4 g5 g6
(a)
g1
g2
g3 g4
g5 g6
i1
i2
(b)
g1
g2
g3 g4 g5 g6
g'2
(c)
Figure 5. Illustration of critical gate restructuring: (a) the original sub-circuit; (b) by bufferinsertion; (c) by logic replication.
International Journal of Electronics 675
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
Fout(g2)¼ {g3, g4, g5, g6} and Dout(g2)¼ 4. Let path g1! g2! g6 is a critical path. The
merits and demerits after employing buffer insertion and logic replication to split high fan-
out gates can be compared as follows.
(1) Buffer insertion: with buffer insertion, we take away g5, g6 and insert two NOT gates
i1, i2 (inverters) to form a buffer as (Chen et al. 2009) shown in Figure 5(b). The
logic function is still equivalent while Dout(g2) is reduced to 3. However, extra two
gates and two edges are added against the original circuit. Furthermore, the logic
depth of the critical path is increased from 3 to 5.(2) Logic replication: replicating g2, we obtain g2
0 and split fan-outs of g2 as shown in
Figure 5(c). Consequently, extra one gate and one edge are added. Then, Dout(g2) is
reduced to 2 while the logic depth of the critical path is the same as Figure 5(a).
Comparing the two from the specific case, the logic replication has advantages against
buffer insertion in terms of area overhead and timing degradation.However, if the gate has high fan-outs with high fan-ins, logic replication may consume
significant extra interconnect resources for maintaining logic function equivalence.
Take Figure 6(a) as an example. Considering g2, we have Fin(g2)¼ {g1, g7, g8, g9, g10} and
Fout(g2)¼ {g3, g4, g5, g6}. Extra five edges as g20 fan-ins are added if logic replication is
applied for fan-out splitting as shown in Figure 6(c), which complicates the mapping
complexity since each edge connection should satisfy the connectivity domain constrains.
However, as shown in Figure 6(b), if path g1! g2! g6 is not a latent critical path, the buffer
insertion method only consumes two additional edges and hence no timing issue exists.Therefore, both buffer insertion and logic replication have their merits and demerits
as shown in Table 2. Inspired from this, we propose to combine the two to optimise
g1
g 2
g3 g4 g5 g6
g9g8g7 g10
(a)
g 2
g3 g4
g5 g6
i1
i2
g1 g9g8g7 g10
(b)
g1
g2
g3 g4 g5 g6
g9g8g7 g10
g'
(c)
2
Figure 6. Illustration of critical gate restructuring with high fan-ins: (a) the original sub-circuit;(b) by buffer insertion; (c) by logic replication.
Table 2. Comparison results of logic restructuring.
Logic restructuringmethods Merits Demerits
Buffer insertion Do not take gate fan-innumber into account
More area overhead, may causetiming degradation
Logic replication Less area overhead, no timingdegradation
Should take gate fan-in numberinto account
Z. Chu et al.676
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
circuit restructuring. In the next sub-sections, a mathematic model and case analysis
details will be given to show how the two methods can be properly utilised in order to
improve the circuit performance.
3.2. Determination of critical gates
Determining critical gates is the key for logic restructuring optimisation in terms of area
and timing. We will first propose the judgment condition of critical gates by discussing the
logic replication and buffer insertion in logic restructuring and then present when it will be
beneficial to use logic replication or buffer insertion for a specific critical gate.
Logic replication: To maintain the logic equivalence of a circuit, the replicate gate g0
should have the same fan-ins as gate g. Let gate g2V, Fin(g)¼ {x0, x1, . . . , xm�1} and
Fout(g)¼ {y0, y1, . . . , yn�1}. The replicate gate g0 has to connect to the same m fan-in edges
and part of fan-out edges according to fan-out splitting. The problem is that given m and n
at what point in time it is worthy of replication in terms of mapping complexity. Regarding
this issue, a cost function is required. Given the mapping complexity being directly
proportional to the number of edges, the exponential judgment function is used for the
consideration of larger fan-in/out number with larger penalty. Hence,
f ð gÞ ¼X
all fan-insDinð gÞ
sþ
X
all fan-insDoutð gÞ
sð8Þ
where s is an integer greater than 0. Initially, fan-in cost is ms and fan-out cost is ns. Hence,
before logic replication, f(g)1¼msþ ns. After logic replication, the number of fan-in is
doubled. Hence the fan-in cost becomes 2ms. For an n fan-outs, assume it is split into two
sets which include x and (n� x) fan-outs, respectively. Then the fan-out cost is
xsþ (n�x)s. Hence f(g)2¼ 2msþ xsþ (n� x)s. When x¼ n/2, f(g)2 reaches the minimum
value 2msþ ns/2s�1. Intuitively, we manage to minimise the complexity cost. That is,
Dcost ¼ f ð gÞ2 � f ð gÞ1 ¼ ms �2s�1 � 1
2s�1ns 5 0 ð9Þ
Thus the first condition is m5 h(s)n, where hðsÞ ¼ffiffi½p
s�2s�1�12s�1
. In addition, the lower the
Dcost, the more the complexity reduction is obtained. Given n and s,m5 n, then smaller m
reaches lower Dcost. From the definitions of h(s), we can obtain h(s)js¼1¼ 0, h(s)js¼2� 0.7,
h(s)js�34 0.9. As s increases, h(s) gradually approaches to 1. Since h(s)js¼1¼ 0 will lead to
m5 0, this is impractical. Given n, the smaller h(s) will lead to smaller m. In addition, from
the above discussion, the smaller m will reduce the complexity. Therefore, we chose s¼ 2.
The exponential estimation function is then quadratic and hence
m5ffiffiffi2p
n=2 ð10Þ
However, if n¼ 2, m ¼ 15ffiffiffi2p
satisfy Equation (10). This will result in a good majority of
gates in the circuit being replicated such that the circuit size is rapidly increased. Hence, the
mapping complexity may also be increased. In addition, high fan-out number varies for
circuits to circuits. In the real design flow, the shape of the targeted nano-hybrid fabric is
determined before mapping. Thus, the logic restructuring focuses on timing improvement
without considering limited area constraint is impractical. Under this area constraint, a
International Journal of Electronics 677
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
fraction of gates will be restructured to make a reasonable circuit size. Therefore, a fan-outthreshold DT is introduced as the judgement condition of critical gates.
For each circuit, the threshold DT is determined according to the fan-out number of thegates and the size of the nano-hybrid fabric area. First, we identify and sort the fan-outnumber of gates in an array A[N] with the descending order, where N is the number ofgates in the circuit. Then, the area of nano-hybrid fabric (number of total cells, such asx� y) minus N is the area which can be mapped by the restructured gates. Hence, at mostNr¼x� y�N gates may be restructured by logic replication, DT can be determined by thearray A[N], that is DT¼A[Nr� 1]. The critical gates can be determined if its fan-outnumber larger than DT. However, considering the routing buffers may be inserted aftermapping, we cannot use all those Nr blank cells for logic restructuring. As a matter of fact,DT¼A[(Nr� 1)� p%], where p is an integer among (0, 100).
Buffer insertion: Alternatively, if m �ffiffiffi2p
n=2, logic replication may make mapping evenmore complex. For example, let n¼ 4 and m¼ 5. If logic replication is applied, since m¼ 5and then five extra edges should be added to maintain the same logic function, which alsoincreases the mapping complexity. On the contrary, buffer insertion can be applied on anon-latent critical path for high fan-out splitting without potential timing degradationwith fewer extra edges.
Hence, the combination of both the logic replication and buffer insertion can efficientlysplit the high fan-out without timing degradation.
In summary, for gates which are potential to be restructured, the primary condition isthat the fan-out number of those gates must be larger than the fan-out threshold DT. Thegate is then replicated only if the following conditions are satisfied.
(1) The mapping complexity which is estimated by the quadratic equation is reduced(Equation (10));
(2) The gate has not been restructured in the previous restructuring process.
Since logic replication causes no timing degradation, the current edge can either be the(latent) critical edge or non- (latent) critical edge. Alternatively, the gate is inserted by abuffer within more strict conditions.
(1) The mapping complexity is increased if the logic replication method is used (violateEquation (10));
(2) The current edge must be a non-latent critical edge;(3) The gate has not been restructured in the previous restructuring process.
3.3. Fanout splitting
For all the gates g2G, a 2-tuple Flag(g)¼ (Rf (g),Bf (g)) is defined to form a flaginformation whether the fan-out of gate g is split by the logic replication or bufferinsertion. Initially, Rf (g) and Bf (g) are all set to FALSE. If the gate g is replicated, thenRf (g)¼TRUE. Similarly, Bf (g)¼TRUE if a buffer is inserted to connect gate g. Note thatfor the same gate g,Rf (g) and Bf (g) cannot be assigned TRUE simultaneously.
During the mapping, the solutions are evaluated according to the flag informationFlag(g). Take Figure 7 as an example, when evaluate an edge e(g2, g3), if Flag(g2)¼{TRUE,FALSE}, then we know the gate g2 has been replicated. Let the replicated gate beg20 which has the same logic function with g2, g3 can be mapped within either connectivity
Z. Chu et al.678
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
domains �g2r or �
g02r . Consequently, the fan-out set Fout(g2) in the original circuit can be
mapped within a joint connectivity domain �r ¼ �g2r \�
g02r . The following mapping results
are all legal.
fg3g 2�g2r and fg4, g5, g6g 2�
g02r
fg3, g4g 2�g2r and fg5, g6g 2�
g02r
fg3, g4, g5g 2�g2r and fg6g 2�
g02r
fg3, g4, g5, g6g 2�g2r and �
g02
r ¼ �
� � � � � � � � � � � � � � � � � � � � � � � �
Hence, in the shown case, there are
N ¼ C04 C
44 þ C1
4 C34 þ C2
4 C24 þ C3
4 C14 þ C4
4 C04 ¼ 70
legal results totally. In other words, the critical gate which is constrained by connectivity
domain is relaxed. The fan-out splitting method for buffer insertion is similar.From the example, either the connectivity domain �g2
r or �g02
r can be an empty set Ø.
Note �g2r ¼ � just means no fan-out gate of g2 is mapped to �g2
r . Hence a check-and-
remove algorithm is developed to check the mapping solution. For the gate or replicated
gate, it is determined as a redundant gate and hence is removed from the solution if the
connectivity domain is Ø during checking.
3.4. Algorithm overview
As an overview of the proposed method, we summarise all the major algorithmic elements
in the flowchart shown in Figure 8. To maintain the logic function correctly during
restructuring, we proceed with the logic replication and buffer insertion step by step.The algorithm is generally iterative until all edges in the given gate netlist is traversed.
The output of the algorithm is an equivalent circuit with a part of gates replicated or
inserted by buffers. Then, the resulting circuit is mapped to nano-hybrid circuit by the
mapping algorithm. As shown in Figure 9, the traditional flow is straightforward from an
NOR gate-based circuit to the solution obtained by the mapping algorithm. However, the
proposed flow adds a timing-driven logic restructuring before the mapping algorithm for
timing improvement and mapping complexity reduction.
(a) (b)
Figure 7. Illustration of fan-out splitting for real mapping: (a) the sub-circuit after logicrestructuring; (b) the corresponding solution.
International Journal of Electronics 679
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
4. Experiments
In order to validate the proposed approach, it is applied on several ISCAS benchmarksunder Linux operating system with Intel Pentium (R) Dual-Core CPU E5400 and 2 GBRAM. The experiments are carried out based on our published mapping algorithm (Xiaet al. 2011). The results from (Xia et al. 2011) and the proposed method are listed forcomparison. We evaluate the proposed method in terms of area, timing delay and
Figure 8. The flowchart of the proposed framework.
Z. Chu et al.680
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
CPU time. The radius of the connectivity domain is set to 9 and p is set to 20 to determine
fan-out threshold DT for experiment comparison.The results are shown in Table 3. The ‘I/O’ column shows the number of PI/POs. ‘#G’
represents the number of gates which are restructured by the proposed method. ‘Area’indicates the total number of cells which are occupied by gates, buffers (splitting buffer
and routing buffer) and replicated gates. ‘Delay’ states the logic depth. ‘CPU time’ exhibits
the running time of the mapping algorithm. In each catalogue, the sub-column ‘Original’
represents the results obtained by the mapping algorithm without logic restructuring, while
Table 3. Performance Comparison Results.
Area Delay CPU time (s)
Circuit I/O #G Original Restructure
Red.
(%) Original Restructure
Red.
(%) Original Restructure
Red.
(%)
s444 24/27 10 187 197 �5.35 11 11 0.00 2.45 1.78 27.35
s510 25/13 17 308 321 �4.22 18 18 0.00 18.31 6.41 64.99
s526 24/27 21 273 294 �7.69 11 11 0.00 7.87 5.16 34.43
s641 54/42 26 313 329 �5.11 20 18 10.00 32.23 38.27 �18.74
s713 54/42 18 345 339 1.74 23 19 17.39 31.88 37.58 �17.88
s820 23/24 23 497 488 1.81 15 12 20.00 105.76 68.66 35.08
s832 23/24 24 526 490 6.84 18 12 33.33 88.00 87.65 0.40
s838 66/33 49 614 677 �10.26 26 24 7.69 126.74 230.33 �81.73
s1196 31/31 37 704 741 �5.26 26 24 7.69 244.22 207.71 14.95
s1238 31/31 39 787 800 �1.65 30 28 6.67 224.91 249.45 �10.91
C432 36/7 21 267 288 �7.87 29 29 0.00 2.85 2.94 �3.16
C499 41/32 49 782 831 �6.27 27 26 3.70 220.03 236.34 �7.41
C880 60/26 43 630 669 �6.19 27 26 3.70 169.00 137.29 18.76
C1355 41/32 57 890 949 �6.63 32 29 9.38 395.35 313.76 20.64
C1908 33/25 78 987 1049 �6.28 37 34 8.11 543.55 460.83 15.22
Average �4.16 8.51 6.13
GateBased Circuit
NOR
MappingAlgorithm
Solution
Timing-DrivenLogic Restructuring
Figure 9. An brief nano-hybrid circuit cell mapping flow.
International Journal of Electronics 681
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
‘Restructure’ shows the results from the proposed timing-driven logic restructuringmethod. The ‘Red.’ indicates the reduced percentage of the proposed method, which iscalculated as
Red: ¼Original�Restructure
Original� 100%
We plot the area, delay and CPU time of the proposed method versus the originalapproach in Figure 10.
4.1. Area
From the results, the proposed scheme consumes an increase of 4.16% in an area onaverage compared with that of the original approach. The area overhead is mainly causedby the logic restructuring which adds extra gates by logic replication and buffer insertion.
However, from Figure 10(a), it can be seen that there are three cases (s713, s820, s832)which consume even less area in the proposed approach than that of the original one. Thisis mainly because the proposed approach can pre-analyse the circuit for high fan-outsplitting such that fewer routing buffers are inserted. This can be explained as follows.
102 103102
103(a) (b)
(c)
Logic restructuring
Ori
gina
l
Area comparison
101101
Logic restructuring
Ori
gina
l
Delay comparison
100 101 102100
101
102
Logic restructuring
Ori
gina
l
CPU time comparison
Figure 10. The comparison results of the proposed method versus the original approach: (a) areacomparison; (b) delay comparison; (c) CPU time comparison.
Z. Chu et al.682
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
If the original approach should insert M pairs of routing buffers (2M inverters) toaccomplish routing, the proposed approach only replicates N gates and inserts P pairs ofsplitting buffers (2P inverters) without routing buffers. When Nþ 2P5 2M, the proposedapproach consumes less area.
4.2. Delay
Considering logic depth is used as metric to measure the delay value in experiment and theoriginal benchmarks are highly optimised in timing by the mapping algorithm proposed inour previous works (Chu et al. 2010; Xia et al. 2011), the improvements in this article onsuch competitive benchmarks are quite substantial. The delay is optimised for the vastmajority of circuits. From Table 3 and Figure 10(b), it can be seen that delay is improvedby 8.51% on average. Particularly, up to 33.3% delay reduction is achieved by circuit s832.Since the circuits were optimised in times delay with original approach, there is no delayimprovement for some circuits such as s444, s510, s526 and C432. However, the CPU timeis reduced apart from C432, which shows the effectiveness of our method.
4.3. CPU time
In terms of CPU time, the proposed approach obtains generally 6.13% CPU time less thanthat of the original approach. This is that the mapping algorithm has strong relationshipwith critical gates. As described in Section 3.3, the critical gate which is constrained byconnectivity domain is relaxed. Hence, the algorithm can somehow achieve fastconvergence. But there are some exceptions. For example, the circuit, s838, runs for amuch longer time than the original approach. The main reason is the mapping complexityis increased for those special circuit architectures. Generally, the proposed approach addsextra edges to connect replicated gates or splitting buffers for maintaining logicequivalence, which increases the mapping complexity, while fan-out splitting reduces themapping complexity by relaxing connectivity constraints. Hence, selecting critical gatescarefully for gate replication or buffer insertion to balance the mapping complexity has asignificant impact on the results.
5. Conclusions
In this article, a timing-driven logic restructuring method is proposed aiming at criticalgates splitting. For each circuit, we first identify and sort all the gate with the number offan-outs, then choose fan-out threshold according to limited area constraint. Then aquadratic judgement function is formulated to estimate whether the replication or bufferinsertion can reduce the mapping complexity. Finally, a method to determine critical gatesfor replication or buffer insertion using the above judgment function and critical pathinformation is proposed. Post-restructuring circuits are then applied to the mappingalgorithm to compare the results with those of the original circuit. Experimental resultsindicate that the proposed approach can improve the circuit timing by 8.51% with a6.13% speed increase in the CPU time and a 4.16% area overhead on averagesimultaneously.
High-density and low-fabrication cost of the nano-hybrid circuit are its advantages.However, limited connection resources increase its mapping complexity and may increase
International Journal of Electronics 683
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
the significant timing delay. Timing-driven logic restructuring of the circuit cansignificantly optimise the circuit performance with reduction in mapping complexity.Besides, the nano-hybrid circuit is prone to defects or faults. Logic replication-basedrestructuring may achieve high system reliability or defect-tolerance. This work will be thebasis of our forthcoming goal to work on defect tolerant mapping and optimisation.
Acknowledgements
This work is supported by the National Natural Science Foundation of China under Grant No.61131001, Zhejiang Provincial Natural Science Foundation of China under Grant No. Z1090622,Doctoral Fund of Ministry of Education of China under Grant No. 20113305110001, GraduateStudent Scientific Research Innovation Project of Zhejiang Province, the Outstanding(Postgraduate) Dissertation Growth Foundation of Ningbo University (No. PY20110001) andK.C.Wong Magna Fund in Ningbo University.
References
Borghetti, J., Snider, G.S., Kuekes, P.J., Yang, J.J., Stewart, D.R., and Williams, R.S. (2010),
‘Memristive Switches Enable Stateful Logic Operations via Material Implication’, Nature, 464,
873–876.Chen, C. (2007), ‘Reliability-driven Gate Replication for Nanometer-scale Digital Logic’, IEEE
Transactions on Nanotechnology, 6, 303–308.Chen, G., Song, X., and Hu, P. (2009), ‘A Theoretical Investigation on CMOL FPGA Cell
Assignment Problem’, IEEE Transactions on Nanotechnology, 8, 322–329.Chu, Z., Xia, Y., Hung, W.N.N., Wang, L., and Song, X. (2010), ‘A Memetic Approach for
Nanoscale Hybrid Circuit Cell Mapping’, in 13th Euromicro Conference on Digital System
Design, Lille, France, pp. 681–688.Hrkic, M., Lillis, J., and Beraudo, G. (2006), ‘An Approach to Placement-coupled Logic
Replication’, IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, 25, 2539–2551.Hung, W.N.N., Gao, C., Song, X., and Hammerstrom, D. (2008), ‘Defect Tolerant CMOL Cell
Assignment via Satisfiability’, IEEE Sensors Journal, 8, 823–830.Kim, H., and Lillis, J. (2008), ‘A Layout-level Logic Restructuring Framework for LUT-based
FPGAs’, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27,
2120–2132.Likharev, K.K., and Strukov, D.B. (2005), CMOL: Devices, Circuits, and Architectures. Berlin:
Springer.
Schafer, B., and Kim, T. (2009), ‘Autonomous Temperature Control Technique in VLSI Circuits
Through Logic Replication’, IET Computers & Digital Techniques, 3, 62–71.Snider, G.S., and Williams, R.S. (2007), ‘Nano/CMOS Architectures Using a Field-programmable
Nanowire Interconnect’, Nanotechnology, 18, 035204.Strukov, D.B., and Likharev, K.K. (2005), ‘CMOL FPGA: A Reconfigurable Architecture for
Hybrid Digital Circuits with Two-terminal Nanodevices’, Nanotechnology, 16, 888–900.Strukov, D., and Mishchenko, A. (2010), ‘Monolithically Stackable Hybrid FPGA’, in Proceeding of
the Design, Automation and Test in Europe, Dresden, Germany, pp. 661–666.Strukov, D.B., Snider, G.S., Stewart, D.R., and Williams, R.S. (2008), ‘The Missing Memristor
Found’, Nature, 453, 80–83.
Sze, C., Alpert, C., Hu, J., and Shi, W. (2007), ‘Path-based Buffer Insertion’, IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, 26, 1346–1355.
Z. Chu et al.684
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014
Xia, Y., Chu, Z., Hung, W., Wang, L., and Song, X. (2011), ‘An Integrated Optimization Approachfor Nanohybrid Circuit Cell Mapping’, IEEE Transactions on Nanotechnology, 10, 1275–1284.
Yan, H., Choe, H.S., Nam, S., Hu, Y., Das, S., Klemic, J.F., Ellenbogen, J.C., and Lieber, C.M.(2011), ‘Programmable Nanowire Circuits for Nanoprocessors’, Nature, 470, 240–244.
Zhou, H., Wong, D., Liu, I.M., and Aziz, A. (2000), ‘Simultaneous Routing and Buffer InsertionWith Restrictions on Buffer Locations’, IEEE Transactions on Computer-Aided Design ofIntegrated Circuits and Systems, 19, 819–824.
International Journal of Electronics 685
Dow
nloa
ded
by [
Sout
hern
Tai
wan
Uni
vers
ity o
f Sc
ienc
e an
d T
echn
olog
y] a
t 01:
01 2
4 N
ovem
ber
2014