A PLA based Asynchronous Micropipelining Approach for Sub-threshold Circuit Design

Authors: Nikhil Jayakumar*, Rajesh Garg*, Bruce Gamache$, Sunil P. Khatri*
*Department of Electrical Engineering, Texas A&M University. $Conexant Systems, Inc.


Page 1: A PLA based Asynchronous Micropipelining Approach for Sub-threshold Circuit Design

Authors:
Nikhil Jayakumar*
Rajesh Garg*
Bruce Gamache$
Sunil P. Khatri*

*Department of Electrical Engineering, Texas A&M University.
$Conexant Systems, Inc.

Page 2: Outline

  • Motivation
  • Introduction
  • Approach
  • Results
  • Conclusions

Page 3: Sub-threshold Leakage

  • As supply voltage scales down, the V_T of the devices is scaled down as well.
    • Leakage increases exponentially with decreasing V_T.
    • Leakage power is becoming comparable with dynamic power.
  • A larger V_T would reduce leakage but increase delay.
  • We can turn this dilemma into an opportunity!
    • Use sub-threshold leakage current to implement circuits.
    • Set VDD less than V_T.

$$I_{ds}^{sub} = \frac{W}{L}\, I_{D0}\, e^{\frac{V_{gs} - V_T - V_{off}}{n v_t}} \left[ 1 - e^{-\frac{V_{ds}}{v_t}} \right]$$
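To get a feel for the exponential dependence on V_gs, the sub-threshold current expression on this slide can be evaluated numerically. The sketch below is only an illustration: the function name, argument names, and the default thermal voltage of about 25.9 mV (room temperature) are assumptions, and no device parameters from the slides are implied.

```python
import math


def subthreshold_current(w_over_l, i_d0, v_gs, v_th, v_off, n, v_ds,
                         v_thermal=0.0259):
    """Evaluate I_ds = (W/L) * I_D0 * exp((V_gs - V_T - V_off) / (n * v_t))
    * (1 - exp(-V_ds / v_t)).

    v_th is the threshold voltage V_T; v_thermal is the thermal voltage v_t
    (assumed ~25.9 mV, i.e. roughly room temperature).
    """
    gate_term = math.exp((v_gs - v_th - v_off) / (n * v_thermal))
    drain_term = 1.0 - math.exp(-v_ds / v_thermal)
    return w_over_l * i_d0 * gate_term * drain_term
```

With these definitions, raising V_gs by n·v_t multiplies the current by e, and the current vanishes at V_ds = 0.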

Page 4: Advantages of Sub-threshold Circuit Design

  • We performed simulations on a 21-stage ring oscillator (BPTM 65nm).
    • Power is significantly lower (100-500X).
    • Power-delay product (PDP) improves by 10-20X.
  • Transconductance is an exponential function of V_gs.
    • Circuit noise margins are high.
    • I_on/I_off = 100 – 200.
  • Circuits get faster at higher temperature.

Page 5: Disadvantages of Sub-threshold Circuit Design

  • I_ds is highly dependent on PVT variations.
    • Need dynamic compensating circuitry such as the one mentioned in: “A Variation-tolerant Sub-threshold Design Approach”, N. Jayakumar, S. Khatri [DAC’05].
    • Used Adaptive Body Biasing.
  • I_ds is small, which results in large delay.
    • Delay gets worse by 10-25X.
    • Therefore, application space is in very low power applications such as sensor networks.
  • Design methodologies for sub-threshold digital circuit design are ad-hoc.

Page 6: Contribution of this paper

  • Provide a systematic EDA framework for the design of complex digital systems using sub-threshold Network of PLA (NPLA) based circuits.
  • Use asynchronous micropipelining to provide a greater throughput.
    • Ideally suited for data-flow type circuits.

Page 7: Why NPLAs?

  • NPLAs are fast and area-efficient when compared to standard-cell based designs.
    • “Cross-talk immune VLSI design using a Network of PLAs Embedded in a Regular Layout Fabric”, S. Khatri, R. Brayton, A. Sangiovanni-Vincentelli [ICCAD’00]
  • Predictable delay of dynamic PLAs.
    • Good circuit implementation choice for sub-threshold/near-threshold logic.
  • Regular layout structure.
    • Compatible with Restrictive Design Rules (RDRs) required to handle current and future lithographic issues.
  • Technology-independent optimizations (literal reduction) are utilized better.
    • No intervening technology mapping step.
  • Implementing Structured ASICs.
    • An array of fixed-size PLAs is ideally suited for implementing Structured ASIC type designs.
    • “A METAL and VIA Mask Customizable VLSI Design Scheme using an Array of Dynamic PLAs”, N. Jayakumar, S. Khatri [ICCAD’04]

Page 8: PLA structure – Precharged NOR-NOR

[Figure: precharged NOR-NOR PLA, showing the AND plane and the OR plane]

Page 9: PLA structure – Precharged NOR-NOR

  • Inputs run vertically.
  • Wordlines run horizontally.
  • Outputs run vertically.
  • A dummy wordline and a dummy output line are provided for self-timing.

Page 10: PLA structure – Precharged NOR-NOR

[Figure: PLA circuit with the following annotations]
  • completion is the last signal to switch.
  • Input latches to latch data from previous level.

Page 11: Asynchronous Micropipeline Structure

Each PLA has:
  • Data inputs – D (input)
  • Data outputs – O (output)
  • Hand-shaking control signals – P1, P2 (input)
    • Control the asynchronous handshake.
  • PLA evaluation/precharge done signal – completion (output)
    • Switches high when evaluation completes, switches low when precharge completes.
  • Internal clock signal – INTCLK (output)
    • Generated from completion, P1 and P2 to control operation of the PLA.
    • INTCLK = low → PLA precharges
    • INTCLK = high → PLA evaluates

[Figure: PLAs arranged in level 1, level 2, ..., level n]

Page 12: Handshaking Logic

  • PLA p (at level k) precharges (INTCLK goes low) if its P1 rises.
    • PLA q at the next higher level has latched the output data of p.
  • PLA p evaluates (INTCLK goes high) if its P2 rises and its completion signal is low.
    • PLA p is currently in the precharged state (its completion signal is low).
    • PLA r at the next lower level has completed evaluation and has new data ready (P2 for PLA p has risen).
  • Handshaking logic is therefore as shown below:

[Figure: handshaking logic circuit, not reproduced in this transcript]
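The two handshaking rules on this slide can be read as a small behavioural model. The Python sketch below is one such reading and is not the circuit from the slide: the class name `PlaHandshake`, the `update` method, and the internal "evaluation pending" flag (which remembers a P2 rising edge until completion goes low) are all assumptions made for illustration.

```python
class PlaHandshake:
    """Behavioural sketch of INTCLK for one PLA (hypothetical model).

    INTCLK low  -> PLA precharges
    INTCLK high -> PLA evaluates
    """

    def __init__(self):
        self.intclk = False          # all PLAs start precharged
        self._prev_p1 = False
        self._prev_p2 = False
        self._eval_pending = False   # assumption: a P2 edge is remembered

    def update(self, p1, p2, completion):
        """Apply new P1, P2 and completion values; return INTCLK."""
        p1_rose = p1 and not self._prev_p1
        p2_rose = p2 and not self._prev_p2
        self._prev_p1, self._prev_p2 = p1, p2

        if p2_rose:
            self._eval_pending = True
        if p1_rose:
            # The next level has latched our outputs: precharge.
            self.intclk = False
        elif self._eval_pending and not completion:
            # New data is ready and we are precharged: evaluate.
            self.intclk = True
            self._eval_pending = False
        return self.intclk
```

A typical cycle in this model is: P2 rises while completion is low (INTCLK goes high), evaluation finishes (completion high), P1 rises (INTCLK goes low), and the PLA waits for the next P2 edge.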

Page 13: Micro-Pipeline Operation

  • Initially all PLAs are precharged.
  • Drive primary inputs (D of level 1 PLAs).
    • P2 signals of level 1 PLAs are asserted.
  • After evaluation is done, completion signals of level 1 PLAs go high.
    • Therefore level 2 PLAs start evaluating.
  • Data gets latched at the input of level 2 PLAs, and INTCLK of level 2 PLAs goes high.
    • This causes level 1 PLAs to start precharging.
  • When evaluation of level 2 PLAs is done, their completion signals go high.
    • This causes level 3 PLAs to start evaluating.

[Figure: PLAs arranged in level 1, level 2, ..., level n]

Page 14: Micro-Pipeline Operation

  • This goes on till the PLAs at level n finish evaluation (indicated by their completion signal going high).
  • Consumer circuit latches the output and asserts P1 of level n PLAs.
    • This causes level n PLAs to precharge.
  • When completion of level n-1 PLAs goes high and level n PLAs have precharged, then level n PLAs can evaluate again.

[Figure: PLAs arranged in level 1, level 2, ..., level n]

Page 15: Non-micropipelined vs Micropipelined

  • Delay for non-micropipelined NPLA = T_pchg + n × T_eval, where n is the number of PLA levels.
  • Delay of micropipelined PLA = T_eval + T_pchg + handshaking time.

[Figure: PLAs arranged in level 1, level 2, ..., level n]

Page 16: Verilog Simulation of Micropipelining

  • We simulated the handshaking protocol in Verilog.
    • Verified correct operation.
  • If the consumer circuit holds off asserting P1 for level n PLAs, the entire pipeline stalls.
  • Note that when level i is in precharge, level i+1 is in evaluation and vice-versa.

Page 17: Synthesis Algorithm

  • First levelize the given multi-level network N.
  • Generate a DFS of network nodes and sort in increasing order of levels.
  • Greedily include new nodes from the multi-level network into a current PLA.
  • Assume current PLA p has nodes {n} in it. Candidate nodes {m} for inclusion in PLA p are:
    • Nodes in the fanout of nodes in {n}.
    • Nodes at the same level as nodes in {n}.
  • We evaluate the favorability of nodes in {m} as:
    favorability(m) = 2 × (#common fanins(m, {n})) + (#common fanouts(m, {n}))
    • The first term favors sharing of inputs with existing nodes {n}, while the second term favors sharing of outputs.
    • Sharing of inputs was empirically determined to be more useful in yielding smaller PLA counts.
  • We include the node with the highest favorability value.

[Figure: example network graph with numbered nodes]
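The favorability score lends itself to a short sketch. The Python below is a minimal illustration, assuming that "#common fanins(m, {n})" means the number of fanins of m that are also fanins of some node already in the PLA (and likewise for fanouts); the function names and the dictionary-of-sets network representation are hypothetical, not taken from the authors' tool.

```python
def favorability(candidate, pla_nodes, fanins, fanouts):
    """Score = 2 * (#common fanins) + (#common fanouts).

    fanins / fanouts map each node name to a set of signal names.
    """
    pla_fanins = set().union(*(fanins[n] for n in pla_nodes))
    pla_fanouts = set().union(*(fanouts[n] for n in pla_nodes))
    common_fanins = len(fanins[candidate] & pla_fanins)
    common_fanouts = len(fanouts[candidate] & pla_fanouts)
    return 2 * common_fanins + common_fanouts


def most_favorable(candidates, pla_nodes, fanins, fanouts):
    """Pick the candidate with the highest favorability value."""
    return max(
        candidates,
        key=lambda m: favorability(m, pla_nodes, fanins, fanouts),
    )
```

In this sketch a candidate that shares two inputs with the current PLA scores 4, while one that shares a single output scores 1, reflecting the heavier weight on input sharing.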

Page 18: Synthesis Algorithm

  • Current PLA p is grown until it violates size constraints.
    • Nodes {n} in the current PLA are converted into a two-level network N.
    • We run espresso on N.
    • If the number of inputs, outputs and height of this two-level network are bounded, then PLA p is grown.
    • If not, then we start growing a new PLA.
  • Build a PLA dependency graph.
    • Each vertex corresponds to a unique PLA.
    • Each edge connects the output of a PLA to the input of another PLA.
  • Nodes being included in current PLA p are constrained by the following:
    • The node being included should not violate size constraints of a PLA.
    • The inclusion of this node should not result in a cyclic PLA dependency graph.
    • If such a node is not available, pick the next most favorable node.

[Figure: example network graph with numbered nodes]
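The acyclicity constraint on the PLA dependency graph can be checked with a standard depth-first search. The sketch below is a generic cycle test and is not taken from the authors' implementation; the function name and the adjacency-dictionary representation (PLA name mapped to the set of PLAs it drives) are assumptions.

```python
def has_cycle(dep_graph):
    """Return True if the directed PLA dependency graph contains a cycle.

    dep_graph maps each PLA to the set of PLAs driven by its outputs.
    """
    WHITE, GRAY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {v: WHITE for v in dep_graph}

    def visit(v):
        color[v] = GRAY
        for w in dep_graph.get(v, ()):
            state = color.get(w, WHITE)
            if state == GRAY:          # back edge: w is on the current path
                return True
            if state == WHITE and visit(w):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in list(dep_graph))
```

A synthesis loop could tentatively add the edges implied by a candidate node, call `has_cycle`, and fall back to the next most favorable node when the result is True.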

Page 19: Synthesis Algorithm

  • After synthesis, the output of a PLA at level i may drive PLAs at level > i+1.
    • Such a case will cause micropipelining to fail.
  • Insert stutter blocks for signals which traverse one or more levels of PLAs.
    • Stutter blocks are banks of latches to delay signals which traverse more than one level of PLAs.
    • Multiple stutter blocks are inserted for signals traversing multiple levels.

[Figure: example with PLA1 through PLA5 and a stutter block]
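One way to read the stutter-block rule is that a signal going from level i to level j needs one latch stage for every level it skips. The sketch below encodes that reading; it is an assumption about the rule rather than the authors' code, and the function name is hypothetical.

```python
def stutter_stages(src_level, dst_level):
    """Latch stages needed on a signal from src_level to dst_level.

    Assumes one stage per skipped level, so a signal driving the
    immediately following level (dst = src + 1) needs none.
    """
    if dst_level <= src_level:
        raise ValueError("destination must be at a higher level than source")
    return dst_level - src_level - 1
```

For example, under this assumption a level-1 output feeding a level-3 PLA passes through one stutter stage, and one feeding a level-5 PLA passes through three.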

Page 20: Experiments

  • 65nm technology.
  • VDD = 0.2V
  • PLA size: 16 inputs, 14 outputs, 24 rows
  • Delay and energy results from SPICE using 65nm BPTM model cards.
  • Comparison made with non-micropipelined PLA.
  • Throughput of PLA = 1/(T_eval + T_pchg + 2·H_eval + H_pchg)
    • T_eval = Evaluation time for a PLA (~210ns)
    • T_pchg = Precharge time for a PLA (~155ns)
    • H_eval = Handshake time before start of evaluation (~60ns)
    • H_pchg = Handshake time before start of precharge (~25ns)
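Plugging the quoted timing values into the delay expressions gives a quick sanity check. The sketch below assumes the throughput denominator reads T_eval + T_pchg + 2·H_eval + H_pchg and uses the non-micropipelined delay T_pchg + n × T_eval; the constant and function names are illustrative, and the level count of 13 in the usage note is just an example value.

```python
# Approximate timing values quoted on the slide, in nanoseconds.
T_EVAL_NS = 210   # evaluation time for a PLA
T_PCHG_NS = 155   # precharge time for a PLA
H_EVAL_NS = 60    # handshake time before start of evaluation
H_PCHG_NS = 25    # handshake time before start of precharge


def non_micropipelined_delay_ns(levels):
    """Delay of a non-micropipelined NPLA with the given number of levels."""
    return T_PCHG_NS + levels * T_EVAL_NS


def micropipelined_delay_ns():
    """1/throughput of the micropipelined design (independent of levels)."""
    return T_EVAL_NS + T_PCHG_NS + 2 * H_EVAL_NS + H_PCHG_NS


def speedup(levels):
    """Ratio of non-micropipelined delay to micropipelined delay."""
    return non_micropipelined_delay_ns(levels) / micropipelined_delay_ns()
```

With these numbers the micropipelined delay is 510 ns regardless of depth, while a 13-level non-micropipelined network takes 2885 ns, a ratio of about 5.66.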

Page 21: Results - Delay

Ckt      #PLAs   # Stutter Blks   Delay (ns) ↓
                                  Non-µpipe   µpipe   Impr.
alu4     14      5                2885        510     5.66 X
apex6    24      12               2465        510     4.83 X
C432     11      4                2255        510     4.42 X
C499     14      4                2255        510     4.42 X
C880     16      5                2255        510     4.42 X
C1355    21      10               3305        510     6.48 X
C1908    24      13               3935        510     7.72 X
C2670    34      13               3515        510     6.89 X
C3540    67      46               7505        510     14.72 X
pair     65      35               4565        510     8.95 X
rot      19      13               3095        510     6.07 X
Avg      28.09   14.55                                6.78 X

  • Delay = 1/throughput for micropipelined.
  • Delay is constant since PLA size is fixed.

Page 22: Results – Area

Ckt      #PLAs   # Stutter Blks   Area (µm²) ↑
                                  Non-µpipe   µpipe   Ovh.
alu4     14      5                9408        12768   1.36 X
apex6    24      12               16128       24192   1.5 X
C432     11      4                7392        10080   1.36 X
C499     14      4                9408        12096   1.29 X
C880     16      5                10752       14112   1.31 X
C1355    21      10               14112       20832   1.48 X
C1908    24      13               16128       24864   1.54 X
C2670    34      13               22848       31584   1.38 X
C3540    67      46               45024       75936   1.69 X
pair     65      35               43680       67200   1.54 X
rot      19      13               12768       21504   1.68 X
Avg      28.09   14.55                                1.47 X

  • Area estimates based on layout of PLAs along with stutter blocks.
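The area figures in this table appear to be consistent with a fixed footprint of 672 square microns for each PLA and each stutter block. That per-block figure is inferred from the numbers and is not stated on the slide; the short check below, with illustrative names, reproduces the table from it.

```python
# Inferred from the table (e.g. 9408 / 14), not stated in the slides.
BLOCK_AREA = 672

# (circuit, #PLAs, #stutter blocks, non-micropipelined area, micropipelined area)
AREA_ROWS = [
    ("alu4", 14, 5, 9408, 12768),
    ("apex6", 24, 12, 16128, 24192),
    ("C432", 11, 4, 7392, 10080),
    ("C499", 14, 4, 9408, 12096),
    ("C880", 16, 5, 10752, 14112),
    ("C1355", 21, 10, 14112, 20832),
    ("C1908", 24, 13, 16128, 24864),
    ("C2670", 34, 13, 22848, 31584),
    ("C3540", 67, 46, 45024, 75936),
    ("pair", 65, 35, 43680, 67200),
    ("rot", 19, 13, 12768, 21504),
]


def row_is_consistent(plas, stutters, non_pipe_area, pipe_area):
    """Check both area columns against the fixed per-block footprint."""
    return (non_pipe_area == plas * BLOCK_AREA
            and pipe_area == (plas + stutters) * BLOCK_AREA)


def area_overhead(plas, stutters):
    """Micropipelined area divided by non-micropipelined area."""
    return (plas + stutters) / plas
```

Under this reading the area overhead of a circuit is simply (#PLAs + #stutter blocks) / #PLAs, for example 19/14 ≈ 1.36 for alu4.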

Page 23: What about Energy consumption?

  • Non-micropipelined NPLAs precharge together and then evaluate in a domino fashion.
    • Energy is wasted due to leakage in the “Precharged” and the “Evaluated” states.
  • Micropipelined PLAs spend little time in the “Precharged” or “Evaluated” states.

[Figure: Timing Diagram for a non-micropipelined NPLA]

Page 24: Results – Energy

Ckt      #PLAs   # Stutter Blks   Energy (fJ) ↓
                                  Non-µpipe   µpipe     Impr.
alu4     14      5                5984.8      1811.43   3.3
apex6    24      12               9033.09     3261.19   2.77
C432     11      4                3877.22     1397      2.78
C499     14      4                4961.02     1768.64   2.8
C880     16      5                6088.11     2052.22   2.97
C1355    21      10               10198.86    2863.68   3.56
C1908    24      13               13814.19    3307.96   4.18
C2670    34      13               18694.33    4472.11   4.18
C3540    67      46               73900.56    9777.18   7.56
pair     65      35               44442.77    9047.27   4.91
rot      19      13               8966.68     2774.15   3.23
Avg      28.09   14.55                                  3.84

  • Results show energy consumption for one computation through the NPLA circuit.
  • Significant reduction in energy consumption is observed.

Page 25: Conclusions

  • We have proposed an asynchronous micropipelined design approach that reclaims some of the speed penalty associated with subthreshold circuit design.
    • Ideally suited for data-flow type applications.
  • We implemented:
    • Handshaking protocol for micropipelining.
    • Circuit design aspects of the approach.
    • Logic synthesis for micropipelined NPLAs.
  • We validated the approach with Verilog and SPICE simulations.
  • Results show that:
    • Design can be sped up by ~7X.
    • Area overhead is ~47%.
    • Energy consumption is lower by ~4X.
  • Techniques described can be used for regular operating conditions (VDD > V_T) as well.

Page 26: Thank you.

Questions?