Elastic circuits Jordi Cortadella Universitat Politècnica de Catalunya, Barcelona EMicro 2013


Goals

• Convince ourselves that:
– designing an asynchronous circuit is easy
– synchronous and asynchronous circuits are similar
– asynchronous circuits bring new advantages

• Not to cover exotic asynchronous schemes

• Elasticity can also be synchronous


Clocking

Nvidia Kepler™ GK110: 28 nm, 7.1B transistors, 550 mm², 2688 CUDA cores, base clock 836 MHz, memory clock 6 GHz

• How to distribute the clock?
• How to determine the clock frequency?
• How to implement robust communications?
• How to reduce and manage energy?


Outline

• Synchronous and Source-synchronous circuits
• Completion detection
• Handshaking
• Performance analysis
• Why asynchronous?
• Design automation
• Synchronous elasticity
• Globally-asynchronous Locally-synchronous

Synchronous and Source-Synchronous

Synchronous circuit

[Figure: synchronous circuit with a PLL, a clock tree and a block of combinational logic (CL)]

Two competing paths:
• Launching path
• Capturing path

Launching path < Capturing path + Period

CLKtree + CL < CLKtree + Period

CL < Period (no clock skew)
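The constraint above can be checked numerically. The following is a minimal sketch, not part of the original slides: the function name `meets_setup` and the delay values are assumptions made for illustration, with all delays expressed in the same time unit.

```python
def meets_setup(clk_to_launch, comb_delay, clk_to_capture, period):
    """Check: launching path < capturing path + period."""
    launching_path = clk_to_launch + comb_delay   # CLKtree + CL
    capturing_path = clk_to_capture               # CLKtree
    return launching_path < capturing_path + period


# Hypothetical delays in ns. Both registers see the same clock-tree delay
# (no skew), so the check reduces to CL < Period.
print(meets_setup(clk_to_launch=0.3, comb_delay=0.9, clk_to_capture=0.3, period=1.0))

# Same logic, but the capturing clock arrives 0.2 ns earlier (skew).
print(meets_setup(clk_to_launch=0.3, comb_delay=0.9, clk_to_capture=0.1, period=1.0))
```

The second call shows how skew on the capturing path can break a design whose combinational delay alone fits in the period.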


Source-synchronous

[Figure: CLKgen followed by a chain of matched delays, with the launching and capturing paths marked]

• No global clock required

• More tolerance to PVT variations

• Period > longest combinational path

• Good for acyclic pipelines


Source-synchronous with forks and joins

How to synchronize incoming events?

C element (Muller 1959)

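[Figure: C element with inputs A and B and output C]

A B | C
0 0 | 0
0 1 | C
1 0 | C
1 1 | 1

(A "C" in the output column means that the output keeps its previous value.)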

C element (Muller 1959)

[Figure: C element implemented with a majority gate (MAJ); the truth table is the same as before]

(many implementations exist)
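The truth table of the C element can be modelled in a few lines. This is a behavioural sketch only, assuming the majority-gate view MAJ(A, B, C) with the output fed back as the third input; the names `c_element` and `run` are made up for the example.

```python
def c_element(a, b, prev):
    """Next output of a C element, modelled as MAJ(a, b, prev)."""
    return 1 if a + b + prev >= 2 else 0


def run(inputs, c=0):
    """Apply a sequence of (a, b) pairs and collect the output after each one."""
    trace = []
    for a, b in inputs:
        c = c_element(a, b, c)
        trace.append(c)
    return trace


# The output rises only once both inputs are 1 and falls only once both are 0;
# in between it holds its previous value.
print(run([(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]))
```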

Completion detection

Completion detection

[Figure: CLKgen with a fixed delay]

The fixed delay must be longer than the worst-case logic delay (plus variability).

Q: could we detect when a computation has completed ASAP?

Delay-insensitive codes: Dual Rail

• Dual rail: every bit encoded with two signals

A.t A.f | A
0   0   | Spacer
0   1   | 0
1   0   | 1
1   1   | Not used

[Figure: waveforms of A.t and A.f for the sequence A = 1, SP, 0, SP, 1, SP, 1, SP]

Dual Rail AND gate

A  B  | C
SP SP | SP
0  -  | 0
-  0  | 0
SP 1  | SP
1  SP | SP
1  1  | 1

[Figure: dual-rail AND gate with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f]
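The table of the dual-rail AND gate can be reproduced with one AND and one OR on the rails. The sketch below assumes the simple implementation C.t = A.t AND B.t, C.f = A.f OR B.f, which matches every row of the table; the constant and function names are illustrative, not taken from the slides.

```python
# (t, f) encodings, following the dual-rail code table
SP = (0, 0)    # spacer
ZERO = (0, 1)  # logic 0
ONE = (1, 0)   # logic 1


def dr_and(a, b):
    """Dual-rail AND: true rail is an AND, false rail is an OR."""
    (a_t, a_f), (b_t, b_f) = a, b
    return (a_t & b_t, a_f | b_f)


NAMES = {SP: "SP", ZERO: "0", ONE: "1"}
for a in (SP, ZERO, ONE):
    for b in (SP, ZERO, ONE):
        print(NAMES[a], NAMES[b], "->", NAMES[dr_and(a, b)])
```

Note that the output becomes a valid 0 as soon as one input is 0, even if the other input is still a spacer, as in the "0 -" and "- 0" rows.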

Dual Rail Inverter

A  | Z
SP | SP
0  | 1
1  | 0

[Figure: dual-rail inverter; the rails are swapped, so Z.t = A.f and Z.f = A.t]

Dual Rail AND/OR gate

[Figure: a dual-rail AND gate (A.t, A.f, B.t, B.f, C.t, C.f) and the same structure with the t and f rails of A, B and C swapped, which gives the dual-rail OR gate]

Dual rail: completion detection

[Figure: the outputs of the dual-rail logic feed a completion detection tree that ends in a C element producing the done signal]

Multi-input C element

[Figure: a tree of six two-input C elements combines inputs a1 to a7 into output c]
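Completion detection with a multi-input C element can be sketched behaviourally as follows. This is an assumption-laden model, not the circuit from the slides: each bit is reported valid when either rail is high, and the tree of C elements is collapsed into a single function `multi_c` that ignores the internal state of the tree. All names are hypothetical.

```python
def multi_c(inputs, prev):
    """Multi-input C element: 1 when all inputs are 1, 0 when all are 0, else hold."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return prev


def completion(word, prev_done):
    """word is a list of (t, f) rail pairs; a bit is valid when t OR f is high."""
    valid_bits = [t | f for t, f in word]
    return multi_c(valid_bits, prev_done)


done = 0
for word in (
    [(0, 0), (0, 0)],  # all spacers
    [(1, 0), (0, 0)],  # first bit has arrived
    [(1, 0), (0, 1)],  # both bits valid: computation complete
    [(0, 0), (0, 1)],  # reset phase in progress: done still held
    [(0, 0), (0, 0)],  # all spacers again
):
    done = completion(word, done)
    print(word, "done =", done)
```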

Dual rail: completion detection

[Figure: a dual-rail netlist (AND, OR, INV, AND) driven by CLKgen; in the second version a C element is added at CLKgen]

Dual rail: operation

[Figure: the same netlist (AND, OR, INV, AND) alternating between Reset and Compute phases]

For correct operation, all internal signals should be reset before the compute phase:
• Use a more complex implementation of dual-rail (e.g., DIMS), or
• Have internal completion detection, or
• Use timing assumptions

Other DI codes

• There are many DI codes:
– k-out-of-n, Berger, Knuth, …
• Example: 1-out-of-4
– 2 bits with 4 wires
– Same wire efficiency as DR
– Less power consuming
– Good for communication
– Bad for logic

Wires  | Value
0000   | Spacer
0001   | 0
0010   | 1
0100   | 2
1000   | 3
others | not used
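The 1-out-of-4 table maps directly onto a one-hot encoding. The sketch below is illustrative only (the function names are assumptions); it represents the four wires as a 4-bit integer. One reason for the lower power is visible in the encoding: sending 2 bits raises a single wire, whereas dual rail raises two.

```python
def encode_1of4(value):
    """Return the 4-wire code word (as an int) for a value from 0 to 3."""
    if value not in (0, 1, 2, 3):
        raise ValueError("a 1-out-of-4 symbol carries a value from 0 to 3")
    return 1 << value


def decode_1of4(wires):
    """Return the value, None for the spacer, or raise on an unused code word."""
    if wires == 0b0000:
        return None
    if wires in (0b0001, 0b0010, 0b0100, 0b1000):
        return wires.bit_length() - 1
    raise ValueError("not a 1-out-of-4 code word")


for v in range(4):
    print(v, format(encode_1of4(v), "04b"))
```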

Single rail data vs. dual rail

Some back-of-the-envelope estimations:

              | Single rail | Dual rail
Area          | 1           | 2
Delay         | 1           | << 1
Static power  | 1           | 2
Dynamic power | < 0.2       | 2

Dual rail:
• Good for speed
• Large area
• High power consumption

Handshaking

Handshaking

[Figure: CLKgen driving a circuit with unknown delay]

Assume that the source module can provide data at any rate:
• When should the CLK generator send an event if the internal delays of the circuit are unknown?

Solution: handshaking

Handshaking

[Figure: a sender ("I have data") and a receiver ("I want data") exchange Data, Request and Acknowledge signals]

Asynchronous elastic pipeline

[Figure: a chain of C elements with ReqIn/AckIn on one side and ReqOut/AckOut on the other]

• David Muller's pipeline (late 50's)
• Sutherland's Micropipelines (Turing Award lecture, 1989)
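The control part of such a pipeline can be simulated with the C element alone. The sketch below is a simplified model under stated assumptions: data latches are omitted, each C element sees the output of its left neighbour and the inverted output of its right neighbour, `req_in` comes from the sender and `ack_in` from the receiver. The names `settle`, `req_in` and `ack_in` are chosen for the example and do not necessarily match the signal names in the figure.

```python
def c_element(a, b, prev):
    """C element modelled as a majority gate with feedback."""
    return 1 if a + b + prev >= 2 else 0


def settle(stages, req_in, ack_in):
    """Update the C elements until no output changes and return the new state."""
    stages = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages)):
            left = stages[i - 1] if i > 0 else req_in
            right = stages[i + 1] if i + 1 < len(stages) else ack_in
            new = c_element(left, 1 - right, stages[i])
            if new != stages[i]:
                stages[i] = new
                changed = True
    return stages


# The receiver never acknowledges (ack_in stays 0) while the sender toggles req_in.
s1 = settle([0, 0, 0], req_in=1, ack_in=0)  # rising event travels to the last stage
s2 = settle(s1, req_in=0, ack_in=0)         # falling event stops behind it
s3 = settle(s2, req_in=1, ack_in=0)         # next rising event enters the first stage only
print(s1, s2, s3)
```

In the last state a further change of `req_in` has no effect until the receiver acknowledges: the events are stored in the pipeline, which is the elastic behaviour the slide refers to.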

Multiple inputs and outputs

[Figure: a stage with multiple input and output channels; a matched delay and a C element combine the Req and Ack signals of the channels]

Channel-based communication

• A channel contains data and handshake wires

[Figure: a single-rail channel with Data, Req and Ack wires; a dual-rail channel with Data and Ack wires]

Push/pull channels

• Push: the sender initiates the communication
• Pull: the receiver initiates the communication

[Figure: sender and receiver connected by a push channel (single-rail Data, Req and Ack) and by a pull channel (single-rail Data, Ack and Req)]

Four-phase protocol

• Valid data on the active edge of Req
• Req/Ack must return to zero before the next transfer
• Different variations of the 4-phase protocol exist

[Figure: waveforms of Req, Ack and Data (Data 1, Data 2, Data 3) with the data transfers marked]

Two-phase protocol

• Every edge is active
• It may require double-edge triggered flip-flops or pulse generators

[Figure: waveforms of Req, Ack and Data (Data 1, Data 2, Data 3) with the data transfers marked]
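The difference between the four-phase and two-phase protocols shows up when the Req/Ack events are listed per transfer. The following sketch assumes a push channel in which each Req event is answered by one Ack event; the function names and the event format are illustrative only.

```python
def four_phase(transfers):
    """Req+, Ack+, Req-, Ack- for every transfer (return to zero)."""
    events = []
    for _ in range(transfers):
        events += [("Req", 1), ("Ack", 1), ("Req", 0), ("Ack", 0)]
    return events


def two_phase(transfers):
    """One Req toggle and one Ack toggle for every transfer."""
    req = ack = 0
    events = []
    for _ in range(transfers):
        req ^= 1
        events.append(("Req", req))
        ack ^= 1
        events.append(("Ack", ack))
    return events


n = 3
print(len(four_phase(n)), "signal transitions with four-phase")
print(len(two_phase(n)), "signal transitions with two-phase")
```

The two-phase protocol needs half as many transitions, at the cost of circuitry that reacts to both edges.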

How to memorize?

[Figure: two latches (L) around a block of combinational logic, with a matched delay and C elements in the control path. How should the latches be controlled: 2-phase or 4-phase?]

[Figure: 2-phase version, with a pulse generator]

[Figure: 4-phase version]

Performance analysis

Ring oscillators

[Figure: a network of C elements forming several rings, with stages numbered 1 to 7]

• Every ring requires an odd number of inverters
• The cycle period is determined by the slowest ring
• The cycle period is adapted to the operating conditions (temperature, voltage)

Global Rings

[Figure: a ring of six C elements shown with different numbers of tokens; the throughput is Th = 1 / 6, then Th = 2 / 6, then Th = 3 / 6, and Th = 1 / 6 again in the last configuration]

[Figure: throughput Th versus number of tokens, from 0 to N; Th peaks at 1/2 with N/2 tokens. The ring is token-limited below the peak and bubble-limited above it]

• Ramamoorthy and Ho, Performance evaluation of asynchronous concurrent systems with Petri nets, 1980
• T. Williams et al., A self-timed chip for division, 1987
• Greenstreet and Steiglitz, Bubbles can make self-timed pipelines fast, 1990
• Manohar and Martin, Slack elasticity in concurrent computing, 1998
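The throughput values on these slides are consistent with a simple model in which a ring of N stages is limited either by its tokens or by its bubbles. The sketch below assumes unit delay per stage and equal forward and backward latencies; the function name `ring_throughput` is made up for the example.

```python
from fractions import Fraction


def ring_throughput(stages, tokens):
    """Throughput of a ring: min(tokens, bubbles) / stages."""
    if not 0 <= tokens <= stages:
        raise ValueError("tokens must be between 0 and the number of stages")
    bubbles = stages - tokens
    return Fraction(min(tokens, bubbles), stages)


# Ring of six stages: token-limited up to N/2 tokens, bubble-limited beyond.
for tokens in range(7):
    print(tokens, "tokens ->", ring_throughput(6, tokens))
```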

A latch-based view of synchronous circuits

Flip-flop = Master + Slave

Multiple Rings

[Figure: several coupled rings annotated 2 / 4, 2 / 4 and 2 / 5]

5 / 7? No, it is bubble-limited: 2 / 7

Slack matching

[Figure: the coupled rings annotated 2 / 4, 2 / 4 and 2 / 5; the ring that was limited to 2 / 7 reaches 4 / 9 after adding bubbles]

• We can add as many bubbles as we want (but not tokens!)
• Slack matching can be solved optimally in polynomial time
• Slack matching is conceptually equivalent to buffer (FIFO) sizing or recycling
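The effect of slack matching can be illustrated with the numbers on the slide. The sketch below rests on several assumptions: each annotation is read as a ring with a given number of stages and tokens, ring throughput is min(tokens, bubbles) / stages with unit stage delays, the slowest ring sets the system throughput, and the added bubbles sit on a segment that belongs only to the slow ring. The ring list and function names are illustrative.

```python
from fractions import Fraction


def ring_throughput(stages, tokens):
    """min(tokens, bubbles) / stages, assuming unit delay per stage."""
    return Fraction(min(tokens, stages - tokens), stages)


def system_throughput(rings):
    """rings is a list of (stages, tokens); the slowest ring sets the pace."""
    return min(ring_throughput(n, k) for n, k in rings)


before = [(4, 2), (4, 2), (5, 2), (7, 5)]  # last ring: 5 tokens, only 2 bubbles
after = [(4, 2), (4, 2), (5, 2), (9, 5)]   # two bubbles added to the last ring

print("before:", system_throughput(before))
print("after: ", system_throughput(after))
```

After the change the 7-stage ring is no longer the bottleneck; the system is then limited by the ring annotated 2 / 5.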

Performance analysis

[Figure: network of C elements; its performance is determined by the mean cycle ratio]

Latch-based design

[Figure: a pipeline of latches L1 to L4 and the waveforms of their control signals, with the launching and capturing paths marked]

Matched delays can be adjustable

[Figure: latches L1 to L4 with matched delays controlled by a delay selection input]

Delays can be adjusted:
• At testing/boot time (to adjust to static variability)
• At runtime (to compensate dynamic variability)

Why asynchronous?

Exploiting elasticity

[Figure: a rigid clock (CLK) fixes one operating point; elasticity spans the range between high performance and low energy]

[Figure: voltage (0.7 V, 0.8 V, 0.9 V, 1 V) versus performance (500 MHz, 1 GHz, 2 GHz); the rigid clock stays at 1 V, while voltage scaling moves the circuit between high performance and low energy]

Voltage scaling and power savings

[Figure: power savings of -24% and -14%; 3 ARM926 cores on the same die]

Tracking variability

[Figure: delay of the critical paths and of a multi-corner matched delay at the best, typical and worst corners]

Good correlation for:
• Process variability (systematic)
• Global voltage fluctuations
• Temperature
• Aging (partially)

Margins

[Figure: breakdown of the cycle period. Rigid clocks: gate and wire delays (typ) plus margins for P, V, T, aging, PLL jitter and skew. Elastic clocks: gate and wire delays (typ) plus margins for P, V, T, aging and skew. The margin reduction gives speed-up / power savings]

Clock elasticity

[Figure: with a rigid clock, every cycle period contains computation time and wasted time; with an elastic clock, the cycle period follows the computation time]

Design Automation

Design automation paradigms

• Synthesis of asynchronous controllers
– Logic synthesis from Petri nets or asynchronous FSMs
• Syntax-directed translation
– Correct-by-construction composition of handshake components
• De-synchronization
– Automatic transformation from synchronous to asynchronous

Synthesis of asynchronous controllers

[Figure: a VME bus controller between the bus (DSr, DSw, DTACK) and the device (LDS, LDTACK), with signal D going to the data transceiver]

[Figure: read cycle waveforms of DSr, LDS, LDTACK, D and DTACK]

Synthesis of asynchronous controllers

[Figure: VME bus controller with signals LDS, LDTACK, D, DSr and DTACK, and its Signal Transition Graph over the transitions DSr+, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS- and LDTACK-]

Synthesis of asynchronous controllers

[Figure: gate-level circuit for the controller (signals DSr, LDTACK, LDS, D and DTACK) next to the same Signal Transition Graph]

Cortadella et al., Petrify

Syntax-directed translation

[Figure: handshake circuit obtained from the program below, built from components such as SEQ, do, RWMUX and DMX, the variables x and y, and the operators -, <> and <]

int = type [0..255]
& gcd: main proc (in? chan <<int,int>> & out! chan int)
begin
  x, y: var int
|
  forever do
    in?<<x,y>>
  ; do x <> y then
      if x < y then y:=y-x else x:=x-y fi
    od
  ; out!x
  od
end
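For readers unfamiliar with the notation, the inner loop of the program is the subtractive GCD algorithm. The following Python function is a sequential reference sketch, not the output of any synthesis tool; the name `gcd_subtractive` and the input check are assumptions (the check reflects the [0..255] type and the fact that the loop would not terminate with a zero operand).

```python
def gcd_subtractive(x, y):
    """Greatest common divisor by repeated subtraction, as in the loop above."""
    if not (1 <= x <= 255 and 1 <= y <= 255):
        raise ValueError("operands are assumed to be in the range 1..255")
    while x != y:       # do x <> y then ...
        if x < y:
            y = y - x   # y := y - x
        else:
            x = x - y   # x := x - y
    return x            # out!x


print(gcd_subtractive(12, 18))
```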

Sources:

J. Kessels and A. Peeters. DESCALE: A Design Experiment for a SmartCard Application Consuming Low Energy, in Principles of Asynchronous Circuit Design: A Systems Perspective, Eds. J. Sparso and S. Furber, Kluwer Academic Publishers, 2001.

P. A. Beerel, R. O. Ozdag and M. Ferretti. A Designer's Guide to Asynchronous VLSI, Cambridge University Press, 2010.

De-synchronization

• Strategy: replace the clock tree with local clocks and handshakes
• Combinational logic and latches are not modified
• More tolerance to variability
– Similar area, less power and/or more speed
• Cortadella, Kondratyev, Lavagno and Sotiriou. Desynchronization: Synthesis of asynchronous circuits from synchronous specifications. IEEE TCAD, Oct 2006.

Synchronous operation

[Figure: a synchronous circuit driven by CLKgen]

Transforming a synchronous circuit into asynchronous (automatically)

De-synchronization

[Figure: the circuit after de-synchronization]

System-level de-synchronization

[Figure: de-synchronization of a system with a global CLK, shown in three steps]

Synchronous elasticity

Different flavors of elasticity

[Figure: an adder that consumes the streams 1, 4, 7, … and 2, 0, 1, … and produces 3, 4, 8, … in three flavors: Rigid (+), Elastic (+e) and Synchronous Elastic (+s)]

Carloni et al., Latency-insensitive systems.

Asynchronous elasticity

[Figure: stages communicating through req/ack handshakes]

Synchronous elasticity

[Figure: stages communicating through valid/stop signals; the clock CLK comes from a PLL or a ring oscillator]

Latch-based elasticity

[Figure: a chain of latches between sender and receiver; each stage has a valid bit (V) and a latch enable (En), and the channels carry Data, Valid and Stop]
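The Valid/Stop handshake can be explored with a small cycle-level model. The sketch below is not the latch-based circuit of the figure: it assumes the usual convention that a transfer happens in a cycle where Valid is high and Stop is low, and it models one elastic buffer as a FIFO of capacity 2. The class `ElasticBuffer` and the helper `run` are hypothetical names.

```python
from collections import deque


class ElasticBuffer:
    """Cycle-level model of a buffer with Valid/Stop on both sides."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self.queue = deque()

    def cycle(self, valid_in, data_in, stop_in):
        """Advance one clock cycle; return (stop_out, valid_out, data_out)."""
        stop_out = len(self.queue) == self.capacity  # back-pressure to the sender
        valid_out = len(self.queue) > 0
        data_out = self.queue[0] if valid_out else None
        if valid_out and not stop_in:                # output transfer
            self.queue.popleft()
        if valid_in and not stop_out:                # input transfer
            self.queue.append(data_in)
        return stop_out, valid_out, data_out


def run(items, receiver_stops):
    """Send items through one buffer; receiver_stops[c] is Stop in cycle c."""
    eb, pending, received = ElasticBuffer(), deque(items), []
    for stop_in in receiver_stops:
        valid_in = len(pending) > 0
        data_in = pending[0] if valid_in else None
        stop_out, valid_out, data_out = eb.cycle(valid_in, data_in, stop_in)
        if valid_in and not stop_out:
            pending.popleft()
        if valid_out and not stop_in:
            received.append(data_out)
    return received


# The receiver stalls in cycles 1 and 2; no item is lost or duplicated.
print(run([1, 2, 3], [False, True, True, False, False, False]))
```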

Elastic netlists

[Figure: an elastic netlist built from elastic buffers (EB) and Fork, Join and Join / Fork controllers; the controllers generate the enable signals to the data latches]

Variable Latency Units

[Figure: variable-latency units taking [0 - k] cycles, controlled by go, done and clear signals, between valid/stop (V/S) interfaces]

Globally-asynchronous Locally-synchronous (GALS)

SoC design with GALS

• Most IPs are synchronous
• Different components may have different operating frequencies
• Some components have variable latencies (e.g., cache hit/miss latency)
• Multiple clock domains are essential

[Figure: SoC with DSP, P and Mem blocks on a fast bus and a slow bus, with bridges and clock domain crossings (CDC) between the CLK1, CLK2 and CLK3 domains]

Multiple clock domains

[Figure: three clocking schemes: single clock CLK (mesochronous); rational clock frequencies f1/f0, f2/f0 and f3/f0 derived from CLK(f0); independent clocks CLK0 to CLK3. One annotation reads "(controllable skew)"]

Synchronous handshakes

[Figure: a sender clocked by CLK1 and a receiver clocked by CLK2, connected by Data, Valid and Ack]

• The arrival of data is unpredictable
• Handshakes solve the problem

The problem: metastability

[Figure: a flip-flop clocked by ФT drives a flip-flop clocked by ФR; when D changes within the setup/hold window of ФR, the value of Q is unknown]

How long does it take to resolve metastability?

[Figure: metastability]

MTBF: Mean Time Between Failures

Classical synchronous solution

[Figure: a chain of four flip-flops crossing from the ФT clock domain to the ФR clock domain]

$$\mathrm{MTBF} = \frac{e^{t_r/\tau}}{W \cdot f_\Phi \cdot f_D}$$

• MTBF: Mean Time Between Failures
• f_Ф: frequency of the clock
• f_D: frequency of the data
• t_r: resolve time available
• W: metastability window
• τ: resolve time constant

Example:

# FFs | MTBF
1 FF  | 15 min
2 FF  | 9 days
3 FF  | 23 years
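The exponential dependence of the MTBF on the resolve time explains why each extra flip-flop helps so much. The sketch below uses the common form MTBF = e^(t_r/τ) / (W · f_Ф · f_D) with the symbols defined on the slide. All parameter values are hypothetical and are not the ones behind the example table; the sketch also assumes that each flip-flop in the chain contributes about one clock period of resolve time.

```python
import math


def mtbf_seconds(t_r, tau, window, f_clk, f_data):
    """MTBF = exp(t_r / tau) / (window * f_clk * f_data), in seconds."""
    return math.exp(t_r / tau) / (window * f_clk * f_data)


# Hypothetical parameters, for illustration only.
F_CLK = 1e9        # receiver clock frequency (Hz)
F_DATA = 1e8       # data change frequency (Hz)
TAU = 50e-12       # resolve time constant (s)
WINDOW = 50e-12    # metastability window (s)
PERIOD = 1 / F_CLK

for n_ffs in (1, 2, 3):
    t_r = n_ffs * PERIOD  # assumed: one clock period of resolve time per flip-flop
    print(n_ffs, "FF:", mtbf_seconds(t_r, TAU, WINDOW, F_CLK, F_DATA), "s")
```

Under these assumptions every added flip-flop multiplies the MTBF by e^(period/τ), which is the kind of growth seen between the rows of the example table.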

Handshake with synchronizers

[Figure: a sender clocked by CLK1 and a receiver clocked by CLK2, connected by Data, Valid and Ack, with synchronizers on the handshake signals]

• Simple solution
• Throughput can be highly degraded: a long round trip for every transaction

Asynchronous FIFOs

[Figure: a circular buffer with FIFO control between the Clk In and Clk Out domains; each side has Data, Valid and Ack]

• Ack is issued as soon as data has been delivered
• No impact on throughput (1 token/cycle)
• Min latency determined by the internal synchronizers
• Some tricky structures for the FIFO pointers (e.g., Gray encoding)
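Gray encoding is used for FIFO pointers because consecutive pointer values differ in exactly one bit, so a pointer sampled by the other clock domain is either the old or the new value. The sketch below shows the standard binary/Gray conversions; the function names and the 3-bit pointer width are assumptions made for the example.

```python
def bin_to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def gray_to_bin(g):
    """Inverse conversion: XOR of all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


DEPTH = 8  # hypothetical FIFO depth (power of two, 3-bit pointers)
for ptr in range(DEPTH):
    nxt = (ptr + 1) % DEPTH
    changed_bits = bin(bin_to_gray(ptr) ^ bin_to_gray(nxt)).count("1")
    print(ptr, format(bin_to_gray(ptr), "03b"), "bits changed to next:", changed_bits)
```

With a power-of-two depth the single-bit property also holds when the pointer wraps around.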

SoC design with GALS

[Figure: SoC with DSP, P and Mem blocks on a fast bus and a slow bus, with bridges and clock domain crossings (CDC) between the CLK1, CLK2 and CLK3 domains]

• Bridges for Clock Domain Crossing usually contain asynchronous FIFOs
• Latency cost only when interfacing with synchronous domains
• No latency penalty between asynchronous domains

Conclusions

• Elasticity offers flexibility in time
– Modularity
– Dynamic adaptability
– Tolerance to variability
• Better optimization of power/performance
• Why isn't it an important trend in circuit design?
– Lack of commercial EDA support (timing sign-off)
– Designers do not feel comfortable with "unpredictable" timing
– Other aspects: testing, verification, …
• De-synchronization might be a viable solution

Bibliography

• Carmona, Cortadella, Kishinevsky and Taubin, Elastic Circuits, IEEE Trans. on CAD, Oct. 2009.
• Beerel, Ozdag and Ferretti, A Designer's Guide to Asynchronous VLSI, Cambridge University Press, 2010.
• Sparso and Furber, Principles of Asynchronous Circuit Design: A Systems Perspective, Kluwer, 2001.
• Myers, Asynchronous Circuit Design, John Wiley & Sons, 2001.