Synthesis of synchronous elastic architectures


Jordi Cortadella (Universitat Politècnica de Catalunya)

Mike Kishinevsky (Intel Corp.)

Bill Grundmann (Intel Corp.)

Network of Computing Units

[Figure: computing units B1, B2 and B3 connected between In and Out]

Latency-insensitive (elastic) system

[Figure: the same network of B1, B2 and B3 between In and Out]

Each block takes a step only when all of its inputs are valid

Why

Scalable

Modular (Plug & Play)

Tolerance to variable latency
– Communication
– Computation

Not asynchronous
– Use existing design paradigms
– CAD tools

Outline

The cost of elasticity

SELF: an elastic protocol
– Basic implementation (linear pipelines)
– General netlists (forks and joins)
– Formal models and verification

Synthesis of elastic architectures

Related work

Elastic block

[Figure: elastic block with Data, Valid and Stop signals on each side, a Control block, a Core, and a gated clock derived from CLK]

What’s the cost of elasticity?

Communication channel

[Figure: sender and receiver connected by a Data wire]

Long wires: slow transmission

Pipelined communication

[Figure: sender and receiver with pipeline stages inserted on the Data wire]

What if the sender does not always send valid data?

The Valid bit

[Figure: sender and receiver; every pipeline stage carries Data together with a Valid bit]

What if the receiver is not always ready?

The Stop bit

[Figure: sender and receiver; every pipeline stage carries Data, Valid and Stop. Successive frames show the Stop values along the stages: 0 0 0 0 0, then 1 1 0 0 0, then 1 1 1 0 0, then 1 1 1 1 1]

Back-pressure

[Figure: the same pipeline with Stop values 1 0 0 0 0]

Long combinational path
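The slides show the Stop bit only as a figure. The following is a minimal behavioural sketch of such a pipeline, under the assumption that Stop travels backwards through every stage within a single cycle. The function name `step` and the representation of bubbles as `None` are illustrative choices, not taken from the talk. The backward loop that computes `stall` is the software counterpart of the long combinational path.

```python
def step(stages, new_token, receiver_stop):
    """Advance a Valid/Stop pipeline by one cycle.

    stages: list of tokens, None meaning a bubble; index 0 is next to the sender.
    Returns (next_stages, delivered_token, sender_stopped).
    """
    n = len(stages)
    # stall[i] is the Stop seen by stage i; stall[n] comes from the receiver.
    stall = [False] * (n + 1)
    stall[n] = receiver_stop
    # Stop ripples backwards through every valid stage in the same cycle.
    for i in range(n - 1, -1, -1):
        stall[i] = stages[i] is not None and stall[i + 1]
    delivered = stages[-1] if not receiver_stop else None
    nxt = list(stages)
    for i in range(n):
        if not stall[i]:
            # A stage that is not stalled loads from its predecessor.
            nxt[i] = stages[i - 1] if i > 0 else new_token
    return nxt, delivered, stall[0]


# A bubble in the middle absorbs one cycle of back-pressure.
s1, out1, stopped1 = step(["a", None, "b"], "c", receiver_stop=True)
# Once the pipeline is full, Stop reaches the sender in the same cycle.
s2, out2, stopped2 = step(s1, "d", receiver_stop=True)
# When the receiver is ready again, everything moves forward.
s3, out3, stopped3 = step(s2, "d", receiver_stop=False)
```

In this model the sender must hold its token whenever `sender_stopped` is true, which is why `"d"` is offered twice above.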

Carloni’s relay stations (double storage)

[Figure: sender and receiver, each a pearl wrapped in a shell, connected through relay stations. Every relay station has a main and an aux register, and neighbouring stages exchange V (valid) and S (stop) signals. Successive frames animate the transfer.]

• Handshakes with short wires
• Double storage required
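To illustrate why registering the stop signal (short wires) calls for double storage, here is a simplified behavioural sketch of one relay station. It is not Carloni's circuit: the class name, the method `clock` and the exact update rule are assumptions made for illustration. The point it shows is that the upstream stage learns about a stop one cycle late, so the token already on the wire has to land in the aux register.

```python
class RelayStation:
    """Behavioural sketch of a relay station with main and aux storage."""

    def __init__(self):
        self.main = None       # token currently offered downstream
        self.aux = None        # extra slot used while a stop is in flight
        self.stop_out = False  # registered stop, seen by the upstream stage

    def clock(self, data_in, stop_in):
        """Advance one cycle; return the token delivered downstream, if any."""
        # Upstream obeys the stop it saw at the start of this cycle.
        accepted = None if self.stop_out else data_in
        delivered = None
        if self.main is not None and not stop_in:
            delivered, self.main = self.main, None
        if self.main is None:
            # Refill main, oldest token first.
            if self.aux is not None:
                self.main, self.aux = self.aux, None
            else:
                self.main, accepted = accepted, None
        if accepted is not None:
            # main is stalled and a token was already on the wire: use aux.
            self.aux = accepted
        self.stop_out = self.aux is not None
        return delivered


rs = RelayStation()
# (token offered by upstream, stop from downstream) for six cycles.
cycles = [("a", False), ("b", True), ("c", True),
          ("c", False), ("c", False), (None, False)]
outputs = [rs.clock(d, s) for d, s in cycles]
received = [t for t in outputs if t is not None]
```

With a single register per station, token `b` would have been lost in the second cycle, because the sender had not yet seen the stop.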

Proposal: an elastic protocol

SELF (Synchronous ELastic Flow)

Simple and provably correct

Data-path with no overhead in:
– Area
– Latency
– Energy

Negligible control overhead

Fine-grain elasticity

Flip-flops vs. latches

[Figure: sender and receiver one cycle apart, first drawn with flip-flops (FF FF), then with each flip-flop drawn as a pair of master/slave latches (H L H L). Successive frames animate data moving through the latches.]

Flip-flops already have a double storage capability, but …

Not allowed in conventional FF-based design!

Let’s make the master/slave latches independent

[Figure: latches H L H L with ½ cycle between consecutive latches]

Only half of the latches (H or L) can move tokens

Elastic buffer keeps data while stop is in flight

W1R1

W2R1

W1R2

W2R2

Cannot be done with single-edge flops without double pumping

Use latches inside MS

Carloni’s relay station belongs to this class

SELF (linear communication)

[Figure: sender and receiver connected through a chain of latches. Every latch has a valid bit (V), a stop bit (S) and an enable (En), with Data, Valid and Stop signals at both ends of the chain. Successive frames show the chain for different Valid and Stop values at the two ends.]

The protocol

[Figure: Sender and Receiver connected by Data, Valid and Stop]

Idle cycle: Valid = 0

Transfer cycle: Valid = 1, Stop = 0

Retry cycle: Valid = 1, Stop = 1

Persistency: $G\,[\,V \wedge S \wedge (Data = D) \Rightarrow Next\,(V \wedge Data = D)\,]$

[Figure: state diagram with Retry and Transfer states]

Example trace (time runs from right to left, so A is sent first; * marks a cycle with no valid data):

Data:  * D D * C C C B * A

Valid: 0 1 1 0 1 1 1 1 0 1

Stop:  0 0 1 0 0 1 1 0 0 0
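The three cycle types and the persistency rule can be checked mechanically on the trace from the slide. The sketch below assumes that the trace is drawn with time running from right to left (A first), which is the only reading under which every retry is followed by the same valid data. The function names are illustrative.

```python
def classify(valid, stop):
    """Name the protocol cycle for one (Valid, Stop) pair."""
    if not valid:
        return "idle"
    return "retry" if stop else "transfer"


def is_persistent(data, valid, stop):
    """Check that a retry is always followed by the same valid data."""
    for t in range(len(data) - 1):
        if valid[t] and stop[t]:
            if not valid[t + 1] or data[t + 1] != data[t]:
                return False
    return True


# Trace as written on the slide, then reversed into time order (assumption).
data = "* D D * C C C B * A".split()[::-1]
valid = [int(x) for x in "0 1 1 0 1 1 1 1 0 1".split()][::-1]
stop = [int(x) for x in "0 0 1 0 0 1 1 0 0 0".split()][::-1]

kinds = [classify(v, s) for v, s in zip(valid, stop)]
transferred = [d for d, k in zip(data, kinds) if k == "transfer"]
persistent = is_persistent(data, valid, stop)
```

Each token is transferred exactly once, even though C and D each stay on the channel for more than one cycle.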

Elastic Half Buffer

[Figure: elastic half buffer (EHB): a data latch with enable $En_i$, controlled by the valid signals $V_{i-1}$, $V_i$ and the stop signals $S_{i-1}$, $S_i$]

Join

[Figure: join controller combining two EHB inputs with signals (V1, S1) and (V2, S2) into one output (V, S) that feeds an EHB]
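The slide gives the join only as a schematic. The following sketch shows one plausible set of control equations, derived from the rule that a block steps only when all of its inputs are valid. The equations and the function name `join` are assumptions for illustration, not the gate-level design from the talk.

```python
def join(v1, v2, s):
    """Combinational control of a two-input join (illustrative equations).

    v1, v2: Valid bits of the two inputs.
    s:      Stop bit coming back from the output channel.
    Returns (v, s1, s2): output Valid and the Stop bits sent to each input.
    """
    v = v1 and v2          # output is valid only when both inputs are valid
    fire = v and not s     # a transfer happens on the output this cycle
    # Each input is held (retried) unless the join fires. When an input is
    # not valid its Stop value has no effect, since that cycle is idle.
    s1 = not fire
    s2 = not fire
    return v, s1, s2


both_ready = join(True, True, False)      # transfer on all three channels
one_missing = join(True, False, False)    # input 1 must retry
output_blocked = join(True, True, True)   # both inputs must retry
```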

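Lazy Fork

[Figure: lazy fork controller from one input (V, S) to two outputs (V1, S1) and (V2, S2)]

Eager Fork

[Figure: eager fork controller from one input (V, S) to two outputs (V1, S1) and (V2, S2)]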

Elastic combinational paths

[Figure: elastic buffers (EB) connected through Join, Fork, Join / Fork and Wire control elements]

Enable signal to data latches

Elastic buffer: formal model

[Figure: unbounded buffer with cells …, i, i+1, …, i+k, a read pointer rd and a write pointer wr; inputs Din, Vin, Sin and outputs Dout, Vout, Sout]

Buffer [ 0..$\infty$ ]

Initial state: rd = wr = 0

Invariant: $wr \geq rd$

Liveness properties (finite unbounded latencies)

• Finite forward latency: $G\,(rd \neq wr \Rightarrow F\,V_{out})$

• Finite backward latency: $G\,(\neg S_{out} \Rightarrow F\,\neg S_{in})$
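A small executable model can make the rd/wr abstraction concrete. The sketch below is an assumption-laden simplification: it fixes a finite `capacity`, offers data as soon as the buffer is non-empty, and raises Sin as soon as it is full, whereas the abstract model on the slide allows any finite latency. The class and method names are illustrative.

```python
class ElasticBufferModel:
    """FIFO with read and write pointers, in the spirit of the slide."""

    def __init__(self, capacity=2):
        self.cells = {}        # stands in for Buffer[0..]
        self.rd = 0            # index of the next token to read
        self.wr = 0            # index of the next free cell
        self.capacity = capacity

    def v_out(self):
        return self.rd != self.wr                    # non-empty: offer data

    def s_in(self):
        return self.wr - self.rd >= self.capacity    # full: stop the sender

    def clock(self, d_in, v_in, s_out):
        """One cycle; returns the token transferred to the receiver, if any."""
        write = v_in and not self.s_in()
        read = self.v_out() and not s_out
        out = None
        if read:
            out = self.cells.pop(self.rd)
            self.rd += 1
        if write:
            self.cells[self.wr] = d_in
            self.wr += 1
        assert self.wr >= self.rd                    # invariant from the slide
        return out


eb = ElasticBufferModel(capacity=2)
trace = [("a", True, True), ("b", True, True), ("c", True, True),
         ("c", True, False), ("c", True, False), (None, False, False)]
outs = [eb.clock(d, v, s) for d, v, s in trace]
in_order = [t for t in outs if t is not None]
```

The sender keeps offering `c` until the buffer has room, which is the retry behaviour of the protocol seen from the buffer's input side.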

Formal verification

[Figure: the abstract buffer model (rd and wr pointers; Din, Vin, Sin, Dout, Vout, Sout) next to an implementation with the same interface]

The abstract FSM model is appropriate for compositional verification

Verification of implementations with model checking (1-bit abstractions of the datapath)

– LTL specs + NuSMV

– Buffer is a refinement of the spec
– In-order data transmission
– Correct synchronization of fork/join structures
– Absence of deadlocks

Observational equivalence

Synchronous:

D: a b c d e f g h i j k …

Elastic:

D:  a a b b b c d e e f g g h i i i j k …

En: 1 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 1 1 …
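The two traces on the slide can be related by a simple filter: keeping only the elastic values sampled in cycles where En = 1 should give back the synchronous stream. The sketch below checks this for the prefix shown on the slide. The helper name `observed` is illustrative.

```python
def observed(data, enable):
    """Keep only the values seen in cycles where the enable is 1."""
    return [d for d, en in zip(data, enable) if en]


synchronous = "a b c d e f g h i j k".split()
elastic = "a a b b b c d e e f g g h i i i j k".split()
en = [int(x) for x in "1 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 1 1".split()]

recovered = observed(elastic, en)
equivalent = recovered == synchronous
```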

Elasticization

[Figure: a synchronous pipeline (PC, IF/ID, ID/EX, EX/MEM and MEM/WB registers driven by CLK) is converted into an elastic one. Every register gets a controller with V and S signals, JOIN and FORK controllers are placed where paths merge and split, and successive frames show 1/0 values on the control signals.]

Elastic control layer

Generation of gated clocks

Variable-latency units

[Figure: unit with a latency of [0 - k] cycles, V/S handshakes on both sides, and go / done signals]

Telescopic units:
– 1 cycle for fast operations
– 2 cycles for slow operations

Examples:
– Short / long additions (carry propagation)
– A × 0, A / 1
– Dynamic changes in latency (fast if cold, slow if hot)

Microarchitectural exploration

Bubble insertion + Variable-latency units

– May improve performance: more bubbles, but reduced cycle time

– Reduce power: units designed for most frequent input data

Exploration at fine granularity
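The trade-off between extra cycles and a shorter clock period can be made concrete with a back-of-the-envelope calculation. All numbers below are hypothetical and chosen only for illustration: a fixed-latency unit clocked at 1.0 ns is compared with a telescopic unit clocked at 0.7 ns that finishes 90% of its operations in one cycle and the rest in two.

```python
def time_per_operation(cycle_time_ns, p_fast, fast_cycles=1, slow_cycles=2):
    """Average time per operation for a telescopic unit.

    p_fast is the fraction of operations that complete in fast_cycles.
    """
    avg_cycles = p_fast * fast_cycles + (1.0 - p_fast) * slow_cycles
    return cycle_time_ns * avg_cycles


# Hypothetical fixed-latency unit: every operation takes one slow cycle.
fixed = time_per_operation(cycle_time_ns=1.0, p_fast=1.0)
# Hypothetical telescopic unit: shorter cycle, occasional second cycle.
telescopic = time_per_operation(cycle_time_ns=0.7, p_fast=0.9)
speedup = fixed / telescopic
```

With these assumed figures the telescopic unit averages about 0.77 ns per operation, so the occasional extra cycle is more than paid for by the shorter clock period. With a lower `p_fast` or a smaller cycle-time gain the balance can tip the other way, which is what makes this an exploration problem.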

Some related work

Asynchronous design
– Micropipelines (Sutherland)
– Rings (Williams, Sparsø)
– CHP and slack-elasticity (Martin, Burns, Manohar et al.)

Latency-insensitive design
– Carloni and a few follow-ups (large overhead)
– Wire pipelining: Svensson, Nookala, Casu, …

Interlock pipelines (H. Jacobson et al.)

De-synchronization
– J. Cortadella et al.
– V. Varshavsky

Synchronous implementations of CSP
– J. O’Leary et al.
– A. Peeters et al.

Summary

SELF: a specific protocol and implementation for elastic systems with very small overhead buffering

Compositional theory proving correctness (Krstic et al., FMCAD’06)

Library of controllers has been designed and their correctness verified

Elasticization CAD in progress

New micro-architectural opportunities based on bubbles and variable latency units
