Synthesis of synchronous elastic architectures


Page 1: Synthesis of synchronous elastic architectures

Synthesis of synchronous elastic architectures

Jordi Cortadella (Universitat Politècnica Catalunya)

Mike Kishinevsky (Intel Corp.)

Bill Grundmann (Intel Corp.)

Page 2: Synthesis of synchronous elastic architectures

Network of Computing Units

[Animation, pages 2 to 4: a network with input In, output Out, and blocks B1, B2, B3]

Page 5: Synthesis of synchronous elastic architectures

Latency-insensitive (elastic) system

[Diagram: input In, output Out, and blocks B1, B2, B3]

Every block only makes one step when all inputs are valid
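The firing rule on this slide can be illustrated with a small cycle-by-cycle simulation. This is only a sketch under stated assumptions: the function name `run_block`, the use of `None` for a cycle without valid data, and the unbounded input queues (so back-pressure is ignored) are illustrative choices, not something taken from the slides.

```python
from collections import deque
from operator import add

def run_block(stream_a, stream_b, func):
    """Simulate a two-input block cycle by cycle.

    Each stream holds one entry per clock cycle; None marks a cycle with
    no valid data. Early tokens wait in a queue (assumed unbounded here,
    so back-pressure is not modeled).
    """
    queue_a, queue_b, outputs = deque(), deque(), []
    for a, b in zip(stream_a, stream_b):
        if a is not None:
            queue_a.append(a)
        if b is not None:
            queue_b.append(b)
        if queue_a and queue_b:
            # Both inputs are valid: the block makes one step.
            outputs.append(func(queue_a.popleft(), queue_b.popleft()))
        else:
            # At least one input is missing: the block waits.
            outputs.append(None)
    return outputs

print(run_block([1, None, 2, 3], [10, 20, None, 30], add))
# prints [11, None, 22, 33]
```

The block produces the same sequence of results (11, 22, 33) regardless of when the bubbles arrive; only the timing changes.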

Page 6: Synthesis of synchronous elastic architectures

Why

Scalable

Modular (Plug & Play)

Tolerance to variable latency
– Communication
– Computation

Not asynchronous
– Use existing design paradigms
– CAD tools

Page 7: Synthesis of synchronous elastic architectures

Outline

The cost of elasticity

SELF: an elastic protocol
– Basic implementation (linear pipelines)
– General netlists (forks and joins)
– Formal models and verification

Synthesis of elastic architectures

Related work

Page 8: Synthesis of synchronous elastic architectures

Elastic block

[Diagram labels: Data, Valid, and Stop on both sides of the block; Control; Core; CLK; Gated clock]

What’s the cost of elasticity?

Page 9: Synthesis of synchronous elastic architectures

Communication channel

[Diagram: sender and receiver connected by Data]

Long wires: slow transmission

Page 10: Synthesis of synchronous elastic architectures

Pipelined communication

[Animation, pages 10 to 12: sender and receiver connected by a pipelined Data channel]

What if the sender does not always send valid data?

Page 13: Synthesis of synchronous elastic architectures

The Valid bit

[Animation, pages 13 to 17: sender and receiver connected by Data and Valid]

What if the receiver is not always ready?

Page 18: Synthesis of synchronous elastic architectures

The Stop bit

[Animation, pages 18 to 22: sender and receiver connected by Data, Valid, and Stop; the ten-bit row shown on successive frames reads 0000000000, 1111000000, 1111110000, 1111111111, 1100000000]

Back-pressure

Long combinational path
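A minimal sketch of a pipeline with a single global Stop may help show where the long combinational path comes from: one Stop value has to reach every stage within the same cycle. The function name `pipeline_step` and the use of `None` for an invalid (Valid = 0) slot are assumptions made for illustration.

```python
def pipeline_step(stages, new_item, stop):
    """Advance a pipeline of registers by one clock cycle.

    stages[0] is next to the sender, stages[-1] is next to the receiver.
    None stands for a slot whose Valid bit is 0.
    When stop is true, every stage holds its value in the same cycle,
    which is why the Stop wire must span the whole pipeline.
    Returns (next_stages, delivered_item).
    """
    if stop:
        return list(stages), None
    delivered = stages[-1]
    return [new_item] + list(stages[:-1]), delivered

stages = [None, None]
stages, out0 = pipeline_step(stages, "a", stop=False)  # 'a' enters
stages, out1 = pipeline_step(stages, "b", stop=True)   # stalled, 'b' must wait
stages, out2 = pipeline_step(stages, "b", stop=False)  # 'b' enters
stages, out3 = pipeline_step(stages, None, stop=False) # 'a' reaches the receiver
print(out0, out1, out2, out3)
```

In this sketch the sender has to offer "b" again after the stalled cycle, since nothing in the pipeline stored it.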

Page 23: Synthesis of synchronous elastic architectures

Carloni’s relay stations (double storage)

[Animation, pages 23 to 31: sender and receiver pearls, each inside a shell, connected through relay stations with main and aux registers and V/S handshake signals]

• Handshakes with short wires
• Double storage required
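The need for double storage can be sketched with an abstract two-register model. This is not Carloni's actual circuit: the class name `RelayStation`, the `step` method, and the rule that the registered stop is raised exactly when the aux register is occupied are simplifying assumptions for illustration.

```python
class RelayStation:
    """Abstract relay station with a main and an aux register."""

    def __init__(self):
        self.main = None  # item offered to the receiver side
        self.aux = None   # spare slot that absorbs the item in flight

    @property
    def stop_out(self):
        # Registered stop seen by the sender side: raised while aux is full.
        return self.aux is not None

    def step(self, data_in, stop_in):
        """One clock cycle. Returns the item delivered downstream, or None."""
        # The sender only transfers when it saw stop_out low this cycle.
        accepted = data_in if not self.stop_out else None
        consumed = self.main is not None and not stop_in
        delivered = self.main if consumed else None
        if consumed or self.main is None:
            if self.aux is not None:
                self.main, self.aux = self.aux, None
            else:
                self.main = accepted
        elif accepted is not None:
            # Main is blocked, so the incoming item goes to aux.
            self.aux = accepted
        return delivered

rs = RelayStation()
trace = [rs.step("a", False), rs.step("b", True), rs.step("c", True),
         rs.step("c", False), rs.step("c", False), rs.step(None, False)]
print(trace)
```

Because the stop signal is registered, the sender learns about a stall one cycle late; the aux register holds the item that was already on its way, so no data is lost and no stop wire has to cross more than one stage.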

Page 32: Synthesis of synchronous elastic architectures

Proposal: an elastic protocol

SELF (Synchronous ELastic Flow)

Simple and provably correct

Data-path with no overhead in:
– Area
– Latency
– Energy

Negligible control overhead

Fine-grain elasticity

Page 33: Synthesis of synchronous elastic architectures

Flip-flops vs. latches

[Animation, pages 33 to 43: sender and receiver one cycle apart, first with flip-flops (FF FF), then with each flip-flop drawn as a pair of latches (H L H L)]

Flip-flops already have a double storage capability, but …

Not allowed in conventional FF-based design!

Let’s make the master/slave latches independent

[Diagram: latches H L H L, each latch pair now spanning ½ cycle]

Only half of the latches (H or L) can move tokens

Page 44: Synthesis of synchronous elastic architectures

Elastic buffer keeps data while stop is in flight

W1R1

W2R1

W1R2

W2R2

Cannot be done with Single Edge Flops without double pumping

Use latches inside MS

Carloni’s relay station belongs to this class

Page 45: Synthesis of synchronous elastic architectures

SELF (linear communication)

[Animation, pages 45 to 72: a chain of four stages between sender and receiver, each labeled V, S, and En, with Data, Valid, and Stop at both ends; the frames show the Valid and Stop values (1 or 0) at the two ends changing over time]

Page 73: Synthesis of synchronous elastic architectures

The protocol

[Diagram: Sender and Receiver connected by Data, Valid, and Stop; Valid = 0]

Idle cycle: Valid = 0

Page 74: Synthesis of synchronous elastic architectures

The protocol

[Diagram: Sender and Receiver connected by Data, Valid, and Stop; Data = D, Valid = 1, Stop = 0]

Transfer cycle: Valid = 1, Stop = 0

Page 75: Synthesis of synchronous elastic architectures

The protocol

[Diagram: Sender and Receiver connected by Data, Valid, and Stop; Data = D, Valid = 1, Stop = 1]

Retry cycle: Valid = 1, Stop = 1

Persistency: G [ V ∧ S ∧ (Data = D) ⇒ Next (V ∧ Data = D) ]

Page 76: Synthesis of synchronous elastic architectures

The protocol

[Diagram: Sender and Receiver connected by Data, Valid, and Stop, with an example trace whose cycles are marked Retry and Transfer]

Data:  * D D * C C C B * A
Valid: 0 1 1 0 1 1 1 1 0 1
Stop:  0 0 1 0 0 1 1 0 0 0
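The three cycle types and the persistency rule (an item in a retry cycle must be offered again, unchanged, in the next cycle) can be checked on the example trace. The sketch below assumes that the rightmost column of the trace is the earliest cycle; this reading is an inference, supported by the fact that persistency only holds in that direction. The helper names are illustrative.

```python
def classify(valid, stop):
    """Name the cycle type defined by the Valid and Stop bits."""
    if not valid:
        return "idle"
    return "retry" if stop else "transfer"

def transferred(data, valid, stop):
    """Items that actually move from sender to receiver."""
    return [d for d, v, s in zip(data, valid, stop)
            if classify(v, s) == "transfer"]

def persistent(data, valid, stop):
    """Check on a finite trace that every retry is followed by a cycle
    with Valid = 1 and the same data."""
    for t in range(len(data) - 1):
        if valid[t] and stop[t]:
            if not (valid[t + 1] and data[t + 1] == data[t]):
                return False
    return True

# Trace as printed on the slide (left to right).
slide_data = list("*DD*CCCB*A")
slide_valid = [0, 1, 1, 0, 1, 1, 1, 1, 0, 1]
slide_stop = [0, 0, 1, 0, 0, 1, 1, 0, 0, 0]

# Assumption: time runs right to left, so reverse to get cycle order.
data, valid, stop = slide_data[::-1], slide_valid[::-1], slide_stop[::-1]

print([classify(v, s) for v, s in zip(valid, stop)])
print(transferred(data, valid, stop))   # ['A', 'B', 'C', 'D']
print(persistent(data, valid, stop))    # True
```

Under this reading, C is retried twice and D once before being transferred, and each item is delivered exactly once and in order.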

Page 77: Synthesis of synchronous elastic architectures

Elastic Half Buffer

[Diagram labels: EHB; Data; Latch (two latches); V(i-1), S(i-1); V(i), S(i); En(i)]

Page 78: Synthesis of synchronous elastic architectures

Join

[Diagram labels: input channels (V1, S1) and (V2, S2); output channel (V, S); three EHBs; +]

Page 79: Synthesis of synchronous elastic architectures

Lazy Fork

[Diagram labels: input channel (V, S); output channels (V1, S1) and (V2, S2)]
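The slides show the Join and Lazy Fork controllers only as diagrams. As a rough sketch, one common way to write their combinational control is given below; these equations are an assumption for illustration and are not taken from the slides. The Eager Fork is omitted because it needs internal state.

```python
def join(v1, v2, s):
    """Join control (assumed formulation).

    The output is valid only when both inputs are valid. An input is
    stopped when the output is stopped or its partner is not valid yet.
    Returns (v, s1, s2).
    """
    v = v1 and v2
    s1 = s or not v2
    s2 = s or not v1
    return v, s1, s2

def lazy_fork(v, s1, s2):
    """Lazy fork control (assumed formulation).

    A branch sees valid data only if the other branch is not stopped,
    so the token moves to both receivers in the same cycle or not at all.
    Returns (v1, v2, s).
    """
    v1 = v and not s2
    v2 = v and not s1
    s = s1 or s2
    return v1, v2, s

print(join(True, False, False))       # (False, True, False)
print(lazy_fork(True, True, False))   # (True, False, True)
```

In the second call, branch 1 is in a retry cycle and branch 2 is idle, so neither branch transfers and the sender is stopped.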

Page 80: Synthesis of synchronous elastic architectures

Eager Fork

[Diagram labels: input channel (V, S); output channels (V1, S1) and (V2, S2)]

Page 81: Synthesis of synchronous elastic architectures

Elastic combinational paths

[Animation, pages 81 to 83: elastic buffers (EB) connected through Wire, Fork, Join, and Join / Fork elements]

Enable signal to data latches

Page 84: Synthesis of synchronous elastic architectures

Elastic buffer: formal model

[Diagram: buffer cells …, i, i+1, …, i+k with pointers rd and wr; signals Din, Vin, Sin and Dout, Vout, Sout]

Buffer [ 0..∞ ]

Initial state: rd = wr = 0

Invariant: wr ≥ rd
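The pointer-based model can be sketched as a small class. This is a simplified illustration that assumes an unbounded buffer, the initial state rd = wr = 0, and the invariant that the write pointer never falls behind the read pointer (wr ≥ rd); the handshake signals and the nondeterministic latencies of the abstract model are left out, and the class and method names are hypothetical.

```python
class ElasticBufferModel:
    """Unbounded FIFO with a read pointer rd and a write pointer wr."""

    def __init__(self):
        self.buffer = []  # conceptually Buffer[0..infinity]
        self.rd = 0
        self.wr = 0

    def write(self, item):
        """Store one incoming item at position wr."""
        self.buffer.append(item)
        self.wr += 1

    def can_read(self):
        """An item is available exactly when rd differs from wr."""
        return self.rd != self.wr

    def read(self):
        """Remove the oldest unread item (in-order delivery)."""
        if not self.can_read():
            raise IndexError("buffer is empty")
        item = self.buffer[self.rd]
        self.rd += 1
        return item

    def invariant(self):
        return self.wr >= self.rd

eb = ElasticBufferModel()
eb.write("a")
eb.write("b")
print(eb.read(), eb.can_read(), eb.invariant())
```

Reads are guarded by `can_read`, which is what keeps the invariant true after every operation.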

Page 85: Synthesis of synchronous elastic architectures

Elastic buffer: formal model

[Diagram: buffer cells …, i, i+1, …, i+k with pointers rd and wr; signals Din, Vin, Sin and Dout, Vout, Sout]

Liveness properties (finite unbounded latencies)

• Finite forward latency: G (rd ≠ wr ⇒ F Vout)

• Finite backward latency: G (¬Sout ⇒ F ¬Sin)

Page 86: Synthesis of synchronous elastic architectures

Formal verification

[Diagram: the abstract buffer model (cells …, i, i+1, …, i+k with pointers rd and wr) shown alongside the Implementation; both have signals Din, Vin, Sin and Dout, Vout, Sout]

Page 87: Synthesis of synchronous elastic architectures

Formal verification

The abstract FSM model is appropriate for compositional verification

Verification of implementations with model checking (1-bit abstractions of the datapath)

– LTL specs + NuSMV

– Buffer is a refinement of the spec
– In-order data-transmission
– Correct synchronization of fork/join structures
– Absence of deadlocks

Page 88: Synthesis of synchronous elastic architectures

Observational equivalence

Synchronous:

D: a b c d e f g h i j k …

Elastic:

D:  a a b b b c d e e f g g h i i i j k …
En: 1 0 1 0 0 1 1 1 0 1 1 0 1 1 0 0 1 1 …
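The equivalence on this slide can be checked directly: keeping only the elastic data values in cycles where En = 1 should give back the synchronous stream. The function name `observed` is illustrative, and the trailing "…" of each trace is dropped.

```python
def observed(data, enable):
    """Keep the data values of the cycles in which En is 1."""
    return [d for d, en in zip(data, enable) if en]

synchronous = "a b c d e f g h i j k".split()
elastic_data = "a a b b b c d e e f g g h i i i j k".split()
elastic_en = [1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1]

print(observed(elastic_data, elastic_en) == synchronous)  # True
```

The elastic trace takes 18 cycles to carry the 11 values of the synchronous trace; the 7 cycles with En = 0 are stalls and do not change what is observed.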

Page 89: Synthesis of synchronous elastic architectures

Elasticization

Synchronous Elastic

Page 90: Synthesis of synchronous elastic architectures

[Animation, pages 90 to 95: a pipelined datapath clocked by CLK with registers PC, IF/ID, ID/EX, EX/MEM, and MEM/WB; JOIN and FORK controllers are added, then V/S handshake pairs at each register, which are then shown carrying 1/0 values]

Elastic control layer

Generation of gated clocks

Page 96: Synthesis of synchronous elastic architectures

Variable-latency Units

[Diagram: a unit taking [0 - k] cycles, with V/S handshake signals on both sides and go/done signals]

Page 97: Synthesis of synchronous elastic architectures

Variable-latency units

Telescopic units:
– 1 cycle for fast operations
– 2 cycles for slow operations

Examples:
– Short / long additions (carry propagation)
– A × 0, A / 1
– Dynamic changes in latency (fast if cold, slow if hot)
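A telescopic unit can be sketched as a function that reports how many cycles each operation takes. The multiplier below is hypothetical: it treats a zero operand as the fast case (as in the A × 0 example) and charges 2 cycles otherwise. The function names are illustrative.

```python
def telescopic_multiply(a, b):
    """Return (product, cycles) for a hypothetical telescopic multiplier."""
    if a == 0 or b == 0:
        return 0, 1      # fast operation: result known in 1 cycle
    return a * b, 2      # slow operation: 2 cycles

def total_cycles(operand_pairs):
    """Cycles needed to process the operations one after another."""
    return sum(telescopic_multiply(a, b)[1] for a, b in operand_pairs)

print(telescopic_multiply(7, 0))                # (0, 1)
print(telescopic_multiply(7, 3))                # (21, 2)
print(total_cycles([(3, 0), (0, 5), (2, 4)]))   # 4
```

A fixed-latency unit would need 6 cycles for the same three operations if every operation were given the 2-cycle worst case.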

Page 98: Synthesis of synchronous elastic architectures

Microarchitectural exploration

Bubble insertion + Variable-latency units

– May improve performance: more bubbles, but a shorter cycle time

– Reduce power: units designed for the most frequent input data

Exploration at fine-granularity
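The trade-off between bubbles and cycle time can be made concrete with a small calculation: the average time per delivered token is the cycle time divided by the throughput in tokens per cycle. The numbers below are made up for illustration and do not come from the slides.

```python
def time_per_token(cycle_time_ns, tokens_per_cycle):
    """Average time between delivered tokens, in nanoseconds."""
    return cycle_time_ns / tokens_per_cycle

# Hypothetical baseline: no bubbles, 1.0 ns cycle.
baseline = time_per_token(1.0, 1.0)

# Hypothetical elastic variant: bubbles lower throughput to 0.9 tokens
# per cycle, but the shorter critical path allows a 0.8 ns cycle.
elastic = time_per_token(0.8, 0.9)

print(baseline, round(elastic, 3))
print(elastic < baseline)  # True
```

With these assumed numbers the elastic variant is faster overall even though one cycle in ten carries a bubble; with a smaller cycle-time gain the comparison could go the other way.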

Page 99: Synthesis of synchronous elastic architectures

Some related work

Asynchronous design
– Micropipelines (Sutherland)
– Rings (Williams, Sparsø)
– CHP and slack-elasticity (Martin, Burns, Manohar et al.)

Latency insensitive design
– Carloni and a few follow-ups (large overhead)
– Wire pipelining: Svensson, Nookala, Casu, …

Interlock pipelines (H. Jacobson et al.)

De-synchronization
– J. Cortadella et al.
– V. Varshavsky

Synchronous implementations of CSP
– J. O’Leary et al.
– A. Peeters et al.

Page 100: Synthesis of synchronous elastic architectures

Summary

SELF: a specific protocol and implementation for elastic systems with very small overhead buffering

Compositional theory proving correctness (Krstic et al., FMCAD’06)

Library of controllers has been designed and their correctness verified

Elasticization CAD in progress

New micro-architectural opportunities based on bubbles and variable latency units