Gate Transfer Level Synthesis as an Automated Approach to Fine-Grain Pipelining

Alexander Smirnov, Alexander Taubin, Mark Karpovsky, Leonid Rozenblyum

Page 2:

Presentation goals

Present an overview of the synthesis framework
Demonstrate a high-level pipeline model
Demonstrate the synthesis correctness
Illustrate how the correctness is guaranteed
Present experimental results
Conclusions
Future work

Page 3:

Objective

Industrial quality EDA flow for automated synthesis of fine-grain pipelined robust circuits from high-level specifications

Industrial quality
  Easy to integrate in an RTL-oriented environment
  Capable of handling very large designs (scalability)
Automated fine-grain pipelining
  To achieve high performance (throughput)
  Automated to reduce design time

Page 4:

Choice of paradigm

Synchronous RTL
  8 logic levels per stage is the limit
    Due to register, clock skew and jitter overhead
  Timing closure
    No pipelining automation available; stage balancing is difficult
  Performance limitations
    To guarantee correctness with process variation etc.
Asynchronous GTL
  Lower design time
    Automated pipelining possible from RTL specification
  Higher performance
    Gate-level (finest possible) pipelining achievable
  Controllable power consumption
    Smoothly slows down in case of voltage reduction
  Improved yield
    Correct operation regardless of variations

Page 5:

Easy integration & scalability: Weaver flow architecture

RTL tools reuse
  Creates the impression that nothing has changed
  Saves development effort
Substitution based transformations
  Linear complexity
  Enabled by using functionally equivalent DR (dual-rail: physical) and SR (single-rail: virtual) libraries

[Figure: Weaver flow diagram. Labels: HDL design specification; host synthesis engine; single-rail (synchronous) netlist synthesis; srGTL lib; VHDL packages; single-rail netlist; Weaver; weaving; GTL netlist; GTL netlist mapping; GTL lib (lib cells, GTL stages); mapped QDI netlist; P&R constraints; P&R, etc. tools following synthesis in EDA flow]

Page 6:

Easy integration & scalability: Weaver flow architecture

Synthesis flow
  Interfacing with host synthesis engine
  Transforming synchronous RTL to asynchronous GTL (weaving)
Dedicated library(ies)
  Dual-rail encoded data logic
  Cells comprising entire stages
  Internal delay assumptions only

[Figure: Weaver flow diagram, with the same labels as on page 5]

Page 7:

Automated fine-grain pipelining: Gate Transfer Level (GTL)

Gate-level pipeline

[Figure labels: REG, Combinational logic]

Page 10:

Automated fine-grain pipelining: Gate Transfer Level (GTL)

Gate-level pipeline
Let gates communicate asynchronously and independently
Many pipeline styles can be used
Templates already exist

[Figure: pipeline stage template. Labels: C, M, CD, PC, ACK, F, LReq, LAck, RReq, RAck, A0, A1, B0, B1, Z0, Z1, weak, parasitic capacitances]

Page 11:

Weaving

Critical transformations
  Mapping combinational gates (basic weaving)
  Mapping sequential gates
  Initialization preserving liveness and safeness
Optimizations
  Performance optimization
    Fine-grain pipelining (natural)
    Slack matching
  Area optimization
    Optimizing out identity function stages

Page 12:

Basic Weaving

De Morgan transformation
Dual-rail expansion
Gate substitution
Generating req/ack signals
  Merge insertion
  Fork insertion
Reset routing

[Figure: an AND2 gate (signals A, B, X, Z) woven into a GTL_AND2 stage. Stage labels: ACK, PC, CD, M, F: AND2; dual-rail implementation signals A0, A1, B0, B1, X0, X1, Z0, Z1; handshake signals AReq, AAck, BReq, BAck, XReq, XAck, ZReq, ZAck]
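
The De Morgan and dual-rail expansion steps can be illustrated with a small sketch. It assumes the common encoding in which a bit travels on two rails, (rail0, rail1) = (1, 0) for logic 0, (0, 1) for logic 1 and (0, 0) for the NULL spacer; the slide does not spell out the encoding, and the names `encode` and `dr_and2` are hypothetical. Only the data function F is modeled, not the req/ack handshake, precharge or completion detection.

```python
NULL = (0, 0)  # spacer: neither rail asserted (assumed encoding)


def encode(bit):
    """Return the (rail0, rail1) pair for a single-rail bit."""
    return (1, 0) if bit == 0 else (0, 1)


def dr_and2(a, b):
    """Dual-rail AND2: Z1 = A1 AND B1, Z0 = A0 OR B0 (De Morgan dual)."""
    if a == NULL or b == NULL:
        return NULL  # assumption: the stage waits until both inputs carry data
    a0, a1 = a
    b0, b1 = b
    return (a0 | b0, a1 & b1)


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, dr_and2(encode(x), encode(y)))
```

Because the false rail is computed by the De Morgan dual of the true rail, no inverters are needed inside the stage, which is why the De Morgan transformation appears alongside dual-rail expansion.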

Page 13:

Basic Weaving: example (C17 MCNC benchmark)

[Figure: C17 netlist of AND2 and OR2 gates woven into a netlist of GTL AND2 and GTL OR2 stages. Legend: data0 and data1 form dr_data; req and ack form the handshake]

Page 14:

Linear pipeline (RTL)

[Figure: linear RTL pipeline with flip-flops DFF1 to DFF3, combinational logic CL1 to CL3 and clock clk, and a latch-based version with D-latches DL1 to DL5, combinational logic CL1 to CL3 and clocks clk0, clk1]

Page 15:

Linear pipeline

[Figure: linear RTL pipeline (DFF1 to DFF3, CL1 to CL3, clk) and two Petri net models with transitions t1 to t6 and clock events clk0/clk<='0' and clk1/clk<='1']
Figure captions:
  pipeline PN (PPN) model with local handshake
  pipeline PN model with global synchronization

Page 17:

Linear pipeline

[Figure: the linear pipeline and Petri net models of page 15, with an initial marker (*) shown]
Figure captions:
  PPN models asynchronous full-buffer pipelines
  pipeline PN model with global synchronization

Page 18:

Linear pipeline

[Figure: the linear pipeline shown as an RTL implementation (DFF1 to DFF3, CL1 to CL3, clk) and as a GTL implementation built from stages with ACK, PC, CD and M blocks around CL1 and CL2]

Page 19:

Correctness

Safeness
  Guarantees that the number of data portions (tokens) stays the same over time
Liveness
  Guarantees that the system operates continuously
Flow equivalence
  In both RTL and GTL implementations corresponding sequential elements hold the same data values
    On the same iterations (order-wise)
    For the same input stream

Page 20:

Non-linear pipelines

Deterministic token flow
  Broadcasting tokens to all channels at Forks
  Synchronizing at Merges
Data dependent token flow
  Ctrl is also a dual-rail channel
  To guarantee liveness, MUXes need to match deMUXes, which is computationally hard

[Figure: Fork and Merge; MUX with in, out, ctrl; deMUX with in, out, ctrl]

Page 21:

Non-linear pipeline liveness

Currently guaranteed for deterministic token flow only, by construction (weaving)
A marking of a marked graph is live if each directed PN circuit has a marker
Linear closed pipelines can be considered instead
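
The marked-graph condition above can be checked mechanically: every directed circuit carries a marker exactly when the arcs that hold no marker form an acyclic graph. The following is a minimal sketch under that observation; the representation (one arc per place, a dictionary of token counts) and the name `is_live` are assumptions made for illustration, not part of the Weaver tool.

```python
def is_live(arcs, marking):
    """Check that every directed circuit of a marked graph holds a token.

    arcs: iterable of (src, dst) transition pairs, one per place.
    marking: dict mapping an arc to its token count (missing means 0).
    """
    # Keep only token-free arcs; a cycle among them is an unmarked circuit.
    succ = {}
    for src, dst in arcs:
        succ.setdefault(src, [])
        succ.setdefault(dst, [])
        if marking.get((src, dst), 0) == 0:
            succ[src].append(dst)

    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in succ}

    def has_cycle(node):
        # Depth-first search; reaching a GREY node closes a cycle.
        color[node] = GREY
        for nxt in succ[node]:
            if color[nxt] == GREY:
                return True
            if color[nxt] == WHITE and has_cycle(nxt):
                return True
        color[node] = BLACK
        return False

    return not any(color[n] == WHITE and has_cycle(n) for n in succ)


if __name__ == "__main__":
    ring = [("t1", "t2"), ("t2", "t1")]
    print(is_live(ring, {("t2", "t1"): 1}))  # marker on the feedback arc
    print(is_live(ring, {}))                 # no markers at all
```

The recursive search is adequate for small examples; a production check on large netlists would use an iterative traversal.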

Page 22:

Closed linear PPN

[Figure: closed PPN with transitions t1, t2 and markers (*)]
Every PPN “stage” is a circuit and has a marker by definition

Page 26:

Closed linear PPN

[Figure: closed PPN with transitions t1, t2 and markers (*); RTL loop of DFF2 and CL]
Every PPN “stage” is a circuit and has a marker by definition
Each implementation loop forms two directed circuits
  Forward: has at least one token inferred for a DFF
  Feedback: has at least one NULL inferred from CL or added explicitly

Page 27:

Closed linear PPN pipeline is live iff (for full-buffer pipelines)
  Every loop has at least 2 stages
  Token capacity for any loop: $1 \le C \le N - 1$
Assumption we made: every loop in a synchronous circuit has a DFF
  A loop with no CL is meaningless
Liveness conditions hold by construction (weaving)

[Figure: closed PPN with transitions t1, t2 and markers (*); RTL loop of DFF2 and CL]
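
The token bounds for a full-buffer loop can be made concrete with a toy model. The sketch below is an assumed abstraction, not the PPN itself: each of the N stages either holds a token or is empty, and a token may advance only into an empty successor. The names `can_advance` and `fb_ring_is_live` are hypothetical.

```python
def can_advance(ring):
    """ring[i] is True when stage i holds a token; stage i feeds stage (i + 1) % N."""
    n = len(ring)
    return any(ring[i] and not ring[(i + 1) % n] for i in range(n))


def fb_ring_is_live(num_stages, num_tokens):
    """A closed full-buffer loop can move only if it has both a token and a hole."""
    ring = [i < num_tokens for i in range(num_stages)]
    return num_stages >= 2 and can_advance(ring)


if __name__ == "__main__":
    for tokens in range(5):
        print(tokens, fb_ring_is_live(4, tokens))
```

In this model a loop with no tokens has nothing to move and a loop with N tokens has nowhere to move them, which matches the stated range of 1 to N - 1 tokens.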

Page 28:

Initialization: example

  library ieee;
  use ieee.std_logic_1164.all;

  entity seqr is
    port (D_out     : out std_logic;
          D_in, clk : in  std_logic);
  end seqr;

  architecture behavior of seqr is
    signal Data : std_logic_vector(2 downto 0);  -- 3-bit shift register
  begin
    process (clk)
    begin
      -- shift D_in in on the falling clock edge
      if (clk = '0' and clk'event) then
        Data <= (D_in, Data(2), Data(1));
      end if;
      -- D_out is '1' when all three stored bits are '1'
      if (Data = "111") then
        D_out <= '1';
      else
        D_out <= '0';
      end if;
    end process;
  end behavior;

[Figure: RTL circuit with three DFFs (D_in, clk, D_out) and the GTL circuit obtained by weaving, with stages labeled F, HB and FB between D_in and D_out. Legend: a channel consists of data0, data1, req and ack]

Page 29:

Initialization: FSM example

[Figure: FSM with Next state CL, Output CL and State REG (in, out, clk), and its GTL version with a Fork and six HB stages]

Page 30:

Flow equivalence

GTL data flow structure is equivalent to the source RTL by weaving
  No data dependencies are removed
  No additional dependencies are introduced
In deterministic flow architecture
  There are no token races (tokens cannot pass each other)
  All forks are broadcast and all joins are synchronizers
Flow equivalence is preserved by construction

Pages 31-42:

Flow equivalence

[Figure: 12-frame animation showing token positions, frame by frame, in the RTL and GTL implementations of the same circuit; entries are numbered data tokens and N]
Frame captions, in order:
  GTL initialization is same as RTL, but token propagation is independent
  In GTL “3” hits the first top register output
  In GTL “3” hits the first bottom register output
  In GTL “2” hits the second register output
  In RTL “3” and “2” moved one stage ahead
  Timing is independent, the order is unchanged
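
Flow equivalence as described here compares, register by register, the order of data values and ignores when they arrive. A minimal sketch of such a comparison follows; the trace format (a list of observed values per register, with `None` standing for a NULL spacer in the GTL trace) and the function names are assumptions made for illustration.

```python
NULL = None  # spacer between tokens (assumed representation)


def tokens(trace):
    """Strip spacers, keeping data values in order of arrival."""
    return [value for value in trace if value is not NULL]


def flow_equivalent(rtl_traces, gtl_traces):
    """Both arguments map a register name to its list of observed values.

    Traces may have different lengths because one implementation can run
    ahead of the other, so only the common prefix is compared.
    """
    if rtl_traces.keys() != gtl_traces.keys():
        return False
    for reg, rtl in rtl_traces.items():
        gtl = tokens(gtl_traces[reg])
        common = min(len(rtl), len(gtl))
        if rtl[:common] != gtl[:common]:
            return False
    return True


if __name__ == "__main__":
    rtl = {"r1": [1, 2, 3], "r2": [0, 1, 2]}
    gtl = {"r1": [None, 1, None, 2, None, 3], "r2": [None, 0, None, 1]}
    print(flow_equivalent(rtl, gtl))
```

The prefix comparison reflects the point of the animation: the GTL registers may receive a given token earlier or later than their RTL counterparts, but never in a different order.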

Page 43:

Optimizations

Area
  Optimizing out identity function stages
Performance
  Fine-grain pipelining (natural)
  Slack matching

Page 44:

Optimizing out identity function stages

Identity function stages (buffers) are inferred for clocked DFFs and D-latches
Implement no functionality
Can be removed as long as
  The token capacity is not decreased below the RTL level
  The resulting circuit can still be properly initialized

Page 45:

Optimizing out identity function stages: example

[Figure: RTL pipeline of CL blocks and DFFs, and the corresponding chains of HB stages before and after identity stages are removed]
Final implementation is the same as if the RTL had not been pipelined (except for initialization)
  Saves pipelining effort

Page 46:

Slack matching implementation

Adjusting the pipeline slack to optimize its throughput
Implementation
  Leveling gates according to their shortest paths from primary inputs (outputs)
  Inserting buffer stages to break long dependencies
  Buffer stages initialized to NULL
Currently performed for circuits with no loops only
Complexity $O(|X|\,|C|^2)$
  |X| - the number of primary inputs
  |C| - the number of connection points in the netlist
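
The leveling step can be sketched as a breadth-first search from the primary inputs. The slide does not say how buffer positions are derived from the levels, so the second function below uses an assumed heuristic: at each gate, pad every fan-in so that it matches the deepest fan-in. The netlist format and both function names are hypothetical, and this sketch does not reproduce the stated complexity bound.

```python
from collections import deque


def shortest_levels(fanins, primary_inputs):
    """Level each node by its shortest distance from any primary input.

    fanins: dict mapping a gate name to the list of its input node names.
    """
    fanouts = {}
    for gate, sources in fanins.items():
        for src in sources:
            fanouts.setdefault(src, []).append(gate)
    level = {pi: 0 for pi in primary_inputs}
    queue = deque(primary_inputs)
    while queue:
        node = queue.popleft()
        for nxt in fanouts.get(node, []):
            if nxt not in level:  # first visit is the shortest path in BFS
                level[nxt] = level[node] + 1
                queue.append(nxt)
    return level


def buffer_padding(fanins, level):
    """Assumed heuristic: buffers per connection to equalize fan-in levels."""
    pads = {}
    for gate, sources in fanins.items():
        deepest = max(level[src] for src in sources)
        for src in sources:
            if deepest > level[src]:
                pads[(src, gate)] = deepest - level[src]
    return pads


if __name__ == "__main__":
    netlist = {"g1": ["a", "b"], "g2": ["g1", "b"]}
    levels = shortest_levels(netlist, ["a", "b"])
    print(levels)
    print(buffer_padding(netlist, levels))
```

In the small example, input b reaches g2 directly while the other operand passes through g1, so the heuristic places one buffer stage on the connection from b to g2.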

Page 47:

Slack matching correctness

Increases the token capacity
  Potentially increases performance
Does not affect the number of initial tokens
  Liveness is not affected
Does not affect the system structure
  The flow equivalence is not affected

Page 48:

Experimental results: MCNC

[Figure: bar chart "Performance, MHz" (axis 0 to 600 MHz) for benchmarks C1355, C5315, cm163a and lal; series RTL and GTL]

RTL implementation
  Not pipelined
GTL implementation
  Naturally fine-grain pipelined
  Slack matching performed
Both implementations obtained automatically from the same VHDL behavior specification
On average ~4x better performance

Page 49:

Experimental results: AES

[Figure: bar chart "Performance, MHz" (axis 0 to 800 MHz) for modules Inverter, keyexpansn and aes10rnds; series RTL and GTL]
[Figure: bar chart "Area, sq. um xE+06" (axis 0 to 4) for modules Inverter and keyexpansn; series RTL and GTL]

~36x better performance
~12x larger

Page 50:

Base line

Demonstrated automatic synthesis of QDI (robust to variations), automatically gate-level pipelined implementations from large behavioral specifications
Synthesis run time is comparable with RTL synthesis (~2.5x slower), so design time could be reduced
Resulting circuits feature
  increased performance (depth dependent, ~4x for MCNC)
  area overhead
Practical solution: first prerelease at http://async.bu.edu/weaver/
Demonstrated correctness of transformations (weaving)

Page 51:

Future work

Library design
  Dynamic (domino-like) library design
  Low leakage library design to combine high performance of fine-grain pipelining with low power from very aggressive voltage reduction
  Balanced library for security related applications
Extending the concept to other technologies
  Automated asynchronous fine-grain pipelining for standard FPGAs
Synthesis flow development
  Integration of efficient GTL “design-ware” and architectures

Page 52:

Thank you!

Questions? Comments? Suggestions?

Page 53:

Backup slides

Slack matching animated example
Similar work
FSM + datapath example (1-round AES)
Experiments setup
Linear HB PPN
Non-linear HB PPN
Closed linear HB pipeline liveness

Pages 54-58:

Slack matching: example (C17)

[Figure: animation of slack matching on the C17 netlist of GTL AND2 and GTL OR2 stages; stages labeled GTL1 appear in the later frames. Legend: dr_data channel, handshake]

Page 60:

Similar work: the difference

Null Convention Logic
  Coarse-grain
  Slow and large synchronization trees
Phased logic
  Different encoding provides less switching activity
  Complicated synthesis algorithm due to encoding
De-synchronization
  Bundled data
  Coarse-grain
None of the above provides support for automated fine-grain pipelining

Pages 62-65:

Example: data path

[Figure: 4-frame animation of a data path with an FSM. Labels in the first frame: FSM, CL blocks, REG, MUX; in the last frame: FSM, CL blocks, MUX, DEMUX]

Page 67:

Experiments setup

Standard gates: vtvt library from Virginia Tech, TSMC 0.25
C-elements: derived from the PCHB library from USC and simulated to obtain performance

[Figure: stage schematic. Labels: C (three C-elements), Storage, CD, PC, ACK, F, LReq, LAck, RReq, RAck, A0, A1, B0, B1, Z0, Z1]

Page 69:

All correctness prerequisites

1. no additional data dependencies are added and no existing data dependencies are removed during weaving;
2. every gate implementing a logical function is mapped to a GTL gate (stage) implementing an equivalent function for dual-rail encoded data and initialized to NULL (spacer);
3. closed asynchronous HB pipeline maximum token capacity is S/2 - 1 (where S is the number of HB stages);
4. closed asynchronous FB pipeline maximum token capacity is S - 1 (S is the number of HB stages);
5. in HB pipelines distinct tokens are always separated with spacers (there are no two distinct tokens in any two adjacent stages);
6. for each DFF in the RTL implementation there exist in the GTL implementation two HB stages, one initialized to a spacer and the other to a token;
7. the number of HB pipeline stages in any cycle of the GTL implementation is greater than the number of DLs (or half-DFFs) in the corresponding synchronous RTL implementation;
8. GTL pipeline token capacity is greater than or equal to that of the synchronous implementation;
9. no stage state is shared between any two stages;
10. exactly one place is marked in every stage state;
11. a HB PPN marking is valid iff every FB-stage in the HB PPN has exactly one marker;
12. GTL style pipeline is properly modeled by HB PPN;
13. a live closed HB PPN is at least 3 HB stages long;
14. a live closed HB PPN has at least one token and at most S/2 - 1 tokens;
15. the token flow is deterministic and does not depend on the data itself;
16. a marked graph is live iff M0 assigns at least one token on each directed loop (or circuit);
17. for a HB PPN to be live, each of its directed circuits composed of forward arcs, taken as a closed HB PPN, must satisfy conditions 11, 13 and 14;
18. every feedback loop in the synchronous implementation contains at least one DFF (or a pair of DLs).

Pages 71-73:

Linear pipeline

[Figure: linear RTL pipeline (DFF1 to DFF3, CL1 to CL3, clk), its PPN model (transitions t1, t3, t4) and its HB PPN model (transitions t1 to t8), with initial markers (*); the last frame adds the GTL implementation built from stages with ACK, PC, CD and M blocks]
PPN models full-buffer pipelines; a PPN stage has two states
HB PPN models half-buffer pipelines; an HB PPN stage has three states
HB PPN properly models the HB GTL implementation

Pages 75-76:

Non-linear pipeline

[Figure: non-linear pipeline with a fork and a merge (transitions t1, t2, t3) and its HB PPN model]
PPN equivalent to HB PPN besides token capacity
MG PN equivalent to HB PPN besides token capacity

Page 78:

Closed linear HB pipeline is live iff
  Every loop has at least 3 stages
  Token capacity for any loop: $1 \le C \le N/2 - 1$
Assumption we made: every loop in a synchronous circuit has a DFF
  A loop with no CL is meaningless
Liveness conditions hold

[Figure: closed HB PPN with transitions t1, t2, t3; RTL loop of DFF2 and CL]
