Interleaved Architectures for High-Throughput Synthesizable Synchronization FIFOs

Ameer Abdelhadi, McGill University
Mark Greenstreet, University of British Columbia

Abdelhadi & Greenstreet, Interleaved Synthesizable FIFOs, ASYNC, May 22, 2017

Outline: Synchronization FIFOs; The Interleaved FIFO Architecture; Design Flow; Hazards; Results


Multi-Synchronous Designs

[Figure: Multi-Synchronous Design. A chip with several cores, an L2 cache, a NOC, DSP, crypto, DDR4, PCIe 3.0, and h.264 blocks.]

Typical chip designs consist of a large number of synchronous, synthesized blocks.
• This is the easy part!
• There are many different clocks.
• Designers need easy-to-use, synthesizable interfaces.

The big challenges are ones of design integration.
• This is where asynchronous methods excel.

FIFOs are commonly used for these interfaces.


Prior Work: Gray-Code FIFOs

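[Figure: Gray-code FIFO. A dual-port SRAM is written at w_addr (put_data, put_req, w_en) and read at r_addr (get_req, r_en, r_data/get_data). The write and read control blocks each keep a Gray-code counter; each pointer is synchronized into the other clock domain, converted by gray2bin, and compared with the local pointer to produce space_avail on the put side and data_valid on the get side. Legend: logic clocked by put_clk, logic clocked by get_clk, critical path (for cycle time).]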

+ Textbook solution, familiar to many designers.
− Gray-to-binary conversion is a bottleneck.
• See [Cummings+Alfke2002].
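Why the conversion is a bottleneck can be seen in a few lines of Python (an illustration written for this note, not code from the cited designs). Binary-to-Gray is one XOR per bit with no chain, but Gray-to-binary makes each binary bit the XOR of the Gray bit and the binary bit above it, so in a straightforward implementation the most significant bit ripples through every lower bit. That chain sits between the synchronizer and the comparator on every cycle.

```python
def bin2gray(b):
    """Binary to Gray code: each output bit is the XOR of two adjacent input bits."""
    return b ^ (b >> 1)


def gray2bin(g, width):
    """Gray to binary: binary bit i = Gray bit i XOR binary bit i+1.

    The loop runs from the MSB down because each bit depends on the one
    above it: a serial XOR chain as long as the pointer is wide.
    """
    b = 0
    for i in reversed(range(width)):
        above = (b >> (i + 1)) & 1          # binary bit i+1, already decoded
        b |= (((g >> i) & 1) ^ above) << i
    return b


WIDTH = 4
for n in range(2 ** WIDTH):
    g = bin2gray(n)
    # Successive pointer values differ in exactly one bit, so a synchronizer
    # sampling mid-change sees either the old or the new value.
    assert bin(g ^ bin2gray((n + 1) % 2 ** WIDTH)).count("1") == 1
    assert gray2bin(g, WIDTH) == n
print("Gray round trip OK")
```

A balanced XOR tree shortens the chain to logarithmic depth, but the conversion still adds logic to the cross-domain path that the interleaved design avoids.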


Prior Work: One-Hot FIFOs

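[Figure: one-hot FIFO built from a ring of stages (stage 1 … stage N−1 shown), each with its own put control, get control, and synchronizers that produce the per-stage empty and full indications. Note: data-store not shown.]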

+ Simple, fast control; very modular design.
− Per-stage synchronizers and control tend to dominate area and power.
• See [Chelcea+Nowick2004, Ono+Greenstreet2009].


Contributions

A novel, interleaved FIFO architecture
• Area and power efficient.
• High throughput, minimal latency.

Synthesizable design
• Open-source Verilog.
  ◦ Supports the standard Synopsys™ design flow.
  ◦ You can use it in your designs!
• Highly parameterized: synthesize the FIFO you want.
• Provides a benchmark design for comparisons in further research.

Identify a glitch hazard in synchronization FIFOs
• Many published designs have this hazard.
• We present our solution.


FIFO Architecture

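[Figure: FIFO architecture. The put interface (clk_put, req_put, data_in, spaceav) and the get interface (clk_get, req_get, datav, data_out) each contain a control block and "a few gates"; a data store of N words, w bits wide, is written through dl_we and read through dl_oe; a small number of synchronizers (sync) carry state between the two clock domains.]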

Similar to Gray-code FIFOs
• Separate write and read pointers in the put and get interfaces.
• Our design uses a pair of one-hot counters in each interface.
  ◦ Avoids conversions to/from Gray code ⇒ simple and fast.
  ◦ Has a small number of synchronizers between put and get ⇒ area and power efficient.

Simple interface to sender and receiver
• Looks like a flip-flop (in each domain) with standard flow control.
• Minimal exposure of internal timing details.


A Thermometer Counter

[Figure: a 4-stage thermometer counter built from enabled D flip-flops q[0] to q[3], sharing clk and enable.]

• Initialized to all zeros.
• Fills left-to-right with ones, then fills with zeros, then with ones, . . .
• The current position is marked by the stage with a transition (or 0 if all stages are the same).
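A behavioural sketch of the counter described above may help. It assumes the counter is a twisted-ring (Johnson) counter: on each enabled clock edge every stage takes its left neighbour's value and q[0] takes the complement of the last stage, which yields exactly the fill-with-ones, fill-with-zeros sequence. `one_hot` decodes the current position by XORing neighbouring stages. The function names are illustrative, not taken from the FIFO's Verilog source.

```python
def thermometer_step(q):
    """One enabled clock edge of an N-stage thermometer (Johnson) counter.

    q[0] is the leftmost stage. Each stage copies its left neighbour and
    q[0] takes the complement of the last stage.
    """
    return [1 - q[-1]] + q[:-1]


def one_hot(q):
    """Current position as a one-hot vector: stage j > 0 is marked when it
    differs from stage j-1; stage 0 is marked when the first and last stages
    agree, which for a valid thermometer state means all stages are equal."""
    n = len(q)
    return [1 - (q[0] ^ q[n - 1])] + [q[j - 1] ^ q[j] for j in range(1, n)]


q = [0, 0, 0, 0]                       # initialized to all zeros
for _ in range(8):                     # one full period is 2N = 8 steps
    q = thermometer_step(q)
    print(q, one_hot(q))
```

Each position decode needs only a single two-input XOR per stage, independent of N, which is what keeps the control simple.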


Dual Thermometer Counters

[Figure: a 4-stage vertical thermometer counter qv[0] to qv[3] and a 4-stage horizontal thermometer counter qh[0] to qh[3], built from enabled D flip-flops sharing clk and enable.]

• Increment the horizontal counter with each wrap-around of the vertical counter.
• Sequence length is the product of those for the two counters.
• Avoids the O(N) control complexity that is the bane of one-hot designs.
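Under one reading of this slide, where "wrap-around" means the vertical counter returning to position 0 (every Nv increments), the dual counter can be modelled as below. The pointer then steps down the rows of one column before moving to the next, visiting all Nv × Nh positions while each counter is only Nv or Nh stages long. This interpretation and all names are assumptions, not the authors' RTL.

```python
def thermometer_step(q):
    # Johnson-counter step: q[0] <- NOT q[-1], every other stage copies its left neighbour.
    return [1 - q[-1]] + q[:-1]


def one_hot(q):
    # Position of a thermometer counter, decoded by XORing neighbouring stages.
    n = len(q)
    return [1 - (q[0] ^ q[n - 1])] + [q[j - 1] ^ q[j] for j in range(1, n)]


def dual_step(qv, qh):
    """Advance the vertical counter; advance the horizontal counter when the
    vertical counter wraps back to position 0 (assumed: every Nv steps)."""
    qv = thermometer_step(qv)
    if one_hot(qv)[0]:
        qh = thermometer_step(qh)
    return qv, qh


Nv, Nh = 2, 4
qv, qh = [0] * Nv, [0] * Nh
for _ in range(Nv * Nh):
    row, col = one_hot(qv).index(1), one_hot(qh).index(1)
    print(f"row {row}, column {col}")
    qv, qh = dual_step(qv, qh)
```

Consecutive increments land in consecutive rows, which is the interleaving the synchronization scheme later relies on.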


Decode to produce full-thermometer code

[Figure, shown as a multi-frame animation: as the vertical and horizontal counters (enabled D flip-flops on clk) advance, a grid of small decoding cells converts their states into a full thermometer code, one bit per FIFO stage.]
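The decode can be checked with a small model. The sketch below is our own derivation under the same assumptions as a Johnson-style counter pair that steps horizontally on each vertical wrap-around, with stage [i][j] being position j·Nv + i in write order; it is not the authors' circuit. A stage's bit is the horizontal counter's bit for its column, flipped when the stage lies in the current column and the vertical counter has already passed its row in this sweep. Comparing it with a single Nv·Nh-stage thermometer counter stepped in lockstep confirms the decode.

```python
def thermometer_step(q):
    # Johnson-counter step: q[0] <- NOT q[-1], every other stage copies its left neighbour.
    return [1 - q[-1]] + q[:-1]


def one_hot(q):
    # Current position of a thermometer counter (XOR of neighbouring stages).
    n = len(q)
    return [1 - (q[0] ^ q[n - 1])] + [q[j - 1] ^ q[j] for j in range(1, n)]


def full_thermometer(qv, qh):
    """Hypothetical decode of the two short counters into one bit per stage.

    t[i][j] = qh[j] XOR (column j is current AND row i already passed),
    where "row i already passed" is qv[i] != qv[-1].
    """
    oh_h = one_hot(qh)
    return [[qh[j] ^ (oh_h[j] & (qv[i] ^ qv[-1])) for j in range(len(qh))]
            for i in range(len(qv))]


# Compare against one long thermometer counter over Nv * Nh stages,
# with stage [i][j] at position j * Nv + i in write order.
Nv, Nh = 4, 4
qv, qh, ref = [0] * Nv, [0] * Nh, [0] * (Nv * Nh)
for _ in range(3 * Nv * Nh):
    t = full_thermometer(qv, qh)
    assert all(t[i][j] == ref[j * Nv + i] for i in range(Nv) for j in range(Nh))
    qv = thermometer_step(qv)
    if one_hot(qv)[0]:                  # vertical wrap-around: step horizontally
        qh = thermometer_step(qh)
    ref = thermometer_step(ref)
print("decode matches the long thermometer counter")
```

Each stage's bit depends only on one horizontal bit, one horizontal one-hot bit, and two vertical bits, so the per-stage decode stays a constant-size gate regardless of FIFO depth.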


Detecting Data Availability

[Figure: for each stage [i,j], full[i,j] compares therm_put[i,j] with therm_get[i,j]; the rows full[0,*] to full[3,*] are OR-reduced into data_available, which needs to be synchronized to clk_get.]

• A stage holds valid data if the value of the get-thermometer differs from the value of the put-thermometer.
• An OR-tree can determine if any stage holds valid data.
• But we need to synchronize.
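A short sketch of this occupancy logic, flattening the Nv × Nh stages into one N-stage thermometer for brevity. `thermometer` computes a counter's bits in closed form from its increment count; all names are hypothetical. XORing the put and get thermometers marks exactly put_count − get_count stages (for 0 to N stored words), so the OR of those flags is the unsynchronized data_available.

```python
def thermometer(count, n):
    """Bits of an n-stage thermometer counter after `count` increments:
    stage k is 1 during the part of each 2n-step period when it is filled with ones."""
    c = count % (2 * n)
    return [1 if k < c <= k + n else 0 for k in range(n)]


def full_flags(put_count, get_count, n):
    """Stage k holds valid data iff the put and get thermometers differ there."""
    tp, tg = thermometer(put_count, n), thermometer(get_count, n)
    return [p ^ g for p, g in zip(tp, tg)]


def data_available(put_count, get_count, n):
    # OR-reduction of the per-stage flags (an OR-tree in hardware).
    # In the FIFO this result still has to be synchronized to clk_get.
    return int(any(full_flags(put_count, get_count, n)))


print(full_flags(put_count=5, get_count=2, n=4))   # [1, 0, 1, 1]
```

In the example the get pointer sits at stage 2, so the three stored words occupy stages 2, 3, and (after wrapping) 0.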


Synchronization with Interleaving

[Figure: each row's OR-reduced flag (full[0,*] to full[3,*]) passes through its own synchronizer clocked by clk_get, and the synchronized row flags are combined into data_available.]

• One synchronizer per row.
• N_vertical ≥ Latency_sync is required for full throughput.
• Can use fewer synchronizers than Gray-code designs if the synchronizer latency is less than the number of address bits.
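The synchronizer comparison above can be made concrete with a little arithmetic. The sketch counts synchronizers for one crossing direction, taking the Gray-code count to be one per address bit (as the slide does) and the interleaved count to be one per row; each synchronizer is itself a chain of flip-flops that is not counted separately. The function names and the example FIFO are hypothetical.

```python
import math


def gray_syncs(n_entries):
    """Synchronized pointer bits per direction in a Gray-code FIFO,
    counted here as one per address bit."""
    return math.ceil(math.log2(n_entries))


def interleaved_syncs(n_vertical, sync_latency):
    """(synchronizers per direction, full throughput?) for the interleaved
    FIFO: one synchronizer per row; full throughput needs
    n_vertical >= sync_latency."""
    return n_vertical, n_vertical >= sync_latency


# Hypothetical 32-entry FIFO with a 2-cycle synchronizer:
print(gray_syncs(32))             # 5
print(interleaved_syncs(2, 2))    # (2, True)
print(interleaved_syncs(2, 3))    # (2, False): too few rows for full throughput
```

Our reading of the full-throughput condition: consecutive words go to consecutive rows, so a row is revisited only every N_vertical words, giving its synchronizer that many cycles to report.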


Synchronization: Achieving Full Throughput

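[Figure: the synchronizer for each row is built from flip-flops with a clear input, clocked by clk_get, and fed by full[0,*] etc.; the synchronized flags, split into even and odd rows and combined with do_get, the one-hot row pointer ohV[i], and !ohH_get[*], produce the registered data_available (intermediate nodes labelled a to e; Nh columns per row).]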

data_available is registered, so we need to know whether there will be data available after the next clk_get↑.
• If any row has a valid value and there isn't a req_get on this cycle, then data will be available on the next cycle.
• If two consecutive rows have valid data (one even, the other odd), then data will be available on the next cycle.
We clear the synchronizer after removing a data word from that row.


  ◦ If there is data available in another column, we allow the first stage of the synchronizer to record that on clk_get↑.
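The two lookahead rules above can be written as a small Boolean function. This is a behavioural transcription only: `row_valid` (the synchronized per-row flags) and `do_get` are hypothetical Python names, and the hardware computes the same conditions with the even/odd gating in the figure.

```python
def next_data_available(row_valid, do_get):
    """Will data be available after the next clk_get edge?

    row_valid: synchronized per-row flags (True if the row holds a word),
               indexed in read order modulo Nv; assumes Nv >= 2.
    do_get:    True if a word is being removed on this cycle.
    """
    nv = len(row_valid)
    any_valid = any(row_valid)
    # Rule 2: two consecutive rows (one even, one odd when Nv is even) hold data.
    consecutive = any(row_valid[r] and row_valid[(r + 1) % nv] for r in range(nv))
    # Rule 1: some row holds data and nothing is removed this cycle.
    return (any_valid and not do_get) or consecutive


print(next_data_available([True, False, False, False], do_get=False))  # True
print(next_data_available([True, False, False, False], do_get=True))   # False
print(next_data_available([False, True, True, False], do_get=True))    # True
```

Because the prediction is computed one cycle early, data_available can stay a plain register output while a word is removed every cycle.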


The Data Path

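[Figure: data path. data_in (w bits) is captured by an input latch under put control (clk_put, req_put, space_av) and written into the data-store latches (en); the latch outputs (oe) are collected onto data_even and data_odd, selected by sel_even and sel_odd under get control (clk_get, req_get, datav), and registered onto data_out.]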

• An input latch gives the FIFO the same set-up and hold timing as a flip-flop.
• The output path uses even/odd interleaving:
  ◦ We found that the data output path could not maintain GHz clock frequencies.
  ◦ Key observation: data is available on the output of the data latches for nearly the full synchronizer latency. This ensures data validity.
  ◦ Alternating paths provide an extra clock cycle for the path to settle, and the timing requirements are easily satisfied.
• The output logic blocks glitches (see next slide).


FIFOs and CDC Hazards

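[Figure, what the designer intended: the receiver's control logic uses data_valid from the FIFO to choose between f, which reads data_out, and g to update its local state register, and returns get_req to the FIFO.]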

A designer might (reasonably) assume that

    assign foo = data_valid ? f(data_out, ...) : g(...);

is safe.

BUT, synthesis can introduce glitches the designer never imagined.
• This hazard is present in many published clock-domain-crossing (CDC) FIFOs.
• Our design ensures data_out is all 0s when no valid data is available.


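[Figure, what synthesis can do (shown over several animation frames): f, g, and the control are merged and optimized by synthesis; while data_valid is 0, unknown (X) values on data_out can still propagate through the merged logic to the local state.]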
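To make the hazard concrete, here is a hypothetical Python sketch using ternary (0/1/X) simulation, a standard technique in which X marks a signal that may be unstable or glitching. The designer's intent is foo = data_valid ? (d0 & d1) : g, with d0 and d1 two data_out bits; `restructured` is a logically equivalent form (a Shannon expansion on d0) of the kind a synthesis tool could produce. The functions and this particular restructuring are illustrative assumptions, not taken from any tool or from the FIFO source.

```python
from itertools import product

X = "X"  # unknown / possibly glitching value


def t_not(a):
    return X if a == X else 1 - a


def t_and(a, b):
    if a == 0 or b == 0:
        return 0          # a stable 0 forces the AND output low
    return X if X in (a, b) else 1


def t_or(a, b):
    if a == 1 or b == 1:
        return 1          # a stable 1 forces the OR output high
    return X if X in (a, b) else 0


def intended(dv, d0, d1, g):
    # foo = dv ? (d0 & d1) : g, built as a clean AND-OR multiplexer
    return t_or(t_and(dv, t_and(d0, d1)), t_and(t_not(dv), g))


def restructured(dv, d0, d1, g):
    # Same Boolean function after a (hypothetical) Shannon expansion on d0:
    # foo = d0 & (dv ? d1 : g)  |  ~d0 & ~dv & g
    return t_or(t_and(d0, t_or(t_and(dv, d1), t_and(t_not(dv), g))),
                t_and(t_not(dv), t_and(g, t_not(d0))))


# Identical on every stable input combination...
assert all(intended(*v) == restructured(*v) for v in product((0, 1), repeat=4))
# ...but with data_valid = 0, g = 1 and d0 still changing:
print(intended(0, X, X, 1), restructured(0, X, X, 1))   # 1 X
# Forcing data_out to 0 when it is not valid removes the X:
print(restructured(0, 0, 0, 1))                          # 1
```

This is why driving data_out to all 0s when no valid data is available helps: the receiver's logic then sees stable inputs whatever form synthesis chooses.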


Design Flow

[Figure: design flow. A design generator and run-in-batch manager (shell/tcl/perl scripts) turns the user's design requirements (number of vertical/horizontal stages, data width, synchronizer depth) into interleaved FIFO Verilog modules. The flow runs gate synthesis (Synopsys Design Compiler) to a pre-layout gate netlist, place & route (Synopsys IC Compiler) with RC extraction to a post-layout gate netlist, static timing analysis (STA) and power estimation (Synopsys PrimeTime), and back-annotated gate-level simulation (Synopsys VCS) driven by a Verilog testbench, using SDF delays and producing node activity (VCD). Other inputs: CAD tools setup, cell-based PDK, design constraints. Outputs: area/cell count/wire length, equivalence, performance/latency, timing, and switching/dynamic/leakage power dissipation.]

Complete Synopsys™ design flow
• Includes synthesis, place-and-route, back-annotation, simulation, performance analysis, and power estimation.

Highly parameterized
• FIFO word size, depth, and interleaving specified by the user.
• Number of synchronizer stages.
• Target clock frequency.

Open source
• The FIFO itself is about 200 lines of Verilog.
• The test bench is about 300 lines of Verilog.
• Plus about 2400 lines for 10 scripts.
• You can use it, you can build on it!
• Get it from GitHub; details on the last slide.


Performance

N    Nh   Nv   w    S    freq (MHz)   power (mW)   area (µm²)
8    4    2    8    2    1300         1.2          1745
16   4    4    8    3    1400         1.3          2618
16   8    2    8    3    1500         1.9          3268
32   8    4    8    4    1200         1.8          4875
8    4    2    32   2    1400         2.2          2979
16   4    4    32   2    1200         2.0          6725
32   8    4    32   4    1100         2.9          12712

N: FIFO depth (N = Nh × Nv); Nh, Nv: horizontal and vertical stages; w: data width in bits; S: synchronizer depth (stages).

A few examples from 144 test cases.

• Frequency is dominated by "tree structures": the clock period is roughly logarithmic in Nh, Nv, and w.
• Power is dominated by the control circuitry and is roughly linear in the number of control flip-flops.
• Area is dominated by data storage and is roughly linear in the number of data latches.
• Results use a 65 nm commercial library.
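For quick comparisons across rows, the clock period and the area per stored data bit can be derived directly from the table. The sketch below only re-expresses the published numbers; reading the frequency column as MHz is an assumption, and `ROWS` and `derived` are names introduced here.

```python
# Rows copied from the performance table:
# (N, Nh, Nv, w, S, freq_mhz, power_mw, area_um2).
# Reading freq as MHz is an assumption.
ROWS = [
    (8, 4, 2, 8, 2, 1300, 1.2, 1745),
    (16, 4, 4, 8, 3, 1400, 1.3, 2618),
    (16, 8, 2, 8, 3, 1500, 1.9, 3268),
    (32, 8, 4, 8, 4, 1200, 1.8, 4875),
    (8, 4, 2, 32, 2, 1400, 2.2, 2979),
    (16, 4, 4, 32, 2, 1200, 2.0, 6725),
    (32, 8, 4, 32, 4, 1100, 2.9, 12712),
]


def derived(row):
    """Clock period (ps) and area per stored data bit (um^2/bit) for one row."""
    n, nh, nv, w, _s, f_mhz, _p_mw, area_um2 = row
    assert n == nh * nv, "depth should equal Nh * Nv"
    period_ps = 1e6 / f_mhz            # 1 / f, with f in MHz, in picoseconds
    area_per_bit = area_um2 / (n * w)  # total area spread over N * w stored bits
    return period_ps, area_per_bit


for row in ROWS:
    period_ps, per_bit = derived(row)
    print(f"N={row[0]:2d} w={row[3]:2d}: {period_ps:6.1f} ps, {per_bit:5.1f} um^2/bit")
```

For example, the first row's 1300 MHz corresponds to a period of about 769 ps.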


Conclusions

We have presented a fully synthesizable, general-purpose, clock-domain-crossing FIFO.
• Open source: https://github.com/AmerAbdelhadi/Interleaved-Synthesizable-Synchronization-FIFOs
• Supports the complete Synopsys ASIC design flow.
• Highly configurable.
• Thoroughly evaluated for a wide range of configuration parameters.

Novel interleaved FIFO architecture
• Avoids the conversion bottlenecks of Gray-code FIFOs.
• Smaller control logic and far fewer synchronizers than one-hot FIFOs.

Identified a glitch hazard that is common in published designs
• Our design avoids it.

Thanks: Tarik Ono, Brad Quinton, NSERC Canada.
