View
221
Download
0
Category
Preview:
Citation preview
The Asynchronous NOC
Ran Ginosar
NOCS Tutorial
San Diego, 10 May 2009
Ran Ginosar Async Noc Tutorial NOCS 2009 2
SoC
• Each color is a separate clock domain
R
R
R R R
R
RR R R R
R R
R
R
R
R
Ran Ginosar Async Noc Tutorial NOCS 2009 3
SoC
• What clock for the interconnect?– Fastest?– Opportunistic?– None?
R
R
R R R
R
RR R R R
R R
R
R
R
R
Ran Ginosar Async Noc Tutorial NOCS 2009 4
Conceptual Summary
• NOCs are for large SOCs
• Large SOCs = multiple clock domains
→ NOCs should be asynchronous
• Two complementary research areas: – Asynchronous routers
• simplify design, low power
– Asynchronous interconnect• high bandwidth, low power
• Problem: need special CAD, special methodology– Solutions:
• deliver and use as “configurable hard IP core”
• use only at physical design phase
• deliver as predesigned infrastructure (FPGA, SOPC)
Ran Ginosar Async Noc Tutorial NOCS 2009 5
Contextual Summary
• Async routers– QNoC (Technion, Israel)
– Other research (borrowed slides acknowledged and referenced)• Faust (LETI, France)
• Alpin (LETI, France)
• Mango (DTU, Denmark)
• Async Interconnect– FOX (Technion, Israel)
The QNoC Async Router
Ran Ginosar Async Noc Tutorial NOCS 2009 7
NoC router 101
SL1INPUT PORT SL1OUTPUT PORT
SL1INPUT PORT
SL1INPUT PORT SWITCH
SL1OUTPUT PORT
SL1OUTPUT PORT
Ran Ginosar Async Noc Tutorial NOCS 2009 8
Single-Service-Level Router
InputPort
InputPort
OutputPort
OutputPort
Ran Ginosar Async Noc Tutorial NOCS 2009 9
Single-Service-LevelInput-Port
Ran Ginosar Async Noc Tutorial NOCS 2009 10
Single-Service-LevelOutput-Port
Adding multiple service levels
Ran Ginosar Async Noc Tutorial NOCS 2009 12
Multi-Service-Level Input-Port
Reuse of the
Single-Service-
Level Input-Port
Reuse of the
Single-Service-
Level Input-Port
Ran Ginosar Async Noc Tutorial NOCS 2009 13
Multi-Service-Level Output-Port
SSL-OP
(SL0) Ro4-Way
SPA
G_SL3
Ro
Ao
Do
BT
A_SL0
RH_SL0
RBT_SL0
D_SL0 Di
A
H
SSL-OP
(SL2)
Ro
Ao
Do
SSL-OP
(SL3)
Ro
Ao
Do
SSL-OP
(SL1)
Ro
Ao
Do
Ao
Dout
Ao_0
Ao_1
Ao_2
Ao_3
Ro_0
Ro_1
Ro_2
Ro_3
Do_0
Do_1
Do_2
Do_3
Gate
G_SL1G_SL2G_SL3G_SL4
4
4
4
4*Flit
BT
A_SL1
RH_SL1
RBT_SL1
D_SL1 Di
A
H4
4
4
4*Flit
BT
A_SL2
RH_SL2
RBT_SL2
D_SL2 Di
A
H4
4
4
4*Flit
BT
A_SL3
RH_SL3
RBT_SL3
D_SL3 Di
A
H4
4
4
4*Flit
SL
Index
S* R
Reuse of the
Single-Service-Level
Output-Port
Reuse of the
Single-Service-Level
Output-Port
Inter-Service-
Level Arbitration
Inter-Service-
Level Arbitration
Ran Ginosar Async Noc Tutorial NOCS 2009 14
Buffering and Credits
MonitoringActivity
Ran Ginosar Async Noc Tutorial NOCS 2009 15
PerformanceASYNC routerSYNC router
MHz-270.2 **Max Clock Frequency
Mflits/s75.267.6Max Data Rate
ns (CLK)13.314.8 (4)Data Cycle
ns (CLK)13.0/9.2 *3.7 (1)Min Latency (Input to Output)
620880Number of FFs+Latches
34,20070,000 Number of transistors
Gates8,50017,500 Equivalent Gates (2-in NAND)
µm2470,000960,000Cell Area
* Latency for async router specified for header / body flits.** Synchronous router has a critical path of ~20 FO4 gate delays, matching or outperforming other published results.
Ran Ginosar Async Noc Tutorial NOCS 2009 16
The critical path spans multiple routers
IP
OP
IP
OP
Multi-Service-Level Router Critical Path
Single-Service-Level Router Critical Path
Router Router
Adding virtual channels
Ran Ginosar Async Noc Tutorial NOCS 2009 18
Asynchronous Router 2D Structure
VC classification
SL classificationSending Request
to Output Port
Virtual Channel
Admission Control
Intra-Service
Level Arbitration
Inter-Service
Level Arbitration
Ran Ginosar Async Noc Tutorial NOCS 2009 19
Conclusions• Async routers
– less area than sync routers
– no need for global or local clocks
– no need for synchronization
• Highly configurable– Ports
– Service Levels
– Virtual Channels (inside each service level)
• Dynamic virtual channel allocation
• Fast and fair asynchronous arbitration
• Simulated: 200MFlits/s @ 0.18µm – 5 ports, 4 x SL, 2 x VC
FAUST
CEA-LETI
Grenoble, France
Ran Ginosar Async Noc Tutorial NOCS 2009 21
FAUST• Actual chip demonstrating async NOC
– CEA LETI (France)
• Circa 2005, 0.13µ– ST microelectronics (France)
Ran Ginosar Async Noc Tutorial NOCS 2009 22
The FAUST environmentT
est L
inks
ConverterModule
Host Interface
Data
Co
mp
uti
ng
an
dS
tora
ge
asp
ects
Ethernet
IF unit CPU
unit
RAM
unit
HW
unit
Ext RAM
Ctrl unit
FAUST
FAUST FPGA
FPGA
Ext RAM
Ctrl unit
Eth
TRX
Ethernet
IF unit
Eth
TRX
RF1
IF unit
RF1 module
DACDAC
ADCADC
Analog
TX
RX
RF2
IF unit
RF2 module
DACDAC
ADCADC
Analog
TX
RX
Flash
FPGA
JTAG link
Flash
LVDS Test1 links
Test2 links
Ext RAM
Ctrl unit
Ext RAM
Ctrl unit
RAM
RAM
RAM
RAM
LVDS
MTETH FPGA
Standard component
Integrated unit (FAUST – FPGA)
ETH FAUST
Ethernet
IF unit CPU
unit
RAM
unit
HW
unit
Ext RAM
Ctrl unit
FAUST
FAUST FPGA
FPGA
Ext RAM
Ctrl unit
Eth
TRX
Ethernet
IF unit
Eth
TRX
RF1
IF unit
RF1 module
DACDAC
DACDAC
ADCADC
ADCADC
Analog
TX
RX
RF2
IF unit
RF2 module
DACDAC
DACDAC
ADCADC
ADCADC
Analog
TX
RX
Flash
FPGA
JTAG link
Flash
LVDS Test1 links
Test2 links
Ext RAM
Ctrl unit
Ext RAM
Ctrl unit
RAM
RAM
RAM
RAM
LVDS
MTETH FPGA
Standard component
Integrated unit (FAUST – FPGA)
ETH FAUST
Mixed FAUST / FPGA platform
Board: Sept 2005
Ran Ginosar Async Noc Tutorial NOCS 2009 23
The FAUST architecture
RAM IF
58 Pads
NOC2 IF
83 Pads
OFDMMOD.
ALAM.MOD.
CDMAMOD. MAPP.
BITINTER.
TURBOCODER
RAM CPU RAMEXT.RAMCTRL
AHB
ROTOR EQUAL.CHAN.EST.
CONV.DEC.
ETHERNET
FRAMESYNC.
ODFMDEM.
CDMADEM.
DE-MAPP.
DE-INTER.
DART
EXP
SPort
APort
NOC1 IF
84 Pads
SPort
APort
RAC
NoCPerf.
EXP
CONV.CODER
Clk & Test CTRL
IPs from ext. partners
Async�Sync Interf.
ANOC async. nodes
Latency and throughput
NoC performance analysis
Clk control for power
management (24 clocks)
Direct NoC accessAHB to NoC bridge
Reconf.
Data paths
DMA complex blocks
23 NOC units
(from 100 kgates up to 1Mgates)
Ran Ginosar Async Noc Tutorial NOCS 2009 24
FAUST chip description & Floor-plan
AHB
DART
ARM946
TX units
RX units
ETH unit
RAMs
units
NOC
nodesand
links
• Tech.: STM/HCMOS9GPLL (0.13µ)
• 23 NoC units (166 MHz)
• RAM blocks : 81
• CPU: ARM946
• 4.5 Mgates
• 275 I/Os
• Core area = 70 mm2
• Chip area = 80 mm2
• Package : TBGA 420
• Core power supply: 1.2 V
• I/O power supply: 3.3 V
• Power Consumption : 3 Watts max
Ran Ginosar Async Noc Tutorial NOCS 2009 25
FAUST Node Architecture
• QDI 4 phases + multi-rail protocol
– 90 wires per NOC link
Asynchronous Node 2
Asynchronous Node 1
IP OP
Synchronous
Resource
Data + Send
Accept1
Accept1
Data + Send OP
IP
OP
IP
IP
OP
OP IP
Data +
Sen
d
Acce
pt1
Data
+ S
end
Acc
ept1
Accept1
Data + Send
NORTH
SOUTH
WEST
EAST
OP
Synchronous
Resource
async/sync
Interface
Accept1
Data + Send
IP
IP
OP
OP
IP
IP
OP
OP IP
Data +
Sen
d
Acce
pt1
Data
+ S
end
Acc
ept1
Accept1
Data + Send
NORTH
SOUTH
WEST EAST
async/sync
Interface
Ran Ginosar Async Noc Tutorial NOCS 2009 26
FAUST Node Input Controller
• 2 Virtual Channels
Get_new_flit
Get_priority_bit
Loop_R0
IP_data[33:0]
R0_to_OP0
R0_to_OPn-1
R1_to_OP0
R 1_to_OPn-1
CTRL_shift
R0
R1
Split_R0
Split_Rk-1
IP_send
CUR_R0
CUR_R1
Loop_R1
NXT_R1
NXT_R0
Valid_R0_to_OP0
Valid_R1_to_OPn-1
BopR1_toOP0
BopR1_toOPn-1
Send_accept1 IP_accept1
×n
×n
×n
×n
×n
Accept1_fromOP0
Accept1_fromOPn-1
Ran Ginosar Async Noc Tutorial NOCS 2009 27
FAUST Node Output Controller• Arbitration for VC0
static (N/E/S/W)
• Arbitration for VC1
uses a priority list
(first-arrived first-served)
• 34-bit data Switch
Arbiter
Next_state
CUR_STATE
NXT_STATE
CUR_SUSPENDED
NXT_SUSPENDED
Switch
R0_fromIP0
R0_fromIPn-1
R1_fromIP0
R1_fromIPn-1
OP_accept1 valid_R0_fromIP0
valid_R0_fromIPn-1
valid_R1_fromIP0
valid_R1_fromIPn-1
bopR1_fromIP0
bopR1_fromIPn-1
R0_eop_fromIP0
R0_eop_fromIPn-1
R1_eop_fromIP0
R1_eop_fromIPn-1
Accept1_toIP0
Accept1_toIPn-1
FAFS
OP_data[33:0]
List1
CUR_FROM
CUR_PRIO
NXT_FROM
NXT_PRIO
TO_SUSPEND
OP_send
CTRL_SWITCH
×n
×n
×n
×n
×k
×n
×n
×n
×n
var_state
var_suspended
Ran Ginosar Async Noc Tutorial NOCS 2009 28
FAUST GALS interface
• The GALS interface: A synchronizer
• Uses dual-clock fifo (2 stages)– Allows sync & async transfers every clock cycles
Asynchronous
to
Synchronous
Interface
Edata[67:0]
Edata_ack[16 :0]
Esend[1:0]
Esend_ack
Eaccept1
Eaccept1_ack
n_data[33:0]
n_send[1:0]
u_accept[1:0]
Synchronous
to
Aynchronous
Interface
Sdata[67:0]
Sdata_ack[16 :0]
Ssend[1:0]
Ssend_ack
Saccept1
Saccept1_ack
u_data[33:0]
u_send[1:0]
n_accept[1:0]
Synchronous
UnitANOC
Unit
connec
tion
North
West
South Clock
Ran Ginosar Async Noc Tutorial NOCS 2009 29
FAUST GALS interface• Design is based on Gray-
code Dual-clock fifo’s– Main objective is to provide a full std-cell design approach
– Modification required for the async side
• NOC interface– A-to-S + S-to-A fifo’s
– One fifo per Virtual Channel
Fifo Gray
A - S
write_clock
write_data[33:0]
write_enable
full
read_clock
read_data[33:0]
read_enable
empty
A-to-S VC0
A-to-S VC1
S-to-A VC0
S-to-A VC1
Asynchronous side :
Data /Send /Accept
NOC link QDI 4-phase
Synchronous side :
Data /Send /Accept
NOC link synchronous version
A-to-S interface
S-to-A interface
Ran Ginosar Async Noc Tutorial NOCS 2009 30
FAUST external interfaces
• NOC access in dual-mode : sync / async– Good for debug / explore asynchronous external connections
• For async mode, convert 4-rail/4-phase to bundled-data/2-phase
A-to-S
QDI-to-BD
S-to-A
BD-to-QDI
to/from ANOC
Data / Send /Accept
NOC link QDI 4-phase
to/from Chip Pads
Data / Send /Accept NOC link :
- synchronous version + clk
- bundled-data async version
sync / async
interface mode
A-to-S
QDI-to-BD
S-to-A
BD-to-QDI
to/from ANOC
Data / Send /Accept
NOC link QDI 4-phase
to/from Chip Pads
Data / Send /Accept NOC link :
- synchronous version + clk
- bundled-data async version
sync / async
interface modeRAM IF
58 Pads
ETHERNET IF
17 Pads
On chip NOC GALS IF
ANOC asynchronous node
OFDMMOD.
ALAM.MOD.
CDMAMOD.
MAPP.BIT
INTER.TURBOCODER
RAM CPU RAMEXT.RAMCTRL
AHB
ROTOR EQUAL.CHAN.EST.
CONV.DEC.
ETHERNET
FRAMESYNC.
ODFMDEM.
CDMADEM.
DE-MAPP.
DE-INTER.
SPort
APort
NOC1
PORT
SPort
APort
NoCPerf.
CONV.CODER
Clk & Test CTRL
NOC2
PORT
Off chip NOC IF
Ran Ginosar Async Noc Tutorial NOCS 2009 31
FAUST Performance Results• ANOC performances (worse case, 5/5 nodes)
– Per async Node : 150 Mflit/s, 5 ns Latency– Per GALS interf. : 120 Mflit/s, 10 ns Latency
• Area results (HCMOS9)– Network Interface (wo config reg.) : 10 kgates– ANoC node (requires specific async cells)
• NoC node : 20 kgates• GALS interface : 15 kgates
– 45 kgates totally per NoC block unit to provide :• communication, Quality-of-Service, configurability,• robustness & multi-clock domains
– OK for units with average complexity of about 300 kgates (~15%)
• NoC communication overhead– NI credit mechanism + packet headers : ~10% total NoC throughput– Virtual channel (low latency packets) : 50 % area of NoC node + GALS IF
NODE
GALS
NI
Unit
North
West
South
East
Unit
Clock
ALPIN
CEA-LETI
Grenoble, France
Ran Ginosar Async Noc Tutorial NOCS 2009 33
ALPIN
• Claim 1: Async NOC (=GALS SOC) easily enables dynamic voltage and frequency scaling (DVFS)– Lower voltage to some modules when slow
– Power off to some modules to save leakage
– Sync modules use “pausable clock”• When voltage and frequency change, local module clock is paused momentarily
• Claim 2: Routers used lightly– Shut off when idle
– Easy when async
• Based on FAUST
• Another actual chip ☺– ST Micro 65nm
Ran Ginosar Async Noc Tutorial NOCS 2009 34
ALPIN
SAS=Sync/Async/Sync synchronizer LCG=Local Clock Gating
Ran Ginosar Async Noc Tutorial NOCS 2009 35
ALPIN Unit: Sync core, Async DVFS Wrapper
• Power Modes:– High
– Low
– Changing
– Retention
– Off
Ran Ginosar Async Noc Tutorial NOCS 2009 36
ALPIN Claim 2: Routers used lightly
Total idle � 90%. If shut off, save 90% of leakage in routers
Ran Ginosar Async Noc Tutorial NOCS 2009 37
ALPIN auto-off router architecture
Ran Ginosar Async Noc Tutorial NOCS 2009 38
ALPIN auto-off router layout
• Area:– Typical unit 0.2 mm2
– Core: 85%
– Level shifters: 13%
– Activity detection: 2%
– Power switch: 0%
MANGO
Technical University Denmark
Ran Ginosar Async Noc Tutorial NOCS 2009 40
MANGO
• Guaranteed service– In addition to ”best effort” service
– Not the same as 2 service levels
– Not the same as 2 virtual channels with different priorities
• Connection-oriented service guarantees– Allocate resources (reserve VCs) source-to-destination
• Service levels are applied hop-by-hop
– Similar to circuit switching • Vs. packet switching
• Async / GALS– Cf. Sync GS/BE in Philips/NXP Aethereal
Ran Ginosar Async Noc Tutorial NOCS 2009 41
MANGO Router
Ran Ginosar Async Noc Tutorial NOCS 2009 42
MANGO GS router
Non-blocking
previous
router
Ran Ginosar Async Noc Tutorial NOCS 2009 43
MANGO simulation
• 5x5 MANGO router: 8 VCs - 32 bits– Connection-oriented GS (fair-share)
– Connection-less BE
• Clockless circuits 130nm std cells
• Results:– 795 MHz (typical) / 515 MHz (worst case)
– Area: 0.188 mm2 (pre-layout)
Ran Ginosar Async Noc Tutorial NOCS 2009 44
Async Router Conclusions
• Eliminate clocks in the NoC– Useful for heterogeneous Multi-Clock-Domain SoCs
– Useful for multi-voltage domain SoCs
– Facilitates modularity
– Helps timing closure of large SoCs
– Facilitates DVFS
– Can prioritize or guarantee service
• Handshake may slow traffic– Need careful design
• More room for improvement
FOX:Fast async serial On-chip Interconnect
Ran Ginosar Async Noc Tutorial NOCS 2009 46
Outline
• Why Serial Link
• Architecture– Transmitter
– Receiver
– Channel
Ran Ginosar Async Noc Tutorial NOCS 2009 47
Why serial?
• Less interconnect area
• Less routing congestion
• Less coupling
• Lower power
• Relative improvement scales
• Must be FAST!
• For example: – FOX serial link
• 1 gate delay bit cycle
– Fully-shielded parallel link• 8 gate delay clock cycle
– Equal bit-rate
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
180 130 90 65 30 15
0.0
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
180 130 90 65 30 15
Parallel Link dissipates less power
Serial Link dissipates less power
Technology Node [nm]
Link Length [mm]
Parallel Link requires less area
Serial Link requires less area
Ran Ginosar Async Noc Tutorial NOCS 2009 48
Fast serial highways vs. slow parallel local roads
Ran Ginosar Async Noc Tutorial NOCS 2009 49
FOX Serial Link
• Transition signaling (instead of sampling): – 2-phase Level Encoded Dual Rail (LEDR) async protocol (aka
data-strobe, DS)
• Differential encoding (DS-DE, IEEE1355-95)
• Wave-pipelining over channel
• Acknowledge per word (instead of per bit)
• Low-latency synchronizers
Ran Ginosar Async Noc Tutorial NOCS 2009 50
Encoding: 2-phase LEDR
Strobe (S)
Data (D)
0 0 1 1 0 0 0 0 1 0
Ran Ginosar Async Noc Tutorial NOCS 2009 51
FOX Transmitter
• Data cycle: One gate delay between bits– 15 ps @ 65nm (30ps @ low-power 65nm)
Ran Ginosar Async Noc Tutorial NOCS 2009 52
Transmitter: Transition Generator
Adapted from M.J.E. Lee, "An Efficient I/O and Clock Recovery for TERABIT Integrated Circuits Design,“ PhD Thesis, Stanford Univ., 2001.
Ran Ginosar Async Noc Tutorial NOCS 2009 53
Transmitter: Fast Async Shift Register
Ran Ginosar Async Noc Tutorial NOCS 2009 54
Transmitter: Encoder + Combiner
Ran Ginosar Async Noc Tutorial NOCS 2009 55
Example: SR-Element layout
Ran Ginosar Async Noc Tutorial NOCS 2009 56
FOX Receiver
Ran Ginosar Async Noc Tutorial NOCS 2009 57
Receiver: Decoder + Splitter
Ran Ginosar Async Noc Tutorial NOCS 2009 58
Toggle Circuit
• New circuit
• Single gate delay operation
Ran Ginosar Async Noc Tutorial NOCS 2009 59
Channel
• Channel driver and receiver
– Differential voltage mode
– Differential current mode
– Single-ended (just repeaters)
• Channel layout
• Interconnect modeling
Ran Ginosar Async Noc Tutorial NOCS 2009 60
Differential Voltage Mode
• Current mode differential low-swing transmit
• Differential voltage receive
• Voltage swing � high power, low speed
P / S P / S
Ran Ginosar Async Noc Tutorial NOCS 2009 61
Differential Current Mode: Goals
• Full Current Mode:– Send current from TX
– Measure current at RX
• Minimal voltage swing over channel– Low power, high speed
• Current to voltage conversion at RX
• Long range channel without repeaters
Ran Ginosar Async Noc Tutorial NOCS 2009 62
Differential Current Mode:Transmitter and Receiver
OutputStage
FS
Ran Ginosar Async Noc Tutorial NOCS 2009 63
Channel wires
• Four wires
• Thick metal
• Single layer: non-disruptive to routing
D D S S
Ran Ginosar Async Noc Tutorial NOCS 2009 64
Channel wire model
2 1 2 2 1
2 2
1 1 2 2 1 2 2 1
2 22 21 21 1 2 2
2 2
1 2
1 ( )
( )
( 1) ( 1)DC DC
L L L L
s L R s L R L L R RZ s sL R
L Ls L R s L R
R R
ω
ω ω
∆ ∆ ∆ ∆+ ⋅ +
∆ ∆ ∆ ∆ ∆ + ∆ ∆ ∆= + + + =
∆ ∆∆ + ∆ ∆ + ∆⋅ + ⋅ ⋅ +∆ ∆
Ran Ginosar Async Noc Tutorial NOCS 2009 65
FOX Status
• Four years study and design
• Simulations show:– 65 Gbps over 7mm
– All corners and ±5σ in-die variations• Thanks to async operation
• Presently going for fab on IBM 65nm (MOSIS)
Ran Ginosar Async Noc Tutorial NOCS 2009 66
FOX Summary
• Fastest possible digital on-chip serial interconnect– Data rate of a single gate delay
• Asynchronous does it
• To be used as “hard IP core”
Ran Ginosar Async Noc Tutorial NOCS 2009 67
Summary
• NOCs are for large SOCs
• Large SOCs = multiple clock domains
→ NOCs should be asynchronous
• We reviewed two complementary areas: – Async routers
– High speed async serial interconnect
Ran Ginosar Async Noc Tutorial NOCS 2009 68
References• QNoC Async Router
– R. Dobkin, V. Vishnyakov, E. Friedman, R. Ginosar, An asynchronous router for multiple service levels networks on chip, ASYNC 2005.
– R. Dobkin, R. Ginosar and I. Cidon, QNoC Asynchronous Router with Dynamic Virtual Channel Allocation, NOCS 2007.
– R. Dobkin, R. Ginosar and A. Kolodny, QNoC Asynchronous Router, Integration—The VLSI Journal, 42(2):103-115, 2009.
• FAUST– E. Beigné, F. Clermidy, P. Vivet, A. Clouard, M. Renaudin, An Asynchronous NOC Architecture Providing Low
Latency Service and its Multi-level Design Framework, ASYNC 2005.
• ALPIN– E. Beigné, F. Clermidy, S. Miermont, P. Vivet, Dynamic Voltage and Frequency Scaling Architecture for Units
Integration within a GALS NoC, ASYNC 2008.– Y. Thonnart, E. Beigné, A. Valentian, P. Vivet, Automatic Power Regulation based on an Asynchronous
Activity Detection and its Application to ANOC Node Leakage Reduction, NOCS 2008.
• MANGO– T. Bjerregaard, J. Sparso, A scheduling discipline for latency and bandwidth guarantees in asynchronous
network-on-chip, ASYNC 2005.– T. BJERREGAARD, J. SPARSØ, A router architecture for connection-oriented service guarantees in the
MANGO clockless network-on-chip, DATE 2005.
• FOX– R. Dobkin, R. Ginosar and A. Kolodny, Fast Asynchronous Shift Register for Bit-Serial Communication,
ASYNC 2006. – R. Dobkin, Y. Perelman, T. Liran, R. Ginosar, and A. Kolodny, High rate wave-pipelined asynchronous on-
chip bit-serial data link, ASYN 2007. – R. Dobkin, A. Morgenshtein, A. Kolodny, R. Ginosar, Parallel vs. Serial On-Chip Communication, SLIP 2008. – R. Dobkin, M. Moyal, A. Kolodny and R. Ginosar, Asynchronous Current Mode Serial Communication, IEEE
Trans. On VLSI, 2009.
• Others– T. Felicijan, S.B. Furber, An asynchronous on-chip network router with Quality-of-Service (QoS) support,
Int. SOC Conf. (2004) 274–277.
Ran Ginosar Async Noc Tutorial NOCS 2009 69
More Literature– S. Moore, G. Taylor, R. Mullins, P. Robinson, Point to point GALS interconnect, ASYNC (2002) 69–75.
– S. Oetiker, F.K. Gu¨ rkaynak, T. Villiger, H. Kaeslin, N. Felber, W. Fichtner, Design flow for a 3-million transistor GALS test chip, ACiD Workshop (2003).
– T. Villiger, H. Kaeslin, F.K. Gurkaynak, S. Oetiker, Wolfgang Fichtner, Self-timed ring for globally-asynchronous locally-synchronous systems, ASYNC (2003) 141–150.
– J. Muttersbach, T. Villiger, W. Fichtner, Practical design of globally asynchronous locally-synchronous systems, ASYNC (2000) 52–61.
– K.Y. Yun, R.P. Donohue, Pausible clocking-based heterogeneous systems, TVLSI 7 (4) (1999) 482–488.
– A.E. Sjogren, C.J. Myers, Interfacing synchronous and asynchronous modules within a high-speed pipeline, TVLSI 8 (5) (2000) 573–583.
– R. Dobkin, R. Ginosar, C.P. Sotiriou, High rate data synchronization in GALS SoCs, TVLSI 14 (10) (2006) 1063–1074.
– Y. Semiat, R. Ginosar, Timing measurements of synchronization circuits, ASYNC (2003) 68–77.
– R. Kol, R. Ginosar, Adaptive synchronization, ICCD (1998) 188–189.
– D.J. Kinniment, Synchronization and Arbitration in Digital Systems, Wiley, New York, 2008.
– R. Ginosar, Fourteen ways to fool your synchronizer, ASYNC (2003) 89–96.
– D.J. Kinniment, A. Yakovlev, Low latency synchronization through speculation, PATMOS (2004) 278–288.
– A. Chakraborty, M.R. Greenstreet, Efficient self-timed interfaces for crossing clock domains, ASYNC (2003) 78–88.
– A. Chakraborty, M.R. Greenstreet, A minimal source–synchronous interface, ASIC/SOC (2002) 443–447.
– S. Chakraborty, J. Mekie, D.K. Sharma, Reasoning about synchronization techniques in GALS systems: a unified approach, FMGALS (2003).
– J. Mekie, S. Chakraborty, D.K. Sharma, Evaluation of pausible clocking for interfacing high speed IP cores in GALS framework,VLSI Des. (2004) 559–564.
– R. Mullins, S. Moore, Demystifying data-driven and pausible clocking schemes, ASYNC (2007) 175–185.
– L. Carloni, A. Sangiovanni-Vincentelli, Coping with latency in SoC design, IEEE Micro (special issue on SoC) 22 (5) (2002) 24–35.
– R. Dobkin and R. Ginosar, Two Phase Synchronization with Sub-cycle Latency, Integration—The VLSI Journal, 2008.
– K. Goossens, J. Dielissen, and A. Radulescu, AEthereal network on chip: Concepts, Architectures, and Implementations, IEEE Design and Test of Computers, Vol 22(5):414--421, Sept-Oct 2005.
– T. Bjerregaard, S. Mahadevan, A Survey of Research and Practices of Network-on-Chip, ACM Computing Surveys, Vol. 38, March 2006.
Recommended