Design and Analysis of Networks-on-Chip in Heterogeneous Multicore Systems Young Jin Yoon <[email protected]>

Design and Analysis of Networks-on-Chip in Heterogeneous Multicore …youngjin/download/Candidacy_note.pdf · Fixed-point DSP Function-Specific HW cores DCD with New Format DCD with



Design and Analysis of Networks-on-Chip

in Heterogeneous Multicore Systems

Young Jin Yoon

<[email protected]>


Contents

• Motivation and Applications

• System Drivers

• On-Chip Communication and Networks-on-Chip

• Modeling and Tools


Motivation:

Moore’s Law and Performance of CPU

• Moore’s law

– Figure: ITRS 2009

1. The transistor count doubles every 18 months!

2. Does performance double as well?

1. Limited by diminishing returns from ILP

2. Power problems with out-of-order (OoO) execution

3. From ILP to TLP: multicore architecture

• Increasing the number of cores!

[Figure: processor performance growth of 25%/year, then 52%/year, then ??%/year, across the eras of bit-level parallelism, instruction-level parallelism, and TLP/multicore (sources: ITRS 2009; Computer Architecture: A Quantitative Approach)]
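Converting the growth rates on this slide to a common unit makes them easier to compare. The Python sketch below does the compound-growth arithmetic and assumes clean exponential growth. The function names are illustrative. A doubling every 18 months works out to about 59% per year, so the 52%/year era roughly kept pace with transistor growth and the 25%/year era did not.

```python
import math

def annual_growth(doubling_months: float) -> float:
    """Annual growth rate implied by doubling every `doubling_months` months."""
    return 2 ** (12 / doubling_months) - 1

def doubling_time_years(annual_rate: float) -> float:
    """Years needed to double at a compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)

print(f"18-month doubling -> {annual_growth(18):.1%} per year")
for rate in (0.25, 0.52):
    print(f"{rate:.0%}/year -> doubles every {doubling_time_years(rate):.2f} years")
```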


Motivation:

System-on-Chip with Mobile Phones

• Performance vs. flexibility: 3.5G Mobile Phones

• 100 giga-operations per second (GOPS) within 1 W

– 1 core running at 100 GHz?

– 1000 cores running at 100 MHz?

1.[2]. Multi-Core for Mobile Phones
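The two options reach the same 100 GOPS target only if each core completes one operation per cycle. That is our assumption, not the slide's. Under it, the Python sketch below computes the required clock rate per core and the energy budget per operation implied by 1 W. The function names are illustrative.

```python
def required_rate_hz(target_ops_per_s: float, cores: int, ops_per_cycle: float = 1.0) -> float:
    """Clock frequency each core needs so that all cores together hit the target."""
    return target_ops_per_s / (cores * ops_per_cycle)

def energy_per_op_joules(power_w: float, ops_per_s: float) -> float:
    """Energy available per operation under a fixed power envelope."""
    return power_w / ops_per_s

TARGET_OPS = 100e9   # 100 GOPS
BUDGET_W = 1.0       # 1 W

for cores in (1, 1000):
    f = required_rate_hz(TARGET_OPS, cores)
    print(f"{cores:>4} core(s): {f / 1e9:g} GHz each")
print(f"energy budget: {energy_per_op_joules(BUDGET_W, TARGET_OPS) * 1e12:g} pJ/op")
```

The 10 pJ/op budget is the same in both cases. The difference is that a 100 GHz core is not feasible, while many slow cores can each stay within that budget.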


Motivation:

System-on-Chip with Consumer Devices

1.[3]. Heterogeneous Multi-Core Platform for Consumer Multimedia Applications

[Figure: consumer multimedia functions (analog and digital audio decoders, audio post-processing, analog video decoder, digital RAW video decoder, digital compressed video decoder, picture quality enhancement, content browsing and control, DCD with new format, DCD with established format) and the processing resources available to run them (host CPU, VLIW processor cores, embedded control CPU, fixed-point DSP, function-specific HW cores)]


Motivation:

System-on-Chip with Consumer Devices

• Legacy

• Re-usability

• Performance

• Flexibility

• Support of industry standards

1.[3]. Heterogeneous Multi-Core Platform for Consumer Multimedia Applications

[Figure: the consumer multimedia functions, each mapped onto one of the platform's processing resources (host CPU, VLIW cores, embedded control CPU, DSPs, function-specific HW cores)]


Motivation:

Networks-on-Chip (NoC)

• How do we connect all cores?

– Bus vs. Point-to-Point vs. Crossbar and Mesh

• Differences between NoCs and other networks

– Less non-determinism

– Local, high-performance networks

– Energy constraints

– Design-time Specialization

[Figure: cores connected by a bus, point-to-point links, a crossbar, and a mesh]

1.[6]. Networks on Chips: A New SoC Paradigm


Motivation:

Design and Analysis of NoC

[Figure: three research areas (application modeling and optimization; NoC architecture analysis and optimization; NoC design validation and synthesis) spanning the NoC protocol stack: wiring (physical), data link, network, and transport (architecture & control), and system and application (software)]

1.[6]. Networks on Chips: A New SoC Paradigm

1.[7]. Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspective

[Figure: NoC design flow with the steps: application; design goals & constraints; code partitioning; communication infrastructure; communication paradigm; application communication analysis; analysis & optimization; mapping & scheduling; simulation; prototyping; NoC testing; NoC verification; component instantiation from a communication component library; physical synthesis & tapeout]


Applications:

PARSEC vs. SPLASH-2

• PARSEC benchmarks

– Multithreaded

– Emerging workloads

– Diverse

– State-of-the-art techniques

– Support research

• Similarity research

– Principal Component Analysis (PCA) over 3 groups of characteristics

• Inst. Mix: 4 characteristics

• Working Sets: 8 characteristics

• Sharing: 32 characteristics

1.[4]. PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip-Multiprocessors

A Communication Characterization of SPLASH-2 and PARSEC


Applications:

Mobile Architecture

• Benchmarks for Embedded computing

– EEMBC, MiBench…

• Mobile Architecture

– Tight power constraints

• Dynamic power management

– User activity determines power consumption

1.[5]. Into the Wild: Studying Real User Activity Patterns to Guide Power Optimizations for Mobile Architectures


Contents

• Motivation and Applications

• System Drivers

• On-Chip Communication and Networks-on-Chip

• Modeling and Tools


Operating System

• How to Manage Heterogeneous Multicores?

– Cores & Systems are diverse.

– The interconnect matters.

– Messages cost less than shared memory.

2.[1]. The Multikernel: A New OS Architecture for Scalable Multicore Systems

Core Parallelism, Power, and Temperature

• Performance and Power

– Same total parallelism (4P-8W vs. 8P-4W)

– Same power but better throughput on 8P than 4P

– Energy-Delay Product (EDP) and Energy-Delay² Product (ED²P)

• Temperature Spatial Distribution

– Paired vs. Lined up vs. Centered

[Figure annotation: 1~2 °C lower than the others, due to the large L2 cache]

2.[2]. Design Space Exploration for Multicore Architectures: A Power/Performance/Thermal View
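EDP multiplies energy by delay. ED²P squares the delay, so it rewards faster designs more strongly. The Python sketch below uses two hypothetical configurations whose numbers are made up for illustration and are not taken from the cited paper. It shows that the two metrics can pick different winners.

```python
from dataclasses import dataclass

@dataclass
class Config:
    name: str
    energy_j: float   # energy to finish the workload
    delay_s: float    # execution time of the workload

    @property
    def edp(self) -> float:
        return self.energy_j * self.delay_s

    @property
    def ed2p(self) -> float:
        return self.energy_j * self.delay_s ** 2

# Hypothetical numbers, not measurements.
configs = [Config("A", energy_j=10.0, delay_s=2.0),
           Config("B", energy_j=15.0, delay_s=1.5)]
for c in configs:
    print(f"{c.name}: EDP={c.edp:.2f} J*s, ED2P={c.ed2p:.2f} J*s^2")
print("best by EDP: ", min(configs, key=lambda c: c.edp).name)   # A
print("best by ED2P:", min(configs, key=lambda c: c.ed2p).name)  # B
```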


Memory Hierarchy:

On-Chip Memory

• Cache vs. Scratch-pad

– Both scale equally well up to 16 cores.

– Streaming applications

• Scratch-pad memory outperforms a transparent cache

– Caches will suffer in large-scale CMPs.

• Scratch-pad may be able to address the problem.

3.[3]. Memory Systems: Cache, DRAM, Disk

3.[6]. Comparing Memory Systems for Chip Multiprocessors

Addressing \ Mgmt. | Implicit | Explicit

Transparent | Transparent cache | Software-managed cache

Non-Transparent | Self-managed scratch-pad | Scratch-pad memory


Memory Hierarchy:

Cache Coherence Protocol

3.[3]. Memory Systems: Cache, DRAM, Disk

Token Coherence: Decoupling Performance and Correctness

Property | Snoop-based | Directory-based | Token-based

Ordering point | NoC | Directory | Caches w/ retransmission

Indirect? | N | Y | N

Broadcast? | Y | N | Y

Performance? | Fast | Slow | Moderate

Unordered NoC? | N | Y | Y

[Figure: caches 0 … n connected through the NoC in each scheme; the directory-based scheme adds a directory]
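The token-based column rests on token counting. Each block has a fixed number T of tokens. A cache may read a block while it holds at least one token and may write only while it holds all T. Because tokens are conserved, this keeps a single writer or multiple readers regardless of message order, which is why the scheme tolerates an unordered NoC. The Python sketch below covers only those counting rules and leaves out data, the owner token, and persistent requests. `TokenBlock` is a hypothetical name.

```python
class TokenBlock:
    """Token counting for one memory block (simplified sketch)."""

    MEMORY = -1  # holder id used for main memory

    def __init__(self, total_tokens: int, num_caches: int):
        self.total = total_tokens
        self.holders = {self.MEMORY: total_tokens}  # memory starts with every token
        for c in range(num_caches):
            self.holders[c] = 0

    def can_read(self, cache: int) -> bool:
        return self.holders[cache] >= 1            # at least one token

    def can_write(self, cache: int) -> bool:
        return self.holders[cache] == self.total   # all tokens

    def transfer(self, src: int, dst: int, n: int) -> None:
        if n > self.holders[src]:
            raise ValueError("cannot send tokens that are not held")
        self.holders[src] -= n
        self.holders[dst] += n
        assert sum(self.holders.values()) == self.total  # tokens are conserved

blk = TokenBlock(total_tokens=4, num_caches=3)
blk.transfer(TokenBlock.MEMORY, 0, 1)
blk.transfer(TokenBlock.MEMORY, 1, 1)
print(blk.can_read(0), blk.can_read(1), blk.can_write(0))  # True True False
```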


Intelligent NoCs for Cache Coherence:

INSO and INCF

• Snoop-based coherence on unordered NoCs faces two problems:

1. Incorrect: without an ordering point, snoops may be seen in different orders. In-Network Snoop Ordering (INSO) routes messages as if the network were ordered.

2. Broadcast messages. In-Network Coherence Filtering (INCF) filters unnecessary broadcasts.

2.[4]. In-Network Coherence Filtering: Snoopy Coherence without Broadcasts

[Figure: INSO assigns each source a disjoint set of snoop orders (e.g., {0,2,4} and {1,3,5}); INCF routers keep filter tables of (Addr, Dest) entries for address A]


Memory Controller

• On-Chip Memory Controller

– Where to place them?

• Performance

– Row ≈ Column < Diagonal X ≈ Diamond

– The gap can be narrowed by choosing a better routing algorithm

• Class-based Deterministic Routing (CDR)

2.[5]. Achieving Predictable Performance through Better Memory Controller Placement in Many-Core CMPs

[Figure: memory controller placements: row, column, diagonal X, and diamond]


Off-Chip Network & Memory

• Bandwidth wall

– Due to pin limitations, power constraints, and package costs

– Memory scales only 10% per year

• Bandwidth Conservation Techniques

2.[7]. Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scaling
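The bandwidth wall comes from compounding two rates. The Python sketch below assumes that core count doubles every 18 months, tracking the transistor growth from the Moore's-law slide. It also takes the slide's 10% per year as the growth rate of off-chip bandwidth. Both are simplifying assumptions, and the function name is illustrative.

```python
def per_core_bandwidth_ratio(years: float,
                             core_doubling_years: float = 1.5,
                             bw_growth: float = 0.10) -> float:
    """Off-chip bandwidth per core after `years`, relative to today."""
    core_growth = 2 ** (years / core_doubling_years)
    bandwidth_growth = (1 + bw_growth) ** years
    return bandwidth_growth / core_growth

for y in (3, 6, 9):
    print(f"after {y} years: {per_core_bandwidth_ratio(y):.2f}x bandwidth per core")
```

Under these assumptions, each core's share of off-chip bandwidth falls to about a third within three years. This is the gap that bandwidth conservation techniques try to close.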


Network-on-Chip:

Terminology

• Topology

– Indirect vs. Direct

• Routing

– Deterministic vs. Adaptive

• Flow Control

– Arbitration

– Circuit-Switched

– Packet-Switched

• Wormhole and Virtual-Channel

– Hop-to-hop Flow-Control

3.[1]. Principles and Practices of Interconnection Networks

[Figure: example 8-node topologies]
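Dimension-order (XY) routing on a 2D mesh is a standard example of deterministic routing. It always resolves the X offset before the Y offset, so every packet between a given source and destination takes the same path. Because it never turns from Y back to X, it is deadlock-free on a mesh. An adaptive router would instead choose among the productive ports based on congestion. The Python sketch below assumes (x, y) coordinates, with E meaning +x and N meaning +y. The function name is illustrative.

```python
from typing import List, Tuple

Coord = Tuple[int, int]

def xy_route(src: Coord, dst: Coord) -> List[str]:
    """Dimension-order (XY) routing on a 2D mesh: finish X, then Y.

    Returns the output port taken at each router along the path.
    """
    x, y = src
    dx, dy = dst
    hops: List[str] = []
    while x != dx:                      # first correct the X offset
        step = 1 if dx > x else -1
        hops.append("E" if step == 1 else "W")
        x += step
    while y != dy:                      # then correct the Y offset
        step = 1 if dy > y else -1
        hops.append("N" if step == 1 else "S")
        y += step
    return hops

print(xy_route((0, 0), (2, 1)))  # ['E', 'E', 'N']
```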


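Network-on-Chip:

Router Microarchitecture

[Figure: router with routing logic/table, VC allocators, switch allocators, and crossbar; pipeline stages BW (buffer write), RC (route computation), VA (virtual-channel allocation), SA (switch allocation), ST (switch traversal), and LT (link traversal)]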

• Topology

• Routing

• Flow Control

– Arbitration

– Wormhole

– Hop-to-Hop

– Virtual Channel

• Router Pipelines

3.[1]. Principles and Practices of Interconnection Networks


Router Microarchitecture:

Reducing Pipelines

• The baseline router pipeline spends 4 clock cycles (c.c.) per hop

• Speculative Routing

– VC allocation and switch allocation are performed in parallel

• Lookahead Routing

– Each router computes the route for the next hop (NRC), taking route computation off the critical path

• Speculation + Lookahead Routing

[Figure: baseline, speculative, lookahead, and speculation + lookahead router pipelines built from the stages BW, RC/NRC, VA, SA, ST, and LT, shown separately for head flits and for body & tail flits]

3.[1]. Principles and Practices of Interconnection Networks
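Per-hop router delay multiplies with hop count, which is why removing pipeline stages pays off. The Python sketch below computes zero-load latency in cycles. The stage counts are assumptions that follow the usual textbook progression: 4 for the baseline, 3 with either speculation or lookahead, and 2 with both. It also assumes a 1-cycle link and one flit per cycle.

```python
def head_latency(hops: int, router_cycles: int, link_cycles: int = 1) -> int:
    """Zero-load cycles for a head flit to traverse `hops` routers and links."""
    return hops * (router_cycles + link_cycles)

def packet_latency(hops: int, router_cycles: int, flits: int, link_cycles: int = 1) -> int:
    """Head latency plus serialization: the tail arrives flits - 1 cycles after the head."""
    return head_latency(hops, router_cycles, link_cycles) + (flits - 1)

# Assumed per-router pipeline depths (cycles spent in the router per hop).
PIPELINES = {
    "baseline": 4,
    "speculative": 3,
    "lookahead": 3,
    "speculation + lookahead": 2,
}

for name, depth in PIPELINES.items():
    print(f"{name:<24} {packet_latency(hops=6, router_cycles=depth, flits=5)} cycles")
```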


Performance and Cost Metrics

• Performance Metrics

Case | Delivery speed | Channel usage

Ideal | Zero-load latency | Bisection bandwidth

Average | Average latency | Average throughput

Worst | Maximum latency | Peak throughput

• Cost Metrics

– Average or peak energy/power consumption

– Network area overhead and total area

– Average or peak temperature

3.[1]. Principles and Practices of Interconnection Networks

1.[7]. Outstanding Research Problems in NoC Design


Topology:

Flattened Butterfly

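• Flattened Butterfly vs. Mesh

3.[2]. Flattened Butterfly Topology for On-Chip Network

[Figure: a 3-ary 3-fly network (3-stage butterfly) and the flattened butterfly derived from it; FBfly layout vs. mesh layout for nodes 0-8]

• Zero-load latency: $T_0 = T_h + T_s + T_w$, where $T_h$ is the header latency (hop count × per-router delay), $T_s$ the serialization latency (packet length / channel bandwidth), and $T_w$ the wire delay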
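The flattened butterfly gains in the hop term $T_h$: in 2D it needs at most one hop per dimension. The cost is longer wires (a larger $T_w$) and higher-radix routers. The Python sketch below computes the average minimal hop count over all source/destination pairs by enumeration, assuming uniform traffic. It also writes out the zero-load latency as a function. The function names are illustrative, and any parameter values you pass in are your own assumptions.

```python
from itertools import product

def mesh_hops(a, b):
    """Minimal hops on a 2D mesh: the Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def fbfly_hops(a, b):
    """2D flattened butterfly: one hop per dimension in which the coordinates differ."""
    return int(a[0] != b[0]) + int(a[1] != b[1])

def average_hops(k, hop_fn):
    """Average hops over all ordered (src, dst) pairs with src != dst in a k x k network."""
    nodes = list(product(range(k), repeat=2))
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    return sum(hop_fn(s, d) for s, d in pairs) / len(pairs)

def zero_load_latency(hops, t_router, length_bits, bandwidth_bits_per_cycle, t_wire):
    """T0 = Th + Ts + Tw with Th = hops * t_router and Ts = L / b (all in cycles)."""
    return hops * t_router + length_bits / bandwidth_bits_per_cycle + t_wire

for k in (3, 4, 8):
    print(f"k={k}: mesh {average_hops(k, mesh_hops):.2f} hops, "
          f"FBfly {average_hops(k, fbfly_hops):.2f} hops")
```

Fewer hops only pays off if the extra wire delay of the flattened butterfly's long channels stays below the router delay it saves. This is why both $T_h$ and $T_w$ appear in the formula.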


Microarchitecture:

Enhance Arbitration

• SPAROFLO

– Speculative Priority Assignment (SPA)

– Recreate Old (RO)

– Flow (FLO)

3.[3]. A 4.6Tbits/s 3.6GHz Single-Cycle NoC Router with a Novel Switch Allocator in 65nm CMOS

[Figure: SPAROFLO switch allocator: V:1 local arbiters per input port, an SPA priority encoder with conflict detection, P:1 global arbiters, and a sequential retry queue for requests that lost a conflict in the previous clock cycle; request vectors at clock n and clock (n+1), port priority, and final grant]
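The SPAROFLO figure builds on the conventional separable switch allocator. V:1 local arbiters pick one virtual channel per input port, then P:1 global arbiters pick one input per output port. The Python sketch below implements that baseline with round-robin arbiters. It does not implement SPAROFLO's speculative priority assignment or its retry queue. The class and function names are illustrative.

```python
from typing import Dict, List, Optional, Tuple

class RoundRobinArbiter:
    """N:1 round-robin arbiter: the most recent winner drops to lowest priority."""

    def __init__(self, n: int):
        self.n = n
        self.last = n - 1  # requester 0 starts with the highest priority

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        for offset in range(1, self.n + 1):
            i = (self.last + offset) % self.n
            if requests[i]:
                self.last = i
                return i
        return None

def separable_allocate(vc_requests: List[List[Optional[int]]],
                       local_arbs: List[RoundRobinArbiter],
                       global_arbs: List[RoundRobinArbiter]) -> Dict[int, Tuple[int, int]]:
    """Input-first separable switch allocation.

    vc_requests[p][v] is the output port wanted by VC v at input p (None = idle).
    Returns {output port: (input port, VC)} for the granted requests.
    """
    # Stage 1: V:1 local arbitration picks one requesting VC per input port.
    # (Simplification: a local winner's priority advances even if it later loses.)
    chosen: Dict[int, Tuple[int, int]] = {}
    for p, vcs in enumerate(vc_requests):
        v = local_arbs[p].arbitrate([out is not None for out in vcs])
        if v is not None:
            chosen[p] = (v, vcs[v])
    # Stage 2: P:1 global arbitration picks one input per output port.
    grants: Dict[int, Tuple[int, int]] = {}
    for out, arb in enumerate(global_arbs):
        reqs = [p in chosen and chosen[p][1] == out for p in range(len(vc_requests))]
        p = arb.arbitrate(reqs)
        if p is not None:
            grants[out] = (p, chosen[p][0])
    return grants

# Two input ports with two VCs each, two output ports.
local = [RoundRobinArbiter(2) for _ in range(2)]
glob = [RoundRobinArbiter(2) for _ in range(2)]
requests = [[0, 1], [0, None]]
print(separable_allocate(requests, local, glob))  # {0: (0, 0)}
print(separable_allocate(requests, local, glob))  # {0: (1, 0), 1: (0, 1)}
```

In the first cycle, input 0's second VC could have used the idle output 1, but its local arbiter had already chosen VC 0. This lost matching is a known weakness of separable allocators.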

Bufferless Network

• Buffers in NoC

– Energy, area, complexity

• Can we design a network without buffers?

– Deflection routing vs. packet/flit dropping

• BLESS

– Deflection-based bufferless network

– FLIT-BLESS vs. WORM-BLESS

• Problems

– Injection problem

– Livelock

– Throughput and Latency

3.[5]. A Case for Bufferless Routing in On-Chip Networks
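In deflection-based bufferless routing, every flit that arrives at a router must leave on some output port in the same cycle. The Python sketch below assigns ports at one router in the spirit of FLIT-BLESS, assuming oldest-first priority. Older flits choose first, so the oldest flit always makes progress, which is what rules out livelock. Injection, ejection, and the paper's exact port-ranking rules are omitted. The names are illustrative.

```python
from typing import Dict, List, Tuple

# A flit is (age, flit_id, productive_ports); a larger age means injected earlier.
Flit = Tuple[int, str, List[str]]

def route_oldest_first(flits: List[Flit], ports: List[str]) -> Dict[str, str]:
    """Give every arriving flit an output port in the same cycle (no buffers).

    Flits are served oldest first; a flit whose productive ports are already
    taken is deflected to the first remaining free port.
    """
    if len(flits) > len(ports):
        raise ValueError("more arriving flits than output ports")
    free = list(ports)
    assignment: Dict[str, str] = {}
    for _age, fid, productive in sorted(flits, key=lambda f: -f[0]):
        wanted = [p for p in productive if p in free]
        port = wanted[0] if wanted else free[0]   # deflect when nothing productive is left
        free.remove(port)
        assignment[fid] = port
    return assignment

arriving = [(5, "a", ["E"]), (9, "b", ["E", "N"]), (7, "c", ["E"])]
print(route_oldest_first(arriving, ["N", "E", "S", "W"]))
# {'b': 'E', 'c': 'N', 'a': 'S'}
```

Flits "c" and "a" are deflected away from their destination. Each deflection adds hops, and this is where the throughput and latency problems listed above come from under heavy load.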


Quality of Service (QoS)

• Quality of Service

– Local Fairness ≠ Global Fairness

– Some packets are more important than others.

• Round-Robin vs. Age-based vs. deadline-based

3.[6]. Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks
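A chain of routers shows why local fairness is not global fairness. At each merge point, a fair arbiter splits bandwidth evenly between all upstream traffic and one local source, which is the classic parking-lot scenario. The Python sketch below assumes saturated sources and a 2:1 round-robin arbiter at every merge. The function name is illustrative.

```python
from fractions import Fraction
from typing import List

def chain_shares(num_sources: int) -> List[Fraction]:
    """Throughput share of each source along a chain of 2:1 round-robin merges.

    Source 0 is farthest from the destination; source i joins at merge point i.
    Every source is assumed to be saturated.
    """
    shares = [Fraction(1)]  # source 0 alone owns the first link
    for _ in range(1, num_sources):
        # The merge splits the next link evenly: all upstream traffic together
        # keeps 1/2, and the newly merging local source takes the other 1/2.
        shares = [s / 2 for s in shares] + [Fraction(1, 2)]
    return shares

print([str(s) for s in chain_shares(4)])  # ['1/8', '1/8', '1/4', '1/2']
```

Every arbiter is locally fair, yet the farthest sources receive exponentially less bandwidth than the nearest one.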


QoS: Globally-Synchronized Frames (GSF)

• Deadline-based arbitration is impractical

– Requires infinite-sized sorting queues

– Large overhead for sending and storing the deadline

• Source-managed QoS (e.g., GSF)

– Frame-based approach

• Sorting across frames, not within a frame

[Figure: three arbitration schemes: deadline-based with an infinite searchable buffer and an earliest-deadline selector; frame-based with per-frame buffers and an infinite frame window; frame-based with circular frame buffers and a finite frame window]

3.[6]. Globally-Synchronized Frames for Guaranteed Quality-of-Service in On-Chip Networks


QoS: Other approaches

• Router-Based QoS

– Preemptive Virtual Clock (PVC)

: Router-based dynamic bandwidth allocation

• Application-Aware QoS

– What performance do we really care about?

• Network vs. application

– Stall-Time Criticality (STC)

Preemptive Virtual Clock: A Flexible, Efficient and Cost-effective QoS scheme for Networks-on-Chip

Application-Aware Prioritization Mechanisms for On-Chip Networks

[Figure: execution timelines of cores A, B, and C alternating compute and stall phases under two different packet-prioritization orders]


Polymorphic On-Chip Networks

• There is no network to fit all workloads.

3.[4]. Polymorphic On-Chip Networks

[Figure: average throughput (bits/cycle) vs. average packet latency (cycles) under random permutation traffic for meshes, butterflies, fat trees, flattened butterflies, and rings, with the Pareto-optimal frontier marked]

Polymorphic On-Chip Networks

• Provide a fabric of configurable network resources

– Users can statically configure NoC before running applications

[Figure: the network resources configured as, e.g., a unidirectional ring]

3.[4]. Polymorphic On-Chip Networks


Hop-to-hop: wires and interconnects

• Network-on-Chip: Floorplans

• Interconnect Requirement from ITRS

Cu Interconnect (ITRS) | 2011 | 2012 | 2013 | 2014 | 2015 | 2016

Gate length (nm) | 16 | 14 | 13 | 11 | 10 | 9

Intermediate: RC delay (ps) | 1291 | 1455 | 1842 | 2406 | 2670 | 3341

Intermediate: line length (um) | 16 | 15 | 12 | 9 | 8 | 7

Global: RC delay (ps) | 487 | 557 | 705 | 921 | 1004 | 1297

Global: line length (um) | 26 | 23 | 19 | 15 | 13 | 11

FITs/m/cm^2 | 2 | 1.6 | 1.6 | 1.4 | 1.3 | 1.1

4.[3]. COSI: A Framework for the Design of Interconnection Networks

1.[1]. International Technology Roadmap for Semiconductors (ITRS): 2009 edition


Latency Insensitive Design

• Time-to-market constraints

• Intellectual-Property design modules (IP Cores)

• Interconnect latency

– Hard to estimate in early design stage

– Conservative estimation: suboptimal design

• Latency Insensitive Design (LID)

4.[2]. Coping with Latency in SOC Design

[Figure: five pearls (IP cores), each wrapped in a shell, connected through relay stations (RS); data (possibly void) flows forward and backpressure flows backward]


Hop-to-Hop Flow Control

• Channels between two routers

– The longer the wire, the slower the messages are delivered.

• Put some intelligence on the channel!

– Link pipelining with distributed buffers

3.[1]. Principles and Practices of Interconnection Networks

3.[7]. Distributed Flit-Buffer Flow Control for Networks-on-Chip

ON/OFF | Credit | Ack/Nack

- | - | -

2+5K | 2+3K | 1+3K

2+2K | 2+2K | 1+2K

[Figure: link implementations between control-logic blocks, with data and backpressure (BP) wires: flip-flops, relay stations, and inverters or latches]
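Of the three hop-to-hop schemes in the table, credit-based flow control is the simplest to sketch. The upstream router tracks free slots in the downstream buffer and stalls when it runs out. The Python model below assumes instantaneous credit return. On a real link, credits take a round trip, which is why the required buffer depth grows with link pipelining. `CreditLink` is a hypothetical name.

```python
from collections import deque

class CreditLink:
    """Credit-based flow control on one link (upstream sender, downstream buffer).

    The sender holds one credit per free downstream slot, spends a credit per
    flit, and stalls at zero; the receiver returns a credit whenever it frees a
    slot. Credit return is modeled as instantaneous.
    """

    def __init__(self, buffer_slots: int):
        self.credits = buffer_slots
        self.buffer = deque()

    def try_send(self, flit) -> bool:
        if self.credits == 0:
            return False          # back-pressure: hold the flit upstream
        self.credits -= 1
        self.buffer.append(flit)
        return True

    def drain_one(self):
        """Downstream forwards its oldest flit and sends a credit back."""
        flit = self.buffer.popleft()
        self.credits += 1
        return flit

link = CreditLink(buffer_slots=2)
print([link.try_send(f"flit{i}") for i in range(3)])  # [True, True, False]
link.drain_one()
print(link.try_send("flit2"))  # True
```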


Globally Asynchronous,

Locally Synchronous (GALS) Circuit

• Problems of global clock distribution

– Design complexity, noise, and power

• Local clocks with asynchronous communication

Property | Pausible Clocking | FIFO-based | Boundary Synchronization

Area overhead | Low | Med to High | Low

Latency | Low | High | Med

Throughput | Depends on clock pause rate | High | Med

Power consumption | Low | High | Med

3.[9]. Globally Asynchronous, Locally Synchronous Circuits: Overview and Outlook

[Figure: three GALS interfaces between local synchronous islands 1 and 2: pausible clock generators at the output and input ports; an asynchronous FIFO; and boundary synchronization with registers (REG)]


Robust Interfaces for Mixed-Timing Systems

• Partition FIFOs into reusable components

– Reusable Put and Get Cell sub-modules required

3.[10]. Robust Interfaces for Mixed-Timing Systems

[Figure: mixed-timing FIFO built from a ring of cells with put and get controllers and full and empty detectors; put interface req_put, data_put, CLK_put, full, ack_put; get interface req_get, data_get, CLK_get, valid_get, empty, ack_get. Variants: Sync-Sync, Async-Sync, Async-Async, and Sync-Async FIFOs]


Robust Interfaces for Mixed-Timing Systems

Sync-Sync FIFO

• Partition FIFOs into reusable components

– Reusable Put and Get Cell sub-modules required

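[Figure: Sync-Sync FIFO cell with a data register (REG) and an SR latch; put-side signals req_put, data_put, en_put, CLK_put, full_i, tok_in_put, tok_out_put; get-side signals en_get, CLK_get, valid_get, data_get, empty_i, tok_in_get, tok_out_get]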

3.[10]. Robust Interfaces for Mixed-Timing Systems


Robust Interfaces for Mixed-Timing Systems

Async-Async FIFO

• Partition FIFOs into reusable components

– Reusable Put and Get Cell sub-modules required

– Only Data Validity Controller sub-module needs to be modified

• Implement relay stations with mixed-timing FIFOs

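[Figure: Async-Async FIFO cell with a data register (REG) and C-elements (C+); handshake signals req_put/ack_put and req_get/ack_get, data signals data_put/data_get, and token signals tok_in_put/tok_out_put and tok_in_get/tok_out_get]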

3.[10]. Robust Interfaces for Mixed-Timing Systems


Dynamic Voltage-Frequency Scaling (DVFS)

• Power Consumption of NoC

– Up to 28% of total power is spent in the NoC

– Router frequency: critical design parameter

• Network power vs. network latency

• Dynamic power management for routers

– Clock Scaling and Time Stealing

A Case for Dynamic Frequency Tuning in On-Chip Networks
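DVFS saves power because dynamic power grows with the square of the supply voltage and linearly with frequency: P = αCV²f. The Python sketch below uses hypothetical router operating points, not values from the cited paper, and ignores leakage. The function name is illustrative.

```python
def dynamic_power_w(c_switched_f: float, vdd_v: float, freq_hz: float,
                    activity: float = 1.0) -> float:
    """Dynamic (switching) power P = alpha * C * V^2 * f; leakage is ignored."""
    return activity * c_switched_f * vdd_v ** 2 * freq_hz

# Hypothetical router operating points.
nominal = dynamic_power_w(c_switched_f=1e-10, vdd_v=1.0, freq_hz=2e9)
freq_only = dynamic_power_w(c_switched_f=1e-10, vdd_v=1.0, freq_hz=1e9)
dvfs = dynamic_power_w(c_switched_f=1e-10, vdd_v=0.8, freq_hz=1e9)
print(f"nominal {nominal:.3f} W, frequency only {freq_only:.3f} W, DVFS {dvfs:.3f} W")
```

Halving the frequency alone halves power. Also lowering the voltage cuts it to 32% of nominal in this example. The price is slower routers and higher network latency, which is the power vs. latency trade-off on the slide.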


Asynchronous NoC

[Figure: Mesh-of-Trees network connecting sources 0-3 to destinations 0-3]

• Mesh-of-Trees (MoT) variants

3.[11]. A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors


Asynchronous NoC

• Mesh-of-Trees (MoT) variants

– No switch (i.e., crossbar) is required

– Can be implemented with

• Simple routers (for fan-out)

• Simple arbiters (for fan-in)

[Figure: MoT network as a row forest of fan-out routing nodes and a column forest of fan-in arbitration nodes; circuit details of a routing node (latch controllers, toggle elements, data latches) and an arbitration node (mutex, flow-control unit, latch controller, and multiplexed datapath)]

3.[11]. A Low-Overhead Asynchronous Interconnection Network for GALS Chip Multiprocessors


Reliable Hop-to-Hop transmission

• On-chip interconnect errors

• Using a high voltage

– Reduces the error rate

– Limited by delay and area, and consumes more energy

• Using a low voltage with error-correcting codes

– Type-II hybrid ARQ (HARQ) with low-swing channels

3.[8]. On Hamming Product Codes With Type-II Hybrid ARQ for On-Chip Interconnects

Adaptive Error Control For Nanometer Scale Network-on-Chip Links

[Figure: 3-bit codewords on a cube with axes x, y, z, where HD(011, 100) = 3; a sender and a receiver exchanging an n × k code block]
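The HD(011, 100) = 3 example matters because a code whose codewords are at least distance 3 apart can correct any single-bit error. The Python sketch below uses the Hamming(7,4) code as a concrete distance-3 code. It illustrates only this principle. It does not reproduce the Hamming product codes or the Type-II HARQ retransmission protocol of the cited work, and the function names are illustrative.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have equal length")
    return sum(x != y for x, y in zip(a, b))

def hamming74_encode(data: list) -> list:
    """Encode 4 data bits into 7 bits (positions 1..7, parity bits at 1, 2, 4)."""
    code = [0] * 8                              # index 0 unused
    for pos, bit in zip((3, 5, 6, 7), data):
        code[pos] = bit
    for p in (1, 2, 4):
        # Parity bit p makes the XOR over all positions i with (i & p) equal to 0.
        code[p] = sum(code[i] for i in range(1, 8) if i & p and i != p) % 2
    return code[1:]

def hamming74_correct(word: list) -> list:
    """Correct up to one flipped bit and return the 4 data bits."""
    code = [0] + list(word)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i                       # XOR of positions holding a 1
    if syndrome:
        code[syndrome] ^= 1                     # the syndrome names the bad position
    return [code[i] for i in (3, 5, 6, 7)]

print(hamming_distance("011", "100"))           # 3
cw = hamming74_encode([1, 0, 1, 1])
corrupted = list(cw)
corrupted[4] ^= 1                               # flip one bit "on the wire"
print(hamming74_correct(corrupted))             # [1, 0, 1, 1]
```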


Photonic NoCs

• The benefits of Photonic communication

– Bandwidth

– Power Dissipation

• Hybrid Photonic vs. electronic NoCs

– Same execution time: 7.6 W vs. 244 W

– Same power dissipation: 960 Gbps vs. 100 Gbps

3.[12]. Photonic NoCs: System-Level Design Exploration


Contents

• Motivation and Applications

• System Drivers

• On-Chip Communication and Networks-on-Chip

• Modeling and Tools


Modeling and Tools

4.[1]. Research Challenges for On-Chip Interconnection Networks

• Research challenges (figure context: many-core system constraints and hardware):

1. Synthesis

2. Custom IP blocks

3. Validation

4. Models of CMOS devices and interconnects

5. End-user feedback

6. Dynamic, reconfigurable network tools

7. Application instrumentation


COSI: NoC Design Automation

• Can we automate NoC design?

• Communication Synthesis Infrastructure (COSI)

– Network specification

– Library of building blocks

– Quantified performance and cost models

– Optimization Algorithms

4.[3]. COSI: A Framework for the Design of Interconnection Networks

[Figure: COSI framework: models, rules & platforms (Orion, Ho’s models); algorithms (k-merging, shortest path, …); a library of topologies, links, and routers; example graphs over nodes 0–5 with one edge labeled (10,100)]
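COSI lists shortest-path routing among its optimization algorithms. As a hypothetical illustration only (this is not COSI's API), the sketch below routes flows over a candidate topology with Dijkstra's algorithm and accumulates the bandwidth each flow places on every link; the graph, link costs, and the `route_flows` helper are assumptions.

```python
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over graph = {node: [(neighbor, cost), ...]}; returns (path, cost)."""
    dist, prev = {src: 0}, {}
    heap, done = [(0, src)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue                      # stale heap entry
        done.add(u)
        if u == dst:
            break
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None, float("inf")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1], dist[dst]

def route_flows(graph, flows):
    """Route each (src, dst, bandwidth) flow on its shortest path; return per-link load."""
    load = {}
    for src, dst, bw in flows:
        path, _ = shortest_path(graph, src, dst)
        if path is None:
            raise ValueError(f"no path from {src} to {dst}")
        for u, v in zip(path, path[1:]):
            load[(u, v)] = load.get((u, v), 0) + bw
    return load

# Hypothetical six-node topology (nodes 0-5) with symmetric link costs.
edges = [(0, 1, 1), (1, 2, 1), (2, 5, 1), (0, 3, 2), (3, 4, 2), (4, 5, 2), (1, 4, 3)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

print(shortest_path(graph, 0, 5))  # ([0, 1, 2, 5], 3)
print(route_flows(graph, [(0, 5, 10), (3, 2, 5)]))
```

A synthesis tool would compare such link loads against the capacities in its component library; that step is omitted here.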


ORION: NoC Power and Area Model

4.[4]. ORION 2.0: A Fast and Accurate NoC Power and Area Model for Early-Stage Design Space Exploration

• Power is the most critical design constraint

– NoC power will also be substantial

– How can NoC power be estimated at an early design stage?
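Early-stage estimators work from architectural parameters rather than layouts. The sketch below is not ORION's model; it is the textbook first-order estimate P = αCV²f + V·I_leak summed over router components, with every parameter value a hypothetical placeholder, to show the kind of calculation involved.

```python
def dynamic_power(alpha, capacitance_f, vdd_v, freq_hz):
    """Switching power in watts: activity factor * C * Vdd^2 * f."""
    return alpha * capacitance_f * vdd_v ** 2 * freq_hz

def leakage_power(vdd_v, leak_current_a):
    """Static power in watts: Vdd * leakage current."""
    return vdd_v * leak_current_a

def router_power(components, vdd_v, freq_hz):
    """Sum dynamic + leakage power over (alpha, C, I_leak) tuples, one per component."""
    return sum(dynamic_power(a, c, vdd_v, freq_hz) + leakage_power(vdd_v, i)
               for a, c, i in components)

# Hypothetical parameters: (activity factor, switched capacitance in F, leakage in A)
components = {
    "buffers":   (0.3, 2.0e-12, 1.0e-4),
    "crossbar":  (0.2, 1.5e-12, 5.0e-5),
    "allocator": (0.1, 0.5e-12, 1.0e-5),
}
total = router_power(components.values(), vdd_v=1.0, freq_hz=1e9)
print(f"{total * 1e3:.2f} mW")  # 1.11 mW
```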


FAST: Architectural Simulation

• Properties of a good simulator

– Speed, accuracy, completeness, transparency

– Low cost, up to date, and easy to use, …

• The functional model of FAST

– Keeps generating the instruction stream

– Rolls back when mis-speculation occurs

4.[5]. FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators

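[Figure: (a) event-driven architectural simulator: the functional and timing models exchange instructions one at a time (Inst., Next Inst.); (b) FAST: the functional model streams an instruction trace through a trace buffer to the timing model on an FPGA, which returns roll back / commit signals (labels: Inst. Trace, BP)]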
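The split between a functional model that runs ahead and a timing model that can force a roll back can be illustrated with a toy, software-only sketch. Everything here is hypothetical: the instruction format, the always-not-taken branch prediction, and the `simulate` helper; in FAST itself the timing model runs on an FPGA.

```python
from collections import deque

# Hypothetical toy ISA: ("op",) or ("br", target, taken); forward branches only,
# so the toy always terminates.
def simulate(program, buffer_size=4):
    """Functional model fills a trace buffer speculatively (predicting branches
    not taken); the timing model consumes it and forces a roll back on a
    misprediction. Returns committed instruction indices and the roll-back count."""
    committed, rollbacks = [], 0
    pc = 0            # functional model's next instruction
    trace = deque()   # trace buffer between the two models
    while pc < len(program) or trace:
        # Functional model: keep generating the (speculative) instruction stream.
        while pc < len(program) and len(trace) < buffer_size:
            trace.append(pc)
            pc += 1
        # Timing model: consume and commit one instruction.
        idx = trace.popleft()
        committed.append(idx)
        inst = program[idx]
        if inst[0] == "br" and inst[2]:
            # Branch was actually taken: squash wrong-path trace entries and
            # roll the functional model back onto the correct path.
            trace.clear()
            pc = inst[1]
            rollbacks += 1
    return committed, rollbacks

program = [("op",), ("br", 3, True), ("op",), ("op",)]
print(simulate(program))  # ([0, 1, 3], 1)
```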


Conclusion

[Figure: three research areas (NoC design validation and synthesis; NoC architecture analysis and optimization; application modeling and optimization) shown against the NoC layer stack: wiring, data link, network, transport, system, and application, spanning physical architecture & control and software]

1.[6]. Networks on Chips: A New SoC Paradigm

1.[7]. Outstanding Research Problems in NoC Design: System, Microarchitecture, and Circuit Perspective

[Figure: NoC design flow: application, design goals & constraints, code partitioning, application communication analysis, communication paradigm, communication infrastructure, analysis & optimization, mapping & scheduling, simulation, prototyping, NoC testing, NoC verification, component instantiation from a communication component library, and physical synthesis & tapeout]


Questions?


Backup slides


An Example of a MUTEX Circuit


ReCycle: Pipeline Adaptation

to Tolerate Process Variation


Simulation: Open-Loop vs. Closed-Loop Simulation

• Open-loop

– Network interface (NI) with an infinite queue

– Isolates the effect of the network design from the injection process

– e.g., synthetic traffic patterns

• Closed-loop

– Closer to the actual system

– NI with a finite queue

– e.g., full-system simulations

Principles and Practices of Interconnection Networks
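A toy model contrasts the two setups. The network is reduced to a single channel that accepts one packet every `service_every` cycles, and the NI source queue is either unbounded (open loop) or capped at `max_outstanding` packets, with the source stalling when it is full as a stand-in for closed-loop feedback. All names and parameters are hypothetical.

```python
import random

def simulate(rate, cycles=1000, service_every=2, max_outstanding=None, seed=1):
    """Return (max source-queue length, packets delivered)."""
    rng = random.Random(seed)
    queue_len = max_len = delivered = 0
    for cycle in range(cycles):
        # Closed loop: the source stalls while its finite queue is full.
        stalled = max_outstanding is not None and queue_len >= max_outstanding
        if not stalled and rng.random() < rate:          # Bernoulli injection
            queue_len += 1
        if cycle % service_every == 0 and queue_len > 0:  # channel drains one packet
            queue_len -= 1
            delivered += 1
        max_len = max(max_len, queue_len)
    return max_len, delivered

open_loop = simulate(rate=0.9)
closed_loop = simulate(rate=0.9, max_outstanding=4)
print(open_loop, closed_loop)
```

Offering more load than the channel can carry (0.9 vs. 0.5 packets/cycle) makes the open-loop queue grow without bound, which is why open-loop latency curves are measured against offered load; the closed-loop source simply throttles itself.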


Simulation: Synthetic Traffic Models

• Synthetic traffic models

– Based on statistical analysis of the traffic

– Traffic patterns

• Random

• Bit permutations: bit complement, bit reverse, bit rotation, shuffle, transpose

• Digit permutations: tornado, neighbor

– Constant injection rate over time

• Actual traffic is bursty!

Principles and Practices of Interconnection Networks
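The permutation patterns map a source address to a destination address. The sketch below follows the definitions commonly given for these patterns: bit permutations act on a b-bit node address s (bit i is the i-th least significant bit), digit permutations act on the radix-k digits of the address. The function names are illustrative.

```python
def _bits(s, b):
    return [(s >> i) & 1 for i in range(b)]          # bit i = i-th LSB

def _from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

def bit_complement(s, b):  # d_i = NOT s_i
    return ~s & ((1 << b) - 1)

def bit_reverse(s, b):     # d_i = s_{b-i-1}
    sb = _bits(s, b)
    return _from_bits([sb[b - i - 1] for i in range(b)])

def bit_rotation(s, b):    # d_i = s_{(i+1) mod b}
    sb = _bits(s, b)
    return _from_bits([sb[(i + 1) % b] for i in range(b)])

def shuffle(s, b):         # d_i = s_{(i-1) mod b}
    sb = _bits(s, b)
    return _from_bits([sb[(i - 1) % b] for i in range(b)])

def transpose(s, b):       # d_i = s_{(i+b/2) mod b}, b even
    sb = _bits(s, b)
    return _from_bits([sb[(i + b // 2) % b] for i in range(b)])

def tornado(digits, k):    # d_x = (s_x + ceil(k/2) - 1) mod k
    return [(d + (k + 1) // 2 - 1) % k for d in digits]

def neighbor(digits, k):   # d_x = (s_x + 1) mod k
    return [(d + 1) % k for d in digits]

# 16 nodes (b = 4 address bits); source node 1 = 0b0001
print([f(1, 4) for f in (bit_complement, bit_reverse, bit_rotation, shuffle, transpose)])
# [14, 8, 8, 2, 4]
```

Each function is a fixed permutation, so every node sends to exactly one destination; adding a Bernoulli injection process on top gives the constant-rate traffic described above.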


Simulation: Summary

• Trade-off between accuracy and simulation time

– Synthetic traffic model

• Fast simulation, less accurate

– Event-driven simulation

• Slower simulation, more accurate

– RTL simulation

• Slowest, but most accurate


Applications

• PARSEC vs. SPLASH-2

– Diversity

– State-of-the-art algorithms

– Input datasets

• Comparison

– Instruction mix, working sets, and sharing

– Communication

A Communication Characterization of SPLASH-2 and PARSEC

PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip-Multiprocessors


Applications: PARSEC vs. SPLASH-2

• PARSEC vs. SPLASH-2

– Diversity

– State-of-the-art algorithms

– Input datasets

• Similarity analysis

– Principal component analysis (PCA)

– 44 parameters

• Including instruction mix, working sets, and sharing

A Communication Characterization of SPLASH-2 and PARSEC

PARSEC vs. SPLASH-2: A Quantitative Comparison of Two Multithreaded Benchmark Suites on Chip-Multiprocessors


Applications: PARSEC vs. SPLASH-2 (cont’d)

• Communication comparison

– Spatial behavior: less distinct

– Temporal behavior: more bursty

– Producer–consumer: many-to-many

PARSEC vs. SPLASH-2


Operating System

• Real-time operating system (RTOS)

– How to meet real-time requirements

• Operating system coexistence

– Multiprocessors with heterogeneous cores

– Simpler cores may require an RTOS

– More complex cores can run a general-purpose OS

– How should these be managed together?

• Using a hypervisor?


The Multikernel: A New OS Architecture for Scalable Multicore Systems

A Unified Operating System for Clouds and Manycore: fos

Process Scheduling Challenges in the Era of Multi-Core Processors


Reliable Hop-to-Hop Transmission

• On-chip interconnect errors

• Using high voltage

– Reduces the error rate

– Costly in delay and area, and consumes more energy

• Using low voltage with error-correcting codes

– Increases the error rate, but corrects errors when they occur

– Type-II hybrid ARQ (HARQ) with low-swing channels

On Hamming Product Codes With Type-II Hybrid ARQ for On-Chip Interconnects

Adaptive Error Control for Nanometer Scale Network-on-Chip Links
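Running links at low voltage and correcting the resulting errors can be illustrated with the classic Hamming(7,4) single-error-correcting code. This is a generic textbook code chosen for brevity, not the Hamming product code or the Type-II HARQ scheme of the cited papers; the function names are hypothetical.

```python
from functools import reduce
from operator import xor

def hamming74_encode(data):
    """Encode 4 data bits as a 7-bit codeword; positions 1, 2, 4 hold parity."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4    # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4    # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4    # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(code):
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    code = list(code)
    # XOR of the 1-based positions holding a 1 is 0 for a valid codeword and
    # equals the position of the flipped bit after a single-bit error.
    syndrome = reduce(xor, (pos for pos, bit in enumerate(code, start=1) if bit), 0)
    if syndrome:
        code[syndrome - 1] ^= 1
    return [code[2], code[4], code[5], code[6]], syndrome

word = hamming74_encode([1, 0, 1, 1])
received = word.copy()
received[4] ^= 1                     # simulate a single bit error at position 5
print(hamming74_decode(received))    # ([1, 0, 1, 1], 5)
```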


End-to-End Flow Control

• Message-dependent deadlock

– Deadlock avoidance

• Virtual networks

• Credit-based (CB)

– Deadlock recovery

• Regressive

• Deflective

• Progressive

• CTC: Connect-Then-Credit

– 3-way handshake to exchange credits

• P_REQ, P_ACK, and data

Principles and Practices of Interconnection Networks

CTC: An End-To-End Flow Control Protocol for SoC Architectures
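A toy sketch of the credit exchange behind end-to-end flow control: the sender never injects more flits than the receiver has granted credits for, so messages cannot back up inside the network waiting for buffer space. The handshake loosely mirrors CTC's P_REQ / P_ACK / data sequence; the class names, the credit-granting policy, and the fixed drain rate are assumptions, not the protocol's specification.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Receiver:
    capacity: int                                   # NI buffer slots (assumed >= 1)
    buffer: deque = field(default_factory=deque)

    def on_p_req(self, requested):
        """Answer a P_REQ with a P_ACK granting credits for free buffer slots."""
        return min(requested, self.capacity - len(self.buffer))

    def on_data(self, flit):
        if len(self.buffer) >= self.capacity:
            raise OverflowError("flit sent without a credit")
        self.buffer.append(flit)

    def drain(self, n):
        """The attached core consumes up to n flits, freeing buffer slots."""
        for _ in range(min(n, len(self.buffer))):
            self.buffer.popleft()

def send_message(flits, rx, drain_per_round=2):
    """Transfer all flits via repeated P_REQ -> P_ACK -> data rounds."""
    sent = rounds = 0
    while sent < len(flits):
        credits = rx.on_p_req(len(flits) - sent)    # P_REQ / P_ACK
        for flit in flits[sent:sent + credits]:     # data, covered by credits
            rx.on_data(flit)
        sent += credits
        rounds += 1
        rx.drain(drain_per_round)
    return rounds

rx = Receiver(capacity=4)
print(send_message(list(range(10)), rx), len(rx.buffer))  # 4 2
```

The cost of this guarantee is the extra handshake traffic and latency per round, which is the overhead a protocol like CTC tries to keep small.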


Intelligent NoCs for Cache Coherence: INSO and INCF

• Two main problems with snoop-based coherence in unordered NoCs:

1. Incorrect snoop ordering: addressed by In-Network Snoop Ordering (INSO)

2. Broadcast messages: addressed by In-Network Coherence Filtering (INCF)

In-Network Snoop Ordering

In-Network Coherence Filtering: Snoopy Coherence without Broadcasts
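The ordering half of the problem can be sketched as follows. One way to impose a global snoop order on an unordered network, in the spirit of INSO, is to give every snoop a unique order number and have each node process snoops strictly in that order, holding back early arrivals; an order number its owner does not use must be explicitly expired so other nodes do not wait forever. The interleaved assignment of order numbers to nodes, the `None` marker for expired slots, and all names are assumptions for illustration.

```python
import heapq

def owned_orders(node, num_nodes, count):
    """Order numbers reserved for `node` under an interleaved assignment (assumed)."""
    return [node + i * num_nodes for i in range(count)]

class SnoopOrderer:
    """Per-node logic that processes snoops in increasing order number."""
    def __init__(self):
        self.expected = 0        # next order number this node may process
        self.pending = []        # min-heap of (order, snoop) that arrived early
        self.processed = []

    def receive(self, order, snoop):
        """snoop is None when the owner expired (did not use) that order number."""
        heapq.heappush(self.pending, (order, snoop))
        while self.pending and self.pending[0][0] == self.expected:
            _, s = heapq.heappop(self.pending)
            if s is not None:
                self.processed.append(s)
            self.expected += 1

print(owned_orders(0, 2, 3), owned_orders(1, 2, 3))  # [0, 2, 4] [1, 3, 5]

node = SnoopOrderer()
node.receive(2, "GetX B")   # arrives early; waits for orders 0 and 1
node.receive(1, None)       # order 1 expired by its owner
node.receive(0, "GetS A")
print(node.processed)       # ['GetS A', 'GetX B']
```

Order numbers are assumed unique, so heap entries never tie on the first tuple element.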


Traditional Network-on-Chip

Principles and Practices of Interconnection Networks

[Figure: 4×4 mesh with nodes 0–15, and the router microarchitecture (routing logic/table, VC allocator, switch allocator, crossbar) with pipeline stages BW, RC, VA, SA, ST, LT]
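The routing logic in the figure is often a simple deterministic function. As an assumption (the slide does not name the algorithm), the sketch below uses dimension-order (XY) routing on the 4×4 mesh, with node n at column n mod 4 and row n div 4, matching a row-by-row numbering of nodes 0–15.

```python
def xy_route(src, dst, k=4):
    """Nodes visited from src to dst on a k x k mesh: first along X, then along Y."""
    x, y = src % k, src // k
    dx, dy = dst % k, dst // k
    path = [src]
    while x != dx:                      # travel in the X dimension first
        x += 1 if dx > x else -1
        path.append(y * k + x)
    while y != dy:                      # then in the Y dimension
        y += 1 if dy > y else -1
        path.append(y * k + x)
    return path

def output_port(cur, dst, k=4):
    """Output port the routing logic would select at router `cur`."""
    x, y, dx, dy = cur % k, cur // k, dst % k, dst // k
    if dx > x:
        return "E"
    if dx < x:
        return "W"
    if dy > y:
        return "S"      # rows assumed numbered top to bottom
    if dy < y:
        return "N"
    return "Local"

print(xy_route(0, 15))  # [0, 1, 2, 3, 7, 11, 15]
```

Because a packet never turns from Y back to X, XY routing cannot form cyclic channel dependencies on a mesh, which is why it is a common deadlock-free default.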


Bufferless Network

• Buffers in NoC

– Energy, area, and complexity overhead

• Can we design a network without buffers?

– Deflection routing vs. packet or flit dropping

A Case for Bufferless Routing in On-Chip Networks

SCARAB: A Single Cycle Adaptive Routing and Bufferless Network


Circuit-Switched NoC

• Benefits of a circuit-switched NoC

A 2.9Tb/s 8W 64-Core Circuit-Switched Network-on-Chip in 45nm CMOS

Winning the Pinning in NoC


Bufferless Network

A Case for Bufferless Routing in On-Chip Networks

SCARAB: A Single Cycle Adaptive Routing and Bufferless Network


• Buffers in NoC

– Energy, area, and complexity overhead

• Can we design a network without buffers?

– Deflection routing vs. packet/flit dropping

• BLESS: a deflection-routing bufferless network

– FLIT-BLESS vs. WORM-BLESS

• Problems

– Injection problem

– Livelock

– Throughput and latency
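The core of a deflection-routing bufferless router can be sketched as a per-cycle port assignment: every incoming flit must leave on some output, so a flit that loses the port it wants is deflected to any free port instead of being buffered. The sketch ranks flits oldest-first, the policy BLESS uses to avoid livelock; the port names, the flit tuple layout, and the helper name are assumptions.

```python
def deflection_route(flits, ports=("N", "E", "S", "W")):
    """Assign each flit an output port for this cycle.

    flits: list of (age, flit_id, productive_ports). Returns
    {flit_id: (port, deflected)}. A bufferless router must send every flit
    somewhere, so it cannot accept more flits than it has output ports.
    """
    if len(flits) > len(ports):
        raise ValueError("more flits than output ports")
    free = list(ports)
    assignment = {}
    # Oldest flit first: it always gets a productive port if one exists.
    for age, flit_id, productive in sorted(flits, key=lambda f: f[0], reverse=True):
        port = next((p for p in productive if p in free), None)
        deflected = port is None
        if deflected:
            port = free[0]          # any free port will do
        free.remove(port)
        assignment[flit_id] = (port, deflected)
    return assignment

flits = [(5, "a", ["E"]), (9, "b", ["E", "N"]), (1, "c", ["E"])]
print(deflection_route(flits))
# {'b': ('E', False), 'a': ('N', True), 'c': ('S', True)}
```

Because the oldest flit always wins a productive port, it makes progress every cycle; younger flits may be deflected repeatedly, which feeds the throughput and latency problems listed above.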


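Router Microarchitecture: Reducing Pipelines

• A head flit spends 4 clock cycles in each router for every link traversal

Principles and Practices of Interconnection Networks

[Figure: baseline router pipeline. Head flit: BW, RC, VA, SA, ST, then LT; body and tail flits: BW, SA, ST, then LT (no RC or VA). BW = buffer write, RC = route computation, VA = virtual-channel allocation, SA = switch allocation, ST = switch traversal, LT = link traversal]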


Router Microarchitecture: Reducing Pipelines

• Speculative routing

Principles and Practices of Interconnection Networks

[Figure: baseline vs. speculative router pipeline. The speculative head flit performs VA and SA in parallel (BW, RC, VA and SA, ST, then LT); body and tail flits: BW, SA, ST, then LT]


Router Microarchitecture: Reducing Pipelines

Principles and Practices of Interconnection Networks

• Lookahead routing

[Figure: baseline vs. lookahead router pipeline. The route for the next router is computed one hop ahead (NRC, next-route computation), removing RC from the head flit's critical path; head flit stages: BW, NRC, VA, SA, ST, LT; body and tail flits: BW, SA, ST, LT]


Router Microarchitecture: Reducing Pipelines

Principles and Practices of Interconnection Networks

• Speculation + lookahead routing

[Figure: baseline vs. speculation + lookahead router pipeline. The head flit combines lookahead routing (NRC) with VA and SA performed in parallel; stages: BW, NRC, VA and SA, ST, LT]
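The benefit of each optimization shows up in zero-load latency. The sketch below assumes one cycle per stage group, a single-cycle link, and the stage groupings written in the code; the groupings are an interpretation consistent with the 4-cycle baseline noted above, not cycle counts stated in the source. `hops` counts router-plus-link traversals.

```python
# Each inner list is one router pipeline cycle; stages in the same list run in
# parallel. These groupings are assumptions made for illustration.
PIPELINES = {
    "baseline":              [["BW", "RC"], ["VA"], ["SA"], ["ST"]],
    "speculative":           [["BW", "RC"], ["VA", "SA"], ["ST"]],
    "lookahead":             [["BW", "NRC", "VA"], ["SA"], ["ST"]],
    "speculation+lookahead": [["BW", "NRC", "VA", "SA"], ["ST"]],
}
LINK_CYCLES = 1  # LT

def router_cycles(kind):
    return len(PIPELINES[kind])

def zero_load_latency(kind, hops, flits=1):
    """Head flit pays router + link delay per hop; the remaining flits follow
    one per cycle (serialization latency)."""
    return hops * (router_cycles(kind) + LINK_CYCLES) + (flits - 1)

for kind in PIPELINES:
    print(f"{kind:>22}: {zero_load_latency(kind, hops=3, flits=5)} cycles")
```

Shortening the per-hop pipeline matters most for long paths and short packets, where per-hop delay dominates serialization.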



Microarchitecture:

Enhance Arbitration

• Traditional Allocator Implementation

– Input-first, Output-first, Wavefront

• SPAROFLO

– Speculative Priority Assignment (SPA)

– Recreate Old (RO)

– Flow (FLO)

Allocator Implementations for Network-on-Chips

A 4.6Tbits/s 3.6GHz Single-Cycle NoC Router with a Novel Switch Allocator in 65nm CMOS

[Figure: Separable allocator structures built from V:1, IxV:1, and OxV:1 arbiters, with request signals (req11, req1v, reqI1, reqIv) and grant signals (gnt11, gnt1v, gntI1, gntIv)]
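A separable allocator splits allocation into two independent arbitration stages. The sketch below models a generic input-first or output-first separable allocator over a request matrix `req[i][o]`, built from round-robin arbiters. The class and function names, the round-robin arbiter choice, and the policy of advancing an arbiter's pointer only when its choice wins the final grant are assumptions for illustration, not the designs in the cited papers.

```python
from typing import List, Optional


class RoundRobinArbiter:
    """n:1 round-robin arbiter (hypothetical helper)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.ptr = 0  # index of the highest-priority requester

    def arbitrate(self, requests: List[bool]) -> Optional[int]:
        # Scan from the priority pointer, wrapping around.
        for k in range(self.n):
            idx = (self.ptr + k) % self.n
            if requests[idx]:
                return idx
        return None

    def update(self, winner: int) -> None:
        # The winner gets the lowest priority next time.
        self.ptr = (winner + 1) % self.n


def separable_allocate(req: List[List[bool]],
                       in_arbs: List[RoundRobinArbiter],
                       out_arbs: List[RoundRobinArbiter],
                       input_first: bool = True) -> List[List[bool]]:
    """Return a grant matrix with at most one grant per input and per output."""
    n_in, n_out = len(req), len(req[0])
    grant = [[False] * n_out for _ in range(n_in)]
    if input_first:
        # Stage 1: each input's O:1 arbiter picks one requested output.
        pick = [in_arbs[i].arbitrate(req[i]) for i in range(n_in)]
        # Stage 2: each output's I:1 arbiter picks among inputs that chose it.
        for o in range(n_out):
            w = out_arbs[o].arbitrate([pick[i] == o for i in range(n_in)])
            if w is not None:
                grant[w][o] = True
                out_arbs[o].update(w)
                in_arbs[w].update(o)
    else:
        # Stage 1: each output's I:1 arbiter picks one requesting input.
        pick = [out_arbs[o].arbitrate([req[i][o] for i in range(n_in)])
                for o in range(n_out)]
        # Stage 2: each input's O:1 arbiter picks among outputs that chose it.
        for i in range(n_in):
            w = in_arbs[i].arbitrate([pick[o] == i for o in range(n_out)])
            if w is not None:
                grant[i][w] = True
                in_arbs[i].update(w)
                out_arbs[w].update(i)
    return grant
```

With four inputs and three outputs, the request pattern `[[T, T, F], [T, F, F], [F, F, T], [F, T, F]]` yields three grants input-first but only two grants output-first from the same initial pointers: separable allocators are fast and simple, but they do not guarantee a maximal matching.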


Microarchitecture:

Enhance Arbitration

• Traditional Allocator Implementation

– Input-first, Output-first, Wavefront, LOA, PIM, …

• SPAROFLO

– Speculative Priority Assignment (SPA)

– Recreate Old (RO)

– Flow (FLO)

Allocator Implementations for Network-on-Chips

A 4.6Tbits/s 3.6GHz Single-Cycle NoC Router with a Novel Switch Allocator in 65nm CMOS

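[Figure: Separable switch allocator example with inputs 0 to 3 and outputs 0 to 2, built from o:1 arbiters (one per input, numbered 1 to i) and i:1 arbiters (one per output, numbered 1 to o), with request signals (req11, reqi1, req1o, reqio) and grant signals (gnt11, gnti1, gnt1o, gntio)]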
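The wavefront allocator listed above grants along diagonals of the request matrix: cells on one wrapped diagonal never share a row or a column, so they can be decided at the same time, and a cell is granted only if its row and column are still free. The sketch below is a sequential software model of that idea, assuming the priority diagonal rotates by one each cycle; the function name and the padding of non-square matrices are our choices.

```python
from typing import List


def wavefront_allocate(req: List[List[bool]], priority: int) -> List[List[bool]]:
    """Sequential model of a wavefront allocator.

    Cell (i, j) lies on wrapped diagonal (i + j) % n. Diagonals are visited
    starting from `priority`; each wave moves one step right/down.
    """
    n_in, n_out = len(req), len(req[0])
    n = max(n_in, n_out)  # pad to a square n x n array
    row_free = [True] * n
    col_free = [True] * n
    grant = [[False] * n_out for _ in range(n_in)]
    for k in range(n):
        d = (priority + k) % n
        # All cells on diagonal d are in distinct rows and columns.
        for i in range(n):
            j = (d - i) % n
            if i < n_in and j < n_out and req[i][j] and row_free[i] and col_free[j]:
                grant[i][j] = True
                row_free[i] = False
                col_free[j] = False
    return grant


if __name__ == "__main__":
    req = [[True, True, False],
           [True, False, False],
           [False, False, True],
           [False, True, False]]
    for cycle in range(2):
        print(cycle, wavefront_allocate(req, priority=cycle % 4))
```

Unlike the separable structure, the result is always a maximal matching: any request left ungranted conflicts with a grant already made in its row or its column.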




Microarchitecture:

Enhance Arbitration

• Switch and Virtual Channel Allocators

Allocator Implementations for Network-on-Chips

A 4.6Tbits/s 3.6GHz Single-Cycle NoC Router with a Novel Switch Allocator in 65nm CMOS


• SPAROFLO

– Speculative Priority Assignment (SPA)

– Recreate Old (RO)

– Flow (FLO)


Reliable Hop-to-Hop Transmission

• On-chip interconnect errors

• Using a high voltage

– Reduces the error rate

– But is limited by delay and area, and consumes more energy

• Using a low voltage with an error-correcting code

– Type-II HARQ (hybrid automatic repeat request) over a low-swing channel

On Hamming Product Codes With Type-II Hybrid ARQ for On-Chip Interconnects

Adaptive error control for nanometer scale network-on-chip links

[Figure: 3-bit codewords as vertices of a cube (axes x, y, z), e.g. HD(011, 100) = 3; a sender and a receiver exchanging an n x k code block]
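The Hamming distance in the figure, and how a code with minimum distance 3 corrects a single bit error, can be made concrete with Hamming(7,4). The sketch also wraps it in a toy Type-II HARQ exchange: the first transmission carries only one parity bit for error detection, and the Hamming parity bits are sent as incremental redundancy only after a NACK. This is a deliberately simplified illustration under our own assumptions (a single Hamming(7,4) code and a caller-supplied `channel` function); it is not the Hamming product code scheme of the cited paper.

```python
from typing import Callable, List, Tuple

Bits = List[int]


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("bit strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def hamming74_encode(d: Bits) -> Bits:
    """Codeword layout (positions 1..7): p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c: Bits) -> Bits:
    """Correct up to one bit error and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]  # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]  # checks positions 2, 3, 6, 7
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]  # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based error position, 0 if none
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


def type2_harq(data: Bits, channel: Callable[[Bits], Bits]) -> Tuple[Bits, int]:
    """Toy Type-II HARQ; returns (received data, number of transmissions)."""
    # Transmission 1: data plus one even-parity bit (detection only).
    rx = channel(data + [sum(data) % 2])
    rx_data, rx_parity = rx[:4], rx[4]
    if sum(rx_data) % 2 == rx_parity:
        return rx_data, 1  # ACK: no error detected
    # NACK -> transmission 2: send only the Hamming parity bits p1, p2, p4.
    p = hamming74_encode(data)
    rp = channel([p[0], p[1], p[3]])
    codeword = [rp[0], rp[1], rx_data[0], rp[2], rx_data[1], rx_data[2], rx_data[3]]
    return hamming74_decode(codeword), 2
```

The single parity bit in the first transmission misses any even number of bit errors, so a practical scheme would use a stronger detection code; the point here is only that correction bits cost energy and bandwidth when an error is actually detected.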


Globally Asynchronous, Locally Synchronous (GALS) Circuits

• Problems with global clock distribution

– Design complexity, noise, and power

• Local clocks with asynchronous communication between clock domains

Property | Pausible Clocking | FIFO-based | Boundary Synchronization
Area overhead | Low | Medium to high | Low
Latency | Low | High | Medium
Throughput | Depends on clock pause rate | High | Medium
Power consumption | Low | High | Medium

Globally Asynchronous, Locally Synchronous Circuits: Overview and Outlook

[Figure: Three ways to connect Local Sync. 1 and Local Sync. 2: pausible clocks at an output/input port pair; an async FIFO; boundary synchronization with registers (labels REG, CL, DL)]
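The FIFO-based column of the table is commonly realized as a dual-clock FIFO whose read and write pointers cross clock domains in Gray code through two-flop synchronizers, so only one pointer bit changes per increment. The model below is a sketch of that common technique, not the design surveyed in the cited overview: each `write_edge`/`read_edge` call stands for one edge of the respective local clock, metastability is not modeled, and the class and method names are hypothetical.

```python
from typing import Any, List, Optional


def bin_to_gray(b: int) -> int:
    return b ^ (b >> 1)


class DualClockFifo:
    """Dual-clock FIFO model with Gray-coded pointers (hypothetical names)."""

    def __init__(self, addr_bits: int = 2) -> None:
        self.addr_bits = addr_bits
        self.depth = 1 << addr_bits
        self.ptr_mask = (1 << (addr_bits + 1)) - 1  # extra bit tracks wrap-around
        self.mem: List[Any] = [None] * self.depth
        self.wptr = 0  # binary write pointer, write-clock domain
        self.rptr = 0  # binary read pointer, read-clock domain
        self.rptr_sync = [0, 0]  # Gray read pointer via 2 flops in write domain
        self.wptr_sync = [0, 0]  # Gray write pointer via 2 flops in read domain

    def full(self) -> bool:
        # Full: Gray pointers equal except for the two most significant bits.
        top2 = 0b11 << (self.addr_bits - 1)
        return bin_to_gray(self.wptr) == (self.rptr_sync[1] ^ top2)

    def empty(self) -> bool:
        return bin_to_gray(self.rptr) == self.wptr_sync[1]

    def write_edge(self, data: Any = None) -> bool:
        """One write-clock edge; returns True if `data` was accepted."""
        accepted = data is not None and not self.full()
        if accepted:
            self.mem[self.wptr & (self.depth - 1)] = data
            self.wptr = (self.wptr + 1) & self.ptr_mask
        self.rptr_sync = [bin_to_gray(self.rptr), self.rptr_sync[0]]
        return accepted

    def read_edge(self, take: bool = True) -> Optional[Any]:
        """One read-clock edge; returns the popped item, or None if empty."""
        item = None
        if take and not self.empty():
            item = self.mem[self.rptr & (self.depth - 1)]
            self.rptr = (self.rptr + 1) & self.ptr_mask
        self.wptr_sync = [bin_to_gray(self.wptr), self.wptr_sync[0]]
        return item
```

After four writes into a 4-entry FIFO, the reader still sees "empty" for two read edges before the first item appears: that synchronizer delay is the latency cost the table attributes to the FIFO-based approach.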


Robust Interfaces for Mixed-Timing Systems

• Two distinct problems

– Different local timing: GALS

– Long delays in interconnections: LID (latency-insensitive design)

• Can mixed-timing FIFOs be used as relay stations?

– LID + GALS

• Reusable mixed-timing FIFOs

– And relay stations based on those FIFOs

[Figure: Mixed-timing FIFO between the clk1 and clk2 domains (TAIL to HEAD)]

Robust Interfaces for Mixed-Timing Systems with Application to Latency-Insensitive Protocols

Robust Interfaces for Mixed-Timing Systems

[Figure: FIFO built from a ring of cells with Put Ctrl, Get Ctrl, Full Detector, and Empty Detector; put-side signals CLK_put, req_put, data_put, full, ack_put; get-side signals CLK_get, req_get, data_get, valid_get, empty, ack_get. Variants: Sync-Sync, Async-Sync, Async-Async, and Sync-Async FIFOs]
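The mixed-timing FIFO in the figure keeps data stationary in a ring of cells and circulates a put token (tail) and a get token (head) instead of shifting data. Because the full and empty signals must pass through synchronizers, they are computed conservatively so the sender stops early. The sketch below models only the put/get token logic; the rule that the FIFO reports full unless two consecutive cells are empty is our assumption about this anticipation, the empty test here is simply the exact one, and the cited papers should be consulted for the actual detector definitions.

```python
from typing import Any, List, Optional


class CellRingFifo:
    """Ring of stationary cells with rotating put/get tokens (sketch)."""

    def __init__(self, n_cells: int = 4) -> None:
        self.valid: List[bool] = [False] * n_cells  # cell currently holds data
        self.data: List[Any] = [None] * n_cells
        self.tail = 0  # cell holding the put token
        self.head = 0  # cell holding the get token

    def full(self) -> bool:
        # ASSUMED anticipatory rule: full unless two consecutive cells are empty,
        # leaving one cell of slack for the synchronizer delay on `full`.
        n = len(self.valid)
        return not any(not self.valid[i] and not self.valid[(i + 1) % n]
                       for i in range(n))

    def empty(self) -> bool:
        return not any(self.valid)

    def put(self, item: Any) -> bool:
        """Enqueue at the tail cell; returns False when `full` is asserted."""
        if self.full():
            return False
        self.data[self.tail] = item
        self.valid[self.tail] = True
        self.tail = (self.tail + 1) % len(self.valid)  # pass the put token on
        return True

    def get(self) -> Optional[Any]:
        """Dequeue from the head cell; returns None when empty."""
        if self.empty():
            return None
        item = self.data[self.head]
        self.valid[self.head] = False
        self.head = (self.head + 1) % len(self.valid)  # pass the get token on
        return item
```

With four cells this sketch accepts only three items before asserting full; the unused cell is the slack that lets a late-arriving put complete safely while `full` is still crossing the synchronizer.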


Application & System Drivers Summary

• Multicores & Heterogeneous Systems

– Increasing numbers of IP cores

• Emerging applications

– PARSEC / User-Interactive Apps

• Role of the operating system

• Power vs. Performance

• Cache vs. Scratch-pad

– Shared-memory vs. Message-passing

– Cache Coherence Protocols

• On-Chip Memory Controller

• Off-chip Network & Memory