40
SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY Implications of Memory-Centric Computing Architectures for Future NoCs Sudhakar Yalamanchili Computer Architecture and Systems Laboratory Center for Experimental Research in Computer Systems School of Electrical and Computer Engineering Georgia Institute of Technology Sponsors:

Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Implications of Memory-Centric Computing Architectures for Future NoCs

Sudhakar Yalamanchili

Computer Architecture and Systems Laboratory Center for Experimental Research in Computer Systems

School of Electrical and Computer Engineering Georgia Institute of Technology

Sponsors:

Page 2: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Role of NoCs

www.themobileindian.com

www.theregister.co.uk

Nvidia Tegra4 IBM Power8

Qualcomm Snapdragon

The System Defines the NoC Requirements

2

Page 3: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Overview

n Impact of Technology and Applications

n Transition to Memory Centric Compute: Inside the Package

n Transition to Memory Centric Compute: Inside the Stack

n Concluding Remarks

3

Page 4: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

How are Technology and Applications Reshaping Systems?

4

Page 5: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Moore’s Law and the End of Dennard Scaling

From betanews.com

•  Performance scaled with number of transistors*

Goal: Sustain Performance Scaling

*R. Dennard, et al., “Design of ion-implanted MOSFETs with very small physical dimensions,” IEEE Journal of Solid State Circuits, vol. SC-9, no. 5, pp. 256-268, Oct. 1974.

5

Page 6: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Power and Performance

Perf opss

!

"#

$

%&= Power W( )×Efficiency ops

joule!

"#

$

%&

Operator_cost + Data_movement_cost + Storage_cost

Specialization ! heterogeneity, asymmetry, technology diversity

*

Power Supply (regulation) + Power Consumption + Cooling

W. J. Dally, Keynote IITC 2012

6

Page 7: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Energy Cost of Data Management

Perf opss

!

"#

$

%&= Power W( )×Efficiency ops

joule!

"#

$

%&

Three operands x 64 bits/operand

DataMovementEnergy = #bits× dist −mm× energy− bit −mm

*S. Borkar and A. Chien, “The Future of Microprocessors, CACM, May 2011

Operator_cost + Data_movement_cost + Storage_cost

W. J. Dally, Keynote IITC 2012

•  Refresh •  Access

7

Page 8: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Interconnect Energy Taper: Electrical

n Relative costs of compute and memory accesses n Time and energy costs have shifted to data movement

Courtesy Greg Astfalk, HP

Core Core Core Core

L1$ L1$ L1$ L1$

Last Level Cache

DRAM

1’s ns

ms

Data Access Latency

10’s ns

100’s ns

Data Access Energy

8

Page 9: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Shift in the Balance Point

 38    

 123    

 207      292    

0  1  2  3  4  5  6  7  8  

200   400   600   800   1000  

Band

width  (G

B/s)  

Normalized

 Perform

ance  

Core  Frequency  (MHz)  

 38    

 151    

 264    

0  2  

4  

6  

8  

10  

12  

4   8   12   16   20   24   28   32   36   40   44   Band

width  (G

B/s)  

Normalized

 Perform

ance  

#  AcBve  CUs  

Balance  plane  for  performance  and  energy  

Relative energy costs of compute

and memory access

Relative ops/byte demand of application

Hardware Balance

I. Paul, W. Huang, M. Arora, and S. Yalamanchili, “Harmonia: Balancing Compute and Memory Power in High Performance GPUs,” IEEE/ACM International Symposium on Computer Architecture (ISCA), June 2015.

9

Page 10: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Pin Bandwidth Challenges1

n Number of transistors/die continues to grow n Number of pins growing at a slower rate than #transistors n Number of supply pins are crowding out data pins

n Reducing supply current/pin limits growth of #transistor/die

1P. Stanley-Marbell, V. C. Cabezas, and R. P. Luijten, “ Pinned to the Wall – Impact of Packaging and Applications on the Memory and Power Walls,” IEEE/ACM international symposium on Low-Power Electronics and Design (ISPLED), 2011

Data pin bandwidth is not growing as fast as number of transistors on chip

CPU Die

Package Substrate

PCB To DRAM

Die

CPU Die DRAM Die

Si Interposer

Package Substrate

PCB

10

Page 11: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Re-Emergence of Processing In (Near) Memory

3D Interposer System

BGA

Power/Ground Planes

TPVs

Memory Stack

Logic Die

PWB PCB

Logic

MemoryMemoryMemoryMemory

TSV

Test  pads

Glass  interposer TPV RDL

Thermal  adhesiveThermal  Vias

Encapsulate

GPU

Kogge – Execube (4K DRAM + 100K gate parallel processor (www3.nd.edu)

Draper et.al, DIVA chip (isi.edu)

Courtesy Gokul Kumar

1990’s

Today

11

Page 12: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

The Data Tsunami

www.hq.unu.edu

12

Page 13: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Shift in Re-Use Patterns: Locality

Adaptive Mesh Refinement (AMR)

Temporal locality S

patia

l Loc

ality

Machines designed today for these

applications

Need machines for these applications

0

1

2

3 4

5

Graph Analytics Machine Learning

Relational Computing

13

Page 14: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Shift to Finer Arithmetic Density

……

LargeQty(p) <-

Qty(q), q > 1000.

……

Relational Computations Over Massive Unstructured Data Sets

Neu

rom

orph

ic

Que

ry P

roce

sing

Candidate Application

Domains

14

Page 15: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Where do the $$ and Energy Go?

GPUPwr MemPwr RestOfCardPwr

•  Increasing percentage of costs

•  Increasing percentage of power

•  Increasing percentage of performance (latency-BW)

•  Increasing memory intensive applications

I. Paul, W. Huang, M. Arora, and S. Yalamanchili, “Harmonia: Balancing Compute and Memory Power in High Performance GPUs,” IEEE/ACM International Symposium on Computer Architecture (ISCA), June 2015.

15

Page 16: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

The (Re)-Emergence of Near Data Processing

Silicon Interposer

Multicore Chip

Logic Tier

Memory Tiers

Where are the Networks?

New BW Hierarchy and energy taper

• Hybrid Memory Cube (HMC) • High Bandwidth Memory (HBM)

• Wide I/O

Compute Package Capacity Tier Memory 16

Page 17: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Transition to Memory Centric Compute: Inside the Package

PCB

Logic

MemoryMemoryMemoryMemory

TSV

Test  pads

Glass  interposer TPV RDL

Thermal  adhesiveThermal  Vias

Encapsulate

GPU

17

Page 18: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Impact of Interposer: Processor-Memory Hierarchy

V. Garg, D. Stogner, C. Ulmer, D. E. Schimmel, C. Dislis, S. Yalamanchili, and D. S. Wills, “Early Analysis of Cost/Performance Trade-Offs in MCM Systems,” IEEE Transactions on Components, Packaging, and Manufacturing Technology–Part B, vol. 20, no. 3, August 1997.

n Attraction n Smaller die (better yield), process customization, and larger L2 n Better thermal behaviors

n Issues n Increased die and substrate testing costs (increase in I/Os) n Cost

18

Page 19: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Impact of Interposer: SIMD Image Processor

V. Garg, D. Stogner, C. Ulmer, D. E. Schimmel, C. Dislis, S. Yalamanchili, and D. S. Wills, “Early Analysis of Cost/Performance Trade-Offs in MCM Systems,” IEEE Transactions on Components, Packaging, and Manufacturing Technology–Part B, vol. 20, no. 3, August 1997.

n Trading cost vs. chip size vs number of I/Os

n What is the most cost effective partitioning?

19

Page 20: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

The Opportunity: Network in Package

•  Smaller micro bumps à increased #connections •  Shorter faster wires in the interposer •  Higher signal integrity à Lower power •  Interposer cost? •  Example: Network on Interposer: Jerger, Kannan, Li, and

Loh (MICRO 2014) •  Example: memory networks: G. Kim et. al. PACT 2013

PCB

Logic

MemoryMemoryMemoryMemory

TSV

Test  pads

Glass  interposer TPV RDL

Thermal  adhesiveThermal  Vias

Encapsulate

GPU

20

Page 21: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Transition to Memory-Centric Compute: Inside The Stack

21

Page 22: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

3D Multicore Architecture

http://www.extremetech.com/computing/197720-beyond-ddr4-understand-the-differences-between-wide-io-hbm-and-hybrid-memory-cube

PCB

Logic

MemoryMemoryMemoryMemory

TSV

Test  pads

Glass  interposer TPV RDL

Thermal  adhesiveThermal  Vias

Encapsulate

GPU

22

Page 23: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Source of Memory Bandwidth

•  Mismatch between bus bandwidth and DRAM access latency

•  Over the past two decades density has increased by 1000X and latency reduced by 56% [source:hynix]

•  Solution: Have more, narrower channels and exploit parallelism

23

Page 24: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Parallelism in the Memory System

n Move towards more narrower channels n Increases data cycles improving efficiency and utilization

4 channels with 256 bit bus vs. 16 channels with 64 bit bus

32, x86 cores, Hybrid Memory Cube (HMC) Model

You can buy bandwidth but you cannot bribe God! -- Unknown

24

Page 25: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

2D Bandwidth Still Matters!

25

Page 26: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Network Impact of Memory Parallelism

Impact of reduced queuing delays

•  Tiled 3D memory with 16 channels, similar to HMC

• Distributed directory based coherence with shared L2 banks

• DRAM latency vs. MC queuing

More channels à less load per channel à reduced queueing

2D torus network on compute Tier Network component

26

Page 27: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Impact of DRAM Latency vs. Network Latency

•  Impact on all requests is low.

• Coherence requires a request to take multiple hops before being satisfied

•  It’s the network!

Normalized increase in CAS latency

32 cores, 16 memory channels

2D Network dominates

L1 L2 MC

1 2

3 4

Coherence Traffic

Miss Traffic

L1

2

27

Page 28: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

The Opportunity

3D BW

2DBW Locality

28

Page 29: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Refactoring the Memory Hierarchy

29

Page 30: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

How is the Network Used?

cannealdedup

ferretfluidanimate

s treamclus tervips

050

100150200250300350400450

HPKI  with  varying  L1  s ize 2481632

HPKI

0 5 10 15 20 25 30 3502468

10121416

L1  S ize  vs  G lobal  IPCcannealdedupferretfluidanimates treamclus tervips

L 1  S ize  P er  Node  (in  KB )

Globa

l  IPC

Hop Count Per Kilo Instructions (HPKI) Core Core Core Core

L1$ L1$ L1$ L1$

DRAM

DRAM

DRAM

DRAM

MC

MC

MC

MC

L2$ L2$ L2$ L2$

L1 L2 MC

1 2

3 4

Coherence Traffic

Miss Traffic

L1

2

30

Page 31: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Optimization: Memory Side Caching

n Refactor memory hierarchy to reduce hop count

n Modify address space mappings to retain/improve locality

n Maximize L2-DRAM BW n Remove serialization latency

31

Page 32: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Importance of Address Space Management: Locality

Core

Core

Core

Core

L1$

L1$

L1$

L1$

DRAM

DRAM

DRAM

DRAM

MC

MC

MC

MC

L2$

L2$

L2$

L2$

Global Address Space

Cache Address Mapping (CAM)

Global Address Mapping (GAM)

Local Address Mapping (LAM)

•  Different mapping functions determine parallelism in memory and network traffic •  More parallelism à better 3D bandwidth utilization but more load on the network

32

Page 33: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Emphasis on Co-Design

Core Core Core Core

L1$ L1$ L1$ L1$

DRAM DRAM MC

MC

Global Address Space

L2$ L2$

DRAM MC L2$ DRAM M

C L2$

n Refactor the memory hierarchy n Co-design address space assignment across levels of the memory hierarchy

n Diversity of interconnect technologies 33

Page 34: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Cymric: A Near Data Processing Architecture

……

Stream Multiprocessor

Custom Lightweight Parametric GPU C++ Design µArchitectureFlow

Processor Design

Research Topics

Memory Hierarchy

HARP Compiler

PNM Runtime System

PNM Device Driver

Network + Cache?

PNM cores

Memory

Memory

Memory

Memory

3D

Logic Dies

Memory

Memory Memory Memory

PNM cores

2.5D

•  High Radix, Small Buffer Networks •  GHC, Slimfly, FB

H. Kim, S. Mukhopadhyay, and S. Yalamanchili

34

Page 35: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Short Stack: Physical Structure

Active Layer BEOL Silicon TSV

25um

25um 10um

25um 10um

100um

LLC & Memory controllers L2 (per core): 2MB, 4096 sets, 128B, 35 cycles;

Die Dimension: 8.4 mm X 8.4 mm

16 homogeneous OOO x86 cores 3GHz, 1.0V, max temp 100◦C DL1: 128KB, 4096 sets, 64B L1: 32KB, 128 sets, 64B, 1 cycles; Short Stack

Memory Controller and

LLC Bank

35

Page 36: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Why is Temperature A NoC Problem?

36

Page 37: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Thermal Challenges and Microfluidic Cooling

37

•  Fluid flow through the microchannels carry heat out to an external heat exchanger (e.g., heat sink)

Courtesy Professor Muhannad Bakir (GT/ECE)

This research supported by the DARPA ICECOOL Program

Page 38: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

3D FPGA: 3D Bandwidth – Performance Tradeoff

Physical Architecture of Micro-Fluidically Cooled 3D FPGA

Parameters Values

Chip Size (1.65x1.65)cm2

Number of CLBs 980K Number of TSVs per 3D Switch Box

4

Horizontal Routing Channel BW

30

Micro-Channel Dimension () (50x100) µm2

Fluid Velocity 1 m s-1

(50um pitch)

n Tradeoff between cooling capacity and 3D bandwidth

n Co-Design Exploration n Routing quality plays a key

role

Courtesy: A. Srivastava (UMD)

2D vs. 3D congestion

38

Page 39: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY

Concluding Remarks

n We are going to see a reshaping of the boundaries between compute and memory

n System-level Co-design for memory-centric compute

n Exploration of network technologies: wireless, capacitive coupling, optics, etc.

n Expand the scope of traditional NoCs

39

Page 40: Implications of Memory-Centric Computing Architectures for …casl.gatech.edu/wp-content/uploads/2016/01/nocs_keynote.pdf · SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA

SCHOOL OF ELECTRICAL AND COMPUTER ENGINEERING | GEORGIA INSTITUTE OF TECHNOLOGY 40

Applications &

Architecture

Power

NoC

Technology & Cooling

Thank YouQuestions?

Scaling Performance