Novel NanoSystems to Enable AI

Subhasish Mitra
Department of EE & Department of CS
Stanford University

World Relies on Computing

Edge to cloud, abundant data: military, science, health care, government, genomics, smart cities, security, finance

Improve Computing Performance

Metric: Energy × Execution time [US National Academy of Sciences (2011)]
Figure: design techniques (vertical axis) vs. device performance (horizontal axis)
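The metric above multiplies energy by execution time (the energy-delay product). A minimal sketch of that metric with made-up numbers: the design values below are hypothetical, chosen only to show that a 10× energy gain combined with a 100× speedup gives a 1,000× lower product.

```python
def energy_delay_product(energy_j: float, exec_time_s: float) -> float:
    """Energy x execution time, in joule-seconds (lower is better)."""
    return energy_j * exec_time_s


def edp_benefit(baseline: tuple[float, float], improved: tuple[float, float]) -> float:
    """Factor by which `improved` lowers energy x execution time relative to `baseline`.

    Each argument is an (energy in J, execution time in s) pair.
    """
    return energy_delay_product(*baseline) / energy_delay_product(*improved)


# Hypothetical design points, not measurements from the slides.
BASELINE = (2.0, 4.0)    # 2 J, 4 s
IMPROVED = (0.2, 0.04)   # 10x less energy, 100x less time

print(round(edp_benefit(BASELINE, IMPROVED)))  # 1000
```

Because the metric is a product, gains in energy and gains in execution time multiply.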

Option 1: Better Devices

Figure: design techniques vs. device performance
Few experimental demos
Device ≠ system

Option 2: Design Tricks

Figure: design techniques vs. device performance
Few “tricks”; design complexity
Multi-core: power / thermal

Improve Computing Performance

Figure: design techniques vs. device performance
Multi-core: power, thermal
Target: 1,000× performance
New innovations required

NanoSystems: new nanotech, new systems, new applications
Devices, fabrication, sensors: imperfections? large-scale fabrication? variability?
New architectures

Abundant-Data Applications

Processors, accelerators: compute 5%, memory 95%
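The 5% / 95% split suggests why faster processors alone do not help abundant-data applications. Assuming (the slide does not say) that the split refers to execution time, Amdahl's law gives the overall speedup when only one part gets faster; the sketch below applies it to each part in turn.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of execution time becomes `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumed reading of the slide: shares of execution time, not measured data.
COMPUTE_SHARE = 0.05
MEMORY_SHARE = 0.95

# 1,000x faster compute: bounded by the untouched 95% memory time.
print(f"{amdahl_speedup(COMPUTE_SHARE, 1000):.3f}")  # 1.053
# 1,000x faster memory access: a far larger overall gain.
print(f"{amdahl_speedup(MEMORY_SHARE, 1000):.3f}")  # 19.627
```

Even infinitely fast compute cannot beat 1 / 0.95 ≈ 1.053× under this assumption, which is the memory wall in numbers.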

Computing Today

Memory wall
Brain-inspired ⊃ neural nets: chip realization?
• Compute + memory
• Dense connectivity
• Energy efficiency
• Footprint

N3XT NanoSystems

Computation immersed in memory: ultra-dense 3D memory and computing logic, increased functionality
Impossible with business as usual

N3XT: Nano-Engineered Computing Systems Technology [Aly IEEE Computer 15, Proc. IEEE 19]
Stanford + CMU + MIT + NTU Singapore + UC Berkeley + U. Michigan

DARPA 3DSoC Program

Max Shulaker, Anantha Chandrakasan, Subhasish Mitra, H.-S. Philip Wong, Simon S. Wong, Brad Ferguson, Mark Nelson, Jefford Humes

Carbon Nanotube FET (CNFET)

CNT: d = 1.2 nm (sub-litho)
Figure: gated CNFET images (scale bars: 2 µm)
Energy Delay Product: ~10× benefit at the full-design level

Example: OpenSPARC T2 Processor Core

Figure: total energy per cycle (nJ, 0.05 to 0.5) vs. clock frequency (GHz, 0.1 to 10) for FinFET, nanowire FET, and CNFET; "preferred" region marked
[Hills IEEE TNANO 18] Stanford + IMEC + TSMC

Big Benefits, Major (Past) Obstacles [Zhang IEEE TCAD 12]

Mis-positioned CNTs, metallic CNTs (circa 2005)
Imperfection-immune: process + design
Process alone inadequate
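Why process alone falls short: a CNFET typically uses several CNTs in parallel, and a metallic CNT can keep the device from turning off. Assuming each CNT is independently metallic with probability p (about one-third is the figure commonly quoted for as-grown CNTs; the removal rate and circuit size below are hypothetical), the chance of at least one metallic CNT per CNFET is 1 − (1 − p)^N, and that risk compounds across a circuit.

```python
def p_any_metallic(p_metallic: float, cnts_per_fet: int) -> float:
    """Probability that at least one of a CNFET's CNTs is metallic (CNTs assumed independent)."""
    return 1.0 - (1.0 - p_metallic) ** cnts_per_fet


def p_circuit_clean(p_metallic: float, cnts_per_fet: int, n_fets: int) -> float:
    """Probability that none of `n_fets` CNFETs contains a metallic CNT."""
    return (1.0 - p_any_metallic(p_metallic, cnts_per_fet)) ** n_fets


# As-grown CNTs, assuming p = 1/3 and 10 CNTs per CNFET (illustrative values).
print(f"{p_any_metallic(1 / 3, 10):.3f}")  # 0.983

# Hypothetical process leaving only 1 in 10,000 CNTs metallic:
# a 1,000-CNFET circuit is still metallic-free only about 37% of the time.
print(f"{p_circuit_clean(1e-4, 10, 1000):.3f}")  # 0.368
```

Even aggressive removal leaves large circuits likely to contain a metallic CNT somewhere, which is the case for pairing process steps with imperfection-immune design.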

First CNT computer (Stanford) [Nature 2013]: 178 CNFETs, PMOS logic; single instruction (Turing complete); 1-bit data
CNT RISC-V (MIT, Analog Devices) [Nature 2019]: 14,702 CNFETs, CMOS logic; all RV32E instructions; 16-bit data
Stanford Ph.D. student → MIT professor
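The first CNT computer is described as using a single instruction that is still Turing complete. A well-known instruction with this property is SUBNEG (subtract and branch if negative). The interpreter below is a generic sketch under our own conventions (halt when the program counter leaves the program; unbounded Python integers rather than 1-bit data); it is not a model of the Stanford chip, and `run_subneg` and `ADD_PROGRAM` are hypothetical names.

```python
def run_subneg(memory: list[int], program: list[tuple[int, int, int]],
               max_steps: int = 10_000) -> list[int]:
    """Run SUBNEG instructions (a, b, c): mem[b] -= mem[a]; if mem[b] < 0, jump to c.

    Halts when the program counter leaves the program; returns the final memory.
    """
    mem = list(memory)  # work on a copy
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            return mem
        a, b, c = program[pc]
        mem[b] -= mem[a]
        pc = c if mem[b] < 0 else pc + 1
    raise RuntimeError("step limit reached")


# Addition from subtraction alone: B += A, using Z as a scratch cell (starts at 0).
A, B, Z = 0, 1, 2
ADD_PROGRAM = [
    (A, Z, 1),   # Z = -A (branch target is simply the next instruction)
    (Z, B, 2),   # B = B - (-A) = B + A
    (Z, Z, 3),   # Z = 0; pc = 3 leaves the program, so execution halts
]

print(run_subneg([7, 5, 0], ADD_PROGRAM)[B])  # 12
```

Loops follow the same pattern: a branch target that points backward repeats a block until a counter goes negative.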

3D Integration

TSV (chip stacking): through-silicon vias (TSVs)
Dense, e.g., monolithic: conventional BEOL nano-scale inter-layer vias (ILVs)
Massive ILV density >> TSV density
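Via density scales with the inverse square of via pitch, which is why nano-scale ILVs can outnumber TSVs by orders of magnitude. The pitches below are hypothetical round numbers for illustration, not values from the slides.

```python
NM_PER_MM = 1_000_000


def vias_per_mm2(pitch_nm: int) -> int:
    """Vias per square millimeter on a square grid with the given center-to-center pitch."""
    per_side = NM_PER_MM // pitch_nm
    return per_side * per_side


TSV_PITCH_NM = 10_000  # hypothetical 10 um TSV pitch
ILV_PITCH_NM = 100     # hypothetical 100 nm ILV pitch

tsv = vias_per_mm2(TSV_PITCH_NM)
ilv = vias_per_mm2(ILV_PITCH_NM)
print(f"TSVs: {tsv:,}/mm^2, ILVs: {ilv:,}/mm^2, ratio: {ilv // tsv:,}x")
# TSVs: 10,000/mm^2, ILVs: 100,000,000/mm^2, ratio: 10,000x
```

A 100× smaller pitch gives 10,000× more vertical connections per unit area.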

Realizing Monolithic 3D

Emerging logic + emerging memory + monolithic 3D
Naturally enabled: < 400 °C processing
Combine device + architecture benefits

3D NanoSystem

2 Million CNFETs, 1 Mbit Resistive RAM [Shulaker Nature 17]

3D NanoSystem

Millions of sensors (CNTs)
Memory: 1 Megabit RRAM
CNT computing logic: classification accelerator
Ultra-dense vertical connections
×100,000
Abundant data: Terabytes / second
In-situ classification: extensive, accurate

HD: Brain-Inspired ⊃ Neural Nets (HD = Hyperdimensional) [Wu ISSCC 18, IEEE JSSC 18] Stanford + UC Berkeley

CNT logic (1,952 CNFETs)
RRAM TCAM (224 RRAM cells)
Monolithic 3D: dense ILVs
Exploit: inherent variations, RRAM gradual Reset
Live ISSCC demo: one-shot learning, language classification
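Hyperdimensional (HD) computing represents data as very wide random vectors, so training can be as simple as adding vectors, which is what makes one-shot learning possible. The chip's exact encoding is not given here; the sketch below follows a common HD language-classification scheme (random ±1 letter vectors, letter trigrams encoded by rotation and elementwise multiplication, bundling by majority, classification by nearest prototype). The dimension, seed, and sample sentences are hypothetical; in hardware, the nearest-prototype search is the kind of associative lookup an in-memory TCAM could perform.

```python
import random
import string

DIM = 2000  # hypervector width (hypothetical; HD literature often uses ~10,000)
ALPHABET = string.ascii_lowercase + " "

_rng = random.Random(7)
# Item memory: one random bipolar (+1/-1) hypervector per symbol.
ITEM_MEMORY = {ch: [_rng.choice((-1, 1)) for _ in range(DIM)] for ch in ALPHABET}


def rotate(vec: list[int], k: int) -> list[int]:
    """Cyclic shift by k positions; encodes a letter's position within a trigram."""
    return vec[-k:] + vec[:-k]


def encode(text: str) -> list[int]:
    """Bundle every letter trigram of `text` into one bipolar hypervector."""
    chars = [c for c in text.lower() if c in ALPHABET]
    totals = [0] * DIM
    for a, b, c in zip(chars, chars[1:], chars[2:]):
        va, vb, vc = rotate(ITEM_MEMORY[a], 2), rotate(ITEM_MEMORY[b], 1), ITEM_MEMORY[c]
        for i in range(DIM):
            totals[i] += va[i] * vb[i] * vc[i]  # binding = elementwise product
    return [1 if t > 0 else -1 for t in totals]  # bundling = per-dimension majority


def train(samples: dict[str, str]) -> dict[str, list[int]]:
    """One-shot learning: a single example text per class becomes that class's prototype."""
    return {label: encode(text) for label, text in samples.items()}


def classify(prototypes: dict[str, list[int]], text: str) -> str:
    """Return the label whose prototype has the largest dot product with the query."""
    query = encode(text)
    return max(prototypes, key=lambda label: sum(q * p for q, p in zip(query, prototypes[label])))


SAMPLES = {  # hypothetical one-sentence training texts
    "english": "the cat sat on the mat and the dog ran in the park with the other dogs",
    "spanish": "el gato se sento en la alfombra y el perro corrio en el parque con los otros perros",
}
PROTOTYPES = train(SAMPLES)
print(classify(PROTOTYPES, "the dog ran in the park"))       # english
print(classify(PROTOTYPES, "el perro corrio en el parque"))  # spanish
```

Adding a class means encoding one more sample; no existing prototype is retrained.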

N3XT Simulation Framework [Aly Proc. IEEE 19]

Inputs: heterogeneous nanotechnologies, abundant-data apps
System analysis: explore architectures; energy, exec. time; thermal, lifetime; physical design, yield, reliability
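The framework's internals are not spelled out here. As a toy illustration of the kind of trade-off such an analysis weighs (energy and execution time against a thermal limit), the sketch below ranks design points by energy × execution time after discarding those that run too hot. `DesignPoint`, `best_design`, and every number are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignPoint:
    name: str
    energy_j: float      # energy per workload run (hypothetical)
    exec_time_s: float   # execution time per run (hypothetical)
    peak_temp_c: float   # estimated peak temperature (hypothetical)

    @property
    def edp(self) -> float:
        """Energy x execution time."""
        return self.energy_j * self.exec_time_s


def best_design(points: list[DesignPoint], max_temp_c: float) -> DesignPoint:
    """Lowest energy x execution time among designs that stay under the thermal limit."""
    feasible = [p for p in points if p.peak_temp_c <= max_temp_c]
    if not feasible:
        raise ValueError("no design meets the thermal limit")
    return min(feasible, key=lambda p: p.edp)


CANDIDATES = [
    DesignPoint("2D baseline", energy_j=1.0, exec_time_s=1.0, peak_temp_c=70.0),
    DesignPoint("3D, 4 tiers", energy_j=0.2, exec_time_s=0.1, peak_temp_c=80.0),
    DesignPoint("3D, 8 tiers", energy_j=0.1, exec_time_s=0.05, peak_temp_c=110.0),
]

print(best_design(CANDIDATES, max_temp_c=95.0).name)  # 3D, 4 tiers
```

Tightening `max_temp_c` shifts the choice toward fewer tiers, which is the dense-compute-versus-thermal tension in miniature.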

Massive Benefits: Deep Learning, Graph Analytics, …

Figure: energy and execution-time benefits per application (log scale, 10× and 100× gridlines)
Applications: PageRank, Connected Components, Breadth-First Search, Linear Regression, Language model (LSTM neural network), AlexNet (neural network)
Benefits: 851×, 400×, 510×, 970×, 1,950×, 210×
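One way to condense the six reported benefits into a single figure is their geometric mean, the usual average for ratios; this summary is ours, not the slides'.

```python
import statistics

BENEFITS = [851, 400, 510, 970, 1950, 210]  # values as listed on the slide

geo_mean = statistics.geometric_mean(BENEFITS)
print(f"range {min(BENEFITS)}x to {max(BENEFITS)}x, geometric mean ~{geo_mean:.0f}x")
# range 210x to 1950x, geometric mean ~640x
```

All six values fall within a factor of five of 1,000×.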

~1,000× benefits, existing software

Cross-layer: device + circuit + architecture
Dense compute + thermal
New software optimizations

Many NanoSystems Opportunities

RRAM, cross-layer solutions, monolithic 3D

RRAM: non-volatile; multiple bits per cell; endurance: ENDURER
Figure: RRAM cell switching between Low Resistance State and High Resistance State (Set, Reset)

RRAM + Industry Silicon CMOS

RRAM on silicon CMOS compute (CEA LETI) [Wu ISSCC 19] Stanford + CEA LETI + NTU Singapore

First Multiple bits-per-cell RRAM System

Our work (new algorithms): 3 bits per cell; full arrays measured
Prior work (ad hoc): 2-6.5 bits per cell; single cell or a few hand-picked cells measured

Neural nets: optimized weight encoding + on-chip RRAM with multiple bits per cell (cross-layer)
2.3× accurate inference (measured)
Same hardware, bigger neural net
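Storing b bits in one RRAM cell means programming it to one of 2^b distinguishable resistance levels. The optimized weight encoding is not described here, so the sketch below shows only the plain baseline it would improve on: uniform quantization of neural-net weights to 2^b levels, then mapping each level to an evenly spaced target conductance. The conductance range, weights, and function names are hypothetical.

```python
def quantize_to_levels(weights: list[float], bits_per_cell: int) -> list[int]:
    """Uniformly map each weight to an integer level in [0, 2**bits_per_cell - 1]."""
    top = 2 ** bits_per_cell - 1
    w_min, w_max = min(weights), max(weights)
    span = (w_max - w_min) or 1.0  # all-equal weights: avoid dividing by zero
    # Python's round() breaks exact .5 ties toward the even integer.
    return [round((w - w_min) / span * top) for w in weights]


def levels_to_conductance_us(levels: list[int], bits_per_cell: int,
                             g_min_us: float = 10.0, g_max_us: float = 100.0) -> list[float]:
    """Evenly spaced target conductances in microsiemens (hypothetical range)."""
    top = 2 ** bits_per_cell - 1
    return [g_min_us + (g_max_us - g_min_us) * k / top for k in levels]


weights = [-1.0, -0.4, 0.1, 0.6, 1.0]  # hypothetical layer weights
levels = quantize_to_levels(weights, bits_per_cell=3)  # 3 bits per cell -> 8 levels
print(levels)  # [0, 2, 4, 6, 7]
print([round(g, 1) for g in levels_to_conductance_us(levels, 3)])  # [10.0, 35.7, 61.4, 87.1, 100.0]
```

More bits per cell fit more weight precision into the same array, at the cost of narrower gaps between adjacent resistance levels.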

New Exciting Result

1T4R, 2 bits-per-cell RRAM array [Hsieh IEDM 19] Stanford + SkyWater

Foundry CNFET + RRAM + Monolithic 3D

Conclusion

NanoSystems today
Lab to fab: CNFET, RRAM, monolithic 3D
Game ON, to era
N3XT massive benefits: existing software, wide range of apps