44
RRAM-Based Reconfigurable Computing Mohammed Zidan and Wei D. Lu Sept 12, Tysons Corner, VA

RRAM-Based Reconfigurable Computing - CMU · RRAM-Based Reconfigurable Computing Mohammed Zidan and Wei D. Lu Sept 12, Tysons Corner, VA

  • Upload
    vunga

  • View
    218

  • Download
    2

Embed Size (px)

Citation preview

RRAM-Based Reconfigurable Computing

Mohammed Zidan and Wei D. Lu

Sept 12, Tysons Corner, VA

• K. Rupp, “40 Years of Microprocessor Trend Data,” blog • K. Bresniker et al., "Adapting to Thrive in a New Economy of Memory Abundance," Computer 2015.

?

Device inspires the architecture

Architecture challenges the device

New Computing Devices

No Memory Bottleneck

Classical Processing

Cognitive Processing

Energy Efficient

RRAM Crossbar Array

Inte

rfac

e

Interface

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

RRAM Crossbar Array

Inte

rfac

e

Interface

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

Crossbar Array

Array Tile

Crossbar Array

Inte

rfac

e

Interface

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

Array Tile

Storage

A tile can be assigned to,

Crossbar Array

Inte

rfac

e

Interface

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

Storage

A tile can be assigned to,

Digital Computing

Crossbar Array

Inte

rfac

e

Interface

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

Digital Computing

A tile can be assigned to,

Neural Networks

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core M-Core M-Core

M-Core M-Core M-Core

M-Core M-Core M-Core

Unused

Storage

Digital Computing

Analog Computing

M-Core M-Core M-Core

M-Core M-Core M-Core

M-Core M-Core M-Core

M-Core M-Core M-Core

M-Core M-Core M-Core

M-Core M-Core M-Core

Unused

Storage

Digital Computing

Analog Computing

M-Core M-Core M-Core

M-Core M-Core

M-Core M-Core M-Core

M-Core

M-Core M-Core M-Core

M-Core M-Core

M-Core M-Core M-Core

M-Core

Unused

Storage

Digital Computing

Analog Computing

M-Core M-Core

M-Core M-Core

M-Core M-Core

M-Core

M-Core

M-Core

Device

Storage

Digital

Neural

Common Interface

RRAM Type Analog Binary

Device Levels <100 2

ON/OFF Ratio Average High

Endurance Average High

Programing Slow Fast

RRAM Type Analog Binary

Device Levels <100 2

ON/OFF Ratio Average High

Endurance Average High

Programing Slow Fast

Data Storage Digital Computing Neural Networks Internal Data Movement

RRAM Type Analog Binary

Device Levels <100 2

ON/OFF Ratio Average High

Endurance Average High

Programing Slow Fast

Data Storage Digital Computing TR & PDE Neural Networks BCNN Internal Data Movement In-Situ

RRAM Devices

• Our approach is utilized binary RRAM devices to implement a semi-analog (hybrid) neural network.

• Each classical analog device is replaced with “n” binary devices in the new network.

• The number of synaptic-weight bits can be dynamically configured when needed.

7

Binary Device

8

Postsynaptic Neuron

𝑛

Synaptic Weight

Binary Device

8

Analog Input (DAC)

𝑛

Synaptic Weight

Binary Device

8

Analog Input (DAC)

Postsynaptic Neuron

ADC ADC ADC

Ad

de

r

+

• Low training rate is required to train the network receptive fields (dictionaries) properly.

• This is translated into a larger number of bits to allow small “∆𝑤” values.

“n = 4” “n = 8” “n = 16”

12

• Typically, training is infrequent or is performed offline.

• Hence, after training the number bits per synaptic weights can be significantly reduced by assigning fewer columns per neuron.

Training Stage Regular Operation

13

• In analog image compression (sparse coding) each piece of a picture is represented as weighted combination of the network dictionary.

• We adopted locally competitive algorithm (LCA) to perform the analog image compression.

+

14

“Original” “Reconstructed”

16

5

RRAM Devices

Neural Networks

RRAM Devices

Zero One

0

4

8

12

16

i0 i1 i2 i3 i4 i5 i6 i7

Ou

tpu

t C

urr

ent

(µA

)

Output Node

𝑽𝒓

𝑽𝒓

𝑽𝒓

𝑽𝒓

𝑽𝒓

𝑰𝟎 𝑰𝟏 𝑰𝟐 𝑰𝟑 𝑰𝟒 𝑰𝟓 𝑰𝟔 𝑰𝟕

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

M-Core

R R R R R R R R

R R R R R R R R

R R R R R R R R

R R R S R R R R

R R R R R R R R

R R R R R R R R

R R R R R R R R

R R R R R R R R

M-Core

“32 × 32” Tile

0

150

300

450

600

750

4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28

Nu

mb

er o

f O

ccu

rren

ces

Current (µA)

44,800 Simulation points

𝑋

𝐴 𝐵 𝐶 𝐷 ∙

𝐸 𝐼 𝑀𝐹𝐺

𝐽𝐾

𝑁𝑂

𝐻 𝐿 𝑃

= 𝑋 𝑌 𝑍

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

𝐴0

𝐵0

𝐶0

𝐷0

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ●

𝐴1

𝐵1

𝐶1

𝐷1

● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ● ● ● ●

● ● ● ● ● ● ● ● ● ●

𝐴2

𝐵2

𝐶2

𝐷2

● ● ● ● ● ● ●

● ● ●

● ● ● ● ● ● ●

● ● ●

● ● ● ● ● ● ●

● ● ●

𝐴𝐸 + 𝐵𝐹 + 𝐶𝐺 + 𝐷𝐻

𝐴𝐼 + 𝐵𝐽 + 𝐶𝐾 + 𝐷𝐿

𝐴𝑀 + 𝐵𝑁 + 𝐶𝑂 + 𝐷𝑃

1

2

3

4

𝑬 𝑰 𝑴

𝑭 𝑱 𝑵

𝑮 𝑲 𝑶

𝑯 𝑳 𝑷

𝑨

𝑩

𝑪

𝑫

𝑌

𝑍

I/P Data

Data

111

111

111

Weather Forecasting

Electromagnetics

Heat Transfer Aerodynamics

Fluids Flow

Many Others https://www.horiba-mira.com/

http://www.metrosystems-des.com NASA

https://www.mentor.com/

http://www.theseus-fe.com

Hurricane Irma

Floating Point Solver Measured Results

Initial Condition

Neural Networks

RRAM Devices

Digital Computing

Neural Networks

RRAM Devices

Step “1” Step “2”

𝑉 ≤ −𝑉𝑤

𝑉 ≥ 𝑉𝑤

0.5 𝑉𝑟 ≤ 𝑉 < 0.75 𝑉𝑤

𝑉 < 0.5 𝑉𝑟

0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 1

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 1

Digital Computing

Neural Networks

RRAM Devices

Storage / Memory

Digital Computing

Neural Networks

RRAM Devices

Storage / Memory

Digital Computing

Neural Networks

RRAM Devices

New Computing Devices

No Memory Bottleneck

Classical Process

Cognitive Process

Low Power Consumption

Scalable

Storage / Memory

Digital Computing

Neural Networks

RRAM Devices

1

10

100

1000

10000

10 100 1000 10000

CongestivePerformance(G

O/s)

PeakDPPerformance(GO/s)

GPUs

CPUs

FPCA

TrueNorth

Photo by Mohammed Zidan - 2013