CrossSim: GPU-Accelerated Simulation of Analog Neural Networks

T. Patrick Xiao, Christopher H. Bennett, Ben Feinberg, Matthew Marinella, Sapan Agarwal
Sandia National Laboratories, Albuquerque, NM
[email protected]

ModSim 2021

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia LLC, a wholly owned subsidiary of Honeywell International Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.


Page 2: Deep learning inside memory arrays

[Figure: a 3×3 crossbar array shown in electrical and mathematical form, for matrix-vector multiplication and for the outer product update.]

Matrix-vector multiplication:
• Electrical: row voltages $V_1, V_2, V_3$ applied to conductances $G_{i,j}$ give the column outputs $\sum_i G_{i,j} V_i$ for $j = 1, 2, 3$.
• Mathematical: inputs $x_1, x_2, x_3$ applied to matrix elements $A_{i,j}$ give $\sum_i A_{i,j} x_i = (A^T x)_j$.

Outer product update:
• Electrical: row signals $V_1, V_2, V_3$ and column signals $T_1, T_2, T_3$ give the per-cell products $V_i T_j$.
• Mathematical: inputs $x_i$ and errors $\delta_j$ give the per-element products $x_i \delta_j$, that is, $x \delta^T$.

Highly energy-efficient, but is it accurate enough?
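The two crossbar operations on this slide can be sketched in a few lines of NumPy. This is an idealized illustration only, not CrossSim code: the matrix values, the function names, and the `lr` scaling factor are assumptions, and no device or circuit non-idealities are modeled.

```python
import numpy as np

# Hypothetical 3x3 conductance matrix G[i, j] (row i, column j) and row voltages V[i].
G = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])
V = np.array([1.0, 0.5, -1.0])


def crossbar_mvm(G, V):
    """Ideal column outputs: I[j] = sum_i G[i, j] * V[i], i.e. G^T V."""
    return G.T @ V


def outer_product_update(G, x, delta, lr=1.0):
    """Ideal parallel update: element (i, j) changes by lr * x[i] * delta[j].

    The sign convention and the lr factor are assumptions of this sketch.
    """
    return G + lr * np.outer(x, delta)


I = crossbar_mvm(G, V)                      # one value per column
delta = np.array([0.1, 0.0, -0.1])          # hypothetical error vector
G_new = outer_product_update(G, V, delta)   # rank-1 update of every cell
```

Both operations touch every cell of the array in one step, which is where the energy efficiency comes from; the accuracy question arises because each product is computed by an imperfect physical device.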

Page 3: Inference

[Figure: CrossSim inference data flow. Block labels, as recoverable from the slide:]

Neural network
• Dataset
• Neural network model
• Layer inputs, layer outputs
• Bias addition
• Activation function
• Prediction

Weight matrix mapping engine
• Passes weight data to the analog MVM cores
• Exchanges partial sums and activations with the neural network model

Analog MVM cores (Core 0, Core 1, Core 2)
• Row drivers/DAC
• Integrator
• ADC
• Partial result aggregation

MVM simulator: array and device non-idealities
• Device program errors
• Array parasitic resistance with built-in fast circuit simulator
• ADC quantization
• Cycle-to-cycle read noise
• Conductance drift

Inputs to CrossSim

Characterized device data [Figure: histogram of # devices vs. conductance, G]
• Program errors, process variation
• Retention loss
• Noise properties
• On/Off ratio

System parameters
• Weight bit slicing
• Input bit slicing
• Negative number handling
• ADC resolution & ranges
• Array size
• Array electrical topology

Python with CUDA acceleration

To be released soon! Check cross-sim.sandia.gov
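Two of the system parameters named on this slide, weight bit slicing and ADC resolution and range, can be illustrated with a small NumPy sketch. It is a generic illustration under stated assumptions, not the CrossSim API: the function names, the uniform ADC model, and the restriction to non-negative integer weights are all hypothetical choices made for brevity.

```python
import numpy as np


def adc_quantize(values, n_bits, v_min, v_max):
    """Clip to [v_min, v_max], then round to one of 2**n_bits uniform levels."""
    levels = 2 ** n_bits - 1
    clipped = np.clip(values, v_min, v_max)
    codes = np.round((clipped - v_min) / (v_max - v_min) * levels)
    return v_min + codes * (v_max - v_min) / levels


def slice_weights(w_int, total_bits, bits_per_slice):
    """Split non-negative integer weights into slices, least significant first."""
    assert total_bits % bits_per_slice == 0
    mask = (1 << bits_per_slice) - 1
    return [(w_int >> shift) & mask
            for shift in range(0, total_bits, bits_per_slice)]


def sliced_mvm(w_int, x, total_bits, bits_per_slice):
    """One MVM per slice (one array each), recombined by shift-and-add."""
    slices = slice_weights(w_int, total_bits, bits_per_slice)
    return sum((s.T @ x) << (k * bits_per_slice) for k, s in enumerate(slices))


# Hypothetical 8-bit weights stored as two 4-bit slices.
W = np.array([[200, 15], [3, 255]], dtype=np.int64)
x = np.array([1, 2], dtype=np.int64)
y = sliced_mvm(W, x, total_bits=8, bits_per_slice=4)

# A 2-bit ADC over the range [0, 1]: out-of-range values are clipped.
q = adc_quantize(np.array([0.26, 1.7, -0.2]), n_bits=2, v_min=0.0, v_max=1.0)
```

In this sketch the sliced result equals the unsliced product exactly because no ADC is applied between the slices. Inserting `adc_quantize` on each per-slice result is one way to see how ADC resolution and range interact with the number of bits per cell.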

Page 4: Multi-scale modeling of inference accuracy

Device properties affect accuracy
[Figure: device distributions (# devices vs. G) and accuracy plots for two programming error models. Axis labels: Error $\alpha_{ind}$ (%), Error $\alpha_{prop}$ (%). Networks: ResNet50, MobileNet-v1, Inception-v3, VGG-19.]
• State-independent programming error: $\Delta G = \alpha_{ind} G_{max}$
• State-proportional programming error: $\Delta G = \alpha_{prop} G$

Array design affects accuracy
[Figure: accuracy vs. parasitic resistance $R_p / R_{min}$, with axis ticks from $10^{-5}$ to $10^{-2}$.]
• MNIST, CNN-6
• CIFAR-10, ResNet56
• ImageNet, ResNet50
• CrossSim’s fast built-in circuit simulator

System architecture affects accuracy
[Figure: two ResNet50 panels vs. state-proportional error $\alpha_{prop}$ (%).]
• Offset subtraction: $W_{ij} \sim G_{ij} - G_{offset}$; curves for 8 bits/cell, 4 bits/cell, 2 bits/cell
• Differential cells: $W_{ij} \sim G_{ij}^{+} - G_{ij}^{-}$; curves for 7 bits/cell, 4 bits/cell, 2 bits/cell

Xiao et al, arXiv:2109.01262, 2021
Xiao et al, Semi Sci Tech, Accepted (in press), 2021
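The two programming error models and the two negative-number schemes on this slide can be written out as a short NumPy sketch. Several things here are assumptions rather than statements from the slides: $\Delta G$ is treated as the standard deviation of a zero-mean Gaussian error, conductances are not clipped to a physical range, the minimum conductance is taken as zero, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def program_state_independent(G, alpha_ind, G_max, rng):
    """Error std = alpha_ind * G_max, the same for every target state."""
    return G + rng.normal(0.0, 1.0, G.shape) * alpha_ind * G_max


def program_state_proportional(G, alpha_prop, rng):
    """Error std = alpha_prop * G, so low-conductance states are more precise."""
    return G + rng.normal(0.0, 1.0, G.shape) * alpha_prop * G


def to_differential(W, G_max):
    """Differential cells: W ~ G_plus - G_minus, one device pair per weight."""
    scale = G_max / np.max(np.abs(W))
    return np.maximum(W, 0) * scale, np.maximum(-W, 0) * scale, scale


def to_offset(W, G_max):
    """Offset subtraction: W ~ G - G_offset, one device per weight."""
    scale = G_max / (2 * np.max(np.abs(W)))
    G_offset = G_max / 2
    return W * scale + G_offset, G_offset, scale


W = np.array([[0.5, -1.0], [0.0, 0.25]])    # hypothetical weights
G_plus, G_minus, s_d = to_differential(W, G_max=1.0)
G, G_off, s_o = to_offset(W, G_max=1.0)

# Weights read back after state-proportional programming error.
W_noisy = (program_state_proportional(G_plus, 0.05, rng)
           - program_state_proportional(G_minus, 0.05, rng)) / s_d
```

Under these assumptions, a zero weight stored in differential cells maps to two zero-conductance devices and so picks up no state-proportional error, whereas under offset subtraction the same weight sits at mid-range conductance. The sketch makes that difference between the two schemes easy to explore.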

Page 5: Training

[Figure: CrossSim training data flow. Block labels, as recoverable from the slide:]

Neural network
• Activations x
• Loss function error
• Backpropagation
• Errors δ

Analog MVM cores (Core 0, Core 1, Core 2)
• Row drivers/DAC
• Column drivers/DAC
• Integrator
• ADC

Probabilistic device lookup table (CDF)
[Figure: device pulse data converted to lookup tables (LUT 0, LUT 1, LUT 2). Axes: initial conductance (μS) vs. conductance change (μS).]
• Input: initial conductance, desired update
• Output: realistic conductance update

Lookup tables can model:
• Arbitrary device update nonlinearity and asymmetry properties, not describable by analytical equations
• Cycle-to-cycle write noise
• Device-to-device variation

Fuller et al, Science 2019
Bennett et al, IRPS 2019

Python with CUDA acceleration
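A probabilistic lookup table of this kind can be sketched as inverse-transform sampling from an empirical CDF of conductance change, binned by initial conductance. The sketch below is a generic illustration, not the CrossSim implementation: the function names, the binning scheme, and the synthetic pulse data are all assumptions, and it covers a single pulse type rather than the full dependence on the desired update.

```python
import numpy as np


def build_lut(G0, dG, G_edges, n_quantiles=11):
    """Inverse-CDF table of conductance change per initial-conductance bin.

    G0, dG: initial conductances and resulting changes for one pulse type.
    Returns (quantile levels, table[bin, quantile]).
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    n_bins = len(G_edges) - 1
    bins = np.clip(np.digitize(G0, G_edges) - 1, 0, n_bins - 1)
    table = np.zeros((n_bins, n_quantiles))
    for b in range(n_bins):
        samples = dG[bins == b]
        if samples.size:  # empty bins are left at zero change
            table[b] = np.quantile(samples, probs)
    return probs, table


def sample_update(G, G_edges, probs, table, rng):
    """Draw one conductance change for each entry of G."""
    n_bins = len(G_edges) - 1
    bins = np.clip(np.digitize(G, G_edges) - 1, 0, n_bins - 1)
    u = rng.uniform(0.0, 1.0, size=G.shape)
    # Inverse-transform sampling: evaluate each bin's inverse CDF at u.
    draws = [np.interp(ui, probs, table[b])
             for ui, b in zip(u.ravel(), bins.ravel())]
    return np.array(draws).reshape(G.shape)


rng = np.random.default_rng(0)
G_edges = np.linspace(0.0, 100.0, 6)        # five bins, hypothetical range in μS
G0 = rng.uniform(0.0, 100.0, 5000)          # synthetic stand-in for pulse data
dG = 2.0 * (1.0 - G0 / 100.0) + rng.normal(0.0, 0.2, G0.size)
probs, table = build_lut(G0, dG, G_edges)

G = np.array([[5.0, 50.0], [95.0, 20.0]])
dG_sampled = sample_update(G, G_edges, probs, table, rng)
G_next = G + dG_sampled
```

Because the table stores measured quantiles rather than a fitted formula, nonlinearity (the update shrinking as conductance rises in this synthetic data) and cycle-to-cycle write noise (the spread within each bin) are both captured without an analytical model. Device-to-device variation would need one table per device or an extra sampling step, which this sketch leaves out.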

Page 6: From device measurements to accuracy

[Figure: for each device, the measured pulse data and the resulting MNIST accuracy (2-layer MLP).]

• TaOx ReRAM: Marinella, Agarwal et al, JETCAS 2018
• Electrochemical RAM: Van der Burgt et al, Nature Materials 2017
• Domain wall magnetic tunnel junction: Liu et al, Appl Phys Lett, 2021