Memory Integrated Neural Network Accelerators
(Towards Memristor based Neuromorphic Architectures)

Tarek Taha
Electrical and Computer Engineering Department, University of Dayton
February 26, 2013

Source: nice.sandia.gov/documents/Tarek_Taha_nice_2013.pdf (2013. 3. 12.)


Page 1: Memory Integrated Neural Network Accelerators (Towards Memristor based Neuromorphic Architectures). Tarek Taha, Electrical and Computer Engineering Department, University of Dayton. February 26, 2013.

Page 2: Objective

[Figure: multicore array of cores (C) connected by buses and routers (R), with input data streaming into the array.]

Design multicore neural network processors

Examine:

• Routing

• Memory options

• Analog and Digital options

Page 3: Related Work: SpiNNaker

Each chip contains 18 ARM cores and a network router

Geared towards brain scale simulation

Cores share 128MB SDRAM stacked within the package

Multiple chips connected in a 2D toroidal network

Page 4: Related Work: FACETS

Mixed-signal ASIC wafer

Geared towards brain scale simulation

200k neurons and 50 million synapses

Synaptic weight is represented in a 4-bit SRAM with a 4-bit DAC

Page 5: Related Work: IBM design

Crossbar memory based digital core

Each core has the capacity to simulate 256 integrate-and-fire neurons

1024 synapses per neuron

One bit per synapse

Page 6: Implementing Synapses

Memory elements to implement synapses:

Memristors can also mimic neural computing circuits

[Figure: two-level network structures implemented with each memory type.]

                   DRAM      SRAM     Memristor crossbar
  Location         Off-chip  On-chip  On-chip
  Density (Gb/cm2) 6.67      0.338    12
  Read time        ~100      0.3      0.3

Page 7: Digital Core Design

Multiply-add-accumulate to calculate neural outputs.

[Figure: digital core. Pre-synaptic neuron inputs (i, xi) arrive over the routing network into an input buffer; a decoder selects row i of an SRAM synaptic memory array (N axons × M neurons) holding weights Wij; a bank of multipliers and an adder tree accumulate the products under a control unit; post-synaptic neuron outputs (j, vj) leave through an output buffer to the routing network.]
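The multiply-add-accumulate step above can be sketched in a few lines (a minimal illustration; names like `weights` and `inputs` are assumptions, not taken from the slide):

```python
# One neural output in the digital core is a multiply-accumulate
# over the selected synaptic weight row.
def neuron_output(weights, inputs):
    """Accumulate w_ij * x_i over all pre-synaptic inputs i."""
    acc = 0
    for w, x in zip(weights, inputs):
        acc += w * x          # one multiply-add per synapse
    return acc

# Example: 4 synapses with small integer weights and binary inputs.
print(neuron_output([3, 1, 0, 2], [1, 0, 1, 1]))  # 3*1 + 1*0 + 0*1 + 2*1 = 5
```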

Page 8: Digital Core Configurations

Configurations:

• 2 bits per synapse, 1 bit per neuron: multipliers not needed.
• 4 bits per synapse: multiplier and adder needed.

Analysis:

SRAM array power and area calculated from CACTI. Only the SRAM array, predecoder, and sense amplifiers were considered. Static and dynamic power were both considered.

Page 9: Routing Analysis

[Figure: feed-forward layers l-1, l, and l+1 mapped across cores, with per-link bandwidth annotations.]

A model was developed to determine routing network bandwidth requirements.

Router and link powers calculated from Orion
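As a rough illustration of a bandwidth-requirement calculation (the slide's actual model is not shown; the function and its parameters are assumptions):

```python
# Toy estimate: every neuron output is sent over the network once per
# iteration, so a core must inject neurons * bits * iterations/s.
def link_bandwidth_bps(neurons_per_core, bits_per_neuron, iterations_per_s):
    """Bits per second a core injects into the routing network."""
    return neurons_per_core * bits_per_neuron * iterations_per_s

# 256 neurons per core, 4-bit outputs, 100,000 network iterations/s.
print(link_bandwidth_bps(256, 4, 100_000))  # 102400000 bits/s
```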

Page 10: New Memristor SPICE Model

A generalized memristor SPICE model has been developed. It is capable of modeling several different published memristor devices for a variety of voltage inputs. The model's results have been compared to the published characterization data using a quantifiable error percentage.

Variation tolerance studies have been performed at the device and circuit level:

• Layer thickness variation
• Variation in state variable dynamics, caused by an inconsistent stoichiometric ratio in the metal-oxide layer, which leads to variability in the number of oxygen vacancies in each device on a wafer

Page 11: Model Equations

State Variable Motion

Non-Linear Drift

Voltage Threshold

Current-Voltage (I-V) Relationship
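The equation images did not survive extraction. As a hedged reconstruction, the general form of the UD (Yakopcic) model from the cited 2011 IEEE Electron Device Letters paper is sketched below; the parameters a1, a2, b, Ap, An, Vp, Vn are device fits, and the window function f(x) governing non-linear drift is left symbolic (consult the paper for exact expressions):

```latex
% Current-voltage (I-V) relationship
I(t) =
\begin{cases}
a_1\, x(t)\, \sinh\!\big(b\, V(t)\big), & V(t) \ge 0 \\
a_2\, x(t)\, \sinh\!\big(b\, V(t)\big), & V(t) < 0
\end{cases}

% Voltage threshold on state change
g(V) =
\begin{cases}
A_p \left( e^{V} - e^{V_p} \right), & V > V_p \\
-A_n \left( e^{-V} - e^{V_n} \right), & V < -V_n \\
0, & \text{otherwise}
\end{cases}

% State variable motion with a non-linear drift window f(x)
\frac{dx}{dt} = g\big(V(t)\big)\, f\big(x(t)\big)
```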

Page 12: Modeling – Sinusoidal Inputs

Boise State Device, compared across the Pino Compact Model, Univ. of Dayton Model, HP Labs Model, and Biolek Model.

[Figure: input voltage and output current waveforms versus time, and the resulting I-V hysteresis loops, for sinusoidal inputs at 100 Hz (with 100 Hz targets) and 100 kHz.]

Page 13: Modeling – Sweeping Inputs

Boise State Device, compared across the Univ. of Dayton Model*, HP Labs Model, Joglekar Model, and Biolek Model.

*C. Yakopcic, T. M. Taha, G. Subramanyam, R. E. Pino, and S. Rogers, "A Memristor Device Model," IEEE Electron Device Letters, 2011.

C. Yakopcic, T. M. Taha, G. Subramanyam, E. Shin, P. T. Murray, and S. Rogers, "Memristor-Based Pattern Recognition for Image Processing: an Adaptive Coded Aperture Imaging and Sensing Opportunity," SPIE Adaptive Coded Aperture Imaging and Non-Imaging Sensors, 2010.

[Figure: I-V curves under sweeping inputs for each model.]

Page 14: UD Model: Boise State Device

[1] A. S. Oblea, A. Timilsina, D. Moore, and K. A. Campbell, "Silver Chalcogenide Based Memristor Devices," International Joint Conference on Neural Networks (IJCNN), 2010.

[Figure: model results versus characterization data from [1], for sinusoidal (100 Hz, with targets, and 100 kHz) and sweeping inputs.]

The model matches the target data with 6.64% and 6.66% error, respectively, for the two experiments shown [1].

Page 15: UD Model: Univ of MI Device

Model matches the target data with 6.21% error

[1] S. H. Jo, T. Chang, I. Ebong, B. B. Bhadviya, P. Mazumder, and W. Lu, "Nanoscale Memristor Device as Synapse in Neuromorphic Systems," Nano Letters, 10 (2010).

[Figure: our model result versus characterization data from [1]: current and voltage versus time, and the I-V curve.]

Page 16: UD Model: HP Labs Device

The model matches the target data with 11.58% error [1], and with 13.6% error (8.72% when not considering the largest outlier) [2].

[1] G. S. Snider, "Cortical Computing with Memristive Nanodevices," SciDAC Review, (2008).

[2] J. J. Yang, M. D. Pickett, X. Li, D. A. A. Ohlberg, D. R. Stewart, and R. S. Williams, "Memristive switching mechanism for metal/oxide/metal nanodevices," Nature Nanotechnology, 3, 429–433 (2008).

[Figure: model results versus the data from [1] and [2]: I-V curves and current/voltage versus time.]

Page 17: UD Model: HP Labs TaOx Device

[Figure: current and voltage versus time, and the I-V curve, for the TaOx device: model versus data.]

Model matches the target data with 4.60% error

[1] F. Miao, J. P. Strachan, J. J. Yang, M.-X. Zhang, I. Goldfarb, A. C. Torrezan, P. Eschbach, R. D. Kelley, G. Medeiros-Ribeiro, and R. S. Williams, "Anatomy of a Nanoscale Conduction Channel Reveals the Mechanism of a High-Performance Memristor," Advanced Materials, vol. 23, no. 47, pp. 5633-5640, Nov. 2011.

Page 18: UD Model: Univ. of Iowa Device

[Figure: current and voltage versus time, and the I-V curve, for the anodic titania device: model versus data.]

Matches physical characterization with 5.97% error

K. Miller, K. S. Nalwa, A. Bergerud, N. M. Neihart, and S. Chaudhary, "Memristive Behavior in Thin Anodic Titania," IEEE Electron Device Letters 31(7), (2010).

K. Miller, "Fabrication and modeling of thin-film anodic titania memristors," Master's Thesis, Iowa State University, Electrical and Computer Engineering (VLSI), Ames, Iowa, (2010).

Page 19: Memristor Modeling

Our model is more robust and accurate than the other models. Results are based on 16-memristor crossbar circuits with a 10 ns switching time.

Measurement             | Biolek | Joglekar | Laiho                   | Chang  | Our Model
Time Before Error (µs)  | 0.517  | 0.517    | 15 (finished: no error) | 0.135  | 15 (finished: no error)
Accuracy                | Low    | Low      | Medium                  | Medium | High
Generality              | High   | High     | Medium                  | Medium | High

[Figure: memristor crossbar with row lines a1–a4 and column lines b1–b4.]

Page 20: Crossbar Write Technique

The write method used stops state change in unselected devices, although this does not solve the problem for reads.

[Figure: two-step write cycle. Starting from the data left by the previous cycle (X X X X), a "write 1s" step drives selected lines to ±Vw/2 (data after step 1: X 1 X 1), then a "write 0s" step completes the cycle (data after the complete write cycle: 0 1 0 1).]
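The half-voltage biasing can be checked with a toy calculation (a minimal sketch; the numeric write voltage and threshold are illustrative assumptions, not values from the slide):

```python
# In the Vw/2 scheme, only the selected device sees the full write voltage
# Vw; half-selected devices see at most Vw/2, which stays below the
# switching threshold Vt, so their state is not disturbed.
VW, VT = 2.0, 1.3          # illustrative write voltage and device threshold

def device_voltage(row_v, col_v):
    """Voltage across a crossbar device driven by its row and column."""
    return row_v - col_v

selected = device_voltage(+VW / 2, -VW / 2)    # full Vw across the device
half_selected = device_voltage(+VW / 2, 0.0)   # only Vw/2: below threshold

print(selected, half_selected)  # 2.0 1.0
```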

Page 21: Unwanted Current Paths

Alternate current paths in the resistor array consume power and degrade the signal.

[Figure: a correct read between a1 and b1 versus an incorrect read between a1 and b1 caused by an alternate current path through neighboring devices.]
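A toy calculation shows how badly an alternate path can corrupt a read (the resistance values are illustrative assumptions):

```python
# Reading the device at (a1, b1) while other lines float also senses the
# series sneak path a1 -> b2 -> a2 -> b1, so the measured resistance is
# the parallel combination of the target and the sneak path.
def parallel(r1, r2):
    return r1 * r2 / (r1 + r2)

R_target = 100e3                  # device under read (high-resistance state)
R_sneak = 10e3 + 10e3 + 10e3      # three low-state devices in series
R_measured = parallel(R_target, R_sneak)
print(round(R_measured))          # 23077 ohms: far below the true 100 kOhm
```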

Page 22: Tiled Memristor Digital Memory

[Figure: untiled memory array versus tiled memory array, with inputs, outputs, and tile row 0 / tile row 1 access lines.]

• Full crossbar simulated in SPICE
• Wire resistances simulated
• Untiled memory array write energy is very high
• Read noise margin is low
• Designed a tiled memristor memory array

Page 23: Memristor Memory Properties

[Figure: tiled memristor memory array with inputs, outputs, and tile row access lines.]

Crossbar size | Training energy per synapse | Read energy per synapse
4×4           | 1.90 pJ                     | 1.84 fJ
8×8           | 5.07 pJ                     | 2.00 fJ

• Training energy increases with crossbar size.
• Novel segmented memristor crossbar memory.
• 4×4 crossbar: 2 bits per transistor
• 8×8 crossbar: 4 bits per transistor

Page 24: Classifier Simulation

2-layer neural classifier, based on a 16-memristor crossbar, iteratively trained through MATLAB and SPICE.

[Figure: image classification set of 4×4 binary images in four classes (Class 1–4), and the crossbar classifier schematic with voltage inputs I1–I4, outputs O1–O4, differential synapse pairs (+/-), and write enable / read enable lines.]

Page 25: Simulation Setup

Simulation platform: SPICE produces an accurate crossbar circuit simulation; output voltages and synaptic weights are analyzed in MATLAB.

MATLAB / C:
1. Generate weights and write to a SPICE configuration file
2. Call SPICE program with the new weight configuration file

SPICE:
3. Read weights from file
4. Simulate crossbar with different inputs and record outputs
5. Export output voltages to file

MATLAB / C:
6. Evaluate output voltages from export file
7. Revise weights based on outputs
8. Go to step 2
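The eight steps above amount to a simulate-evaluate-revise loop. A minimal sketch in Python, with the SPICE call replaced by a stub (`run_spice` is a stand-in, not a real SPICE interface, and the LMS-style update rule is a generic assumption, not necessarily the rule used on the slide):

```python
import random

def run_spice(weights, inputs):
    """Stub for steps 3-5: in the real flow, SPICE simulates the crossbar."""
    return sum(w * x for w, x in zip(weights, inputs))

def train(samples, targets, epochs=100, lr=0.1):
    random.seed(0)
    weights = [random.uniform(-0.1, 0.1) for _ in samples[0]]  # step 1
    for _ in range(epochs):
        for x, t in zip(samples, targets):
            out = run_spice(weights, x)          # steps 2-5: simulate
            err = t - out                        # step 6: evaluate outputs
            weights = [w + lr * err * xi         # step 7: revise weights
                       for w, xi in zip(weights, x)]
    return weights                               # step 8: loop repeats

w = train([[1, 0], [0, 1]], [1, 0])
print(round(run_spice(w, [1, 0])), round(run_spice(w, [0, 1])))  # 1 0
```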

Page 26: Image Conversion

Image data is converted to a SPICE waveform

Each row in the image is converted to a voltage of 16 intervals

Entire image is represented by 4 voltage pulses

[Figure: a 4×4 binary image (rows 1111, 0001, 0001, 0000) mapped to voltage pulses of 1 V, 1/15 V, 1/15 V, and 0 V; input waveforms for neurons 1–4 and the class output waveform (Class 1 versus Classes 2, 3, 4) over 3 ns.]
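The row-to-voltage mapping described above can be sketched directly (a minimal illustration; the function name is an assumption):

```python
# Each 4-pixel binary row maps to one of 16 evenly spaced voltage levels
# between 0 V and 1 V (steps of 1/15 V), so a 4x4 image becomes 4 pulses.
def row_to_voltage(row):
    value = int("".join(str(b) for b in row), 2)   # 4-bit row -> 0..15
    return value / 15.0                            # 16 levels, 1/15 V apart

image = [[1, 1, 1, 1],
         [0, 0, 0, 1],
         [0, 0, 0, 1],
         [0, 0, 0, 0]]
pulses = [row_to_voltage(r) for r in image]        # 4 pulses per image
print(pulses)  # [1.0, 1/15, 1/15, 0.0]
```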

Page 27: Simulation Result (1/2)

The crossbar is performing as a linear separator and can differentiate between all 4 image classes.

[Figure: Class 1 and Class 2 test images and the corresponding output-neuron voltages.]

Page 28: Simulation Result (2/2)

The crossbar is performing as a linear separator and can differentiate between all 4 image classes.

[Figure: Class 3 and Class 4 test images and the corresponding output-neuron voltages.]

Page 29: Parallel operation

[Figure: two stacked crossbars sharing inputs I0–I7 and outputs O1–O4, each with voltage input, write enable, read enable, and voltage output lines.]

• Read enable signal and diode allow multiple crossbars to be read in parallel

• Multipliers and adders not needed

Page 30: Analog Memristor Core

[Figure: analog memristor core. A memristor crossbar array replaces the SRAM array; pre-synaptic neuron values (PSNV) are applied through a decoder, an array of analog components (AC) performs the computation, and results are collected in an output buffer.]

• Analog crossbar array replaces digital memory AND multiply-add units
• Full crossbar can be evaluated in parallel
• Most of the core can be shut down once computations are done
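The parallelism claim can be illustrated with a toy matrix-vector view of the crossbar (a behavioral sketch, not a circuit model; the conductance and voltage values are illustrative assumptions):

```python
# The analog crossbar evaluates a full matrix-vector product in one step:
# each output column's current is the Kirchhoff sum of g * v contributions.
# A digital core instead processes one row of synapses per cycle.
def analog_eval(conductances, voltages):
    """One parallel step: each output is a dot product (current summation)."""
    return [sum(g * v for g, v in zip(col, voltages)) for col in conductances]

G = [[1, 2], [3, 4]]        # conductances, one inner list per output column
V = [0.5, 1.0]              # input voltages

print(analog_eval(G, V))    # [2.5, 5.5] -- all outputs in a single step
digital_cycles = len(V)     # digital core: one input row per cycle instead
print(digital_cycles)       # 2
```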

Page 31: Digital Memristor Core

• Digital crossbar array replaces the SRAM memory array
• One row of synapses processed per cycle
• Most of the core can be shut down once computations are done

[Figure: digital memristor core. A digital memristor crossbar array replaces the SRAM memory array; a decoder, multiply-accumulate units, and an output buffer complete the core.]

Page 32: Core Characteristics

Cores for the 1 bit per neuron case (multiplies not needed):

Config. | Synaptic memory device | Memory cells per synapse | Bits per synapse | Adder        | Core area (mm2) | Energy per neuron (pJ) | Core throughput (neurons/s)
1       | Memristor              | 1                        | Analog           | Current add  | 0.037           | 23                     | 263.9×10^6
2       | Memristor              | 2                        | 2 bits           | 12 bit adder | 0.098           | 240                    | 42.1×10^6
3       | SRAM                   | 2                        | 2 bits           | 12 bit adder | 0.288           | 377                    | 42.1×10^6

Cores for the 4 bits per neuron case:

Config. | Synaptic memory device | Memory cells per synapse | Bits per synapse | Multiplier       | Adder        | Core area (mm2) | Energy per neuron (pJ) | Core throughput (neurons/s)
4       | Memristor              | 1                        | Analog           | Memr. resist.    | Current add  | 0.058           | 26                     | 66.3×10^6
5       | Memristor              | 4                        | 4 bits           | 4 bit multiplier | 18 bit adder | 0.179           | 391                    | 28.6×10^6
6       | SRAM                   | 4                        | 4 bits           | 4 bit multiplier | 18 bit adder | 0.513           | 780                    | 26.6×10^6

Cycle time: 5 ns (200 MHz clock). Technology: 40 nm. Energy considers leakage, data transfer to/from the router, and control circuits.

Page 33: General Purpose Systems Comparison

Programmed commercial systems: the data set was small enough to fit in cache/onboard memory; simulated neural networks with 1024 synapses per neuron.

NVIDIA GPGPU: Tesla M2070, 225 W max power, 448 cores, 575 MHz, CUDA program.

Intel Xeon: X5650 processor, 95 W max power, six cores, 2.66 GHz, SSE instructions modeled.

Page 34: Comparison: 1 Bit Neurons

Example 1: 25,600 neurons, 100,000 iterations/s, 1024 synapses/neuron.
Example 2: 1,706,667 neurons, 1500 iterations/s, 1024 synapses/neuron.

Example 1 (25,600 neurons, 100,000 iterations/s):

Configuration     | # of chips | Chip area (mm2) | % active | Power (W) | Power eff. over Xeon
Memristor Analog  | 1          | 3.7             | 9.7%     | 0.07      | 253,489
Memristor Digital | 1          | 9.7             | 60.8%    | 0.62      | 27,546
SRAM              | 1          | 35.2            | 60.8%    | 1.13      | 15,099
NVIDIA M2070      | 12         | 529.0           | 99.2%    | 2700.00   | 6
Intel Xeon X5650  | 179        | 240.0           | 99.9%    | 17005.00  | 1

Example 2 (1,706,667 neurons, 1500 iterations/s):

Configuration     | # of chips | Chip area (mm2) | % active | Power (W) | Power eff. over Xeon
Memristor Analog  | 2          | 248             | 0.15%    | 0.70      | 24,395
Memristor Digital | 2          | 333             | 0.91%    | 1.25      | 13,633
SRAM              | 5          | 388             | 0.91%    | 28.02     | 607
NVIDIA M2070      | 12         | 529             | 99.2%    | 2700.00   | 6
Intel Xeon X5650  | 179        | 240             | 99.90%   | 17005.00  | 1

Page 35: Comparison: 4 Bit Neurons

Example 1: 25,600 neurons, 100,000 iterations/s, 1024 synapses/neuron.
Example 2: 1,706,667 neurons, 1500 iterations/s, 1024 synapses/neuron.

Example 1 (25,600 neurons, 100,000 iterations/s):

Configuration     | # of chips | Chip area (mm2) | % active | Power (W) | Power eff. over Xeon
Memristor Analog  | 1          | 5.9             | 38.6%    | 0.07      | 234,859
Memristor Digital | 1          | 18.2            | 89.6%    | 0.62      | 16,968
SRAM              | 1          | 29.1            | 89.6%    | 1.13      | 8,215
NVIDIA M2070      | 12         | 529.0           | 99.2%    | 2700.00   | 6
Intel Xeon X5650  | 179        | 240.0           | 99.9%    | 17005.00  | 1

Example 2 (1,706,667 neurons, 1500 iterations/s):

Configuration     | # of chips | Chip area (mm2) | % active | Power (W) | Power eff. over Xeon
Memristor Analog  | 1          | 395             | 0.58%    | 0.70      | 24,210
Memristor Digital | 4          | 303             | 1.34%    | 1.63      | 10,419
SRAM              | 9          | 383             | 1.34%    | 48.67     | 349
NVIDIA M2070      | 12         | 529             | 99.2%    | 2700.00   | 6
Intel Xeon X5650  | 179        | 240             | 99.9%    | 17005.00  | 1

Page 36: Comparison: Digital vs Analog cores

256 neurons with 1024 synapses each; synapses hold 16 possible values.

[Figure: digital core with a 1024×1024 SRAM array versus an analog crossbar core.]

Digital SRAM: 29.1 mm2 (6× the area); adder and multiplier needed; 1 synapse processed per cycle (per neuron); 1.13 W (16× more energy*); slower.

Analog: 5.9 mm2; current summation; 1024 synapses processed per cycle (per neuron); 0.07 W.

*The energy calculation considers leakage and data routing power.

Page 37: Conclusion

Specialized cores are very efficient for neural network acceleration. Further improvements are possible for both SRAM and memristor cores. Memristor cores are efficient when carrying out recognition tasks.

Acknowledgement: This work was partially supported by an NSF award.