
RRAM Based Convolutional Neural Networks for High Accuracy Pattern Recognition and Online Learning Tasks

Z. Dong, Z. Zhou, Z.F. Li, C. Liu, Y.N. Jiang, P. Huang, L.F. Liu, X.Y. Liu, J.F. Kang*

Institute of Microelectronics, Peking University, Beijing 100871, China
* Email: [email protected]

Abstract: In this work, we investigate optimization schemes for the RRAM-based implementation of CNNs. Our main contributions are: 1) a concrete CNN circuit and corresponding operation methods are developed; 2) quantization methods for utilizing binary or multilevel RRAM as synapses are proposed, and our CNN achieves 98% accuracy on the MNIST dataset using multilevel RRAM and 97% accuracy using binary RRAM; 3) the influence of the number and size of kernels, as well as of device conductance variation, on final recognition accuracy is studied in detail; 4) online learning tasks are performed on the developed CNN system with a binary STDP protocol, achieving 94.6% accuracy on average.

Introduction: Convolutional neural networks (CNNs) have been hugely successful in the field of image recognition [1]. An RRAM-based implementation of a CNN can reduce energy consumption and physically realize parallelism [2]. Though many research efforts have been made [3-6], there is still a gap between the software algorithm of a CNN and its hardware implementation in terms of applying RRAM as synapses efficiently while maintaining accuracy. In this work, practicable strategies are proposed to obtain high accuracy in CNNs using binary or multilevel RRAM as synapses. Moreover, we achieve online learning by adopting a binary STDP scheme and utilizing the first layer of the CNN as a feature extractor.

CNN Architecture: As shown in Fig.1, input images are first applied to the convolutional layer. The definition and formula of the convolution operation are illustrated in Fig.2. The output images are then pooled and activated. Finally, the output is reshaped into a vector and passed to the fully-connected layer to make the final decision. Fig.3 shows the circuit realization of the convolutional layer. To represent negative weight values, two rows of RRAM are used as one kernel, so that $I_{out} = I_1 - I_2$ and thus $W_{eff} = W_1 - W_2$, where $W_{eff}$ is the effective weight. Traditionally, activation comes before the pooling operation; swapping this sequence makes the circuit implementation easier while having little effect on recognition accuracy. ReLU activation is selected because it is easy to realize in circuits, and because its positive linearity allows the 2×2 mean pooling operation to be realized by simply summing four currents together, without division circuits.
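
For concreteness, here is a minimal NumPy sketch of this forward pass. It is a software stand-in for the analog circuit, not the authors' implementation; the helper names and the single-channel, square-kernel shapes are assumptions.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in CNNs) of one image
    with one square kernel."""
    H, W = image.shape
    k = kernel.shape[0]
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * kernel)
    return out

def mean_pool_2x2(x):
    """2x2 mean pooling. In the circuit this is just four currents summed
    together; no divider is needed because ReLU(x/4) = ReLU(x)/4, so the
    constant factor can be absorbed downstream."""
    H, W = x.shape
    x = x[:H - H % 2, :W - W % 2]
    return (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 4.0

def forward(image, kernels_pos, kernels_neg, fc_w):
    """Pool first, then ReLU (the swapped order used in the paper)."""
    feats = []
    for k1, k2 in zip(kernels_pos, kernels_neg):
        # Two RRAM rows per kernel: I_out = I1 - I2 realizes W_eff = W1 - W2,
        # so the differential pair is equivalent to one signed kernel.
        fmap = conv2d(image, k1) - conv2d(image, k2)
        feats.append(np.maximum(mean_pool_2x2(fmap), 0.0))  # pool, then ReLU
    vec = np.concatenate([f.ravel() for f in feats])  # reshape to a vector
    return fc_w @ vec                                 # fully-connected scores
```

For example, assuming 28×28 MNIST inputs and 5×5 kernels, each pooled feature map is 12×12, so the reshaped vector has 144 entries per kernel.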

Device Fabrication and Characteristics: Fig.4 shows the process flow for the fabrication of our RRAM crossbar array. Typical DC I-V characteristics of the AlOx/HfOy-based RRAM are presented in Fig.5. To provide a large number of alternative weight values for the quantization methods, intermediate conductance states are obtained by applying small reset pulses (Fig.6) [7]. Considering training efficiency, a feasible operation scheme using three conductance states is proposed, as shown in Fig.7. Because $W_{eff} = W_1 - W_2$, three conductance states in the proportion 0:1:3 can represent the weights 0, ±1, ±2, and ±3.
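
A minimal sketch of this 0:1:3 mapping follows, with conductances in arbitrary relative units (an assumption; the measured levels are shown in Fig.7).

```python
# Three conductance levels in the proportion 0:1:3; a differential pair
# (G1, G2) encodes each quantized weight as G1 - G2.
G_LEVELS = (0, 1, 3)

# Pairs (g1, g2) with g1 - g2 covering every weight in {-3, ..., 3}.
WEIGHT_TO_PAIR = {
    0: (0, 0),
    1: (1, 0), -1: (0, 1),
    2: (3, 1), -2: (1, 3),
    3: (3, 0), -3: (0, 3),
}

def encode(weight):
    """Return the (G1, G2) programming targets for one quantized weight."""
    return WEIGHT_TO_PAIR[weight]

# Sanity check: every representable weight round-trips through G1 - G2.
for w, (g1, g2) in WEIGHT_TO_PAIR.items():
    assert g1 - g2 == w and g1 in G_LEVELS and g2 in G_LEVELS
```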

Results and Discussion: Fig.8 shows the relationship between accuracy and the number of alternative weight values (No. of |w|) realized by RRAM with multilevel resistances. The number of |w| values has little effect on recognition accuracy when the number of kernels exceeds 5. To obtain accuracy higher than 95%, the binary RRAM scheme needs at least 10 kernels, while the scheme using RRAM with 3 conductance states needs only 5. In the binary scheme, the thresholds used in the quantization method (deciding whether a weight from the software CNN maps to -1, 0, or 1) can affect the accuracy significantly; a sketch of this quantization appears below. Fig.9 and Fig.10 show the influence of the W1 and W2 thresholds respectively (W1 denotes the weights in the convolutional kernels and W2 the weights in the fully-connected layer). By choosing thresholds within an appropriate range, 97% accuracy can be achieved. Fig.11 illustrates the system's tolerance to device conductance variation: our CNN remains highly reliable for device variation under 50%. The accuracy of the binary scheme is consistently slightly lower than that of the 3-level scheme, and its fluctuation tends to be larger under the same conditions.
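
The following sketch illustrates the threshold-based quantization and the conductance-variation experiment; the percentile-based threshold choice and the Gaussian multiplicative noise model are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def ternarize(w, threshold):
    """Map software weights to {-1, 0, +1}: weights with |w| below the
    threshold become 0, the rest keep only their sign."""
    q = np.sign(w)
    q[np.abs(w) < threshold] = 0.0
    return q

def with_variation(g, sigma, rng):
    """Apply multiplicative device-to-device conductance variation."""
    return g * rng.normal(1.0, sigma, size=g.shape)

rng = np.random.default_rng(0)
w1 = rng.normal(size=(10, 25))                     # stand-in for trained kernels
q1 = ternarize(w1, np.percentile(np.abs(w1), 50))  # threshold = median |w|
g1 = with_variation(q1, 0.50, rng)                 # 50% variation, as in Fig.11
```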

Online Learning:

A. Binary STDP: Fig.12 and Fig.13 show the binary STDP (spike-timing-dependent plasticity) characteristics of the RRAM. After the forward-propagation period, the fired neuron sends a feedback signal to the bottom electrode of the RRAM, while a pre-pulse is applied to the top electrode according to the input value. The superimposed signal can set or reset the RRAM to the LRS or HRS, regardless of the device's initial resistance.
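
A minimal sketch of this update rule follows, with the 0/1 (HRS/LRS) weight coding as an assumption.

```python
import numpy as np

def binary_stdp_update(weights, x, fired):
    """Binary STDP write on the fired neuron's row.

    weights: (n_neurons, n_inputs) binary array; x: binary input vector;
    fired: index of the neuron that fired. The post-spike on the bottom
    electrode superimposes with the pre-pulse on the top electrode, so the
    row is set to LRS (1) where the input spiked and reset to HRS (0) where
    it did not, regardless of the previous state.
    """
    weights[fired, :] = (x > 0).astype(weights.dtype)
    return weights
```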

B. Online Learning Scheme and Results: To achieve online learning, the convolutional layer copied from the CNN, which adopts the 10-kernel scheme to reduce hardware cost, is fixed and used as a feature extractor, so that only the last layer needs to be trained online. We use 7000 digits (0~6) selected from MNIST as the test set; the recognition accuracy for each digit, based on both binary and 3-level CNN kernel weights, is shown in Fig.14. The average performance using binary convolutional weights (93.4%) is slightly lower than that of the 3-level weight scheme (94.6%). In addition, digit 3 is recognized less reliably than the other digits in the binary weight scheme, which drags down the average accuracy.
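
A minimal sketch of this training loop follows, reusing conv2d, mean_pool_2x2, and binary_stdp_update from the sketches above; the binarization threshold and the teacher-forced choice of the firing neuron are assumptions, since the paper does not spell out the neuron model.

```python
import numpy as np

def forward_features(image, kernels):
    """Frozen feature extractor: convolve with the fixed (signed, effective)
    kernels, pool, apply ReLU, and flatten, as in the forward-pass sketch."""
    feats = [np.maximum(mean_pool_2x2(conv2d(image, k)), 0.0) for k in kernels]
    return np.concatenate([f.ravel() for f in feats])

def train_online(images, labels, kernels, n_classes=7, thresh=0.5):
    """Online-train only the last layer on digits 0~6 (hence 7 classes)."""
    rng = np.random.default_rng(0)
    n_feat = forward_features(images[0], kernels).size
    w = (rng.random((n_classes, n_feat)) > 0.5).astype(float)  # binary init
    for img, y in zip(images, labels):
        x = (forward_features(img, kernels) > thresh).astype(float)
        w = binary_stdp_update(w, x, fired=y)  # label neuron is the one firing
    return w
```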

Conclusion: A CNN architecture and its circuits are designed and optimized. Both binary and multilevel RRAM-based CNNs are implemented, and the factors that affect recognition accuracy are studied. Online learning is demonstrated on the developed CNN system.

Acknowledgement: This work was supported in part by the NSFC (61334007, 61421005), the National Innovation Training Program, and the Beijing Municipal Science and Technology Plan Projects.

References: [1] A. Krizhevsky et al., NIPS, 2012. [2] L. Gao et al., Nanotechnology, p. 455204, 2015. [3] D. Garbin et al., TED, p. 2494, 2015. [4] S. Yu et al., EDL, p. 870, 2016. [5] S. Ambrogio et al., VLSI, 2016. [6] W. Wang et al., SNW, 2016. [7] Z. Chen et al., IEDM, 2015.



Fig.1 The architecture of the RRAM-based convolutional neural network. The first layer is a convolutional layer and the last layer is a fully-connected layer; between them are pooling, activation, and reshaping operations.

Fig.3 Specific circuit of the convolutional neural network. Two rows of RRAM work as one kernel, and each group of eight output currents is pooled and activated together to compute one value in the output matrix, which is then conveyed to the fully-connected layer.

Fig.4 Process flow for the fabrication of the AlOx/HfOy-based RRAM crossbar array. The corresponding schematic diagram of each step is shown.

Fig.5 Measured typical DC I-V characteristics of an RRAM device in the array.

Fig.6 Conductance states obtained by applying an increasing number of small continuous AC reset pulses.

Fig.7 Three conductance states (used as quantized weights in the networks) and the corresponding writing methods for the RRAM devices.

Fig.9 The influence of the W1 weight threshold on recognition accuracy (with the W2 weight threshold held constant).

Fig.11 The influence of device conductance variation on recognition accuracy.

Fig.12 Waveforms used in the binary STDP learning system. An input voltage above the threshold is followed by Pre1, corresponding to a 1.4V 100ns set process, while an input voltage below the threshold is followed by Pre2, corresponding to a 2V 100ns reset process.

Fig.13 (a) Measured LTP characteristics of RRAM at different initial resistances. (b) Measured LTD characteristics of RRAM at different initial resistances. Δt is the time difference between the Pre and Post signals.

Fig.14 Recognition results for digits 0~6, based on both the binary-weight CNN and the 3-level-weight CNN. The average accuracy of each scheme is indicated with a dotted line.


Fig.2 Schematic of the convolution operation between an input image and a learned kernel.

Fig.8 Recognition accuracy as a function of the number of kernels and the number of alternative weight values.

Fig.10 The influence of the W2 weight threshold on recognition accuracy (with the W1 weight threshold held constant).