RRAM Based Convolutional Neural Networks for High Accuracy Pattern
Recognition and Online Learning Tasks Z. Dong, Z. Zhou, Z.F. Li, C. Liu, Y.N. Jiang, P. Huang, L.F. Liu, X.Y. Liu, J.F. Kang*
Institute of Microelectronics, Peking University, Beijing 100871, China * Email: [email protected]
Abstract: In this work, we conduct research on optimizing
schemes for the RRAM-based implementation of CNN.
Our main achievements include: 1) A concrete CNN circuit
and corresponding operation methods are developed. 2)
Quantification methods for utilizing binary or multilevel
RRAM as synapses are proposed, and our CNN performs
with 98% accuracy on the MNIST dataset using multilevel
RRAM and 97% accuracy using binary RRAM. 3)
The influence of the number and size of kernels, as well as of device conductance variation, on the final recognition accuracy is studied in detail. 4) Online learning tasks are performed
using the developed CNN system with binary STDP
protocol, and 94.6% accuracy on average is achieved.
Introduction: The use of convolutional neural networks
(CNN) has been hugely successful in the field of image
recognition [1]. The RRAM-based implementation of CNN
can reduce energy consumption and physically realize
parallelism [2]. Though many research efforts have been
made [3-6], there is still a gap between the software
algorithm of CNN and the hardware implementation in
terms of applying RRAM as synapses efficiently while
maintaining accuracy. In this work, practicable strategies
are proposed to obtain high accuracy with CNN using
binary RRAM or multilevel RRAM as synapses. Moreover,
we accomplish online learning tasks by adopting a binary STDP scheme and utilizing the first layer of the CNN as a feature
extractor.
CNN Architecture: As shown in Fig.1, input images are first fed to the convolutional layer. The definition and formula of the convolution operation are illustrated in Fig.2.
Then the output images are pooled and activated. Finally,
the output is reshaped to a vector and conveyed to the fully-
connected layer to make the final decision. Fig.3 shows the
circuit realization of the convolutional layer. To represent
negative weight values, two rows of RRAM are used as one
kernel, namely I_out = I1 - I2 and thus W_eff = W1 - W2 (W_eff is the effective weight). Traditionally, activation
comes before the pooling operation. Changing the sequence
can make the circuit implementation easier while having
little effect on recognition accuracy. ReLU activation is selected because it is easy to realize in circuits, and since ReLU(x/4) = ReLU(x)/4, the 2*2 mean pooling operation can be achieved by simply summing four currents together without division circuits.
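The data path described above (differential kernel rows, pooling before ReLU) can be sketched in software. The following NumPy model is illustrative only, not the paper's circuit implementation; function names and array shapes are assumptions.

```python
# Illustrative NumPy sketch of the convolutional-layer data path:
# two RRAM rows per kernel (I_out = I1 - I2, so W_eff = W1 - W2),
# then 2x2 mean pooling by current summing, then ReLU.
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D convolution (implemented as cross-correlation,
    as is conventional in CNNs)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def crossbar_kernel(w_pos, w_neg, image):
    """Two RRAM rows form one kernel: the effective weight is the
    difference of the two conductance maps, W_eff = W1 - W2."""
    return conv2d(image, w_pos) - conv2d(image, w_neg)

def pool_then_relu(x):
    """2x2 mean pooling realized as a sum of four currents (the /4
    scale can be absorbed later because ReLU(x/4) = ReLU(x)/4),
    followed by ReLU."""
    h, w = x.shape
    pooled = x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).sum(axis=(1, 3))
    return np.maximum(pooled, 0.0)
```

The pooling-before-activation order matches the paper's circuit-friendly sequence; with a linear rectifier the swap changes only a positive scale factor.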
Device Fabrication and Characteristics: Fig.4 shows the
process flow for the fabrication of our RRAM crossbar
array. Typical DC I-V characteristics of the AlOx/HfOy-based RRAM are presented in Fig.5. To provide a large number of alternative weight values for the quantification methods, multiple conductance states are obtained by applying small reset pulses (Fig.6) [7]. Considering the efficiency of
training, a feasible operation scheme among three
conductance states is proposed, as shown in Fig.7. Because
𝑊𝑒𝑓𝑓 = 𝑊1 −𝑊2 , three conductance states with a
proportion of 0:1:3 can represent the weights 0, ±1, ±2, ±3.

Results and Discussion: Fig.8 shows the relationship
between accuracy and the number of alternative weight
values (No. of |w|) realized by RRAM with multilevel
resistances. The number of |w| has little effect on
recognition accuracy when the number of kernels is higher
than 5. To obtain accuracy higher than 95%, the binary
RRAM scheme needs at least 10 kernels, while the scheme
using RRAM with 3 conductance states needs only 5
kernels. The thresholds used in the quantification method of the binary scheme (deciding whether a weight from the software CNN maps to -1, 0, or 1) can affect the accuracy significantly. Fig.9 and Fig.10 show the influence of the W1
threshold and W2 threshold respectively (W1 are weights in
convolutional kernels and W2 are weights in the fully-connected layer). By choosing thresholds within an appropriate range, 97% accuracy can be achieved. Fig.11 illustrates the
system’s tolerance to the variation of device conductance.
Our CNN is highly reliable when device variation is under
50%. In addition, the accuracy of the binary scheme is always slightly lower than that of the 3-level scheme, and its fluctuation tends to be larger under the same conditions.
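The threshold-based quantification and the variation study can be sketched as follows. The threshold values, the nearest-level mapping for the 3-level scheme, and the Gaussian variation model are illustrative assumptions; the paper does not specify these exact forms.

```python
# Illustrative sketch of threshold-based weight quantification and a
# simple conductance-variation model (all parameter values assumed).
import numpy as np

def quantize_binary(w, threshold):
    """Map software weights to {-1, 0, +1}: magnitudes below the
    threshold become 0, the rest keep their sign."""
    q = np.sign(w)
    q[np.abs(w) < threshold] = 0
    return q

def quantize_3level(w, scale):
    """Map to {0, ±1, ±2, ±3}: with W_eff = W1 - W2 and conductance
    states in the proportion 0:1:3, all integer differences in
    [-3, 3] are reachable. Nearest-level rounding is a stand-in for
    the paper's thresholding."""
    return np.clip(np.round(w / scale), -3, 3)

def add_variation(g, sigma):
    """Model device conductance variation as multiplicative Gaussian
    noise with relative standard deviation sigma (e.g. 0.5 for 50%)."""
    return g * (1.0 + sigma * np.random.randn(*g.shape))
```

Sweeping `threshold` over a range and re-evaluating accuracy reproduces the kind of sensitivity study shown in Fig.9 and Fig.10.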
Online Learning:
A. Binary STDP: Fig.12 and Fig.13 show the binary STDP
(spike-time dependent plasticity) characteristics of RRAM.
After the forward propagation period, the fired neuron will
send a feedback signal to the bottom electrode of RRAM,
and a pre-pulse will be added to the top electrode according
to the input value. The superimposed signal can set/reset the
RRAM to LRS/HRS, regardless of the device’s initial
resistance.
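The update rule above can be written as a few lines of pseudocode-like Python. The normalized conductances and the input threshold value are assumptions for illustration; the pulse amplitudes follow Fig.12.

```python
# Sketch of the binary STDP update: only a fired (post) neuron sends a
# feedback pulse to the bottom electrode; superimposed with the
# input-dependent pre-pulse (Pre1 = set, Pre2 = reset), the device is
# driven to LRS or HRS regardless of its initial resistance.
G_LRS, G_HRS = 1.0, 0.0   # normalized conductances (assumed)
V_TH = 0.5                # input threshold (assumed)

def binary_stdp_update(g, x, post_fired):
    """Return the new conductance of one synapse.
    g: current conductance; x: input value; post_fired: whether the
    post-neuron fired and thus sent the feedback signal."""
    if not post_fired:
        return g          # no feedback pulse: device unchanged
    # Pre1 (set) when the input exceeds the threshold, Pre2 (reset)
    # otherwise; the outcome overrides the initial state.
    return G_LRS if x > V_TH else G_HRS
```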
B. Online Learning Scheme and Results: To achieve online learning, the convolutional layer copied from the CNN (adopting the 10-kernel scheme to reduce hardware cost) is fixed and used as a feature extractor, so only the last layer needs to be trained online. We use 7000 digits (0~6) selected from MNIST as the test set, and the
recognition accuracies of each digit based on both binary
and 3-level CNN kernel weights are shown in Fig.14. The
average performance using binary convolutional weights
(93.4%) is a little lower than the accuracy using 3-level
weight scheme (94.6%). In addition, the recognition accuracy for digit 3 in the binary-weight scheme is lower than for the other digits, which degrades the average accuracy.
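The overall online-learning loop can be sketched as below. Here `extract_features` stands in for the fixed convolution/pooling/ReLU stage, and the per-sample set/reset of the firing neuron's synapses is a simplified rendering of the binary STDP mechanism; all names and the threshold are assumptions.

```python
# Sketch of online learning with a frozen convolutional feature
# extractor: only the last (fully-connected) layer is trained, using a
# simplified per-sample binary set/reset of the firing neuron's row.
import numpy as np

def online_train(images, labels, extract_features, n_classes=7, v_th=0.5):
    feat_dim = extract_features(images[0]).size
    W = np.zeros((n_classes, feat_dim))   # binary synapses in {0, 1}
    for img, label in zip(images, labels):
        x = extract_features(img).ravel()
        # During training only the correct-class neuron "fires", so
        # only its row receives feedback pulses: set where the input
        # exceeds the threshold, reset elsewhere (Fig.12 mechanism,
        # simplified here).
        W[label] = np.where(x > v_th, 1.0, 0.0)
    return W

def predict(W, x):
    """Inference: the class whose synapse row draws the largest
    summed current wins."""
    return int(np.argmax(W @ x.ravel()))
```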
Conclusion: CNN architecture and circuits are designed
and optimized. Both binary and multilevel RRAM-based CNNs are implemented. Factors that affect recognition
accuracy are studied. Online learning is performed based on
the developed CNN system.
Acknowledgement: This work was supported in part by the
NSFC (61334007, 61421005), National Innovation
Training Program, and Beijing Municipal Science and
Technology Plan Projects.
Reference: [1] Alex Krizhevsky et al., NIPS, 2012. [2] L.
Gao et al., Nanotechnology, p.455204, 2015. [3] D. Garbin
et al., TED, p.2494, 2015. [4] S. Yu et al., EDL, p.870, 2016.
[5] S. Ambrogio et al., VLSI, 2016. [6] Wei Wang et al.,
SNW, 2016. [7] Zhe Chen et al., IEDM, 2015.
Figure Captions:
Fig.1 The architecture of the RRAM-based convolutional neural network. The first layer is a convolutional layer and the last layer is a fully-connected layer. Between them are pooling, activation, and reshaping operations.
Fig.2 Schematic of the convolution operation between an input image and a learned kernel.
Fig.3 Specific circuit of the convolutional neural network. Two rows of RRAM work as one kernel, and every eight output currents are pooled and activated together to compute one value in the output matrix, which is conveyed to the fully-connected layer.
Fig.4 Process flow for the AlOx/HfOy-based RRAM crossbar array fabrication. The corresponding schematic diagram of each step is shown.
Fig.5 Measured typical DC I-V characteristics of an RRAM device in an array.
Fig.6 Conductance states obtained by applying an increasing number of small continuous AC reset pulses.
Fig.7 Three conductance states (used as quantified weights in the networks) and the corresponding writing methods for the RRAM devices.
Fig.8 Recognition accuracy as a function of the number of kernels and the number of alternative weight values.
Fig.9 The influence of the W1 weight threshold on recognition accuracy (W2 weight threshold held constant).
Fig.10 The influence of the W2 weight threshold on recognition accuracy (W1 weight threshold held constant).
Fig.11 The influence of device conductance variation on recognition accuracy.
Fig.12 Waveform used in the binary STDP learning system. An input voltage above the threshold is followed by Pre1, corresponding to a 1.4 V / 100 ns set process, while an input voltage below the threshold is followed by Pre2, corresponding to a 2 V / 100 ns reset process.
Fig.13 (a) Measured LTP characteristics of RRAM at different initial resistances. (b) Measured LTD characteristics of RRAM at different initial resistances. Δt is the time difference between the Pre and Post signals.
Fig.14 Recognition results for each of the digits 0~6, based on both the binary-weight CNN and the 3-level-weight CNN. The average accuracy of each scheme (93.4% binary, 94.6% 3-level) is indicated with a dotted line.