Upload
vunga
View
218
Download
2
Embed Size (px)
Citation preview
• K. Rupp, “40 Years of Microprocessor Trend Data,” blog • K. Bresniker et al., "Adapting to Thrive in a New Economy of Memory Abundance," Computer 2015.
New Computing Devices
No Memory Bottleneck
Classical Processing
Cognitive Processing
Energy Efficient
RRAM Crossbar Array
Inte
rfac
e
Interface
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
RRAM Crossbar Array
Inte
rfac
e
Interface
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
Crossbar Array
Array Tile
Crossbar Array
Inte
rfac
e
Interface
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
Array Tile
Storage
A tile can be assigned to,
Crossbar Array
Inte
rfac
e
Interface
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
Storage
A tile can be assigned to,
Digital Computing
Crossbar Array
Inte
rfac
e
Interface
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
Digital Computing
A tile can be assigned to,
Neural Networks
M-Core M-Core M-Core
M-Core M-Core M-Core
M-Core M-Core M-Core
Unused
Storage
Digital Computing
Analog Computing
M-Core M-Core M-Core
M-Core M-Core M-Core
M-Core M-Core M-Core
M-Core M-Core M-Core
M-Core M-Core M-Core
M-Core M-Core M-Core
Unused
Storage
Digital Computing
Analog Computing
M-Core M-Core M-Core
M-Core M-Core
M-Core M-Core M-Core
M-Core
M-Core M-Core M-Core
M-Core M-Core
M-Core M-Core M-Core
M-Core
Unused
Storage
Digital Computing
Analog Computing
M-Core M-Core
M-Core M-Core
M-Core M-Core
M-Core
M-Core
M-Core
RRAM Type Analog Binary
Device Levels <100 2
ON/OFF Ratio Average High
Endurance Average High
Programing Slow Fast
RRAM Type Analog Binary
Device Levels <100 2
ON/OFF Ratio Average High
Endurance Average High
Programing Slow Fast
Data Storage Digital Computing Neural Networks Internal Data Movement
RRAM Type Analog Binary
Device Levels <100 2
ON/OFF Ratio Average High
Endurance Average High
Programing Slow Fast
Data Storage Digital Computing TR & PDE Neural Networks BCNN Internal Data Movement In-Situ
• Our approach is utilized binary RRAM devices to implement a semi-analog (hybrid) neural network.
• Each classical analog device is replaced with “n” binary devices in the new network.
• The number of synaptic-weight bits can be dynamically configured when needed.
7
• Low training rate is required to train the network receptive fields (dictionaries) properly.
• This is translated into a larger number of bits to allow small “∆𝑤” values.
“n = 4” “n = 8” “n = 16”
12
• Typically, training is infrequent or is performed offline.
• Hence, after training the number bits per synaptic weights can be significantly reduced by assigning fewer columns per neuron.
Training Stage Regular Operation
13
• In analog image compression (sparse coding) each piece of a picture is represented as weighted combination of the network dictionary.
• We adopted locally competitive algorithm (LCA) to perform the analog image compression.
+
14
Zero One
0
4
8
12
16
i0 i1 i2 i3 i4 i5 i6 i7
Ou
tpu
t C
urr
ent
(µA
)
Output Node
𝑽𝒓
𝑽𝒓
𝑽𝒓
𝑽𝒓
𝑽𝒓
𝑰𝟎 𝑰𝟏 𝑰𝟐 𝑰𝟑 𝑰𝟒 𝑰𝟓 𝑰𝟔 𝑰𝟕
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
M-Core
R R R R R R R R
R R R R R R R R
R R R R R R R R
R R R S R R R R
R R R R R R R R
R R R R R R R R
R R R R R R R R
R R R R R R R R
M-Core
“32 × 32” Tile
0
150
300
450
600
750
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
Nu
mb
er o
f O
ccu
rren
ces
Current (µA)
44,800 Simulation points
𝑋
𝐴 𝐵 𝐶 𝐷 ∙
𝐸 𝐼 𝑀𝐹𝐺
𝐽𝐾
𝑁𝑂
𝐻 𝐿 𝑃
= 𝑋 𝑌 𝑍
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ●
𝐴0
𝐵0
𝐶0
𝐷0
→
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ●
𝐴1
𝐵1
𝐶1
𝐷1
→
● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
𝐴2
𝐵2
𝐶2
𝐷2
→
● ● ● ● ● ● ●
● ● ●
● ● ● ● ● ● ●
● ● ●
● ● ● ● ● ● ●
● ● ●
𝐴𝐸 + 𝐵𝐹 + 𝐶𝐺 + 𝐷𝐻
𝐴𝐼 + 𝐵𝐽 + 𝐶𝐾 + 𝐷𝐿
𝐴𝑀 + 𝐵𝑁 + 𝐶𝑂 + 𝐷𝑃
1
2
3
4
𝑬 𝑰 𝑴
𝑭 𝑱 𝑵
𝑮 𝑲 𝑶
𝑯 𝑳 𝑷
𝑨
𝑩
𝑪
𝑫
𝑌
𝑍
I/P Data
Data
111
→
111
→
111
→
Weather Forecasting
Electromagnetics
Heat Transfer Aerodynamics
Fluids Flow
Many Others https://www.horiba-mira.com/
http://www.metrosystems-des.com NASA
https://www.mentor.com/
http://www.theseus-fe.com
Hurricane Irma
Step “1” Step “2”
𝑉 ≤ −𝑉𝑤
𝑉 ≥ 𝑉𝑤
0.5 𝑉𝑟 ≤ 𝑉 < 0.75 𝑉𝑤
𝑉 < 0.5 𝑉𝑟
0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 1 0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 1 1 1 1 0 1 1 1
Storage / Memory
Digital Computing
Neural Networks
RRAM Devices
New Computing Devices
No Memory Bottleneck
Classical Process
Cognitive Process
Low Power Consumption
Scalable
Storage / Memory
Digital Computing
Neural Networks
RRAM Devices
1
10
100
1000
10000
10 100 1000 10000
CongestivePerformance(G
O/s)
PeakDPPerformance(GO/s)
GPUs
CPUs
FPCA
TrueNorth