Memory Intensive Architectures
Shahar Kvatinsky
Viterbi Faculty of Electrical Engineering
Technion – Israel Institute of Technology
ICRI-CI June 2017
Memristors: Emerging Nonvolatile Memory Technologies
• STT MRAM
• Resistive RAM (RRAM)
• Phase Change Memory (PCM)
Example: Intel 3D XPoint
• 16 GB, PCIe 3.0, 241 mm²
• 20 nm process, 4F², 1S1R cells between M4 and M5
• 91.4% memory efficiency (4.5X higher than DRAM)
Memristors Add New Capabilities to CMOS: Sea of memory above the logic
Dense, nonvolatile, fast, and CMOS compatible
Memory Intensive Architectures
• Tight integration of memory and logic
– Bringing memory to logic
– In-memory computing
[Figure: CPU with Input and Output]
MIA Research Projects
Memory design
Memristive Memory Processing Unit
Embedded memory
Neuromorphic computing
Memory Design and Methodologies
• Understanding the fundamental issues in resistive memory
• Circuits for memory (Ramadan et al., submitted to TCAS-I)
• Coding for RRAM (Cassuto et al., ISIT 13, 16, TIT 16)
• RRAM/PCM in the memory system (Nishil Talati)
On-Die Intensive Memory: Circuits and Architectures
• Multistate Register (TVLSI 15)
• Continuous Flow Multithreading (CAL 14)
• IoT
• RFIC (Wainstein et al., ISCAS 17, Memrisys 17)
Neuromorphic Computing
• Online gradient descent training (TNNLS 15, ISCAS 16)
• Machine learning accelerators (Tzofnat Greenberg)
• Configurable mixed signal circuits (Loai Danial)
Agenda
• Memristors and MIA
• Memristive MPU (mMPU) architecture
• Summary
Processing “In-Memory” (PIM): Reducing Data Movement
Prior Art
• 90’s: Configuration PIM machine, Active Pages, SA connected to SIMD pipeline
• Recent: Micron Automata Processor
M. Gokhale et al., “Processing in memory: the Terasys massively parallel PIM array,” Computer, 1995
M. Oskin et al., “Active pages: A computation model for intelligent memory,” Comput. Archit. News, 1998
D. Elliott et al., “Computational ram: Implementing processors in memory,” IEEE Des. Test, 1999
P. Dlugosch et al., "An Efficient and Scalable Semiconductor Architecture for Parallel Automata Processing," IEEE TPDS, 2014
Processing “In-Memory” (PIM): Reducing Data Movement
Data transfer is still required to/from DRAM and PUs
[Figure: CPU, CMOS Processing Units (PUs), and Memory (DRAM)]
Real Computing within the Memory: Beyond von Neumann Architecture
[Figure: von Neumann architecture with Input Device, Output Device, CPU (Control Unit, Arithmetic/Logic Unit), and Memory; Memory Processing Unit (MPU)]
mMPU: Solving the von Neumann Bottleneck
• Moving from DRAM to memristive memory
• mMPU: performing computation USING the memristive memory cells
[Figure: CPU and mMPUs linked by Clock, Address, Data, and Controls]
Logic within Memory: Logic Families
• MAGIC (TCAS II 14, TNANO 16)
• IMPLY (ICCD 11, TVLSI 14)
• Unipolar logic (ICSEE 16, VLSI-SoC 16)
• Akers array (MEJ 14)
[Figure: (a) 2x2 Akers array model; (b) array for XOR(A, B)]
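Memristor – Memory Resistor: Resistor with Varying Resistance
• Current in one direction decreases the resistance
• Current in the opposite direction increases the resistance
[Figure: memristor symbol and voltage/current labels]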
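The switching behavior described on this slide can be sketched as a two-state device. This is a minimal illustration, not a physical device model: the class name `Memristor`, the resistance values, the single symmetric threshold `v_threshold`, and the choice of which polarity lowers the resistance are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Memristor:
    """Idealized two-state memristor (illustrative assumption, not a device model)."""
    r_on: float = 1e3          # hypothetical low resistance, in ohms
    r_off: float = 1e6         # hypothetical high resistance, in ohms
    v_threshold: float = 0.5   # hypothetical switching threshold, in volts
    resistance: float = 1e6    # starts in the high-resistance state

    def apply(self, voltage: float) -> float:
        """Apply a voltage and return the current that flows before any switching."""
        current = voltage / self.resistance
        if voltage >= self.v_threshold:
            self.resistance = self.r_on    # forward current decreases the resistance
        elif voltage <= -self.v_threshold:
            self.resistance = self.r_off   # reverse current increases the resistance
        return current


m = Memristor()
m.apply(1.0)     # above threshold: switches to r_on
m.apply(0.1)     # below threshold: state is retained (nonvolatile)
m.apply(-1.0)    # reverse polarity: switches back to r_off
```

Voltages below the threshold leave the state untouched, which is the property that lets the same cell serve as both storage and logic element.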
MAGIC – Memristor Aided LoGIC: Example of MAGIC NOR
IN1  IN2  NOR
0    0    1
0    1    0
1    0    0
1    1    0
• RON = Logic ‘1’, ROFF = Logic ‘0’ (ROFF >> RON)
• Initialize OUT to RON
• Both inputs ROFF: voltage across OUT << VG, so OUT stays RON
• At least one input RON: voltage across OUT > VG/2, which increases the resistance of OUT to ROFF
S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E. G. Friedman, A. Kolodny, and U. C. Weiser, "MAGIC – Memristor Aided LoGIC," IEEE TCAS II, Nov. 2014
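The MAGIC NOR gate can be checked with a simple voltage-divider calculation. The sketch below assumes the two input memristors are in parallel and in series with the output memristor, and that OUT switches to ROFF when the voltage across it exceeds a threshold. The numeric values of `R_ON`, `R_OFF`, `V_G`, and `V_T_OFF` are hypothetical and chosen only so that the threshold sits below VG/2.

```python
R_ON, R_OFF = 1e3, 1e6   # hypothetical resistances, R_OFF >> R_ON
V_G = 1.0                # hypothetical applied voltage
V_T_OFF = 0.45           # hypothetical switching threshold, below V_G / 2


def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def out_voltage(in1: int, in2: int) -> float:
    """Voltage across OUT (initialized to R_ON) for the given logic inputs."""
    r_in1 = R_ON if in1 else R_OFF
    r_in2 = R_ON if in2 else R_OFF
    return V_G * R_ON / (R_ON + parallel(r_in1, r_in2))


def magic_nor(in1: int, in2: int) -> int:
    """Return the logic value left in OUT after applying V_G."""
    r_out = R_ON                         # initialize OUT to R_ON (logic 1)
    if out_voltage(in1, in2) > V_T_OFF:
        r_out = R_OFF                    # OUT resistance increases (logic 0)
    return 1 if r_out == R_ON else 0


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, magic_nor(a, b), round(out_voltage(a, b), 4))
```

With both inputs at ROFF almost all of VG drops across the inputs, so OUT is left untouched. With both inputs at RON the divider gives exactly two thirds of VG across OUT.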
Real MAGIC
B. C. Jung et al., “Zero-static-power nonvolatile logic-in-memory circuits for flexible electronics,” Nano Research, April 2017
[Figure: MAGIC NOR gate with inputs IN1 and IN2, output OUT, and applied voltage VG]
MAGIC NOR in a Crossbar
[Figure: crossbar rows holding IN1, IN2, and OUT; applied voltages VG and VIsolate]
MAGIC NOR in a Memristive Memory
N. Talati, S. Gupta, P. Mane, and S. Kvatinsky, “Logic Design within Memristive Memories Using MAGIC," IEEE Transactions on Nanotechnology, July 2016
Hierarchy of Logical Functions
[Figure: hierarchy built on MAGIC NOR: NOT, COPY, OR, AND, NAND, NOR, XOR; ADD, SUB, MUL, DIV, SQRT, POW; Convolution, Matrix multiplication]
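Because NOR is functionally complete, the lower levels of this hierarchy can be composed from it alone. The sketch below is one possible composition, not the gate-level mapping used in the cited work. All function names are illustrative, and the ripple-carry `add` is only the simplest way to show arithmetic emerging from the NOR primitive.

```python
def nor(a: int, b: int) -> int:
    """The single primitive: everything below is built from this."""
    return 1 - (a | b)


def not_(a):
    return nor(a, a)


def copy(a):
    return not_(not_(a))


def or_(a, b):
    return not_(nor(a, b))


def and_(a, b):
    return nor(not_(a), not_(b))


def nand(a, b):
    return not_(and_(a, b))


def xor(a, b):
    return nor(nor(a, b), and_(a, b))


def full_adder(a, b, cin):
    """One-bit full adder expressed with NOR-derived gates only."""
    p = xor(a, b)
    return xor(p, cin), or_(and_(a, b), and_(p, cin))


def add(x: int, y: int, bits: int = 8) -> int:
    """Ripple-carry addition modulo 2**bits."""
    carry, result = 0, 0
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result


print(add(5, 9))
```

SUB, MUL, and the higher-level operations follow the same pattern: each level calls only the functions defined beneath it.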
Parallel Vector Operation within Memristive MPU
$f_n : R^n \times R^n \to R^n$, $f_n\big((a_0, a_1, a_2, \ldots, a_n), (b_0, b_1, b_2, \ldots, b_n)\big) = (c_0, c_1, c_2, \ldots, c_n)$
Latency of the vector operation is independent of the length of the vector
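The length-independent latency can be illustrated with a toy crossbar in which each vector element occupies one row and every NOR step is applied to the same columns of all rows at once. This is a sketch under stated assumptions: elements are single bits, output initialization is folded into the NOR step, and the `Crossbar` class, its methods, and the column layout are hypothetical names introduced for the example.

```python
class Crossbar:
    """Toy memristive crossbar: one vector element per row, one bit per cell."""

    def __init__(self, rows: int, cols: int):
        self.cells = [[0] * cols for _ in range(rows)]
        self.steps = 0  # NOR steps issued by the (hypothetical) controller

    def load_col(self, col, bits):
        for row, bit in zip(self.cells, bits):
            row[col] = bit

    def read_col(self, col):
        return [row[col] for row in self.cells]

    def nor_cols(self, in1, in2, out):
        """One MAGIC NOR step on the same three columns of every row.

        The Python loop stands in for hardware that drives all rows at once,
        so the step counter advances by one regardless of the row count.
        """
        for row in self.cells:
            row[out] = 1                  # initialize OUT to logic 1 (R_ON)
            if row[in1] or row[in2]:
                row[out] = 0              # a logic-1 input switches OUT to logic 0
        self.steps += 1


def vector_xor(a, b):
    """Element-wise XOR of two bit vectors using five row-parallel NOR steps."""
    xbar = Crossbar(rows=len(a), cols=7)
    xbar.load_col(0, a)
    xbar.load_col(1, b)
    xbar.nor_cols(0, 1, 2)   # t0 = NOR(a, b)
    xbar.nor_cols(0, 0, 3)   # t1 = NOT a
    xbar.nor_cols(1, 1, 4)   # t2 = NOT b
    xbar.nor_cols(3, 4, 5)   # t3 = a AND b
    xbar.nor_cols(2, 5, 6)   # c  = NOR(t0, t3) = a XOR b
    return xbar.read_col(6), xbar.steps


short = vector_xor([0, 0, 1, 1], [0, 1, 0, 1])
long_a = [i % 2 for i in range(1024)]
long_b = [(i // 2) % 2 for i in range(1024)]
long = vector_xor(long_a, long_b)
print(short[1], long[1])  # the step count is the same for both vector lengths
```

The step count depends only on the function being computed, while the vector length affects only how many rows participate in each step.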
mMPU µArchitecture
[Figure: Memristive Memory array with Row Control, Column Control, and mMPU Controller; the array holds operand vectors a0, a1, a2, …, an and b0, b1, b2, …, bn, and the final build adds the result c0, c1, c2, …, cn]
R. Ben-Hur and S. Kvatinsky, "Memory Processing Unit for In-Memory Processing," Proceedings of the IEEE/ACM International Symposium on Nanoscale Architectures, July 2016
mMPU Systems: Accelerator or Main Memory?
[Figure: CPU, DRAM DIMM, Memristive Memory, mMPU, and Accelerators, linked by Clock, Address, Data, and Controls]
Memristive memory with processing capabilities
Issues Involved in mMPU Architecture
[Figure: CPU and mMPU; mMPU Architecture, mMPU Controller]
• Memory Design
• Periphery Design
• mMPU Controller Design and Optimization
• Programming Model
• Software
• Applications
Agenda
• Memristors and MIA
• Memristive MPU (mMPU) architecture
• Summary
Memristors to the Rescue?
• New technologies enable memory intensive architectures
• Better processors (multithreading, low power)
• Accelerators (machine learning)
• Smart memories (memory processing unit)
Thanks!