View
216
Download
2
Category
Preview:
Citation preview
August 21st, 2016 l JIN KIM l Samsung Electronics
The future of graphic and mobile memory for new applications
2/24
Disclaimer
This presentation is intended to provide information concerning memory industry. We do our best to make sure that information presented is accurate and fully up-to-date. However, the presentation may be subject to technical inaccuracies, information that is not up-to-date or typographical errors. As a consequence, Samsung does not in any way guarantee the accuracy or completeness of information provided on this presentation. Samsung reserves the right to make improvements, corrections and/or changes to this presentation at any time.
The information in this presentation or accompanying oral statements may include forward-looking
statements. These forward-looking statements include all matters that are not historical facts, statements regarding the Samsung Electronics' intentions, beliefs or current expectations concerning, among other things, market prospects, growth, strategies, and the industry in which Samsung operates. By their nature, forward-looking statements involve risks and uncertainties, because they relate to events and depend on circumstances that may or may not occur in the future. Samsung cautions you that forward looking statements are not guarantees of future performance and that the actual developments of Samsung, the market, or industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements contained in this presentation or in the accompanying oral statements. In addition, even if the information contained herein or the oral statements are shown to be accurate, those developments may not be indicative developments in future periods.
3/24
Contents
• Memory technology trend
• High speed graphic technology ( >10Gbps)
• Low power mobile technology ( >20%)
• Conclusion
4/24
Memory technology trend
5/24
Higher Performance Autonomous
Artificial
Intelligence
Computer Vision
Virtual
Reality Memory -Centric
Computing
Lower Power
x10 Bandwidth
1
0.5
x0.5 Power Efficiency
Memory is at the core of new applications
256GB/s
30GB/s
LP4X LP3
0.7
HBM2 GDDR5 LP4
Source: Samsung
6/24
Memory Evolution
Memory-centric system evolution • Extreme B/W, performance/power, data processing, cost effective solutions
Core Clock SoC
Multi-Core
Time
Me
mo
ry W
all
Eff
icie
ncy (
Pe
rfo
rm./
Pow
er,
Cost)
A.I., VR/MR, Vision
Data Traffic,
Cost, Thermal
off-loading
customized
processing
lower power
noise immune
high speed
Value, UX
Perform extension
PC/Server, Mobile, Gfx
DDR5/LP5/GDDR6
Low Cost HBM/PIM
7/24
Memory technology trend
Power Efficiency
[mW/GBps]
100%
80%
60%
40%
20%
2020 2016 2018
Performance
[Gbps/pin] 15
12
6
3
LP5
LP4X LP4
2016 2018 2020
DDR4
9
DDR5
GDDR5
LP4
• GDDR6 with over 14Gbps, beyond 10Gbps GDDR5
• LP5, 20% more power-efficient than LP4X
LP5
GDDR6
DDR5
LP4X
GDDR5
DDR4
LP3
DDR3
Source: ISCA2016, Samsung
GDDR6
8/24
High Bandwidth Memory: HBM
PCB
DRAM
Buffer Logic Processor
Si Interposer
HBM
TSV Technology
1,024 I/O Architecture
Benefits
Microbump
8H stacked 20nm 8GB HBM
HBM
GDDR5
X 0.8
Power Efficiency
High Bandwidth 1TB/s
X 2.7
Performance
HBM
GDDR5
Source: Samsung
9/24
Processing In Memory: PIM • Fill the performance gap and deliver energy-efficient solutions
Processing In-Memory Better parallelism and lower bus traffic
CPU
DRAM
Source: Samsung
Processing In Buffer Processing In DRAM
AP GPU/VPU
Memory off-loading for lower frequency and power
10/24
High speed graphic technology ( >10Gbps)
• Graphic application requirement
• Asymmetric System, Crosstalk, EQ tuning
• GDDR6, Low cost HBM, PIM
11/24
High speed memory requirement • For 4K real infographic virtual reality, 13.2GB, 1TB/s memory needed
• For 4K 3D mixed reality, +3.5GB, 151GB/s memory needed
90 462
3,216
215
1064
3,640
QHD 4K UHD 8K UHD
2
8
13
6
13.2
23.6
QHD 4K UHD 8K UHD
Main H/E
1.0
2.7
9.0
1.6
3.5
11.6
QHD 4K UHD 8K UHD
Main H/E
28 101
527
42 151
791
QHD 4K UHD 8K UHD
[ Gfx Capacity, GB ]
[ B/W, GB/s ]
[ Added Capacity, GB ]
[ B/W, GB/s ]
Gaming Virtual Reality memory Mixed Reality memory
Source: Samsung Variable Assumption
Poly count, fps, # of texture per fragment, cache hit rate, tri-linear filtered,
# of virtual light source, Reflection/refraction ratio, ray bounce depth
12/24
Asymmetric system for higher data rate • Focus on the respectively dedicated features to maximize data rate
‒ Smart GPU : Training (Per-bit Timing/EQ) for minimizing static offset/noise
‒ Noise immune DRAM : minimizing dynamic noise (Jitter, ISI/x-talk, clock duty/skew)
CMD/AMD
PLL/DLL
Data Tx/Rx
D Q
Clock Phase
controller
DQ D Q
Phase
Detector
CTLE
DQ[0:7] D Q
D Q
DRAM
Core
To EDC pin
D Q
D Q DRAM
Core
WCK_t
WCK_c
CK_t
CK_c
CA[0:9]
Calibration data
Noise immune
circuit/PKG
Jitter
ISI
X-talk
Training(Timing/EQ)
Board/PKG SI/PI
GPU DRAM
Source: Samsung
13/24
X-talk reduction for Board/PKG design
• Small X-talk Package : reduction of X-talk with better return path
• Crosstalk Reduction with coding : 3B4B, 8B9B
Small X-talk PKG requirement 3B4B encoding
Crosstalk Reduction
GDDR5
Source: Samsung
ICR: Insertion loss to Crosstalk Ratio
14/24
DFE for return-loss reduction on system
CTLE & DFE
CTLE and DFE
Periodically Calibrated by GPU
Quarter rate DFE with summer in sampler
Adopt merged summer/sampler for fast feedback
Source: Samsung
• Single ended signaling requires noise immune equalizer
‒ DFE* is more suitable than CTLE**
* Decision Feedback Equalization
** Continuous Time Linear Equalization
DQ
FIFO
8GHz
WCK/WCKB
/2 4
4GHz
RX EQ FIFO
MUX TX
CLK buffer
4
4
15/24
GDDR6 ideas • High Speed Signaling, 14Gbps ~ 16Gbps, 1.35V
‒ Low jitter clocking with WCK/byte, Per-bit RX/TX equalizer training, X-talk reduction
‒ 2 channel with BL16, same Clock/ADD freq., twice of WCK/DQ freq.
Target Timing WCK Clocking
RX
WCK tree WCK
GPU DRAM
GPLL
TX
WR RD
Noise immune DRAM
Word Byte
DQ
14Gbps
~16Gbps
7GHz
~8GHz
GDDR5
GDDR6
CK : 1.75Gbps
CMD : 1.75Gbps
ADDR : 3.5Gbps
WCK : 3.5Gbps
DQ : 7Gbps
CK : 1.75Gbps
CA : 3.5Gbps
WCK : 7Gbps
DQ : 14~Gbps
Source: Samsung
16/24
Low cost HBM for consumer segment • ~ 200GBps with smaller # of TSV compared to HBM2
‒ Cost competitiveness ; remove buffer die, reduce # of TSV, organic interposer, etc..
‒ Need inputs from Client segment for specific features
Challenges
1. IO reduction, Smaller # of TSV
2. Remove buffer die
3. Master/Slave structure
4. Remove ECC
5. Si or organic Interposer
Challenge for HBM Comparison
HBM2 Low cost HBM
I/O 1024 ~512
Pin speed 2Gbps 3Gbps ~
BW (GB/s) 256 ~ 200
Cost/GB 1 0.X
PCB
Si Interposer
HBM
5
1 2
3 4
Buffer
DRAM
Logic Processor
Source: Samsung
17/24
PIM, Deep Learning in DRAM • Parallel processing in buffer to reduce extreme-bandwidth
‒ convolution, subsampling, matrix calculation
• Collaborate with accelerator for performance/cost
Processing in Buffer Extreme B/W Requirement
GPGPU
HBM/GDDRx
CPU
Mem
CPU
Mem
Accelerator
xHBM xHBM
CPU
+
GPU+HBM/GDDRx
CPU
+
Acc.s+xHBMs*
X10 (# of core)
Convolution / Subsampling Deep Learning
In Buffer
DRAM
DRAM
DRAM
DRAM
Ac
ce
lera
tor
Accelerator
xHBM xHBM
X10 (# of core)
* xHBM: Extreme HBM
Data movement reduction
18/24
Low power mobile technology ( >20%)
• Motivation for low power mobile
• LP4X / LP5
• PIM
19/24
Motivation for low power mobile
10
1
10-1
10-2
Pow
er
Dis
sip
ation [
W]
‘00 ‘20 ‘10 ‘15 ‘05
Thermal Limit (hand-held device)
Static Power
Dynamic Power Power Gap
Lower Power design
[Year]
Power Dissipation Trend
TDP [Watt]
GFLO
PS (
GPU
)
0 300 100 200
3K
4K
1K
5K
2K
Desktop
Notebook
Mobile
Oculus Rift (+GTX Card) PC Graphic
Performance
• PC-level graphic performance and mobile power budget
• Power is continuously increasing with limited thermal budget
Source: Samsung
Performance vs. TDP
*TDP(Thermal Design Power)
20/24
Lower power solution, LP4X
LP4X Power Reduction LP4X Idea
• LP4X : 4266Mbps, VDDQ/VDD = 0.6V/1.1V
‒ IO power reduction with 0.6V VDDQ, Good example of small change but big gain
Source: Samsung
CHANNELVOH
Pre
-dri
ve
r
DQ
Rterm
MNDW
MNUP
VO0
VDDQ (=1.1V)VOH =VDDQ/3
VREF =VOH/2
1-UI
GNDVO
CHANNELVO
Pre
-dri
ve
r
DQ
Rterm
VDDQL (=0.6V)
from AP
MNDW
MNUP
VO0
VDDQ (=1.0V)
VOH =VDDQ/2
VREF =VOH/2
1-UI
GNDVO
LP4X
•Conditions : IDD4R(VDDQ+VDD2) Spec Value / 50% Data change each burst transfer / Included process node contribution
LP4 3200 LP4 3733 LP4 4266 LP4X 4266
IO
Core
18% Total Power Saving!!!! LP4
-45%
Same Swing
Same VOH
Half-level VDDQ
1.1V
0.6V
21/24
LP5 target & ideas • LP5 : 6400Mbps, VDDQ/VDD < 0.6V/1.1V
‒ Extremely high band-width(~6.4Gbps) and smart power reduction(~20%)
LP5 ideas Power Efficiency Trend
Source: Samsung
LP2 LP3 LP4 LP4X LP5
[mW/Gbps]
* Pin Speed
- LP2 : 800Mbps ~ 1066
- LP3 : 1600Mbps ~ 1866
- LP4 : 3200Mbps ~ 3733
- LP4X : 4266Mbps
20%
35%
39%
18%
CMD Based
Data CLK(WCK)
WCK
Center-tap term
Deep
Sleep Mode
• IDD4W/R reduction
• IDD6 reduction
• IDD2N reduction
CK/CKLPDDR4 :
BL0 BL1 BL2 BL11 BL12 BL13 BL14 BL15
DQS_C/T
DQ
LPDDR5 :
BL0 BL1 BL2 BL11 BL12 BL13 BL14 BL15
WCK
DQ
CK
(Single-Ended)
Over 50%
IDD2N Reduction
Over 5%
IDD4W/4R Reduction
Over 30%
IDD6 Reduction
22/24
PIM, Lower power processing • Memory off-loading for reduced power consumption
‒ Reduce the unnecessary data transfer and frame rate control
• Collaborate with SoC/AP for performance/power
‒ PoC with special memory for post/pre-processing
AP CIS Display
AMBA AHB
Display
CIS AP
Pre/Post Processing In Memory
Memory Off-loading Solution Memory B/W Traffic
VPU
Recognition Distortion FRC Correction Limited Power Budget
Severe
Data Traffic
23/24
Conclusion
24/24
Conclusion
• Memory requirements have become more strict in time with respect to
performance, power, and cost
• Keeps innovating technology to correspond to those requirements
‒ Make efforts to extend the value of traditional memory
‒ Figure out innovative memory solution
• Close collaboration with partners is essential for delivering the right
memory solution.
kjh5555@samsung.com
Recommended