The future of graphic and mobile memory for new applications · August 21st, 2016 l JIN KIM l...

August 21st, 2016 l JIN KIM l Samsung Electronics

The future of graphic and mobile memory for new applications

Disclaimer

This presentation is intended to provide information concerning memory industry. We do our best to make sure that information presented is accurate and fully up-to-date. However, the presentation may be subject to technical inaccuracies, information that is not up-to-date or typographical errors. As a consequence, Samsung does not in any way guarantee the accuracy or completeness of information provided on this presentation. Samsung reserves the right to make improvements, corrections and/or changes to this presentation at any time.

The information in this presentation or accompanying oral statements may include forward-looking

statements. These forward-looking statements include all matters that are not historical facts, statements regarding the Samsung Electronics' intentions, beliefs or current expectations concerning, among other things, market prospects, growth, strategies, and the industry in which Samsung operates. By their nature, forward-looking statements involve risks and uncertainties, because they relate to events and depend on circumstances that may or may not occur in the future. Samsung cautions you that forward looking statements are not guarantees of future performance and that the actual developments of Samsung, the market, or industry in which Samsung operates may differ materially from those made or suggested by the forward-looking statements contained in this presentation or in the accompanying oral statements. In addition, even if the information contained herein or the oral statements are shown to be accurate, those developments may not be indicative developments in future periods.

Contents

• Memory technology trend

• High speed graphic technology ( >10Gbps)

• Low power mobile technology ( >20%)

• Conclusion

Memory technology trend

Higher Performance Autonomous

Artificial

Intelligence

Computer Vision

Virtual

Reality Memory -Centric

Computing

Lower Power

x10 Bandwidth

x0.5 Power Efficiency

Memory is at the core of new applications

256GB/s

30GB/s

LP4X LP3

HBM2 GDDR5 LP4

Source: Samsung

Memory Evolution

Memory-centric system evolution • Extreme B/W, performance/power, data processing, cost effective solutions

Core Clock SoC

Multi-Core

A.I., VR/MR, Vision

Data Traffic,

Cost, Thermal

off-loading

customized

processing

lower power

noise immune

high speed

Value, UX

Perform extension

PC/Server, Mobile, Gfx

DDR5/LP5/GDDR6

Low Cost HBM/PIM

Memory technology trend

Power Efficiency

[mW/GBps]

2020 2016 2018

Performance

[Gbps/pin] 15

LP4X LP4

2016 2018 2020

• GDDR6 with over 14Gbps, beyond 10Gbps GDDR5

• LP5, 20% more power-efficient than LP4X

Source: ISCA2016, Samsung

High Bandwidth Memory: HBM

Buffer Logic Processor

Si Interposer

TSV Technology

1,024 I/O Architecture

Benefits

Microbump

8H stacked 20nm 8GB HBM

Power Efficiency

High Bandwidth 1TB/s

Performance

Source: Samsung

Processing In Memory: PIM • Fill the performance gap and deliver energy-efficient solutions

Processing In-Memory Better parallelism and lower bus traffic

Source: Samsung

Processing In Buffer Processing In DRAM

AP GPU/VPU

Memory off-loading for lower frequency and power

High speed graphic technology ( >10Gbps)

• Graphic application requirement

• Asymmetric System, Crosstalk, EQ tuning

• GDDR6, Low cost HBM, PIM

High speed memory requirement • For 4K real infographic virtual reality, 13.2GB, 1TB/s memory needed

• For 4K 3D mixed reality, +3.5GB, 151GB/s memory needed

90 462

QHD 4K UHD 8K UHD

Main H/E

QHD 4K UHD 8K UHD

Main H/E

28 101

42 151

QHD 4K UHD 8K UHD

[ Gfx Capacity, GB ]

[ B/W, GB/s ]

[ Added Capacity, GB ]

[ B/W, GB/s ]

Gaming Virtual Reality memory Mixed Reality memory

Source: Samsung Variable Assumption

Poly count, fps, # of texture per fragment, cache hit rate, tri-linear filtered,

# of virtual light source, Reflection/refraction ratio, ray bounce depth

Asymmetric system for higher data rate • Focus on the respectively dedicated features to maximize data rate

‒ Smart GPU : Training (Per-bit Timing/EQ) for minimizing static offset/noise

‒ Noise immune DRAM : minimizing dynamic noise (Jitter, ISI/x-talk, clock duty/skew)

CMD/AMD

PLL/DLL

Data Tx/Rx

Clock Phase

controller

DQ D Q

Detector

DQ[0:7] D Q

To EDC pin

D Q DRAM

CA[0:9]

Calibration data

Noise immune

circuit/PKG

Jitter

X-talk

Training(Timing/EQ)

Board/PKG SI/PI

GPU DRAM

Source: Samsung

X-talk reduction for Board/PKG design

• Small X-talk Package : reduction of X-talk with better return path

• Crosstalk Reduction with coding : 3B4B, 8B9B

Small X-talk PKG requirement 3B4B encoding

Crosstalk Reduction

Source: Samsung

ICR: Insertion loss to Crosstalk Ratio

DFE for return-loss reduction on system

CTLE & DFE

CTLE and DFE

Periodically Calibrated by GPU

Quarter rate DFE with summer in sampler

Adopt merged summer/sampler for fast feedback

Source: Samsung

• Single ended signaling requires noise immune equalizer

‒ DFE* is more suitable than CTLE**

* Decision Feedback Equalization

** Continuous Time Linear Equalization

WCK/WCKB

RX EQ FIFO

MUX TX

CLK buffer

GDDR6 ideas • High Speed Signaling, 14Gbps ~ 16Gbps, 1.35V

‒ Low jitter clocking with WCK/byte, Per-bit RX/TX equalizer training, X-talk reduction

‒ 2 channel with BL16, same Clock/ADD freq., twice of WCK/DQ freq.

Target Timing WCK Clocking

WCK tree WCK

GPU DRAM

Noise immune DRAM

Word Byte

14Gbps

~16Gbps

CK : 1.75Gbps

CMD : 1.75Gbps

ADDR : 3.5Gbps

WCK : 3.5Gbps

DQ : 7Gbps

CK : 1.75Gbps

CA : 3.5Gbps

WCK : 7Gbps

DQ : 14~Gbps

Source: Samsung

Low cost HBM for consumer segment • ~ 200GBps with smaller # of TSV compared to HBM2

‒ Cost competitiveness ; remove buffer die, reduce # of TSV, organic interposer, etc..

‒ Need inputs from Client segment for specific features

Challenges

1. IO reduction, Smaller # of TSV

2. Remove buffer die

3. Master/Slave structure

4. Remove ECC

5. Si or organic Interposer

Challenge for HBM Comparison

HBM2 Low cost HBM

I/O 1024 ~512

Pin speed 2Gbps 3Gbps ~

BW (GB/s) 256 ~ 200

Cost/GB 1 0.X

Si Interposer

Buffer

Logic Processor

Source: Samsung

PIM, Deep Learning in DRAM • Parallel processing in buffer to reduce extreme-bandwidth

‒ convolution, subsampling, matrix calculation

• Collaborate with accelerator for performance/cost

Processing in Buffer Extreme B/W Requirement

HBM/GDDRx

Accelerator

xHBM xHBM

GPU+HBM/GDDRx

Acc.s+xHBMs*

X10 (# of core)

Convolution / Subsampling Deep Learning

In Buffer

Accelerator

xHBM xHBM

X10 (# of core)

* xHBM: Extreme HBM

Data movement reduction

Low power mobile technology ( >20%)

• Motivation for low power mobile

• LP4X / LP5

• PIM

Motivation for low power mobile

ation [

‘00 ‘20 ‘10 ‘15 ‘05

Thermal Limit (hand-held device)

Static Power

Dynamic Power Power Gap

Lower Power design

[Year]

Power Dissipation Trend

TDP [Watt]

0 300 100 200

Desktop

Notebook

Mobile

Oculus Rift (+GTX Card) PC Graphic

Performance

• PC-level graphic performance and mobile power budget

• Power is continuously increasing with limited thermal budget

Source: Samsung

Performance vs. TDP

*TDP(Thermal Design Power)

Lower power solution, LP4X

LP4X Power Reduction LP4X Idea

• LP4X : 4266Mbps, VDDQ/VDD = 0.6V/1.1V

‒ IO power reduction with 0.6V VDDQ, Good example of small change but big gain

Source: Samsung

CHANNELVOH

VDDQ (=1.1V)VOH =VDDQ/3

VREF =VOH/2

CHANNELVO

VDDQL (=0.6V)

from AP

VDDQ (=1.0V)

VOH =VDDQ/2

VREF =VOH/2

•Conditions : IDD4R(VDDQ+VDD2) Spec Value / 50% Data change each burst transfer / Included process node contribution

LP4 3200 LP4 3733 LP4 4266 LP4X 4266

18% Total Power Saving!!!! LP4

Same Swing

Same VOH

Half-level VDDQ

LP5 target & ideas • LP5 : 6400Mbps, VDDQ/VDD < 0.6V/1.1V

‒ Extremely high band-width(~6.4Gbps) and smart power reduction(~20%)

LP5 ideas Power Efficiency Trend

Source: Samsung

LP2 LP3 LP4 LP4X LP5

[mW/Gbps]

* Pin Speed

- LP2 : 800Mbps ~ 1066

- LP3 : 1600Mbps ~ 1866

- LP4 : 3200Mbps ~ 3733

- LP4X : 4266Mbps

CMD Based

Data CLK(WCK)

Center-tap term

Sleep Mode

• IDD4W/R reduction

• IDD6 reduction

• IDD2N reduction

CK/CKLPDDR4 :

BL0 BL1 BL2 BL11 BL12 BL13 BL14 BL15

DQS_C/T

LPDDR5 :

BL0 BL1 BL2 BL11 BL12 BL13 BL14 BL15

(Single-Ended)

Over 50%

IDD2N Reduction

Over 5%

IDD4W/4R Reduction

Over 30%

IDD6 Reduction

PIM, Lower power processing • Memory off-loading for reduced power consumption

‒ Reduce the unnecessary data transfer and frame rate control

• Collaborate with SoC/AP for performance/power

‒ PoC with special memory for post/pre-processing

AP CIS Display

AMBA AHB

Display

CIS AP

Pre/Post Processing In Memory

Memory Off-loading Solution Memory B/W Traffic

Recognition Distortion FRC Correction Limited Power Budget

Severe

Data Traffic

Conclusion

• Memory requirements have become more strict in time with respect to

performance, power, and cost

• Keeps innovating technology to correspond to those requirements

‒ Make efforts to extend the value of traditional memory

‒ Figure out innovative memory solution

• Close collaboration with partners is essential for delivering the right

memory solution.

kjh5555@samsung.com

The future of graphic and mobile memory for new applications · August 21st, 2016 l JIN KIM l...

Documents

In-memory technologies: The evolution of a revolution Dbriefs graphic recording

Image Features Extraction for Mobile Robots Navigation - Memory

10 Desirable Mobile App Graphic Design trends for 2016

Selecting the Right ISSI Industrial Grade Memory · RLDRAM 10-20ns Mobile DRAM 16Mb-1Gb Memory Access No DRC*

UI UX, Logo, Graphic design, Web, Mobile, Email template

Next Generation Storage Solution · Mobile Memory Forum 2011 Next Generation Storage Solution - UFS standard and its features - Mobile Memory Forum China, Taiwan & Korea 20 ~ 24 of

BOOM: Enabling Mobile Memory Based Low-Power Server DIMMs · BOOM: Enabling Mobile Memory Based Low-Power Server DIMMs Doe Hyun Yoon Jichuan Chang Naveen Muralimanohar Parthasarathy

Techtiip - ArchestrA Graphic Development Guidelines€¦ · can utilize more processor time and take up more memory than traditional InTouch graphics. ... ArchestrA Graphic Development

Accelerate Mobile Apps with In-Memory Computing · Accelerate Mobile Apps with In-Memory Computing ... •Enable wide range of products and services Wearables ... Microsoft Azure

Theory and practice of flash memory mobile forensicsccf.cs.uml.edu/forensicspapers/Theory and practice of flash memory...Theory and practice of flash memory mobile forensics ... (Samsung,

Mobile banking graphic

Graphic Novels: Memory through Illustration

Static Memory Management for Efficient Mobile Sensing Applications

Flash Memory Guide - Kingston Technologymedia.kingston.com/pdfs/MKF_283.1_Flash_Memory_Guide_EN.pdf · Flash Memory Guide Portable Flash ... cameras, mobile phones and other devices

Forensic Analysis of Mobile Phone Internal Memory

Mobiiliohjelmointi Kevät 2009 1 2. Memory Management Basics of memory usage in mobile devicesBasics of memory usage in mobile devices –Static and dynamic

Hearst Digital Media Mobile Info Graphic

Maximize Your Mobile Experience - eMoney Advisor Blogblog.emoneyadvisor.com/.../eMoney_MobileExperience.pdfexperience of mobile, including graphic design, branding and layout. Design

Mobile memory dumps, MSAB and MPE+ - index-of.co.ukindex-of.co.uk/presentation/emes_04_dumps_tools_analysis_w3.pdf · Mobile memory dumps, MSAB and MPE+ Data collection Information

Vietnam Mobile Day 2013: Memory Management For Android Apps