3D Landscape for the Post-Moore Transition. Mustafa Badaroglu and Xiang Xu, Huawei Technologies. 3D Workshop at LETI Innovation Days, Grenoble, France, June 24-28, 2019.

Page 1: 3D Landscape for the Post-Moore Transition

3D Landscape for the Post-Moore Transition
Mustafa Badaroglu and Xiang Xu
Huawei Technologies

3D Workshop at LETI Innovation Days
Grenoble, France, June 24-28, 2019

Page 2: 3D Landscape for the Post-Moore Transition

Outline

• More Moore scaling drivers

• Stall of more Moore scaling

• Landscape of fine-pitch stacking technologies

• Exploring the cost question

• 3D-VLSI era

• Conclusions

Page 3: 3D Landscape for the Post-Moore Transition

Huawei: Leading Global Provider of ICT Infrastructure and Smart Devices

Bring digital to every person, home and organization for a fully connected, intelligent world

Huawei's end-to-end portfolio of products, solutions and services is both competitive and secure. Through open collaboration with ecosystem partners, we create lasting value for our customers, working to empower people, enrich home life, and inspire innovation in organizations of all shapes and sizes.

At Huawei, innovation focuses on customer needs. We invest heavily in basic research, concentrating on technological breakthroughs that drive the world forward.

• 180,000 employees
• 80,000 R&D employees
• 72nd in the Fortune Global 500
• 170+ countries
• 68th in Interbrand's Top 100 Best Global Brands

Page 4: 3D Landscape for the Post-Moore Transition

Focusing on Smart Devices, Connectivity, Computing, and Cloud; Providing Products and Solutions for Three Customer Groups

• Consumer Products & Services: Iconic global technology brand
• Carrier Products & Services: Best strategic partner for carriers
• Enterprise Products & Services: Enabler and preferred partner for digital transformation
• Cloud Products & Services: Cloud partner with reliable, trusted, evolvable services, pushing to achieve affordable, effective, and reliable inclusive AI

Global carriers

Storage and servers

Data center

Private cloud

Enterprise EI

AI platforms

Big data analytics

Cloud services

Video surveillance

Enterprise communications

Network energy

Information Distribution & Presentation

Information Processing & Storage

Information Learning and Inference

Smart devices, Connection, Computing and cloud

Information Transmission

+Intelligent

Smartphone, Wearables, Smart home devices, Vehicle-mounted terminals, Life services for all scenarios

Wireless networks, Fixed networks, Core networks, Operation support systems

Intelligent O&M, Enterprise networks, IoT networks

Page 5: 3D Landscape for the Post-Moore Transition

Cloud and Mobile Computing Drive More Moore

• Device-interconnect-memory technologies for mobile and cloud computing

• Edge computing - additional functionality, biometrics, and display/camera/sensing for increased consumer value

• 2.5D/3D integration to scale memory bandwidth / power and latency

Big Data and abundant computing power are pushing computing to the Cloud

• Micro (data) servers
• Increasing data on cloud requires high-density memories

Instant Data generated by sensors and users are pushing computing to the Edge

• Mobile computing
• AI at edge requires fast and high-density memories

Page 6: 3D Landscape for the Post-Moore Transition

More Moore Device Structure Evolution - IRDS

Page 7: 3D Landscape for the Post-Moore Transition

Big Wins that We Note Today

• FinFET has served high-performance and mobile applications successfully since 2010, with a good outlook until 2022-2025

• GAA devices now have significant traction as the mainstream logic device candidate for both high-performance and low-power applications

• Years of R&D on EUV are paying off: EUV in HVM very soon beyond 7nm, a critical enabler for cost scaling

• 3D NAND flash has reached a density of 1 Tbit/chip

• Emerging non-volatile memories in manufacturing:
  • MRAM is a proven enabler of memory solutions in IoT applications and is moving towards last-level cache replacement
  • RRAM has a stake in IoT edge devices, and PCRAM in storage-class-memory applications

• Significant progress in fine-pitch 3D integration, marching towards becoming the mainstream computation fabric:
  • Memory technologies: 3D HBM DRAM stack (128 Gbit/cube)
  • BSI image sensors (<1um pitch) with the integration of sensor, read-out, and processing
  • Energy-efficient and high-bandwidth GPU, network processor, and machine learning on an HBM Si interposer (64GB memory, 2TB bandwidth at 512GB/sec)
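The HBM figures quoted above (128 Gbit/cube, 64GB memory, 2TB bandwidth at 512GB/sec) can be cross-checked with simple arithmetic. The sketch below assumes four cubes per interposer and reads 512GB/sec as a per-cube figure; neither assumption is stated on the slide, and the names `NUM_CUBES` and `system_totals` are illustrative only.

```python
# Arithmetic check of the HBM interposer figures quoted on the slide.
# ASSUMPTION: four HBM cubes per interposer (the slide gives no count).
# ASSUMPTION: 512 GB/s is the bandwidth of one cube.
GBIT_PER_CUBE = 128        # from the slide: 128 Gbit per HBM cube
GB_PER_S_PER_CUBE = 512    # from the slide: 512 GB/sec
NUM_CUBES = 4              # hypothetical


def system_totals(num_cubes: int) -> tuple[float, float]:
    """Return (capacity in GB, bandwidth in TB/s) for num_cubes cubes."""
    capacity_gb = num_cubes * GBIT_PER_CUBE / 8             # 8 bits per byte
    bandwidth_tb_s = num_cubes * GB_PER_S_PER_CUBE / 1024   # binary TB
    return capacity_gb, bandwidth_tb_s


capacity_gb, bandwidth_tb_s = system_totals(NUM_CUBES)
print(f"{capacity_gb:.0f} GB, {bandwidth_tb_s:.0f} TB/s")  # prints "64 GB, 2 TB/s"
```

Under these assumptions the capacity and aggregate bandwidth both match the quoted totals, which suggests (but does not prove) that the slide describes a four-cube configuration.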


Page 8: 3D Landscape for the Post-Moore Transition

Critical Challenges of More Moore Scaling – IRDS

• Near Term (2018-2025)
  • Power scaling
  • Parasitics scaling
  • Cost reduction
  • Integration enablement for SRAM-cache applications
  • Interconnect scalability

• Long Term (2025-2033)
  • Power scaling
  • No foreseeable beyond-CMOS (e.g. steep-SS) device in the HVM horizon
  • Integration of vertical device structures at the memory-logic interface
  • Thermal issue due to increased power density
  • Cost reduction with 3D integration
  • Adoption of non-Cu interconnects for low resistance and meeting EM/TDDB


Page 9: 3D Landscape for the Post-Moore Transition

Frequency and Power Scaling Challenges - IRDS

• Frequency scaling is limited by parasitics and stalls after 2028

• Thermal limits (increasing power density) are a threat to realizing system-frequency performance improvements

• Power reduction is limited by the slow-down in Vdd and capacitance scaling
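The link between these three bullets can be made explicit with the standard first-order expression for CMOS switching power. This relation is textbook background rather than something shown on the slide, and the activity factor \alpha is a symbol introduced here for illustration.

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f
```

With V_dd and C nearly flat from node to node, power at a fixed frequency f barely falls, while shrinking area raises power density. That is why any remaining frequency gain runs into the thermal limit.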

Page 10: 3D Landscape for the Post-Moore Transition

Dimensional Scaling Expected to Stall >2028 - IRDS

2D design rule limits (mostly electrical)

• Metal pitch at 16nm: Rwire, TDDB

• CPP at 40nm: Rext, MOL parasitics

• Lgate at 12nm: SS, RMG process, Vt tuning

Page 11: 3D Landscape for the Post-Moore Transition

Demonstrated Fine-Pitch Assembly/Stacking Technologies

Technologies shown:

• FO+POP
• W2W stacking + TSV + uBump
• W2W F2F dielectric bonding + TSV
• W2W F2F hybrid bonding w/o TSV
• Si Interposer + TSV
• EMIB

Product examples shown:

• Sony: CMOS Image Sensors
• SK Hynix: HBM memory
• Apple A10 on FO+POP
• AMD Fiji GPU
• Intel: Altera Stratix 10 FPGA

Page 12: 3D Landscape for the Post-Moore Transition

3D Landscape – Manufacturing Status

Fine pitch D2W / W2W assembly

Columns: Technology, Market, Tech Characteristics, Scaling driver, Challenges

• FO+POP
  • Market: Mobile
  • Tech characteristics: chip-first/last, face-up/down, ~2-10um RDL pitch, 40um ubump pitch
  • Scaling driver: Multi-stack FO, RDL pitch, #layers
  • Challenges: Warpage, MC topography

• W2W stacking w/ TSV + uBump
  • Market: Memory
  • Tech characteristics: F2B, 40um TSV pitch, 10x50um TSV, 50um thick Si
  • Scaling driver: Wafer thickness, uBump pitch
  • Challenges: Size matching, TSV

• W2W F2F dielectric/hybrid bonding
  • Market: Memory, Imagers
  • Tech characteristics: F2F, <1um pad pitch, 1x5um TSV, 5um thick Si
  • Scaling driver: Wafer thickness, uBump pitch
  • Challenges: Size matching, TSV, pad density, alignment (~200nm) limits

• Si interposer w/ TSV and w/o TSV
  • Market: HPC, Network
  • Tech characteristics: <1um pitch RDL, 40um ubump pitch, 40um 10x100um TSV pitch, >1200mm2 Si area
  • Scaling driver: Si size, #layers, decap/ESD co-integration
  • Challenges: High cost (~$600 wafer cost adder), test

• EMIB (Embedded Multi-die Interconnect Bridge) in laminate
  • Market: HPC
  • Tech characteristics: <1um pitch interconnect
  • Scaling driver: FO compatibility
  • Challenges: EMIB alignment to FO/laminate

Page 13: 3D Landscape for the Post-Moore Transition

Need for Fine-Pitch Assembly Technology

• Memory access/bandwidth red brick-wall
  • Laminate substrates not sufficient to provide enough bandwidth

• Mobile converged on fanout (FO) / flip-chip (FC) + POP: xyz form factor reduction and multi-sourcing
• High-performance (HP) converged on Si interposer (<1/1um L/S, >7 layers) w/ TSV: high-performance
• Alternatives such as EMIB, SLIT, SWIFT in the HP applications can eliminate TSV for cost reduction
• FO baseline: embedded dies in FO MC with TMV and 2-layer 5/5um L/S RDL

• Fine-pitch die-on-die/wafer stacking
  • Value: reduced CPI risk, product bundling, smaller (xyz) form factor, die-split (?) if node-n missing function
  • Cost-value questionable, case-dependent
  • Key function: inter-chip communication at high performance and low power, if possible without IOs and ESD
  • W2W stacking: <1um pitch w/ density restrictions; size/KGD-pairing constraints (OK for memories, imagers)
  • Fine-pitch D2W stacking: ~5-10um pitch, not yet manufacturing ready
  • All solutions might require TSV/TDV/FC solutions, which are cost adders

Figure labels:

• Si Interposer
• FO & embedded die
• C4: 130um pitch
• Laminate build-up: 30um pitch, 45um via
• RDL: 10um pitch
• Ball: 0.4mm pitch
• TMV: 90um pitch
• FO+POP
• RDL: <1um pitch
• TSV: 40um pitch, 10x100um
• RDL: 10um pitch

Page 14: 3D Landscape for the Post-Moore Transition

High-Bandwidth Interposer – HBM: SK Hynix Example

Page 15: 3D Landscape for the Post-Moore Transition

Communication between Memory and Logic

Source: SK Hynix

Page 16: 3D Landscape for the Post-Moore Transition

Ideal Assembly for Mobile – Singulated D2W Dies in FO Assembly

• Assembly should support full interconnect hierarchy and heterogeneity

• FO+POP package allowing heterogeneous assembly
  • Multi-source dies: IPD, sensor, memory, high-speed IO, etc.
  • Reduced height, reduced footprint package, core-less laminate
  • Low-cost fine-pitch RDL (~5-10um pitch)

• D2W stacking allowing fine-pitch inter-chip communication
  • Schemes
    • Hybrid bonding, dielectric bonding+TSV (~5um pad pitch)
    • Classical uBump (~40um ubump pitch)
  • Advantages
    • Elimination of CTE and warpage
    • Small form factor and reduced parasitics
    • Node-agnostic, supporting heterogeneous technologies
    • No IOs needed for inter-chip communication
    • Elimination of TSVs and ubumps in one/both of the tiers
  • Die cost saving questionable (w/ 1st order assumptions)
  • Potential use cases
    • Performance: memory w/ logic
    • Small form factor: RF on logic, NVM on logic (IoT)
    • Split die: separate L3/L4 stack, IO and analog, PDN

Connection hierarchy for D2W embedded in FO (typical values):

• Ball pitch: 400um
• Laminate build-up pitch and via CD: 30um, 45um via CD, 4-8 layers
• C4 ball pitch: 130um
• TMV pitch: 70-90um
• FO RDL pitch: 10um, 2 layers
• uBump/FO-bump pitch: 40-90um
• Chip (backside) RDL pitch: 5-10um, 1 layer
• TDV pitch: 10um
• Chip-to-chip pad pitch: <5um

Legend in the original figure: New, Optional
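To see what the connection hierarchy above means in terms of connection count, the pitches can be converted to an areal density. The sketch below assumes a full square-grid array at each level, which gives an upper bound rather than a real layout figure; the function name `connections_per_mm2` and the choice of which end of each range to use are illustrative assumptions.

```python
# Areal connection density for a square grid of pads at a given pitch.
# ASSUMPTION: full area-array placement on a square grid. Real layouts
# reserve keep-out zones, so these numbers are upper bounds.
PITCH_UM = {
    "Ball": 400.0,
    "C4": 130.0,
    "TMV": 90.0,               # upper end of the 70-90um range
    "uBump": 40.0,             # lower end of the 40-90um range
    "TDV": 10.0,
    "Chip-to-chip pad": 5.0,   # slide gives <5um; 5um used as a bound
}


def connections_per_mm2(pitch_um: float) -> float:
    """Number of grid points per mm2 for a square grid with this pitch."""
    per_mm = 1000.0 / pitch_um
    return per_mm ** 2


for name, pitch in PITCH_UM.items():
    print(f"{name:>18}: {connections_per_mm2(pitch):>10.1f} per mm2")
```

Because density grows with the inverse square of the pitch, going from 40um uBumps to 5um pads raises the bound by a factor of 64, which is the quantitative case for fine-pitch chip-to-chip bonding.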

Page 17: 3D Landscape for the Post-Moore Transition

Can SoC Benefit from the Cost Reduction by 3D Adoption?

• Not all functions scale the same
• Not all functions require the same metal layers
• Not all functions require the same transistor set

Metrics by function, in the order: shrinkage per node (avg); # metal layers (7nm); transistor flavor

• SoC chip: 30-35%; 13-14; all
• Digital logic: 40-45%; 13-14; low-mid-hi Vt
• SRAM memory: 30%; 7-8; mid-high Vt
• IO/Analog: 0-15%; 6-7; high gain (I/O)
• RF/Power Mgmt: <10%; 5-6; low noise/HV
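The per-function shrink figures on this slide can be combined into a chip-level shrink by area-weighting. The sketch below is only an illustration: the area fractions are hypothetical (the slide gives none), and the shrink values are midpoints or bounds of the quoted ranges.

```python
# Area-weighted shrink of an SoC built from blocks that scale differently.
# ASSUMPTION: the area fractions are hypothetical, chosen for illustration.
# Shrink values are midpoints (or bounds) of the ranges on the slide.
BLOCKS = {
    # name: (hypothetical area fraction, shrink per node)
    "digital_logic": (0.50, 0.425),  # slide: 40-45%
    "sram":          (0.30, 0.30),   # slide: 30%
    "io_analog":     (0.10, 0.075),  # slide: 0-15%
    "rf_power":      (0.10, 0.10),   # slide: <10%, upper bound used
}


def soc_shrink(blocks: dict[str, tuple[float, float]]) -> float:
    """Return the overall area shrink as the area-weighted block shrink."""
    total_fraction = sum(frac for frac, _ in blocks.values())
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("area fractions must sum to 1")
    return sum(frac * shrink for frac, shrink in blocks.values())


print(f"overall shrink: {soc_shrink(BLOCKS):.1%}")
```

With this hypothetical mix the result falls inside the 30-35% range quoted for the SoC chip. It also shows how the poorly scaling IO/analog and RF blocks drag the average down, which is the motivation for partitioning them onto a separate tier.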

Page 18: 3D Landscape for the Post-Moore Transition

Die Size Determining the Cost Break-Even Point

• Networking/data server chip (area>150mm2) is cheaper whenever 3D integration is used

• 3D more expensive for mobile processors

• Partitioning favors more unified memory architectures
  • L4 SRAM
  • eDRAM
  • MRAM: weight memory for machine learning applications

• Heterogeneous integration use cases for smaller form factor, performance, BoM (package cost reduction, product bundling, time-to-market)
  • mm-Wave
  • IoT for wearables
  • RF-logic product bundling

[Figure: Die Cost Savings, with one axis running from -20% to 25% and the other from 0 to 400 (presumably die area in mm2)]
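Why die size sets the break-even point can be sketched with a first-order yield model. The code below is not the authors' cost model: it assumes a Poisson yield law, a hypothetical defect density `D0_PER_MM2`, a hypothetical fractional stacking cost `STACK_ADDER`, an even two-tier split, and known-good-die pairing with no stacking yield loss.

```python
import math

# ASSUMPTIONS (hypothetical, not taken from the slides):
#   - Poisson yield model, Y = exp(-area * D0)
#   - D0_PER_MM2 and STACK_ADDER are placeholder values
#   - the die splits into two equal tiers, each tested before stacking
D0_PER_MM2 = 0.002    # hypothetical defect density, defects per mm2
STACK_ADDER = 0.15    # hypothetical stacking cost as a fraction of Si cost


def poisson_yield(area_mm2: float, d0: float = D0_PER_MM2) -> float:
    """Fraction of good dies for a given die area."""
    return math.exp(-area_mm2 * d0)


def cost_2d(area_mm2: float) -> float:
    """Relative cost of one good monolithic die (area divided by yield)."""
    return area_mm2 / poisson_yield(area_mm2)


def cost_3d(area_mm2: float) -> float:
    """Relative cost of the same function as two stacked half-area tiers."""
    half = area_mm2 / 2
    silicon = 2 * half / poisson_yield(half)
    return silicon * (1 + STACK_ADDER)


def saving(area_mm2: float) -> float:
    """Fractional cost saving of 3D over 2D (negative means 3D costs more)."""
    return 1 - cost_3d(area_mm2) / cost_2d(area_mm2)


for area in (50, 100, 150, 200, 300, 400):
    print(f"{area:>4} mm2: saving {saving(area):+.1%}")
```

In this model the saving is negative for small dies, where the stacking adder dominates, and turns positive once the yield gain from halving the die area outweighs it. The crossover area depends entirely on the placeholder values, so only the trend should be compared with the slide.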

Page 19: 3D Landscape for the Post-Moore Transition

Technology Capability - Cliffs

Columns: Region, Component, Cliff, Limit

• Chip-to-laminate, Flip-Chip (FC) uBump: cliff 20um pitch, 40um tall, 5um pitch feasible using embedded bumps in polymer; limit Cu/Sn UBM
• Thru-Silicon-Via (TSV): cliff 1x5um, 2um pitch; limit AR=10
• Wafer thickness: cliff 2um w/ grind+CMP+etch w/ SiGe etch stop (ES), 5um w/o ES; limit wafer edge impact to TTV
• Chip-to-chip, W2W pad (F2F): cliff <1um pitch; limit alignment and Cu CMP
• D2W pad (F2F): cliff ~3-8um pitch; limit alignment, z-compliance
• FanOut (FO), RDL: cliff 1/1um L/S, 2.5um thick; limit resist, FO warpage
• Thru-Mold-Via (TMV): cliff 1x5um, 2um pitch; limit AR=10 and FO warpage

IMEC: 20um Cu/Sn uBump pitch

IMEC: 1x5um TSV

IME A-STAR: 2/2um L/S polymer-based Cu RDL

IMEC: 10um uBump pitch using polymer

LETI: 1um pitch W2W Cu pad

Page 20: 3D Landscape for the Post-Moore Transition

Transition to Non-Von Neumann Fine-Pitch 3D Architectures

• Tighter coupling between memory and logic
  • Lower latency
  • Higher bandwidth
  • Energy-efficient computation
  • Vectorization
  • Less clocking overhead

2D: Classical Von Neumann, data placed laterally
Processing time: n times for n units of data

3D: Data placed vertically
Processing time: 1 time for m units of data (m>>n)

Pipeline stages shown: Stage1, Stage2, Stage3, Stage4

Source: IEEE Computer Journal 2017
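The two processing-time statements on this slide can be read as a simple step-count model. The sketch below is one such reading, not the cited source's model; the function names and the `width` parameter (standing in for m) are hypothetical.

```python
import math


def steps_serial(units: int) -> int:
    """2D case as described on the slide: one unit of data per step."""
    return units


def steps_vertical(units: int, width: int) -> int:
    """3D case: `width` units are fetched and processed together per step.

    `width` is a hypothetical name for the slide's m, the number of data
    units reachable in one access through the vertical connections.
    """
    return math.ceil(units / width)


# Example: 1024 units of data with a 1024-wide vertical interface.
print(steps_serial(1024), steps_vertical(1024, 1024))  # prints "1024 1"
```

The model ignores per-step latency and energy, so it only captures the slide's point that a wide vertical interface trades many narrow sequential accesses for a few wide ones.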

Page 21: 3D Landscape for the Post-Moore Transition

2D to 3D Transition - Physical Design Aspects

2D Era: Max Frequency at Low Power

Blocks: CPU, GPU, MEM

• Memory limited
• Limited performance
• Overhead in clocking
• Increasing parasitics
• Yield and cost issues
• Increasing dark Si

3D Era: Max Throughput at Low Power Using Reconfig. Accelerators/Fabric

Figure labels: Tier1, Tier2, Tier3, Tier4; High Bandwidth (x3); MEM + NVM; IO CONNECT (x2); NV RECONFIG ACCELERATORS; CPU Seq.; GPU Seq.

• Memory dominant
• High throughput
• Less overhead in clock
• Less parasitics
• Better yield, low-cost
• Less dark Si
• Parallelism and NVL/M
• Drivers: Big Data, Machine Learning, VR/AR

Page 22: 3D Landscape for the Post-Moore Transition

Architecture Evolution Bringing Beyond CMOS

[Figure: roadmap titled "Enabling Technologies - next 15 years", with Time on the horizontal axis (<2021, 2024, 2027, 2030, 2033) and Complexity on the vertical axis]

Technology categories: 3D integration; Steep-SS Device; Interconnect Device; NVM Logic; High-speed high-density storage class Memory

Technology entries: W2W/D2W Stacking; Sequential Integ.; NCFET; STT-LOGIC; SPIN-LOGIC; MRAM, PCRAM, RRAM; 2D Materials

Architecture stages: Heterogeneous multi-core; Reconfigurable 3D fabric; Neuromorphic Computing; Quantum Computing

Eras: More Moore; More Moore + Beyond CMOS; Beyond CMOS+

Page 23: 3D Landscape for the Post-Moore Transition

Conclusions

• Mobile computing and IoT (cloud+edge) driving More Moore scaling

• Slow-down in pitch scaling tackled with DTCO
  • Lateral-GAA is a viable path for high PPA value but with risks in integration and SoC device
  • 3D VLSI (sequential integration) needed beyond 2031

• Thermal and yield cost challenges for fine-pitch 3D adoption
  • New computational architectures needed to reduce power density
  • Tighter defect control
  • Wafer sort/stacking approaches to tackle the process complexity factor

• Memories gaining momentum with 3D adoption

• 3D stacking approaches
  • Not cost attractive for application processors
  • Motivating cases of memory-on-logic in large chips or product bundling/time-to-market needs
  • Machine learning: bringing memory closer to logic by 3D stacking is key to enabling computational throughput scaling at reduced power

Page 24: 3D Landscape for the Post-Moore Transition

Thank you.