7/24/2019 The Future of Innovation in Computing.pdf
Copyright IBM Corporation 2014
The Future of Innovation in Computing
Jeff Stuecheli
Hardware Architect
IBM
The last 20 years
POWER1 (1990)
Execution BW
6 x 10^7 FLOPS
Storage BW
240 MB/sec to L1
144 MB/sec to 1 GB DRAM
POWER7 QCM (2011)
Execution BW
1 x 10^12 FLOPS
Storage BW
6 TB/sec to L1
6 TB/sec to L2
3 TB/sec to L3
400 GB/sec to 1 TB DRAM
Improvement
16000x (execution BW)
25000x (L1 BW)
2700x (DRAM BW)
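The three ratios on this slide can be checked against the POWER1 figures. The sketch below assumes decimal units and assumes the ratios apply, in order, to execution bandwidth, L1 bandwidth, and DRAM bandwidth; the dictionary keys are illustrative names only. Under those assumptions the execution ratio implies roughly 1 x 10^12 FLOPS, and the storage ratios land on 6 TB/sec and about 400 GB/sec.

```python
# POWER1 (1990) figures quoted on the slide.
power1 = {"flops": 6e7, "l1_bytes_per_s": 240e6, "dram_bytes_per_s": 144e6}

# Improvement ratios quoted on the slide
# (assumed order: execution, L1, DRAM).
ratios = {"flops": 16_000, "l1_bytes_per_s": 25_000, "dram_bytes_per_s": 2_700}

# Figures implied for the 2011 system by applying each ratio.
implied = {name: power1[name] * ratios[name] for name in power1}

for name, value in implied.items():
    print(f"{name}: {value:.2e}")
```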
Innovations in those 20 years
1.0 µm to 45 nm feature size
~500x the transistor density
25 MHz to 4 GHz clock rates
133x the clock rate (enabled by faster gates and deeper pipelines)
25 MHz to 6.4 GHz busses
High frequency communication
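The ~500x density figure follows from the feature-size shrink if device area scales with the square of the feature size. A minimal sketch, assuming the starting feature size is 1.0 µm and that area scaling is ideal:

```python
# Feature sizes from the slide, both in nanometres.
old_feature_nm = 1000.0  # 1.0 um
new_feature_nm = 45.0

# Linear shrink, and the density gain if device area scales with its square.
linear_shrink = old_feature_nm / new_feature_nm
density_gain = linear_shrink ** 2

print(f"linear shrink: {linear_shrink:.1f}x, density gain: {density_gain:.0f}x")
```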
The next 20 years
At the same rate (16k x), in 20 years 44 Watsons would fit in 1U of rack space!
But,
Gates would be 125 pm ( < 1 Si atom wide )
Voltage scaling limits
Prior power increases were offset by lower voltage operation (Dennard scaling)
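Dennard scaling can be sketched with the usual dynamic power relation P = C * V^2 * f. The function below is an idealized textbook model, not taken from the talk: shrinking by a factor k is assumed to cut capacitance by k, raise frequency by k, and pack k^2 more devices per unit area. Power density stays flat only if voltage also drops by k.

```python
def power_density_gain(k: float, voltage_scales: bool) -> float:
    """Relative dynamic power per unit area after a linear shrink by k.

    Idealized model: P = C * V^2 * f per device, k^2 more devices per area.
    """
    capacitance = 1.0 / k                        # smaller devices, less charge
    voltage = 1.0 / k if voltage_scales else 1.0
    frequency = k                                # shorter gates switch faster
    power_per_device = capacitance * voltage ** 2 * frequency
    devices_per_area = k ** 2
    return power_per_device * devices_per_area


print(power_density_gain(2.0, voltage_scales=True))   # voltage follows the shrink
print(power_density_gain(2.0, voltage_scales=False))  # voltage stuck at its floor
```

With voltage scaling the power density is unchanged; once voltage stops scaling, the same shrink multiplies power density by k^2.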
What will the future bring?
For computer architects, it's likely more exciting than the last 20 years
Vision presented in this talk: 16k x is possible!
More diverse innovations
More gates without smaller devices (cheaply manufactured)
3D structures
Power reduction through more efficient gate utilization
Gate leakage (Power gating)
Integration
Sophisticated power management/voltage control
Reconfigurable logic
Higher power density through advancement in packaging and cooling technologies
Liquid replaces air cooling
Energy recovery through reuse
System integration enables higher bandwidth at reduced energy
Si interposer based communication
Optics
OpenPOWER: The Beginning
IBM Stack
Research and Innovation
IBM, NVIDIA, TYAN, Mellanox
OpenPOWER
Open Innovation
What is OpenPOWER?
Industry Consortium focused on Innovation
- Across Server HW / SW stack
- For customized servers and components
- Leveraging complementary skills and investments
- To provide differentiated architectural alternatives
Benefits for Clients
New Innovators on Power Platform = More Value
OpenPOWER = Greater choice for IBM Clients
More Innovation = Increased Adoption of Power
OpenPOWER: Today
Boards / Systems
I/O / Storage / Acceleration
Chip / SOC
System / Software / Services
Implementation / HPC / Research
Growing transistors without process shrinks
Current industry expectation is ~6 nm in 2026 (International Technology Roadmap for Semiconductors)
Moore's law would predict 0.25 nm
Density doubles every 4 years instead of 2 years
How can we achieve more ~gates without smaller transistors?
Every 4 years we need some other doubling improvement to stay on the 2-year growth rate
Today's example: eDRAM
eDRAM reduces both transistors and energy for semiconductor arrays
~Equivalent to a new process generation
Beyond more gates, more useful gates
Remove gates through integration, optimization
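The gap between the two doubling rates can be made concrete. The sketch below assumes clean exponential growth over a 20-year horizon (the horizon is an illustrative choice). The shortfall between 2-year and 4-year doubling is itself one doubling every 4 years, which is the extra improvement the slide asks for.

```python
def density_gain(years: float, doubling_period_years: float) -> float:
    """Density multiplier after `years` of steady exponential doubling."""
    return 2.0 ** (years / doubling_period_years)


years = 20
historical = density_gain(years, 2)  # doubling every 2 years
slowed = density_gain(years, 4)      # doubling every 4 years
gap = historical / slowed            # shortfall to be covered by other means
extra_doublings = years / 4          # one non-shrink doubling per 4 years

print(historical, slowed, gap, extra_doublings)
```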
3D Stacking
Many levels of chips with low-power, low-latency communication
Enables larger caches stacked below the CPU
Enables larger chips with good yield
DRAM TSV enables larger capacity without power and frequency cuts
Gates in 3D are better than 2D
CPU design example
Today's 2D designs: larger structures introduce longer physical distances, creating a design conflict
Example: multi-level design structures
TLB (Translation Lookaside Buffer): the POWER7 design uses a two-level structure. The second tier is outside the critical logic path, but adds area, making other structures farther apart.
Data/Instruction caches: tertiary levels are inherently forced to sit adjacent to (vs inside) the CPU core. This results in wide, high-power busses crossing large distances.
Figure: chip floorplan. Labels: Core, L2 Cache, Mem Ctrl, L3 Cache and Chi..., Local SMP Links, Remote SMP Links, Fast Local L3 Region
Gates in 3D advantage
Critical high power execution core takes up the penthouse
(where heat can more easily be removed)
Smaller core yields higher frequency and reduced energy
Second level structures pulled under the core
Can grow to ideal size without hurting critical execution loop.
L3 cache pulled to 3rd level
4th level can be power control, IO transceivers, more cache
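A rough geometric sketch of why stacking shortens wires. The assumptions are not from the talk: the logic is modeled as a square, wires run on a Manhattan grid, the vertical hop between layers is ignored, and the 400 mm^2 area is a hypothetical figure. Splitting the same area over more layers shrinks each layer's side, and with it the worst-case horizontal distance.

```python
import math


def worst_case_wire_mm(total_area_mm2: float, layers: int) -> float:
    """Corner-to-corner Manhattan distance on one square layer.

    The total area is split evenly across `layers`; vertical hops are ignored.
    """
    side = math.sqrt(total_area_mm2 / layers)
    return 2.0 * side


area_mm2 = 400.0  # hypothetical total logic area
for layers in (1, 2, 4):
    print(layers, round(worst_case_wire_mm(area_mm2, layers), 1))
```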
Staging current CPUs into many layers
Key limiter is available vertical wiring channels
1st generation: Pull external interface logic and voltage regulation into lower layer
2nd generation: Add an L3 cache layer
3rd generation: Move L2 cache and large second level core centric structures (TLB, Predictors)
4th generation: Two layer CPU, execution units on top, L1 caches below
Efficient integration with Si Interposers
Use old manufacturing line to produce large active Si interconnect (base layer of 3D)
Enables efficient communication between CPU compute stacks, memory stacks, accelerators, and optical transceivers.
MCM (Multi-chip-module), where module is active Si logic.
Enables very high bandwidth/low power interconnect
Conceptually, system could fit on the Si carrier, with optical external attach points
Much lower power interconnect
Micro bumps
Si interposer communication advantages
On vs Off chip communication
On chip
Bucket brigade
Clock skew managed along path
Wire pitch ~10s of nm
Off chip
Wave pulses along a string
Clock skew managed at endpoint
Wire pitch ~10s of µm
Circuit based energy improvements
Power gating: Turn off voltage to prevent gate leakage
Utilized in POWER7+ core (entire core as one domain)
Multi-cycle transition
Current server class designs gate large blocks (entire CPU)
Required to provide voltage stability through capacitance in power grid
Fine grain power gating will become possible in server space with sophisticated 3D based power delivery.
Potential ~4x reduction in leakage power.
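A toy comparison of coarse and fine grain power gating. Every number here is hypothetical and chosen only so that the result lands near the ~4x scale quoted above; the block count, per-block leakage, and idle fractions are not from the talk. The model assumes a gated block leaks nothing and ignores transition cost.

```python
def leakage_coarse(block_leakage_w, core_idle_fraction):
    """Whole core is one gating domain: leakage stops only while all of it is idle."""
    return sum(block_leakage_w) * (1.0 - core_idle_fraction)


def leakage_fine(block_leakage_w, block_idle_fractions):
    """Each block is its own domain: leakage stops whenever that block is idle."""
    return sum(
        watts * (1.0 - idle)
        for watts, idle in zip(block_leakage_w, block_idle_fractions)
    )


# Hypothetical core: four blocks leaking 2 W each, each idle part of the time,
# but never all idle at once, so coarse gating never gets a chance to engage.
blocks_w = [2.0, 2.0, 2.0, 2.0]
idle_fractions = [0.9, 0.8, 0.7, 0.6]

coarse_w = leakage_coarse(blocks_w, core_idle_fraction=0.0)
fine_w = leakage_fine(blocks_w, idle_fractions)
print(coarse_w, fine_w, coarse_w / fine_w)
```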
IBM Research Zurich
Scalable Heat Removal by Interlayer Cooling
3D integration requires interlayer cooling for stacked logic chips
Bonding scheme to isolate electrical interconnects from coolant
Through silicon via electrical bonding and water insulation scheme
A large fraction of energy in computers is spent for data transport
Shrinking computers saves energy
Test vehicle with fluid manifold and connection
Microchannel
Pin fin
Future Memory
Today
DRAM: ~100 ns, read and write durable, volatile
Technology scaling slowdown
FLASH: ~100 us, read durable
Disk: ~10 ms, read and write durable
Tomorrow
Phase Change Memory (PCM)
Resistive RAM (RRAM)
~100 ns read
~1 us write
Read durable
Non-volatile
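The latency figures on this slide can be combined into an average read latency for a tiered memory system. The access mix below (99% of reads served from DRAM, 1% from the next tier) is a hypothetical workload, not a measurement, and only read latency is modeled.

```python
# Approximate read latencies from the slide, in seconds.
LATENCY_S = {
    "dram": 100e-9,
    "pcm_read": 100e-9,
    "flash": 100e-6,
    "disk": 10e-3,
}


def average_read_latency(access_fractions):
    """Weighted mean latency; the fractions per tier must sum to 1."""
    if abs(sum(access_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("access fractions must sum to 1")
    return sum(LATENCY_S[tier] * share for tier, share in access_fractions.items())


# Hypothetical mix: the 1% of reads that miss DRAM go to disk today,
# or to PCM/RRAM-class memory tomorrow.
today_s = average_read_latency({"dram": 0.99, "disk": 0.01})
tomorrow_s = average_read_latency({"dram": 0.99, "pcm_read": 0.01})
print(f"today: {today_s:.3e} s, tomorrow: {tomorrow_s:.3e} s")
```

Even a 1% miss rate to disk dominates the average, which is why a non-volatile tier at DRAM-like read latency changes the picture.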
Copyright IBM Corporation 2011
Copyright IBM Corporation 2013
Disruptive Optics Evolution and Silicon Photonics
10 Gb/s = 24 GB/s, 1 Color (Deuce)
25 Gb/s = 60 GB/s, 1 Color (Deuce)
25 Gb/s = 240 GB/s, 4 Color (Deuce)
Silicon Photonics, Multi-wavelength, 25 Gb/s Optics
Optical Interconnects
POWER7 775 HPC system
High density IO off module optical transceivers
Physical escape density
P7 775 HPC network chip shown below
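The three bandwidth figures on this slide are consistent with simple linear scaling. The sketch below takes the 10 Gb/s, 1 color, 24 GB/s point as a baseline and assumes aggregate bandwidth scales linearly with per-lane rate and with the number of colors (wavelengths) at an unchanged fiber count; that scaling rule is an assumption, not something the slide states.

```python
# Baseline point from the slide: 10 Gb/s lanes, 1 color, 24 GB/s aggregate.
BASE_LANE_GBPS = 10.0
BASE_TOTAL_GBYTES_PER_S = 24.0


def aggregate_gbytes_per_s(lane_gbps: float, colors: int) -> float:
    """Aggregate bandwidth, assuming linear scaling in lane rate and color count."""
    return BASE_TOTAL_GBYTES_PER_S * (lane_gbps / BASE_LANE_GBPS) * colors


for lane_gbps, colors in ((10.0, 1), (25.0, 1), (25.0, 4)):
    print(lane_gbps, colors, aggregate_gbytes_per_s(lane_gbps, colors))
```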
Heterogeneous Computing
ASIC - An application-specific integrated circuit (ASIC) is an integrated circuit (IC) customized for a particular use, rather than intended for general-purpose use. For example, a chip designed solely to run a specific cell phone is an ASIC.
FPGA - A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by the customer or designer after manufacturing, hence "field-programmable".
GPGPU - A General Purpose Graphics Processing Unit is a massively threaded processing engine capable of accelerating highly parallel computation programs using many very lightweight threads.
FPGA Trends
This 3x rise in LE density occurred as technology shifted from lagging edge (180nm) to leading edge (40nm). This brings new speeds and capabilities and lower costs (still preserving >60% margins), while ASIC costs are expected to rise exponentially.
=> This has primed a tipping point in the industry.
Figure: High End FPGA Price to Logic Ratio (100Ku/yr.), in $/KLE's on a log axis from 0.1 to 100.0, by year from 1993 to 2014.
Data labels, in order: 74.4, 49.6, 35.6, 31.3, 26.0, 14.0, 9.0, 6.6, 6.2, 5.3, 3.8, 3.0, 2.8, 2.5, 2.0, 1.5, 1.4, 1.2, 1.1, 0.9, 0.8, 0.7
Note: DSP, memory blocks, hard IP (e.g. PCIe) added over time
Chart annotations: FPGA Capability; Cost per LE (MID-RANGE); Current Field Deployed Appliances; WFO PoC / Datapower
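The chart's data labels allow a rough estimate of the average yearly price decline. The sketch assumes the 22 labels map in order onto the 22 years from 1993 to 2014, which the extracted chart suggests but does not confirm. Under that assumption the price per thousand LEs falls by roughly 20% per year.

```python
# Data labels from the chart, in $ per thousand logic elements.
# Assumption: one label per year, in order, from 1993 through 2014.
prices = [74.4, 49.6, 35.6, 31.3, 26.0, 14.0, 9.0, 6.6, 6.2, 5.3, 3.8,
          3.0, 2.8, 2.5, 2.0, 1.5, 1.4, 1.2, 1.1, 0.9, 0.8, 0.7]
years = list(range(1993, 2015))

# Compound annual factor between the first and last data points.
intervals = len(prices) - 1
annual_factor = (prices[-1] / prices[0]) ** (1.0 / intervals)
annual_decline_pct = (1.0 - annual_factor) * 100.0

print(f"{years[0]}-{years[-1]}: about {annual_decline_pct:.0f}% cheaper per year")
```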
CPU vs ASIC vs FPGA efficiency
Custom logic
4 GHz
Highly optimized for one task
At time of fabrication
Generic logic
250 MHz
Configurable for ~any task
Change at any time
Figure: CPU core floorplan. Labels: Pervasive, Instruction Fetch and Decode, Instruction Sequencing, Decimal Unit, Vector and Scalar Unit, Load/Store Unit, Fixed Point Unit
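The clock rates on this slide imply a break-even level of parallelism. The sketch uses a deliberately simple throughput model, clock rate times operations completed per cycle, which is an assumption of this example: generic logic at 250 MHz must complete 16 operations per cycle to match custom logic completing one per cycle at 4 GHz.

```python
# Clock rates from the slide.
CUSTOM_LOGIC_HZ = 4e9     # custom logic
GENERIC_LOGIC_HZ = 250e6  # generic (reconfigurable) logic


def ops_per_second(clock_hz: float, ops_per_cycle: float) -> float:
    """Simple throughput model: operations completed per second."""
    return clock_hz * ops_per_cycle


# Parallel operations per cycle the slower fabric needs to break even.
break_even_parallelism = CUSTOM_LOGIC_HZ / GENERIC_LOGIC_HZ
print(break_even_parallelism)
```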
FPGAs and Workload Optimized Systems
Big data has inherent data parallel components
Data compression
Algorithms in logic 10-100x more efficient than CPU based
Negligible latency
Increases effective disk capacity
Increases effective disk and network bandwidth
FPGA logic can be used to sift large volumes of data, which is then passed to the CPUs for detailed analysis
Packaged solutions hide FPGA programming complexity from user
Workload optimized appliance delivery model
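How compression raises effective disk capacity can be sketched in software. Python's `zlib` stands in here for compression done in FPGA logic, and the sample log line and the 1000 GB raw capacity are made-up values for illustration. The same ratio applies to effective disk and network bandwidth.

```python
import zlib


def effective_capacity_gb(raw_capacity_gb: float, sample: bytes) -> float:
    """Scale raw capacity by the compression ratio measured on a data sample."""
    compressed = zlib.compress(sample, 6)
    ratio = len(sample) / len(compressed)
    return raw_capacity_gb * ratio


# Hypothetical, highly repetitive log data; real ratios depend on the workload.
sample = b"timestamp=2014-01-01 level=INFO msg=heartbeat ok\n" * 1000
print(effective_capacity_gb(1000.0, sample))
```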
Software challenges
All of the following apply to:
Applications
Middleware: Compilers, database, etc.
System SW: OS, hypervisors, cluster, etc.
Parallel programming
Accelerator usage (heterogeneous computing)
Workload partitioning
FPGA compilation
Tiered memory management
More levels, diverse types
Melding of main memory and storage
EDA tools required to support complex design structures and circuit power optimization (e.g. productive fine grain power gating, diffraction mask generation, etc.)
IBM as the Innovator
Only company with the resources to design such integrated systems
World leading
Technology
Research labs
Hardware design
Software design
System design