ECE5461: Low Power SoC Design
Tae Hee Han: [email protected]
Semiconductor Systems Engineering
Sungkyunkwan University
Course Information
• Objectives
  - This course covers all major aspects of low-power SoC design and addresses emerging topics related to future design. It explores the many domains and disciplines that affect power consumption, from the system level down to the device level.
• Lecture Schedule
  - Mon./Wed. 9:00 ~ 10:15 AM
• References for this course
  - Findlay Shearer, Power Management in Mobile Devices, Newnes, 2007
  - Jan Rabaey, Low Power Design Essentials, Springer, 2009
  - Liming Xiu, VLSI Circuit Design Methodology Demystified, Wiley-Interscience, 2008
Course Schedule
Week | Contents
Week 1 | Basic concepts, introduction
Week 2 | Battery-aware power management / system-level power estimation
Week 3~4 | System-level power optimization
Week 5 | Algorithm/architecture-level power optimization
Week 6~7 | Logic/circuit/device-level power reduction
Week 8 | Term paper assignment
Week 9 | Logic/circuit/device-level power reduction
Week 10~12 | Case studies
Week 13~15 | Term paper discussion
Week 16 | Final exam
Grading System
• Homework/Term Paper: 50%
• Attendance: 10%
• Final Exam: 30%
• Etc.: 10%
Basic Concept & Introduction
Sad fact: computers turn electrical energy into heat; computation is a byproduct.
Air or water must carry the heat away, or the chip melts.
A 55 Wh battery stores the energy of half a stick of dynamite.
If the battery short-circuits, catastrophe is possible...
What Consumers Care About
• Users want more features in their mobile devices
  - MP3, camera, video, GPS, ...
• Convenient form factor, affordable price
• But they also need long battery life
  - Battery technology is not evolving fast enough! → Need to manage power consumption
Smart Devices and ICT Convergence
Needs vs. Reality
[Figure: battery capacity, processor performance (Moore's Law), and algorithmic/application complexity across the 1G, 2G, 3G, and 4G generations, on a log scale from 1 to 10,000,000]
Problem Statement: Explosive Growth of SoC Complexity
• Massive feature integration is driving SoC complexity to the extreme
  - Multiple processors: CPU, DSP, graphics processor
  - Many high-performance engines: video cores, DMA engines
Distributed Heterogeneous Architectures
What Drives the Widening Power Gap
• Performance and style dictate design
  - To be smarter, devices need more performance and functionality
  - Demand for portability limits the space available for batteries
• Slow battery research and development
  - Cutting-edge battery R&D is focused elsewhere (hybrid vehicle technology)
  - Development focuses on higher-discharge batteries rather than power-conserving ones
• Convergence is the primary contributor
  - Users continue to demand more applications
  - Operators derive increasing revenue from application-based services
Energy and Power
• Energy: the ability to do work
  - Most important in battery-powered systems
• Power: energy per unit time
  - Important even in wall-plug systems, because power becomes heat
• Power draw increases with
  - Vcc
  - Clock speed
  - Temperature
Why are Power & Energy Important?
• Battery life for mobile devices
• Reliability at high temperatures
• Power density (cooling)
  - Limits compaction and integration
• Cost
  - Energy cost
  - Cost of power delivery, cooling system, and packaging
• Environmental issues
  - IT was responsible for 0.53 billion tons of CO2 in 2002
Metrics
• Energy (J) = Power (W) × Time (s)
  - Power is limited by infrastructure (e.g., the power supply)
  - Energy is what the utility charges for, or what a battery can store
• Power density = power / area
  - The major metric for the cooling system
• Combined metrics
  - How to trade off performance for power savings
  - TPS/W, energy × delay (EDP), energy × delay² (ED²P), ...
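The choice of metric can change which design wins. The sketch below compares two hypothetical design points on energy, EDP, and ED²P, using Energy = Power × Time from the list above. The power and delay numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DesignPoint:
    name: str
    power_w: float  # average power while running the task (W), hypothetical
    delay_s: float  # time to complete the task (s), hypothetical

    @property
    def energy_j(self) -> float:
        # Energy (J) = Power (W) x Time (s)
        return self.power_w * self.delay_s

    @property
    def edp(self) -> float:
        # Energy-delay product (J*s)
        return self.energy_j * self.delay_s

    @property
    def ed2p(self) -> float:
        # Energy-delay^2 product (J*s^2): weights performance more heavily
        return self.energy_j * self.delay_s ** 2


# Two hypothetical implementations of the same task (illustrative numbers only)
fast = DesignPoint("fast", power_w=2.0, delay_s=1.0)
slow = DesignPoint("slow", power_w=0.5, delay_s=3.0)

for d in (fast, slow):
    print(f"{d.name}: E={d.energy_j:.2f} J, EDP={d.edp:.2f} J*s, ED2P={d.ed2p:.2f} J*s^2")
```

With these numbers the slower point uses less energy (1.5 J vs. 2.0 J). The faster point wins on EDP (2.0 vs. 4.5) and on ED²P (2.0 vs. 13.5). This is why delay-weighted metrics are used when performance also matters.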
Recall: Charge-based Digital Logic
• Key principles of charge-based digital logic
• Representation of digital states
  - Logic "0": no charge in the capacitor
  - Logic "1": charge stored in the capacitor
• Change of digital state
  - Charge/discharge the capacitor through a resistor
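[Figure: RC charging circuit (supply Vmin, on-resistance Ron, capacitor C, Vout = Q/C) and output voltage vs. time rising from logic "0" to logic "1"]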
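Here is a minimal numerical sketch of the RC model above. It assumes an ideal switch with on-resistance Ron charging C from 0 V toward a supply VDD. The 10 kΩ, 1 fF, and 1.0 V values are hypothetical. The sketch also shows the standard energy bookkeeping: the supply delivers Q·VDD = C·VDD² per 0 → 1 transition, and only half of that ends up stored on the capacitor.

```python
import math


def v_out(t: float, v_dd: float, r_on: float, c: float) -> float:
    """Capacitor voltage when charging from 0 V through r_on toward v_dd."""
    return v_dd * (1.0 - math.exp(-t / (r_on * c)))


def t_50(r_on: float, c: float) -> float:
    """Time for the output to reach 50% of v_dd: t = ln(2) * R * C."""
    return math.log(2) * r_on * c


# Hypothetical values: 10 kOhm on-resistance, 1 fF load, 1.0 V supply
R_ON, C_LOAD, VDD = 10e3, 1e-15, 1.0
tau = R_ON * C_LOAD  # RC time constant (10 ps here)
print(f"tau = {tau * 1e12:.1f} ps, t_50 = {t_50(R_ON, C_LOAD) * 1e12:.2f} ps")

# Energy bookkeeping for one 0 -> 1 transition
q = C_LOAD * VDD                    # charge delivered: Q = C * V
e_supply = q * VDD                  # energy drawn from the supply: C * V^2
e_stored = 0.5 * C_LOAD * VDD ** 2  # energy left on the capacitor: C * V^2 / 2
print(f"E_supply = {e_supply * 1e15:.2f} fJ, E_stored = {e_stored * 1e15:.2f} fJ")
```

The other half of the supplied energy is dissipated in Ron while charging. The stored half is dissipated when the node discharges. So each full 0 → 1 → 0 cycle costs C·VDD² from the supply.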
Power Consumption in ICs
• Dynamic or active power consumption
  - Charging and discharging capacitors
  - Depends on switching activity
• Short-circuit currents
  - Short-circuit path between the supply rails during switching
  - Depends on the size of the transistors
• Leakage current or static power consumption
  - Leaking diodes and transistors
  - Gets worse with smaller devices and lower Vdd
  - Gets worse with higher temperatures
Can be ignored
Power Wall
• Intel 80386 consumed ~2 W
• A 3.3 GHz Intel Core i7 consumes 130 W
• Heat must be dissipated from a 1.5 × 1.5 cm chip
• This is the limit of what can be cooled by air
[Figure: active power vs. standby power]
Limitations in Processor Performance: not only battery, but also heat!
• Memory Wall
• ILP Wall
• Power Wall
• Moore's Law
  - Transistor density doubles every 18~24 months
• CMOS power
  - $P_{\text{total}} = \alpha f C V^2 + V I_{\text{leakage}}$
  - A drastic increase in leakage current and a decrease in noise margin prevent voltage scaling much below about 1 V
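The CMOS power expression above can be evaluated directly. In this sketch the switched capacitance, activity factor, and leakage current are hypothetical values. I_leakage is also held constant across operating points for simplicity; in real devices it depends on voltage and temperature.

```python
def cmos_power(v: float, f: float, c: float, alpha: float, i_leak: float):
    """Return (dynamic, leakage, total) power in watts.

    dynamic = alpha * C * V^2 * f   (switching of capacitance C)
    leakage = V * I_leakage         (static current, held constant here)
    """
    p_dyn = alpha * c * v ** 2 * f
    p_leak = v * i_leak
    return p_dyn, p_leak, p_dyn + p_leak


# Hypothetical chip: 50 nF switched capacitance, alpha = 0.1, 1 A total leakage
for v, f in [(1.2, 3e9), (1.0, 2e9)]:
    p_dyn, p_leak, p_tot = cmos_power(v, f, c=50e-9, alpha=0.1, i_leak=1.0)
    print(f"V={v} V, f={f / 1e9:.0f} GHz: dyn={p_dyn:.1f} W, "
          f"leak={p_leak:.1f} W, total={p_tot:.1f} W")
```

Lowering V from 1.2 V to 1.0 V and f from 3 GHz to 2 GHz cuts the dynamic term by more than half (21.6 W to 10.0 W). The leakage term falls only linearly with V.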
Pollack’s Rule: Trade-offs
[Figure: area and performance improvement (X) of lead vs. compaction microarchitectures across CMOS process technologies from 1.5 µm to 0.18 µm]
Pollack's Rule: "performance increase due to µ-architecture advances is roughly proportional to [the] square root of [the] increase in complexity"
Implications (in the same technology)
• A new µ-arch consumes about 2-3x the die area of the previous µ-arch, but provides 1.5-1.7x performance
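Pollack's Rule can be checked against the implications bullet with a one-line model: performance ∝ √(area). The perf-per-area figure assumes that power grows in proportion to die area in the same technology. That is a simplification, not something stated on the slide.

```python
import math


def pollack_speedup(area_ratio: float) -> float:
    """Pollack's Rule: performance gain ~ sqrt(increase in complexity/area)."""
    return math.sqrt(area_ratio)


for area in (2.0, 3.0):
    perf = pollack_speedup(area)
    # Assumption: power scales with die area at constant technology,
    # so performance per unit area (or per watt) changes by perf / area.
    print(f"{area:.0f}x area -> {perf:.2f}x performance, {perf / area:.2f}x perf/area")
```

√2 ≈ 1.41 and √3 ≈ 1.73, close to the 1.5-1.7x quoted above. Meanwhile performance per unit area drops to about 0.71x and 0.58x, which is the efficiency cost of bigger microarchitectures.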
Reducing Power/Energy
• An interdisciplinary issue
  - Circuits, architecture, software, systems
• Key high-level ideas
  - Reduce redundant work/components
  - Turn off unused components
  - Pick the implementation that best matches the constraints
  - E.g., don't use a 3 GHz processor if 1 GHz would do
Reducing Power/Energy: Another Option
Energy Reduction Portfolio
• Holistic approach for the entire energy delivery chain
[Figure: energy generation, energy storage, and energy conversion supplying the loads (RF transceiver, processor, digital logic, memory, analog), all coordinated by energy management]
Energy/Power Flow in Mobile Device
Power Supply
Flow: charging sources → battery → power conversion → load (charge and discharge paths)
§ Adaptor
  • 5V, 12V, ...
  • ± 10% tolerance
  • Transformer/switching
  • Car outlet
§ USB port
  • 4.5V ~ 5.25V
  • Imax: 500mA (USB 2.0) ~ 900mA (USB 3.0)
§ Battery
  • Li+/Li-Poly: 3V ~ 4.2V
  • NiMH/Alkaline: 0.9V ~ 1.5V
  • Solar/Fuel cell
  • Role: main / backup
  • Type: primary / secondary
  • Management: protector, gas gauge, security
§ Loads
  • Standard IC: 5V, 3.3V, 2.5V, 1.8V, 1.1V, ...; Vcore
  • RF IC/device: low noise required
  • Display: LED lighting/flash; LCD/OLED/EL bias; CCFL supply
  • Motor/inductive: vibrator, HDD
  • USB host: ports
  • Others: tuners, ...
* EL: Electro Luminescence
* CCFL: Cold Cathode Fluorescent Lamp
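Anatomy of a Handset
[Figure: handset architecture: RF and analog baseband (ABB); L1 digital baseband (DBB) hardware with CPU and DSP running L1 software and the protocol stack; MMI (man-machine interface); application tasks; data I/O (LCD, camera, etc.)]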
[Figure: handset hardware: digital baseband, PM IC, ADC, DAC, LNA, PA, duplexer/switch, filters, VCO, application processor, NOR or NAND flash, SRAM or SDRAM]
Anatomy of a Handset: Another View
[Figure: layered handset platform: applications framework and database (UI, phonebook, security, Java, browser, messaging, multimedia playing); application protocol framework, multimedia device framework, tools, agent framework, physical layer framework, and system framework; cellular protocol stack (multimode protocol: 2G, 3G, 4G) with modem applications abstraction layer and multimode protocol service IF; IP system, transport service, traffic manager, file system, data format, media codec, and multimedia engine; media devices and platform devices (RF, PM, display, AV codec) on an RTOS; hardware: cellular radio interface (2G/3G/4G RF), MAN, positioning, broadcast, PAN, LAN, HW accelerator & device interface, DSP, CPU]
Block Diagram of a Cellular Phone (Feature Phone)
[Figure: feature phone blocks: battery charging control, power supply, and regulators; mixed-signal section (ADC/DAC, audio codec) serving the microphone, earpiece, and hands-free; baseband (BB) with memory (SRAM, flash) and control/interface logic for the keyboard, LCD and LCD backlight, infrared, vibra, SIM card, and bottom connector; RF section with synthesizer (SYNTH), PA, and 900/1800/1900 MHz bands]
Gross Power Consumption Exceeds Thermal Dissipation Capability of Mobile Device
[Figure: power consumption (W, 0 to 6) for 2002, 2004, and 2006, broken down into cellular RF, cellular BB, apps engine, display + backlight, camera, audio, local connectivity, mass memory, power conversion, and miscellaneous. It is compared with the thermal dissipation capability of a 100cc plastic monoblock, 100cc metal monoblock, 100cc plastic clamshell (open), large plastic communicator (open), and small metal communicator (open); beyond that capability, cooling is required]
Inside Smartphone: Apple iPhone4
• Infineon 3G baseband
• Dialog power management
• A4 application processor
Inside Smartphone: Apple iPhone4 (continued)
• Power amp. for UMTS bands 5, 8: Skyworks
• WiFi/BT module (BT/WiFi combo chip): Broadcom
• RF switch: Murata
• Quad-band LNA: Infineon
• A-GPS: Broadcom
• Power amp. for UMTS bands 1, 2: TriQuint
• GSM/EDGE front-end module: Skyworks
• HSPA/UMTS/EDGE baseband modem: Infineon
• HSPA/UMTS/EDGE RF transceiver: Infineon
• 128MB NOR + 128MB mobile DDR: Numonyx
• Power management: Dialog Semiconductor
• 3-axis digital compass: AKM (Asahi Kasei Microdevices)
• 3-axis accelerometer: STMicro
• 16GB NAND flash: Samsung
• Audio processor: Cirrus Logic
• 3-axis gyroscope: STMicro
• Video amp.: Maxim
• Touchscreen controller: Texas Instruments
• LCD, touchscreen
• Application processor Apple A4: Apple/Samsung
• 4Gbit mDDR SDRAM: Samsung
• Primary camera module, secondary camera module
• Docking connector, earphone jack
Nvidia Tegra2 Multi-Core SoC
• 8 dedicated processors
• Highest CPU performance (2× ARM Cortex-A9 @ 1 GHz)
• HD 1080p video
• GeForce® graphics
• Ultra low power
Nvidia Tegra 4i
• New quad-core ARM Cortex-A9 r4 @ 2.3 GHz
• Integrated i500 HSPA/LTE modem
• "4 plus 1" companion core
• ULP GeForce GPU with 60 cores
• Computational photography architecture
• Image signal processor
• Video engine
ARM big.LITTLE Architecture for Low Power
[Figure: relative comparison of energy vs. performance for Cortex-A8 (65 nm), Cortex-A8 (45 nm), 2x Cortex-A9 (40 nm), 4x Cortex-A9 (32 nm), 2x Cortex-A15 + 2x Cortex-A7 (28 nm), and 2x Cortex-A57 + 2x Cortex-A53 (20 nm); axes run toward less energy and more performance, with annotations for the big.LITTLE effect, peak performance, and energy]
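A toy placement rule can illustrate the big.LITTLE idea: run a task on the lowest-energy core that still meets its deadline. The core names, relative performance, and power figures are hypothetical. This is not how any particular OS scheduler implements big.LITTLE.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    perf: float     # relative work units completed per second (hypothetical)
    power_w: float  # active power while running (hypothetical)


BIG = Core("big", perf=3.0, power_w=1.5)
LITTLE = Core("LITTLE", perf=1.0, power_w=0.25)


def pick_core(work: float, deadline_s: float) -> tuple[str, float]:
    """Choose the lowest-energy core that still meets the deadline.

    Returns (core name, energy in joules), with energy = power * (work / perf).
    """
    feasible = [c for c in (BIG, LITTLE) if work / c.perf <= deadline_s]
    if not feasible:
        raise ValueError("no core meets the deadline")
    best = min(feasible, key=lambda c: c.power_w * work / c.perf)
    return best.name, best.power_w * work / best.perf


print(pick_core(work=1.0, deadline_s=2.0))  # light task with slack
print(pick_core(work=3.0, deadline_s=1.5))  # heavy task with a tight deadline
```

A light task with slack lands on the LITTLE core (0.25 J vs. 0.5 J on the big core). A heavy task with a tight deadline can only run on the big core.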
Samsung Exynos 5410 Octa - for Galaxy S4
Galaxy S4 Teardown
• Qualcomm WCD9310 audio codec
• Qualcomm MDM9215M 4G GSM/UMTS/LTE modem
• ARM Holdings MBG965H
• Qualcomm PM8917 power management
• Broadcom BCM4335 single-chip 5G Wi-Fi MAC/baseband/radio
• Samsung K3QF2F200E 2 GB LPDDR3 RAM (the Snapdragon 600 APQ8064T 1.9 GHz quad-core CPU lurks below)
• Toshiba THGBM5G7A4JBA4W 16 GB eMMC (eMMC integrates NAND flash memory and a controller chip in a single package)
Galaxy S4 Teardown (continued)
• Qualcomm WTR1605L seven-band 4G LTE RF transceiver
• Broadcom 20794S1A standalone NFC chip
• Silicon Image 8240BO MHL 2.0 transmitter
• Maxim MAX77803 microcontroller
• SWA GNF09
• Qualcomm PM8821 power management IC
• Skyworks 77619 power amplifier module for quad-band GSM/EDGE
A 20nm Scenario (High-end Processor)
Assume VDD = 1.2 V
§ FO4 delay < 5 ps
§ Assuming no architectural changes, digital circuits could be run at 30 GHz
§ Leading to a power density of 20 kW/cm² (??)
Reduce VDD to 0.6 V
§ FO4 delay ≈ 10 ps
§ The frequency is lowered to 10 GHz
§ Power density reduces to 5 kW/cm² (still way too high)
• This means:
  - A 2 cm² processor consumes 10 kW
  - A 100 W bound allows only 1% of the chip to be active → dark silicon
Ref: S. Borkar (Intel)
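The "this means" arithmetic can be reproduced from the numbers on the slide. The sketch below only redoes that multiplication and division; it does not model the underlying device physics.

```python
# Numbers from the 20 nm scenario above (S. Borkar)
power_density_w_per_cm2 = 5e3  # after reducing VDD to 0.6 V
die_area_cm2 = 2.0
power_budget_w = 100.0

full_chip_power_w = power_density_w_per_cm2 * die_area_cm2  # whole die switching
active_fraction = power_budget_w / full_chip_power_w         # share that fits the budget
print(f"full chip: {full_chip_power_w / 1e3:.0f} kW, active fraction: {active_fraction:.0%}")
```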
The “Dark Silicon” Problem
Source: Rob Aitken (ARM)
How Much Energy in the Air?
Cutting-Edge SoC Design Looks Like Rocket Science
[Figure: rocket analogy: the payload is the pure functional design, and the rest is implementation effort and overhead: DFT (design for testability), DFP (design for low power), DFM (design for manufacturability), DFV (design for verification), signal integrity, power, and integration]
Observation #1: Design Challenges
• Technology shrink leads to critical design challenges
[Figure: technology geometry (90 nm, 65 nm, 40 nm, 28 nm, 20 nm) and design complexity from 2004 to 2012, with design challenges accumulating at each node: signal integrity, integration, DFM, RDR, and advanced lithography]
Observation #2: Design Complexity
• Complexity outpaces design productivity
[Source: SEMATECH]
Observation #3: Power Densities
• Power density affects packaging, cooling, reliability, speed, ...
[Figure: power density (W/cm², log scale 1 to 10,000) of Intel processors from 1970 to 2010 (4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium®, P6), with reference levels for a hot plate, nuclear reactor, rocket nozzle, and the Sun's surface]
Source: Borkar, De (Intel®)
Observation #4: Battery Technology Limitation
• Less than 10% improvement in battery technology is expected over the next 10 years
Source: Dataquest
Observation #5: Leakage Power
• Leakage may ruin Moore's Law (it is worse than expected), threatening the success of CMOS, according to the ITRS
[ISLPED 04, Ray Bryan]
Observation #6: Chip I/O Bottleneck
Year | 2002 | 2005 | 2008 | 2011 | 2014
Logic transistors/chip (M) | 60 | 235 | 925 | 3,650 | 14,400
Signal pins/chip | 1024 | 1024 | 1280 | 1408 | 1472
Simultaneous switching noise
[Source: SEQUENCE]
[Figure: logic transistors per signal pin, 1999 to 2014, log scale]
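The table above gives enough to compute how many logic transistors each signal pin serves. That is the ratio plotted on the slide (logic transistors per signal pin).

```python
# Values from the table above
years = [2002, 2005, 2008, 2011, 2014]
logic_transistors_m = [60, 235, 925, 3650, 14400]  # millions per chip
signal_pins = [1024, 1024, 1280, 1408, 1472]

for y, t, p in zip(years, logic_transistors_m, signal_pins):
    ratio = t * 1e6 / p  # logic transistors per signal pin
    print(f"{y}: {ratio / 1e3:,.0f}k transistors per signal pin")

growth_t = logic_transistors_m[-1] / logic_transistors_m[0]  # 240x
growth_p = signal_pins[-1] / signal_pins[0]                  # ~1.44x
print(f"2002->2014: transistors x{growth_t:.0f}, pins x{growth_p:.2f}")
```

From 2002 to 2014 transistors grow about 240x, while signal pins grow only about 1.44x. Each pin must therefore serve far more logic, hence the I/O bottleneck and the simultaneous switching noise concerns on this slide.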
Observation #7: Interconnect Challenge
• The impact of interconnect has to be considered in the early design stage
  - Helps faster convergence: tight correlation with the back end
  - Produces more efficient designs: lower area and power
  - Makes the design flow more predictable
  - Improves performance: higher frequency
Source: Synopsys (2012)
Process | 130 to 90 nm | 65, 45, 32 nm | 28, 20, 14 nm
Year | 2005 | 2010 | 2012
Wire length (m/cm²) | 1,019 | 2,222 | 3,143
Important new effects | Route topology | Layer awareness, coupling capacitance | Resistive shielding, much less resistance on higher metal layers
Observation #8: Lithography Limitation
Source: Sematech (2013)
Observation #9: Process Variation
[Figure: normalized frequency (0.9 to 1.4) vs. normalized leakage (Isb, 1 to 5) for 130 nm parts: about 30% frequency variation and 5X leakage variation]
Source: Borkar, De (Intel®)
Observation #10: Cost of Chip Development
Source: Xilinx, IBS (2011)
Observation #11: CMOS Limitation
[Figure: "Moore's Law: miniaturization" of baseline CMOS (CPU, memory, logic) from 130 nm through 90, 65, 45, 32, 22, and 16 nm, heading toward "Beyond CMOS". "More than Moore: diversification" adds analog/RF, passives, HV power, sensors/actuators, and biochips. Information processing (digital content) maps to System-on-Chip (SoC); interacting with people and the environment (non-digital content) maps to System-in-Package (SiP).]
Source: ITRS 2011
Observation #12: Design Methodology Change
Traditional Chip Design
[Source: Virage]
• OMAP2420:
  - Five power domains (MCU core, DSP core, graphics accelerator, peripheral, alive logic)
  - 40x leakage power savings
[Source: Stork (TI), DAC 2006]
Mobile Chip Design
Observation #13: Noise Isolation
• Digital switching noise propagates through the substrate
[Source: TI, Stork, DAC 2006]
Power Profile Optimization
[Figure: dynamic and static power profiles vs. system workload, before and after optimization]
Classification of Low Power Techniques
[Table: low-power techniques ordered from system level to circuit/device level: voltage scaling, instruction-level optimization, control-data-flow transformation, dynamic power management, approximate signal processing, memory optimization, hardware-software partitioning, parallelism/pipelining, don't-care optimization, path balancing, factorization, technology decomposition/mapping, encoding, retiming, gated clocks, pre-computation, transistor/interconnect sizing, transistor reordering, threshold voltage scaling. Columns classify each technique by method (reducing activity, reducing capacitance, scaling supply voltage, scaling threshold voltage) and by overhead (area, speed, noise, or negligible).]
Summary: Reducing Power @ All Design Levels
• Algorithmic level
• Compiler level
• Architecture level
• Organization level
• Circuit level
• Silicon level
• Important concepts:
  - Lower Vdd and frequency (even if errors occur) / dynamically adapt Vdd and frequency
  - Reduce circuitry
  - Exploit locality
  - Reduce switching activity, glitches, etc.
$P = \alpha \times f \times C \times V_{dd}^2$
$E = \int P\,dt \;\Rightarrow\; E/\text{cycle} = \alpha \times C \times V_{dd}^2$
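The two formulas above can be compared across operating points. The sketch below runs the same hypothetical workload (10⁹ cycles, with made-up α and C) at 3 GHz/1.1 V and at 1 GHz/0.8 V. It ignores leakage.

```python
def dynamic_power(alpha: float, c: float, vdd: float, f: float) -> float:
    """P = alpha * f * C * Vdd^2 (watts)."""
    return alpha * f * c * vdd ** 2


def energy_per_cycle(alpha: float, c: float, vdd: float) -> float:
    """E/cycle = alpha * C * Vdd^2 (joules); independent of f."""
    return alpha * c * vdd ** 2


# Hypothetical operating points for the same workload of 1e9 cycles
ALPHA, C = 0.2, 1e-9
CYCLES = 1e9
points = {"3 GHz @ 1.1 V": (3e9, 1.1), "1 GHz @ 0.8 V": (1e9, 0.8)}

for label, (f, vdd) in points.items():
    p = dynamic_power(ALPHA, C, vdd, f)
    e = energy_per_cycle(ALPHA, C, vdd) * CYCLES
    t = CYCLES / f
    print(f"{label}: P={p:.3f} W, time={t:.2f} s, E={e:.3f} J")
```

E/cycle does not depend on f, so lowering the frequency alone saves power but not energy per task. The energy saving here (about 0.242 J to 0.128 J) comes from the lower Vdd that the slower clock permits. Leakage, which this sketch omits, accrues over the longer run time and eats into that gain.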