The early 21c has brought the power of the computer into the hands of the general population, and though these computers consume small amounts of energy they are so numerous that their Energy Efficiency will soon become a major issue. This presentation looks at modern Computing, the ways that Energy Efficiency is currently being enhanced, and the principles behind this.
1
Energy Efficient Computing ... In the early 21C
Abstract: With the assistance of its global partners, ARM shipped 8.7 billion CPUs in 2012, a number which continues to grow at around 20% pa. The 40B we have shipped to date outnumber the total of PCs by more than 50 times, and today more than 75% of the things connected to the Internet are ARM based. The dominant nature of Computing in the 21c is very different to that of the Mainframe era. It is sobering to think that if each of those 8.7B CPUs were to dissipate just 100 mW, it would require the output of two modern power stations to drive them; with 2.4 next year, and 3 the year after that! So Electronic Systems are also defining where the real Energy Efficient Computing issue is! But with such a small footprint it must be easy to measure and manage power optimisation? An increasing percentage of these are immensely complex systems, running significant multi-tasking and multi-threaded operating systems on platforms which include multi-processor CPU/GPU configurations and GB of memory. Whilst their minimum dissipations are a few uW, their peak power exceeds the silicon's ability to dissipate it, so the penalty for power-unaware software design is huge. What has been done to manage this in Electronic Systems design, and can any lessons be transferred to the Classic Computing domains?
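The power-station arithmetic in the abstract can be checked with a short sketch. The shipment count, the 100 mW figure and the 20% growth rate come from the abstract; the function name is illustrative only. The result (about 870 MW for 2012) suggests the abstract is assuming a station output of roughly 435 MW, and the "2.4" and "3" stations follow from compounding 20% growth (2 x 1.2 = 2.4, 2.4 x 1.2 = 2.88).

```python
# Figures taken from the abstract; names are illustrative only.
CPUS_SHIPPED_2012 = 8.7e9   # CPUs shipped in 2012
POWER_PER_CPU_W = 0.1       # 100 mW each (the abstract's "what if")
ANNUAL_GROWTH = 0.20        # ~20% per annum


def fleet_power_mw(years_ahead: int) -> float:
    """Total dissipation in MW of one year's shipments, years_ahead years after 2012."""
    cpus = CPUS_SHIPPED_2012 * (1 + ANNUAL_GROWTH) ** years_ahead
    return cpus * POWER_PER_CPU_W / 1e6


for offset in range(3):
    print(2012 + offset, round(fleet_power_mw(offset)), "MW")
```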
Context: 1hr talk at The Centre for Robotics and Neural Systems (CRNS) at University of Plymouth, Devon, UK.
The CRNS has a regular seminar series inviting national and international speakers.
http://www.tech.plym.ac.uk/SOCCE/CRNS/
SlideCast and pdf available via http://ianp24.blogspot.co.uk/
Opinions expressed are those of the author alone
2
Prof. Ian Phillips Principal Staff Eng’r,
ARM Ltd [email protected]
Visiting Prof. at ...
Contribution to Industry Award 2008
Centre for Robotics and Neural Systems Uo.Plymouth
1nov13
1v0
3
Energy Efficient Computing ..?
6
The Visible Face of Computing Today
7
The Invisible Face of Computing Today
100’s of Billions of computers each consuming mW!
Bringing Embedded Intelligence to the Consumer Market, has changed the Face of Computing! (Again)
8
Our 21c World ...
9
Markets provide the Growth Drivers
[Chart: Millions of Units vs. year, 1960 to 2020]
1st Era: Select work tasks
2nd Era: Broad-based computing for specific tasks
3rd Era: Computing as part of our lives
Today: ~2% of our Energy Use goes on Computing and Electronics! ... Tomorrow: It could easily be 20%!
10
ARM in the Digital World
1998 2012 2020
40+ billion CPUs to date
150+ billion CPUs cumulative by 2020
http://www.arm.com/
8.7B CPUs shipped in 2012 (growing ~20% pa)
75% of the things connected to the Internet today are ARM Powered! Gartner
11
Moore’s Law ...
[Chart (ITRS’99): Approximate Process Geometry (100um down to 10nm), Transistors/Chip (M) and Transistor/PM (K)]
X
... x More Functionality on a Si Chip in 20 yrs!
Gordon Moore. Founder of Intel. (1965)
http://en.wikipedia.org/wiki/Moore’s_law
12
A Machine for Computing ... Computing: A general term for algebraic manipulation of data ...
... State and Time are always factors (variable weight).
It can include phenomena ranging from human thinking to calculations with a narrower meaning. Wikipedia
It is usually used to exercise analogies (models) of real-world situations, frequently in real time (fast enough to be a stabilising factor in a loop).
... So what part do Hardware and Software play? ... And what about Energy?
IN (x): Numerated Phenomena => y=F(x,t,s) => OUT (y): Processed Data/Information
13
Antikythera c87BC ... Planet Motion Computer
See: http://www.youtube.com/watch?v=L1CuR29OajI
Mechanical Technology
• Inventor: Hipparchos (c.190 BC – c.120 BC). Ancient Greek Astronomer, Philosopher and Mathematician.
• Single-Task, Continuous Time, Analogue Mechanical Computing (With backlash!)
14
Orrery c1700 ... Planet Motion Computer
• Inventor: George Graham (1674-1751). English Clock-Maker.
• Single-Task, Continuous Time, Analogue Mechanical Computing (With backlash!)
Mechanical Technology
15
Babbage's Difference Engine 1837
The difference engine consists of a number of columns, numbered from 1 to N. Each column is able to store one decimal number. The only operation the engine can do is add the value of a column n + 1 to column n to produce the new value of n. Column N can only store a constant, column 1 displays (and possibly prints) the value of the calculation on the current iteration.
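The column-addition rule described above can be sketched in a few lines. This is a minimal illustration of the stated rule (add column n + 1 into column n), not a model of the real machine's mechanical timing; the function name and the choice of f(x) = x^2 are assumptions made for the example.

```python
def difference_engine(columns, steps):
    """Tabulate a polynomial by repeated addition.

    columns[0] is column 1 (the displayed result); columns[-1] is column N,
    which only ever holds a constant.
    """
    cols = list(columns)
    results = [cols[0]]
    for _ in range(steps):
        # Add column n+1 into column n, working upwards from column 1 so that
        # each addition uses its neighbour's value from the previous iteration.
        for n in range(len(cols) - 1):
            cols[n] += cols[n + 1]
        results.append(cols[0])
    return results


# f(x) = x^2: f(0) = 0, first difference 1, constant second difference 2.
print(difference_engine([0, 1, 2], 5))  # [0, 1, 4, 9, 16, 25]
```

Only additions are performed, which is why a purely mechanical adder is enough to produce a table of polynomial values.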
Computer for Calculating Tables: A Basic ALU Engine
(Re)construction c2000
Mechanical Technology
16
“Enigma” c1940
Data Encryption/Decryption Computer
Mechanical Technology
17
“Colossus” 1944
Code-Breaking Computer: A Data Processor
Valve/Mechanical Technology
18
“Baby” 1947 (Reconstruction)
General Purpose, Quantised Time and Data, (Digital) Electronic Computing
Valve/Software Technology
19
Signal Processing
BTH Crystal Set, c1925: 1 Diode
Tele-Verta Radio, c1945: 4 Valves, 1 Rectifier Valve
Bush Radio, c1960: 7 Transistors, 1 Diode
Evoke DAB Radio, c2005: 100 M Transistors, 2-3 Embedded Processors
20
Radio as Computation ...
Vi => Vrf=Vi*100
Vlo=Cos(t*1^6)
Vrf, Vlo => Vif=Vrf*Vlo
Vif => Vro='Bandpass'(Vif*1000)
Single-Task (Embedded), Real-Time, Analogue (Close-Enough) Computing
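The signal chain on this slide can be sketched numerically. The sketch below follows the slide's gains (100 and 1000) and multiplier stage. It assumes the oscillator term "1^6" is meant as 10^6, and it leaves out the band-pass filter, whose characteristics the slide does not give. The input signal and all names are hypothetical. It shows the point of the mixer: multiplying two cosines yields sum- and difference-frequency components, from which a band-pass stage would then select one.

```python
import math

RF_GAIN = 100    # Vrf = Vi * 100
IF_GAIN = 1000   # gain applied to Vif before the band-pass stage
W_LO = 1e6       # ASSUMPTION: the slide's "1^6" is read as 10^6 rad/s


def mixer(vi: float, t: float) -> float:
    """Return Vif * 1000 at time t: the signal that would feed the band-pass filter."""
    vrf = vi * RF_GAIN          # RF amplifier
    vlo = math.cos(t * W_LO)    # local oscillator
    vif = vrf * vlo             # mixer (multiplier)
    return vif * IF_GAIN


def sum_and_difference(amplitude: float, w_in: float, t: float) -> float:
    """The same signal written as its sum- and difference-frequency components."""
    gain = amplitude * RF_GAIN * IF_GAIN / 2
    return gain * (math.cos((w_in - W_LO) * t) + math.cos((w_in + W_LO) * t))


# Hypothetical input: a 1 uV carrier slightly above the local-oscillator rate.
W_IN = 1.1e6
for k in range(5):
    t = k * 1e-6
    vi = 1e-6 * math.cos(W_IN * t)
    print(f"t={t:.1e}s mixer={mixer(vi, t):+.6f} "
          f"components={sum_and_difference(1e-6, W_IN, t):+.6f}")
```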
21
Valve Technology
22
Valve Technology Transistor Technology
‘Integrated Circuit’ Technology
23
Computing is Era and Application Related ...
Computing: Creating Useful Output from Input ... Architecture: The way this is done on the day.
It is the Most Important Product Decision! (HW, SW, Digital, Analogue, Optics, Graphene, Mechanics, Steam, etc)
24
Moore's Real Law: x2 Functionality Every 18mth!
Cascade of Technologies supporting Functional growth ...
... The ‘Law’ started with Wood ⇒ Stone ⇒ Bronze ⇒ Iron
Electronic era: 1975-2005
System era: 2003-2030
[Chart: Functional Density (units), 10^0 to 10^12, vs. year, 1960 to 2020]
25
Computing in a Cool iCon ...
26
‘A lot’ of Architecture in a Smart Phone ...
... Computation in many forms
27
Take a Look Inside...
http://www.ifixit.com
The Control Board.
Level-1: Modules
28
Inside The Control Board (a-side)
http://www.ifixit.com
Level-2: Sub-Assemblies
Visible Computing Contributors ...
Samsung: Flash Memory - NV-MOS (ARM Partner)
Cirrus Logic: Audio Codec - Bi-CMOS (ARM Partner)
AKM: Magnetic Sensor - MEM-CMOS
Texas Instruments: Touch Screen Controller and mobile DDR - Analogue-CMOS (ARM Partner)
RF Filters - SAW Filter Technology
Invisible Computing Contributors ...
OS, Drivers, Stacks, Applications, GSM, Security, Graphics, Video, Sound, etc
Software Tools, Debug Tools, etc
29
Inside The Control Board (b-side)
GPS Bluetooth, EDR &FM
http://www.ifixit.com
Level-2: Sub-Assemblies
More Visible Computing Contributors ...
A4 Processor. Spec: Apple, Design & Mfr: Samsung - Digital-CMOS (nm) ... Provides the iPhone 4 with its GP computing power. (Said to contain ARM A8 600 MHz CPU and other ARM IP)
ST-Micro: 3 axis Gyroscope - MEM-CMOS (ARM Partner)
Broadcom: Wi-Fi, Bluetooth, and GPS - Analogue-CMOS (ARM Partner)
Skyworks: GSM - Analogue-Bipolar
Triquint: GSM PA - Analogue-GaAs
Infineon: GSM Transceiver - Anal/Digi-CMOS (ARM Partner)
30
Level-3: Processor (Nvidia Tegra 3, Around 1B transistors)
NB: The Tegra 3 is similar to the A4/5, but not used in the iPhone
31
Packing Technology into an iCon
Analogue and Digital Design
Embedded Software
Mechanics, Plastics and Glass
Micro-Machines (MEMs)
Displays and Transducers
Robotics and Test
Knowledge and Know-How
Research, Education and Training
Components, Sub-Systems and Systems
Design, Assembly and Manufacture
Metrology, Methodology and Tools
... Involving Many Specialist Businesses
... Round and Round the World ... Not-Least from Europe
32
Architecting your Product
: Is the cumulative non-functional choices made to support the functional need
A Good Architecture is the one that ‘survives’
History is written by winners (2nd is for losers)
: Component Performance may be ‘poor’ as long as System Performance is ‘better’ for its use.
Architectural Options ...
: Business Model (Cost-of Ownership, ROI), TTM (Productivity, History, IP-Availability, Know-How), Aesthetics (Power, Quality, Behaviour, Appearance)
: Analogue, Digital, Mechanical, Optical, RF, Software, Plastics, Metal-forming, Manufacturing, Glass, ...
: More than 99% of a Product is Reused from its Predecessor
... is assumed (working is expected!) ... It used to be the only consideration!
33
Power Philosophy
Hardware Dissipates Power ...
Choose Underlying Technology for best power efficiency.
One size does not fit all (Products, Applications or Instances)
... Software Doesn’t (But it Tells Hardware To!)
Chips can literally melt down under software ‘instruction’
Make computing hardware power as ‘Activity’ dependent as possible
Zero Activity => Zero Power
Make OS/Apps aware of the power/performance situation, and their options for controlling it (Need Indicators and Levers)
... Think System: It’s how the ‘box’ performs, not the components
34
Core Power Management
For Processor and Peripheral Circuitry...
Variable/Gated - Clock Domains
Variable/Switched - Power Domains
Indicators and Levers
Allow the software to see and influence what is going on
Principles of Core Power Efficiency...
Minimise voltage/frequency (P=CV²f) so that the processor has just enough performance for the current application need
Maximise ‘Activity Power’ dependence (Zero Activity => Zero Power)
Management by the OS and the Application SW
Apply to all on & off-chip zones (not just the CPU)
... Methodology
Retention Flops/Latches, Level Shifters, Power-Switch Cells, PLLs
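The "just enough performance" principle can be sketched with the slide's dynamic power relation, P = C x V^2 x f. The capacitance value and the table of operating points below are hypothetical, chosen only to make the arithmetic visible. Real silicon also has leakage power, which this relation does not cover.

```python
def dynamic_power(c: float, v: float, f: float) -> float:
    """Dynamic switching power P = C * V^2 * f (farads, volts, hertz -> watts)."""
    return c * v * v * f


# Hypothetical operating points: (frequency in Hz, supply voltage in V), ascending.
OPERATING_POINTS = [
    (200e6, 0.7),
    (600e6, 0.9),
    (1000e6, 1.1),
]
C_EFF = 1e-9  # hypothetical effective switched capacitance, in farads


def pick_operating_point(required_hz: float):
    """Return the slowest (lowest-power) point that still meets the demand."""
    for f, v in OPERATING_POINTS:
        if f >= required_hz:
            return f, v
    return OPERATING_POINTS[-1]  # demand exceeds the top point: run flat out


for demand in (100e6, 500e6, 900e6):
    f, v = pick_operating_point(demand)
    print(f"need {demand / 1e6:.0f} MHz -> run at {f / 1e6:.0f} MHz, {v} V, "
          f"{dynamic_power(C_EFF, v, f):.3f} W")

# A gated clock (f = 0) gives zero dynamic power: Zero Activity => Zero Power.
print(dynamic_power(C_EFF, 1.1, 0.0))
```

Because voltage enters as a square, dropping to a lower voltage/frequency point saves more power than the frequency reduction alone would suggest.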
35
Architectural Energy Efficiency - Parallelism
[Diagram: one Processor at frequency f (Input => Output), compared with two Processors in parallel, each at f/2, sharing the same Input and Output at rate f]
One Processor: Capacitance = C, Voltage = V, Frequency = f
Power = CV²f
Two Processors: Capacitance = 2.2C, Voltage = 0.6V, Frequency = 0.5f
Power = (2.2*0.6*0.6*0.5)CV²f = 0.4CV²f
To a limit determined by Amdahl’s or Gustafson’s Law ...
Amdahl: Extracted parallelism from existing code (Reuse)
Gustafson: Some needs only benefit from parallelism (Custom)
... Actual improvement is application specific.
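The figures on this slide can be reproduced with a short sketch. The power ratio uses the slide's own numbers (2.2x capacitance, 0.6x voltage, 0.5x frequency). The two speedup functions are the standard textbook forms of Amdahl's and Gustafson's laws. The 90% parallel fraction in the example is a hypothetical value, not one taken from the talk.

```python
def relative_power(cap_ratio: float, volt_ratio: float, freq_ratio: float) -> float:
    """Power relative to the baseline CV^2f, from P = C * V^2 * f."""
    return cap_ratio * volt_ratio ** 2 * freq_ratio


def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Fixed-size problem: the serial part limits the gain from n processors."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n)


def gustafson_speedup(parallel_fraction: float, n: int) -> float:
    """Scaled problem: the parallel part grows with the number of processors."""
    return (1.0 - parallel_fraction) + parallel_fraction * n


# The slide's two-processor case: about 0.4 of the single-processor power.
print(round(relative_power(2.2, 0.6, 0.5), 3))

# Hypothetical workload that is 90% parallelisable, on two processors.
print(round(amdahl_speedup(0.9, 2), 3))
print(round(gustafson_speedup(0.9, 2), 3))
```

The power saving only holds if the two half-speed processors together match the original throughput, which is where the speedup limit comes in.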
36
Architectural Energy Efficiency - Data
Moving Data takes significant Energy
Becoming the dominant energy consumption in a system
Data Location
Avoid moving or copying Data
Energy ∝ DataVolume x Speed x Distance^n, with n > 2 (3)
Bring the Processing to the Data
Caching is good (depends on implementation)
Write-back is better than write-through
Local working memory is good (aka Software Caching)
... The Arrangement of your Data matters!
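The write-back versus write-through point, and the remark that data arrangement matters, can be illustrated with a toy model. The sketch assumes a cache with a single line and counts only the stores that reach main memory. The function name and address values are hypothetical, and real caches are far more elaborate.

```python
def backing_store_writes(addresses, policy: str) -> int:
    """Count writes that reach main memory for a one-line cache."""
    cached = None   # address currently held in the single cache line
    dirty = False   # True once the cached line differs from memory
    writes = 0
    for addr in addresses:
        if policy == "write-through":
            writes += 1              # every store goes straight to memory
            cached = addr
        elif policy == "write-back":
            if cached is not None and cached != addr and dirty:
                writes += 1          # evict the dirty line before reusing it
            cached = addr
            dirty = True
        else:
            raise ValueError(f"unknown policy: {policy}")
    if policy == "write-back" and dirty:
        writes += 1                  # final flush of the last dirty line
    return writes


grouped = [0x10] * 100 + [0x20] * 100   # stores to one location kept together
interleaved = [0x10, 0x20] * 100        # same stores, alternating locations

print(backing_store_writes(grouped, "write-through"))    # 200
print(backing_store_writes(grouped, "write-back"))       # 2
print(backing_store_writes(interleaved, "write-back"))   # 200
```

With the stores grouped, write-back sends 2 writes to memory instead of 200. With the same stores interleaved, the advantage disappears in this one-line model.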
37
All ARM Processors are Power Efficient
38
Choose The Horses for The Course
... Delivering ~5x speed (Architecture + Process + Clock)
About 50MTr
About 50KTr
39
Multicore ARM On-Chip ...
Heterogeneous Multicore Systems have been in ARM for a long time:
[Block diagram: Cortex™-A8 (Application), Mali™-400 MP (UI & 3D Graphics) and Cortex-M3 (Power Manager), with Interconnect and Memory]
40
Coherent Multicore Cluster ...
Homogeneous Multicore cluster, as part of a heterogeneous system:
[Block diagram: Cortex-A9, Cortex-A9, … with Coherency Logic; Mali-400 MP (User Interface and 3D graphics); Cortex-M3 (Power Manager); Interconnect]
41
Multiple Clusters ...
Multiple Homogeneous Coherent Clusters
[Block diagram: two clusters of Cortex-A15 cores (Cortex-A15, Cortex-A15, …), each with Coherency Logic in L2 Cache, and a Coherent Interconnect]
42
Today’s Consumer requires a pocket ‘Super-Computer’ ...
Silicon Technology Provides a Billion transistors ...
It will be supported with a few GB of memory ...
Computer On a Chip c2010 ...
• Typically 10 Processors ...
• 4 x A9 Processors (2x2)
• 4 x MALI 400 Frag. Proc
• 1 x MALI 400 Vertex Proc
• 1 x MALI Video CoDec
• Software Stacks, OS’s and Design Tools/
• ARM Technology gives chip/system designers ...
• Improved Productivity
• Improved TTM
• Improved Quality/Certainty
http://www.arm.com/
43
CoreLink™ CCN-504 and DMC-520
[Block diagram. Blocks shown: four Quad Cortex-A15 clusters, each with L2 cache; ACE; CoreLink™ CCN-504 Cache Coherent Network; Snoop Filter; 8-16MB L3 cache; two CoreLink™ DMC-520, each with x72 DDR4-3200 PHY; Interrupt Control; IO Virtualisation with System MMU; NIC-400 Network Interconnect; AHB; Flash, GPIO, USB, SATA, PCIe, 10-40GbE, DPI, Crypto, DSP]
Dual channel DDR3/4 x72
Up to 4 cores per cluster
Up to 4 coherent clusters
Integrated L3 cache
Up to 18 AMBA interfaces for I/O coherent accelerators and IO
Peripheral address space
Heterogeneous processors – CPU, GPU, DSP and accelerators
Virtualized Interrupts
Uniform System memory
44
C/C++ Development
Middleware
Debug & Trace
Methodology As Well As Hardware
Energy Trace Modules
45
big.LITTLE Processing
For High-Performance systems...
Tightly coupled combination of two ARM CPU clusters: Cortex-A15 and Cortex-A7 - functionally identical
Same programmer's view, looks the same to OS and applications
big.LITTLE combines high-performance and low power
Automatically selects the right processor for the right job
Redefines the efficiency/performance trade-off
big: “Demanding tasks”
LITTLE: “Always on, always connected tasks”
[Charts: big.LITTLE compared with current smartphone: 30% of the Power (select use cases); >2x Performance]
46
Fine-Tuned to Different Performance Points
LITTLE: Cortex-A7, Cortex-A53
Simple, in-order, 8 stage pipelines
Performance better than mainstream, high-volume smartphones (Cortex-A8 and Cortex-A9)
Most energy-efficient applications processor from ARM
big: Cortex-A15, Cortex-A57
Complex, out-of-order, multi-issue pipelines
Up to 2x the performance of today’s high-end smartphones
Highest performance in mobile power envelope
[Pipeline diagrams; stage labels include Queue, Issue, Integer]
47
big.LITTLE Software
CPU Migration
Migrate a single processor workload to the appropriate CPU
Migration = save context then resume on another core
Also known as Linaro “In Kernel Switcher”
DVFS driver modifications and kernel modifications
Based on standard power management routines
Small modification to OS and DVFS, ~600 lines of code
big.LITTLE MP
OS scheduler moves threads/tasks to appropriate CPU
Based on CPU workload
Based on dynamic thread performance requirements
Enables highest peak performance by using all cores at once
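The idea of moving a workload to the appropriate CPU based on load can be sketched as a simple threshold policy. This is an illustration only and not the Linaro In Kernel Switcher or any real scheduler. The thresholds, function names and load trace are all hypothetical, and the gap between the two thresholds (hysteresis) is a design choice made here to avoid switching back and forth on small load changes.

```python
UP_THRESHOLD = 0.80     # hypothetical: migrate to big above this load
DOWN_THRESHOLD = 0.30   # hypothetical: migrate back to LITTLE below this load


def choose_cluster(current: str, load: float) -> str:
    """Pick the cluster for the next interval from the current one and the load (0..1)."""
    if current == "LITTLE" and load > UP_THRESHOLD:
        return "big"
    if current == "big" and load < DOWN_THRESHOLD:
        return "LITTLE"
    return current  # inside the hysteresis band: stay put


def run(loads, start: str = "LITTLE"):
    """Return the cluster chosen for each load sample in turn."""
    cluster = start
    trace = []
    for load in loads:
        cluster = choose_cluster(cluster, load)
        trace.append(cluster)
    return trace


print(run([0.1, 0.9, 0.5, 0.2, 0.4]))
# ['LITTLE', 'big', 'big', 'LITTLE', 'LITTLE']
```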
48
Bringing the Processing to the Data …
288 server nodes in a 4U rack space Public Source: http://www.engadget.com/2011/11/02/hp-and-calxedas-moonshot-arm-servers-will-bring-all-the-boys-to/
Dell + Marvell, Copper
BaiDu + Marvell, Baserock
Press Claims:
49
... Refining Data into Information
50
Transferable Lessons to GP Software
Moving data is Power Expensive ...
Don’t move data; use it locally (Cache it)
Refine it once, use it often (Pre-Process it)
Your CPU Power is work-load independent ...
So, get in; get the work done; and get out.
Maximise the workload of your code; terminate when complete.
Make your Processing work-load dependent
Use a Hypervisor and turn off (or at least free) processors not in use.
51
Society's Challenges in the 21c
Urbanisation (Smart Cities), Health (eHealth), Transport, Energy (Smart Grid), Security, Environment, Food/Water, Ageing Society, Sustainability, Digital Inclusion, Economics
And whilst our technologies will be an essential part of all solutions, they cannot fix them without Society’s help and cooperation!
... Energy Efficient Computing will minimise the impact, not avert the challenges!
Having a great time!
52
Conclusions
Putting the power of Computation into the hands of the masses has changed the face of Computing (again)
Electronic Systems will become Essential to our Lives and the Economy
Power Efficient ES are a major issue to Society
Which faces a future with them as a significant energy consumer in themselves
Power Efficiency must be architected into the System Hardware and Software from the beginning
To realise the maximum potential out of your Silicon (Avoiding Dark Si)
Architect & Design HW as efficiently as possible (reflecting the task)
Strive for: No Work => No Power
Equip HW with Indicators and Levers so the System/App can manage it
Bring Processing to the Data ...
Don’t move Data; move Information
Process data Locally
Energy ∝ DataVolume x Speed x Distance^n, with n > 2 (3)
53
Computing at the heart of the 21c
ARM: Enabling the Creation of High-Performance Electronic Systems
• Productively, Economically and Reliably
• Through Hw/Sw Reuse Methodologies
• Based on a family of CPU/GPU cores