View
3
Download
0
Category
Preview:
Citation preview
Enabling Realistic Fine-Grain Voltage Scalingwith Reconfigurable Power Distribution Networks
Waclaw Godycki, Christopher Torng, Ivan BukreyevAlyssa Apsel, Christopher Batten
School of Electrical and Computer EngineeringCornell University
1 Abstract
Recent work has shown that monolithic integration of voltage regulators will befeasible in the near future, enabling reduced system cost and the potential forfine-grain voltage scaling (FGVS). In this project, we use architecture-levelmodeling to explore a new dynamic voltage/frequency scaling controller called thefine-grain synchronization controller (FG-SYNC+). FG-SYNC+ enables improvedperformance and energy efficiency at similar average power for multithreadedapplications with activity imbalance. We then use circuit-level modeling to explorevarious approaches to organizing on-chip voltage regulation, including a newapproach called reconfigurable power distribution networks (RPDNs). RPDNs allowone regulator to “borrow” energy storage from regulators associated withunderutilized cores resulting in improved area/power efficiency and fasterresponse times. We evaluate FG-SYNC+ and RPDN using a vertically integratedresearch methodology, and our results demonstrate a 10–50% performance and10–70% energy-efficiency improvement on the majority of the applications studiedcompared to no FGVS, yet RPDN uses 40% less area compared to a more traditionalper-core regulation scheme.
2 Motivation
Monolithic integration using a standard CMOS process provides a tremendous costincentive for integrating closed-loop voltage regulators on the die. Recent technologytrends suggest that it is now becoming feasible to integrate switching regulatorson-chip (e.g., Intel Haswell), enabling reduced system cost as well as the potentialfor fine-grain voltage scaling (FGVS) to exploit fine-grain activity imbalance inmulti-threaded applications for performance and energy efficiency benefits.
Integrated Voltage Regulation inIntel Haswell Processors
HASWELL’S FIVR EXTENDS BATTERY LIFE First Processor With Integrated Voltage Regulator Won’t Be the Last
By David Kanter (July 29, 2013)
...................................................................................................................
July 2013 © Microprocessor Report
The PC ecosystem has lagged phones and tablets in power management. For many years, voltage regulators (VRs) for PC processors have been a staid and boring af-fair, delivering relatively incremental improvements. Tra-ditional VRs are bulky and consume high-value mother-board real estate close to the processor, but they operate at very low frequencies (e.g., 1MHz) compared with modern 1–3GHz processors.
Intel’s introduction of fully integrated voltage regula-tors (FIVRs) in its newest Core processor, code-named Haswell, is fundamentally changing the power-delivery land-scape. This innovation is the technical underpinning for new platform-level power management that adopts and in some aspects exceeds the capabilities of mobile systems, radically reducing average power consumption for the sys-tem and improving responsiveness. The result is one of the greatest increases in mobile-PC battery life ever, as evi-denced in the initial Haswell-based notebooks that are now shipping.
Specifically, the FIVR’s low latency enables more-agile power management that can quickly turn platform components on or off, enabling the system to spend most of its time in a deep-sleep state. The OS and applications, however, remain alive and connected to the network, and the system swiftly responds to external stimuli. For exam-ple, in Windows 8 Connected Standby mode, Intel claims the Haswell platform reduces power consumption by 20x compared with Sandy Bridge, while still meeting the 300ms wake-up requirement.
Looking beyond the PC, the implications are signifi-cant across a wide range of markets. Properly designed IP blocks such as CPUs, GPUs, FPGA fabrics, wireless modems, and caches can all take advantage of a FIVR. Ap-plications from smartphones to servers all benefit from
improvements in power efficiency. Although Intel is the first to bring FIVRs to market, others will eventually follow.
Tiny Capacitors and Inductors The FIVR is tightly coupled to Haswell, using both on-die and package-level integration. It complements an external voltage-regulator module (VRM) to deliver power to the processor. The FIVR’s main input voltage is supplied by the VRM at 1.8V (nominal) but is dynamically adjusted via serial voltage identification (SVID) that ranges between 1.6V and 2.3V. The output voltages that the FIVR supplies to the different die regions vary from about 0.65V to 2.0V.
Conceptually, the VRM is optimized for high con-version efficiency and acts as a slow and coarse-grained
Figure 1. Block diagram of Haswell’s power distribution. The FIVR converts a single voltage, VCCIN (1.8V nominal), into several independent voltage rails, each of which varies dynamically according to the needs of the circuit.
Haswell
VLLC
VDDQ
GraphicsEngine
Rin
g In
terc
onne
ct
SystemAgent
+PCIe
+DMI
DRAM CntlVCCIN
FIV
R
CPUs+
Cache
VCPU0
VRING
VGPU
VSA
VIOA
VCPU1……
Figure from D. Kanter, MPR’13
Fine-Grain Activity Imbalance inMulti-Threaded Applications
bfs
rsort
str
sea
rch
0
7
0
7
0
7
Application Start Application End
90μs
7μs
2.3μs
3 Target System
Our target system is an embedded processor composedof: eight in-order, single-issue, five-stage, RISC cores;private, coherent 16 KB instruction and data L1 caches;and a shared 512 KB unified L2 cache. We implementedthe core and L1 memory system for this design in RTL andused a commercial standard-cell-based CAD toolflowtargeting a TSMC 65 nm process to generate layout. Eachcore can run at 333 MHz at 1 V and the full eight-coresystem is approximately 6 mm2. Target System Layout
4 Types of Integrated Voltage Regulation
The three primary types of step-down voltage regulators are linear regulators (e.g.,LDOs), inductor-based switching regulators (e.g., buck), and capacitor-basedswitching regulators (e.g., switched-capacitor).
5 FGVS Architecture Design: FG-SYNC+
We explore a new FGVS controller called the fine-grain synchronization controller(FG-SYNC+) that exploits fine-grain scaling in level (i.e., many voltage levels), space(i.e., per-core regulation), and time (i.e., fast transition times between levels) toimprove performance and energy efficiency while maintaining similar average power.FG-SYNC+ uses a thread library instrumented with hint instructions to inform thehardware about which cores are doing useful work vs. useless work (e.g., waiting fora task, waiting at a barrier).
(d) rsort with 100 ns Response Time(b) splash2-lu with Eight Domains
time
(a) splash2-lu with Two Domains (c) rsort with 1000 ns Response Time
time
time time7
Core
0
Exec.Time
Reducedby 20%
7
Core
0
Super-SprintSprintNominalRestBusy Waiting
7
Core
0
7
Core
0
In these application activity plots illustrating FG-SYNC+, rows show controllerdecisions per-core and black strips above cores show when that core is active. Wecompare SPLASH-2 LU factorization with two vs. four voltage domains (a,b). We alsoillustrate the impact of slow voltage-settling response times over a small excerpt fromradix sort (c,d).
Fine-Grain Voltage Scaling with FG-SYNC+
We use three sensitivity studies to understand the implication of varying: (1) thenumber of voltage levels, (2) the number of voltage domains, (3) and voltage-settlingresponse times.
0.7/1.0V 1.0/1.15V
1.0/1.33V
0.7/1.0/1.33V0.7/1.0/1.15V
0.7/1.0/1.15/1.33V
2 Domains 4 Domains
8 Domains
1000 ns 100 ns
0 ns
isopower
isopower
Increasing
sprinting
levels
isopower
Increasing
number of
domains
Decreasing
response
time
isopower
Norm
. E
nerg
y E
ffic
ien
cy 2.0
1.8
1.6
1.4
1.2
1.0
0.80.9 1.0 1.1 1.2 1.3 1.4 1.5 1.0 1.1 1.2 1.3 1.4 1.5
Speedup Speedup
Norm
. E
nerg
y E
ffic
ien
cy 2.0
1.8
1.6
1.4
1.2
1.0
0.81.0 1.1 1.2 1.3 1.4 1.5 0.4 0.6 0.8 1.0 1.2 1.4
2-Level Controllers 3-Level vs. 4-Level Controllers
Different Numbers of Domains Different Response Times
Speedup Speedup
To exploit fine-grain activity imbalance, (1) at least three levels are required and fourlevels helps further; (2) more domains results in improved performance and energyefficiency; (3) response times of 100 ns or faster are required.
6 FGVS Circuit Design: RPDN
Shown in (a), we use a single fixed-voltage regulator (SFVR) as a baseline tocompare against more sophisticated regulation schemes. We choose a configurationthat can provide 80% efficiency at 1 V with an area of 0.26 mm2 (4% of the core/L1area). Shown in (b), multiple adjustable voltage regulators (MAVR) enablefine-grain voltage scaling in space and level. The power efficiency vs. area plot in (c)shows how we choose a per-core regulator area of 0.08 mm2 to allow efficient voltageregulation for super-sprint. Note that designing for super-sprint significantlyover-provisions for rest, nominal, and sprint modes; also, only one or two cores willever be using the super-sprint mode at any given time.
Core 1 with Adjustable Regulator
Vin
AVRControl
Unit
Core
Phase 0
Core 0 with Adjustable Regulator
(b) MAVR Abstract Schematic
Vin Flyback Cap
Core 0
SFVRControl
Unit
SP P
Vout = Vin/2
S
Phase 0
Core 1 Core 7
(a) SFVR Abstract Schematic
80
75
70
65
60
55
0.02 0.04 0.06 0.08
Pow
er
Effic
ien
cy (
%)
Area (mm2)
NominalMode
Super-SprintMode
0.08 mm2
80% Efficiencyat Super-Sprint
Chosen Design Point
(c) MAVR Efficiency vs. Area
Reconfigurable Power Distribution Networks
We propose a new approach called reconfigurable power distribution networks(RPDNs). As shown below, RPDNs include many small “unit cells” that each containthe flyback capacitance and regulator switches required for a SC regulator. Thesecells can be flexibly reconfigured through a switch fabric and combined with per-corecontrol circuitry to effectively create multiple differently-sized SC regulators“on-demand” for cores. The inset shows how 16 unit cells can be allocated to fourcores operating in four different modes.
VoutControlLoop
Core 0 Core 1
RPDNFabricControl
RPDN Switch Fabric
DCBAUnitCell
X
r
S N
Allocating 16 Cells
The efficiency vs. powerplot shown for a singlecore illustrates howlower-voltage modesrequire less area for highefficiency. A resting coreonly requires a single cell;RPDN would allow asuper-sprinting core to“borrow” three cells from aresting core whilemaintaining high efficiency.
Pow
er
Effic
ien
cy
10 20 30 40 50 60 700
60%
65%
70%
75%
80%
85%Rest
(0.7V)
Nominal
(1.0V)
Sprint
(1.15V)
Super-Sprint
(1.33V)
Output Power (mW)
7 Cells
6
Cells
5
Cells
4
Cells
3
Cells
2
Cells
1
Cell
100 150 200 250 300 350 400
1.4
1.3
1.2
1.1
1.0
0.9
0.8
0.7
Time (ns)
Volta
ge (
V)
120 ns
150 ns
1.35
1.25
1.15
1.05
0.95
0.85
0.75
0.25 1.0 1.75 2.5 3.25 4.0Time (us)
Volta
ge (
V)
1390 ns 960 ns
2900 ns
MAVR Transient Response
RPDN Transient Response
The RPDN architecture provides areasavings of 40% over MAVR whensupporting per-core supply regulationacross the same number of cores. Inaddition to reducing area overhead,RPDN significantly reduces thevoltage-settling response time. Thewaveforms shown here illustrate thedifference in the transient responses forRPDN and MAVR when transitioningbetween modes. For RPDN, theresponse time for the nominal tosuper-sprint transition takes 150 ns whilethe same transition takes 2.9 µs withMAVR.
7 Evaluation
Evaluation Methodology
We use a vertically integrated evaluation methodology that uses a mix of circuit-,gate-, register-transfer-, and architectural-level modeling.
CrossCompiler
RISCISA Sim
RISC Binary
C++ MultithreadedApp Using Hints
Cycle-LevelSimulator
Verilog RTL of MulticoreRISC Processor
Gate-Level Model
VerilogSimulator
VerilogSimulator
SwitchingActivity
Power Model
Layout
SynthesisP&R
SPICE CircuitModels of PDNs
SPICESimulator
Per-Core Stats
Cycle-Level Multicorewith DVFS Controller
Instruction-BasedEnergy Dictionary System
Performanceand Energy
Results
DVFS ModeTransition
Times
Power Efficienciesfor each PDN
Evaluation Results
We evaluate the performance, energyefficiency, and power of applications onour target system with each PDN. We usea 4-level, 8-domain FG-SYNC+ controllerand account for realistic voltage-settlingresponse times and regulator powerefficiencies in each DVFS mode forvarying load currents. 1.0
1.1
1.2
1.3
1.4
1.5
1.6
1.7
Norm
. E
nerg
y E
ffic
ien
cy
MAVR RPDN
0.5 0.6 0.7 0.8 0.9 1.0 1.1 1.2 1.3 1.4 1.5Speedup
isopower
leak cores regulator efficiency overhead transition
System-level Power Breakdowns for SFVR, MAVR, RPDN
bfs
bilater
aldith
er
kmeans
mriq
pbbs-dr
pbbs-knn
pbbs-mm
rsort
splash2-ff
t
splash2-lu
strsea
rchvite
rbi
0.00.10.20.30.40.50.60.70.80.91.0
SMR SMR SMR SMR SMR SMR SMR SMR SMR SMR SMR SMR SMR
Norm
aliz
ed
Pow
er
Our promising results suggest that there is an important opportunity for architectureand circuit co-design of integrated voltage regulation in future systems.
8 Acknowledgments
This work was supported in part by NSF CAREER Award #1149464, a SporkFellowship, and donations from Intel Corporation and Synopsys, Inc. The authorsacknowledge and thank Ji Kim for his help in developing the instruction-based energymodel, and Derek Lockhart, Shreesha Srinath, and Pol Rosello for their help in writingmulticore application kernels.
Publication: Appears in the Proceedings of the 47th Int’l Symp. on Microarchitecture (MICRO-47), Dec. 2014. URL: http://www.csl.cornell.edu/~cbatten/pdfs/godycki-rpdn-micro2014.pdf Contact Author: Christopher Torng, 364 Upson Hall, Ithaca, NY 14853, clt67@cornell.edu
Recommended