
Energy-Aware Computing Systems
Energiebewusste Rechensysteme

VII. Cluster Systems

Timo Hönig

2019-06-13

EASY

Agenda

Preface

Terminology

Composition and Strategies
  Compound Structure
  Provisioning and Load Control

Cluster Systems
  Energy Proportionality
  Energy-efficient Cluster Architecture
  Thermal Awareness and Control

Summary

© thoenig EASY (ST 2018, Lecture 7) Preface 3 – 26

Preface: Changing the Perspective

small individual problems that are processed to jointly provide an overall solution

deeply embedded systems, wireless sensor nodes in cyber-physical systems
bottom-up approach: build (nested) control loops with self-contained solo systems
heterogeneous tasks across concerned systems

Preface: Changing the Perspective

large problems that are split into small problems that contribute to an overall solution

clustered networked systems in a compound structure with manageable dynamicity
top-down approach: divide and conquer; consider local and global energy demand
homogeneous (sub-)tasks across concerned systems

Page 2: ;2M/ 1M2`;v@ r `2 *QKTmiBM; avbi2Kb S`27 +2€¦ · 1M2`;v@2{+B2Mi *Hmbi2` `+?Bi2+im`2. pB/ :X M/2`b2M 2i HX, 7 bi `` v Q7 rBKTv MQ/2b U6 qLV ( R) +Hmbi2` `+?Bi2+im`2 i? i Bb +QKTQb2

Abstract Concept: Cluster Systems


cluster systems
cluster: a number of things of the same kind, growing or held together; a bunch
swarm: Old English swearm, "multitude, cluster"

cluster composition
  heterogeneous nodes
  homogeneous nodes

cluster linkage
  wired links
  wireless links

(image: nationalgeographic.org/)

Compound Structure


cluster systems

energy-efficient cluster architecture with homogeneous low-power nodes
cheap hardware …
… but sensitive to errors
RPi cluster

1 350 systems
5 400 cores
< 4 kW (idle)
> 13 kW (active)
small area requirements
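The quoted figures imply a per-node power budget that can be sanity-checked in a few lines; a minimal sketch, using only the numbers stated on the slide:

```python
# Sanity check of the per-node figures implied by the RPi cluster slide.
systems = 1350                    # number of nodes
cores = 5400                      # total cores
idle_kw, active_kw = 4.0, 13.0    # cluster-wide power bounds

print(f"cores per node: {cores // systems}")
print(f"per-node power: ~{idle_kw * 1000 / systems:.1f} W idle, "
      f"~{active_kw * 1000 / systems:.1f} W active")
```

So each node draws roughly 3 W idle and under 10 W active, which is what makes the small area and power footprint possible.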

Compound Structure


cluster systems

energy-efficient cluster architecture with homogeneous high-performance nodes
powerful hardware …
… with complex wiring and administration
mining cluster

energy-efficient special-purpose hardware (e.g., GPUs)
yet, large clusters have an energy demand that exceeds that of entire cities

Compound Structure


cluster systems

energy-efficient cluster architecture with heterogeneous low-power and high-performance nodes
heterogeneous hardware components …
… enable an appropriate mapping of software requirements to hardware offerings
mixed cluster

address heterogeneity of software requirements
highly dynamic → power and energy proportionality


Provisioning and Load Control


provisioning and load control at level of the system software

workload distribution [4]
  software characterization → (available) hardware components
  node assignment strategies → avoid under- and overload

scheduling
  thermal awareness [2] → cluster locality and deferred execution
  exploit parallelism where possible

distributed run-time power management
  cluster power cap [5]
  steer progress speed of distributed tasks
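The node-assignment idea above (avoid under- and overload) can be sketched as greedy least-loaded placement with a per-node capacity limit; a minimal sketch in which the job names, loads, and capacity are hypothetical:

```python
import heapq

def assign(tasks, node_capacity, num_nodes):
    """Greedily place each task on the currently least-loaded node,
    refusing placements that would overload that node."""
    heap = [(0.0, n) for n in range(num_nodes)]   # (load, node id)
    placement = {}
    for task, load in tasks:
        current, node = heapq.heappop(heap)       # least-loaded node
        if current + load > node_capacity:
            raise RuntimeError(f"cluster overload at task {task!r}")
        placement[task] = node
        heapq.heappush(heap, (current + load, node))
    return placement

jobs = [("grep", 0.4), ("sort", 0.7), ("encrypt", 0.5), ("compress", 0.3)]
print(assign(jobs, node_capacity=1.0, num_nodes=2))
```

A real system would replace the scalar load with the software characterization mentioned above (I/O-bound vs. CPU-bound) and match it against what each node's hardware offers.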

Energy Proportionality


considerations on warehouse-scale computers
  the datacenter as a computer
  provisioning of hardware components → impact on cost efficiency
  operation of hardware components → impact on cost efficiency, too

utilization/workload vs. power demand
  depending on the workload of systems, the power demand must scale
  best case: no power when idle → reasoning between blocking and non-blocking energy management control methods

[3] U. Hölzle and L. A. Barroso: The Case for Energy-Proportional Computing (2007)
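The utilization-vs.-power argument can be made concrete with the commonly used linear server power model; the idle and peak wattages below are illustrative numbers, not measurements from the cited paper:

```python
def power(u, p_idle=100.0, p_peak=200.0):
    """Linear server power model: P(u) = P_idle + (P_peak - P_idle) * u."""
    return p_idle + (p_peak - p_idle) * u

def ideal_power(u, p_peak=200.0):
    """Ideal energy-proportional server: zero power when idle."""
    return p_peak * u

# The gap is largest at the low utilizations typical of datacenters.
for u in (0.1, 0.3, 1.0):
    print(f"u={u:.0%}: actual {power(u):.0f} W, ideal {ideal_power(u):.0f} W")
```

At 10% utilization the model draws more than half its peak power while an energy-proportional server would draw 10%, which is exactly why idle power dominates cluster energy bills.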

Energy-efficient Cluster Architecture

David G. Andersen et al.: fast array of wimpy nodes (FAWN) [1]
cluster architecture that is composed of homogeneous low-power nodes ("wimpy nodes")
FAWN nodes and cluster have drastically different characteristics compared to server systems that employ so-called "beefy nodes"


Understanding Data-Intensive Workloads on FAWN

Iulian Moraru, Lawrence Tan, Vijay Vasudevan15-849 Low Power Project

Abstract

In this paper, we explore the use of the Fast Array of Wimpy Nodes (FAWN) architecture for a wide class of data-intensive workloads. While the benefits of FAWN are highest for I/O-bound workloads where the CPU cycles of traditional machines are often wasted, we find that CPU-bound workloads run on FAWN can be up to six times more efficient in work done per Joule of energy than traditional machines.

1 Introduction

Power has become a dominating factor in the cost of provisioning and operation of large datacenters. This work focuses on one promising approach to reduce both average and peak power using a Fast Array of Wimpy Nodes (FAWN) architecture [2], which proposes using a large cluster of low-power nodes instead of a cluster of traditional, high-power nodes. FAWN (Figure 1) was originally designed to target mostly I/O-bound workloads, where the additional processing capabilities of high-speed processors were often wasted. While the FAWN architecture has been shown to be significantly more energy efficient than traditional architectures for seek-bound workloads [16], an open question is whether this architecture is well-suited for other data-intensive workloads common in cluster-based computing.

Recent work has shown that the FAWN architecture benefits from fundamental trends in computing and power: running at a lower speed saves energy, while the low-power processors used in FAWN are significantly more efficient in work done per joule [16]. Combined with the inherent parallelism afforded by popular computing frameworks


Figure 1: FAWN architecture

such as Hadoop [1] and Dryad [9], a FAWN system's improvement in both CPU-I/O balance and instruction efficiency allows for an increase in overall energy efficiency for a much larger class of datacenter workloads.

In this work, we present a taxonomy and analysis of some primitive data-intensive workloads and operations to understand when FAWN can perform as well as a traditional cluster architecture and reduce energy consumption for data centers.

To understand where FAWN improves energy efficiency for data-intensive computing, we investigate a wide range of benchmarks common in frameworks such as Hadoop [1], finding that FAWN is between three to ten times more efficient than a traditional machine in performing operations such as distributed grep and sort. For more CPU-bound operations, such as encryption and compression, FAWN architectures are still between three to six times more energy efficient. We believe these two categories of workloads, CPU-bound and I/O-bound, encompass a large enough range of com-

(Figure 1 node labels: AMD Geode, 256 MB DRAM, 4 GB CompactFlash)

Energy-efficient Cluster Architecture

David G. Andersen et al.: fast array of wimpy nodes (FAWN) [1]
goal: efficient execution of I/O-bound, computationally light workloads
multi-layered architecture: frontend node passes requests to responsible backend nodes → identified by hashes
joint hardware/software architecture

custom key-value store
low-memory nodes
partitioning
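The frontend's hash-based routing of requests to responsible backend nodes can be sketched as a consistent-hash ring; this is an illustrative sketch, not FAWN-KV's actual data structure, and the node names are made up:

```python
import hashlib
from bisect import bisect

class HashRing:
    """Map keys to backend nodes via consistent hashing, so adding or
    removing a node only remaps the keys on its arc of the ring."""
    def __init__(self, nodes):
        self.ring = sorted((self._h(n), n) for n in nodes)

    @staticmethod
    def _h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def backend(self, key):
        # First node clockwise from the key's hash owns the key.
        points = [p for p, _ in self.ring]
        i = bisect(points, self._h(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["fawn-a", "fawn-b", "fawn-c"])
print(ring.backend("user:42"))   # deterministic: same backend every time
```

Partitioning falls out of the same structure: each backend owns the key range between its predecessor's hash and its own.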


(Figure 1: FAWN-KV Architecture. Front-end nodes forward requests through a switch to the responsible FAWN back-end nodes running FAWN-DS and relay the responses; back-ends A1–F2 each hold a partition of the key space.)


Thermal Awareness and Control

Jeonghwan Choi et al.: thermal-aware task scheduling [2]
goal: hot spot mitigation to reduce thermal stress

avoid performance loss due to overheating
reduce cooling efforts

core hopping vs. task deferral
  spatial hot spot mitigation
  temporal mitigation of overheating
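Spatial mitigation via core hopping amounts to migrating a task off a hot core onto the currently coolest one; a minimal sketch, with temperatures and the threshold invented for illustration:

```python
def hop_if_hot(task_core, temps, threshold=80.0):
    """Return the core the task should run on next: hop to the coolest
    core when the task's current core exceeds the temperature threshold."""
    if temps[task_core] <= threshold:
        return task_core                              # no hot spot: stay put
    return min(range(len(temps)), key=temps.__getitem__)

temps = [85.0, 62.0, 71.0, 58.0]   # per-core temperatures (Celsius)
print(hop_if_hot(0, temps))        # task on core 0 hops to core 3
```

The migration itself costs cache warm-up time, which is why the measured slowdowns above stay small but are not zero.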


Temperature reduction by core hopping (4 ms); bars show the maximum delta temperature:

Workload   Reduction in temperature (Celsius)   Slowdown (%)
daxpy      5.5                                   2.5
apsi       4.2                                   0.9
fma3d      3.3                                   1.6
lucas      4.9                                   1.1
swim       5.1                                   1.0
bzip2      2.2                                   0.4
twolf      2.3                                  -0.5
vortex     2.0                                  -1.1
vpr        3.5                                   0.1

Figure 1: Core hopping reduces on-chip temperatures with small performance impact

Thermal Awareness and Control

Jeonghwan Choi et al.: thermal-aware task scheduling [2]
task deferral
  reschedule hot-running tasks to be last in the run queue
  cool down ahead of (resumed) execution
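The task-deferral policy can be sketched with a run queue in which an overheating head task is stopped and requeued at the tail; the overheating predicate below stands in for a real temperature sensor:

```python
from collections import deque

def defer_hot_task(run_queue, is_overheating):
    """If the task at the head of the run queue is overheating, stop it
    and move it to the tail so it cools down before it resumes."""
    if run_queue and is_overheating(run_queue[0]):
        run_queue.append(run_queue.popleft())
    return run_queue

queue = deque(["A", "B", "C"])
defer_hot_task(queue, is_overheating=lambda t: t == "A")
print(list(queue))   # A is deferred behind B and C
```

Unlike core hopping, this trades latency for temperature on the same core: the deferred task makes no progress while it cools.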


(Figure: task deferral over time. Tasks A, B, C run until overheating is detected during task A; A's execution is stopped and A is moved to the end of the run queue, so B and C run first and A resumes after cooling down.)


(Figure: delta temperature of the FPU over the Linux idle loop (Celsius) vs. time (0.0–3.5 seconds) for a non-SMT, four-thread workload {daxpy, bzip2, bzip2, bzip2}; left panel: default Linux scheduling, right panel: deferred scheduling.)

Considerations and Caveats

cluster systems
  compound systems consisting of a large number of nodes
  suitable mapping of software requirements to hardware offerings

energy demand depends on system software
  workload distribution and node assignment
  scheduling
  run-time controls (i.e., distributed power management)

power and energy proportionality
  due to varying workloads, power demand must scale
  consider blocking and non-blocking energy management methods



Paper Discussion

paper discussion
▶ Andrew Krioukov et al.
  NapSAC: Design and Implementation of a Power-Proportional Web Cluster
  Proceedings of the Workshop on Green Networking (GreenNet'10), 2010.


Subject Matter

cluster systems consist of homogeneous or heterogeneous nodes that cooperatively work on a solution for a large problem (e.g., scientific computing, number crunching)
consider overall energy demand at cluster level and local energy demand at node level to improve energy proportionality

reading list for Lecture 8:
▶ Rolf Neugebauer and Derek McAuley
  Energy is just another resource: Energy accounting and energy pricing in the Nemesis OS
  Proceedings of the 8th Workshop on Hot Topics in Operating Systems (HotOS'01), 2001.


Reference List I

[1] Andersen, D. G.; Franklin, J.; Kaminsky, M.; Phanishayee, A.; Tan, L.; Vasudevan, V.: FAWN: A Fast Array of Wimpy Nodes. In: Proceedings of the 22nd ACM SIGOPS Symposium on Operating Systems Principles, 2009, pp. 1–14

[2] Choi, J.; Cher, C.-Y.; Franke, H.; Hamann, H.; Weger, A.; Bose, P.: Thermal-aware Task Scheduling at the System Software Level. In: Proceedings of the 2007 International Symposium on Low Power Electronics and Design (ISLPED'07), 2007, pp. 213–218

[3] Hölzle, U.; Barroso, L. A.: The Case for Energy-Proportional Computing. In: Computer 40 (2007), no. 12, pp. 33–37

[4] Srikantaiah, S.; Kansal, A.; Zhao, F.: Energy Aware Consolidation for Cloud Computing. In: Proceedings of the 2008 Workshop on Power Aware Computing and Systems (HotPower'08), 2008


Reference List II

[5] Zhang, H.; Hoffmann, H.: Performance & Energy Tradeoffs for Dependent Distributed Applications Under System-wide Power Caps. In: Proceedings of the 47th International Conference on Parallel Processing (ICPP'18), 2018
