
Multicore Challenge Conference 2013

UWE, Bristol

Static and Dynamic Load Balancing for Heterogeneous Systems

Greg Michaelson

Turkey Al Salkini, Khari Armih, Deng Xiao Yan

Heriot-Watt University

Phil Trinder

University of Glasgow


Context

• explosive growth in deployment of heterogeneous systems

– desktop/laptop/tablet with many-core + GPU

• HPC with clusters of nodes of many-core + GPU

• upgrade/replace/remove cluster components as opportune rather than replacing whole cluster

• newer components typically higher spec than older components

• hard to allocate and balance load


Context

• typical heterogeneous system

• colours indicate different technologies


[Diagram: two clusters connected by a network; each node has cores, memory, and a GPU on a shared bus.]

Context

• static work allocation for regular data parallelism

• simple data parallel skeleton with master & identical workers

• master sends work to workers and receives results (see the sketch below)

• assume: master on core; workers on cores or GPU


[Diagram: master connected to workers worker1, worker2, ..., workerM.]
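A minimal sketch of such a farm skeleton in Python (illustrative only: the name `farm` and the use of multiprocessing are assumptions, not the authors' implementation):

```python
from multiprocessing import Pool

def farm(work_fn, items, n_workers):
    # master: send work to identical workers, receive results in order
    with Pool(processes=n_workers) as pool:
        return pool.map(work_fn, items)

def square(x):
    return x * x

if __name__ == "__main__":
    print(farm(square, range(10), n_workers=4))
```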

Homogeneity

• cluster of identical single core nodes with no GPU


[Diagram: identical single-core nodes, each with one core and its memory on a bus, connected by a network.]

Amdahl’s Law

• T = total sequential time



Amdahl’s Law

• T = total sequential time

• Tseq = necessarily sequential time

• Tpar = potentially parallelisable time

• i.e. T = Tseq + Tpar


[Figure: a bar of length T split into Tseq and Tpar.]

Amdahl’s Law

• T = total sequential time

• Tseq = necessarily sequential time

• Tpar = potentially parallelisable time

• i.e. T = Tseq + Tpar

• N = number of nodes

• allocate the same share of Tpar to each node


[Figure: the bar T split into Tseq and Tpar, with Tpar divided into equal chunks Tpar1, Tpar2, ..., TparN.]

Amdahl’s Law

• T = total sequential time

• Tseq = necessarily sequential time

• Tpar = potentially parallelisable time

• i.e. T = Tseq + Tpar

• N = number of nodes

• allocate the same share of Tpar to each node

• Tmin = minimum time

• Tmin = Tseq + Tpar/N


[Figure: the sequential schedule Tseq + Tpar compared with the parallel schedule, Tseq followed by Tpar1, ..., TparN running together; the difference is the saving.]
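A worked example of the formula (the numbers are illustrative, not from the talk):

```python
def t_min(t_seq, t_par, n):
    # Amdahl's Law: best achievable time on n nodes
    return t_seq + t_par / n

t_seq, t_par = 10.0, 90.0   # 10% necessarily sequential
for n in (1, 4, 16, 64):
    t = t_min(t_seq, t_par, n)
    print(f"N={n:3d}  Tmin={t:6.2f}  speedup={(t_seq + t_par) / t:.2f}x")
```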

Communication cost

• Tsend = time to send data to processors

• Trec = time to receive results from processors

• for any parallel benefit, need:

• Tsend+Trec+Tpar/N < Tpar


[Figure: the parallel schedule with Tsend before and Trec after the Tpar1, ..., TparN phase; communication cost reduces the saving.]
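A quick check of the benefit condition (illustrative numbers):

```python
def parallel_benefit(t_send, t_rec, t_par, n):
    # worthwhile only if communication plus the parallel share beats Tpar
    return t_send + t_rec + t_par / n < t_par

print(parallel_benefit(t_send=5.0, t_rec=5.0, t_par=90.0, n=16))    # True
print(parallel_benefit(t_send=50.0, t_rec=50.0, t_par=90.0, n=16))  # False
```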

Heterogeneity 1

• different number of cores/amount of RAM on each node

• same core/RAM technologies


[Diagram: nodes with different numbers of cores (Cores1, Cores2) and amounts of memory (Mem1, Mem2), built from the same technologies, connected by a network.]

Heterogeneity 1

• Coresi = number of cores on node i

• P = number of processors = Cores1+...+CoresN

• each core gets:

Tpar/P

• node i gets:

Coresi*Tpar/P

• haven’t taken memory into account...?
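A sketch of the cores-proportional allocation just described (function and variable names are illustrative):

```python
def allocate_by_cores(t_par, cores):
    # share Tpar across nodes in proportion to their core counts
    p = sum(cores)                        # P = Cores1 + ... + CoresN
    return [c * t_par / p for c in cores]

print(allocate_by_cores(90.0, cores=[4, 8, 16]))  # [12.86, 25.71, 51.43]
```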


Heterogeneity 2

• different technologies on each node

• no GPU


[Diagram: nodes built from different core and memory technologies (Cores1/Mem1, Cores2/Mem2), without GPUs, connected by a network.]

Heterogeneity 2

• characterise node strength

– coarse-grain, relative measure

• for node i:

Speedi = clock speed of each core

Strengthi = Coresi*Speedi

i.e. strength increases with number of cores & core speed

• whole system: Strength = Strength1+...+StrengthN

• node i gets: Tpar*Strengthi/Strength

• each node i core gets: (Tpar*Strengthi/Strength)/Coresi
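A sketch of strength-based allocation under these definitions (numbers are illustrative):

```python
def allocate_by_strength(t_par, strengths):
    # share Tpar across nodes in proportion to node strength
    total = sum(strengths)                        # system strength
    return [t_par * s / total for s in strengths]

# Strengthi = Coresi * Speedi  (cores, GHz)
nodes = [(4, 2.0), (8, 2.5), (16, 3.0)]
strengths = [cores * speed for cores, speed in nodes]
shares = allocate_by_strength(90.0, strengths)
per_core = [s / cores for s, (cores, _) in zip(shares, nodes)]
print(shares)    # load per node
print(per_core)  # load per core on each node
```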


Heterogeneity 2

• still doesn’t take memory into account?

• top level shared cache (i.e. L2 or L3) most important

– determines the frequency of access clashes and the amount of caching

• for node i:

Cachei = shared cache size

Strengthi = Coresi*Speedi*Cachei

i.e. strength also increases with cache size

• use previous equations for system strength, and node and core load
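Extending the earlier sketch with the cache term (illustrative values):

```python
# Strengthi = Coresi * Speedi * Cachei  (cores, GHz, MB)
nodes = [(4, 2.0, 4.0), (8, 2.5, 8.0), (16, 3.0, 20.0)]
strengths = [cores * speed * cache for cores, speed, cache in nodes]
total = sum(strengths)                  # system strength
for (cores, _, _), s in zip(nodes, strengths):
    node_share = 90.0 * s / total       # Tpar * Strengthi / Strength
    print(f"node: {node_share:6.2f}   per core: {node_share / cores:5.2f}")
```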



Heterogeneity 3

• different node technologies

• with GPU


[Diagram: nodes with different core, memory, and GPU technologies (Cores1/Mem1/GPU1, Cores2/Mem2/GPU2), connected by a network.]

Heterogeneity 3

• wish to treat GPU as additional general compute resource - GP-GPU

• but cannot directly adopt current algorithm independent model

– very hard to characterise GP-GPU succinctly from static information

• base GPU strength measure on algorithm specific profiling

• use cost model for core strength

• normalise strength relative to one core

Multicore Challenge Conference 2013 19

Heterogeneity 3

TCore0 = time for algorithm on base core

• for node i:

TGPUi = time for algorithm on GPU

• strength of GPUi = TCore0/TGPUi

(so a GPU that runs the algorithm faster than the base core has strength greater than 1)

• node i core strength = (Speedi*Cachei)/(Speed0*Cache0)

• assume GPU controlled by one core

Strengthi = TCore0/TGPUi + (Coresi-1)*(Speedi*Cachei)/(Speed0*Cache0)
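A sketch of this profiled strength measure (all profiling numbers are invented for illustration):

```python
def node_strength(t_core0, t_gpu, cores, speed, cache, speed0, cache0):
    # GPU strength comes from algorithm-specific profiling; the remaining
    # cores use the cost model, normalised against the base core. One core
    # is assumed to be dedicated to driving the GPU.
    gpu = t_core0 / t_gpu
    core = (speed * cache) / (speed0 * cache0)
    return gpu + (cores - 1) * core

# base core: 2.0 GHz, 4 MB cache, 120 s for the algorithm; GPU: 6 s
print(node_strength(t_core0=120.0, t_gpu=6.0,
                    cores=8, speed=2.5, cache=8.0,
                    speed0=2.0, cache0=4.0))   # 20 + 7 * 2.5 = 37.5
```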



Task mobility

• static model assumes program has sole use of system

– no load changes at run-time

• if system shared then load changes unpredictably at run-time

• base system design on threads?

• hope operating system will manage load?

– optimising movement of arbitrary threads across nodes is a black art

• decentralise mobility

• let master decide when to move work


Task mobility

[Diagram: master M with workers W1, W2, W3; colour shows the amount of load on each processor.]

Task mobility

[Diagram: W3 becomes overloaded; the load should be balanced by moving work to a lightly loaded processor.]

Task mobility

[Diagram: after the move, the load is re-balanced across the workers.]

Mobility decision

Ti = time to complete task on location i

Tmove = time to move task from current to new location

Overhead = acceptable completion overhead, as a fraction, e.g. 5% = 0.05

• to justify moving work from location i to location j, the move must pay off by more than the overhead margin:

Tj + Tmove < Ti*(1 - Overhead)
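A sketch of the decision (the 5% margin and the timings are illustrative):

```python
def should_move(t_i, t_j, t_move, overhead=0.05):
    # move work from i to j only if it beats staying put
    # by more than the acceptable overhead margin
    return t_j + t_move < t_i * (1.0 - overhead)

print(should_move(t_i=100.0, t_j=60.0, t_move=20.0))  # True: 80 < 95
print(should_move(t_i=100.0, t_j=90.0, t_move=8.0))   # False: 98 > 95
```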


Mobility decision

• homogeneous case

TBase = time to complete whole task on unloaded core

Loadi,t = load on location i at time t

• assume we inspect load having completed P% of task

Ti = TBase*(100-P)/100 * f(Loadi,t, Loadi,0)

where f scales the remaining work by how the load at location i has changed since the task started
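A sketch of this estimate, assuming a simple linear form for f (the slides leave f abstract):

```python
def remaining_time(t_base, p_done, load_now, load_start):
    # estimate time to finish the remaining (100 - P)% of the task,
    # scaled by relative load (an assumed linear choice of f)
    f = load_now / load_start if load_start else 1.0
    return t_base * (100 - p_done) / 100 * f

# 40% done, load has doubled since the task started
print(remaining_time(t_base=100.0, p_done=40, load_now=2.0, load_start=1.0))  # 120.0
```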



Evaluation


[Figure: two load heat maps of 16 workers + 10 tasks, without mobility (left) and with mobility (right).]

• 7,000*7,000 matrix multiplication
• Poisson load
• darker == more heavily loaded

Conclusions

• simple models + relative measures suffice for static & dynamic load balancing

• data parallel/regular parallelism only

• small scale experiments

• next:

– larger scale experiments

– explore more complex algorithms

– extend mobility to many-core


References

• T. Al Salkini and G. Michaelson, 'Dynamic Farm Skeleton Task Allocation Through Task Mobility', 18th International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'12), Las Vegas, July 16-19, 2012

• K. Armih, G. Michaelson and P. Trinder, 'Cache size in a cost model for heterogeneous skeletons', Proceedings of HLPP 2011: 5th ACM SIGPLAN Workshop on High-Level Parallel Programming and Applications, Tokyo, September 2011

• X. Y. Deng, P. Trinder and G. Michaelson, 'Cost-Driven Autonomous Mobility', Computer Languages, Systems and Structures, 36(1), April 2010, pp 34-51

