High Performance Computing 1

Load-Balancing

Load-Balancing

• What is load-balancing?
  – Dividing up the total work between processes when running codes on a parallel machine

• Load-balancing constraints
  – Minimize interprocess communication

• Also called:
  – partitioning, mesh partitioning, (domain decomposition)

Know your data and memory

• Memory is organized by banks. Between access to any bank, there is a latency period.

• Matrix entries are stored column-wise in FORTRAN.

Matrix addressing in FORTRAN

Addressing Memory

• For illustration purposes, let's imagine 8 banks [128 or 256 are common on chips today], with a bank busy time (bbt) of 8 cycles between accesses. Thus we have:

bank   1    2    3    4    5    6    7    8
data   a11  a21  a31  a41  a12  a22  a32  a42
data   a13  a23  a33  a43  a14  a24  a34  a44

Addressing Memory

• If we access data column-wise, we proceed through each bank in order. By the time we reach a13 (bank 1 again, on cycle 9), we (just) avoid the bank busy time.

• On the other hand, if we access data row-wise, we get a11 in bank 1, a12 in bank 5, and then a13 in bank 1 again. So instead of accessing a13 on clock cycle 3, we have to wait until cycle 9. Then we get a14 in bank 5 again on cycle 10, etc.
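The cycle counts above can be reproduced with a small simulation. This is a minimal sketch, assuming one access is issued per cycle, consecutive addresses map to consecutive banks, and a bank accessed on cycle t is free again on cycle t + bbt; the function name `access_cycles` is hypothetical.

```python
def access_cycles(addresses, n_banks=8, bbt=8):
    """Return the clock cycle on which each access is served, in issue order."""
    next_free = [1] * n_banks          # earliest cycle each bank can be used
    cycle = 0
    served = []
    for addr in addresses:
        bank = addr % n_banks          # consecutive addresses -> consecutive banks
        cycle = max(cycle + 1, next_free[bank])   # stall if the bank is busy
        next_free[bank] = cycle + bbt
        served.append(cycle)
    return served

n = 4  # 4x4 matrix stored column-wise: a(i,j) lives at address i + n*j (0-based)
column_wise = [i + n * j for j in range(n) for i in range(n)]
row_wise = [i + n * j for i in range(n) for j in range(n)]

col_cycles = access_cycles(column_wise)   # no stalls: cycles 1..16
row_cycles = access_cycles(row_wise)      # a11, a12, a13, a14 -> cycles 1, 2, 9, 10
```

Under these assumptions the column-wise sweep finishes on cycle 16, while the row-wise sweep needs 40 cycles for the same 16 entries.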

Indirect addressing

• If addressing is indirect, we may wind up jumping all over memory, and suffer performance hits because of it.

Shared Memory

• Bank conflicts depend on granularity of memory

• If each of p processors issues N memory references per cycle and the memory has a bank busy time of b cycles, we need p*N*b memory banks to see uninterrupted access of data

• With B banks, granularity is g = B/(p*N*b)

• Separate selection of data from its processing

• Each subtask requires its own data structure. Be prepared to change structures between tasks
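The bank-count rule and the granularity formula g = B/(p*N*b) from this slide can be checked with a few lines of arithmetic. This is only an illustration: the function names and the numbers used (4 processors, 1 reference per cycle, 8-cycle bank busy time, 64 banks) are assumptions chosen for the example, not values from the lecture.

```python
def banks_needed(p, n_refs, b):
    """Banks required for uninterrupted access: p * N * b."""
    return p * n_refs * b

def granularity(n_banks, p, n_refs, b):
    """g = B / (p * N * b)."""
    return n_banks / banks_needed(p, n_refs, b)

needed = banks_needed(p=4, n_refs=1, b=8)          # 32 banks
g = granularity(n_banks=64, p=4, n_refs=1, b=8)    # 2.0
```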

Load-balancing nomenclature

Object

Objects get distributed among different processes

Edges represent information that needs to be shared between objects

Partitioning

• Divides up the work

• 5 & 4 objects assigned to processes

• Creates “edge-cuts”

• Necessary communications between processes
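To make the nomenclature concrete, the sketch below counts per-process load and edge cuts for a small partition. The 9-object graph (a 3x3 grid) and the 5 & 4 assignment are hypothetical stand-ins for the figure on the slide, and `edge_cut` and `loads` are illustrative helper names.

```python
def edge_cut(edges, part):
    """Number of edges whose two objects live on different processes."""
    return sum(1 for u, v in edges if part[u] != part[v])

def loads(part, n_procs):
    """Number of objects assigned to each process."""
    counts = [0] * n_procs
    for p in part:
        counts[p] += 1
    return counts

# Hypothetical 3x3 grid of objects numbered 0..8, row by row.
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (6, 7), (7, 8),   # horizontal
         (0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8)]   # vertical
part = [0, 0, 0, 0, 0, 1, 1, 1, 1]   # objects 0-4 on process 0, 5-8 on process 1

work = loads(part, 2)         # [5, 4]
cut = edge_cut(edges, part)   # 4 edges cross the partition boundary
```

Each cut edge corresponds to data that must be communicated between the two processes.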

Work/Edge Weights

• Need a good measure of what the expected work may be
  – Molecular dynamics:
    • number of molecules
    • regions
  – FEM/finite difference/finite volume, etc.:
    • Degrees of freedom
    • Cells/elements

• If edge weights are used, also need a good measure of how strongly objects are coupled to each other

Static/Dynamic Load-Balancing

• Static load-balancing
  – Done as a “preprocessing” step before the actual calculation
  – If the objects and edges don’t change very much or at all, can do static load-balancing

• Dynamic load-balancing
  – Done during the calculation
  – Needed when there are significant changes in the objects and/or edges

Dynamic Load-Balancing Example

h-adapted mesh

• Workload is changing as the computation proceeds

• Calculate a new partition

• Need to migrate the elements to their assigned process

Static vs. Dynamic Load Balancing

• Static partitioning insufficient for many applications
  – Adaptive mesh refinement
  – Multi-phase/Multi-physics computations
  – Particle simulations
  – Crash simulations
  – Parallel mesh generation
  – Heterogeneous computers

• Need dynamic load balancing

Dynamic Load-Balancing Constraints

• Minimize load-balancing time
  – Memory constraints

• Minimize data migration (incremental partitions)
  – Small changes in the computation should result in small changes in the partitioning
  – Calculating the new partition and migrating the data should take less time than is saved by performing the computations on the new grid

• Done in parallel
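The time constraint above amounts to a simple cost/benefit test. The sketch below is one possible way to write it down; the function name, its parameters, and the idea of measuring time per step are assumptions for illustration, not part of the lecture.

```python
def worth_rebalancing(t_partition, t_migrate, t_step_old, t_step_new, steps_remaining):
    """True if the time saved on the remaining steps exceeds the cost of
    computing the new partition and migrating the data (all times in seconds)."""
    cost = t_partition + t_migrate
    saving = (t_step_old - t_step_new) * steps_remaining
    return saving > cost

# Hypothetical numbers: 2 s to partition, 3 s to migrate, steps drop from 1.0 s to 0.8 s.
many_steps_left = worth_rebalancing(2.0, 3.0, 1.0, 0.8, steps_remaining=100)  # True
few_steps_left = worth_rebalancing(2.0, 3.0, 1.0, 0.8, steps_remaining=10)    # False
```

In practice the per-step times would have to be estimated, for example from the current load imbalance.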

Methods of Load-Balancing

• Geometric
  – Based on geometric location
  – Faster load-balancing time with medium quality results

• Graph-based
  – Create a graph to represent the objects and their connections
  – Slower load-balancing time but high quality results

• Incremental methods
  – Use graph representation and “shuffle” around objects

Choosing a Load-Balancing Algorithm/Method

No algorithm/method is appropriate for all applications!

• Graph load-balancing algorithms for:
  – Static load-balancing
  – Computations where computation to load-balancing time ratio is high
    • Implicit schemes with a linear and non-linear solution scheme

Choosing a Load-Balancing Algorithm/Method

• Geometric load-balancing algorithms for:
  – Computations where computation to load-balancing time ratio is low
    • For explicit time stepping calculations with many time steps and varying workload (MD, FEM crash simulations, etc.)
    • Problems with many load-balancing objects

Geometric Load-Balancing

• Based on the objects’ coordinates
  – Want a unique coordinate associated with an object
    • Node coordinates, element centroid, molecule coordinate/centroid, etc.

• Partition “space” which results in a partition of the load-balancing objects

• Edge cuts are usually not explicitly dealt with

Geometric Load-Balancing Assumptions

• Objects that are close will likely need to share information
  – Want compact partitions
    • High volume to surface area or high area to perimeter length ratios

• Coordinate information

• Bounded domain

Geometric Load-Balancing Algorithms

• Recursive Coordinate Bisection (RCB)
  – Berger & Bokhari

• Recursive Inertial Bisection (RIB)
  – Taylor & Nour-Omid

• Space Filling Curves (SFC)
  – Warren & Salmon, Ou, Ranka, & Fox, Baden & Pilkington

• Octree Partitioning/Refinement-tree Partitioning
  – Loy & Flaherty, Mitchell

Recursive Coordinate Bisection

1. Choose an axis for the cut

2. Find the proper location of the cut

3. Group objects together according to location relative to cut

4. If more partitions are needed, go to step 1
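The four steps above can be sketched in a few lines. This is a minimal serial illustration, not the Berger & Bokhari implementation: it assumes equal work per object, picks the axis with the largest extent, places the cut at the position that splits the object count in proportion to the number of partitions on each side, and assumes there are at least as many objects as partitions. The function name `rcb` is hypothetical.

```python
def rcb(points, n_parts):
    """Split point indices into n_parts groups by recursive coordinate bisection."""
    dim = len(points[0])

    def split(ids, parts):
        if parts == 1:
            return [ids]
        # 1. Choose the axis for the cut: here, the axis with the largest extent.
        def extent(d):
            values = [points[i][d] for i in ids]
            return max(values) - min(values)
        axis = max(range(dim), key=extent)
        # 2. Find the cut location: sort along the axis and cut by object count.
        ordered = sorted(ids, key=lambda i: points[i][axis])
        left_parts = parts // 2
        k = len(ordered) * left_parts // parts
        # 3. Group objects by side of the cut; 4. recurse on each side.
        return split(ordered[:k], left_parts) + split(ordered[k:], parts - left_parts)

    return split(list(range(len(points))), n_parts)

# Hypothetical 4x2 grid of object coordinates, split into 4 partitions.
grid = [(x, y) for x in range(4) for y in range(2)]
partitions = rcb(grid, 4)
```

With unequal work per object, step 2 would cut at the weighted median rather than by object count.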

Recursive Inertial Bisection

1. Choose a direction for the cut

2. Find the proper location of the cut

3. Group objects together according to location relative to cut

4. If more partitions are needed, go to step 1
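A single inertial bisection (steps 1 to 3) can be sketched for 2-D points as follows. The assumptions here are equal object weights, a cut direction taken along the principal axis of the point set (the direction of largest spread, with the cut perpendicular to it), and a cut at the median projection. The helper name `rib_bisect` is hypothetical; step 4 would apply the same function to each half.

```python
import math

def rib_bisect(points):
    """One inertial bisection of 2-D points; returns (left_ids, right_ids)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # 1. Direction: principal axis of the 2x2 second-moment matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # 2. Cut location: median of the projections onto that axis.
    def projection(i):
        return (points[i][0] - cx) * dx + (points[i][1] - cy) * dy
    order = sorted(range(n), key=projection)
    # 3. Group objects by which side of the cut they fall on.
    return order[: n // 2], order[n // 2:]

# Points along a diagonal: a coordinate-aligned cut is not natural here,
# but the inertial axis follows the diagonal.
left, right = rib_bisect([(0, 0), (1, 1), (2, 2), (3, 3)])
```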

Space Filling Curves

A Space Filling Curve is a 1-dimensional curve which passes through every point in an n-dimensional domain

Load-Balancing with Space Filling Curves

• The SFC gives a 1-dimensional ordering of objects located in an n-dimensional domain
  – Easier to work with objects in 1 dimension than in n dimensions

• Algorithm:
  1. Sort objects by their location on the SFC
  2. Calculate cuts along the SFC
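The two-step algorithm above can be illustrated with a Morton (Z-order) curve, which is only one possible choice of space filling curve (a Hilbert curve is another). The sketch assumes objects sit on non-negative integer 2-D coordinates and carry equal work; `morton_key` and `sfc_partition` are hypothetical names.

```python
def morton_key(x, y, bits=16):
    """Position of integer cell (x, y) along a Z-order curve (bit interleaving)."""
    key = 0
    for b in range(bits):
        key |= ((x >> b) & 1) << (2 * b)       # x bits go to even positions
        key |= ((y >> b) & 1) << (2 * b + 1)   # y bits go to odd positions
    return key

def sfc_partition(cells, n_parts):
    """1. Sort objects by their location on the curve; 2. cut it into equal pieces."""
    order = sorted(range(len(cells)), key=lambda i: morton_key(*cells[i]))
    n = len(order)
    return [order[n * p // n_parts: n * (p + 1) // n_parts] for p in range(n_parts)]

# Hypothetical 4x4 grid of cells split among 4 processes.
cells = [(x, y) for x in range(4) for y in range(4)]
parts = sfc_partition(cells, 4)
first_quadrant = {cells[i] for i in parts[0]}   # the 2x2 block at the origin
```

For this grid each of the four pieces of the curve covers one 2x2 quadrant, which shows how a 1-dimensional cut can still produce compact partitions.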

Octree Partitioning/Refinement-Tree Partitioning

• Tree based algorithms for applications with multiple levels of data, simulation accuracy, etc.
  – Tree is usually built from specific computational schemes
  – Tightly coupled with the simulation

Comparisons of RCB, RIB, and SFC

• RCB and RIB usually give slightly better partitions than SFC

• SFC is usually a little faster

• SFC is a little better for incremental partitions
  – RIB can be quite unstable for incremental partitions

Load-Balancing Libraries

• There are many load-balancing libraries downloadable from the web
  – Mostly graph partitioning libraries
    • Static: Chaco, Metis, Party, Scotch
    • Dynamic: ParMetis, DRAMA, Jostle, Zoltan

• Zoltan (www.cs.sandia.gov/Zoltan)
  – Dynamic load-balancing library with:
    • SFC, RCB, RIB, Octree, ParMetis, Jostle
  – Same interface to all load-balancing algorithms

Methods to Avoid Communication

• Avoiding load-balancing
  – Load-balancing not needed every time the workload and/or edge connectivity changes

• Ghost cells

• Predictive load-balancing

Accessing Information on Other Processors

• Need communication between processors

• Use ‘ghost’ cells – need to maintain consistency of data in ghost cells

Ghost Cells

• Copies of cells assigned to other processors

• Make needed information available

• No solution values are computed at the ghost cells

• Ghost cell information needs to be updated whenever necessary

• Ghost cells need to be calculated dynamically because of changing mesh and dynamic load-balancing
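The ghost-cell idea can be shown without any message passing. In this serial sketch each "processor" is a Python list laid out as [left ghost, owned cells..., right ghost]; the layout, the function names, and the 3-point averaging stencil are assumptions for illustration. On a real parallel machine the copies in `exchange_ghosts` would be messages between processes.

```python
def exchange_ghosts(subdomains):
    """Refresh ghost cells in place from the neighbors' owned cells."""
    for rank, sub in enumerate(subdomains):
        if rank > 0:
            sub[0] = subdomains[rank - 1][-2]    # left neighbor's last owned cell
        if rank < len(subdomains) - 1:
            sub[-1] = subdomains[rank + 1][1]    # right neighbor's first owned cell

def smooth_owned(sub):
    """3-point average on owned cells only; ghost cells are read, never computed."""
    new = list(sub)
    for i in range(1, len(sub) - 1):
        new[i] = (sub[i - 1] + sub[i] + sub[i + 1]) / 3.0
    return new

# Two subdomains with two owned cells each; ghosts start out stale (0.0).
subs = [[0.0, 1.0, 2.0, 0.0], [0.0, 3.0, 4.0, 0.0]]
exchange_ghosts(subs)                 # ghosts now hold the neighbors' values
updated = [smooth_owned(s) for s in subs]
```

The exchange has to be repeated before every stencil step, which is the consistency requirement mentioned above.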

Predictive Load-Balancing

• Predict the workload and/or edge connectivity and load-balance with that information
  – Assumes that you can predict the workload and/or edge connectivity

• Still need to perform communication but reduces data migration

Predictive Load-Balancing

• Refine then load-balance – 4 objects migrated

• Predictive load-balance then refine – 1 object migrated
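One way to picture predictive load-balancing is to partition with the weights the objects are expected to have after refinement. The sketch below is an assumption-laden illustration, not the example from the slide: objects are already ordered along a line, an element flagged for refinement is assumed to split into 4 children (so its predicted weight is 4), and `split_by_weight` is a hypothetical helper that cuts the ordering into contiguous chunks of roughly equal total weight.

```python
def split_by_weight(weights, n_parts):
    """Cut an ordered list of objects into n_parts contiguous chunks of
    roughly equal total weight; returns lists of object indices."""
    total = sum(weights)
    parts, current, acc = [], [], 0.0
    for i, w in enumerate(weights):
        current.append(i)
        acc += w
        # Close the chunk once the running weight reaches the next target.
        if len(parts) < n_parts - 1 and acc >= total * (len(parts) + 1) / n_parts:
            parts.append(current)
            current = []
    parts.append(current)
    return parts

current_weights = [1] * 8                        # 8 coarse elements, equal work
predicted_weights = [4, 4, 1, 1, 1, 1, 1, 1]     # elements 0 and 1 will be refined

before = split_by_weight(current_weights, 2)      # [[0, 1, 2, 3], [4, 5, 6, 7]]
predicted = split_by_weight(predicted_weights, 2) # [[0, 1], [2, 3, 4, 5, 6, 7]]
```

Here the predictive partition moves the two coarse elements 2 and 3 before refinement, instead of rebalancing a larger number of refined elements afterwards.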
