Load Balancing, Beowulf, and Grid Computing
David Finkel
References
“The Anatomy of the Grid”, Ian Foster, Carl Kesselman, Steven
Tuecke, International Journal of Supercomputer Applications,
2001
“A Performance Oriented Migration Framework for the Grid”, Satish
S. Vadhiyar and Jack J. Dongarra, Proceedings of CCGrid 2003, Third
IEEE/ACM International Symposium on Cluster Computing and the
Grid
Innumerable papers by PEDS members Finkel, Wills and Finkel, and
Claypool and Finkel, with additional co-authors.
Computer Science Department
Runs over the Internet, potentially world-wide
Several approaches have emerged; the paper by Foster et al. discusses
the Globus Toolkit
Coordinated resource sharing and problem solving in dynamic,
multi-institutional virtual organizations.
Highly controlled, with resource providers and consumers defining
what is shared and the conditions of sharing.
Issues to address: Protocols, privacy, security, costs, …
Resources: Computational, storage, network
Enquiry functions: to determine characteristics and state of a
resource
Management functions: Start, stop computations, reserve
bandwidth
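The split between enquiry and management functions can be sketched as a small resource class. This is an illustrative sketch only: the names Resource, Kind, enquire, start, stop and reserve are assumptions made for this example and are not the Globus Toolkit API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Kind(Enum):
    COMPUTATIONAL = "computational"
    STORAGE = "storage"
    NETWORK = "network"


@dataclass
class Resource:
    """Hypothetical resource wrapper; not a real Globus interface."""
    name: str
    kind: Kind
    capacity: float                      # e.g. CPUs, gigabytes, or Mbit/s
    reserved: float = 0.0
    running: set = field(default_factory=set)

    # Enquiry function: report characteristics and current state.
    def enquire(self) -> dict:
        return {
            "name": self.name,
            "kind": self.kind.value,
            "capacity": self.capacity,
            "available": self.capacity - self.reserved,
            "running": sorted(self.running),
        }

    # Management functions: start and stop computations, reserve capacity.
    def start(self, job_id: str) -> None:
        if self.kind is not Kind.COMPUTATIONAL:
            raise ValueError("only computational resources run jobs")
        self.running.add(job_id)

    def stop(self, job_id: str) -> None:
        self.running.discard(job_id)

    def reserve(self, amount: float) -> bool:
        # Refuse the reservation if it would exceed the remaining capacity.
        if amount > self.capacity - self.reserved:
            return False
        self.reserved += amount
        return True
```

For a network resource, reserve models a bandwidth reservation; for a computational resource, start and stop model launching and halting a computation.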
Directory services for discovery of resources
Co-allocation, scheduling, brokering
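A rough sketch of how a directory service and a broker might fit together follows. The Directory class and the co_allocate function are hypothetical names chosen for illustration, and the all-or-nothing rule is one simple reading of co-allocation, not a description of any particular Grid middleware.

```python
class Directory:
    """Hypothetical in-memory directory of resources and their free capacity."""

    def __init__(self):
        self._entries = {}

    def register(self, name, kind, free):
        self._entries[name] = {"kind": kind, "free": free}

    def discover(self, kind, minimum):
        # Discovery: names of resources of this kind with enough free capacity.
        return sorted(
            name for name, entry in self._entries.items()
            if entry["kind"] == kind and entry["free"] >= minimum
        )

    def take(self, name, amount):
        self._entries[name]["free"] -= amount


def co_allocate(directory, requests):
    """Broker: satisfy every (kind, amount) request or allocate nothing."""
    plan = []
    used = set()
    for kind, amount in requests:
        candidates = [n for n in directory.discover(kind, amount) if n not in used]
        if not candidates:
            return None                  # nothing has been committed yet
        plan.append((candidates[0], amount))
        used.add(candidates[0])
    # Commit only after every request has found a resource.
    for name, amount in plan:
        directory.take(name, amount)
    return plan
```

Committing only after the whole plan is found keeps a partly satisfiable request from tying up resources it cannot use.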
Load Sharing - Overview
Transferring work from a heavily loaded node to a lightly loaded
node
Purpose: To improve application performance
Transferring whole processes is not suitable for fine-grained
parallelism
Also known as: Load Balancing, Process Migration.
Measuring load (policy, implementation)
Which jobs to transfer
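The two questions above can be made concrete with a minimal sketch. It assumes, purely for illustration, that load is measured as the number of jobs on a node, that a transfer is worthwhile only when two nodes differ by at least a threshold, and that the job with the longest estimated remaining time is the one to move. These are example policies, not ones prescribed by the slides.

```python
def measure_load(node):
    # Assumed load metric: number of jobs currently on the node.
    return len(node["jobs"])


def pick_transfer(nodes, threshold=2):
    """Return (source, destination, job_id), or None if the load is balanced."""
    loads = {name: measure_load(node) for name, node in nodes.items()}
    src = max(loads, key=loads.get)      # most heavily loaded node
    dst = min(loads, key=loads.get)      # most lightly loaded node
    if loads[src] - loads[dst] < threshold:
        return None
    # Assumed selection rule: move the job with the most work remaining,
    # so that the transfer cost is spread over a long remaining run.
    job = max(nodes[src]["jobs"], key=lambda j: j["remaining"])
    return src, dst, job["id"]
```

Other load metrics (CPU utilization, memory pressure) or selection rules (newest job, smallest memory image) would slot into the same two functions.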
Load Sharing in the Grid
“A Performance Oriented Migration Framework for the Grid”, Vadhiyar
and Dongarra
Part of the GrADS (Grid Application Development Software) project,
based at Univ. of Tennessee and other institutions
Designed for long-running computations
Load Sharing in the Grid - 2
Basic idea – the load sharing system can run a performance model of
a computation to estimate running time and resource
requirements.
Application programmer is responsible for providing performance
model for the application, and hooks to stop application,
checkpoint state, and re-start application.
Built on the MPI programming library and the Globus Toolkit
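What the application programmer supplies can be pictured as an interface with a performance model and three hooks. The sketch below is an assumption-laden illustration: MigratableApp, IterativeSolver, and the simple "work divided by processors times speed" model are invented for this example and are not the GrADS or MPI interfaces.

```python
from abc import ABC, abstractmethod


class MigratableApp(ABC):
    """Hypothetical contract between an application and the load sharing system."""

    @abstractmethod
    def predict_time(self, num_procs, proc_speed):
        """Performance model: estimated remaining running time."""

    @abstractmethod
    def stop(self):
        """Hook: halt the computation at a safe point."""

    @abstractmethod
    def checkpoint(self):
        """Hook: return the state needed to resume later."""

    @abstractmethod
    def restart(self, state):
        """Hook: resume from a previously saved state."""


class IterativeSolver(MigratableApp):
    def __init__(self, total_iters, work_per_iter):
        self.total_iters = total_iters
        self.work_per_iter = work_per_iter
        self.done_iters = 0
        self.running = True

    def predict_time(self, num_procs, proc_speed):
        # Assumed model: perfect speedup, no communication cost.
        remaining = self.total_iters - self.done_iters
        return remaining * self.work_per_iter / (num_procs * proc_speed)

    def step(self):
        if self.running and self.done_iters < self.total_iters:
            self.done_iters += 1

    def stop(self):
        self.running = False

    def checkpoint(self):
        return {"done_iters": self.done_iters}

    def restart(self, state):
        self.done_iters = state["done_iters"]
        self.running = True
```

A checkpoint taken on one set of processors can be handed to a fresh instance elsewhere, and predict_time then reports only the work that remains.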
Before application begins, Application Manager runs performance
model to predict execution times, number of processors.
Determines whether an appropriate set of processors is available,
schedules jobs
Monitors progress of application as it runs
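The manager's steps above might look roughly like the following sketch. The function names, the rule of choosing the processor count with the lowest predicted time, and the 1.5x tolerance for declaring a delay are all assumptions made for illustration; the slides do not specify them.

```python
def choose_processors(predict_time, available):
    """Run the performance model for 1..available processors; pick the fastest.

    predict_time is a caller-supplied function mapping a processor count
    to a predicted execution time. Returns (count, predicted_time), or
    None if no processors are available.
    """
    if available < 1:
        return None
    best = min(range(1, available + 1), key=predict_time)
    return best, predict_time(best)


def is_delayed(predicted_time, measured_time, tolerance=1.5):
    # Monitoring: flag the application when a phase takes noticeably
    # longer than the performance model predicted.
    return measured_time > tolerance * predicted_time
```

With a model such as work / n + overhead * n, adding processors stops helping past some point, so the manager may schedule fewer processors than are available.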
Load sharing can occur if
Application progress is delayed
Additional resources become available
Checkpoint
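One way to picture the decision is as a comparison of predicted remaining times that charges for the cost of moving. The sketch below is an assumption for illustration: the function names and the min_gain parameter are hypothetical, and the actual rule used by the framework may differ.

```python
def rescheduling_triggered(progress_delayed, new_resources_available):
    # Either condition is enough to consider moving the application.
    return progress_delayed or new_resources_available


def should_migrate(remaining_here, remaining_there, migration_cost, min_gain=0.0):
    """Migrate only if the predicted saving outweighs the cost of moving.

    remaining_here:  predicted remaining time on the current processors
    remaining_there: predicted remaining time on the candidate processors
    migration_cost:  estimated time to checkpoint, transfer state, and restart
    """
    gain = remaining_here - (remaining_there + migration_cost)
    return gain > min_gain
```

For example, with 1000 s remaining on the current processors, 400 s on the candidates, and a 300 s migration cost, the predicted gain is 300 s and migration pays off; with 800 s remaining on the candidates it does not.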
Load sharing on the Grid:
There’s a large body of pre-Grid research on load balancing in
distributed systems
Can the results of this research be used to design load balancing
systems for the Grid?