A Multi-Agent System Approach to Load-Balancing and Resource Allocation for Distributed Computing


A Multi-Agent System Approach to Load Balancing and Resource Allocation for Distributed Computing

Soumya Banerjee & Joshua Hecker

• The age of distributed computing

• A trend of moving computation onto inexpensive but geographically distributed computers

• SETI@home, LHC@home

• Need for efficient allocation algorithms

Motivation

Decentralized Computing

• Can alleviate computing load on centralized monitors

• Robust to single-point failures

• Can achieve application-level resource management (nodes can manage resources better than a global monitor)

• Can scale more gracefully, since as the system grows a centralized monitor has to communicate with more and more nodes

• Can better respond to fluctuations in process requirements

• Handles scenarios with no locality, where the system has to "forget" past process requirements and completely rebuild new clusters after servicing each process

• An agent is a computing node; agents join together to form a cluster

• Multi-agent systems have emergent properties

• They have been used to model biological phenomena and real-life problems (left: Keepaway soccer, right: ant foraging):

Multi-Agent Systems

• A huge number of distributed nodes or agents

• There are advantages to computing with geographically proximal computers due to network latency, bandwidth limitations, etc.

• There is a global data structure holding a large number of tasks/processes

• A new process entering the system declares a priori the number of threads it can be parallelized into and its resource requirements (CPUreq)

• A cluster is a network of computers which together can completely service the resource requirements of a single task

• Over time, clusters are dynamically created, dissolved, and created again in order to serve the resource requirements of the tasks in the queue

Problem Statement and Assumptions
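The problem statement above can be pinned down with a minimal data model. This is an illustrative sketch: the field names (cpu_req, threads, cluster) and the example queue values are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Task:
    """A process entering the global queue; declares its needs a priori."""
    cpu_req: float   # CPUreq: CPU resource requirement
    threads: int     # number of threads the process can be parallelized into

@dataclass
class Agent:
    """A computing node; agents join together to form a cluster."""
    node_id: int
    capacity: float = 1.0                             # CPU capacity of one node
    cluster: List[int] = field(default_factory=list)  # ids of fellow cluster members
    task: Optional[Task] = None                       # currently assigned task, if any

# The global queue Q holds unallocated tasks (example values are illustrative)
Q: List[Task] = [Task(cpu_req=2.5, threads=4), Task(cpu_req=0.8, threads=1)]
```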

• dRAP: Distributed Resource Allocation Procedure

• Mode 1: an agent/node that is currently not part of a cluster and has no task assigned to it

1. the agent looks at queue Q, examines unallocated tasks, and takes on the task which minimizes |1 − CPUreq|

• Mode 2: an agent/node that is currently not part of a cluster and has a task assigned to it

1. keep on executing the task

2. if the task requirements are not completely satisfied, i.e. CPUreq > 1, keep on querying your neighbors and try to form a cluster such that CPUcluster = CPUreq

3. when the task completes, go to Mode 1

dRAP Algorithm

• Mode 3: an agent/node that is currently part of a cluster and has no task assigned to it

1. the agent looks at queue Q, examines unallocated tasks, and takes on the task which minimizes |CPUreq − CPUcluster|

• Mode 4: an agent/node that is currently part of a cluster and has a task assigned to it

1. keep on executing the task

2. when the task completes, break up the cluster and go to Mode 1

dRAP Algorithm
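The four modes can be sketched as a per-timestep update for a single agent. This is a dictionary-based sketch under simplifying assumptions (each node contributes one unit of CPU per timestep; helper names are hypothetical), not the authors' code.

```python
def best_task(queue, capacity):
    """Modes 1 & 3: take the unallocated task minimizing |capacity - CPUreq|."""
    if not queue:
        return None
    task = min(queue, key=lambda t: abs(capacity - t["cpu_req"]))
    queue.remove(task)
    return task

def step(agent, queue, neighbors):
    """One dRAP timestep for a single agent."""
    capacity = 1 + len(agent["cluster"])          # each node contributes one CPU
    if agent["task"] is None:                     # Mode 1 (alone) / Mode 3 (clustered)
        agent["task"] = best_task(queue, capacity)
    else:                                         # Mode 2 / Mode 4: keep executing
        agent["task"]["remaining"] -= 1           # execute for one timestep
        if agent["task"]["cpu_req"] > capacity:   # Mode 2: requirements unmet,
            for n in neighbors:                   # recruit free neighbors until
                if capacity >= agent["task"]["cpu_req"]:  # CPUcluster = CPUreq
                    break
                if n["task"] is None and not n["cluster"]:
                    agent["cluster"].append(n)
                    n["cluster"].append(agent)
                    capacity += 1
        if agent["task"]["remaining"] <= 0:       # done: dissolve cluster, Mode 1
            for n in agent["cluster"]:
                n["cluster"].clear()
            agent["task"] = None
            agent["cluster"].clear()
```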

• Caveat: task list traversal requires O(nm) time per timestep, where n = number of tasks and m = number of clusters

• For the entire simulation: Σ_{i=0..n} (n − i)·m ≈ O(n²m)

• Compare to FIFO scheduling, which drops to O(nm) for the entire simulation

• Does our algorithm's increased complexity per timestep provide enough of a decrease in scheduling time to be effective?

dRAP Algorithm
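The total-cost estimate can be sanity-checked numerically: if the queue shrinks by roughly one task per timestep, the per-timestep O(nm) scans accumulate to Σ(n − i)·m ≈ n²m/2 = O(n²m). The sizes below are illustrative, not from the experiments.

```python
def drap_total_scans(n, m):
    """Total comparisons if, at timestep i, (n - i) tasks remain and each is
    checked against m clusters: sum_{i=0}^{n-1} (n - i) * m."""
    return sum((n - i) * m for i in range(n))

n, m = 100, 8                      # illustrative sizes
exact = drap_total_scans(n, m)     # m * n * (n + 1) / 2
approx = n * n * m / 2             # the O(n^2 m) estimate
fifo_total = n * m                 # FIFO scans only the queue head: O(nm) overall
assert abs(exact - approx) / approx < 0.05   # estimate within 5% at this size
```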

• Example screenshots of the implementation (lines show clusters, red symbolizes task execution):

Simulation


• Comparisons with a null model (FIFO scheduling algorithm)

• Time to empty the queue (of 1000 tasks) = Tcomplete

• Average waiting time (averaged over 1000 tasks) = Twait

• Values given in simulation time steps:

Experiments

           Tcomplete    Twait
  dRAP       845.60    342.54
  FIFO      1071.20    475.31

• Utilization experiments

• We compared the cluster utilization ability of our algorithm vs. the FIFO scheduling algorithm

• Calculation for each task (averaged over the total number of tasks): Utilization = Nodesreq / Nodescluster

• The optimal value is 100% (our algorithm always achieves this):

Experiments

          Utilization
  dRAP        100%
  FIFO         56%
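The utilization metric can be written out directly. The helper name and the input pairs below are hypothetical illustrations of the per-task calculation, not the paper's data.

```python
def mean_utilization(assignments):
    """assignments: list of (nodes_required, nodes_in_cluster) pairs, one per
    task. Utilization per task = Nodes_req / Nodes_cluster, averaged over tasks."""
    return sum(req / cluster for req, cluster in assignments) / len(assignments)

# dRAP grows each cluster to exactly match the requirement -> 100%
assert mean_utilization([(2, 2), (4, 4), (1, 1)]) == 1.0

# A scheme that over-provisions every cluster twofold wastes half its nodes
print(f"{mean_utilization([(2, 4), (4, 8), (1, 2)]):.0%}")   # prints 50%
```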

• Lastly, we looked at how the average waiting time and the time to completion scale with the number of nodes in the system

Experiments

[Plot: Scaling of Tcomplete vs. number of nodes]

[Plot: Scaling of Twait vs. number of nodes]

• The same data using log2 axes and a power curve fit:

Experiments

[Plot: Scaling of Tcomplete on log2 axes; power fit y = 63630x^(−0.927), R² = 0.9976]

[Plot: Scaling of Twait on log2 axes; power fit y = 47010x^(−1.075), R² = 0.9992]
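The reported fits are power curves y = a·x^b, which can be recovered by ordinary least squares on log-transformed data. The points below are synthesized from the reported Tcomplete fit purely to exercise the method; they are not the experiments' raw measurements.

```python
import math

def power_fit(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((x - mx) * (y - my) for x, y in zip(lx, ly)) / \
        sum((x - mx) ** 2 for x in lx)
    a = math.exp(my - b * mx)
    return a, b

# Synthetic check: points generated from the reported Tcomplete fit
xs = [40, 80, 160, 320, 640]
ys = [63630 * x ** -0.927 for x in xs]
a, b = power_fit(xs, ys)
assert abs(b - (-0.927)) < 1e-9   # exponent recovered
```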

Optimizations Inspired by the Natural Immune System

• Operates under constraints of physical space

• Resource constrained (metabolic input, number of immune system cells)

• Performance scalability is an important concern (mice to horses) (Banerjee and Moses, 2010, in review)

Search Problem

• Immune system cells have to search throughout the whole body to locate small quantities of pathogens

Response Problem

• Have to respond by producing antibodies

Nearly Scale-Invariant Search and Response

• How does the immune system search and respond in almost the same time irrespective of the size of the search space?

Crivellato et al. 2004

Solution?

Lymph Nodes (LN)

• A place in which IS cells and the pathogen can encounter each other in a small volume

• Form a decentralized detection network

Decentralized Detection Network

www.lymphadvice.com

Lymph Node Dynamics


Summary

• There are increasing costs to global communication as organisms grow bigger

• Semi-modular architecture balances the opposing goals of detecting pathogen (local communication) and recruiting IS cells (global communication)

• Can we emulate this modular RADAR strategy in distributed systems?

Optimizations inspired by the immune system


• The move towards distributed computing necessitates efficient scheduling algorithms

• Decentralized scheduling across a large number of nodes adds robustness, reduces load on a centralized monitor, and responds better to fluctuations in task queue requirements

• Multi-agent systems have emergent properties and have been used here to adaptively create and allocate clusters to match task demand

• The algorithm outperforms our null model (FIFO scheduling) on average waiting time, time to empty the task queue, and utilization

• Further, our algorithm is robust to adversarial attack (fluctuations in the processor requirements of tasks in the queue)

Conclusions

• The value of immune-system-inspired approaches

• A general theory of scaling of artificial immune systems

Conclusions

• Compare with more null algorithms

• Compare with algorithms used in industry, e.g. SLURM, which uses static allocation of nodes to clusters known as partitions

• Compare with the cluster allocation algorithm used by Google in MapReduce (our algorithm could improve on their locality optimization since it seeks to form clusters with its neighbors)

• … and sell to the highest bidder!

Future Work

• Dr. Dorian Arnold

Acknowledgements
