April 20, 2023
Grid Computing:From Old Traces to New Applications
Fribourg, Switzerland
Alexandru Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Catalin Dumitrescu, Mathieu Jan, Dick Epema
Parallel and Distributed Systems Group, TU Delft
Big thanks to our collaborators: U Wisc./Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …
DGSim · The Failure Trace Archive
About the Speaker
• Systems
  • The Koala grid scheduler
  • The Tribler BitTorrent-compatible P2P file-sharing system
  • The POGGI and CAMEO gaming platforms
• Performance
  • The Grid Workloads Archive (Nov 2006)
  • The Failure Trace Archive (Nov 2009)
  • The Peer-to-Peer Trace Archive (Apr 2010)
  • Tools: DGSim trace-based grid simulator, GrenchMark workload-based grid benchmarking
• Team of 15+ active collaborators in NL, AT, RO, US
• Happy to be in Berkeley until September
The Grid
A ubiquitous, always-on computational and data-storage platform on which users can seamlessly run their (large-scale) applications
Shared capacity & costs, economies of scale
The Dutch Grid: DAS System and Extensions
DAS-3: a 5-cluster grid
• VU (85 nodes), TU Delft (68), Leiden (32), UvA/MultimediaN (46), UvA/VL-e (41)
• Connected via SURFnet6, 10 Gb/s lambdas
• 272 AMD Opteron nodes, 792 cores, 1 TB memory
• Heterogeneous: 2.2-2.6 GHz single/dual-core nodes
• Myrinet-10G (excl. Delft)
• Gigabit Ethernet

DAS-4 (upcoming)
• Multi-cores: general purpose, GPU, Cell, …

Clouds
• Amazon EC2+S3, Mosso, …
Many Grids Built
DAS, Grid’5000, OSG, NGS, CERN, …
Why grids and not The Grid?
Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. New Application Types
6. Suggestions for Collaboration
7. Conclusion
The Failure Trace Archive
Failure and Recovery Events
20+ traces online
http://fta.inria.fr
D. Kondo, B. Javadi, A. Iosup, D. Epema, The Failure Trace Archive: Enabling Comparative Analysis of Failures in Diverse Distributed Systems, CCGrid 2010 (Best Paper Award)
Euro-Par 2008, Las Palmas, 27 August 2008
System Availability Characteristics
Resource Evolution: Grids Grow by Cluster
System Availability Characteristics
Grid Dynamics: Grids Shrink Temporarily
Grid-level view: average availability 69%
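The grid-level availability number above can be computed from the per-node failure and recovery events stored in traces like those of the Failure Trace Archive. The sketch below is illustrative only: the event format, node names, timestamps, and the assumption that nodes start in the up state are all invented, not the archive's actual schema.

```python
# Sketch: estimating availability from FTA-style per-node event traces.
# Each trace is a list of (timestamp, new_state) events; all data below
# is invented for illustration.

def availability(events, t_start, t_end):
    """Fraction of [t_start, t_end) the node spent in the 'up' state."""
    up_time = 0.0
    state, prev = "up", t_start  # assumption: node starts available
    for t, new_state in sorted(events):
        if state == "up":
            up_time += min(t, t_end) - prev
        prev, state = t, new_state
    if state == "up":
        up_time += t_end - prev
    return up_time / (t_end - t_start)

# Two toy nodes observed over 100 time units; each is down for 20 units.
node_a = [(10, "down"), (30, "up")]
node_b = [(50, "down"), (60, "up"), (90, "down")]
grid_availability = (availability(node_a, 0, 100) +
                     availability(node_b, 0, 100)) / 2  # 0.8
```

Averaging the per-node fractions over all nodes and the whole observation window yields the grid-level figure.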
Resource Availability Model
• Assume no correlation of failure occurrence between clusters
• Which site/cluster? fs, the fraction of failures at cluster s
• MTBF, MTTR, correlations
• Weibull distribution for failure inter-arrival times (IAT): the longer a node is online, the higher the chance that it will fail
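The model above can be sampled directly: a Weibull shape parameter above 1 gives the increasing hazard rate the last bullet describes, and each failure is assigned to a cluster s with probability fs. All parameter values below are invented placeholders, not fitted values from the traces.

```python
import random

# Illustrative sampler for the availability model above. Failure
# inter-arrival times (IAT) follow a Weibull distribution (shape > 1
# means an increasing hazard rate: the longer online, the likelier to
# fail); each failure lands at cluster s with probability f_s.
# All numeric parameters are assumptions, not fitted values.

random.seed(42)

FAILURE_FRACTION = {"delft": 0.5, "vu": 0.3, "leiden": 0.2}  # f_s per cluster
SHAPE, SCALE = 1.5, 10.0  # assumed Weibull parameters (hours)

def next_failure():
    """Sample (inter-arrival time, cluster) for the next failure event."""
    iat = random.weibullvariate(SCALE, SHAPE)  # args: scale (alpha), shape (beta)
    cluster = random.choices(list(FAILURE_FRACTION),
                             weights=list(FAILURE_FRACTION.values()))[0]
    return iat, cluster

events = [next_failure() for _ in range(10_000)]
mean_iat = sum(iat for iat, _ in events) / len(events)  # ~ SCALE * Gamma(1 + 1/SHAPE)
```

With these placeholder parameters the sample mean IAT converges to roughly 9 hours, the analytic Weibull mean.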
Was it the System?
• No
  • System can grow fast
  • Good data and models to support system designers
• Yes
  • Grid middleware unscalable
  • Grid middleware failure-prone
  • Grid resources unavailable
  • Poor online information about resource availability
Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. Was it Another Hype?
6. Suggestions for Collaboration
7. Conclusion
The Grid Workloads Archive
Per-Job Arrival, Start, Stop, Structure, etc.
6 traces online
http://gwa.ewi.tudelft.nl
1.5 yrs >750K >250
A. Iosup, H. Li, M. Jan, S. Anoep, C. Dumitrescu, L. Wolters, D. Epema, The Grid Workloads Archive, FGCS 24, 672-686, 2008.
How Are Real Grids Used?
Data Analysis and Modeling
• Grids vs. parallel production environments such as clusters and (small) supercomputers:
  • Bags of single-processor tasks vs. single parallel jobs
  • Bigger bursts of job arrivals
  • More jobs
Grid Workloads
Analysis: Grid Workload Components
• Bags-of-Tasks (BoTs)
  • BoT size = 2-70 tasks, most 5-20
  • Task runtime highly variable, from minutes to tens of hours
• Workflows (WFs)
  • WF size = 2-1k tasks, most 30-40
  • Task runtime of minutes
Grid Workloads
Modeling Grid Workloads: adding users, BoTs
• Single arrival process for both BoTs and parallel jobs
• Reduce over-fitting and complexity of the “Feitelson adapted” model by removing the RunTime-Parallelism correlated model
• Validated with 7 grid workloads

A. Iosup, O. Sonmez, S. Anoep, and D.H.J. Epema. The Performance of Bags-of-Tasks in Large-Scale Distributed Systems, HPDC, pp. 97-108, 2008.
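The modeling idea above can be sketched as a toy generator: one arrival process emits jobs, each of which is either a BoT or a parallel job. The distributions and every numeric parameter below are illustrative placeholders, not the fitted model from the HPDC 2008 paper.

```python
import random

# Toy workload generator in the spirit of the model above: a single
# arrival process produces both BoTs and parallel jobs. All parameter
# values and distribution choices are assumptions for illustration.

random.seed(7)

def generate_workload(n_jobs, bot_fraction=0.7, mean_interarrival=60.0):
    jobs, t = [], 0.0
    for _ in range(n_jobs):
        t += random.expovariate(1.0 / mean_interarrival)  # one arrival process
        if random.random() < bot_fraction:
            size = random.randint(5, 20)               # typical BoT size (tasks)
            runtime = random.lognormvariate(5.0, 1.5)  # highly variable (seconds)
            jobs.append(("bot", t, size, runtime))
        else:
            cpus = random.choice([2, 4, 8, 16])        # parallel job width
            runtime = random.lognormvariate(6.0, 1.0)
            jobs.append(("parallel", t, cpus, runtime))
    return jobs

workload = generate_workload(1000)
```

Validating such a generator means comparing the synthetic size, runtime, and inter-arrival distributions against the archived traces, as done for the 7 grid workloads above.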
Grid Workloads
Load Imbalance Across Sites and Grids
• Overall workload imbalance: normalized daily load (5:1)
• Temporary workload imbalance: hourly load (1000:1)
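The two ratios above compare the most- and least-loaded sites over different windows. A minimal sketch of that metric, with invented site names and load values:

```python
# Sketch of the imbalance metric above: the ratio between the most- and
# least-loaded sites, over normalized load. All data is invented.

def imbalance_ratio(load_per_site):
    """Ratio of the highest to the lowest normalized site load."""
    return max(load_per_site.values()) / min(load_per_site.values())

# Hypothetical normalized daily load (e.g., jobs per processor per day).
daily = {"site-a": 5.0, "site-b": 2.5, "site-c": 1.0}
# Hypothetical hourly load: short bursts make the imbalance far worse.
hourly = {"site-a": 500.0, "site-b": 4.0, "site-c": 0.5}

daily_ratio = imbalance_ratio(daily)    # 5:1 overall imbalance
hourly_ratio = imbalance_ratio(hourly)  # 1000:1 temporary imbalance
```

The gap between the daily and the hourly ratio is exactly what makes temporary imbalance attractive for inter-operation: another grid is likely idle during your burst.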
Was it the Workload?
• No
  • Similar workload characteristics across grids
  • High utilization possible due to single-node jobs
  • High load imbalance
  • Good data and models to support system designers
• Yes
  • Too many tasks (system limitation)
  • Poor online information about job characteristics + high variability of job resource requirements
  • How to schedule BoTs, WFs, and mixtures in grids?
Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. Was it Another Hype?
6. Suggestions for Collaboration
7. Conclusion
Problems in Grid Scheduling and Resource Management
The System
1. Grid schedulers do not own resources themselves
  • They have to negotiate with autonomous local schedulers
  • Authentication/multi-organizational issues
2. Grid schedulers interface to local schedulers
  • Some may have support for reservations, others are queuing-based
3. Grid resources are heterogeneous and dynamic
  • Hardware (processor architecture, disk space, network)
  • Basic software (OS, libraries)
  • Grid software (middleware)
  • Resources may fail
  • Lack of complete and accurate resource information
Problems in Grid Scheduling and Resource Management
The Workloads
4. Workloads may be heterogeneous and dynamic
  • Grid schedulers may not have control over the full workload (multiple submission points)
  • Jobs may have performance requirements
  • Lack of complete and accurate job information
5. Application structure may be heterogeneous
  • Single sequential jobs
  • Bags-of-tasks; parameter sweeps (Monte Carlo), pilot jobs
  • Workflows, pipelines, chains-of-tasks
  • Parallel jobs (MPI); malleable, co-allocated
The Koala Grid Scheduler
• Developed in the DAS system
  • Deployed on the DAS-2 since September 2005
  • Ported to the DAS-3 in April 2007
• Independent of grid middleware such as Globus
• Runs on top of local schedulers
• Objectives:
  • Data and processor co-allocation in grids
  • Support for different application types
  • Specialized application-oriented scheduling policies

Koala homepage: http://www.st.ewi.tudelft.nl/koala/
Koala in a Nutshell
• Parallel applications
  • MPI, Ibis, …
  • Co-allocation
  • Malleability
• Parameter sweep applications
  • Cycle scavenging
  • Run as low-priority jobs
• Workflows

A bridge between theory and practice
How To Compare Existing and New Grid Systems?
The Delft Grid Simulator (DGSim)
DGSim: …tudelft.nl/~iosup/dgsim.php
• Discrete-event simulation
• Generates realistic workloads
• Automates the simulation process (10,000s of tasks)
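The core of any such trace-based simulator is a discrete-event replay loop. The sketch below illustrates the idea with a first-come-first-served (FCFS) policy on a single CPU pool; it is a toy, not DGSim's actual implementation, and the (arrival_time, runtime, cpus) trace format is an assumption.

```python
import heapq

# A minimal trace-driven, discrete-event FCFS replay loop, in the spirit
# of a trace-based grid simulator. Toy sketch only: the trace format and
# single-pool model are assumptions, not DGSim's design.

def simulate(trace, total_cpus):
    """Replay the trace FCFS on a single CPU pool; return per-job waits."""
    trace = sorted(trace)             # by arrival time
    completions = []                  # min-heap of (finish_time, cpus)
    pending, waits = [], []
    free, i = total_cpus, 0
    while i < len(trace) or pending or completions:
        next_arr = trace[i][0] if i < len(trace) else float("inf")
        next_fin = completions[0][0] if completions else float("inf")
        now = min(next_arr, next_fin)
        if next_fin <= next_arr:      # a job finishes: release its CPUs
            _, cpus = heapq.heappop(completions)
            free += cpus
        else:                         # a job arrives: queue it FCFS
            pending.append(trace[i])
            i += 1
        while pending and pending[0][2] <= free:  # start jobs in order
            arrival, runtime, cpus = pending.pop(0)
            free -= cpus
            waits.append(now - arrival)
            heapq.heappush(completions, (now + runtime, cpus))
    return waits

# Two 4-CPU jobs on a 4-CPU pool: the second waits until the first ends.
waits = simulate([(0, 10, 4), (1, 10, 4)], total_cpus=4)  # → [0, 9]
```

Automating the process then amounts to running such a loop over many generated or archived traces and scheduling policies, and aggregating the resulting metrics.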
How to Inter-Operate Grids?
Existing (Working?) Alternatives
• Architectures: independent, centralized, hierarchical, decentralized
• Systems: Condor, Globus GRAM, Alien, Koala, OAR, CCS, Moab/Torque, OAR2, NWIRE, OurGrid, Condor flocking
• Open questions: load imbalance? resource selection? scale? root ownership? node failures? accounting? trust?
Inter-Operating Grids Through Delegated MatchMaking [1/3]
The Delegated MatchMaking Architecture
1. Start from a hierarchical architecture
2. Let roots exchange load
3. Let siblings exchange load

Delegated MatchMaking Architecture = hybrid hierarchical/decentralized architecture for grid inter-operation
Inter-Operating Grids Through Delegated MatchMaking [3/3]
The Delegated MatchMaking Mechanism
1. Deal with local load locally (if possible)
2. When local load is too high, temporarily bind resources from remote sites to the local environment
  • May build delegation chains
  • Delegate resource usage rights, do not migrate jobs
3. Deal with delegations in each delegation cycle (delegated matchmaking)

The Delegated MatchMaking Mechanism = delegate resource usage rights, do not delegate jobs
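The delegation idea can be sketched in a few lines: a site that cannot satisfy a request locally borrows usage rights for CPUs from its neighbours, possibly through a chain of delegations, while the jobs themselves never migrate. Site names, topology, and CPU counts below are invented, and the real mechanism (delegation cycles, unbinding, policies) is far richer than this toy.

```python
# Toy sketch of delegated matchmaking: borrow *usage rights* for CPUs
# through a chain of neighbouring sites; jobs stay where they were
# submitted. All names and numbers are invented for illustration.

class Site:
    def __init__(self, name, free_cpus):
        self.name, self.free = name, free_cpus
        self.neighbours = []

    def acquire(self, need, chain=()):
        """Obtain usage rights for up to `need` CPUs, delegating the
        remainder of the request along a chain of neighbours."""
        got = min(self.free, need)
        self.free -= got              # rights move; jobs do not
        chain = chain + (self,)       # remember the path to avoid cycles
        for nb in self.neighbours:
            if got >= need:
                break
            if nb not in chain:
                got += nb.acquire(need - got, chain)  # delegation chain
        return got

# Topology a -- b -- c: site a has no direct link to c.
a, b, c = Site("a", 0), Site("b", 2), Site("c", 8)
a.neighbours, b.neighbours = [b], [a, c]

rights = a.acquire(5)  # 2 CPUs from b, plus 3 more delegated via b to c
```

The chain through b is the key design point: a never talks to c directly, yet ends up with usage rights on c's resources.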
What is the Potential Gain of Grid Inter-Operation?
Delegated MatchMaking (DMM) vs. Alternatives: independent, centralized, decentralized (higher is better)
• DMM
  • High goodput
  • Low wait time
  • Finishes all jobs
• Even better for load imbalance between grids
• Reasonable overhead [see thesis]

Grid inter-operation (through DMM) delivers good performance
4.2. Studies on Grid Scheduling [5/5]
Scheduling under Cycle Stealing
[Architecture figure — components: Scheduler, CS-Runner, KCM, cluster head nodes, and Launchers. Interactions: the user submits PSA(s) via JDF; the CS-Runner registers with the KCM and exchanges grow/shrink messages; Launchers are submitted to the clusters to deploy, monitor, and preempt tasks; the KCM monitors/informs about idle/demanded resources.]

CS policies:
• Equi-All: grid-wide basis
• Equi-PerSite: per cluster

Application-level scheduling:
• Pull-based approach
• Shrinkage policy
O. Sonmez, B. Grundeken, H. Mohamed, A. Iosup, D. Epema: Scheduling Strategies for Cycle Scavenging in Multicluster Grid Systems. CCGRID 2009: 12-19
Requirements
1. Unobtrusiveness: minimal delay for (higher-priority) local and grid jobs
2. Fairness
3. Dynamic resource allocation
4. Efficiency
5. Robustness and fault tolerance
Deployed as Koala Runner
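The two CS policies named above can be contrasted with a small arithmetic sketch: Equi-All splits all idle nodes in the grid evenly over the parameter-sweep applications (PSAs), while Equi-PerSite performs the even split independently inside each cluster. This assumes every PSA can use nodes anywhere; cluster names and idle-node counts are invented.

```python
# Sketch contrasting the two cycle-scavenging policies named above.
# Assumption: every PSA can run on any idle node; all data is invented.

def equi_all(idle_per_cluster, n_psas):
    """Grid-wide equipartition: nodes per PSA = total_idle // n_psas."""
    return sum(idle_per_cluster.values()) // n_psas

def equi_per_site(idle_per_cluster, n_psas):
    """Per-cluster equipartition: split each cluster's idle nodes, sum."""
    return sum(idle // n_psas for idle in idle_per_cluster.values())

idle = {"delft": 10, "vu": 7, "leiden": 5}  # 22 idle nodes in total
per_psa_all = equi_all(idle, 4)        # 22 // 4 = 5 nodes per PSA
per_psa_site = equi_per_site(idle, 4)  # 10//4 + 7//4 + 5//4 = 4 nodes
```

Note how the per-site split strands the remainder nodes inside each cluster, one reason a grid-wide policy can hand more scavenged cycles to each PSA.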
Was it the System Designer?
• No
  • Mechanisms to inter-operate grids: DMM [SC|07], …
  • Mechanisms to run many grid application types: WFs, BoTs, parameter sweeps, cycle scavenging, …
  • Scheduling algorithms with inaccurate information [HPDC ‘08, ‘09, ‘10]
  • Tools for empirical and trace-based experimentation
• Yes
  • Still too many tasks
  • What about new application types?
Agenda
1. Introduction
2. Was it the System?
3. Was it the Workload?
4. Was it the System Designer?
5. New Application Types
6. Suggestions for Collaboration
7. Conclusion
Cloud Futures Workshop 2010 – Cloud Computing Support for Massively Social Gaming
MSGs are a Popular, Growing Market
• 25,000,000 subscribed players (from 150,000,000+ active)
• Over 10,000 MSGs in operation
• Market size: $7,500,000,000/year
Sources: MMOGChart, own research. Sources: ESA, MPAA, RIAA.
Massively Social Gaming as a New Grid/Cloud Application
1. Virtual world: explore, do, learn, socialize, compete+
2. Content: graphics, maps, puzzles, quests, culture+
3. Game analytics: player stats and relationships (Romeo and Juliet)

Massively Social Gaming: (online) games with massive numbers of players (100K+), for which social interaction helps the gaming experience

[SC|08, TPDS’10] [EuroPar09 BPAward, CPE10] [ROIA09]
Suggestions for Collaboration
• Scheduling mixtures of grid/HPC/cloud workloads
• Scheduling and resource management in practice
• Modeling aspects of cloud infrastructure and workloads
• Condor on top of Mesos
• Massively Social Gaming and Mesos
  • Step 1: game analytics and social network analysis in Mesos
• The Grid Research Toolbox
  • Using and sharing traces: the Grid Workloads Archive and the Failure Trace Archive
  • GrenchMark: testing large-scale distributed systems
  • DGSim: simulating multi-cluster grids
Alex Iosup, Ozan Sonmez, Nezih Yigitbasi, Hashim Mohamed, Dick Epema
Thank you! Questions? Observations?
More Information:
• The Koala Grid Scheduler: www.st.ewi.tudelft.nl/koala
• The Grid Workloads Archive: gwa.ewi.tudelft.nl
• The Failure Trace Archive: fta.inria.fr
• The DGSim simulator: www.pds.ewi.tudelft.nl/~iosup/dgsim.php
• The GrenchMark perf. eval. tool: grenchmark.st.ewi.tudelft.nl
• Cloud research: www.st.ewi.tudelft.nl/~iosup/research_cloud.html
• Gaming research: www.st.ewi.tudelft.nl/~iosup/research_gaming.html
• see PDS publication database at: www.pds.twi.tudelft.nl/
email: [email protected]
Big thanks to our collaborators: U. Wisc.-Madison, U Chicago, U Dortmund, U Innsbruck, LRI/INRIA Paris, INRIA Grenoble, U Leiden, Politehnica University of Bucharest, Technion, …