
Pipeline and Batch Sharing in Grid Workloads

Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Miron Livny
WiND and Condor Projects
6 May 2003

www.cs.wisc.edu/condor

Goals

› Study a diverse range of scientific apps: measure CPU, memory, and I/O demands

› Understand relationships between apps: focus is on I/O sharing

Batch-Pipelined workloads

› Behavior of single applications has been well studied, both sequential and parallel

› But many apps are not run in isolation: the end result is the product of a group of apps, commonly found in batch systems and run 100s or 1000s of times

› Key is the sharing behavior between apps

Batch-Pipelined Sharing

[Figure: a grid of jobs, where each vertical chain is a pipeline and the horizontal extent is the batch width; pipeline sharing occurs between stages within a pipeline, while shared datasets are read across the batch.]

3 types of I/O

› Endpoint: unique input and output

› Pipeline: ephemeral data

› Batch: shared input data
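Though not part of the original slides, the three-way taxonomy above can be illustrated with a small sketch; the file names, roles, and sizes below are hypothetical:

```python
# Hypothetical sketch: classify one job's file traffic into the three I/O
# types above. File names and sizes are illustrative, not from the paper.
from collections import defaultdict

# (filename, role, megabytes transferred) for one pipeline of jobs
TRAFFIC = [
    ("input.dat",  "endpoint", 5),    # unique input, shipped from the submitter
    ("genome.db",  "batch",    450),  # shared dataset read by every pipeline
    ("stage1.tmp", "pipeline", 120),  # ephemeral data passed between stages
    ("result.out", "endpoint", 2),    # unique output, shipped back
]

def io_mix(traffic):
    """Sum traffic (MB) per I/O type: endpoint, pipeline, or batch."""
    totals = defaultdict(float)
    for _name, role, mb in traffic:
        totals[role] += mb
    return dict(totals)

print(io_mix(TRAFFIC))  # {'endpoint': 7.0, 'batch': 450.0, 'pipeline': 120.0}
```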


Outline

› Goals and intro

› Applications

› Methodology

› Results

› Implications


Six (plus one) target scientific applications

› BLAST - biology

› IBIS - ecology

› CMS - physics

› Hartree-Fock - chemistry

› Nautilus - molecular dynamics

› AMANDA - astrophysics

› SETI@home - astronomy


Common characteristics

› Diamond-shaped storage profile

› Multi-level working sets: the logical collection may be greater than that used by the app

› Significant data sharing

› Commonly submitted in large batches


BLAST

search string → blastp → matches
(shared dataset: genomic database)

BLAST searches for matching proteins and nucleotides in a genomic database. Has only a single executable and thus no pipeline sharing.

IBIS

inputs → analyze → forecast
(shared dataset: climate data)

IBIS is a global-scale simulation of Earth's climate used to study the effects of human activity (e.g. global warming). Only one app, thus no pipeline sharing.

CMS

configuration → cmkin → raw events
raw events + geometry + configuration → cmsim → triggered events

CMS is a two-stage pipeline in which the first stage models accelerated particles and the second simulates the response of a detector. This is actually just the first half of a bigger pipeline.

Hartree-Fock

problem → setup → initial state → argos → integral → scf → solutions

HF is a three-stage simulation of the non-relativistic interactions between atomic nuclei and electrons. Aside from the executable files, HF has no batch sharing.

Nautilus

initial state → nautilus → intermediate → bin2coord → coordinates → rasmol → visualization
(shared dataset: physics)

Nautilus is a three-stage pipeline which solves Newton's equations for each molecular particle in a three-dimensional space. The physics which governs molecular interactions is expressed in a shared dataset. The first stage is often repeated multiple times.

AMANDA

inputs → corsika → raw events → corama → standard events → mmc → noisy events → amasim → triggered events
(shared datasets: physics, ice tables, geometry)

AMANDA is a four-stage astrophysics pipeline designed to observe cosmic events such as gamma-ray bursts. The first stage simulates neutrino production and the creation of muon showers, the second transforms them into a standard format, and the third and fourth stages follow the muons' paths through earth and ice.

SETI@home

work unit → setiathome → analysis

SETI@home is a single-stage pipeline which downloads a work unit of radio telescope "noise" and analyzes it for any signs of extraterrestrial intelligent life. Has no batch data, but does have pipeline data as it performs its own checkpointing.

Methodology

› CPU behavior tracked with HW counters

› Memory tracked with usage statistics

› I/O behavior tracked with interposition (mmap was a little tricky)

› Data collection was easy; running the apps was the challenge.

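The traces were collected by interposing on the applications' I/O; as a toy analogue (illustrative only, not the paper's actual instrumentation), one can wrap a file-like object and count bytes as they pass through:

```python
# Toy analogue of I/O interposition: wrap a file-like object so every read
# and write is counted before being passed through. Illustrative only; the
# real traces were gathered by interposing on the applications' system calls.
from io import StringIO

class CountingFile:
    def __init__(self, f, stats):
        self._f = f
        self._stats = stats  # dict accumulating byte counts

    def read(self, *args):
        data = self._f.read(*args)
        self._stats["read_bytes"] = self._stats.get("read_bytes", 0) + len(data)
        return data

    def write(self, data):
        n = self._f.write(data)
        self._stats["write_bytes"] = self._stats.get("write_bytes", 0) + n
        return n

stats = {}
buf = StringIO()                      # stands in for a real file on disk
CountingFile(buf, stats).write("hello grid")
buf.seek(0)
CountingFile(buf, stats).read()
print(stats)  # {'write_bytes': 10, 'read_bytes': 10}
```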

Resources Consumed

[Figure: per-application bars for real time (hours, left axis 0-30) and memory and I/O traffic (MB, right axis 0-5000), for SETI, BLAST, IBIS, CMS, HF, Nautilus, and AMANDA.]

• Relatively modest. Max BW is 7 MB/s for HF.

I/O Mix

[Figure: per-application I/O traffic (MB, log scale 0.1-10000), broken down into endpoint, pipeline, and batch.]

• Only IBIS has a significant ratio of endpoint I/O.

Observations about individual applications

› Modest buffer cache sizes sufficient: max is AMANDA, which needs 500 MB

› Large proportion of random access: IBIS, CMS close to 100%, HF ~80%

› Amdahl and Gray balances skewed: drastically overprovisioned in terms of I/O bandwidth and memory capacity

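One way to see the skew: Amdahl's rule of thumb calls for roughly one bit of I/O per CPU instruction in a balanced system. A back-of-the-envelope check (the instruction and traffic counts below are illustrative, not the paper's measurements):

```python
# Back-of-the-envelope Amdahl balance check. Amdahl's rule of thumb is
# ~1 bit of I/O per instruction; these apps sit far below that, i.e. the
# hardware is overprovisioned for I/O. Numbers below are illustrative.
def amdahl_io_ratio(instructions, io_bytes):
    """Bits of I/O per instruction; 1.0 would be Amdahl-balanced."""
    return (io_bytes * 8) / instructions

# e.g. a job executing 10^12 instructions while moving 1 GB of data
ratio = amdahl_io_ratio(1e12, 1e9)
print(ratio)  # 0.008 bits per instruction, far below Amdahl's 1.0
```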

Observations about workloads

› These apps are NOT run in isolation: submitted in batches of 100s to 1000s

› Large degree of I/O sharing: significant scalability implications


Scalability of batch width

[Figure: scalability of batch width, storage center (1500 MB/s) vs. commodity disk (15 MB/s).]

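The ceiling behind these curves can be approximated with simple arithmetic: central storage can feed a batch only as long as aggregate demand stays under its bandwidth. A sketch, where the 1500 and 15 MB/s figures come from the slide but the 600 MB per-job traffic and 1-hour runtime are assumed:

```python
# Sketch of the batch-width ceiling: jobs become bandwidth-limited once
# their aggregate demand exceeds what storage can serve. The per-job
# traffic and runtime are hypothetical; the bandwidths are from the slide.
def max_batch_width(storage_mb_s, per_job_mb, job_runtime_s):
    """Largest batch width the storage system can feed without saturating."""
    servable = storage_mb_s * job_runtime_s   # MB deliverable per job lifetime
    return int(servable // per_job_mb)

for name, bw in [("storage center", 1500), ("commodity disk", 15)]:
    # hypothetical job: 600 MB of batch + endpoint traffic over a 1-hour run
    print(name, max_batch_width(bw, per_job_mb=600, job_runtime_s=3600))
# storage center 9000
# commodity disk 90
```

Eliminating batch or pipeline traffic (as the next slides do) raises this ceiling by shrinking `per_job_mb`.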

Batch elimination

[Figure: scalability with batch I/O eliminated, storage center (1500 MB/s) vs. commodity disk (15 MB/s).]


Pipeline elimination

[Figure: scalability with pipeline I/O eliminated, storage center (1500 MB/s) vs. commodity disk (15 MB/s).]


Endpoint only

[Figure: scalability with endpoint I/O only, storage center (1500 MB/s) vs. commodity disk (15 MB/s).]


Conclusions

› Grid applications do not run in isolation

› Relationships between apps must be understood

› Scalability depends on semantic information: relationships between apps, and understanding the different types of I/O


Questions?

› For more information:
Douglas Thain, John Bent, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau, and Miron Livny, "Pipeline and Batch Sharing in Grid Workloads", in Proceedings of High Performance Distributed Computing (HPDC-12).
– http://www.cs.wisc.edu/condor/doc/profiling.pdf
– http://www.cs.wisc.edu/condor/doc/profiling.ps