Transcript
Page 1: Pegasus WMS - 情報処理学会sighpc.ipsj.or.jp/HPCAsia2018/poster/post102s2-file2.pdf · • Pegasus can easily scale both the size of the workflow, and the resources that the

•  Pegasusisasystemformappingandexecu4ngabstractapplica4onworkflowsoverarangeofexecu4onenvironments.

•  Thesameabstractworkflowcan,atdifferent4mes,bemappeddifferentexecu4onenvironmentssuchasXSEDE,OSG,commercialandacademicclouds,campusgrids,andclusters.

•  Pegasuscaneasilyscaleboththesizeoftheworkflow,andtheresourcesthattheworkflowisdistributedover.Pegasusrunsworkflowsrangingfromjustafewcomputa4onaltasksupto1million.

•  WorkflowsoJenconsumetensofthousandsofhoursofcomputa4onandinvolvetransferofmanyterabytesofdata.

•  WorkflowshaveaDAGmodel

•  AnodeintheDAGisstartedonlywhenalltheparentnodeshavesuccessfullyfinished.

Event-Based Triggering and Management of Scientific Workflow Ensembles Suraj Pandey1, Karan Vahi2, Rafael Ferreira da Silva2, Ewa Deelman2, Ming Jiang3, Albert Chu3, Cyrus Harrison3, Henri Casanova1

1University of Hawaii, 2USC Information Sciences Institute,

3Lawrence Livermore National Laboratory

h=ps://github.com/pegasus-isi/pegasus-llnl ProjectissupportedbytheDepartmentofEnergyundercontractDE-AC52-07NA27344andbyLLNLunderproject16-ERD-036.

PegasusiscurrentlyfundedbytheNaRonalScienceFoundaRon(NSF)undertheOACSI2-SSIgrant1664162.

PegasusWMSSystemArchitecture

ExperimentalSetupatLLNLCatalystCluster

APIs

Users

Interfaces

Pegasus Dashboard

OpenStack, Eucalyptus, Nimbus

Pegasus WMS

Mapper

Engine

Scheduler

Monitoring & Provenance

Workflow DB

Logs

Clouds

Not

ifica

tions

j1

j2

jn Job Queue

Cloudware

Amazon EC2, Google Cloud, RackSpace, Chameleon Compute

Amazon S3, Google Cloud Storage, OpenStack Storage

Distributed Resources

Campus Clusters

Local Clusters

Open Science Grid

XSEDE

HTCondor GRAM

PBS LSF SGE

Middleware C O M P U T

E

GridFTP

Storage

Other workflow composition tools:

Submit Host

HTTP

FTP SRM

IRODS SCP

ProblemsmappingcertaincomputaRonstoDAGworkflows•  Withpushtowardsextremescalecompu4ngitispossibleruntradi4onalHPCsimula4oncodessimultaneouslyontensofthousandsofcores.•  GenerateddataoJenneedstobeperiodicallyanalyzedusingBigDataAnaly4csframeworks.•  Integra4ngBigDataanaly4cswithHPCsimula4onsisamajorchallengeforthecurrentgenera4onofscien4ficworkflowmanagementsystems.•  Needanabilitytoautoma4callyspawnandmanagetheanalysisworkflows,asthelongrunningsimula4onworkflowexecutes.

EnsembleManager

SoluRon•  PegasushasanEnsembleManagerServicethatallowsusertosubmit

acollec4onofworkflowscalledensembles.•  WeextendedtheEnsembleManagertosupporteventtriggersthat

cantriggeraddi4onofnewworkflowstoanexis4ngensemble.•  Wesupportthefollowingtypesoftriggers

a)  Filebasedeventtriggers–afilegetsmodifiedb)  Directorybasedeventtriggers–filesappearinadirectory

•  Ini4ally,ensemblehasasingleworkflowconsis4ngofthelongrunningHPCsimula4onworkflow.•  TheHPCsimula4onworkflowperiodicallygeneratesoutputdatathat

inadirectorythatistrackedbytheensemblemanager.•  Anewanalysisworkflowislaunchedautoma4callyastheoutputdata

isdetected.

•  Reliablyandrepeatedlytestthatimplementedsolu4onworks.•  Testedtheimplementa4ononLLNLCatalystCluster(150teraFLOP/ssystemwith324nodes,

eachwith128GBofDRAMand800GBofnonvola4lememory).ExperimentalSetup1.  Oncatalyst,aMagpieSLURMjobissubmiaedthatdoes-Determineswhichnodeswillbe“master”nodes,“slave”nodes,orothertypesofnodes.-Setsup,configures,andstartsappropriateBigDatadaemonstorunontheallocatednodes.Inoursetup,weusedtheMagpieSPARKtemplatetosetupadynamicSparkcluster-Reasonablyop4mizesconfigura4onforthegivenclusterhardwarethatitisbeingrunon.Magpiethenexecutesauserspecifiedscripttogivecontrolbacktotheuser.2.  TheuserscriptsetsupPegasusWMSandstartstheEnsembleManager.3.  EnsembleManagersubmitstheHPCSimula4onWorkflowconsis4ngofLULESHapplica4on-Every10simula4oncyclesLULESHwritesoutoutputstoadirectoryonthesharedfilesystem.-Thisdirectoryistrackedbytheensemblemanageraspartoftheeventtriggerspecified.-EnsembleManagerinvokesascriptthatgeneratestheBigDataAnaly4csWorkflowonthenewlygenerateddatasets.

HPCSimula+onWorkflow

EnsembleManager

BigDataAnaly+csWorkflow

MagpieSetup

SubmitSlurmJob

RunMagpieDaemons

RunMagpieUserScript

PegasusWMSSetup

LaunchPegasus&Condor

LaunchEnsembleManager

ReadEventConfigura@on

TriggerEvent?

Generateandsubmitanaly@csworkflow

LULESHMPIJob

GenerateTriggerFile

CopyDataintoHDFS

RunSparkAnaly@cs

WorkflowEnsembleExecu:onTimeline

Recommended