Fully Hierarchical Scheduling: Paving the Way to

Preview:

Citation preview

Schedulers’LimitationsWorkload System’s

Workaround SideEffectsMaxnumberofjobs Throttlesubmissions DecreasedjobthroughputLimitedjob throughput Aggregatejobs Increased workloadruntimeLackofjob/ensemblestatus &controlAPI

Trackindividual job’sstatus throughfiles I/Obottleneck

Lackofprogrammablefailuredetection Inspect failuresmanually Unnecessary job

resubmissions

FullyHierarchicalScheduling:PavingtheWaytoExascale Workloads

Motivation• EmergingHPCworkloadsrepresentanorderofmagnitudeincreaseinbothscaleandcomplexity;yetbatchschedulingremainsstuckinthedecades-old,centralizedschedulingmodel

Thefullyhierarchicalschedulingmodelanditsimplementation,Flux,providegeneralsolutionstotheselimitations

FutureWork

FromtheTopTenExascale ResearchChallenges.DOEASCACSubcommitteeReport.2014.

ReferencesandAcknowledgements[1]D.Ahn,et.al.Flux:ANext-generationResourceManagementFrameworkforLargeHPCCenters.InICCPW’14.[2]J.Gyllenhaal,et.al.EnablingHighJobThroughputforUncertaintyQuantificationonBG/Q.InScicomP’14.[3]T.Dahlgren,et.al.ScalingUncertaintyQuantificationStudiestoMillionsofJobs.InSC’15.• TheauthorsacknowledgetheadviceandsupportfromtheFluxteamandothersatLLNL:TomScogland,JimGarlick,MarkGrondona,BeckySpringmeyer,ChrisMorrone,andAlChu.

ThisworkwasperformedundertheauspicesoftheU.S.DepartmentofEnergybyLawrenceLivermoreNationalLaboratoryunderContractDE-AC52-07NA27344.National Science FoundationCCF#1318445.ThisresearchwassupportedbytheExascale ComputingProject:17-SC-20-SC

StephenHerbein1,2,Tapasya Patki2,DongH.Ahn2,DonLipari2,TamaraDahlgren2,DavidDomyancic2,MichelaTaufer1

1UniversityofDelaware,2LawrenceLivermoreNationalLaboratory

LLNL-POST-735166

Schedulers’LimitationsonNumberofJobsCentralizedModel• Exhaustslocalresourceswhenhandlinglargenumbersofjobs[2]

FullyHierarchicalModel• Distributeslocalresourcerequirementsacrossmultipleschedulers

• HPCbatchschedulershaveseverallimitationswithrespecttotheseemergingworkloads,whichhas ledtoaproliferationofworkflowsystemsthatprovidespecializedworkarounds[2,3]

• IntegrateFlux’sjob/ensemblestatus&controlAPIintotheUQPtosimplifythesubmission/trackingofthejobensembleswhilealsoeliminatingtheI/Obottleneck

• DevelopaprogrammablefailuredetectionmechanismwithinFluxtoreduceunnecessaryresubmissionsandsimplifyerrorhandlingforusers

“Itiswidelyexpectedthatrigorousuncertaintyquantificationoverhigh-dimensionalinputspaceswillplayacrucialroleinenablingextreme-scalescience.Indeed,athousand-foldincreaseincomputingpowerwouldfacilitateorders-of-magnitudemoresimulation realizations”

Movingfromthecentralizedtothefullyhierarchicalmodelincreasesthescheduler’sjobscalabilityby133x

CaseStudy:SyntheticStressTest CaseStudy:UQPWorkload

H H H H H H H H HH

H

H

H H H H H H H H HH

H

H

H H H H H H H H HH

H

H

1 2 4 8 16 32 64 128

256

512

1024

2048

4096

8192

16384

32768

65536

131072

262144

524288

1048576

1

10

100

1000

10000

H node

J J JJ J J J J J

J

J

J

J J J J J J J JJJ

J

J

J J J J J J J J JJ

J

J

1 2 4 8 16 32 64 128

256

512

1024

2048

4096

8192

16384

32768

65536

131072

262144

524288

1048576

1

10

100

1000

10000

J allocation

B B B B B B B BBB

BB

B B B B B B BBBB

B

B

B B B B B B B BBB

BB

1 2 4 8 16 32 64 128

256

512

1024

2048

4096

8192

16384

32768

65536

131072

262144

524288

1048576

1

10

100

1000

10000

Mak

espa

n (s

ec)

B coreCentricity:

Number of jobs (total)

133x

Lowerisbetter

Largerisbetter

SchedulingModel

Wor

kloa

d R

untim

e (s

ec)

Schedulers’LimitationsonJobThroughputCentralizedModel• TaskswithinanaggregatedUQPjobarerunserially,increasingtheworkload’sruntime

FullyHierarchicalModel• Aggregatedjobismanagedbyitsownfull-featuredscheduler,allowingtaskstoberunconcurrently

MovingfromthecentralizedschedulerSlurmtoafullyhierarchicalschedulerFluxresultsina37%fasterworkloadruntime

Lowerisbetter 37%

Schedulers’LimitedJobEnsembleSupportCentralizedModel• Submitandtrackeachjobindividually

FullyHierarchicalModel• SubmithierarchiesofjobsandtrackthematvariablelevelsofgranularitythroughanAPI

UQPStartupJobSubmissionFileCreationFileAccess

Non-I/O

RuntimeStages

Slurm(CentralizedScheduler)• LimitedAPIforjobstatus• UQPtracksjobstatesthrough

files,creatinganI/ObottleneckFlux(FullyHierarchicalScheduler)• Providesasubscription-based

jobstatusAPI,eliminatingI/O

Fluxandthefullyhierarchicalmodelsimplifiesthesubmissionandtrackingofjobensembles

andthuseliminatestheneedfortheUQP’sI/O

• Studyconfiguration:allthreeschedulermodelsevaluatedona32nodeclusterwithasyntheticworkloadofdummyjobs

• Studyconfiguration:UQPrunswithSlurm andFluxevaluatedona16nodeclusterwithaworkloadofasingle-coreMonteCarloapplication[3]

FullyHierarchicalSchedulingUnderFlux• NewHPCschedulingmodelaimedataddressingnext-generationschedulingchallengesusingonecommonresourceandjobmanagementframeworkatbothsystemandapplicationlevels [1]

• Appliesadivide-and-conquerapproachtoscheduling,allowingforthedistributionofschedulingworkacrossanarbitrarilydeephierarchyofschedulers

UncertaintyQuantificationPipeline(UQP)

Slurm Slurm +Moab Flux

Scalability

Centralized

Scheduler

Nodes

A B C

DEF

Low

JobQueueA B C D E F…

Medium

LimitedHierarchical

GlobalSched

JobQueue

Nodes

Sched3Sched1 Sched2

AB

EF

CD

A B C D E F…

FullyHierarchical

High

GlobalSched

Nodes

A EC

Sched2Sched1

DB F

Sched1.1 Sched1.1 Sched2.1 Sched2.2

JobQueueA B C D E F…

• Accountsforroughly50.9millionCPUhourseachyearatLLNL

• Simplifiesperforminguncertaintyquantificationstudies

• Requiresrunninganensembleofsimulationscontaininganywherebetween1,000and100,000,000jobs

• ProvidesworkaroundsforexistingHPCschedulers’limitations• Workaroundsresultindecreasedjob

throughputandanI/Obottleneck

Training Data

Ensemble Generation

Ensemble Execution

Surrogate Model Construction

Sensitivity and UQ Analysis

Input parameters,run environments

High fidelitysurrogate(s)

UQ configuration,Simulation files

ThefullyhierarchicalschedulerhandlesallofthechallengesencounteredbytheUQP

ExampleUQPWorkflow

Recommended