Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Schedulers’LimitationsWorkload System’s
Workaround SideEffectsMaxnumberofjobs Throttlesubmissions DecreasedjobthroughputLimitedjob throughput Aggregatejobs Increased workloadruntimeLackofjob/ensemblestatus &controlAPI
Trackindividual job’sstatus throughfiles I/Obottleneck
Lackofprogrammablefailuredetection Inspect failuresmanually Unnecessary job
resubmissions
FullyHierarchicalScheduling:PavingtheWaytoExascale Workloads
Motivation• EmergingHPCworkloadsrepresentanorderofmagnitudeincreaseinbothscaleandcomplexity;yetbatchschedulingremainsstuckinthedecades-old,centralizedschedulingmodel
Thefullyhierarchicalschedulingmodelanditsimplementation,Flux,providegeneralsolutionstotheselimitations
FutureWork
FromtheTopTenExascale ResearchChallenges.DOEASCACSubcommitteeReport.2014.
ReferencesandAcknowledgements[1]D.Ahn,et.al.Flux:ANext-generationResourceManagementFrameworkforLargeHPCCenters.InICCPW’14.[2]J.Gyllenhaal,et.al.EnablingHighJobThroughputforUncertaintyQuantificationonBG/Q.InScicomP’14.[3]T.Dahlgren,et.al.ScalingUncertaintyQuantificationStudiestoMillionsofJobs.InSC’15.• TheauthorsacknowledgetheadviceandsupportfromtheFluxteamandothersatLLNL:TomScogland,JimGarlick,MarkGrondona,BeckySpringmeyer,ChrisMorrone,andAlChu.
ThisworkwasperformedundertheauspicesoftheU.S.DepartmentofEnergybyLawrenceLivermoreNationalLaboratoryunderContractDE-AC52-07NA27344.National Science FoundationCCF#1318445.ThisresearchwassupportedbytheExascale ComputingProject:17-SC-20-SC
StephenHerbein1,2,Tapasya Patki2,DongH.Ahn2,DonLipari2,TamaraDahlgren2,DavidDomyancic2,MichelaTaufer1
1UniversityofDelaware,2LawrenceLivermoreNationalLaboratory
LLNL-POST-735166
Schedulers’LimitationsonNumberofJobsCentralizedModel• Exhaustslocalresourceswhenhandlinglargenumbersofjobs[2]
FullyHierarchicalModel• Distributeslocalresourcerequirementsacrossmultipleschedulers
• HPCbatchschedulershaveseverallimitationswithrespecttotheseemergingworkloads,whichhas ledtoaproliferationofworkflowsystemsthatprovidespecializedworkarounds[2,3]
• IntegrateFlux’sjob/ensemblestatus&controlAPIintotheUQPtosimplifythesubmission/trackingofthejobensembleswhilealsoeliminatingtheI/Obottleneck
• DevelopaprogrammablefailuredetectionmechanismwithinFluxtoreduceunnecessaryresubmissionsandsimplifyerrorhandlingforusers
“Itiswidelyexpectedthatrigorousuncertaintyquantificationoverhigh-dimensionalinputspaceswillplayacrucialroleinenablingextreme-scalescience.Indeed,athousand-foldincreaseincomputingpowerwouldfacilitateorders-of-magnitudemoresimulation realizations”
Movingfromthecentralizedtothefullyhierarchicalmodelincreasesthescheduler’sjobscalabilityby133x
CaseStudy:SyntheticStressTest CaseStudy:UQPWorkload
H H H H H H H H HH
H
H
H H H H H H H H HH
H
H
H H H H H H H H HH
H
H
1 2 4 8 16 32 64 128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
1
10
100
1000
10000
H node
J J JJ J J J J J
J
J
J
J J J J J J J JJJ
J
J
J J J J J J J J JJ
J
J
1 2 4 8 16 32 64 128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
1
10
100
1000
10000
J allocation
B B B B B B B BBB
BB
B B B B B B BBBB
B
B
B B B B B B B BBB
BB
1 2 4 8 16 32 64 128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576
1
10
100
1000
10000
Mak
espa
n (s
ec)
B coreCentricity:
Number of jobs (total)
133x
Lowerisbetter
Largerisbetter
SchedulingModel
Wor
kloa
d R
untim
e (s
ec)
Schedulers’LimitationsonJobThroughputCentralizedModel• TaskswithinanaggregatedUQPjobarerunserially,increasingtheworkload’sruntime
FullyHierarchicalModel• Aggregatedjobismanagedbyitsownfull-featuredscheduler,allowingtaskstoberunconcurrently
MovingfromthecentralizedschedulerSlurmtoafullyhierarchicalschedulerFluxresultsina37%fasterworkloadruntime
Lowerisbetter 37%
Schedulers’LimitedJobEnsembleSupportCentralizedModel• Submitandtrackeachjobindividually
FullyHierarchicalModel• SubmithierarchiesofjobsandtrackthematvariablelevelsofgranularitythroughanAPI
UQPStartupJobSubmissionFileCreationFileAccess
Non-I/O
RuntimeStages
Slurm(CentralizedScheduler)• LimitedAPIforjobstatus• UQPtracksjobstatesthrough
files,creatinganI/ObottleneckFlux(FullyHierarchicalScheduler)• Providesasubscription-based
jobstatusAPI,eliminatingI/O
Fluxandthefullyhierarchicalmodelsimplifiesthesubmissionandtrackingofjobensembles
andthuseliminatestheneedfortheUQP’sI/O
• Studyconfiguration:allthreeschedulermodelsevaluatedona32nodeclusterwithasyntheticworkloadofdummyjobs
• Studyconfiguration:UQPrunswithSlurm andFluxevaluatedona16nodeclusterwithaworkloadofasingle-coreMonteCarloapplication[3]
FullyHierarchicalSchedulingUnderFlux• NewHPCschedulingmodelaimedataddressingnext-generationschedulingchallengesusingonecommonresourceandjobmanagementframeworkatbothsystemandapplicationlevels [1]
• Appliesadivide-and-conquerapproachtoscheduling,allowingforthedistributionofschedulingworkacrossanarbitrarilydeephierarchyofschedulers
UncertaintyQuantificationPipeline(UQP)
Slurm Slurm +Moab Flux
Scalability
Centralized
Scheduler
Nodes
A B C
DEF
Low
JobQueueA B C D E F…
Medium
LimitedHierarchical
GlobalSched
JobQueue
Nodes
Sched3Sched1 Sched2
AB
EF
CD
A B C D E F…
FullyHierarchical
High
GlobalSched
Nodes
A EC
Sched2Sched1
DB F
Sched1.1 Sched1.1 Sched2.1 Sched2.2
JobQueueA B C D E F…
• Accountsforroughly50.9millionCPUhourseachyearatLLNL
• Simplifiesperforminguncertaintyquantificationstudies
• Requiresrunninganensembleofsimulationscontaininganywherebetween1,000and100,000,000jobs
• ProvidesworkaroundsforexistingHPCschedulers’limitations• Workaroundsresultindecreasedjob
throughputandanI/Obottleneck
Training Data
Ensemble Generation
Ensemble Execution
Surrogate Model Construction
Sensitivity and UQ Analysis
Input parameters,run environments
High fidelitysurrogate(s)
UQ configuration,Simulation files
ThefullyhierarchicalschedulerhandlesallofthechallengesencounteredbytheUQP
ExampleUQPWorkflow