Scheduling Jobs With Dependencies: New Applications, Classic Problems

Janardhan Kulkarni, Microsoft Research, Redmond.

31 July 2018, TTI, Chicago. TTIC Summer Workshop: Data Center Scheduling from Theory to Practice

Roadmap

Ø Which theory models are closer to data-center settings.

Ø Focus on algorithms: even complex algorithms can have algorithmic intuitions which are useful in practice.

Ø One example: one system heuristic and one complex provable algorithm (using LP hierarchies) that has good heuristic value.

Srikanth Kandula, Ratul Mahajan, Amar Phanishayee, Monia Ghobadi

Luleå FB Data Center, South of Arctic Circle

It is beautiful like this for 3 days…

cold, cold, place…

5% “as large as cities”

Efficiency Matters a Lot

“as large as cities”

Emphasis on Principled Algorithms

[Figure: cost versus time, comparing simple heuristics with theoretically sound algorithms.]

Simplicity is not everything!

How we Measure Efficiency

Ø Makespan: minimizing the maximum completion time among a set of jobs. Length of the schedule.

Ø Average (or total) flow-time (a.k.a. job completion-time)

• same as response time
• measures the time a job spends in a system

$F_j = C_j - r_j$, where $C_j$ is the completion time of job $j$ and $r_j$ is its release (arrival) time.

Throughput, energy, fairness, utilization, etc.
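
The two measures above can be computed directly from release and completion times. The following is a minimal sketch; the `Job` class and the sample numbers are illustrative assumptions, not data from the talk.

```python
from dataclasses import dataclass

@dataclass
class Job:
    release: int      # r_j: time the job arrives
    completion: int   # C_j: time the job finishes

def makespan(jobs):
    """Maximum completion time over all jobs (length of the schedule)."""
    return max(job.completion for job in jobs)

def flow_times(jobs):
    """Per-job flow time F_j = C_j - r_j."""
    return [job.completion - job.release for job in jobs]

# Hypothetical schedule outcome for three jobs.
jobs = [Job(release=0, completion=4),
        Job(release=1, completion=3),
        Job(release=2, completion=9)]

total_flow = sum(flow_times(jobs))
average_flow = total_flow / len(jobs)
```

Note that the job finishing last (completion 9) fixes the makespan, while every job contributes to the total flow-time, so the two objectives can favor different schedules.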

Challenges of Data Center Scheduling

Resources: Heterogeneous (FPGA+CPU, GPU+CPU). Multidimensional (CPU, memory, network).

Jobs: Complex dependencies: DAGs, Co-flows, etc.

Algorithms: Fast, simple, often online.

Rich theory with many nice algorithms when jobs have simple structures.

Scheduling on Heterogeneous Clusters

Why are clusters heterogeneous?

Ø Special purpose hardware

Ø Data locality

Ø Geographic location

Ø Privacy concerns

Scheduling on Heterogeneous Clusters

[Figure: clusters of different sizes.]

Modeling Heterogeneity

Jobs run faster on some clusters and slower on others.

Jobs arrive over time.

[Figure: matrix of processing times with one row per job and one column per machine; the entries of a single row differ widely across machines.]

Heterogeneous == “Unrelated Machines Scheduling”

Assign (match) jobs to clusters and schedule them to minimize a QoS objective.
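
As a rough illustration of the unrelated machines model, the sketch below stores one processing time per (job, machine) pair and assigns each job to the machine on which it would finish earliest. This greedy rule is only a simple baseline chosen for illustration, not one of the algorithms cited in the talk, and the matrix `p` is hypothetical.

```python
def greedy_assign(p):
    """p[j][i] = processing time of job j on machine i (unrelated machines).

    Jobs are taken in input order; each goes to the machine on which it
    would finish earliest given the current loads.
    Returns (assignment, makespan), where assignment[j] is a machine index.
    """
    num_machines = len(p[0])
    load = [0] * num_machines
    assignment = []
    for times in p:
        best = min(range(num_machines), key=lambda i: load[i] + times[i])
        load[best] += times[best]
        assignment.append(best)
    return assignment, max(load)

# Hypothetical instance: 4 jobs, 3 machines.
p = [
    [1, 15, 100],
    [66, 10, 5],
    [11, 5, 88],
    [100, 7, 13],
]
assignment, span = greedy_assign(p)
```

Here the objective is makespan; flow-time or energy objectives would need a different cost in the `min` step.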

Beautiful Algorithms For Unrelated Machines Scheduling Problems

Makespan, Flow-time, Energy:

LST’87, CGK’09, AGK’12, ST’89, AGK’12, KLS’10

Svensson’12, BK’15, IKMP’14, AAFPW’97, IKMP’14, P’07

KD’18, A’06

Offline, Online, Multidimensional, Clairvoyant, Non-Clairvoyant, Stochastic, Truthfulness…

Has led to the development of very nice ideas: use of vertex solutions and duality in the design of algorithms, configuration LPs, potential functions, connections to game theoretic ideas…

RESEARCH DIRECTION: Few machine types. Can we get better algorithms for some classic unrelated machines scheduling problems?

The plan

One Heuristic

GRAPHENE: Packing and Dependency-Aware Scheduling for Data-Parallel Clusters. OSDI 2016.

Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan Kulkarni.

Very general, works well in practice, as bad as any other algorithm on paper :-)

One Complex Theoretical Framework

Levey and Rothvoss ’16. Garg, Kulkarni, Li ’18. Garg, Kulkarni, Li ’18.

One of the biggest hammers in approximation algorithms: “Lift and Project”.

Very specific, provable, and quite complex.

A Directed Acyclic Graph (DAG) Scheduling Problem in Large Clusters

GRAPHENE: Packing and Dependency-Aware Scheduling for Data-Parallel Clusters. OSDI 2016.

Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan Kulkarni.

DAG Model Supported in Hadoop

Multidimensionality

Heterogeneity of clusters

Cluster Scheduling

Resources of a cluster: (1, 1, 1), with D types of resources.

A single job represented as a DAG; each node is a task.

Demand vector of a task, for example (1, 0, …, 1/2), (1/2, 1/2, …, 1/2), (1/4, 1, …, 1/10).

Processing length (duration) of a task.

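Cluster Scheduling: Minimize Makespan

A single job represented as a DAG.

Tasks, written as (demand vector), duration: (1,0), 2; (0,1), 1; (1,1), 1; (1,1), 1; (0,1), 1.

Cluster: (1,1)

[Figure: the DAG and its schedule on a timeline with slots 1 to 7.]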
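
To make the model concrete, here is a minimal sketch of a non-idling greedy scheduler for one DAG on a cluster with capacity (1, 1). The task demands and durations are the ones listed above; the task names and the precedence edges in `preds` are assumptions made for illustration, since the edges of the slide's DAG are not recoverable from the text.

```python
def greedy_schedule(tasks, preds, capacity):
    """tasks: name -> (demand vector, duration); preds: name -> set of predecessors.

    At each integer time step, start every ready task (all predecessors
    finished) whose demand still fits in the free capacity.
    Assumes the graph is acyclic and each task fits on an empty cluster.
    Returns (start times, makespan).
    """
    start, finish = {}, {}
    t = 0
    while len(start) < len(tasks):
        # Resources held by tasks that are still running at time t.
        used = [0] * len(capacity)
        for name in start:
            if finish[name] > t:
                for r, d in enumerate(tasks[name][0]):
                    used[r] += d
        for name, (demand, duration) in tasks.items():
            if name in start:
                continue
            ready = all(p in finish and finish[p] <= t
                        for p in preds.get(name, ()))
            fits = all(used[r] + d <= capacity[r]
                       for r, d in enumerate(demand))
            if ready and fits:
                start[name], finish[name] = t, t + duration
                for r, d in enumerate(demand):
                    used[r] += d
        t += 1
    return start, max(finish.values())

tasks = {
    "A": ((1, 0), 2),
    "B": ((0, 1), 1),
    "C": ((1, 1), 1),
    "D": ((1, 1), 1),
    "E": ((0, 1), 1),
}
preds = {"C": {"B"}, "D": {"A", "C"}, "E": {"B"}}  # hypothetical edges
start, span = greedy_schedule(tasks, preds, capacity=(1, 1))
```

With these assumed edges the greedy schedule finishes at time 4, which matches the total demand of 4 on the first resource, so no schedule can do better on this instance.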

Is There a Good Algorithm?

Theorem (Bansal and Khot ’09). It is unlikely (UGC-hard) that a polynomial time algorithm can achieve better than a D-approximation to the DAG scheduling problem. This holds even if all tasks of the DAG 1) have the same length, 2) require exactly one resource.

Optimal algorithm: do a greedy schedule respecting precedence constraints. At each step at least one resource is used, so congestion for that resource decreases.

Ø Any non-idling algorithm is equally good or equally bad!

Not a useful intuition for system designers.

When did System Designers Care for Lowerbounds?

GRAPHENE: Packing and Dependency-Aware Scheduling for Data-Parallel Clusters. OSDI 2016.

Robert Grandl, Srikanth Kandula, Sriram Rao, Aditya Akella, Janardhan Kulkarni.

Ø Could find almost optimal solutions on MS datasets.

Ø Improves makespan by at least 30% compared to simple greedy heuristics.

Intuition of Graphene

What do greedy algorithms (List-Scheduling, Critical Path, etc.) miss?

“pathologically bad schedules in today’s approaches mostly arise due to two reasons: (a) long-running tasks have no other work to overlap with them, which reduces parallelism, and (b) the tasks that are runnable do not pack well with each other, which increases resource fragmentation.”

Intuition of Graphene

Main Steps

Ø Our approach is to identify the potentially troublesome tasks, such as those that run for a very long time or are hard to pack.

Ø Place the troublesome tasks first onto a virtual resource-time space. This space would have d+1 dimensions when tasks require d resources; the last dimension being time.

Ø Our intuition is that placing the troublesome tasks first leads to a good schedule since the remaining tasks can be placed into resultant holes in this space.

Can we formalize this intuition?
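
The first step above, picking out troublesome tasks, can be illustrated with a deliberately crude sketch. The two thresholds and the "largest single demand" proxy for "hard to pack" are assumptions made here for illustration; the slides do not say how Graphene scores tasks, and this is not its actual procedure.

```python
def placement_order(tasks, long_threshold, demand_threshold):
    """tasks: name -> (demand vector, duration).

    A task is flagged as troublesome if it runs long or if any single
    resource demand is large (a rough stand-in for 'hard to pack').
    Returns an order that places troublesome tasks first, longest first,
    followed by the remaining tasks that fill the holes.
    """
    troublesome, rest = [], []
    for name, (demand, duration) in tasks.items():
        if duration >= long_threshold or max(demand) >= demand_threshold:
            troublesome.append(name)
        else:
            rest.append(name)
    troublesome.sort(key=lambda name: -tasks[name][1])
    return troublesome + rest

# Hypothetical tasks with two resource dimensions.
tasks = {
    "a": ((0.2, 0.1), 1),
    "b": ((0.9, 0.3), 2),   # large demand on the first resource
    "c": ((0.1, 0.1), 8),   # long-running
    "d": ((0.3, 0.2), 1),
}
order = placement_order(tasks, long_threshold=5, demand_threshold=0.8)
```

The resulting order could then drive a packing routine over the (d+1)-dimensional resource-time space described above.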

A $(1+\epsilon)$-Approximation for Makespan Scheduling with Precedence Constraints using LP Hierarchies.

Levey and Rothvoss ’16

Identical Machines Scheduling

(Special case of DAG scheduling)

A single DAG. Each task needs to be scheduled on exactly one machine. Each task needs 1 unit of CPU.

m identical machines (or CPUs)

Minimize Makespan

[Figure: example DAG containing a chain of length 4.]

Identical Machines Scheduling

Theorem (Graham 1960). Greedy or List-Scheduling is a 2-approximation for minimizing makespan.

Call a time slot good if all m machines are busy in it and bad otherwise. With n unit-length tasks:

$$\text{Makespan} = \#\text{bad slots} + \#\text{good slots} \le \text{length of longest chain} + \frac{n}{m} \le \mathrm{OPT} + \mathrm{OPT}$$

Ø Optimal theoretically. But conveys very little information in practice.

Ø Does not work well in practice when there is more than one resource type.
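
The bad-slot and good-slot accounting can be checked on a small instance. The sketch below implements list scheduling for unit-length tasks on m identical machines; the example DAG (a chain of length 4 plus a few short branches) and the task names are assumptions for illustration.

```python
from collections import Counter

def list_schedule(preds, m):
    """preds: task -> set of predecessors (unit-length tasks); m machines.

    In each slot, run up to m tasks whose predecessors all finished in
    earlier slots. Returns task -> slot (1-indexed).
    """
    slot_of = {}
    t = 0
    while len(slot_of) < len(preds):
        t += 1
        ready = [v for v in preds
                 if v not in slot_of
                 and all(slot_of.get(u, t) < t for u in preds[v])]
        for v in ready[:m]:
            slot_of[v] = t
    return slot_of

def longest_chain(preds):
    """Number of tasks on the longest precedence chain."""
    depth = {}
    def chain_to(v):
        if v not in depth:
            depth[v] = 1 + max((chain_to(u) for u in preds[v]), default=0)
        return depth[v]
    return max(chain_to(v) for v in preds)

# Hypothetical DAG: chain a -> b -> c -> d, plus e -> g and isolated f, h.
preds = {"a": set(), "b": {"a"}, "c": {"b"}, "d": {"c"},
         "e": set(), "f": set(), "g": {"e"}, "h": set()}
m = 3
slot_of = list_schedule(preds, m)
makespan = max(slot_of.values())
per_slot = Counter(slot_of.values())
bad = sum(1 for count in per_slot.values() if count < m)   # some machine idle
good = makespan - bad                                       # all machines busy
```

On this instance the schedule has 2 good slots and 2 bad slots, and the bad slots are exactly the ones in which only the tail of the long chain is runnable.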

Identical Machines Scheduling

Theorem (Levey and Rothvoss ’16). There is a quasi-polynomial time $(1+\epsilon)$-approximation for minimizing makespan when jobs have unit lengths, when the number of machines is a constant.

Garg ’17 made it strictly quasi-polynomial time.

Theorem (Kulkarni, Li ’18). There is a quasi-polynomial time $(1+\epsilon)$-approximation for minimizing makespan when jobs have arbitrary lengths, when the number of machines is a constant. The algorithm schedules each job on a single machine and may preempt jobs within a machine.

Theorem (Garg, Kulkarni, Li ’18). There is a polynomial time optimal $(2+\epsilon)$-approximation for minimizing weighted completion time of jobs, when number of machines and job sizes are uniform.

Crucial Observation

In the list-scheduling bound, Makespan = (bad slots) + (good slots), where the number of good slots is at most $n/m \le \mathrm{OPT}$ and the number of bad slots is at most the length of the longest chain. If the longest chain has length at most $\epsilon \cdot \mathrm{OPT}$, then

$$\text{Makespan} \le \epsilon \cdot \mathrm{OPT} + \mathrm{OPT} = (1+\epsilon) \cdot \mathrm{OPT}$$

Troublesome tasks: the tasks on long chains.

How to schedule troublesome tasks?

Framework

[Figure: a time interval split at T0, T1, T2, T3; the top tasks span the whole interval and one set of bottom tasks sits in each sub-interval.]

Partition the tasks into sets of bottom tasks and a single set of top tasks. For each set of bottom tasks we find a sub-interval where they should be scheduled.

Then do a recursive scheduling of bottom tasks.

Precedence constraints across bottom tasks are automatically satisfied.

Precedence constraints going from bottom to top tasks are loose.

For every task $j$ in the set of top tasks we have a window $[r_j, d_j]$ (release time and deadline), for example $[T_2, T_3]$, based on the tentative assignment of bottom jobs.

There is enough space to schedule top tasks if there are no precedence constraints between top tasks.

EDF will schedule all top tasks in the empty space but may violate the precedence constraints between top tasks.
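
The EDF (earliest deadline first) step for top tasks can be sketched as follows, assuming unit-length tasks, integer slots, and m machines that are entirely free. In the framework the free space is whatever the bottom tasks leave over, so the uniform capacity here is a simplification, and the task names and windows are hypothetical. Precedence among top tasks is ignored, as stated above.

```python
import heapq

def edf(windows, m):
    """windows: task -> (r, d); the unit task may run in any slot t with r <= t <= d.

    In each slot, run the released tasks with the earliest deadlines,
    at most m of them. Returns (task -> slot, tasks that missed their deadline).
    """
    by_release = sorted(windows, key=lambda j: windows[j][0])
    schedule, missed, heap = {}, [], []
    i, t = 0, 0
    while i < len(by_release) or heap:
        if not heap:
            # Nothing pending: jump ahead to the next release time.
            t = max(t, windows[by_release[i]][0])
        while i < len(by_release) and windows[by_release[i]][0] <= t:
            j = by_release[i]
            heapq.heappush(heap, (windows[j][1], j))
            i += 1
        used = 0
        while heap and used < m:
            deadline, j = heapq.heappop(heap)
            if deadline < t:
                missed.append(j)      # window already closed
            else:
                schedule[j] = t
                used += 1
        t += 1
    return schedule, missed

schedule, missed = edf({"a": (0, 0), "b": (0, 1), "c": (0, 1), "d": (1, 2)}, m=2)
tight_schedule, tight_missed = edf({"x": (0, 0), "y": (0, 0)}, m=1)
```

The second call shows the failure mode: two tasks share a one-slot window on a single machine, so one of them necessarily misses its deadline.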

Framework

[Figure: time interval split at T0, T1, T2, T3 with a set of bottom tasks in each sub-interval; one top task is confined to [T2, T3].]

Precedence constraints going from bottom to top tasks are loose.

The chain length among top tasks is very small.

The algorithm has recognized a crude schedule for troublesome tasks. That’s why the chain length among top tasks is small.

LR’16 Framework

[Figure: time interval split at T0, T1, T2, T3 with the top tasks above and a set of bottom tasks in each sub-interval; one top task is confined to [T2, T3].]

Precedence constraints going from bottom to top tasks are loose.

The chain length among top tasks is very small.

For every task $j$ in the set of top tasks we have a window $[r_j, d_j]$.

There is enough space to schedule top tasks if there are no precedence constraints between top tasks.

EDF will schedule all (except a few) top tasks in the empty space but may violate the precedence constraints between top tasks.

How to partition the DAG?

1. Precedence constraints between bottom tasks should be implied.
2. The precedence constraints between top and bottom tasks are loose.
3. The chain length among top tasks is small.

Linear Programming Formulation

Binary search the optimal makespan as T.

Every task $j$ is scheduled:

$$\sum_{t=1}^{T} x_{jt} = 1$$

Every time slot $t$ has at most $m$ jobs:

$$\sum_{j} x_{jt} \le m$$

Every precedence relation $i \to j$ is satisfied at each time step $t$:

$$\sum_{t' < t} x_{it'} \ge \sum_{t' \le t} x_{jt'}$$

All variables are non-negative: $x_{jt} \ge 0$.

LP Cheats…

LP can schedule a job fractionally in a time slot.

Optimal makespan is 4 but LP can complete in 3 time slots.

[Figure: a DAG and its fractional LP schedule over time, with fractions 2/3, 1/3, 2/3.]
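
The gap can be reproduced on a small instance with exact arithmetic. The sketch below checks a candidate fractional solution against the constraints of a time-indexed LP (every task fully scheduled, at most m units per slot, and for each edge the predecessor's completed fraction before slot t at least the successor's fraction through slot t), then brute-forces integral schedules. The instance, three tasks that must all precede three other tasks on m = 2 machines, is an assumption chosen because it yields the fractions 2/3, 1/3, 2/3; the DAG actually drawn on the slide is not recoverable from the text.

```python
from fractions import Fraction as F
from itertools import product

def lp_feasible(x, tasks, edges, m, T):
    """x[(j, t)] = fraction of unit task j run in slot t, for t in 1..T."""
    def val(j, t):
        return x.get((j, t), F(0))
    if any(v < 0 for v in x.values()):
        return False
    if any(sum(val(j, t) for t in range(1, T + 1)) != 1 for j in tasks):
        return False
    if any(sum(val(j, t) for j in tasks) > m for t in range(1, T + 1)):
        return False
    for i, j in edges:                       # i must precede j
        for t in range(1, T + 1):
            done_i = sum(val(i, s) for s in range(1, t))       # before slot t
            done_j = sum(val(j, s) for s in range(1, t + 1))   # through slot t
            if done_i < done_j:
                return False
    return True

def integral_feasible(tasks, edges, m, T):
    """Brute force: is there an integral schedule with makespan at most T?"""
    for slots in product(range(1, T + 1), repeat=len(tasks)):
        slot = dict(zip(tasks, slots))
        fits = all(slots.count(t) <= m for t in range(1, T + 1))
        ordered = all(slot[i] < slot[j] for i, j in edges)
        if fits and ordered:
            return True
    return False

tasks = ["a1", "a2", "a3", "b1", "b2", "b3"]
edges = [(a, b) for a in tasks[:3] for b in tasks[3:]]
m = 2
x = {}
for a in tasks[:3]:
    x[(a, 1)], x[(a, 2)] = F(2, 3), F(1, 3)
for b in tasks[3:]:
    x[(b, 2)], x[(b, 3)] = F(1, 3), F(2, 3)
```

On this instance the fractional solution is feasible for T = 3, while no integral schedule fits in fewer than 4 slots: the three predecessor tasks need two slots on two machines, and the three successors need two more.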

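Interval of a task

Consider the LP solution. The interval of a task is the smallest interval that contains the fractional schedule of the task.

[Figure: a task scheduled with fractions 1/10, 1/10, 3/10, 5/10 across time slots; its interval is [t1, t2].]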

What LP gives?

An interval for each task.

We use these intervals to partition the DAG into top and bottom tasks.

Building Binary Tree

LP schedules all tasks in $[0, T]$.

[Figure: binary tree of depth $\log T$ with root $[0, T]$, children $[0, T/2]$ and $[T/2 + 1, T]$, then $[0, T/4]$, $[T/4 + 1, T/2]$, and so on.]

Assign each task to the smallest interval node in the tree that fully contains it.
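
The assignment rule above can be sketched in a few lines. The functions below first read a task's interval off a fractional solution and then walk down the tree over [0, T], splitting each node [lo, hi] into [lo, mid] and [mid + 1, hi] as on the slide. The function names and the sample values are illustrative assumptions.

```python
def task_interval(x, j):
    """Smallest interval [t1, t2] containing every slot t with x[(j, t)] > 0."""
    support = [t for (task, t), value in x.items() if task == j and value > 0]
    return min(support), max(support)

def smallest_node(interval, lo, hi):
    """Descend the binary tree over [lo, hi].

    Returns the smallest node interval that fully contains `interval`,
    together with its depth (the root has depth 0).
    """
    a, b = interval
    depth = 0
    while lo < hi:
        mid = (lo + hi) // 2
        if b <= mid:
            hi = mid              # fits in the left child [lo, mid]
        elif a >= mid + 1:
            lo = mid + 1          # fits in the right child [mid + 1, hi]
        else:
            break                 # straddles the split point: stays at this node
        depth += 1
    return (lo, hi), depth

# Hypothetical fractional schedule of one task over slots 4 and 5, with T = 7.
x = {("j", 4): 0.5, ("j", 5): 0.5}
node, depth = smallest_node(task_interval(x, "j"), 0, 7)
```

A task whose interval straddles the midpoint of [0, T] stays at the root no matter how short the interval is, which is why depth in the tree, rather than interval length alone, decides where a task lands.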

Defining Top and Bottom Tasks

[Figure: the binary tree rooted at $[0, T]$ with children $[0, T/2]$ and $[T/2 + 1, T]$. The top $(\log \log T)^2$ levels hold the top tasks; the next $\log \log T$ levels are marked “Throw Them Away!!”; the subtrees below hold the sets of bottom tasks.]

1. Precedence constraints between bottom tasks should be implied.
2. The precedence constraints between top and bottom tasks are loose.
3. The chain length among top tasks is small.

[Figure: a time interval marked 0, T1, T2, T3, T4, with top tasks above and sets of bottom tasks below; one top task is confined to [T2, T3].]

Precedence constraints going from bottom to top tasks are loose.

Every top task can lose one interval to the left and one interval to the right in terms of the space in which it should be scheduled. But bottom intervals are tiny compared to top, so this is not a big loss.

1. Precedence constraints between bottom tasks should be implied.
2. The precedence constraints between top and bottom tasks are loose.
3. The chain length among top tasks is small.

Lift and Project Method (LP Hierarchies)

[Figure: a spectrum from the original LP to an LP in which all the variables are integral. The axis is labeled "Dimensions": the number S of variables in the LP that you want integral.]

Running time increases by a factor of n for each such variable: $O(n^S)$.

A systematic way of placing troublesome tasks!

Lift and Project Method (LP Hierarchies)

“Conditioning”

Touch a variable, and it becomes integral!

[Figure: a task with fractional schedule 1/10, 1/10, 3/10, 5/10 over [t1, t2]; after conditioning it is scheduled with value 10/10 in a single slot.]

The LP solution changes in such a way that, for every other task, the interval in which it is scheduled in the new solution only shrinks.

I have a better understanding of where this task got scheduled.

Reducing Chain Length of Top Tasks

[Figure: the tree node $[0, T]$ with children $[0, T/2]$ and $[T/2 + 1, T]$; a timeline from 0 to T showing a segment of length $\epsilon T$ and the slots where $x_{jt} > 0$.]

The interval is of length T. We will make sure that there is no chain of length $\epsilon T$ assigned to this interval.

How many conditionings are required? $m/\epsilon$.

Now recall that the number of intervals in top tasks is $2^{(\log \log T)^2} \le (\log T)^{\log \log T}$.

Total number of conditionings: $O(m/\epsilon \cdot (\log T)^{\log \log T})$. This determines the running time.
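
The count above can be verified in one line. The derivation below assumes logarithms are base 2 (the slides do not state the base) and reads $m/\epsilon$ as the number of conditionings per top interval, which is an interpretation of the slide rather than something it states outright.

```latex
\[
2^{(\log\log T)^2}
  = \left(2^{\log\log T}\right)^{\log\log T}
  = (\log T)^{\log\log T},
\qquad
\underbrace{\frac{m}{\epsilon}}_{\text{conditionings per interval}}
\cdot
\underbrace{(\log T)^{\log\log T}}_{\text{top intervals}}
  = O\!\left(\frac{m}{\epsilon}\,(\log T)^{\log\log T}\right).
\]
```

Since each conditioned variable multiplies the running time by a factor of n, this count appears in the exponent of n.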

Theorem (Garg, Kulkarni, Li ’18).

There is a quasi-polynomial time $(1+\epsilon)$-approximation for minimizing makespan when jobs have arbitrary lengths, when the number of machines is a constant. The algorithm schedules each job on a single machine and may preempt jobs within a machine.

There is a polynomial time optimal $(2+\epsilon)$-approximation for minimizing weighted completion time of jobs, when number of machines and job sizes are uniform.

More sophisticated use of conditioning and new algorithms for scheduling top tasks.

Big Picture

Intuition of Graphene

Ø Our approach is to identify the potentially troublesome tasks, such as those that run for a very long time or are hard to pack.

Ø Place the troublesome tasks first onto a virtual resource-time space. This space would have d+1 dimensions when tasks require d resources; the last dimension being time.

Ø Our intuition is that placing the troublesome tasks first leads to a good schedule since the remaining tasks can be placed into resultant holes in this space.

Lift and Project Algorithms

Ø Using Lift and Project to figure out placing long tasks. Is there a simple, say DP, approach to it?

Ø Can we use LP support for placing tasks?

Ø Can recursion help in the Graphene setting?

Identical Machines Scheduling and Training Neural Networks

PipeDream: Fast and Efficient Pipeline Parallel DNN Training. Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, Phil Gibbons.

Training Deep Learning Models

Ø Large fraction of the data center workloads for many companies.

Ø Improving training time is considered very important.

Ø DAGs are good abstractions of DNN training computations.

Ø Connections to DAG scheduling and communication delay problems.

Two Paradigms

Data Parallelism

Model Parallelism

Model Parallelism

Ø Schedule the layers among a set of machines. Typically identical.

Ø Or at most 2 types: CPU+FPGA, CPU+GPUs, etc.

Ø There is communication between layers. Communication cost is crucial.

These problems are quite similar to scheduling with communication delays, when there are precedence constraints (PY’90, VLL’90, MH’95, HLV’94). Very poorly understood.

Good scheduling has the same effect as caching!

Zhicheng Yin, Jin Sun, Ming Li, Jaliya Ekanayake, Haibo Lin, Marc Friedman, José A. Blakeley, Clemens A. Szyperski, Nikhil R. Devanur. Bubble Execution: Resource-aware Reliable Analytics at Cloud Scale. PVLDB 11(7).

PipeDream: Fast and Efficient Pipeline Parallel DNN Training. Aaron Harlap, Deepak Narayanan, Amar Phanishayee, Vivek Seshadri, Nikhil Devanur, Greg Ganger, Phil Gibbons.

Summary: Data Center Scheduling

Resources: Heterogeneous (FPGA+CPU, GPU+CPU). Multidimensional (CPU, memory, network).

Jobs: Complex dependencies: DAGs, Co-flows, etc.

Algorithms: Fast, simple, often online.

Ø Often hard in the worst case. What’s the right model?

Ø Understand DAGs that arise in practice. Say DNNs.

Ø What are the high-level algorithmic intuitions?
