nicholas-burke
Why static is bad!
[Figure: Hadoop, Pregel, and MPI clusters; shared cluster]
Today: static partitioning
Want dynamic sharing
Comparing Sharing Frameworks: Choice
• Choice of resources
  • Can a framework pick between all resources?
  • A predefined subset?
  • Or a randomly chosen subset?
• Why important?
  • Policies may need to be global, e.g. for locality
  • If you can preempt, you can get your preference
Comparing Sharing Frameworks: Interference
• Can frameworks try to use the same machines?
  • Can a framework pick between all resources?
• How to avoid this?
  • Offer resources to frameworks one at a time
  • Statically partition
  • Offer in parallel and arbitrate when a conflict arises
Comparing Sharing Frameworks: Granularity
• Allocation granularity
  • MPI tasks: gang-scheduled; a job can’t run until all slots are acquired
  • Hadoop: elastic; a job can start running once it has allocated a few slots
• Why important?
  • With gang scheduling, the framework will hoard slots until it gets all the slots it needs
  • The cluster may or may not be underutilized
• Cluster-wide behaviors
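The difference between gang-scheduled and elastic allocation can be illustrated with a minimal sketch. The `Job` class, its fields, and `offer_slots` are hypothetical names invented for this example; they do not come from MPI, Hadoop, or any scheduler's API.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    slots_needed: int    # total slots the job wants
    gang: bool           # True: all-or-nothing (MPI-style); False: elastic (Hadoop-style)
    slots_held: int = 0  # slots acquired so far

    def can_run(self) -> bool:
        """A gang job needs every slot; an elastic job can start with any slot."""
        if self.gang:
            return self.slots_held >= self.slots_needed
        return self.slots_held > 0


def offer_slots(job: Job, free: int) -> int:
    """Give the job as many free slots as it still needs; return the slots left over."""
    granted = min(free, job.slots_needed - job.slots_held)
    job.slots_held += granted
    return free - granted


mpi = Job("mpi", slots_needed=8, gang=True)
hadoop = Job("hadoop", slots_needed=8, gang=False)
offer_slots(mpi, 3)     # the gang job holds 3 slots but cannot run yet (hoarding)
offer_slots(hadoop, 3)  # the elastic job starts working with 3 slots
```

In this sketch the three slots held by the gang job do no work until the remaining five arrive, which is the hoarding behavior described above.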
Mesos
Other Benefits of Mesos
Run multiple instances of the same framework
»Isolate production and experimental jobs
»Run multiple versions of the framework concurrently
Build specialized frameworks targeting particular problem domains
»Better performance than general-purpose abstractions
Goals
High utilization of resources
Support diverse frameworks (current & future)
Scalability to 10,000s of nodes
Reliability in face of failures
Resulting design: Small microkernel-like core that pushes scheduling logic to frameworks
Design Elements
Fine-grained sharing:
»Allocation at the level of tasks within a job
»Improves utilization, latency, and data locality
Resource offers:
»Simple, scalable application-controlled scheduling mechanism
Element 1: Fine-Grained Sharing
[Figure: two panels, Coarse-Grained Sharing (HPC) and Fine-Grained Sharing (Mesos), each showing Frameworks 1, 2, and 3 over a storage system (e.g. HDFS)]
+ Improved utilization, responsiveness, data locality
Element 2: Resource Offers
Option: Global scheduler
»Frameworks express needs in a specification language, global scheduler matches them to resources
+ Can make optimal decisions
– Complex: language must support all framework needs
– Difficult to scale and to make robust
– Future frameworks may have unanticipated needs
Element 2: Resource Offers
Mesos: Resource offers
»Offer available resources to frameworks, let them pick which resources to use and which tasks to launch
+ Keeps Mesos simple, lets it support future frameworks
– Decentralized decisions might not be optimal
Mesos Architecture
[Figure: MPI job and Hadoop job, each with its own scheduler (MPI scheduler, Hadoop scheduler); Mesos master with allocation module; Mesos slaves running MPI executors and tasks. Annotations: "Pick framework to offer resources to", "Resource offer"]
Mesos Architecture
[Figure: architecture diagram (MPI and Hadoop schedulers, Mesos master with allocation module, Mesos slaves with MPI executors and tasks). Annotations: "Pick framework to offer resources to", "Resource offer"]
Resource offer = list of (node, availableResources)
E.g. { (node1, <2 CPUs, 4 GB>), (node2, <3 CPUs, 2 GB>) }
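As a rough illustration, the resource offer above can be written as a plain data structure, together with a framework-side scheduler that decides which tasks to launch on it. This is only a sketch: `Resources`, `pick_tasks`, and the first-fit policy are assumptions made for the example and are not the Mesos API.

```python
from typing import NamedTuple


class Resources(NamedTuple):
    cpus: int
    mem_gb: int


# The offer from the slide: a list of (node, availableResources).
offer = [("node1", Resources(2, 4)), ("node2", Resources(3, 2))]


def pick_tasks(offer, pending):
    """Hypothetical framework scheduler: place each pending task first-fit.

    pending is a list of (task_name, Resources needed).
    Returns (launches, unplaced): launches is a list of (task_name, node),
    unplaced is a list of task names that did not fit anywhere in the offer.
    """
    free = {node: res for node, res in offer}
    launches, unplaced = [], []
    for name, need in pending:
        for node, avail in free.items():
            if avail.cpus >= need.cpus and avail.mem_gb >= need.mem_gb:
                launches.append((name, node))
                # Subtract what the task uses from this node's share of the offer.
                free[node] = Resources(avail.cpus - need.cpus,
                                       avail.mem_gb - need.mem_gb)
                break
        else:
            unplaced.append(name)  # nothing in this offer can hold the task
    return launches, unplaced


pending = [("t1", Resources(2, 2)), ("t2", Resources(1, 3)), ("t3", Resources(2, 1))]
launches, unplaced = pick_tasks(offer, pending)
```

The point of the design is that the placement policy lives in the framework: a different framework could replace first-fit with a locality-aware policy without any change to the master.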
Mesos Architecture
[Figure: architecture diagram (MPI and Hadoop schedulers, Mesos master with allocation module, Mesos slaves with MPI and Hadoop executors and tasks). Annotations: "Pick framework to offer resources to", "Resource offer", "Framework-specific scheduling", "Launches and isolates executors"]
Drawbacks
• Poor fairness
  • Jobs with long tasks can dominate
  • There is NO preemption!
• Sticky slots
  • Jobs with higher priority can dominate a set of preferred slots
  • Mesos uses lottery scheduling: the probability of being offered a slot is proportional to the framework's priority
• Head-of-line blocking
  • Mesos offers resources to one framework at a time
  • Prevents frameworks from trying to use the same slots
  • Based on the assumption that scheduling decisions are quick
  • Mesos revokes offers if a scheduler takes too long
  • Essentially leads to a queue
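The lottery scheduling mentioned under sticky slots can be sketched as a weighted random draw, where each framework's chance of receiving the next offer is proportional to its priority. The function name `pick_framework` and the priority values are hypothetical; this is not the actual Mesos allocation module.

```python
import random


def pick_framework(priorities, rng=random):
    """Draw one framework name with probability proportional to its priority.

    priorities maps framework name -> positive weight (hypothetical values).
    rng is the random module or a random.Random instance.
    """
    names = list(priorities)
    weights = [priorities[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# With priorities 3 and 1, "hadoop" should receive about 75% of the offers.
rng = random.Random(0)
counts = {"hadoop": 0, "mpi": 0}
for _ in range(10000):
    counts[pick_framework({"hadoop": 3, "mpi": 1}, rng)] += 1
```

Because only one framework is drawn per offer, a slow scheduler holds up everyone behind it, which is the head-of-line blocking listed above.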
Omega
Omega
• Scales
  • Central layer only does optimistic conflict resolution
  • No head-of-line blocking
• Allows for flexible and evolvable scheduling
  • Framework can implement any arbitrary form of scheduling
  • Each framework has a global view
  • Frameworks can preempt each other
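One way to picture optimistic conflict resolution is a shared cell state that every scheduler can snapshot in full, plan against, and then try to commit to. The sketch below uses per-node version counters to detect conflicts; the `CellState` class and the versioning scheme are assumptions made for illustration, not a description of Omega's implementation.

```python
class CellState:
    """Shared cluster state that every scheduler can see in full (hypothetical model)."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)                # node -> free CPUs
        self.version = {node: 0 for node in free_cpus}  # node -> change counter

    def snapshot(self):
        """Return private copies of (free_cpus, version) for a scheduler to plan on."""
        return dict(self.free_cpus), dict(self.version)

    def commit(self, claims, seen_version):
        """Apply claims (node -> CPUs) only if no claimed node changed since the snapshot."""
        for node, cpus in claims.items():
            if self.version[node] != seen_version[node]:
                return False  # conflict: another scheduler committed here first
            if self.free_cpus[node] < cpus:
                return False  # claim does not fit
        for node, cpus in claims.items():
            self.free_cpus[node] -= cpus
            self.version[node] += 1
        return True


cell = CellState({"n1": 4})
_, a_seen = cell.snapshot()               # schedulers A and B plan in parallel
_, b_seen = cell.snapshot()
a_ok = cell.commit({"n1": 3}, a_seen)     # A commits first and succeeds
b_ok = cell.commit({"n1": 3}, b_seen)     # B planned on stale state and is rejected
b_free, b_seen = cell.snapshot()          # B retries against fresh state
b_retry_ok = cell.commit({"n1": b_free["n1"]}, b_seen)
```

No scheduler waits for another to finish planning; a conflict costs only a retry, which is why this approach avoids head-of-line blocking.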
Comparing Sharing Frameworks
• Choice of resources
• Interference
• Allocation Granularity
• Cluster-wide behaviors
Comparing Frameworks