Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
A Hybrid Scheduling Approach for
Scalable Heterogeneous Hadoop Systems
Authors:
Aysan Rasooli
Douglas G. Down
McMaster University
November 2012
Agenda
McMaster University 2
• Introducing the Hadoop System
• Heterogeneity and Scalability in Hadoop
• Performance Issues of Existing Hadoop Schedulers
• Proposed Hybrid Scheduling System
• Evaluation
• Conclusion
MapReduce
McMaster University 3
Hadoop System
McMaster University 4
Heterogeneity and Scalability in Hadoop
McMaster University 5
• Cluster
• Workload
• User
Hadoop Schedulers
McMaster University 6
• FIFO
• Fair Sharing
• COSHH
Fair Sharing
McMaster University 7
(Zaharia et al., 2010)
• Group jobs into “pools”
• Assign each pool a guaranteed minimum share
• Divide excess capacity evenly between pools
J1
Job Queue
J2
R4
R3
R2
R1
Fair Sharing
McMaster University 8
• Goal: fast response times for small jobs, guaranteed service
levels for long jobs
• Considers Minimum Share satisfaction, Fairness
Drawbacks:
• Does not take into account locality
• Does not take into account heterogeneity
Classes, Suggested Classes for all
Resources
Execution Time of the New Job on
all Resources
Hadoop System
Task Scheduling Process
Queuing Process
Routing Process
Heartbeat New Job
Selected Jobs
Assigned Tasks
COSHH Scheduler
Hadoop System
Task Scheduling Process
Queuing Process
Routing Process
• Estimation of Job execution time across all resources
• Classify the jobs
• Calculate the best set of suggested job classes for each resource
• Select a job for the current free resource using suggested jobs of the Queuing Process
• Consider the fairness and the minimum share satisfaction in the system
COSHH Scheduler
McMaster University 10
• Considering the heterogeneity in the Hadoop system
• Improves Mean Completion Time
• Considers:
• Minimum Share Satisfaction
• Fairness
• Locality
Performance Issues of Existing Schedulers
McMaster University 11
Problem I. Small Jobs Starvation
FIFO :
J1
Job Queue
J2
R4
R3
R2
R1
Performance Issues of Existing Schedulers
McMaster University 12
Problem II. Sticky Slots
Fair Sharing:
R4
R3
R2
R1
Job Fair
Share Running
Tasks
Job 1 2 2
Job 2 2 2
Job Fair
Share Running
Tasks
Job 1 2 1
Job 2 2 2
Performance Issues of Existing Schedulers
McMaster University 13
Problem III. Resource and Job Mismatch
Storage
Unit
slice 1
slice 2
slice 3
Computation
Unit
slot 1
slot 2
slot 3
slot 4
slot 5
R1
Storage
Unit
slice 1
slice 2
slice 3
slice 4
slice 5
slice 6
Computation
Unit
slot 1
slot 2
R2
File 1:
File 2:
J2
Job Queue
J1
Storage
Unit
slice 1
slice 2
slice 3
Computation
Unit
slot 1
slot 2
slot 3
slot 4
slot 5
Storage
Unit
slice 1
slice 2
slice 3
slice 4
slice 5
slice 6
Computation
Unit
slot 1
slot 2
Problem I. Small Jobs Starvation
McMaster University 14
FIFO :
Problem II. Sticky Slots
McMaster University 15
Fair Sharing :
Problem III. Resource and Job Mismatch
McMaster University 16
COSHH:
Performance Issues of Existing Schedulers
McMaster University 17
Experimental Environment
McMaster University 18
Real Hadoop Workloads
McMaster University 19
(Chen et al., 2011)
Scalability Analysis- Results
Job Number Scalability
McMaster University 20
Scalability Analysis- Results
Resource Number Scalability
McMaster University 21
Scalability Analysis- Hybrid Scheduler
McMaster University 22
Scalability Analysis- Hybrid Scheduler
Job Number Scalability
McMaster University 23
Scalability Analysis- Hybrid Scheduler
Resource Number Scalability
McMaster University 24
Conclusion
McMaster University 25
• Performance Issues of Hadoop Schedulers:
• Small Jobs Starvation
• Sticky Slots
• Resource and Jobs Mismatch
• Propose a Hybrid Hadoop Scheduler
Thanks
McMaster University