Hadoop Summit
Hadoop-2 @ ebay
Mayank Bansal
Agenda
• Who we are?
• Background of Hadoop and Hadoop at ebay
• What are the challenges
• What we achieved using Hadoop-2
Who I am
• Principal Engineer @ ebay
• Apache Hadoop Committer
• Apache Oozie PMC and Committer
• Current
  • Leading Hadoop core development for YARN and MapReduce @ ebay
• Past
  • Working on schedulers / resource managers
  • Working on distributed systems
  • Data pipeline frameworks
Who we are
• ebay Hadoop Team
• We are around 40 people developing and supporting Hadoop
• Thousands of Hadoop Users @ ebay
Agenda
• Who we are?
• Background of Hadoop and Hadoop at ebay
• What are the challenges
• What we achieved using Hadoop-2
Hadoop Evolution @ ebay
2007: 1-10 nodes
2009: 50+ nodes
2010: 100+ nodes, 1000s of cores, 1 PB
2011: 1,000+ nodes, 10,000+ cores, 10+ PB
2012: 3,000+ nodes, 30,000+ cores, 50+ PB
2013/2014: 10,000 nodes, 150,000+ cores, 150 PB
Hadoop-1 Architecture
Hadoop-1 Limitations
• Scalability
  • Maximum cluster size 4-5K nodes
  • Maximum ~40K concurrent tasks
  • JobTracker scalability
• Availability
  • A JobTracker failure kills all running jobs
• Hard partition between Map and Reduce slots
  • Lower cluster utilization
• Lacks support for alternate paradigms
Hadoop-2
• Hadoop-1: single-use system, batch apps only
• Hadoop-2: multi-purpose platform via YARN — batch, interactive, streaming
Agenda
• Who we are?
• Background of Hadoop and Hadoop at ebay
• What are the challenges
• What we achieved using Hadoop-2
Application Master
• Runs on normal NodeManager machines, so it is exposed to:
  • Out-of-memory errors
  • Slow machines
  • Flaky networks
Application Master
Node Goes Down
• MapReduce
  • Can rebuild state from job history files
• Generic applications
  • Application Timeline / History Server
    • YARN-321
    • YARN-1530
Application Master
• Slow machines
  • Automation / monitoring
• Flaky network
  • Split-brain problem
    • Fixed for MapReduce
    • All other AppMasters have to handle this themselves
Application Master: Out of Memory
• Physical memory errors
  • yarn.app.mapreduce.am.resource.mb
  • yarn.app.mapreduce.am.command-opts
• Virtual memory errors
  • Default ratio is 2.1 and needs to be tweaked
  • yarn.nodemanager.vmem-check-enabled
  • yarn.nodemanager.vmem-pmem-ratio
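As a sketch, the AppMaster memory settings above might be tuned like this (the values shown are illustrative examples, not recommendations; the first two properties go in mapred-site.xml, the last two in yarn-site.xml):

```xml
<!-- mapred-site.xml: give the MR AppMaster container more headroom (example values) -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx1536m</value>
</property>

<!-- yarn-site.xml: adjust virtual-memory checking (example values) -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```

Keeping the `-Xmx` heap below `am.resource.mb` leaves room for non-heap JVM overhead inside the container.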
Binary Compatibility
• Works well
  • mapred APIs are binary compatible
  • mapreduce APIs are source compatible
• BUT ...
  • Only works for ~70% of applications
• Why?
  • Reflection
  • Uber jars in the classpath
• MAPREDUCE-5108
Binary Compatibility
• LZO compression
  • LZO is not compiled with Hadoop-2
• Avro
  • http://repo1.maven.org/maven2
  • Version => 1.7.4-hadoop2
Log Aggregation
• Loads a lot of data into HDFS
  • 5-7 TB of data per day
• Default retention is 30 days; we reduced it to 4 days
  • yarn.log-aggregation.retain-seconds
• Puts a lot of load on the NameNode
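The retention change above is a single yarn-site.xml property, with 4 days expressed in seconds:

```xml
<!-- yarn-site.xml: keep aggregated logs for 4 days instead of 30 -->
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <!-- 4 days * 24 h * 3600 s = 345600 -->
  <value>345600</value>
</property>
```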
User Engagement
• Engage all users to verify their jobs
• Test with production-like data
• Verify all jobs, not just a sample of jobs
Agenda
• Who we are?
• Background of Hadoop and Hadoop at ebay
• What are the challenges
• What we achieved using Hadoop-2
Benchmarks
Benchmark     Hadoop-1       Hadoop-2      Improvement
Sort          500 seconds    365 seconds   ~20%
TeraSort      182 seconds    180 seconds   About the same
Shuffle       993 seconds    530 seconds   ~2X
Scalability   1020 seconds   275 seconds   ~4X

YARN-938
Hadoop-2 Numbers
[Chart: Tasks Starting per Hour, hourly 0:00-23:00, Hadoop-2 vs Hadoop-1 — ~59% more tasks on Hadoop-2]
[Chart: Tasks Finishing per Hour, hourly 0:00-23:00, Hadoop-2 vs Hadoop-1 — ~52% more tasks on Hadoop-2]
Hadoop-2 Numbers
[Chart: Apps Submitted per Hour, hourly 0:00-23:00, Hadoop-2 vs Hadoop-1 — ~51% more on Hadoop-2]
[Chart: Apps Finishing per Hour, hourly 0:00-23:00, Hadoop-2 vs Hadoop-1 — ~50% more on Hadoop-2]
Hadoop-2 Numbers
[Chart: Hadoop-2 Cluster Utilization over 24 hours, utilization fraction on a 0-1.2 scale]
Overall improvements
• Overall job throughput increased ~2X
• Overall job runtime improved ~1.5X to 2X
Apps Beyond MapReduce
• Tez
• Storm
• Shark and Spark
• …
Availability
• Namenode HA
• RM Restart
• RM HA
• Rolling upgrades (Coming soon)
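An illustrative yarn-site.xml sketch for the RM Restart and RM HA items above (property names are from Hadoop-2 YARN, but the values are assumptions, and availability of these features varies by 2.x release):

```xml
<!-- yarn-site.xml: ResourceManager restart/HA (illustrative values) -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
```

RM restart lets applications survive a ResourceManager failure by recovering state from a state store; HA adds an active/standby pair on top of that.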
Conclusion
• There are some pain points
• Plan for user testing
• Worth the effort