This is a 5-minute lightning talk on why one would want to use "White Elephant" for capacity planning on a Hadoop cluster. The talk was given for the LSPE group, hosted by Yahoo! in Sunnyvale on Sept 19, 2013. http://www.meetup.com/SF-Bay-Area-Large-Scale-Production-Engineering/events/129859402/
Tracking multi-tenant resource usage with "White Elephant"
Adam Faris, LinkedIn
Why track usage?
– Use Hadoop to process logs
– Creates small file problem for HDFS
– WebHDFS + HAR = "Problem Solver"
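As a rough sketch of the HAR half of this idea, the snippet below builds a `hadoop archive` command line that packs one day's worth of small log files into a single archive. The HDFS paths, the archive naming scheme, and the helper name `build_har_command` are assumptions made for illustration; they are not details from the talk.

```python
from datetime import date

def build_har_command(day, parent="/logs/jobhistory", dest="/logs/archives"):
    """Return the argv for archiving one day's logs into a HAR file.

    `parent` and `dest` are hypothetical HDFS paths; adjust to your layout.
    """
    src = day.strftime("%Y/%m/%d")                # source dir, relative to parent
    name = day.strftime("jobhistory-%Y%m%d.har")  # hypothetical naming scheme
    # General form: hadoop archive -archiveName <name> -p <parent> <src>* <dest>
    return ["hadoop", "archive", "-archiveName", name, "-p", parent, src, dest]

cmd = build_har_command(date(2013, 9, 19))
print(" ".join(cmd))
```

The command is only printed here. On a real cluster the list could be handed to `subprocess.run`, and the resulting archive read back through a `har://` path.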
Job History Logs
– Requirements
– Provides Data Aggregation
– Provides Dashboard
– Open Sourced by LinkedIn Engineering
http://en.wikipedia.org/wiki/White_elephant
Failed Tasks
Reduce Shuffle Bytes
It can do more?
• Total task time
• Total speculative time
• CPU Hours
• Plus more
• Helps determine capacity
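To make the metrics above concrete, here is a minimal sketch of per-user aggregation over job records. It is not how White Elephant works internally; the record layout and field names (`task_ms`, `failed_tasks`, `reduce_shuffle_bytes`) and the sample values are hypothetical.

```python
from collections import defaultdict

MS_PER_HOUR = 3_600_000

# Hypothetical per-job records; field names and values are illustrative only.
jobs = [
    {"user": "alice", "task_ms": 7_200_000, "failed_tasks": 1, "reduce_shuffle_bytes": 5_000},
    {"user": "bob",   "task_ms": 1_800_000, "failed_tasks": 0, "reduce_shuffle_bytes": 1_000},
    {"user": "alice", "task_ms": 3_600_000, "failed_tasks": 2, "reduce_shuffle_bytes": 2_500},
]

def usage_by_user(records):
    """Sum task hours, failed tasks, and reduce shuffle bytes per user."""
    totals = defaultdict(
        lambda: {"task_hours": 0.0, "failed_tasks": 0, "reduce_shuffle_bytes": 0}
    )
    for rec in records:
        t = totals[rec["user"]]
        t["task_hours"] += rec["task_ms"] / MS_PER_HOUR
        t["failed_tasks"] += rec["failed_tasks"]
        t["reduce_shuffle_bytes"] += rec["reduce_shuffle_bytes"]
    return dict(totals)

usage = usage_by_user(jobs)
for user, t in sorted(usage.items()):
    print(user, t)
```

Totals like these, tracked per user over time, are the kind of input a capacity estimate on a multi-tenant cluster can start from.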
• GitHub:
  – https://github.com/linkedin/white-elephant
• LinkedIn Open Source Projects:
  – http://data.linkedin.com/opensource/white-elephant
• LinkedIn is Hiring:
  – http://careers.linkedin.com
• Questions/Comments:
  – twitter: @opsmekanix