How to Debug and Tune Hadoop
Alex Rovner, Proclivity Systems
Tune Your Cluster
Choose the optimal number of mappers / reducers per node
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
Oversubscribe the CPU by 20-30% (8 Cores can generally handle 10 slots)
Mappers to reducers ratio 4:3
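As a rough illustration of the slot arithmetic above, this Python sketch oversubscribes the cores by 25% and splits the resulting slots 4:3. The function name plan_slots and the rounding behavior are assumptions made for this example; they are not part of Hadoop.

```python
def plan_slots(cores, oversubscription=0.25, map_share=4, reduce_share=3):
    """Return (map_slots, reduce_slots) for one node.

    Hypothetical helper: oversubscribes the CPU by the given fraction,
    then splits the slots using the mappers-to-reducers ratio.
    """
    total = round(cores * (1 + oversubscription))
    map_slots = round(total * map_share / (map_share + reduce_share))
    reduce_slots = total - map_slots
    return map_slots, reduce_slots


map_slots, reduce_slots = plan_slots(8)
print("mapred.tasktracker.map.tasks.maximum =", map_slots)
print("mapred.tasktracker.reduce.tasks.maximum =", reduce_slots)
```

Treat the result as a starting point and adjust it against observed CPU utilization on the node.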
Tune Hadoop
Adjust memory allocations
mapred.child.java.opts=-Xmx512M
Use 80% of available memory
Do not oversubscribe memory to avoid swapping
Total Memory = Map Slots + Reduce Slots + TaskTracker (TT) + DataNode (DN) + Other Services + OS
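The memory formula can be checked with a few lines of arithmetic. The sketch below is a hypothetical helper (check_memory_plan is not a Hadoop API), and the daemon and OS sizes in the example call are illustrative assumptions, not recommendations.

```python
def check_memory_plan(total_mb, map_slots, reduce_slots, child_heap_mb,
                      tt_mb, dn_mb, other_mb, os_mb, usable_percent=80):
    """Return (committed_mb, budget_mb, fits) for one node.

    committed_mb follows the formula: map slots + reduce slots + TT + DN
    + other services + OS. Only the -Xmx heap is counted per slot; each
    child JVM needs some extra memory beyond its heap.
    """
    slots_mb = (map_slots + reduce_slots) * child_heap_mb
    committed_mb = slots_mb + tt_mb + dn_mb + other_mb + os_mb
    budget_mb = total_mb * usable_percent // 100  # use 80% of memory
    return committed_mb, budget_mb, committed_mb <= budget_mb


# Assumed example: 16 GB node, 6 map + 4 reduce slots at -Xmx512M.
committed, budget, fits = check_memory_plan(
    total_mb=16384, map_slots=6, reduce_slots=4, child_heap_mb=512,
    tt_mb=1024, dn_mb=1024, other_mb=512, os_mb=2048)
print(committed, "MB committed of", budget, "MB budget; fits:", fits)
```

If the plan does not fit, reduce the slot count or the child heap rather than letting the node swap.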
Tune Hadoop
Increase buffers for sorting and shuffling
io.sort.mb & fs.inmemory.size.mb
Set to 60-70% of Java heap size
Set it large enough to avoid disk spills
Compress intermediate data
mapred.compress.map.output
Install native libraries for performance
Use LZO to minimize CPU cycles
Set compression to use BLOCK compression
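A minimal sketch of how the settings on this slide might be written out for mapred-site.xml. The helper names are hypothetical, the 60% figure is the low end of the range above, and the LZO codec class assumes the separately installed hadoop-lzo library. The mapred.map.output.compression.codec and mapred.output.compression.type properties are Hadoop 1.x names that the slide does not spell out.

```python
from xml.sax.saxutils import escape


def shuffle_settings(child_heap_mb, sort_percent=60):
    """Build sort-buffer and compression properties (hypothetical helper)."""
    return {
        # Sort buffer sized as a share of the child JVM heap.
        "io.sort.mb": str(child_heap_mb * sort_percent // 100),
        # Compress intermediate map output.
        "mapred.compress.map.output": "true",
        # Assumes hadoop-lzo and its native libraries are installed.
        "mapred.map.output.compression.codec":
            "com.hadoop.compression.lzo.LzoCodec",
        # BLOCK compression type for SequenceFile output.
        "mapred.output.compression.type": "BLOCK",
    }


def to_site_xml(props):
    """Render properties as a Hadoop-style <configuration> document."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append("    <name>%s</name>" % escape(name))
        lines.append("    <value>%s</value>" % escape(value))
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)


print(to_site_xml(shuffle_settings(512)))
```

Check the spill counters in the job stats after a run to confirm that the buffer is actually large enough.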
Tune Your Job
Use Combiner where possible!
Combiner is a mini reduce phase on the map side
Reduces the amount of data sent to the reducers
Does not need to be the same class as the reducer
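To make the effect of a combiner concrete, here is a small pure-Python simulation of a word count. It does not use the Hadoop API; the function names and the two input splits are made up for this example.

```python
from collections import Counter


def map_words(split):
    """Map phase: emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in split.split()]


def combine(pairs):
    """Combiner: pre-aggregate the output of a single map task."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def reduce_counts(pairs):
    """Reduce phase: sum all values per key across map tasks."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


splits = ["to be or not to be", "to do or not to do"]
without = [p for s in splits for p in map_words(s)]
with_combiner = [p for s in splits for p in combine(map_words(s))]

print(len(without), "records shuffled without a combiner")
print(len(with_combiner), "records shuffled with a combiner")
# The final result is the same either way.
assert reduce_counts(without) == reduce_counts(with_combiner)
```

In this word count the combiner and the reducer happen to apply the same summing logic. For an average, by contrast, the combiner would have to emit partial sums and counts and leave the division to the reducer.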
Tune Your Job
Set up an appropriate number of reducers
Check job stats to figure out how many reducers are needed
Map output bytes will drive how many reducers you need
Rule of thumb is 1 GB of map output per reducer
Example: 7.4 GB of map output = 7 reducers
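The rule of thumb can be sketched as a one-line calculation on the "Map output bytes" counter from the job stats. The function name estimate_reducers and the choice to round to the nearest whole reducer are assumptions for this example; the result would then be applied through the job's reducer count setting (mapred.reduce.tasks).

```python
GB = 1024 ** 3


def estimate_reducers(map_output_bytes, bytes_per_reducer=GB):
    """Estimate the reducer count at roughly 1 GB of map output each.

    Hypothetical helper: rounds to the nearest reducer, minimum of one.
    """
    return max(1, round(map_output_bytes / bytes_per_reducer))


# 7.4 GB of map output, as in the example above.
print(estimate_reducers(int(7.4 * GB)))
```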
Demo