Tune Hadoop

How to Debug and Tune Hadoop

Alex Rovner, Proclivity Systems

Tune Your Cluster

Choose the optimal number of mappers and reducers per node

mapred.tasktracker.map.tasks.maximum

mapred.tasktracker.reduce.tasks.maximum

Oversubscribe the CPU by 20-30% (8 cores can generally handle 10 slots)

Keep the mapper-to-reducer slot ratio at about 4:3
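The slot arithmetic above can be sketched in a few lines. This is only an illustration: the function name `slots_per_node` and the 25% default (the middle of the 20-30% range) are assumptions, not part of the original slides.

```python
def slots_per_node(cores, oversubscribe=0.25, map_share=4, reduce_share=3):
    """Split a node's task slots between mappers and reducers.

    oversubscribe: fraction of extra slots beyond the core count
                   (assumed 0.25, the middle of the 20-30% range).
    map_share / reduce_share: the 4:3 mapper-to-reducer ratio.
    """
    total_slots = round(cores * (1 + oversubscribe))
    map_slots = round(total_slots * map_share / (map_share + reduce_share))
    reduce_slots = total_slots - map_slots
    return map_slots, reduce_slots


if __name__ == "__main__":
    maps, reduces = slots_per_node(8)
    print(f"mapred.tasktracker.map.tasks.maximum={maps}")
    print(f"mapred.tasktracker.reduce.tasks.maximum={reduces}")
```

For an 8-core node this gives 10 slots in total, split 6 and 4, which is the closest whole-number split to 4:3.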

Tune Hadoop

Adjust memory allocations

mapred.child.java.opts=-Xmx512M

Use 80% of available memory

Do not oversubscribe memory to avoid swapping

Total memory = map slots + reduce slots + TaskTracker (TT) + DataNode (DN) + other services + OS, counting the memory each of these uses
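A rough budget check for the formula above might look like the following. The daemon and OS figures are placeholder assumptions, as is the reading that the whole sum should stay within 80% of physical RAM. A real JVM also uses more memory than its -Xmx heap, so treat the result as a lower bound.

```python
def memory_budget_mb(total_ram_mb, map_slots, reduce_slots, child_heap_mb,
                     tasktracker_mb=1024, datanode_mb=1024,
                     other_services_mb=0, os_mb=2048):
    """Return (committed_mb, limit_mb, fits) for one worker node.

    The defaults for TaskTracker, DataNode and OS memory are
    hypothetical values chosen for illustration only.
    """
    task_mb = (map_slots + reduce_slots) * child_heap_mb
    committed_mb = (task_mb + tasktracker_mb + datanode_mb
                    + other_services_mb + os_mb)
    limit_mb = 0.8 * total_ram_mb  # "use 80% of available memory"
    return committed_mb, limit_mb, committed_mb <= limit_mb


if __name__ == "__main__":
    committed, limit, fits = memory_budget_mb(
        total_ram_mb=16384, map_slots=6, reduce_slots=4, child_heap_mb=512)
    print(f"committed={committed} MB, limit={limit:.0f} MB, fits={fits}")
```

If `fits` comes back false, either reduce the slot counts or lower the child heap rather than letting the node swap.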

Tune Hadoop

Increase buffers for sorting and shuffling

io.sort.mb & fs.inmemory.size.mb

Set to 60-70% of Java heap size

Set it large enough to avoid disk spills
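As a small worked example of the 60-70% guideline, the sketch below derives a buffer range from the child heap size. The helper name `sort_buffer_range_mb` is hypothetical, and the sketch assumes the guideline is applied to the heap set through -Xmx.

```python
def sort_buffer_range_mb(heap_mb, low=0.60, high=0.70):
    """Return the (min, max) buffer size in MB for a given Java heap."""
    return int(heap_mb * low), int(heap_mb * high)


if __name__ == "__main__":
    low_mb, high_mb = sort_buffer_range_mb(512)
    print(f"io.sort.mb between {low_mb} and {high_mb} for a 512 MB heap")
```

With the -Xmx512M heap this works out to roughly 307 to 358 MB.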

Compress intermediate data

mapred.compress.map.output

Install native libraries for performance

Use LZO to minimize CPU cycles

Set compression to use BLOCK compression

Tune Your Job

Use a combiner where possible!

A combiner is a mini reduce phase on the map side

Reduces the amount of data sent to the reducers

Does not need to be the same class as the reducer
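To make the effect concrete, here is a plain-Python simulation of a word count with and without a combiner. It does not use the Hadoop API; the function names and the two input splits are made up for illustration.

```python
from collections import Counter


def map_words(line):
    """Map phase: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]


def sum_by_key(pairs):
    """Sum values per key. Usable as both combiner and reducer here
    because addition is associative and commutative."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def run(splits, use_combiner):
    """Return (records_shuffled, final_counts) for the given input splits."""
    shuffled = []
    for split in splits:
        pairs = map_words(split)
        if use_combiner:
            pairs = sum_by_key(pairs)  # local aggregation on the map side
        shuffled.extend(pairs)
    return len(shuffled), dict(sum_by_key(shuffled))


if __name__ == "__main__":
    splits = ["a b a", "b b c"]
    print(run(splits, use_combiner=False))
    print(run(splits, use_combiner=True))
```

Both runs produce the same final counts, but the combiner run sends 4 records to the reduce side instead of 6. The saving grows with the number of repeated keys per split.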

Tune Your Job

Set up an appropriate number of reducers

Check job stats to figure out how many reducers are needed

Map output bytes will drive how many reducers you need

Rule of thumb: 1 GB of map output per reducer

Example: 7.4 GB of map output bytes = 7 reducers
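The same estimate can be computed from the "Map output bytes" job counter. The sketch below assumes 1 GB means 1024^3 bytes and rounds to the nearest whole reducer. The function name `estimate_reducers` is hypothetical.

```python
GIB = 1024 ** 3


def estimate_reducers(map_output_bytes, bytes_per_reducer=GIB):
    """Apply the 1 GB per reducer rule of thumb, with a floor of one."""
    return max(1, round(map_output_bytes / bytes_per_reducer))


if __name__ == "__main__":
    map_output_bytes = int(7.4 * GIB)  # value read from the job stats
    print(estimate_reducers(map_output_bytes))
```

The result is a starting point for the job's reducer count, to be adjusted after checking how evenly the keys distribute across reducers.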

Demo
