HiTune: Dataflow-Based Performance Analysis for Big · PDF fileHiTune: Dataflow-Based...

Intel Confidential

HiTune: Dataflow-Based Performance Analysis for Big Data Cloud

Bo Huang Principal Engineer

Intel APAC R&D Ltd. Aug 16, 2011

USENIX ATC’11

Big Data

• “Industrial Revolution of Data”

– The heartbeat of mobile, cloud and social computing

– Expanding faster than Moore’s law

• E.g., Internet of Things

• What is Big Data?

– Too large to work with using traditional tools

– Require a new architecture

• Massively parallel software running on 100s~1000s of servers

2011/8/23 3

Dataflow Model for Big Data Analytics

• Users

– Applications modeled as dataflow graphs

– Write subroutines running on the vertices

– Abstracted away from messy details of distributed computing

• System runtime

– Dynamically map dataflow graphs to the cluster

– Handles all the low level details

• Data partitioning, task distribution, load balancing, node communications, fault tolerance, …

Partitioned Input

Aggregated Output

Streaming dataflow

reduce

Aggregated Output

Partitioned Input

copier

copiersort

sortmerge

reduce

Sequential dataflow

shuffle

Map Tasks Reduce Tasks

MapReduce

Hadoop

What Worked

• Parallel programming is hard

– Distributed programming is harder

– Dataflow model makes it a lot easier

• An appropriately high level of abstraction

– User required to consider data parallelisms exposed by the dataflow

– Runtime distributes executions of subroutines by exploiting data dependencies encoded in the dataflow

Nontrivial software written with threads,

semaphores, and mutexes are

incomprehensible to humans.

Edward A. Lee

CGO 2007, March 2007

Auto-Partitioning Compiler

for Intel Network Processor (IXP)

What Didn’t Work

• Dataflow abstraction makes Big Data system appear as a “black box”

– Very difficult for a user to understand runtime behaviors

– Performance analysis & tuning remains a big challenge

• Key challenges of performance analysis for Big Data

– Massively distributed system

• How to correlate concurrent performance activities (across 10000s of programs and machines)?

– High level dataflow abstraction

• How to relate low level performance activities to high level dataflow model?

HiTune: “Vtune for Hadoop”

• Distributed instrumentations

– Lightweight sampling using binary instrumentation

• No source code modifications

– Implemented using Java programming language agents

• Generic sampling information collected

• Dataflow-driven analysis

– Re-constructing dataflow execution process using low level sampling information

• Based on a dataflow specification

– Implemented as several Hadoop jobs

Sampler

Dataflow diagram

Sampler

Aggregation engine

Tracker

Analysis engine

Specification file

Adaptor

Chukwa

Adaptor

Chukwa

Collector

Chukwa

Local HiTune

data HiTune

HiTune

Analyzer

Hadoop

HiTune Report (.csv)

Instrumentation

PostProcess

Analysis

HiveQL

Hive Query Excel Spreadsheet

Visual

Report Samples (.xlsm)

Chukwa

Collector

Chukwa

Collector

Aggregation

HiTune 0.9 Status

• Used intensively both inside Intel and by several external

customers

• Open sourced under Apache License 2.0

• Available at https://github.com/hitune/hitune

Overhead • Ratio of instrumented vs. uninstrumented clusters

– Less than 2% runtime overhead due to instrumentation

101% 102% 101%

101% 100% 101%

101% 100% 100%

Sort WordCount Nutch indexing

Workloads

5-slave cluster 10-slave cluster 20-slave cluster

100% 102% 100%

102% 101% 100%

100% 100% 100%

tiliza

Workloads

102% 100% 102%

102% 100% 101%

102% 100% 100%

Sort WordCount Nutch indexingRa

tiliza

Workloads

98% 100% 98%

99% 98% 100%

Workloads

The Hadoop Dataflow Model

Streaming dataflow

reduce

Aggregated Output

Partitioned Input

copier

copiersort

sortmerge

reduce

Sequential dataflow

shuffle

Map Tasks Reduce Tasks

Case Study: Limitation of Traditional Tools

• Sorting many small files (3200 500KB-sized files) using Hadoop 0.20.1

– Cluster very lightly utilized (extremely low CPU, disk I/O and network utilization)

– No obvious bottlenecks or hotspots in the cluster

– Traditional tools (e.g., system monitors and program profilers) fail to reveal the root cause

Case Study: Limitation of Traditional Tools

• HiTune results (dataflow execution) reveal the root cause

– Upgrading to “Fair Scheduler 2.0” fixes the issue

Dataflow

Execution Chart

Time line

Sorting small files

bootstrap map shuffle sort reduce idle

0.20.1 0.19.1

Time line

Sorting small files

0.20.1 0.19.1

Time line

Sorting small files

0.20.1 0.19.1 The Fix

The Low Utilization Issue

sks bootstrap

shuffle

reduce

Time line

Sorting small files

0.20.1 0.19.1

Case Study: Limitation of Hadoop Logs

• TeraSort

– Large gap between end of map and end of shuffle

• None of CPU, disk I/O and network bandwidth are bottlenecked during the gap

– “Shuffle Fetchers Busy Percent” metric reported by Hadoop is always 100%

• Increasing the number of copier threads brings no improvement

– Traditional tools or Hadoop logs fail to reveal the root cause

Case Study: Limitation of Hadoop Logs

• HiTune results (dataflow-based hotspot breakdown) reveal the root cause

– Copier threads idle 80% of the time, waiting for memory merge thread

– memory merge thread busy mostly due to compression

– Changing compression codec to LZO fixes this issue

Copier threads

Busy, 20%

Idle, 80%reserve, 79%

Others,1%

Compress, 81%

Memory Merge threads

Idle, 13%

Busy, 87%Others, 6%

Case Study: Extensibility

• Easily extended to support Hive

– Simply changing the dataflow specification

• Aggregation query in Hive performance benchmarks

– 68% of time spent on data input/output, Hadoop/Hive initialization & cleanup

– Critical to reduce intermediate results, improve data input/output, and reduce Hadoop/Hive overheads

Hive Init

Hive Active

Hive Close

Hive Processing Period

Active

Hive data flow stage timeline

map stage timeline

Stage Init

Stage Close

Hive Init

Hive Active

Hive Close

Stage Init

Hive Init

Hive Active

Hive Close

Active

map stage timeline

Stage Init

Stage Close

Hive Init

Hive Active

Hive Close

Stage Init

Active

reduce stage timeline

Stage Close

Hive Init

Hive Active

Hive Close

Active

reduce stage timeline

Stage Close

Hive Init

Hive Active

Hive Close

Hive Init

Close0.2%

Stage Close

Hive Input42.4%

Hive Operations

Hive Output3.2%

Hive Active77.6%

Hive Init Hive Close Stage Close

Hive Input Hive Operations Hive Output

Hive Init

Close0.2%

Stage Close

Hive Input42.4%

Hive Operations

Hive Output3.2%

Hive Active77.6%

Hive Init Hive Close Stage Close

Hive Input Hive Operations Hive OutputMap Tasks Reduce Tasks

Summary

• HiTune - “VTune for Hadoop”

– Better insights on Hadoop runtime behaviors

• Dataflow-based analysis

– Extremely low runtime overheads

– Very good scalability & extensibility

– v0.9 open sourced under Apache License 2.0

• See https://github.com/hitune/hitune

* Refer to our USENIX ATC’11 paper for more details

2011/8/23 16

HiTune: Dataflow-Based Performance Analysis for Big · PDF fileHiTune: Dataflow-Based...

Documents

ECE545 Lecture5 Dataflow

HiTune: Dataflow-Based Performance Analysis for Big Data Cloud · –Performance analysis & tuning remains a big challenge • Key challenges of performance analysis for Big Data

HR & Benefits – Payroll Dataflow - ADP · HR & Benefits – Payroll Dataflow for ADP Workforce Now Module 1: Real-Time Dataflow Handout Manual . MODULE 1: REAL-TIME DATAFLOW HANDOUT

DataFlow & Beam

Actian DataFlow Whitepaper

Visual Dataflow Coordination2010

5 Dataflow Diagram

HiTune: Dataflow-Based Performance Analysis for Big Data Cloud

Sort Merge Buckets: Optimizing Repeated Skewed Joins in Dataflow · 2020. 4. 15. · Apache Crunch Data-Intensive Platform MillWheel Data-Intensive Platform Apache Storm Programming

a LRTAP dataflow

Overview of DataFlow

HiTune - Dataflow-Based Performance Analysis for Big Data ... · every day and 800TB compressed data scanned daily [1]. This type of “Big Data” phenomenon has led to the emergence

VHDL Dataflow and Structural Modeling and Testbencheseng.umb.edu/~cuckov/classes/engin341/Lectures/L02 - VHDL - Dataflow...VHDL –Dataflow and Structural Modeling and Testbenches

Petrolook Dataflow Diagram

Dataflow Monitoring

t Pl Dataflow

Dataflow Diagram

CS 744: DATAFLOW

Dataflow Models

dataflow - Lunds tekniska högskola · 3 motivation the need for a parallel programming model dataflow programming actors, dataflow, and the CAL actor language dataflow perspectives