Slides for presentation of "Theius: A Streaming Visualization Suite for Hadoop Clusters", given at IC2E 2013 in San Francisco, California.
Theius: A Streaming Visualization Suite for Hadoop Clusters
Jon Tedesco
IC2E 2013, San Francisco, CA, USA
Jon Tedesco, Roman Dudko, Abhishek Sharma, Reza Farivar, Roy Campbell
Motivation
Problem
◦ System administrators
Bottleneck for detecting & responding to failures
Communicate state of system quickly
Monitoring
◦ Streaming, real-time data
◦ Ganglia
Widely used, scalable, and flexible
Prediction
◦ Online prediction algorithms (real-time)
Visualization Problem
◦ Ganglia
Static, time-based graphs
[Diagram: Monitoring, Visualization, Prediction]
Related Work - Ganglia
Design Principles
Interactive
◦ Responsive and controllable
Real-time
◦ Streaming, real-time, automatic
Informative
◦ Direct attention to potential problems and artifacts
Intuitive
◦ Demand skill, not experience
Scalable
◦ Visualize large clusters without sacrificing usability
Objectives
◦ Streaming data
◦ Configurable and interactive
◦ Informative
Use cases
◦ Heterogeneous cluster
◦ Rack failure
◦ Node failure
◦ Uneven load distribution
Visualization - Architecture
Architecture
◦ Simulator
Generates simulated cluster data
Streams data to clients
◦ Webpage
Asynchronous & interactive
Implementation
◦ JavaScript
d3.js
jQuery
◦ Python
◦ AJAX
Data
◦ Methodology
Data types from previous work
Heuristic values
◦ Examples
CPU, memory, context switch rate
Log events
MapReduce tasks and jobs
Failure or event prediction
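The slides name a Python simulator that generates cluster data and streams it to the page, but do not show its code. The following is a minimal sketch under assumed details: the rack layout, field names, and value ranges are hypothetical heuristics chosen for illustration, and each snapshot is serialized to JSON so that a web server could return it in response to AJAX requests.

```python
import json
import random

# Hypothetical cluster layout; the real simulator's topology is not given in the slides.
RACKS = ["rack-1", "rack-2", "rack-3"]
NODES_PER_RACK = 4


def simulate_snapshot(timestamp, rng):
    """Build one simulated snapshot of per-node metrics, grouped by rack."""
    racks = []
    for rack in RACKS:
        nodes = []
        for i in range(NODES_PER_RACK):
            nodes.append({
                "name": f"{rack}-node-{i}",
                # Assumed heuristic ranges: CPU and memory as percentages,
                # context switches as a per-second rate.
                "cpu": round(rng.uniform(0, 100), 1),
                "memory": round(rng.uniform(0, 100), 1),
                "context_switch_rate": rng.randint(500, 20000),
            })
        racks.append({"name": rack, "nodes": nodes})
    return {"timestamp": timestamp, "racks": racks}


def stream(num_snapshots, seed=0):
    """Yield one JSON-encoded snapshot per simulated time step.

    A web server could hand these strings to the page in response to AJAX requests.
    """
    rng = random.Random(seed)
    for t in range(num_snapshots):
        yield json.dumps(simulate_snapshot(t, rng))
```

Calling `stream(3)` yields three JSON strings. Seeding the generator makes a simulated run reproducible, which helps when comparing different visualizations of the same scenario.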
Visualization - Overview
Visualization – Main Visualization
Customizable using control panel
Aggregate view
◦ Summarize and drill down
Draws attention to anomalies
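Summarizing, drilling down, and drawing attention to anomalies can be illustrated with the short Python sketch below. It is not Theius's implementation: the flat node records (`name`, `rack`, and a metric such as `cpu`) are hypothetical, and flagging nodes more than two standard deviations from the cluster mean is an assumed stand-in for whatever anomaly heuristic the tool applies.

```python
from statistics import mean, pstdev


def summarize_racks(nodes, metric):
    """Aggregate view: the average of one metric for each rack."""
    by_rack = {}
    for node in nodes:
        by_rack.setdefault(node["rack"], []).append(node[metric])
    return {rack: mean(values) for rack, values in by_rack.items()}


def drill_down(nodes, rack):
    """Drill-down: the individual nodes hidden behind one rack summary."""
    return [node for node in nodes if node["rack"] == rack]


def flag_anomalies(nodes, metric, threshold=2.0):
    """Names of nodes whose metric lies more than `threshold` population
    standard deviations from the cluster mean (an assumed z-score rule)."""
    values = [node[metric] for node in nodes]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every node reports the same value, so nothing stands out
    return [node["name"] for node in nodes
            if abs(node[metric] - mu) / sigma > threshold]
```

A view built this way would color or enlarge the flagged nodes, while racks stay collapsed to their summary values until the user drills down.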
Visualization – Navigation
Switch between main visualizations
Seamless transitions
◦ Uninterrupted data stream
Visualization - Tree
Hierarchy of nodes, organized by rack
Color and size configurable
Scalable using summarization and drill-down
Identify abnormal racks or nodes
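A rack-organized hierarchy with summarization and drill-down might be prepared as nested JSON in the `{"name": ..., "children": [...]}` shape that d3's `d3.hierarchy` reads by default. In this sketch the record fields, the `expanded` parameter, and the choice to summarize a collapsed rack by the mean of its nodes' metrics are all assumptions for illustration.

```python
from statistics import mean


def build_hierarchy(nodes, size_metric, color_metric, expanded=(), root_name="cluster"):
    """Nest flat node records as cluster -> rack -> node.

    Racks not listed in `expanded` collapse into a single summary leaf holding
    the mean of each configured metric; expanded racks list every node.
    """
    racks = {}
    for node in nodes:
        racks.setdefault(node["rack"], []).append(node)

    children = []
    for rack, members in sorted(racks.items()):
        if rack in expanded:
            # Drill-down: one leaf per node, sized and colored by the chosen metrics.
            leaves = [{"name": n["name"],
                       "size": n[size_metric],
                       "color": n[color_metric]} for n in members]
            children.append({"name": rack, "children": leaves})
        else:
            # Summarization: the whole rack becomes one leaf.
            children.append({"name": rack,
                             "size": mean(n[size_metric] for n in members),
                             "color": mean(n[color_metric] for n in members)})
    return {"name": root_name, "children": children}
```

Keeping most racks collapsed bounds the number of drawn elements by the number of racks rather than nodes, which is one way summarization can help a tree view scale.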
Visualization – Large Tree
Hierarchy of nodes, organized by rack
Color and size configurable
Scalable using summarization and drill-down
Identify abnormal racks or nodes
Visualization - MapReduce
Grouped by job
Color and size configurable
◦ Example uses role for color, time remaining for size
Identify abnormal jobs or tasks
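Grouping tasks by job, with role as color and time remaining as size, might look like the Python sketch below. The palette, the record fields, and the straggler rule (time remaining above twice the job's median) are hypothetical choices, not details taken from Theius.

```python
from statistics import median

# Hypothetical color palette: one color per task role.
ROLE_COLORS = {"map": "#1f77b4", "reduce": "#ff7f0e"}


def group_tasks_by_job(tasks):
    """Group task records by job, encoding role as color and time remaining as size."""
    jobs = {}
    for task in tasks:
        jobs.setdefault(task["job"], []).append({
            "task": task["id"],
            "color": ROLE_COLORS[task["role"]],
            "size": task["time_remaining"],
        })
    return jobs


def find_stragglers(tasks, factor=2.0):
    """IDs of tasks whose time remaining exceeds `factor` times the median
    time remaining within their own job (an assumed abnormality rule)."""
    times_by_job = {}
    for task in tasks:
        times_by_job.setdefault(task["job"], []).append(task["time_remaining"])
    medians = {job: median(times) for job, times in times_by_job.items()}
    return [task["id"] for task in tasks
            if task["time_remaining"] > factor * medians[task["job"]]]
```

Comparing each task against its own job's median, rather than the whole cluster, keeps a long-running job from making all of its tasks look abnormal.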
Visualization - TreeMap
Grouped by rack
Color and size configurable
◦ Example uses CPU usage and rack color coding
Identify abnormal nodes or racks
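A treemap assigns each node a rectangle whose area is proportional to a size metric. The sketch below uses the simple slice-and-dice layout (racks split the width, nodes split each rack's height), which is easy to verify but is not necessarily the layout Theius uses; d3's treemap defaults to a squarified tiling. The input shape, with per-node values such as CPU usage and node names unique across racks, is assumed.

```python
def slice_and_dice(racks, x=0.0, y=0.0, width=1.0, height=1.0):
    """Lay out a two-level treemap with the slice-and-dice algorithm.

    `racks` maps rack name -> {node name: size value}. Returns a dict
    mapping node name -> (x, y, w, h). Each rack gets a column whose width
    is proportional to its total; each node gets a slice of that column
    whose height is proportional to its own value.
    """
    rects = {}
    total = sum(sum(nodes.values()) for nodes in racks.values())
    if total == 0:
        return rects  # nothing to draw
    cx = x
    for rack, nodes in racks.items():
        rack_total = sum(nodes.values())
        rack_w = width * rack_total / total
        cy = y
        for node, value in nodes.items():
            node_h = height * value / rack_total if rack_total else 0.0
            rects[node] = (cx, cy, rack_w, node_h)
            cy += node_h
        cx += rack_w
    return rects
```

Because each rack occupies one contiguous column, a rack with unusually high total usage shows up as a visibly wider band, matching the slide's goal of spotting abnormal racks.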
Visualization - Scatterplot
Identify trends across nodes and racks
Color, size, and plots configurable
Identify correlations between metrics
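Identifying a correlation between two metrics, such as context switch rate and CPU usage, can be quantified with the Pearson correlation coefficient over the same per-node points a scatterplot draws. This is an illustrative sketch; the slides do not say that Theius computes a coefficient, and the record fields are hypothetical.

```python
from math import sqrt


def scatter_points(nodes, x_metric, y_metric):
    """Pair two configurable metrics per node: one (x, y) point per node."""
    return [(node[x_metric], node[y_metric]) for node in nodes]


def pearson(points):
    """Pearson correlation coefficient of a list of (x, y) points."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    var_y = sum((y - mean_y) ** 2 for _, y in points)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant metric")
    return cov / sqrt(var_x * var_y)
```

A value near 1 means the two metrics rise together across nodes, near -1 that one falls as the other rises, and near 0 that the plot shows no linear trend.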
Visualization – Individual Node
Detailed data for individual node
Traditional visualizations for single node
Visualization - Controls
Configure metrics for visualizations
Pause and resume data stream
Legend for main visualization
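Configuring which metric drives color and which drives size amounts to choosing linear scales, much as d3's `scaleLinear` maps a domain onto a range. The sketch below mirrors that idea in Python with clamping (comparable to d3's `clamp(true)`); the `CONFIG` dictionary, the green-to-red palette, and the radius range are hypothetical control-panel settings.

```python
def linear_scale(domain, output_range):
    """Map a value from `domain` onto `output_range`, clamping out-of-domain values."""
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain must span a non-zero interval")

    def scale(value):
        t = (value - d0) / (d1 - d0)
        t = min(1.0, max(0.0, t))  # clamp to [0, 1]
        return r0 + t * (r1 - r0)

    return scale


def color_scale(domain, low_rgb, high_rgb):
    """Interpolate each RGB channel linearly and format the result as a hex color."""
    channels = [linear_scale(domain, (lo, hi)) for lo, hi in zip(low_rgb, high_rgb)]

    def scale(value):
        r, g, b = (round(channel(value)) for channel in channels)
        return f"#{r:02x}{g:02x}{b:02x}"

    return scale


# Hypothetical control-panel settings: metric name and domain for each encoding.
CONFIG = {"color": ("cpu", (0, 100)), "size": ("memory", (0, 100))}


def encode(node, config):
    """Turn one node's metrics into the visual attributes chosen in `config`."""
    color_metric, color_domain = config["color"]
    size_metric, size_domain = config["size"]
    to_color = color_scale(color_domain, (0, 128, 0), (255, 0, 0))  # green to red
    to_radius = linear_scale(size_domain, (2, 20))  # radius in pixels
    return {"color": to_color(node[color_metric]),
            "radius": to_radius(node[size_metric])}
```

Swapping `"cpu"` for another metric name in `CONFIG` re-encodes every node without touching the drawing code, and the same domains can label the legend.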
Visualization – Aggregate Data
Aggregate data for the cluster
◦ Log events stream
◦ Global node data
◦ Summarization data
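An aggregate log event stream can be kept as a bounded window of recent events, which also answers questions like the user-study task of finding the machine that logged the last fatal error. In this sketch the event fields, the `FATAL` level name, and the window size are assumptions, as is the helper that computes cluster-wide averages for the global node data.

```python
from collections import deque
from statistics import mean


class LogEventStream:
    """Bounded window of the most recent log events from all nodes."""

    def __init__(self, maxlen=1000):
        # Older events are discarded automatically once `maxlen` is reached.
        self.events = deque(maxlen=maxlen)

    def append(self, timestamp, node, level, message):
        self.events.append({"timestamp": timestamp, "node": node,
                            "level": level, "message": message})

    def last_with_level(self, level):
        """Most recent retained event with the given level, or None."""
        for event in reversed(self.events):
            if event["level"] == level:
                return event
        return None


def cluster_averages(nodes, metrics):
    """Global node data: the cluster-wide mean of each requested metric."""
    return {metric: mean(node[metric] for node in nodes) for metric in metrics}
```

The bounded window keeps memory use constant on a long-running stream, at the cost of forgetting events older than the window.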
Visualization - Timeline
History Controls
Snapshots of historical data
◦ See main visualization and sidebar data at a given time
Visualize metric across time
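Pausing, rewinding, and resuming over snapshots of historical data can be sketched with a time-ordered list and a binary search for the snapshot in effect at a chosen time. The class and method names below are hypothetical; the sketch assumes snapshots arrive in time order and that each snapshot maps node names to metric dictionaries.

```python
import bisect


class SnapshotHistory:
    """Timestamped snapshots supporting pause, rewind, and resume."""

    def __init__(self):
        self.times = []
        self.snapshots = []
        self.view_time = None  # None means "follow the live stream"

    def record(self, timestamp, snapshot):
        # The stream delivers snapshots in time order, so `times` stays sorted.
        if self.times and timestamp < self.times[-1]:
            raise ValueError("snapshots must arrive in time order")
        self.times.append(timestamp)
        self.snapshots.append(snapshot)

    def at(self, timestamp):
        """Latest snapshot taken at or before `timestamp`, or None."""
        i = bisect.bisect_right(self.times, timestamp)
        return self.snapshots[i - 1] if i else None

    def pause(self, timestamp=None):
        """Freeze the view at `timestamp` (rewind), or at the newest snapshot if any."""
        if timestamp is None:
            timestamp = self.times[-1] if self.times else None
        self.view_time = timestamp

    def resume(self):
        self.view_time = None

    def current(self):
        """Snapshot to display: the frozen one while paused, else the newest."""
        if self.view_time is not None:
            return self.at(self.view_time)
        return self.snapshots[-1] if self.snapshots else None

    def series(self, node, metric):
        """One metric for one node across all recorded times."""
        return [(t, snap[node][metric]) for t, snap in zip(self.times, self.snapshots)]
```

Recording continues while the view is paused, so resuming jumps straight to live data without losing the interval that was skipped.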
Visualizations Summary
Scalable
◦ Drill-down and summarization
◦ Efficient web-based framework
Intuitive, informative
◦ Topological visualization
◦ Draw attention to abnormalities
Interactive, real-time
◦ Designed for streaming data
◦ Configurable visualization
◦ Pause, rewind, resume
User Study
Experimental Setup
◦ Compare Theius to Ganglia
◦ 5 graduate students at UIUC
No prior experience with Ganglia or Theius
◦ 4 comparative tasks
Both Ganglia & Theius
◦ 6 scenarios for trends and correlations
Theius only
◦ Timings & subjective feedback
User Study – Comparative Tasks
Tasks
◦ Scenario 1
CPU usage in single node
◦ Scenario 2
Node with highest CPU
◦ Scenario 3
High memory usage nodes
◦ Scenario 4
Aggregate cluster use
[Chart: completion time in seconds for Scenarios 1-4, Theius vs. Ganglia]
User Study – Theius Tasks
Task 1
◦ Identify abnormal rack in heterogeneous cluster
Task 2
◦ Identify rack with abnormal CPU usage
Task 3
◦ Identify machine that logged the last fatal error
Task 4
◦ Identify machine with high CPU, memory usage, or context switch rate
Task 5
◦ Identify rack with high CPU, memory usage, or context switch rate
Task 6
◦ Identify correlation between context switch rate and CPU usage
Completion times
◦ Task 1: 2.2 s
◦ Task 2: 6.2 s
◦ Task 3: 10.0 s
◦ Task 4: 67.4 s
◦ Task 5: 1.2 s
◦ Task 6: 7.8 s
Conclusion & Future Work
Source Code
◦ https://github.com/jtedesco/Theius
Future Work
◦ User study
System administrators
Larger group
Whether timing is an appropriate metric
◦ MapReduce-specific visualizations
◦ Scalability experiments
Questions?