Monitoring and Debugging Dryad(LINQ) Applications with Daphne
Vilas Jagannath, Zuoning Yin, Mihai Budiu
University of Illinois, Microsoft Research SVC
International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS) 2011
Programming Clusters: Marketing
Map-Reduce
Programming Clusters: Reality
Complexity Exposed
Correctness or performance bugs break the single-system abstraction
Outline
• Motivation
• Job structure
• The Job Object Model
• Tools for job understanding
• Conclusions
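Data-Parallel Computation
[Figure: three software stacks, listed by layer in the same column order]
• Application
• Language: Sawzall, FlumeJava (Sawzall, Java); DryadLINQ, Scope (LINQ, SQL); Pig, Hive (≈SQL)
• Execution: Map-Reduce; Dryad; Hadoop
• Storage: GFS, BigTable; Cosmos, Azure, HPC; HDFS, S3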
2-D Piping
• Unix Pipes: 1-D
  grep | sed | sort | awk | perl
• Dryad: 2-D
  grep^1000 | sed^500 | sort^1000 | awk^500 | perl^50
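The 2-D pipe idea can be illustrated with a small sketch. This is not Dryad code: the stage functions, the round-robin redistribution between stages, and the tiny widths are assumptions chosen for illustration, and the number attached to each stage is read here as the count of parallel instances of that stage.

```python
from typing import Callable, Iterable, List, Tuple

Partition = List[str]
StageFn = Callable[[Partition], Partition]


def repartition(parts: Iterable[Partition], width: int) -> List[Partition]:
    """Spread all records round-robin over `width` partitions (assumed layout)."""
    out: List[Partition] = [[] for _ in range(width)]
    for i, record in enumerate(r for p in parts for r in p):
        out[i % width].append(record)
    return out


def run_pipeline(data: Partition, stages: List[Tuple[StageFn, int]]) -> List[Partition]:
    """Run each (function, width) stage as `width` independent calls."""
    parts: List[Partition] = [data]
    for fn, width in stages:
        parts = repartition(parts, width)
        # One call per instance; a real cluster would run these in parallel.
        parts = [fn(p) for p in parts]
    return parts


def grep_stage(p: Partition) -> Partition:
    return [r for r in p if "error" in r]


def upper_stage(p: Partition) -> Partition:
    return [r.upper() for r in p]


def sort_stage(p: Partition) -> Partition:
    return sorted(p)


lines = ["error b", "ok", "error a", "warn", "error c"]
result = run_pipeline(lines, [(grep_stage, 4), (upper_stage, 2), (sort_stage, 1)])
```

The first dimension is the sequence of stages, as in a Unix pipe; the second is the width of each stage.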
Dryad Job Structure
[Figure: job graph built from grep, sed, sort, awk, and perl vertices. Labels: Input files, Vertices (processes), Channels, Stage, Output files]
Dryad System Architecture
[Figure labels: Job manager, job schedule, control plane, data plane, Network, NS, Sched, Exec, V (vertices), cluster, Firewall]
How does it work in detail?
[Figure labels: Localhost (IDE, Application, Compiler, Job Submission); Cluster/Cloud (Cluster Scheduler, Job Manager (JM), Vertex, Exec, Storage)]
L: Logs, IO: Input/Output, R: Resources
Logs – lots of them
• Job-related – Plan (xml), status, resources
• Job-manager – stdout.txt, stderr.txt, *.log
• Vertex – stdout.txt, *.log, *.xml, *.cmd
Monitoring Tools Structure
[Figure: layered structure, in the order shown on the slide: Cosmos, Scope, HPC v2, HPC v3; Cluster abstraction; Job Object Model; Monitoring, Profiling, Debugging GUIs]
Job Object Model
[Figure labels: Logs; JOM; Views; Job, Vertices, Plan; Tools]
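A minimal sketch of what a job object model might look like, assuming a made-up log format of "vertex stage state" records. The class names, fields, and the parser are hypothetical and are not Daphne's actual JOM; the point is only that tools query objects instead of reading raw logs.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Vertex:
    name: str
    stage: str
    state: str  # e.g. "Running", "Completed", "Failed" (assumed values)


@dataclass
class Job:
    name: str
    vertices: Dict[str, Vertex] = field(default_factory=dict)

    def stages(self) -> Dict[str, List[Vertex]]:
        """Group vertices by stage, a view that a tool could render."""
        grouped: Dict[str, List[Vertex]] = {}
        for v in self.vertices.values():
            grouped.setdefault(v.stage, []).append(v)
        return grouped


def build_job(name: str, log_lines: List[str]) -> Job:
    """Parse 'vertex stage state' records (hypothetical format) into a Job."""
    job = Job(name)
    for line in log_lines:
        vertex, stage, state = line.split()
        # Later records overwrite earlier ones, so the last state wins.
        job.vertices[vertex] = Vertex(vertex, stage, state)
    return job


log = [
    "v1 grep Running",
    "v2 grep Running",
    "v1 grep Completed",
    "v2 grep Failed",
    "v3 sort Running",
]
job = build_job("demo", log)
```

Only `build_job` knows the log format, so a change in cluster or log layout would touch the parser and leave the tools built on `Job` and `Vertex` alone.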
Outline
• Motivation
• Job structure
• The Job Object Model
• Tools for job understanding
• Conclusions
The Job Browser
[Screenshot labels: Job, Stage, Vertex]
Job Schedule
Failure diagnosis
Diagnosis decision tree
• “Hand-made”
• Least portable tool
• Incomplete
• High-coverage
• Bug types:
  – User level
  – System-level
  – Cluster malfunction
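To make the idea of a hand-made decision tree concrete, here is a toy sketch that sorts a vertex failure into the three bug types listed above. The inputs and every rule are invented for illustration; the slides do not give the actual tree.

```python
def diagnose(exit_code: int, machine_reachable: bool, fails_on_other_machines: bool) -> str:
    """Classify a vertex failure with hand-written, deliberately incomplete rules."""
    if not machine_reachable:
        # Hypothetical rule: an unreachable machine is blamed on the cluster.
        return "cluster malfunction"
    if exit_code == 0:
        return "no failure"
    if fails_on_other_machines:
        # Hypothetical rule: a failure that follows the vertex from machine
        # to machine points at the user's code or data.
        return "user level"
    return "system-level"


verdict = diagnose(exit_code=1, machine_reachable=True, fails_on_other_machines=True)
```

Rules like these are quick to write and cover common cases, but they encode assumptions about one cluster, which fits the "least portable" and "incomplete" caveats above.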
Powershell = Interactive Queries
$cluster = get-cluster X
$job = $cluster | select-AllJobs | sort-object Date | select-object -last 1 | select-DryadJob
$failed = $job.Vertices | where-object { $_.State -eq "Failed" }
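The same query shape (latest job, then its failed vertices) can be sketched in Python over plain objects. The `Job` and `Vertex` classes and their fields are hypothetical stand-ins for the objects the PowerShell cmdlets return.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Vertex:
    name: str
    state: str


@dataclass
class Job:
    name: str
    submitted: date  # hypothetical field, playing the role of Date above
    vertices: List[Vertex]


def failed_vertices_of_latest(jobs: List[Job]) -> List[Vertex]:
    """Pick the most recent job by date, then keep its failed vertices."""
    latest = max(jobs, key=lambda j: j.submitted)
    return [v for v in latest.vertices if v.state == "Failed"]


jobs = [
    Job("old", date(2011, 5, 1), [Vertex("a", "Failed")]),
    Job("new", date(2011, 5, 20), [Vertex("b", "Completed"), Vertex("c", "Failed")]),
]
failed = failed_vertices_of_latest(jobs)
```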
Vertex Debugging on Client
Vertex Profiling on Client
Debugging on Cluster
Collection<T> collection;
var results = from c in collection
              where c.name.length > 10
              orderby c.age
              select c.name;
[Figure labels: Program, Job; Breakpoint on "where c.name.length > 10"]
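For readers less familiar with LINQ, the query reads as filter, sort, project. A single-machine Python sketch of the same three steps follows; the `Person` type is a hypothetical stand-in for `T`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:  # hypothetical element type standing in for T
    name: str
    age: int


def long_names_by_age(collection: List[Person]) -> List[str]:
    """where len(name) > 10, orderby age, select name."""
    matching = [c for c in collection if len(c.name) > 10]  # the 'where' clause
    return [c.name for c in sorted(matching, key=lambda c: c.age)]


people = [
    Person("Bartholomew Jones", 41),
    Person("Ann", 30),
    Person("Christopher Lee", 25),
]
names = long_names_by_age(people)
```

On one machine a breakpoint on the filter line stops one process. On a cluster the same clause runs in many vertices at once, which is what makes the breakpoint in the figure hard to support.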
Remote debugging
[Figure labels: Localhost (Visual Studio, DryadLINQ, Application, Job Submission); Firewall; Cluster/Cloud (Cluster Scheduler, Job Manager (JM), Vertex 1, Vertex 2, Exec, Storage); Breakpoint; Breakpoint hit…; attach]
L: Logs, IO: Input/Output, R: Resources
Notifications: Our Implementation
[Figure labels: Localhost (Visual Studio, Daphne, DryadLINQ, Application, Job Submission); Firewall; Cluster/Cloud (Cluster Scheduler, Job Manager (JM), Vertex 1, Vertex 2, Exec, Storage); attach; Remote debugging]
L: Logs, IO: Input/Output, R: Resources
Open Problems
• What happens when 100,000 processes hit a breakpoint?
• How to evaluate expressions in the debugger when state is distributed?
• How to do large-scale performance debugging?
• How to preserve map between distributed state and original program state?
• How much can the illusion of a single system be preserved?
Conclusions
• Single-machine abstractions break down in the presence of (performance/correctness) bugs
• Job Object Model insulates tools from messy details
• Design the cluster runtime to make it easy to build a JOM
• Rich interactive tools easily built on top of JOM
• Much more work needed for debugging at scale