Dynamic Analysis: Looking Back and the Road Ahead. Trishul Chilimbi, Runtime Analysis & Design (RAD), Research in Software Engineering (RiSE), Microsoft Research


Page 1: Dynamic Analysis: Looking Back and the Road Ahead

Dynamic Analysis: Looking Back and the Road Ahead

Trishul Chilimbi
Runtime Analysis & Design (RAD)
Research in Software Engineering (RiSE)
Microsoft Research

Page 2: Dynamic Analysis: Looking Back and the Road Ahead

Dynamic Analysis Breakdown

  • Measurement
  • Representation
  • Analysis

WODA '09

Page 3: Dynamic Analysis: Looking Back and the Road Ahead

Measurement Methodology

Pipeline: Program -> Compiler -> Executable -> Machine

  • Source instrumentation
  • Binary instrumentation: ATOM (PLDI’94), EEL (PLDI’95)
  • Instruction emulation: Dynamo (PLDI’00)
  • Hardware instrumentation: DCPI (SOSP’97)

Page 4: Dynamic Analysis: Looking Back and the Road Ahead

Measurement Efficiency

  • Hardware performance counters
      • DCPI (SOSP’97)
  • Sampling
      • Bursty Tracing (PLDI’01, FDDO’01)
  • Program analysis
      • Path Profiling (MICRO’96)

Page 5: Dynamic Analysis: Looking Back and the Road Ahead

Representation

  • Raw
      • Trace
  • Structured
      • Path Profile (MICRO’96)
      • Whole Program Paths (PLDI’99)
      • Whole Program Data Accesses (PLDI’01)
  • Custom
      • Eraser’s Lock Set (SOSP’97)

Page 6: Dynamic Analysis: Looking Back and the Road Ahead

Analysis

  • Performance
      • Profiling and profile-driven optimization
  • Correctness
      • Bug detection, heap and concurrency checkers
  • Security
      • Security monitors, taint analysis

Page 7: Dynamic Analysis: Looking Back and the Road Ahead

Dynamic Analysis: The Road Ahead

  • Industrial-strength dynamic analysis
      • Scaling dynamic analysis to process and analyze large quantities of data
  • System level
      • Data centers, multi-core

Page 8: Dynamic Analysis: Looking Back and the Road Ahead

Scaling Dynamic Analyses

  • System level analysis
      • Instrumentation
          • Event Tracing for Windows (ETW)
  • Data volume
      • Statistical analysis
      • Visualization

Page 9: Dynamic Analysis: Looking Back and the Road Ahead

ETW Tracing Infrastructure

  • General purpose real-time event logging facility
  • Core component of Windows operating systems
      • Starting with Windows 2000, continually extended and improved
  • High speed
      • 1200 to 2000 cycles per logged event
  • Low overhead
      • Less than 5% of the total CPU cycles for 20,000 events/sec
  • Works for both user mode applications and drivers
  • Immune to app crashes and hangs
  • Writes to a file or to a real time listener
  • Dynamically enabled or disabled
      • No re-compile, no reboots, no app restarts, …
  • Designed for app tracing in production mode
  • Scalable
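As a rough consistency check on the figures above, the cost per logged event and the event rate can be turned into a share of CPU cycles. This is only a back-of-the-envelope sketch: the clock rates are assumptions (the slide does not name a machine), and `logging_cpu_fraction` is a hypothetical helper name.

```python
def logging_cpu_fraction(events_per_sec, cycles_per_event, clock_hz):
    """Fraction of one core's cycles spent logging events."""
    return events_per_sec * cycles_per_event / clock_hz

# Assumed clock rates; the slide only gives cycles per event and events/sec.
for clock_hz in (1e9, 2e9, 3e9):
    low = logging_cpu_fraction(20_000, 1_200, clock_hz)
    high = logging_cpu_fraction(20_000, 2_000, clock_hz)
    print(f"{clock_hz / 1e9:.0f} GHz: {low:.1%} to {high:.1%} of CPU cycles")
```

Under these assumptions, even a 1 GHz core spends at most about 4% of its cycles on logging at 20,000 events/sec, which is in line with the "less than 5%" claim.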

Page 10: Dynamic Analysis: Looking Back and the Road Ahead

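ETW Architecture

[Diagram] Components and flows:

  • Providers (Provider A, Provider B, Provider C) send events to event tracing sessions
  • Event tracing sessions (Session 1, Session 2, up to Session 64) hold events in buffers
  • Controller performs session control and enables/disables providers
  • Sessions write logged events to trace files or deliver them in real time
  • Consumers read logged events from trace files or receive real time delivery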

Page 11: Dynamic Analysis: Looking Back and the Road Ahead

ETW Performance Diagnostics

  • OS
      • Process/thread activity
      • Module load
      • Disk and file IO
      • TCPIP/UDPIP
      • Pagefault
      • Registry
      • Context switch
      • Heap and critical section
  • Server applications
      • Active Directory
      • IIS6
      • File Server
      • Print Server
      • Exchange Server

Page 12: Dynamic Analysis: Looking Back and the Road Ahead

ETW Statistics

  • Kernel logger outputs:
      • ~100K events in minutes
      • ~200KB binary file
      • ~100MB text dump
      • Multiple traces/day
  • Expert analysis
      • Processing a trace file: a few minutes
      • Manual diagnosis time: sometimes minutes, sometimes hours
  • Manual diagnosis cannot keep up with the rate of trace collection

Page 13: Dynamic Analysis: Looking Back and the Road Ahead

Scaling Dynamic Analyses

  • System level analysis
      • Instrumentation
  • Data volume
      • Statistical analysis
      • Visualization

Page 14: Dynamic Analysis: Looking Back and the Road Ahead

HangViz

  • Lock/resource contention lies at the root of many performance problems
  • Kernel manages most resources, which are not visible to the application developer
  • Our solution
      1. Start from an observed hang
      2. Pull out all relevant lock-related waits, represented as a directed acyclic graph (DAG)
      3. Highlight critical path
      4. Provide visualization tool for further exploration
      5. Iterative feedback cycle

Joint work with Alice Zheng, Steve Hsaio, David Andrzewejski

Page 15: Dynamic Analysis: Looking Back and the Road Ahead

HangViz Outline

  • Constructing the Ready DAG
  • Finding the critical path
  • Visualization

Page 16: Dynamic Analysis: Looking Back and the Road Ahead

Constructing a Ready DAG

  • Relevant ETW events
      • CSwitch: context switches
      • ReadyThread: thread releasing resource
      • Stack: lock functions
  • Currently ETW does not track lock object ID
      • Stack functions are used to differentiate between different locks, but the signature is not perfect
  • Sequence of wait and run intervals and ReadyThread signals can be represented as a directed acyclic graph (DAG)
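The construction described above might be sketched as follows. This is a minimal illustration, not the HangViz implementation: the `Wait` and `Ready` records are hypothetical simplifications of what CSwitch and ReadyThread events provide, the timestamps are made up, and matching a signal to a wait by exact end time is an assumption.

```python
from collections import namedtuple

# Hypothetical, simplified records; real ETW events carry more fields.
Wait = namedtuple("Wait", "thread start end")
Ready = namedtuple("Ready", "time readier readied")

def build_ready_edges(waits, readies):
    """Link each wait to the thread whose ReadyThread signal ended it."""
    edges = []
    for wait in waits:
        for ready in readies:
            # Assumption: the signal timestamp equals the end of the wait.
            if ready.readied == wait.thread and ready.time == wait.end:
                edges.append((wait, ready.readier))
    return edges

# Illustrative chain: eTrust readies SearchIndexer, which readies Outlook UI.
waits = [Wait("Outlook UI", 0, 8), Wait("SearchIndexer", 1, 5)]
readies = [
    Ready(5, "eTrust", "SearchIndexer"),
    Ready(8, "SearchIndexer", "Outlook UI"),
]
edges = build_ready_edges(waits, readies)
for wait, readier in edges:
    print(f"{wait.thread} waited {wait.end - wait.start} units, readied by {readier}")
```

Because a ReadyThread signal always precedes the run interval it enables, edges built this way point backwards in time, which is what keeps the graph acyclic.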

Page 17: Dynamic Analysis: Looking Back and the Road Ahead

Example: Simple Ready Chain

[Diagram] eTrust (running) -> ReadyThread, file lock -> SearchIndexer (waiting, then running) -> ReadyThread, file lock -> Outlook UI (waiting, then running)

Page 18: Dynamic Analysis: Looking Back and the Road Ahead

Complications: Non-Immediate Waits

  • The immediate ready chain may not be the root cause of the problem

Page 19: Dynamic Analysis: Looking Back and the Road Ahead

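Example: Ready Tree

[Diagram] Ready tree for the Outlook UI wait:

  • Outlook UI (waiting, then running) is readied by SearchIndexer (SI, running) via ReadyThread on a file lock
  • SI (waiting, then running) is readied by eTrust (running) via ReadyThread on a file lock
  • SearchIndexer (waiting) is readied by a Systems thread (running) via ReadyThread on a registry lock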

Page 20: Dynamic Analysis: Looking Back and the Road Ahead

Solution: Follow Overlapping Waits

  • Look at all ready chains during the long wait
  • Follow any wait of the parent thread (e.g., SearchIndexer) that overlaps with the child wait
  • Repeat on parent thread
  • Optional search depth to limit branching factor
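One way to read the steps above is as a depth-limited recursion over overlapping waits. The sketch below assumes a hypothetical `Wait` record that already names the thread that ended the wait (`readied_by`); the field names, the interval values, and the default depth are illustrative and not taken from HangViz.

```python
from collections import namedtuple

# Hypothetical record: readied_by names the thread that ended this wait.
Wait = namedtuple("Wait", "thread start end readied_by")

def overlaps(a, b):
    """True when two wait intervals share some time."""
    return a.start < b.end and b.start < a.end

def ready_tree(wait, waits_by_thread, max_depth=3):
    """Return (wait, children), following overlapping waits of the parent thread."""
    if max_depth == 0 or wait.readied_by is None:
        return (wait, [])
    parent_waits = waits_by_thread.get(wait.readied_by, [])
    children = [
        ready_tree(parent_wait, waits_by_thread, max_depth - 1)
        for parent_wait in parent_waits
        if overlaps(parent_wait, wait)
    ]
    return (wait, children)

hang = Wait("Outlook UI", 0, 10, "SearchIndexer")
waits_by_thread = {
    "SearchIndexer": [
        Wait("SearchIndexer", 1, 4, "eTrust"),
        Wait("SearchIndexer", 5, 9, "System"),
        Wait("SearchIndexer", 12, 15, "eTrust"),  # after the hang, not followed
    ],
}
tree = ready_tree(hang, waits_by_thread)
```

The `max_depth` parameter plays the role of the optional search depth: it bounds how far the recursion climbs even when every parent thread has many overlapping waits.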

Page 21: Dynamic Analysis: Looking Back and the Road Ahead

More Complications: False Runs

  • The thread runs, but not because the resource has been released
      • Timer wake-up: thread wakes up, lock is still not available, thread goes back to sleep
      • APC (asynchronous procedure call): thread is woken up to execute code for someone else
  • Bottom line: timer wake-ups and APCs do NOT terminate the wait, and should be counted towards total wait time
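The bottom line above amounts to merging wait intervals that are separated only by false runs. A minimal sketch, assuming each wait of a single thread has already been labeled with the reason it ended (the `wake_reason` values "ready", "timer", and "apc" are hypothetical labels, not ETW field names):

```python
from collections import namedtuple

# wake_reason is a hypothetical label: "ready", "timer", or "apc".
Wait = namedtuple("Wait", "start end wake_reason")

FALSE_WAKES = {"timer", "apc"}

def merge_false_runs(waits):
    """Merge consecutive waits of one thread that are split by false runs."""
    merged = []
    pending_start = None
    last = None
    for wait in sorted(waits, key=lambda w: w.start):
        start = wait.start if pending_start is None else pending_start
        if wait.wake_reason in FALSE_WAKES:
            pending_start = start  # the logical wait continues after the false run
        else:
            merged.append(Wait(start, wait.end, wait.wake_reason))
            pending_start = None
        last = wait
    if pending_start is not None:
        # Trace ended while the thread was still logically waiting.
        merged.append(Wait(pending_start, last.end, last.wake_reason))
    return merged

waits = [
    Wait(0, 3, "timer"),   # woke on a timer, lock still unavailable
    Wait(4, 6, "apc"),     # woke to run an APC for someone else
    Wait(7, 10, "ready"),  # resource actually released
    Wait(20, 22, "ready"),
]
merged = merge_false_runs(waits)
```

In this example the first three intervals collapse into one logical wait from 0 to 10, so the short false runs in between count towards the total wait time.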

Page 22: Dynamic Analysis: Looking Back and the Road Ahead

Example: APCs

[Diagram] Ready tree for the Outlook UI wait in which SearchIndexer (SI) alternates several times between waiting and running. Labeled elements: Outlook UI (waiting, then running), ReadyThread on the file lock, ReadyThread APC from a Systems thread, a ReadyThread from IExplorer (running).

Page 23: Dynamic Analysis: Looking Back and the Road Ahead

Finding Individual Critical Waits

  • Algorithm for finding individual critical waits
      • Bucket wait times by their lock set (set of lock-related functions on the stack)
      • For each lock set, build a probabilistic model of wait time
          • Gaussian, exponential, Gamma, or mixture of Gaussians
      • Select the best model for each lock set
  • A long wait is critical if it has extremely low probability under the model
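The algorithm above can be illustrated with a reduced sketch that uses only the Python standard library. It fits just two of the listed families (Gaussian and exponential), picks the one with the higher log-likelihood, and flags waits whose upper-tail probability falls below a threshold. The threshold value, the use of tail probability as the "low probability" test, and all function names are assumptions for illustration; wait times are assumed to be positive.

```python
import math
from collections import defaultdict
from statistics import NormalDist, fmean, pstdev

def fit_models(samples):
    """Fit a Gaussian and an exponential; return name -> (pdf, upper_tail)."""
    mean = fmean(samples)
    normal = NormalDist(mean, pstdev(samples) or 1e-9)
    rate = 1.0 / mean  # assumes positive wait times
    return {
        "gaussian": (normal.pdf, lambda x: 1.0 - normal.cdf(x)),
        "exponential": (
            lambda x: rate * math.exp(-rate * x),
            lambda x: math.exp(-rate * x),
        ),
    }

def best_model(samples):
    """Select the fitted model with the highest log-likelihood."""
    models = fit_models(samples)

    def log_likelihood(name):
        pdf = models[name][0]
        return sum(math.log(max(pdf(x), 1e-300)) for x in samples)

    name = max(models, key=log_likelihood)
    return name, models[name][1]

def critical_waits(waits, threshold=1e-3):
    """waits: iterable of (lock_set, wait_time). Return the improbable waits."""
    buckets = defaultdict(list)
    for lock_set, wait_time in waits:
        buckets[frozenset(lock_set)].append(wait_time)
    critical = []
    for lock_set, times in buckets.items():
        _, upper_tail = best_model(times)
        critical += [(lock_set, t) for t in times if upper_tail(t) < threshold]
    return critical

# Made-up wait times in microseconds, with one extreme outlier.
lock = {"EnterCriticalRegion"}
waits = [(lock, t) for t in [8, 9, 10, 11, 12] * 40] + [(lock, 1000)]
flagged = critical_waits(waits)
```

A fuller version would add Gamma and mixture-of-Gaussians fits and a model selection criterion that penalizes extra parameters; those are omitted here to keep the sketch dependency-free.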

Page 24: Dynamic Analysis: Looking Back and the Road Ahead

Probabilistic Model of Wait Times

[Figure] EnterCriticalRegion wait time histogram with a mixture of Gaussians model. Marked wait times: 7 us, 10829 us, 41 s. Annotation: "Low probability!"

Page 25: Dynamic Analysis: Looking Back and the Road Ahead

Finding the Critical Path

  • Ready DAGs can be complex
  • But there should be only one critical path
      • One resource holding up the entire chain (for example, network or I/O)
      • Multiple threads on the chain are experiencing long waits
  • Critical path probably has longest average wait time
  • Other possible metrics
      • Maximum wait time: might be shared among multiple paths
      • Longest chain: could have many short waits
      • Longest chain with longest average wait time
  • Possible expansion to cross-trace analysis
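The "longest average wait time" metric above can be sketched by enumerating root-to-leaf paths of the ready DAG and keeping the one with the highest mean wait. The graph encoding (a dict from node to children), the node names, and the wait times below are all made up for illustration.

```python
def all_paths(dag, node):
    """Enumerate every path from node down to a leaf."""
    children = dag.get(node, [])
    if not children:
        return [[node]]
    return [[node] + rest for child in children for rest in all_paths(dag, child)]

def critical_path(dag, root, wait_time):
    """Pick the root-to-leaf path with the longest average wait time."""
    def average(path):
        return sum(wait_time[n] for n in path) / len(path)
    return max(all_paths(dag, root), key=average)

# Hypothetical ready DAG: each node is a wait, each edge points to the wait
# of the thread that eventually readied it.
dag = {
    "Outlook UI": ["SI wait 1", "SI wait 2"],
    "SI wait 1": ["eTrust"],
    "SI wait 2": ["System"],
}
wait_time = {"Outlook UI": 40, "SI wait 1": 30, "eTrust": 25, "SI wait 2": 2, "System": 1}
path = critical_path(dag, "Outlook UI", wait_time)
```

Enumerating all paths is exponential in the worst case, so a real tool would likely prune with the search depth mentioned earlier or use dynamic programming over the DAG. Swapping the `average` function for `max` or `len` gives the other metrics listed above.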

Page 26: Dynamic Analysis: Looking Back and the Road Ahead

Screen Shot I

  • Generated ReadyTree (anomalous waits highlighted in red)

Page 27: Dynamic Analysis: Looking Back and the Road Ahead

Screen Shot (close-up)


Page 28: Dynamic Analysis: Looking Back and the Road Ahead

Screen Shot (Annotation)

  • Changing anomaly annotation

Page 29: Dynamic Analysis: Looking Back and the Road Ahead

AllocRay (with George Robertson, VIBE)

  • A picture is worth a “million words” [of trace data]
  • Heap allocation “movies” expose problems
  • Easy to use and supports deep exploration
  • Observe instantaneous program behavior
      • Investigate memory footprint, working set (WS), fragmentation, leaks

[Figure] Example visualizations labeled BAD and GOOD

Page 30: Dynamic Analysis: Looking Back and the Road Ahead

AllocRay: Heap Allocation “Movies”

  • Colors and filters help focus on different behaviors
      • Memory footprint
      • Fragmentation
  • Pixels are tied to events and call stacks to facilitate investigation

Page 31: Dynamic Analysis: Looking Back and the Road Ahead

Scaling Dynamic Analyses

  • Data centers
      • 10,000+ machines running web services such as search, mail, online shopping
      • Large opportunity for dynamic analyses to reduce data center operations cost
      • 10,000 x 100 metrics/minute -> 10+GB/day
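The data volume estimate above can be reproduced with a short calculation. Two things are assumed here and not stated on the slide: that the figure means 100 metric samples per machine per minute, and that each sample occupies 8 bytes (for example, one 64-bit value with no timestamp or metadata).

```python
def daily_volume_gb(machines, metrics_per_minute, bytes_per_sample):
    """Raw metric volume per day in gigabytes (10^9 bytes)."""
    samples_per_day = machines * metrics_per_minute * 60 * 24
    return samples_per_day * bytes_per_sample / 1e9

# 8 bytes per sample is an assumption, not a figure from the talk.
volume = daily_volume_gb(10_000, 100, 8)
print(f"{volume:.2f} GB/day")
```

With these assumptions the fleet produces 1.44 billion samples per day, or roughly 11.5 GB, which matches the "10+GB/day" order of magnitude. Any per-sample metadata would push the volume higher.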

Page 32: Dynamic Analysis: Looking Back and the Road Ahead

Statistical Debugging (Liblit et al. PLDI’03)

  • Algorithm sketch
      1) Collect code profiles for a large number of successful and failing runs of the program
      2) Find code fragments that strongly correlate with failure
  • Cause & correlation
      • "Correlation implies causation" is a logical fallacy!
      • Example: error handling code correlates with failure but does not cause it
  • Statistical debugging: build a statistical model of program outcome that discriminates cause from correlation

Page 33: Dynamic Analysis: Looking Back and the Road Ahead

Holmes (Chilimbi et al. ICSE’09)

[Diagram] Path profiles from successful and failing runs -> Statistical analysis (statistical model) -> Bug predictors (likely root cause), e.g. "…if (y=0) { x = x + 1}…"

Page 34: Dynamic Analysis: Looking Back and the Road Ahead

Statistical analysis

  • Differentiate cause from correlation
  • Key idea: find path fragments that strongly correlate with failure, while the context in which the fragment occurs does not

[Diagram] Context of a path: graph of foo(x, y) with nodes a, b, c, d, e, f

Page 35: Dynamic Analysis: Looking Back and the Road Ahead

Statistical model

  • Inputs
      • A set of path profiles, one for each run
      • Each run’s outcome (success/failure)
  • Compute four statistics for each path
      • So(p), Fo(p): number of successful/failing runs in which the context of path p was executed
      • Se(p), Fe(p): number of successful/failing runs in which path p was executed

Page 36: Dynamic Analysis: Looking Back and the Road Ahead

Statistical model

  • Context: how much is the context of a path correlated with failure?
  • Sensitivity: in how many failures does a path occur?
  • Increase: how much more is the path correlated with failure?
  • Importance: overall measure that combines sensitivity and increase (specificity)
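The formulas for these measures did not survive in this transcript. The sketch below therefore assumes the definitions commonly used in cooperative statistical bug isolation (Liblit et al.): Context as the failure rate among runs that reached the path's context, Increase as the failure rate among runs that executed the path minus Context, sensitivity as a log ratio of failing runs covered, and Importance as their harmonic mean. Treat these as an assumption about what the slide showed, not as a quotation of it; the function and parameter names are hypothetical.

```python
import math

def holmes_scores(So, Fo, Se, Fe, num_failing_runs):
    """Return (context, increase, importance) for one path.

    So/Fo: successful/failing runs in which the path's context executed.
    Se/Fe: successful/failing runs in which the path itself executed.
    Formulas are assumed, following Liblit-style statistical bug isolation.
    """
    context = Fo / (So + Fo)
    failure = Fe / (Se + Fe)
    increase = failure - context
    if Fe < 1 or num_failing_runs <= 1:
        sensitivity = 0.0
    else:
        sensitivity = math.log(Fe) / math.log(num_failing_runs)
    if increase <= 0 or sensitivity <= 0:
        importance = 0.0
    else:
        # Harmonic mean rewards paths that are both specific and sensitive.
        importance = 2 / (1 / increase + 1 / sensitivity)
    return context, increase, importance

# A path that fails far more often than its context does.
strong = holmes_scores(So=90, Fo=10, Se=1, Fe=9, num_failing_runs=10)

# Error-handling style path: always fails, but so does its context.
handler = holmes_scores(So=0, Fo=10, Se=0, Fe=10, num_failing_runs=10)
```

The second example shows how the context term separates cause from correlation under these assumed definitions: the path is perfectly correlated with failure, yet its Increase is zero because merely reaching its context already predicts failure, so its Importance is zero.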

Page 37: Dynamic Analysis: Looking Back and the Road Ahead

Holmes in action: EDG C++ compiler

[Screenshot] Columns: Importance, Context, Increase

Page 38: Dynamic Analysis: Looking Back and the Road Ahead

Branches, predicates AND paths: How close do they get you?

  • Study of 45 bugs in 6 applications from the SIR benchmark suite
  • Path profiles take you down the right path!

Page 39: Dynamic Analysis: Looking Back and the Road Ahead

Bug-directed Adaptive Profiling

[Diagram] Components:

  • Production environment: produces profiles and bug reports
  • Holmes profiling tools: applied to myapp.dll
  • Holmes backend: statistical analysis of profiles and bug reports, static analysis of myapp.cpp
  • Bug predictors: point to the root cause, e.g. "while (is_eof_token(ch) {}" and "if (id == 1) {}"

Page 40: Dynamic Analysis: Looking Back and the Road Ahead

Adaptive Profiling

  • Bootstrapping
      • Stack traces
      • Branch profiles
  • Iterative profiling
      • Additional function selection using coupling
      • Strengthening weak predictors with richer profiles

Page 41: Dynamic Analysis: Looking Back and the Road Ahead

HOLMES: Non-Adaptive vs. Adaptive

Benchmark      Holmes (Non-Adaptive)  Holmes1 (Adaptive)  Holmes2 (Adaptive)  Holmes3 (Adaptive)
print_tokens   0.68 / 100%            0.42 / 100%
replace        0.57 / 99%             0.27 / 96%          0.53 / 98%
gcc            0.68 / 67%             0.58 / 67%          0.68 / 67%
translate.v1   0.53 / 58%             0.24 / 67%          0.47 / 25%          0.53 / 58%
translate.v2   0.83 / 93%             0.47 / 27%          0.89 / 80%
edg            0.65 / 98%             0.65 / 97%          0.64 / 96%          0.66 / 96%

Page 42: Dynamic Analysis: Looking Back and the Road Ahead

HOLMES: Adaptive Overheads

Time Overhead (%)

Benchmark      Branch Profiles  Holmes (Non-Adaptive)  Holmes1 (Adaptive)  Holmes2 (Adaptive)  Holmes3 (Adaptive)
gcc            75               181.3                  2.6                 9.6
translate.v1   3.5              4.7                    0.3                 2.1                 3.5
translate.v2   8.8              2.8                    0.8                 0.0

Space Overhead (%)

Benchmark      Branch Profiles  Holmes (Non-Adaptive)  Holmes1 (Adaptive)  Holmes2 (Adaptive)  Holmes3 (Adaptive)
gcc            84.1             170.2                  7.3                 46.5
translate.v1   25.2             41.1                   4.1                 3.0                 21.6
translate.v2   25.1             43.2                   3.1                 1.8

Page 43: Dynamic Analysis: Looking Back and the Road Ahead

Dynamic Analysis & Data Centers

  • Data center environment is more controlled
  • System level vs. application level metrics
  • What is the analogue of paths that provides context?
  • Need predictive capability to take action
      • Reboot, reimage, notify operator

Page 44: Dynamic Analysis: Looking Back and the Road Ahead

Conclusion

  • Dynamic analyses have been successfully used to improve program performance, reliability, and security
      • Efficient measurement
  • Need to scale dynamic analysis to industrial strength to address challenges posed by system-level analysis, multi-core, and data centers
      • Efficient data management and analysis
      • Data management: database/Map-Reduce style processing
      • Statistical analysis techniques