Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Lecture 10. Dynamic Analysis
Wei LeThank Xiangyu Zhang, Michael Ernst, Tom Ball
for Some of the Slides
Iowa State University
2014.11
Outline
I What is dynamic analysis?
I Instrumentation
I AnalysisI Representing dynamic informationI Dynamic slicing
I Applications: specification generation and Diakon
What is dynamic analysis
Learn from executions:
I Run programsI where are the test inputs?I what are the coverage?I what parts of the code should be run?I how to create runtime environment?
I Collect dataI online or offline (log analysis)?I what is the overhead (memory and performance)I how to capture and replay the data: compress, statistical approach
I Analyze dataI dynamic slicingI model-basedI statistical and machine learning approaches (an interesting discussion
on comparing the two: https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning)
What is the meaning of Run?
I Abstract interpretation and static analyses run a program over anabstract domain:OUT=F(IN,s)
I Dynamic analysis: abstraction used in parallel with, not in place of,concrete values:OUT=F(IN, si, v)
How dynamic analysis is used?
I Profiling-guided code optimization: which paths are frequentlyexecuted?
I Performance analysis
I Mining program invariants, protocols
I Fault localization, debugging (replay concurrency bugs)
I Bug finding
I ...
Running the programs
I The test suite determines the expense (in time and space)
I The test suite determines the accuracy (what executions are neverseen)
I Coverage or frequency: statements, branches, paths, procedure calls,types, method dispatch
I Values computed: parameters, array indices
I Run time, memory usage
InstrumentationA technique that inserts extra code into a program to collect runtimeinformation.
InstrumentationA technique that inserts extra code into a program to collect runtimeinformation.
InstrumentationA technique that inserts extra code into a program to collect runtimeinformation.
Source and Binary Instrumentation
I Source instrumentation: instrument source programs
I Binary instrumentation: instrument executables directly:I No need to recompile or relinkI Discover code at runtimeI Handle dynamically-generated codeI Attach to running processes
Binary Instrumentation Tool Is Dominant
I Libraries are a big pain for source code level instrumentation e.g.,Proprietary libraries: communication (MPI, PVM), linear algebra(NGA), database query (SQL libraries).
I Easily handle multi-lingual programs: source code levelinstrumentation is heavily language dependent.
I More complicated semantics: Turning off compiler optimizations canmaintain a almost perfect mapping from instructions to source codelines
I Worms and viruses are rarely provided with source code
Static Instrumentation
Dynamic Instrumentation
Instrumentation Tools
I Source instrumentation tools: AIMS, Paradyn, Pablo ... [1]
I Binary instrumentation tools:I Static (insert in the code and then run): ATOM, EEL, Etch, MorphI Dynamic (run while insert the instrumentation): Dyninst, Vulcan,
DTrace, Valgrind, Strata, DynamoRIO, PIN
Valgrind
See Valgrind slides
PIN
See PIN slides
Instrumentation: a Summary
I Source-level:I Pattern-matching over parse tree or AST and rewritingI Full access to source informationI A* [Ladd, Ramming], Astlog [Crew],
I Binary-level:I ATOM [Srivastava] , EEL [Larus], Diablo, BlutoI Analyze programs from multiple languagesI Typicaly done at some IR level
I Dynamic instrumentation (run-time): Valgrind, PIN
I Java instrumentation tools:I ASM (http://asm.objectweb.org/), Small and fast, Supports Java
1.5I BCEL (http://jakarta.apache.org/bcel/), Works well, No longer
under active developmentI SERP (http://serp.sourceforge.net/), Not designed for speed,
Somewhat memory intensiveI Soot (http://www.sable.mcgill.ca/soot/), Supports source, Designed
for optimizations
Design Your Instrumentation
I How much to generate?I EverythingI Just the necessary factsI Less than necessary
I On-line vs. off-line analysisI What/When to instrument?I Source code, IR, assembly, machine codeI Preprocessor, compile-time, link-time, executable, run-time
Control Flow Trace
Dynamic Dependence Graph
Whole Execution Trace
Dynamic Slicing
See Xiangyu’s slides
Specification Generation-Diakon
I Daikon infers invariants from a program trace
I Looks for invariants between each combination of variables
I Polynomial in the number of variables
I Easy to implement offline, first pass finds equal variables
I More complex to implement incrementally.
Specification Generation-Diakon
Specification Generation
Specification Generation
Data Analysis Techniques
Data mining (mining database) fields, related to information retrieval,text mining
I Associate rules mining
I Sequential patterns mining
I Beyond:Structure patterns (tree, graphs) mining
I Definition: what is it?
I Which algorithm? there exist off-the-shelf packages
Associate Rules Mining
I Proposed by Agrawal et al in 1993.
I Initially used for Market Basket Analysis to find how items purchasedby customers are related: Bread → Milk [sup = 5%, conf = 100%]
I Applications: Object, propertiesI patient / symptomsI movies / ratingsI web pages / keywordsI basket / productsI what are examples in computing?
The Model: Data
The Model: Rules
Algorithms and Extensions
I Apriori
I Eclat
I FP-growth
Relevant concepts:
I Frequent itemset mining
I Maximal itemset mining [Bayardo, 1998]
I Closed itemset mining [Pasquier, 1999]
Patterns in Sequences
I Substrings
I Regular expressions
I Partial orders
I Structured patterns, directed acyclic graphs (mining for frequentsubgraphs)
I Episodes
I Constraint-based sequential pattern mining (e.g., the item satisfiescertain values)
Sequential Pattern Mining [2]
Sequential pattern mining: given a set of sequences, find the completeset of frequent subsequences
Sequential Mining Algorithms
Software
C++ Implementations of Apriori, Eclat, FP-growth and several otheralgorithms are available onhttp://www.adrem.ua.ac.be/ goethals/software/ and onhttp://fimi.cs.helsinki.fi/
Designing Your Dynamic Analysis
Dynamic analysis is, at its heart, an experimental effort:
I Have insight
I Build tool
I Evaluate efficiency and effectiveness
I Rethink
Jonathan Chen.extensible source-level instrumentation tool for performancemeasurement.Technical report, Rensselaer Polytechnic Institute, 2007.
Carl H. Mooney and John F. Roddick.Sequential pattern mining – approaches and algorithms.ACM Comput. Surv., 45(2):19:1–19:39, March 2013.