Lecture 10. Dynamic Analysis
Wei Le
Thanks to Xiangyu Zhang, Michael Ernst, and Tom Ball for some of the slides
Iowa State University
2014.11

Lecture 10. Dynamic Analysis - Iowa State University, web.cs.iastate.edu/~weile/cs641/10.DynamicAnalysis.pdf

Outline

• What is dynamic analysis?
• Instrumentation
• Analysis
    • Representing dynamic information
    • Dynamic slicing
• Applications: specification generation and Daikon

What is dynamic analysis?

Learn from executions:

• Run programs
    • Where are the test inputs?
    • What is the coverage?
    • What parts of the code should be run?
    • How to create the runtime environment?
• Collect data
    • Online or offline (log analysis)?
    • What is the overhead (memory and performance)?
    • How to capture and replay the data: compression, statistical approaches
• Analyze data
    • Dynamic slicing
    • Model-based
    • Statistical and machine learning approaches (an interesting discussion comparing the two: https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning)

What is the meaning of Run?

• Abstract interpretation and static analyses run a program over an abstract domain: OUT = F(IN, s)
• Dynamic analysis: abstraction is used in parallel with, not in place of, concrete values: OUT = F(IN, s_i, v)
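To make the contrast concrete, here is a minimal sketch. The sign domain and the names `sign_of`, `abs_mul`, `static_run`, and `dynamic_run` are illustrative assumptions, not part of the slides. The static run sees only abstract values, while the dynamic run carries an abstraction alongside each concrete value.

```python
# Sign domain: "-", "0", "+", and "?" (sign unknown). Illustrative only.

def sign_of(v):
    """Abstract a concrete integer to its sign."""
    return "0" if v == 0 else ("+" if v > 0 else "-")

def abs_mul(a, b):
    """Abstract transfer function for multiplication over signs."""
    if a == "0" or b == "0":
        return "0"
    if a == "?" or b == "?":
        return "?"
    return "+" if a == b else "-"

def static_run():
    """Analyze `y = x * x` with no input: x can only be "?"."""
    x = "?"
    return abs_mul(x, x)

def dynamic_run(x):
    """Execute `y = x * x` on a concrete x, tracking the sign in parallel."""
    x_shadow = sign_of(x)
    y = x * x
    y_shadow = abs_mul(x_shadow, x_shadow)
    return y, y_shadow

print(static_run())      # ?
print(dynamic_run(-3))   # (9, '+')
```

The dynamic result is precise for the input that was run, but says nothing about inputs that were never executed.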

How is dynamic analysis used?

• Profiling-guided code optimization: which paths are frequently executed?
• Performance analysis
• Mining program invariants and protocols
• Fault localization, debugging (replay concurrency bugs)
• Bug finding
• ...

Running the programs

• The test suite determines the expense (in time and space)
• The test suite determines the accuracy (which executions are never seen)
• Coverage or frequency: statements, branches, paths, procedure calls, types, method dispatch
• Values computed: parameters, array indices
• Run time, memory usage
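As a rough illustration of collecting statement frequency, the sketch below uses Python's standard `sys.settrace` hook. The helper `run_with_line_counts` and the sample function `classify` are hypothetical names introduced only for this example.

```python
import sys
from collections import Counter

def run_with_line_counts(func, *args):
    """Run func(*args) and count how often each of its lines executes."""
    counts = Counter()
    code = func.__code__

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the function under study.
        if frame.f_code is not code:
            return None
        if event == "line":
            # Key by line offset from the "def" line.
            counts[frame.f_lineno - code.co_firstlineno] += 1
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, counts

def classify(n):
    if n < 0:                  # offset 1
        return "negative"      # offset 2
    total = 0                  # offset 3
    for i in range(n):         # offset 4
        total += i             # offset 5
    return "small" if total < 10 else "large"

result, counts = run_with_line_counts(classify, 3)
print(result)      # small
print(counts[2])   # 0: this input never covers the "negative" branch
print(counts[5])   # 3: the loop body ran three times
```

The zero count at offset 2 is the accuracy point from the list above: a single test input says nothing about the branch it never reached.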

Instrumentation

A technique that inserts extra code into a program to collect runtime information.
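A minimal sketch of source-level instrumentation, assuming Python's standard `ast` module as the rewriting front end. The class `ProbeInserter`, the probe name `__probe__`, and the sample `gcd` program are hypothetical and chosen only for illustration. A probe call is inserted before each statement of the function body and of `while` bodies (other compound statements are left alone to keep the sketch short).

```python
import ast

SOURCE = """
def gcd(a, b):
    while b:
        a, b = b, a % b
    return a
"""

class ProbeInserter(ast.NodeTransformer):
    """Insert a call to __probe__(lineno) before each statement."""

    def _instrument(self, body):
        new_body = []
        for stmt in body:
            probe = ast.Expr(
                value=ast.Call(
                    func=ast.Name(id="__probe__", ctx=ast.Load()),
                    args=[ast.Constant(value=stmt.lineno)],
                    keywords=[],
                )
            )
            new_body.append(ast.copy_location(probe, stmt))
            new_body.append(self.visit(stmt))
        return new_body

    def visit_FunctionDef(self, node):
        node.body = self._instrument(node.body)
        return node

    def visit_While(self, node):
        node.body = self._instrument(node.body)
        return node

def instrument_and_run(source, func_name, *args):
    """Rewrite the source, run the named function, return (result, line trace)."""
    tree = ProbeInserter().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    executed = []
    env = {"__probe__": executed.append}   # the probe just records line numbers
    exec(compile(tree, "<instrumented>", "exec"), env)
    return env[func_name](*args), executed

result, executed = instrument_and_run(SOURCE, "gcd", 12, 18)
print(result)     # 6
print(executed)   # [3, 4, 4, 4, 5]
```

The original program is untouched; only the rewritten copy pays the probe overhead.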

Source and Binary Instrumentation

• Source instrumentation: instrument source programs
• Binary instrumentation: instrument executables directly
    • No need to recompile or relink
    • Discover code at runtime
    • Handle dynamically generated code
    • Attach to running processes

Binary Instrumentation Tools Are Dominant

• Libraries are a big pain for source-level instrumentation, e.g., proprietary libraries: communication (MPI, PVM), linear algebra (NGA), database query (SQL libraries)
• Easily handle multi-lingual programs: source-level instrumentation is heavily language dependent
• More complicated semantics: turning off compiler optimizations can maintain an almost perfect mapping from instructions to source code lines
• Worms and viruses are rarely provided with source code

Static Instrumentation

Dynamic Instrumentation

Instrumentation Tools

• Source instrumentation tools: AIMS, Paradyn, Pablo, ... [1]
• Binary instrumentation tools:
    • Static (insert the instrumentation, then run): ATOM, EEL, Etch, Morph
    • Dynamic (insert the instrumentation while the program runs): Dyninst, Vulcan, DTrace, Valgrind, Strata, DynamoRIO, PIN

Valgrind

See Valgrind slides

PIN

See PIN slides

Instrumentation: a Summary

• Source-level:
    • Pattern-matching over parse tree or AST and rewriting
    • Full access to source information
    • A* [Ladd, Ramming], Astlog [Crew]
• Binary-level:
    • ATOM [Srivastava], EEL [Larus], Diablo, Bluto
    • Analyze programs from multiple languages
    • Typically done at some IR level
• Dynamic instrumentation (run-time): Valgrind, PIN
• Java instrumentation tools:
    • ASM (http://asm.objectweb.org/): small and fast, supports Java 1.5
    • BCEL (http://jakarta.apache.org/bcel/): works well, no longer under active development
    • SERP (http://serp.sourceforge.net/): not designed for speed, somewhat memory intensive
    • Soot (http://www.sable.mcgill.ca/soot/): supports source, designed for optimizations

Design Your Instrumentation

• How much to generate?
    • Everything
    • Just the necessary facts
    • Less than necessary
• On-line vs. off-line analysis
• What/When to instrument?
    • Source code, IR, assembly, machine code
    • Preprocessor, compile-time, link-time, executable, run-time
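The on-line versus off-line choice can be sketched as follows. All names here (`running_totals`, `OnlinePeak`, `offline_peak`) are hypothetical, and the "analysis" is deliberately trivial: find the peak running total. The on-line consumer keeps only a summary, while the off-line variant logs every event and analyzes the log after the run.

```python
import io
import json

def running_totals(values, emit):
    """Program under analysis; emit() stands for the inserted probe."""
    total = 0
    for v in values:
        total += v
        emit({"value": v, "total": total})
    return total

class OnlinePeak:
    """On-line: consume each event as it happens, keep constant state."""
    def __init__(self):
        self.peak = None

    def __call__(self, event):
        t = event["total"]
        if self.peak is None or t > self.peak:
            self.peak = t

def offline_peak(log_text):
    """Off-line: read the complete log back after the run has finished."""
    totals = [json.loads(line)["total"] for line in log_text.splitlines()]
    return max(totals)

data = [3, -5, 4, 6, -2]

online = OnlinePeak()
running_totals(data, online)

log = io.StringIO()
running_totals(data, lambda ev: log.write(json.dumps(ev) + "\n"))

print(online.peak)                    # 8
print(offline_peak(log.getvalue()))   # 8
```

The off-line log grows with the length of the run but can be reanalyzed for new questions; the on-line version uses constant space but answers only the question it was built for.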

Control Flow Trace

Dynamic Dependence Graph

Whole Execution Trace

Dynamic Slicing

See Xiangyu’s slides
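The referenced slides are not reproduced here. As a rough, simplified sketch of the general idea (not the algorithm from those slides), a dynamic slice can be computed by recording dependences between executed statement instances and walking them backward from a criterion. The trace format and the function `dynamic_slice` are assumptions made for this example, and control dependence is approximated by the most recent execution of the controlling predicate.

```python
# Each entry is one executed statement instance:
# (statement id, variable defined or None, variables used, controlling predicate or None)
# Hypothetical trace of a small program run with x = 5; the else branch
# (statement 6: y = z) is never executed.
TRACE = [
    (1, "x", [], None),       # x = 5
    (2, "y", [], None),       # y = 0
    (3, "z", [], None),       # z = 10
    (4, None, ["x"], None),   # if x > 0:
    (5, "y", ["x"], 4),       #     y = x + 1
    (7, None, ["y"], None),   # print(y)
]

def dynamic_slice(trace, criterion):
    """Return the statement ids that the trace entry at index `criterion` depends on."""
    last_def = {}   # variable -> index of the entry that last defined it
    last_run = {}   # statement id -> index of its most recent execution
    deps = []       # deps[i] = indices that entry i directly depends on
    for i, (stmt, defined, used, ctrl) in enumerate(trace):
        edges = [last_def[v] for v in used if v in last_def]   # data dependences
        if ctrl in last_run:                                   # control dependence
            edges.append(last_run[ctrl])
        deps.append(edges)
        if defined is not None:
            last_def[defined] = i
        last_run[stmt] = i

    # Backward reachability from the slicing criterion.
    in_slice, worklist = set(), [criterion]
    while worklist:
        i = worklist.pop()
        if i not in in_slice:
            in_slice.add(i)
            worklist.extend(deps[i])
    return {trace[i][0] for i in in_slice}

print(sorted(dynamic_slice(TRACE, 5)))   # [1, 4, 5, 7]
```

Statements 2 and 3 executed but do not affect the printed value in this run, so they fall outside the slice.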

Specification Generation: Daikon

• Daikon infers invariants from a program trace
• Looks for invariants between each combination of variables
• Polynomial in the number of variables
• Easy to implement offline; a first pass finds equal variables
• More complex to implement incrementally
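The following is a toy sketch of trace-based invariant inference in the spirit of the list above. It is not Daikon's implementation or API: the function `infer_invariants`, the candidate templates, and the sample trace are all assumptions for illustration. Candidates are instantiated for every variable and every pair of variables (hence quadratic in the number of variables for pairwise templates), and any candidate falsified by a sample is dropped.

```python
from itertools import combinations

def infer_invariants(samples):
    """samples: list of dicts (variable -> int) observed at one program point.
    Returns the candidate invariants that no sample falsifies."""
    names = sorted(samples[0])
    candidates = {}
    for v in names:
        candidates[f"{v} >= 0"] = lambda s, v=v: s[v] >= 0
    for a, b in combinations(names, 2):
        candidates[f"{a} == {b}"] = lambda s, a=a, b=b: s[a] == s[b]
        candidates[f"{a} <= {b}"] = lambda s, a=a, b=b: s[a] <= s[b]
        candidates[f"{b} <= {a}"] = lambda s, a=a, b=b: s[b] <= s[a]
    return [name for name, holds in candidates.items()
            if all(holds(s) for s in samples)]

# Hypothetical values recorded at a loop head while summing an array of length 3.
trace = [
    {"i": 0, "n": 3, "total": 0},
    {"i": 1, "n": 3, "total": 2},
    {"i": 2, "n": 3, "total": 5},
    {"i": 3, "n": 3, "total": 9},
]
print(infer_invariants(trace))
# ['i >= 0', 'n >= 0', 'total >= 0', 'i <= n', 'i <= total']
```

Reported invariants are only as trustworthy as the executions behind them: `i <= total` holds on this trace but need not hold for other inputs.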

Specification Generation

Data Analysis Techniques

Data mining (database mining) field, related to information retrieval and text mining

• Association rule mining
• Sequential pattern mining
• Beyond: structured pattern (trees, graphs) mining

For each technique:

• Definition: what is it?
• Which algorithm? Off-the-shelf packages exist

Association Rule Mining

• Proposed by Agrawal et al. in 1993
• Initially used for market basket analysis to find how items purchased by customers are related: Bread → Milk [sup = 5%, conf = 100%]
• Applications (object / properties):
    • patient / symptoms
    • movies / ratings
    • web pages / keywords
    • basket / products
    • What are examples in computing?
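The `sup` and `conf` figures attached to a rule follow the standard definitions of support and confidence. A minimal sketch, with helper names and a tiny made-up basket list chosen for illustration:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Among transactions containing lhs, the fraction that also contain rhs.
    Assumes lhs occurs in at least one transaction."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk"},
    {"eggs"},
]

print(support(baskets, {"bread", "milk"}))        # 0.5
print(confidence(baskets, {"bread"}, {"milk"}))   # 1.0
```

Confidence is not symmetric: here every basket with bread also has milk, but only two of the three baskets with milk have bread, so Milk → Bread has confidence 2/3.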

The Model: Data

The Model: Rules

Algorithms and Extensions

• Apriori
• Eclat
• FP-growth

Relevant concepts:

• Frequent itemset mining
• Maximal itemset mining [Bayardo, 1998]
• Closed itemset mining [Pasquier, 1999]
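A compact, unoptimized sketch of the level-wise Apriori idea, plus helpers that pick out the maximal and closed itemsets from its result. The function names and sample transactions are illustrative assumptions; production implementations use far more efficient candidate generation and counting.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while level:
        counts = {c: sum(c <= t for t in transactions) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
        # Join: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        level = {c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))}
    return frequent

def maximal(frequent):
    """Frequent itemsets with no frequent proper superset."""
    return {s for s in frequent if not any(s < t for t in frequent)}

def closed(frequent):
    """Frequent itemsets with no proper superset of equal support."""
    return {s for s in frequent
            if not any(s < t and frequent[t] == frequent[s] for t in frequent)}

data = [{"a", "b"}, {"a", "b"}, {"a"}, {"a", "c"}, {"c"}]
freq = apriori(data, min_support=2)

print(len(freq))            # 4: {a}, {b}, {c}, {a, b}
print(len(closed(freq)))    # 3: {a}, {c}, {a, b}
print(len(maximal(freq)))   # 2: {c}, {a, b}
```

The example shows why the condensed representations matter: the closed and maximal sets are smaller than the full set of frequent itemsets, and the closed set still determines every frequent itemset's support.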

Patterns in Sequences

• Substrings
• Regular expressions
• Partial orders
• Structured patterns, directed acyclic graphs (mining for frequent subgraphs)
• Episodes
• Constraint-based sequential pattern mining (e.g., the item satisfies certain values)

Sequential Pattern Mining [2]

Sequential pattern mining: given a set of sequences, find the complete set of frequent subsequences
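A naive level-wise sketch of this definition for sequences of single items. It is not one of the algorithms surveyed in [2]; the function names and the call traces used as input are made up for illustration. A pattern counts as supported by a sequence if its items appear in order, not necessarily contiguously.

```python
def is_subsequence(pattern, sequence):
    """True if pattern's items occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)

def frequent_subsequences(sequences, min_support):
    """Return {pattern: count} for patterns found in at least min_support
    sequences. Assumes min_support >= 1 so that the search terminates."""
    items = sorted({x for s in sequences for x in s})
    frequent = {}
    level = [(i,) for i in items]
    while level:
        counts = {p: sum(is_subsequence(p, s) for s in sequences) for p in level}
        current = {p: n for p, n in counts.items() if n >= min_support}
        frequent.update(current)
        # Only frequent patterns are extended: an infrequent pattern
        # cannot have a frequent extension.
        level = [p + (i,) for p in current for i in items]
    return frequent

# Hypothetical call traces from three executions.
traces = [
    ["open", "read", "close"],
    ["open", "write", "close"],
    ["open", "read", "read", "close"],
]
patterns = frequent_subsequences(traces, min_support=3)
print(patterns)
# {('close',): 3, ('open',): 3, ('open', 'close'): 3}
```

On execution traces, a pattern such as `('open', 'close')` that holds in every run is a candidate protocol rule.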

Sequential Mining Algorithms

Software

C++ implementations of Apriori, Eclat, FP-growth, and several other algorithms are available at http://www.adrem.ua.ac.be/~goethals/software/ and at http://fimi.cs.helsinki.fi/

Designing Your Dynamic Analysis

Dynamic analysis is, at its heart, an experimental effort:

• Have insight
• Build tool
• Evaluate efficiency and effectiveness
• Rethink

[1] Jonathan Chen. Extensible source-level instrumentation tool for performance measurement. Technical report, Rensselaer Polytechnic Institute, 2007.

[2] Carl H. Mooney and John F. Roddick. Sequential pattern mining – approaches and algorithms. ACM Comput. Surv., 45(2):19:1–19:39, March 2013.