Chord: A Versatile Platform for Program Analysis Mayur Naik Intel Labs, Berkeley PLDI 2011 Tutorial


Chord: A Versatile Platform for Program Analysis

Mayur Naik

Intel Labs, Berkeley

PLDI 2011 Tutorial


What is Chord?

• Static and dynamic program analysis framework for Java

• Started in 2006 as a static checker of races and deadlocks

• Publicly available under New BSD License

• Key goals:
  – versatile: applies to various analyses, domains, platforms
  – extensible: users can build own analyses atop given ones
  – productive: facilitates rapid prototyping of analyses
  – robust: deterministic, handles partial programs, etc.


Key Features of Chord

• Many standard static and dynamic analyses

• Writing/solving analyses using Datalog/BDDs

• Analyses as “building blocks”

• Context-sensitive static analysis framework

• Dynamic analysis framework


Outline of Tutorial

• Part 1:
  • Getting Started With Chord
  • Program Representation

• Part 2:
  • Analysis Using Datalog/BDDs
  • Chaining Analyses Together

• Part 3:
  • Context-Sensitive Analysis
  • Dynamic Analysis


Downloading Chord

• Stable Binary Release
  – http://jchord.googlecode.com/files/chord-bin-2.0.tar.gz

• Stable Source Release
  1. http://jchord.googlecode.com/files/chord-src-2.0.tar.gz (mandatory)
     – Chord’s source code + JARs of libraries used by Chord
  2. http://jchord.googlecode.com/files/chord-libsrc-2.0.tar.gz (optional)
     – (adapted) Java source code of libraries used by Chord

• Latest Development Snapshot

    svn checkout http://jchord.googlecode.com/svn/trunk/ chord

  Or check out only relevant directories under trunk/:
  – main/ (released as 1 above)
  – libsrc/ (released as 2 above)
  – test/ (Chord’s regression test suite)
  – … (many more)


Compiling Chord

• Requirements:
  – JVM for Java 5 or higher
  – Apache Ant
  – C++ compiler (not needed by default)

• Optional: edit chord.properties
  – to enable C BuDDy library: set chord.use.buddy=true
  – to enable C++ JVMTI agent: set chord.use.jvmti=true

• Run in main directory:

    ant compile

Contents of the main/ directory:

    main/
        build.xml
        chord.properties
        agent/
        bdd/
        doc/
        examples/
        lib/
        src/
        web/
        chord.jar
        libbuddy.so | buddy.dll | libbuddy.dylib
        libchord_instr_agent.so


Running Chord

• Requirements: JVM for Java 5 or higher
  • no other dependencies (e.g., Eclipse)

• Run either command in any directory:
  • ant -f <...>/build.xml [-Dkey_i=val_i]* run
    • requires Apache Ant
    • not available in Binary Release
  • java -cp <…>/chord.jar [-Dkey_i=val_i]* chord.project.Boot

where <…> denotes the path of Chord’s main/ directory, and
-Dkey_i=val_i sets the value of system property key_i to val_i


Chord Properties

• All inputs to Chord are specified via System Properties
  • conventionally named chord.* (e.g., chord.work.dir)

• Three choices with decreasing precedence:
  1. On command line via -Dkey=val format
     • use to specify properties specific to the current Chord run
  2. Via user-specified file denoted by chord.props.file
     • use to specify properties specific to program being analyzed (e.g. its main class, classpath, etc.)
     • default value = "[chord.work.dir]/chord.properties"
  3. Via pre-defined file main/chord.properties
     • use to specify properties that must hold in every Chord run (e.g., maximum memory to be used by JVM)
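The three-level precedence can be pictured as a lookup that tries each source in turn. The following is a minimal sketch using only java.util.Properties; the class name PropertyPrecedenceSketch, the resolve method, and the sample values are assumptions made for illustration and are not Chord's own code.

```java
import java.util.Properties;

/** Illustrative sketch (not Chord's implementation) of a three-level property lookup. */
final class PropertyPrecedenceSketch {

    /**
     * Returns the value of key from the first source that defines it:
     * command line first, then the per-program file, then the global file.
     */
    static String resolve(String key, Properties commandLine,
                          Properties programFile, Properties globalFile) {
        for (Properties source : new Properties[] {commandLine, programFile, globalFile}) {
            String value = source.getProperty(key);
            if (value != null) {
                return value;
            }
        }
        return null; // not defined at any level
    }

    public static void main(String[] args) {
        Properties cmd = new Properties();     // stands in for -Dkey=val settings
        Properties program = new Properties(); // stands in for [chord.work.dir]/chord.properties
        Properties global = new Properties();  // stands in for main/chord.properties

        // Hypothetical values chosen only to show which level wins.
        global.setProperty("chord.scope.kind", "rta");
        program.setProperty("chord.scope.kind", "cha");
        program.setProperty("chord.main.class", "foo.Main");
        cmd.setProperty("chord.scope.kind", "dynamic");

        System.out.println(resolve("chord.scope.kind", cmd, program, global));
        System.out.println(resolve("chord.main.class", cmd, program, global));
    }
}
```

In this sketch a key set on the command line shadows the same key in either file, while a key set only in the per-program file is still found.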


Architecture of Chord

[Figure: architecture diagram. Inputs: program source, program bytecode, program inputs. Components: bytecode translator (joeq), bytecode instrumentor (javassist), bddbddb, BuDDy, saxon XSLT, and Java2HTML, all on a Classic or Modern Runtime. Program bytecode is translated to program quadcode. Static, Datalog, and dynamic analyses produce and consume program domains (D1, D2) and relations (R1, R2, R12); analysis results are emitted in XML and in HTML.]

[Figure annotations for an example program analysis: the user demands one analysis to run; it starts and blocks on R2 and D2. Other analyses then start, some running to finish immediately and others blocking on what they consume (D1; or D1, D2, R1, R12); each blocked analysis resumes and runs to finish once its inputs are available.]


Setting Up a Java Program for Analysis

Command to run in Chord’s main directory:

    ant -Dchord.work.dir=<…>/example run

Layout of the example/ directory:

    example/
        src/
            foo/
                Main.java
                ...
        classes/
            foo/
                Main.class
                ...
        lib/
            src/
                taz/
                    ...
            jar/
                taz.jar
        chord.properties
        chord_output/
            bddbddb/

Contents of chord.properties:

    chord.main.class=foo.Main
    chord.class.path=classes:lib/jar/taz.jar
    chord.src.path=src:lib/src
    chord.run.ids=0,1
    chord.args.0="-thread 1 -n 10"
    chord.args.1="-thread 2 -n 50"


Java Program Representations

[Figure: Java source code (.java) → javac → Java bytecode (.class) → javap → disassembled Java bytecode]


Example: Java Source Code

File test/HelloWorld.java:

    1: package test;
    2:
    3: public class HelloWorld {
    4:     public static void main(String[] args) {
    5:         System.out.println("Hello World!");
    6:     }
    7: }


Pretty-Printing Java Bytecode

Command:

    javap -private -verbose -classpath <CLASS_PATH> [-bootclasspath <BOOT_CLASS_PATH>] <CLASS_NAME>

Output:

    public class test.HelloWorld extends java.lang.Object
      SourceFile: "HelloWorld.java"
      Constant pool:
        const #1 = Method #6.#20; // java/lang/Object."<init>":()V
        ...
    public static void main(java.lang.String[]);
      Code:
        Stack=2, Locals=1, Args_size=1
        0: getstatic #2; // Field java/lang/System.out:Ljava/io/PrintStream;
        3: ldc #3; // String Hello World!
        5: invokevirtual #4; // Method java/io/PrintStream.println:...
        8: return
      LineNumberTable:
        line 5: 0
        line 6: 8
      LocalVariableTable:
        Start  Length  Slot  Name  Signature
        0      9       0     args  [Ljava/lang/String;

Run "javac -g" on .java files to keep debug info (lines, vars, source) in .class files.


Java Program Representations

[Figure: Java source code (.java) → javac → Java bytecode (.class) → javap → disassembled Java bytecode; Java bytecode (.class) → Joeq → quadcode]


Pretty-Printing Quadcode

Command:

    ant -Dchord.work.dir=<WORK_DIR> -Dchord.out.file=<OUTPUT_FILE> -Dchord.print.classes=<CLASS_NAMES> -Dchord.verbose=0 run

Alternative options: -Dchord.print.methods=<METHOD_SIGNS>, -Dchord.print.all.classes=true

Replace any `$` by `#` to prevent shell interpretation.

Output:

    Class: test.HelloWorld
    Method: main:([Ljava/lang/String;)V@test.HelloWorld
    0#1 5#3 5#2 8#4
    Control flow graph:
    BB0 (ENTRY) (in: <none>, out: BB2)
    BB2 (in: BB0 (ENTRY), out: BB1 (EXIT))
    1: GETSTATIC_A T1, .out
    3: MOVE_A T2, AConst: "Hello World!"
    2: INVOKEVIRTUAL_V println:(Ljava/lang/String;)V@java.io.PrintStream, (T1,T2)
    4: RETURN_V
    BB1 (EXIT) (in: BB2, out: <none>)
    Exception handlers: []
    Register factory: Registers: 3


Type Hierarchy

    jq_Type
        jq_Primitive
        jq_Reference
            jq_Class
            jq_Array

(all defined in package joeq.Class)


chord.program.Program API

• static Program g()
  • representation of the program being analyzed

• IndexSet<jq_Type> getTypes()
  • all types in classes that may be loaded

• IndexSet<jq_Reference> getClasses()
  • all classes that may be loaded

• IndexSet<jq_Method> getMethods()
  • all methods that may be called


joeq.Class.jq_Class API

• String getName()
  • fully-qualified name of the class, e.g., "java.lang.String[]"

• jq_InstanceField[] getDeclaredInstanceFields()
  • all instance fields declared in the class

• jq_StaticField[] getDeclaredStaticFields()
  • all static fields declared in the class

• jq_InstanceMethod[] getDeclaredInstanceMethods()
  • all instance methods declared in the class

• jq_StaticMethod[] getDeclaredStaticMethods()
  • all static methods declared in the class


joeq.Class.jq_Method API

• String getName().toString()
  • name of the method

• String getDesc().toString()
  • descriptor of the method, e.g., "(Ljava/lang/String;)V"

• jq_Class getDeclaringClass()
  • declaring class of the method

• ControlFlowGraph getCFG()
  • control-flow graph of the method

• Quad getQuad(int bci)
  • first quad at the given bytecode offset (null if missing)

• int getLineNumber(int bci)
  • line number of the given bytecode offset (-1 if missing)

• String toString()
  • ID of the method in format mName:mDesc@cName
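The method ID format mName:mDesc@cName can be split back into its three parts with plain string operations. The sketch below is illustrative only: MethodIdSketch and parse are hypothetical names, not part of joeq or Chord, and the sketch assumes that the method name contains no ':' and the class name contains no '@'.

```java
/** Illustrative sketch: splitting a method ID of the form mName:mDesc@cName. */
final class MethodIdSketch {
    final String name;      // e.g. "main"
    final String desc;      // e.g. "([Ljava/lang/String;)V"
    final String className; // e.g. "test.HelloWorld"

    MethodIdSketch(String name, String desc, String className) {
        this.name = name;
        this.desc = desc;
        this.className = className;
    }

    /** Splits at the first ':' and the last '@'. */
    static MethodIdSketch parse(String id) {
        int colon = id.indexOf(':');
        int at = id.lastIndexOf('@');
        if (colon < 0 || at < colon) {
            throw new IllegalArgumentException("not a method ID: " + id);
        }
        return new MethodIdSketch(
            id.substring(0, colon),
            id.substring(colon + 1, at),
            id.substring(at + 1));
    }

    public static void main(String[] args) {
        MethodIdSketch m = parse("main:([Ljava/lang/String;)V@test.HelloWorld");
        System.out.println(m.name + " | " + m.desc + " | " + m.className);
    }
}
```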


Control Flow Graphs (CFGs)

• Each CFG contains:
  • a set of registers (register factory)
  • a directed graph whose nodes are basic blocks and edges denote control flow

• Register Factory:
  • one register per argument (local variables)
    • named R0, R1, …, Rn
  • one register per temporary (stack variables)
    • named Tn+1, Tn+2, …, Tm

• Basic Block (BB):
  • sequence of primitive statements (quads)
  • unique entry BB: no quads and no incoming edges
  • unique exit BB: no quads and no outgoing edges


joeq.Compiler.Quad.ControlFlowGraph API

• RegisterFactory getRegisterFactory()
  • set of all local variables

• EntryOrExitBasicBlock entry()
  • unique entry basic block

• EntryOrExitBasicBlock exit()
  • unique exit basic block

• List<BasicBlock> reversePostOrder()
  • list of all basic blocks in reverse post-order

• jq_Method getMethod()
  • containing method of the CFG
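Reverse post-order lists a basic block before its successors, except along back edges, which is why it is a common iteration order for forward analyses. The sketch below computes it on a small graph of integer block IDs; it does not use joeq, and the class name, the graph, and the block numbering are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch: reverse post-order over a CFG given as successor lists. */
final class ReversePostOrderSketch {

    static List<Integer> reversePostOrder(Map<Integer, List<Integer>> succs, int entry) {
        List<Integer> postOrder = new ArrayList<>();
        visit(entry, succs, new HashSet<Integer>(), postOrder);
        Collections.reverse(postOrder);
        return postOrder;
    }

    // Depth-first search that records a block after all of its successors.
    private static void visit(int bb, Map<Integer, List<Integer>> succs,
                              Set<Integer> seen, List<Integer> postOrder) {
        if (!seen.add(bb)) {
            return; // already visited
        }
        for (int s : succs.getOrDefault(bb, Collections.<Integer>emptyList())) {
            visit(s, succs, seen, postOrder);
        }
        postOrder.add(bb);
    }

    public static void main(String[] args) {
        // Hypothetical diamond-shaped CFG: entry 0, exit 1, branch at block 2.
        Map<Integer, List<Integer>> succs = new HashMap<>();
        succs.put(0, Arrays.asList(2));
        succs.put(2, Arrays.asList(3, 4));
        succs.put(3, Arrays.asList(1));
        succs.put(4, Arrays.asList(1));
        System.out.println(reversePostOrder(succs, 0)); // [0, 2, 4, 3, 1]
    }
}
```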


joeq.Compiler.Quad.BasicBlock API

• int size()
  • number of quads in the basic block

• Quad getQuad(int index)
  • quad at the given 0-based index

• List<BasicBlock> getPredecessors()
  • list of immediate predecessor basic blocks

• List<BasicBlock> getSuccessors()
  • list of immediate successor basic blocks

• jq_Method getMethod()
  • containing method of the basic block


Quad Instructions

• Each quad contains an operator and up to 4 operands

• Example: getfield l = b.f:

    Operand lo = Getfield.getDest(q);
    Operand bo = Getfield.getBase(q);
    if (lo instanceof RegisterOperand && bo instanceof RegisterOperand) {
        Register l = ((RegisterOperand) lo).getRegister();
        Register b = ((RegisterOperand) bo).getRegister();
        jq_Field f = Getfield.getField(q).getField();
        ...
    }


Kinds of Quads

joeq.Compiler.Quad.Operator

    Move            Getstatic     Branch         Invoke
    Phi             Putstatic     IntIfCmp       InvokeVirtual
    Unary           Getfield      Goto           InvokeStatic
    Binary          Putfield      Jsr            InvokeInterface
    New             ALoad         Ret
    NewArray        AStore        LookupSwitch
    MultiNewArray   Checkcast     TableSwitch
    Alength         Instanceof    Monitor        Return


joeq.Compiler.Quad.Quad API

• Operator getOperator()
  • kind of the quad

• int getBCI()
  • bytecode offset of the quad in its containing method

• String toByteLocStr()
  • unique identifier of the quad in format offset!mName:mDesc@cName

• String toJavaLocStr()
  • location of the quad in format fileName:lineNum in Java source code

• String toLocStr()
  • location of the quad in both Java bytecode and source code

• String toVerboseStr()
  • verbose description of the quad (its location plus contents)

• BasicBlock getBasicBlock()
  • containing basic block of the quad


Traversing Quadcode

    import chord.program.Program;
    import joeq.Class.jq_Method;
    import joeq.Compiler.Quad.*;

    // Visitor whose callbacks handle the quad kinds of interest
    QuadVisitor qv = new QuadVisitor.EmptyVisitor() {
        public void visitNew(Quad q) { ... }
        public void visitPhi(Quad q) { ... }
        ...
    };

    // Visit every quad of every non-abstract method in the program
    Program program = Program.g();
    for (jq_Method m : program.getMethods()) {
        if (!m.isAbstract()) {
            ControlFlowGraph cfg = m.getCFG();
            for (BasicBlock bb : cfg.reversePostOrder())
                for (Quad q : bb.getQuads())
                    q.accept(qv);
        }
    }


Java Program Representations

[Figure: Java source code (.java) → javac → Java bytecode (.class) → javap → disassembled Java bytecode; Java bytecode (.class) → Joeq → quadcode; Java source code (.java) → j2h or Java2HTML → HTMLized Java source code (.html)]


HTMLizing Java Source Code

• Programmatically:

    import chord.program.Program;

    Program program = Program.g();
    program.HTMLizeJavaSrcFiles();

• From command line:

  1. Use j2h:

       ant -Djava.dir=<JAVA_DIR> -Dhtml.dir=<HTML_DIR> j2h_xref

  2. Use Java2HTML:

       ant -Djava.dir=<JAVA_DIR> -Dhtml.dir=<HTML_DIR> j2h_fast


Java Program Representations

[Figure: the same diagram of Java program representations (Java source code, Java bytecode, disassembled Java bytecode, quadcode, HTMLized Java source code), extended with Jasmin code (.j) and the tool labels Chord and Jasmin]


Analysis Scope Construction

• Determines which parts of the program to analyze

• Computed in either of these cases:
  • chord.build.scope=true
  • chord.program.Program.g() is called

• Algorithm specified by chord.scope.kind=[rta|cha|dynamic]
  • Rapid Type Analysis (RTA)
  • Class Hierarchy Analysis (CHA)
  • Dynamic Analysis

• All three algorithms require specifying:
  • chord.main.class=<MAIN CLASS>
  • chord.class.path=<CLASSPATH>
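To give a feel for how the two static algorithms differ, the sketch below resolves one virtual call in the textbook CHA and RTA styles: CHA considers every subclass of the receiver's static type, while RTA considers only subclasses that are instantiated. This is a simplified illustration, not Chord's scope builder: the class hierarchy, the class and method names, and the fixed set of instantiated classes are all hypothetical, and a real RTA computes the instantiated set iteratively together with reachability.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch: call-target resolution in the style of CHA and RTA. */
final class ScopeSketch {
    // Hypothetical hierarchy: child class -> parent class.
    static final Map<String, String> PARENT = new HashMap<>();
    // Hypothetical declarations: class -> names of methods it declares.
    static final Map<String, Set<String>> DECLARES = new HashMap<>();

    static {
        PARENT.put("Circle", "Shape");
        PARENT.put("Square", "Shape");
        DECLARES.put("Shape", setOf("draw"));
        DECLARES.put("Circle", setOf("draw"));
        DECLARES.put("Square", setOf("draw"));
    }

    private static Set<String> setOf(String... xs) {
        return new HashSet<>(Arrays.asList(xs));
    }

    static boolean isSubclass(String cls, String ancestor) {
        for (String c = cls; c != null; c = PARENT.get(c)) {
            if (c.equals(ancestor)) return true;
        }
        return false;
    }

    /** Class whose declaration of method runs when the receiver's dynamic class is cls. */
    static String dispatch(String cls, String method) {
        for (String c = cls; c != null; c = PARENT.get(c)) {
            if (DECLARES.getOrDefault(c, Collections.<String>emptySet()).contains(method)) return c;
        }
        return null;
    }

    /** Possible targets of recv.method() when recv's dynamic class is drawn from candidates. */
    static Set<String> targets(String staticType, String method, Collection<String> candidates) {
        Set<String> result = new TreeSet<>();
        for (String cls : candidates) {
            if (!isSubclass(cls, staticType)) continue;
            String decl = dispatch(cls, method);
            if (decl != null) result.add(decl + "." + method);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> allClasses = setOf("Shape", "Circle", "Square");
        Set<String> instantiated = setOf("Circle"); // assumed: only "new Circle()" is reachable
        System.out.println("CHA: " + targets("Shape", "draw", allClasses));
        System.out.println("RTA: " + targets("Shape", "draw", instantiated));
    }
}
```

In this example CHA reports three possible targets for the call, whereas RTA reports only Circle.draw, which illustrates why RTA typically yields a smaller analysis scope.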


Analysis Scope Representation

• Reachable Methods
  • stored in file specified by chord.methods.file (default = "[chord.out.dir]/methods.txt")
  • one entry per method, in format:

        mname:mdesc@cname
        ...

• Resolved Reflection
  • stored in file specified by chord.reflect.file (default = "[chord.out.dir]/reflect.txt")
  • one section per kind of reflective call site:

        # resolvedClsForNameSites
        ...
        # resolvedObjNewInstSites
        ...
        # resolvedConNewInstSites
        ...
        # resolvedAryNewInstSites
        ...

  • the four sections correspond, in order, to calls of:
    • Class Class.forName(String)
    • Object Class.newInstance()
    • Object Constructor.newInstance(Object[])
    • Object Array.newInstance(Class, int)
  • each entry in format:

        bci!mname:mdesc@cname->cname1,cname2,...,cnameN


Rapid Type Analysis (RTA)

• Preferred (and default) scope construction algorithm

• Allows specifying reflection resolution via chord.reflect.kind=[none|static|dynamic]

• Preferred way to resolve reflection is ‘dynamic’; it requires specifying how to run the program:
  • chord.run.args=id_1,…,id_N
  • chord.args.id_1=<ARGS_1>, …, chord.args.id_N=<ARGS_N>


Dynamic Analysis Based Scope Construction

• Runs program and observes which classes are loaded

• Requires JVMTI (set chord.use.jvmti=true in file main/chord.properties)

• Requires specifying how to run program:
  • chord.run.args=id_1,…,id_N
  • chord.args.id_1=<ARGS_1>, …, chord.args.id_N=<ARGS_N>

• All methods of each loaded class are deemed reachable

• Currently no support for reflection resolution


Additional Analysis Scope Features

• Scope Reuse
  • Enables using scope constructed by a previous run of Chord
  • Constructs scope from files specified by chord.methods.file and chord.reflect.file
  • Specified via chord.reuse.scope=true

• Scope Exclusion
  • Enables excluding certain classes from scope
  • Treats all methods in such classes as no-ops
  • Specified via three properties:
    1. chord.std.scope.exclude (default = "")
    2. chord.ext.scope.exclude (default = "")
    3. chord.scope.exclude (default = "[chord.std.scope.exclude],[chord.ext.scope.exclude]")


Native Method Stubs

• Specified in file main/src/chord/program/stubs/stubs.txt in format:

    mname:mdesc@cname stub_cname

  where stub_cname denotes a class implementing:

    public interface joeq.Compiler.Quad.ICFGBuilder {
        public ControlFlowGraph run(jq_Method m);
    }

• Example:

    start:()V@java.lang.Thread chord.program.stubs.ThreadStartCFGBuilder


Example Native Method Stub

    // Builds a CFG equivalent to: void start() { this.run(); return; }
    public ControlFlowGraph run(jq_Method m) {
        jq_Class c = m.getDeclaringClass();
        jq_Method n = c.getDeclaredInstanceMethod(
            new jq_NameAndDesc("run", "()V"));
        RegisterFactory f = new RegisterFactory(0, 1);
        Register r = f.getOrCreateLocal(0, c);
        ControlFlowGraph cfg = new ControlFlowGraph(m, 1, 0, f);
        // q1: this.run()
        Quad q1 = Invoke.create(0, m, Invoke.INVOKEVIRTUAL_V.INSTANCE,
            null, new MethodOperand(n), 1);
        Invoke.setParam(q1, 0, new RegisterOperand(r, c));
        // q2: return
        Quad q2 = Return.create(1, m, RETURN_V.INSTANCE);
        BasicBlock bb = cfg.createBasicBlock(1, 1, 2, null);
        bb.appendQuad(q1);
        bb.appendQuad(q2);
        // Link entry -> bb -> exit
        BasicBlock eb = cfg.entry(), xb = cfg.exit();
        eb.addSuccessor(bb);
        bb.addPredecessor(eb);
        bb.addSuccessor(xb);
        xb.addPredecessor(bb);
        return cfg;
    }


Outline of Tutorial

• Part 1:
  • Getting Started With Chord
  • Program Representation

• Part 2:
  • Analysis Using Datalog/BDDs
  • Chaining Analyses Together

• Part 3:
  • Context-Sensitive Analysis
  • Dynamic Analysis


Program Domain

• Building block for analyses based on Datalog/BDDs

• Represents an indexed set of values of a fixed kind
  • typically artifacts from program being analyzed (e.g., set of all methods in the program)

• Assigns unique 0-based index to each value
  • everything in Datalog/BDDs must be numbered
  • indices given in order in which values are added
  • order affects efficiency of running analysis on large sets
  • initial indices (0, 1, ...) typically given to frequently-used values (e.g., the main method)

• O(1) access to value given index, and vice versa
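One common way to get O(1) lookup in both directions is to pair a list (index to value) with a hash map (value to index). The sketch below shows that idea in isolation; IndexedDomainSketch is a hypothetical class written for this illustration and is not Chord's ProgramDom, and the O(1) bounds are the usual expected-time bounds for hashing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of an indexed set: values get 0-based indices in insertion order. */
final class IndexedDomainSketch<T> {
    private final List<T> values = new ArrayList<>();         // index -> value
    private final Map<T, Integer> indices = new HashMap<>();  // value -> index

    /** Adds val if absent; returns true if it was added. */
    boolean add(T val) {
        if (indices.containsKey(val)) {
            return false;
        }
        indices.put(val, values.size());
        values.add(val);
        return true;
    }

    /** Adds val if absent; returns its index in either case. */
    int getOrAdd(T val) {
        add(val);
        return indices.get(val);
    }

    /** Value at the given index; throws IndexOutOfBoundsException if out of range. */
    T get(int index) {
        return values.get(index);
    }

    /** Index of val, or -1 if absent. */
    int indexOf(T val) {
        Integer i = indices.get(val);
        return i == null ? -1 : i;
    }

    int size() {
        return values.size();
    }

    public static void main(String[] args) {
        IndexedDomainSketch<String> dom = new IndexedDomainSketch<>();
        dom.add("main:([Ljava/lang/String;)V@Bldg"); // added first, so it gets index 0
        dom.add("<init>:()V@Bldg");
        dom.add("main:([Ljava/lang/String;)V@Bldg"); // duplicate, ignored
        System.out.println(dom.size() + " values; index of <init> = "
            + dom.indexOf("<init>:()V@Bldg"));
    }
}
```

Because indices follow insertion order, adding frequently used values first gives them the smallest indices.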


Example Predefined Program Domains

    Name  Description               Defining Class
    T     types                     chord.analyses.type.DomT
    M     methods                   chord.analyses.method.DomM
    F     fields                    chord.analyses.field.DomF
    V     variables of ref type     chord.analyses.var.DomV
    P     quads (program points)    chord.analyses.point.DomP
    H     object allocation quads   chord.analyses.alloc.DomH
    I     method call quads         chord.analyses.invk.DomI
    E     heap-accessing quads      chord.analyses.heapacc.DomE
    A     abstract threads          chord.analyses.alias.DomA
    C     abstract method contexts  chord.analyses.alias.DomC
    O     abstract objects          chord.analyses.alias.DomO


Writing a Program Domain Analysis

Domain M: all methods in the program
  – main method has index 0
  – java.lang.Thread.start() method has index 1

    package chord.analyses.method;

    @Chord(name = "M")
    public class DomM extends ProgramDom<jq_Method> {
        @Override
        public void fill() {
            Program program = Program.g();
            // Added first, so the main method gets index 0
            add(program.getMainMethod());
            jq_Method start = program.getThreadStartMethod();
            if (start != null)
                add(start);
            // Remaining methods; already-present values are not re-added
            for (jq_Method m : program.getMethods())
                add(m);
        }
    }


Running a Program Domain Analysis

    ant -Dchord.work.dir=<…> -Dchord.run.analyses=M run



Running a Program Domain Analysis

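Output files:

    chord_output/
        bddbddb/
            M.dom
            M.map

Contents of M.dom:

    M <N> M.map

Contents of M.map (<N> lines, one method per line in index order):

    main:([Ljava/lang/String;)V@Bldg
    start:()V@java.lang.Thread
    <init>:()V@Bldg
    …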



chord.project.analyses.ProgramDom<T> API

• void setName(String name)
  • set name of domain

• boolean add(T val)
  • add value to domain if not present; return true if added

• int getOrAdd(T val)
  • add value to domain if not present; return its index in either case

• void save()
  • save domain to disk (.dom and .map files)

• String toUniqueString(T val)
  • unique string representation of value

• int size()
  • number of values in domain

• T get(int index)
  • value having the given index; IndexOutOfBoundsException if not found

• int indexOf(T val)
  • index of given value; -1 if not found

Note: values once added cannot be removed!


Program Relation

• Building block for analyses based on Datalog/BDDs

• Represents a set of tuples over one or more fixed program domains

• Represented symbolically as a BDD
  • enables storing and manipulating large relations efficiently

• Provides various relational operations
  • projection, selection, join, etc.

• BDD size and efficiency of operations depend heavily on the encoding of relation content, as opposed to its size
  • ordering of values within program domains
  • relative ordering between program domains


Writing a Program Relation Analysis

Relation MI: tuples (m, i) such that method m contains call i

package chord.analyses.invk;

    @Chord(name = "MI", sign = "M0,I0:M0_I0")
    public class RelMI extends ProgramRel {
        @Override
        public void fill() {
            DomI domI = (DomI) doms[1];
            for (Quad q : domI) {
                jq_Method m = q.getMethod();
                add(m, q);
            }
        }
    }

• M0,I0: Domain names
  • Order mnemonically (hard to change over time)
  • Suffix 0, 1, etc. distinguishes repeating domains

• M0_I0: Domain order
  • Only dictates performance
  • Can also be I0_M0 or I0xM0
  • Easy to change over time


Writing a Program Relation Analysis

Relation VT: tuples (v, t) such that local variable v has type t

    package chord.analyses.var;

    @Chord(name = "VT", sign = "V0,T0:T0_V0")
    public class RelVT extends ProgramRel {
        @Override
        public void fill() {
            // pseudocode loop header, as on the slide
            for (each RegisterOperand o of each quad) {
                Register v = o.getRegister();
                jq_Type t = o.getType();
                add(v, t);
            }
        }
    }


Running a Program Relation Analysis

    ant -Dchord.work.dir=<…> -Dchord.run.analyses=VT run




Running a Program Relation Analysis

Output files:

    chord_output/
        bddbddb/
            V.dom, T.dom, V.map, T.map
            VT.bdd

Contents of VT.bdd:

    # V0:2 T0:2
    # 1 2
    # 3 4
    6 4
    2 1 4 3
    7 4 0 1
    6 3 7 1
    5 3 0 7
    4 2 5 0
    3 2 6 5
    2 1 3 4


Program Relation as Binary Function

Variable v0 has types t1, t2, t3
Variable v1 has type t3
Variable v2 has type t3

Relation VT = { (0, 1), (0, 2), (0, 3), (1, 3), (2, 3) }

Each index in V is encoded by bits b1 b2 and each index in T by bits b3 b4; f is 1 exactly on the encodings of tuples in VT:

       V      T
     b1 b2  b3 b4   f
      0  0   0  0   0
      0  0   0  1   1
      0  0   1  0   1
      0  0   1  1   1
      0  1   0  0   0
      0  1   0  1   0
      0  1   1  0   0
      0  1   1  1   1
      1  0   0  0   0
      1  0   0  1   0
      1  0   1  0   0
      1  0   1  1   1
      1  1   0  0   0
      1  1   0  1   0
      1  1   1  0   0
      1  1   1  1   0
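The truth table can be regenerated mechanically from the tuples of VT. The sketch below does so, assuming the encoding suggested by the column headers (v is the two-bit number b1 b2 and t is the two-bit number b3 b4); the class and method names are made up for this illustration.

```java
/** Illustrative sketch: relation VT viewed as a function of four bits. */
final class RelationAsFunctionSketch {
    // Tuples (v, t) of relation VT from the example.
    static final int[][] VT = {{0, 1}, {0, 2}, {0, 3}, {1, 3}, {2, 3}};

    /** Returns 1 iff (v, t) is in VT, where v = b1 b2 and t = b3 b4 read as binary numbers. */
    static int f(int b1, int b2, int b3, int b4) {
        int v = 2 * b1 + b2;
        int t = 2 * b3 + b4;
        for (int[] tuple : VT) {
            if (tuple[0] == v && tuple[1] == t) {
                return 1;
            }
        }
        return 0;
    }

    /** The f column of the truth table, for rows 0000 through 1111 in order. */
    static String truthColumn() {
        StringBuilder column = new StringBuilder();
        for (int row = 0; row < 16; row++) {
            column.append(f((row >> 3) & 1, (row >> 2) & 1, (row >> 1) & 1, row & 1));
        }
        return column.toString();
    }

    public static void main(String[] args) {
        System.out.println(truthColumn()); // 0111000100010000
    }
}
```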


BDD: Binary Decision Diagrams (Bryant 1986)

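[Figure: full binary decision tree for f, branching on b1, b2, b3, b4 in that order, with 16 leaves: 0 1 1 1 0 0 0 1 under b1 = 0 and 0 0 0 1 0 0 0 0 under b1 = 1. A legend distinguishes 0 edges from 1 edges.]

Graphical Encoding of a Binary Function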


BDD: Collapsing Redundant Nodes

[Figure: the full decision tree for f from the previous slide, before any nodes are collapsed]


BDD: Collapsing Redundant Nodes

[Figure: the 16 leaves collapsed into a single 0 terminal and a single 1 terminal]


BDD: Collapsing Redundant Nodes

[Figure: identical b4 nodes collapsed]


BDD: Collapsing Redundant Nodes

[Figure: identical b3 nodes collapsed]


BDD: Eliminating Unnecessary Nodes

[Figure: the diagram after collapsing redundant nodes, before unnecessary nodes are eliminated]


BDD: Eliminating Unnecessary Nodes

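[Figure: the final BDD after eliminating unnecessary nodes: one b1 node, two b2 nodes, two b3 nodes, one b4 node, and the terminals 0 and 1]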


BDD Representation on Disk

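[Figure: the final BDD with its internal nodes numbered 2 to 7 and the terminals numbered 0 and 1]

    chord_output/
        bddbddb/
            V.dom, T.dom, V.map, T.map
            VT.bdd

Contents of VT.bdd (BDD variable numbers shown as b1 to b4):

    # V0:2 T0:2
    # b1 b2
    # b3 b4
    6 4                (number of internal nodes, number of BDD variables)
    b2 b1 b4 b3        (BDD variable order)
    7 b4 0 1
    6 b3 7 1
    5 b3 0 7
    4 b2 5 0
    3 b2 6 5
    2 b1 3 4

One entry per internal node of form:

<nodeId, varId, loNodeId, hiNodeId>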
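To see that the node entries really encode relation VT, the sketch below walks the node table for every assignment of the four bits and prints the tuples on which the BDD evaluates to 1. It hard-codes the six node entries from the example rather than reading a file, assumes the root is node 2 (the only b1 node), and assumes v is encoded as b1 b2 and t as b3 b4; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch: evaluating a BDD given as <nodeId, varId, loNodeId, hiNodeId> entries. */
final class BddNodeTableSketch {
    // Entries copied from the example; node IDs 0 and 1 are the terminals.
    static final int[][] NODES = {
        {7, 4, 0, 1}, {6, 3, 7, 1}, {5, 3, 0, 7},
        {4, 2, 5, 0}, {3, 2, 6, 5}, {2, 1, 3, 4}
    };
    static final int ROOT = 2; // assumption: the b1 node is the root

    /** bits[i] is the value of variable b_i for i = 1..4; returns the terminal reached (0 or 1). */
    static int eval(int[] bits) {
        Map<Integer, int[]> byId = new HashMap<>();
        for (int[] n : NODES) {
            byId.put(n[0], n);
        }
        int node = ROOT;
        while (node > 1) {
            int[] n = byId.get(node);
            node = (bits[n[1]] == 0) ? n[2] : n[3]; // follow the 0 edge or the 1 edge
        }
        return node;
    }

    /** All (v, t) pairs on which the BDD evaluates to 1. */
    static List<String> tuples() {
        List<String> result = new ArrayList<>();
        for (int v = 0; v < 4; v++) {
            for (int t = 0; t < 4; t++) {
                int[] bits = {0, v >> 1, v & 1, t >> 1, t & 1}; // bits[0] unused
                if (eval(bits) == 1) {
                    result.add("(" + v + ", " + t + ")");
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tuples()); // [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
    }
}
```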


BDD Variable Order is Important

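[Figure: two BDDs of the function b1b2 + b3b4. Under the order b1 < b2 < b3 < b4 the BDD has 4 internal nodes (b1, b2, b3, b4). Under the order b1 < b3 < b2 < b4 it has 6 internal nodes (b1, two b3, two b2, b4).]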
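The effect of variable order on the function b1b2 + b3b4 can be checked with a tiny reduced-BDD builder. The sketch below is not BuDDy or bddbddb: it builds the diagram by exhaustive Shannon expansion (feasible only for a handful of variables), shares identical nodes through a unique table, skips nodes whose two children coincide, and then counts the internal nodes. All names in it are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch: counting reduced-BDD nodes of b1b2 + b3b4 under a variable order. */
final class VariableOrderSketch {

    /** f = b1 b2 + b3 b4, where bits[i] holds the value of b(i+1). */
    static boolean f(boolean[] bits) {
        return (bits[0] && bits[1]) || (bits[2] && bits[3]);
    }

    /** Number of internal nodes in the reduced BDD of f; order lists 0-based variable indices. */
    static int countNodes(int[] order) {
        Map<String, Integer> unique = new HashMap<>();
        build(0, order, new boolean[order.length], unique);
        return unique.size();
    }

    // Returns a node ID: 0 and 1 are terminals, IDs from 2 upward are internal nodes.
    private static int build(int level, int[] order, boolean[] bits,
                             Map<String, Integer> unique) {
        if (level == order.length) {
            return f(bits) ? 1 : 0;
        }
        int var = order[level];
        bits[var] = false;
        int lo = build(level + 1, order, bits, unique);
        bits[var] = true;
        int hi = build(level + 1, order, bits, unique);
        if (lo == hi) {
            return lo; // the test on this variable is unnecessary: skip the node
        }
        String key = var + ":" + lo + ":" + hi;
        Integer id = unique.get(key);
        if (id == null) { // first node with this (variable, lo, hi): create it
            id = unique.size() + 2;
            unique.put(key, id);
        }
        return id; // identical nodes share one ID
    }

    public static void main(String[] args) {
        System.out.println(countNodes(new int[] {0, 1, 2, 3})); // b1 < b2 < b3 < b4: 4
        System.out.println(countNodes(new int[] {0, 2, 1, 3})); // b1 < b3 < b2 < b4: 6
    }
}
```

The gap grows with the number of variable pairs: keeping the variables of each product term adjacent keeps the diagram small, while interleaving them forces the diagram to remember more partial information.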


chord.project.analyses.ProgramRel<T> API

• void setName(String name)
  • set name of relation

• void setSign(RelSign sign)
  • set signature (domain names and order) of relation

• void setDoms(Dom[] doms)
  • set domains of relation

• void zero() or one()
  • initialize contents of relation to zero (no tuples) or one (all tuples)

• void add(T1 e1, …, TN eN)
  • add tuple (e1, …, eN) to relation

• void remove(T1 e1, …, TN eN)
  • remove tuple (e1, …, eN) from relation

• void save()
  • save contents of relation to disk


chord.project.analyses.ProgramRel<T> API

• void load()
  • load contents of relation from disk

• Iterable<T1,…,TN> getAryNValTuples()
  • iterate over all tuples in the relation

• int size()
  • number of tuples in the relation

• boolean contains(T1 e1, …, TN eN)
  • does relation contain tuple (e1, …, eN)?

• RelView getView()
  • obtain a copy of the relation upon which to do projection, selection, etc. without affecting original relation

• void close()
  • free memory used to hold relation


Pointer Analysis

• Answers which pointers can point to which objects at run-time

• Central to many program optimization & verification problems

• Problem is undecidable
  • No exact (i.e. both sound and complete) solution

• But many conservative (i.e. sound) approximate solutions exist
  • Determine which pointers may point to which objects
  • All are incomplete but differ in precision (i.e. false-positive rate)

• Continues to be active area of research


Example

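    class List {
        Obj[] elems;
        List() {
            Obj[] a = new Obj[…];
            this.elems = a;
        }
    }

    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new Bldg();
        }
        Bldg() {
            List el = new List();
            this.events = el;
            List fl = new List();
            this.floors = fl;
            for (int i = 0; i < K; i++) {
                Event e = new Event();
                el.elems[i] = e;
            }
            for (int i = 0; i < M; i++) {
                Floor f = new Floor();
                fl.elems[i] = f;
            }
        }
    }

[Figure: run-time heap of the example. Variable b points to a Bldg object whose fields events and floors point to two List objects (also referenced by el and fl). Each List's elems field points to its own Obj[] array (referenced by a). Elements 0, 1 of one array point to Event objects (referenced by e); elements 0, 1 of the other point to Floor objects (referenced by f).]

disjoint-reach(el, fl)?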


0-CFA Pointer Analysis for Java

• Flow sensitivity
  • flow-insensitive: ignores intra-procedural control flow

• Call graph construction

• Heap abstraction

• Aggregate modeling

• Context sensitivity


Example: Flow Insensitivity

    class List {
        Obj[] elems;
        List() {
            Obj[] a = new Obj[…];
            this.elems = a;
        }
    }

    class Bldg {
        List events, floors;
        static void main(String[] a) {
            Bldg b = new Bldg();
        }
        Bldg() {
            List el = new List();
            this.events = el;
            List fl = new List();
            this.floors = fl;
            // for (int i = 0; i < K; i++)    (loop ignored; index i becomes *)
            Event e = new Event();
            el.elems[*] = e;
            // for (int i = 0; i < M; i++)    (loop ignored; index i becomes *)
            Floor f = new Floor();
            fl.elems[*] = f;
        }
    }


0-CFA Pointer Analysis for Java

• Flow sensitivity
  • flow-insensitive: ignores intra-procedural control flow

• Call graph construction
  • “on-the-fly”: mutually recursively with pointer analysis

• Heap abstraction

• Aggregate modeling

• Context sensitivity


Example: Call Graph (Base Case)

Code deemed reachable so far …

class List {
  Obj[] elems;
  List() { Obj[] a = new Obj[…]; this.elems = a; }
}

class Bldg {
  List events, floors;
  static void main(String[] a) { Bldg b = new Bldg(); }
  Bldg() {
    List el = new List(); this.events = el;
    List fl = new List(); this.floors = fl;
    Event e = new Event(); el.elems[*] = e;
    Floor f = new Floor(); fl.elems[*] = f;
  }
}

reachableM(0).


0-CFA Pointer Analysis for Java

• Flow sensitivity
  • flow-insensitive: ignores intra-procedural control flow

• Call graph construction
  • “on-the-fly”: mutually recursively with pointer analysis

• Heap abstraction
  • allocation sites: objects at same site indistinguishable

• Aggregate modeling

• Context sensitivity


Example: Heap Abstraction

class List {
  Obj[] elems;
  List() { Obj[] a = new6 Obj[…]; this.elems = a; }
}

class Bldg {
  List events, floors;
  static void main(String[] a) { Bldg b = new1 Bldg(); }
  Bldg() {
    List el = new2 List(); this.events = el;
    List fl = new3 List(); this.floors = fl;
    Event e = new4 Event(); el.elems[*] = e;
    Floor f = new5 Floor(); fl.elems[*] = f;
  }
}


Rule for Object Allocation Sites

Statement: v = newh …

• Before: v → newh’ …

• After: v → newh, newh’ …

VH(v, h) :- reachableM(m), MobjValAsgnInst(m, v, h).


Rule for Copy Assignments

Statement: v1 = v2

• Before: v1 → newh’ …; v2 → newh …

• After: v1 → newh, newh’ …; v2 → newh …

VH(v1, h) :- reachableM(m), MobjVarAsgnInst(m, v1, v2), VH(v2, h).


0-CFA Pointer Analysis for Java

• Flow sensitivity
  • flow-insensitive: ignores intra-procedural control flow

• Call graph construction
  • “on-the-fly”: mutually recursively with pointer analysis

• Heap abstraction
  • allocation sites: objects at same site indistinguishable

• Aggregate modeling
  • instance field sensitive but array element insensitive

• Context sensitivity


Rule for Heap Writes

Statement: b.f = v, where f is instance field or [*] (array element)

• Before: b → newh1 …; v → newh2 …; newh1.f → newh3 …

• After: b → newh1 …; v → newh2 …; newh1.f → newh2, newh3 …

HFH(h1, f, h2) :- reachableM(m), MputInstFldInst(m, b, f, v), VH(b, h1), VH(v, h2).


Rule for Heap Reads

Statement: v = b.f, where f is instance field or [*] (array element)

• Before: v → newh …; b → newh1 …; newh1.f → newh2 …

• After: v → newh, newh2 …; b → newh1 …; newh1.f → newh2 …

VH(v, h2) :- reachableM(m), MgetInstFldInst(m, v, b, f), VH(b, h1), HFH(h1, f, h2).
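The four rules for allocations, copies, heap writes, and heap reads can be read as a fixpoint computation over the relations VH and HFH. The following is a minimal Java sketch of that reading, not Chord's implementation (Chord solves these rules with bddbddb). It assumes every method is reachable, so the reachableM premise is omitted, and it models each constructor call as a copy into a hypothetical variable thisBldg or thisList. The site names h1 to h6 stand for new1 to new6, and t1 and t2 are hypothetical temporaries for el.elems and fl.elems.

```java
import java.util.*;

public class PointsToSketch {
    // VH: variable -> allocation sites it may point to.
    static final Map<String, Set<String>> VH = new TreeMap<>();
    // HFH: "h1.f" -> allocation sites that field f of site h1 may point to.
    static final Map<String, Set<String>> HFH = new TreeMap<>();

    // Returns a copy so callers can iterate while the relations grow.
    static Set<String> get(Map<String, Set<String>> rel, String key) {
        return new TreeSet<>(rel.getOrDefault(key, Collections.emptySet()));
    }

    // Adds facts; returns true if at least one fact is new.
    static boolean add(Map<String, Set<String>> rel, String key, Set<String> vals) {
        return rel.computeIfAbsent(key, k -> new TreeSet<>()).addAll(vals);
    }

    // Statement kinds: {"new", v, h}, {"copy", v1, v2}, {"put", b, f, v}, {"get", v, b, f}.
    static void solve(String[][] stmts) {
        VH.clear();
        HFH.clear();
        boolean changed = true;
        while (changed) { // repeat until no rule derives a new fact
            changed = false;
            for (String[] s : stmts) {
                switch (s[0]) {
                    case "new":  // VH(v, h)
                        changed |= add(VH, s[1], Collections.singleton(s[2]));
                        break;
                    case "copy": // VH(v1, h) :- VH(v2, h)
                        changed |= add(VH, s[1], get(VH, s[2]));
                        break;
                    case "put":  // HFH(h1, f, h2) :- VH(b, h1), VH(v, h2)
                        for (String h1 : get(VH, s[1]))
                            changed |= add(HFH, h1 + "." + s[2], get(VH, s[3]));
                        break;
                    case "get":  // VH(v, h2) :- VH(b, h1), HFH(h1, f, h2)
                        for (String h1 : get(VH, s[2]))
                            changed |= add(VH, s[1], get(HFH, h1 + "." + s[3]));
                        break;
                    default:
                        throw new IllegalArgumentException("unknown statement kind " + s[0]);
                }
            }
        }
    }

    // The Bldg/List example, flow-insensitively (statement order is irrelevant).
    static final String[][] EXAMPLE = {
        {"new", "b", "h1"}, {"copy", "thisBldg", "b"},
        {"new", "el", "h2"}, {"copy", "thisList", "el"}, {"put", "thisBldg", "events", "el"},
        {"new", "fl", "h3"}, {"copy", "thisList", "fl"}, {"put", "thisBldg", "floors", "fl"},
        {"new", "a", "h6"}, {"put", "thisList", "elems", "a"},
        {"new", "e", "h4"}, {"get", "t1", "el", "elems"}, {"put", "t1", "[*]", "e"},
        {"new", "f", "h5"}, {"get", "t2", "fl", "elems"}, {"put", "t2", "[*]", "f"},
    };

    public static void main(String[] args) {
        solve(EXAMPLE);
        System.out.println("VH  = " + VH);
        System.out.println("HFH = " + HFH);
    }
}
```

Because the constructor List() is analyzed once, thisList points to both List sites, so both lists share the single array site h6, which in turn holds both the Event site and the Floor site.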


0-CFA Pointer Analysis for Java

• Flow sensitivity
  • flow-insensitive: ignores intra-procedural control flow

• Call graph construction
  • “on-the-fly”: mutually recursively with pointer analysis

• Heap abstraction
  • allocation sites: objects at same site indistinguishable

• Aggregate modeling
  • instance field sensitive but array element insensitive

• Context sensitivity
  • context-insensitive: ignores inter-procedural control flow (analyzes each method in single context)


Rule for Dynamically Dispatching Calls

Call site i: v.foo(), inside method Tn.bar() { …; v.foo(); …; }

• Before: v → newh … (newh has type T); no call edge from i to Tm.foo() { … }

• After: v → newh … (newh has type T); call edge from i to Tm.foo() { … }, where CHA(T, foo) = Tm.foo()

IM(i, m) :- reachableM(n), MI(n, i), virtIM(i, m’), IinvkArg0(i, v), VH(v, h), HT(h, t), CHA(t, m’, m).
reachableM(m) :- IM(_, m).
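The IM rule resolves a virtual call by combining the receiver's points-to set (VH), the type of each allocation site (HT), and a class-hierarchy lookup (CHA). The sketch below illustrates only that resolution step in plain Java. The class hierarchy (A, B, C), the method name foo, and the site names hB and hC are hypothetical and not taken from Chord.

```java
import java.util.*;

public class DispatchSketch {
    static final Map<String, String> SUPER = new HashMap<>();         // class -> superclass
    static final Map<String, Set<String>> DECLARES = new HashMap<>(); // class -> declared methods
    static final Map<String, String> HT = new HashMap<>();            // allocation site -> type

    static {
        // Hypothetical hierarchy: B and C extend A; A and B declare foo, C inherits it.
        SUPER.put("B", "A");
        SUPER.put("C", "A");
        DECLARES.put("A", Collections.singleton("foo"));
        DECLARES.put("B", Collections.singleton("foo"));
        HT.put("hB", "B");
        HT.put("hC", "C");
    }

    // CHA(t, m', m): look up method name starting at type t, walking up the hierarchy.
    static String cha(String t, String name) {
        for (String c = t; c != null; c = SUPER.get(c))
            if (DECLARES.getOrDefault(c, Collections.emptySet()).contains(name))
                return c + "." + name + "()";
        return null; // no implementation found
    }

    // Targets of the call v.name(), given the receiver's points-to set VH(v).
    static Set<String> targets(Set<String> receiverSites, String name) {
        Set<String> result = new TreeSet<>();
        for (String h : receiverSites) {
            String m = cha(HT.get(h), name);
            if (m != null)
                result.add(m);
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> vh = new TreeSet<>(Arrays.asList("hB", "hC"));
        System.out.println(targets(vh, "foo"));
    }
}
```

In an on-the-fly analysis each newly found target would also be marked reachable, which can enlarge VH and in turn uncover further targets; the sketch leaves that feedback loop out.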


Writing a Datalog Analysis

#name=cipa-0cfa-dlog

# program domains
.include "V.dom"
.include "T.dom"
...

# BDD variable order
.bddvarorder M0xI0_F0_V0xV1_T0_H0xH1

# input, intermediate, and output program relations, represented as BDDs
VT(v:V0, t:T0) input
reachableM(m:M0)
FH(f:F0, h:H0) output
VH(v:V0, h:H0) output
HFH(h1:H0, f:F0, h2:H1) output
IM(i:I0, m:M0) output
...

# analysis constraints (Horn clauses), solved via BDD operations
reachableM(m) :- IM(_, m).
...


Running a Datalog Analysis

ant -Dchord.work.dir=<…> -Dchord.run.analyses=cipa-0cfa-dlog run

Analysis file:

#name=cipa-0cfa-dlog
.include "V.dom"
.include "T.dom"
...
.bddvarorder M0xI0_F0_V0xV1_T0_H0xH1
VT(v:V0, t:T0) input
reachableM(m:M0)
FH(f:F0, h:H0) output
VH(v:V0, h:H0) output
HFH(h1:H0, f:F0, h2:H1) output
IM(i:I0, m:M0) output
...
reachableM(m) :- IM(_, m).
...

Files in chord_output/bddbddb/:
• V.dom, T.dom, V.map, T.map
• VT.bdd
• reachableM.bdd
• FH.bdd
• VH.bdd
• HFH.bdd
• IM.bdd


Example

class List {
  Obj[] elems;
  List() { Obj[] a = new6 Obj[…]; this.elems = a; }
}

class Bldg {
  List events, floors;
  static void main(String[] a) { Bldg b = new1 Bldg(); }
  Bldg() {
    List el = new2 List(); this.events = el;
    List fl = new3 List(); this.floors = fl;
    Event e = new4 Event(); el.elems[*] = e;
    Floor f = new5 Floor(); fl.elems[*] = f;
  }
}

[Figure: 0-CFA points-to graph. Locals: b → new1 Bldg; el → new2 List; fl → new3 List; e → new4 Event; f → new5 Floor; a → new6 Obj[]. Fields: new1 Bldg has events and floors edges to the two List sites; both List sites have an elems edge to new6 Obj[]; new6 Obj[] has [*] edges to new4 Event and new5 Floor. Call-site labels: 1 and 2,3.]


Printing Program Relations (Command Line)

disjoint-reach(el, fl)?

File a.dlog:

.include "V.dom"
.include "H.dom"
.include "F.dom"
.bddvarorder ...

VH(v:V0, h:H0) input
HFH(h1:H0, f:F0, h2:H1) input
rVH(v:V0, h:H0)
rVV(v1:V0, v2:V1) printtuples

rVH(v, h) :- VH(v, h).
rVH(v, h) :- rVH(v, h’), HFH(h’, _, h).
rVV(v1, v2) :- v1<v2, rVH(v1, h), rVH(v2, h).

ant -Dwork.dir=<…>/chord_output/bddbddb -Ddlog.file=a.dlog solve

Relation rVV:
el!<init>:()V@Bldg, fl!<init>:()V@Bldg
...

[Figure: the 0-CFA points-to graph of the Bldg/List example (b, el, fl, e, f, a and sites new1 Bldg through new6 Obj[], with events, floors, elems, and [*] edges).]
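The rVH and rVV rules in a.dlog compute, for each variable, the allocation sites reachable from it, and then pair up variables that reach a common site. A minimal Java sketch of the same query on the 0-CFA result for the Bldg/List example follows. The facts are hard-coded, the site names h1 to h6 stand for new1 to new6, and field labels are dropped because the rVH rule ignores them.

```java
import java.util.*;

public class DisjointReachSketch {
    static final Map<String, Set<String>> VH = new HashMap<>();   // variable -> sites
    static final Map<String, Set<String>> SUCC = new HashMap<>(); // HFH with the field dropped

    static void edge(String h1, String h2) {
        SUCC.computeIfAbsent(h1, k -> new TreeSet<>()).add(h2);
    }

    static {
        VH.put("el", Collections.singleton("h2"));
        VH.put("fl", Collections.singleton("h3"));
        edge("h1", "h2"); // new1 Bldg .events
        edge("h1", "h3"); // new1 Bldg .floors
        edge("h2", "h6"); // new2 List .elems
        edge("h3", "h6"); // new3 List .elems
        edge("h6", "h4"); // new6 Obj[] [*] -> new4 Event
        edge("h6", "h5"); // new6 Obj[] [*] -> new5 Floor
    }

    // rVH(v, h): sites reachable from v through zero or more heap edges.
    static Set<String> rVH(String v) {
        Set<String> seen = new TreeSet<>(VH.getOrDefault(v, Collections.emptySet()));
        Deque<String> work = new ArrayDeque<>(seen);
        while (!work.isEmpty())
            for (String h : SUCC.getOrDefault(work.pop(), Collections.emptySet()))
                if (seen.add(h))
                    work.push(h);
        return seen;
    }

    // True when no site is reachable from both variables, i.e. (v1, v2) is not in rVV.
    static boolean disjointReach(String v1, String v2) {
        return Collections.disjoint(rVH(v1), rVH(v2));
    }

    public static void main(String[] args) {
        System.out.println("rVH(el) = " + rVH("el"));
        System.out.println("rVH(fl) = " + rVH("fl"));
        System.out.println("disjoint-reach(el, fl) = " + disjointReach("el", "fl"));
    }
}
```

Both el and fl reach the shared array site h6, so the query fails under 0-CFA, which matches the (el, fl) tuple printed for rVV.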


Querying Program Relations (Command Line)

File q.dlog:

.include "V.dom"
.include "H.dom"
.include "F.dom"
.bddvarorder ...

VH(v:V0, h:H0) input
HFH(h1:H0, f:F0, h2:H1) input

ant -Dwork.dir=<…>/chord_output/bddbddb -Ddlog.file=q.dlog debug

File V.map:

b!main:(…)@Bldg
...

File H.map:

null
1!main:(…)@Bldg
2!<init>:()V@Bldg
3!<init>:()V@Bldg
...

prompt> VH(0,h)?
1!main:(…)@Bldg

prompt> HFH(1,_,h)?
2!<init>:()V@Bldg
3!<init>:()V@Bldg

[Figure: the 0-CFA points-to graph of the Bldg/List example (b, el, fl, e, f, a and sites new1 Bldg through new6 Obj[], with events, floors, elems, and [*] edges).]


Pros and Cons of Datalog/BDDs

1. Good for rapidly crafting initial versions of analysis with focus on false positive/negative rate instead of scalability

2. Good for analyses …
   1. whose constraint solving strategy is not obvious (e.g. best known alternative is chaotic iteration)
   2. on data with lots of redundancy and too large to compute/store/read using Java if represented explicitly (e.g. cloning-based analyses)
   3. involving few simple rules (e.g. transitive closure)

3. Bad for analyses …
   1. with more complicated formulations (e.g. summary-based analyses)
   2. over domains not known exactly in advance (i.e. on-the-fly analyses)
   3. involving many interdependent rules (e.g. points-to analyses)

4. Unintuitive effects of BDDs on performance (e.g. k-CFA: small non-uniform k across sites worse than large uniform k)


Writing an Analysis in Chord

• Declaratively in Datalog or imperatively in Java

• Datalog analysis is any file that:
  • has extension .dlog or .datalog
  • occurs in path specified by property chord.dlog.analysis.path

• Java analysis is any class that:
  • is annotated with @Chord
  • occurs in path specified by property chord.java.analysis.path


Writing a Java Analysis

• Create subclass of chord.project.analyses.JavaAnalysis:

@Chord(
    name = "my-java",                     // mandatory field
    consumes = { "C1", ..., "Cm" },
    produces = { "P1", ..., "Pn" },
    namesOfTypes = { "T1", ..., "Tk" },   // target types not inferable otherwise
    types = { T1.class, ..., Tk.class },
    namesOfSigns = { "S1", ..., "Sr" },   // relation signs not inferable otherwise
    signs = { "...", ..., "..." }
)
public class MyAnalysis extends JavaAnalysis {
    @Override
    public void run() { ... }
}

• Compile above class to a location in path specified by any of:

Property name                   Default value
chord.std.java.analysis.path    "chord.jar"
chord.ext.java.analysis.path    ""
chord.java.analysis.path        concat. of above two property values


Chord Project

• Global entity for organizing all analyses and their inputs and outputs (collectively called analysis results)

• Computed if chord.project.Project.g() is called

• Consists of set of each of:
  • analyses called tasks
  • analysis results called targets
  • data/control dependencies between tasks and targets

• Either of two kinds chosen by chord.classic=[true|false]:
  • chord.project.ClassicProject (this tutorial)
    • only data dependencies, can only run tasks sequentially
  • chord.project.ModernProject (ongoing)
    • data and control dependencies, can run tasks in parallel


Computing a Chord Project

• Compute all tasks:
  • Each file with extension .dlog/.datalog in chord.dlog.analysis.path
  • Each class having annotation @Chord in chord.java.analysis.path

• Compute all targets:
  • Each target consumed or produced by some task

• Compute dependency graph:
  • Nodes are all tasks and targets
  • Edge from target C to task T if T consumes C
  • Edge from task T to target P if T produces P

• Perform consistency checks:
  • Error if target has no type or has multiple types, error if relation has no sign, warn if target produced by multiple tasks, etc.


Example: Chord Project

Each task has form { C1, …, Cm } T { P1, …, Pn } where:
  – T is name of task
  – C1, …, Cm are names of targets consumed by the task
  – P1, …, Pn are names of targets produced by the task

{} T1 { R1 }
{} T2 { R1 }
{ R4 } T3 { R2 }
{ R1, R2 } T4 { R3, R4 }

[Figure: dependency graph over tasks T1, T2, T3, T4 and targets R1, R2, R3, R4, with the consume and produce edges listed above.]


Running a Java Analysis

ant -Dchord.work.dir=<…> -Dchord.run.analyses=my-java run

@Chord(
    name = "my-java",
    consumes = { "C1", ..., "Cm" },
    produces = { "P1", ..., "Pn" }
)
public class MyAnalysis extends JavaAnalysis {
    @Override
    public void run() { ... }
}

• If done bit of this analysis is 1: do nothing

• Else do the following in order:
  • For each of C1, …, Cm whose done bit is 0:
    • Recursively run unique analysis producing it
    • Report runtime error if none or multiple such analyses exist
  • Execute run() method of this analysis
  • Set done bits of this analysis and P1, …, Pn to 1
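The done-bit procedure above can be sketched as a small recursive scheduler. This is an illustration in plain Java, not Chord's ClassicProject code: the class and method names are hypothetical, run() is replaced by appending the task name to a log, and the sketch performs no cycle detection, so a cyclic chain of dependencies would recurse without end. The three tasks in main are hypothetical as well.

```java
import java.util.*;

public class TaskRunnerSketch {
    static final class Task {
        final String name;
        final List<String> consumes;
        final List<String> produces;
        Task(String name, List<String> consumes, List<String> produces) {
            this.name = name;
            this.consumes = consumes;
            this.produces = produces;
        }
    }

    private final Map<String, Task> tasks = new LinkedHashMap<>();
    private final Set<String> doneTasks = new HashSet<>();   // done bits of tasks
    private final Set<String> doneTargets = new HashSet<>(); // done bits of targets
    final List<String> executed = new ArrayList<>();         // order in which tasks ran

    void addTask(String name, List<String> consumes, List<String> produces) {
        tasks.put(name, new Task(name, consumes, produces));
    }

    void runTask(String name) {
        Task t = tasks.get(name);
        if (t == null)
            throw new IllegalArgumentException("unknown task " + name);
        if (doneTasks.contains(name))
            return; // done bit is 1: do nothing
        for (String c : t.consumes) {
            if (doneTargets.contains(c))
                continue;
            List<Task> producers = new ArrayList<>();
            for (Task p : tasks.values())
                if (p.produces.contains(c))
                    producers.add(p);
            if (producers.size() != 1) // none or multiple producers
                throw new IllegalStateException(producers.size() + " tasks produce " + c);
            runTask(producers.get(0).name);
        }
        executed.add(name); // stands in for the task's run() method
        doneTasks.add(name);
        doneTargets.addAll(t.produces);
    }

    public static void main(String[] args) {
        TaskRunnerSketch project = new TaskRunnerSketch();
        project.addTask("T1", Collections.<String>emptyList(), Arrays.asList("R1"));
        project.addTask("T2", Arrays.asList("R1"), Arrays.asList("R2"));
        project.addTask("T3", Arrays.asList("R1", "R2"), Arrays.asList("R3"));
        project.runTask("T3");
        System.out.println(project.executed);
    }
}
```

When a target has several producers, running one of them explicitly beforehand sets the target's done bit, so the ambiguity check is never reached for that target.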


Running a Java Analysis

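{} T1 { R1 }
{} T2 { R1 }
{ R4 } T3 { R2 }
{ R1, R2 } T4 { R3, R4 }

[Figure: dependency graph over tasks T1, T2, T3, T4 and targets R1, R2, R3, R4, with the consume and produce edges listed above.]

ant -Dchord.work.dir=<…> -Dchord.run.analyses=T1,T4 run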


Predefined Analysis Templates

Organized in a hierarchy in package chord.project.analyses:

• JavaAnalysis
• ProgramDom
• ProgramRel
• DlogAnalysis
• RHSAnalysis
• ForwardRHSAnalysis
• BackwardRHSAnalysis
• BasicDynamicAnalysis
• DynamicAnalysis


chord.project.ClassicProject API

• ITask getTask(String name)
  • representation of named task

• Object getTrgt(String name)
  • representation of named target

• ITask runTask(String name)
  • run named task (and any needed tasks prior to it)

• boolean is[Task|Trgt]Done(String name)
  • is named task/target already executed/computed?

• void set[Task|Trgt]Done(String name)
  • set ‘done’ bit of named task/target to 1

• void reset[Task|Trgt]Done(String name)
  • set ‘done’ bit of named task/target to 0


Example Java Analysis

package chord.analyses.alias;

import java.util.HashSet;
import java.util.Set;

@Chord(name = "cicg-java", consumes = { "IM" })
public class CICGAnalysis extends JavaAnalysis {
    private ProgramRel cg;

    @Override
    public void run() {
        // Fetch the call-graph relation IM consumed by this analysis.
        cg = (ProgramRel) ClassicProject.g().getTrgt("IM");
    }

    // Returns the methods that call site q may invoke according to IM.
    public Set<jq_Method> getCallees(Quad q) {
        if (!cg.isOpen())
            cg.load();
        RelView view = cg.getView();
        view.selectAndDelete(0, q);
        Iterable<jq_Method> res = view.getAry1ValTuples();
        Set<jq_Method> callees = new HashSet<jq_Method>();
        for (jq_Method m : res)
            callees.add(m);
        view.free();
        return callees;
    }

    public void free() {
        if (cg.isOpen())
            cg.close();
    }
}


Example Java Analysis

@Chord(name = "my-java")
public class MyAnalysis extends JavaAnalysis {
    @Override
    public void run() {
        ClassicProject p = ClassicProject.g();
        CICGAnalysis a = (CICGAnalysis) p.getTask("cicg-java");
        p.runTask(a);
        for (Quad q : ...) {
            Set<jq_Method> tgts = a.getCallees(q);
            ...
        }
        a.free();
    }
}


Specialized Java Analyses

• ProgramDom:
  • Consumes targets specified in @Chord annotation
  • Produces only a single target (the defined program domain itself)
  • run() method computes and saves domain to disk

• ProgramRel:
  • Consumes targets specified in @Chord annotation, plus target of each of its program domains
  • Produces only a single target (the defined program relation itself)
  • run() method computes and saves relation to disk

• DlogAnalysis:
  • Consumes only its declared domains and declared input relations
  • Produces only its declared output relations
  • run() method runs bddbddb


Analyses as Building Blocks

1. Modularity
   • each analysis is written independently

2. Flexibility
   • analyses can interact in powerful ways with other analyses (by user-specified data/control dependencies)

3. Efficiency
   • analyses executed in demand-driven fashion
   • results computed by each analysis automatically cached for reuse by other analyses without re-computation
   • independent analyses automatically executed in parallel

4. Reliability
   • result is independent of order in which analyses are run


Outline of Tutorial

• Part 1:
  • Getting Started With Chord
  • Program Representation

• Part 2:
  • Analysis Using Datalog/BDDs
  • Chaining Analyses Together

• Part 3:
  • Context-Sensitive Analysis
  • Dynamic Analysis


Context-Sensitive Analysis

• Respects inter-procedural control-flow to varying degrees

• Broadly two kinds:
  • Bottom-Up: analyze method without any knowledge of its callers
  • Top-Down: analyze method only in called contexts

• Two kinds of top-down approaches:
  • Cloning-based (k-limited)
  • Summary-based

• Fully context-sensitive approaches:
  • Bottom-up
  • Top-down summary-based


Context-Sensitive Analysis in Chord

• Top-down: both cloning-based and summary-based

• Cloning-based analysis
  • k-CFA, k-object-sensitivity, hybrid

• Summary-based analysis
  • Tabulation algorithm from Reps, Horwitz, Sagiv (POPL’95)


Example: Context-Insensitive Analysis

disjoint-reach(el, fl)?

class List {
  Obj[] elems;
  List() { Obj[] a = new6 Obj[…]; this.elems = a; }
}

class Bldg {
  List events, floors;
  static void main(String[] a) { Bldg b = new1 Bldg(); }
  Bldg() {
    List el = new2 List(); this.events = el;
    List fl = new3 List(); this.floors = fl;
    Event e = new4 Event(); el.elems[*] = e;
    Floor f = new5 Floor(); fl.elems[*] = f;
  }
}

[Figure: context-insensitive points-to graph. Locals: b → new1 Bldg; el → new2 List; fl → new3 List; e → new4 Event; f → new5 Floor; a → new6 Obj[]. Fields: new1 Bldg has events and floors edges to the two List sites; both List sites have an elems edge to the single new6 Obj[]; new6 Obj[] has [*] edges to new4 Event and new5 Floor. Context labels: 1 for Bldg(); 2, 3 together for the single analysis of List().]


Example: Cloning-Based Analysis

disjoint-reach(el, fl)?

Program: the Bldg/List code of "Example: Context-Insensitive Analysis", with the constructor

  List() { Obj[] a = new6 Obj[…]; this.elems = a; }

analyzed as two clones, one labeled 2 and one labeled 3; Bldg() is labeled 1.

[Figure: points-to graph over b, el, fl, e, f, a and sites new1 Bldg, new2 List, new3 List, new4 Event, new5 Floor, and a single new6 Obj[], with events, floors, elems, and [*] edges.]


Example: Cloning with Object Sensitivity

disjoint-reach(el, fl)?

Program: the Bldg/List code of "Example: Context-Insensitive Analysis", with the constructor

  List() { Obj[] a = new6 Obj[…]; this.elems = a; }

analyzed as two clones, one labeled 2 and one labeled 3; Bldg() is labeled 1.

[Figure: points-to graph over b, el, fl, e, f and sites new1 Bldg, new2 List, new3 List, new4 Event, new5 Floor, now with two copies of new6 Obj[] (labeled 2 and 3), each with its own local a and its own elems edge, plus events, floors, and [*] edges.]


Running Cloning-based Analyses in Chord

• chord.ctxt.kind=[ci|cs|co]
  • kind of context sensitivity for each method and its locals

• chord.inst.ctxt.kind=[ci|cs|co]
  • kind of context sensitivity for each instance method and its locals

• chord.stat.ctxt.kind=[ci|cs|co]
  • kind of context sensitivity for each static method and its locals

• chord.kobj.k=[1|2|…]
  • k value to use for each object allocation site

• chord.kcfa.k=[1|2|…]
  • k value to use for each method call site

Analyses: cspa_0cfa.dlog, cspa_kcfa.dlog, cspa_kobj.dlog, cspa_hybrid.dlog

ant -Dchord.work.dir=<…> -Dchord.run.analyses=<ONE OF ABOVE> run
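The effect of the k value for call sites can be illustrated with k-limited call strings: the context of a callee is the calling site prepended to the caller's context, truncated to the k most recent sites. The Java sketch below shows only this textbook construction. How Chord encodes contexts internally is not shown on the slides, and the call-site names i1, i2, i3 are hypothetical labels for the calls to Bldg() and the two calls to List().

```java
import java.util.*;

public class KLimitedContextSketch {
    // Context of a callee: the call site followed by the caller's context,
    // truncated to the k most recent call sites.
    static List<String> extend(List<String> callerCtxt, String callSite, int k) {
        List<String> ctxt = new ArrayList<>();
        ctxt.add(callSite);
        ctxt.addAll(callerCtxt);
        return new ArrayList<>(ctxt.subList(0, Math.min(k, ctxt.size())));
    }

    public static void main(String[] args) {
        List<String> mainCtxt = Collections.emptyList();
        for (int k = 0; k <= 2; k++) {
            List<String> bldg = extend(mainCtxt, "i1", k); // main calls Bldg() at i1
            List<String> listEl = extend(bldg, "i2", k);   // Bldg() calls List() at i2
            List<String> listFl = extend(bldg, "i3", k);   // Bldg() calls List() at i3
            System.out.println("k=" + k + ": List() contexts " + listEl + " and " + listFl);
        }
    }
}
```

With k = 0 both calls to List() collapse into the same empty context, which is the context-insensitive case; with k ≥ 1 the two calls get distinct contexts, so the locals of List() are analyzed separately for each call.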


Output of Pointer/Call-Graph Analyses in Chord

Relations output by cspa_0cfa.dlog, cspa_kcfa.dlog, cspa_kobj.dlog, cspa_hybrid.dlog:

• rootCM
  • (c,m): m is entry method in ctxt c

• CICM
  • (c1,i,c2,m): call site i in ctxt c1 may call method m in ctxt c2

• CVC
  • (c,v,o): local v may point to object o in ctxt c of its declaring method

• FC
  • (f,o): static field f may point to object o

• CFC
  • (o1,f,o2): instance field f of object o1 may point to object o2

Relations output by cipa_0cfa.dlog:

• rootM
• IM
• VH
• FH
• HFH


Cloning-Based vs. Summary-Based Analysis

• Cloning-based Analysis:
  • Flow-insensitive
  • Notion of method contexts is somewhat arbitrary

• Summary-based Analysis:
  • Flow-sensitive
  • Notion of method contexts is defined by the user


Example: Thread-Escape Analysis

class List {
  Obj[] elems;
  List() { Obj[] a = new Obj[…]; this.elems = a; }
}

class Bldg {
  List events, floors;
  static void main(String[] a) { Bldg b = new Bldg(); }
  Bldg() {
    List el = new List(); this.events = el;
    List fl = new List(); this.floors = fl;
    for (int i = 0; i < K; i++) { Event e = new Event(); el.elems[i] = e; }
    for (int i = 0; i < M; i++) { Floor f = new Floor(); fl.elems[i] = f; }
  }
}

[Figure: concrete heap. Local b points to a Bldg object whose events and floors fields point to two List objects (locals el and fl). Each List's elems field points to its own Obj[] array; elements 0 and 1 of one array hold Event objects, and elements 0 and 1 of the other hold Floor objects.]


Example: Thread-Escape Analysis

local(p, v): Is v reachable from a single thread at p?

class List {
  Obj[] elems;
  List() { Obj[] a = new Obj[…]; this.elems = a; }
}

class Bldg {
  List events, floors;
  static void main(String[] a) {
    Bldg b = new Bldg();
    for (int i = 0; i < K; i++) { List el = b.events; Event v = el.elems[i]; }
  }
  Bldg() {
    List el = new List(); this.events = el;
    List fl = new List(); this.floors = fl;
    for (int i = 0; i < K; i++) { Event e = new Event(); el.elems[i] = e; }
    for (int i = 0; i < M; i++) { Floor f = new Floor(); fl.elems[i] = f; }
    for (int i = 0; i < N; i++) { Elev t = new Elev(fl); t.start(); }
  }
}

[Figure: concrete heap at program point p, with each object marked local or shared. The heap of the previous slide plus two Elev objects whose floors fields point to the floors List; locals b, el, fl, and v label their targets.]


Example: Trivial Pointer Abstraction

local(p, v)?

Program: the Bldg/List/Elev code of "Example: Thread-Escape Analysis" (version with main reading v and Bldg() starting Elev threads).

[Figure: the same heap at program point p (Bldg, two List, two Obj[], Event 0/1, Floor 0/1, and two Elev objects with events, floors, and elems edges), grouped according to the trivial pointer abstraction.]


Example: Allocation Sites Pointer Abstraction

local(p, v)?

Program: the Bldg/List/Elev code of "Example: Thread-Escape Analysis" (version with main reading v and Bldg() starting Elev threads).

[Figure: the same heap at program point p (Bldg, two List, two Obj[], Event 0/1, Floor 0/1, and two Elev objects with events, floors, and elems edges), grouped according to the allocation-sites pointer abstraction.]


Example: k-CFA Pointer Abstraction

local(p, v)?

Program: the Bldg/List/Elev code of "Example: Thread-Escape Analysis" (version with main reading v and Bldg() starting Elev threads).

[Figure: the same heap at program point p (Bldg, two List, two Obj[], Event 0/1, Floor 0/1, and two Elev objects with events, floors, and elems edges), grouped according to the k-CFA pointer abstraction.]


Complexity of Static Analyses

pointer abstraction    max abstract values (N)
trivial                1
allocation sites       H
k-CFA                  H · I^k

(rows run from most scalable at the top to most precise at the bottom)

control-flow abstraction               max abstract states
flow and context insensitive           1
flow sensitive, context insensitive    L
flow and context sensitive             L · 2^(N^2 · F)

Our Static Analysis:

pointer abstraction    max abstract values (N)
2-partition            2

control-flow abstraction               max abstract states
flow and context sensitive             Q · L · 4^F

Challenge: an abstraction that is both precise and scalable

L = program points, F = fields, H = allocation sites, I = call sites, Q = queries


Drawback of Existing Static Analyses

• Different queries require different parts of the program to be abstracted precisely

• But existing analyses use the same abstraction to prove all queries simultaneously

⇒ existing analyses sacrifice precision and/or scalability

[Figure: a single static analysis, using one abstraction A, takes program P and queries Q1 and Q2 and answers both P ⊢ Q1? and P ⊢ Q2?]


Insight 1: Client-Driven Static Analysis

• Query-driven: allows using separate abstractions for proving different queries

• Parametrized: parameter dictates how much precision to use for each program part for a given query

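[Figure: program P is analyzed twice: a static analysis using abstraction A1 takes query Q1 and answers P ⊢ Q1?, and a static analysis using abstraction A2 takes query Q2 and answers P ⊢ Q2?]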


Example: Client-Driven Static Analysis (RHS)

local(p, v)?

static void main(…) {
  Bldg b = new Bldg();
  for (*) { List el = b.events; Event v = el.elems[*]; }
}
Bldg() {
  List el = new List(); this.events = el;
  List fl = new List(); this.floors = fl;
  for (*) { Event e = new Event(); el.elems[*] = e; }
  for (*) { Floor f = new Floor(); fl.elems[*] = f; }
  for (*) { Elev t = new Elev(fl); t.start(); }
}
List() { Obj[] a = new Obj[…]; this.elems = a; }

[Figure: the allocation sites in the code are labeled h1 to h7 and the query point is labeled p. Beside the code, abstract heaps over the locals b, this, el, fl, e, f, t, v and the fields events, floors, elems, [*] are shown at successive program points, together with a parameter row h1 h2 h3 h4 h5 h6 h7.]


Writing a Summary-Based Analysis in Chord

• Implement representations of path/summary edges:

// schematic: one class for path edges (PE) and one for summary edges (SE)
class PE, SE implements chord.project.analyses.rhs.IEdge {
    @Override public boolean matchesSrcNodeOf(IEdge edge) { … }
    @Override public boolean mergeWith(IEdge edge) { … }
}

• Create a subclass of chord.project.analyses.rhs.[Forward|Backward]RHSAnalysis:

@Chord(name = "…")
public class MyAnalysis extends ForwardRHSAnalysis<PE, SE> {
    @Override ICICG getCallGraph() { … }
    @Override Set<Pair<Location, PE>> getInitPathEdges() { … }
    @Override PE getInitPathEdge(Quad q, jq_Method m, PE pe) { … }
    @Override PE getMiscPathEdge(Quad q, PE pe) { … }
    @Override PE getInvkPathEdge(Quad q, PE clr, jq_Method m, SE tgt) { … }
    @Override SE getSummaryEdge(jq_Method m, PE pe) { … }
    @Override public boolean doMerge() { … }
    @Override PE getCopy(PE pe) { … }
}


Insight 2: Leveraging Dynamic Analysis

• Challenge: Efficiently find cheap parameter to prove query
  • 2^H choices, most choices imprecise or unscalable

• Our solution: Use dynamic analysis
  • parameter is inferred efficiently (linear in H)
  • it can fail to prove query, but it is precise in practice and no cheaper parameter can prove query

[Figure: a dynamic analysis runs program P on inputs I1 ... In for query Q and produces a parameter H; the static analysis, using the resulting abstraction A, then answers P ⊢ Q?]


Example: Leveraging Dynamic Analysis

local(p, v)?

static void main(String[] a) {
  Bldg b = new Bldg();
  for (int i = 0; i < K; i++) { List el = b.events; Event v = el.elems[i]; }
}
Bldg() {
  List el = new List(); this.events = el;
  List fl = new List(); this.floors = fl;
  for (int i = 0; i < K; i++) { Event e = new Event(); el.elems[i] = e; }
  for (int i = 0; i < M; i++) { Floor f = new Floor(); fl.elems[i] = f; }
  for (int i = 0; i < N; i++) { Elev t = new Elev(fl); t.start(); }
}
List() { Obj[] a = new Obj[…]; this.elems = a; }

[Figure: the allocation sites in the code are labeled h1 to h7 and the query point is labeled p. Beside the code, the concrete heap observed at p (Bldg, two List, two Obj[], Event 0/1, Floor 0/1, and two Elev objects with events, floors, and elems edges, and local v), together with a parameter row h1 h2 h3 h4 h5 h6 h7.]


Dynamic Analysis Implementation Space for Java

Chord supports instrumenting bytecode at load-time and offline

Approaches compared: implement inside a JVM; use JVMTI; instrument bytecode at load-time; instrument bytecode offline

Portability
• dependency on specific version of specific JVM
• not supported by some JVMs (e.g. Android)
• not supported by some JVMs (e.g. Android)

Efficiency

Flexibility
• no support for what is doable by bytecode instru.
• can change only method bytecode after class loaded

Other issues
• not trivial to modify production JVM
• event handling code must be written in C/C++
• must run program twice to find which classes to instru.
• bytecode verifier may fail at runtime


Writing A Dynamic Analysis in Chord

import chord.project.analyses.DynamicAnalysis;

@Chord(name = "…")
public class MyDynamicAnalysis extends DynamicAnalysis {
    @Override
    public InstrScheme getInstrScheme() {
        InstrScheme s = new InstrScheme();
        s.set<event1>(<args1>);
        ...
        s.set<eventN>(<argsN>);
        return s;
    }
    @Override public void initAllPasses() { … }
    @Override public void doneAllPasses() { … }
    @Override public void initPass() { … }
    @Override public void donePass() { … }
    @Override public void process<event1>(<args1>) { … }
    ...
    @Override public void process<eventN>(<argsN>) { … }
}


Predefined Instrumentation Events

• EnterMainMethod(t)

• EnterMethod(m, t)

• LeaveMethod(m, t)

• EnterLoop(b, t)

• LoopIteration(b, t)

• LeaveLoop(b, t)

• BasicBlock(b, t)

• Quad(p, t)

• [Bef|Aft]MethodCall(i, t, o)

• [Bef|Aft]New(h, t, o)

• NewArray(h, t, o)

• [Get|Put]staticPrimitive(e, t, b, f)

• [Get|Put]staticReference(e, t, b, f, o)

• [Get|Put]fieldPrimitive(e, t, b, f)

• [Get|Put]fieldReference(e, t, b, f, o)

• [Get|Put]aloadPrimitive(e, t, b, i)

• [Get|Put]aloadReference(e, t, b, i, o)

• [Get|Put]astorePrimitive(e, t, b, i)

• [Get|Put]astoreReference(e, t, b, i, o)

• Thread[Start|Join](i, t, o)

• [Acquire|Release]Lock([l|r], t, o)

• [Wait|NotifyAny|NotifyAll](i, t, o)

Dynamic IDs: t=thread ID, o=object ID (0 denotes null)

Static IDs: m:M, b:B, p:P, i:I, h:H, e:E, f:F, l:L, r:R
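
As an illustration of how the (m, t) arguments of EnterMethod and LeaveMethod might be consumed, here is a minimal sketch that tracks call depth per thread ID. CallDepthTracker is a hypothetical class written for this example; it only mirrors the event names and does not extend or call anything in Chord.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handler, not part of Chord: m is a static method ID, t a dynamic thread ID.
final class CallDepthTracker {
    private final Map<Integer, Integer> depth = new HashMap<>();    // t -> current call depth
    private final Map<Integer, Integer> maxDepth = new HashMap<>(); // t -> deepest depth seen

    void processEnterMethod(int m, int t) {
        int d = depth.merge(t, 1, Integer::sum); // increment depth of thread t
        maxDepth.merge(t, d, Math::max);         // remember the deepest point
    }

    void processLeaveMethod(int m, int t) {
        depth.merge(t, -1, Integer::sum);        // decrement depth of thread t
    }

    int currentDepthOf(int t) { return depth.getOrDefault(t, 0); }
    int maxDepthOf(int t) { return maxDepth.getOrDefault(t, 0); }
}
```

Keying all state by t keeps events from different threads from interfering with each other, which matters because events from several threads arrive interleaved.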

Configuring Dynamic Analysis

• Bytecode instrumentation kind: chord.instr.kind=[online|offline]

• How to communicate events: chord.trace.kind=[none|pipe|full]

• JVMTI to start/end generating events: chord.use.jvmti=[true|false]

• Reuse traces from older Chord run: chord.reuse.traces=[true|false]

Where events are handled, by chord.trace.kind:

• none: in same JVM as that running instrumented program
– Pro: can inspect state
– Con: either exclude JDK from instrumentation or don’t use it in event handling code, to avoid correctness or performance problems

• full: in separate JVM after JVM running instrumented program finishes
– Con: infeasible for long-running programs which generate lots of events, since all events are stored in a (binary) file on disk

• pipe: in separate JVM in parallel with JVM running instrumented program
– Best option: uses buffered POSIX pipe to communicate events between event-generating JVM and event-handling JVM
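
The four properties and their listed values can be checked mechanically before a run. The sketch below is an assumption-laden illustration: DynConfigCheck is a hypothetical helper, not a Chord class, and it only encodes the option lists shown on this slide (it says nothing about Chord's defaults or about how Chord itself reads these properties).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of Chord: validates values against the options listed above.
final class DynConfigCheck {
    static final Map<String, List<String>> ALLOWED = new LinkedHashMap<>();
    static {
        ALLOWED.put("chord.instr.kind", Arrays.asList("online", "offline"));
        ALLOWED.put("chord.trace.kind", Arrays.asList("none", "pipe", "full"));
        ALLOWED.put("chord.use.jvmti", Arrays.asList("true", "false"));
        ALLOWED.put("chord.reuse.traces", Arrays.asList("true", "false"));
    }

    // Returns the name of the first property that is set to an unlisted value,
    // or null if every property that is set uses a listed value.
    static String firstInvalid(Properties p) {
        for (Map.Entry<String, List<String>> e : ALLOWED.entrySet()) {
            String value = p.getProperty(e.getKey());
            if (value != null && !e.getValue().contains(value)) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

For example, a Properties object with chord.instr.kind=offline and chord.trace.kind=pipe passes the check, while chord.trace.kind=disk is reported as invalid.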

Architecture of Dynamic Analysis in Chord

• chord.project.analyses.BasicDynamicAnalysis
– workhorse run() method: configures and runs dynamic analysis

• chord.project.analyses.DynamicAnalysis
– provides interface to handle predefined instrumentation events

• chord.instr.BasicInstrumentor
– provides interface to instrument various parts of a Java program

• chord.instr.Instrumentor
– instruments predefined events

• chord.runtime.BasicEventHandler
– starts/stops one-JVM dynamic analysis and maintains object IDs

• chord.runtime.TraceEventHandler
– starts/stops two-JVM dynamic analysis

• chord.runtime.EventHandler
– writes predefined events to buffer encapsulating trace file
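
To make the idea of writing events to a buffer and reading them back in a second JVM concrete, here is a minimal round-trip sketch. The record layout (one kind byte followed by two 4-byte IDs) and the TraceSketch class are assumptions invented for this example; Chord's actual trace format is not described on the slide and may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical binary trace: each record is {kind byte, int m, int t}. Not Chord's format.
final class TraceSketch {
    static final byte ENTER_METHOD = 1;
    static final byte LEAVE_METHOD = 2;

    // Event-generating side: serialize {kind, m, t} triples into a byte buffer.
    static byte[] write(int[][] events) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (int[] ev : events) {
                out.writeByte(ev[0]);
                out.writeInt(ev[1]);
                out.writeInt(ev[2]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Event-handling side: decode records in order and describe each one.
    static List<String> read(byte[] trace) {
        List<String> seen = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(trace))) {
            while (in.available() > 0) {
                byte kind = in.readByte();
                int m = in.readInt();
                int t = in.readInt();
                seen.add((kind == ENTER_METHOD ? "enter" : "leave") + " m=" + m + " t=" + t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }
}
```

The same write and read halves would work over a file or a pipe instead of an in-memory array, since both sides only depend on the stream interfaces.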

Combining Static and Dynamic Analysis

• Static followed by Dynamic
– reduce instrumentation overhead of dynamic

• Dynamic followed by Static
– Counterexamples: query is false on some input
– Likely invariants: a query true on some inputs is likely true on all inputs [Ernst 2001]
– Proofs: a query true on some inputs is likely true on all inputs and for likely the same reason [this talk]

• Static and Dynamic interleaved
– Yogi, concolic testing (EXE, DART, CUTE, SAGE)

Benchmark Characteristics

            classes   methods     bytecodes   allocation sites   queries
                      (x 1000)    (x 1000)    (x 1000)           (x 1000)
hedc        309       1.9         151         1.9                0.6
weblech     532       3.1         230         3.0                0.7
lusearch    611       3.8         267         3.5                7.2
hsqldb      771       6.4         472         5.1                14.4
avrora      1498      5.9         312         5.9                14.4
sunflow     992       6.6         478         6.1                10.0

Precision Comparison

Previous Approach

• Pointer abstraction:
– Allocation sites

• Control abstraction:
– Flow insensitive
– Context insensitive

Our Approach

• Pointer abstraction:
– 2-partition

• Control abstraction:
– Flow sensitive
– Context sensitive

[Figure: two bar charts, one per approach, over benchmarks hedc, weblech, lusearch, hsqldb, avrora, sunflow; y-axis 0% to 100%; legend: unknown, thread-shared, thread-local]

Precision Comparison

[Figure: bar charts for Previous Approach and Our Approach over benchmarks hedc, weblech, lusearch, hsqldb, avrora, sunflow; y-axis 0% to 100%; legend: unknown, thread-shared, thread-local]

• Previous scalable approach resolves 27% of queries

• Our approach resolves 82% of queries
– 55% of queries are proven thread-local
– 27% of queries are observed thread-shared

Running Time Breakdown

            baseline    our approach
            static      dynamic     static analysis
            analysis    analysis    total     per query group
                                              mean     max
hedc        24s         6s          38s       1s       2s
weblech     39s         8s          1m        2s       4s
lusearch    43s         31s         8m        3s       6s
hsqldb      1m08s       35s         86m       11s      21s
avrora      1m00s       32s         41m       5s       8s
sunflow     1m18s       3m          74m       9s       19s

Sparsity of Our Abstraction

            total      # sites set to
            # sites    all queries       proven queries
                       mean     max      mean     max
hedc        1,914      3.2      12       1.4      5
weblech     2,958      2.2      8        1.5      5
lusearch    3,549      2.2      18       1.5      18
hsqldb      5,056      2.7      56       1.3      5
avrora      5,923      12.1     195      2.3      31
sunflow     6,053      2.2      18       1.3      15

Related Open-Source Projects

• JikesRVM: Java Research Virtual Machine

• Soot + Paddle: Static analysis and transformation framework for Java bytecode

• IBM WALA: Static analysis framework for Java bytecode and related languages

• RoadRunner (Flanagan & Freund): Dynamic analysis framework for Java concurrency

Acknowledgments

• Joeq: Static analysis and transformation framework for Java bytecode

• Javassist: Java bytecode manipulation framework

• bddbddb: BDD-based Datalog solver

Further Information

• Chord homepage: http://jchord.googlecode.com/

• Chord user guide: http://chord.stanford.edu/user_guide/

• Chord questions: [email protected]

Thank You!