Dynamic Purity Analysis for Java Programs
Haiying Xu, Christopher J.F. Pickett, Clark Verbrugge
School of Computer Science, McGill University
PASTE ’07 Conference, San Diego, CA
Presented by Derek White
CSE 6329
Outline
• Introduction
• Approach and Contributions
• Design: Static Purity Analysis
• Kinds of Dynamic Purity
• Design: Dynamic Purity Analysis
• Memoization
• Experimental Evaluation
• Conclusions
Introduction
• Functional programming emphasizes application of functions and avoids mutable data (side effects)
• Popular functional languages include Scheme, Haskell, F#, OCaml, and Scala
• But you can program in a functional style using other languages
• “Pure” methods are methods that have functional (side-effect-free) behavior
– Several definitions of purity exist: either there are no externally visible side effects, or the extent of side effects is limited
– Constraints may also be placed on the level of dependency on previously available state
Introduction (2)
• Why do we care if a method is pure?
• Helpful in program understanding; allows us to isolate side-effect-free parts
• Verification in model checking
• Can be used to guide compiler optimization
– Better method purity information allows for less conservative assumptions
– Caching (memoization) of function calls
Introduction (3)
• Static analyses have classified large numbers of methods as pure, but the precise definitions of purity used vary
• Static analysis is conservative with respect to runtime behavior
• It is unclear if some classes of pure methods have any practical value
• So, the authors present a detailed examination of method purity for Java
– Considering several definitions of purity
– Investigating both static and dynamic properties
Approach and Contributions
• Extends previous work on static analysis, showing that different forms of purity occur at different frequencies in a dynamic environment
• Design and implementation of a dynamic purity analysis, online and offline
– Scalable; handles SPECjvm98 at size 100 “with acceptable overhead”
• Support for multiple purity definitions in order to compare with static purity analysis; also identifies forms of purity only observable dynamically
Approach and Contributions (2)
• Three metrics for evaluating the extent of dynamic purity
– Method, invocation, bytecode
– These are applied to a static analysis as well as to the dynamic purity definitions
• Implementation of memoization on a JVM, a traditional consumer of purity information
– Does not achieve any speedup; serves only as a functional test module
Design: Static Analysis
• Previous work has found that a large number of methods have weak purity properties; stronger purity properties result in fewer pure methods
• Static work done here considers strong purity
– A method is “strongly pure” iff it does not depend on or change initial state beyond primitive input values
– Must always return the same result for the same input
• Specifically, the method may not:
– Read/write heap or static data
– Synchronize
– Allocate objects
– Invoke native methods
– Throw exceptions
– Invoke any non-pure methods
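The restrictions above can be illustrated with a minimal Java sketch. The class and method names are hypothetical and chosen only for illustration; they do not come from the paper or the benchmarks.

```java
// Hypothetical examples; names are illustrative and not taken from the paper.
final class StrongPurityExamples {
    private static int counter = 0; // static state shared by all callers

    // Strongly pure under the definition above: the result depends only on
    // primitive arguments, and the method touches no heap or static data,
    // allocates nothing, and calls no other method.
    static int gcd(int a, int b) {
        while (b != 0) {
            int t = a % b; // b is non-zero here, so no division error
            a = b;
            b = t;
        }
        return a;
    }

    // Not strongly pure: reads and writes static data
    // (GETSTATIC and PUTSTATIC), so repeated calls return different values.
    static int nextId() {
        counter = counter + 1;
        return counter;
    }
}
```

Calling `gcd` twice with the same arguments always yields the same value, whereas `nextId` returns a new value on every call.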
Design: Static Analysis (2)
• Java class files used as input
• Flow-insensitive analysis done using Soot
Figure 1. Static analysis framework (components shown: Soot, SableVM, class files, Jimple, static analysis, attribute generation, class files + attributes, attribute parser, dynamic metrics, output)
Design: Static Analysis (3)
• Instructions within a method are scanned; any instruction found to be impure marks the method as impure
• Interprocedural analysis is done next, propagating impurity up from the leaves of a CHA-based call graph
• Assumption is made that exceptions do not propagate up the call stack unchecked
Impurity: Instructions
Native code execution: native INVOKE*
Heap access: NEW, NEWARRAY, ANEWARRAY, MULTIANEWARRAY, GETFIELD, PUTFIELD, *ALOAD, *ASTORE
Static access: GETSTATIC, PUTSTATIC
Synchronization: synchronized INVOKE*, synchronized *RETURN, MONITORENTER, MONITOREXIT
Exceptions: ATHROW
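The two steps described above (a per-method instruction scan, then propagation of impurity from callees to callers) might be sketched as follows. This is a simplified illustration, not the Soot-based implementation: methods are plain strings, the call graph is a hand-built map rather than a CHA result, and the class name, method name, and the `nativeOrSynchronized` parameter are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the paper's implementation.
final class ImpurityPropagation {
    // Opcodes from the table above, with *ALOAD and *ASTORE expanded.
    static final Set<String> IMPURE_OPCODES = Set.of(
        "NEW", "NEWARRAY", "ANEWARRAY", "MULTIANEWARRAY",
        "GETFIELD", "PUTFIELD",
        "IALOAD", "LALOAD", "FALOAD", "DALOAD",
        "AALOAD", "BALOAD", "CALOAD", "SALOAD",
        "IASTORE", "LASTORE", "FASTORE", "DASTORE",
        "AASTORE", "BASTORE", "CASTORE", "SASTORE",
        "GETSTATIC", "PUTSTATIC",
        "MONITORENTER", "MONITOREXIT", "ATHROW");

    // opcodes: method name -> instructions in its body
    // callees: method name -> methods it may invoke
    // nativeOrSynchronized: methods that are impure because of their flags
    static Set<String> impureMethods(Map<String, List<String>> opcodes,
                                     Map<String, List<String>> callees,
                                     Set<String> nativeOrSynchronized) {
        Set<String> impure = new HashSet<>(nativeOrSynchronized);

        // Step 1: intraprocedural scan for impure instructions.
        for (Map.Entry<String, List<String>> e : opcodes.entrySet()) {
            for (String op : e.getValue()) {
                if (IMPURE_OPCODES.contains(op)) {
                    impure.add(e.getKey());
                    break;
                }
            }
        }

        // Build the reverse call graph: callee -> callers.
        Map<String, Set<String>> callers = new HashMap<>();
        for (Map.Entry<String, List<String>> e : callees.entrySet()) {
            for (String callee : e.getValue()) {
                callers.computeIfAbsent(callee, k -> new HashSet<>()).add(e.getKey());
            }
        }

        // Step 2: propagate impurity from impure methods to their callers.
        Deque<String> worklist = new ArrayDeque<>(impure);
        while (!worklist.isEmpty()) {
            String method = worklist.poll();
            for (String caller : callers.getOrDefault(method, Set.of())) {
                if (impure.add(caller)) {
                    worklist.add(caller);
                }
            }
        }
        return impure;
    }
}
```

Any method not in the returned set is treated as pure. A worklist is used here so that each method is processed at most once, even when the call graph contains cycles.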
Design: Static Analysis (4)
• Easily extended for dynamic evaluation of strong static purity analysis
• Soot writes purity information to class file attributes
• SableVM reads attributes and records:
– Pure methods reached at runtime
– Frequency of pure method invocations
– Percentage of pure bytecode executed by pure methods
• Provides indications about how static results correlate with dynamic runtime behavior
Design: Dynamic Analysis
• Under the static analysis, a method is determined to be pure for all possible executions or is impure otherwise – may be too conservative
• Methods flagged as impure by the static analysis may execute only pure control-flow paths at runtime
• Goal of dynamic analysis is to identify pure methods based on runtime behavior, increasing number of pure methods found
Design: Dynamic Analysis (2)
Figure 2. Dynamic purity analysis framework
Design: Dynamic Analysis (3)
• Class files read into SableVM, instruction stream is examined for purity
• Purity analysis module uses an online escape analysis tracking writes to locally allocated objects
• Purity information can be used immediately by the VM or written to a file as offline analysis for a later execution
• Offline analysis removes the execution overhead
• Clients of the analysis are memoization and the metrics used in the static analysis
• Four kinds of purity: strong, moderate, weak, once-impure
Kinds of Dynamic Purity: Strong
• Same criteria as strong static purity
• Only executed instructions are considered
• All methods start with unknown status
• Impure method information propagates up the call stack
• As with static, once a method is identified as impure it is conservatively always considered impure
Kinds of Dynamic Purity: Moderate
• Objects can be created and altered as long as the objects do not escape the method execution context
• A method may call an impure method as long as the impurity is contained
• Must not change behavior based on heap or global state; behavior is based completely on primitive input arguments
• Methods still cannot:
– Invoke native methods
– Read/write existing heap or static objects
– Perform monitor operations
– Throw exceptions
– Call moderately impure methods, unless modified data belongs to and is contained in the caller
• Native System.arraycopy() and Object.clone() are treated as heap access and allocation instructions
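As a small illustration of the moderate definition, the following Java sketch contrasts a method whose heap activity stays local with one that modifies a caller's object. The names are hypothetical, and the first method is assumed to be called with a non-negative argument so that no exception is thrown.

```java
// Hypothetical examples; names are illustrative and not taken from the paper.
final class ModeratePurityExamples {
    // Moderately pure (for n >= 0): allocates and mutates an array, but the
    // array never escapes, and the result depends only on the primitive n.
    // It is not strongly pure because of NEWARRAY, IASTORE and IALOAD.
    static int sumOfSquares(int n) {
        int[] squares = new int[n];           // local allocation
        for (int i = 0; i < n; i++) {
            squares[i] = (i + 1) * (i + 1);   // write to a local object
        }
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += squares[i];                // read from a local object
        }
        return sum;
    }

    // Not moderately pure: reads and writes an existing heap object
    // that belongs to the caller.
    static void fill(int[] out, int value) {
        for (int i = 0; i < out.length; i++) {
            out[i] = value;
        }
    }
}
```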
Kinds of Dynamic Purity: Moderate (2)
• Analysis needs to take a closer look at *NEW*, GETFIELD, PUTFIELD, *ALOAD, *ASTORE
• *NEW* instructions are used to determine object locality
– Objects of a method are local if they do not escape the method, or if they escape from a callee
– Frames in the call stack have an object table storing all currently local objects
• PUTFIELD can allow objects local to the callee to escape to the caller (requires an update to the object table)
• GETFIELD, PUTFIELD, *ALOAD, *ASTORE can be classified depending on a frame’s object table
• Moderately pure methods can only use object parameters for reference comparisons
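A per-frame object table of the kind described above could look roughly like the sketch below. This is only an assumption about one possible shape of such a table; the class and method names are hypothetical, and the real analysis runs inside SableVM rather than in Java code.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch of a per-frame object table; not SableVM's data structure.
final class FrameObjectTable {
    // Identity-based set: two distinct objects are never confused,
    // even if equals() would consider them equal.
    private final Set<Object> local =
        Collections.newSetFromMap(new IdentityHashMap<>());

    // Called when the frame executes a *NEW* instruction.
    void recordAllocation(Object o) {
        local.add(o);
    }

    // Called when an object local to a callee escapes into this frame.
    void adoptFromCallee(Object o) {
        local.add(o);
    }

    // A field or array access is contained if its target is local to the frame.
    boolean accessIsContained(Object target) {
        return local.contains(target);
    }
}
```

With such a table, a PUTFIELD or *ASTORE whose target is not in the current frame's table would mark the method as impure under the moderate definition.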
Kinds of Dynamic Purity: Weak
• Allows heap reads so a method can inspect object parameters
• Maintains the property that the method is a function of its input
• GETFIELD is always safe
• PUTFIELD is still considered in the context of the escape analysis
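A brief Java sketch of the weak definition follows; the `Point` class and both methods are hypothetical and exist only to illustrate the difference between reading and writing a parameter object.

```java
// Hypothetical examples; names are illustrative and not taken from the paper.
final class WeakPurityExamples {
    static final class Point {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Weakly pure: inspects its object parameter through GETFIELD only,
    // so the result is a function of the input object's contents.
    static int squaredNorm(Point p) {
        return p.x * p.x + p.y * p.y;
    }

    // Not weakly pure: PUTFIELD on an object that is not local to the method.
    static void translate(Point p, int dx) {
        p.x += dx;
    }
}
```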
Kinds of Dynamic Purity: Once-Impure
• Observed that some impure methods became weakly pure after a first invocation
• Once-Impure is a weakly pure method that was impure during its first execution
Memoization: Optimization with Purity
• All forms of purity mentioned previously ensure that there is a unique result for any given input
• All are candidates for memoization
• Memoization caches the mapping from arguments to return value, allowing the VM to bypass repeated execution of a method with the same arguments
• Benefit from skipping execution must outweigh the cost of looking up the return value in the cache
Memoization (2)
• Method must be long enough to be worth optimizing
• After the first invocation, arguments are hashed together and looked up in a hash table, and the stored return value is substituted for the invocation
• Primitive args are stored directly; reference args are flattened (gathering type and primitive fields)
– Done so that garbage collection doesn’t invalidate memo tables
• Direct object reference comparisons cannot be safely memoized, so ACMP_* bytecodes must be considered impure
• Upper bounds on memory consumption limit the number of method invocations that can be cached
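The lookup-then-execute scheme can be sketched at the application level as below. This is not the VM-internal mechanism from the paper: it handles only two `int` arguments, uses a `HashMap` keyed on the argument list, and the class name, the `maxEntries` bound, and the `hits` counter are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntBinaryOperator;

// Hypothetical application-level sketch; the paper's memoization lives in the VM.
final class MemoTable {
    private final Map<List<Integer>, Integer> cache = new HashMap<>();
    private final int maxEntries; // upper bound on cached invocations
    int hits = 0;                 // number of calls answered from the cache

    MemoTable(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    // Returns the cached result for (a, b) if present; otherwise runs the
    // method and stores the result while the table is below its bound.
    int call(IntBinaryOperator pureMethod, int a, int b) {
        List<Integer> key = List.of(a, b); // primitive args stored by value
        Integer cached = cache.get(key);
        if (cached != null) {
            hits++;
            return cached;
        }
        int result = pureMethod.applyAsInt(a, b);
        if (cache.size() < maxEntries) {
            cache.put(key, result);
        }
        return result;
    }
}
```

The sketch only pays off when `pureMethod` is more expensive than building the key and probing the map, which mirrors the cost condition stated above.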
Experimental Evaluation
• Experiments conducted using programs from the SPECjvm98 benchmark suite
• Metrics
– Static method purity – percentage of all methods in the call graph that are pure
– Dynamic method purity – percentage of methods reached at runtime that are pure
– Dynamic invocation purity – percentage of method invocations that are pure
– Dynamic bytecode purity – percentage of executed bytecode stream belonging to pure methods
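The three dynamic metrics can be computed from per-method profiles, as in the following sketch. The `MethodProfile` record of purity flag, invocation count, and executed bytecode count is an assumption for illustration, and purity is treated as a per-method property.

```java
import java.util.List;

// Hypothetical sketch; the profile layout is an assumption, not the paper's format.
final class PurityMetrics {
    static final class MethodProfile {
        final boolean pure;
        final long invocations;
        final long bytecodesExecuted;

        MethodProfile(boolean pure, long invocations, long bytecodesExecuted) {
            this.pure = pure;
            this.invocations = invocations;
            this.bytecodesExecuted = bytecodesExecuted;
        }
    }

    private static double percent(long part, long whole) {
        return whole == 0 ? 0.0 : 100.0 * part / whole;
    }

    // Percentage of reached methods that are pure.
    static double methodPurity(List<MethodProfile> reached) {
        long pure = 0;
        for (MethodProfile m : reached) {
            if (m.pure) pure++;
        }
        return percent(pure, reached.size());
    }

    // Percentage of method invocations that target pure methods.
    static double invocationPurity(List<MethodProfile> reached) {
        long pure = 0;
        long total = 0;
        for (MethodProfile m : reached) {
            total += m.invocations;
            if (m.pure) pure += m.invocations;
        }
        return percent(pure, total);
    }

    // Percentage of the executed bytecode stream belonging to pure methods.
    static double bytecodePurity(List<MethodProfile> reached) {
        long pure = 0;
        long total = 0;
        for (MethodProfile m : reached) {
            total += m.bytecodesExecuted;
            if (m.pure) pure += m.bytecodesExecuted;
        }
        return percent(pure, total);
    }
}
```

The three numbers can differ widely for the same run: a small pure method called very often raises invocation purity much more than bytecode purity.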
Experimental Evaluation: Static
• Experimental analysis includes both application code and the class library code used
• On average, 13% of methods are found to be strongly pure
• Not all methods are invoked at runtime; dynamically, 5-6% of reached methods are statically identified as pure
• Many of these methods are small (20 instructions or less) or are executed infrequently
Table 2. Strong Static Purity: Static methods row shows percentage of all methods in the call graph identified as statically pure. Dynamic methods row shows percentage of all dynamic method invocations that execute a statically pure method. Bytecode row shows the percentage of the bytecode stream that is executed by a statically pure method
Experimental Evaluation: Dynamic
• Strong dynamic purity is a weaker property than its static equivalent
• The first rows of Tables 3, 4, and 5 show an improvement over the runtime use of strong static purity in rows 2-4 of Table 2
• Table 3 shows up to 4% more pure methods reached with strong dynamic purity
• Some of these methods are invoked with significant frequency; Table 4 shows 13% more pure invocations for db
Experimental Evaluation: Dynamic (2)
Table 3. Dynamic method purity: All reached methods
Table 4. Dynamic invocation purity: Invoked methods that are pure for dynamic purity definitions
Table 5. Dynamic bytecode purity: Bytecode instruction streams that are pure for dynamic purity definitions
Experimental Evaluation: Dynamic (3)
• Reasons for impurity
Table 8. Reasons for dynamic impurity
Experimental Evaluation: Memoization
• The once-impure dynamic purity analysis is used, so a method is always invoked once prior to memoization
• Only applied to methods meeting cost-effectiveness criteria
Table 11. Memoized/memoizable methods: Minimum method size setting shown in far left column
Experimental Evaluation: Execution
Figure 3. Execution times: Minimum method size for memoization is set to 50
Conclusions
• Dynamic purity analyses identify considerable amounts of purity
• Actual program behavior is not predictable based only on static observations
• Little variation in purity over the benchmark suite
• May be the case that memoization is of limited use for non-functional languages
Questions