Program Analysis (470) - doc.ic.ac.ukherbert/teaching/00_TasterSlides.pdfProgram Analysis Program...

Program Analysis (470)

Herbert Wiklickyherbert@doc.ic.ac.uk

Department of ComputingImperial College London

Spring 2015

Program Analysis

Program analysis is an automated technique for finding outproperties of programs without having to execute them.

Static Analysis vs Dynamic Testing

Compiler OptimisationProgram VerificationSecurity Analysis

Program Analysis

Techniques

Some techniques used in program analysis include:

Data Flow AnalysisControl Flow AnalysisTypes and Effects SystemsAbstract Interpretation

Flemming Nielson, Hanne Riis Nielson and Chris Hankin:Principles of Program Analysis. Springer Verlag, 1999/2005.

Techniques

A First Example

Consider the following fragment in some procedural language.

1: m← 2;2: while n > 1 do3: m← m × n;4: n← n − 15: end while6: stop

[m← 2]1;while [n > 1]2 do

[m← m × n]3;[n← n − 1]4

end while[stop]5

We annotate a program such that it becomes clear about whatprogram point we are talking about.

A First Example

[m← 2]1;while [n > 1]2 do

[m← m × n]3;[n← n − 1]4

end while[stop]5

A First Example

[m← 2]1;while [n > 1]2 do

[m← m × n]3;[n← n − 1]4

end while[stop]5

A Parity Analysis

Claim: This program fragment always returns an even m,idependently of the initial values of m and n.

We can statically determine that in any circumstances the valueof m at the last statement will be even for any input n.

A program analysis, so-called parity analysis, can determinethis by propagating the even/odd or parity information forwardsform the start of the program.

A Parity Analysis

Properties

We will assign to each variable one of three properties:

even — the value is known to be evenodd — the value is known to be oddunknown — the parity of the value is unknown

For both variables m and n we record its parity at each stage ofthe computation (beginning of each statement).

Properties

A First Example

Executing the program with abstract values, parity, for m and n.

Important: We can restart the loop!

A First Example

1: m← 2; . unknown(m) – unknown(n)2: while n > 1 do3: m← m × n;4: n← n − 15: end while6: stop

A First Example

1: m← 2; . unknown(m) – unknown(n)2: while n > 1 do . even(m) – unknown(n)3: m← m × n;4: n← n − 15: end while6: stop

A First Example

1: m← 2; . unknown(m) – unknown(n)2: while n > 1 do . even(m) – unknown(n)3: m← m × n; . even(m) – unknown(n)4: n← n − 15: end while6: stop . even(m) – unknown(n)

A First Example

1: m← 2; . unknown(m) – unknown(n)2: while n > 1 do . even(m) – unknown(n)3: m← m × n; . even(m) – unknown(n)4: n← n − 1 . even(m) – unknown(n)5: end while6: stop . even(m) – unknown(n)

A First Example

1: m← 2; . unknown(m) – unknown(n)2: while n > 1 do . even(m) – unknown(n)3: m← m × n; . even(m) – unknown(n)4: n← n − 1 . even(m) – unknown(n)5: end while . even(m) – unknown(n)6: stop . even(m) – unknown(n)

A First Example

1: m← 2; . unknown(m) – unknown(n)2: while n > 1 do . even(m) – unknown(n)3: m← m × n;4: n← n − 15: end while6: stop . even(m) – unknown(n)

A First Example

1: m← 2; . unknown(m) – unknown(n)2: while n > 1 do . even(m) – unknown(n)3: m← m × n; . even(m) – unknown(n)4: n← n − 15: end while6: stop . even(m) – unknown(n)

A First Example

1: m← 2; . unknown(m) – unknown(n)2: while n > 1 do . even(m) – unknown(n)3: m← m × n; . even(m) – unknown(n)4: n← n − 1 . even(m) – unknown(n)5: end while6: stop . even(m) – unknown(n)

A First Example

The first program computes 2 times the factorial for any positivevalue of n. Replacing ‘2’ by ‘1’ in the first statement gives:

i.e. the factorial – but then the program analysis is unable to tellus anything about the parity of m at the end of the execution.

A First Example

1: m← 1; . unknown(m) – unknown(n)2: while n > 1 do3: m← m × n;4: n← n − 15: end while6: stop

A First Example

1: m← 1; . unknown(m) – unknown(n)2: while n > 1 do . odd(m) – unknown(n)3: m← m × n;4: n← n − 15: end while6: stop

A First Example

1: m← 1; . unknown(m) – unknown(n)2: while n > 1 do . odd(m) – unknown(n)3: m← m × n; . odd(m) – unknown(n)4: n← n − 15: end while6: stop . odd(m) – unknown(n)

A First Example

1: m← 1; . unknown(m) – unknown(n)2: while n > 1 do . odd(m) – unknown(n)3: m← m × n; . odd(m) – unknown(n)4: n← n − 1 . unknown(m) – unknown(n)5: end while6: stop . odd(m) – unknown(n)

A First Example

1: m← 1; . unknown(m) – unknown(n)2: while n > 1 do . odd(m) – unknown(n)3: m← m × n; . odd(m) – unknown(n)4: n← n − 1 . unknown(m) – unknown(n)5: end while . unknown(m) – unknown(n)6: stop . odd(m) – unknown(n)

A First Example

1: m← 1; . unknown(m) – unknown(n)2: while n > 1 do . unknown(m) – unknown(n)3: m← m × n;4: n← n − 15: end while6: stop . odd(m) – unknown(n)

A First Example

1: m← 1; . unknown(m) – unknown(n)2: while n > 1 do . unknown(m) – unknown(n)3: m← m × n; . unknown(m) – unknown(n)4: n← n − 15: end while6: stop . unknown(m) – unknown(n)

A First Example

1: m← 1; . unknown(m) – unknown(n)2: while n > 1 do . unknown(m) – unknown(n)3: m← m × n; . unknown(m) – unknown(n)4: n← n − 15: end while6: stop . unknown(m) – unknown(n)

Loss of Precision

The analysis of the new program does not give a satisfyingresult because:

m could be even — if the input n > 1, orm could be odd — if the input n ≤ 1.

However, even if we fix/require the input to be positive andeven — e.g. by some suitable conditional assignment — theprogram analysis still might not be able to accurately predictthat m will be even at statement 5.

Loss of Precision

Safe Approximations

Such a loss of precession is a common feature of programanalysis: many properties that we are interested in areessentially undecidable and therefore we cannot hope todetect (all of) them accurately.

We only aim to ensure that the answers/results we obtain byprogram analysis are at least safe, i.e.

yes means definitely yes,no means maybe no.

Safe Approximations

Facets of Program Analysis

We can identify the following facets of program analysis whichwe will mostely cover in this course:

specification,implementation,correctness,applications.

Data Flow Analysis

The starting point for data flow analysis is a representation ofthe control flow graph of the program: the nodes of such agraph may represent individual statements – as in a flowchart –or sequences of statements; arcs specify how control may bepassed during program execution.

The data flow analysis is usually specified as a set of equationswhich associate analysis information with program points whichcorrespond to the nodes in the control flow graph. Thisinformation may be propagated forwards through the program(e.g. parity analysis) or backwards.

When the control flow graph is not explicitly given, we need apreliminary control flow analysis

Data Flow Analysis

Control Flow Information

[x:=x-1]4

[z:=z*y]3

[x>0]2

[z:=1]1?

This allows us to determine the predecessors pred andsuccessors succ of each statement, e.g. pred(2) = {1,4}.

Control Flow Information

[x:=x-1]4

[z:=z*y]3

[x>0]2

[z:=1]1?

This allows us to determine the predecessors pred andsuccessors succ of each statement, e.g. pred(2) = {1,4}.

Reaching Definition

Reaching Definition (RD) analysis determines which set ofdefinitions (i.e. assignments) are current when control reachesa certain program point p.

The analysis can be specified by equations of the form:

RDentry(p) =

RDinit if p is initial⋃

p′∈pred(p)

RDexit(p′) otherwise

RDexit(p) = (RDentry(p)\killRD(p)) ∪ genRD(p)

Reaching Definition

RDentry(p) =

p′∈pred(p)

Reaching Definition

RDentry(p) =

p′∈pred(p)

Reaching Definition

RDentry(p) =

p′∈pred(p)

Analysis Information

At each program point some definitions get “killed” (those whichdefine the same variable as at the program point) while othersare “generated”.

A suitable representation for properties are sets of pairs, whereeach pair contains a variable x and a program point p: themeaning of the pair (x ,p) is that the assignment to x at point pis the current one. The initial value in this case is:

RDinit = {(x , ?) | x is a variable in the program}

Reaching Definitions is a forward analysis and we require theleast (most precise) solutions to the set of equations.

Equations & Solutions

For our initial program fragment

[m← 2]1;while [n > 1]2 do

[m← m × n]3;[n← n − 1]4

end while[stop]5

some of the RD equations we get are:

RDentry(1) = {(m, ?), (n, ?)}RDentry(2) = RDexit(1) ∪ RDexit(4)

For our initial program fragment

[m← 2]1;while [n > 1]2 do

[m← m × n]3;[n← n − 1]4

end while[stop]5

some of the RD equations we get are:

RDentry RDexit1 {(m, ?), (n, ?)} {(m,1), (n, ?)}2 {(m,1), (m,3), (n, ?), (n,4)} {(m,1), (m,3), (n, ?), (n,4)}3 {(m,1), (m,3), (n, ?), (n,4)} {(m,3), (n, ?), (n,4)}4 {(m,3), (n, ?), (n,4)} {(m,3), (n,4)}5 {(m,1), (m,3), (n, ?), (n,4)} {(m,1), (m,3), (n, ?), (n,4)}

Solving Equations

How can we construct solution to the data flow equations?Answer: Iteratively, by improving approximations/guesses.

INPUT: Control Flow Graphi.e. initial(p), pred(p).

OUTPUT: Reaching Definitions RD.

METHOD: Step 1: InitialisationStep 2: Iteration

Solving Equations

Some Examples

Some examples of data flow analyses — and the possibleapplications and optimisations they allow for — are:

Reaching Definitions — Constant FoldingAvailable Expressions — Avoid Re-computationsVery Busy Expressions — HoistingLive Variables — Dead Code Elimination

Information Flow — Computer SecurityShape Analyis — Pointer Analysis — etc.

Some Examples

Code Optimisation

To illustrate the ideas we shall show how Reaching Definitionscan be used to perform Constant Folding.

There are two ingredients to this:

Replace the use of a variable in some expression by aconstant if it is known that the value of that variable willalways be a constant.Simplify an expression by partially evaluating it:subexpressions that contain no variables can be evaluated.

Code Optimisation

Constant Folding I

RD ` [ x := a ]` B [ x := a[y 7→ n] ]`

y ∈ FV(a) ∧ (y , ?) /∈ RDentry(`) ∧∀(y ′, `′) ∈ RDentry(`) :

y ′ = y ⇒ [. . .]`′= [ y := n ]`

RD ` [ x := a ]` B [ x := n ]`

FV(a) = ∅ ∧ a is not constant ∧a evaluates to n

Constant Folding I

RD ` [ x := a ]` B [ x := a[y 7→ n] ]`

y ∈ FV(a) ∧ (y , ?) /∈ RDentry(`) ∧∀(y ′, `′) ∈ RDentry(`) :

y ′ = y ⇒ [. . .]`′= [ y := n ]`

RD ` [ x := a ]` B [ x := n ]`

FV(a) = ∅ ∧ a is not constant ∧a evaluates to n

Constant Folding II

RD ` S1 B S′1

RD ` S1;S2 B S′1;S2

RD ` S2 B S′2

RD ` S1;S2 B S1;S′2

RD ` S1 B S′1

RD ` if [b]` then S1 else S2 B if [b]` then S′1 else S2

RD ` S2 B S′2

RD ` if [b]` then S1 else S2 B if [b]` then S1 else S′2

RD ` S B S′

RD ` while [b]` do S B while [b]` do S′

Constant Folding II

RD ` S1 B S′1

RD ` S1;S2 B S′1;S2

RD ` S2 B S′2

RD ` S1;S2 B S1;S′2

RD ` S1 B S′1

RD ` S2 B S′2

RD ` S B S′

Constant Folding II

RD ` S1 B S′1

RD ` S1;S2 B S′1;S2

RD ` S2 B S′2

RD ` S1;S2 B S1;S′2

RD ` S1 B S′1

RD ` S2 B S′2

RD ` S B S′

Constant Folding II

RD ` S1 B S′1

RD ` S1;S2 B S′1;S2

RD ` S2 B S′2

RD ` S1;S2 B S1;S′2

RD ` S1 B S′1

RD ` S2 B S′2

RD ` S B S′

Constant Folding II

RD ` S1 B S′1

RD ` S1;S2 B S′1;S2

RD ` S2 B S′2

RD ` S1;S2 B S1;S′2

RD ` S1 B S′1

RD ` S2 B S′2

RD ` S B S′

An Example

To illustrate the use of the transformation consider:

[ x := 10 ]1; [ y := x + 10 ]2; [ z := y + 10 ]3

The (least) solution to the Reaching Definition analysis is:

RDentry(1) = {(x , ?), (y , ?), (z, ?)}RDexit(1) = {(x ,1), (y , ?), (z, ?)}

RDentry(2) = {(x ,1), (y , ?), (z, ?)}RDexit(2) = {(x ,1), (y ,2), (z, ?)}

RDentry(3) = {(x ,1), (y ,2), (z, ?)}RDexit(3) = {(x ,1), (y ,2), (z,3)}

An Example

To illustrate the use of the transformation consider:

[ x := 10 ]1; [ y := x + 10 ]2; [ z := y + 10 ]3

The (least) solution to the Reaching Definition analysis is:

RDentry(1) = {(x , ?), (y , ?), (z, ?)}RDexit(1) = {(x ,1), (y , ?), (z, ?)}

RDentry(2) = {(x ,1), (y , ?), (z, ?)}RDexit(2) = {(x ,1), (y ,2), (z, ?)}

RDentry(3) = {(x ,1), (y ,2), (z, ?)}RDexit(3) = {(x ,1), (y ,2), (z,3)}

Constant Folding

We have for example the following:

RD ` [ y := x + 10 ]2 B [ y := 10 + 10 ]2

and therfore the rules for sequential composition allow us to dothe following transformation:

RD ` [ x := 10 ]1; [ y := x + 10 ]2; [ z := y + 10 ]3 B[ x := 10 ]1; [ y := 10 + 10 ]2; [ z := y + 10 ]3

Constant Folding

We have for example the following:

RD ` [ y := x + 10 ]2 B [ y := 10 + 10 ]2

and therfore the rules for sequential composition allow us to dothe following transformation:

RD ` [ x := 10 ]1; [ y := x + 10 ]2; [ z := y + 10 ]3 B[ x := 10 ]1; [ y := 10 + 10 ]2; [ z := y + 10 ]3

Transformation

We can continue this kind of transformation and obtain:

RD ` [ x := 10 ]1; [ y := x + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 10 + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 20 + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 30 ]3

after which no more steps are possible.

Transformation

RD ` [ x := 10 ]1; [ y := x + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 10 + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 20 + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 30 ]3

Transformation

RD ` [ x := 10 ]1; [ y := x + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 10 + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 20 + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 30 ]3

Transformation

RD ` [ x := 10 ]1; [ y := x + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 10 + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 20 + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 30 ]3

Transformation

RD ` [ x := 10 ]1; [ y := x + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 10 + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 20 + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 30 ]3

Transformation

RD ` [ x := 10 ]1; [ y := x + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 10 + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 20 + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 30 ]3

Transformation

RD ` [ x := 10 ]1; [ y := x + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 10 + 10 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := y + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 20 + 10 ]3

B [ x := 10 ]1; [ y := 20 ]2; [ z := 30 ]3

Additional Issues

The above example shows that optimisation is in general theresult of a number of successive transformations.

RD ` S1 B S2 B . . . B Sn.

This could be costly because one S1 has been transformed intoS2 we might have to re-compute the Reaching Definitionanalysis before the next transformation step can be done.

It could also be the case that different sequences oftransformations either lead to different end results or are of verydifferent length.

Additional Issues

RD ` S1 B S2 B . . . B Sn.

Additional Issues

RD ` S1 B S2 B . . . B Sn.

Lecture: Executive Summary

Data Flow AnalyisMonotone FrameworksControl Flow AnalysisAbstract InterpretationFurther Topics

Lecture: Milestones

Coursework I: Mid February – TestCoursework II: Early March – Test

Examination: Sometime in Week 11

Lecture: Milestones

Program Analysis (470) - doc.ic.ac.ukherbert/teaching/00_TasterSlides.pdfProgram Analysis Program...

Documents

Program Synthesis for Program Analysis

Gene ﬁnding and gene structure prediction - Vital-IT · Gene ﬁnding and gene structure prediction Lorenzo Cerutti Swiss Institute of Bioinformatics EMBnet course, 2004 Outline

Independent Component Analysis: Algorithms and …mlsp.cs.cmu.edu/courses/fall2012/lectures/ICA_Hyvarinen.pdfsignal recordings, ﬁnding hidden factors in ﬁnancial time s eries,

Algorithmic Solutions for Envy-Free Cake Cutting · finding an envy-free cake cutting is the same as finding a Brouwer’s fixed point, or finding a Nash equilibrium or a market

Audience Costs: An Historical Analysis · the audience costs theory plausible, so even if a statistical analysis supports those hypotheses, that ﬁnding would not convince these

Lecture 15b Eigenbases (finding eigenbases) · 2020. 12. 31. · Lecture 15b Eigenbases (finding eigenbases) Slide 2/8 Review Recall: Definition (Eigenbases) An eigenbasis for

Supervised Principal Component Analysis: Visualization ... · called “Supervised Principal Component Analysis (Supervised PCA)”. It is a generalization of PCA which aims at ﬁnding

CCRMA MIR Workshop 2014 Beat-ﬁnding and Rhythm Analysis · PDF fileCCRMA MIR Workshop 2014 Beat-ﬁnding and Rhythm Analysis. Outline • Modelling Rhythm Cognition. ... (Desain

Pentesting Java/J2EE, ﬁnding remote holes - HITBconference.hitb.org/hitbsecconf2006kl/materials/DAY 1 - Marc... · Pentesting Java/J2EE, ﬁnding remote holes Marc Schoenefeld University

VULNERABILITY MODELS - UniTrentoeprints-phd.biblio.unitn.it/1282/1/nguyen-thesis.pdf · 2014-06-17 · The extensive experimental analysis reveals an interesting ﬁnding about the

Postmortem Program Analysis with Hardware-Enhanced Post ... · and data ﬂows are given. As such, the research on post-mortem program analysis primarily focuses on ﬁnding out control

Ockham’s razor, empirical complexity, and truth-finding ... · Ockham’s razor, empirical complexity, and truth-finding efficiency Kevin T. Kelly Department of Philosophy, Carnegie

Heuristic evaluation: Comparing ways of ﬁnding and reporting …527675/FULLTEXT01.pdf · 2013-01-22 · Heuristic evaluation: Comparing ways of ﬁnding and reporting usability

design Rule-based topology ﬁnding of patterns towards

Statically Balanced Tensegrity Mechanisms · Statically Balanced Tensegrity Mechanisms A literature review ... The ﬁrst stage in tensegrity design and analysis is form ﬁnding,

Program Analysis

Motif ﬁnding - fu-berlin.de

Newtonian Program Analysis via Tensor Product1.Introduction Many interprocedural dataflow-analysis problems can be formu-lated as the problem of finding the least fixed-point of

Comparison of methods for ﬁnding saddle points without …theory.cm.utexas.edu/henkelman/pubs/olsen04_9776.pdf · 2007-12-17 · Comparison of methods for ﬁnding saddle points

Structural form-ﬁnding of Auxetic Materials using Graphic