
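• Selective context-sensitive analysis - Pre-analysis: distinguishes all contexts; for values, distinguishes only positive numbers

1. Motivation 3. Applications

Program Analysis That Selectively Improves Precision
Hakjoo Oh, Wonchan Lee, Kihong Heo, Hongseok Yang, Kwangkeun Yi
Seoul National University, University of Oxford

4. Experimental Results

• Uniformly increasing precision is inefficient - It consumes enormous time and memory - Even with higher precision, there is a limit to the properties that can be proved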

Static Analysis Scalability Improvement

Sound-&-global analyzer class

Kwangkeun Yi Collage of Static Analysis

• Time to consider improving precision - Analyzes all of 1 million lines of C code, but with many false alarms, e.g., make-3.76.1: 27K lines, 1,872 alarms

let OrdList = (| ··· : ∀A. Ord<A> ⇒ Ord<List<A>> |) in
let OrdInt = (| ··· : Ord<Int> |) in
implicit {OrdInt, OrdList} in
sort<List<Int>> ?(Ord<List<Int>>) [[2, 5], [1, 3]]

http://rosaec.snu.ac.kr

ProgrammingResearch Laboratory

http://ropas.snu.ac.kr

The Implicit Calculus

Bruno C. d. S. Oliveira (1), Tom Schrijvers (2), Wontae Choi (1), Wonchan Lee (1), Kwangkeun Yi (1)
(1) Seoul National University, (2) Universiteit Gent

A New Foundation for Generic Programming

2. The Implicit Calculus λ⇒

• Calculus of the essence of GP: rules, scoping, and type-directed resolution

In the paper
• Type system
• Elaboration semantics to System F
• Higher-order rules and partial resolution
• Source language and its translation to λ⇒

1. Generic Programming

• Decoupling algorithms from types

sort [3,1,2] // [1,2,3]
sort ['c','a','b'] // ['a','b','c']
sort [[2,5],[1,3]] // [[1,3],[2,5]]

• Decoupling by parametrization

• Implicit instantiation

sort<A>: Ord<A> ⇒ List<A> → List<A>

sort< ? > ? [3,1,2]

sort<Int> ? [3,1,2]

sort<Int> OrdInt [3,1,2]

type inference

elements of type A should have order!

• Well-known GP mechanisms: Haskell Type Classes, C++0x Concepts, Scala Implicits

• Queries and Resolution

➾

OrdInt gives order between Ints

resolution

e ::= ?ρ | (|e : ρ|) | e with e : ρ | ···    (queries: ?ρ; rules: (|e : ρ|); scoping: with)

OrdInt : Ord<Int>

OrdList : ∀A. Ord<A> ⇒ Ord<List<A>>

ρ ::= ∀α⃗. ρ̄ ⇒ τ

rule environment {OrdInt, OrdList} ➾ sort<List<Int>> (OrdList OrdInt) [[2, 5], [1, 3]]

• Rules and Scoping

OrdList

OrdInt

?(Ord<List<Int>>)

(| ··· : ∀A. Ord<A> ⇒ Ord<List<A>> |)    (| ··· : Ord<Int> |)

OrdList    OrdInt    implicit ... in

with

• Translation from Source Language

simple case: sort<Int> ?(Ord<Int>) [3, 1, 2] ➾ sort<Int> OrdInt [3, 1, 2]

recursive case: sort<List<Int>> ?(Ord<List<Int>>) [[2, 5], [1, 3]]

source language: sort [[2,5],[1,3]]
λ⇒: sort<List<Int>> ?(Ord<List<Int>>) [[2, 5], [1, 3]]

syntactic sugar: implicit e : ρ in e1 : τ  def=  (|e1 : ρ̄ ⇒ τ|) with e : ρ

implicit in

with

formalized, but restrictive general, but never formalized



















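- Main causes of false alarms: summarizing (merging) function contexts and variable relations

• Selective relational analysis (octagon analysis) - Pre-analysis: tracks all variable relations; for differences, distinguishes only whether they are finite

5. Conclusion

• Selectively precise analysis applied - Efficient context-sensitive and relational analysis - Planned extension to other aspects such as flow-sensitivity and combined context & relation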


1 char* xmalloc (int n) {

2 return malloc(n);

3 }

4

5 void f (int size) {

6 p = xmalloc (size);

7 assert (sizeof(p) > 1); // Query 1

8 q = xmalloc (input());

9 assert (sizeof(q) > 1); // Query 2

10 }

11

12 int main() {

13 f (8);

14 f (16);

15 }

Figure 1. Example Program

of this program calls procedures f and g. Procedure multi_glob is called in f and g with different argument values.

The program contains two queries. The first query at line 5 asks whether p points to a buffer of size larger than 1. The other query at line 7 asks a similar question, but this time for the pointer variable q. Note that the first query always holds, but the second query is not necessarily true.

Context-insensitive analysis If we analyze the program using a context-insensitive interval analysis, we cannot prove the first query. Since the analysis is insensitive to calling contexts, it estimates the effect of xmalloc under all the possible inputs, and uses this same estimation as the result of every call. Note that an input to xmalloc at line 6 can be any integer, and the analysis concludes that xmalloc allocates a buffer of size in $[-\infty, +\infty]$.
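The loss of precision can be reproduced with a few lines of Python. This is only a sketch under assumptions: intervals are modeled as pairs of floats, and the three argument intervals are the ones that would reach xmalloc in the program of Figure 1 (the constants 8 and 16 passed through f, and the unknown result of input()). The names `join` and `proves_query` are hypothetical helpers, not part of any analyzer.

```python
import math

INF = math.inf

def join(a, b):
    """Least upper bound of two intervals (lo, hi)."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def proves_query(size):
    """The query 'buffer size > 1' is proved only if the lower bound exceeds 1."""
    return size[0] > 1

# Argument intervals reaching xmalloc (assumed from Figure 1).
args = [(8, 8), (16, 16), (-INF, INF)]

# Context-insensitive: one merged summary is used for every call.
insensitive = args[0]
for a in args[1:]:
    insensitive = join(insensitive, a)

# Context-sensitive: every calling context keeps its own result.
sensitive = [proves_query(a) for a in args]
```

With the merged summary the first query cannot be proved, whereas keeping the contexts apart proves it for the two constant arguments.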

Context-sensitive analysis A natural way to fix this precision issue is to increase the context-sensitivity. One popular approach is k-CFA analysis [16, 17]. It uses sequences of call sites up to length k to distinguish calling contexts of a procedure, and analyzes the procedure separately for such distinguished calling contexts. For instance, 3-CFA analyzes the procedure xmalloc separately for each of the following calling contexts:

4 · 10 · 14    4 · 10 · 15    4 · 11 · 16    4 · 11 · 17
6 · 10 · 14    6 · 10 · 15    6 · 11 · 16    6 · 11 · 17    (1)

Here a · b · c denotes a sequence of call sites a, b and c (we use the line numbers as call sites), with a being the most recent call. Note that the 3-CFA analysis can prove the first query: the analysis analyzes the first four contexts separately and infers that a buffer of size greater than 1 gets allocated under these calling contexts.

Need of selective context-sensitivity However, using such a "uniform" context-sensitivity is not ideal. It is often too expensive to run such an analysis with high enough k, such as the $k \geq 3$ that our example needs. More importantly, for many procedure calls, increasing context-sensitivity does not help: either it does not improve the analysis results of these calls, or the increased precision is not useful for answering queries. For instance, at the second query, for every $k \geq 0$, the k-CFA analysis concludes that p points to a buffer of size $[-\infty, +\infty]$. Also, it is unnecessary to analyze g separately for call sites 16 and 17, because those two calls have the same effect on the query.

Our selective context-sensitivity With our approach, an analysis can analyze procedures with only the needed context-sensitivity. It analyzes a procedure separately for a calling context if doing so is likely to improve the precision of the analysis and reduce false alarms in its answers for given queries. For the example program, our analysis first predicts that increasing context-sensitivity is unlikely to help answer the second query (line 7) accurately, but is likely to do so for the first query (line 5). Next, the analysis finds out that we can bring the full benefit of context-sensitivity for the first query by distinguishing only the following four types of calling contexts of xmalloc:

4 · 10 · 14,    4 · 10 · 15,    4 · 11,    all the other contexts    (2)

Note that contexts 4 · 11 · 16 and 4 · 11 · 17 are merged into a single context 4 · 11. This merging happens because the analysis figures out that the two callers of g (lines 16 and 17) do not provide any useful information for resolving the first query. Finally, the analysis analyzes the given program using the interval domain while distinguishing the calling contexts above and their suffixes (i.e., 10 · 14, 10 · 15, 14, 15, 11). This selective context-sensitive analysis is able to prove the first query.

Impact pre-analysis Our key idea is to approximate the main analysis under full context-sensitivity using a pre-analysis, and estimate the impact of context-sensitivity on the results of the main analysis. This impact pre-analysis uses a simple abstract domain and transfer functions, and can be run efficiently even with full context-sensitivity.

For instance, we approximate the interval analysis in this example using a pre-analysis with two abstract values: $\bigstar$ and $\top$. Here $\top$ means all intervals, and $\bigstar$ means intervals of the form $[l, u]$ with $0 \le l \le u$. A typical abstract state in this domain is $[x : \top, y : \bigstar]$, which means the following set of states in the interval domain:

\[ \{ [x : [l_x, u_x],\ y : [l_y, u_y]] \mid l_x \le u_x \wedge 0 \le l_y \le u_y \}. \]

This simple abstract domain of the pre-analysis is chosen because we are interested in showing the absence of buffer overruns and the analysis proves such properties only when it finds non-negative intervals for buffer sizes and indices.

We run this pre-analysis under full context-sensitivity (i.e., $\infty$-CFA). For our example program, we obtain a summary of the procedure xmalloc with eight entries, each corresponding to a different context in (1). The third column of the table below shows this summary:

Size of the allocated buffer in xmalloc

Contexts       Main analysis            Pre-analysis
4 · 10 · 14    $[8, 8]$                 $\bigstar$
4 · 10 · 15    $[16, 16]$               $\bigstar$
4 · 11 · 16    $[4, 4]$                 $\bigstar$
4 · 11 · 17    $[4, 4]$                 $\bigstar$
6 · 10 · 14    $[-\infty, +\infty]$     $\top$
6 · 10 · 15    $[-\infty, +\infty]$     $\top$
6 · 11 · 16    $[-\infty, +\infty]$     $\top$
6 · 11 · 17    $[-\infty, +\infty]$     $\top$

The second column of the table shows the results of the interval analysis with full context-sensitivity. Note that the pre-analysis in this case precisely estimates the impact of context-sensitivity: it identifies calling contexts (i.e., the first four contexts in the table) where the interval analysis accurately tracks the size of the allocated buffer in xmalloc under full context-sensitivity. In general, our pre-analysis might lose precision and use $\top$ more often than in the ideal case. However, even when such approximation occurs, it does so only in a sound manner: if the pre-analysis computes $\bigstar$ for the size of a buffer, the interval analysis under full context-sensitivity is guaranteed to compute a non-negative interval.
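The relation between the two columns can be checked with a short Python sketch. It is an illustration only: the real pre-analysis computes its values directly and never looks at the main-analysis results, whereas the hypothetical function `alpha` below simply abstracts each interval of the table to confirm that the table is consistent with the soundness guarantee. The strings "star" and "top" are stand-ins for the two abstract values.

```python
import math

INF = math.inf
STAR, TOP = "star", "top"   # stand-ins for the two abstract values

def alpha(interval):
    """Abstract an interval: non-negative intervals map to STAR, the rest to TOP."""
    lo, _hi = interval
    return STAR if lo >= 0 else TOP

# Main-analysis column of the table, keyed by calling context (most recent call first).
main_analysis = {
    (4, 10, 14): (8, 8),
    (4, 10, 15): (16, 16),
    (4, 11, 16): (4, 4),
    (4, 11, 17): (4, 4),
    (6, 10, 14): (-INF, INF),
    (6, 10, 15): (-INF, INF),
    (6, 11, 16): (-INF, INF),
    (6, 11, 17): (-INF, INF),
}

pre_analysis = {ctx: alpha(iv) for ctx, iv in main_analysis.items()}

# Contexts worth distinguishing: those where the pre-analysis reports STAR.
promising = [ctx for ctx, v in pre_analysis.items() if v == STAR]
```

Every context in `promising` has a non-negative interval in the main analysis, which is the property the soundness condition guarantees.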

Use of pre-analysis results Next, from the pre-analysis results, we select calling contexts that help improve the precision regarding given queries. We first identify queries whose expressions are assigned $\bigstar$ in the pre-analysis run. In our example, the pre-

2 2014/1/9

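[Figure 2: control-flow graph with nodes 1: x = 1, 2: call f, 3: y = x, 4: call g, 5: z = y+1, 6: z > 0?, 7: y = 10, 8: call g, spread over procedures m, f, g, h. Calling contexts: $\kappa_0$, $2 \cdot \kappa_0$, $\{4 \cdot 2 \cdot \kappa_0,\ 8 \cdot \kappa_1\}$, $\kappa_1$. Context selector: $K = \{m \mapsto \epsilon,\ f \mapsto \{2, \epsilon\},\ g \mapsto \{4 \cdot 2, 8\},\ h \mapsto \epsilon\}$.]

Figure 2. Example context selector. Gray and black nodes in CFG are source and query points, respectively.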

context $\kappa_0$. We require this condition because our selective context-sensitive analysis aims at distinguishing only the calls after passing the sources of dependency and analyzing context-insensitively those encountered before reaching those sources, which do not contribute to the query. This well-formedness assumption is not a strong restriction, and its violation almost never happens in practice. We did not observe any violation of the assumption in our benchmark programs (Section 7). If the program is not well-formed with respect to a query, then we simply ignore that query.

Let us explain the condition with an example. Suppose that $\kappa_0 = c_3 \cdot c_2 \cdot c_1$ is the initial context at $c_0$ and $\kappa_i = c_1$ is the context at $c_i$. Suppose further that $c_i$ is a call node. Then, our condition requires that $c_i$ should not be one of call sites $c_1$, $c_2$, and $c_3$. Formally, the condition is defined as follows:

We say the given program is well-formed with respect to the query $(c_q, x_q)$ iff for every $(c_0, x_0) \in \Phi_{(c_q, x_q)}$ and its valid value-flow path

\[ ((c_0, \kappa_0), x_0) \hookrightarrow_{K_\infty} \cdots \hookrightarrow_{K_\infty} ((c_q, \kappa_q), x_q), \]

for all $0 \le i \le n$ such that $c_i \in \mathbb{C}_c$, $c_i$ is not included in the initial calling context $\kappa_0$; i.e.,

\[ c_i \notin \kappa_0 \tag{11} \]

where we write $c \in \kappa$ when there exists some $\kappa'$ such that $c \cdot \kappa'$ is a suffix of $\kappa$.

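In summary, for the path in (10), collecting contexts

\[ \{\kappa_0 \setminus \kappa_0,\ \ldots,\ \kappa_q \setminus \kappa_0\} \]

gives all the necessary partial calling contexts, where each $\kappa_i \setminus \kappa_0$ (the context $\kappa_i$ with its suffix $\kappa_0$ removed) belongs to the calling contexts of procedure $\mathrm{fid}(c_i)$. Thus, we define the context selector for the dependency path (10) as follows:

Definition 9 ($K_p$, Context Selector for Path $p$). Let $p$ be a dependency path from a source $(c_0, x_0)$ to query $(c_q, x_q)$:

\[ p = ((c_0, \kappa_0), x_0) \hookrightarrow_{K_\infty} \cdots \hookrightarrow_{K_\infty} ((c_q, \kappa_q), x_q), \]

where $\kappa_0$ is an initial context at $c_0$ such that $(\iota, \epsilon) \to^*_{K_\infty} (c_0, \kappa_0)$. The context selector $K_p$ for the path is defined as

\[ K_p = \lambda f.\ \{\kappa_i \setminus \kappa_0 \mid \mathrm{fid}(c_i) = f \wedge ((c_i, \kappa_i), \_) \in p\}. \]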

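Example 6. From the path $p_1$ in Example 5, the collection of $\kappa_i$ is $\{\kappa_0,\ 2 \cdot \kappa_0,\ 4 \cdot 2 \cdot \kappa_0\}$ (see Figure 2). Hence, the collection of $\kappa_i \setminus \kappa_0$ is $\{\epsilon,\ 2,\ 4 \cdot 2\}$, where $\epsilon$ belongs to procedure m, 2 to f, and $4 \cdot 2$ to g. The case of path $p_2$ is similar. Thus, $K_{p_1}$ and $K_{p_2}$ are:

\[ K_{p_1} = \begin{bmatrix} m \mapsto \{\epsilon\} \\ f \mapsto \{2\} \\ g \mapsto \{4 \cdot 2\} \end{bmatrix} \qquad K_{p_2} = \begin{bmatrix} h \mapsto \{\epsilon\} \\ g \mapsto \{8\} \end{bmatrix} \]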

Then, the final context selector $K$ is the union of the $K_p$'s:

Definition 10 ($K$, Context Selector). Let $(c_q, x_q)$ be a query. The context selector $K \in \mathbb{F} \to \wp(\mathbb{C}_c^*)$ for our selective analysis is:

\[ K(f) = E(f) \cup \bigcup \{K_p(f) \mid p \in \mathrm{Paths}_{(c_q, x_q)}\} \tag{12} \]

where $E(f) = \{\epsilon\}$ if $f \neq \mathrm{fid}(c_q)$; and otherwise, $E(f) = \emptyset$.
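Definitions 9 and 10 can be sketched in Python on the data of Example 6. The encoding is an assumption made for illustration: a calling context is a tuple of call sites with the most recent call first, a dependency path is reduced to a list of (procedure, context) pairs, and the concrete initial contexts `k0` and `k1` are arbitrary placeholders, since the example leaves them symbolic. The query procedure is taken to be g, which is what Figure 2 suggests.

```python
def strip_suffix(ctx, init):
    """Remove the initial context `init` from the end of `ctx`."""
    cut = len(ctx) - len(init)
    assert cut >= 0 and ctx[cut:] == init, "init must be a suffix of ctx"
    return ctx[:cut]

def selector_for_path(path, init):
    """Definition 9 (sketch): partial contexts on one path, grouped by procedure."""
    sel = {}
    for proc, ctx in path:
        sel.setdefault(proc, set()).add(strip_suffix(ctx, init))
    return sel

def context_selector(procs, query_proc, paths):
    """Definition 10 (sketch): union over all paths, plus () for non-query procedures."""
    K = {f: (set() if f == query_proc else {()}) for f in procs}
    for path, init in paths:
        for f, ctxs in selector_for_path(path, init).items():
            K[f] |= ctxs
    return K

# Placeholder initial contexts (hypothetical values; symbolic in the example).
k0, k1 = (90,), (91,)
p1 = [("m", k0), ("f", (2,) + k0), ("g", (4, 2) + k0)]
p2 = [("h", k1), ("g", (8,) + k1)]

K = context_selector(["m", "f", "g", "h"], "g", [(p1, k0), (p2, k1)])
```

The empty tuple `()` stands for the empty context. The resulting `K` agrees with the context selector shown in Figure 2.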

Running selective context-sensitive main analysis Finally, we run the main analysis with selective context-sensitivity $K$ defined by the result of the impact pre-analysis. The following proposition states that the pre-analysis-guided context-sensitivity ($K$) manages to pay off at the selective main analysis, although the pre-analysis is fully context-sensitive and the main analysis is not.

Proposition 1 (Impact Realization). Let $PA_{K_\infty} \in \mathbb{C} \to \mathbb{S}^\sharp$ be the result of the impact pre-analysis (Definition 5). Let $q \in Q^\sharp$ be a selected query (8). Let $K$ be the context selector for $q$ (Definition 10) defined using the pre-analysis result $PA_{K_\infty}$. Let $MA_K \in \mathbb{C}_K \to \mathbb{S}$ be the main analysis result with the context selector $K$. Then, the selective main analysis is at least as precise as the fully context-sensitive pre-analysis for the selected query $q$:

\[ MA_K \sqsubseteq_q PA_{K_\infty} \]

where $MA_K \sqsubseteq_q PA_{K_\infty}$ iff (letting $q = (c, x)$)

\[ \forall \kappa \in K(\mathrm{fid}(c)).\ MA_K(c, \kappa) \in \gamma(\top[x \mapsto PA_{K_\infty}(c)(x)]). \]

This impact realization holds thanks to two key properties. First, our selective context-sensitivity $K$ (Definition 10) distinguishes all the calling contexts that matter for the queries selected by the pre-analysis. Second, the main analysis designed in Section 4 isolates these distinguished contexts from other undistinguished contexts ($\epsilon$), ensuring that spurious flows caused by merging contexts never adversely affect the precision of the selected query.

6. Application to Selective Relational Analysis

A general principle behind our method is that we can selectively improve the precision of the analysis by using an impact pre-analysis that estimates the main static analysis at its maximal precision. In this section, we use the same principle to develop a selective relational analysis with the octagon domain [11].

Overview Consider the following code snippet:

1 int a = b;

2 int c = input(); // User input

3 for (i = 0; i < b; i++) {

4 assert (i < a); // Query 1

5 assert (i < c); // Query 2

6 }

The first query at line 4 always holds but the second one at line 5 is not necessarily true.

A fully relational octagon analysis, which tracks constraints of the form $\pm x \pm y \le c$ (where $c \in \mathbb{Z} \cup \{\infty\}$) between all variables x and y, can prove the first query. The analysis infers constraints $b - a \le 0$ at line 1 and $i - b \le -1$ at line 3. Then, combining the two via a closure operation [11], the analysis concludes that constraint $i - a \le -1$ holds at line 4. More specifically, the fully relational octagon analysis computes the table (i.e., difference

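2. Analysis That Selectively Improves Precision

• Use a pre-analysis that gauges the effect of increased precision - Assume an analysis in which one aspect of precision is maximized, e.g., a fully context-sensitive analysis or a fully relational analysis - Design the pre-analysis by aggressively abstracting everything else

• Use the pre-analysis results to increase precision selectively, e.g., selective context-sensitivity, selective variable relations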

[Figure: call graph with nodes main, f (two copies), and xmalloc (four copies); edge labels 1, 1, 4, 6, 4, 6.]

[Figure: variable clusters {a, b, i} and {c}.]

$K(f) = \{1, \epsilon\}$. Note that our analysis isolates undistinguished contexts from distinguished ones: $\epsilon$ means only 2 or 3, not 1.

Example 1. The analysis is context-insensitive when $K = \lambda f. \{\epsilon\}$ and fully context-sensitive when $K = \lambda f. \mathbb{C}_c^*$. Our selective context-sensitive analysis in Section 2 uses the following context selector: $K = \{$main $\mapsto \{\epsilon\}$, f $\mapsto \{14, 15\}$, g $\mapsto \{\epsilon\}$, multi_glob $\mapsto \{10 \cdot 14,\ 10 \cdot 15,\ 11\}$, xmalloc $\mapsto \{4 \cdot 10 \cdot 14,\ 4 \cdot 10 \cdot 15,\ 4 \cdot 11,\ \epsilon\}\}$.

Next, we define the abstract domain $\mathbb{D}$ of the analysis:

\[ \mathbb{D} = (\mathbb{C}_K \to \mathbb{S}) \tag{3} \]

The analysis keeps multiple abstract states at each program node $c$, one for each context $\kappa \in K(\mathrm{fid}(c))$. The abstract transfer function $F$ of the analysis works on $\mathbb{C}_K$, and it is defined as follows:

\[ F(X)(c, \kappa) = [\![\mathrm{cmd}(c)]\!]\Big(\bigsqcup_{(c', \kappa') \to_K (c, \kappa)} X(c', \kappa')\Big). \tag{4} \]

The static analysis computes an abstract element $X \in \mathbb{D}$ satisfying the following condition:

\[ s_I \sqsubseteq X(\iota, \epsilon) \ \wedge\ \forall (c, \kappa) \in \mathbb{C}_K.\ F(X)(c, \kappa) \sqsubseteq X(c, \kappa) \tag{5} \]

In general, many $X$ can satisfy the condition in (5). Some analyses compute the least $X$ satisfying (5). Other analyses use a widening operator [1], $\nabla : \mathbb{D} \times \mathbb{D} \to \mathbb{D}$, and compute not necessarily the least, but some solution of (5).

Example 2 (Interval Analysis). The interval analysis is a standard example that uses a widening operator. Let $\mathbb{I}$ be the domain of intervals: $\mathbb{I} = \{[l, u] \mid l, u \in \mathbb{Z} \cup \{-\infty, +\infty\} \wedge l \le u\}$. Using this domain, we specify the rest of the analysis:

1. The abstract states are $\bot$ or functions from program variables to their interval values: $\mathbb{S} = \{\bot\} \cup (\mathrm{Var} \to \mathbb{I})$.

2. The initial abstract state is: $s_I(x) = [-\infty, +\infty]$.

3. The abstract semantics of primitive commands is:

\[ [\![\mathrm{skip}]\!](s) = s, \qquad [\![x := e]\!](s) = \begin{cases} s[x \mapsto [\![e]\!](s)] & (s \neq \bot) \\ \bot & (s = \bot) \end{cases} \]

where $[\![e]\!]$ is the abstract evaluation of the expression $e$:

\[ \begin{aligned} [\![n]\!](s) &= [n, n], & [\![e_1 + e_2]\!](s) &= [\![e_1]\!](s) + [\![e_2]\!](s) \\ [\![x]\!](s) &= s(x), & [\![e_1 - e_2]\!](s) &= [\![e_1]\!](s) - [\![e_2]\!](s) \end{aligned} \]

4. The last component of the analysis is a widening operator, which is defined as a pointwise lifting of the following widening operator $\nabla_I : \mathbb{I} \times \mathbb{I} \to \mathbb{I}$ for intervals:

\[ [l, u] \,\nabla_I\, [l', u'] = [\mathrm{ite}(l' < l,\ \mathrm{ite}(l' < 0, -\infty, 0),\ l),\ \mathrm{ite}(u' > u, +\infty, u)] \]

where ite(p, a, b) evaluates to a if p is true and b otherwise. The above widening operator uses 0 as a threshold, which is useful when proving the absence of buffer overruns.
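The threshold widening of item 4 translates almost literally into Python. The sketch below assumes intervals are represented as (lower, upper) pairs with `math.inf` for the infinite bounds; the name `widen` is chosen for this illustration.

```python
import math

INF = math.inf

def widen(a, b):
    """Interval widening with threshold 0, following the ite-based definition."""
    l, u = a
    l2, u2 = b
    if l2 < l:
        # The lower bound is decreasing: stop at 0 if possible, else give up.
        lo = -INF if l2 < 0 else 0
    else:
        lo = l
    hi = INF if u2 > u else u
    return (lo, hi)

shrinking = widen((5, 5), (3, 6))      # lower bound drops but stays non-negative
negative = widen((5, 5), (-1, 5))      # lower bound drops below the threshold
stable = widen((0, 3), (0, 3))         # nothing changes
```

In the first case the lower bound is widened to the threshold 0 rather than to negative infinity, which keeps the interval non-negative and so preserves the information needed for buffer-overrun queries.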

Queries Queries are triples in $Q \subseteq \mathbb{C} \times \mathbb{S} \times \mathrm{Var}$, and they are given as input to our static analysis. A query $(c, s, x)$ represents an assertion that every reachable concrete state at node $c$ is over-approximated by the abstract state $s$. The last component $x$ describes that the query is concerned with the value of the variable $x$. For instance, in the interval analysis, a typical query is

\[ (c,\ \lambda y.\ \text{if } (y = x) \text{ then } [0, \infty] \text{ else } \top,\ x) \]

for some variable $x$. It asserts that at program node $c$, the variable $x$ should always have a non-negative value. Proving the queries or identifying those that are likely to be violated is the goal of the analysis.

5. Impact Pre-Analysis for Finding K

Suppose that we would like to develop the selective context-sensitive analysis of Section 4 for a given program and given queries, using one of the existing abstract domains specified by the following data:

\[ (\mathbb{S},\ s_I \in \mathbb{S},\ [\![-]\!] : \mathbb{S} \to \mathbb{S}). \]

To achieve our aim, we need to construct $K$, a specification of context-sensitivity for the given program and queries. Once this construction is done, the rest is standard. The analysis can analyze the program under partial context-sensitivity, using the induced abstract domain $\mathbb{D}$ and transfer function $F : \mathbb{D} \to \mathbb{D}$ for this program in (3) and (4). We assume that the analysis employs a fixpoint algorithm based on the widening operation $\nabla : \mathbb{D} \times \mathbb{D} \to \mathbb{D}$.

How should we automatically choose an effective $K$ that balances the precision and cost of the induced interprocedural analysis? In this section, we give an answer to this question. In Section 5.1, we present an impact pre-analysis, which estimates the behavior of the main analysis $(\mathbb{S}, s_I, [\![-]\!])$ under full context-sensitivity. In Section 5.2, we describe how to use the results of this pre-analysis for constructing an effective context selector $K$. Throughout the section, we fix our main analysis to $(\mathbb{S}, s_I, [\![-]\!])$.

5.1 Designing an Impact Pre-Analysis

An impact pre-analysis for context sensitivity aims at estimating the main analysis $(\mathbb{S}, s_I, [\![-]\!])$ under full context-sensitivity. It is specified by the following data:

\[ (\mathbb{S}^\sharp,\ s^\sharp_I \in \mathbb{S}^\sharp,\ [\![-]\!]^\sharp : \mathbb{S}^\sharp \to \mathbb{S}^\sharp,\ K_\infty). \]

This specification and the way that the data are used in our pre-analysis are fairly standard. $\mathbb{S}^\sharp$ and $[\![\mathrm{cmd}]\!]^\sharp$ are, respectively, the domain of abstract states and the abstract semantics of cmd used by the pre-analysis, and $s^\sharp_I$ is an initial state. $K_\infty = \lambda f. \mathbb{C}_c^*$ is the context selector for full context-sensitivity. The pre-analysis uses the abstract domain $\mathbb{D}^\sharp = \mathbb{C}_{K_\infty} \to \mathbb{S}^\sharp$ and the following transfer function $F^\sharp : \mathbb{D}^\sharp \to \mathbb{D}^\sharp$ for the given program:

\[ F^\sharp(X)(c, \kappa) = [\![\mathrm{cmd}(c)]\!]^\sharp\Big(\bigsqcup_{(c', \kappa') \to_{K_\infty} (c, \kappa)} X(c', \kappa')\Big). \]

It computes the least $X$ satisfying

\[ s^\sharp_I \sqsubseteq X(\iota, \epsilon) \ \wedge\ \forall (c, \kappa) \in \mathbb{C}_{K_\infty}.\ F^\sharp(X)(c, \kappa) \sqsubseteq X(c, \kappa) \tag{6} \]

What is less standard is the soundness and efficiency conditions for our pre-analysis, which provide a guideline for the design of these pre-analyses. Let us discuss these conditions separately.

Soundness condition Intuitively, our soundness condition says that all the components of the pre-analysis have to over-approximate the corresponding ones of the main analysis.¹ This is identical to the standard soundness requirement of a static program analysis, except that the condition is stated not over the concrete semantics of a given program, but over the main analysis. The condition has the following four requirements:

1. There should be a concretization function $\gamma : \mathbb{S}^\sharp \to \wp(\mathbb{S})$. This function formalizes the fact that an abstract state of the pre-analysis means a set of abstract states of the main analysis.

2. The initial abstract state of the pre-analysis has to over-approximate the initial state of the main analysis, i.e., $s_I \in \gamma(s^\sharp_I)$.

¹ We design a pre-analysis as an over-approximation of the main analysis, because an under-approximating pre-analysis would be too optimistic in context selection and the resulting selective main analysis is hardly cost-effective.


3. The abstract semantics of commands in the pre-analysis should be sound with respect to that of the main analysis:

\[ \forall s \in \mathbb{S},\ s^\sharp \in \mathbb{S}^\sharp.\ s \in \gamma(s^\sharp) \implies [\![\mathrm{cmd}]\!](s) \in \gamma([\![\mathrm{cmd}]\!]^\sharp(s^\sharp)). \]

4. The join operation of the pre-analysis's abstract domain over-approximates the widening operation of the main analysis: for all $X, Y \in \mathbb{D}$ and $X^\sharp, Y^\sharp \in \mathbb{D}^\sharp$,

\[ (X \in \gamma(X^\sharp) \wedge Y \in \gamma(Y^\sharp)) \implies X \,\nabla\, Y \in \gamma(X^\sharp \sqcup Y^\sharp). \]

The purpose of our condition is that the impact pre-analysis over-approximates the fully context-sensitive main analysis:

Lemma 1. Let $M \in \mathbb{D}$ be the main analysis result, i.e., a solution of (5) under full context-sensitivity ($K = K_\infty$). Let $P \in \mathbb{D}^\sharp$ be the pre-analysis result, i.e., the least solution of (6). Then, $\forall c \in \mathbb{C},\ \kappa \in \mathbb{C}_c^*.\ M(c, \kappa) \in \gamma(P(c, \kappa))$.

Efficiency condition The next condition is for the efficiency of our pre-analysis. It consists of two requirements, and ensures that the pre-analysis can be computed using efficient algorithms:

1. The abstract states are $\bot$ or functions from program variables to abstract values: $\mathbb{S}^\sharp = \{\bot\} \cup (\mathrm{Var} \to \mathbb{V})$, where $\mathbb{V}$ is a finite complete lattice $(\mathbb{V}, \sqsubseteq_v, \bot_v, \top_v, \sqcup_v, \sqcap_v)$. An initial abstract state is $s^\sharp_I = \lambda x. \top_v$.

2. The abstract semantics of primitive commands has a simple form involving only the join operation and constant abstract values, which is defined as follows:

\[ [\![\mathrm{skip}]\!]^\sharp(s) = s, \qquad [\![x := e]\!]^\sharp(s) = \begin{cases} s[x \mapsto [\![e]\!]^\sharp(s)] & (s \neq \bot) \\ \bot & (s = \bot) \end{cases} \]

where $[\![e]\!]^\sharp$ has the following form: for every $s \neq \bot$,

\[ [\![e]\!]^\sharp(s) = s(x_1) \sqcup \ldots \sqcup s(x_n) \sqcup v \]

for some variables $x_1, \ldots, x_n$ and an abstract value $v \in \mathbb{V}$, all of which are fixed for the given $e$. We denote these variables and the value by

\[ \mathrm{var}(e) = \{x_1, \ldots, x_n\}, \qquad \mathrm{const}(e) = v. \]

Example 3 (Impact Pre-Analysis for the Interval Analysis). We design a pre-analysis for our interval analysis in Example 2, which satisfies our soundness and efficiency conditions. The pre-analysis aims at predicting which variables get associated with non-negative intervals when the program is analyzed by an interval analysis with full context-sensitivity $K_\infty$.

1. Let $\mathbb{V} = \{\bot_v, \bigstar, \top_v\}$ be a lattice such that $\bot_v \sqsubseteq_v \bigstar \sqsubseteq_v \top_v$. Define the function $\gamma_v : \{\bot_v, \bigstar, \top_v\} \to \wp(\mathbb{I})$ as follows:

\[ \gamma_v(\top_v) = \mathbb{I}, \qquad \gamma_v(\bigstar) = \{[a, b] \in \mathbb{I} \mid 0 \le a\}, \qquad \gamma_v(\bot_v) = \emptyset \]

This function determines the meaning of each element in $\mathbb{V}$ in terms of a collection of intervals. The only non-trivial case is $\bigstar$, which denotes all non-negative intervals according to this function. We include such a case because non-negative intervals, not negative ones, prove buffer-overrun properties.

2. The domain of abstract states is defined as $\mathbb{S}^\sharp = \{\bot\} \cup (\mathrm{Var} \to \mathbb{V})$. The meaning of abstract states in $\mathbb{S}^\sharp$ is given by $\gamma$ such that $\gamma(\bot) = \{\bot\}$ and, for $s^\sharp \neq \bot$,

\[ \gamma(s^\sharp) = \{s \in \mathbb{S} \mid s = \bot \vee \forall x \in \mathrm{Var}.\ s(x) \in \gamma_v(s^\sharp(x))\}. \]

3. Initial abstract state: $s^\sharp_I = \top = \lambda x. \top_v$.

4. Abstract evaluation $[\![e]\!]^\sharp$ of expression $e$: for every $s \neq \bot$,

\[ \begin{aligned} [\![n]\!]^\sharp(s) &= \mathrm{ite}(n \ge 0, \bigstar, \top_v), & [\![e_1 + e_2]\!]^\sharp(s) &= [\![e_1]\!]^\sharp(s) \sqcup_v [\![e_2]\!]^\sharp(s) \\ [\![x]\!]^\sharp(s) &= s(x), & [\![e_1 - e_2]\!]^\sharp(s) &= \top_v \end{aligned} \]

The analysis approximately tracks numbers, but distinguishes the non-negative cases from general ones: non-negative numbers get abstracted to $\bigstar$ by the analysis, but negative numbers are represented by $\top_v$. Observe that the + operator is interpreted as the least upper bound $\sqcup_v$, so that $e_1 + e_2$ evaluates to $\bigstar$ only when both $e_1$ and $e_2$ evaluate to $\bigstar$. This implements the intuitive fact that the addition of two non-negative intervals gives another non-negative interval. For expressions involving subtractions, the analysis simply produces $\top_v$.
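The abstract evaluation of item 4 is small enough to write out in Python. In this sketch the three lattice elements are encoded as the integers 0, 1, 2 so that the join is simply `max`; expressions are nested tuples with the hypothetical tags "num", "var", "add", and "sub". Both encodings are assumptions made for illustration.

```python
# Three-point lattice: BOT below STAR below TOP, encoded so that join is max.
BOT, STAR, TOP = 0, 1, 2

def join(a, b):
    return max(a, b)

def eval_sharp(e, s):
    """Abstract evaluation of expression e in a non-bottom abstract state s (a dict)."""
    tag = e[0]
    if tag == "num":
        return STAR if e[1] >= 0 else TOP
    if tag == "var":
        return s[e[1]]
    if tag == "add":
        return join(eval_sharp(e[1], s), eval_sharp(e[2], s))
    if tag == "sub":
        return TOP   # subtraction may produce a negative value
    raise ValueError(f"unknown expression tag: {tag}")

state = {"x": STAR, "y": TOP}
x_plus_1 = eval_sharp(("add", ("var", "x"), ("num", 1)), state)
x_plus_y = eval_sharp(("add", ("var", "x"), ("var", "y")), state)
x_minus_1 = eval_sharp(("sub", ("var", "x"), ("num", 1)), state)
```

Note that every case has the shape required by the efficiency condition: the result is a join of the values of some fixed variables and one constant.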

Running the pre-analysis via reachability-based algorithm The class of our pre-analyses enjoys efficient algorithms for computing the least solution $X$ that satisfies (6), even though it is fully context-sensitive. For instance, we can translate the system of such analysis equations into an inferior context-free grammar [2] and compute its least solution using Knuth's algorithm [8], or we can transform the analysis problem into a graph reachability problem [15].

For our purpose, we provide a variant of the graph reachability-based algorithm. Our algorithm is specialized for our pre-analysis and is more efficient than the algorithm in [15] (see the end of this subsection). In addition, our algorithm works on a value-flow graph that reveals dependencies of queries in a natural way, and hence our context selection procedure (Section 5.2) works on the results of this algorithm. Next, we go through each step of our algorithm while introducing the concepts necessary to understand it. In the rest of this section, we interchangeably write $K$ for $K_\infty$.

First, our algorithm constructs the value-flow graph of the given program, which is a finite graph $(\Theta, \hookrightarrow)$ defined as follows:

\[ \Theta = \mathbb{C} \times \mathrm{Var}, \qquad (\hookrightarrow) \subseteq \Theta \times \Theta \]

The node set consists of pairs of program nodes and variables, and $(\hookrightarrow)$ is the edge relation between the nodes.

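Definition 2 ($\hookrightarrow$). The value-flow relation $(\hookrightarrow) \subseteq (\mathbb{C} \times \mathrm{Var}) \times (\mathbb{C} \times \mathrm{Var})$ links the vertices in $\Theta$ based on how values of variables flow to other variables in each primitive command:

\[ (c, x) \hookrightarrow (c', x') \iff \begin{cases} c \to c' \wedge x = x' & (\mathrm{cmd}(c') = \mathrm{skip}) \\ c \to c' \wedge x = x' & (\mathrm{cmd}(c') = y := e \wedge y \neq x') \\ c \to c' \wedge x \in \mathrm{var}(e) & (\mathrm{cmd}(c') = y := e \wedge y = x') \end{cases} \]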
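Definition 2 can be turned into a direct edge-construction routine. The Python sketch below is an illustration under assumptions: the control-flow graph is a list of node pairs, each command is either `("skip",)` or `("assign", y, vars_of_e)` where `vars_of_e` stands for var(e), and the three-node program is a made-up example rather than one from the text.

```python
def value_flow(cfg_edges, cmd, variables):
    """Build the value-flow edges ((c, x), (c2, x2)) following the three cases."""
    edges = set()
    for c, c2 in cfg_edges:
        k = cmd[c2]
        for x2 in variables:
            if k[0] == "skip" or k[1] != x2:
                # x2 is not overwritten at c2: its value flows through unchanged.
                edges.add(((c, x2), (c2, x2)))
            else:
                # x2 is assigned at c2: values flow in from the variables of e.
                for x in k[2]:
                    edges.add(((c, x), (c2, x2)))
    return edges

# Hypothetical program: 1: x := 1;  2: y := x;  3: skip
cmd = {
    1: ("assign", "x", set()),
    2: ("assign", "y", {"x"}),
    3: ("skip",),
}
edges = value_flow([(1, 2), (2, 3)], cmd, ["x", "y"])
```

The edge from (1, x) to (2, y) records that the value of y at node 2 depends on x, which is the kind of dependency the context selection later traces back from a query.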

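We can extend $\hookrightarrow$ to its context-enriched version $\hookrightarrow_K$:

Definition 3 ($\hookrightarrow_K$). The context-enriched value-flow relation $(\hookrightarrow_K) \subseteq (\mathbb{C}_K \times \mathrm{Var}) \times (\mathbb{C}_K \times \mathrm{Var})$ links the vertices in $\mathbb{C}_K \times \mathrm{Var}$ according to the specification below:

\[ ((c, \kappa), x) \hookrightarrow_K ((c', \kappa'), x') \iff \begin{cases} (c, \kappa) \to_K (c', \kappa') \wedge x = x' & (\mathrm{cmd}(c') = \mathrm{skip}) \\ (c, \kappa) \to_K (c', \kappa') \wedge x = x' & (y \neq x') \\ (c, \kappa) \to_K (c', \kappa') \wedge x \in \mathrm{var}(e) & (y = x') \end{cases} \]

(where $\mathrm{cmd}(c')$ in the last two cases is $y := e$)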

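Second, the algorithm computes the interprocedurally-valid reachability relation $(\hookrightarrow^\dagger_K) \subseteq \Theta \times \Theta$:

Definition 4 ($\hookrightarrow^\dagger_K$). The reachability relation $(\hookrightarrow^\dagger_K) \subseteq \Theta \times \Theta$ connects two vertices when one node can reach the other via an interprocedurally-valid path:

\[ (c, x) \hookrightarrow^\dagger_K (c', x') \iff \exists \kappa, \kappa'.\ (\iota, \epsilon) \to^*_K (c, \kappa) \wedge ((c, \kappa), x) \hookrightarrow^*_K ((c', \kappa'), x'). \]

While computing $(\hookrightarrow^\dagger_K)$, the algorithm also collects the set $C$ of reachable nodes:

\[ C = \{c \mid \exists \kappa.\ (\iota, \epsilon) \to^*_K (c, \kappa)\}. \tag{7} \]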


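$F^\sharp \in (\mathbb{C} \to \mathbb{S}^\sharp) \to (\mathbb{C} \to \mathbb{S}^\sharp)$
$\mathbb{S}^\sharp = \{\bot\} \cup (\mathrm{Var} \to \{\bot_v, \bigstar, \top_v\})$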

3. The abstract semantics of commands in the pre-analysis shouldbe sound with respect to that of the main analysis:

8s 2 S, s] 2 S]

. s 2 �(s]) =) JcmdK(s) 2 �(JcmdK](s])).

4. The join operation of the pre-analysis’s abstract domain over-approximates the widening operation of the main analysis: forall X,Y 2 D and X

]

, Y

] 2 D],

(X 2 �(X]) ^ Y 2 �(Y ])) =) X

`Y 2 �(X] t Y

]).

The purpose of our condition is that the impact pre-analysisover-approximates the fully context-sensitive main analysis:

Lemma 1. Let M 2 D be the main analysis result, i.e., a solutionof (5) under full context-sensitivity (K = K1). Let P 2 D]

be the pre-analysis result, i.e., the least solution of (6). Then,8c 2 C, 2 C⇤

c

. M(c,) 2 �(P (c,)).

Efficiency condition The next condition is for the efficiency ofour pre-analysis. It consists of two requirements, and ensures thatthe pre-analysis can be computed using efficient algorithms:

1. The abstract states are ? or functions from program variablesto abstract values: S] = {?} [ (Var ! V), where V is a finitecomplete lattice (V,v

v

,?v

,>v

,tv

,uv

). An initial abstractstate is s]

I

= �x.>v

.2. The abstract semantics of primitive commands has a simple

form involving only join operation and constant abstract value,which is defined as follows:

JskipK](s) = s, Jx := eK](s) =⇢

s[x 7! JeK](s)] (s 6= ?)? (s = ?)

where JeK] has the following form: for every s 6= ?,

JeK](s) = s(x1) t . . . t s(xn

) t v

for some variables x1, . . . , xn

and an abstract value v 2 V, allof which are fixed for the given e. We denote these variablesand the value by

var(e) = {x1, . . . , xn

}, const(e) = v.

Example 3 (Impact Pre-Analysis for the Interval Analysis). Wedesign a pre-analysis for our interval analysis in Example 2,which satisfies our soundness and efficiency conditions. The pre-analysis aims at predicting which variables get associated withnon-negative intervals when the program is analyzed by an inter-val analysis with full context-sensitivity K1.

1. Let V = {?v

,F,>v

} be a lattice such that ?v

vv

F vv

>v

.

Define the function �

v

: {?v

,F,>v

} ! }(I) as follows:

�

v

(>v

) = I, �

v

(F) = {[a, b] 2 I | 0 a}, �

v

(?v

) = ;This function determines the meaning of each element in V interms of a collection of intervals. The only non-trivial caseis F, which denotes all non-negative intervals according tothis function. We include such a case because non-negativeintervals, not negative ones, prove buffer-overrun properties.

2. The domain of abstract states is defined as S] = {?}[(Var !V). The meaning of abstract states in S] is given by � such that�(?) = {?} and, for s] 6= ?,

�(s]) = {s 2 S | s = ? _ 8x 2 Var. s(x) 2 �

v

(s](x))}.

3. Initial abstract state: s]I

= > = �x.>v

.

4. Abstract evaluation JeK] of expression e: for every s 6= ?,

JnK(s)= ite(n � 0,F,>v

), Je1 + e2K(s)= Je1K(s)tv

Je2K(s)JxK(s)= s(x), Je1 � e2K(s)= >

v

The analysis approximately tracks numbers, but distinguishesthe non-negative cases from general ones: non-negative num-bers get abstracted to F by the analysis, but negative numbersare represented by >

v

. Observe that the + operator is inter-preted as the least upper bound t

v

, so that e1+e2 evaluates toF only when both e1 and e2 evaluates to F. This implementsthe intuitive fact that the addition of two non-negative intervalsgives another non-negative interval. For expressions involvingsubtractions, the analysis simply produces >

v

.

Running the pre-analysis via reachability-based algorithm Theclass of our pre-analyses enjoys efficient algorithms for computingthe least solution X that satisfies (6), even though it is fully context-sensitive. For instance, we can translate the system of such analysisequations into an inferior context-free grammar [2] and compute itsleast solution using the Knuth’s algorithm [8], or we can transformthe analysis problem into a graph reachability problem [15].

For our purpose, we provide a variant of the graph reachability-based algorithm. Our algorithm is specialized for our pre-analysisand is more efficient than the algorithm in [15] (see the end of thissubsection). In addition, our algorithm works on value-flow graphthat reveals dependencies of queries in a natural way, and henceour context selection procedure (Section 5.2) works on the resultsof this algorithm. Next, we go through each step of our algorithmwhile introducing concepts necessary to understand it. In the restof this section, we interchangeably write K for K1.

First, our algorithm constructs the value-flow graph of the given program, which is a finite graph $(\Theta, \hookrightarrow)$ defined as follows:

\[ \Theta = C \times \mathit{Var}, \qquad (\hookrightarrow) \subseteq \Theta \times \Theta \]

The node set consists of pairs of program nodes and variables, and $(\hookrightarrow)$ is the edge relation between the nodes.

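Definition 2 ($\hookrightarrow$). The value-flow relation $(\hookrightarrow) \subseteq (C \times \mathit{Var}) \times (C \times \mathit{Var})$ links the vertices in $\Theta$ based on how values of variables flow to other variables in each primitive command:

\[
(c, x) \hookrightarrow (c', x') \quad\text{iff}\quad
\begin{cases}
c \to c' \wedge x = x' & (\mathit{cmd}(c') = \mathit{skip}) \\
c \to c' \wedge x = x' & (\mathit{cmd}(c') = y := e \ \wedge\ y \neq x') \\
c \to c' \wedge x \in \mathit{var}(e) & (\mathit{cmd}(c') = y := e \ \wedge\ y = x')
\end{cases}
\]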
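The three cases of the value-flow relation can be sketched as a function that builds the edge set from a control-flow graph. This is only an illustration under assumed encodings: `succ` maps each node to its successors, `cmd` maps a node to either `("skip",)` or `("assign", y, vars_of_e)`, and the function name `value_flow_edges` is hypothetical. Commands other than skip and assignment are ignored in this sketch.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Node = Hashable
Vertex = Tuple[Node, str]          # a (program node, variable) pair


def value_flow_edges(
    succ: Dict[Node, Iterable[Node]],
    cmd: Dict[Node, tuple],
    variables: Iterable[str],
) -> Set[Tuple[Vertex, Vertex]]:
    """Build the value-flow edges (c, x) -> (c', x') for skip and assignment."""
    variables = list(variables)
    edges: Set[Tuple[Vertex, Vertex]] = set()
    for c, targets in succ.items():
        for c2 in targets:                      # control-flow edge c -> c'
            kind = cmd[c2]
            if kind[0] == "skip":
                # Case 1: every variable keeps its value.
                for x in variables:
                    edges.add(((c, x), (c2, x)))
            elif kind[0] == "assign":
                _, y, used = kind
                # Case 2: variables other than y keep their value.
                for x in variables:
                    if x != y:
                        edges.add(((c, x), (c2, x)))
                # Case 3: each variable read by e flows into y.
                for x in used:
                    edges.add(((c, x), (c2, y)))
    return edges
```

Note that the old value of `y` does not flow across `y := e` unless `y` itself occurs in `e`, which matches the third case of the definition.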

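We can extend $\hookrightarrow$ to its context-enriched version $\hookrightarrow_K$:

Definition 3 ($\hookrightarrow_K$). The context-enriched value-flow relation $(\hookrightarrow_K) \subseteq (C_K \times \mathit{Var}) \times (C_K \times \mathit{Var})$ links the vertices in $C_K \times \mathit{Var}$ according to the specification below:

\[
((c, \kappa), x) \hookrightarrow_K ((c', \kappa'), x') \quad\text{iff}\quad
\begin{cases}
(c, \kappa) \to_K (c', \kappa') \wedge x = x' & (\mathit{cmd}(c') = \mathit{skip}) \\
(c, \kappa) \to_K (c', \kappa') \wedge x = x' & (y \neq x') \\
(c, \kappa) \to_K (c', \kappa') \wedge x \in \mathit{var}(e) & (y = x')
\end{cases}
\]

(where $\mathit{cmd}(c')$ in the last two cases is $y := e$)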

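Second, the algorithm computes the interprocedurally-valid reachability relation $(\hookrightarrow^\dagger_K) \subseteq \Theta \times \Theta$:

Definition 4 ($\hookrightarrow^\dagger_K$). The reachability relation $(\hookrightarrow^\dagger_K) \subseteq \Theta \times \Theta$ connects two vertices when one node can reach the other via an interprocedurally-valid path:

\[
(c, x) \hookrightarrow^\dagger_K (c', x') \quad\text{iff}\quad \exists \kappa, \kappa'.\ (\iota, \epsilon) \to^*_K (c, \kappa) \ \wedge\ ((c, \kappa), x) \hookrightarrow^*_K ((c', \kappa'), x').
\]

While computing $(\hookrightarrow^\dagger_K)$, the algorithm also collects the set $C$ of reachable nodes:

\[
C = \{ c \mid \exists \kappa.\ (\iota, \epsilon) \to^*_K (c, \kappa) \}. \tag{7}
\]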
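As a rough intuition for the reachability step, the sketch below computes the reflexive-transitive closure of an edge relation from one start vertex with a worklist. It is a deliberate simplification, not the algorithm described here: it ignores calling contexts entirely, so it treats every path as valid instead of only interprocedurally-valid ones. The function name `reachable_from` and the edge encoding are assumptions made for illustration.

```python
from collections import defaultdict, deque
from typing import Hashable, Iterable, Set, Tuple

Vertex = Hashable


def reachable_from(
    edges: Iterable[Tuple[Vertex, Vertex]], start: Vertex
) -> Set[Vertex]:
    """Vertices reachable from start in zero or more steps (context-insensitive)."""
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)

    seen = {start}                 # zero steps: start reaches itself
    worklist = deque([start])
    while worklist:
        v = worklist.popleft()
        for w in adjacency[v]:
            if w not in seen:      # visit each vertex at most once
                seen.add(w)
                worklist.append(w)
    return seen
```

Applied to vertices of the form `(c, x)`, the result lists every node-variable pair that a value at the start vertex may flow to. A context-sensitive version would additionally have to carry the context component and follow only matching call and return edges.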


[Figure: call graph with nodes main, f, and xmalloc; calling contexts are annotated with ★ or ⊤.]

\[ F^\sharp \in (C \to O^\sharp) \to (C \to O^\sharp) \]

\[ O^\sharp = \{\bot\} \cup \{\bigstar, \top_V\}^{2|\mathit{Var}| \times 2|\mathit{Var}|} \]

use $\Pi = \{\mathit{Var}\}$, which prescribes tracking all possible relationships between variables, the analysis might have enough precision but quickly becomes intractable once the input program is of non-trivial size. In the rest of the paper, we describe how to find a good packing configuration using a pre-analysis, which applies a further value abstraction to the fully relational analysis ($\Pi = \{\mathit{Var}\}$) to detect the necessary relationships between variables.

5 Pre-analysis & Variable Packing

In this section, we explain our method for finding a packing configuration ($\Pi$) that yields a precise yet efficient packed octagon analysis. We design a pre-analysis that over-approximates the main octagon analysis (Section ??). The pre-analysis is used in two ways: based on its results, we select queries that have greater chances of being proved (Section ??), and we build a packing configuration that prescribes which variable relations to keep in order to prove the selected queries (Section ??).

5.1 Pre-analysis Design

The goal of our pre-analysis is to estimate the behavior of the fully relational octagon analysis. The pre-analysis considers all possible relationships among variables (that is, the analysis is fully relational). Yet the analysis is practical because its abstract semantics is more approximate than that of the octagon analysis. In this subsection, we define the abstract domain and semantics of the pre-analysis.

Abstract Domain. Just like the octagon analysis, our pre-analysis aims to track lower and upper bounds of $x + y$ and $x - y$ for all program variables $x$ and $y$. However, unlike the octagon domain, our pre-analysis tracks the bounds only approximately, distinguishing only whether a bound can be $+\infty$ or not. We use a totally ordered set $V = \{\bigstar, \top_V\}$ ($\bigstar \sqsubseteq_V \top_V$), which forms a complete lattice:

\[ (V, \sqsubseteq_V, \bigstar, \top_V, \sqcup_V, \sqcap_V) \]

where $\bigstar$ represents the set of finite integers and $\top_V$ means integers as well as $+\infty$:

\[ \gamma_V(\bigstar) = \mathbb{Z}, \qquad \gamma_V(\top_V) = \mathbb{Z}_\infty \]

For instance, when the octagon analysis computes the constraint $1 \le x - y \le +\infty$, our pre-analysis instead computes the "abstract" constraint $\bigstar \sqsubseteq_V x - y \sqsubseteq_V \top_V$, where the lower bound $1$ is approximated by $\bigstar$ and $+\infty$ by $\top_V$. By tracking only the binary values ($\bigstar$ and $\top_V$), instead of the infinitely many elements in $\mathbb{Z}_\infty$, we can efficiently run a fully relational analysis as a pre-analysis.
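The value abstraction just described can be sketched in a few lines. The names `alpha`, `leq`, `join`, `abstract_matrix`, `STAR`, and `TOP` are hypothetical, and representing octagon bounds as a matrix of Python numbers with `math.inf` for $+\infty$ is an assumption made only for illustration.

```python
import math
from typing import List, Union

# Two-point lattice V (hypothetical encoding): STAR is below TOP.
STAR, TOP = "STAR", "TOP"

Bound = Union[int, float]


def alpha(bound: Bound) -> str:
    """Abstract one bound: finite integers map to STAR, +infinity to TOP."""
    return TOP if bound == math.inf else STAR


def leq(a: str, b: str) -> bool:
    """Order on V: STAR is below everything, TOP is only below TOP."""
    return a == STAR or b == TOP


def join(a: str, b: str) -> str:
    """Least upper bound on V."""
    return TOP if TOP in (a, b) else STAR


def abstract_matrix(bounds: List[List[Bound]]) -> List[List[str]]:
    """Abstract a matrix of bounds entry by entry."""
    return [[alpha(b) for b in row] for row in bounds]
```

For the constraint in the example, `alpha(1)` gives `STAR` for the lower bound and `alpha(math.inf)` gives `TOP` for the upper bound. Because each entry is one of two values, an abstract state needs only one bit per matrix entry.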

Formally, the domain of abstract states is the following complete lattice:

\[ (O^\sharp, \sqsubseteq^\sharp, \bot^\sharp, \top^\sharp, \sqcup^\sharp, \sqcap^\sharp) \]

b - a <= ★,  i - b <= ★,  i - a <= ★,  c - b <= ⊤,  i - c <= ⊤,  …

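Program  LOC      CI alarms  CI time  Sel. alarms  Sel. pre-analysis  Sel. main analysis  Sel. total  Selected call sites  Alarm reduction  Overhead
bc       13,093   606        14       483          2                  14                  16          29 / 177             20%              14%
tar      20,258   940        42       799          5                  42                  47          51 / 1,213           15%              12%
sed      26,807   1,325      108      1,238        7                  110                 117         25 / 868             7%               8%
a2ps     64,590   3,682      118      2,121        30                 148                 178         237 / 2,450          42%              51%
bison    101,807  1,894      136      1,742        35                 139                 174         173 / 2,038          8%               28%

(CI = context-insensitive analysis; Sel. = selective context-sensitive analysis; the pre-analysis, main analysis, and total columns are analysis times.)

• Selective context-sensitive analysis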

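Program  LOC     Syn. Q  Syn. time  Syn. packs  Sel. Q  Sel. pre-analysis  Sel. main analysis  Sel. total  Sel. packs  Precision  Overhead
spell    2,213   1       5          119(8)      15      2                  1                   3           6(11)       +14        -40%
barcode  4,460   16      12         276(8)      37      12                 18                  30          18(3)       +21        150%
http     6,174   16      26         454(7)      26      11                 5                   16          8(6)        +10        -38%
tar      20,258  0       1,043      1,259(8)    11      599                63                  662         7(4)        +11        -37%
a2ps     64,590  0       29,479     2,608(8)    11      2,224              518                 2,742       6(7)        +11        -91%

(Syn. = syntactic packing; Sel. = selective relational analysis; the pre-analysis, main analysis, and total columns are analysis times.)

• Selective relational analysis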
