
Compositional Bitvector Analysis For Concurrent Programs With Nested Locks

Azadeh Farzan Zachary Kincaid

University of Toronto

Abstract. We propose a new technique to perform bitvector data flow analysis for concurrent programs. Our algorithm works for concurrent programs with nested locking synchronization. We show that this algorithm computes precise solutions (meet over all paths) to bitvector problems. Moreover, the algorithm is compositional: it first solves a local (sequential) data flow problem, and then efficiently combines these solutions by leveraging reachability results on nested locks [5,6]. We implemented our algorithm on top of an existing sequential data flow analysis tool and demonstrated that the technique performs and scales well.

1 Introduction

Writing concurrent software is difficult and error prone. In principle, static analysis offers an appealing way to mitigate this situation, but dealing with concurrency remains a serious obstacle. The theory and practice of automatically and statically determining dynamic behaviours of concurrent programs lag far behind those for sequential programs. Enumerating all possible interleavings to perform flow-sensitive analyses is infeasible. It is imperative to formulate compositional analysis techniques and proper behaviour abstractions to tame this so-called interleaving explosion problem. We believe the work presented in this paper is a significant step in this direction: we propose a compositional algorithm to compute precise solutions to bitvector problems for a general and useful class of concurrent programs.

Data flow analysis has proven to be a useful tool for debugging, maintaining, verifying, optimizing, and testing sequential software. Bitvector analyses (also known as the class of gen/kill problems) are a very useful subclass of data flow analyses and have been widely used in compiler optimization. There are a number of applications for precise concurrent bitvector analyses. To mention a few, reaching definitions analysis can be used for precise slicing of concurrent programs with locks, which can serve as a debugging aid for concurrent programs¹. Both race detection and atomicity violation detection can be formulated as variations of reaching definitions analysis. Lighter versions of information flow analyses may also be formulated as bitvector analyses. Precision substantially decreases the number of false positives reported by any of these analyses.

¹ Concurrent program slicing has been discussed previously [10], but to our knowledge no existing method handles locks precisely.


There is an apparent lack of techniques to precisely and efficiently solve data flow problems, and more specifically bitvector problems, for concurrent programs with dynamic synchronization primitives such as locks. The source of this difficulty lies in the lack of a precise and efficient way to represent program paths. Control flow graphs (CFGs) are used to represent program paths for most static analyses of sequential programs, but concurrent analogs to CFGs suffer major disadvantages. Concurrent adaptations of CFGs mainly fall into two categories: (1) those obtained by taking the Cartesian product of the CFGs of individual threads and removing inconsistent nodes. These product CFGs are far too large (possibly even infinite) to be practical. (2) Those obtained by taking the union of the CFGs of individual threads, adding inter-thread edges, and applying a may-happen-in-parallel heuristic to prune infeasible paths. These union CFGs may still contain an abundance of infeasible paths and cannot be used for precise analyses.

Bitvector problems have the interesting property that they can be solved precisely without analyzing whole program paths. The key observation is that, in a forward may bitvector analysis, a fact f is true at a control location c iff there exists a path to c on which f is generated and not subsequently killed; what happens "before" f is generated is irrelevant. Therefore, bitvector problems only require reasoning about partial paths starting at a generating transition. For programs with only static synchronization (such as co-begin/co-end), bitvector problems can be solved with a combination of sequential reasoning and a light concurrent predecessor analysis [8]. Under the concurrent program model in [8], a fact f holds at a control location c if and only if the control location c′ at which f is generated is an immediate concurrent predecessor of c. Therefore, it is sufficient to consider only concurrent paths of length two to compute the precise bitvector solution. Moreover, the concurrent predecessor analysis is very simple for co-begin/co-end synchronization.

Dynamic synchronization (which was not handled in [8]) reduces the number of feasible concurrent paths in a program, but unfortunately makes their finite representation more complex. This complicates data flow analyses, since a precise concurrent data flow analysis must compute the meet-over-all-feasible-paths solution, and the analysis should consider only feasible paths (which are no longer limited to paths of length two). Evidence of the degree of difficulty that dynamic synchronization introduces is the fact that pairwise reachability (which can be formulated as a bitvector problem) is undecidable for recursive programs with locks. It is, however, decidable [6] if locks are acquired in a nested manner (i.e., locks are released in the reverse order in which they were acquired). We use this result to introduce sound and complete abstractions for the set of feasible concurrent paths, which are then used to compute the meet-over-all-feasible-paths solution to the class of bitvector analyses.

We propose a compositional (and therefore scalable) technique to precisely solve bitvector analysis problems for concurrent programs with nested locks. The analysis proceeds in three phases. In the first phase, we perform the sequential bitvector analysis for each thread individually. In the second phase, we use a sequential data flow analysis to compute an abstract semantics for each thread based on an abstract interpretation of the sequential trace semantics. We then combine the abstract semantics of each pair of threads to compute a second set of data flow facts, namely those that reach concurrently. In the third phase, we simply combine the results of the sequential and concurrent phases into a sound and complete final solution for the problem. This procedure is quadratic in the number of threads and exponential (in the worst case) in the number of shared locks in the program; however, we do not expect to come close to the worst case in practice. In fact, in our experiments the running time follows a growth pattern that almost matches that of sequential programs. Our approach avoids the limitations typically imposed by concurrent adaptations of CFGs: it is scalable and compositional, in contrast with product CFGs, and it is precise, in contrast with union CFGs.

We have implemented our algorithm on top of the C language front-end CIL [17], which performs the sequential data flow analyses required by our algorithm. We show through experimentation that this technique scales well and, in practice, has running time close to that of sequential analysis.

Related Work. Program flow analysis was originally developed for sequential programs to enable compiler optimizations [1]. Although the majority of flow analysis research has focused on sequential software [19,14,20], flow analysis for concurrent software has also been studied. Flow-insensitive analyses can be directly adapted to the concurrent setting. Existing flow-sensitive analyses [13,15,16,18,21] have at least one of the following two restrictions: (a) the programs they handle have extremely simplistic concurrency/synchronization mechanisms and can be handled precisely using the union of the control flow graphs of the individual programs, or (b) the analysis is sound but not complete, and solves the data flow problem using heuristic approximations.

RADAR [2] attempts to address some of the problems mentioned above, and achieves scalability and more precision by using a race detection engine to kill the data flow facts generated and propagated by the sequential analysis. RADAR's precision and performance depend on how well the race detection engine works. We believe that although RADAR is a good practical solution, it does not attempt to solve the real problem at hand, nor does it provide any insights for static analysis of concurrent programs.

Knoop et al. [8] present a bitvector analysis framework which comes closest to ours in that it can express a variety of data flow analysis problems and gives sound and complete algorithms for solving them. However, it cannot handle dynamic synchronization mechanisms (such as locks). This approach has been extended, for the same restricted synchronization mechanism, to handle procedures [3,4,22] and generalizations of bitvector problems [9,22].

Foundational work on nested locks appears in [5,6]. Recently, analyses based on this work have been developed, including [7] and [11]. Notably, the authors of [7] detect violations of properties that can be expressed as phase automata, which is a more general problem than bitvector analysis. However, their method is not tailored to bitvector analysis, and is not practically viable when a "full" solution (a solution for every fact and every control location) to the problem is required, which is often the case.

2 Preliminaries

A concurrent program CP is a pair (T, L) consisting of a finite set of threads T and a finite set of locks L. We represent each thread T ∈ T as a control flow automaton (CFA). CFAs are similar to control flow graphs, except that actions are associated with edges (which we call transitions) rather than nodes. Formally, a CFA is a graph (NT, ET) with a unique entry node sT and a function stmtT : ET → Stmt that maps edges to program statements. We assume no two threads have a common node (edge), and refer to the set of all nodes (edges) by N (E). In the following, we often identify edges with their corresponding program statements. CFA statements execute atomically, so in practice we split non-atomic statements prior to CFA construction.

For each lock l ∈ L, we distinguish two synchronization statements, acq(l) and rel(l), that acquire and release the lock l, respectively. Locks are the only means of synchronization in our concurrent program model. For a path π through thread T, we let Lock-SetT(π) denote the set of locks held by T after executing π².

A local run of thread T is any finite path starting at its entry node; we refer to the set of all such runs by RT. A run of CP is a sequence ρ = ρ1···ρn ∈ E* of edges such that:

i) ρ projected onto each thread T (denoted by ρT) is a local run of T;
ii) there exists no point p along ρ at which two threads T, T′ hold the same lock:
∄T, T′, p. T ≠ T′ ∧ Lock-SetT((ρ1···ρp)T) ∩ Lock-SetT′((ρ1···ρp)T′) ≠ ∅.

We use RCP to denote the set of all runs of CP (just R when there is no confusion about CP). For a sequence ρ = ρ1···ρn ∈ E* and 1 ≤ r ≤ s ≤ n, we use ρ[r] to denote ρr, ρ[r, s] to denote ρr···ρs, and |ρ| to denote n.

A program CP respects nested locking if for every thread T ∈ T and every local run π of T, π releases locks in the opposite order in which it acquires them. That is, there exist no l, l′ (l ≠ l′) such that π contains a contiguous subsequence acq(l); acq(l′); rel(l) when projected onto the acquire and release transitions of l and l′³.

From this point on, whenever we refer to a concurrent program CP, we assume that it respects nested locking. Restricting our attention to programs that respect nested locking allows us to keep reasoning about run interleavings tractable. We will make critical use of this assumption in the following.
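The nested-locking discipline can be checked on a single local run with a stack of held locks. The following is a minimal Python sketch; the encoding of a run as a list of (op, lock) events is an assumption of the sketch, not part of the paper's model:

```python
def respects_nested_locking(run):
    """Check that a local run releases locks in the reverse order it
    acquired them, using a stack of currently held locks. `run` is a
    list of ("acq"|"rel", lock) pairs; statements that touch no lock
    are simply omitted. Also rejects re-entrant acquires."""
    held = []  # stack of locks, innermost last
    for op, lock in run:
        if op == "acq":
            if lock in held:           # locks are not re-entrant
                return False
            held.append(lock)
        elif op == "rel":
            if not held or held[-1] != lock:
                return False           # released out of nesting order
            held.pop()
    return True
```

For example, acq(l1); acq(l2); rel(l2); rel(l1) is nested, while acq(l1); acq(l2); rel(l1) is not.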

2.1 Locking Information

² Formally, Lock-SetT(π) = {l ∈ L | ∃i. π[i] = acq(l) ∧ ∄j > i s.t. π[j] = rel(l)}.
³ This implies that locks are not re-entrant.


    // left thread
    acq(l2);
    acq(l1);
    ...
    rel(l1);
    a: ...
    rel(l2);

    // right thread
    acq(l1);
    acq(l2);
    ...
    rel(l2);
    while (...) {
      if (...) {
        rel(l1);
        acq(l1);
      } else {
        b: ...  // gen "d"
      }
    }
    c: ...  // kill "d"
    rel(l1);

Fig. 1. Locking information.

Consider the example in Figure 1. We would like to know whether the fact d generated at location b reaches location a (without being killed at location c). If the thread on the right takes the else branch in the first execution of the loop, it will have to go through location c and kill the fact d before the execution of the program can reach location a. However, if the program takes the then branch in the first iteration of the loop and the else branch in the second, then execution can proceed to a without killing d first. This example shows that, in general, the interleavings we must consider in a bitvector analysis can be quite complicated.

In [5] and [6], compositional reasoning approaches for programs that respect nested locking were introduced, based on local locking information. We give a quick overview here. In the following, T ∈ T denotes a thread, and ρ ∈ E*T denotes a sequence of edges of T (in practice, ρ will be a run or a suffix of a run of T).

– Locks-HeldT(ρ, i) = {l ∈ L | ∀k ≥ i. l ∈ Lock-SetT(ρ[1, k])}: the set of locks held continuously by T through ρ, starting no later than at position i.

– Locks-AcqT(ρ) = {l ∈ L | ∃k. ρ[k] = T:acq(l)}: the set of locks that are acquired by T along ρ.

– fahT(ρ) (forward acquisition history): a partial function which maps each lock l whose last acquire in ρ has no matching release to the set of locks that were acquired after the last acquisition of l (and is undefined otherwise)⁴.

– bahT(ρ, i) (backward acquisition history): a partial function which maps each lock l that is held at ρ[i] and is released in ρ[i, |ρ|] to the set of locks that were released before the first release of l in ρ[i, |ρ|] (and is undefined otherwise).

We omit the T subscripts for all of these functions when T is clear from the context. Of these definitions, fah and bah are the least common, so we illustrate their use with runs from Figure 1. The run of the right thread that starts at the beginning, enters the while loop, and takes the else branch to end at b has forward acquisition history [l1 ↦ {l2}]. If that run continues to loop, taking the then branch and then the else branch to end back at b, it has forward acquisition history [l1 ↦ {}]. The run of the left thread that executes the entire code block has backward acquisition history [l2 ↦ {}] at a, and [l2 ↦ {l1}; l1 ↦ {}] between the acquire and release of l1.
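Lock-Set, fah, and bah can each be computed in one pass over a local run. The sketch below encodes runs as lists of (op, lock) events (an assumed encoding) and reproduces the Figure 1 values quoted above:

```python
def lock_set(run):
    """Locks held after executing `run`, a list of ("acq"|"rel", lock)
    events; returns the stack of held locks, innermost last.
    Assumes the run respects nested locking."""
    held = []
    for op, l in run:
        if op == "acq":
            held.append(l)
        elif op == "rel":
            held.pop()   # nested locking: l must be held[-1]
    return held

def fah(run):
    """Forward acquisition history: maps each lock still held at the
    end of `run` to the set of locks acquired after its last
    acquisition."""
    hist = {}
    for op, l in run:
        if op == "acq":
            for h in hist:           # l is acquired after every
                hist[h].add(l)       # currently held lock
            hist[l] = set()
        elif op == "rel":
            del hist[l]              # a release matches a held lock
    return hist

def bah(run, i):
    """Backward acquisition history at position i: maps each lock held
    at run[i] and released in run[i:] to the set of locks released
    before its first release in run[i:]."""
    held = set(lock_set(run[:i]))
    hist, released = {}, set()
    for op, l in run[i:]:
        if op == "rel":
            if l in held:
                hist[l] = set(released)
                held.discard(l)
            released.add(l)
    return hist
```

On the right thread's run acq(l1); acq(l2); rel(l2), `fah` yields {l1 ↦ {l2}}, and on the left thread's full run, `bah` at the position after acq(l1) yields [l2 ↦ {l1}; l1 ↦ {}], matching the text.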

2.2 Bitvector Data Flow Analysis

Let D be a finite set of data flow facts of interest. The goal of data flow analysis is to replace the full semantics by an abstract version tailored to a specific problem. The abstract semantics is specified by a local semantic functional J·KD : E → (D → D), where for each transition t, JtKD denotes the transfer function associated with t. J·KD gives abstract meaning to every CFA edge (program statement) as a transformation function from a semilattice (D, ⊓) (where ⊓ is ∪ or ∩) into itself. We drop D and simply write JtK when D is clear from the context. We extend J·K from transitions to transition sequences in the natural way: JεK = id, and JtρK = JρK ◦ JtK.

⁴ Note that the domain of fahT(ρ) (denoted dom(fahT(ρ))) is exactly Lock-SetT(ρ).

Bitvector problems are characterized by the simplicity of their local semantic functional J·K: for any transition t, there exist sets gen(t), kill(t) ⊆ D such that JtK(D) = (D ∪ gen(t)) \ kill(t). Equivalently, for any t, JtK can be decomposed into |D| monotone functions JtKi : B → B, where B is the Boolean lattice ({ff, tt}, ⇒).
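Concretely, a gen/kill transfer function and its extension to transition sequences can be sketched as follows (fact sets as Python sets; the transition names and gen/kill tables in the usage are hypothetical):

```python
def transfer(t, facts, gen, kill):
    """JtK(D) = (D ∪ gen(t)) \\ kill(t): a gen/kill transfer function.
    `facts` is the set of facts holding before transition t; `gen` and
    `kill` map transitions to fact sets."""
    return (facts | gen.get(t, set())) - kill.get(t, set())

def transfer_seq(path, facts, gen, kill):
    """Extension to sequences: JεK = id and JtρK = JρK ∘ JtK, i.e.
    apply the transitions of `path` left to right."""
    for t in path:
        facts = transfer(t, facts, gen, kill)
    return facts
```

For instance, with gen = {"t1": {"d"}} and kill = {"t2": {"d"}}, the sequence t1 t2 kills d, while t2 t1 leaves it generated.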

Our goal is to compute the concurrent meet-over-paths (CMOP) value of edge⁵ t of CP, defined as

CMOP[t] = ⊓ { JρKD(⊤D) | ρt ∈ RCP }

CMOP[t] is the optimal solution to the data flow problem. Note in particular that only runs that respect the semantics of locking contribute to the solution. This definition is not effective, however, since RCP may be infinite; the contribution of this work is an efficient algorithm for computing CMOP[t].

In the following, we discuss the class of intraprocedural forward may bitvector analyses for a concurrent program model with nested locking as our main contribution. In Section 6, we discuss how to adapt our techniques to other program models and analyses, including interprocedural analysis, backwards analysis, and an extension to a parameterized concurrent program model.

3 Concurrent Data Flow Framework

Fix a concurrent program CP with set of threads T, set of locks L, and a set of data flow facts D with meet ∪ (bitvector problems that use ∩ for meet can be solved via their dual problem). For a data flow fact d ∈ D and a transition t, let JtKd denote JtKD projected onto d. Call a sequence π d-preserving if JπKd = id. In particular, the empty sequence ε is d-preserving for any d ∈ D.

The following observation from [8] is the key to the efficient computation of the interleaving effect. It pinpoints the specific nature of a semantic functional for bitvector analysis, whose codomain consists only of constant functions and the identity:

Lemma 1 ([8]). For a data flow fact d ∈ D and a transition t of a concurrent program CP, d ∈ CMOP[t] iff there exists a run t1···tnt ∈ RCP and an index k (1 ≤ k ≤ n) such that JtkKd = const_tt and, for all m (k < m ≤ n), JtmKd = id.

⁵ For the CFA formulation of data flow analysis, data flow transformation functions and solutions correspond to transitions rather than nodes.


Call such a run a d-generating run for t, and call tk the generating transition of that run.

Fig. 2. A witness run (a) and a normal witness run (b) for the definition at f reaching line 7.

This lemma restricts the possible interference within a concurrent program: if there is any interference, then it is due to a single statement within a parallel component. Consider the program in Figure 2(a). In a reaching definitions analysis, only transition f (of T′) can generate the "definition at f reaches" fact. For any witness trace and any fact d, we can pinpoint a single transition that generates this fact (namely, the last occurrence of a generating transition on that trace). This is not true for data flow analyses that are not bitvector analyses. For example, in a null pointer dereference analysis, witnesses may contain a chain of assignments, no single one of which is "responsible" for the pointer in question being null, but which together make the pointer null. Our algorithm critically exploits the simplicity of bitvector problems to achieve both efficiency and precision, and cannot be trivially generalized to handle all data flow problems.

Based on Lemma 1 and the observation from [6] that runs can be projected onto runs with fewer threads, we get the following:

Lemma 2. For a data flow fact d ∈ D and a transition t of thread T, there exists a d-generating run for t if and only if one of the following holds:

– There exists a local d-generating run for t (that is, a d-generating run consisting only of transitions from T). Call such a run a single-indexed d-generating run.

– There exists a thread T′ (T ≠ T′) such that there is a d-generating run π for t consisting only of transitions from T and T′, whose generating transition belongs to T′. Call such a run a double-indexed d-generating run.


Proof. First, we note a simple fact: for any run π ∈ RCP and for any subset of threads T′ ⊆ T, πT′ ∈ RCP. That is, we may take any run of CP, project onto an arbitrary subset of threads, and obtain another run. This is clear: since threads initially hold no locks, not executing transitions from a thread will never disable transitions that would be enabled had the thread been running.

Let π be a d-generating path for t and let tg be its generating transition. If tg is a transition of T, then πT is a d-generating path for t. If tg is a transition of some other thread T′, then π{T,T′} is a d-generating path for t. □

Thus, to determine whether d ∈ CMOP[t] (i.e., whether fact d may be true at t), it is sufficient to check whether there is a single- or double-indexed d-generating run to t. Therefore, the precise solution to the concurrent bitvector analysis problem can be computed by reasoning only about concurrent programs with one or two threads, so long as we consider each pair of threads in the system. The existence of a single-indexed d-generating run to t can be determined by a sequential bitvector data flow analysis, which has been studied extensively.
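The sequential phase can be any standard forward may (union) bitvector analysis over a thread's CFA. The following worklist sketch takes a CFA as a list of (src, dst, transition) edges; this encoding and the worklist details are assumptions of the sketch, not the paper's implementation:

```python
from collections import deque

def sequential_may_analysis(edges, entry, gen, kill):
    """Forward 'may' (union) bitvector analysis over one thread's CFA,
    sketching the single-indexed case. `edges` is a list of
    (src, dst, t) transitions; `gen`/`kill` map transitions to fact
    sets. Returns, for each transition t, the facts that hold just
    after t along some local run (for gen/kill problems the MOP and
    fixpoint solutions coincide)."""
    succs = {}
    for src, dst, t in edges:
        succs.setdefault(src, []).append((dst, t))
    node_in = {entry: set()}   # facts reaching each node
    out = {}                   # facts after each transition
    work = deque([entry])
    while work:
        n = work.popleft()
        for dst, t in succs.get(n, []):
            new = (node_in[n] | gen.get(t, set())) - kill.get(t, set())
            out[t] = out.get(t, set()) | new
            if not new <= node_in.get(dst, set()):
                node_in[dst] = node_in.get(dst, set()) | new
                work.append(dst)   # in-set grew: revisit successors
    return out
```

On a diamond CFA where one branch kills d and the other preserves it, the analysis reports d as reaching past the join, as a may analysis should.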

Here, we discuss a compositional technique for enumerating the double-indexed d-generating runs. To achieve compositionality, we (1) characterize double-indexed d-generating runs in terms of two local runs, and (2) provide a procedure to determine whether two local runs can be combined into a global run. First, we define, for each thread T, each transition t of T, and each data flow fact d ∈ D:

– IRT[t]d = {〈π, σ〉 | πσt ∈ RT ∧ JσKd = id}
– GRT[t]d = {〈π, σ〉 | πσ ∈ RT ∧ Jσ[1]Kd = const_tt ∧ Jσ[2, |σ| − 1]Kd = id ∧ σ[|σ|] = t}

Intuitively, IRT[t]d and GRT′[t′]d correspond to sets of local runs of threads T and T′ that can be combined (interleaved in a lock-valid way) to create a global d-generating run to t. For example, in Figure 2(a), the definition at line f reaches the use at line 7 (in a reaching definitions analysis) since the local runs π1σ1 of T and π2σ2 of T′ can be combined into the run πσ (shown in the center) to create a double-indexed generating run. The following proposition is the key to our compositional approach:

Proposition 1. Assume a concurrent program CP with two threads T1 and T2. There exists a double-indexed d-generating run to transition t1 of thread T1 if and only if there exists a transition t2 of thread T2 such that there exist 〈π1, σ1〉 ∈ IRT1[t1]d and 〈π2, σ2〉 ∈ GRT2[t2]d and a run πσ ∈ RCP such that πT1 = π1, πT2 = π2, σT1 = σ1 and σT2 = σ2.

Since IR and GR are sets of local runs, they can be computed locally and independently, and checked for interleavability in a second phase. However, IRT[t]d and GRT[t]d are (in general) infinite sets, so we need finite means to represent them. In fact, we do not need to know all such runs: we only need to know whether there exists a d-generating run in one thread and a d-preserving run in the other that can be combined into a lock-valid run carrying the fact d generated in one thread to a particular control location in the other. Proposition 2, a simple consequence of a theorem from [5], provides a means to represent these sets with finite abstractions.

Proposition 2. Let CP be a concurrent program, and let T1, T2 be threads of CP. Let π1σ1 be a local run of T1 and let π2σ2 be a local run of T2. Then there exists a run πσ ∈ RCP with πT1 = π1, πT2 = π2, σT1 = σ1, and σT2 = σ2 if and only if:

– Lock-Set(π1) ∩ Lock-Set(π2) = ∅
– fah(π1) and fah(π2) are consistent
– Lock-Set(π1σ1) ∩ Lock-Set(π2σ2) = ∅
– fah(π1σ1) and fah(π2σ2) are consistent
– bah(π1σ1, |π1|) and bah(π2σ2, |π2|) are consistent
– Locks-Acq(σ1) ∩ Locks-Held(π2σ2, |π2|) = ∅ and Locks-Acq(σ2) ∩ Locks-Held(π1σ1, |π1|) = ∅.

Observe that Proposition 2 states that one can check whether two local runs can be interleaved into a global run by performing a few consistency checks on finite representations of the local lock behaviour of the two runs. In other words, one does not have to know what the runs are; one only has to know what the locking information for the runs is. Therefore, we use this information as our finite representation for the set of runs; more precisely, we use a quadruple consisting of two forward acquisition histories, a backward acquisition history, and a set of acquired locks to represent an abstract run⁶. Let P be the set of all such abstract runs. We say that two run abstractions are compatible if they may be interleaved (according to Proposition 2). We then define an abstraction function α : E* × E* → P that computes the abstraction of a run:

α(〈π, σ〉) = 〈fah(π), Locks-Acq(σ), fah(πσ), bah(πσ, |π|)〉
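Given abstract runs as (fah(π), Locks-Acq(σ), fah(πσ), bah(πσ, |π|)) tuples, with acquisition histories as dicts from locks to lock sets, the checks of Proposition 2 reduce to set operations. The consistency condition used here, that no two locks each appear in the other's history, is adapted from the cited decomposition results and is an assumption of this sketch:

```python
def consistent(ah1, ah2):
    """Acquisition histories ah1, ah2 (dicts: lock -> set of locks) are
    consistent iff there is no pair of locks each appearing in the
    other's history (assumed form of the condition from [5,6])."""
    return not any(l2 in ah1[l1] and l1 in ah2[l2]
                   for l1 in ah1 for l2 in ah2)

def compatible(a1, a2):
    """Proposition 2, checked on two abstract runs
    a = (fah(pi), Locks-Acq(sigma), fah(pi sigma), bah). Lock sets are
    recovered from the fah domains as in footnote 6."""
    (f1, A1, g1, b1), (f2, A2, g2, b2) = a1, a2
    held1 = (set(f1) & set(g1)) - A1   # Locks-Held(pi1 sigma1, |pi1|)
    held2 = (set(f2) & set(g2)) - A2
    return (not (set(f1) & set(f2))    # Lock-Set(pi1) ∩ Lock-Set(pi2) = ∅
            and consistent(f1, f2)     # fah(pi1), fah(pi2) consistent
            and not (set(g1) & set(g2))
            and consistent(g1, g2)     # fah(pi1 s1), fah(pi2 s2) consistent
            and consistent(b1, b2)     # backward histories consistent
            and not (A1 & held2)       # Locks-Acq(s1) ∩ Locks-Held(pi2 s2)
            and not (A2 & held1))
```

For example, two abstract runs whose fah domains share a lock are rejected immediately, and two runs with crossed acquisition orders (l1 acquired while l2 is held in one, l2 acquired while l1 is held in the other) fail the consistency check.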

For each transition t ∈ ET and data flow fact d, this abstraction function can be applied to the sets IRT[t]d and GRT[t]d to yield the sets IRαT[t]d and GRαT[t]d, respectively:

IRαT[t]d = {α(〈π, σ〉) | 〈π, σ〉 ∈ IRT[t]d}
GRαT[t]d = {α(〈π, σ〉) | 〈π, σ〉 ∈ GRT[t]d}

For example, the abstractions of the preserving run π1σ1 and the generating run π2σ2 from Figure 2(a) are (in order):

α(〈π1, σ1〉) = 〈fah(π1) = [l1 ↦ {}], Locks-Acq(σ1) = {l2}, fah(π1σ1) = [l1 ↦ {l2}], bah(π1σ1, |π1|) = [l2 ↦ {}]〉

α(〈π2, σ2〉) = 〈fah(π2) = [l2 ↦ {l1}], Locks-Acq(σ2) = {l2}, fah(π2σ2) = [l2 ↦ {}], bah(π2σ2, |π2|) = [l2 ↦ {}]〉

⁶ Note that for a run πσ, we can compute Lock-Set(π) as dom(fah(π)), Lock-Set(πσ) as dom(fah(πσ)), and Locks-Held(πσ, |π|) as (dom(fah(π)) ∩ dom(fah(πσ))) \ Locks-Acq(σ).


The definitions of IRα and GRα, together with Proposition 2, imply the following proposition:

Proposition 3. Assume a concurrent program CP with two threads T1 and T2. There exists a double-indexed d-generating run to transition t1 of thread T1 if and only if there exists a transition t2 of thread T2 such that there exist compatible elements of IRαT1[t1]d and GRαT2[t2]d.

For a fact d and a transition t ∈ ET, the sets IRαT[t]d and GRαT[t]d are finite, and therefore one can use Proposition 3 to provide a solution to the concurrent bitvector problem, once IRαT[t]d and GRαT[t]d have been computed. In the next section, we propose an optimization that provides the same solution using potentially much smaller subsets of these sets.

3.1 Normal Runs

The sets IRαT[t]d and GRαT[t]d may be large in practice, so we introduce the concept of normal runs to replace them with smaller subsets that are still sufficient for solving bitvector problems.

Definition 1. Call a double-indexed d-generating run πσt (consisting of transitions of threads T and T′, where t is a transition of T) with generating transition σ[1] (of thread T′) normal if:

– |σ| = 1 (that is, σ[1] is an immediate predecessor of t), or
– all of the following hold:
  • the first T transition in σt is an acquire transition;
  • the last T′ transition in σt is a release transition;
  • there is no i (1 ≤ i ≤ |σ|) such that after executing π(σ[1, i]), T frees all held locks;
  • there is no i (1 < i ≤ |σ|) such that after executing π(σ[1, i]), T′ frees all held locks.

Intuitively, normal runs minimize the number of transitions between the generating transition and the end of the (double-indexed) run. Consider the run in Figure 2(a): it is a witness for the definition at f reaching the use at 7, but it is not normal, since σ1 does not begin with an acquire and σ2 does not end with a release. Figure 2(b) pictures a witness that uses the same transitions (except that h has been removed from the end of σ2), but which has a shorter σ component. Note that runs that are minimal in this sense are indeed normal; the reverse, however, does not hold.

Definition 2. Let CP be a concurrent program, and let T be a thread of CP. A pair 〈π, σ〉 is id-normal if πσ ∈ RT and either

– σ = ε, or
– σ[1] is an acquire transition and there is no proper prefix σ′ of σ such that Lock-Set(πσ′) = ∅.


A pair 〈π, σ〉 is gen-normal if πσ ∈ RT and either

– |σ| = 1, or
– σ[|σ|] is a release transition and there is no proper prefix σ′ of σ such that Lock-Set(πσ′) = ∅.

Intuitively, for a concurrent program CP with two threads T and T′, a normal double-indexed d-generating run to t ∈ ET is made up of a double-indexed d-generating run to t that is id-normal when projected onto T, and gen-normal when projected onto T′.

We show that it is sufficient to consider only normal runs for our analysis, by proving that the existence of a double-indexed d-generating run implies the existence of a normal double-indexed d-generating run.
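Read literally, Definitions 1 and 2 can be checked on local runs encoded as lists of (op, lock) events (an assumed encoding; treating the empty sequence as a proper prefix is an interpretation on our part):

```python
def lock_set(run):
    """Locks held after `run`, a list of (op, lock) events; events with
    op not in {"acq", "rel"} (ordinary statements) are ignored.
    Assumes nested locking, so a release pops the innermost lock."""
    held = []
    for op, l in run:
        if op == "acq":
            held.append(l)
        elif op == "rel":
            held.pop()
    return held

def id_normal(pi, sigma):
    """Definition 2: sigma is empty, or sigma[0] is an acquire and no
    proper prefix of sigma leaves the thread holding no locks."""
    if not sigma:
        return True
    return (sigma[0][0] == "acq"
            and all(lock_set(pi + sigma[:k]) for k in range(len(sigma))))

def gen_normal(pi, sigma):
    """|sigma| = 1, or sigma ends with a release and no proper prefix
    of sigma empties the thread's lock set."""
    if len(sigma) == 1:
        return True
    return (sigma[-1][0] == "rel"
            and all(lock_set(pi + sigma[:k]) for k in range(len(sigma))))
```

For instance, with π = acq(l1), the pair 〈π, acq(l2); rel(l2)〉 is id-normal, while a σ beginning with a release is not.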

Lemma 3. Let t1···tn ∈ RT and t′1···t′m ∈ RT′. Suppose there is a run ρ = πσ (ρ ∈ RCP) such that there exist 1 ≤ i ≤ n and 1 ≤ j ≤ m where:

πT = t1···ti, σT = ti+1···tn, πT′ = t′1···t′j, σT′ = t′j+1···t′m.

Then the following hold:

1. If t′j+1 is not an acquire transition, then there exist π′, σ′ such that π′σ′ ∈ RCP and
π′T = t1···ti, σ′T = ti+1···tn, π′T′ = t′1···t′j+1, σ′T′ = t′j+2···t′m.

2. If t′m is not a release transition, then there exists σ′ such that πσ′ ∈ RCP and
πT = t1···ti, σ′T = ti+1···tn, πT′ = t′1···t′j, σ′T′ = t′j+1···t′m−1.

Proof. This lemma is a consequence of Lipton's theory of movers [12]:

1. non-acquire transitions are left movers, so if the first transition from T′ in σ is not an acquire, it can be moved left into π.
2. non-release transitions are right movers, so if the last transition from T′ in σ is not a release, it can be moved right to the end of the run, and then need not be executed to form a run.

∎

Lemma 3 is used to trim the beginning of a d-preserving run if it does not start with an acquire (part 1) and the end of a d-generating run if it does not end in a release (part 2). The run in Figure 2(b) is obtained from the run in Figure 2(a) by an application of Lemma 3.

Lemma 4. If there is a run πσ of concurrent program CP (consisting of two threads T and T′) such that πT = t1 . . . ti, σT = ti+1 . . . tn, πT′ = t′1 . . . t′j and σT′ = t′j+1 . . . t′m, and if there exists k (j < k ≤ m) such that Lock-SetT′(t′1 . . . t′k) = ∅, then there exists a run π′σ′ of CP where π′T = t1 . . . ti, σ′T = ti+1 . . . tn, π′T′ = t′1 . . . t′k and σ′T′ = t′k+1 . . . t′m.

Similarly, if there exists k (j ≤ k ≤ m − 1) such that Lock-SetT′(t′1 . . . t′k) = ∅, then there exists a run πσ′ where σ′T = σT and σ′T′ = t′j+1 . . . t′k.

Page 12: Compositional Bitvector Analysis For Concurrent Programs

Proof. Let ρ be the sequence of transitions in σ appearing after t′k, and let tr be the last transition from T appearing before t′k. Then we construct π′ and σ′ as follows:

π′ = t′1 · · · t′k t1 · · · ti        σ′ = ti+1 · · · tr ρ

First, we see that t′1 · · · t′k is a run of T′, and thus is a run of CP. Since Lock-Set(t′1 · · · t′k) = ∅, every transition in t1 · · · tr is enabled after executing t′1 · · · t′k, and so t′1 · · · t′k t1 · · · tr is a run. Finally, since T and T′ are in the same states after executing t′1 · · · t′k t1 · · · tr as they were after executing πσ up to t′k, ρ is enabled and so π′σ′ is a run. ∎

Lemma 4 is a consequence of Proposition 2. It is used to trim the beginning of d-preserving runs and the end of d-generating runs. The figure to the right illustrates the application of this lemma: thread T′ holds no locks after executing g, so transitions h, i, and j need not be executed. The witness pictured for definition f reaching 7 corresponds to the normal witness obtained by removing the dotted box.

The following proposition, which is a consequence of Lemmas 3 and 4, implies that it is sufficient to only consider normal runs for the analysis. Therefore, we can ignore runs that are not normal without sacrificing soundness.

Proposition 4. If there exists a double-indexed d-generating run of concurrent program CP leading to a state q, then there exists a normal double-indexed d-generating run of CP leading to q.

Proof. Let πσt be a d-generating path to t (of thread T) with generating transition σ[1] (of thread T′) and such that |σ| is minimal. We show that πσt is normal.

First, if |σ| = 1, then πσt is obviously normal. So assume |σ| > 1. If πσt fails any of the normality conditions, we may use Lemma 3 or Lemma 4 to construct a d-generating path to t with a shorter σ component, contradicting the minimality of |σ|.

Thus, d-generating paths minimizing |σ| are normal (although the converse does not hold). Since any transition t with a double-indexed d-generating path has a double-indexed d-generating path that minimizes |σ| (by well-ordering), it follows that t has a normal d-generating path. ∎

Therefore, for any transition t and data flow fact d, we define normal versions (subsets of these sets which contain only normal runs) of IRαT[t]d and GRαT[t]d as follows:


NIRT[t]d = {α(〈π, σ〉) | 〈π, σ〉 ∈ IRT[t]d ∧ (|σ| = 0 ∨ (σ[1] is an acquire ∧ ∄k. Lock-Set(πσ[1, k]) = ∅))}

NGRT[t]d = {α(〈π, σ〉) | 〈π, σ〉 ∈ GRT[t]d ∧ (|σ| = 1 ∨ ∄k. Lock-Set(πσ[1, k]) = ∅)}

NIRT[t]d (NGRT[t]d) is a finite representation of the set of normal d-preserving (generating) runs. In order to compute solutions for every data flow fact in D simultaneously, we extend NIRT[t]d and NGRT[t]d to sets of data flow facts. We define NIRT[t] to be a partial function that maps each abstract run p to the set of facts d for which there is a d-preserving run whose abstraction is p, and is undefined if there is no concrete run to t whose abstraction is p. NGRT[t] is defined analogously.

4 The Analysis

Here, we summarize the results presented in previous sections into a procedure for the bitvector analysis of concurrent programs. The procedure is outlined in Algorithm 1. Data flow facts reaching a transition t are computed in two different groups as per Lemma 2: facts with single-indexed generating runs and facts with double-indexed generating runs.

Algorithm 1 Concurrent Bitvector Analysis
1: Compute Summaries and Helper Sets // see Algorithm 2
2: for each T ∈ T do
3:   Compute MFPT // Single-indexed facts
4:   for each t ∈ ET do
5:     Compute NDIFT[t] // (Normal) double-indexed facts
6:     CMOPT[t] := MFPT[t] ∪ NDIFT[t]
7:   end for
8: end for

On line 6, the facts from single- and double-indexed generating runs are combined into the solution of the concurrent bitvector analysis. Facts from single-indexed generating runs can be computed efficiently using well-known (maximum fixed point) sequential data flow analysis techniques. It remains to show how to efficiently compute facts from double-indexed generating runs using the summaries and helper sets which are computed at the beginning of the analysis.

A naive way to compute NDIF would involve iterating over all pairs of transitions from different threads to find compatible elements of NIR and NGR. This would create an |E|² factor in our algorithm, which can be quite large. We avoid this by computing thread summaries for each thread T. The summary,


NGenT, combines information about each transition in T that is relevant for NDIF computation for other threads. More precisely, NGenT is a function that maps each run abstraction p to the set of facts for which there is a generating run whose abstraction is p. Intuitively, NGenT groups together transitions that have similar locking information so that they can be processed at once. This speeds up our analysis significantly in practice.

Algorithm 2 Computing Summaries
1: for each T ∈ T do
2:   Compute NGRT // Normal generating runs
3:   NGenT := λp. ⋃{NGRT(t)(p) | t ∈ ET ∧ p ∈ dom(NGRT(t))}
4:   Compute NIRT // Normal preserving runs
5: end for
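As a rough Python sketch (the dict-of-sets data representation is an assumption of this sketch, not the paper's encoding), line 3 of Algorithm 2 collapses the per-transition maps NGRT(t) into one thread-level map keyed by run abstraction:

```python
from collections import defaultdict

def build_ngen(ngr):
    """ngr maps each transition of T to a dict from run abstraction p to
    the set of facts generated along a run with abstraction p.
    Returns NGen_T: abstraction p -> union of those fact sets over all t."""
    ngen = defaultdict(set)
    for runs in ngr.values():
        for p, facts in runs.items():
            ngen[p] |= facts
    return dict(ngen)
```

Grouping transitions by run abstraction this way is what lets the NDIF step below process all transitions with similar locking behavior at once.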

The essential procedure for finding facts in NDIFT[t] is to: 1) find compatible (abstract) runs p ∈ dom(NIRT[t]) and p′ ∈ dom(NGenT′), and 2) add facts that are both preserved by p and generated by p′. This procedure is elaborated in Algorithm 3.

Algorithm 3 Compute NDIF (concurrent facts from non-predecessors)
Input: Thread T, transition t ∈ ET
Output: NDIFT[t]
1: NDIFT[t] := ∅
2: for each T′ ≠ T in T do
3:   for each p ∈ dom(NIRT[t]) do
4:     NDIFT[t] := NDIFT[t] ∪ ⋃{NGenT′[p′] ∩ NIRT[t](p) | compatible(p, p′)}
5:   end for
6: end for
7: return NDIFT[t]
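In the same hypothetical representation (dicts from run abstractions to fact sets, with the compatibility test passed in as a function), Algorithm 3 can be sketched as:

```python
def compute_ndif(nir_t, ngen_others, compatible):
    """nir_t: NIR_T[t], mapping abstraction p to the facts preserved along p.
    ngen_others: list of NGen summaries, one per thread T' != T.
    compatible: predicate on a pair of run abstractions (p, p').
    Returns the set of facts reaching t via double-indexed generating runs."""
    ndif = set()
    for ngen in ngen_others:
        for p, preserved in nir_t.items():
            for p2, generated in ngen.items():
                if compatible(p, p2):
                    # facts both generated along p' and preserved along p
                    ndif |= generated & preserved
    return ndif
```

Here `compatible` stands in for the acquisition-history compatibility check of the paper, which is left abstract in this sketch.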

Complexity Analysis. The best known upper bound on the time complexity of our algorithm is quadratic in the number of threads, quadratic in the size of the domain, linear in the number of transitions in each thread, and double exponential in the number of locks. We stress that this is a worst-case bound, and we expect our algorithm to perform considerably better in practice. Programmers tend to follow certain disciplines when using locks, which decreases the double exponential factor in our algorithm. For example, allowing only constant-depth nesting of locks reduces the factor to single exponential. Our experimental results in Section 5 confirm that our algorithm does indeed perform very well on real programs.


4.1 Computing NIR and NGR

It remains to show how to compute NIR and NGR. Both of these sets can be computed by sequential data flow analyses. Since the analyses are local, we fix a thread T and refer to NIRT and NGRT as NIR and NGR. Recall that for a transition t, NIR[t] and NGR[t] are defined to be partial functions mapping abstract runs to sets of data flow facts; that is, NIR, NGR ∈ P ⇀ 2^D. In the following, we will represent a partial function f : P ⇀ 2^D by a subset F ⊆ P × 2^D defined by F = {〈p, f(p)〉 : p ∈ dom(f)}. We adapt the partial function space order and meet operation to our choice of representation as follows:

Y ⊑ X ⟺ ∀p ∈ P. (∃D ⊆ D. 〈p, D〉 ∈ X ⇒ ∃D′ ⊆ D. D ⊆ D′ ∧ 〈p, D′〉 ∈ Y)

X ⊓ Y = {〈a, v〉 ∈ X | ∄v′. 〈a, v′〉 ∈ Y}
      ∪ {〈a, v〉 ∈ Y | ∄v′. 〈a, v′〉 ∈ X}
      ∪ {〈a, v ∪ v′〉 | 〈a, v〉 ∈ X ∧ 〈a, v′〉 ∈ Y}
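Concretely, if such a partial function is represented as a Python dict from run abstractions to fact sets (a representation choice of this sketch, not the paper's), the order and meet above become:

```python
def meet(x, y):
    """X meet Y: keep pairs defined in only one argument; where both are
    defined, take the union of the fact sets."""
    result = dict(x)
    for p, facts in y.items():
        result[p] = result.get(p, set()) | facts
    return result

def leq(y, x):
    """Y below X in the order: wherever X is defined, Y is defined on the
    same abstraction with a superset of the facts."""
    return all(p in y and facts <= y[p] for p, facts in x.items())
```

Under this order the empty map is the top element, so the empty set behaves as the neutral element of the meet, matching its use as the initial value in the NIR analysis.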

Then for any transition t, we define NIR and NGR formally as:

NIR[t] = ⨅{〈α(π, σ), ⟦σ⟧id(D)〉 | πσt ∈ RT ∧ normalid(α(π, σ))}

NGR[t] = ⨅{〈α(π, σ), ⟦σ⟧(∅)〉 | πσt ∈ RT ∧ normalgen(α(π, σ))}

where
⟦t⟧id(D) = D \ (gen(t) ∪ kill(t))
normalid(π, σ) = (|σ| > 0 ⇒ ∃l. σ[1] = acq(l)) ∧ ∄k ≥ 1. Lock-Set(π(σ[1, k])) = ∅
normalgen(π, σ) = |σ| > 1 ⇒ ∄k < |σ| − 1. Lock-Set(π(σ[1, k])) = ∅

We show how to compute NIR[·] using a local data flow analysis; the analysis for NGR[·] is similar. Since for any transition t, NIR[t] is represented by a subset of P × 2^D, the domain of the analysis is D = 2^(P×2^D), with the meet operation as defined above. The semantic functional for the analysis is defined as:

⟦t⟧NIR(X) = filter(extend(t, X ⊓ begin(t)))

where
filter(X) = {〈〈s, a, f, b〉, D〉 ∈ X | dom(f) ≠ ∅ ∧ a ≠ ∅}
begin(t) = {〈〈h, ∅, h, λx.h(x) ∩ ∅〉, D〉 | h ∈ AH[t]}⁷

AH[t] denotes the set of forwards acquisition histories that reach t (formally, AH[t] = {fah(π) | πt ∈ RT}). In practice, AH can be computed as the maximum fixedpoint solution to the data flow problem with domain (2^P, ∪) and

⁷ equivalently, begin(t) = {〈α(π, ε), D〉 | πt ∈ RT}


transfer functional ⟦t⟧AH(P) = {extendfah(t, p) | p ∈ P}. This is essentially a corollary of Lemma 5 and the fact that ⟦t⟧AH is distributive.

extend(t, X) = {〈extendP(t, 〈s, a, f, b〉), ⟦t⟧id(D)〉 | 〈〈s, a, f, b〉, D〉 ∈ X}

extendP(t, 〈s, a, f, b〉) = 〈s, extendacq(t, a), extendfah(t, f), extendbah(t, b, a)〉

extendacq(t, a) = if t = acq(ℓ) then a ∪ {ℓ} else a

extendfah(t, f) =
  λx. if x = ℓ then ∅ else f(x) ∪ {ℓ}    if t = acq(ℓ)
  λx. if x = ℓ then ⊥ else f(x)           if t = rel(ℓ)
  f                                        otherwise

extendbah(t, b, a) =
  λx. if x ∈ a then b(x) else b(x) ∪ {ℓ}  if t = rel(ℓ)
  b                                        otherwise
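A small Python sketch of the forward acquisition history update extendfah may help: a history is represented here as a dict mapping each held lock to the set of locks acquired since it was taken, with ⊥ modeled by removing the key. These representation choices are assumptions of this sketch, not the paper's.

```python
def extend_fah(t, fah):
    """Update a forward acquisition history by one transition.
    t is ('acq', l), ('rel', l), or any other tuple (left unchanged)."""
    new = {lock: set(hist) for lock, hist in fah.items()}
    if t[0] == 'acq':
        l = t[1]
        for lock in new:            # every held lock's history gains l
            new[lock].add(l)
        new[l] = set()              # l's own history starts empty
    elif t[0] == 'rel':
        new.pop(t[1], None)         # f(l) becomes bottom: l no longer held
    return new
```

Starting from the empty history, acq(l1); acq(l2) yields {l1: {l2}, l2: {}}, and a subsequent rel(l2) leaves {l1: {l2}}; note that the domain of the history is always the current lock set.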

It is easy to check that ⟦t⟧NIR is distributive for all t, so we can compute the meet-over-all-paths solution to the NIR data flow analysis problem (denoted MOPNIR[t]) using standard maximum fixed point techniques. In the remainder of the section, we prove that MOPNIR coincides with NIR.

Lemma 5. For all πσt ∈ RT ,

α(π, σt) = extendP(t, α(π, σ))

Proof. This is a trivial consequence of the definitions of α and extendP, with one caveat: the backwards acquisition histories we compute with extendP are slightly different from the backward acquisition histories as defined in Section 2.1. The domain of a backwards acquisition history here is the set of locks that are held when σ starts, rather than the set of locks that are held when σ starts and released along σ. This is a minor technical detail that makes the analysis slightly easier to describe, but the difference is immaterial, because the two formulations represent the same information for our purposes. As such, we abuse notation and claim equality. ∎

For convenience, we introduce a new functional ef : ET → 2^(P×2^D) → 2^(P×2^D) defined by

ef[t](X) = filter(extend(t, X))

We may extend ef to operate on sequences of transitions in the natural way: ef[ε](X) = X and ef[tσ](X) = ef[σ](ef[t](X)).

Lemma 6. Let πσt ∈ RT and D ⊆ D. 〈π, σ〉 is id-normal iff

ef[σ]({〈α(π, ε), D〉}) = {〈α(π, σ), ⟦σ⟧id(D)〉}


Proof. First note that, given Lemma 5, it is clear that if ef[σ]({〈α(π, ε), D〉}) is not ∅, then ef[σ]({〈α(π, ε), D〉}) = {〈α(π, σ), ⟦σ⟧id(D)〉}.

If σ = ε, the result is trivial. So let |σ| ≥ 1. Note that ef[σ]({〈α(π, ε), D〉}) = ∅ iff there exists a prefix σ′t of σ such that ef[σ′t]({〈α(π, ε), D〉}) = ∅. If σ′ is the shortest prefix with this property, we have

ef[σ′t]({〈α(π, ε), D〉}) = filter(extend(t, ef[σ′]({〈α(π, ε), D〉})))
                        = filter({〈α(π, σ′t), ⟦σ′t⟧id(D)〉})

So ef[σ]({〈α(π, ε), D〉}) = ∅ iff there exists a prefix σ′t of σ such that dom(fah(πσ′t)) = Lock-Set(πσ′t) = ∅ or Locks-Acq(σ′t) = ∅. Noting that Locks-Acq(σ′t) = ∅ iff σ[1] is not an acquire, we have that ef[σ]({〈α(π, ε), D〉}) = ∅ iff 〈π, σ〉 is not id-normal, so the lemma holds for all σ of length at least 1. ∎

Lemma 7. For all tσ ∈ E*T,

⟦tσ⟧NIR(∅) = ⟦σ⟧NIR(∅) ⊓ ef[tσ](begin(t)).

Proof. By induction on σ.

– Base case: σ = ε

  ⟦tσ⟧NIR(∅) = ⟦t⟧NIR(∅)
            = ef[t](∅ ⊓ begin(t))
            = ef[t](begin(t))
            = ⟦σ⟧NIR(∅) ⊓ ef[tσ](begin(t))

– Induction: let σ = σ′t′, and assume ⟦tσ′⟧NIR(∅) = ⟦σ′⟧NIR(∅) ⊓ ef[tσ′](begin(t)).

  ⟦tσ′t′⟧NIR(∅) = ⟦t′⟧NIR(⟦tσ′⟧NIR(∅))
               = ⟦t′⟧NIR(⟦σ′⟧NIR(∅) ⊓ ef[tσ′](begin(t)))
               = ⟦t′⟧NIR(⟦σ′⟧NIR(∅)) ⊓ ef[t′](ef[tσ′](begin(t)))
               = ⟦σ′t′⟧NIR(∅) ⊓ ef[tσ′t′](begin(t))

We may now state the main result of this section:

Proposition 5. For all t ∈ ET,

NIR[t] = MOPNIR[t] ⊓ begin(t)

Proof. First, we note that NIR[t] = MOPNIR[t] ⊓ begin(t) iff for all p ∈ P and d ∈ D, 〈p, {d}〉 ⊒ NIR[t] ⟺ 〈p, {d}〉 ⊒ MOPNIR[t] ∨ 〈p, {d}〉 ⊒ begin(t).

We begin by proving the "⇒" direction. Let 〈p, {d}〉 ⊒ NIR[t]. Then there exists an id-normal d-preserving run 〈π, σ〉 to t such that α(π, σ) = p. Distinguish two cases:


1. Case: σ = ε. Then fah(π) ∈ AH[t] and 〈p, D〉 ∈ begin(t), whence 〈p, {d}〉 ⊒ begin(t).

2. Case: |σ| ≥ 1. Then π is a run that ends at σ[1], so fah(π) ∈ AH[σ[1]] and 〈α(π, ε), D〉 ∈ begin(σ[1]). Since 〈π, σ〉 is id-normal, it follows from Lemma 6 that ef[σ]({〈α(π, ε), {d}〉}) = {〈α(π, σ), ⟦σ⟧id({d})〉} = {〈p, {d}〉}, and from Lemma 7 that

   ⟦πσ⟧NIR(∅) ⊑ ⟦σ⟧NIR(∅)
             = ⟦σ[2, |σ|]⟧NIR(∅) ⊓ ef[σ](begin(σ[1]))
             ⊑ ef[σ](begin(σ[1]))
             ⊑ ef[σ]({〈α(π, ε), {d}〉})
             = {〈p, {d}〉}.

   Since ⟦πσ⟧NIR(∅) ⊒ MOPNIR[t], we have 〈p, {d}〉 ⊒ MOPNIR[t].

Now we prove the “⇐” direction.

1. Let 〈p, {d}〉 ⊒ begin(t). Then there exists some run π to t with α(π, ε) = p. Since for any π, 〈π, ε〉 is id-normal and d-preserving, 〈p, {d}〉 ⊒ NIR[t].

2. Let 〈p, {d}〉 ⊒ MOPNIR[t]. Then there exists some ρ ∈ RT such that 〈p, {d}〉 ⊒ ⟦ρ⟧NIR(∅). Let σ be the shortest suffix of ρ such that 〈p, {d}〉 ⊒ ⟦σ⟧NIR(∅). By Lemma 7, ⟦σ⟧NIR(∅) = ⟦σ[2, |σ|]⟧NIR(∅) ⊓ ef[σ](begin(σ[1])). By the minimality of σ, we must have 〈p, {d}〉 ⊒ ef[σ](begin(σ[1])). Since ef acts pointwise, there exists some p′ such that 〈p′, D〉 ∈ begin(σ[1]) and 〈p, {d}〉 ⊒ ef[σ]({〈p′, D〉}). Since 〈p′, D〉 ∈ begin(σ[1]), there exists some π such that α(π, ε) = p′. Since 〈p, {d}〉 ⊒ ef[σ]({〈p′, D〉}), 〈π, σ〉 is id-normal (since ef[σ]({〈p′, D〉}) is nonempty) and is d-preserving (since d ∈ ⟦σ⟧id(D)). It follows that 〈p, {d}〉 = 〈α(π, σ), {d}〉 ⊒ NIR[t].

Combining both directions, we get the desired proposition. ∎

Having finished the development for the NIR analysis, we now define the transfer function for the NGR analysis.

⟦t⟧NGR(X) = filterNGR(extendNGR(t, X ⊓ begin(t)))

where
extendNGR(t, X) = {〈extendP(t, a), ⟦t⟧(D)〉 | 〈a, D〉 ∈ X}
filterNGR(X) = {〈〈s, a, f, b〉, D〉 ∈ X | dom(f) ≠ ∅}

We can now state a proposition analogous to Proposition 5 for the NGR analysis. The proof follows a similar development to the one for Proposition 5.

Proposition 6. For all t ∈ ET ,

NGR[t] = ⟦t⟧NGR(MOPNGR[t])


5 Case Study

We implemented the intraprocedural version of our algorithm and evaluated its performance on a nontrivial concurrent program. Our experiments indicate that our algorithm scales well in practice; in particular, its performance appears to be only weakly dependent on the number of threads. This is remarkable, considering the program analysis community's historical difficulties with multithreaded code.

The algorithm is implemented in OCaml, and is applicable to C programs using the pthreads library for thread operations. We use the CIL program analysis infrastructure for parsing, CFG construction, and sequential data flow analysis. The algorithm is parameterized by a module that specifies the gen/kill sets for each instruction, so lifting sequential bitvector analyses to handle threads and locking is completely automatic. We implemented a reaching definitions analysis module and instantiated our concurrent bitvector analysis with it; this concurrent reaching definitions analysis was the subject of our evaluation.

We evaluated the performance of our algorithm on FUSE, a Unix kernel module and library that allows filesystems to be implemented in userspace programs. FUSE exposes parts of the kernel that are relevant to filesystems to userspace programs, essentially acting as a bridge between userspace and kernelspace. We analyzed the userspace portion.

Name     |T|   |N|       |E|       |D|     |L|  Time
5 avg    5     2568.1    3037.9    208.9   1.0  1.0
5 med    5     405.0     453.0     62.0    1.0  0.1
10 avg   10    4921.9    5820.1    401.6   1.3  1.8
10 med   10    988.5     1105.0    155.5   1.0  0.1
50 avg   50    24986.3   29546.0   2047.0  3.1  10.7
50 med   50    22628.5   26607.0   2022.5  3.0  4.6
200a     200   79985     94120     6861    6    36.4
200b     200   119905    142248    9515    4    116.5
full     425   218284    258219    17760   6    347.8

Since our implementation currently supports only intraprocedural analyses, we inlined all of the procedures defined within FUSE and ignored calls to library procedures that did not acquire or release locks. We did a type-based must-alias analysis to finitize the set of locks and shared variables. Some procedures in the program had the (implicit) precondition that callers must hold a particular lock or set of locks at each call site; these 35 procedures could not be considered to be threads because they did not respect nested locking when considered independently. Each of the remaining 425 procedures was considered to be a distinct thread in our analysis. We divided these procedures into groups of 5 procedures, and analyzed each of those separately (that is, we analyzed the program consisting of procedures 1–5, 6–10, 11–15, etc.). We repeated this process with groups of 10, 50, 100, and 200, and also analyzed the entire program. We present mean and median statistics for the groups of 5, 10, 50, and 100 procedures. The experiments were conducted on a 3.16 GHz Linux machine with 4 GB of memory.

The |T|, |N|, |E|, |D|, |L|, and Time columns indicate the number of threads, number of CFA nodes, number of CFA edges, number of data flow facts, number of locks, and running time (in seconds), respectively. As a result of the inlining step, there was a very large size gap between the smallest and the largest procedures that we analyzed, which we believe accounts for the discrepancy between the mean and median statistics.

In Figure 3(a), we observe that the running time of our algorithm appears to grow quadratically in the number of threads in the program. However, the dispersion is quite high, which suggests that the running time has a weak relationship with the number of threads in the program. Indeed, the apparent quadratic relationship can be explained by the fact that the points that contain more threads also contain more total CFA edges. Figure 3(b) shows the running time of our algorithm as a function of the total number of CFA edges in the program, which is a much tighter fit.

Fig. 3. Running time: (a) vs. number of threads; (b) vs. number of CFA edges; (c) vs. |E| · |D|.

Figure 3(c) shows the running time of our algorithm as a function of the product of the number of CFA edges and the domain size of the program. This relationship is interesting because the time complexity of sequential bitvector analysis is O(|E| · |D|). Our results indicate that there is a linear relationship between the running time of our algorithm and the product of the number of CFA edges and the domain size of the program, which suggests that our algorithm's running time is proportional to |E| · |D| in practice.

Our empirical analysis is not completely rigorous. In particular, our data points are not independent and our treatment of memory locations is not conservative. However, we believe that the results obtained are promising and suggest that the algorithm can be used as the basis for further work on data flow analysis for concurrent programs.

6 Generalizations

The results presented in this paper focus on forward intraprocedural bitvector analyses of concurrent programs, with a fixed set of threads and a fixed set of locks. We leave the generalization of the bitvector analyses to more expressive classes of analyses as future work. The other constraints, however, were introduced for expository purposes rather than by some limitation of our approach. In this section, we discuss generalizations of our methods that remove these constraints.

6.1 Dynamic Lock and Thread Creation

Our system model does not account for lock aliasing or dynamic lock creation. It is straightforward to extend our model to handle both features by leveraging a conservative pointer analysis for lock variables. Essentially, we may ignore acquisitions and releases of lock variables that cannot be resolved to a single "actual" lock. Since ignoring locks leads to more program behaviors, the resulting analysis is sound. More precise handling of lock aliases is left as future work.

In our system model, each thread is created simultaneously at the beginning of the program. For some programs, this is the correct behavior; for example, libraries such as FUSE (see Section 5), which are not equipped with program entry points, must assume that every thread may start at the beginning of the program, because this assumption conservatively approximates any program that creates threads dynamically. So by default, our analysis produces safe solutions for programs that dynamically create a finite set of threads. If a program does not have a fixed number of threads (for example, if the program creates threads in a loop), the number of thread creation points is still fixed, and we can use our results on parameterization to model the program behavior soundly. We leave more precise handling of thread creation and deletion as future work.

6.2 Function Calls and Recursion

Throughout the paper, we have presented our system model such that each thread is a finite state automaton. This means no function calls and, naturally, no recursion. All the ideas presented in this paper extend to the case where threads are pushdown systems, as long as we keep the principle of nested locking intact. The reachability and decomposition results of [5,6] hold for the case of pushdown systems, and we need only extend the notion of control locations to configurations of pushdown systems (a control location plus the content of the stack). The analysis stays essentially the same – the only major difference is that the sequential data flow analyses employed by the algorithm must now be inter-procedural sequential data flow analyses.

6.3 Parameterization

One advantage of data flow analysis is that it can be applied to open programs; that is, programs that do not specify an entry point. This is useful for two reasons: one, there are many interesting programs that are open (e.g., libraries); two, it enables analyses to take advantage of programs that are composed of modules. Open programs are often problematic for program analyses because the number of threads that are active in the program is not known a priori – analyses must be correct with respect to any number of threads in the program. Thus, for open programs, the ideal solution to a data flow problem is the parameterized concurrent meet-over-paths solution (PCMOP). For a concurrent program CP with threads T1, . . . , Tn, for any transition t ∈ Σ, PCMOP can be defined as follows:

PCMOP[t] = ⨅_{k ∈ ℕⁿ} CMOPk[t]

where CMOP〈k1,...,kn〉[t] is the CMOP solution at t for the concurrent program consisting of k1, k2, . . . , kn distinct copies of T1, T2, . . . , Tn, respectively. That is, the PCMOP solution is the meet over all paths of all possible concurrent programs that can be constructed using any number of copies of the threads in CP.

Computing the PCMOP solution to a bitvector problem poses no additional difficulty for our algorithm. Indeed, we expect that computing the PCMOP solution is sometimes even easier than computing the CMOP solution. This is largely a consequence of Lemma 2: as far as the CMOP solution is concerned, 2 copies of a thread are indistinguishable from k copies, for any k ≥ 2. Next, note that in the presentation of our algorithm, special care was taken to ensure that we did not consider concurrent executions of a single thread. If this restriction is removed, then the algorithm will compute the PCMOP solution. Moreover, we may collapse the NGen thread summaries even further into program summaries, which can speed up the algorithm.

6.4 Backwards Analysis

It is straightforward to adapt our techniques to handle backwards bitvector analyses such as liveness and very busy expressions analysis. Traditionally, backwards data flow analyses can be solved using forward data flow analyses on reverse flow graphs. This method is not directly applicable in the concurrent case because of complications from synchronization and deadlocks. However, backwards analogues of all our results do hold, and optimal solutions to backwards analyses can be obtained by an algorithm similar to the one presented in Section 4.

7 Application and Future Work

We discussed a number of very important applications of bitvector analysis in Section 1. One of the most exciting applications of our precise bitvector framework (in our opinion) is our ongoing work on studying more suitable abstractions for concurrent programs. Intuitively, by computing the solution to reaching definitions analysis for a concurrent program, we can collect information about how program threads interact. We are currently working on using this information to construct abstractions to be used for more powerful concurrent program analyses, such as computing state invariants for concurrent libraries. The precision offered by our concurrent bitvector analysis approach is quite important in this domain, because it affects both the precision of the invariants that can be computed, and the efficiency of their computation.


References

1. Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley Longman Publishing Co., Inc., 1986.

2. R. Chugh, J. Voung, R. Jhala, and S. Lerner. Dataflow analysis for concurrent programs using datarace detection. In PLDI, pages 316–326, 2008.

3. Javier Esparza and Jens Knoop. An automata-theoretic approach to interprocedural data-flow analysis. In FoSSaCS '99, pages 14–30, London, UK, 1999. Springer-Verlag.

4. Javier Esparza and Andreas Podelski. Efficient algorithms for pre* and post* on interprocedural parallel flow graphs. In POPL '00, pages 1–11, New York, NY, USA, 2000. ACM.

5. V. Kahlon and A. Gupta. On the analysis of interacting pushdown systems. In POPL, pages 303–314, 2007.

6. V. Kahlon, F. Ivancic, and A. Gupta. Reasoning about threads communicating via locks. In CAV, pages 505–518, 2005.

7. Nicholas Kidd, Peter Lammich, Tayssir Touili, and Thomas Reps. A decision procedure for detecting atomicity violations for communicating processes with locks. In SPIN '09, pages 125–142, Berlin, Heidelberg, 2009. Springer-Verlag.

8. J. Knoop, B. Steffen, and J. Vollmer. Parallelism for free: Efficient and optimal bitvector analyses for parallel programs. ACM Transactions on Programming Languages and Systems, 18(3):268–299, 1996.

9. Jens Knoop. Parallel constant propagation. In Euro-Par '98, pages 445–455, London, UK, 1998. Springer-Verlag.

10. Jens Krinke. Static slicing of threaded programs. SIGPLAN Not., 33(7):35–42, 1998.

11. Peter Lammich and Markus Müller-Olm. Conflict analysis of programs with procedures, dynamic thread creation, and monitors. In SAS '08, pages 205–220, Berlin, Heidelberg, 2008. Springer-Verlag.

12. Richard J. Lipton. Reduction: a method of proving properties of parallel programs. Commun. ACM, 18(12):717–721, 1975.

13. S. P. Masticola and B. G. Ryder. Non-concurrency analysis. In PPOPP, pages 129–138, New York, NY, USA, 1993.

14. S. S. Muchnick. Advanced Compiler Design and Implementation. Morgan Kaufmann, 1997.

15. G. Naumovich and G. S. Avrunin. A conservative data flow algorithm for detecting all pairs of statements that may happen in parallel. In SIGSOFT/FSE-6, pages 24–34, New York, NY, USA, 1998.

16. G. Naumovich, G. S. Avrunin, and L. A. Clarke. An efficient algorithm for computing MHP information for concurrent Java programs. In ESEC/FSE-7, pages 338–354, London, UK, 1999.

17. G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer. CIL: Intermediate language and tools for analysis and transformation of C programs. In CC, pages 213–228, 2002.

18. R. Netzer and B. Miller. Detecting data races in parallel program executions. In Advances in Languages and Compilers for Parallel Computing, 1990 Workshop, pages 109–129, Irvine, Calif., 1990.

19. F. Nielson and H. Nielson. Type and effect systems. In Correct System Design, pages 114–136, 1999.

20. T. Reps, S. Schwoon, S. Jha, and D. Melski. Weighted pushdown systems and their application to interprocedural dataflow analysis. Sci. Comput. Program., 58(1-2):206–263, 2005.

21. A. Salcianu and M. Rinard. Pointer and escape analysis for multithreaded programs. In PPoPP, 2001.

22. Helmut Seidl and Bernhard Steffen. Constraint-based inter-procedural analysis of parallel programs. In ESOP '00, pages 351–365, London, UK, 2000. Springer-Verlag.