


Shared Memory Concurrent System Verification using Kronecker Algebra

Technical Report 183/1-155

Robert Mittermayr and Johann Blieberger

Institute of Computer-Aided Automation, TU Vienna, Austria

Abstract. The verification of multithreaded software is still a challenge. This comes mainly from the fact that the number of thread interleavings grows exponentially in the number of threads. The idea that thread interleavings can be studied with a matrix calculus is a novel approach in this research area. Our sparse matrix representations of the program are manipulated using a lazy implementation of Kronecker algebra. One goal is the generation of a data structure called Concurrent Program Graph (CPG) which describes all possible interleavings and incorporates synchronization while preserving completeness. We prove that CPGs in general can be represented by sparse adjacency matrices. Thus the number of entries in the matrices is linear in their number of lines. Hence efficient algorithms can be applied to CPGs. In addition, due to synchronization only very small parts of the resulting matrix are actually needed, whereas the rest is unreachable in terms of automata. Thanks to the lazy implementation of the matrix operations the unreachable parts are never calculated. This speeds up processing significantly and shows that this approach is very promising.
Various applications including data flow analysis can be performed on CPGs. Furthermore, the structure of the matrices can be used to prove properties of the underlying program for an arbitrary number of threads. For example, deadlock freedom is proved for a large class of programs.

1 Introduction

With the advent of multi-core processors, scientific and industrial interest focuses on the verification of multithreaded applications. The scientific challenge comes from the fact that the number of thread interleavings grows exponentially in a program's number of threads. All state-of-the-art methods, such as model checking, suffer from this so-called state explosion problem. The idea that thread interleavings can be studied with a matrix calculus is new in this research area. We are immediately able to support conditionals, loops, and synchronization. Our sparse matrix representations of the program are manipulated using a lazy implementation of Kronecker algebra. Similar to [3] we describe synchronization by Kronecker products and thread interleavings by Kronecker sums. One goal is the generation of a data structure called Concurrent Program Graph (CPG) which describes all possible interleavings and incorporates synchronization while preserving completeness. Similar to CFGs for sequential programs, CPGs may serve as an analogous graph for concurrent systems. We prove that CPGs in general can be represented by sparse adjacency matrices. Thus the number of entries in the matrices is linear in their number of lines.

In the worst case the number of lines increases exponentially in the number of threads. Especially for concurrent programs containing synchronization this bound is very pessimistic. For this case we show that the matrix contains nodes and edges unreachable from the entry node.

We propose two major optimizations. First, if the program contains a lot of synchronization, only a very small part of the CPG is reachable. Our lazy implementation of the matrix operations computes only this part (cf. Subsect. 3.6). Second, if the program has only little synchronization, many edges not accessing shared variables will be present, which are reduced during the output process of the CPG (cf. Subsect. 3.7). Both optimizations speed up processing significantly and show that this approach is very promising.

We establish a framework for analyses of multithreaded shared memory concurrent systems which forms a basis for analyses of various properties. Different techniques including dataflow analysis (e.g. [23–25, 14]) and model checking (e.g. [6, 9] to name only a few) can be applied to the generated Concurrent Program Graphs (CPGs) defined in Section 3. Furthermore, the structure of the matrices can be used to prove properties of the underlying program for an arbitrary number of threads. For example, in this paper deadlock freedom is proved for p-v-symmetric programs.

Theoretical results such as [21] state that synchronization-sensitive and context-sensitive analysis is impossible even for the simplest analysis problems. Our system model differs in that it supports subprograms only via inlining, and recursion is impossible.

The outline of our paper is as follows. In Section 2 control flow graphs, edge splitting, and Kronecker algebra are introduced. Our model of concurrency, its properties, and important optimizations like our lazy approach are presented in Section 3. In Section 4 we give a client-server example with 32 clients showing the efficiency of our approach. For a matrix with a potential order of 10^15 our lazy approach delivers the result in 0.43s. Section 5 demonstrates how deadlock freedom is proved for p-v-symmetric programs with an arbitrary number of threads. An example for detecting a data race is given in Section 6. Section 7 is devoted to an empirical analysis. In Section 8 we survey related work. Finally, we draw our conclusion in Section 9.

2 Preliminaries

2.1 Overview

We model shared memory concurrent systems by threads which use semaphores for synchronization. Threads and semaphores are represented by control flow graphs (CFGs). Edge splitting has to be applied to the edges of thread CFGs that access more than one shared variable. Edge splitting is straightforward and is described in Subsect. 2.3. The resulting Refined CFGs (RCFGs) are represented by adjacency matrices. These matrices are then manipulated by Kronecker algebra. We assume that the edges of CFGs are labeled by elements of a semiring. Details follow in this subsection. Similar definitions and further properties can be found in [16].

Semiring 〈L, +, ·, 0, 1〉 consists of a set of labels L, two binary operations + and ·, and two constants 0 and 1 such that

1. 〈L, +, 0〉 is a commutative monoid,
2. 〈L, ·, 1〉 is a monoid,
3. ∀l1, l2, l3 ∈ L: l1 · (l2 + l3) = l1 · l2 + l1 · l3 and (l1 + l2) · l3 = l1 · l3 + l2 · l3 hold, and
4. ∀l ∈ L: 0 · l = l · 0 = 0.

Intuitively, our semiring is a unital ring without subtraction. For each l ∈ L the usual rules are valid, e.g., l + 0 = 0 + l = l and 1 · l = l · 1 = l. In addition we equip our semiring with the unary operation *. For each l ∈ L, l* is defined by l* = ∑_{j≥0} l^j, where l^0 = 1 and l^{j+1} = l^j · l = l · l^j for j ≥ 0. Our set of labels L is defined by L = LV ∪ LS, where LV is the set of non-synchronization labels and LS is the set of labels representing semaphore calls. The sets LV and LS are disjoint. The set LS itself consists of two disjoint sets LSp and LSv. The first denotes the set of labels referring to P-calls, whereas the latter refers to V-calls of semaphores.

Examples for semirings include regular expressions (cf. [26]), which can be used for performing dataflow analysis.
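To make the label semiring concrete, the following small sketch (ours, not from the paper) models labels as regular expressions over strings: + is choice, · is concatenation, and * is iteration. ZERO and ONE are assumed string encodings of the two semiring constants.

ZERO, ONE = "0", "1"

def plus(l1, l2):
    # semiring +: choice between two labels; 0 is the neutral element
    if l1 == ZERO: return l2
    if l2 == ZERO: return l1
    return f"({l1}+{l2})"

def dot(l1, l2):
    # semiring .: concatenation; 0 annihilates, 1 is the neutral element
    if l1 == ZERO or l2 == ZERO: return ZERO
    if l1 == ONE: return l2
    if l2 == ONE: return l1
    return f"({l1}.{l2})"

def star(l):
    # unary *: l* = the sum over all powers of l
    return ONE if l in (ZERO, ONE) else f"({l})*"

print(dot("a", "b"))         # consecutive basic blocks: (a.b)
print(plus("a", "b"))        # a conditional: (a+b)
print(star(plus("a", "b")))  # a loop: ((a+b))*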

2.2 Control Flow Graphs

A Control Flow Graph (CFG) is a directed labeled graph defined by G = 〈V, E, ne〉 with a set of nodes V, a set of directed edges E ⊆ V × V, and a so-called entry node ne ∈ V. We require that each n ∈ V is reachable through a sequence of edges from ne. Nodes can have at most two outgoing edges. Thus the maximum number of edges in CFGs is 2|V|. We will use this property later.

Usually CFG nodes represent basic blocks (cf. [1]). Because our matrix calculus manipulates the edges, we need to have basic blocks on the edges.1 Each edge e ∈ E is assigned a basic block b. In this paper we refer to them as edge labels as defined in the previous subsection. To keep things simple we use edges, their labels, and the corresponding entries of the adjacency matrices synonymously.

In order to model synchronization we use semaphores. The corresponding edges typically have labels like p1 and v1, where px, vx ∈ LS. Usually two or more distinct thread CFGs refer to the same semaphore to perform synchronization. The other labels are elements from LV. The operations on the basic blocks are ·, +, and * from the semiring defined above (cf. [26]). Intuitively, ·, +, and * model consecutive basic blocks, conditionals, and loops, respectively.

1 We chose the incoming edges.


Fig. 1. Semaphores: (a) Binary Semaphore; (b) Counting Semaphore

In Fig. 1(a) and 1(b) a binary and a counting semaphore are depicted. The latter allows two threads to enter at the same time. In a similar way it is possible to construct semaphores allowing n non-blocking P-calls.
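The construction of a semaphore allowing n non-blocking P-calls can be sketched as follows (our illustration; the list-of-lists representation and the function name are assumptions, not the paper's code). State i means that i threads have currently entered; a p-edge leads from state i to i+1 and a v-edge back.

def semaphore_matrix(n, p="p", v="v"):
    size = n + 1                    # states 0..n
    m = [["0"] * size for _ in range(size)]
    for i in range(n):
        m[i][i + 1] = p             # P-call: one more thread enters
        m[i + 1][i] = v             # V-call: one thread leaves
    return m

for row in semaphore_matrix(2):     # the counting semaphore of Fig. 1(b)
    print(row)                      # ['0','p','0'] / ['v','0','p'] / ['0','v','0']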

2.3 Edge Splitting

A basic block consists of multiple consecutive statements without jumps. For our purpose we need a finer granularity than we would have with basic blocks alone. To achieve the required granularity we need to split edges. Shared variable accesses and semaphore calls may occur in basic blocks. For both it is necessary to split edges. This ensures a representation of possible context switches in a manner exact enough for our purposes. We say "exact enough" because by using basic blocks together with the above refinement, we have already coarsened the analysis compared to the possibilities on statement level. Furthermore, we do not lose any information required for the completeness of our approach. Applying this procedure to a CFG, i.e. splitting edges in a CFG, results in a Refined Control Flow Graph (RCFG).

Let V be the set of shared variables. In addition, let a shared variable v ∈ V be a volatile variable located in the shared memory which is accessed by two or more threads. Splitting an edge depends on the number of shared variables accessed in the corresponding basic block. For edge e this number is referred to as NSV(e). In the same way we refer to NSV(b) as the number of shared variables accessed in basic block b. If NSV(e) > 1, edge splitting has to be applied to edge e; the edge is used unchanged otherwise.

If edge splitting has to be applied to edge e, which has basic block b assigned and NSV(b) = k, then the basic blocks b1, . . . , bk represent the consecutive parts of b in such a way that ∀bi: NSV(bi) = 1, where 1 ≤ i ≤ k. Edge ej gets assigned basic block bj, where 1 ≤ j ≤ k. In Fig. 2 the splitting of an edge with basic block b and NSV(b) = k is depicted.

Fig. 2. Edge Splitting for Shared Variable Accesses

For semaphore calls (e.g. p1 and v1) edge splitting is required in a similar fashion. In contrast to shared variable accesses we require that semaphore calls have to be the only statement on the corresponding edge. The remaining consecutive parts of the basic block are situated on the preceding and succeeding edges, respectively.2

The effects of edge splitting for shared variables and semaphore calls can be seen in the data race example given in Section 6. Each RCFG depicted in Fig. 12 is constructed out of one basic block (cf. Fig. 11).

Note that edge splitting ensures that we can model the minimal required context switches. The semantics of a concurrent programming language usually allows more. For example, consider an edge in an RCFG containing two consecutive statements, where neither accesses shared variables. A context switch may happen in between. However, this additional interleaving does not provide new information. Hence our approach provides the minimal number of context switches.

Without loss of generality we assume that the statements in each basic block are atomic. Thus we assume that, while executing a statement, context switching is impossible. In RCFGs the finest possible granularity is at statement level. If, according to the program's semantics, atomic statements may access two or more shared variables, then we make an exception to the above rule and allow two or more shared variable accesses on a single edge. Such edges have at most one atomic statement in their basic block. The Kronecker sum (which is introduced in the next subsection) ensures that all interleavings are generated correctly.

2.4 Synchronization and Generating Interleavings with Kronecker Algebra

Kronecker product and Kronecker sum form Kronecker algebra. In the following we define both operations, state properties, and give examples. In addition, for the Kronecker sum we prove a property which we call the Mixed Sum Rule.

We define the set of matrices M = {M = (mi,j) | mi,j ∈ L}. In the remaining parts of this paper only matrices M ∈ M will be used, except where stated explicitly. Let o(M) refer to the order3 of matrix M ∈ M. In addition we will use n-by-n zero matrices Zn = (zi,j), where ∀i, j: zi,j = 0.

2 Note that edges representing a call to a semaphore are not considered to access shared variables.

3 A k-by-k matrix is known as square matrix of order k.


Definition 1 (Kronecker product). Given an m-by-n matrix A and a p-by-q matrix B, their Kronecker product, denoted by A ⊗ B, is an mp-by-nq block matrix defined by

        ( a1,1·B  · · ·  a1,n·B )
A ⊗ B = (   ...    ...    ...   )
        ( am,1·B  · · ·  am,n·B )

Example 1. Let

A = ( a1,1  a1,2 )        B = ( b1,1  b1,2  b1,3 )
    ( a2,1  a2,2 )            ( b2,1  b2,2  b2,3 )
                              ( b3,1  b3,2  b3,3 )

The Kronecker product C = A ⊗ B is given by

    ( a1,1b1,1  a1,1b1,2  a1,1b1,3  a1,2b1,1  a1,2b1,2  a1,2b1,3 )
    ( a1,1b2,1  a1,1b2,2  a1,1b2,3  a1,2b2,1  a1,2b2,2  a1,2b2,3 )
C = ( a1,1b3,1  a1,1b3,2  a1,1b3,3  a1,2b3,1  a1,2b3,2  a1,2b3,3 )
    ( a2,1b1,1  a2,1b1,2  a2,1b1,3  a2,2b1,1  a2,2b1,2  a2,2b1,3 )
    ( a2,1b2,1  a2,1b2,2  a2,1b2,3  a2,2b2,1  a2,2b2,2  a2,2b2,3 )
    ( a2,1b3,1  a2,1b3,2  a2,1b3,3  a2,2b3,1  a2,2b3,2  a2,2b3,3 )

As stated in [18], the Kronecker product is also being referred to as Zehfuss product, direct product of matrices, or matrix direct product.4

In the following we list some basic properties of the Kronecker product. Proofs and additional properties can be found in [2, 10, 7, 11]. Let A, B, C, and D be matrices. The Kronecker product is noncommutative because in general A ⊗ B ≠ B ⊗ A. It is permutation equivalent because there exist permutation matrices P and Q such that A ⊗ B = P(B ⊗ A)Q. If A and B are square matrices, then A ⊗ B and B ⊗ A are even permutation similar, i.e., P = Q^T. The product is associative, as

A ⊗ (B ⊗ C) = (A ⊗ B) ⊗ C.    (1)

In addition, the Kronecker product distributes over +, i.e.,

A ⊗ (B + C) = A ⊗ B + A ⊗ C,    (2)
(A + B) ⊗ C = A ⊗ C + B ⊗ C.    (3)

Hence, for example, (A + B) ⊗ (C + D) = A ⊗ C + B ⊗ C + A ⊗ D + B ⊗ D. The Kronecker product allows us to model synchronization (cf. Subsect. 3.2).

Definition 2 (Kronecker sum). Given a matrix A of order m and a matrix B of order n, their Kronecker sum, denoted by A ⊕ B, is a matrix of order mn defined by

A ⊕ B = A ⊗ In + Im ⊗ B,

where Im and In denote identity matrices5 of order m and n, respectively.

4 Knuth notes in [15] that Kronecker never published anything about it. Zehfuss was actually the first to publish it, in the 19th century [27]. He proved that det(A ⊗ B) = det^n(A) · det^m(B) if A and B are matrices of order m and n, respectively, with entries from the domain of real numbers.

This operation must not be confused with the direct sum of matrices, the group direct product, or the direct product of modules, for which the symbol ⊕ is also used. By calculating the Kronecker sum of the adjacency matrices of two graphs, the adjacency matrix of the Cartesian product graph [12] is computed (cf. [15]).

Example 2. We use matrices A and B from Ex. 1. The Kronecker sum A ⊕ B is given by

A ⊗ I3 + I2 ⊗ B =

  ( a1,1  a1,2 )   ( 1 0 0 )   ( 1 0 )   ( b1,1  b1,2  b1,3 )
  ( a2,1  a2,2 ) ⊗ ( 0 1 0 ) + ( 0 1 ) ⊗ ( b2,1  b2,2  b2,3 )
                   ( 0 0 1 )             ( b3,1  b3,2  b3,3 )

= ( a1,1  0     0     a1,2  0     0    )   ( b1,1  b1,2  b1,3  0     0     0    )
  ( 0     a1,1  0     0     a1,2  0    )   ( b2,1  b2,2  b2,3  0     0     0    )
  ( 0     0     a1,1  0     0     a1,2 ) + ( b3,1  b3,2  b3,3  0     0     0    )
  ( a2,1  0     0     a2,2  0     0    )   ( 0     0     0     b1,1  b1,2  b1,3 )
  ( 0     a2,1  0     0     a2,2  0    )   ( 0     0     0     b2,1  b2,2  b2,3 )
  ( 0     0     a2,1  0     0     a2,2 )   ( 0     0     0     b3,1  b3,2  b3,3 )

= ( a1,1+b1,1  b1,2       b1,3       a1,2       0          0         )
  ( b2,1       a1,1+b2,2  b2,3       0          a1,2       0         )
  ( b3,1       b3,2       a1,1+b3,3  0          0          a1,2      )
  ( a2,1       0          0          a2,2+b1,1  b1,2       b1,3      )
  ( 0          a2,1       0          b2,1       a2,2+b2,2  b2,3      )
  ( 0          0          a2,1       b3,1       b3,2       a2,2+b3,3 )

In the following we list basic properties of the Kronecker sum of matrices A, B, and C. Additional properties can be found in [20] or are proved in this paper. The Kronecker sum is noncommutative because under element-wise comparison in general A ⊕ B ≠ B ⊕ A. Nevertheless, it essentially commutes: from a graph point of view, the graphs represented by the matrices A ⊕ B and B ⊕ A are structurally isomorphic.

Now we state a property of the Kronecker sum which we call the Mixed Sum Rule.

Lemma 1. Let the matrices A and C have order m and B and D have order n. Then we call

(A ⊕ B) + (C ⊕ D) = (A + C) ⊕ (B + D)

the Mixed Sum Rule.

Proof. By using Eqs. (2) and (3) and Def. 2 we get (A ⊕ B) + (C ⊕ D) = A ⊗ In + Im ⊗ B + C ⊗ In + Im ⊗ D = (A + C) ⊗ In + Im ⊗ (B + D) = (A + C) ⊕ (B + D). □

5 The identity matrix In is an n-by-n matrix with ones on the main diagonal and zeros elsewhere.


For example, let the matrices A and B be written as A = ∑_{i∈I} Ai and B = ∑_{j∈J} Bj, respectively. In addition, let the sets I and J have the same number of elements, i.e., |I| = |J|. By using the Mixed Sum Rule we can write A ⊕ B = ∑_{i∈I, j∈J} (Ai ⊕ Bj). We will frequently use the Mixed Sum Rule from now on without further notice.

The Kronecker sum is also associative, as (A ⊕ B) ⊕ C and A ⊕ (B ⊕ C) are equal.

Lemma 2. Kronecker sum is associative.

Proof. In the following we will use Im ⊗ In = Im·n. Note that Z denotes zero matrices. We have

A ⊕ (B ⊕ C)
                 = A ⊕ (B ⊗ Io(C) + Io(B) ⊗ C)
{adding Zo(A)}   = (A + Zo(A)) ⊕ (B ⊗ Io(C) + Io(B) ⊗ C)
{Lemma 1}        = (A ⊕ (B ⊗ Io(C))) + (Zo(A) ⊕ (Io(B) ⊗ C))
{Eq. (1), Def. 2} = (A ⊕ (B ⊗ Io(C))) + Io(A) ⊗ Io(B) ⊗ C
{ass. +, Def. 2} = A ⊗ Io(B)·o(C) + Io(A) ⊗ B ⊗ Io(C) + Io(A)·o(B) ⊗ C
{comm. of +}     = A ⊗ Io(B) ⊗ Io(C) + Io(A)·o(B) ⊗ C + Io(A) ⊗ B ⊗ Io(C)
{Def. 2}         = ((A ⊗ Io(B)) ⊕ C) + Io(A) ⊗ B ⊗ Io(C)
{Def. 2}         = ((A ⊗ Io(B)) ⊕ C) + ((Io(A) ⊗ B) ⊕ Zo(C))
{Lemma 1}        = (A ⊗ Io(B) + Io(A) ⊗ B) ⊕ (C + Zo(C))
{rm. Zo(C)}      = (A ⊗ Io(B) + Io(A) ⊗ B) ⊕ C
{Def. 2}         = (A ⊕ B) ⊕ C.  □

The associativity properties of the operations ⊗ and ⊕ imply that the k-fold operations

⊗_{i=1}^{k} Ai   and   ⊕_{i=1}^{k} Ai

are well defined. Note that the Kronecker sum calculates all possible interleavings (see e.g. [17] for a proof). Note that this is true even for general CFGs including conditionals and loops. The following example illustrates the interleaving of threads and how the Kronecker sum handles it.

Example 3. Let the matrices C and D be defined as follows:

C = ( 0 a 0 )        D = ( 0 c 0 )
    ( 0 0 b )            ( 0 0 d )
    ( 0 0 0 )            ( 0 0 0 )

Fig. 3. A Simple Example: (a) C; (b) D; (c) Interleavings: a·b·c·d, a·c·b·d, a·c·d·b, c·a·b·d, c·a·d·b, c·d·a·b; (d) C ⊕ D

The graph corresponding to matrix C is depicted in Fig. 3(a), whereas the graph of matrix D is shown in Fig. 3(b). The regular expressions associated to the CFGs are a · b and c · d, respectively. All possible interleavings by executing C and D in an interleavings semantics are shown in Fig. 3(c). In Fig. 3(d) the graph represented by the adjacency matrix C ⊕ D is depicted. It is easy to see that all possible interleavings are generated correctly.
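The same example can be reproduced mechanically by writing Definition 2 out element-wise; the sketch below (dense string-labeled matrices are an assumed representation for illustration) prints the edges of C ⊕ D, and every path from node 1 to node 9 spells one of the six interleavings of Fig. 3(c).

def add(x, y):
    # semiring + on labels; only the diagonal can combine two labels here
    if x == "0": return y
    if y == "0": return x
    return x + "+" + y

def kron_sum(A, B):
    # A (+) B = A (x) I_n + I_m (x) B, written out element-wise
    m, n = len(A), len(B)
    S = [["0"] * (m * n) for _ in range(m * n)]
    for i in range(m):
        for j in range(m):
            for k in range(n):
                for l in range(n):
                    entry = "0"
                    if k == l:             # the A (x) I_n part
                        entry = add(entry, A[i][j])
                    if i == j:             # the I_m (x) B part
                        entry = add(entry, B[k][l])
                    S[i * n + k][j * n + l] = entry
    return S

C = [["0", "a", "0"], ["0", "0", "b"], ["0", "0", "0"]]
D = [["0", "c", "0"], ["0", "0", "d"], ["0", "0", "0"]]
for r, row in enumerate(kron_sum(C, D)):
    for c, lab in enumerate(row):
        if lab != "0":
            print(f"{r + 1} -{lab}-> {c + 1}")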

3 Concurrent Program Graphs

Our system model consists of a finite number of threads and a finite number of semaphores. Both threads and semaphores are represented by CFGs. The CFGs are stored in the form of adjacency matrices. The matrices have entries which are referred to as labels l ∈ L as defined in Subsect. 2.1. Let S and T be the sets of adjacency matrices representing semaphores and threads, respectively. The matrices are manipulated by using Kronecker algebra. Similar to [3] we describe synchronization by Kronecker products and thread interleavings by Kronecker sums. Note that higher synchronization features of programming languages such as Ada's rendezvous can be simulated by our system model, as the runtime system uses semaphores provided by the operating system to implement them.

Formally, the system model consists of the tuple 〈T, S, L〉, where

– T is the set of RCFG adjacency matrices describing threads,
– S is the set of CFG adjacency matrices describing semaphores, and
– L is the set of labels out of the semiring defined in Subsect. 2.1. The labels in T ∈ T are elements of L, whereas the labels in S ∈ S are elements of LS.

A Concurrent Program Graph (CPG) is a graph C = 〈V, E, ne〉 with a set of nodes V, a set of directed edges E ⊆ V × V, and a so-called entry node ne ∈ V. The sets V and E are constructed out of the elements of 〈T, S, L〉. Details on how we generate the sets V and E follow in the next subsections. Similar to RCFGs, the edges of CPGs are labeled by l ∈ L. Assuming without loss of generality that each thread has an entry node with index 1 in its adjacency matrix t ∈ T, the entry node of the generated CPG has index 1, too.


Fig. 4. Overview

In Fig. 4 an overview of our approach is given. As described in Subsect. 2.3, the set of shared variables V is used to generate T.

3.1 Generating a Concurrent Program’s Matrix

Let T(i) ∈ T and S(i) ∈ S refer to the matrices representing thread i and semaphore i, respectively. Let M = (mi,j) ∈ M. In addition, we define the matrix Ml as the matrix with entries of M equal to l and zeros elsewhere:

Ml = (ml;i,j), where ml;i,j = l if mi,j = l, and 0 otherwise.

We obtain the matrix representing the k interleaved threads as

T = ⊕_{i=1}^{k} T(i), where T(i) ∈ T.

According to Fig. 1 we have for the binary and the counting semaphore an adjacency matrix of order two and three, respectively. If we assume that the ith and the jth semaphore, where 1 ≤ i, j ≤ r, are a binary and a counting semaphore, respectively, then we get the following adjacency matrices:

S(i) = ( 0  pi )        S(j) = ( 0  pj 0  )
       ( vi 0  )               ( vj 0  pj )
                               ( 0  vj 0  )

In a similar fashion we can model counting semaphores of higher order.


The matrix representing the r interleaved semaphores is given by

S = ⊕_{i=1}^{r} S(i), where S(i) ∈ S.

The adjacency matrix representing program P, referred to as P, is defined as

P = T ◦ S = ∑_{l∈LS} (Tl ⊗ Sl) + ∑_{l∈LV} (Tl ⊕ Sl).    (4)

When applying the Kronecker product to semaphore calls we follow the rules vx · vx = vx and px · px = px.

In Subsect. 3.5 we describe how the ◦-operation can be implemented efficiently.

3.2 ◦-Operation and Synchronization

Lemma 3. Let T = ⊕_{i=1}^{k} T(i) be the matrix representing k interleaved threads and let S be a binary semaphore. Then T ◦ S correctly models synchronization of T with semaphore S.6

Proof. First we observe that

1. the first term in the definition of Eq. (4) replaces
   – each p in matrix T with
       ( 0 p )
       ( 0 0 )
     and
   – each v in matrix T with
       ( 0 0 )
       ( v 0 ),
2. the second term replaces each m ∈ LV with
       ( m 0 )
       ( 0 m ),
   and
3. both terms replace each 0 by
       ( 0 0 )
       ( 0 0 ).

According to the replacements above, the order of matrix T ◦ S has doubled compared to T.

Now, consider the paths in the automaton underlying T described by the regular expression

π = (∑_{m∈LV} m)* ( p (∑_{m∈LV} m)* v (∑_{m∈LV} m)* )*.

By the observations above it is easy to see that paths containing π are present in T ◦ S. On the other hand, paths not containing π are no longer present in T ◦ S. Thus the semaphore operations always occur in (p, v) pairs in all paths in T ◦ S. This, however, exactly mirrors the semantics of synchronization via a semaphore. □

6 Note that we do not make assumptions concerning the structure of T.


Generalizing Lemma 3, it is easy to see that the synchronization property is also correctly modeled if we replace the binary semaphore by one which allows more than one thread to enter it. In addition, the synchronization property is correctly modeled even if more than one semaphore is present on the right-hand side of T ◦ S.

As a byproduct the proof of Lemma 3 shows the following corollary.

Corollary 1. If the program modeled by T ◦ S contains a deadlock, then the matrix T ◦ S will contain a zero line ℓ. Node ℓ in the corresponding automaton is not a final node and has no successors.

Thus deadlocks show up in CPGs as a purely structural property of the underlying graphs. Nevertheless, false positives may occur: from a static point of view a deadlock may be possible, while conditions exclude this case at runtime. Our approach delivers a path to a deadlock in any case. Nevertheless, our approach of finding deadlocks is complete: if it states deadlock freedom, then the program under test is certainly deadlock free.

A further consequence of Lemma 3 is that after applying the ◦-operation only a small part of the underlying automaton can be reached from its entry node. This allows for optimizations discussed later.
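Corollary 1 turns deadlock detection into a purely structural search: look for a node that is reachable from the entry node but has no successors. A minimal sketch (the successor-map representation, the function name, and the toy automaton are our assumptions):

from collections import deque

def find_deadlocks(succ, entry, final):
    # succ: node -> {successor: label}; report reachable zero lines
    seen, queue, dead = {entry}, deque([entry]), []
    while queue:
        n = queue.popleft()
        if not succ.get(n) and n != final:   # zero line: no successors
            dead.append(n)
        for m in succ.get(n, {}):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return dead

# toy automaton: node 4 is reachable but has no outgoing edges,
# node 5 is the intended final node
succ = {1: {2: "T1.p1", 3: "T2.p2"}, 2: {4: "T1.p2"}, 3: {5: "T2.v2"},
        4: {}, 5: {}}
print(find_deadlocks(succ, entry=1, final=5))    # [4]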

3.3 Unreachable Parts Caused by Synchronization

In this subsection we show that synchronization causes unreachable parts. As an example consider Fig. 5. The program consists of two threads, namely T1 and T2. The RCFGs of the threads are shown in Fig. 5(a) and Fig. 5(b). The used semaphore is a binary semaphore similar to Fig. 1(a). Its operations are referred to as p1 and v1. We denote a P-call and a V-call to semaphore x of thread t as t.px and t.vx, respectively. T1 and T2 access the same shared variable in a and b, respectively. The semaphore is used to ensure that a and b are accessed mutually exclusively. Note that a and b may actually be subgraphs consisting of multiple nodes and edges.

For the example we have the matrices

T1 = ( 0 p1 0 0  )    T2 = ( 0 p1 0 0  )    S = ( 0  p1 )
     ( 0 0  a 0  )         ( 0 0  b 0  )        ( v1 0  )
     ( 0 0  0 v1 )         ( 0 0  0 v1 )
     ( 0 0  0 0  )         ( 0 0  0 0  )

Then we obtain the matrix T = T1 ⊕ T2, a matrix of order 16, consisting of the submatrices defined above and zero matrices of order four (instead of Z4 simply denoted by 0), as follows:

T = ( T2 p1·I4 0    0     )
    ( 0  T2    a·I4 0     )
    ( 0  0     T2   v1·I4 )
    ( 0  0     0    T2    )


In order to enable a concise presentation of T ◦ S we define the matrices

U = ( 0 0 0 p1 0 0 0  0 )        V = ( 0 p1 0 0  0 0  0 0  )
    ( 0 0 0 0  0 0 0  0 )            ( 0 0  0 0  0 0  0 0  )
    ( 0 0 0 0  b 0 0  0 )            ( 0 0  0 p1 0 0  0 0  )
    ( 0 0 0 0  0 b 0  0 )            ( 0 0  0 0  0 0  0 0  )
    ( 0 0 0 0  0 0 0  0 )            ( 0 0  0 0  0 p1 0 0  )
    ( 0 0 0 0  0 0 v1 0 )            ( 0 0  0 0  0 0  0 0  )
    ( 0 0 0 0  0 0 0  0 )            ( 0 0  0 0  0 0  0 p1 )
    ( 0 0 0 0  0 0 0  0 )            ( 0 0  0 0  0 0  0 0  )

W = a · I8, and

X = ( 0  0 0  0 0  0 0  0 )
    ( v1 0 0  0 0  0 0  0 )
    ( 0  0 0  0 0  0 0  0 )
    ( 0  0 v1 0 0  0 0  0 )
    ( 0  0 0  0 0  0 0  0 )
    ( 0  0 0  0 v1 0 0  0 )
    ( 0  0 0  0 0  0 0  0 )
    ( 0  0 0  0 0  0 v1 0 )

of order 8.

Then we obtain the matrix T ◦ S, a matrix of order 32, consisting of the submatrices defined above and zero matrices of order eight (instead of Z8 simply denoted by 0), as follows:

T ◦ S = ( U V 0 0 )
        ( 0 U W 0 )
        ( 0 0 U X )
        ( 0 0 0 U )

The generated CPG is depicted in Fig. 5(c). The resulting adjacency matrix has order 32, whereas the resulting CPG consists of only 12 nodes and 12 edges. Large parts (20 nodes and 20 edges) are unreachable from the entry node. In Fig. 6 these unreachable parts are depicted.

In general, unreachable parts exist if a concurrent program contains synchronization. If a program contains a lot of synchronization, the reachable parts may be very small. This observation motivates the lazy implementation described in Subsect. 3.6.

3.4 Properties of the Resulting Adjacency Matrix

In this subsection we prove interesting properties of the resulting matrices.

A short calculation shows that the Kronecker sum in general generates at most mn^2 + nm^2 − nm non-zero entries.7 Stated the other way, at least (mn)^2 − mn^2 − nm^2 + mn entries are zero. We will see that CFGs and RCFGs contain even more zero entries. We will prove that for this case the number of edges is in O(mn). Thus, the number of edges is linear in the order of the resulting adjacency matrix.

7 Assuming the corresponding matrices have an order of m and n, respectively.


Fig. 5. Mutual Exclusion Example: (a) T1; (b) T2; (c) Resulting CPG

Fig. 6. Unreachable Parts of the Mutual Exclusion Example


Lemma 4 (Maximum Number of Nodes). Given a program P consisting of k > 0 threads (t1, t2, . . . , tk), where each ti has n nodes in its RCFG, the number of nodes in P's adjacency matrix P is bounded from above by n^k.

Proof. This follows immediately from the definitions of ⊗ and ⊕. For both, the order of the resulting matrix is the product of the orders of the input matrices. □

Definition 3. Let M = (mi,j) ∈ M. We denote the number of non-zero entries by ||M|| = |{mi,j | mi,j ≠ 0}|.

For an RCFG with n nodes it is easy to see that it contains at most 2n edges.

Lemma 5 (Maximum Number of Entries). Let a program consisting of k > 0 threads be represented by Mk ∈ M, where the threads are represented by the matrices T(i) ∈ T and each T(i) has order n. Then ||Mk|| is bounded from above by 2k·n^k.

Proof. We prove this lemma by induction on the definition of the Kronecker sum. For k = 1 the lemma is true. If we assume that for m threads ||Mm|| ≤ 2m·n^m, then for m + 1 threads ||Mm+1|| ≤ 2m·n^m · n + n^m · 2n = 2(m + 1)·n^(m+1). Thus we have proved Lemma 5. □

Compared to the full matrix of order n^k with n^(2k) entries, the resulting matrix has significantly fewer non-zero entries, namely 2k·n^k. By using the following definition we will prove that the matrices are sparse.

Definition 4 (Sparse Matrix). We call an n-by-n matrix M sparse if and only if ||M|| = O(n).

Lemma 6. CFGs and RCFGs have Sparse Adjacency Matrices.

Proof. Follows from Subsect. 2.2 and Def. 4. □

Lemma 7. The Matrix P of a Program P is Sparse.

Proof. Let T = ⊕_{i=1}^{k} T(i) ∈ M be an N-by-N adjacency matrix of a program. We require that each of the k threads has order n in its adjacency matrix T(i). From Lemma 5 we know ||T|| = O(2k·n^k). In addition, N = n^k is given by Lemma 4. Hence, for k threads and by using Definition 4 we get ||T|| ≤ 2k·n^k = 2k·N = O(N). A similar result holds for S and P = T ◦ S. □

Lemma 7 enables the application of memory-saving data structures and efficient algorithms. Algorithms may, for example, work on adjacency lists. Clearly, the space requirements for the adjacency lists are linear in the number of nodes. In the worst case the number of nodes increases exponentially in the number of threads.


3.5 Efficient Implementation of the ◦-Operation

This subsection is devoted to an efficient implementation of the ◦-operation. First we define the Selective Kronecker product, which we denote by ⊙. This operator synchronizes only identical labels l ∈ LS of the two input matrices.

Definition 5 (Selective Kronecker product). Given two matrices A and B and a label set L′ ⊆ L, we call A ⊙L′ B their Selective Kronecker product. For all l ∈ L′ let A ⊙L′ B = (ai,j) ⊙L′ (bp,q) = (ci.p,j.q), where

ci.p,j.q = l if ai,j = bp,q = l ∧ l ∈ L′, and 0 otherwise.

Definition 6 (Filtered Matrix). We call ML′ a Filtered Matrix and define it as a matrix of order o(M) containing the entries l ∈ L′ ⊆ L of M and zeros elsewhere:

ML′ = (mL′;i,j), where mL′;i,j = l if mi,j = l ∧ l ∈ L′, and 0 otherwise.

Note that

∑_{l∈LS} (Tl ⊗ Sl) = T ⊙LS S.    (5)

In the following we use o(SLV) = ∏_{i=1}^{r} o(S(i)) = o(S). Note that S contains only labels l ∈ LS. Hence, when the ◦-operator is applied for a label l ∈ LV, we get Sl = Zo(S), i.e. a zero matrix of order o(S). Thus we obtain ∑_{l∈LV} (Tl ⊕ Sl) = TLV ⊗ Io(S). We will prove this below.

Finally, we can refine Eq. (4) by stating the following lemma.

Lemma 8. The ◦-operation can be computed efficiently by

P = T ◦ S = T ⊙LS S + TLV ⊗ Io(S).

Proof. Using Eq. (4), P = T ◦ S is given by

∑_{l∈LS} (Tl ⊗ Sl) + ∑_{l∈LV} (Tl ⊕ Sl).

According to Eq. (5) the first term is equal to T ⊙LS S. Because Sl = Zo(S) for l ∈ LV, and by Lemma 1 and Def. 2, the second term fulfills

∑_{l∈LV} (Tl ⊕ Sl) = ∑_{l∈LV} (Tl ⊕ Zo(S)) = TLV ⊕ Zo(S) = TLV ⊗ Io(S).

Note that S contains only l ∈ LS. It is obvious that the non-zero entries of the first and the second term are l ∈ LS and l ∈ LV, respectively. Both terms can be computed by iterating once through the corresponding sparse adjacency matrices, namely T and S. □


3.6 Lazy Implementation of Kronecker Algebra

Until now we have primarily focused on a pure mathematical model for shared memory concurrent systems. An alert reader will have noticed that the order of the matrices in our CPG increases exponentially in the number of threads. On the other hand, we have seen that the ◦-operation results in parts of the matrix T ◦ S that cannot be reached from the entry node of the underlying automaton (cf. Subsect. 3.3). This comes solely from the fact that synchronization excludes some interleavings.

Choosing a lazy implementation for the matrix operations, however, ensures that, when extracting the reachable parts of the underlying automaton, the overall effort is reduced to exactly these parts. By starting from the entry node and calculating all reachable successor nodes, our lazy implementation does exactly this. Thus, for example, if the resulting automaton's size is linear in terms of the involved threads, only linear effort will be necessary to generate the resulting automaton.

Our implementation distinguishes between two kinds of matrices: sparse matrices are used for representing threads and semaphores. Lazy matrices are employed for representing all the other matrices, e.g. those resulting from the operations of the Kronecker algebra and our ◦-operation. Besides the employed operation, a lazy matrix simply keeps track of its operands. Whenever an entry of a lazy matrix is retrieved, depending on the operation recorded in the lazy matrix, entries of the operands are retrieved and the recorded operation is performed on these entries to calculate the result. In the course of this computation, even the successors of nodes are evaluated lazily. Retrieving entries of operands is done recursively if the operands are again lazy matrices, or by retrieving the entries from the sparse matrices, where the actual data resides.
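The idea can be conveyed in a few lines of Python (class names and the successor interface are our assumptions; the real implementation covers the Kronecker product and the ◦-operation as well). A lazy node stores only its operands; successors are computed on demand while exploring from the entry node, so unreachable parts are never materialized.

class Sparse:
    # an explicitly stored matrix: succ maps i -> [(j, label)]
    def __init__(self, succ, order):
        self.succ, self.order = succ, order
    def successors(self, i):
        return self.succ.get(i, [])

class KronSum:
    # lazy A (+) B: nothing is computed until successors() is called
    def __init__(self, A, B):
        self.A, self.B = A, B
        self.order = A.order * B.order
    def successors(self, i):
        a, b = divmod(i, self.B.order)
        # either A moves while B stays, or B moves while A stays
        return ([(j * self.B.order + b, l) for j, l in self.A.successors(a)] +
                [(a * self.B.order + j, l) for j, l in self.B.successors(b)])

def reachable(M, entry=0):
    # explore only the part reachable from the entry node
    seen, stack = {entry}, [entry]
    while stack:
        n = stack.pop()
        for m, l in M.successors(n):
            print(f"{n} -{l}-> {m}")
            if m not in seen:
                seen.add(m)
                stack.append(m)

C = Sparse({0: [(1, "a")], 1: [(2, "b")]}, 3)
D = Sparse({0: [(1, "c")], 1: [(2, "d")]}, 3)
reachable(KronSum(C, D))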

In addition, our lazy implementation allows for simple parallelization. For example, retrieving the entries of the left and right operands can be done concurrently. Exploiting this, we expect further performance improvements for our implementation when run on multi-core architectures.

3.7 Optimization for NSV

Our approach already works fine for practical settings. In this subsection we present additional optimizations which are optional.

As already mentioned in Subsect. 2.4, the Kronecker sum interleaves all entries. Sometimes this is disadvantageous because irrelevant interleavings will be generated if some basic blocks do not access shared variables. Such basic blocks can be placed freely as long as other constraints do not prohibit it.

For example, consider the CFGs in Fig. 3. Assume for a moment that a, b, c, and d do not access shared variables. Then the overall behavior of the C-D-system can be described correctly by choosing one of the six interleavings depicted in Fig. 3(d), e.g., by a · b · c · d. Hence the size of the CPG is reduced from nine nodes to five.


Fig. 7. A Counterexample: (a) E; (b) F; (c) E ⊕ F

From now on we divide the set LV into two disjoint sets LSV and LNSV, depending on whether the corresponding basic blocks access shared variables or not.

The following example shows that NSV-edges cannot always be eliminated.

Example 4. In this example we use the graphs depicted in Fig. 7. The graphs E and F form the input graphs. It is assumed that a is the only edge not accessing a shared variable. All graphs have node 1 as entry node. We show that it is not sufficient to choose exactly one NSV-edge. The matrix E ⊕ F is given by

( 0 c a 0 0 0 0 0 )
( 0 0 0 a 0 0 0 0 )
( 0 0 0 c p 0 0 0 )
( 0 0 0 0 0 p 0 0 )
( 0 0 0 0 0 c b 0 )
( 0 0 0 0 0 0 0 b )
( v 0 0 0 0 0 0 c )
( 0 v 0 0 0 0 0 0 )

The graph represented by E ⊕ F, which is structurally isomorphic to (E ⊕ F) ◦ S, is depicted in Fig. 7(c). Both loops in the CPG must be preserved. Otherwise the program would be modeled incorrectly. By removing an edge labeled by a, we would change the program behavior. Thus it is not sufficient to use only one edge labeled by a.


In general, the only way to reduce the size of the resulting CPG is by studying the matrix T ◦ S. One way would be to output the automaton from T ◦ S and try to find reductions afterwards. We decided to perform such reductions during the output process, so that an unnecessarily large automaton is not generated. It turned out that the problems to be solved to perform these reductions are hard. This will be discussed in detail below.

Fig. 8 shows the algorithm employed for the output process in pseudocode. By ℓ(n → i) we denote the label assigned to edge (n → i). In short, the algorithm records all NSV-edges and proceeds until no other edges can be processed. Then it chooses one label of the NSV-edges. From the set of all recorded edges with this label a subset is determined such that all the edges in the subset can be reached from all nodes that have been processed so far. This is a necessary condition if we want to eliminate the edges outside the subset. Determining a minimal subset under this constraint, however, is known as the Set Covering Problem, which is NP-hard. We decided to implement a greedy algorithm. However, it turned out that in most cases we encountered a subset of size one, which trivially is optimal.

If no subset can be found, no edges can be eliminated.

Concerning Ex. 4 we note that the reason why none of the NSV-edges can be eliminated lies in the presence of the loop in E. Our output algorithm traverses the CPG in such a way that we do not know in advance if a loop will be constructed later on. Hence our algorithm has to be aware of loops that will be constructed in the future. This is done by remembering eliminated edges, which will be reconsidered if a suitable loop is encountered.

In detail, if edges can be eliminated, we remember the set of eliminated edges R in set RECONSIDER together with a copy of the current set DONE. If later on we encounter a path in the CPG that reaches some nodes in this set DONE, we have to reconsider our decision. In this case all edges in R are reconsidered for being present in the CPG. Note that several RECONSIDER-sets can be affected if such a "backedge" is found. Note also that this reconsider mechanism handles Ex. 4 correctly.

Our implementation showed that the decision which label is chosen in Line 29 is also crucial. The number of edges (and nodes) being eliminated heavily depends on this choice. We are currently working on heuristics for this choice.

In the following we execute the algorithm on the example of Fig. 3 under the above conditions, i.e., a, b, c, and d do not access shared variables. At the beginning we have TBD = {1} and TBDNSV(a) = TBDNSV(b) = TBDNSV(c) = TBDNSV(d) = DONE = ∅. Since RECONSIDER-sets are not necessary in this example, we do not consider them in the following to keep things simple.

The 1st iteration finds NSV-edges only. So: TBDNSV(a) = {(1 → 4)}, TBDNSV(c) = {(1 → 2)}, DONE = {1}, and the other sets are empty.

The 2nd iteration chooses label a in Line 29. SUBSET clearly is {(1 → 4)}, TBD = {4}, and TBDNSV(a) = ∅.

The 3rd iteration processes Node 4 and again finds NSV-edges only. So: TBDNSV(c) = {(1 → 2), (4 → 5)} and TBDNSV(b) = {(4 → 7)}. DONE becomes {1, 4}.


OutputCPG()
 1  TBD ← {startnode}
 2  TBDNSV(ℓ ∈ LNSV)  {array of sets; all sets initialized to ∅}
 3  DONE ← ∅
 4  while TBD ≠ ∅ or ∃ℓ : TBDNSV(ℓ) ≠ ∅ do
 5    if TBD ≠ ∅ then
 6      n ← Element(TBD)  {choose one element of set TBD}
 7      print n
 8      for all edges (n → i) do
 9        if ℓ(n → i) ∈ LNSV then
10          TBDNSV(ℓ(n → i)) ← TBDNSV(ℓ(n → i)) ∪
11            {(n → i)}
12        else
13          TBD ← TBD ∪ {i}
14          print (n → i)
15        endif
16        while ∃R : i ∈ R and ∃D : {(D, R)} ∈ RECONSIDER do
17          {we have found a path back to a set of nodes
18           which we have used to eliminate NSV edges;
19           all these edges have now to be reconsidered}
20          for (m → j) ∈ R do
21            TBDNSV(ℓ(m → j)) ← TBDNSV(ℓ(m → j)) ∪
22              {(m → j)}
23          endfor
24          RECONSIDER ← RECONSIDER \ {(D, R)}
25        endwhile
26      endfor
27      DONE ← DONE ∪ {n}; TBD ← TBD \ DONE
28    else  {TBD = ∅}
29      ℓ ← NonEmptyElement(TBDNSV)
30        {choose one label with non-empty set in TBDNSV}
31      SUBSET ← SmallestSubset(TBDNSV(ℓ), DONE)
32        {choose smallest subset of TBDNSV(ℓ) such that
33         subset can be reached from all nodes in set DONE}
34      if TBDNSV(ℓ) \ SUBSET ≠ ∅ then
35        RECONSIDER ← RECONSIDER ∪
36          {(DONE, TBDNSV(ℓ) \ SUBSET)}
37          {remember eliminated edges;
38           in case we find a path back to nodes in DONE,
39           we have to reconsider these edges}
40      endif
41      for (n → i) ∈ SUBSET do
42        print (n → i)
43        TBD ← TBD ∪ {i}
44      endfor
45      TBDNSV(ℓ) ← ∅
46    endif
47  endwhile

Fig. 8. Output CPG


The 4th iteration chooses label b in Line 29. Thus SUBSET clearly is {(4 → 7)}, TBD = {7}, and TBDNSV(b) = ∅.

The 5th iteration processes Node 7 and finds one NSV-edge labeled c. So: TBDNSV(c) = {(1 → 2), (4 → 5), (7 → 8)}. DONE becomes {1, 4, 7}.

The 6th iteration handles label c. The smallest subset is found to be {(7 → 8)} since Node 7 can be reached from each of the nodes in set DONE = {1, 4, 7}. Hence, edges (1 → 2) and (4 → 5) can be eliminated, i.e., they are not printed. So: TBDNSV(c) = ∅ and TBD = {8}.

The 7th iteration finds one NSV-edge labeled d. Thus we continue with TBDNSV(d) = {(8 → 9)}. DONE becomes {1, 4, 7, 8}.

The 8th iteration handles label d. We obtain TBDNSV(d) = ∅ and TBD = {9}.

The 9th iteration prints Node 9, sets DONE = {1, 4, 7, 8, 9} and TBD = ∅. The algorithm terminates and the result is depicted in Fig. 9.

Fig. 9. Sequentialized C-D-System

4 Client-Server Example

We have done analysis on client-server scenarios using our lazy implementation. For the example presented here we have used clients and a semaphore of the form shown in Fig. 10(a) and 10(b), respectively.

In Table 10(c) statistics for 1, 2, 4, 8, 16, and 32 clients are given. Fig. 10(d) shows the resulting graph for 8 clients. The few nodes in the resulting matrix and the node IDs indicate that most nodes in the resulting matrix are superfluous. The case of 32 clients and one semaphore forms a matrix with an order of approx. 3.706 × 10^15. Our implementation generated only 65 nodes in 0.43s. In fact we observed growth linear in the number of clients, both for the number of nodes and edges and for the execution time. We did our analysis on an Intel Xeon 2.8 GHz with 8GB DDR2 RAM. Note that an implementation of the matrix calculus for shared memory concurrent systems has to provide node IDs of a sufficient size: the order of T ◦ S can be quite big, although the resulting automaton is small.

5 Generic Proof of Deadlock Freedom

Let Si for i ≥ 1 denote binary semaphores and let their operations be denoted by pi and vi.


Fig. 10. Client-Server Example: (a) Client; (b) Semaphore; (d) Result for 8 Clients

(c) Statistics:

Clients   Nodes   Edges   Exec. Time [s]   Potential Nodes
   1        3       3         0.0013                     6
   2        5       6         0.0013                    18
   4        9      12         0.0045                   162
   8       17      24         0.0120                13,122
  16       33      48         0.0680            86,093,442
  32       65      96         0.4300        3.706 × 10^15

Definition 7. Let M = (mi,j) ∈ M denote a square matrix. In addition, let PM = {(i, j, r) | mi,j = pr for some r ≥ 1} and VM = {(j, i, r) | mi,j = vr for some r ≥ 1} (note the exchanged indexes (j, i)). We call M p-v-symmetric iff PM = VM.
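Over a sparse triple representation, Definition 7 can be checked directly; the sketch below is ours (it assumes that semaphore labels are encoded as strings "p1", "v1", and so on).

def is_pv_symmetric(edges):
    # edges: list of (i, j, label) entries of a square matrix M
    P = {(i, j, lab[1:]) for (i, j, lab) in edges if lab.startswith("p")}
    V = {(j, i, lab[1:]) for (i, j, lab) in edges if lab.startswith("v")}
    return P == V                     # note the exchanged indexes in V_M

binary_sem = [(0, 1, "p1"), (1, 0, "v1")]
print(is_pv_symmetric(binary_sem))    # True

broken = [(0, 1, "p1"), (0, 2, "p2"), (1, 0, "v1")]   # v2 is missing
print(is_pv_symmetric(broken))        # False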

By definition of Kronecker sum and Kronecker product, it is easy to prove the following lemma.

Lemma 9. Let M and N be p-v-symmetric matrices. Then M ⊕ N, M ⊗ N, and M ◦ N are also p-v-symmetric. □

To be more specific, let

Si = ( 0  pi )
     ( vi 0  )

for i ≥ 1. Then S(r) = ⊕_{i=1}^{r} Si is p-v-symmetric.


Now, consider the p-v-symmetric matrix

Mk = ( 0  p1 p2 · · · pk )
     ( v1 0  0  · · · 0  )
     ( ...           ... )
     ( vk 0  0  · · · 0  )

Thus M(n)k = ⊕_{i=1}^{n} Mk is also p-v-symmetric.

Now we state a theorem on deadlock freedom.

Theorem 1. Let P = M(n)k ◦ S(k) be the matrix of an n-threaded program with k binary semaphores, where M(n)k and S(k) are defined above. Then the program is deadlock free.

Proof. By definition and Lemma 9, P is p-v-symmetric. By Corollary 1 a deadlock manifests itself as a zero line, say ℓ, in matrix P. Since P is p-v-symmetric, column ℓ contains only zeros. Hence line ℓ is unreachable in the underlying automaton.

This clearly holds for all zero lines in P, and thus the program is deadlock free. □

For counting semaphores we obtain matrices of the following type

( 0 p 0 · · · 0 0 )
( v 0 p · · · 0 0 )
( 0 v 0 · · · 0 0 )
( ...         ... )
( 0 0 0 · · · 0 p )
( 0 0 0 · · · v 0 )

which clearly is p-v-symmetric. Thus a similar theorem holds if counting semaphores are used instead of binary ones.

A short reflection shows that if we allow Mk to contain additional entries and non-zero lines and columns which do not contain ps and vs, the system is still deadlock free. So we have derived a very powerful criterion to ensure deadlock freedom for a large class of programs, namely p-v-symmetric programs.

Concerning the example in Section 4 we note that if edges labeled a are removed from the clients, we obtain p-v-symmetric matrices. Thus this simple client-server system is deadlock free for an arbitrary number of clients. If we reinsert edges labeled a into the clients, no zero lines and columns appear (as noted above), so that the system is still deadlock free for an arbitrary number of clients.

Theorem 1 may be compared to the results of [8, 5], where for homogeneous token passing rings it is proved that checking correctness properties can be reduced to rings of small sizes.


T1()
1  s.p        {edge T1.p}
2  r ← sv     {edge a}
3  r ← r + 1  {edge b}
4  sv ← r     {edge b}
5  s.v        {edge T1.v}

T2()
1  t ← sv     {edge c}
2  s.p        {edge T2.p}
3  t ← t + 1  {edge d}
4  sv ← t     {edge d}
5  s.v        {edge T2.v}

Fig. 11. Example Program

Fig. 12. RCFGs after Edge Splitting: (a) T1; (b) T2

6 A Data Race Example

We give an example where a programmer is supposed to have used synchronization primitives in a wrong way. The program, consisting of two threads, namely T1 and T2, and a semaphore s, is given in Fig. 11. We assume that sv = 0 at program start. It is supposed that the program delivers sv = 2 when it terminates. Both threads in the program access the shared variable sv. The variables r and t are local to the corresponding threads. The programmer inadvertently has placed line 1 in front of line 2 in T2.

After edge splitting we get the RCFGs depicted in Fig. 12. As usual the semaphore looks like Fig. 1(a). The corresponding matrices are

T1 = ( 0 T1.p 0 0 0    )    T2 = ( 0 c 0    0 0    )
     ( 0 0    a 0 0    )         ( 0 0 T2.p 0 0    )
     ( 0 0    0 b 0    )         ( 0 0 0    d 0    )
     ( 0 0    0 0 T1.v )         ( 0 0 0    0 T2.v )
     ( 0 0    0 0 0    )         ( 0 0 0    0 0    )


Although the following matrices are not computed by our lazy implementation, we give them here to allow the reader to see a complete example. To enable a concise presentation we define the following submatrices of order five:

H = ( 0 c 0    0 0    )
    ( 0 0 T2.p 0 0    )
    ( 0 0 0    d 0    )
    ( 0 0 0    0 T2.v )
    ( 0 0 0    0 0    )

I = T1.p · I5,  J = a · I5,  K = b · I5,  and  L = T1.v · I5.

Now we get T = T1 ⊕ T2, a matrix of order 25, consisting of the submatrices defined above and zero matrices of order five (instead of Z5 simply denoted by 0):

T = ( H I 0 0 0 )
    ( 0 H J 0 0 )
    ( 0 0 H K 0 )
    ( 0 0 0 H L )
    ( 0 0 0 0 H )

To shorten the presentation of P = T ◦ S we define the following submatrices of order ten:

U = ( 0 0 c 0 0 0    0 0 0    0 )        V = ( 0 T1.p 0 0    0 0    0 0    0 0    )
    ( 0 0 0 c 0 0    0 0 0    0 )            ( 0 0    0 0    0 0    0 0    0 0    )
    ( 0 0 0 0 0 T2.p 0 0 0    0 )            ( 0 0    0 T1.p 0 0    0 0    0 0    )
    ( 0 0 0 0 0 0    0 0 0    0 )            ( 0 0    0 0    0 0    0 0    0 0    )
    ( 0 0 0 0 0 0    d 0 0    0 )            ( 0 0    0 0    0 T1.p 0 0    0 0    )
    ( 0 0 0 0 0 0    0 d 0    0 )            ( 0 0    0 0    0 0    0 0    0 0    )
    ( 0 0 0 0 0 0    0 0 0    0 )            ( 0 0    0 0    0 0    0 T1.p 0 0    )
    ( 0 0 0 0 0 0    0 0 T2.v 0 )            ( 0 0    0 0    0 0    0 0    0 0    )
    ( 0 0 0 0 0 0    0 0 0    0 )            ( 0 0    0 0    0 0    0 0    0 T1.p )
    ( 0 0 0 0 0 0    0 0 0    0 )            ( 0 0    0 0    0 0    0 0    0 0    )

W = a · I10, X = b · I10, and

Y = ( 0    0 0    0 0    0 0    0 0    0 )
    ( T1.v 0 0    0 0    0 0    0 0    0 )
    ( 0    0 0    0 0    0 0    0 0    0 )
    ( 0    0 T1.v 0 0    0 0    0 0    0 )
    ( 0    0 0    0 0    0 0    0 0    0 )
    ( 0    0 0    0 T1.v 0 0    0 0    0 )
    ( 0    0 0    0 0    0 0    0 0    0 )
    ( 0    0 0    0 0    0 T1.v 0 0    0 )
    ( 0    0 0    0 0    0 0    0 0    0 )
    ( 0    0 0    0 0    0 0    0 T1.v 0 )

With the help of zero matrices of order ten we can state the program's matrix

P = T ◦ S = T ◦ ( 0 p )
                ( v 0 )

  = ( U V 0 0 0 )
    ( 0 U W 0 0 )
    ( 0 0 U X 0 )
    ( 0 0 0 U Y )
    ( 0 0 0 0 U )

Fig. 13. Resulting CPG

Matrix P has order 50. The corresponding CPG is shown in Fig. 13. The lazy implementation computes only these 19 nodes. Due to synchronization the other parts are not reachable. In addition to the usual labels we have added a set of tuples to each edge in the CPG of Fig. 13. Tuple (x, y, z) denotes values of variables such that sv = x, r = y, and t = z. We use ⊥ to refer to an undefined value. A tuple shows the values after the basic block on the corresponding edge has been evaluated. The entry node of the CPG is Node 1. At program start we have the variable assignment (0, ⊥, ⊥). At Node 49 we end up with the set of tuples {(1, 1, 1), (2, 1, 2), (2, 2, 1)}. Due to the interleavings, different tuples may occur at join nodes. This we reflect by a set of tuples. As stated above, the program is supposed to deliver sv = 2. Thus the tuple (1, 1, 1) shows that the program is erroneous. The error is caused by a data race between the edge c of thread T2 and the edges a and b of thread T1.
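The erroneous outcome can be confirmed independently of the matrix machinery by brute-forcing all interleavings of Fig. 11 under binary-semaphore semantics. The sketch below is ours, not the paper's tool; the step encoding follows the edges of Fig. 12, and the computed set reproduces the tuples of Node 49.

def explore(pc1=0, pc2=0, sem=0, sv=0, r=None, t=None, results=None):
    if results is None:
        results = set()
    moved = False
    # T1: p1; r <- sv (a); r <- r + 1, sv <- r (b); v1
    if pc1 == 0 and sem == 0:
        explore(1, pc2, 1, sv, r, t, results); moved = True
    elif pc1 == 1:
        explore(2, pc2, sem, sv, sv, t, results); moved = True
    elif pc1 == 2:
        explore(3, pc2, sem, r + 1, r + 1, t, results); moved = True
    elif pc1 == 3:
        explore(4, pc2, 0, sv, r, t, results); moved = True
    # T2: t <- sv (c); p1; t <- t + 1, sv <- t (d); v1
    if pc2 == 0:
        explore(pc1, 1, sem, sv, r, sv, results); moved = True
    elif pc2 == 1 and sem == 0:
        explore(pc1, 2, 1, sv, r, t, results); moved = True
    elif pc2 == 2:
        explore(pc1, 3, sem, t + 1, r, t + 1, results); moved = True
    elif pc2 == 3:
        explore(pc1, 4, 0, sv, r, t, results); moved = True
    if not moved:                     # both threads have terminated
        results.add((sv, r, t))
    return results

print(sorted(explore()))   # [(1, 1, 1), (2, 1, 2), (2, 2, 1)]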

7 Empirical Data

In Sec. 4 we already gave some empirical data concerning client-server examples. In this section we give empirical data for ten additional examples.

Let o(P) and o(C) refer to the order of the adjacency matrix P, which is not computed by our lazy implementation, and to the order of the adjacency matrix C of the resulting CPG, respectively. In addition, k and r refer to the number of threads and the number of semaphores, respectively.

k   r   o(P)        √(o(P))    o(C)   Runtime [s]
2   4   256         16.00      12     0.03
3   5   4800        69.28      30     0.097542
4   6   124416      352.73     98     0.48655
3   6   75264       274.34     221    1.057529
4   7   614400      783.84     338    2.537082
4   8   1536000     1239.35    277    2.566587
4   8   737280      858.65     380    3.724364
4   13  298721280   17283.56   2583   96.024073
4   11  55050240    7419.58    3908   146.81
5   6   14929920    3863.93    7666   309.371395

Table 1. Empirical Data


In the following we use the data depicted in Table 1, which was collected on an Intel Pentium D 3.0 GHz machine with 1 GB DDR RAM running CentOS 5.6. The values of √(o(P)) are rounded to two decimal places. As a first observation we note that, except for one example, all values of o(C) are smaller than the corresponding values of √(o(P)). In addition, the runtime of our implementation shows a strong correlation to the order o(C) of the adjacency matrix C of the generated CPG, with a Pearson product-moment correlation coefficient of 0.9990277130. In contrast, the theoretical order o(P) of the resulting adjacency matrix P correlates with the runtime only with a coefficient of 0.2370050995 (both coefficients rounded to ten decimal places).

These observations show that the runtime complexity does not depend on the order o(P), which grows exponentially in the number of threads. We conclude this section by stating that the collected data give a strong indication that the runtime complexity of our approach is linear in the number of nodes present in the resulting CPG.
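The correlation claim is easy to check, e.g. with NumPy. The sketch below uses the rounded values as printed in Table 1, so the resulting coefficients only approximate the ten-decimal figures quoted above:

import numpy as np

# Rounded values from Table 1.
o_P     = [256, 4800, 124416, 75264, 614400, 1536000, 737280,
           298721280, 55050240, 14929920]
o_C     = [12, 30, 98, 221, 338, 277, 380, 2583, 3908, 7666]
runtime = [0.03, 0.097542, 0.48655, 1.057529, 2.537082, 2.566587,
           3.724364, 96.024073, 146.81, 309.371395]

print(np.corrcoef(o_C, runtime)[0, 1])     # strong (paper: 0.9990277130)
print(np.corrcoef(o_P, runtime)[0, 1])     # weak (paper: 0.2370050995)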

8 Related Work

Probably the closest work to ours was done by Buchholz and Kemper [3]. Both approaches employ Kronecker algebra, but they differ as follows. We establish a framework for analyzing multithreaded shared memory concurrent systems which forms a basis for studying various properties of the program; different techniques including dataflow analysis (e.g. [23–25, 14]) and model checking (e.g. [6, 9], to name only a few) can be applied to the generated CPGs, and in this paper we use our approach to prove deadlock freedom. Buchholz and Kemper use Kronecker algebra to describe networks of synchronized automata and to generate reachability sets in composed automata, whereas our approach uses CFGs and semaphores to model shared memory concurrent programs. An additional difference is that we propose optimizations concerning the handling of edges not accessing shared variables and lazy evaluation of the matrix entries.

In [9] Ganai and Gupta studied modeling concurrent systems for bounded model checking (BMC). Somewhat similar to our approach, the concurrent system is modeled lazily. In contrast, our approach does not need temporal logic specifications such as LTL for proving deadlock freedom for p-v-symmetric programs; on the other hand, our approach may suffer from false positives. Like all BMC approaches, [9] has the drawback that it can only show correctness within a bounded number k of steps.

Kahlon et al. propose a framework for static analysis of concurrent programs in [13]. Partial order reduction and synchronization constraints are used to reduce thread interleavings. In order to gain further reductions, abstract interpretation is applied.

In [22] a model checking tool is presented that builds up a system gradually,at each stage compressing the subsystems to find an equivalent CSP process



with many fewer states. With this approach systems of exponential size (≥ 10^20 states) can be model checked successfully. This can be compared to our client-server example in Sec. 4, where matrices of exponential size can be handled in linear time.

Although not closely related, we also recognize the work done in the field of stochastic automata networks (SAN), which is based on the work of Plateau [19], and in the field of generalized stochastic Petri nets (GSPN) (e.g. [4]). These fields address problems quite different from ours; nevertheless, basic operators are shared and some of their properties influenced this paper.

9 Conclusion

We established a framework for analyzing multithreaded shared memory concurrent systems which forms a basis for studying various properties of programs. Different techniques including dataflow analysis and model checking can be applied to CPGs. In addition, the structure of the matrices can be used to prove properties of the underlying program for an arbitrary number of threads. In this paper we used CPGs in order to prove deadlock freedom for the large class of p-v-symmetric programs.

Furthermore, we proved that in general CPGs can be represented by sparse matrices. Hence the number of entries in the matrices is linear in their number of lines. Thus efficient algorithms can be applied to CPGs.

We proposed two major optimizations. First, if the program contains a lot of synchronization, only a very small part of the CPG is reachable and, due to the lazy implementation of the matrix operations, only this part is computed. Second, if the program has only little synchronization, many edges not accessing shared variables will be present; these are reduced during the output process of the CPG. Both optimizations speed up processing significantly and show that this approach is very promising.

We gave examples both of the lazy implementation and of how we are able to prove deadlock freedom.

The first results of our approach (such as Theorem 1) and the performance of our prototype implementation are very promising. Further research is needed to generalize Theorem 1 in order to handle systems similar to the Dining Philosophers problem. In addition, details on how to perform (complete and sound) dataflow analysis on CPGs have to be studied.

References

1. A. Aho, R. Sethi, and J. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, Massachusetts, 1986.
2. R. Bellman. Introduction to Matrix Analysis. Classics in Applied Mathematics. Society for Industrial and Applied Mathematics, 2nd edition, 1997.
3. P. Buchholz and P. Kemper. Efficient Computation and Representation of Large Reachability Sets for Composed Automata. Discrete Event Dynamic Systems, 12(3):265–286, 2002.


4. G. Ciardo and A. S. Miner. A Data Structure for the Efficient Kronecker Solution of GSPNs. In Proc. 8th Int. Workshop on Petri Nets and Performance Models (PNPM'99), pages 22–31. IEEE Comp. Soc. Press, 1999.
5. E. Clarke, M. Talupur, T. Touili, and H. Veith. Verification by Network Decomposition. In Proc. 15th CONCUR, LNCS 3170, pages 276–291. Springer, 2004.
6. E. M. Clarke, O. Grumberg, and D. A. Peled. Model Checking. MIT Press, 1999.
7. M. Davio. Kronecker Products and Shuffle Algebra. IEEE Trans. Computers, 30(2):116–125, 1981.
8. E. A. Emerson and K. S. Namjoshi. Reasoning about Rings. In Proc. of the 22nd ACM SIGPLAN-SIGACT Symp. on Principles of Prog. Lang., POPL '95, pages 85–94, New York, NY, USA, 1995. ACM.
9. M. K. Ganai and A. Gupta. Efficient Modeling of Concurrent Systems in BMC. In Proceedings of the 15th International Workshop on Model Checking Software, SPIN '08, pages 114–133, Berlin, Heidelberg, 2008. Springer-Verlag.
10. A. Graham. Kronecker Products and Matrix Calculus with Applications. Ellis Horwood Ltd., New York, 1981.
11. A. Hurwitz. Zur Invariantentheorie. Math. Annalen, 45:381–404, 1894.
12. W. Imrich, S. Klavžar, and D. F. Rall. Topics in Graph Theory: Graphs and Their Cartesian Product. A K Peters Ltd, 2008.
13. V. Kahlon, S. Sankaranarayanan, and A. Gupta. Semantic Reduction of Thread Interleavings in Concurrent Programs. In TACAS '09: Proceedings of the 15th Int. Conf. on Tools and Algorithms for the Construction and Analysis of Systems, volume 5505, pages 124–138, Berlin, Heidelberg, 2009. Springer-Verlag.
14. J. B. Kam and J. D. Ullman. Global Data Flow Analysis and Iterative Algorithms. Journal of the ACM, 23:158–171, January 1976.
15. D. E. Knuth. Combinatorial Algorithms, volume 4A of The Art of Computer Programming. Addison-Wesley, 2011.
16. W. Kuich and A. Salomaa. Semirings, Automata, Languages. Springer-Verlag, 1986.
17. G. Küster. On the Hurwitz Product of Formal Power Series and Automata. Theor. Comput. Sci., 83(2):261–273, 1991.
18. J. Miller. Earliest Known Uses of Some of the Words of Mathematics, Rev. Aug. 1, 2011. Website, 2011. Available online at http://jeff560.tripod.com/k.html; visited on Sept. 26th 2011.
19. B. Plateau. On the Stochastic Structure of Parallelism and Synchronization Models for Distributed Algorithms. In ACM SIGMETRICS, volume 13, pages 147–154, 1985.
20. B. Plateau and K. Atif. Stochastic Automata Network for Modeling Parallel Systems. IEEE Trans. Software Eng., 17(10):1093–1108, 1991.
21. G. Ramalingam. Context-Sensitive Synchronization-Sensitive Analysis Is Undecidable. ACM Trans. Program. Lang. Syst., 22(2):416–430, 2000.
22. A. W. Roscoe, P. H. B. Gardiner, M. H. Goldsmith, J. R. Hulance, D. M. Jackson, and J. B. Scattergood. Hierarchical Compression for Model-Checking CSP or How to Check 10^20 Dining Philosophers for Deadlock. In Proc. of the First Int. Workshop on Tools and Algorithms for Construction and Analysis of Systems (TACAS), pages 133–152, London, UK, 1995. Springer-Verlag.
23. B. G. Ryder and M. C. Paull. Elimination Algorithms for Data Flow Analysis. ACM Comput. Surv., 18(3):277–316, 1986.
24. B. G. Ryder and M. C. Paull. Incremental Data-Flow Analysis. ACM Trans. Program. Lang. Syst., 10(1):1–50, 1988.


25. V. C. Sreedhar, G. R. Gao, and Y.-F. Lee. A New Framework for Elimination-Based Data Flow Analysis Using DJ Graphs. ACM Trans. Program. Lang. Syst., 20(2):388–435, 1998.
26. R. E. Tarjan. A Unified Approach to Path Problems. Journal of the ACM, 28(3):577–593, 1981.
27. J. G. Zehfuss. Ueber eine gewisse Determinante. Zeitschrift für Mathematik und Physik, 3:298–301, 1858.
