Shape Analysis for Fine-Grained Concurrency using Thread Quantification Josh Berdine Microsoft Research Joint work with: Tal Lev-Ami, Roman Manevich, Mooly

Shape Analysis for Fine-Grained Concurrency using Thread Quantification

Josh Berdine, Microsoft Research

Joint work with: Tal Lev-Ami, Roman Manevich, Mooly Sagiv (Tel Aviv), Ganesan Ramalingam (MSR India)


Non-blocking stack [Treiber,‘86]

void push(Stack *S, data_type v) {
[1]    Node *x = alloc(sizeof(Node));
[2]    x->d = v;
[3]    do {
[4]      Node *t = S->Top;
[5]      x->n = t;
[6]    } while (!CAS(&S->Top,t,x));
[7]  }

data_type pop(Stack *S) {
[8]    do {
[9]      Node *t = S->Top;
[10]     if (t == NULL)
[11]       return EMPTY;
[12]     Node *s = t->n;
[13]     data_type r = t->d;
[14]   } while (!CAS(&S->Top,t,s));
[15]   return r;
[16] }

Meaning of CAS(&S->Top,t,x), executed atomically:
if (S->Top == t) { S->Top = x; evaluate to true; } else evaluate to false;

Challenges and questions:
• benign data races
• unbounded number of threads
• t points to valid memory?
• list remains acyclic?
• Stack linearizable?
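The slide code is pseudocode: alloc, CAS, data_type and EMPTY are left undefined, and t, s and r are declared inside the loop body but used after it. A minimal compilable sketch using C11 atomics could look as follows. The int payload, the EMPTY sentinel value, the stack_init helper and the CAS wrapper are assumptions made for this illustration, not part of the talk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

typedef int data_type;      /* assumption: the slide leaves data_type abstract */
#define EMPTY (-1)          /* assumption: sentinel returned by pop on an empty stack */

typedef struct Node { data_type d; struct Node *n; } Node;
typedef struct Stack { _Atomic(Node *) Top; } Stack;

/* Hypothetical helper: start with an empty stack. */
void stack_init(Stack *S) { atomic_init(&S->Top, NULL); }

/* CAS as described on the slide: if *addr == expected, store desired and
   evaluate to true; otherwise evaluate to false. */
static bool CAS(_Atomic(Node *) *addr, Node *expected, Node *desired) {
    return atomic_compare_exchange_strong(addr, &expected, desired);
}

void push(Stack *S, data_type v) {
    Node *x = malloc(sizeof(Node));
    if (x == NULL) abort();
    x->d = v;
    Node *t;                          /* hoisted so the loop condition can see it */
    do {
        t = atomic_load(&S->Top);
        x->n = t;
    } while (!CAS(&S->Top, t, x));
}

data_type pop(Stack *S) {
    Node *t, *s;
    data_type r;
    do {
        t = atomic_load(&S->Top);
        if (t == NULL)
            return EMPTY;
        s = t->n;
        r = t->d;
    } while (!CAS(&S->Top, t, s));
    /* The popped node is deliberately not freed: safe memory reclamation
       (and the related ABA problem) is outside what the slide shows. */
    return r;
}
```

Called from a single thread, the stack behaves like an ordinary LIFO stack; the interesting questions on the slide only arise when push and pop run concurrently.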

Linearizability [Herlihy and Wing, TOPLAS'90]

• Linearizable data structure
– Concurrent operations allowed to be interleaved
– Operations appear to execute atomically
• External observer gets the illusion that each operation takes effect instantaneously at some point between its invocation and its response
• Order of operations of same thread preserved
– Sequential specification defines legal sequential executions

[Figure: Concurrent LIFO stack. A time axis with threads T1 and T2 and the operations push(4), push(7) and pop():4, shown once as a concurrent execution and once as a sequential execution obeying Last In First Out.]
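The role of the sequential specification can be sketched in code: given a candidate sequential ordering of completed operations, replay it on an ordinary stack and check that every pop returns the expected value. The Op type and the is_legal_lifo function are hypothetical names introduced here, and the sketch only checks legality of one candidate order; a full linearizability check would also require that the order respects the real-time order of non-overlapping operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_PUSH, OP_POP } OpKind;

/* One completed operation: push(value), or pop() that returned value. */
typedef struct { OpKind kind; int value; } Op;

#define CAPACITY 64   /* assumption: bounded history, enough for a sketch */

/* Replays ops in the given order on a sequential stack. Returns true iff the
   order is a legal sequential LIFO execution. Pops on an empty stack are
   treated as illegal here because EMPTY results are not modeled. */
bool is_legal_lifo(const Op *ops, size_t n) {
    int stack[CAPACITY];
    size_t depth = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].kind == OP_PUSH) {
            if (depth == CAPACITY) return false;
            stack[depth++] = ops[i].value;
        } else {
            if (depth == 0 || stack[--depth] != ops[i].value) return false;
        }
    }
    return true;
}
```

For the three operations in the figure, the order push(7), push(4), pop():4 is legal, while push(4), push(7), pop():4 is not, since a sequential stack would return 7.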

Non-linearizable pairs stack

void push2(Stack *S, data_type v1, data_type v2) {
  push(S, v1);
  push(S, v2);
}

void pop2(Stack *S, data_type *v1, data_type *v2) {
  *v2 = pop(S);
  *v1 = pop(S);
}

[Figure: time axis with push2(4,5), push2(7,8) and pop2():8,5 as a concurrent execution; the corresponding sequential ordering is marked as an illegal sequential execution.]
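To see why pop2() can return 8,5, it helps to replay one interleaving by hand. The sketch below uses a plain array-backed sequential stack in place of the concurrent one and fixes one schedule explicitly. The assignment of push2(4,5) to T1 and push2(7,8) to T2, and the helper names interleaved_run and sequential_run, are assumptions made for illustration.

```c
#include <assert.h>

typedef int data_type;
typedef struct { data_type items[16]; int size; } Stack;

void push(Stack *S, data_type v) { assert(S->size < 16); S->items[S->size++] = v; }
data_type pop(Stack *S) { assert(S->size > 0); return S->items[--S->size]; }

void push2(Stack *S, data_type v1, data_type v2) { push(S, v1); push(S, v2); }
void pop2(Stack *S, data_type *v1, data_type *v2) { *v2 = pop(S); *v1 = pop(S); }

/* One interleaving of the two push2 calls: each individual push is atomic,
   but the pair of pushes inside push2 is not. */
void interleaved_run(data_type *v1, data_type *v2) {
    Stack S = { .size = 0 };
    push(&S, 4);   /* T1: first push of push2(4,5)  */
    push(&S, 7);   /* T2: first push of push2(7,8)  */
    push(&S, 8);   /* T2: second push of push2(7,8) */
    push(&S, 5);   /* T1: second push of push2(4,5) */
    pop2(&S, v1, v2);
}

/* The two sequential orders of the push2 calls, followed by pop2. */
void sequential_run(int t1_first, data_type *v1, data_type *v2) {
    Stack S = { .size = 0 };
    if (t1_first) { push2(&S, 4, 5); push2(&S, 7, 8); }
    else          { push2(&S, 7, 8); push2(&S, 4, 5); }
    pop2(&S, v1, v2);
}
```

The interleaved run yields the pair 8,5, whereas the two sequential runs yield 7,8 and 4,5. No sequential execution of the pair operations produces 8,5, which is why the pairs stack is not linearizable even though the underlying stack is.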


Outline

• Motivation + what is linearizability
• Universally quantified shape abstractions
• Checking linearizability
• Case studies

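Concurrent heaps [Yahav, POPL’01]

• Heaps contain both threads and objects

[Figure: a concurrent heap with two thread objects (pc=6 and pc=2), their thread-local variables x and t, list objects linked by the list field n, and the global variable Top. Legend: thread object with program counter, thread-local variable, list field, list object, global variable.]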

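Concurrent heaps [Yahav, POPL’01]

• Heaps contain both threads and objects
– Logical structure, or
– Formula in subset of FOTC [Yorsh et al., TOCL‘07]

$pc(tr_1)=6 \land pc(tr_2)=2 \land \exists v_1,v_2,v_3.\ Top(v_1) \land x(tr_1,v_2) \land t(tr_1,v_1) \land x(tr_2,v_3) \land n(v_2,v_1) \land \ldots$

[Figure: the same concurrent heap with its elements named: threads tr1 (pc=6) and tr2 (pc=2), list objects v1, v2, v3, local variables x and t, list field n, and global variable Top.]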

Unbounded concurrent heaps

Unbounded parallel composition: push(Top,?) || ... || push(Top,?)

(push code as on the non-blocking stack slide, lines [1] to [7])

[Figure: a concurrent heap with many threads all executing push, at pc=1, pc=2, pc=5 and pc=6, each with its own local variables x and t pointing into nodes linked by n, and the global variable Top.]

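Thread-relative subheaps

• Each subheap
– Presents a view of heap relative to one thread
– Can be instantiated ≥0 times

[Figure: four subheaps, one each for a thread at pc=1, pc=2, pc=5 and pc=6, each showing that thread's local variables (x, t) together with Top and the list linked by n.]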

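Bounded thread-relative subheaps

• Each subheap
– Presents a view of heap relative to one thread
– Can be instantiated ≥0 times
– Bounded by finitary abstraction

[Figure: four subheaps, one each for a thread at pc=1, pc=2, pc=4 and pc=6, each showing that thread's local variables (x, t) together with Top and an abstracted list linked by n.]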

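Concurrent heap

$pc(tr_1)=6 \land pc(tr_2)=2 \land \exists v_1,v_2,v_3.\ Top(v_1) \land x(tr_1,v_2) \land t(tr_1,v_1) \land x(tr_2,v_3) \land n(v_2,v_1) \land \ldots$

[Figure: the concurrent heap described by the formula: threads tr1 (pc=6) and tr2 (pc=2), list objects v1, v2, v3, local variables x and t, list field n, and global variable Top.]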

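Universally quantified local heaps

$\forall t.\ \big(pc(t)=6 \land \exists v_1,v_2.\ Top(v_1) \land x(t,v_2) \land t(t,v_1) \land n(v_2,v_1) \land \ldots\big) \lor \big(pc(t)=2 \land \exists v_1,v_3.\ Top(v_1) \land x(t,v_3) \land \ldots\big)$

[Figure: two local heaps, each relative to a symbolic thread t. In the first, t is at pc=6 with x pointing to v2, t pointing to v1, n from v2 to v1, and Top at v1. In the second, t is at pc=2 with x pointing to v3 and Top at v1.]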

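Meaning of quantified invariant

$\forall t.\ \big(pc(t)=6 \land \exists v_1,v_2.\ Top(v_1) \land x(t,v_2) \land t(t,v_1) \land n(v_2,v_1) \land \ldots\big) \lor \big(pc(t)=2 \land \exists v_1,v_3.\ Top(v_1) \land x(t,v_3) \land \ldots\big)$

Information maintained
• (dis)equalities between local variables of each thread and global variables
• Objects reachable from global variables

Information lost
• (dis)equalities between local variables of different threads
• Number of threads

[Figure: example concurrent heaps with threads at pc=1, pc=2, pc=3 and pc=6, their local variables x and t, the global variable Top, and multiplicities ×m and n× on groups of threads.]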
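The maintained and lost information can be illustrated with a deliberately simplified abstraction: map each thread to a small "view" recording its pc and a few (dis)equalities between its own locals and Top, then keep only the set of views that occur. This is a sketch under strong assumptions; the talk's abstraction is a shape abstraction of thread-relative subheaps, not this bit encoding, and the names Thread, view_of, abstract_state and same_abstraction are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct Node { struct Node *n; } Node;

/* Concrete thread state: program counter plus the locals x and t of push. */
typedef struct { int pc; Node *x; Node *t; } Thread;

enum { MAX_PC = 8, NUM_VIEWS = MAX_PC * 4 };

/* One thread's view: its pc and two facts relating its own locals to Top.
   Nothing about other threads' locals is recorded. */
static int view_of(const Node *top, const Thread *th) {
    int t_is_top  = (th->t == top);
    int xn_is_top = (th->x != NULL && th->x->n == top);
    return th->pc * 4 + t_is_top * 2 + xn_is_top;
}

/* Abstract a concrete state to the set of views that occur in it. Because
   the result is a set, how many threads share a view is forgotten. */
void abstract_state(const Node *top, const Thread *threads, size_t n,
                    bool views[NUM_VIEWS]) {
    memset(views, 0, NUM_VIEWS * sizeof views[0]);
    for (size_t i = 0; i < n; i++) {
        assert(threads[i].pc >= 0 && threads[i].pc < MAX_PC);
        views[view_of(top, &threads[i])] = true;
    }
}

bool same_abstraction(const bool a[NUM_VIEWS], const bool b[NUM_VIEWS]) {
    return memcmp(a, b, NUM_VIEWS * sizeof a[0]) == 0;
}
```

Under this encoding, a state with one thread at pc=6 and one at pc=2 has the same abstraction as a state with two threads at pc=6 and three at pc=2, which mirrors the "number of threads" entry under information lost.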






Verification of fixed linearization points [Amit et al., CAV’07]

• Compare each concurrent execution to a specific sequential execution
• Show that every (terminating) concurrent operation returns the same result as its sequential counterpart

[Figure: a concurrent execution and a sequential execution of an operation, each with its linearization point marked, with their results compared; the two are then combined into a single conjoined execution whose results are compared.]
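The idea of a conjoined execution can be sketched as instrumentation: the implementation runs as usual, the sequential reference operation is run at the linearization point, and the two results are compared when the operation returns. The code below is a single-threaded simulation under stated assumptions, so CAS is an ordinary function and is atomic only trivially. The SeqStack type, the conjoined_push and conjoined_pop names, and the choice of linearization points for pop are assumptions; the slides only annotate the CAS in push.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef int data_type;
#define EMPTY (-1)   /* assumption: sentinel for an empty stack */

typedef struct Node { data_type d; struct Node *n; } Node;
typedef struct { Node *Top; } Stack;

/* Sequential reference stack: the specification side of the comparison. */
typedef struct { data_type items[64]; int size; } SeqStack;

static void seq_push(SeqStack *Q, data_type v) {
    assert(Q->size < 64);
    Q->items[Q->size++] = v;
}
static data_type seq_pop(SeqStack *Q) {
    return Q->size == 0 ? EMPTY : Q->items[--Q->size];
}

/* CAS with the semantics given on the slide (single-threaded simulation). */
static bool CAS(Node **addr, Node *expected, Node *desired) {
    if (*addr == expected) { *addr = desired; return true; }
    return false;
}

void conjoined_push(Stack *S, SeqStack *Q, data_type v) {
    Node *x = malloc(sizeof(Node));
    if (x == NULL) abort();
    x->d = v;
    bool ok;
    do {
        Node *t = S->Top;
        x->n = t;
        ok = CAS(&S->Top, t, x);
        if (ok) seq_push(Q, v);      /* linearization point: run sequential push */
    } while (!ok);
}

/* Stores the implementation's result in *out and returns true iff it agrees
   with the result of the sequential pop run at the linearization point. */
bool conjoined_pop(Stack *S, SeqStack *Q, data_type *out) {
    for (;;) {
        Node *t = S->Top;
        if (t == NULL) {             /* assumed linearization point for EMPTY */
            *out = EMPTY;
            return seq_pop(Q) == EMPTY;
        }
        Node *s = t->n;
        data_type r = t->d;
        if (CAS(&S->Top, t, s)) {    /* assumed linearization point: successful CAS */
            data_type expected = seq_pop(Q);
            free(t);                 /* safe only because nothing runs concurrently */
            *out = r;
            return r == expected;
        }
    }
}
```

In a genuinely concurrent run the CAS and the sequential step would have to happen as one atomic step; in the analysis this is part of the instrumented semantics rather than something the program does at run time.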

Conjoined execution for push

void push(Stack *S, data_type v) {
[1]    Node *x = alloc(sizeof(Node));
[2]    x->d = v;
[3]    do {
[4]      Node *t = S->Top;
[5]      x->n = t;
[6]    } while (!CAS(&S->Top,t,x)); // @LINEARIZE on CAS
[7]  }

[Figure: initial state with a thread at pc=1. The concurrent state and the sequential view each have their own Top, and the two are connected by an isomorphism relation.]

(push code as on the previous slide, with @LINEARIZE on the CAS at [6])

[Figure: the conjoined state at pc=1, with both the concurrent and the sequential Top. Labels: conjoined state, duo-object.]

[Figure: conjoined states at pc=1 and at pc=2, where x points to a newly allocated delta object.]

delta object: tracks differences between concurrent and sequential execution per thread
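[Figure: sequence of conjoined states as one thread executes push: pc=1, pc=2 (x set), …, pc=5 (x and t set), pc=6 (x and t set, n field of x set), pc=7 (after the CAS).]

Meaning of the CAS at [6], executed atomically:
if (S->Top == t) { S->Top = x; evaluate to true; } else evaluate to false;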

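Run operation sequentially

[Figure: sequence of conjoined states, all at pc=7, in which the sequential push is executed step by step (x set, t set, n set, sequential Top updated) until the concurrent and sequential stacks are related by ≈.]

Check results: concurrent and sequential stacks are correlated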

Observations used

• Unbounded number of heap objects
– Number of delta objects created per thread is bounded
– Objects in recursive data structures bounded by existing shape abstractions
• Delta objects always referenced by local or global variables
– Captured by single thread’s view of heap
• Threads mutate data structures “near” global access points
– Can precisely model success/failure of CAS without looking deep into heap
• Losing most inter-thread correlations is ok
– Fine-grained programs must protect themselves from interference





Case studies

Verified Programs                                     #states    time (sec.)
Non-blocking stack [Treiber 1986]                         764              7
Two-lock queue [Michael & Scott, PODC 1996]             3,415             17
Non-blocking queue [Doherty & Groves, FORTE 2004]      10,333            252

Related work

• [Gotsman et al., PLDI’07]
– Thread-modular shape analysis for coarse-grained concurrency
• [Vafeiadis et al.,’06,’07,’08]
– Linearizability for an unbounded number of threads with rely-guarantee & separation logic

Conclusion

• Strengths
– Parametric shape abstraction for an unbounded number of threads
– Verifies linearizability of fine-grained concurrent implementations
– Tunable scalability
  • via thread-modular aspects
– Tunable precision
  • via abstract semantics using multiple instantiations of invariants
• Limitations / Future work
– Fixed, specified, linearization points
– Setting the framework’s “knobs” optimally can be difficult, and requires understanding the program
– Only as good as underlying heap abstraction
– Does not prove encapsulation of data structure
– May want to prove more than linearizability
