On Sequentializing Concurrent Programs Gennaro Parlato University of Southampton, UK UPMARC 7 th Summer School on Multicore Computing, June 8-10, 2015

On Sequentializing Concurrent Programs

Gennaro ParlatoUniversity of Southampton, UK

UPMARC 7th Summer School on Multicore Computing, June 8-10, 2015

Concurrency for better performace

Clock rates are stalling, but Moore’s Law is still alive…• no longer feasible to increase the speed of individual

processors• additional gates turned into (caches and) multiple cores

⇒ For better performance programs must be concurrent!

⇒ Concurrency has become an important aspect for many areas of computer science:

algorithms, data structures, programming languages, software engineering, testing, verification, …

Writing concurrent programs is DIFFICULT

Programmers have to guarantee• correctness of sequential execution

of each individual thread• under nondeterministic interferences

from other threads (schedules)

Rare schedules result in errors that are difficult to find, reproduce, and repair• developers / testers can spend weeks chasing a single bug

⇒ huge productivity problem

communication mechanism

…T2 TN

T2

threads

Testing

Testing remains the most used (and often the only known) paradigm in industry...

… but is ineffective for concurrent programs:• large number of schedules makes scaling-up difficult• non-deterministic nature of scheduling makes repeatability

difficult

⇒ needs to be complemented by automated analyses that handle schedules symbolically

Verification approach

develop practical but theoretically well-founded symbolic verification techniques based on the idea of

Sequentialization

Sequentialization

Sequentialization: motivations

Building verification tools for full-fledged concurrent languages is difficult and expensive...

… but scalable verification techniques exist for sequential languages

• Abstraction• SAT/SMT techniques (i.e., bounded model checking)• …

⇒ Can we leverage these?

Sequentialization as a code-to-code translation

Code-to-code translation from multithreaded recursive programs to sequential programs that preserves reachability

Conc.program

“equivalent”

Sequential program with non determinism

shared variables

…T2 TNT1

Use existing automatic verification techniques designed for sequential programs to analyze concurrent programs

From concurrent to sequential

Always possible but can be inefficient•simulate the global behavior (track all locals of each thread)•current techniques do not work

What do we want?•avoid the extreme blow-up•track at any point only the locals of one thread

What we want is not always possible

But it is possible if we restrict behaviors•Bug-finding

Sequentialization: advantages

Keep focus on the concurrency aspects of programs, delegate sequential reasoning to an existing analysis tool

•code-to-code translation is much easier to implement than a full-fledged analysis tool

•simplifies experimentation with different approaches

•can be designed to target multiple backends for sequential program analysis

Concurrent (shared-memory) Programs

Formed of sequential programs T1 , … , TN

(each possibly with recursive function calls)

shared variables

…T2 TN

T1

threads

• each program Ti can read and write shared vars

• we assume sequential consistency (SC) (writes are immediately visible to all the other programs)

• an execution is an interleaving of the executions of each Ti

Anatomy of an execution

( l, s1 )

( l, s2 )

( l1’,s1 )

( l2’,s2 )

T1 T2

Keep It Simple and Sequential Sequentialization

A first sequentialization: KISS

KISS: Keep It Simple and Sequential [Quadeer-Wu, PLDI’04]

Under-approximation (subset of interleavings)

Thread creation function call•at context-switches either:

- the active thread is terminated or- a not yet scheduled thread is started (by calling its main function)

•when a thread is terminated either:- the thread that has called it is resumed (if any) or- a not yet scheduled thread is started

KISS schedules

(l1,s1)

T1

(l1,s3)

T2(l2,s1)

T3

(l3,s2)

(l4,s2)

(l5,s3)

Scheduling 1:1. Start T1

2. Start T2

3. Terminate T2

4. start T3

5. terminate T3

6. Resume T1

T1 T2 T3

Scheduling 2:1. start T1

2. start T2

3. start T3 4. terminate T3

5. resume T2

6. terminate T2

7. resume T1

T1 T2 T3

Scheduling 3:1. start T1

2. start T2

3. terminate T2

4. resume T1

5. start T3

6. terminate T3

7. resume T1

More on KISS

Allows dynamic thread creation in form of asynchronous calls

Bounds the number of threads that have been created but not started yet

-scheduler nondeterministically starts a thread from this set, or -resumes the last suspended thread (if any)

State space: no cross product

Context-switches:-does allow an unbounded number of context-switches-does not allow a bounded number context-switches between any two threads (for more than 1 interaction)

Bounded Context-Switching (CS) is essential

Switching between threads is allowed only a bounded number of times [Qadeer-Rehof, TACAS’05]

Systematic bounded CS is useful for bug hunting:

most concurrency related bugs manifest themselves within few CS [Musuvathi-Qadeer, PLDI’07]

Efficient sequentializations for bounded CS•Eager approach [Lal-Reps, CAV’08]

•Lazy approach [La Torre-Madhusudan-Parlato, CAV’09]

LR Sequentialization[ Lal-Reps, CAV’08 ]

LR sequentialization:

Bounded Round-Robin schedules

T1 TNTN-1T2

…

round 1

round 2

round k

round 3

… …

Bounded Round- Robin captures bounded context-switches

Schedule: T2 T3 T4 T1 T3 T2 T1; minimal number of rounds ???

LR sequentialization: simulation

Sequential program (k-rounds)

1. Guess a2, …, ak

2. Execute T1 to completion

Computes

- local states l1, .., lk, and

- global states b1, …, bk

(l1,b1)

T1

(l1,a2)

(l2,b2)

(l2,a3)

(l3,b3)

T2 T3



1. Guess a2,…,ak


3. Pass b1,…,bk to T2

(l1,b1)

T1

(l1,a2)

(l2,b2)

(l2,a3)

(l3,b3)

T2 T3

b1

b2

b3



1. Guess a2,…,ak



We can dismiss locals of T1

T1

a2

a3

T2 T3

b1

b2

b3



• Guess a2,…,ak

• Execute T1 to completion

• Pass b1,…,bk to T2


1. Dismiss locals of T2

2. Dismiss b1,…,bk

T1

a2

a3

T2 T3

b1

b2

b3

(l1’,c1)

c3

(l1’,b2)

(l0’,b1)

(l2’,c2)

(l2’,b3)



• Guess a2,…,ak


• Pass b1,…,bk to T2


T1

a2

a3

T2 T3

b1

b2

b3

c1

c2

c3



1. Guess a2,…,ak




5. Pass c1,…,ck to T3


1. Dismiss locals of T3

2. Dismiss c1,…,ck

T1

a2

a3

T2 T3

b1

b2

b3

c1

c2

c3

d3

(l0’’, c1)

(l1’,d1)(l1’,c2)

(l1’,d2)(l1’,c3)



1. Guess a2,…,ak






T1

a2

a3

T2 T3

b1

b2

b3

c1

c2

c3

d1

d2

d3



1. Guess a2,…,ak






7. Computation iff di = ai+1

i [1,k-1]

T1

a2

a3

T2 T3

b1

b2

b3

c1

c2

c3

d1

d2

d3



1. Guess a2,…,ak






7. Computation iff di = ai+1

i [1,k-1]

1. Report an error if a bug occurs during the simulation

T1

a2

a3

T2 T3

b1

b2

b3

c1

c2

c3

d1

d2

d3

LR seq. as a code-to-code translation for C programs + PthreadCSeq website: users.ecs.soton.ac.uk/gp4/cseq/

[Fischer-Inverso-Parlato, ASE’13]

…T1 T2

TN

…F1() F2() FN() main()

Concurrent program

“equivalent”Sequential program with non determinism

Sequentialization(code-to-code translation)

Simulation functions

translates

…

translates

translates

LR sequentialization as a code-to-code translation


• considers round-robin scheduleswith k rounds- thread → function, run to completion

• global memory copy for each round - scalar → array

• context switch → round counter++• first thread starts with

nondeterministic memory contents- other threads continue with content left by predecessor

T1 T2

S2,2

S0,2

S1,2

SK-1,2

TN

...

... ...

...

...

...

S2,1

S0,1

S1,1

SK-1,1

S2,N

S0,N

S1,N

SK-1,N


• considers round-robin scheduleswith k rounds- thread → function, run to completion

• global memory copy for each round - scalar → array

• context switch → round counter++• first thread starts with

nondeterministic memory contents- other threads continue with content left by predecessor

• checker prunes away inconsistent simulations

- assume(Si+1,0 == Si,N);

- requires second set of memory copies- errors can only be checked at end of simulation

• requires explicit error checks

T1 T2

S2,2

S0,2

S1,2

Sk-1,2

TN

...

... ...

...

...

...

S2,1

S0,1

S1,1

Sk-1,1

S2,N

S0,N

S1,N

Sk-1,N


//shared varstypeg1 g1; typeg2 g2; …

//thread functionst(){ typex1 x1; typex2 x2; … stmt1 ; stmt2 ; …} …

main(){ …}

//shared varstypeg1 g1[K]; typeg2 g2[K]; …uint round=1; bool ret=0; //aux vars

// context-switch simulationcs() { unsigned int j; j= nondet(); assume(round +j < K); round+=j; if (round==K-1 && nondet()) ret=1;}

//thread functionst(){ typex1 x1; typex2 x2; … cs(); if ret return; stmt1[round]; cs(); if ret return; stmt2[round]; …} …

main_thread(){ …}

main(){ … } //next slide


main(){ typeg1 _g1[K]; typeg2 _g2[K]; … // first thread starts with non-deterministic memory contents for (i=1;i++;i<K){ _g1[i] = g1[i] = nondet(); _g2[i] = g2[i] = nondet(); … } // thread simulations t[0] = main_thread; born[0] = ACTIVE; for (i=0;i++;i<N){ if(born[i]>NO_ACTIVE){ ret=0; round = born[i]; t[i](); } } // consistency check for (i=0;i++;i<K-1){ assume(_g1[i+1] == g1[i]); assume(_g2[i+1] == g2[i]); … } // error detection assert(err ==0); }

Implementations of variants of LR schema (SMT-based)

Corral (SMT-based analysis for Boogie programs)

– [ Lal–Qadeer–Lahiri, CAV’12 ]– [ Lal–Qadeer, FSE’14 ]

CSeq (code-to-code translation for C + PThread) – [ Fischer–Inverso–Parlato, ASE’13 ]

Rek (for Real-time Embedded Software Systems)

– [ Chaki–Gurfinkel–Strichman, FMCAD’11 ]

Storm: implementation for C programs – [ Lahiri–Qadeer–Rakamaric, CAV’09 ]– [Rakamaric, ICSE’10]

•General eager translation representing thread interactions using bounded DAGs [Bouajjani-Emmi-Parlato, SAS’11]

Lazy Sequentialization(Lazy Approach)

[ La Torre-Madhusudan-Parlato, CAV’09 ]

LR sequentialization is “eager”

T1

a2

T2 T3

b1

a3

b2

a2 is guesseda2 may be unreachable

EAGER

[Lal, Reps CAV’08]

LR sequentialization does not preserve assertions

void thread1() { while (blocked); x = x/y; if (x%2==1) ERROR; }

void thread2() { x=12; y=2;

//unblock thread2 blocked=false;}

// shared variablesbool blocked=true;int x=0, y=0;

Inv: y != 0

blocked=true

blocked=false

guess x=13

y=0

A lazy transformation is desirable

A lazy sequential program explores only reachable states of the concurrent program

Why is it desirable? • In model-checking it can drastically reduce the explored state-

space• Better invariants for deductive verification / abstract interpretation

We now illustrate a lazy transformation:

[La Torre-Madhusudan-Parlato, CAV’09]

Lazy transformation: main idea

Execute T1

Context-switch:

store s1 and abort

Execute T2 from s1

store s2 and abort

(l1,s1)

(l’1,s1)

(l’2,s2)

T1

(l0,s0)

T2

store s1

& abort store s2

& abort


Re-execute T1 till it reaches s1

May reach a new local state!

Anyway it is correct !!

Restart from global s2 and compute s3

(l1,s1)

(l’1,s1)

(l’2,s2)

T1

(l0,s0)

T2

store s1

& abort store s2

& abort

(l’’1,s1)

store s3

& abort

(l’’1,s2)


Switch to T2

Execute till it reaches s2

Continue computation from global s3

(l1,s1)

(l’1,s1)

(l’2,s2)

T1

(l0,s0)

T2

store s1

& abort store s2

& abort

(l’’1,s1)

store s3

& abort

(l’’’1,s2)

(l’’1,s2) (l’’’1,s3)


T1 T2

store s1

store s2

store s3

store s4

store s5

end

s1

s2

s3s4

s1

s2

s3

s4

s5

Lazy transformation: features

• Explores only reachable states

• Preserves invariants across the translation

• Tracks local state of one thread at any time

• Tracks values of shared variables at context switches

(s1, s2, …, sk)

• Requires recomputation of local states

…T1 T2

TN

…F1() F2() FN() main()

Concurrent program

“equivalent”Sequential program with non determinism

Sequentialization(code-to-code translation)

Simulation functions

translates

…

translates

translates

Lazy sequentialization as a code-to-code translation

Guess scheduling Orchestrate calls to threads (Fi)

Nondet jump to next context where this thread is active At last context-switch, store shared state, abort, and return to main

Lazy translation for Concurrent Boolean programs

• Concurrent Boolean programs Boolean programs

• We have implemented an eager and a lazy translator for concurrent Boolean programs- Download: http://www.cs.uiuc.edu/~madhu/getafix/cbp2bp

Experiments: Windows NT Bluetooth driver Context

switches

1-adder

1-stopper

2-adders

1-stopper

1-adder

2-stoppers

2-adders

2-stoppers

eager lazy eager lazy eager lazy eager lazy

1

2

3

4

5

6

N

N

N

N

N

N

0.1

0.3

43.3

73.6

930.0

-

0.1

0.2

1.4

5.5

20.2

66.8

N

N

N

Y

Y

Y

0.2

0.9

135.9

1601.0

-

-

0.1

0.8

6.3

2.6

18.0

122.9

N

N

Y

Y

Y

Y

0.1

0.7

70.1

597.2

-

-

0.1

0.9

0.4

2.9

14.0

66.1

N

N

Y

Y

Y

Y

0.2

1.6

177.6

out of mem.

out of mem.

out of mem.

0.1

2.0

0.8

7.5

66.5

535.9

• Backend sequential analysis: GetAFix [La Torre-Madhusudan-Parlato, PLDI’09]- BDD-based analysis: it stores summaries recomputations => no multiple thread explorations

• Lazy outperforms Eager

http://www.cs.uiuc.edu/~madhu/getafix/cbp2bp

More Sequentializations


• Eager and Lazy sequentializations can be extended to parameterized programs: [La Torre-Madhusudan-Parlato, CAV’10, FIT’12]

T1 T2Tm

in1

in2

in3

out1

out2

out3

Correctness of abstractions of several Linux device drivers



• Delay-bounded scheduling [Emmi-Qadeer-Rakamaric, POPL’11]

- Programs with asynchronous calls (creating tasks) - Each task is executed to completion (no interleaving with other tasks)- Sequentialization is according to a DFS scheduler of tasks- When dispatched, a task can be delayed to next round

– the total number of delays in a task-creation tree is bounded by k– total number of explored rounds is k+1

– The beginning of each round is guessed (eager)




• General sequentialization [Bouajjani-Emmi-Parlato, SAS’11]

- Programs with asynchronous calls

- Tasks can be interleaved with other ones

- Sequentialization based on– DAGs of contexts – Composition and compression operations

– Bound on the size of the DAGs

– Generalizes k-rounds Eager e delay bounded-scheduling sequentialization

aa

cc

dd

bb

ee

More Sequentializations



• General sequentialization [Bouajjani-Emmi-Parlato, SAS’11]

• Scope-bounded sequentialization [La Torre-Napoli-Parlato, FSTTCS’12]

• Budget-bounded sequentialization [Abdulla-Atig- Rezine-Stenman, FMCAD’12]

• Sequentialization for proving correctness [Garg-Madhusudan, TACAS’11]

• Bounded-phase sequentialization of message passing programs [Bouajjani-Emmi,

TACAS’12]

• Eager sequentialization for periodic programs[Chaki-Gurfinkel-Sinha, FMCAD’14], [Chaki-Gurfinkel-Strichman, FMCAD’13]

• …

Conclusions

Conclusions

Sequentialization is an effective approach to analyze concurrent programs–Fast prototyping–Re-use of mature technologies (tools designed for sequential programs)–Code-to-code translation

Presented translations:–keep track only of the local state of the current thread (no cross product)–Keep track of a finite number of copies of shared states –thread creation is implemented with calls

• KISS is a lazy sequentialization that allows an unbounded number of context-switches, but does not allow a bounded number context-switches between any two threads

•Bounded context-switches:- Eager translations require guessing of values of the shared variables and may

explore unreachable states- Lazy translations preserve the invariants and introduces many recursive calls

(thread re-computations)

Tomorrow’s lecture

Sequentializations for Bounded Model Checking backends:• Lazy-CSeq [ Inverso-Tomasco-Fischer-La Torre-Parlato, CAV’14 ]

• MU-Cseq [ Tomasco-Inverso-Fischer-La Torre-Parlato, TACAS’15 ]

framework for developing sequentialization of

concurrent C programs :

users.ecs.soton.ac.uk/gp4/cseq/

Documents

On Sequentializing Concurrent Programs Gennaro Parlato University of Southampton, UK UPMARC 7 th Summer School on Multicore Computing, June 8-10, 2015