Three papers
1. A Shape Analysis for Optimizing Parallel Graph Programs [POPL’11]
2. Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA’12]
3. Parameterized Verification of Transactional Memories [PLDI’10]
What’s the connection?
1. A Shape Analysis for Optimizing Parallel Graph Programs [POPL’11]
2. Elixir: a System for Synthesizing Concurrent Graph Programs [OOPSLA’12]
3. Parameterized Verification of Transactional Memories [PLDI’10]
• From analysis to language design
• Creates opportunities for more optimizations; requires other analyses
• Similarities between abstract domains
A Shape Analysis for Optimizing Parallel Graph Programs
Dimitrios Prountzos1
Keshav Pingali1,2
Roman Manevich2
Kathryn S. McKinley1
1: Department of Computer Science, The University of Texas at Austin
2: Institute for Computational Engineering and Sciences, The University of Texas at Austin
Motivation
• Graph algorithms are ubiquitous: computational biology, social networks, computer graphics
• Goal: compiler analysis for optimization of parallel graph algorithms
Boruvka’s Minimum Spanning Tree Algorithm
Build MST bottom-up
repeat {
  pick arbitrary node ‘a’
  merge with lightest neighbor ‘lt’
  add edge ‘a-lt’ to MST
} until graph is a single node
[Figure: example graph with weighted edges; node a merges with its lightest neighbor lt, producing a merged node a,c]
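The pseudocode above can be turned into a small runnable sequential sketch. This is our own illustrative encoding (adjacency dicts, function name), not the Galois implementation; it assumes a connected, symmetric graph with distinct edge weights so the greedy merge is safe.

```python
# Sequential Boruvka sketch: pick a node, merge it with its lightest
# neighbor, and accumulate the merged edge's weight into the MST.
# graph: {node: {neighbor: weight}}, symmetric and connected.

def boruvka_mst_weight(graph):
    g = {u: dict(nbrs) for u, nbrs in graph.items()}  # work on a copy
    mst_weight = 0
    while len(g) > 1:
        a = next(iter(g))                    # pick an arbitrary node 'a'
        lt = min(g[a], key=g[a].get)         # lightest neighbor 'lt'
        mst_weight += g[a][lt]               # add edge 'a-lt' to the MST
        # Merge 'lt' into 'a', keeping the lighter of any parallel edges.
        for n, w in g[lt].items():
            if n == a:
                continue
            if n in g[a]:
                g[a][n] = min(g[a][n], w)
            else:
                g[a][n] = w
            g[n][a] = g[a][n]
            del g[n][lt]
        del g[a][lt]
        del g[lt]
    return mst_weight
```

Keeping only the lighter of parallel edges after a merge mirrors what the Galois loop body does with `setEdgeData`/`addEdge`.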
Parallelism in Boruvka
[Figure: the same example graph; merges in different parts of the graph can proceed in parallel]
Non-conflicting iterations
[Figure: two non-conflicting iterations operating on disjoint parts of the graph]
Non-conflicting iterations
[Figure: result after both merges: merged nodes a,c and f,g]
Conflicting iterations
[Figure: two conflicting iterations whose neighborhoods overlap]
Optimistic parallelization in Galois
• Programming model
  – Client code has sequential semantics
  – Library of concurrent data structures
• Parallel execution model
  – Thread-level speculation (TLS)
  – Activities executed speculatively
• Conflict detection
  – Each node/edge has an associated exclusive lock
  – Graph operations acquire locks on read/written nodes/edges
  – Lock owned by another thread ⇒ conflict ⇒ iteration rolled back
  – All locks released at the end
• Two main overheads
  – Locking
  – Undo actions
Generic optimization structure
Program → Program Analyzer → Annotated Program → Program Transformer → Optimized Program
Overheads (I): locking
• Optimizations
  – Redundant locking elimination
  – Lock removal for iteration-private data
  – Lock removal for lock domination
• ACQ(P): set of definitely acquired locks per program point P
• Given method call M at P: Locks(M) ⊆ ACQ(P) ⇒ locking in M is redundant
Overheads (II): undo actions
Phases of the loop body: the lockset first grows, then stays stable; once it is stable, the remainder of the iteration is failsafe.

foreach (Node a : wl) {
  …   // lockset grows
  …   // lockset stable
  …   // failsafe
}
Program point P is failsafe if: ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
GSet<Node> wl = new GSet<Node>();
wl.addAll(g.getNodes());
GBag<Weight> mst = new GBag<Weight>();
foreach (Node a : wl) {
  Set<Node> aNghbrs = g.neighbors(a);
  Node lt = null;
  for (Node n : aNghbrs) {
    minW,lt = minWeightEdge((a,lt), (a,n));
  }
  g.removeEdge(a, lt);
  Set<Node> ltNghbrs = g.neighbors(lt);
  for (Node n : ltNghbrs) {
    Edge e = g.getEdge(lt, n);
    Weight w = g.getEdgeData(e);
    Edge an = g.getEdge(a, n);
    if (an != null) {
      Weight wan = g.getEdgeData(an);
      if (wan.compareTo(w) < 0) w = wan;
      g.setEdgeData(an, w);
    } else {
      g.addEdge(a, n, w);
    }
  }
  g.removeNode(lt);
  mst.add(minW);
  wl.add(a);
}
Lockset analysis
• Redundant locking: Locks(M) ⊆ ACQ(P)
• Undo elimination: ∀Q : Reaches(P,Q) ⇒ Locks(Q) ⊆ ACQ(P)
• Locking and undo actions are the runtime overheads to eliminate
• Need to compute ACQ(P)
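As an illustration of how ACQ(P) could be computed, here is a toy forward must-dataflow sketch (meet = intersection over incoming paths). The CFG encoding and all names are ours, not the paper's implementation.

```python
# Compute ACQ(P): the set of locks definitely held at each CFG node.
# cfg: {node: [successors]}; acquires: {node: set of locks taken at node}.
# A must-analysis: initialize to the full lock set (top) and intersect.

def compute_acq(cfg, entry, acquires):
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    top = set().union(*acquires.values()) if acquires else set()
    acq = {n: set(top) for n in nodes}
    acq[entry] = set()                       # nothing held at entry
    changed = True
    while changed:                           # chaotic iteration to fixpoint
        changed = False
        for n in cfg:
            out = acq[n] | acquires.get(n, set())
            for s in cfg[n]:
                new = acq[s] & out           # meet over incoming edges
                if new != acq[s]:
                    acq[s] = new
                    changed = True
    return acq
```

A lock acquisition of l at point P is then redundant precisely when l ∈ ACQ(P).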
The optimization technically
• Each graph method m(arg1,…,argk, flag) takes an optimization-level flag
  – flag=LOCK – acquire locks
  – flag=UNDO – log undo (backup) data
  – flag=LOCK_UNDO – (default) acquire locks and log undo data
  – flag=NONE – no extra work
• Example: Edge e = g.getEdge(lt, n, NONE)
Analysis challenges
• The usual suspects:
  – Unbounded memory ⇒ undecidability
  – Aliasing, destructive updates
• Specific challenges:
  – Complex ADTs: unstructured graphs
  – Heap objects are locked
  – Adapt abstraction to ADTs
• We use Abstract Interpretation [CC’77]
  – Balance precision and realistic performance
Shape analysis overview
• ADT specifications – Graph Spec: Graph { @rep nodes @rep edges … }; Set Spec: Set { @rep cont … }
• Concrete ADT implementations in the Galois library: HashMap-Graph, Tree-based Set, …
• Pipeline: Boruvka.java → Predicate Discovery → Shape Analysis → OptimizedBoruvka.java
ADT specification
Graph Spec:

Graph<ND,ED> {
  @rep set<Node> nodes
  @rep set<Edge> edges
  Set<Node> neighbors(Node n);
}

Boruvka.java:

...
Set<Node> S1 = g.neighbors(n);
...
Abstract ADT state by virtual set fields
@locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)
@op( nghbrs = n.rev(src).dst + n.rev(dst).src,
     ret = new Set<Node<ND>>(cont=nghbrs) )
Assumption: Implementation satisfies Spec
Graph<ND,ED> {
  @rep set<Node> nodes
  @rep set<Edge> edges

  @locks(n + n.rev(src) + n.rev(src).dst + n.rev(dst) + n.rev(dst).src)
  @op( nghbrs = n.rev(src).dst + n.rev(dst).src,
       ret = new Set<Node<ND>>(cont=nghbrs) )
  Set<Node> neighbors(Node n);
}
Modeling ADTs
[Figure: concrete graph with nodes a, b, c and edge objects carrying src/dst fields]
Modeling ADTs
[Figure: abstract state with the nodes and edges virtual sets, and the returned set object ret whose cont field points to the nghbrs nodes]
Abstraction scheme
[Figure: two set objects S1 ≠ S2 with cont fields pointing to locked objects, abstracted as (S1 ≠ S2) ∧ L(S1.cont) ∧ L(S2.cont)]
• Parameterized by a set of LockPaths: L(Path) ≝ ∀o . o ∈ Path ⇒ Locked(o)
  – Tracks a subset of must-be-locked objects
• Abstract domain elements have the form: Aliasing-configs × 2^LockPaths
Joining abstract states
• The join is taken per aliasing configuration: LockPath sets under the same configuration are intersected, e.g. joining ( L(y.nd) ) with ( L(y.nd) ∧ L(x.rev(src)) ) yields ( L(y.nd) ), while distinct aliasing configurations are kept as a disjunction with ( L(x.nd) )
• Aliasing is crucial for precision: may-be-locked does not enable our optimizations
• #Aliasing-configs: small constant (6)
Example invariant in Boruvka: the immediate neighbors of a and lt are locked:

( a ≠ lt ) ∧ L(a) ∧ L(a.rev(src)) ∧ L(a.rev(dst)) ∧ L(a.rev(src).dst) ∧ L(a.rev(dst).src) ∧
L(lt) ∧ L(lt.rev(dst)) ∧ L(lt.rev(src)) ∧ L(lt.rev(dst).src) ∧ L(lt.rev(src).dst)
Heuristics for finding LockPaths
• Hierarchy Summarization (HS)
  – LockPaths of the form x.( fld )*
  – Type hierarchy graph acyclic ⇒ bounded number of paths
  – Preflow-Push example: L(S.cont) ∧ L(S.cont.nd)
    (nodes in set S and their data are locked)
[Figure: S –cont→ Node –nd→ NodeData]
Footprint graph heuristic
• Footprint Graphs (FG) [Calcagno et al. SAS’07]
  – All acyclic paths from arguments of an ADT method to locked objects
  – LockPaths of the form x.( fld | rev(fld) )*
  – Delaunay Mesh Refinement example:
    L(S.cont) ∧ L(S.cont.rev(src)) ∧ L(S.cont.rev(dst)) ∧ L(S.cont.rev(src).dst) ∧ L(S.cont.rev(dst).src)
    (nodes in set S and all of their immediate neighbors are locked)
• Composition of HS and FG
  – Preflow-Push: L(a.rev(src).ed), combining FG and HS
Experimental evaluation
• Implemented on top of TVLA
  – Encode abstraction by 3-valued shape analysis [SRW TOPLAS’02]
• Evaluation on 4 Lonestar Java benchmarks
• Inferred all available optimizations
• #abstract states practically linear in program size

Benchmark                 | Analysis Time (sec)
Boruvka MST               | 6
Preflow-Push Maxflow      | 7
Survey Propagation        | 12
Delaunay Mesh Refinement  | 16
Impact of optimizations for 8 threads
[Chart: baseline vs. optimized running time (sec) on an 8-core Intel Xeon @ 3.00 GHz]
Speedups: 5.6× (Boruvka MST), 4.7× (Delaunay Mesh Refinement), 11.4× (Survey Propagation), 2.9× (Preflow-Push Maxflow)
Note 1
• How to map the abstract domain presented so far to TVLA?
  – Example invariant: (x≠y ∧ L(y.nd)) ∨ (x=y ∧ L(x.nd))
  – Unary abstraction predicate x(v) for each pointer x
  – Unary non-abstraction predicate L[x.p] for pointer x and path p
  – Use partial join
  – Resulting abstraction similar to the one shown
Note 2
• How to come up with an abstraction for similar problems?
  1. Start by constructing a manual proof (Hoare logic)
  2. Examine the resulting invariants and generalize into a language of formulas
     • May need further specialization for a given program – an interesting problem (machine learning/refinement)
  – How to get sound transformers?
Elixir: A System for Synthesizing Concurrent Graph Programs
Dimitrios Prountzos1
Roman Manevich2
Keshav Pingali1
1. The University of Texas at Austin
2. Ben-Gurion University of the Negev
Goal: allow the programmer to easily implement correct and efficient parallel graph algorithms
• Graph algorithms are ubiquitous: social network analysis, computer graphics, machine learning, …
• Difficult to parallelize due to their irregular nature
• Best algorithm and implementation are usually
  – Platform dependent
  – Input dependent
• Need to easily experiment with different solutions
• Focus: fixed graph structure
  – Only change labels on nodes and edges
  – Each activity touches a fixed number of nodes
Example: Single-Source Shortest-Path
• Problem formulation: compute the shortest distance from source node S to every other node
• Many algorithms
  – Bellman-Ford (1957)
  – Dijkstra (1959)
  – Chaotic relaxation (Miranker 1969)
  – Delta-stepping (Meyer et al. 1998)
• Common structure: each node has a label dist with the known shortest distance from S
• Key operation: relax-edge(u,v)
[Figure: example graph with source S, nodes A–G, and edge weights; relaxing edge (A,C)]
if dist(A) + W_AC < dist(C) then dist(C) = dist(A) + W_AC
Scheduling of relaxations:
• Use a priority queue of nodes, ordered by label dist
• Iterate over nodes u in priority order
• On each step: relax all neighbors v of u
  – Apply relax-edge to all (u,v)
Dijkstra’s algorithm
[Figure: Dijkstra’s algorithm on the example graph; the priority queue evolves: <C,3> <B,5>, then <B,5> <E,6> <D,7>, …]
Chaotic relaxation
• Scheduling of relaxations:
  – Use an unordered set of edges
  – Iterate over edges (u,v) in any order
  – On each step: apply relax-edge to edge (u,v)
[Figure: chaotic relaxation on the example graph; the worklist holds edges (S,A), (B,C), (C,D), (C,E)]
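The chaotic-relaxation schedule is easy to sketch as a runnable program (our own illustrative encoding; terminates for graphs without negative cycles):

```python
import math

# Chaotic relaxation for SSSP: apply relax-edge to edges in arbitrary
# order until no label changes. edges: list of directed (u, v, w) triples.

def chaotic_sssp(edges, source, nodes):
    dist = {n: math.inf for n in nodes}
    dist[source] = 0
    changed = True
    while changed:
        changed = False
        for u, v, w in edges:          # any iteration order converges
            if dist[u] + w < dist[v]:  # relax-edge(u, v)
                dist[v] = dist[u] + w
                changed = True
    return dist
```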
Insights behind Elixir
• Operators specify what should be done; the schedule specifies how it should be done
• Parallel graph algorithm = operators + schedule
  – Operators identify new activities (operator delta; an activity = one operator application)
  – Schedule orders activity processing: static schedule + dynamic schedule
• Covers unordered and ordered algorithms
• “TAO of parallelism” [PLDI 2011]
Dijkstra-style Algorithm
q = new PrQueue
q.enqueue(SRC)
while (!q.empty) {
  a = q.dequeue
  for each e = (a,b,w) {
    if dist(a) + w < dist(b) {
      dist(b) = dist(a) + w
      q.enqueue(b)
    }
  }
}
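The same Dijkstra-style schedule, as a runnable sketch using a lazy priority queue (re-enqueue instead of decrease-key; the encoding and names are ours):

```python
import heapq

# Dijkstra with a lazy priority queue: stale entries are skipped on pop.
# adj: {u: [(v, w), ...]} with non-negative weights.

def dijkstra(adj, src):
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, a = heapq.heappop(pq)
        if d > dist.get(a, float('inf')):
            continue                     # stale queue entry
        for b, w in adj.get(a, []):
            nd = d + w
            if nd < dist.get(b, float('inf')):
                dist[b] = nd
                heapq.heappush(pq, (nd, b))
    return dist
```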
Contributions
• Language
  – Operators/schedule separation
  – Allows exploration of the implementation space
• Operator delta inference
  – A precise delta is required for efficient fixpoint computations
• Automatic parallelization
  – Inserts synchronization to atomically execute operators
  – Avoids data races / deadlocks
  – Specializes parallelization based on scheduling constraints
SSSP in Elixir
Graph [ nodes(node : Node, dist : int)
        edges(src : Node, dst : Node, wt : int) ]    ← Graph type

relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)
          bd > ad + w ] ➔ [ bd = ad + w ]            ← Operator

sssp = iterate relax ≫ schedule                      ← Fixpoint statement
Operators
relax = [ nodes(node a, dist ad)
          nodes(node b, dist bd)
          edges(src a, dst b, wt w)    ← redex pattern
          bd > ad + w ]                ← guard
      ➔ [ bd = ad + w ]               ← update
[Figure: edge (a,b) with weight w; if bd > ad + w, bd is updated to ad + w]
Cautious by construction – easy to generalize
Fixpoint statement
sssp = iterate relax ≫ schedule
• Apply the operator until fixpoint
• The scheduling expression controls the order of operator applications
Scheduling examples
Locality-enhanced label-correcting: group b ≫ unroll 2 ≫ approx metric ad
Dijkstra-style: metric ad ≫ group b
(The Dijkstra-style schedule yields the priority-queue pseudocode shown earlier.)
Operator Delta Inference
Delta Inference Example
[Figure: relax1 applied to edge (a,b) with weight w1; does relax2 on edge (c,b) with weight w2 become enabled?]
Query program (discharged by an SMT solver):
  assume (da + w1 < db)
  assume ¬(dc + w2 < db)
  db_post = da + w1
  assert ¬(dc + w2 < db_post)
The assertion is valid ⇒ (c,b) does not become active.
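The query above can be sanity-checked by brute force over a small range of integer values, as a stand-in for the SMT solver (the range and function name are ours):

```python
# Exhaustively check the delta query: if relax1 fired on (a,b) and
# relax2 on (c,b) was not enabled before, it is still not enabled after.

def delta_query_holds():
    R = range(-4, 5)
    for da in R:
        for db in R:
            for dc in R:
                for w1 in R:
                    for w2 in R:
                        if da + w1 < db and not (dc + w2 < db):
                            db_post = da + w1        # effect of relax1
                            if dc + w2 < db_post:    # assertion violated
                                return False
    return True
```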
Delta inference example – active
[Figure: relax1 applied to edge (a,b) with weight w1; edge (b,c) with weight w2 may become enabled]
Query program (discharged by an SMT solver):
  assume (da + w1 < db)
  assume ¬(db + w2 < dc)
  db_post = da + w1
  assert ¬(db_post + w2 < dc)
The assertion can fail ⇒ apply relax on all outgoing edges (b,c) such that dc > db + w2 and c ≠ a.
System architecture
Algorithm Spec → Elixir → C++ Program
Elixir synthesizes code and inserts synchronization; the generated program runs on the Galois/OpenMP parallel runtime (parallel thread pool, graph implementations, worklist implementations).
Experiments: explored dimensions
• Grouping – statically group multiple instances of the operator
• Unrolling – statically unroll operator applications by a factor K
• Dynamic scheduler – choose a different policy/implementation for the dynamic worklist
• …
Compared against hand-written parallel implementations.
SSSP results
[Chart: running time (ms) vs. number of threads (1–24) for Elixir and Lonestar implementation variants]
• 24-core Intel Xeon @ 2 GHz
• USA Florida road network (1 M nodes, 2.7 M edges)
• Group + unroll improve locality
Breadth-First search results
[Charts: running time (ms) vs. number of threads (1–24), comparing Elixir variants against Lonestar and Cilk]
• Scale-free graph: 1 M nodes, 8 M edges (Elixir Variant 1)
• USA road network: 24 M nodes, 58 M edges (Elixir Variant 2)
Conclusion
• Graph algorithm = operators + schedule
  – Elixir language: imperative operators + declarative schedule
• Allows exploring the implementation space
• Automated reasoning for efficiently computing fixpoints
• Correct-by-construction parallelization
• Performance competitive with hand-parallelized code
Parameterized Verification of Software Transactional Memories
Michael Emmi Rupak MajumdarRoman Manevich
Motivation
• Transactional memories [Herlihy ’93]
  – Programmer writes code with coarse-grained atomic blocks
  – Transaction manager takes care of conflicts, providing the illusion of sequential execution
• Strict serializability – correctness criterion
  – Formalizes the “illusion of sequential execution”
• Parameterized verification
  – Formal proof for a given implementation
  – For every number of threads
  – For every number of memory objects
  – For every number and length of transactions
STM terminology
• Statements: reads, writes, commit, abort
• Transaction: reads and writes of variables followed by commit (committing transaction) or abort (aborting transaction)
• Word: interleaved sequence of transactions of different threads
• Conflict: two statements conflict if
  – one is a read of variable X and the other is a commit of a transaction that writes to X, or
  – both are commits of transactions that write to X
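The conflict definition can be transcribed directly; the statement encoding and helper names below are ours, purely for illustration:

```python
# Statements: ('rd', var, thread), ('wr', var, thread), ('commit', thread).
# write_sets maps each transaction/thread to the set of variables it writes.

def conflicts(s1, s2, write_sets):
    def rd_commit(a, b):
        # read of X vs. commit of a transaction that writes X
        return (a[0] == 'rd' and b[0] == 'commit'
                and a[1] in write_sets.get(b[1], set()))
    def commit_commit(a, b):
        # two commits of transactions that both write some variable
        return (a[0] == 'commit' and b[0] == 'commit'
                and write_sets.get(a[1], set()) & write_sets.get(b[1], set()))
    return bool(rd_commit(s1, s2) or rd_commit(s2, s1) or commit_commit(s1, s2))
```

On the example word used later, (rd X t1) conflicts with (commit t2) because t2 writes X.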
Safety property: strict serializability
• There is a serialization for the committing threads such that order of conflicts is preserved
• Order of non-overlapping transactions remains the same
• Example word: (rd X t1), (rd Y t2), (wr X t2), (commit t2), (commit t1)
  can be serialized to: (rd X t1), (commit t1), (rd Y t2), (wr X t2), (commit t2)
  – The conflict between (rd X t1) and (commit t2) (t2 writes X) appears in the same order in both words
Main results
• First automatic verification of strict serializability for transactional memories: TPL, DSTM, TL2
• New proof technique:
  – Template-based invisible invariant generation
  – Abstract checking algorithm to check inductive invariants
• Challenging – requires reasoning on both universal and existential properties
Outline
• Strict serializability verification approach
• Automating the proof
• Experiments
• Conclusion
• Related work
Proof roadmap 1
Goal: prove model M is strictly serializable
1. Given a strict-serializability reference model RSS, reduce checking strict serializability to checking that M refines RSS
2. Reduce refinement to checking safety
   – Safety property SIM: whenever M can execute a statement, so can RSS
   – Check SIM on the product system M × RSS
Proof roadmap 2
3. Model STMs M and RSS in first-order logic
   • TM models use set data structures and typestate bits
4. Check safety by generating a strong enough candidate inductive invariant and checking inductiveness
   – Use observations on the structure of transactional memories
   – Use sound first-order reasoning
Reference strict serializability model
• Guerraoui, Singh, Henzinger, Jobstmann [PLDI’08]
• RSS: most liberal specification of a strictly serializable system
  – Allows the largest language of strictly-serializable executions
• M is strictly serializable iff every word of M is also a word of RSS
  – Language(M) ⊆ Language(RSS)
  – M refines RSS
Modeling transactional memories
• M(n,k) = (predicates, actions)
  – Predicate: ranked relation symbol, e.g. p(t), q(t,v), …
  – Binary predicates are used for sets, so instead of rs(t,v) we write v∈rs(t)
  – Action: a(t,v) = if pre(a) then p′(v)=…, q′(u,v)=…
• Universe = set of k thread individuals and n memory individuals
• State S = a valuation of the predicates
Reference model (RSS) predicates
• Typestates: RSS.finished(t), RSS.started(t), RSS.pending(t), RSS.invalid(t)
• Read/write sets: RSS.rs(t,v), RSS.ws(t,v)
• Prohibited read/write sets: RSS.prs(t,v), RSS.pws(t,v)
• Weak-predecessor: RSS.wp(t1,t2)
DSTM predicates
• Typestates: DSTM.finished(t), DSTM.validated(t), DSTM.invalid(t), DSTM.aborted(t)
• Read/own sets: DSTM.rs(t,v), DSTM.os(t,v)
RSS commit(t) action
if ¬(RSS.invalid(t) ∨ RSS.wp(t,t)) then        ← action precondition
  ∀t1,t2 . RSS.wp′(t1,t2) ⟺ t1≠t ∧ t2≠t ∧
    ( RSS.wp(t1,t2) ∨ ( RSS.wp(t,t2) ∧ ( RSS.wp(t1,t) ∨ ∃v . v∈RSS.ws(t1) ∧ v∈RSS.ws(t) ) ) )
  …
RSS.wp′ is the post-state predicate, RSS.wp the current-state predicate, t the executing thread; the ∃v conjunct is a write-write conflict.
DSTM commit(t) action
if DSTM.validated(t) then
  ∀t1 . DSTM.validated′(t1) ⟺ t1≠t ∧ DSTM.validated(t1) ∧ ¬∃v . v∈DSTM.rs(t1) ∧ v∈DSTM.os(t)
  …
The ∃v conjunct is a read-own conflict.
FOTS states and execution
[Figure: two DSTM states as first-order structures, with thread individuals t1, t2 and a memory-location individual v; executing rd v t1 takes state S1 to state S2, where the predicate evaluations DSTM.rs(t1,v)=1 and DSTM.started(t1)=1 hold]
Product system
• The product of two systems A × B:
  – Predicates = A.predicates ∪ B.predicates
  – Actions: commit(t) = { if (A.pre ∧ B.pre) then … }, rd(t,v) = { if (A.pre ∧ B.pre) then … }, …
• M refines RSS iff on every execution SIM holds: for every action a, M.pre(a) ⇒ RSS.pre(a)
Checking DSTM refines RSS
• The only precondition in RSS is for commit(t)
• We need to check that SIM = ∀t . DSTM.validated(t) ⇒ ¬(RSS.invalid(t) ∨ RSS.wp(t,t)) holds in all reachable states of DSTM × RSS
• Proof rule: if DSTM × RSS ⊨ SIM, then DSTM refines RSS
• But how do we check this safety property?
Checking safety by invisible invariants
• How do we prove that a property φ holds for all reachable states of system M?
• Pnueli, Ruah, Zuck [TACAS’01]
• Come up with an inductive invariant ι that contains the reachable states of M and strengthens φ:
  I1: Initial ⇒ ι    I2: ι ∧ transition ⇒ ι′    I3: ι ⇒ φ
Strict serializability proof rule
Proof roadmap:
1. Divine a candidate invariant ι
2. Prove I1, I2, I3
   I1: Initial ⇒ ι    I2: ι ∧ transition ⇒ ι′    I3: ι ⇒ SIM
   DSTM × RSS ⊨ SIM  ⇒  DSTM refines RSS
Two challenges
Proof roadmap: 1. divine a candidate invariant ι; 2. prove I1, I2, I3. But:
• How do we find a candidate ι? Infinite space of possibilities.
• Given a candidate, how do we check the proof rule? Checking A ⇒ B is undecidable for first-order logic.
Our solution
Proof roadmap: 1. divine a candidate invariant ι; 2. prove I1, I2, I3.
• Finding a candidate ι: use templates and iterative weakening, utilizing insights on transactional memory implementations
• Checking the proof rule for a given candidate: use abstract checking
Invariant for DSTM × RSS:
P1: ∀t, t1 . RSS.wp(t,t1) ∧ ¬RSS.invalid(t) ∧ RSS.pending(t) ⇒ ∃v . v∈RSS.ws(t1) ∧ v∈RSS.ws(t)
P2: ∀t, v . v∈RSS.rs(t) ∧ ¬DSTM.aborted(t) ⇒ v∈DSTM.rs(t)
P3: ∀t, v . v∈RSS.ws(t) ∧ ¬DSTM.aborted(t) ⇒ v∈DSTM.os(t)
P4: ∀t . DSTM.validated(t) ⇒ ¬RSS.wp(t,t)
P5: ∀t . DSTM.validated(t) ⇒ ¬RSS.invalid(t)
P6: ∀t . DSTM.validated(t) ⇒ RSS.pending(t)
P1 is an inductive invariant involving only RSS – it can be reused for all future proofs.
Templates for DSTM × RSS:
P1: ∀t, t1 . φ1(t,t1) ∧ φ2(t) ∧ φ3(t) ⇒ ∃v . v∈φ4(t1) ∧ v∈φ5(t)
P2: ∀t, v . v∈φ1(t) ∧ φ2(t) ⇒ v∈φ3(t)
P3: ∀t, v . v∈φ1(t) ∧ φ2(t) ⇒ v∈φ3(t)
P4: ∀t . φ1(t) ⇒ φ2(t,t)
P5: ∀t . φ1(t) ⇒ φ2(t)
P6: ∀t . φ1(t) ⇒ φ2(t)
Why templates?
• Makes the invariant separable
• Controls the complexity of invariants
• Adding templates enables refinement
Mining candidate invariants
• Use a predefined set of templates to specify the structure of candidate invariants
  – ∀t,v . φ1 ∧ φ2 ⇒ φ3, where φ1, φ2, φ3 are predicates of M or their negations
  – Existential formulas capturing 1-level conflicts: ∃v . v∈φ4(t1) ∧ v∈φ5(t2)
• Mine candidate invariants from a concrete execution
Iterative invariant weakening
• Initial candidate invariant C0 = P1 ∧ P2 ∧ … ∧ Pk
• Try to prove I2: ι ∧ transition ⇒ ι′
  – C1 = { Pi | C0 ∧ transition ⇒ Pi′, for Pi ∈ C0 }
• If C1 = C0 then we have an inductive invariant
• Otherwise, compute C2 = { Pi | C1 ∧ transition ⇒ Pi′, for Pi ∈ C1 }
• Repeat until either
  – an inductive invariant Ck is found – then check I3: Ck ⇒ SIM, or
  – we reach the trivial inductive invariant (the empty conjunction, true)
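The weakening loop is essentially Houdini-style. Here is a runnable sketch on a finite toy system; the states, step function, and predicates are ours, purely for illustration:

```python
# Iterative weakening: keep candidates that hold initially, then repeatedly
# drop any candidate not preserved by the transition relation. What remains
# at the fixpoint is an inductive conjunction.

def weaken_to_inductive(state_space, init, step, candidates):
    inv = [p for p in candidates if all(p(s) for s in init)]   # C0
    while True:
        # States satisfying the current conjunction C_i.
        pre = [s for s in state_space if all(p(s) for p in inv)]
        # C_{i+1} = { P in C_i | C_i /\ transition => P' }.
        keep = [p for p in inv if all(p(t) for s in pre for t in step(s))]
        if keep == inv:
            return inv          # fixpoint: inductive invariant
        inv = keep

# Toy system: integer states 0..11, initial state 0, transition s -> s+2
# (capped at 10). 'low' is not inductive: state 4 steps to 6.
def even(s): return s % 2 == 0
def low(s): return s < 5
def nonneg(s): return s >= 0

inductive = weaken_to_inductive(range(12), [0],
                                lambda s: [s + 2] if s + 2 <= 10 else [],
                                [even, low, nonneg])
```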
Abstract proof rule
Concrete rule:  I1: Initial ⇒ ι    I2: ι ∧ transition ⇒ ι′    I3: ι ⇒ SIM
Abstract rule:  I1: α(Initial) ⊑ A    I2: abs_transition(A) ⊑ A′    I3: γ(A) ⇒ SIM
Ingredients: formula abstraction (α), an abstract transformer, and approximate entailment.
DSTM × RSS ⊨ SIM  ⇒  DSTM refines RSS
Conclusion
• Novel invariant generation using templates – extends applicability of invisible invariants
• Abstract domain and reasoning to check invariants without state explosion
• Proved strict-serializability for TPL, DSTM, TL2– BLAST and TVLA failed
Verification results

property                 | TPL  | DSTM  | TL2    | RSS
Bound for invariant gen. | (2,1)| (2,1) | (2,1)  | (2,1)
No. cubes                | 8    | 184   | 344    | 7296
Bounded time             | 4    | 8     | 10     | 23
Invariant mining time    | 6    | 13    | 26     | 57
#templates               | 28   | 28    | 28     | 28
#candidates              | 22   | 53    | 97     | 19
#proved                  | 22   | 30    | 68     | 14
#minimal                 | 4    | 8     | 5      | -
avg. time per invariant  | 3.7  | 20.3  | 36     | 43.4
avg. abs. size           | 31.7 | 256.9 | 1.19k  | 2.86k
Total time               | 3.5m | 54.3m | 129.3m | 30.9m
Insights on transactional memories
• The transition relation is symmetric – thread identifiers are not used: p′(t,v) = … t1 … t2 …
• The executing thread t interacts only with an arbitrary thread or with a conflict-adjacent thread
• Arbitrary thread: ∃v . v∈TL2.rs(t1) ∨ v∈TL2.ws(t1)
• Conflict adjacent: ∃v . v∈DSTM.rs(t1) ∧ v∈DSTM.ws(t)
Conflict adjacency
[Figure: thread t’s read set and t2’s write set share v1 (read-write conflict); t1’s write set and t2’s write set share v2 (write-write conflict)]
∃v . v∈rs(t) ∧ v∈DSTM.ws(t2)
∃v . v∈ws(t1) ∧ v∈DSTM.ws(t2)
Related work
• Reduction theorems: Guerraoui et al. [PLDI’08, CONCUR’08]
• Manually-supplied invariants – fixed number of threads and variables + PVS: Cohen et al. [FMCAD’07]
• Predicate abstraction + shape analysis: SLAM, BLAST, TVLA
Related work
• Invisible invariants: Arons et al. [CAV’01], Pnueli et al. [TACAS’01]
• Templates – for arithmetic constraints
• Indexed predicate abstraction: Shuvendu et al. [TOCL’07]
• Thread quantification: Berdine et al. [CAV’08], Segalov et al. [APLAS’09]