A Unified Framework for Constraint Based Shared Memory Consistency Analysis

A presentation in CP+CV’04

Yue Yang, Ganesh Gopalakrishnan, Gary Lindstrom, Konrad Slind
School of Computing, University of Utah

Supported in part by SRC Contract 1031.001 and NSF Grants CCR-0081406, 0219805




Problem Context #1: The design of efficient multiprocessors

Efficient Multiprocessors have Efficient Shared Memory Systems

... because CPU speeds are growing faster than memory-system speeds


Efficient Shared-memory Multiprocessor Systems employ

• Weak memory models

– Controlled ways to postpone global view updates

• Advanced consistency protocols, advanced OS libraries, ... depend on it

Problem: How to specify these weak memory models?

How to use the specification in practice to support verification activities?


Problem Context #2: The design of multithreaded software

Language-level weak memory models are being studied

... because languages with explicit threading cannot otherwise be implemented efficiently on a wide variety of platforms

Examples:

• Java
• C#
• OpenMP
• Re-entrant device drivers
• OS code that runs very fast with minimal locking
• ...


Characteristics of language-level memory models

• Weak memory models

– Controlled ways to postpone global view updates

– Encompass compiler optimizations that “make sense”

Problem: How to specify language-level memory models?

How to use the specification in practice to support verification activities?


Answer: Constraints seem very attractive!

• Constraint-based specification of several architecture-level memory models

• The use of these specifications to enable formal comparisons among the models

• The use of these specs for post-silicon verification (work in collaboration with Intel) (FOCUS OF THIS TALK)

• Analysis of proposals for language-level memory models (Yang’s dissertation)

• The use of specs of language-level memory models for

– Memory-model sensitive program analysis (preliminary work in Yang’s dissertation)

– ... for certifying compilers that exploit language-level memory models (future work)


Publications

• Yang, Y., Gopalakrishnan, G., Lindstrom, G., and Slind, K., “Analyzing the Intel Itanium Memory Ordering Rules using Logic Programming and SAT,” CHARME 2003, LNCS 2860, October 2003.

• Yang, Y., Gopalakrishnan, G., Lindstrom, G., and Slind, K., “Nemos: A framework for axiomatic and executable specifications of memory consistency models,” IPDPS 2004, Santa Fe, NM, April 2004.

• Yang, Y., Gopalakrishnan, G., and Lindstrom, G., “A Constraint Based Approach for Specifying Memory Consistency Models,” Journal of Logic Programming (submitted to a special issue on constraints)

• Gopalakrishnan, G., Yang, Y., and Hemanthkumar, S., “QB or not QB: An efficient verification tool for memory orderings,” Accepted by CAV’04


Proof of concept Software

• Nemos (Yue Yang)
– Defines memory models via Constraint Logic Programs
– Constraint-Prolog code available for experimentation

• DefectFinder (Yue Yang)
– Constraint-Prolog code that models race and atomicity analysis
– Works in the small (no loops at present)
– Underlying shared memory model given as an explicit parameter

• QBF-based Memory Order Checker
– Written by the PI in OCaml
– Compiles memory order rules written in HOL to QBF
– Does not scale yet
– Might provide benchmarks to tune QBF tools

• SAT-based Memory Order Checking Tool
– Present version written by the PI in OCaml
– Next version expected to be formally derived
– Replaces Constraint-Prolog version for Itanium
– One “real” execution given by Intel was successfully run


Another effort involving constraints (presented at DCC’04): Limited Observability Run-time Verification (with Ching Tsun Chou)


The rest of the talk: Post-Si verification of MP Orderings using Constraints


Weak memory models allow multiple executions...

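Two CPUs share one memory (c and d are initially 0, as the load results show). One CPU runs the stores, the other runs the loads:

CPU 1: st c,1 ; st d,2
CPU 2: ld d ; ld c

One possible execution...

CPU 1: st c,1 ; st d,2
CPU 2: ld d, 2 ; ld c, 1

Possible under SC and under Itanium

Another execution...

CPU 1: st c,1 ; st d,2
CPU 2: ld d, 2 ; ld c, 0

Impossible under SC
Possible under Itanium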
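The SC half of this claim can be checked by brute force. The following is a minimal sketch, not the authors' tool: it assumes memory starts at 0 and enumerates every interleaving that respects program order. The function names `interleavings` and `sc_load_results` are made up for this illustration. It says nothing about Itanium, which needs the full rule set.

```python
def interleavings(threads):
    """Yield every merge of the per-thread instruction lists that keeps program order."""
    if not any(threads):
        yield []
        return
    for idx, thread in enumerate(threads):
        if thread:
            rest = threads[:idx] + [thread[1:]] + threads[idx + 1:]
            for tail in interleavings(rest):
                yield [thread[0]] + tail


def sc_load_results(threads):
    """Collect the load results of every sequentially consistent execution."""
    results = set()
    for schedule in interleavings(threads):
        memory = {}  # assumption: every location initially holds 0
        loads = {}
        for op, var, value in schedule:
            if op == "st":
                memory[var] = value
            else:  # "ld"
                loads[var] = memory.get(var, 0)
        results.add(tuple(sorted(loads.items())))
    return results


program = [
    [("st", "c", 1), ("st", "d", 2)],        # CPU 1
    [("ld", "d", None), ("ld", "c", None)],  # CPU 2
]
results = sc_load_results(program)
first_execution = (("c", 1), ("d", 2))   # ld d, 2 ; ld c, 1
second_execution = (("c", 0), ("d", 2))  # ld d, 2 ; ld c, 0
print(first_execution in results, second_execution in results)
```

Only six interleavings exist for this program, so exhaustive enumeration is cheap here. It would not scale to realistic tests, which is one reason the talk turns to constraint solving.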


Commercial Weak Memory Model Specs are Complex

• Intel’s original specification was pretty voluminous

• They later issued a formal spec that clarified many things

• Yet, their “formal” spec left many things informal

• It was purely “on paper” (no machine-readable formal spec)

• Our exercise was to take Intel’s semi-formal spec and capture it in HOL

• Result: 36 pages of Intel’s formal spec in 3 pages of HOL spec

• Can prove “challenge theorems” (future work)


Basic idea behind Intel’s Formal Spec (which we follow in our formal spec)

legalItanium(ops) = Exists order.
  (  requireStrictTotalOrder ops order
  /\ requireWriteOperationOrder ops order
  /\ requireItProgramOrder ops order
  /\ requireMemoryDataDependence ops order
  /\ requireDataFlowDependence ops order
  /\ requireCoherence ops order
  /\ requireAtomicWBRelease ops order
  /\ requireSequentialUC ops order
  /\ requireNoUCBypass ops order
  /\ requireReadValue ops order )

SC(ops) = Exists order.
  (  requireStrictTotalOrder ops order
  /\ requireProgramOrder ops order
  /\ requireReadValue ops order )

Make it look like SC so that people have less trouble understanding!

Call the rules between requireStrictTotalOrder and requireReadValue “otherOrder”


But, how do we check executions against such specs?

legalItanium(ops) = Exists order.
  (  requireStrictTotalOrder ops order
  /\ requireWriteOperationOrder ops order
  /\ requireItProgramOrder ops order
  /\ requireMemoryDataDependence ops order
  /\ requireDataFlowDependence ops order
  /\ requireCoherence ops order
  /\ requireAtomicWBRelease ops order
  /\ requireSequentialUC ops order
  /\ requireNoUCBypass ops order
  /\ requireReadValue ops order )

SC(ops) = Exists order.
  (  requireStrictTotalOrder ops order
  /\ requireProgramOrder ops order
  /\ requireReadValue ops order )

Execution 1:
  st c,1 ; st d,2
  ld d, 2 ; ld c, 1

Execution 2:
  st c,1 ; st d,2
  ld d, 2 ; ld c, 0

e.g., which execution is legal under which memory model?


Why care about Execution Validation?

• In complex systems, FV helps eliminate (most) bugs

• Must verify final silicon also (as far as possible)

(Note: this is different from “fabrication fault” testing)

• FV can help immensely during Post-Silicon Verification !!

- This is like “runtime verification” à la Havelund, Rosu, Lee, ...


Post-Si verification of MP Orderings today (oversimplified)

assembly program 1 ... assembly program n
  --> New MP System (run repeatedly to catch one interleaving that might reveal a bug)
  --> assembly execution 1 ... assembly execution n
  --> Check every execution against ordering rules for compliance

* This is done ad-hoc
* How to make this formal and efficient?
* How to capitalize on repeated re-runs?


Initial approach tried and abandoned...

requireProgramOrder ops order =
  Forall i,j : ops
    ( orderedByAcquire i j \/ orderedByRelease i j \/ orderedByFence i j ) ==> order i j

( % Rule (ACQ): ACQ>>I
  .....
  #\/
  % Rule (REL):
  Op_j #= StRel #/\
  ( IsWr_i #==>
    ( WrType_i #= Local #/\ WrType_j #= Local
      #\/ WrType_i #= Remote #/\ WrType_j #= Remote #/\ WrProc_i #= WrProc_j ))
  ....
#==> Oij.

Imposes a constraint on matrix entry Oij


Our new SAT-based execution formal verification method

legalItanium(ops) =

Exists order.

( requireStrictTotalOrder ops order

/\ requireWriteOperationOrder ops order

/\ requireItProgramOrder ops order

/\ requireMemoryDataDependence ops order

/\ requireDataFlowDependence ops order

/\ requireCoherence ops order

/\ requireAtomicWBRelease ops order

/\ requireSequentialUC ops order

/\ requireNoUCBypass ops order

/\ requireReadValue ops order

Execution:
  st c,1 ; st d,2
  ld d, 2 ; ld c, 1

Flow:
  Spec --> Program capturing memory ordering rules (hand-derivation now... to be automated)
  Execution + program --> SAT instance --> SAT solver --> SAT / UNSAT
  SAT: Explanation (how things may bypass each other...)
  UNSAT: Extract unsat core (currently done using Zcore) --> Find out which instructions violated what ordering rules...


P1: St a,1; Ld r1,a <1>; St b,r1 <1>;

P2: Ld.acq r2,b <1>; Ld r3,a <0>;

Have built a tool for tuple generation that addresses many details:

(1) Expansion into tuples with variable address allocation

{id=0; proc=0; pc=0; op= St; var=0; data=1; wrID=0; wrType=Local; wrProc=0; reg=-1; useReg=false};

{id=1; proc=0; pc=0; op= St; var=0; data=1; wrID=0; wrType=Remote; wrProc=0; reg=-1; useReg=false};

{id=2; proc=0; pc=0; op= St; var=0; data=1; wrID=0; wrType=Remote; wrProc=1; reg=-1; useReg=false};

{id=3; proc=0; pc=1; op= Ld; var=0; data=1; wrID=-1; wrType=DontCare; wrProc=-1; reg=0; useReg=true};

{id=4; proc=0; pc=2; op= St; var=1; data=1; wrID=4; wrType=Local; wrProc=0; reg=0; useReg=true};

{id=5; proc=0; pc=2; op= St; var=1; data=1; wrID=4; wrType=Remote; wrProc=0; reg=0; useReg=true};

{id=6; proc=0; pc=2; op= St; var=1; data=1; wrID=4; wrType=Remote; wrProc=1; reg=0; useReg=true};

{id=7; proc=1; pc=0; op= LdAcq; var=1; data=1; wrID=-1; wrType=DontCare; wrProc=-1; reg=1; useReg=true};

{id=8; proc=1; pc=1; op= Ld; var=0; data=0; wrID=-1; wrType=DontCare; wrProc=-1; reg=2; useReg=true}
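The expansion pattern visible in these tuples can be sketched as follows. This is an illustration reconstructed from the listing above, not the authors' tool: it assumes each store becomes one Local tuple plus one Remote tuple per processor, that `wrID` is the id of the store's Local tuple, and that loads become a single tuple. The function name `expand` and the input record layout are hypothetical.

```python
def expand(program, num_procs):
    """Expand instructions into ordering tuples, following the pattern in the slide.

    Each instruction is a dict with keys proc, pc, op, var, data, reg
    (reg is -1 when no register is involved).
    """
    tuples = []
    for ins in program:
        base = {
            "proc": ins["proc"], "pc": ins["pc"], "op": ins["op"],
            "var": ins["var"], "data": ins["data"],
            "reg": ins["reg"], "useReg": ins["reg"] >= 0,
        }
        if ins["op"].startswith("St"):
            wr_id = len(tuples)  # assumption: the Local tuple's id names the write
            views = [("Local", ins["proc"])]
            views += [("Remote", p) for p in range(num_procs)]
            for wr_type, wr_proc in views:
                tuples.append({"id": len(tuples), **base, "wrID": wr_id,
                               "wrType": wr_type, "wrProc": wr_proc})
        else:
            tuples.append({"id": len(tuples), **base, "wrID": -1,
                           "wrType": "DontCare", "wrProc": -1})
    return tuples


# The two-processor example above, with a -> var 0, b -> var 1, r1..r3 -> reg 0..2.
example = [
    {"proc": 0, "pc": 0, "op": "St",    "var": 0, "data": 1, "reg": -1},
    {"proc": 0, "pc": 1, "op": "Ld",    "var": 0, "data": 1, "reg": 0},
    {"proc": 0, "pc": 2, "op": "St",    "var": 1, "data": 1, "reg": 0},
    {"proc": 1, "pc": 0, "op": "LdAcq", "var": 1, "data": 1, "reg": 1},
    {"proc": 1, "pc": 1, "op": "Ld",    "var": 0, "data": 0, "reg": 2},
]
tuples = expand(example, num_procs=2)
for t in tuples:
    print(t)
```

Under these assumptions the sketch yields nine tuples with the same ids, `wrID`, `wrType` and `wrProc` values as the listing.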



How the SAT encoding is achieved... (actually QBF that’s unrolled...)

legalItanium(ops) = Exists order.
  (  requireStrictTotalOrder ops order
  /\ requireOtherOrderItanium ops order
  /\ requireReadValue ops order )

SC(ops) = Exists order.
  (  requireStrictTotalOrder ops order
  /\ requireOtherOrderSC ops order
  /\ requireReadValue ops order )

Example Execution:
  P1: st c,1 ; st d,2
  P2: ld d, 2 ; ld c, 0

Break it down into “tuples”:

• Store c viewed at P1 for modeling bypassing
• Store c viewed at P1 for modeling global visibility
• Store c viewed at P2 for modeling global visibility
• Store d viewed at P1 for modeling bypassing
• Store d viewed at P1 for modeling global visibility
• Store d viewed at P2 for modeling global visibility
• Ld d viewed at P2 for modeling read value
• Ld c viewed at P2 for modeling read value

8 tuples obtained


Constraint Encoding Approach #1

n log n approach (“small domain” encoding)

• Attach a word w_t of log n bits to each tuple t (2 bits for the 4 tuples illustrated)
• Tuple i before Tuple j --> Assert w_i < w_j
• StrictTotalOrder --> Assert that the w_t words are distinct
• Smaller # of Boolean vars
• Much harder SAT instances (abandoned for now)

Illustration on 4 tuples, covering requireStrictTotalOrder, requireOtherOrder and requireReadValue:

Bit variables: x00 x01   x10 x11   x20 x21   x30 x31

For all i != j: (xi1, xi0) != (xj1, xj0)

A system of constraints with primitive constraint (xi1, xi0) < (xj1, xj0)
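What the small-domain encoding asks of a solver can be shown with a brute-force stand-in. This is a sketch of the semantics only, not a SAT encoding: it searches for distinct words of about log2(n) bits that satisfy every "i before j" constraint. The name `small_domain_order` is hypothetical.

```python
from itertools import permutations


def small_domain_order(n, before):
    """Search for distinct small-domain words w_t with w_i < w_j for each (i, j).

    Returns (bits, words), where words[t] is the value assigned to tuple t,
    or (bits, None) if the constraints cannot be met.
    """
    bits = max(1, (n - 1).bit_length())  # enough bits for n distinct words
    for words in permutations(range(2 ** bits), n):  # permutations => distinct
        if all(words[i] < words[j] for i, j in before):
            return bits, words
    return bits, None


# Four tuples, as in the illustration: t2 before t0 before t3 before t1.
bits, words = small_domain_order(4, [(2, 0), (0, 3), (3, 1)])
print(bits, words)

# A cyclic requirement has no solution.
_, cyclic = small_domain_order(4, [(0, 1), (1, 2), (2, 0)])
print(cyclic)
```

A real encoding would bit-blast each comparison into clauses over the x variables. Those comparator clauses are plausibly what makes the instances harder, though the slides do not say so.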


Constraint Encoding Approach #2

n × n approach (“e_ij” encoding)

• Assign a matrix position mij for each pair of tuples ti and tj
• Tuple i before Tuple j --> Assert mij true
• StrictTotalOrder --> Assert Irreflexivity, Transitivity, Totality
• Larger # of Boolean vars
• Easier SAT instances (being pursued now)

Illustration on 4 tuples: a 4 × 4 matrix of Boolean entries mij, covering requireStrictTotalOrder, requireOtherOrder and requireReadValue

A system of constraints with primitive constraint mij:

Forall i : ~mii

Forall i,j (i != j) : mij \/ mji

Forall i,j,k : mij /\ mjk => mik
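The three axioms translate directly into CNF clauses over the matrix entries. The sketch below assumes a row-major, 1-based variable numbering (an arbitrary choice for this illustration) and checks by exhaustive enumeration that, for 3 tuples, the clauses admit exactly the 3! strict total orders. The helper names are hypothetical.

```python
from itertools import product


def var(i, j, n):
    """1-based variable number for matrix entry m[i][j] (row-major layout)."""
    return i * n + j + 1


def total_order_clauses(n):
    """CNF clauses (lists of signed ints) forcing m to be a strict total order."""
    clauses = []
    for i in range(n):
        clauses.append([-var(i, i, n)])                   # irreflexivity
    for i in range(n):
        for j in range(i + 1, n):
            clauses.append([var(i, j, n), var(j, i, n)])  # totality, i != j
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # transitivity: m_ij /\ m_jk => m_ik
                clauses.append([-var(i, j, n), -var(j, k, n), var(i, k, n)])
    return clauses


def count_models(num_vars, clauses):
    """Count satisfying assignments by brute force (only sensible for tiny n)."""
    count = 0
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            count += 1
    return count


clauses = total_order_clauses(3)
print(len(clauses), count_models(9, clauses))
```

Transitivity contributes n^3 clauses, which is the term that dominates as the number of tuples grows.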


Transformation of HOL specs to generate constraints

Initial Spec:

atomicWBRelease(ops,order) =
  forall (i in ops).(j in ops).(k in ops).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB)
    /\ (i.wrID = k.wrID) /\ order(i,j) /\ order(j,k)
    ==> (j.wrID = i.wrID)

(Diagram: j ordered between i and k)

Applying Contrapositive:

atomicWBRelease(ops,order) =
  forall (i in ops).(j in ops).(k in ops).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB)
    /\ (i.wrID = k.wrID) /\ ~(j.wrID = i.wrID)
    ==> ~(order(i,j) /\ order(j,k))

After Reducing Quantifier Scopes:

atomicWBRelease(ops,order) =
  forall (i in ops).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) ==>
    forall (k in ops). (i.wrID = k.wrID) ==>
      forall (j in ops). ~(j.wrID = i.wrID) ==>
        ~(order(i,j) /\ order(j,k))
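That the three forms agree can be confirmed at the propositional level with a truth table. In this sketch the guards are abstracted into Booleans: `g1` stands for the three conditions on i, `g2` for `i.wrID = k.wrID`, `same` for `j.wrID = i.wrID`, and `oij`, `ojk` for the two order facts. These names are shorthand introduced here, not part of the spec.

```python
from itertools import product


def implies(a, b):
    return (not a) or b


def initial(g1, g2, oij, ojk, same):
    # g1 /\ g2 /\ order(i,j) /\ order(j,k) ==> same
    return implies(g1 and g2 and oij and ojk, same)


def contrapositive(g1, g2, oij, ojk, same):
    # g1 /\ g2 /\ ~same ==> ~(order(i,j) /\ order(j,k))
    return implies(g1 and g2 and not same, not (oij and ojk))


def scoped(g1, g2, oij, ojk, same):
    # g1 ==> (g2 ==> (~same ==> ~(order(i,j) /\ order(j,k))))
    return implies(g1, implies(g2, implies(not same, not (oij and ojk))))


all_agree = all(
    initial(*v) == contrapositive(*v) == scoped(*v)
    for v in product([False, True], repeat=5)
)
print(all_agree)
```

Narrowing the quantifier scopes is also sound because each guard mentions only variables already bound at that point.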


Functional (OCaml) Program Derivation from HOL Specs:

Transformed Spec:

atomicWBRelease(ops,order) =
  forall (i in ops).
    (i.op = StRel) /\ (i.wrType = Remote) /\ (attr_of i.var = WB) ==>
    forall (k in ops). (i.wrID = k.wrID) ==>
      forall (j in ops). ~(j.wrID = i.wrID) ==>
        ~(order(i,j) /\ order(j,k))

Functional Program that generates the constraints (will be automated):

atomicWBRelease(ops) = forall(i,ops,wb(i))

wb(i) = if ~((attr_of i.var=WB) & (i.op=StRel) & (i.wrType=Remote)) then true else forall(k,ops,wb1(i,k))

wb1(i,k) = if ~(i.wrID=k.wrID) then true else forall(j,ops,wb2(i,k,j))

wb2(i,k,j) = if (j.wrID=i.wrID) then true else ~(order(i,j) & order(j,k))

forall(i,S,e(i)) = for all i in S : e(i) (* foldr( map (fn i -> e(i)) (S) (&), true) *)
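The same nesting can be mirrored in a small generator that emits one clause per surviving (i, k, j) triple. This is a Python sketch rather than the OCaml program from the talk. It assumes the n × n encoding with a row-major variable numbering, and `make_var`, `attr_of` and the tuple layout are illustrative assumptions.

```python
def make_var(n):
    """Return a function mapping tuple ids (i, j) to a 1-based order variable."""
    return lambda i, j: i * n + j + 1


def atomic_wb_release_clauses(ops, attr_of, var):
    """Emit CNF clauses for ~(order(i,j) /\\ order(j,k)) wherever the guards hold."""
    clauses = []
    for i in ops:                                   # wb(i)
        if not (attr_of(i["var"]) == "WB" and i["op"] == "StRel"
                and i["wrType"] == "Remote"):
            continue                                # guard false: contributes "true"
        for k in ops:                               # wb1(i, k)
            if i["wrID"] != k["wrID"]:
                continue
            for j in ops:                           # wb2(i, k, j)
                if j["wrID"] == i["wrID"]:
                    continue
                # ~(m_ij /\ m_jk) is the clause (~m_ij \/ ~m_jk)
                clauses.append([-var(i["id"], j["id"]), -var(j["id"], k["id"])])
    return clauses


# Hypothetical input: two Remote views of one release store, plus one load.
ops = [
    {"id": 0, "op": "StRel", "wrType": "Remote", "var": 0, "wrID": 0},
    {"id": 1, "op": "StRel", "wrType": "Remote", "var": 0, "wrID": 0},
    {"id": 2, "op": "Ld", "wrType": "DontCare", "var": 0, "wrID": -1},
]
wb_clauses = atomic_wb_release_clauses(ops, lambda v: "WB", make_var(len(ops)))
print(wb_clauses)
```

Because every guard is evaluated on concrete tuple fields, only clauses over order variables remain, which is the partial-evaluation effect the later slides rely on.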


Main Result:

Formally hand-derived code worked the first time on all 17 of Intel’s litmus tests!

Previous ad-hoc Prolog code had to be massively debugged

Formal derivation ensures that HOL axioms are preserved in code that generates SAT instances (very error-prone coding otherwise)


Partial Evaluation Approach under “n × n” encoding

requireStrictTotalOrder ops order
requireOtherOrder ops order
requireReadValue ops order

• These are unit-clause rich, and hence very easy to re-generate

• If the *same* test is re-run, variation will only be in the ReadValue rule

Constraints on mij:

Forall i : ~mii

Forall i,j (i != j) : mij \/ mji

Forall i,j,k : mij /\ mjk => mik

• Can pre-generate these for various ‘n’ and save

• We loaded SatZoo with these constraints and checkpointed its runnable image for various ‘n’

• Incremental SAT solvers can really help!


Recent Practical Test Run Suggests Bounded Cycle Checking:

requireStrictTotalOrder ops order
requireOtherOrder ops order
requireReadValue ops order

Constraints on the matrix entries mij:

Forall i : ~mii

Forall i,j (i != j) : mij \/ mji

Forall i,j,k IN BOUNDED RANGES : mij /\ mjk => mik

• Process OtherOrder First

• Incremental Constraint Gen for ReadValue


Latest results

• Intel-provided test of 124 instructions

• Generated about 246 tuples

• Will not finish unless we rework transitivity

• Generated SAT instance with transitivity suppressed, and it found the violation (fluke)

• 115,637 variables and 164,848 clauses

• Zcore found UNSAT core of 9 clauses !!

• Better methods to handle transitivity to be implemented
– Upper triangular matrix alone will do
– Lazy transitivity introduction
  • Generate constraints w/o transitivity
  • If UNSAT, done
  • Else find SAT instance, force transitivity, re-check...
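The lazy scheme in the last three bullets can be sketched with a toy solver. This is one reading of "force transitivity", namely adding only the transitivity clauses that the current model violates. It is not the authors' implementation, and the brute-force `solve` stands in for a real, ideally incremental, SAT solver. All names are hypothetical.

```python
from itertools import product


def solve(num_vars, clauses):
    """Toy SAT solver: return a satisfying tuple of bools, or None if UNSAT."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            return bits
    return None


def lazy_transitivity(n, base_clauses):
    """Solve without transitivity, then add only the violated instances and retry."""
    var = lambda i, j: i * n + j + 1
    clauses = list(base_clauses)
    rounds = 0
    while True:
        model = solve(n * n, clauses)
        if model is None:
            return None, rounds          # UNSAT even without full transitivity
        violated = [
            [-var(i, j), -var(j, k), var(i, k)]
            for i in range(n) for j in range(n) for k in range(n)
            if model[var(i, j) - 1] and model[var(j, k) - 1]
            and not model[var(i, k) - 1]
        ]
        if not violated:
            return model, rounds         # model already respects transitivity
        clauses.extend(violated)
        rounds += 1


n = 3
var = lambda i, j: i * n + j + 1
base = [[-var(i, i)] for i in range(n)]                                # irreflexive
base += [[var(i, j), var(j, i)] for i in range(n) for j in range(i + 1, n)]  # total

legal_model, _ = lazy_transitivity(n, base + [[var(0, 1)], [var(1, 2)]])
cyclic_model, cyclic_rounds = lazy_transitivity(
    n, base + [[var(0, 1)], [var(1, 2)], [var(2, 0)]])
print(legal_model is not None, cyclic_model)
```

Each round adds at least one clause the previous model falsified, and there are only n^3 transitivity clauses, so the loop terminates.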


Gist of results

1. The n × n method is superior despite using more bits

2. The checkpointing method does pay off up to 64 tuples...

3. The present approach won’t allow more than 512 tuples
   #clauses = 7 * n^3 + ...
   #variables = 2 * n^3 + ...
   where n is the number of tuples

4. Several solutions to be considered:
• Generate upper-triangular alone
• Generate w/o transitivity; lazily introduce it
• Look for natural limits due to CPU resources
• Exploit barriers in code
• Heuristically enumerate cycles of increasing sizes
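To get a feel for the 512-tuple limit in item 3, the leading terms of the two formulas can be evaluated directly. The sketch ignores the lower-order terms that the slide elides, so the figures are rough lower bounds rather than exact instance sizes. The function name is hypothetical.

```python
def leading_terms(n):
    """Leading cubic terms of the slide's size formulas for n tuples."""
    return 7 * n ** 3, 2 * n ** 3   # (clauses, variables), lower-order terms omitted


for n in (64, 128, 512):
    clauses, variables = leading_terms(n)
    print(f"n={n}: about {clauses:,} clauses and {variables:,} variables")
```

At n = 512 the leading terms alone come to roughly 940 million clauses and 268 million variables.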


Table of Results (details in paper)

SAT-instance generation time for n log n method

Tuples   Total Order   Other Order
32       0.2           1.6
64       1.2           17.1
128      5.7           179.0

SAT-instance generation time for n × n method

Tuples   Total Order   Other Order
32       0.5           0.1
64       4.3           0.9
128      34.2          9.0

SAT-checking times

         n log n                           n × n
Tuples   Monolith   TotalOrd   OtherOrd    Monolith   TotalOrd   OtherOrd
32       9.6        0.6        4.3         0.33       0.69       0.05
64       247.17     29.53      37.6        2.73       6.17       0.5
128      abort      1341       abort       164.8      145.6      351.1


Concluding Remarks

• Constraint-based shared memory consistency specification is advantageous in many ways

• Preliminary work was done using Constraint-Prolog

• Recent work being done using SAT

• Need to focus on methods that can combine static constraint solving (in program code) and explicit constraint solving

• Consider symbolic litmus-test verification as the driving problem
– Application: MP code optimization (synchronization removal)

• Non-standard interpretation methods might be able to combine static constraint evaluation and explicit constraint evaluation