Towards a language design for modular software verification

Aleks Nanevski, Microsoft Research, Cambridge

Joint with Greg Morrisett (Harvard), Lars Birkedal (ITU Copenhagen), Amal Ahmed (TTI-Chicago)

Workshop on Effects and Type Theory, Tallinn, December 13, 2007

How to design a programming language from scratch with verification in mind?

• Simple types have been very successful in preventing a class of programming errors.

• But many errors are outside of their reach: index-out-of-bounds, division-by-zero, invariants on mutable state, or almost anything involving effects.

• Can a language enforce these deeper properties, while supporting the usual features from programming practice and remaining conservative over simply-typed languages?

Two foundational approaches to program specification and verification

• Hoare Logic starts with an existing language, usually imperative, untyped, and first-order. Recent extensions cover simply-typed functional languages [Honda’05], [Krishnaswami’06], [Birkedal’05].

• Dependent type theory targets the pure higher-order lambda calculus. Types may capture deep semantic properties of data, e.g., that an integer is even or that a list has 5 elements.

• I want to argue that we essentially want a combination of both.

What limitations of simple types to address?

• Simple types cannot specify effects.

• Operations such as division and array indexing are naturally partial, but here they must be “completed”: perform a run-time check and possibly raise an exception.

• Simple types do not capture this partiality.
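A minimal Python sketch of such a “completed” operation. The names `DivByZero` and `div` are assumptions chosen to echo the talk's running example. The run-time check makes the function total as far as its simple type is concerned, yet nothing in the signature records the partiality.

```python
class DivByZero(Exception):
    """Hypothetical exception raised when the divisor is zero."""


def div(x: int, y: int) -> int:
    # The simple type int -> int -> int says nothing about y == 0,
    # so the operation is "completed" with a run-time check.
    if y == 0:
        raise DivByZero("division by zero")
    return x // y


print(div(7, 2))
```

Every caller pays for the check, including callers that already know the divisor is non-zero.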

How to specify effect behavior?

• Type-and-effect systems: refine the type with the effect annotation.

Semantic disconnect in type-and-effect systems

• The following term would be labeled as throwing DivByZero in most type-and-effect systems.

• Also, execution of div x n will repeat the check for n>0, even if it doesn’t need to.

• Also, how would one specify dynamically generated exceptions? This immediately requires dependent types.

How to reconnect type-and-effects with semantics?

• Idea: draw effect annotations from logic.

• y > 0 is a precondition that must be proved before running div x y. We will also require postconditions, as in Hoare Logic, and proofs.

• Important: Pre/post-conditions become embedded in types.
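The idea of a precondition carried by the type can be approximated in Python, under loud assumptions: `Positive` and `prove_positive` are hypothetical names, and the "proof" is a run-time check done once at the boundary rather than a static proof as in HTT. Python also cannot stop a caller from building `Positive(-1)` directly, so the discipline holds only by convention here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Positive:
    """Hypothetical evidence that value > 0; meant to be built via prove_positive."""
    value: int


def prove_positive(n: int) -> Positive:
    # The precondition is discharged once, at this boundary.
    if n <= 0:
        raise ValueError(f"cannot establish {n} > 0")
    return Positive(n)


def div(x: int, y: Positive) -> int:
    # No check here: the argument type stands for the precondition y > 0.
    return x // y.value


evidence = prove_positive(4)
print(div(20, evidence), div(9, evidence))
```

The evidence is established once and reused for both calls, which is the trade-off between proving and run-time checking in miniature.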

Why embed specifications into types?

• Captures partiality: e.g., there is no need to define div x y in case y ≤ 0. Hence, strictly more expressive than Hoare Logic.

• Enables trade-offs between proving and efficiency, i.e., we can immediately define:

• Uniform abstraction over terms, types, and specs: essential for information hiding and scalability, and for higher-order and local state.

Which logic to use for specifications?

• It should be able to support all kinds of programming features: practical data structures (e.g., hash tables); higher-order functions and polymorphism; pointers, aliasing, and state ownership; recursion, callcc, IO, and concurrency.

• Thus, the logic had better be very expressive.

• Type theory (like Coq) seems perfect.

• But need to reconcile it with effects.

Hoare Type Theory (HTT)

• Introduce a type corresponding to specs in Hoare Logic (for partial correctness).

• The Hoare type ST{P}x:A{Q} stands for stateful programs with precondition P, postcondition Q, and result type A.

• The simply-typed fragment is (almost) core Haskell.

Hoare Type Theory (cont’d)

• Fruitful combination of some fundamental PL ideas: Dijkstra’s predicate transformers, the Curry-Howard isomorphism, monads (as in Haskell), and the Separation Logic of Reynolds, O’Hearn, et al.

• Provably compositional: components can be specified and checked in isolation.

• Prototype under construction as extension of Coq. Execution by code extraction.

Dependent types and effects

Type theories are unsound if effects are added naively

• Propositions like (10 < 0) are types.

• Effectful programs can often be given any type. The culprits are the “awkward squad” from Haskell: divergence via infinite recursion, exceptions, mutable state, IO, concurrency.

• An effectful program can prove that (10 < 0)! Hence, the system is inconsistent.

A solution: Monads

• Like in Haskell, distinguish purity with types. The pure fragment is the underlying type theory.

• e : nat means that e is an integer value.

• e : ST nat means that e is a delayed effectful computation. When executed, it may change the state and diverge. But since it is delayed, it is actually considered pure, and hence can safely appear in types, predicates, and proofs.

• e : ST (10 < 0) is a computation which must diverge when executed.
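The distinction between building a computation and executing it can be sketched in Python, assuming a heap modeled as a dictionary and a delayed computation modeled as a function from heaps to (result, heap) pairs. The names `write_then_read` and `loop_forever` are hypothetical.

```python
from typing import Callable, Dict, Tuple

Heap = Dict[int, object]                      # locations are naturals
ST = Callable[[Heap], Tuple[object, Heap]]    # delayed computation


def write_then_read(x: int, v: object) -> ST:
    # Building the computation touches no heap; it only returns a function.
    def run(h: Heap) -> Tuple[object, Heap]:
        h2 = dict(h)
        h2[x] = v
        return h2[x], h2
    return run


def loop_forever() -> ST:
    # Models e : ST (10 < 0): it can be built, but running it never returns.
    def run(h: Heap) -> Tuple[object, Heap]:
        while True:
            pass
    return run


c = write_then_read(0, 42)   # pure: nothing has happened yet
d = loop_forever()           # also pure: constructing it terminates
result, final = c({})        # effects occur only on execution
print(result, final)
```

Constructing `d` is harmless; only executing it would diverge, which is why such a value can be mentioned safely inside types and proofs.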

Refine the monad with pre/post-condition to capture effectful behavior and partiality

• Hoare type is a dependent (or indexed) monad.

• Formation rule: ST{P}x:A{Q} : Type if P : heap → Prop, A : Type, and x:A ⊢ Q : heap → heap → Prop, where heap = loc → option (Σa:Type. a) and loc = nat.

Example: specify function that increments location contents and returns old value

• Note: the postcondition is a binary relation on heaps, a variant of VDM notation.

• Here ptsto x v h is true if x points to v:A in h.

• Note: before running inc x, one must prove that x stores a nat, because x may store a value of some other type, or x may be a dangling pointer.
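A minimal executable model of such a specification, assuming heaps are Python dictionaries. The predicate names `inc_pre` and `inc_post` are hypothetical, and the exact shape of the postcondition is an assumption consistent with "increments the contents and returns the old value".

```python
from typing import Dict

Heap = Dict[int, object]  # loc = nat, modeled as non-negative ints


def ptsto(x: int, v: object, h: Heap) -> bool:
    # True if location x is allocated in h and stores v.
    return x in h and h[x] == v


def is_nat(v: object) -> bool:
    return isinstance(v, int) and not isinstance(v, bool) and v >= 0


def inc_pre(x: int, i: Heap) -> bool:
    # x must be allocated (not dangling) and must store a nat.
    return x in i and is_nat(i[x])


def inc_post(x: int, r: int, i: Heap, m: Heap) -> bool:
    # Binary postcondition: r is the old contents, m is i with x bumped.
    return ptsto(x, r, i) and m == {**i, x: r + 1}


i = {0: 5, 1: "other"}
print(inc_pre(0, i), inc_pre(1, i), inc_pre(7, i))
print(inc_post(0, 5, i, {0: 6, 1: "other"}))
```

The precondition fails both for location 1 (wrong type) and for location 7 (dangling), matching the two reasons given above.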

Implementation of inc in Haskell-style do-notation.

• The HTT implementation typechecks inc as follows: it computes P, Q = weakest pre / strongest post for the do-block, then emits a proof obligation that the declared spec follows from P and Q (rule of consequence).
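A minimal Python model of a three-step inc written against monadic primitives, assuming the read / write / return structure visible in the computed spec shown later in the talk. The helpers `unit`, `bind`, `read`, and `write` are hypothetical stand-ins for the HTT primitives and carry no specifications here.

```python
def unit(v):
    # Return v without touching the heap.
    return lambda h: (v, h)


def bind(c, k):
    # Run c, feed its result to k, and run the resulting computation.
    def run(h):
        v, h1 = c(h)
        return k(v)(h1)
    return run


def read(x):
    return lambda h: (h[x], h)


def write(x, v):
    def run(h):
        h2 = dict(h)
        h2[x] = v
        return None, h2
    return run


def inc(x):
    # do xv <- read x; write x (xv + 1); return xv
    return bind(read(x), lambda xv:
           bind(write(x, xv + 1), lambda _:
           unit(xv)))


print(inc(0)({0: 5}))
```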

Typing of primitive commands designed to compute weakest pre and strongest post

• Memory read

• (Strong) Memory update

Typing of primitive commands designed to compute weakest pre and strongest post

• Memory allocation

• Memory deallocation
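The four primitive commands can be modeled in Python with their preconditions checked at run time. This is only an approximation: HTT establishes the preconditions statically. The read and write conditions follow the `ptsto`-based predicates in the Coq output shown later in the talk. The allocation strategy and the `PreconditionError` name are assumptions.

```python
class PreconditionError(Exception):
    """Hypothetical error for a violated precondition."""


def read(x, h):
    # Pre: x is allocated. Post: heap unchanged, result is the contents.
    if x not in h:
        raise PreconditionError(f"read: location {x} is not allocated")
    return h[x], h


def write(x, v, h):
    # Pre: x is allocated, at any type. Strong update: the type may change.
    if x not in h:
        raise PreconditionError(f"write: location {x} is not allocated")
    return None, {**h, x: v}


def alloc(v, h):
    # Post: the returned location is fresh and stores v.
    r = max(h, default=-1) + 1
    return r, {**h, r: v}


def dealloc(x, h):
    # Pre: x is allocated. Post: x is removed from the heap.
    if x not in h:
        raise PreconditionError(f"dealloc: location {x} is not allocated")
    return None, {l: v for l, v in h.items() if l != x}


r, h1 = alloc(10, {})
_, h2 = write(r, "ten", h1)   # strong update from int to str
v, h3 = read(r, h2)
_, h4 = dealloc(r, h3)
print(r, v, h4)
```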

Fixpoints are a little bit different…

• Pre/posts must be given explicitly (for now)

• Corresponds to giving loop invariants in Hoare Logic

• But should be possible to write a rule that infers the strongest invariant! Future work.

Monadic primitives (unit)

• Roughly, corresponds to Hoare Logic rule of variable assignment.

Monadic primitives (bind)

• Rule of sequential composition (but higher-order)

• Note: quantification over pre/posts and heaps is essential for obtaining the tightest specs.
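The role of the quantifiers can be illustrated with a small Python model of the pre- and postcondition of a bind, assuming the shapes that appear in the computed spec for inc: the precondition quantifies universally over intermediate results and heaps, the postcondition existentially. The finite universes `VALUES` and `HEAPS` are an assumption that makes the quantifiers executable; in HTT they range over the full types.

```python
# Finite stand-ins for "all results" and "all heaps" (assumption of this sketch).
VALUES = range(5)
HEAPS = [{0: n} for n in range(6)]


def bind_pre(p1, q1, p2, i):
    # p1 holds now, and p2 holds after every outcome (x, m) allowed by q1.
    return p1(i) and all(
        p2(x, m) for x in VALUES for m in HEAPS if q1(x, i, m))


def bind_post(p1, q1, p2, q2, y, i, m):
    # Some intermediate result x and heap h connect q1 to q2.
    # p1 and p2 are accepted only to mirror the argument order used in the talk.
    return any(
        q1(x, i, h) and q2(x, y, h, m) for x in VALUES for h in HEAPS)


# First command: read location 0.
read_pre = lambda i: 0 in i
read_post = lambda v, i, m: m == i and i.get(0) == v

# Continuation: write v + 1 to location 0 and return None.
write_pre = lambda v, h: 0 in h
write_post = lambda v, y, h, m: y is None and m == {**h, 0: v + 1}

print(bind_pre(read_pre, read_post, write_pre, {0: 2}))
print(bind_post(read_pre, read_post, write_pre, write_post, None, {0: 2}, {0: 3}))
print(bind_post(read_pre, read_post, write_pre, write_post, None, {0: 2}, {0: 4}))
```

The existential in `bind_post` is what ties the final heap to the value that was actually read, so the composed spec accepts {0: 3} as the outcome and rejects {0: 4}.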

Monadic primitives (Haskell-style do)

• Rule of consequence

• Interesting fact: “do” is not an ordinary coercion. It is an introduction form for the Hoare type, and bind is the corresponding elimination form.

Example: counter

• Allocate a private location x

• Export function that increments x

• Executing f ← counter; x0 ← f; x1 ← f; x2 ← f will bind 0, 1, 2 to x0, x1, x2, respectively.

• What is the spec for counter?

• Problem: x is out of scope in return type.
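The behavior of the counter, though not its specification, can be sketched with a Python closure. The one-element list standing in for the private location is an assumption of this sketch.

```python
def counter():
    x = [0]  # private location, unreachable from outside the closure

    def f():
        old = x[0]
        x[0] = old + 1
        return old

    return f


f = counter()
x0 = f()
x1 = f()
x2 = f()
print(x0, x1, x2)
```

The scoping problem is visible here too: code outside `counter` can call `f` but has no name for `x`, so a spec for `f` cannot mention `x` directly.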

A specification with nested Hoare types

• Introduce invariant into code to hide how count is kept.

• Another problem: fst(f) 0 h states (x ↦ 0) h, but we lost the connection with i.

• We will need Separation Logic to handle this.

Hide private state by existential abstraction

Proving program correctness in HTT

Weakest pre and strongest post precisely capture the semantics of a program.

• Problem: these may not be easy to read!

• Remember the example 3-line program:

Here is the computed tightest spec for inc, in Coq syntax.

inc : forall x : loc, ST (fun i : heap => (fun i0 : heap => exists v : nat, ptsto x v i0) i /\ (forall (x0 : nat) (m : heap), (fun (y : nat) (i0 m0 : heap) => m0 = i0 /\ ptsto x y i0) x0 i m -> (fun (xv : nat) (i0 : heap) => (fun i1 : heap => exists B : Type, exists w : B, ptsto x w i1) i0 /\ (forall (x1 : unit) (m0 : heap), (fun (_ : unit) (i1 m1 : heap) => m1 = update x (xv + 1) i1) x1 i0 m0 -> (fun (_ : unit) (_ : heap) => True) x1 m0)) x0 m)) (fun (y : nat) (i m : heap) => exists x0 : nat, exists h : heap, (fun (y0 : nat) (i0 m0 : heap) => m0 = i0 /\ ptsto x y0 i0) x0 i h /\ (fun (xv y0 : nat) (i0 m0 : heap) => exists x1 : unit, exists h0 : heap, (fun (_ : unit) (i1 m1 : heap) => m1 = update x (xv + 1) i1) x1 i0 h0 /\ (fun (_ : unit) (r : nat) (i1 f : heap) => r = xv /\ f = i1) x1 y0 h0 m0) x0 y h m)

Luckily, the spec has a lot of structure!

• It literally represents the program as a predicate.

• We apply the proving strategy from Hoare Logic: symbolically evaluate the program, one step at a time. At each step, discharge the verification condition that enables the next evaluation step.

• With a twist: evaluation/VC-generation can be implemented as a set of lemmas. Proving the lemmas verifies the VC-gen implementation.

Example lemma for symbolic evaluation (in Coq syntax)

• If the program starts with a read from location x: first prove that x is initialized (ptsto x v i), then proceed to prove the spec of the continuation.

• Other lemmas are similar (evals_bind_write, evals_bind_new, …).

• The applicable lemma can be determined by a tactic.

Lemma evals_bind_read :
  forall (A B : Type) (x : loc) (v : A)
         (p2 : A -> heap -> Prop)
         (q2 : A -> B -> heap -> heap -> Prop)
         (i : heap) (q : B -> heap -> Prop),
    ptsto x v i ->
    (p2 v i /\ forall y m, q2 v y i m -> q y m) ->
    bind_pre (read_pre A x) (read_post A x) p2 i /\
    (forall y m, bind_post (read_pre A x) (read_post A x) p2 q2 y i m -> q y m).

Separation Logic

Large footprints in Hoare Logic

• Let inc:

• Q: What is known after inc runs in a heap with locations x and y?

• A: Only that x ↦ v+1, but all info about y is lost.

• The spec should explicitly say that y is not changed. This is possible to write in ST, but quite inconvenient.

Small footprints and Separation Logic

• Specs should only describe what the program changes [O’Hearn, Reynolds, Pym, …]

• If e : STsep{P}x:A{Q}, then e can run in any heap containing a subheap i such that P i. It either diverges, or returns a subheap m such that Q i m. The part of the initial heap outside i is not accessible.

• Easier to use than large footprints, but more difficult meta theory.
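The small-footprint reading can be modeled in Python by handing a command only the subheap it is entitled to and reattaching the rest afterwards. The names `run_framed`, `inc_x`, and `peek_y` are hypothetical, and the footprint is supplied explicitly as a set of locations rather than derived from a precondition.

```python
class FootprintError(Exception):
    """Hypothetical error for an ill-formed footprint or a clash with the frame."""


def run_framed(cmd, footprint, heap):
    missing = [l for l in footprint if l not in heap]
    if missing:
        raise FootprintError(f"locations {missing} are not allocated")
    # The command sees only the subheap named by its footprint.
    sub = {l: heap[l] for l in footprint}
    frame = {l: v for l, v in heap.items() if l not in footprint}
    result, sub_out = cmd(sub)
    if sub_out.keys() & frame.keys():
        raise FootprintError("result heap overlaps the frame")
    # The frame is reattached unchanged.
    return result, {**frame, **sub_out}


def inc_x(h):
    old = h[0]
    return old, {**h, 0: old + 1}


def peek_y(h):
    # Tries to read location 1, which lies outside the footprint {0}.
    return h[1], h


heap = {0: 5, 1: "y"}
print(run_framed(inc_x, {0}, heap))
```

Running `peek_y` with footprint `{0}` fails with a `KeyError`, which is the model's version of "the part of the initial heap outside i is not accessible". The contents of location 1 survive `inc_x` without the spec of `inc_x` ever mentioning them.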

Separation logic adds two new things:

• Separating conjunction (easily definable in HTT): (P * Q) holds of heap h iff P and Q hold of disjoint parts of h

• Frame rule of inference: If then

• Can we add Frame rule to HTT? How to prove that Frame is sound?
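A minimal Python sketch of separating conjunction over dictionary heaps, obtained by enumerating every way of splitting a heap into two disjoint parts. The names `splits`, `star`, `points_to`, and `emp` are hypothetical, and `points_to` uses the exact, singleton-heap reading common in Separation Logic, which is an assumption here.

```python
from itertools import combinations


def splits(h):
    # Enumerate all ways to divide h into two disjoint subheaps.
    locs = list(h)
    for k in range(len(locs) + 1):
        for left in combinations(locs, k):
            h1 = {l: h[l] for l in left}
            h2 = {l: h[l] for l in h if l not in left}
            yield h1, h2


def star(P, Q):
    # (P * Q) h iff P and Q hold of disjoint parts of h.
    return lambda h: any(P(h1) and Q(h2) for h1, h2 in splits(h))


def points_to(x, v):
    # Holds of exactly the singleton heap mapping x to v.
    return lambda h: h == {x: v}


def emp(h):
    return h == {}


both = star(points_to(0, 5), points_to(1, "a"))
print(both({0: 5, 1: "a"}))
print(star(points_to(0, 5), points_to(0, 5))({0: 5}))
print(star(points_to(0, 5), emp)({0: 5}))
```

The second query fails because a single cell cannot be split between both conjuncts. That failure is what lets a spec written with `*` rule out aliasing without saying so explicitly.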

Employ a type-theoretic idea to expedite…

• Impose that well-typed programs must satisfy Frame!

• Define new monad STsep, over ST:

• Then re-type the stateful commands, using rule of consequence.

Programs remain the same, but specs become much simpler

• Example: allocation. The empty subheap is consumed and replaced by r ↦ v; r must be fresh (as new can’t access existing state).

• Example: deallocation. The subheap x ↦ - is consumed and replaced by empty.

• Analogy with linear logic.

STsep monad correctly handles private state

• Now (fst f) 0 replaces empty from the precondition.

• Meaning: the initial heap is extended with x ↦ 0.

Meta-theoretic properties: soundness, compositionality, equations

Verification in HTT reduces to typechecking

• Theorem: If e : ST{P}r:A{Q}, then e evaluates as expected.

• Proved via Preservation and Progress lemmas, but much more demanding!

• Preservation: evaluation preserves types, normal forms, and postconditions. E.g., if e : ST{T}r:int{r = 55}, then e does produce 55.

• Progress demands soundness of the assertion logic. This requires a denotational model for HTT.

Type checking is syntax directed

• Program properties are independent of context. No need for whole-program reasoning. Proofs go by induction on program structure.

• Program is a proof of its spec: in the pure case, by Curry-Howard; in the impure case, by weakest pre/strongest post.

• Formal statements of compositionality: in the pure case, substitution principles; in the impure case, Hoare’s rule of composition.

Denotational models

• The denotation of e : ST{P}x:A{Q x} is a predicate transformer that: takes p : heap → Prop such that ∀h. p h → P h; returns q : A → heap → Prop such that ∀x h. q x h → ∃i. p i ∧ Q x i h; and is monotone.

• This model suffices for soundness, but it is too large: e.g., it does not support storing monads into heaps, and it requires showing monotonicity before taking fix.

• Better: a realizability model [Petersen, Birkedal’08]. But it is not implemented in Coq, and seems very hard to implement!

Implementation, related work, future work, summary

Summary

• HTT reflects effect information into types via Hoare-style pre/post conditions. It is a generalization of monadic type-and-effect systems, but effect annotations are logical predicates over heaps.

• Types determine in which context a program may be used (in a context satisfying the precondition). This is a uniquely type-theoretic property, generalizing ordinary Hoare Logics.

• Combines usefully with higher-order features of a type theory like Coq, to represent modes of use of state, like: freshness, aliasing, ownership (via Separation Logic); higher-order and shared local state (via existential abstraction).

Related work

• Extended static checking: ESC/Java, JML, Spec#, SPlint, Cyclone, Sage. Hoare-like annotations verified during typechecking. Restrictive strategies for dealing with undecidability.

• Dependent types and effects: [Augustson’98], [Mandelbaum’03], [Zhu,Xi’05], [Shao’05], [Sheard’05], [Westbrook’06], [Taha’07], [Condit’07]. Programs and specs cannot share pure code (phase separation).

• Hoare Logics for higher-order functions: [Schoeder’02], [Honda’05], [Krishnaswami’06], [Birkedal’04]. Simply-typed underlying languages (with effects). Hoare triples do not integrate into types.

HTT in comparison to related work.

[Figure: chart with axes “Spec expressiveness” and “Programming features”, placing: typed lambda calculus; Java, C#, Haskell, O’Caml; dependent type theory (Coq, Epigram, NuPRL…); Hoare specs (ESC, JML, Spec#, Cyclone); light dependent types (Cayenne, DML, ATS, Omega); fully verified software; HTT.]

Future work: gain more experience with implementation in Coq

• A lot of scaffolding for verification is in place: symbolic evaluation lemmas, and tactics for Separation Logic reasoning (these were tricky to nail down at first; several wrong starts).

• Getting ready to attack larger programs. Probably start with libraries for imperative data structures.

• Largest so far: Hash-table module, Stack module, Parsing combinators.

• Experience is encouraging: the proofs/code ratio is quite large, but the proofs were not difficult.

Future work: other effects

• First attempts at formulating a Haskell-style monad for transactional concurrency. Separate state into private and shared. Reasoning like O’Hearn’s concurrent separation logic. The Hoare type is a 4-tuple STM{I}{P}x:A{Q}, where I is the invariant of the shared state.

• Other notions of concurrency? Auxiliary variables, history/prophecy variables? Predicate transformers for concurrency?

• IO monad? Specifications must be limited to statements that are invariant against outside changes to the world.

• Continuation monad? (first attempts made)

Future work: better models and axiomatizations

• Can we encode equality over effectful code as some reasonable judgment, without having to implement involved categorical models?

Hopefully in future not too far, far away…

[Figure: chart with axes “Spec expressiveness” and “Programming features”, placing: typed lambda calculus; Java, C#, Haskell, O’Caml; dependent type theory (Coq, Epigram, NuPRL…); Hoare specs (ESC, JML, Spec#, Cyclone); light dependent types (Cayenne, DML, ATS, Omega); fully verified software; HTT.]
