Type inference in type-based verification
Dimitrios Vytiniotis, Microsoft Research
May 2010
Software is hard to get right*
Which tools can help programmers write reliable code?
How to make these tools more practical and effective to use?
Making programming language types more practical and effective: this talk
* Toyota recalls 2010 models due to faulty software in the brakes. Upgrade your Prius!
Why invest in types?
[Figure: programming language types compared with model-driven development, development with proof assistants, verification condition generation and constraint solving, and model checking, plotted by complexity against # of bugs]
Other benefits:
1. Integrated verification and development
2. Early error detection
3. Static checks mean fast runtime code
4. They force programmers to think about documentation
5. Modular development
6. They scale
Programming language types: a demonstrably simple technology that can eliminate lots of bugs (this talk)
A brief (hi)story of type expressivity
[Timeline figure, 1970 to 2015, divided into the context, my work on expressive types, and the future: simple types (1970), e.g. inc :: Int -> Int; Hindley-Milner (ML, Haskell, F#), e.g. map :: (a -> b) -> [a] -> [b]; GADTs; first-class polymorphism; type classes; type families; OutsideIn(X); dependent types; … (2015)]
My work: ICFP 2006, ICFP 2009; ICFP 2006, JFP 2007; ICFP 2008, ML 2009; TLDI 2010; NEW: JFP submission
A brief (hi)story of type expressivity
Keeping types practical
Types express properties
[1,2,3,4] :: { l :: List Int where forall i < length(l), l[i]<=4 }
[1,2,3,4] :: ListWithLength 4 Int
[1,2,3,4] :: List NONEMPTY Int
[1,2,3,4] :: List Int
[1,2,3,4] :: IntList
[1,2,3,4] :: Object
Our goal: increase expressivity (to catch more bugs) … but keep the complexity low
Hindley-Milner [Hindley, Damas & Milner]: Haskell, ML, F#, also Java, C#, …
Keeping type annotation cost low
How do we convince the type checker that programs are well-typed?
StringBuilder sb = new StringBuilder(256);
var sb = new StringBuilder(256);
Full type inference: no user annotations at all
Full type checking: explicit types everywhere
Hindley-Milner: inc x = x+1
Many traditional languages: Int inc(Int x) = x+1
Increased expressivity requires more checking
Full type inference is extremely convenient [no type-induced pain]
    -- full type inference
    map f list = case list of
      nil -> nil
      h:tail -> cons (f h) (map f tail)
    -- full type checking
    map<S,T> (f :: S -> T) (list :: [S]) = case list of
      nil -> nil<T>
      h:tail -> cons<T> (f h) (map<S,T> f tail)
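For comparison, a runnable Haskell sketch of the two styles: mapI relies entirely on inference, while mapC writes out every type in the spirit of the explicit pseudocode, using GHC's ScopedTypeVariables and TypeApplications extensions. The names mapI and mapC are hypothetical, chosen to avoid clashing with Prelude's map.

```haskell
{-# LANGUAGE ScopedTypeVariables, TypeApplications #-}

-- Full type inference: no annotations at all. GHC infers
-- mapI :: (t -> a) -> [t] -> [a] (it may warn about the missing signature).
mapI f list = case list of
  [] -> []
  h : rest -> f h : mapI f rest

-- "Full type checking" style: every type is written out by hand.
mapC :: forall s t. (s -> t) -> [s] -> [t]
mapC (f :: s -> t) (list :: [s]) = case list of
  [] -> [] :: [t]
  h : rest -> (f h :: t) : mapC @s @t f rest
```

Both behave identically; the second simply makes the programmer do the work that inference does for the first.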
Keeping types predictable
With simple, robust, declarative typing rules
    test1 = … p1 + p2 …   -- ACCEPTED
    test2 = … p2 + p1 …   -- REJECTED
And theorems that connect typing rules to low-level algorithms
    test1 = p                      -- ACCEPTED
    test2 = let f x = x in f p     -- REJECTED
Algorithm for an application e u:
    t <- infer e
    s <- infer u
    α <- fresh
    solve(t = s -> α)
    return α
Typing rule:
    e :: s -> t    u :: s
    ---------------------
          e u :: t
Hindley-Milner scores perfect here
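A minimal sketch of the application case of this algorithm, assuming a toy lambda calculus of our own (the names Type, Expr, infer, unify and typeOf are illustrative, not GHC's). Without let there is no generalization, so this is plain unification-based inference rather than full Hindley-Milner.

```haskell
-- Types of a toy language: unification variables, Int, and functions.
data Type = TVar Int | TInt | TArr Type Type deriving (Eq, Show)

-- Expressions: no let, so this sketch never generalizes.
data Expr = Var String | Lit Int | Lam String Expr | App Expr Expr

-- A substitution maps variable numbers to types (applied recursively).
type Subst = [(Int, Type)]

apply :: Subst -> Type -> Type
apply s t@(TVar v) = maybe t (apply s) (lookup v s)
apply _ TInt = TInt
apply s (TArr a b) = TArr (apply s a) (apply s b)

occurs :: Int -> Type -> Bool
occurs v (TVar w) = v == w
occurs _ TInt = False
occurs v (TArr a b) = occurs v a || occurs v b

-- solve(t1 = t2): first-order unification extending the substitution s.
unify :: Type -> Type -> Subst -> Either String Subst
unify t1 t2 s = go (apply s t1) (apply s t2)
  where
    go (TVar a) (TVar b) | a == b = Right s
    go (TVar a) t
      | occurs a t = Left "occurs check failed"
      | otherwise = Right ((a, t) : s)
    go t (TVar a) = go (TVar a) t
    go TInt TInt = Right s
    go (TArr a b) (TArr c d) = unify a c s >>= unify b d
    go x y = Left ("cannot unify " ++ show x ++ " with " ++ show y)

type Env = [(String, Type)]

-- State threaded by hand: current substitution and next fresh variable.
type St = (Subst, Int)

infer :: Env -> Expr -> St -> Either String (Type, St)
infer env (Var x) st =
  maybe (Left ("unbound variable " ++ x)) (\t -> Right (t, st)) (lookup x env)
infer _ (Lit _) st = Right (TInt, st)
infer env (Lam x body) (s, n) = do
  (tb, st') <- infer ((x, TVar n) : env) body (s, n + 1)
  Right (TArr (TVar n) tb, st')
infer env (App e u) st0 = do
  (t, st1) <- infer env e st0          -- t <- infer e
  (sTy, (s2, n2)) <- infer env u st1   -- s <- infer u
  let alpha = TVar n2                  -- alpha <- fresh
  s3 <- unify t (TArr sTy alpha) s2    -- solve(t = s -> alpha)
  Right (alpha, (s3, n2 + 1))          -- return alpha

-- Run inference from an empty substitution and apply the result.
typeOf :: Env -> Expr -> Either String Type
typeOf env e = do
  (t, (s, _)) <- infer env e ([], 0)
  Right (apply s t)
```

For example, typeOf [("inc", TArr TInt TInt)] (App (Var "inc") (Lit 1)) yields Right TInt, while App (Lit 1) (Lit 2) is rejected because Int cannot be unified with a function type.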
A brief (hi)story of type expressivity
Hindley-Milner: simple, predictable, no user annotations, low expressivity
How to implement GADTs:
1. What are GADTs?
2. Why they are difficult for type inference
3. Inference vs. checking [ICFP 2006]
4. Simplifying and reducing annotations [ICFP 2009]
GADTs in the Glasgow Haskell Compiler (GHC)
    -- An algebraic datatype: integer lists
    data IList where
      Nil  :: IList
      Cons :: Int -> IList -> IList
    -- A generalized algebraic datatype (GADT)
    data IList f where
      Nil  :: IList EMPTY
      Cons :: Int -> IList f -> IList NONEMPTY
    x = Cons 1 (Cons 2 Nil)     -- the type checker knows x :: IList NONEMPTY
    head :: IList NONEMPTY -> Int
    test0 = head x              -- ACCEPTED
    test1 = head Nil            -- REJECTED!
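The slide's GADT uses the index types EMPTY and NONEMPTY without declaring them. A minimal sketch that compiles with a recent GHC, assuming we promote them from an ordinary datatype with DataKinds; the kind name Shape and the function name safeHead (chosen to avoid Prelude's head) are our own.

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

-- EMPTY and NONEMPTY, promoted to the type level with DataKinds.
data Shape = EMPTY | NONEMPTY

data IList (f :: Shape) where
  Nil  :: IList 'EMPTY
  Cons :: Int -> IList f -> IList 'NONEMPTY

-- Only non-empty lists are accepted, so no Nil case is needed.
safeHead :: IList 'NONEMPTY -> Int
safeHead (Cons h _) = h

x :: IList 'NONEMPTY
x = Cons 1 (Cons 2 Nil)

test0 :: Int
test0 = safeHead x
-- test1 = safeHead Nil   -- rejected: IList 'EMPTY is not IList 'NONEMPTY
```

Uncommenting test1 produces a type error, since Nil has type IList 'EMPTY.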
Uses of GADTs
The compiler enforces invariants via type checking:
    tail :: ListWithLength (S n) -> ListWithLength n
    compile :: Term SOURCE -> Maybe (Term TARGET)
A significant number of research papers [Cheney & Hinze, Xi, Pottier & Simonet, Pottier & Régis-Gianas, Sulzmann & Stuckey, …]
Verified compiler transformations, data structure implementations, reflection & generic programming, …
Such a cool feature that people are using GADT-inspired tricks in other languages! For example, C. Russo and A. Kennedy have a C# encoding
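To illustrate the first invariant, here is a sketch of a length-indexed list in the same style. ListWithLength follows the slide, while Nat, LNil, LCons, safeTail and toList are hypothetical names we introduce, and we fix the element type to Int to match the IList example.

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures #-}

data Nat = Z | S Nat

-- Lists of Ints indexed by their length.
data ListWithLength (n :: Nat) where
  LNil  :: ListWithLength 'Z
  LCons :: Int -> ListWithLength n -> ListWithLength ('S n)

-- Total: the index ('S n) rules out the empty list.
safeTail :: ListWithLength ('S n) -> ListWithLength n
safeTail (LCons _ rest) = rest

toList :: ListWithLength n -> [Int]
toList LNil = []
toList (LCons h t) = h : toList t
```

Calling safeTail on LNil is a compile-time error rather than a run-time crash.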
Example: evaluation of an embedded DSL
A non-GADT representation:
    data Term where
      ILit   :: Int -> Term
      And    :: Term -> Term -> Term
      IsZero :: Term -> Term
      ...
    data Val where
      IVal :: Int -> Val
      BVal :: Bool -> Val
    eval :: Term -> Val
    eval (ILit i) = IVal i
    eval (And t1 t2) = case eval t1 of
      IVal _  -> error "And: expected a boolean"
      BVal b1 -> case eval t2 of
        IVal _  -> error "And: expected a boolean"
        BVal b2 -> BVal (b1 && b2)
    ...
    f = eval (And (ILit 3) (IsZero (ILit 0)))   -- accepted, but fails at run time
A GADT representation:
    data Term a where
      ILit   :: Int -> Term Int
      And    :: Term Bool -> Term Bool -> Term Bool
      IsZero :: Term Int -> Term Bool
      ...
    eval :: Term a -> a
    eval (ILit i) = i
    eval (And t1 t2) = eval t1 && eval t2
    ...
Represents only correct terms; tagless evaluation gives efficient code.
A common example, also appearing in [Peyton Jones, Vytiniotis, Weirich, Washburn, ICFP 2006]
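Both representations can be run side by side. This sketch assumes only the three constructors shown (ILit, And, IsZero); the untyped version is renamed UTerm/uEval to avoid clashing with the GADT, and its error messages are our own.

```haskell
{-# LANGUAGE GADTs #-}

-- Untyped (tagged) representation: ill-formed terms are caught only at run time.
data UTerm = UILit Int | UAnd UTerm UTerm | UIsZero UTerm

data Val = IVal Int | BVal Bool deriving (Eq, Show)

uEval :: UTerm -> Val
uEval (UILit i) = IVal i
uEval (UIsZero t) = case uEval t of
  IVal n -> BVal (n == 0)
  BVal _ -> error "IsZero: expected an integer"
uEval (UAnd t1 t2) = case (uEval t1, uEval t2) of
  (BVal b1, BVal b2) -> BVal (b1 && b2)
  _ -> error "And: expected booleans"

-- Typed (tagless) representation: the index a rules out ill-formed terms.
data Term a where
  ILit   :: Int -> Term Int
  And    :: Term Bool -> Term Bool -> Term Bool
  IsZero :: Term Int -> Term Bool

-- No Val wrapper and no run-time tag checks are needed.
eval :: Term a -> a
eval (ILit i) = i
eval (And t1 t2) = eval t1 && eval t2
eval (IsZero t) = eval t == 0
```

uEval (UAnd (UILit 3) (UIsZero (UILit 0))) compiles and fails only when evaluated, while the corresponding eval (And (ILit 3) (IsZero (ILit 0))) is rejected by the type checker because ILit 3 :: Term Int.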
Type checking and GADTs
Pattern matching introduces type equalities, available after the =. In the first branch we learn that a ~ Int.
    data Term a where
      ILit :: Int -> Term Int
    eval :: Term a -> a   -- Term a determines the term we analyze; a determines the result
    eval (ILit i) = i     -- i :: Int
    eval _ = …
Right-hand side: we must return type a. That's fine, because we know (a ~ Int) from pattern matching.
Possible with the help of programmer annotations
Type inference and GADTs
    data Term a where
      ILit :: Int -> Term Int
      ...
    -- Get a list of literals in this term
    getILit (ILit i) = [i]
    getILit _ = []
Haskell programmers omit type signatures.
Here is a possible type of getILit: Term a -> [Int]
But if (a ~ Int) is used, then there is also another one: Term a -> [a]
Neither type is more general than the other: BAD!
A threat to modularity
Two different “specifications” for getILit:
    btrm :: Term Bool
    f1 = (getILit btrm) ++ [0]      -- works only with: Term a -> [Int]
    f2 = (getILit btrm) ++ [True]   -- works only with: Term a -> [a]
    test = let getILit (ILit i) = [i]
               getILit _ = []
           in ...                   -- and this one?
We want a unique principal type that we infer once and use throughout the scope of the function.
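With a type signature, each “specification” of getILit is accepted by GHC, and each supports only one of f1 and f2. In this sketch the two versions get the hypothetical names getILitInt and getILitA, and we assume an IsZero constructor (from the evaluator example) so that btrm has a concrete value.

```haskell
{-# LANGUAGE GADTs #-}

data Term a where
  ILit   :: Int -> Term Int
  IsZero :: Term Int -> Term Bool   -- assumed extra constructor

-- Specification 1: do not use the GADT equality; literals are Ints.
getILitInt :: Term a -> [Int]
getILitInt (ILit i) = [i]
getILitInt _ = []

-- Specification 2: use a ~ Int, learnt from the ILit pattern, to return [a].
getILitA :: Term a -> [a]
getILitA (ILit i) = [i]
getILitA _ = []

btrm :: Term Bool
btrm = IsZero (ILit 0)

f1 :: [Int]
f1 = getILitInt btrm ++ [0]    -- would not typecheck with getILitA

f2 :: [Bool]
f2 = getILitA btrm ++ [True]   -- would not typecheck with getILitInt
```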
Separating checking and inference [ICFP 2006]
S. Peyton Jones, D. Vytiniotis, G. Washburn, S. Weirich
Not all programs have principal types, so use annotations to let programmers decide
No annotation: do not use GADT equalities
    getILit (ILit i) = [i]     -- inferred: Term a -> [Int]
To use the other type, supply an annotation:
    getILit :: Term a -> [a]
    getILit (ILit i) = [i]
Annotations determine two interwoven modes of operation: checking mode and inference mode
Discovering a complete implementation
Predictability mandates high-level declarative typing rules.
We needed to design a type system and a sound and complete algorithm for it.
That turned out to be possible because:
1. Typing rules [and algorithm] can “switch” mode when they meet annotations
2. The GADT checking problem is easy
3. All non-GADT branches are typed as in Hindley-Milner
Theorem: There exists a provably decidable, sound and complete algorithm for the [ICFP 2006] type system
The first work on type inference and GADTs to achieve this
GHC has implemented this since 2006. Extremely effective and popular: http://darcs.net, commercial users, …
[ICFP 2006] was a breakthrough, but …
To reduce the required annotations it used some ad-hoc annotation propagation
    opt :: Term b -> Term b
    eval :: Term a -> a
    eval x = case opt x of ILit i -> i                        -- typechecks
    eval :: Term a -> a
    eval x = let f x = x in case f (opt x) of ILit i -> i     -- fails, because there is no type annotation for f
Quite remarkable, BUT what about predictability?
How to improve this?
The Outside-In solution
Schrijvers, Sulzmann, Peyton Jones, Vytiniotis [ICFP 2009]
Perform full inference outside a GADT branch first, and then use what you learnt to go inside the branch
Very aggressive type information discovery + a simpler “Outside-In” type system
    eval :: Term a -> a
    eval x = let f x = x in case f (opt x) of ILit i -> i
Working on the outside of the branch first determines that f (opt x) :: Term a
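The eval example can be checked with a modern GHC, whose type checker is based on OutsideIn(X). This sketch assumes opt is simply the identity, a stand-in for a real optimizer, keeps only the ILit constructor, and renames the local binder to y to avoid shadowing x.

```haskell
{-# LANGUAGE GADTs #-}

data Term a where
  ILit :: Int -> Term Int

-- Hypothetical optimizer: the identity, standing in for opt :: Term b -> Term b.
opt :: Term b -> Term b
opt t = t

-- The local f has no annotation; the scrutinee type Term a is fixed
-- outside the branch before the ILit branch (where a ~ Int) is checked.
eval :: Term a -> a
eval x = let f y = y in case f (opt x) of ILit i -> i
```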
Simplifying and reducing annotations [ICFP 2009]
Fewer annotations needed; predictability
Forthcoming implementation in GHC; invited paper in a special issue of JFP: “the system of this paper is the simplest proposal ever made to solve type inference for GADTs” [anonymous reviewer]
Theorem: There exists a provably decidable, sound and complete algorithm for the “Outside-In” type system in [ICFP 2009]
[Figure: nested sets: all type-safe programs ⊇ all programs with principal types ⊇ programs typed by the “Outside-In” type system (Theorem), which gives modularity]
Inferring principal types in [ICFP 2009]
    data Term a where
      ILit :: Int -> Term Int
      If   :: Term Bool -> Term a -> Term a -> Term a
    -- Get the least number in this term
    findLeast (ILit i) = i
    findLeast (If cond t1 t2) = let x1 = findLeast t1
                                    x2 = findLeast t2
                                in if (x1 < x2) then x1 else x2
Because of (x1 < x2), findLeast must return Int. There is a principal type [and ICFP 2009 finds it]:
    Term a -> Int
Not due to arbitrarily choosing Term a -> Int, as previously
REJECTED in [ICFP 2009]: no ad-hoc assumptions about programmer intentions
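A runnable version of findLeast, with its principal type written as a signature. Haskell's (<) is overloaded (it only needs Ord), so unlike the slide's setting the comparison alone would not force the result to be Int; we therefore state Term a -> Int explicitly. The IsZero constructor and its findLeast case are our additions, so that conditions can be built.

```haskell
{-# LANGUAGE GADTs #-}

data Term a where
  ILit   :: Int -> Term Int
  If     :: Term Bool -> Term a -> Term a -> Term a
  IsZero :: Term Int -> Term Bool   -- assumed, so that conditions can be built

-- Get the least number in this term; the condition of an If is ignored.
findLeast :: Term a -> Int
findLeast (ILit i) = i
findLeast (If _cond t1 t2) =
  let x1 = findLeast t1
      x2 = findLeast t2
  in if x1 < x2 then x1 else x2
findLeast (IsZero t) = findLeast t  -- assumed: the least number inside the test
```

Replacing the signature with Term a -> a is rejected; among other things, x1 < x2 would then need an Ord a instance that is not available.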
The algorithm in [ICFP 2009]
    findLeast (ILit i) = i
    findLeast (If cond t1 t2) = let x1 = findLeast t1
                                    x2 = findLeast t2
                                in if (x1 < x2) then x1 else x2
The type checker infers a partially known type:
    findLeast :: Term α -> β
GADT branches introduce implication constraints that we must solve:
    (α ~ Int) => (β ~ Int)
Implication constraints may have many solutions, β := Int or β := α, which result in different types. Solving them relates to constraint abduction [Maher] and (rigid) E-unification [Degtyarev & Voronkov, Veanes, Gallier & Snyder, Gurevich].
Detecting incomparable solutions is only possible in special cases. There are mostly negative results about the complexity, or even the decidability, of the general problem.
NOT VERY ENCOURAGING
Restricting implications for Outside-In
    findLeast (ILit i) = i
    findLeast (If cond t1 t2) = let x1 = findLeast t1
                                    x2 = findLeast t2
                                in if (x1 < x2) then x1 else x2
Step 1: Introduce special constraints that record the interface of the branch with the outside:
    Constraint A: [α,β] (α ~ Int) => (β ~ Int)    Interface: [α,β]
    Constraint B: [α,β] (β ~ Int)                  Interface: [α,β]
Step 2: Solve the non-implication constraint (B) first. Easy: there is no multitude of solutions to pick from:
    β := Int
Step 3: Substitute the solution into the implication constraint (A):
    [α] (α ~ Int) => (Int ~ Int)
Step 4: Solve the remaining implications, fixing the interface variables
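A minimal sketch of Steps 1 to 4 on this example, assuming a toy constraint language with only type variables and Int. The datatype and function names are ours; the real algorithm also solves implications over local (non-interface) variables, which this sketch omits, and here every variable is treated as an interface variable, so Step 4 never unifies anything.

```haskell
-- Types in this toy setting: named unification variables and Int.
data Ty = TVar String | TInt deriving (Eq, Show)

-- An equality constraint t1 ~ t2.
type Eqn = (Ty, Ty)

-- Step 1: implication constraints record the branch's interface variables.
data Constraint
  = Simple Eqn                 -- e.g. constraint B: beta ~ Int
  | Implic [String] Eqn Eqn    -- interface, given => wanted (constraint A)

type Subst = [(String, Ty)]

applyTy :: Subst -> Ty -> Ty
applyTy s (TVar v) = maybe (TVar v) id (lookup v s)
applyTy _ TInt = TInt

-- Extend an idempotent substitution with x := t.
bind :: String -> Ty -> Subst -> Subst
bind x t s
  | t == TVar x = s
  | otherwise = (x, t) : [ (v, applyTy [(x, t)] u) | (v, u) <- s ]

unify :: Subst -> Eqn -> Maybe Subst
unify s (a, b) = case (applyTy s a, applyTy s b) of
  (TInt, TInt) -> Just s
  (TVar x, t) -> Just (bind x t s)
  (t, TVar x) -> Just (bind x t s)

-- Step 2: solve the non-implication constraints first.
solveSimple :: [Constraint] -> Maybe Subst
solveSimple cs = foldl step (Just []) [ e | Simple e <- cs ]
  where step ms e = ms >>= \s -> unify s e

-- Step 4: the wanted must follow from the given by rewriting with it;
-- interface variables are never unified here.
entails :: Eqn -> Eqn -> Bool
entails given (w1, w2) = rewrite w1 == rewrite w2
  where
    rewrite t = case given of
      (TVar v, g) | t == TVar v -> g
      (g, TVar v) | t == TVar v -> g
      _ -> t

outsideIn :: [Constraint] -> Maybe Subst
outsideIn cs = do
  s <- solveSimple cs                                        -- Step 2
  let sub (a, b) = (applyTy s a, applyTy s b)
      implics = [ (sub g, sub w) | Implic _ g w <- cs ]      -- Step 3
  if all (uncurry entails) implics then Just s else Nothing  -- Step 4
```

With constraints A and B the sketch returns beta := Int. With A alone it returns Nothing: beta is an interface variable and nothing outside the branch fixes it, so the program is rejected rather than guessed.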
A brief (hi)story of type expressivity
The Hindley-Milner type system 25 years later
How does all of the above affect our “gold standard” of modern type systems?
We had to add user type annotations to HM to get GADTs.
Yet another reason for this is first-class polymorphism [THESIS TOPIC]:
QML: Explicit first-class polymorphism for ML [Russo, Vytiniotis, ML 2009]
FPH: First-class polymorphism for Haskell [Vytiniotis, Peyton Jones, Weirich, ICFP 2008]
Practical type inference for higher-rank types [Peyton Jones, Vytiniotis, Weirich, Shields, JFP 2007]: the canonical reference for higher-rank type systems
Boxy Types [Vytiniotis, Peyton Jones, Weirich, ICFP 2006]
… but are we also forced to remove anything?
Reminder: Hindley-Milner does not need any annotations, at all
let generalization in Hindley-Milner
    main = let group x y = [x,y] in (group 0 1, group False False)
group is polymorphic. We can give it the generalized type
    group :: forall a. a -> a -> [a]
or defer the check to the call sites [Pottier, Sulzmann, HM(X)]:
    group :: forall a b. (a ~ b) => a -> b -> [a]
For some extensions [e.g. GHC's celebrated type families] we must allow deferring, because without deferring, types are hard to generalize*
… but is it practical to defer?
* trust me
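Both forms of group can be written directly in GHC Haskell. In this sketch they are top-level definitions (groupDeferred, uses and usesDeferred are hypothetical names), which Haskell generalizes just like the slide's let-bound group; the equality constraint needs the GADTs (or TypeFamilies) extension.

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

-- The generalized type that Hindley-Milner infers for group.
group :: a -> a -> [a]
group x y = [x, y]

-- The deferred form: a ~ b is not solved here but checked at each call site.
groupDeferred :: (a ~ b) => a -> b -> [a]
groupDeferred x y = [x, y]

uses :: ([Int], [Bool])
uses = (group 0 1, group False False)

usesDeferred :: ([Int], [Bool])
usesDeferred = (groupDeferred 0 1, groupDeferred False False)
```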
No generalization for let-bound definitions
    f :: a -> Term a -> Int
    f x y = let g b = x + 1
            in case y of ILit i -> g ()   -- a ~ Int ... errk???
Well-typed if we defer the equality to the call site of g:
    g :: (a ~ Int) => b -> Int
If the typing rules allow deferring, then the algorithm must not solve any equality [BAD!]: the completeness proof reveals a nasty surprise
The proposal [TLDI 2010]
D. Vytiniotis, S. Peyton Jones, T. Schrijvers [TLDI 2010]
Abandon generalization of local definitions
The only complete algorithms are not practical
RADICAL: removing a basic ingredient of HM
But not restrictive in practice: 127 lines affected in 95 Kloc of Haskell libraries (0.13%)!
No expressivity loss: polymorphism can be recovered with annotations
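With GHC's ScopedTypeVariables, the deferred type of g from the earlier example can be supplied as a local annotation, which is one way the polymorphism can be recovered. A minimal sketch, assuming a one-constructor Term type; without the local signature we would expect GHC to reject f, since x + 1 would then need a Num a instance.

```haskell
{-# LANGUAGE GADTs, ScopedTypeVariables, TypeOperators #-}

data Term a where
  ILit :: Int -> Term Int

-- forall brings a into scope so the local signature below can mention it.
f :: forall a. a -> Term a -> Int
f x y =
  let -- The annotation recovers the deferred type: a ~ Int is assumed
      -- inside g and must be proved at each call of g.
      g :: forall b. (a ~ Int) => b -> Int
      g _ = x + 1
  in case y of
       ILit _ -> g ()   -- here a ~ Int is known from the ILit pattern
```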
OutsideIn(X)
Many recent extensions exhibit these problems:
GADTs [previous slides]
Type classes: sort :: forall a. Ord a => [a] -> [a]
Type families: append :: forall n m. (IList n) -> (IList m) -> (IList (Plus n m))
Units of measure [Kennedy 94], implicit parameters, functional dependencies, impredicative polymorphism …
OutsideIn(X) [TLDI 2010, new JFP submission]
Parameterize the “Outside-In” type system and infrastructure [implication constraints] by a constraint theory X and its solver, without losing inference
Do the Hard Work once
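The append signature can be written with a closed type family in today's GHC. A sketch, assuming a user-defined Nat kind and a length-indexed IList (these definitions are ours, not part of OutsideIn(X)); when checking append, GHC's solver for type-family equations roughly plays the role of the theory X.

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures, TypeFamilies #-}

data Nat = Z | S Nat

-- Type-level addition: the constraint solver must know these equations.
type family Plus (n :: Nat) (m :: Nat) :: Nat where
  Plus 'Z m = m
  Plus ('S n) m = 'S (Plus n m)

-- Integer lists indexed by their length.
data IList (n :: Nat) where
  Nil  :: IList 'Z
  Cons :: Int -> IList n -> IList ('S n)

append :: IList n -> IList m -> IList (Plus n m)
append Nil ys = ys
append (Cons x xs) ys = Cons x (append xs ys)

toList :: IList n -> [Int]
toList Nil = []
toList (Cons x xs) = x : toList xs
```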
OutsideIn(X) – new JFP submission
A substantial article that brings together the results of a multi-year collaborative research program.
Many people were involved over the years: Simon Peyton Jones, Tom Schrijvers (KU Leuven), Martin Sulzmann (Informatik Consulting Systems AG), Manuel Chakravarty (UNSW), Stephanie Weirich (Penn), Geoff Washburn (LogicBlox), …
Bonus: a glorious new constraint solver to instantiate X, which improves on previous work and, for the first time, shows how to deal with all of GHC's tricky features
A brief (hi)story of type expressivity
What we learned
We now know about:
Local assumptions [ICFP 2006, ICFP 2009, TLDI 2010]
Local definitions [TLDI 2010]
Generalizing Outside-In with OutsideIn(X) [TLDI 2010]
Where to from here?
2015 (and ideas for collaborations!)
… towards practical pluggable type systems + inference!
import UnitTheory.thy
data Vehicle = Vehicle { weight :: Int[kg] , power :: Int[hp] , ... }
...
    UnitTheory.thy: a theory of units of measure [Kennedy, ESOP94]
      constant kg, hp, sec, m
      axiom u*1 = u
      axiom u*v = v*u
      axiom …
[Diagram: the DSL designer provides UnitTheory.thy and a solver for UnitTheory constraints; the DSL user writes programs; we (the compiler) provide type checking/inference via OutsideIn(UnitTheory), answering Yes/No for programs with principal types]
Open: How to design syntactic language extensions
Open: How to trust solver [proof checking, certificates?]
Open: How to type more programs with principal types [revisiting rigid E-unification, better constraint solvers, ideas from SMT solving]
Open: How to combine multiple theories and solvers [revisiting Nelson-Oppen]
Understanding and writing better software
Past: What do GADTs mean? How many functions have type forall a. [a] -> a -> a, or forall a. Term a -> a -> a? [Vytiniotis & Weirich, MFPS XXIII; Vytiniotis & Weirich, JFP 2010]
Past: PL proofs are tedious and error-prone, so mechanize them in proof assistants. The POPLMark Challenge [TPHOLS 2005]. Have been using Isabelle/HOL and Coq in recent work with Claudio Russo and Andrew Kennedy
Ongoing: Typed intermediate languages that better support type equalities and full-blown dependent types [with S. Weirich, S. Zdancewic, S. Peyton Jones]
Ongoing: Adding probabilities to contracts to combine static analysis and testing or statistical methods [with V. Koutavas, TCD]
On the wish list: Macroscopically programming groups of agents of limited computational power
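To see why GADTs complicate the question of how many functions have a given type, compare the two types from the first “Past” item. In this sketch (pick and bump are hypothetical names, and IsZero is borrowed from the evaluator example), a function of type forall a. [a] -> a -> a can only return its second argument or an element of the list, while a function of type forall a. Term a -> a -> a can inspect the GADT and do type-specific work.

```haskell
{-# LANGUAGE GADTs #-}

data Term a where
  ILit   :: Int -> Term Int
  IsZero :: Term Int -> Term Bool   -- assumed, from the evaluator example

-- forall a. [a] -> a -> a: the result can only be the second argument or
-- an element of the list (ignoring non-termination).
pick :: [a] -> a -> a
pick (h : _) _ = h
pick [] d = d

-- forall a. Term a -> a -> a: matching on the GADT reveals what a is,
-- so the function can do type-specific work on its second argument.
bump :: Term a -> a -> a
bump (ILit _) n = n + 1       -- here a ~ Int
bump (IsZero _) b = not b     -- here a ~ Bool
```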
Q-A games for encoding and decoding
Imagine a binary format such that every bitstring encodes a non-empty set of type-safe CIL programs
Not easy to program from first principles!
Instead, understand and program encoders using question-answer games
A good coding scheme follows from asking good questions! Recent ICFP 2010 submission with A. Kennedy
[Figure: a question-answer game tree with questions q1, q2, … and yes/no (y/n) answers, ending in “got it!”]
Making good software easier to write
[Figure: programming language types plotted by complexity against # bugs]
Programming language types: a demonstrably simple technology that can already eliminate lots of bugs
This talk: solving research problems to make types more effective and practical:
Catch more bugs
Require little user guidance
Remain predictable and modular