Type Inference David Walker COS 320. Criticisms of Typed Languages Types overly constrain functions & data polymorphism makes typed constructs useful

Type Inference

David Walker

COS 320

Criticisms of Typed Languages

Types overly constrain functions & datapolymorphism makes typed constructs useful in

more contextsuniversal polymorphism => code reusemodules & abstract types => code reusesubtyping => code reuse

Types clutter programs and slow down programmer productivity type inference

uninformative annotations may be omitted

Type Inference

Type Inference overview generation of type constraints from unannotated

simply-typed programs Fun has subtyping and still needed some annotations There is an algorithm for complete type inference for

MinML without subtyping (and of course for full ML, including parametric polymorphism)

solving type constraints

Type Schemes

A type scheme contains type variables (a, b, c) that may be filled in during type inferences ::= a | int | bool | s1 -> s2

A term scheme is a term that contains type schemes rather than proper typese ::= ... | fun f (x:s1) : s2 = e

Example

fun map (f, l) =

if null (l) then

nil

else

cons (f (hd l), map (f, tl l)))

Step 1: Add Type Schemes

fun map (f : a, l : b) : c =

if null (l) then

nil

else


type schemeson functions

Step 2: Generate Constraints

fun map (f : a, l : b) : c =

if null (l) then

nil

else


for eachexpression,perform type inference onsub expressions,assigning themtype schemesand generatingconstraintsthat mustbe solved


fun map (f : a, l : b) : c =

if null (l) then

nil

else


begin with theif expression &recursivelyperform type inference on itssubexpressions


fun map (f : a, l : b) : c =

if null (l) then

nil

else


b = b’ list since argument to null must be a list


fun map (f : a, l : b) : c =

if null (l) then

nil

else


: d list since nil is some kind of list

constraintsb = b’ list


fun map (f : a, l : b) : c =

if null (l) then

nil

else


: d list

constraintsb = b’ list

b = b’’ list b = b’’’ list

since hd and tl are functions that take list arguments


fun map (f : a, l : b) : c =

if null (l) then

nil

else

cons (f (hd l : b’’), map (f, tl l : b’’’ list)))

: d list

constraintsb = b’ listb = b’’ listb = b’’’ list

b = b’’’ lista = a


fun map (f : a, l : b) : c =

if null (l) then

nil

else

cons (f (hd l : b’’) : a’, map (f, tl l) : c))

: d list

constraintsb = b’ listb = b’’ listb = b’’’ lista = ab = b’’’ list

a = b’’ -> a’


fun map (f : a, l : b) : c =

if null (l) then

nil

else cons (f (hd l : b’’) : a’, map (f, tl l) : c)) : c’ list

: d list

constraintsb = b’ listb = b’’ listb = b’’’ lista = ab = b’’’ lista = b’’ -> a’

c = c’ lista’ = c’


fun map (f : a, l : b) : c =

if null (l) then

nil


: d list


c = c’ lista’ = c’d list = c’ list


fun map (f : a, l : b) : c =

if null (l) then

nil


: d list

: d list


c = c’ lista’ = c’d list = c’ listd list = c


fun map (f : a, l : b) : c =

if null (l) then

nil

else


finalconstraintsb = b’ listb = b’’ listb = b’’’ lista = ab = b’’’ lista = b’’ -> a’c = c’ lista’ = c’d list = c’ listd list = c

Step 3: Solve Constraints

finalconstraintsb = b’ listb = b’’ listb = b’’’ lista = a...

Constraint solution provides all possible solutions to type scheme annotations on terms

solutiona = b’ -> c’b = b’ listc = c’ list

map (f : a -> b x : a list): b list = ...

Step 4: Generate types

Generate types from type schemesOption 1: pick an instance of the most

general type when we have completed type inference on the entire program

map : (int -> int) -> int list -> int listOption 2: generate polymorphic types for

program parts and continue (polymorphic) type inference

map : ForAll (a,b,c) (a -> b) -> a list -> b list

Type Inference Details

Type constraints q are sets of equations between type schemesq ::= {s11 = s12, ..., sn1 = sn2}

eg: {b = b’ list, a = b -> c}

Constraint Generation

Syntax-directed constraint generationour algorithm crawls over abstract syntax of

untyped expressions and generatesa term schemea set of constraints

Algorithm defined as set of inference rules programming notation: tc(G, u) = (e, t, q)math notation: G |-- u => e : t, q

inputs outputs


G |-- x ==> x :

G |-- 3 ==> 3 :

G |-- true ==> true :

G |-- false ==> false :


G |-- x ==> x : s, { } (if G(x) = s)

G |-- 3 ==> 3 :




G |-- x ==> x : s, { } (if G(x) = s)

G |-- 3 ==> 3 : int, { }




G |-- x ==> x : s, { } (if G(x) = s)

G |-- 3 ==> 3 : int, { }

G |-- true ==> true : bool, { }

G |-- false ==> false : bool, { }

Operators

------------------------------------------------------------------------G |-- u1 + u2 ==>

Operators

G |-- u1 ==> G |-- u2 ==> ------------------------------------------------------------------------G |-- u1 + u2 ==>

Operators

G |-- u1 ==> e1 G |-- u2 ==> e2 ------------------------------------------------------------------------G |-- u1 + u2 ==>

Operators

G |-- u1 ==> e1 G |-- u2 ==> e2 ------------------------------------------------------------------------G |-- u1 + u2 ==> e1 + e2

Operators

G |-- u1 ==> e1 G |-- u2 ==> e2 ------------------------------------------------------------------------G |-- u1 + u2 ==> e1 + e2 : int

Operators

G |-- u1 ==> e1 : t1, q1 G |-- u2 ==> e2 : ------------------------------------------------------------------------G |-- u1 + u2 ==> e1 + e2 : int, q1 U {t1 = int}

Operators

G |-- u1 ==> e1 : t1, q1 G |-- u2 ==> e2 : t2, q2------------------------------------------------------------------------G |-- u1 + u2 ==> e1 + e2 : int, q1 U q2 U {t1 = int, t2 = int}

Operators

------------------------------------------------------------------------G |-- u1 < u2 ==>

Operators

G |-- u1 ==> e1 : t1, q1 G |-- u2 ==> e2 : t2, q2------------------------------------------------------------------------G |-- u1 < u2 ==> e1 < e2 : bool,

Operators

G |-- u1 ==> e1 : t1, q1 G |-- u2 ==> e2 : t2, q2------------------------------------------------------------------------G |-- u1 < u2 ==> e1 < e2 : bool,

Operators

G |-- u1 ==> e1 : t1, q1 G |-- u2 ==> e2 : t2, q2------------------------------------------------------------------------G |-- u1 < u2 ==> e1 + e2 : bool, q1 U q2 U {t1 = int, t2 = int}

If statements

----------------------------------------------------------------G |-- if u1 then u2 else u3 ==>

If statements

G |-- u1 ==> e1 : t1, q1G |-- u2 ==> e2 : t2, q2G |-- u3 ==> e3 : t3, q3----------------------------------------------------------------G |-- if u1 then u2 else u3 ==> if e1 then e2 else e3

:

If statements

G |-- u1 ==> e1 : t1, q1G |-- u2 ==> e2 : t2, q2G |-- u3 ==> e3 : t3, q3----------------------------------------------------------------G |-- if u1 then u2 else u3 ==> if e1 then e2 else e3

: a, q1 U q2 U q3 U {t1 = bool, a = t2, a = t3}

Function Application

G |-- u1 ==> e1 : t1, q1G |-- u2 ==> e2 : t2, q2----------------------------------------------------------------G |-- u1 u2==> e1 e2

: a, q1 U q2 U {t1 = t2 -> a}

Function Application

G |-- u1 ==> e1 : t1, q1G |-- u2 ==> e2 : t2, q2----------------------------------------------------------------G |-- u1 u2==> e1 e2

: a,

Function Declaration

G, f : a -> b, x : a |-- u ==> e : t, q----------------------------------------------------------------G |-- fun f(x) is u end ==> fun f (x : a) : b is e end

: a -> b, q U {t = b}

Function Declaration

G, f : a -> b, x : a |-- u ==> e : t, q----------------------------------------------------------------G |-- fun f(x) = u ==> fun f (x : a) : b = e

:

Solving Constraints

A solution to a system of type constraints is a substitution Sa function from type variables to type

schemes, eg:S(a) = aS(b) = intS(c) = a -> a

Substitutions

Applying a substitution S:substitute(S,int) = int

substitute(S,s1 -> s2) = substitute(s1) -> substitute(s2)

substitute(S,a) = s (if S(a) = s)

Due to laziness, I will write S(s) instead of

substitute(S, s)

Unification

Unification: An algorithm that provides the principal solution to a set of constraints (if one exists)Unification systematically simplifies a set of

constraints, yielding a substitutionStarting state of unification process: (Id,q)Final state of unification process: (S, { })

Unification Machine

We can specify unification as a rewriting system:Given (S, q), systematically pick a

constraint from q and simplify it, possibly adding to the substitution

Use inductive definitions to specify rewriting.

Judgment form: (S,q) -> (S’,q’)

Unification Machine

Base types & simple variables:

--------------------------------(S,{int=int} U q) -> ???

Unification Machine


--------------------------------(S,{int=int} U q) -> (S, q)

Unification Machine


--------------------------------(S,{int=int} U q) -> (S, q)

------------------------------------(S,{bool=bool} U q) -> ???

-----------------------------(S,{a=a} U q) -> ???

Unification Machine


--------------------------------(S,{int=int} U q) -> (S, q)

------------------------------------(S,{bool=bool} U q) -> (S, q)

-----------------------------(S,{a=a} U q) -> (S, q)

Unification Machine

Variable definitions

--------------------------------------------- (a not in s)(S,{a=s} U q) -> (S[a=s], [s/a]q)

-------------------------------------------- (a not in s)(S,{s=a} U q) -> (S[a=s], [s/a]q)

Unification Machine

Functions:----------------------------------------------(S,{s11 -> s12= s21 -> s22} U q) -> ???

Unification Machine

Functions:----------------------------------------------(S,{s11 -> s12= s21 -> s22} U q) -> (S, {s11 = s21, s12 = s22} U q)

Occurs Check

What is the solution to {a = a -> a}?

Occurs Check

What is the solution to {a = a -> a}?There is none! The occurs check detects this situation

-------------------------------------------- (a not in s)(S,{s=a} U q) -> (S[a=s], [s/a]q)

occurs check

Unification Machine Gets Stuck

Recall: final states have the form (S, { })

Stuck states (S,q) are such that every equation in q has the form: int = bools1 -> s2 = s (s not function type)a = s (s contains a)or is symmetric to one of the above

Stuck states arise when constraints are unsolvable

Termination

We want unification to terminate (to give us a type reconstruction algorithm)

In other words, we want to show that there is no infinite sequence of states (S1,q1) -> (S2,q2) -> ...

Termination

We associate an ordering with constraintsq < q’ if and only if

q contains fewer variables than q’q contains the same number of variables as q’ but

fewer type constructors (ie: fewer occurrences of int, bool, or “->”)

This is a lexicographic orderingwe can prove (by contradiction) that there is no

infinite decreasing sequence of constraints

TerminationLemma: Every step reduces the size of qProof: Examine each case of the definition of

the reduction relation & show that our metric goes down.

--------------------------------(S,{int=int} U q) -> (S, q)

------------------------------------(S,{bool=bool} U q) -> (S, q)

-----------------------------(S,{a=a} U q) -> (S, q)

----------------------------------------------(S,{s11 -> s12= s21 -> s22} U q) -> (S, {s11 = s21, s12 = s22} U q)

------------------------ (a not in FV(s))(S,{a=s} U q) -> (S[a=s], [s/a]q)

Properties of Solutions

given any set of constraints q, our unification machine reduces (Id, q) to (S, { }) such that S(q) is a set of true equations (eg, int = int, bool ->

a = bool -> a, etc) S is a principal solution to the constraints.

it generates more general types than any other substitution T:

ie: a -> a is more general than int -> int ie: T replaces more type variables with more specific

types

Summary: Type Inference

Type inference algorithm.Given a context G, and untyped term u:

Find e, t, q such that G |- u ==> e : t, qFind principal solution S of q via unification

if no solution exists, there is no way to type check the term

Apply S to e, ie our solution is S(e) S(e) contains schematic type variables a,b,c, etc that may

be instantiated with any type

Since S is principal, S(e) characterizes all possible typings of e

Compare with Subtyping Algorithm

Simple type inference algorithm. Given a context G (mapping variables to type schemes), and

untyped term u: Find e, t, q such that G |- u ==> e : t, q Type equations collected on the side

Subtyping algorithm given a context G (mapping variables to full types)

Find t such that G |-- e : t Subtyping equations solved when they are generated

Lots of current research on type inference that combines subtyping and polymorphic type reconstruction for advanced type systems

Documents

Type Inference David Walker COS 320. Criticisms of Typed Languages Types overly constrain functions & data polymorphism makes typed constructs useful