Type Analysis and Typed Compilation Stephanie Weirich Cornell University

Preview:

Citation preview

Type Analysis andType Analysis andTyped CompilationTyped Compilation

Stephanie Weirich

Cornell University

ICFP '98

OutlineOutline

• Typed Compilation background

• Type Analysis background

• Initial framework - Type Passing

• Problems with Type Passing

• Type Erasure framework

• Closure Conversion comparison

• Related Work

ICFP '98

Traditional CompilationTraditional Compilation

UntypedAssemblyLanguage

UntypedIntermediate

Language

Typed Abstract Syntax

ImplicitlyTyped Source

Language

Compilation is a series of translations between severallanguages

ICFP '98

Typed CompilationTyped Compilation

Assembly Language

TypedIntermediate

Language

Typed Abstract Syntax

Most of those languages are typed, and the types are translated with the terms

ImplicitlyTyped Source

Language

ICFP '98

Typed CompilationTyped Compilation

• Safety– “well-typed programs can’t go wrong”– Types describe invariants maintained by the

compiler

• Performance– Types provide information which may be used by

the compiler for optimization– Tag-free garbage collection– Data layout control

ICFP '98

Type AnalysisType Analysis

• Create functions from types to values

• Code may branch on an unknown type

fun toString x:a => (typecase a of

int => Int.toString | char => Char.toString | * => fn (fst,snd) => “(” ^ (toString fst) ^ “,” ^ (toString snd) ^ “)” ) x

ICFP '98

Data Layout OptimizationData Layout Optimization

• Parametric polymorphism requires uniformity of representation for all arguments, regardless of their types.

ICFP '98

Because any array may be passed to a polymorphic function, all arrays must look the same, no matter the type of their elements.

A:int ArrayB:bool Array

sub = fn (A:a array,i:int) => wordsub(A,i)

Polymorphic SubscriptPolymorphic Subscript

ICFP '98

In languages such as C, the type of an array is always known at its use.

A[2]

B[2]

wordsub(A,2)

wordsub(B,0)&(1<<2) <> 0

int A[4]

bool B[4]

Monomorphic subscriptMonomorphic subscript

ICFP '98ICFP '98

sub = fn (A:a array,i:int) => typecase a of int => wordsub(A,i)

| bool => (wordsub(A,i div 32) & (1<<(i mod 32))) <> 0

A:int Array

B:bool Array

Type Analysis to the RescueType Analysis to the Rescue

Type analysis allows us to determine the type at run time.

ICFP '98

Initial Framework - Initial Framework - iiML ML

• Make type abstraction/application explicit (as in System F)

( a. e) [ int ]

• Add typecase operatortypecase t of

int => ...

| b * g => ...

• Code execution (operational semantics) now relies on the type system

ICFP '98

Type Passing SemanticsType Passing Semantics

(a.(A:a array,i:int). typecase a of int => wordsub(A,i) bool => (wordsub(A,i div 32) & (1<<(i mod 32))) <> 0) [int] (A,3)

sub[int](A,3)

wordsub(A,3)

typecase int of int => wordsub(A,3) bool => (wordsub(A,3 div 32) & (1<<(3 mod 32))) <> 0

ICFP '98

TypecheckingTypechecking

To typecheck a typecase term we annotate the term with its return type:

tostringa. x:a. (typecase [ d.d -> string ] a of

int => Int.toString | char => Char.toString | b * g => fn (fst,snd):b*g => “(” ^ (toString[b] fst) ^ “,” ^ (toString[g] snd) ^ “)” ) x

Then we check that each branch satisfies that type with the appropriate type substitution. The return type of the entire term is a substituted for d in the annotated type.

ICFP '98

ProblemsProblems

• Issues about expressiveness– Complicates low-level constructs – Can’t express some optimizations– Can’t express abstraction boundaries

ICFP '98

ComplexityComplexity

• Polymorphic closure conversion– Minamide et al. [1996]– Morrisett et al. [1998]

• Duplication of effort– Optimization of runtime behavior– Explicit modeling of low level computation

• Allocation Semantics• Typed Assembly Languages

ComplexityComplexity

ICFP '98

InefficiencyInefficiency

• Must pass all types even if some are never examined

• TIL -- eliminates unexamined run-time types in ad hoc manner in translation to untyped calculus

InefficiencyInefficiency

ICFP '98

• No way to hold types abstract if they can always be examined

• Clients allowed to break abstraction barriers and infer more information than desired

Host Client

bool ref capability

Loss of AbstractionLoss of Abstraction

ICFP '98

SolutionSolution

Pass terms that represent types

ICFP '98

Type Erasure SemanticsType Erasure Semanticssub [int] Rint (A,3)

ICFP '98

Type Erasure SemanticsType Erasure Semantics

(a.x:R(a). (A:a array,i:int). typecase x of Rint => wordsub(A,i) Rbool => (wordsub(A,i div 32) & (1<<(i mod 32))) <> 0) [int] Rint (A,3)

sub [int] Rint (A,3)

wordsub(A,3)

typecase Rint of Rint => wordsub(A,3) Rbool => (wordsub(A,3 div 32) & (1<<(3 mod 32))) <> 0

ICFP '98

Type Erasure SemanticsType Erasure Semantics

(.x:R(). (A: array,i:int). typecase x of Rint => wordsub(A,i) Rbool => (wordsub(A,i div 32) & (1<<(i mod 32))) <> 0) [int] Rint (A,3)

sub [int] Rint (A,3)

wordsub(A,3)

typecase Rint of Rint => wordsub(A,3) Rbool => (wordsub(A,3 div 32) & (1<<(3 mod 32))) <> 0

ICFP '98

FormalizationFormalization

• Special representation terms:

– Rint

– R (x,y)

• A term e which represents a type has the special type R().

– R(Rint,Rint): R(int int)

• Instead of a type, the argument to typecase is a term of type R()

• The type system tracks the correspondence between a type and its representation

ICFP '98

TypecheckingTypechecking

To typecheck a typecase term we still annotate the term with its return type:

tostringa. x:a. y:R(a). (typecase [ d.d -> string ] y of

Rint => Int.toString | Rchar => Char.toString | R*(r1,r2) as b*g => fn (fst,snd):b*g => “(” ^ (toString[b] fst r1) ^ “,” ^ (toString[g] snd r2) ^ “)” ) x

We require that the argument to the typecase be of type R(t) for the typecase term to typecheck. We can then substitute a for d in the annotated type as before.

ICFP '98

SolutionSolutionSolutionSolution

• Everything that happens at run-time is described by the terms

• Can go to a type erasure semantics

• Optimization• Traditional code optimizers can optimize

type representations• Sophisticated techniques still possible

• If a representation is not provided a type may not be analyzed

ICFP '98

Typed Closure ConversionTyped Closure Conversion

• At run-time, a first-class function is represented by a code pointer

• But now x is unbound -- so we change the type of l2 to take another argument

• A closure is just a code pointer paired with the values of its free variables -- its environment

f x:int. y:int. x + y

l1: x:int. l2l2: y:int. x + y

l1: x:int. (l2, x)l2: y,x:int. x + y

ICFP '98

(f 3) 4

Typed Closure ConversionTyped Closure Conversion

Application just extracts the environment and applies the function to it

let clos = (f 3) in (#1 clos) (4,#2 clos)

clos : int -> int clos : (int*int -> int) * int

f x:int. y:int. x + y

ICFP '98

Existential TypesExistential Types

• Unfortunately, the type of the closure now depends on its free variables

• Existential types hold the types of free variables abstract

l1: x:int. pack (l2,x) as env.((int*env -> int)*env) hiding int

unpack (env,clos) = (f 3)in (#1 clos)(4,#2 clos)

clos : (int*int -> int) * int

ICFP '98

Polymorphic Closure ConversionPolymorphic Closure Conversion

• In a type passing semantics a function may also have free type variables at run time

• They also must be part of the closure, via translucent types

• The type environment is then abstracted using existential kinds

a. x:a. y:int. ...

l1: a. x:a. (b=a).(l2[b], x)l2: b. y:int,x:b.

...

clos : k. b:T. a:k. g:k = a. (int * b) -> int * b

ICFP '98

Closure ConversionClosure Conversion

• In a type erasure semantics, only the type representation remains at run time.

• At the time of closure creation, the type arguments may be given to the function.

a. x:a, z:R(a). y:int. ...

l1: a. x:a,z:R(a). pack (l2[a], z) as b.(int*b -> int)*b hiding (a*R(a))

l2: b. y:int,(x:b,z:R(b)).

...

ICFP '98

Multi-stage Type AnalysisMulti-stage Type Analysis

TypedAssembly Language

TypedIntermediate

Language

Typed Abstract Syntax

Final Step of typed compilation: compile to a typed assembly language

ImplicitlyTyped Source

Language

ICFP '98

Multi-stage Type AnalysisMulti-stage Type Analysis

• How do we preserve the meaning of typecase when the types themselves change?– in TALx86 both int and float are compiled

into B4, the type of 4-byte values– int int may be compiled into a variety of

types depending on the calling convention

ICFP '98

Related WorkRelated Work

• Harper, Morrisett - POPL 95• Minamide, Morrisett, Harper - POPL 96 • Minamide - 2nd Fuji Intl Workshop on

Functional and Logic Programming 96 • Morrisett, Walker, Crary, Glew - POPL 98 • Crary, Weirich, Morrisett - ICFP 98• Crary, Weirich - ICFP 99

ICFP '98

Low-level Type AnalysisLow-level Type Analysis

• How do we analyze types with quantifiers?– In TALx86 every function (polymorphic or

not) is compiled into a polymorphic code pointer

ICFP '9834

Areas for Future WorkAreas for Future Work

ICFP '98

ICFP '98

OutlineOutline

• Introduction– Typed Compilation– Type Analysis in general

• toString example

– Type Analysis in compilation• bit array example

• Initial framework – syntax -- from examples– semantics

• type passing

– problems– complication of theory– can’t express efficient code– loss of abstraction

• Type erasure semantics• syntax• dynamic semantics• static semantics

• Closure Conversion example• LX Teaser• Related work

ICFP '98

A note about typecheckingA note about typechecking

– In this example wordsub has a strange typea array * int -> int

– It would better if it were of type

int array * int -> int – Then argument to subscript must allways be

an int array. But that forgets its actual type of bool array. So we create a special type, called packed array, with a type-level type analysis operator.

ICFP '9812 ICFP '98

Type Passing SemanticsType Passing Semanticssub[int](A,3)

ICFP '9813

Type Passing SemanticsType Passing Semantics

(.(A: array,i:int). typecase of int => wordsub(A,3) bool => (wordsub(A,3 div 32) & (1<<(3 mod 32))) <> 0) [int] (A,3)

sub[int](A,3)

ICFP '9814

Type Passing SemanticsType Passing Semantics

(.(A: array,i:int). typecase of int => wordsub(A,3) bool => (wordsub(A,3 div 32) & (1<<(3 mod 32))) <> 0) [int] (A,3)

sub[int](A,3)

typecase int of int => wordsub(A,3) bool => (wordsub(A,3 div 32) & (1<<(3 mod 32))) <> 0

ICFP '9830

FormalizationFormalization

• Special representation terms:– Rint

– R (x,y)

ICFP '9831

FormalizationFormalization

• Special representation terms:– Rint

– R (x,y)

• A term e which represents a type has the special type R(). – R(Rint,Rint): R(int int)

ICFP '9824

Type Erasure SemanticsType Erasure Semantics

(.x:R(). (A: array,i:int). typecase x of Rint => wordsub(A,3) Rbool => (wordsub(A,3 div 32) & (1<<(3 mod 32))) <> 0) [int] Rint (A,3)

sub [int] Rint (A,3)

ICFP '9825

Type Erasure SemanticsType Erasure Semantics

(.x:R(). (A: array,i:int). typecase x of Rint => wordsub(A,3) Rbool => (wordsub(A,3 div 32) & (1<<(3 mod 32))) <> 0) [int] Rint (A,3)

sub [int] Rint (A,3)

typecase Rint of Rint => wordsub(A,3) Rbool => (wordsub(A,3 div 32) & (1<<(3 mod 32))) <> 0

ICFP '98

ICFP '985 ICFP '98

Type based compilationType based compilation

Terms TypesSource

Language

Intermediate Language

Machine Language

ICFP '98

Multi-stage Type AnalysisMulti-stage Type Analysis

ICFP '98

Type Level Type AnalysisType Level Type Analysis

ICFP '98

Type Safe LanguageType Safe Language

• Give us guarentees about the run-time behavior of programs

• Types abstractly describe the run-time flow of values

ICFP '98

Traditional CompilationTraditional Compilation( fn x => x+1 ) 3

( x . x + 1 ) 3

l1: push 3 call l2

retnl2: mov eax,[esp+4] add eax,eax mov [esp+4], eax

retn

( fn x : int => x +1 ) 3

Source File

Type InferenceChecking

Untyped IL

Machine Code

ICFP '98

Type Based CompilationType Based Compilation( fn x => x+1 ) 3

( x : int . x + 1 ) 3

l1: push 3 call l2

retnl2: mov eax,[esp+4] add eax,eax mov [esp+4], eax

retn

( fn x : int => x +1 ) 3

Source File

Type InferenceChecking

Typed IL

Machine Code

ICFP '98

Why Typed CompilationWhy Typed Compilation

• Safety -- assurances about compiler correctness

• Type based optimizations

• For example ...

ICFP '98

But we don’t always know But we don’t always know the types, what then ?the types, what then ?

• For example -- Parametric polymorphism

• Introduce an operator into the language that can distinguish types

• Typecase !

ICFP '98

Need a language with this Need a language with this operator - lmlioperator - lmli

ICFP '98

Second example Second example

• print

ICFP '9816 ICFP '98

Performance and SafetyPerformance and Safety

Terms Types

Source

IL

Machine

ICFP '9817 ICFP '98

SafetySafety

Terms Types

Source

IL

Machine TAL

ICFP '9811 ICFP '98

Type Passing SemanticsType Passing Semantics

• Used by the languageiML , the

Intermediate language of TIL/ML and FLINT compilers

• Unlike most calculi where types may be erased prior to run-time, types do have an operational significance -- they are arguments to typecase terms.

ICFP '9810 ICFP '98

Intensional Type AnalysisIntensional Type Analysis

• Valuable element of type-directed compilers

• Allows otherwise untypeable optimizations– Specialized data layout– Tag-free Garbage Collection– Polymorphic marshalling– ...

ICFP '9829

SolutionSolution

• More efficient– Only pass type representations when

necessary– Traditional code optimizers can help– Sophisticated techniques still possible

• Recovers abstraction– Can withhold representation from clients

Recommended