Scalable Certification for Typed Assembly Language
Dan Grossman (with Greg Morrisett)
Cornell University
2000 ACM SIGPLAN Workshop on Types in Compilation
September 2000, TIC00, Montreal
Types After Compilation -- Why?
Verifying object code is “well-behaved”
means we needn’t trust the code producer
• Producer-supplied types guide verification
• Encourages compiler robustness
• Promises efficient untrusted plug-ins
To maximize benefit, we want...
Certified Code Design Goals
• Low-level target language: avoids the performance / trusted-computing-base trade-off
• Source-language & compiler independent: avoids hacks, promotes re-use, the object-code way
• Permit efficient object code: otherwise, just interpret or monitor at run time
• Small certificates and fast verification: otherwise, only small programs are possible
Still learning how to balance these needs in practice
State of the Art
               Low-level   Compiler-independent   Efficient code   Efficient certification
JVML           No          No                     Yes?             Yes
PCC            Yes         No                     Yes              Yes
ECC            Yes         No                     No               Yes
Appel/Felty    Yes!        Yes                    Yes?             ???
TAL            Yes         Yes                    Yes              (This talk)
Scalable Certification in 15 mins
• Classification of Approaches
• Why Compiler Independence Makes Scalability Harder
• Techniques that Make TAL Work
• Experimental Results
• Summary of some lessons learned
See the paper for much, much more
Approach #1 -- Bake It In
If you allow only one way to do something, no annotations are needed and it's trivial to check
Examples:
• Grouping code into procedures
• Function prologues
• Installing exception handlers
The type system is at a different level of abstraction
An analogy: RISC vs. CISC
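As a minimal sketch of the idea, assume a hypothetical string encoding of instructions and a hypothetical fixed frame-pointer prologue/epilogue (neither is taken from TAL). A verifier that admits only that one shape needs no annotations at all. It simply compares instructions against the pattern:

```python
# Hypothetical baked-in convention: every procedure must begin and end with
# exactly these instructions (a frame-pointer prologue/epilogue).
REQUIRED_PROLOGUE = ["PUSH EBP", "MOV EBP, ESP"]
REQUIRED_EPILOGUE = ["POP EBP", "RETN"]


def check_procedure(instrs: list[str]) -> bool:
    """Accept a procedure only if it has the one permitted prologue/epilogue.

    No type annotations are consulted: the fixed convention plays the role
    of the type. (The body in between would still need checking; this
    sketch only shows the shape test.)
    """
    n_pro, n_epi = len(REQUIRED_PROLOGUE), len(REQUIRED_EPILOGUE)
    if len(instrs) < n_pro + n_epi:
        return False
    return (instrs[:n_pro] == REQUIRED_PROLOGUE
            and instrs[-n_epi:] == REQUIRED_EPILOGUE)


framed = ["PUSH EBP", "MOV EBP, ESP", "MOV EAX, [EBP+8]", "POP EBP", "RETN"]
leaf = ["MOV EAX, [ESP+4]", "RETN"]  # valid x86, but not the baked-in shape
print(check_procedure(framed))  # True
print(check_procedure(leaf))    # False
```

The leaf procedure is perfectly good x86, but the baked-in convention rules it out. Checking is trivial precisely because the code producer has no choices.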
Approach #2 -- Don’t Optimize
Optimizations that are expensive to prove safe are expensive to certify
Examples:
• Dynamic type tests
• Arithmetic (division by zero, array-bounds elimination)
• Memory initialized before use
Better code can make a system look worse
A new factor for where to optimize?
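The sketch below illustrates this trade-off. It assumes a hypothetical mini-instruction set with an explicit CHECK (dynamic bounds test) instruction. The verifier accepts an array load only when the load's index register was dynamically checked earlier in the same block, so it needs no arithmetic reasoning. It also rejects code where the compiler proved the check redundant and removed it:

```python
# Hypothetical mini-ISA, one tuple per instruction:
#   ("CHECK", r)       dynamic bounds test on index register r (traps if out of range)
#   ("LOAD", dst, r)   array load whose index is register r
#   ("SET", r)         any other write to register r


def verify_bounds(instrs: list[tuple]) -> bool:
    """Accept a block only if every LOAD's index was dynamically CHECKed first.

    The verifier does no arithmetic reasoning, so it is cheap and needs no
    annotations, but it cannot tell that a missing check was redundant.
    """
    checked: set[str] = set()
    for ins in instrs:
        op = ins[0]
        if op == "CHECK":
            checked.add(ins[1])
        elif op == "SET":
            checked.discard(ins[1])   # the register now holds an unchecked value
        elif op == "LOAD":
            dst, idx = ins[1], ins[2]
            if idx not in checked:
                return False
            checked.discard(dst)      # dst is overwritten by the loaded value
        else:
            raise ValueError(f"unknown instruction {ins!r}")
    return True


unoptimized = [("CHECK", "ECX"), ("LOAD", "EAX", "ECX")]
# The compiler proved ECX in range and eliminated the check: safe, but rejected.
optimized = [("LOAD", "EAX", "ECX")]
print(verify_bounds(unoptimized))  # True
print(verify_bounds(optimized))    # False
```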
Approach #3 -- Reconstruct
Don’t write down what the verifier can easily determine
Examples:
• Don’t put types on every instruction/operand
• Omit proof steps where inversion suffices
• Re-verify target code at each “call” site (virtual inlining)
Can trade time for space or get a win/win
Analogy: source-level type inference w/o the human factor
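Here is a minimal sketch of reconstruction. It assumes a hypothetical four-instruction toy language over a register file whose types are just "int" or "ptr". Given only the annotation at the start of a block, the verifier can recompute the type of every register before every instruction, so none of those per-instruction types has to be shipped:

```python
# Hypothetical toy instructions over registers typed "int" or "ptr":
#   ("MOVI", dst, n)    dst := integer constant n
#   ("MOV", dst, src)   dst := src
#   ("ADD", dst, src)   dst := dst + src (both must be int)
#   ("LOAD", dst, src)  dst := Mem[src] (src must be ptr; this toy assumes
#                       every pointer points to an int)


def reconstruct(entry: dict[str, str], instrs: list[tuple]) -> list[dict[str, str]]:
    """From the block's entry annotation alone, recompute the register types
    before each instruction (plus the exit state), so none of them has to be
    written in the certificate. Raises TypeError on ill-typed code."""
    state = dict(entry)
    states = []
    for op, dst, arg in instrs:
        states.append(dict(state))
        if op == "MOVI":
            state[dst] = "int"
        elif op == "MOV":
            if arg not in state:
                raise TypeError(f"{arg} is uninitialized")
            state[dst] = state[arg]
        elif op == "ADD":
            if state.get(dst) != "int" or state.get(arg) != "int":
                raise TypeError(f"ADD {dst}, {arg} needs two ints")
        elif op == "LOAD":
            if state.get(arg) != "ptr":
                raise TypeError(f"LOAD through non-pointer {arg}")
            state[dst] = "int"
        else:
            raise ValueError(f"unknown op {op}")
    states.append(dict(state))
    return states


entry = {"ESI": "ptr"}  # the only annotation the producer ships
block = [("LOAD", "EAX", "ESI"), ("MOVI", "ECX", 1), ("ADD", "EAX", "ECX")]
for before in reconstruct(entry, block):
    print(before)
```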
Approach #4 -- Compress
Let gzip and domain-specific tricks
solve our problems
• For annotation size, no reason not to compress
• Easy to pipeline decompression, but certification is not I/O-bound
Then again, object code compresses too
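The sketch below shows why compression works so well on compiler-generated annotations. It uses Python's standard zlib module, which implements DEFLATE, the algorithm gzip uses, to compress 200 copies of one calling-convention pre-condition. The pre-condition is written in the notation used later in this talk, and the copies are labelled with hypothetical function names f0, f1, .... Real annotations vary more than this, so the ratio here is optimistic:

```python
import zlib

# One calling-convention pre-condition in the talk's notation (illustrative
# only). Compiled code repeats such text for every function.
TEMPLATE = (
    "{ESP: {ESP: int::r1@{EAX:exn, ESP:r2, M:e2}::r2, EAX: int, "
    "EBX: a, ESI: b, EDI: c, M: e1+e2, EBP: {EAX:exn, ESP:r2, M:e2}::r2}"
    "::int::r1@{EAX:exn, ESP:r2, M:e2}::r2, "
    "EBP: {EAX:exn, ESP:r2, M:e2}::r2, EBX: a, ESI: b, EDI: c, M: e1+e2}"
)


def annotations(n_functions: int) -> bytes:
    """Label the template with hypothetical function names f0, f1, ..."""
    lines = [f"f{i}: {TEMPLATE}" for i in range(n_functions)]
    return "\n".join(lines).encode()


raw = annotations(200)
packed = zlib.compress(raw, 9)  # DEFLATE at maximum compression level
print(len(raw), len(packed), f"{len(packed) / len(raw):.1%}")
```

Because object code shrinks under compression too, a fair size comparison compresses both.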
Approach #5 -- Abbreviate
Give the code producer type-level tools for parameterization and re-use
• Just (terminating) functions at the type level
• Usually easy for the code producer
• Improves certificate size, but may hurt certification time
Not much harder than implementing the lambda-calculus
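Here is a minimal sketch of both halves of that trade-off. It assumes a toy representation, with Python tuples for type constructors and strings for base types and variables, and a hypothetical abbreviation Dup x = pair(x, x). Nesting the abbreviation ten times gives a certificate term of size 11. A verifier that naively expands abbreviations before comparing types must build a term of size 2047:

```python
def subst(term, env):
    """Replace type variables in `term` according to `env`."""
    if isinstance(term, str):
        return env.get(term, term)
    head, *args = term
    return (head, *[subst(a, env) for a in args])


def expand(term, defs):
    """Fully expand abbreviations, as a naive verifier would before comparing types."""
    if isinstance(term, str):
        return term
    head, *args = term
    args = [expand(a, defs) for a in args]
    if head in defs:
        params, body = defs[head]
        return expand(subst(body, dict(zip(params, args))), defs)
    return (head, *args)


def size(term) -> int:
    """Number of nodes in a term."""
    if isinstance(term, str):
        return 1
    return 1 + sum(size(a) for a in term[1:])


# Hypothetical abbreviation table: name -> (parameters, body).
defs = {"Dup": (["x"], ("pair", "x", "x"))}
t = "int"
for _ in range(10):
    t = ("Dup", t)
print(size(t), size(expand(t, defs)))  # 11 2047
```

A smarter verifier can compare abbreviated types without full expansion, but some expansion work is unavoidable. This is where abbreviations can cost certification time.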
Approaches Summary
• Bake it in
• Don’t optimize
• Reconstruct
• Compress
• Abbreviate
Now let’s get our hands dirty...
An Example – Code Pre-condition
int foo(int x) { return x; }
foo:
MOV EAX, [ESP+4]
RETN
The pre-condition describes the calling convention:
where the arguments, results, return address, and
exception handler (what’s an exception, anyway?) live, ...
Bake it in...
int foo(int x) { return x; }
foo: int -> int
MOV EAX, [ESP+4]
RETN
Really bake it in...
int foo(int x) { return x; }
foo_Fii:
MOV EAX, [ESP+4]
RETN
Or spell it all out...
int foo(int x) { return x; }
foo: ∀a:T, b:T, c:T, r1:S, r2:S, e1:C, e2:C.
  {ESP: {ESP: int::r1@{EAX:exn, ESP:r2, M:e2}::r2,
         EAX: int, EBX: a, ESI: b, EDI: c, M: e1+e2,
         EBP: {EAX:exn, ESP:r2, M:e2}::r2}
        ::int::r1@{EAX:exn, ESP:r2, M:e2}::r2,
   EBP: {EAX:exn, ESP:r2, M:e2}::r2,
   EBX: a, ESI: b, EDI: c, M: e1+e2}
MOV EAX, [ESP+4]
RETN
Pre-condition describes calling convention: arguments, results, return address pre-condition, callee-save registers, exception handler, ...
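To see what a verifier does with such a pre-condition, here is a heavily simplified sketch. The names are hypothetical, and the sketch tracks only EAX and the stack. It drops the quantifiers, the callee-save registers, and the exception handler, and models r1 as an opaque stack tail. It checks the two-instruction body for a given load offset. Offset 4 is accepted. Offset 0 is rejected: with the return address on top of the stack, [ESP+0] holds a code pointer rather than the int argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Code:
    """Type of a return address: what its target expects on arrival.

    Heavily simplified: only EAX and the stack are tracked.
    """
    eax: str
    stack: tuple  # slot types, top of stack first


def check_foo(load_offset: int) -> bool:
    """Check `MOV EAX, [ESP+load_offset]; RETN` against a simplified foo
    pre-condition: caller pops arguments, int argument, int result."""
    rest = ("r1",)                        # opaque caller stack, like r1
    ret_addr = Code(eax="int", stack=("int",) + rest)
    stack = (ret_addr, "int") + rest      # on entry: return address, then x

    # MOV EAX, [ESP+off]: slots are 4 bytes, so off selects slot off // 4;
    # the opaque tail cannot be read.
    slot = load_offset // 4
    if load_offset < 0 or load_offset % 4 != 0 or slot >= len(stack) - len(rest):
        return False
    eax = stack[slot]

    # RETN pops the top slot, which must be a code pointer whose
    # pre-condition is met by the current EAX and the remaining stack.
    target, after = stack[0], stack[1:]
    return isinstance(target, Code) and eax == target.eax and after == target.stack


print(check_foo(4))  # True: [ESP+4] holds the int argument x
print(check_foo(0))  # False: [ESP+0] holds the return address, not an int
```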
What to do?
∀a:T, b:T, c:T, r1:S, r2:S, e1:C, e2:C.
  {ESP: {ESP: int::r1@{EAX:exn, ESP:r2, M:e2}::r2,
         EAX: int, EBX: a, ESI: b, EDI: c, M: e1+e2,
         EBP: {EAX:exn, ESP:r2, M:e2}::r2}
        ::int::r1@{EAX:exn, ESP:r2, M:e2}::r2,
   EBP: {EAX:exn, ESP:r2, M:e2}::r2,
   EBX: a, ESI: b, EDI: c, M: e1+e2}
• Compress (compiler invariants are very repetitious)
• Don’t optimize (fewer invariants)
• Abbreviate:
   foo: F [int] int
   where F = λargs. λresult. (the full pre-condition above, with args in its two stack positions and result as the type of EAX)
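The abbreviation can be made concrete with a sketch that encodes F, hypothetically, as a Python function producing the pre-condition text. The argument list fills the two stack positions and the result type fills EAX, so `F [int] int` expands back into the full type. This string substitution only mimics beta-reduction at the type level. It is not how a TAL verifier represents types.

```python
# Hypothetical encoding of the abbreviation F: a type-level function of an
# argument-type list and a result type whose body is the full pre-condition.
EXN = "{EAX:exn, ESP:r2, M:e2}::r2"   # exception-handler pre-condition


def F(args: list[str], result: str) -> str:
    """Expand `F [args] result` into the full pre-condition by substitution."""
    a = "::".join(args)               # argument types, top of stack first
    ret = (f"{{ESP: {a}::r1@{EXN}, EAX: {result}, "
           f"EBX: a, ESI: b, EDI: c, M: e1+e2, EBP: {EXN}}}")
    return ("∀a:T, b:T, c:T, r1:S, r2:S, e1:C, e2:C. "
            f"{{ESP: {ret}::{a}::r1@{EXN}, EBP: {EXN}, "
            "EBX: a, ESI: b, EDI: c, M: e1+e2}")


abbreviated = "foo: F [int] int"
expanded = "foo: " + F(["int"], "int")
print(len(abbreviated), len(expanded))
```

Every function with the same calling convention shares the one definition of F, so each additional function costs only a short application like `F [int] int`.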
And Reconstruction Too
If we elide a pre-condition, the verifier can
re-verify the block for each predecessor
• Restrict to forward jumps to prevent loops
• Beware exponential blowup
• Bad news: Optimal type placement appears intractable
• Good news: Naive heuristics save significant space
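The sketch below estimates the cost, assuming a hypothetical control-flow-graph representation. It uses one simple placement policy, annotating every join point, as an illustration rather than the heuristic actually used in the talk. Blocks whose pre-condition is elided are re-checked once for each incoming path, and only forward jumps are allowed. A chain of n if-then-else diamonds shows the exponential blowup:

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    annotated: bool                       # does the block carry a pre-condition?
    succs: list[int] = field(default_factory=list)


def count_checks(blocks: list[Block]) -> int:
    """Count how many times block bodies get type-checked.

    An annotated block is checked once, against its own pre-condition. A block
    whose pre-condition was elided is re-checked once for every path that
    reaches it from an annotated block (virtual inlining).
    """
    checks = 0

    def check_body(i: int) -> None:
        nonlocal checks
        checks += 1
        for j in blocks[i].succs:
            assert j > i, "only forward jumps are allowed (no loops)"
            if not blocks[j].annotated:
                check_body(j)             # re-verify j under the state flowing from i

    for i, block in enumerate(blocks):
        if block.annotated:
            check_body(i)
    return checks


def diamonds(n: int, annotate_joins: bool) -> list[Block]:
    """n if-then-else diamonds in sequence; block 0 (the entry) is always annotated."""
    blocks = []
    for k in range(n):
        h = 3 * k                         # head; h+1 and h+2 are the two arms
        blocks.append(Block(k == 0 or annotate_joins, [h + 1, h + 2]))
        blocks.append(Block(False, [h + 3]))
        blocks.append(Block(False, [h + 3]))
    blocks.append(Block(n == 0 or annotate_joins))  # final join / exit
    return blocks


print(count_checks(diamonds(10, annotate_joins=False)))  # 4093
print(count_checks(diamonds(10, annotate_joins=True)))   # 31
```

If nothing past the entry is annotated, n diamonds cost 4·2^n − 3 body checks, one per loop-free path. Annotating each join point brings this down to 3n + 1.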
A real application
A bootstrapping compiler from Popcorn to TAL
• Popcorn:
   • “Java w/o objects, w/ polymorphism and limited pattern-matching”
   • “ML w/o closures or modules, w/ C-like core syntax”
   • “Safe C – pointerful, garbage collection, exceptions”
• Compiler:
   • Conventional
   • Graph-coloring register allocation, null-check elimination
• Verifier: OCaml 2.04
• System: Pentium II, 266MHz, 64MB, NT4.0
Bottom line – it works
• Source code: 18KLOC, 39 files
• Target code: 816 Kb (335 Kb after strip)
• Target types: 419 Kb
• Compilation: 40 secs
• Assembly: 20 secs
• Verification: 34.5 secs, and proportional to file size
The engineering matters
(Recall: 419Kb of types, 34.5 secs to verify)
• Without abbreviations: 2041Kb
• Without pre-condition elision: 550Kb
• Without either: 4500Kb
• As much elision as legal: 402Kb, 740 secs
• gzip reduces the 419Kb to 163Kb
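These figures are easier to compare as ratios against the default configuration of 419 Kb of types and 34.5 secs to verify. The snippet below only does arithmetic on the numbers reported in the slides. Dropping abbreviations multiplies annotation size by about 4.9, dropping elision by about 1.3, and dropping both by about 10.7. Maximal elision saves only about 4% of space at roughly 21 times the verification time:

```python
# Figures reported on the slides (annotation sizes in Kb, times in seconds).
baseline_kb, baseline_s = 419, 34.5
variants_kb = {
    "without abbreviations": 2041,
    "without pre-condition elision": 550,
    "without either": 4500,
    "maximal elision": 402,
}
for name, kb in variants_kb.items():
    print(f"{name}: {kb / baseline_kb:.2f}x the default annotation size")
print(f"maximal elision: {740 / baseline_s:.1f}x the default verification time")
print(f"gzip: {163 / baseline_kb:.0%} of the default annotation size")
print(f"types vs. stripped object code (335 Kb): {baseline_kb / 335:.2f}x")
```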
Also studied...
• Differences among code styles
• Techniques for speeding up the verifier
• Other forms of reconstruction
• Being “gzip-friendly”
Some engineering lessons
• Compiler-independence produces large repetitious annotations.
• Abbreviations are easy and space-effective, but not time-effective.
• Overhead should never be proportional to the number of loop-free paths in the code.
• Certification bottlenecks often do not appear in small, simple programs.