Page 1:

Formal Methods for Minimizing the DHOSA Trusted Computing Base

Greg Morrisett, Harvard University

with A. Chlipala, P. Govereau, G. Malecha, G. Tan, J. Tassoratti, & J.B. Tristan


Page 2:

[DHOSA overview diagram. Transformation techniques: binary translation and emulation; formal methods; cryptographic secure computation; data-centric security. Hardware/system architectures: SVA; hardware support for isolation; dealing with malicious hardware. Web-based architectures: secure browser appliance; secure servers. Example goals: enforce properties on a malicious OS; prevent data exfiltration; enable complex distributed systems, with resilience to hostile OS's.]

Page 3:

DHOSA Technologies

We are investigating a variety of techniques to defend hosts:
- Binary translation & instrumentation
- LLVM & the Secure Virtual Architecture
- New hardware architectures

How can we minimize the need to trust these components?

Page 4:

The role of formal methods

Ideally, we should have proofs that the tools are "correct", and the consumer should be able to independently validate the proofs against the working system.

This raises three hard problems:
- We need formal models of system components.
- We need formal statements of "correctness".
- We need proofs that our enforcement/rewriting/analysis code (or hardware) is correct.

Page 5:

Some of our activities

Tools for formal modeling of machine architectures:
- Domain-specific languages embedded into Coq.
- Give us declarative specs of machine-level syntax & semantics.
- Give us executable specifications for model validation.
- Give us the ability to formally reason about machine code.

Tools for proving correctness of binary validation:
- Specifically, that a binary will respect an isolation policy.
- e.g., SFI, CFI, XFI, NaCl, TAL, etc.

Tools for proving correctness of compilers:
- New techniques for scalable proofs of correctness.
- New techniques for legacy compilers.

Page 6:

Modeling Machine Architectures

Real machines (e.g., Intel's IA64) are messy:
- Even decoding instructions is hard to get right.
- The semantics are not explained well (and not always understood).
- There are actually many different versions.

Yet to prove that a compiler, analysis, or rewriting tool is correct, we need to be able to reason about real machine architectures.

And of course, we don't just want Intel IA64. We need IA32, AMD, ARM, … and of course the specialized hardware that DHOSA is considering!

Page 7:

Currently

Various groups are building models of machines:
- ACL2 group doing FP verification
- Cambridge group studying relaxed memory models
- NICTA group doing L4 verification
- Inria group doing compiler verification

However, none of them really supports everything we need:
1. declarative formulation – crucial for formal reasoning
2. efficiently executable – crucial for testing and validation
3. completeness – crucial for systems-level work
4. reuse in reasoning – crucial for modeling many architectures

Page 8:

Our Approach

Two domain-specific languages (DSLs):
- One for binary decoding (parsing): bits -> ASTs
- One for semantics: ASTs -> behavior

The DSLs are inspired by N. Ramsey's work (SLED and λ-RTL). Ramsey's work was intended for generating compiler back-ends; our focus is on reasoning about compiler-like tools.

The DSLs are embedded into Coq:
- This lets us reason formally (in Coq) about parsing and semantics.
  - e.g., is decoding deterministic?
  - e.g., will this binary, when executed in this state, respect SFI?
- The embedding also lets us extract efficient ML code (i.e., a simulator).

Page 9:

Decoding??

Page 10:

Yacc in Coq via Combinators

Definition CALL_p : parser instr :=
    (* E8: near call with a 32-bit immediate displacement *)
    "1110" $ "1000" $ word @ (fun w => CALL (Imm_op w) None)
  ||
    (* FF /2 or /3: indirect call through a ModRM operand *)
    "1111" $ "1111" $ ext_op_modrm (str "010" || str "011") @
      (fun op => CALL op None)
  ||
    (* 9A: far call to an absolute segment:offset pair *)
    "1001" $ "1010" $ halfword $$ word @
      (fun p => CALL (Imm_op (snd p)) (Some (fst p))).

Page 11:

X86 Integer Instruction Decoder

Definition instr_parser :=

AAA_p || AAD_p || AAM_p || AAS_p || ADC_p || ADD_p || AND_p || CMP_p || OR_p ||

SBB_p || SUB_p || XOR_p || ARPL_p || BOUND_p || BSF_p || BSR_p || BSWAP_p || BT_p ||

BTC_p || BTR_p || BTS_p || CALL_p || CBW_p || CDQ_p || CLC_p || CLD_p || CLI_p ||

CMC_p || CMPS_p || CMPXCHG_p || CPUID_p || CWD_p || CWDE_p || DAA_p || DAS_p ||

DEC_p || DIV_p || HLT_p || IDIV_p || IMUL_p || IN_p || INC_p || INS_p || INTn_p ||

INT_p || INTO_p || INVD_p || INVLPG_p || IRET_p || Jcc_p || JCXZ_p || JMP_p ||

LAHF_p || LAR_p || LDS_p || LEA_p || LEAVE_p || LES_p || LFS_p || LGDT_p || LGS_p ||

LIDT_p || LLDT_p || LMSW_p || LOCK_p || LODS_p || LOOP_p || LOOPZ_p || LOOPNZ_p ||

LSL_p || LSS_p || LTR_p || MOV_p || MOVCR_p || MOVDR_p || MOVSR_p || MOVBE_p ||

MOVS_p || MOVSX_p || MOVZX_p || MUL_p || NEG_p || NOP_p || NOT_p || OUT_p ||

OUTS_p || POP_p || POPSR_p || POPA_p || POPF_p || PUSH_p || PUSHSR_p || PUSHA_p ||

PUSHF_p || RCL_p || RCR_p || RDMSR_p || RDPMC_p || RDTSC_p || RDTSCP_p || REPINS_p ||

REPLODS_p || REPMOVS_p || REPOUTS_p || REPSTOS_p || REPECMPS_p || REPESCAS_p ||

REPNECMPS_p || REPNESCAS_p || RET_p || ROL_p || ROR_p || RSM_p || SAHF_p || SAR_p ||

SCAS_p || SETcc_p || SGDT_p || SHL_p || SHLD_p || SHR_p || SHRD_p || SIDT_p ||

SLDT_p || SMSW_p || STC_p || STD_p || STI_p || STOS_p || STR_p || TEST_p || UD2_p ||

VERR_p || VERW_p || WAIT_p || WBINVD_p || WRMSR_p || XADD_p || XCHG_p || XLAT_p.

Page 12:

Parsing Semantics

The declarative syntax helps get things right:
- we can literally scrape manuals to get decoders.
- though it's far from sufficient – manuals have bugs!

It's possible to give a simple functional interpretation of the parsing combinators (à la Haskell):
- parser T := string -> FinSet(string * T)
- this allows us to extract executable code for testing (a minimal denotation is sketched below).

It also makes it very easy to reason about parsers and prove things like:
- || is associative and commutative.
- or, e.g., that Intel's manuals are deterministic (they are not).
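To make the functional interpretation concrete, here is a minimal sketch in Coq of one possible denotation, simplified from the slide: bits stand in for strings, and a list of results stands in for the finite set. The combinator names (alt, map_p) are illustrative, not the names in the actual library.

Require Import List.

(* A parser consumes a list of bits and returns every way to parse a
   prefix, paired with the leftover input. *)
Definition parser (T : Type) : Type :=
  list bool -> list (list bool * T).

(* Alternation (the || of the slides): union of the two result sets. *)
Definition alt {T} (p q : parser T) : parser T :=
  fun s => p s ++ q s.

(* Semantic action (the @ of the slides): post-process each result. *)
Definition map_p {T U} (p : parser T) (f : T -> U) : parser U :=
  fun s => map (fun r => (fst r, f (snd r))) (p s).

With this reading, associativity and commutativity of || amount to facts about membership in the result list.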

Page 13:

Semantics

The usual style for machines is a small-step, operational semantics:

  M(R1(pc)) = a    parse(M, a) = i    (M, R1, i) → (M', R1')
  -----------------------------------------------------------
      (M, R1 || R2 || … || Rn) → (M', R1' || R2 || … || Rn)

This makes it easy to specify non-determinism and to reason about the fine-grained behavior of the machine.

But it doesn't really give us an efficient executable, nor reusable reasoning.
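Purely for illustration (the types and names below are assumptions, not the project's definitions), the rule can be transcribed as a Coq inductive relation over a shared memory and a list of per-processor register files:

(* Illustrative placeholders only; not the real development's definitions. *)
Parameter mem regs instr : Type.
Parameter fetch_and_parse : mem -> regs -> option instr.   (* M(R(pc)), then parse *)
Parameter step_instr : mem -> regs -> instr -> mem -> regs -> Prop.

(* The rule above: the first processor steps; the other register files
   are unchanged, and all share the memory M. *)
Inductive step : mem * list regs -> mem * list regs -> Prop :=
| step_first : forall (M M' : mem) (R1 R1' : regs) (Rs : list regs) (i : instr),
    fetch_and_parse M R1 = Some i ->
    step_instr M R1 i M' R1' ->
    step (M, R1 :: Rs) (M', R1' :: Rs).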

Page 14:

Our Approach

Write a monadic denotational semantics for instructions:

Definition step_AND (op1 op2 : operand) :=
  w1 <- get_op32 op1 ;
  w2 <- get_op32 op2 ;
  let res := Word32.Int.and w1 w2 in
  set_op32 op1 res ;;
  set_flag OF false ;;
  set_flag CF false ;;
  set_flag ZF (is_zero32 res) ;;
  set_flag SF (is_signed32 res) ;;
  set_flag PF (parity res) ;;
  (* AF is architecturally undefined after AND, so the model draws it
     from an oracle rather than fixing a value. *)
  b <- next_oracle_bit ;
  set_flag AF b
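For intuition only: the <- / ;; notation suggests a state monad threading the machine state. A minimal sketch follows, with an assumed state type mach standing in for the real machine state (the actual monad also threads the oracle and handles failure).

Parameter mach : Type.    (* placeholder for the machine state *)

(* A computation transforms the state and produces a value. *)
Definition M (A : Type) : Type := mach -> mach * A.

Definition ret {A} (x : A) : M A :=
  fun s => (s, x).

Definition bind {A B} (c : M A) (f : A -> M B) : M B :=
  fun s => let (s', x) := c s in f x s'.

(* "x <- c ; k" is then notation for bind c (fun x => k),
   and "c ;; k" for bind c (fun _ => k). *)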

Page 15:

Reasoning versus Validation

The monadic operations can be interpreted as pure functions over oracles and machine states:
- The monadic operations are essentially RTLs over bit-vectors.
- The infrastructure can be reused across a wide variety of machine architectures.
- i.e., defining and reasoning about machine architecture semantics becomes relatively easy.

But we can also extract efficient ML code for testing the model against other simulators & real machines (see the sketch below):
- e.g., in-place updates for state changes instead of functional data structures.
- in particular, we can leverage the work that Stephen talked about to do better validation.
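As a rough sketch of how the executable model is obtained (the exact commands vary with the Coq version, and the output filename here is made up):

(* In recent Coq versions, extraction must be loaded explicitly. *)
Require Extraction.
Extraction Language OCaml.

(* Extract the decoder and the instruction semantics into OCaml, to be
   linked against a test harness and co-simulated against real machines. *)
Extraction "x86_model.ml" instr_parser step_AND.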

Page 16:

Example Application: Google's NaCl

NaCl uses software fault isolation (SFI) to enforce an isolation policy:
- a good baseline for us to study.
- mask the high bits of every store/jump to ensure a piece of untrusted code stays in its sandbox.
- tricky: must consider every parse of the x86 code.
- by enforcing an alignment convention, NaCl ensures there's only one parse (McCamant).
- security depends on the "checker" which verifies these properties.

Our goal: build and prove correctness of the checker (the flavor of the check is sketched below).
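A purely illustrative sketch of the kind of property such a checker enforces; the predicate names are assumptions, not the actual NaCl or DHOSA definitions, and instr stands in for the decoder's instruction AST.

(* Illustrative placeholders. *)
Parameter instr : Type.                        (* the decoder's instruction AST *)
Parameter needs_mask : instr -> bool.          (* is this an indirect jump/store? *)
Parameter masks_for : instr -> instr -> bool.  (* does the previous instruction mask its address? *)

(* Every instruction that needs masking must be immediately preceded by a
   suitable masking instruction; the alignment convention keeps such pairs
   from being entered in the middle by a computed jump. *)
Fixpoint sandbox_ok (prev : option instr) (code : list instr) : bool :=
  match code with
  | nil => true
  | i :: rest =>
      let guarded :=
        if needs_mask i
        then match prev with
             | Some p => masks_for p i
             | None => false
             end
        else true
      in andb guarded (sandbox_ok (Some i) rest)
  end.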

Page 17:

Our Verified Checker

We generated a checker that is:
- declarative, and easy to update.
- provably correct w.r.t. our x86 model, except that it contains ~80 lines of trusted C code.
- smaller and faster than Google's checker: Google's checker is about 600 lines of trusted C code, and ours is about 3x faster on a 200 Kloc C program.

Basic idea (sketched below):
- generate a DFA that accepts only correctly rewritten programs.
- the DFA is encoded as a set of tables, which are proven correct.
- only the DFA driver is trusted.
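A minimal sketch of what a table-driven driver looks like, written here in Coq rather than the ~80 lines of trusted C, with assumed names for the byte type and tables; the real correctness argument is about the tables, not this loop.

(* Illustrative placeholders: the byte type and the proven-correct tables. *)
Parameter byte : Type.
Definition state := nat.
Parameter dfa_next : state -> byte -> option state.   (* None = reject *)
Parameter dfa_accepting : state -> bool.
Parameter dfa_start : state.

(* The trusted driver: run the DFA over the code bytes and accept iff it
   ends in an accepting state without ever rejecting. *)
Fixpoint dfa_run (s : state) (bs : list byte) : bool :=
  match bs with
  | nil => dfa_accepting s
  | b :: rest =>
      match dfa_next s b with
      | Some s' => dfa_run s' rest
      | None => false
      end
  end.

Definition checker (code : list byte) : bool := dfa_run dfa_start code.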

Page 18:

Thus far…

One focus: formal methods for modeling real machines.
- DSLs for instruction decoding and instruction semantics.
- Yield both formal reasoning & efficient execution.
- Allow us to prove correctness of binary-level tools like the SFI checker.

Another focus: compiler correctness.
- Crucial for eliminating language-based techniques from the TCB.
- For example, the Illinois group's Secure Virtual Architecture depends upon the correctness of the LLVM compiler.

Page 19:

To Date

The gold standard has been Leroy's CompCert compiler:
- a (mildly) optimizing compiler from C to x86, ARM, PPC
- models of these languages & architectures
- a proof of correctness (see J. Regehr's compiler-bug paper at PLDI)

However:
- the machine models are incomplete and unvalidated
- optimization at O1 levels, but not O3
- the proofs are roughly 17x the size of the code!

Page 20:

Earlier Work

Post-doc (now MIT faculty member) Adam Chlipala's work on Lambda Tamer:
- a compiler from core ML to a MIPS-like machine
- transformations like CPS and closure conversion
- breakthrough: |proofs| ≈ |code|
- clever language representations avoid tedious proofs about variables, scope, and binding.
- clever language semantics makes reasoning simpler and more uniform.
- clever tactic-based reasoning makes proofs mostly automatic, and far more extensible.

Page 21:

Current Work

We have built a version of LLVM where the optimizer is provably correct (see our PLDI'11 paper):
- to be fair, only intra-procedural optimizations
- but these include global value numbering, sparse conditional constant propagation, advanced dead code elimination, loop-invariant code motion, loop deletion, loop unrolling, and dead-store elimination.

The "proof" is completely automated:
- in essence, we have a way to effectively prove that the input to the optimizer has the same behavior as the output.
- or, more properly, when we can't, we don't optimize the code.

The prover knows nothing about the internals of the LLVM optimizer:
- so it's easy to change LLVM, or to add new optimizations.

Page 22:

LLVM Translation Validation

[Diagram: LLVM front-ends → LLVM optimizer → code generator, with an equivalence checker comparing the optimizer's input IR against its output IR.]

Page 23:

How do we do this?

Convert LLVM's SSA-based intermediate language into a categorical value-graph representation:
- similar to circuit representations (think BDDs).
- but it incorporates loops by lifting everything to the level of streams of values.
- allows us to reason equationally about both data and control.

Take advantage of category theory to normalize the input and output graphs, and check for equivalence (the top-level shape is sketched below):
- this gives us many equivalences for free, such as common sub-expressions and loop-invariant computations.
- but we still need to normalize the underlying scalar computations.

The key challenge is getting this to scale to big functions.
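For orientation only (all types and function names below are assumptions, not the actual PLDI'11 artifact), the top-level shape of such a translation validator is: translate both versions of a function to value graphs, normalize, and compare; if anything fails, the optimization is not accepted.

(* Illustrative placeholders for the SSA IR and the value-graph machinery. *)
Parameter ssa_function vgraph : Type.
Parameter to_vgraph : ssa_function -> option vgraph.   (* None corresponds to "Fail" *)
Parameter normalize : vgraph -> vgraph.
Parameter vgraph_eqb : vgraph -> vgraph -> bool.

(* Validate one function: translate both versions, normalize, and compare.
   If translation or comparison fails, keep the unoptimized code. *)
Definition validate (before after : ssa_function) : bool :=
  match to_vgraph before, to_vgraph after with
  | Some g1, Some g2 => vgraph_eqb (normalize g1) (normalize g2)
  | _, _ => false
  end.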

Page 24:

% of Functions Validated on all Opts.

[Bar chart: per-benchmark breakdown of validation outcomes (0%–100%) for bzip2, gcc, h264ref, hmmer, lbm, libqua…, mcf, milc, perlbench, sjeng, sphinx, sqlite3, and total, split into Fail / Alarm / OK / Boring.]

- Fail: we fail to translate LLVM's IR into our representation.
- Alarm: we fail to validate the translation.
- OK: we validate the translation and there are significant differences.
- Boring: we validate but the differences are minimal.

Page 25:

Quick Recap

DHOSA relies upon compilers, rewriting, analysis, and other software tools to provide protection.

Our goal is to increase assurance in these tools:
- provide detailed formal models of machines
- prove correctness of key components
- find techniques for automating proofs

The hope is that these investments will pay off, not just for this project but for others (e.g., IARPA Stonesoup, DARPA CRASH).