Overview of Back-end for CComp
Zhaopeng Li
Software Security Lab.
June 8, 2009
Outline
• Design Points
• Assembly Language: “x86”
• Low-level Intermediate Language
• Future Work
Design Points
• Assembly Language
  – Target: SCAP with an x86 abstract machine;
  – The program logic may change in the next version;
  – Or another machine may be used.
• Low-level Intermediate Language
  – Hides some machine-specific details;
  – Note that this level can serve merely as a helper for generating code and proofs.
Assembly Language: “x86”
Some Topics about “x86”
• Data Representation
  – 32-bit vs. “fake” 32-bit
    • We do not care how the data is stored as bits.
    • Integer: 4 bytes
    • Pointer: 4 bytes
• Data Alignment
• Callee-saved Registers
  – EBX, ESI, EDI, EBP
Some Topics about “x86” (cont.)
• Calling convention:
1. Parameters are passed on the stack, pushed from right to left; alternatively, the first three are passed in registers EAX, ECX, and EDX, and the others are passed on the stack;
2. Registers EAX, ECX, and EDX may be used freely by the callee; other registers must be saved on the stack and popped before the function returns;
3. The return value is stored in register EAX;
4. The caller cleans up the stack (the parameters).
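To make the stack-based variant of this convention concrete, here is a minimal Python sketch (an illustration, not part of CComp) that pushes the arguments right to left, lets `call` push the return address, runs the `push ebp; mov esp, ebp` part of the prolog, and then locates each parameter relative to ebp. The names `call_frame` and `param_offset`, the placeholder strings, and the starting esp are assumptions for illustration.

```python
WORD = 4  # integers and pointers are both 4 bytes


def call_frame(args, esp):
    """Model a call under the stack-based convention (stack grows downward).

    `args` are the arguments in source order; `esp` is the caller's stack
    pointer before the call sequence. Returns (memory, esp, ebp) right after
    the callee has executed `push ebp; mov esp, ebp`.
    """
    mem = {}
    for a in reversed(args):      # parameters are pushed from right to left
        esp -= WORD
        mem[esp] = a
    esp -= WORD
    mem[esp] = "old eip"          # return address pushed by `call` (placeholder)
    esp -= WORD
    mem[esp] = "old ebp"          # push ebp (placeholder for the caller's ebp)
    ebp = esp                     # mov esp, ebp
    return mem, esp, ebp


def param_offset(i):
    """Offset from ebp of the i-th parameter (0-based): skip old ebp and old eip."""
    return 2 * WORD + i * WORD


args = [10, 20, 30]
mem, esp, ebp = call_frame(args, esp=1000)
print([mem[ebp + param_offset(i)] for i in range(len(args))])  # [10, 20, 30]

# After `leave; ret` the callee leaves esp just above the return address ...
esp_after_ret = ebp + 2 * WORD
# ... and the caller removes the parameters itself (caller cleans up).
esp_after_cleanup = esp_after_ret + WORD * len(args)
print(esp_after_cleanup)  # 1000, the caller's esp before the call sequence
```

Because arguments are pushed right to left, the first parameter sits at ebp + 8 in this model regardless of how many arguments are passed.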
Some Topics about “x86” (cont.)
Prolog (typical):
  _function:
    push ebp        ; store the old base pointer
    mov esp, ebp    ; make the base pointer point to the current stack location
    sub x, esp      ; x is the size of the local variables, in bytes
Epilog (typical):
    mov ebp, esp    ; reset the stack pointer to "clean" away the local variables
    pop ebp         ; restore the original base pointer
    ret             ; return from the function
(Operands follow AT&T order, source first: mov esp, ebp sets ebp to esp.)
[Figure: the stack at function entry, after stack-frame setup (enter x, 0), and after the return (leave; ret). From higher to lower addresses, a frame holds the parameters, old eip, old ebp (where ebp points), and the local variables (down to esp).]
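The prolog/epilog pair and the frame it builds can be checked with a small simulation. The Python sketch below is an illustration rather than the CComp machine: registers live in a dict, memory is a dict from addresses to values, and the return address 0x40 and the initial register values are made up. It follows the slides' operand order (mov esp, ebp means ebp := esp) and confirms that after `leave; ret` ebp is restored and esp is 4 bytes above its value at function entry.

```python
WORD = 4


def push(R, M, v):
    R["esp"] -= WORD
    M[R["esp"]] = v


def pop(R, M):
    v = M[R["esp"]]
    R["esp"] += WORD
    return v


def prolog(R, M, x):
    """push ebp; mov esp, ebp; sub x, esp  (the effect of `enter x, 0`)."""
    push(R, M, R["ebp"])
    R["ebp"] = R["esp"]
    R["esp"] -= x


def epilog(R, M):
    """mov ebp, esp; pop ebp (the effect of `leave`), then ret."""
    R["esp"] = R["ebp"]
    R["ebp"] = pop(R, M)
    R["eip"] = pop(R, M)


R = {"esp": 1000, "ebp": 2000, "eip": 0}
M = {}
push(R, M, 0x40)          # the caller's `call` pushes a (made-up) return address
entry = dict(R)           # register file at function entry

prolog(R, M, 12)          # 12 bytes of locals: addresses ebp-12 .. ebp-1
assert M[R["ebp"]] == entry["ebp"]     # old ebp saved where ebp now points
assert M[R["ebp"] + WORD] == 0x40      # old eip just above it
epilog(R, M)

print(R["ebp"] == entry["ebp"], R["esp"] == entry["esp"] + 4, R["eip"] == 0x40)
# True True True
```

The "esp + 4" outcome is the stack part of the specification f : {R’(ebp)=R(ebp) /\ R’(esp)=R(esp)+4} used in the “Figure Out G” slides.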
Assembly Abstract Machine “m86”
• Code Heap (C)
  – Code storage;
  – Unchanged during execution.
• Machine State
  – Memory (M)
  – Register File (R)
  – Instruction Pointer (eip)
    • the current instruction is c = C(eip);
    • or just use an instruction sequence (I).
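A minimal Python rendering of the m86 machine state, under assumptions of our own: code addresses are integers, instructions are tuples, and `types.MappingProxyType` is used as one way to make the code heap C read-only, mirroring “unchanged during execution”. The class name `Machine` is hypothetical.

```python
from dataclasses import dataclass, field
from types import MappingProxyType


@dataclass
class Machine:
    C: MappingProxyType                      # code heap: address -> instruction (read-only)
    M: dict = field(default_factory=dict)    # memory: address -> value
    R: dict = field(default_factory=dict)    # register file: register name -> value
    eip: int = 0                             # instruction pointer

    def current(self):
        """The current instruction c = C(eip)."""
        return self.C[self.eip]


code = MappingProxyType({0: ("movi", 5, "eax"), 1: ("ret",)})
m = Machine(C=code, R={"eax": 0, "esp": 1000})
print(m.current())        # ('movi', 5, 'eax')
m.eip = 1
print(m.current())        # ('ret',)

try:
    m.C[2] = ("leave",)   # the code heap cannot be modified
except TypeError:
    print("code heap is read-only")
```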
Assembly Language: “x86”
• “AT&T syntax”
• Reg.    r  ::= eax | ebx | ecx | edx | esi | edi | esp | ebp
• FReg.   fr ::= sf | zf
• Int.    b  ::= n (integer)
• Instr.  i  ::= add r1, r2 | addi n, r | sub r1, r2 | subi n, r | mul r1, r2 | muli n, r
               | mov r1, r2 | movi n, r | movs r1, n(r2) | movl n(r1), r2
               | push r | pop r | cmp r1, r2 | cmpi n, r
               | je r, b | jne r, b | jg r, b | jge r, b | jmp b | call b | ret
               | enter n, 0 | leave | malloc r | free r
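The instruction set can be given a small-step semantics. The Python sketch below covers a subset (arithmetic, moves, loads/stores, stack operations, enter/leave, jmp/call/ret) and leaves out cmp, the conditional jumps, malloc, and free. Its encoding is an assumption, not taken from the slides: instructions are tuples in AT&T operand order, each instruction occupies one code address (so the fall-through successor of eip is eip + 1), and integers are unbounded Python ints (no 32-bit wraparound), in the spirit of the “fake” 32-bit view. We read movs r1, n(r2) as a store and movl n(r1), r2 as a load.

```python
import operator

WORD = 4
ARITH = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def step(C, R, M):
    """Execute C[R["eip"]] once, updating registers R and memory M in place."""
    op, *args = C[R["eip"]]
    nxt = R["eip"] + 1                      # fall-through successor (assumed layout)

    def push(v):
        R["esp"] -= WORD
        M[R["esp"]] = v

    def pop():
        v = M[R["esp"]]
        R["esp"] += WORD
        return v

    if op in ARITH:                         # add r1, r2:  r2 := r2 op r1
        r1, r2 = args
        R[r2] = ARITH[op](R[r2], R[r1])
    elif op in ("addi", "subi", "muli"):    # addi n, r:  r := r op n
        n, r = args
        R[r] = ARITH[op[:-1]](R[r], n)
    elif op == "mov":                       # mov r1, r2:  r2 := r1
        r1, r2 = args
        R[r2] = R[r1]
    elif op == "movi":                      # movi n, r:  r := n
        n, r = args
        R[r] = n
    elif op == "movs":                      # movs r1, n(r2):  M[r2 + n] := r1
        r1, n, r2 = args
        M[R[r2] + n] = R[r1]
    elif op == "movl":                      # movl n(r1), r2:  r2 := M[r1 + n]
        n, r1, r2 = args
        R[r2] = M[R[r1] + n]
    elif op == "push":
        push(R[args[0]])
    elif op == "pop":
        R[args[0]] = pop()
    elif op == "enter":                     # enter n, 0
        push(R["ebp"])
        R["ebp"] = R["esp"]
        R["esp"] -= args[0]
    elif op == "leave":
        R["esp"] = R["ebp"]
        R["ebp"] = pop()
    elif op == "jmp":
        nxt = args[0]
    elif op == "call":                      # push the return address, jump to b
        push(nxt)
        nxt = args[0]
    elif op == "ret":
        nxt = pop()
    else:
        raise NotImplementedError(op)       # cmp, je/jne/jg/jge, malloc, free
    R["eip"] = nxt


# f computes 6 * 7 through a local variable, then returns it in eax.
C = {
    0: ("enter", 4, 0),
    1: ("movi", 6, "eax"),
    2: ("movs", "eax", -4, "ebp"),   # store eax into the local at -4(ebp)
    3: ("movl", -4, "ebp", "ecx"),   # load it back into ecx
    4: ("muli", 7, "ecx"),
    5: ("mov", "ecx", "eax"),
    6: ("leave",),
    7: ("ret",),
}
R = {"eax": 0, "ecx": 0, "esp": 996, "ebp": 2000, "eip": 0}
M = {996: 99}          # return address 99 acts as a "halt" sentinel
while R["eip"] != 99:
    step(C, R, M)
print(R["eax"], R["esp"], R["ebp"])  # 42 1000 2000
```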
Program Logic
• Based on SCAP
• Specification (p, g)
  – p : State -> Prop
  – g : State -> State -> Prop
• Inference Rules
  – Well-formed program
    • Well-formed basic block
    • Well-formed instruction
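A specification pair can be read directly as Python predicates: p maps a state to a boolean, and g relates the current state to the state at the function's return. The sketch below tests an SCAP-style side condition for one instruction with specification (p, g) followed by (p1, g1): for every state S satisfying p, p1 holds of Next(S), and every S' with g1(Next(S), S') also satisfies g(S, S'). It only samples a finite set of register-only states (memory is ignored), so it is a sanity check, not a proof; the precondition (a word-aligned esp) and the deliberately wrong guarantee are hypothetical. The two guarantees are the ones for push ebp worked out in the “Figure Out G” slides.

```python
from itertools import product


def next_push_ebp(S):
    """Next(push ebp, S) on registers only: esp decreases by 4, ebp is unchanged."""
    S1 = dict(S)
    S1["esp"] = S["esp"] - 4
    return S1


def p(S):                 # hypothetical precondition: esp is word-aligned
    return S["esp"] % 4 == 0


def g(S, S2):             # guarantee of f at its entry
    return S2["ebp"] == S["ebp"] and S2["esp"] == S["esp"] + 4


def g0(S, S2):            # guarantee after push ebp
    return S2["ebp"] == S["ebp"] and S2["esp"] == S["esp"] + 8


def g0_wrong(S, S2):      # forgets that push moved esp
    return S2["ebp"] == S["ebp"] and S2["esp"] == S["esp"] + 4


def seq_ok(p, g, nxt, p1, g1, states):
    """Check p S -> p1 (nxt S) and g1 (nxt S) S' -> g S S' on the given states."""
    for S in states:
        if not p(S):
            continue
        S1 = nxt(S)
        if not p1(S1):
            return False
        if any(g1(S1, S2) and not g(S, S2) for S2 in states):
            return False
    return True


states = [{"esp": e, "ebp": b} for e, b in product(range(96, 120, 4), (0, 500, 2000))]
print(seq_ok(p, g, next_push_ebp, p, g0, states))        # True
print(seq_ok(p, g, next_push_ebp, p, g0_wrong, states))  # False
```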
Main Objects
• Code Generation
  – Minimize the proof size
    • e.g., temporary results should be kept in registers, not on the stack.
• Assertion
  – Building (p, g) for each basic block
  – Generating (p, g) for each program point
• Proof
  – Generating proofs for functions/basic blocks
  – (reusing the proofs of the verification conditions (VCs) at the source level)
Assertion Relationship
[Figure: function f, made of basic block 1 and basic block 2 (label L1), annotated in the intermediate language (left) and in x86 assembly language (right).]
• Intermediate Language: f : {p} //{q};  L1 : {p1}
• x86 Assembly Language: f : {(p’, g)};  L1 : {(p’1, g1)}
  – $p' = \mathrm{trans}(p) \land \mathit{param}_p \land \textit{stack-reg}_p$
  – $g = \mathrm{trans}(q) \land \textit{callee-saved-reg}_g \land \mathit{stack}_g$
  – $p'_1 = \mathrm{trans}(p_1) \land \mathit{param}_{p_1} \land \textit{stack-reg}_{p_1}$
  – $g_1 = ?$
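The translated assertions are conjunctions, which can be mirrored by composing predicates. The Python sketch below builds g = trans(q) ∧ callee-saved-reg_g ∧ stack_g over a register-only state. Only the callee-saved register list comes from these slides; the meaning given to stack_g (esp ends 4 bytes higher, as after ret), the source postcondition q (“the result is non-negative”, returned in eax), and all function names are illustrative assumptions. p’ could be assembled the same way from one-state predicates.

```python
CALLEE_SAVED = ("ebx", "esi", "edi", "ebp")   # callee-saved registers on x86


def conj(*preds):
    """Conjunction of predicates of any (shared) arity."""
    return lambda *states: all(pred(*states) for pred in preds)


def trans_q(S, S2):
    # hypothetical translation of the source postcondition "result >= 0";
    # the return value is in eax
    return S2["eax"] >= 0


def callee_saved_reg_g(S, S2):
    # the callee-saved registers hold the same values at return as at entry
    return all(S2[r] == S[r] for r in CALLEE_SAVED)


def stack_g(S, S2):
    # hypothetical stack part: ret has popped the return address
    return S2["esp"] == S["esp"] + 4


g = conj(trans_q, callee_saved_reg_g, stack_g)

S = {"eax": 0, "ebx": 1, "esi": 2, "edi": 3, "ebp": 500, "esp": 996}
print(g(S, dict(S, eax=42, esp=1000)))          # True
print(g(S, dict(S, eax=42, esp=1000, ebx=9)))   # False: ebx was not preserved
```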
Figure Out G
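Running example: function f, whose basic block 1 begins with the prolog below and whose basic block 2 (label L1) ends with leave; ret.
  push ebp
  mov esp, ebp
  sub $12, esp
  …
  leave
  ret
f : {$R'(\mathit{ebp}) = R(\mathit{ebp}) \land R'(\mathit{esp}) = R(\mathit{esp}) + 4$}, where R is the register file at the entry of f and R’ the register file after ret; L1 : {g1}.
Step 1, push ebp (from R to R0):
  Operational semantics: $R_0(\mathit{ebp}) = R(\mathit{ebp}) \land R_0(\mathit{esp}) = R(\mathit{esp}) - 4$
  Combined with g: $R'(\mathit{ebp}) = R(\mathit{ebp}) \land R_0(\mathit{ebp}) = R(\mathit{ebp}) \land R'(\mathit{esp}) = R(\mathit{esp}) + 4 \land R_0(\mathit{esp}) = R(\mathit{esp}) - 4$
  Result g0: $R'(\mathit{ebp}) = R_0(\mathit{ebp}) \land R'(\mathit{esp}) = R_0(\mathit{esp}) + 8$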
The method:
1. Get the state relation from the rule of the operational semantics;
2. Use the g of the previous program point;
3. Do substitution and arithmetic.
Figure Out G (cont.)
Step 2, mov esp, ebp (from R0 to R1; sets ebp to esp):
  Starting from g0: $R'(\mathit{ebp}) = R_0(\mathit{ebp}) \land R'(\mathit{esp}) = R_0(\mathit{esp}) + 8$
  Operational semantics: $R_1(\mathit{ebp}) = R_0(\mathit{esp}) \land R_1(\mathit{esp}) = R_0(\mathit{esp})$
  Combined: $R'(\mathit{ebp}) = R_0(\mathit{ebp}) \land R_1(\mathit{ebp}) = R_0(\mathit{esp}) \land R'(\mathit{esp}) = R_0(\mathit{esp}) + 8 \land R_1(\mathit{esp}) = R_0(\mathit{esp})$
  Result g1: $R'(\mathit{ebp}) = M_1(R_1(\mathit{ebp})) \land R'(\mathit{esp}) = R_1(\mathit{esp}) + 8$
  (push ebp saved $R_0(\mathit{ebp})$ at address $R_0(\mathit{esp}) = R_1(\mathit{ebp})$, and mov leaves memory unchanged, so $R_0(\mathit{ebp}) = M_1(R_1(\mathit{ebp}))$.)
Figure Out G (cont.)
Step 3, the sub instruction (from R1 to R2; esp decreases by 12 to reserve space for local variables):
  Starting from g1: $R'(\mathit{ebp}) = M_1(R_1(\mathit{ebp})) \land R'(\mathit{esp}) = R_1(\mathit{esp}) + 8$
  Operational semantics: $R_2(\mathit{ebp}) = R_1(\mathit{ebp}) \land R_2(\mathit{esp}) = R_1(\mathit{esp}) - 12$
  Combined: $R'(\mathit{ebp}) = M_1(R_1(\mathit{ebp})) \land R_2(\mathit{ebp}) = R_1(\mathit{ebp}) \land R'(\mathit{esp}) = R_1(\mathit{esp}) + 8 \land R_2(\mathit{esp}) = R_1(\mathit{esp}) - 12$
  Result g2: $R'(\mathit{ebp}) = M_2(R_2(\mathit{ebp})) \land R'(\mathit{esp}) = R_2(\mathit{esp}) + 20$
  (sub leaves memory unchanged, so $M_2 = M_1$, and $R_1(\mathit{esp}) + 8 = R_2(\mathit{esp}) + 12 + 8$.)
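The three derived guarantees can be sanity-checked numerically: run the prolog from a concrete entry state, record the register file and memory after each instruction, then execute leave; ret and check that g, g0, g1, and g2 hold between each recorded state and the final one. This Python sketch is only a test on sample states, not a proof. It assumes (hypothetically) that the elided body of basic block 2 leaves ebp, esp, and the saved-ebp slot untouched. The last check uses R2(esp) + 20, which equals R1(esp) + 8 because R1(esp) = R2(esp) + 12.

```python
WORD = 4


def run_f(esp, ebp, ret_addr=0x40):
    """Run the prolog, then leave; ret. Returns the (R, M) snapshots and final registers."""
    R = {"esp": esp, "ebp": ebp}
    M = {esp: ret_addr}                  # return address pushed by the caller's call
    snaps = [(dict(R), dict(M))]         # (R, M) at the entry of f

    R["esp"] -= WORD                     # push ebp
    M[R["esp"]] = R["ebp"]
    snaps.append((dict(R), dict(M)))     # (R0, M0)

    R["ebp"] = R["esp"]                  # mov esp, ebp
    snaps.append((dict(R), dict(M)))     # (R1, M1)

    R["esp"] -= 12                       # sub $12, esp
    snaps.append((dict(R), dict(M)))     # (R2, M2)

    # body of basic block 2 assumed not to touch ebp, esp or the saved slot
    R["esp"] = R["ebp"]                  # leave: mov ebp, esp ...
    R["ebp"] = M[R["esp"]]               # ... pop ebp
    R["esp"] += WORD
    R["esp"] += WORD                     # ret pops the return address (eip not modeled)
    return snaps, R


def guarantees_hold(esp, ebp):
    snaps, Rf = run_f(esp, ebp)          # Rf plays the role of R'
    (R, _), (R0, _), (R1, M1), (R2, M2) = snaps
    g = Rf["ebp"] == R["ebp"] and Rf["esp"] == R["esp"] + 4
    g0 = Rf["ebp"] == R0["ebp"] and Rf["esp"] == R0["esp"] + 8
    g1 = Rf["ebp"] == M1[R1["ebp"]] and Rf["esp"] == R1["esp"] + 8
    g2 = Rf["ebp"] == M2[R2["ebp"]] and Rf["esp"] == R2["esp"] + 20
    return g and g0 and g1 and g2


print(all(guarantees_hold(esp, ebp)
          for esp in range(400, 1024, 4) for ebp in (0, 123, 4096)))  # True
```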
Low-level Intermediate Language
Potential Benefits
• Hide some machine-specific details;
• Some optimizations could be done here (optional);
• Make the implementation simple and reusable
  – (*Note that this level is just a helper for generating code and proofs.*)
  – When targeting a different assembly logic, only the translation from this level needs to be added.
The Language
• Loc.    l    ::= r | s
• Int.    o, b ::= n (integer)
• Slot.   s    ::= local(o) | incoming(o) | outgoing(o)
• Reg.    r    ::= r1 | r2 | r3 | …   // infinite pseudo-registers
• Instr.  i    ::= bop(bop, l1, l2, l) | uop(uop, l1, l) | load(r, o, l) | store(l, r, o)
               | getstack(s, r) | setstack(r, s) | call(id, l) | return r
               | malloc(r) | free(r) | goto b | label(b) | cond(l1, cmp, l2, btrue)
• BinOp.  bop  ::= add | sub | mul | …
• UnOp.   uop  ::= minus | …
• Comp.   cmp  ::= gt | ge | eq | ne | lt | le
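One way to represent this language is with small immutable Python dataclasses; only a few constructors are shown. Since the point of the IL is to hide machine-specific details, the sketch also includes a hypothetical mapping from abstract slots to ebp/esp-relative x86 addresses, consistent with the frame layout on the “x86” slides (incoming parameters from ebp + 8, locals below ebp, outgoing arguments at the bottom of the frame). The class names, the byte unit for slot offsets, and the layout itself are our assumptions, not CComp's.

```python
from dataclasses import dataclass
from typing import Union

WORD = 4


@dataclass(frozen=True)
class Reg:          # pseudo-register r1, r2, ... (unbounded supply)
    n: int


@dataclass(frozen=True)
class Slot:         # local(o) | incoming(o) | outgoing(o); o in bytes (assumed)
    kind: str
    ofs: int


Loc = Union[Reg, Slot]


@dataclass(frozen=True)
class Bop:          # bop(bop, l1, l2, l): l := l1 bop l2
    op: str
    l1: Loc
    l2: Loc
    dst: Loc


@dataclass(frozen=True)
class Cond:         # cond(l1, cmp, l2, btrue): jump to btrue if l1 cmp l2
    l1: Loc
    cmp: str
    l2: Loc
    btrue: int


def slot_address(s: Slot, frame_size: int):
    """Hypothetical slot layout: returns (base register, byte offset)."""
    if s.kind == "incoming":    # caller's pushed parameters, above old ebp/eip
        return ("ebp", 2 * WORD + s.ofs)
    if s.kind == "local":       # locals occupy [ebp - frame_size, ebp)
        return ("ebp", -frame_size + s.ofs)
    if s.kind == "outgoing":    # arguments for callees, at the bottom of the frame
        return ("esp", s.ofs)
    raise ValueError(s.kind)


# r3 := incoming(0) + r1, i.e. "first parameter plus r1"
i = Bop("add", Slot("incoming", 0), Reg(1), Reg(3))
print(slot_address(i.l1, frame_size=12))               # ('ebp', 8)
print(slot_address(Slot("local", 4), frame_size=12))   # ('ebp', -8)
```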
Code Generation (optional)
• Do some optimizations that do not affect the proof, such as:
  – Branch tunneling
  – Dead code elimination
• Future optimizations
  – Other low-level optimizations may be done here.
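Branch tunneling on the IL can be sketched in a few lines of Python: a jump to a label that is immediately followed by an unconditional goto is redirected to that goto's target, following chains and stopping on cycles. Encoding instructions as tuples is an assumption made for brevity. In the example, the blocks behind labels 1 and 2 end up unreferenced, which is the kind of code dead code elimination would then remove.

```python
def tunnel(code):
    """Branch tunneling over a list of IL instructions encoded as tuples,
    e.g. ("label", 1), ("goto", 2), ("cond", "r1", "lt", "r2", 3)."""
    # direct[b] = c when label(b) is immediately followed by goto c
    direct = {}
    for ins, nxt in zip(code, code[1:]):
        if ins[0] == "label" and nxt[0] == "goto":
            direct[ins[1]] = nxt[1]

    def final(b):
        seen = set()
        while b in direct and b not in seen:   # stop on cycles such as L: goto L
            seen.add(b)
            b = direct[b]
        return b

    out = []
    for ins in code:
        if ins[0] == "goto":
            out.append(("goto", final(ins[1])))
        elif ins[0] == "cond":                 # the last field is btrue
            out.append(ins[:-1] + (final(ins[-1]),))
        else:
            out.append(ins)
    return out


code = [
    ("cond", "r1", "lt", "r2", 1),
    ("goto", 2),
    ("label", 1),
    ("goto", 2),
    ("label", 2),
    ("goto", 3),
    ("label", 3),
    ("return", "r1"),
]
for ins in tunnel(code):
    print(ins)   # the cond and every goto now target label 3 directly
```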