Paradyn Project
Paradyn / Dyninst Week
Madison, Wisconsin
April 12-14, 2004



Program Provenance: Guessing the Source Compiler from Binary Code

Nathan Rosenblum

Why compiler provenance?


IDA Pro

Why should this work?


    test edi,edi
    jle 4004ae <bar+0x16>
    mov eax,0x0
    lea eax,[rdx+rax]
    imul edx,eax
    add eax,0x1
    cmp edi,eax
    jg 4004a1 <bar+0x9>
    mov eax,edx
    ret

    xor edx,edx
    test edi,edi
    jle 400989 <bar+0x11>
    add edx,eax
    imul eax,edx
    inc edx
    cmp edx,edi
    jl 40097e <bar+0x6>
    ret

    int bar(int foo) {
        int i, j;

        for (i = 0; i < foo; ++i) {
            i = j + i;
            j *= i;
        }
        return j;
    }

GCC ICC
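The contrast between the two listings can be made concrete with a small sketch. It is only an illustration of why compiler output might be distinguishable, not the method used in this work: it compares the opcode mnemonics of the two listings, taken in the order they appear on the slide. The names `LISTING_A`, `LISTING_B`, `mnemonics`, and `compare` are hypothetical.

```python
# Illustrative only: compare which mnemonics each compiled version of bar() uses.
# LISTING_A / LISTING_B are the two listings in slide order (names are hypothetical).
LISTING_A = """\
test edi,edi
jle 4004ae <bar+0x16>
mov eax,0x0
lea eax,[rdx+rax]
imul edx,eax
add eax,0x1
cmp edi,eax
jg 4004a1 <bar+0x9>
mov eax,edx
ret
"""

LISTING_B = """\
xor edx,edx
test edi,edi
jle 400989 <bar+0x11>
add edx,eax
imul eax,edx
inc edx
cmp edx,edi
jl 40097e <bar+0x6>
ret
"""


def mnemonics(listing):
    """Return the set of opcode mnemonics (first token of each non-empty line)."""
    return {line.split()[0] for line in listing.splitlines() if line.strip()}


def compare(a, b):
    """Return (only in a, only in b, shared) mnemonics, each as a sorted list."""
    ma, mb = mnemonics(a), mnemonics(b)
    return sorted(ma - mb), sorted(mb - ma), sorted(ma & mb)


only_a, only_b, shared = compare(LISTING_A, LISTING_B)
print("only in first listing: ", only_a)   # ['jg', 'lea', 'mov']
print("only in second listing:", only_b)   # ['inc', 'jl', 'xor']
print("shared:                ", shared)
```

Even for this tiny function, each listing uses instructions the other does not (for example `add eax,0x1` versus `inc edx` for the loop increment), which is the kind of regularity a classifier could pick up on.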

Modeling binary code


[Figure: a program binary viewed as a sequence of compiler labels …, y_{i-1}, y_i, y_{i+1}, y_{i+2}, … over the underlying bytes. Each label is one of gcc, icc, or none.]

underlying bytes: … c7 04 24 10 70 05 08 ff d0 c9 c3 90 81 ec e4 00 00 00 8b b4 24 ec 00 00 00 …

padding: 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 90
addrs.: 80 4c 90 80 4c 94 80 4c 98 80 4c 9b
data: match_initzp_init_keysseekable
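As a rough illustration of the padding case, the sketch below scans a byte string for runs made up only of the padding bytes shown on the slide. In 32-bit mode the two 7-byte sequences decode as `lea` instructions that leave their register unchanged, and `90` is `nop`. Treating these three patterns as the complete set of padding forms is an assumption, and `PADDING_PATTERNS` and `padding_spans` are hypothetical names. A byte-level scan like this can also misfire on a `90` that happens to sit inside another instruction's operand, so it is a sketch rather than a reliable detector.

```python
# Assumption: these three byte patterns are the only padding forms of interest.
PADDING_PATTERNS = [
    bytes.fromhex("8db42600000000"),  # 7-byte lea that changes nothing (from the slide)
    bytes.fromhex("8dbc2700000000"),  # 7-byte lea that changes nothing (from the slide)
    bytes.fromhex("90"),              # 1-byte nop
]


def padding_spans(code: bytes):
    """Return (start, end) offsets of maximal runs built only from padding patterns."""
    spans = []
    i = 0
    while i < len(code):
        j = i
        # Greedily extend the run while any padding pattern starts at offset j.
        while True:
            for pat in PADDING_PATTERNS:
                if code.startswith(pat, j):
                    j += len(pat)
                    break
            else:
                break  # no pattern matched at j, so the run ends here
        if j > i:
            spans.append((i, j))
            i = j
        else:
            i += 1
    return spans


# Hypothetical stream: end of one function, the slide's padding, start of the next.
stream = bytes.fromhex("c9c3" "8db426000000008dbc270000000090" "81ece4000000")
spans = padding_spans(stream)
print(spans)  # [(2, 17)]
```

The 15 padding bytes between offsets 2 and 17 are exactly the region that would carry the label none rather than a compiler label.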

Describing code


⟨mov [IMM], RAX ; * ; sub [IMM], RAX⟩

mov, sub: each abstracts several IA32 opcodes
*: single-instruction wildcard
[IMM]: hide immediate values

Feature levels: instruction-level, control flow-level (branch)

Binary feature vector:
011101011010101010101110101001010101110001001001011010110011010101010101010010011110

Features:
⟨mov [IMM], RAX ; * ; sub [IMM], RAX⟩
⟨add [IMM], RDX ; * ; sub RAX, RCX⟩
⟨push EBP ; mov ESP, EBP⟩
⟨shl [IMM], RAX ; shr [IMM], RAX⟩
⟨* ; * ; sub [IMM], RAX⟩
…
[math elided]
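The instruction-level features can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the feature extractor used in this work: instructions are plain text in the operand order shown on the slide, mnemonics stand in for the opcode abstraction, and the helper names (`normalize`, `matches_at`, `feature_vector`) and the sample instruction sequence are hypothetical.

```python
import re

WILDCARD = "*"
# Matches hex or decimal literals, but not digits inside register names such as r8.
IMM = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")


def normalize(instr: str) -> str:
    """Hide immediate values and canonicalize case/spacing of one instruction."""
    mnemonic, _, operands = instr.strip().partition(" ")
    ops = [IMM.sub("[IMM]", op.strip()).upper()
           for op in operands.split(",") if op.strip()]
    return (mnemonic.lower() + " " + ", ".join(ops)).strip()


def matches_at(pattern, seq, k):
    """True if pattern matches seq starting at index k ('*' matches any one instruction)."""
    if k + len(pattern) > len(seq):
        return False
    return all(p == WILDCARD or p == s for p, s in zip(pattern, seq[k:]))


def feature_vector(patterns, instrs):
    """One bit per pattern: 1 if the pattern occurs anywhere in the instruction sequence."""
    seq = [normalize(i) for i in instrs]
    return [int(any(matches_at(p, seq, k) for k in range(len(seq))))
            for p in patterns]


# The five patterns listed on the slide.
PATTERNS = [
    ["mov [IMM], RAX", "*", "sub [IMM], RAX"],
    ["add [IMM], RDX", "*", "sub RAX, RCX"],
    ["push EBP", "mov ESP, EBP"],
    ["shl [IMM], RAX", "shr [IMM], RAX"],
    ["*", "*", "sub [IMM], RAX"],
]

# Hypothetical instruction sequence, for illustration only.
code = ["push ebp", "mov esp, ebp", "mov 0x10, rax", "add 0x4, rdx", "sub 0x8, rax"]
vector = feature_vector(PATTERNS, code)
print(vector)  # [1, 0, 1, 0, 1]
```

Hiding immediates and allowing wildcards lets one pattern cover many concrete byte sequences, so a fixed list of patterns yields a fixed-length bit vector for any piece of code.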


Results [Rosenblum, Miller, Zhu, PASTE ’10]


[Figure: results for single-compiler and mixed-compiler binaries across GCC, ICC, and MSVC, with a breakdown of error types.]

Percentages shown: 92.5%, 93.7%, 5.3%, 2.3%, 2.8%, 6.4%

Finer detail: compiler versions, optimization


Major versions? GCC 3.x vs 4.x: easy, 99%

Minor versions? GCC 4.2 vs 4.3: easy, 85-99%

Low optimization vs. high optimization? GCC -O0 vs -O3: easy, 99%

Highly optimized code? GCC -O2 vs -O3: hard, 60%

Future work


    int bar(int foo) {
        int i, j;

        for (i = 0; i < foo; ++i) {
            i = j + i;
            j *= i;
        }
        return j;
    }

...

01110101101010101010111010100101010111000100100101101011001101010101010101001001111010101110100101101010
