Program Provenance: Guessing the Source Compiler from Binary Code
Nathan Rosenblum
Paradyn Project, Paradyn / Dyninst Week, Madison, Wisconsin, April 12-14, 2004


Why compiler provenance?


IDA Pro

Why should this work?

The same source function, compiled by two compilers:

    int bar(int foo) {
        int i, j;

        for(i=0;i<foo;++i) {
            i = j + i;
            j *= i;
        }
        return j;
    }

GCC:

    test edi,edi
    jle 4004ae <bar+0x16>
    mov eax,0x0
    lea eax,[rdx+rax]
    imul edx,eax
    add eax,0x1
    cmp edi,eax
    jg 4004a1 <bar+0x9>
    mov eax,edx
    ret

ICC:

    xor edx,edx
    test edi,edi
    jle 400989 <bar+0x11>
    add edx,eax
    imul eax,edx
    inc edx
    cmp edx,edi
    jl 40097e <bar+0x6>
    ret
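One rough way to see why the two listings are distinguishable is to compare which instruction mnemonics each one uses. The sketch below is only an illustration, not the method from the talk: the helper names `mnemonics` and `mnemonic_diff` are hypothetical, and the listings are copied from the slide in the order it shows them (the slide labels them GCC and ICC).

```python
from collections import Counter

# Listings copied from the slide, in the order shown (labeled GCC, then ICC).
FIRST_LISTING = """\
test edi,edi
jle 4004ae <bar+0x16>
mov eax,0x0
lea eax,[rdx+rax]
imul edx,eax
add eax,0x1
cmp edi,eax
jg 4004a1 <bar+0x9>
mov eax,edx
ret
"""

SECOND_LISTING = """\
xor edx,edx
test edi,edi
jle 400989 <bar+0x11>
add edx,eax
imul eax,edx
inc edx
cmp edx,edi
jl 40097e <bar+0x6>
ret
"""


def mnemonics(listing):
    """Return the first token (the mnemonic) of every non-empty line."""
    return [line.split()[0] for line in listing.splitlines() if line.strip()]


def mnemonic_diff(a, b):
    """Return (mnemonics used more in a than b, mnemonics used more in b than a)."""
    count_a, count_b = Counter(mnemonics(a)), Counter(mnemonics(b))
    return sorted(count_a - count_b), sorted(count_b - count_a)


only_first, only_second = mnemonic_diff(FIRST_LISTING, SECOND_LISTING)
print(only_first)   # ['jg', 'lea', 'mov']
print(only_second)  # ['inc', 'jl', 'xor']
```

Even on this tiny function the two outputs differ in how they zero a register (`mov eax,0x0` against `xor edx,edx`) and how they increment the counter (`add eax,0x1` against `inc edx`), which is the kind of stylistic signal a classifier could pick up.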

Modeling binary code

[Figure: a program binary drawn as a sequence of regions. Each region carries a compiler label (y_{i-1}, y_i, y_{i+1}, y_{i+2}, ...) with values such as gcc, icc, or none. The labels sit above the underlying bytes:]

    … c7 04 24 10 70 05 08 ff d0 c9 c3 90 81 ec e4 00 00 00 8b b4 24 ec 00 00 00 …

The byte stream also contains:

    padding   8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 90
    addrs.    80 4c 90, 80 4c 94, 80 4c 98, 80 4c 9b
    data      match_init, zp_init_keys, seekable
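The per-region labeling in the figure can be pictured as a plain sequence of labels that is then collapsed into contiguous runs. The sketch below is a minimal illustration under assumptions: the function name `label_runs` and the sample label sequence are hypothetical (modeled on the gcc / icc / none values in the figure), and it does not show how the labels are actually inferred.

```python
from itertools import groupby


def label_runs(labels):
    """Collapse a per-region label sequence into (label, start_index, length) runs."""
    runs = []
    index = 0
    for label, group in groupby(labels):
        length = len(list(group))
        runs.append((label, index, length))
        index += length
    return runs


# Hypothetical label sequence for eight consecutive regions of one binary.
labels = ["gcc", "gcc", "gcc", "gcc", "icc", "icc", "icc", "none"]
runs = label_runs(labels)
print(runs)  # [('gcc', 0, 4), ('icc', 4, 3), ('none', 7, 1)]
```

A binary built by a single compiler would collapse to one run (plus any "none" regions), while a mixed-compiler binary yields several.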

Describing code

Code is described by idioms, short instruction patterns such as:

    ⟨mov [IMM], RAX ; * ; sub [IMM], RAX⟩

    mov      abstracts several IA32 opcodes
    *        single-instruction wildcard
    [IMM]    hide immediate values

Feature levels: instruction-level, control flow-level (branch), …

Each piece of code is then summarized as a bit vector over features:

    011101011010101010101110101001010101110001001001011010110011010101010101010010011110

    +

    ⟨mov [IMM], RAX ; * ; sub [IMM], RAX⟩
    ⟨add [IMM], RDX ; * ; sub RAX, RCX⟩
    ⟨push EBP ; mov ESP, EBP⟩
    ⟨shl [IMM], RAX ; shr [IMM], RAX⟩
    ⟨* ; * ; sub [IMM], RAX⟩
    [math elided]
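To make the idiom notation concrete, here is a minimal sketch of matching idioms with wildcards against an instruction sequence and turning the matches into a bit vector. It rests on several assumptions: instructions are plain strings in the slide's notation, immediates are hidden with a simple regular expression, and the names `hide_immediates`, `idiom_occurs`, and `feature_vector` are hypothetical. It is not the feature extractor used in the work.

```python
import re

WILDCARD = "*"

# Assumption: an immediate is a standalone hex (0x...) or decimal number.
_IMMEDIATE = re.compile(r"\b(?:0x[0-9a-fA-F]+|\d+)\b")


def hide_immediates(instruction):
    """Replace immediate operands with the placeholder [IMM]."""
    return _IMMEDIATE.sub("[IMM]", instruction)


def idiom_occurs(idiom, instructions):
    """True if the idiom matches some contiguous window; '*' matches any one instruction."""
    n = len(idiom)
    for start in range(len(instructions) - n + 1):
        window = instructions[start:start + n]
        if all(p == WILDCARD or p == ins for p, ins in zip(idiom, window)):
            return True
    return False


def feature_vector(idioms, instructions):
    """One bit per idiom: 1 if it occurs in the normalized sequence, else 0."""
    normalized = [hide_immediates(ins) for ins in instructions]
    return [int(idiom_occurs(idiom, normalized)) for idiom in idioms]


# Three of the idioms listed on the slide.
IDIOMS = [
    ("mov [IMM], RAX", "*", "sub [IMM], RAX"),
    ("push EBP", "mov ESP, EBP"),
    ("shl [IMM], RAX", "shr [IMM], RAX"),
]

# Hypothetical instruction sequence, written in the slide's notation.
code = ["push EBP", "mov ESP, EBP", "mov 0x10, RAX", "add RBX, RCX", "sub 0x4, RAX"]
vector = feature_vector(IDIOMS, code)
print(vector)  # [1, 1, 0]
```

Hiding immediates means `mov 0x10, RAX` and `mov 0x20, RAX` map to the same feature, and the wildcard lets an idiom tolerate one unrelated instruction between the two it cares about.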

Results [R, Miller, Zhu PASTE '10]

[Figure: results for single-compiler and mixed-compiler binaries across GCC, ICC, and MSVC, with a breakdown of error types. Percentages shown: 92.5%, 93.7%, 5.3%, 2.3%, 2.8%, 6.4%.]

Finer detail: compiler versions, optimization

Question                                   Comparison        Result
Major versions?                            GCC 3.x vs 4.x    easy, 99%
Minor versions?                            GCC 4.2 vs 4.3    easy, 85-99%
Low optimization vs. high optimization?    GCC -O0 vs -O3    easy, 99%
Highly optimized code?                     GCC -O2 vs -O3    hard, 60%

Future work

    int bar(int foo) {
        int i, j;

        for(i=0;i<foo;++i) {
            i = j + i;
            j *= i;
        }
        return j;
    }

    ...

    01110101101010101010111010100101010111000100100101101011001101010101010101001001111010101110100101101010