Paradyn Project
Paradyn / Dyninst Week, Madison, Wisconsin, April 12-14, 2004
Program Provenance: Guessing the Source Compiler from Binary Code
Nathan Rosenblum
Why compiler provenance?
IDA Pro
Why should this work?
Source:
    int bar(int foo) {
        int i, j;
        for(i=0;i<foo;++i) {
            i = j + i;
            j *= i;
        }
        return j;
    }

GCC:
    test edi,edi
    jle 4004ae <bar+0x16>
    mov eax,0x0
    lea eax,[rdx+rax]
    imul edx,eax
    add eax,0x1
    cmp edi,eax
    jg 4004a1 <bar+0x9>
    mov eax,edx
    ret

ICC:
    xor edx,edx
    test edi,edi
    jle 400989 <bar+0x11>
    add edx,eax
    imul eax,edx
    inc edx
    cmp edx,edi
    jl 40097e <bar+0x6>
    ret
Modeling binary code
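[Figure: a program binary drawn as a sequence of regions. Each region carries a compiler label (gcc, icc, or none); neighboring labels are written … 𝑦ᵢ₋₁, 𝑦ᵢ, 𝑦ᵢ₊₁, 𝑦ᵢ₊₂ … Beneath the labels are the underlying bytes.]
Code bytes: … c7 04 24 10 70 05 08 ff d0 c9 c3 90 81 ec e4 00 00 00 8b b4 24 ec 00 00 00 …
Non-code regions in the binary:
    padding: 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 90
    addrs.:  80 4c 90  80 4c 94  80 4c 98  80 4c 9b
    data:    match_init  zp_init_keys  seekable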
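The slide draws the compiler labels as a chain in which each region's label sits next to its neighbors' labels. As a rough illustration of why linking neighbors helps, the sketch below decodes a label chain with the Viterbi algorithm. It is a stand-in, not the model used in the talk: the per-region scores, the fixed `switch_penalty`, and the function name `viterbi` are all assumptions made up for this example.

```python
# Labels taken from the slide; everything else here is hypothetical.
LABELS = ["gcc", "icc", "none"]

def viterbi(scores, switch_penalty=1.0):
    """Pick one label per region.

    scores: one dict per region mapping label -> score (higher is better).
    The result maximizes the sum of chosen scores minus switch_penalty for
    every place where two neighboring regions get different labels.
    """
    if not scores:
        return []
    best = {l: scores[0][l] for l in LABELS}   # best total ending in label l
    back = []                                   # back-pointers, one dict per step
    for s in scores[1:]:
        new_best, ptr = {}, {}
        for cur in LABELS:
            def total(prev):
                return best[prev] - (switch_penalty if prev != cur else 0.0)
            prev = max(LABELS, key=total)
            new_best[cur] = total(prev) + s[cur]
            ptr[cur] = prev
        best = new_best
        back.append(ptr)
    # Walk the back-pointers from the best final label.
    path = [max(LABELS, key=lambda l: best[l])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path

# Made-up scores: region 3 looks slightly like icc in isolation.
EXAMPLE_SCORES = [
    {"gcc": 2.0, "icc": 0.5, "none": 0.0},
    {"gcc": 2.0, "icc": 0.5, "none": 0.0},
    {"gcc": 0.8, "icc": 1.0, "none": 0.0},
    {"gcc": 2.0, "icc": 0.5, "none": 0.0},
    {"gcc": 0.0, "icc": 0.0, "none": 3.0},
]
print(viterbi(EXAMPLE_SCORES, switch_penalty=1.0))
print(viterbi(EXAMPLE_SCORES, switch_penalty=0.0))
```

With the penalty set to zero each region is labeled independently, so the noisy third region flips to icc; with a positive penalty the chain keeps it with its gcc neighbors while still switching to none where the evidence is strong.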
Describing code
Instruction-level features (idioms), for example:
    ⟨mov [IMM], RAX ; * ; sub [IMM], RAX⟩
    each opcode in an idiom abstracts several IA32 opcodes
    * is a single-instruction wildcard
    [IMM] hides immediate values
Control flow-level features: branch
……
Idioms shown on the slide alongside a feature bit vector:
    ⟨mov [IMM], RAX ; * ; sub [IMM], RAX⟩
    ⟨add [IMM], RDX ; * ; sub RAX, RCX⟩
    ⟨push EBP ; mov ESP, EBP⟩
    ⟨shl [IMM], RAX ; shr [IMM], RAX⟩
    ⟨* ; * ; sub [IMM], RAX⟩
    011101011010101010101110101001010101110001001001011010110011010101010101010010011110
[math elided]
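To make the idiom notation concrete, here is a minimal sketch of idiom matching over a disassembled instruction list, producing one bit per idiom. It is not the talk's feature extractor: the helper names (`normalize`, `matches_at`, `feature_vector`), the regex used to hide immediates, the `ADDR` placeholder for branch targets, and the four example idioms are all assumptions chosen for illustration, and operands are kept in the Intel syntax of the `bar` listings earlier in the talk rather than in the slide's idiom notation.

```python
import re

# Crude immediate matcher (hex like 0x1c or plain decimal). An assumption:
# a real tool would parse operands properly.
IMM = re.compile(r"\b(0x[0-9a-f]+|\d+)\b")

def normalize(insn):
    """Hide immediates and branch targets: 'add eax,0x1' -> 'add eax,IMM'."""
    insn = insn.strip().lower()
    mnemonic = insn.split()[0]
    if mnemonic.startswith("j") or mnemonic == "call":
        return mnemonic + " ADDR"
    return IMM.sub("IMM", insn)

def matches_at(idiom, insns, start):
    """True if idiom matches at insns[start:]; '*' matches any one instruction."""
    if start + len(idiom) > len(insns):
        return False
    return all(p == "*" or p == insns[start + k] for k, p in enumerate(idiom))

def feature_vector(idioms, insns):
    """One bit per idiom: 1 if the idiom occurs anywhere in the sequence."""
    norm = [normalize(i) for i in insns]
    return [int(any(matches_at(idiom, norm, s) for s in range(len(norm))))
            for idiom in idioms]

# The two compilations of bar() from the earlier slide.
GCC_BAR = ["test edi,edi", "jle 4004ae <bar+0x16>", "mov eax,0x0",
           "lea eax,[rdx+rax]", "imul edx,eax", "add eax,0x1",
           "cmp edi,eax", "jg 4004a1 <bar+0x9>", "mov eax,edx", "ret"]
ICC_BAR = ["xor edx,edx", "test edi,edi", "jle 400989 <bar+0x11>",
           "add edx,eax", "imul eax,edx", "inc edx",
           "cmp edx,edi", "jl 40097e <bar+0x6>", "ret"]

# Hypothetical idioms, written for this example only.
IDIOMS = [
    ("add eax,IMM", "cmp edi,eax"),
    ("inc edx", "cmp edx,edi"),
    ("imul edx,eax", "*", "cmp edi,eax"),
    ("test edi,edi", "jle ADDR"),
]

print(feature_vector(IDIOMS, GCC_BAR))
print(feature_vector(IDIOMS, ICC_BAR))
```

The last idiom fires for both listings, so it carries no information about the compiler, while the first three separate the two; a classifier would need to learn which idioms are discriminative.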
Results [R, Miller, Zhu PASTE ‘10]
[Figure: feature bit vectors for single-compiler and mixed-compiler binaries, evaluated over GCC, ICC, and MSVC, with a breakdown of error types. Values shown on the slide: 92.5%, 93.7%, 5.3%, 2.3%, 2.8%, 6.4%.]
Finer detail: compiler versions, optimization
Question                                   Example           Result
Major versions?                            GCC 3.x vs 4.x    easy, 99%
Minor versions?                            GCC 4.2 vs 4.3    easy, 85-99%
Low optimization vs. high optimization?    GCC -O0 vs -O3    easy, 99%
Highly optimized code?                     GCC -O2 vs -O3    hard, 60%
Future work
int bar(int foo) {
    int i, j;
    for(i=0;i<foo;++i) {
        i = j + i;
        j *= i;
    }
    return j;
}
...
01110101101010101010111010100101010111000100100101101011001101010101010101001001111010101110100101101010