Process Virtualization and Symbiotic Optimization
Kim Hazelwood
ACACES Summer School
July 2009
ACACES 2009 – Process Virtualization
About Your Instructor
Currently
– Assistant Professor at University of Virginia
– Faculty Consultant at Intel
Previously
– PostDoc at Intel (2004-2005)
– PhD from Harvard (2004)
– Four summer internships (HP & IBM)
– Worked with Dynamo, Jikes RVM, …
Other Interests
– Marathons (Boston, NYC, Disney)
– Reality TV Shows
– Family (8 month old at home!)
About the Course
• Day 1 – What is Process Virtualization?
• Day 2 – Building Process Virtualization Systems
• Day 3 – Using Process Virtualization Systems
• Day 4 – Symbiotic Optimization
• We’ll use Pin as a case study: www.pintool.org
• You’ll have homework!
What is Process Virtualization?
System virtualization – allows multiple OSes to share the same hardware
Process virtualization – runs as a normal application (on top of an OS) and supports a single process
[Figure: two software stacks. System virtualization: HW, VMM, OS1 and OS2, App1 and App2. Process virtualization: HW, OS, DBT and DBI, App1 and App2.]
Classifying Virtualization
Dynamic binary optimization (x86 → x86--)
• Complement the static compiler
– User inputs, phases, DLLs, hardware features
– Examples: DynamoRIO, Mojo, Strata
Dynamic translation (x86 → PPC)
• Convert applications to run on a new architecture
– Examples: Rosetta, Transmeta CMS, DAISY
Dynamic instrumentation (x86 → x86++)
• Inspect/add features to existing applications
– Examples: Pin, Valgrind
A Simple Example of Instrumentation
Inserting extra code into a program to collect runtime information
counter++;
sub $0xff, %edx
counter++;
cmp %esi, %edx
counter++;
jle <L1>
counter++;
mov $0x1, %edi
counter++;
add $0x10, %eax
Instruction Count Output
$ /bin/ls
Makefile imageload.out itrace proccount imageload inscount atrace itrace.out
$ pin -t inscount.so -- /bin/ls
Makefile imageload.out itrace proccount imageload inscount atrace itrace.out
Count 422838
A Simple Example of Optimization
On Pentium 3, inc is faster than add:
sub $0xff, %edx
cmp %esi, %edx
jle <L1>
mov $0x1, %edi
inc %eax
On Pentium 4, add is faster than inc:
sub $0xff, %edx
cmp %esi, %edx
jle <L1>
mov $0x1, %edi
add $0x1, %eax
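The per-processor choice between inc and add can be sketched as a tiny rewriting pass. This is only an illustration under stated assumptions: instructions are plain strings, and the names `Cpu` and `retarget` are hypothetical, not part of any tool mentioned in these slides. A real optimizer would also have to check flag liveness, because inc leaves the carry flag untouched while add updates it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical target CPUs for this sketch.
enum class Cpu { Pentium3, Pentium4 };

// Rewrite "inc R" <-> "add $0x1, R" so the form preferred on `cpu` is used.
// Assumption: instructions are AT&T-syntax strings spelled exactly as on the slide.
std::vector<std::string> retarget(const std::vector<std::string>& code, Cpu cpu) {
    const std::string inc = "inc ";
    const std::string add1 = "add $0x1, ";
    std::vector<std::string> out;
    out.reserve(code.size());
    for (const std::string& ins : code) {
        if (cpu == Cpu::Pentium4 && ins.compare(0, inc.size(), inc) == 0) {
            out.push_back(add1 + ins.substr(inc.size()));   // inc -> add $0x1
        } else if (cpu == Cpu::Pentium3 && ins.compare(0, add1.size(), add1) == 0) {
            out.push_back(inc + ins.substr(add1.size()));   // add $0x1 -> inc
        } else {
            out.push_back(ins);                             // everything else is copied unchanged
        }
    }
    assert(out.size() == code.size());  // the pass never adds or drops instructions
    return out;
}
```

Calling `retarget` on the Pentium 3 sequence with `Cpu::Pentium4` turns the final `inc %eax` into `add $0x1, %eax` and leaves the other four instructions alone.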
Research Applications
Computer Architecture
• Trace Generation
• Fault Tolerance Studies
• Emulating New Instructions
Program Analysis
• Code coverage
• Call-graph generation
• Memory-leak detection
• Instruction profiling
Multicore
• Thread analysis
– Thread profiling
– Race detection
• Cache simulations
Compilers
• Compare programs from competing compilers
Security
• Add security checks and features
Approaches
• Source modification:
– Modify source programs
• Binary modification:
– Modify executables directly
Advantages for binary modification
• Language independent
• Machine-level view
• Modify legacy/proprietary software
Static vs Dynamic Approaches
Dynamic approaches are more robust
• No need to recompile or relink
• Discover code at runtime
• Handle dynamically-generated code
• Attach to running processes
The Code Discovery Problem on x86
[Figure: an x86 code region laid out as Instr 1, Instr 2, Instr 3, JumpReg, DATA, Instr 5, Instr 6, Uncond Branch, PADDING, Instr 8. Annotations: indirect jump to ??; data interspersed with code; pad for alignment.]
Dynamic Modification: Approaches
JIT Mode
• Create a modified copy of the application on-the-fly
• Original code never executes
More flexible, more common approach
Probe Mode
• Modify the original application instructions
• Insert jumps to modified code (trampolines)
Lower overhead (less flexible) approach
JIT-Mode Binary Modification
Generate and cache modified copies of instructions
Modified (cached) instructions are executed in lieu of original instructions
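[Figure: the EXE feeds a Transform step that fills the Code Cache; code in the cache is passed to Execute, and a Profile step feeds back into Transform.]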
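The generate-and-cache loop can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyCodeCache`, `Block` and `Translator` are hypothetical names, a "translated block" is just a callable that returns the next original address, and address 0 stands for program exit. It is not how Pin or any other system is implemented; it only shows that each block is translated once and then executed from the cache.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

using Addr = std::uint32_t;
// A translated block: running it returns the next original address (0 means "exit").
using Block = std::function<Addr()>;
// The translator builds the modified copy of the block that starts at an original address.
using Translator = std::function<Block(Addr)>;

class ToyCodeCache {
public:
    explicit ToyCodeCache(Translator translate) : translate_(std::move(translate)) {
        assert(static_cast<bool>(translate_));  // a translator is required
    }

    // Execute from `entry`: translate on a cache miss, reuse the cached copy on a hit.
    void run(Addr entry) {
        Addr pc = entry;
        while (pc != 0) {
            auto it = cache_.find(pc);
            if (it == cache_.end()) {
                it = cache_.emplace(pc, translate_(pc)).first;  // miss: generate and cache
                ++translations_;
            }
            ++executions_;
            pc = it->second();  // only the cached copy runs, never the original code
        }
    }

    std::uint64_t translations() const { return translations_; }
    std::uint64_t executions() const { return executions_; }

private:
    Translator translate_;
    std::unordered_map<Addr, Block> cache_;
    std::uint64_t translations_ = 0;
    std::uint64_t executions_ = 0;
};
```

For a two-block loop that runs three times, the sketch performs two translations and six block executions, which is why translation cost is amortized in hot code.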
JIT-Mode Instrumentation
[Figure: original code (blocks 1 to 7) beside the code cache (blocks 1’, 2’, 7’). Pin fetches the trace starting at block 1 and starts instrumentation. Exits point back to the VMM.]
JIT-Mode Instrumentation
[Figure: original code (blocks 1 to 7) beside the code cache (blocks 1’, 2’, 7’). Pin transfers control into the code cache (block 1).]
JIT-Mode Instrumentation
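[Figure: original code (blocks 1 to 7) beside the code cache. Pin fetches and instruments a new trace (blocks 3’, 5’, 6’), which joins blocks 1’, 2’, 7’ in the code cache; trace linking connects the traces.]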
Instrumentation Approaches
JIT Mode
• Create a modified copy of the application on-the-fly
• Original code never executes
More flexible, more common approach
Probe Mode
• Modify the original application instructions
• Insert jumps to instrumentation code (trampolines)
Lower overhead (less flexible) approach
A Sample Probe
• A probe is a jump instruction that overwrites original instruction(s) in the application
– Copy/translate original bytes so probed functions can be called
Original function entry point:
0x400113d4: push %ebp
0x400113d5: mov %esp,%ebp
0x400113d7: push %edi
0x400113d8: push %esi
0x400113d9: push %ebx
Entry point overwritten with probe:
0x400113d4: jmp 0x41481064
0x400113d9: push %ebx
Copy of entry point w/ original bytes:
0x50000004: push %ebp
0x50000005: mov %esp,%ebp
0x50000007: push %edi
0x50000008: push %esi
0x50000009: jmp 0x400113d9
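The jump in the listing spans 0x400113d4 to 0x400113d9, which matches the five-byte IA-32 near jump (opcode 0xE9 followed by a 32-bit displacement relative to the next instruction). The sketch below computes those bytes for the two jumps shown. The function names `encode_jmp_rel32` and `decode_jmp_target` are hypothetical helpers written for this illustration; they only build and read a byte array and do not patch any running process.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encode "jmp to" as it would appear at address `from`: 0xE9 + little-endian rel32.
// The displacement is relative to the address of the next instruction (from + 5)
// and wraps modulo 2^32, so backward jumps work as well.
std::array<std::uint8_t, 5> encode_jmp_rel32(std::uint32_t from, std::uint32_t to) {
    const std::uint32_t rel = to - (from + 5u);
    return {{0xE9,
             static_cast<std::uint8_t>(rel & 0xFFu),
             static_cast<std::uint8_t>((rel >> 8) & 0xFFu),
             static_cast<std::uint8_t>((rel >> 16) & 0xFFu),
             static_cast<std::uint8_t>((rel >> 24) & 0xFFu)}};
}

// Recover the jump target from the encoded bytes (inverse of the function above).
std::uint32_t decode_jmp_target(std::uint32_t from, const std::array<std::uint8_t, 5>& bytes) {
    assert(bytes[0] == 0xE9);  // only the rel32 near-jump form is handled here
    const std::uint32_t rel = static_cast<std::uint32_t>(bytes[1]) |
                              (static_cast<std::uint32_t>(bytes[2]) << 8) |
                              (static_cast<std::uint32_t>(bytes[3]) << 16) |
                              (static_cast<std::uint32_t>(bytes[4]) << 24);
    return from + 5u + rel;
}
```

With the addresses from the listing, `encode_jmp_rel32(0x400113d4, 0x41481064)` gives the probe bytes, and `encode_jmp_rel32(0x50000009, 0x400113d9)` gives the jump that ends the copied entry sequence and returns to the first instruction the probe did not overwrite.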
Probe Instrumentation
Advantages:
• Low overhead – few percent
• Less intrusive – execute original code
Disadvantages:
• More tool writer responsibility
• Restrictions on where to modify (routine-level)
Probe Tool Writer Responsibilities
No control flow into the instruction space where probe is placed
• 6 bytes on IA32, 7 bytes on Intel64, bundle on IA64
• Branch into “replaced” instructions will fail
• Probes at function entry point only
Thread safety for insertion/deletion of probes
• During image load callback is safe
• Only loading thread has a handle to the image
Replacement function has same behavior as original
Probe vs. JIT Summary
                     Probes                                JIT
Overhead             Few percent                           50% or higher
Intrusive            Low                                   High
Granularity          Function boundary                     Instruction
Safety & Isolation   More responsibility for tool writer   High
Process Virtualization Systems
Readily Available
• DynamoRIO
• Valgrind
• Pin
Available By Request
• Strata
• Adore
Unavailable
• Transmeta CMS
• Dynamo
DynamoRIO
Valgrind
Pin
Intel Pin
Dynamic Instrumentation:
• Do not need source code, recompilation, post-linking
Programmable Instrumentation:
• Provides rich APIs to write in C/C++ your own instrumentation tools (called Pintools)
Multiplatform:
• Supports x86, x86-64, Itanium, Xscale
• Supports Linux, Windows, MacOS
Robust:
• Instruments real-life applications: Database, web browsers, …
• Instruments multithreaded applications
• Supports signals
Efficient:
• Applies compiler optimizations on instrumentation code
Using Pin
Launch and instrument an application
$ pin -t pintool.so -- application
– pin: instrumentation engine (provided in the kit)
– pintool.so: instrumentation tool (write your own, or use one provided in the kit)
Attach to and instrument an application
$ pin -t pintool.so -pid 1234
Pin Instrumentation APIs
Basic APIs are architecture independent:
• Provide common functionalities like determining:
– Control-flow changes
– Memory accesses
Architecture-specific APIs
• e.g., Info about opcodes and operands
Call-based APIs:
• Instrumentation routines
• Analysis routines
Instrumentation vs. Analysis
Concepts borrowed from the ATOM tool:
Instrumentation routines define where instrumentation is inserted
• e.g., before instruction
• Occurs first time an instruction is executed
Analysis routines define what to do when instrumentation is activated
• e.g., increment counter
• Occurs every time an instruction is executed
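The difference in how often the two kinds of routine run can be shown with a small stand-alone model. `ToyEngine` is a hypothetical class invented for this sketch and is not Pin's API: the instrumentation routine is consulted the first time an address is executed and returns the analysis routine to attach, and that analysis routine then runs on every execution of the address.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>

using Addr = std::uint32_t;
using Analysis = std::function<void()>;  // what to do when the instrumentation fires

// Hypothetical toy engine, for illustration only.
class ToyEngine {
public:
    // Instrumentation routine: decides what (if anything) to insert at an instruction.
    std::function<Analysis(Addr)> instrument;

    // Model the execution of the instruction at `ip`.
    void execute(Addr ip) {
        auto it = inserted_.find(ip);
        if (it == inserted_.end()) {
            assert(static_cast<bool>(instrument));            // must be registered first
            it = inserted_.emplace(ip, instrument(ip)).first;  // first execution only
        }
        if (it->second) it->second();                          // every execution
    }

private:
    std::unordered_map<Addr, Analysis> inserted_;
};
```

Running a three-instruction loop ten times in this model invokes the instrumentation routine 3 times and the analysis routine 30 times, so the analysis routine is the part worth keeping small.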
Pintool 1: Instruction Count
counter++;
sub $0xff, %edx
counter++;
cmp %esi, %edx
counter++;
jle <L1>
counter++;
mov $0x1, %edi
counter++;
add $0x10, %eax
Pintool 1: Instruction Count Output
$ /bin/ls
Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out
$ pin -t inscount0.so -- /bin/ls
Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out
Count 422838
ManualExamples/inscount0.cpp
#include <iostream>
#include "pin.H"

UINT64 icount = 0;

// Analysis routine: runs every time an instruction executes
void docount() { icount++; }

// Instrumentation routine: inserts a call to docount before each instruction
void Instruction(INS ins, void *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END);
}

// Called when the application exits
void Fini(INT32 code, void *v) {
    std::cerr << "Count " << icount << std::endl;
}

int main(int argc, char * argv[]) {
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();
    return 0;
}
Pintool 2: Instruction Trace
printip(ip);
sub $0xff, %edx
printip(ip);
cmp %esi, %edx
printip(ip);
jle <L1>
printip(ip);
mov $0x1, %edi
printip(ip);
add $0x10, %eax
Need to pass the ip argument to the analysis routine (printip())
Pintool 2: Instruction Trace Output
$ pin -t itrace.so -- /bin/ls
Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out
$ head -4 itrace.out
0x40001e90
0x40001e91
0x40001ee4
0x40001ee5
ManualExamples/itrace.cpp
#include <stdio.h>
#include "pin.h"

FILE * trace;

// Analysis routine: print the instruction pointer
void printip(void *ip) { fprintf(trace, "%p\n", ip); }

// Instrumentation routine; IARG_INST_PTR is the argument passed to the analysis routine
void Instruction(INS ins, void *v) {
    INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip, IARG_INST_PTR, IARG_END);
}

void Fini(INT32 code, void *v) { fclose(trace); }

int main(int argc, char * argv[]) {
    trace = fopen("itrace.out", "w");
    PIN_Init(argc, argv);
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();
    return 0;
}
Examples of Arguments to Analysis Routine
IARG_INST_PTR
– Instruction pointer (program counter) value
IARG_UINT32 <value>
– An integer value
IARG_REG_VALUE <register name>
– Value of the register specified
IARG_BRANCH_TARGET_ADDR
– Target address of the branch instrumented
IARG_MEMORY_READ_EA
– Effective address of a memory read
And many more … (refer to the manual for details)
Instrumentation Points
Instrument points relative to an instruction:
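• Before: IPOINT_BEFORE
• After:
– Fall-through edge: IPOINT_AFTER
– Taken edge: IPOINT_TAKEN_BRANCH
cmp %esi, %edx
jle <L1>
mov $0x1, %edi
<L1>: mov $0x8,%edi
[Figure: count() is shown at three points around jle <L1>: before the instruction, on the fall-through edge to mov $0x1, %edi, and on the taken edge to <L1>.]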
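How the three points relate at run time can be sketched by simulating the `cmp %esi, %edx; jle <L1>` pair. The struct and function names below are hypothetical and the code does not use Pin; it only models the rule that the "before" point fires on every execution of the branch, while exactly one of the two "after" points fires, depending on whether the branch is taken.

```cpp
#include <cassert>
#include <cstdint>

// One counter per instrumentation point of the branch (names are illustrative).
struct BranchCounts {
    std::uint64_t before = 0;  // models IPOINT_BEFORE: every execution
    std::uint64_t after = 0;   // models IPOINT_AFTER: fall-through edge
    std::uint64_t taken = 0;   // models IPOINT_TAKEN_BRANCH: taken edge
};

// Simulate one execution of "cmp %esi, %edx; jle <L1>".
// In AT&T syntax the compare evaluates edx - esi, so jle is taken when edx <= esi (signed).
void run_jle(std::int32_t edx, std::int32_t esi, BranchCounts& c) {
    ++c.before;
    if (edx <= esi) {
        ++c.taken;   // control continues at <L1>
    } else {
        ++c.after;   // control falls through to the next instruction
    }
    assert(c.before == c.after + c.taken);  // holds when all counters start at zero
}
```

The invariant in the last line is a useful sanity check for a real tool as well: the two edge counts of a conditional branch should sum to its execution count.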
Instrumentation Granularity
Instrumentation can be done at three different granularities:
• Instruction
• Basic block
– A sequence of instructions terminated at a control-flow changing instruction
– Single entry, single exit
• Trace
– A sequence of basic blocks terminated at an unconditional control-flow changing instruction
– Single entry, multiple exits
sub $0xff, %edx
cmp %esi, %edx
jle <L1>
mov $0x1, %edi
add $0x10, %eax
jmp <L2>
1 Trace, 2 BBs, 6 insts
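The two definitions can be turned into a short grouping routine. This is a sketch that follows only the definitions on the slide; real trace selection in a system like Pin may apply further rules that are not covered here. `Kind`, `Trace` and `split_into_traces` are hypothetical names for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Coarse instruction classification, enough for the slide's definitions.
enum class Kind { Plain, CondBranch, UncondBranch };

// A trace is represented by the instruction count of each of its basic blocks.
using Trace = std::vector<std::size_t>;

// Group a straight-line listing into traces of basic blocks:
//  - any control-flow changing instruction ends the current basic block;
//  - an unconditional one also ends the current trace.
std::vector<Trace> split_into_traces(const std::vector<Kind>& code) {
    std::vector<Trace> traces;
    Trace current;
    std::size_t bbl_size = 0;
    for (Kind k : code) {
        ++bbl_size;
        if (k == Kind::Plain) continue;
        current.push_back(bbl_size);  // branch closes the basic block
        bbl_size = 0;
        if (k == Kind::UncondBranch) {
            traces.push_back(current);  // unconditional branch closes the trace
            current.clear();
        }
    }
    if (bbl_size > 0) current.push_back(bbl_size);      // trailing block without a branch
    if (!current.empty()) traces.push_back(current);    // trailing trace without a jmp

    std::size_t total = 0;
    for (const Trace& t : traces)
        for (std::size_t n : t) total += n;
    assert(total == code.size());  // every instruction lands in exactly one block
    return traces;
}
```

Feeding it the six-instruction listing above (two plain instructions, jle, two plain instructions, jmp) yields one trace made of two basic blocks with three instructions each, matching "1 Trace, 2 BBs, 6 insts".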
Pintool 3: Faster Instruction Count
counter += 3
sub $0xff, %edx
cmp %esi, %edx
jle <L1>
counter += 2
mov $0x1, %edi
add $0x10, %eax
(one counter update per basic block, bbl)
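Why the per-block version is faster can be seen by counting analysis calls in a toy model. The sketch assumes a program described by the instruction count of each basic block plus the sequence of blocks executed; `CountResult`, `count_per_instruction` and `count_per_bbl` are hypothetical names, and no Pin API is involved. Both strategies report the same instruction count, but the per-block one makes one call per executed block instead of one per executed instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct CountResult {
    std::uint64_t icount = 0;          // instructions counted
    std::uint64_t analysis_calls = 0;  // how many times the analysis routine ran
};

// One analysis call per executed instruction (the first instruction counter).
CountResult count_per_instruction(const std::vector<std::uint32_t>& bbl_sizes,
                                  const std::vector<std::size_t>& path) {
    CountResult r;
    for (std::size_t b : path) {
        assert(b < bbl_sizes.size());
        for (std::uint32_t i = 0; i < bbl_sizes[b]; ++i) {
            r.icount += 1;
            ++r.analysis_calls;
        }
    }
    return r;
}

// One analysis call per executed basic block, adding the block's size.
CountResult count_per_bbl(const std::vector<std::uint32_t>& bbl_sizes,
                          const std::vector<std::size_t>& path) {
    CountResult r;
    for (std::size_t b : path) {
        assert(b < bbl_sizes.size());
        r.icount += bbl_sizes[b];
        ++r.analysis_calls;
    }
    return r;
}
```

For the two blocks above (sizes 3 and 2) executed alternately twice each, both functions count 10 instructions, with 10 analysis calls in the first case and 4 in the second.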
ManualExamples/inscount1.cpp
#include <stdio.h>
#include "pin.H"

UINT64 icount = 0;

// Analysis routine: add the number of instructions in the basic block
void docount(UINT32 c) { icount += c; }

// Instrumentation routine: insert one call per basic block of the trace
void Trace(TRACE trace, void *v) {
    for (BBL bbl = TRACE_BblHead(trace); BBL_Valid(bbl); bbl = BBL_Next(bbl)) {
        BBL_InsertCall(bbl, IPOINT_BEFORE, (AFUNPTR)docount,
                       IARG_UINT32, BBL_NumIns(bbl), IARG_END);
    }
}

void Fini(INT32 code, void *v) {
    fprintf(stderr, "Count %llu\n", (unsigned long long)icount);
}

int main(int argc, char * argv[]) {
    PIN_Init(argc, argv);
    TRACE_AddInstrumentFunction(Trace, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();
    return 0;
}
What Did We Learn Today?
• Overview of Process Virtualization
• Approaches
– Source vs. Binary
– Static vs. Dynamic
– JIT vs. Probes
• Three Available Systems
• Three Simple Examples
Want More Info?
• Read Jim Smith’s book: Virtual Machines
• Download one (or more) of them!
Pin www.pintool.org
DynamoRIO code.google.com/p/dynamorio
Valgrind www.valgrind.org
• Day 1 – What is Process Virtualization?
• Day 2 – Building Process Virtualization Systems
• Day 3 – Using Process Virtualization Systems
• Day 4 – Symbiotic Optimization