Persistent Code Caching
Exploiting Code Reuse Across Executions & Applications
Vijay Janapa Reddi†, Dan Connors‡, Robert Cohn§, Michael D. Smith†
†Harvard University, ‡University of Colorado at Boulder, §Intel Corporation
Runtime Compilation System
Execution environments that provide an interface to the dynamic instruction stream of an application
[Figure: Runtime Compilation Systems, alongside Program Introspection, Resource Management, and Process Managers]
Managing compilation overhead via software code caching
Overheads:
1. Runtime compilation
2. Performance of the compiled code
[Figure: original dynamic instruction stream A B C C A versus execution under code caching. The runtime system (RS) translates A, B, and C once each (A’, B’, C’); the later executions of C’ and A’ reuse the cached code. X-axis: execution time]
Basis: 90% of execution time is spent in 10% (hot) code
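A minimal C++ sketch of the code-caching idea in the figure, assuming traces are identified by their original start address. `CodeCache`, `TranslatedTrace`, and `translate` are hypothetical names, and the "translation" is a placeholder string rather than generated machine code. Each trace is compiled on its first execution and reused afterwards, which is how hot code amortizes the compilation overhead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Placeholder for a translated trace; a real runtime compiler would hold
// generated machine code here (hypothetical type for this sketch).
struct TranslatedTrace {
    std::uint64_t origAddr;
    std::string code;
};

// Software code cache: translate a trace the first time its original
// address is reached, then reuse the cached translation on later visits.
class CodeCache {
public:
    const TranslatedTrace& lookupOrTranslate(std::uint64_t addr) {
        auto it = cache_.find(addr);
        if (it != cache_.end()) {
            ++hits_;  // reuse of cached code: no compilation cost
            return it->second;
        }
        ++translations_;  // runtime compilation overhead is paid here
        return cache_.emplace(addr, translate(addr)).first->second;
    }
    std::size_t hits() const { return hits_; }
    std::size_t translations() const { return translations_; }

private:
    // Hypothetical stand-in for the runtime compiler.
    static TranslatedTrace translate(std::uint64_t addr) {
        return TranslatedTrace{addr, "translated@" + std::to_string(addr)};
    }
    std::unordered_map<std::uint64_t, TranslatedTrace> cache_;
    std::size_t hits_ = 0;
    std::size_t translations_ = 0;
};

// Drive the cache with a dynamic stream of trace start addresses.
inline void execute(CodeCache& cc, const std::vector<std::uint64_t>& stream) {
    for (std::uint64_t addr : stream) cc.lookupOrTranslate(addr);
}
```

Running the stream A B C C A (encoded as addresses 0xA, 0xB, 0xC) through `execute` costs three translations and yields two cache hits.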
Problem statement
There exist execution domains where code caching is ineffective, which limits the deployment of runtime compilation systems.
Highlight of this talk:
• Challenges in deploying dynamic binary instrumentation into production regression-testing environments
• Case study of the Oracle database
Caching performance varies based on program behavior
• 181.mcf: loop-intensive application
• 176.gcc: large code footprint & infrequent code reuse
[Figure: execution time of 176.gcc and 181.mcf, split into runtime compilation and code cache time]
Caching performance varies based on program behavior
[Figure: normalized execution time for Mcf, Eon, Vpr, Twolf, Gap, Bzip2, Gzip, Parser, Vortex, Crafty, Perl, and Gcc, ranging from loop-intensive programs with frequent reuse (Mcf) to large-footprint programs with infrequent reuse (Gcc)]
Benchmark 176.gcc is not an outlier
[Figure: normalized execution time, split into runtime compilation and code cache, for Oracle, Gedit, Dia, Gvim, File Roller, Gftp, and Gqview]
GUI applications:
• Large startup cost
• Library initialization code executed < 10 times
Code caching suffers under certain execution behaviors
• Less code reuse
• Large code footprint
• Short run times
Not uncommon!
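A simple cost model, assumed here for illustration rather than taken from the talk, shows why these behaviors hurt. If trace i costs c_i to translate, t_i per execution from the code cache, and runs n_i times, then:

```latex
T_{\text{cached}} = \sum_{i} \bigl( c_i + n_i\, t_i \bigr),
\qquad
\text{cost per execution of trace } i = t_i + \frac{c_i}{n_i}
```

When n_i is small (for example, library initialization code executed fewer than 10 times), the c_i / n_i term dominates; a large footprint adds many such terms, and a short run leaves little time to amortize them. A persistent cache moves c_i into an earlier run, leaving roughly the cost of loading the cache.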
Cold code is hot code across executions!!!
• Regression testing: Oracle (100,000 tests), Gcc (4000+ tests)
[Figure: execution time of 176.gcc across its 5 SPEC reference inputs]
Caching code across executions improves caching performance
[Figure: original dynamic instruction stream A B C C A. With ordinary code caching, run 1 and run 2 each pay runtime-system (RS) translation before A’, B’, and C’; with persistent caching, run 2 executes A’ B’ C’ C’ A’ directly from the stored cache. X-axis: execution time]
Reduce overhead by storing & reusing caches
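The store-and-reuse step can be sketched by giving a cache save and load operations. This is a hypothetical illustration, not Persistent Pin's on-disk format: `PersistentCodeCache` writes one text line per translation to any `std::ostream`, and a later run loads it before executing.

```cpp
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical persistent code cache: original trace address -> translation.
class PersistentCodeCache {
public:
    // Execute a stream of trace addresses, translating only cache misses.
    void run(const std::vector<std::uint64_t>& stream) {
        for (std::uint64_t addr : stream) {
            if (cache_.count(addr) == 0) {
                cache_[addr] = "translated@" + std::to_string(addr);
                ++translations_;
            }
        }
    }
    // Store the cache as "<addr> <translation>" lines. Real translated code
    // is binary; a text format only keeps the sketch readable.
    void save(std::ostream& out) const {
        for (const auto& [addr, code] : cache_) out << addr << ' ' << code << '\n';
    }
    // Reuse a previously stored cache in a later execution.
    void load(std::istream& in) {
        std::uint64_t addr;
        std::string code;
        while (in >> addr >> code) cache_[addr] = code;
    }
    std::size_t translations() const { return translations_; }
    std::size_t size() const { return cache_.size(); }

private:
    std::unordered_map<std::uint64_t, std::string> cache_;
    std::size_t translations_ = 0;
};
```

Saving after run 1 on the stream A B C C A and loading that cache into run 2 leaves run 2 with no translations at all; a `std::stringstream` can stand in for the cache file.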
Implementation Framework: Pin (dynamic binary instrumentation)
[Figure: Pin inside the application's address space, with the client, the interface, and the runtime system components (including the code cache), running on the operating system and hardware]
Appropriate system for evaluating persistence:
• General model
• Robust design
• Enterprise-scale usage
Persistent Pin
Persistent cache contents:
• Translated code
• Translation data structures
• Correctness metadata
[Figure: the Pin architecture extended with a persistence manager that saves caches to and loads them from a persistent cache database]
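The correctness metadata is what makes reuse safe: a cached translation is valid only if the code it was translated from is unchanged. A minimal sketch of such a check, assuming each module is identified by path, size, and modification time. These fields and all names below are hypothetical; the slides only state that correctness metadata is stored.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical identity of one binary or shared library whose code appears
// in a persistent cache. The chosen fields are assumptions for illustration.
struct ModuleStamp {
    std::string path;
    std::uint64_t sizeBytes;
    std::int64_t mtime;  // last-modification time, e.g. seconds since epoch
    bool operator==(const ModuleStamp& o) const {
        return path == o.path && sizeBytes == o.sizeBytes && mtime == o.mtime;
    }
};

// Correctness metadata saved next to the translated code and the
// translation data structures.
struct CacheMetadata {
    std::vector<ModuleStamp> modules;
};

// Reuse the persistent cache only if every module it was built from is
// present and unchanged in the current execution; otherwise discard it.
inline bool cacheIsValid(const CacheMetadata& meta,
                         const std::vector<ModuleStamp>& loaded) {
    return std::all_of(meta.modules.begin(), meta.modules.end(),
                       [&](const ModuleStamp& m) {
                           return std::find(loaded.begin(), loaded.end(), m) != loaded.end();
                       });
}
```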
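Experimental setup
• IA32 Linux implementation
• Bounded cache (320 MB)
• Applications ran unmodified
• No cache flushes occurred
[Figure: a run with input X starting from an empty cache produces persistent cache X; a timed run with input ? then starts from persistent cache X, and the improvement is measured. Experiments: same-input, cross-input, and cross-application persistence]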
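A sketch of how the three experiment kinds and the measured improvement could be expressed, assuming improvement is the percentage reduction in execution time relative to a run that starts from an empty cache. `RunConfig`, `Persistence`, `classify`, and `improvementPercent` are hypothetical names introduced here.

```cpp
#include <string>

// Which kind of persistence a timed run exercises, given the run that
// produced the persistent cache.
enum class Persistence { SameInput, CrossInput, CrossApplication };

// Hypothetical description of one run: application name and input name.
struct RunConfig {
    std::string application;
    std::string input;
};

inline Persistence classify(const RunConfig& cacheSource, const RunConfig& timedRun) {
    if (cacheSource.application != timedRun.application) return Persistence::CrossApplication;
    if (cacheSource.input != timedRun.input) return Persistence::CrossInput;
    return Persistence::SameInput;
}

// Percent improvement of the timed run over a baseline run that starts
// from an empty cache: 100 * (baseline - persistent) / baseline.
inline double improvementPercent(double baselineSeconds, double persistentSeconds) {
    return 100.0 * (baselineSeconds - persistentSeconds) / baselineSeconds;
}
```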
Exploiting code reuse across executions and applications
Persistent caching works across program classes
Code coverage: bull's eye (100% reuse)
[Figure: performance improvement (%) for SPEC 2000 INT with reference inputs (Mcf, Parser, Vortex, Twolf, Vpr, Crafty, Bzip2, Gap, Gzip, Perlbmk, Gcc) and for Oracle, File Roller, Dia, Gvim, and Gftp]
• Benefits large code footprint applications
• Persistent caching is complementary to the current code caching model
Persistent caching is effective for short-running applications
[Figure: performance improvement (%) with reference inputs and with training inputs for Mcf, Gzip, Bzip2, Crafty, Perlbmk, Twolf, Vpr, Parser, Vortex, Gap, and Gcc]
• Input data set alters program behavior
• Small improvements get bigger (Gap) and large improvements get even larger (Gcc)
Evaluating persistent caching across program inputs
[Figure: code coverage between inputs (50% to 100%) for Oracle, 175.vpr, 253.perlbmk, 176.gcc, 164.gzip, and 256.bzip2]
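Code coverage between inputs can be read as the share of the code a timed run executes that is already present in the persistent cache. Under that assumed definition, and identifying traces by start address (a simplification), a sketch with the hypothetical function `codeCoveragePercent`:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Percentage of the distinct traces executed by a run that are already
// present in the persistent cache built from a different input.
inline double codeCoveragePercent(const std::unordered_set<std::uint64_t>& cached,
                                  const std::vector<std::uint64_t>& executed) {
    std::unordered_set<std::uint64_t> distinct(executed.begin(), executed.end());
    if (distinct.empty()) return 100.0;  // nothing to translate: fully covered
    std::size_t covered = 0;
    for (std::uint64_t addr : distinct) covered += cached.count(addr);
    return 100.0 * static_cast<double>(covered) / static_cast<double>(distinct.size());
}
```

Traces counted as uncovered are the ones the timed run must still translate.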
Production environments require runtime system improvements
Case study: regression testing of Oracle XE (times are for one unit test!)
• Oracle: 80 s
• Oracle + Pin (translation): 2000 s
• Oracle + Pin (translation) + instrumentation (memory tracing): 3000 s
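Relative to native execution, these timings for a single unit test correspond to the following slowdowns:

```latex
\frac{2000\ \text{s}}{80\ \text{s}} = 25\times \ \text{(translation only)},
\qquad
\frac{3000\ \text{s}}{80\ \text{s}} = 37.5\times \ \text{(translation + memory tracing)}
```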
Challenges
Oracle is a multi-process programming environment
[Figure: Oracle’s execution phases: Start, Mount, Open, Work, Close]
1. Large number of process compilations
Processes exhibit code sharing
[Figure: Oracle’s execution phases (Start, Mount, Open, Work, Close); separate processes execute the same code sequence A C C B Z]
2. Redundant translations across processes
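The cost of redundant translations across processes can be shown with a toy count, assuming each process runs the same trace sequence A C C B Z (encoded as the hypothetical addresses 1, 3, 3, 2, 26). With private caches every process translates its four distinct traces; with one cache shared through persistence, each trace is translated once. `translationsPrivate` and `translationsShared` are illustrative names.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

using TraceStream = std::vector<std::uint64_t>;

// Translations when every process starts with its own empty code cache.
inline std::size_t translationsPrivate(const std::vector<TraceStream>& processes) {
    std::size_t total = 0;
    for (const TraceStream& p : processes) {
        std::unordered_set<std::uint64_t> cache(p.begin(), p.end());
        total += cache.size();  // one translation per distinct trace
    }
    return total;
}

// Translations when all processes consult one shared, persistent cache:
// a trace translated by any process is reused by the others.
inline std::size_t translationsShared(const std::vector<TraceStream>& processes) {
    std::unordered_set<std::uint64_t> shared;
    for (const TraceStream& p : processes) shared.insert(p.begin(), p.end());
    return shared.size();
}
```

For three such processes the private caches perform 12 translations, the shared cache 4.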
Every Oracle unit test starts a new instance of the database
[Figure: Oracle’s execution phases for unit test 1 (Start, Mount, Open, Unit-test 1, Close) and unit test 2 (Start, Mount, Open, Unit-test 2, Close); the unit-test phase is the only phase changing across all unit tests]
3. Redundant translations across unit tests: every unit test executes all phases
Leveraging persistence across processes
[Figure: elapsed time (s) of each Oracle phase (Start, Mount, Open, Work, Close) with no persistence, and the mean value using different persistent caches]
• Persistent cache (Start): low code coverage (15%)
• Persistent cache (Open): high code coverage (77%)
Persistent Cache Accumulation (PCA) addresses limited code coverage
[Figure: a run with input X from an empty cache produces persistent cache X; a run with input Y starting from persistent cache X accumulates code into persistent cache X+Y; the timed run with input Z starts from persistent cache X+Y]
Accumulate code across executions
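Accumulation can be sketched as a union of cache contents: the traces translated during the run with input Y are added to persistent cache X to form X+Y. `PersistentCache` and `accumulate` are hypothetical. This sketch keeps the existing translation when both caches contain the same trace; a real implementation would also have to merge translation data structures and correctness metadata.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical persistent cache contents: original trace address -> translation.
using PersistentCache = std::unordered_map<std::uint64_t, std::string>;

// Persistent cache accumulation: fold the traces translated during a new
// run into an existing persistent cache, so cache X becomes cache X+Y.
inline void accumulate(PersistentCache& into, const PersistentCache& newRun) {
    for (const auto& entry : newRun) {
        into.insert(entry);  // insert() leaves an existing entry untouched
    }
}
```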
Persistent Cache Accumulation (PCA) improves unit-test performance
[Figure: elapsed time (s) with no instrumentation and with instrumentation, without persistence and with accumulated persistent caches: Startup, Startup+Mount, Startup+Mount+Open, Startup+Mount+Open+Work, Startup+Mount+Open+Work+Close]
Performance improves with more accumulation of code
Contributions: Improved code caching
[Figure: levels of reuse: intra-execution, inter-execution, inter-application]
• Cold code is hot code!
• Persistence is effective for less code reuse, short run times, and large code footprints
• Robust and performance-efficient implementation
• Production-environment regression testing study
Backup Slides
Future Research Questions
• Selective persistent caching: cache only cold/hot code
• Effectiveness of optimizations across inputs and applications
• Impact of excessive cache accumulation
Persistent Cache Sizes: Data Structures (DS) are larger than the Code Cache (CC)!
[Figure: persistent cache size (MB), split into data structures and traces, for Gftp, Gvim, Dia, File Roller, Gqview, and Oracle (Start), Oracle (Mount), Oracle (Open), Oracle (Work), Oracle (Close)]
[Figure: persistent cache size (MB), split into code cache and data structures, for Gnumeric, Emacs, Xpdf, Gv, Netscape, and WTS]
Cross-input persistence reduces re-translation across inputs
[Figure: execution timelines (x-axis: time) without persistence, for re-invocation with persistence using a previously cached execution, and for re-invocation with persistence using a cache from a different input for a previously unseen input]
• ~30% improvement via cross-input persistence
• Persistence is effective even across changing input data sets
Persistent instrumentation issues

    // Analysis routine: called upon every instruction execution
    VOID Analysis(COUNTER * counter) { (*counter)++; }

    // Instrumentation routine: called once per instruction compilation
    VOID Instrumentation(INS ins, VOID *v)
    {
        // Dynamically allocated memory: the allocation happens during cache
        // generation, so the pointer baked into the cached code is invalid
        // during cache reuse
        STATS * stats = new STATS(INS_Address(ins));
        INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(Analysis),
                       IARG_PTR, &stats->counter, …);
        …
    }

    int main(int argc, char *argv[])
    {
        …
        INS_AddInstrumentFunction(Instrumentation, 0);
        …
        PIN_StartProgram();  // never returns
        return 0;
    }

Solution: Allocate memory using the Persistent Memory Allocator
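A minimal sketch of the fix, assuming the persistent memory allocator is an arena that is saved with the code cache and mapped back at the same address on reuse, so a pointer such as `&stats->counter` that was embedded in translated code still refers to valid data. `PersistentArena` is a hypothetical name, and the in-process save/restore below only simulates a second execution.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <vector>

// Hypothetical persistent memory allocator: a bump allocator over one fixed
// arena. Assumption: the arena image is stored with the persistent cache and
// restored at the same base address, keeping embedded pointers valid.
class PersistentArena {
public:
    explicit PersistentArena(std::size_t bytes) : storage_(bytes) {}

    void* allocate(std::size_t bytes, std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (used_ + align - 1) / align * align;  // round up
        if (start + bytes > storage_.size()) throw std::bad_alloc();
        used_ = start + bytes;
        return storage_.data() + start;
    }

    // Snapshot of the used part of the arena, saved alongside the cache.
    std::vector<unsigned char> save() const {
        return std::vector<unsigned char>(storage_.data(), storage_.data() + used_);
    }

    // Restore a snapshot in place; the arena's base address does not change.
    void restore(const std::vector<unsigned char>& image) {
        if (image.size() > storage_.size()) throw std::bad_alloc();
        std::memcpy(storage_.data(), image.data(), image.size());
        used_ = image.size();
    }

    bool owns(const void* p) const {
        const unsigned char* c = static_cast<const unsigned char*>(p);
        return c >= storage_.data() && c < storage_.data() + storage_.size();
    }

private:
    std::vector<unsigned char> storage_;
    std::size_t used_ = 0;
};
```

In the tool above, `STATS` would then be constructed with placement new in memory obtained from `arena.allocate(sizeof(STATS), alignof(STATS))` instead of with plain `new`.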
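Inter-Application persistence exploits redundancy of library translations
[Figure: application A runs input X from an empty cache and produces persistent cache X; application B runs input Y from an empty cache and produces persistent cache Y. In the timed runs, application A (input X) starts from persistent cache Y and application B (input Y) starts from persistent cache X. Both applications share libraries (DSOs) for initialization and toolkits/packages such as X11, GTK+, and FLTK]
Inter-Application Persistence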
[Figure: elapsed time (s) for Gftp, Gvim, Dia, File Roller, and Gqview with no persistence, with same-input persistence, and with persistent library caches from Gftp, Gvim, Dia, File Roller, and Gqview]
• Verifies that a large amount of time is spent initializing library routines
• ~60% improvement
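One way to let a second application reuse library translations is to key them by library and offset rather than by absolute address. This is an assumption for illustration, not necessarily how Persistent Pin matches library code; translated code that embeds absolute addresses would additionally need relocation. `LibraryCache` and `TraceKey` are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical key: library name plus the trace's offset from the load base.
using TraceKey = std::pair<std::string, std::uint64_t>;

class LibraryCache {
public:
    // Record a translation made by one application for code in `module`
    // loaded at `base`.
    void record(const std::string& module, std::uint64_t base,
                std::uint64_t addr, std::string translation) {
        cache_[{module, addr - base}] = std::move(translation);
    }

    // Look up a trace for another application, which may map the same
    // library at a different base address.
    std::optional<std::string> lookup(const std::string& module, std::uint64_t base,
                                      std::uint64_t addr) const {
        auto it = cache_.find({module, addr - base});
        if (it == cache_.end()) return std::nullopt;
        return it->second;
    }

private:
    std::map<TraceKey, std::string> cache_;
};
```

For example, a translation recorded for `libgtk` loaded at 0x1000 (trace at 0x1200) is found again by an application that loads `libgtk` at 0x5000 and reaches 0x5200.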
Processes exhibit code sharing
[Figure: Oracle’s execution phases (Start, Mount, Open, Work, Close), with processes created through fork() and exec()]
Challenges:
1. Large number of process compilations
2. Redundant translations across processes
exec() loses the parent’s cache: the child may re-translate parent code!
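A toy model of the fork()/exec() issue: fork() copies the parent, including its in-memory code cache, while exec() replaces the address space and drops it. Reloading from a persistent store keyed by executable image is one assumed remedy. `Process`, `PersistentStore`, `forkProcess`, and `execProcess` are hypothetical and do not call the real system calls.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Toy model only: each simulated process owns an in-memory code cache of
// translated trace addresses.
struct Process {
    std::string image;                  // executable currently loaded
    std::set<std::uint64_t> codeCache;  // traces translated so far
};

// Hypothetical persistent store: executable image -> traces saved by earlier processes.
using PersistentStore = std::map<std::string, std::set<std::uint64_t>>;

// fork(): the child starts as a copy of the parent, code cache included.
inline Process forkProcess(const Process& parent) { return parent; }

// exec(): the address space is replaced, so the in-memory cache is lost.
// If a persistent store is available, reload the translations saved for the new image.
inline void execProcess(Process& p, const std::string& image, const PersistentStore* store) {
    p.image = image;
    p.codeCache.clear();
    if (store == nullptr) return;
    auto it = store->find(image);
    if (it != store->end()) p.codeCache = it->second;
}
```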