Persistent Code Caching: Exploiting Code Reuse Across Executions & Applications

Vijay Janapa Reddi†, Dan Connors‡, Robert Cohn§, Michael D. Smith†

† Harvard University
‡ University of Colorado at Boulder
§ Intel Corporation

Page 2: Runtime Compilation System

Execution environments that provide an interface to the dynamic instruction stream of an application

[Figure: Runtime Compilation Systems, with Program Introspection, Resource Management, and Process Managers]

Overheads
1. Runtime compilation
2. Performance of the compiled code

Page 3: Managing compilation overhead via software code caching

[Figure: execution-time timeline. The original dynamic instruction stream is A B C C A. With code caching, the runtime system (RS) is invoked three times, once per new block, to produce A', B', and C'; the repeated C and A reuse cached code.]

Basis: 90% execution time in 10% (hot) code
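The timeline above can be mimicked with a toy model. This is a minimal sketch, assuming blocks are identified by name and that "translation" just appends a prime; `CodeCache`, `lookup`, and `run` are illustrative names, not part of Pin or any real runtime system.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy software code cache: maps an original block name to its translation.
// All names here are hypothetical and chosen only for illustration.
struct CodeCache {
    std::unordered_map<std::string, std::string> translated;  // "A" -> "A'"
    std::size_t translations = 0;  // times the runtime system (RS) compiled a block
    std::size_t reuses = 0;        // times cached code ran without the RS

    // Returns the translated block, compiling it only on a cache miss.
    const std::string& lookup(const std::string& block) {
        auto it = translated.find(block);
        if (it == translated.end()) {
            ++translations;  // miss: the RS translates the block once
            it = translated.emplace(block, block + "'").first;
        } else {
            ++reuses;  // hit: cached code is reused
        }
        return it->second;
    }
};

// Feeds a dynamic instruction stream through the cache and returns the
// sequence of translated blocks that would execute.
std::vector<std::string> run(CodeCache& cache,
                             const std::vector<std::string>& stream) {
    std::vector<std::string> executed;
    for (const auto& block : stream) executed.push_back(cache.lookup(block));
    assert(executed.size() == stream.size());
    return executed;
}
```

Running the stream A B C C A through an empty `CodeCache` in this model yields three translations and two reuses, which is the pattern the slide's timeline depicts.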

Page 4: Problem statement

There exist execution domains where code caching is ineffective, which limits the deployment of runtime compilation systems

Highlight of this talk:
- Challenges in deploying dynamic binary instrumentation into production regression testing environments
- Case study of the Oracle database

Page 5: Caching performance varies based on program behavior

- 181.mcf: Loop intensive application
- 176.gcc: Large code footprint & infrequent code re-use

[Figure: execution time split into Runtime Compilation and Code Cache]

Page 6: Caching performance varies based on program behavior

[Figure: normalized execution time, split into Runtime Compilation and Code Cache, for Mcf, Eon, Vpr, Twolf, Gap, Bzip2, Gzip, Parser, Vortex, Crafty, Perl, and Gcc. The benchmarks range from loop intensive (frequent reuse) to large footprint (infrequent reuse, Gcc).]

Page 7: Benchmark 176.gcc is not an outlier

[Figure: normalized execution time, split into Runtime Compilation and Code Cache, for Oracle, Gedit, Dia, Gvim, File Roller, Gftp, and Gqview]

GUI applications
- Large startup cost
- Library initialization executed < 10 times

Page 8: Code caching suffers under certain execution behaviors

- Less code reuse
- Large code footprint
- Short run times

Not uncommon!

Regression testing
- Oracle (100,000 tests)
- Gcc (4000+ tests)

[Figure: execution time for 176.gcc (5 SPEC reference inputs)]

Cold code is hot code across executions!!!

Page 9: Caching code across executions improves caching performance

[Figure: execution-time timelines for the original dynamic instruction stream A B C C A. Caching (Run 1) and Caching (Run 2) each invoke the runtime system (RS) to produce A', B', and C'. Persistent caching (Run 2) executes A' B' C' C' A' with no RS invocations.]

Reduce overhead by storing & reusing caches
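The difference between Run 2 with and without persistence can be sketched as follows. This is an illustration under stated assumptions, not Persistent Pin's implementation: the cache is reduced to a set of block names, and `save`/`load` use a made-up one-name-per-line format written to any C++ stream (a `std::stringstream` can stand in for the on-disk cache).

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Toy persistent cache: the set of block names already translated.
using Cache = std::set<std::string>;

// Executes a stream against the cache and returns how many blocks the
// runtime system had to translate (cache misses) during this run.
std::size_t execute(Cache& cache, const std::vector<std::string>& stream) {
    std::size_t translations = 0;
    for (const auto& block : stream) {
        if (cache.insert(block).second) ++translations;  // newly translated
    }
    return translations;
}

// Writes one block name per line (hypothetical storage format).
void save(const Cache& cache, std::ostream& out) {
    for (const auto& block : cache) {
        assert(block.find('\n') == std::string::npos);  // format precondition
        out << block << '\n';
    }
}

// Reads back the format written by save().
Cache load(std::istream& in) {
    Cache cache;
    std::string block;
    while (std::getline(in, block)) {
        if (!block.empty()) cache.insert(block);
    }
    return cache;
}
```

In this model, a first run over A B C C A performs three translations; saving its cache and loading it before the second run brings the second run's translation count to zero, whereas a second run that starts from an empty cache repeats all three.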

Page 10: Implementation Framework: Pin (Dynamic binary instrumentation)

[Figure: Pin architecture. One address space holds the Application, the Client, the Runtime System Components, the Code Cache, and the Interface, running on the Operating System and Hardware.]

Appropriate system for evaluating persistence
- General model
- Robust design
- Enterprise-scale usage

Page 11: Persistent Pin

Persistent Cache
- Translated code
- Translation data structures
- Correctness metadata

[Figure: Persistent Pin architecture. One address space holds the Application, the Client, the Pin Components, the Code Cache, and the Interface, running on the Operating System and Hardware. A Persistence Mgr. connects Pin to the Persistent Cache DB.]

Page 12: Experimental setup

IA32 Linux implementation
- Bounded cache (320MB)
- Applications ran unmodified
- No cache flushes occurred

[Figure: methodology. Pin runs Input X with an empty cache and produces Persistent Cache X. Pin then runs a second input (Input ?) with Persistent Cache X, and the improvement is measured.]

Page 13: Exploiting code reuse across executions and applications

[Figure: code coverage drawn as a target with rings labeled Cross-application, Cross-input, and Same-input; the bull's eye is 100% reuse]

Page 14: Persistent caching works across program classes

[Figure: performance improvement (%), axis 0 to 100, for SPEC 2000 INT with reference inputs (Mcf, Parser, Vortex, Twolf, Vpr, Crafty, Bzip2, Gap, Gzip, Perlbmk, Gcc) and for Oracle, File Roller, Dia, Gvim, and Gftp]

- Benefits large code footprint applications
- Persistent caching is complementary to the current code caching model

Page 15: Persistent caching is effective for short-running applications

[Figure: performance improvement (%), axis 0 to 100, with Reference Inputs and Training Inputs, for Mcf, Gzip, Bzip2, Crafty, Perlbmk, Twolf, Vpr, Parser, Vortex, Gap, and Gcc]

Input data set alters program behavior

Small improvements get bigger (Gap) and large improvements get even larger (Gcc)

Page 16: Evaluating persistent caching across program inputs

[Figure: code coverage between inputs, axis 50% to 100%, for Oracle, 175.vpr, 253.perlbmk, 176.gcc, 164.gzip, and 256.bzip2]
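One plausible way to compute a coverage figure like the one plotted here is sketched below. The slides do not define "code coverage" formally, so the definition used (the fraction of blocks executed under one input that are already present in a cache built from another input) and the name `codeCoverage` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Fraction (0.0 to 1.0) of the blocks a run executes that are already
// present in a persistent cache. Assumed definition, for illustration only.
double codeCoverage(const std::set<std::string>& cache,
                    const std::set<std::string>& executed) {
    if (executed.empty()) return 1.0;  // nothing left to translate
    std::size_t found = 0;
    for (const auto& block : executed) found += cache.count(block);
    assert(found <= executed.size());
    return static_cast<double>(found) / static_cast<double>(executed.size());
}
```

Under this definition, a cache holding A, B, and C covers 75% of a run that executes A, B, C, and D; the remaining block would still need translation.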

Page 17: Production environments require runtime system improvements

Case study: Regression testing of Oracle XE
- Oracle: 80 s
- Oracle + Pin (translation): 2000 s
- Oracle + Pin (translation) + Instrumentation (memory tracing): 3000 s

One unit-test!
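The slowdowns implied by these timings for a single unit-test follow directly from the ratios to the native run:

```latex
\[
\frac{2000\,\mathrm{s}}{80\,\mathrm{s}} = 25\times
\qquad
\frac{3000\,\mathrm{s}}{80\,\mathrm{s}} = 37.5\times
\]
```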

Page 18: Oracle is a multi-process programming environment

Oracle's execution phases: Start, Mount, Open, Work, Close

Challenges
1. Large number of process compilations

Page 19: Processes exhibit code sharing

Oracle's execution phases: Start, Mount, Open, Work, Close

[Figure: two processes each execute the same code sequence A C C B Z]

Challenges
1. Large number of process compilations
2. Redundant translations across processes

Page 20: Every Oracle unit-test starts a new instance of the database

Oracle's execution phases
- Start, Mount, Open, Unit-test 1, Close
- Start, Mount, Open, Unit-test 2, Close

Every unit-test executes all phases. The unit-test itself is the only phase changing across all unit-tests.

Challenges
1. Large number of process compilations
2. Redundant translations across processes
3. Redundant translations across unit-tests

Page 21: Leveraging persistence across processes

[Figure: elapsed time (s), axis 0 to 500, for each phase (Start, Mount, Open, Work, Close), comparing no persistence against the mean value using different persistent caches]

- Persistent Cache (Start): Low code coverage (15%)
- Persistent Cache (Open): High code coverage (77%)

Page 22: Persistent Cache Accumulation (PCA) addresses limited code coverage

Accumulate code across executions

[Figure: Pin runs Input X with an empty cache and produces Persistent Cache X. Pin runs Input Y with Persistent Cache X and produces Persistent Cache X+Y. Timed run: Pin runs Input Z with Persistent Cache X+Y.]
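The accumulation step can be sketched with plain sets. This is a simplified model, assuming a cache is just the set of translated block names; `accumulate` and `missing` are hypothetical helpers, not functions of Persistent Pin.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

using Cache = std::set<std::string>;

// Returns the accumulated cache after running one more input: everything
// already stored plus whatever that input executed (and so translated).
Cache accumulate(const Cache& persistent, const Cache& executedByInput) {
    Cache merged = persistent;
    merged.insert(executedByInput.begin(), executedByInput.end());
    assert(merged.size() >= persistent.size());  // accumulation never shrinks
    return merged;
}

// Number of blocks a timed run would still have to translate.
std::size_t missing(const Cache& cache, const Cache& executedByInput) {
    std::size_t count = 0;
    for (const auto& block : executedByInput) {
        if (cache.count(block) == 0) ++count;
    }
    return count;
}
```

For example, if Input X executes {A, B}, Input Y executes {B, C}, and the timed Input Z executes {A, C, D}, then Z misses two blocks against Persistent Cache X but only one against the accumulated Persistent Cache X+Y.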

Page 23: Persistent Cache Accumulation (PCA) improves unit-test performance

[Figure: elapsed time (s), axis 0 to 4000, with no instrumentation and with instrumentation, for these accumulated persistent caches: Without Persistence; Startup; Startup+Mount; Startup+Mount+Open; Startup+Mount+Open+Work; Startup+Mount+Open+Work+Close]

Performance improves with more accumulation of code

Page 24: Contributions: Improved code caching

[Figure: levels of reuse: Intra-Execution, Inter-Execution, Inter-Application]

Cold code is hot code!

Persistence is effective
- Less code reuse
- Short run times
- Large code footprint

Robust and performance efficient implementation

Production environment regression testing study

Page 25: Backup Slides

Page 26: Future Research Questions

- Selective persistent caching: cache only cold/hot code
- Effectiveness of optimizations across inputs and applications
- Impact of excessive cache accumulation

Page 27: Persistent Cache Sizes: DS is larger than CC!

[Figure: cache size (MB), axis 0 to 80, split into Data Structures and Traces, for Gftp, Gvim, Dia, File Roller, Gqview, Oracle (Start), Oracle (Mount), Oracle (Open), Oracle (Work), and Oracle (Close)]

Page 28: Persistent Cache Sizes: DS is larger than CC!

[Figure: size (MB), axis 0 to 40, split into Code Cache and Data Structures, for Gnumeric, Emacs, Xpdf, Gv, Netscape, and WTS]

Page 29: Cross-input Persistence reduces re-translation across inputs

[Figure: execution time for three configurations]
- Without Persistence
- Re-invocation w/ Persistence using a previously cached execution
- Re-invocation w/ Persistence using a cache from a different input for a previously unseen input

~30% improvement via Cross-input Persistence

Persistence is effective even across changing input data sets

Page 30: Persistent instrumentation issues

    #include "pin.H"

    // Analysis routine: called upon every instruction execution.
    VOID Analysis(COUNTER *counter) { (*counter)++; }

    // Instrumentation routine: called once per instruction compilation.
    VOID Instrumentation(INS ins, VOID *v) {
        // Dynamically allocated memory: allocation happens during cache generation.
        STATS *stats = new STATS(INS_Address(ins));
        // &stats->counter becomes an invalid pointer during cache reuse.
        INS_InsertCall(ins, IPOINT_BEFORE, AFUNPTR(Analysis),
                       IARG_PTR, &stats->counter, …);
        …
    }

    int main(INT32 argc, CHAR *argv[]) {
        …
        INS_AddInstrumentFunction(Instrumentation, 0);
        …
        PIN_StartProgram();
    }

Solution: Allocate memory using the Persistent Memory Allocator
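The slides do not show the Persistent Memory Allocator's interface, so the following is only a sketch of the general idea under stated assumptions: tool data lives in a region that is saved with the persistent cache and restored before reuse, and allocations are named by offsets into that region instead of raw heap pointers. `PersistentArena` and all of its members are hypothetical, and a real allocator might instead restore the region at a fixed address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical arena whose bytes can be stored with the persistent cache.
// Offsets stay meaningful after a reload; raw heap pointers would not.
class PersistentArena {
public:
    using Offset = std::size_t;

    // Reserves zero-initialized space for one 64-bit counter.
    Offset allocCounter() {
        Offset off = bytes_.size();
        bytes_.resize(off + sizeof(std::uint64_t), 0);
        return off;
    }

    // Reads the counter stored at `off`.
    std::uint64_t read(Offset off) const {
        assert(off + sizeof(std::uint64_t) <= bytes_.size());
        std::uint64_t value;
        std::memcpy(&value, bytes_.data() + off, sizeof value);
        return value;
    }

    // Increments the counter stored at `off` (the analysis routine's job).
    void increment(Offset off) {
        std::uint64_t value = read(off) + 1;
        std::memcpy(bytes_.data() + off, &value, sizeof value);
    }

    // snapshot()/restore() stand in for writing the arena to, and reading
    // it from, the persistent cache database.
    std::vector<unsigned char> snapshot() const { return bytes_; }
    static PersistentArena restore(std::vector<unsigned char> saved) {
        PersistentArena arena;
        arena.bytes_ = std::move(saved);
        return arena;
    }

private:
    std::vector<unsigned char> bytes_;
};
```

In this sketch, an offset handed out while the cache is generated still resolves after `restore()` in a later run. Whether counter values should carry over or be reset on reuse is a policy choice the slides do not address.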

Page 31: Inter-Application exploits redundancy of library translations

[Figure: Application A and Application B overlap in Libraries (DSO): Initialization and Toolkits/Pkgs (X11, GTK+, FLTK)]

[Figure: Pin runs Input X with an empty cache and produces Persistent Cache X; Pin runs Input Y with an empty cache and produces Persistent Cache Y. Timed runs: Pin runs Input X with Persistent Cache Y, and Input Y with Persistent Cache X.]

Page 32: Inter-Application Persistence

[Figure: elapsed time (s), axis 0 to 40, for Gftp, Gvim, Dia, File Roller, and Gqview under: No Persistence; Same-input; Persistent Library Cache Gftp; Persistent Library Cache Gvim; Persistent Library Cache Dia; Persistent Library Cache File Roller; Persistent Library Cache Gqview]

Verifies that a large amount of time is spent initializing library routines

~60% improvement

Page 33: Processes exhibit code sharing

Oracle's execution phases: Start, Mount, Open, Work, Close

[Figure: processes created via fork() and exec()]

exec() loses parent cache: May re-translate parent code!

Challenges
1. Large number of process compilations
2. Redundant translations across processes