Using Dynamic Binary Instrumentation to Generate Multi-Platform SimPoints
Vincent M. Weaver and Sally A. McKee
Cornell University
29 January 2008
Breakdown of Benchmark Methodology Used in Major Conferences
Methodology                        HPCA, ISCA, MICRO (1995-2004)   ISCA (2006-2007)
                                   (Yi et al., HPCA 2005)
Run Complete (simulator)           18%                             19%
Run Multiple SimPoints             0%                              4%
Run SMARTS                         0%                              11%
Run One SimPoint                   0%                              30%
Fast Forward X, Run Z              27%                             30%
Run Reduced (train, MinneSPEC)     19%                             22%
Run Z                              23%                             4%
1Fusion
Phase Behavior — twolf
[Figure: 300.twolf default. CPI and L1 D-cache miss rate (%) plotted against instruction interval (100M instructions).]
Phase Behavior — mcf
[Figure: 181.mcf default. CPI and L1 D-cache miss rate (%) plotted against instruction interval (100M instructions).]
Phase Behavior — gcc.200
[Figure: 176.gcc 200. CPI and L1 D-cache miss rate (%) plotted against instruction interval (100M instructions).]
SimPoint Basic Block Vectors
• Instrument every Basic Block
• Increment unique counter on entry to each BB
• Save list of BBs and frequencies periodically
These BBVs are used by the SimPoint utility to determine
phase behavior
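The three steps above can be sketched in a few lines of Python. This is only an illustration, not the code of any of the tools discussed here: the names `collect_bbvs` and `format_bbv` are hypothetical, the trace of `(bb_id, num_instructions)` pairs stands in for real instrumentation callbacks, weighting each entry by the block's instruction count is an assumption, and the `T:id:count` line layout is modeled on the frequency vector text format that SimPoint reads.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def collect_bbvs(trace: Iterable[Tuple[int, int]],
                 interval_size: int) -> Iterator[Counter]:
    """Yield one basic block vector (BBV) per interval.

    `trace` yields (bb_id, num_instructions) every time a basic block
    is entered. An interval ends once at least `interval_size`
    instructions have executed.
    """
    counts: Counter = Counter()
    executed = 0
    for bb_id, num_instructions in trace:
        # Assumption: weight each entry by the size of the block.
        counts[bb_id] += num_instructions
        executed += num_instructions
        if executed >= interval_size:
            yield counts
            counts = Counter()
            executed = 0
    if counts:
        # Emit the final, possibly shorter, interval.
        yield counts


def format_bbv(bbv: Counter) -> str:
    """Render one BBV as a text line of ':id:count' pairs prefixed by 'T'."""
    return "T" + " ".join(f":{bb}:{n}" for bb, n in sorted(bbv.items()))


# Hypothetical trace: block 1 has 3 instructions, block 2 has 2, block 3 has 4.
example_trace = [(1, 3), (2, 2), (1, 3), (3, 4), (2, 2)]
example_lines = [format_bbv(v) for v in collect_bbvs(example_trace, 8)]
```

With the hypothetical trace and an interval of 8 instructions, `example_lines` holds two lines, one per interval. The real tools use intervals of 100M instructions.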
Dynamic Binary Instrumentation
• Instruments each BB at first execution
• Caches instrumented code
• Runs faster than simulation, slower than native
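A toy model may help make the first two bullets concrete. The sketch below is not how Pin, Qemu, or Valgrind are implemented: `ToyDBI` is a hypothetical class, basic blocks are modeled as Python callables that return the address of the next block (or `None` to stop), and "instrumenting" a block just wraps it with a counter increment.

```python
from typing import Callable, Dict, Optional

Block = Callable[[], Optional[int]]


class ToyDBI:
    """Toy model: instrument each block on first execution, then reuse it."""

    def __init__(self) -> None:
        self.code_cache: Dict[int, Block] = {}
        self.counters: Dict[int, int] = {}
        self.translations = 0  # how many blocks were instrumented

    def _instrument(self, addr: int, block: Block) -> Block:
        self.translations += 1
        self.counters[addr] = 0

        def instrumented() -> Optional[int]:
            self.counters[addr] += 1  # inserted instrumentation
            return block()            # original block behavior

        return instrumented

    def run(self, program: Dict[int, Block], entry: int) -> None:
        pc: Optional[int] = entry
        while pc is not None:
            if pc not in self.code_cache:
                # First execution: instrument the block and cache the result.
                self.code_cache[pc] = self._instrument(pc, program[pc])
            pc = self.code_cache[pc]()


# Hypothetical three-block program: block 1 executes three times.
state = {"i": 0}


def block0() -> Optional[int]:
    return 1


def block1() -> Optional[int]:
    state["i"] += 1
    return 1 if state["i"] < 3 else 2


def block2() -> Optional[int]:
    return None


dbi = ToyDBI()
dbi.run({0: block0, 1: block1, 2: block2}, entry=0)
```

After the run, `dbi.counters` records one entry for block 0, three for block 1, and one for block 2, while `dbi.translations` is 3: the loop body was instrumented once and served from the cache afterwards.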
Previous Tools for Generating BBVs
• Atom — only Alpha with Tru64 UNIX
• Pin — only Intel architectures (IA32, Intel 64, IA64,
XScale)
• Simulator mods — only simulator specific (SLOW)
Goal – Expand Applicability of SimPoints
• Support more architectures
• Use existing DBI tools
• Validate generated BBVs
• Generate cross-platform BBV files
Solution: Valgrind and Qemu
[Figure: diagram of the architectures covered by Pin, Valgrind, and Qemu: itanium, arm, x86, x86_64, ppc, m68k, mips, sh4, sparc, hppa, alpha, cris.]
Pin
• Intel proprietary DBI tool: IA32, IA64, Intel 64, XScale
(Windows, Linux, OS X)
• Plugins are written in C++
• Average slowdown of 15x on SPEC 2000
Qemu
• Open Source
• Full-system simulation or syscall-by-proxy
• Translates from arbitrary ISAs via DBI
• Runs ARM, IA32, Intel 64, MIPS, PPC, SPARC, HPPA,
Etrax CRIS, Alpha, sh4, m68k
• Average slowdown of 28x on SPEC 2000
Valgrind
• Open Source, external plugin interface
• Originally designed to find memory access violations
• Translates to own IR, instruments, retranslates back to
native ISA
• Runs on IA32, Intel 64 and PPC platforms (Linux and
AIX)
• Average slowdown of 39x on SPEC 2000
Validation Systems
machine     processor                    memory   L1 I/D          L2/L3 Cache
nestle      400MHz Pentium II            256MB    16KB/16KB       512KB
spruengli   550MHz Pentium III           512MB    16KB/16KB       512KB
itanium     800MHz Itanium (x86 mode)    1GB      16KB/16KB       96KB/3MB
chocovic    1.66GHz Core Duo             1GB      32KB/32KB       1MB
milka       1.733GHz Athlon MP           512MB    64KB/64KB       256KB
gallais     1.8GHz Pentium 4             256MB    12Kµops/16KB    256KB
jennifer    2GHz Athlon64 X2             1GB      64KB/64KB       512KB
sampaka12   2.8GHz Pentium 4             2GB      12Kµops/16KB    512KB
domori25    3.46GHz Pentium D            4GB      12Kµops/16KB    2MB
SPEC CPU 2000
[Figure: CPI error (%) on SPEC CPU 2000 for each validation machine: nestle (Pentium II), spruengli (Pentium III), itanium (Itanium), chocovic (Core Duo), milka (Athlon), gallais (Pentium 4), jennifer (Athlon 64), sampaka12 (Pentium 4), domori25 (Pentium D). The y-axis spans 0 to 40.
Series: First 100M; Ffwd 1B, 100M; Pin, Qemu, and Valgrind with one SimPoint; Pin, Qemu, and Valgrind with up to 10 SimPoints; Pin, Qemu, and Valgrind with up to 20 SimPoints.
Value labels printed on the plot: 46.7%, 40.3%, 50.1%, 40.6%, 44.9%, 82.6%, 50.8%, 69.1%, 55.3%, 54.4%, 54.7%, 43.1%, 46.2%.]
SPEC CPU 2000 Breakdown – Pentium D
[Figure: per-benchmark CPI error (%) on the Pentium D for Pin, Qemu, and Valgrind, using up to 20 SimPoints. The y-axis spans -10 to 10.
Integer results: bzip2.graphic, bzip2.program, bzip2.source, crafty.default, eon.cook, eon.kajiya, eon.rushmeier, gap.default, gcc.expr, gcc.integrate, gcc.scilab, gcc.166, gcc.200, gzip.graphic, gzip.log, gzip.program, gzip.random, gzip.source, mcf.default, parser.default, perlbmk.diffmail, perlbmk.makerand, perlbmk.perfect, perlbmk.535, perlbmk.704, perlbmk.957, perlbmk.850, twolf.default, vortex.1, vortex.2, vortex.3, vpr.route, vpr.place. Value labels printed on the plot: 19.8, 22.8, 19.3, 10.7, -109, 11.8, 12.9, 19.9.
FP results: ammp.default, applu.default, apsi.default, art.110, art.470, equake.default, facerec.default, fma3d.default, galgel.default, lucas.default, mesa.default, mgrid.default, sixtrack.default, swim.default, wupwise.default. Value labels printed on the plot: -15.5, 10.8, 17.5.]
SPEC CPU 2006
[Figure: CPI error (%) on SPEC CPU 2006 for chocovic (Core Duo), sampaka12 (Pentium 4), domori25 (Pentium D), and jennifer (Athlon 64). The y-axis spans 0 to 40.
Series: First 100M; Ffwd 1B, 100M; Pin, Qemu, and Valgrind with one SimPoint; Pin, Qemu, and Valgrind with up to 10 SimPoints; Pin, Qemu, and Valgrind with up to 20 SimPoints.
Value labels printed on the plot: 63.7%, 110.1%, 124.7%, 110.0%, 42.6%.]
machine     processor            memory   L1 I/D        L2 Cache
chocovic    1.66GHz Core Duo     1GB      32KB/32KB     1MB
jennifer    2GHz Athlon64 X2     1GB      64KB/64KB     512KB
sampaka12   2.8GHz Pentium 4     2GB      12Ku/16KB     512KB
domori25    3.46GHz Pentium D    4GB      12Ku/16KB     2MB
SPEC CPU 2006 Breakdown – Pentium D
[Figure: per-benchmark CPI error (%) on the Pentium D for Pin, Qemu, and Valgrind, using up to 20 SimPoints. The y-axis spans -10 to 10.
Integer results: astar.BigLakes, astar.rivers, bzip2.source, bzip2.chicken, bzip2.liberty, bzip2.program, bzip2.html, bzip2.combined, gcc.166, gcc.200, gcc.c-typeck, gcc.cp-decl, gcc.expr, gcc.expr2, gcc.g23, gcc.s04, gcc.scilab, gobmk.13x13, gobmk.nngs, gobmk.score2, gobmk.trevorc, gobmk.trevord, h264ref.fore_base, h264ref.fore_main, h264ref.sss_main, hmmer.nph3, hmmer.retro, libquantum, mcf, omnetpp, perlbench.checkspam, perlbench.diffmail, perlbench.splitmail, sjeng, xalancbmk. Value labels printed on the plot: 12.1, 10.7, -11.4, 11.3, -36.3, -12.8, 13.0, -12.6, 11.5, 13.1, 15.6.
FP results: bwaves, cactusADM, calculix, dealII, gamess.cytosine, gamess.h2ocu2, gamess.triazolium, GemsFDTD, gromacs, lbm, leslie3d, milc, namd, povray, soplex.pds-50, soplex.ref, sphinx3, tonto, wrf, zeusmp. Value labels printed on the plot: -33.25-20.5, n/a.]
Results – Average CPI error
• SPEC2000: 10 SimPoints (<0.4% of reference inputs)
◦ Pin 5.32%
◦ Qemu 5.04%
◦ Valgrind 5.38%
• SPEC2006: 10 SimPoints (<0.06% of reference inputs)
◦ Pin 5.58%
◦ Qemu 5.30%
◦ Valgrind 5.28%
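For readers unfamiliar with how such figures are usually derived, the sketch below shows one common way to turn per-SimPoint measurements into a CPI error. It is an assumption about the arithmetic, not a description of the scripts behind these slides: the function names are hypothetical, the whole-program CPI is estimated as the weight-averaged CPI of the chosen SimPoints, and the error is taken relative to the CPI of the full run.

```python
from typing import Iterable, List, Tuple


def weighted_cpi(simpoints: Iterable[Tuple[float, float]]) -> float:
    """Estimate whole-program CPI from (weight, cpi) pairs, one per SimPoint."""
    pairs: List[Tuple[float, float]] = list(simpoints)
    total_weight = sum(w for w, _ in pairs)
    return sum(w * cpi for w, cpi in pairs) / total_weight


def cpi_error_percent(estimated: float, actual: float) -> float:
    """Signed error of the estimate relative to the full-run CPI, in percent."""
    return 100.0 * (estimated - actual) / actual


def mean_abs_error(errors: Iterable[float]) -> float:
    """Assumption: a suite average is the mean of absolute per-benchmark errors."""
    values = [abs(e) for e in errors]
    return sum(values) / len(values)


# Hypothetical numbers, for illustration only.
estimate = weighted_cpi([(0.5, 1.0), (0.25, 2.0), (0.25, 4.0)])
error = cpi_error_percent(estimate, actual=2.5)
```

With these made-up inputs the estimate is 2.0 and the signed error is -20%; a negative value means the SimPoints underestimate the measured CPI.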
Future Work
• Generating multi-platform results – MIPS BBV file from
IA32 Qemu
• Running non-Linux binaries (Solaris, IRIX)
• Generating OS-aware SimPoints – Qemu runs full OS
Tools Available for Download
All code is available from:
http://fusion.csl.cornell.edu/tools/
Questions? Feedback?
[email protected] - http://www.csl.cornell.edu/~vince
[email protected] - http://www.csl.cornell.edu/~sam