23
ROOT and x32-ABI Nathalie Rauschmayr On behalf of LHCb collaboration March 13, 2013 Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20

ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

ROOT and x32-ABI

Nathalie Rauschmayr

On behalf of LHCb collaboration

March 13, 2013

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20

Page 2: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Outline

1 History of application binary interfacesWhy is a new binary interface necessary?x32-ABI - A new 32-bit ABI for x86-64Preconditions

2 How ROOT can profitResults within other CERN applicationsResults within ROOT-BenchmarksReasons for performance differences

3 Support of x32-ABI

4 Conclusion

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20

Page 3: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

History of application binary interfaces

x86:

32-bit word size → 4GB addressable

x86 64 as an extension of x86:

runs 32-bit as well as 64-bit applications

64-bit mode supports new hardware features

Number of CPU registers

Floating-point performance

Function parameters passed via registers

Syscall instruction ...

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 2 / 20

Page 4: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Why is a new binary interface necessary?

x86-64 (64-bit mode):

new hardware features

no memory limit

slowdown due to memory issues

contention

page faults

paging ...

x86:

very low memory footprint

performance issues

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 3 / 20

Page 5: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

x32-ABI - A new 32-bit ABI for x86-64

Application Binary Interface based on 64-bit x86 architecture

Reduces size of pointers and C-datatype long to 32-bit

Takes advantage of all x64-features

Avoids memory overhead

advantages of 64-bit instruction set+

memory footprint of a 32-bit application

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 4 / 20

Page 6: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

x32-ABI - A new 32-bit ABI for x86-64

Developed by H.J. Lu (Intel)

Introduced in Linux-Kernel 3.4 (released in summer 2012)

http://www.linuxplumbersconf.org/2011/ocw/sessions/531

Opinions in Linux-community differ quite a lot

32-bit time values will overflow

adressable memory

...

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 5 / 20

Page 7: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

x32-ABI - A new 32-bit ABI for x86-64

Impressive results from x32-developers:

181.mcf from SPEC CPU 2000 (memory bound):

Intel Core i7∼ 40% faster than x86-64∼ 2% slower than ia32

Intel Atom∼ 40% faster than x86-64∼ 1% faster than ia32

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 6 / 20

Page 8: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

x32-ABI - A new 32-bit ABI for x86-64

186.crafty from SPEC CPU 2000 (64bit integer):

Intel Core i7∼ 3 % faster than x86-64∼ 40% faster than ia32

Intel Atom∼ 4 % faster than x86-64∼ 26% faster than ia32

=⇒ CERN applications will definitely reduce memory footprint and verylikely CPU-time

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 7 / 20

Page 9: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Preconditions

x32-ABI requires:

Linux Kernel 3.4 compiled with CONFIG X86 X32=y

Gcc 4.7

Binutils 2.22

Glibc 2.16

Recompiling all system libraries, required by an application, with gcc-mx32

ELF 32-bit LSB shared object, x86-64, version 1 (SYSV)

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 8 / 20

Page 10: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

How ROOT can profit

CERN applications use millions of pointers

GAUDI framework stores all events as pointers

Many virtual functions (virtual tables: function pointers and hiddenpointers)

ROOT (histogram as a tree of pointers ...)

...

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 9 / 20

Page 11: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Results within other CERN applications

namd dealII soplex povray omnetpp astar xalancbmk

−30

−20

−10

0

10

20

30

Mem

ory

red

uct

ion

in%

Memory

Resident Size (32 vs x32) Resident Size (64 vs x32)

namd dealII soplex povray omnetpp astar xalancbmk

−5

0

5

10

15

20

25

30

Red

uct

ion

ofti

me

in%

Time

CPU Time (32 vs x32) CPU Time (64 vs x32)

Figure : Comparison of memory consumption and CPU-time within theHEPSPEC-benchmarks

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 10 / 20

Page 12: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Results within other CERN applications

LHCb Reconstruction with Gaudiv23r3 and ROOT 5.34.00Reconstruction of 1000 Events:

Physical Memory: -20 %

Total elapsed: -2 % reduction

LHCb Analysis with Gaudiv23r3 and ROOT 5.34.00Analysis of 10000 Events:

Physical Memory: -21 %

Total elapsed: 1.5 % increase

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 11 / 20

Page 13: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Results within ROOT-Benchmarks

Following ROOT-benchmarks:

stressHistogram

stress 1000

stressHepix

IO

linear algebra

vector

sparse matrix

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 12 / 20

Page 14: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Results within ROOT-Benchmarks

Histogram Events Spectrum Linear Fit0

10

20

30

40

50

60

Mem

ory

red

uct

ion

in%

Memory

Resident Size (64 vs x32)

Histogram Events Spectrum Linear Fit

0

5

10

15

20

25

30

Red

uct

ion

ofti

me

in%

Time

Real Time (64 vs x32) CPU Time (64 vs x32)

Figure : Comparison of memory consumption and CPU-time within theROOT-benchmarks

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 13 / 20

Page 15: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Results within ROOT-Benchmarks

stress 1000:

0 200 400 600 800 1,000 1,2000

0.2

0.4

0.6

0.8

1

·106

Snapshots

Mem

ory

inkB

RSS m64 VSS m64 RSS x32 VSS x32

Histogram Events Spectrum Linear Fit0

20

40

60

80

100

Mem

ory

red

uct

ion

in%

Memory

heap - profiling (64 vs x32) heap - maps (64 vs x32)

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 14 / 20

Page 16: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Reasons for performance differences

Memory seen by system and by process are different

Default thresholds of malloc:

t = 4 · 1024 · 1024 · sizeof (long)

if x > t: malloc uses mmap

if x < t: malloc uses sbrk

mmap:

memory blocks can be independently given back

anonymous memory blocks as an extension of the heap

sbrk :

memory blocks cannot be returned to the operating system

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 15 / 20

Page 17: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Reasons for performance differences

Looking into dealII, omnetpp, ROOT stress:

dealII : page faults reduced by 10%

omnetpp: page faults reduced by 38%

stress: difference in CPU-time likely caused by memory issues

Further Issues:

code and data alignment

x32 loop unrolling needs some tuning

zero-extensions for pointer conversion

=⇒ x32-ABI still under development and there are possiblities foroptimization

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 16 / 20

Page 18: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Outline

1 History of application binary interfacesWhy is a new binary interface necessary?x32-ABI - A new 32-bit ABI for x86-64Preconditions

2 How ROOT can profitResults within other CERN applicationsResults within ROOT-BenchmarksReasons for performance differences

3 Support of x32-ABI

4 Conclusion

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 16 / 20

Page 19: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Support of x32-ABI

Ubuntu: Ongoing work, support planned for 13.04:https://blueprints.launchpad.net/ubuntu/+spec/

foundations-r-x32-planning

Gentoo: Release candidate available since summer 2012:http://www.gentoo.org/news/20120608-x32_abi.xml

Red Hat: No plans according to: http://comments.gmane.org/

gmane.linux.redhat.fedora.devel/164019

LLVM-compiler: supports x32-ABI since summer 2012 http:

//www.phoronix.com/scan.php?page=news_item&px=MTI3NTk

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 17 / 20

Page 20: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Outline

1 History of application binary interfacesWhy is a new binary interface necessary?x32-ABI - A new 32-bit ABI for x86-64Preconditions

2 How ROOT can profitResults within other CERN applicationsResults within ROOT-BenchmarksReasons for performance differences

3 Support of x32-ABI

4 Conclusion

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 17 / 20

Page 21: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Conclusion

New room for improvement, ( CERN ) applications can profitsubstantially

Computing Grid limits memory anyway to 4 GB per process

Gain in performance is for FREE

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 18 / 20

Page 22: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Conclusion

Recompilation:

Cast from time values to long (...) will produce wrong results (xrootd)

New pointer size required modifications in CINT (function and datapattern)

Getting a working environment:

does not work out of the box

building gcc fails due to missing glibc x32

building glibc x32 with a partly built gcc

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 19 / 20

Page 23: ROOT and x32-ABIcds.cern.ch/record/1528222/files/LHCb-TALK-2013-060.pdf · 4 Conclusion Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 1 / 20. History of application binary interfaces

Any questions?

Rauschmayr (CERN) ROOT and x32-ABI March 13, 2013 20 / 20