Automated Floating-Point Precision Analysis Michael O. Lam Ph.D. Defense 6 Jan 2014 Jeff Hollingsworth, Advisor

Context 2 Floating-point arithmetic is ubiquitous

Context 3 Floating-point arithmetic represents real numbers as $\pm 1.\mathit{frac} \times 2^{\mathit{exp}}$. Fields: sign bit, exponent, significand (mantissa or fraction).

Context 4 Floating-point arithmetic represents real numbers as $\pm 1.\mathit{frac} \times 2^{\mathit{exp}}$. Fields: sign bit, exponent, significand (mantissa or fraction). Representing 2.0: single precision (32 bits: 8-bit exponent, 23-bit significand) 0x40000000; double precision (64 bits: 11-bit exponent, 52-bit significand) 0x4000000000000000.

Context 5 Floating-point arithmetic represents real numbers as $\pm 1.\mathit{frac} \times 2^{\mathit{exp}}$. Fields: sign bit, exponent, significand (mantissa or fraction). Representing 2.625: single precision (32 bits: 8-bit exponent, 23-bit significand) 0x40280000; double precision (64 bits: 11-bit exponent, 52-bit significand) 0x4005000000000000.

Context 6 Floating-point arithmetic represents real numbers as $\pm 1.\mathit{frac} \times 2^{\mathit{exp}}$. Fields: sign bit, exponent, significand (mantissa or fraction). Representing 0.1: single precision (32 bits: 8-bit exponent, 23-bit significand) 0x3DCCCCCD; double precision (64 bits: 11-bit exponent, 52-bit significand) 0x3FB999999999999A.

Context 7 Floating-point arithmetic represents real numbers as $\pm 1.\mathit{frac} \times 2^{\mathit{exp}}$. Fields: sign bit, exponent, significand (mantissa or fraction). Representing 1.234: single precision (32 bits: 8-bit exponent, 23-bit significand) 0x3F9DF3B6; double precision (64 bits: 11-bit exponent, 52-bit significand) 0x3FF3BE76C8B43958.
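
The bit patterns on the preceding slides can be reproduced with a few lines of C. This is a minimal sketch, not part of the original deck: the helper names `float_bits`, `double_bits`, `split_float`, and the `float_fields` struct are hypothetical, and the code assumes the platform stores `float` and `double` as IEEE 754 binary32 and binary64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw bit pattern of a single-precision value.
 * Assumes float is IEEE 754 binary32. */
uint32_t float_bits(float value)
{
    uint32_t bits;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&bits, &value, sizeof bits);  /* well-defined, unlike a pointer cast */
    return bits;
}

/* Hypothetical helper: raw bit pattern of a double-precision value.
 * Assumes double is IEEE 754 binary64. */
uint64_t double_bits(double value)
{
    uint64_t bits;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* The three fields of a single-precision value. */
typedef struct {
    unsigned sign;         /* 1 bit */
    unsigned exponent;     /* 8 bits, biased by 127 */
    uint32_t significand;  /* 23 bits, implicit leading 1 not stored */
} float_fields;

/* Hypothetical helper: split a single-precision value into its fields. */
float_fields split_float(float value)
{
    uint32_t bits = float_bits(value);
    float_fields fields;
    fields.sign = (unsigned)(bits >> 31);
    fields.exponent = (unsigned)((bits >> 23) & 0xFFu);
    fields.significand = bits & 0x7FFFFFu;
    return fields;
}
```

Under these assumptions, `float_bits(2.0f)` yields `0x40000000` and `double_bits(0.1)` yields `0x3FB999999999999A`, matching the slides. `split_float(2.0f)` reports a biased exponent of 128 and a significand of 0.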

Context 8 Floating-point is ubiquitous but problematic. Rounding error: accumulates after many operations; not always intuitive (e.g., non-associative). Naive approach: higher precision. Lower precision is preferable: Tesla K20X is 2.3X faster in single precision; Xeon Phi is 2.0X faster in single precision; single precision uses 50% of the memory bandwidth.
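
Both rounding-error effects named on this slide can be observed directly. The sketch below is illustrative only and is not taken from the deck: the function names are hypothetical, and it assumes IEEE 754 arithmetic compiled without value-changing optimizations such as `-ffast-math`.

```c
#include <assert.h>

/* Same three operands, two groupings: floating-point addition is not associative. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }

/* Add `term` to a running single-precision sum `count` times.
 * Each addition rounds, so the error grows with the number of operations. */
float accumulate_float(float term, long count)
{
    float sum = 0.0f;
    long i;
    assert(count >= 0);
    for (i = 0; i < count; i++) {
        sum += term;
    }
    return sum;
}

/* Same loop in double precision, for comparison. */
double accumulate_double(double term, long count)
{
    double sum = 0.0;
    long i;
    assert(count >= 0);
    for (i = 0; i < count; i++) {
        sum += term;
    }
    return sum;
}
```

With these definitions, `sum_left(1e20, -1e20, 1.0)` returns 1.0 while `sum_right(1e20, -1e20, 1.0)` returns 0.0, because the 1.0 is lost when it is first added to a value of magnitude 1e20. Comparing `accumulate_float(0.1f, 10000000L)` with `accumulate_double(0.1, 10000000L)` shows the single-precision sum drifting much further from 1,000,000 than the double-precision sum.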

Problem 9 Current analysis solutions are lacking: numerical analysis methods are difficult; static analysis is too conservative; trial-and-error is time-consuming. We need better analysis solutions: produce easy-to-understand results; incorporate runtime effects; automated or semi-automated.

Thesis 10 Automated runtime analysis techniques can inform application developers regarding floating-point behavior, and can provide insights to guide developers towards reducing precision with minimal impact on accuracy.

Contributions 11 1. Floating-point software analysis framework 2. Cancellation detection 3. Mixed-precision configuration 4. Reduced-precision analysis. Initial emphasis on capability over performance.

Example: Sum2PI_X 12 int sum2pi_x() { int i, j, k; real x, y, acc, sum; real final = PI * OUTER; /* correct answer */ sum = 0.0; for (i=0; i

Reduced Precision: Results 57 NAS mg.W (incremental): >5.0% - 4:66; >1.0% - 5:93; >0.5% - 9:45; >0.1% - 15:45; >0.05% - 23:60; Full - 28:71.

Reduced Precision: Conclusions 58 Automated analysis can identify general precision level requirements. Reduced-precision analysis provides results more quickly than mixed-precision analysis. Incremental searches reduce the time to solution without sacrificing fidelity.
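
These slides do not show how a reduced precision level is emulated. One simple way to experiment with the idea, assumed here purely for illustration and not claimed to be the framework's method, is to zero the low-order significand bits of a double. The function name `truncate_significand` and its `keep_bits` parameter are hypothetical, and the code assumes `double` is IEEE 754 binary64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: keep only the top `keep_bits` of the 52 stored
 * significand bits of `value`, zeroing the rest (truncation toward zero,
 * not rounding). Sign and exponent are left untouched. */
double truncate_significand(double value, int keep_bits)
{
    uint64_t bits;
    uint64_t mask;
    int drop;

    assert(keep_bits >= 0 && keep_bits <= 52);
    assert(sizeof(double) == sizeof(uint64_t));

    drop = 52 - keep_bits;                    /* low-order bits to clear */
    mask = ~((UINT64_C(1) << drop) - 1);      /* ones above the cleared bits */

    memcpy(&bits, &value, sizeof bits);
    bits &= mask;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, 2.625 is stored as 1.0101 (binary) times 2, so `truncate_significand(2.625, 2)` keeps the fraction bits `01` and returns 2.5, while `truncate_significand(2.625, 4)` returns 2.625 unchanged. Applying such a function after each operation of interest gives a rough feel for how many significand bits a computation needs.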

Contributions 59 General floating-point analysis framework: 32.3K LOC total in ~200 files; LGPL on SourceForge: sf.net/p/crafthpc. Cancellation detection: WHIST11 paper, PARCO 39/3 article. Mixed-precision configuration: SC12 poster, ICS13 paper. Reduced-precision analysis: ICS14 submission in preparation.

Future Work 60 Short term: optimization and platform ports; analysis extension and composition; further case studies. Long term: compiler-based implementation; IDE and development cycle integration; program modeling and verification.

Conclusion 61 Automated runtime analysis techniques can inform application developers regarding floating-point behavior, and can provide insights to guide developers towards reducing precision with minimal impact on accuracy.

Acknowledgements 62 Collaborators: Jeff Hollingsworth (advisor) and Pete Stewart (UMD); Bronis de Supinski, Matt Legendre, et al. (LLNL). Colleagues: Ananta Tiwari, Tugrul Ince, Geoff Stoker, Nick Rutar, Ray Chen, et al. CS Department @ UMD. Intel XED2. Family & Friends: Lindsay Lam (spouse); Neil & Alice Lam, Barry & Susan Walters; Wallace PCA and Elkton EPC. Cartoon by Nick Rutar.
