Dynamic Floating-Point Cancellation Detection

Michael O. Lam (Presenter)

Jeffrey K. Hollingsworth

G. W. Stewart

University of Maryland, College Park

Background (Floating-Point Representation 101)

Floating-point represents real numbers as ± sig × 2^exp
  Sign bit
  Significand (“mantissa” or “fraction”)
  Exponent

Floating-point numbers have finite binary precision
  Single-precision: 24 binary digits (~7 decimal digits)
  Double-precision: 53 binary digits (~16 decimal digits)

Examples:
  π = 3.141592… (decimal) = 11.0010010… (binary)
  1/10 = 0.1 (decimal) = 0.0001100110… (binary)

Image from Wikipedia (“Single precision”)
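
The sign, significand, and exponent fields above can be inspected from C with the standard `frexp` function. The following is a minimal sketch, not part of the tool described in these slides; the `fp_parts` type and the `decompose` helper are hypothetical names introduced only for illustration, and the input is assumed to be finite and nonzero.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Hypothetical helper type holding the three fields named on the slide. */
typedef struct {
    int sign;            /* 0 for non-negative, 1 for negative */
    double significand;  /* scaled to the range [1, 2) */
    int exponent;        /* power of two */
} fp_parts;

/* Split x so that x == (sign ? -1 : 1) * significand * 2^exponent.
 * Assumption: x is finite and nonzero. */
fp_parts decompose(double x)
{
    fp_parts p;
    int e;
    assert(isfinite(x) && x != 0.0);
    double m = frexp(fabs(x), &e);   /* m in [0.5, 1), |x| == m * 2^e */
    p.sign = signbit(x) ? 1 : 0;
    p.significand = m * 2.0;         /* rescale to [1, 2) */
    p.exponent = e - 1;
    return p;
}

/* On IEEE 754 platforms, FLT_MANT_DIG and DBL_MANT_DIG from <float.h>
 * report the 24 and 53 binary digits quoted above. */
```

Under these assumptions, `decompose(0.1)` yields exponent -4 and a significand close to 1.6, which matches the binary expansion 0.0001100110… shown above.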

Motivation

Finite precision causes round-off error
  Compromises ill-conditioned calculations
  Hard to detect and diagnose

Increasingly important as HPC scales

Need to balance speed and accuracy
  Lower precision is faster
  Higher precision is more accurate
  Industry-standard double precision may still fail on long-running computations

Previous Solutions

Analytical
  Requires numerical analysis expertise
  Conservative static error bounds are largely unhelpful

Ad-hoc
  Run experiments at different precisions
  Increase precision where necessary
  Tedious and time-consuming
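
As an illustration of the ad-hoc approach, a developer might run the same loop in single and double precision and compare the results by hand. The sketch below assumes a simple repeated-addition kernel; the function names `accumulate_float`, `accumulate_double`, and `precision_gap` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <math.h>

/* Add `term` to a running sum `n` times in single precision. */
float accumulate_float(float term, int n)
{
    assert(n >= 0);
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += term;
    return sum;
}

/* The same loop in double precision. */
double accumulate_double(double term, int n)
{
    assert(n >= 0);
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += term;
    return sum;
}

/* Absolute difference between the two runs. A large gap suggests the
 * loop is sensitive to the working precision. */
double precision_gap(double term, int n)
{
    double lo = (double)accumulate_float((float)term, n);
    double hi = accumulate_double(term, n);
    return fabs(lo - hi);
}
```

A term such as 0.5 is exactly representable and produces no gap, while 0.1 accumulated a million times produces a visible one. Repeating this comparison for every kernel in a large code is what makes the approach tedious.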

Instrumentation Solution

Automated (vs. manual)
  Minimize developer effort
  Ensure consistency and correctness

Binary-level (vs. source-level)
  Include shared libraries without source code
  Include compiler optimizations

Runtime (vs. compile time)
  Dataset and communication sensitivity

Solution Components

Dyninst-based instrumentation utility (“mutator”)
  Cross-platform
  No special hardware required
  Stack walking and binary rewriting

Shared library with runtime analysis routines
  Flexibility and ease of development

Java-based log viewer GUI
  Cross-platform
  Minimal development effort

Analysis Process

Run mutator
  Find floating-point instructions
  Insert calls to shared library

Run instrumented program
  Executes analysis alongside original program
  Stores results in a log file

View output with GUI

Analysis Types

Cancellation detection

Shadow-value analysis

Cancellation

Loss of significant digits during subtraction operations

Cancellation is a symptom, not the root problem
  Indicates that a loss of information has occurred that may cause problems later

Examples (significant digits in parentheses):

    1.613647  (7)          1.613647  (7)
  - 1.613635  (7)        - 1.613647  (7)
    0.000012  (2)          0.000000  (0)
  (5 digits cancelled)   (all digits cancelled)

    1.6136473
  - 1.6136467
    0.0000006
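
The last example can be checked numerically. The sketch below is an illustration under stated assumptions, not code from the tool: it rounds both operands to single precision, subtracts them, and measures the relative error against the same subtraction in double precision. The helper name `float_sub_rel_error` is hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Relative error of computing a - b in single precision, measured
 * against the same subtraction carried out in double precision.
 * Assumption: a and b differ, so the reference result is nonzero. */
double float_sub_rel_error(double a, double b)
{
    float fa = (float)a;     /* rounding to 24 binary digits happens here */
    float fb = (float)b;
    float fdiff = fa - fb;   /* subtraction of nearby floats adds no new error */
    double reference = a - b;
    assert(reference != 0.0);
    return fabs(((double)fdiff - reference) / reference);
}
```

For the operands 1.6136473 and 1.6136467, the relative error is several orders of magnitude larger than the roughly 1e-7 expected of single precision. The subtraction introduces no error of its own; it exposes the rounding already present in the operands, which is why cancellation is a symptom rather than the root problem.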

Detecting Cancellation

For each addition/subtraction:
  Extract value of each operand
  Calculate result and compare magnitudes (binary exponents)
  If e_ans < max(e_x, e_y) there is a cancellation

For each cancellation event:
  Calculate “priority”: max(e_x, e_y) - e_ans
  If above threshold, save event information to log
  For some events, record operand values

Experiments

Gaussian elimination
  Benefits of partial pivoting
  Differing runtime behavior of popular algorithms

Gaussian Elimination

A → [L,U]

Partial pivoting
  Nominally to avoid division by zero
  Also avoids inaccurate results from small pivots
  This can be detected using cancellation

[Figure label: swap]
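
The effect of a small pivot can be reproduced with a short dense solver. The sketch below is a textbook Gaussian elimination with optional partial pivoting, written for illustration only; it is not the code used in the experiments, and the name `gauss_solve` and its parameters are assumptions.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Solve A x = b in place. `a` is an n*n row-major matrix, `b` holds the
 * right-hand side on entry and the solution on return.
 * Returns 0 on success, -1 if a zero pivot is encountered. */
int gauss_solve(double *a, double *b, size_t n, int use_pivoting)
{
    assert(a != NULL && b != NULL);
    for (size_t k = 0; k < n; k++) {
        if (use_pivoting) {
            /* Find the largest entry in column k at or below the diagonal. */
            size_t p = k;
            for (size_t i = k + 1; i < n; i++)
                if (fabs(a[i * n + k]) > fabs(a[p * n + k]))
                    p = i;
            if (p != k) {
                /* Swap rows k and p of A and b. */
                for (size_t j = 0; j < n; j++) {
                    double t = a[k * n + j];
                    a[k * n + j] = a[p * n + j];
                    a[p * n + j] = t;
                }
                double t = b[k];
                b[k] = b[p];
                b[p] = t;
            }
        }
        if (a[k * n + k] == 0.0)
            return -1;
        /* Eliminate column k from the rows below. */
        for (size_t i = k + 1; i < n; i++) {
            double m = a[i * n + k] / a[k * n + k];
            for (size_t j = k; j < n; j++)
                a[i * n + j] -= m * a[k * n + j];
            b[i] -= m * b[k];
        }
    }
    /* Back substitution; b[j] already holds x[j] for j > k. */
    for (size_t k = n; k-- > 0;) {
        double s = b[k];
        for (size_t j = k + 1; j < n; j++)
            s -= a[k * n + j] * b[j];
        b[k] = s / a[k * n + k];
    }
    return 0;
}
```

For the 2 x 2 system with rows (1e-20, 1) and (1, 1) and right-hand side (1, 2), the true solution is close to (1, 1). Without pivoting, the huge multiplier swamps the second row and the first unknown comes out as 0; with pivoting the rows are swapped first and both unknowns come out as 1.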

[Figure labels: cancellation, loss of data, pivot]

Gaussian Cancellation

Cancellation Counts

                        log(diag. element size)
                     -2      -4      -6      -8    Estimate
Threshold             1       7      13      17
Matrix size
  10 x 10            66      37      37      34          25
  15 x 15           225     123     122     122         100
  20 x 20           663     247     252     257         225
  25 x 25          1227     394     423     441         400

Gaussian Elimination

This suggests that cancellation can be used to detect the effects of a small pivot
  Useful in sparse elimination with limited ability to pivot
  Threshold must be kept high enough

Gaussian Elimination

A → [L,U]

[Figure panels: Classical, Bordered]

[Figure: axes “Size of diagonal elements” and “Iterations of algorithm”; panels “Classical” and “Bordered”]

                            Classical                  Bordered
Threshold               1    2    3    4    5      1    2    3    4    5
Smallest diag. value
  10^-5                14    8    1    0    0      8    7    6    5    4
  10^-10               29   23   16   11    3      8    8    7    7    6
  10^-15               39   33   27   21   17      9    9    9    8    8

Gaussian Elimination

Classical method: many small cancellations

Bordered method: fewer but larger cancellations

Our tool can detect these differences and inform the developer, who can then make decisions regarding which algorithm to use

Other Results

Approximate nearest neighbor
  More cancellations in denser point sets

SPEC benchmarks milc and lbm
  Cancellations in error calculations indicate good results

SPEC benchmark povray
  Cancellations indicate color black

Conclusions

It is important to vary the threshold
  Most calculations have background cancellations
  Small cancellations can hide large ones

Cancellation results require interpretation by someone who is familiar with the algorithm

Properly employed, cancellation detection can help find “trouble spots” in numerical codes

Ongoing Research

Shadow value analysis
  Replace floating-point numbers with pointers to auxiliary information (higher precision, etc.)

  double x = 1.0;

  void func() {
      double y = 4.0;
      x = x + y;
  }

  printf("%f", x);

[Figure labels: 1.000, 4.000, 5.000; “shadow value”]
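
The idea of carrying auxiliary information next to each number can be sketched at the source level. The tool described here replaces values with pointers inside an instrumented binary; the struct-based version below is a simplified stand-in under that caveat. It uses a single-precision working value with a double-precision shadow so the difference is visible on any platform, and the names `shadowed`, `sh_make`, `sh_add`, and `sh_rel_error` are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical shadowed number: the value the program computes with,
 * plus a higher-precision shadow updated alongside it. */
typedef struct {
    float value;
    double shadow;
} shadowed;

/* Start the shadow from the same value as the original. */
shadowed sh_make(float v)
{
    shadowed s = { v, (double)v };
    return s;
}

/* Perform the addition in both precisions. */
shadowed sh_add(shadowed a, shadowed b)
{
    shadowed r;
    r.value = a.value + b.value;
    r.shadow = a.shadow + b.shadow;
    return r;
}

/* Relative difference between the working value and its shadow.
 * Assumption: the shadow is nonzero. */
double sh_rel_error(shadowed s)
{
    assert(s.shadow != 0.0);
    return fabs(((double)s.value - s.shadow) / s.shadow);
}
```

Adding 1.0 and 4.0 as on the slide gives 5.0 in both the value and the shadow, so the reported error is zero. Accumulating 0.1 many times makes the two drift apart, which is the kind of divergence a shadow-value analysis is meant to surface.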

Shadow Value Analysis

Current status: allows programmers to automatically test their entire program in different precisions

Next step: selectively instrument particular code blocks or data structures

Goal: automated floating-point analysis and recommendation framework

Thank you!

Code available upon request

Questions?

[Figure: axes “Size of diagonal elements” and “Iterations of algorithm”; panels “Classical” and “Bordered”]

Threshold                 1         2         3         4         5
                        C    B    C    B    C    B    C    B    C    B
Smallest diag. value
  10^-5                14    8    8    7    1    6    0    5    0    4
  10^-10               29    8   23    8   16    7   11    7    3    6
  10^-15               39    9   33    9   27    9   21    8   17    8

(C = Classical, B = Bordered)

Gaussian Cancellation

log(pivot)        -2      -4      -6      -8
Threshold          1       7      13      17

n = 10
  Count           66      37      37      34
  Trunc           55      37      37      34
  Est             25      25      25      25

n = 15
  Count          225     123     122     122
  Trunc          154     122     122     122
  Est            100     100     100     100

n = 20
  Count          663     247     252     257
  Trunc          298     245     252     257
  Est            225     225     225     225

n = 25
  Count         1227     394     423     441
  Trunc          447     381     423     441
  Est            400     400     400     400
