

Multiprecision Algorithms

Nick Higham
School of Mathematics

The University of Manchester

http://www.ma.man.ac.uk/~higham, @nhigham, nickhigham.wordpress.com

Waseda Research Institute for Science and Engineering, Institute for Mathematical Science Kickoff Meeting,

Waseda University, Tokyo, March 6, 2018


Outline

Multiprecision arithmetic: floating point arithmetic supporting multiple, possibly arbitrary, precisions.

Applications of & support for low precision.

Applications of & support for high precision.

How to exploit different precisions to achieve faster algs with higher accuracy.

Download this talk from http://bit.ly/waseda18

Nick Higham Multiprecision Algorithms 2 / 27


Floating Point Number System

Floating point number system F ⊂ R:

y = ±m × β^(e−t),  0 ≤ m ≤ β^t − 1.

Base β (β = 2 in practice), precision t, exponent range e_min ≤ e ≤ e_max.

Assume normalized: m ≥ β^(t−1).

Floating point numbers are not equally spaced.

If β = 2, t = 3, e_min = −1, and e_max = 3:

[Figure: the numbers of this system plotted on a line from 0 to 7, showing the unequal spacing.]
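As an illustration of the definitions above, here is a small Python sketch (my own, not from the slides) that enumerates the normalized numbers of this toy system:

```python
# Enumerate the normalized numbers of the toy system beta = 2, t = 3,
# emin = -1, emax = 3: y = m * beta^(e-t) with beta^(t-1) <= m <= beta^t - 1.
beta, t, emin, emax = 2, 3, -1, 3

values = sorted({m * beta**(e - t)
                 for e in range(emin, emax + 1)
                 for m in range(beta**(t - 1), beta**t)})
print(values)
# [0.25, 0.3125, 0.375, ..., 0.875, 1.0, 1.25, 1.5, ..., 6.0, 7.0]
# The gap between consecutive numbers doubles at each power of 2.
```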

Nick Higham Multiprecision Algorithms 3 / 27



IEEE Standard 754-1985 and 2008 Revision

Type       Size      Range      u = 2^−t
half       16 bits   10^±5      2^−11 ≈ 4.9 × 10^−4
single     32 bits   10^±38     2^−24 ≈ 6.0 × 10^−8
double     64 bits   10^±308    2^−53 ≈ 1.1 × 10^−16
quadruple  128 bits  10^±4932   2^−113 ≈ 9.6 × 10^−35

Arithmetic ops (+, −, ∗, /, √) performed as if first calculated to infinite precision, then rounded.
Default: round to nearest, round to even in case of tie.
Half precision is a storage format only.
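The u column can be queried from NumPy's format parameters; a minimal sketch (NumPy has no IEEE quadruple type on most platforms, so only the first three rows are reproduced):

```python
import numpy as np

# Unit roundoff u = eps/2 for the IEEE binary16/32/64 formats.
for name, dtype in [("half", np.float16), ("single", np.float32),
                    ("double", np.float64)]:
    fi = np.finfo(dtype)
    print(f"{name:7s} {fi.bits:3d} bits  u = {fi.eps / 2:.2e}  max = {fi.max:.2e}")
```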

Nick Higham Multiprecision Algorithms 4 / 27


Rounding

For x ∈ R, fl(x) is an element of F nearest to x, and the transformation x → fl(x) is called rounding (to nearest).

Theorem. If x ∈ R lies in the range of F then

fl(x) = x(1 + δ), |δ| ≤ u.

u := (1/2)β^(1−t) is the unit roundoff, or machine precision.

The machine epsilon, ε_M = β^(1−t), is the spacing between 1 and the next larger floating point number (eps in MATLAB).
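A minimal check of the theorem (an assumed NumPy illustration, not from the slides): rounding double precision values to single precision satisfies fl(x) = x(1 + δ) with |δ| ≤ u = 2^−24.

```python
import numpy as np

rng = np.random.default_rng(0)
u = np.finfo(np.float32).eps / 2            # unit roundoff for binary32

x = rng.uniform(-1e6, 1e6, size=100_000)    # "exact" values held in double
fl_x = x.astype(np.float32).astype(np.float64)
delta = (fl_x - x) / x
print(np.abs(delta).max() <= u)             # True
```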

[Figure: the same number line of floating point numbers as on the previous slide.]

Nick Higham Multiprecision Algorithms 5 / 27


Precision versus Accuracy

fl(abc) = ab(1 + δ1) · c(1 + δ2), |δi| ≤ u,
        = abc(1 + δ1)(1 + δ2)
        ≈ abc(1 + δ1 + δ2).

Precision = u. Accuracy ≈ 2u.

Accuracy is not limited by precision

Nick Higham Multiprecision Algorithms 6 / 27


NVIDIA Tesla P100 (2016)

“The Tesla P100 is the world’s first accelerator built for deep learning, and has native hardware ISA support for FP16 arithmetic, delivering over 21 TeraFLOPS of FP16 processing power.”

Nick Higham Multiprecision Algorithms 8 / 27


AMD Radeon Instinct MI25 GPU (2017)

“24.6 TFLOPS FP16 or 12.3 TFLOPS FP32 peak GPU compute performance on a single board . . . Up to 82 GFLOPS/watt FP16 or 41 GFLOPS/watt FP32 peak GPU compute performance”

Nick Higham Multiprecision Algorithms 9 / 27


Machine Learning

“For machine learning as well as for certain image processing and signal processing applications, more data at lower precision actually yields better results with certain algorithms than a smaller amount of more precise data.”

“We find that very low precision is sufficient not just for running trained networks but also for training them.” Courbariaux, Bengio & David (2015)

We’re solving the wrong problem (Scheinberg, 2016), so don’t need an accurate solution.

Low precision provides regularization.

See Jorge Nocedal’s plenary talk Stochastic Gradient Methods for Machine Learning at SIAM CSE 2017.

Nick Higham Multiprecision Algorithms 10 / 27


Climate Modelling

T. Palmer, More reliable forecasts with less precise computations: a fast-track route to cloud-resolved weather and climate simulators?, Phil. Trans. R. Soc. A, 2014:

Is there merit in representing variables at sufficiently high wavenumbers using half or even quarter precision floating-point numbers?

T. Palmer, Build imprecise supercomputers, Nature, 2015.

Nick Higham Multiprecision Algorithms 11 / 27


Error Analysis in Low Precision

For an inner product x^T y of n-vectors the standard error bound is

|fl(x^T y) − x^T y| ≤ nu|x|^T|y| + O(u^2).

In half precision, u ≈ 4.9 × 10^−4, so nu = 1 for n = 2048.

Most existing rounding error analysis guarantees no accuracy, and maybe not even a correct exponent, for half precision!

Is standard error analysis especially pessimistic in these applications? Try a statistical approach?
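A hedged NumPy experiment (my illustration, not from the slides) comparing a half precision inner product with the bound nu|x|^T|y|; in practice the observed error is usually far below the worst case, which is the point of the statistical question above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2048
x = rng.standard_normal(n).astype(np.float16)
y = rng.standard_normal(n).astype(np.float16)

# Accumulate the inner product entirely in fp16.
s = np.float16(0.0)
for xi, yi in zip(x, y):
    s = np.float16(s + np.float16(xi) * np.float16(yi))

exact = x.astype(np.float64) @ y.astype(np.float64)
u = np.finfo(np.float16).eps / 2                      # ~4.9e-4
bound = n * u * (np.abs(x).astype(np.float64) @ np.abs(y).astype(np.float64))
print(abs(float(s) - exact), bound)                   # observed error << worst-case bound
```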

Nick Higham Multiprecision Algorithms 12 / 27



Need for Higher Precision

He and Ding, Using Accurate Arithmetics to Improve Numerical Reproducibility and Stability in Parallel Applications, 2001.
Bailey, Barrio & Borwein, High-Precision Computation: Mathematical Physics & Dynamics, 2012.
Khanna, High-Precision Numerical Simulations on a CUDA GPU: Kerr Black Hole Tails, 2013.
Beliakov and Matiyasevich, A Parallel Algorithm for Calculation of Determinants and Minors Using Arbitrary Precision Arithmetic, 2016.
Ma and Saunders, Solving Multiscale Linear Programs Using the Simplex Method in Quadruple Precision, 2015.

Nick Higham Multiprecision Algorithms 13 / 27


Going to Higher Precision

If we have quadruple or higher precision, how can we modify existing algorithms to exploit it?

To what extent are existing algs precision-independent?

Newton-type algs: just decrease tol?

How little higher precision can we get away with?

Gradually increase precision through the iterations?

For Krylov methods # iterations can depend on precision, so lower precision might not give fastest computation!

Nick Higham Multiprecision Algorithms 14 / 27



Availability of Multiprecision in Software

Maple, Mathematica, PARI/GP, Sage.

MATLAB: Symbolic Math Toolbox, Multiprecision Computing Toolbox (Advanpix).

Julia: BigFloat.

Mpmath and SymPy for Python.

GNU MP Library.

GNU MPFR Library.

(Quad only): some C, Fortran compilers.

Gone, but not forgotten:

Numerical Turing: Hull et al., 1985.
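As a quick illustration of one of the Python options listed above, mpmath evaluates elementary functions to a requested number of digits (a minimal sketch):

```python
from mpmath import mp, mpf, log, exp

mp.dps = 50                  # work with 50 significant decimal digits
x = mpf(2)
y = log(x)
print(y)                     # 0.6931471805599453... to 50 digits
print(exp(y) - x)            # residual at the 1e-50 level
```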

Nick Higham Multiprecision Algorithms 15 / 27


Cost of Quadruple Precision

How fast is quadruple precision arithmetic? Compare:

MATLAB double precision,
Symbolic Math Toolbox, VPA arithmetic, digits(34),
Multiprecision Computing Toolbox (Advanpix), mp.Digits(34). Optimized for quad.

Ratios of times. Intel Broadwell-E Core i7-6800K @ 3.40GHz, 6 cores.

                        mp/double   vpa/double   vpa/mp
LU, n = 250                    98       25,000      255
eig, nonsymm, n = 125          75        6,020       81
eig, symm, n = 200             32       11,100      342

Nick Higham Multiprecision Algorithms 16 / 27


Matrix Functions

(Inverse) scaling and squaring-type algorithms for e^A, log A, cos A, A^t use Padé approximants.

Padé degree and algorithm parameters chosen to achieve double precision accuracy, u = 2^−53.

Change u and the algorithm logic needs changing!

Open questions, even for scalar elementary functions?

Nick Higham Multiprecision Algorithms 17 / 27


Log Tables

Henry Briggs (1561–1630): Arithmetica Logarithmica (1624).
Logarithms to base 10 of 1–20,000 and 90,000–100,000 to 14 decimal places.

Briggs must be viewed as one of the great figures in numerical analysis.
—Herman H. Goldstine, A History of Numerical Analysis (1977)

Name          Year   Range      Decimal places
R. de Prony   1801   1–10,000   19
Edward Sang   1875   1–20,000   28

Nick Higham Multiprecision Algorithms 18 / 27


The Scalar Logarithm: Comm. ACM

J. R. Herndon (1961). Algorithm 48: Logarithm of a complex number.

A. P. Relph (1962). Certification of Algorithm 48: Logarithm of a complex number.

M. L. Johnson and W. Sangren (1962). Remark on Algorithm 48: Logarithm of a complex number.

D. S. Collens (1964). Remark on remarks on Algorithm 48: Logarithm of a complex number.

D. S. Collens (1964). Algorithm 243: Logarithm of a complex number: Rewrite of Algorithm 48.

Nick Higham Multiprecision Algorithms 19 / 27



Principal Logarithm and pth Root

Let A ∈ C^(n×n) have no eigenvalues on R^−.

Principal log: X = log A denotes the unique X such that

e^X = A,  −π < Im(λ(X)) < π.
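A small SciPy check of this definition (my illustration; scipy.linalg.logm computes the principal logarithm):

```python
import numpy as np
from scipy.linalg import expm, logm, eigvals

rng = np.random.default_rng(2)
A = np.eye(4) + 0.2 * rng.standard_normal((4, 4))   # eigenvalues near 1, away from R^-

X = logm(A)
print(np.linalg.norm(expm(X) - A))                   # ~1e-15: e^X = A
print(np.abs(eigvals(X).imag).max() < np.pi)         # True: eigenvalues in the strip
```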

Nick Higham Multiprecision Algorithms 20 / 27


The Average Eye

First order character of optical system characterized by transference matrix

T = [ S  δ
      0  1 ] ∈ R^(5×5),

where S ∈ R^(4×4) is symplectic:

S^T J S = J = [ 0    I_2
               −I_2   0  ].

Average m^(−1) Σ_{i=1}^m T_i is not a transference matrix.

Harris (2005) proposes the average exp(m^(−1) Σ_{i=1}^m log T_i).
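A hedged sketch of this construction (the random symplectic S and translation δ below are my own illustrative setup, not from Harris): the exp-of-mean-log average stays in the group of transference matrices, while the entrywise mean does not.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(3)
I2, Z2 = np.eye(2), np.zeros((2, 2))
J = np.block([[Z2, I2], [-I2, Z2]])

def random_transference():
    H = 0.1 * rng.standard_normal((4, 4))
    H = (H + H.T) / 2                       # symmetric H => J @ H is Hamiltonian
    S = expm(J @ H)                         # so S is symplectic: S^T J S = J
    delta = 0.1 * rng.standard_normal((4, 1))
    return np.block([[S, delta], [np.zeros((1, 4)), np.ones((1, 1))]])

Ts = [random_transference() for _ in range(5)]

naive = sum(Ts) / len(Ts)
harris = expm(sum(logm(T) for T in Ts) / len(Ts))     # Harris (2005) average

def symplectic_defect(T):
    S = T[:4, :4]
    return np.linalg.norm(S.T @ J @ S - J)

print(symplectic_defect(naive))    # clearly nonzero: entrywise mean leaves the group
print(symplectic_defect(harris))   # ~1e-15: the Harris average stays in the group
```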

Nick Higham Multiprecision Algorithms 21 / 27


Inverse Scaling and Squaring

log A = log((A^(1/2^s))^(2^s)) = 2^s log(A^(1/2^s))

Algorithm (Basic inverse scaling and squaring)
1: s ← 0
2: while ‖A − I‖_2 ≥ 1 do
3:   A ← A^(1/2)
4:   s ← s + 1
5: end while
6: Find Padé approximant r_km to log(1 + x)
7: return 2^s r_km(A − I)


Algorithm (Schur–Padé inverse scaling and squaring)
1: s ← 0
2: Compute the Schur decomposition A := QTQ^∗
3: while ‖T − I‖_2 ≥ 1 do
4:   T ← T^(1/2)
5:   s ← s + 1
6: end while
7: Find Padé approximant r_km to log(1 + x)
8: return 2^s Q r_km(T − I) Q^∗

What degree for the Padé approximants? Should we take more square roots once ‖T − I‖_2 < 1?
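A hedged Python sketch of the inverse scaling and squaring idea above (my illustration: a fixed-degree truncated Taylor series stands in for the Padé approximant r_km, with an ad hoc threshold of 0.25, so this is not the Fasi–Higham algorithm):

```python
import numpy as np
from scipy.linalg import schur, sqrtm, logm

def logm_iss(A, terms=20):
    T, Q = schur(A, output="complex")          # A = Q T Q*, T upper triangular
    n = T.shape[0]
    s = 0
    while np.linalg.norm(T - np.eye(n), 2) >= 0.25:
        T = sqrtm(T)                           # T <- T^(1/2)
        s += 1
    X = T - np.eye(n)
    # log(I + X) ~ X - X^2/2 + X^3/3 - ... (truncated Taylor series)
    L, P = np.zeros_like(X), np.eye(n, dtype=X.dtype)
    for k in range(1, terms + 1):
        P = P @ X
        L += ((-1) ** (k + 1) / k) * P
    return 2**s * (Q @ L @ Q.conj().T)

A = np.array([[4.0, 1.0], [0.5, 3.0]])
print(np.linalg.norm(logm_iss(A) - logm(A)))   # small: agreement close to double precision
```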

Nick Higham Multiprecision Algorithms 22 / 27


Bounding the Error

Kenney & Laub (1989) derived the absolute forward error bound

‖log(I + X) − r_km(X)‖ ≤ |log(1 − ‖X‖) − r_km(−‖X‖)|.

Al-Mohy, H & Relton (2012, 2013) derive a backward error bound. Strategy requires symbolic and high precision computation (done off-line) and obtains parameters dependent on u.

Fasi & H (2018) use a relative forward error bound based on ‖log(I + X)‖ ≈ ‖X‖, since ‖X‖ ≪ 1 is possible.

Strategy: use the forward error bound to minimize s subject to the forward error bounded by u (arbitrary) for suitable m.

Nick Higham Multiprecision Algorithms 23 / 27


Sharper Error Bound for Nonnormal Matrices

Al-Mohy & H (2009) note that

‖ Σ_{i=ℓ}^∞ c_i A^i ‖ ≤ Σ_{i=ℓ}^∞ |c_i| ‖A^i‖ = Σ_{i=ℓ}^∞ |c_i| (‖A^i‖^(1/i))^i ≤ Σ_{i=ℓ}^∞ |c_i| β^i,

where β = max_{i≥ℓ} ‖A^i‖^(1/i).

Fasi & H (2018) refine Kenney & Laub's bound using

α_p(X) = max(‖X^p‖^(1/p), ‖X^(p+1)‖^(1/(p+1)))

in place of ‖A‖. Note that ρ(A) ≤ α_p(A) ≤ ‖A‖.

Even sharper bounds derived by Nadukandi & H (2018).
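A small illustration (my own example matrix, not from the slides) of why α_p helps for nonnormal matrices: ρ(A) ≤ α_p(A) ≤ ‖A‖, and α_p(A) can be orders of magnitude smaller than ‖A‖.

```python
import numpy as np

A = np.array([[0.5, 100.0],
              [0.0,   0.5]])      # highly nonnormal: ||A||_1 = 100.5, rho(A) = 0.5

def alpha(A, p, norm=lambda M: np.linalg.norm(M, 1)):
    Ap = np.linalg.matrix_power(A, p)
    Ap1 = np.linalg.matrix_power(A, p + 1)
    return max(norm(Ap) ** (1 / p), norm(Ap1) ** (1 / (p + 1)))

print(np.linalg.norm(A, 1))                        # 100.5
print([round(alpha(A, p), 3) for p in (1, 2, 5, 10)])
# alpha_p decreases toward rho(A) = 0.5 as p grows, far below ||A||_1
```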

Nick Higham Multiprecision Algorithms 24 / 27


Relative Errors: 64 Digits

[Figure: relative errors over the test matrices at 64 digits for logm mct, logm agm, logm tayl, logm tfree tayl, logm pade and logm tfree pade, compared with κ_log(A)u; vertical scale roughly 10^−66 to 10^−48.]

Nick Higham Multiprecision Algorithms 25 / 27


Relative Errors: 256 Digits

[Figure: the corresponding relative errors at 256 digits; vertical scale roughly 10^−258 to 10^−243.]

Nick Higham Multiprecision Algorithms 26 / 27


Conclusions

Both low and high precision floating-point arithmetic becoming more prevalent, in hardware and software.

Need better understanding of behaviour of algs in low precision arithmetic.

Matrix functions at high precision need algs with (at least) different logic.

In SIAM PP18, MS54 (Thurs, 4:50) I’ll show that using three precisions in iterative refinement brings major benefits and permits faster and more accurate solution of Ax = b.
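A hedged sketch of the three-precision iterative refinement idea (Carson & Higham, in the references below): factorize once at low precision, iterate at working precision, and compute residuals at higher precision. The precisions here are stand-ins (float32 / float64 / longdouble, the latter being 80-bit extended rather than true quadruple on most x86 platforms), and this is a plain LU-based variant, not the GMRES-based algorithm.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def ir3(A, b, iters=5):
    lu, piv = lu_factor(A.astype(np.float32))            # low-precision LU, done once
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = (b.astype(np.longdouble)
             - A.astype(np.longdouble) @ x.astype(np.longdouble))   # extra-precision residual
        d = lu_solve((lu, piv), r.astype(np.float32))    # cheap correction from old factors
        x = x + d.astype(np.float64)                     # update at working precision
    return x

rng = np.random.default_rng(4)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)          # well conditioned test matrix
b = rng.standard_normal(n)
x = ir3(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))     # ~1e-16 despite the fp32 factorization
```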

Nick Higham Multiprecision Algorithms 27 / 27


References I

D. H. Bailey, R. Barrio, and J. M. Borwein. High-precision computation: Mathematical physics and dynamics. Appl. Math. Comput., 218(20):10106–10121, 2012.

G. Beliakov and Y. Matiyasevich. A parallel algorithm for calculation of determinants and minors using arbitrary precision arithmetic. BIT, 56(1):33–50, 2016.

Nick Higham Multiprecision Algorithms 1 / 7


References II

E. Carson and N. J. Higham. Accelerating the solution of linear systems by iterative refinement in three precisions. MIMS EPrint 2017.24, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, July 2017. 30 pp. Revised January 2018. To appear in SIAM J. Sci. Comput.

E. Carson and N. J. Higham. A new analysis of iterative refinement and its application to accurate solution of ill-conditioned sparse linear systems. SIAM J. Sci. Comput., 39(6):A2834–A2856, 2017.

Nick Higham Multiprecision Algorithms 2 / 7


References III

M. Courbariaux, Y. Bengio, and J.-P. David. Training deep neural networks with low precision multiplications, 2015. ArXiv preprint 1412.7024v5.

M. Fasi and N. J. Higham. Multiprecision algorithms for computing the matrix logarithm. MIMS EPrint 2017.16, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, May 2017. 19 pp. Revised December 2017. To appear in SIAM J. Matrix Anal. Appl.

Nick Higham Multiprecision Algorithms 3 / 7


References IV

W. F. Harris. The average eye. Ophthal. Physiol. Opt., 24:580–585, 2005.

Y. He and C. H. Q. Ding. Using accurate arithmetics to improve numerical reproducibility and stability in parallel applications. J. Supercomputing, 18(3):259–277, 2001.

N. J. Higham. Accuracy and Stability of Numerical Algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, second edition, 2002. ISBN 0-89871-521-0. xxx+680 pp.

Nick Higham Multiprecision Algorithms 4 / 7


References V

G. Khanna. High-precision numerical simulations on a CUDA GPU: Kerr black hole tails. J. Sci. Comput., 56(2):366–380, 2013.

D. Ma and M. Saunders. Solving multiscale linear programs using the simplex method in quadruple precision. In M. Al-Baali, L. Grandinetti, and A. Purnama, editors, Numerical Analysis and Optimization, number 134 in Springer Proceedings in Mathematics and Statistics, pages 223–235. Springer-Verlag, Berlin, 2015.

Nick Higham Multiprecision Algorithms 5 / 7


References VI

P. Nadukandi and N. J. Higham. Computing the wave-kernel matrix functions. MIMS EPrint 2018.4, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Feb. 2018. 23 pp.

A. Roldao-Lopes, A. Shahzad, G. A. Constantinides, and E. C. Kerrigan. More flops or more precision? Accuracy parameterizable linear equation solvers for model predictive control. In 17th IEEE Symposium on Field Programmable Custom Computing Machines, pages 209–216, Apr. 2009.

Nick Higham Multiprecision Algorithms 6 / 7


References VII

K. Scheinberg. Evolution of randomness in optimization methods for supervised machine learning. SIAG/OPT Views and News, 24(1):1–8, 2016.

Nick Higham Multiprecision Algorithms 7 / 7