24
NAG and the BLAS Mick Pont Development Division NAG Ltd, Oxford May 19th 2016

NAG and the BLAS

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

NAGandtheBLAS

MickPontDevelopmentDivision

NAGLtd,OxfordMay19th2016

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 2

NAG§ NumericalAlgorithmsGroup- Founded1970

¨ Co-operative softwareproject:Birmingham,Leeds,Manchester,Nottingham,Oxford,andAtlasLaboratory

¨ InspiredbyJimWilkinson§ IncorporatedasNAGLtd.in1976

¨ Not-for-profit¨ Based inOxford,withoffices inManchester, Chicago,Tokyo

§ Mathematicalalgorithmdevelopment¨ Oftencollaborative¨ Wealsofundresearch students

§ Softwareengineering– productionandportingofsoftwarelibraries

§ HPCservicesandconsultancy

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 3

NAGLibraryContentsOverview

§ Root Finding§ Summation of Series § Quadrature§ Ordinary Differential Equations§ Partial Differential Equations § Numerical Differentiation § Integral Equations § Mesh Generation § Interpolation § Curve and Surface Fitting § Optimization§ Dense Linear Algebra

¨ BLAS,LAPACK

§ Sparse Linear Algebra§ Correlation and Regression

Analysis§ Analysis of Variance§ Random Number Generators§ Univariate Estimation § Nonparametric Statistics § Smoothing in Statistics § Contingency Table Analysis § Survival Analysis§ Time Series Analysis § Operations Research § Special Functions

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 4

NAGandtheBLAS

§ NAGLibraryisrelativelyhighlevel,e.g.¨ mathematicaloptimizationsolvers

¨ datafitting

§ Butreliesheavilyonlowerlevelslikelinearalgebra

Hardware

Compilers etc.Core Math Lib

(BLAS and LAPACK)

NAG Library

UserApplication

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 5

NAGLibraryWrapsNAGEngine

NAG Engine(95% Fortran)

FortranWrappers

C / C++

MATLAB mex

wrappers Python

Wrappers auto-generated from XML doc

etc.

Java

BLAS /LAPACK

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 6

Testprograms

§ NAGlibrarycurrentlyhasabout1600routines§ Everyroutinehasassociateddetaileddocumentationandexampleprogram

§ Alsoa“stringenttestprogram”¨ ForLAPACKroutinestheseareindependentofthenetlibtestprograms

¨ Oursaresimplerand(hopefully)easier toextend

¨ Butwedoalsousethenetlib ones

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 7

NAGandtheBLAS

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 8

Reproducibility

§ SIMDinstructionsoperateonseveralnumbersatonce

§ Buttousetheseinstructionsmemoryalignmentmaybecrucial…

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 9

Example- dotproduct

Mathematically equivalent – but the two results are not necessarily identical. Does it matter? Sometimes …

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 10

Local(mathematical)optimization

Nonlinear programming using a sequential quadratic programming (SQP) method

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 11

Localoptimization

Without BWR – could go to one or other of the local minima on different runs of the same program on the same machine

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 12

ImportanceofBWR

§ Bit-WiseReproducibility(BWR)¨ Essentialforsomeusers(e.g.infinance)¨ Theymayneedtoexplainevenminordifferences¨ Sometimesnoamountofdiscussioncanchangetheirminds!

§ NAGusersspotproblems¨ e.g.withlocaloptimization¨ e.g.withstatisticalanalyseswronglyperceivedbythemasusingstochasticalgorithms

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 13

SomethingsaddedinrecentNAGversions

Usuallyaddedinresponsetocustomerdemand§ Matrixfunctions§ Mixedintegernonlinearprogramming§ Changepointanalysis§ Travellingsalesmanproblem

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 14

MatrixFunctions(workbyHighametal)

GivensquarematrixA,matrixfunctionf(A) isageneralizationofthescalarfunctionf.

IfAhasafullsetofeigenvectorsVthenitcanbefactorizedas𝐴 = 𝑉𝐷𝑉%&

Then𝑓 𝐴 = 𝑉𝑓(𝐷)𝑉%&

(It’smorecomplicatedifnotafullsetofeigenvectors)

RecentNAGlibrarieshaveexp(A),log(A),sin(A),cos(A),generalf(A)

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 15

BLAS-likeroutinesthatwouldbeusefultous

§ Routinestomultiplytriangularmatrices:

x=

x=

(xTRMM routinesallowonlyonematrixtobetriangular)§ Usedagreatdealincomputingmatrixfunctions§ Alsousefulinstatisticalapplications

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 16

BLAS-likeroutinesthatwouldbeusefultous

§ Perhapssingle-callpre- andpost-multiplication:

Suchoperationsalsousedalotinstatisticalcalculations¨ e.g.timeseriesanalysis

𝑋+𝑆𝑋where𝑋 isgeneraland𝑆 issymmetric

§ Multiplicationoftwosymmetricmatrices:

𝑆&𝑆-

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 17

H02DA– MixedIntegerNonlinearProgramming

𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑓 𝑥, 𝑦

𝑠𝑢𝑏𝑗𝑒𝑐𝑡𝑡𝑜𝑐> 𝑥, 𝑦 = 0, 𝑗 = 1,2, …𝑚C

𝑐> 𝑥, 𝑦 ≥ 0, 𝑗 = 𝑚C + 1, … ,𝑚𝑥 −continuousvariables(i.e.anyrealnumber)𝑦 −binary(0/1)orintegervariables(i.e.wholenumbers)

(generalnonlinearfunction)

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 18

H02DA– MixedIntegerNonlinearProgramming

§ H02DAisbasedonthealgorithmMISQP¨ Exler,Lehmann,Schittkowski

§ UsesamodifiedSQPalgorithm¨ SolvesasequenceofQPproblems

§ Someadvantages:¨ Tendstousefewfunctionevaluations

¨ Unlikegeneticalgorithms

¨ Integervariablesarenotassumedrelaxable¨ i.e.theobjective functionisonlyevaluatedatintegerpointsy

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 19

H02DA– MixedIntegerNonlinearProgramming

Examplefromportfoliooptimization:

Choose𝑝 assetsfromasetof𝑛 assetssuchthattheriskofholdingtheassetsisminimizedforagivenexpectedreturn

Otherexamples:§ Networkdesignforwaterorgasdistribution§ Operationalreloadingofnuclearreactors

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 20

§ G13N– ChangePointAnalysis¨ Detectpointsinaunivariate timeserieswheresomefeatureofthedata(e.g.themeanorstandarddeviation)changes

¨ G13NA- PELTalgorithmbyRKillick,PFearnhead andIAEckely¨ PELT- PrunedExactLinearTime

¨ G13ND- Binarysegmentation

G13N– ChangePointAnalysis

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 21

H03BB– TravellingSalesmanProblem

Typical NAG salesman's path:Birmingham --> London --> Birmingham --> Oxford--> Birmingham --> London --> Birmingham --> New York--> Birmingham --> London --> Birmingham --> Oxford--> Birmingham --> Chicago --> Birmingham --> London--> Birmingham --> Oxford --> Birmingham --> Mumbai -->...

Total distance travelled : endless

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 22

Givenasetofplacesandthedistancesbetweenthem,e.g.

H03BB– TravellingSalesmanProblem

Oxford Birmingham Warwick Singapore New York Chicago Mumbai London78 45 6785 3397 3892 4523 60 : Oxford

26 6778 3380 3849 4545 101 : Birmingham6783 3383 3867 4531 82 : Warwick

9519 9359 2422 6733 : Singapore711 7789 3459 : New York

8047 3947 : Chicago4468 : Mumbai

: London

calculatesomethingneartheshortestclosedpaththatconnectseachplace

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 23

§ Greedyalgorithm– alwaysgotonearestunvisitedplace– usuallygetswithin25%ofbestsolution(butcouldbetheworstsolution)

§ H02BBinsteadusesasimulatedannealingapproach¨ newstatestestedbyswitchingpairsofplacesinapath

§ OptimalpathforpreviousslidecomputedbyH03BB:Oxford-->London-->Mumbai-->Singapore-->Chicago-->NewYork-->Birmingham-->Warwick-->Oxford

Totaldistancetravelled:20471milesSavingontypicalNAGsalesmanexpenses:large

H03BB– TravellingSalesmanProblem

WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 24

POP- PerformanceOptimisationandProductivity

• NAGispartnerintheEU-fundedPOPCentreofExcellence• Performanceoptimisationadviceforparallelcodes:

• Suggestions/support onhowtorefactorcodeinthemostproductiveway

• Enable improvedefficiency, betterutilisationofHPCresources• Supportingwide rangeoflanguages (C,C++,Fortran,OpenMP,

MPI,Cuda,…)

• FreeofchargetoorganisationsintheEU• WehopethatbatchedBLAScanplayapart

• Educationiskey

• Website:https://pop-coe.eu