Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 2
NAG§ NumericalAlgorithmsGroup- Founded1970
¨ Co-operative softwareproject:Birmingham,Leeds,Manchester,Nottingham,Oxford,andAtlasLaboratory
¨ InspiredbyJimWilkinson§ IncorporatedasNAGLtd.in1976
¨ Not-for-profit¨ Based inOxford,withoffices inManchester, Chicago,Tokyo
§ Mathematicalalgorithmdevelopment¨ Oftencollaborative¨ Wealsofundresearch students
§ Softwareengineering– productionandportingofsoftwarelibraries
§ HPCservicesandconsultancy
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 3
NAGLibraryContentsOverview
§ Root Finding§ Summation of Series § Quadrature§ Ordinary Differential Equations§ Partial Differential Equations § Numerical Differentiation § Integral Equations § Mesh Generation § Interpolation § Curve and Surface Fitting § Optimization§ Dense Linear Algebra
¨ BLAS,LAPACK
§ Sparse Linear Algebra§ Correlation and Regression
Analysis§ Analysis of Variance§ Random Number Generators§ Univariate Estimation § Nonparametric Statistics § Smoothing in Statistics § Contingency Table Analysis § Survival Analysis§ Time Series Analysis § Operations Research § Special Functions
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 4
NAGandtheBLAS
§ NAGLibraryisrelativelyhighlevel,e.g.¨ mathematicaloptimizationsolvers
¨ datafitting
§ Butreliesheavilyonlowerlevelslikelinearalgebra
Hardware
Compilers etc.Core Math Lib
(BLAS and LAPACK)
NAG Library
UserApplication
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 5
NAGLibraryWrapsNAGEngine
NAG Engine(95% Fortran)
FortranWrappers
C / C++
MATLAB mex
wrappers Python
Wrappers auto-generated from XML doc
etc.
Java
BLAS /LAPACK
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 6
Testprograms
§ NAGlibrarycurrentlyhasabout1600routines§ Everyroutinehasassociateddetaileddocumentationandexampleprogram
§ Alsoa“stringenttestprogram”¨ ForLAPACKroutinestheseareindependentofthenetlibtestprograms
¨ Oursaresimplerand(hopefully)easier toextend
¨ Butwedoalsousethenetlib ones
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 8
Reproducibility
§ SIMDinstructionsoperateonseveralnumbersatonce
§ Buttousetheseinstructionsmemoryalignmentmaybecrucial…
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 9
Example- dotproduct
Mathematically equivalent – but the two results are not necessarily identical. Does it matter? Sometimes …
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 10
Local(mathematical)optimization
Nonlinear programming using a sequential quadratic programming (SQP) method
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 11
Localoptimization
Without BWR – could go to one or other of the local minima on different runs of the same program on the same machine
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 12
ImportanceofBWR
§ Bit-WiseReproducibility(BWR)¨ Essentialforsomeusers(e.g.infinance)¨ Theymayneedtoexplainevenminordifferences¨ Sometimesnoamountofdiscussioncanchangetheirminds!
§ NAGusersspotproblems¨ e.g.withlocaloptimization¨ e.g.withstatisticalanalyseswronglyperceivedbythemasusingstochasticalgorithms
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 13
SomethingsaddedinrecentNAGversions
Usuallyaddedinresponsetocustomerdemand§ Matrixfunctions§ Mixedintegernonlinearprogramming§ Changepointanalysis§ Travellingsalesmanproblem
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 14
MatrixFunctions(workbyHighametal)
GivensquarematrixA,matrixfunctionf(A) isageneralizationofthescalarfunctionf.
IfAhasafullsetofeigenvectorsVthenitcanbefactorizedas𝐴 = 𝑉𝐷𝑉%&
Then𝑓 𝐴 = 𝑉𝑓(𝐷)𝑉%&
(It’smorecomplicatedifnotafullsetofeigenvectors)
RecentNAGlibrarieshaveexp(A),log(A),sin(A),cos(A),generalf(A)
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 15
BLAS-likeroutinesthatwouldbeusefultous
§ Routinestomultiplytriangularmatrices:
x=
x=
(xTRMM routinesallowonlyonematrixtobetriangular)§ Usedagreatdealincomputingmatrixfunctions§ Alsousefulinstatisticalapplications
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 16
BLAS-likeroutinesthatwouldbeusefultous
§ Perhapssingle-callpre- andpost-multiplication:
Suchoperationsalsousedalotinstatisticalcalculations¨ e.g.timeseriesanalysis
𝑋+𝑆𝑋where𝑋 isgeneraland𝑆 issymmetric
§ Multiplicationoftwosymmetricmatrices:
𝑆&𝑆-
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 17
H02DA– MixedIntegerNonlinearProgramming
𝑚𝑖𝑛𝑖𝑚𝑖𝑧𝑒𝑓 𝑥, 𝑦
𝑠𝑢𝑏𝑗𝑒𝑐𝑡𝑡𝑜𝑐> 𝑥, 𝑦 = 0, 𝑗 = 1,2, …𝑚C
𝑐> 𝑥, 𝑦 ≥ 0, 𝑗 = 𝑚C + 1, … ,𝑚𝑥 −continuousvariables(i.e.anyrealnumber)𝑦 −binary(0/1)orintegervariables(i.e.wholenumbers)
(generalnonlinearfunction)
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 18
H02DA– MixedIntegerNonlinearProgramming
§ H02DAisbasedonthealgorithmMISQP¨ Exler,Lehmann,Schittkowski
§ UsesamodifiedSQPalgorithm¨ SolvesasequenceofQPproblems
§ Someadvantages:¨ Tendstousefewfunctionevaluations
¨ Unlikegeneticalgorithms
¨ Integervariablesarenotassumedrelaxable¨ i.e.theobjective functionisonlyevaluatedatintegerpointsy
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 19
H02DA– MixedIntegerNonlinearProgramming
Examplefromportfoliooptimization:
Choose𝑝 assetsfromasetof𝑛 assetssuchthattheriskofholdingtheassetsisminimizedforagivenexpectedreturn
Otherexamples:§ Networkdesignforwaterorgasdistribution§ Operationalreloadingofnuclearreactors
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 20
§ G13N– ChangePointAnalysis¨ Detectpointsinaunivariate timeserieswheresomefeatureofthedata(e.g.themeanorstandarddeviation)changes
¨ G13NA- PELTalgorithmbyRKillick,PFearnhead andIAEckely¨ PELT- PrunedExactLinearTime
¨ G13ND- Binarysegmentation
G13N– ChangePointAnalysis
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 21
H03BB– TravellingSalesmanProblem
Typical NAG salesman's path:Birmingham --> London --> Birmingham --> Oxford--> Birmingham --> London --> Birmingham --> New York--> Birmingham --> London --> Birmingham --> Oxford--> Birmingham --> Chicago --> Birmingham --> London--> Birmingham --> Oxford --> Birmingham --> Mumbai -->...
Total distance travelled : endless
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 22
Givenasetofplacesandthedistancesbetweenthem,e.g.
H03BB– TravellingSalesmanProblem
Oxford Birmingham Warwick Singapore New York Chicago Mumbai London78 45 6785 3397 3892 4523 60 : Oxford
26 6778 3380 3849 4545 101 : Birmingham6783 3383 3867 4531 82 : Warwick
9519 9359 2422 6733 : Singapore711 7789 3459 : New York
8047 3947 : Chicago4468 : Mumbai
: London
calculatesomethingneartheshortestclosedpaththatconnectseachplace
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 23
§ Greedyalgorithm– alwaysgotonearestunvisitedplace– usuallygetswithin25%ofbestsolution(butcouldbetheworstsolution)
§ H02BBinsteadusesasimulatedannealingapproach¨ newstatestestedbyswitchingpairsofplacesinapath
§ OptimalpathforpreviousslidecomputedbyH03BB:Oxford-->London-->Mumbai-->Singapore-->Chicago-->NewYork-->Birmingham-->Warwick-->Oxford
Totaldistancetravelled:20471milesSavingontypicalNAGsalesmanexpenses:large
H03BB– TravellingSalesmanProblem
WorkshoponBatched,Reproducible,andReducedPrecisionBLAS– May18th-19th 2016 24
POP- PerformanceOptimisationandProductivity
• NAGispartnerintheEU-fundedPOPCentreofExcellence• Performanceoptimisationadviceforparallelcodes:
• Suggestions/support onhowtorefactorcodeinthemostproductiveway
• Enable improvedefficiency, betterutilisationofHPCresources• Supportingwide rangeoflanguages (C,C++,Fortran,OpenMP,
MPI,Cuda,…)
• FreeofchargetoorganisationsintheEU• WehopethatbatchedBLAScanplayapart
• Educationiskey
• Website:https://pop-coe.eu