(2012) This is all great, but…
• Is Machine Learning relevant to science?
• Why should HPC facilities care about Machine Learning, Deep Learning, Statistics?
(2012) This is all great, but…
• Is Machine Learning relevant to science?
  – Success stories are for images and audio, but how about scientific data?
• Why should HPC facilities care about Machine Learning, Deep Learning, Statistics?
  – Our applied mathematicians are content with formulating and solving PDEs
  – The NNSA folks care about Uncertainty Quantification
  – Our data 'analytics' folks are happy dealing with meshes, computational geometry, topology
(2016) The writing is on the wall
• O(B)$ worth of investment by industry
• Machine Learning and Statistics are established as key disciplines for this decade
  – Deep Learning has taken off as the most promising ML technique
(2012) Revisited..
• Is Machine Learning relevant to science?
• Why should HPC facilities care about Machine Learning, Deep Learning, Statistics?
4 V's of Scientific Big Data

Science Domain | Variety | Volume | Velocity | Veracity
Astronomy | Multiple telescopes, multi-band/spectra | O(100) TB | 100 GB/night – 10 TB/night | Noisy, acquisition artefacts
Light Sources | Multiple imaging modalities | O(100) GB | 1 Gb/s – 1 Tb/s | Noisy, sample preparation/acquisition artefacts
Genomics | Sequencers, mass-spec, proteomics | O(1-10) TB | TB/week | Missing data, errors
High Energy Physics | Multiple detectors | O(100) TB – O(10) PB | 1-10 PB/s reduced to GB/s | Noisy, artefacts, spatio-temporal
Climate Simulations | Multi-variate, spatio-temporal | O(10) TB | 100 GB/s | 'Clean', need to account for multiple sources of uncertainty
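The velocity figures in the table use different time bases (per night, per second), which makes them hard to compare at a glance. The sketch below puts two of them on a common footing. It assumes decimal (SI) byte units, takes "GB/s" in the High Energy Physics row to mean exactly 1 GB/s, and averages the astronomy nightly volumes over a full 24-hour day; none of these choices is stated on the slide, and the function names are illustrative.

```python
# Assumption: decimal (SI) units; the slide does not say SI vs binary.
GB = 10**9
TB = 10**12
PB = 10**15

# Assumption: nightly volumes are averaged over a full 24-hour day.
SECONDS_PER_DAY = 24 * 3600


def reduction_factor(raw_bytes_per_s, kept_bytes_per_s):
    """How many times smaller the kept stream is than the raw stream."""
    return raw_bytes_per_s / kept_bytes_per_s


def sustained_rate(bytes_per_period, period_seconds):
    """Average bytes per second needed to move a volume within a period."""
    return bytes_per_period / period_seconds


# High Energy Physics: 1-10 PB/s reduced to (assumed) 1 GB/s.
hep_low = reduction_factor(1 * PB, 1 * GB)
hep_high = reduction_factor(10 * PB, 1 * GB)

# Astronomy: 100 GB/night to 10 TB/night as an average per-second rate.
astro_low = sustained_rate(100 * GB, SECONDS_PER_DAY)
astro_high = sustained_rate(10 * TB, SECONDS_PER_DAY)

print(f"HEP reduction factor: {hep_low:.0e} to {hep_high:.0e}")
print(f"Astronomy average rate: {astro_low / 1e6:.2f} to {astro_high / 1e6:.2f} MB/s")
```

Under these assumptions the detector data is reduced by six to seven orders of magnitude before it is stored, while the astronomy stream averages out to megabytes per second.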
Does Machine Learning matter?
• Is Machine Learning relevant to science?
  – Yes!
• Why should HPC facilities care about Machine Learning, Deep Learning, Statistics?
  – Analytics is the key step for gaining scientific insights
  – The questions in data-intensive science are inferential in nature
  – Statistics and Machine Learning deal with inference in presence of noise and errors
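As a toy illustration of inference in the presence of noise, the sketch below estimates a quantity from noisy measurements and attaches a bootstrap percentile interval to the estimate. The data are synthetic, `TRUE_VALUE` and the noise level are hypothetical, and the bootstrap is only one of many possible techniques; the slides do not prescribe a method.

```python
import random
import statistics


def bootstrap_mean_interval(samples, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Synthetic measurements: a hypothetical true value plus Gaussian noise.
TRUE_VALUE = 10.0
rng = random.Random(42)
measurements = [TRUE_VALUE + rng.gauss(0.0, 1.0) for _ in range(200)]

estimate = statistics.fmean(measurements)
low, high = bootstrap_mean_interval(measurements)
print(f"estimate = {estimate:.3f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

The point of the example is the shape of the answer: an estimate together with a statement of its uncertainty, rather than a single number.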
Towards Synthesis (and maybe Convergence)
• What is the landscape of Machine Learning problems in science?
  – Bewildering array of taxonomy and domain-specific terminology
• What are the key computational motifs?
  – Need to have a productive conversation with HPC software, hardware vendors
[Figure: matrix of Machine Learning tasks against science domains, with ✗ marks in selected cells]
Tasks: Classification, Regression, Clustering, Dimensionality Reduction, Inference, Model Estimation, Design of Experiments, Semantic Analysis, Feature Learning, Anomaly Detection
Domains: Astronomy, Cosmology, Climate, Systems Biology, Neuroscience, EM/X-Ray Imaging, Mass-spec Imaging, Personalized Toxicology, Materials, Particle Physics
Machine Learning Research Strategy
[Figure: layered stack connecting science applications to hardware]
• Science Applications: Astronomy, Cosmology, Climate, BRAIN, BioImaging, HEP
• Scientific Analysis: Pattern/Anomaly Discovery, Large Scale Inference, Clustering, Dimensionality Reduction, Data Fusion, Genome Assembly
• Scalable Algorithms: Deep Learning, Stochastic Variational Inference, Distributed MCMC, Direct Graph Kernel Computation, Sparse Coding, DBSCAN, CUR/CX
• Big Data Motif: Dense/Sparse Linear Algebra, Optimization (Stochastic), Randomized Linear Algebra, Graph Methods (BFS, DFS, …), MapReduce
• Optimized Libraries: ScaLAPACK, BLAS, PCL-DNN, TensorFlow, SpearMint, RandLA, TECA, GraphLab
• Hardware: Many-Core Chipset, Deep Memory Hierarchy, Reducing Data Movement, Power Efficiency
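To make one of the motifs named in the stack concrete, here is a minimal single-node sketch of breadth-first search (BFS) from the Graph Methods entry. The graph, the vertex names, and the function name `bfs_distances` are illustrative assumptions; this is not the implementation of any library listed in the figure, and a scalable version would need to distribute the frontier across nodes.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return the hop count from `source` to every reachable vertex.

    `graph` is an adjacency list: a dict mapping a vertex to its neighbours.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:  # first visit gives the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Hypothetical toy graph; "f" points into the graph but cannot be reached from "a".
toy_graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": ["e"],
    "e": [],
    "f": ["a"],
}

distances = bfs_distances(toy_graph, "a")
print(distances)
```

The queue holds the current frontier, and that frontier expansion is the step a distributed implementation would have to parallelize.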
Machine Learning: Challenges
• Cultural
  – ML doesn't cleanly 'fit' within Computer Science or Applied Math
  – Statistics, CS (Machine Learning, HPC) taxonomy
  – Mindshare
    • Attracting the best academic and industry talent is hard
• Technical
  – Big Data ecosystem has evolved independently of HPC
  – Aspirations of Convergence (Software, Hardware)
    • HPC institutions need to do a better job of characterizing their Data Analytics requirements
Machine Learning: Opportunities
• HPC community is uniquely positioned
  – Storage and Compute Hardware
  – Meaningful scientific problems
• Software (Research and Production) is wide open
• Most exciting discoveries happen at the intersection of domain sciences and methods
  – We don't know the limits of Deep Learning methods
Thanks!
We are hiring:
• Big Data Architects
• Big Data Engineers
• Data Scientists
• Post-docs, interns
Contact: [email protected]