Machine Learning

Prabhat, HPC User Forum, April 12, 2016


Slide courtesy of Nervana Systems


(2012) This is all great, but…

•  Is Machine Learning relevant to science?
   –  Success stories are for images and audio, but how about scientific data?
•  Why should HPC facilities care about Machine Learning, Deep Learning, Statistics?
   –  Our applied mathematicians are content with formulating and solving PDEs
   –  The NNSA folks care about Uncertainty Quantification
   –  Our data ‘analytics’ folks are happy dealing with meshes, computational geometry, topology

(2016) The writing is on the wall

•  O($B) worth of investment by industry
•  Machine Learning and Statistics are established as key disciplines for this decade
   –  Deep Learning has taken off as the most promising ML technique

(2012) Revisited..

•  Is Machine Learning relevant to science?
•  Why should HPC facilities care about Machine Learning, Deep Learning, Statistics?

(2010-2016): The Rise of Data-Intensive Science

Astronomy, Physics, Light Sources, Genomics, Climate


4 V’s of Scientific Big Data

| Science Domain | Variety | Volume | Velocity | Veracity |
|---|---|---|---|---|
| Astronomy | Multiple telescopes, multi-band/spectra | O(100) TB | 100 GB/night – 10 TB/night | Noisy, acquisition artefacts |
| Light Sources | Multiple imaging modalities | O(100) GB | 1 Gb/s – 1 Tb/s | Noisy, sample preparation/acquisition artefacts |
| Genomics | Sequencers, mass-spec, proteomics | O(1-10) TB | TB/week | Missing data, errors |
| High Energy Physics | Multiple detectors | O(100) TB – O(10) PB | 1-10 PB/s reduced to GB/s | Noisy, artefacts, spatio-temporal |
| Climate Simulations | Multi-variate, spatio-temporal | O(10) TB | 100 GB/s | ‘Clean’, need to account for multiple sources of uncertainty |
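The velocity column mixes units (GB/night, Gb/s, TB/week, PB/s), which makes the domains hard to compare at a glance. The sketch below normalizes the quoted rates to bytes per second. It rests on several assumptions that are not in the slides: decimal (SI) prefixes, "Gb"/"Tb" read as bits and "GB"/"TB"/"PB" as bytes, a "night" averaged over a 24-hour day, and "TB/week" read as 1 TB per week. The helper name `bytes_per_second` is hypothetical.

```python
# Sizes in bytes, assuming decimal (SI) prefixes.
BYTES = {"GB": 10**9, "TB": 10**12, "PB": 10**15}
# Bit-based units converted to bytes (8 bits per byte).
BYTES.update({"Gb": 10**9 / 8, "Tb": 10**12 / 8})
# Seconds per period; a "night" is assumed to be averaged over 24 hours.
SECONDS = {"s": 1, "night": 24 * 3600, "week": 7 * 24 * 3600}


def bytes_per_second(amount, unit, period):
    """Convert e.g. (100, "GB", "night") to a sustained rate in bytes/s."""
    return amount * BYTES[unit] / SECONDS[period]


# Velocity figures quoted in the table (low end, then high end where given).
velocities = {
    "Astronomy": [(100, "GB", "night"), (10, "TB", "night")],
    "Light Sources": [(1, "Gb", "s"), (1, "Tb", "s")],
    "Genomics": [(1, "TB", "week")],  # assumption: 1 TB per week
    "High Energy Physics (raw)": [(1, "PB", "s"), (10, "PB", "s")],
    "Climate Simulations": [(100, "GB", "s")],
}

for domain, rates in velocities.items():
    converted = [bytes_per_second(*rate) for rate in rates]
    print(domain, " to ".join(f"{r:.3g} B/s" for r in converted))
```

On these assumptions the quoted rates span many orders of magnitude, from roughly megabytes per second for genomics up to petabytes per second for raw detector output in high energy physics.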

Does Machine Learning matter?

•  Is Machine Learning relevant to science?
   –  Yes!
•  Why should HPC facilities care about Machine Learning, Deep Learning, Statistics?
   –  Analytics is the key step for gaining scientific insights
   –  The nature of questions in data-intensive science is inferential
   –  Statistics and Machine Learning deal with inference in the presence of noise and errors
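As a toy illustration of "inference in the presence of noise", the sketch below recovers the slope and intercept of a line from noisy measurements using ordinary least squares. It is not taken from the talk: the true parameters, noise level, sample size, and the names `fit_line` and `simulate` are all assumptions chosen for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares estimates of slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def simulate(true_slope=2.0, true_intercept=1.0, noise_sd=1.0, n=1000, seed=0):
    """Generate noisy observations y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [true_slope * x + true_intercept + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys


xs, ys = simulate()
slope, intercept = fit_line(xs, ys)
print(f"estimated slope={slope:.3f}, intercept={intercept:.3f}")
```

No single measurement reveals the underlying parameters, but with enough samples the estimates should land close to the true values (2.0 and 1.0 in this hypothetical setup).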

Top 10 Data Analytics Problems

1. Creating a catalog of all objects in the Universe
2. Fundamental Constants of Cosmology
3. Characterizing Extreme Weather in a Changing Climate
4. Knowledge Extraction from Scientific Literature
5. Understanding Speech Production
6. Quantitative and Predictive Biology
7. Understanding the Genetic Code
8. Personalized Toxicology
9. Designer Materials
10. Fundamental Constituents of Matter

Towards Synthesis (and maybe Convergence)

•  What is the landscape of Machine Learning problems in science?
   –  Bewildering array of taxonomy and domain-specific terminology
•  What are the key computational motifs?
   –  Need to have a productive conversation with HPC software, hardware vendors

[Table: Machine Learning tasks vs. science domains, with ✗ marking applicable combinations]

Tasks: Classification, Regression, Clustering, Dimensionality Reduction, Inference, Model Estimation, Design of Experiments, Semantic Analysis, Feature Learning, Anomaly Detection

Domains: Astronomy, Cosmology, Climate, Systems Biology, Neuroscience, EM/X-Ray Imaging, Mass-spec Imaging, Personalized Toxicology, Materials, Particle Physics


Machine Learning Research Strategy

•  Hardware
   –  Many-Core Chipset, Deep Memory Hierarchy, Reducing Data Movement, Power Efficiency
•  Optimized Libraries
   –  ScaLAPACK, BLAS, PCL-DNN, TensorFlow, SpearMint, RandLA, TECA, GraphLab
•  Big Data Motif
   –  Dense/Sparse Linear Algebra, Optimization (Stochastic), Randomized Linear Algebra, Graph Methods (BFS, DFS, …), MapReduce
•  Scalable Algorithms
   –  Deep Learning, Stochastic Variational Inference, Distributed MCMC, Direct Graph Kernel Computation, Sparse Coding, DBSCAN, CUR/CX
•  Scientific Analysis
   –  Pattern/Anomaly Discovery, Large Scale Inference, Clustering, Dimensionality Reduction, Data Fusion, Genome Assembly
•  Science Applications
   –  Astronomy, Cosmology, Climate, BRAIN, BioImaging, HEP
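Of the motifs named in this stack, graph traversal is the easiest to show in a few lines. The following is a minimal, single-node sketch of breadth-first search (BFS) over an adjacency-list graph. It only illustrates the motif and is not how a distributed graph library would implement it; the function name `bfs_distances` and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Return the hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:  # first visit is the shortest path
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Toy directed graph as an adjacency list (illustrative only).
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
print(bfs_distances(graph, "a"))
```

The irregular, data-dependent memory access in the inner loop is one reason graph methods stress the memory hierarchy differently from dense linear algebra.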


Machine Learning: Challenges

•  Cultural
   –  ML doesn’t cleanly ‘fit’ within Computer Science or Applied Math
   –  Statistics, CS (Machine Learning, HPC) taxonomy
   –  Mindshare
      •  Attracting the best academic and industry talent is hard
•  Technical
   –  Big Data ecosystem has evolved independently of HPC
   –  Aspirations of Convergence (Software, Hardware)
      •  HPC institutions need to do a better job of characterizing their Data Analytics requirements

Machine Learning: Opportunities

•  HPC community is uniquely positioned
   –  Storage and Compute Hardware
   –  Meaningful scientific problems
•  Software (Research and Production) is wide open
•  Most exciting discoveries happen at the intersection of domain sciences and methods
   –  We don’t know the limits of Deep Learning methods

Thanks!

We are hiring: Big Data Architects, Big Data Engineers, Data Scientists, Post-docs, interns

Contact: [email protected]