Learn Faster: High-Performance Machine Learning on GPU Clusters
Peter Wittek
September 26, 2012
Machine Learning
What Machine Learning Is Not
- It is not statistics: statistics is data-driven, but makes strict assumptions on the underlying distributions
- It is not AI: AI is model-driven and addresses uncertainty
- It is not data mining, although there is considerable overlap
What Machine Learning Should Be About
- Data-driven
- Looking for patterns: classes, groups of similar objects
- Mainly quantitative, but can also be qualitative
- Robust, tolerates noise
- Generalizes well beyond the training data
Characteristics
- Loose collection of algorithms with no common ground
- Few assumptions
- Parameters can be a major obstacle
- Computationally intensive
- Not easy to parallelize: N:N access patterns are common, or N:K through a proxy
Nature-Inspired Methods
- Many nature-inspired methods (computational intelligence): neural networks, flocking algorithms, genetic algorithms, chemical reactions, etc.
- Also methods inspired by quantum mechanics
- Others: manifold learning, density-based clustering, support vector machines, etc.
Learning Approach
- Supervised
  - Biomedical: recognizing cancer cells
  - Recognizing handwriting
  - Spam detection
- Unsupervised
  - Recommendation engines
  - Finding groups of similar patents
  - Identifying trends in a dynamic environment
Ensembles
High-Performance Machine Learning
Why Do We Need It?
- Petabytes of data: sparse, noisy, possibly with missing elements
- There should be as few assumptions as possible
- Large scale may not entail a need for quick learning methods
A Case Study: Digital Preservation
- Adding advanced services to digital libraries
- The cloud paradigm is important
- Overview of the SHAMAN core infrastructure
Examples
- An ensemble of unsupervised methods:
  - Distributed indexing (not on GPUs)
  - Dimensionality reduction
  - Visualization of clusters
- A supervised classifier
Dimensionality Reduction: Random Projection
- Johnson-Lindenstrauss lemma (1984)
- Latent Semantic Analysis
- CPU: incremental approach
- GPU: 14.5x slow-down

$A_{w \times d} R_{d \times k} = A'_{w \times k}$
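As a rough sketch of the projection $A_{w \times d} R_{d \times k} = A'_{w \times k}$ on a sparse term-document-style matrix (the sizes and density below are illustrative, not the talk's data):

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Sparse matrix A (w x d), about 1% nonzero, as in the talk.
w, d, k = 1000, 5000, 100
A = sparse.random(w, d, density=0.01, random_state=0, format="csr")

# Dense Gaussian random projection matrix R (d x k); sparser variants
# also satisfy the Johnson-Lindenstrauss lemma.
R = rng.standard_normal((d, k)) / np.sqrt(k)

# A' = A R reduces each row from d to k dimensions.
A_reduced = A @ R
print(A_reduced.shape)  # (1000, 100)
```

The lemma guarantees that pairwise distances between rows are approximately preserved with high probability, which is what makes the reduced matrix usable for downstream clustering and visualization.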
- Dead end: MapReduce
- MPI and cuSPARSE
- Very irregular, sparse matrix (~1% nonzero)

Speedup:
  GPUs                         2      4      8      16
  All terms  With I/O          -      -      -    5.10
             Projection only   -      -      -   19.37
  Subset     With I/O        2.34   3.46   4.53   5.38
             Projection only 2.02   4.05   8.10  16.45
Visualization: Self-Organizing Maps
$w_j(t+1) = w_j(t) + \alpha h_{bj}(t)\,[x(t) - w_j(t)]$

$h_{bj} = \exp\left(-\frac{\|r_b - r_j\|}{\delta(t)}\right)$

Batch formulation:

$w_j(t_f) = \frac{\sum_{t'=t_0}^{t_f} h_{bj}(t')\, x(t')}{\sum_{t'=t_0}^{t_f} h_{bj}(t')}$
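A minimal CPU sketch of one batch epoch of the self-organizing map, with a made-up 1-D map and toy data (NumPy standing in for the distributed GPU implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data and a small 1-D map of code-book vectors.
X = rng.standard_normal((200, 3))      # 200 samples, dimension 3
W = rng.standard_normal((10, 3))       # 10 map units
grid = np.arange(10, dtype=float)      # unit positions r_j on the map
delta = 2.0                            # neighbourhood radius

# One batch epoch: each weight becomes a neighbourhood-weighted
# mean of the data mapped near it.
d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)  # 200 x 10 distances
bmu = d2.argmin(axis=1)                              # best matching unit per sample
h = np.exp(-np.abs(grid[bmu][:, None] - grid[None, :]) / delta)  # h_bj
W_new = (h.T @ X) / h.sum(axis=0)[:, None]           # batch update
print(W_new.shape)  # (10, 3)
```

The batch form replaces the per-sample online update with one weighted average per epoch, which is what makes the method amenable to data-parallel execution.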
Critical operation: finding the best matching unit

$d(w_j(t_0), x(t)) = \sqrt{\sum_{i=1}^{N} (x_i(t) - w_{ji}(t_0))^2}$

Multi-step reduction to find the minimum:

1: $v_1 = (X \circ X)[1, 1 \ldots 1]'$
2: $v_2 = (W \circ W)[1, 1 \ldots 1]'$
3: $P_1 = [v_1 v_1 \ldots v_1]$
4: $P_2 = [v_2 v_2 \ldots v_2]'$
5: $P_3 = X W'$
6: $D = (P_1 + P_2 - 2 P_3)$
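The six steps above translate almost line by line into NumPy; this toy-sized check confirms that $D$ matches the direct squared-distance formula:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 4))   # data vectors as rows
W = rng.standard_normal((6, 4))   # code-book vectors as rows

# Steps 1-6: all pairwise squared Euclidean distances
# via a single dense matrix product.
v1 = (X * X) @ np.ones(X.shape[1])          # squared row norms of X
v2 = (W * W) @ np.ones(W.shape[1])          # squared row norms of W
P1 = np.tile(v1[:, None], (1, W.shape[0]))  # broadcast ||x_i||^2
P2 = np.tile(v2[None, :], (X.shape[0], 1))  # broadcast ||w_j||^2
P3 = X @ W.T
D = P1 + P2 - 2 * P3                        # D[i, j] = ||x_i - w_j||^2

# Sanity check against the direct formula.
ref = ((X[:, None, :] - W[None, :, :]) ** 2).sum(-1)
print(np.allclose(D, ref))  # True
bmu = D.argmin(axis=1)      # best matching unit for each data vector
```

Folding the distance computation into one matrix product is exactly what makes the step GPU-friendly: it replaces many irregular reductions with a single dense GEMM.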
Speedup:
  GPUs                         2      4      8      16
  All terms  With I/O          -      -      -    8.69
             One epoch         -      -      -    9.68
  Subset     With I/O        8.57   7.49   6.48   4.85
             One epoch       9.68   9.42   9.75   9.56
Classification: Support Vector Machines
$w'\phi(x_i) + b \geq 1 - \xi_i \quad \text{if } y_i = +1,$
$w'\phi(x_i) + b \leq -1 + \xi_i \quad \text{if } y_i = -1.$

Making a problem linearly separable after embedding into a feature space by a nonlinear map $\phi$.

Minimize $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$.

Solve the dual with the Gram matrix $K(x_i, x_j) = \phi(x_i)\phi(x_j)'$.
The training pipeline:
- Training data feeds the kernel matrix calculation (GPU)
- Cross validation: N-fold validation over different sets of parameters
- SVM parameter selection
- SVM model creation produces the final SVM model
- Around 10x speedup
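A CPU sketch of the same pipeline, with NumPy computing the Gram matrix (the step the talk offloads to the GPU) and scikit-learn's precomputed-kernel SVC doing the n-fold parameter selection; the data and the parameter grid below are made up for illustration:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)

# Toy two-class data.
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(3, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)

def rbf_gram(X, gamma):
    # K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), computed in one shot.
    sq = (X ** 2).sum(1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

# Parameter selection by n-fold cross-validation over (C, gamma).
best = None
for C in (0.1, 1.0, 10.0):
    for gamma in (0.1, 1.0):
        K = rbf_gram(X, gamma)
        score = cross_val_score(SVC(C=C, kernel="precomputed"), K, y, cv=5).mean()
        if best is None or score > best[0]:
            best = (score, C, gamma)
print(best)
```

Because the Gram matrix is recomputed for every candidate gamma, it dominates the cost of parameter selection, which is why accelerating exactly this step pays off.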
Quantum-Inspired Methods
Why Is Quantum Mechanics Relevant?
- Contextual probability: $p(A \cap B) \neq p(B \cap A)$
  - If an event A happens, it implies a context
- Robust and naturally fuzzy
- Quantum probability and quantum logic: the same linear-algebra framework
- Bonus: HPC acceleration for free
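The order dependence can be checked with two non-commuting projectors on a qubit; the state and projectors below are arbitrary choices for illustration:

```python
import numpy as np

# Two non-commuting projectors: onto |0> and onto |+>.
P_A = np.array([[1.0, 0.0], [0.0, 0.0]])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
P_B = np.outer(plus, plus)

# A normalized state biased towards |0>.
psi = np.array([np.cos(0.3), np.sin(0.3)])

def seq_prob(P1, P2, psi):
    # Probability of observing P1 first, then P2 (sequential measurement).
    return np.linalg.norm(P2 @ P1 @ psi) ** 2

p_AB = seq_prob(P_A, P_B, psi)   # "A then B"
p_BA = seq_prob(P_B, P_A, psi)   # "B then A"
print(p_AB, p_BA)                # the two probabilities differ
```

With commuting projectors the two orderings would agree, as in classical probability; non-commutativity is what encodes the context that observing A creates.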
Dynamic Quantum Clustering
- Semi-classical method
- Ehrenfest's theorem
- Evolves the Hamiltonian of a quantum system:

$H\psi(x, t) = (T + V(x))\,\psi(x, t)$
Speedups are impressive; the square root of a matrix is shown below.

[Figure: speedup versus matrix size (64 to 8192), without and with memory transfer]
There Is More to It
- Trotter-Suzuki algorithm
- Avoids eigendecomposition
- Linear scaling tested up to 64 GPUs
- Speedup over an SSE- and cache-optimized CPU variant: 4-8x
[Figure: execution time in seconds versus number of nodes (1 to 32) for the CPU, SSE, CUDA, and hybrid variants]
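A small single-node sketch of the idea: one second-order Trotter-Suzuki (split-operator) evolution of a 1-D wave packet in a harmonic potential, using FFTs instead of eigendecomposition. The grid size, potential, and time step are arbitrary choices for illustration:

```python
import numpy as np

# 1-D grid and the corresponding Fourier wavenumbers.
n = 256
x = np.linspace(-10, 10, n, endpoint=False)
dx = x[1] - x[0]
k = 2 * np.pi * np.fft.fftfreq(n, d=dx)

# Harmonic potential and a displaced, normalized Gaussian packet (hbar = m = 1).
V = 0.5 * x ** 2
psi = np.exp(-(x - 2.0) ** 2).astype(complex)
psi /= np.sqrt((np.abs(psi) ** 2).sum() * dx)

dt = 0.01
# Second-order splitting: exp(-iV dt/2) exp(-iT dt) exp(-iV dt/2).
half_V = np.exp(-0.5j * V * dt)
kin = np.exp(-0.5j * k ** 2 * dt)
for _ in range(100):
    psi = half_V * psi
    psi = np.fft.ifft(kin * np.fft.fft(psi))  # kinetic step in Fourier space
    psi = half_V * psi

norm = (np.abs(psi) ** 2).sum() * dx
print(round(norm, 6))  # unitary evolution keeps the norm at ~1.0
```

Each factor is diagonal in either position or momentum space, so the whole step costs two FFTs and three element-wise multiplications, which parallelizes well across GPUs.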
Going beyond current HPC: Machine learning based onactual quantum computers
Summary
- ML is about data and patterns
- A blend of algorithms; ensembles
- AI?
- Parallel and distributed computing, with challenges
- Large-scale versus HPC
- Towards a common ground: quantum-inspired methods
- Bonus: HPC with little effort