Scalable Training of Mixture Models via Coresets
Daniel Feldman
Matthew Faulkner
Andreas Krause
MIT
Fitting Mixtures to Massive Data
Importance Sample
EM, generally expensive
Weighted EM, fast!
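To make "weighted EM" concrete, here is a minimal sketch of one EM iteration on weighted points. It assumes a diagonal-covariance GMM and NumPy only; the function name `weighted_em_step` and the variance floor are illustrative choices, not taken from the slides.

```python
import numpy as np

def weighted_em_step(X, w, pi, mu, var):
    """One EM iteration for a diagonal-covariance GMM on weighted points.

    X: (n, d) points, w: (n,) nonnegative weights,
    pi: (k,) mixing weights, mu: (k, d) means, var: (k, d) variances.
    Returns updated (pi, mu, var) and the weighted log-likelihood
    under the parameters that were passed in.
    """
    # E-step: log(pi_j * N(x_i | mu_j, diag(var_j))) for every pair (i, j).
    diff = X[:, None, :] - mu[None, :, :]                      # (n, k, d)
    log_pdf = -0.5 * (np.sum(diff ** 2 / var[None], axis=2)
                      + np.sum(np.log(2 * np.pi * var), axis=1)[None, :])
    log_joint = np.log(pi)[None, :] + log_pdf                  # (n, k)
    top = log_joint.max(axis=1, keepdims=True)                 # stable log-sum-exp
    log_norm = top + np.log(np.exp(log_joint - top).sum(axis=1, keepdims=True))
    resp = np.exp(log_joint - log_norm)                        # responsibilities

    # M-step: identical to plain EM except each point counts w_i times.
    r = resp * w[:, None]
    nk = r.sum(axis=0)
    pi_new = nk / w.sum()
    mu_new = (r.T @ X) / nk[:, None]
    diff_new = X[:, None, :] - mu_new[None, :, :]
    var_new = np.einsum('nk,nkd->kd', r, diff_new ** 2) / nk[:, None] + 1e-6
    loglik = float(np.sum(w * log_norm[:, 0]))
    return pi_new, mu_new, var_new, loglik
```

Each iteration costs time proportional to the number of weighted points, so running it on a small weighted sample instead of the full data set is where the speed-up comes from.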
Coresets for Mixture Models
Naïve Uniform Sampling
Small cluster is missed
Sample a set U of m points uniformly
High variance
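A back-of-the-envelope check of why the small cluster is missed (not from the slides): if the m points are drawn uniformly with replacement from n points, the chance that none of them lands in a cluster of s points is (1 - s/n)^m. The helper name `miss_probability` and the numbers in the usage note are illustrative.

```python
def miss_probability(n, cluster_size, m):
    """Probability that m uniform draws (with replacement) from n points
    all miss a cluster containing `cluster_size` of those points."""
    return (1.0 - cluster_size / n) ** m
```

For example, `miss_probability(1_000_000, 1_000, 500)` is roughly 0.61: a cluster holding 0.1% of a million points is more likely than not to be absent from a uniform sample of 500.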
Sampling Distribution
Bias sampling towards small clusters
Importance Weights
Weights
Sampling distribution
Creating a Sampling Distribution
Iteratively find representative points
• Sample a small set uniformly at random
• Remove half the blue points nearest the samples
Small clusters are represented
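The two bullets above can be sketched as a loop. This is a minimal NumPy sketch under stated assumptions: the batch size, the stopping rule (stop once at most `batch_size` points remain and keep them all), and the name `find_representatives` are illustrative, not taken from the slides.

```python
import numpy as np

def find_representatives(X, batch_size=10, seed=None):
    """Return indices of representative points of X (shape (n, d))."""
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(X))      # indices of points not yet removed
    reps = []
    while len(remaining) > batch_size:
        # Sample a small set uniformly at random from the remaining points.
        picked = rng.choice(remaining, size=batch_size, replace=False)
        reps.extend(picked.tolist())
        # Squared distance from every remaining point to its nearest new sample.
        diff = X[remaining][:, None, :] - X[picked][None, :, :]
        d2 = (diff ** 2).sum(axis=2).min(axis=1)
        # Remove the half of the remaining points nearest the samples.
        order = np.argsort(d2)
        remaining = remaining[order[len(remaining) // 2:]]
    # Whatever survives every halving round is far from all samples: keep it.
    reps.extend(remaining.tolist())
    return np.array(sorted(set(reps)))
```

Because points far from every sample survive each halving round, a small, distant cluster either gets sampled directly or ends up in the final leftover set, which is how small clusters come to be represented. The number of rounds is logarithmic in n.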
Partition the data via a Voronoi diagram centered at the representative points
Sampling distribution
Points in sparse cells get more mass, as do points far from the centers
Importance Weights
Weights
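A minimal sketch of how such a sampling distribution and its importance weights could be computed. The specific mass formula (one over the cell size plus the point's share of the total squared distance) is an assumption chosen to match the description "sparse cells and far points get more mass"; the slides do not give the exact constants. The weight 1/(m·q) is the standard choice that keeps weighted sums unbiased. The name `importance_sample` is hypothetical.

```python
import numpy as np

def importance_sample(X, centers, m, seed=None):
    """Draw m points from X with a non-uniform distribution q and return
    their indices, their importance weights, and q itself."""
    rng = np.random.default_rng(seed)
    d2_all = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (n, k)
    cell = d2_all.argmin(axis=1)        # Voronoi cell of each point
    d2 = d2_all.min(axis=1)             # squared distance to its center
    cell_size = np.bincount(cell, minlength=len(centers))
    total = d2.sum()
    far_term = d2 / total if total > 0 else np.zeros(len(X))
    # Assumed mass: sparse cells and far-away points both get more mass.
    mass = 1.0 / cell_size[cell] + far_term
    q = mass / mass.sum()
    idx = rng.choice(len(X), size=m, replace=True, p=q)
    # Importance weights undo the sampling bias: E[sum of weights] = n.
    weights = 1.0 / (m * q[idx])
    return idx, weights, q
```

The weighted sample `(X[idx], weights)` can then be handed to any weighted fitting routine in place of the full data.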
Importance Sample
Coresets via Adaptive Sampling
A General Coreset Framework
Contributions for Mixture Models:
A Geometric Perspective
Gaussian level sets can be expressed purely geometrically:
affine subspace
Geometric Reduction
Lifts geometric coreset tools to mixture models
Soft-min
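The formulas for these slides did not survive extraction. As a hedged reconstruction from standard facts (not necessarily the notation used in the talk), the negative log-density of a Gaussian is a weighted sum of squared distances to hyperplanes through the mean, and the mixture's negative log-likelihood is a soft-min over its components. Here \lambda_i, v_i are the eigenvalues and orthonormal eigenvectors of \Sigma, and H_i, g_j are symbols introduced only for this sketch.

```latex
% Single Gaussian: negative log-density as squared distances to hyperplanes.
-\ln \mathcal{N}(x;\mu,\Sigma)
  = \tfrac12 (x-\mu)^\top \Sigma^{-1} (x-\mu) + \tfrac12 \ln\det(2\pi\Sigma),
\qquad
(x-\mu)^\top \Sigma^{-1} (x-\mu)
  = \sum_{i=1}^{d} \frac{\bigl(v_i^\top (x-\mu)\bigr)^2}{\lambda_i}
  = \sum_{i=1}^{d} \frac{\operatorname{dist}(x, H_i)^2}{\lambda_i},
\quad
H_i = \{\, y : v_i^\top (y-\mu) = 0 \,\}.

% Mixture of k components with weights w_j: a soft-min of per-component costs.
g_j(x) = -\ln w_j - \ln \mathcal{N}(x;\mu_j,\Sigma_j),
\qquad
\min_j g_j(x) - \ln k
  \;\le\; -\ln \sum_{j=1}^{k} e^{-g_j(x)}
  \;\le\; \min_j g_j(x).
```

The two-sided bound is what makes the name "soft-min" apt: the mixture cost never differs from the best single-component cost by more than ln k.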
Semi-Spherical Gaussian Mixtures
Extensions and Generalizations
Level Sets
Composition of Coresets
Merge [cf. Har-Peled, Mazumdar 04]
Compress
Coresets on Streams
Error grows linearly with the number of compressions
Error grows with height of tree
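A minimal sketch of the merge-and-compress tree on a stream of chunks. The `compress` function here is only a placeholder (weight-proportional resampling); a real implementation would plug in the coreset construction. The names `compress` and `stream_coreset` are hypothetical.

```python
import numpy as np

def compress(points, weights, m, rng):
    """Placeholder compression: resample m points proportionally to weight,
    preserving total weight. Stands in for a real coreset construction."""
    if len(points) <= m:
        return points, weights
    p = weights / weights.sum()
    idx = rng.choice(len(points), size=m, replace=True, p=p)
    return points[idx], np.full(m, weights.sum() / m)

def stream_coreset(chunks, m, seed=None):
    """Process chunks one at a time, keeping at most one summary per tree level."""
    rng = np.random.default_rng(seed)
    levels = {}                                   # tree level -> (points, weights)
    for chunk in chunks:
        node = compress(chunk, np.ones(len(chunk)), m, rng)
        level = 0
        # Two summaries at the same level: merge them, compress, move up.
        while level in levels:
            other = levels.pop(level)
            merged = (np.vstack([other[0], node[0]]),
                      np.concatenate([other[1], node[1]]))
            node = compress(*merged, m, rng)
            level += 1
        levels[level] = node
    points = np.vstack([v[0] for v in levels.values()])
    weights = np.concatenate([v[1] for v in levels.values()])
    return points, weights, max(levels)
```

Each input point passes through one compression per tree level, so with c chunks the number of compressions along any path is about log2(c) rather than c, and memory stays at one summary per level.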
Coresets in Parallel
Handwritten Digits
Obtain 100-dimensional features from 28x28 pixel images via PCA. Fit GMM with k=10 components.
MNIST data: 60,000 training, 10,000 testing
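A minimal sketch of the PCA feature step, assuming the images have already been flattened into rows of length 784. It uses a plain SVD in NumPy; the function name `pca_project` is hypothetical, and a random array stands in for the MNIST images in the usage note.

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the top principal directions.

    Returns the projected features, the training mean, and the directions,
    so that test data can be projected with the same mean and directions.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, mean, components
```

With a stand-in array `X = np.random.default_rng(0).random((500, 784))`, the call `pca_project(X, 100)` returns features of shape (500, 100), which would then be the input to the k=10 GMM fit.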
Neural Tetrode Recordings
Waveforms of neural activity at four co-located electrodes in a live rat hippocampus. 4 x 38 samples = 152 dimensions.
T. Siapas et al., Caltech
Community Seismic Network
Detect and monitor earthquakes using smartphones, USB sensors, and cloud computing.
CSN Sensors Worldwide
Learning User Acceleration
17-dimensional acceleration feature vectors
Seismic Anomaly Detection
GMM used for anomaly detection
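One common way to use a fitted GMM for anomaly detection is to flag points whose log-density falls below a low quantile of the training log-densities. The sketch below assumes a diagonal-covariance GMM and a 1% quantile; both are illustrative choices, and the slides do not say which rule was used. The function names are hypothetical.

```python
import numpy as np

def gmm_log_density(X, pi, mu, var):
    """Log-density of each row of X under a diagonal-covariance GMM.
    pi: (k,) mixing weights, mu: (k, d) means, var: (k, d) variances."""
    diff = X[:, None, :] - mu[None, :, :]
    log_pdf = -0.5 * (np.sum(diff ** 2 / var[None], axis=2)
                      + np.sum(np.log(2 * np.pi * var), axis=1)[None, :])
    log_joint = np.log(pi)[None, :] + log_pdf
    top = log_joint.max(axis=1, keepdims=True)          # stable log-sum-exp
    return (top + np.log(np.exp(log_joint - top).sum(axis=1, keepdims=True)))[:, 0]

def flag_anomalies(X_train, X_new, pi, mu, var, quantile=0.01):
    """Flag rows of X_new that are less likely than the given quantile
    of the training data under the model."""
    threshold = np.quantile(gmm_log_density(X_train, pi, mu, var), quantile)
    return gmm_log_density(X_new, pi, mu, var) < threshold
```

Scoring a new point only needs the model parameters and one threshold, which fits the setting of running detection on a phone or small sensor.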
Conclusions
• Lift geometric coreset tools to the statistical realm
  - New complexity result for GMM level sets
• Parallel (MapReduce) and Streaming implementations
• Strong empirical performance, enables learning on mobile devices
• GMMs admit coresets of size independent of n
  - Extensions for other mixture models