View
25
Download
0
Category
Preview:
DESCRIPTION
Slides are available at http://grigory.us/big-data.html. Sublinear Algorihms for Big Data. Lecture 4. Grigory Yaroslavtsev http://grigory.us. Today. Dimensionality reduction AMS as dimensionality reduction Johnson- Lindenstrauss transform Sublinear time algorithms - PowerPoint PPT Presentation
Citation preview
Sublinear Algorihms for Big Data
Grigory Yaroslavtsevhttp://grigory.us
Lecture 4
Slides are available at http://grigory.us/big-data.html
Today
• Dimensionality reduction– AMS as dimensionality reduction– Johnson-Lindenstrauss transform
• Sublinear time algorithms– Definitions: approximation, property testing– Basic examples: approximating diameter, testing
properties of images– Testing sortedness– Testing connectedness
-norm Estimation
• Stream: updates that define vector where . • Example: For
• -norm:
-norm Estimation
• -norm: • Two lectures ago:– -moment – -moment (via AMS sketching)
• Space: • Technique: linear sketches– for random set s – for random signs
AMS as dimensionality reduction
• Maintain a “linear sketch” vector
, where • Estimator for :
• “Dimensionality reduction”: “heavy” tail
Normal Distribution
• Normal distribution – Range: – Density: – Mean = 0, Variance = 1
• Basic facts:– If and are independent r.v. with normal
distribution then has normal distribution
– If are independent, then
Johnson-Lindenstrauss Transform
• Instead of let be i.i.d. random variables from normal distribution
• We still have because:– “variance of “
• Define and define:
• JL Lemma: There exists s.t. for small enough
Proof of JL Lemma
• JL Lemma: s.t. for small enough
• Assume . • We have and
• Alternative form of JL Lemma:
Proof of JL Lemma
• Alternative form of JL Lemma:
• Let and • For every we have:
• By Markov and independence of :
• We have , hence:
Proof of JL Lemma
• Alternative form of JL Lemma:
• For every we have:
• Let and recall that • A calculation finishes the proof:
Johnson-Lindenstrauss Transform
• Single vector: – Tight: [Woodruff’10]
• vectors simultaneously: – Tight: [Molinaro, Woodruff, Y. ’13]
• Distances between vectors = vectors:
Recommended