Nonparametric Divergence Estimators for Independent Subspace Analysis
Barnabás Póczos (Carnegie Mellon University, USA)
Zoltán Szabó (Eötvös Loránd University, Hungary)
Jeff Schneider (Carnegie Mellon University, USA)
EUSIPCO-2011, Barcelona, Spain, Sept 2, 2011
Outline
•Goal: divergence estimation
•Definitions, basic properties, motivation
•The estimator
•Theoretical results
  – Consistency
•Experimental results
  – Mutual information estimation
  – Independent subspace analysis
  – Low-dimensional embedding of distributions
Measuring divergences
www.juhokim.com/projects.php
Cristiano Ronaldo, Rio Ferdinand, Owen Hargreaves
KL
Rényi
Tsallis
Manchester United 07/08
How should we estimate them?
• Naïve plug-in approach using density estimation
  – density estimators:
    • histogram
    • kernel density estimation
    • k-nearest neighbors [D. Loftsgaarden & C. Quesenberry. 1965.]
• How can we estimate them directly?
Density: nuisance parameter
Density estimation: difficult
kNN density estimation
How good is this estimation?
[D. Loftsgaarden and C. Quesenberry. 1965.]
[N. Leonenko et al. 2008]
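The formula on this slide did not survive extraction. As a minimal sketch, the textbook k-NN density estimate is p̂(x) = k / (n · c_d · r_k(x)^d), where r_k(x) is the distance from x to its k-th nearest sample and c_d is the volume of the d-dimensional unit ball. The function name `knn_density` and the use of SciPy's `cKDTree` are illustrative assumptions, not taken from the slides.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gammaln


def knn_density(samples, queries, k=5):
    """k-NN density estimate: p_hat(x) = k / (n * c_d * r_k(x)**d).

    r_k(x) is the distance from x to its k-th nearest sample and
    c_d is the volume of the d-dimensional unit ball.
    """
    samples = np.asarray(samples, dtype=float)
    queries = np.asarray(queries, dtype=float)
    n, d = samples.shape
    # Distances from every query point to its k nearest samples.
    dist, _ = cKDTree(samples).query(queries, k=k)
    # query() returns a 1-D array when k == 1 and a 2-D array otherwise.
    r_k = dist[:, -1] if k > 1 else dist
    # log of the unit-ball volume: pi^(d/2) / Gamma(d/2 + 1).
    log_c_d = 0.5 * d * np.log(np.pi) - gammaln(0.5 * d + 1.0)
    return k / (n * np.exp(log_c_d) * r_k ** d)


# Illustrative check on a 1-D standard normal, whose density at 0 is about 0.399.
rng = np.random.default_rng(0)
data = rng.standard_normal((20000, 1))
estimate = knn_density(data, np.array([[0.0]]), k=100)
```

For fixed k the estimate at a single point stays noisy however large n becomes, which is one reason a plug-in approach built on it needs care.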
Divergence Estimation
Asymptotically unbiased
We need to prove:
The estimator
(1−α)-th and (α−1)-th moments of the “normalized k-NN distances”
Normalized k-NN distances converge to the Erlang distribution
Agner Krarup Erlang
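The estimator formula itself is missing from the extracted text, so the following is only a sketch of one direct k-NN estimator consistent with the statements above, not a transcription of the slide. It assumes samples x from p and y from q, estimates ∫ p^α q^(1−α) from the ratio of k-NN distances, and removes the asymptotic bias with the constant B = Γ(k)² / (Γ(k−α+1) Γ(k+α−1)), which follows from the moments E[Z^γ] = Γ(k+γ)/Γ(k) of an Erlang(k, 1) variable. The name `renyi_divergence_knn` and the defaults α = 0.8, k = 5 are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gammaln


def renyi_divergence_knn(x, y, alpha=0.8, k=5):
    """Sketch of a direct k-NN estimate of the Renyi-alpha divergence R_alpha(p || q).

    x: (n, d) samples from p, y: (m, d) samples from q.
    Assumes alpha != 1, continuous data (no duplicate points), and
    k - alpha + 1 > 0 and k + alpha - 1 > 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = x.shape
    m = y.shape[0]
    # rho_i: distance from x_i to its k-th neighbour among the other x's.
    # We ask for k + 1 neighbours because x_i is its own nearest neighbour.
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # nu_i: distance from x_i to its k-th neighbour among the y's.
    nu_all = cKDTree(y).query(x, k=k)[0]
    nu = nu_all[:, -1] if k > 1 else nu_all
    # Log of the bias-correction constant derived from Erlang(k, 1) moments.
    log_b = 2.0 * gammaln(k) - gammaln(k - alpha + 1.0) - gammaln(k + alpha - 1.0)
    # log of (n - 1) * rho^d / (m * nu^d), i.e. a k-NN estimate of log(q / p) at x_i.
    log_ratio = np.log(n - 1) - np.log(m) + d * (np.log(rho) - np.log(nu))
    # Estimate of the integral of p^alpha * q^(1 - alpha).
    integral = np.exp(log_b) * np.mean(np.exp((1.0 - alpha) * log_ratio))
    return np.log(integral) / (alpha - 1.0)


# Illustrative use on 2-D normals with identity covariance.
rng = np.random.default_rng(0)
p_samples = rng.standard_normal((4000, 2))
q_same = rng.standard_normal((4000, 2))
q_shifted = rng.standard_normal((4000, 2)) + np.array([1.0, 0.0])
d_same = renyi_divergence_knn(p_samples, q_same)
d_shifted = renyi_divergence_knn(p_samples, q_shifted)
```

For two normals with equal covariance the true value is α‖Δμ‖²/2, so the shifted pair has divergence 0.4 at α = 0.8 and the identical pair has divergence 0. No density estimate appears anywhere in the computation; only k-NN distances are used.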
Asymptotically unbiased
If we could move the limit inside the expectation…
All we need is
A little problem…
Asymptotic uniform integrability…
Solutions:
Increases the paper length by another 20 pages…
Results for divergence estimation
2D Normal
Results for MI estimation
rotated uniform distribution
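Mutual information can be written as a divergence between the joint distribution and the product of its marginals, so any two-sample divergence estimator can be reused for it. The sketch below shows one simple way to obtain an approximate sample from the product of marginals, by shuffling each coordinate block independently. This is an illustrative assumption and not necessarily the procedure used for the experiment; `product_of_marginals_sample`, `mutual_information`, and the `divergence` callable are hypothetical names.

```python
import numpy as np


def product_of_marginals_sample(z, block_sizes, rng):
    """Shuffle each block of columns independently of the others.

    Every block keeps its own marginal distribution, but the
    dependence between different blocks is destroyed.
    """
    z = np.asarray(z)
    if sum(block_sizes) != z.shape[1]:
        raise ValueError("block_sizes must sum to the number of columns")
    out = np.empty_like(z)
    start = 0
    for size in block_sizes:
        perm = rng.permutation(z.shape[0])
        out[:, start:start + size] = z[perm, start:start + size]
        start += size
    return out


def mutual_information(z, block_sizes, divergence, rng):
    """Divergence between the joint sample and a shuffled (product) sample.

    `divergence` is any callable taking two sample arrays (hypothetical).
    """
    return divergence(z, product_of_marginals_sample(z, block_sizes, rng))


# Rotated uniform distribution: uncorrelated but dependent coordinates.
rng = np.random.default_rng(1)
u = rng.uniform(-1.0, 1.0, size=(5000, 2))
angle = np.pi / 4
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle), np.cos(angle)]])
z = u @ rotation.T
z_product = product_of_marginals_sample(z, [1, 1], rng)
```

The shuffled sample reuses the same points as the joint sample, so the two are not independent; splitting the data in half before shuffling is one way to avoid that.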
Independent Subspace Analysis
Observation X=AS
Independent subspaces
Goal: estimate A and S observing samples from X only
6 by 6 mixing matrix
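To make the ISA generative model concrete, here is a minimal sketch that draws hidden sources S made of independent subspaces and mixes them with a 6 by 6 matrix A to produce X = AS. The choice of three 2-dimensional subspaces, the ring-shaped sources, and the orthogonal mixing matrix are illustrative assumptions; the slides only state the size of A.

```python
import numpy as np


def sample_isa_sources(n, n_subspaces=3, rng=None):
    """Draw n samples whose 2-D blocks are mutually independent.

    Inside each block the two coordinates lie on a noisy ring, so they
    are dependent; different blocks use independent random draws.
    """
    rng = np.random.default_rng() if rng is None else rng
    blocks = []
    for _ in range(n_subspaces):
        theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
        radius = 1.0 + 0.1 * rng.standard_normal(n)
        blocks.append(np.column_stack([radius * np.cos(theta),
                                       radius * np.sin(theta)]))
    return np.hstack(blocks)  # shape (n, 2 * n_subspaces)


def random_orthogonal(d, rng):
    """Random orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flipping column signs keeps q orthogonal and removes the sign ambiguity of QR.
    return q * np.sign(np.diag(r))


rng = np.random.default_rng(0)
s = sample_isa_sources(1000, rng=rng)   # hidden sources, one sample per row
a = random_orthogonal(6, rng)           # 6 by 6 mixing matrix
x = s @ a.T                             # observations: each row is A applied to a source row
```

An ISA method sees only `x` and has to recover the sources up to permutation of the subspaces and invertible transformations within each subspace.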
Independent Subspace Analysis
Objective:
Low-dimensional embedding of digits
Noisy USPS datasets
Embedding using raw image data
Embedding using Rényi divergences
Be careful, some mistakes are easy to make…
We want:
Helly–Bray theorem
[Annals of Statistics]
Some mistakes …
We want:
Enough:
Erlang
Fatou lemma:
[Journal of Nonparametric Statistics, Problems Information Transmission, IEEE Trans. on Information Theory]
Fatou lemma:
Takeaways
If you need to estimate divergences, then use me!
• Consistent divergence estimator
• Direct: no need to estimate densities
• Simple: it needs only kNN-based statistics
• Can be used for mutual information estimation, independent subspace analysis, and low-dimensional embedding
Thanks for your attention!
Attic