DPPs in stats and ML
with real bits of joint work w/ Adrien Hardy, Michalis Titsias, Guillaume Gautier, and Michal Valko.
Remi Bardenet
CNRS & CRIStAL, Univ. Lille, France
Remi Bardenet (CNRS & Univ. Lille) Determinantal point processes in stats and ML: computational issues 1
Summary
Determinantal point processes
A zoo of DPPs
DPPs in stats and ML
Advances on inference and sampling
Point processes
I A point process X on S is a random countable set of points in S.
I In most cases, it is defined by its joint intensities ρ_k:
E[ ∏_{i=1}^k X(D_i) ] = ∫_{D_1 × ··· × D_k} ρ_k(x_1, …, x_k) dµ(x_1) ··· dµ(x_k)

for disjoint D_i's, see [6].
I A point process is determinantal with kernel K if, for all k,

ρ_k(x_1, …, x_k) = det( (K(x_i, x_j))_{i,j=1}^k ).
Determinantal point processes
I Existence is tricky, see e.g. [11]
I A DPP is repulsive: for a real symmetric kernel, ρ_2(x, y) = ρ_1(x)ρ_1(y) − K(x, y)² ≤ ρ_1(x)ρ_1(y).
I Repulsiveness is geometric: joint intensities are determinants, i.e. squared volumes.
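In the finite setting this negative correlation is easy to check numerically: P({i, j} ⊆ X) = det K_{{i,j}} = K_ii K_jj − K_ij², which never exceeds K_ii K_jj. A minimal NumPy sketch (the kernel below is an arbitrary valid choice, not from the talk):

```python
import numpy as np

# Build a valid finite DPP kernel: symmetric with eigenvalues in [0, 1).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
M = A @ A.T                                   # symmetric PSD
K = M / (np.linalg.eigvalsh(M).max() + 1.0)   # eigenvalues now in [0, 1)

# Pairwise inclusion probabilities never exceed the product of singletons.
for i in range(5):
    for j in range(i + 1, 5):
        pair = np.linalg.det(K[np.ix_([i, j], [i, j])])   # P(i and j both in X)
        assert pair <= K[i, i] * K[j, j] + 1e-12          # negative correlation
```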
Summary
Determinantal point processes
A zoo of DPPs
DPPs in stats and ML
Advances on inference and sampling
Uniform spanning trees
I Let A be the vertex-edge incidence matrix of a connected graph G, and drop the last row.
I Sample a uniform spanning tree T of G; then

{edges of T} ∼ DPP(K),

with K = A^T (AA^T)^{-1} A, see [5].
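A quick numerical check of this kernel (a sketch on an arbitrary small graph, not a UST sampler): K should be an orthogonal projection of rank |V| − 1, the number of edges in any spanning tree.

```python
import numpy as np

# Vertex-edge incidence matrix of a small connected graph:
# vertices 0..3, edges below, each oriented arbitrarily.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
A = np.zeros((4, len(edges)))
for e, (u, v) in enumerate(edges):
    A[u, e], A[v, e] = 1.0, -1.0
A = A[:-1]                                  # drop the last row (rank |V| - 1)

K = A.T @ np.linalg.solve(A @ A.T, A)       # DPP kernel on the edge set
assert np.allclose(K, K @ K)                # K is a projection...
assert np.isclose(np.trace(K), 3.0)         # ...of rank |V| - 1 = spanning tree size
```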
Random matrices
Eigenvalues of some random matrices are DPPs:
I when G is filled in with iid complex Gaussians,
Figure: The Ginibre ensemble with N = 1000.
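These eigenvalues are simple to generate directly (a sketch that draws the matrix and its spectrum; plotting omitted):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 1000
# iid standard complex Gaussian entries
G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
eigs = np.linalg.eigvals(G)

# With this scaling the spectrum fills a disk of radius ~ sqrt(N) (circular law).
assert eigs.shape == (N,)
assert np.abs(eigs).max() < 1.2 * np.sqrt(N)
```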
Random matrices
Eigenvalues of some random matrices are DPPs:
I when H = (G + G*)/2,

Figure: The GUE with N = 50.
A zoo of DPPs: N free fermions at equilibrium
I In statistical quantum physics, a system of one particle is described at equilibrium by the eigenstates of its Hamiltonian,

H ψ_E(q) = E ψ_E(q).

I For N indistinguishable particles, we want a Ψ : S^N → C such that

|Ψ(q_{σ(1)}, …, q_{σ(N)})|² = |Ψ(q_1, …, q_N)|², ∀σ ∈ S_N.
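The standard answer, and the reason free fermions are determinantal, is the Slater determinant of the first N eigenstates; sketching the argument:

```latex
\Psi(q_1,\dots,q_N) = \frac{1}{\sqrt{N!}}\,\det\big(\psi_{E_i}(q_j)\big)_{i,j=1}^N .
```

Permuting the q_j permutes the columns of the matrix, so Ψ picks up a factor sgn(σ) and |Ψ|² is invariant. Expanding |Ψ|² shows the particle positions form a projection DPP with kernel K_N(x, y) = ∑_{i=1}^N ψ_{E_i}(x) ψ_{E_i}(y)*.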
Summary
Determinantal point processes
A zoo of DPPs
DPPs in stats and ML
Advances on inference and sampling
Orthogonal polynomial ensembles
I Let µ be a positive Borel measure on R^d; [2] build a DPP(µ, K_N) such that

√(N^{1+1/d}) ( ∑_{i=1}^N f(x_i)/K_N(x_i, x_i) − ∫ f(x) µ(dx) ) → N(0, Ω²_{f,ω}) in law as N → ∞,

for f essentially C^1, where Ω_{f,ω} measures the decay of the Fourier coefficients of f.
I This is useful for Monte Carlo, provided we know how to sample from that DPP!
Computational issue #1: Sampling from a DPP
I The vanilla algorithm starts from a diagonalized kernel

K(x, y) = ∑_{i=1}^∞ λ_i φ_i(x)φ_i(y).

I This is O(N³), even knowing the diagonalization and neglecting rejection sampling!
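For a finite projection kernel, the algorithm reduces to a chain rule with Schur-complement conditioning; a minimal sketch (not the continuous rejection-sampling version; the rank-2 kernel is an arbitrary illustrative choice):

```python
import numpy as np

def sample_projection_dpp(K, rng):
    """Exact sample from a DPP with finite orthogonal-projection kernel K.

    Draws rank(K) points: at each step, pick a point with probability
    proportional to the diagonal of the current kernel, then condition on
    it with a Schur complement update (the repeated N x N updates show
    where the cubic cost comes from).
    """
    K = K.copy()
    k = int(round(np.trace(K)))              # rank = sample size for projections
    sample = []
    for _ in range(k):
        p = np.clip(np.diag(K), 0, None)
        i = rng.choice(len(K), p=p / p.sum())
        sample.append(i)
        K = K - np.outer(K[:, i], K[i, :]) / K[i, i]   # condition on i in X
    return sample

# Example: a rank-2 projection kernel on 6 items.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((6, 2)))
K = Q @ Q.T
s = sample_projection_dpp(K, rng)
assert len(s) == 2 and len(set(s)) == 2
```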
An example from spatial statistics [10, Section 5.4]
An example from spatial statistics [10, Section 5.4]
I Compare a hardcore-Strauss model

p(x_{1:n} | β, γ, r_1, r_2) ∝ β^n ∏_i 1_{x_i ∈ W} ∏_{i<j} 1_{‖x_i − x_j‖ > r_1} γ^{1_{‖x_i − x_j‖ < r_2}},  (1)

I fitted with ad hoc pseudolikelihood methods.
An example from spatial statistics [10, Section 5.4]
I to a Matérn DPP

ρ_k(x | ρ, ν, α) = det( (K(x_i, x_j)) ) ∏_i 1_{x_i ∈ W},

with K(x, y) = τ K_{ν,α}(‖x − y‖), K_{ν,α}(0) = 1.
I Since ∫ K(x, x) 1_W(x) dx = τ|W|, we have an unbiased estimator τ̂ = n/|W|.
Computational issue #2: Fitting a DPP
The density of a DPP
Let µ be supported on a compact S, let

K(x, y) = ∑_{k>0} λ_k Φ_k(x)Φ_k(y),

and assume λ_k ∈ [0, 1) for all k. Then DPP(K, µ) has a density f w.r.t. the unit rate Poisson process on S, and

f(x_1, …, x_n) ∝ det( (L(x_i, x_j)) ) / det(I + L),

where L = (I − K)^{-1} K has kernel

L(x, y) = ∑_{k>0} [λ_k / (1 − λ_k)] Φ_k(x)Φ_k(y).
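A discrete analogue is easy to verify by brute force: with µ the counting measure on N points, the normalization in the theorem reads ∑_A det L_A = det(I + L). A sketch, with an arbitrary valid K:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
N = 4
B = rng.standard_normal((N, N))
M = B @ B.T
K = M / (np.linalg.eigvalsh(M).max() + 1.0)   # eigenvalues in [0, 1)
L = np.linalg.solve(np.eye(N) - K, K)         # L = (I - K)^{-1} K

# Normalization: the L-ensemble probabilities det(L_A)/det(I+L) sum to 1.
# (numpy returns det = 1 for the empty 0x0 submatrix, as required.)
total = sum(
    np.linalg.det(L[np.ix_(list(A), list(A))])
    for r in range(N + 1)
    for A in combinations(range(N), r)
)
assert np.isclose(total, np.linalg.det(np.eye(N) + L))
```

The same enumeration also recovers the marginal identity P(S ⊆ X) = det K_S.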
Analysis of [10]
Text summarization [9]
Text summarization [9]
I Build a kernel between sentences:

L_ij = √q_i S_ij √q_j,

where S_ij ∝ ∑_w tf_i(w) tf_j(w) idf(w)², and q_i = exp(θ^T u_i),
I and sample I with probability ∝ det(L_I) 1_{|I|=k} (a k-DPP).
I Fitting θ is relatively easy.
I Note this is not a DPP if L is not a projection.
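A toy sketch of this quality-diversity construction (the tf-idf numbers and the scalar feature u_i are made up for illustration, not the authors' setup):

```python
import numpy as np

# Toy tf-idf rows, one per sentence (illustrative numbers).
tfidf = np.array([[0.9, 0.1, 0.0],
                  [0.8, 0.2, 0.1],
                  [0.0, 0.1, 0.9]])
S = tfidf @ tfidf.T                              # similarity S_ij
S /= np.sqrt(np.outer(np.diag(S), np.diag(S)))   # normalize so S_ii = 1

theta = 0.5                                      # parameter to fit
u = np.array([1.0, 2.0, 0.5])                    # one feature per sentence
q = np.exp(theta * u)                            # qualities q_i = exp(theta^T u_i)

L = np.sqrt(q)[:, None] * S * np.sqrt(q)[None, :]  # L_ij = sqrt(q_i) S_ij sqrt(q_j)
assert np.all(np.linalg.eigvalsh(L) >= -1e-10)     # L stays PSD
```

Scaling a PSD similarity matrix on both sides by diag(√q) preserves positive semi-definiteness, so L is always a valid k-DPP kernel whatever θ is.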
Text summarization [9]
I Again, this requires a lot of sampling, but we have time.
Summary
Determinantal point processes
A zoo of DPPs
DPPs in stats and ML
Advances on inference and sampling
Inference
I Lots of activity in ML and stats, see [4, 13] and refs therein, but no clear winning strategy.
I If you forget about K and parametrize L = L_θ instead, we show in [3] how to bypass the spectral decomposition; see also [1].
Remember computational issue #2: Fitting a DPP
The density of a DPP
Let µ be supported on a compact S, let

K(x, y) = ∑_{k>0} λ_k Φ_k(x)Φ_k(y),

and assume λ_k ∈ [0, 1) for all k. Then DPP(K, µ) has a density f w.r.t. the unit rate Poisson process on S, and

f(x_1, …, x_n) ∝ det( (L(x_i, x_j)) ) / det(I + L),

where L = (I − K)^{-1} K has kernel

L(x, y) = ∑_{k>0} [λ_k / (1 − λ_k)] Φ_k(x)Φ_k(y).
Bounding Fredholm determinants
Proposition
Let Z = {z_1, …, z_m} ⊂ R^d. Then

[det L_Z / det(L_Z + Ψ)] · exp( −∫ L(x, x) dµ(x) + tr(L_Z^{-1} Ψ) ) ≤ 1/det(I + L) ≤ det L_Z / det(L_Z + Ψ),

where L_Z = ((L(z_i, z_j))) and Ψ_ij = ∫ L(z_i, x)L(x, z_j) dµ(x).
I Now we can optimize over Z and plug this into MCMC routines!
I Empirically, we only suffer from the dimension d .
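The sandwich can be sanity-checked in a discrete analogue, taking µ to be the counting measure on a grid so the integrals become sums (the RBF-type kernel and inducing set below are arbitrary illustrative choices, not the setup of [3]):

```python
import numpy as np

X = np.linspace(0, 1, 30)[:, None]            # "ground set": grid, mu = counting measure
L = 0.5 * np.exp(-((X - X.T) ** 2) / 0.1)     # PSD kernel matrix L(x, y)

Zidx = [0, 10, 20, 29]                        # inducing inputs z_1..z_m (grid subset)
LZ = L[np.ix_(Zidx, Zidx)]
Psi = L[Zidx, :] @ L[:, Zidx]                 # Psi_ij = sum_x L(z_i, x) L(x, z_j)

ratio = np.linalg.det(LZ) / np.linalg.det(LZ + Psi)
lower = ratio * np.exp(-np.trace(L) + np.trace(np.linalg.solve(LZ, Psi)))
target = 1.0 / np.linalg.det(np.eye(len(X)) + L)
assert lower <= target <= ratio               # the sandwich holds
```

With Z equal to the whole grid, Ψ = L² and both bounds collapse to equality, which is a handy way to debug the implementation.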
What do the optimized inducing inputs look like?
Figure: The panels in the top row show the initial inducing input locations for various values of m, while the corresponding panels in the bottom row show the optimized locations.
Sampling finite projection DPPs
I Random projections can help [9].
I Some DPPs are just easy to sample: e.g. USTs of graphs with no bottleneck.
I Assume we know A such that K = A^T (AA^T)^{-1} A.
I Key idea we use from [7] in [8]: with Zon(A) := A[0, 1]^n,

Vol(Zon(A)) = ∑_{B∈B} Vol(Zon(B)) = ∑_{B∈B} |det B|,

where B runs over the bases of A.
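In the plane the identity can be checked directly: Zon(A) is a zonogon whose boundary uses each generator once in each direction, so its area comes from the shoelace formula and can be compared to ∑_B |det B| (the generators below are arbitrary):

```python
import numpy as np
from itertools import combinations

def zonogon_area(gens):
    """Area of the planar zonotope A[0,1]^n, built edge by edge.

    A zonogon's boundary traverses each generator once in each direction,
    in angular order; the shoelace formula then gives the area.
    """
    edges = np.vstack([gens, -gens])
    edges = edges[np.argsort(np.arctan2(edges[:, 1], edges[:, 0]))]
    v = np.cumsum(edges, axis=0)              # polygon vertices (closes at 0)
    x, y = v[:, 0], v[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

gens = np.array([[1.0, 0.0], [0.7, 1.2], [-0.3, 0.8]])
# Sum of |det B| over bases B (here: all 2x2 submatrices, all of full rank).
basis_sum = sum(abs(np.linalg.det(gens[list(p), :]))
                for p in combinations(range(len(gens)), 2))
assert np.isclose(zonogon_area(gens), basis_sum)
```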
Sampling the zonotope
Figure: Relative error of the mass of a triplet for a BA graph with random uniform weights, as a function of the number of iterations (×10³); legend: Basis Exchange vs. Zonotope.
Sampling the zonotope
Figure: Relative error of the mass of a triplet for a BA graph with random uniform weights, as a function of CPU time (s); legend: Basis Exchange vs. Zonotope.
Conclusion
I DPPs are the kernel machine of point processes,
I with applications in stats [10] and ML [9];
I applications in signal processing [12] and Bayesian nonparametrics are coming!
I Fast inference and sampling are available.
I Powerful statistical models and algorithms combine ideas from algebra, combinatorial geometry, and functional analysis.
References I
[1] R. H. Affandi, E. B. Fox, R. P. Adams, and B. Taskar.
Learning the parameters of determinantal point processes.
In Proceedings of the International Conference on Machine Learning (ICML), 2014.
[2] R. Bardenet and A. Hardy.
Monte Carlo with determinantal point processes.
arXiv preprint arXiv:1605.00361, 2016.
[3] R. Bardenet and M. K. Titsias.
Inference for determinantal point processes without spectral knowledge.
In Advances in Neural Information Processing Systems (NIPS), pages 3375–3383, 2015.
References II
[4] V.-E. Brunel, A. Moitra, P. Rigollet, and J. Urschel.
Maximum likelihood estimation of determinantal point processes.
arXiv preprint arXiv:1701.06501, 2017.
[5] R. Burton and R. Pemantle.
Local characteristics, entropy and limit theorems for spanning trees and domino tilings via transfer-impedances.
Annals of Probability, 21(3):1329–1371, 1993.
[6] D. J. Daley and D. Vere-Jones.
An introduction to the theory of point processes.
Springer, 2nd edition, 2003.
M. Dyer and A. Frieze.
Random walks, totally unimodular matrices, and a randomised dual simplex algorithm.
Mathematical Programming, 64(1-3):1–16, 1994.
References III
[8] G. Gautier, R. Bardenet, and M. Valko.
Zonotope hit-and-run for efficient sampling of projection DPPs.
In International Conference on Machine Learning (ICML), 2017.
[9] A. Kulesza and B. Taskar.
Determinantal point processes for machine learning.
Foundations and Trends in Machine Learning, 2012.
[10] F. Lavancier, J. Møller, and E. Rubak.
Determinantal point process models and statistical inference: extended version.
arXiv preprint arXiv:1205.4818, 2014.
[11] O. Macchi.
The coincidence approach to stochastic point processes.
Advances in Applied Probability, 7:83–122, 1975.
References IV
[12] N. Tremblay, P.-O. Amblard, and S. Barthelme.
Graph sampling with determinantal processes.
arXiv preprint arXiv:1703.01594, 2017.
[13] J. Urschel, V.-E. Brunel, A. Moitra, and P. Rigollet.
Learning determinantal point processes with moments and cycles.
arXiv preprint arXiv:1703.00539, 2017.