Gibbs Sampling from π‘˜-Determinantal Point Processes Β· 2019-06-13


Gibbs Sampling from π‘˜-Determinantal Point Processes

Based on joint work with

Shayan Oveis Gharan

Alireza Rezaei

University of Washington

Point Process: A distribution on subsets of [𝑁] = {1, 2, … , 𝑁}.

Determinantal Point Process (DPP): There is a PSD kernel 𝐿 ∈ ℝ^(𝑁×𝑁) such that

βˆ€ 𝑆 βŠ† [𝑁]: β„™[𝑆] ∝ det(𝐿_𝑆)



π’Œ-DPP: Conditioning of a DPP on picking subsets of size π‘˜

if 𝑆 = π‘˜: β„™ 𝑆 ∝ det 𝐿𝑆

otherwise : β„™ 𝑆 = 0

Focus of the talk:Sampling from π‘˜-

DPPs
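For a small ground set, the definition above can be applied directly: enumerate every size-π‘˜ subset and weight it by det(𝐿_𝑆). The sketch below is purely illustrative (it is not the talk's algorithm, and the function names are mine); it is only feasible for tiny 𝑁:

```python
# Brute-force k-DPP sampling by enumerating all size-k subsets (only feasible
# for tiny N; shown purely to illustrate the definition P[S] ∝ det(L_S)).
import itertools
import numpy as np

def kdpp_distribution(L, k):
    """All size-k subsets of [N] with their normalized k-DPP probabilities."""
    N = L.shape[0]
    subsets = list(itertools.combinations(range(N), k))
    weights = np.array([np.linalg.det(L[np.ix_(S, S)]) for S in subsets])
    return subsets, weights / weights.sum()

def sample_kdpp(L, k, rng):
    subsets, probs = kdpp_distribution(L, k)
    return set(subsets[rng.choice(len(subsets), p=probs)])

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
L = A @ A.T                      # any PSD kernel works here
S = sample_kdpp(L, 2, rng)       # a random size-2 subset of {0,...,4}
```

Since every principal minor of a PSD matrix is nonnegative, the weights form a valid (unnormalized) distribution.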


DPPs are very popular probabilistic models in machine learning, used to capture diversity.

π’Œ-DPP: Conditioning of a DPP on picking subsets of size π‘˜

if 𝑆 = π‘˜: β„™ 𝑆 ∝ det 𝐿𝑆

otherwise : β„™ 𝑆 = 0

Focus of the talk:Sampling from π‘˜-

DPPs

Applications [Kulesza-Taskar'11, Dang'05, Nenkova-Vanderwende-McKeown'06, Mirzasoleiman-Jegelka-Krause'17]:

β€” Image search, document and video summarization, tweet timeline generation, pose estimation, feature selection


Continuous Domain

Input: a PSD operator 𝐿: π’ž Γ— π’ž β†’ ℝ and an integer π‘˜.

Goal: select a subset 𝑆 βŠ‚ π’ž of π‘˜ points from the distribution with density

𝑝(𝑆) ∝ det[𝐿(π‘₯, 𝑦)]_(π‘₯,π‘¦βˆˆπ‘†)


Applications:

β€” Hyper-parameter tuning [Dodge-Jamieson-Smith'17]

β€” Learning mixtures of Gaussians [Affandi-Fox-Taskar'13]

Example (Gaussian kernel): 𝐿(π‘₯, 𝑦) = exp(βˆ’(π‘₯ βˆ’ 𝑦)^⊀ Ξ£^(βˆ’1) (π‘₯ βˆ’ 𝑦))
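For a concrete point set, the continuous density is easy to evaluate. A minimal sketch with a Gaussian kernel (function names and the choice Ξ£ = 𝐼 are mine, for illustration):

```python
# Unnormalized continuous k-DPP density p(S) ∝ det[L(x, y)]_{x,y in S} for the
# Gaussian kernel L(x, y) = exp(-(x - y)^T Sigma^{-1} (x - y)).
import numpy as np

def gaussian_kernel_matrix(X, Sigma_inv):
    diffs = X[:, None, :] - X[None, :, :]     # pairwise x - y, shape (k, k, d)
    quad = np.einsum('ijd,de,ije->ij', diffs, Sigma_inv, diffs)
    return np.exp(-quad)

def unnormalized_density(X, Sigma_inv):
    return np.linalg.det(gaussian_kernel_matrix(X, Sigma_inv))

Sigma_inv = np.eye(2)
# Spread-out points get higher density than clumped ones: diversity.
spread = unnormalized_density(np.array([[0.0, 0.0], [3.0, 0.0]]), Sigma_inv)
clumped = unnormalized_density(np.array([[0.0, 0.0], [0.1, 0.0]]), Sigma_inv)
```

Two coincident points make the kernel matrix singular, so the density vanishes: the model never repeats a point.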

Random Scan Gibbs Sampler for π’Œ-DPP

1. With probability 1/2, stay at the current state 𝑆 = {π‘₯_1, … , π‘₯_π‘˜}.

2. Otherwise, choose π‘₯_𝑖 ∈ 𝑆 u.a.r.

3. Choose 𝑦 βˆ‰ 𝑆 from the conditional distribution πœ‹(Β· | 𝑆 βˆ’ π‘₯_𝑖 is chosen).

Continuous case: PDF(𝑦) ∝ πœ‹(π‘₯_1, … , π‘₯_(π‘–βˆ’1), 𝑦, π‘₯_(𝑖+1), … , π‘₯_π‘˜). In the discrete case the state space is the set of size-π‘˜ subsets of [𝑁].

Main Result

Given a π‘˜-DPP πœ‹, an β€œapproximate” sample from πœ‹ can be generated by running the Gibbs sampler for 𝝉 = Õ(π’Œβ΄ Β· log(Var_𝝅(𝒑_𝝁 / 𝒑_𝝅))) steps, where πœ‡ is the starting distribution.


Discrete: A simple greedy initialization gives 𝜏 = 𝑂(π‘˜β΅ log π‘˜). The total running time is 𝑂(𝑁 Β· poly(π‘˜)).

This does not improve upon previous MCMC methods [Anari-Oveis Gharan-Rezaei'16].

The mixing time is independent of 𝑁, so the running time in distributed settings is sublinear.
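The greedy initialization for the discrete case can be sketched as growing the starting set one element at a time, always adding the item that maximizes the resulting determinant. This is an illustrative version; the paper's initialization may differ in details:

```python
# Greedy initialization for a discrete k-DPP: grow S one element at a time,
# at each step adding the item j that maximizes det(L_{S + j}).
import numpy as np

def greedy_init(L, k):
    S = []
    for _ in range(k):
        remaining = [j for j in range(L.shape[0]) if j not in S]
        best = max(remaining,
                   key=lambda j: np.linalg.det(L[np.ix_(S + [j], S + [j])]))
        S.append(best)
    return S

rng = np.random.default_rng(2)
A = rng.normal(size=(6, 6))
L = A @ A.T
S0 = greedy_init(L, 3)           # warm start for the Gibbs chain
```

The point of a warm start is to make the log(Var) term in the mixing-time bound small, yielding the stated 𝜏 = 𝑂(π‘˜β΅ log π‘˜).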


Continuous: Given access to conditional oracles, πœ‡ can be found so that 𝜏 = 𝑂(π‘˜β΅ log π‘˜).

This is the first algorithm with a theoretical guarantee for sampling from continuous π‘˜-DPPs.

The remaining challenge is being able to run the chain, i.e., to implement the conditional oracles.


Using a rejection sampler as the conditional oracle for Gaussian kernels 𝐿(π‘₯, 𝑦) = exp(βˆ’β€–π‘₯ βˆ’ 𝑦‖²/𝜎²) defined on a unit sphere in ℝ^𝑑, the total running time is:

β€’ If π‘˜ = poly(𝑑): poly(𝑑, 𝜎)

β€’ If π‘˜ ≀ 𝑒^(𝑑^(1βˆ’π›Ώ)) and 𝜎 = 𝑂(1): poly(𝑑) Β· π‘˜^(𝑂(1/𝛿))

