Upload
others
View
11
Download
1
Embed Size (px)
Citation preview
Bayesian Quadrature for Multiple Related Integrals
Francois-Xavier Briol
University of Warwick (Department of Statistics)Imperial College London (Department of Mathematics)
University of SheffieldMachine Learning Seminar
arXiv:1801.04153
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 1 / 32
Collaborators
Xiaoyue Xi Mark Girolami Chris Oates(Warwick) (Imperial & ATI) (Newcastle & ATI)
Michael Osborne Dino Sejdinovic
(Oxford) (Oxford & ATI)F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 2 / 32
Bayesian Numerical Methods
Bayesian Numerical Methods
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 3 / 32
Bayesian Numerical Methods
Bayesian Numerical Methods
Standard Numerical Analysis: Study of how to best projectcontinuous mathematical problems into discrete scales. (also thoughtof as the study of numerical errors).
Statistics: Infer some quantity of interest (usually the parameter of amodel) from data samples. Sounds familiar?
Bayesian Numerical Methods: Perform Bayesian statisticalinference on the solution of numerical problems.
[1] Larkin, F. M. (1972). Gaussian measure in Hilbert space and applications in numericalanalysis. Rocky Mountain Journal of Mathematics, 2(3), 379-422.[2] Diaconis, P. (1988). Bayesian Numerical Analysis. Statistical Decision Theory andRelated Topics IV, 163-175.[3] O’Hagan, A. (1992). Some Bayesian numerical analysis. Bayesian Statistics, 4,345-363.[4] Hennig, P., Osborne, M. A., & Girolami, M. (2015). Probabilistic Numerics andUncertainty in Computations. J. Roy. Soc. A, 471(2179).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 4 / 32
Bayesian Numerical Methods
Bayesian Numerical Methods
Standard Numerical Analysis: Study of how to best projectcontinuous mathematical problems into discrete scales. (also thoughtof as the study of numerical errors).
Statistics: Infer some quantity of interest (usually the parameter of amodel) from data samples. Sounds familiar?
Bayesian Numerical Methods: Perform Bayesian statisticalinference on the solution of numerical problems.
[1] Larkin, F. M. (1972). Gaussian measure in Hilbert space and applications in numericalanalysis. Rocky Mountain Journal of Mathematics, 2(3), 379-422.[2] Diaconis, P. (1988). Bayesian Numerical Analysis. Statistical Decision Theory andRelated Topics IV, 163-175.[3] O’Hagan, A. (1992). Some Bayesian numerical analysis. Bayesian Statistics, 4,345-363.[4] Hennig, P., Osborne, M. A., & Girolami, M. (2015). Probabilistic Numerics andUncertainty in Computations. J. Roy. Soc. A, 471(2179).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 4 / 32
Bayesian Numerical Methods
Bayesian Numerical Methods
Standard Numerical Analysis: Study of how to best projectcontinuous mathematical problems into discrete scales. (also thoughtof as the study of numerical errors).
Statistics: Infer some quantity of interest (usually the parameter of amodel) from data samples. Sounds familiar?
Bayesian Numerical Methods: Perform Bayesian statisticalinference on the solution of numerical problems.
[1] Larkin, F. M. (1972). Gaussian measure in Hilbert space and applications in numericalanalysis. Rocky Mountain Journal of Mathematics, 2(3), 379-422.[2] Diaconis, P. (1988). Bayesian Numerical Analysis. Statistical Decision Theory andRelated Topics IV, 163-175.[3] O’Hagan, A. (1992). Some Bayesian numerical analysis. Bayesian Statistics, 4,345-363.[4] Hennig, P., Osborne, M. A., & Girolami, M. (2015). Probabilistic Numerics andUncertainty in Computations. J. Roy. Soc. A, 471(2179).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 4 / 32
Bayesian Numerical Methods
Bayesian Numerical Methods
Standard Numerical Analysis: Study of how to best projectcontinuous mathematical problems into discrete scales. (also thoughtof as the study of numerical errors).
Statistics: Infer some quantity of interest (usually the parameter of amodel) from data samples. Sounds familiar?
Bayesian Numerical Methods: Perform Bayesian statisticalinference on the solution of numerical problems.
[1] Larkin, F. M. (1972). Gaussian measure in Hilbert space and applications in numericalanalysis. Rocky Mountain Journal of Mathematics, 2(3), 379-422.[2] Diaconis, P. (1988). Bayesian Numerical Analysis. Statistical Decision Theory andRelated Topics IV, 163-175.[3] O’Hagan, A. (1992). Some Bayesian numerical analysis. Bayesian Statistics, 4,345-363.[4] Hennig, P., Osborne, M. A., & Girolami, M. (2015). Probabilistic Numerics andUncertainty in Computations. J. Roy. Soc. A, 471(2179).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 4 / 32
Bayesian Numerical Methods
What is the Point?
Quantification of the epistemic uncertainty associated with thenumerical problem using probability measures, rather thanworst-case bounds (not always representative of the actual error).
Propagation of uncertainty through pipelines.
Bayesian Numerical Methods can be framed as Bayesian InverseProblems for the solution of the numerical problem.
[1] Cockayne, J., Oates, C., Sullivan, T., & Girolami, M. (2017). Bayesian ProbabilisticNumerical Methods. arXiv:1701.04006.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 5 / 32
Bayesian Numerical Methods
What is the Point?
Quantification of the epistemic uncertainty associated with thenumerical problem using probability measures, rather thanworst-case bounds (not always representative of the actual error).
Propagation of uncertainty through pipelines.
Bayesian Numerical Methods can be framed as Bayesian InverseProblems for the solution of the numerical problem.
[1] Cockayne, J., Oates, C., Sullivan, T., & Girolami, M. (2017). Bayesian ProbabilisticNumerical Methods. arXiv:1701.04006.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 5 / 32
Bayesian Numerical Methods
What is the Point?
Quantification of the epistemic uncertainty associated with thenumerical problem using probability measures, rather thanworst-case bounds (not always representative of the actual error).
Propagation of uncertainty through pipelines.
Bayesian Numerical Methods can be framed as Bayesian InverseProblems for the solution of the numerical problem.
[1] Cockayne, J., Oates, C., Sullivan, T., & Girolami, M. (2017). Bayesian ProbabilisticNumerical Methods. arXiv:1701.04006.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 5 / 32
Bayesian Numerical Integration
Bayesian Numerical Integration
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 6 / 32
Bayesian Numerical Integration
The Problem
Let’s come back to our problem of computing integrals! Consider afunction f : X → R (X ⊆ Rp) assumed to be square-integrable and aprobability measure Π.
Π[f ] =
∫Xf (x)dΠ(x) ≈
n∑i=1
wi f (xi ) = Π[f ]
where {xi}ni=1 ∈ X & {wi}ni=1 ∈ R.
Examples include:
1 Monte Carlo (MC): Sample {xi}ni=1 ∼ Π and let wi = 1/n ∀i .2 Markov Chain Monte Carlo (MCMC): Sample states {xi}ni=1 from a
Markov Chain with invariant distribution Π and let wi = 1/n ∀i .3 Gaussian quadrature, importance sampling, QMC, SMC, etc...
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 7 / 32
Bayesian Numerical Integration
The Problem
Let’s come back to our problem of computing integrals! Consider afunction f : X → R (X ⊆ Rp) assumed to be square-integrable and aprobability measure Π.
Π[f ] =
∫Xf (x)dΠ(x) ≈
n∑i=1
wi f (xi ) = Π[f ]
where {xi}ni=1 ∈ X & {wi}ni=1 ∈ R.
Examples include:
1 Monte Carlo (MC): Sample {xi}ni=1 ∼ Π and let wi = 1/n ∀i .2 Markov Chain Monte Carlo (MCMC): Sample states {xi}ni=1 from a
Markov Chain with invariant distribution Π and let wi = 1/n ∀i .3 Gaussian quadrature, importance sampling, QMC, SMC, etc...
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 7 / 32
Bayesian Numerical Integration
Sketch of Bayesian Quadrature
n=0 n=3 n=8
x
Inte
gran
d
Solution of the integral
Pos
terio
r di
strib
utio
n
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 8 / 32
Bayesian Numerical Integration
Bayesian Quadrature
1 Place a Gaussian Process prior (assumed w.l.o.g. to have zero mean).2 Evaluate the integrand f at several locations {xi}ni=1 on X . We get a
Gaussian Process with mean and covariance function:
mn(x) = k(x ,X )k(X ,X )−1f (X )
kn(x , x ′) = k(x , x ′)− k(x ,X )k(X ,X )−1k(X , x ′)
3 Taking the pushforward through the integral operator, we get:
En[Π[f ]] = ΠBQ[f ] := Π[k(·,X )]k(X ,X )−1f (X )
Vn[Π[f ]] = ΠΠ[k]− Π[k(·,X )]k(X ,X )−1Π[k(X , ·)].
[1] Larkin, F. M. (1972). Gaussian measure in Hilbert space and applications in numericalanalysis. Rocky Mountain Journal of Mathematics, 2(3), 379422.[2] Diaconis, P. (1988). Bayesian Numerical Analysis. Statistical Decision Theory and RelatedTopics IV, 163175.[3] OHagan, A. (1991). Bayes-Hermite quadrature. Journal of Statistical Planning andInference, 29, 245-260.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 9 / 32
Bayesian Numerical Integration
Bayesian Quadrature
1 Place a Gaussian Process prior (assumed w.l.o.g. to have zero mean).2 Evaluate the integrand f at several locations {xi}ni=1 on X . We get a
Gaussian Process with mean and covariance function:
mn(x) = k(x ,X )k(X ,X )−1f (X )
kn(x , x ′) = k(x , x ′)− k(x ,X )k(X ,X )−1k(X , x ′)
3 Taking the pushforward through the integral operator, we get:
En[Π[f ]] = ΠBQ[f ] := Π[k(·,X )]k(X ,X )−1f (X )
Vn[Π[f ]] = ΠΠ[k]− Π[k(·,X )]k(X ,X )−1Π[k(X , ·)].
[1] Larkin, F. M. (1972). Gaussian measure in Hilbert space and applications in numericalanalysis. Rocky Mountain Journal of Mathematics, 2(3), 379422.[2] Diaconis, P. (1988). Bayesian Numerical Analysis. Statistical Decision Theory and RelatedTopics IV, 163175.[3] OHagan, A. (1991). Bayes-Hermite quadrature. Journal of Statistical Planning andInference, 29, 245-260.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 9 / 32
Bayesian Numerical Integration
Bayesian Quadrature
1 Place a Gaussian Process prior (assumed w.l.o.g. to have zero mean).2 Evaluate the integrand f at several locations {xi}ni=1 on X . We get a
Gaussian Process with mean and covariance function:
mn(x) = k(x ,X )k(X ,X )−1f (X )
kn(x , x ′) = k(x , x ′)− k(x ,X )k(X ,X )−1k(X , x ′)
3 Taking the pushforward through the integral operator, we get:
En[Π[f ]] = ΠBQ[f ] := Π[k(·,X )]k(X ,X )−1f (X )
Vn[Π[f ]] = ΠΠ[k]− Π[k(·,X )]k(X ,X )−1Π[k(X , ·)].
[1] Larkin, F. M. (1972). Gaussian measure in Hilbert space and applications in numericalanalysis. Rocky Mountain Journal of Mathematics, 2(3), 379422.[2] Diaconis, P. (1988). Bayesian Numerical Analysis. Statistical Decision Theory and RelatedTopics IV, 163175.[3] OHagan, A. (1991). Bayes-Hermite quadrature. Journal of Statistical Planning andInference, 29, 245-260.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 9 / 32
Bayesian Numerical Integration
Important Points
The probability measure represents our uncertainty about the value ofΠ[f ] due to the fact that we cant evaluate the integrand f everywhere.
We have chosen to model f (and hence Π[f ]) using a Gaussianmeasure. This is mostly for computational tractability, but isn’tnecessarily the right thing to do!
Notice that the formulae below do not specify where to evaluate f ,but provide a posterior given the points at which we have evaluated.
En[Π[f ]] = ΠBQ[f ] := Π[k(·,X )]k(X ,X )−1f (X )
Vn[Π[f ]] = ΠΠ[k]− Π[k(·,X )]k(X ,X )−1Π[k(X , ·)].
We have the freedom to combine these with MC, MCMC, QMC, etc...
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 10 / 32
Bayesian Numerical Integration
Important Points
The probability measure represents our uncertainty about the value ofΠ[f ] due to the fact that we cant evaluate the integrand f everywhere.
We have chosen to model f (and hence Π[f ]) using a Gaussianmeasure. This is mostly for computational tractability, but isn’tnecessarily the right thing to do!
Notice that the formulae below do not specify where to evaluate f ,but provide a posterior given the points at which we have evaluated.
En[Π[f ]] = ΠBQ[f ] := Π[k(·,X )]k(X ,X )−1f (X )
Vn[Π[f ]] = ΠΠ[k]− Π[k(·,X )]k(X ,X )−1Π[k(X , ·)].
We have the freedom to combine these with MC, MCMC, QMC, etc...
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 10 / 32
Bayesian Numerical Integration
Important Points
The probability measure represents our uncertainty about the value ofΠ[f ] due to the fact that we cant evaluate the integrand f everywhere.
We have chosen to model f (and hence Π[f ]) using a Gaussianmeasure. This is mostly for computational tractability, but isn’tnecessarily the right thing to do!
Notice that the formulae below do not specify where to evaluate f ,but provide a posterior given the points at which we have evaluated.
En[Π[f ]] = ΠBQ[f ] := Π[k(·,X )]k(X ,X )−1f (X )
Vn[Π[f ]] = ΠΠ[k]− Π[k(·,X )]k(X ,X )−1Π[k(X , ·)].
We have the freedom to combine these with MC, MCMC, QMC, etc...
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 10 / 32
Bayesian Numerical Integration
Toy example
Toy problem: f (x) = sin(x) + 1 & Π is N (0, 1). We use the kernelk(x , y) = exp(−(x − y)2/l2) (with l = 1) and sample {xi}ni=1 ∼ Π.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 11 / 32
Bayesian Numerical Integration
Toy example: The effect of sampling
Toy problem: f (x) = sin(x) + 1 & Π is N (0, 1). We use the kernelk(x , y) = exp(−(x − y)2/l2) (with l = 1) and sample {xi}ni=1 ∼ N (0, σ2).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 12 / 32
Bayesian Numerical Integration
Toy example: The effect of sampling and kernel choice
Toy problem: f (x) = sin(x) + 1 & Π is N (0, 12). We use the kernelk(x , y) = exp(−(x − y)2/l2) and sample from {xi}ni=1 ∼ N (0, σ2).
Number of samples: 25
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 13 / 32
Bayesian Numerical Integration
Toy example: The effect of sampling and kernel choice
Toy problem: f (x) = sin(x) + 1 & Π is N (0, 12). We use the kernelk(x , y) = exp(−(x − y)2/l2) and sample from {xi}ni=1 ∼ N (0, σ2).
Number of samples: 50
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 14 / 32
Bayesian Numerical Integration
Toy example: The effect of sampling and kernel choice
Toy problem: f (x) = sin(x) + 1 & Π is N (0, 12). We use the kernelk(x , y) = exp(−(x − y)2/l2) and sample from {xi}ni=1 ∼ N (0, σ2).
Number of samples: 75
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 15 / 32
Bayesian Numerical Integration
Calibration
Clearly, from the previous slide we can tell that hyperparameters will be ofgreat importance. To do this we have several alternatives:
Go full Bayesian and put a prior on the parameter, then look at theposterior predictive. Problem: We lose the conjugacy property andthis becomes very expensive to do!
Do some sort of cross validation. Problem: This is still expensive asrequires solving many linear systems.
Do some empirical Bayes, i.e. maximise the marginal likelihood of thedata. Problem: Not very Bayesian.
In practice, we tend to use empirical Bayes for speed reason...
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 16 / 32
Bayesian Numerical Integration
Calibration
Clearly, from the previous slide we can tell that hyperparameters will be ofgreat importance. To do this we have several alternatives:
Go full Bayesian and put a prior on the parameter, then look at theposterior predictive. Problem: We lose the conjugacy property andthis becomes very expensive to do!
Do some sort of cross validation. Problem: This is still expensive asrequires solving many linear systems.
Do some empirical Bayes, i.e. maximise the marginal likelihood of thedata. Problem: Not very Bayesian.
In practice, we tend to use empirical Bayes for speed reason...
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 16 / 32
Bayesian Numerical Integration
Calibration
Clearly, from the previous slide we can tell that hyperparameters will be ofgreat importance. To do this we have several alternatives:
Go full Bayesian and put a prior on the parameter, then look at theposterior predictive. Problem: We lose the conjugacy property andthis becomes very expensive to do!
Do some sort of cross validation. Problem: This is still expensive asrequires solving many linear systems.
Do some empirical Bayes, i.e. maximise the marginal likelihood of thedata. Problem: Not very Bayesian.
In practice, we tend to use empirical Bayes for speed reason...
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 16 / 32
Bayesian Numerical Integration
Calibration
Clearly, from the previous slide we can tell that hyperparameters will be ofgreat importance. To do this we have several alternatives:
Go full Bayesian and put a prior on the parameter, then look at theposterior predictive. Problem: We lose the conjugacy property andthis becomes very expensive to do!
Do some sort of cross validation. Problem: This is still expensive asrequires solving many linear systems.
Do some empirical Bayes, i.e. maximise the marginal likelihood of thedata. Problem: Not very Bayesian.
In practice, we tend to use empirical Bayes for speed reason...
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 16 / 32
Bayesian Numerical Integration
Convergence Results
So why does it converge fast?
For integration in an RKHS H (associated to the GP kernel k), thestandard quantity to consider is the worst-case error:
e(Π; Π,H
):= sup
‖f ‖H≤1
∣∣∣Π[f ]− Π[f ]∣∣∣
One can show that BQ attains optimal rates of convergence forcertain classes of smooth functions. In particular, with i.i.d points,one can get O(n−
αd
+ε) for spaces of smoothness α and
O(exp(−Cn1d−ε)) for infinitely smooth functions.
Briol, F.-X., Oates, C. J., Girolami, M., Osborne, M. A., & Sejdinovic, D. (2015). ProbabilisticIntegration: A Role for Statisticians in Numerical Analysis?, arXiv:1512.00933.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 17 / 32
Bayesian Numerical Integration
Application: Global Illumination
object
𝝎𝒊
𝝎𝒐
𝑆2
environment mapcamera
𝒏
𝐿𝑒(𝝎𝑜)
𝐿𝑖(𝝎𝑜)
We need to compute three integrals at each pixel:
Lo(ωo) = Le(wo) +
∫S2
Li (ωi )ρ(ωi , ωo)[ωi · n]+dσ(ωi )
[1] Marques, R., Bouville, C., Ribardiere, M., Santos, P., & Bouatouch, K. (2013). A sphericalGaussian framework for Bayesian Monte Carlo rendering of glossy surfaces. IEEE Transactionson Visualization and Computer Graphics, 19(10), 1619-1632.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 18 / 32
Bayesian Numerical Integration
Application: Global Illumination
Number of States (n)
102 103
0.00001
0.001
1MMD
MC BMC QMC BQMC
Inte
gra
l E
stim
ate
0.014
0.016
0.018
0.02
0.022Red Channel
Number of States (n)
102 103
Inte
gra
l E
stim
ate
0.016
0.017
0.018
0.019
0.02
×10-3
-4
-2
0
2
4Green Channel
Number of States (n)
102 103
×10-3
-2
-1
0
1
2
0.012
0.014
0.016
0.018
0.02Blue Channel
Number of States (n)
102 1030.014
0.015
0.016
0.017
0.018
The kernel used gives an RKHS norm-equivalent to a Sobolev spaceof smoothness 3
2 :
k(x , x ′) =8
3− ‖x − x ′‖2 for all x , x ′ ∈ S2.
We can show a convergence rate of e(ΠBMC; Π,H) = OP(n−34 )
which is optimal for this space!
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 19 / 32
Bayesian Numerical Integration
Application: Global Illumination
Number of States (n)
102 10310-6
10-4
10-2
100MMD
MC
BMC
QMC
BQMC
Spreading the points and re-weighting can help significantly!
[1] Briol, F.-X., Oates, C. J., Girolami, M., & Osborne, M. A. (2015). Frank-Wolfe BayesianQuadrature: Probabilistic Integration with Theoretical Guarantees. In Advances In NeuralInformation Processing Systems 28 (pp. 1162–1170).[2] Briol, F.-X., Oates, C. J., Cockayne, J., Chen, W. Y., & Girolami, M. (2017). On theSampling Problem for Kernel Quadrature. In Proceedings of the 34th International Conferenceon Machine Learning (pp. 586–595).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 20 / 32
Bayesian Quadrature for Multiple Integrals
Bayesian Quadrature for Multiple Integrals
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 21 / 32
Bayesian Quadrature for Multiple Integrals
What About Multiple Integrals?
In the example, we actually need to approximate thousands ofintegrals for each frame of a virtual environment... This is slow andexpensive!
We can formalise the process above as that of finding the integral ofa set of functions f1, . . . , fD against some measure Π. But what if weknow something about how f1 relates to f2, etc...?
It might make more sense to approximate the integral with aquadrature rule of the form:
Π[fd ] =D∑
d ′=1
n∑i=1
(Wi )dd ′fd ′(xd ′i )
where the weights would encode the correlation across functions.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 22 / 32
Bayesian Quadrature for Multiple Integrals
What About Multiple Integrals?
In the example, we actually need to approximate thousands ofintegrals for each frame of a virtual environment... This is slow andexpensive!
We can formalise the process above as that of finding the integral ofa set of functions f1, . . . , fD against some measure Π. But what if weknow something about how f1 relates to f2, etc...?
It might make more sense to approximate the integral with aquadrature rule of the form:
Π[fd ] =D∑
d ′=1
n∑i=1
(Wi )dd ′fd ′(xd ′i )
where the weights would encode the correlation across functions.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 22 / 32
Bayesian Quadrature for Multiple Integrals
What About Multiple Integrals?
In the example, we actually need to approximate thousands ofintegrals for each frame of a virtual environment... This is slow andexpensive!
We can formalise the process above as that of finding the integral ofa set of functions f1, . . . , fD against some measure Π. But what if weknow something about how f1 relates to f2, etc...?
It might make more sense to approximate the integral with aquadrature rule of the form:
Π[fd ] =D∑
d ′=1
n∑i=1
(Wi )dd ′fd ′(xd ′i )
where the weights would encode the correlation across functions.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 22 / 32
Bayesian Quadrature for Multiple Integrals
Bayesian Quadrature for Multiple Related Functions
We can use the same type of results for Gaussian Processes on theextended space of vector-valued functions f : X → RD (rather thanf : X → R) where f (x) = (f1(x), . . . , fD(x)).
This approach allow us to directly encode the relation between eachfunction fi by specifying the kernel K .
In this case the posterior distribution is a GP(mn,Kn) withvector-valued mean mn : X → RD and matrix-valued covarianceKn : X × X → RD×D :
mn(x) = K (x ,X )K (X ,X )−1f (X )
Kn(x , x ′) = K (x , x ′)−K (x ,X )K (X ,X )−1K (X , x ′).
The overall cost for computing this is O(n3D3).
Alvarez, M. A., Rosasco, L., & Lawrence, N. D. (2012). Kernels for vector-valuedfunctions: A review. Foundations and Trends in Machine Learning, 4(3), 195-266.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 23 / 32
Bayesian Quadrature for Multiple Integrals
Bayesian Quadrature for Multiple Related Functions
We can use the same type of results for Gaussian Processes on theextended space of vector-valued functions f : X → RD (rather thanf : X → R) where f (x) = (f1(x), . . . , fD(x)).
This approach allow us to directly encode the relation between eachfunction fi by specifying the kernel K .
In this case the posterior distribution is a GP(mn,Kn) withvector-valued mean mn : X → RD and matrix-valued covarianceKn : X × X → RD×D :
mn(x) = K (x ,X )K (X ,X )−1f (X )
Kn(x , x ′) = K (x , x ′)−K (x ,X )K (X ,X )−1K (X , x ′).
The overall cost for computing this is O(n3D3).
Alvarez, M. A., Rosasco, L., & Lawrence, N. D. (2012). Kernels for vector-valuedfunctions: A review. Foundations and Trends in Machine Learning, 4(3), 195-266.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 23 / 32
Bayesian Quadrature for Multiple Integrals
Consider multi-output Bayesian Quadrature with a GP(0,K ) prior onf = (f1, . . . , fD)>. The posterior distribution on Π[f ] is aD-dimensional Gaussian with mean and covariance matrix:
EN [Π[f ]] = Π[K (·,X )]K (X ,X )−1f (X )
VN [Π[f ]] = ΠΠ [K ]− Π[K (·,X )]K (X ,X )−1Π[K (X , ·)]
Kernel evaluations are now matrix-valued (i.e. in RD×D) as opposedto scalar-valued. A simple example is the following separable kernel:
K (x , x ′) = Bk(x , x ′)
B encodes the covariance between function, and k the type offunction in each of the components.
In this case, we can reduce the cost to O(n3 + D3).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 24 / 32
Bayesian Quadrature for Multiple Integrals
Consider multi-output Bayesian Quadrature with a GP(0,K ) prior onf = (f1, . . . , fD)>. The posterior distribution on Π[f ] is aD-dimensional Gaussian with mean and covariance matrix:
EN [Π[f ]] = Π[K (·,X )]K (X ,X )−1f (X )
VN [Π[f ]] = ΠΠ [K ]− Π[K (·,X )]K (X ,X )−1Π[K (X , ·)]
Kernel evaluations are now matrix-valued (i.e. in RD×D) as opposedto scalar-valued. A simple example is the following separable kernel:
K (x , x ′) = Bk(x , x ′)
B encodes the covariance between function, and k the type offunction in each of the components.
In this case, we can reduce the cost to O(n3 + D3).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 24 / 32
Bayesian Quadrature for Multiple Integrals
Some Theory for Multi-output BQ
Define the individual worst-case errors:
e(Π; Π,HK , d) = sup‖f ‖K≤1
∣∣∣Π[fd ]− Π[fd ]∣∣∣
Theorem (Convergence rate for BQ with separable kernel)
Suppose we want to approximate Π[f ] for some f : X → RD and ΠBQ[f ]is the multi-output BQ rule with the kernel K (x , x ′) = Bk(x , x ′) for somepositive definite B ∈ RD×D and scalar-valued kernel k : X ×X → R whereall functions are evaluated on a point set {xi}ni=1.Then, ∀d = 1, . . . ,D, we have:
e(ΠBQ; Π,HK , d) = O(e(ΠBQ; Π,Hk)
)F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 25 / 32
Bayesian Quadrature for Multiple Integrals
Some Theory for Multi-output BQ
Assume that X ⊂ Rp is a domain with Lipschitz boundary and satisfyingan interior cone condition. Furthermore, assume the {xi}ni=1 are either:(A1) IID samples from some distribution Π′ with density π′ > 0 on X , or(A2) obtained from a quasi-uniform grid on X .
If Hk is norm-equivalent to an RKHS with Matern kernel ofsmoothness α > p
2 , we have ∀d = 1, . . . ,D:
e(HK , ΠBQ,X , d) = O(n−
αp
+ε).
for ε > 0 arbitrarily small.
If Hk is norm-equivalent to the RKHS with squared-exponential,multiquadric or inverse multiquadric kernel, we have ∀d = 1, . . . ,D:
e(HK , ΠBQ,X , d) = O(
exp(−C1n
1p−ε))
.
for some C1 > 0 and ε > 0 arbitrarily small.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 26 / 32
Bayesian Quadrature for Multiple Integrals
Some Theory for Multi-output BQ
Assume that X ⊂ Rp is a domain with Lipschitz boundary and satisfyingan interior cone condition. Furthermore, assume the {xi}ni=1 are either:(A1) IID samples from some distribution Π′ with density π′ > 0 on X , or(A2) obtained from a quasi-uniform grid on X .
If Hk is norm-equivalent to an RKHS with Matern kernel ofsmoothness α > p
2 , we have ∀d = 1, . . . ,D:
e(HK , ΠBQ,X , d) = O(n−
αp
+ε).
for ε > 0 arbitrarily small.
If Hk is norm-equivalent to the RKHS with squared-exponential,multiquadric or inverse multiquadric kernel, we have ∀d = 1, . . . ,D:
e(HK , ΠBQ,X , d) = O(
exp(−C1n
1p−ε))
.
for some C1 > 0 and ε > 0 arbitrarily small.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 26 / 32
Bayesian Quadrature for Multiple Integrals
Some Theory for Multi-output BQ
Assume that X ⊂ Rp is a domain with Lipschitz boundary and satisfyingan interior cone condition. Furthermore, assume the {xi}ni=1 are either:(A1) IID samples from some distribution Π′ with density π′ > 0 on X , or(A2) obtained from a quasi-uniform grid on X .
If Hk is norm-equivalent to an RKHS with Matern kernel ofsmoothness α > p
2 , we have ∀d = 1, . . . ,D:
e(HK , ΠBQ,X , d) = O(n−
αp
+ε).
for ε > 0 arbitrarily small.
If Hk is norm-equivalent to the RKHS with squared-exponential,multiquadric or inverse multiquadric kernel, we have ∀d = 1, . . . ,D:
e(HK , ΠBQ,X , d) = O(
exp(−C1n
1p−ε))
.
for some C1 > 0 and ε > 0 arbitrarily small.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 26 / 32
Bayesian Quadrature for Multiple Integrals
Theory in the Misspecified Case
Theorem (Misspecified Convergence Result for Separable Kernel)
Let kα be a kernel norm-equivalent to a Matern kernel on some domain Xwith Lipschitz boundary and interior cone condition and consider the BQrule ΠBQ[f ] corresponding to a separable kernel Kα(x , x ′) = Bkα(x , x ′).
Suppose {xi}ni=1 satisfies (A2), and f ∈ HCβwhere p
2 ≤ β ≤ α.Then, ∀d = 1, . . . ,D:∣∣∣Π[fd ]− ΠBQ[fd ]
∣∣∣ = O(n−
βp
+ε)
for some ε > 0.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 27 / 32
Bayesian Quadrature for Multiple Integrals
Multi-output BQ for the Computer Graphics Example
object
𝝎𝒊
𝝎𝒐
𝑆2
environment mapcamera
𝒏
𝐿𝑒(𝝎𝑜)
𝐿𝑖(𝝎𝑜)
We compute integrals for different integrands based on various anglesω0 (akin to a camera moving).
We pick a separable kernel K (x , x ′) = Bk(x , x ′) where B is chosen torepresent the angle between integrands and k(x , x ′) = 8
3 −‖x −x ′‖2.
We can prove that the worst-case integration error converges at a
rate O(n−34 ) for each integrand. This is the same rate as uni-output
BQ (but we usually improve on constants).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 28 / 32
Bayesian Quadrature for Multiple Integrals
Multi-output BQ for the Computer Graphics Example
object
𝝎𝒊
𝝎𝒐
𝑆2
environment mapcamera
𝒏
𝐿𝑒(𝝎𝑜)
𝐿𝑖(𝝎𝑜)
We compute integrals for different integrands based on various anglesω0 (akin to a camera moving).
We pick a separable kernel K (x , x ′) = Bk(x , x ′) where B is chosen torepresent the angle between integrands and k(x , x ′) = 8
3 −‖x −x ′‖2.
We can prove that the worst-case integration error converges at a
rate O(n−34 ) for each integrand. This is the same rate as uni-output
BQ (but we usually improve on constants).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 28 / 32
Bayesian Quadrature for Multiple Integrals
Multi-output BQ for the Computer Graphics Example
object
𝝎𝒊
𝝎𝒐
𝑆2
environment mapcamera
𝒏
𝐿𝑒(𝝎𝑜)
𝐿𝑖(𝝎𝑜)
We compute integrals for different integrands based on various anglesω0 (akin to a camera moving).
We pick a separable kernel K (x , x ′) = Bk(x , x ′) where B is chosen torepresent the angle between integrands and k(x , x ′) = 8
3 −‖x −x ′‖2.
We can prove that the worst-case integration error converges at a
rate O(n−34 ) for each integrand. This is the same rate as uni-output
BQ (but we usually improve on constants).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 28 / 32
Bayesian Quadrature for Multiple Integrals
Multi-output BQ for the Computer Graphics Example
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 29 / 32
Bayesian Quadrature for Multiple Integrals
Application: Multifidelity Modelling
In each case, we have access to a cheap simulator/function (left) and anexpensive simulator/function (right).
blue = truth, red = uni-output, yellow & purple = two outputs[1] Perdikaris, P., Raissi, M., Damianou, A., Lawrence, N. D., & Karniadakis, G. E. (2016).Nonlinear information fusion algorithms for robust multi-fidelity modeling. Proceedings of theRoyal Society A: Mathematical, Physical, and Engineering Sciences, 473(2198).
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 30 / 32
Bayesian Quadrature for Multiple Integrals
Application: Multifidelity Modelling
In multifidelity modelling, multi-output Gaussian processes are alreadybeing used as efficient surrogate models.
Furthermore, we are in general interested in some statistic of theexpensive surrogate model. We therefore might as well re-use thismulti-output GP approximate the integral.
Results:
Model 1-output BQ 2-output BQ
Step function (l) 0.024 (0.223) 0.021 (0.213)Step function (h) 0.405 (0.03) 0.09 (0.091)
Forrester function (l) 0.076 (4.913) 0.076 (4.951)Forrester function (h) 3.962 (3.984) 2.856 (27.01)
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 31 / 32
Bayesian Quadrature for Multiple Integrals
References
[1] Larkin, F. M. (1972). Gaussian measure in Hilbert space and applications in numericalanalysis. Rocky Mountain Journal of Mathematics, 2(3), 379422.
[2] Diaconis, P. (1988). Bayesian Numerical Analysis. Statistical Decision Theory and RelatedTopics IV, 163175.
[3] O’Hagan, A. (1991). Bayes-Hermite quadrature. Journal of Statistical Planning andInference, 29, 245260.
[4] Rasmussen, C., & Ghahramani, Z. (2002). Bayesian Monte Carlo. In Advances in NeuralInformation Processing Systems (pp. 489496).
[5] Briol, F.-X., Oates, C. J., Girolami, M., Osborne, M. A., & Sejdinovic, D. (2015).Probabilistic Integration: A Role for Statisticians in Numerical Analysis?, arXiv:1512.00933.
[6] Xi, X., Briol, F.-X., & Girolami, M. (2018). Bayesian Quadrature for Multiple RelatedIntegrals. arXiv:1801.04153.
F-X Briol (Warwick & Imperial) BQ for Multiple Related Integrals May 2018 32 / 32