arXiv:1501.03379v7 [stat.CO] 1 Apr 2016


Control Functionals for Quasi-Monte Carlo Integration

Chris J. Oates (University of Technology Sydney)
Mark Girolami (University of Warwick and The Alan Turing Institute for Data Science)

Abstract

Quasi-Monte Carlo (QMC) methods are being adopted in statistical applications due to the increasingly challenging nature of numerical integrals that are now routinely encountered. For integrands of dimension d with derivatives of order α, an optimal QMC rule converges at a best-possible rate O(N^{−α/d}). However, in applications the value of α can be unknown and/or a rate-optimal QMC rule can be unavailable. Standard practice is to employ α_L-optimal QMC, where the lower bound α_L ≤ α is known, but in general this does not exploit the full power of QMC. One solution is to trade off numerical integration against functional approximation. This strategy is explored herein and shown to be well suited to modern statistical computation. A challenging application to robotic-arm data demonstrates a substantial variance reduction in predictions for mechanical torques.

1 Introduction

Consider a Lebesgue-integrable test function f : X → R defined on a bounded measurable subspace X ⊆ R^d (d ∈ N) with square-integrable derivatives of order α > 0 in each variable. Our focus is numerical computation of the integral

$$I[f] := \int_{\mathcal{X}} f(\boldsymbol{x}) \, \mathrm{d}\boldsymbol{x}.$$

The Quasi-Monte Carlo (QMC) approach is based on an approximation

$$Q[f; \boldsymbol{x}_{1:N}] := \frac{1}{N} \sum_{n=1}^{N} f(\boldsymbol{x}_n)$$

where the (possibly random) design points x_{1:N} = {x_1, . . . , x_N} ⊂ X have low discrepancy; that is, the

Appearing in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (AISTATS) 2016, Cadiz, Spain. JMLR: W&CP volume 41. Copyright 2016 by the authors.

points are ‘well-spaced’ in a precise sense defined below. This contrasts with the Monte Carlo (MC) approach, whereby the design points are sampled independently from a uniform distribution over X. MC integration achieves a root mean square error (RMSE) convergence rate of O(N^{−1/2}), whereas QMC integration can in principle achieve a rate O(N^{−α/d}) on specific geometric sequences {N_n}_{n=1}^∞ [24]. It is known that this rate is best-possible [19] and explicit algorithms to generate design points that attain this rate are now available for many (but not all) values of α [6]. Challenging integration problems are common in contemporary statistics, for example when computing expectations, marginal probability densities or normalising constants, and QMC methods are therefore gaining importance in statistical applications [12, 17, 35].
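As a concrete illustration of the rate gap just described, the following minimal Python sketch (an illustration assumed here; the paper itself provides MATLAB code) compares a plain MC average with an average over the van der Corput sequence, the one-dimensional prototype of the Halton sequence. The integrand f(x) = x² and all parameter choices are illustrative.

```python
import random

def van_der_corput(n, base=2):
    """Radical-inverse (van der Corput) sequence: the 1-d prototype
    of low-discrepancy point sets such as the Halton sequence."""
    q, bk = 0.0, 1.0 / base
    while n > 0:
        n, r = divmod(n, base)
        q += r * bk
        bk /= base
    return q

f = lambda x: x ** 2          # illustrative test integrand on [0, 1]
I_exact = 1.0 / 3.0           # exact integral of x^2
N = 256

# QMC-style estimate: average f over the first N van der Corput points
qmc_est = sum(f(van_der_corput(n)) for n in range(N)) / N

# MC estimate: average f over N independent uniform draws
random.seed(0)
mc_est = sum(f(random.random()) for _ in range(N)) / N

print(abs(qmc_est - I_exact), abs(mc_est - I_exact))
```

With N = 2^k the first N van der Corput points form the regular grid {i/N}, so the QMC error here decays like O(N^{−1}), versus the O(N^{−1/2}) RMSE of the MC average.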

Contrary to the above theoretical considerations, rate-optimal QMC is often not employed in practice. This is mainly due to three reasons: either (R1) the smoothness parameter α is unknown, (R2) there does not currently exist an explicit QMC rule that is rate-optimal for functions of smoothness α, or (R3) it is simply more convenient to employ a basic QMC rule based on a weaker smoothness assumption α_L < α, as implemented in standard software. In each situation there is a gap between theory and practice that, as we show in this paper, can be bridged using functional approximation.

Previous work on variance reduction techniques for QMC includes [1], who considered modified importance sampling strategies, and [14], who considered constructing control variates for QMC. Neither approach improved the asymptotic error rate, though in some cases the QMC error was reduced by a constant factor. Interestingly, [14] reports some quite negative results for control variate strategies in this setting, because the objective being minimised by QMC is not equivalent to the MC variance that is minimised by control variates. [33] demonstrates that variance reductions in QMC are possible using additive approximations, though again the asymptotics were unchanged.

This paper studies a general approach to variance reduction for QMC rules, building on kernel methods and recent work in the Monte Carlo setting due to [22, 31]. The mathematics that underpins our work comes from the functional approximation literature. This takes the form of a ‘control functional’ ψ : X → R that satisfies (i) ψ integrates to zero, and (ii) f − ψ is more amenable to QMC methods than f, in a precise sense. The general approach that we explore is to replace the integrand f by f − ψ and target the QMC objective directly. This can lead to accelerated asymptotics. The main contribution of this paper is to explore this strategy in the settings (R1-3) above. Theoretical analysis of convergence rates is provided, along with empirical results and a challenging application to robotics. We begin by presenting some background on QMC theory below, before describing the methodology in more detail.

2 Background

QMC is naturally studied in reproducing kernel Hilbert spaces (RKHS; [8]). Below we draw connections with kernel methods, which are themselves naturally studied in RKHS.

Notation. We work in a Hilbert space H, consisting of measurable functions f : X → R. For simplicity of presentation we assume H includes the constant functions. We follow the mainstream QMC literature by taking X = [0, 1]^d, equipped with the Euclidean norm

$$\|\boldsymbol{x}\| := \Big( \sum_{i=1}^{d} x_i^2 \Big)^{1/2}.$$

Denote the scalar product and norm on H by ⟨·, ·⟩_H and ‖·‖_H respectively. Suppose further that H is a RKHS with kernel K : [0, 1]^d × [0, 1]^d → R; that is, K satisfies (i) K(·, x) ∈ H for all x ∈ [0, 1]^d and (ii) f(x) = ⟨f, K(·, x)⟩_H for all f ∈ H and all x ∈ [0, 1]^d. K is assumed to be non-trivial, i.e. K ≠ 0.

Quadrature Error Analysis. The quadrature methods that we focus on aim to minimise the ‘worst case’ integration error which, for design points x_{1:N} and Hilbert space H, is defined to be

$$e_H(\boldsymbol{x}_{1:N}) := \sup_{\|f\|_H \le 1} \big| Q[f; \boldsymbol{x}_{1:N}] - I[f] \big| \qquad (1)$$

where the supremum is taken over all test functions f belonging to the unit ball in H. It follows from linearity that, for any function f ∈ H, the integration error obeys

$$\big| Q[f; \boldsymbol{x}_{1:N}] - I[f] \big| \le e_H(\boldsymbol{x}_{1:N}) \, \|f\|_H. \qquad (2)$$

The worst case error e_H(x_{1:N}) is the usual target of QMC innovation, with x_{1:N} chosen to (approximately, asymptotically) minimise e_H(x_{1:N}) [8]. Note that Eqn. 1 is also the ‘maximum mean discrepancy’ (MMD), as studied extensively in the kernel methods literature [4, 30].

Quadrature is naturally studied in RKHS because there exists a closed-form expression for the worst case error in terms of the kernel K, which facilitates the principled selection of design points [8]:

$$e_H(\boldsymbol{x}_{1:N})^2 = \int\!\!\int_{[0,1]^d} K(\boldsymbol{x}, \boldsymbol{y}) \, \mathrm{d}\boldsymbol{x} \, \mathrm{d}\boldsymbol{y} - \frac{2}{N} \sum_{n=1}^{N} \int_{[0,1]^d} K(\boldsymbol{x}_n, \boldsymbol{y}) \, \mathrm{d}\boldsymbol{y} + \frac{1}{N^2} \sum_{m,n=1}^{N} K(\boldsymbol{x}_n, \boldsymbol{x}_m) \qquad (3)$$
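To make Eqn. 3 concrete: whenever the kernel's integrals are available in closed form, the worst case error can be evaluated exactly for any candidate design. The sketch below is an illustration assumed here (not taken from the paper): d = 1 and the kernel K(x, y) = 1 + min(x, y), for which ∫∫ K dx dy = 4/3 and ∫ K(x_n, y) dy = 1 + x_n − x_n²/2.

```python
def worst_case_error_sq(pts):
    """Squared worst-case error (Eqn. 3) for K(x, y) = 1 + min(x, y) on [0, 1]."""
    N = len(pts)
    term1 = 4.0 / 3.0                                         # double integral of K
    term2 = -2.0 / N * sum(1 + x - x * x / 2 for x in pts)    # single integrals of K
    term3 = sum(1 + min(x, y) for x in pts for y in pts) / N ** 2
    return term1 + term2 + term3

# Midpoint designs of increasing size: the worst case error shrinks
midpoints = lambda N: [(2 * n + 1) / (2 * N) for n in range(N)]
e4, e64 = worst_case_error_sq(midpoints(4)), worst_case_error_sq(midpoints(64))
print(e4, e64)
```

This is exactly the quantity a QMC construction would seek to minimise over designs x_{1:N}.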

The mainstream QMC literature supposes H is a Sobolev space of known order α (defined below). In this setting, O(N^{−α/d}) is the best-possible rate for the worst case error when x_{1:N} are chosen deterministically and O(N^{−α/d−1/2}) is the best-possible RMSE when x_{1:N} are allowed to be random [19]. We will refer to QMC rules that achieve these optimal rates as ‘α-QMC rules’.

This paper focuses on improving performance in the situation where a (sub-optimal) α_L-QMC rule is used to integrate a test function of smoothness α > α_L. For reasons (R1-3), this scenario is commonly encountered in statistical applications. In contrast to QMC [8] (and kernel methods that aim to minimise the MMD [2]), the rate constant ‖f‖_H is the primary target of our methodology below.

3 Methodology

Control Functionals for QMC. The approach that we pursue in this paper aims to construct a Lebesgue-integrable functional ψ : [0, 1]^d → R that satisfies

$$I[\psi] = 0. \qquad (4)$$

When x has the interpretation of a random variable, ψ(x) is classically known as a ‘control variate’ [14]. When ψ itself is estimated, we follow [22] and refer to the entire mapping ψ as a ‘control functional’ (CF). In the CF approach to estimation, the test function f is replaced by f − ψ; it is hoped that the latter is more amenable to numerical integration. Clearly I[f − ψ] = I[f]. In this paper we construct a CF ψ_N based on a tractable approximation f_N to f. (The dependence on N will be explained below.) It is required that the integral I[f_N] is available in closed form. We then set

$$\psi_N(\boldsymbol{x}) = f_N(\boldsymbol{x}) - I[f_N] \qquad (5)$$

so that ψ_N satisfies Eqn. 4. For this to make sense mathematically, it must be the case that f_N ∈ H, and this informs our method of approximation (the constant function with value I[f_N] belongs to H by assumption). Intuitively, a good CF ψ_N will provide a close approximation to the fluctuations of the test function f, so that the functional difference f − ψ_N becomes increasingly ‘flat’ and thus more amenable to QMC methods. More precisely, motivated by Eqn. 2 we aim to construct a CF such that ‖f − ψ_N‖_H < ‖f‖_H. This connection with functional approximation offers the possibility to leverage kernel methods for these problems; see e.g. [27, 31].

Control Functional Error Analysis. Consider partitioning x_{1:N} into two sets u_{1:M} and v_{M+1:N} where 1 < M < N and M/N → c ∈ (0, 1) as N → ∞. The first set u_{1:M}, possibly non-random, will be used in a preliminary step to construct an approximation f_M(·; u_{1:M}) to f. Then the second set v_{M+1:N}, possibly random, is used to evaluate the ‘CF estimator’

$$E[f; \boldsymbol{u}_{1:M}, \boldsymbol{v}_{M+1:N}] := Q[f - \psi_N(\cdot; \boldsymbol{u}_{1:M}); \boldsymbol{v}_{M+1:N}] = Q[f - f_M(\cdot; \boldsymbol{u}_{1:M}); \boldsymbol{v}_{M+1:N}] + I[f_M(\cdot; \boldsymbol{u}_{1:M})]. \qquad (6)$$

We remark that if the points v_n are random and marginally distributed as U([0, 1]^d) then E[f; u_{1:M}, v_{M+1:N}] will be an unbiased estimator for I[f]. Error analysis for the CF estimator is based on the following:

Theorem 1. Given f, f_M ∈ H, we have

$$\big| E[f; \boldsymbol{u}_{1:M}, \boldsymbol{v}_{M+1:N}] - I[f] \big| \le e_H(\boldsymbol{v}_{M+1:N}) \, \|f - f_M(\cdot; \boldsymbol{u}_{1:M})\|_H. \qquad (7)$$

Proof. Since f, f_M ∈ H we have that f − f_M ∈ H. The result then follows by applying the fundamental inequality from Eqn. 2 to the function f − f_M and using linearity of the integral operator I.

Thus the CF methodology produces an estimator E[f; u_{1:M}, v_{M+1:N}] that has asymptotically zero error relative to standard QMC estimators, provided that it is possible to construct an approximation f_M to f in such a way that ‖f − f_M(·; u_{1:M})‖_H → 0 as M → ∞. The next sections establish convergence rates for functional approximation using kernel methods.
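The split-and-correct construction behind Theorem 1 can be sketched end to end. The toy below is an assumption-laden illustration, not the paper's implementation: d = 1, f(x) = x² (so I[f] = 1/3), the compactly supported kernel K(x, y) = (1 − |x − y|)_+ whose one-dimensional integral ∫₀¹ K(x, u) dx = 1 − u²/2 − (1 − u)²/2 is a short calculation assumed here, and regular grids standing in for the point sets u_{1:M} and v_{M+1:N}.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    A = [row[:] for row in A]; b = b[:]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p], b[i], b[p] = A[p], A[i], b[p], b[i]
        for r in range(i + 1, n):
            m = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= m * A[i][c]
            b[r] -= m * b[i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(A[i][c] * x[c] for c in range(i + 1, n))) / A[i][i]
    return x

K = lambda x, y: max(0.0, 1.0 - abs(x - y))             # compactly supported kernel
K_int = lambda u: 1.0 - u * u / 2 - (1 - u) ** 2 / 2    # closed form of I[K(., u)]

f = lambda x: x ** 2                                    # test integrand, I[f] = 1/3
M, N = 16, 24
u = [(2 * n + 1) / (2 * M) for n in range(M)]           # design for approximation f_M
v = [(2 * n + 1) / (2 * (N - M)) for n in range(N - M)] # design for the average

# Interpolation weights beta solve f_M(u_n) = f(u_n), n = 1..M (cf. Eqn. 9)
beta = solve([[K(ui, uj) for uj in u] for ui in u], [f(ui) for ui in u])
f_M = lambda x: sum(b * K(x, ui) for b, ui in zip(beta, u))
I_fM = sum(b * K_int(ui) for b, ui in zip(beta, u))     # closed-form I[f_M]

# CF estimator (Eqn. 6): average the residual f - f_M, then add back I[f_M]
cf_est = I_fM + sum(f(x) - f_M(x) for x in v) / len(v)
plain  = sum(f(x) for x in v) / len(v)
print(abs(cf_est - 1 / 3), abs(plain - 1 / 3))
```

Because f_M interpolates f exactly at u_{1:M}, the residual f − f_M is nearly flat and the correction term I[f_M] carries most of the integral, mirroring the ‖f − ψ_N‖_H < ‖f‖_H mechanism above.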

Sobolev Spaces. To achieve consistent approximation ‖f − f_M‖_H → 0 it is necessary to impose regularity conditions on H. Sobolev spaces are a general setting in which to formulate such regularity assumptions; our main reference here is [27]. Firstly suppose that k ∈ N_0, k > d/2 and 1 ≤ p < ∞. For a multi-index a ∈ N_0^d we write |a| = a_1 + · · · + a_d. Define the ‘p-Sobolev space of order k’ to be

$$W^{k,p} := \{ f : [0,1]^d \to \mathbb{R} \mid D^{\boldsymbol{a}} f \text{ exists and } D^{\boldsymbol{a}} f \in L^p([0,1]^d), \ \forall \boldsymbol{a} \in \mathbb{N}_0^d \text{ with } |\boldsymbol{a}| \le k \}.$$

Here D^a f denotes the weak (or ‘distributional’) derivative of f; the reader is referred to the above reference for details. Clearly W^{k,p} is a vector space over R when addition and (scalar) multiplication are defined point-wise. For the special case p = 2 we equip W^{k,2} with the inner product

$$\langle f, g \rangle_k := \sum_{\boldsymbol{a} \in \mathbb{N}_0^d, \, |\boldsymbol{a}| \le k} I[D^{\boldsymbol{a}} f \, D^{\boldsymbol{a}} g]$$

and denote this inner-product space H^k := (W^{k,2}, ⟨·, ·⟩_k). Defined in this way, H^k is a Hilbert space of functions whose (weak) derivatives exist up to order k. Moreover H^k can be made into a RKHS with an appropriate choice of kernel (see below). Our results below apply also to Sobolev spaces with non-integer k, but this construction is more technical and we refer the reader to [27] for details.

Approximation in Sobolev Spaces. Our assumptions are naturally stated using Sobolev spaces: given two Hilbert spaces H, H′, defined on the same element set, with norms ‖·‖_H, ‖·‖_{H′}, we say that H and H′ are ‘norm-equivalent’, written H ≡ H′, whenever there exist positive constants c_1, c_2 such that c_1‖f‖_H ≤ ‖f‖_{H′} ≤ c_2‖f‖_H for all f ∈ H.

Assumption 1: H ≡ H^{α_L} where α_L > d/2.

Assumption 2: f ∈ H^α where α ≥ α_L.

Assumption 1 is a technical requirement to ensure that the space H (where QMC is performed) admits consistent functional approximation. Assumption 2 ensures that the test function f is ‘smooth enough’ for α_L-QMC methods to converge at the α_L-rate. This follows from the fact that Sobolev spaces are nested, so that f ∈ H^α ⟹ f ∈ H^{α_L}.

For consistent approximation of f it is necessary to base our approximation f_M in a space H_* of functions that are ‘at least as smooth’ as f:

Assumption 3: H_* ≡ H^{α_U} where α_U ≥ α.

It follows again from the nested property that f_M ∈ H^{α_L} and thus the functional difference f − f_M lies in H^{α_L}. The Sobolev spaces H_* can be characterised as RKHS via an appropriate reproducing kernel K_*, such as the well-known Matérn kernel.

Finally, an approximation f_M to f is constructed based on the points u_{1:M} as follows:

$$f_M(\boldsymbol{x}; \boldsymbol{u}_{1:M}) := \sum_{n=1}^{M} \beta_n K_*(\boldsymbol{x}, \boldsymbol{u}_n) \qquad (8)$$


where the weights β_n ∈ R are defined as the solution to the linear system of interpolation equations

$$f_M(\boldsymbol{u}_n; \boldsymbol{u}_{1:M}) = f(\boldsymbol{u}_n), \qquad n = 1, \ldots, M. \qquad (9)$$

It is well known that Eqn. 8 is the unique minimiser of the H_*-norm among all functions in H_* that satisfy the linear system in Eqn. 9 [27]. In practice it may be necessary to regularise the linear system in order to facilitate inversion, but we do not go into details here; see e.g. [27].

We note that I[f_M] will not have a closed-form expression when the Matérn kernel is employed, and for this technical reason we instead employ tensor products of polynomial kernels (these give rise to Sobolev spaces of dominating mixed smoothness; full details are provided at the end of this section).

Theory: Deterministic Case. We begin by considering the case where the design points v_{M+1:N} are chosen deterministically. Define the ‘fill distance’

$$h(\boldsymbol{u}_{1:M}) := \sup_{\boldsymbol{x} \in [0,1]^d} \min_{n} \|\boldsymbol{x} - \boldsymbol{u}_n\|,$$

the ‘separation radius’

$$q(\boldsymbol{u}_{1:M}) := \frac{1}{2} \min_{j \ne k} \|\boldsymbol{u}_j - \boldsymbol{u}_k\|$$

and the ‘mesh ratio’ ρ(u_{1:M}) := h(u_{1:M})/q(u_{1:M}). The set u_{1:M} is called ‘quasi-uniform’ if ρ(u_{1:M}) → 1 as M → ∞.
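The three quantities above are straightforward to compute for a candidate design. A small sketch (d = 1, with the supremum in the fill distance approximated on a fine grid, which is an illustrative shortcut rather than an exact computation):

```python
def fill_distance(u, grid=10_000):
    """h(u): sup over x in [0, 1] of the distance to the nearest design point
    (the supremum is approximated on a fine grid in this 1-d sketch)."""
    return max(min(abs(x / grid - un) for un in u) for x in range(grid + 1))

def separation_radius(u):
    """q(u): half the smallest pairwise distance between design points."""
    return 0.5 * min(abs(a - b) for i, a in enumerate(u) for b in u[:i])

def mesh_ratio(u):
    """rho(u) = h(u) / q(u); close to 1 for quasi-uniform designs."""
    return fill_distance(u) / separation_radius(u)

M = 8
grid_pts = [(2 * n + 1) / (2 * M) for n in range(M)]   # equally spaced: rho = 1
clustered = [0.1, 0.11, 0.5, 0.9]                      # uneven: rho is large
print(mesh_ratio(grid_pts), mesh_ratio(clustered))
```

An equally spaced design attains ρ = 1, while clustering inflates ρ through a small separation radius, which is exactly the penalty term appearing in Theorem 2 below.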

Theorem 2. Under Assumptions 1-3 the CF estimator has error bounded by

$$\big| E[f; \boldsymbol{u}_{1:M}, \boldsymbol{v}_{M+1:N}] - I[f] \big| \le C \, e_{H^{\alpha_L}}(\boldsymbol{v}_{M+1:N}) \, h(\boldsymbol{u}_{1:M})^{\alpha - \alpha_L} \, \rho(\boldsymbol{u}_{1:M})^{\alpha_U - \alpha_L} \, \|f\|_{H^\alpha}$$

where C > 0 is a constant that depends on α, α_L and α_U but not on f, v_{M+1:N} and u_{1:M}.

Proof. From [27] (Theorem 7.8) we have that the kernel estimator in Eqn. 8 is consistent for the non-parametric regression problem at a rate

$$\|f - f_M(\cdot; \boldsymbol{u}_{1:M})\|_{H^{\alpha_L}} \le C \, h(\boldsymbol{u}_{1:M})^{\alpha - \alpha_L} \, \rho(\boldsymbol{u}_{1:M})^{\alpha_U - \alpha_L} \, \|f\|_{H^\alpha}$$

where C depends only on α, α_L, α_U. Combining this with Eqn. 7 completes the proof.

For quasi-uniform u_{1:M}, there is no asymptotic penalty from employing a kernel K_* that imposes ‘too much smoothness’ on the approximation f_M, since ρ → 1. In this case h(u_{1:M}) = O(M^{−1/d}) and, since M and N are proportional, h(u_{1:M}) = O(N^{−1/d}). However, the rate constant C will increase when too much smoothness is assumed, so that, as a rule of thumb, we should try to select α_U as close as possible to α. Our main result is stated below:

Corollary 1. When u_{1:M} is quasi-uniform, CFs accelerate α_L-QMC by a factor O(N^{−(α−α_L)/d}).

Remark: The improvement due to CFs appears to be mainly limited to low-dimensional integrals (d small), but in fact CFs can in principle be extended to high-dimensional integrals under additional tractability assumptions, as discussed in Sec. 5.

Remark: Optimising the bound in Theorem 2 enables us to obtain the optimal scaling

$$\frac{M}{N} \to c^* = \frac{\alpha - \alpha_L}{\alpha};$$

see the Supplement for full details.
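As a trivial check of this scaling (with illustrative values): when α = 2 and α_L = 1 the rule gives c* = 1/2, which matches the split M = N/2 used in the experiments of Sec. 4.

```python
def optimal_split(alpha, alpha_L):
    """Optimal asymptotic fraction of the budget spent on the approximation
    set u_{1:M}, per the remark above: c* = (alpha - alpha_L) / alpha."""
    assert alpha > alpha_L > 0
    return (alpha - alpha_L) / alpha

print(optimal_split(2, 1))   # 0.5: half the budget goes to u_{1:M}
print(optimal_split(4, 1))   # 0.75: the smoother f is relative to the rule,
                             # the more budget the approximation deserves
```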

The overall convergence rate of the CF estimator depends on how the design points v_{M+1:N} are generated. For this there are many QMC methodologies available, each leading to different convergence rates for the worst case error e_{H^{α_L}}(v_{M+1:N}); see [7] for a recent survey of some of these approaches. Of particular interest in statistical applications is the case of random design points, which we discuss below.

Theory: Randomised Case. Modern QMC methods begin with a deterministic set/sequence of design points (e.g. a Halton sequence or a Sobol sequence), then apply a random transformation leading to a low discrepancy set with high probability. Below we consider three types of randomisation: shifting, folding and scrambling.

Shifting: In ‘random shift’ QMC the design points v_{M+1:N} are translated by a common uniform random vector ∆ ∈ [0, 1]^d, so that v_n ↦ v_n + ∆ for each n = M + 1, . . . , N. For convenience we write this ‘shifted’ set as v_{M+1:N} + ∆. Applying Theorem 2 to v_{M+1:N} + ∆ and then marginalising over ∆ ∈ [0, 1]^d produces a RMSE bound for the CF estimator:

Corollary 2. Under Assumptions 1-3 the random shift CF estimator has error bounded by

$$\sqrt{\mathbb{E} \big| E[f; \boldsymbol{u}_{1:M}, \boldsymbol{v}_{M+1:N} + \boldsymbol{\Delta}] - I[f] \big|^2} \le C \, e^{\mathrm{sh}}_{H^{\alpha_L}}(\boldsymbol{v}_{M+1:N}) \, h(\boldsymbol{u}_{1:M})^{\alpha - \alpha_L} \, \rho(\boldsymbol{u}_{1:M})^{\alpha_U - \alpha_L} \, \|f\|_{H^\alpha}$$

where

$$\big( e^{\mathrm{sh}}_{H^{\alpha_L}}(\boldsymbol{v}_{M+1:N}) \big)^2 := \int_{[0,1]^d} e_{H^{\alpha_L}}(\boldsymbol{v}_{M+1:N} + \boldsymbol{\Delta})^2 \, \mathrm{d}\boldsymbol{\Delta}$$

and C > 0 is a constant that does not depend on f, v_{M+1:N} or u_{1:M}.


For quasi-uniform u_{1:M}, CFs accelerate random shift α_L-QMC by a factor O(N^{−(α−α_L)/d}) (compare against Sec. 5.2 of [7]).

Folding: A shifted and ‘folded’ QMC rule takes the form

$$Q_b[f; \boldsymbol{z}_{1:N} + \boldsymbol{\Delta}] := \frac{1}{N} \sum_{n=1}^{N} f(b(\boldsymbol{z}_n + \boldsymbol{\Delta}))$$

where b is the ‘baker's transformation’, given component-wise by b_i(t) = 1 − |2t_i − 1|. This transformation reduces error rates; for example, for f ∈ SH^2([0, 1]^d) (defined below), folding and shifting a uniform lattice z_{1:N} leads to a RMSE O(N^{−2+ε}) that is smaller than the RMSE O(N^{−1+ε}) for a shifted lattice (p. 59 of [7]). The CF estimator here is

$$E_b[f; \boldsymbol{u}_{1:M}, \boldsymbol{v}_{M+1:N} + \boldsymbol{\Delta}] := I[f_M(\cdot; \boldsymbol{u}_{1:M})] + Q_b[f - f_M(\cdot; \boldsymbol{u}_{1:M}); \boldsymbol{v}_{M+1:N} + \boldsymbol{\Delta}].$$

For convenience we denote the shifted and folded design points by b(v_{M+1:N} + ∆). Applying Theorem 2 to b(v_{M+1:N} + ∆) and then marginalising over ∆ ∈ [0, 1]^d produces:

Corollary 3. Under Assumptions 1-3 the shifted and folded CF estimator has error bounded by

$$\sqrt{\mathbb{E} \big| E_b[f; \boldsymbol{u}_{1:M}, \boldsymbol{v}_{M+1:N} + \boldsymbol{\Delta}] - I[f] \big|^2} \le C \, e^{\mathrm{sh},b}_{H^{\alpha_L}}(\boldsymbol{v}_{M+1:N}) \, h(\boldsymbol{u}_{1:M})^{\alpha - \alpha_L} \, \rho(\boldsymbol{u}_{1:M})^{\alpha_U - \alpha_L} \, \|f\|_{H^\alpha}$$

where

$$\big( e^{\mathrm{sh},b}_{H^{\alpha_L}}(\boldsymbol{v}_{M+1:N}) \big)^2 := \int_{[0,1]^d} e_{H^{\alpha_L}}(b(\boldsymbol{v}_{M+1:N} + \boldsymbol{\Delta}))^2 \, \mathrm{d}\boldsymbol{\Delta}$$

and C > 0 is a constant independent of f, v_{M+1:N} and u_{1:M}.

Again, for quasi-uniform u_{1:M}, CFs accelerate shifted and folded α_L-QMC by a factor O(N^{−(α−α_L)/d}) (compare against Sec. 5.9 of [7]).

Scrambling: An explicit α-QMC rule that applies for all integer values of α was recently discovered by [6]. For simplicity focussing on d = 1, these random design points achieve α-rates and, moreover, the RMSE is controlled by a norm of the form ‖f‖_{H^α}. When α is known and is an integer, one may achieve optimal rates and CFs provide no rate improvement. However, when α ∉ N, CFs can be used to transform these sub-optimal integrators into optimal integrators.

Choice of Kernel: The QMC+CF methodology has some flexibility in terms of the choice of kernel K_* that is used to construct the approximation f_M. Our main requirements here are: (i) K_* imposes ‘enough smoothness’ on f_M in order to be able to faithfully approximate f (Assumption 3); moreover, K_* should be tunable to achieve a pre-specified minimum level of smoothness (below we make an explicit connection between K_* and the order of the associated ‘native’ Sobolev space that allows us to satisfy this requirement); (ii) the functions K_*(·, y) can be integrated analytically, so that I[f_M] is available in closed form. This second requirement leads us to consider tensor products of Sobolev spaces, as described below.

To construct analytically integrable functional approximations we consider kernels that are given by polynomials. Wendland's compactly supported functions [34] are defined via the recursion

$$\varphi_{d,k} = \mathcal{I}^k[\varphi_{\lfloor d/2 \rfloor + k + 1}],$$

the base function φ_ℓ(r) = (1 − r)^ℓ_+ with x_+ := max{0, x}, and the integral operator

$$\mathcal{I}[\varphi](r) = \int_r^\infty t \, \varphi(t) \, \mathrm{d}t$$

(r ≥ 0), so that

$$\varphi_{d,k}(r) = \begin{cases} (1-r)_+^{\ell+k} \, p_{d,k}(r), & r \in [0,1] \\ 0, & r > 1 \end{cases}$$

where ℓ = ⌊d/2⌋ + k + 1 and p_{d,k} is a polynomial of degree k (see e.g. p. 87 of [9] for explicit formulae). Then the kernel K_*(x, y) = φ_{d,k}(‖x − y‖) has native space H^{d/2+k+1/2} (where the restriction d > 3 is in principle required for the special case k = 0); see e.g. p. 109 of [9]. With this kernel we can therefore guarantee a minimum level of smoothness. By rescaling, the kernel's support can be changed from the unit ball (as above) to balls of smaller radius. This in turn enforces sparsity on the system of interpolation equations that are the basis of the CF estimator and reduces the computational cost of inverting this linear system.

Wendland's kernel cannot be integrated analytically in d ≥ 2 dimensions, violating requirement (ii). However we can exploit recent work by [29] that shows the d-dimensional tensor product space H^k([0, 1]) ⊗ · · · ⊗ H^k([0, 1]) is norm-equivalent to SH^k = SH^k([0, 1]^d), the Sobolev space with dominating mixed smoothness:

$$SH^k := \{ f : [0,1]^d \to \mathbb{R} \mid D^{\boldsymbol{a}} f \text{ exists and } D^{\boldsymbol{a}} f \in L^p([0,1]^d), \ \forall \boldsymbol{a} \in \mathbb{N}_0^d \text{ with } a_i \le k \}.$$

(The distinction with H^k([0, 1]^d) is that the multi-index a is now constrained component-wise, a_i ≤ k, rather than |a| ≤ k.) In particular SH^k([0, 1]^d) ⊆ H^k([0, 1]^d), so that functions in SH^k are at least as smooth as functions in H^k. We therefore propose to employ the product kernel

$$K^{(k)}_*(\boldsymbol{x}, \boldsymbol{y}) = \prod_{i=1}^{d} \varphi_{1,k}(|x_i - y_i|) \qquad (10)$$


whose native space is SH^{k+1}. The integral

$$\int_{[0,1]^d} K^{(k)}_*(\boldsymbol{x}, \boldsymbol{y}) \, \mathrm{d}\boldsymbol{x}$$

of the tensor product of Wendland functions in Eqn. 10 can now be computed analytically. This approach provides a convenient mechanism to control the degree of smoothness that we impose on the approximation f_M.
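For the roughest member of the family, φ_{1,0}(r) = (1 − r)_+, each one-dimensional factor integrates in closed form: ∫₀¹ φ_{1,0}(|x − y_i|) dx = 1 − y_i²/2 − (1 − y_i)²/2 (a short calculation assumed here, not quoted from the paper), so the d-dimensional integral of the product kernel factorises across coordinates. A sketch with a numerical sanity check:

```python
phi = lambda r: max(0.0, 1.0 - r)   # Wendland base function phi_{1,0}

def K_prod(x, y):
    """Tensor-product kernel of Eqn. 10 built from phi_{1,0}."""
    p = 1.0
    for xi, yi in zip(x, y):
        p *= phi(abs(xi - yi))
    return p

def K_prod_integral(y):
    """Closed form of the integral of K_prod(., y) over [0, 1]^d: one factor
    1 - y_i^2/2 - (1 - y_i)^2/2 per coordinate."""
    p = 1.0
    for yi in y:
        p *= 1.0 - yi * yi / 2 - (1 - yi) ** 2 / 2
    return p

# Sanity check: the closed form agrees with a midpoint Riemann sum in d = 2
y = (0.3, 0.7)
G = 400
riemann = sum(K_prod(((i + 0.5) / G, (j + 0.5) / G), y)
              for i in range(G) for j in range(G)) / G ** 2
print(abs(riemann - K_prod_integral(y)))
```

It is this factorisation that makes I[f_M] available in closed form for the CF estimator, since I[f_M] is just a weighted sum of such kernel integrals.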

4 Experimental Results

Our methodology provides a variance reduction technique for QMC that is able to accelerate convergence rates, yet is also practical. The first numerical study below is a ‘proof-of-principle’ designed to validate this specific claim in the empirical setting.

Simulation Study: For objective assessment we exploited the test package proposed by [11]. This package defines 6 function families, each of them characterized by some peculiarity, such as oscillation, discontinuity or corner peaks, with the property that their exact integrals are available. The ‘discontinuous’ Genz function provides an example where smoothness assumptions on the test function are violated. We used the MATLAB implementation of [11] that is freely available at http://people.sc.fsu.edu/~jburkardt/m_src/testpack/testpack.html.

In the experiments below, we focus on the two QMC rules that are most widely used in practice. In the first experiment, the random QMC point set v_{M+1:N} was generated by truncating the Halton sequence, scrambling the digits of the resulting points using the reverse-radix algorithm [16] and applying a uniform random shift. This QMC rule achieves the α_L = 1 rate on the subsequence N_n = 2^n when the test function has mixed partial derivatives of first order. To ensure that these QMC rules were implemented faithfully, we restricted attention to the case where M = N/2, so that N − M was always a power of two. The training points u_{1:M} were taken to be d-dimensional square lattices in all experiments.

We considered the 6 Genz functions in d = 1, 2, 3, 4 dimensions. The performance of QMC with and without CFs was compared, in each case ensuring that the total number of evaluations of the integrand f was equal for all methods. For CFs, the tensor-product Wendland kernel with k = 1 was employed (i.e. approximation with functions f_M ∈ H^2, so α_U = 2). Results are presented in Fig. 1. (For clarity we chose not to present results for MC, since these were inferior to QMC methods in all cases considered.) For the first 5 Genz functions it holds that f ∈ H^α with α = 2, and theory (for the random case) guarantees an acceleration of O(N^{−1/d}); this is borne out in the experimental results. In the 6th, discontinuous case the QMC+CF method does not out-perform QMC (at least in dimension d > 1), as the functional approximation f_M is poor due to violation of our continuity assumption. In all cases the performance of QMC+CF approaches that of QMC as the dimension d is increased. In higher dimensions (d ≥ 5, not shown) the QMC+CF and QMC estimators demonstrated effectively identical performance, in line with theory.

The experiments were then repeated with rougher (k = 0) and smoother (k = 2) regression kernels. Results in the Supplement (Figs. S3-8) demonstrated a slight improvement in the performance of QMC+CF when k = 2, in line with theory, though generally estimates were robust to the choice of regression kernel. To further assess the generality of these conclusions, further experiments were performed using a different QMC rule (truncated Sobol sequence with scrambling due to [18]). Results in the Supplement showed that the same conclusions can be drawn in each case. Taken together, these results demonstrate that CFs can accelerate QMC, at least in low-dimensional settings, and thus complete our ‘proof-of-principle’. MATLAB code to reproduce these results is provided.

Application to Robot Arm Data: To demonstrate the benefits of our methodology we consider the problem of estimating the inverse dynamics of a seven degrees-of-freedom robot arm. The task, as described in [25], is to map from a 21-dimensional input space (7 positions, 7 velocities, 7 accelerations) to the corresponding 7 joint torques. Following [25] we present results below on just one of the mappings, from the 21 input variables to the first of the seven torques. The dataset consists of 48,933 input-output pairs, of which 44,484 were used as a training set and the remaining 4,449 were used as a test set. The inputs were linearly rescaled to have mean zero and unit variance on the training set. The outputs were centred to have mean zero on the training set.

We consider a hierarchical model based on 21-dimensional Gaussian process (GP) regression. Denote by Y_i ∈ R a measured response variable at state z_i ∈ R^{21}, assumed to satisfy Y_i = g(z_i) + ε_i where ε_i ∼ N(0, σ²) are independent for i = 1, . . . , n and σ > 0 will be assumed known. In order to use training data (y_i, z_i)_{i=1}^n to make predictions regarding an unseen test point z_*, we place a GP prior g ∼ GP(0, c(z, z′; θ)) where

$$c(\boldsymbol{z}, \boldsymbol{z}'; \boldsymbol{\theta}) = \theta_1 \exp\Big( -\tfrac{1}{2} \theta_2^{-2} \|\boldsymbol{z} - \boldsymbol{z}'\|_2^2 \Big).$$

Here θ = (θ_1, θ_2) are hyper-parameters that control how training samples are used to predict the response at a new test point. A fully Bayesian treatment aims to marginalise over these hyper-parameters and we assign independent priors θ_1 ∼ Γ(α, β), θ_2 ∼ Γ(γ, δ) in the shape/scale



Figure 1: Simulation study (Genz functions): Each panel represents one test function. Solid lines correspondto standard QMC, dashed lines correspond to QMC+CF. # represents dimension d = 1, 4 represents d = 2,� represents d = 3 and ∗ represents d = 4. Experiments were replicated with 10 random seeds and error barsdenote standard error of the replicate mean. QMC points were generated from a shifted and scrambled Haltonsequence. A Wendland regression kernel was used with k = 1.

parametrisation, which we write jointly as π(θ). Here σ = 0.1 and α = β = γ = δ = 2.
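To make the model concrete, the covariance function and hyper-priors can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the name `sq_exp_kernel` is ours, and the Gamma draws use the shape/scale values stated above (α = β = γ = δ = 2).

```python
import numpy as np

def sq_exp_kernel(z1, z2, theta1, theta2):
    """Covariance c(z, z'; theta) = theta1 * exp(-||z - z'||_2^2 / (2 * theta2^2))."""
    return theta1 * np.exp(-np.sum((z1 - z2) ** 2) / (2.0 * theta2 ** 2))

rng = np.random.default_rng(0)
# Independent Gamma priors in the shape/scale parametrisation,
# with alpha = beta = gamma = delta = 2 as in the text
theta1 = rng.gamma(shape=2.0, scale=2.0)
theta2 = rng.gamma(shape=2.0, scale=2.0)

z = rng.standard_normal(21)  # a state vector in R^21
```

Note that c(z, z; θ) = θ1, so θ1 plays the role of a signal variance while θ2 is a length-scale.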

To predict the value of the response Y∗ corresponding to an unseen state vector z∗, our estimator will be the Bayesian posterior mean

Y∗ := E[Y∗|y] = ∫ E[Y∗|y, θ] π(θ) dθ,  (11)

where we implicitly condition on the covariates z1, ..., zn, z∗. Phrasing this in terms of our earlier notation, the test function is

f(x) = E[Y∗|y, Π⁻¹(x)] = C∗,n (Cn + σ²In×n)⁻¹ y

where Π is the c.d.f. for π, (Cn)i,j = c(zi, zj; θ) and (C∗,n)1,j = c(z∗, zj; θ). Each evaluation of the integrand f(x) requires O(n³) operations due to the matrix inversion and this entails a prohibitive level of computation. A partial solution is provided by a 'subset of regressors' approximation

f(x) ≈ C∗,n′ (Cn′,n Cn,n′ + σ²Cn′)⁻¹ Cn′,n y  (12)

where n′ < n denotes the size of a subset of the full data; see Sec. 8.3.1 of [25] for full details. However, even Eqn. 12 still represents a substantial computational burden in general. To facilitate the illustration below, which investigates the sampling distribution of estimators,

we took a random subset of n = 1,000 training points and a subset of regressors approximation with n′ = 100. The total computational time needed to obtain these results was 268 core-hours.
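The subset-of-regressors integrand of Eqn. 12 can be sketched as follows. This is a minimal illustration on synthetic stand-in data, not the authors' implementation: the sizes n = 200, n′ = 20 and all array names are our own choices, and Π⁻¹ is realised through the Gamma inverse c.d.f. (`scipy.stats.gamma.ppf`).

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(1)
n, n_sub, d, sigma = 200, 20, 21, 0.1   # illustrative sizes; n' = 20 regressors
Z = rng.standard_normal((n, d))         # training states z_1, ..., z_n (synthetic)
y = rng.standard_normal(n)              # training responses (synthetic stand-in)
z_star = rng.standard_normal(d)         # unseen test state
Z_sub = Z[:n_sub]                       # the subset of regressors

def cov(A, B, th1, th2):
    """Squared-exponential covariance matrix between row-sets A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return th1 * np.exp(-sq / (2.0 * th2 ** 2))

def f(x):
    """Integrand f(x) = E[Y*|y, Pi^{-1}(x)] under Eqn. 12."""
    th1, th2 = gamma.ppf(x, a=2.0, scale=2.0)          # Pi^{-1}: (0,1)^2 -> theta
    C_sn = cov(Z_sub, Z, th1, th2)                     # C_{n',n}
    C_ss = cov(Z_sub, Z_sub, th1, th2)                 # C_{n'}
    c_star = cov(z_star[None, :], Z_sub, th1, th2)[0]  # C_{*,n'}
    w = np.linalg.solve(C_sn @ C_sn.T + sigma ** 2 * C_ss, C_sn @ y)
    return c_star @ w

val = f(np.array([0.5, 0.5]))
```

Each call costs O(n n′²) rather than the O(n³) of the exact posterior mean, which is what makes the sampling study below feasible.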

For each test point z∗ the sampling standard deviation of Y∗ was estimated from 10 independent realisations of the QMC procedures. For CF we used a randomly-shifted, scrambled Halton sequence (αL = 1) and Wendland kernels with k = 1 (αU = 2), so that theory predicts an acceleration factor of O(N^{−1/2}). The estimator standard deviations were estimated for all 4,449 test points (with N = 28) and the full results are shown in Fig. 2. Note that each test point z∗ corresponds to a different test function f and thus these results are quite objective, encompassing thousands of different integration problems. For the vast majority of integration problems, CF accelerated the standard QMC estimator. Here the computational time to construct a functional approximation (inverting a 16 × 16 matrix) was negligible (3%) in comparison to the cost of evaluating the function f once. The total additional computational time associated with the QMC+CF methodology was 2% greater than for QMC, which is easily justified by the substantial variance reductions (∼10³%) that are realised in this application. Supplementary results (Fig. S9) compare QMC+CF to MC+CF (standard MC sampling).
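The QMC point sets used here can be reproduced in outline with `scipy.stats.qmc`. The sketch below estimates the sampling standard deviation of a plain QMC estimator over 10 random seeds, with a toy integrand standing in for the expensive f (the function names and the toy integrand are our own, not the authors' code):

```python
import numpy as np
from scipy.stats import qmc

def qmc_estimate(f, N, d, seed):
    """Plain QMC estimate of int f over [0,1]^d using a
    randomly-shifted, scrambled Halton sequence."""
    rng = np.random.default_rng(seed)
    pts = qmc.Halton(d=d, scramble=True, seed=seed).random(N)
    pts = (pts + rng.uniform(size=d)) % 1.0   # random shift, modulo 1
    return np.mean([f(x) for x in pts])

f = lambda x: x[0] * x[1]   # toy integrand; I[f] = 1/4
ests = [qmc_estimate(f, N=128, d=2, seed=s) for s in range(10)]
sampling_std = np.std(ests)   # analogue of Std[QMC] over 10 realisations
```

In the application, f is the (expensive) GP posterior mean and the same 10-seed protocol yields Std[QMC] for each test point.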


[Figure 2 panels: a 1D projection of the test function f(x) against x1; the tractable approximation fN(x); the difference f(x) − fN(x) used by QMC; and a scatter plot of Std[QMC] against Std[QMC + CF], both axes from 0 to 0.05.]

Figure 2: Application to robot arm data. Left: Posterior predictive means were computed for the mechanical torque experienced by one of the seven joints of the arm, for each of 4,449 joint configurations. Schematic reproduced from [32]. Centre: Model hyper-parameters were integrated out; for this task we compared standard QMC with the proposed QMC+CF approach (both implementations provided unbiased estimators). Right: Examining the estimator sampling standard deviations, we see that, for all but a handful of the configurations, QMC+CF was more accurate than QMC.

5 Discussion

QMC methods are becoming increasingly relevant in modern statistics applications [12, 35] and it is surely a priority to target the rate constants governing the practical performance of these algorithms. CFs provide one route to achieve this goal, providing substantial variance reductions in many of the examples we considered. Indeed, CFs allow us to use a sub-optimal QMC rule (e.g. as built into existing software packages) and yet, with minimal additional coding, obtain a QMC+CF algorithm that attains optimal convergence rates. The focus on unknown smoothness α distinguishes our work from previous literature on the connection between integration and functional approximation, e.g. [3, 13].

Functional approximation, and hence our QMC+CF methodology, has a computational cost associated with solution of a linear system. Whilst negligible in our experiments, this cost could be reduced if necessary using standard approximations and/or compactly supported kernels. On the other hand, we note that QMC is often used when f is expensive to evaluate, and in such situations it is likely that evaluation of the integrand, rather than solution of a linear system, will be the main computational bottleneck.

Our focus was on Sobolev spaces, but it is known that a faster rate O(N^{−α+ε}) is possible in the subspace SH^α([0, 1]^d), for any ε > 0, and explicit point sets are available (for integer α) [6]. An immediate extension is to establish optimal rates for CFs in this class of functions. In a related direction, one can in principle obtain dimension-independent rates by imposing a (strong) assumption of polynomial tractability on the RKHS. This is achieved by generalising to weighted Sobolev spaces, such that the integrand f 'depends only weakly on most of the components of x'. Further details are provided in [7, 20] and form part of our ongoing research.

The methods that we describe are immediately applicable in a range of applications, including marginalisation of hyper-parameters in classification [10], probabilistic inference for differential equations [28, 5], computation of model evidence [21] and approximation of the partition function in social network models [26]. Finally we note that CFs generalise to other integration methods, including Bayesian Quadrature [23, 4] and related kernel-based quadrature rules [2], in which the worst case error is also controlled by an RKHS norm ‖f‖H; this will be the focus of our ongoing research.

Acknowledgments

The authors are grateful to Dan Simpson, Mathieu Gerber and Ben Collyer for helpful discussions. CJO was supported by EPSRC [EP/D002060/1] and the ARC Centre of Excellence for Mathematical and Statistical Frontiers. MG was supported by EPSRC [EP/J016934/1, EP/K034154/1], an EPSRC Established Career Fellowship, the EU grant [EU/259348] and a Royal Society Wolfson Research Merit Award.


References

[1] C. Aistleitner and J. Dick. Functions of bounded variation, signed measures, and a general Koksma-Hlawka inequality. Acta Arithmetica, 167(2):143–171, 2015.

[2] F. Bach. On the equivalence between quadrature rules and random features. arXiv:1502.06800, 2015.

[3] N.S. Bakhvalov. On the approximate calculation of multiple integrals. Journal of Complexity, 31(4):502–516, 2015.

[4] F.X. Briol, C.J. Oates, M. Girolami, M. Osborne and D. Sejdinovic. Probabilistic Integration: A Role for Statisticians in Numerical Analysis? arXiv:1512.00933, 2016.

[5] J. Cockayne, C.J. Oates, T. Sullivan and M. Girolami. Probabilistic Meshless Methods for Bayesian Inverse Problems. In preparation.

[6] J. Dick. Higher order scrambled digital nets achieve the optimal rate of the root mean square error for smooth integrands. The Annals of Statistics, 39(3):1372–1398, 2011.

[7] J. Dick, F.Y. Kuo, and I.H. Sloan. High-dimensional integration: The quasi-Monte Carlo way. Acta Numerica, 22:133–288, 2013.

[8] J. Dick and F. Pillichshammer. Discrepancy Theory and Quasi-Monte Carlo Integration. Springer, Berlin, 2010.

[9] G.F. Fasshauer. Meshfree Approximation Methods with MATLAB. World Scientific Publishing Co., Inc., 2007.

[10] M. Filippone and M. Girolami. Pseudo-Marginal Bayesian Inference for Gaussian Processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(11):2214–2226, 2014.

[11] A. Genz. Testing multidimensional integration routines. In International Conference on Tools, Methods and Languages for Scientific and Engineering Computation, pages 81–94. Elsevier North-Holland, Inc., 1984.

[12] M. Gerber and N. Chopin. Sequential Quasi-Monte Carlo. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77(3):509–579, 2015.

[13] S. Heinrich. Random approximation in numerical analysis. Lecture Notes in Pure and Applied Mathematics, 150:123–171, 1994.

[14] F.J. Hickernell, C. Lemieux, and A.B. Owen. Control variates for quasi-Monte Carlo. Statistical Science, 20(1):1–31, 2005.

[15] F. Huszar and D. Duvenaud. Optimally-Weighted Herding is Bayesian Quadrature. In Uncertainty in Artificial Intelligence, pages 377–385, 2012.

[16] L. Kocis and W.J. Whiten. Computational investigations of low-discrepancy sequences. ACM Transactions on Mathematical Software (TOMS), 23(2):266–294, 1997.

[17] S. Lacoste-Julien, F. Lindsten, and F. Bach. Sequential Kernel Herding: Frank-Wolfe Optimization for Particle Filtering. In International Conference on Artificial Intelligence and Statistics, volume 18, 2015.

[18] J. Matousek. On the L2-discrepancy for Anchored Boxes. Journal of Complexity, 14(4):527–556, 1998.

[19] E. Novak. Deterministic and Stochastic Error Bounds in Numerical Analysis. Springer-Verlag, Berlin, 1988.

[20] E. Novak and H. Wozniakowski. Tractability of Multivariate Problems: Standard Information for Functionals, volume 2. European Mathematical Society, 2010.

[21] C.J. Oates, T. Papamarkou and M. Girolami. The Controlled Thermodynamic Integral for Bayesian Model Evidence Evaluation. Journal of the American Statistical Association, 2016. To appear.

[22] C.J. Oates, M. Girolami, and N. Chopin. Control Functionals for Monte Carlo Integration. Journal of the Royal Statistical Society: Series B, 2016. To appear.

[23] A. O'Hagan. Bayes-Hermite quadrature. Journal of Statistical Planning and Inference, 29(3):245–260, 1991.

[24] A.B. Owen. A constraint on extensible quadrature rules. Numerische Mathematik, 2015. To appear.

[25] C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[26] G. Robins, P. Pattison, Y. Kalish, and D. Lusher. An introduction to exponential random graph (p*) models for social networks. Social Networks, 29(2):173–191, 2007.

[27] R. Schaback and H. Wendland. Kernel techniques: from machine learning to meshless methods. Acta Numerica, 15:543–639, 2006.

[28] M. Schober, D.K. Duvenaud, and P. Hennig. Probabilistic ODE solvers with Runge-Kutta means. In Advances in Neural Information Processing Systems, volume 27, pages 739–747, 2014.

[29] W. Sickel and T. Ullrich. Tensor products of Sobolev-Besov spaces and applications to approximation from the hyperbolic cross. Journal of Approximation Theory, 161(2):748–786, 2009.

[30] A. Smola, A. Gretton, L. Song, and B. Scholkopf. A Hilbert space embedding for distributions. In Algorithmic Learning Theory, pages 13–31. Springer, 2007.

[31] B. Tracey, D. Wolpert, and J.J. Alonso. Using Supervised Learning to Improve Monte Carlo Integral Estimation. AIAA Journal, 51(8):2015–2023, 2013.

[32] S. Vijayakumar and S. Schaal. Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space. In International Conference on Machine Learning, volume 16, pages 1079–1086, 2000.

[33] X. Wang. Enhancing Quasi-Monte Carlo Methods by Exploiting Additive Approximation for Problems in Finance. SIAM Journal on Scientific Computing, 34(1):A283–A308, 2012.

[34] H. Wendland. Piecewise polynomial, positive definite and compactly supported radial functions of minimal degree. Advances in Computational Mathematics, 4(1):389–396, 1995.

[35] J. Yang, V. Sindhwani, H. Avron, and M. Mahoney. Quasi-Monte Carlo feature maps for shift-invariant kernels. In International Conference on Machine Learning, volume 31, pages 485–493, 2014.


Supplementary Text

In this section we provide details of how to allocate computational resources between the sets u1:M and vM+1:N, trading off integration error against functional approximation error.

Proposition. The optimal scaling of M/N ∈ (0, 1), in the sense of asymptotically minimising the QMC+CF absolute error, is given by

M/N → c∗ = (α − αL)/α.

Proof. From Theorem 2, with u1:M quasi-uniform, the QMC+CF error is bounded above by

|E[f; u1:M, vM+1:N] − I[f]| ≤ Cf e_{H^{αL}}(vM+1:N) h(u1:M)^{α−αL}  (13)

for some constant Cf ∈ (0,∞).

For αL-optimal QMC we have that e_{H^{αL}}(vM+1:N) = O((N − M)^{−αL/d}).

Suppose that M = m^d for some m ∈ N. Then, since u1:M are quasi-uniform, it follows (from considering a regular square lattice) that h(u1:M) = O(d^{1/2} m^{−1}). This gives the general scaling h(u1:M) = O(d^{1/2} M^{−1/d}).

Writing M = cN for some c, we obtain from Eqn. 13 the objective function

J(c) = (N − cN)^{−αL/d} × (d^{1/2} (cN)^{−1/d})^{α−αL}

that we wish to minimise over c ∈ [0, 1). Solving J′(c) = 0 completes the argument.
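The stated optimum is easy to verify numerically. The sketch below minimises J over a fine grid, dropping the factors in N and d that do not affect the location of the minimum (the particular values α = 2, αL = 1, d = 3 are illustrative choices of ours):

```python
import numpy as np

def J(c, alpha, alpha_L, d):
    """Objective from the proof, with constants in N and d dropped
    (they do not affect the argmin over c)."""
    return (1.0 - c) ** (-alpha_L / d) * c ** (-(alpha - alpha_L) / d)

alpha, alpha_L, d = 2.0, 1.0, 3
grid = np.linspace(1e-4, 1 - 1e-4, 100000)
c_hat = grid[np.argmin(J(grid, alpha, alpha_L, d))]   # numerical minimiser
c_star = (alpha - alpha_L) / alpha                    # predicted optimum
```

For these values c∗ = (2 − 1)/2 = 1/2, i.e. half the evaluation budget should be spent on the approximation set u1:M.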

Supplementary Experimental Results

This section contains all simulated data results discussed in the paper. Specifically, for each of the 6 test functions described by [11], we display results based on Wendland's compactly supported regression kernel [34] with smoothness parameter k (described above) set equal to either

• k = 0, or

• k = 1, or

• k = 2

in combination with QMC design points vM+1:N generated from either

• a Halton sequence, deterministically scrambled using the reverse-radix algorithm [16] and then randomly shifted, or

• a Sobol sequence, randomly scrambled using the algorithm of [18].

We used the MATLAB implementation of [11] that is freely available (web address given in the Main Text). The QMC design points can be generated using the built-in MATLAB functions haltonset, sobolset and scramble. Full MATLAB code used to generate these results is provided in the Electronic Supplement.
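For completeness, the Wendland functions underlying the regression kernel for k ∈ {0, 1, 2} (valid for dimension d ≤ 3, and given up to normalising constants [34]) can be sketched as follows; the name `wendland` is ours and this is a sketch, not a substitute for the MATLAB implementation referenced above:

```python
import numpy as np

def wendland(r, k):
    """Wendland's compactly supported radial functions for d <= 3,
    up to normalising constants; smoothness increases with k."""
    p = np.maximum(1.0 - r, 0.0)   # truncated factor (1 - r)_+; support is r <= 1
    if k == 0:
        return p ** 2
    if k == 1:
        return p ** 4 * (4.0 * r + 1.0)
    if k == 2:
        return p ** 6 * (35.0 * r ** 2 + 18.0 * r + 3.0)
    raise ValueError("k must be 0, 1 or 2")

r = np.linspace(0.0, 1.5, 7)
vals = {k: wendland(r, k) for k in (0, 1, 2)}   # vanishes identically for r >= 1
```

The smoothness of the resulting kernel, and hence the attainable rate αU, increases with k, while the support remains the unit ball r ≤ 1.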


Genz Function #1: Oscillatory Test Function

[Figure S3: six panels of Mean Square Error against Number of Function Evaluations (N) for the Oscillatory Test Function, one for each combination of point set (Halton, Sobol) and Wendland smoothness k ∈ {0, 1, 2}.]

Figure S3: Numerical results: Each panel represents one of the 6 QMC+CF formulations. Solid lines correspond to standard QMC, dashed lines correspond to QMC+CF. ○ represents dimension d = 1, △ represents d = 2, □ represents d = 3 and ∗ represents d = 4. Experiments were replicated with 10 random seeds and error bars denote standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k.


Genz Function #2: Product Peak Test Function

[Figure S4: six panels of Mean Square Error against Number of Function Evaluations (N) for the Product Peak Test Function, one for each combination of point set (Halton, Sobol) and Wendland smoothness k ∈ {0, 1, 2}.]

Figure S4: Numerical results: Each panel represents one of the 6 QMC+CF formulations. Solid lines correspond to standard QMC, dashed lines correspond to QMC+CF. ○ represents dimension d = 1, △ represents d = 2, □ represents d = 3 and ∗ represents d = 4. Experiments were replicated with 10 random seeds and error bars denote standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k.


Genz Function #3: Corner Peak Test Function

[Figure S5: six panels of Mean Square Error against Number of Function Evaluations (N) for the Corner Peak Test Function, one for each combination of point set (Halton, Sobol) and Wendland smoothness k ∈ {0, 1, 2}.]

Figure S5: Numerical results: Each panel represents one of the 6 QMC+CF formulations. Solid lines correspond to standard QMC, dashed lines correspond to QMC+CF. ○ represents dimension d = 1, △ represents d = 2, □ represents d = 3 and ∗ represents d = 4. Experiments were replicated with 10 random seeds and error bars denote standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k.


Genz Function #4: Gaussian Test Function

[Figure S6: six panels of Mean Square Error against Number of Function Evaluations (N) for the Gaussian Test Function, one for each combination of point set (Halton, Sobol) and Wendland smoothness k ∈ {0, 1, 2}.]

Figure S6: Numerical results: Each panel represents one of the 6 QMC+CF formulations. Solid lines correspond to standard QMC, dashed lines correspond to QMC+CF. ○ represents dimension d = 1, △ represents d = 2, □ represents d = 3 and ∗ represents d = 4. Experiments were replicated with 10 random seeds and error bars denote standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k.


Genz Function #5: Continuous Test Function

[Figure S7: six panels of Mean Square Error against Number of Function Evaluations (N) for the Continuous Test Function, one for each combination of point set (Halton, Sobol) and Wendland smoothness k ∈ {0, 1, 2}.]

Figure S7: Numerical results: Each panel represents one of the 6 QMC+CF formulations. Solid lines correspond to standard QMC, dashed lines correspond to QMC+CF. ○ represents dimension d = 1, △ represents d = 2, □ represents d = 3 and ∗ represents d = 4. Experiments were replicated with 10 random seeds and error bars denote standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k.


Genz Function #6: Discontinuous Test Function

[Figure S8: six panels of Mean Square Error against Number of Function Evaluations (N) for the Discontinuous Test Function, one for each combination of point set (Halton, Sobol) and Wendland smoothness k ∈ {0, 1, 2}.]

Figure S8: Numerical results: Each panel represents one of the 6 QMC+CF formulations. Solid lines correspond to standard QMC, dashed lines correspond to QMC+CF. ○ represents dimension d = 1, △ represents d = 2, □ represents d = 3 and ∗ represents d = 4. Experiments were replicated with 10 random seeds and error bars denote standard error of the replicate mean. QMC points were generated either from a scrambled Halton sequence or a scrambled Sobol sequence (see the Main Text). The Wendland regression kernel took parameter k.


Robot Arm Example: Additional Results

We re-ran the robot arm simulation in order to compare the QMC+CF estimator with the MC+CF estimator; that is, a quasi-uniform set u1:M was used to construct a control functional fM, whilst a Monte Carlo sample vM+1:N was used to integrate the difference f − fM.

[Figure S9: scatter plot of Std[MC] against Std[MC + CF], both axes from 0 to 0.2.]

Figure S9: Application to robot arm data: Examining the estimator sampling standard deviations, we see that, for all but a handful of the configurations, QMC+CF was more accurate than MC+CF.