21
Aust. N. Z. J. Stat. 2012 doi: 10.1111/j.1467-842X.2011.00639.x UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOS ´ E E. CHAC ´ ON 1, AND T ARN DUONG 2 Universidad de Extremadura and Institut Curie Summary Two of the most useful multivariate bandwidth selection techniques are the plug-in and cross- validation methods. The smoothed version of the cross-validation method is known to reduce the variability of its non-smoothed counterpart; however, it shares with the plug-in choice the need for a pilot bandwidth matrix. Owing to the mathematical difficulties encountered in the optimal pilot choice, it is common to restrict this pilot matrix to be a scalar multiple of the identity matrix, at the expense of losing the flexibility afforded by the unconstrained approach. Here we show how to overcome these difficulties and propose a smoothed cross- validation selector using an unconstrained pilot matrix. Our numerical results indicate that the unconstrained selector outperforms the constrained one in practice, and is a viable competitor to unconstrained plug-in selectors. Key words: cross-validation; kernel density estimation; mean integrated squared error; uncon- strained bandwidth matrices. 1. Introduction The ability of kernel density estimators to disclose the structure of multivariate data clouds depends crucially on the selection of the smoothing parameter, commonly known as the bandwidth matrix (see Simonoff 1996 for a review). The parametrization of this band- width is an important factor in the performance of kernel density estimators. Constrained parametrizations restrict the kernels to be aligned to the coordinate axes. In contrast, uncon- strained bandwidth matrices allow for the most appropriately oriented kernels, so are able to more clearly reveal structures that are oriented away from the coordinate axes. More- over, Wand & Jones (1993) and Chac´ on (2009) show that kernel density estimators with unconstrained bandwidths may substantially outperform those using constrained ones. Optimal unconstrained bandwidth matrices with optimal scalar pilot bandwidths have been developed for plug-in selectors (Wand & Jones 1994; Duong & Hazelton 2003) and for smoothed cross-validation (Duong & Hazelton 2005b). The asymptotic and finite-sample per- formance of these two classes of selectors has as its core the ability to tune the pilot estimators that contribute to the intermediate stages of estimating the optimality criteria. Focusing on Author to whom correspondence should be addressed. 1 Departamento de Matem´ aticas, Universidad de Extremadura, E-06006, Badajoz, Spain. e-mail: [email protected] 2 Institut Curie, Molecular Mechanisms of Intracellular Transport Laboratory, CNRS, UMR 144, F-75248 Paris, France. e-mail: [email protected] Acknowledgements. We thank an anonymous associate editor and two anonymous referees for their comments, which led to a substantial improvement in the quality of the paper. Grants MTM2009-0730 (J.E.C.) and MTM2010-16660 (both authors) from the Spanish Ministerio de Ciencia e Innovacion, and Mayent-Rothschild and Agence Nationale de Recherche fellowships (T.D.) from the Institut Curie, France have supported this work. C 2012 Australian Statistical Publishing Association Inc. Published by Blackwell Publishing Asia Pty Ltd. Australian & New Zealand Journal of Statistics

UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

  • Upload
    others

  • View
    21

  • Download
    0

Embed Size (px)

Citation preview

Page 1: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

Aust. N. Z. J. Stat. 2012 doi: 10.1111/j.1467-842X.2011.00639.x

UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION

JOSE E. CHACON1,∗ AND TARN DUONG2

Universidad de Extremadura and Institut Curie

Summary

Two of the most useful multivariate bandwidth selection techniques are the plug-in and cross-validation methods. The smoothed version of the cross-validation method is known to reducethe variability of its non-smoothed counterpart; however, it shares with the plug-in choicethe need for a pilot bandwidth matrix. Owing to the mathematical difficulties encounteredin the optimal pilot choice, it is common to restrict this pilot matrix to be a scalar multipleof the identity matrix, at the expense of losing the flexibility afforded by the unconstrainedapproach. Here we show how to overcome these difficulties and propose a smoothed cross-validation selector using an unconstrained pilot matrix. Our numerical results indicate that theunconstrained selector outperforms the constrained one in practice, and is a viable competitorto unconstrained plug-in selectors.

Key words: cross-validation; kernel density estimation; mean integrated squared error; uncon-strained bandwidth matrices.

1. Introduction

The ability of kernel density estimators to disclose the structure of multivariate dataclouds depends crucially on the selection of the smoothing parameter, commonly known asthe bandwidth matrix (see Simonoff 1996 for a review). The parametrization of this band-width is an important factor in the performance of kernel density estimators. Constrainedparametrizations restrict the kernels to be aligned to the coordinate axes. In contrast, uncon-strained bandwidth matrices allow for the most appropriately oriented kernels, so are ableto more clearly reveal structures that are oriented away from the coordinate axes. More-over, Wand & Jones (1993) and Chacon (2009) show that kernel density estimators withunconstrained bandwidths may substantially outperform those using constrained ones.

Optimal unconstrained bandwidth matrices with optimal scalar pilot bandwidths havebeen developed for plug-in selectors (Wand & Jones 1994; Duong & Hazelton 2003) and forsmoothed cross-validation (Duong & Hazelton 2005b). The asymptotic and finite-sample per-formance of these two classes of selectors has as its core the ability to tune the pilot estimatorsthat contribute to the intermediate stages of estimating the optimality criteria. Focusing on

∗Author to whom correspondence should be addressed.1Departamento de Matematicas, Universidad de Extremadura, E-06006, Badajoz, Spain.e-mail: [email protected] Curie, Molecular Mechanisms of Intracellular Transport Laboratory, CNRS, UMR 144, F-75248Paris, France.e-mail: [email protected]. We thank an anonymous associate editor and two anonymous referees for their comments,which led to a substantial improvement in the quality of the paper. Grants MTM2009-0730 (J.E.C.) andMTM2010-16660 (both authors) from the Spanish Ministerio de Ciencia e Innovacion, and Mayent-Rothschildand Agence Nationale de Recherche fellowships (T.D.) from the Institut Curie, France have supported thiswork.

C© 2012 Australian Statistical Publishing Association Inc. Published by Blackwell Publishing Asia Pty Ltd.

Australian & New Zealand Journal of Statistics

Page 2: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

2 UNCONSTRAINED SMOOTHED CROSS-VALIDATION

the scalar parametrization of the pilot selectors reduces the complexity of the mathematicalanalysis, at the price of losing some flexibility. Nevertheless, with some novel matrix analysisresults for higher-order derivatives of multivariate functions, these mathematical complexi-ties were surmounted by Chacon & Duong (2010), who presented a plug-in selector usingunconstrained matrices for all pilot stages for the first time in the literature, and demonstratedthe gains in performance over their constrained counterparts in practice. We present hereanalogous unconstrained pilot selectors for smoothed cross-validation selectors.

In Section 2, we outline the optimal bandwidth selection problem for kernel density esti-mators. For smoothed cross-validation, this problem relies in turn on optimal pilot bandwidthselection, as elaborated in Section 3. This section contains our main asymptotic results forunconstrained pilot selectors. This is followed by a numerical study in Section 4 to investigatethe performance for finite samples. In a simulation study we show that the theoretical effortmade to derive unconstrained pilot selectors is worthwhile, as this new approach consistentlyoutperforms the constrained one in all the considered examples. Finally, we illustrate thedifferences between various bandwidth selection methods in practice through the analysis ofglobal cellular organization from microscopy image data.

2. Optimal bandwidth selection for kernel density estimation

For a d-variate random sample X1, X2, . . . , Xn drawn from a common density f , thekernel density estimator is defined as

fH(x) = n−1n∑

i=1

KH(x − Xi ),

where x = (x1, x2, . . . , xd )� and Xi = (Xi1, Xi2, . . . , Xid )�, i = 1, 2, . . . , n. Here K (x) isthe multivariate kernel, which is a spherically symmetric probability density function. Theparameter H is the bandwidth matrix, which is symmetric and positive-definite, and KH(x) =|H|−1/2 K (H−1/2x) is the scaled kernel.

We measure the performance of a kernel density estimate using the mean integratedsquared error (MISE), defined as

MISE(H) = MISE( fH) = E

∫Rd

{ fH(x) − f (x)}2 dx,

assuming that both K and f are square-integrable. By expanding the integral in the previousequation, the MISE can be rewritten as

MISE(H) = {n−1|H|−1/2 R(K ) − n−1 R∗(KH ∗ KH, f )

}+{R∗(KH ∗ KH, f ) − 2R∗(KH, f ) + R( f )},

(1)

where R(α) = ∫Rd α(x)2 dx for any square-integrable function α, and R∗(LH, f ) =∫

Rd (LH ∗ f )(x) f (x) dx for any square-integrable kernel L, with ∗ denoting the convolutionoperator; see, for instance, Chacon, Duong & Wand (2011). The first set of braces containsthe contribution from the integrated variance, and the second set that from the integratedsquared bias.

This is not the usual expression of the MISE: we use this alternative form because itleads more naturally into our treatment of smoothed cross-validation. The usual asymptotic

C© 2012 Australian Statistical Publishing Association Inc.

Page 3: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

JOSE E. CHACON AND TARN DUONG 3

approximation to the MISE, for example as found in Wand (1992), is calculated by ignoringthe contribution of the second term in the integrated variance, and by keeping only the leadingterm in the integrated squared bias, resulting in

AMISE(H) = n−1|H|−1/2 R(K ) + 1

4m2(K )2

∫Rd

tr2{HD2 f (x)} dx,

where D2 f is the Hessian matrix of second-order partial derivatives of f , tr denotes the traceoperator, and m2(K ) ∈ R is such that

∫Rd xx�K (x)dx = m2(K )Id , with Id denoting the

identity matrix of order d. This AMISE approximation forms the basis of plug-in selectors;see Wand & Jones (1994).

The crucial consideration in kernel density estimation is to select an optimal value forthe bandwidth matrix. The MISE-optimal selector is defined to be the minimizer of the MISE:

HMISE = argminH∈F

MISE(H),

where F is the set of all symmetric and positive-definite d × d matrices. This ideal selectoris mathematically intractable for general densities, so instead it is common to deal withHAMISE, the minimizer of the AMISE over F , which is asymptotically equivalent to HMISE.In fact, when f has sufficient smoothness, HAMISE can be written as Cn−2/(d+4) for a certainpositive-definite symmetric matrix C (not having an explicit form), so that

HMISE = Cn−2/(d+4) + o(n−2/(d+4)Jd ),

where Jd is the d × d matrix having each entry equal to one. Here and hereafter, the asymptoticnotation for matrices is to be understood element-wise, as introduced in Duong & Hazelton(2005a).

The smoothed cross-validation (SCV) method of Hall, Marron & Park (1992) attemptsto improve the bias estimation in the AMISE expansion. The target for SCV can thus beconsidered to be

MISE2(H) = n−1|H|−1/2 R(K ) + R∗(KH ∗ KH, f ) − 2R∗(KH, f ) + R( f ) , (2)

that is, it keeps the exact integrated squared bias as in the MISE, and only the dominant termin the integrated variance. It is clear that MISE(H) − MISE2(H) = O(n−1), so MISE andMISE2 are asymptotically equivalent. The following theorem establishes that HMISE2, theminimizer of MISE2, is also asymptotically equivalent to HMISE, and provides the relativerate of convergence for this equivalence. Here the relative rate of convergence is definedto be n−α if vec (HMISE2 − HMISE) = O(n−αJd2 ) vec HMISE, where vec is the operator thatconcatenates the columns of a matrix into a single vector (see Duong & Hazelton 2005a).

Theorem 1. Suppose that the following conditions hold:

(H) For the sequence of bandwidth matrices H = Hn, every element of H → 0 andn−1|H|−1/2 → 0 as n → ∞.

(D) All partial derivatives up to order 6 inclusive of the density function f are bounded,continuous and square-integrable.

(K) The kernel K is a symmetric, square-integrable density function such that∫Rd xx�K (x)dx = m2(K )Id and all its moments of order 4 are finite.

Then vec (HMISE2 − HMISE) = O(n−(d+2)/(d+4)Jd2 ) vec HMISE.

C© 2012 Australian Statistical Publishing Association Inc.

Page 4: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

4 UNCONSTRAINED SMOOTHED CROSS-VALIDATION

The proof of this result, along with all other proofs and supporting lemmas, is presentedin the Appendix. Conditions (H), (D) and (K) are not a minimal set of hypotheses; rather, theyserve as a useful starting point. An example of a kernel fulfilling condition (K) is the normalkernel, given by φ(x) = (2π )−d/2 exp(− 1

2 x�x) for x ∈ Rd .

Note that the relative rate in the previous result is faster than n−1/2 for all d. Because n−1/2

is the fastest relative rate than can be achieved in this bandwidth selection problem (Hall &Marron 1991), it follows that we can replace HMISE with HMISE2 everywhere in our asymptoticanalysis. In contrast, it can be shown that vec (HAMISE − HMISE) is O(n−2/(d+4)Jd2 ) vec HMISE,

so that HAMISE has a slower relative rate of convergence to HMISE than HMISE2. In this sense,HMISE2 is a more efficient target than HAMISE for all d. This can be regarded as a consequenceof the improvement in the estimation of the bias. Furthermore, as the dimension d increases,the convergence rate of HAMISE becomes slower, whereas for HMISE2 it becomes faster.

3. Pilot bandwidth selection for smoothed cross-validation

The result of Theorem 1 does not provide a way to choose the bandwidth matrix, asHMISE2, like HMISE and HAMISE, still depends on the unknown density f . To determinea data-based selector, the SCV method replaces f in equation (2) with a pilot estimatorfG(x) = n−1 ∑n

i=1 LG(x − Xi ) with kernel L and bandwidth G, possibly different from Kand H. To be precise, following Jones, Marron & Park (1991) let �H = KH − K0, whereK0 is our notation for the Dirac delta function, and let � = � ∗ � be the self-convolution of afunction �. Then �H = KH − 2KH + K0, so that MISE2 admits the expression,

MISE2(H) = n−1|H|−1/2 R(K ) + R∗(�H, f )

as R∗(�H, f ) = ∫Rd (�H ∗ f )(x) f (x) dx is the integrated squared bias term from (1). The

SCV criterion is a plug-in estimator of this form of MISE2

SCV(H) = n−1|H|−1/2 R(K ) + R∗(�H, fG)

= n−1|H|−1/2 R(K ) + n−2n∑

i, j=1

(�H ∗ LG)(Xi − X j )

and the SCV selector is the minimizer of the SCV function, HSCV = argminH∈F SCV(H).As stated in the Introduction, the main goal of this paper is to investigate the choice of

the pilot bandwidth G to be used in the SCV selector. As HSCV can be considered an estimateof HMISE, proceeding as in Duong & Hazelton (2005b) we set the optimality criterion for theSCV pilot selector to be

MSE(G) = MSE(HSCV; G) = E{vec� (HSCV − HMISE) vec (HSCV − HMISE)

}.

Because this MSE expression is difficult to manage, we derive below its asymptotic approxi-mation, which is easier to interpret for our purposes.

We need the following notation: starting with the d-dimensional vector of first-order differentials D = (∂/∂x1, . . . , ∂/∂xd )�, and adopting the convention (∂/∂xi )(∂/∂x j ) =∂2/(∂xi∂x j ), following Holmquist (1996) our definition of the r th-order derivative of f as a

C© 2012 Australian Statistical Publishing Association Inc.

Page 5: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

JOSE E. CHACON AND TARN DUONG 5

single vector of length dr is

D⊗r f (x) = (D f )⊗r (x) = ∂r f (x)

∂x⊗r,

where A⊗r denotes the r -fold Kronecker product of a matrix A; see Chacon & Duong (2010)or Chacon et al. (2011), where this notation is also used. The arrangement of the r th partialderivatives into a vector allows us to extend the usual scalar integrated density derivativefunctional (Hall & Marron 1987) to its vector-valued counterpart

ψr =∫

Rd

D⊗r f (x) f (x) dx ∈ Rdr.

Finally, let us write R(�) = ∫Rd �(x)�(x)� dx for a vector-valued function �.

The asymptotic approximation of MSE(G) is given in the next result.

Theorem 2. Suppose that (H),(D),(K) from Theorem 1 and the following conditions hold:

(G) For the bandwidth sequence G = Gn , every element of G and G−1H → 0 as n → ∞.(L) The kernel L is a symmetric, square-integrable density function such that∫

Rd xx�L(x)dx = m2(L)Id . All partial derivatives of L up to order 4 inclusive arecontinuous, bounded and square-integrable.

The MSE has the asymptotic representation MSE(G) = AMSE(G)[1 + o(1)], where

AMSE(G) = 1

4m2(K )4tr

{(4 + ω4ω

�4

)(vec HAMISEvec� HAMISE ⊗ Id2

)}with

�r = 4n−1var{D⊗r f (X)} + 2n−2 R( f )|G|−1/2(G−1/2)⊗r R(D⊗r L)(G−1/2)⊗r ,

ωr = n−1|G|−1/2(G−1/2)⊗rD⊗r L(0) + 1

2m2(L)

(vec� G ⊗ Idr

)ψr+2

for every even number r , where X is a random variable having density f .The plug-in (PI) method proposed in Chacon & Duong (2010) uses the AMISE instead

of the MISE2 function as a target. They show that the AMISE can be rewritten in the form

AMISE(H) = n−1|H|−1/2 R(K ) + 1

4m2(K )2ψ�

4 (vec H)⊗2.

The only unknown in this AMISE expression is ψ4, which is estimated by ψ4(G) =n−2 ∑n

i, j=1 D⊗4LG(Xi − X j ), so that the optimal pilot bandwidth for the PI selector is given

by the matrix G minimizing the mean squared error E[‖ψ4(G) − ψ4‖2], where ‖ · ‖ denotesthe Euclidean norm. Theorem 1 in Chacon & Duong (2010) shows that the dominant partof that error is precisely given by tr(4 + ω4ω

�4 ), but with the unconvolved kernel L in the

place of L .Therefore, despite the PI and SCV methods having very different target functions, it

turns out that there is a deep synchronicity in the two approaches with respect to the problemof pilot bandwidth selection, which was not apparent at all in previous studies with restrictedpilot bandwidth parametrizations, but had been noted in the univariate case (see Cao 1993).

C© 2012 Australian Statistical Publishing Association Inc.

Page 6: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

6 UNCONSTRAINED SMOOTHED CROSS-VALIDATION

This allows us to take advantage of the results already obtained by Chacon & Duong (2010),for example that for the choice of G the squared bias term dominates the variance term, so

AMSE(G) = 1

4m2(K )4 tr

{ω4ω

�4

(vec HAMISE vec� HAMISE ⊗ Id2

)}{1 + o(1)}.

Moreover, it is clear that the choice of the bandwidth GAMSE minimizing the dominant partof the AMSE is unaffected by the terms involving HAMISE, so from theorem 2 in Chacon &Duong (2010) it follows that GAMSE is of order n−2/(d+6) and the minimal MSE is of ordern−4/(d+6).

The previous observation is useful for determining the relative convergence rate of HSCV

in a relatively straightforward way.

Theorem 3. Suppose that the conditions of Theorems 1 and 2 hold. The relative rate ofconvergence of HSCV to HMISE is n−2/(d+6).

In the univariate case, Jones & Sheather (1991) noticed that the pilot bandwidth canbe chosen to annihilate the two dominant bias terms in the estimation of ψ4, leading to abetter convergence rate, namely from n−2/7 to n−5/14. In the multivariate case, this wouldbe equivalent to solving the system of equations ω4(G) = 0 for G, but, as mentioned inChacon & Duong (2010), this bias annihilation is not possible for general densities if d ≥ 2.As an example, if we consider the bivariate density f (x) = {φ(x − μ) + φ(x + μ)}/2 withμ = (1, 1), then the exact normal calculations in section 3.3 of Chacon & Duong (2010)can be used to obtain an explicit formula for ω4(G), and numerical minimization leads tominG∈F ‖ω4(G)‖2 = 4.35 × 10−5 > 0 for such a density when n = 100.

Furthermore, in Jones et al. (1991) this dominant bias annihilation is the key to obtaininga SCV bandwidth selector with relative convergence rate n−1/2 in the univariate case, bycarefully choosing the pilot bandwidth as a function of h. Because bias annihilation is notpossible in the multivariate case in general, it is unlikely that such a fast rate could be achievedby using the same approach for higher dimensions.

On the other hand, the SCV selector of Duong & Hazelton (2005b) used a scalarparametrization G = g2Id for the pilot bandwidth. To use a scalar bandwidth with multi-variate data, the usual procedure is to pre-transform the data so that all dimensions have (atleast approximately) the same marginal dispersion; this is called pre-sphering. For unimodaldistributions, pre-sphering does lead to an approximately spherically symmetric transformeddistribution. In other cases, however, such as the separated bimodal example from Chacon &Duong (2010), pre-sphering does not achieve such a goal, even if the marginal variances areequal. In these latter cases, a scalar pilot with the pre-transformed data will be markedly lessefficient than an unconstrained selector.

We can also use Theorem 2 to derive known results about scalar pilot selectors (such asthose shown in Duong & Hazelton 2005b) in a very simple way. Substituting G = g2Id intothe AMSE from Theorem 2 we obtain the following expression for the optimal scalar pilotbandwidth.

Corollary 1. Suppose the conditions for Theorem 2 hold. If the pilot matrix is parametrizedas G = g2Id , then the AMSE simplifies to

AMSE(g) = 1

4m2(K )4

(n−2g−2d−8 A1 + 2n−1g−d−2 A2 + g4 A3

){1 + o(1)},

C© 2012 Australian Statistical Publishing Association Inc.

Page 7: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

JOSE E. CHACON AND TARN DUONG 7

where

A1 = D⊗4 L(0)�(vec HAMISE vec� HAMISE ⊗ Id2

)D⊗4 L(0),

A2 = D⊗4 L(0)�(vec HAMISE vec� HAMISE ⊗ Id2

)(vec� Id ⊗ Id4

)ψ6,

A3 = ψ�6 (vec Id ⊗ Id4 )

(vec HAMISE vec� HAMISE ⊗ Id2

)(vec� Id ⊗ Id4

)ψ6,

whose minimizer is given by

gAMSE ={

2(d + 4)A1[ − (d + 2)A2 + {(d + 2)2 A2

2 + 8(d + 4)A1 A3}1/2]

n

}1/(d+6)

.

Next we show that this optimal pilot has the same form as the optimal scalar pilot from Duong& Hazelton (2005b), except for some constant matrix factors, as they arrange the sixth-orderintegrated derivatives into in a d × d matrix, whereas we characterize these derivatives as thevector ψ6.

Corollary 2. Suppose that the conditions for Theorem 2 hold and that the pilot matrix isparametrized as G = g2Id . Further suppose that K = L = φ and HAMISE = Cn−2/(d+4). Inthis case the coefficients from Corollary 1 admit the alternative forms

A1 = 9

16(4π )−dn−4/(d+4)(tr C)vec� (C ⊗ Id )Sd,4 vec Id2 ,

A2 = 3

4(4π )−d/2n−4/(d+4)(tr C)

(vec� C ⊗ vec� Id

)Sd,4(vec� Id ⊗ Id4

)ψ6,

A3 = n−4/(d+4)ψ�6 (vec Id ⊗ Id4 )

(vec C vec� C ⊗ Id2

)(vec� Id ⊗ Id4

)ψ6.

The equivalent coefficients for the optimal pilot selector from Duong & Hazelton (2005b)are

A′1 = 1

16(4π )−dn−4/(d+4)(tr C){4 + (d + 4) tr C},

A′2 = 1

16(4π )−d/2n−4/(d+4)vec� [{2C2 + (tr C)C} ⊗ Id2

]ψ6,

A′3 = 1

4n−4/(d+4)ψ�

6

(Id ⊗ C2 ⊗ vec Id2 vec� Id2

)ψ6.

Remark 1. The previous methodology for selecting the pilot bandwidth G follows theguidelines of the univariate case as described in the paper by Hall et al. (1992). A differentapproach, suggested by an anonymous referee, could be based on the minimization of theSCV criterion over G and H. This is an interesting idea, and is closely related to anotherunivariate procedure, the double kernel-double h method, introduced in the L1 context byBerlinet & Devroye (1994) and studied in the L2 context by Jones (1998) and Abdous (1999).Minimizing SCV over G and H is equivalent to finding the closest pair of estimators fH andfG, so a different kernel L (usually a higher-order kernel) is needed for the estimator fG inorder to avoid degeneracy of the problem. Therefore, this would be a completely differentprocedure, and its study would be worth a separate paper.

C© 2012 Australian Statistical Publishing Association Inc.

Page 8: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

8 UNCONSTRAINED SMOOTHED CROSS-VALIDATION

4. Numerical results

4.1. Practical implementation of the new method

We took K = L = φ for all of our numerical work, owing to the simplification inducedin the SCV criterion because of its good convolution properties. It is easy to show (Duong &Hazelton 2005b) that for this choice of K and L we have

SCV(H) = n−1|H|−1/2(4π )−d/2 + n−2n∑

i, j=1

(φ2H+2G − 2φH+2G + φ2G)(Xi − X j ).

For the practical implementation of the SCV bandwidth we propose a two-stage approachas in Chacon & Duong (2010), which can be described as follows.

(i) Compute

ψNR8 = 8!

4!2d+8πd/2|S|−1/2Sd,8(vec S−1)⊗4,

which is the value of ψ8 in the case where f is the N (0,�) density, but with � replacedby S, the sample covariance matrix. Plug this estimate in the formula of ‖ω6‖2 andnumerically minimize in G ∈ F to obtain G6.

(ii) Use G = G6 to compute

ψ6(G) = n−2n∑

i, j=1

D⊗6φG(Xi − X j ),

plug ψ6(G6) in the formula of ‖ω4‖2 and numerically minimize in G ∈ F to obtain G4.(iii) Finally, employ G = G4 in the SCV criterion and numerically minimize in H ∈ F to

obtain HSCV.

4.2. Simulation study

In this section we explore the finite-sample performance of the unconstrained SCVselector in comparison to that of other selectors. The selectors that we compare are

(i) the plug-in with a scalar pilot from Duong & Hazelton (2003), labelled PIS.(ii) the plug-in with an unconstrained pilot from Chacon & Duong (2010), labelled PIU.

(iii) the smoothed cross-validation with a scalar pilot from Duong & Hazelton (2005b),labelled SCVS.

(iv) our proposed smoothed cross-validation with an unconstrained pilot, labelled SCVU.(v) the unconstrained cross-validation (UCV) of Sain, Baggerly & Scott (1994), labelled

UCV; this selector does not rely on asymptotic arguments, and does not require a pilotbandwidth.

All these selectors are implemented in the ks library (Duong 2007) in R.For the bivariate study, we examine six normal mixture target densities from Chacon

(2009). Their contour plots are depicted in Figure 1. Target density ‘1’ is a single normaldensity and so it can be considered a base case. Densities ‘2’, ‘6’, ‘7’, ‘8’ and ‘11’ have

C© 2012 Australian Statistical Publishing Association Inc.

Page 9: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

JOSE E. CHACON AND TARN DUONG 9

Figure 1. Contour plots for the bivariate target densities.

varying degrees of intricate structure. In addition to the normal mixture densities, we includetwo bivariate densities having bounded support, namely the symmetric beta density (seeDevroye 1996, example 2 with c = 2, for random variate generation from this distribution)and the Dirichlet(2, 2, 2) distribution.

For each target density, we take 100 replicates for two representative sample sizes,n = 100 and n = 1000. To measure the accuracy of a bandwidth selection method, sayH, we compute the integrated squared error of the kernel density estimate, ISE(H) =∫

Rd { fnH(x) − f (x)}2dx. The logarithms of the ISEs are given in Figures 2 and 3 for n = 100and n = 1000, respectively.

The relative performances of the five selectors considered are similar for the two samplesizes, so the following comments apply to both situations, with n = 100 and n = 1000.Comparing the two versions of the SCV approach, we see that for densities ‘1’, ‘2’, ‘6’, beta

C© 2012 Australian Statistical Publishing Association Inc.

Page 10: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

10 UNCONSTRAINED SMOOTHED CROSS-VALIDATION

Figure 2. Boxplots for the logarithm of the integrated squared error of the kernel estimator usingthe plug-in bandwidth with a scalar pilot (PIS) and with an unconstrained pilot (PIU), the smoothedcross-validation bandwidth with a scalar pilot (SCVS) and with an unconstrained pilot (SCVU), andthe unconstrained cross-validation bandwidth (UCV) for the bivariate target densities for sample size

n = 100.

and Dirichlet, where we know that the constrained SCV pilot selectors are optimal, there is noloss in ISE performance when using unconstrained pilots. This indicates that our more general,unconstrained approach remains useful even in situations where simpler methods succeed. Forthe other three multimodal densities, however, the SCVU selector consistently outperformsSCVS, thus justifying the additional mathematical calculations involved in the new method.On the other hand, SCVU achieves the goal of having much less variability than UCV, andalso outperforms UCV in terms of median ISE for all densities. These observations mirrorthe comparison between the scalar and unconstrained pilot selectors for plug-in selectors(Chacon & Duong 2010). When comparing SCV and plug-in, it is found that they both havequite similar performances. SCV is known to produce smoother density estimates (see Cao,

C© 2012 Australian Statistical Publishing Association Inc.

Page 11: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

JOSE E. CHACON AND TARN DUONG 11

Figure 3. Boxplots for the logarithm of the integrated squared error of the kernel estimator usingthe plug-in bandwidth with a scalar pilot (PIS) and with an unconstrained pilot (PIU), the smoothedcross-validation bandwidth with a scalar pilot (SCVS) and with an unconstrained pilot (SCVU), andthe unconstrained cross-validation bandwidth (UCV) for the bivariate target densities for sample size

n = 1000.

Cuevas & Gonzalez-Manteiga 1994), so it performs better for density functions with smootherfeatures, for example densities ‘1’, ‘2’, beta and Dirichlet, and vice versa for sharper features,for example densities ‘6’, ‘7’, ‘8’, ‘11’, but with only a slight advantage of one method overthe other.

For the multivariate study, we focus on the multi-dimensional generalization of targetdensity ‘7’ introduced by Chacon & Duong (2010). This density is an equal two-componentnormal mixture 1

2 N ((d, 0, . . . , 0)�,��) + 12 N ((−d, 0, . . . , 0)�,��), where

= dd−1 · · ·2, with i the 45◦ rotation matrix in the plane of Rd defined by the

coordinates x1 and xi , and � = diag(4−(d−1), 4−(d−2), . . . , 4−1, 1). Each bivariate projectionof this density thus consists of a separated bimodal density. The ISEs for d = 2, 3, 4 are

C© 2012 Australian Statistical Publishing Association Inc.

Page 12: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

12 UNCONSTRAINED SMOOTHED CROSS-VALIDATION

Figure 4. Boxplots for the logarithm of the integrated squared error of the kernel estimator usingthe plug-in bandwidth with a scalar pilot (PIS) and with an unconstrained pilot (PIU), the smoothedcross-validation bandwidth with a scalar pilot (SCVS) and with an unconstrained pilot (SCVU), and theunconstrained cross-validation bandwidth (UCV) for the multivariate target densities for sample size

n = 1000 (upper row) and n = 10 000 (lower row), for d = 2, 3, 4.

shown in Figure 4 for n = 1000 and n = 10 000. As in the bivariate case, SCVU is uniformlybetter than SCVS. Moreover, the marginal improvement of the unconstrained pilots overscalar pilots seems to increase with d, providing additional justification for the study of un-constrained pilot selectors. On comparing SCVU with its unconstrained plug-in counterpartPIU, the former outperforms the latter in all cases except d = 3 and n = 10 000, although wenote for this case that, even though the median ISEs are similar, the SCVU is considerablyless variable. Comparing SCVU with UCV, the situation is more blurred, with UCV generallyexhibiting lower median ISEs coupled with more dispersed ISEs, although for d = 3 andn = 10 000 SCVU outperforms UCV. This variable ISE performance of UCV has been notedpreviously, for example by Scott (1992, pp. 166–170).

4.3. Real data analysis

A key feature of eukaryotic cells is the compartmentalization of cellular functions intocomplex, membrane-surrounded organelles. The advent of modern fluorescent microscopytechnologies has allowed for the visualization of a variety of these sub-cellular organelles,using fluorescent markers attached to proteins of interest, which serve as a useful proxyfor studying organelles and their behaviour. The 3-dimensional spatial organization of theseorganelles poses difficult challenges for their analysis, thus requiring statistical approacheshistorically not used in cell biology.

C© 2012 Australian Statistical Publishing Association Inc.

Page 13: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

JOSE E. CHACON AND TARN DUONG 13

Figure 5. Density estimates of Rab6-positive organelles from microscopy image data. The scatter plot isthe upper left plot, with the approximate cell outline as the grey shell; the constrained and unconstrainedplug-in pilots (PIS, PIU) are the other two plots in the upper row; the unbiased cross-validation andthe constrained and unconstrained SCV pilots are in the lower row (UCV, SCVS, SCVU). The plottedcontours are the 10%, 30%, 50% 70% and 90% levels. The axes indicate pixels from the microscopy

images.

In a recent paper, Schauer et al. (2010) showed that density estimation is a powerfultechnique for studying global cellular organization and for the quantification of cell biology.The data set that we consider here is taken from these authors, and consists of the trivariatelocations of 181 organelles with a Rab6-positive fluorescence signal detected from a singlecell. We compute each of the five bandwidths considered in the previous simulation study:

HPIS =⎡⎣ 2455.07 −85.3 0.1

−85.3 777.3 −12.40.1 −12.4 3.0

⎤⎦, HPIU =

⎡⎣ 6072.53 82.9 149.0

82.9 1465.5 −31.9149.0 −31.9 5.4

⎤⎦,

HSCVS =⎡⎣ 4484.8 −254.5 0.6

−254.5 1163.9 −16.30.6 −16.3 3.9

⎤⎦, HSCVU =

⎡⎣ 6089.4 80.0 −6.3

80.0 1470.5 −33.3−6.3 −33.3 7.5

⎤⎦,

HUCV =⎡⎣ 885.8 293.4 13.5

293.4 892.8 −1.813.5 −1.8 0.8

⎤⎦

C© 2012 Australian Statistical Publishing Association Inc.

Page 14: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

14 UNCONSTRAINED SMOOTHED CROSS-VALIDATION

and the resulting density estimates are displayed in Figure 5. The constrained pilot estimates(PIS and SCVS) are similar to each other. The PIU estimate emphasizes the oblique orientationof the data, whereas the SCVU estimate exhibits less marked trimodality in the central region.The UCV selector yields the noisiest density estimate, indicating that undersmoothing is quitelikely here. Schauer et al. (2010) note that Rab6-positive endosomes move along from theGolgi to the cell periphery. The Golgi is a larger endosome that is located at approximately(0, 100) in the (x, y)-plane, and the cell periphery is roughly demarcated by the grey contourshell in the figure. As expected, these density estimates exhibit local peaks in density nearthe Golgi and the cell periphery. The UCV estimates were considered to be too variablewhen applied to other similar experimental data (not shown). The SCVU estimate highlightsa smoother density of Rab6-positive endosomes from the Golgi to the periphery, comparedwith the more clustered densities for the SCVS, PIU and PIS estimates. The results are leadingto further studies to quantify this movement, using density estimation at discrete snapshots intime-lapse experiments.

Appendix: Proofs

In the following proofs, integrals without any integration limits are integrated over theappropriate Euclidean space.

Proof of Theorem 1. Using previously established results, for example Duong & Hazelton(2005b), we can show that vec (HMISE2 − HMISE) is of the same order as

DH(MISE2 − MISE)(HMISE) = n−1DH{R∗(KHMISE , f )},

where DH = ∂/(∂vec H) denotes the gradient operator with respect to vec H. Notice that achange of variables allows us to write R∗(KH, f ) = ∫

K (x)p(H1/2x)dx, where the func-tion p = f ∗ f−, with f−(x) = f (−x), is such that D⊗r p(0) = ψr . Moreover, under thesmoothness conditions on f and K , we can commute DH and the integral appearingin the R∗ functional, so that DH{R∗(KH, f )} = ∫

K (x)DH{p(H1/2x)}dx. Next, if we de-note ρ(H) = H1/2 ⊗ Id + Id ⊗ H1/2 then straightforward matrix differential calculus (seeMagnus & Neudecker 1999) gives the differential

d{p(H1/2x)} = Dp(H1/2x)�(x� ⊗ Id

)ρ(H)−1d vec H,

which yields DH{R∗(KH, f )} = ρ(H)−1∫

K (x)(x ⊗ Id )Dp(H1/2x)dx. Now using theTaylor expansion of the vector-valued function x → Dp(H1/2x) around x = 0, in the formu-lation of Chacon et al. (2010), it follows that

DH{R∗(KH, f )}

= ρ(H)−1

⎡⎣ 1∑

j=0

∫K (x)(x ⊗ Id )

{Id ⊗ (x�H1/2)⊗ j

}dx

⎤⎦D⊗( j+1) p(0){1 + o(1)}.

Because the first of the two summands in the previous expression vanishes, owing to thesymmetry of K, after some more matrix manipulations we finally obtain that DH{R∗(KH, f )}is asymptotically equivalent to m2(K )ρ(H)−1(Id ⊗ H1/2)ψ2, which is a bounded sequence.

C© 2012 Australian Statistical Publishing Association Inc.

Page 15: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

JOSE E. CHACON AND TARN DUONG 15

Thus, the components of the vector vec (HMISE2 − HMISE) are of order n−1, so recalling thatHMISE = O(n−2/(d+4)Jd ) the proof is complete. �

In the following, we define the vector raw moment of f as μr ( f ) = ∫x⊗r f (x) dx (see

Holmquist 1988, or Jammalamadaka, Rao & Terdik 2006). This vector moment is just thevectorization of the usual matrix moment for r = 2; that is, μ2( f ) = vec

∫xx� f (x) dx.

We also denote by Sd,r the symmetrizer matrix of order r (see Holmquist 1985). Thismatrix is characterized by the fact that pre-multiplying a Kronecker product of any r vectorsby Sd,r results in the normalized sum of all possible permutations of the r -fold product; forexample, for three d-vectors x1, x2 and x3, we have Sd,3(x1 ⊗ x2 ⊗ x3) = 1

3! (x1 ⊗ x2 ⊗x3 + x1 ⊗ x3 ⊗ x2 + x2 ⊗ x1 ⊗ x3 + x2 ⊗ x3 ⊗ x1 + x3 ⊗ x1 ⊗ x2 + x3 ⊗ x2 ⊗ x1).

Using formula (7.4) in Holmquist (1996), which gives the vector binomial expansion(x + y)⊗r = Sd,r

∑rj=0(r

j )x⊗ j ⊗ y⊗(r− j) for any x, y ∈ Rd , it is straightforward to prove

that the 2r th moment of the self-convolution � = � ∗ � of a symmetric function � : Rd → R

can be calculated as

μ2r (�) = Sd,2r

r∑j=0

(2r

2 j

)μ2 j (�) ⊗ μ2r−2 j (�).

Thus, for the particular case of � = � we obtain μ j (�) = 0 for j = 0, 1, 2, 3 and μ4(�) =μ4(K ) − 2μ4(K ) = 6 Sd,4μ2(K )⊗2 = 6m2(K )2Sd,4(vec Id )⊗2.

The proof of Theorem 2 needs two separate lemmas, which we state and prove next.

Lemma 1. Under the conditions of Theorem 2,

E SCV(H) − MISE2(H) = 1

4m2(K )2ω�

4 (vec H)⊗2{1 + o(1)}.

Proof. From the definitions,

E SCV(H) − MISE2(H)

= n−1�H ∗ LG(0) + (1 − n−1)E{�H ∗ LG(X1 − X2)} −∫

(�H ∗ f )(x) f (x) dx.

For the difference between the second and third terms, denoting by L0 the Dirac delta functionhere, we have

E{�H ∗ LG(X1 − X2)} −∫

(�H ∗ f )(x) f (x) dx

=∫

{�H ∗ (LG − L0) ∗ f }(x) f (x) dx

=∫∫∫

�( y)(L − L0)(z) f (x − H1/2 y − G1/2 z) f (x) dxd yd z.

To simplify this, we note that μ0(L − L0) = 0, μ1(L − L0) = 0 and μ2(L − L0) = μ2(L) =2μ2(L) = 2m2(L) vec Id , so taking into account the Taylor expansion

f (x − H1/2 y − G1/2 z) =2∑

j=0

(−1) j

j!(z�G1/2)⊗ jD⊗ j f (x − H1/2 y){1 + o(1)}

C© 2012 Australian Statistical Publishing Association Inc.

Page 16: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

16 UNCONSTRAINED SMOOTHED CROSS-VALIDATION

and integrating with respect to z we obtain

E{�H ∗ LG(X1 − X2)} −∫

(�H ∗ f ) f (x) f (x) dx

=∫∫

�( y){μ2(L)�(G1/2)⊗2D⊗2 f (x − H1/2 y)

}f (x){1 + o(1)} dxd y

= m2(L)vec� G∫∫

�( y)D⊗2 f (x − H1/2 y) f (x){1 + o(1)} dxd y,

where we have used (vec� Id )(G1/2)⊗2 = vec� G. Next we apply the Taylor expansion of thevector-valued function D⊗2 f (x − H1/2 y), for example from Chacon et al. (2010),

D⊗2 f (x − H1/2 y) =4∑

j=0

(−1) j

j!

{Id2 ⊗ ( y�H1/2)⊗ j

}D⊗( j+2) f (x){1 + o(vec Jd )}.

Because the first three moments of � are zero and μ4(�) = 6m2(K )2Sd,4(vec Id )⊗2, wearrive at

E{�H ∗ LG(X1 − X2)} −∫

(�H ∗ f )(x) f (x) dx

= 1

4!m2(L)vec� G

[Id2 ⊗ {

μ4(�)�(H1/2)⊗4}] ∫

D⊗6 f (x) f (x) dx{1 + o(1)}

= 1

4m2(K )2m2(L)(vec� G)

[Id2 ⊗ {(

vec� Id)⊗2Sd,4(H1/2)⊗4

}]ψ6{1 + o(1)}

= 1

4m2(K )2m2(L)

[(vec� G) ⊗ {(vec� H)⊗2}]ψ6{1 + o(1)}

= 1

8m2(K )2m2(L)ψ�

6 [(vec G) ⊗ Id4 ](vec H)⊗2{1 + o(1)},

where we have used Sd,4(H1/2)⊗4 = (H1/2)⊗4Sd,4 and (Id2 ⊗ Sd,4)D⊗6 = D⊗6 (see Schott2003).

For the other summand, �H ∗ LG(0) = |G|−1/2∫

�(z)L(G−1/2H1/2 z)d z, so that a Tay-lor expansion of order 4 of L around z = 0 gives

�H ∗ LG(0) = 1

4!|G|−1/2μ4(�)�(H1/2)⊗4(G−1/2)⊗4D⊗4 L(0){1 + o(1)}

= 1

4m2(K )2|G|−1/2(vec� H)⊗2(G−1/2)⊗4D⊗4 L(0){1 + o(1)}

and the proof is complete. �

Lemma 2. Under the conditions of Theorem 2,

var DHSCV(H) = 1

4m2(K )4(vec� H ⊗ Id2

)4(vec H ⊗ Id2 ){1 + o(1)}.

C© 2012 Australian Statistical Publishing Association Inc.

Page 17: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

JOSE E. CHACON AND TARN DUONG 17

Proof. By standard U -statistics theory, we are interested in the dominant terms of

var DHSCV(H) = var

{n−2

n∑i �= j

(DH�H) ∗ LG(Xi − X j )

}∼ 4n−1(�1 − �0) + 2n−2�2,

where

�1 = E{(DH�H) ∗ LG(X1 − X2)(DH�H) ∗ LG(X1 − X3)�

}�2 = E

{(DH�H) ∗ LG(X1 − X2)(DH�H) ∗ LG(X1 − X2)�

}�0 = E{(DH�H) ∗ LG(X1 − X2)}E{(DH�H) ∗ LG(X1 − X2)}�.

Regarding �0, taking into account the moment properties of � it follows that

E{�H ∗ LG(X1 − X2)} =∫∫

�( y) f (x − H1/2 y) f (x){1 + o(1)} dxd y

= 1

4m2(K )2(vec� H)⊗2ψ4{1 + o(1)}.

It is straightforward to obtain the differential d(vec H)⊗2 = 2(vec H ⊗ Id2 )d vec H, where 2 = Id2 + Kd2,d2 , with Km,n denoting the commutation matrix of order mn × mn (Magnus& Neudecker 1979). So taking into account that �

2 D⊗4 = 2D⊗4 (hence �2 ψ4 = 2ψ4) and

swapping the order of expectation and differentiation,

E{(DH�H) ∗ LG(X1 − X2)} = 1

2m2(K )2(vec� H ⊗ Id2

)ψ4{1 + o(1)}.

Overall,

�0 = 1

4m2(K )4

(vec� H ⊗ Id2

)ψ4ψ

�4 (vec H ⊗ Id2 ){1 + o(1)}.

For the other two matrices �1 and �2, using the Taylor expansion

L(G−1/2x − G−1/2H1/2 z) =4∑

j=0

(−1) j

j!(z�H1/2G−1/2)⊗ jD⊗ j L(G−1/2x){1 + o(1)}

gives, for every fixed x,

�H ∗ LG(x) = |G|−1/2

∫�(z)L(G−1/2x − G−1/2H1/2 z)d z

= 1

4m2(K )2|G|−1/2(vec� H)⊗2(G−1/2)⊗4D⊗4 L(G−1/2x){1 + o(1)}

= 1

4m2(K )2(vec� H)⊗2D⊗4 LG(x){1 + o(1)}.

Differentiating the previous expression, we obtain

(DH�H) ∗ LG(x) = 1

2m2(K )2

(vec� H ⊗ Id2

)D⊗4 LG(x){1 + o(1)}.

C© 2012 Australian Statistical Publishing Association Inc.

Page 18: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

18 UNCONSTRAINED SMOOTHED CROSS-VALIDATION

Therefore, integrating out, using the change of variables z = G−1/2(x − y) and a Taylorexpansion of f ,

�2 =∫∫

(DH�H) ∗ LG(x − y)(DH�H) ∗ LG(x − y)� f (x) f ( y)dxd y

= 1

4m2(K )4

(vec� H ⊗ Id2

)∫∫D⊗4 LG(x − y)D⊗4 LG(x − y)� f (x) f ( y)dxd y

×(vec H ⊗ Id2 ){1 + o(1)}

= 1

4m2(K )4 R( f )

(vec� H ⊗ Id2

)|G|−1/2(G−1/2)⊗4R(D⊗4 L)(G−1/2)⊗4(vec H ⊗ Id2 )

×{1 + o(1)}

and similarly

�1 =∫∫∫

(DH�H) ∗ LG(x − y)(DH�H) ∗ LG(x − z)� f (x) f ( y) f (z) dxd yd z

= 1

4m2(K )4

(vec� H ⊗ Id2

) ∫∫∫D⊗4 LG(x − y)D⊗4 LG(x − z)� f (x) f ( y) f (z) dxd yd z

×(vec H ⊗ Id2 ){1 + o(1)}

= 1

4m2(K )4

(vec� H ⊗ Id2

) ∫∫∫LG(x − y)LG(x − z) f (x)D⊗4 f ( y)D⊗4 f (z)� dxd yd z

×(vec H ⊗ Id2 ){1 + o(1)}

= 1

4m2(K )4

(vec� H ⊗ Id2

) ∫D⊗4 f (x)D⊗4 f (x)� f (x) dx (vec H ⊗ Id2 ){1 + o(1)}.

�Duong & Hazelton (2005b) show that we can express MSE(G) in terms of the derivatives

of SCV − MISE. We decompose this difference into (SCV − MISE2) + (MISE2 − MISE)and focus on the former, as it dominates the latter.

Proof of Theorem 2. From Duong & Hazelton (2005b), the leading term of the squared biasin MSE(G) is E{DH(SCV − MISE2)(H)}�E{DH(SCV − MISE2)(H)}. From Lemma 4.3, thedifference (ESCV − MISE2)(H) = 1

4 m2(K )2ω�4 (vec H)⊗2{1 + o(1)} so

E{DH(SCV − MISE2)(H)}�E{DH(SCV − MISE2)(H)}

= 1

16m2(K )4ω�

4 2(vec Hvec� H ⊗ Id2

) �

2 ω4{1 + o(1)}

= 1

4m2(K )4ω�

4

(vec Hvec� H ⊗ Id2

)ω4{1 + o(1)},

as it is not hard to check that �2 ω4 = 2ω4. The leading term of the variance in MSE(G) is

varDHSCV(H), which was already computed in Lemma 4.3. �

C© 2012 Australian Statistical Publishing Association Inc.

Page 19: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

JOSE E. CHACON AND TARN DUONG 19

Proof of Theorem 3. Combining theorem 2 from Chacon & Duong (2010) with Theorem 2in this paper, we find that MSE(HSCV; G) = O(n−4/(d+6)Jd2 )(vec HMISE)(vec� HMISE). Theresult follows immediately from the approach in Duong & Hazelton (2005a). �

Proof of Corollary 1. Substituting G = g2Id into the dominant squared bias term fromTheorem 2, we obtain AMSE(g) = 1

4 m2(K )4(n−2g−2d−8 A1 + 2n−1g−d−2 A2 + g4 A3){1 +o(1)}. Differentiating with respect to g and setting to zero leads to

(d + 4)A1n−1g−2d−12 + A2n−1g−d−6 − 2A3 = 0,

which is a quadratic in n−1g−d−6, whose solution is readily found. The discriminant of thisquadratic is (d + 2)2 A2

2 + 8(d + 4)A1 A3 > 0, as A1, A3 > 0, so the solutions are real-valued.�

To show that the expression from Corollary 1 is essentially the same as the optimal pilotselector from Duong & Hazelton (2005b), we rely on the next lemma.

Lemma 3. Let A ∈ Md×d and �6 = ∫(D2)3 f (x) f (x) dx. Then

tr(A�6) = vec� (A ⊗ Id2 )ψ6

tr(A�26) = ψ�

6

(Id ⊗ A ⊗ vec Id2 vec� Id2

)ψ6.

Proof. Let us denote by Km,n the commutation matrix of order mn × mn, whose propertieswe use next (see Magnus & Neudecker 1979). For a vector a ∈ R

d2, we have that

a�vec (D2)3 = a�(D2 ⊗ D2)D⊗2 = vec� (D2 ⊗ D2)vec (D⊗2a�)

= (D⊗4)�(Id ⊗ Kdd ⊗ Id )(a ⊗ Id2 )D⊗2 = (D⊗4)�(a ⊗ Id2 )D⊗2

= vec� (a� ⊗ Id2

)D⊗6 = (

a� ⊗ vec� Id2

)D⊗6.

(3)

We apply (3) with a = vec A to the trace of the product of a matrix A with the cube of theHessian operator to obtain the first stated equality

tr{A(D2)3} = (vec� A)vec (D2)3 = (vec� A ⊗ vec� Id2

)D⊗6 = vec� (A ⊗ Id2 )D⊗6.

To prove the second equality we use the first one, some more properties of the Kroneckerproduct (Magnus & Neudecker 1999) and (3) with a = (Id ⊗ A� ⊗ vec� Id2 )ψ6, to obtain

tr(A�26) = ψ�

6 vec {(A�6) ⊗ Id2}= ψ�

6 (Id ⊗ A ⊗ vec Id2 )vec �6

= [{ψ�

6 (Id ⊗ A ⊗ vec Id2 )} ⊗ vec� Id2

]ψ6

= ψ�6 (Id ⊗ A ⊗ vec Id2 ⊗ vec� Id2 )ψ6,

as desired. �

C© 2012 Australian Statistical Publishing Association Inc.

Page 20: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

20 UNCONSTRAINED SMOOTHED CROSS-VALIDATION

Proof of Corollary 2. If L is a normal kernel then, using properties from section 3.3 inChacon & Duong (2010),

D⊗4φ(0) = D⊗4φ2Id (0) = 2−(d+4)/2D⊗4φ(0) = 3 × 2−d−2π−d/2Sd,4(vec Id )⊗2.

Substituting this into Corollary 1, the coefficients A1 and A2 can be rewritten. For A1,

16

9(4π )dn4/(d+4) A1 = tr

{(vec C vec� C ⊗ Id2

)Sd,4(vec Id vec� Id

)⊗2}= tr

{(vec C vec� C vec Id vec� Id ⊗ vec Id vec� Id

)Sd,4}

= (tr C) tr{(

vec C vec� Id ⊗ vec Id vec� Id)Sd,4

}= (tr C) tr

{(vec C ⊗ vec Id )

(vec� Id ⊗ vec� Id

)Sd,4}

= (tr C) tr{vec (C ⊗ Id )(Id ⊗ Kdd ⊗ Id )(Id ⊗ Kdd ⊗ Id )

(vec� Id2

)Sd,4}

= (tr C) tr{vec (C ⊗ Id )

(vec� Id2

)Sd,4},

and for A2,

4

3(4π )d/2n4/(d+4) A2 = tr

{(vec C vec� C ⊗ Id2

)Sd,4(vec Id )⊗2ψ�6 (vec Id ⊗ Id4 )

}= tr

{Sd,4(vec C vec� C vec Id ⊗ vec Id

)ψ�

6 (vec Id ⊗ Id4 )}

= (tr C)tr{Sd,4(vec C ⊗ vec Id )ψ�

6 (vec Id ⊗ Id4 )}.

Rewriting these expressions for A1 and A2, and that for A3, as scalar products of vectorizedmatrices gives the result.

The equivalent coefficients from Duong & Hazelton (2005b) are

A′1 = 1

16(4π )−dn−4/(d+4)(tr C){4 + (d + 4)tr C},

A′2 = 1

16(4π )−d/2n−4/(d+4)tr

[{2C2 + (tr C)C}�6],

A′3 = 1

4n−4/(d+4)tr

(C2�2

6

).

Applying the identities from the previous lemma to convert tr{[2C2 + (tr C)C]�6

}and

tr(C2�26) to expressions involving ψ6 establishes the result. �

References

ABDOUS, B. (1999). L2 version of the double kernel method. Statistics 32, 249–266.BERLINET, A. & DEVROYE, L. (1994). A comparison of kernel density estimates. Publ. l’Inst. Statist. l’Univ.

Paris 38, 3–59.CAO, R. (1993). Bootstrapping the mean integrated squared error. J. Multivariate Anal. 45, 137–160.CAO, R., CUEVAS, A. & GONZALEZ-MANTEIGA, W. (1994). A comparative study of several smoothing methods

in density estimation. Comput. Statist. Data Anal. 17, 153–176.CHACON, J.E. (2009). Data-driven choice of the smoothing parametrization for kernel density estimators.

Canad. J. Statist. 37, 249–265.CHACON, J.E. & DUONG, T. (2010). Multivariate plug-in bandwidth selection with unconstrained pilot band-

width matrices. Test 19, 375–398.

C© 2012 Australian Statistical Publishing Association Inc.

Page 21: UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED …UNCONSTRAINED PILOT SELECTORS FOR SMOOTHED CROSS-VALIDATION JOSE ´E. CHACON1,∗ AND TARN DUONG2 Universidad de Extremadura and Institut

JOSE E. CHACON AND TARN DUONG 21

CHACON, J.E., DUONG, T. & WAND, M.P. (2011). Asymptotics for general multivariate kernel density deriva-tives. Statistica Sinica. 21, 807–840..

DEVROYE, L. (1996). Random variate generation in one line of code. In 1996 Winter Simulation ConferenceProceedings, eds. J.M. Charnes, D.J. Morrice, D.T. Brunner and J.J. Swain, pp. 265–272. San Diego,CA: ACM.

DUONG, T. (2007). ks: Kernel density estimation and kernel discriminant analysis for multivariate data in R.J. Statist. Softw. 21(7), 1–16.

DUONG, T. & HAZELTON, M.L. (2003). Plug-in bandwidth matrices for bivariate kernel density estimation. J.Nonparametr. Stat. 15, 17–30.

DUONG, T. & HAZELTON, M.L. (2005a). Convergence rates for unconstrained bandwidth matrix selectors inmultivariate kernel density estimation. J. Multivariate Anal. 93, 417–433.

DUONG, T. & HAZELTON, M.L. (2005b). Cross-validation bandwidth matrices for multivariate kernel densityestimation. Scand. J. Statist. 32, 485–506.

HALL, P. & MARRON, J.S. (1987). Estimation of integrated squared density derivatives. Statist. Probab. Lett.6, 109–115.

HALL, P. & MARRON, J.S. (1991). Lower bounds for bandwidth selection in density estimation. Probab. TheoryRelated Fields 90, 149–163.

HALL, P., MARRON, J.S. & PARK, B.U. (1992). Smoothed cross validation. Probab. Theory Rel. Fields 92,1–20.

HOLMQUIST, B. (1985). The direct product permuting matrices. Linear Multilinear Algebra 17, 117–141.HOLMQUIST, B. (1988). Moments and cumulants of the multivariate normal distribution. Stochastic Anal. 6,

273–278.HOLMQUIST, B. (1996). The d-variate vector Hermite polynomial of order k. Linear Algebra Appl. 237/238,

155–190.JAMMALAMADAKA, S.R., RAO, T.S. & TERDIK, G. (2006). Higher order cumulants of random vectors and

applications to statistical inference and time series. Sankhya 68, 326–356.JONES, M.C. (1998). On some kernel density estimation bandwidth selectors related to the double kernel

method. Sankhya 60, 249–264.JONES, M.C. & SHEATHER, S.J. (1991). Using non-stochastic terms to advantage in kernel-based estimation

of integrated squared density derivatives. Statist. Probab. Lett. 11, 511–514.JONES, M.C., MARRON, J.S. & PARK, B.U. (1991). A simple root n bandwidth selector. Annals Statist. 19,

1919–1932.MAGNUS, J.R. & NEUDECKER, H. (1979). The commutation matrix: some properties and applications. Annals

Statist. 7, 381–394.MAGNUS, J.R. & NEUDECKER, H. (1999). Matrix Differential Calculus with Applications in Statistics and

Econometrics, rev. edn. Chichester: John Wiley & Sons.SAIN, S.R., BAGGERLY, K.A. & SCOTT, D.W. (1994). Cross-validation of multivariate densities. J. Amer.

Statist. Assoc. 89, 807–817.SCHAUER, K., DUONG, T., BLEAKLEY, K., BARDIN, S., BORNENS, M. & GOUD, B. (2010). Probabilistic density

maps to study global endomembrane organization. Nat. Methods 7, 560–568.SCHOTT, J.R. (2003). Kronecker product permutation matrices and their application to moment matrices of

the normal distribution. J. Multivariate Anal. 87, 177–190.SCOTT, D.W. (1992). Multivariate Density Estimation: Theory, Practice, and Visualization. New York: John

Wiley & Sons.SIMONOFF, J.S. (1996). Smoothing Methods in Statistics. Berlin: Springer-Verlag.WAND, M.P. (1992). Error analysis for general multivariate kernel estimators. J. Nonparametr. Stat. 2, 2–15.WAND, M.P. & JONES, M.C. (1993). Comparison of smoothing parametrizations in bivariate kernel density

estimation. J. Amer. Statist. Assoc. 88, 520–528.WAND, M.P. & JONES, M.C. (1994). Multivariate plug-in bandwidth selection. Comput. Statist. 9, 97–117.

C© 2012 Australian Statistical Publishing Association Inc.