
Inconsistency of Bootstrap: the Grenander Estimator

Bodhisattva Sen, Moulinath Banerjee and Michael Woodroofe

University of Michigan

November 5, 2007

Abstract

In this paper we investigate the (in)consistency of different bootstrap methods for constructing confidence bands in the class of estimators that converge at rate cube-root $n$. The Grenander estimator (see Grenander (1956)), the nonparametric maximum likelihood estimator of an unknown non-increasing density function $f$ on $[0,\infty)$, is a prototypical example. We focus on this example and illustrate different approaches of constructing confidence intervals for $f(t_0)$, where $t_0$ is an interior point, i.e., $0 < t_0 < \infty$. It is claimed that the bootstrap statistic, when generating bootstrap samples from the empirical distribution function $\mathbb{F}_n$, does not have any weak limit, conditional on the data, in probability. A similar phenomenon is shown to hold when bootstrapping from $\tilde F_n$, the least concave majorant of $\mathbb{F}_n$. We provide a set of sufficient conditions for the consistency of bootstrap methods in this example. A suitable version of smoothed bootstrap is proposed and shown to be strongly consistent. The $m$ out of $n$ bootstrap method is also proved to be consistent while generating samples from $\mathbb{F}_n$ and $\tilde F_n$. Although we work out the main results for the Grenander estimator, very similar techniques can be employed to draw analogous conclusions for other estimators with cube-root convergence.

Keywords: decreasing density, empirical distribution function, least con-

cave majorant, m out of n bootstrap, nonparametric maximum likelihood

estimate, smoothed bootstrap.

1 Introduction

Suppose that we observe i.i.d. random variables $X_1, X_2, \ldots, X_n$ from a continuous distribution function $F$ with non-increasing density $f$ on $[0,\infty)$. Let $\mathbb{F}_n$ denote the empirical distribution function (e.d.f.) of the data. Grenander (1956) showed that the non-parametric maximum likelihood estimator (NPMLE) $\tilde f_n$ of $f$ exists (under the monotonicity constraint) and is given by the left-derivative of $\tilde F_n$, the least concave majorant (LCM) of $\mathbb{F}_n$ (see Robertson, Wright and Dykstra (1988) for a derivation of this result). The main result on the distributional convergence of $\tilde f_n(t_0)$, for $t_0 \in (0,\infty)$, was given by Prakasa Rao (1969): If $f'(t_0) \neq 0$, then

$$n^{1/3}\{\tilde f_n(t_0) - f(t_0)\} \Rightarrow \kappa Z \tag{1}$$

where $\kappa = 2\,\bigl|\tfrac12 f(t_0) f'(t_0)\bigr|^{1/3}$, $Z = \arg\max_{s\in\mathbb{R}}\{W(s) - s^2\}$, and $W$ is a two-sided standard Brownian motion on $\mathbb{R}$ with $W(0) = 0$. There are other

estimators that exhibit similar asymptotic properties; for example, Chernoff’s

(1964) estimator of the mode, the monotone regression estimator (Brunk

(1970)), Rousseeuw’s (1984) least median of squares, and the estimator of

the shorth (Andrews et al. (1972) and Shorack and Wellner (1986)). The

seminal paper by Kim and Pollard (1990) unifies the $n^{1/3}$-rate of convergence

problems in a more general M-estimation framework and provides limiting

distributions of the estimators.
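The characterization of the Grenander estimator as the left-derivative of the LCM of the e.d.f. translates into a short computation. The following Python sketch is not taken from the paper: it assumes NumPy, the names `grenander` and `grenander_at` are illustrative, and the LCM is built as the upper convex hull of the points $(0,0)$ and $(X_{(i)}, i/n)$ with a single left-to-right scan.

```python
import numpy as np

def grenander(sample):
    """Return (knots, slopes) describing the Grenander estimator.

    knots  : vertices (x-coordinates) of the least concave majorant (LCM)
             of the e.d.f., starting at 0.
    slopes : slopes[j] is the estimator's value on (knots[j], knots[j+1]].
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # Points to be majorized: (0, 0), (x_(1), 1/n), ..., (x_(n), 1).
    px = np.concatenate(([0.0], x))
    py = np.arange(n + 1) / n
    hull = [0]  # indices of LCM vertices, left to right
    for i in range(1, n + 1):
        # Drop the last vertex while it lies on or below the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (px[b] - px[a]) * (py[i] - py[a]) - (py[b] - py[a]) * (px[i] - px[a])
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    knots = px[hull]
    slopes = np.diff(py[hull]) / np.diff(knots)
    return knots, slopes

def grenander_at(knots, slopes, t):
    """Left-derivative of the LCM at t (0 beyond the largest observation)."""
    j = np.searchsorted(knots, t, side="left") - 1
    return float(slopes[j]) if 0 <= j < slopes.size else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    knots, slopes = grenander(rng.exponential(size=500))
    # For Exp(1) data the target value is f(1) = exp(-1).
    print(grenander_at(knots, slopes, 1.0))
```

The slopes returned are non-increasing by construction, which is the monotonicity constraint imposed on the NPMLE.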


The presence of nuisance parameters in the limit distribution of the estimators complicates the construction of confidence intervals. Bootstrap intervals avoid this problem and are generally reliable and accurate in problems with $\sqrt{n}$ convergence rate (see Bickel and Freedman (1981), Singh (1981),

Shao and Tu (1995) and its references). Our aim in this paper is to study

the consistency of bootstrap methods for the Grenander estimator with the

goal of constructing point-wise confidence bands around $\tilde f_n$. The monotone

density estimation problem sheds light on the behavior of bootstrap methods

in other similar cube-root convergence problems discussed above.

Recently there has been considerable interest in using resampling based

methods in similar $n^{1/3}$-rate convergence problems. Subsampling based confidence intervals (see Romano, Politis and Wolf (1999)) are consistent in this

scenario. But subsampling requires a choice of block-size, which is quite

tricky and computationally intensive. The resulting confidence intervals are

also not always very accurate and can vary substantially with changing block-

size. Abrevaya and Huang (2005) obtained the unconditional limit distribu-

tion for the bootstrap version of the normalized estimator in the setup of

Kim and Pollard (1990) and proposed a method for constructing confidence

intervals in such non-standard problems by correcting the usual bootstrap

method. But as we will show in this paper, such methods of correcting the

usual bootstrap method are unlikely to work since there is extremely strong

evidence to suggest that the bootstrap statistic does not have any weak

limit in probability, conditional on the data. Kosorok (2007) also shows that

bootstrapping from the e.d.f. is not consistent in the monotone density es-

timation problem. Lee and Pun (2006) explore m out of n bootstrapping

from the empirical distribution function in similar non-standard problems

and prove the consistency of the method. Leger and MacGibbon (2006)

describe conditions for a resampling procedure to be consistent under cube

root asymptotics and assert that these conditions are generally not met while


bootstrapping from the e.d.f. They propose a smoothed version of bootstrap

and show its consistency for Chernoff’s estimator of the mode. The authors

carry out an extensive simulation study which reveals a disparity in the cov-

erage probability of the percentile and basic bootstrap confidence intervals,

also shedding doubt on the existence of a fixed conditional limit distribution

for the bootstrap statistic.

In Section 2 we introduce notation, describe the stochastic processes of

interest, and prove a uniform version of Equation (1) that is used later on to

study the consistency of different bootstrap methods. Section 3 starts with

a brief introduction to bootstrap procedures and formalizes the notion of

consistency. We show that if the bootstrap methods (while generating bootstrap samples from either the e.d.f. $\mathbb{F}_n$ or its LCM $\tilde F_n$) were consistent, then

two random variables would be independent, and then show by simulation

that these two random variables are not independent. In fact, we show that

in these two situations the bootstrap distribution of the statistic of interest

does not even have any conditional weak limit, in probability. We state suf-

ficient conditions for the consistency of any bootstrap method and propose

a version of smoothed bootstrap in Section 4 that can be used to construct

asymptotically correct confidence intervals for f(t0). Section 5 investigates

the $m$ out of $n$ bootstrapping procedure, when generating bootstrap samples from $\mathbb{F}_n$ and $\tilde F_n$, and shows that both the methods are consistent. In

Section 6 we discuss our findings, especially the failure of the conditional

convergence of the bootstrap established in Section 3, which we view as one

of the key contributions of our current research as it has strong implications

for the behavior of the bootstrap in the broader class of cube–root estimation

problems. Section A, the appendix, provides the details of some arguments

used in proving the main results.


2 Preliminaries

We begin with a uniform version of the Prakasa Rao (1969) result which

will be useful later on. For the rest of the paper we will assume that F is

a distribution function with continuous non-increasing density f on [0,∞)

which is continuously differentiable near t0 ∈ (0,∞) with nonzero derivative.

Suppose that $X_{n,1}, X_{n,2}, \ldots, X_{n,m_n}$ are i.i.d. random variables having distribution function $F_n$, where $m_n \le n$ (of special interest is the case $m_n = n$). The quantity of interest to us is

$$\Delta_n := m_n^{1/3}\{\tilde f_{n,m_n}(t_0) - f_n(t_0)\} \tag{2}$$

where $\tilde f_{n,m_n}(t_0)$ is the Grenander estimator based on the data $X_{n,1}, X_{n,2}, \ldots, X_{n,m_n}$ and $f_n(t_0)$ can be taken as the density of $F_n$ at $t_0$ (later on we allow $f_n$ to be more flexible, and $F_n$ need not have a density). Let $\mathbb{F}_{n,m_n}$ be the e.d.f. of the data. We study the limiting distribution of the process

$$\mathbb{Z}_n(h) := m_n^{2/3}\bigl\{\mathbb{F}_{n,m_n}(t_0 + h m_n^{-1/3}) - \mathbb{F}_{n,m_n}(t_0) - f_n(t_0)\,h\,m_n^{-1/3}\bigr\} \tag{3}$$

for $h \in I_{m_n} := [-t_0 m_n^{1/3}, \infty)$ and use continuous mapping arguments to deduce the limiting distribution of $\Delta_n$, which can be expressed as the left-hand slope at 0 of the LCM of $\mathbb{Z}_n$, i.e., $\Delta_n = CM_{I_{m_n}}(\mathbb{Z}_n)'(0)$, where $CM_I$ is the operator that maps a function $g : \mathbb{R} \to \mathbb{R}$ into the LCM of $g$ on the interval $I \subset \mathbb{R}$ and $'$ corresponds to the left derivative. We consider all stochastic processes as random elements in $D(\mathbb{R})$, the space of cadlag functions (right continuous with left limits) on $\mathbb{R}$, and equip it with the projection $\sigma$-field and the metric of uniform convergence on compacta, i.e.,

$$\rho(x, y) = \sum_{k=1}^{\infty} 2^{-k} \min[1, \rho_k(x, y)]$$

where $\rho_k(x, y) = \sup_{|t| \le k} |x(t) - y(t)|$ and $x$ and $y$ are elements in $D(\mathbb{R})$. We say that a sequence $\{\xi_n\}$ of random elements in $D(\mathbb{R})$ converges in distribution to a random element $\xi$, written $\xi_n \Rightarrow \xi$, if $E g(\xi_n) \to E g(\xi)$ for every bounded, continuous, measurable real-valued function $g$. With this notion of weak convergence, the continuous mapping theorem holds (see Pollard (1984), Chapters IV and V for more details).
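The representation of $\Delta_n$ as the left-hand slope at 0 of the LCM of the localized process is used repeatedly, so it may help to spell out why it holds. The following is a sketch rather than part of the original argument. It assumes only that taking the LCM commutes with multiplication by a positive constant, with an increasing affine change of variable, and with subtraction of an affine function; $\tilde F_{n,m_n}$ denotes the LCM of $\mathbb{F}_{n,m_n}$ on $[0,\infty)$, whose left-derivative is $\tilde f_{n,m_n}$.

```latex
\begin{align*}
CM_{I_{m_n}}(\mathbb{Z}_n)(h)
  &= m_n^{2/3}\bigl\{\tilde F_{n,m_n}(t_0 + h m_n^{-1/3})
     - \mathbb{F}_{n,m_n}(t_0) - f_n(t_0)\,h\,m_n^{-1/3}\bigr\},
     \qquad h \in I_{m_n},\\
CM_{I_{m_n}}(\mathbb{Z}_n)'(0)
  &= m_n^{2/3}\bigl\{m_n^{-1/3}\,\tilde f_{n,m_n}(t_0) - m_n^{-1/3} f_n(t_0)\bigr\}
   = m_n^{1/3}\bigl\{\tilde f_{n,m_n}(t_0) - f_n(t_0)\bigr\}
   = \Delta_n .
\end{align*}
```

The interval $I_{m_n} = [-t_0 m_n^{1/3}, \infty)$ is exactly the image of $[0,\infty)$ under the change of variable $t \mapsto (t - t_0) m_n^{1/3}$, which is why the LCM is taken over $I_{m_n}$.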

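We decompose $\mathbb{Z}_n$ into $\mathbb{Z}_{n,1}$ and $\mathbb{Z}_{n,2}$ where

$$\begin{aligned}
\mathbb{Z}_{n,1}(h) &:= m_n^{2/3}\bigl\{(\mathbb{F}_{n,m_n} - F_n)(t_0 + h m_n^{-1/3}) - (\mathbb{F}_{n,m_n} - F_n)(t_0)\bigr\},\\
\mathbb{Z}_{n,2}(h) &:= m_n^{2/3}\bigl\{F_n(t_0 + h m_n^{-1/3}) - F_n(t_0) - f_n(t_0)\,h\,m_n^{-1/3}\bigr\}.
\end{aligned} \tag{4}$$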

Now we state some conditions on the behavior of Fn and fn (which need

not be the density of Fn) to be utilized in proving the uniform version of

Equation (1).

(a) $F_n(x) \to F(x)$ uniformly for all $x$ in a neighborhood of $t_0$.

(b) $m_n^{1/3}\{F_n(t_0 + h m_n^{-1/3}) - F_n(t_0)\} \to h f(t_0)$ as $n \to \infty$ uniformly on compacta.

(c) $\mathbb{Z}_{n,2}(h) \to \tfrac12 h^2 f'(t_0)$ as $n \to \infty$ uniformly on compacta.

(d) For each $\varepsilon > 0$,

$$\bigl|F_n(t_0 + \beta) - F_n(t_0) - \beta f_n(t_0) - \tfrac12 \beta^2 f'(t_0)\bigr| \le \varepsilon\beta^2 + o(\beta^2) + O(m_n^{-2/3})$$

for large $n$, uniformly in $\beta$ varying over a neighborhood of zero (both $n$ and the neighborhood can depend on $\varepsilon$).

(e) There exist a neighborhood of 0 and a constant $C > 0$ such that for all $n$ sufficiently large,

$$|F_n(t_0 + \beta) - F_n(t_0)| \le C|\beta| + O(m_n^{-1/3})$$

uniformly for $\beta$ in the neighborhood of 0.


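Letting $W_1$ be a standard two-sided Brownian motion on $\mathbb{R}$ with $W_1(0) = 0$, we define the following stochastic processes

$$\mathbb{Z}_1(h) = W_1(f(t_0)h) \quad\text{and}\quad \mathbb{Z}(h) = \mathbb{Z}_1(h) + \tfrac12 h^2 f'(t_0), \quad\text{for } h \in \mathbb{R}.$$

Proposition 2.1 If (b) holds then $\mathbb{Z}_{n,1} \Rightarrow \mathbb{Z}_1$. Further, if (c) holds then $\mathbb{Z}_n \Rightarrow \mathbb{Z}$.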

Proof. To find the limit distribution of the process $\mathbb{Z}_n$, we make crucial use of the Hungarian embedding of Komlos, Major and Tusnady (1975). We may suppose that $X_{n,i} = F_n^{\#}(U_{n,i})$, where $F_n^{\#}(u) = \inf\{x : F_n(x) \ge u\}$ and $U_{n,1}, \ldots, U_{n,m_n}$ are i.i.d. Uniform(0, 1) random variables. Let $\mathbb{U}_n$ denote the empirical distribution function of $U_{n,1}, \ldots, U_{n,m_n}$, $E_n(t) = \sqrt{m_n}(\mathbb{U}_n(t) - t)$, and $V_n = \sqrt{m_n}(\mathbb{F}_{n,m_n} - F_n)$. Then $V_n = E_n \circ F_n$. We may also suppose that the probability space supports a sequence of independent Brownian Bridges $\{B_n^0\}_{n \ge 1}$ for which

$$\sup_{0 \le t \le 1} |E_n(t) - B_n^0(t)| = O(m_n^{-1/2} \log m_n) \quad\text{a.s.}$$

Let $\{\eta_n\}_{n \ge 1}$ be a sequence of $N(0, 1)$ random variables independent of $\{B_n^0\}_{n \ge 1}$. Define a version $B_n$ of Brownian motion by $B_n(t) = B_n^0(t) + \eta_n t$, for $t \in [0, 1]$. Using the Hungarian construction we express $\mathbb{Z}_{n,1}$ as

$$\begin{aligned}
\mathbb{Z}_{n,1}(h) &= m_n^{1/6}\bigl\{V_n(t_0 + h m_n^{-1/3}) - V_n(t_0)\bigr\}\\
&= m_n^{1/6}\bigl\{E_n(F_n(t_0 + h m_n^{-1/3})) - E_n(F_n(t_0))\bigr\}\\
&= m_n^{1/6}\bigl\{B_n^0(F_n(t_0 + h m_n^{-1/3})) - B_n^0(F_n(t_0))\bigr\} + R_{n,1}(h)\\
&= m_n^{1/6}\bigl\{B_n(F_n(t_0 + h m_n^{-1/3})) - B_n(F_n(t_0))\bigr\} + R_n(h)
\end{aligned} \tag{5}$$

where $R_n = R_{n,1} + R_{n,2}$, $|R_{n,1}(h)| \le 2 m_n^{1/6} \sup_{0 \le t \le 1} |E_n(t) - B_n^0(t)| = O(m_n^{-1/3} \log m_n)$ a.s., and $|R_{n,2}(h)| \le m_n^{1/6} |\eta_n| \, |F_n(t_0 + h m_n^{-1/3}) - F_n(t_0)| \to 0$ w.p.1 by condition (b). Therefore, $R_n(h) \to 0$ w.p.1 as $n \to \infty$ uniformly on compacta. Letting $X_n(h) := m_n^{1/6}\{B_n(F_n(t_0 + h m_n^{-1/3})) - B_n(F_n(t_0))\}$, we observe that $X_n$ is a mean zero Gaussian process defined on $I_{m_n}$ with independent increments and covariance kernel

$$K_n(h_1, h_2) = m_n^{1/3}\bigl\{F_n(t_0 + (h_1 \wedge h_2) m_n^{-1/3}) - F_n(t_0)\bigr\}\,\mathbf{1}\{\mathrm{sign}(h_1 h_2) > 0\}.$$

Theorem V.19 in Pollard (1984) gives sufficient conditions for convergence of the process $X_n(h)$ to $W_1(f(t_0)h)$ in $D([-c, c])$ for any $c$; these are readily verified using condition (b) in the proposition. The second part follows immediately. □

We may obtain the asymptotic distribution of $\Delta_n$ from the following corollary, which is stated in a more general setup.

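Corollary 2.2 Suppose that conditions (a), (d) and (e) hold. Let $\mathbb{Z}$ be a stochastic process on $\mathbb{R}$ such that

(1) $\lim_{|h| \to \infty} \mathbb{Z}(h)/|h| = -\infty$ a.e.,

(2) $\mathbb{Z}$ is a.s. bounded above, and

(3) $CM_{[-k,k]}(\mathbb{Z})$, for $k = 1, 2, \ldots$, and $CM_{\mathbb{R}}(\mathbb{Z})$ are differentiable at 0 a.s.

If $\mathbb{Z}_n \Rightarrow \mathbb{Z}$ then $\Delta_n \Rightarrow CM_{\mathbb{R}}(\mathbb{Z})'(0)$.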

We use the continuous mapping principle and a localization argument similar

to that in Kim and Pollard (1990). The details are provided in the Appendix.

3 Inconsistency of the bootstrap

In this section, we show that the usual bootstrap method, generating bootstrap samples from the e.d.f. $\mathbb{F}_n$, leads to an inconsistent procedure. Not only does the bootstrap estimate fail to converge weakly to the right distribution, but there is strong evidence that it does not have any conditional limit distribution, in probability. We also consider bootstrapping from $\tilde F_n$, the least concave majorant of $\mathbb{F}_n$, and this procedure shows similar asymptotic behavior. We begin with a brief discussion on bootstrap.

Suppose we have i.i.d. random variables X1, X2, . . . , Xn having an un-

known distribution function F defined on a probability space (Ω,A,P ) and

we seek to estimate the sampling distribution of the random variable Rn(Xn,

F ), based on the observed data Xn = (X1, X2, . . . , Xn). Let Hn be the dis-

tribution function of Rn(Xn, F ). The bootstrap methodology can be broken

into three simple steps:

Step 1: Construct an estimate $\hat F_n$ of $F$ based on the data (for example, the e.d.f. $\mathbb{F}_n$).

Step 2: With $\hat F_n$ fixed, we draw a random sample of size $m_n$ from $\hat F_n$, say $\mathbf{X}_n^* = (X_1^*, X_2^*, \ldots, X_{m_n}^*)$ (identically distributed and conditionally independent given $\mathbf{X}_n$). This is called the bootstrap sample.

Step 3: We approximate the sampling distribution of $R_n(\mathbf{X}_n, F)$ by the sampling distribution of $R_n^* = R_n(\mathbf{X}_n^*, \hat F_n)$. The sampling distribution of $R_n^*$ can be simulated on the computer by drawing a large number of bootstrap samples and computing $R_n^*$ for each sample.
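The three steps can be sketched in code for the statistic studied in this paper, with the e.d.f. playing the role of the estimate of $F$. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: it assumes NumPy, the function names and the parameters `n_boot` and `m` are hypothetical, and resampling the data with replacement is used as the draw from the e.d.f. It only mechanizes the steps; the paper argues that this particular (e.d.f.) bootstrap is inconsistent for the Grenander estimator when $m_n = n$.

```python
import numpy as np

def grenander_at(sample, t0):
    """Left-derivative at t0 of the least concave majorant of the e.d.f."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    px = np.concatenate(([0.0], x))
    py = np.arange(n + 1) / n
    hull = [0]
    for i in range(1, n + 1):
        # Upper-hull scan; ties in the sample are handled by the popping rule.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (px[b] - px[a]) * (py[i] - py[a]) - (py[b] - py[a]) * (px[i] - px[a]) >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    knots = px[hull]
    slopes = np.diff(py[hull]) / np.diff(knots)
    j = np.searchsorted(knots, t0, side="left") - 1
    return float(slopes[j]) if 0 <= j < slopes.size else 0.0

def bootstrap_delta_star(data, t0, n_boot=200, m=None, rng=None):
    """Simulate draws of m^{1/3} * (bootstrap estimate - original estimate) at t0."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    m = data.size if m is None else m          # m < n gives an m out of n scheme
    f_tilde = grenander_at(data, t0)           # Step 1: estimate from the data
    draws = np.empty(n_boot)
    for b in range(n_boot):
        star = rng.choice(data, size=m, replace=True)                     # Step 2
        draws[b] = m ** (1.0 / 3.0) * (grenander_at(star, t0) - f_tilde)  # Step 3
    return draws

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.exponential(size=500)
    draws = bootstrap_delta_star(data, t0=1.0, n_boot=200, rng=rng)
    print(np.quantile(draws, [0.025, 0.975]))
```

The empirical distribution of `draws` is the Monte Carlo approximation to the bootstrap distribution function of the statistic, conditional on `data`.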

Thus the bootstrap estimator of the sampling distribution function of $R_n(\mathbf{X}_n, F)$ is given by

$$H_n^*(x) = P^*\{R_n^* \le x\},$$

where $P^*\{\cdot\}$ is the conditional probability given the data $\mathbf{X}_n$. Let $L$ denote the Levy metric or any other metric metrizing weak convergence of distribution functions. We say that $H_n^*$ is (weakly) consistent if $L(H_n, H_n^*) \stackrel{P}{\to} 0$. Similarly, $H_n^*$ is strongly consistent if $L(H_n, H_n^*) \to 0$ a.s. If $H_n$ has a weak limit $H$, for the bootstrap procedure to be consistent, $H_n^*$ must converge weakly to $H$, in probability. In addition, if $H$ is continuous, we must have

$$\sup_{x \in \mathbb{R}} |H_n^*(x) - H(x)| \stackrel{P}{\to} 0 \quad\text{as } n \to \infty.$$


By saying that H∗n converges in distribution to a possibly random G, in

probability, we shall mean

(i) that there exists a stochastic transition function $G : \mathbb{R} \times \Omega \to [0, 1]$ such that $G(\cdot\,; \omega)$ is a distribution function for all $\omega \in \Omega$, and $G(x; \cdot)$ is a measurable function for every $x \in \mathbb{R}$, and

(ii) $L(H_n^*, G) \stackrel{P}{\to} 0$.

In fact, if $\hat F_n$ depends only on the order statistics of $X_1, X_2, \ldots, X_n$, the limiting $G$ cannot depend on $\omega$, if it exists. For if $h$ is a bounded measurable function on $\mathbb{R}$, then any limit in probability of $\int_{\mathbb{R}} h(x) H_n^*(dx; \omega)$ must be invariant under permutations of $X_1, X_2, \ldots, X_n$ up to equivalence, and thus must be almost surely constant by the Hewitt-Savage zero-one law (see Breiman (1968)). Let

$$\bar G(x) = \int_{\Omega} G(x; \omega) P(d\omega),$$

then $\bar G$ is a distribution function and $\int_{\mathbb{R}} h(x) G(dx; \omega) = \int_{\mathbb{R}} h(x) \bar G(dx)$ a.s. for each bounded continuous $h$. It follows that $G(x; \omega) = \bar G(x)$ a.e. $\omega$ for each $x$ by letting $h$ approach an indicator.

We are interested in exploring the (in)-consistency of different bootstrap

procedures for the Grenander estimator. Specifically, we are interested in

studying the limit behavior of

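$$\Delta_n^* = m_n^{1/3}\{\tilde f^*_{n,m_n}(t_0) - \hat f_n(t_0)\} \tag{6}$$

where $\hat f_n(t_0)$ is an estimate of $f(t_0)$ (for example, $\hat f_n(t_0)$ can be $\tilde f_n(t_0)$) and $\tilde f^*_{n,m_n}(t_0)$ is the corresponding bootstrap estimate based on a bootstrap sample of size $m_n$.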

Remark: For the rest of the paper we make crucial use of Proposition 2.1

and Corollary 2.2. In situations where the bootstrap works, the results will be applied conditionally on the sequence $X_1, X_2, \ldots$ with $F_n = \hat F_n$ and $\mathbb{F}_{n,m_n} = \mathbb{F}_n^*$ (the e.d.f. of the bootstrap sample generated from $\hat F_n$). For scenarios where the bootstrap is inconsistent, techniques similar to those in the proof of Corollary 2.2 are used unconditionally to derive the unconditional limit distribution of $\Delta_n^*$.

3.1 Bootstrapping from the e.d.f. $\mathbb{F}_n$

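Consider now the case in which $m_n = n$ and $\hat F_n = \mathbb{F}_n$. The quantity of interest is $\Delta_n^* := n^{1/3}\{\tilde f_n^*(t_0) - \tilde f_n(t_0)\}$, the bootstrap analogue of $\Delta_n := n^{1/3}\{\tilde f_n(t_0) - f(t_0)\}$. Letting $\mathbf{X} = (X_1, X_2, \ldots)$, we define $G_n(x; \omega) = P\{\Delta_n^* \le x \mid \mathbf{X}\}(\omega) = P^*\{\Delta_n^* \le x\}(\omega)$ as the conditional distribution function of $\Delta_n^*$ given $\mathbf{X}$. We claim that $G_n$ does not converge in $P$-probability.

Let us define the process

$$\mathbb{Z}_n(h) := n^{2/3}\bigl\{\mathbb{F}_n^*(t_0 + hn^{-1/3}) - \mathbb{F}_n^*(t_0) - \tilde f_n(t_0)\,hn^{-1/3}\bigr\}$$

for $h \in I_n = [-t_0 n^{1/3}, \infty)$. Then $\mathbb{Z}_n = \mathbb{Z}_{n,1} + \mathbb{Z}_{n,2}$, where

$$\mathbb{Z}_{n,1}(h) = n^{2/3}\bigl\{\mathbb{F}_n^*(t_0 + hn^{-1/3}) - \mathbb{F}_n^*(t_0) - \mathbb{F}_n(t_0 + hn^{-1/3}) + \mathbb{F}_n(t_0)\bigr\}$$

and

$$\mathbb{Z}_{n,2}(h) = n^{2/3}\bigl\{\mathbb{F}_n(t_0 + hn^{-1/3}) - \mathbb{F}_n(t_0) - \tilde f_n(t_0)\,hn^{-1/3}\bigr\}.$$

Let $W_1$ and $W_2$ be two independent two-sided standard Brownian motions on $\mathbb{R}$ with $W_1(0) = W_2(0) = 0$ and let

$$\begin{aligned}
\mathbb{Z}_1(h) &:= W_1(f(t_0)h),\\
\mathbb{Z}_2^0(h) &:= W_2(f(t_0)h) + \tfrac12 f'(t_0)h^2,\\
Z_2 &:= CM_{\mathbb{R}}[\mathbb{Z}_2^0]'(0),\\
\mathbb{Z}_2(h) &:= \mathbb{Z}_2^0(h) - hZ_2,\\
\mathbb{Z} &:= \mathbb{Z}_1 + \mathbb{Z}_2 \quad\text{and}\\
Z_1 &:= CM_{\mathbb{R}}[\mathbb{Z}_1 + \mathbb{Z}_2^0]'(0).
\end{aligned} \tag{7}$$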


Note that $\Delta_n^*$ equals the left derivative at $h = 0$ of the LCM of $\mathbb{Z}_n$. We study the behavior of the process $\mathbb{Z}_n$ and then use a continuous mapping type argument to derive the behavior of $\Delta_n^*$. It will be shown that $\mathbb{Z}_n$ does not have any weak limit conditional on $\mathbf{X}$ in $P$-probability. But unconditionally, $\mathbb{Z}_n$ has a limit distribution, which gives us the unconditional limit distribution of $\Delta_n^*$ that is different from the limit distribution of $\Delta_n$.

We first state two lemmas without proof, applicable in more general sce-

narios, that will be used later in the paper.

Lemma 3.1 Let $\mathbf{X}_n^*$ be a bootstrap sample generated from the data $\mathbf{X}_n$. Let $W_n := \psi_n(\mathbf{X}_n) \in \mathbb{R}^l$ and $W_n^* := \psi_n^*(\mathbf{X}_n, \mathbf{X}_n^*) \in \mathbb{R}^k$ where $\psi_n$ and $\psi_n^*$ are measurable functions; and let $Q$ and $Q^*$ be distributions on the Borel sets of $\mathbb{R}^l$ and $\mathbb{R}^k$. If the distribution of $W_n$ converges to $Q$ and the conditional distribution of $W_n^*$ given $\mathbf{X}_n$ converges in probability to $Q^*$, then the joint distribution of $(W_n, W_n^*)$ converges to the product measure $Q \times Q^*$.

Remark: The above lemma can be proved easily using characteristic func-

tions.

Lemma 3.2 Let $\mathbf{X}_n^*$ be a bootstrap sample generated from the data $\mathbf{X}_n$. Let $Y_n := \psi_n(\mathbf{X}_n)$ and $Z_n := \phi_n(\mathbf{X}_n, \mathbf{X}_n^*)$ where $\psi_n$ and $\phi_n$ are measurable functions; and let $G_n$ and $H_n$ be the conditional distribution functions of $Y_n + Z_n$ and $Z_n$ respectively. If there are distribution functions $G$ and $H$ for which $H$ is non-degenerate, $L(G_n, G) \stackrel{P}{\to} 0$ and $L(H_n, H) \stackrel{P}{\to} 0$, then there is a random variable $Y$ for which $Y_n \stackrel{P}{\to} Y$.

Remark: One proof of this lemma rests on the following idea. If $\{n_k\}$ is any subsequence for which $L(G_{n_k}, G) \to 0$ and $L(H_{n_k}, H) \to 0$ w.p.1, then $Y := \lim_{k \to \infty} Y_{n_k}$ exists by the Convergence of Types Theorem (see Loeve (1962), page 203) and $Y$ does not depend on $\{n_k\}$ since two subsequences can be joined. The lemma follows easily.


Proposition 3.3 The conditional distribution of $\mathbb{Z}_{n,1}$ given $\mathbf{X} = (X_1, X_2, \ldots)$ converges a.s. to the distribution of $\mathbb{Z}_1$. The unconditional distribution of $\mathbb{Z}_{n,2}$ converges to that of $\mathbb{Z}_2$ and the unconditional distribution of $\mathbb{Z}_n$ converges to that of $\mathbb{Z}$.

Proof. The conditional convergence of $\mathbb{Z}_{n,1}$ follows by applying Proposition 2.1 with $m_n = n$, $F_n = \mathbb{F}_n$, $\mathbb{F}_{n,m_n} = \mathbb{F}_n^*$. Note that as we are conditioning on $\mathbf{X}$, $\mathbb{F}_n$ and $\tilde f_n$ are fixed and we can apply the proposition. Condition (b) in the Proposition is satisfied as $n^{1/3}\{\mathbb{F}_n(t_0 + hn^{-1/3}) - \mathbb{F}_n(t_0)\}$ can be written as

$$n^{1/3}\bigl\{F(t_0 + hn^{-1/3}) - F(t_0)\bigr\} + n^{1/3}\bigl\{(\mathbb{F}_n - F)(t_0 + hn^{-1/3}) - (\mathbb{F}_n - F)(t_0)\bigr\} = h f(\alpha_n(h)) + r_n(h), \tag{8}$$

where $|r_n(h)| \le 2n^{1/3} \sup_{s \in \mathbb{R}} |\mathbb{F}_n(s) - F(s)| \to 0$ w.p.1 ($P$) by the law of the iterated logarithm (see Theorem 5.1.1 of Csorgo, M., and Revesz, P. (1981)), and $\alpha_n(h)$ is between $t_0 + hn^{-1/3}$ and $t_0$. Thus the conditional distribution of $\mathbb{Z}_{n,1}$ given $\mathbf{X}$ converges to that of $\mathbb{Z}_1$ a.s. As a consequence, the unconditional limit distribution of $\mathbb{Z}_{n,1}$ is the same as that of $\mathbb{Z}_1$.

To find the unconditional limit distribution of the process $\mathbb{Z}_{n,2}$ notice that $\mathbb{Z}_{n,2}$ is a function of the process

$$\mathbb{Z}_{n,2}^0(h) = n^{2/3}\bigl\{\mathbb{F}_n(t_0 + hn^{-1/3}) - \mathbb{F}_n(t_0) - f(t_0)\,hn^{-1/3}\bigr\},$$

which is quite well studied in the literature (see Kim and Pollard (1990) for more details). For $I \subset \mathbb{R}$, define the operator $G_I : f(h) \mapsto f(h) - h \cdot (CM_I f)'(0)$ for $h \in I$, $f : \mathbb{R} \to \mathbb{R}$. Observe that $\mathbb{Z}_{n,2}$ is the image of $\mathbb{Z}_{n,2}^0$ under the mapping $G_{I_n}$.

We apply Lemma A.2 with $X_{n,c} = G_{[-c,c]}[\mathbb{Z}_{n,2}^0]$, $Y_n = G_{I_n}[\mathbb{Z}_{n,2}^0]$, $W_c = G_{[-c,c]}[\mathbb{Z}_2^0]$ and $Y = G_{\mathbb{R}}[\mathbb{Z}_2^0]$. For $I$ compact, it is easy to see that $G_I : D(I) \to D(I)$ is a continuous map at all points $f$ for which $CM_I f$ is differentiable at 0, i.e., both left and right derivatives exist and are equal. This shows that condition (iii) of the lemma is satisfied. Condition (ii) follows from known facts about the process $\mathbb{Z}_2^0$. Note that for any $\delta > 0$, there exists $K > 0$ such that for $c > K$,

$$P\{\rho(X_{n,c}, W_c) > \delta\} \le P\Bigl\{\bigl|CM_{[-c,c]}[\mathbb{Z}_{n,2}^0]'(0) - CM_{I_n}[\mathbb{Z}_{n,2}^0]'(0)\bigr| > \frac{\delta}{2K}\Bigr\}.$$

The Assertion on page 217 of Kim and Pollard (1990) can now be used directly to verify condition (i) of Lemma A.2. Thus we conclude that $\mathbb{Z}_{n,2} = Y_n = G_{I_n}[\mathbb{Z}_{n,2}^0] \Rightarrow G_{\mathbb{R}}[\mathbb{Z}_2^0] = Y = \mathbb{Z}_2$.

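Next we show that $\mathbb{Z}_{n,1}$ and $\mathbb{Z}_{n,2}^0$ are asymptotically independent, i.e., the joint limit distribution of $\mathbb{Z}_{n,1}$ and $\mathbb{Z}_{n,2}^0$ is the product of their marginal limit distributions. For this it suffices to show that $(\mathbb{Z}_{n,1}(t_1), \ldots, \mathbb{Z}_{n,1}(t_k))$ and $(\mathbb{Z}_{n,2}^0(s_1), \ldots, \mathbb{Z}_{n,2}^0(s_l))$ are asymptotically independent, for all choices $-\infty < t_1 < \cdots < t_k < \infty$ and $-\infty < s_1 < \cdots < s_l < \infty$. This is an easy consequence of Lemma 3.1.

The joint unconditional distribution of $(\mathbb{Z}_{n,1}, \mathbb{Z}_{n,2})$ can be expressed as

$$\begin{pmatrix} \mathbb{Z}_{n,1}(h) \\ \mathbb{Z}_{n,2}(h) \end{pmatrix} = \begin{pmatrix} \mathbb{Z}_{n,1}(h) \\ G_{I_n}[\mathbb{Z}_{n,2}^0](h) \end{pmatrix} \Rightarrow \begin{pmatrix} W_1(f(t_0)h) \\ \mathbb{Z}_2^0(h) - hZ_2 \end{pmatrix}. \tag{9}$$

As $\mathbb{Z}_{n,1}$ and $\mathbb{Z}_{n,2}^0$ are asymptotically independent, the process $\mathbb{Z}_n$ converges weakly to $\mathbb{Z}$. □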

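Corollary 3.4 The unconditional distribution of $\Delta_n^*$ converges to that of $CM_{\mathbb{R}}[\mathbb{Z}]'(0)$.

As in the proof of Corollary 2.2, we use the continuous mapping principle with a localization argument. The details are provided in the Appendix. The argument can be easily extended to find the joint limiting distribution of $(\Delta_n^*, \Delta_n)$ as

$$\begin{pmatrix} \Delta_n^* \\ \Delta_n \end{pmatrix} = \begin{pmatrix} CM_{I_n}[\mathbb{Z}_{n,1} + \mathbb{Z}_{n,2}]'(0) \\ CM_{I_n}[\mathbb{Z}_{n,2}^0]'(0) \end{pmatrix} \Rightarrow \begin{pmatrix} Z_1 - Z_2 \\ Z_2 \end{pmatrix}. \tag{10}$$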


Proposition 3.5 Conditional on $\mathbf{X}$, the distribution of $\mathbb{Z}_n$ does not have a weak limit in $P$-probability.

Proof. We use the method of contradiction. Let $Z_n := \mathbb{Z}_{n,1}(h_0)$ and $Y_n := \mathbb{Z}_{n,2}(h_0)$ for some fixed $h_0 > 0$ (say $h_0 = 1$) and suppose that the conditional distribution of $Z_n + Y_n = \mathbb{Z}_n(h_0)$ converges in probability to the distribution function $G$. Observe that the conditional distribution of $Z_n$ converges in $P$-probability to a normal distribution by Proposition 2.1, which is obviously nondegenerate. Thus the assumptions of Lemma 3.2 are satisfied and we conclude that $Y_n \stackrel{P}{\to} Y$, for some random variable $Y$. It then follows from the Hewitt-Savage zero-one law that $Y$ is a constant, say $Y = c_0$ w.p.1. The contradiction arises since $Y_n$ converges in distribution to $\mathbb{Z}_2^0(h_0) - h_0 Z_2$, which is not a constant a.s. □

Proposition 3.6 If the conditional distribution function of $\Delta_n^*$ converges in $P$-probability, then $CM_{\mathbb{R}}[\mathbb{Z}]'(0) = Z_1 - Z_2$ must be independent of $Z_2$.

Proof. Let $(U, V)$ be independent random variables with $U$ having the distribution of $Z_1 - Z_2$ and $V$ distributed like $Z_2$. Since the conditional limit distribution of $\Delta_n^*$, in probability, must be its unconditional limit (the distribution of $U$), and the unconditional limit of $\Delta_n$ is the distribution of $V$, an application of Lemma 3.1 with $W_n = \Delta_n$ and $W_n^* = \Delta_n^*$ shows that $(\Delta_n^*, \Delta_n)$ must converge jointly (unconditionally) to $(U, V)$. It follows from Equation (10) that $(U, V)$ must have the same joint distribution as $(Z_1 - Z_2, Z_2)$, whence $Z_1 - Z_2$ and $Z_2$ must be independent. □

When combined with simulations and Equation (10), Proposition 3.6

strongly suggests that the conditional distribution of ∆∗n does not converge

in probability. The simulations clearly indicate that Z1 − Z2 and Z2 are not

independent. We have not been able to find a mathematical proof of this.

Figure 1: Scatter plot of 10000 random draws of $(Z_2, Z_1 - Z_2)$ when $f(t_0) = 1$ and $f'(t_0) = -2$ (horizontal axis: $Z_2$; vertical axis: $Z_1 - Z_2$).

Figure 1 shows the scatter plot of Z1 − Z2 versus Z2 obtained from a

simulation study with 10000 samples. We took f(t0) = 1 and f ′(t0) = −2.

The correlation coefficient obtained is −0.2114 and is highly significant. This

indicates that Z2 and Z1 − Z2 are not independent.
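A simulation of this kind can be sketched as follows. This is an illustrative Python reconstruction under stated assumptions, not the authors' code: it assumes NumPy, approximates the two-sided Brownian motions on an equally spaced grid, and replaces the LCM over $\mathbb{R}$ by the LCM over a truncated window $[-c, c]$ (the names `lcm_left_slope`, `two_sided_bm`, `simulate_pairs` and the defaults for `c` and `dh` are hypothetical choices). The value of the correlation it produces depends on the grid, the window and the seed, so it need not match the figure reported above to all digits.

```python
import numpy as np

def lcm_left_slope(h, y, at=0.0):
    """Left-hand slope at `at` of the least concave majorant of the points (h, y)."""
    hull = [0]
    for i in range(1, h.size):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (h[b] - h[a]) * (y[i] - y[a]) - (y[b] - y[a]) * (h[i] - h[a]) >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    knots = h[hull]
    slopes = np.diff(y[hull]) / np.diff(knots)
    j = np.searchsorted(knots, at, side="left") - 1
    return float(slopes[j])

def two_sided_bm(n_half, var_step, rng):
    """Two-sided Brownian motion on 2*n_half + 1 grid points, equal to 0 at the centre."""
    right = np.cumsum(rng.normal(0.0, np.sqrt(var_step), n_half))
    left = np.cumsum(rng.normal(0.0, np.sqrt(var_step), n_half))[::-1]
    return np.concatenate((left, [0.0], right))

def simulate_pairs(n_sims, f0=1.0, fprime0=-2.0, c=4.0, dh=0.01, rng=None):
    """Draws of (Z2, Z1 - Z2) following the definitions in Equation (7)."""
    rng = np.random.default_rng() if rng is None else rng
    n_half = int(round(c / dh))
    h = dh * np.arange(-n_half, n_half + 1)     # grid contains 0 exactly
    drift = 0.5 * fprime0 * h ** 2
    out = np.empty((n_sims, 2))
    for s in range(n_sims):
        w1 = two_sided_bm(n_half, f0 * dh, rng)  # W1(f(t0) h)
        w2 = two_sided_bm(n_half, f0 * dh, rng)  # W2(f(t0) h)
        z02 = w2 + drift                         # process Z_2^0
        z2 = lcm_left_slope(h, z02)              # Z2
        z1 = lcm_left_slope(h, w1 + z02)         # Z1
        out[s] = (z2, z1 - z2)
    return out

if __name__ == "__main__":
    pairs = simulate_pairs(2000, rng=np.random.default_rng(0))
    print(np.corrcoef(pairs[:, 0], pairs[:, 1])[0, 1])
```

The truncation is justified heuristically by the quadratic drift, which makes contact points of the LCM far from 0 unlikely; a larger `c` or finer `dh` can be used to check sensitivity.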

3.2 Bootstrapping from $\tilde F_n$

One obvious problem with drawing the bootstrap samples from the e.d.f. $\mathbb{F}_n$ is that $\mathbb{F}_n$ does not have a density. In this subsection we consider bootstrapping from $\tilde F_n$, the LCM of $\mathbb{F}_n$, which does have a non-increasing density $\tilde f_n$.

Let $X^*_{n,1}, X^*_{n,2}, \ldots, X^*_{n,n}$ be a bootstrap sample generated from $\tilde F_n$. As before, we study the process $\mathbb{Z}_n(h) = n^{2/3}\{\mathbb{F}_n^*(t_0 + hn^{-1/3}) - \mathbb{F}_n^*(t_0) - \tilde f_n(t_0)\,hn^{-1/3}\}$. We claim that $\Delta_n^* = n^{1/3}\{\tilde f_n^*(t_0) - \tilde f_n(t_0)\}$, the left derivative at $h = 0$ of the LCM of $\mathbb{Z}_n$, does not have any weak limit, conditional on $\mathbf{X}$. We show that $\mathbb{Z}_n$ does not have any limit distribution conditional on the data. But unconditionally, $\mathbb{Z}_n$ has a limit distribution which gives the unconditional limit distribution of $\Delta_n^*$ that is different from the weak limit of $\Delta_n$, thereby illustrating that the bootstrap procedure is not consistent. We borrow the notation introduced in Equation (7) except that now

$$\mathbb{Z}_2(h) := CM_{\mathbb{R}}[\mathbb{Z}_2^0](h) - CM_{\mathbb{R}}[\mathbb{Z}_2^0](0) - h \cdot CM_{\mathbb{R}}[\mathbb{Z}_2^0]'(0).$$

Theorem 3.7 The following hold.

(i) The conditional distribution of $\mathbb{Z}_{n,1}$, given $\mathbf{X}$, converges almost surely to the distribution of $\mathbb{Z}_1$; the unconditional distribution of $\mathbb{Z}_{n,2}$ converges to that of $\mathbb{Z}_2$; and the unconditional distribution of $\mathbb{Z}_n$ converges to that of $\mathbb{Z}$.

(ii) The unconditional distribution of $\Delta_n^*$ converges to that of $CM_{\mathbb{R}}[\mathbb{Z}]'(0)$.

(iii) Conditional on $\mathbf{X}$, $\mathbb{Z}_n$ does not have a weak limit in $P$-probability.

(iv) If $\Delta_n^*$ has a weak limit, conditional on $\mathbf{X}$, in $P$-probability, then $Z_1 - Z_2$ must be independent of the process $\mathbb{Z}_2$ and the random variable $Z_2$.

Proof. The proof of the result runs along similar lines as that of the propositions and corollaries in the last subsection. Using ideas similar to those in Equation (5) and the following discussion, the process

$$\mathbb{Z}_{n,1}(h) := n^{2/3}\bigl\{\mathbb{F}_n^*(t_0 + hn^{-1/3}) - \mathbb{F}_n^*(t_0) - \tilde F_n(t_0 + hn^{-1/3}) + \tilde F_n(t_0)\bigr\}$$

converges in distribution to $\mathbb{Z}_1(h) = W_1(f(t_0)h)$ conditional on $\mathbf{X}$, a.s. We express $\mathbb{Z}_{n,2}$ as a function of the process $\mathbb{Z}_{n,2}^0$ and apply a continuous mapping type argument to find its limiting distribution. Note that $\mathbb{Z}_{n,2}(h)$ can be expressed as

$$\begin{aligned}
& n^{2/3}\bigl\{\tilde F_n(t_0 + hn^{-1/3}) - \tilde F_n(t_0) - \tilde f_n(t_0)\,hn^{-1/3}\bigr\}\\
&\quad = n^{2/3}\bigl\{\tilde F_n(t_0 + hn^{-1/3}) - \mathbb{F}_n(t_0) - f(t_0)\,hn^{-1/3}\bigr\} - n^{2/3}\bigl\{\tilde F_n(t_0) - \mathbb{F}_n(t_0)\bigr\} - n^{1/3}h\bigl\{\tilde f_n(t_0) - f(t_0)\bigr\}\\
&\quad = CM_{I_n}[\mathbb{Z}_{n,2}^0](h) - CM_{I_n}[\mathbb{Z}_{n,2}^0](0) - h \cdot CM_{I_n}[\mathbb{Z}_{n,2}^0]'(0).
\end{aligned}$$

An application of the continuous mapping principle (with a localization argument) yields the unconditional convergence $\mathbb{Z}_{n,2} \Rightarrow \mathbb{Z}_2$. The proof of part (ii) uses techniques similar to those in the proof of Corollary 3.4 and is given in the appendix. Using Proposition 3.5 we can argue that $\mathbb{Z}_n$ does not converge to any weak limit, conditional on $\mathbf{X}$, in $P$-probability. Proposition 3.6 can be employed to complete the proof of (iv) of the theorem. □

As before, extensive simulations show that $Z_1 - Z_2$ and $Z_2$ are not independent, which suggests that $\Delta_n^*$ does not have a conditional weak limit in probability.

4 Bootstrapping from a smoothed version of $\tilde F_n$

One of the major reasons for the inconsistency of bootstrap methods dis-

cussed in the previous section is the lack of smoothness of the distribution

from which the bootstrap samples are generated. The e.d.f. $\mathbb{F}_n$ does not have a density, and $\tilde F_n$ does not have a differentiable density, whereas $F$ is

assumed to have a nonzero differentiable density at t0. The results from Sec-

tion 2 are directly applied to derive sufficient conditions on the smoothness

of the distribution from which the bootstrap samples are generated.

Theorem 4.1 Suppose that we generate a bootstrap sample $X^*_{n,1}, X^*_{n,2}, \ldots, X^*_{n,m_n}$ from a distribution function $\hat F_n$ constructed from the data $X_1, X_2, \ldots, X_n$. Let $\hat f_n$ be an estimate of the density of $\hat F_n$. Let $\tilde f_n^*$ be the NPMLE based on the bootstrap sample. Also suppose that conditions (a)-(e) used in Proposition 2.1 hold a.s. with $F_n = \hat F_n$ and $f_n = \hat f_n$. Then the bootstrap distribution is strongly consistent, i.e., for almost all $\mathbf{X}$, the conditional limit distribution of $\Delta_n^* = m_n^{1/3}\{\tilde f_n^*(t_0) - \hat f_n(t_0)\}$ is the same as the unconditional limit distribution of $\Delta_n = n^{1/3}\{\tilde f_n(t_0) - f(t_0)\}$. Equivalently,

$$\sup_{x \in \mathbb{R}} \bigl|P^*\{\Delta_n^* \le x\} - P\{\Delta_n \le x\}\bigr| \stackrel{a.s.}{\longrightarrow} 0. \tag{11}$$

Proof. Conditional on $\mathbf{X}$, $\hat F_n$ and $\hat f_n$ are fixed, and we can apply Proposition 2.1 with $F_n = \hat F_n$ and $f_n = \hat f_n$ to obtain the limit distribution of the process $\mathbb{Z}_n$ (defined in Equation (3)). Equation (11) follows directly from an application of Corollary 2.2 (as the conditions (1)-(3) on the limit process $\mathbb{Z}$ are satisfied) and Polya's theorem, noticing that the conditional limit distribution of $\Delta_n^*$ is continuous. □

As an example, we construct a kernel smoothed version of $\tilde F_n$ and show

that it leads to a consistent bootstrap procedure. The usual kernel smoothing

of the Grenander estimator would give rise to a boundary effect at 0, as f is

supported on [0,∞), and might violate the assumption of monotonicity. To

avoid these difficulties, we transform the observations by taking logarithms,

kernel smooth the transformed data points, which are now supported on R,

and back transform the smoothed density to obtain an estimate of f . The

result is

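$$\hat f_n(x) := \frac{1}{x h_n} \int_0^\infty K\Bigl(\frac{\log u - \log x}{h_n}\Bigr) \tilde f_n(u)\,du = \frac{1}{h_n} \int_0^\infty K\Bigl(\frac{\log v}{h_n}\Bigr) \tilde f_n(vx)\,dv$$

for $x \in [0,\infty)$, where $h_n$ is the smoothing bandwidth, and $K(\cdot)$ is a symmetric (around 0) density function on $\mathbb{R}$ satisfying the following conditions:

(i) $K'$ exists and is bounded on $\mathbb{R}$.

(ii) $K''$ exists and is continuous on $\mathbb{R}$.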


(iii) $\int_{-\infty}^{\infty} |K^{(i)}(u)| \max\{1, e^{u\varepsilon}\}\,du < \infty$ for some $\varepsilon > 0$, and $i = 0, 1, 2$.

It is easy to see that $\hat f_n$ is a non-increasing density function supported on $[0,\infty)$. We generate bootstrap samples from $\hat F_n$, the distribution function having density $\hat f_n$. To simplify notation, let $K_{h_n}(u, x) := \frac{1}{x h_n} K\bigl(\frac{\log u - \log x}{h_n}\bigr)$. The following display gives an alternative expression for $\hat f_n$ which directly follows from integration by parts and noticing that $\lim_{u \to \infty} K_{h_n}(u, x) = 0$ for every $x \in (0,\infty)$, $h_n > 0$,

$$\hat f_n(x) = \int_0^\infty K_{h_n}(u, x)\,\tilde f_n(u)\,du = -\int_0^\infty \frac{\partial}{\partial u}[K_{h_n}(u, x)]\,\tilde F_n(u)\,du.$$
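The smoothed estimator and the corresponding bootstrap draws can be sketched numerically. The following Python code is an illustration under assumptions that are not in the paper: it uses NumPy, takes $K$ to be the standard Gaussian density (one kernel satisfying (i)-(iii)), evaluates the integral with a plain Riemann sum after the substitution $w = \log v / h_n$, and samples from the smoothed distribution by drawing $U$ from the LCM of the e.d.f. and returning $U e^{h_n \xi}$ with $\xi \sim K$ independent of $U$. That last representation follows from the first form of the definition by a change of variable and the symmetry of $K$. All function names are hypothetical.

```python
import numpy as np

def lcm_knots(sample):
    """Vertices (kx, ky) of the least concave majorant of the e.d.f. on [0, inf)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    px = np.concatenate(([0.0], x))
    py = np.arange(n + 1) / n
    hull = [0]
    for i in range(1, n + 1):
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (px[b] - px[a]) * (py[i] - py[a]) - (py[b] - py[a]) * (px[i] - px[a]) >= 0:
                hull.pop()
            else:
                break
        hull.append(i)
    return px[hull], py[hull]

def grenander_density(kx, ky, u):
    """Left-derivative of the LCM evaluated at the array u (0 outside the data range)."""
    slopes = np.diff(ky) / np.diff(kx)
    j = np.searchsorted(kx, u, side="left") - 1
    inside = (j >= 0) & (j < slopes.size)
    return np.where(inside, slopes[np.clip(j, 0, slopes.size - 1)], 0.0)

def smoothed_density(kx, ky, x, h, n_grid=2001, w_max=8.0):
    """Integral of K(w) * f_tilde(x e^{h w}) * e^{h w} dw, Gaussian K (an assumption)."""
    w = np.linspace(-w_max, w_max, n_grid)
    k = np.exp(-0.5 * w ** 2) / np.sqrt(2.0 * np.pi)
    vals = k * grenander_density(kx, ky, x * np.exp(h * w)) * np.exp(h * w)
    return float(np.sum(vals) * (w[1] - w[0]))

def sample_smoothed(kx, ky, size, h, rng):
    """Draws from the smoothed distribution: U from the LCM, times exp(h * xi)."""
    u = np.interp(rng.uniform(size=size), ky, kx)   # inverse of the piecewise-linear LCM
    return u * np.exp(h * rng.standard_normal(size))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    data = rng.exponential(size=1000)
    kx, ky = lcm_knots(data)
    h = data.size ** (-1.0 / 5.0)                   # hypothetical bandwidth choice
    print(smoothed_density(kx, ky, 1.0, h))
    print(sample_smoothed(kx, ky, size=5, h=h, rng=rng))
```

Because the same grid in $w$ is used for every $x$, the computed values inherit the monotonicity of the Grenander estimator term by term.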

The next theorem shows the consistency of the bootstrap procedure when generating $n$ data points $X^*_{n,1}, X^*_{n,2}, \ldots, X^*_{n,n}$ from $\hat F_n$.

Theorem 4.2 Assume that $h_n \to 0$ and $h_n^2 (n/\log\log n)^{1/2} \to \infty$ as $n \to \infty$. Then the bootstrap method is strongly consistent, i.e., Equation (11) holds with $\Delta_n^* = n^{1/3}\{\tilde f_n^*(t_0) - \hat f_n(t_0)\}$.

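Proof. Let $\mathbb{F}_n^*$ be the e.d.f. of $X^*_{n,1}, X^*_{n,2}, \ldots, X^*_{n,n}$. We define $\mathbb{Z}_n(z) := n^{2/3}\{\mathbb{F}_n^*(t_0 + zn^{-1/3}) - \mathbb{F}_n^*(t_0) - \hat f_n(t_0)\,zn^{-1/3}\}$ for $z \in [-t_0 n^{1/3}, \infty)$. We show that the conditions (a)-(e) hold a.s. and use Theorem 4.1 to get the desired result.

As before, let $\mathbb{Z}_n(z) = \mathbb{Z}_{n,1}(z) + \mathbb{Z}_{n,2}(z)$, where

$$\mathbb{Z}_{n,1}(z) = n^{2/3}\bigl[\{\mathbb{F}_n^*(t_0 + zn^{-1/3}) - \mathbb{F}_n^*(t_0)\} - \{\hat F_n(t_0 + zn^{-1/3}) - \hat F_n(t_0)\}\bigr]$$

and

$$\mathbb{Z}_{n,2}(z) = n^{2/3}\bigl[\{\hat F_n(t_0 + zn^{-1/3}) - \hat F_n(t_0)\} - \hat f_n(t_0)\,zn^{-1/3}\bigr].$$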

As a first step, we establish (c), i.e., $\mathbb{Z}_{n,2}(z) \stackrel{a.s.}{\to} \frac{z^2}{2} f'(t_0)$ uniformly on compacta. Fix a compact set $[-M, M] \subset \mathbb{R}$. As $\hat F_n$ is twice continuously differentiable, we can use Taylor expansion to simplify $\mathbb{Z}_{n,2}(z)$ to $\frac{z^2}{2} \hat f_n'(t_n(z))$ where $t_n(z)$ is an intermediate point between $t_0$ and $t_0 + zn^{-1/3}$. We now show that $\hat f_n'(t_n(z)) \stackrel{a.s.}{\to} f'(t_0)$ uniformly for $z \in [-M, M]$. Towards this end, let us define

$$\bar f_n(x) = \int_0^\infty K_{h_n}(u, x) f(u)\,du = -\int_0^\infty \frac{\partial}{\partial u}[K_{h_n}(u, x)]\,F(u)\,du; \tag{12}$$

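$\bar f_n$ is just a smoothed version of the original density function $f$. We first show that $\check f_n'(t) - \bar f_n'(t) \to 0$ a.s. uniformly on $[t_0 - \delta, t_0 + \delta]$, where $\delta > 0$ is such that $t_0 - \delta > 0$ and $f$ is continuously differentiable in the interval. For $t \in [t_0 - \delta, t_0 + \delta]$,
\[
\begin{aligned}
|\check f_n'(t) - \bar f_n'(t)|
&= \left| \int_0^\infty \frac{\partial^2}{\partial t\,\partial u}\bigl[K_{h_n}(u,t)\bigr] \{\tilde F_n(u) - F(u)\}\,du \right| \\
&\le \int_0^\infty \left| \frac{\partial^2}{\partial t\,\partial u}\bigl[K_{h_n}(u,t)\bigr] \right| \bigl|\tilde F_n(u) - F(u)\bigr|\,du \\
&\le D_n \int_0^\infty \left| \frac{\partial^2}{\partial t\,\partial u}\bigl[K_{h_n}(u,t)\bigr] \right| du
\quad \text{where } D_n = \|\tilde F_n - F\|_\infty \\
&= \frac{D_n}{h_n^2}\; h_n^2 \int_0^\infty \left| \frac{\partial^2}{\partial t\,\partial u}\bigl[K_{h_n}(u,t)\bigr] \right| du
\;\xrightarrow{\text{a.s.}}\; 0
\end{aligned}
\tag{13}
\]
uniformly, as $h_n^2 (n/\log\log n)^{1/2} \to \infty$ ($(n/\log\log n)^{1/2} D_n = O(1)$ w.p.1 from Theorem 7.2.1 in Robertson, Wright and Dykstra (1988)) and $h_n^2 \int_0^\infty \bigl| \frac{\partial^2}{\partial t\,\partial u}[K_{h_n}(u,t)] \bigr|\,du$ is uniformly bounded (as a consequence of assumption (iii) about the kernel $K$). To show that $\bar f_n'(t) \to f'(t)$ uniformly on $I := [t_0 - \delta/2, t_0 + \delta/2]$, we express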

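\[
\bar f_n(t) = \int_{C_n} K(v)\, f(t e^{v h_n})\, e^{v h_n}\,dv + \int_{I^c} K_{h_n}(u,t) f(u)\,du
\]
where $C_n := \left[ \frac{\log(t_0 - \delta/2) - \log t}{h_n},\ \frac{\log(t_0 + \delta/2) - \log t}{h_n} \right]$. On differentiating and some simplification we have
\[
\begin{aligned}
|\bar f_n'(t) - f'(t)|
\le{}& \int_{C_n} K(v)\, e^{2 v h_n} \bigl| f'(t e^{v h_n}) - f'(t) \bigr|\,dv
+ \int_{I^c} \left| \frac{\partial}{\partial t} K_{h_n}(u,t) \right| f(u)\,du \\
&+ |f'(t)| \left| \int_{C_n} K(v)\, e^{2 v h_n}\,dv - 1 \right|.
\end{aligned}
\tag{14}
\]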

By uniform continuity of $f'$ on $[t_0 - \delta, t_0 + \delta]$, the first term can be made uniformly small. It is easy to see that the third term goes to zero. The second term can be shown to vanish by using properties (i) and (iii) of the kernel and an application of the Cauchy-Schwarz inequality. From Equations (13) and (14) we see that $Z_{n,2}(z) \to \frac{z^2}{2} f'(t_0)$ a.s. uniformly on $[-M, M]$. Notice that,

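\[
\begin{aligned}
\int_0^\infty |\check f_n(t) - \bar f_n(t)|\,dt
&\le \int_0^\infty \!\! \int_0^\infty K_{h_n}(u,t)\, |\tilde f_n(u) - f(u)|\,du\,dt \\
&= \int_0^\infty |\tilde f_n(u) - f(u)| \int_0^\infty K_{h_n}(u,t)\,dt\,du
= \int_0^\infty |\tilde f_n(u) - f(u)|\,du \to 0 \quad \text{a.s.}
\end{aligned}
\]
by interchanging the order of integration (and noticing that the inner integral evaluates to 1) and using Theorem 8.3 of Devroye (1987). Also note that $\bar f_n(t) \to f(t)$ for all $t > 0$, by an application of the dominated convergence theorem. By Scheffé's theorem, $\int_0^\infty |\bar f_n(t) - f(t)|\,dt \to 0$. Thus, we conclude that
\[
\int_0^\infty |\check f_n(t) - f(t)|\,dt \to 0 \quad \text{a.s.}
\]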

Therefore, $\check F_n$ converges uniformly on $(0,\infty)$ to $F$ a.s., which shows that (a) holds. Also, as $F$ has a continuous density $f$, $\check f_n(t) \to f(t)$ a.s. for every $t > 0$ by the lemma on page 330 of Robertson, Wright and Dykstra (1988). As the $\check f_n$'s are monotonically decreasing functions converging pointwise to a continuous $f$, the convergence is uniform on the compact neighborhood $[t_0 - \delta, t_0 + \delta]$. Now, to show that condition (b) holds, for $z \in [-M, M]$, we use a one term Taylor series expansion to bound
\[
\bigl| n^{1/3}\{\check F_n(t_0 + z n^{-1/3}) - \check F_n(t_0)\} - z f(t_0) \bigr|
\le M \Bigl\{ \max_{|s| \le M n^{-1/3}} |\check f_n(t_0 + s) - f(t_0 + s)| + \max_{|s| \le M n^{-1/3}} |f(t_0 + s) - f(t_0)| \Bigr\}
\]
which converges to 0 a.s. by the above discussion and the continuity of $f$. A similar argument also shows that (e) holds, with the $O(m_n^{-1/3})$ term identically 0.

To prove condition (d), let $\varepsilon > 0$ be given. We use a two term Taylor expansion to bound the right-hand side of (d) as
\[
\begin{aligned}
\bigl| \check F_n(t_0 + \beta) - \check F_n(t_0) - \check f_n(t_0)\beta - \tfrac12 \beta^2 f'(t_0) \bigr|
&\le \tfrac12 \beta^2 \max_{|s| \le |\beta|} |\check f_n'(t_0 + s) - f'(t_0)| \\
&\le \tfrac12 \beta^2 \Bigl\{ \max_{|s| \le |\beta|} |\check f_n'(t_0 + s) - f'(t_0 + s)| + \max_{|s| \le |\beta|} |f'(t_0 + s) - f'(t_0)| \Bigr\} \\
&\le \varepsilon \beta^2 + o(\beta^2).
\end{aligned}
\]
The last inequality follows from the uniform convergence of $\check f_n'(s)$ to $f'(s)$ in a neighborhood of $t_0$ (which is proved in Equations (13) and (14)) and the continuity of $f'$ at $t_0$, by choosing a sufficiently large $n$ and a sufficiently small neighborhood for $\beta$ around 0. □

5 m out of n Bootstrap

In Section 3 we showed that the two most intuitive methods of bootstrapping are inconsistent. In this section we show that the corresponding m out of n bootstrap procedures are weakly consistent. The following theorem considers generating bootstrap samples $X^*_{n,1}, X^*_{n,2}, \ldots, X^*_{n,m_n}$ from $\mathbb F_n$, where $m_n$ is strictly less than $n$. The quantity of interest is $\Delta_n^* = m_n^{1/3}\{\tilde f^*_{m_n}(t_0) - \tilde f_n(t_0)\}$.

Theorem 5.1 If $m_n = o(n)$ then the bootstrap procedure is weakly consistent, i.e.,
\[
\sup_{x \in \mathbb R} \bigl| P^*\{\Delta_n^* \le x\} - P\{\Delta_n \le x\} \bigr| \xrightarrow{P} 0. \tag{15}
\]
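As an illustration of the procedure in Theorem 5.1, the sketch below resamples $m_n$ points with replacement from the data (i.e., from the e.d.f.), recomputes the Grenander estimate at $t_0$, and returns the bootstrap replicates of $\Delta_n^*$. It is a minimal sketch under stated assumptions: the helper names `grenander_at` and `m_out_of_n_bootstrap` are hypothetical, and the choice $m_n = \lfloor n^{2/3} \rfloor$ in the usage note is only one sequence satisfying $m_n = o(n)$.

```python
import numpy as np

def grenander_at(x, t0):
    """Left-hand slope at t0 of the LCM of the e.d.f. of x (Grenander estimate)."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    px = np.concatenate(([0.0], xs))
    py = np.arange(n + 1) / n
    hull = [0]
    for i in range(1, n + 1):
        # drop the last vertex while keeping it would break concavity (ties included)
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            if (py[b] - py[a]) * (px[i] - px[b]) <= (py[i] - py[b]) * (px[b] - px[a]):
                hull.pop()
            else:
                break
        hull.append(i)
    kx, ky = px[hull], py[hull]
    if t0 > kx[-1]:
        return 0.0                               # beyond the largest observation
    j = np.searchsorted(kx, t0, side="left")     # kx[j-1] < t0 <= kx[j]
    return (ky[j] - ky[j - 1]) / (kx[j] - kx[j - 1])

def m_out_of_n_bootstrap(x, t0, m, B, rng):
    """B replicates of m^{1/3} {f*_m(t0) - f_n(t0)}, resampling m points from x."""
    x = np.asarray(x, dtype=float)
    f_n = grenander_at(x, t0)
    out = np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=m, replace=True)
        out[b] = m ** (1.0 / 3.0) * (grenander_at(xb, t0) - f_n)
    return out
```

With `x` of length `n`, a call such as `m_out_of_n_bootstrap(x, t0, int(n ** (2 / 3)), B, np.random.default_rng(0))` returns an array whose empirical quantiles could be used to approximate those of $\Delta_n$.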

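Proof. We verify conditions (a)-(e) (with some modification) as in Theorem 4.1 with $F_n = \mathbb F_n$ and $f_n = \tilde f_n$ to establish the desired result. Conditions (a), (b) and (e) hold a.s. and are easy to establish.

Fix a compact set $[-M, M] \subset \mathbb R$. We show that (c) holds in probability, i.e., $Z_{n,2}(z) \xrightarrow{P} \frac{z^2}{2} f'(t_0)$ uniformly on $[-M, M]$. Towards this end, we simplify $Z_{n,2}(z)$, for $z \in [-M, M]$, in the following way:
\[
\begin{aligned}
& m_n^{2/3}\bigl\{\mathbb F_n(t_0 + z m_n^{-1/3}) - \mathbb F_n(t_0)\bigr\} - m_n^{1/3} z \tilde f_n(t_0) \\
&\quad = m_n^{2/3}\bigl\{(\mathbb F_n - F)(t_0 + z m_n^{-1/3}) - (\mathbb F_n - F)(t_0)\bigr\}
 + \Bigl\{ m_n^{1/3} z f(t_0) + \frac{z^2}{2} f'(t_0 + \alpha_n(z)) \Bigr\} - m_n^{1/3} z \tilde f_n(t_0) \\
&\quad = o_P(1) - m_n^{1/3} z \bigl\{\tilde f_n(t_0) - f(t_0)\bigr\} + \frac{z^2}{2} f'(t_0 + \alpha_n(z)) \\
&\quad \xrightarrow{P} \frac{z^2}{2} f'(t_0) \quad \text{as } n \to \infty,
\end{aligned}
\tag{16}
\]
where $\alpha_n(z)$ lies between 0 and $z m_n^{-1/3}$. The first term is $o_P(1)$ as
\[
\sup_{z \in [-M, M]} \bigl| (\mathbb F_n - F)(t_0 + z m_n^{-1/3}) - (\mathbb F_n - F)(t_0) \bigr| = O_P(n^{-1/2} m_n^{-1/6}) = o_P(m_n^{-2/3}),
\]
and the second term vanishes in probability because $\tilde f_n(t_0) - f(t_0) = O_P(n^{-1/3})$ and $m_n = o(n)$.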

To verify condition (d), let $\varepsilon > 0$ be given. By Equation (23) we can choose a small enough neighborhood of 0 for $\beta$ and $n$ large so that the right-hand side of (d) can be bounded by $o_P(m_n^{-2/3}) + \varepsilon\beta^2 + o(\beta^2)$.

Given any subsequence $\{n_k\} \subset \mathbb N$, there exists a further subsequence $\{n_{k_l}\}$ such that conditions (c) and (d) hold a.s. and Theorem 4.1 is applicable. Thus Equation (11) holds for the subsequence $\{n_{k_l}\}$, which proves Equation (15). □

The next theorem shows that the m out of n bootstrap method is also weakly consistent when we generate bootstrap samples from $\tilde F_n$. We will assume slightly stronger conditions on $F$, namely, conditions (a)-(d) mentioned in Theorem 7.2.3 of Robertson, Wright and Dykstra (1988).

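Theorem 5.2 If $m_n = O(n (\log n)^{-3/2})$ then Equation (15) holds.

Proof. The proof is similar to that of Theorem 5.1. We only show that condition (c) holds. Letting $z \in [-M, M] \subset \mathbb R$, we add and subtract the term $m_n^{2/3}\{\mathbb F_n(t_0 + z m_n^{-1/3}) - \mathbb F_n(t_0)\}$ from
\[
Z_{n,2}(z) = m_n^{2/3}\bigl\{\tilde F_n(t_0 + z m_n^{-1/3}) - \tilde F_n(t_0)\bigr\} - m_n^{1/3} z \tilde f_n(t_0)
\]
and then use the following result due to Kiefer and Wolfowitz (1976):
\[
\Bigl| \bigl\{\tilde F_n(t_0 + z m_n^{-1/3}) - \tilde F_n(t_0)\bigr\} - \bigl\{\mathbb F_n(t_0 + z m_n^{-1/3}) - \mathbb F_n(t_0)\bigr\} \Bigr|
\le 2 \|\tilde F_n - \mathbb F_n\| = o_P(n^{-2/3} \log n) = o_P(m_n^{-2/3}).
\]
This, coupled with the convergence of
\[
m_n^{2/3}\bigl\{\mathbb F_n(t_0 + z m_n^{-1/3}) - \mathbb F_n(t_0)\bigr\} - z m_n^{1/3} \tilde f_n(t_0) \xrightarrow{P} \frac{z^2}{2} f'(t_0)
\]
uniformly on $[-M, M]$ (see Equation (16)), establishes (c). □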

6 Discussion

We worked with the Grenander estimator as a prototypical example of cube-root asymptotics, but believe that our results have broader implications for the (in)-consistency of the bootstrap methods in problems with an $n^{1/3}$ convergence rate. We consider in this connection the work of Abrevaya and Huang (2005).

The setup is similar to that of Kim and Pollard (1990), where a general M-estimation framework is considered. For mathematical simplicity, we use the same notation as in Abrevaya and Huang (2005). Let $W_n := r_n(\theta_n - \theta_0)$ and $\hat W_n := r_n(\hat\theta_n - \theta_n)$ be the sample and bootstrap statistic of interest. In our case $r_n = n^{1/3}$, $\theta_0 = f(t_0)$, $\theta_n = \tilde f_n(t_0)$ and $\hat\theta_n = \tilde f_n^*(t_0)$. By Theorem 2 of Abrevaya and Huang (2005),
\[
\hat W_n \Rightarrow \arg\max \hat Z(t) - \arg\max Z(t)
\]
conditional on the original sample, in $P^\infty$-probability, where $Z(t) = -\frac12 t'Vt + W(t)$ and $\hat Z(t) = -\frac12 t'Vt + W(t) + \hat W(t)$, and $W$ and $\hat W$ are two independent Gaussian processes, both with continuous sample paths and mean zero (see Abrevaya and Huang (2005) for more details). We also know that $W_n \Rightarrow \arg\max Z(t)$. An application of Lemma 3.1 with $W_n$ and $\hat W_n$ shows that $\arg\max Z(t)$ and $\arg\max \hat Z(t) - \arg\max Z(t)$ should be independent. Now, if we specialize to cube-root asymptotics, we can take $Z(t) = W(t) - t^2$ and $\hat Z(t) = W(t) + \hat W(t) - t^2$, where $W(t)$ and $\hat W(t)$ are two independent two-sided standard Brownian motions on $\mathbb R$ with $W(0) = \hat W(0) = 0$. There is abundant numerical evidence to suggest that $\arg\max Z(t)$ and $\arg\max \hat Z(t) - \arg\max Z(t)$ are not independent in this situation, contradicting Abrevaya and Huang's claim.
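A simulation of the kind alluded to above could be set up as follows. This is only a discretised sketch and not the authors' numerical study: the two Brownian motions are approximated by Gaussian random walks on a finite grid over $[-T, T]$, and the function names, the truncation $T$ and the grid size are all assumptions made for illustration.

```python
import numpy as np

def two_sided_bm(n_rep, n_half, dt, rng):
    """n_rep discretised two-sided Brownian paths on 2*n_half+1 grid points, W(0)=0."""
    inc = rng.normal(scale=np.sqrt(dt), size=(n_rep, 2 * n_half))
    right = np.cumsum(inc[:, :n_half], axis=1)            # W(dt), ..., W(T)
    left = np.cumsum(inc[:, n_half:], axis=1)[:, ::-1]    # W(-T), ..., W(-dt)
    return np.hstack([left, np.zeros((n_rep, 1)), right])

def simulate_argmax_pair(n_rep, T, n_half, rng):
    """Return draws of (argmax Z, argmax Z_hat - argmax Z) on the grid over [-T, T]."""
    t = np.linspace(-T, T, 2 * n_half + 1)
    dt = T / n_half
    W = two_sided_bm(n_rep, n_half, dt, rng)
    W_hat = two_sided_bm(n_rep, n_half, dt, rng)
    a = t[np.argmax(W - t ** 2, axis=1)]                  # argmax of Z(t) = W(t) - t^2
    a_hat = t[np.argmax(W + W_hat - t ** 2, axis=1)]      # argmax of Z_hat(t)
    return a, a_hat - a
```

Given the output `(a, d)`, one could compare, for example, the distribution of `d` across bins of `a`; any such summary from a coarse grid is only suggestive of dependence, not a proof of it.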

Section 4 of Abrevaya and Huang (2005) gives a method for correcting the bootstrap confidence interval. In light of the above discussion, the construction of asymptotically correct bootstrap confidence intervals in this situation is suspect.

In case of the Grenander estimator, the LCM of the e.d.f. is another

obvious choice for generating the bootstrap samples, as it is a concave dis-

tribution function. It is probably more natural to expect that bootstrapping

from the LCM of the e.d.f. would work, as it has a well-defined probability

density, while the e.d.f. does not have a density. But this bootstrap proce-

dure is also inconsistent, and we claim that the bootstrap statistic does not

have any conditional weak limit, in probability.

We have derived sufficient conditions for the consistency of bootstrap methods for this problem. Using these conditions we have shown the strong consistency of a smoothed version of bootstrap, and weak consistency of the m out of n bootstrap procedure when generating bootstrap samples from $\mathbb F_n$ and $\tilde F_n$.

A Appendix section

We will use the following lemma to prove Corollary 2.2.

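Lemma A.1 Let $\Psi : \mathbb R \to \mathbb R$ be a function such that $\Psi(h) \le M$ for all $h \in \mathbb R$, for some $M > 0$, and
\[
\lim_{|h| \to \infty} \frac{\Psi(h)}{|h|} = -\infty. \tag{17}
\]
Then there exists $c_0 > 0$ such that for any $c \ge c_0$, $CM_{\mathbb R}[\Psi](h) = CM_{[-c,c]}[\Psi](h)$ for all $|h| \le 1$.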
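The content of Lemma A.1 can be checked numerically on a grid. The sketch below is an illustration under assumptions, not part of the argument: `concave_majorant` is a hypothetical helper computing the least concave majorant of finitely many points, and the test function `psi` (bounded above by 1, with `psi(h)/|h|` tending to minus infinity) is chosen only for demonstration.

```python
import numpy as np

def concave_majorant(h, psi_vals):
    """Values on the grid h of the least concave majorant of the points (h[i], psi_vals[i])."""
    hull = []
    for i in range(len(h)):
        # drop the last vertex while keeping it would break concavity
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            lhs = (psi_vals[b] - psi_vals[a]) * (h[i] - h[b])
            rhs = (psi_vals[i] - psi_vals[b]) * (h[b] - h[a])
            if lhs <= rhs:
                hull.pop()
            else:
                break
        hull.append(i)
    # the majorant is the linear interpolant through the retained vertices
    return np.interp(h, h[hull], psi_vals[hull])

def psi(h):
    """Example function: bounded above and decaying faster than linearly."""
    return -h ** 2 + np.cos(3.0 * h)

# majorant computed on [-4, 4] and on the much larger interval [-10, 10]
h_small = np.linspace(-4.0, 4.0, 801)
h_big = np.linspace(-10.0, 10.0, 2001)
cm_small = concave_majorant(h_small, psi(h_small))
cm_big = concave_majorant(h_big, psi(h_big))
```

For this `psi`, interpolating `cm_small` and `cm_big` onto a common grid in $[-1, 1]$ should give the same values up to rounding, in line with the lemma: enlarging the interval beyond some $c_0$ does not change the majorant near the origin.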

Proof. Note that for any $c > 0$, $CM_{\mathbb R}[\Psi](h) \ge CM_{[-c,c]}[\Psi](h)$ for all $h \in [-c, c]$. Let us define $\Phi_c : \mathbb R \to \mathbb R$ such that $\Phi_c(h) = CM_{[-c,c]}[\Psi](h)$ for $h \in [-1, 1]$, and $\Phi_c$ is the linear extension of $CM_{[-c,c]}[\Psi]\big|_{[-1,1]}$ outside $[-1, 1]$.

We will show that there exists $c_0 > 2$ such that $\Phi_{c_0} \ge \Psi$. Then $\Phi_{c_0}$ will be a concave function everywhere greater than $\Psi$, and thus $\Phi_{c_0} \ge CM_{\mathbb R}[\Psi]$. Hence, $CM_{\mathbb R}[\Psi](h) \le \Phi_{c_0}(h) = CM_{[-c_0,c_0]}[\Psi](h)$ for $h \in [-1, 1]$, yielding the desired result.

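For any $c > 2$, let $\Phi_c(h) = a_c + \Phi_c'(1) h$ for $h \ge 1$. Using the min-max formula, we can bound $\Phi_c'(1)$ as
\[
\begin{aligned}
\Phi_c'(1) &= \min_{-c \le s \le 1}\ \max_{1 \le t \le c} \frac{\Psi(t) - \Psi(s)}{t - s}
\ge \min_{-c \le s \le 1} \frac{\Psi(2) - \Psi(s)}{2 - s} \\
&\ge \min_{-c \le s \le 1} \frac{\Psi(2) - M}{2 - s} = \Psi(2) - M =: B_0 \le 0.
\end{aligned}
\]
We can also bound $a_c$ by using the inequality $\Psi(1) \le \Phi_c(1) = a_c + \Phi_c'(1)$. Thus for $h \ge 1$,
\[
\Phi_c(h) = a_c + \Phi_c'(1) h \ge \{\Psi(1) - \Phi_c'(1)\} + \Phi_c'(1) h
\ge \Psi(1) + (h - 1) B_0 \ge -K_1 h \tag{18}
\]
for some suitably chosen $K_1 > 0$.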

Similarly, for any $c > 2$, let $\Phi_c(h) = b_c + \Phi_c'(-1) h$ for $h \le -1$. We can bound $\Phi_c'(-1)$ as
\[
\begin{aligned}
\Phi_c'(-1) &= \min_{-c \le s \le -1}\ \max_{-1 \le t \le c} \frac{\Psi(t) - \Psi(s)}{t - s}
\le \max_{-1 \le t \le c} \frac{\Psi(t) - \Psi(-2)}{t + 2} \\
&\le \max_{-1 \le t \le c} \frac{M - \Psi(-2)}{t + 2} = M - \Psi(-2) =: B_1 \ge 0.
\end{aligned}
\]
We can also bound $b_c$ by noticing that $\Psi(-1) \le \Phi_c(-1) = b_c - \Phi_c'(-1)$. Thus for $h \le -1$,
\[
\Phi_c(h) = b_c + \Phi_c'(-1) h \ge \Psi(-1) + \Phi_c'(-1)(h + 1)
\ge \{\Psi(-1) + B_1\} + h B_1 \ge K_2 h = -K_2 |h| \tag{19}
\]
for some suitably chosen $K_2 > 0$. Note that $K_1$ and $K_2$ do not depend on the choice of $c$. Given $K = \max\{K_1, K_2\}$, there exists $c_0 > 2$ such that $\Psi(h) \le -K|h|$ for all $|h| \ge c_0$ from Equation (17). But from Equations (18) and (19), $\Phi_{c_0}(h) \ge -K|h|$ for all $|h| \ge 1$. Combining, we get $\Psi(h) \le -K|h| \le \Phi_{c_0}(h)$ for all $|h| \ge c_0 > 1$. Further, we know that $\Phi_{c_0}(h) \ge CM_{[-c_0,c_0]}[\Psi](h) \ge \Psi(h)$ for $|h| \le c_0$. Thus we have been able to show that there exists $c_0 > 2$ such that $\Phi_{c_0} \ge \Psi$. □

We will use the following easily verified fact (see Pollard (1984), page 70).

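Lemma A.2 If $\{X_{n,c}\}$, $\{Y_n\}$, $\{W_c\}$ and $Y$ are sets of random elements taking values in a metric space $(\mathcal X, d)$, $n = 0, 1, \ldots$, and $c \in \mathbb R$ such that for any $\delta > 0$,

(i) $\lim_{c \to \infty} \limsup_{n \to \infty} P\{d(X_{n,c}, Y_n) > \delta\} = 0$,

(ii) $\lim_{c \to \infty} P\{d(W_c, Y) > \delta\} = 0$,

(iii) $X_{n,c} \Rightarrow W_c$ as $n \to \infty$ for every $c \in \mathbb R$.

Then $Y_n \Rightarrow Y$ as $n \to \infty$.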

Proof of Corollary 2.2. For the proof of the corollary we appeal to Lemma A.2. We take $X_{n,c} = m_n^{1/3}\{\tilde f_{n,m_n,c}(t_0) - f_n(t_0)\}$, where $\tilde f_{n,m_n,c}(t_0)$ is the slope at $t_0$ of the LCM of $\mathbb F_{n,m_n}$ restricted to $[t_0 - c m_n^{-1/3}, t_0 + c m_n^{-1/3}]$, and $Y_n = m_n^{1/3}\{\tilde f_{n,m_n}(t_0) - f_n(t_0)\}$. Let us denote by $C_{n,c}$ the LCM of the restriction of $Z_n$ to $[-c, c]$. Also, we take $W_c$ as the left-hand slope at 0 of $C_c$, the LCM of the restriction of $Z$ to $[-c, c]$, and $Y$ as the slope at 0 of $C$, the LCM of $Z$.

Note that as $X_{n,c} = C_{n,c}'(0) = CM_{[-c,c]}[Z_n]'(0)$, an application of the usual continuous mapping theorem (see the lemma on page 330 of Robertson, Wright and Dykstra (1988)) and the uniform convergence of $Z_n$ to $Z$ on $[-c, c]$ with condition (3) of the corollary yields $X_{n,c} \Rightarrow W_c = C_c'(0)$, for every $c$. This shows that condition (iii) of the lemma holds.

To verify condition (ii) of the lemma we will make use of Lemma A.1. For a.e. $\omega$, let $c_0(\omega)$ be the smallest positive integer such that for any $c \ge c_0$, $CM_{\mathbb R}[Z](h) = CM_{[-c,c]}[Z](h)$ for all $|h| \le 1$. Note that such a $c_0$ exists and is finite w.p.1. Then the event $\{W_c \ne Y\} \subset \{c_0 > c\}$ and thus for any $\delta > 0$, $P\{d(W_c, Y) > \delta\} \le P\{c_0 > c\} \to 0$ as $c \to \infty$.

Next we show that condition (i) holds and apply Lemma A.2 to conclude that $Y_n$ converges to $Y$, thereby completing the proof of the corollary. The following series of claims is adapted from the assertion on page 217 of Kim and Pollard (1990).

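Claim 1. Condition (i) of Lemma A.2 follows if we can show the existence of random variables $\{\tau_n\}$ and $\{\sigma_n\}$ of order $O_P(1)$ such that $\tau_n < 0 \le \sigma_n$ and $C_n(\tau_n) = Z_n(\tau_n)$ and $C_n(\sigma_n) = Z_n(\sigma_n)$.

Proof of Claim 1. Let $\varepsilon > 0$ be given. As $\{\tau_n\}$ and $\{\sigma_n\}$ are of order $O_P(1)$, we can get $M_\varepsilon > 0$ such that $\limsup_{n \to \infty} P\{A_\varepsilon\} < \varepsilon$, where $A_\varepsilon = \{\tau_n < -M_\varepsilon\} \cup \{\sigma_n > M_\varepsilon\}$. Take $\omega \in A_\varepsilon^c$. Then $-M_\varepsilon \le \tau_n(\omega) < 0$ and $0 \le \sigma_n(\omega) \le M_\varepsilon$. Note that
\[
\begin{aligned}
Z_n(\tau_n(\omega)) &\le C_{n,c}(\tau_n(\omega)) \le C_n(\tau_n(\omega)) \quad \text{and} \\
Z_n(\sigma_n(\omega)) &\le C_{n,c}(\sigma_n(\omega)) \le C_n(\sigma_n(\omega))
\end{aligned}
\tag{20}
\]
for $c > M_\varepsilon$. From the given condition in the claim we have equality in Equation (20), and by using a property of concave majorants (noted as a remark below) it follows that $C_{n,c}(h)(\omega) = C_n(h)(\omega)$ for all $h \in [\tau_n, \sigma_n]$.

Remark. Let $[a, b] \subset B \subset \mathbb R$ and suppose that $CM_{[a,b]}(g)(x_1) = CM_B(g)(x_1)$ and $CM_{[a,b]}(g)(x_2) = CM_B(g)(x_2)$, for $x_1 < x_2$ in $[a, b]$. Then $CM_{[a,b]}(g)(t) = CM_B(g)(t)$ for all $t$ in $[x_1, x_2]$.

Thus, $X_{n,c}(\omega) = Y_n(\omega)$. Therefore, $A_\varepsilon^c \subset \{X_{n,c} = Y_n\}$, which implies that for any $\delta > 0$, $\limsup_{n \to \infty} P\{d(X_{n,c}, Y_n) > \delta\} \le \limsup_{n \to \infty} P\{A_\varepsilon\} < \varepsilon$, for $c > M_\varepsilon$. □

Therefore it suffices to show that we can construct random variables $\tau_n$ and $\sigma_n$ of order $O_P(1)$ so that $C_n(\tau_n) = Z_n(\tau_n)$ and $C_n(\sigma_n) = Z_n(\sigma_n)$ for $\tau_n < 0 \le \sigma_n$.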

Claim 2. There exist random variables $\{\tau_n\}$ and $\{\sigma_n\}$ of order $O_P(1)$ such that $\tau_n < 0$, $\sigma_n \ge 0$ and $C_n(\tau_n) = Z_n(\tau_n)$ and $C_n(\sigma_n) = Z_n(\sigma_n)$.

Proof of Claim 2. Let $K_n$ denote the LCM of $\mathbb F_{n,m_n}$. The line through $(t_0, K_n(t_0))$ with slope $\tilde f_{n,m_n}(t_0)$ must lie above $\mathbb F_{n,m_n}$, touching it at the two points $t_0 - L_n$ and $t_0 + R_n$, where $L_n > 0$ and $R_n \ge 0$. Note that $t_0 - L_n$ and $t_0 + R_n$ are the nearest points to $t_0$ such that $K_n$ and $\mathbb F_{n,m_n}$ coincide. The line segment from $(t_0 - L_n, \mathbb F_{n,m_n}(t_0 - L_n))$ to $(t_0 + R_n, \mathbb F_{n,m_n}(t_0 + R_n))$ makes up part of $K_n$. It will suffice to show that $L_n = O_P(m_n^{-1/3})$, as then $\tau_n := -m_n^{1/3} L_n = O_P(1)$. The argument depends on the inequality
\[
K_n(t_0) + \tilde f_{n,m_n}(t_0)\,\beta \ge \mathbb F_{n,m_n}(t_0 + \beta) \quad \text{for all } \beta,
\]
with equality at $\beta = -L_n$ and $\beta = R_n$.

Let $\Gamma_n(\beta) = \mathbb F_{n,m_n}(t_0 + \beta) - \mathbb F_{n,m_n}(t_0) - \beta \tilde f_{n,m_n}(t_0)$. $\Gamma_n$ is the distance between $\mathbb F_{n,m_n}(t_0 + \beta)$ and $\mathbb F_{n,m_n}(t_0) + \beta \tilde f_{n,m_n}(t_0)$. It follows that $\Gamma_n(\beta)$ achieves its maximum at $\beta = -L_n$ and $\beta = R_n$ and $\Gamma_n(-L_n) = \Gamma_n(R_n)$. We can easily show using condition (a) that $L_n$, $R_n$ and $\gamma_n := \tilde f_{n,m_n}(t_0) - f_n(t_0)$ are of order $o_P(1)$. That lets us argue locally. Let
\[
g_n(y, \beta) := 1\{y \le t_0 + \beta\} - 1\{y \le t_0\} - f_n(t_0)\,\beta.
\]

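Claim 3. For any $\varepsilon > 0$, we have
\[
\frac{1}{m_n} \left| \sum_{i=1}^{m_n} \bigl\{ g_n(X_{n,i}, \beta) - E g_n(X_{n,i}, \beta) \bigr\} \right| \le \varepsilon \beta^2 + O_P(m_n^{-2/3})
\]
uniformly over $\beta$ in a neighborhood of zero.

For the time being, we assume Claim 3, which is proved later in the Appendix. From condition (d), $|E g_n(\cdot, \beta) - \frac12 \beta^2 f'(t_0)| \le \varepsilon\beta^2 + o(\beta^2) + O(m_n^{-2/3})$ for sufficiently large $n$. Thus
\[
\begin{aligned}
\bigl| \Gamma_n(\beta) + \beta\gamma_n - \tfrac12 \beta^2 f'(t_0) \bigr|
&= \bigl| \mathbb F_{n,m_n}(t_0 + \beta) - \mathbb F_{n,m_n}(t_0) - \beta f_n(t_0) - \tfrac12 \beta^2 f'(t_0) \bigr| \\
&\le 2\varepsilon\beta^2 + o(\beta^2) + O_P(m_n^{-2/3})
\end{aligned}
\tag{21}
\]
uniformly for $\beta$ in a neighborhood of 0 by Claim 3 and the triangle inequality.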

As $f'(t_0) < 0$, as $n \to \infty$ there exist constants $c_1, c_2 > 0$ such that, with probability tending to 1, for $\beta$ in a small neighborhood of 0,
\[
-\tfrac12 c_2 \beta^2 - \beta\gamma_n - O_P(m_n^{-2/3}) \le \Gamma_n(\beta) \le -\tfrac12 c_1 \beta^2 - \beta\gamma_n + O_P(m_n^{-2/3}).
\]
The quadratic $-\frac12 c_1 \beta^2 - \beta\gamma_n$ assumes its maximum of $\frac12 \gamma_n^2 / c_1$ at $-\gamma_n / c_1$, and takes negative values for those $\beta$ with the same sign as $\gamma_n$. It follows that with probability tending to 1,
\[
\max_\beta \Gamma_n(\beta) = \min\bigl(\Gamma_n(-L_n), \Gamma_n(R_n)\bigr) \le O_P(m_n^{-2/3}).
\]
We also have
\[
\max_\beta \Gamma_n(\beta) \ge \Gamma_n(-\gamma_n / c_2) \ge \tfrac12 \gamma_n^2 / c_2 - O_P(m_n^{-2/3}).
\]
These two bounds imply that $\gamma_n = O_P(m_n^{-1/3})$. With this rate of convergence for $\gamma_n$ we can now deduce from the inequalities
\[
0 = \Gamma_n(0) \le \Gamma_n(-L_n) \le -\tfrac12 c_1 (L_n - \gamma_n / c_1)^2 + \tfrac12 \gamma_n^2 / c_1 + O_P(m_n^{-2/3})
\]
that $L_n = O_P(m_n^{-1/3})$, as required. Similarly, we can show that $R_n = O_P(m_n^{-1/3})$.

Proof of Claim 3. Let us define $G_n(\beta)$ as
\[
\frac{1}{m_n} \sum_{i=1}^{m_n} g_n(X_{n,i}, \beta) - E g_n(\cdot, \beta) = (\mathbb F_{n,m_n} - F_n)(t_0 + \beta) - (\mathbb F_{n,m_n} - F_n)(t_0).
\]
We will show that $|G_n(\beta)| \le \varepsilon\beta^2 + m_n^{-2/3} M_n^2$ uniformly over a neighborhood of 0, for $M_n$ of order $O_P(1)$. We fix a neighborhood $[-b, b]$ for $\beta$ obtained from condition (e). We define $M_n(\omega)$ as the infimum (possibly $+\infty$) of those values for which the asserted uniform inequality holds. Let us define $A(n, j)$ to be the set of those $\beta$ in $[-b, b]$ for which $(j - 1) m_n^{-1/3} \le |\beta| < j m_n^{-1/3}$. Then for $m$ constant,
\[
\begin{aligned}
P\{M_n > m\} &\le P\bigl\{\exists \beta \in [-b, b] : |G_n(\beta)| > \varepsilon\beta^2 + m_n^{-2/3} m^2 \bigr\} \\
&\le \sum_{j : j m_n^{-1/3} \le b} P\bigl\{\exists \beta \in A(n, j) : m_n^{2/3} |G_n(\beta)| > \varepsilon (j - 1)^2 + m^2 \bigr\} \\
&\le \sum_{j : j m_n^{-1/3} \le b} \frac{E\Bigl( \sup_{|\beta| < j m_n^{-1/3}} m_n^{4/3} |G_n(\beta)|^2 \Bigr)}{\{\varepsilon (j - 1)^2 + m^2\}^2} \\
&\le \sum_{j : j m_n^{-1/3} \le b} \frac{C' j}{\{\varepsilon (j - 1)^2 + m^2\}^2}
\end{aligned}
\tag{22}
\]
for $m_n$ sufficiently large. The last inequality follows from a maximal inequality as in part (ii) of Result 3.1 of Kim and Pollard (1990) and using condition (e). To be more precise, fix $j \ge 1$ such that $j m_n^{-1/3} \le b$ and let $\mathcal F := \{h_\beta : |\beta| < j m_n^{-1/3}\}$ be a collection of functions where $h_\beta(x) = 1\{x \le t_0 + \beta\} - 1\{x \le t_0\}$. Note that $\mathcal F$ is a class of functions with envelope function $H(x) = 1\{x \le t_0 + j m_n^{-1/3}\} - 1\{x \le t_0 - j m_n^{-1/3}\}$. From the maximal inequality in 3.1 of Kim and Pollard (1990) we can bound $m_n^{4/3} E(\sup_{\mathcal F} |G_n(\beta)|^2)$ by
\[
J^2(1)\, m_n^{1/3} \bigl\{ F_n(t_0 + j m_n^{-1/3}) - F_n(t_0 - j m_n^{-1/3}) \bigr\} \le C' j
\]
for $n$ sufficiently large, by adding and subtracting $F_n(t_0)$ and using condition (e), where $J$ is a continuous and increasing function with $J(0) = 0$ and $J(1) < \infty$, not depending on $n$, and $C'$ is a constant. We can therefore ensure that the sum in Equation (22) is suitably small for large $m_n$ by choosing $m$ large enough. This proves the claim. □

Proof of Corollary 3.4. To prove the corollary we appeal to Lemma A.2 by establishing conditions (i)-(iii) (in the lemma) with $X_{n,c} = CM_{[-c,c]}[Z_n]'(0)$, $Y_n = CM_{I_n}[Z_n]'(0)$, $W_c = CM_{[-c,c]}[Z]'(0)$ and $Y = CM_{\mathbb R}[Z]'(0)$. Note that the process $Z$ satisfies conditions (1)-(3) of Corollary 2.2 and so condition (ii) of the lemma holds. An application of the continuous mapping theorem and the uniform convergence of $Z_n$ to $Z$ on $[-c, c]$ yields condition (iii).

If we can show that $(\tau_n, \sigma_n)$, defined as in the proof of Claim 2 of Corollary 2.2 with $m_n = n$, $F_n = \mathbb F_n$, and $\mathbb F_{n,m_n} = \mathbb F_n^*$, are of order $O_P(1)$, then using Claim 1 in the proof of Corollary 2.2 we can establish condition (i). But this step requires a bit of work. Although the argument is similar to that of the proof of Claim 2 of Corollary 2.2, there are some subtle differences. Note that here we want to study the unconditional behavior of $(\tau_n, \sigma_n)$, and so $F_n = \mathbb F_n$ cannot be treated as fixed.

As a first step we show that slightly modified versions of conditions (a), (d) and (e), to be used later in the proof, are satisfied. Condition (a) trivially holds a.s. Condition (e) also holds a.s. and can be verified using Equation (8). Note that the neighborhood for $\beta$ around 0 in condition (e) can be chosen to be a fixed interval a.s. (not depending on $X$, but possibly on $F$). Let $\varepsilon > 0$ be given. We show that condition (d) holds with $O(m_n^{-2/3})$ replaced by $O_P(n^{-2/3})$. The term of interest can be grouped as
\[
\begin{aligned}
& \bigl| \mathbb F_n(t_0 + \beta) - \mathbb F_n(t_0) - \beta \tilde f_n(t_0) - \tfrac12 \beta^2 f'(t_0) \bigr| \\
&\quad \le \bigl| (\mathbb F_n - F)(t_0 + \beta) - (\mathbb F_n - F)(t_0) \bigr| + |\beta| \bigl| \tilde f_n(t_0) - f(t_0) \bigr|
 + \bigl| F(t_0 + \beta) - F(t_0) - \beta f(t_0) - \tfrac12 \beta^2 f'(t_0) \bigr| \\
&\quad \le \varepsilon\beta^2 + o(\beta^2) + O_P(n^{-2/3}).
\end{aligned}
\tag{23}
\]
The first term can be bounded by $O_P(n^{-2/3}) + \frac12 \varepsilon\beta^2$ uniformly for $\beta$ in a small neighborhood of 0, using Claim 3 in the proof of Corollary 2.2 with $\mathbb F_{n,m_n} = \mathbb F_n$ and $F_n = F$. The second term $|\beta| |\tilde f_n(t_0) - f(t_0)|$ can be bounded by
\[
\tfrac12 \varepsilon\beta^2 + \frac{1}{2\varepsilon} \bigl| \tilde f_n(t_0) - f(t_0) \bigr|^2 = \tfrac12 \varepsilon\beta^2 + O_P(n^{-2/3}). \tag{24}
\]
By Taylor expansion it is easy to see that the third term is of order $o(\beta^2)$.
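The bound in Equation (24) is an instance of an elementary weighted inequality for products. As a short worked step, with the shorthand $a$ and $b$ introduced here only for illustration:

```latex
% Weighted product inequality, applied with a = |\beta| and
% b = |\tilde f_n(t_0) - f(t_0)|:
\[
0 \le \Bigl(\sqrt{\varepsilon}\,a - \tfrac{1}{\sqrt{\varepsilon}}\,b\Bigr)^2
  = \varepsilon a^2 - 2ab + \tfrac{1}{\varepsilon}\,b^2
\quad\Longrightarrow\quad
ab \le \tfrac{1}{2}\,\varepsilon a^2 + \tfrac{1}{2\varepsilon}\,b^2 .
\]
% Since n^{1/3}\{\tilde f_n(t_0) - f(t_0)\} = O_P(1), we get b^2 = O_P(n^{-2/3}).
```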

Next we define $L_n$, $R_n$ and $\gamma_n$ as in Claim 2. It is easy to show that $L_n$, $R_n$ and $\gamma_n$ are of order $o_P(1)$, using condition (a). The crux of the argument in the proof of Claim 2 of Corollary 2.2 is establishing Equation (21) uniformly for $\beta$ in a neighborhood of 0. We show that Equation (21) still holds unconditionally in our context, thereby yielding $(\tau_n, \sigma_n) = O_P(1)$, from the discussion succeeding the equation. Observe that
\[
\bigl| \Gamma_n(\beta) + \beta\gamma_n - \tfrac12 \beta^2 f'(t_0) \bigr| = \bigl| \mathbb F_n^*(t_0 + \beta) - \mathbb F_n^*(t_0) - \beta \tilde f_n(t_0) - \tfrac12 \beta^2 f'(t_0) \bigr|
\]
can be bounded by the sum of $\bigl| \mathbb F_n(t_0 + \beta) - \mathbb F_n(t_0) - \beta \tilde f_n(t_0) - \frac12 \beta^2 f'(t_0) \bigr|$ and $|(\mathbb F_n^* - \mathbb F_n)(t_0 + \beta) - (\mathbb F_n^* - \mathbb F_n)(t_0)|$. Equation (23) is employed to bound the first term, whereas the following result
\[
|(\mathbb F_n^* - \mathbb F_n)(t_0 + \beta) - (\mathbb F_n^* - \mathbb F_n)(t_0)| \le \varepsilon\beta^2 + O_P(n^{-2/3})
\]
bounds the second. Combining, we have
\[
\bigl| \Gamma_n(\beta) + \beta\gamma_n - \tfrac12 \beta^2 f'(t_0) \bigr| \le 2\varepsilon\beta^2 + o(\beta^2) + O_P(n^{-2/3})
\]
for $\beta$ in a neighborhood of 0. Note that an application of the maximal inequality as in the proof of Claim 3, conditional on $X$, gives us the bound
\[
|(\mathbb F_n^* - \mathbb F_n)(t_0 + \beta) - (\mathbb F_n^* - \mathbb F_n)(t_0)| \le \varepsilon\beta^2 + T_n
\]

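uniformly for $\beta$ in a neighborhood of 0, not depending on $X$, where $T_n = O_{P^*}(n^{-2/3})$ a.s. From the following series of inequalities it follows that $T_n = O_P(n^{-2/3})$. Suppose that $\{S_n\}$ is a sequence of random variables that are $O_{P^*}(1)$ a.s., i.e.,
\[
\lim_{T \to \infty} \limsup_{n \to \infty} P^*\{|S_n| \ge T\} = 0 \quad \text{a.s.};
\]
then
\[
\begin{aligned}
\lim_{T \to \infty} \limsup_{n \to \infty} P\{|S_n| \ge T\}
&= \lim_{T \to \infty} \limsup_{n \to \infty} E\bigl[ P^*\{|S_n| \ge T\} \bigr] \\
&\le \lim_{T \to \infty} E\Bigl[ \limsup_{n \to \infty} P^*\{|S_n| \ge T\} \Bigr]
= E\Bigl[ \lim_{T \to \infty} \limsup_{n \to \infty} P^*\{|S_n| \ge T\} \Bigr] = 0
\end{aligned}
\]
by an application of Fatou's lemma and the dominated convergence theorem. □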

Proof of Theorem 3.7 (iii). We use Lemma A.2 to prove the result. Note that here $(\tau_n, \sigma_n)$ are defined as in the proof of Claim 2 of Corollary 2.2 with $m_n = n$, $F_n = \tilde F_n$, and $\mathbb F_{n,m_n} = \mathbb F_n^*$. The proof is very similar to that of Corollary 3.4. We only need to show that $(\tau_n, \sigma_n)$ are of order $O_P(1)$. Conditions (a) and (e) hold a.s. It is enough to show that condition (d) holds with the $O(m_n^{-2/3})$ term replaced by $O_P(n^{-2/3})$, with probability increasing to 1; as then Equation (21) holds, and from the discussion succeeding the equation it follows that $(\tau_n, \sigma_n)$ are of order $O_P(1)$.

Let $\varepsilon > 0$ be given. Without loss of generality we can assume that $f'(t_0) < -4\varepsilon$. It is enough to show that
\[
\bigl| \tilde F_n(t_0 + \beta) - \mathbb F_n(t_0) - \beta f(t_0) - \tfrac12 \beta^2 f'(t_0) \bigr| \le 2\varepsilon\beta^2 + O_P(n^{-2/3}) \tag{25}
\]
uniformly in a neighborhood of 0, as we can bound the left-hand side of (d) by $\bigl| \tilde F_n(t_0 + \beta) - \mathbb F_n(t_0) - \beta f(t_0) - \frac12 \beta^2 f'(t_0) \bigr| + \bigl| \tilde F_n(t_0) - \mathbb F_n(t_0) \bigr| + |\beta| |\tilde f_n(t_0) - f(t_0)|$, where the second term is $O_P(n^{-2/3})$ (by Theorem 1 of Wang (1994)) and the third term can be bounded by $\varepsilon\beta^2 + O_P(n^{-2/3})$ (see Equation (24)).

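Given $\varepsilon$, there exists a neighborhood of 0 for $\beta$ such that
\[
\bigl| F(t_0 + \beta) - F(t_0) - \beta f(t_0) - \tfrac12 \beta^2 f'(t_0) \bigr| \le \varepsilon\beta^2
\]
by the twice differentiability of $F$ at $t_0$. Thus, there exists $\delta > 0$ such that
\[
\bigl| \mathbb F_n(t_0 + \beta) - \mathbb F_n(t_0) - \beta f(t_0) - \tfrac12 \beta^2 f'(t_0) \bigr| \le 2\varepsilon\beta^2 + O_P(n^{-2/3}) \tag{26}
\]
uniformly for $\beta \in [-2\delta, 2\delta]$, by the discussion following Equation (23). Therefore, for $\beta \in [-2\delta, 2\delta]$,
\[
\mathbb F_n(t_0 + \beta) - \mathbb F_n(t_0) \le 2\varepsilon\beta^2 + \beta f(t_0) + \tfrac12 \beta^2 f'(t_0) + O_P(n^{-2/3}). \tag{27}
\]
Letting $\tilde F_n^\delta$ be the LCM of the restriction of $\mathbb F_n$ to $[t_0 - 2\delta, t_0 + 2\delta]$, we have
\[
\tilde F_n^\delta(t_0 + \beta) - \mathbb F_n(t_0) \le 2\varepsilon\beta^2 + \beta f(t_0) + \frac{\beta^2}{2} f'(t_0) + O_P(n^{-2/3})
\]
for $\beta \in [-2\delta, 2\delta]$, by taking concave majorants on both sides of Equation (27) and realizing that the $O_P(n^{-2/3})$ term is uniform in $\beta$; the right-hand side is concave in $\beta$ because $f'(t_0) < -4\varepsilon$. Since $\tilde F_n \ge \mathbb F_n$, it is immediate from Equation (26) that
\[
\tilde F_n(t_0 + \beta) - \mathbb F_n(t_0) \ge -2\varepsilon\beta^2 + \beta f(t_0) + \frac{\beta^2}{2} f'(t_0) - O_P(n^{-2/3}).
\]
Letting
\[
A_n := \bigl\{ \tilde F_n^\delta(t_0 + \beta) = \tilde F_n(t_0 + \beta) \text{ for all } \beta \in [-\delta, \delta] \bigr\},
\]
it is easy to show from the strict concavity of $F$ around $t_0$ that $\lim_{n \to \infty} P\{A_n\} = 1$ (for a complete proof of this see Proposition 6.1 of Wang and Woodroofe (2007)). Thus Equation (25) holds with probability tending to 1 on $[-\delta, \delta]$. This completes the argument. □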

References

[1] Abrevaya, J. and Huang, J. (2005). On the Bootstrap of the Maximum Score Estimator. Econometrica, 73 1175–1204.

[2] Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H. and Tukey, J. W. (1972). Robust Estimates of Location. Princeton Univ. Press, Princeton, N.J.

[3] Bickel, P. and Freedman, D. (1981). Some Asymptotic Theory for the Bootstrap. Ann. Statist., 9 1196–1217.

[4] Breiman, L. (1968). Probability. Addison-Wesley Series in Statistics.

[5] Brunk, H. D. (1968). Estimation of Isotonic Regression. Nonparametric Techniques in Statistical Inference, 177–195. Cambridge Univ. Press.

[6] Chernoff, H. (1964). Estimation of the mode. Ann. Inst. Statist. Math., 16 31–41.

[7] Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York-Akadémiai Kiadó, Budapest.

[8] Devroye, L. (1987). A Course in Density Estimation. Birkhäuser, Boston.

[9] Grenander, U. (1956). On the theory of mortality measurement, Part II. Skand. Akt., 39 125–153.

[10] Kiefer, J. and Wolfowitz, J. (1976). Asymptotically minimax estimation of concave and convex distribution functions. Z. Wahrsch. Verw. Gebiete, 34 73–85.

[11] Kim, J. and Pollard, D. (1990). Cube-root Asymptotics. Ann. Statist., 18 191–219.

[12] Komlós, J., Major, P. and Tusnády, G. (1975). An approximation of partial sums of independent RV's and the sample DF. I. Z. Wahrsch. Verw. Gebiete, 32 111–131.

[13] Kosorok, M. (2007). Bootstrapping the Grenander estimator. Beyond Parametrics in Interdisciplinary Research: Festschrift in honour of Professor Pranab K. Sen. IMS Lecture Notes and Monograph Series. Eds.: N. Balakrishnan, E. Peña and M. Silvapulle.

[14] Lee, S. M. S. and Pun, M. C. (2006). On m out of n Bootstrapping for Nonstandard M-Estimation With Nuisance Parameters. J. Amer. Statist. Assoc., 101 1185–1197.

[15] Léger, C. and MacGibbon, B. (2006). On the bootstrap in cube root asymptotics. Can. J. Statist., 34 29–44.

[16] Politis, D. N., Romano, J. P. and Wolf, M. (1999). Subsampling. Springer-Verlag, New York.

[17] Pollard, D. (1984). Convergence of Stochastic Processes. Springer-Verlag, New York. Available at http://www.stat.yale.edu/~pollard/1984book/pollard1984.pdf

[18] Prakasa Rao, B. L. S. (1969). Estimation of a unimodal density. Sankhyā Ser. A, 31 23–36.

[19] Robertson, T., Wright, F. T. and Dykstra, R. L. (1988). Order Restricted Statistical Inference. Wiley, New York.

[20] Rousseeuw, P. J. (1984). Least median of squares regression. J. Amer. Statist. Assoc., 79 871–880.

[21] Shao, J. and Tu, D. (1995). The Jackknife and Bootstrap. Springer-Verlag, New York.

[22] Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.

[23] Singh, K. (1981). On asymptotic accuracy of Efron's bootstrap. Ann. Statist., 9 1187–1195.

[24] Wang, X. and Woodroofe, M. (2007). A Kiefer Wolfowitz Comparison Theorem for Wicksell's Problem. Ann. Statist., 35 1559–1575.

[25] Wang, Y. (1994). The limit distribution of the concave majorant of an empirical distribution function. Statist. Probab. Lett., 20 81–84.
