Inference for moments of ratios with robustness against large trimming bias and unknown convergence rate∗

Yuya Sasaki
Department of Economics, Vanderbilt University

Takuya Ura
Department of Economics, University of California, Davis

October 26, 2017
Abstract
We consider statistical inference for moments of the form E[B/A]. A naïve sample mean is unstable with small values of the denominator variable, A. This paper develops a method of robust inference, and proposes a data-driven practical choice of trimming observations with small values of A. Our sense of the robustness is twofold. First, bias correction allows for robustness against large trimming bias. Second, adaptive inference allows for robustness against unknown convergence rate. The proposed method allows for closer-to-optimal trimming, and thus more informative inference results in practice. Simulation studies and real data analysis demonstrate this practical advantage.
Keywords: bias correction, ratio, robust inference, trimming, unknown convergence rate.
∗We benefited from useful comments by Federico A. Bugni, V. Joseph Hotz, Kengo Kato, Ulrich Müller, Liang Peng, Adam M. Rosen, Yichong Zhang, and seminar participants at Duke. All remaining errors are ours.
arXiv:1709.00981v3 [stat.ME] 25 Oct 2017
1 Introduction
Moments of ratios of the form E[B/A] are ubiquitous in empirical research. Summary
tables often report statistics of ratios of random variables – see for example the well-cited
paper by Rajan and Zingales (1998) among others. In addition, there are a number of
specific research methods in which moments of ratios are quantities of direct interest – in-
cluding inverse probability weighting (Horvitz and Thompson, 1952), binary choice models
with special regressor (Lewbel, 1997), and correlated random coefficient panel data models
(Graham and Powell, 2012), to list a few. When some observations have values of the
denominator A close to zero, they behave as outliers in terms of the ratio, B/A, and thus
can exert a large influence on the naïve sample mean statistics.
To avoid this problem, researchers often trim away those observations with small values
of the denominator variable A. For example, the well-cited paper by Crump, Hotz, Imbens
and Mitnik (2009) proposes to trim away observations with the denominator less than 0.1
for estimating average treatment effects. Trimming indeed mitigates the potentially large
variance, but it does so at the cost of increased bias in general. Furthermore, trimmed
estimators of moments of ratios are known to be associated with an unknown convergence
rate.1 Ideally, we want a method of inference to be robust against these two issues, namely
trimming bias and unknown convergence rate. In this paper, we develop such a method of
inference for moments of ratios with the twofold robustness.
In order to achieve the first sense of robustness, i.e., against large trimming bias, we need
to carefully choose a trimming threshold and to appropriately remove the trimming bias to
the extent that the bias no longer affects the center of the asymptotic distribution relative
to the variance. We develop a method of bias-corrected inference by explicitly estimating
1A large body of the literature discusses asymptotic distribution theories for trimmed sums – see Griffin and Pruitt (1987); Csörgő, Haeusler, and Mason (1988); Griffin and Pruitt (1989) and references therein.
and removing the trimming bias to the necessary extent, and accordingly propose a practical
and systematic choice of trimming to meet the theoretical requirements. We are far from the
first to take this approach, and this idea is motivated by Cattaneo, Crump, and Jansson
(2010, 2014), Calonico, Cattaneo and Titiunik (2014), Calonico, Cattaneo, and Farrell
(2017), and in particular, Graham and Powell (2012) for the similarity in terms of the
irregularly identified estimand and trimmed estimation.
The second sense of robustness we aim to achieve is against an unknown convergence
rate. The asymptotic variance of trimmed estimators for moments of ratios converges at
the parametric√n rate in “regular” cases, whereas its convergence rate can be as slow
as the nonparametric rates in “irregular” cases – see Khan and Tamer (2010). Inference
should be robustly valid even without a prior knowledge of a researcher about whether the
case is regular or irregular. In order to accommodate this issue, we take advantage of the
rate-adaptive inference method based with self-normalized processes (cf. Pena, Lai, and
Shao, 2009), and extend it with the aforementioned bias correction approach to acquire the
twofold robustness.
Romano and Wolf (2000), Peng (2001, 2004) and Chaudhuri and Hill (2016) discuss
inference for the mean without finite second moments as we do in this paper. In particular,
our approach of rate-adaptive inference in conjunction with trimming bias correction is
closely related to that of Peng (2001) and Chaudhuri and Hill (2016). In fact, by using
the information of the ratio structure, our method complements and adds practical value
to the preceding ideas of Peng (2001) and Chaudhuri and Hill (2016) in a few dimensions.
First, we introduce a data-driven selection of trimming threshold in a systematic way and
consistently with the asymptotic theory. Second, our method circumvents the need to pre-
estimate the tail index. Third, our framework allows for use of larger trimming thresholds
which enables faster convergence rates. We emphasize that we actively use the information
of the ratio structure, and a direct comparison of advantages and disadvantages between
our framework and this heavy-tail literature is not straightforward. The bias correction
approach based on the local polynomial expression of the bias near 'zero' (as opposed to
infinity) is made feasible by our approach of trimming based on the denominator. This
aspect of our method is crucial for these two practical contributions.
Notation: $E[X]$ and $\mathrm{Var}(X)$ denote the expected value and the variance of a random variable $X$, respectively. Their sample counterparts are denoted by $E_n[X] = n^{-1}\sum_{i=1}^{n} X_i$ and $\mathrm{Var}_n(X) = E_n[X^2] - E_n[X]^2$. Convergence in probability and convergence in distribution are denoted by $\to_p$ and $\to_d$, respectively. $1\{\cdot\}$ denotes the indicator function.
Outline of the paper: The rest of this paper is organized as follows. In Section 2, we present the main results of our method of inference with the twofold robustness. In Section 3, we provide a practical recipe for estimation and inference. In Section 4, we conduct simulation studies to evaluate the performance of our method. In Section 5, we apply our method to real data. All mathematical proofs are relegated to Section A in the appendix.
2 Main Results
Given an i.i.d. sample of the random vector $(A, B)$, consider the problem of estimating
\[
\beta = E\!\left[\frac{B}{A}\right]. \tag{2.1}
\]
This estimand β exists under part (i) of the following assumption on finite moments.

Assumption 1. (i) $E\left[\left|\frac{B}{A}\right|\right] < \infty$. (ii) $E[B^4] < \infty$. (iii) $\mathrm{Var}\!\left(\frac{B}{A}\right) \neq 0$.

Part (ii) states that the possibility of an infinite $\mathrm{Var}(B/A)$ is attributed to small values of A rather than to a heavy-tailed distribution of B. Part (iii) assumes away the trivial case where B/A is degenerate. For simplicity of writing, we consider the case where A is supported on the half line $\mathbb{R}_+$, although this restriction is not crucial for the substance of the main results.
Because the integrand may take large values if A is close to zero, it is natural to consider
the regularized estimator defined by
\[
\hat{\beta}_h = E_n\!\left[\frac{B}{A}\cdot 1\{A > h\}\right] \tag{2.2}
\]
with a trimming threshold value h > 0. The idea behind this estimator is to trim away
those observations that have very small values of A in the denominator of the estimand.
For a fixed trimming threshold h, the mean of the regularized estimator is
\[
\beta_h = E\!\left[\frac{B}{A}\cdot 1\{A > h\}\right], \tag{2.3}
\]
which exists under Assumption 1 (i). The difference, $\beta - \beta_h$, is the bias of the regularized estimator $\hat{\beta}_h$ for the true estimand β, which we hereafter refer to as the trimming bias. The order of this trimming bias depends on the specific application, but it may well be as slow as the linear order of h in many plausible applications.
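To fix ideas, the regularized estimator (2.2) amounts to a one-line computation. The sketch below assumes a NumPy environment with hypothetical array names `a` and `b` for the samples of A and B; it zeros out the trimmed summands while keeping the full sample size in the denominator, as the definition requires.

```python
import numpy as np

def trimmed_ratio_mean(a, b, h):
    """Regularized estimator (2.2): sample mean of (B/A) * 1{A > h}.

    Observations with denominator A <= h are trimmed: their summands
    are set to zero, but the average is still over the full sample size.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    keep = a > h
    # Guard the division so trimmed entries never divide by a tiny A.
    ratio = np.where(keep, b / np.where(keep, a, 1.0), 0.0)
    return float(ratio.mean())
```

For example, with a = (0.05, 0.5, 2.0), b = (1, 1, 1), and h = 0.1, the first observation is trimmed and the estimate is (2 + 0.5)/3.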
The main difficulty lies in that a naïve bias estimate, $E_n\!\left[\frac{B}{A}\cdot 1\{A \le h\}\right]$, may entail infinite variance. We take advantage of the fact that the bias term, $E\!\left[\frac{B}{A}\cdot 1\{A \le h\}\right]$, can be approximated by estimable objects with bounded variances. Graham and Powell (2012, p. 2125) suggest a similar idea in the context of correlated random coefficient panel data models. To the best of our knowledge, however, this idea has not been formally established for robust inference. Moreover, it is unclear whether this idea is generically applicable beyond the framework of Graham and Powell (2012).
Before proceeding with our formal results, we briefly describe the intuition behind our approach. Assumption 1 (i) and suitable regularity conditions imply $E[B \mid A = 0]\cdot f_A(0) = 0$, because we can write $E\!\left[\frac{B}{A}\right] = \int_0^\infty \frac{E[B \mid A = a]\cdot f_A(a)}{a}\,da$. Thus, the bias term, $E\!\left[\frac{B}{A}\cdot 1\{A \le h\}\right]$, can be approximated as
\[
E\!\left[\frac{B}{A}\cdot 1\{A \le h\}\right]
= \int_0^h \frac{E[B \mid A = a]\cdot f_A(a)}{a}\,da
= \int_0^h \frac{E[B \mid A = a]\cdot f_A(a) - E[B \mid A = 0]\cdot f_A(0)}{a}\,da
\approx h\cdot \frac{d}{da}\, E[B \mid A = a]\cdot f_A(a)\,\Big|_{a=0}.
\]
Furthermore, the derivative $\frac{d}{da}\, E[B \mid A = a]\cdot f_A(a)\,\big|_{a=0}$ in the last expression can be estimated by the derivative of the numerator of the Nadaraya–Watson estimator, whose statistical properties are well studied – see Ullah and Vinod (1993, p. 94) and references therein. For robust inference, we adjust the asymptotic variance by incorporating the variability of the approximate bias estimator, following the idea of Calonico, Cattaneo and Titiunik (2014).
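To make this derivative estimation concrete, the following sketch implements the first-derivative case with a rescaled biweight kernel K(u) = 30u²(1 − u)² on (0, 1). The kernel is our illustrative choice, not prescribed by the paper; any kernel of the type required by the formal conditions below would do, and the function names are ours.

```python
import numpy as np

def K1(u):
    """First derivative of the rescaled biweight kernel
    K(u) = 30 u^2 (1 - u)^2, supported on (0, 1)."""
    return 30.0 * (2.0 * u - 6.0 * u**2 + 4.0 * u**3)

def tau1_prime_hat(a, b, h):
    """Kernel estimator of d/da { f_A(a) E[B | A = a] } at a = 0:
    the sample mean of (-1) * B / h^2 * K'(A / h)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    u = a / h
    # K' has support (0, 1); observations outside get zero weight.
    w = np.where((u > 0.0) & (u < 1.0), K1(np.clip(u, 0.0, 1.0)), 0.0)
    return float(np.mean(-b * w / h**2))
```

Only observations with 0 < A < h contribute, which is exactly why this object captures the local behavior of the bias near zero.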
2.1 Bias Correction
This subsection discusses means to mitigate the trimming bias for the purpose of conducting
valid inference. The bias reduction is based on the continuous distribution of A and the
smoothness of its density fA as well as the smoothness of the conditional expectation
function E[B|A = · ] as concretely suggested by Assumptions 2 and 3 below.
Assumption 2 (Continuous Distribution of A). The distribution of A is absolutely continuous in a neighborhood of 0, with its density function denoted by $f_A$.
Under this assumption, we introduce the short-hand notation
\[
\tau_j(a) = f_A(a)\cdot E[B^j \mid A = a] \tag{2.4}
\]
for $j \in \mathbb{N}$. It is argued below that part (i) of the following assumption allows for bias correction up to the order of $h^k$ for an integer $k > 1$.
Assumption 3 (Smoothness). (i) $\tau_1$ is $k$-times continuously differentiable with a bounded $k$-th derivative in a neighborhood of 0 for an integer $k > 1$. (ii) $\tau_2$, $\tau_3$, and $\tau_4$ are continuously differentiable with bounded first derivatives in a neighborhood of 0.
Lemma 2.1 (Bias Correction). Suppose that Assumptions 1 (i), 2, and 3 (i) are satisfied. For the integer $k > 1$ provided in Assumption 3 (i),
\[
\beta - \beta_h = P_h^{k-1} + O(h^k)
\]
holds as $h \to 0$, where $P_h^{k-1}$ is defined by
\[
P_h^{k-1} = \sum_{\kappa=1}^{k-1} \frac{h^\kappa}{\kappa! \cdot \kappa}\,\tau_1^{(\kappa)}(0). \tag{2.5}
\]
This lemma shows that the trimming bias, $\beta - \beta_h$, of the estimator (2.2) can be decomposed into an estimable part $P_h^{k-1}$ and a remaining bias of order $h^k$. Hence, if the estimable part $P_h^{k-1}$ can be estimated with a bias of order $h^k$, then substituting such an estimate $\hat{P}_h^{k-1}$ for $P_h^{k-1}$ can correct the trimming bias of the estimator (2.2) up to the order of $h^k$. In other words, we suggest a bias-corrected estimator
\[
\hat{\beta}_h + \hat{P}_h^{k-1}
\]
with any bias estimator $\hat{P}_h^{k-1}$ satisfying the following property:
\[
E[\hat{P}_h^{k-1}] = P_h^{k-1} + O(h^k).
\]
Section 2.2 presents a concrete suggestion of such an estimator.
2.2 Bias Estimation
From (2.5) in the statement of Lemma 2.1, it is natural to develop a bias estimator $\hat{P}_h^{k-1}$ based on an estimator of $\tau_1^{(\kappa)}$, $\kappa \in \{1, \ldots, k-1\}$. The $\kappa$-th derivative of the function $\tau_1$ defined in (2.4) can be estimated by the local derivative estimator
\[
\hat{\tau}_1^{(\kappa)}(0) = E_n\!\left[\frac{(-1)^\kappa \cdot B}{h^{\kappa+1}}\cdot K^{(\kappa)}\!\left(\frac{A}{h}\right)\right], \tag{2.6}
\]
where K denotes a kernel function satisfying the following properties.
Assumption 4 (Kernel). (i) $K$ has support $(0, 1)$. (ii) $\int_0^1 K(u)\,du = 1$. (iii) $K$ is $k$-times continuously differentiable with a bounded $k$-th derivative for an integer $k > 1$.
Following (2.5), we consider the bias estimator defined by
\[
\hat{P}_h^{k-1} = \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa \cdot h^\kappa}{\kappa}\,\hat{\tau}_1^{(\kappa)}(0), \tag{2.7}
\]
where the weights $\{\theta_\kappa\}_{\kappa=1}^{k-1}$ are chosen to satisfy
\[
\kappa_1 \sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa \cdot \theta_\kappa}{\kappa} \cdot \int_0^1 u^{\kappa_1} K^{(\kappa)}(u)\,du = 1 \quad \text{for each } \kappa_1 = 1, \ldots, k-1. \tag{2.8}
\]
Note that the weights $\{\theta_\kappa\}_{\kappa=1}^{k-1}$ implied by (2.8) are generally different from the weights $\frac{1}{\kappa!}$ for the population bias $P_h^{k-1}$ given in (2.5). This discrepancy arises because the derivative estimator $\{\hat{\tau}_1^{(\kappa)}(0)\}_{\kappa=1}^{k-1}$ in (2.6) is itself biased for the population counterpart $\{\tau_1^{(\kappa)}(0)\}_{\kappa=1}^{k-1}$, and we need to correct for these biases of the bias estimator as well as for the trimming bias.
With a suitable choice of weights and a kernel function following the above prescription,
the asymptotic order of the bias with the proposed bias correction can be nicely controlled
as the following first main theorem of this paper states.
Theorem 2.1 (Bias Correction). If Assumptions 1 (i), 2, 3 (i), and 4 are satisfied, then
\[
\beta - E\!\left[\hat{\beta}_h\right] - E\!\left[\hat{P}_h^{k-1}\right] = O(h^k)
\]
as $h \to 0$.
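For a fixed kernel, the weight system (2.8) reduces to a small linear system in θ₁, …, θ_{k−1}. A numerical sketch, under our illustrative assumption of the rescaled biweight kernel K(u) = 30u²(1 − u)² on (0, 1) (the kernel choice and function names are ours, not the paper's):

```python
import numpy as np

# First and second derivatives of the rescaled biweight kernel
# K(u) = 30 u^2 (1 - u)^2 on (0, 1).
K_DERIVS = [
    lambda u: 30.0 * (2.0 * u - 6.0 * u**2 + 4.0 * u**3),  # K'
    lambda u: 30.0 * (2.0 - 12.0 * u + 12.0 * u**2),       # K''
]

def _trapezoid(y, u):
    """Plain trapezoidal rule (avoids version-dependent NumPy names)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0)

def bias_weights(k=3):
    """Solve the linear system (2.8) for theta_1, ..., theta_{k-1}:
    kappa_1 * sum_kappa (-1)^kappa (theta_kappa / kappa)
            * int_0^1 u^{kappa_1} K^{(kappa)}(u) du = 1."""
    u = np.linspace(0.0, 1.0, 100001)
    m = np.empty((k - 1, k - 1))
    for k1 in range(1, k):              # equation index kappa_1
        for kap in range(1, k):         # unknown index kappa
            integral = _trapezoid(u**k1 * K_DERIVS[kap - 1](u), u)
            m[k1 - 1, kap - 1] = k1 * (-1.0) ** kap / kap * integral
    return np.linalg.solve(m, np.ones(k - 1))
```

For this particular kernel and k = 3, integration by parts gives the closed-form solution θ₁ = 1 and θ₂ = −1/2, which the numerical solver reproduces.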
2.3 Rate-Adaptive Inference
The previous subsection focuses on the asymptotic bias of the bias-corrected estimator. The current subsection in turn focuses on the stochastic part. At the end of this subsection, we combine the "bias" part and the "stochastic" part to develop an asymptotic distribution result for robust inference.
Making the dependence of the bandwidth h on the sample size n explicit by writing $h_n$, we introduce the random variable
\[
Z_n = \frac{B}{A}\cdot 1\{A > h_n\} + \sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa \cdot \theta_\kappa \cdot B}{h_n \cdot \kappa}\, K^{(\kappa)}\!\left(\frac{A}{h_n}\right). \tag{2.9}
\]
Note that (2.2), (2.6), (2.7), and (2.9) yield
\[
E_n[Z_n] = \hat{\beta}_{h_n} + \hat{P}_{h_n}^{k-1},
\]
i.e., the sample mean of $Z_n$ is the bias-corrected estimator. We also introduce the notation $S_n = E[Z_n^2] - E[Z_n]^2$ for the variance of $Z_n$, and its analog estimator
\[
\hat{S}_n = E_n[Z_n^2] - E_n[Z_n]^2.
\]
The following lemma states the asymptotic normality result.

Lemma 2.2 (Asymptotic Normality). If Assumptions 1, 3 (ii), and 4 are satisfied, then
\[
n^{1/2} \hat{S}_n^{-1/2} \left( E_n[Z_n] - E[Z_n] \right) \to_d N(0, 1)
\]
for $n h_n^2 \to \infty$ as $n \to \infty$.
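In practice, Lemma 2.2 is used by inverting the self-normalized statistic into a confidence interval. A minimal sketch, assuming the per-observation values of $Z_n$ have already been collected into an array `z` (the function name is ours):

```python
import numpy as np

def self_normalized_ci(z, z_crit=1.959964):
    """Two-sided confidence interval based on Lemma 2.2:
    E_n[Z_n] +/- z_crit * sqrt(S_n / n), where S_n is the sample
    variance E_n[Z^2] - E_n[Z]^2. Self-normalization by S_n makes
    the interval valid without knowing the convergence rate."""
    z = np.asarray(z, dtype=float)
    n = z.size
    center = float(z.mean())
    s_hat = float(z.var())          # E_n[Z^2] - E_n[Z]^2
    half = z_crit * (s_hat / n) ** 0.5
    return center - half, center + half
```

The key design point is that the rate enters only through the sample variance of $Z_n$ itself, so the same formula applies in both the regular and irregular cases.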
Recall that Assumptions 1 (i), 2, 3 (i), and 4 are used to obtain the "bias" part in Theorem 2.1. On the other hand, Assumptions 1, 3 (ii), and 4 are used to obtain the "stochastic" part in Lemma 2.2. Combining Theorem 2.1 and Lemma 2.2, we state the second main result of this paper below.
Theorem 2.2 (Asymptotic Normality). If Assumptions 1, 2, 3, and 4 are satisfied, then
\[
n^{1/2} \hat{S}_n^{-1/2} \left( \hat{\beta}_{h_n} + \hat{P}_{h_n}^{k-1} - \beta \right) \to_d N(0, 1)
\]
for $n h_n^2 \to \infty$ and $n h_n^{2k} \to 0$ as $n \to \infty$.
2.4 Invariance in Convergence Rate
As in Khan and Tamer (2010), the convergence rate of the regularized estimator (2.2) is unknown in general. As such, it may be unclear at first glance which of the regularized estimator, $\hat{\beta}_h$, and the bias estimator, $\hat{P}_h^{k-1}$, dominates the yet unknown variance of the combined estimator $\hat{\beta}_h + \hat{P}_h^{k-1}$. In this section, we show that adding the bias estimator does not slow the convergence rate compared with that of the regularized estimator without bias correction.
Theorem 2.3 (Invariant Convergence Rate). If Assumptions 1 (i), (iii), 2, 3, and 4 are satisfied, then
\[
\mathrm{Var}\!\left( \hat{\beta}_{h_n} + \hat{P}_{h_n}^{k-1} \right) \big/ \mathrm{Var}\!\left( \hat{\beta}_{h_n} \right) = O(1)
\]
for $h_n \to 0$ as $n \to \infty$.
The conclusion of this theorem ensures that we do not worsen the convergence rate by
adding the bias correction terms. Therefore, the cost of bias correction and robust inference
is only the augmented variance, and not a slowed convergence rate.
3 Guide for Practice
We propose a practical recipe for the case of order $k = 3$ and a bandwidth choice $h_n = O(n^{-1/5})$ based on mean square error minimization with respect to the second-order bias. These choices are consistent with the rate assumptions stated in the general theory under both regular and irregular cases. Of course, a researcher could make alternative choices of k and bandwidth as long as the rate requirements in the general theory are met.
3.1 Procedural Recipe
Step 1: Bandwidth Choice. We choose the bandwidth $h_n^*$ by minimizing
\[
\frac{h^4}{16}\, E_n\!\left[\frac{B}{h_{\mathrm{pre}}^3}\, K^{(2)}\!\left(\frac{A}{h_{\mathrm{pre}}}\right)\right]^2
+ n^{-1}\, \mathrm{Var}_n\!\left( \frac{B}{A}\cdot 1\{A > h\} - \frac{B}{h\cdot(1 - K(1))}\cdot K^{(1)}\!\left(\frac{A}{h}\right) \right)
\]
with respect to h, where the preliminary bandwidth $h_{\mathrm{pre}}$ used for a preliminary second-order bias estimation is set to $\max\{A_i\}_{i=1}^n$ for a global estimation.²
Step 2: Bias-Corrected Estimation. Once the bandwidth $h_n^*$ is obtained, the bias-corrected estimate is given by
\[
\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2 = E_n\!\left[ \frac{B}{A}\cdot 1\{A > h_n^*\} - \frac{\theta_1 B}{h_n^*}\, K^{(1)}\!\left(\frac{A}{h_n^*}\right) + \frac{\theta_2 B}{2 h_n^*}\, K^{(2)}\!\left(\frac{A}{h_n^*}\right) \right]
\]
following (2.2), (2.6), and (2.7), where $\theta_1$ and $\theta_2$ are given, following (2.8), by
\[
\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}
=
\begin{pmatrix}
-2\int_0^1 u K^{(1)}(u)\,du & \int_0^1 u K^{(2)}(u)\,du \\
-2\int_0^1 u^2 K^{(1)}(u)\,du & \int_0^1 u^2 K^{(2)}(u)\,du
\end{pmatrix}^{-1}
\begin{pmatrix} 2 \\ 1 \end{pmatrix}.
\]
Step 3: Variance Estimation. The variance of $\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2$ is approximated by
\[
n^{-1}\, \mathrm{Var}_n\!\left( \frac{B}{A}\cdot 1\{A > h_n^*\} - \frac{\theta_1 B}{h_n^*}\, K^{(1)}\!\left(\frac{A}{h_n^*}\right) + \frac{\theta_2 B}{2 h_n^*}\, K^{(2)}\!\left(\frac{A}{h_n^*}\right) \right)
\]
following the result in Section 2.3, where $\theta_1$ and $\theta_2$ are the same as those given in Step 2.
2This choice of the preliminary bandwidth for the global estimation is analogous to the recommendation
by Fan and Gijbels (1996, p. 198) to use a global preliminary parametric estimation for a plug-in optimal
bandwidth selection in the context of local polynomial estimation of nonparametric functions.
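Putting Steps 1–3 together, the following sketch implements the recipe under our illustrative assumptions: the rescaled biweight kernel K(u) = 30u²(1 − u)², for which solving (2.8) with k = 3 gives θ₁ = 1 and θ₂ = −1/2 and K(1) = 0 (so the Step-1 factor 1/(1 − K(1)) equals 1), and a simple grid search over candidate bandwidths. The grid and all function names are ours, not part of the paper.

```python
import numpy as np

def K1(u): return 30.0 * (2.0 * u - 6.0 * u**2 + 4.0 * u**3)   # K'
def K2(u): return 30.0 * (2.0 - 12.0 * u + 12.0 * u**2)        # K''
THETA1, THETA2 = 1.0, -0.5   # solution of (2.8) for this kernel, k = 3

def _kval(a, h, kfun):
    """kfun(A / h) restricted to the kernel support (0, 1)."""
    u = a / h
    return np.where((u > 0.0) & (u < 1.0), kfun(np.clip(u, 0.0, 1.0)), 0.0)

def _trim(a, b, h):
    """(B / A) * 1{A > h} with the trimmed entries set to zero."""
    keep = a > h
    return np.where(keep, b / np.where(keep, a, 1.0), 0.0)

def bias_corrected_inference(a, b, grid):
    """Steps 1-3 of Section 3.1: data-driven h, estimate, standard error."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    # Step 1: preliminary (global) second-order bias piece with h_pre = max A.
    h_pre = float(a.max())
    tau2_hat = float(np.mean(b / h_pre**3 * _kval(a, h_pre, K2)))
    crit = []
    for h in grid:
        infl = _trim(a, b, h) - b / h * _kval(a, h, K1)   # 1 - K(1) = 1 here
        crit.append(h**4 / 16.0 * tau2_hat**2 + float(infl.var()) / n)
    h_star = float(grid[int(np.argmin(crit))])
    # Steps 2-3: bias-corrected estimate and its rate-adaptive standard error.
    z = (_trim(a, b, h_star)
         - THETA1 * b / h_star * _kval(a, h_star, K1)
         + THETA2 * b / (2.0 * h_star) * _kval(a, h_star, K2))
    return float(z.mean()), float((z.var() / n) ** 0.5), h_star
```

The returned standard error is exactly the Step-3 quantity, so a 95% confidence interval is the estimate plus or minus 1.96 times this standard error.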
3.2 Remark on Consistency between Recipe and Theory
The bandwidth choice suggested in Step 1 of Section 3.1 induces asymptotically negligible bias relative to the variance when it is used with bias-corrected inference based on $k = 3$ as in Steps 2 and 3. Specifically, if the identification is regular in the sense that $\mathrm{Var}\!\left( \frac{B}{A}\cdot 1\{A > h\} - \frac{\theta_1 B}{h}\, K^{(1)}\!\left(\frac{A}{h}\right) \right) \sim 1$ as $h \to 0$, then we have $h_n^* \sim n^{-1/4}$. On the other hand, if the identification is irregular in the sense that this variance behaves as $h^{-1}$ as $h \to 0$, then we have the standard nonparametric MSE-optimal rate $h_n^* \sim n^{-1/5}$. In both cases, the rate requirements $n h_n^2 \to \infty$ and $n h_n^6 \to 0$ as $n \to \infty$ given in Theorem 2.2 for $k = 3$ are satisfied by $h_n = h_n^*$.
4 Simulation Studies
This section presents simulation studies evaluating the performance of our method based on $\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2$ and its asymptotic distribution, in comparison with the naïve sample mean estimator, $E_n[B/A]$, and the regularized estimator, $\hat{\beta}_{h_n^*}$, without bias correction. We consider the following simulation designs. The denominator variable A and the numerator variable B are independently generated according to the following laws:
\[
A = (X/\nu)^{1/2}, \quad X \sim \chi^2(\nu),\ \nu > 1; \qquad B = Z + \mu, \quad Z \sim N(0, 1),\ \mu \in \mathbb{R}.
\]
The ratio, B/A, thus follows a (non-central) t distribution. The first moment E[B/A] exists if and only if ν > 1, and the second moment E[(B/A)²] exists if and only if ν > 2. The smaller ν is, the more observations there are with the denominator variable A close to 0. We also remark that the regularized estimator has the regular √n rate of convergence when ν = 3, whereas it converges at a rate slower than √n when ν = 2. Because a researcher does not know the true data generating process, this fact necessitates a method of inference, like ours, that is robust against an unknown convergence rate.
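One draw from this design can be generated as follows; a sketch assuming NumPy's random generator, with the function name ours:

```python
import numpy as np

def simulate_design(mu, nu, n, seed=0):
    """One sample from the Section 4 design: A = (X / nu)^(1/2) with
    X ~ chi^2(nu), independent of B = Z + mu with Z ~ N(0, 1), so
    that B / A follows a (non-central) t distribution."""
    rng = np.random.default_rng(seed)
    a = np.sqrt(rng.chisquare(nu, size=n) / nu)
    b = rng.normal(loc=mu, scale=1.0, size=n)
    return a, b
```

For ν = 2 and µ = 1 the target is E[B/A] = µ·E[1/A] = µ√π ≈ 1.772, matching the E[B/A] column of Table 1 below.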
We rely on the assumption that the first moment, E[B/A], exists, and thus ν > 1 is maintained. Our method does not assume the existence of the second moment, E[(B/A)²], and hence ν > 2 is not necessary. Therefore, the theory predicts that inference with our method will work even when ν = 2. On the other hand, inference based on the naïve sample mean estimator $E_n[B/A]$ is not guaranteed to work well when ν = 2 because of the infinite variance. This is one point to check with the simulation studies.
The bias of the regularized estimator, $\hat{\beta}_h$, can be analytically shown to equal $\mu \cdot E\!\left[\frac{1\{A \le h\}}{A}\right]$ under the above data generating process. Therefore, there is no bias when µ = 0. This means that inference works correctly, by coincidence, even with the regularized estimator, $\hat{\beta}_{h_n^*}$, without bias correction when µ = 0 is true. However, as µ deviates away from 0, inference without bias correction starts to work poorly, whereas our method based on $\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2$, with its appropriate asymptotic distribution, is supposed to work robustly under µ ≠ 0 as well. This is another point to check with the simulation studies.
For each combination of the data generating parameters, (µ, ν, N), we run 10,000 iterations of data generation, estimation, and inference. Tables 1 and 2 summarize simulation results under ν = 2 and ν = 3, respectively. In each table, the top two row groups, the middle two row groups, and the bottom two row groups use the location parameters µ = −1, 0, and 1, respectively. Simulation statistics, including mean, bias, standard deviation, root mean square error, and 95% coverage frequency, are reported for each of (i) the method based on the sample mean, $E_n[B/A]$, with its estimated standard error; (ii) the method based on the regularized estimator, $\hat{\beta}_{h_n^*}$, with its rate-adaptive standard error;³ and (iii)
³We remark that simulation results for the regularized estimator, $\hat{\beta}_{h_n^*}$, without bias correction are shown only for the sake of illustrating the efficacy of bias correction. The bandwidth, $h_n^*$, used for our inference
our robust method based on the bias-corrected estimator, $\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2$, with its rate-adaptive standard error, implemented following the practical guidelines of Section 3 and the biweight kernel function. We make the following three observations about these results.
First, recall that inference based on (i) is not guaranteed to work when ν = 2, whereas our theory guarantees that inference based on (iii) works even in this case. Table 1 demonstrates that this is indeed the case. When µ = 0, all three methods, (i), (ii), and (iii), work well in terms of the 95% coverage frequency. On the other hand, µ ≠ 0 forces (i) and (ii) to have size distortions, while (iii) continues to have 95% coverage frequencies close to the nominal level. Second, recall also that ν = 3 allows the second moment E[(B/A)²] to exist, and hence method (i) is not necessarily doomed to fail entirely. Table 2 demonstrates that this is indeed the case. For all values of µ ∈ {−1, 0, 1}, both methods (i) and (iii) work well in terms of the 95% coverage frequency. However, as a researcher may not know whether the second moment exists given empirical data, we recommend that method (iii) instead of method (i) be used for robustness of inference against the alternative cases of existence of moments. Finally, recall that µ = 0 allows inference based on (ii) to work, whereas µ ≠ 0 does not. Tables 1 and 2 both demonstrate that this is indeed the case. These results suggest that bias correction, with the accordingly adjusted asymptotic variance, be used for robustness of inference against alternative data generating processes with unknown magnitudes of potential biases. In other words, we recommend that method (iii) instead of method (ii) be used for robustness. In summary, the simulation results demonstrate that method (iii) is the only method that produces precise coverage frequencies robustly across alternative data generating processes.
is larger than those required for valid inference without bias correction, and hence our simulation results do not necessarily overturn the theoretical properties of previous papers on adaptive inference without bias correction.
 µ   ν    N   E[B/A]   Method               Mean     Bias     SD     RMSE   95% Cover
-1   2   250  -1.772   (i)   En[B/A]       -1.771    0.001   0.409   0.409   0.905
                       (ii)  βh*           -0.447    1.325   0.105   1.329   0.000
                       (iii) βh* + P²h*    -1.761    0.011   0.205   0.205   0.948
-1   2   500  -1.772   (i)   En[B/A]       -1.772    0.001   0.231   0.231   0.909
                       (ii)  βh*           -0.545    1.227   0.091   1.230   0.000
                       (iii) βh* + P²h*    -1.769    0.004   0.148   0.148   0.957
 0   2   250   0.000   (i)   En[B/A]        0.002    0.002   0.238   0.238   0.957
                       (ii)  βh*            0.000    0.000   0.027   0.027   0.948
                       (iii) βh* + P²h*     0.000    0.000   0.152   0.152   0.943
 0   2   500   0.000   (i)   En[B/A]        0.001    0.001   0.178   0.178   0.955
                       (ii)  βh*            0.000    0.000   0.023   0.023   0.947
                       (iii) βh* + P²h*     0.000    0.000   0.110   0.110   0.945
 1   2   250   1.772   (i)   En[B/A]        1.770   -0.002   0.297   0.297   0.904
                       (ii)  βh*            0.448   -1.324   0.106   1.328   0.000
                       (iii) βh* + P²h*     1.762   -0.011   0.203   0.204   0.951
 1   2   500   1.772   (i)   En[B/A]        1.772   -0.001   0.239   0.239   0.914
                       (ii)  βh*            0.546   -1.227   0.091   1.230   0.000
                       (iii) βh* + P²h*     1.768   -0.005   0.147   0.147   0.956

Table 1: Simulation results under ν = 2 for three methods: (i) the method based on the sample mean, En[B/A], and its asymptotic normal distribution; (ii) the method based on the regularized estimator, βh*, with its rate-adaptive standard error; and (iii) our robust method based on the bias-corrected estimator, βh* + P²h*, with its rate-adaptive standard error. Displayed statistics are mean (Mean), bias (Bias), standard deviation (SD), root mean square error (RMSE), and 95% coverage frequency (95% Cover).
 µ   ν    N   E[B/A]   Method               Mean     Bias     SD     RMSE   95% Cover
-1   3   250  -1.382   (i)   En[B/A]       -1.382    0.000   0.128   0.128   0.940
                       (ii)  βh*           -0.618    0.764   0.168   0.782   0.002
                       (iii) βh* + P²h*    -1.422   -0.040   0.195   0.199   0.930
-1   3   500  -1.382   (i)   En[B/A]       -1.381    0.001   0.091   0.091   0.946
                       (ii)  βh*           -0.776    0.606   0.123   0.619   0.000
                       (iii) βh* + P²h*    -1.394   -0.012   0.132   0.132   0.943
 0   3   250   0.000   (i)   En[B/A]        0.000    0.000   0.110   0.110   0.952
                       (ii)  βh*            0.000    0.000   0.033   0.033   0.944
                       (iii) βh* + P²h*     0.000    0.000   0.145   0.145   0.940
 0   3   500   0.000   (i)   En[B/A]        0.000    0.000   0.077   0.077   0.948
                       (ii)  βh*            0.000    0.000   0.030   0.030   0.947
                       (iii) βh* + P²h*     0.001    0.001   0.102   0.102   0.945
 1   3   250   1.382   (i)   En[B/A]        1.382    0.001   0.129   0.129   0.940
                       (ii)  βh*            0.614   -0.768   0.168   0.786   0.001
                       (iii) βh* + P²h*     1.421    0.039   0.195   0.199   0.933
 1   3   500   1.382   (i)   En[B/A]        1.383    0.001   0.090   0.090   0.948
                       (ii)  βh*            0.776   -0.606   0.124   0.618   0.000
                       (iii) βh* + P²h*     1.395    0.013   0.132   0.133   0.946

Table 2: Simulation results under ν = 3 for three methods: (i) the method based on the sample mean, En[B/A], and its asymptotic normal distribution; (ii) the method based on the regularized estimator, βh*, with its rate-adaptive standard error; and (iii) our robust method based on the bias-corrected estimator, βh* + P²h*, with its rate-adaptive standard error. Displayed statistics are mean (Mean), bias (Bias), standard deviation (SD), root mean square error (RMSE), and 95% coverage frequency (95% Cover).
5 Real Data Analysis
We illustrate our robust inference method with an empirical analysis of the dependence of firms on external finance in the U.S. manufacturing industry for the years 2000–2014. The external dependence for a firm is defined as the ratio of external borrowing to capital expenditures. It indicates the extent to which manufacturing firms need external finance. This ratio has been used in the economic growth literature – for example, Rajan and Zingales (1998) use the external dependence to identify which industrial sectors can benefit more from financial development.

We use the Compustat database (Fundamentals Annual) for the years 2000–2014. We focus on manufacturing firms excluding the food industry, i.e., Standard Industry Classification (SIC) codes between 2100 and 3999. Following Rajan and Zingales (1998), we construct a proxy for a firm's dependence via the ratio of external borrowing to capital expenditures. Specifically, the variable A represents the capital expenditures (CAPX), and the variable B represents the capital expenditures (CAPX) minus cash flows from operations (FOPT).⁴ We focus on U.S. firms. Observations with missing A or B are dropped. The sample sizes are n = 2008, 1844, 1771, 1699, 1645, 1583, 1481, 1386, 1289, 1214, 1179, 1173, 1233, 1223, and 1146 for the years 2000–2014, respectively.
Figure 1 displays the results. The shaded bands indicate 95% and 99% confidence intervals for the mean/median of B/A on the vertical axis, plotted against the years 2000–2014 on the horizontal axis. The top left graph shows the result for the mean E[B/A] using method (i), $E_n[B/A]$, i.e., the naïve sample mean. The top right graph shows the result for the median Med[B/A]. The bottom left graph shows the result for the mean E[B/A] using method (ii), $\hat{\beta}_{h_n^*}$, i.e., the trimmed estimate without bias correction. The bottom right graph shows the result
4As in Rajan and Zingales (1998), for the firms with the format code 7 (SCF=7), we cannot observe
FOPT and thus impute the sum of IBC, DPC, TXDC, ESUBC, SPPIV, and FOPO instead.
Figure 1: 95% and 99% confidence intervals for the mean/median of the external financial dependence in the U.S. manufacturing industry for the years 2000–2014. The top left graph shows the result for the mean E[B/A] using (i) En[B/A]; the top right graph shows the result for the median Med[B/A]; the bottom left graph shows the result for the mean E[B/A] using (ii) βh*; and the bottom right graph shows the result for the mean E[B/A] using (iii) βh* + P²h*.
for the mean E[B/A] using method (iii), $\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2$, i.e., our proposed method.
Not surprisingly, the naïve method (i) yields huge confidence intervals and does not provide informative inference results. The median is robust against outliers, and it indeed provides more informative inference results. Likewise, method (ii) shows that the mean
is negative throughout, but it is not significantly different from zero at the 0.01 level for
all years. On the other hand, our robust method (iii) shows significantly negative mean
external financial dependence for all years, and its drop in level right after the 2007–2008
financial crisis is striking. These observations characterize qualitative differences among
the alternative methods.
6 Discussion
This paper proposes a new method of inference for moments of ratios of the form E[B/A]. The main purpose of this method is to deal with a number of practical concerns in a theoretically coherent way. Our method offers the following two practical values in particular. First, our bias correction framework allows for the use of larger trimming thresholds, e.g., $h \propto n^{-1/5}$, which admit faster convergence rates. This feature proves useful in practice, as evidenced in our simulation studies (Section 4) and real data analysis (Section 5). Furthermore, our bias correction approach admits a systematic method of choosing trimming thresholds in practice. By balancing the variance and the bias of an order one unit less than the order of the supposed smoothness, we obtain a data-driven trimming rule that is consistent with valid inference while achieving a close-to-optimal convergence rate.
Second, the rate-adaptive method of inference based on the self-normalized sum allows for valid inference without prior knowledge of the unknown convergence rate. This feature is useful as it eliminates the need for a practitioner to pre-estimate a parameter that determines the convergence rate, such as the curvature parameter of the density function of A in a neighborhood of zero. As such, the practical procedure that we propose (Section 3.1) consists of very simple steps, and its computational cost is minimal.
In summary, the combination of the rate-adaptive method and the trimming bias correction, accounting also for the asymptotic variance of the bias estimator, allows for the twofold robustness in inference for the moment E[B/A], namely against large trimming bias and unknown convergence rate. We believe that this paper will contribute to empirical practice by providing this robustly valid inference procedure with a user-friendly and data-driven implementation.
A Appendix: Proofs
A.1 Proof of Lemma 2.1
Proof. Under Assumptions 1 (i), 2, and 3 (i), we can write the bias term as
β − βh =
∫ h
0
fA(a) · E[B|A = a]
ada =
∫ h
0
τ1(a)− τ1(0)
ada
where the first equality is due to the law of iterated expectations, and the second equality
follows from Assumption 3 (i). By the k-order mean value expansion under Assumption 3
(i), we can write the difference in the last expression as
τ1(a)− τ1(0) =k−1∑κ=1
aκ
κ!τ(κ)1 (0) + ak ·Rk(αk(a))
with αk(a) ∈ (0, a), where the remainder function Rk given by Rk(a) = 1k!τ(k)1 (a) is uni-
formly bounded in absolute value on [0, η] for some small η > 0 by Assumption 3 (i).
Combining the above results together, we can now write the bias term as
$$\beta - \beta_h = \sum_{\kappa=1}^{k-1} \frac{1}{\kappa!}\,\tau_1^{(\kappa)}(0) \int_0^h a^{\kappa-1}\,da + \int_0^h a^{k-1} R_k(\alpha_k(a))\,da = \sum_{\kappa=1}^{k-1} \frac{1}{\kappa!}\frac{1}{\kappa}\,h^\kappa \tau_1^{(\kappa)}(0) + \int_0^h a^{k-1} R_k(\alpha_k(a))\,da,$$
where the remaining bias is bounded in absolute value as
$$\bigg|\int_0^h a^{k-1} R_k(\alpha_k(a))\,da\bigg| \le \int_0^h a^{k-1}\,da \sup_{a \in (0,h)} \big|R_k(\alpha_k(a))\big| = \frac{1}{k}\,h^k \sup_{a \in (0,h)} \big|R_k(\alpha_k(a))\big|.$$
Finally, noting that $\sup_{a \in [0,\eta]} |R_k(a)| < \infty$ proves the claim for $h \le \eta$.
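As a quick numerical sanity check on this expansion, take the illustrative choice $\tau_1(a) = \sin(a)$ (ours, not from the paper), which satisfies $\tau_1(0) = 0$. With $k = 4$, the derivatives of $\sin$ at zero are $1, 0, -1$, so the polynomial part is $h - h^3/18$ and the lemma predicts a remainder of order $h^k$:

```python
import numpy as np

def bias_integral(h, n_grid=200_001):
    """Evaluate beta - beta_h = integral_0^h tau1(a)/a da numerically for
    the illustrative choice tau1(a) = sin(a), so that tau1(0) = 0; the
    integrand sin(a)/a is extended continuously by 1 at a = 0."""
    a = np.linspace(0.0, h, n_grid)
    f = np.where(a > 0, np.sin(a) / np.where(a > 0, a, 1.0), 1.0)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(a)) / 2.0)  # trapezoid rule

def poly_part(h):
    """The polynomial sum_{kappa=1}^{k-1} h^kappa tau1^(kappa)(0)/(kappa! kappa)
    for k = 4: the derivatives of sin at 0 are 1, 0, -1."""
    return h - h**3 / 18.0

# The remainder should be O(h^k) = O(h^4); err / h^4 stays bounded as h shrinks.
for h in (0.4, 0.2, 0.1):
    err = abs(bias_integral(h) - poly_part(h))
    print(h, err / h**4)
```

For this choice the bias integral is the sine integral $\mathrm{Si}(h) = h - h^3/18 + h^5/600 - \cdots$, so the remainder is in fact $O(h^5)$, well within the $O(h^4)$ bound the lemma guarantees.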
A.2 Proof of Theorem 2.1
Proof. First, by the definition of $\hat\tau_1^{(\kappa)}(0)$ given in (2.6), we can write
$$E\big[\hat P_h^{k-1}\big] = E\bigg[\sum_{\kappa=1}^{k-1} \frac{\theta_\kappa h^\kappa}{\kappa}\,\hat\tau_1^{(\kappa)}(0)\bigg] = \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa h^\kappa}{\kappa}\,E\big[\hat\tau_1^{(\kappa)}(0)\big] = \sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa}{\kappa h}\,\theta_\kappa\,E\bigg[B K^{(\kappa)}\Big(\frac{A}{h}\Big)\bigg]. \tag{A.1}$$
By Assumptions 2 and 4 together with the definition of $\tau_1$ given in (2.4), the last expression in (A.1) may be rewritten as
$$\sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa}{\kappa h}\,\theta_\kappa\,E\bigg[B K^{(\kappa)}\Big(\frac{A}{h}\Big)\bigg] = \sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa}{\kappa h}\,\theta_\kappa \int_0^h \tau_1(a)\,K^{(\kappa)}\Big(\frac{a}{h}\Big)\,da. \tag{A.2}$$
From the proof of Lemma 2.1, $\tau_1(a) = \tau_1(a) - \tau_1(0) = \sum_{\kappa=1}^{k-1} \frac{a^\kappa}{\kappa!}\tau_1^{(\kappa)}(0) + a^k \cdot R_k(\alpha_k(a))$ with $\alpha_k(a) \in (0, a)$, where the remainder function $R_k$, given by $R_k(a) = \frac{1}{k!}\tau_1^{(k)}(a)$, is uniformly bounded in absolute value on $[0, \eta]$ for some small $\eta > 0$ by Assumption 3 (i). Substituting this mean value expansion into the last expression in (A.2) yields
$$\sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa}{\kappa h}\,\theta_\kappa \int_0^h \tau_1(a)\,K^{(\kappa)}\Big(\frac{a}{h}\Big)\,da = \sum_{\kappa_1=1}^{k-1} \frac{1}{\kappa_1!}\frac{1}{\kappa_1}\,h^{\kappa_1}\tau_1^{(\kappa_1)}(0)\,\kappa_1 \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa(-1)^\kappa}{\kappa} \int_0^1 u^{\kappa_1} K^{(\kappa)}(u)\,du \tag{A.3}$$
$$\qquad\qquad + h^k \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa(-1)^\kappa}{\kappa} \int_0^1 R_k(\alpha_k(uh))\,u^k K^{(\kappa)}(u)\,du, \tag{A.4}$$
where the last equality is due to changes of variables. The expression in line (A.3) reduces to
$$\sum_{\kappa_1=1}^{k-1} \frac{1}{\kappa_1!}\frac{1}{\kappa_1}\,h^{\kappa_1}\tau_1^{(\kappa_1)}(0)\,\kappa_1 \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa(-1)^\kappa}{\kappa} \int_0^1 u^{\kappa_1} K^{(\kappa)}(u)\,du = P_h^{k-1} \tag{A.5}$$
by the definition of $P_h^{k-1}$ and the choice of $\{\theta_\kappa\}_{\kappa=1}^{k-1}$ to satisfy (2.8). To see the asymptotic behavior of line (A.4), note that
$$\bigg|\sum_{\kappa=1}^{k-1} \frac{\theta_\kappa(-1)^\kappa}{\kappa} \int_0^1 R_k(\alpha_k(uh))\,u^k K^{(\kappa)}(u)\,du\bigg| \le \sum_{\kappa=1}^{k-1} \frac{|\theta_\kappa|}{\kappa} \int_0^1 u^k \big|K^{(\kappa)}(u)\big|\,du \sup_{a \in (0,h)} \big|R_k(\alpha_k(a))\big|,$$
where $\int_0^1 u^k |K^{(\kappa)}(u)|\,du < \infty$ for each $\kappa \in \{1, \dots, k-1\}$ by Assumption 4, and $\sup_{a \in (0,h)} |R_k(\alpha_k(a))|$ is uniformly bounded for $h \in [0, \eta]$. Therefore,
$$h^k \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa(-1)^\kappa}{\kappa} \int_0^1 R_k(\alpha_k(uh))\,u^k K^{(\kappa)}(u)\,du = O(h^k) \tag{A.6}$$
as $h \to 0$. Combining the chain of equalities (A.1)–(A.6), we obtain
$$E\big[\hat P_h^{k-1}\big] = P_h^{k-1} + O(h^k) \tag{A.7}$$
as $h \to 0$. On the other hand, from Lemma 2.1, we also have
$$\beta - \beta_h = P_h^{k-1} + O(h^k) \tag{A.8}$$
as $h \to 0$. Combining (A.7) and (A.8) yields $\beta - E\big[\hat\beta_h\big] - E\big[\hat P_h^{k-1}\big] = O(h^k)$ as $h \to 0$.
A.3 Auxiliary Lemma: $L^4$ to $L^2$ Ratio

Lemma A.1 ($L^4$ to $L^2$ ratio). If Assumptions 1, 3 (ii), and 4 are satisfied, then
$$\text{(i)}\quad n^{-1/4} \cdot \frac{E[(Z_n - E[Z_n])^4]^{1/4}}{E[(Z_n - E[Z_n])^2]^{1/2}} \to 0 \qquad\text{and}\qquad \text{(ii)}\quad n^{-1/4} \cdot \frac{E\big[Z_n^4 - E[Z_n^2]^2\big]^{1/4}}{E[(Z_n - E[Z_n])^2]^{1/2}} \to 0$$
as $nh_n^2 \to \infty$.
Proof. (i) Since $Z_n = \frac{B}{A} \cdot 1\{A > h\} + \sum_{\kappa=1}^{k-1} \theta_\kappa \frac{1}{\kappa} h^{-1} (-1)^\kappa B K^{(\kappa)}\big(\frac{A}{h}\big)$, Minkowski's inequality yields
$$E[(Z_n - E[Z_n])^4]^{1/4} \le E\bigg[\Big(\frac{B}{A} \cdot 1\{A > h\} - E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]\Big)^4\bigg]^{1/4} + \sum_{\kappa=1}^{k-1} \theta_\kappa \frac{1}{\kappa} h^{-1} E\bigg[\Big(B K^{(\kappa)}\Big(\frac{A}{h}\Big) - E\Big[B K^{(\kappa)}\Big(\frac{A}{h}\Big)\Big]\Big)^4\bigg]^{1/4}. \tag{A.9}$$
Furthermore, Assumption 4 and i.i.d. sampling imply that $\frac{B}{A} \cdot 1\{A > h\}$ and $\sum_{\kappa=1}^{k-1} \theta_\kappa \frac{1}{\kappa} h^{-1} (-1)^\kappa B K^{(\kappa)}\big(\frac{A}{h}\big)$ are independent, and hence
$$E[(Z_n - E[Z_n])^2] \ge E\bigg[\Big(\frac{B}{A} \cdot 1\{A > h\} - E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]\Big)^2\bigg]. \tag{A.10}$$
Since $K^{(\kappa)}$ is bounded under Assumption 4 and $E[B^4] < \infty$ under Assumption 1 (ii), we have
$$h^{-1} E\bigg[\Big(B K^{(\kappa)}\Big(\frac{A}{h}\Big) - E\Big[B K^{(\kappa)}\Big(\frac{A}{h}\Big)\Big]\Big)^4\bigg] = O(h^{-1}) \tag{A.11}$$
for each $\kappa \in \{1, \dots, k-1\}$. By (A.9), (A.10), (A.11), $nh \to \infty$, and Lemma A.3 under Assumptions 1 (i) and (iii), it suffices to show that
$$\frac{E\Big[\big(\frac{B}{A} \cdot 1\{A > h\} - E\big[\frac{B}{A} \cdot 1\{A > h\}\big]\big)^4\Big]}{n\,E\Big[\big(\frac{B}{A} \cdot 1\{A > h\} - E\big[\frac{B}{A} \cdot 1\{A > h\}\big]\big)^2\Big]^2} = o(1).$$
We branch into two cases below. Throughout, we use the property that $\tau_j(a) = \tau_j(0) + a \cdot R_j^1(\alpha_j^1(a))$ with $\alpha_j^1(a) \in (0, a)$, where the remainder function $R_j^1$, given by $R_j^1(a) = \tau_j'(a)$, is uniformly bounded in absolute value on $[0, \eta]$ for some small $\eta > 0$ for each $j \in \{2, 3, 4\}$ under Assumption 3 (ii).
Case 1: $\tau_2(0) = 0$. By $nh^2 \to \infty$ and Lemma A.3 under Assumptions 1 (i) and (iii), it suffices to show that
$$E\bigg[\Big(\frac{B}{A} \cdot 1\{A > h\} - E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]\Big)^4\bigg] = O(h^{-2}) \tag{A.12}$$
as $h \to 0$. The following four parts together show (A.12). First,
$$E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big] = O(1)$$
as $h \to 0$ under Assumption 1 (i). Second, for $h \in (0, \eta^2)$, we can write
$$E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] = E\Big[\frac{B^2}{A^2} \cdot 1\{h < A \le h^{1/2}\}\Big] + E\Big[\frac{B^2}{A^2} \cdot 1\{A > h^{1/2}\}\Big]$$
$$\le \int_h^{h^{1/2}} \frac{\tau_2(a)}{a^2}\,da + h^{-1} E\big[B^2 \cdot 1\{A > h^{1/2}\}\big] = \int_h^{h^{1/2}} \frac{R_2^1(\alpha_2^1(a))}{a}\,da + h^{-1} E\big[B^2 \cdot 1\{A > h^{1/2}\}\big]$$
$$= \int_1^{h^{-1/2}} \frac{R_2^1(\alpha_2^1(uh))}{u}\,du + h^{-1} E\big[B^2 \cdot 1\{A > h^{1/2}\}\big] = O(\log(h^{-1/2})) + O(h^{-1}) = O(h^{-1})$$
as $h \to 0$ under Assumption 1 (ii) and $\tau_2(0) = 0$. Third, by similar lines of calculations for $h \in (0, \eta^2)$, we have
$$E\Big[\frac{B^3}{A^3} \cdot 1\{A > h\}\Big] = E\Big[\frac{B^3}{A^3} \cdot 1\{h \le A \le h^{1/2}\}\Big] + E\Big[\frac{B^3}{A^3} \cdot 1\{A > h^{1/2}\}\Big] = O(h^{-1/2}) + O(h^{-3/2}) = O(h^{-3/2})$$
as $h \to 0$ under Assumption 1 (ii). Fourth, by similar lines of calculations for $h \in (0, \eta^2)$ again, we have
$$E\Big[\frac{B^4}{A^4} \cdot 1\{A > h\}\Big] = E\Big[\frac{B^4}{A^4} \cdot 1\{h \le A \le h^{1/2}\}\Big] + E\Big[\frac{B^4}{A^4} \cdot 1\{A > h^{1/2}\}\Big] = O(h^{-1}) + O(h^{-2}) = O(h^{-2})$$
as $h \to 0$ under Assumption 1 (ii). Therefore, (A.12) holds.
Case 2: $\tau_2(0) \ne 0$. Since we let $nh^2 \to \infty$, it suffices to show
(i) that $h\,E\big[\big(\frac{B}{A} \cdot 1\{A > h\} - E\big[\frac{B}{A} \cdot 1\{A > h\}\big]\big)^2\big]$ is bounded away from zero in a neighborhood of $h = 0$; and
(ii) that $E\big[\big(\frac{B}{A} \cdot 1\{A > h\} - E\big[\frac{B}{A} \cdot 1\{A > h\}\big]\big)^4\big] = O(h^{-3})$.
First, to show (i), we make the following lines of calculations:
$$h\,E\bigg[\Big(\frac{B}{A} \cdot 1\{A > h\} - E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]\Big)^2\bigg] = h\,E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] - h\,E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2$$
$$= h \int_h^\infty \frac{\tau_2(a)}{a^2}\,da - h\,E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2 \ge h \int_h^{2h} \frac{\tau_2(a)}{a^2}\,da - h\,E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2$$
$$= \int_1^2 \frac{\tau_2(uh)}{u^2}\,du - h\,E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2 = \int_1^2 \frac{\tau_2(0) + uh\,R_2^1(\alpha_2^1(uh))}{u^2}\,du - h\,E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2$$
$$\ge \frac{1}{2}\tau_2(0) - h\bigg(\int_1^2 \big|R_2^1(\alpha_2^1(uh))\big|\,du + E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2\bigg).$$
The last expression is bounded away from zero in a neighborhood of $h = 0$ by $\tau_2(0) \ne 0$ and Assumptions 1 (i) and 3 (ii). Second, to show (ii), we note that
$$E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big] = O(1), \qquad E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] = O(h^{-2}), \qquad\text{and}\qquad E\Big[\frac{B^3}{A^3} \cdot 1\{A > h\}\Big] = O(h^{-3})$$
as $h \to 0$ under Assumption 1 (ii). For the fourth moment, similar lines of calculations to those in Case 1 yield
$$E\Big[\frac{B^4}{A^4} \cdot 1\{A > h\}\Big] = \int_h^{h^{1/2}} \frac{\tau_4(a)}{a^4}\,da + E\Big[\frac{B^4}{A^4} \cdot 1\{A > h^{1/2}\}\Big]$$
$$= \int_h^{h^{1/2}} \frac{\tau_4(0) + a\,R_4^1(\alpha_4^1(a))}{a^4}\,da + E\Big[\frac{B^4}{A^4} \cdot 1\{A > h^{1/2}\}\Big]$$
$$= \int_h^{h^{1/2}} \frac{\tau_4(0)}{a^4}\,da + \int_h^{h^{1/2}} \frac{R_4^1(\alpha_4^1(a))}{a^3}\,da + E\Big[\frac{B^4}{A^4} \cdot 1\{A > h^{1/2}\}\Big] = O(h^{-3}) + O(h^{-2}) + O(h^{-2}) = O(h^{-3})$$
as $h \to 0$ under Assumption 1 (ii). This completes the proof of part (i). The proof of part (ii) follows similarly.
A.4 Proof of Lemma 2.2
Proof. First, we obtain
$$\mathrm{Var}\bigg(\frac{E_n[(Z_n - E[Z_n])^2]}{E[(Z_n - E[Z_n])^2]}\bigg) = \frac{1}{n}\,\mathrm{Var}\bigg(\frac{(Z_n - E[Z_n])^2}{E[(Z_n - E[Z_n])^2]}\bigg) = \frac{1}{n}\bigg(\frac{E[(Z_n - E[Z_n])^4]}{E[(Z_n - E[Z_n])^2]^2} - 1\bigg) = o(1)$$
as $n \to \infty$ and $nh_n^2 \to \infty$ by i.i.d. sampling and Lemma A.1 (i) under Assumptions 1, 3 (ii), and 4. Therefore, Chebyshev's inequality yields
$$\frac{E_n[(Z_n - E[Z_n])^2]}{E[(Z_n - E[Z_n])^2]} \to_p 1 \tag{A.13}$$
as $n \to \infty$.
Second, note that the Lindeberg condition holds for the triangular array
$$\bigg\{(n, i) \mapsto n^{-1/2}\,\frac{Z_{n,i} - E[Z_n]}{E[(Z_n - E[Z_n])^2]^{1/2}} \;\bigg|\; i \in \{1, \dots, n\},\; n \in \mathbb{N}\bigg\}$$
because the Lyapunov condition,
$$\frac{n^{-\delta/2}\,E\big[|Z_n - E[Z_n]|^{2+\delta}\big]}{E\big[(Z_n - E[Z_n])^2\big]^{(2+\delta)/2}} \to 0$$
as $n \to \infty$ and $nh_n^2 \to \infty$, is satisfied in particular with $\delta = 2$ by Lemma A.1 under Assumptions 1, 3 (ii), and 4. Therefore, applying the Lindeberg–Feller theorem yields
$$n^{1/2}\,\frac{E_n[Z_n] - E[Z_n]}{E[(Z_n - E[Z_n])^2]^{1/2}} \to_d N(0, 1) \tag{A.14}$$
as $n \to \infty$.
Third, applying the continuous mapping theorem and Slutsky's theorem to (A.13) and (A.14), we obtain
$$n^{1/2}\,\frac{E_n[Z_n] - E[Z_n]}{E_n[(Z_n - E[Z_n])^2]^{1/2}} \to_d N(0, 1)$$
as $n \to \infty$. Finally, using the generic algebraic identity
$$P\bigg(n^{1/2}\,\frac{E_n[Z_n] - E[Z_n]}{E_n[(Z_n - E_n[Z_n])^2]^{1/2}} \ge x\bigg) = P\bigg(n^{1/2}\,\frac{E_n[Z_n] - E[Z_n]}{E_n[(Z_n - E[Z_n])^2]^{1/2}} \ge x \cdot \Big(1 + \frac{x^2}{n}\Big)^{-1/2}\bigg),$$
we obtain the desired result.
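The identity in the last display is algebraic rather than probabilistic: since $E_n[(Z_n - E_n[Z_n])^2] = E_n[(Z_n - E[Z_n])^2] - (E_n[Z_n] - E[Z_n])^2$, the statistic normalized by the sample-centered second moment is a strictly increasing transform of the one normalized by the population-centered second moment. A small numerical sketch with an arbitrary toy distribution (ours, not from the paper) verifies both forms of the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
mu = 2.0                                        # plays the role of E[Z_n]
z = rng.exponential(mu, size=n)                 # one simulated sample

d = z.mean() - mu                               # E_n[Z_n] - E[Z_n]
s_pop = np.sqrt(np.mean((z - mu) ** 2))         # centered at E[Z_n]
s_hat = np.sqrt(np.mean((z - z.mean()) ** 2))   # centered at E_n[Z_n]

t_hat = np.sqrt(n) * d / s_hat                  # sample-centered statistic
t_tilde = np.sqrt(n) * d / s_pop                # population-centered statistic

# Algebraic form of the identity: t_tilde = t_hat * (1 + t_hat^2 / n)^(-1/2).
print(abs(t_tilde - t_hat * (1.0 + t_hat**2 / n) ** -0.5))

# Event form used in the proof:
# {t_hat >= x}  coincides with  {t_tilde >= x * (1 + x^2/n)^(-1/2)}.
for x in (-1.0, 0.0, 1.0, 1.96):
    assert (t_hat >= x) == (t_tilde >= x * (1.0 + x**2 / n) ** -0.5)
```

Because the map $t \mapsto t(1 + t^2/n)^{-1/2}$ is strictly increasing, the two events coincide for every threshold $x$, which is exactly how the self-normalized statistic inherits the standard normal limit.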
A.5 Auxiliary Lemma: $\hat\beta_{h_n} - \beta_{h_n} \to_p 0$

Lemma A.2. If Assumptions 1 (i) and (ii) are satisfied, then $\hat\beta_{h_n} - \beta_{h_n} \to_p 0$ as $n \to \infty$ and $nh_n^2 \to \infty$.
Proof. First, note that $E[\hat\beta_{h_n}] = \beta_{h_n}$ holds by the definitions of $\hat\beta_{h_n}$ and $\beta_{h_n}$. On the other hand, the variance can be written as
$$\mathrm{Var}\big(\hat\beta_{h_n}\big) = n^{-1} E\Big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\Big] - n^{-1} E\Big[\frac{B}{A} \cdot 1\{A > h_n\}\Big]^2 \tag{A.15}$$
under i.i.d. sampling. The first term on the right-hand side of (A.15) is
$$n^{-1} E\Big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\Big] \le n^{-1} h_n^{-2} E\big[B^2 \cdot 1\{A > h_n\}\big] = O(n^{-1} h_n^{-2})$$
under Assumption 1 (ii). The second term on the right-hand side of (A.15) is
$$n^{-1} E\Big[\frac{B}{A} \cdot 1\{A > h_n\}\Big]^2 = O(n^{-1})$$
under Assumption 1 (i). This shows that $\mathrm{Var}\big(\hat\beta_{h_n}\big)$ in (A.15) is $o(1)$ as $n \to \infty$ and $nh_n^2 \to \infty$. Therefore, by the weak law of large numbers for triangular arrays, we have $\hat\beta_{h_n} - \beta_{h_n} \to_p 0$ as $n \to \infty$ and $nh_n^2 \to \infty$.
A.6 Auxiliary Lemma: Positive Variance
Lemma A.3. If Assumptions 1 (i) and (iii) are satisfied, then there exists $\varepsilon > 0$ such that
$$\mathrm{Var}\Big(\frac{B}{A} \cdot 1\{A > h\}\Big) > \varepsilon$$
holds for all $h$ in a neighborhood of $0$.
Proof. Since $\frac{B}{A}$ is integrable under Assumption 1, an application of the dominated convergence theorem yields
$$E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big] \to E\Big[\frac{B}{A}\Big]$$
as $h \to 0$. This implies that for each $\varepsilon' > 0$ there exists $h_{\varepsilon',1} > 0$ such that
$$E\Big[\frac{B}{A}\Big]^2 \ge E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2 - \frac{\varepsilon'}{3} \tag{A.16}$$
for all $h < h_{\varepsilon',1}$. Furthermore, since $E\big[\frac{B^2}{A^2} \cdot 1\{A > h\}\big]$ is non-increasing in $h$, an application of the monotone convergence theorem gives
$$E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] \to E\Big[\frac{B^2}{A^2}\Big]$$
as $h \to 0$. This implies that for each $\varepsilon' > 0$ there exists $h_{\varepsilon',2} > 0$ such that
$$E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] \ge E\Big[\frac{B^2}{A^2}\Big] - \frac{\varepsilon'}{3} \tag{A.17}$$
for all $h < h_{\varepsilon',2}$. Thus, we can bound $\mathrm{Var}\big(\frac{B}{A} \cdot 1\{A > h\}\big)$ from below as
$$\mathrm{Var}\Big(\frac{B}{A} \cdot 1\{A > h\}\Big) = E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] - E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2$$
$$\ge E\Big[\frac{B^2}{A^2}\Big] - \frac{\mathrm{Var}\big(\frac{B}{A}\big)}{3} - E\Big[\frac{B}{A}\Big]^2 - \frac{\mathrm{Var}\big(\frac{B}{A}\big)}{3} = \frac{1}{3}\,\mathrm{Var}\Big(\frac{B}{A}\Big),$$
where the inequality is due to (A.16) and (A.17) with the choice of $\varepsilon' = \mathrm{Var}\big(\frac{B}{A}\big) > 0$ under Assumption 1 (iii). Finally, letting $\varepsilon = \frac{1}{3}\mathrm{Var}\big(\frac{B}{A}\big) > 0$ proves the lemma.
A.7 Proof of Theorem 2.2
Proof. First, consider the ratio
$$\frac{\hat S_n}{S_n} - 1 = \frac{E_n\big[Z_n^2 - E[Z_n^2]\big]}{S_n} - \frac{\hat\beta_{h_n}^2 - \beta_{h_n}^2}{S_n}. \tag{A.18}$$
The first term on the right-hand side of (A.18) has mean
$$E\bigg[\frac{E_n\big[Z_n^2 - E[Z_n^2]\big]}{S_n}\bigg] = 0$$
and variance
$$\mathrm{Var}\bigg(\frac{E_n\big[Z_n^2 - E[Z_n^2]\big]}{S_n}\bigg) = n^{-1}\,\frac{E\big[Z_n^4 - E[Z_n^2]^2\big]}{E[(Z_n - E[Z_n])^2]^2} \to 0$$
as $n \to \infty$ and $nh_n^2 \to \infty$ by Lemma A.1 (ii) under Assumptions 1, 3 (ii), and 4. Therefore, by the weak law of large numbers for triangular arrays, we obtain
$$\frac{E_n\big[Z_n^2 - E[Z_n^2]\big]}{S_n} = o_p(1)$$
as $n \to \infty$. The second term on the right-hand side of (A.18) has the numerator
$$\hat\beta_{h_n}^2 - \beta_{h_n}^2 = o_p(1)$$
as $n \to \infty$ and $nh_n^2 \to \infty$ by Lemma A.2 under Assumptions 1 (i) and (ii). Therefore, (A.18) is
$$\frac{\hat S_n}{S_n} - 1 = o_p(1) \tag{A.19}$$
as $n \to \infty$ and $nh_n^2 \to \infty$ by Lemma A.3 under Assumptions 1 (i) and (iii). Now, combining (A.19), Lemma A.3 under Assumptions 1 (i) and (iii), and Theorem 2.1 under Assumptions 1 (i), 2, 3 (i), and 4 together implies
$$n^{1/2} \hat S_n^{-1/2}\Big(E\big[\hat\beta_{h_n}\big] + E\big[\hat P_{h_n}^{k-1}\big] - \beta\Big) = n^{1/2} S_n^{-1/2} \Big(\frac{\hat S_n}{S_n}\Big)^{-1/2}\Big(E\big[\hat\beta_{h_n}\big] + E\big[\hat P_{h_n}^{k-1}\big] - \beta\Big) = n^{1/2} S_n^{-1/2} (1 + o_p(1))^{-1/2} O(h_n^k) = o_p(1) \tag{A.20}$$
as $n \to \infty$ and $nh_n^{2k} \to 0$. Finally, combining equation (A.20) and Lemma 2.2 under Assumptions 1, 3 (ii), and 4 yields
$$n^{1/2} \hat S_n^{-1/2}\Big(\hat\beta_{h_n} + \hat P_{h_n}^{k-1} - \beta\Big) = n^{1/2} \hat S_n^{-1/2}\big(E_n[Z_n] - E[Z_n]\big) + n^{1/2} \hat S_n^{-1/2}\Big(E\big[\hat\beta_{h_n}\big] + E\big[\hat P_{h_n}^{k-1}\big] - \beta\Big) \to_d N(0, 1)$$
as $n \to \infty$ and $nh_n^{2k} \to 0$.
A.8 Proof of Theorem 2.3
Proof. Note that $\hat\beta_{h_n}$ and $\hat P_{h_n}^{k-1}$ are independent under Assumption 4. From (2.6) and (2.7), it suffices to show that
$$\mathrm{Var}\Big(\theta_\kappa \frac{h_n^\kappa}{\kappa}\,\hat\tau_1^{(\kappa)}(0)\Big)\Big/\mathrm{Var}\big(\hat\beta_{h_n}\big) = O(1)$$
as $n \to \infty$. Using i.i.d. sampling, we first rewrite the denominator and the numerator as
$$\mathrm{Var}\big(\hat\beta_{h_n}\big) = \frac{1}{n}\,\mathrm{Var}\Big(\frac{B}{A} \cdot 1\{A > h_n\}\Big) = \frac{1}{n}\bigg(E\Big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\Big] - E\Big[\frac{B}{A} \cdot 1\{A > h_n\}\Big]^2\bigg)$$
and
$$\mathrm{Var}\Big(\theta_\kappa \frac{h_n^\kappa}{\kappa}\,\hat\tau_1^{(\kappa)}(0)\Big) = \theta_\kappa^2 \frac{1}{\kappa^2} \frac{1}{n h_n^2}\,\mathrm{Var}\bigg(B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\bigg) = \theta_\kappa^2 \frac{1}{\kappa^2} \frac{1}{n h_n^2}\bigg(E\Big[B^2 K^{(\kappa)}\Big(\frac{A}{h_n}\Big)^2\Big] - E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big]^2\bigg).$$
Therefore, we obtain
$$\mathrm{Var}\Big(\theta_\kappa \frac{h_n^\kappa}{\kappa}\,\hat\tau_1^{(\kappa)}(0)\Big)\Big/\mathrm{Var}\big(\hat\beta_{h_n}\big) = c \cdot h_n^{-2} \cdot \frac{E\big[B^2 K^{(\kappa)}\big(\frac{A}{h_n}\big)^2\big] - E\big[B K^{(\kappa)}\big(\frac{A}{h_n}\big)\big]^2}{E\big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\big] - E\big[\frac{B}{A} \cdot 1\{A > h_n\}\big]^2}, \tag{A.21}$$
where $c = \theta_\kappa^2/\kappa^2 \in (0, \infty)$. To analyze the asymptotic order of the right-hand side of the above equation, we now branch into two cases.
Case 1: $\tau_2(0) = 0$. First, note that
$$\frac{1}{h_n} E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big] = \frac{1}{h_n} \int_0^{h_n} \tau_1(a)\,K^{(\kappa)}\Big(\frac{a}{h_n}\Big)\,da = \int_0^1 \tau_1(u h_n)\,K^{(\kappa)}(u)\,du = O(1)$$
as $n \to \infty$ under Assumptions 1 (i), 2, 3 (i) and 4. From Assumption 3 (ii),
$$\tau_2(a) = \tau_2(0) + a \cdot R_2^1(\alpha_2^1(a))$$
with $\alpha_2^1(a) \in (0, a)$, where the remainder function $R_2^1$, given by $R_2^1(a) = \tau_2'(a)$, is uniformly bounded in absolute value on $[0, \eta]$ for some small $\eta > 0$. Therefore, using $\tau_2(0) = 0$ yields
$$\frac{1}{h_n^2} E\Big[B^2 K^{(\kappa)}\Big(\frac{A}{h_n}\Big)^2\Big] = \frac{1}{h_n^2} \int_0^{h_n} \tau_2(a)\,K^{(\kappa)}\Big(\frac{a}{h_n}\Big)^2\,da = \frac{1}{h_n} \int_0^1 \tau_2(u h_n)\,K^{(\kappa)}(u)^2\,du = \int_0^1 u \cdot R_2^1(\alpha_2^1(u h_n))\,K^{(\kappa)}(u)^2\,du = O(1)$$
as $n \to \infty$ under Assumptions 1 (i), 2 and 4. This shows that
$$h_n^{-2}\bigg(E\Big[B^2 K^{(\kappa)}\Big(\frac{A}{h_n}\Big)^2\Big] - E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big]^2\bigg) = O(1)$$
as $n \to \infty$. By Lemma A.3 under Assumptions 1 (i) and (iii), therefore, the variance ratio (A.21) is $O(1)$ as $n \to \infty$.
Case 2: $\tau_2(0) > 0$. Since Assumptions 1 (i), 2, 3 (i) and 4 yield
$$\frac{1}{h_n} E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big] = O(1)$$
as $n \to \infty$, as argued above, we have
$$\frac{1}{h_n} E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big]^2 = o(1).$$
Furthermore, Assumption 3 (ii) provides
$$\frac{1}{h_n} E\Big[B^2 K^{(\kappa)}\Big(\frac{A}{h_n}\Big)^2\Big] = \frac{1}{h_n} \int_0^{h_n} \tau_2(a)\,K^{(\kappa)}\Big(\frac{a}{h_n}\Big)^2\,da = \int_0^1 \tau_2(u h_n)\,K^{(\kappa)}(u)^2\,du$$
$$= \tau_2(0) \int_0^1 K^{(\kappa)}(u)^2\,du + h_n \int_0^1 u \cdot R_2^1(\alpha_2^1(u h_n))\,K^{(\kappa)}(u)^2\,du = O(1)$$
as $n \to \infty$ under Assumptions 1 (i), 2 and 4. Combining the above two equations, we obtain
$$\frac{1}{h_n} E\Big[B^2 K^{(\kappa)}\Big(\frac{A}{h_n}\Big)^2\Big] - \frac{1}{h_n} E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big]^2 = O(1) \tag{A.22}$$
as $n \to \infty$. Note that Assumption 1 (i) yields
$$h_n E\Big[\frac{B}{A} \cdot 1\{A > h_n\}\Big]^2 = o(1)$$
as $n \to \infty$. Assumption 3 (ii) provides
$$h_n E\Big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\Big] = h_n \int_{h_n}^\infty \tau_2(a)\,\frac{1}{a^2}\,da = \int_1^\infty \tau_2(u h_n)\,\frac{1}{u^2}\,du \ge \int_1^2 \tau_2(u h_n)\,\frac{1}{u^2}\,du$$
$$\ge \int_1^2 \tau_2(u h_n)\,\frac{1}{4}\,du = \frac{1}{4} \int_1^2 \tau_2(u h_n)\,du = \frac{1}{4} \int_1^2 \big(\tau_2(0) + u h_n R_2^1(\alpha_2^1(u h_n))\big)\,du$$
$$= \frac{1}{4}\tau_2(0) + h_n \frac{1}{4} \int_1^2 u R_2^1(\alpha_2^1(u h_n))\,du = \frac{1}{4}\tau_2(0) + o(1)$$
as $n \to \infty$. Combining the above two equations, we conclude that
$$h_n E\Big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\Big] - h_n E\Big[\frac{B}{A} \cdot 1\{A > h_n\}\Big]^2 = \frac{1}{4}\tau_2(0) + o(1) \ge \frac{1}{8}\tau_2(0) > 0 \tag{A.23}$$
for all sufficiently large $n$. By (A.22) and (A.23), therefore, the variance ratio (A.21) is $O(1)$ as $n \to \infty$.
References
Calonico, S., Cattaneo, M.D., and Farrell, M.H. (2017). On the effect of bias estimation on
coverage accuracy in nonparametric inference. J. Amer. Stat. Assoc. Forthcoming.
Calonico, S., Cattaneo, M.D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression discontinuity designs. Econometrica 82 2295–2326.
Cattaneo, M.D., Crump, R.K., and Jansson, M. (2010). Robust data-driven inference for density-
weighted average derivatives. J. Amer. Stat. Assoc. 105 1070–1083.
Cattaneo, M.D., Crump, R.K., and Jansson, M. (2014). Small bandwidth asymptotics for density-weighted average derivatives. Econometr. Theor. 30 176–200.
Chaudhuri, S. and Hill, J.B. (2016). Heavy tail robust estimation and inference for average treat-
ment effects. Working Paper.
Crump, R.K., Hotz, V.J., Imbens, G.W., and Mitnik, O.A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika 96 187–199.

Csörgő, S., Haeusler, E., and Mason, D.M. (1988). The asymptotic distribution of trimmed sums. Ann. Probab. 16 672–699.
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall/CRC.
Graham, B.S. and Powell, J.L. (2012). Identification and estimation of average partial effects in
“irregular” correlated random coefficient panel data models. Econometrica 80 2105–2152.
Griffin, P.S. and Pruitt, W.E. (1987). The central limit problem for trimmed sums. Math. Proc.
Camb. Phil. Soc. 102 329–349.
Griffin, P.S. and Pruitt, W.E. (1989). Asymptotic normality and subsequential limits of trimmed
sums. Ann. Probab. 17 1186–1219.
Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement
from a finite universe. J. Amer. Stat. Assoc. 47 663–685.
Khan, S. and Tamer, E. (2010). Irregular identification, support conditions, and inverse weight
estimation. Econometrica 78 2021–2042.
Lewbel, A. (1997). Semiparametric estimation of location and other discrete choice moments.
Econometr. Theor. 13 32–51.
Peña, V.H., Lai, T.L., and Shao, Q.-M. (2009). Self-Normalized Processes: Limit Theory and Statistical Applications. Springer.
Peng, L. (2001). Estimating the mean of a heavy tailed distribution. Stat. Probab. Lett. 52 255–
264.
Peng, L. (2004). Empirical-likelihood-based confidence interval for the mean with a heavy-tailed
distribution. Ann. Stat. 32 1192–1214.
Rajan, R.G. and Zingales, L. (1998). Financial dependence and growth. Am. Econ. Rev. 88 559–
586.
Romano, J.P. and Wolf, M. (2000). Subsampling inference for the mean in the heavy-tailed case.
Metrika 50 55–69.
Ullah, A. and Vinod, H.D. (1993). General nonparametric regression estimation and testing in econometrics. In Handbook of Statistics, G.S. Maddala, C.R. Rao, and H.D. Vinod, eds. 11 85–116.