Inference for moments of ratios with robustness against large trimming bias and unknown convergence rate∗

Yuya Sasaki
Department of Economics, Vanderbilt University

Takuya Ura
Department of Economics, University of California, Davis

October 26, 2017
Abstract
We consider statistical inference for moments of the form E[B/A]. A naïve sample mean is unstable with small values of the denominator variable, A. This paper develops a method of robust inference, and proposes a data-driven practical choice of trimming observations with small values of A. Our sense of the robustness is twofold. First, bias correction allows for robustness against large trimming bias. Second, adaptive inference allows for robustness against unknown convergence rate. The proposed method allows for closer-to-optimal trimming, and thus more informative inference results in practice. Simulation studies and real data analysis demonstrate this practical advantage.
Keywords: bias correction, ratio, robust inference, trimming, unknown convergence rate.
∗We benefited from useful comments by Federico A. Bugni, V. Joseph Hotz, Kengo Kato, Ulrich Müller, Liang Peng, Adam M. Rosen, Yichong Zhang, and seminar participants at Duke. All remaining errors are ours.
arXiv:1709.00981v3 [stat.ME] 25 Oct 2017
1 Introduction
Moments of ratios of the form E[B/A] are ubiquitous in empirical research. Summary
tables often report statistics of ratios of random variables – see for example the well-cited
paper by Rajan and Zingales (1998) among others. In addition, there are a number of
specific research methods in which moments of ratios are quantities of direct interest – in-
cluding inverse probability weighting (Horvitz and Thompson, 1952), binary choice models
with special regressor (Lewbel, 1997), and correlated random coefficient panel data models
(Graham and Powell, 2012), to list a few. When some observations have values of the
denominator A close to zero, they behave as outliers in terms of the ratio, B/A, and thus
can exert a large influence on the naïve sample mean statistics.
To avoid this problem, researchers often trim away those observations with small values
of the denominator variable A. For example, the well-cited paper by Crump, Hotz, Imbens
and Mitnik (2009) proposes to trim away observations with the denominator less than 0.1
for estimating average treatment effects. Trimming indeed mitigates the potentially large
variance, but it does so at the cost of increased bias in general. Furthermore, trimmed
estimators of moments of ratios are known to be associated with an unknown convergence
rate.1 Ideally, we want a method of inference to be robust against these two issues, namely
trimming bias and unknown convergence rate. In this paper, we develop such a method of
inference for moments of ratios with the twofold robustness.
In order to achieve the first sense of robustness, i.e., against large trimming bias, we need
to carefully choose a trimming threshold and to appropriately remove the trimming bias to
the extent that the bias no longer affects the center of the asymptotic distribution relative
to the variance. We develop a method of bias-corrected inference by explicitly estimating
1A large body of the literature discusses asymptotic distribution theories for trimmed sums – see Griffin and Pruitt (1987); Csörgő, Haeusler, and Mason (1988); Griffin and Pruitt (1989) and references therein.
and removing the trimming bias to the necessary extent, and accordingly propose a practical
and systematic choice of trimming to meet the theoretical requirements. We are far from the
first to take this approach, and this idea is motivated by Cattaneo, Crump, and Jansson
(2010, 2014), Calonico, Cattaneo and Titiunik (2014), Calonico, Cattaneo, and Farrell
(2017), and in particular, Graham and Powell (2012) for the similarity in terms of the
irregularly identified estimand and trimmed estimation.
The second sense of robustness we aim to achieve is against an unknown convergence
rate. The asymptotic variance of trimmed estimators for moments of ratios converges at
the parametric√n rate in “regular” cases, whereas its convergence rate can be as slow
as the nonparametric rates in “irregular” cases – see Khan and Tamer (2010). Inference
should be robustly valid even without a prior knowledge of a researcher about whether the
case is regular or irregular. In order to accommodate this issue, we take advantage of the
rate-adaptive inference method based with self-normalized processes (cf. Pena, Lai, and
Shao, 2009), and extend it with the aforementioned bias correction approach to acquire the
twofold robustness.
Romano and Wolf (2000), Peng (2001, 2004) and Chaudhuri and Hill (2016) discuss
inference for the mean without finite second moments as we do in this paper. In particular,
our approach of rate-adaptive inference in conjunction with trimming bias correction is
closely related to that of Peng (2001) and Chaudhuri and Hill (2016). In fact, by using
the information of the ratio structure, our method complements and adds practical value
to the preceding ideas of Peng (2001) and Chaudhuri and Hill (2016) in a few dimensions.
First, we introduce a data-driven selection of trimming threshold in a systematic way and
consistently with the asymptotic theory. Second, our method circumvents the need to pre-
estimate the tail index. Third, our framework allows for use of larger trimming thresholds
which enables faster convergence rates. We emphasize that we actively use the information
of the ratio structure, and a direct comparison of advantages and disadvantages between
our framework and this heavy-tail literature is not straightforward. The bias correction
approach based on the local polynomial expression of the bias near 'zero' (as opposed to
infinity) is made feasible by our approach of trimming based on the denominator. This
aspect of our method is crucial for these two practical contributions.
Notation: $E[X]$ and $\mathrm{Var}(X)$ denote the expected value and the variance of a random variable $X$, respectively. Their sample counterparts are denoted by $E_n[X] = n^{-1}\sum_{i=1}^{n} X_i$ and $\mathrm{Var}_n(X) = E_n[X^2] - E_n[X]^2$. Convergence in probability and convergence in distribution are denoted by $\to_p$ and $\to_d$, respectively. $1\{\cdot\}$ denotes the indicator function.
Outline of the paper: The rest of this paper is organized as follows. In Section 2, we present the main results of our method of inference with the twofold robustness. In Section 3, we provide a practical recipe for estimation and inference. In Section 4, we conduct simulation studies to evaluate the performance of our method. In Section 5, we apply our method to real data. All mathematical proofs are relegated to Section A in the appendix.
2 Main Results
Given an i.i.d. sample of the random vector $(A, B)$, consider the problem of estimating
\[
\beta = E\!\left[\frac{B}{A}\right]. \tag{2.1}
\]
This estimand β exists under part (i) of the following assumption on finite moments.

Assumption 1. (i) $E\left[\left|\frac{B}{A}\right|\right] < \infty$. (ii) $E[B^4] < \infty$. (iii) $\mathrm{Var}\!\left(\frac{B}{A}\right) \neq 0$.

Part (ii) states that the possibility of an infinite $\mathrm{Var}(B/A)$ is attributed to small values of A rather than to a heavy-tailed distribution of B. Part (iii) assumes away the trivial case where B/A is degenerate. For simplicity of writing, we consider the case where A is supported on the half line $\mathbb{R}_+$, although this restriction is not crucial for the substance of the main results.
Because the integrand may take large values if A is close to zero, it is natural to consider
the regularized estimator defined by
\[
\hat{\beta}_h = E_n\!\left[\frac{B}{A}\cdot 1\{A > h\}\right] \tag{2.2}
\]
with a trimming threshold value h > 0. The idea behind this estimator is to trim away
those observations that have very small values of A in the denominator of the estimand.
For a fixed trimming threshold h, the mean of the regularized estimator is
\[
\beta_h = E\!\left[\frac{B}{A}\cdot 1\{A > h\}\right], \tag{2.3}
\]
which exists under Assumption 1 (i). The difference, $\beta - \beta_h$, is the bias of the regularized estimator $\hat{\beta}_h$ for the true estimand β, which we hereafter refer to as the trimming bias. The order of this trimming bias depends on the specific application, but it may well be as slow as the linear order of h in many plausible applications.
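To fix ideas, the regularized estimator (2.2) amounts to a one-line computation. The sketch below assumes a NumPy environment with hypothetical array names `a` and `b` for the samples of A and B; it zeros out the trimmed summands while keeping the full sample size in the denominator, as the definition requires.

```python
import numpy as np

def trimmed_ratio_mean(a, b, h):
    """Regularized estimator (2.2): sample mean of (B/A) * 1{A > h}.

    Observations with denominator A <= h are trimmed: their summands
    are set to zero, but the average is still over the full sample size.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    keep = a > h
    # Guard the division so trimmed entries never divide by a tiny A.
    ratio = np.where(keep, b / np.where(keep, a, 1.0), 0.0)
    return float(ratio.mean())
```

For example, with a = (0.05, 0.5, 2.0), b = (1, 1, 1), and h = 0.1, the first observation is trimmed and the estimate is (2 + 0.5)/3.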
The main difficulty lies in that a naïve bias estimate, $E_n\!\left[\frac{B}{A}\cdot 1\{A \le h\}\right]$, may entail infinite variance. We take advantage of the fact that the bias term, $E\!\left[\frac{B}{A}\cdot 1\{A \le h\}\right]$, can be approximated by estimable objects with bounded variances. Graham and Powell (2012, p. 2125) suggest a similar idea in the context of correlated random coefficient panel data models. To the best of our knowledge, however, this idea has not been formally established for robust inference. Moreover, it is unclear whether this idea is generically applicable beyond the framework of Graham and Powell (2012).
Before proceeding with our formal results, we briefly describe the intuition behind our approach. Assumption 1 (i) and suitable regularity conditions imply $E[B \mid A = 0]\cdot f_A(0) = 0$, because we can write $E\!\left[\frac{B}{A}\right] = \int_0^\infty \frac{E[B \mid A = a]\cdot f_A(a)}{a}\,da$. Thus, the bias term, $E\!\left[\frac{B}{A}\cdot 1\{A \le h\}\right]$, can be approximated as
\[
E\!\left[\frac{B}{A}\cdot 1\{A \le h\}\right]
= \int_0^h \frac{E[B \mid A = a]\cdot f_A(a)}{a}\,da
= \int_0^h \frac{E[B \mid A = a]\cdot f_A(a) - E[B \mid A = 0]\cdot f_A(0)}{a}\,da
\approx h\cdot \frac{d}{da}\, E[B \mid A = a]\cdot f_A(a)\,\Big|_{a=0}.
\]
Furthermore, the derivative $\frac{d}{da}\, E[B \mid A = a]\cdot f_A(a)\,\big|_{a=0}$ in the last expression can be estimated by the derivative of the numerator of the Nadaraya–Watson estimator, whose statistical properties are well studied – see Ullah and Vinod (1993, p. 94) and references therein. For robust inference, we adjust the asymptotic variance by incorporating the variability of the approximate bias estimator, following the idea of Calonico, Cattaneo and Titiunik (2014).
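To make this derivative estimation concrete, the following sketch implements the first-derivative case with a rescaled biweight kernel K(u) = 30u²(1 − u)² on (0, 1). The kernel is our illustrative choice, not prescribed by the paper; any kernel of the type required by the formal conditions below would do, and the function names are ours.

```python
import numpy as np

def K1(u):
    """First derivative of the rescaled biweight kernel
    K(u) = 30 u^2 (1 - u)^2, supported on (0, 1)."""
    return 30.0 * (2.0 * u - 6.0 * u**2 + 4.0 * u**3)

def tau1_prime_hat(a, b, h):
    """Kernel estimator of d/da { f_A(a) E[B | A = a] } at a = 0:
    the sample mean of (-1) * B / h^2 * K'(A / h)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    u = a / h
    # K' has support (0, 1); observations outside get zero weight.
    w = np.where((u > 0.0) & (u < 1.0), K1(np.clip(u, 0.0, 1.0)), 0.0)
    return float(np.mean(-b * w / h**2))
```

Only observations with 0 < A < h contribute, which is exactly why this object captures the local behavior of the bias near zero.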
2.1 Bias Correction
This subsection discusses means to mitigate the trimming bias for the purpose of conducting
valid inference. The bias reduction is based on the continuous distribution of A and the
smoothness of its density fA as well as the smoothness of the conditional expectation
function E[B|A = · ] as concretely suggested by Assumptions 2 and 3 below.
Assumption 2 (Continuous Distribution of A). The distribution of A is absolutely continuous in a neighborhood of 0, with its density function denoted by $f_A$.
Under this assumption, we introduce the short-hand notation
\[
\tau_j(a) = f_A(a)\cdot E[B^j \mid A = a] \tag{2.4}
\]
for $j \in \mathbb{N}$. It is argued below that part (i) of the following assumption allows for bias correction up to the order of $h^k$ for an integer $k > 1$.
Assumption 3 (Smoothness). (i) $\tau_1$ is $k$-times continuously differentiable with a bounded $k$-th derivative in a neighborhood of 0 for an integer $k > 1$. (ii) $\tau_2$, $\tau_3$, and $\tau_4$ are continuously differentiable with bounded first derivatives in a neighborhood of 0.
Lemma 2.1 (Bias Correction). Suppose that Assumptions 1 (i), 2, and 3 (i) are satisfied. For the integer $k > 1$ provided in Assumption 3 (i),
\[
\beta - \beta_h = P_h^{k-1} + O(h^k)
\]
holds as $h \to 0$, where $P_h^{k-1}$ is defined by
\[
P_h^{k-1} = \sum_{\kappa=1}^{k-1} \frac{h^\kappa}{\kappa! \cdot \kappa}\,\tau_1^{(\kappa)}(0). \tag{2.5}
\]
This lemma shows that the trimming bias, $\beta - \beta_h$, of the estimator (2.2) can be decomposed into an estimable part $P_h^{k-1}$ and a remaining bias of order $h^k$. Hence, if the estimable part $P_h^{k-1}$ can be estimated with a bias of order $h^k$, then substituting such an estimate $\hat{P}_h^{k-1}$ for $P_h^{k-1}$ can correct the trimming bias of the estimator (2.2) up to the order of $h^k$. In other words, we suggest a bias-corrected estimator
\[
\hat{\beta}_h + \hat{P}_h^{k-1}
\]
with any bias estimator $\hat{P}_h^{k-1}$ satisfying the following property:
\[
E[\hat{P}_h^{k-1}] = P_h^{k-1} + O(h^k).
\]
Section 2.2 presents a concrete suggestion of such an estimator.
2.2 Bias Estimation
From (2.5) in the statement of Lemma 2.1, it is natural to develop a bias estimator $\hat{P}_h^{k-1}$ based on an estimator of $\tau_1^{(\kappa)}$, $\kappa \in \{1, \ldots, k-1\}$. The $\kappa$-th derivative of the function $\tau_1$ defined in (2.4) can be estimated by the local derivative estimator
\[
\hat{\tau}_1^{(\kappa)}(0) = E_n\!\left[\frac{(-1)^\kappa \cdot B}{h^{\kappa+1}}\cdot K^{(\kappa)}\!\left(\frac{A}{h}\right)\right], \tag{2.6}
\]
where K denotes a kernel function satisfying the following properties.
Assumption 4 (Kernel). (i) $K$ has support $(0, 1)$. (ii) $\int_0^1 K(u)\,du = 1$. (iii) $K$ is $k$-times continuously differentiable with a bounded $k$-th derivative for an integer $k > 1$.
Following (2.5), we consider the bias estimator defined by
\[
\hat{P}_h^{k-1} = \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa \cdot h^\kappa}{\kappa}\,\hat{\tau}_1^{(\kappa)}(0), \tag{2.7}
\]
where the weights $\{\theta_\kappa\}_{\kappa=1}^{k-1}$ are chosen to satisfy
\[
\kappa_1 \sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa \cdot \theta_\kappa}{\kappa} \cdot \int_0^1 u^{\kappa_1} K^{(\kappa)}(u)\,du = 1 \quad \text{for each } \kappa_1 = 1, \ldots, k-1. \tag{2.8}
\]
Note that the weights $\{\theta_\kappa\}_{\kappa=1}^{k-1}$ implied by (2.8) are generally different from the weights $\frac{1}{\kappa!}$ for the population bias $P_h^{k-1}$ given in (2.5). This discrepancy arises because the derivative estimator $\{\hat{\tau}_1^{(\kappa)}(0)\}_{\kappa=1}^{k-1}$ in (2.6) is itself biased for the population counterpart $\{\tau_1^{(\kappa)}(0)\}_{\kappa=1}^{k-1}$, and we need to correct for these biases of the bias estimator as well as for the trimming bias.
With a suitable choice of weights and a kernel function following the above prescription,
the asymptotic order of the bias with the proposed bias correction can be nicely controlled
as the following first main theorem of this paper states.
Theorem 2.1 (Bias Correction). If Assumptions 1 (i), 2, 3 (i), and 4 are satisfied, then
\[
\beta - E\!\left[\hat{\beta}_h\right] - E\!\left[\hat{P}_h^{k-1}\right] = O(h^k)
\]
as $h \to 0$.
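For a fixed kernel, the weight system (2.8) reduces to a small linear system in θ₁, …, θ_{k−1}. A numerical sketch, under our illustrative assumption of the rescaled biweight kernel K(u) = 30u²(1 − u)² on (0, 1) (the kernel choice and function names are ours, not the paper's):

```python
import numpy as np

# First and second derivatives of the rescaled biweight kernel
# K(u) = 30 u^2 (1 - u)^2 on (0, 1).
K_DERIVS = [
    lambda u: 30.0 * (2.0 * u - 6.0 * u**2 + 4.0 * u**3),  # K'
    lambda u: 30.0 * (2.0 - 12.0 * u + 12.0 * u**2),       # K''
]

def _trapezoid(y, u):
    """Plain trapezoidal rule (avoids version-dependent NumPy names)."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0)

def bias_weights(k=3):
    """Solve the linear system (2.8) for theta_1, ..., theta_{k-1}:
    kappa_1 * sum_kappa (-1)^kappa (theta_kappa / kappa)
            * int_0^1 u^{kappa_1} K^{(kappa)}(u) du = 1."""
    u = np.linspace(0.0, 1.0, 100001)
    m = np.empty((k - 1, k - 1))
    for k1 in range(1, k):              # equation index kappa_1
        for kap in range(1, k):         # unknown index kappa
            integral = _trapezoid(u**k1 * K_DERIVS[kap - 1](u), u)
            m[k1 - 1, kap - 1] = k1 * (-1.0) ** kap / kap * integral
    return np.linalg.solve(m, np.ones(k - 1))
```

For this particular kernel and k = 3, integration by parts gives the closed-form solution θ₁ = 1 and θ₂ = −1/2, which the numerical solver reproduces.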
2.3 Rate-Adaptive Inference
The previous subsection focuses on the asymptotic bias of the bias-corrected estimator. The current subsection in turn focuses on the stochastic part. At the end of this subsection, we combine the "bias" part and the "stochastic" part to develop an asymptotic distribution result for robust inference.
Making the dependence of the bandwidth h on the sample size n explicit by writing $h_n$, we introduce the random variable
\[
Z_n = \frac{B}{A}\cdot 1\{A > h_n\} + \sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa \cdot \theta_\kappa \cdot B}{h_n \cdot \kappa}\, K^{(\kappa)}\!\left(\frac{A}{h_n}\right). \tag{2.9}
\]
Note that (2.2), (2.6), (2.7), and (2.9) yield
\[
E_n[Z_n] = \hat{\beta}_{h_n} + \hat{P}_{h_n}^{k-1},
\]
i.e., the sample mean of $Z_n$ is the bias-corrected estimator. We also introduce the notation $S_n = E[Z_n^2] - E[Z_n]^2$ for the variance of $Z_n$, and its analog estimator
\[
\hat{S}_n = E_n[Z_n^2] - E_n[Z_n]^2.
\]
The following lemma states the asymptotic normality result.

Lemma 2.2 (Asymptotic Normality). If Assumptions 1, 3 (ii), and 4 are satisfied, then
\[
n^{1/2} \hat{S}_n^{-1/2} \left( E_n[Z_n] - E[Z_n] \right) \to_d N(0, 1)
\]
for $n h_n^2 \to \infty$ as $n \to \infty$.
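In practice, Lemma 2.2 is used by inverting the self-normalized statistic into a confidence interval. A minimal sketch, assuming the per-observation values of $Z_n$ have already been collected into an array `z` (the function name is ours):

```python
import numpy as np

def self_normalized_ci(z, z_crit=1.959964):
    """Two-sided confidence interval based on Lemma 2.2:
    E_n[Z_n] +/- z_crit * sqrt(S_n / n), where S_n is the sample
    variance E_n[Z^2] - E_n[Z]^2. Self-normalization by S_n makes
    the interval valid without knowing the convergence rate."""
    z = np.asarray(z, dtype=float)
    n = z.size
    center = float(z.mean())
    s_hat = float(z.var())          # E_n[Z^2] - E_n[Z]^2
    half = z_crit * (s_hat / n) ** 0.5
    return center - half, center + half
```

The key design point is that the rate enters only through the sample variance of $Z_n$ itself, so the same formula applies in both the regular and irregular cases.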
Recall that Assumptions 1 (i), 2, 3 (i), and 4 are used to obtain the "bias" part in Theorem 2.1. On the other hand, Assumptions 1, 3 (ii), and 4 are used to obtain the "stochastic" part in Lemma 2.2. Combining Theorem 2.1 and Lemma 2.2, we state the second main result of this paper below.
Theorem 2.2 (Asymptotic Normality). If Assumptions 1, 2, 3, and 4 are satisfied, then
\[
n^{1/2} \hat{S}_n^{-1/2} \left( \hat{\beta}_{h_n} + \hat{P}_{h_n}^{k-1} - \beta \right) \to_d N(0, 1)
\]
for $n h_n^2 \to \infty$ and $n h_n^{2k} \to 0$ as $n \to \infty$.
2.4 Invariance in Convergence Rate
As in Khan and Tamer (2010), the convergence rate of the regularized estimator (2.2) is unknown in general. As such, it may be unclear at first glance which of the regularized estimator, $\hat{\beta}_h$, and the bias estimator, $\hat{P}_h^{k-1}$, dominates the yet unknown variance of the combined estimator $\hat{\beta}_h + \hat{P}_h^{k-1}$. In this section, we show that adding the bias estimator does not slow the convergence rate compared with that of the regularized estimator without bias correction.
Theorem 2.3 (Invariant Convergence Rate). If Assumptions 1 (i), (iii), 2, 3, and 4 are satisfied, then
\[
\mathrm{Var}\!\left( \hat{\beta}_{h_n} + \hat{P}_{h_n}^{k-1} \right) \big/ \mathrm{Var}\!\left( \hat{\beta}_{h_n} \right) = O(1)
\]
for $h_n \to 0$ as $n \to \infty$.
The conclusion of this theorem ensures that we do not worsen the convergence rate by
adding the bias correction terms. Therefore, the cost of bias correction and robust inference
is only the augmented variance, and not a slowed convergence rate.
3 Guide for Practice
We propose a practical recipe for the case of order $k = 3$ and a bandwidth choice $h_n = O(n^{-1/5})$ based on mean square error minimization with respect to the second-order bias. These choices are consistent with the rate assumptions stated in the general theory under both regular and irregular cases. Of course, a researcher could make alternative choices of k and bandwidth as long as the rate requirements in the general theory are met.
3.1 Procedural Recipe
Step 1: Bandwidth Choice. We choose the bandwidth $h_n^*$ by minimizing
\[
\frac{h^4}{16}\, E_n\!\left[\frac{B}{h_{\mathrm{pre}}^3}\, K^{(2)}\!\left(\frac{A}{h_{\mathrm{pre}}}\right)\right]^2
+ n^{-1}\, \mathrm{Var}_n\!\left( \frac{B}{A}\cdot 1\{A > h\} - \frac{B}{h\cdot(1 - K(1))}\cdot K^{(1)}\!\left(\frac{A}{h}\right) \right)
\]
with respect to h, where the preliminary bandwidth $h_{\mathrm{pre}}$ used for a preliminary second-order bias estimation is set to $\max\{A_i\}_{i=1}^n$ for a global estimation.²
Step 2: Bias-Corrected Estimation. Once the bandwidth $h_n^*$ is obtained, the bias-corrected estimate is given by
\[
\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2 = E_n\!\left[ \frac{B}{A}\cdot 1\{A > h_n^*\} - \frac{\theta_1 B}{h_n^*}\, K^{(1)}\!\left(\frac{A}{h_n^*}\right) + \frac{\theta_2 B}{2 h_n^*}\, K^{(2)}\!\left(\frac{A}{h_n^*}\right) \right]
\]
following (2.2), (2.6), and (2.7), where $\theta_1$ and $\theta_2$ are given, following (2.8), by
\[
\begin{pmatrix} \theta_1 \\ \theta_2 \end{pmatrix}
=
\begin{pmatrix}
-2\int_0^1 u K^{(1)}(u)\,du & \int_0^1 u K^{(2)}(u)\,du \\
-2\int_0^1 u^2 K^{(1)}(u)\,du & \int_0^1 u^2 K^{(2)}(u)\,du
\end{pmatrix}^{-1}
\begin{pmatrix} 2 \\ 1 \end{pmatrix}.
\]
Step 3: Variance Estimation. The variance of $\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2$ is approximated by
\[
n^{-1}\, \mathrm{Var}_n\!\left( \frac{B}{A}\cdot 1\{A > h_n^*\} - \frac{\theta_1 B}{h_n^*}\, K^{(1)}\!\left(\frac{A}{h_n^*}\right) + \frac{\theta_2 B}{2 h_n^*}\, K^{(2)}\!\left(\frac{A}{h_n^*}\right) \right)
\]
following the result in Section 2.3, where $\theta_1$ and $\theta_2$ are the same as those given in Step 2.
2This choice of the preliminary bandwidth for the global estimation is analogous to the recommendation
by Fan and Gijbels (1996, p. 198) to use a global preliminary parametric estimation for a plug-in optimal
bandwidth selection in the context of local polynomial estimation of nonparametric functions.
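Putting Steps 1–3 together, the following sketch implements the recipe under our illustrative assumptions: the rescaled biweight kernel K(u) = 30u²(1 − u)², for which solving (2.8) with k = 3 gives θ₁ = 1 and θ₂ = −1/2 and K(1) = 0 (so the Step-1 factor 1/(1 − K(1)) equals 1), and a simple grid search over candidate bandwidths. The grid and all function names are ours, not part of the paper.

```python
import numpy as np

def K1(u): return 30.0 * (2.0 * u - 6.0 * u**2 + 4.0 * u**3)   # K'
def K2(u): return 30.0 * (2.0 - 12.0 * u + 12.0 * u**2)        # K''
THETA1, THETA2 = 1.0, -0.5   # solution of (2.8) for this kernel, k = 3

def _kval(a, h, kfun):
    """kfun(A / h) restricted to the kernel support (0, 1)."""
    u = a / h
    return np.where((u > 0.0) & (u < 1.0), kfun(np.clip(u, 0.0, 1.0)), 0.0)

def _trim(a, b, h):
    """(B / A) * 1{A > h} with the trimmed entries set to zero."""
    keep = a > h
    return np.where(keep, b / np.where(keep, a, 1.0), 0.0)

def bias_corrected_inference(a, b, grid):
    """Steps 1-3 of Section 3.1: data-driven h, estimate, standard error."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = a.size
    # Step 1: preliminary (global) second-order bias piece with h_pre = max A.
    h_pre = float(a.max())
    tau2_hat = float(np.mean(b / h_pre**3 * _kval(a, h_pre, K2)))
    crit = []
    for h in grid:
        infl = _trim(a, b, h) - b / h * _kval(a, h, K1)   # 1 - K(1) = 1 here
        crit.append(h**4 / 16.0 * tau2_hat**2 + float(infl.var()) / n)
    h_star = float(grid[int(np.argmin(crit))])
    # Steps 2-3: bias-corrected estimate and its rate-adaptive standard error.
    z = (_trim(a, b, h_star)
         - THETA1 * b / h_star * _kval(a, h_star, K1)
         + THETA2 * b / (2.0 * h_star) * _kval(a, h_star, K2))
    return float(z.mean()), float((z.var() / n) ** 0.5), h_star
```

The returned standard error is exactly the Step-3 quantity, so a 95% confidence interval is the estimate plus or minus 1.96 times this standard error.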
3.2 Remark on Consistency between Recipe and Theory
The bandwidth choice suggested in Step 1 of Section 3.1 induces asymptotically negligible bias relative to the variance when it is used with bias-corrected inference based on $k = 3$ as in Steps 2 and 3. Specifically, if the identification is regular in the sense that $\mathrm{Var}\!\left( \frac{B}{A}\cdot 1\{A > h\} - \frac{\theta_1 B}{h}\, K^{(1)}\!\left(\frac{A}{h}\right) \right) \sim 1$ as $h \to 0$, then we have $h_n^* \sim n^{-1/4}$. On the other hand, if the identification is irregular in the sense that this variance behaves as $h^{-1}$ as $h \to 0$, then we have the standard nonparametric MSE-optimal rate $h_n^* \sim n^{-1/5}$. In both cases, the rate requirements $n h_n^2 \to \infty$ and $n h_n^6 \to 0$ as $n \to \infty$ given in Theorem 2.2 for $k = 3$ are satisfied by $h_n = h_n^*$.
4 Simulation Studies
This section presents simulation studies evaluating the performance of our method based on $\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2$ and its asymptotic distribution, in comparison with the naïve sample mean estimator, $E_n[B/A]$, and the regularized estimator, $\hat{\beta}_{h_n^*}$, without bias correction. We consider the following simulation designs. The denominator variable A and the numerator variable B are independently generated according to the following laws:
\[
A = (X/\nu)^{1/2}, \quad X \sim \chi^2(\nu),\ \nu > 1; \qquad B = Z + \mu, \quad Z \sim N(0, 1),\ \mu \in \mathbb{R}.
\]
The ratio, B/A, thus follows a (non-central) t distribution. The first moment E[B/A] exists if and only if ν > 1, and the second moment E[(B/A)²] exists if and only if ν > 2. The smaller ν is, the more observations there are with the denominator variable A close to 0. We also remark that the regularized estimator has the regular √n rate of convergence when ν = 3, whereas it converges at a rate slower than √n when ν = 2. Because a researcher does not know the true data generating process, this fact necessitates a method of inference, like ours, that is robust against an unknown convergence rate.
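One draw from this design can be generated as follows; a sketch assuming NumPy's random generator, with the function name ours:

```python
import numpy as np

def simulate_design(mu, nu, n, seed=0):
    """One sample from the Section 4 design: A = (X / nu)^(1/2) with
    X ~ chi^2(nu), independent of B = Z + mu with Z ~ N(0, 1), so
    that B / A follows a (non-central) t distribution."""
    rng = np.random.default_rng(seed)
    a = np.sqrt(rng.chisquare(nu, size=n) / nu)
    b = rng.normal(loc=mu, scale=1.0, size=n)
    return a, b
```

For ν = 2 and µ = 1 the target is E[B/A] = µ·E[1/A] = µ√π ≈ 1.772, matching the E[B/A] column of Table 1 below.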
We rely on the assumption that the first moment, E[B/A], exists, and thus ν > 1 is maintained. Our method does not assume the existence of the second moment, E[(B/A)²], and hence ν > 2 is not necessary. Therefore, the theory predicts that inference with our method will work even when ν = 2. On the other hand, inference based on the naïve sample mean estimator $E_n[B/A]$ is not guaranteed to work well when ν = 2 because of the infinite variance. This is one point to check with the simulation studies.
The bias of the regularized estimator, $\hat{\beta}_h$, can be analytically shown to equal $\mu \cdot E\!\left[\frac{1\{A \le h\}}{A}\right]$ under the above data generating process. Therefore, there is no bias when µ = 0. This means that inference works correctly, by coincidence, even with the regularized estimator, $\hat{\beta}_{h_n^*}$, without bias correction when µ = 0 is true. However, as µ deviates away from 0, inference without bias correction starts to work poorly, whereas our method based on $\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2$, with its appropriate asymptotic distribution, is supposed to work robustly under µ ≠ 0 as well. This is another point to check with the simulation studies.
For each combination of the data generating parameters, (µ, ν, N), we run 10,000 iterations of data generation, estimation, and inference. Tables 1 and 2 summarize simulation results under ν = 2 and ν = 3, respectively. In each table, the top two row groups, the middle two row groups, and the bottom two row groups use the location parameters µ = −1, 0, and 1, respectively. Simulation statistics, including mean, bias, standard deviation, root mean square error, and 95% coverage frequency, are reported for each of (i) the method based on the sample mean, $E_n[B/A]$, with its estimated standard error; (ii) the method based on the regularized estimator, $\hat{\beta}_{h_n^*}$, with its rate-adaptive standard error;³ and (iii)
³We remark that simulation results for the regularized estimator, $\hat{\beta}_{h_n^*}$, without bias correction are shown only for the sake of illustrating the efficacy of bias correction. The bandwidth, $h_n^*$, used for our inference
our robust method based on the bias-corrected estimator, $\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2$, with its rate-adaptive standard error, implemented following the practical guidelines of Section 3 and the biweight kernel function. We make the following three observations about these results.
First, recall that inference based on (i) is not guaranteed to work when ν = 2, whereas our theory guarantees that inference based on (iii) works even in this case. Table 1 demonstrates that this is indeed the case. When µ = 0, all three methods, (i), (ii), and (iii), work well in terms of the 95% coverage frequency. On the other hand, µ ≠ 0 forces (i) and (ii) to have size distortions, while (iii) continues to have 95% coverage frequencies close to the nominal level. Second, recall also that ν = 3 allows the second moment E[(B/A)²] to exist, and hence method (i) is not necessarily doomed to fail entirely. Table 2 demonstrates that this is indeed the case. For all values of µ ∈ {−1, 0, 1}, both methods (i) and (iii) work well in terms of the 95% coverage frequency. However, as a researcher may not know whether the second moment exists given empirical data, we recommend that method (iii) instead of method (i) be used for robustness of inference against the alternative cases of existence of moments. Finally, recall that µ = 0 allows inference based on (ii) to work, whereas µ ≠ 0 does not. Tables 1 and 2 both demonstrate that this is indeed the case. These results suggest that bias correction, with the accordingly adjusted asymptotic variance, be used for robustness of inference against alternative data generating processes with unknown magnitudes of potential biases. In other words, we recommend that method (iii) instead of method (ii) be used for robustness. In summary, the simulation results demonstrate that method (iii) is the only method that produces precise coverage frequencies robustly across alternative data generating processes.
is larger than those required for valid inference without bias correction, and hence our simulation results do not necessarily overturn the theoretical properties of previous papers on adaptive inference without bias correction.
 µ   ν    N   E[B/A]   Method               Mean     Bias     SD     RMSE   95% Cover
-1   2   250  -1.772   (i)   En[B/A]       -1.771    0.001   0.409   0.409   0.905
                       (ii)  βh*           -0.447    1.325   0.105   1.329   0.000
                       (iii) βh* + P²h*    -1.761    0.011   0.205   0.205   0.948
-1   2   500  -1.772   (i)   En[B/A]       -1.772    0.001   0.231   0.231   0.909
                       (ii)  βh*           -0.545    1.227   0.091   1.230   0.000
                       (iii) βh* + P²h*    -1.769    0.004   0.148   0.148   0.957
 0   2   250   0.000   (i)   En[B/A]        0.002    0.002   0.238   0.238   0.957
                       (ii)  βh*            0.000    0.000   0.027   0.027   0.948
                       (iii) βh* + P²h*     0.000    0.000   0.152   0.152   0.943
 0   2   500   0.000   (i)   En[B/A]        0.001    0.001   0.178   0.178   0.955
                       (ii)  βh*            0.000    0.000   0.023   0.023   0.947
                       (iii) βh* + P²h*     0.000    0.000   0.110   0.110   0.945
 1   2   250   1.772   (i)   En[B/A]        1.770   -0.002   0.297   0.297   0.904
                       (ii)  βh*            0.448   -1.324   0.106   1.328   0.000
                       (iii) βh* + P²h*     1.762   -0.011   0.203   0.204   0.951
 1   2   500   1.772   (i)   En[B/A]        1.772   -0.001   0.239   0.239   0.914
                       (ii)  βh*            0.546   -1.227   0.091   1.230   0.000
                       (iii) βh* + P²h*     1.768   -0.005   0.147   0.147   0.956

Table 1: Simulation results under ν = 2 for three methods: (i) the method based on the sample mean, En[B/A], and its asymptotic normal distribution; (ii) the method based on the regularized estimator, βh*, with its rate-adaptive standard error; and (iii) our robust method based on the bias-corrected estimator, βh* + P²h*, with its rate-adaptive standard error. Displayed statistics are mean (Mean), bias (Bias), standard deviation (SD), root mean square error (RMSE), and 95% coverage frequency (95% Cover).
 µ   ν    N   E[B/A]   Method               Mean     Bias     SD     RMSE   95% Cover
-1   3   250  -1.382   (i)   En[B/A]       -1.382    0.000   0.128   0.128   0.940
                       (ii)  βh*           -0.618    0.764   0.168   0.782   0.002
                       (iii) βh* + P²h*    -1.422   -0.040   0.195   0.199   0.930
-1   3   500  -1.382   (i)   En[B/A]       -1.381    0.001   0.091   0.091   0.946
                       (ii)  βh*           -0.776    0.606   0.123   0.619   0.000
                       (iii) βh* + P²h*    -1.394   -0.012   0.132   0.132   0.943
 0   3   250   0.000   (i)   En[B/A]        0.000    0.000   0.110   0.110   0.952
                       (ii)  βh*            0.000    0.000   0.033   0.033   0.944
                       (iii) βh* + P²h*     0.000    0.000   0.145   0.145   0.940
 0   3   500   0.000   (i)   En[B/A]        0.000    0.000   0.077   0.077   0.948
                       (ii)  βh*            0.000    0.000   0.030   0.030   0.947
                       (iii) βh* + P²h*     0.001    0.001   0.102   0.102   0.945
 1   3   250   1.382   (i)   En[B/A]        1.382    0.001   0.129   0.129   0.940
                       (ii)  βh*            0.614   -0.768   0.168   0.786   0.001
                       (iii) βh* + P²h*     1.421    0.039   0.195   0.199   0.933
 1   3   500   1.382   (i)   En[B/A]        1.383    0.001   0.090   0.090   0.948
                       (ii)  βh*            0.776   -0.606   0.124   0.618   0.000
                       (iii) βh* + P²h*     1.395    0.013   0.132   0.133   0.946

Table 2: Simulation results under ν = 3 for three methods: (i) the method based on the sample mean, En[B/A], and its asymptotic normal distribution; (ii) the method based on the regularized estimator, βh*, with its rate-adaptive standard error; and (iii) our robust method based on the bias-corrected estimator, βh* + P²h*, with its rate-adaptive standard error. Displayed statistics are mean (Mean), bias (Bias), standard deviation (SD), root mean square error (RMSE), and 95% coverage frequency (95% Cover).
5 Real Data Analysis
We illustrate our robust inference method with an empirical analysis of the dependence of firms on external finance in the U.S. manufacturing industry for the years 2000–2014. The external dependence for a firm is defined as the ratio of external borrowing to capital expenditures. It indicates the extent to which manufacturing firms need external finance. This ratio has been used in the economic growth literature – for example, Rajan and Zingales (1998) use the external dependence to identify which industrial sectors can benefit more from financial development.

We use the Compustat database (Fundamentals Annual) for the years 2000–2014. We focus on manufacturing firms excluding the food industry, i.e., Standard Industry Classification (SIC) codes between 2100 and 3999. Following Rajan and Zingales (1998), we construct a proxy for a firm's dependence via the ratio of external borrowing to capital expenditures. Specifically, the variable A represents the capital expenditures (CAPX), and the variable B represents the capital expenditures (CAPX) minus cash flows from operations (FOPT).⁴ We focus on U.S. firms. Observations with missing A or B are dropped. The sample sizes are n = 2008, 1844, 1771, 1699, 1645, 1583, 1481, 1386, 1289, 1214, 1179, 1173, 1233, 1223, and 1146 for the years 2000–2014, respectively.
Figure 1 displays the results. The shaded bands indicate 95% and 99% confidence intervals for the mean/median of B/A on the vertical axis, plotted against the years 2000–2014 on the horizontal axis. The top left graph shows the result for the mean E[B/A] using method (i), $E_n[B/A]$, i.e., the naïve sample mean. The top right graph shows the result for the median Med[B/A]. The bottom left graph shows the result for the mean E[B/A] using method (ii), $\hat{\beta}_{h_n^*}$, i.e., the trimmed estimate without bias correction. The bottom right graph shows the result
4As in Rajan and Zingales (1998), for the firms with the format code 7 (SCF=7), we cannot observe
FOPT and thus impute the sum of IBC, DPC, TXDC, ESUBC, SPPIV, and FOPO instead.
Figure 1: 95% and 99% confidence intervals for the mean/median of the external financial dependence in the U.S. manufacturing industry for the years 2000–2014. The top left graph shows the result for the mean E[B/A] using (i) En[B/A]; the top right graph shows the result for the median Med[B/A]; the bottom left graph shows the result for the mean E[B/A] using (ii) βh*; and the bottom right graph shows the result for the mean E[B/A] using (iii) βh* + P²h*.
for the mean E[B/A] using method (iii), $\hat{\beta}_{h_n^*} + \hat{P}_{h_n^*}^2$, i.e., our proposed method.
Not surprisingly, the naïve method (i) yields huge confidence intervals and does not provide informative inference results. The median is robust against outliers, and it indeed provides more informative inference results. Likewise, method (ii) shows that the mean
is negative throughout, but it is not significantly different from zero at the 0.01 level for
all years. On the other hand, our robust method (iii) shows significantly negative mean
external financial dependence for all years, and its drop in level right after the 2007–2008
financial crisis is striking. These observations characterize qualitative differences among
the alternative methods.
6 Discussion
This paper proposes a new method of inference for moments of ratios of the form E[B/A]. The main purpose of this method is to deal with a number of practical concerns in a theoretically coherent way. Our method offers the following two practical values in particular. First, our bias correction framework allows for the use of larger trimming thresholds, e.g., $h \propto n^{-1/5}$, which admit faster convergence rates. This feature proves useful in practice, as evidenced in our simulation studies (Section 4) and real data analysis (Section 5). Furthermore, our bias correction approach admits a systematic method of choosing trimming thresholds in practice. By balancing the variance and the bias of an order one unit less than the order of the supposed smoothness, we obtain a data-driven trimming rule that is consistent with valid inference while achieving a close-to-optimal convergence rate.
Second, the rate-adaptive method of inference based on the self-normalized sum allows for valid inference without prior knowledge of the unknown convergence rate. This feature is useful as it eliminates the need for a practitioner to pre-estimate a parameter that determines the convergence rate, such as the curvature parameter of the density function of A in a neighborhood of zero. As such, the practical procedure that we propose (Section 3.1) consists of very simple steps, and its computational cost is minimal.
In summary, the combination of the rate-adaptive method and the trimming bias correction, accounting also for the asymptotic variance of the bias estimator, allows for the twofold robustness in inference for the moment E[B/A], namely against large trimming bias and unknown convergence rate. We believe that this paper will contribute to empirical practice by providing this robustly valid inference procedure with a user-friendly and data-driven implementation.
A Appendix: Proofs
A.1 Proof of Lemma 2.1
Proof. Under Assumptions 1 (i), 2, and 3 (i), we can write the bias term as
β − βh =
∫ h
0
fA(a) · E[B|A = a]
ada =
∫ h
0
τ1(a)− τ1(0)
ada
where the first equality is due to the law of iterated expectations, and the second equality
follows from Assumption 3 (i). By the k-order mean value expansion under Assumption 3
(i), we can write the difference in the last expression as
τ1(a)− τ1(0) =k−1∑κ=1
aκ
κ!τ(κ)1 (0) + ak ·Rk(αk(a))
with αk(a) ∈ (0, a), where the remainder function Rk given by Rk(a) = 1k!τ(k)1 (a) is uni-
formly bounded in absolute value on [0, η] for some small η > 0 by Assumption 3 (i).
Combining the above results together, we can now write the bias term as
$$\beta - \beta_h = \sum_{\kappa=1}^{k-1} \frac{1}{\kappa!}\,\tau_1^{(\kappa)}(0) \int_0^h a^{\kappa-1}\,da + \int_0^h a^{k-1} R_k(\alpha_k(a))\,da = \sum_{\kappa=1}^{k-1} \frac{1}{\kappa!}\frac{1}{\kappa}\,h^\kappa \tau_1^{(\kappa)}(0) + \int_0^h a^{k-1} R_k(\alpha_k(a))\,da,$$
where the remaining bias is bounded in absolute value as
$$\bigg|\int_0^h a^{k-1} R_k(\alpha_k(a))\,da\bigg| \le \int_0^h a^{k-1}\,da \sup_{a \in (0,h)} \big|R_k(\alpha_k(a))\big| = \frac{1}{k}\,h^k \sup_{a \in (0,h)} \big|R_k(\alpha_k(a))\big|.$$
Finally, noting that $\sup_{a \in [0,\eta]} |R_k(a)| < \infty$ proves the claim for $h \le \eta$.
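As a quick numerical sanity check on this expansion, take the illustrative choice $\tau_1(a) = \sin(a)$ (ours, not from the paper), which satisfies $\tau_1(0) = 0$. With $k = 4$, the derivatives of $\sin$ at zero are $1, 0, -1$, so the polynomial part is $h - h^3/18$ and the lemma predicts a remainder of order $h^k$:

```python
import numpy as np

def bias_integral(h, n_grid=200_001):
    """Evaluate beta - beta_h = integral_0^h tau1(a)/a da numerically for
    the illustrative choice tau1(a) = sin(a), so that tau1(0) = 0; the
    integrand sin(a)/a is extended continuously by 1 at a = 0."""
    a = np.linspace(0.0, h, n_grid)
    f = np.where(a > 0, np.sin(a) / np.where(a > 0, a, 1.0), 1.0)
    return float(np.sum((f[1:] + f[:-1]) * np.diff(a)) / 2.0)  # trapezoid rule

def poly_part(h):
    """The polynomial sum_{kappa=1}^{k-1} h^kappa tau1^(kappa)(0)/(kappa! kappa)
    for k = 4: the derivatives of sin at 0 are 1, 0, -1."""
    return h - h**3 / 18.0

# The remainder should be O(h^k) = O(h^4); err / h^4 stays bounded as h shrinks.
for h in (0.4, 0.2, 0.1):
    err = abs(bias_integral(h) - poly_part(h))
    print(h, err / h**4)
```

For this choice the bias integral is the sine integral $\mathrm{Si}(h) = h - h^3/18 + h^5/600 - \cdots$, so the remainder is in fact $O(h^5)$, well within the $O(h^4)$ bound the lemma guarantees.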
A.2 Proof of Theorem 2.1
Proof. First, by the definition of $\hat\tau_1^{(\kappa)}(0)$ given in (2.6), we can write
$$E\big[\hat P_h^{k-1}\big] = E\bigg[\sum_{\kappa=1}^{k-1} \frac{\theta_\kappa h^\kappa}{\kappa}\,\hat\tau_1^{(\kappa)}(0)\bigg] = \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa h^\kappa}{\kappa}\,E\big[\hat\tau_1^{(\kappa)}(0)\big] = \sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa}{\kappa h}\,\theta_\kappa\,E\bigg[B K^{(\kappa)}\Big(\frac{A}{h}\Big)\bigg]. \tag{A.1}$$
By Assumptions 2 and 4 together with the definition of $\tau_1$ given in (2.4), the last expression in (A.1) may be rewritten as
$$\sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa}{\kappa h}\,\theta_\kappa\,E\bigg[B K^{(\kappa)}\Big(\frac{A}{h}\Big)\bigg] = \sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa}{\kappa h}\,\theta_\kappa \int_0^h \tau_1(a)\,K^{(\kappa)}\Big(\frac{a}{h}\Big)\,da. \tag{A.2}$$
From the proof of Lemma 2.1, $\tau_1(a) = \tau_1(a) - \tau_1(0) = \sum_{\kappa=1}^{k-1} \frac{a^\kappa}{\kappa!}\tau_1^{(\kappa)}(0) + a^k \cdot R_k(\alpha_k(a))$ with $\alpha_k(a) \in (0, a)$, where the remainder function $R_k$, given by $R_k(a) = \frac{1}{k!}\tau_1^{(k)}(a)$, is uniformly bounded in absolute value on $[0, \eta]$ for some small $\eta > 0$ by Assumption 3 (i). Substituting this mean value expansion into the last expression in (A.2) yields
$$\sum_{\kappa=1}^{k-1} \frac{(-1)^\kappa}{\kappa h}\,\theta_\kappa \int_0^h \tau_1(a)\,K^{(\kappa)}\Big(\frac{a}{h}\Big)\,da = \sum_{\kappa_1=1}^{k-1} \frac{1}{\kappa_1!}\frac{1}{\kappa_1}\,h^{\kappa_1}\tau_1^{(\kappa_1)}(0)\,\kappa_1 \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa(-1)^\kappa}{\kappa} \int_0^1 u^{\kappa_1} K^{(\kappa)}(u)\,du \tag{A.3}$$
$$\qquad\qquad + h^k \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa(-1)^\kappa}{\kappa} \int_0^1 R_k(\alpha_k(uh))\,u^k K^{(\kappa)}(u)\,du, \tag{A.4}$$
where the last equality is due to changes of variables. The expression in line (A.3) reduces to
$$\sum_{\kappa_1=1}^{k-1} \frac{1}{\kappa_1!}\frac{1}{\kappa_1}\,h^{\kappa_1}\tau_1^{(\kappa_1)}(0)\,\kappa_1 \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa(-1)^\kappa}{\kappa} \int_0^1 u^{\kappa_1} K^{(\kappa)}(u)\,du = P_h^{k-1} \tag{A.5}$$
by the definition of $P_h^{k-1}$ and the choice of $\{\theta_\kappa\}_{\kappa=1}^{k-1}$ to satisfy (2.8). To see the asymptotic behavior of line (A.4), note that
$$\bigg|\sum_{\kappa=1}^{k-1} \frac{\theta_\kappa(-1)^\kappa}{\kappa} \int_0^1 R_k(\alpha_k(uh))\,u^k K^{(\kappa)}(u)\,du\bigg| \le \sum_{\kappa=1}^{k-1} \frac{|\theta_\kappa|}{\kappa} \int_0^1 u^k \big|K^{(\kappa)}(u)\big|\,du \sup_{a \in (0,h)} \big|R_k(\alpha_k(a))\big|,$$
where $\int_0^1 u^k |K^{(\kappa)}(u)|\,du < \infty$ for each $\kappa \in \{1, \dots, k-1\}$ by Assumption 4, and $\sup_{a \in (0,h)} |R_k(\alpha_k(a))|$ is uniformly bounded for $h \in [0, \eta]$. Therefore,
$$h^k \sum_{\kappa=1}^{k-1} \frac{\theta_\kappa(-1)^\kappa}{\kappa} \int_0^1 R_k(\alpha_k(uh))\,u^k K^{(\kappa)}(u)\,du = O(h^k) \tag{A.6}$$
as $h \to 0$. Combining the chain of equalities (A.1)–(A.6), we obtain
$$E\big[\hat P_h^{k-1}\big] = P_h^{k-1} + O(h^k) \tag{A.7}$$
as $h \to 0$. On the other hand, from Lemma 2.1, we also have
$$\beta - \beta_h = P_h^{k-1} + O(h^k) \tag{A.8}$$
as $h \to 0$. Combining (A.7) and (A.8) yields $\beta - E\big[\hat\beta_h\big] - E\big[\hat P_h^{k-1}\big] = O(h^k)$ as $h \to 0$.
A.3 Auxiliary Lemma: $L^4$ to $L^2$ Ratio

Lemma A.1 ($L^4$ to $L^2$ ratio). If Assumptions 1, 3 (ii), and 4 are satisfied, then
$$\text{(i)}\quad n^{-1/4} \cdot \frac{E[(Z_n - E[Z_n])^4]^{1/4}}{E[(Z_n - E[Z_n])^2]^{1/2}} \to 0 \qquad\text{and}\qquad \text{(ii)}\quad n^{-1/4} \cdot \frac{E\big[Z_n^4 - E[Z_n^2]^2\big]^{1/4}}{E[(Z_n - E[Z_n])^2]^{1/2}} \to 0$$
as $nh_n^2 \to \infty$.
Proof. (i) Since $Z_n = \frac{B}{A} \cdot 1\{A > h\} + \sum_{\kappa=1}^{k-1} \theta_\kappa \frac{1}{\kappa} h^{-1} (-1)^\kappa B K^{(\kappa)}\big(\frac{A}{h}\big)$, Minkowski's inequality yields
$$E[(Z_n - E[Z_n])^4]^{1/4} \le E\bigg[\Big(\frac{B}{A} \cdot 1\{A > h\} - E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]\Big)^4\bigg]^{1/4} + \sum_{\kappa=1}^{k-1} \theta_\kappa \frac{1}{\kappa} h^{-1} E\bigg[\Big(B K^{(\kappa)}\Big(\frac{A}{h}\Big) - E\Big[B K^{(\kappa)}\Big(\frac{A}{h}\Big)\Big]\Big)^4\bigg]^{1/4}. \tag{A.9}$$
Furthermore, Assumption 4 and i.i.d. sampling imply that $\frac{B}{A} \cdot 1\{A > h\}$ and $\sum_{\kappa=1}^{k-1} \theta_\kappa \frac{1}{\kappa} h^{-1} (-1)^\kappa B K^{(\kappa)}\big(\frac{A}{h}\big)$ are independent, and hence
$$E[(Z_n - E[Z_n])^2] \ge E\bigg[\Big(\frac{B}{A} \cdot 1\{A > h\} - E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]\Big)^2\bigg]. \tag{A.10}$$
Since $K^{(\kappa)}$ is bounded under Assumption 4 and $E[B^4] < \infty$ under Assumption 1 (ii), we have
$$h^{-1} E\bigg[\Big(B K^{(\kappa)}\Big(\frac{A}{h}\Big) - E\Big[B K^{(\kappa)}\Big(\frac{A}{h}\Big)\Big]\Big)^4\bigg] = O(h^{-1}) \tag{A.11}$$
for each $\kappa \in \{1, \dots, k-1\}$. By (A.9), (A.10), (A.11), $nh \to \infty$, and Lemma A.3 under Assumptions 1 (i) and (iii), it suffices to show that
$$\frac{E\Big[\big(\frac{B}{A} \cdot 1\{A > h\} - E\big[\frac{B}{A} \cdot 1\{A > h\}\big]\big)^4\Big]}{n\,E\Big[\big(\frac{B}{A} \cdot 1\{A > h\} - E\big[\frac{B}{A} \cdot 1\{A > h\}\big]\big)^2\Big]^2} = o(1).$$
We branch into two cases below. Throughout, we use the property that $\tau_j(a) = \tau_j(0) + a \cdot R_j^1(\alpha_j^1(a))$ with $\alpha_j^1(a) \in (0, a)$, where the remainder function $R_j^1$, given by $R_j^1(a) = \tau_j'(a)$, is uniformly bounded in absolute value on $[0, \eta]$ for some small $\eta > 0$ for each $j \in \{2, 3, 4\}$ under Assumption 3 (ii).
Case 1: $\tau_2(0) = 0$. By $nh^2 \to \infty$ and Lemma A.3 under Assumptions 1 (i) and (iii), it suffices to show that
$$E\bigg[\Big(\frac{B}{A} \cdot 1\{A > h\} - E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]\Big)^4\bigg] = O(h^{-2}) \tag{A.12}$$
as $h \to 0$. The following four parts together show (A.12). First,
$$E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big] = O(1)$$
as $h \to 0$ under Assumption 1 (i). Second, for $h \in (0, \eta^2)$, we can write
$$E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] = E\Big[\frac{B^2}{A^2} \cdot 1\{h < A \le h^{1/2}\}\Big] + E\Big[\frac{B^2}{A^2} \cdot 1\{A > h^{1/2}\}\Big]$$
$$\le \int_h^{h^{1/2}} \frac{\tau_2(a)}{a^2}\,da + h^{-1} E\big[B^2 \cdot 1\{A > h^{1/2}\}\big] = \int_h^{h^{1/2}} \frac{R_2^1(\alpha_2^1(a))}{a}\,da + h^{-1} E\big[B^2 \cdot 1\{A > h^{1/2}\}\big]$$
$$= \int_1^{h^{-1/2}} \frac{R_2^1(\alpha_2^1(uh))}{u}\,du + h^{-1} E\big[B^2 \cdot 1\{A > h^{1/2}\}\big] = O(\log(h^{-1/2})) + O(h^{-1}) = O(h^{-1})$$
as $h \to 0$ under Assumption 1 (ii) and $\tau_2(0) = 0$. Third, by similar lines of calculations for $h \in (0, \eta^2)$, we have
$$E\Big[\frac{B^3}{A^3} \cdot 1\{A > h\}\Big] = E\Big[\frac{B^3}{A^3} \cdot 1\{h \le A \le h^{1/2}\}\Big] + E\Big[\frac{B^3}{A^3} \cdot 1\{A > h^{1/2}\}\Big] = O(h^{-1/2}) + O(h^{-3/2}) = O(h^{-3/2})$$
as $h \to 0$ under Assumption 1 (ii). Fourth, by similar lines of calculations for $h \in (0, \eta^2)$ again, we have
$$E\Big[\frac{B^4}{A^4} \cdot 1\{A > h\}\Big] = E\Big[\frac{B^4}{A^4} \cdot 1\{h \le A \le h^{1/2}\}\Big] + E\Big[\frac{B^4}{A^4} \cdot 1\{A > h^{1/2}\}\Big] = O(h^{-1}) + O(h^{-2}) = O(h^{-2})$$
as $h \to 0$ under Assumption 1 (ii). Therefore, (A.12) holds.
Case 2: $\tau_2(0) \ne 0$. Since we let $nh^2 \to \infty$, it suffices to show
(i) that $h\,E\big[\big(\frac{B}{A} \cdot 1\{A > h\} - E\big[\frac{B}{A} \cdot 1\{A > h\}\big]\big)^2\big]$ is bounded away from zero in a neighborhood of $h = 0$; and
(ii) that $E\big[\big(\frac{B}{A} \cdot 1\{A > h\} - E\big[\frac{B}{A} \cdot 1\{A > h\}\big]\big)^4\big] = O(h^{-3})$.
First, to show (i), we make the following lines of calculations:
$$h\,E\bigg[\Big(\frac{B}{A} \cdot 1\{A > h\} - E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]\Big)^2\bigg] = h\,E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] - h\,E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2$$
$$= h \int_h^\infty \frac{\tau_2(a)}{a^2}\,da - h\,E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2 \ge h \int_h^{2h} \frac{\tau_2(a)}{a^2}\,da - h\,E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2$$
$$= \int_1^2 \frac{\tau_2(uh)}{u^2}\,du - h\,E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2 = \int_1^2 \frac{\tau_2(0) + uh\,R_2^1(\alpha_2^1(uh))}{u^2}\,du - h\,E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2$$
$$\ge \frac{1}{2}\tau_2(0) - h\bigg(\int_1^2 \big|R_2^1(\alpha_2^1(uh))\big|\,du + E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2\bigg).$$
The last expression is bounded away from zero in a neighborhood of $h = 0$ by $\tau_2(0) \ne 0$ and Assumptions 1 (i) and 3 (ii). Second, to show (ii), we note that
$$E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big] = O(1), \qquad E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] = O(h^{-2}), \qquad\text{and}\qquad E\Big[\frac{B^3}{A^3} \cdot 1\{A > h\}\Big] = O(h^{-3})$$
as $h \to 0$ under Assumption 1 (ii). For the fourth moment, similar lines of calculations to those in Case 1 yield
$$E\Big[\frac{B^4}{A^4} \cdot 1\{A > h\}\Big] = \int_h^{h^{1/2}} \frac{\tau_4(a)}{a^4}\,da + E\Big[\frac{B^4}{A^4} \cdot 1\{A > h^{1/2}\}\Big]$$
$$= \int_h^{h^{1/2}} \frac{\tau_4(0) + a\,R_4^1(\alpha_4^1(a))}{a^4}\,da + E\Big[\frac{B^4}{A^4} \cdot 1\{A > h^{1/2}\}\Big]$$
$$= \int_h^{h^{1/2}} \frac{\tau_4(0)}{a^4}\,da + \int_h^{h^{1/2}} \frac{R_4^1(\alpha_4^1(a))}{a^3}\,da + E\Big[\frac{B^4}{A^4} \cdot 1\{A > h^{1/2}\}\Big] = O(h^{-3}) + O(h^{-2}) + O(h^{-2}) = O(h^{-3})$$
as $h \to 0$ under Assumption 1 (ii). This completes the proof of part (i). The proof of part (ii) follows similarly.
A.4 Proof of Lemma 2.2
Proof. First, we obtain
$$\mathrm{Var}\bigg(\frac{E_n[(Z_n - E[Z_n])^2]}{E[(Z_n - E[Z_n])^2]}\bigg) = \frac{1}{n}\,\mathrm{Var}\bigg(\frac{(Z_n - E[Z_n])^2}{E[(Z_n - E[Z_n])^2]}\bigg) = \frac{1}{n}\bigg(\frac{E[(Z_n - E[Z_n])^4]}{E[(Z_n - E[Z_n])^2]^2} - 1\bigg) = o(1)$$
as $n \to \infty$ and $nh_n^2 \to \infty$ by i.i.d. sampling and Lemma A.1 (i) under Assumptions 1, 3 (ii), and 4. Therefore, Chebyshev's inequality yields
$$\frac{E_n[(Z_n - E[Z_n])^2]}{E[(Z_n - E[Z_n])^2]} \to_p 1 \tag{A.13}$$
as $n \to \infty$.
Second, note that the Lindeberg condition holds for the triangular array
$$\bigg\{(n, i) \mapsto n^{-1/2}\,\frac{Z_{n,i} - E[Z_n]}{E[(Z_n - E[Z_n])^2]^{1/2}} \;\bigg|\; i \in \{1, \dots, n\},\; n \in \mathbb{N}\bigg\}$$
because the Lyapunov condition,
$$\frac{n^{-\delta/2}\,E\big[|Z_n - E[Z_n]|^{2+\delta}\big]}{E\big[(Z_n - E[Z_n])^2\big]^{(2+\delta)/2}} \to 0$$
as $n \to \infty$ and $nh_n^2 \to \infty$, is satisfied in particular with $\delta = 2$ by Lemma A.1 under Assumptions 1, 3 (ii), and 4. Therefore, applying the Lindeberg–Feller theorem yields
$$n^{1/2}\,\frac{E_n[Z_n] - E[Z_n]}{E[(Z_n - E[Z_n])^2]^{1/2}} \to_d N(0, 1) \tag{A.14}$$
as $n \to \infty$.
Third, applying the continuous mapping theorem and Slutsky's theorem to (A.13) and (A.14), we obtain
$$n^{1/2}\,\frac{E_n[Z_n] - E[Z_n]}{E_n[(Z_n - E[Z_n])^2]^{1/2}} \to_d N(0, 1)$$
as $n \to \infty$. Finally, using the generic algebraic identity
$$P\bigg(n^{1/2}\,\frac{E_n[Z_n] - E[Z_n]}{E_n[(Z_n - E_n[Z_n])^2]^{1/2}} \ge x\bigg) = P\bigg(n^{1/2}\,\frac{E_n[Z_n] - E[Z_n]}{E_n[(Z_n - E[Z_n])^2]^{1/2}} \ge x \cdot \Big(1 + \frac{x^2}{n}\Big)^{-1/2}\bigg),$$
we obtain the desired result.
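The identity in the last display is algebraic rather than probabilistic: since $E_n[(Z_n - E_n[Z_n])^2] = E_n[(Z_n - E[Z_n])^2] - (E_n[Z_n] - E[Z_n])^2$, the statistic normalized by the sample-centered second moment is a strictly increasing transform of the one normalized by the population-centered second moment. A small numerical sketch with an arbitrary toy distribution (ours, not from the paper) verifies both forms of the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
mu = 2.0                                        # plays the role of E[Z_n]
z = rng.exponential(mu, size=n)                 # one simulated sample

d = z.mean() - mu                               # E_n[Z_n] - E[Z_n]
s_pop = np.sqrt(np.mean((z - mu) ** 2))         # centered at E[Z_n]
s_hat = np.sqrt(np.mean((z - z.mean()) ** 2))   # centered at E_n[Z_n]

t_hat = np.sqrt(n) * d / s_hat                  # sample-centered statistic
t_tilde = np.sqrt(n) * d / s_pop                # population-centered statistic

# Algebraic form of the identity: t_tilde = t_hat * (1 + t_hat^2 / n)^(-1/2).
print(abs(t_tilde - t_hat * (1.0 + t_hat**2 / n) ** -0.5))

# Event form used in the proof:
# {t_hat >= x}  coincides with  {t_tilde >= x * (1 + x^2/n)^(-1/2)}.
for x in (-1.0, 0.0, 1.0, 1.96):
    assert (t_hat >= x) == (t_tilde >= x * (1.0 + x**2 / n) ** -0.5)
```

Because the map $t \mapsto t(1 + t^2/n)^{-1/2}$ is strictly increasing, the two events coincide for every threshold $x$, which is exactly how the self-normalized statistic inherits the standard normal limit.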
A.5 Auxiliary Lemma: $\hat\beta_{h_n} - \beta_{h_n} \to_p 0$

Lemma A.2. If Assumptions 1 (i) and (ii) are satisfied, then $\hat\beta_{h_n} - \beta_{h_n} \to_p 0$ as $n \to \infty$ and $nh_n^2 \to \infty$.
Proof. First, note that $E[\hat\beta_{h_n}] = \beta_{h_n}$ holds by the definitions of $\hat\beta_{h_n}$ and $\beta_{h_n}$. On the other hand, the variance can be written as
$$\mathrm{Var}\big(\hat\beta_{h_n}\big) = n^{-1} E\Big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\Big] - n^{-1} E\Big[\frac{B}{A} \cdot 1\{A > h_n\}\Big]^2 \tag{A.15}$$
under i.i.d. sampling. The first term on the right-hand side of (A.15) is
$$n^{-1} E\Big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\Big] \le n^{-1} h_n^{-2} E\big[B^2 \cdot 1\{A > h_n\}\big] = O(n^{-1} h_n^{-2})$$
under Assumption 1 (ii). The second term on the right-hand side of (A.15) is
$$n^{-1} E\Big[\frac{B}{A} \cdot 1\{A > h_n\}\Big]^2 = O(n^{-1})$$
under Assumption 1 (i). This shows that $\mathrm{Var}\big(\hat\beta_{h_n}\big)$ in (A.15) is $o(1)$ as $n \to \infty$ and $nh_n^2 \to \infty$. Therefore, by the weak law of large numbers for triangular arrays, we have $\hat\beta_{h_n} - \beta_{h_n} \to_p 0$ as $n \to \infty$ and $nh_n^2 \to \infty$.
A.6 Auxiliary Lemma: Positive Variance
Lemma A.3. If Assumptions 1 (i) and (iii) are satisfied, then there exists $\varepsilon > 0$ such that
$$\mathrm{Var}\Big(\frac{B}{A} \cdot 1\{A > h\}\Big) > \varepsilon$$
holds for all $h$ in a neighborhood of $0$.
Proof. Since $\frac{B}{A}$ is integrable under Assumption 1, an application of the dominated convergence theorem yields
$$E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big] \to E\Big[\frac{B}{A}\Big]$$
as $h \to 0$. This implies that for each $\varepsilon' > 0$ there exists $h_{\varepsilon',1} > 0$ such that
$$E\Big[\frac{B}{A}\Big]^2 \ge E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2 - \frac{\varepsilon'}{3} \tag{A.16}$$
for all $h < h_{\varepsilon',1}$. Furthermore, since $E\big[\frac{B^2}{A^2} \cdot 1\{A > h\}\big]$ is non-increasing in $h$, an application of the monotone convergence theorem gives
$$E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] \to E\Big[\frac{B^2}{A^2}\Big]$$
as $h \to 0$. This implies that for each $\varepsilon' > 0$ there exists $h_{\varepsilon',2} > 0$ such that
$$E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] \ge E\Big[\frac{B^2}{A^2}\Big] - \frac{\varepsilon'}{3} \tag{A.17}$$
for all $h < h_{\varepsilon',2}$. Thus, we can bound $\mathrm{Var}\big(\frac{B}{A} \cdot 1\{A > h\}\big)$ from below as
$$\mathrm{Var}\Big(\frac{B}{A} \cdot 1\{A > h\}\Big) = E\Big[\frac{B^2}{A^2} \cdot 1\{A > h\}\Big] - E\Big[\frac{B}{A} \cdot 1\{A > h\}\Big]^2$$
$$\ge E\Big[\frac{B^2}{A^2}\Big] - \frac{\mathrm{Var}\big(\frac{B}{A}\big)}{3} - E\Big[\frac{B}{A}\Big]^2 - \frac{\mathrm{Var}\big(\frac{B}{A}\big)}{3} = \frac{1}{3}\,\mathrm{Var}\Big(\frac{B}{A}\Big),$$
where the inequality is due to (A.16) and (A.17) with the choice of $\varepsilon' = \mathrm{Var}\big(\frac{B}{A}\big) > 0$ under Assumption 1 (iii). Finally, letting $\varepsilon = \frac{1}{3}\mathrm{Var}\big(\frac{B}{A}\big) > 0$ proves the lemma.
A.7 Proof of Theorem 2.2
Proof. First, consider the ratio
$$\frac{\hat S_n}{S_n} - 1 = \frac{E_n\big[Z_n^2 - E[Z_n^2]\big]}{S_n} - \frac{\hat\beta_{h_n}^2 - \beta_{h_n}^2}{S_n}. \tag{A.18}$$
The first term on the right-hand side of (A.18) has mean
$$E\bigg[\frac{E_n\big[Z_n^2 - E[Z_n^2]\big]}{S_n}\bigg] = 0$$
and variance
$$\mathrm{Var}\bigg(\frac{E_n\big[Z_n^2 - E[Z_n^2]\big]}{S_n}\bigg) = n^{-1}\,\frac{E\big[Z_n^4 - E[Z_n^2]^2\big]}{E[(Z_n - E[Z_n])^2]^2} \to 0$$
as $n \to \infty$ and $nh_n^2 \to \infty$ by Lemma A.1 (ii) under Assumptions 1, 3 (ii), and 4. Therefore, by the weak law of large numbers for triangular arrays, we obtain
$$\frac{E_n\big[Z_n^2 - E[Z_n^2]\big]}{S_n} = o_p(1)$$
as $n \to \infty$. The second term on the right-hand side of (A.18) has the numerator
$$\hat\beta_{h_n}^2 - \beta_{h_n}^2 = o_p(1)$$
as $n \to \infty$ and $nh_n^2 \to \infty$ by Lemma A.2 under Assumptions 1 (i) and (ii). Therefore, (A.18) is
$$\frac{\hat S_n}{S_n} - 1 = o_p(1) \tag{A.19}$$
as $n \to \infty$ and $nh_n^2 \to \infty$ by Lemma A.3 under Assumptions 1 (i) and (iii). Now, combining (A.19), Lemma A.3 under Assumptions 1 (i) and (iii), and Theorem 2.1 under Assumptions 1 (i), 2, 3 (i), and 4 together implies
$$n^{1/2} \hat S_n^{-1/2}\Big(E\big[\hat\beta_{h_n}\big] + E\big[\hat P_{h_n}^{k-1}\big] - \beta\Big) = n^{1/2} S_n^{-1/2} \Big(\frac{\hat S_n}{S_n}\Big)^{-1/2}\Big(E\big[\hat\beta_{h_n}\big] + E\big[\hat P_{h_n}^{k-1}\big] - \beta\Big) = n^{1/2} S_n^{-1/2} (1 + o_p(1))^{-1/2} O(h_n^k) = o_p(1) \tag{A.20}$$
as $n \to \infty$ and $nh_n^{2k} \to 0$. Finally, combining equation (A.20) and Lemma 2.2 under Assumptions 1, 3 (ii), and 4 yields
$$n^{1/2} \hat S_n^{-1/2}\Big(\hat\beta_{h_n} + \hat P_{h_n}^{k-1} - \beta\Big) = n^{1/2} \hat S_n^{-1/2}\big(E_n[Z_n] - E[Z_n]\big) + n^{1/2} \hat S_n^{-1/2}\Big(E\big[\hat\beta_{h_n}\big] + E\big[\hat P_{h_n}^{k-1}\big] - \beta\Big) \to_d N(0, 1)$$
as $n \to \infty$ and $nh_n^{2k} \to 0$.
A.8 Proof of Theorem 2.3
Proof. Note that $\hat\beta_{h_n}$ and $\hat P_{h_n}^{k-1}$ are independent under Assumption 4. From (2.6) and (2.7), it suffices to show that
$$\mathrm{Var}\Big(\theta_\kappa \frac{h_n^\kappa}{\kappa}\,\hat\tau_1^{(\kappa)}(0)\Big)\Big/\mathrm{Var}\big(\hat\beta_{h_n}\big) = O(1)$$
as $n \to \infty$. Using i.i.d. sampling, we first rewrite the denominator and the numerator as
$$\mathrm{Var}\big(\hat\beta_{h_n}\big) = \frac{1}{n}\,\mathrm{Var}\Big(\frac{B}{A} \cdot 1\{A > h_n\}\Big) = \frac{1}{n}\bigg(E\Big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\Big] - E\Big[\frac{B}{A} \cdot 1\{A > h_n\}\Big]^2\bigg)$$
and
$$\mathrm{Var}\Big(\theta_\kappa \frac{h_n^\kappa}{\kappa}\,\hat\tau_1^{(\kappa)}(0)\Big) = \theta_\kappa^2 \frac{1}{\kappa^2} \frac{1}{n h_n^2}\,\mathrm{Var}\bigg(B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\bigg) = \theta_\kappa^2 \frac{1}{\kappa^2} \frac{1}{n h_n^2}\bigg(E\Big[B^2 K^{(\kappa)}\Big(\frac{A}{h_n}\Big)^2\Big] - E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big]^2\bigg).$$
Therefore, we obtain
$$\mathrm{Var}\Big(\theta_\kappa \frac{h_n^\kappa}{\kappa}\,\hat\tau_1^{(\kappa)}(0)\Big)\Big/\mathrm{Var}\big(\hat\beta_{h_n}\big) = c \cdot h_n^{-2} \cdot \frac{E\big[B^2 K^{(\kappa)}\big(\frac{A}{h_n}\big)^2\big] - E\big[B K^{(\kappa)}\big(\frac{A}{h_n}\big)\big]^2}{E\big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\big] - E\big[\frac{B}{A} \cdot 1\{A > h_n\}\big]^2}, \tag{A.21}$$
where $c = \theta_\kappa^2/\kappa^2 \in (0, \infty)$. To analyze the asymptotic order of the right-hand side of the above equation, we now branch into two cases.
Case 1: $\tau_2(0) = 0$. First, note that
$$\frac{1}{h_n} E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big] = \frac{1}{h_n} \int_0^{h_n} \tau_1(a)\,K^{(\kappa)}\Big(\frac{a}{h_n}\Big)\,da = \int_0^1 \tau_1(u h_n)\,K^{(\kappa)}(u)\,du = O(1)$$
as $n \to \infty$ under Assumptions 1 (i), 2, 3 (i) and 4. From Assumption 3 (ii),
$$\tau_2(a) = \tau_2(0) + a \cdot R_2^1(\alpha_2^1(a))$$
with $\alpha_2^1(a) \in (0, a)$, where the remainder function $R_2^1$, given by $R_2^1(a) = \tau_2'(a)$, is uniformly bounded in absolute value on $[0, \eta]$ for some small $\eta > 0$. Therefore, using $\tau_2(0) = 0$ yields
$$\frac{1}{h_n^2} E\Big[B^2 K^{(\kappa)}\Big(\frac{A}{h_n}\Big)^2\Big] = \frac{1}{h_n^2} \int_0^{h_n} \tau_2(a)\,K^{(\kappa)}\Big(\frac{a}{h_n}\Big)^2\,da = \frac{1}{h_n} \int_0^1 \tau_2(u h_n)\,K^{(\kappa)}(u)^2\,du = \int_0^1 u \cdot R_2^1(\alpha_2^1(u h_n))\,K^{(\kappa)}(u)^2\,du = O(1)$$
as $n \to \infty$ under Assumptions 1 (i), 2 and 4. This shows that
$$h_n^{-2}\bigg(E\Big[B^2 K^{(\kappa)}\Big(\frac{A}{h_n}\Big)^2\Big] - E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big]^2\bigg) = O(1)$$
as $n \to \infty$. By Lemma A.3 under Assumptions 1 (i) and (iii), therefore, the variance ratio (A.21) is $O(1)$ as $n \to \infty$.
Case 2: $\tau_2(0) > 0$. Since Assumptions 1 (i), 2, 3 (i) and 4 yield
$$\frac{1}{h_n} E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big] = O(1)$$
as $n \to \infty$, as argued above, we have
$$\frac{1}{h_n} E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big]^2 = o(1).$$
Furthermore, Assumption 3 (ii) provides
$$\frac{1}{h_n} E\Big[B^2 K^{(\kappa)}\Big(\frac{A}{h_n}\Big)^2\Big] = \frac{1}{h_n} \int_0^{h_n} \tau_2(a)\,K^{(\kappa)}\Big(\frac{a}{h_n}\Big)^2\,da = \int_0^1 \tau_2(u h_n)\,K^{(\kappa)}(u)^2\,du$$
$$= \tau_2(0) \int_0^1 K^{(\kappa)}(u)^2\,du + h_n \int_0^1 u \cdot R_2^1(\alpha_2^1(u h_n))\,K^{(\kappa)}(u)^2\,du = O(1)$$
as $n \to \infty$ under Assumptions 1 (i), 2 and 4. Combining the above two equations, we obtain
$$\frac{1}{h_n} E\Big[B^2 K^{(\kappa)}\Big(\frac{A}{h_n}\Big)^2\Big] - \frac{1}{h_n} E\Big[B K^{(\kappa)}\Big(\frac{A}{h_n}\Big)\Big]^2 = O(1) \tag{A.22}$$
as $n \to \infty$. Note that Assumption 1 (i) yields
$$h_n E\Big[\frac{B}{A} \cdot 1\{A > h_n\}\Big]^2 = o(1)$$
as $n \to \infty$. Assumption 3 (ii) provides
$$h_n E\Big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\Big] = h_n \int_{h_n}^\infty \tau_2(a)\,\frac{1}{a^2}\,da = \int_1^\infty \tau_2(u h_n)\,\frac{1}{u^2}\,du \ge \int_1^2 \tau_2(u h_n)\,\frac{1}{u^2}\,du$$
$$\ge \int_1^2 \tau_2(u h_n)\,\frac{1}{4}\,du = \frac{1}{4} \int_1^2 \tau_2(u h_n)\,du = \frac{1}{4} \int_1^2 \big(\tau_2(0) + u h_n R_2^1(\alpha_2^1(u h_n))\big)\,du$$
$$= \frac{1}{4}\tau_2(0) + h_n \frac{1}{4} \int_1^2 u R_2^1(\alpha_2^1(u h_n))\,du = \frac{1}{4}\tau_2(0) + o(1)$$
as $n \to \infty$. Combining the above two equations, we conclude that
$$h_n E\Big[\frac{B^2}{A^2} \cdot 1\{A > h_n\}\Big] - h_n E\Big[\frac{B}{A} \cdot 1\{A > h_n\}\Big]^2 = \frac{1}{4}\tau_2(0) + o(1) \ge \frac{1}{8}\tau_2(0) > 0 \tag{A.23}$$
for all sufficiently large $n$. By (A.22) and (A.23), therefore, the variance ratio (A.21) is $O(1)$ as $n \to \infty$.
References
Calonico, S., Cattaneo, M.D., and Farrell, M.H. (2017). On the effect of bias estimation on
coverage accuracy in nonparametric inference. J. Amer. Stat. Assoc. Forthcoming.
Calonico, S., Cattaneo, M.D., and Titiunik, R. (2014). Robust nonparametric confidence intervals for regression discontinuity designs. Econometrica 82 2295–2326.
Cattaneo, M.D., Crump, R.K., and Jansson, M. (2010). Robust data-driven inference for density-
weighted average derivatives. J. Amer. Stat. Assoc. 105 1070–1083.
Cattaneo, M.D., Crump, R.K., and Jansson, M. (2014). Small bandwidth asymptotics for density-weighted average derivatives. Econometr. Theor. 30 176–200.
Chaudhuri, S. and Hill, J.B. (2016). Heavy tail robust estimation and inference for average treat-
ment effects. Working Paper.
Crump, R.K., Hotz, V.J., Imbens, G.W., and Mitnik, O.A. (2009). Dealing with limited overlap in estimation of average treatment effects. Biometrika 96 187–199.

Csörgő, S., Haeusler, E., and Mason, D.M. (1988). The asymptotic distribution of trimmed sums. Ann. Probab. 16 672–699.
Fan, J. and Gijbels, I. (1996). Local Polynomial Modelling and Its Applications. Chapman & Hall/CRC.
Graham, B.S. and Powell, J.L. (2012). Identification and estimation of average partial effects in
“irregular” correlated random coefficient panel data models. Econometrica 80 2105–2152.
Griffin, P.S. and Pruitt, W.E. (1987). The central limit problem for trimmed sums. Math. Proc.
Camb. Phil. Soc. 102 329–349.
Griffin, P.S. and Pruitt, W.E. (1989). Asymptotic normality and subsequential limits of trimmed
sums. Ann. Probab. 17 1186–1219.
Horvitz, D.G. and Thompson, D.J. (1952). A generalization of sampling without replacement
from a finite universe. J. Amer. Stat. Assoc. 47 663–685.
Khan, S. and Tamer, E. (2010). Irregular identification, support conditions, and inverse weight
estimation. Econometrica 78 2021–2042.
Lewbel, A. (1997). Semiparametric estimation of location and other discrete choice moments.
Econometr. Theor. 13 32–51.
Peña, V.H., Lai, T.L., and Shao, Q.-M. (2009). Self-Normalized Processes: Limit Theory and Statistical Applications. Springer.
Peng, L. (2001). Estimating the mean of a heavy tailed distribution. Stat. Probab. Lett. 52 255–
264.
Peng, L. (2004). Empirical-likelihood-based confidence interval for the mean with a heavy-tailed
distribution. Ann. Stat. 32 1192–1214.
Rajan, R.G. and Zingales, L. (1998). Financial dependence and growth. Am. Econ. Rev. 88 559–
586.
Romano, J.P. and Wolf, M. (2000). Subsampling inference for the mean in the heavy-tailed case.
Metrika 50 55–69.
Ullah, A. and Vinod, H.D. (1993). General nonparametric regression estimation and testing in econometrics. In Handbook of Statistics, G.S. Maddala, C.R. Rao, and H.D. Vinod, eds. 11 85–116.