Empirical Likelihood Inference Under Stratified Sampling
by
Bob (Chongxin) Zhong
A thesis submitted to
the Faculty of Graduate Studies and Research
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
Department of Mathematics and Statistics
Carleton University
Ottawa, Ontario, Canada
December 1997
© Copyright 1997
Bob (Chongxin) Zhong
National Library of Canada / Bibliothèque nationale du Canada
Acquisitions and Bibliographic Services
395 Wellington Street, Ottawa ON K1A 0N4, Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Abstract
This dissertation consists of seven chapters. The first chapter is an introduction
to empirical likelihood inference, the last chapter gives conclusions and further
research areas, and the remaining five chapters deal with specific topics related to
the empirical likelihood method. Stratified random sampling is discussed in chapters 2
to 5.
In chapter 2, empirical likelihood under stratified random sampling is discussed
for finite population inference problems. Using population auxiliary information,
such as the overall total, this method is shown to lead to more efficient estimators.
Some limited simulation results are also given. Empirical likelihood ratio tests are
also discussed.
In chapter 3, empirical likelihood and stratified sampling are combined with
estimating equations to make nonparametric inferences. The results show that the
empirical likelihood for the parameters of interest has properties similar to those of
a parametric likelihood. These results cover empirical likelihood ratio estimation
and testing. Parameters subject to constraints are also studied. Illustrations of the
methods are provided.
In chapter 4, a pseudo empirical likelihood method is developed in order to handle
complex survey sampling designs. Asymptotic properties of the resulting estimators are
discussed, and pseudo empirical likelihood ratio testing is included. Illustrations of
the methods are provided.
In chapter 5, we apply the empirical likelihood technique to propose a new class
of M-estimators in the presence of auxiliary information under a nonparametric
setting, using stratified random sampling. It is shown that the proposed M-estimators
are consistent and asymptotically normally distributed, with smaller asymptotic
variances than those of the usual M-estimators.
In chapter 6, we consider the following scenario: several different imperfect
instruments and one perfect instrument are used independently to measure a
characteristic of interest of a target population. We wish to combine the information
from these independent samples to make statistical inference on parameters of
interest, such as the population mean and the population distribution function. We develop
empirical likelihood estimators and empirical likelihood ratio tests and study their
asymptotic properties.
Finally, some suggestions for further research are presented in chapter 7.
Acknowledgements
First of all, I would like to express my sincere and deeply grateful thanks to
my thesis supervisor, Professor J. N. K. Rao, for his supervision and encouragement
during the course of this work, as well as for his kindness and support during the
past four years.
My thanks also go to Prof. Shisong Mao, my M.Sc. supervisor, and to Prof.
J. Shao. Their support helped to make it possible for me to come to Carleton
University.
My sincere thanks are also extended to Professor M. Csorgo for his helpful
discussions and suggestions.
This research was partially supported by the Chand and Ratna Devi Marwah Memorial
Scholarship and NSERC research grants of Professor J. N. K. Rao.
Finally, special thanks go to my wife Ci Jiang for her persistent support and to
my parents for their understanding.
Contents
Acceptance Sheet
Abstract
Acknowledgements
Contents
List of Tables
1 Introduction
1.1 Empirical likelihood and asymptotic setup
1.1.1 Empirical likelihood
1.1.2 Asymptotic setup
1.2 Outline of Chapters 2 to 4
1.2.1 Introduction to Chapter 2
1.2.2 EL and estimating equations
1.2.3 Pseudo EL and estimating equations
1.3 EL and M-estimation
1.4 EL in the Presence of Measurement Error
2 EL for Finite Populations
2.1 Introduction
2.2 Maximum Empirical Likelihood Estimator
2.2.1 Numerical Evaluation for MELE
2.2.2 Asymptotic Results
2.2.3 Variance Estimation
2.3 Estimation of Small Area Means
2.4 Empirical Likelihood Ratio Test
2.5 Simulation study
2.6 Proofs
3 EL and GEE
3.1 Introduction
3.2 MELR Estimators and Their Properties
3.2.1 MELR estimators
3.2.2 Numerical method for given θ
3.2.3 Asymptotic Results
3.3 MELR Estimators and Testing with Constraints
3.4 Examples
3.5 Proofs
4 Pseudo EL and GEE
4.1 Introduction
4.2 MPELR Estimators
4.3 Asymptotic Results
4.4 An example
4.5 Proofs
5 M-estimation and EL
5.1 Introduction
5.2 Profile empirical likelihood
5.3 M-estimators
5.4 Proofs
6 EL in the presence of measurement error
6.1 Introduction
6.2 Empirical likelihood method in the presence of measurement error
6.3 Asymptotic properties of MELRE
6.4 Likelihood Ratio Tests
6.5 Examples
6.6 Proofs
7 Conclusions and Further Research
7.1 Conclusions
7.2 Further Research
References
List of Tables
2.1 Parameter settings for generated finite populations
2.2 Ratios of MSE's of $\hat{\bar Y}$, $\hat{\bar Y}_{GREG}$, $\bar y_{st}$ to $\bar y_{opt}$
2.3 Ratios of MSE's of $\hat F_y(y)$ to $\hat F_{opt}(y)$ and $F_{st}(y)$
2.4 Empirical Error Rate (%) for MELRT
2.5 Empirical Error Rate (%) for ANT
Chapter 1
Introduction
Likelihood and estimating equations provide common approaches to parametric
inference. Recently, these approaches have been shown to be useful in nonparametric
contexts.
Likelihood in nonparametric contexts, called empirical likelihood (described in
Section 1.1 below), was recently introduced by Owen (1988, 1990, 1991). Owen has
shown that empirical likelihood ratio statistics have limiting chi-square distributions
in certain situations, and has shown how to obtain tests and confidence limits
for parameters expressed as functionals $\theta(F)$ of an unknown distribution function
$F$. Other asymptotic properties, and the possibility of correcting likelihood ratio
statistics or their signed roots, have been studied by DiCiccio and Romano (1989),
Hall (1990), DiCiccio, Hall and Romano (1989, 1991) and others.
Estimating equations are another important method in statistics. Qin and Lawless
(1994, 1995) showed how to link empirical likelihood and estimating equations,
and showed that empirical likelihoods lead to asymptotic results similar to those
under parametric likelihoods.
In fact, Hartley and Rao (1968) gave the original idea of empirical likelihood
in the sample survey context as early as 1968. They called the method the "scale-load"
approach.
Most of the above discussions deal with simple random sampling. Since simple
random sampling is rarely used in practice, for both practical and theoretical
reasons, we will consider the case of stratified random sampling. The sampling
scheme in each stratum may be simple random sampling or a more complex sampling
method.
In this chapter, we first describe the empirical likelihood (EL) and an asymptotic
setup in Section 1.1. Section 1.2 briefly describes the problems discussed in
chapters 2 to 4 and our main results. Section 1.3 covers the problems discussed in
Chapter 5. Section 1.4 describes the problems studied in Chapter 6.
1.1 Empirical likelihood and asymptotic setup
1.1.1 Empirical likelihood
Suppose that a target population is divided into $H$ strata with known weights $W_h$
for all strata $h$, $\sum_h W_h = 1$. In stratum $h$ there are $N_h$ units with values $x_{hi}$ ($i = 1,
\ldots, N_h$; $h = 1, \ldots, H$), where $x_{hi}$ is a $d$-dimensional variable. Denote the $h$th stratum
population mean, median and distribution function by $\bar X_h$, $m_{xh}$ and
\[ F_{hN_h}(x) = \frac{1}{N_h} \sum_{i=1}^{N_h} 1(x_{hi} \le x) \]
respectively, where $1(x_{hi} \le x) = (1(x_{hi,1} \le x_1), \ldots, 1(x_{hi,d} \le x_d))^\tau$, $x_{hi,j}$ and $x_j$ are the
$j$-th components of the vectors $x_{hi}$ and $x$ respectively, and $1(x_{hi,j} \le x_j)$ is the indicator of the set
$\{x_{hi,j} \le x_j\}$. Also, let $\bar X$, $m_x$ and $F_x(x) = \sum_{h=1}^{H} W_h F_{hN_h}(x)$ be the mean, median
and distribution function of the target population respectively, where $N = \sum_{h=1}^{H} N_h$
and $\bar X = \sum_{h=1}^{H} W_h \bar X_h$.
Suppose that $x_{h1}, \ldots, x_{hn_h}$ is a simple random sample (SRS) without replacement
from stratum $h$ with distribution function $F_{hN_h}$ for all $h$, and that the samples are
selected independently across the strata. As argued by Chen and Qin (1993), the
empirical likelihood for the above sampling scheme can be approximated by
\[ L = \prod_{h=1}^{H} \prod_{i=1}^{n_h} p_{hi} \tag{1.1} \]
if $n_h \ll N_h$ and $N_h$ is large, where $p_{hi} = \Pr(x_h = x_{hi})$ and $x_h$ has the distribution
function $F_{hN_h}$.
We will call (1.1) the empirical likelihood. Maximizing (1.1) subject to
additional conditions that reflect the available auxiliary information gives more
efficient estimators of the parameters of interest, called maximum empirical likelihood
estimators, or MELE.
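As a quick check on what the empirical likelihood $\prod_h \prod_i p_{hi}$ implies when no auxiliary information is used, the following standard Lagrange-multiplier calculation may help. It is added here for illustration only; the multipliers $\lambda_h$ are notation introduced for this sketch.

```latex
\begin{align*}
&\text{maximize } \sum_{h=1}^{H}\sum_{i=1}^{n_h} \log p_{hi}
 \quad\text{subject to}\quad \sum_{i=1}^{n_h} p_{hi} = 1,\ h = 1,\ldots,H, \\
&\frac{\partial}{\partial p_{hi}}\Big[\sum_{h}\sum_{i}\log p_{hi}
 - \sum_{h}\lambda_h\Big(\sum_{i} p_{hi} - 1\Big)\Big]
 = \frac{1}{p_{hi}} - \lambda_h = 0
 \;\Longrightarrow\; p_{hi} = \frac{1}{\lambda_h}, \\
&\sum_{i=1}^{n_h} p_{hi} = 1 \;\Longrightarrow\; \lambda_h = n_h,
 \qquad \hat p_{hi} = \frac{1}{n_h}.
\end{align*}
```

Without constraints, the maximizer therefore puts mass $1/n_h$ on each sampled unit of stratum $h$ and reproduces the usual stratified estimators such as $\sum_h W_h \bar x_h$. The auxiliary constraints are what move the weights away from $1/n_h$.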
1.1.2 Asymptotic setup
It is difficult to study the finite-sample properties of the MELE theoretically. Hence,
we study their asymptotic properties. The finite-sample properties of the MELE will be
studied through limited simulations. Let $\{P_\nu, \nu = 1, 2, \ldots\}$ be a sequence of finite
populations. Throughout the paper $\nu$ is used as the index of the finite population.
The asymptotic setup here is that both the sample size $n_h$ and the stratum size $N_h$
for each $h$ go to infinity as $\nu \to \infty$, and $F_{hN_h}$ converges to a distribution function $F_h$, but
we will suppress the index $\nu$ for convenience. We can regard the population of
stratum $h$ as an SRS from a superpopulation $F_h$, hence $F_{hN_h} \to F_h$. We also
suppose that $n/n_h \to k_h > 0$, $h = 1, \ldots, H$, where $n = \sum_{h=1}^{H} n_h$.
1.2 Outline of Chapters 2 to 4
In this section, we present an outline of Chapters 2 to 4. Chapters 2 to 4
all deal with stratified sampling. However, in Chapters 2 and 3 we use simple random
sampling in each stratum, while in Chapter 4 we permit more complex sampling in
each stratum. Samples are always taken independently across strata.
1.2.1 Introduction to Chapter 2
Data set and auxiliary information
In order to be consistent with the convention of using $x$ as the auxiliary variable
and $y$ as the response variable, we change the notation for the population
characteristic to $z = (x^\tau, y^\tau)^\tau$, where $x$ is the auxiliary variable of dimension $d - p$ and $y$ is the
response variable of dimension $p$.
We suppose that samples are taken in each stratum by simple random sampling
without replacement. Samples are denoted by
\[ z_{h1}, \ldots, z_{hn_h}, \quad h = 1, \ldots, H. \]
Suppose some auxiliary information about $x$ is available, such as the population mean
$\bar X$, or the median $m_x$ when $x$ is a scalar. In the following we suppose that only the overall mean
$\bar X$ is known from external sources such as demographic projections.
MEL estimators
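The information $\bar X$ can be used to adjust our estimators of $p_{11}, \ldots, p_{Hn_H}$, that
is, to force these estimators to satisfy
\[ \sum_h W_h \sum_i p_{hi} x_{hi} = \bar X. \tag{2.1} \]
Maximizing (1.1) subject to condition (2.1) and $\sum_i p_{hi} = 1$, $p_{hi} > 0$, gives estimators
$\hat p_{hi}$. We then estimate the population mean $\bar Y$ by the maximum empirical likelihood (MEL) estimator
\[ \hat{\bar Y} = \sum_h W_h \sum_i \hat p_{hi} y_{hi}, \]
and the MEL estimator of the distribution function $F_y(y)$ is
\[ \hat F_y(y) = \sum_h W_h \sum_i \hat p_{hi} 1(y_{hi} \le y). \]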
Asymptotic properties of MELE
Under appropriate conditions, we can prove that the estimator $\hat{\bar Y}$ has the same
asymptotic variance as the optimal regression estimator
\[ \bar y_{opt} = \bar y_{st} + \hat B_o^\tau (\bar X - \bar x_{st}), \]
where $\hat B_o$ is given by Rao and Liu (see Chapter 2, Section 2). Further, $\hat F_y(y)$ is
asymptotically more efficient than $F_{st}(y) = \sum_h W_h F_{hn_h}(y)$, the customary cumulative
distribution function. It is possible to construct an optimal regression estimator
of $F_y(y)$ by changing $y_{hi}$ to the indicator variable $1(y_{hi} \le y)$ in the formula for $\bar y_{opt}$
(Rao and Liu, 1992). This estimator, called $\hat F_{opt}(y)$, is asymptotically equivalent to
$\hat F_y(y)$ but, unlike the MEL estimator $\hat F_y(y)$, it may not be monotone.
Our simulation studies show that $\hat{\bar Y}$ outperforms the GREG estimator, especially
for populations with widely varying intercept terms across strata. It is also considerably
more efficient than the stratified mean $\bar y_{st}$, except in the case of populations with
a weak relationship between $y$ and $x$. As expected, $\hat{\bar Y}$ and the optimal estimator
are equally efficient. However, the MEL estimator $\hat F_y(y)$ outperforms the empirical
distribution function $F_{st}(y)$. Although $\hat F_y(y)$ and $\hat F_{opt}(y)$ are asymptotically
equivalent, our simulation results indicate that $\hat F_y(y)$ is often significantly more efficient
than $\hat F_{opt}(y)$.
Empirical likelihood ratio test
Since (1.1) is defined as the empirical likelihood, we can conduct an empirical likelihood
ratio test (ELRT) on the parameter $\bar Y$. Let $\theta = (\theta_1^\tau, \theta_2^\tau)^\tau$, $\theta_1 = \bar X$ and $\theta_2 = \bar Y$; then
$E(z) = \sum_h W_h E(z_h) = \theta$ and we define the profile empirical likelihood ratio function
Maximizing $R_E(\theta)$ will give an estimator $\hat\theta_2$ of the parameter $\theta_2$, hence we may use
as the empirical likelihood ratio statistic, where $\theta_0 = (\bar X^\tau, \bar Y_0^\tau)^\tau$ and $\bar Y_0$ is specified.
Under appropriate conditions, we get
under the hypothesis $H_0: \bar Y = \bar Y_0$.
1.2.2 EL and estimating equations
In Chapter 3, we combine EL, estimating equations and stratified random sampling.
The auxiliary information here is in the form of $r \ge p$ functionally independent
unbiased estimating functions, that is, functions $g_j(x, \theta)$, $j = 1, \ldots, r$, such
that $E_F\{g_j(x, \theta)\} = 0$. In vector form, we have
\[ g(x, \theta) = (g_1(x, \theta), \ldots, g_r(x, \theta))^\tau, \]
where $g(x, \theta)$ satisfies
\[ E_F\{g(x, \theta)\} = 0. \tag{2.3} \]
The basic idea is to maximize the empirical likelihood subject to the constraints provided
by (2.3). Since we need to consider empirical likelihood ratio tests, we first define the
empirical likelihood ratio function and then obtain the empirical likelihood ratio estimator
of the parameter of interest, $\theta$. The estimator of $\theta$ obtained by this method is called
the MELR estimator.
MELR estimators
Since the empirical likelihood function is
\[ L(F) = \prod_{h=1}^{H} \prod_{i=1}^{n_h} p_{hi} \tag{2.4} \]
and (2.4) is maximized by the empirical distribution function $F_n(x) = \sum_h W_h F_{n_h}(x)$,
where $n = \sum_h n_h$ and $F_{n_h}(x) = n_h^{-1} \sum_i 1(x_{hi} \le x)$, the empirical likelihood ratio is then
defined as $R(F) = L(F)/L(F_n)$, which reduces to
\[ R(F) = \prod_{h=1}^{H} \prod_{i=1}^{n_h} n_h p_{hi}. \]
Since we are interested in estimating the parameter $\theta$, and since we know the
estimating equation (2.3), we define the empirical likelihood ratio function
\[ R_E(\theta) = \sup\Big\{ \prod_h \prod_i n_h p_{hi} : p_{hi} \ge 0,\ \sum_i p_{hi} = 1,\ \sum_h W_h \sum_i p_{hi}\, g_{hi}(\theta) = 0 \Big\}, \]
where $g_{hi}(\theta) = g(x_{hi}, \theta)$ for all $h$ and $i$. We may maximize $R_E(\theta)$ to obtain the
MELR estimate $\tilde\theta$ of the parameter $\theta$. In addition, this yields estimates $\tilde p_{hi}$ of the
$p_{hi}$, and an estimate of the distribution function $F$ as
\[ \tilde F_n(x) = \sum_h W_h \sum_i \tilde p_{hi}\, 1(x_{hi} \le x). \]
The asymptotic properties of $\tilde\theta$ and $\tilde F_n(x)$ have also been studied. The results are
The terms involved in the above asymptotic distributions are given in Chapter 3.
From the asymptotic variance of $\sqrt{n}\,\tilde\theta$, we can see that it is of the usual sandwich
form. The matrix $U$ is non-negative definite, so that $\tilde F_n(x)$ is at least as efficient as
$F_n(x)$, the sample cumulative distribution function.
MELR testing with constraints
The empirical likelihood methods can be extended to deal with the case in which
there are constraints on the parameters, and even to testing these constraints.
Suppose that $q$-dimensional constraints on $\theta$ are of the form
\[ r(\theta) = 0, \]
where $r(\theta)$ is a $q \times 1$ vector ($q \le p$) and the $q \times p$ matrix $R(\theta) = \partial r(\theta)/\partial\theta^\tau$ is of full rank $q$. We
first consider how to estimate $\theta$. This can be done by defining
where $v$ is a $q \times 1$ vector of Lagrange multipliers, and then differentiating $G_2$ with
respect to $\theta$ and $v$ to get the estimator of $\theta$, denoted by $\tilde\theta_r$.
The problem of testing $H_0: r(\theta) = 0$ can then be handled by any of the
three most popular likelihood-based methods: the likelihood ratio test, the Lagrange
multiplier test and the Wald test. Here, of course, they are based on the empirical
likelihood. We can prove that these methods are asymptotically equivalent. The
conclusions in this chapter show that the parametric likelihood and the nonparametric
likelihood have similar properties.
1.2.3 Pseudo EL and estimating equations
Sampling Design
Our sampling design in this chapter is stratified sampling, where samples are
taken from each stratum according to some specific design, such as simple random
sampling (SRS) or probability proportional to size (PPS) sampling, and samples are
independent across strata.
Pseudo EL
Suppose $P_\nu$, with $d$-dimensional characteristic $x_\nu$, has the unknown distribution
$F_\nu$ and a $p$-dimensional parameter $\theta$ associated with it, and the $h$th stratum of $P_\nu$
has $d$-dimensional characteristic $x_{\nu h}$ with unknown distribution $F_{\nu h}$.
Suppose $x_{h1}, \ldots, x_{hN_h}$, the values of the units in the $h$th stratum of $P_\nu$, form a random sample
from a superpopulation, say $F_h$. If the entire population $P_\nu$ were available, the
corresponding likelihood function would be
\[ L(F) = \prod_{h=1}^{H} \prod_{i=1}^{N_h} p_{hi}, \]
where $p_{hi} = F_{\nu h}(x_{hi}) - F_{\nu h}(x_{hi}-)$. The log-likelihood function then is
where $N = \sum_{h=1}^{H} N_h$. But we only have a sample, say $s$, from the finite population,
and if we view (2.9) as a finite population total, then we may have available a
design-unbiased estimator of $l(F)$, namely,
and $E$ refers to expectation with respect to the sampling design. Obviously we
have $E(\hat l(F)) = l(F)$. We will call (2.10) the "pseudo empirical likelihood".
Using the pseudo empirical likelihood (2.10), we can obtain a maximum pseudo
empirical likelihood ratio estimator for the parameter of interest. These estimators
will be called MPELR estimators.
MPELR estimators
Since $\sum_i p_{hi} = 1$, we know (2.10) is maximized by $F_n(x) = \sum_h W_h F_{n_h}(x)$, where
$n = \sum_h n_h$, $F_{n_h}(x) = \sum_{i \in s_h} \tilde d_{hi}(s)\, 1(x_{hi} \le x)$ and $\tilde d_{hi}(s) = d_{hi}(s) / \sum_{i \in s_h} d_{hi}(s)$. The
log-empirical likelihood ratio is then defined as
We also assume that information about $\theta$ and $F$ is available in the form of
$r \ge p$ functionally independent unbiased estimating functions, that is, functions
$g_j(x, \theta)$, $j = 1, \ldots, r$, such that $E_F\{g_j(x, \theta)\} = 0$. In vector form, we have
\[ g(x, \theta) = (g_1(x, \theta), \ldots, g_r(x, \theta))^\tau, \]
where $g(x, \theta)$ satisfies
\[ E_F\{g(x, \theta)\} = 0. \]
Now we define the (negative) log-pseudo empirical likelihood ratio function of $\theta$ by
where $g_{hi}(\theta) = g(x_{hi}, \theta)$ for all $h$ and $i$.
We may minimize $l_E(\theta)$ to obtain an estimator $\tilde\theta$ of the parameter $\theta$ (called the
MPELR estimator). In addition, this yields estimators $\tilde p_{hi}$, and therefore an
estimator of the distribution function $F$ as
\[ \tilde F_n(x) = \sum_h W_h \sum_{i \in s_h} \tilde p_{hi}\, 1(x_{hi} \le x). \tag{2.14} \]
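The design-weighted empirical distribution function $F_n(x) = \sum_h W_h \sum_{i \in s_h} \tilde d_{hi}(s)\, 1(x_{hi} \le x)$ used in this subsection can be sketched in a few lines. This is only an illustration under stated assumptions: scalar $x$, made-up data, and a hypothetical function name `pseudo_edf` that does not come from the thesis.

```python
from typing import Callable, List, Sequence


def pseudo_edf(samples: Sequence[Sequence[float]],
               design_weights: Sequence[Sequence[float]],
               W: Sequence[float]) -> Callable[[float], float]:
    """Return F_n with F_n(x) = sum_h W_h sum_i dtilde_hi 1(x_hi <= x)."""
    # Normalize the design weights within each stratum so they sum to one.
    normalized: List[List[float]] = [
        [d / sum(ds) for d in ds] for ds in design_weights
    ]

    def F(x: float) -> float:
        return sum(
            w * sum(dt for dt, xi in zip(dts, xs) if xi <= x)
            for xs, dts, w in zip(samples, normalized, W)
        )

    return F


# Made-up data: two strata, unequal design weights in the first stratum.
samples = [[1.0, 2.0], [3.0, 4.0]]
design_weights = [[1.0, 3.0], [2.0, 2.0]]
W = [0.5, 0.5]
F_n = pseudo_edf(samples, design_weights, W)
print(F_n(1.0), F_n(2.0), F_n(3.0), F_n(4.0))
```

With equal design weights within each stratum (as under stratified SRS), the normalized weights reduce to $1/n_h$ and $F_n$ becomes the ordinary stratified empirical distribution function.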
Asymptotic Results
Some conditions are needed to study the existence of solutions and the asymptotic
properties of MPELR estimators. We assume that $n \to \infty$, $n_h/n \to k_h > 0$
and
Max 1 ibIûzidh;(s) l<hCH. l s i<nh dhi(s) = O( ;) 7 ~ i ~ ~ d ~ ~ ( ~ ) = 0(1)
as $\nu \to \infty$. The assumption (2.15) means that no survey weight is disproportionately
large. Due to some technical difficulties, only stratified SRS and PPS sampling with replacement
are discussed.
Since our strata weights $W_h$ are known and samples from different strata are
independent, we should also construct the weights to satisfy
Under the conditions above and some other conditions, we establish asymptotic
normality of the estimators of both the parameter $\theta$ and the distribution function $F$.
The pseudo empirical likelihood ratio test (PELRT) is also discussed. Here the asymptotic
distribution of the PELRT statistic is that of a weighted sum of $\chi^2_1$ variables. The reason is
that the pseudo-EL is used as the likelihood. Methods of approximating the asymptotic
distribution are also discussed.
1.3 EL and M-estimation
M-estimation plays an important role in robust parametric inference and in
nonparametric inference. We consider a new class of M-estimators here. Suppose that
some auxiliary information in the form of general estimating equations is available.
We can use this information to improve our estimator of the distribution function, and then
replace $F$ in the M-functional $\theta(F)$ by this improved estimator of the distribution function
to get an estimator of $\theta$. Since this estimator combines the auxiliary
information with the M-function, it will be more efficient than the usual M-estimator.
Auxiliary information
We assume that auxiliary information is in the form of $r$ ($\ge 1$) functions $g_1(x),
\ldots, g_r(x)$ such that
\[ E_F\, g(x) = \sum_h W_h E_h\, g(x) = 0, \tag{3.1} \]
where $g(x) = (g_1(x), \ldots, g_r(x))^\tau$ and $E_h$ denotes expectation with respect to the
distribution function of stratum $h$. Here $g(x)$ does not contain any unknown
parameters, and hence the problem differs from those discussed in chapters 3 and 4.
M-estimation
An M-functional $\theta(F)$ associated with the distribution function $F$ is defined as a
root $\theta_0$ of the equation
\[ \int \psi(x, \theta_0)\, dF(x) = 0, \]
where $x \sim F$ and $x, \theta_0, \psi(x, \theta_0) \in R$. For a stratified sample $x_{11}, \ldots, x_{1n_1}; \ldots; x_{H1},
\ldots, x_{Hn_H}$ from $F(\cdot)$, the usual estimator of $\theta_0$ is given by $\hat\theta_n = \theta(F_n)$, where $F_n(\cdot)$ is
the natural stratified empirical distribution function, i.e., $F_n(x) = \sum_h \frac{W_h}{n_h} \sum_i 1(x_{hi} \le x)$.
$\theta(F_n)$ is called the M-estimator corresponding to $\psi$.
EL and M-estimation
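The method of combining EL and M-estimation can be divided into two steps.
First, we use the auxiliary information (3.1) to estimate $F(x)$. That is, we define the EL as
before as
\[ L = \prod_h \prod_i p_{hi}, \]
where $p_{hi} = \Pr(x_h = x_{hi})$. Under condition (3.1), i.e.,
\[ \sum_h W_h \sum_i p_{hi}\, g(x_{hi}) = 0 \]
and
\[ \sum_i p_{hi} = 1, \quad p_{hi} \ge 0, \]
we maximize $L$ to get estimators of the $p_{hi}$:
where
$t$ is the solution to
Now let
\[ F^*(x) = \sum_h W_h \sum_i \hat p_{hi}\, 1(x_{hi} \le x); \]
then $F^*(x)$ can be regarded as an alternative estimator of the distribution function
$F$. The new M-estimator corresponding to $\psi$ is then defined as $\theta(F^*)$, denoted by $\theta_n^*$, which is a root $u_0$ of the
equation
\[ \sum_h W_h \sum_i \hat p_{hi}\, \psi(x_{hi}, u) = 0. \]
We will also call $\theta_n^*$ the ELM-estimator corresponding to $\psi$.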
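The second step, solving a weighted estimating equation $\sum_h W_h \sum_i p_{hi}\, \psi(x_{hi}, u) = 0$ for $u$, can be sketched as follows. This is a minimal illustration, not the thesis's procedure: it assumes a scalar location problem with $\psi(x, u) = \psi(x - u)$, uses Huber's $\psi$ as an example choice, finds the root by bisection, and takes the weights $p_{hi}$ as given inputs. The names `huber_psi` and `weighted_m_estimate`, the tilted weights, and the data are all made up for the example.

```python
from typing import Callable, Sequence


def huber_psi(u: float, k: float = 1.345) -> float:
    """Huber's psi function: identity on [-k, k], clipped outside."""
    return max(-k, min(k, u))


def weighted_m_estimate(samples: Sequence[Sequence[float]],
                        probs: Sequence[Sequence[float]],
                        W: Sequence[float],
                        psi: Callable[[float], float] = huber_psi,
                        iters: int = 200) -> float:
    """Root in theta of sum_h W_h sum_i p_hi psi(x_hi - theta) = 0.

    For a nondecreasing psi the score is nonincreasing in theta, so a
    bisection between the smallest and largest observation finds a root.
    """
    def score(theta: float) -> float:
        return sum(
            w * sum(p * psi(x - theta) for p, x in zip(ps, xs))
            for xs, ps, w in zip(samples, probs, W)
        )

    lo = min(min(xs) for xs in samples)  # score(lo) >= 0
    hi = max(max(xs) for xs in samples)  # score(hi) <= 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Made-up data: two strata with equal stratum weights.
samples = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
W = [0.5, 0.5]
# Uniform weights 1/n_h give the usual stratified M-estimator.
uniform = [[1.0 / len(xs)] * len(xs) for xs in samples]
theta_usual = weighted_m_estimate(samples, uniform, W)
# Hypothetical non-uniform weights standing in for EL weights.
tilted = [[0.2, 0.3, 0.5], [0.2, 0.3, 0.5]]
theta_tilted = weighted_m_estimate(samples, tilted, W)
print(theta_usual, theta_tilted)
```

Passing uniform weights recovers $\theta(F_n)$, while passing EL weights that satisfy the auxiliary constraints would give the ELM-estimator; only the weights differ between the two.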
Asymptotic properties
We prove that $\theta_n^*$ is asymptotically normal and that it has a smaller asymptotic
variance than the M-estimator that does not use the auxiliary information.
1.4 EL in the Presence of Measurement Error
Sometimes we need to measure a characteristic of interest of a target
population with several different "instruments", one of which is "accurate",
or at least has smaller measurement error than the others. In practice we
treat this instrument as "perfect", that is, free of measurement error. All other instruments
have larger measurement error and we treat them as "imperfect". The perfect
instrument can be used only a very limited number of times due to cost, whereas the imperfect
ones can be used more frequently. The question is how to combine
all the available data to make inference on the parameters of interest.
Data set
Suppose that there are $H$ different instruments used in measuring the characteristic
of interest. The distribution function associated with instrument $h$ ($h = 1, \ldots, H$)
is $F_h(x) = P[x_h \le x]$, with unknown parameter $\theta$, where $\theta$ is a $p$-dimensional vector,
and the $H$-th measuring instrument is taken as perfect. These $H$ different
populations are related by the common parameter $\theta$. We also assume that information
about $\theta$ and $F_h$ is available in the form of $r_h > p$ functionally independent unbiased
estimating functions, that is, functions $g_h(x_h, \theta)$ such that
\[ E_{F_h}\{g_h(x_h, \theta)\} = 0, \quad h = 1, \ldots, H, \tag{4.1} \]
where $g_h(x, \theta)$ is an $r_h$-dimensional vector function.
We further suppose that $x_{h1}, \ldots, x_{hn_h}$ is an i.i.d. sample from $F_h(x)$, and that samples
measured by different instruments are independent.
EL in the presence of measurement error
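The empirical likelihood based on the independent samples
\[ x_{h1}, \ldots, x_{hn_h}, \quad h = 1, \ldots, H, \]
is given by
\[ L = \prod_{h=1}^{H} \prod_{i=1}^{n_h} p_{hi}, \]
where $p_{hi} = \Pr(x_h = x_{hi})$ and $\sum_{i=1}^{n_h} p_{hi} = 1$ for each $h$. The empirical likelihood
ratio is then defined as
\[ R = \prod_{h=1}^{H} \prod_{i=1}^{n_h} n_h p_{hi}. \]
Since we know the estimating equation (4.1), we define the empirical likelihood ratio
function for $\theta$ as
\[ R_E(\theta) = \sup\Big\{ \prod_h \prod_i n_h p_{hi} : p_{hi} \ge 0,\ \sum_i p_{hi} = 1,\ \sum_i p_{hi}\, g_{hi}(\theta) = 0,\ h = 1, \ldots, H \Big\}, \]
where $g_{hi}(\theta) = g_h(x_{hi}, \theta)$ for all $h$ and $i$.
We can maximize $R_E(\theta)$ to obtain an estimator $\tilde\theta$ of the parameter $\theta$, called the
maximum empirical likelihood ratio estimator (MELRE). In addition, this yields
estimators $\tilde p_{hi}$, and an estimator of the true distribution function $F_H$ as
\[ \tilde F_H(x) = \sum_i \tilde p_{Hi}\, 1(x_{Hi} \le x). \]
Asymptotic properties
cumulative distribution function of $x_{H1}, \ldots, x_{Hn_H}$. The estimator $\tilde\theta$ is usually
significantly more efficient than the estimator using only $x_{H1}, \ldots, x_{Hn_H}$.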
Chapter 2
Empirical Likelihood Inference
for Finite Populations with
Auxiliary Information Using
Stratified Random Sampling
2.1 Introduction
Suppose that a target population is divided into $H$ strata with known weights $W_h$
for all strata $h$, $\sum_h W_h = 1$. In stratum $h$ there are $N_h$ units with values $z_{hi}$
($i = 1, \ldots, N_h$; $h = 1, \ldots, H$), where $z_{hi}$ is a $d$-dimensional variable, $z_{hi} = (x_{hi}^\tau, y_{hi}^\tau)^\tau$,
$x_{hi}$ and $y_{hi}$ are of dimensions $d - p$ and $p$ respectively, and $\tau$ denotes the transpose.
Denote the $h$th stratum population mean, median and distribution function by
$(\bar X_h^\tau, \bar Y_h^\tau)^\tau$, $(m_{xh}^\tau, m_{yh}^\tau)^\tau$ and
\[ F_{hN_h}(z) = \frac{1}{N_h} \sum_{i=1}^{N_h} 1(z_{hi} \le z) \]
respectively, where $z_{hi,j}$ and $z_j$ are
the $j$-th components of the vectors $z_{hi}$ and $z$ respectively, and $1(z_{hi,j} \le z_j)$ is the indicator
of the set $\{z_{hi,j} \le z_j\}$. Also, let $(\bar X^\tau, \bar Y^\tau)^\tau$, $(m_x^\tau, m_y^\tau)^\tau$ and
$\sum_{h=1}^{H} W_h F_{hN_h}(z)$ be the mean, median and distribution function of the target
population respectively, where $N = \sum_{h=1}^{H} N_h$, $\bar X = \sum_{h=1}^{H} W_h \bar X_h$ and $\bar Y = \sum_{h=1}^{H} W_h \bar Y_h$.
The reason we separate $z_{hi}$ into two parts is that we suppose that some
auxiliary information such as $\bar X$ or $m_x$ is available when we want to make inference
about target population parameters such as $\bar Y$ or $m_y$.
In many practical problems, some population values of the auxiliary variable
$x$, such as the population mean or median, are known from external sources such
as demographic projections. By using this knowledge, we would like to provide
improved inferences on the population parameters associated with the characteristic
of interest $y$, such as the population mean $\bar Y$, the population distribution function
of $y$, $F_y(y)$, and the population median $m_y$. Empirical likelihood methods, recently
introduced by Owen (1988, 1990, 1991) in the context of iid random variables, provide
a systematic nonparametric approach to utilizing auxiliary information in making
inference on the parameters of interest. Hartley and Rao (1968) gave the original idea
of empirical likelihood in the sample survey context, using their "scale-load" approach.
Under simple random sampling, they obtained the maximum empirical likelihood
estimator of $\bar Y$ when only $\bar X$ is known and showed that it is approximately equal
to the regression estimator as the sample size, $n$, increases. Chen and Qin (1993)
extended these results to cover the distribution function and median of $y$, i.e., $F_y(y)$ and
$m_y$. Qin and Lawless (1994, 1995) used empirical likelihood and estimating equations
in the iid case to deal with interval estimation and hypothesis testing. They obtained
an empirical likelihood ratio test statistic (ELRS) and its asymptotic distribution
under the null hypothesis. The main aim of this chapter is to deal with the case of
stratified simple random sampling.
In section 2.2, we derive the maximum empirical likelihood estimator (MELE)
for the parameters of interest. Some large-sample properties of the MELE are also
established. Application of the MELE to small area estimation is given in section
2.3. In section 2.4, the empirical likelihood ratio test (ELRT) on $\bar Y$ is considered and
its asymptotic distribution under the null hypothesis is obtained. In section 2.5, we
present the results of a limited simulation study on the finite-sample properties of the
empirical likelihood method. Proofs of the theorems in sections 2.2 and 2.4 are given in
section 2.6.
2.2 Maximum Empirical Likelihood Estimator
Suppose that $z_{h1}, \ldots, z_{hn_h}$ is a simple random sample without replacement from
stratum $h$ with distribution function $F_{hN_h}$ for all $h$, and that the samples are selected
independently across the strata. As argued by Chen and Qin (1993) for simple
random sampling, the empirical likelihood for the above sampling scheme can be
approximated by
\[ L = \prod_{h=1}^{H} \prod_{i=1}^{n_h} p_{hi}, \tag{2.1} \]
where $p_{hi} = \Pr(z_h = z_{hi})$ and $z_h = (x_h^\tau, y_h^\tau)^\tau$ has the distribution function $F_{hN_h}$.
We consider the case of a known vector of population means, $\bar X$, of the variable
$x = (x_1, \ldots, x_{d-p})^\tau$:
\[ \sum_h W_h E_{F_{hN_h}}(x_h) = \bar X. \tag{2.2} \]
Our results can be easily generalized to other forms of auxiliary information, such as
a known population median in the case of scalar $x$.
The maximum empirical likelihood estimator should be sought among distribution
functions satisfying (2.2). Using the same argument as in Owen (1990), we
need consider only estimators of $F_{hN_h}$ whose support is contained in the set of
observations. The problem therefore reduces to maximizing
\[ \sum_h \sum_i \log p_{hi} \tag{2.3} \]
subject to
\[ \sum_i p_{hi} = 1, \quad p_{hi} \ge 0, \quad h = 1, \ldots, H, \tag{2.4} \]
and
\[ \sum_h W_h \sum_i p_{hi} x_{hi} = \bar X, \tag{2.5} \]
where $\sum_h = \sum_{h=1}^{H}$ and $\sum_i = \sum_{i=1}^{n_h}$. A solution to the above problem exists with
probability tending to one as the sample sizes go to infinity in each stratum. The
proof of this conclusion is given in section 2.6.
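Using the Lagrange multiplier method, let
\[ l_1 = \sum_h \sum_i \log p_{hi} - \sum_h \lambda_h \Big( \sum_i p_{hi} - 1 \Big) - n \psi^\tau \Big( \sum_h W_h \sum_i p_{hi} x_{hi} - \bar X \Big), \]
where $n = \sum_h n_h$ and $\psi$ is a $(d - p)$-dimensional vector. Then
\[ \frac{\partial l_1}{\partial p_{hi}} = \frac{1}{p_{hi}} - \lambda_h - n W_h \psi^\tau x_{hi} = 0. \tag{2.6} \]
From (2.6) we get
\[ (\lambda_h + n W_h \psi^\tau x_{hi})\, p_{hi} = 1, \]
and summing this expression over $i$ leads to
\[ \lambda_h + n W_h \psi^\tau \tilde x_h = n_h, \]
using (2.4). Hence
\[ p_{hi} = \frac{1}{n_h [1 + m_h \psi^\tau (x_{hi} - \tilde x_h)]}, \]
where $\tilde x_h = \sum_i p_{hi} x_{hi}$ and $m_h = n W_h / n_h$. Therefore, the estimators of the $p_{hi}$ are the
solutions to the following system of equations:
\[ p_{hi} = \frac{1}{n_h [1 + m_h \psi^\tau (x_{hi} - \tilde x_h)]}, \tag{2.7} \]
\[ \tilde x_h = \sum_i p_{hi} x_{hi}, \tag{2.8} \]
\[ \sum_h W_h \tilde x_h = \bar X, \tag{2.9} \]
subject to $0 \le p_{hi} \le 1$. We denote the solutions of (2.7), (2.8) and (2.9) by $\hat p_{hi}$ and
$\hat\psi$. We call the above method the empirical likelihood method. The estimator
obtained by this method is called the Maximum Empirical Likelihood Estimator
(MELE). We now discuss how to solve (2.7), (2.8) and (2.9).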
2.2.1 Numerical Evaluation for MELE
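It is difficult to solve (2.7), (2.8) and (2.9) directly, but they can be solved indirectly as follows. Using the Lagrange multiplier method, we maximize (2.3) subject to (2.4) and
$$\sum_i p_{hi}(x_{hi} - X_h) = 0, \quad h = 1, \ldots, H,$$
for each given $(X_1, \ldots, X_H)$ such that $\sum_h W_h X_h = \bar{X}$. We get the maximum as
$$-\sum_h \sum_i \log\{n_h[1 + \xi_h'(x_{hi} - X_h)]\},$$
with the Lagrange multiplier $\xi_h$ satisfying
$$\sum_i \frac{x_{hi} - X_h}{1 + \xi_h'(x_{hi} - X_h)} = 0,$$
where implicitly
$$p_{hi} = \frac{1}{n_h[1 + \xi_h'(x_{hi} - X_h)]}.$$
Hence, we can say that $\xi_h$ is a function of $X_h$, and we then maximize
$$-\sum_h \sum_i \log\{n_h[1 + \xi_h'(x_{hi} - X_h)]\}$$
subject to
$$\sum_h W_h X_h = \bar{X}.$$
Using the Lagrange method again, let
$$l_2 = -\sum_h \sum_i \log\{n_h[1 + \xi_h'(x_{hi} - X_h)]\} - n\psi'\Big(\sum_h W_h X_h - \bar{X}\Big);$$
solving
$$\frac{\partial l_2}{\partial X_h} = n_h \xi_h - n W_h \psi = 0,$$
we get $\xi_h = m_h \psi$. Therefore,
$$\sum_i \frac{x_{hi} - X_h}{1 + m_h \psi'(x_{hi} - X_h)} = 0, \quad h = 1, \ldots, H, \qquad (2.10)$$
$$p_{hi} = \frac{1}{n_h[1 + m_h \psi'(x_{hi} - X_h)]}. \qquad (2.11)$$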
Or $X_h$ is a function of $\psi$, and $\psi$ should be chosen such that
$$\sum_h W_h X_h(\psi) = \bar{X}. \qquad (2.12)$$
It is evident that the solutions $\psi$, $X_h(\psi)$, $p_{hi}$ to (2.10), (2.11) and (2.12), denoted $\tilde{\psi}$, $X_h(\tilde{\psi})$, $\tilde{p}_{hi}$, are the same as the solutions to (2.7), (2.8) and (2.9). Hence $X_h(\tilde{\psi}) = \tilde{X}_h$ satisfies (2.10), i.e.
$$\sum_i \frac{x_{hi} - \tilde{X}_h}{1 + m_h \tilde{\psi}'(x_{hi} - \tilde{X}_h)} = 0.$$
We now prove that (2.10) and (2.12) have a unique solution. Denote
then, noting (2.10),
where $I_{d-p}$ is the identity matrix of order $d - p$. For a given $\psi$, $\partial W / \partial X_h$ is nonsingular since the matrix
is positive definite. If a solution $X_h$ to $W(X_h, \psi) = 0$ exists for each given $\psi$, then $X_h$ is an implicit function of $\psi$, say $X_h(\psi)$, by the implicit function theorem. Since
we obtain, noting (2.4) and (2.11),
for all $h$. By applying the mean value theorem, we see that
$$\sum_h W_h X_h(\psi) - \bar{X} = 0$$
will not have two different roots. Hence if a solution $\psi$ exists, then it is unique and we can find the $\psi$ satisfying (2.12).
We solve (2.10)-(2.12) by combining (2.10) and (2.12) into a nonlinear system of equations and then using a NAG library function to obtain a solution for appropriately selected initial values, e.g., $\psi = 0$ and $X_h = \bar{x}_h$. For this solution the condition $0 \le p_{hi} \le 1$ usually holds, as seen in our simulation study.
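As an illustration of the numerical step, the following is a minimal sketch for scalar $x$ only. It does not reproduce the NAG-based solver used in the thesis; instead it assumes the equivalent form $p_{hi} = 1/(\lambda_h + n W_h \psi x_{hi})$ implied by the Lagrange conditions and finds $\lambda_h$ and $\psi$ by nested bisection. The function names (`stratum_probs`, `mele_weights`, `mele_mean`) and the toy data in the usage note are hypothetical.

```python
def stratum_probs(x, c):
    """Return p_i = 1/(lam + c*x_i), with lam found by bisection so that sum_i p_i = 1."""
    base = max(-c * xi for xi in x)        # lam must exceed this to keep every p_i > 0
    lo, hi = base + 1.0, base + len(x)     # sum_i p_i >= 1 at lo and <= 1 at hi
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        if sum(1.0 / (lam + c * xi) for xi in x) > 1.0:
            lo = lam                       # probabilities too large: increase lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return [1.0 / (lam + c * xi) for xi in x]


def mele_weights(x_strata, weights, x_bar):
    """Find scalar psi such that sum_h W_h sum_i p_hi x_hi = x_bar (assumed sketch)."""
    n = sum(len(x) for x in x_strata)
    lower = sum(w * min(x) for x, w in zip(x_strata, weights))
    upper = sum(w * max(x) for x, w in zip(x_strata, weights))
    if not lower < x_bar < upper:
        raise ValueError("x_bar cannot be matched by reweighting the sample")

    def gap(psi):
        total = 0.0
        for x, w in zip(x_strata, weights):
            p = stratum_probs(x, n * w * psi)
            total += w * sum(pi * xi for pi, xi in zip(p, x))
        return total - x_bar

    lo, hi = -1e-3, 1e-3
    while gap(lo) < 0.0:                   # gap is decreasing in psi
        lo *= 2.0
    while gap(hi) > 0.0:
        hi *= 2.0
    for _ in range(200):
        psi = 0.5 * (lo + hi)
        if gap(psi) > 0.0:
            lo = psi
        else:
            hi = psi
    psi = 0.5 * (lo + hi)
    probs = [stratum_probs(x, n * w * psi) for x, w in zip(x_strata, weights)]
    return psi, probs


def mele_mean(y_strata, weights, probs):
    """Weighted mean sum_h W_h sum_i p_hi y_hi using the fitted probabilities."""
    return sum(w * sum(pi * yi for pi, yi in zip(p, y))
               for y, w, p in zip(y_strata, weights, probs))
```

For example, with two toy strata `x = [[1.0, 2.0, 4.0, 7.0], [3.0, 5.0, 6.0]]`, weights `[0.6, 0.4]` and a known mean of `4.2`, calling `mele_weights(x, [0.6, 0.4], 4.2)` returns probabilities that sum to one within each stratum and reproduce the known mean; `mele_mean` then applies them to the $y$-values. Bisection is chosen here only because it needs no external library and the constraint is monotone in $\psi$ in the scalar case.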
Once we have the estimators $\tilde{p}_{hi}$, we can estimate $\bar{Y}$ by
$$\tilde{Y} = \sum_h W_h \sum_i \tilde{p}_{hi}\, y_{hi}.$$
2.2.2 Asymptotic Results
In this section, we discuss the asymptotic behavior of the MELE. We use $\|\cdot\|$ to denote the Euclidean norm, and $\sigma_2 > \sigma_1$ means that $\sigma_2 - \sigma_1$ is a positive definite matrix. Also, we assume that both the sample size $n_h$ and the stratum size $N_h$ for each $h$ go to infinity as the index $\nu$ attached to $n_h$ and $N_h$ goes to infinity, i.e., $n_{h\nu}$ and $N_{h\nu}$ go to infinity as $\nu \to \infty$. However, for convenience, we will suppress the index $\nu$ in the following whenever possible. We will also denote the solution $\tilde{\psi}$ by $\psi$, since we are going to deal with large sample problems.
Theorem 1. Suppose that, as $\nu \to \infty$, $N_h$, $n_h$ and $N_h - n_h$ go to infinity, $n/n_h \to k_h$ with $k_h > 0$, both $\sum_h W_h N_h^{-1} \sum_i \|x_{hi}\|^3$ and $\sum_h W_h N_h^{-1} \sum_i \|y_{hi}\|^3$ have an upper bound independent of $\nu$, and $\sigma_{h,xx} = \mathrm{Cov}(x_h, x_h) > \sigma_1 > 0$ for all $h$ and $\nu$, where $\sigma_1$ is a fixed positive definite matrix. Then
$$\sigma^{-1/2}(\tilde{Y} - \bar{Y}) \to_d N_p(0, I_p), \qquad \sigma_F^{-1/2}(\tilde{F}_Y(y) - F_Y(y)) \to_d N_p(0, I_p)$$
as $\nu \to \infty$, where
$$\sigma_F = \sum_h W_h^2 (n_h^{-1} - N_h^{-1})\big[F_{h,N_h}(y)(1 - F_{h,N_h}(y)) - 2G\,\mathrm{Cov}'(I(y_h < y), x_h) + G\,\mathrm{Var}(x_h)\,G^T\big].$$
From the results of Theorem 1, we find that the estimator $\tilde{Y}$ has the same asymptotic variance as the optimal regression estimator
where
with
Further, $\tilde{F}_Y(y)$ is asymptotically more efficient than the customary estimator $F_n(y) = \sum_h W_h F_{h,n_h}(y)$. It is possible to construct an optimal regression estimator of $F_Y(y)$ by changing $y_{hi}$ to the indicator variable $I(y_{hi} \le y)$ in the formula for $\bar{Y}_{opt}$ (Rao and Liu, 1992). This estimator, $\hat{F}_{opt}(y)$, is asymptotically equivalent to $\tilde{F}_Y(y)$, but it may not be monotone, unlike the MEL estimator $\tilde{F}_Y(y)$. Monotonicity of $\tilde{F}_Y(y)$ in the case of scalar $y$ makes it possible to calculate quantiles, etc.
In the above discussion we considered auxiliary information of the form (2.2), but we can easily generalize it to auxiliary information of the form
$$\sum_h W_h E\{w(x_h)\} = 0.$$
The choice $w(x_{hi}) = x_{hi} - \bar{X}$ gives (2.2). When the population median $m_x$ is known and $x$ is a scalar, we let $w(x_{hi}) = I(x_{hi} \le m_x) - 0.5$.
2.2.3 Variance Estimation
In this section, we consider the estimation of the variances of $\tilde{Y}$ and $\tilde{F}_Y(y)$. Under the conditions of Theorem 1 and from its proof, we can easily see that
is a consistent estimator of $\sigma$, the asymptotic variance of $\tilde{Y}$, in the sense that $n(\hat{\sigma} - \sigma) \to_p 0$, where
Similarly,
is a consistent estimator of $\sigma_F$, the asymptotic variance of $\tilde{F}_Y(y)$, where
We can use (2.15) and (2.16) to get normal theory confidence intervals for $\bar{Y}$ and $F_Y(y)$, which are asymptotically correct.
Another consistent estimator of $\sigma$ is obtained by the jackknife method. We give the result in the following theorem.
Theorem 2. Under the same conditions as Theorem 1, let $\tilde{Y}_{-lj}$ be the estimator of $\bar{Y}$ when the $j$-th observation of the $l$-th stratum is deleted, and let
be the jackknife variance estimator. Then $n(\hat{\sigma}_J - \sigma) \to_p 0$.
A jackknife variance estimator for $\tilde{F}_Y(y)$ can be obtained from (2.17) by simply changing $y_{hi}$ to $I(y_{hi} \le y)$. The consistency of this variance estimator follows along the lines of Theorem 2.
2.3 Estimation of Small Area Means
In some applications the strata may be regarded as small areas of interest. The term "small area" is used to denote a small geographical area, such as a county, a municipality or a census division. Sample survey data can certainly be used to derive reliable estimators of totals and means for large areas or domains. However, the usual direct survey estimators for a small area, based on data only from the sample units in the area, are likely to yield unacceptably large standard errors due to the unduly small size of the sample in the area. Sample sizes for small areas are typically small because the overall sample size in a survey is usually determined to provide specific accuracy at a much higher level of aggregation than that of small areas. Ghosh and Rao (1994) reviewed many methods of small area estimation, including those based on models that link the small areas through auxiliary information. Here we do not assume any such model.
If the strata can be regarded as small areas and we only know the mean of the population, we can get more efficient estimators of the individual small area (stratum) means $\bar{Y}_h$ by using auxiliary information in the form of the overall mean, $\bar{X}$. Under our set-up, we can use the MEL method to get MEL estimators for the strata means $\bar{Y}_h$ as
$$\tilde{Y}_h = \sum_i \tilde{p}_{hi}\, y_{hi},$$
where the $\tilde{p}_{hi}$'s are the same as those given in section 2.
We now study the efficiency of $\tilde{Y}_h$ relative to the sample mean $\bar{y}_h$ that uses only the $y$-data in small area $h$. We have
where
Hence
Therefore, the average variance over strata is
We use this as a measure of precision of the MEL estimators $\tilde{Y}_h$. We consider two special cases. First, if $x$ and $y$ are uncorrelated for all $h$, then
In this case, there is no gain in efficiency over $\bar{y}_h$. On the other hand, if $y_{hi} = a_h + c_h x_{hi}$ for all $h$, where $x_{hi}$ is scalar, then
This suggests that in practice $\tilde{Y}_h$ will be more efficient than $\bar{y}_h$ if $y$ and $x$ are approximately linearly related in each small area $h$.
2.4 Empirical Likelihood Ratio Test
Now we turn to the empirical likelihood ratio test (ELRT) on $\bar{Y}$. Here we denote $E(z) = \sum_h W_h E(z_h) = \theta$, where $\theta = (\theta_k', \theta_u')'$, $\theta_k = \bar{X}$, $\theta_u = \bar{Y}$, and $k$, $u$ mean "known" and "unknown" respectively. By (2.1) we know the empirical likelihood function is
$$L(F) = \prod_{h=1}^{H} \prod_{i=1}^{n_h} dF_{h,N_h}(z_{hi}) = \prod_{h=1}^{H} \prod_{i=1}^{n_h} p_{hi}, \qquad (4.1)$$
where $p_{hi} = dF_{h,N_h}(z_{hi}) = \Pr(z_h = z_{hi})$ and $F$ denotes the population distribution. Noting that $p_{hi} \ge 0$, $\sum_i p_{hi} = 1$ and $F(z) = \sum_h W_h F_{h,N_h}(z)$, we know (4.1) is maximized by $F_n(z) = \sum_h W_h F_{h,n_h}(z)$, where $F_{h,n_h}(z) = n_h^{-1} \sum_i I(z_{hi} < z)$. The empirical likelihood ratio is then defined as $R(F) = L(F)/L(F_n)$, which reduces to
$$R(F) = \prod_{h=1}^{H} \prod_{i=1}^{n_h} n_h p_{hi}. \qquad (4.2)$$
Since we are interested in the parameter $\theta_u = \bar{Y}$, and we have the auxiliary information (2.2), we define the profile empirical likelihood ratio function
The existence of a unique value for the right-hand side of (4.3) for a given $\theta_u$ is the same as discussed in section 2. Using the Lagrange multiplier method, let
where $t = (t_1, t_2, \ldots, t_d)^T$ are Lagrange multipliers. Taking derivatives with respect to $p_{hi}$, we have
Hence,
$$p_{hi} = \frac{1}{n_h[1 + m_h t'(z_{hi} - \tilde{z}_h)]}, \qquad (4.4)$$
where $\tilde{z}_h = \sum_i p_{hi} z_{hi}$, $m_h = n W_h n_h^{-1}$ and $t$ is the solution to
From the discussion of section 2 we know that $t$ can be determined in terms of $\theta$, and $t = t(\theta)$ is actually a continuously differentiable function of $\theta$.
The empirical likelihood ratio function for $\theta$ now becomes
which leads to the empirical log-likelihood ratio
We minimize $l_E(\theta)$ to obtain an estimator $\tilde{\theta}_u$ of the parameter $\theta_u$, and $\tilde{\theta} = (\theta_k', \tilde{\theta}_u')'$, called the empirical likelihood ratio estimator. In addition, this yields estimators $\tilde{p}_{hi}$ from (4.4), and an estimator for the distribution function $F$:
$$\tilde{F}_n(z) = \sum_h W_h \sum_i \tilde{p}_{hi}\, I(z_{hi} < z).$$
Now we turn to the calculation of $\tilde{\theta}$ by minimizing $l_E(\theta)$. From the context above we know that it is equivalent to maximizing
subject to
where $\theta_u$ is unknown. Since $\theta_u$ is unknown, (4.10) does not add any information to the estimation problem. Hence, $L$ will be maximized by dropping (4.10). We can show this as follows. Let
where $t_k$, $t_u$ are multipliers with the same dimensions as $\theta_k$, $\theta_u$ respectively. Then
$$\frac{\partial l_1}{\partial \theta_u} = n t_u = 0 \;\Rightarrow\; t_u = 0.$$
This shows that we can drop (4.10). Therefore, the $\tilde{p}_{hi}$'s are the same as the $\tilde{p}_{hi}$'s given in section 2.2, and $\tilde{\theta}_u = \tilde{Y} = \sum_h W_h \sum_i \tilde{p}_{hi} y_{hi}$. We can prove this by following the steps in section 2.2.
We note here that under $H_0: \theta_u = \theta_{u0}$, the estimator for $p_{hi}$ will be changed to
$$\hat{p}_{hi} = \frac{1}{n_h[1 + m_h t'(\theta_0)(z_{hi} - \hat{z}_h)]},$$
where $\hat{z}_h = \sum_i \hat{p}_{hi} z_{hi}$ and $\theta_0 = (\bar{X}', \theta_{u0}')'$. The calculation of the $\hat{p}_{hi}$'s is as discussed in section 2 except for changing $x_{hi}$ to $z_{hi}$, $X_h$ to $\hat{z}_h$, etc.
The empirical likelihood ratio statistic for testing $H_0: \theta_u = \theta_{u0}$ is given by
Theorem 3. Under the conditions of Theorem 1 and $\mathrm{Var}(z_h) > \sigma_1 > 0$ for all $h$ and $\nu$, we have
$$W_E(\theta_0) \to_d \chi^2(p) \quad \text{as } \nu \to \infty$$
when $H_0$ is true.
Using Theorem 3, we reject $H_0$ at level $\alpha$ if $W_E(\theta_0)$ exceeds the upper $\alpha$-point, $\chi^2_\alpha(p)$, of $\chi^2(p)$.
2.5 Simulation study
We conducted a limited simulation study on the finite sample properties of the proposed MEL estimators of the population mean $\bar{Y}$ and the distribution function $F_Y(y)$ in the scalar case ($p = 1$, $d = 2$). We also considered the likelihood ratio test on $\bar{Y}$.
For the simulation study, we employed six stratified finite populations considered by Chen and Sitter (1996). Each population consisted of $L = 4$ strata with stratum sizes $N_h = 8000 - 300h$ and stratum sample sizes $n_h = 100 - 9h$ for $h = 1, 2, 3, 4$. The scalar characteristic $y_{hi}$ for the $i$-th unit within the $h$-th stratum was generated from the model
for specified values of $\alpha_h$, $\beta_h$, $\gamma_h$, $a$ and $b_h$, where the $e_{hi}$ are iid random variables for each $h$, generated from either a chi-squared distribution with $b_h$ degrees of freedom, i.e., $\chi^2_{b_h}$, or a standard normal distribution, $N(0,1)$. Further, the scalar $x_{hi}$ are iid variables for each $h$, generated from
Table 2.1 gives the parameter settings for generating the six populations. The strength of the relationship between $y$ and $x$ is weak for population 1, and in populations 2-4, $y$ and $x$ are linearly related, while in populations 5 and 6 a quadratic term is added to create departures from linearity. The populations cover both symmetric and skewed errors $e_{hi}$.
For each of the parameter settings in Table 2.1, a stratified finite population was created and $R = 1000$ independent stratified simple random samples were drawn. The random numbers needed to generate the finite population were obtained using the NAG Fortran library function. The empirical mean squared error of an estimator $\hat{\theta}$ of a parameter $\theta$ was calculated as
$$MSE(\hat{\theta}) = \frac{1}{R} \sum_{r=1}^{R} (\hat{\theta}^{(r)} - \theta)^2,$$
where $\hat{\theta}^{(r)}$ is the value of $\hat{\theta}$ for the $r$-th simulated sample.
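The empirical mean squared error described above averages the squared deviations of the simulated estimates from the true parameter value. A minimal sketch of that calculation, and of the MSE ratios reported in the tables, is given here; the function names are hypothetical and the numbers in the usage note are toy values, not results from the study.

```python
def empirical_mse(estimates, true_value):
    """Average squared deviation of the simulated estimates from the true value."""
    return sum((est - true_value) ** 2 for est in estimates) / len(estimates)


def mse_ratio(estimates_a, estimates_b, true_value):
    """Ratio MSE(a) / MSE(b); values below 1 favour estimator a."""
    return empirical_mse(estimates_a, true_value) / empirical_mse(estimates_b, true_value)
```

For instance, `mse_ratio([1.5, 2.5], [1.0, 3.0], 2.0)` compares two hypothetical estimators of a parameter equal to 2 over two simulated samples.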
We now study the performance of the MEL estimator $\tilde{Y}$ of the mean $\bar{Y}$ relative to the stratified mean, $\bar{y}_{st} = \sum_h W_h \bar{y}_h$, a generalized regression estimator (GREG) and the optimal regression estimator $\bar{Y}_{opt}$. The GREG estimator is given by
where
This estimator is motivated by a linear regression model with common intercept and slope coefficients and constant error variance (Sarndal et al., 1991, p. 229). Table 2.2 reports the ratios of $MSE(\tilde{Y})$ to $MSE(\bar{Y}_{GREG})$, $MSE(\bar{y}_{st})$ and $MSE(\bar{Y}_{opt})$ for all the six populations. It is clear from Table 2.2 that MELE outperforms the GREG estimator, especially for populations 1-4 with widely varying intercept terms across strata. It is also considerably more efficient than the stratified mean except in the case of population 1 with a weak relation between $y$ and $x$. As expected, MELE and the optimal estimator are equally efficient for all the six populations.
The second part of our simulation study deals with the estimation of the population distribution function $F_Y(y)$. We selected the values of $F_Y(y)$ corresponding to the quartiles $y_p$ ($p = 1/4, 2/4, 3/4$), i.e., $F_Y(y_p) = p$. Table 2.3 reports the ratios of $MSE(\tilde{F}_Y(y))$ to $MSE(\hat{F}_{st}(y))$ and $MSE(\hat{F}_{opt}(y))$, where $\tilde{F}_Y(y)$ is the MELE of $F_Y(y)$, $\hat{F}_{st}(y) = \sum_h W_h F_{h,n_h}(y)$ is the empirical distribution function and $\hat{F}_{opt}(y)$ is the optimal regression estimator of $F_Y(y)$. It is clear from Table 2.3 that the MEL estimator $\tilde{F}_Y(y)$ outperforms the empirical distribution function $\hat{F}_{st}(y)$ in all the six populations. Although $\tilde{F}_Y(y)$ and $\hat{F}_{opt}(y)$ are asymptotically equivalent, our simulation results indicate that $\tilde{F}_Y(y)$ is often significantly more efficient than $\hat{F}_{opt}(y)$ in finite samples. For example, the ratio equals 0.62 for populations 2 and 4, and 0.7 for population 6, at some of the quartiles considered. The main advantage of $\tilde{F}_Y(y)$, however, is that it is monotone, unlike $\hat{F}_{opt}(y)$.
The final part of our simulation study investigates the performance of the empirical likelihood ratio test and the asymptotic normality test obtained from the result of Theorem 1 (called ANT). Since we know the means of the six populations, say $\bar{Y}_0$, we can test the hypothesis $H_0: \bar{Y} = \bar{Y}_0$ from the simulated samples. We tested $H_0$ using the empirical likelihood ratio test statistic $W_E(\bar{Y}_0)$ at nominal levels $\alpha = 0.05$ and $0.1$. Table 2.4 reports the empirical type I error rates for $\alpha = 0.10$ and $0.05$, obtained by computing the proportion of samples for which $W_E(\bar{Y}_0) > \chi^2_\alpha(1)$, the upper $\alpha$-point of a $\chi^2$ variable with 1 degree of freedom. Table 2.5 reports the empirical error rates for $\alpha = 0.10$ and $0.05$, obtained by computing the proportion of samples for which $|\tilde{Y} - \bar{Y}_0| > u_\alpha \hat{\sigma}^{1/2}$, where $u_\alpha$ is the upper $\alpha$-point of a $N(0,1)$ distribution. Table 2.4 shows that the empirical likelihood ratio test performs well over the six populations, although it is somewhat conservative. It appears that higher-order corrections to the empirical likelihood ratio test are needed to get more accurate results, as noted by Qin and Lawless (1995) in the context of simple random sampling. Table 2.5 shows that the standard normal approximation performs well over all six populations.
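The empirical error rate of the likelihood ratio test is the proportion of simulated samples whose statistic exceeds the chi-squared critical value. The sketch below illustrates this for one degree of freedom, using the fact that a $\chi^2(1)$ variable is the square of a standard normal variable so that only the Python standard library is needed. The function names are hypothetical and the statistics in the usage note are toy values, not simulation output.

```python
from statistics import NormalDist


def chi2_1_upper_point(alpha):
    """Upper alpha-point of chi-squared(1): square of the upper alpha/2 normal point."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def empirical_error_rate(test_stats, alpha):
    """Proportion of simulated test statistics exceeding the critical value."""
    crit = chi2_1_upper_point(alpha)
    return sum(1 for w in test_stats if w > crit) / len(test_stats)
```

For example, `empirical_error_rate([0.5, 4.0, 1.2, 5.1], 0.05)` counts how many of four hypothetical statistics exceed the 5% critical value of about 3.84.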
Table 2.1: Parameter settings for generated finite populations
Table 2.2: Ratios of MSE's of $\tilde{Y}$ to $\bar{y}_{st}$, $\bar{Y}_{GREG}$ and $\bar{Y}_{OPT}$

Population   to $\bar{y}_{st}$   to $\bar{Y}_{GREG}$   to $\bar{Y}_{OPT}$
1            0.938               0.0247                1.001
2            0.786               0.471                 0.999
3            0.SSS               0.345                 0.994
4            0.191               0.434                 0.996
5            0.604               0.629                 1.009
6            O Xi6               OS18                  1.006
Table 2.3: Ratios of MSE's of $\tilde{F}(y_p)$ to $\hat{F}_{st}(y_p)$ and $\hat{F}_{opt}(y_p)$
Population | Ratio | p = 1/4 | p = 2/4 | p = 3/4
Table 2.4: Empirical Error Rate (%) for MELRT
Table 2.5: Empirical Error Rate (%) for ANT
2.6 Proofs
We first summarize the existence conclusion for the maximization problem at the beginning of section 2 in a lemma and then prove it.
Lemma 1. Under the conditions of Theorem 1, there exists, with probability tending to one as $\nu$ goes to infinity, a solution to the problem of maximizing $\sum_h \sum_i \log p_{hi}$ subject to $\sum_i p_{hi} = 1$ ($h = 1, \ldots, H$) and $\sum_h W_h \sum_i p_{hi} x_{hi} = \bar{X}$.
In order to simplify our discussion, we suppose that $x$ is a scalar here. The discussion is similar when $x$ is multi-dimensional.
and
then obviously $D_1$ is a convex subset of $D$. As shown by Owen (1990), a unique solution exists for maximizing $\sum_h \sum_i \log p^*_{hi}$ within $D$, provided that $\bar{X}$ is within the convex hull of $x_{11}, \ldots, x_{Hn_H}$. Since $D_1 \subset D$, if we can show that $D_1$ is not empty, we can get a maximum for $\sum_h \sum_i \log p^*_{hi}$ within $D_1$, which is equivalent to the maximization problem in the lemma since $p^*_{hi} = W_h p_{hi}$.
Under the conditions of Theorem 1, it is obvious that
where $x_{(1)} = \min_h \min_i x_{hi}$ and $x_{(n)} = \max_h \max_i x_{hi}$. Since $H$ is finite, and samples are independent across strata, we can also conclude that
$h = 1, \ldots, H$, then we have a solution $(p_{h1}, \ldots, p_{hn_h})$ (with $\sum_i p_{hi} = 1$) such that $\sum_i p_{hi} x_{hi} = \bar{X}_h$. Since $\sum_h W_h \bar{X}_h = \bar{X}$, it follows that
The conditions $\bar{X}_h \in (x_{h(1)}, x_{h(n_h)})$ ($h = 1, \ldots, H$) hold with probability tending to 1, hence $D_1$ is not empty with probability tending to 1. This completes the proof.
We need the following lemma to prove Theorem 1.
Lemma 2. Under the conditions of Theorem 1, we have
where
Proof of Lemma 2
By using the decomposition
and noting (2.13) we have
From (6.2) and (6.4) we get
where $\|\bar{x}_{st} - \bar{X}\|^2 \to_p 0$ follows from standard results in stratified random sampling theory (e.g., Bickel and Freedman, 1984). Hence
Therefore,
From (6.4) we have
where xZvRh = maxi II xh, II . Similar to the proof of Owen's (1988) Lemma. we can 1
show that xh,,, = o(nf ) almost surely(a.s.). Hence
where n('l = mirr{nl,. . , n H } By (6.8) we get
where A = Eh kh W:o(1)/2 > 0, is t he sniallest eigenvalue of Q. Hence.
we know II z,, - X II= O,@) and II tSt - -r II op(&) = o,(l). Therefore. (6.10)
estabiishes the last part of (6.1). From (6.4) we have
and
Therefore,
Proof of Theorem 1
We have
where the last term, noting (6.9), is
Hence, from (6.1) and (6.1) we get
Since $E[\bar{y}_{st} - B(\bar{x}_{st} - \bar{X})] = \bar{Y}$ and
by appealing to Bickel & Freedman (1984), we get
$$\sigma^{-1/2}(\tilde{Y} - \bar{Y}) \to_d N_p(0, I_p).$$
Now we turn to the second result of Theorem 1. Denoting
we have
where by (6.4) the third term is
Therefore
where
Hence,
$$\sigma_F^{-1/2}(\tilde{F}_Y(y) - F_Y(y)) \to_d N_p(0, I_p).$$
Proof of Theorem 2
for convenience, where $\psi_{-lj}$ is the Lagrange multiplier $\psi$ when the $j$-th observation of the $l$-th stratum is deleted. In the following, $o_p(1)$ means that it holds uniformly over $h, i, l, j$. From Lemma 2.1 it can be shown that the following hold uniformly over $h, i, l, j$:
Though the MELE for $p_{lj}$ is 0 when we delete $x_{lj}$ from the data, we redefine $\tilde{p}_{lj,-lj}$ as above for convenience. We have
where the third term in the sixth equality has the order
The fourth term of the same equation also has the same order.
Using the same method we can prove for $h \ne l$ that
Since
by (6.11) and (6.12) we have
Therefore
Following the steps of (6.11) we can obtain similar results on vh - W e have
Wl .- vt; (xij - i l ) + - 1
ni - 1 n1 - 1 (91j - Y i ) + o p ( ; )
Proof of Theorem 3
The result in Lemma 2 holds if we change $\bar{X}$ to $\theta_0$, $\bar{x}_{st}$ to $\bar{z}_{st}$, etc.
where gr, = Eh bi/hkhah,zr. Therefore.
Since
Now
Therefore,
noting that
Chapter 3
Empirical Likelihood and General
Estimating Equations under
Stratified Sampling
3.1 Introduction
Likelihood and estimating equations provide the most common approaches to parametric inference. Recently, they have also been shown to be useful in non-parametric contexts (Qin and Lawless (1994, 1995) and the references cited therein). Our purpose in this chapter is to combine empirical likelihood, estimating equations and stratified sampling. In order to simplify the discussion, we suppose an i.i.d. sample in each stratum, but the discussion can be carried over to simple random sampling without replacement.
Suppose that a target population with $d$-dimensional characteristic $x$ has the
CHAPTER 3. EL AND GEE 56
unknown distribution function $F$ and a $p$-dimensional parameter $\theta$ associated with $F$. We are interested in making inference on $\theta$. The sampling scheme is as follows. Suppose that the target population is divided into $H$ strata with known weight $W_h$ for all strata. We have an i.i.d. sample $x_{h1}, \ldots, x_{hn_h}$ on the characteristic $x_h$ from stratum $h$ with distribution function $F_h$, where the $x_{hi}$'s are $d$-dimensional ($h = 1, \ldots, H$, $i = 1, \ldots, n_h$). The $F_h$'s are unknown, and the samples across strata are independent. From the context we know that
$$F(x) = \sum_h W_h F_h(x). \qquad (1.1)$$
We also assume that information about $\theta$ and $F$ is available in the form of $r \ge p$ functionally independent unbiased estimating functions, that is, functions $g_j(x, \theta)$, $j = 1, \ldots, r$, such that $E_F\{g_j(x, \theta)\} = 0$. In vector form,
$$g(x, \theta) = (g_1(x, \theta), \ldots, g_r(x, \theta))',$$
where $g(x, \theta)$ satisfies
$$E_F\{g(x, \theta)\} = 0. \qquad (1.2)$$
In the following, we will show how to use such information to estimate $\theta$ and $F$, in conjunction with empirical likelihood. But first, for illustration, we give some examples that we will return to later.
Example 1. Sometimes we have information relating to the first and second moments of a variable. For example, let $x$ be univariate with mean $\theta$ and $E(x^2) = m(\theta)$, where $m(\cdot)$ is a known function. The information about $F$ can be expressed in the form (1.2) by taking $g(x, \theta) = (x - \theta, x^2 - m(\theta))'$ and the restriction becomes
Example 2. Let $x = (y, z)'$ be bivariate with $E(y) = E(z) = \theta$. In this case we can take $g(x, \theta) = (y - \theta, z - \theta)'$ and the restriction becomes
A somewhat similar problem is when $E(y) = c$ is known and $E(z) = \theta$ is to be estimated, in which case we would have $g(x, \theta) = (y - c, z - \theta)'$. This problem has been discussed in the survey context in chapter 2, where $\bar{X}$ and $\bar{Y}$ played the roles of $c$ and $\theta$ respectively.
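To make the two examples concrete, the estimating functions can be written as ordinary functions and averaged over a stratified sample. The sketch below is illustrative only: the function names, the choice of $m(\theta)$ in the usage note and the toy data are assumptions, and `stratified_average` simply computes the sample analogue $\sum_h W_h n_h^{-1} \sum_i g(x_{hi}, \theta)$ of the left-hand side of (1.2).

```python
def g_example1(x, theta, m):
    """Example 1: g(x, theta) = (x - theta, x^2 - m(theta)) for a known function m."""
    return (x - theta, x * x - m(theta))


def g_example2(pair, theta):
    """Example 2: x = (y, z) with common mean theta, g = (y - theta, z - theta)."""
    y, z = pair
    return (y - theta, z - theta)


def stratified_average(g, strata, weights, theta):
    """Sample analogue of E_F g(x, theta): sum_h W_h * mean_i g(x_hi, theta)."""
    r = len(g(strata[0][0], theta))
    total = [0.0] * r
    for x_h, w_h in zip(strata, weights):
        for x in x_h:
            for j, g_j in enumerate(g(x, theta)):
                total[j] += w_h * g_j / len(x_h)
    return total
```

As a usage example, with the hypothetical choice `m = lambda t: t * t + 1.0` (unit variance), `stratified_average(lambda x, t: g_example1(x, t, m), [[1.0, 2.0, 3.0], [4.0, 6.0]], [0.5, 0.5], 3.5)` has a first component of zero, because 3.5 is the stratified sample mean of these toy data.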
We will show later how to use the empirical likelihood method to solve the above problems. The basic idea is to maximize an empirical likelihood subject to constraints provided by (1.2) to get the maximum empirical likelihood ratio (MELR) estimators for $\theta$ and $F$. In section 3.2, we give MELR estimators for $\theta$ and $F$, and establish asymptotic normality. Section 3.3 shows how the MELR method can be used in models where parameters are subject to constraints. Hypothesis testing problems will also be considered. Finally, section 3.4 presents examples to demonstrate how these methods can be used in practice. Proofs are deferred to section 3.5.
3.2 MELR Estimators and Their Properties
3.2.1 MELR estimators
The empirical likelihood function is given by
$$L(F) = \prod_{h=1}^{H} \prod_{i=1}^{n_h} p_{hi}, \qquad (2.1)$$
where $p_{hi} = \Pr(x_h = x_{hi})$. Only those distributions $F$ with $F_h$'s which have an atom of probability on each $x_{hi}$ have nonzero likelihood. We know (2.1) is maximized by the empirical distribution function $F_n(x) = \sum_h W_h F_{n_h}(x)$, by noting (1.1), where $n = \sum_h n_h$, $F_{n_h}(x) = n_h^{-1} \sum_i I(x_{hi} < x)$, $I(x_{hi} < x) = (I(x_{hi,1} < x_1), \ldots, I(x_{hi,d} < x_d))'$, $x_{hi,j}$ and $x_j$ are the $j$-th components of the vectors $x_{hi}$ and $x$ respectively, and $I(x_{hi,j} < x_j)$ is the indicator of the set $(x_{hi,j} < x_j)$. The empirical likelihood ratio is then defined as $R(F) = L(F)/L(F_n)$, which reduces to
$$R(F) = \prod_{h=1}^{H} \prod_{i=1}^{n_h} n_h p_{hi}. \qquad (2.2)$$
We remark that formulas here and elsewhere in this chapter do not require that the $x_{hi}$'s be distinct. Since we are interested in estimating the parameter $\theta$, and we know the estimating equation (1.2), we define the empirical likelihood ratio function
where $g_{hi}(\theta) = g(x_{hi}, \theta)$ for all $h$ and $i$. As discussed in chapter 2, for any given $\theta$ in a neighbourhood of the true $\theta$, a unique value for the right side of (2.3) exists with probability tending to 1 as all $n_h \to \infty$. The maximum can be found via the Lagrange multiplier method. Let
where $t$ is an $r$-dimensional vector. Then
with the restriction obtained by setting the derivative with respect to $t$ to zero, i.e.,
$$\sum_h W_h G_h(\theta) = 0, \qquad (2.6)$$
from which (see section 3.2.2) $t$ can be determined in terms of $\theta$. Note that $t = t(\theta)$ is actually a continuously differentiable vector-valued function of $\theta$. Therefore, the empirical log-likelihood ratio statistic is given by
with $t$ being defined by (2.5). We minimize $l_E(\theta)$ to obtain an estimate $\tilde{\theta}$ of the parameter $\theta$ (called MELR). In addition, this yields estimates $\tilde{p}_{hi}$'s from (2.4), and an estimate for the distribution $F$ as
Sometimes, in order to indicate that $l_E(\theta)$ and $R_E(\theta)$ also depend on $t(\theta)$, we write $l_E(\theta, t(\theta))$ and
3.2.2 Numerical method for given $\theta$
We first show how to solve (2.4), (2.5) and (2.6) to get the $p_{hi}$'s for given $\theta$, where $0 \le p_{hi} \le 1$. Here the $p_{hi}$'s are implicitly functions of $\theta$. Suppose all $E_h g(x_h, \theta) = M_h(\theta)$ are also given, where $E_h$ indicates that the expectation is taken with respect to $F_h$ (we note here that $M_h(\theta)$ may include other characteristics of $x_h$ in which we are not interested). Here, of course, the $M_h(\theta)$'s must satisfy $\sum_h W_h M_h(\theta) = 0$. Then, using the Lagrange multiplier method, we find that $\sum_h \sum_i \log p_{hi}$ attains the maximum
with $\psi_h$ satisfying
where implicitly
Hence, we can say that $\psi_h$ is a function of $M_h(\theta)$ and $\theta$, and we want to maximize
subject to $\sum_h W_h M_h(\theta) = 0$. Using the Lagrange method again, let
letting $n_h \psi_h - n W_h t = 0$, we get $\psi_h = m_h t$, which implies that
$$p_{hi} = \frac{1}{n_h[1 + m_h t^T(g_{hi}(\theta) - M_h(\theta))]}, \quad 0 < p_{hi} < 1, \qquad (2.10)$$
or we can say that $M_h(\theta)$ is a function of $t$, say $M_h(t)$, and $t$ should be chosen such that
$$\sum_h W_h M_h(t) = 0. \qquad (2.11)$$
In conclusion, for any given $\theta$, we can select a value of $t$, solve (2.9) for each $h = 1, \ldots, H$ under the conditions $p_{hi} \ge 0$ to get $M_h(t)$, and then check whether the $M_h(t)$'s satisfy equation (2.11). Another way is to combine (2.9) and (2.11) into a system of equations and use Newton's method to solve these equations for appropriately selected initial values. Usually (2.10) will be satisfied.
Using the same method as in Chapter 2, we can prove the existence of solutions to (2.9), (2.10) and (2.11). We also know that the $M_h(\theta)$'s are continuously differentiable functions of $\theta$.
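For a single estimating function ($r = 1$), the computation for a given $\theta$ can be sketched without any external solver. The code below is an assumption-laden illustration rather than the Newton scheme mentioned above: it uses the equivalent form $p_{hi} = 1/(\lambda_h + n W_h t\, g_{hi}(\theta))$, finds $\lambda_h$ and the scalar $t$ by nested bisection, and returns $-\sum_h \sum_i \log(n_h p_{hi})$, which is taken here to be the empirical log-likelihood ratio at $\theta$. All function names are hypothetical.

```python
import math


def tilted_probs(g, c):
    """Return p_i = 1/(lam + c*g_i), with lam found by bisection so that sum_i p_i = 1."""
    base = max(-c * gi for gi in g)        # lam must exceed this to keep every p_i > 0
    lo, hi = base + 1.0, base + len(g)     # the root is bracketed by these two values
    for _ in range(200):
        lam = 0.5 * (lo + hi)
        if sum(1.0 / (lam + c * gi) for gi in g) > 1.0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return [1.0 / (lam + c * gi) for gi in g]


def log_el_ratio(strata, weights, g, theta):
    """Return (minus log EL ratio, t, probabilities) for scalar g at a given theta."""
    n = sum(len(x_h) for x_h in strata)
    g_vals = [[g(x, theta) for x in x_h] for x_h in strata]
    lower = sum(w * min(g_h) for g_h, w in zip(g_vals, weights))
    upper = sum(w * max(g_h) for g_h, w in zip(g_vals, weights))
    if not lower < 0.0 < upper:
        raise ValueError("the constraint cannot be met at this theta")

    def constraint(t):
        # sum_h W_h sum_i p_hi g_hi, which is decreasing in t
        return sum(
            w * sum(p * gi for p, gi in zip(tilted_probs(g_h, n * w * t), g_h))
            for g_h, w in zip(g_vals, weights)
        )

    lo, hi = -1e-3, 1e-3
    while constraint(lo) < 0.0:
        lo *= 2.0
    while constraint(hi) > 0.0:
        hi *= 2.0
    for _ in range(200):
        t = 0.5 * (lo + hi)
        if constraint(t) > 0.0:
            lo = t
        else:
            hi = t
    t = 0.5 * (lo + hi)
    probs = [tilted_probs(g_h, n * w * t) for g_h, w in zip(g_vals, weights)]
    l_e = -sum(math.log(len(p_h) * p) for p_h in probs for p in p_h)
    return l_e, t, probs
```

With `g = lambda x, theta: x - theta`, the returned value is essentially zero when `theta` equals the stratified sample mean and positive elsewhere, which is the behaviour one expects from a log-likelihood ratio that is minimized at the unconstrained estimate.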
3.2.3 Asymptotic Results
The following lemmas will be used to prove that at some point $\tilde{\theta}$, $l_E(\theta)$ attains its minimum and to establish asymptotic properties of $\tilde{\theta}$. We use $\|\cdot\|$ to denote the Euclidean norm.
Lemma 1. Let $Y_i$ be i.i.d. $d$-dimensional random variables and define $Z_n =$
Lemma 2. Suppose that, as $n \to \infty$, $n/n_h \to k_h > 0$ for all $h$. Suppose also that, in a neighbourhood of the true value $\theta_0$, $E_h[(g(x_h, \theta_0) - E_h g(x_h, \theta_0))(g(x_h, \theta_0) - E_h g(x_h, \theta_0))'] = \sigma_h(\theta_0) > \sigma_1 > 0$ for all $h$, where $\sigma_1$ is positive definite, and that $\|g(x, \theta)\|^3$ is bounded by some integrable function $G(x)$ in this neighbourhood. Then for any given $\theta$ in this neighbourhood,
Lemma 3 (Jennrich, 1969). Let $g$ be a function on $X \times \Theta$, where $X$ is a Euclidean space and $\Theta$ is a compact subset of a Euclidean space. Let $g(x, \theta)$ be a continuous function of $\theta$ for each $x$, and a measurable function of $x$ for each $\theta$. Assume also that $|g(x, \theta)| \le h(x)$ for all $x$ and $\theta$, where $h$ is integrable with respect to a probability distribution function $F$ on $X$. If $x_1, x_2, \ldots$ is a random sample from $F$, then for almost every sequence $(x_n)$,
$$\frac{1}{n} \sum_{i=1}^{n} g(x_i, \theta) \to \int g(x, \theta)\, dF(x)$$
uniformly for all $\theta$ in $\Theta$.
Lemma 4. In addition to the conditions of Lemma 2, we further assume that $\partial g(x, \theta)/\partial\theta$ is continuous in a neighbourhood of the true value $\theta_0$, that $\|\partial g(x, \theta)/\partial\theta\|$ is bounded by some integrable function $G(x)$ in this neighbourhood, and that the rank of $\sum_h W_h E_h[\partial g(x_h, \theta_0)/\partial\theta]$ is $p$. Then, as $n \to \infty$, with probability 1, $l_E(\theta)$ attains its minimum value at some point $\tilde{\theta}$ in the interior of the ball $\|\theta - \theta_0\| \le n^{-1/3}$, and $\tilde{\theta}$ and $\tilde{t} = t(\tilde{\theta})$ satisfy
$$Q_{1n}(\tilde{\theta}, \tilde{t}) = 0, \qquad Q_{2n}(\tilde{\theta}, \tilde{t}) = 0, \qquad (2.13)$$
where
with
and
Theorem 1. In addition to the conditions of Lemma 4 above, we further assume that $\partial^2 g(x, \theta)/\partial\theta\,\partial\theta'$ is continuous in $\theta$ in a neighbourhood of the true value $\theta_0$ and that $\|\partial^2 g(x, \theta)/\partial\theta\,\partial\theta'\|$ is bounded by some integrable function $G(x)$ in that neighbourhood. Then
where $\tilde{F}_n(x) = \sum_h W_h \sum_i \tilde{p}_{hi}\, I(x_{hi} < x)$, $\tilde{p}_{hi} = 1/\{n_h[1 + m_h \tilde{t}'(g_{hi}(\tilde{\theta}) - M_h(\tilde{t}))]\}$, and the limiting covariance matrices are defined in the proof in Section 3.5. Further, $\tilde{\theta}$ and $\tilde{t}$ are asymptotically independent.
Theorem 2. In the semi-parametric model (1.2), for testing $H_0: \theta = \theta_0$ the empirical likelihood ratio test statistic
is asymptotically $\chi^2_p$ under $H_0$, assuming the conditions of Theorem 1.
Corollary 1. Assume $r > p$. In order to test the model (1.2), i.e.,
we can use the empirical likelihood ratio statistic
Under the conditions of Theorem 2, $R_1$ is asymptotically $\chi^2_{r-p}$ if (1.2) is correct.
3.3 MELR Estimators and Testing with Constraints
In this section we extend the empirical likelihood methods to deal with the case in which there are constraints on parameters. Suppose there is a $q$-dimensional constraint on $\theta$, i.e.,
$$r(\theta) = 0, \qquad (3.1)$$
where $r(\theta)$ is a $q \times 1$ vector ($q \le p$) and the $q \times p$ matrix $R(\theta) = \partial r(\theta)/\partial\theta'$ is of full rank $q$. To minimize $l_E(\theta)$ defined by (2.7) subject to $r(\theta) = 0$, we consider
where $\nu$ is a $q \times 1$ vector of Lagrange multipliers. Differentiating $G_2$ with respect to $\theta$ and $\nu$, we have
Consequently, to minimize $l_E(\theta)$ subject to $r(\theta) = 0$, we need the solution of
where
$$Q_{3n}(\theta, t, \nu) = r(\theta),$$
where $M_h(\theta)$ is defined by (2.13). We denote the solution of (3.3) as $(\tilde{\theta}_r, \tilde{t}_r, \tilde{\nu}_r)$. To discuss the asymptotic behavior of these estimators and the test statistics based on them, we suppose that $r(\theta_0) = 0$, where $\theta_0$ is the true value of $\theta$. First we give two lemmas which show that the solution to (3.3) does exist with probability tending to 1 as $n \to \infty$.
We need the following lemma from Aitchison and Silvey (1958) in the proof of Lemma 6:
Lemma 5. If $\omega(\lambda)$ is a continuous function mapping $R^s$ into itself with the property that, for every $\lambda$ such that $\|\lambda\| = 1$, $\lambda'\omega(\lambda) < 0$, then there exists a point $\hat{\lambda}$ such that $\|\hat{\lambda}\| < 1$ and $\omega(\hat{\lambda}) = 0$.
Lemma 6. Suppose that $g(x, \theta)$ is continuous and differentiable in the neighbourhood of $\theta_0$ and $E\|g(x, \theta)\|^3 < \infty$. Furthermore, assume that $\sigma_h(\theta_0) > 0$ for all $h$ and that $\sum_h W_h E_h\{\partial g(x_h, \theta_0)/\partial\theta\}$ is of full rank. Then in the sphere $\{\theta : \|\theta - \theta_0\| \le d_n\}$, the equation $Q_{1n}(\theta, t) = 0$ almost surely has roots $t = t(\theta) = O(d_n)$, and $t(\theta)$ is
' 5 continuous and differentiable when 0 belongs to this sphere, where d, = n-7- . + & > O . 6
Lemma 7. Assume that the conditions of Lemma 3.1 hold; that in a neighbourhood of $\theta_0$, $r(\theta)$ is a continuously differentiable function; that the $q \times p$ matrix $R(\theta)$ is of rank $q$; and that
$$\frac{\partial^2 r(\theta)}{\partial\theta\,\partial\theta^T}, \qquad \frac{\partial^2 g(x, \theta)}{\partial\theta\,\partial\theta^T}$$
exist and are bounded by some constant and some integrable function respectively. Then the equations (3.3) almost surely have solutions in $U_{d_n} = \{(\theta, t, \nu) : \|\theta - \theta_0\| + \|t\| + \|\nu\| \le d_n\}$ as $n$ goes to infinity, and any solution of (3.3) in $U_{d_n}$ minimizes $l_E(\theta)$ subject to the condition $r(\theta) = 0$.
Theorem 3 below establishes the asymptotic normality of $\tilde{\theta}_r$.
Theorem 3. Under the conditions of Lemma 3.2, we have
where $P$, $H$ are defined in Section 3.5.
Now we turn to the problem of how to test $H_0: r(\theta) = 0$. There are three popular tests based on the likelihood: the likelihood-ratio test, the Lagrange-multiplier test and the Wald test. Of course, here they should be based on the empirical likelihood. These statistics are defined respectively as
ELR = 21E(&), L A = n G: HZ' G r , W A = n r ' ( ê ) ~ ~ r ( e ) . 6 r
where 6 is a solution of the estirnating equation Ch CVhnhL Ci ghi(B) = 0, and He
defined in (5.%7), is t.he asymptotic covariance rnatrix of fi y,.
The following t heorem gives the asymptotic behavior and relat ionship between
the test statistics defined in (3.5).
Theorem 4 Under the assumptions of Theorem 3.3, the three test statistics in
(3.5) are asymptotically equivalent, and each of them is asymptotically distributed
as $\chi^2_q$ under $H_0$.
3.4 Examples
We consider several illustrations of the estimation procedures. We primarily con-
sider large-sample aspects, but computational issues are also discussed by giving the
equations and the method to solve them.
Example 1 (continued) We apply the approach in Section 2 to solve the problem
displayed in Section 1. Equations (2.9), (2.11) and (2.16) lead to
The third equation implies $t_1 = -m'(\theta) t_2$, and by substituting this into the
first and the second we get $2H + 2$ equations in $t_2, \theta, u_1, \ldots, u_H, v_1, \ldots, v_H$ to solve. We use a Fortran NAG library function to solve them, with initial values
$t_1 = 0$, $\theta = \sum_h W_h \bar x_h$, $u_h = 0$, $v_h = 0$, $h = 1, \ldots, H$. Usually the solutions should
satisfy (2.10).
The results of Theorem 1 show that $\sqrt{n}(\tilde\theta - \theta_0) \to N(0, V)$, where $V$ is given in
Theorem 1, or $\tilde\theta - \theta_0 \approx N(0, n^{-1}V)$. Since
Also. since
Wh A = a22 - 2rn1(0)a12 + a l l (m1(8) )2 = Var mf(0)zst - &) > o.
Thus $n^{-1}V \le \mathrm{Var}(\bar x_{st})$, which is the variance of $\bar x_{st} - \theta_0$. Therefore, $\tilde\theta$ is asymptotically
at least as efficient as $\bar x_{st}$.
Example 2 (continued) [Two-sample problem with common mean]. In this case
observations $(x_{hi}, y_{hi})$, $i = 1, \ldots, n_h$, occur in independent pairs and $\sum_h W_h E x_h =
\sum_h W_h E y_h = \theta$. To estimate $\theta$, we consider the estimating equations based on
$g_1 = x - \theta$, $g_2 = y - \theta$ and we associate the empirical likelihood probability $p_{hi}$ with
$(x_{hi}, y_{hi})$. After some simplification, from equations (2.9), (2.11) and (2.16), we have
The initial values may be set to
Here
Therefore
In particular, note that in the case where
which is the same as the variance of the optimal linear combination estimator
3.5 Proofs
Proof of Lemma 1
Following Owen (1988), we get $\max_{1 \le i \le n_h} \|g_{hi}\| = o(n^{1/3})$ (a.s.), hence we have
Proof of Lemma 2
By using the decomposition
and noting (2.5), we have
where $\bar g_h(\theta) = \sum_i g_{hi}(\theta)/n_h$. Since Whjh(θ) = 0, we have
0 = W h g h ( 6 ) - n bv:nh2 x (ghi(e) - gh(e))(~hi(e) - Gh(e))I h h ; 1 + mhtT(e)(ghi(0) - Gh(6)) W )
From (5.1) and (5.3) we get
by the strong law of large numbers. Hence
Therefore, we have
Since $\sigma_h(\theta) > \sigma_1$, we have $S(\theta) > \sum_h W_h^2 k_h \sigma_1 > 0$; therefore
From (5.3) we have
- where = maXi II ghi (B) 11. By the result of Lemma 1. we have uhSnh - 1
o ( n i ) ( a . ~ . ) . Hence
Therefore, from (5.7) and (5.8), we get
where $\lambda = \sum_h k_h W_h^2 \eta_1/2 > 0$ and $\eta_1$ is the smallest eigenvalue of $\sigma_1$. Hence
From (5.3) we have
and
Therefore,
Proof of Lemma 4
We follow Qin and Lawless (1994) in proving this lemma. Denote $\theta = \theta_0 + u n^{-1/3}$ for $\theta \in \{\theta : \|\theta - \theta_0\| = n^{-1/3}\}$, where $\|u\| = 1$. First we give a lower bound
for $l_E(\theta)$ on the surface of the ball. From Lemmas 1, 2 and 3 we know that when
$\|\theta - \theta_0\| \le n^{-1/3}$,
By this and Taylor's expansion, we have (uniformly in $u$)
where
by the law of the iterated logarithm, $\epsilon > 0$, $c - \epsilon > 0$ and $c$ is the smallest eigenvalue of
Similarly,
Since $l_E(\theta)$ is a continuous function of $\theta$ belonging to the ball $\|\theta - \theta_0\| \le n^{-1/3}$,
$l_E(\theta)$ has a minimum value in the interior of this ball, and $\tilde\theta$ satisfies (noting (2.5) and
(2.6))
Proof of Theorem 1
Taking derivatives with respect to $\theta^T$, $t^T$, we have:
where $\delta_n = \|\tilde\theta - \theta_0\| + \|\tilde t\|$. Noting (2.14), (2.15), (5.10), (5.11), we have
The system of equations (5.14) and (5.15) may be written in matrix form as
it follows that
where $M_{12} = M_{21}^T$ and $M_{22} = 0$. Therefore,
From this result and
we get $\delta_n = O_p(n^{-1/2})$.
Since
where $M_{22.1} = -M_{21} M_{11}^{-1} M_{12}$, we have
where
and
where
it follows that $\tilde\theta$ and $\tilde t$ are asymptotically independent.
Now we turn to the third result. Since
noting (5.20) we have
where
and $B_n \to B$. Hence
where
Proof of Theorem 2
Noting (5.20) and that $\tilde t$, $\tilde\theta$ are asymptotically independent, we have
Similarly, we get
Hence
Now noting that $\sqrt{n}\, M_{11}^{-1/2} \bar g_{st}(\theta_0) \to N(0, I)$ and $-M_{11}^{-1/2} M_{12} M_{22.1}^{-1} M_{21} M_{11}^{-1/2}$ is sym-
metric and idempotent with trace equal to $p$, it follows that $R$ converges to $\chi^2_p$.
Proof of Corollary 1
From the proof of Theorem 2.2, we have
For fixed $\theta$ such that $\|\theta - \theta_0\| < d_n$, consider the function
almost surely continuous function for $\|\lambda\| < 1$. When $\|\lambda\| = 1$, we have
where $c$ is the smallest eigenvalue of $\sum_h k_h W_h^2 \sigma_h(\theta_0)$. By Lemma 5 there exists a
point $\hat\lambda$ such that $\|\hat\lambda\| < 1$ and $\psi(\hat\lambda) = 0$. Using the implicit-function theorem, we
can easily get the other results of the lemma. □
When $\|\theta - \theta_0\| \le d_n$, from
we have
then
= ~ ( n - f log log n ) + ( D ' ( O ~ ) S - ' ( O ~ ) ~ ( 0 ~ ) + ~ ( n - i log log n)) ( O - B o )
+w 0 - 00 ID2 = D ' ( ~ ~ ) S - ' ( O ~ ) D ( B o ) ( O - B o ) + o ( I I 0 - Bo 1 1 ) . U . S .
That is, CC',(O) can be represented as
where $V(x, \theta)$ is an almost surely continuous function of $\theta$ and $V(x, \theta) = o(\|\theta - \theta_0\|)$
a.s. Similar to the arguments in Aitchison and Silvey (1958), we can show that
the system of equations $Q_{2n}(\theta, t(\theta), \nu) = 0$ and $Q_{3n}(\theta) = 0$ has a solution $\tilde\theta, \tilde\nu$ such
that $(\tilde\theta, \tilde\nu) \in \{(\theta, \nu) : \|\theta - \theta_0\| + \|\nu\| \le d_n\}$.
Now we prove the last part of Lemma 7. Suppose that $\tilde\theta$ is a solution in $U_{d_n}$, and $\theta$
is a point in $U_{d_n}$ that satisfies $r(\theta) = 0$. Then we have
Note that
- LII n.
and Q ~ , ( o : t (B) , G) = O , so that t r ( ; )2 , ' (6 ) = - Cr R ( B ) frorn (3.2). Since the q x p
matrix $R(\theta_0)$ is of rank $q$, we have
where $c$ is the smallest eigenvalue of $D(\theta_0) S^{-1}(\theta_0) D^T(\theta_0)$. Hence
Proof of Theorem 3
Taking derivatives with respect to $\theta_r^T$, $t_r^T$, $\nu_r^T$, we have
where $\delta_n = \|\tilde\theta_r - \theta_0\| + \|\tilde t_r\| + \|\tilde\nu_r\|$, $D_n(\theta_0) = \sum_h \frac{W_h}{n_h} \sum_i \frac{\partial g_{hi}(\theta_0)}{\partial\theta}$. The system of
equations (5.21), (5.22) and (5.23) written in matrix form is
where
we know that $\delta_n = O_p(n^{-1/2})$. Since
where $R = R(\theta_0)$, $D = D(\theta_0)$, $V = D^T S^{-1} D$, we have
where
$H = (R V^{-1} R^T)^{-1}$, $Q = H R V^{-1}$, $P = V^{-1}(I - R^T Q)$. (5.27)
Hence using the central limit theorem for $\bar g_{st}(\theta_0)$ and Slutsky's theorem, we can
We have
Therefore, noting (5.30),
Since
we know, noting (5.28),
Hence
Chapter 4
Pseudo Empirical Likelihood and
General Estimating Equations
Using Stratified Sampling
4.1 Introduction
We applied empirical likelihood (EL) to stratified simple random sampling and estab-
lished asymptotic properties of EL estimators and likelihood ratio tests in Chapters
2 and 3. The purpose of this chapter is to use empirical likelihood and estimating
equations in more complex sampling situations. Sample surveys often use a com-
bination of several of the following sampling methods: stratified sampling, cluster
sampling, unequal probability sampling and multi-stage sampling. In this chapter,
we will consider stratified sampling again, but the samples taken from each stratum can
be complex.
Section 4.2 contains the definitions of the pseudo empirical likelihood, the ratio statistic,
the estimating equations and the maximum pseudo empirical likelihood ratio (MPELR)
estimator. The existence and the asymptotic properties of these
estimators are established in Section 4.3. An illustrative example is given
in Section 4.4. Proofs are deferred to Section 4.5.
4.2 MPELR Estimators
Let $\{P_\nu, \nu = 1, 2, \ldots\}$ be a sequence of finite populations. Throughout the paper $\nu$ is
used as the index of the finite population, but it is frequently omitted for sim-
plicity. Each $P_\nu$ contains $H$ strata and the $h$th stratum contains $N_h$ units. Associated
with the $i$th unit of the $h$th stratum is the characteristic $x_{hi}$, $h = 1, \ldots, H$, $i = 1, \ldots, N_h$, where $x_{hi}$ is $d$-dimensional. Note that $H$, $N_h$, $x_{hi}$ etc. also depend on $\nu$ but the
subscript $\nu$ is omitted. Suppose $P_\nu$, with $d$-dimensional characteristic $x_\nu$, has the un-
known distribution $F_\nu$ and a $p$-dimensional parameter $\theta$ associated with it, and the
$h$th stratum of $P_\nu$ has $d$-dimensional characteristic $x_{\nu h}$ which has the unknown distri-
bution $F_{\nu h}$. We will also omit $\nu$ from $F_\nu$, $F_{\nu h}$, $x_{\nu h}$ in the following discussion when
appropriate. We assume that the weight of the $h$th stratum of $P_\nu$, $W_h$, is known for
every $h$.
A sample, $s_h$, with values $x_{h1}, \ldots, x_{hn_h}$ from stratum $h$, is taken according to
some sampling scheme and samples are independent across strata. From the context
we know that
Suppose $x_{h1}, \ldots, x_{hN_h}$, the units in the $h$th stratum of $P_\nu$, are a random sample from a super-
population, say $F_h^*$. If the entire population $P_\nu$ were available, the corresponding
likelihood function would be
where $p_{hi} = F_h^*(x_{hi}) - F_h^*(x_{hi}-)$. The log-likelihood function then is
$l(F) = \sum_h \sum_i \log p_{hi}$.
The nonparametric maximum likelihood estimator of $F^*$ is
nent of vector $x_{hi}$ and $x$ respectively, $1(x_{hij} < x_j)$ is the indicator of the set $(x_{hij} < x_j)$, and
$N = \sum_h N_h$. But we only have a sample, say $s$, from the finite population, and
if we view (2.3) as a finite population total, then we may have available a design
unbiased estimator of $l(F)$, namely,
all $h$, and $E$ refers to expectation with respect to the sampling design. Obviously
we have $E(\hat l(F)) = l(F)$. We will call (2.4) the "pseudo empirical likelihood". Chen
and Sitter (1996) defined pseudo empirical likelihood and we use a similar definition
here. Some examples of how to construct the weights $d_{hi}(s)$ are given in Section 4.3.
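As a rough illustration of the weights referred to here, the sketch below computes $d_{hi}(s)$ for the two designs treated in Section 4.3: $d_{hi}(s) = N_h/(N n_h)$ under stratified SRS and $d_{hi}(s) = W_h/(N_h n_h a_{hi})$ under unequal probability sampling with replacement. The function names `srs_weights` and `pps_wr_weights` are hypothetical, and `a` is assumed to hold per-draw selection probabilities that sum to one over the stratum.

```python
def srs_weights(N_h, n_h, N):
    """Weights d_hi = N_h / (N * n_h) for the n_h sampled units of one stratum (stratified SRS)."""
    return [N_h / (N * n_h)] * n_h


def pps_wr_weights(N_h, N, a):
    """Weights d_hi = W_h / (N_h * n_h * a_hi) for sampling with replacement.

    a: assumed per-draw selection probabilities of the sampled units.
    """
    n_h = len(a)
    W_h = N_h / N  # stratum weight
    return [W_h / (N_h * n_h * a_i) for a_i in a]


# With equal selection probabilities a_hi = 1/N_h the two designs agree,
# and the weights in a stratum add up to the stratum weight W_h.
d_srs = srs_weights(N_h=40, n_h=4, N=100)
d_pps = pps_wr_weights(N_h=40, N=100, a=[1 / 40] * 4)
```

Under SRS the weights within a stratum sum exactly to $W_h$; under unequal probability sampling this holds only in design expectation.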
Since $\sum_i p_{hi} = 1$, we know (2.4) is maximized by $F_n(x) = \sum_h W_h F_{nh}(x)$, where
$n = \sum_h n_h$, $F_{nh}(x) = \sum_{i \in s_h} \tilde d_{hi}(s) 1(x_{hi} < x)$, $\tilde d_{hi}(s) = d_{hi}(s)/\sum_{i \in s_h} d_{hi}(s)$. The
log-empirical likelihood ratio is then defined as
We also assume that information about $\theta$ and $F$ is available in the form of
$r \ge p$ functionally independent unbiased estimating functions, that is, functions
$g_j(x, \theta)$, $j = 1, \ldots, r$ such that $E_F\{g_j(x, \theta)\} = 0$. In vector form, that is
where $g(x, \theta)$ satisfies
Hence, we define the (negative) log-pseudo empirical likelihood ratio function for $\theta$
where $g_{hi}(\theta) = g(x_{hi}, \theta)$ for all $h$ and $i$. The maximum may be found via the Lagrange
multiplier method for any given $\theta$. Let
where $t$ is an $r$-dimensional vector. Then
with the restriction from = 0, i.e.,
Therefore,
with $t$ being the solution to (2.10). We may minimize $l_E(\theta)$ to obtain an estimator
$\tilde\theta$ of the parameter $\theta$ (called the MPELR estimator). In addition, this yields estimators
$\hat p_{hi}$'s from (2.4), and therefore an estimator for the distribution function $F$ as
Note that if there is no auxiliary information, i.e., $g(x, \theta) = 0$, this approach
yields $\hat p_{hi} = d_{hi}(s)/\sum_{i \in s_h} d_{hi}(s)$. Then the estimator of $\bar X_N = \sum_h W_h \bar X_{N_h}$ will be
$\sum_h W_h \sum_i \hat p_{hi} x_{hi} = \sum_h W_h (\sum_i d_{hi}(s) x_{hi})/\sum_i d_{hi}(s)$, a separate ratio estimator.
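The no-auxiliary-information case described above can be sketched in a few lines. The function name `separate_ratio_estimate` and the input layout (a list of `(W_h, d, x)` triples, one per stratum) are assumptions made for this illustration only.

```python
def separate_ratio_estimate(strata):
    """Separate ratio estimator: sum_h W_h * (sum_i d_hi x_hi) / (sum_i d_hi).

    strata: list of (W_h, d, x), where d holds the design weights d_hi(s)
    and x the sampled values x_hi of stratum h (scalar case).
    """
    total = 0.0
    for W_h, d, x in strata:
        d_sum = sum(d)
        p = [d_i / d_sum for d_i in d]  # p_hi = d_hi / sum_i d_hi
        total += W_h * sum(p_i * x_i for p_i, x_i in zip(p, x))
    return total


example = [
    (0.4, [1.0, 1.0], [2.0, 4.0]),    # equal weights: stratum mean 3.0
    (0.6, [1.0, 3.0], [10.0, 20.0]),  # unequal weights: stratum mean 17.5
]
estimate = separate_ratio_estimate(example)
```

Because the weights are normalised within each stratum, only their relative sizes inside a stratum matter for this estimator.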
The existence of solutions to (2.8)-(2.10) will be discussed in the following
section.
4.3 Asymptotic Results
Some conditions are needed to study the existence of solutions and asymptotic prop-
erties of MPELR estimators. In the following discussion, we always assume that
$n \to \infty$, $n/n_h \to k_h > 0$ and
as $\nu \to \infty$. The assumption (3.1) means that no survey weight is disproportionately
large. We will limit ourselves to stratified SRS or unequal probability sampling with
replacement. It is of interest to see what (3.1) reduces to in some simple special cases.
If the sampling design is stratified SRS, then $d_{hi}(s) = W_h/n_h = N_h/(N n_h)$,
and (3.1) is the same as
$\frac{N_h}{N n_h} = O\left(\frac{1}{n}\right)$ for each $h$,
which will be satisfied since $n/n_h \to k_h > 0$. If the sampling design is a stratified
unequal probability sampling design, we suppose that the sample is taken with
replacement with selection probability $a_{hi}$, where $a_{hi}$ is a known measure of the $i$-th
unit in stratum $h$, and the characteristic of interest is positively correlated with $a_{hi}$.
In this case, $d_{hi}(s) = W_h/(N_h n_h a_{hi})$, and (3.1) reduces to
$\max_i d_{hi}(s) = \frac{W_h}{n_h} \max_i \frac{1}{N_h a_{hi}} = O\left(\frac{1}{n}\right)$ for each $h$.
Hence in this case (3.1) can be reduced to
We will also suppose some other conditions about survey weights. In Section 4.2
we required
for all $h$ and $p_{hi}$, we will have
Under assumption (3.1), we have
hence, noting that $E(\sum_{i \in s_h} d_{hi}(s)) = W_h$,
Lemma 1 shows that the solutions to (2.7)-(2.9) exist for given $\theta$.
Lemma 1 Suppose that $g(x, \theta)$ is continuous and differentiable in the neigh-
bourhood of $\theta_0$ and $\|g(x, \theta)\|^3$ is bounded by some integrable function $G(x)$ in
this neighbourhood. Furthermore, assume that Var(rh) $= \sigma_h(\theta) > \sigma_0 > 0$ for all $h$
and $\nu$ in this neighbourhood. Then in the sphere $\{\theta : \|\theta - \theta_0\| \le d_n\}$, the equation
$Q_{1n}(\theta, t) = 0$ almost surely has roots $t = t(\theta) = O(d_n)$, and $t(\theta)$ is continuous and
differentiable when $\theta$ belongs to this sphere, where d, = n- > 6 > 0.
From Lemma 1 we know that, under some conditions, for any $\theta$ in the neighbourhood
of $\theta_0$ we have $t = t(\theta)$, which is the solution to (2.8)-(2.10), and this solution
also guarantees $p_{hi} > 0$. We also know $t(\theta) = O(d_n)$ a.s.
We need some results from Shao (1994) for the following discussion. We sum-
marize them as a lemma.
Lemma 2 Under conditions of Lemma 3.1, for any given $\theta$ satisfying $\|\theta - \theta_0\| \le
d_n$, we have
where $\bar g_{st,w}(\theta) = \sum_h \sum_{i \in s_h} d_{hi}(s) g_{hi}(\theta)$, $\sigma_\nu(\theta) = \mathrm{Var}(\bar g_{st,w}(\theta))$.
Some special cases of Lemma 2 have been discussed by several authors. P.K. Sen
discussed multi-stage probability proportional to size (PPS) sampling and its asymp-
totic normality (Chapter 12, Handbook of Statistics 6, edited by Krishnaiah and
Rao). The asymptotic results for two-stage and three-stage cluster sampling were
given by Rao and Scott (1979, 1981).
Lemma 3 Under conditions of Lemma 3.1, for any given $\theta$ satisfying $\|\theta - \theta_0\| \le
d_n$, we have
where
$t(\theta)$ is the solution to (2.8)-(2.10).
Lemma 4 In addition to the conditions of Lemma 1, we further assume that
$\partial g(x, \theta)/\partial\theta$ is continuous in a neighborhood of the true value $\theta_0$; that $\|\partial g(x, \theta)/\partial\theta\|$
is bounded by some integrable function $G(x)$ in this neighbourhood; and that the
rank of $\sum_h W_h E_h[\partial g(x_h, \theta_0)/\partial\theta]$ is $p$. We also assume $S_\nu(\theta) \to S(\theta)$. Then, as
$\nu \to \infty$, with probability 1, $l_E(\theta)$ attains its minimum value at some point $\tilde\theta$ in the
interior of the ball $\|\theta - \theta_0\| \le n^{-1/3}$, and $\tilde\theta$ and $\tilde t = t(\tilde\theta)$ satisfy
where
Theorem 1 In addition to the conditions of Lemma 4 above, we further assume
that $\partial^2 g(x, \theta)/\partial\theta\,\partial\theta^T$ is continuous in $\theta$ in a neighbourhood of the true value $\theta_0$, and that $\|\partial^2 g(x, \theta)/\partial\theta\,\partial\theta^T\|$ is
bounded by some integrable function $G(x)$ in the neighbourhood. Then
Theorem 2 Suppose $n\sigma_\nu(\theta_0) = n \mathrm{Var}(\bar g_{st,w}(\theta_0)) \to V$ as $\nu \to \infty$, and the
conditions of Theorem 2.1 hold. Then in the semi-parametric model (2.6), for testing
$H_0 : \theta = \theta_0$ the empirical likelihood ratio test statistic
is asymptotically distributed as $\sum_{i=1}^p \lambda_i Z_i^2$ under $H_0$, where $Z_1, \ldots, Z_p$ are indepen-
dent $N(0, 1)$ random variables, and the $\lambda_i$'s are eigenvalues of
Since the asymptotic distribution of $R$ involves the unknown parameters $\lambda_i$'s, we
must have their estimators before any practical testing. Suppose $\hat V$ is a consistent es-
timator of $V$ as discussed previously; then from $-\hat V^{1/2} M_{n,11}^{-1} M_{n,12} M_{n,22.1}^{-1} M_{n,21} M_{n,11}^{-1} \hat V^{T/2}$
we can have consistent estimators for the $\lambda_i$'s, say $\hat\lambda_i$; then we can use either exact
methods or simulation to calculate the required percentile points and hence do the
testing of $H_0$. Another way is to use a modified $R$, denoted by $R_c$:
where
We treat $R_c$ as approximately distributed as $\chi^2_p$ under $H_0$. Rao and Scott (1994) con-
sidered such modifications in the context of analysis of categorical survey data.
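The simulation route mentioned above can be sketched as follows: given estimated eigenvalues, draw from $\sum_i \hat\lambda_i Z_i^2$ and read off an empirical percentile. This is a minimal Monte Carlo sketch, not the thesis's procedure; the function name `weighted_chi2_quantile`, the number of draws and the seed are illustrative assumptions.

```python
import random


def weighted_chi2_quantile(lams, prob=0.95, draws=20000, seed=0):
    """Monte Carlo quantile of sum_i lams[i] * Z_i**2 with Z_i iid N(0, 1)."""
    rng = random.Random(seed)
    sims = sorted(
        sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in lams)
        for _ in range(draws)
    )
    # Empirical quantile: order statistic at position floor(prob * draws).
    return sims[min(draws - 1, int(prob * draws))]


# Sanity check: a single eigenvalue equal to 1 gives a chi-square with one
# degree of freedom, whose 95th percentile is about 3.84.
q_chi2_1 = weighted_chi2_quantile([1.0])
# Hypothetical estimated eigenvalues for a p = 2 problem.
q_weighted = weighted_chi2_quantile([1.4, 0.6])
```

A test of $H_0$ would then reject when the observed ratio statistic exceeds the simulated percentile; the accuracy depends on the number of draws and on how well the eigenvalues are estimated.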
Corollary 1 Assume $r > p$. In order to test the model (2.6), we can use the
empirical likelihood ratio statistic
Under the conditions of Theorem 2, $R_1$ is asymptotically distributed as
if (2.6) is correct, where $Z_1, \ldots, Z_{r-p}$ are independent $N(0, 1)$ random variables, and
the $\lambda_i$'s are eigenvalues of ~f c'vS.
We can use the same method to modify $R_1$ as for $R$ in the above theorem.
We consider an illustration of the estimation procedures. We primarily consider large
sample aspects, but computational issues are discussed by giving the equations and
the method to solve them.
Suppose that our sampling scheme is stratified sampling, and that the sample in stratum
$h$ is taken according to some method, such as unequal probability sampling with
replacement. Samples are independent across strata. We denote $x_{h1}, \ldots, x_{hn_h}$ as
the sample from stratum $h$, $h = 1, \ldots, H$. We also suppose that $x_{hi} = (z_{hi}, y_{hi})$,
where $z$ is an auxiliary variable and $y$ is the characteristic variable of interest, and that the
population mean of the $z_{hi}$'s, $\bar Z$, is known. We want to estimate $\theta = \bar Y$, the population
mean of the $y_{hi}$'s. If we denote
then the estimating equation (2.6) can be written as
$\partial g_{hi}(\theta)/\partial\theta = (0, -1)^T$, $t = (t_1, t_2)^T$.
From (3.8) we have $t_2 = 0$. Then (3.6) becomes
where
We have $\tilde\theta = \sum_h W_h \sum_i \hat p_{hi} y_{hi}$. As discussed in Chapter 2, solving (4.3) is equivalent
to solving the following equations:
For appropriately selected initial values, e.g.,
we can solve (4.5) and (4.6) by using a NAG Fortran library function to get a solution.
Hence we can obtain $\tilde\theta$ and $\tilde F_n(y)$.
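To make the computation concrete without a NAG routine, the sketch below treats a deliberately simplified case: a single stratum ($H = 1$) with a scalar auxiliary variable whose population mean is known. It is not a transcription of equations (4.5) and (4.6). In that case, maximizing $\sum_i d_i \log p_i$ subject to $\sum_i p_i = 1$ and $\sum_i p_i u_i = 0$, with $u_i = z_i - \bar Z$, gives $p_i = w_i/(1 + \lambda u_i)$, where $w_i = d_i/\sum_j d_j$ and $\lambda$ solves $\sum_i w_i u_i/(1 + \lambda u_i) = 0$. The left-hand side is strictly decreasing in $\lambda$, so plain bisection is used here in place of a library solver. The names `pel_weights` and `pel_mean` are hypothetical.

```python
def pel_weights(d, u, tol=1e-12, max_iter=200):
    """Pseudo-EL probabilities p_i = w_i / (1 + lam * u_i) for one stratum.

    d: design weights; u: centred auxiliary values z_i - Zbar.
    Requires min(u) < 0 < max(u), otherwise the constraint cannot be met.
    """
    d_sum = sum(d)
    w = [d_i / d_sum for d_i in d]
    u_min, u_max = min(u), max(u)
    if not (u_min < 0.0 < u_max):
        raise ValueError("zero must lie strictly inside the range of u")

    def f(lam):
        return sum(w_i * u_i / (1.0 + lam * u_i) for w_i, u_i in zip(w, u))

    # All 1 + lam * u_i stay positive on the open interval (-1/u_max, -1/u_min).
    lo, hi = -1.0 / u_max, -1.0 / u_min
    eps = 1e-10 * (hi - lo)
    lo, hi = lo + eps, hi - eps
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:  # f is decreasing, so the root lies to the right
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    return [w_i / (1.0 + lam * u_i) for w_i, u_i in zip(w, u)], lam


def pel_mean(p, y):
    """Estimate of the mean of y using the pseudo-EL probabilities."""
    return sum(p_i * y_i for p_i, y_i in zip(p, y))


z = [1.0, 2.0, 3.0, 6.0]
y = [2.0, 3.0, 5.0, 9.0]
z_bar_known = 2.5  # assumed known population mean of z
p_hat, lam_hat = pel_weights([1.0] * 4, [z_i - z_bar_known for z_i in z])
theta_hat = pel_mean(p_hat, y)
```

With several strata the multiplier is shared across strata while each stratum keeps its own normalisation, which is why the text arrives at a larger system of equations; the single-stratum case only shows the shape of the solution.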
Since ~ 1 . 1 ~ 2 = (0. l)', = (0,1)? M2'(22.1 = -(O. l)iCI~l(O,l)'. and replacing :bhi
by its estimator
For stratified simple random sampling, this approximate estimator will be
which is identical to the estimator obtained by Chen and Sitter (1996). Note that
this estimator is not equal to the approximate EL estimator in Chapter 2. The latter estimator
is asymptotically equal to the optimal regression estimator.
We will use the above example to show that the asymptotic distribution of the pseudo
empirical likelihood ratio test statistic is not generally a $\chi^2$-distribution here. Sup-
pose the sampling scheme is stratified SRS, and $W_h/n_h = c/n$ for all $h$, that is,
proportional allocation; then
Denote
then
hence
4.5 Proofs
Proof of Lemma 1.
For fixed $\theta$ such that $\|\theta - \theta_0\| < d_n$, consider the function
From $E\|g(x, \theta)\|^3 < \infty$ we can get $\max_h \max_i \|g(x_{hi}, \theta)\| = o(n^{1/3})$ a.s. for both
the stratified SRS and the PPS with replacement cases, and $\psi(\lambda)$ is an almost surely continuous
function for $\|\lambda\| \le 1$. When $\|\lambda\| = 1$, we have
where $c$ is the smallest eigenvalue of $\sigma_0$. By the lemma above there exists a point
$\hat\lambda$ such that $\|\hat\lambda\| < 1$ and $\psi(\hat\lambda) = 0$. Using the implicit-function theorem, we can
easily get the other results of the lemma.
Proof of Lemma 3
By using the decomposition
and noting (2.9) we have
From (5.1) and (5.3) we get
Hence
Therefore, we know
From $Q_{1n}(\theta, t(\theta)) = 0$, we have
Proof of Lemma 4
Denote $\theta = \theta_0 + u n^{-1/3}$ for $\theta \in \{\theta : \|\theta - \theta_0\| = n^{-1/3}\}$, where $\|u\| = 1$. First we
give a lower bound for $l_E(\theta)$ on the surface of the ball. From Lemma 1, Lemma 3
and Lemma 3 of Chapter 3 we know that when $\|\theta - \theta_0\| \le n^{-1/3}$, we have
uniformly in $\theta \in \{\theta : \|\theta - \theta_0\| \le d_n\}$.
By this and Taylor's expansion, we have (uniformly in $u$)
where $c - \epsilon > 0$ and $c$ is the smallest eigenvalue of
Since $l_E(\theta)$ is a continuous function of $\theta$ as $\theta$ belongs to the ball $\|\theta - \theta_0\| \le n^{-1/3}$,
$l_E(\theta)$ has a minimum value in the interior of this ball, and $\tilde\theta$ satisfies (noting (2.8) and
(2.10))
Proof of Theorem 1
Taking derivatives with respect to $\theta^T$, $t^T$, we have
where $\delta_n = \|\tilde\theta - \theta_0\| + \|\tilde t\|$. Hence we have
The system of equations (5.10), (5.11) written in matrix form is
where iCI, is the matrix of (5.13), hlnvl2 = - ai'~$eo), KV2, = M;,,,. Since
hence
M,z 3 lkl =
where $M_{12} = M_{21}^T$, $M_{22} = 0$. Therefore
From this and $\bar g_{st,w}(\theta_0) = \sum_h W_h \sum_i \tilde d_{hi}(s) g_{hi}(\theta_0) = O_p(n^{-1/2})$, we know that $\delta_n = O_p(n^{-1/2})$.
we have
Hence
where
Now we turn to the second conclusion. Since
noting (5.17) we get
where
Hence,
We note here that since $M_{n,11}$, $M_{n,12}$, $M_{n,21}$ are consistent estimators of $M_{11}$, $M_{12}$,
$M_{21}$ respectively, if we have a consistent estimator for $\sigma_\nu(\theta_0) = \mathrm{Var}(\bar g_{st,w}(\theta_0))$, say
$\hat\sigma(\theta_0)$, then
is a consistent estimator of $V_\nu(\tilde\theta)$, where $M_{n,22.1} = -M_{n,21} M_{n,11}^{-1} M_{n,12}$. There are some
methods in the literature on how to estimate $\sigma_\nu(\theta_0)$, such as the jackknife estimator.
Since $B$ can be estimated by $B_n$, and $U$ by $U_n$, from an estimator of $\sigma_\nu(\theta_0)$ we
can also get an estimator of $V_\nu(\tilde F)$.
Proof of Theorem 2
Noting (5.17) we have
Similarly we can get
Hence
Noting that
and
is symmetric and nonnegative definite with rank $p$, hence we have the theorem's results.
Proof of Corollary
From the proof of Theorem 2.2, we have
and the results follow.
Chapter 5
M-estimation in the Presence of
Auxiliary Information Using
Stratified Random Sampling
5.1 Introduction
Huber (1964) introduced a flexible class of estimators, called 'M-estimators', which
are generalizations of the usual maximum likelihood estimators and have become
a very useful tool in statistical inference. M-estimation plays an important role
in robust parameter inference and in non-parametric inference. In this paper, we
consider a new class of M-estimators under the following model. Let the population
distribution be $F(\cdot)$, let the population be stratified into $H$ strata with known
weight $W_h$ of each stratum, and let $x_{h1}, \ldots, x_{hn_h}$ be an iid sample from stratum $h$, $h = 1, \ldots, H$.
We assume that we have some auxiliary information about the distribution function
$F$ in the sense that there exist $r$ ($\ge 1$) functions $g_1(x), \ldots, g_r(x)$ such that
where $g(x) = (g_1(x), \ldots, g_r(x))^T$, and $E_h$ denotes the expectation with respect to the
distribution function of stratum $h$. Here $g(x)$ does not contain any unknown pa-
rameters, hence the problem is different from the questions discussed in Chapter 3.
Our approach is first to apply empirical likelihood to provide an alternative
estimator for the distribution function $F(\cdot)$ by utilizing the auxiliary information (1.1).
Based on the estimated distribution function, which puts unequal weight at each
$x_{hi}$, we propose a new class of M-estimators, and establish some asymptotic results.
One advantage of the new M-estimators over those based on the usual empirical
distribution function is that they have smaller asymptotic variances, which enables
us to improve our statistical inference through M-estimation.
In Section 5.2, we describe the profile empirical likelihood under (1.1) and some
asymptotic results. In Section 5.3, we propose a new class of M-estimators. It is
shown that the proposed M-estimators are consistent and asymptotically normally
distributed. The proofs are deferred to Section 5.4.
5.2 Profile empirical likelihood
Let $x_{h1}, \ldots, x_{hn_h}$ be an iid sample from stratum $h$, and let samples be independent across
strata. The empirical likelihood is, as before, given by
where $p_{hi} = \Pr(x_h = x_{hi})$. Under the condition that $E_F g(x) = 0$, the profile
empirical likelihood is defined to be the maximum of $L$ with respect to $p_{hi}$ subject
to
$L$ can be maximized. The maximum is reached when
where
and $t$ is the solution to
The numerical method of calculating $\hat p_{hi}$ is given in Chapter 3.
Now let
nent of vector $x_{hi}$ and $x$ respectively, and $1(x_{hij} < x_j)$ is the indicator of the set $(x_{hij} < x_j)$;
then $\hat F_n(x)$ can be regarded as an alternative estimator of the distribution function
$F(x)$. Note that instead of the usual empirical distribution function giving weight
$W_h/n_h$ at each observed point $x_{hi}$, $\hat F_n(x)$ gives the weight $W_h \hat p_{hi}$ at each $x_{hi}$. On the
other hand, if there is no auxiliary information given in (1.1), $L$ attains its maximum
at $p_{hi} = 1/n_h$, and hence
the usual weighted empirical distribution function.
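The difference between the two distribution function estimators is only in the point masses, which a short sketch makes explicit for scalar observations. The function name `stratified_edf` and the `(W_h, values, p)` input layout are assumptions for illustration, and the indicator is taken as $x_{hi} \le x$, a convention chosen here rather than read from the text.

```python
def stratified_edf(x, strata):
    """Weighted stratified distribution function at the point x.

    strata: list of (W_h, values, p); p[i] is the mass placed on values[i]
    within stratum h. Using p[i] = 1/n_h gives the usual stratified
    empirical distribution function; EL probabilities give the
    alternative estimator.
    """
    return sum(
        W_h * sum(p_i for v, p_i in zip(values, p) if v <= x)
        for W_h, values, p in strata
    )


equal_mass = [
    (0.5, [1.0, 2.0], [0.5, 0.5]),
    (0.5, [3.0, 4.0], [0.5, 0.5]),
]
# Hypothetical unequal masses, such as EL probabilities would produce.
unequal_mass = [
    (0.5, [1.0, 2.0], [0.3, 0.7]),
    (0.5, [3.0, 4.0], [0.6, 0.4]),
]
f_equal = stratified_edf(2.5, equal_mass)
f_unequal = stratified_edf(1.5, unequal_mass)
```

Both estimators are step functions with jumps at the observed points; only the jump sizes change when auxiliary information is used.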
From Chapter 3, we have the following results.
Lemma 1 Suppose that as $n \to \infty$, $n/n_h \to k_h > 0$ for all $h$ and $E_h[g(x_h) -
E_h g(x_h)][g(x_h) - E_h g(x_h)]^T = \sigma_h > \sigma_0$ for all $h$, where $\sigma_0$ is positive definite. $E_h$
denotes the expectation with respect to the distribution of stratum $h$. Denote $\sigma =
\sum_h k_h W_h^2 \sigma_h$; then
$t = \sigma^{-1} \bar g_{st} + o_p(n^{-1/2})$.
Furthermore, if $E\|g(x)\|^3 < \infty$, then
where $\bar g_{st} = \sum_h W_h n_h^{-1} \sum_i g(x_{hi})$. Hence in both cases,
The variance of $\hat F_n(x)$ shows clearly that it is generally more efficient than the
empirical c.d.f. $F_n(x)$.
An M-functional $\theta(F)$ associated with distribution function $F$ is defined as a root
$\theta_0$ of the equation
where $x \sim F$ and $x, \theta_0, \psi(x, \theta_0) \in R$. For a stratified sample $x_{11}, \ldots, x_{1n_1}; \ldots; x_{H1},
\ldots, x_{Hn_H}$ from $F(\cdot)$, the usual estimator of $\theta_0$ is given by $\theta_n = \theta(F_n)$, where $F_n(\cdot)$ is
the natural stratified empirical distribution function, i.e., $F_n(x) = \sum_h \frac{W_h}{n_h} \sum_i 1(x_{hi} <
x)$. $\theta(F_n)$ is called the M-estimator corresponding to $\psi$. We propose a new class of
M-estimators by taking the auxiliary information (1.1) into account. Specifically,
we propose to estimate $\theta_0$ by $\hat\theta_n$, which is defined to be a root of the equation
We will also call $\hat\theta_n$ the M-estimator corresponding to $\psi$. Of course different choices
of $\psi$ lead to different estimators. Note that if there is no auxiliary information (1.1),
then $\hat F_n(\cdot) = F_n(\cdot)$, and hence $\hat\theta_n = \theta(F_n)$, the usual M-estimator. Though we could
include $E\psi(x, \theta) = 0$ as an estimating equation in the general estimating
equations setup of Chapter 3, it may add some difficulty in getting an appropriate
solution in practice since $\psi(x, \theta)$ includes the unknown parameter $\theta$. Instead, we
first use the empirical likelihood method to get the estimators $\hat p_{hi}$ and then solve
(3.2) to get an M-estimator for $\theta$. We call it the empirical likelihood moment (ELM)
estimator.
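The second step of the two-step procedure, solving the weighted estimating equation once the probabilities are available, can be sketched as below for a scalar $\theta$ and a $\psi$ that is non-increasing in $\theta$. The equation solved is assumed to be $\sum_h W_h \sum_i \hat p_{hi} \psi(x_{hi}, \theta) = 0$, matching the weights $W_h \hat p_{hi}$ that $\hat F_n$ places on the observations. The names `huber_psi` and `elm_estimate`, the Huber tuning constant, and the use of bisection are illustrative assumptions; the probabilities are taken as given inputs.

```python
def huber_psi(x, theta, k=1.345):
    """Huber's psi: the residual x - theta clipped to [-k, k]; non-increasing in theta."""
    return max(-k, min(k, x - theta))


def elm_estimate(strata, psi=huber_psi, tol=1e-10):
    """Root of Q(theta) = sum_h W_h sum_i p_hi psi(x_hi, theta) by bisection.

    strata: list of (W_h, values, p) with p the probabilities placed on the
    observations of stratum h. Assumes psi is non-increasing in theta.
    """
    def Q(theta):
        return sum(
            W_h * sum(p_i * psi(x_i, theta) for x_i, p_i in zip(values, p))
            for W_h, values, p in strata
        )

    # Q is >= 0 at the smallest observation and <= 0 at the largest.
    lo = min(min(values) for _, values, _ in strata)
    hi = max(max(values) for _, values, _ in strata)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Q(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sample = [
    (0.5, [1.0, 2.0], [0.5, 0.5]),
    (0.5, [3.0, 4.0], [0.5, 0.5]),
]
theta_huber = elm_estimate(sample)
# With psi(x, theta) = x - theta the root is the weighted mean.
theta_mean = elm_estimate(sample, psi=lambda x, theta: x - theta)
```

With $\hat p_{hi} = 1/n_h$ this reduces to the usual stratified M-estimator, in line with the remark above about the case without auxiliary information.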
Next we establish the asymptotic properties of the ELM-estimator $\hat\theta_n$. Denote
with $t$ being the solution to (2.3) and
We first establish the consistency of the ELM-estimator $\hat\theta_n$. Given that the equation
$\lambda_F(\theta) = 0$ has a root $\theta_0$ and the equation $Q_n(\theta) = 0$ has a root $\hat\theta_n$, we wish to show
that $\hat\theta_n \to \theta_0$ in probability under appropriate conditions. In this paper, the norm
of a $p \times q$ matrix $A = (a_{ij})_{p \times q}$ is defined by $\|A\| = (\sum_i \sum_j a_{ij}^2)^{1/2}$, $p, q \ge 1$.
Theorem 1 establishes the asymptotic consistency of the ELM-estimator $\hat\theta_n$.
Theorem 1 Let $\theta_0$ be an isolated root of $\lambda_F(\theta) = 0$, i.e., $\theta_0$ is the only root
in a neighbourhood of $\theta_0$. Suppose that $E g(x) g^T(x)$ is positive definite, and that
in a neighbourhood of $\theta_0$, $E[|\psi(x, \theta)|]$ is finite. Furthermore, assume that $\psi(x, \theta)$ is
continuous in $\theta$ and monotone in $\theta$, or $\psi(x, \theta)$ is continuous in $\theta$ and bounded, and
$\lambda_F(\theta)$ changes sign uniquely in a neighbourhood of $\theta_0$. Then the equation $Q_n(\theta) = 0$
has a solution $\{\hat\theta_n\}$ which converges to $\theta_0$ in probability.
Next we establish the asymptotic normality of the ELM-estimator $\hat\theta_n$. We give
the following two theorems.
Theorem 2 Let $\theta_0$ be an isolated root of $\lambda_F(\theta) = 0$, and let $\psi(x, \theta)$ be monotone in
$\theta$. Suppose that $\lambda_F(\theta)$ is differentiable at $\theta = \theta_0$, with $\lambda_F'(\theta_0) \ne 0$. Furthermore,
suppose that $E\psi^2(x, \theta)$ and $E[\|g(x)\|\,|\psi(x, \theta)|]$ are finite for $\theta$ in a neighborhood
of $\theta_0$, and that $E\psi^2(x, \theta)$ and $E[\psi(x, \theta) g(x)]$ are continuous at $\theta = \theta_0$. Then if $\hat\theta_n$ is
any solution sequence of the equation $Q_n(\theta) = 0$, we have
with
The next theorem does not require monotonicity of $\psi(x, \theta)$ in $\theta$. However, we
need some smoothness conditions on $\psi(x, \theta)$ and higher moment require-
ments on $g(x)$.
Theorem 3 Let $\theta_0$ be an isolated root of $\lambda_F(\theta) = 0$. Let $\partial\psi(x, \theta)/\partial\theta$ be continuous
at $\theta = \theta_0$ uniformly in $x$. Suppose that $E[g(x) g(x)^T]$ is positive definite,
is finite and nonzero, and $E[\|g(x)\|^2 |\psi(x, \theta_0)|^2] < \infty$. Furthermore, suppose that in
a neighborhood of $\theta_0$, $|\psi'(x, \theta)| < G(x)$, where $G(\cdot)$ satisfies $E[\|g(x)\| G(x)] < \infty$.
Let $\{\hat\theta_n\}$ be a consistent solution sequence of the equation $Q_n(\theta) = 0$; then
where
5.4 Proofs
Proof of Theorem 1
Let $\epsilon > 0$ be given. From Lemmas 1 and 2 of Chapter 3, we have
Using Lemma 1, (4.1) and the strong law of large numbers, the last term in (4.2)
has absolute value bounded by
Therefore, applying the weak law of large numbers gives
Without loss of generality, assume that $\psi(x, \theta)$ is non-increasing in $\theta$; then $E\psi(x, \theta)$
is a non-increasing function of $\theta$. Since $\theta_0$ is an isolated root of $\lambda_F(\theta) = 0$, we have
$E\psi(x, \theta_0 + \epsilon) < E\psi(x, \theta_0) < E\psi(x, \theta_0 - \epsilon)$ for sufficiently small $\epsilon$. Thus (4.3) implies
that
Since $Q_n(\theta)$ is a continuous function of $\theta$ for each $n$, it follows from (4.4) that there
exists a sequence $\{\hat\theta_n\}$ such that $Q_n(\hat\theta_n) = 0$ and $\hat\theta_n \to \theta_0$ in probability.
For the case that $\psi(x, \theta)$ is continuous in $\theta$ and is bounded, it follows by the
dominated convergence theorem that $\lambda_F(\theta)$ is continuous in $\theta$, and we can complete the
proof as in the first part of the theorem.
Proof of Theorem 2
The consistency of $\hat\theta_n$ for $\theta_0$ is guaranteed by Theorem 1. Now we assume that
$\psi(x, \theta)$ is non-increasing in $\theta$, so that $Q_n(\theta)$ is non-increasing. Thus for every $y$,
where $y_n = \theta_0 + y\sigma/\sqrt{n}$. Thus, to prove the theorem, it suffices to show that for
every $y$,
where $\Phi(\cdot)$ is the standard normal distribution function. Now using Lemma 1, (4.1)
and the assumptions given in the theorem, we can show that
w here
or equivalently,
where V, is t h e variance of rh Whnhl - Ezhi,*) . NOW
Hence
In order to prove (4.6), it therefore suffices to show that
or equi valent ly
where
Yhi,n = d'(xhi7 Y , ) - o L g ( x h i ) .
and $V_n$ is the variance of $\sum_h W_h n_h^{-1} \sum_i (Y_{hi,n} - E Y_{hi,n})$.
Since $Y_{hi,n} - E Y_{hi,n}$, $1 \le i \le n_h$, are independent and identically distributed with
mean 0, by appealing to the central limit theorem, it suffices to verify the condition
for every $\epsilon > 0$, or equivalently, for $\epsilon$
Hence it suffices to prove for any $\epsilon > 0$,
For every $a > 0$, we have for each $x$ and for $n$ sufficiently large that
where
The proof is complete.
Proof of Theorem 3
Since $Q_n(\hat\theta_n) = 0$ and $Q_n(\theta)$ is differentiable in $\theta$, we have
It can be shown that
hence,
Therefore, it follows from the central limit theorem and Slutsky's theorem that
where
Chapter 6
Empirical Likelihood Inference in
the Presence of Measurement
Error
6.1 Introduction
Suppose that we want to measure a characteristic of interest of a target population.
We have several different "instruments"; one of them is "accurate", in that it has smaller
measurement error than the other ones. In practice we treat this instrument
as "perfect", that is, as having no measurement error. All other instruments have larger mea-
surement error and we treat them as "imperfect". Here we use "instrument" to refer
to the method used to measure the characteristic. It may include factors such as the
real instrument used, the personnel involved, the working environment, etc. Because of
cost, or a lack of highly trained personnel for using the perfect instrument,
we use imperfect instruments to measure some samples drawn from the population,
and use the perfect instrument to measure other samples from the population. We com-
bine all the data, both imperfect and perfect, to make inference on the parameters
of the population. Of course, the inference should be more accurate compared to
using only the perfect measurement data.
Survey researchers have long been cognizant of the ill effects of measurement
error on estimators of means and totals of finite populations. In general, compared
with perfect measurements, imperfect but unbiased measurements increase the
variance, but do not affect the bias, of estimators of means and totals. However, the
sample cumulative distribution function (CDF) is no longer an unbiased or consistent
estimator of the population CDF even if the measurement error has mean zero.
Several authors have discussed these problems. Luo, Stokes and Sager gave a good
summary of the history and current developments in this area, though almost all the
papers listed there deal only with an imperfect sample without a calibration sample,
i.e., a perfect sample. The paper by Luo et al. may be the first to handle data collected
with measurement error together with a calibration sample. In their paper, they used
simulation to compare the performance of several estimators of the CDF. The
estimators they considered are the difference, ratio, regression and weighted average
of the empirical CDFs of the imperfect and perfect measurements, etc. Their proposed
estimators are linear combinations of the empirical CDFs from the imperfect and
perfect samples. The weights are chosen so that the resulting estimator has the
smallest possible variance subject to an unbiasedness constraint. These weights are
also estimated from the data. They did not give any theory for the proposed
estimators, such as asymptotic variances. They also pointed out that the proposed
weighted linear estimators of the CDF may not be monotone.
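The claim that a mean-zero measurement error still biases the CDF can be checked with a small numerical sketch. The setup below is purely illustrative and not taken from the thesis: the true characteristic is assumed to be N(0, sd_true^2) and the error an independent N(0, sd_error^2), so the observed value is normal with the same mean but a larger variance. The function names are hypothetical.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """CDF of a normal distribution, computed from the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def true_and_observed_cdf(x, sd_true=1.0, sd_error=1.0):
    """Return (F(x), F_obs(x)) for y ~ N(0, sd_true^2) observed as y + e.

    Assumption (illustrative only): e ~ N(0, sd_error^2), independent of y,
    so y + e ~ N(0, sd_true^2 + sd_error^2).
    """
    sd_obs = math.sqrt(sd_true ** 2 + sd_error ** 2)
    return normal_cdf(x, 0.0, sd_true), normal_cdf(x, 0.0, sd_obs)


if __name__ == "__main__":
    for x in (-1.0, 0.0, 1.0):
        f_true, f_obs = true_and_observed_cdf(x)
        print(f"x={x:+.1f}  true CDF={f_true:.4f}  contaminated CDF={f_obs:.4f}")
```

Under these assumptions the two CDFs agree only at the common mean; away from it the contaminated CDF is flatter, so the bias does not vanish as the sample grows, even though the mean is estimated without bias.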
In this chapter we propose to use the empirical likelihood method to make
inference on parameters of interest by taking all the data into account, treating the
imperfect measurements as auxiliary information. The empirical likelihood estimator
of the CDF is monotone, asymptotically more efficient than the empirical CDF from
the perfect sample only, and asymptotically normally distributed. Section 6.2 shows how
to use the empirical likelihood to solve the problems above. Section 6.3 discusses the
associated asymptotic results, while Section 6.4 discusses likelihood ratio testing.
Several examples are given in Section 6.5. Proofs are relegated to Section 6.6.
6.2 Empirical likelihood method in the presence
of measurement error
Suppose that there are $H$ different instruments used in measuring a characteristic of
interest. The distribution function associated with instrument $h$ ($h = 1, \ldots, H$) is
$F_h(x) = P(x_h < x)$ with unknown parameter $\theta$, where $\theta$ is a $p$-vector, and the $H$-th
measuring instrument is taken as perfect. Note that we have $H$ different distributions
due to possibly different measurement errors in the instruments, although the
same characteristic, $x_H$, is measured by all the instruments. These $H$ different
distributions are assumed to be related by the common parameter $\theta$. We also assume
that information about $\theta$ and $F_h$ is available in the form of $r_h \ge p$ functionally
independent unbiased estimating functions, that is, functions $g_h(x_h, \theta)$ such that

$$E_{F_h}\{g_h(x_h, \theta)\} = 0, \qquad h = 1, \ldots, H, \qquad (2.1)$$
where $g_h(x, \theta)$ is an $r_h$-dimensional vector function. We further suppose that
$x_{h1}, \ldots, x_{hn_h}$ are i.i.d. measurements obtained by instrument $h$, and that
measurements by different instruments are independent. The sampling scheme considered
here is different from the method of Luo et al. (1996, unpublished) and others.
They used two-phase sampling with the imperfect sample as the first-phase sample
and the second-phase sample taken from the first-phase sample. This means that the
second-phase sample is dependent on the first-phase sample, and usually the
second-phase sample is a very small portion of the first-phase sample. We consider
independent samples here for two main reasons. First, when different instruments
are involved in measuring, the makers may have done a lot of testing and will give
the accuracies of the different instruments. Alternatively, a great deal of testing can
be done to learn the differences between instruments before practical measuring. The
same can be said about personnel with different degrees of training. Our method
can take this kind of information into account and make more accurate inference.
Second, as pointed out by Luo et al. (1996, unpublished) and others, when the
second-phase sample is quite small compared to the first-phase sample, the two samples
can be treated as independent samples. Actually their recommended estimator for
the CDF is based on independent samples.
The empirical likelihood based on the independent samples $x_{h1}, \ldots, x_{hn_h}$,
$h = 1, \ldots, H$, is given by

$$L(F_1, \ldots, F_H) = \prod_{h=1}^{H} \prod_{i=1}^{n_h} p_{hi}, \qquad (2.2)$$

where $p_{hi} = \Pr(x_h = x_{hi})$ and $\sum_{i=1}^{n_h} p_{hi} = 1$ for each $h$. Only those distributions $F_h$
which have an atom of probability on each $x_{hi}$ have nonzero likelihood. (2.2) is
maximized by the empirical distribution functions $F_{h,n_h}(x) = n_h^{-1} \sum_i I(x_{hi} < x)$.
The empirical likelihood ratio is then defined as

$$R(F_1, \ldots, F_H) = \frac{L(F_1, \ldots, F_H)}{L(F_{1,n_1}, \ldots, F_{H,n_H})},$$

which reduces to

$$R(F_1, \ldots, F_H) = \prod_{h=1}^{H} \prod_{i=1}^{n_h} n_h p_{hi}.$$
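We remark that the formulas here and elsewhere in this chapter do not require that
the $x_{hi}$'s be distinct. Since we are interested in estimating the parameter $\theta$, and we
know the estimating equation (2.1), we define the empirical likelihood ratio function

$$R_E(\theta) = \sup\Big\{ \prod_{h=1}^{H} \prod_{i=1}^{n_h} n_h p_{hi} : p_{hi} \ge 0,\ \sum_i p_{hi} = 1,\ \sum_i p_{hi} g_{hi}(\theta) = 0 \Big\}, \qquad (2.4)$$

where $g_{hi}(\theta) = g_h(x_{hi}, \theta)$ for all $h$ and $i$. Here the function $g_{hi}(\theta)$ also depends on $h$,
which is different from the one discussed in Chapters 2 to 4.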
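As noted before, for any given $\theta$ and $h$, $\prod_{i=1}^{n_h} p_{hi}$ can be maximized, provided 0
is inside the convex hull of the points $g_{h1}(\theta), \ldots, g_{hn_h}(\theta)$. The maximum of $\prod_{i=1}^{n_h} p_{hi}$
subject to the constraints $p_{hi} \ge 0$, $\sum_i p_{hi} = 1$, and $\sum_i p_{hi} g_{hi}(\theta) = 0$ is attained when

$$p_{hi} = \frac{1}{n_h}\,\frac{1}{1 + t_h' g_{hi}(\theta)}, \qquad (2.5)$$

where $t_h = t_h(\theta)$ is an $r_h \times 1$ vector given as the solution to

$$\frac{1}{n_h} \sum_{i=1}^{n_h} \frac{g_{hi}(\theta)}{1 + t_h' g_{hi}(\theta)} = 0. \qquad (2.6)$$

Hence the left side of (2.4) is

$$\prod_{h=1}^{H} \prod_{i=1}^{n_h} \{1 + t_h' g_{hi}(\theta)\}^{-1},$$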
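and the empirical (negative) log-likelihood ratio of $\theta$ is

$$l_E(\theta) = \sum_{h=1}^{H} \sum_{i=1}^{n_h} \log\{1 + t_h' g_{hi}(\theta)\}, \qquad (2.8)$$

with $t_h$ being the solution to (2.6). We can minimize $l_E(\theta)$ to obtain an estimator $\tilde\theta$ of
the parameter $\theta$, called the maximum empirical likelihood ratio estimator (MELRE).
In addition, this yields estimators $\tilde p_{hi}$ from (2.5), and an estimator for the true
distribution function $F_H$ as

$$\tilde F_{H,n_H}(x) = \sum_i \tilde p_{Hi} I(x_{Hi} < x).$$

$\tilde\theta$ is obtained by solving

$$\sum_{h=1}^{H} \sum_{i=1}^{n_h} \frac{\{\partial g_{hi}(\theta)/\partial\theta\}' t_h}{1 + t_h' g_{hi}(\theta)} = 0$$

and (2.6) together.
In the following section we will prove the existence of $\tilde\theta$ and study the asymptotic
properties of $\tilde\theta$ and $\tilde F_{H,n_H}(x)$.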
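To make the inner step of this section concrete, the following is a minimal sketch of computing the multiplier and the empirical likelihood weights for a single instrument when the estimating function is scalar. It assumes the standard form of the weights, p_i = 1 / (n (1 + t g_i)), with t solving sum_i g_i / (1 + t g_i) = 0. Bisection is used here in place of Newton's method because the score is strictly decreasing in t on the admissible interval; the function names and the choice g_i = x_i - theta are illustrative assumptions, not the thesis's code.

```python
def el_multiplier(g, tol=1e-12, max_iter=200):
    """Solve sum_i g_i / (1 + t * g_i) = 0 for a scalar multiplier t.

    Requires 0 strictly inside the range of g (the convex hull condition).
    On the interval (-1/max(g), -1/min(g)) every 1 + t*g_i is positive and
    the score is strictly decreasing, so bisection finds the unique root.
    """
    g_min, g_max = min(g), max(g)
    if not g_min < 0.0 < g_max:
        raise ValueError("0 must lie strictly inside the range of the g values")
    lo, hi = -1.0 / g_max, -1.0 / g_min
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        score = sum(gi / (1.0 + mid * gi) for gi in g)
        if score > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies at or to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def el_weights(x, theta):
    """Empirical likelihood weights for the scalar mean constraint g_i = x_i - theta."""
    g = [xi - theta for xi in x]
    t = el_multiplier(g)
    n = len(g)
    return [1.0 / (n * (1.0 + t * gi)) for gi in g]


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0, 10.0]
    weights = el_weights(sample, theta=3.0)
    print([round(w, 4) for w in weights])
```

The weights returned sum to one and give the sample a weighted mean equal to the supplied theta. For vector-valued estimating functions the same idea applies, but a multivariate Newton iteration would be needed in place of bisection.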
6.3 Asymptotic properties of MELRE
First, we give the conditions for the existence of $\tilde\theta$ which will minimize $l_E(\theta)$ defined
by (2.8).
In the following, we will use $\|\cdot\|$ to denote the Euclidean norm and $n = \sum_h n_h$.
Lemma 1. Suppose that, as $n \to \infty$, $n/n_h \to k_h > 0$ for all $h$. Suppose also
that, in a neighbourhood of the true value $\theta_0$, $\mathrm{Var}[g_h(x_h, \theta_0)] = \sigma_h(\theta_0) > 0$ for all $h$,
$\|\partial g_h(x_h, \theta)/\partial\theta\|$ and $\|g_h(x_h, \theta)\|^3$ are bounded by some integrable function $G(x)$
in this neighbourhood, and the rank of $E[\partial g_h(x_h, \theta_0)/\partial\theta]$ is $p$. Then, as $n \to \infty$, with
probability 1 $l_E(\theta)$ attains its minimum value at some point $\tilde\theta$ in the interior of the
ball $\|\theta - \theta_0\| \le n^{-1/3}$, and $\tilde\theta$ and $\tilde t_h = t_h(\tilde\theta)$ satisfy
where
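Theorem 1. In addition to the conditions of Lemma 1, we further assume
that $\partial^2 g_h(x, \theta)/\partial\theta\,\partial\theta'$ is continuous in $\theta$ in a neighbourhood of the true value $\theta_0$. Then, if
$\|\partial^2 g_h(x, \theta)/\partial\theta\,\partial\theta'\|$ is bounded by some integrable function $G(x)$ in the neighbourhood for
all $h$,

$$\sqrt{n}(\tilde\theta - \theta_0) \to N(0, V), \qquad \sqrt{n}(\tilde t - 0) \to N(0, U), \qquad \sqrt{n}\{\tilde F_{H,n_H}(x) - F_H(x)\} \to N(0, W),$$

where $\tilde F_{H,n_H}(x) = \sum_i \tilde p_{Hi} I(x_{Hi} < x)$, $\tilde p_{Hi} = \frac{1}{n_H}\,\frac{1}{1 + \tilde t_H' g_{Hi}(\tilde\theta)}$, and $V$, $U$, $W$ are defined
in the proof.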
Since usually $n_h \gg n_H$ for $h = 1, \ldots, H - 1$, we can see that the main part of the
variance of $\tilde F_{H,n_H}(x)$ will come from the first three terms of (6.4), which is smaller
than the variance of the estimator $F_{H,n_H}(x)$, the cumulative distribution function
from the precise data $x_{H1}, \ldots, x_{Hn_H}$ only. Some further discussion will be given in
Section 6.5.
The estimators for the variances of $\tilde\theta$ and $\tilde F_{H,n_H}(x)$ can be obtained by replacing
each component of $V$ and $W$ by their obvious estimators. These variance estimators
are asymptotically correct.
6.4 Likelihood Ratio Tests
Several testing issues arise naturally from the application point of view. We first
consider how to test the hypothesis $H_0: \theta = \theta_0$.
Theorem 2. In the semi-parametric model (2.1), under the conditions of
Theorem 1, for testing $H_0: \theta = \theta_0$, the empirical likelihood ratio statistic
is asymptotically distributed as $\chi^2_p$ if $n_1 = \cdots = n_H$.
Now we turn to testing the model (2.1), i.e.,
Corollary 2.1 In order to test the model (4.1), we can use the empirical
likelihood ratio statistic
Under the conditions of Theorem 2 above, $R_1$ is asymptotically distributed as $\chi^2_{r-p}$
if (4.1) is correct and $n_1 = \cdots = n_H = n$, where $r = \sum_{h=1}^{H} r_h$.
6.5 Examples
We present several illustrations of the estimation procedures. Procedures for
solving equations (3.1) through (3.3) will be given. Large-sample aspects of these
estimators will be discussed.
Example 1. Common Mean Model
Suppose we have two different instruments, i.e., $H = 2$, and we only know that
they are unbiased, i.e.,
$E x_1 = \theta$, $E x_2 = \theta$,
and $x_2$ refers to the perfect measurement. Suppose that $x_{11}, \ldots, x_{1n_1}$ and $x_{21}, \ldots,
x_{2n_2}$ are independent samples from $x_1$ and $x_2$ respectively. Let
then
and equation (3.3) becomes
Substituting (5.1) into (3.2), we get
Solving (5.2) and (5.3) by Newton's method or using a NAG Fortran library function
with initial value $(\theta, t) = (\bar x, 0)$ often works, where $\bar x$ is formed from the two sample means $\bar x_1$ and $\bar x_2$.
Let $\sigma_1^2 = \mathrm{Var}(x_1)$, $\sigma_2^2 = \mathrm{Var}(x_2)$. Since
replacing $k_h$ by $n/n_h$ we get the asymptotic variance of $\tilde\theta$ as
which is the same as the variance of the optimal linear combination "estimator"
Note that $\hat\theta$ is actually not an estimator since we do not know the variances $\sigma_1^2$ and $\sigma_2^2$,
but we use it for comparison here and in the following.
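For readers who want the comparison benchmark spelled out, a minimal sketch of the usual inverse-variance (precision-weighted) combination of two independent sample means follows. It assumes the variances are known, which is exactly why the text notes that it is not a genuine estimator. The function name and arguments are hypothetical, and the thesis's own displayed formula is not reproduced here.

```python
def optimal_linear_combination(mean1, var1, n1, mean2, var2, n2):
    """Precision-weighted combination of two independent sample means.

    Each sample mean has variance var_h / n_h, so its precision is n_h / var_h.
    The combination with weights proportional to the precisions is unbiased
    for the common mean and has variance 1 / (n1/var1 + n2/var2).
    """
    w1 = n1 / var1
    w2 = n2 / var2
    estimate = (w1 * mean1 + w2 * mean2) / (w1 + w2)
    variance = 1.0 / (w1 + w2)
    return estimate, variance


if __name__ == "__main__":
    # Hypothetical numbers: a large imperfect sample and a small perfect one.
    est, var = optimal_linear_combination(10.0, 4.0, 100, 12.0, 1.0, 25)
    print(est, var)
```

In this illustrative case both samples carry the same precision, so the combined variance is half of what the perfect sample alone would give.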
As for the asymptotic variance of $\tilde F_{H,n_H}(x)$, i.e., $W$ given by (6.4), after a little
calculation we get
+ L ~ ~ ~ ; ' C ~ O ; ~ - 2o;l
(a;' + 0;' + OF' 1
From (5.4) we see that if $\sigma_2 \ll \sigma_1$, then combining the two samples will not improve
the efficiency of the CDF estimator significantly compared to using only the perfect data.
On the other hand, if $\sigma_2/\sigma_1$ is reasonably large, such as greater than 0.5, then for,
say, $n_2/n_1 = k_1/k_2 < 5/8$, we will have a significantly more efficient estimator
for the CDF.
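The whole procedure for the common mean model can be sketched end to end. The code below is an illustration under stated assumptions, not the thesis's algorithm: it uses the standard weights p_hi = 1 / (n_h (1 + t_h (x_hi - theta))), and it uses the fact that, for this model, setting the derivative of the sum of log(1 + t_h (x_hi - theta)) with respect to theta to zero reduces to n_1 t_1 + n_2 t_2 = 0. Nested bisection replaces Newton's method for robustness, and all function names are hypothetical.

```python
def el_multiplier(g, tol=1e-13, max_iter=200):
    """Solve sum_i g_i / (1 + t * g_i) = 0 for scalar t by bisection."""
    g_min, g_max = min(g), max(g)
    if not g_min < 0.0 < g_max:
        raise ValueError("theta must lie strictly inside the range of each sample")
    lo, hi = -1.0 / g_max, -1.0 / g_min
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if sum(gi / (1.0 + mid * gi) for gi in g) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def common_mean_melre(x1, x2, tol=1e-10, max_iter=200):
    """Estimate the common mean from an imperfect sample x1 and a perfect sample x2.

    Each t_h(theta) is strictly decreasing in theta, so the stationarity
    condition n1*t1(theta) + n2*t2(theta) = 0 has a unique root, which lies
    between the two sample means. Returns (theta, weights on x2).
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    lo, hi = min(m1, m2), max(m1, m2)

    def score(theta):
        t1 = el_multiplier([x - theta for x in x1])
        t2 = el_multiplier([x - theta for x in x2])
        return n1 * t1 + n2 * t2

    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    theta = 0.5 * (lo + hi)
    t2 = el_multiplier([x - theta for x in x2])
    p2 = [1.0 / (n2 * (1.0 + t2 * (x - theta))) for x in x2]
    return theta, p2


def el_cdf(x2, p2, x):
    """Weighted CDF estimate: sum of p_2i over perfect observations x_2i < x."""
    return sum(p for xi, p in zip(x2, p2) if xi < x)


if __name__ == "__main__":
    imperfect = [0.8, 1.9, 3.2, 4.1, 5.3, 2.7]   # hypothetical data
    perfect = [2.5, 3.5, 4.2]                    # hypothetical data
    theta, weights = common_mean_melre(imperfect, perfect)
    print(round(theta, 4), [round(w, 4) for w in weights])
    print(el_cdf(perfect, weights, 3.6))
```

Because the weights on the perfect sample are positive and sum to one, the resulting CDF estimate is a proper step function, which is the monotonicity property emphasised in this chapter.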
Example 2. Additive Model
We consider the following model
i.e., the variance of the imperfect measurement is larger than the variance of the perfect
measurement by $\sigma_0$, where $\sigma_0$ ($> 0$) is known. We note here that if $E x_1 = \theta_1 + v_0$, where $v_0$ is known, we can change the data $x_1$ to $x_1 - v_0$, and all the following
discussion still applies.
Suppose two independent samples from $x_1$ and $x_2$ are $x_{11}, \ldots, x_{1n_1}$ and $x_{21}, \ldots,
x_{2n_2}$ respectively. Denote $z_1 = (x_1, x_1^2)'$, $z_2 = (x_2, x_2^2)'$, $\theta = (\theta_1, \theta_2)'$, $g_1(z_1, \theta) =
(x_1 - \theta_1,\ x_1^2 - \theta_1^2 - \theta_2 - \sigma_0)'$, $g_2(z_2, \theta) = (x_2 - \theta_1,\ x_2^2 - \theta_1^2 - \theta_2)'$; then $E g_1 = 0$, $E g_2 = 0$,
and equation (3.3) becomes
n2 n2 tZl = -- h i , t22 = -- tl2.
nt nl
Hence, (3.2) becomes
We can solve the above four equations by letting the initial values
Using Newton's method will usually give us a solution $\tilde\theta_1$, $\tilde\theta_2$, and hence $\tilde p_{hi}$,
$\tilde F_{2,n_2}(x)$. In order to get an idea of the asymptotic properties of these estimators, we
suppose $x_1$ and $x_2$ have normal distributions. Since
hence, by (6.4) we have V = (kl& + k2B,)62Q./(B2 + O.)? If we replace $k_h$ by $n/n_h$,
then Var(@) = (n~'O2 + n;19.)626i/(82 + which is the same as the variance of
the optimal linear combination estimator
Example 3. Product Model
Suppose
i.e., the ratio of the variance of the imperfect measurement to that of the perfect
measurement is a known constant $c$ ($c > 0$). The two independent samples from $x_1$ and $x_2$
are $x_{11}, \ldots, x_{1n_1}$ and $x_{21}, \ldots, x_{2n_2}$ respectively. Denote
$g_1(z_1, \theta) = (x_1 - \theta_1,\ x_1^2 - \theta_1^2 - c\theta_2)'$, $g_2(z_2, \theta) = (x_2 - \theta_1,\ x_2^2 - \theta_1^2 - \theta_2)'$,
then
$E g_1 = 0$, $E g_2 = 0$.
Hence equations (3.1) through (3.3) become
Solving the above six equations by letting the initial values
$t_{1,0} = 0$, $t_{2,0} = 0$
and using Newton's method gives us a solution $\tilde\theta_1$, $\tilde\theta_2$ and hence $\tilde p_{hi}$, $\tilde F_{2,n_2}(x)$.
Usually the solution $\tilde p_{hi}$ is non-negative.
In the following discussion, we suppose $x_1$ and $x_2$ are normally distributed. Then
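Therefore

$$\mathrm{Var}(g_1(z_1, \theta)) = \sigma_1 = \begin{pmatrix} c\theta_2 & 2c\theta_1\theta_2 \\ 2c\theta_1\theta_2 & 4c\theta_1^2\theta_2 + 2c^2\theta_2^2 \end{pmatrix}$$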
hi
Hence, by (6.4) we have Var (0 ) z: (&)2 (-& + $) Q2. which is the same as the
variance of the optimal linear combination estimator
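The normal-theory covariance matrix of the estimating function in the product model can be checked numerically. The sketch below assumes $x_1 \sim N(\theta_1, c\theta_2)$ and $g_1 = (x_1 - \theta_1,\ x_1^2 - \theta_1^2 - c\theta_2)'$, for which standard normal moments give $\mathrm{Var}(x_1) = c\theta_2$, $\mathrm{Cov}(x_1, x_1^2) = 2c\theta_1\theta_2$ and $\mathrm{Var}(x_1^2) = 4c\theta_1^2\theta_2 + 2c^2\theta_2^2$. The function names and the trapezoidal check are illustrative assumptions only.

```python
import math


def var_g_normal(theta1, theta2, c=1.0):
    """Closed-form covariance matrix of g = (x - theta1, x^2 - theta1^2 - c*theta2)'
    when x ~ N(theta1, c*theta2)."""
    v = c * theta2
    return [[v, 2.0 * theta1 * v],
            [2.0 * theta1 * v, 4.0 * theta1 ** 2 * v + 2.0 * v ** 2]]


def var_g_numeric(theta1, theta2, c=1.0, steps=20001, width=12.0):
    """Same matrix by trapezoidal integration of g g' against the normal density.

    Since E g = 0, the covariance matrix equals E[g g'].
    """
    v = c * theta2
    sd = math.sqrt(v)
    a, b = theta1 - width * sd, theta1 + width * sd
    h = (b - a) / (steps - 1)
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(steps):
        x = a + k * h
        w = h * math.exp(-0.5 * ((x - theta1) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        if k == 0 or k == steps - 1:
            w *= 0.5  # trapezoid end-point weights
        g = (x - theta1, x * x - theta1 ** 2 - v)
        for r in range(2):
            for s in range(2):
                acc[r][s] += w * g[r] * g[s]
    return acc


if __name__ == "__main__":
    print(var_g_normal(2.0, 1.5, c=2.0))
    print(var_g_numeric(2.0, 1.5, c=2.0))
```

Setting c = 1 gives the corresponding matrix for the perfect measurement $x_2$ under the same normality assumption.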
6.6 Proofs
Proof of Lemma 1
Denote $\theta = \theta_0 + u n^{-1/3}$ for $\theta \in \{\theta : \|\theta - \theta_0\| = n^{-1/3}\}$, where $\|u\| = 1$. First
we give a lower bound for $l_E(\theta)$ on the surface of the ball. From Qin and Lawless
(1994), when $\|\theta - \theta_0\| \le n^{-1/3}$, we have
uniformly about $\theta \in \{\theta : \|\theta - \theta_0\| \le n^{-1/3}\}$, where
By this and Taylor's expansion, we have (uniformly for $u$)
where $c - \varepsilon > 0$ and $c$ is the smallest eigenvalue of
Similarly
Since $l_E(\theta)$ is a continuous function about $\theta$ when $\theta$ belongs to the ball $\|\theta - \theta_0\| \le
n^{-1/3}$, $l_E(\theta)$ has a minimum value in the interior of this ball, and $\tilde\theta$ satisfies (noting
Proof of Theorem 1
Taking derivatives about $\theta$, $t_h$, we have
equations of (6.1) (6.2) can be written in matrix form as
where $M_{22.1} = -M_{21} M_{11}^{-1} M_{12}$, we have
$\sqrt{n}(\tilde\theta - \theta_0) \to N(0, V)$,
where
Further
where
Now we turn to the third conclusion. Since
we know
where
By the definition of U , we have
= M;'
and
cy
t =
Hence
where
Noting that $U M_{11} U = U$, we have
Proof of Theorem 2
Noting (6.3), we have
where $L$, $M_{11}$ are defined in Section 4, and $N = \mathrm{diag}\{n_1 I_{r_1}, \ldots, n_H I_{r_H}\}$. Similarly, we have
We can see that only when $n_1 = \cdots = n_H = n$ do we have $LN = NL'$ and $P$ symmetric;
otherwise it is asymmetric, and the distribution of $R$ is of an unknown form. Since
we suppose that $n_1 = \cdots = n_H$, therefore
By the definition of $U$, we have
Hence $P$ is idempotent with trace equal to $p$, and the theorem follows.
Proof of Corollary 2
From the proof of Theorem 2, we have
Chapter 7
Conclusions and Further Research
7.1 Conclusions
An empirical likelihood approach to the use of auxiliary information in stratified
surveys is introduced. We first considered how to make inference on the population
mean $\bar Y$ and the distribution function $F_Y(y)$ of a characteristic $y$ of a population
when the population mean of an auxiliary variable $x$ is known. Then we extended the
form of auxiliary information to a more general format, namely, general estimating
equations. In the above two cases, only stratified simple random sampling (with
or without replacement) is considered. We proceeded to consider more complex
sampling designs, i.e., designs in which other sampling methods, such as probability
proportional to size sampling, are used in each stratum. Chapter 5 addresses a
computational problem which may arise in Chapter 3: when we solve the estimating
equations involving unknown parameters, the solution $\hat p_{hi}$ may sometimes be negative.
Chapter 6 is an immediate application of empirical likelihood, namely, using the
empirical likelihood method in the presence of measurement error.
The empirical likelihood estimators of parameters such as $\bar Y$, the mean of the
characteristic of interest $y$ of the population, and the distribution function of $y$, are
shown in Chapter 2 to be asymptotically equivalent to the optimal estimators given
by Rao and Liu (1992). However, the EL estimator of $F_Y(y)$ is guaranteed to be
monotone and non-negative, unlike the optimal estimator. The empirical likelihood ratio
test is also considered, though our simulations show that its finite sample properties
are not as good as expected.
The above results are generalized to deal with the case of auxiliary information
summarised in the form of general estimating equations, hence enlarging the
application areas of the empirical likelihood method. Methods to deal with cases where
parameters are subject to constraints are also included. The empirical likelihood ratio
test is also discussed. The results show that the empirical likelihood ratio test, the
Lagrange multiplier test and the Wald test lead to the same asymptotic distribution.
In practice, a complex survey design is often employed. How to incorporate
such designs into empirical likelihood inference is presented in Chapter 4. Here
the empirical likelihood is replaced by a pseudo-likelihood, and all the inferences
are based on the pseudo-likelihood. The EL ratio test statistic is no longer asymptotically
$\chi^2$-distributed; it is distributed as a weighted sum of independent $\chi^2_1$ variables.
The method of combining EL, general estimating equations and M-estimation is
presented in Chapter 5. The key point here is that the general estimating equations
and EL are used to estimate the distribution function first, where the auxiliary
information summarised through estimating equations does not include any unknown
parameters, and this estimated distribution function is then used in M-estimation. In some
complex cases this will reduce the difficulty of calculation, but the auxiliary
information should be summarised without unknown parameters.
In the presence of measurement error, EL can be used to combine information
from different instruments to make more accurate inference. In other words, we
can borrow strength from other populations to make inference on the population of
interest. Some form of relation among these populations must be known before we
can borrow strength from other populations.
7.2 Further Research
While it is hoped that the results of this research will be of use in making more
accurate inference when auxiliary information is available, many areas remain in which
further research might be profitably undertaken. Three areas are now discussed.
The simulations presented in Chapter 2 show that the empirical likelihood (ratio)
estimators for parameters of interest are very good. However, the empirical likelihood
ratio tests are not as good as expected. Higher order corrections to the tests may
improve the performance of EL ratio testing.
The EL estimators for the distribution function from Chapters 2 to 6 are
non-negative and monotone, and hence can be used to estimate quantiles. Large sample
properties of these estimators need to be studied.
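As a small illustration of why monotonicity matters here, a quantile can be read off any step-function CDF with non-negative weights that sum to one by accumulating the weights in sorted order. The sketch below is a generic weighted-quantile routine under that assumption; the function name and the convention of returning the smallest support point whose cumulative weight reaches q are choices made for this example, not definitions from the thesis.

```python
def weighted_quantile(points, weights, q):
    """Smallest support point whose cumulative weight is at least q.

    Assumes the weights are non-negative and sum to one, as empirical
    likelihood weights do, so the cumulative sums form a monotone CDF.
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must be in (0, 1]")
    pairs = sorted(zip(points, weights))
    cumulative = 0.0
    for x, w in pairs:
        cumulative += w
        if cumulative >= q - 1e-12:  # small slack for rounding in the sum
            return x
    return pairs[-1][0]


if __name__ == "__main__":
    # Hypothetical support points and weights.
    print(weighted_quantile([3.0, 1.0, 2.0], [0.2, 0.5, 0.3], 0.6))
```

With a linear-combination CDF estimator that is not monotone, the cumulative sums could decrease and this inversion would no longer be well defined.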
Throughout the thesis, the weights of all strata are assumed known. This will
limit the application of the EL method to more complex sampling such as multi-stage
sampling. Typically, in multi-stage sampling, the weights of second stage sampling
are not known. How to combine the auxiliary information in estimating the weights
and the parameter of interest will be more challenging.
References
Aitchison, J. & Silvey, S.D. (1958). Maximum-likelihood estimation of parameters
subject to restraints. Ann. Math. Statist., 29, 813-828.
Bickel, P.J. & Freedman, D.A. (1984). Asymptotic normality and the bootstrap in
stratified sampling. Ann. Statist., 12, 470-482.
Chaudhary, M.A. & Sen, P.K. (1995). Asymptotic Distribution of Estimators from
Unequal Probability Sampling, Proceedings of the American Statistical
Association, Section on Survey Research Methods, 345-349.
Chen, J. & Qin, J. (1993). Empirical likelihood estimation for finite population and
the effective usage of auxiliary information. Biometrika, 80, 107-116.
Chen, J. & Sitter, R.R. A Pseudo Empirical Likelihood Approach to the Effective
Use of Auxiliary Information in Complex Surveys. Research Report, No. 96-01.
Cochran, W.G. (1977). Sampling Techniques. 3rd ed. New York: Wiley.
DiCiccio, T.J., Hall, P. & Romano, J.P. (1989). Comparison of parametric and
empirical likelihood functions. Biometrika, 76, 465-476.
DiCiccio, T.J. & Romano, J. (1990). Nonparametric confidence limits by resampling
methods and least favorable families. Internat. Statist. Rev., 58, 59-76.
Fuller, W.A. (1995). Estimation in the presence of measurement error. Int. Statist.
Rev., 63, 121-141.
Ghosh, M. & Rao, J.N.K. (1994). Small Area Estimation: An Appraisal. Statistical
Science, 9, 55-93.
Hall, P. (1990). Pseudo-likelihood theory for empirical likelihood. Ann. Statist.,
18, 121-140.
Hartley, H.O. & Rao, J.N.K. (1968). A new estimation theory for sample surveys.
Biometrika, 55, 547-557.
Hochberg, Y. (1977). On the Use of Double Sampling Schemes in Analyzing
Categorical Data with Misclassification Errors, JASA, 72, 914-921.
Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist.,
35, 73-101.
Jennrich, R.I. (1969). Asymptotic properties of non-linear least squares estimates.
Ann. Math. Statist., 40, 633-643.
Luo, M., Stokes, L. & Sager, T. Estimation of the CDF of a Finite Population in
the Presence of a Calibration Sample.
Ohlsson, E. (1986). Asymptotic Normality of the Rao-Hartley-Cochran Estimator:
An Application of the Martingale CLT, Scand. J. Statist., 13, 17-23.
Overton, W.S. (1989). Effects of measurement and other extraneous errors on
estimated distribution functions in the national surface water surveys. Technical
report 129, Dept. of Statistics, Oregon State University.
Owen, A.B. (1988). Empirical likelihood ratio confidence intervals for a single
functional, Biometrika, 75, 237-249.
Owen, A.B. (1990). Empirical likelihood confidence regions, Ann. Statist., 18,
90-120.
Owen, A.B. (1991). Empirical likelihood for linear models, Ann. Statist., 19,
1725-1747.
Qin, J. & Lawless, J.F. (1994). Empirical likelihood and general estimating
equations. Ann. Statist., 22, 300-325.
Qin, J. & Lawless, J.F. (1995). Estimating equations, empirical likelihood and
constraints on parameters, The Canadian Journal of Statistics, 23, 145-159.
Rao, J.N.K. (1994). Estimating Totals and Distribution Functions Using Auxiliary
Information at the Estimation Stage, Journal of Official Statistics, 10, 153-165.
Rao, J.N.K., Kovar, J.G. & Mantel, H.J. (1990). On estimating Distribution
Functions and Quantiles from Survey Data using Auxiliary Information. Biometrika,
77, 365-375.
Rao, J.N.K. & Liu, J. (1992). On estimating distribution function from sample
survey data using supplementary information at the estimation stage.
Nonparametric Statistics and Related Topics (A.K.Md.E. Saleh, Ed.). New York: Elsevier,
399-407.
Rao, J.N.K. & Scott, A.J. (1979). Chi-Squared Tests for Analysis of Categorical Data
From Complex Surveys, Proceedings of the American Statistical Association,
Section on Survey Research Methods, 58-66.
Rao, J.N.K. & Scott, A.J. (1981). The analysis of Categorical Data From Complex
Sample Surveys: Chi-Squared Tests for Goodness of Fit and Independence in
Two-Way Tables, JASA, 76, 221-230.
Rosen, B. (1972). Asymptotic Theory for Successive Sampling with Varying
Probabilities, I and II, Annals of Mathematical Statistics, 43, 373-397; 748-776.
Sarndal, C.E., Swensson, B. & Wretman, J. (1991). Model assisted survey sampling.
Springer.
Sen, P.K. (1988). Asymptotics in Finite Population Sampling, Handbook of
Statistics 6, P.R. Krishnaiah and C.R. Rao Eds., Elsevier Science Publishers B.V.,
291-331.
Shao, J. (1994). L-statistics in complex survey problems, Ann. Statist., 22, 946-967.
Tenenbein, A. (1970). A Double Sampling Scheme for Estimating from Binomial
Data with Misclassifications, JASA, 65, 1330-1361.
Tenenbein, A. (1971). A Double Sampling Scheme for Estimating from Binomial
Data with Misclassifications: Sample Size Determination, Biometrics, 27,
935-944.
Tenenbein, A. (1972). A Double Sampling Scheme for Estimating from Misclassified
Multinomial Data with Applications to Sampling Inspection. Technometrics, 14,
187-202.
Zhang, B. (1995). M-estimation and Quantile Estimation in the Presence of
Auxiliary Information, Journal of Statistical Planning and Inference, 44, 77-94.
Zhang, B. (1996). Estimating a Population Variance with Known Mean.
International Statistical Review, 64, 215-229.
Zhong, C. & Rao, J.N.K. (1996). Empirical likelihood inference using stratified
sampling using auxiliary information, The proceeding of JSM 96.