GhoshAndRao_1994 SAE-An Appraisal

Embed Size (px)

Citation preview

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    1/29

    Small Area Estimation: An Appraisal

    M. Ghosh; J. N. K. Rao

    Statistical Science, Vol. 9, No. 1. (Feb., 1994), pp. 55-76.

    Stable URL:

    http://links.jstor.org/sici?sici=0883-4237%28199402%299%3A1%3C55%3ASAEAA%3E2.0.CO%3B2-5

    Statistical Scienceis currently published by Institute of Mathematical Statistics.

    Your use of the JSTOR archive indicates your acceptance of JSTOR's Terms and Conditions of Use, available athttp://www.jstor.org/about/terms.html. JSTOR's Terms and Conditions of Use provides, in part, that unless you have obtainedprior permission, you may not download an entire issue of a journal or multiple copies of articles, and you may use content inthe JSTOR archive only for your personal, non-commercial use.

    Please contact the publisher regarding any further use of this work. Publisher contact information may be obtained athttp://www.jstor.org/journals/ims.html.

    Each copy of any part of a JSTOR transmission must contain the same copyright notice that appears on the screen or printedpage of such transmission.

    The JSTOR Archive is a trusted digital repository providing for long-term preservation and access to leading academicjournals and scholarly literature from around the world. The Archive is supported by libraries, scholarly societies, publishers,and foundations. It is an initiative of JSTOR, a not-for-profit organization with a mission to help the scholarly community takeadvantage of advances in technology. For more information regarding JSTOR, please contact [email protected].

    http://www.jstor.orgSat Feb 2 11:43:25 2008

    http://links.jstor.org/sici?sici=0883-4237%28199402%299%3A1%3C55%3ASAEAA%3E2.0.CO%3B2-5http://www.jstor.org/about/terms.htmlhttp://www.jstor.org/journals/ims.htmlhttp://www.jstor.org/journals/ims.htmlhttp://www.jstor.org/about/terms.htmlhttp://links.jstor.org/sici?sici=0883-4237%28199402%299%3A1%3C55%3ASAEAA%3E2.0.CO%3B2-5
  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    2/29

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    3/29

    6 M GHOSH AND J N K RAO

    in the apportionment of government funds, and inregional and city planning. In addition, there aredemands from the private sector since the policy-making of many businesses and industries relies onlocal socio-economic conditions. Thus, the need forsmall area statistics can arise from diverse sources.

    Demands of the type described above could nothave been met without significant advances in sta -tistical data processing. Fortunately, with the ad-vent of high-speed computers, fast processing oflarge data sets made feasible the provision of timelydata for small areas. In addition, several power-ful statistical methods with sound theoretical foun-dation have emerged for the analysis of local areadata. Such methods borrow strength from relatedor similar small areas through explicit or implicitmodels that connect the small areas via supplemen-tary data (e.g., census and administrative records).However, these methods are not readily availablein a package to the user, and a unified presentationwhich compares and contrasts the competing meth-ods has not been attempted before.

    Earlier reviews on the topic of small area esti-mation focussed on demographic methods for pop-ulation estimation in post-censual years. Morri-son (1971) covers the pre-1970 period very well, in-cluding a bibliography. National Research Coun-cil (1980) provides detailed information as well asa critical evaluation of the Census Bureau's proce-dures for making post-censual estimates of the pop-ulation and per capita income for local areas. Theirdocument was the report of a panel on small-area es-timates of population and income set up by the Com-mittee on National Statistics a t the request of theCensus Bureau and the Office of Revenue Sharingof the U.S. Department of Treasury. This documentalso assessed the levels of accuracy of current esti-mates in light of the uses made of them and of theeffect of potential errors on these uses. Purcell andKish (1979) review demographic methods as well asstatistical methods of estimation for small domains.An excellent review provided by Zidek (1982) in-troduces a criterion that can be used to evaluatethe relative performance of different methods forestimating the populations of local areas. McCul-lagh and Zidek (1987) elaborate this criterion morefully. Statistics Canada (1987) provides an overviewand evaluation of the population estimation meth-ods used in Canada.

    Prompted by the growing demand for reliablesmall area statistics, several symposia and work-shops were also organized in recent years, and someof the proceedings have also been published: Na-tional Institute on Drug Abuse, Princeton Confer-ence (see National Institute on Drug Abuse, 1979),International Symposium on Small Area Statistics,

    Ottawa [see Platek et al. (1987) for the invitedpapers and Platek and Singh (1986) for the con-tributed papers presented at th e symposium]; Inter-national Symposium on Small Area Statistics, NewOrleans, 1988, organized by the National Centerfor Health Statistics; Workshop on Small Area Es-timates for Military Personnel Planning, Washing-ton, D.C., 1989, organized by the Committee on Na-tional Statistics; International Scientific Conferenceon Small Area Statistics and Survey Designs, War-saw, Poland, 1992, (see Kalton, Kordos and Platek,1993). The published proceedings listed above pro-vide an excellent collection of both theoretical andapplication papers.

    Reviews by Rao (1986) and Chaudhuri (1992)cover more recent techniques as well a s traditionalmethods of small area estimation. Schaible (1992)provides an excellent account of small area estima-tors used in U.S. Federal programs (see NTIS, 1993,for a full report prepared by the Subcommittee onSmall Area Estimation of the Federal Committee onStatistical Methodology, Office of Management andBudget).

    The present article considerably updates earlierreviews by introducing several recent techniquesand evaluating them in th e light of practical consid-erations. Particularly noteworthy among the newermethods are the empirical Bayes (EB), hierarchicalBayes (HB) and empirical best linear unbiased pre-diction (EBLUP) procedures which have made sig-nificant impact on small area estimation during thepast decade. Before discussing these methods in thesequel, it might be useful to mention a few impor-tant applications of small area estimation methodsas motivating examples.

    As our first example, we cite the Federal-StateCooperative Program (FSCP) initiated by the U.S.Bureau of the Census in 1967 (see National Re-search Council, 1980). A basic goal of this pro-gram was to provide high-quality, consistent seriesof county population estimates with comparabilityfrom area to area. Forty-nine states (with the ex-ception of Massachusetts) currently participate inthis program, and their designated agencies worktogether with the Census Bureau under this pro-gram. In addition to county estimates, several mem-bers of the FSCP now produce subcounty estimatesas well. The FSCP plays a key role in the Cen-sus Bureau's post censual estimation program as theFSCP contacts provide the bureau a variety of datathat can be used in making post censual populationestimates. Considerable methodological research onsmall area population estimation is being conductedin the Census Bureau.

    Our second example is taken from Fay and Her-riot (1979) whose objective was to estimate the per

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    4/29

    SMALL AREA ESTIMATION: N APPRAISAL

    capita income (PCI) for several small places. TheU.S. Census Bureau was required to provide theTreasury Department with the PC1 estimates andother statistics for state and local governments re-ceiving funds under the General Revenue SharingProgram. These statis tics were then used by theTreasury Department to determine allocations tothe local governments within the different statesby dividing the corresponding state allocations. Ini-tially, the Census Bureau determined the currentestimates of PC1 by multiplying the 1970 censusestimates of PC1 in 1969 (based on a 20 percentsample) by ratios of an administrative estimate ofPC1 in the current year and a similarly derived es-timate for 1969. The bureau then confronted theproblem that among the approximately 39,000 lo-cal government units about 15,000 were for placeshaving fewer than 500 persons in 1970. The Sam-pling errors in the PC1 estimates for such smallplaces were large: for a place of 500 persons thecoefficient of variation was about 13 percent whileit increased to about 30 percent for a place of 100persons. Consequently, the Bureau initially decidedto set aside the census estimates for these smallareas and use the corresponding county PC1 esti-mates in their place. This solution proved unsat-isfactory, however, in that the census estimates ofPC1 for a large number of small places differed sig-nificantly from the corresponding county estimates,after taking account of the sampling errors. Fayand Herriot (1979) suggest better estimates basedon the EB method and present empirical evidencethat these have average error smaller than eitherthe census sample estimates or the county aver-ages. The proposed estimate for a small place isa weighted average of the census sample estimateand a synthetic estimate obtained by fitting a lin-ear regression equation to the sample estimates ofPC1 using as independent variables the correspond-ing county averages, tax-return data for 1969 anddata on housing from the 1970 census. The Fay-Herriot method was adopted by the Census Bureauin 1974 to form updated estimates of PC1 for smallplaces. Section 4 discusses the Fay-Herriot modeland similar models for other purposes, all involvinglinear regression models with random small area ef-fects.

    Our third example refers to the highly debatedand controversial issue of adjusting for populationundercount in the 1980 U.S. Census. Every tenthyear since 1790 a census has been taken to countthe U.S. population. The census provides the pop-ulation count for the whole country as well as foreach of the 50 states, 3000 counties and 39,000 civildivisions. These counts are used by the Congressfor apportioning funds, amounting to about 100 bil-

    lion dollars a year during the early 1980s, to thedifferent state and local governments.It is now widely recognized that complete cover-age is impossible. In 1980, vast sums of money andintellectual resources were expended by the U.S.Census Bureau on the reduction of non-coverage.Despite this, there were complaints of undercountsby several major cities and states for their respec-tive areas, and indeed New York State filed a law-sui t against the Census Bureau in 1980 demandingthe Bureau to revise its count for that state.

    n undercount is the difference between omis-sions and erroneous inclusions in the census, andit is typically positive. In New York State's lawsuit against the Census Bureau, E.P. Ericksen andJ.B. Kadane, among other statisticians, appeared a sthe plaintiff's expert witnesses. They proposed us-ing weighted averages of sample estimates and syn-thetic regression estimates of the 1980 Census un-dercount, similar to those of Fay and Herriot (1979)for PCI, to arrive at the adjusted population countsof the 50 states and the 16 large cities, including theState of New York and New York City. The sam-ple estimates are obtained from a Post Enumera-tion Survey. Their general philosophy on the role ofadjustment as well as the explicit regression mod-els used for obtaining the regression estimates aredocumented in Ericksen and Kadane (1985) and Er-icksen, Kadane and Tukey (1989). These authorsalso suggest using the regression equation for areaswhere no sample data are available. As a histori-cal aside, we may point out here that the regressionmethod for improving local area estimates was firstused by Hansen, Hurwitz and Madow (1953, pages483-486), but its recent popularity owes much toEricksen (1974).

    While the Ericksen-Kadane proposal was ap-plauded by many as the first serious attempt to-wards adjustment of Census undercount, it has alsobeen vigorously criticized by others (see, e.g., thediscussion of Ericksen and Kadane, 1985). In par-ticular, Freedman and Navidi (1986,1992) criticizedthem for not validating their model and for not mak-ing their assumptions explicit. They also raise sev-eral other technical issues, including the effect oflarge biases and large sampling errors in the sam-ple estimates. Ericksen and Kadane (1987, 1992),Cressie (1989, 1992), Isaki et al. (1987) and oth-ers address these difficulties, but clearly further re-search is needed. Researchers within and outsidethe U.S. Census Bureau are currently studying var-ious models for census undercount and the proper-ties of the resulting estimators and associated mea-sures of uncertainty using the EBLUP, EB, HB andrelated approaches.

    Our fourth example, taken from Battese, Harter

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    5/29

    8 M GHOSH AND J N K. RAOand Fuller (1988), concerns the estimation of areasunder corn and soybeans for each of 12 countiesin North-Central Iowa using farm-interview datain conjunction with LANDSAT satellite data. Eachcounty was divided into area segments, and the ar-eas under corn and soybeans were ascertained for asample of segments by interviewing farm operators;the number of sample segments in a county rangedfrom 1 to 6. Auxiliary data in the form of num-bers of pixels (a term used for picture elementsof about 0.45 hectares) classified as corn and soy-beans were also obtained for all the area segments,including the sample segments, in each county us-ing the LANDSAT satellite readings. Battese, Har-ter and Fuller (1988) employ a nested error regres-sion model involving random small area effects andthe segment-level data and then obtain the EBLUPestimates of county areas under corn and soybeansusing the classical components of variance approach(see Section 5). They also obtain estimates of meansquared error (MSE) of their estimates by takinginto account the uncertainty involved in estimatingthe variance components. Datta and Ghosh (1991)apply the HB approach to these data and show thatthe two approaches give similar results.

    Our final example concerns the estimation ofmean wages and salaries of units in a given in-dustry for each census division in a province usinggross business income as the only auxiliary vari-able with known population means (see Sarndal andHidiroglou, 1989). This example will be used in Sec-tion 6 to compare and evaluate, under simple ran-dom sampling, several competing small area esti-mators discussed in this paper, treating the censusdivisions as small areas. We were able to comparethe actual errors of the different small area estima-tors since the t rue mean wages and salaries for eachsmall area are known.

    The outline of the paper is as follows. Section 2gives a brief account of classical demographic meth-ods for local estimation of population and other char-acteristics of interest in post-censual years. Thesemethods use current data from administrative reg-isters in conjunction with related data from the lat-est census. Section 3 provides a discussion of tra-ditional synthetic estimation and related methodsunder the design-based framework. Two types ofsmall area models th at include random area-specificeffects are introduced in Section 4. In the firsttype, only area specific auxiliary data, related toparameters of interest, are available. In the sec-ond type of models, element-specific auxiliary dataare available for the population elements; and thevariable of interest is assumed to be related to thesevariables through a nested error regression model.We present the EBLUP, EB and HB approaches to

    small area estimation in Section 5 in the context ofbasic models given in Section 4. Both point esti-mation and measurement of uncertainty associatedwith the estimators a re studied. Section 6 comparesthe performances of several competing small areaestimators using sample data drawn from a syn-thetic population resembling the business popula-tion studied by Sarndal and Hidiroglou (1989). InSection 7, we focus on special problems that may beencountered in implementing model-based methodsfor small area estimation. In particular, we givea brief account of model diagnostics for the basicmodels of Section 4 and of constrained estimation.Various extensions of the basic models are also men-tioned in this section. Finally, some concluding re-marks are made in Section 8.

    The scope of our paper is limited to methods ofestimation for small areas; but the developmentand provision of small area statistics involves manyother issues, including those related to sample de-sign and data development, organization and dis-semination. Brackstone (1987) gives an excellentaccount of these issues in the context of StatisticsCanada's Small Area Data Program. Singh, Gam-bin0 and Mantel (1992) highlight the need for de-veloping an overall strategy tha t includes planning,designing and estimation stages in the survey pro-cess.

    2 DEMOGR PHIC METHODSAs pointed out earlier, demographers have longbeen using a variety of methods for local estimation

    of population and other characteristics of interestin post-censual years. Purcell and Kish (1980) cat-egorize these methods under the general headingof Symptomatic Accounting Techniques (SAT). Suchtechniques utilize current data from administrativeregisters in conjunction with related data from thelatest census. The diverse registration data used inthe U.S. include symptomatic variables, such asthe numbers of births and deaths, of existing andnew housing units and of school enrollments whosevariations are strongly related to changes in popu-lation totals or in i ts components. The SAT methodsstudied in the literature include the Vital Rates (VR)method (Bogue, 1950), he composite method (Bogueand Duncan, 1959), the Census Component MethodI1 (CM-11) (U.S. Bureau of the Census, 1966), andthe Administrative Records AR) method (Starsinic,1974), and the Housing Unit (HU) method (Smithand Lewis, 1980).

    The VR method uses only birth and death data ,and these are used a s symptomatic variables ra therthan as components of population change. First, ina given year, say t the annual number of births,

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    6/29

    9MALL AREA ESTIMATION AN APPRAISALbt, and deaths, d t, are determined for a local area.Next the crude birth and death rates, r lt and rat, forthat local area are estimated by

    where rlo and 7-20 respectively denote the crude birthand death rate s for the local area in the lates t cen-sus year (t = 0) while Rlt(RPt)and R10(R20) espec-tively denote the crude birth (death) rates in thecurrent and census years for a larger area contain-ing the local area. The population Pt for the localarea a t year t is then estimated by

    As pointed out by Marker (1983), the success of theV method depends heavily on the validity of the as-sumption that the ratios rlt/rlo and r zt /r 2~or the lo-cal area a re approximately equal to the correspond-ing ratios, Rlt/Rlo and Rzt/RzO, or the larger area.Such a n assumption is often questionable, however.

    The composite method is an extension of the Vmethod that sums independently computed age-sex-race specific estimates based on births, deaths andschool enrollments (see Zidek, 1982, for details).

    The CM-I1 method takes account of net migrationunlike the previous methods. Denoting the ne t mi-gration in the local area during the period since thelast census as mt , an estimate of Pt is given by

    where Po is the population of the local area in thecensus year t = 0. In the U.S., the net migration isfurther subdivided into military and civilian migra-tion. The former is readily obtainable from admin-istrative records while the CM-I1 estimates civilianmigration from school enrollments. The AR method,on the other hand, estimates the net migration fromrecords for individuals as opposed to collect unitslike schools (see Zidek, 1982, for details).

    The HU method expresses Pt as

    where Ht is the number of occupied housing units a ttime t, PPHt is the average number of persons perhousing uni t a t time t and GQt is the number of per-sons in group quarters a t time t. The quantities Ht,PPHt and GQt all need to be estimated. Smith andLewis (1980) report different methods of estimatingthese quantities.

    s pointed out by Marker (1983), most of the es-timation methods mentioned above can be identi-fied as special cases of multiple linear regression.

    Regression-symptomatic procedures also use multi-ple linear regression for estimating local area popu-lations utilizing symptomatic variables as indepen-dent variables in the regression equation. Two suchprocedures are the ratio-correlation method and thedifference-correlation method. Briefly, the formermethod is as follows: Let O 1 and t(> 1 denote twoconsecutive census years and the current year, re-spectively. Also, let Pi, and Sy, be the populationand the value of the th symptomatic variable for theith local area (i = 1 . . . ,m) in the year a O 1 t).Further, let pi, = Pi,/CiPi, and s ~ , SU,/CiSg,be the corresponding proportions, and write Ri =pil/piO, Ri = pit/pil, r . = sijl/sW and r~ = sut/sgl.Using the data (Rf, f,, . . . rZ i = 1, . ,m) and mul-tiple regression, we first fit

    where Ps are the estimated regression coefficientsthat link the change, Rf , in the population pro-portions between the two census years to the cor-responding changes, r;, in the proportions for thesymptotmatic variables. Next the changes, Ri, inthe post censual period are predicted as

    using the known changes, rg, in the symptomaticproportions in the post censual period and the es-timated regression coefficients. Finally, the currentpopulation counts, Pit , are estimated as

    where the total current count, Zipit, is ascertainedfrom other sources. In the difference-correlationmethod, differences between the proportions at thetwo pairs of time points, ( 0 , l) and (l ,t ), are usedrather than their ratios.

    The regression-symptomatic procedures describedabove use the regression coefficients, , ,in the lastintercensual period, but significant chhnges in thestatistical relationship can lead to errors in the cur-rent postcensal estimates. The sample-regressionmethod (Ericksen, 1974) avoids this problem by us-ing sample estimates of Ri to establish the currentregression equation. Suppose sample estimates ofRi are available for k out of m local areas, sayRl, . , . Then one fits the regression equation

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    7/29

    60 GHOSH AND J. N. K. RAO.to the data (Ri, il, . . . rip) from the k sampled ar-eas, instead of (2.1); and then obtains the sample-

    regression estimators, Ricreg,,or all the areas usingthe known symptomatic ratios r~ (i 1 . m :

    Using 1970 census data and sample data from theCurrent Population Survey (CPS), Ericksen (1974)has shown that the reduction of mean error is slightcompared to the ratio-correlation method but thatof large errors (10% or greater) is more substantial.The success of Ericksen's method depends largely onthe size and quality of th e samples, the dynamics ofthe regression relationships and the nature of thevariables.

    3 SYNTHETIC ND REL TED ESTIM TORSGonzalez (1973) describes synthetic estimates as

    follows: An unbiased estimate is obtained froma sample survey for a large area; when this esti-mate is used to derive estimates for subareas un-der the assumption that the small areas have thesame characteristics as the large area, we iden-tify these estimates as synthetic estimates. TheNational Center for Health Statistics (1968) firstused synthetic estimation to calculate state esti-mates of long and short term physical disabilitiesfrom the National Health Interview Survey data.This method is traditionally used for small area es-timation, mainly because of its simplicity, applica-bility to general sampling designs and potential ofincreased accuracy in estimation by borrowing in-formation from similar small areas. We now givea brief account of synthetic estimation and relatedmethods, under the design-based framework.3 1 Synthetic Estimation

    Suppose the population is partitioned into largedomains for which reliable direct estimators, p;,of the totals, Y.,, can be calculated from the surveydata; the small areas, i, may cut across so thatY., CiYig,where Yig is the total for cell (i,g). Weassume that auxiliary information in the form oftotals, Xi,, is also available. A synthetic estimatorof small area total Yi CgYig is then given by

    where X., CiXig (Purcell and Linacre, 1976;Ghangurde and Singh, 1977). The estimator (3.1)has the desirable consistency property that zip:equals the reliable direct estimator p Cg pg of

    the population total Y, unlike the original estimatorproposed by the National Center for Health Statis-tics (1968) which uses the ratio Xig/CgXig nstead ofXig 1X.g.

    The direct estimator pi used in (3.1) is typicallya ratio estimator of the form

    where s. denotes the sample in the large domain gand w is the sampling weight attached to the Ithelement. For this choice, the synthetic estimator...(3.1) reduces to Y Cixig(p.,/~.,).

    If pg is approximately design-unbiased, thedesign-bias of pf is given by

    which is not zero unless Yig/Xi, Y.,/X., for al lg . Inthe special case where the auxiliary information Xi,equals the population count Nig, the la tte r condition-is equivalent to assuming tha t the small area meansYig in each group g equal the overall group mean,-Y.,. Such an assumption is quite strong, and in factsynthetic estimators for some of the areas can beheavily biased in the design-based framework.

    It follows from (3.1) th at the design-variance of pwill be small since it depends only on the variancesand covariances of the reliable estimators p i . Thevariance of pf is readily estimated, but it is moredifficult to estimate the MSE of ?:. Under the as-... .sumption cov(Yi,Yf) 0, where pi is a direct, un-biased estimator of Yi, an approximately unbiasedestimator of MSE is given by

    Here v pi)is a design-unbiased estimator of vari-ance of pi. The estimators (3.2), however, a re veryunstable. Consequently, it is customary to averagethese estimators over i to get a stable estimator ofMSE (Gonzalez, 1973), but such a global measureof uncertainty can be misleading. Note that the as-.sumption cov(Yi,Yf) 0 may be realistic in practicesince pf is much less variable than pi.

    Nichol (1977) proposes to add the synthetic esti-mate, pf, as an additional independent variable inthe sample-regression method. This method, calledthe combined synthetic-regression method, showedimprovement, in empirical studies, over both thesynthetic and sample-regression estimates.

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    8/29

    6MALL AREA ESTIMA TION: N APPRAISALChambers and Feeney (1977) and Purcell and

    Kish (1980) propose structure preserving estimation(SPREE) as a generalization of synthetic estimationin the sense it makes a fuller use of reliable directestimates. SPREE uses the well-known method ofiterative proportional fitting of margins in a multi-way table, where the margins are direct estimates.3 2 omposite Estimation

    natural way to balance the potential bias of asynthetic estimator against the instability of a di-rect estimator is to take a weighted average of thetwo estimators. Such composite estimators may bewritten as

    A A

    where Yli is a direct estimator, Yzi is an indirect es-timator and wi is a suitably chosen weight (0 wi1 .For example, the unbiased estimator pi may bechosen as pli, and the synthetic estimator ?; as pzi.Many of the estimators proposed in the literature,both design-based and model-based, have the form(3.3). Section 5 gives such estimators under realis-tic small area models that account for area-specificeffects. In this subsection, we mainly focus on thedetermination of weights, wi, in the design-based

    A Aframework using Yli = Yi and Ci=?Optimal weights, wi(opt), may be obtained by

    minimising the MSE of p? with respect to wi as-A Asuming covtYi,Y;) = 0:

    The optimal weight (3.4) may be estimated by sub-stituting the estimator mse ?;) given in (3.2) forthe numerator and ?; pi)2for the denomina-tor, but the resulting weights can be very unsta-ble. Schaible (1978) proposes an "averagen weight-ing scheme based on several variables to overcomethis difficulty, noting that the composite estimatoris quite robust to deviations from wi(opt). Anotherapproach (Purcell and Kish, 1979) uses a commonweight, w, and then minimizes the average MSE,i.e., rn-lCi MSE (p?), with respect to w. This leadsto estimated weight of the form

    If the variances of pi's are approximately equal,A

    then we can replace v(Yi) by the average =

    Civ(pi)/rn in which case (3.5) reduces to James-Stein type weight:

    The choice of a common weight, however, is not rea-sonable if the individual variances, v(pi),vary con-siderably. Also, the James-Stein estimator can beless efficient than the direct estimator, pi, for someindividual areas if the small areas that are pooledare not "similarn (C.R. Rao and Shinozaki, 1978).

    Simple weights, wi, that depend only on the do-main counts or the domain totals of a covariate xhave also been proposed in the literature. For ex-ample, Drew, Singh and Choudhry (1982) proposethe sample size dependent estimator which uses theweight

    if Ni 6Ni,(3.6) wi(D)=N~/(sN~),otherwise,

    where Ni is the direct, unbiased estimator of theknown domain population size Ni and S is subjec-tively chosen to control the contribution of the syn-thetic estimator. This estimator with S = 213 anda generalized regression synthetic estimator replac-ing the ratio synthetic estimator is currently be-ing used in the Canadian Labour Force Survey toproduce domain estimates. Sarndal and Hidiroglou(1989) propose an alternative estimator which usesthe weight

    il if Ni Ni(3.7) wi(S)= ( N ~ / N ~ ) ~ - ~ ,therwise,where h is subjectively chosen. They, however, sug-gest h = 2 as a general-purpose value. Note that theweights (3.6) and (3.7) are identical if one choosesS = 1 and h = 2.

    To study the nature of the weights wi(D) or wi(S),let us consider the special case of simple randomsampling of n elements from a population of N ele-ments. In this case, Ni =N(ni /n), where the randomvariable ni is the sample size in ith domain. Taking6 = 1 in (3.6), it now follows that wi(D) = wi(S)= 1if ni is at least as large as the expected sample sizeE(ni)= n(Ni/N), that is, the sample size dependentestimators can fail to borrow strength from relateddomains even when E(ni) is not large enough tomake the direct estimator Pi reliable. On the otherhand, when Ni

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    9/29

    6 M. GHOSH AND J N. K. RAOresult, more weight is given to the synthetic compo-nent as ni decreases. Thus, the weights behave wellunlike in the case N~ Ni. Another disadvantage isthat the weights do not take account of the size ofbetween area variation relative to within area vari-ation for the characteristic of interest, that is, allcharacteristics get the same weight irrespective oftheir differences with respect to between area ho-mogeneity.

    Holt, Smith and Tomberlin (1979) obtain a bestlinear unbiased prediction (BLUP) estimator of Yiunder the following model for the finite population:

    where Yige is the y-value of the Cth unit in the cell(i,g),pg s a re fixed effects and the errors eige are un-correlated with zero means and variances a: Fur-ther, Ni, denotes the number of population elementsin the large domain g t ha t belong to the small areai. Suppose nig elements in a sample of size n fallin cell (i,g), and let 5e and g., denote the samplemeans for ( i,g) and g, respectively.

    The best linear unbiased estimator of pg under(3.8) is fig = jj g which in turn leads to the BLUPestimator of Yi given by

    where PZ is a composite estimator of the total Ygiving the weight Gig = nig/Ni, to the direct esti-mator Yig = Nigyig, a_nd the weight 1 wig to thesynthetic estimator Y = Nigyig. It therefore fol-lows that the BLUP estimator of Yi tends to thesynthetic estimator Y = CgNig5., if the samplingfraction nig/Nt, i s negligible for all g, irrespective ofthe size of between area variation relative to withinarea variation. This limitation of model (3.8) can beavoided by using more realistic inodels th at includerandom area-specific effects. We consider such mod-els in Section 4, and we obtain small area estimatorsunder these models in Section 5 using a general EBor a variance components approach as we11 as a HBprocedure.

    4 SMALL AREA MODELSWe now consider small area models that include

    random area-specific effects. Two types of mod-els have been proposed in the literature. In thefirst type, only area-specific auxiliary data xi =

    ( ~ i l ,. . x ~ ~ ) ~re available and the parameters of in-terest , Oi, are assumed to be related to xi. In partic-ular, we assume that

    where the zi s are known positive constants, is thevector of regression parameters and the vi s are in-dependent and identically distributed (iid) randomvariables with

    In addition, normality of the random effects vi is of-ten assumed. In the second type of models, element-specific auxiliary data XU = (xU1,. ,xiiplTare avail-able for the population elements, and the variable ofinterest, yU, s assumed to be related to xu througha nested error regression model:

    Here eg =BUkliand the BG9s re iid random variables,independent of the vi s, with

    the ko s being known constants and Ni the numberof elements in the ith area. In addition, normalityof the vi s and Bu s is often assumed. The parametersof inferential interest here are the small area totalsor the means = YYi/Ni.

    For making inferences about the Oi s under model(4.1), we assume that direct estimators, re avail-able and that

    where the ei s are sampling errors, E(eilBi)= 0 andV(eilOi)= Gi that is, the estimators i are design-unbiased. It is also customary to assume that thesampling variances, Gi, are known. These assump-tions may be quite restrictive in some applications.For example, in the case of adjustment for censusunderenumeration, the estimates i obtained froma post-enumeration survey (PES) could be seriouslybiased, as noted by Freedman and Navidi (1986).Simlasly, if Bi is a nonlinear function of the smallarea total Yi and the sample size, ni is small, theni may be seriously biased even if the direct estima-tor of Yi is unbiased. We also assume normality ofthe 4 8, but this may not be as restrictive as the nor-mality of the random effects vi, due to the centrallimit theorem s effect on the Bi s.

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    10/29

    6MALL AREA ESTIMATION: AN APPRAISALCombining (4.3) and (4.1), we obtain the model

    which is a special case of the general mixed linearmodel. Note that (4.4) involves design-induced ran-dom variables, ei, as well as model-based randomvariables vi.

    Turning to the nested error regression model (4.2),we assume that a sample of size ni is taken from theith area and that selection bias is absent; that is, thesample values also obey the assumed model. Thelat ter is satisfied under simple random sampling. I tmay also be noted tha t model (4.2) may not be appro-priate under more complex sampling designs, suchas stratified multistage sampling, since the designfeatures are not incorporated. However, it is possi-ble to extend this model to account for such features(see Section 7).

    Writing model (4.2) in matrix form as

    where is Ni x p, ,e; and l p are Ni x 1 andp 1,. I ) ~ ,we can partition (4.5) as

    where the superscript denotes the nonsampled el-ements. Now, writing the mean Ti as

    with i ni/Ni and Yi, gf denoting the means forsampled and nonsampled elements respectively, wemay view estimation of Ti as equivalent to predic-tion of gf given the data {yi) and {Xi).

    Various extensions of models (4.4) and (4.6), aswell as models for binary and Poisson data, havebeen proposed in the literature. Some of these ex-tensions will be briefly discussed in Section 7.

    In the examples given in the Introduction, theinodels considered are special cases of (4.4) or (4.6).In Example 3 Ericksen and Kadane (1985, 1987)use model (4.4) with zi 1 and assume to beknown. Here gi is a PES estimate of census under-count i {(Ti Ci)/Ti)lOO, where Ti is the true(unknown) count and Ci is the census count in theith area. Cressie (1992) uses (4.4) with zi c? ~,where gi is a PES estimate of the adjustment factori Ti/Ci. In Example 2, Fay and Herriot (1979)use (4.4) with zi 1, where i is a direct estimator

    of i logPi and Pi is the average percapita income(PCI) in the ith area. Further, xTp o + plxi withxi denoting the associated county value of log (PCI)

    from the 1970 census. In Example 4, Battese, Har-ter and Fuller (1988) use model (4.6) with k~ 1and x ~ o + Plxliii+ P2xzy,where YQ x l ~nd x z ~respectively denote the number of hectares of corn(or soybeans), the number of pixels classified as cornand the number of pixels classified as soybeans inthe jt h area segment of the i th county. A suitablemodel for our final example is also a special caseof (4.6) with xTp o + P1xG and k c x;I2, whereyQ and XU respectively denote the total wages andsalaries and gross business income for the jth firmin the i th area (census division).

    5 EBLUP, EB AND HB APPROACHESWe now present the EBLUP, EB and HB ap-proaches to small area estimation in the context of

    models (4.4) and (4.6). Both point estimation andmeasurement of uncertainty associated with the es-timators will be studied.5.1 EBLU P Variance Componen ts) Approach

    As noted in Section 4, most small area modelsare special cases of a general mixed linear modelinvolving fixed and random effects, and small areaparameters can be expressed as linear combinationsof these effects. Henderson (1950) derives BLUPestimators of such parameters in the classical fre-quentist framework. These estimators minimize themean squared error among the class of linear un-biased estimators and do not depend on normal-ity, similar to the best linear unbiased estimators(BLUES)of fixed parameters. Robinson (1991) givesan excellent account of BLUP theory and examplesof its application.

    Under model (4.4), the BLUP estimator of iX T + UiZi simplifies to a weighted average of thedirect estimator i and the regression-synthetic es-timator xTP

    where the superscript H stands for Henderson,

    is the BLUE estimator of and

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    11/29

    6 M. GHOSH AND J N. K. RAOThe weight yi measures the uncertainty in mod-elling the Ois, namely, u:zP relative to the total vari-ance o:zP + gi. Thus, the BLUP estimator takesproper account of between area variation relativeto the precision of the direct estimator. It is validfor general sampling designs since we are modellingonly the Ois and not the individual elements in thepopulation. It is also design consistent since yi 1as the sampling variance Gi - 0.

    The mean squared error (MSE) of under model(4.4)may be written as

    where

    and

    The first term gli(o:) is of order O(1) while the sec-ond term gzi(a:), due to estimating p, is of orderO(m-l) for large m.

    The BLUP estimator (5.1) depends on the vari-ance component u which is unknown in practicalapplications. However, various methods of estimat-ing variance components in a general mixed linearmodel are available, including the method of fittingconstants or moments, maximum likelihood (ML)and restricted maxiinum likelihood (REML). Cressie(1992) gives a succinct account of these methods inthe context of model (4.4). All these methods yieldasymptotically consistent estimators under realisticregularity conditions.

    Replacing u with an asymptotically consistentestimator 3:, we obtain a two-stage estimator, p ,which is referred to as the empirical BLUP orEBLUP estimator (Harville, 1991), in analogy withthe EB estimator. It remains unbiased provided (i)the distributions of vi and ei are both symmetric(not necessarily normal); (ii) 3: is an even functionof Bi s and remains invariant when i is changed toi - x r a for all a (Kackar and Harville, 1984). Stan-dard methods of estimating variance components allsatisfy (ii). We may also point out that the MSE ofthe EBLUP estimator appears to be insensitive tothe choice of the estimator 8:

    If normality of the errors vi also holds, then wecan write the MSE of ? as

    see Kackar and Harville (1984). It follows from (5.3)that the MSE of Y is always larger than that of theBLUP estimator The second term of (5.3) is nottractable, unlike the first term Mli(u:); but it canbe approximated for large m (Kackar and Harville,1984; Prasad and Rao, 1990; Cressie, 1992). Wehave, for large m,

    where

    and the neglected terms in the approximation (5.4)are of lower order than O(m-l). Here V 3:) de-notes the asymptotic variance of 3:; Cressie (1992)gives the asymptotic variance formulae for ML andREML estimators. It i s customary to ignore the un-certainty in 3: and use Mli(3:) gli(3:) +g2i(3:) asan estimator of MSE of p , ut this procedure couldlead to severe underestimation of the true MSE.

    correct, approximately unbiased estimator ofMSE 8?) is given by

    (see Prasad and Rao, 1990). The bias of (5.5) is oflower order than m-l.

    Noting that E[Ci(yi ~:,B)~/(a :z~ Gi)] m pa method of moments estimator 8: can be obtainedby solving iteratively

    in conjunction with (5.2) and letting 3: 0 whenno positive solution exists (Fay and Herriot, 1979).This method does not require normality, unlike theML and REML. Alternatively, a simple moment es-timator is given by 6: rnax(3:, O , where

    and p (CixixT)-l (cixiei) s the ordinary leastsquares estimator of p. The estimator 3: is unbi-ased for a: and under normality,

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    12/29

    SMALL AREA ESTIMATION: N APPRAISAL 6(see Prasad and Rao, 1990 for the case zi = 1).

    Lahiri and Rao (1992) show that the estimator ofMSE, (5.5), using the moment estimator (5.6), s alsovalid under moderate nonnormality of the randomeffects, vi. Thus, inference based on 8F and mse(8F)is robust to nonnormality of the random effects.

    We next tu rn to the nested error regression model(4.6). The BLUP estimator of in this case is ob-tained as follows: (i) using the model yi = Xi vi ln i

    ei for the sampled elements, obtain the BLUP es-timator of zz:Tp vi, where t is the mean for non-sampled elements; (ii) substitute this estimator for; in (4.7). Thus the BLUP estimator of is givenby

    where 6 is the BLUE of P,

    with wi. = .EJni1wgand wg = ka2, and iw and iiwrethe weighted means with weights wg (see Prasadand Rao, 1990, and Stukel, 1991). The BLUE

    is readily obtained by applying ordinary leastsquares to the transformed data {by ysiw)/kg,(xu %%iw)/kg)see Stukel, 1991, and Fuller andBattese, 1973). If the sample fraction i is negligi--ble, we can write YH as a composite estimator ofthe form

    where Xi is the ith aiea population mean of xg's.It follows from (5.8) that the BLUP estimator is aweighted average of the survey regression estima-tor Ti, (xi and the regression syntheticiw)T6

    T -estimator Xi p If kg = 1 for all zj ,hen the sur-vey regression estimator is approximately design-unbiased for Yi under simple random sampling evenif ni is small. In the case of general kg's, i t is model-unbiased conditional on the realized local effect vi,unlike the BLUP estimator which is conditionallybiased.

    An empirical BLUP estimator, Y?, is obtainedfrom (5.7) by replacing (a:, a2) with asymptoticallyconsistent estimators 8: d2). Further, assumingnormality of the errors an approximately unbiasedestimator of MSE (YH), similar to (5.5) under model(4.4), is given by

    Heregli(a,2 2 - yi(a2/wi.) 1 fi)2~L-2ktTkt

    with kt denoting the vector of kg's for nonsampledunits in ith area, and

    2 2g2i(gu, = (x? Tii iw )T~-l igiw)u2with

    Further,

    where C V denotes the asymptotic covariance (seeStukel, 1991 and Prasad a nd ka o, 1990).For the ML and REML methods, the asymptoticcovariance matrix of 3;,32)can be obtained fromgeneral theory (see, e.g., Cressie, 1992). Stukcl(1991) and Fuller and Battese (1973) use the methodof fitting constants which involves two ordinaryleast square fittings: first, we calculate the residualsum of squares, SSE(l) , with z degrees of freedomby regressing through the origin the y-deviationsk ~ l ( ~ g - J ~ ~ , )n the nonzero x-deviations k;'(xij-&,)41for these areas with ni > 1. Second, we calculate theresidual sum of squares SSE(2) by regressing yglkgon xglkg. Then 82= v l l SSE(1) and d: = max(c7:, 0)with

    where

    with

    The Appendix gives the variances and covariance ofd2 and 3;.Again, ignoring the uncertainty in 3: and d2 and

    using Mli(3:, d2) = gli(8:, e2)as an estimator ofMSE By) could lead to severe underestimation ofthe true MSE.Limited simulation results (Prasad and Rao,1990; Datta and Ghosh, 1991 and Hulting and

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    13/29

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    14/29

    7MALL AREA ESTIMATION: N APPRAISALis a severe underestimate of the t rue posterior vari-ance V(Fi y). Again, the bootstrap and Kass-Steffeymethods can be applied to account for the underes-timation.

    If one wishes to view the EB approach in the fre-quentist framework, a prior distribution on P anda: cannot be entertained. In this case, MSE is anatura l measure of uncertainty and any differencesbetween the EB and EBLUP approaches disappearunder the normality assumption. It may also benoted that the EB estimator can be justified withoutthe normality assumption, similar to the EBLUP,using the "posterior linearity" property (Ghosh andLahiri, 1987; Ericson, 1969).5 3 HB Approach

    In the HB approach, a prior distribution on themodel parameters is specified and the posterior dis-tribution of the parameters of interest is then ob-tained. Inferences are based on the posterior distri-bution; in particular, a parameter of interest is esti-mated by it s posterior mean and i ts precision is mea-sured by its posterior variance. The HB approachis straightforward and clear-cut but computation-ally intensive, often involving high dimensional in-tegration. Recent advances in computational as-pects of the HB approach, such as Gibbs sampling(cf. Gelfand and Smith, 1990) and importance sam-pling, however, seem to overcome the computationaldifficulties to a large extent. "If he solution involvesonly one or two dimensional integration, it is ofteneasier to perform direct numerical integration thanto use Gibbs sampling or any other Monte Carlonumerical integration method. Datta and Ghosh(1991) apply the HB approach to estimation of smallarea means, Ti, under general mixed linear models,and also discuss the computational aspects.

    We now illustrate the HB approach under ourmodels (4.4) and (4.6), assuming noninformative pri-ors on ,fJ and the variance components a: and a2.The HB approach, however, can incorporate priorinformation on these parameters through informa-tive priors.Under model (4.4), we first obtain the posteriordistribution of Oi given 6 and a:, by assuming thatp has a uniform distribution over RP to reflect ab-sence of prior information on p. Straightforwardcalculations show tha t i t is normal with mean equalto the BLUP estimator F and variance equal toMli(a:), the MSE of F , ha t is, E(Oi16,a:) =F and~ 0 ~ 1 6 ,= MSE (F). is assumed:) Hence, when a:to be known, the HB and BLUP approaches lead toidentical inferences.

    To take account of th e uncertainty about a:, weneed to calculate the posterior distribution of a

    given 6 under a suitable prior on a:. The posteriormean and variance of Oi are then given by

    and

    where E,; and V ; respectively denote the expec-tation and variance with respect to the posteriordistribution of a: given 6. Numerical evaluation of(5.13) and (5.14) involves one dimensional integra-tion. Ghosh (1992) obtains the posterior distribu-tion, f a:,216), assuming th at a: has a uniform dis-tribution over (0,w) to reflect the absence of priorinformation about a:, and that a: and are inde-pendently distributed. It i s given by

    where

    We next turn to the nested error regression model4.6). We first obtain the posterior distribution of

    Yi given y, a: and a 2, by assuming that p hasuniform distribution over RP Straightforward cal-culations show that it is normal with mean equalto the BLUP estimator Ti and variance equal toMSE YH = Mli(a:, a2), h at is, E(Y,.ly, a:, a2 )= Ffand V(Fily,a,2,a2)= MSE(Yf). Hence, when botha: and a2 are assumed to be known, the HB andBLUP approaches lead to identical inferences.

    To take account of the uncertainty about a: anda2,Datta and Ghosh (1991) further assume P,(a2)-and (a:)- = (a2)-'X to be independently dis-tributed with (a2)- gamma ((1/2)ao, 1/2)go)and(a2)- A gamma ((1/2)a1,(1/2)gl), where a0 0,go 0, a1 > 0, gl 0 and X = a2/a:. Here gamma(a ,p) denotes the gamma random variable with pdff(z) = z > 0. Datta and Ghosh~p(-az>aPzP-~/r(p),(1991) obtain closed form expressions for E Y~~Y,)

    http:///reader/full/E(Y,.lyhttp:///reader/full/E(Y,.ly
  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    15/29

    68 M. GHOSH ND J. K. R Oand V(Yily,X) by showing that f(y*ly,A) is a mul-tivariate t-distribution. They also derive the pos-terior distribution of X given y, but it has a com-plex structure making it necessary to perform one-dimensional numerical integration to get ~ ( y ~ l y )and V(yiIy) using the following relationships:

    and

    where Ex and Vx respectively denote the expecta-tion and variance under the posterior distributionof X given the data y.

    Datta and Ghosh (1991) compare the HB, EB andEBLUP approaches using the data for our exam-ple 4 and letting a0 a1 0.005 and go gl 0to reflect the absence of prior information on a:and cr2 s one might expect, the three estimateswere close to each other as point predictors of smallarea (county) means; the EB estimate was obtainedby replacing X with the method-of-fitting constantsestimate in ~ ( Y ~ l y , The naive variance esti-).mate, v(Yily, sfB)2associated with the EB es-timate E(Y~IY,), was always found to be smallerthan the true posterior variance, V(Tily) (eB)2 ,associated with the HB estimate FrBE ( F ~ ( ~ ) ;orone county, sfBwas about 10% smaller than tfB.Note that the customary naive EB variance esti-mate, v(Yily, 3:, c?~),will lead to much more se-vere underestimation than V(Yi ly,1 since the lat-ter takes account of the uncertainty about anda2. The estimated MSE, mse (T r ) (tf)2, associ-ated with the EBLUP estimate, ??, was found tobe similar to the HB variance estimate. Our exam-ple in Section 6 also gives similar results. Dattaand Ghosh (1991) have also conducted a simula-tion study on the frequentist properties of the HBand EBLUP methods using the Battese, Harter and,Fuller (1988) model. Their findings indicate that thesimulated MSEs for the HB estimator are very closeto those for the EBLUP estimator while the coverageprobabilities based on yFBf(l .96 )eB urn out to beslightly bigger than those based on yf (1.96)sH,both being close to nominal confidence level of 95%.Hulting and Harville (1991) obtain similar results inanother simulation study using the Battese, Harterand Fuller (1988) model and varying the varianceratio a,2/a2.However, they find the HB method pro-duces different and more sensible answers than theEBLUP procedure if the estimate for a:/a2 is zeroor close to zero.

    The HB approach looks promising, but we need tostudy its robustness to choice of prior distributionson the model parameters.

    6 EX MPLESeveral of the proposed small area estimators arenow compared on the basis of their squared er-

    rors and relative errors from the true small areameans yi. For this purpose, we first constructeda synthetic population of pairs (YU,xli) esemblingthe business population studied by Sarndal andHidiroglou (1989) where the census divisions aresmall areas, y~ denotes wages and salaries of jthfirm in the ith census division and XU the corre-sponding gross business income. To generate thesynthetic population, we fitted the nested error re-gression model (4.6) with xgp Po P1xu and kiix~ to a real population to estimate Po and dl andthe variance components a: and a2 . The resultingsynthetic model is given by

    We then used model (6.1) in conjunction with thepopulation xy-values to generate a synthetic popula-tion of pairs (yo,xU)with m 16 small areas. Table1 reports the small area population sizes, Ni, andthe small area means (Yi,Xi) or this synthetic pop-ulation of size N 114. simple random sample ofsize n 38 was drawn from the synthetic popula-tion. The resulting small area sample sizes, ni, andsample data (yij,xij)are reported in Table 2 Notethat direct estimators cannot be implemented forareas 1, 4 and 13 since ni 0 for these areas. Wehave, therefore, confined ourselves to the followingindirect estimators valid for all ni 0:

    (i) Ratio-synthetic estimator: Y s @/zEi ,where @,R) are the overall sample means.(ii) Sample-size dependent estimator:

    where YyEGis a "survey regression" estima-tor, G i ,R i ) are the sample means, Wi ni/nand W Ni/N. This estimator correspondsto the weight (3.6) with 1 or the weight

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    16/29

    9MALL AREA ESTIMATION: N APPRAISALTABLE1Small area sizes, Ni, and means Yi,Xi)for a synthetic population N 114)

    rea reaO . Ni Yi N O . Ni Yi

    TABLE2Data from a simple random sample dra wn from a synthetic population n 38,N 114)reaNO. ni z j Y i j

    1 02 3 33,70 5.90

    47.19 13.2275.21 17.44

    3 1 36.43 2.544 05 1 28.82 3.616 2 30.60 11.48

    129.69 21.457 4 200.60 46.96113.92 15.57

    74.33 8.6653.00 11.90

    8 3 95.43 11.7635.75 -0.6939.08 21.46

    (3.7)with h 2 We have not included the op-timal composite estimator due to difficultiesin estimating the optimal weight (3.4).

    (iii) EBLUP (or EB) estimator Ff under model(4.6)with x p Po+,Dlxuand kg x ~ whereu and u2 are estimated by the method offitting constants.

    (iv) HB estimator FfBunder model (4.6) as in(iii), using Datta-Ghosh's diffuse priors witha. 0,go 0, a1 0.05 and g1 0.

    Using the sample data bij,xij) and the Imown smallarea population means xi we computed the above

    reaNO. ni Xij9 10 333.24 47.6280.91 5.2743.65 6.97

    29.29 -0.19102.66 15.94109.34 19.8430.56 2.57127.96 24.61190.34 35.4152.16 2.541 45.91 -6.34

    2 43.03 8.83190.12 27.31

    1 47.39 1.7003 35.66 -0.8040.23 2.75

    111.23 10.876 51.61 -3.20

    67.46 12.47190.97 21.7735.11 2.9225.09 -5.4673.51 7.35

    16 1 229.32 53.83

    four estimates along with their average relative er-rors

    ARE xn lest. Fi//Fi i = l

    and average squared errorsrnASE x ( e s t . Fi12.

    i=1These values are reported in Table 3. We also cal-culated the standard error, sf, of EBLUP estima-tor using (5.9) and the posterior standard deviation

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    17/29

    GHOSH AND N R OT LESmall area estimates and their ( ) average relative errors and average squared roots; stand ard error (S. E. )of EBLUP a nd HBestimators

    S.E.AreaNO. i i Rs SD EBLUP HB EBLUP HE

    Av Rel Error%:Av Sq. Error:

    ==ratio synthetic estim ator; SD=sam ple-size dependent estimator ; EBLUP=EBLl.JP or EB es timator ; HB=HB estimator.

    standard error), s f B , of HB estimator using one-dimensional numerical integration. These valuesare also reported in Table 3.

    The following observations on the relative perfor-mances of small area estimates may be drawn fromTable 3: 1) EBLUP and HB estimators give simi-lar values over small areas, and their average rel-ative errors ( ) are 11.74 and 11.23 and squarederrors are 2.84 and 2.69 respectively. Asymptoti-cally as m co the two estimators are identical,and the observed differences are due to moderatem = 16) and the method of estimating a and aREML or ML would give slightly different EBLUPvalues). 2) Standard error values for EBLUP andHB estimators are also similar. This is in agree-ment with the empirical results of Datta and Ghosh1991) and Hulting and Harville 1991). 3) Un-

    der the criterion of average squared error, EBLUPb d HB estimators perform much better than theratio-synthetic and sample-size dependent estima-tors: 2.84 for EBLUP vi. 12.38 for sample-size de-pendent SD) and 22.10 for ratio-synthetic RS). 4)Under the criterion of average relative error ( ),however, EBLUp and HB estimates are not muchbetter than the sample-size dependent estimator:11.74for EBLUP versus 12.40 for SD. However, bothperform much better than the ratio-synthetic esti-mator with ARE 17.85.

    It may be noted t ha t EBLUP, EB and HB estima-tors a re optimal under squared error loss and cease

    to be so under relative error loss. This is due to thefact that the Bayes estimators under relative errorloss can often differ quite significantly from thoseunder squared error loss. This nonoptimality car-ries over to EBLUP estimator which usually mim-ics closely the Bayes estimators. The above obser-vations could perhaps explain why in our examplethe Bayes and EBLUP estimator did not improvesignificantly over the SD estimator under relativeerror.All in all, our results in Table 3 clearly demon-strate the advantages of using the EBLUP or HBestimator and associated standard error when theassumed random effects model fits the data well.Note that we simulated t he data from an assumed

    model.) It is important, therefore, to examine theaptness of the assumed model using suitable diag-nostic tools; Section 7.1 gives a brief account of di-agnostics for models 4.4) and 4.6).

    7 SPECI L PROBLEMSIn this section we focus on special problems thatmay be encountered in implementing model-based

    methods for small area estimation. We also discusssome extensions of our basic models 4.4) and 4.6).7 1 Model Diagnostics

    Model-based methods rely on careful checking ofthe assumed models in order to find suitable models

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    18/29

    7MALL AREA ESTIMATION N APPRAISALthat fit the data well. Model diagnostics, therefore,play an important role. However, the literature ondiagnostics for mixed linear models involving ran-dom effects is not extensive, unlike standard regres-sion diagnostics. Only recently have some usefuldiagnostic tools been proposed. See, for example,Battese, Harter and Fuller (1988); Beckrnan, Nacht-sheim and Cook (1987); Calvin and Sedransk (1991);Christensen, Pearson and Johnson (1992); Cressie(1992); Dempster and Ryan (1985) and Lange andRyan (1989).

    We first consider the Fay-Herriot type mode1 (4.4),where only area-specific covariates are used. Whenthe model is correct, the standardized residualsri = 8:~: i)-('l2) (Bi xTb), i = 1,..m areapproximately iid N( 0 , l ) for large m where fl is

    7 2 Constrained EstimationDirect survey estimates are often adequate at anaggregate (or large area) level in terms of precision.For example, Battese, Hal-ter and Fuller (1988), intheir application, find tha t the direct regression es-timator of the mean crop area for the 12 counties

    together has adequate precision. It is, therefore,sometimes desirable to modify the individual smallarea estimators so that a properly weighted sum ofthese estimators equals the model-free, direct esti-mator at the aggregate level. The modified estima-tors will be somewhat less efficient than the origi-nal, optimal estimators, but they avoid possible ag-gregation bias by ensuring consistency with the di-rect estimator. One simple way to achieve consis-tency is to make a ratio adjustment, for example,the BLUE estimator (5.2) with a: replaced by 8:. ?he EBLUP estimator of a total i is modified toWe can, therefore, use a q q plot of ri against@-'[Fm(ri)l, where @(r)nd Fm(r)re the standardnormal and empirical cdfs, respectively. A primarygoal of this plot is to check the normality of therandom effects vi since the sampling errors ei areapproximately normal due to the central limit the-orem effect. Dempster and Ryan (1985) note thatthe above q q plot may be inefficient for this pur-pose since it gives equal weight to each observa-tion, even though the Bis differ in the amount ofinformation contained about the vis. They proposea weighted q q plot which uses a weighted em-pirical cdf FL(r) = CiI( r ri)Wi/CiWi in place ofF,(r), where I( t) = 1 for t 2 0 and 0 otherwise, andWi = 8: z;~ ~)-' in our case. This plot is moresensitive to departures from normality than the un-weighed plot since it assigns greater weight to thoseobservations for which 8: account for a larger partof the total variance 8: ~ i ~ ~

    We next turn to the nested errors regressionmodel (4.6), where the yg's are correlated for eachi. In this case, the transformed residuals rg =47' (Yu-q iw)- l are approximately( x ~ - + ~ x ~ , ) ~uncorrelated with equal variances a2. Therefore,traditional regression diagnostics may be applied tothe rgs, but the transformation can mask the effectof individual errors eg. On the other hand, stan-dardized BLUP residuals k,;& x b Bi)/b maybe used to study the effect of individual units i j )on the model, provided they are not strongly corre-lated. Lange and Ryan (1989) propose methods forchecking the normality assumption on the randomeffects vi using the BLUP estimates D i .

    Christensen, Pearson and Johnson (1992) developcase-deletion diagnostics for detecting influentialobservations in mixed linear models. Their meth-ods can be applied to model (4.6) as well as to morecomplex small area models.

    where ? is a direct estimator of the aggregatepopulation total Y = CiYi. Battese, Har ter andFuller (1988) and Pfeffermann and Barnard (1991)propose alternative estimators involving estimatedvariances and covariances of the optimal estimators?yThe previous sections focused on simultaneous es-timation of small area means or totals, but in some

    applications the main objective is to produce an en-semble of parameter estimates whose histogram isin some sense close to the histogram of small areaparameters. Spjatvoll and Thomsen (1987), for ex-ample, were interested in finding how 100 munic-ipalities in Norway were distributed according toproportion of persons not in the labor force. Theypropose constrained E estimators whose variationmatched the variation of the small area populationmeans. By comparing with the actual distributionin their example, they show that the E estimatorsare biased toward the prior mean compared to theconstrained E estimators. Constrained estimatorsreduce shrinking towards the synthetic component;for example, in (5.1) the weight 1 yi, attached tothe synthetic component, is reduced to 1 y,f/ . Fol-lowing Louis (1984), Ghosh (1992) develops a gen-eral theory of constrained HB estimation. Ghoshobtains constrained HB estimates by matching thefirst two moments of the histogram of the estimates,and the posterior expectations of the first two mo-ments of the histogram of the parameters and mini-mizing, subject to these conditions, the posterior ex-pectation of the Euclidean distance between the es-timates and the parameters. Lahiri (1990) obtains

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    19/29

    7 M. GHOSH AND J N. K. R Osimilar results in the context of small area estima-tion, assuming posterior linearity, thus avoidingdistributional assumptions. Constrained Bayes es-timates are suitable for subgroup analysis where theproblem is not only to estimate the different compo-nents of a parameter vector but also to identify theparameters that are above or below a specified cut-off point. It should be noted that synthetic estimatesare inappropriate for this purpose.The optimal estimators (i.e., EBLUP, EB and HBestimators) may perform well overall but poorlyfor particular small areas that are not consistentwith the assumed model on small area effects. Toavoid this problem, Efron and Morris (1972) and Fayand Harriot (1979) suggest a straightforward com-promise that consists of restricting the amount bywhich the optimal estimator differs from the directestimator by some multiple of the standard errorof the direct estimator. For example, a compromiseestimator corresponding to the HB estimator p ,under a normal prior on the OiYs,s given by

    where c > 0 is a suitable chosen constant, say = 1A limitation of the compromise estimators is that noreliable measures of their precision are available.7 3 xtensions

    Various extensions of the basic models (4.4) and(4.6) have been studied in the literature. Due tospace limitation, we can only mention some of theseextensions.Datta et al. (1992) extend the aggregate-levelmodel (4.4) to the case of correlated sampling errorswith a known covariance matrix and develop HBand EB estimators and associated measures of pre-cision. In their application to adjustment of censusundercount, the sampling covariance matrix is block

    diagonal. Cressie (1990a) introduces spatial depen-dence among the random effects vi, in the contextof adjustment for census undercount. Fay (1987)and Ghosh, Datta and Fay (1991) extend model(4.4) to multiple characteristics and perform hier-archical and empirical multivariate Bayes analysis,assuming that the sampling covariance matrix ofai, the vector of direct estimators for ith area, isknown for each i. In their application to estimationof median income for four-person families by state,Oi = with Oil population median incomeOil, 0 ~ 2 ~ =of four-person families in state i and Oi2 = (pop-ulation median income of three-person families in

    state i) : (median income of four-person familiesin state i). By taking advantage of the strong corre-lation between the direct estimators il and Biz, theywere able to obtain improved estimators of Oil.Many surveys are repeated in time with partialreplacement of the sample elements, for example,

    the monthly U.S. Current Population Survey andthe Canadian Labor Force Survey. For such re-peated surveys considerable gain in efficiency canbe achieved by borrowing strength across both smallareas and time. Cronkite (1987) developed re-gression synthetic estimators using pooled cross-sectional time series data and applied them to es-timate substate area employment and unemploy-ment using the Current Population Survey monthlysurvey estimates as dependent variable and countsfrom the Unemployment Insurance System andCensus variables as independent variables. Rao andYu (1992) propose an extension of model (4.4) o timeseries and cross-sectional data. Their model is of theform

    where it is the direct estimator for small area i attime t, the eit2s re sampling errors with a knownblock diagonal covariance matrix \k = block diagi(\kl,.. k xit is a vector of covariates and viN(0,0: . Further, the uit2sare assumed to followa first order autoregressive process for each i, i.e.,uit = p~ i ,~ - l + e i ~ , z N(0, u2). They ob-pI < 1with fittain the EBLUP and HB estimators and their stan-dard errors under (7.2) and (7.3).

    Models of the form (7.3) have been extensivelyused in the econometric literature, ignoring Sam-pling errors (see, e.g., Anderson and Hsiao, 1981;Judge, 1985, Chapter 13). Choudhry and Rao (1989)treat the composite error wit = eit uit as a first or-der autoregressive process and obtain the EBLUPestimator of x ip vi. A drawback of their methodis that the area by time specific effect uit is ignoredin modelling the OitYs.

    Pfeffermann and Burck (1990) investigate moregeneral models on the Oit's, but they assume mod-eling of sampling errors across time. They obtainEBLUP estimators of small area means using theKalman filter. Singh and Mantel (1991) considerarbitrary covariance structures on sampling errorsand propose recursive composite estimators usingthe Kalman filter. These estimators are not opti-mal but appear to be quite efficient relative to thecorresponding EBLUP estimators.

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    20/29

    7MALL AREA ESTIMATION AN APPRAISALTurning to extension of the nested error regres-sion model (4.6), Fuller and Harter (1987) propose

    a nested regression andobtain EBLUP estimators and associated standarderrors. Stukel (1991) studies two-fold nested errorregression and obtains E LUP estimatorsand associated standard errors' Such are

    for two-stage withinIKleffeand lgg2xtend mode' (4'6)the case of random error variances, 9 and obtainEBLUP estimator and associated standard errors inthe special case of x~ 1

    MacGibbon and Tomberlin (1989) and Malec, Se-dransk and Tompkins (1991) study logistic regres-sion models with random area-specific effects. Suchmodels are appropriate for binary response vari-ables when element-specific covariates are avail-able. MacGibbon and Tomberlin (1989) obtain EBestimators of small area proportions and associatedstandard errors, but they ignore the uncertaintyabout the prior parameters. Farrell, MacGibbonand Tomberlin (1992) apply the bootstrap method ofLaird and Louis (1987) to account for the underesti-mation of true posterior variance. Malec, Sedranskand Tompkins (1991) obtain HB estimators and as-sociated standard errors using Gibbs sampling andapply their method to data from the U.S. NationalHealth Interview Survey to produce estimates ofproportions for individual states.

    EB and HB methods have also been used for es-timating regional mortality and disease rates (see,e.g., Marshall, 1991). In these applications, the ob-served small area counts, yi, are assumed to be in-dependent Poisson with conditional mean E(yi einiOi, where Oi and ni respectively denote the truerate and number exposed in the ith area. Further,the Ois are assumed to be random with a specifieddistribution (e.g., a gamma distribution with un-known scale and shape parameters). The EB or HBestimators are shrinkage estimators in the sensetha t the crude rate yi/ni is shrunk towards an over-all regional rate, ignoring the spatial aspect of theproblem. Marshall (1991) proposes local shrink-age estimators obtained by shrinking the crude ratetowards a local neighbourhood rate. Such estima-tors are practically appealing and further work ontheir statistical properties is desirable.

    De Souza (1992) studies joint mortality rates oftwo cancer sites over several geographical areasand obtains asymptotic approximations to posteriormeans and variances using the general first orderapproximations given by Kass and Steffey (1989).The bivariate model leads to improved estimatorsfor each site compared to the estimators based onunivariate models.

    8 ON LUSIONIn this article, we have used the term small areato denote any local geographical area tha t is small or

    to describe any small o a population suchas a specific age-sex-race group o people within alarge geographical area. Sample sizes or small ar-eas are typically small because the overall samplesize in a survey is usually determined to provide de-sired accuracy a t a much higher level of aggregation.As a result, the usual direct estimators o a smallarea mean are unlikely to give acceptable reliabil-ity; and it becomes necessary to borrow strength*from related areas to find mire accurate estimatorsfor a given area or, simultaneously, for several ar-eas. Considerable attention has been given to suchindirect estimators in recent years.

    We have attempted to provide an appraisal of in-direct estimation covering both traditional design-based methods and newer model-based approachesto small area estimation. Traditional methods cov-ered here include demographic techniques for lo-cal estimation of population and other characteris-tics of interest in post-censal years, and syntheticand sample size dependent estimation. Model-basedmethods studied here include EBLUP, EB and HBestimation. Two types of basic small area modelsthat include random area-specific effects are usedto describe these methods. In the first type of mod-els, only area-specific auxiliary data are availablefor the population elements while in the second typeelement-specific auxiliary data are available for thepopulation elements.

    We have emphasized the importance of obtainingaccurate measures of uncertainty associated wi ththe model-based estimators. To this end, an approx-imately unbiased estimator of MSE of the EBLUPestimator is given as well as two methods of approx-imating the true posterior variance, irrespective ofthe form of the prior distribution on the model pa-rameters. The lat ter approximations may be usedas measures of uncertainty associated with the EBestimator. In the HB approach, a prior distributionon the model parameters is specified and the result-ing posterior variance is used as a measure of uncer-tainty associated with the HB estimator (posteriormean). We have also mentioned several applicationsof the model-based methods.

    We have also considered special problems th atmay be encountered in implementing model-basedmethods for small area estimation; in particu-lar, model diagnostics for small area models, con-strained estimation, 10cal'~ hrinkage, spatial mod-elling and borrowing strength across both small ar-eas and time. We anticipate quite a bit of futureresearch on these topics.

    Caution should be exercised in using or recom-

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    21/29

    7 M. GHOSH AND J N. K. RAOmending indirect estimators since they are basedon implicit or explicit models that connect the smallareas, unlike the direct estimators. As noted bySchaible 1 9 9 2 ) : Indirect estimators should be con-sidered when better alternatives are not available,but only with appropriate caution and in conjunc-tion with substantial research and evaluation ef-forts. Both producers and users must not forgetthat, even after such efforts, indirect estimates maynot be adequate for the intended purpose. (Also seeKalton, 1987. )Finally, we should emphasize the need for devel-oping an overall program that covers issues relatingto sample design and data evolvement, organizationand dissemination, in addition to those pertainingto methods of estimation for small areas.

    APPENDIXVariances and Covariance of 3: and d 2

    Let 8: and d 2 be the estimators of a: and a 2 ob-tained from the method of fitting constants. Thenv d 2 ) 2 ~ ~ ~ ~ 4v ~ , Z ) = 2 ~ 7 ; ~v ; l n p y X n p ) a 4

    + o ff + 29)*a2a:]with

    q.. W 1 w ~ ~ L A ; ~ ~ ~and

    2 1 - 1cov d , , 3: ) -2~7; vl n p * a4(See Stukel, 1991. )

    ACKNOWLEDGMENTSThe authors would like to thank the former edi-tor, J. V. Zidek, and a referee for several construc-tive suggestions. M. Ghosh was supported by NSFGrants DMS 89 01334 and SES-92-01210. N. K.Rao's research was supported by a grant from theNatural Sciences and Engineering Research Coun-cil of Canada.

    REFERENCESANDERSON,. W. and HSIA O, . 1981). Formu lation and estima-tion of dynamic models using panel dat a. J Econometrics 1867-82.

    BATTESE,G. E., HA RTER , . M. and FU LLE R, . A. 1988). Anerror-component model for prediction of county crop areasusing survey and satell i te data. J Amer. Statist. Assoc. 8328-36.BECKMAN, C. J. an d COO K, . D. 1987). Diag-. J. , NACHTS HEIM,nostics for mixed-model analysis of variance. Technometrics29 413-426.BOGUE,D. J. 1950). A techniq ue for mak ing extensive postcensalestimates. J Amer. S tatis t. Assoc. 45 149-163.BOGUE,D. J an d DU NC AN , . D. 1959). A composite methodof estimating p ost censal population of small area s by age,sex and colour. Vital Statistics -Spe cial Repo rt 47, No. 6, Na-tional Office of Vital Statistics, Washington, DC.BRACKSTONE,. J. 1987). Small area data: policy issues andtechnical challenges. In Sm all Area S tatist ics R. Platek, J.N. K Rao, C. E. Stirndal and M. P. Si ng h eds.) 3-20. Wiley,New York.CALVIN,. A. and SEDRANSK,. 1991). Bayesian and frequentistpredictive inference for th e patte rns of care studies . J Amer.Statist. Assoc. 86 36-48.CHAMBERS, G. A. 1977). Log line ar models for. L. and FEENE Y,small area estimation. Unpublished paper, Australian Bu-reau of Statistics.

    C H A U D H U R I ,. 1992). Sm all domain statistic s: a review. Tech-nical report ASCl9212, Ind ian S tatistical In stitu te, Calcutta.C H O U D H R Y ,. H. and RAO, J. N. K. 1989). Sm all ar ea esti-mation using models that combine t ime series and cross-sectional dat a. In Symp osium 89-Analysis of Da ta inrime-Proceedings A. C. Singh and P. Whitridge, eds.) 67-74. Statist ics Canada , Ottawa.CHRISTENSEN, L. M.a n d J O H N S O N ,., PEA RSO N, W. 1992). Ca sedeletion diagnostics for m ixed models. Technometrics34 38-45.CR ESS IE,. 1989). Empirical Bayes estim ation of undercount inthe decennial census. J Amer. Sta tist. Assoc. 84 1033-1044.CRESSIE,N. 1990a). Sm all ar ea prediction of undercount usingthe general linear model. In Symposium 90-Measurementan d Improvem ent of D ata Quality-Proceedings 93-105.Statist ics Canada, Ottawa.CRES SIE,. 1992). REML estimation in empirical Bayes smooth-ing of census und ercount. Survey Methodology 18 75-94.C R O N K I T E ,. R. 1987). U se of regres sion techniq ues for devel-oping state and area employment and unemployment esti-mates. In Sm all Area Statistics R. Platek, J. N. K. Rao, C.E. S arn dal a nd M. P. S ingh, eds.) 160-174. Wiley, New York.DATTA,G. S. and GHO SH, . 1991). Bayesian prediction in linearmodels: Applications to small area estimation. Ann. Statist.19 1748-1770.DATTA,G. S., GHO SH, ., HU AN G, C. T., SCHULTZ,. T., ISAKI, L.K. and TSAY, . H. 1992). Hierarchical an d empirical Bayesmethods for adjus tme nt of census undercount: th e 1980 hqis-souri dress re hearsa l data. Survey Methodology 18 95-108.DEMPSTER,. P. an d RYAN , . M. 1985). Weighted normal plots.J Amer. Sta tist. Assoc. 80 845-850.DESOUZA,C. M. 1992). An approximate bivariate Bayesianmethod for analysin g small frequencies. Biometries 48 1113-1130.DREW,D., SINGH , . P. and CHOUDHRY,. H. 1982). Evaluationof small area estimation techniques for the Can adian LabourForce Survey. Survey Methodology 17-4 7.EFRON,B. and M ORR IS, . 1972). Limiting the risk of Bayesan d empirical Bayes estimates-Part 11: the empirical Bayescase. J Amer. S tatis t. Assoc. 67 130-139.E R I C K S E N , P. 1974). A regression m ethod for estim ating pop-ulations of local areas. J Amer. Sta tist. Assoc. 69 867-875.ERICKSEN,. l? and KADANE, B. 1985). Estim ating the pop-ulation in a census year with discussion). J Amer. Statist.Assoc. 80 98-131.

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    22/29

    7M AREA ESTIMATION: AN APPRAISALERICKSEN, J. B. (1987). Sensitivity analysis. P. and KADANE,of local estimates of undercount in the 1980 U.S. Census. InSmall Area Statistics (R. Platek, J. N. K. Rao, C .E. Sarnd aland M. P. Singh, eds.) 23-45. Wiley, New York.ERICKSEN, J. B. (1992). Com ment on Should. P. and KADANE,we have adjuste d the U.S. Cen sus of 1980?, by D. A. Freed -m an an d W. C. Navidi. Survey Methodology 8 52-58.ERICKSEN,. P., KADANE,. B, and TUKEY, . W. (1 989 ). Ad-

    justing th e 1981 census of population an d housing. J Amer.Statist. A ssoc. 84 927-944.ERICSON,W. A. (1969). A note on t he posterior mean. J Roy.Statist. Soc. Ser. B 31 332-334.FARRELL, J., MA CGIBB ON, T.J. (1992).. and TOMBERLIN,evaluation of bootstrap techniques for correcting empiricalBayes interval estimates. Unpublished manuscript, Dept.Statistics a nd Ac tuarial Science, Univ. Waterloo.FAY,R. E. (1987). Application of multiv ariate regression to smalldomain estimation. In Small Area Statistics (R. Platek, J. N.K. Rao, C. E. Sa rn da l and M. P. Singh , eds.) 91-102. Wiley,New York.FAY,R. E. an d HE RRIO T, . A. (1979). Est ima tes of income forsmall places: an application of Jame s-Stein procedures tocensus data. J Amer. S tatist. Assoc. 74269-277.

    F R E E D M A N ,. A. and NAVIDI,W. C. (1986). Regressio n modelsfor adjusting the 1980 Census (with discussion).Statist. Sci.1 1-39.FRE EDM AN , W. C. (1992). Sho uld we have ad-. A. and NAVIDI,justed the U.S. Census of 1980? (with discussion). SurveyMethodology 18 3-74.F U L L E R ,W. A. an d BATTE SE, . E. (1973). Transformations forestimation of linear models with nested error structure. JAmer. S tatist. Assoc. 68626-632.F U L L E R ,W. A. an d HA RTE R,R. M. (1987). The multivariatecomponents of variance model for small area estimation. InSmall Area Statistics (R. Platek, J. N. K. Rao, C. E. S k n d a land M. P. Singh, eds.) 103-123. Wiley, New York.GELFAND,. and SMITH,A. F. M. (1990). Sam plin g based ap-proaches to calculating marginal densities. J Amer. Statist.Assoc. 85 398-409.GH AN GU RD E, M. P. (1977). Synthetic estimators. D. and SING H,in periodic households surveys. Survey Methodology 3 152-181.GHOSH,M. (1992a). Hierarchical and empirical Bayes multivari-ate es t imat ion. In Current Issues in Statistical Inference: Es-says in Honor o f D. Basu (M. Ghosh and P. K. Pathak, eds.)151-177. IMS, Hayw ard, CA.GHOSH,M. (1992b). Constrained Bayes estimation with applica-tions. J Amer. Sta tist. Assoc. 87533-540.GHOSH,M., DATTA, . S. and FA Y,R. E. (1991). Hiera rchica l an dempirical multivariate Bayes analysis in small are a estima-tion. In Proceeding of the Bureau of the Census Annual R e-search Conference63-79. Bure au of th e Census, Washington,DC.

    GHOSH,M. and LAHIRI,. (1987). Robust empirical Bayes esti-mation of means from stratified samples. J Amer. Statist.Assoc. 82 1153-1162.G O N Z A L E Z ,. E. (1973). Use a nd eva luation of synthetic estima-tors. In Proceedings of the Social Statistics Section 33-36.Arner. Sta tist. Assoc., Washington, DC.H A N S E N ,M., HURW ITZ, . N. an d MAD OW , . G. (1953). SampleSurvey Methods and Theory 1 Wiley, New York.HA RVI LLE ,. A. (1991). Com ment on, That BLU P is a goodthing: The estim ation of randb m effects, by G. K. Robin-son. Statist. Sci. 6 35-39.HENDERSON,. R. (1950). Estimation of genetic parameters (ab-stract). Ann. Math. Statist. 21 309-310.HOLT,D., SMITH, . M. F. and TOM BERL IN,. J. (1979). A model-based approach to estimation for small subgroups of a pop-ulation. J Amer. Sta tist. Assoc. 74 405-410.

    HU LTI NG , D. A. (1991). Some Bayesian and. L. and HAR VILLE ,non-Bayesian procedures for the analysis of comparative ex-periments an d for small are a estimation: computational as-pects, frequentist properties, and relationships. J Amer.Statist. A ssoc. 86 557-568.I SA K I, C . T., S C H U L T Z , S ~ ~ I T H ,. K. P. J. and DIFFENDAL,D. J. (1987). Small area estimation research for censusundercount-progress repo rt. In Small Area Statistics (R.Platek, J. N. K. Rao, C.E. Sarndal and M.P. Singh, eds.)219-238. Wiley, New York.J U D G E ,G. G. (1985). The Theory and Practice of Econometrics2nd ed. Wiley, New York.KACKAR,. N. and HARVILLE,. A. (1984). Approximations forstanda rd e rrors of estimators for fixed an d random effects inmixed m odels. J Amer. S tatist. Assoc. 79 853-862.KALTON,G. (1987). Discussion. In Small Area Statistics (R.Platek, J . N. K. Rao, C. E. Sarndal and M. P. Singh, eds.)264-266 . Wiley, New Y ork.KALTON,G., KORDOS, . and PLATEK, . (1993). Small AreaStatistics and Survey Designs Vol. I Invited Papers; Vol. 11:Contributed Papers and Panel Discussion. Central Statisti-cal Office, Warsaw.a s s , R. E. and ST EFF EY , . (1989). Approxim ate Bayesian infer-ence in conditionally independe nt hierarchical m odels (pa ra-metric empirical Bayes models). J Amer. Statist. Assoc. 84717-726.KLEFFE,J. and RA O, . N. K. (1992). Estimation of mean squ areerror of empirical best linear unbiased predictors under arandom error variance linear model. J Multivariate Anal.43 1-15.LAH IRI,. (1990). 'Adjusted Bayes and empirical Bayes estima-tion in finite population sam pling. Sankhy6 Ser. B 5250-66.LAHIRI,P. and RAO, J. N. K. (1992). Robust estimation ofmean square error of small area estimators. Unpublishedmanuscript.LAIRD,N. M. a nd L OU IS, . A. (1987). Empirical Bayes confidenceintervals based on b ootstrap samples. J Amer. Sta tist. Assoc.82 739-750.

    LANCE,N. an d RYAN , . (1989). Assessing normality i n randomeffects models. Ann. Statist. 17624-642.Lours T. (1984). Estimating a population of parameter valuesusing Bayes a nd em pirical Bayes methods. J Amer. Statist.Assoc. 79 393-398.MACGIBBO N, T.J. (1989). Sma ll are a estima-. and TOMBERLIN,tion of proportions via empirical Bayes techniques. SurveyMethodology 15 237-252.MALEC,D., SED RAN SK, L. (1991). Bayesian pre-. and TOMPKINS,dictive inference for small ar eas for binary variables in th eNational Hea lth Interview Survey. Unpublished manuscript.M A R K E R ,D. A. (1983). Organization of small area estimators.Proceedings of Survey Research Methods Section 409-414.Arner. St atist. Assoc., Washington , DC.MARSHALL,. J. (1991). Mapping disease and mortality ratesusing empirical Bayes estimators. J Roy. Statist. Soc. Ser.C 40 283-294.MCCULLAGH,. and ZIDEK, . (1987). Regression methods andperformance criteria for small area population estimation. InSmall Area Statistics (R. Platek, J. N. K Rao, C. E. Sarnda lan d M. P. Sin gh , eds .) 62-74. Wiley, New York.MORRIS,C. (1983). Par am etri c empirical Bayes inference: Theoryand applications (with discussions). J Amer. Sta tist. Assoc.7847-65.M O R R I S O N ,. (1971). Demo graphic information for cities: a m an-ual for estimating and projecting local population character-istics. RAND repo rt R-618-HUD.NATIONALENTERFOR HE AL THSTA TIST ICS1968). SyntheticState Estimates of Disability. P.H.S. Publication 1759. U.S.Government Printing Office, Washington, DC.

    NATIONAL O N DRUG (1979). Synthetic EstimatesNSTITUTE ABUSE

  • 8/13/2019 GhoshAndRao_1994 SAE-An Appraisal

    23/29

    7 M GHOSH AND J. N. R Ofor Sm all Areas (research monograph 24). U.S. GovernmentPrinting Office, Washington, DC.NATIONALESEARCHOUNCIL1980). Panel on Small-Area Esti-mates of Population a nd Income. Estim ating Population a ndIncome of S ma ll Areas. Natio nal Academy Press, W ashing-ton, DC.NICHO L,. (1977). A regression approach to small are a estima-tion. Unpublished manuscript, Australian Bureau of Statis-tics, Canberra, Australia.NTIS (1963). Indirect estimators in federal programs. Statisti-cal policy working paper 21, prepared by the Subcommitteeon Sm all Area Estimation, Federal Com mittee on StatisticalMethodology, Office of M anag emen t an d Bu dget, W ashing-ton, DC.PFEF FERM AN N, L. (1990). Robust small area esti-. and BURCK,mation combining time series and cross-sectional data. S ur -vey Metho dology 16 217-237.PFEFFER MAN N, C. (1991). Some new estimators. and BARNARD,for small area mean s with applications to t he assessment offarmland values. Jou rn al of Business and Economic Statis-tics 9 73-84.PLA TEK , . and SIN GH , . P. (1986). Sm all Area Statistics: Con-tributed Papers. Laboratory for Research in Statistics andProbability, Carleto n Univ.PLATEK, ., RAO,J. N. K. SARND AL,. E, an d SINGH,M. P.(1987). Sma ll Area Statistics. Wiley, New York.PRASAD , . G. N. and RA O, J N. K. (1990). The estimationof mean squared errors of small-area estimators. J Amer.Statist. Assoc. 85 163-171.PUR CEL L, S. (1976). Techniques for the estima-. J and LINACRE,tion of small area characteristics. Unpublished manuscript.PURCELL,. J. an d KIsH,L (1979). Estima tion for small domain.Biometries 35 365-384.PUR CEL L,. J an d KIsH,L (1980). Postcensal e stimates for localar eas (or domains). Intern at. Statist. Rev. 48 3-18.RAO,C. R. and SHINOZAKI,. (1978). Precision of individual esti-mates i n simultaneous estimation of parameters. Biometrika65 23-30.RAO,J N. K. (1986). Synthetic estimators, SPREE and bestmodel based predic tors. I n Proceedings of the Conference onSurvey Research Method s in Agriculture 1-16. U S . Dept.Agriculture, W ashington, DC.RAO,J. N. K. a nd Yu, M. (1992). Sma ll area estimation by com-bining tim e series an d cross-sectional data. I n Proceedingsof the Survey Research M ethods Section 1-19. Arner. Sta tis t.

    CommentNoel Cressie and Mark S Kaiser

    Malay Ghosh and Jon Rao have presented uswith a well written exposition of the topic of smallarea estimation. The past literature has been de-

    Noel Cressie is Professor of Statistics and Distin-guished Professor in Liberal Art s an d Sciences andMark S . Kaiser is Assistant Professor Departmentof Statistic s Iowa Sta te University Snedecor HallAmes Iowa 50011 1201.

    Assoc., Alexandria, VA.ROBINSON,. K. (1991). Th at BLUP is a good thing: th e estima-tion of random effects (with discussion). Stati st. Sci. 6 15-51.SA RN DA L, M. A. (1989). Sma ll domain. E. and HIDIROGLOU,estimation: a con ditional analysis. J. Amer. Statis t. Assoc.84266-275.SC HA IBL E,. L. (1978