Variance Estimation in Complex Surveys (1).ppt

  • 7/30/2019 Variance Estimation in Complex Surveys (1).ppt

    Variance Estimation in Complex Surveys

    Drew Hardin

    Kinfemichael Gedif

    So far...

    Variance for an estimated mean and total under

    SRS, stratified, cluster (single-stage, multi-stage), etc.

    Variance for estimating a ratio of two means under

    SRS (we used the linearization method)

    What about other cases?

    Variance for estimators that are not linear combinations of means and totals

    Ratios

    Variance for estimating other statistics from complex surveys

    Median, quantiles, functions of EMF, etc.

    Other approaches are necessary

    Outline

    Variance Estimation Methods

    Linearization

    Random Group Methods

    Balanced Repeated Replication (BRR)

    Resampling techniques

    Jackknife, Bootstrap

    Adapting to complex surveys

    Hot research areas

    References

    Linearization (Taylor Series Methods)

    We have seen this before (ratio estimator and other courses).

    Suppose our statistic is non-linear. It can often be approximated using Taylor's Theorem.

    We know how to calculate variances of linear functions of means and totals.

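    Linearization (Taylor Series Methods)

    Linearize:

    \[
    h(\hat t_1, \hat t_2, \ldots, \hat t_k) \approx h(t_1, t_2, \ldots, t_k)
      + \sum_{j=1}^{k} (\hat t_j - t_j)
        \left.\frac{\partial h(c_1, c_2, \ldots, c_k)}{\partial c_j}\right|_{t_1, t_2, \ldots, t_k}
    \]

    Calculate variance:

    \[
    V\left[h(\hat t_1, \ldots, \hat t_k)\right] \approx
      \sum_{j=1}^{k} \left(\frac{\partial h}{\partial t_j}\right)^2 V(\hat t_j)
      + 2 \sum_{i<j} \frac{\partial h}{\partial t_i}\,\frac{\partial h}{\partial t_j}\,
        \operatorname{Cov}(\hat t_i, \hat t_j)
    \]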
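
    As an illustration of the two steps (linearize, then calculate the variance), here is a minimal Python sketch for the ratio of two means, h(a, c) = a / c, under SRS. The function names and the data are hypothetical, and the finite population correction is ignored for simplicity.

```python
from statistics import mean, variance


def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def ratio_linearization_variance(y, x):
    """Linearization variance of B = ybar / xbar (SRS, fpc ignored)."""
    n = len(y)
    ybar, xbar = mean(y), mean(x)
    # Partial derivatives of h(a, c) = a / c evaluated at (ybar, xbar).
    dh_dy = 1.0 / xbar
    dh_dx = -ybar / xbar ** 2
    # Estimated variances and covariance of the two sample means.
    v_y = variance(y) / n
    v_x = variance(x) / n
    c_yx = sample_cov(y, x) / n
    v = dh_dy ** 2 * v_y + dh_dx ** 2 * v_x + 2 * dh_dy * dh_dx * c_yx
    return ybar / xbar, v


# Made-up illustration data.
y = [12.0, 15.0, 9.0, 20.0, 14.0]
x = [30.0, 41.0, 22.0, 50.0, 33.0]
b_hat, v_hat = ratio_linearization_variance(y, x)
print(b_hat, v_hat)
```

    For the ratio, the result agrees with the familiar residual form: the sample variance of y_i - B x_i divided by n times xbar squared.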

    Linearization (Taylor Series Methods)

    Pro:

    Can be applied in general sampling designs

    Theory is well developed

    Software is available

    Con:

    Finding partial derivatives may be difficult

    A different method is needed for each statistic

    The function of interest may not be expressible as a smooth function of population totals or means

    Accuracy of the linearization approximation

    Random Group Methods

    Based on the concept of replicating the survey design

    Not usually possible to merely go and replicate the survey

    However, often the survey can be divided into R groups so that each group forms a miniature version of the survey

    Random Group Methods

    Stratum 1: 1 2 3 4 5 6 7 8

    Stratum 2: 1 2 3 4 5 6 7 8

    Stratum 3: 1 2 3 4 5 6 7 8

    Stratum 4: 1 2 3 4 5 6 7 8

    Stratum 5: 1 2 3 4 5 6 7 8

    Treat as miniature sample

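    Unbiased Estimator (Average of Samples)

    \[
    \hat V_1(\tilde\theta) = \frac{1}{R(R-1)} \sum_{r=1}^{R} \left(\hat\theta_r - \tilde\theta\right)^2
    \]

    Slightly Biased Estimator (All Data)

    \[
    \hat V_2(\hat\theta) = \frac{1}{R(R-1)} \sum_{r=1}^{R} \left(\hat\theta_r - \hat\theta\right)^2
    \]

    Here $\hat\theta_r$ is the estimate from random group $r$, $\tilde\theta$ is the average of the $R$ group estimates, and $\hat\theta$ is the estimate from all the data.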
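
    The two random group estimators differ only in the point the group estimates are centered on. A minimal Python sketch, with a hypothetical function name and made-up group estimates:

```python
def random_group_variance(group_estimates, center=None):
    """Random group variance estimate from R group estimates.

    With center=None the group estimates are centered at their own average
    (the first estimator); passing the full-sample estimate as center gives
    the second, slightly biased estimator.
    """
    r = len(group_estimates)
    if center is None:
        center = sum(group_estimates) / r
    return sum((t - center) ** 2 for t in group_estimates) / (r * (r - 1))


theta_r = [10.2, 9.8, 10.5, 9.9]          # made-up estimates from R = 4 groups
v1 = random_group_variance(theta_r)        # centered at the group average
v2 = random_group_variance(theta_r, 10.0)  # centered at an assumed full-sample estimate
print(v1, v2)
```

    Because the average minimizes the sum of squared deviations, the second estimate is never smaller than the first.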

    Random Group Methods

    Pro:

    Easy to calculate

    General method (can also be used for non-smooth functions)

    Con:

    Assumption of independent groups (a problem when N is small)

    Small number of groups (particularly if one stratum is sampled only a few times)

    Survey design must be replicated in each random group (the strata and cluster structure must remain the same)

    Resampling and Replication Methods

    Balanced Repeated Replication (BRR)

    Special case when $n_h = 2$

    Jackknife (Quenouille 1949; Tukey 1958)

    Bootstrap (Efron 1979; Shao and Tu 1995)

    These methods:

    Extend the idea of the random group method

    Allow replicate groups to overlap

    Are all-purpose methods

    Asymptotic properties?

    Balanced Repeated Replication

    Suppose we had sampled 2 per stratum

    There are $2^H$ ways to pick 1 from each stratum.

    Each combination could be treated as a sample.

    Pick R samples.

    Balanced Repeated Replication

    Which samples should we include?

    Assign each value either 1 or -1 within the stratum

    Select samples that are orthogonal to one another to create balance

    You can use the design matrix for a fractional factorial

    Specify a vector $\alpha_r$ of 1, -1 values, one entry for each stratum

    Estimator

    \[
    \hat V_{BRR}(\hat\theta) = \frac{1}{R} \sum_{r=1}^{R} \left(\hat\theta(\alpha_r) - \hat\theta\right)^2
    \]
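
    A small Python sketch of BRR for H = 3 strata with 2 PSUs each. The sign vectors are taken from a 4 x 4 Hadamard matrix (its last three columns), which is one way to obtain a balanced set; the function names, the weights, and the data are hypothetical.

```python
# Rows are replicates (the sign vectors), columns are strata.
# The columns are mutually orthogonal and each sums to zero.
ALPHA = [
    [+1, +1, +1],
    [-1, +1, -1],
    [+1, -1, -1],
    [-1, -1, +1],
]


def half_sample_mean(psu_values, weights, signs):
    """Weighted mean using one PSU per stratum: +1 picks the first, -1 the second."""
    return sum(w * (pair[0] if s == 1 else pair[1])
               for w, pair, s in zip(weights, psu_values, signs))


def brr_variance(psu_values, weights, alpha, g=lambda m: m):
    """BRR variance of theta = g(stratified mean) with two PSUs per stratum."""
    full = g(sum(w * (a + b) / 2 for w, (a, b) in zip(weights, psu_values)))
    reps = [g(half_sample_mean(psu_values, weights, signs)) for signs in alpha]
    return sum((t - full) ** 2 for t in reps) / len(reps)


psu_values = [(10.0, 14.0), (20.0, 26.0), (5.0, 9.0)]  # made-up PSU values
weights = [0.5, 0.3, 0.2]                              # made-up stratum weights W_h
v_brr = brr_variance(psu_values, weights, ALPHA)
print(v_brr)
```

    For a linear statistic such as the stratified mean, a balanced set of half-samples reproduces the usual stratified variance estimate, the sum over strata of W_h^2 s_h^2 / 2.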

    Balanced Repeated Replication

    Pro:

    Relatively few computations

    Asymptotically equivalent to linearization methods for smooth functions of population totals and quantiles

    Can be extended to use weights

    Con:

    Requires 2 PSUs per stratum

    Can be extended with more complex schemes

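    The Jackknife: SRS with replacement

    Quenouille (1949); Tukey (1958); Shao and Tu (1995)

    Let $\hat\theta_{(i)}$ be the estimator of $\theta$ after omitting the $i$th observation

    Jackknife estimate:

    \[
    \tilde\theta_J = \sum_{i=1}^{n} \tilde\theta_i / n,
    \quad\text{where}\quad \tilde\theta_i = n\hat\theta - (n-1)\hat\theta_{(i)}
    \]

    Jackknife estimator of the variance $V(\hat\theta)$:

    \[
    \hat V_J(\hat\theta)
      = \frac{n-1}{n} \sum_{i=1}^{n} \left(\hat\theta_{(i)} - \hat\theta_{(\cdot)}\right)^2
      = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left(\tilde\theta_i - \tilde\theta_J\right)^2,
    \quad\text{where}\quad \hat\theta_{(\cdot)} = \sum_{i=1}^{n} \hat\theta_{(i)} / n
    \]

    For stratified SRS without replacement, see Jones (1974)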
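
    The delete-one jackknife can be sketched in a few lines of Python. The function name and data are hypothetical; `estimator` can be any function of the sample.

```python
def jackknife(data, estimator):
    """Delete-one jackknife estimate and variance via pseudovalues."""
    n = len(data)
    theta_hat = estimator(data)
    # Estimator recomputed with the i-th observation omitted.
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    # Pseudovalues: n * theta_hat - (n - 1) * theta_hat_(i).
    pseudo = [n * theta_hat - (n - 1) * t for t in leave_one_out]
    theta_j = sum(pseudo) / n
    v_j = sum((p - theta_j) ** 2 for p in pseudo) / (n * (n - 1))
    return theta_j, v_j


def sample_mean(xs):
    return sum(xs) / len(xs)


data = [2.0, 4.0, 4.0, 5.0, 10.0]  # made-up sample
theta_j, v_j = jackknife(data, sample_mean)
print(theta_j, v_j)
```

    For the sample mean, the pseudovalues are the observations themselves, so the jackknife variance equals s^2 / n.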

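    The Jackknife: stratified multistage design

    In stratum $h$, delete one PSU at a time

    Let $\hat\theta_{(hi)}$ be the estimator of the same form as $\hat\theta$ when PSU $i$ of stratum $h$ is omitted:

    \[
    \hat\theta_{(hi)} = g(\bar y_{(hi)}),
    \quad\text{where}\quad
    \bar y_{(hi)} = \sum_{h' \neq h} W_{h'} \bar y_{h'} + W_h \,\frac{n_h \bar y_h - y_{hi}}{n_h - 1}
    \]

    Jackknife estimate, using pseudovalues:

    \[
    \tilde\theta_{(hi)} = n_h \hat\theta - (n_h - 1)\hat\theta_{(hi)}; \qquad
    \tilde\theta_J^{I} = \sum_{h=1}^{L} \sum_{i=1}^{n_h} \tilde\theta_{(hi)} / n, \qquad
    \tilde\theta_J^{II} = \frac{1}{L} \sum_{h=1}^{L} \frac{1}{n_h} \sum_{i=1}^{n_h} \tilde\theta_{(hi)}
    \]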

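    The Jackknife: stratified multistage design

    Different formulae for $V(\hat\theta)$:

    \[
    \hat V(\hat\theta) = \sum_{h=1}^{L} \frac{n_h - 1}{n_h} \sum_{i=1}^{n_h}
      \left(\hat\theta_{(hi)} - \hat\theta^{\,method}\right)^2
    \]

    Where $\hat\theta^{\,method}$ can be $\hat\theta$, $\hat\theta_{(h)}$, $\sum_{h=1}^{L}\sum_{i=1}^{n_h} \hat\theta_{(hi)} / n$, or $\sum_{h=1}^{L} \hat\theta_{(h)} / L$

    Using the pseudovalues:

    \[
    \hat V^{(j)}(\hat\theta) = \sum_{h=1}^{L} \frac{1}{n_h (n_h - 1)} \sum_{i=1}^{n_h}
      \left(\tilde\theta_{(hi)} - \tilde\theta_J^{(j)}\right)^2, \qquad j = I, II
    \]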
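
    A minimal Python sketch of the delete-one-PSU jackknife, assuming the statistic has the form g(sum of W_h * ybar_h) and centering the replicates at the full-sample estimate (one of the centering choices listed above). The function name, weights, and PSU values are hypothetical.

```python
def stratified_jackknife(strata, weights, g=lambda m: m):
    """Delete-one-PSU jackknife for theta = g(sum_h W_h * ybar_h).

    strata[h] holds the PSU-level values of stratum h; the replicates are
    centered at the full-sample estimate.
    """
    means = [sum(s) / len(s) for s in strata]
    ybar = sum(w * m for w, m in zip(weights, means))
    theta_hat = g(ybar)
    v = 0.0
    for s, w, ybar_h in zip(strata, weights, means):
        n_h = len(s)
        for y_hi in s:
            # Stratum mean after deleting PSU i; other strata are unchanged.
            ybar_h_del = (n_h * ybar_h - y_hi) / (n_h - 1)
            theta_hi = g(ybar + w * (ybar_h_del - ybar_h))
            v += (n_h - 1) / n_h * (theta_hi - theta_hat) ** 2
    return theta_hat, v


strata = [[3.0, 5.0, 7.0], [10.0, 14.0]]  # made-up PSU values
weights = [0.6, 0.4]                      # made-up stratum weights W_h
theta_hat, v_jack = stratified_jackknife(strata, weights)
print(theta_hat, v_jack)
```

    With g the identity, this reduces to the usual stratified estimate, the sum of W_h^2 s_h^2 / n_h; a nonlinear g, for example `lambda m: m ** 2`, can be passed in the same way.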

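    The Bootstrap: naive bootstrap

    Efron (1979); Rao and Wu (1988); Shao and Tu (1995)

    Resample $\{y^*_{hi}\}_{i=1}^{n_h}$ with replacement in stratum $h$

    Estimate, for $b = 1, 2, \ldots, B$:

    \[
    \bar y_h^{*(b)} = n_h^{-1} \sum_{i} y_{hi}^{*(b)}, \quad
    \bar y^{*(b)} = \sum_{h} W_h \bar y_h^{*(b)}, \quad\text{and}\quad
    \hat\theta^{*(b)} = g(\bar y^{*(b)})
    \]

    Variance:

    \[
    V_{NBS}(\hat\theta^*) = E_*\left(\hat\theta^* - E_*\hat\theta^*\right)^2
    \]

    Or approximate by

    \[
    \hat V_{NBS}(\hat\theta^*) \approx \frac{1}{B-1} \sum_{b=1}^{B}
      \left(\hat\theta^{*(b)} - \hat\theta^{*(\cdot)}\right)^2,
    \qquad \hat\theta^{*(\cdot)} = \frac{1}{B} \sum_{b=1}^{B} \hat\theta^{*(b)}
    \]

    The estimator is not a consistent estimator of the variance of a general nonlinear statistic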
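
    The Monte Carlo approximation of the naive bootstrap can be sketched as follows. This is a minimal illustration with hypothetical names and made-up data; B, the seed, and the statistic g are assumptions of the example.

```python
import random


def naive_bootstrap_variance(strata, weights, g=lambda m: m, B=1000, seed=0):
    """Monte Carlo naive bootstrap: resample n_h units with replacement per stratum."""
    rng = random.Random(seed)
    reps = []
    for _ in range(B):
        ybar_star = 0.0
        for s, w in zip(strata, weights):
            resample = rng.choices(s, k=len(s))  # n_h draws with replacement
            ybar_star += w * sum(resample) / len(resample)
        reps.append(g(ybar_star))
    center = sum(reps) / B
    return sum((t - center) ** 2 for t in reps) / (B - 1)


strata = [[3.0, 5.0, 7.0, 9.0], [10.0, 14.0, 12.0]]  # made-up data
weights = [0.6, 0.4]                                 # made-up stratum weights W_h
v_nbs = naive_bootstrap_variance(strata, weights, B=4000, seed=1)
print(v_nbs)
```

    Fixing the seed makes the run reproducible; the value still carries Monte Carlo error that shrinks as B grows.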

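    The Bootstrap: naive bootstrap

    For $\hat\theta^* = \bar y^* = \sum_h W_h \bar y_h^*$:

    \[
    \operatorname{Var}_*(\bar y^*) = \sum_{h} \frac{W_h^2}{n_h}\,\frac{n_h - 1}{n_h}\, s_h^2
    \]

    Comparing with

    \[
    \operatorname{Var}(\bar y) = \sum_{h} \frac{W_h^2}{n_h}\, s_h^2
    \]

    The ratio $\operatorname{Var}_*(\bar y^*) / \operatorname{Var}(\bar y)$ does not converge to 1 for bounded $n_h$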
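
    The bias of the naive bootstrap in the linear case can be checked exactly on a tiny example by enumerating every possible resample instead of drawing at random. A Python sketch with hypothetical names and made-up data, using n_h = 2 in both strata so that the factor (n_h - 1)/n_h is 1/2:

```python
from itertools import product
from statistics import mean, variance


def exact_naive_bootstrap_variance(strata, weights):
    """Exact bootstrap variance of the stratified mean by full enumeration."""
    # All n_h ** n_h equally likely resample means, stratum by stratum.
    per_stratum = [[mean(r) for r in product(s, repeat=len(s))] for s in strata]
    values = [sum(w * m for w, m in zip(weights, combo))
              for combo in product(*per_stratum)]
    center = mean(values)
    return sum((v - center) ** 2 for v in values) / len(values)


def customary_variance(strata, weights):
    """Usual stratified estimate: sum of W_h^2 * s_h^2 / n_h."""
    return sum(w ** 2 * variance(s) / len(s) for s, w in zip(strata, weights))


strata = [[3.0, 7.0], [10.0, 14.0]]  # made-up data, n_h = 2 in each stratum
weights = [0.6, 0.4]                 # made-up stratum weights W_h
ratio = exact_naive_bootstrap_variance(strata, weights) / customary_variance(strata, weights)
print(ratio)
```

    With two units per stratum the naive bootstrap variance is half the customary estimate, no matter how many strata there are.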

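    The Bootstrap: modified bootstrap

    Resample $\{y^*_{hi}\}_{i=1}^{m_h}$, $m_h \geq 1$, with replacement in stratum $h$ ($m_h < n_h$)

    Calculate:

    \[
    \tilde y_{hi} = \bar y_h + \frac{m_h^{1/2}}{(n_h - 1)^{1/2}} \left(y^*_{hi} - \bar y_h\right), \quad
    \tilde y_h = \sum_{i=1}^{m_h} \tilde y_{hi} / m_h, \quad
    \tilde y = \sum_{h=1}^{L} W_h \tilde y_h, \quad
    \tilde\theta = g(\tilde y)
    \]

    Variance:

    \[
    V_{MBS}(\tilde\theta) = E_*\left(\tilde\theta - E_*\tilde\theta\right)^2
    \]

    Can be approximated with Monte Carlo

    For the linear case, it reduces to the customary unbiased variance estimator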
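
    The claim for the linear case can be verified exactly on a small example by enumerating every resample of size m_h and applying the rescaling sqrt(m_h / (n_h - 1)). The Python sketch below uses hypothetical names, made-up data, and arbitrarily chosen m_h.

```python
from itertools import product
from statistics import mean, variance


def exact_rescaled_bootstrap_variance(strata, weights, m):
    """Exact variance of the rescaled bootstrap mean by full enumeration.

    m[h] is the resample size in stratum h; each resampled value is pulled
    toward the stratum mean by the factor sqrt(m_h / (n_h - 1)).
    """
    per_stratum = []
    for s, m_h in zip(strata, m):
        ybar_h = mean(s)
        scale = (m_h / (len(s) - 1)) ** 0.5
        per_stratum.append([
            mean([ybar_h + scale * (y - ybar_h) for y in r])
            for r in product(s, repeat=m_h)
        ])
    values = [sum(w * t for w, t in zip(weights, combo))
              for combo in product(*per_stratum)]
    center = mean(values)
    return sum((v - center) ** 2 for v in values) / len(values)


def customary_variance(strata, weights):
    """Usual stratified estimate: sum of W_h^2 * s_h^2 / n_h."""
    return sum(w ** 2 * variance(s) / len(s) for s, w in zip(strata, weights))


strata = [[3.0, 5.0, 7.0, 9.0], [10.0, 14.0, 12.0]]  # made-up data
weights = [0.6, 0.4]                                 # made-up stratum weights W_h
v_mbs = exact_rescaled_bootstrap_variance(strata, weights, m=[2, 1])
print(v_mbs, customary_variance(strata, weights))
```

    The two printed values agree up to rounding, whatever m_h is chosen, which is the point of the rescaling.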

    More on bootstrap

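    The method can be extended to stratified SRS without replacement by simply changing

    \[
    \tilde y_{hi} \quad\text{to}\quad
    \tilde y_{hi} = \bar y_h + \frac{m_h^{1/2}}{(n_h - 1)^{1/2}} (1 - f_h)^{1/2} \left(y^*_{hi} - \bar y_h\right)
    \]

    where $f_h = n_h / N_h$ is the sampling fraction in stratum $h$

    For $m_h = n_h - 1$, this method reduces to the naive bootstrap

    For $n_h = 2$, $m_h = 1$, the method reduces to the random half-sample replication method

    For $n_h > 3$, for the choice of $m_h$ see Rao and Wu (1988)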

    Simulation: Rao and Wu (1988)

    Jackknife and linearization intervals gave substantial bias for nonlinear statistics in one-sided intervals

    The bootstrap performs best for one-sided intervals (especially when $m_h = n_h - 1$)

    For two-sided intervals, the three methods have similar performance in coverage probabilities

    The jackknife and linearization methods are more stable than the bootstrap

    B = 200 is sufficient

    Hot topics

    Jackknife with non-smooth functions (Rao and Sitter 1996)

    Two-phase variance estimation (Graubard and Korn 2002; Rubin-Bleuer and Schiopu-Kratina 2005)

    Estimating Function (EF) bootstrap method (Rao and Tausi 2004)

    Software

    OSIRIS: BRR, Jackknife

    SAS: Linearization

    Stata: Linearization

    SUDAAN: Linearization, Bootstrap, Jackknife

    WesVar: BRR, Jackknife, Bootstrap

    References:

    Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1-26.

    Graubard, B. I., and Korn, E. L. (2002). Inference for superpopulation parameters using sample surveys. Statistical Science, 17, 73-96.

    Krewski, D., and Rao, J. N. K. (1981). Inference from stratified samples: Properties of linearization, jackknife, and balanced replication methods. Annals of Statistics, 9, 1010-1019.

    Quenouille, M. H. (1949). Problems in plane sampling. Annals of Mathematical Statistics, 20, 355-375.

    Rao, J. N. K., and Wu, C. F. J. (1988). Resampling inference with complex survey data. JASA, 83, 231-241.

    Rao, J. N. K., and Tausi, M. (2004). Estimating function variance estimation under stratified multistage sampling. Communications in Statistics, 33, 2087-2095.

    Rao, J. N. K., and Sitter, R. R. (1996). Discussion of Shao's paper. Statistics, 27, pp. 246-247.

    Rubin-Bleuer, S., and Schiopu-Kratina, I. (2005). On the two-phase framework for joint model and design-based inference. Annals of Statistics (to appear).

    Shao, J., and Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag.

    Tukey, J. W. (1958). Bias and confidence in not-quite large samples. Annals of Mathematical Statistics, 29, 614.

    Not cited in the presentation:

    Wolter, K. M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.

    Shao, J. (1996). Resampling methods in sample surveys. Invited paper, Statistics, 27, pp. 203-237, with discussion, 237-254.