  • The Jackknife

    The Jackknife

    Patrick Breheny

    September 6

    Patrick Breheny, STA 621: Nonparametric Statistics

  • The Jackknife

    Introduction

    Over the last few lectures, we described influence functions as a tool for assessing the standard error of statistical functionals and obtaining nonparametric confidence intervals

    Today we will discuss a tool called the jackknife

    Like influence functions, the jackknife can be used to estimate standard errors in a nonparametric way

    The jackknife can also be used to obtain nonparametric estimates of bias

    Although superficially different, we will see that the jackknife is built on essentially the same idea as the influence function, even though the jackknife was proposed much earlier (1949)

  • The Jackknife

    Jackknife setup and notation

    Suppose we have an estimator $\hat\theta$ which can be computed from a sample $\{x_i\}$

    Jackknife methods revolve around computing the estimates $\{\hat\theta_{(i)}\}$, where $\hat\theta_{(i)}$ denotes the estimate calculated from the data with the $i$th observation removed (these are sometimes called the leave-one-out estimates)

    Finally, let

    $$\bar\theta = \frac{1}{n} \sum_{i=1}^{n} \hat\theta_{(i)}$$
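
    As a minimal sketch of this setup (in Python rather than R, using only the standard library), the leave-one-out estimates and their average can be computed as follows. The helper name `leave_one_out` and the toy sample are assumptions made for illustration; they do not come from the slides.

```python
from statistics import mean


def leave_one_out(x, theta):
    """Return theta applied to x with the i-th observation removed, for each i."""
    return [theta(x[:i] + x[i + 1:]) for i in range(len(x))]


# Hypothetical toy sample, chosen only for illustration.
x = [2.0, 4.0, 4.0, 5.0, 10.0]

loo = leave_one_out(x, mean)   # the leave-one-out estimates
theta_bar = mean(loo)          # their average
```

    Here `x` is assumed to be a Python list, so that slicing and concatenation drop exactly one observation at a time.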

  • The Jackknife

    Jackknife estimation of bias

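    Note that, if $\hat\theta$ is unbiased,

    $$E\,\bar\theta = \frac{1}{n} \sum_{i=1}^{n} E\,\hat\theta_{(i)} = \theta$$

    However, suppose that $E(\hat\theta) = \theta + an^{-1} + bn^{-2} + O(n^{-3})$

    Then

    $$E(\bar\theta - \hat\theta) = \frac{a}{n(n-1)} + O(n^{-3})$$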

  • The Jackknife

    Jackknife estimation of bias

    Thus,

    $$b_{jack} = (n-1)(\bar\theta - \hat\theta),$$

    the jackknife estimate of bias, is correct up to second order

    Furthermore, the bias-corrected jackknife estimate,

    $$\hat\theta_{jack} = \hat\theta - b_{jack},$$

    is an unbiased estimate of $\theta$, again up to second order
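
    The "second order" claim can be checked with a short calculation. This is a sketch under the expansion assumed on the previous slide, writing $\hat\theta$ for the estimator, $\bar\theta$ for the average of the leave-one-out estimates, and $a$, $b$ for the constants in $E(\hat\theta) = \theta + an^{-1} + bn^{-2} + O(n^{-3})$:

```latex
\begin{align*}
E(b_{jack}) &= (n-1)\,E(\bar\theta - \hat\theta)
             = (n-1)\left\{\frac{a}{n(n-1)} + O(n^{-3})\right\}
             = \frac{a}{n} + O(n^{-2}), \\
\mathrm{Bias}(\hat\theta) &= E(\hat\theta) - \theta
             = \frac{a}{n} + \frac{b}{n^{2}} + O(n^{-3})
             = \frac{a}{n} + O(n^{-2}), \\
E(\hat\theta_{jack}) - \theta &= \mathrm{Bias}(\hat\theta) - E(b_{jack}) = O(n^{-2}).
\end{align*}
```

    In other words, the leading $a/n$ term of the bias cancels, and what remains is of order $n^{-2}$.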

  • The Jackknife

    Example: Plug-in variance

    For example, consider the plug-in estimate of variance: $\hat\theta = n^{-1} \sum_i (x_i - \bar x)^2$

    The expected value of the jackknife estimate of bias is

    $$E(b_{jack}) = -\frac{\sigma^2}{n} = \mathrm{Bias}(\hat\theta)$$

    Furthermore, it can be shown that the bias-corrected estimate is

    $$\hat\theta_{jack} = s^2,$$

    the usual unbiased estimate of the variance
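
    This identity is easy to check numerically. The following is a small Python sketch (standard library only); the function name `jackknife_bias` and the toy sample are assumptions for illustration. `statistics.pvariance` is the divide-by-$n$ (plug-in) variance and `statistics.variance` is the divide-by-$(n-1)$ version.

```python
from statistics import mean, pvariance, variance


def jackknife_bias(x, theta):
    """Jackknife bias estimate: (n - 1) * (mean of leave-one-out estimates - full estimate)."""
    n = len(x)
    theta_hat = theta(x)
    loo = [theta(x[:i] + x[i + 1:]) for i in range(n)]
    return (n - 1) * (mean(loo) - theta_hat)


# Hypothetical toy sample, chosen only for illustration.
x = [2.0, 4.0, 4.0, 5.0, 10.0]

b_jack = jackknife_bias(x, pvariance)   # bias estimate for the plug-in variance
theta_jack = pvariance(x) - b_jack      # bias-corrected estimate
```

    For this sample the plug-in variance is 7.2, the bias estimate is -1.8, and the corrected value is 9, which matches `variance(x)`.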

  • The Jackknife

    Pseudo-values

    Another way to think about the jackknife is in terms of the pseudo-values:

    $$\tilde\theta_i = n\hat\theta - (n-1)\hat\theta_{(i)}$$

    Note that

    $$\frac{1}{n} \sum_i \tilde\theta_i = n\hat\theta - (n-1)\bar\theta = \hat\theta - b_{jack},$$

    which is the bias-corrected estimate $\hat\theta_{jack}$

    The idea behind pseudo-values is that they allow us to think of the bias-corrected estimate as simply the mean of $n$ independent data values

  • The Jackknife

    Properties of the pseudo-values

    The pseudo-values $\{\tilde\theta_i\}$ are not, in general, independent, although note that for the special case of a linear statistic $\hat\theta = n^{-1} \sum_i a(x_i)$, we have $\tilde\theta_i = a(x_i)$; i.e., for the mean, $\tilde\theta_i = x_i$

    A reasonable idea, therefore, is to treat the $\tilde\theta_i$ as linear approximations to iid observations and approach inference for $\hat\theta_{jack}$ as we would the sample mean

    Thus, letting $\tilde s^2$ denote the sample variance of the pseudo-values, consider

    $$v_{jack} = \frac{\tilde s^2}{n}$$

    as an estimate of $V(\hat\theta)$
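
    A short Python sketch of the pseudo-values and the resulting variance estimate, again with the standard library only. The function names `pseudo_values` and `jackknife_variance` and the toy sample are assumptions for illustration, not part of the course material.

```python
from statistics import mean, variance


def pseudo_values(x, theta):
    """Pseudo-values: n * (full estimate) - (n - 1) * (i-th leave-one-out estimate)."""
    n = len(x)
    theta_hat = theta(x)
    loo = [theta(x[:i] + x[i + 1:]) for i in range(n)]
    return [n * theta_hat - (n - 1) * t for t in loo]


def jackknife_variance(x, theta):
    """Sample variance of the pseudo-values, divided by n."""
    return variance(pseudo_values(x, theta)) / len(x)


# Hypothetical toy sample, chosen only for illustration.
x = [2.0, 4.0, 4.0, 5.0, 10.0]

pv = pseudo_values(x, mean)             # for the mean, these reproduce the data
v_jack = jackknife_variance(x, mean)    # for the mean, this is s^2 / n
```

    Using the mean as `theta` illustrates the linear-statistic special case: the pseudo-values are the observations themselves, so the variance estimate reduces to the usual one.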

  • The Jackknife

    Example: Mean

    As a somewhat trivial example, suppose $\hat\theta = \bar x$

    $b_{jack} = 0$

    $v_{jack} = s^2/n$, where $s^2$ is the usual, unbiased estimate of the variance

  • The Jackknife

    Jackknife confidence intervals?

    The pseudo-value statement of the jackknife suggests the following approach to constructing confidence intervals:

    $$\hat\theta_{jack} \pm t_{1-\alpha/2;\,n-1}\,\mathrm{SE}_{jack},$$

    where $\mathrm{SE}_{jack} = \sqrt{v_{jack}}$

    These intervals turn out to be similar to those produced by the functional delta method, for reasons that will be discussed soon

  • The Jackknife

    Homework

    Homework: Write an R function called jackknife which implements the jackknife. The function should accept two arguments: x (the data) and theta (a function which, when applied to x, produces the estimate). The function should return a named list with the following components:

    bias: the jackknife estimate of bias

    se: the jackknife estimate of standard error

    values: the leave-one-out estimates $\{\hat\theta_{(i)}\}$

    Please submit the function via Dropbox and name your file Breheny-jack.R, with your last name replacing Breheny

  • The Jackknife

    Is vjack a good estimator?

    Obviously, $v_{jack}$ is a good estimator of $V(\bar x)$, where it is equivalent to the usual estimator

    The same logic extends to any linear statistic

    Furthermore, if $g$ is a function continuously differentiable at $\mu$, it can be shown that $v_{jack}$ is a consistent estimator of the variance of $g(\bar x)$:

    $$\frac{v_{jack}}{V\{g(\bar x)\}} \overset{P}{\longrightarrow} 1$$

  • The Jackknife

    Is vjack a good estimator? (contd)

    However, there are also cases where $v_{jack}$ is not a good estimator of the variance of an estimate

    In particular, $v_{jack}$ can be shown to perform poorly when the estimator is not a smooth function of the data

    A simple example of a non-smooth estimator is the median

  • The Jackknife

    Jackknife applied to the median

    Suppose we observe the sample $\{1, 2, \ldots, 9, 10\}$

    What will our leave-one-out estimates look like?

    We will obtain five 5s and five 6s

    There is nothing special about these particular numbers: for any data set with an even number of observations, we will always obtain two unique values for $\hat\theta_{(i)}$, each with $n/2$ instances
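
    The claim about the sample $\{1, \ldots, 10\}$ can be verified directly. A minimal Python sketch using the standard library (the variable names are illustrative only):

```python
from collections import Counter
from statistics import median

x = list(range(1, 11))   # the sample {1, 2, ..., 10}

# Leave-one-out medians: drop the i-th observation and recompute the median.
loo = [median(x[:i] + x[i + 1:]) for i in range(len(x))]
counts = Counter(loo)    # how often each distinct leave-one-out median occurs
```

    Deleting any of 1 through 5 gives a median of 6, and deleting any of 6 through 10 gives a median of 5, so `counts` holds five 5s and five 6s.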

  • The Jackknife

    Inconsistency

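    This doesn't seem like a good way to estimate the variance of the median, and it isn't

    It can be shown that the jackknife variance estimate of the $p$th sample quantile is inconsistent for all $F$ and all $p$

    In particular, for the median,

    $$\frac{v_{jack}}{V(\hat\theta)} \overset{d}{\longrightarrow} \left(\frac{\chi^2_2}{2}\right)^2$$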

  • The Jackknife

    Jackknife and influence function

    There is a close connection between the jackknife and influence functions

    Viewing the jackknife as a plug-in estimator, it calculates $n$ estimates based on a slightly altered version of the empirical distribution function and compares these altered estimates to the plug-in estimate in order to assess variability of the estimate

    What CDF is the jackknife using? For the $i$th estimate, the one that places the following masses on the observations, with the 0 in position $i$:

    $$\left(\frac{1}{n-1}, \ldots, 0, \ldots, \frac{1}{n-1}\right)$$

  • The Jackknife

    Jackknife and influence function (contd)

    This is very similar to the idea behind the influence function, with

    $$1 - \epsilon = \frac{n}{n-1} \implies \epsilon = -\frac{1}{n-1}$$

    In a sense, then, the jackknife is a numerical approximation to the functional delta method

    Indeed, an alternative name for the functional delta method is the infinitesimal jackknife
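
    To see why this negative value of $\epsilon$ reproduces the jackknife weights, here is a small Python check using exact fractions. It assumes the usual contaminated distribution $(1-\epsilon)\hat F + \epsilon\,\delta_{x_i}$ from the influence function definition; the function name `contaminated_weights` and the values of `n` and `i` are hypothetical.

```python
from fractions import Fraction


def contaminated_weights(n, i, eps):
    """Masses of (1 - eps) * (empirical distribution) + eps * (point mass at observation i)."""
    return [(1 - eps) * Fraction(1, n) + (eps if j == i else 0) for j in range(n)]


n, i = 5, 2                   # hypothetical sample size and index of the deleted observation
eps = Fraction(-1, n - 1)     # epsilon = -1 / (n - 1)
w = contaminated_weights(n, i, eps)
```

    With these values, every observation other than the $i$th receives mass $1/(n-1)$ and the $i$th receives mass 0, which is exactly the leave-one-out distribution.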

  • The Jackknife

    The positive jackknife

    There is an important difference, however, between the jackknife and the functional delta method: the delta method adds point mass to observation $x_i$, while the jackknife takes mass away

    Another take on the jackknife, then, is to compute $n$ estimates $\{\hat\theta_{(i)}\}$ by adding an observation at $x_i$ instead of taking one away (i.e., $\epsilon = 1/(n+1)$)

    This method is called the positive jackknife; however, it is not commonly used

  • The Jackknife

    The delete-d jackknife

    Another variation on the jackknife that has been proposed is called the delete-$d$ jackknife

    As the name suggests, instead of leaving out one observation when calculating the collection $\{\hat\theta_{(s)}\}$, the delete-$d$ jackknife leaves out $d$ observations

    This approach has merit: in particular, it can be shown that if $d$ is appropriately chosen, then the delete-$d$ jackknife estimate of variance is consistent for the median

    However, it has the drawback that instead of calculating $n$ leave-one-out estimates, we now have to calculate $\binom{n}{d}$ leave-$d$-out estimates, a much larger number, often bordering on the computationally infeasible
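
    To get a feel for how quickly the number of subsets grows, the following Python sketch enumerates the leave-$d$-out estimates for a small sample and counts the subsets for a larger one. The function name `delete_d_estimates` and the particular values of $n$ and $d$ are assumptions for illustration; only the estimates are computed here, not a variance formula.

```python
from itertools import combinations
from math import comb
from statistics import median


def delete_d_estimates(x, theta, d):
    """theta evaluated on every subsample that omits exactly d observations."""
    n = len(x)
    return [theta([x[j] for j in keep]) for keep in combinations(range(n), n - d)]


x = list(range(1, 11))                   # the sample {1, ..., 10}
est = delete_d_estimates(x, median, 3)   # one estimate per retained subset of size n - d
n_subsets = comb(len(x), 3)              # number of ways to delete 3 of 10 observations
big = comb(100, 10)                      # same count for n = 100, d = 10
```

    Already for $n = 10$ and $d = 3$ there are 120 estimates instead of 10, and for $n = 100$ and $d = 10$ the count exceeds $10^{13}$, so exhaustive enumeration is out of reach.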

  • The Jackknife

    Homework

    Homework: What would the collection of weights $\{w_i\}_{i=1}^{n}$ look like for the delete-$d$ jackknife?

  • The Jackknife

    Homework

    Homework: The standardized test used by law schools is called the Law School Admission Test (LSAT), and it has a reasonably high correlation with undergraduate GPA. The course website contains data on the average LSAT score and average undergraduate GPA for the 1973 incoming class of 15 law schools.

    (a) Use the jackknife to obtain an estimate of the bias and standard error of the correlation coefficient between GPA and LSAT scores. Comment on whether the estimate is biased upward or downward.

    (b) If $x$ and $y$ are drawn from a bivariate normal distribution, then $nV(\hat\rho) \overset{P}{\longrightarrow} (1 - \rho^2)^2$. Use this to estimate the standard error of $\hat\rho$.

  • The Jackknife

    Homework (contd)

    (c) On page 21 of our textbook, the author gives the influence function for the correlation coefficient:

    $$L(x, y) = \tilde x \tilde y - \tfrac{1}{2}\rho\,(\tilde x^2 + \tilde y^2),$$

    where

    $$\tilde x = \frac{x - \mu_x}{\sqrt{\sigma_x^2}}$$

    and $\tilde y$ is defined similarly. Use this to estimate the standard error of $\hat\rho$; compare the three estimates (a)-(c).

  • The Jackknife

    Homework (contd)

    (d) For each data point $(x_i, y_i)$, make a plot of $\hat\rho$ vs. the mass at point $i$, as the point mass at $i$ varies from 0 to 1/3 (and the rest of the mass is spread evenly on the rest of the observations).

    (e) Comment on why some plots slope upwards and others slope downwards.

    (f) Extra credit: Comment on how the shape of the curve relates to the comparison between the delta method and jackknife estimates of the variance

  • The Jackknife

    Hints

    The R function cov.wt can be used to calculate weighted correlation coefficients; it takes an optional argument that consists of a vector of weights for each observation

    Please do not turn in 15 pages of plots; use mfrow to make a grid of plots on one page

    You will likely have to reduce the margins of your plots to eliminate whitespace; you are free to design your plot how you wish, but I will pass on the suggestion par(mfrow=c(5,3), mar=c(2,4,2,1))
