Download pdf - Using matched substitutes to adjust for nonignorable ... · 1 Using matched substitutes to adjust for nonignorable nonresponse: an empirical investigation using labour market data

Using matched substitutes to adjust for nonignorable

nonresponse: an empirical investigation using labour market

data

Richard Dorsett

Policy Studies Institute

23 June 2004

Abstract

This paper assesses the potential for reducing attrition bias by replacing survey dropouts with individuals from a refreshment sample, identified using propensity score matching. By linking administrative records with survey data, it is possible to observe outcomes for dropouts and therefore to test models of attrition. Doing so reveals the dropout process to be nonignorable such that the commonly-used method of reweighting non-dropouts is ineffective in overcoming attrition bias. By constructing artificial refreshment samples, the potential for replacing dropouts with individuals from a targeted refreshment sample is demonstrated. Where the targeting of the refreshment sample is imperfect, the success of the approach is more qualified. However, controlling for historic outcomes, the attrition bias can still be reduced. Keywords: attrition, propensity score matching, refreshment sample, unemployment. Address for correspondence: Richard Dorsett, Policy Studies Institute, 100 Park Village East, London NW1 3SR, UK. Tel: 020 7468 2316; Fax: 020 7388 0914; Email: [email protected]

1

Using matched substitutes to adjust for nonignorable nonresponse: an empirical

investigation using labour market data

1. Introduction

Increasingly, research in economics, sociology and other disciplines makes use of panel

data. The appeal of such data (in which the same units are observed at multiple points in

time) is that they permit the consideration of more complex relationships than is possible

using cross-section data. There are now well-established panel data sets in the UK

(British Household Panel Survey and Labour Force Survey, for example) and new panels

are being developed (the ESRC-funded Millennium Birth Cohort Study, for example).

The availability of such rich data has prompted developments in statistical and

econometric theory, and methods of examining and modelling micro-relationships that

allow for dynamics have grown in sophistication.

However, the drawback to panel data is that they are often subject to more severe missing

data problems than are cross-section data. Specifically, individuals who participate in the

first wave of a panel may drop out in later waves. This may mean that those who

continue to respond become unrepresentative of the original sample. Hence, problems of

inference can arise, particularly if the tendency to drop out is systematically related to an

outcome of interest. In view of this, methods to overcome the problem of attrition bias

have a potentially important empirical role.

2

Dolton (2002) introduced a new method of addressing the problem of attrition by

replacing survey dropouts with substitutes drawn from a refreshment sample and selected

via propensity score matching. However, in his empirical application, dropouts’

outcomes are never observed so it is not possible to know whether the substitution is

successful in overcoming attrition bias. In this paper, administrative data are linked to

longitudinal survey data collected for the purpose of evaluating a UK active labour

market programme (the New Deal for Young People -NDYP), thereby allowing the

unemployment status for all wave 1 respondents to be observed on an ongoing basis,

regardless of whether they subsequently dropped out. By comparing the level of

unemployment for non-dropouts to that for wave 1 respondents as a whole, the extent to

which attrition biases estimates of unemployment (the substantive focus of this paper)

can be quantified. Analogous comparisons provide a means of assessing the extent to

which replacing dropouts with substitutes from a refreshment sample can overcome

attrition bias.

The empirical part of the paper begins by formally testing the model generating attrition.

This in itself is an unusual step since most analyses are based on an untestable

assumption as to which attrition model is appropriate. However, when survey data can be

linked to administrative data in the way described above, such a test can provide a useful

guide as to how best to proceed to overcome attrition bias. Second, the success of

approaches in the spirit of Dolton (2002) is considered. A complication with the data at

hand is that no refreshment samples are available. To overcome this, artificial

refreshment samples are constructed. Summarising the results, when a suitable

3

refreshment sample is available, the approach works well. When the refreshment sample

is not ideal, the success of the approach is more qualified. However, the desired result

can be achieved by controlling for administratively-held historic outcomes in an intuitive

way.

The structure of the paper is as follows. In the next section, the two dominant models of

attrition are discussed along with a recently advanced test that is possible when a

refreshment sample is available or when, as in this case, outcome data are available in

administrative records for the full sample. In Section 3, the use of refreshment sampling

with matching (RSM) is described. The structure of the data is presented in Section 4

and the underlying model of attrition investigated. Section 5 describes the approach that

was taken to construct the artificial refreshment samples. The effectiveness of RSM in

overcoming attrition bias is assessed in Section 6. Section 7 discusses the implications of

the results.

2. Background – models of attrition

There has long been an awareness of the problems caused by sample attrition and

numerous methods of addressing the problem exist. The choice of appropriate method is

dependent on the process generating the missing values. Where observations are missing

completely at random (MCAR), no bias results from basing subsequent analysis on only

non-attriting respondents. However, the MCAR model will rarely apply in practice. In

4

fact, it can be tested empirically since it is a special case of the missing at random (MAR)

model. The MAR model implies that the missing data process is ignorable (Rubin, 1976;

Little and Rubin, 1987) and is sometimes referred to as selection on observables

(Fitzgerald et al., 1998). If MAR holds, multiple imputation methods (see King, 2001 for

a recent example) can be used to allow unbiased inferences (Little and Rubin, 1989).

Alternatively, weights can be used to restore the sample to representativeness. This is the

more common approach among econometricians.

While MAR may be a valid model in some instances, it will often be the case that the

outcome variable of interest is not independent of the attrition process. In such a

scenario, the process generating attrition is nonignorable. In a labour market context, this

can be exemplified by those changing job being more likely to drop out of the survey

(due to lack of time, moving house or some other reason). Hausman and Wise (1979)

(HW) developed a two-step procedure for the nonignorable case which is in the spirit of

Heckman’s selection model (Heckman, 1979). This involves estimating the probability

of dropping-out and using the results to construct an adjustment term that can be included

as a regressor in the outcome equation to correct for the selected nature of the resulting

sample. This requires an assumption about the form of the joint distribution of the errors

in the selection and outcome equations. Credible implementations also require a suitable

instrument (a variable that influences attrition but not outcomes). This can be a major

obstacle in practice.

5

In those empirical analyses that acknowledge the problem of attrition, the approach used

to tackle it is typically based on an untestable assumption about the process generating

the missing values. Such assumptions may be explicit or implicit but they exist

nonetheless. In some applications, a refreshment sample may be available. This is a

random sample from the original population that can be used to replace individuals who

have dropped out of the panel (Ridder, 1992). For the case where a refreshment sample

exists, Hirano, Imbens, Ridder and Rubin (2001) (HIRR) develop a model of attrition that

nests both the MAR and HW models and allows their assumptions to be tested.

Adopting the HIRR notation, the panel data cover two periods and are characterised by

attrition. Let YBitB be a vector of all time-varying variables for individual i at time t and let

XBi B be a vector of all fixed variables for individual i. In the first time period, a random

sample of size NBPB from a fixed population is drawn. This is the panel. For each

individual in the panel, XBi B and YBi1 B are observed. A subset of size NBBPB of this sample does

not drop out of the second stage survey. This is the balanced panel (BP) and for these

individuals YBi2 B is observed. The remaining NBIPB = NBPB – NBBPB comprise the incomplete panel

(IP). Those in the IP drop out of the survey in the second stage and YBi2B is therefore not

observed. In addition to the panel data set, in the second period a new random sample

from the original population is drawn. This is the refreshment sample (size NBRB).

The issue of initial nonresponse is not considered in this paper. The focus is on those

individuals who are interviewed in the first time period. Some will not respond when

approached a second time. Let WBi B be an indicator of willingness to respond at the second

6

wave; WBi B = 1 represents those who respond at the second wave and WBi B = 0 represents

those who do not. HIRR show that knowing the joint distribution of (YB1B, YB2 B, X) or

possibly the conditional distribution of (YB1 B, YB2 B) given X allows the MAR and HW models

of attrition to be tested. Using a refreshment sample, these distributions can be identified.

With this notation, the MAR model implies

W C YB2 B | YB1 B, X

where C denotes independence. In this scenario, first period variables can be used to

explain drop-out and the attrition problem in the sample can be overcome by weighting

the balanced panel observations by the inverse of the probability of attriting or by

multiple imputation.

Under HW,

W C YB1 B | YB2 B, X

which implies that the probability of attrition depends on contemporaneous variables but

not on first period variables. This is known as selection on unobservables since attrition

depends partly on variables that are not observed when the individual drops out.

7

HIRR show that the restrictions implied by the MAR and HW models allow the resulting

marginal distribution of the second period outcome to be tested against that of the

refreshment sample. Denoting the conditional probability Pr(YBi2 B = 1 | YBi1 B = y, WBi B = w) by

qByw B and the probability Pr(Y Bi1 B = y, W Bi B = w) by r Byw B , the marginal distribution of YBi2 B can be

calculated as Pr(Y Bi2 B = 1) = q B00B . r B00B + qB01B . r B01B + qB10B . r B10B + qB11B . r B11B. The terms qB00 B and

q B10B cannot be observed but can be retrieved for the MAR and HW models due to the

linear restrictions these models imply. Specifically, MAR implies

qB00B = qB01 B

and

qB10B = qB11.B

HW implies

qB00B =

{r B10B . r B01B . (1 - qB01B) – r B11 B . r B00B . (1 - qB11 B)} / {rB00 B . rB11 B . qB11B . (1 – qB01 B) / qB01 B – r B11B . r B00B . (1 – qB11 B)}

and

qB10B = {qB00 B . rB00B . qB11 B . rB11 B} / {qB01B . r B01B . r B10B} .

8

The approach taken in this paper when examining the form of attrition is similar to that of

HIRR. However, instead of estimating the marginal distribution of YBi2 Bfrom a

refreshment sample, the linked administrative data allow it to be observed for all wave 1

respondents.

3. Refreshment sampling with matching

As noted, knowledge of the model generating survey dropout is useful since it provides

guidance on how best to address the attrition problem. Under MAR, the consensus

among statisticians is that multiple imputation methods are preferable. A more common

approach among econometricians is to simply reweight the balanced panel. One

drawback of this approach is that no account is taken of the additional uncertainty in

subsequent inference introduced by the fact that the weights are themselves estimates.

Under HW, assuming a suitable instrument can be found, the approach described in

Section 2 must be altered for each specific application. This can be cumbersome in

practice. Furthermore, estimating more complicated models under HW can be

prohibitively difficult.

In view of these complications, other approaches are of great potential interest. One

solution is simply to replace dropouts with survey substitutes from a refreshment sample.

Rubin and Zanutto (2001) provide a summary of the use of such refreshment samples. A

general problem with this approach is that since such substitutes are, by their very nature,

9

respondents, they may differ from those dropouts they replace. This can be addressed to

some extent by targeting the refreshment sample. That is, the refreshment sample can be

drawn disproportionately from those who have low response rates or are likely to drop

out. Furthermore, steps can be taken to help ensure that the substitutes are similar to the

dropouts with regard to a number of characteristics.

Dolton (2002) used single nearest-neighbour propensity score matching to identify

suitable replacements from a targeted refreshment sample. The theoretical foundations of

propensity score matching were first presented in Rosenbaum and Rubin (1983). It has

become popular mainly as an evaluation technique (see, for example, Heckman et al.,

1999). In that context, its use is to estimate the effects of some treatment. In order to

estimate the treatment effect on a particular outcome, an estimate of the outcome that

would have resulted in the absence of treatment is required. This counterfactual outcome

is unobservable. Furthermore, it cannot credibly be inferred directly from the outcomes

of the untreated since they are likely to differ substantially in their characteristics.

Propensity score matching offers a means of estimating the counterfactual.

In order to provide a context for RSM, it is useful to consider the details of the matching

approach. Formally, the impact of treatment on outcome Y is

(1) ∆ = YP

1P – YP

0P

10

where YP

1P is the potential outcome conditional on treatment and YP

0P is the potential

outcome conditional on non-treatment. The mean effect of treatment on the treated is:

(2) θ = E(∆ | S = 1) = E(YP

1P – YP

0P | S = 1) = E(YP

1P | S = 1) - E(YP

0P | S = 1)

where S=1 denotes treatment and S=0 denotes non-treatment. The problem arises from

the term E(YP

0P | S = 1). This is the mean of the counterfactual which, since it is

unobservable, must be identified and estimated.

Matching estimators rely on the Conditional Independence Assumption (CIA):

(3) YP

0P C S | X = x, ∀ x∈ χ .

where X is a set of conditioning variables and χ denotes the part of the attribute space for

which the treatment effect is defined. Intuitively, the assumption states that, conditional

on observable characteristics, potential outcomes are independent of receiving treatment.

This allows non-treated outcomes to be used to infer counterfactual outcomes for the

treated. However, this is only valid if there are non-treated individuals for all values of X

for the treated (the support condition):

(4) Pr ( S = 1 | X = x ) < 1.

11

The parameter θ cannot be identified for values of X which do not satisfy this support

condition. Matching operates by constructing counterfactual outcomes using the

outcomes of the matched individuals from the non-treated group. The mean impact of

treatment can then be estimated as the mean difference in the outcomes of the matched

pairs.

Rosenbaum and Rubin showed that if assumption (3) holds then

(5) YP

0P C S | P(X) = p(x), ∀x∈ χ

where P(X)=Pr(S=1|X) is the propensity score, the probability of receiving treatment.

The important advantage of Rosenbaum and Rubin’s innovation is that the

dimensionality of the match can be reduced to one. Rather than matching on a vector of

characteristics, it is possible to match on just the propensity score. Having done so, the

mean effect is again estimated as the mean difference in the outcomes of the matched

pairs.

The key innovation of the RSM approach is to use matching as a means of replacing

survey dropouts. In this application, ‘treatment’ is dropping out of the survey. For each

dropout, a substitute from the refreshment sample is identified through matching as that

individual who is most similar with regard to a number of key characteristics. By so

doing, RSM may offer a means of restoring the representativeness of the sample and

thereby overcoming the attrition problem.

12

Operationally, the process of identifying the substitutes is as follows:

1. construct a refreshment sample (see Section 5)

2. pool the survey dropouts and the individuals from the refreshment sample into a

single dataset

3. estimate the probability of being a dropout rather than in the refreshment sample

(the propensity score) using a probit model.

4. initialise the matching weight of all members of the refreshment sample to 0.

5. identify the closest match for each dropout:

a) choose one dropout, i

b) find the individual, j, in the refreshment sample with the closest value of

the propensity score to that of i

c) increment the matching weight of j by 1

d) return to a) until all dropouts have been matched.

This procedure results in the identification of refreshment sample individuals who can be

used to replace dropouts, thereby allowing subsequent analysis to be based on a complete

dataset. It is clear from the algorithm that a single refreshment sample member may

substitute for multiple dropouts. This must be accounted for when carrying out further

analysis. The extent to which this process overcomes attrition bias can be assessed by

comparing the mean outcome of the dropouts with the (weighted) mean outcome of the

replacements. This is analogous to the estimation of θ in equation (2). The size and

13

significance of ^θ in this context provides an indication as to how successfully the survey

substitutes capture the outcomes of the dropouts they replace.

4. Data – characterising attrition in the NDYP surveys

NDYP is an active labour market programme introduced in Britain in 1998. It is

mandatory for all those aged 18-24 who have been claiming unemployment benefits for a

period of six months or more. It was evaluated using a survey design that involved two

stages of interviews with the same individuals. The sample was drawn from those

entering NDYP between 29 August and 27 November 1998. Survey interviews took

place at about 6 and 15 months after sampling and the results of these surveys are

reported in Bryson et al. (2000) and Bonjour et al. (2001). As noted in Section 1, survey

data were linked to administrative data allowing unemployment outcomes for the full

sample to be observed regardless of whether individuals responded to the survey.

The structure of the NDYP data is set out in Table 1. The entries for the balanced and

incomplete panels indicate what would be observable were the unemployment outcome

only observed for those who responded to the survey. Since the survey data are linked to

administrative records, the entries for the panel show the distribution of stage 2 outcomes

for the balanced and incomplete panel combined. This allows the true probability of

being unemployed at stage 2 to be observed.

14

<TABLE 1>

Knowing the true probability of stage 2 unemployment, the restrictions imposed by the

attrition models can be tested. The parameters that are directly estimable from Table 1

are given in Table 2.

<TABLE 2>

The estimates of qB00 B and qB10B implied by MAR and HW are given in Table 3. Using these

estimates, the probability of stage 2 unemployment can be estimated. Under the MAR

model, this is estimated at 0.496 while the HW estimate is 0.428. The true probability is

0.449. Bootstrapping the difference between the true and the model-based estimates

results in absolute t-statistics of 8.814 and 1.155 for MAR and HW respectively. From

this, it appears that the HW model better characterises the data and in fact the MAR

model is emphatically rejected.

<TABLE 3>

Table 4 shows how unemployment differs over time between the drop-outs and those

who remain in the sample. The first column shows the probability of being unemployed

around the time of the second stage interviews. The difference between the stayers and

the dropouts is statistically significant. The subsequent columns show how this

difference evolves. In fact, very soon after the point of sampling, significant differences

15

are evident. The size of these differences grows over time, reducing only at the most

recent observation period. The statistical significance of these differences is consistent

throughout. Those dropping out of the survey are less likely to be unemployed. This

accords with the situation of those entering employment being less likely to respond to

surveys due to lack of available time, moving to take up work, or some other reason. It

should be remembered that the client group for NDYP is young and therefore more likely

to be geographically mobile.

<TABLE 4>

This divergence between stayers and dropouts in the probability of unemployment is

shown in more detail in Figure 1 (where the vertical lines indicate approximate fieldwork

dates).

<FIGURE 1>

5. Creating refreshment samples

Since no refreshment samples were available with the NDYP data, such samples were

artificially constructed. The approach taken to achieve this is illustrated in Figure 2.

<FIGURE 2>

16

The starting point is the original sampling frame drawn from administrative records and

including all NDYP entrants, regardless of whether they responded to the first or second

interview. Of these, a proportion responded to the first wave of interviews. Since the

focus in this paper is on attrition rather than nonresponse, the main interest is in those

who responded at wave 1 (denoted R=1 in Figure 2). Two uniformly distributed random

numbers, ε B1B and εB2 B, were generated for each individual in the sample frame: ε B1B was used to

divide the wave 1 respondents into two equally-sized sub-samples. The basic idea is to

use the RSM technique to address the attrition in one of these sub-samples (referred to as

the ‘test’ sample in the remainder of this paper) using the other sample (the ‘non-test’

sample) to construct a refreshment sample.

Given the process by which these sub-samples were identified, they will exhibit similar

(within sampling variation) patterns of attrition to that among the wave 1 respondents as a

whole. Since there were approximately 6,000 wave 1 respondents, of whom nearly 2,600

dropped out, the test sample and the non-test sample both comprise roughly 3,000

individuals, of whom roughly 1,300 drop out. Hence, the potential of the RSM approach

can be explored by examining how well the artificially-constructed refreshment samples

can provide substitutes for the dropouts from the test sample in such a way that the

attrition bias is attenuated.

In fact, a number of different refreshment samples were constructed. The targeted

refreshment sample (TRS) is constructed as all dropouts from the non-test sample. This

17

will be roughly equal in size to the number of dropouts in the test sample. As noted

above, the aim of a TRS is that it over-represents those likely to drop out. In the artificial

case considered in this paper, the targeting is absolute in the sense that the refreshment

sample includes only those actually who drop out. Hence, it allows the RSM approach to

be investigated in the ideal case where the targeting of the refreshment sample is perfect.

The second refreshment sample considered is the non-targeted refreshment sample

(NTRS). This is constructed as a random sample of the non-test sample of size equal to

the TRS. In practice, this was achieved by selecting those individuals from the non-test

sample with a value of εB2 B smaller than the overall proportion of wave 1 respondents who

drop out. The interest of the NTRS is that it provides a means of comparing the

effectiveness of RSM when a targeted refreshment sample is available with its

effectiveness when the refreshment sample is not targeted or is not perfectly targeted.

Since targeting a refreshment sample may, in practice, be difficult, it is of interest to

investigate whether RSM works when used with a refreshment sample that is not

targeted.

The third refreshment sample is the ‘practical’ refreshment sample (PRS). The purpose

of this refreshment sample is to approach a more realistic situation whereby the

refreshment sample is not perfectly targeted but is imperfectly targeted. In practice, the

targeting of the refreshment sample will be less precise than the TRS considered here and

will contain those individuals who are merely suspected of being more likely to drop out

of the sample. This requires some judgement on the part of the survey designer. The

18

purpose of the PRS is to investigate this more realistic scenario. In this case, the

sampling frame was randomly divided using εB1 B and the probability of responding at wave

1 was modelled (for those with εB1 B ≥ 0.5) jointly with the probability of responding at

wave 2 for the wave 1 respondents (that is, for those with εB1 B ≥ 0.5 and R=1). A sub-

sample (of equal size to the TRS) of those in the non-test sample for whom this

probability is the greatest was chosen as the PRS.

The remainder of the paper assesses the performance of RSM using these three

refreshment samples. In addition, using the balanced panel as a refreshment sample is

also considered. This is of interest since, in practice, BP exists without the need to carry

out an additional survey. Consequently, should RSM be successful when using BP as a

refreshment sample, this indicates a simple way to address the attrition problem.

However, given the rejection of MAR as the model of attrition, it is unlikely that

replacing dropouts with substitutes from BP will be successful. To see this, note that the

resulting dataset would simple amount to a reweighting of BP which is only valid if

MAR holds. More promising perhaps, is to combine BP with one of the other

refreshment samples, as in Dolton (2002). The effectiveness of this is considered below.

6. Assessing RSM

The algorithm set out in Section 3 was followed in order to generate an estimate of θ that

shows, for a given refreshment sample, the extent to which RSM succeeded in replacing

19

dropouts in such a way that attrition bias was overcome. However, a valid concern is that

the results may be specific to the particular test sample and refreshment samples

constructed. That is, since the definition of these samples contains a random element, it

may be that this random component is driving the results. To address this, the entire

process of constructing refreshment samples and estimating θ was bootstrapped (10,000

replications).

The key results in this section are based on the bootstrapped estimates. However, to

provide an insight into the mechanism of RSM, the next sub-section describes the process

of implementing RSM using just one version of each of the refreshment samples (i.e. not

bootstrapped). This is intended to allow an insight into the more formal (i.e.

bootstrapped) results that follow.

6.1Implementing RSM – descriptives and diagnostics

The TRS and the NTRS are straightforward to construct. The PRS is more involved

since it involves simultaneous estimation of wave 1 and wave 2 response. The aim is to

construct a refreshment sample that over-represents those predicted to drop out. It is

useful to consider how such a sample might be constructed in practice. The PRS is

drawn from the same population as the main sample. However, it only becomes relevant,

and therefore only has to be surveyed, after the second wave of main sample interviews.

20

At this stage, only those variables that exist in the administrative records are available.

These will often be fairly limited in scope.

The results of estimating a model of wave 1 and wave 2 response using administrative

data are given in Table 5. As an overall comment, the process of initial nonresponse is

characterised differently from that of attrition. Most notably, unemployment at roughly

the time of the wave 1 fieldwork is an important correlate of initial response, while

unemployment at the time when the second interviews were carried out is strongly

associated with wave 2 response. This accords with the earlier comment on attrition

being nonignorable in this dataset.

<TABLE 5>

Using these results, the probability of responding at wave 1 but not at wave 2 was

generated. The PRS was drawn from those wave 1 respondents who had the highest

probability of dropping out in the second wave. It was constructed to be equal in size to

the TRS.

Figure 3 depicts the change over time in unemployment for the three refreshment samples

and for that half of the sampling frame from which they were drawn. Clear differences

are evident. For the sampling frame, unemployment fell from a level of 78 per cent in

December 1998 to 38 per cent in June 2000. The level of unemployment in the NTRS is

similar to that of the sampling frame at the start of the period. The differences increase

21

around the time of the wave 1 fieldwork and then decline thereafter. The results of Table

5 have shown initial response to be positively associated with unemployment. Since all

those in the NTRS respond at wave 1, the expectation is that their level of unemployment

around the time of the wave 1 interview will be higher than among the sample frame

whose definition does not constrain them to respond at wave 1. A similar logic explains

the trend seen among the TRS. Wave 2 response is associated with remaining

unemployed. Since none of those in the TRS responds at wave 2, they are more likely to

have left unemployment.

The PRS displays a very different trend, although still consistent with expectations.

Since this refreshment sample comprises those individuals from the non-test sample who

are most likely to respond at wave 1 and not respond at wave 2, it will over-represent

those with the characteristics associated with such response behaviour. Since

unemployment status at the time of interview is positively associated with response, it

follows that the PRS will disproportionately represent those who were unemployed at the

time of the first survey but who had exited unemployment by the time of the second

survey. This gives rise to the trend seen in Figure 3.

<FIGURE 3>

Table 6 present some descriptive statistics for the samples. The first column of figures

describes the test sample as a whole. Ideally, the end result of the RSM process will be to

replace dropouts such that this profile is mirrored in the resulting refreshed sample. The

second and third columns describe respectively the characteristics of those who drop out

22

and those who remain in the test sample. The most notable differences between the two

groups are that those in the IB are more likely than those in the BP to be from a minority

ethnic group and less likely to own their accommodation. In the fourth column, the

profile for the NTRS is given. This should be identical (within sampling variation) to the

test sample and, indeed, no major differences from column 1 are evident. Similarly, the

fifth column which describes the characteristics of those in the TRS should resemble the

second column since it comprises all dropouts from the non-test sample. The final

column describes those in the PRS. They tend to be younger than those in the other

samples, are less often from a minority ethnic group and more often unpartnered. They

are less likely to have been employed and more likely to have been unemployed at the

time of wave 1 interviews.

<TABLE 6>

The results of estimating the propensity score are presented in Table 7. The first column

shows the results using a pooled dataset made up of the IP and the BP to estimate the

probability of being in the IP. The variables included in the specification were chosen as

those most likely to be related to dropping out. The next three columns present

analogous results when the model is estimated on the IP and, respectively, the NTRS, the

TRS and the PRS. The final three columns correspond to the preceding three but include

the BP as an additional potential source of refreshment substitutes. A significant positive

coefficient for a given characteristic indicates that that characteristic is more commonly

found among the IP than it is among those in the refreshment sample. As expected, there

23

are no significant coefficients when the model is estimated using the IP and TRS. The

most significant coefficients are evident when the model is estimated using the IP and the

PRS.

<TABLE 7>

These results were used to generate the propensity scores required for matching. Single

nearest neighbour matching was used since, intuitively, this relates more directly than

other matching methods to the desired aim of identifying an individual substitute for each

dropout. Table 8 summarises the matching weights for the identified substitute samples.

The results show that the balanced panel combined with the TRS performs best, with a

mean weight of 1.31. This combination also performs best when considering the decile

of comparators with the largest weights since they account for only 20 per cent of

dropouts. This can be interpreted as a measure of the concentration of matches in a small

number of individuals and indicates whether there may be an over-reliance on individual

comparators.

<TABLE 8>

In Table 9, a different type of diagnostic is presented that illustrates the extent to which

the matching has balanced the characteristics of the dropouts with those of the

refreshment sample. Consider the first pair of columns for which the BP is used as the

refreshment sample. In the first column, the (pre-matching) differences between

24

dropouts and those in the BP are given. For each variable listed, the cell entry represents

the mean difference between the dropouts and the BP expressed as a percentage of its

standard error. Thus, it provides, on a comparable metric, the degree of similarity

between the means of the two groups. Clearly, some characteristics are more similar than

others. For example, there is little difference in the proportion who have any educational

qualifications but there is a marked difference in the proportion belonging to a minority

ethnic group. Averaging across all variables yields the entry in the final row which

captures the overall level of balance between the two groups across all variables. The

second column provides analogous results to those in the first column, except that now

the dropouts are compared with the substitutes from the BP who were identified through

matching.

<TABLE 9>

Comparing the two columns provides an indication of the extent to which matching

improves the balance between the two groups; in the case of the BP, balance improves

from 5.01 to 2.29. This level of post-matching balance is comparable to that for both the

NTRS and the TRS (2.57 and 2.44, respectively). However, the balance is not as good

when using the PRS (5.85). For NTRS, TRS and PRS, balance is always improved by

pooling with the BP.

25

6.2 Can RSM work?

In the remainder of this section, the success of RSM in overcoming attrition bias is

considered. As noted in Section 3, this is done by comparing the mean outcome of the

dropouts with the (weighted) mean outcome of their replacements. The size and

significance of ^θ indicates how successfully the survey substitutes capture the outcomes

of the dropouts they replace. A significant difference suggests that the outcomes of the

dropouts have not been captured by the replacements.

Table 10 presents these results for a number of refreshment samples. The first row

relates to unemployment at around the time of the second wave of interviews.

Subsequent rows relate to unemployment at December 1998 and at quarterly intervals

thereafter, up until June 2000. This allows the evolution of differences to be observed.

For each unemployment outcome considered, ^θ is given, together with the associated t-

statistic and the mean squared error (multiplied by 100 for ease of interpretation). A

negative value of ^θ shows that unemployment is lower among the dropouts than it is

among those selected to replace them.

<TABLE 10>

The first column shows the results when using the BP as a refreshment sample. As

mentioned before, a reweighting of the BP is only appropriate if attrition is MAR so it is

not surprising that this approach is not successful in overcoming attrition bias. From

26

June 1999 onwards, the replacements are more likely than the dropouts to be

unemployed. Using the NTRS (column 2) is more successful but still ^θ is significant

from December 1999 onwards. The level of bias is noticeably smaller than when the BP

is used, however. By June 2000, mean unemployment was 7 percentage points higher

among replacements drawn from the NTRS than for dropouts. The analogous figure

when drawing replacements from the BP was 12 percentage points.

The results show that taking replacements from the TRS can overcome attrition bias

(column 3). At all points, ^θ is zero and the mean squared error is also smaller than with

any of the other refreshment samples. The more realistic case of the PRS is shown in

column 4. These results reflect the pattern seen in Figure 3. In the early quarters, ^θ is

negative but in September and December 1999 ^θ becomes positive, indicating that

unemployment is higher among the dropouts than among their replacements. For the last

two quarters, ^θ is small and insignificant, indicating that attrition has been overcome

from around the time of wave 2 interviews onwards.

The remaining three columns in Table 10 show the results when combining the BP with

each of the other refreshment samples. Drawing replacements from BP and NTRS

combined (column 5) is less successful than drawing replacements from NTRS alone.

Similarly, drawing replacements from BP and TRS combined (column 6) is less

successful than drawing replacements from NTRS alone. The results when combining

BP with PRS are more mixed. Column 7 shows that ^θ is smaller and less often

27

significant in the early quarters compared to when replacements are taken from PRS

alone. However, attrition bias is evident for the last two quarters when taking

replacements from BP and PRS combined.

The implication of these results is that RSM can overcome attrition bias but that it needs

a targeted refreshment sample. Using the BP as a source of possible replacements

reduces the effectiveness of RSM in this analysis. The PRS can overcome attrition bias

from the point of dropout onwards but will misrepresent unemployment at earlier periods.

In view of the relevance of the PRS, it is appropriate to investigate whether it is possible

to improve on its performance. The important role of wave 1 and wave 2 unemployment

in the construction of the PRS has already been noted. The RSM approach was repeated

adding wave 1 and wave 2 unemployment status to the list of variables included in the

estimation of the propensity score. The reason for doing this is that by including these

variables, the matching process seeks to balance them across the dropouts and the

refreshment samples. In other words, the substitutes should be similar to the dropouts in

terms of the outcome variables. The results are shown in Table 11.

<TABLE 11>

The first two columns shows the results when using the BP and the NTRS respectively as

refreshment samples. Relative to the earlier results for these refreshment samples, there

is some improvement in performance. However, attrition bias is still evident from

28

December 1999 onwards in the case of BP and from March 2000 onwards in the case of

NTRS. The results for the PRS (column 3) show ^θ to be insignificant throughout,

suggesting that attrition bias has been successfully overcome. This is due in part to the

higher standard errors implied in column 3, suggesting some imprecision will be

introduced to any subsequent analysis. The mean squared error is comparable to that for

the NTRS. The final two columns show the results when combining BP with NTRS and

PRS respectively. As before, this serves to worsen the performance of RSM.

7. Summary and discussion

In this paper, administrative data were linked to survey data in order to demonstrate the

extent to which attrition in a longitudinal survey can bias analysis of unemployment.

Those dropping out of the sample were shown to be more likely to have exited

unemployment than those who remained in the sample such that estimates of

unemployment based on the latter group would tend to be biased upwards. The most

common approach to dealing with such attrition bias is to re-weight the continuing

respondents such that the sample remains representative with regard to observable

characteristics. In practice, the MAR assumption that validates this approach is rarely

questioned. In this paper, a formal test of the attrition model strongly rejected the MAR

assumption, indicating that an alternative means of addressing the attrition bias was

required. Where it is possible to link survey and administrative records in the manner

29

described in this paper, it seems that performing such a test could routinely be used to

provide a valuable insight into the appropriate means of dealing with attrition.

The substantive results have shown that when a targeted refreshment sample is available

it is possible to use matching to replace dropouts in such a way that attrition bias is

rendered insignificant. The targeting is important: using a non-targeted refreshment

sample or replacing dropouts with individuals in the balanced panel resulted in an over-

representation of the unemployed. Furthermore, while it may be intuitively appealing to

boost the number of potential replacements by combining the targeted refreshment

sample with the balanced panel, the results suggest that doing so may be counter-

productive.

Although the approach appears to work when a targeted refreshment sample is available,

the targeting is likely to be imperfect in practice. An insight into the more realistic

scenario of a refreshment sample comprising those merely suspected of being more likely

to drop out was provided by the ‘practical’ refreshment sample. The results showed that

it was possible to use this refreshment sample to overcome attrition bias, albeit possibly

at the price of higher variance for any subsequent analysis.

While caution must be exercised when generalising these results, they suggest that the

RSM approach may be effective in practical applications. The targeting of the

refreshment sample is of central importance.

30

In the remainder of this paper, some consideration is given to the situation where it is not

possible to link administrative records to survey data in the manner described above. In

this case, estimating a model in order to predict dropout may not be possible so an

alternative means of identifying the refreshment sample is needed. One approach would

be to base the targeting of the refreshment sample on an understanding gained from

experience of characteristics associated with nonresponse or dropout. This is the

approach taken in Dolton (2002).

An alternative approach is to intensify efforts to achieve interviews with those who

dropped out and to regard the newly achieved interviews as the refreshment sample. In

effect, what this would amount to is boosting the proportion of observations accounted

for by the balanced panel and drawing the refreshment sample from those who dropped

out but were subsequently recovered. The justification for this is that the refreshment

sample thus defined would tend to have characteristics more similar to the dropouts as a

whole. Some support for this is suggested by Lynn et al. (2001) who show that hard-to-

get respondents in the UK Family Resources Survey (i.e. those needing six or more visits

to achieve an interview or those who are persuaded to co-operate after initially refusing)

are more likely to be in work than other respondents.

There are a number of possible means of increasing the chances of eventually achieving

interviews with dropouts. For example, if the data are collected by means of postal

questionnaire or telephone interview then there is the option of instead using face-to-face

interviews for hard-to-get cases since these tend to achieve a higher response rate.

31

Response rates can also be improved by tracing address changes more assiduously,

increasing the number of call-backs or varying the times at which interviewers call. In

some cases, financial incentives to participate may be appropriate.

Acknowledgements

The author gratefully acknowledges financial support from the ESRC (ref: R000223901).

Helpful comments on earlier versions of this paper were given by Peter Dolton, Jeff

Smith and Michael White as well as participants at seminars at the Policy Studies

Institute and the University of Westminster; the annual general meeting of the Statistical

Society of Canada in Halifax, Nova Scotia; the second workshop on Research

Methodology in Amsterdam; and the Royal Statistical Society one day conference on

attrition and non-response in social surveys in London. I am also grateful to the

Department for Work and Pensions for allowing me access to the data. The usual

disclaimer applies.

32

References

Bonjour, D., Dorsett, R., Knight, G., Lissenburgh, S., Mukherjee, A., Payne, J., Range,

M., Urwin, P., White, M. (2001) New Deal for Young People: national survey of

participants: stage 2. Employment Service Report 67, Sheffield.

Bryson, A., Knight, G. and White, M. (2000) New Deal for Young People: national

survey of participants: stage 1. Employment Service Report, 44, Sheffield.

Dolton, P. (2002) Reducing attrition bias using targeted refreshment sampling and

matched imputation. University of Newcastle mimeo.

Fitzgerald, J., Gottschalk, P. and Moffitt, R. (1998) An analysis of sample attrition in

panel data: the Michigan Panel Study of Income Dynamics. Journal of Human

Resources, 33, 251-299.

Hausman and Wise (1979) Attrition bias in experimental and panel data: the Gary

Income Maintenance Experiment. Econometrica, 47, 455-474.

Heckman, J. (1979) Sample selection bias as a specification error Econometrica 47: 153-

61.

Heckman, J., Lalonde, R. and Smith, J. (1999) The economics and econometrics of active

labour market programs. In Handbook of labor economics vol III (eds. Ashenfelter,

O. and Card, D.), pp. 1865-2097. Amsterdam: North Holland.

Hirano, K., Imbens, G., Ridder, G and Rubin, D. (2001) Combining panel data sets with

attrition and refreshment samples. Econometrica, 69, 1645-1659.

King, G. (2001) Analysing incomplete political science data: an alternative algorithm for

multiple imputation. American Political Science Review, 95, 49-69.

Little, R. and Rubin, D. (1987) Statistical analysis with missing data. New York: Wiley.

33

Little, R. and Rubin, D. (1989) The analysis of of social science data with missing values.

Sociological Methods and Research, 18, 292-326.

Lynn, P., Clarke, P, Martin, J. and Sturgis, P. (2001) The effects of extended interviewer

efforts on nonresponse bias. In Survey nonresponse (eds. Groves, R., Dillman, D.,

Eltinge, J. and Little, R.), pp. 135-147. New York: Wiley.

Ridder, G. (1992) An empirical evaluation of some models for non-random attrition in

panel data. Structural change and economic dynamics, 3, 337-355

Rosenbaum, P. and Rubin, D (1983) The central role of the propensity score in

observational studies for causal effects. Biometrika, 70, 41-50.

Rubin, D. (1976) Inference and missing data Biometrika, 63, 581-592.

Rubin, D. and Zanutto, E. (2001) Using matched substitutes to adjust for nonignorable

nonresponse through multiple imputations. . In Survey nonresponse (eds. Groves,

R., Dillman, D., Eltinge, J. and Little, R.), pp. 389-402. New York: Wiley.

Tables and figures Table 1: Summary statistics: YBitB indicates unemployment at time t, WBi B indicates willingness to respond to stage 2 survey. (Sub) sample YBi1B YBi2B WBiB NBalanced panel 0 0 1 792 0 1 1 279 1 0 1 859 1 1 1 1405 3335Incomplete panel 0 0 969 1 0 1606 2575Panel 0 3256 1 2654 5910 Table 2: Directly estimable parameters W=0 W=1 qBy1 B

YB1B=0 0.164 0.181 0.261 (0.005) (0.005) (0.013)YB1B=1 0.272 0.383 0.621 (0.006) (0.006) (0.010)Standard errors in parentheses Table 3: Model-based parameters MAR HW TRUEqB00B 0.261 0.145 (0.013) (0.024)qB10B 0.621 0.440 (0.010) (0.050)Pr(yB2B=1) 0.496 0.428 0.449 (0.008) (0.019) (0.006)Standard errors in parentheses Table 4: Probability of being unemployed by year. Claimant unemployed (%): N Stage 2 Week commencing: Interview 28/12/98 29/3/99 28/6/99 27/9/99 27/12/99 27/3/00 26/6/00Non-drop-outs 50.5 81.5 68.4 60.3 52.6 49.4 49.4 46.1 3358Drop-outs 37.7 78.9 64.8 54.1 44.5 38.1 34.9 33.9 2590%-point difference 12.8 2.6 3.6 6.3 8.1 11.3 14.4 12.2 P-value (%) 0.0 1.2 0.4 0.0 0.0 0.0 0.0 0.0

Table 5: Modelling probability of responding to wave 1 and to wave 2 interviews using administrative data Response at

wave 2 Response at wave 1

Age: 18-19 0.177 0.183 (2.06)* (3.17)** Age: 20-21 0.104 0.145 (1.24) (2.48)* Age: 22-23 0.208 0.017 (2.48)* (0.29) Female 0.061 0.091 (1.13) (2.29)* Partner 0.175 0.301 (1.75) (4.20)** Number of JSA claims since Jan 1995 -0.030 -0.031 (2.68)** (4.03)** Region: northern -0.005 -0.184 (0.04) (2.09)* Region: north west 0.002 -0.217 (0.02) (2.71)** Region: Yorkshire & Humberside -0.046 -0.022 (0.44) (0.26) Region: Wales -0.321 -0.132 (2.29)* (1.23) Region: west midlands -0.048 -0.146 (0.40) (1.58) Region: east midlands and eastern -0.079 -0.136 (0.71) (1.56) Region: south west -0.313 0.033 (1.72) (0.22) Region: London and south east -0.449 -0.507 (3.65)** (6.89)** Area type: rural, high unemployment 0.618 0.214 (5.02)** (2.54)* Area type: rural/urban, tight labour market 0.185 0.213 (1.91) (3.12)** Area type: rural/urban, high unemployment 0.334 0.193 (3.33)** (2.66)** Area type: rural/urban, tight labour market 0.091 -0.049 (1.08) (0.81) Area type: urban, high unemployment 0.184 0.072 (2.48)* (1.33) Preferred occupation: clerical and secretarial 0.096 (2.22)* rural area 0.301 (2.18)* Unemployed week beginning 15-Feb-1999 0.370 (9.72)** Unemployed week beginning 25-Oct-1999 0.214 (4.19)** Constant 0.165 -0.035 (0.84) (0.37) Observations 5517 5517 Absolute value of z-statistics in parentheses * significant at 5% level; ** significant at 1% level

Table 6: Characteristics of the sample and refreshment samples Test sample IB BP NTRS TRS PRSaged 18-19 0.24 0.22 0.25 0.25 0.22 0.35aged 20-21 0.33 0.33 0.34 0.33 0.33 0.37aged 22-23 0.25 0.26 0.25 0.24 0.26 0.13aged 24-25 0.17 0.18 0.17 0.18 0.19 0.18female 0.29 0.28 0.29 0.30 0.29 0.31ethnic minority 0.47 0.51 0.44 0.46 0.49 0.32has partner 0.13 0.12 0.14 0.15 0.15 0.07dependent children in household 0.09 0.08 0.10 0.10 0.10 0.10no educational qualifications 0.35 0.35 0.35 0.34 0.37 0.37no vocational qualifications 0.56 0.58 0.54 0.53 0.57 0.54drivers licence 0.23 0.24 0.22 0.24 0.24 0.22drivers licence and car 0.14 0.14 0.14 0.15 0.15 0.14long-term health problem 0.24 0.24 0.25 0.24 0.24 0.21housing: social housing 0.51 0.51 0.51 0.52 0.52 0.53housing: mortgage or owned 0.30 0.26 0.32 0.29 0.25 0.28employed at first interview 0.25 0.25 0.25 0.25 0.26 0.20unemployed at first interview 0.47 0.49 0.46 0.47 0.50 0.54number of jsa claims since jan 1995 4.10 4.15 4.07 4.12 4.13 4.10total days unemployed before ND 642.99 649.66 637.99 639.40 655.28 624.48duration of current claim at ND entry 291.09 288.34 293.15 288.66 283.84 288.08Local unemployment rate at ND entry 5.74 5.54 5.89 5.79 5.63 5.37N 2974 1274 1700 1300 1316 1300

Table 7: Results of estimating the propensity score using the incomplete panel and varying the refreshment sample BP NTRS TRS PRS BP &

NTRS BP & TRS

BP & PRS

aged 18-19 -0.109 -0.119 -0.007 -0.652 -0.113 -0.059 -0.348 (1.19) (1.22) (0.07) (6.63)** (1.42) (0.75) (4.41)** aged 20-21 -0.056 -0.030 -0.020 -0.382 -0.049 -0.034 -0.195 (0.71) (0.36) (0.25) (4.55)** (0.72) (0.51) (2.89)** aged 22-23 -0.037 0.033 -0.001 0.228 -0.011 -0.018 0.042 (0.50) (0.42) (0.02) (2.61)** (0.17) (0.28) (0.64) female -0.030 -0.067 -0.033 -0.112 -0.047 -0.029 -0.066 (0.55) (1.17) (0.58) (1.91) (1.00) (0.63) (1.41) ethnic minority 0.155 0.114 0.050 0.543 0.135 0.109 0.295 (3.20)** (2.21)* (0.98) (10.03)** (3.19)** (2.59)** (6.96)** has partner -0.165 -0.185 -0.164 0.533 -0.172 -0.161 0.082 (1.82) (1.95) (1.73) (4.83)** (2.18)* (2.03)* (1.01) dependent children -0.031 -0.008 0.014 -0.451 -0.018 -0.009 -0.204 (0.29) (0.07) (0.13) (4.01)** (0.20) (0.09) (2.26)* no educational qualifications -0.018 0.002 -0.030 0.003 -0.010 -0.023 -0.012 (0.34) (0.03) (0.54) (0.06) (0.21) (0.49) (0.26) no vocational qualifications 0.116 0.150 0.050 0.187 0.125 0.083 0.142 (2.34)* (2.83)** (0.94) (3.43)** (2.89)** (1.93) (3.27)** drivers licence 0.190 -0.021 0.065 -0.068 0.089 0.130 0.084 (2.16)* (0.24) (0.72) (0.72) (1.20) (1.73) (1.11) drivers licence and car -0.147 -0.006 -0.101 -0.039 -0.075 -0.114 -0.103 (1.43) (0.05) (0.96) (0.35) (0.85) (1.30) (1.15) long-term health problem -0.021 -0.016 -0.020 0.120 -0.020 -0.021 0.035 (0.37) (0.27) (0.33) (1.92) (0.40) (0.43) (0.72) housing: social housing -0.201 -0.124 0.014 -0.057 -0.162 -0.094 -0.134 (3.17)** (1.88) (0.22) (0.84) (2.98)** (1.76) (2.46)* housing: mortgage or owned -0.365 -0.183 0.045 -0.119 -0.279 -0.178 -0.252 (5.17)** (2.46)* (0.61) (1.56) (4.57)** (2.94)** (4.11)** employed at first interview 0.066 0.059 -0.068 0.182 0.060 0.011 0.089 (1.00) (0.83) (0.95) (2.40)* (1.03) (0.19) (1.52) unemp at first interview 0.078 0.037 -0.075 -0.050 0.056 0.010 0.014 (1.37) (0.60) (1.21) (0.80) (1.12) (0.20) (0.29) num. JSA claims since 1/95 0.002 -0.006 0.005 0.011 -0.002 0.003 0.004 (0.14) (0.47) (0.43) (0.86) (0.17) (0.33) (0.39) total days unemp before ND -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 -0.000 (0.21) (0.10) (0.46) (4.17)** (0.22) (0.31) (2.27)* length of claim at ND entry -0.000 -0.000 0.000 -0.000 -0.000 -0.000 -0.000 (0.98) (0.73) (0.54) (0.12) (1.01) (0.39) (0.91) local unemp rate at ND entry -0.059 -0.045 -0.020 0.069 -0.051 -0.042 -0.013 (4.57)** (3.27)** (1.41) (4.38)** (4.51)** (3.64)** (1.09) Constant 0.283 0.324 0.112 -0.199 -0.128 -0.239 -0.288 (1.99)* (2.10)* (0.74) (1.23) (1.03) (1.94) (2.29)* Observations 2954 2552 2566 2562 4244 4258 4254 Absolute value of z-statistics in parentheses. * significant at 5% level; ** significant at 1% level

Table 8: Distribution of weights for the matched samples Type of refreshment sample: BP NTRS TRS PRS BP &

NTRS BP & TRS

BP & PRS

weight: 1 488 334 382 353 639 721 6722 194 209 202 149 193 192 1823 62 80 62 63 56 36 494 24 25 34 29 11 11 105 8 22 16 24 5 1 36 2 5 5 7 47 5 3 4 5 8 1 1 2 3 9 1 1

10 1 11 2 12 1 13 1 14 15 16 17 18 19 1

N 785 679 707 640 904 961 920mean weight 1.61 1.86 1.79 1.97 1.40 1.31 1.37% top decile 25 24 25 0.30 22 20 22

Table 9: Measures of concordance between dropouts and refreshment samples BP NTRS TRS PRS BP & NTRS BP & TRS BP & PRS

Pre/post matching: pre post pre post pre post pre post pre post pre post pre postaged 18-19 5.42 1.50 6.02 4.29 0.80 3.43 28.38 0.53 5.68 0.56 2.74 4.15 15.62 4.73aged 20-21 1.65 7.57 0.41 0.51 0.68 3.37 9.59 5.98 0.76 1.18 1.23 0.34 5.12 5.18aged 22-23 3.06 1.45 5.11 2.92 1.21 2.53 33.52 7.68 3.95 0.18 2.25 2.36 15.37 1.70Female 2.42 2.81 4.19 5.24 1.39 0.88 7.20 12.67 3.19 2.10 1.97 0.70 4.50 1.92ethnic minority 13.53 1.43 10.06 0.16 4.63 5.39 39.49 4.26 12.02 0.64 9.64 0.79 24.49 0.64has partner 5.68 1.65 7.65 7.22 8.85 0.69 16.20 2.15 6.54 2.81 7.08 0.23 3.09 2.97dependent children in household 4.07 3.31 4.74 2.47 4.85 1.37 4.74 6.87 4.37 0.28 4.42 6.33 4.37 1.10no educational qualifications 1.00 2.82 2.78 4.82 2.86 1.15 2.69 4.62 1.77 0.66 0.69 3.47 0.60 1.16no vocational qualifications 7.29 3.52 10.55 1.12 3.13 0.48 8.70 6.07 8.70 1.12 5.48 2.24 7.90 2.88drivers licence 6.12 2.27 0.85 0.74 0.02 2.96 4.79 4.32 3.06 1.68 3.43 2.43 5.54 2.64drivers licence and car 0.53 0.45 2.45 0.89 2.99 0.89 1.71 9.59 1.36 1.80 1.61 3.59 0.44 6.58long-term health problem 2.25 0.56 0.88 3.72 1.26 4.27 5.93 0.95 1.66 2.41 1.82 1.85 1.24 4.68housing: social housing 0.90 1.43 2.11 0.63 1.45 0.95 4.42 9.99 1.43 0.32 1.14 1.11 2.43 5.87housing: mortgage or owned 13.56 0.35 5.90 1.42 3.14 2.00 3.50 12.32 10.27 1.41 6.43 1.24 9.25 3.88employed at first interview 0.09 0.55 0.09 3.29 2.01 1.27 11.52 2.09 0.09 3.48 0.93 2.37 4.83 3.34unemployed at first interview 5.22 3.02 3.33 2.38 2.06 5.23 10.99 1.59 4.40 3.65 2.04 6.50 1.80 4.12number of jsa claims since jan 1995 3.66 0.50 1.61 1.96 0.89 5.21 2.18 2.84 2.77 1.87 2.43 0.83 3.02 6.39total days unemployed before ND 2.53 4.29 2.23 1.02 1.22 2.99 5.54 8.78 2.40 4.53 0.90 3.53 3.82 1.29duration of current claim at ND entry 1.77 3.75 0.12 2.63 1.74 2.47 0.10 8.02 1.07 1.27 0.28 3.11 0.97 1.62TTWA unemployment rate at ND 19.45 2.62 13.72 3.87 5.16 1.22 9.72 5.60 16.97 0.24 13.33 2.80 7.33 0.42Mean 5.01 2.29 4.24 2.57 2.52 2.44 10.55 5.85 4.62 1.61 3.49 2.50 6.09 3.15

Table 10: Differences in outcomes between dropouts and substitutes from refreshment samples BP NTRS TRS PRS BP &

NTRS BP & TRS

BP & PRS

unemployed at W2 -0.13* -0.07* 0.00 0.08* -0.10* -0.07* -0.04 (6.07) (2.94) (0.01) (2.31) (5.12) (3.46) (1.72) [1.67] [0.58] [0.06] [0.82] [1.09] [0.56] [0.24]unemployed Dec 98 -0.02 -0.02 0.00 -0.09* -0.02 -0.02 -0.05* (1.48) (0.80) (0.00) (3.31) (1.32) (0.94) (2.59) [0.09] [0.07] [0.04] [0.83] [0.07] [0.05] [0.26]unemployed Mar 99 -0.03 -0.02 0.00 -0.07* -0.03 -0.02 -0.05* (1.74) (0.86) (0.01) (2.24) (1.51) (1.02) (2.22) [0.15] [0.09] [0.05] [0.67] [0.11] [0.07] [0.30]unemployed Jun 99 -0.06* -0.03 0.00 0.03 -0.05* -0.03 -0.03 (3.03) (1.44) (0.01) (0.79) (2.59) (1.69) (1.10) [0.43] [0.17] [0.06] [0.19] [0.28] [0.16] [0.12]unemployed Sep 99 -0.08* -0.04 0.00 0.16* -0.06* -0.04* 0.01 (3.73) (1.81) (0.01) (3.64) (3.15) (2.09) (0.50) [0.66] [0.26] [0.06] [2.61] [0.44] [0.24] [0.10]unemployed Dec 99 -0.11* -0.06* 0.00 0.11* -0.09* -0.06* -0.02 (5.30) (2.53) (0.01) (2.93) (4.42) (2.98) (0.82) [1.26] [0.45] [0.06] [1.40] [0.83] [0.43] [0.11]unemployed Mar 00 -0.14* -0.08* 0.00 0.03 -0.11* -0.08* -0.07* (6.82) (3.29) (0.00) (0.94) (5.79) (3.86) (3.31) [2.09] [0.70] [0.06] [0.18] [1.35] [0.68] [0.59]unemployed Jun 00 -0.12* -0.07* 0.00 0.02 -0.10* -0.07* -0.06* (5.74) (2.82) (0.01) (0.61) (4.96) (3.29) (2.95) [1.47] [0.51] [0.06] [0.12] [0.96] [0.50] [0.45]* - significant at 5% level; t-statistics in parentheses; 100*mean square error in brackets; bootstrapped (10,000 reps)

Table 11: Differences in outcomes between dropouts and substitutes from refreshment samples, including wave 1 and wave 2 unemployment status in propensity score model BP NTRS PRS BP & NTRS BP & PRSunemployed at W2 -0.07* -0.04 -0.03 -0.06* -0.06* (3.81) (1.91) (0.71) (3.29) (3.31) [0.57] [0.21] [0.29] [0.37] [0.39]unemployed Dec 98 -0.01 -0.01 -0.01 -0.01 0.00 (0.48) (0.29) (0.17) (0.44) (0.28) [0.03] [0.04] [0.15] [0.03] [0.02]unemployed Mar 99 0.00 0.00 0.00 0.00 0.00 (0.17) (0.07) (0.09) (0.14) (0.08) [0.03] [0.04] [0.20] [0.03] [0.03]unemployed Jun 99 -0.02 -0.01 -0.01 -0.02 -0.02 (1.11) (0.46) (0.26) (0.90) (0.88) [0.08] [0.06] [0.20] [0.06] [0.06]unemployed Sep 99 -0.01 0.00 0.01 -0.01 -0.01 (0.58) (0.24) (0.24) (0.47) (0.40) [0.04] [0.04] [0.21] [0.03] [0.03]unemployed Dec 99 -0.05* -0.03 -0.02 -0.04* -0.04* (2.67) (1.33) (0.47) (2.30) (2.39) [0.27] [0.12] [0.23] [0.18] [0.21]unemployed Mar 00 -0.10* -0.06* -0.04 -0.08* -0.08* (5.20) (2.51) (1.07) (4.43) (4.47) [1.12] [0.38] [0.37] [0.72] [0.74]unemployed Jun 00 -0.09* -0.05* -0.04 -0.07* -0.07* (4.25) (2.15) (1.01) (3.74) (3.76) [0.77] [0.29] [0.34] [0.52] [0.53]* - significant at 5% level; t-statistics in parentheses; 100*mean square error in brackets; bootstrapped (10,000 reps)

prop

ortio

n un

empl

oyed

probability of unemployment over time

stayer dropout

01jan1999 01apr1999 01jul1999 01oct1999 01jan2000 01apr2000

.2

.4

.6

.8

Figure 1: Difference between stayers and dropouts in the probability of being unemployed

Figure 2: Constructing the refreshment samples

0.30

0.40

0.50

0.60

0.70

0.80

0.90

Dec-98

Feb-99

Apr-99

Jun-9

9

Aug-99

Oct-99

Dec-99

Feb-00

Apr-00

Jun-0

0

prop

ortio

n un

empl

oyed

sample frameNTRSTRSPRS

Figure 3: Differences between samples in the changing probability of unemployment

Sampling frame (including initial nonrespondents)

N=11,159

Non-test samplen=N*pr(R=1)/2

Test samplen=N*pr(R=1)/2

TRSnTRS =nIP

NTRSnNTRS =nTRS

PRSnPRS =nTRS

BPnBP = n*pr(w=1)

IPnIP = n*pr(w=0)

ε1 < 0.5, R=1 ε1 ≥ 0.5, R=1

ε2 < Pr(w=0)

nTRS cases with largest pr(w=0)w=0w=1 w=0


N=11,159



TRSnTRS =nIP

NTRSnNTRS =nTRS

PRSnPRS =nTRS

BPnBP = n*pr(w=1)

IPnIP = n*pr(w=0)

ε1 < 0.5, R=1 ε1 ≥ 0.5, R=1

ε2 < Pr(w=0)



N=11,159



TRSnTRS =nIP

NTRSnNTRS =nTRS

PRSnPRS =nTRS

BPnBP = n*pr(w=1)

IPnIP = n*pr(w=0)

ε1 < 0.5, R=1 ε1 ≥ 0.5, R=1

ε2 < Pr(w=0)