
Adjusting for non-response in a longitudinal survey: comparisons of weighting and imputation.

Susan McVie1, Paul Norris2 and Gillian M Raab3

Draft Report for MOLS 2006 Conference Please do not cite or quote without permission. This paper is a summary of work that will be incorporated into a full paper after the conference. References to the literature (in particular) will need expanding and the detail of the method used will be shortened.

1. Centre for Law and Society, University of Edinburgh, 31 Buccleuch Place, Edinburgh EH8 9JS. [email protected] 2. School of Social and Political Studies, University of Edinburgh, 10 Buccleuch Place EH8 9LL [email protected]. 3. School of Community Health, Napier University, Edinburgh EH4 2LD. [email protected]

Abstract The performance of weighting and imputation in adjusting for non-response is compared using six sweeps of longitudinal data from the Edinburgh Study of Youth Transitions and Crime. This survey contains a large number of questions with ordinal responses where any response other than the lowest is often fairly rare. Once the appropriate covariates have been identified, weighting is relatively straightforward to carry out, whereas imputation for this type of data leads to many technical difficulties. Eventually, two of the six imputation strategies attempted produced reasonable results.


The imputation methods produced greater differences from the observed data than weighting. Possible reasons for this are discussed, along with the advantages and disadvantages of the two methods.

Keywords: weighting, imputation, longitudinal data


1 Introduction

1.1 Missing data in longitudinal analyses

A frequently encountered problem in large-scale quantitative surveys is that of missing data, due both to item and individual non-response. In longitudinal studies the problem is particularly pronounced, since data may be missing at not just one time-point but several, and the pattern of missing data may not just be monotone (due to dropout) but intermittent, as subjects may return after one or more missed sweeps. Data that are missing completely at random (MCAR) pose little or no problem; however, in the vast majority of surveys missing data are non-ignorable, since data are missing not at random (MNAR), and this can pose a serious risk to the production of unbiased survey estimates. A key problem with missing data is that little is often known about the characteristics of those who do not respond and how, or to what extent, non-response relates to the survey outcomes of interest. A redeeming feature of longitudinal studies is that, amongst those who have participated at least once, a certain amount of information is known about the missing population, which means that aspects of bias can be adjusted for more reliably.

Two approaches may be used to handle missing data: weighting and imputation. The more commonly used of the two, data weighting, is a procedure that attempts to correct the distributions in the sample data to approximate those of the population from which it is drawn. Usually this is partly a matter of expansion and partly a matter of correction or adjustment, dealing with both non-response and non-coverage. This method provides data that reflect the characteristics of the population (on known parameters) rather than the sample. Data imputation, which is less commonly applied, involves the replacement of missing data with a substitute. The substitute value may be 'borrowed' from another subject by matching the individual characteristics of respondents (more common in cross-sectional studies), or estimated from the responses of the individuals themselves at earlier or later time-points (using longitudinal data), with reference to the pattern of response amongst other similar respondents.

There are several factors that make missing data problems different in longitudinal data, some of which may facilitate dealing with non-responders while others may hinder it, and these aspects may affect weighting and imputation differently. Methods for longitudinal data, such as the use of repeated measures analysis, may deal with some missing data internally so that there is no need to use either weighting or imputation. However, if the influence of covariates which may change at different sweeps is of interest, then some method of dealing with missing data for covariates needs to be considered.


Imputation methods may seem the natural choice for longitudinal data because responses at one sweep may be highly informative about missing responses at other sweeps. It should then be fairly straightforward to use imputed data in longitudinal analyses. But the sheer volume of data on each subject in a longitudinal study may make imputation difficult in practice. Weighting for longitudinal data may be complicated by the possibility of many response patterns defined by the responses at each sweep of the survey. This will require a minimum of one weight for each sweep of the survey, and additional weights when combinations of sweeps are used in the same analysis. We are unsure how weighting might be used in a repeated measures analysis. As we mention above, this would not matter if only the outcome being modelled were missing. But bias can also be introduced in a complete case analysis if covariates are missing not at random (ref Stats in Med). Table 1 summarises these and other characteristics of weighting and imputation for longitudinal survey data.

Practical issues

Familiarity and practice
  Weighting: **Well-established methodology
  Imputation: Less familiar

Difficulty in deciding how to carry out
  Weighting: **Some choices, but generally straightforward; only factors affecting response and outcome are needed
  Imputation: Many choices, and all variables to be used in the analysis with any missing data need to be included

Item non-response
  Weighting: Needs some other method
  Imputation: **Can be included along with unit non-response

Difficulties in using the data
  Weighting: A different set of weights has to be used for each sweep of a longitudinal survey
  Imputation: **Imputed data can cover all potential respondents at every sweep

Analysis
  Weighting: Straightforward with weights in survey software
  Imputation: Needs special software or extra calculations to incorporate uncertainty due to imputations

Assumptions of method

Ignorability assumption
  Weighting: Conditional on completely observed variables used for weighting, but not fully on other variables
  Imputation: **Conditional on all variables in the imputation model

Model assumptions
  Weighting: Model to predict non-response must be the correct one and non-response must be ignorable given these items
  Imputation: Complete multivariate model (perhaps defined by conditionals) must be correct for items with missing data

Recommended strategy
  Weighting: Use parsimonious models to avoid introducing noise
  Imputation: Use large models so that all potential sets of analysis variables are in the imputation model

Table 1  Comparisons of weighting and imputation for missing data in longitudinal data (** indicates an advantage).


The literature on methods for dealing with missing data tends to treat these two approaches separately, despite the fact that certain key principles for implementing the procedures are shared. For example, neither approach should lead to bias or distributional change in the data, or add significant extra variance to estimators; both methods should, wherever possible, rely on data from the sample rather than making external assumptions about the likely nature of the missing data; and neither procedure should result in important sample estimates being based too heavily upon either imputed or weighted values. Despite these fundamental similarities, little guidance is given on which is the best approach to use and in what circumstances, or even whether both methods can be used simultaneously to address different problems. A key aim of this paper is to present a comparative analysis of these two methods of dealing with missing data and to assess the practical difficulties, the strengths and the limitations of each approach.

In this paper we consider how the two main approaches to missing data, weighting and imputation, perform in adjusting for missing data in a longitudinal survey. Our aim is not to devise new methodologies, but to use existing methods in the most appropriate way. We are able to compare the ease of use and face validity of the two methods. Since we do not know what the true values of the missing observations should be, we cannot say definitively which method is more correct, but simply compare the results we have obtained from the two methods. Our goal is to produce a resource that can be used by other analysts without their having to carry out further imputation procedures themselves. Thus for weighting we wish to have only one set of weights for each sweep, rather than for every combination of sweeps. Our imputation procedure should be capable of being used to fill missing values of all of the several hundred variables that are available over the six sweeps of the survey. In order for the experience of the methods to reflect what an analyst using just one approach would achieve, the three authors worked independently to develop different approaches. One author (SMcV) used weighting, while the other two developed imputation methods, each with a different software package: PN used STATA and GR used SAS.

2 The survey data

The Edinburgh Study of Youth Transitions and Crime is a prospective longitudinal study that involves a cohort of around 4,300 young people who started secondary school in the City of Edinburgh in 1998, aged between 11 and 12. The main aim of the study is to examine causal pathways of offending and to establish why some young people get involved in persistent and serious offending, while many others do not.


The main source of information about the young people is a self-completion questionnaire that was administered annually over six sweeps, most commonly in a school classroom setting. The questionnaire included mainly closed questions, with some simple routing, and was carefully piloted to ensure its suitability for the literacy level of the age group. Assistance was provided to those who fell below the necessary reading level.

All schools covering the relevant age range were asked to participate, including state-funded mainstream and special educational schools and the city's independent sector schools. All of the mainstream schools in the city consented to take part in the research, although a small number of special and independent schools declined, representing a 7.8% non-response rate amongst all children eligible to have participated. Amongst the participating schools, parental consent was sought using a passive method, in which parents were informed about the study in writing and asked to sign and return the response slip only if they wished to withdraw their child from the research. This yielded a further 3.4% non-response amongst the potential eligible cohort. These missing data are not considered within the remit of this paper, since very little was known about these individuals, although it is worth noting that such non-response problems could only be handled using weighting rather than imputation.

The independent sector schools had a high level of intake at sweeps two and three; therefore, it was decided to include any new school pupils moving into the city as potential eligible cohort members. At the same time, study participants who left the city permanently at sweeps two or three were not tracked, as no forwarding address details were available at this time. As a result of these changes to the eligible population, the cohort base was not 'fixed' until the start of sweep four of data collection, at which point a total of 4328 individuals were identified as the final cohort. This is the base number that was initially considered for the data weighting and imputation approaches used in this paper. In addition, however, eight individuals were excluded from the base population because they died or opted out of the study permanently at subsequent sweeps. This resulted in a base total of 4320 for consideration in this paper.

2.1 Survey methodology and non-response

The Edinburgh Study involves a complex mixed-method design, involving both self-completion and official record data collection (for more details on the design of the study see Smith and McVie 2003). This paper is concerned only with missing data arising from the self-completion element of the study. At each sweep, a paper and pencil questionnaire was administered to young people, predominantly within school classrooms, in exam-like conditions.


The survey was administered by researchers, with no teacher present in the majority of cases, to reinforce confidentiality and to minimise questionnaire error (questionnaires were checked on completion for item non-response). Strenuous efforts were made to maximise the response rate, including several follow-up visits to schools to capture absentees and home visits to interview persistent non-attendees.

Table 2 summarises the non-response at each sweep of the survey. During the first four sweeps, the proportion of cohort members surveyed at school was very high as they had not yet reached the minimum school leaving age.1 At sweeps five and six, however, a number of different strategies were adopted to ensure a high response rate. Those who were still attending school were, for the most part, surveyed as normal. School leavers (and those who were still attending school but, for a variety of reasons, could not be surveyed there) were initially sent a postal questionnaire which, if not returned, was followed up by a face-to-face visit from an interviewer. Despite extensive attempts to track all cohort members, individual response rates at sweeps five and six of the study dipped considerably lower than at earlier sweeps. The factors which influenced unit non-response are described in section 2.3 on weighting.

Item non-response made a smaller contribution overall to the total number of missing items. In addition, this problem reduced over time, most probably due to a combination of educational improvement amongst cohort members and persistent efforts by the research team to minimise the problem.

Sweep | % unit non-response | % any item non-response | Average no. of questions missed by item non-responders (see note 2) | % complete
1 | 5.88 | 7.03 | 1.51 | 87.50
2 | 3.75 | 6.33 | 1.37 | 90.16
3 | 1.74 | 6.05 | 1.54 | 92.32
4 | 4.24 | 4.69 | 1.49 | 91.27
5 | 10.69 | 4.46 | 1.31 | 85.33
6 | 18.29 | 4.11 | 1.39 | 78.35
All sweeps | | | | 55.86

1 It is worth noting that even those who were persistent truants or non-attenders were mostly surveyed in a school environment, since most were attending special educational resources with an adapted timetable to encourage them to remain in school.
2 This refers to non-response to any of the 90 questions on offending behaviour considered here.

Table 2  Rates of unit and item non-response at each of six sweeps

2.2 Survey items imputed

In order to compare the results of the weighting and imputation approaches, we selected a subset of 90 questions on offending that were asked across the six sweeps of the survey. In the majority of cases the same questions were asked at every wave of the survey, while some questions were only asked at certain sweeps. A complete list of the questions asked is given in Appendix 1. Each question was asked in the same format: an initial screener question which, if answered positively, was followed by a series of follow-up questions, including one on how many times they had done this. An example of the questions asked is shown in Figure 1.

Figure 1 Sample question for breach of the peace


At each sweep, three total scores are calculated from the questions asked. These are 1) prevalence of any offending, 2) variety of offending (the number of types of offence committed) and 3) volume of offending, calculated by adding the frequency scores across the questions at that sweep. The total volume score does not represent a true count of all offending, as the upper values are capped at 6 (for between 6 and 10 times) and 11 (for more than 10 times), and thus represents the minimum count possible. Figure 2 illustrates that the peak for offending (shown here as the volume score) is at sweep 3, with a decline at later sweeps. Figure 2 also illustrates that restricting analysis to the 56% who were complete responders at every sweep selects a sample with a lower rate of offending.
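To make the construction of these summaries concrete, the sketch below shows one way the three scores could be derived in Python from a per-sweep table of item-level volume scores. The data frame name and layout are assumptions for illustration only and are not part of the study's actual processing.

```python
import pandas as pd

def offence_scores(items: pd.DataFrame) -> pd.DataFrame:
    """Derive the three sweep-level summaries described above.

    `items` is assumed to hold one row per cohort member and one column per
    offence question at a given sweep, with 0 meaning the offence was not
    committed and the two aggregated upper response bands already recoded
    to their capped values.
    """
    committed = items.gt(0)
    return pd.DataFrame({
        "prevalence": committed.any(axis=1).astype(int),  # any offending at the sweep
        "variety": committed.sum(axis=1),                 # number of offence types
        "volume": items.sum(axis=1),                      # capped total frequency
    })
```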

[Figure 2 chart: mean volume score (scale 0 to 14) at sweeps 1 to 6, plotted separately for available cases and complete cases.]

Figure 2  Volume of offending at each sweep shown for all available data and for those with complete responses to all questions at every sweep.

All of the offending questions produce very skewed distributions, with most responders being non-offenders and a small but appreciable number being persistent offenders. In addition, some offences are more common than others; therefore, the questions make different contributions to the total volume score. In order to compare the distributions produced by different methods of adjustment for non-response we selected a set of questions with different prevalence levels. These are summarised in Table 3. Two of the questions are fairly common offences, while the other two are less common.


Question | Sweep 1 | Sweep 2 | Sweep 3 | Sweep 4 | Sweep 5 | Sweep 6
Boys
Rowdy or rude in a public place | 1.156 | 1.755 | 2.131 | 1.799 | 1.405 | 1.106
Stolen from a shop | 1.019 | 1.002 | 1.190 | 1.094 | 0.730 | 0.395
Robbed a person | 0.079 | 0.059 | 0.075 | 0.153 | 0.042 | 0.061
Credit card fraud | not asked before sweep 6 | 0.047
Girls
Rowdy or rude in a public place | 1.664 | 2.348 | 2.839 | 2.708 | 2.430 | 1.744
Stolen from a shop | 1.594 | 2.154 | 2.690 | 2.506 | 1.709 | 1.083
Robbed a person | 0.016 | 0.027 | 0.032 | 0.034 | 0.025 | 0.028
Cheque or credit card fraud | not asked before sweep 6 | 0.014

Table 3  Mean volume score contribution from selected questions

2.3 Weighting methodology

A variety of weighting approaches exist for dealing with individual missing data, including cell, rim and post-stratification weighting. These approaches tend to be used when little or nothing is known about the non-responders, and assumptions are made about them based on the characteristics of those who do respond or the broader population marginal distribution. Where key information is known about the non-responders, however, a more reliable method of calculating weights is logistic regression modelling, where the dependent variable is 'response' and known characteristics of both the responders and non-responders are the independent variables. This approach calculates the predicted probability of belonging to the response group, the inverse of which provides the weight to be applied to that individual. The strength of the weight is determined by the likelihood of an individual with certain characteristics responding to the survey, where these characteristics are known to be related both to the survey outcomes of interest, in this case offending, and to response propensity. Larger weights are applied to those who have the lowest probability of responding.

For the purposes of this paper, six cross-sectional weights were constructed where the base sample was all those who formed the final cohort at sweep four. This therefore excluded any individuals who had dropped out of the survey at sweeps two or three, and included any new members who had joined at these sweeps. In addition, those who died during the course of the study or who were permanently opted out (either by their own choosing or by proxy) were also excluded from this analysis. The base number for analysis was 4320. At each sweep, the cohort members were assigned a value of 1 if they responded and 0 if they did not, forming the dependent variable for logistic regression analysis.

A selection of key variables was then identified as suitable independent variables for modelling. Two criteria were used for selecting the independent variables: first, they were associated with the outcome variables of interest (most importantly, offending); and second, they were associated with response propensity. A total of nine variables were identified from the study data that met these criteria. These are summarised in Table 4, below. It is worth noting that these variables were derived from a variety of sources, not just the self-reports of the individual cohort members, thus providing a more robust approach to increasing precision in the construction of the weights. These alternative data sources included: administrative data collected on cohort members; a survey of cohort members' parents; census data; school records; and records of the children's Reporter3.

Briefly, all nine of the independent variables proved to be significantly associated with response at three or more sweeps. Details are provided in Appendix 2. Non-response was consistently associated with being male, being older than average, being an early school leaver, living in a deprived household and being known to the children's Reporter (both generally and for offending). The characteristics of the non-responders did change in some respects between the early and later sweeps, however. At sweeps 1 and 2, non-responders were more likely to be non-white, non-serious offenders and non-truants; whereas, from sweep 4 onwards, the non-responders were more likely to be white, persistent offenders and known truants. This shift in the profile of the non-responders is probably due to a change in the reason for non-response. In the early sweeps, non-response was largely due to non-availability (i.e. many pupils, especially those in the independent sector, which includes students from abroad, had not yet joined the Edinburgh schools), whereas at later sweeps non-response was due to pupil attrition or apathy on the part of respondents.

3 The children’s Reporter is the official responsible for dealing with young people under the age of 16 in Scotland who are perceived to be in need of compulsory measures of care. Referrals may be made to the Reporter on a number of grounds, including offending.


Variable name | Description
Gender | Binary variable (male=1, female=0).
Age | Categorical variable differentiating between three age groups (those who were within one standard deviation of the average age=0, those who were older=1 and those who were younger=2).4
Ethnic group | Binary variable (white=1, non-white=0).
Early school leaver | Binary variable differentiating between those who left school immediately at or before the minimum leaving age of 16 (yes=1, no=0).
Individual deprivation | Binary variable based on a combination of parental socio-economic status and neighbourhood deprivation scores based on census data (high deprivation=1, average to low deprivation=0).
Persistent offending | Binary variable indicating whether the individual had reported offending more than five times during any sweep of data collection (yes=1, no=0).
Official record | Binary variable indicating whether the individual had been referred to the children's Reporter at any sweep (yes=1, no=0).
Offending record | Binary variable indicating whether the individual was known to the children's Reporter as an offender at any sweep (yes=1, no=0).
Truancy record | Binary variable indicating whether the individual was recorded by the school as being a known truant (yes=1, no=0).

Table 4  Independent variables used in weight construction

4 Although the study was based on a single age cohort, there was considerable variation in the ages of cohort members, spanning a three-year period.

The independent variables were entered into the regression model in a forward stepwise manner, with cut-off criteria for inclusion set at p<0.05.

3 Imputation methodology

There is a greater diversity of methods for imputation than is the case for weighting. They can be divided into those that use empirical procedures, such as hot-deck methods, and model-based methods. We consider only the second type here, as they seem more appropriate for the highly structured data from longitudinal surveys.


A variety of model-based procedures have been proposed and the methodology has been reviewed in a recent paper in the Journal of Official Statistics (Munnich and Rassler 2005). In all of these methods a joint multivariate model for all of the survey data is used, or approximated by the procedures. We selected the 90 offending variables mentioned in section 2.2, along with the same nine independent variables that were used in the weighting approach listed in section 2.3. The completely observed variables can be used in the imputation as part of the joint distribution. Since no imputation will be carried out for them, it is commonly considered (refs) that distributional assumptions are of minor importance for these variables. The variable indicating offending more than five times at any sweep was not included in the imputation model, since this information was available in the detailed questions. (Note: some of the results here do include it, but the differences from omitting it are small.)

3.1.1 Based on the normal distribution

Schafer (1997) has proposed using a normal distribution, and it has been suggested (ref) that it may be practical to use it even for clearly non-normal data through appropriate ad hoc adjustments applied after or during the imputation. These procedures have been implemented in several packages (S-plus, SAS, R, STATA) and various evaluations have been carried out (e.g. a paper in press for Statistics in Medicine). Schafer (1997) has also proposed methods based on multinomial distributions for categorical data, but computational problems make these practical for only a few variables at a time.

3.1.2 Chained equations

In recent years discontent with the normality assumption has led a number of different groups to propose an approach that involves specifying the conditional distribution for each variable's dependence on the others and proceeding iteratively to impute each one from the others. These methods are variously known as 'chained equations', 'sequential regression models' and 'regression switching'. They have been implemented in SAS (via the IVEWARE macros, Raghunathan et al), in S-plus and R (the MICE library) and in STATA (the ice procedure). The distribution can be selected appropriately for each variable with missing data, and either all other variables can be included in each model, or a selected model can be chosen. Choices of models in existing chained-equation software include the following six options that might be considered for the offending questions in the survey (a simplified sketch of one imputation cycle is given after the list).


Method 1. Normal distributions (making the model equivalent to a multivariate normal one if all variables are included in every model)

Method 2. A proportion of zeros and a normal distribution for the non-zero items. (IVEWARE only)

Method 3. Polytomous logistic regression (categorical data with more than 2 categories)

Method 4. Poisson regression for count data (IVEWARE only)

Method 5. Ordinal logistic regression (STATA ICE package only)

Method 6. Predictive mean matching, also known as PRIMA (predictive imputation matching) which predicts all the observations from a linear model and replaces the missing data with the observed case whose predicted value is closest to the predicted value for the missing case (STATA and S-plus/R only, but a variant can be programmed in SAS).
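To illustrate the general mechanism these packages share, the Python sketch below runs a much-simplified chained-equations cycle. It is not the IVEWARE, ice or mice implementation: it assumes every incomplete variable is an integer-coded categorical item, fits a multinomial logistic model to the currently completed data, and draws imputations from the fitted category probabilities. It omits the proper draws of model parameters that real multiple-imputation software adds, and the function and data names are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def chained_imputation(df: pd.DataFrame, n_cycles: int = 10, seed: int = 0) -> pd.DataFrame:
    """Simplified chained-equations imputation (illustrative sketch only).

    Assumes every column of `df` is an integer-coded categorical variable;
    fully observed columns act purely as predictors.
    """
    rng = np.random.default_rng(seed)
    data = df.copy()
    missing = {c: df[c].isna() for c in df.columns if df[c].isna().any()}

    # Crude starting fill: sample observed values of each incomplete variable.
    for col, miss in missing.items():
        observed = df.loc[~miss, col].to_numpy()
        data.loc[miss, col] = rng.choice(observed, size=int(miss.sum()))

    # Cycle through the incomplete variables, re-imputing each from the others.
    for _ in range(n_cycles):
        for col, miss in missing.items():
            predictors = data.drop(columns=[col]).astype(float)
            model = LogisticRegression(max_iter=1000)
            model.fit(predictors[~miss], data.loc[~miss, col].astype(int))
            # Draw each imputed value from the fitted category probabilities
            # rather than simply taking the most likely category.
            probs = model.predict_proba(predictors[miss])
            data.loc[miss, col] = [rng.choice(model.classes_, p=p) for p in probs]
    return data
```

The failures described in section 4.2 arise precisely where models of this kind break down, for example when a response category is very rare, which is why the choice of conditional model in the list above matters so much.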

We have explored all of these using SAS and STATA software. We were unable to use the mice routines in R because, at the time we were carrying out the analysis, there was a problem with the R version of the MICE code. We hope to investigate this further in due course.

4 Results

4.1 Weighting

A summary of the results of the analysis is presented in Table 5, which indicates the odds ratio for each of the significant explanatory variables at each sweep.

Odds ratios for independent variables at each sweep (only odds ratios significant within the model at p<.05 are shown)
Gender | .760 | .647 | .572
Age: younger than mean | 1.614
Age: older than mean | .467 | .385 | .555
Ethnic group | .223 | .206 | .543 | .621
Early school leaver | .627 | .358 | .253 | .197 | .022 | .266
Individual deprivation | (not significant at any sweep)
Persistent offending | 1.449 | 1.738 | 2.856 | .707
Official record | .462 | .289 | .162 | .389 | .419
Offending record | .304 | .476 | .492
Truancy record | 4.637 | 4.986 | .799

Table 5  Variables significant in explaining response at each sweep

The results in Table 5 indicate that fewer variables were significant in predicting the probability of responding at sweeps 3 and 4, which is not surprising given that the completion rate amongst the final cohort was highest at these time points (see Table 2). The only covariate to have no impact on response, when controlling for these other factors, was individual deprivation. Most of the other emerging factors were as expected. Non-response was higher amongst those who left school at the earliest opportunity at every sweep and amongst those who were at the older end of the age spectrum, although not at all sweeps. Being male and being white predicted non-response at both the early and late sweeps. Being known to the children's Reporter, either generally or for offending, predicted non-response at every sweep. There was some disparity in the direction of explanation for offending and truancy, which proved to be significant predictors of response at sweeps 1 and 2 (and 3 for offending), yet of non-response at sweep 6. However, this is consistent with the descriptive results, which showed that non-responders at the first two sweeps were less likely to be offenders, whereas those who did not respond at sweeps five and six were more likely to be offenders.

Each regression model produced a probability of response (p) based on the strength of the independent variables in predicting the likelihood of individual participation at each sweep. The reciprocal of the model-predicted probability (1/p) was calculated to produce the individual weights used in the analysis for this paper. A descriptive summary of the weights produced is shown in Table 6. The smallest weights were found at sweeps 3 and 4, which is unsurprising since these were the sweeps with the highest response rate amongst the base sample. The largest single weight was 3.44 at sweep two, although the mean weights for the first four sweeps were all very close to one. The weights calculated for sweeps 5 and 6 were larger, as would be expected since response rates declined most at these sweeps, although the maximum weights did not exceed those for sweeps 1 and 2. Overall, the variance of the weights was low, particularly in the first four sweeps, and there were few outliers.


Consideration was given to using weighting classes; however, this was not pursued further due to the small variation in the weights produced and the lack of significant outliers influencing design effects.

Sweep | Minimum | Maximum | Mean | Std dev | Variance
1 | 1.01 | 3.08 | 1.0690 | .10109 | .010
2 | 1.00 | 3.44 | 1.0436 | .08693 | .008
3 | 1.00 | 1.33 | 1.0190 | .04360 | .002
4 | 1.01 | 1.47 | 1.0498 | .08411 | .007
5 | 1.00 | 2.91 | 1.1717 | .29352 | .086
6 | 1.05 | 3.29 | 1.2922 | .36579 | .134

Table 6  Descriptive summary of the weights calculated for each sweep
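As a concrete illustration of the procedure described in section 2.3 and above, the following Python sketch fits a response model for one sweep and takes the reciprocal of the predicted response probability as the weight. The data frame and column names are hypothetical stand-ins for the nine predictors of Table 4, and the forward stepwise selection used in the paper is not reproduced here.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical column names standing in for the predictors listed in Table 4.
PREDICTORS = ["male", "age_older", "age_younger", "white", "early_leaver",
              "deprived", "persistent_offender", "official_record",
              "offending_record", "truant"]

def sweep_weights(cohort: pd.DataFrame, response_col: str) -> pd.Series:
    """Inverse-probability weights for one sweep.

    `cohort` is assumed to hold one row per cohort member, with a 0/1
    response indicator for the sweep and the fully observed predictor
    columns listed above.
    """
    X = sm.add_constant(cohort[PREDICTORS])
    fit = sm.Logit(cohort[response_col], X).fit(disp=0)
    p_response = fit.predict(X)       # predicted probability of responding
    weights = 1.0 / p_response        # larger weights for unlikely responders
    # Weights are only applied to those who actually responded at the sweep.
    return weights.where(cohort[response_col] == 1)
```

The spread of weights produced in this way can then be checked against summaries like those in Table 6 before deciding whether trimming or weighting classes are needed.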

4.2 Imputation

The various approaches to imputation discussed above were attempted and in each case the distribution of observed and imputed values was examined to check that the results looked reasonable. The first four methods mentioned above were found to be unsatisfactory for the reasons given below.

Methods 1 and 2: Normal distribution assumptions with rounding of values to 0 to 7

This gave distributions for the imputed data with means that were reasonable, but with distributions of a very different form from the observed data. An example is shown in Figure 3. Various strategies of rounding and restricting values and having a separate subgroup of zeros (method 2) were tried but there was no improvement seen.


[Figure 3 chart: per cent of cases in categories 0 to 7+ for 'Rowdy or rude in a public place', sweep 1; observed values versus normal imputation.]

Figure 3  Distribution of observed and imputed values, normal imputation.

Method 3: Chained equations with polytomous regression

This method was tried with the SAS IVEWARE software. If all of the other 89 questions were treated as categorical variables in predicting each of the 6 independent probabilities for a question with missing values, a total of 6 x 89 x 6 = 3204 dummy variables would be required. Unsurprisingly, these methods failed badly. Either the programs would crash, or a failure of convergence would occur at some links in the chain. Failures were related to items with very few respondents recording any offending. Sometimes the results would be obviously wrong, with almost all the cases imputed to one category. Restricting the models to a small number of predictors still gave problems. Some of the failures related to cases where the variance-covariance matrix used to incorporate uncertainty in the coefficients had become singular. Examination of the coefficients showed that the perturbations of the coefficients were extreme. An attempt to use the importance sampling in IVEWARE resulted in a program failure.


Method 4: Chained equations with Poisson regression for count data.

Since the offending data are counts, aggregated at the higher levels, this model might be considered reasonable. The two aggregated upper groups were treated as counts of 6 or 7 and the imputed values were bounded so as not to exceed 7. None of the marginal distributions of the counts were close to Poisson distributions, but the conditional distributions might be better. This was a more parsimonious model than model 3. It gave some similar problems, but restricting the maximum number of predictors to any number up to 50 (from a total of 101) gave results that appeared reasonable even when a few links in the chain had not converged. However, for most variables the results gave fewer responses in the high categories of the imputed data than for the observed. This is quite out of line with what we expected from our exploration of the data. It seems likely that the results are being produced by forcing the imputed data to a Poisson distribution. Further attempts to model the aggregated counts more accurately by first simulating the true value of the aggregated counts resulted in an even worse set of imputed data.

[Figure 4 chart: per cent of cases in categories 0 to 7+ for shoplifting, sweep 6; observed values versus Poisson imputation. The truncated category-zero bars reach 92.2 and 95.2 per cent.]

Figure 4  Distribution of observed and imputed values using chained equations with Poisson models. Note that the top of category zero is truncated, with the values given above.

Method 5: Ordinal logistic regression

The ICE package for STATA that includes this method does not allow any automatic selection of variables to produce reduced models. Thus the initial attempt to impute all 90 variables proved impossible to complete. The model failed to converge, most probably due to a scarcity of data on frequency of offending for the less prevalent offences, such as robbery (personal communication from Patrick Royston, 01/09/2005). Given this problem, imputation was attempted by clustering the offending variables into groups of related variables (e.g. violent offences, property offences, etc.) and imputing each group of variables separately, while controlling for the nine independent variables listed in section 2.3. This approach is good in theory, as the imputed values of each variable are then based on known data about offences which are criminologically related. Nevertheless, this method was also problematic because it was not always clear how to cluster the less common offence types, and the models containing these variables often still failed to converge.

In the end, an incremental approach was taken to the ordinal logistic regression method of imputation. This involved first creating a dataset of five imputations for around two thirds of the offending variables (using all nine control factors). This dataset contained the maximum number of imputable offending variables without the model failing to converge. In the main, the variables included were the most prevalent offences (such as assault and breach of the peace) and the variables collected at sweeps 3 and 4 (these being the sweeps with least missing data). This dataset containing five imputations was then used as the basis for a series of smaller single imputations to create a complete data series for those offending variables which failed to converge in the first imputation model. Typically three or four additional variables were imputed at a time, using all of the previously imputed offending variables as well as the control variables. This process was repeated until all the offending variables had been imputed.

One potential limitation of this incremental approach is that, while the values imputed at the later stages take account of the likely values of all offending variables imputed earlier, the reverse is not the case. As such, this approach does not replicate the cycling method of full multiple imputation. An alternative procedure would have been to specify separate equations for each of the variables with missing data, but this would have been an onerous task for 90 variables.


Royston (2005) suggests that when conducting imputation on ordinal variables it may be preferable to create a series of binary variables from the ordinal variables and build these into the imputation routine to get around the fact that ordered categories may not be equidistant (which we know to be the case with the Edinburgh Study data due to the capping). When working with large datasets, such as that in this paper, this would create some highly complex coding and our experience with polytomous regression suggests that it would not be helpful here.

Method 6: Predictive mean matching/ PRIMA.

A variant of this method was programmed in SAS. It involved first carrying out a complete imputation assuming a multivariate normal distribution for all variables. For each imputed dataset, predictive matching was then carried out by fitting a regression model for each conditional distribution, using both the measured values and the imputed values from the normal distribution (without rounding or bounds). Predictive matching was carried out for all variables in each imputation by replacing each missing value with the measured value whose predicted value was closest to the predicted value for the missing observation.

For all methods the imputed data were checked by comparing distributions with the observed values and by checking that there was some correlation between imputed values of the same variable generated for different imputations. Only the last two of the six methods tried gave results that appeared reasonable. Five imputations of each were produced and the results of these are compared with weighting below.
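A minimal version of the matching step, stripped of the multiple-imputation machinery around it, could look like the Python sketch below. It matches directly on predictions from a single linear model fitted to the observed cases, whereas the SAS variant described above matched within each normal-model imputation; the function and argument names are illustrative only.

```python
import numpy as np
import statsmodels.api as sm

def pmm_impute(y: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Predictive mean matching for one variable (illustrative sketch).

    Fits a linear model to the observed cases, predicts for all cases, and
    replaces each missing value with the observed value whose prediction is
    closest to that of the missing case, so imputed values can only be
    values that actually occur in the data.
    """
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    Xc = sm.add_constant(np.asarray(X, dtype=float))
    fit = sm.OLS(y[~miss], Xc[~miss]).fit()
    pred = fit.predict(Xc)                  # predictions for every case
    donor_pred, donor_y = pred[~miss], y[~miss]
    imputed = y.copy()
    for i in np.where(miss)[0]:
        closest = np.argmin(np.abs(donor_pred - pred[i]))
        imputed[i] = donor_y[closest]       # borrow the closest donor's value
    return imputed
```

Many implementations sample from the few closest donors rather than always taking the single closest, which adds the between-imputation variability needed for multiple imputation.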

4.3 Comparison of results between weighting and imputation

4.3.1 Mean values and distributions of corrected data

Table 7 compares the total scores with and without the imputed data. Overall, it is fair to say that the impact of both weighting and imputation is small at the aggregate level, even at those sweeps which had the highest levels of non-response. A similar pattern is seen for all three methods, with weighting giving a slightly lower score than the available data at sweeps 1 and 2 but higher at later sweeps, especially sweep 6. The pattern for the two imputation methods is similar, most particularly for volume and variety of offending, but with predictive matching giving higher scores for prevalence. In terms of prevalence of offending, weighting and ordered logistic regression produce the most similar results. Both imputation methods give a greater increase in scores than is seen for weighting at most sweeps.


Percent prevalence of any offending

Sweep | Available | Ordered logistic | Predictive matching | Weighting | Difference from available: ordered logistic | predictive matching | weighting
1 | 72.90% | 72.63% | 73.03% | 72.67% | -0.27% | 0.13% | -0.23%
2 | 72.25% | 72.08% | 72.40% | 72.05% | -0.17% | 0.15% | -0.20%
3 | 76.62% | 76.48% | 76.66% | 76.74% | -0.15% | 0.03% | 0.12%
4 | 69.96% | 70.09% | 70.26% | 70.49% | 0.13% | 0.30% | 0.53%
5 | 61.52% | 62.58% | 63.47% | 62.22% | 1.07% | 1.96% | 0.70%
6 | 41.84% | 45.33% | 46.51% | 44.17% | 3.49% | 4.67% | 2.33%

Volume of offending

Sweep | Available | Ordered logistic | Predictive matching | Weighting | Difference from available: ordered logistic | predictive matching | weighting
1 | 7.67 | 8.10 | 7.97 | 7.63 | 0.43 | 0.30 | -0.04
2 | 8.72 | 9.08 | 9.10 | 8.70 | 0.36 | 0.37 | -0.02
3 | 12.24 | 12.80 | 12.78 | 12.35 | 0.55 | 0.53 | 0.11
4 | 10.72 | 11.40 | 11.30 | 11.09 | 0.68 | 0.58 | 0.37
5 | 6.99 | 7.76 | 7.63 | 7.52 | 0.76 | 0.64 | 0.52
6 | 3.42 | 4.58 | 4.06 | 4.04 | 1.16 | 0.64 | 0.62

Variety of offending

Sweep | Available | Ordered logistic | Predictive matching | Weighting | Difference from available: ordered logistic | predictive matching | weighting
1 | 2.25 | 2.33 | 2.34 | 2.24 | 0.08 | 0.09 | -0.01
2 | 2.55 | 2.61 | 2.64 | 2.54 | 0.07 | 0.09 | -0.01
3 | 2.97 | 3.07 | 3.07 | 2.98 | 0.10 | 0.11 | 0.02
4 | 2.59 | 2.68 | 2.69 | 2.65 | 0.09 | 0.10 | 0.06
5 | 1.73 | 1.84 | 1.87 | 1.81 | 0.11 | 0.14 | 0.08
6 | 0.94 | 1.13 | 1.11 | 1.06 | 0.19 | 0.16 | 0.12

Table 7  Comparison of observed data with data sets including imputations: mean prevalence, volume and variety at each sweep.


When we examined the patterns of observed, weighted and imputed responses to individual questions, the distributions varied from question to question, but generally showed similar patterns for the two imputation methods, while the weighting produced results closer to the observed data. Some examples are shown in Figure 5.

[Figure 5 charts: per cent of cases in categories 0 to 7+ at sweep 6 for shoplifting, fraud, breach of the peace and robbery; observed, weighting, ordered and predictive series shown for each. Category-zero percentages (bars truncated): 77.1, 77.3, 68.6, 68.6; 98.9, 98.7, 98.6, 98.1; 99.3, 99.3, 99.0, 98.6; 92.2, 92.3, 84.7, 87.1.]


Figure 5  Distribution of observed and imputed data for selected questions at sweep 6. Note that the top of category zero is truncated and the values are given above each set of bars.

4.3.2 Estimated standard errors

Table 8 gives the estimated standard errors for the observed data, the weighted analysis and the two imputation methods. The imputation standard errors include the between-imputation component estimated from five imputations. The differences between the standard errors that apply to different methods are small.
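For reference, the between-imputation component enters the imputation standard errors through Rubin's combining rules. The sketch below shows the pooling step for a single scalar estimate; the function name is illustrative only.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m completed-data results by Rubin's rules.

    Returns the pooled estimate and its standard error, where the total
    variance is the mean within-imputation variance plus (1 + 1/m) times
    the between-imputation variance of the estimates.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)
    qbar = estimates.mean()          # pooled point estimate
    ubar = variances.mean()          # within-imputation variance
    b = estimates.var(ddof=1)        # between-imputation variance
    return qbar, np.sqrt(ubar + (1 + 1 / m) * b)
```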

Method | Prevalence: mean | std error | Variety: mean | std error | Volume: mean | std error
Observed | 0.4184 | 0.0084 | 0.9436 | 0.0289 | 3.4200 | 0.1500
Weighted | 0.4417 | 0.0088 | 1.0623 | 0.0366 | 4.0385 | 0.2052
Ordered | 0.4522 | 0.0083 | 1.1293 | 0.0336 | 4.5552 | 0.2222
Predictive | 0.4648 | 0.0088 | 1.1035 | 0.0295 | 4.0193 | 0.1466

Table 8  Estimates and their standard errors for total scores at sweep 6 calculated by different methods. In all of these comparisons, subjects with any item non-response at a sweep had to be excluded from both the observed data and the weighted data. For comparability of the standard errors, these cases have also been excluded from the imputed data.

5 Conclusions

Summary points only.

• Weighting is relatively straightforward, once suitable control variables have been identified and it applies to all variables in the survey.


• Imputation needs to be watched with care and can give obviously invalid results for some variables, especially those with rare categories.

• Imputation appears to give a somewhat greater adjustment for non-response than weighting for these data. Possible reasons for this that need further exploration are:

o Imputation includes an adjustment for item non-response.
o A direct relationship between answers to the questions about offending and non-response is highly plausible for these data, and thus imputation may do better in controlling for this aspect of the missing data mechanisms.
o However, imputation comes at a price of untestable assumptions.
o And given the problems with some other methods, we feel somewhat less than confident that everything has gone smoothly with imputation.

• Both methods have advantages and disadvantages that may affect different longitudinal surveys differently, so an understanding of both methods is important.

• Our work so far raises further questions that we hope to consider shortly:

o The weighting we used here included a variable derived from items with missing data (offending on 5 or more items at any sweep). Might this have introduced a bias since those with missing data would be unlikely to achieve this?

o Could weighting have worked differently if we had included questions asked at some sweeps, e.g. weighting for sweep 6 from sweep 3, by including information for respondents who answered sweep 3 but not sweep 6?

o Should model-based imputation be attempted for so many variables? Or should some empirical method (e.g. hot-deck procedures) have been used?

o To what extent are the results from this survey generalisable to other surveys?

6 References

Munnich, R. and Rassler, S. (2005) PRIMA: a new multiple imputation procedure for binary variables. Journal of Official Statistics, 21(2), 325-341.

Raghunathan et al. (Ref)

Royston, P. (2005) Multiple imputation of missing values: update. Stata Journal, 5, 188-201.


Schafer, J.L. (1997) Analysis of incomplete multivariate data. London: Chapman and Hall.

Smith, D.J. and McVie, S. (2003) Theory and method in the Edinburgh Study of Youth Transitions and Crime. British Journal of Criminology, 43(1), 169-195.

Tang, L., Song, J., Belin, T.R. and Unützer, J. (2005) A comparison of imputation methods in a longitudinal randomized clinical trial. Statistics in Medicine, 24(14), 2111-2128.

Appendix 1  Variables used for imputation (n=90)


Question | Code | Asked at sweeps (one X per sweep at which the question appeared)
Did you set fire or tried to set fire to someone's property? | ARS | X X X X X X
Did you claim social security benefit/housing benefit you weren't entitled to? | BFT | X
Were you noisy or cheeky in public so that got in trouble/people complained? | BOP | X X X X X X
Did you buy something that you knew or suspected was stolen? | BSG | X
Did you travel on a bus or train without paying/paying wrong fare? | BUS | X X X X X
Did you break into a car or van to steal something out of it? | CBK | X X X X X X
Did you use stolen cheque book/credit/cash card to get money or buy something? | FRD | X
Did you write or spray paint on property that did not belong to you? | GRF | X X X X X
Did you break into/try to break into house or building to steal something? | HBK | X X X X X X
Did you hit, kick or punch someone on purpose (fight with them)? | HIT | X X X X X X
Did you steal money or something else from your home? | HOM | X X X X
Did you steal or ride in a stolen car, van or motorbike? | JRD | X X X X X
Did you take a car, van or motorbike without permission? | TWP | X
Did you hurt or injure any animals or birds on purpose? | PET | X X X X X
Did you hit or pick on someone because of their race or skin colour? | RAB | X X X X
Did you use force, threats or a weapon to steal something from someone? | ROB | X X X X X X
Did you sell stolen goods? | RST | X X
Did you steal money or something else from school? | SCL | X X X X
Did you steal something from a shop or store without paying? | SHP | X X X X X X
Did you skip or skive school? | SKV | X X X X X X
Did you deliberately damage or destroy property that did not belong to you? | VND | X X X X X X
Did you carry a knife/other weapon for protection or in case needed in fight? | WEP | X X X X X X


Appendix 2

Details of weighting factors for responders and non-responders at each wave: descriptive statistics for independent regression variables

Sweeps one to three
Variable | Sweep 1 non-resp | resp | p-value | Sweep 2 non-resp | resp | p-value | Sweep 3 non-resp | resp | p-value
% left school at minimum leaving age | 31.9 | 29.2 | ns | 46.3 | 28.7 | .000 | 82.4 | 28.5 | .000
% male | 55.9 | 50.2 | ns | 48.1 | 50.6 | ns | 59.5 | 50.4 | ns
% younger than average | 16.6 | 23 | .048 | 18.1 | 22.8 | ns | 17.9 | 22.7 | ns
% older than average | 35.1 | 18.6 | .000 | 39.9 | 18.8 | .000 | 28.1 | 19.5 | .087
% white | 81.1 | 95.4 | .000 | 80.2 | 95.1 | .000 | 94.6 | 94.5 | ns
% offended >5 times at any sweep | 48 | 59.3 | .000 | 48.1 | 59 | .006 | 66.2 | 58.5 | ns
% known to children's Reporter | 23.6 | 18.5 | .044 | 35.8 | 18.2 | .000 | 81.1 | 17.7 | .000
% manual parents/high deprivation | 45.3 | 43.7 | ns | 54.9 | 43.4 | .004 | 70.3 | 43.4 | .000
% known to school as a truant | 26 | 55.6 | .000 | 29.6 | 54.8 | .000 | 79.7 | 53.4 | .000
% known to children's Reporter as offender | 14.2 | 10.6 | ns | 21.6 | 10.4 | .000 | 64.9 | 9.8 | .000

Sweeps four to six
Variable | Sweep 4 non-resp | resp | p-value | Sweep 5 non-resp | resp | p-value | Sweep 6 non-resp | resp | p-value
% left school at minimum leaving age | 79.2 | 27.2 | .000 | 93.7 | 21.7 | .000 | 62.8 | 21.9 | .000
% male | 55.7 | 50.3 | ns | 63.9 | 48.9 | .000 | 63.7 | 47.6 | .000
% younger than average | 20.7 | 22.7 | ns | 14.8 | 23.6 | .000 | 22.2 | 22.8 | ns
% older than average | 27.9 | 19.3 | .009 | 20.7 | 19.6 | ns | 21.7 | 19.3 | ns
% white | 98.9 | 94.3 | .008 | 95 | 94.5 | ns | 94.4 | 94.5 | ns
% offended >5 times at any sweep | 72.1 | 58 | .000 | 77.1 | 56.4 | .000 | 75.6 | 54.8 | .000
% known to children's Reporter | 64.5 | 16.8 | .000 | 52.8 | 14.7 | .000 | 43.5 | 13.3 | .000
% manual parents/high deprivation | 67.8 | 42.8 | .000 | 65.8 | 41.2 | .000 | 58.7 | 40.5 | .000
% known to school as a truant | 74.9 | 52.9 | .000 | 75.1 | 51.3 | .000 | 70.1 | 50.2 | .000
% known to children's Reporter as offender | 48.6 | 9.1 | .000 | 38.5 | 7.5 | .000 | 29 | 6.7 | .000